**SRE - Resy Migration**

**Title:** Site Reliability Engineer

**Required technical skills:**
- Development experience in at least one of Java, Python, or Go; development experience in React and/or Angular is a plus (full-stack Java developer)
- Hands-on experience with CI/CD tools: Jenkins, GitHub, GitHub Actions, cloud deployment pipelines
- SRE experience with observability tools: Splunk, ELK, Dynatrace
- Good problem-solving skills are a must
- At least 3 years of solid development experience is a must

**Key proficiencies:** GitHub, JIRA, Splunk/Grafana/Datadog, AWS/GCP/Azure

**Must have:** 3+ years of software engineering, cloud-native apps, incident response, CI/CD (GitHub Actions/Jenkins), IaC (Terraform), Kubernetes/Docker, observability/monitoring, automation, performance tuning, collaboration

**Good to have:** Python/Java/Go, frontend (Angular/React), private/public cloud, mentoring, security, post-incident review, scalable systems

**Job Description**

We're looking for engineers to be part of an empowered, self-organizing group, with the opportunity to use modern languages and tools and to operate software in public cloud environments. Our cross-functional teams span the stack, from front ends to APIs to databases, and they have all the skills and resources they need to build, ship, and operate their own software.

**Some of the problems we'll work on include:**
- Supporting applications in production, including incident response and post-incident reviews
- Applying observability engineering to our applications to ensure we can proactively detect system degradation, easily understand system state, and quickly diagnose issues
- Investigating and resolving production issues
- Building automation to reduce toil and improve developer productivity

**As a Site Reliability Engineer in our group, you will:**
- Scope technical projects and break them down into user stories and tasks within an engineering team
- Directly contribute to the design and coding of our software systems
- Contribute to building systems that are secure, reliable, scalable, and extensible
- Make sound technical decisions utilizing the advice of teammates and contribute to technical conversations with other engineering teams
- Build and maintain CI/CD pipelines to automate the deployment of our software
- Automate the provisioning and management of our infrastructure using Infrastructure as Code (IaC) tools
- Define, implement, and maintain observability solutions for our applications to ensure we can proactively detect system degradation, easily understand system state, and quickly diagnose issues
- Diagnose and resolve production issues, including performance tuning and capacity planning

**You may be a fit if:**
- You have at least 3 years of experience working in a professional environment as a software engineer
- You have contributed to the design, build, and operation of cloud-native applications written in any language
- You have done "DevOps" work, such as building CI/CD pipelines or setting up cloud hosting environments
- You have some experience mentoring more junior engineers, helping them to succeed and grow in their roles
- You build effective work relationships, giving and receiving constructive feedback, and your colleagues at all levels and across all teams trust you

**Technologies we use at Amex include:**
- Python, Java, and Go are our primary server languages.
- Our browser applications are based on Angular and React.
- Code lives in GitHub and flows to production through a CI/CD pipeline built on GitHub Actions, with some on Jenkins.
- Code runs in Kubernetes-managed Docker containers hosted in a mix of private and public clouds.
- Datadog is our primary observability tool.