TekJobs - Easiest way to find your next right candidate

Site Reliability Engineer with Grafana

Code Beacons Inc.

124 Days Ago

Offered Rate: NA

Tax Type: C2C

Work location: Remote

Experience: 3-10 Years

Required Skills: SRE, DevOps, Systems Engineering, Linux & Shell Scripting, AWS, Azure, Kubernetes, ECS & Docker, Python, Java, CI/CD, OpenTelemetry, PostgreSQL, Ansible, Chef, Puppet

Job Description

We are seeking a highly skilled Site Reliability Engineer (SRE) to ensure the reliability, scalability, security, and performance of our production systems. This role bridges software development and operations—driving automation, monitoring, and engineering excellence. You will report directly to the Senior Director of Engineering.

What You’ll Do:

Reliability & Performance:
Ensure high availability, scalability, and reliability of production systems
Define & manage SLIs, SLOs, and SLAs
Conduct capacity planning and performance optimization

Automation & Tooling:
Automate infrastructure using Terraform, Terragrunt, Ansible
Build CI/CD pipelines for rapid, reliable deployments
Reduce manual operations through automation

Monitoring & Incident Response:
Design & maintain monitoring, logging, and alerting (Datadog)
Participate in on-call rotations; lead incident response
Perform RCA and write postmortems to prevent recurrences

Systems Engineering:
Manage cloud infrastructure (AWS, Azure)
Work with Kubernetes, ECS, Docker
Implement best practices for security, networking, and system resilience

Collaboration & Leadership:
Partner with engineering teams to design reliable distributed systems
Advocate SRE best practices across the organization
Mentor engineers on tooling, automation, and reliability

What You’ll Need:
Bachelor’s in CS, Engineering, or equivalent experience
3–7 years in SRE, DevOps, or Systems Engineering
Strong Linux & shell scripting skills
Cloud experience: AWS, Azure
Kubernetes/ECS & Docker expertise
Proficiency in Python or Java
Experience with CI/CD and DevOps tooling
Strong grasp of distributed systems, networking & security fundamentals

Preferred Qualifications:
Observability tools (OpenTelemetry)
PostgreSQL experience
Configuration management: Ansible, Chef, Puppet
Experience with zero-downtime deployments or chaos engineering

Soft Skills:
Strong analytical & problem-solving abilities
Excellent communication and collaboration
Thrives in fast-paced environments
Passion for continuous improvement
