DevOps Engineer – AI/ML (MLOps)
  • Tek Gence
18 Hours Ago
Sunnyvale-CA, Austin-TX
4-17 Years
Required Skills: AI/ML
Job Description
Position: DevOps Engineer – AI/ML (MLOps)
Location: Austin, TX or Sunnyvale, CA (onsite 3 days per week)
 
Job Summary
We are seeking a highly skilled DevOps Engineer with strong AI/ML experience to design, build, and maintain scalable CI/CD pipelines and cloud-native infrastructure. The ideal candidate will have hands-on expertise in container orchestration, cloud platforms, Infrastructure as Code (IaC), and MLOps practices, enabling seamless deployment and monitoring of AI/ML-driven applications.
 
Key Responsibilities
  • Design, develop, maintain, and continuously improve CI/CD pipelines for applications across multiple platforms
  • Collaborate closely with development and data science teams to integrate AI/ML models into production environments
  • Implement and manage containerized workloads using Kubernetes and Docker
  • Manage cloud infrastructure and services on AWS and ISCloud
  • Apply Infrastructure as Code (IaC) principles using tools like Terraform or CloudFormation
  • Monitor system health, performance, and availability using Prometheus, Grafana, and the ELK stack
  • Implement automated system recovery solutions and ensure security, reliability, and scalability
  • Conduct system testing focused on security, performance, and availability
  • Troubleshoot complex infrastructure and deployment issues
  • Ensure best practices in DevOps, MLOps, security, and compliance
  • Demonstrate strong communication, documentation, and organizational skills
Required Skills & Qualifications
Essential Skills
  • Strong experience in DevOps with AI/ML (MLOps)
  • Hands-on experience with CI/CD tools such as Jenkins and GitHub CI
  • Proficiency in container orchestration using Kubernetes and Docker
  • Strong cloud experience with Amazon Web Services (AWS) and ISCloud
  • Experience with Infrastructure as Code (IaC) tools (Terraform, CloudFormation, ARM templates, etc.)
  • Familiarity with monitoring and logging tools such as Prometheus, Grafana, and Elastic Stack (ELK)
  • Proficiency in scripting languages: Shell, Python, Groovy
Desirable Skills
  • Experience with ML pipelines, model versioning, and deployment
  • Exposure to tools like Kubeflow, MLflow, or SageMaker
  • Understanding of DevSecOps practices
  • Experience with high-availability and disaster recovery architectures
Soft Skills
  • Strong analytical and problem-solving abilities
  • Excellent teamwork and cross-functional collaboration skills
  • Effective communication and organizational skills
