Required Skills: AI/ML
Job Description
Position: DevOps Engineer – AI/ML (MLOps)
Location: Austin, TX or Sunnyvale, CA (3x/week onsite)
Job Summary
We are seeking a highly skilled DevOps Engineer with strong AI/ML experience to design, build, and maintain scalable CI/CD pipelines and cloud-native infrastructure. The ideal candidate will have hands-on expertise in container orchestration, cloud platforms, Infrastructure as Code (IaC), and MLOps practices, enabling seamless deployment and monitoring of AI/ML-driven applications.
Key Responsibilities
- Design, develop, maintain, and continuously improve CI/CD pipelines for applications across multiple platforms
- Collaborate closely with development and data science teams to integrate AI/ML models into production environments
- Implement and manage containerized workloads using Kubernetes and Docker
- Manage cloud infrastructure and services on AWS and ISCloud
- Apply Infrastructure as Code (IaC) principles using tools like Terraform or CloudFormation
- Monitor system health, performance, and availability using Prometheus, Grafana, and the ELK stack
- Implement automated system recovery solutions and ensure security, reliability, and scalability
- Conduct system testing focused on security, performance, and availability
- Troubleshoot complex infrastructure and deployment issues
- Ensure best practices in DevOps, MLOps, security, and compliance
- Demonstrate strong communication, documentation, and organizational skills
Required Skills & Qualifications
Essential Skills
- Strong experience in DevOps with AI/ML (MLOps)
- Hands-on experience with CI/CD tools such as Jenkins and GitHub CI
- Proficiency in container orchestration using Kubernetes and Docker
- Strong cloud experience with Amazon Web Services (AWS) and ISCloud
- Experience with Infrastructure as Code (IaC) tools (Terraform, CloudFormation, ARM, etc.)
- Familiarity with monitoring and logging tools such as Prometheus, Grafana, and Elastic Stack (ELK)
- Proficiency in scripting languages: Shell, Python, Groovy
Desirable Skills
- Experience with ML pipelines, model versioning, and deployment
- Exposure to tools like Kubeflow, MLflow, or SageMaker
- Understanding of DevSecOps practices
- Experience with high-availability and disaster recovery architectures
Soft Skills
- Strong analytical and problem-solving abilities
- Excellent teamwork and cross-functional collaboration skills
- Effective communication and organizational skills