AIOps Engineer, SRE with AI/ML
Fuge Technologies Inc.
2 Days Ago
50-50 per W2 Hourly
NA
Frisco, TX
10-15 Years
Required Skills: Machine Learning & AI Frameworks, TensorFlow, PyTorch, scikit-learn, Splunk, DevOps, SRE background, Prometheus, Grafana, ELK Stack, Python, Bash, Ansible
Job Description
Role: AIOps Engineer (SRE with AI/ML)
Location: Frisco, TX – Onsite
Contract Type: W2
Experience: 10+ Years

Job Overview:
We are seeking an AIOps Engineer experienced in Site Reliability Engineering (SRE) and AI/ML implementation. You will integrate machine learning into monitoring and observability systems to build proactive, self-healing operational capabilities.

Key Responsibilities:
  • Apply machine learning to operational data to predict and prevent system failures.

  • Automate DevOps tasks such as scaling, optimization, and controlled restarts.

  • Develop anomaly detection and self-healing frameworks.

  • Collaborate with SRE, development, and infrastructure teams to improve reliability.

  • Build AI-driven dashboards and reports to provide actionable insights.


Mandatory Skills:
  • Machine Learning & AI Frameworks (TensorFlow, PyTorch, scikit-learn)

  • Monitoring & Observability Tools (Splunk, Prometheus, Grafana, ELK Stack)

  • Automation & Scripting (Python, Bash, Ansible)

  • DevOps / SRE background


Optional Skills:
  • Product Owner experience

  • Containerization (Docker, Kubernetes)


Preferred Qualifications:
  • 8+ years of experience in machine learning

  • 5+ years working with AI frameworks and observability tools

  • Strong cloud infrastructure knowledge

  • Proven experience with self-healing systems and proactive monitoring


Short Summary:
An experienced SRE with hands-on expertise in implementing AI/ML solutions to drive automation, observability, and operational excellence.