Develop Python-based automation solutions for GCP and Kubernetes infrastructure.
Improve operational excellence by identifying and implementing automation opportunities.
Integrate tools via APIs and client libraries to ensure seamless system interoperability.
Implement configuration automation, preferably with Ansible.
Enhance deployment reliability with failover mechanisms and self-healing infrastructure.
Create monitoring and alerting solutions using tools such as Splunk, Grafana, and Prometheus.
Perform root cause analysis and incident management for complex failures.
Strengthen system resilience, tune performance, and support high-load operations.
Explore AI/ML techniques for anomaly detection and predictive scaling.
Develop AIOps-driven automation solutions to reduce operational overhead.
Strong background in systems engineering with a focus on automation.
Proficiency in Python for automation and integrations.
Expertise with Kubernetes and cloud platforms (GCP or others).
Experience integrating tools via APIs and client libraries.
Knowledge of monitoring and alerting tools like Splunk, Grafana, and Prometheus.
Ability to operate in high-stakes, mission-critical environments.
Experience with Ansible and large-scale, high-availability systems.
Enthusiasm for AI/ML and AIOps in operational automation.