-
Design & Implement Solutions: Build and maintain comprehensive observability platforms that provide deep insights into complex systems, incorporating logs, metrics, and traces.
-
System Instrumentation: Instrument applications, infrastructure, and services to collect telemetry data using frameworks like OpenTelemetry.
-
Data Analysis & Visualization: Develop dashboards, reports, and alerts using tools like Prometheus, Grafana, and Splunk to visualize system performance and detect issues.
-
Collaboration: Work with development, SRE, and DevOps teams to integrate observability best practices and align monitoring with business and operational goals.
-
Automation: Develop scripts and use Infrastructure as Code (IaC) tools like Ansible and Terraform to automate monitoring configurations and telemetry collection.
-
Implement and manage full-stack observability using Datadog, ensuring seamless monitoring across infrastructure, applications, and services.
-
Deploy and configure monitoring agents across on-premises, cloud, and hybrid environments to enable comprehensive monitoring.
-
Design and deploy monitoring for key services, including dashboards, monitors, SLA/SLO definitions, and anomaly detection with alert notifications.
-
Integrate Datadog with third-party services such as ServiceNow and other ITSM tools, and configure single sign-on (SSO).