Description:
Key Responsibilities:
Build and operate scalable, secure, and highly available infrastructure in Azure and Google Cloud Platform.
Design and maintain observability platforms leveraging Splunk, OpenTelemetry, and cloud-native monitoring tools.
Develop and support AI/LLM-driven automation solutions to improve incident triage, alert correlation, and root cause analysis.
Partner with application and data teams to define SLOs, SLIs, and error budgets.
Drive operational excellence through automation,
Sep 25, 2025; from: dice.com