Description:
Ensure high availability, scalability, and reliability of critical services and applications through monitoring, and incident response. Perform regular production readiness reviews (PRRs) and capacity planning to prepare systems for scaling and reliability. Lead incident response efforts, conduct root cause analysis (RCA), and drove post-mortem action items to improve system resilience Thanks and Regards Reina Ext: 109
Aug 20, 2025;
from:
dice.com