Director, Site Reliability Engineering
Oracle Cloud Infrastructure (OCI)
- Led the global SRE organization for OCI Object Storage: 45+ engineers and 3 managers across the US, India, the UK, and Mexico. Owned end-to-end service reliability, incident lifecycle, automation strategy, and operational scalability for a business that reached $600M ARR and 10 EB of customer data while growing 100% YoY.
- Grew the SRE / Platform org from 5 → 45 engineers and shifted the operating model: service teams stopped doing their own region builds, pre-production qualification, and fleet management, and instead consumed platforms my org built.
- Reduced production rollbacks per service by 60%, build times by 80%, and change-caused incidents by 50%.
- Delivered the first customized Object Storage deployment at production scale for OpenAI: a $100M engagement with 15 Tbps of bandwidth, shipped in three months.
- Built AI-driven support and automation workflows that reduced ticket volume by 30% and improved response times by 50%.
- Achieved 100% FedRAMP-compliant patching across the 50,000+ node fleet via automated patching and vulnerability management.