Round 1 — Streaming at Scale, Kubernetes, Cloud & Linux (45 mins)
- How would you design auto-scaling for 50M+ concurrent viewers across multiple Kubernetes clusters without over-provisioning?
- During an IPL final, a new region needs to spin up instantly. How would you pre-warm nodes and scale workloads with zero cold-start impact?
- Explain how you would use Envoy and Istio to route low-latency live streams differently from VOD without service restarts.
- What is your approach to multi-zone pod affinity and anti-affinity to ensure a node failure does not impact regional streaming SLAs?
- How would you monitor HPA scaling decisions in real time and detect when the Metrics Server is lagging?
- Describe Kubernetes readiness and liveness probe configurations to catch buffering and lag issues in stream-processing microservices before users notice.
- A kube-proxy update rolls out mid-match. What is your network rollback plan to avoid packet drops?
Round 2 — RCA, Fire Drills & Streaming Chaos (75 mins)
- Playback failures spike for only 3% of users in the APAC region. CPU, memory, and pods look fine. Walk through your triage plan.
- Your Kafka ingestion pipeline lags by 2 minutes during a traffic surge. Producers are healthy, but consumers sit idle. What is your debug path?
- Tail latency suddenly spikes on a Redis-based stream session store during a Champions League match. How do you find and fix the bottleneck?
- HPA refuses to scale up a critical pod set even though Prometheus shows CPU > 90%. What is the root cause and fix?
- NAT gateway costs double in 24 hours during a live series. No infrastructure changes were made. What could be silently causing it?
Round 3 — Leadership, Reliability Culture & Scaling Influence (30 mins)
- How do you build a culture where latency SLOs are enforced like uptime SLAs in a streaming organization?
- You are asked to ship multi-region failover for live events in 2 weeks with no DNS-based routing allowed. What is your plan?
- How would you simulate chaos in a streaming pipeline without risking real user impact?
- How do you justify infrastructure costs for pre-warmed scaling capacity to executives before a major sports event?
💡 TL;DR