System Design: Live Streaming Platform
System design of a generic live streaming platform covering RTMP ingestion, real-time transcoding, HLS packaging, viewer synchronization, and interactive features at scale.
Requirements
Functional Requirements:
- Broadcasters go live using RTMP/SRT from desktop (OBS) or mobile apps
- Viewers watch live streams with quality selection and adaptive bitrate
- Real-time interaction via chat, reactions, and polls
- Stream recording for on-demand replay (DVR functionality)
- Multi-host co-streaming with audio/video mixing
- Stream scheduling, notifications to followers, and stream metadata (title, category)
Non-Functional Requirements:
- Support 500K concurrent streams and 50 million concurrent viewers
- Glass-to-glass latency under 5 seconds for standard mode, under 1.5 seconds for ultra-low-latency
- 99.95% stream uptime; automatic ingest failover within 3 seconds
- Horizontal scaling: adding capacity should be linear with demand
- Support streams up to 24 hours continuously
Scale Estimation
500K concurrent streams at an average 5 Mbps ingest = 2.5 Tbps ingest bandwidth. Each stream is transcoded to 4 quality variants, roughly 10 Tbps of transcode output if every rendition averaged the ingest bitrate (an upper bound, since the lower renditions cost less). 50M concurrent viewers at an average 3 Mbps = 150 Tbps CDN egress. Chat: assume 10% of viewers send 1 message/min → 5M messages/min ≈ 83K messages/sec. DVR storage: 500K streams × 5 Mbps × 3600 sec/hr = 9 Pb ≈ 1.1 PB of recorded source content per hour. Individual streams average around 8 hours, but daily volume is set by concurrency: with 500K streams live at any time, that is roughly 27 PB/day of DVR recordings.
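A quick back-of-the-envelope check of these figures (a sketch in Python; all bitrates and ratios are the assumptions stated above):

```python
# Back-of-the-envelope check of the scale estimates above.
# All bitrates are the assumed averages from the text.
CONCURRENT_STREAMS = 500_000
INGEST_MBPS = 5                 # average broadcaster bitrate
RENDITIONS = 4
VIEWERS = 50_000_000
VIEWER_MBPS = 3                 # average delivered bitrate

ingest_tbps = CONCURRENT_STREAMS * INGEST_MBPS / 1e6
transcode_tbps = ingest_tbps * RENDITIONS          # upper bound: every rendition at ingest bitrate
egress_tbps = VIEWERS * VIEWER_MBPS / 1e6

chat_msgs_per_sec = VIEWERS * 0.10 / 60            # 10% of viewers, 1 msg/min each

dvr_pb_per_hour = CONCURRENT_STREAMS * INGEST_MBPS * 3600 / 8 / 1e9   # Mbit -> PB
dvr_pb_per_day = dvr_pb_per_hour * 24

print(f"ingest {ingest_tbps:.1f} Tbps, transcode <= {transcode_tbps:.0f} Tbps, "
      f"egress {egress_tbps:.0f} Tbps")
print(f"chat {chat_msgs_per_sec:,.0f} msg/s, "
      f"DVR {dvr_pb_per_hour:.2f} PB/h ({dvr_pb_per_day:.0f} PB/day)")
```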
High-Level Architecture
The platform architecture consists of four planes: Ingest, Transcode, Delivery, and Interaction. The Ingest Plane deploys edge ingest servers at 50+ PoPs. Each ingest server runs an RTMP/SRT listener that authenticates the broadcaster's stream key, validates the incoming bitrate/resolution, and establishes a persistent connection. The raw stream is forwarded to the nearest Transcode Cluster via an internal low-latency transport protocol (typically RIST or SRT over a dedicated backbone).
The Transcode Plane runs GPU-accelerated encoders (NVENC or Intel QSV) that produce HLS output in real-time. Each transcoder takes the raw stream and outputs 4 renditions (e.g., 1080p60, 720p30, 480p30, 360p30) as 2-second HLS segments with CMAF packaging. Segments are immediately pushed to the Delivery Plane — a CDN with edge PoPs that serve HLS manifests and segments to viewers. The manifest is a live-updating m3u8 playlist that the viewer's player polls every segment duration.
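To make the polling loop concrete, here is a minimal sketch of the viewer side, assuming 2-second segments and a hypothetical media-playlist URL; a real player (hls.js, AVPlayer) also handles ABR, buffering, and LL-HLS partial segments:

```python
import time
import requests

MANIFEST_URL = "https://cdn.example.com/live/stream123/720p30/media.m3u8"  # hypothetical URL
BASE_URL = MANIFEST_URL.rsplit("/", 1)[0]
SEGMENT_SECONDS = 2  # matches the 2-second segments described above

seen = set()
while True:
    playlist = requests.get(MANIFEST_URL, timeout=2).text
    # A live media playlist lists only the most recent segments; new entries
    # appear as the transcoder publishes them and old ones roll off.
    for line in playlist.splitlines():
        if line and not line.startswith("#") and line not in seen:
            seen.add(line)
            segment = requests.get(f"{BASE_URL}/{line}", timeout=2)
            with open(line, "wb") as f:          # a real player feeds these bytes to the decoder
                f.write(segment.content)
    time.sleep(SEGMENT_SECONDS)                  # poll roughly once per segment duration
```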
The Interaction Plane handles chat, reactions, polls, and viewer count. A WebSocket Gateway maintains persistent connections with all viewers. Messages flow through a Chat Service that applies moderation, then publishes to a partitioned message bus (Kafka or NATS) for fan-out to all connected viewers of that stream. A separate Analytics Pipeline processes viewership events in real-time for the streamer's dashboard.
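A rough sketch of the publish side of that chat path, assuming Kafka with the kafka-python client; the topic name and message fields are illustrative:

```python
import json
from kafka import KafkaProducer   # kafka-python client; NATS would look similar

producer = KafkaProducer(
    bootstrap_servers="kafka:9092",
    value_serializer=lambda v: json.dumps(v).encode(),
)

def publish_chat(stream_id: str, user_id: str, text: str) -> None:
    """Called by the Chat Service after the moderation pipeline approves a message."""
    message = {"stream_id": stream_id, "user_id": user_id, "text": text}
    # Keying by stream_id keeps a stream's messages on one partition, so every
    # WebSocket Gateway node consuming that partition can fan the message out,
    # in order, to its locally connected viewers of that stream.
    producer.send("chat-messages", key=stream_id.encode(), value=message)
```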
Core Components
Ingest Server Cluster
Ingest servers are the first point of contact for broadcasters. Each server handles up to 1,000 concurrent RTMP connections using a multi-threaded async I/O architecture (built on libuv or Tokio). On connection, the server validates the stream key against the Auth Service (cached locally with 5-minute TTL), performs codec negotiation (H.264 required, H.265 optional), and begins forwarding packets. If the ingest server detects packet loss exceeding 1% or the broadcaster disconnects, an automatic failover redirects the stream to a backup ingest server. The broadcaster's encoder (OBS) is configured with primary and backup ingest URLs for client-side failover.
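A minimal sketch of the stream-key check with the local 5-minute cache; the Auth Service client and its call are assumptions, not a real API:

```python
import time

AUTH_CACHE_TTL = 300  # seconds; the 5-minute local cache described above
_cache: dict[str, tuple[bool, float]] = {}   # stream_key -> (is_valid, cached_at)

def validate_stream_key(stream_key: str) -> bool:
    """Called when a broadcaster connects, before any packets are forwarded."""
    entry = _cache.get(stream_key)
    if entry and time.monotonic() - entry[1] < AUTH_CACHE_TTL:
        return entry[0]
    # Cache miss: ask the Auth Service (auth_service is a hypothetical client).
    is_valid = auth_service.check_stream_key(stream_key)
    _cache[stream_key] = (is_valid, time.monotonic())
    return is_valid
```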
Real-Time Transcoder
Transcoders run as Kubernetes pods with GPU resource requests. Each pod runs a custom transcoding daemon wrapping FFmpeg with hardware acceleration. The daemon receives raw H.264/H.265 NAL units over SRT, decodes to raw frames, and re-encodes to multiple bitrates simultaneously. The output is CMAF (Common Media Application Format) segments — fragmented MP4 with byte-range addressing — enabling low-latency HLS (LL-HLS) with partial segment delivery. The transcoder publishes completed segments to an internal segment store (Redis for metadata, S3 for segment data) and notifies the Delivery Plane.
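As a sketch of what such a daemon might hand to FFmpeg, here is a two-rendition version of the invocation; the SRT port, bitrates, and exact LL-HLS settings are assumptions:

```python
import subprocess

# Two-rendition sketch of the FFmpeg invocation the transcoding daemon wraps.
# Input arrives over SRT; output is fragmented-MP4 (CMAF-style) HLS with
# 2-second segments. The real ladder and LL-HLS partial-segment settings are omitted.
cmd = [
    "ffmpeg",
    "-i", "srt://0.0.0.0:9000?mode=listener",          # contribution feed (assumed port)
    "-filter_complex",
    "[0:v]split=2[v1][v2];[v1]scale=-2:720[v720];[v2]scale=-2:480[v480]",
    "-map", "[v720]", "-c:v:0", "h264_nvenc", "-b:v:0", "3000k",
    "-map", "0:a",    "-c:a:0", "aac",        "-b:a:0", "128k",
    "-map", "[v480]", "-c:v:1", "h264_nvenc", "-b:v:1", "1500k",
    "-map", "0:a",    "-c:a:1", "aac",        "-b:a:1", "96k",
    "-f", "hls",
    "-hls_time", "2",                                   # 2-second segments, as above
    "-hls_segment_type", "fmp4",                        # fragmented MP4 / CMAF packaging
    "-hls_flags", "independent_segments+delete_segments",
    "-hls_segment_filename", "media_%v_%05d.m4s",
    "-master_pl_name", "master.m3u8",
    "-var_stream_map", "v:0,a:0 v:1,a:1",
    "media_%v.m3u8",
]
subprocess.run(cmd, check=True)
```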
Interactive Features Engine
Beyond basic chat, the Interaction Plane supports polls, predictions, and reactions. Polls are created by the streamer via API, stored in DynamoDB, and vote tallies are maintained in Redis counters. Reactions (emoji overlays) use a sampling approach for large audiences: instead of sending every reaction to every viewer, the system samples reactions and sends aggregated counts (e.g., "500 heart reactions in the last second") to viewers, rendering a proportional animation client-side. This reduces message fan-out by 100x for popular streams.
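A sketch of the aggregation side of reaction sampling; broadcast() stands in for the WebSocket fan-out and is hypothetical:

```python
import asyncio
from collections import Counter

reaction_counts: Counter[str] = Counter()   # emoji -> count in the current window

def record_reaction(emoji: str) -> None:
    """Hot path: just bump an in-memory counter, no per-reaction fan-out."""
    reaction_counts[emoji] += 1

async def flush_reactions(stream_id: str) -> None:
    """Once per second, send one aggregated message instead of every reaction."""
    while True:
        await asyncio.sleep(1)
        if reaction_counts:
            snapshot = dict(reaction_counts)
            reaction_counts.clear()
            # e.g. {"heart": 500, "fire": 120} -> clients render a proportional animation
            await broadcast(stream_id, {"type": "reactions", "counts": snapshot})
```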
Database Design
Stream metadata (stream_id, broadcaster_id, title, category, started_at, status, ingest_server, thumbnail_url) is stored in PostgreSQL with a partial index on status='live' for efficient discovery queries. Viewer sessions (session_id, stream_id, user_id, joined_at, quality, device_type) are stored in Cassandra for high write throughput — each viewer join/leave generates a write. Chat messages for moderation audit are stored in Cassandra partitioned by stream_id and bucketed by 10-minute intervals.
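For illustration, the schema for the discovery path and the chat audit log might look like this (a sketch; names follow the fields listed above, and the exact index columns are an assumption):

```python
# Sketch of the PostgreSQL side of the schema described above. The partial
# index on status = 'live' keeps discovery queries from scanning the much
# larger set of ended streams.
STREAMS_DDL = """
CREATE TABLE streams (
    stream_id      UUID PRIMARY KEY,
    broadcaster_id UUID NOT NULL,
    title          TEXT,
    category       TEXT,
    started_at     TIMESTAMPTZ,
    status         TEXT NOT NULL,        -- 'live' | 'ended'
    ingest_server  TEXT,
    thumbnail_url  TEXT
);
CREATE INDEX streams_live_by_category
    ON streams (category, started_at)
    WHERE status = 'live';
"""

# Chat audit log in Cassandra: partition by stream and 10-minute bucket so a
# single long stream never grows one unbounded partition.
CHAT_AUDIT_CQL = """
CREATE TABLE chat_audit (
    stream_id  uuid,
    bucket     timestamp,      -- message time truncated to 10 minutes
    sent_at    timeuuid,
    user_id    uuid,
    message    text,
    PRIMARY KEY ((stream_id, bucket), sent_at)
);
"""
```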
Follower/subscriber relationships use a separate PostgreSQL table (broadcaster_id, follower_id, subscribed_at, tier). Notification delivery for go-live events reads from this table and fans out via a push notification service (Firebase Cloud Messaging for mobile, WebSocket for web). Revenue data (subscriptions, donations, ad impressions) uses a transactional PostgreSQL cluster with synchronous replication.
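A rough sketch of the go-live fan-out under these assumptions; db and push_queue are hypothetical stand-ins for the PostgreSQL connection and the notification queue:

```python
def notify_go_live(broadcaster_id: str, stream_id: str, title: str) -> None:
    """Fan out a go-live notification to followers in batches."""
    with db.cursor() as cur:   # db: hypothetical PostgreSQL connection
        cur.execute(
            "SELECT follower_id FROM followers WHERE broadcaster_id = %s",
            (broadcaster_id,),
        )
        for batch in iter(lambda: cur.fetchmany(10_000), []):
            follower_ids = [row[0] for row in batch]
            # Enqueue rather than push inline: a large broadcaster can have
            # millions of followers, so delivery happens asynchronously via
            # FCM (mobile) or WebSocket (web).
            push_queue.enqueue("go_live", follower_ids,
                               {"stream_id": stream_id, "title": title})
```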
API Design
- POST /api/v1/streams: Create a stream; body contains title and category; returns stream_key and ingest_url (rtmp://ingest.example.com/live/{stream_key})
- GET /api/v1/streams/{stream_id}/manifest.m3u8: Fetch the live HLS master playlist; the viewer's player uses this to start playback
- POST /api/v1/streams/{stream_id}/chat: Send a chat message; body contains the message text; processed through the moderation pipeline
- GET /api/v1/streams/discover?category={cat}&sort=viewers&limit=20: Browse live streams by category, sorted by viewer count
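For example, a broadcaster dashboard might call the create endpoint like this (a sketch; the auth header and any response fields beyond stream_key and ingest_url are assumptions):

```python
import requests

resp = requests.post(
    "https://api.example.com/api/v1/streams",
    headers={"Authorization": "Bearer <broadcaster-token>"},   # assumed auth scheme
    json={"title": "Friday speedrun", "category": "gaming"},
    timeout=5,
)
resp.raise_for_status()
stream = resp.json()
# stream["ingest_url"] (rtmp://ingest.example.com/live/{stream_key}) is what the
# broadcaster pastes into OBS; stream["stream_key"] must be kept secret.
print(stream["stream_key"], stream["ingest_url"])
```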
Scaling & Bottlenecks
The transcode tier is the primary bottleneck — each concurrent stream requires dedicated GPU compute. At 500K concurrent streams, this requires ~125K GPU instances (assuming 4 streams per GPU with NVENC). Cost optimization strategies include: tiered transcoding (popular streams get all 4 renditions; low-viewer streams get only 2 — source passthrough + one lower quality); shared transcoding for streams with identical input settings (e.g., all 1080p30 H.264 streams share encoder parameters); and spot/preemptible GPU instances for non-critical renditions with automatic fallback to source passthrough.
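A sketch of how the tiering decision could be expressed; the thresholds are illustrative, only the popular vs. low-viewer split comes from the text:

```python
def choose_renditions(viewer_count: int) -> list[str]:
    """Pick the transcode ladder for a stream based on current popularity.

    Thresholds are illustrative; the design above only says popular streams
    get the full ladder and low-viewer streams get passthrough plus one rendition.
    """
    if viewer_count >= 100:                      # popular: full ABR ladder
        return ["source", "720p30", "480p30", "360p30"]
    if viewer_count >= 5:                        # modest: passthrough + one fallback quality
        return ["source", "480p30"]
    return ["source"]                            # tiny audience: source passthrough only
```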
The CDN egress at 150 Tbps is the dominant cost. Unlike VOD CDN where content can be pre-positioned, live content is inherently cache-unfriendly — each segment is accessed only once during its 2-second window. The solution is aggressive multicast-like behavior at the edge: all viewers of the same stream in the same PoP receive the same cached segment. With 50M viewers across 500K streams, each stream averages 100 viewers, many from the same PoP — this achieves effective cache hit rates of 80%+ even for live content.
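A rough way to see why edge caching still pays off for live content (the per-stream viewer count is the average above; the spread across PoPs is an assumption):

```python
# Per 2-second segment of one stream: every viewer requests it from an edge,
# but each PoP fetches it from the origin only once, however many local
# viewers it has.
viewers_per_stream = 100          # 50M viewers / 500K streams
pops_with_viewers = 20            # assumed spread of those viewers across PoPs

requests_to_edge = viewers_per_stream
requests_to_origin = pops_with_viewers
hit_rate = 1 - requests_to_origin / requests_to_edge
print(f"effective edge cache hit rate ~ {hit_rate:.0%}")   # ~80%, matching the claim above
```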
Key Trade-offs
- HLS (LL-HLS) vs WebRTC for delivery: LL-HLS at 1.5-second latency is close to WebRTC but scales via CDN caching; WebRTC provides sub-second but requires per-viewer server-side connections — LL-HLS wins for audiences above 1,000
- GPU transcoding vs source passthrough: Transcoding enables ABR for all viewers but adds latency and cost; passthrough adds no transcoding latency or GPU cost but forces every viewer onto the broadcaster's bitrate. Tier the decision by stream popularity
- Reaction sampling vs full fan-out: Sending every reaction to every viewer is O(viewers × reactions/sec) which explodes for popular streams; sampling maintains the visual effect at 1% of the network cost
- DVR recording all streams vs opt-in: Recording every stream ensures no content is lost but requires roughly 27 PB/day of storage; opt-in reduces this by 90% but means streamers lose content they never enabled recording for. Default-on with 7-day auto-delete is the middle ground