System Design: Twitch Live Streaming
System design of Twitch covering real-time live video ingestion via RTMP, HLS transcoding, chat at scale, and low-latency delivery to millions of concurrent viewers.
Requirements
Functional Requirements:
- Streamers broadcast live video via RTMP/SRT from OBS or mobile apps
- Viewers watch live streams with sub-5-second latency (2-3 seconds with low-latency mode)
- Real-time chat alongside each stream with emotes and moderation
- Channel subscriptions, bits (virtual currency), and donations
- Clip creation and VOD (Video on Demand) recording of live streams
- Raid and host features to redirect viewers to another stream
Non-Functional Requirements:
- 30 million DAU, 7 million unique streamers per month, 2.5 million peak concurrent viewers
- End-to-end latency (streamer to viewer) under 3 seconds in low-latency mode
- Chat message delivery under 500ms to all viewers in a channel
- 99.95% availability; stream interruptions directly impact creator revenue
- Handle chat rooms with 500K+ concurrent participants
Scale Estimation
- 7M unique streamers per month, with ~100K concurrent streams at peak
- Ingest: 100K streams × 6 Mbps average = 600 Gbps of ingest bandwidth
- Transcode: each stream produces 4 quality variants → up to ~2.4 Tbps of transcode output (an upper bound; the lower renditions carry less than the 6 Mbps source bitrate)
- Delivery: 2.5M concurrent viewers × 4 Mbps average = 10 Tbps of CDN egress
- Chat: top channels have 500K+ viewers each sending ~1 message per minute → ~8,300 messages/sec for a single large channel, with global chat volume exceeding 1M messages/sec
- VOD storage: 100K concurrent streams × 6 hours average × 3 Mbps archived ≈ 810 TB/day
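A quick back-of-envelope check of these figures; this is a sketch whose inputs are only the rough averages assumed above.

```go
package main

import "fmt"

func main() {
	const (
		concurrentStreams = 100_000
		ingestMbps        = 6.0 // average source bitrate
		variants          = 4   // 160p, 480p, 720p, source
		viewers           = 2_500_000
		viewerMbps        = 4.0
		archiveMbps       = 3.0
		avgStreamHours    = 6.0
	)

	ingestGbps := concurrentStreams * ingestMbps / 1_000
	// Upper bound: assumes every rendition at the source bitrate.
	transcodeTbps := ingestGbps * variants / 1_000
	egressTbps := viewers * viewerMbps / 1_000_000

	// 3 Mbps -> bytes per stream-hour -> TB/day across all streams.
	bytesPerHour := archiveMbps / 8 * 3600 * 1e6
	vodTBPerDay := bytesPerHour * avgStreamHours * concurrentStreams / 1e12

	fmt.Printf("ingest: %.0f Gbps, transcode: %.1f Tbps (upper bound), egress: %.0f Tbps\n",
		ingestGbps, transcodeTbps, egressTbps)
	fmt.Printf("VOD archive: %.0f TB/day\n", vodTBPerDay)
}
```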
High-Level Architecture
Twitch's architecture has three real-time pipelines running in parallel. The Video Pipeline handles ingestion and delivery: streamers connect via RTMP to the nearest Ingest Server (deployed at edge PoPs globally). The Ingest Server authenticates the stream key, validates the incoming codec, and forwards the raw video to a Transcoding Cluster. Transcoders run FFmpeg to produce 4 quality variants (160p, 480p, 720p, source quality) packaged as HLS segments (2-second chunks for standard latency, 1-second for low-latency). Segments are pushed to a CDN origin (Twitch uses a combination of its own edge network and Fastly/CloudFront) and distributed to edge nodes.
The Chat Pipeline uses a custom WebSocket-based protocol. Viewers connect to a Chat Edge Server, which maintains persistent connections. When a user sends a message, it goes to a Chat Service that performs rate limiting, spam filtering (ML-based classifier), and moderation rule checks, then publishes to a channel-partitioned pub/sub tier (detailed in the Chat System section below). All Chat Edge Servers subscribed to that channel receive the message and fan it out to connected viewers. For channels with >100K viewers, a hierarchical fan-out tree is used to avoid overwhelming a single pub/sub partition.
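A minimal sketch of the edge fan-out step, using in-process Go channels to stand in for the pub/sub tier and the WebSocket connections; names such as `EdgeHub` and `Broadcast` are illustrative, not Twitch's actual code.

```go
package main

import (
	"fmt"
	"sync"
)

// EdgeHub is a hypothetical chat edge server holding the connections
// for viewers of one channel on this node.
type EdgeHub struct {
	mu      sync.RWMutex
	viewers map[string]chan string // viewer ID -> outbound message queue
}

func NewEdgeHub() *EdgeHub {
	return &EdgeHub{viewers: make(map[string]chan string)}
}

func (h *EdgeHub) Join(viewerID string) <-chan string {
	h.mu.Lock()
	defer h.mu.Unlock()
	ch := make(chan string, 64) // buffered so one slow client doesn't block others
	h.viewers[viewerID] = ch
	return ch
}

// Broadcast fans a message received from the pub/sub tier out to every
// viewer connected to this edge node; slow clients are skipped, not awaited.
func (h *EdgeHub) Broadcast(msg string) {
	h.mu.RLock()
	defer h.mu.RUnlock()
	for _, ch := range h.viewers {
		select {
		case ch <- msg:
		default: // client buffer full: drop rather than stall the fan-out
		}
	}
}

func main() {
	hub := NewEdgeHub()
	out := hub.Join("viewer-1")
	hub.Broadcast("PRIVMSG #channel :hello chat")
	fmt.Println(<-out)
}
```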
The Metadata Pipeline handles stream status, viewer counts, and channel data. A Stream Registry (backed by DynamoDB) tracks all active streams with their ingest server, transcode status, and viewer count. Viewer counts are maintained using a distributed counter service (HyperLogLog-based for approximate unique viewer counts, simple counters for concurrent viewers).
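A sketch of approximate unique-viewer counting with Redis HyperLogLog, assuming the go-redis client; the key layout is illustrative.

```go
package main

import (
	"context"
	"fmt"

	"github.com/redis/go-redis/v9"
)

func main() {
	ctx := context.Background()
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})

	key := "stream:12345:unique_viewers" // hypothetical key layout

	// Each viewer heartbeat adds their ID; HyperLogLog stays a few KB
	// regardless of how many distinct viewers are added.
	if err := rdb.PFAdd(ctx, key, "user-42", "user-77", "user-42").Err(); err != nil {
		panic(err)
	}

	// PFCount returns an approximate cardinality (~0.81% standard error).
	uniques, err := rdb.PFCount(ctx, key).Result()
	if err != nil {
		panic(err)
	}
	fmt.Println("approx unique viewers:", uniques)
}
```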
Core Components
Ingest & Transcoding
Ingest servers accept RTMP connections and perform a protocol handshake to authenticate the stream key against the Auth Service. The raw RTMP stream is demuxed and forwarded over a reliable internal protocol to the nearest transcoding cluster. Transcoders run FFmpeg with hardware acceleration (NVENC on NVIDIA GPUs) for real-time encoding. Each stream is split into HLS segments: the encoder produces 2-second CMAF segments with low-latency HLS (LL-HLS) extensions for sub-3-second latency. Segments are written to a shared NFS tier and immediately registered with the CDN origin for distribution.
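A simplified transcode-worker sketch that launches one FFmpeg process per rendition; it assumes an ffmpeg build with the h264_nvenc encoder is installed, and the rendition ladder, ingest URL, and output paths are illustrative.

```go
package main

import (
	"fmt"
	"log"
	"os/exec"
)

type rendition struct {
	name    string
	height  int
	bitrate string
}

func main() {
	ladder := []rendition{
		{"160p", 160, "400k"},
		{"480p", 480, "1500k"},
		{"720p", 720, "3000k"},
	}

	input := "rtmp://ingest.local/live/stream_key" // hypothetical ingest URL

	for _, r := range ladder {
		// One ffmpeg process per rendition keeps the sketch simple; a real
		// transcoder would decode once and fan out to multiple encoders.
		cmd := exec.Command("ffmpeg",
			"-i", input,
			"-vf", fmt.Sprintf("scale=-2:%d", r.height),
			"-c:v", "h264_nvenc", // hardware encode on NVIDIA GPUs
			"-b:v", r.bitrate,
			"-c:a", "aac",
			"-f", "hls",
			"-hls_time", "2", // 2-second segments for standard latency
			fmt.Sprintf("/segments/%s/index.m3u8", r.name),
		)
		if err := cmd.Start(); err != nil {
			log.Fatalf("start %s: %v", r.name, err)
		}
	}
	// A real worker would Wait() on the processes and report health.
}
```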
Chat System
Twitch chat handles over 1 million messages per second globally. The architecture uses a layered fan-out model. The Chat Service (stateless Go microservices) receives messages, applies business logic (slow mode, subscriber-only mode, banned word filters), and publishes to a channel-partitioned Kafka topic. Chat Edge Servers consume from Kafka and push messages to connected WebSocket clients. For mega-channels (500K+ viewers), a fan-out tree is used: a root Chat Edge fans out to ~100 regional Chat Edge nodes, each serving ~5,000 viewers. This keeps per-node fan-out bounded at roughly O(sqrt(N)) rather than O(N).
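A sketch of the per-channel business-logic checks a Chat Service might run before publishing; the rule names, thresholds, and in-memory state are illustrative (a real service would keep rate-limit state in a shared store).

```go
package main

import (
	"errors"
	"strings"
	"time"
)

type Channel struct {
	SlowModeInterval time.Duration // 0 means slow mode is off
	SubscriberOnly   bool
	BannedWords      []string
}

type Message struct {
	UserID       string
	IsSubscriber bool
	Body         string
	SentAt       time.Time
}

// lastMessageAt would live in a shared store (e.g. Redis) in practice;
// an in-memory map keeps this sketch self-contained.
var lastMessageAt = map[string]time.Time{}

func Validate(ch Channel, m Message) error {
	if ch.SubscriberOnly && !m.IsSubscriber {
		return errors.New("subscriber-only mode")
	}
	if ch.SlowModeInterval > 0 {
		if last, ok := lastMessageAt[m.UserID]; ok && m.SentAt.Sub(last) < ch.SlowModeInterval {
			return errors.New("slow mode: sending too fast")
		}
	}
	lower := strings.ToLower(m.Body)
	for _, w := range ch.BannedWords {
		if strings.Contains(lower, w) {
			return errors.New("message contains a banned word")
		}
	}
	lastMessageAt[m.UserID] = m.SentAt
	return nil
}

func main() {
	ch := Channel{SlowModeInterval: 10 * time.Second, BannedWords: []string{"spamword"}}
	msg := Message{UserID: "user-1", Body: "hello chat", SentAt: time.Now()}
	if err := Validate(ch, msg); err == nil {
		// publish to the channel-partitioned Kafka topic here
	}
}
```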
Clip & VOD Service
When a viewer creates a clip, the Clip Service reads the last 60 seconds of HLS segments from the CDN edge cache, concatenates them, and stores the clip as a standalone MP4 in S3. VOD recording runs continuously for partner/affiliate streamers: a VOD Worker consumes HLS segments from the transcoding output and accumulates them into a growing recording via an S3 multipart upload (S3 objects cannot be appended in place). After the stream ends, the VOD is finalized with proper metadata and made available in the channel's video archive.
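A sketch of clip assembly: the last ~60 seconds of segments are listed in a concat manifest and remuxed into a standalone MP4 without re-encoding. It assumes ffmpeg's concat demuxer and MPEG-TS segments at the edge (CMAF/fMP4 segments would additionally need their init segment); all paths are illustrative.

```go
package main

import (
	"fmt"
	"log"
	"os"
	"os/exec"
)

func main() {
	// The most recent ~60s of 2-second segments pulled from the edge cache.
	segments := []string{"/cache/seg_0098.ts", "/cache/seg_0099.ts", "/cache/seg_0100.ts"}

	// Build an ffmpeg concat manifest listing the segments in order.
	list, err := os.CreateTemp("", "clip-*.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer os.Remove(list.Name())
	for _, s := range segments {
		fmt.Fprintf(list, "file '%s'\n", s)
	}
	list.Close()

	// Remux (no re-encode) into a standalone MP4, then upload it to S3.
	cmd := exec.Command("ffmpeg",
		"-f", "concat", "-safe", "0",
		"-i", list.Name(),
		"-c", "copy",
		"/clips/clip_abc123.mp4",
	)
	if err := cmd.Run(); err != nil {
		log.Fatalf("ffmpeg concat: %v", err)
	}
}
```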
Database Design
Channel and user data is stored in PostgreSQL (RDS) sharded by user_id. The Streams table tracks active streams: stream_id, channel_id, ingest_server, started_at, title, game_id, viewer_count, status. Stream events (go-live, offline, title change) are published to a Kafka topic consumed by the notification service and analytics. Subscription and transaction data (bits, subs) use a separate PostgreSQL cluster with strict ACID guarantees — these are revenue-critical.
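A sketch of the Streams row and the go-live event published to Kafka; the field names follow the columns listed above, while the event envelope is an illustrative assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Stream mirrors one row of the Streams table described above.
type Stream struct {
	StreamID     string    `json:"stream_id"`
	ChannelID    string    `json:"channel_id"`
	IngestServer string    `json:"ingest_server"`
	StartedAt    time.Time `json:"started_at"`
	Title        string    `json:"title"`
	GameID       string    `json:"game_id"`
	ViewerCount  int       `json:"viewer_count"`
	Status       string    `json:"status"` // "live" | "offline"
}

// StreamEvent is a hypothetical shape for the go-live / offline / title-change
// events consumed by the notification service and analytics.
type StreamEvent struct {
	Type     string `json:"type"` // "go_live", "offline", "title_change"
	StreamID string `json:"stream_id"`
	Payload  Stream `json:"payload"`
}

func main() {
	ev := StreamEvent{Type: "go_live", StreamID: "s-123",
		Payload: Stream{StreamID: "s-123", ChannelID: "c-9", Status: "live", StartedAt: time.Now()}}
	b, _ := json.Marshal(ev)
	fmt.Println(string(b)) // this JSON would be the Kafka message value
}
```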
Chat messages are ephemeral by default — they are not persisted to a database during live delivery. For channels that opt into chat logs, messages are asynchronously written to a Cassandra cluster partitioned by channel_id and bucketed by hour. VOD metadata and clip data live in DynamoDB with stream_id as the partition key. The recommendation system (suggesting streams to watch) uses a feature store backed by Redis for real-time features (current viewer count, stream duration) and S3/Athena for historical features.
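A sketch of the opt-in chat-log write path showing the channel + hour-bucket partition key; the CQL schema is an illustrative assumption, not a documented Twitch table.

```go
package main

import (
	"fmt"
	"time"
)

// hourBucket derives the time bucket appended to the partition key so that a
// single channel's chat history is spread across one partition per hour.
func hourBucket(t time.Time) string {
	return t.UTC().Format("2006010215") // e.g. "2024061518"
}

func main() {
	channelID := "c-9"
	sentAt := time.Now()

	// Illustrative CQL: partition key = (channel_id, hour_bucket),
	// clustering key = sent_at so messages read back in time order.
	const insertCQL = `INSERT INTO chat_logs (channel_id, hour_bucket, sent_at, user_id, body)
	                   VALUES (?, ?, ?, ?, ?)`

	fmt.Println(insertCQL)
	fmt.Println("partition:", channelID, hourBucket(sentAt))
	// A real worker would execute this asynchronously via a Cassandra driver
	// (e.g. gocql), decoupled from the live delivery path.
}
```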
API Design
- POST /api/v1/streams/ingest — Initiate a stream; body contains stream_key, codec, bitrate; returns ingest_endpoint URL
- GET /api/v1/streams/{channel_name}/playlist.m3u8 — Fetch the HLS master playlist for a live stream
- WS /api/v1/chat/{channel_id} — WebSocket connection for real-time chat; supports PRIVMSG, JOIN, PART commands (IRC-inspired protocol)
- POST /api/v1/clips — Create a clip from the last 60 seconds; body contains channel_id, title; returns clip_id and URL
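As a usage sketch, a minimal client call to the clip endpoint might look like this; the host, auth header, and exact response shape are assumptions beyond what is listed above.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

func main() {
	body, _ := json.Marshal(map[string]string{
		"channel_id": "c-9",
		"title":      "Incredible clutch",
	})

	req, err := http.NewRequest(http.MethodPost,
		"https://api.example.com/api/v1/clips", bytes.NewReader(body))
	if err != nil {
		log.Fatal(err)
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer <token>") // auth scheme assumed

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	var out struct {
		ClipID string `json:"clip_id"`
		URL    string `json:"url"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		log.Fatal(err)
	}
	fmt.Println(out.ClipID, out.URL)
}
```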
Scaling & Bottlenecks
The transcoding fleet is the primary compute bottleneck. Each concurrent stream requires a dedicated GPU transcode slot. At 100K concurrent streams × 4 quality variants = 400K concurrent encodes. Twitch uses a mix of on-premise GPU servers and cloud GPU instances (EC2 G4/G5) with auto-scaling. During peak hours (major esports events), reserved capacity handles baseline load while on-demand instances handle burst. A priority queue ensures partner streamers always get transcode capacity before non-partner streamers.
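A sketch of the partner-first scheduling idea using Go's container/heap; the tiers and fields are illustrative.

```go
package main

import (
	"container/heap"
	"fmt"
)

type transcodeJob struct {
	streamID string
	priority int // lower value = served first; e.g. 0 = partner, 1 = affiliate, 2 = other
}

type jobQueue []transcodeJob

func (q jobQueue) Len() int           { return len(q) }
func (q jobQueue) Less(i, j int) bool { return q[i].priority < q[j].priority }
func (q jobQueue) Swap(i, j int)      { q[i], q[j] = q[j], q[i] }
func (q *jobQueue) Push(x any)        { *q = append(*q, x.(transcodeJob)) }
func (q *jobQueue) Pop() any {
	old := *q
	n := len(old)
	job := old[n-1]
	*q = old[:n-1]
	return job
}

func main() {
	q := &jobQueue{}
	heap.Init(q)
	heap.Push(q, transcodeJob{"viewer-stream", 2})
	heap.Push(q, transcodeJob{"partner-stream", 0})
	heap.Push(q, transcodeJob{"affiliate-stream", 1})

	// When a GPU slot frees up, the partner stream is assigned first.
	for q.Len() > 0 {
		fmt.Println(heap.Pop(q).(transcodeJob).streamID)
	}
}
```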
Chat fan-out for mega-channels is the second major bottleneck. The hierarchical fan-out tree solves the O(N) problem, but maintaining WebSocket connections for millions of concurrent users requires a large fleet of Chat Edge servers. Each server handles ~50K concurrent WebSocket connections using epoll-based event loops (Go + custom networking). Connection draining during deployments uses a gradual reconnect protocol where clients are told to reconnect to a new server over a 5-minute window to avoid thundering herd.
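A sketch of the gradual drain: each connected client is told to reconnect at a random point within the 5-minute window, so a deploy never moves everyone at once; the RECONNECT message and connection IDs are illustrative.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

const drainWindow = 5 * time.Minute

// scheduleDrain asks every connected client to reconnect at a random offset
// within the drain window, spreading the reconnect load on the new servers.
func scheduleDrain(connIDs []string, send func(connID, msg string)) {
	for _, id := range connIDs {
		delay := time.Duration(rand.Int63n(int64(drainWindow)))
		go func(id string, d time.Duration) {
			time.Sleep(d)
			send(id, "RECONNECT") // client closes and dials a fresh edge server
		}(id, delay)
	}
}

func main() {
	conns := []string{"conn-1", "conn-2", "conn-3"}
	scheduleDrain(conns, func(connID, msg string) {
		fmt.Printf("%s -> %s\n", msg, connID)
	})
	time.Sleep(drainWindow) // keep the demo alive until all reconnects fire
}
```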
Key Trade-offs
- HLS over WebRTC for viewer delivery: HLS with LL-HLS extensions provides 2-3 second latency vs WebRTC's sub-second, but HLS scales to millions of viewers via CDN caching while WebRTC requires per-viewer connections
- Ephemeral chat vs persistent storage: Not persisting chat messages by default saves enormous storage and write throughput; the trade-off is loss of chat history unless explicitly opted in
- GPU transcoding vs CPU: GPU encoding (NVENC) is 10x faster than CPU (x264) but produces slightly lower quality per bitrate — acceptable for live content where latency is prioritized over quality
- 2-second HLS segments vs smaller: Smaller segments reduce latency but increase CDN request rate and manifest size; 2 seconds is the sweet spot for standard latency, 1 second for low-latency mode with CMAF chunked transfer