System Design: Adaptive Bitrate Streaming (ABR)
A system design for Adaptive Bitrate Streaming covering DASH/HLS protocols, bitrate ladder generation, client-side bandwidth estimation, and buffer management for seamless video playback.
Requirements
Functional Requirements:
- Encode source video into multiple renditions (bitrate-resolution pairs) forming a bitrate ladder
- Generate DASH (MPD) and HLS (m3u8) manifests listing all available renditions
- Client player dynamically switches between renditions based on network conditions
- Support both VOD (complete manifests) and live (rolling window manifests)
- Server-side ad insertion (SSAI) compatible with manifest manipulation
- DRM encryption (Widevine, FairPlay, PlayReady) integrated into the packaging step
Non-Functional Requirements:
- Zero rebuffering for 99% of playback sessions under normal network conditions
- Quality switch latency under 2 seconds (time from bandwidth change to rendition switch)
- Support 100 million concurrent streams with per-stream manifest generation
- Manifest response time under 20ms for VOD, under 50ms for live
- Compatible with all major platforms: web (MSE), iOS (AVPlayer), Android (ExoPlayer)
Scale Estimation
100M concurrent streams. VOD clients fetch the manifest once; live clients re-fetch every 2-4 seconds. If every stream were live, that would be 100M × 0.5 = 50M manifest requests/sec; in practice most traffic is VOD, and with the ~500K concurrent live streams assumed throughout this design, live manifest load is roughly 250K requests/sec — still the critical hot path, since those responses cannot be cached for long. Segment requests: 100M streams × 1 segment every 4 seconds = 25M segment requests/sec at an average of 1MB each = 200 Tbps CDN bandwidth. Encoding: each VOD title (assuming 10K new titles/day) needs 20 renditions × average 2 hours = 400K encoding-hours/day. Storage: 20 renditions × 2GB average per rendition × 10K titles = 400TB/day.
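These figures are easy to sanity-check; a quick back-of-envelope script reproducing the arithmetic above (the 500K live-stream count is the assumption carried through the rest of the design):

```python
# Back-of-envelope sanity check for the scale estimates above.
CONCURRENT_STREAMS = 100_000_000
LIVE_STREAMS = 500_000            # assumption used throughout this design
SEGMENT_DURATION_S = 4
AVG_SEGMENT_MB = 1
NEW_TITLES_PER_DAY = 10_000
RENDITIONS_PER_TITLE = 20
AVG_TITLE_HOURS = 2
AVG_RENDITION_GB = 2

live_manifest_rps = LIVE_STREAMS / 2                       # one request per 2s
segment_rps = CONCURRENT_STREAMS / SEGMENT_DURATION_S
cdn_tbps = segment_rps * AVG_SEGMENT_MB * 8 / 1_000_000    # MB/s -> Tbps
encoding_hours_day = NEW_TITLES_PER_DAY * RENDITIONS_PER_TITLE * AVG_TITLE_HOURS
storage_tb_day = NEW_TITLES_PER_DAY * RENDITIONS_PER_TITLE * AVG_RENDITION_GB / 1000

print(f"live manifest load: {live_manifest_rps:,.0f} req/s")    # 250,000
print(f"segment load:       {segment_rps:,.0f} req/s")          # 25,000,000
print(f"CDN egress:         {cdn_tbps:.0f} Tbps")               # 200
print(f"encoding:           {encoding_hours_day:,} hours/day")  # 400,000
print(f"storage:            {storage_tb_day:,.0f} TB/day")      # 400
```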
High-Level Architecture
The ABR system spans three domains: Encoding, Packaging, and Playback. The Encoding domain produces multiple renditions from a source master. A Bitrate Ladder Generator analyzes the source content complexity (using VMAF or per-scene CRF analysis) and determines the optimal set of bitrate-resolution pairs. For simple content (animation, talking heads), fewer high-bitrate renditions are needed; for complex content (sports, action), more granular bitrate steps are required. The encoder (x264/x265/SVT-AV1) processes the source in parallel segments, producing output files for each rendition.
The Packaging domain takes encoded renditions and produces streaming-format-specific output. A DASH Packager generates the MPD manifest and segments the encoded files into 4-second CMAF fragments. An HLS Packager generates the m3u8 master and variant playlists and produces TS or fMP4 segments. For DRM-protected content, the packager encrypts each segment using Common Encryption (CENC) with keys obtained from a Key Management Service. Manifests and encrypted segments are stored in S3 and served via CDN.
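To make the packager output concrete, here is a minimal sketch that emits an HLS master playlist from a rendition list. The EXT-X-STREAM-INF attributes are standard HLS; the rendition values themselves are illustrative, not a recommended ladder:

```python
# Minimal sketch: emit an HLS master playlist from a bitrate ladder.
# Rendition values are illustrative, not a recommended ladder.
RENDITIONS = [
    {"bandwidth": 800_000,   "resolution": "640x360",   "path": "360p/index.m3u8"},
    {"bandwidth": 2_400_000, "resolution": "1280x720",  "path": "720p/index.m3u8"},
    {"bandwidth": 6_000_000, "resolution": "1920x1080", "path": "1080p/index.m3u8"},
]

def master_playlist(renditions: list[dict]) -> str:
    lines = ["#EXTM3U", "#EXT-X-VERSION:6"]
    for rend in renditions:
        lines.append(
            f'#EXT-X-STREAM-INF:BANDWIDTH={rend["bandwidth"]},'
            f'RESOLUTION={rend["resolution"]}'
        )
        lines.append(rend["path"])  # URI of the variant playlist
    return "\n".join(lines) + "\n"

print(master_playlist(RENDITIONS))
```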
The Playback domain is the client-side ABR algorithm. The player (hls.js, ExoPlayer, AVPlayer) implements a bandwidth estimator and buffer manager. The bandwidth estimator tracks segment download throughput using an exponentially weighted moving average (EWMA). The ABR algorithm compares estimated bandwidth against each rendition's bitrate and selects the highest rendition that can be downloaded faster than real-time playback, maintaining a buffer occupancy target of 30 seconds.
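A minimal sketch of that EWMA estimator, fed one sample per completed segment download (alpha=0.3 matches the value cited in the ABR algorithm section below):

```python
# Sketch: EWMA throughput estimator fed by segment downloads.
class BandwidthEstimator:
    """Tracks download throughput via an exponentially weighted moving average."""

    def __init__(self, alpha: float = 0.3):
        self.alpha = alpha          # smoothing factor (higher = more reactive)
        self.estimate_bps = None    # smoothed estimate in bits/sec

    def on_segment_downloaded(self, size_bytes: int, duration_s: float) -> float:
        sample_bps = size_bytes * 8 / duration_s
        if self.estimate_bps is None:
            self.estimate_bps = sample_bps    # seed with the first sample
        else:
            self.estimate_bps = (
                self.alpha * sample_bps + (1 - self.alpha) * self.estimate_bps
            )
        return self.estimate_bps
```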
Core Components
Bitrate Ladder Generator
The traditional approach uses a fixed bitrate ladder (e.g., Apple's HLS authoring spec, which runs from 416×234@145kbps up to 1080p@7800kbps). The modern approach, pioneered by Netflix, uses per-title optimization: for each title, the system runs multiple encodes at different bitrate-resolution combinations, measures VMAF quality for each, and selects the Pareto-optimal points on the quality-vs-bitrate curve. This produces a custom ladder — an animated show might have 5 rungs (200kbps to 3Mbps) while a nature documentary needs 8 rungs (300kbps to 12Mbps). The analysis runs as a batch job on the transcoding cluster, multiplying encoding time several-fold (the trade-offs section assumes up to 10x for exhaustive searches) but saving 20-30% bandwidth at equivalent quality.
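A sketch of the Pareto-selection step under simplifying assumptions: given trial encodes as (bitrate, VMAF) pairs, keep only rungs that deliver a meaningful quality gain over the next-cheaper kept rung. The trial data and the min_gain threshold are made up for illustration:

```python
# Sketch: select Pareto-optimal ladder rungs from per-title trial encodes.
# Each trial is (bitrate_kbps, vmaf_score); the numbers are illustrative.
trials = [
    (300, 62.0), (600, 74.5), (900, 73.8), (1200, 82.1),
    (2000, 88.0), (3500, 88.2), (6000, 93.5),
]

def pareto_ladder(trials, min_gain=1.0):
    """Keep rungs whose VMAF improves by at least min_gain over the last keep."""
    ladder, best = [], float("-inf")
    for bitrate, vmaf in sorted(trials):      # ascending bitrate
        if vmaf >= best + min_gain:           # meaningful quality step
            ladder.append((bitrate, vmaf))
            best = vmaf
    return ladder

print(pareto_ladder(trials))
# Drops 900k (dominated by the cheaper 600k encode) and 3500k (only +0.2 VMAF).
```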
Manifest Server
The Manifest Server generates and serves DASH/HLS manifests. For VOD, the manifest is static and cached at the CDN edge with a long TTL. For live, the manifest is dynamic — it is regenerated every segment duration (2-4 seconds) to include the latest segments. The Manifest Server for live streams maintains an in-memory sliding window of the last 30 seconds of segment metadata per stream. At 500K concurrent live streams with manifest requests every 2 seconds, the server handles 250K requests/sec. It is horizontally scaled and stateless — stream segment metadata is stored in Redis (sorted set per stream_id, scored by sequence number).
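The read path might look like the following sketch, assuming redis-py and the sorted-set layout described above; the key naming scheme is illustrative:

```python
# Sketch: build a live HLS media playlist from the Redis sliding window.
# Assumes redis-py; key naming (live:{stream_id}:segments) is illustrative.
import json
import redis

r = redis.Redis()

def live_media_playlist(stream_id: str, target_duration: int = 4) -> str:
    # Sorted set: member = segment metadata JSON, score = sequence number.
    entries = r.zrange(f"live:{stream_id}:segments", 0, -1, withscores=True)
    if not entries:
        raise LookupError(f"no live window for stream {stream_id}")
    segments = [(int(seq), json.loads(meta)) for meta, seq in entries]
    lines = [
        "#EXTM3U",
        f"#EXT-X-TARGETDURATION:{target_duration}",
        f"#EXT-X-MEDIA-SEQUENCE:{segments[0][0]}",
    ]
    for _, meta in segments:
        lines.append(f'#EXTINF:{meta["duration"]:.3f},')
        lines.append(meta["url"])
    return "\n".join(lines) + "\n"
```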
Client ABR Algorithm
The ABR algorithm on the player is the most critical component for user experience. Modern algorithms (e.g., MPC — Model Predictive Control, or Pensieve — RL-based) consider both throughput estimation and buffer occupancy. The algorithm maintains a throughput history window (last 5 segments' download speeds), applies EWMA smoothing (alpha=0.3 for stability), and selects the rendition whose bitrate is at most 80% of the estimated throughput (leaving a 20% safety margin). Buffer-based logic kicks in when buffer drops below 10 seconds — the algorithm aggressively drops quality to prevent rebuffering. Quality increases are delayed by a hysteresis of 2 segments to avoid oscillation.
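A minimal sketch combining the three rules above: the 80% throughput budget, the 10-second buffer panic threshold, and the 2-segment upswitch hysteresis:

```python
# Sketch: hybrid ABR decision combining throughput and buffer signals.
SAFETY_MARGIN = 0.8       # use at most 80% of estimated throughput
PANIC_BUFFER_S = 10.0     # below this, drop quality aggressively
UPSWITCH_HYSTERESIS = 2   # segments a higher rung must stay affordable

def choose_rendition(
    bitrates_bps: list[int],     # bitrate ladder, ascending
    est_throughput_bps: float,   # smoothed EWMA estimate
    buffer_s: float,             # current buffer occupancy
    current_idx: int,            # currently playing rung
    stable_segments: int,        # segments the candidate upswitch stayed affordable
) -> int:
    if buffer_s < PANIC_BUFFER_S:
        return 0                              # lowest rung to refill the buffer
    budget = est_throughput_bps * SAFETY_MARGIN
    # Highest rung whose bitrate fits within the throughput budget.
    candidate = max(
        (i for i, b in enumerate(bitrates_bps) if b <= budget),
        default=0,
    )
    if candidate > current_idx and stable_segments < UPSWITCH_HYSTERESIS:
        return current_idx                    # delay the upswitch (hysteresis)
    return candidate
```

Note that downswitches apply immediately; only upswitches wait out the hysteresis window, which is what damps oscillation.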
Database Design
Rendition metadata is stored in PostgreSQL: Renditions table (rendition_id, video_id, codec, resolution, bitrate_kbps, framerate, segment_duration_ms, total_segments, s3_manifest_path, s3_segments_prefix, drm_key_id). The DRM key store uses a separate encrypted database (PostgreSQL with column-level encryption) mapping key_id → encryption_key, wrapped with a KMS master key. Access to DRM keys requires authentication from a licensed DRM server (Widevine license server).
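A possible DDL for that table, shown as a plain SQL string usable with any PostgreSQL driver; the column types are assumptions inferred from the names:

```python
# Sketch: renditions table DDL mirroring the columns listed above.
# Column types are assumptions; shown as a string for any PostgreSQL driver.
RENDITIONS_DDL = """
CREATE TABLE renditions (
    rendition_id        UUID PRIMARY KEY,
    video_id            UUID NOT NULL,
    codec               TEXT NOT NULL,          -- e.g. 'h264', 'hevc', 'av1'
    resolution          TEXT NOT NULL,          -- e.g. '1920x1080'
    bitrate_kbps        INTEGER NOT NULL,
    framerate           NUMERIC(5,2) NOT NULL,
    segment_duration_ms INTEGER NOT NULL,
    total_segments      INTEGER NOT NULL,
    s3_manifest_path    TEXT NOT NULL,
    s3_segments_prefix  TEXT NOT NULL,
    drm_key_id          UUID                    -- NULL for unencrypted content
);
CREATE INDEX idx_renditions_video ON renditions (video_id);
"""
```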
Live stream segment state is stored in Redis: a sorted set per stream_id stores (segment_sequence_number, segment_metadata_json) with a max window of 30 seconds. When the transcoder produces new segments, they are added with ZADD and segments older than the window are trimmed with ZREMRANGEBYSCORE. Analytics data (buffer health, bitrate selections, rebuffer events) is sent from the client to a telemetry endpoint and streamed to Kafka → ClickHouse for real-time quality-of-experience dashboards.
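The transcoder-side write path for that sliding window might look like this sketch (redis-py; expressing the 30-second window as 8 sequence numbers assumes 4-second segments):

```python
# Sketch: transcoder-side write path for the live sliding window.
import json
import redis

r = redis.Redis()
WINDOW_SEGMENTS = 8   # ~30s window at 4s per segment (assumption)

def publish_segment(stream_id: str, seq: int, duration: float, url: str) -> None:
    key = f"live:{stream_id}:segments"
    meta = json.dumps({"duration": duration, "url": url})
    pipe = r.pipeline()
    pipe.zadd(key, {meta: seq})                                # add new segment
    pipe.zremrangebyscore(key, "-inf", seq - WINDOW_SEGMENTS)  # trim the window
    pipe.execute()
```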
API Design
- GET /v1/manifests/{video_id}/master.mpd — Fetch DASH manifest (MPD) with all available renditions and segment URLs
- GET /v1/manifests/{video_id}/master.m3u8 — Fetch HLS master playlist with variant streams
- GET /v1/segments/{video_id}/{rendition_id}/{segment_number}.m4s — Fetch a CMAF video segment
- POST /v1/telemetry/qoe — Report client quality-of-experience metrics; body contains session_id, buffer_health_ms, selected_bitrate, rebuffer_count, bandwidth_estimate
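A client QoE report to the telemetry endpoint might look like this (field names from the body description above; the host and values are illustrative):

```python
# Sketch: client QoE report to the telemetry endpoint (illustrative host/values).
import requests

payload = {
    "session_id": "a1b2c3d4",
    "buffer_health_ms": 18500,
    "selected_bitrate": 2_400_000,
    "rebuffer_count": 0,
    "bandwidth_estimate": 5_200_000,
}
resp = requests.post(
    "https://api.example.com/v1/telemetry/qoe", json=payload, timeout=2
)
resp.raise_for_status()
```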
Scaling & Bottlenecks
The live manifest generation at 250K requests/sec is the primary hot path. Since each manifest is unique to a stream (containing the latest segment URLs), it cannot be CDN-cached for long. The solution: the Manifest Server generates manifests on-the-fly from Redis-backed segment metadata with sub-10ms latency. Manifests are cached at the CDN edge with a TTL equal to one segment duration (2-4 seconds) — this means the CDN serves slightly stale manifests, but the player handles this gracefully by requesting the next manifest after consuming the last listed segment.
The encoding pipeline bottleneck is per-title optimization. Running multiple trial encodes to find the optimal bitrate ladder can take 10x longer than a single encode. This is acceptable for VOD (the encode runs once, savings accrue over millions of views) but impractical for live. Live streams use a fixed bitrate ladder with optional per-scene complexity adaptation: the encoder adjusts CRF (Constant Rate Factor) on a per-GOP basis based on scene complexity, providing some of the benefits of per-title encoding in real-time.
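One plausible shape for that per-GOP heuristic, offsetting CRF around a base value according to a normalized complexity score; the mapping constants here are assumptions, not production tuning:

```python
# Sketch: per-GOP CRF adaptation from a scene-complexity score in [0, 1].
# Constants are illustrative assumptions, not production tuning.
BASE_CRF = 23   # x264's default CRF
CRF_RANGE = 4   # allow +/-4 around the base

def crf_for_gop(complexity: float) -> int:
    """Lower CRF (more bits) for complex scenes, higher for simple ones."""
    # complexity 0.5 -> base; 1.0 -> base - range; 0.0 -> base + range
    offset = round((0.5 - complexity) * 2 * CRF_RANGE)
    return max(0, min(51, BASE_CRF + offset))   # clamp to x264's CRF range

assert crf_for_gop(0.9) < crf_for_gop(0.2)      # complex scenes get more bits
```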
Key Trade-offs
- Per-title bitrate ladder vs fixed ladder: Per-title saves 20-30% bandwidth but requires 10x more encoding compute — the savings at scale vastly exceed encoding costs for any video with more than a few hundred views
- CMAF (fMP4) vs MPEG-TS segments: CMAF enables a single segment format for both DASH and HLS, halving storage — the trade-off is slightly more complex packaging and older iOS devices requiring TS
- EWMA throughput estimation vs instantaneous: EWMA smooths out network jitter, preventing unnecessary quality oscillation, but reacts slower to sudden bandwidth drops — the 80% safety margin compensates
- Buffer-based vs throughput-based ABR: Pure throughput-based ABR is reactive (switches after bandwidth drops); buffer-based ABR is proactive (starts dropping quality before buffer depletes) — modern algorithms combine both for optimal results