System Design: Adaptive Bitrate Streaming (ABR)
A system design for Adaptive Bitrate Streaming covering DASH/HLS protocols, bitrate ladder generation, client-side bandwidth estimation, and buffer management for seamless video playback.
Requirements
Functional Requirements:
- Encode source video into multiple renditions (bitrate-resolution pairs) forming a bitrate ladder
- Generate DASH (MPD) and HLS (m3u8) manifests listing all available renditions
- Client player dynamically switches between renditions based on network conditions
- Support both VOD (complete manifests) and live (rolling window manifests)
- Server-side ad insertion (SSAI) compatible with manifest manipulation
- DRM encryption (Widevine, FairPlay, PlayReady) integrated into the packaging step
Non-Functional Requirements:
- Zero rebuffering for 99% of playback sessions under normal network conditions
- Quality switch latency under 2 seconds (time from bandwidth change to rendition switch)
- Support 100 million concurrent streams with per-stream manifest generation
- Manifest response time under 20ms for VOD, under 50ms for live
- Compatible with all major platforms: web (MSE), iOS (AVPlayer), Android (ExoPlayer)
Scale Estimation
100M concurrent streams. VOD clients fetch the manifest once; live clients re-fetch every 2-4 seconds. If every stream were live, that would be 100M × 0.5 = 50M manifest requests/sec; in practice most traffic is VOD, and with the ~500K concurrent live streams assumed throughout this design, live manifest load is roughly 250K requests/sec — still the critical hot path, since those responses cannot be cached for long. Segment requests: 100M streams × 1 segment every 4 seconds = 25M segment requests/sec at an average of 1MB each = 200 Tbps CDN bandwidth. Encoding: each VOD title (assuming 10K new titles/day) needs 20 renditions × average 2 hours = 400K encoding-hours/day. Storage: 20 renditions × 2GB average per rendition × 10K titles = 400TB/day.
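These figures are easy to sanity-check; a quick back-of-envelope script reproducing the arithmetic above (the 500K live-stream count is the assumption carried through the rest of the design):

```python
# Back-of-envelope sanity check for the scale estimates above.
CONCURRENT_STREAMS = 100_000_000
LIVE_STREAMS = 500_000            # assumption used throughout this design
SEGMENT_DURATION_S = 4
AVG_SEGMENT_MB = 1
NEW_TITLES_PER_DAY = 10_000
RENDITIONS_PER_TITLE = 20
AVG_TITLE_HOURS = 2
AVG_RENDITION_GB = 2

live_manifest_rps = LIVE_STREAMS / 2                       # one request per 2s
segment_rps = CONCURRENT_STREAMS / SEGMENT_DURATION_S
cdn_tbps = segment_rps * AVG_SEGMENT_MB * 8 / 1_000_000    # MB/s -> Tbps
encoding_hours_day = NEW_TITLES_PER_DAY * RENDITIONS_PER_TITLE * AVG_TITLE_HOURS
storage_tb_day = NEW_TITLES_PER_DAY * RENDITIONS_PER_TITLE * AVG_RENDITION_GB / 1000

print(f"live manifest load: {live_manifest_rps:,.0f} req/s")    # 250,000
print(f"segment load:       {segment_rps:,.0f} req/s")          # 25,000,000
print(f"CDN egress:         {cdn_tbps:.0f} Tbps")               # 200
print(f"encoding:           {encoding_hours_day:,} hours/day")  # 400,000
print(f"storage:            {storage_tb_day:,.0f} TB/day")      # 400
```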
High-Level Architecture
The ABR system spans three domains: Encoding, Packaging, and Playback. The Encoding domain produces multiple renditions from a source master. A Bitrate Ladder Generator analyzes the source content complexity (using VMAF or per-scene CRF analysis) and determines the optimal set of bitrate-resolution pairs. For simple content (animation, talking heads), fewer high-bitrate renditions are needed; for complex content (sports, action), more granular bitrate steps are required. The encoder (x264/x265/SVT-AV1) processes the source in parallel segments, producing output files for each rendition.
The Packaging domain takes encoded renditions and produces streaming-format-specific output. A DASH Packager generates the MPD manifest and segments the encoded files into 4-second CMAF fragments. An HLS Packager generates the m3u8 master and variant playlists and produces TS or fMP4 segments. For DRM-protected content, the packager encrypts each segment using Common Encryption (CENC) with keys obtained from a Key Management Service. Manifests and encrypted segments are stored in S3 and served via CDN.
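To make the packager output concrete, here is a minimal sketch that emits an HLS master playlist from a rendition list. The EXT-X-STREAM-INF attributes are standard HLS; the rendition values themselves are illustrative, not a recommended ladder:

```python
# Minimal sketch: emit an HLS master playlist from a bitrate ladder.
# Rendition values are illustrative, not a recommended ladder.
RENDITIONS = [
    {"bandwidth": 800_000,   "resolution": "640x360",   "path": "360p/index.m3u8"},
    {"bandwidth": 2_400_000, "resolution": "1280x720",  "path": "720p/index.m3u8"},
    {"bandwidth": 6_000_000, "resolution": "1920x1080", "path": "1080p/index.m3u8"},
]

def master_playlist(renditions: list[dict]) -> str:
    lines = ["#EXTM3U", "#EXT-X-VERSION:6"]
    for rend in renditions:
        lines.append(
            f'#EXT-X-STREAM-INF:BANDWIDTH={rend["bandwidth"]},'
            f'RESOLUTION={rend["resolution"]}'
        )
        lines.append(rend["path"])  # URI of the variant playlist
    return "\n".join(lines) + "\n"

print(master_playlist(RENDITIONS))
```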
The Playback domain is the client-side ABR algorithm. The player (hls.js, ExoPlayer, AVPlayer) implements a bandwidth estimator and buffer manager. The bandwidth estimator tracks segment download throughput using an exponentially weighted moving average (EWMA). The ABR algorithm compares estimated bandwidth against each rendition's bitrate and selects the highest rendition that can be downloaded faster than real-time playback, maintaining a buffer occupancy target of 30 seconds.
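A minimal sketch of that EWMA estimator, fed one sample per completed segment download (alpha=0.3 matches the value cited in the ABR algorithm section below):

```python
# Sketch: EWMA throughput estimator fed by segment downloads.
class BandwidthEstimator:
    """Tracks download throughput via an exponentially weighted moving average."""

    def __init__(self, alpha: float = 0.3):
        self.alpha = alpha          # smoothing factor (higher = more reactive)
        self.estimate_bps = None    # smoothed estimate in bits/sec

    def on_segment_downloaded(self, size_bytes: int, duration_s: float) -> float:
        sample_bps = size_bytes * 8 / duration_s
        if self.estimate_bps is None:
            self.estimate_bps = sample_bps    # seed with the first sample
        else:
            self.estimate_bps = (
                self.alpha * sample_bps + (1 - self.alpha) * self.estimate_bps
            )
        return self.estimate_bps
```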
Core Components
Bitrate Ladder Generator
The traditional approach uses a fixed bitrate ladder (e.g., Apple's HLS authoring spec, which runs from 416×234@145kbps up to 1080p@7800kbps). The modern approach, pioneered by Netflix, uses per-title optimization: for each title, the system runs multiple encodes at different bitrate-resolution combinations, measures VMAF quality for each, and selects the Pareto-optimal points on the quality-vs-bitrate curve. This produces a custom ladder — an animated show might have 5 rungs (200kbps to 3Mbps) while a nature documentary needs 8 rungs (300kbps to 12Mbps). The analysis runs as a batch job on the transcoding cluster, multiplying encoding time several-fold (the trade-offs section assumes up to 10x for exhaustive searches) but saving 20-30% bandwidth at equivalent quality.
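A sketch of the Pareto-selection step under simplifying assumptions: given trial encodes as (bitrate, VMAF) pairs, keep only rungs that deliver a meaningful quality gain over the next-cheaper kept rung. The trial data and the min_gain threshold are made up for illustration:

```python
# Sketch: select Pareto-optimal ladder rungs from per-title trial encodes.
# Each trial is (bitrate_kbps, vmaf_score); the numbers are illustrative.
trials = [
    (300, 62.0), (600, 74.5), (900, 73.8), (1200, 82.1),
    (2000, 88.0), (3500, 88.2), (6000, 93.5),
]

def pareto_ladder(trials, min_gain=1.0):
    """Keep rungs whose VMAF improves by at least min_gain over the last keep."""
    ladder, best = [], float("-inf")
    for bitrate, vmaf in sorted(trials):      # ascending bitrate
        if vmaf >= best + min_gain:           # meaningful quality step
            ladder.append((bitrate, vmaf))
            best = vmaf
    return ladder

print(pareto_ladder(trials))
# Drops 900k (dominated by the cheaper 600k encode) and 3500k (only +0.2 VMAF).
```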
Manifest Server
The Manifest Server generates and serves DASH/HLS manifests. For VOD, the manifest is static and cached at the CDN edge with a long TTL. For live, the manifest is dynamic — it is regenerated every segment duration (2-4 seconds) to include the latest segments. The Manifest Server for live streams maintains an in-memory sliding window of the last 30 seconds of segment metadata per stream. At 500K concurrent live streams with manifest requests every 2 seconds, the server handles 250K requests/sec. It is horizontally scaled and stateless — stream segment metadata is stored in Redis (sorted set per stream_id, scored by sequence number).
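The read path might look like the following sketch, assuming redis-py and the sorted-set layout described above; the key naming scheme is illustrative:

```python
# Sketch: build a live HLS media playlist from the Redis sliding window.
# Assumes redis-py; key naming (live:{stream_id}:segments) is illustrative.
import json
import redis

r = redis.Redis()

def live_media_playlist(stream_id: str, target_duration: int = 4) -> str:
    # Sorted set: member = segment metadata JSON, score = sequence number.
    entries = r.zrange(f"live:{stream_id}:segments", 0, -1, withscores=True)
    if not entries:
        raise LookupError(f"no live window for stream {stream_id}")
    segments = [(int(seq), json.loads(meta)) for meta, seq in entries]
    lines = [
        "#EXTM3U",
        f"#EXT-X-TARGETDURATION:{target_duration}",
        f"#EXT-X-MEDIA-SEQUENCE:{segments[0][0]}",
    ]
    for _, meta in segments:
        lines.append(f'#EXTINF:{meta["duration"]:.3f},')
        lines.append(meta["url"])
    return "\n".join(lines) + "\n"
```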
Client ABR Algorithm
The ABR algorithm on the player is the most critical component for user experience. Modern algorithms (e.g., MPC — Model Predictive Control, or Pensieve — RL-based) consider both throughput estimation and buffer occupancy. The algorithm maintains a throughput history window (last 5 segments' download speeds), applies EWMA smoothing (alpha=0.3 for stability), and selects the rendition whose bitrate is at most 80% of the estimated throughput (leaving a 20% safety margin). Buffer-based logic kicks in when buffer drops below 10 seconds — the algorithm aggressively drops quality to prevent rebuffering. Quality increases are delayed by a hysteresis of 2 segments to avoid oscillation.
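A minimal sketch combining the three rules above: the 80% throughput budget, the 10-second buffer panic threshold, and the 2-segment upswitch hysteresis:

```python
# Sketch: hybrid ABR decision combining throughput and buffer signals.
SAFETY_MARGIN = 0.8       # use at most 80% of estimated throughput
PANIC_BUFFER_S = 10.0     # below this, drop quality aggressively
UPSWITCH_HYSTERESIS = 2   # segments a higher rung must stay affordable

def choose_rendition(
    bitrates_bps: list[int],     # bitrate ladder, ascending
    est_throughput_bps: float,   # smoothed EWMA estimate
    buffer_s: float,             # current buffer occupancy
    current_idx: int,            # currently playing rung
    stable_segments: int,        # segments the candidate upswitch stayed affordable
) -> int:
    if buffer_s < PANIC_BUFFER_S:
        return 0                              # lowest rung to refill the buffer
    budget = est_throughput_bps * SAFETY_MARGIN
    # Highest rung whose bitrate fits within the throughput budget.
    candidate = max(
        (i for i, b in enumerate(bitrates_bps) if b <= budget),
        default=0,
    )
    if candidate > current_idx and stable_segments < UPSWITCH_HYSTERESIS:
        return current_idx                    # delay the upswitch (hysteresis)
    return candidate
```

Note that downswitches apply immediately; only upswitches wait out the hysteresis window, which is what damps oscillation.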
Database Design
Rendition metadata is stored in PostgreSQL: Renditions table (rendition_id, video_id, codec, resolution, bitrate_kbps, framerate, segment_duration_ms, total_segments, s3_manifest_path, s3_segments_prefix, drm_key_id). The DRM key store uses a separate encrypted database (PostgreSQL with column-level encryption) mapping key_id → encryption_key, wrapped with a KMS master key. Access to DRM keys requires authentication from a licensed DRM server (Widevine license server).
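A possible DDL for that table, shown as a plain SQL string usable with any PostgreSQL driver; the column types are assumptions inferred from the names:

```python
# Sketch: renditions table DDL mirroring the columns listed above.
# Column types are assumptions; shown as a string for any PostgreSQL driver.
RENDITIONS_DDL = """
CREATE TABLE renditions (
    rendition_id        UUID PRIMARY KEY,
    video_id            UUID NOT NULL,
    codec               TEXT NOT NULL,          -- e.g. 'h264', 'hevc', 'av1'
    resolution          TEXT NOT NULL,          -- e.g. '1920x1080'
    bitrate_kbps        INTEGER NOT NULL,
    framerate           NUMERIC(5,2) NOT NULL,
    segment_duration_ms INTEGER NOT NULL,
    total_segments      INTEGER NOT NULL,
    s3_manifest_path    TEXT NOT NULL,
    s3_segments_prefix  TEXT NOT NULL,
    drm_key_id          UUID                    -- NULL for unencrypted content
);
CREATE INDEX idx_renditions_video ON renditions (video_id);
"""
```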
Live stream segment state is stored in Redis: a sorted set per stream_id stores (segment_sequence_number, segment_metadata_json) with a max window of 30 seconds. When the transcoder produces new segments, they are added with ZADD and segments older than the window are trimmed with ZREMRANGEBYSCORE. Analytics data (buffer health, bitrate selections, rebuffer events) is sent from the client to a telemetry endpoint and streamed to Kafka → ClickHouse for real-time quality-of-experience dashboards.
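The transcoder-side write path for that sliding window might look like this sketch (redis-py; expressing the 30-second window as 8 sequence numbers assumes 4-second segments):

```python
# Sketch: transcoder-side write path for the live sliding window.
import json
import redis

r = redis.Redis()
WINDOW_SEGMENTS = 8   # ~30s window at 4s per segment (assumption)

def publish_segment(stream_id: str, seq: int, duration: float, url: str) -> None:
    key = f"live:{stream_id}:segments"
    meta = json.dumps({"duration": duration, "url": url})
    pipe = r.pipeline()
    pipe.zadd(key, {meta: seq})                                # add new segment
    pipe.zremrangebyscore(key, "-inf", seq - WINDOW_SEGMENTS)  # trim the window
    pipe.execute()
```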
API Design
- GET /v1/manifests/{video_id}/master.mpd — Fetch DASH manifest (MPD) with all available renditions and segment URLs
- GET /v1/manifests/{video_id}/master.m3u8 — Fetch HLS master playlist with variant streams
- GET /v1/segments/{video_id}/{rendition_id}/{segment_number}.m4s — Fetch a CMAF video segment
- POST /v1/telemetry/qoe — Report client quality-of-experience metrics; body contains session_id, buffer_health_ms, selected_bitrate, rebuffer_count, bandwidth_estimate
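A client QoE report to the telemetry endpoint might look like this (field names from the body description above; the host and values are illustrative):

```python
# Sketch: client QoE report to the telemetry endpoint (illustrative host/values).
import requests

payload = {
    "session_id": "a1b2c3d4",
    "buffer_health_ms": 18500,
    "selected_bitrate": 2_400_000,
    "rebuffer_count": 0,
    "bandwidth_estimate": 5_200_000,
}
resp = requests.post(
    "https://api.example.com/v1/telemetry/qoe", json=payload, timeout=2
)
resp.raise_for_status()
```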
Scaling & Bottlenecks
The live manifest generation at 250K requests/sec is the primary hot path. Since each manifest is unique to a stream (containing the latest segment URLs), it cannot be CDN-cached for long. The solution: the Manifest Server generates manifests on-the-fly from Redis-backed segment metadata with sub-10ms latency. Manifests are cached at the CDN edge with a TTL equal to one segment duration (2-4 seconds) — this means the CDN serves slightly stale manifests, but the player handles this gracefully by requesting the next manifest after consuming the last listed segment.
The encoding pipeline bottleneck is per-title optimization. Running multiple trial encodes to find the optimal bitrate ladder can take 10x longer than a single encode. This is acceptable for VOD (the encode runs once, savings accrue over millions of views) but impractical for live. Live streams use a fixed bitrate ladder with optional per-scene complexity adaptation: the encoder adjusts CRF (Constant Rate Factor) on a per-GOP basis based on scene complexity, providing some of the benefits of per-title encoding in real-time.
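One plausible shape for that per-GOP heuristic, offsetting CRF around a base value according to a normalized complexity score; the mapping constants here are assumptions, not production tuning:

```python
# Sketch: per-GOP CRF adaptation from a scene-complexity score in [0, 1].
# Constants are illustrative assumptions, not production tuning.
BASE_CRF = 23   # x264's default CRF
CRF_RANGE = 4   # allow +/-4 around the base

def crf_for_gop(complexity: float) -> int:
    """Lower CRF (more bits) for complex scenes, higher for simple ones."""
    # complexity 0.5 -> base; 1.0 -> base - range; 0.0 -> base + range
    offset = round((0.5 - complexity) * 2 * CRF_RANGE)
    return max(0, min(51, BASE_CRF + offset))   # clamp to x264's CRF range

assert crf_for_gop(0.9) < crf_for_gop(0.2)      # complex scenes get more bits
```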
Key Trade-offs
- Per-title bitrate ladder vs fixed ladder: Per-title saves 20-30% bandwidth but requires 10x more encoding compute — the savings at scale vastly exceed encoding costs for any video with more than a few hundred views
- CMAF (fMP4) vs MPEG-TS segments: CMAF enables a single segment format for both DASH and HLS, halving storage — the trade-off is slightly more complex packaging and older iOS devices requiring TS
- EWMA throughput estimation vs instantaneous: EWMA smooths out network jitter, preventing unnecessary quality oscillation, but reacts slower to sudden bandwidth drops — the 80% safety margin compensates
- Buffer-based vs throughput-based ABR: Pure throughput-based ABR is reactive (switches after bandwidth drops); buffer-based ABR is proactive (starts dropping quality before buffer depletes) — modern algorithms combine both for optimal results