
System Design: Netflix

In-depth system design of Netflix covering microservices architecture, content delivery via Open Connect, personalized recommendations, and adaptive streaming for 250 million subscribers.

20 min read · Updated Jan 15, 2025

Requirements

Functional Requirements:

  • Users browse a catalog of movies and TV shows with personalized rows
  • Video playback with adaptive bitrate streaming (DASH with Widevine DRM)
  • Multiple profiles per account with independent watch history
  • Download for offline viewing on mobile devices
  • Content search by title, genre, actors, and directors
  • Continue watching feature with cross-device playback position sync

Non-Functional Requirements:

  • 250 million paid subscribers, 150M DAU across 190 countries
  • Video start time under 2 seconds; rebuffer rate below 0.5%
  • 99.99% availability — outages directly impact subscriber retention
  • Strong consistency for billing and entitlement; eventual consistency for recommendations
  • Serve 400+ Tbps of peak traffic from a library of 17,000+ titles

Scale Estimation

150M DAU streaming an average of 2 hours/day = 300M hours of video daily, or roughly 12.5M concurrent streams on average. At an average effective bitrate of 7 Mbps, that is ~87 Tbps sustained, and with viewing concentrated in the evening hours, peak traffic exceeds 400 Tbps. The catalog has ~17,000 titles, each encoded into 2,000+ files (different bitrates × resolutions × audio tracks × subtitle tracks × codecs), for a total encoded library of ~30 PB. API traffic: each user session generates ~100 API calls (browsing, search, playback control) = 15 billion API calls/day = ~175K requests/sec on average, 500K/sec at peak.
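
These figures can be sanity-checked with a short back-of-envelope script. The peak-to-average multiplier below is an assumption chosen to line up with the stated 400+ Tbps peak, not a published number.

```python
# Back-of-envelope check of the estimates above (all inputs from the article).
DAU = 150_000_000
HOURS_PER_USER = 2
AVG_BITRATE_MBPS = 7
API_CALLS_PER_SESSION = 100
PEAK_TO_AVG = 4.6          # assumed multiplier, picked to match the stated 400+ Tbps peak

hours_per_day = DAU * HOURS_PER_USER                       # 300M hours of video daily
avg_concurrent = hours_per_day / 24                        # ~12.5M streams at any moment, on average
avg_tbps = avg_concurrent * AVG_BITRATE_MBPS / 1_000_000   # Mbps -> Tbps
peak_tbps = avg_tbps * PEAK_TO_AVG

api_calls_per_day = DAU * API_CALLS_PER_SESSION            # 15 billion calls/day
avg_api_qps = api_calls_per_day / 86_400                   # ~174K req/s; the article rounds to 175K

print(f"video hours/day:        {hours_per_day / 1e6:.0f}M")
print(f"avg concurrent streams: {avg_concurrent / 1e6:.1f}M")
print(f"avg bandwidth:          {avg_tbps:.1f} Tbps")
print(f"peak bandwidth:         ~{peak_tbps:.0f} Tbps")
print(f"API load:               {api_calls_per_day / 1e9:.0f}B calls/day, ~{avg_api_qps / 1e3:.0f}K req/s avg")
```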

High-Level Architecture

Netflix uses a three-tier architecture: the Client, the Backend on AWS, and the Content Delivery Network (Open Connect). The Backend consists of hundreds of microservices orchestrated via Zuul (API gateway), Eureka (service discovery), and Ribbon (client-side load balancing), all open-sourced as Netflix OSS. When a user opens the app, the client hits the API Gateway → the Personalization Service generates the home page layout (each row is a different recommendation algorithm) → the Video Metadata Service provides title details → the Playback Service negotiates DRM licenses and returns manifest URLs.
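
A minimal sketch of that fan-out, with hypothetical service classes and sample data standing in for the real Personalization and Video Metadata services:

```python
# Home-page fan-out sketch: gateway asks personalization for rows, then batches one
# metadata lookup for all titles on the page. Class names and data are illustrative only.

class PersonalizationService:
    def get_rows(self, profile_id):
        # Each row comes from a different recommendation algorithm.
        return [
            {"row_title": "Because You Watched Dark", "title_ids": ["t1", "t2"]},
            {"row_title": "Trending Now", "title_ids": ["t2", "t3"]},
        ]

class MetadataService:
    CATALOG = {"t1": "Stranger Things", "t2": "The Crown", "t3": "Squid Game"}

    def get_titles(self, title_ids):
        return {t: {"id": t, "name": self.CATALOG[t]} for t in title_ids}

def build_home_page(profile_id, personalization, metadata_svc):
    rows = personalization.get_rows(profile_id)
    needed = {t for row in rows for t in row["title_ids"]}
    metadata = metadata_svc.get_titles(needed)      # one batched metadata lookup for the whole page
    return [{"row": r["row_title"], "items": [metadata[t] for t in r["title_ids"]]} for r in rows]

print(build_home_page("profile-42", PersonalizationService(), MetadataService()))
```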

Open Connect is Netflix's custom CDN, consisting of Open Connect Appliances (OCAs) — purpose-built servers with 100+ TB of SSD/HDD storage — deployed inside ISP networks and at internet exchange points (IXPs) worldwide. Content is pre-positioned to OCAs overnight during off-peak hours using a popularity prediction model. When a user presses play, the Playback Service selects the optimal OCA based on the client's network path (using BGP data and real-time health monitoring), and returns manifest URLs pointing to that OCA. The OCA serves DASH segments directly to the client over HTTPS.
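
The steering decision can be pictured as ranking candidate OCAs by health and network proximity. The scoring rule below is an invented simplification; the actual logic relies on BGP data and telemetry that Netflix has not published.

```python
# Illustrative OCA selection: filter out unhealthy or saturated appliances, then prefer
# the one closest to the client's network, breaking ties by current load.

from dataclasses import dataclass

@dataclass
class OCA:
    oca_id: str
    network_distance: int   # e.g. AS-path hops to the client's ISP (lower is better)
    load: float             # current utilization, 0.0 - 1.0
    healthy: bool

def pick_oca(candidates):
    usable = [o for o in candidates if o.healthy and o.load < 0.95]
    return min(usable, key=lambda o: (o.network_distance, o.load)) if usable else None

ocas = [
    OCA("embedded-isp-01", network_distance=1, load=0.92, healthy=True),
    OCA("ixp-peering-03", network_distance=3, load=0.40, healthy=True),
    OCA("ixp-peering-07", network_distance=5, load=0.20, healthy=False),
]
print(pick_oca(ocas))   # -> embedded-isp-01 (appliance inside the client's own ISP)
```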

The encoding pipeline runs on AWS. When Netflix acquires or produces a title, the source master (typically 4K HDR ProRes) enters the Encoding Pipeline, which generates a per-title optimized bitrate ladder using shot-based encoding. Each shot is analyzed for complexity, and the encoder allocates bits per shot rather than per-frame, achieving 20% better compression than fixed bitrate encoding.
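
A toy version of the per-shot bit allocation idea, assuming a simple complexity score per shot and a fixed title-level average bitrate; the real analysis is far more involved.

```python
# Split a title's bit budget across shots in proportion to complexity, so complex shots
# get more bits while the title-level average bitrate is preserved.

def allocate_shot_bitrates(shots, avg_bitrate_kbps):
    """shots: list of (duration_sec, complexity). Returns a bitrate in kbps per shot."""
    total_bits = avg_bitrate_kbps * sum(duration for duration, _ in shots)
    total_weight = sum(duration * complexity for duration, complexity in shots)
    bitrates = []
    for duration, complexity in shots:
        shot_bits = total_bits * (duration * complexity) / total_weight  # share of the bit budget
        bitrates.append(shot_bits / duration)                            # back to kbps for this shot
    return bitrates

shots = [(12.0, 0.3), (4.0, 1.8), (8.0, 0.9)]    # (duration_sec, complexity score)
print([round(kbps) for kbps in allocate_shot_bitrates(shots, avg_bitrate_kbps=4000)])
# -> [1600, 9600, 4800]; the duration-weighted average is still 4000 kbps
```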

Core Components

Open Connect CDN

Open Connect is a purpose-built CDN with ~18,000 servers in 1,000+ ISP locations across 60+ countries. Each OCA runs FreeBSD with a custom NGINX-based web server optimized for large file streaming. OCAs are deployed in two configurations: embedded (inside the ISP network, serving that ISP's subscribers directly) and peering (at IXPs, serving multiple ISPs). Content is proactively pushed to OCAs based on predicted demand — the Fill Pipeline runs nightly, analyzing regional viewing patterns and pre-positioning new content. Cache eviction uses an LRU variant weighted by title popularity; a single popular title may consume 2TB across all renditions.
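
The eviction behavior might look roughly like the sketch below, where a title's score combines recency and popularity; the exact formula is an assumption, since the real policy is not public.

```python
# Popularity-weighted LRU sketch: when the appliance is over capacity, evict the title
# whose score (popularity discounted by time since last access) is lowest.

import time

class PopularityWeightedCache:
    def __init__(self, capacity_gb):
        self.capacity_gb = capacity_gb
        self.items = {}   # title_id -> (size_gb, popularity, last_access_ts)

    def touch(self, title_id, size_gb, popularity):
        self.items[title_id] = (size_gb, popularity, time.time())
        self._evict_if_needed()

    def _evict_if_needed(self):
        now = time.time()
        while sum(size for size, _, _ in self.items.values()) > self.capacity_gb:
            def score(title_id):
                _, popularity, last_access = self.items[title_id]
                return popularity / (1.0 + (now - last_access))   # decays with idle time
            del self.items[min(self.items, key=score)]

cache = PopularityWeightedCache(capacity_gb=100_000)          # roughly one 100 TB appliance
cache.touch("new-hit-show", size_gb=2_000, popularity=0.95)   # popular title: ~2 TB of renditions
cache.touch("long-tail-doc", size_gb=300, popularity=0.05)
```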

Per-Title Encoding

Netflix's encoding approach is unique: instead of a fixed bitrate ladder, each title gets a custom ladder optimized for its visual complexity. An animated film (simple visuals) can achieve DVD quality at 500 Kbps; a dark action film needs 4 Mbps for the same perceptual quality. The pipeline uses VMAF (Video Multi-Method Assessment Fusion, developed by Netflix) as the quality metric instead of PSNR/SSIM. A convex hull analysis finds the Pareto-optimal bitrate-resolution pairs for each title. Encoding uses x264/x265 for H.264/HEVC and SVT-AV1 for AV1, running on large EC2 fleets.
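
The convex hull step can be approximated by a simple Pareto filter over trial encodes: keep a (bitrate, resolution) point only if no cheaper encode achieves equal or better VMAF. The sample numbers below are invented.

```python
# Simplified Pareto filter standing in for the full convex-hull analysis.

def pareto_ladder(encodes):
    """encodes: list of (bitrate_kbps, resolution, vmaf). Returns the Pareto-optimal subset."""
    ladder = []
    best_vmaf = -1.0
    for bitrate, resolution, vmaf in sorted(encodes):   # ascending bitrate
        if vmaf > best_vmaf:                            # strictly better than every cheaper encode
            ladder.append((bitrate, resolution, vmaf))
            best_vmaf = vmaf
    return ladder

trial_encodes = [
    (500, "480p", 72.0), (800, "720p", 70.5),           # 720p@800 kbps loses to 480p@500 kbps
    (1200, "720p", 82.0), (2300, "720p", 84.0),
    (2500, "1080p", 88.5), (4500, "1080p", 93.0),
]
print(pareto_ladder(trial_encodes))                     # the 800 kbps point is dropped
```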

Personalization & Recommendation

Netflix's home page is entirely personalized — even the row titles and artwork are customized per user. The system uses a two-phase approach: (1) Row Generation — algorithms like 'Because You Watched X', 'Trending Now', and 'Top Picks for You' each produce a ranked list of titles; (2) Page Assembly — a page-level optimization model decides which rows to show and in what order to maximize engagement. The recommendation model is trained on implicit signals (watch time, completion rate, browsing dwell time) using a combination of collaborative filtering and deep learning. Artwork personalization selects the best thumbnail for each title per user using a contextual bandit model.
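
As an illustration of the bandit idea, here is a minimal epsilon-greedy thumbnail selector with no user context; the production system is a contextual bandit with much richer features and reward modeling.

```python
# Epsilon-greedy artwork selection: mostly show the thumbnail with the best observed
# play-through rate, occasionally explore the alternatives.

import random

class ThumbnailBandit:
    def __init__(self, artwork_ids, epsilon=0.1):
        self.epsilon = epsilon
        self.impressions = {a: 0 for a in artwork_ids}
        self.plays = {a: 0 for a in artwork_ids}

    def choose(self):
        if random.random() < self.epsilon:              # explore occasionally
            return random.choice(list(self.impressions))
        return max(self.impressions,                    # otherwise exploit the best rate so far
                   key=lambda a: self.plays[a] / (self.impressions[a] or 1))

    def record(self, artwork_id, played):
        self.impressions[artwork_id] += 1
        self.plays[artwork_id] += int(played)

bandit = ThumbnailBandit(["hero-shot", "character-closeup", "ensemble"])
true_rates = {"hero-shot": 0.05, "character-closeup": 0.12, "ensemble": 0.08}  # simulated
for _ in range(1000):
    art = bandit.choose()
    bandit.record(art, played=random.random() < true_rates[art])
print(max(bandit.impressions, key=bandit.impressions.get))   # usually "character-closeup"
```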

Database Design

Netflix uses a polyglot persistence approach. User account and billing data is stored in MySQL (RDS) with read replicas. User viewing history and activity data goes into Cassandra (chosen for its write-heavy workload handling and linear scalability). The video catalog metadata lives in a custom document store called Hollow — a read-only, in-memory dataset that is serialized and distributed to all application servers as a single blob. This avoids per-request database lookups for catalog data entirely. Personalization features are stored in a combination of Cassandra (user profiles) and EVCache (Netflix's memcached-based distributed cache) for real-time serving.
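
The Hollow access pattern, reduced to its essence, is: build the catalog offline, ship one serialized blob to every server, and answer lookups purely from memory. The sketch below mimics only that pattern; Hollow itself is a Java library with delta updates, indexing, and memory-efficient encodings.

```python
# Hollow-style pattern sketch: deserialize the catalog once at startup, then serve
# every lookup from local memory with no per-request database call.

import pickle

def build_catalog_blob():
    # In production this step runs offline and the blob is published to object storage.
    catalog = {f"title-{i}": {"name": f"Title {i}", "year": 2020 + i % 5} for i in range(17_000)}
    return pickle.dumps(catalog)

class InMemoryCatalog:
    def __init__(self, blob: bytes):
        self._titles = pickle.loads(blob)      # one deserialization at startup / refresh

    def get(self, title_id):
        return self._titles.get(title_id)      # no database round-trip per request

blob = build_catalog_blob()                    # stand-in for downloading the published blob
catalog = InMemoryCatalog(blob)
print(catalog.get("title-42"))
```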

DRM licenses and entitlement data use a separate MySQL cluster with strict ACID guarantees. Playback session state (current position, selected audio/subtitle tracks) is stored in Cassandra with quorum reads for cross-device consistency. Content metadata for the encoding pipeline uses a custom asset management system backed by PostgreSQL.

API Design

  • GET /api/v1/pathways?profile_id={id}&row_count=40 — Fetch personalized home page with recommendation rows
  • POST /api/v1/playback/initiate — Start a playback session; body contains title_id, profile_id, device_type; returns manifest URLs and DRM license (see the sketch after this list)
  • GET /api/v1/search?q={query}&offset=0&limit=20 — Search catalog by title/actor/genre
  • PUT /api/v1/playback/{session_id}/heartbeat — Update playback position for continue-watching; body contains timestamp_ms, bitrate, buffer_health
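
A hypothetical request/response shape for the playback-initiate call; field names and URLs are placeholders, not Netflix's actual contract.

```python
# Playback-initiate sketch: accept title/profile/device, return a session id, manifest
# URLs pointing at the selected OCA, and a DRM license endpoint. All values are fabricated.

def initiate_playback(request: dict) -> dict:
    session_id = f"sess-{request['profile_id']}-{request['title_id']}"
    return {
        "session_id": session_id,
        "manifest_urls": [
            f"https://oca-123.cdn.example.com/{request['title_id']}/manifest.mpd",  # placeholder OCA URL
        ],
        "drm_license_url": "https://drm.example.com/license",                       # placeholder
        "heartbeat_interval_sec": 30,
    }

print(initiate_playback({"title_id": "81234567", "profile_id": "p1", "device_type": "tv"}))
```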

Scaling & Bottlenecks

The primary bottleneck is CDN capacity during peak hours (7-10 PM local time in each timezone). Netflix handles this via proactive content positioning — the Fill Pipeline analyzes historical viewing patterns and pre-positions content to OCAs before peak hours. During unexpected demand spikes (e.g., a surprise hit show premiere), OCAs can pull content from peer OCAs rather than going back to origin (S3), forming a peer-to-peer mesh within the CDN tier. Netflix also implements graceful degradation: if CDN capacity is strained, the client may be served a lower bitrate rendition rather than experience buffering.
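
The bitrate-cap form of graceful degradation could be expressed as trimming the top rungs of the ladder when regional utilization crosses a threshold; the threshold and cap below are assumptions for illustration.

```python
# When the serving region is under load, offer only the lower rungs of the bitrate
# ladder instead of risking rebuffers for everyone.

FULL_LADDER_KBPS = [235, 750, 1750, 3000, 4500, 7000, 16000]

def available_ladder(region_utilization: float):
    if region_utilization > 0.90:
        return [b for b in FULL_LADDER_KBPS if b <= 4500]   # shed the top rungs under strain
    return FULL_LADDER_KBPS

print(available_ladder(0.95))   # -> [235, 750, 1750, 3000, 4500]
```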

The microservices backend on AWS uses circuit breakers (Hystrix) to prevent cascade failures. Each microservice is designed for graceful degradation: if the Recommendation Service is down, the home page falls back to a cached, non-personalized layout. Zuul handles rate limiting and request shedding at the API gateway layer. Auto-scaling groups for each service respond to CPU/latency metrics with a 60-second scaling reaction time.
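
The circuit-breaker-plus-fallback pattern in miniature; Hystrix itself is a Java library, and the thresholds here are arbitrary.

```python
# After enough consecutive failures the breaker opens and the fallback is served
# immediately, giving the failing service time to recover before traffic returns.

import time

class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout_sec=30):
        self.failure_threshold = failure_threshold
        self.reset_timeout_sec = reset_timeout_sec
        self.failures = 0
        self.opened_at = None

    def call(self, fn, fallback):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_timeout_sec:
                return fallback()              # open: short-circuit straight to the fallback
            self.opened_at = None              # half-open: let one request through
        try:
            result = fn()
            self.failures = 0
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.time()
            return fallback()

def call_recommendation_service():
    raise TimeoutError("recommendation service is down")    # simulate an outage

def cached_default_page():
    return {"rows": ["Trending Now (cached, non-personalized layout)"]}

breaker = CircuitBreaker()
print(breaker.call(call_recommendation_service, cached_default_page))
```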

Key Trade-offs

  • Custom CDN (Open Connect) over commercial CDN: Building and operating 18,000 servers is enormously expensive, but at Netflix's scale (400+ Tbps peak), it is far cheaper than paying Akamai/CloudFront per-GB and provides superior quality control
  • Per-title encoding over fixed bitrate ladder: 20% bandwidth savings at the cost of significant compute for quality analysis — at 150M DAU, the CDN savings pay for the encoding compute thousands of times over
  • Hollow (in-memory catalog) over per-request DB reads: Eliminates catalog lookup latency entirely but requires all application servers to hold the full catalog in memory (~2GB) and accept a few minutes of staleness on updates
  • Cassandra over DynamoDB for viewing history: Cassandra's tunable consistency and Netflix's operational expertise with it outweigh DynamoDB's managed convenience; the team can optimize compaction and repair strategies for their specific access patterns
