System Design: Content Delivery Network (CDN)

Design a content delivery network that caches and serves static and dynamic content from geographically distributed edge locations, reducing latency and origin load for global applications.

Updated Jan 15, 2025

Tags: system-design, cdn, edge-caching, anycast, load-balancing, cache-invalidation

Requirements

Functional Requirements:

  • Cache and serve static assets (images, JS, CSS, videos) from edge PoPs
  • Support dynamic content acceleration (TCP optimization, SSL termination at edge)
  • Origin pull: on cache miss, fetch from origin and cache at edge
  • Cache invalidation: purge individual URLs or tag-based groups globally
  • Custom cache rules per URL pattern (TTL, cache key, vary headers)
  • DDoS protection and WAF at the edge

Non-Functional Requirements:

  • Sub-50ms global latency for cached content
  • 500+ Points of Presence globally
  • Handle 10 Tbps aggregate throughput
  • Cache hit rate > 85% for typical web workloads
  • Cache invalidation propagation within 30 seconds globally

Scale Estimation

With 500 PoPs globally and 10 Tbps aggregate throughput, each PoP handles 20 Gbps on average; peak PoPs (New York, London, Tokyo) handle 200+ Gbps. A peak PoP needs roughly 200 servers with 1 Gbps NICs, or far fewer with 40 Gbps NICs. Assuming 50 servers per PoP, the total CDN fleet is 500 PoPs × 50 servers = 25,000 servers. Cache storage: an 85% hit rate requires caching the "hot" head of the content distribution. At 10 Tbps and a 1 MB average object size, throughput is 1.25 TB/s, or about 1.25 million object requests/sec. If the top 10 million objects account for 90% of traffic, 10 million × 1 MB = 10 TB of hot cache per PoP. Provisioning 10 TB of SSD per server × 50 servers gives 500 TB per PoP, leaving headroom for long-tail content.
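These estimates can be sanity-checked with a short script (all inputs are the article's assumptions, not measurements):

```python
# Back-of-envelope CDN sizing from the assumptions above.
TBPS = 10                       # aggregate throughput, Tbps
POPS = 500
AVG_OBJECT_MB = 1               # average cached object size
HOT_OBJECTS = 10_000_000        # top objects serving ~90% of traffic
SERVERS_PER_POP = 50

bytes_per_sec = TBPS * 1e12 / 8                          # 10 Tbps -> 1.25e12 B/s
reqs_per_sec = bytes_per_sec / (AVG_OBJECT_MB * 1e6)     # ~1.25 million req/s
avg_gbps_per_pop = TBPS * 1000 / POPS                    # 20 Gbps average
hot_cache_tb_per_pop = HOT_OBJECTS * AVG_OBJECT_MB / 1e6 # 10 TB hot set
fleet = POPS * SERVERS_PER_POP                           # 25,000 servers

print(f"{reqs_per_sec:,.0f} req/s, {avg_gbps_per_pop:.0f} Gbps/PoP, "
      f"{hot_cache_tb_per_pop:.0f} TB hot cache/PoP, {fleet:,} servers")
```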

High-Level Architecture

A CDN consists of three layers: client, edge PoPs, and origin servers. Clients are directed to the nearest PoP via DNS-based or Anycast routing. Edge PoPs are clusters of cache servers. Origin servers are the customer's application servers behind the CDN. When a client requests a URL, DNS resolves to the nearest PoP's IP. The PoP checks its cache. On hit, the response is served directly from PoP SSD. On miss, the PoP establishes a connection to the origin, fetches the content, caches it locally, and returns it to the client.

Traffic routing uses two mechanisms. GeoDNS returns different IP addresses based on the resolver's location, mapping each user to the geographically nearest PoP. Anycast assigns the same IP to multiple PoPs — BGP routing ensures traffic reaches the nearest PoP automatically. Anycast fails over faster (no DNS TTL delay) but is less granular than GeoDNS. Modern CDNs use a hybrid: Anycast for PoP-level routing, weighted DNS for load balancing within a PoP cluster.
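GeoDNS's nearest-PoP decision can be illustrated with a toy resolver-to-PoP map (the PoP list and coordinates are hypothetical, and real systems weigh measured latency and capacity, not just distance):

```python
import math

# Hypothetical PoP coordinates (lat, lon); a real GeoDNS map is far larger.
POPS = {"nyc": (40.7, -74.0), "lon": (51.5, -0.1), "tyo": (35.7, 139.7)}

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))

def geodns_answer(resolver_location):
    """Return the PoP nearest to the resolver's estimated location."""
    return min(POPS, key=lambda p: haversine_km(POPS[p], resolver_location))
```

A resolver in Paris would be answered with the London PoP's address; one in San Francisco with New York's (given this three-PoP map).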

Edge servers run a caching proxy (Nginx, Varnish, or custom) that implements HTTP caching semantics. Cache keys default to URL + Vary headers (e.g., Accept-Encoding for compressed variants). Custom cache key rules strip session-specific query parameters (e.g., utm_source, ref) to improve hit rates. Request coalescing (collapsing N simultaneous cache-miss requests for the same URL into one origin fetch) prevents cache stampedes on popular content. The fetched response is stored to local SSD and served to all waiting clients at once.
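Cache-key normalization and per-PoP request coalescing can be sketched as follows (a minimal in-memory model; the stripped parameter list follows the examples above, and all class and function names are illustrative):

```python
import threading
from urllib.parse import urlsplit, parse_qsl, urlencode

# Session/tracking parameters to strip from the cache key (assumed list).
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "ref"}

def cache_key(url: str, accept_encoding: str = "") -> str:
    """Default key: normalized URL + relevant Vary header value."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in TRACKING_PARAMS]
    normalized = parts._replace(query=urlencode(sorted(kept))).geturl()
    return f"{normalized}|{accept_encoding}"

class CoalescingCache:
    """N concurrent misses for one key collapse into a single origin fetch."""
    def __init__(self, fetch_origin):
        self.fetch_origin = fetch_origin
        self.store, self.inflight = {}, {}
        self.lock = threading.Lock()

    def get(self, key: str):
        with self.lock:
            if key in self.store:                  # cache hit
                return self.store[key]
            event = self.inflight.get(key)
            if event is None:                      # first miss: become leader
                event = self.inflight[key] = threading.Event()
                leader = True
            else:                                  # a fetch is already running
                leader = False
        if leader:
            value = self.fetch_origin(key)         # one origin round trip
            with self.lock:
                self.store[key] = value
                del self.inflight[key]
            event.set()                            # wake the waiting followers
            return value
        event.wait()
        return self.store[key]
```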

Core Components

Anycast Routing & PoP Selection

Anycast assigns a single IP address (e.g., 104.16.0.1) to servers in all 500 PoPs simultaneously. BGP advertises this IP from every PoP; internet routers direct client packets to the PoP with the lowest AS-path cost (approximately the geographically nearest PoP). If a PoP becomes overloaded or experiences failures, BGP withdraws its advertisement for the anycast IP, and traffic automatically reroutes to the next nearest PoP within 30–60 seconds (BGP convergence time). Anycast enables sub-10ms routing decisions without a central load balancer in the critical path.

Cache Hierarchy (L1/L2)

Large CDNs use a two-tier cache hierarchy within a region. L1 caches are small, fast (all-SSD) caches at the edge PoP. L2 caches (regional shield caches) are larger, slower caches in regional hubs. On an L1 miss, the PoP first checks the regional L2 before going to origin. The L2 cache absorbs origin traffic for the entire region, dramatically reducing origin load. With an L1 hit rate of 70% and an L2 hit rate of 60% on L1 misses, the effective cache hit rate is 70% + 30% × 60% = 88%; origin receives only 12% of total requests. This shield topology also protects origins from DDoS amplification.
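The shield arithmetic generalizes to any pair of hit rates:

```python
def effective_hit_rate(l1_hit: float, l2_hit_on_miss: float) -> float:
    """Share of requests answered from cache: L1 hits plus the fraction of
    L1 misses that the regional L2 shield absorbs."""
    return l1_hit + (1 - l1_hit) * l2_hit_on_miss

# Numbers from the text: 70% L1, 60% L2-on-miss -> 88% overall;
# origin sees the remaining 12%.
print(effective_hit_rate(0.70, 0.60))
```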

Cache Invalidation System

Cache invalidation is the hardest CDN problem: how to expire content from 25,000 servers worldwide within 30 seconds without overloading origin or invalidation infrastructure. Invalidation is broadcast via a hierarchical pub-sub system: the customer submits a purge request to the CDN API, which writes the purge entry to a central invalidation queue (Kafka). Regional invalidation agents consume from Kafka and broadcast to all PoPs in their region via an internal message bus. Each edge server subscribes to the bus and removes matching cache entries. Wildcard purges (purge all URLs matching /assets/) use tag-based invalidation: objects are tagged at cache time, and purges target a tag, broadcasting a single compact message rather than millions of individual URL purges.
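The tag-based purge path on a single edge server could be sketched as follows (the message format, index structure, and class names are assumptions, not any vendor's implementation):

```python
from collections import defaultdict

class EdgeCache:
    """One edge server's view: objects tagged at cache time, purged by
    URL key or by tag via compact messages from the regional bus."""
    def __init__(self):
        self.objects = {}                   # cache_key -> body
        self.by_tag = defaultdict(set)      # tag -> set of cache_keys

    def put(self, key, body, tags=()):
        self.objects[key] = body
        for tag in tags:                    # index for wildcard/tag purges
            self.by_tag[tag].add(key)

    def handle_purge(self, msg):
        """Apply one purge message consumed from the message bus."""
        if msg["type"] == "url":
            self.objects.pop(msg["key"], None)
        elif msg["type"] == "tag":          # one message, many evictions
            for key in self.by_tag.pop(msg["tag"], set()):
                self.objects.pop(key, None)
```

A tag purge is a single small message regardless of how many objects it evicts, which is what keeps global propagation within the 30-second budget.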

Database Design

The CDN control plane stores configuration in a globally distributed database (Spanner or CockroachDB): (domain, cache_rules, ssl_certificates, origin_config, waf_rules). Cache rules are versioned; a new rule version triggers a configuration push to all edge servers. SSL certificates (managed via ACME/Let's Encrypt integration) are stored encrypted and rotated automatically before expiry. Edge servers receive configuration via a push mechanism (gRPC streaming from a regional config service) or periodic pull (etcd watch), applying new configs without restarts. Analytics data (request logs, cache hit rates, bandwidth) is collected at each PoP, aggregated regionally, and stored in a time-series database for customer dashboards.
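A minimal sketch of how an edge server might apply pushed config versions without a restart (the version field and payload shape are illustrative assumptions, not a specific CDN's schema):

```python
class EdgeConfigApplier:
    """Applies versioned config pushes atomically; stale or duplicate
    deliveries (streams can redeliver) are ignored."""
    def __init__(self):
        self.version = 0
        self.cache_rules = {}

    def on_push(self, config: dict) -> bool:
        if config["version"] <= self.version:
            return False                          # stale/duplicate push
        self.cache_rules = config["cache_rules"]  # swap in new rules
        self.version = config["version"]
        return True
```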

API Design
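The article does not enumerate endpoints here; one plausible shape for the customer-facing purge API, based on the features described elsewhere in this design (all paths, fields, and limits are illustrative assumptions):

```python
import uuid

# Assumed endpoint set implied by the features above — not a real vendor API:
#   POST /v1/zones/{zone}/purge        body: {"urls": [...]} or {"tags": [...]}
#   GET  /v1/zones/{zone}/purge/{id}   purge status (propagation is async)
#   PUT  /v1/zones/{zone}/cache-rules  body: {"version": n, "rules": [...]}

MAX_URLS_PER_PURGE = 30  # assumed per-request cap; tag purges avoid it

def submit_purge(zone: str, body: dict) -> dict:
    """Validate a purge request and accept it for async propagation
    (the write to the central invalidation queue is elided)."""
    urls, tags = body.get("urls", []), body.get("tags", [])
    if not urls and not tags:
        raise ValueError("purge needs at least one url or tag")
    if len(urls) > MAX_URLS_PER_PURGE:
        raise ValueError("too many urls; use tag-based purge")
    return {"purge_id": str(uuid.uuid4()), "status": "queued",
            "zone": zone, "urls": len(urls), "tags": len(tags)}
```

The purge endpoint returns immediately with an id; clients poll the status endpoint, since global propagation takes up to the 30-second budget.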

Scaling & Bottlenecks

Cache miss stampede (thundering herd) occurs when a popular object expires simultaneously across thousands of edge servers. All servers simultaneously fetch from origin, overwhelming it. Solutions: (1) staggered TTL expiry — add random jitter (±10%) to TTL to desynchronize expiry; (2) request coalescing (one fetch per PoP rather than per server); (3) stale-while-revalidate — serve the expired (stale) cached response to clients while asynchronously refreshing in the background, preventing user-visible latency spikes. Stale-while-revalidate is the most effective strategy and is now a standard HTTP cache directive (RFC 5861).
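Jittered TTLs and stale-while-revalidate can be sketched together (a simplified model of RFC 5861 semantics, not a production cache; names and structure are illustrative):

```python
import random
import threading
import time

def jittered_ttl(base_ttl: float, jitter: float = 0.10) -> float:
    """Desynchronize expiry across servers: base TTL ± 10% random jitter."""
    return base_ttl * (1 + random.uniform(-jitter, jitter))

class SWRCache:
    """Serve stale entries within a grace window while refreshing
    asynchronously, so clients never block on an expired hot object."""
    def __init__(self, fetch_origin, ttl: float, stale_window: float):
        self.fetch_origin = fetch_origin
        self.ttl, self.stale_window = ttl, stale_window
        self.entries = {}                 # key -> (body, expires_at)

    def get(self, key):
        now = time.monotonic()
        entry = self.entries.get(key)
        if entry:
            body, expires = entry
            if now < expires:             # fresh hit
                return body
            if now < expires + self.stale_window:
                # Stale but within the grace window: serve immediately,
                # refresh in the background.
                threading.Thread(target=self._refresh, args=(key,)).start()
                return body
        return self._refresh(key)         # cold miss: block on origin

    def _refresh(self, key):
        body = self.fetch_origin(key)
        self.entries[key] = (body, time.monotonic() + jittered_ttl(self.ttl))
        return body
```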

SSL/TLS termination at the edge is CPU-intensive. A PoP handling 100,000 HTTPS requests/sec with TLS 1.3 (ECDHE, ChaCha20-Poly1305 cipher) requires ~100 CPU cores for TLS handshakes. Session resumption (TLS session tickets) reduces full handshake overhead by ~10x for returning clients. Hardware TLS offload cards (Marvell LiquidSecurity, Intel QAT) handle TLS at line rate on 40/100 Gbps NICs, freeing CPU for application logic. QUIC/HTTP/3 further reduces connection setup latency (0-RTT resumption) at the cost of higher UDP processing overhead.

Key Trade-offs

  • Anycast vs. GeoDNS: Anycast enables instant failover and simpler client configuration but requires BGP management and provides coarser geographic granularity; GeoDNS enables more precise routing but has TTL-dependent failover latency
  • Long TTL vs. content freshness: Long cache TTLs maximize hit rates and reduce origin load but mean users see stale content after deployments; URL versioning (cache-busting) achieves long TTLs with instant updates at the cost of requiring URL changes on every asset update
  • Edge compute vs. pure caching: Running serverless functions at the edge (Cloudflare Workers, Lambda@Edge) enables dynamic content generation near users but increases edge server complexity and cost
  • Single-tier vs. two-tier cache hierarchy: A two-tier hierarchy dramatically reduces origin load for long-tail content but increases miss latency (L2 lookup before origin) and operational complexity
