
System Design: DDoS Protection System

Design a multi-layer DDoS protection system that defends against volumetric, protocol, and application-layer attacks. Covers traffic scrubbing, anycast routing, rate limiting, behavioral analysis, and automated mitigation at network and application layers.

14 min read · Updated Jan 15, 2025
Tags: system-design, ddos, network-security, rate-limiting, anycast

Requirements

Functional Requirements:

  • Detect and mitigate volumetric DDoS attacks (UDP flood, ICMP flood, amplification attacks) at the network edge
  • Defend against protocol attacks (SYN flood, ACK flood, fragmented packet attacks) without blocking legitimate traffic
  • Detect and block application-layer attacks (HTTP flood, Slowloris, credential stuffing disguised as normal traffic)
  • Provide an emergency mode: block all traffic from attacking IP ranges within 30 seconds of attack detection
  • Allow legitimate traffic to pass with less than 5ms additional latency during active mitigation
  • Provide a self-service rule management interface for customers to define custom block/challenge rules

Non-Functional Requirements:

  • Absorb volumetric attacks of up to 1 Tbps without infrastructure saturation
  • Mitigation activation within 10 seconds of attack detection
  • False positive rate (blocking legitimate users) under 0.01%
  • 99.999% availability for the mitigation plane
  • Traffic scrubbing adds less than 5ms latency for clean traffic

Scale Estimation

A 1 Tbps UDP flood of 100-byte packets amounts to 1.25 billion packets per second. No single data center can absorb this; traffic must be distributed across a global anycast network of 20+ PoPs (Points of Presence), each absorbing 50 Gbps. Normal (non-attack) traffic is 500 Gbps globally, or 25 Gbps per PoP. Attack detection must analyze flow telemetry (NetFlow/IPFIX) at 10 million flows/second across the entire network.
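The arithmetic behind these figures can be checked with a short sketch. The function names are illustrative, and the even split across PoPs is a simplifying assumption: real anycast load follows BGP topology and is rarely uniform.

```python
# Back-of-the-envelope check of the scale estimate above.
# Assumption: traffic is spread evenly across PoPs, which real anycast does not guarantee.

ATTACK_BPS = 1e12      # 1 Tbps volumetric attack
NORMAL_BPS = 500e9     # 500 Gbps normal global traffic
PACKET_BYTES = 100     # packet size used in the estimate
POP_COUNT = 20         # lower bound on PoPs from the estimate


def packets_per_second(bits_per_second: float, packet_bytes: int) -> float:
    """Convert a bit rate into a packet rate for a fixed packet size."""
    return bits_per_second / (packet_bytes * 8)


def per_pop_gbps(bits_per_second: float, pop_count: int) -> float:
    """Per-PoP load in Gbps under an even split."""
    return bits_per_second / pop_count / 1e9


if __name__ == "__main__":
    print(f"attack packet rate: {packets_per_second(ATTACK_BPS, PACKET_BYTES):.3g} pps")
    print(f"attack load per PoP: {per_pop_gbps(ATTACK_BPS, POP_COUNT):.0f} Gbps")
    print(f"normal load per PoP: {per_pop_gbps(NORMAL_BPS, POP_COUNT):.0f} Gbps")
```

Because the split is uneven in practice, per-PoP capacity is usually planned with headroom above the averaged figure.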

High-Level Architecture

DDoS protection operates at three layers: L3/L4 (network/transport), L7 (application), and analytics (attack detection and policy coordination). At L3/L4, anycast routing distributes attack traffic across a global scrubbing network. At L7, a reverse-proxy cluster runs behavioral analysis and challenge-response mechanisms. The analytics layer processes flow telemetry and application logs to detect attack patterns and push mitigation rules.

Anycast routing: every PoP announces the same IP prefix via BGP on behalf of the origin servers. Attackers send traffic to this prefix, and BGP delivers each packet to the topologically nearest PoP, as determined by the peering of the attacker's ISP. Each PoP performs L3/L4 filtering (blocking known-bad IP ranges, rate-limiting per source IP) and forwards clean traffic to the origin servers over GRE tunnels or direct peering. During a large attack, BGP communities signal upstream ISPs to apply Remotely Triggered Black Hole (RTBH) routing for the most abusive source ranges.

Application-layer protection runs on the reverse-proxy layer. A traffic scoring engine assigns each request a risk score based on IP reputation (threat intelligence feeds), header anomalies (missing common browser headers, invalid User-Agent), TLS fingerprint (JA3 hash matching known attack toolkits), request rate per IP, and behavioral signals (request patterns that match attack signatures). High-risk requests receive a challenge (a CAPTCHA or an invisible JavaScript proof-of-work); medium-risk requests are served under rate limiting; low-risk requests pass through normally.
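A minimal sketch of the scoring and tiering described above. The weights, thresholds, rate ceiling, and the `RequestSignals` type are all hypothetical values chosen for illustration; the article does not specify them, and a production engine would tune them against real traffic.

```python
from dataclasses import dataclass


@dataclass
class RequestSignals:
    ip_reputation: float            # 0.0 (clean) to 1.0 (known bad), from a threat feed
    header_anomaly: bool            # e.g. missing common browser headers
    ja3_matches_attack_tool: bool   # TLS fingerprint seen in known attack toolkits
    requests_per_second: float      # current per-IP request rate
    matches_attack_signature: bool  # behavioral pattern match


# Hypothetical weights; they sum to 1.0 so the score stays within [0, 1].
WEIGHTS = {
    "reputation": 0.30,
    "headers": 0.15,
    "ja3": 0.25,
    "rate": 0.15,
    "signature": 0.15,
}
RATE_CEILING = 100.0  # hypothetical: rates at or above this count as fully suspicious


def risk_score(s: RequestSignals) -> float:
    """Weighted sum of normalized signals."""
    rate_component = min(s.requests_per_second / RATE_CEILING, 1.0)
    return (
        WEIGHTS["reputation"] * s.ip_reputation
        + WEIGHTS["headers"] * float(s.header_anomaly)
        + WEIGHTS["ja3"] * float(s.ja3_matches_attack_tool)
        + WEIGHTS["rate"] * rate_component
        + WEIGHTS["signature"] * float(s.matches_attack_signature)
    )


def action_for(score: float, high: float = 0.7, medium: float = 0.4) -> str:
    """Map a risk score onto the three tiers: challenge, rate_limit, allow."""
    if score >= high:
        return "challenge"
    if score >= medium:
        return "rate_limit"
    return "allow"
```

A linear score keeps the decision explainable and cheap to evaluate inline; the tier thresholds are the main lever for trading false positives against attack leakage.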

Core Components

NetFlow-Based Attack Detection

All edge routers export NetFlow/sFlow records (sampled 1:1000) to a central collector. An Apache Flink job processes the flow records in real time, computing per-source-IP packet rate, per-destination-IP packet rate, protocol distribution, and packet size distribution. Because the records are sampled, observed volumes are scaled by the sampling rate before thresholds are applied. A threshold detector fires when any destination receives more than 10 Gbps from a single source IP or more than 100 Gbps in aggregate. An ML anomaly detector (trained on historical traffic patterns) identifies unusual protocol distributions that indicate amplification attacks (DNS, NTP, SSDP reflection).
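The threshold detector can be sketched in plain Python as a stand-in for the streaming job. The flow tuple layout, the `detect` function, and the alert format are assumptions for illustration; the key step is scaling sampled bytes by the 1:1000 sampling rate before comparing against the thresholds.

```python
from collections import defaultdict

SAMPLING_RATE = 1000          # 1:1000 sampling, so each sampled byte represents ~1000
PER_SOURCE_LIMIT_BPS = 10e9   # >10 Gbps from a single source IP
AGGREGATE_LIMIT_BPS = 100e9   # >100 Gbps aggregate to one destination


def detect(flows, window_seconds):
    """Return alerts for one time window.

    flows: iterable of (src_ip, dst_ip, sampled_bytes) tuples (hypothetical layout).
    """
    per_pair = defaultdict(int)
    per_dst = defaultdict(int)
    for src, dst, sampled_bytes in flows:
        per_pair[(src, dst)] += sampled_bytes
        per_dst[dst] += sampled_bytes

    def estimated_bps(sampled_bytes):
        # Scale the sampled volume up to an estimate of the true bit rate.
        return sampled_bytes * SAMPLING_RATE * 8 / window_seconds

    alerts = []
    for (src, dst), total in per_pair.items():
        if estimated_bps(total) > PER_SOURCE_LIMIT_BPS:
            alerts.append(("single_source", dst, src))
    for dst, total in per_dst.items():
        if estimated_bps(total) > AGGREGATE_LIMIT_BPS:
            alerts.append(("aggregate", dst, None))
    return alerts
```

In a streaming engine the two dictionaries would become keyed, windowed aggregations, but the scaling and comparison logic stays the same.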

L3/L4 Scrubbing Pipeline

Upon attack detection, the mitigation controller pushes mitigation policy to edge routers via NETCONF/gRPC. The policy includes ACLs (Access Control Lists) that drop all traffic from attacking source IPs/CIDRs, rate limits that cap UDP traffic at 10% of normal, and SYN cookies for TCP connections. SYN cookies validate the TCP handshake without allocating state, which prevents SYN floods from exhausting the connection table: the server encodes its initial sequence number in the SYN-ACK as a cryptographic function of the source and destination IPs and ports (together with a secret and a coarse timestamp), then verifies the client's ACK without having stored any half-open connection state.
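The SYN cookie idea can be illustrated with a simplified sketch. This is not the exact bit layout used by any real TCP stack (which also encodes an MSS index); the secret, the 64-second time slot, and the 6-bit/26-bit split are assumptions made for readability.

```python
import hashlib
import hmac

SECRET = b"hypothetical-secret-rotate-regularly"  # assumption: per-server secret
SLOT_SECONDS = 64        # coarse time counter granularity (assumption)
MAC_MASK = 0x03FFFFFF    # low 26 bits carry the truncated MAC


def _cookie(src, dst, sport, dport, client_isn, counter):
    """Build a 32-bit value: 6 bits of time counter, 26 bits of keyed hash."""
    msg = f"{src}|{dst}|{sport}|{dport}|{client_isn}|{counter}".encode()
    digest = hmac.new(SECRET, msg, hashlib.sha256).digest()
    mac = int.from_bytes(digest[:4], "big") & MAC_MASK
    return ((counter & 0x3F) << 26) | mac


def make_cookie(src, dst, sport, dport, client_isn, now):
    """Server side: value to use as the initial sequence number in the SYN-ACK."""
    return _cookie(src, dst, sport, dport, client_isn, int(now) // SLOT_SECONDS)


def check_cookie(cookie, src, dst, sport, dport, client_isn, now, max_slots=2):
    """Server side: validate the cookie recovered from the client's ACK.

    In real TCP the ACK number is cookie + 1 and the sequence number is
    client_isn + 1, so both inputs can be recomputed without stored state.
    """
    current = int(now) // SLOT_SECONDS
    presented = (cookie & 0xFFFFFFFF).to_bytes(4, "big")
    for age in range(max_slots):
        counter = current - age
        if counter < 0:
            break
        expected = _cookie(src, dst, sport, dport, client_isn, counter)
        if hmac.compare_digest(presented, expected.to_bytes(4, "big")):
            return True
    return False
```

The server allocates connection state only after `check_cookie` succeeds, so spoofed SYNs that never complete the handshake cost it a hash computation and nothing more.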

Application-Layer Behavioral Analysis

The behavioral analysis engine maintains per-IP request counters in Redis over sliding windows of 1s, 10s, 60s, and 600s. A request scoring function combines IP reputation score (external threat feed lookup), request rate deviation from baseline (Z-score), TLS fingerprint mismatch (JA3 hash not in the list of legitimate browsers), and request pattern similarity (edit distance from known attack patterns). The main challenge mechanism is Proof-of-Work: the client must compute a hash with N leading zero bits, taking 100–500ms of CPU time. This makes automated HTTP floods expensive to sustain while adding only a brief, invisible delay for human users.
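A minimal sketch of a leading-zero-bits proof-of-work. SHA-256 and the 8-byte nonce encoding are assumptions, since the article does not name a hash function; in a browser the `solve` step would run in JavaScript, with only `verify` on the server.

```python
import hashlib


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def _digest(challenge: bytes, nonce: int) -> bytes:
    return hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()


def solve(challenge: bytes, difficulty: int) -> int:
    """Client side: brute-force a nonce. Expected work is about 2**difficulty hashes."""
    nonce = 0
    while leading_zero_bits(_digest(challenge, nonce)) < difficulty:
        nonce += 1
    return nonce


def verify(challenge: bytes, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash, so verification stays cheap under load."""
    return leading_zero_bits(_digest(challenge, nonce)) >= difficulty
```

The asymmetry is the point: each extra bit of difficulty doubles the client's expected work while the server's verification cost stays constant. The difficulty that yields 100–500ms depends on client hardware and has to be calibrated.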

Database Design

  • Redis Cluster for rate limiting: ratelimit:{ip}:{window} → counter.
  • Redis for IP reputation cache: iprep:{ip} → risk_score (TTL 1 hour).
  • PostgreSQL for mitigation rules: rules (rule_id, type ENUM(BLOCK_IP, CHALLENGE, RATE_LIMIT), criteria JSON, action JSON, created_by, expires_at, is_active).
  • TimescaleDB for attack telemetry: (ts TIMESTAMP, source_ip INET, dest_ip INET, protocol INT, pps BIGINT, bps BIGINT, attack_type VARCHAR), partitioned by hour.
  • S3 for raw flow archives (compressed IPFIX), retained for 90 days.
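To illustrate the ratelimit:{ip}:{window} key scheme, the sketch below keeps sliding-window counts in process memory as a stand-in for Redis. The class and method names are hypothetical, and storing one timestamp per request is a simplification; a Redis deployment would more likely use sorted sets or fixed-size bucket counters to bound memory.

```python
from collections import defaultdict, deque

WINDOWS = (1, 10, 60, 600)  # window lengths in seconds


class SlidingWindowCounters:
    """In-memory stand-in for per-IP sliding-window counters."""

    def __init__(self):
        self._events = defaultdict(deque)  # key -> request timestamps, oldest first

    @staticmethod
    def key(ip: str, window: int) -> str:
        return f"ratelimit:{ip}:{window}"

    def record(self, ip: str, now: float) -> dict:
        """Record one request and return the current count for every window."""
        counts = {}
        for window in WINDOWS:
            events = self._events[self.key(ip, window)]
            events.append(now)
            # Evict timestamps that have fallen out of the window.
            while events and events[0] <= now - window:
                events.popleft()
            counts[window] = len(events)
        return counts
```

For example, calling `record("198.51.100.9", t)` on every request returns a dictionary keyed by window length, which can feed the rate component of the scoring function directly.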

API Design

  • POST /rules: Create a custom mitigation rule (block IP CIDR, rate-limit endpoint, challenge country).
  • GET /attacks/active: Return currently active attacks with type, volume, source distribution, and mitigation status.
  • POST /attacks/{attack_id}/escalate: Escalate mitigation to emergency mode (RTBH for attacking prefixes).
  • GET /traffic/live: Real-time traffic dashboard showing clean vs. attack traffic volumes by PoP and protocol.

Scaling & Bottlenecks

The scrubbing network's aggregate capacity (1 Tbps) can be exceeded by nation-state-level attacks (10+ Tbps has been reported). Upstream ISP partnerships for pre-scrubbing, in which a Tier-1 ISP drops attack traffic before it reaches the scrubbing network, extend effective capacity. Anycast distribution limits per-PoP attack volume; BGP traffic engineering can redistribute attack load from saturated PoPs to underloaded ones within 60 seconds.

Application-layer detection latency: analyzing each request's behavioral signals must complete within the 5ms budget. Local Redis lookups (0.5ms), the in-process IP reputation cache (0.1ms), and JA3 fingerprint comparison (0.1ms) total about 0.7ms and fit comfortably within it. Machine-learning-based detection, which requires feature assembly and model inference, runs asynchronously: the first request from an IP is allowed through while its risk score is being computed, and subsequent requests use the computed score, which is available within about 100ms.
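The allow-first, score-later pattern can be sketched as follows. The `AsyncScoreGate` class, its threshold, and the explicit `run_pending` step are assumptions for illustration; in a real proxy the scoring would run on a background worker rather than being driven manually.

```python
from collections import deque


class AsyncScoreGate:
    """Allow requests from unscored IPs while a score is computed out of band."""

    def __init__(self, score_fn, challenge_threshold=0.7):
        self._score_fn = score_fn           # slow model inference (hypothetical callable)
        self._threshold = challenge_threshold
        self._scores = {}                   # ip -> computed risk score
        self._pending = deque()             # IPs waiting to be scored
        self._queued = set()                # avoids queueing the same IP twice

    def decide(self, ip: str) -> str:
        """Inline path: a dictionary lookup only, so it stays within the latency budget."""
        score = self._scores.get(ip)
        if score is None:
            if ip not in self._queued:
                self._queued.add(ip)
                self._pending.append(ip)
            return "allow"
        return "challenge" if score >= self._threshold else "allow"

    def run_pending(self) -> None:
        """Background path: compute scores for every queued IP."""
        while self._pending:
            ip = self._pending.popleft()
            self._scores[ip] = self._score_fn(ip)
            self._queued.discard(ip)
```

The trade-off is that an attacker's first few requests always get through; the design accepts that leakage in exchange for keeping model inference off the request path.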

Key Trade-offs

  • Anycast distribution vs. centralized scrubbing: Anycast distributes attack traffic globally and brings mitigation closer to the attack source; centralized scrubbing simplifies rule management but creates a single absorption bottleneck.
  • Rate limiting vs. behavioral challenge: Rate limiting is simple and effective against volume attacks but is prone to false positives (for example, when many legitimate users share one NAT address); behavioral challenges (JS, CAPTCHA) distinguish humans from bots more accurately but add latency and friction for legitimate users.
  • Aggressive vs. conservative blocking: Aggressive blocking (block entire ASNs, country blocks) stops attacks faster but risks blocking legitimate users in those regions; conservative blocking minimizes false positives but allows some attack traffic through during signature learning.
  • On-premises vs. cloud DDoS protection: Cloud-based DDoS protection (Cloudflare, AWS Shield) provides massive capacity and zero infrastructure management but routes all traffic through a third party; on-premises scrubbing appliances keep traffic in-house but are limited to available bandwidth.
