System Design: TikTok

System design breakdown of TikTok's video recommendation engine, upload pipeline, and For You Page algorithm that delivers personalized short-form video to 1 billion users.

Requirements

Functional Requirements:

  • Users upload short videos (up to 10 minutes)
  • Personalized For You Page (FYP) video feed
  • Users can follow creators, like, comment, and share videos
  • Live streaming capability
  • Search for videos, users, and sounds/music
  • Duet and Stitch features for video collaboration

Non-Functional Requirements:

  • 1 billion MAU, 600M DAU; 34M videos uploaded per day
  • FYP first video must start playing within 1 second (instant scroll experience)
  • 99.99% availability; video content must never be lost
  • Recommendation model freshness: update signals within 30 seconds of engagement

Scale Estimation

600M DAU × 90 minutes average watch time = 54B minutes of video/day. At an average of one video per 30 seconds, that's 108B video views/day ≈ 1.25M video views/sec. Uploads: 34M videos/day ≈ 394 uploads/sec. Average video size 50MB raw, compressed to 10MB → 340TB of new storage/day. Storing three resolutions (360p, 720p, 1080p) roughly triples that, to ~1PB/day of processed video storage.
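
The arithmetic is easy to sanity-check. A minimal Python sketch reproducing the numbers above (all inputs are the assumptions stated in the text):

```python
# Back-of-envelope check of the numbers above; all inputs are the
# assumptions stated in the text.
SECONDS_PER_DAY = 86_400

dau = 600e6              # daily active users
watch_minutes = 90       # average watch time per user per day
avg_video_seconds = 30   # average time spent per video

views_per_day = dau * watch_minutes * 60 / avg_video_seconds  # ~108B
views_per_sec = views_per_day / SECONDS_PER_DAY               # ~1.25M

uploads_per_day = 34e6
uploads_per_sec = uploads_per_day / SECONDS_PER_DAY           # ~394

compressed_mb = 10
raw_tb_per_day = uploads_per_day * compressed_mb / 1e6        # ~340 TB
total_tb_per_day = raw_tb_per_day * 3                         # ~1 PB with renditions

print(f"{views_per_sec:,.0f} views/sec, {uploads_per_sec:,.0f} uploads/sec")
print(f"{raw_tb_per_day:,.0f} TB/day compressed, ~{total_tb_per_day:,.0f} TB/day total")
```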

High-Level Architecture

TikTok's architecture has two critical paths: the upload pipeline and the recommendation serving pipeline. The upload pipeline: mobile client uploads to the nearest edge PoP → Upload Service streams to object storage (a distributed blob store) → a Video Processing Pipeline (Kubernetes-orchestrated FFmpeg workers) transcodes to multiple resolutions and extracts audio fingerprints, visual embeddings, and metadata → results written to a Video Metadata Service (backed by MySQL + Redis) and the recommendation feature store.

The FYP serving pipeline: a user opens TikTok → the Recommendation Service is called with user_id + device context + session signals → a retrieval stage (ANN search over video embeddings via FAISS on GPU servers) pulls ~1,000 candidate videos → a heavy ranking model (transformer-based, trained on watch time, completion rate, likes, and shares) scores the candidates → the top 10-15 videos are returned → the client pre-fetches and buffers the next 5 videos. This pre-fetching is why TikTok feels instant.
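
The retrieval step maps naturally onto FAISS. A minimal sketch of ANN candidate retrieval, assuming a 128-dimensional embedding space and an IVF index; the index type, corpus size, and parameters are illustrative, not TikTok's actual configuration:

```python
import numpy as np
import faiss  # pip install faiss-cpu (or faiss-gpu)

DIM = 128            # embedding dimensionality (assumed)
N_CANDIDATES = 1000  # candidates handed to the heavy ranker

# Stand-in corpus of video embeddings. In production the index would be
# built offline from the engagement log and loaded into GPU memory.
video_embeddings = np.random.rand(100_000, DIM).astype("float32")
faiss.normalize_L2(video_embeddings)  # cosine similarity via inner product

quantizer = faiss.IndexFlatIP(DIM)
index = faiss.IndexIVFFlat(quantizer, DIM, 1024, faiss.METRIC_INNER_PRODUCT)
index.train(video_embeddings)
index.add(video_embeddings)
index.nprobe = 32  # clusters scanned per query: the recall/latency knob

# Serving: the user's embedding queries the index; the ~1,000 hits go on
# to the ranking stage.
user_embedding = np.random.rand(1, DIM).astype("float32")
faiss.normalize_L2(user_embedding)
scores, candidate_ids = index.search(user_embedding, N_CANDIDATES)
```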

Core Components

Video Processing Pipeline

Every uploaded video enters a DAG-based processing pipeline orchestrated by Apache Airflow. Tasks include: format validation, virus scanning, transcoding to a 5-rung bitrate ladder (360p to 4K), thumbnail extraction at 3 timestamps, audio fingerprinting (for music rights detection via an ACRCloud-compatible service), scene detection for chapter markers, and ML-based content safety classification. Processed assets are stored in a geo-distributed CDN origin (Akamai plus a proprietary CDN). Processing P99 latency target: under 60 seconds.
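
A minimal sketch of that DAG shape in Airflow 2.x (TaskFlow API); the task names and bodies are placeholders, not TikTok's actual pipeline:

```python
# Validation fans out into parallel transcode / fingerprint / safety tasks.
from datetime import datetime
from airflow.decorators import dag, task

@dag(schedule=None, start_date=datetime(2025, 1, 1), catchup=False)
def video_processing():

    @task
    def validate():
        # format validation + virus scan; returns the storage key
        return "video_12345"

    @task
    def transcode(key: str):
        # one task per ladder rung in practice; collapsed here for brevity
        return [f"{key}_{r}" for r in ("360p", "720p", "1080p", "1440p", "2160p")]

    @task
    def fingerprint(key: str):
        return f"{key}_audio_fp"  # feeds music-rights detection

    @task
    def safety_check(key: str):
        return True  # ML content-safety classification

    key = validate()
    # Fan-out after validation: these three run in parallel.
    transcode(key)
    fingerprint(key)
    safety_check(key)

video_processing()
```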

For You Page Recommendation Engine

The recommendation engine is a multi-stage funnel. Stage 1 (Retrieval): two sub-systems run in parallel — collaborative filtering (user-item embeddings via matrix factorization) retrieves videos from users with similar engagement patterns; content-based retrieval (video visual/audio embeddings) retrieves videos similar to recently watched content. Stage 2 (Ranking): a 500M parameter deep learning model takes 200+ features (user history, video features, context features) and predicts watch_time_fraction, like_probability, and share_probability. Stage 3 (Post-ranking): diversity injection, creator freshness boost, and policy filters.
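Between Stage 2 and Stage 3, the ranker's predicted objectives have to be blended into a single ordering. A hedged sketch with stand-in predictions and invented weights (production systems tune the weights, and often the functional form, through online experiments):

```python
import numpy as np

# Hypothetical blend of the ranker's three predicted objectives into one
# score. Weights are invented for illustration.
W_WATCH, W_LIKE, W_SHARE = 1.0, 0.3, 0.5

def final_score(watch_time_fraction, like_probability, share_probability):
    return (W_WATCH * watch_time_fraction
            + W_LIKE * like_probability
            + W_SHARE * share_probability)

# Score 1,000 candidates from retrieval; diversity injection and policy
# filters (Stage 3) would reorder or remove items after this point.
preds = np.random.rand(1000, 3)  # columns: watch, like, share predictions
scores = final_score(preds[:, 0], preds[:, 1], preds[:, 2])
top_videos = np.argsort(scores)[::-1][:15]
```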

Live Streaming Infrastructure

Broadcasters stream live from mobile via WebRTC, with RTMP ingest at edge servers. Edge servers transcode the stream to HLS and push chunks to the CDN. A separate Signaling Service coordinates viewer joins and leaves. The recommendation system gives a real-time boost to live videos from followed creators. Chat during live streams uses a dedicated WebSocket gateway with message fan-out via Redis Pub/Sub, capped at 1,000 messages/sec per stream to avoid overwhelming clients.
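
A minimal sketch of the capped chat fan-out path using redis-py; the key naming and rate-limit mechanics are assumptions layered on the 1,000 msg/sec figure above:

```python
import time
import redis  # redis-py

r = redis.Redis()
MAX_MSGS_PER_SEC = 1_000  # per-stream cap from the text

def publish_chat(stream_id: str, message: str) -> bool:
    """Publish a chat message, shedding it if the per-second cap is hit.

    A counter keyed by (stream, epoch second) approximates the rate limit;
    the key naming is an assumption for this sketch.
    """
    bucket = f"chat_rate:{stream_id}:{int(time.time())}"
    if r.incr(bucket) > MAX_MSGS_PER_SEC:
        return False  # dropped: clients couldn't render it anyway
    r.expire(bucket, 2)  # stale one-second buckets evaporate
    r.publish(f"chat:{stream_id}", message)
    return True

# Each WebSocket gateway node subscribes once per hosted stream and fans
# messages out to its connected viewers.
pubsub = r.pubsub()
pubsub.subscribe("chat:stream42")
```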

Database Design

Video metadata (video_id, creator_id, title, description, hashtags, music_id, duration, resolution_urls, created_at, status) is stored in MySQL sharded by video_id. A separate Redis cache stores hot video metadata with a 24-hour TTL. The social graph (follows) uses Cassandra with wide-column design. Engagement counters (likes, views, shares) use Redis counters with periodic flush to MySQL — this avoids write storms to the primary DB during viral events. A separate ClickHouse cluster stores all interaction events for feature engineering and analytics.
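A sketch of the counter-plus-flush pattern with redis-py (GETDEL requires Redis 6.2+; the key, table, and column names are illustrative):

```python
import redis  # redis-py; GETDEL needs Redis 6.2+

r = redis.Redis()

def record_like(video_id: str):
    # Hot path: one Redis INCR absorbs write spikes during viral events
    # instead of hammering MySQL with row updates.
    r.incr(f"likes:{video_id}")

def flush_counters(cursor):
    """Periodic job (~every 30s) moving Redis deltas into MySQL.

    GETDEL atomically reads and resets each counter, so likes arriving
    mid-flush are not lost. `cursor` is a DB-API cursor.
    """
    for key in r.scan_iter(match="likes:*"):
        delta = r.getdel(key)
        if delta:
            video_id = key.decode().split(":", 1)[1]
            cursor.execute(
                "UPDATE video_stats SET like_count = like_count + %s "
                "WHERE video_id = %s",
                (int(delta), video_id),
            )
```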

The recommendation feature store uses Redis for real-time features (last 10 videos watched, session-level signals) and a Feast-compatible offline store (backed by Parquet on S3) for historical features. User and video embeddings are stored in a FAISS index loaded into GPU memory on recommendation servers; the index is rebuilt hourly by a Spark job over the engagement log.

API Design

  • POST /api/v1/video/upload — Initiate multipart upload; returns upload_id and pre-signed S3 URLs for each chunk (see the client sketch after this list)
  • GET /api/v1/feed/foryou?count=10&session_id={id} — Fetch next batch of FYP videos with pre-signed CDN URLs
  • POST /api/v1/video/{video_id}/engage — Record engagement event (watch_time, like, share, skip); used for real-time model updates
  • GET /api/v1/search?q={query}&type=video&cursor={token} — Search videos by text/hashtag
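
A hypothetical client-side walk through the upload and feed endpoints using requests; the host and the request/response fields (filename, chunk_urls, etc.) are assumptions, since the list above only specifies the routes:

```python
import requests

BASE = "https://api.example.com"  # placeholder host

# 1. Initiate the multipart upload. Fields other than the route itself
#    (filename, size_bytes, chunk_urls) are assumed for this sketch.
init = requests.post(
    f"{BASE}/api/v1/video/upload",
    json={"filename": "clip.mp4", "size_bytes": 10_485_760},
).json()

# 2. PUT each 5MB chunk straight to object storage via its pre-signed URL,
#    keeping video bytes off the application servers.
CHUNK = 5 * 1024 * 1024
with open("clip.mp4", "rb") as f:
    for url in init["chunk_urls"]:
        requests.put(url, data=f.read(CHUNK))

# 3. Fetch the next FYP batch; the response carries pre-signed CDN URLs.
feed = requests.get(
    f"{BASE}/api/v1/feed/foryou",
    params={"count": 10, "session_id": "sess-123"},
).json()
```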

Scaling & Bottlenecks

The recommendation serving latency is the biggest challenge: running a heavy ranking model over 1,000 candidates must complete in under 50ms. TikTok addresses this with GPU inference clusters using batch inference (multiple users' ranking requests batched together); model distillation (a smaller 50M parameter model filters candidates before the heavy model runs); and pre-computed user embedding vectors, refreshed every 5 minutes and stored in Redis so serving requires no on-the-fly embedding computation.
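
The cascade is easy to express. A sketch with stand-in models showing how the distilled model prunes candidates before the heavy ranker runs; the sizes and cutoffs are assumptions:

```python
import numpy as np

# Illustrative two-model cascade: a cheap distilled model prunes the 1,000
# retrieval candidates so the expensive ranker only scores a fraction of
# them. Both "models" are random-score stand-ins.

def light_model(features):   # stand-in for the ~50M-param distilled model
    return np.random.rand(len(features))

def heavy_model(features):   # stand-in for the ~500M-param ranker
    return np.random.rand(len(features))

candidates = np.arange(1000)           # ids from the retrieval stage
features = np.random.rand(1000, 200)   # 200+ features per candidate

# Stage A: cheap pass keeps the top 200 candidates.
survivors = np.argsort(light_model(features))[::-1][:200]

# Stage B: heavy ranker scores only the survivors; top 15 get served.
order = np.argsort(heavy_model(features[survivors]))[::-1][:15]
final_ids = candidates[survivors[order]]
```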

Video CDN costs are the largest infrastructure expense. TikTok uses a tiered CDN strategy: hot videos (top 0.1% by views) are pushed to all edge PoPs; warm videos (top 10%) are cached regionally; cold videos (long tail) are served from origin with lazy regional caching. A video popularity predictor (gradient boosting model on early engagement signals) decides which tier to push a new video to within the first 30 minutes of upload.
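
A sketch of such a tier predictor using scikit-learn's gradient boosting; the features, labels, and class balance are invented for illustration:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Toy tier predictor: early engagement signals from the first ~30 minutes
# mapped to a CDN tier (0 = cold, 1 = warm, 2 = hot).
FEATURES = ["views_30m", "completion_rate", "share_rate", "follower_count"]

X_train = np.random.rand(10_000, len(FEATURES))
y_train = np.random.choice([0, 1, 2], size=10_000, p=[0.899, 0.1, 0.001])

tier_model = GradientBoostingClassifier().fit(X_train, y_train)

def assign_cdn_tier(early_signals: np.ndarray) -> int:
    """0 -> serve from origin, 1 -> regional caches, 2 -> all edge PoPs."""
    return int(tier_model.predict(early_signals.reshape(1, -1))[0])
```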

Key Trade-offs

  • Pre-fetching over on-demand loading: Fetching the next 5 videos before the user needs them adds bandwidth cost but is the core reason TikTok feels instant vs. competitors
  • Watch time as primary ranking signal over likes: Optimizing for watch completion captures passive engagement (rewatches, loop views) better than explicit signals, but risks amplifying addictive content
  • Redis counters with async flush: Accepting ~30-second eventual consistency on view counts avoids DB write bottlenecks during viral events — the trade-off is slight inaccuracy in real-time counters
  • Two-stage retrieval (ANN + heavy ranker): ANN retrieval over embeddings is approximate but fast (O(log n) vs O(n)); the heavy ranker corrects the approximation errors for the final top-10
