System Design: Video Comments System (YouTube-scale)
A design for a YouTube-scale video comments system, covering threaded replies, real-time updates, spam detection, sorting algorithms, and handling millions of comments on a single popular video.
Requirements
Functional Requirements:
- Users post text comments on videos with optional timestamps and mentions
- Threaded replies (1 level of nesting: top-level comments and replies)
- Like/dislike comments; sort by top (likes), newest, or relevance
- Real-time comment updates for live/premiere videos
- Spam and abuse detection with automated filtering
- Creator moderation tools: pin, delete, filter, hold for review, block users
Non-Functional Requirements:
- Support 500 million comments posted per day across all videos
- Read-heavy: 50 billion comment page views per day
- Latency: comment list load under 200ms; comment post acknowledgment under 500ms
- Handle videos with 10M+ comments (mega-comment threads)
- 99.9% availability for reads; comments posted must never be silently lost
- Comply with regional content regulations (automated flagging for legal review)
Scale Estimation
500M comments written/day = 5,800 writes/sec. 50B comment views/day = 580K reads/sec. Average comment size: 200 bytes text + 100 bytes metadata = 300 bytes. Daily storage growth: 500M × 300 bytes = 150GB/day = 55TB/year. A video with 10M comments at 300 bytes each = 3GB of comment data for a single video — this must be paginated efficiently. Reply fan-out: average comment has 2 replies → 1 billion reply relationships/day. Like counters: assume 10% of viewers like a comment → 5B like events/day = 58K like writes/sec.
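The back-of-envelope figures above can be reproduced directly; this sketch only restates the article's assumptions (500M comments/day, 50B views/day, 300 bytes/comment) as arithmetic:

```python
# Inputs are the article's stated assumptions, not measurements.
COMMENTS_PER_DAY = 500_000_000
VIEWS_PER_DAY = 50_000_000_000
BYTES_PER_COMMENT = 300          # 200 B text + 100 B metadata
SECONDS_PER_DAY = 86_400

writes_per_sec = COMMENTS_PER_DAY / SECONDS_PER_DAY            # ~5,800
reads_per_sec = VIEWS_PER_DAY / SECONDS_PER_DAY                # ~580K
daily_growth_gb = COMMENTS_PER_DAY * BYTES_PER_COMMENT / 1e9   # 150 GB/day
yearly_growth_tb = daily_growth_gb * 365 / 1e3                 # ~55 TB/year
```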
High-Level Architecture
The comments system has three layers: the Write Path, the Read Path, and the Moderation Pipeline. The Write Path handles comment creation. A user submits a comment via the Comment API → the Comment Service validates the request (auth, rate limiting, text length), assigns a globally unique comment_id (Snowflake ID), and writes the comment to Cassandra (partitioned by video_id). Simultaneously, the comment text is sent to the Spam Detection Service (ML classifier) and the result is stored alongside the comment. If the comment is classified as spam (confidence > 0.95), it is hidden immediately. If borderline (0.5-0.95), it is held for creator review.
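The write-path spam decision above maps a classifier confidence to one of three initial statuses. A minimal sketch of that mapping, using the thresholds stated in the text (status names are illustrative):

```python
from enum import Enum

class CommentStatus(Enum):
    VISIBLE = "visible"
    HIDDEN_SPAM = "hidden_spam"          # confidence > 0.95: hidden immediately
    HELD_FOR_REVIEW = "held_for_review"  # 0.5-0.95: held for creator review

SPAM_HIDE_THRESHOLD = 0.95
SPAM_REVIEW_THRESHOLD = 0.5

def classify_comment(spam_confidence: float) -> CommentStatus:
    """Map a spam-classifier confidence to an initial comment status."""
    if spam_confidence > SPAM_HIDE_THRESHOLD:
        return CommentStatus.HIDDEN_SPAM
    if spam_confidence >= SPAM_REVIEW_THRESHOLD:
        return CommentStatus.HELD_FOR_REVIEW
    return CommentStatus.VISIBLE
```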
The Read Path serves paginated comment lists. When a user opens the comments section, the Comment Service reads the top-level comments for the video_id from Cassandra, sorted by the requested order (top, newest, relevance). Sorting by 'top' uses a precomputed score: score = likes - dislikes + freshness_bonus. This score is maintained in a Redis sorted set per video_id for the top 1,000 comments. For comments beyond the top 1,000, the system falls back to Cassandra scans with client-side sorting. Replies are loaded lazily: when a user expands a comment thread, a separate query fetches replies for that comment_id.
The Moderation Pipeline runs asynchronously. Every comment passes through: (1) a text classifier (BERT-based) detecting spam, hate speech, harassment, and self-harm content; (2) a link scanner checking URLs against a phishing/malware database; (3) a regex filter for creator-defined blocked words. Flagged comments enter a moderation queue visible to the creator and platform moderators.
Core Components
Comment Storage Engine
Cassandra is the primary store, chosen for its write throughput and partition-key-based access patterns. The Comments table uses video_id as the partition key and comment_id (Snowflake, time-ordered) as the clustering key. This allows efficient range queries for 'newest first' (descending clustering order). For 'top' sorting, a separate materialized view or secondary index is impractical at this scale — instead, a background job computes the top 1,000 comments per video hourly and caches them in Redis sorted sets. Videos with <1,000 comments use Cassandra directly with application-level sorting.
Real-Time Comment Stream
For live videos and premieres, comments update in real-time. Viewers subscribe to a WebSocket channel per video_id. When a new comment is posted, after writing to Cassandra, the Comment Service publishes to a Redis Pub/Sub channel comments:{video_id}. WebSocket Edge Servers subscribed to this channel push the comment to all connected viewers. For viral live streams with 1M+ concurrent viewers, the WebSocket tier uses hierarchical fan-out (similar to Twitch chat) with regional aggregation servers. A rate limiter on the WebSocket output caps display at 20 comments/sec per viewer to prevent UI flooding.
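The 20 comments/sec display cap per viewer can be implemented as a token bucket at the WebSocket edge. A testable sketch with injected timestamps (the burst size equal to the rate is an assumption):

```python
class CommentRateLimiter:
    """Token bucket capping comments pushed to one viewer (20/sec per the
    design above). Timestamps are injected so the sketch is deterministic."""

    def __init__(self, rate: float = 20.0, burst: float = 20.0):
        self.rate = rate
        self.capacity = burst
        self.tokens = float(burst)
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # drop or batch this comment for the viewer
```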
Spam Detection Service
The spam classifier is a fine-tuned BERT model served on GPU inference servers (NVIDIA T4) via TorchServe. The model processes comment text and context features (user account age, comment history, video category) and outputs a spam probability. At 5,800 comments/sec, the inference fleet handles this with batch processing (batch size 32, ~10ms per batch on T4). The model is retrained weekly on newly labeled data (human moderator decisions). A separate rule-based system handles obvious spam (comments with 50+ emojis, repeated characters, known spam patterns) before the ML model, reducing GPU load by 30%.
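The rule-based pre-filter that runs before the ML model can be a few cheap checks. The 50-emoji rule comes from the text; the 10-consecutive-repeated-characters threshold and the emoji Unicode ranges are assumptions:

```python
import re

# Common emoji blocks; coverage is illustrative, not exhaustive.
EMOJI_RE = re.compile(r'[\U0001F300-\U0001FAFF\u2600-\u27BF]')
# 10+ of the same character in a row (threshold is an assumed value).
REPEAT_RE = re.compile(r'(.)\1{9,}')

def is_obvious_spam(text: str) -> bool:
    """Cheap rule-based pre-filter run before the BERT model."""
    if len(EMOJI_RE.findall(text)) >= 50:
        return True
    if REPEAT_RE.search(text):
        return True
    return False
```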
Database Design
Cassandra schema:

CREATE TABLE comments (
    video_id          UUID,
    comment_id        BIGINT,
    parent_comment_id BIGINT,
    user_id           UUID,
    text              TEXT,
    likes             INT,
    dislikes          INT,
    spam_score        FLOAT,
    status            TEXT,
    created_at        TIMESTAMP,
    PRIMARY KEY (video_id, comment_id)
) WITH CLUSTERING ORDER BY (comment_id DESC);

This schema supports efficient 'newest first' pagination. A separate table comment_replies uses parent_comment_id as its partition key and comment_id as its clustering key for loading reply threads.
Redis stores hot data: sorted sets for top comments per video (top_comments:{video_id} scored by engagement score), counters for likes/dislikes per comment (likes:{comment_id}), and Pub/Sub channels for real-time comment streaming. Like/dislike counters in Redis are flushed to Cassandra every 60 seconds via a batch writer. User-level data (is this user blocked by the creator, is this user a moderator) is cached in Redis with a 5-minute TTL. A ClickHouse cluster stores comment analytics (comments/hour, spam rate, moderation queue depth) for creator dashboards.
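The 60-second flush from Redis counters to Cassandra is an aggregate-then-write step: collapse all like events in the window to one delta per comment, then issue one UPDATE each. A sketch with the batch writer injected (the `write_batch` interface is hypothetical):

```python
from collections import Counter
from typing import Callable, Iterable

def flush_like_counters(
    like_events: Iterable[tuple[int, int]],          # (comment_id, delta)
    write_batch: Callable[[int, int], None],         # stand-in for Cassandra UPDATE
) -> dict[int, int]:
    """Aggregate one 60-second window of like events and write a single
    delta per comment, turning N events into one UPDATE each."""
    deltas: Counter[int] = Counter()
    for comment_id, delta in like_events:
        deltas[comment_id] += delta
    for comment_id, delta in deltas.items():
        write_batch(comment_id, delta)
    return dict(deltas)
```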
API Design
- GET /api/v1/videos/{video_id}/comments?sort=top&cursor={token}&limit=20: fetch paginated top-level comments with cursor-based pagination
- POST /api/v1/videos/{video_id}/comments: post a comment; body contains text and an optional parent_comment_id for replies; returns the comment object with spam status
- POST /api/v1/comments/{comment_id}/like: like a comment; idempotent (deduplicated by user_id)
- GET /api/v1/comments/{comment_id}/replies?cursor={token}&limit=10: fetch replies to a specific comment
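The cursor token can be an opaque encoding of the read position. Base64-wrapped JSON carrying the last comment_id (and, for split partitions, the bucket) is one illustrative choice; the API above only specifies that pagination is cursor-based:

```python
import base64
import json

def encode_cursor(last_comment_id: int, bucket: int = 0) -> str:
    """Opaque pagination cursor (encoding scheme is an assumption)."""
    raw = json.dumps({"last_id": last_comment_id, "bucket": bucket})
    return base64.urlsafe_b64encode(raw.encode()).decode()

def decode_cursor(token: str) -> tuple[int, int]:
    """Recover the read position from a cursor token."""
    data = json.loads(base64.urlsafe_b64decode(token.encode()))
    return data["last_id"], data["bucket"]
```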
Scaling & Bottlenecks
The primary bottleneck is reading comments for mega-videos (10M+ comments). A Cassandra partition with 10M rows (video_id as partition key) becomes a hot partition, causing read latency spikes. The solution is partition splitting: for videos exceeding 100K comments, the partition key becomes video_id:bucket_number, where bucket_number = floor(comment_seq / 100K) and comment_seq is a per-video monotonically increasing comment counter (raw Snowflake IDs are not sequential per video, so dividing them directly would not yield evenly filled buckets). The Comment Service transparently queries the appropriate bucket based on the cursor position. A metadata entry in Redis tracks the current bucket count per video.
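Bucket routing reduces to composing the partition key from the video and a sequence-derived bucket number. A minimal sketch, assuming a per-video comment counter as described above:

```python
BUCKET_SIZE = 100_000  # comments per bucket, per the design above

def partition_key(video_id: str, comment_seq: int) -> str:
    """Compose the split partition key video_id:bucket_number.

    comment_seq is assumed to be a per-video monotonically increasing
    counter, so buckets fill sequentially and evenly.
    """
    return f"{video_id}:{comment_seq // BUCKET_SIZE}"
```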
Like counter write amplification is the second bottleneck. A popular comment receiving 1M likes in an hour generates 1M Redis INCR operations. Redis handles this easily (300K ops/sec per node), but flushing to Cassandra requires batch writes. The flush job aggregates likes over 60-second windows and writes a single UPDATE per comment, reducing Cassandra writes from 1M to 60 per comment per hour. The trade-off is that like counts may be up to 60 seconds stale in Cassandra — Redis serves the real-time count.
Key Trade-offs
- Cassandra over PostgreSQL: Cassandra's partition-key access pattern perfectly matches the video_id → comments query; PostgreSQL would require sharding to handle 580K reads/sec and struggles with partition-level scaling for mega-videos
- Precomputed top comments vs real-time sorting: Sorting 10M comments by engagement score in real-time is prohibitively expensive; precomputing the top 1,000 hourly is a reasonable staleness trade-off — comments rarely change rank after the first hour
- 1-level threading vs full threading: Full recursive threading (Reddit-style) dramatically complicates pagination and rendering; 1-level (top-level + replies) is simpler and matches YouTube's proven UX pattern
- ML spam detection vs rule-based only: The BERT classifier catches 95% of spam vs 60% for rules alone, but adds GPU inference latency — the two-tier approach (rules first, ML second) optimizes cost while maintaining accuracy