
System Design: User Follow/Follower System

Detailed system design of a follow/follower social graph covering bidirectional relationship storage, fan-out strategies, follower count consistency, and graph traversal at billion-user scale.


Requirements

Functional Requirements:

  • Users can follow and unfollow other users (asymmetric relationship)
  • Users can view their follower list and following list with pagination
  • Display follower and following counts on user profiles
  • Show mutual followers (friends) between two users
  • Suggest users to follow based on social graph proximity
  • Support follow requests for private accounts with accept/reject workflow

Non-Functional Requirements:

  • 1 billion users with an average of 200 following and 200 followers each — 200 billion edges
  • Follow/unfollow operations must complete within 100ms
  • Follower count queries must return within 50ms
  • 99.99% availability; the social graph is foundational to all platform features
  • Strong consistency for follow state (a user must never see stale follow status)

Scale Estimation

1 billion users × 200 average follows = 200 billion directed edges. At 16 bytes per edge (8-byte follower_id + 8-byte followee_id), raw edge storage is 3.2TB — comfortably fits in a distributed database. Follow/unfollow write rate: assuming 1% of DAU (500M) modify their follow list daily with 2 operations each, that is 10M writes/day = 116 writes/sec — a light write load. Read traffic is much heavier: every feed load, profile view, and story tray requires graph lookups. At 500M DAU with 50 graph queries per session, that is 25 billion reads/day = 290K reads/sec. The mutual followers query requires an intersection of two adjacency lists, averaging 200 elements each — O(n) with sorted lists.
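The arithmetic above is easy to sanity-check. A quick back-of-envelope script, using only the numbers from this section:

```python
# Back-of-envelope numbers from the estimates above.
USERS = 1_000_000_000
AVG_FOLLOWS = 200
EDGE_BYTES = 16                       # 8-byte follower_id + 8-byte followee_id

edges = USERS * AVG_FOLLOWS           # 200 billion directed edges
raw_storage_tb = edges * EDGE_BYTES / 1e12
print(f"{edges / 1e9:.0f}B edges, {raw_storage_tb:.1f} TB raw")  # 200B edges, 3.2 TB

DAU = 500_000_000
writes_per_day = DAU * 0.01 * 2       # 1% of DAU modify follows, 2 ops each
reads_per_day = DAU * 50              # 50 graph queries per session
print(f"{writes_per_day / 86_400:.0f} writes/s")        # -> 116 writes/s
print(f"{reads_per_day / 86_400 / 1e3:.0f}K reads/s")   # -> 289K reads/s
```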

High-Level Architecture

The follow system is built as a dedicated Social Graph Service with its own data layer, separated from the main application database. The core storage is a sharded graph store optimized for adjacency list operations. Two separate adjacency lists are maintained per user: a following list (users this person follows) and a followers list (users who follow this person). Every follow operation writes to both lists transactionally to maintain consistency.

The service exposes a gRPC API consumed by other platform services (Feed Service, Story Service, Notification Service, Recommendation Service). A Redis cache layer sits in front of the graph store, caching the most frequently accessed adjacency lists and follower counts. The cache uses a write-through strategy: follow/unfollow operations update the database first and then the cache synchronously in the same request path, so a failed cache write can at worst leave a stale entry that is evicted and rebuilt lazily. For the follow suggestions feature, a separate Graph Analytics Service runs nightly batch jobs on a Neo4j or Apache Giraph cluster to compute 2-hop neighbor recommendations and PageRank-based influence scores.
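A minimal sketch of the write-through path, assuming the redis-py client; insert_edge is a hypothetical DAO call into the sharded graph store, not a real API:

```python
import time
import redis  # redis-py client (assumed available)

r = redis.Redis()

def follow(follower_id: int, followee_id: int, insert_edge) -> None:
    """Write-through follow: persist to MySQL first, then refresh the cache.

    insert_edge is a hypothetical DAO call into the sharded graph store.
    If the cache writes fail, the stale entries are evicted and rebuilt
    lazily on the next read; the database remains the source of truth.
    """
    ts = time.time()
    insert_edge(follower_id, followee_id, ts)   # durable write first
    pipe = r.pipeline()                         # batch the cache updates
    pipe.zadd(f"following:{follower_id}", {str(followee_id): ts})
    pipe.zadd(f"followers:{followee_id}", {str(follower_id): ts})
    pipe.incr(f"count:{follower_id}:following")
    pipe.incr(f"count:{followee_id}:followers")
    pipe.execute()
```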

Core Components

Graph Storage Engine

The graph store uses a sharded MySQL cluster (Vitess) with two denormalized tables: user_following (partition key: follower_id, columns: followee_id, created_at) and user_followers (partition key: followee_id, columns: follower_id, created_at). This dual-write pattern optimizes for the two most common queries: 'who does user X follow?' and 'who follows user X?'. Each table uses a composite primary key of (partition key, counterpart user ID) to keep edges unique, plus a secondary index on (partition key, created_at DESC) for efficient reverse-chronological pagination. Because the two tables partition on different keys, the two rows for a single follow usually land on different shards, so the operation cannot run as one local transaction: a Saga pattern with compensating transactions (delete the first row if the second write fails) keeps the tables consistent, with a two-phase commit coordinator as the stricter but slower alternative.
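A sketch of the cross-shard dual-write with compensation. The in-memory dicts stand in for the two sharded tables; a production coordinator would also persist saga state so an interrupted compensation can be retried:

```python
from collections import defaultdict

NUM_SHARDS = 1024

# In-memory stand-ins for the two sharded tables (illustration only).
following_shards = defaultdict(dict)   # shard -> {(follower, followee): ts}
followers_shards = defaultdict(dict)   # shard -> {(followee, follower): ts}

def shard_for(user_id: int) -> int:
    return user_id % NUM_SHARDS

def follow_saga(follower_id: int, followee_id: int, ts: float) -> None:
    """Dual-write as a saga: forward writes in order, compensate on failure."""
    fshard = shard_for(follower_id)
    following_shards[fshard][(follower_id, followee_id)] = ts
    try:
        followers_shards[shard_for(followee_id)][(followee_id, follower_id)] = ts
    except Exception:
        # Compensating transaction: roll back the first write so the two
        # tables never disagree permanently.
        following_shards[fshard].pop((follower_id, followee_id), None)
        raise
```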

Follower Count Service

Follower and following counts are materialized as separate counters in Redis (key: count:{user_id}:followers and count:{user_id}:following). These counters are atomically incremented/decremented on every follow/unfollow operation using Redis INCRBY. Because failed or retried operations can cause the counter to drift from the source of truth, a periodic reconciliation job runs every hour: it counts the rows in the user_followers table for the top 10,000 accounts by follower count and updates the Redis counter if the drift exceeds 0.1%. Follower counts are displayed with rounding for large numbers (e.g., '12.4M followers') to mask minor inconsistencies.
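A sketch of the hourly reconciliation, using redis-py; count_followers_in_db is a hypothetical helper that runs SELECT COUNT(*) against the user_followers shard:

```python
import redis

r = redis.Redis()
DRIFT_THRESHOLD = 0.001  # 0.1%, as above

def reconcile_follower_count(user_id: int, count_followers_in_db) -> None:
    """Repair a materialized Redis counter if it drifted from MySQL."""
    key = f"count:{user_id}:followers"
    cached = int(r.get(key) or 0)
    actual = count_followers_in_db(user_id)  # SELECT COUNT(*) on the shard
    if actual and abs(cached - actual) / actual > DRIFT_THRESHOLD:
        r.set(key, actual)  # last-writer-wins; small races are re-fixed next hour
```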

Mutual Followers & Graph Queries

The mutual followers query ('show me followers of user B who I also follow') requires intersecting two sets. For small accounts (< 10K followers), this is done at query time: fetch both adjacency lists from cache/DB and merge-intersect them; with lists kept sorted by user ID, the intersection is O(n). For large accounts (> 10K followers), the query-time intersection is too slow. Instead, a precomputed mutual followers cache is maintained: when user A views user B's profile, the service computes the intersection asynchronously and caches the result in Redis with a 24-hour TTL. The 'friend suggestions' feature uses a Jaccard similarity metric on follower overlap: similarity(A, B) = |followers(A) ∩ followers(B)| / |followers(A) ∪ followers(B)|, computed nightly via a Spark job on the full graph export.
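The O(n) merge-intersection and the Jaccard score, in miniature (plain Python over ID-sorted lists; the production version of the similarity computation runs in Spark):

```python
def intersect_sorted(a: list[int], b: list[int]) -> list[int]:
    """Two-pointer merge over ID-sorted adjacency lists: O(len(a) + len(b))."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

def jaccard(a: list[int], b: list[int]) -> float:
    """|A ∩ B| / |A ∪ B| on follower sets, as used for friend suggestions."""
    inter = len(intersect_sorted(a, b))
    union = len(a) + len(b) - inter
    return inter / union if union else 0.0
```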

Database Design

The primary graph store is MySQL (Vitess) with the following schema. Table user_following: follower_id (BIGINT, partition key), followee_id (BIGINT), created_at (DATETIME), status (ENUM: active, pending for private accounts). Table user_followers: followee_id (BIGINT, partition key), follower_id (BIGINT), created_at (DATETIME). Each table holds approximately 200 billion rows (one per edge), sharded across 1,024 Vitess shards at roughly 200M rows per table per shard. Each shard runs on a MySQL 8.0 instance with 500GB SSD and 64GB RAM, with two read replicas for read scaling.
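The schema above, written out as DDL (a sketch: column types come from this section; the exact key layout is an assumption consistent with the pagination scheme described earlier):

```python
# DDL for the two denormalized edge tables (sketch; applied per Vitess shard).
USER_FOLLOWING_DDL = """
CREATE TABLE user_following (
    follower_id BIGINT NOT NULL,           -- partition key
    followee_id BIGINT NOT NULL,
    created_at  DATETIME NOT NULL,
    status      ENUM('active', 'pending') NOT NULL DEFAULT 'active',
    PRIMARY KEY (follower_id, followee_id),
    KEY idx_recency (follower_id, created_at DESC)
) ENGINE=InnoDB;
"""

USER_FOLLOWERS_DDL = """
CREATE TABLE user_followers (
    followee_id BIGINT NOT NULL,           -- partition key
    follower_id BIGINT NOT NULL,
    created_at  DATETIME NOT NULL,
    PRIMARY KEY (followee_id, follower_id),
    KEY idx_recency (followee_id, created_at DESC)
) ENGINE=InnoDB;
"""
```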

Redis caches adjacency lists as sorted sets: following:{user_id} and followers:{user_id} with the score set to the follow timestamp for chronological ordering. To bound memory, only users with fewer than 50K followers have their full adjacency list cached; celebrity follower lists are paginated from MySQL on demand. Redis memory for adjacency list caching: assuming 500M active users with average 200 entries at 8 bytes each = 800GB of raw member payload (sorted-set overhead inflates the real footprint several-fold), spread across a 100-node Redis Cluster.
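A read-path sketch against the sorted-set cache, using redis-py; fetch_followers_from_mysql is a hypothetical fallback for uncached (celebrity) lists:

```python
import redis

r = redis.Redis()

def get_followers_page(user_id: int, cursor_ts: float, limit: int = 50,
                       fetch_followers_from_mysql=None):
    """Page a follower list newest-first; the zset score is the follow timestamp."""
    key = f"followers:{user_id}"
    if r.exists(key):
        # '(' makes the max bound exclusive so the cursor row isn't repeated.
        return r.zrevrangebyscore(key, f"({cursor_ts}", "-inf",
                                  start=0, num=limit, withscores=True)
    # Cache miss (or a celebrity list that is never cached): go to MySQL.
    return fetch_followers_from_mysql(user_id, cursor_ts, limit)
```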

API Design

  • POST /api/v1/users/{user_id}/follow — Follow a user; returns follow status (followed or pending for private accounts)
  • DELETE /api/v1/users/{user_id}/follow — Unfollow a user; idempotent
  • GET /api/v1/users/{user_id}/followers?cursor={timestamp}&limit=50 — Paginated follower list sorted by recency
  • GET /api/v1/users/{user_id}/following?cursor={timestamp}&limit=50 — Paginated following list sorted by recency
  • GET /api/v1/users/{user_id}/mutual-followers/{other_user_id}?limit=20 — Mutual followers between two users
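The cursor is simply the created_at of the last row returned. A client-side sketch of walking the follower list; the paths and query parameters come from the list above, while the base URL and response field names (followers, next_cursor) are assumptions for illustration:

```python
import requests  # assumed HTTP client

def iter_followers(user_id: int, base_url: str = "https://api.example.com"):
    """Walk GET /users/{id}/followers page by page via the timestamp cursor."""
    cursor = None
    while True:
        params = {"limit": 50}
        if cursor:
            params["cursor"] = cursor
        page = requests.get(
            f"{base_url}/api/v1/users/{user_id}/followers", params=params
        ).json()
        yield from page["followers"]
        cursor = page.get("next_cursor")  # created_at of the last row returned
        if not cursor:
            return
```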

Scaling & Bottlenecks

The 290K reads/sec graph query load is handled by the Redis cache layer, which absorbs 95% of reads. Cache misses (approximately 14.5K/sec) are served by MySQL read replicas. The cache hit rate is high because of temporal locality — users repeatedly check the same profiles. Cache warming for new users is done lazily on first access. The write path (116 writes/sec) is light but requires cross-shard consistency for the dual-write pattern. The Saga coordinator processes follows sequentially per user (using a per-user lock in Redis) to prevent race conditions like simultaneous follow/unfollow of the same user.
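A sketch of the per-user serialization lock, using the standard SET NX PX pattern with a token check on release (redis-py; the Lua release script is the usual safe-unlock idiom):

```python
import uuid
import redis

r = redis.Redis()

# Release only if we still own the lock, to avoid deleting a lock that
# expired and was re-acquired by another worker.
RELEASE = """
if redis.call('get', KEYS[1]) == ARGV[1] then
    return redis.call('del', KEYS[1])
end
return 0
"""

def with_user_lock(user_id: int, fn, ttl_ms: int = 5_000):
    """Serialize follow/unfollow for one user to avoid racing dual-writes."""
    key, token = f"lock:follow:{user_id}", str(uuid.uuid4())
    if not r.set(key, token, nx=True, px=ttl_ms):
        raise RuntimeError("operation in flight for this user; retry")
    try:
        return fn()
    finally:
        r.eval(RELEASE, 1, key, token)
```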

The biggest scalability challenge is the follow suggestions feature, which requires analyzing the global graph structure. Running Personalized PageRank or label propagation on a 200-billion-edge graph requires a dedicated graph processing cluster. Apache Giraph on a 500-node Hadoop cluster processes the full graph in approximately 4 hours. The output (top 100 follow suggestions per user) is written to a Redis-backed feature store consumed by the Recommendation API. This batch computation runs nightly during off-peak hours.
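The 2-hop idea in miniature: candidates are accounts followed by accounts you already follow, ranked by how many 2-hop paths reach them. This is a toy, single-machine version of what the batch job computes; the Giraph pipeline adds PageRank weighting and filtering on top:

```python
from collections import Counter

def two_hop_suggestions(user: int, following: dict[int, set[int]], k: int = 10):
    """Rank friend-of-friend candidates by number of 2-hop paths."""
    mine = following.get(user, set())
    counts = Counter()
    for friend in mine:
        for candidate in following.get(friend, set()):
            if candidate != user and candidate not in mine:
                counts[candidate] += 1  # one more path through a mutual follow
    return [uid for uid, _ in counts.most_common(k)]
```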

Key Trade-offs

  • Dual-write denormalization vs. single table with index: Dual tables (following + followers) double the write cost but make both query directions O(1) lookups by partition key — with a read/write ratio of 2500:1, optimizing reads is clearly the right trade-off
  • MySQL over a native graph database: MySQL with adjacency list tables handles the primary queries (list followers, list following, check follow status) efficiently; a native graph DB like Neo4j would be better for multi-hop traversals but adds operational complexity for a single-hop-dominant workload
  • Redis materialized counts vs. COUNT(*) queries: Materialized counters are O(1) but can drift from the source of truth — the hourly reconciliation job keeps drift below 0.1%, which is acceptable for display purposes
  • Strong consistency for follow state vs. eventual: Follow/unfollow must be strongly consistent (a user who unfollows should never see content from that account again) — this requires synchronous dual-writes with distributed transactions, adding latency but ensuring correctness
