
System Design: Social Graph

System design of a social graph service covering bidirectional friendship, follower/following relationships, graph traversal for degree connections, and mutual friends at billion-user scale.

16 min read · Updated Jan 15, 2025
system-design · social-graph · graph-database · relationships

Requirements

Functional Requirements:

  • Follow/unfollow relationships (directed graph, e.g., Twitter model)
  • Friendship (bidirectional) with accept/reject flow (e.g., Facebook model)
  • Query the followers of user X and the users X follows
  • Compute mutual friends/followers between two users
  • Determine 1st/2nd/3rd degree connection paths (shortest path)
  • 'People You May Know' (PYMK) suggestions

Non-Functional Requirements:

  • 1 billion users; average 500 follows per user → 500B directed edges
  • Follow/unfollow write latency under 100ms; read latency under 50ms
  • Graph traversal for 2nd-degree connections under 200ms
  • 99.99% availability; no edge should be lost once written

Scale Estimation

Back-of-envelope numbers:

  • Edges: 1B users × 500 avg follows = 500B directed edges
  • Storage: each edge is (follower_id u64, followee_id u64, created_at u32) = 20 bytes → 500B × 20 bytes = 10TB for the raw edge set (the dual-table layout described below doubles this, and replication factor 3 triples it again on disk)
  • Follow writes: assuming 100M follow/unfollow events/day ≈ 1,157 writes/sec
  • Follower list reads: 1B DAU × 5 follower lookups/day = 5B reads/day ≈ 57,870 reads/sec
  • Mutual friends: O(min(|N(u)|, |N(v)|)) set intersection — for two users with 10K followers each, a 10K-element intersection
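
As a quick sanity check, the same arithmetic as a minimal Python sketch (the inputs are the assumptions above, not measurements):

    users = 1_000_000_000
    avg_follows = 500
    edge_bytes = 8 + 8 + 4                     # follower_id u64 + followee_id u64 + created_at u32

    edges = users * avg_follows                # 500B directed edges
    print(edges * edge_bytes / 1e12, "TB")     # ~10.0 TB raw edge set

    follow_events_per_day = 100_000_000
    print(follow_events_per_day / 86_400)      # ~1,157 writes/sec

    reads_per_day = users * 5                  # 1B DAU × 5 follower lookups/day
    print(reads_per_day / 86_400)              # ~57,870 reads/sec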

High-Level Architecture

The Social Graph Service is a standalone microservice with its own sharded data store. The service exposes gRPC endpoints for CRUD operations on edges and graph traversal queries. Internally, it uses a tiered storage architecture: hot adjacency lists (active users' follower/following lists) are stored in Redis as sorted sets; cold lists are served from Cassandra. An in-memory graph store (custom, loaded from Cassandra snapshots) serves 2nd-degree traversal queries with sub-100ms latency.

Write path: the client calls Follow(follower_id, followee_id) → the Social Graph Service writes to the Cassandra primary (follows table) → an async Kafka event triggers updates to the Redis sorted sets for both follower_id and followee_id → CDN/cache invalidation for affected PYMK lists. Reads for follower lists hit Redis first (L1 cache) and fall back to Cassandra on a miss. The 2nd-degree traversal service loads the adjacency list of user X into a local BFS queue and expands one hop, checking each candidate against a bloom filter of X's existing 1st-degree connections so that already-connected users are filtered out and only true 2nd-degree connections remain.
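
A minimal sketch of that one-hop expansion, assuming a hypothetical get_following(user_id) accessor over the in-memory graph store; a plain Python set stands in for the bloom filter of 1st-degree connections:

    def second_degree(user_id, get_following, cap=10_000):
        """Return the 2nd-degree connections of user_id: neighbours of
        neighbours, minus the user and their 1st-degree set."""
        first_degree = set(get_following(user_id)[:cap])
        result = set()
        for neighbour in first_degree:
            for candidate in get_following(neighbour):
                # Production checks a bloom filter here; a false positive only
                # drops a valid candidate, it never surfaces a 1st-degree user.
                if candidate != user_id and candidate not in first_degree:
                    result.add(candidate)
        return result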

Core Components

Adjacency List Store (Cassandra)

The primary storage for the follow graph uses Cassandra with two tables for efficient bidirectional lookup. The follows table (partition key: follower_id, clustering key: followee_id, created_at) answers 'who does user X follow' as a single-partition scan. The followers table (partition key: followee_id, clustering key: follower_id, created_at) answers 'who follows user X'. Both tables are written on every follow/unfollow event via a dual-write. Cassandra's wide-column model keeps each user's list in one wide partition; accounts with millions of followers push partitions well past recommended sizes and need the bucketing described under Scaling & Bottlenecks.
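
A sketch of that dual-write with the DataStax Python driver; the contact point and the keyspace name social_graph are assumptions, and a logged batch keeps the two tables from drifting apart on a single follow event:

    import time
    from cassandra.cluster import Cluster
    from cassandra.query import BatchStatement

    cluster = Cluster(["cassandra-seed-1"])        # assumed contact point
    session = cluster.connect("social_graph")      # assumed keyspace name

    insert_follows = session.prepare(
        "INSERT INTO follows (follower_id, followee_id, created_at) VALUES (?, ?, ?)")
    insert_followers = session.prepare(
        "INSERT INTO followers (followee_id, follower_id, created_at) VALUES (?, ?, ?)")

    def follow(follower_id: int, followee_id: int) -> None:
        """Dual-write one edge so each lookup direction stays a single-partition read."""
        now = int(time.time())
        batch = BatchStatement()                   # logged batch by default
        batch.add(insert_follows, (follower_id, followee_id, now))
        batch.add(insert_followers, (followee_id, follower_id, now))
        session.execute(batch)

Unfollow mirrors this with two DELETE statements in the same batch shape.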

Redis Cache Layer

For the 10% of users with active feed requirements (online users, celebrities), adjacency lists are mirrored in Redis sorted sets: following:{user_id} → sorted set of followee_ids scored by follow_timestamp. ZADD for new follows, ZREM for unfollows. Redis ZRANGE with BYSCORE enables time-ordered traversal. The cache is populated lazily on first miss and expires after 24 hours of inactivity. For users with >1M followers, the full list is NOT cached in Redis (too large); only the most recent 10K followers are cached.
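
A sketch of the cache path with redis-py; the key layout follows the prose, while the host name and the load_from_cassandra fallback are assumptions:

    import redis

    r = redis.Redis(host="redis-graph-cache", port=6379)    # assumed host
    CACHE_TTL = 24 * 3600        # 24h expiry on inactivity
    HOT_CAP = 10_000             # cap cached entries for very large accounts

    def get_following(user_id: int, load_from_cassandra) -> list[int]:
        key = f"following:{user_id}"
        cached = r.zrevrangebyscore(key, "+inf", "-inf", start=0, num=100)
        if cached:
            r.expire(key, CACHE_TTL)                         # slide the expiry on activity
            return [int(x) for x in cached]

        # Lazy population on first miss: newest HOT_CAP follows, scored by follow timestamp
        rows = load_from_cassandra(user_id)[:HOT_CAP]        # [(followee_id, created_at), ...]
        if rows:
            r.zadd(key, {str(fid): ts for fid, ts in rows})
            r.expire(key, CACHE_TTL)
        return [fid for fid, _ in rows[:100]]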

PYMK (People You May Know) Engine

PYMK runs as a batch + incremental pipeline. The batch job (Spark on Hadoop, running nightly) computes for each user U the set of 2nd-degree connections (friends of friends) minus existing 1st-degree connections, scored by common connection count. The top-50 PYMK suggestions per user are stored in a PYMK table in Cassandra and cached in Redis. The incremental pipeline processes follow events in real time via Kafka: when user A follows B, A appears as a new suggestion in the PYMK lists of B's other followers, since they now share the common connection B.
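
A sketch of the incremental path, assuming a Kafka consumer hands follow events to this handler and that get_followers and add_pymk_candidate are hypothetical accessors over the stores described above:

    def on_follow_event(event, get_followers, add_pymk_candidate, fanout_cap=1_000):
        """When A follows B, A becomes a PYMK candidate for B's other followers,
        since they now share the common connection B."""
        a, b = event["follower_id"], event["followee_id"]
        for other in get_followers(b)[:fanout_cap]:   # cap fan-out for celebrity followees
            if other != a:
                add_pymk_candidate(user_id=other, candidate_id=a, via=b)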

Database Design

Cassandra schema for directed follow graph:
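
The schema listing itself appears to have been dropped here; a sketch consistent with the table layouts described above, issued through the DataStax Python driver (keyspace name and replication settings are assumptions):

    from cassandra.cluster import Cluster

    session = Cluster(["cassandra-seed-1"]).connect()        # assumed contact point

    session.execute("""
        CREATE KEYSPACE IF NOT EXISTS social_graph
        WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3}""")

    # 'Who does user X follow' -- one partition per follower
    session.execute("""
        CREATE TABLE IF NOT EXISTS social_graph.follows (
            follower_id  bigint,
            followee_id  bigint,
            created_at   int,
            PRIMARY KEY ((follower_id), followee_id, created_at))""")

    # 'Who follows user X' -- one partition per followee
    session.execute("""
        CREATE TABLE IF NOT EXISTS social_graph.followers (
            followee_id  bigint,
            follower_id  bigint,
            created_at   int,
            PRIMARY KEY ((followee_id), follower_id, created_at))""")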

Both tables are sharded by the partition key. Cassandra's consistent hashing distributes partitions across nodes; replication factor 3 with LOCAL_QUORUM writes ensures durability. Tombstones from unfollow deletes are compacted periodically (SizeTieredCompactionStrategy). A TTL is NOT set on follow records — they are deleted explicitly on unfollow.

For friendship (bidirectional) systems, an additional friend_requests table stores (from_user_id, to_user_id, status ENUM[pending, accepted, rejected], created_at). On accept, two follows rows — one per direction — are written together in a logged Cassandra BATCH, which guarantees both mutations are eventually applied (atomicity across the two partitions, though not isolation); this is distinct from Cassandra lightweight transactions, which are conditional writes and are better suited to guarding the pending → accepted status transition itself. The friendship graph uses the same Cassandra infrastructure with a separate keyspace.
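
A sketch of the accept flow under that distinction: a lightweight transaction guards the pending → accepted transition on friend_requests (assumed here to be keyed by from_user_id, to_user_id), and a logged batch then writes the directed edges; keyspace and contact point are assumptions:

    import time
    from cassandra.cluster import Cluster
    from cassandra.query import BatchStatement

    session = Cluster(["cassandra-seed-1"]).connect("social_graph")   # assumed keyspace

    accept_request = session.prepare(
        "UPDATE friend_requests SET status = 'accepted' "
        "WHERE from_user_id = ? AND to_user_id = ? IF status = 'pending'")
    insert_follows = session.prepare(
        "INSERT INTO follows (follower_id, followee_id, created_at) VALUES (?, ?, ?)")

    def accept_friend_request(from_user: int, to_user: int) -> bool:
        # Lightweight transaction: only one concurrent accept wins the compare-and-set.
        if not session.execute(accept_request, (from_user, to_user)).was_applied:
            return False
        now = int(time.time())
        batch = BatchStatement()   # logged batch: both directed edges eventually applied
        batch.add(insert_follows, (from_user, to_user, now))
        batch.add(insert_follows, (to_user, from_user, now))
        session.execute(batch)     # the mirrored followers-table rows are written the same way
        return True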

API Design

  • POST /v1/users/{user_id}/follows/{target_id} — Follow a user; idempotent; returns follow relationship object
  • DELETE /v1/users/{user_id}/follows/{target_id} — Unfollow a user
  • GET /v1/users/{user_id}/followers?limit=100&cursor={follower_id} — Paginated follower list
  • GET /v1/users/{user_id}/mutual_follows/{other_user_id}?limit=20 — Mutual connections between two users
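
A sketch of a client paging through the followers endpoint above; the host and the followers/next_cursor response fields are assumptions about the response shape:

    import requests

    BASE = "https://api.example.com"                  # assumed host

    def iter_followers(user_id: int, page_size: int = 100):
        """Yield follower IDs page by page using the cursor from each response."""
        cursor = None
        while True:
            params = {"limit": page_size}
            if cursor is not None:
                params["cursor"] = cursor
            resp = requests.get(f"{BASE}/v1/users/{user_id}/followers", params=params)
            resp.raise_for_status()
            body = resp.json()
            yield from body["followers"]
            cursor = body.get("next_cursor")
            if cursor is None:
                break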

Scaling & Bottlenecks

Celebrity accounts with 100M+ followers create hot-partition problems in Cassandra. The followers partition for a celebrity is a single logical partition that can grow to gigabytes, far beyond Cassandra's recommended partition size; Cassandra does not split partitions automatically, so the standard mitigation is application-level partition bucketing (append a bucket ID 0-99, e.g. derived from follower_id, to the partition key). Read amplification is handled by parallel reads across all buckets with a scatter-gather merge.
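
A sketch of that bucketing approach, assuming the followers table gains a bucket column in its partition key and read_bucket(followee_id, bucket) is a hypothetical single-partition read:

    from concurrent.futures import ThreadPoolExecutor

    NUM_BUCKETS = 100

    def bucket_for(follower_id: int) -> int:
        # Deterministic bucket so the same edge always lands in the same partition
        return follower_id % NUM_BUCKETS

    def read_all_followers(followee_id: int, read_bucket) -> list[int]:
        """Scatter-gather: read every (followee_id, bucket) partition in parallel
        and merge. read_bucket(followee_id, bucket) is an assumed single-partition read."""
        with ThreadPoolExecutor(max_workers=16) as pool:
            parts = list(pool.map(lambda b: read_bucket(followee_id, b), range(NUM_BUCKETS)))
        return [follower for part in parts for follower in part]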

The 2nd-degree traversal for users with 10K+ followers spawns large BFS queues. The Traversal Service caps the expansion at 10K 1st-degree neighbors and uses approximate scoring computed in parallel for PYMK rather than an exact set intersection per candidate. For the exact mutual-friends query, Redis ZINTERSTORE between two cached follower sorted sets (SINTERSTORE applies only to plain sets) performs the intersection server-side; the cost is dominated by the smaller list — the O(min(|N(u)|, |N(v)|)) bound from the estimation section — and completes in well under a millisecond for typical users.
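
A sketch of the mutual-connections query against the sorted-set cache with redis-py; the key layout follows the cache section, and the transient destination key is an implementation assumption:

    import redis

    r = redis.Redis(host="redis-graph-cache", port=6379)   # assumed host

    def mutual_follows(user_a: int, user_b: int, limit: int = 20) -> list[int]:
        """Intersect the two cached lists server-side; cost is driven by the smaller list."""
        dest = f"mutual:{user_a}:{user_b}"
        r.zinterstore(dest, [f"following:{user_a}", f"following:{user_b}"])
        members = r.zrange(dest, 0, limit - 1)
        r.delete(dest)                                      # transient result key
        return [int(m) for m in members]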

Key Trade-offs

  • Dual-write (follows + followers tables) vs single table with secondary index: Dual-write doubles storage but gives O(1) partition scans for both directions; Cassandra secondary indexes are less efficient and not recommended for high-cardinality queries
  • Cassandra over a native graph DB (Neo4j): Cassandra scales horizontally to 500B edges with predictable latency; Neo4j and other native graph DBs excel at complex traversals but struggle to horizontally shard at this edge count
  • Redis sorted set cache over full Cassandra reads: Cassandra partition scans for active users (many reads/sec) add up; Redis sorted sets serve follower lists in microseconds for hot users — the memory cost is justified by latency improvement
  • Batch PYMK vs real-time: PYMK suggestions can be 24 hours stale with no user-noticeable impact; nightly batch is 100x cheaper than maintaining a real-time graph traversal service
