
System Design: Gaming Achievement System

Design a scalable gaming achievement system that tracks player actions in real time, evaluates progress toward complex multi-condition achievements, and delivers unlock notifications with minimal latency.

Updated Jan 15, 2025

Requirements

Functional Requirements:

  • Define achievements with complex unlock conditions: single events ("win 1 match"), counters ("win 100 matches"), streaks ("win 5 matches in a row"), and composite conditions ("win 10 ranked matches as character X in region Y")
  • Track player progress toward each achievement in real time during gameplay
  • Notify players instantly on achievement unlock via in-game notification and push
  • Achievements are organized into categories, have point values (Gamerscore-style), and can be secret (hidden until unlocked)
  • Retroactive evaluation: when a new achievement is added, evaluate it against existing player history
  • Anti-cheat: achievement progress is server-authoritative; clients cannot directly trigger unlocks

Non-Functional Requirements:

  • Process 1 million game events per second at peak
  • Achievement unlock latency: player notified within 500ms of the qualifying event
  • Support 500 million players each with progress tracked for up to 10,000 achievements
  • Achievement definitions can be updated without a full service restart
  • Progress state must be recoverable after service crash with zero data loss

Scale Estimation

500M players × 10,000 achievements = 5 trillion progress records. At 100 bytes each, that's 500 TB — too large for a relational DB without aggressive partitioning. In practice, only ~2% of achievement-player pairs have any progress (most achievements are not yet started by most players), so sparse storage reduces this to ~10 TB. At 1M events/second, each event must be matched against the player's in-progress achievement set (typically 50-200 achievements per player). Evaluating 200 achievements per event at 1M events/second = 200M evaluation operations/second — must be done in-memory, not with DB queries.
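
The arithmetic above can be re-derived in a few lines. This sketch uses only the figures stated in this section (taking 200 as the upper bound of the in-progress set) and decimal units, so 1 TB = 10^12 bytes.

```python
# Back-of-the-envelope check of the scale estimates (decimal units: 1 TB = 10**12 bytes).
PLAYERS = 500_000_000
ACHIEVEMENTS = 10_000
BYTES_PER_RECORD = 100
SPARSE_FRACTION = 0.02          # ~2% of player/achievement pairs have any progress
EVENTS_PER_SEC = 1_000_000
IN_PROGRESS_PER_PLAYER = 200    # upper end of the 50-200 range

dense_records = PLAYERS * ACHIEVEMENTS
dense_tb = dense_records * BYTES_PER_RECORD / 10**12
sparse_tb = dense_tb * SPARSE_FRACTION
naive_evals_per_sec = EVENTS_PER_SEC * IN_PROGRESS_PER_PLAYER

print(f"dense records:  {dense_records:.1e}")        # 5.0e+12
print(f"dense storage:  {dense_tb:,.0f} TB")          # 500 TB
print(f"sparse storage: {sparse_tb:,.0f} TB")         # 10 TB
print(f"naive evals/s:  {naive_evals_per_sec:.1e}")   # 2.0e+08
```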

High-Level Architecture

The achievement system is event-driven. Game servers publish structured game events (KilledEnemy, WonMatch, CompletedQuest, etc.) to a Kafka topic. Achievement evaluation workers consume from Kafka, load the player's in-memory achievement state from a state store (Redis), evaluate all applicable achievement rules against the new event, update progress, and check unlock conditions. If an achievement unlocks, a notification event is published to a separate topic consumed by the notification service.
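
A minimal, single-process sketch of that consume, evaluate, publish loop follows. Python lists and dicts stand in for the game-events topic, the Redis state store, and the achievement-unlocks topic. GameEvent, RULES, and process_event are hypothetical names, and the loop scans every rule for brevity rather than using the event-type index described under Core Components. A production worker would use a real Kafka client and commit offsets after processing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GameEvent:
    player_id: str
    event_type: str          # e.g. "WonMatch", "KilledEnemy"


# Hypothetical rule set: achievement_id -> (event type it counts, target count).
RULES = {"first_win": ("WonMatch", 1), "ten_wins": ("WonMatch", 10)}


def process_event(event, state_store, unlock_topic):
    """Evaluate one event against the player's state and publish any unlocks."""
    state = state_store.setdefault(event.player_id, {})    # stand-in for a Redis load
    for achievement_id, (event_type, target) in RULES.items():
        progress = state.setdefault(achievement_id, {"count": 0, "unlocked": False})
        if progress["unlocked"] or event.event_type != event_type:
            continue
        progress["count"] += 1
        if progress["count"] >= target:
            progress["unlocked"] = True
            unlock_topic.append((event.player_id, achievement_id))  # stand-in for a produce


game_events = [GameEvent("p1", "WonMatch") for _ in range(10)] + [GameEvent("p1", "KilledEnemy")]
state_store, achievement_unlocks = {}, []
for ev in game_events:                                    # stand-in for the consume loop
    process_event(ev, state_store, achievement_unlocks)

print(achievement_unlocks)   # [('p1', 'first_win'), ('p1', 'ten_wins')]
```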

Achievement definitions are stored in a PostgreSQL table and cached in the evaluation workers' local memory. The cache is versioned: each worker checks the current definition version at most every 60 seconds and reloads its definitions when its cached copy is stale. This allows hot updates to achievement definitions (adding new achievements, fixing conditions) without restarting workers.
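
A sketch of a versioned, worker-local definition cache, assuming two hypothetical callables: one that returns the current definition version (for example the highest version value in the achievements table) and one that loads all definitions. The injectable clock exists only so the 60-second check window can be demonstrated without sleeping.

```python
import time


class DefinitionCache:
    """Worker-local achievement definitions, reloaded when the stored version changes."""

    def __init__(self, fetch_version, fetch_definitions, check_interval_s=60.0,
                 clock=time.monotonic):
        self._fetch_version = fetch_version          # hypothetical: e.g. a max(version) query
        self._fetch_definitions = fetch_definitions  # hypothetical: loads all definitions
        self._interval = check_interval_s
        self._clock = clock
        self._version = None
        self._definitions = {}
        self._last_check = float("-inf")

    def get(self):
        """Return cached definitions, checking the version at most once per interval."""
        now = self._clock()
        if now - self._last_check >= self._interval:
            self._last_check = now
            latest = self._fetch_version()
            if latest != self._version:              # stale: swap in a fresh copy
                self._definitions = self._fetch_definitions()
                self._version = latest
        return self._definitions


# Usage with an in-memory stand-in for the PostgreSQL table and a fake clock.
db = {"version": 1, "defs": {"first_win": "WonMatch x1"}}
fake_now = [0.0]
cache = DefinitionCache(lambda: db["version"], lambda: dict(db["defs"]),
                        clock=lambda: fake_now[0])
print(sorted(cache.get()))        # ['first_win']
db["version"] = 2
db["defs"]["ten_wins"] = "WonMatch x10"
print(sorted(cache.get()))        # ['first_win'] (still inside the 60 s window)
fake_now[0] = 61.0
print(sorted(cache.get()))        # ['first_win', 'ten_wins']
```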

Progress state is stored in Redis (hot) and DynamoDB (durable, sparse). Redis holds the active, high-churn progress of players who are currently playing; DynamoDB holds the canonical progress of all players at rest. When a game session starts, the player's progress is loaded from DynamoDB into Redis with a 4-hour TTL. Dirty progress is flushed back to DynamoDB every 30 seconds and again at session end. This is a write-behind (write-back) cache rather than write-through: progress updates touch only Redis, so DynamoDB stays off the per-event path. On its own, it can lose up to one flush interval of progress if Redis fails before a flush, so the zero-data-loss requirement also depends on Redis persistence or on replaying unflushed events from Kafka.
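
The flow described above, where progress updates go to Redis and reach DynamoDB only at flush time, is what is usually called write-behind caching. In this sketch plain dicts stand in for Redis (hot) and DynamoDB (durable), and the WriteBehindProgress class is hypothetical; the point it shows is that the durable copy lags until the next flush, and that repeated updates to one achievement coalesce into a single durable write.

```python
class WriteBehindProgress:
    """Write-behind sketch: updates land in the hot store, reach the durable store on flush.

    Plain dicts stand in for Redis (hot) and DynamoDB (durable); keys are
    (player_id, achievement_id) and values are integer progress counts.
    """

    def __init__(self, durable):
        self.durable = durable
        self.hot = {}
        self.dirty = set()

    def start_session(self, player_id):
        # A real implementation would Query DynamoDB by partition key instead of scanning.
        for (pid, aid), value in self.durable.items():
            if pid == player_id:
                self.hot[(pid, aid)] = value

    def update(self, player_id, achievement_id, delta):
        key = (player_id, achievement_id)
        self.hot[key] = self.hot.get(key, 0) + delta
        self.dirty.add(key)            # the durable copy is now behind the hot copy

    def flush(self):
        """Copy every dirty entry to the durable store; return how many were written."""
        for key in self.dirty:
            self.durable[key] = self.hot[key]
        written = len(self.dirty)
        self.dirty.clear()
        return written


dynamo = {("p1", "ten_wins"): 7}
store = WriteBehindProgress(dynamo)
store.start_session("p1")
store.update("p1", "ten_wins", 1)
store.update("p1", "ten_wins", 1)
print(dynamo[("p1", "ten_wins")])   # 7 (not flushed yet)
print(store.flush())                # 1 (two updates coalesced into one write)
print(dynamo[("p1", "ten_wins")])   # 9
```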

Core Components

Achievement Rule Engine

The rule engine evaluates achievement conditions against incoming game events. Each achievement has an array of conditions. Condition types: event_counter (count occurrences of event type matching filter criteria), event_streak (N consecutive occurrences without interruption), event_composite (all sub-conditions must be satisfied), and event_first (first occurrence of event type). The engine uses a finite state machine per achievement per player — each condition is a state transition. On event arrival, the engine checks which achievements have conditions matching the event type (pre-indexed by event_type → achievement_ids), evaluates only those achievements (not all 10,000), and transitions their state machines. This targeted evaluation reduces the average evaluation cost from O(all achievements) to O(relevant achievements per event type), typically 5-20 achievements per event.
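
The sketch below shows the event_type index and the four condition types under simplifying assumptions. Rather than an explicit state machine, each condition's state is reduced to a counter: counter and first-occurrence conditions only advance, a streak falls back to zero on its interrupting event, and a composite achievement is modelled as a list of conditions that must all be met. The definition format (including the "reset_on" and "filter" fields) and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical definition format: an achievement unlocks when ALL of its conditions are
# satisfied, which is how event_composite is modelled here.
ACHIEVEMENTS = {
    "first_blood": [{"type": "event_first", "event": "KilledEnemy"}],
    "win_100": [{"type": "event_counter", "event": "WonMatch", "target": 100}],
    "streak_5": [{"type": "event_streak", "event": "WonMatch",
                  "reset_on": "LostMatch", "target": 5}],
    "ranked_x_y": [{"type": "event_counter", "event": "WonMatch", "target": 10,
                    "filter": {"mode": "ranked", "character": "X", "region": "Y"}}],
}


def build_index(achievements):
    """Map event_type -> ids of achievements that have a condition reacting to it."""
    index = defaultdict(set)
    for aid, conditions in achievements.items():
        for cond in conditions:
            index[cond["event"]].add(aid)
            if "reset_on" in cond:
                index[cond["reset_on"]].add(aid)
    return index


def matches(cond, event):
    return all(event.get(k) == v for k, v in cond.get("filter", {}).items())


def apply_event(progress, unlocked, event, achievements, index):
    """Advance only the achievements indexed under this event type; return new unlocks."""
    newly_unlocked = []
    for aid in sorted(index.get(event["type"], ())):
        if aid in unlocked:
            continue
        counts = progress.setdefault(aid, [0] * len(achievements[aid]))
        for i, cond in enumerate(achievements[aid]):
            if cond.get("reset_on") == event["type"]:
                counts[i] = 0                              # streak interrupted
            elif cond["event"] == event["type"] and matches(cond, event):
                counts[i] += 1
        targets = [cond.get("target", 1) for cond in achievements[aid]]  # event_first: 1
        if all(c >= t for c, t in zip(counts, targets)):
            unlocked.add(aid)
            newly_unlocked.append(aid)
    return newly_unlocked


index = build_index(ACHIEVEMENTS)
progress, unlocked = {}, set()
events = ([{"type": "KilledEnemy"}]
          + [{"type": "WonMatch", "mode": "casual"}] * 4
          + [{"type": "LostMatch"}]
          + [{"type": "WonMatch", "mode": "ranked", "character": "X", "region": "Y"}] * 5)
for ev in events:
    for aid in apply_event(progress, unlocked, ev, ACHIEVEMENTS, index):
        print("unlocked", aid)          # prints first_blood, then streak_5
print(progress["win_100"], progress["ranked_x_y"])   # [9] [5]
```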

Progress State Manager

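Progress state for a player is a JSON document: {achievement_id: {conditions: [{type, current_value, target_value, last_event_ts}], unlocked_at: null}}. The state manager provides load_player_state(player_id) (reads from Redis, falling back to DynamoDB), update_progress(player_id, achievement_id, condition_index, delta), and mark_unlocked(player_id, achievement_id) (sets the unlocked_at timestamp and publishes an unlock event). update_progress can be a single atomic Redis HINCRBY only if each condition's counter lives in its own integer hash field (for example {achievement_id}:{condition_index}); HINCRBY cannot increment a number nested inside a JSON value, so keeping one JSON value per achievement would instead require a Lua script or a read-modify-write. The state manager also maintains a dirty set per player: the achievements modified since the last flush. The flush job runs every 30 seconds and batches DynamoDB writes for all dirty achievements across all active players using BatchWriteItem, which accepts at most 25 put or delete requests per call and may return unprocessed items that must be retried.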
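
A sketch of the batched flush, assuming dirty maps each player_id to its dirty achievement_ids and hot_state holds the matching progress documents. Requests use DynamoDB's low-level PutRequest item format with typed attribute values; send_batch is a hypothetical wrapper around the BatchWriteItem call that returns the response's UnprocessedItems mapping, which is empty when everything was written.

```python
import json

MAX_BATCH = 25  # BatchWriteItem accepts at most 25 put/delete requests per call


def build_put_requests(dirty, hot_state):
    """Yield one low-level PutRequest per dirty (player_id, achievement_id) pair."""
    for player_id in sorted(dirty):
        for achievement_id in sorted(dirty[player_id]):
            progress = hot_state[player_id][achievement_id]
            yield {"PutRequest": {"Item": {
                "player_id": {"S": player_id},
                "achievement_id": {"S": achievement_id},
                "progress_json": {"S": json.dumps(progress)},
            }}}


def flush(dirty, hot_state, send_batch, table="player_achievement_progress"):
    """Write all dirty progress in chunks of 25, re-sending unprocessed items.

    send_batch is hypothetical: it takes a RequestItems mapping and returns the
    UnprocessedItems mapping (empty when done). Returns the number of calls made.
    """
    requests = list(build_put_requests(dirty, hot_state))
    calls = 0
    for start in range(0, len(requests), MAX_BATCH):
        pending = {table: requests[start:start + MAX_BATCH]}
        while pending:
            calls += 1
            pending = send_batch(pending)   # production code should back off before retrying
    dirty.clear()
    return calls


hot = {f"p{i}": {"win_100": {"count": i}} for i in range(60)}
dirty = {player_id: {"win_100"} for player_id in hot}
sent = []


def fake_send(request_items):
    """Stand-in for the DynamoDB call: accepts everything."""
    sent.extend(request_items["player_achievement_progress"])
    return {}


print(flush(dirty, hot, fake_send), len(sent))   # 3 60
```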

Retroactive Evaluation Service

When a new achievement is added (e.g., "Win 50 matches with a new character"), the platform must evaluate it against all 500M players' historical data. This is a batch Spark job that reads the event history from the data warehouse (S3 + Parquet), applies the new achievement's conditions to each player's event history, and writes the resulting progress records to DynamoDB. For a typical "win N matches" achievement, the Spark job processes 500M players in ~4 hours on a 200-node cluster. Players who immediately unlock the retroactive achievement are notified in the next session (retroactive notifications are not sent in real time to avoid confusing mid-session players).
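
The per-player logic of that batch job can be sketched in plain Python; in the Spark job the same replay would run once per player after grouping the event history by player_id. The row fields and the retro_progress function are hypothetical, and only a counter condition with an optional attribute filter is handled. Players with a non-null unlocked_at are the ones to notify at their next session.

```python
from itertools import groupby
from operator import itemgetter


def retro_progress(history_rows, event_type, target, filt=None):
    """Replay historical events per player for one new counter achievement.

    history_rows: dicts with player_id, ts, type and attribute fields (a stand-in for
    rows read from the Parquet event history). Only players with non-zero progress are
    returned, matching the sparse DynamoDB layout.
    """
    filt = filt or {}
    rows = sorted(history_rows, key=itemgetter("player_id", "ts"))
    results = {}
    for player_id, events in groupby(rows, key=itemgetter("player_id")):
        count, unlocked_at = 0, None
        for ev in events:
            if ev["type"] == event_type and all(ev.get(k) == v for k, v in filt.items()):
                count += 1
                if unlocked_at is None and count >= target:
                    unlocked_at = ev["ts"]          # historical unlock time
        if count:
            results[player_id] = {"count": min(count, target), "unlocked_at": unlocked_at}
    return results


history = [
    {"player_id": "a", "ts": 3, "type": "WonMatch", "character": "Z"},
    {"player_id": "a", "ts": 1, "type": "WonMatch", "character": "Z"},
    {"player_id": "b", "ts": 2, "type": "WonMatch", "character": "Q"},
    {"player_id": "c", "ts": 5, "type": "WonMatch", "character": "Z"},
]
print(retro_progress(history, "WonMatch", target=2, filt={"character": "Z"}))
# {'a': {'count': 2, 'unlocked_at': 3}, 'c': {'count': 1, 'unlocked_at': None}}
```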

Database Design

  • PostgreSQL: achievements (achievement_id, title, description, category, points, secret, conditions_json, version, created_at, updated_at) and achievement_categories (category_id, name, display_order).
  • DynamoDB: player_achievement_progress (PK: player_id, SK: achievement_id; attributes: progress_json, unlocked_at, last_updated_at), stored sparsely with rows only for non-zero progress; player_achievement_unlocks (PK: player_id, SK: unlocked_at#achievement_id; attributes: achievement_id, unlocked_at) for chronological unlock history, plus a GSI with achievement_id as partition key and unlocked_at as sort key for first-to-unlock queries.
  • Redis: achv:player:{player_id} (hash of achievement_id → progress JSON, TTL 4h) and achv:dirty:{player_id} (set of dirty achievement_ids awaiting flush).
  • Kafka topics: game-events (input) and achievement-unlocks (output for the notification service).
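
The player_achievement_unlocks sort key yields chronological order only if its timestamp part sorts lexicographically. One way to guarantee that, assumed here, is a fixed-width UTC ISO 8601 string. The key-builder helpers below are hypothetical and simply follow the key layout above.

```python
from datetime import datetime, timezone


def unlock_sort_key(unlocked_at: datetime, achievement_id: str) -> str:
    """SK for player_achievement_unlocks: '<fixed-width UTC timestamp>#<achievement_id>'.

    Fixed-width UTC timestamps sort lexicographically in chronological order, so a
    descending Query on this key returns the most recent unlocks first.
    """
    ts = unlocked_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")
    return f"{ts}#{achievement_id}"


def progress_redis_key(player_id: str) -> str:
    return f"achv:player:{player_id}"


def dirty_redis_key(player_id: str) -> str:
    return f"achv:dirty:{player_id}"


keys = [
    unlock_sort_key(datetime(2025, 1, 15, 9, 5, tzinfo=timezone.utc), "win_100"),
    unlock_sort_key(datetime(2024, 12, 31, 23, 59, tzinfo=timezone.utc), "first_blood"),
    unlock_sort_key(datetime(2025, 1, 2, tzinfo=timezone.utc), "streak_5"),
]
print(sorted(keys, reverse=True)[0])   # 2025-01-15T09:05:00.000000Z#win_100
print(progress_redis_key("p1"))        # achv:player:p1
```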

API Design

  • GET /players/{player_id}/achievements — returns all achievements with player's progress; unlocked achievements show unlock date; secret achievements show as "??" until unlocked; served from DynamoDB with Redis cache overlay
  • GET /achievements/{achievement_id} — returns achievement details; hides conditions if secret and not yet unlocked by requesting player
  • GET /players/{player_id}/achievements/recent — returns the last 10 unlocked achievements in reverse chronological order, via a descending Query on player_achievement_unlocks (ScanIndexForward=false, Limit=10)
  • POST /admin/achievements — body: {conditions_json, title, points, ...}, creates achievement, triggers retroactive evaluation job
  • GET /achievements/leaderboard/{achievement_id} — returns the first 100 players to unlock a given achievement (Hall of Fame), via a DynamoDB GSI on player_achievement_unlocks with achievement_id as partition key and unlocked_at as sort key
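
A sketch of how one response entry might be shaped so that secret achievements stay hidden until unlocked. The Achievement dataclass and the response field names are hypothetical. This version also withholds progress for hidden achievements, on the assumption that progress values could reveal the unlock conditions, and it leaves points visible.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Achievement:
    achievement_id: str
    title: str
    description: str
    points: int
    secret: bool


def render_for_player(ach: Achievement, unlocked_at: Optional[str],
                      progress: Optional[dict]) -> dict:
    """One entry of a hypothetical GET /players/{player_id}/achievements response."""
    hidden = ach.secret and unlocked_at is None
    return {
        "achievement_id": ach.achievement_id,
        "title": "??" if hidden else ach.title,
        "description": "??" if hidden else ach.description,
        "points": ach.points,                          # left visible (an assumption)
        "unlocked_at": unlocked_at,
        "progress": None if hidden else progress,      # progress could hint at conditions
    }


egg = Achievement("a42", "Easter Egg", "Find the hidden room", 50, secret=True)
print(render_for_player(egg, None, {"count": 0})["title"])                 # ??
print(render_for_player(egg, "2025-01-15T09:05:00Z", None)["title"])       # Easter Egg
```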

Scaling & Bottlenecks

Evaluation worker throughput: 1M events/second across 200 Kafka partitions = 5k events/second per partition. Each evaluation worker processes one partition sequentially. With 20 achievements to check per event at 5k events/second per worker, a single worker needs to run ~100k condition evaluations/second, which in-process evaluation (no I/O) can handle on a modern CPU core. Redis reads to load player state are the I/O bottleneck: ~5k reads/second per worker if state is fetched for every event. Keying game-events by player_id sends all of a player's events to the same partition, and therefore to the same worker, so the worker can load the player's state once at session start and keep it in local memory for the session, eliminating per-event Redis reads. When a consumer-group rebalance reassigns a partition, the new owner must reload the affected players' state from Redis before continuing.
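
The per-worker arithmetic and the player-to-partition affinity can be sketched as below. partition_for is a hypothetical helper that uses SHA-256 purely as a deterministic stand-in; the Java Kafka client's default partitioner applies murmur2 to the record key, so the partition numbers here will not match Kafka's. The property that matters is the same: one key always maps to one partition.

```python
import hashlib
from collections import Counter

TOTAL_EVENTS_PER_SEC = 1_000_000
PARTITIONS = 200
CHECKS_PER_EVENT = 20

events_per_partition = TOTAL_EVENTS_PER_SEC // PARTITIONS   # one worker per partition
checks_per_worker = events_per_partition * CHECKS_PER_EVENT


def partition_for(player_id: str, partitions: int = PARTITIONS) -> int:
    """Deterministically map a player_id (used as the record key) to a partition.

    SHA-256 is a stand-in here; it is not the hash Kafka clients use.
    """
    digest = hashlib.sha256(player_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partitions


print(events_per_partition, checks_per_worker)   # 5000 100000
load = Counter(partition_for(f"player-{i}") for i in range(100_000))
print(len(load), min(load.values()), max(load.values()))   # spread across partitions
```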

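DynamoDB flush at 30-second intervals: with 5M active players each having 5 dirty achievements on average, each flush cycle writes 25M DynamoDB items, about 833k writes/second averaged over the 30-second interval. A standard write consumes one WCU per 1 KB of item size, rounded up, so ~100-byte progress items (the size used in the scale estimate) need roughly 833k WCU in provisioned mode and about 83 MB/second of write throughput; at 4 KB per item the figures rise to ~3.3M WCU and over 3 GB/second. On-demand mode removes capacity planning, but a table instantly accommodates only up to double its previous peak traffic, so load at this level must be reached gradually. Either way this is expensive, and BatchWriteItem does not change the bill: it reduces request overhead, not WCU consumption. The main lever is flushing less often. A 5-minute interval coalesces repeated updates to the same achievement into a single write, at the cost of a larger unflushed window. That window conflicts with the zero-data-loss requirement unless Kafka consumer offsets are committed only after the progress they produced has been flushed, so a crashed worker can rebuild the window by reprocessing those events against the last flushed state.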
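
The flush cost can be checked with DynamoDB's rule that a standard write consumes one WCU per 1 KB (rounded up) of item size. The hypothetical flush_cost helper below averages one flush cycle over its 30-second interval and compares ~100-byte items with 4 KB items; real traffic would arrive in bursts at flush time rather than evenly, so these are lower bounds on the peak rate.

```python
import math


def flush_cost(active_players, dirty_per_player, item_bytes, interval_s):
    """Average write load of one flush cycle spread over its interval (standard writes)."""
    items = active_players * dirty_per_player
    writes_per_s = items / interval_s
    wcu_per_item = math.ceil(item_bytes / 1024)      # 1 WCU per started 1 KB of item size
    return {
        "items_per_flush": items,
        "writes_per_s": writes_per_s,
        "wcu": writes_per_s * wcu_per_item,
        "mb_per_s": writes_per_s * item_bytes / 1e6,
    }


for size in (100, 4096):
    c = flush_cost(5_000_000, 5, size, 30)
    print(f"{size} B items: {c['wcu']:,.0f} WCU, {c['mb_per_s']:,.0f} MB/s")
# 100 B items: 833,333 WCU, 83 MB/s
# 4096 B items: 3,333,333 WCU, 3,413 MB/s
```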

Key Trade-offs

  • Server-authoritative vs. client-side progress: Server authority prevents cheating but requires all game events to flow through the server-side pipeline; client-side progress tracking (common in older games) is faster but trivially exploitable by memory editing.
  • Real-time evaluation vs. batch: Real-time evaluation (Kafka + Redis) provides instant notifications but requires complex state management; nightly batch evaluation is operationally simple but notifications are delayed up to 24 hours — acceptable for casual games, unacceptable for competitive.
  • Sparse DynamoDB vs. dense relational: Sparse DynamoDB storage handles the 500M × 10k achievement matrix efficiently (only rows with progress stored), but makes aggregate queries ("how many players have completed achievement X") expensive — requiring a secondary counter updated on each unlock.
  • Secret achievement visibility: Fully hiding secret achievement existence prevents spoilers but breaks achievement hunter tracking tools; showing secret achievements exist (but hiding conditions) balances discovery and community engagement.
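
For the secondary completion counter mentioned in the sparse-storage trade-off, the main pitfall is double counting when an unlock event is redelivered. This in-memory sketch (a hypothetical UnlockRecorder class) increments the counter only on a player's first unlock of an achievement. In DynamoDB, one option would be a conditional write on the unlock (for example with attribute_not_exists(unlocked_at)) combined with the counter update in a TransactWriteItems call, so the two cannot diverge.

```python
from collections import Counter


class UnlockRecorder:
    """Per-achievement completion counter kept alongside sparse per-player unlocks.

    The check-then-set mirrors a conditional write: a redelivered unlock event
    must not increment the counter twice.
    """

    def __init__(self):
        self.unlocked_at = {}            # (player_id, achievement_id) -> timestamp
        self.completions = Counter()     # achievement_id -> players who unlocked it

    def record(self, player_id, achievement_id, ts):
        key = (player_id, achievement_id)
        if key in self.unlocked_at:      # already unlocked: idempotent no-op
            return False
        self.unlocked_at[key] = ts
        self.completions[achievement_id] += 1
        return True


rec = UnlockRecorder()
rec.record("p1", "win_100", 10)
rec.record("p1", "win_100", 10)          # duplicate delivery of the same unlock
rec.record("p2", "win_100", 12)
print(rec.completions["win_100"])        # 2
```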
