System Design: Notification Service

System design of a multi-channel notification service covering in-app, email, SMS, and push delivery with user preference management, template rendering, and delivery guarantees.


Requirements

Functional Requirements:

  • Deliver notifications across four channels: in-app, push, email, and SMS
  • User-configurable preferences per notification category and channel
  • Template-based notification rendering with variable substitution
  • Notification aggregation (batch similar notifications to avoid spamming)
  • Notification history with read/unread status and in-app notification center
  • Priority levels: critical (immediate), standard (batched), low (digest)

Non-Functional Requirements:

  • Process 5 billion notifications per day across all channels
  • Critical notifications delivered within 3 seconds end-to-end
  • 99.99% delivery rate for critical notifications
  • At-least-once delivery guarantee with client-side deduplication
  • Support 200+ notification types across multiple product teams

Scale Estimation

With 5 billion notifications per day, the system processes approximately 58,000 notifications per second sustained. Channel distribution: 40% in-app (2B), 30% push (1.5B), 20% email (1B), 10% SMS (500M). Average payload sizes vary by channel: in-app 500 bytes, push 2KB, email 5KB, SMS 160 bytes. Total data throughput: approximately 10TB per day. Notification preference lookups: with 500M users and 200 notification types, the preferences table has 100 billion potential entries (sparsely populated with ~5B actual preferences). Template renders: 5B/day at an average of 200 microseconds each requires ~280 CPU-hours of rendering capacity.
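
These figures are easy to sanity-check. A quick back-of-envelope script, with all constants taken from the estimates above:

```python
# Back-of-envelope check of the scale estimates (all figures approximate).
DAY_SECONDS = 86_400
notifications_per_day = 5_000_000_000

sustained_qps = notifications_per_day / DAY_SECONDS   # ~57,870/s

# Channel mix: (share of traffic, average payload in bytes).
channels = {
    "in_app": (0.40, 500),
    "push":   (0.30, 2_000),
    "email":  (0.20, 5_000),
    "sms":    (0.10, 160),
}
daily_bytes = sum(notifications_per_day * share * size
                  for share, size in channels.values())  # ~9 TB/day

# Template rendering at 200 microseconds per render.
render_cpu_seconds = notifications_per_day * 200e-6      # 1,000,000 CPU-seconds
render_cpu_hours = render_cpu_seconds / 3600             # ~278 CPU-hours/day
cores_sustained = render_cpu_seconds / DAY_SECONDS       # ~12 cores, sustained

print(f"{sustained_qps:,.0f} notifications/s, "
      f"{daily_bytes / 1e12:.1f} TB/day, "
      f"{render_cpu_hours:.0f} CPU-hours/day (~{cores_sustained:.0f} cores)")
```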

High-Level Architecture

The notification service follows an event-driven architecture with three layers: ingestion, orchestration, and delivery. The Ingestion Layer accepts notification requests from upstream services (order service, messaging service, marketing platform) via a REST API or Kafka topic. Each request contains: event_type, recipient_user_id, template_id, template_variables, and optional channel/priority overrides. The Ingestion Service validates the request and publishes to a notifications-raw Kafka topic.
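
A minimal sketch of the ingestion path, assuming the kafka-python client; the request fields match the schema above, while the broker address and validation rules are illustrative:

```python
import json

from kafka import KafkaProducer  # kafka-python client

REQUIRED_FIELDS = {"event_type", "recipient_user_id",
                   "template_id", "template_variables"}

producer = KafkaProducer(
    bootstrap_servers=["kafka:9092"],
    key_serializer=str.encode,
    value_serializer=lambda v: json.dumps(v).encode(),
    acks="all",  # broker-side durability before the send is acknowledged
)

def ingest(request: dict) -> None:
    """Validate a notification request and publish it to notifications-raw."""
    missing = REQUIRED_FIELDS - request.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    # Key by recipient so every notification for a user lands on the same
    # partition -- the property the orchestrator relies on for local state.
    producer.send("notifications-raw",
                  key=str(request["recipient_user_id"]),
                  value=request)
```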

The Orchestration Layer is the decision engine. A Notification Orchestrator consumes from notifications-raw and performs: (1) preference lookup — checks if the user has opted in to this notification type on each channel; (2) aggregation — checks if this notification should be batched with recent similar notifications (e.g., '5 people liked your post' instead of 5 separate notifications); (3) template rendering — resolves the template with variables to produce the final content for each channel; (4) rate limiting — ensures no user receives more than N notifications per hour. The orchestrator outputs per-channel delivery tasks to channel-specific Kafka topics.
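
The orchestration pipeline might look like the following skeleton. The five collaborators (preferences, aggregator, templates, limiter, publish) stand in for the services described above and are injected as hypothetical dependencies:

```python
def orchestrate(event: dict, preferences, aggregator,
                templates, limiter, publish) -> None:
    """Decision pipeline for one raw notification event (illustrative)."""
    user_id = event["recipient_user_id"]

    # (1) Preference lookup: channels the user has enabled for this event type.
    channels = preferences.enabled_channels(user_id, event["event_type"])

    # (2) Aggregation: fold into a window of similar recent events; None means
    #     the event is being held until the window flushes.
    event = aggregator.maybe_merge(event)
    if event is None:
        return

    for channel in channels:
        # (3) Render the per-channel template with the event's variables.
        content = templates.render(event["template_id"], channel,
                                   event["template_variables"])
        # (4) Token-bucket rate limit per user, per channel.
        if not limiter.allow(user_id, channel):
            continue
        publish(f"notifications-{channel}", user_id, content)
```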

The Delivery Layer has specialized senders per channel. The In-App Sender writes to a per-user notification store (Redis sorted set + Cassandra for persistence) and pushes via WebSocket if the user is online. The Push Sender dispatches to APNs/FCM. The Email Sender renders HTML templates and dispatches via SES or a custom SMTP relay. The SMS Sender routes through an aggregator like Twilio. Each sender tracks delivery status and writes receipts to an analytics pipeline.
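
Each sender reduces to a consume-dispatch-receipt loop. A sketch of the Push Sender, assuming kafka-python; dispatch_push and receipts are placeholders for an APNs/FCM client and the analytics pipeline:

```python
import json

from kafka import KafkaConsumer

def run_push_sender(dispatch_push, receipts) -> None:
    """Consume push delivery tasks, dispatch them, and record receipts.
    dispatch_push wraps an APNs/FCM client; receipts feeds the analytics
    pipeline -- both are injected since their details vary by provider."""
    consumer = KafkaConsumer(
        "notifications-push",
        bootstrap_servers=["kafka:9092"],
        group_id="push-sender",
        value_deserializer=json.loads,
        enable_auto_commit=False,  # commit only after the attempt is recorded
    )
    for message in consumer:
        task = message.value
        status = dispatch_push(task["device_token"], task["content"])
        receipts.write(task["notification_id"], channel="push", status=status)
        consumer.commit()  # at-least-once: a crash before this replays the task
```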

Core Components

Notification Orchestrator

The Orchestrator is the brain of the system. It consumes raw notification events and applies a decision pipeline: First, it fetches user preferences from the Preferences Service (Redis-cached, with Cassandra as the backing store). If the user has disabled this category on a specific channel, that channel is skipped. Second, it checks the Aggregation Service — if 3 'like' notifications arrived in the last 5 minutes for the same post, they are merged into 'User A, B, and 2 others liked your post.' Third, it renders the template for each enabled channel using a Template Engine (Mustache/Handlebars with per-channel layouts). Fourth, it applies rate limiting (token bucket per user, per channel).
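
The per-user, per-channel token bucket can live in Redis so every orchestrator instance sees the same state. A sketch using redis-py with a Lua script for atomicity; the capacity and refill rate are illustrative:

```python
import time

import redis

r = redis.Redis()

CAPACITY = 10                 # burst size (illustrative)
REFILL_PER_SEC = 10 / 3600.0  # ~10 notifications per hour (illustrative)

TOKEN_BUCKET = r.register_script("""
local key      = KEYS[1]
local now      = tonumber(ARGV[1])
local rate     = tonumber(ARGV[2])
local capacity = tonumber(ARGV[3])

local state  = redis.call('HMGET', key, 'tokens', 'ts')
local tokens = tonumber(state[1]) or capacity
local ts     = tonumber(state[2]) or now

-- Refill proportionally to elapsed time, capped at capacity.
tokens = math.min(capacity, tokens + (now - ts) * rate)
local allowed = 0
if tokens >= 1 then
  tokens = tokens - 1
  allowed = 1
end
redis.call('HSET', key, 'tokens', tokens, 'ts', now)
redis.call('EXPIRE', key, 7200)
return allowed
""")

def allow(user_id: str, channel: str) -> bool:
    key = f"ratelimit:{user_id}:{channel}"
    return bool(TOKEN_BUCKET(keys=[key],
                             args=[time.time(), REFILL_PER_SEC, CAPACITY]))
```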

Aggregation Service

The Aggregation Service prevents notification fatigue by batching similar events. It uses a time-windowed aggregation pattern: events of the same type for the same target (e.g., likes on the same post) are held in a Redis key with a 5-minute window. When the window expires or the count exceeds a threshold (e.g., 10), the aggregated notification is emitted. The aggregation key is {user_id}:{event_type}:{target_id}, and the value is a counter plus a list of actor user_ids. This converts 100 individual 'X liked your post' notifications into a single '100 people liked your post.'
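
A sketch of the windowed aggregation in Redis, assuming redis-py. The window and threshold match the figures above; flushing windows that expire without hitting the threshold is left to a scheduled job, omitted here for brevity:

```python
import redis

r = redis.Redis(decode_responses=True)

WINDOW_SECONDS = 300   # 5-minute aggregation window
FLUSH_THRESHOLD = 10   # emit early once this many events accumulate

def aggregate(user_id: str, event_type: str, target_id: str, actor_id: str):
    """Fold one event into its window; return an aggregated notification when
    the threshold trips, else None (the event is held until the window flushes)."""
    key = f"agg:{user_id}:{event_type}:{target_id}"
    count = r.rpush(key, actor_id)     # the actor list doubles as the counter
    if count == 1:
        r.expire(key, WINDOW_SECONDS)  # the first event opens the window
    if count < FLUSH_THRESHOLD:
        return None

    actors = r.lrange(key, 0, -1)
    r.delete(key)
    return {
        "user_id": user_id, "event_type": event_type, "target_id": target_id,
        "count": count, "sample_actors": actors[:3],  # 'A, B, and N-2 others'
    }
```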

In-App Notification Store

The in-app notification center requires fast reads (a user opens the app and sees their notifications) and persistent storage. The hot store is a Redis sorted set per user: notifications:{user_id} scored by timestamp, containing the last 200 notification IDs. The cold store is Cassandra with partition key user_id and clustering key notification_id (a time-sortable Snowflake ID, ordered descending so the newest notifications read first). Each notification record contains: type, title, body, deep_link, is_read, actor_ids, created_at. Marking as read updates both Redis and Cassandra. The unread count is maintained as a Redis counter unread:{user_id}, incremented on write and decremented on read.
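
A sketch of the hot-store operations with redis-py; the cassandra client and the Snowflake-derived score are assumptions standing in for the cold store described above:

```python
import redis

r = redis.Redis(decode_responses=True)

def write_notification(user_id: str, notification_id: str,
                       cassandra, score: float) -> None:
    """Hot-store write: sorted set trimmed to 200 IDs, plus the unread counter.
    score would be the timestamp extracted from the Snowflake ID."""
    key = f"notifications:{user_id}"
    pipe = r.pipeline()
    pipe.zadd(key, {notification_id: score})
    pipe.zremrangebyrank(key, 0, -201)  # keep only the newest 200 IDs
    pipe.incr(f"unread:{user_id}")
    pipe.execute()
    cassandra.insert(user_id, notification_id)  # cold-store write (assumed client)

def fetch_recent(user_id: str, cassandra, limit: int = 20) -> list:
    """App-launch path: newest IDs from Redis, full records from Cassandra."""
    ids = r.zrevrange(f"notifications:{user_id}", 0, limit - 1)
    if not ids:  # hot-store miss (eviction, new region): cold-store fallback
        return cassandra.recent(user_id, limit)
    return cassandra.multiget(user_id, ids)
```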

Database Design

User notification preferences are stored in Cassandra: partition key user_id, clustering key category_id. Columns: channel_enabled (map of channel → boolean), frequency (immediate/daily_digest/weekly_digest), updated_at. Default preferences are defined per category in a config service; user-specific overrides are stored sparsely (only when the user changes from default). A Redis cache with a 10-minute TTL stores serialized preferences per user to avoid Cassandra reads on every notification.
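
The read-through preference lookup might look like this; cassandra_session is an assumed driver session, and the CQL comment mirrors the layout described above:

```python
import json

import redis

r = redis.Redis(decode_responses=True)
PREFS_TTL = 600  # 10-minute cache TTL, per the design above

# Cassandra layout described above, as CQL:
#   CREATE TABLE preferences (
#       user_id         bigint,
#       category_id     text,
#       channel_enabled map<text, boolean>,
#       frequency       text,
#       updated_at      timestamp,
#       PRIMARY KEY ((user_id), category_id));

def get_preferences(cassandra_session, user_id: int) -> dict:
    """Read-through cache: Redis first, Cassandra on a miss."""
    cached = r.get(f"prefs:{user_id}")
    if cached:
        return json.loads(cached)
    rows = cassandra_session.execute(
        "SELECT category_id, channel_enabled, frequency "
        "FROM preferences WHERE user_id = %s", (user_id,))
    prefs = {row.category_id: {"channels": dict(row.channel_enabled),
                               "frequency": row.frequency}
             for row in rows}
    r.setex(f"prefs:{user_id}", PREFS_TTL, json.dumps(prefs))
    return prefs
```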

Notification templates are stored in a PostgreSQL table: template_id, category, channel, subject_template, body_template, variables_schema (JSON Schema for validation), version, created_at, updated_at. Templates support per-locale variants via a locale column, enabling internationalization. Template rendering is done in-memory using a compiled template cache (templates compiled once on startup and cached as reusable objects).
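
The compiled-template cache is essentially a dictionary keyed by (template_id, channel, locale). A sketch using Jinja2 as a stand-in for the Mustache/Handlebars engine named above; template_store is an assumed PostgreSQL accessor:

```python
from jinja2 import Template  # stand-in for the Mustache/Handlebars engine

_compiled: dict = {}  # (template_id, channel, locale) -> compiled template

def render(template_store, template_id: str, channel: str,
           locale: str, variables: dict) -> str:
    """Render with a compile-once cache. Variables would be validated against
    the stored variables_schema (JSON Schema) before this call."""
    key = (template_id, channel, locale)
    tmpl = _compiled.get(key)
    if tmpl is None:
        source = template_store.fetch(template_id, channel, locale)
        tmpl = _compiled[key] = Template(source)  # compile once, reuse thereafter
    return tmpl.render(**variables)
```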

API Design

  • POST /api/v1/notifications/send — Send notification: {event_type, user_id, template_variables, channels?: ['push', 'email'], priority?: 'critical'} (see the example call after this list)
  • GET /api/v1/notifications?user_id={id}&unread_only=true&limit=20&cursor={id} — Fetch in-app notification history
  • PUT /api/v1/notifications/{id}/read — Mark notification as read; decrements unread counter
  • PUT /api/v1/users/{id}/preferences/{category} — Update notification preferences: {email: true, push: false, sms: false, frequency: 'daily_digest'}
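
An illustrative call against the send endpoint; the host, payload values, and 2-second timeout are assumptions:

```python
import requests

resp = requests.post(
    "https://notifications.internal/api/v1/notifications/send",
    json={
        "event_type": "order_shipped",
        "user_id": 42,
        "template_variables": {"order_id": "A-1001", "eta": "Tomorrow"},
        "channels": ["push", "email"],  # optional channel override
        "priority": "critical",         # optional; defaults to standard
    },
    timeout=2,
)
resp.raise_for_status()
```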

Scaling & Bottlenecks

The Orchestrator is the primary bottleneck since every notification passes through it. Horizontal scaling is achieved by partitioning the notifications-raw Kafka topic by user_id — each orchestrator instance handles a subset of users, and all notifications for a user are processed by the same instance (enabling local aggregation state). The orchestrator fleet auto-scales based on Kafka consumer lag. During flash events (Super Bowl, New Year), the system pre-scales based on historical patterns.
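
The lag-based autoscaling can be as simple as sizing the fleet to drain the current backlog within a target window. An illustrative policy function; every constant here is an assumption, not a figure from the design:

```python
import math

def desired_orchestrators(total_lag_msgs: int,
                          per_instance_msgs_per_sec: int = 2_000,
                          target_drain_seconds: int = 60,
                          min_replicas: int = 4,
                          max_replicas: int = 64) -> int:
    """Size the orchestrator fleet so the current Kafka consumer lag drains
    within the target window (sketch of one possible scaling policy)."""
    needed = math.ceil(
        total_lag_msgs / (per_instance_msgs_per_sec * target_drain_seconds))
    return max(min_replicas, min(max_replicas, needed))
```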

The Email Sender has the highest latency variance. SES enforces per-account sending quotas that start low and grow as the account warms up, so for large-scale campaigns the sender spreads load across multiple SES accounts and regions behind a sending rate controller. Email rendering (HTML templating plus image embedding) is CPU-intensive; a pre-rendering cache stores rendered emails for broadcast campaigns where only the recipient name varies. SMS is the most expensive channel per message, so the system aggressively gates it to critical notifications only.
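
A sketch of that pre-rendering cache: the expensive render happens once per campaign, and only the recipient name is substituted per send. Jinja2 again stands in for the template engine, and the __NAME__ placeholder scheme is illustrative:

```python
from jinja2 import Template

_prerendered: dict = {}  # campaign_id -> HTML with a recipient-name placeholder

def broadcast_email(campaign_id: str, body_source: str,
                    recipient_name: str) -> str:
    """Render a broadcast campaign once, then substitute only the name."""
    html = _prerendered.get(campaign_id)
    if html is None:
        # The expensive render (layout, inlined CSS, embedded images) runs once;
        # the recipient name is left as a second-stage placeholder.
        html = Template(body_source).render(name="__NAME__")
        _prerendered[campaign_id] = html
    return html.replace("__NAME__", recipient_name)
```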

Key Trade-offs

  • Event-driven (Kafka) over synchronous API calls: Kafka provides durability, backpressure handling, and retry semantics crucial for notification delivery guarantees; the trade-off is increased end-to-end latency (50-100ms Kafka overhead) compared to direct API calls
  • Time-windowed aggregation over individual delivery: Aggregating similar notifications reduces user fatigue and improves engagement metrics, but introduces delivery delay (up to 5 minutes for the aggregation window) and complexity in managing aggregation state
  • At-least-once over exactly-once delivery: At-least-once with client-side deduplication (using notification_id) is simpler than distributed exactly-once; the rare duplicate notification is acceptable versus the engineering cost of exactly-once semantics
  • Redis + Cassandra for in-app store over Cassandra alone: Redis provides sub-millisecond reads for the notification center (critical for app launch time), while Cassandra ensures durability; the cost is dual-write complexity and potential inconsistency during failures
