
System Design: Real-Time Polling System

Design a real-time polling system supporting live audience polls with instant result updates, as used in conferences, classrooms, and live events. Covers WebSocket-based result streaming, vote deduplication, and scaling to thousands of concurrent participants.

11 min read · Updated Jan 15, 2025
Tags: system-design, polling, real-time, websockets, vote-counting, concurrency

Requirements

Functional Requirements:

  • Presenters create polls with multiple options and launch them in real time
  • Participants vote via a unique session code; results update live for all viewers
  • Support poll types: single-choice, multiple-choice, word cloud, rating scale, open-text
  • Prevent double-voting: each participant can vote once per poll
  • Results dashboard for presenters with real-time charts and response analytics
  • Export poll results as CSV or PDF after the session

Non-Functional Requirements:

  • Support 10,000 concurrent participants per poll with sub-500ms result update latency
  • Vote processing must be idempotent: network retries don't produce double-counts
  • 99.9% uptime; live event polling failures are highly visible
  • Displayed results must converge to the true counts within 1 second under maximum load (eventual consistency)
  • Participants should be able to submit votes even during brief connectivity drops (offline queue on client)

Scale Estimation

For a platform hosting enterprise and event polling: 100k concurrent presenters × an average of 200 participants/poll = 20M concurrent participants. Peak burst: a single major conference poll with 10k simultaneous voters submitting within 30 seconds = ~333 votes/second. Result fan-out: naively, every vote triggers a result update to all 10k participants = 10k WebSocket pushes per vote = 3.33M push events/second during the burst — which is why result updates must be debounced rather than pushed per vote. Storage: 100k polls/day × 200 votes × 50 bytes each = 1GB/day.
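The arithmetic above can be checked with a short back-of-envelope script (numbers taken directly from the estimation; the variable names are illustrative):

```python
# Back-of-envelope numbers from the scale estimation above.
presenters = 100_000
avg_participants = 200
concurrent_participants = presenters * avg_participants      # 20M

burst_voters = 10_000
burst_window_s = 30
votes_per_s = burst_voters / burst_window_s                  # ~333 votes/s

# Naive fan-out: every vote pushes an update to every participant.
pushes_per_s = votes_per_s * burst_voters                    # ~3.33M pushes/s

polls_per_day = 100_000
bytes_per_vote = 50
storage_per_day = polls_per_day * avg_participants * bytes_per_vote  # 1 GB/day
```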

High-Level Architecture

The system has a Poll Management Service, a Vote Ingestion Service, a Vote Counting Service, and a Real-Time Broadcast Service. The Poll Management Service handles CRUD for polls and session management. The Vote Ingestion Service receives vote submissions, deduplicates, and publishes to Kafka. The Vote Counting Service maintains live counts in Redis and publishes count-update events. The Real-Time Broadcast Service pushes count updates to connected WebSocket clients.

The critical path is: participant submits vote → Vote Ingestion Service validates session, deduplicates, writes to Kafka → Vote Counting Service consumes from Kafka and atomically increments Redis counters → publishes a results-update event to a Redis Pub/Sub channel → Broadcast Service subscribers (WebSocket gateways) receive the event and push to connected clients. This pipeline processes a vote and delivers updated results within ~100ms under normal load.

Deduplication is handled at two levels: client-side (generates a vote_id UUID at submission time, used as idempotency key; retried submissions carry the same vote_id) and server-side (a Redis SET voted:{poll_id} tracks participant IDs that have voted; SADD returns 0 if already voted, causing the service to silently discard the duplicate).

Core Components

Vote Ingestion Service

Stateless HTTP service receiving vote submissions. Validates that the session code is active, the poll is open, and the participant belongs to the session (or open anonymous participation is enabled). Deduplicates via Redis SADD of the participant_id into the set voted:{poll_id}. On a first vote (SADD returns 1), publishes a vote event to Kafka with {poll_id, option_id, participant_id, vote_id, timestamp} and returns {status: "received", vote_id} immediately. A second identical submission returns {status: "already_voted", vote_id} — idempotent, not an error. The TTL on the Redis dedup key matches the poll session duration.
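The ingestion flow can be sketched in a few lines of Python. FakeRedis, kafka_log, and submit_vote are illustrative names, with in-memory stand-ins for Redis and Kafka; a real service would call a Redis client and a Kafka producer here:

```python
import uuid

class FakeRedis:
    """In-memory stand-in for the one Redis call used here (SADD)."""
    def __init__(self):
        self.sets = {}

    def sadd(self, key, member):
        s = self.sets.setdefault(key, set())
        if member in s:
            return 0   # already a member: duplicate vote
        s.add(member)
        return 1       # first time this member was added

redis = FakeRedis()
kafka_log = []  # stand-in for the Kafka vote topic

def submit_vote(poll_id, option_id, participant_id, vote_id=None):
    # The client generates vote_id once per submission and reuses it on
    # retries, so it doubles as an idempotency key.
    vote_id = vote_id or str(uuid.uuid4())
    if redis.sadd(f"voted:{poll_id}", participant_id) == 1:
        kafka_log.append({"poll_id": poll_id, "option_id": option_id,
                          "participant_id": participant_id, "vote_id": vote_id})
        return {"status": "received", "vote_id": vote_id}
    # Duplicate: acknowledged, not treated as an error.
    return {"status": "already_voted", "vote_id": vote_id}

first = submit_vote("p1", "opt_a", "alice")
retry = submit_vote("p1", "opt_a", "alice", vote_id=first["vote_id"])
```

A retried submission gets the same vote_id back and produces no second Kafka event, which is exactly the idempotency contract described above.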

Vote Counting Service

Kafka consumer group processing vote events. For each event, atomically increments the Redis counter votes:{poll_id}:{option_id} using INCR. After incrementing, reads all option counts with a single MGET over the poll's known option keys (MGET takes an explicit key list; it does not support wildcards) and publishes a results snapshot to the Redis Pub/Sub channel results:{poll_id}. The snapshot includes every option count and the total vote count. Debouncing: if votes arrive more often than once per 100ms, the service batches the Redis reads and publishes a single update per 100ms window, limiting broadcast events to 10/second per poll regardless of vote rate. This prevents WebSocket gateway saturation during vote bursts.
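The counting-plus-debounce logic can be sketched as follows. CountingService is an illustrative name; the clock is injected so the sketch stays deterministic, and a real service would also schedule a trailing flush at the end of each window so the last votes in a burst are not stranded:

```python
class CountingService:
    """Consumes vote events, keeps live counters, and publishes at most
    one results snapshot per debounce window (100ms here)."""
    def __init__(self, clock, window=0.1):
        self.clock = clock
        self.window = window
        self.counts = {}        # (poll_id, option_id) -> count (Redis INCR)
        self.last_publish = {}  # poll_id -> time of last snapshot
        self.published = []     # stand-in for Pub/Sub channel results:{poll_id}

    def on_vote(self, poll_id, option_id):
        key = (poll_id, option_id)
        self.counts[key] = self.counts.get(key, 0) + 1
        now = self.clock()
        # Publish only if a full debounce window has passed since the
        # previous snapshot for this poll.
        if now - self.last_publish.get(poll_id, float("-inf")) >= self.window:
            snapshot = {opt: n for (p, opt), n in self.counts.items() if p == poll_id}
            self.published.append((poll_id, snapshot))
            self.last_publish[poll_id] = now

t = [0.0]                                 # fake clock, advanced by hand
svc = CountingService(clock=lambda: t[0])
svc.on_vote("p1", "a")                    # first vote: snapshot published
for _ in range(50):
    svc.on_vote("p1", "a")                # same window: counted, not published
t[0] = 0.15
svc.on_vote("p1", "b")                    # new window: one combined snapshot
```

Fifty-one votes inside one window yield a single follow-up snapshot carrying all accumulated counts, which is the 10-updates/second cap in miniature.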

Real-Time Broadcast Service

A pool of WebSocket gateway servers. Each client (presenter dashboard or participant result view) opens a WebSocket connection and subscribes to a poll's results channel. The gateway subscribes to results:{poll_id} on Redis Pub/Sub and fans out received result snapshots to all connected clients watching that poll. Connection state (which clients are connected to which gateway) is tracked in Redis for routing. The debounced 10 updates/second limit means at peak 10k participants × 10 updates/second = 100k WebSocket messages/second — manageable across a gateway cluster.
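The gateway's fan-out behavior reduces to a subscription map. A minimal sketch, with Gateway as an illustrative name and client sends modeled as plain callbacks instead of real WebSocket writes:

```python
class Gateway:
    """One WebSocket gateway server: tracks which connected clients are
    watching which poll, and fans each Pub/Sub results snapshot out to
    all of them."""
    def __init__(self):
        self.subscribers = {}   # poll_id -> list of client send callbacks

    def connect(self, poll_id, send):
        self.subscribers.setdefault(poll_id, []).append(send)

    def on_results(self, poll_id, snapshot):
        # Invoked when the results:{poll_id} Pub/Sub channel fires.
        for send in self.subscribers.get(poll_id, []):
            send(snapshot)

gw = Gateway()
inboxes = {name: [] for name in ("presenter", "alice", "bob")}
for box in inboxes.values():
    gw.connect("p1", box.append)
gw.on_results("p1", {"a": 51, "b": 1})   # one event, three pushes
```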

Database Design

PostgreSQL: polls (poll_id UUID, session_id, presenter_id, question, options JSONB, poll_type ENUM, status ENUM, created_at, closed_at). sessions (session_id, code VARCHAR(8), presenter_id, created_at, expires_at). votes (vote_id UUID, poll_id, option_id, participant_id, submitted_at) — append-only; this is the durable vote log. The votes table is written after Kafka consumption for durability but is not in the hot read path (counts come from Redis).

Redis: live vote counters votes:{poll_id}:{option_id} → integer. Deduplication sets voted:{poll_id}:{participant_id} → TTL. Results channel for Pub/Sub. On poll close, a job reads the final Redis counters and writes a poll_results (poll_id, option_id, final_count) record to PostgreSQL, then clears the Redis keys. Open-text responses (word cloud polls) are stored in PostgreSQL directly — they're lower volume and not counted, so Redis isn't needed.
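The poll-close snapshot job described above is a read-persist-clear sequence. A sketch under the assumption that the live counters and the poll_results table can be modeled as a dict and a list of rows (close_poll is an illustrative name):

```python
def close_poll(poll_id, option_ids, counters, results_table):
    """On poll close, move final counts from the volatile live counters
    (Redis) into the durable poll_results table (PostgreSQL), then drop
    the counter keys."""
    for option_id in option_ids:
        key = f"votes:{poll_id}:{option_id}"
        final = int(counters.pop(key, 0))          # GET then DEL, in Redis terms
        results_table.append((poll_id, option_id, final))  # INSERT row

counters = {"votes:p1:a": 51, "votes:p1:b": 1}     # live Redis counters
results_table = []                                  # poll_results rows
close_poll("p1", ["a", "b"], counters, results_table)
```

After the job runs, the durable table holds the final tallies and the Redis keys are gone, so the hot-path memory is reclaimed as soon as the poll ends.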

API Design

POST /api/v1/polls — presenter creates a poll with options; returns {poll_id, session_code}.

POST /api/v1/polls/{pollId}/vote — body: {option_id, participant_id, vote_id}; idempotent vote submission.

WebSocket /ws/v1/polls/{pollId}/results — real-time results stream; server pushes count updates on each state change.

GET /api/v1/polls/{pollId}/results/export?format=csv — returns final results as downloadable file.

Scaling & Bottlenecks

WebSocket fan-out is the scaling ceiling. Each poll's WebSocket gateway needs to push result updates to up to 10k connected clients. Each gateway server handles ~50k WebSocket connections (on an 8-core machine with async I/O), so for a single 10k-participant poll, one or two gateway instances suffice. The Redis Pub/Sub channel ensures all gateway instances serving the same poll receive the same update simultaneously. The 100ms debounce on result publishes prevents degenerate cases (100 votes in 100ms would otherwise cause 100 Pub/Sub messages instead of 1).

Deduplication is a single Redis SADD into the set voted:{poll_id} — O(1). A poll with 10k voters generates 10k SADD operations over the polling window; with Redis handling 100k+ simple operations/second, this is not a bottleneck. The bigger concern is Redis Pub/Sub: a very active poll can generate 10 result-update messages/second, so 100k concurrent polls would produce 1M Pub/Sub messages/second — manageable by partitioning polls across multiple Redis instances.
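Partitioning polls across Redis instances can be as simple as hashing the poll_id, so one poll's counters, dedup set, and Pub/Sub channel all land on the same instance. A sketch (note that plain modulo hashing reshuffles most keys when the instance count changes; a production deployment would use consistent hashing or Redis Cluster slots instead):

```python
import hashlib

def redis_instance_for(poll_id, instances):
    """Stable poll_id -> instance mapping: every key for one poll is
    routed to the same Redis instance."""
    digest = hashlib.sha256(poll_id.encode()).hexdigest()
    return instances[int(digest, 16) % len(instances)]

instances = ["redis-0", "redis-1", "redis-2", "redis-3"]
# Many polls spread across all instances; one poll always maps the same way.
assignments = {redis_instance_for(f"poll-{i}", instances) for i in range(1000)}
```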

Key Trade-offs

  • WebSocket vs. Server-Sent Events: WebSocket is bidirectional (supports vote submission and result delivery on one connection) but stateful; SSE (Server-Sent Events) is simpler for one-way result delivery but requires a separate HTTP connection for vote submission.
  • Debounce rate vs. result freshness: 100ms debounce limits WebSocket pushes to 10/second, reducing gateway load but introducing up to 100ms of staleness in displayed results — imperceptible to humans at live event scale.
  • Redis counters vs. database counts: Redis INCR is atomic and sub-millisecond but volatile; relying solely on Redis for vote counts risks data loss on crash; the Kafka event log plus periodic Redis snapshotting to PostgreSQL provides durability without slowing the hot path.
  • Open vs. authenticated participation: Anonymous session-code participation maximizes engagement but makes deduplication dependent on ephemeral participant tokens (spoofable); authenticated participation (SSO) is more reliable but adds friction that reduces response rates in large audiences.
