System Design: Review & Rating System
System design of a review and rating system with fraud detection, helpfulness ranking, and aggregated ratings serving millions of product pages.
Requirements
Functional Requirements:
- Users submit text reviews with star ratings (1-5) for purchased products
- Verified purchase badge for reviews from confirmed buyers
- Helpfulness voting: users can mark reviews as helpful or not helpful
- Review sorting: most helpful, most recent, critical first
- Aggregated rating summary: average score, rating distribution histogram
- Photo and video attachments in reviews
Non-Functional Requirements:
- Support 500M reviews across 100M products
- Product page rating display under 50ms (p99)
- Review submission to visibility latency under 5 minutes (includes moderation)
- 99.9% availability for read path; eventual consistency for aggregations
- Fraud detection must catch 95% of fake reviews before publication
Scale Estimation
500M total reviews, 500K new reviews/day ≈ 5.8 writes/sec. Product page loads needing ratings: 50M/day ≈ 579 QPS for aggregated ratings. Review list reads: 10M/day ≈ 116 QPS. Each review: ~1KB text + metadata; 500M reviews = 500GB. Media: 20% of reviews include it, averaging 3 images at 200KB each = 60TB of media storage. Aggregated ratings: 100M products × 100 bytes = 10GB — small enough to cache entirely in Redis.
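The throughput and cache-size figures above can be reproduced with a few lines of arithmetic (a quick sanity-check sketch, nothing more):

```python
# Back-of-envelope figures from the estimates above (illustrative arithmetic only).
SECONDS_PER_DAY = 86_400

write_qps = 500_000 / SECONDS_PER_DAY           # new reviews per second
rating_qps = 50_000_000 / SECONDS_PER_DAY       # aggregated-rating reads
review_list_qps = 10_000_000 / SECONDS_PER_DAY  # review-list reads

text_storage_gb = 500_000_000 * 1 / 1_000_000   # 1 KB per review -> GB
aggregate_cache_gb = 100_000_000 * 100 / 1e9    # 100 bytes per product -> GB

print(round(write_qps, 1), round(rating_qps), round(review_list_qps))
print(text_storage_gb, aggregate_cache_gb)
```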
High-Level Architecture
The Review & Rating Service separates the write path (submission + moderation) from the read path (display + aggregation). The write path: Review Submission API → Moderation Pipeline (Kafka) → Review Store (PostgreSQL). The Moderation Pipeline runs three stages: (1) Automated content moderation — ML classifier detects spam, profanity, and policy violations; (2) Fraud detection — signals include review velocity, reviewer history, text sentiment vs. star rating mismatch, and seller-reviewer relationship analysis; (3) Manual review queue for borderline cases flagged by ML. Approved reviews are written to PostgreSQL and a Kafka event triggers cache invalidation and rating re-aggregation.
The read path: Product page rating request → Redis cache (stores pre-computed aggregation: average rating, rating distribution, total count) → served in <5ms. Review list request → Elasticsearch (full-text indexed reviews with sorting by helpfulness_score, recency, rating) → paginated response. The aggregation is updated asynchronously: a Kafka consumer listens for review events (new, edited, deleted) and recomputes the product's aggregate rating, writing the result to Redis.
Core Components
Fraud Detection Engine
Fake review detection uses a multi-signal ML model (gradient-boosted trees) scoring each review 0-100 on fraud likelihood. Features: reviewer account age (<30 days = suspicious), number of reviews in last 24 hours (>5 = suspicious), text similarity to other reviews of the same product (cosine similarity >0.8 = copy-paste), star rating deviation from product average (5-star review for a 2-star average product), seller-reviewer purchase pattern (reviewer bought from only one seller repeatedly), and IP/device clustering (multiple reviews from the same IP). Reviews scoring >70 are blocked; 40-70 enter manual review; <40 are published immediately.
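The scoring and routing logic above can be sketched as follows. The feature names and weights are assumptions for illustration; a production system would feed these features into the gradient-boosted-tree model rather than the hand-weighted rules below.

```python
from dataclasses import dataclass

# Hypothetical per-review feature vector (names are assumptions).
@dataclass
class ReviewFeatures:
    account_age_days: int
    reviews_last_24h: int
    max_text_similarity: float  # cosine similarity vs. other reviews of the product
    rating_deviation: float     # |review rating - product average|
    single_seller_pattern: bool # reviewer buys from only one seller repeatedly
    shared_ip_reviews: int      # other reviews from the same IP/device cluster

def fraud_score(f: ReviewFeatures) -> int:
    """Return a 0-100 fraud-likelihood score (illustrative weights, not the real model)."""
    score = 0
    if f.account_age_days < 30:
        score += 20
    if f.reviews_last_24h > 5:
        score += 20
    if f.max_text_similarity > 0.8:  # likely copy-paste
        score += 25
    if f.rating_deviation >= 3:      # e.g. 5-star review on a 2-star-average product
        score += 15
    if f.single_seller_pattern:
        score += 10
    if f.shared_ip_reviews > 3:
        score += 10
    return min(score, 100)

def route(score: int) -> str:
    """Apply the publication thresholds from the text: >70 block, 40-70 manual, <40 publish."""
    if score > 70:
        return "blocked"
    if score >= 40:
        return "manual_review"
    return "published"
```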
Helpfulness Ranking
Review helpfulness is computed using a Wilson score interval — a statistical method that balances the proportion of helpful votes against the total number of votes, accounting for low sample sizes. The formula (lower bound at 95% confidence, z = 1.96): (helpful + 1.9208 - 1.96 * sqrt(helpful * not_helpful / total + 0.9604)) / (total + 3.8416). This ensures a review with 10 helpful out of 10 votes doesn't rank above a review with 500 helpful out of 600 votes. The helpfulness score is stored as a materialized field on the review record and recalculated on every vote via a Kafka consumer.
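The formula above, written out as a function (a minimal sketch; the constants fall out of z = 1.96, since z² = 3.8416, z²/2 = 1.9208, and z²/4 = 0.9604):

```python
import math

def wilson_lower_bound(helpful: int, not_helpful: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the helpful-vote proportion."""
    total = helpful + not_helpful
    if total == 0:
        return 0.0
    return (helpful + z * z / 2
            - z * math.sqrt(helpful * not_helpful / total + z * z / 4)) / (total + z * z)

# The unanimous-but-small review ranks below the large-sample one:
print(wilson_lower_bound(10, 0))     # ~0.72
print(wilson_lower_bound(500, 100))  # ~0.80
```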
Rating Aggregation Service
Aggregate ratings are pre-computed and cached in Redis as a hash: rating:{product_id} → {avg: 4.3, total: 1247, dist: {1: 45, 2: 78, 3: 124, 4: 389, 5: 611}}. When a new review is published, the aggregation consumer reads the current aggregate from Redis, applies the incremental update (new_avg = (old_avg * old_count + new_rating) / (old_count + 1)), and writes back. For products with >10,000 reviews, full re-aggregation runs nightly from PostgreSQL as a consistency check.
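The incremental update can be sketched as below. A plain dict stands in for the Redis hash rating:{product_id}; the field names mirror the example above:

```python
# Incremental aggregate update: fold one new rating into the cached summary
# without rescanning all reviews. A dict stands in for the Redis hash.
def apply_review(agg: dict, new_rating: int) -> dict:
    old_count = agg["total"]
    new_count = old_count + 1
    agg["avg"] = round((agg["avg"] * old_count + new_rating) / new_count, 2)
    agg["total"] = new_count
    agg["dist"][new_rating] += 1
    return agg

agg = {"avg": 4.3, "total": 1247, "dist": {1: 45, 2: 78, 3: 124, 4: 389, 5: 611}}
apply_review(agg, 5)
```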
Database Design
PostgreSQL schema: reviews table (review_id UUID PK, product_id, user_id, rating SMALLINT, title VARCHAR(200), body TEXT, verified_purchase BOOLEAN, helpful_count INT, not_helpful_count INT, helpfulness_score FLOAT, status ENUM('pending', 'published', 'rejected', 'flagged'), media_urls JSONB, created_at, updated_at). Indexes: (product_id, helpfulness_score DESC) for most-helpful sorting; (product_id, created_at DESC) for recency sorting; (user_id, created_at DESC) for user's review history.
Elasticsearch index mirrors the reviews table with additional analyzed text fields for full-text search. The mapping uses a custom analyzer for review text with synonym expansion and language detection. A separate votes table in PostgreSQL: (user_id, review_id, vote ENUM('helpful', 'not_helpful'), created_at) with UNIQUE constraint on (user_id, review_id) to prevent double-voting.
API Design
- POST /api/v1/products/{product_id}/reviews — Submit a review; body contains rating, title, body, media_ids; returns review_id with pending status
- GET /api/v1/products/{product_id}/reviews?sort=helpful&rating=5&page=1&size=10 — Fetch reviews with filtering and sorting
- GET /api/v1/products/{product_id}/rating-summary — Get aggregated rating (average, distribution, total count); served from Redis
- POST /api/v1/reviews/{review_id}/votes — Vote on review helpfulness; body contains vote_type (helpful/not_helpful)
Scaling & Bottlenecks
The rating aggregation hotspot occurs when a viral product receives thousands of reviews in a short period. Each review triggers a Redis read-modify-write cycle. To prevent race conditions, the aggregation update uses Redis MULTI/EXEC transactions with WATCH on the rating key for optimistic locking. For products receiving >100 reviews/minute, updates are batched: a counter rating_pending:{product_id} accumulates new ratings, and a scheduled job flushes them to the aggregate every 10 seconds.
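The batching path for hot products can be sketched as follows. Plain dicts stand in for the Redis keys rating_pending:{product_id} and rating:{product_id}, and the function names are assumptions; in production, record_rating would be a Redis counter increment and flush would run inside a MULTI/EXEC transaction:

```python
from collections import defaultdict

# Pending ratings accumulated per product instead of touching the aggregate directly.
pending: dict = defaultdict(list)

def record_rating(product_id: str, rating: int) -> None:
    """Hot-product write path: just append; no read-modify-write on the aggregate."""
    pending[product_id].append(rating)

def flush(aggregates: dict) -> None:
    """Scheduled job (e.g. every 10 seconds): fold all pending ratings in one update."""
    for product_id, ratings in pending.items():
        agg = aggregates[product_id]
        new_total = agg["total"] + len(ratings)
        agg["avg"] = (agg["avg"] * agg["total"] + sum(ratings)) / new_total
        agg["total"] = new_total
    pending.clear()
```

Batching trades a few seconds of aggregate staleness for one Redis write per interval instead of one per review, which is what removes the hotspot.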
Elasticsearch query performance for products with 100,000+ reviews requires careful index design. Reviews are indexed with product_id as a routing key, ensuring all reviews for a product land on the same shard. This eliminates scatter-gather for product-scoped queries. Shard sizing targets 20GB per shard; products with extreme review counts (>500K) get a dedicated index.
Key Trade-offs
- Wilson score over simple average for helpfulness: Statistically robust ranking that handles low-vote-count bias, but more complex to explain to users and harder to debug
- Async moderation pipeline (5-minute delay): Catches 95% of fake reviews before publication, but delays legitimate reviews — fast-track path for verified purchases with high-reputation accounts reduces perceived delay
- Pre-computed aggregations in Redis over real-time calculation: Sub-5ms rating display on every product page, but introduces eventual consistency — a new review takes 1-2 seconds to affect the displayed average
- Elasticsearch for review search over PostgreSQL full-text: Superior relevance ranking and faceting capability, but adds operational complexity and a synchronization pipeline