System Design: Returns & Refunds System

Requirements

Functional Requirements:

Customers initiate returns with reason selection and optional photo evidence
Return authorization (RMA) with printable shipping labels
Multiple refund methods: original payment, store credit, exchange
Automatic refund processing upon warehouse receipt and inspection
Return policy engine: configurable rules per product category, seller, and reason
Seller-managed returns for marketplace orders

Non-Functional Requirements:

Process 500K returns/day across all channels
Return initiation to RMA approval under 30 seconds (automated for eligible returns)
Refund processing within 24 hours of item receipt at warehouse
99.9% availability for return initiation; 99.99% for refund processing
Fraud detection: identify serial returners and wardrobing patterns
Audit trail for every return decision and refund transaction

Scale Estimation

500K returns/day = 5.8 returns/sec, averaged over 24 hours
Each return: 2KB metadata + optional photos (average 2 photos × 500KB = 1MB)
RMA decisions: 80% auto-approved = 400K/day processed automatically; the remaining 100K/day require manual review
Refund transactions: 500K/day = 5.8 TPS
Return shipments: 500K/day, each generating 5-10 tracking events (7.5 on average) = 3.75M tracking events/day
Historical data: 500K returns/day × 365 = 182.5M returns/year at 2KB = 365GB/year (excluding photos)
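The estimates above can be reproduced with a few lines of arithmetic. This is a minimal sketch: the 7.5 events per shipment is the midpoint of the 5-10 range, the variable names are illustrative, and sizes use decimal units (1KB = 1,000 bytes).

```python
SECONDS_PER_DAY = 24 * 60 * 60

returns_per_day = 500_000
metadata_bytes = 2_000        # 2KB of metadata per return (decimal units)
avg_tracking_events = 7.5     # assumed midpoint of the 5-10 events per shipment
auto_approve_rate = 0.80

returns_per_sec = returns_per_day / SECONDS_PER_DAY
auto_approved = round(returns_per_day * auto_approve_rate)
manual_review = returns_per_day - auto_approved
tracking_events_per_day = returns_per_day * avg_tracking_events
returns_per_year = returns_per_day * 365
metadata_gb_per_year = returns_per_year * metadata_bytes / 1e9

print(f"{returns_per_sec:.1f} returns/sec")
print(f"{auto_approved:,} auto-approved, {manual_review:,} manual review per day")
print(f"{tracking_events_per_day:,.0f} tracking events/day")
print(f"{returns_per_year:,} returns/year, {metadata_gb_per_year:.0f} GB/year of metadata")
```

These are daily averages. Real traffic is bursty, so provisioning would normally add headroom on top of the per-second figures.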

High-Level Architecture

The Returns & Refunds system follows an event-driven architecture with three main subsystems: Return Authorization, Reverse Logistics, and Refund Processing. The flow: Customer initiates return via Return Portal → Return Policy Engine evaluates eligibility (auto-approve or route to manual review) → RMA generated with shipping label → Customer ships item → Warehouse receives and inspects → Inspection result triggers Refund Service → Refund processed to original payment method or store credit.

The Return Policy Engine is a rules engine (implemented with a decision table pattern) that evaluates four things: the return window (is the item still within the configured return period, for example 30 days?), product category rules (electronics require original packaging, clothing has no restocking fee), customer return history (flagging serial returners), and reason-specific rules (defective items are always accepted, 'changed mind' may incur a restocking fee). Auto-approved returns (80%) generate an RMA immediately. Flagged returns enter a manual review queue.

The Refund Service integrates with the Payment Service to process refunds. It supports partial refunds (return 2 of 3 items in an order), restocking fee deductions, and split refunds (partial original payment + partial store credit). All refund transactions are logged in an immutable ledger for financial reconciliation.
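A partial refund with a restocking fee and a split across refund methods might be computed as in the sketch below. The function names, the half-up rounding to cents, and the split rule (refund to the original payment method up to the amount still refundable there, with the remainder as store credit) are assumptions for illustration, not rules stated by the design.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

def refund_amount(unit_price: Decimal, quantity_returned: int,
                  restocking_fee_pct: Decimal) -> Decimal:
    """Refund for the returned units of one order line, net of the restocking fee."""
    gross = unit_price * quantity_returned
    fee = (gross * restocking_fee_pct / Decimal(100)).quantize(
        CENT, rounding=ROUND_HALF_UP)
    return gross - fee

def split_refund(total: Decimal, refundable_on_original: Decimal) -> dict:
    """Assumed split rule: original payment first, store credit for the rest."""
    to_original = min(total, refundable_on_original)
    return {"original_payment": to_original,
            "store_credit": total - to_original}

# Return 2 of 3 units priced at 40.00 each, with a 15% restocking fee.
amount = refund_amount(Decimal("40.00"), 2, Decimal("15"))
print(amount)                                   # 68.00
print(split_refund(amount, Decimal("50.00")))   # 50.00 original, 18.00 credit
```

Decimal is used instead of float so that amounts written to the ledger do not accumulate binary rounding error.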

Core Components

Return Policy Engine

The policy engine uses a decision table stored in PostgreSQL: return_policies table (policy_id, category_id, seller_id, return_window_days INT, restocking_fee_pct DECIMAL, requires_photos BOOLEAN, requires_packaging BOOLEAN, auto_approve BOOLEAN, conditions JSONB). The conditions JSONB field supports complex rules: {"reason": {"defective": {"auto_approve": true, "no_restocking_fee": true}, "changed_mind": {"auto_approve": true, "restocking_fee": 15}}}. The engine evaluates policies in priority order: product-specific > category-specific > seller default > platform default. This allows sellers to customize return policies within platform guidelines.
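The priority order and the reason-specific overrides can be sketched as follows. This is an illustration under assumptions, not the engine itself: the product_id field is hypothetical (the column list above has no product identifier), a NULL column is treated as a wildcard, and the helper names are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Policy:
    policy_id: str
    return_window_days: int
    restocking_fee_pct: float
    auto_approve: bool
    product_id: Optional[str] = None   # hypothetical column, not in the listed schema
    category_id: Optional[str] = None
    seller_id: Optional[str] = None
    conditions: dict = field(default_factory=dict)

def specificity(p: Policy) -> tuple:
    # Tuples compare left to right: product > category > seller > platform default.
    return (p.product_id is not None, p.category_id is not None,
            p.seller_id is not None)

def resolve_policy(policies, product_id, category_id, seller_id) -> Policy:
    """Pick the most specific policy whose non-null keys all match the item."""
    candidates = [
        p for p in policies
        if p.product_id in (None, product_id)
        and p.category_id in (None, category_id)
        and p.seller_id in (None, seller_id)
    ]
    if not candidates:
        raise LookupError("no policy found, not even a platform default")
    return max(candidates, key=specificity)

def evaluate(policy: Policy, reason: str, days_since_delivery: int) -> dict:
    """Apply the return window, then any reason-specific override in conditions."""
    if days_since_delivery > policy.return_window_days:
        return {"eligible": False, "auto_approve": False, "restocking_fee_pct": 0.0}
    rule = policy.conditions.get("reason", {}).get(reason, {})
    if rule.get("no_restocking_fee"):
        fee = 0.0
    else:
        fee = rule.get("restocking_fee", policy.restocking_fee_pct)
    return {"eligible": True,
            "auto_approve": rule.get("auto_approve", policy.auto_approve),
            "restocking_fee_pct": fee}
```

Measuring the window from the delivery date is also an assumption; a real policy might count from the order or ship date instead.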

Reverse Logistics Tracker

Once an RMA is issued, the system generates a prepaid shipping label via a carrier API (UPS, FedEx, USPS) and tracks the return shipment. The Tracking Service polls carrier APIs every 2 hours for status updates. When the item arrives at the warehouse, a warehouse worker scans the RMA barcode, which starts the Inspection Workflow: the worker records the item condition (new, like-new, used, damaged) and uploads photos. The inspection result is published to the Kafka topic return-inspections and consumed by the Refund Service. For marketplace orders, the inspection result is also shared with the seller for transparency.
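The message published to return-inspections could look like the sketch below. The payload field names and the choice of return_id as the message key are assumptions; the code only builds the key and value bytes and leaves the actual publish to whichever Kafka client is in use.

```python
import json
from datetime import datetime, timezone

TOPIC = "return-inspections"
CONDITIONS = {"new", "like-new", "used", "damaged"}

def build_inspection_event(return_id: str, items: list, photo_urls=None):
    """Validate an inspection result and serialize it as (key, value) bytes."""
    for item in items:
        if item["condition"] not in CONDITIONS:
            raise ValueError(f"unknown condition: {item['condition']!r}")
    payload = {
        "return_id": return_id,
        "items": items,                      # [{order_item_id, condition}, ...]
        "photo_urls": photo_urls or [],
        "inspected_at": datetime.now(timezone.utc).isoformat(),
    }
    # Keying by return_id keeps all events for one return on one partition,
    # so the Refund Service sees them in order.
    return return_id.encode("utf-8"), json.dumps(payload).encode("utf-8")

key, value = build_inspection_event(
    "r-1", [{"order_item_id": "oi-1", "condition": "used"}])
```

Rejecting unknown condition values at the producer keeps malformed results out of the topic, which is simpler than having every consumer defend against them.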

Fraud Detection for Returns

Return fraud costs retailers billions annually. The Fraud Detection Service scores each return request using an ML model (random forest classifier). Features: the customer's return rate (returns / orders over the last 12 months), wardrobing pattern (clothing returned after being worn once, detected from inspection notes on the customer's earlier returns, since the current item has not been inspected when the request is scored), return reason consistency (always claims 'defective' across unrelated categories), return value concentration (high-value items returned disproportionately), and account age vs. return frequency. Scores above 80/100 flag the return for manual review. Repeat offenders (3+ flagged returns in 6 months) are added to a restricted list with reduced auto-approval privileges.
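The rules wrapped around the model score are simple enough to sketch directly. The model itself is out of scope here; the snippet only shows the return-rate feature, the 80/100 threshold, and the repeat-offender rule. Treating "6 months" as 183 days is an assumption, as are the function names.

```python
from datetime import date, timedelta

FLAG_THRESHOLD = 80                 # scores above this go to manual review
RESTRICT_AFTER = 3                  # flagged returns before restriction
LOOKBACK = timedelta(days=183)      # assumed reading of "6 months"

def return_rate(returns_12m: int, orders_12m: int) -> float:
    """Returns divided by orders over the last 12 months; 0.0 with no orders."""
    return returns_12m / orders_12m if orders_12m else 0.0

def needs_manual_review(score: float) -> bool:
    return score > FLAG_THRESHOLD

def is_restricted(flagged_dates: list, today: date) -> bool:
    """True when enough flagged returns fall inside the lookback window."""
    recent = [d for d in flagged_dates if today - d <= LOOKBACK]
    return len(recent) >= RESTRICT_AFTER

print(return_rate(3, 12))           # 0.25
print(needs_manual_review(80))      # False: the threshold is exclusive
```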

Database Design

PostgreSQL schema:

returns table (return_id UUID PK, order_id FK, customer_id, status ENUM('initiated', 'approved', 'label_generated', 'shipped', 'received', 'inspected', 'refund_processing', 'completed', 'rejected'), reason ENUM('defective', 'wrong_item', 'changed_mind', 'not_as_described', 'damaged_in_shipping'), created_at, updated_at)
return_items table (item_id, return_id FK, order_item_id FK, product_id, quantity, condition_on_receipt ENUM, refund_amount DECIMAL)
rma_labels table (label_id, return_id FK, carrier, tracking_number, label_url, created_at)
refund_transactions table (transaction_id UUID, return_id FK, amount DECIMAL, method ENUM('original_payment', 'store_credit', 'exchange'), payment_ref_id, status, processed_at)

An event_log table records every state transition: (event_id, return_id, event_type, actor ENUM('system', 'customer', 'agent', 'warehouse'), payload JSONB, created_at). This serves as the audit trail. The fraud scoring model outputs are stored in a return_fraud_scores table (return_id, score, features JSONB, flagged BOOLEAN) for explainability and model monitoring.
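One way to keep the status column and the event_log consistent is to route every status change through a single function that validates the transition and appends the audit event. The sketch below uses in-memory dicts in place of database rows; the transition map and the event_type format are assumptions, since the design lists the statuses but not which moves between them are legal.

```python
import uuid
from datetime import datetime, timezone

# Assumed transition map over the status ENUM values.
ALLOWED = {
    "initiated": {"approved", "rejected"},
    "approved": {"label_generated"},
    "label_generated": {"shipped"},
    "shipped": {"received"},
    "received": {"inspected"},
    "inspected": {"refund_processing", "rejected"},
    "refund_processing": {"completed"},
    "completed": set(),
    "rejected": set(),
}
ACTORS = {"system", "customer", "agent", "warehouse"}

def transition(return_row: dict, new_status: str, actor: str,
               event_log: list, payload: dict = None) -> None:
    """Validate a status change, append the audit event, then update the row."""
    current = return_row["status"]
    if actor not in ACTORS:
        raise ValueError(f"unknown actor: {actor}")
    if new_status not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current} -> {new_status}")
    event_log.append({
        "event_id": str(uuid.uuid4()),
        "return_id": return_row["return_id"],
        "event_type": f"status_changed:{current}->{new_status}",
        "actor": actor,
        "payload": payload or {},
        "created_at": datetime.now(timezone.utc).isoformat(),
    })
    return_row["status"] = new_status
```

Against PostgreSQL, the insert into event_log and the update of returns would run in one transaction so that neither can succeed without the other.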

API Design

POST /api/v1/returns — Initiate a return; body contains order_id, items [{order_item_id, quantity, reason}], photos[]; returns return_id, approval status, and shipping label URL if auto-approved
GET /api/v1/returns/{return_id} — Fetch return status with timeline (initiated → shipped → received → refund processed)
POST /api/v1/returns/{return_id}/inspect — Warehouse submits inspection result; body contains item conditions and photos; triggers refund calculation
GET /api/v1/orders/{order_id}/returnable-items — Check which items in an order are eligible for return (within window, not already returned)
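The eligibility check behind the returnable-items endpoint can be sketched as a pure function. The field names (delivered_on, return_window_days, returnable_quantity, days_left) and the choice to count the window from the delivery date are assumptions made for the example.

```python
from datetime import date

def returnable_items(order_items: list, returned_qty: dict, today: date,
                     default_window_days: int = 30) -> list:
    """List order items that are inside the window and not fully returned."""
    result = []
    for item in order_items:
        # Quantity still available = ordered minus already returned.
        remaining = item["quantity"] - returned_qty.get(item["order_item_id"], 0)
        days_since_delivery = (today - item["delivered_on"]).days
        window = item.get("return_window_days", default_window_days)
        if remaining > 0 and days_since_delivery <= window:
            result.append({
                "order_item_id": item["order_item_id"],
                "returnable_quantity": remaining,
                "days_left": window - days_since_delivery,
            })
    return result

items = [
    {"order_item_id": "oi-1", "quantity": 3, "delivered_on": date(2024, 1, 1)},
    {"order_item_id": "oi-2", "quantity": 1, "delivered_on": date(2023, 11, 1)},
]
print(returnable_items(items, {"oi-1": 2}, today=date(2024, 1, 11)))
```

Tracking the returned quantity per order item, rather than a single returned flag, is what allows returning 2 of 3 units now and the third later.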

Scaling & Bottlenecks

The policy engine evaluation must be fast (under 10ms) to enable instant RMA decisions. Policy rules are cached in Redis as serialized policy documents under keys of the form policy:{category_id}:{seller_id}. Cache invalidation is event-driven: seller policy updates emit a Kafka event consumed by the cache invalidation service. For the 80% of returns that are auto-approved, the entire flow (submit → policy evaluation → RMA generation → label creation) completes in under 5 seconds, well inside the 30-second requirement.
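The read-through cache and its event-driven invalidation can be sketched with a plain dict standing in for Redis. Only the key format comes from the design; the class, the loader callback, and the event shape are hypothetical.

```python
import json

class PolicyCache:
    """Read-through cache; a dict stands in for Redis in this sketch."""

    def __init__(self, load_from_db):
        self._store = {}              # key -> serialized policy document
        self._load = load_from_db     # hypothetical loader, called on a miss

    @staticmethod
    def key(category_id: str, seller_id: str) -> str:
        return f"policy:{category_id}:{seller_id}"

    def get(self, category_id: str, seller_id: str) -> dict:
        k = self.key(category_id, seller_id)
        raw = self._store.get(k)
        if raw is None:
            # Miss: fetch from the database and cache the serialized document.
            raw = json.dumps(self._load(category_id, seller_id))
            self._store[k] = raw
        return json.loads(raw)

    def on_policy_updated(self, event: dict) -> None:
        """Handler for a policy-update event: drop the entry, reload lazily."""
        self._store.pop(self.key(event["category_id"], event["seller_id"]), None)
```

Deleting the entry on update, rather than rewriting it, avoids caching a stale value if events arrive out of order; the next read repopulates it from the database.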

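Refund processing peaks during January (post-holiday returns): 3x normal volume = 1.5M returns/day, or 17.4 refunds/sec averaged over the day. Each Refund Worker consumer processes refund transactions sequentially, and Stripe refund API calls take 1-2 seconds, so a consumer sustains 0.5 refunds/sec at the slow end of that range. Scaling the consumer group from 10 to 30 consumers yields 30 × 0.5 = 15 refunds/sec, which falls short of 17.4; the January surge needs at least 35 consumers (35 × 0.5 = 17.5 refunds/sec). By the same worst-case figure, normal volume (5.8 refunds/sec) calls for 12 consumers, so a baseline of 10 only keeps up when calls run closer to 1 second. Because a Kafka consumer group cannot use more consumers than the topic has partitions, the refund topic needs at least 35 partitions.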
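The consumer count follows from the arrival rate and the per-refund latency. The sketch below assumes refunds arrive evenly over the day and that each consumer handles one refund at a time; the function name is illustrative.

```python
import math

SECONDS_PER_DAY = 86_400

def consumers_needed(refunds_per_day: int, seconds_per_refund: float) -> int:
    """Sequential consumers required to keep up with the average arrival rate."""
    arrival_rate = refunds_per_day / SECONDS_PER_DAY   # refunds per second
    return math.ceil(arrival_rate * seconds_per_refund)

print(consumers_needed(1_500_000, 2.0))   # 35: January peak, 2 s per call
print(consumers_needed(1_500_000, 1.0))   # 18: January peak, 1 s per call
print(consumers_needed(500_000, 2.0))     # 12: normal volume, 2 s per call
```

Sizing for the 2-second case leaves no slack for retries or intra-day spikes, so a real deployment would provision above these minimums.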

Key Trade-offs

Auto-approval for 80% of returns over manual review for all: Dramatically reduces customer wait time and support costs, but accepts ~2% fraud loss rate on auto-approved returns — the operational savings exceed the fraud cost
Decision table policy engine over hard-coded rules: Sellers and category managers can modify return policies without code changes, but complex rule interactions can produce unexpected results — a policy simulator allows testing before deployment
Prepaid return labels over customer-arranged shipping: Better customer experience and consistent tracking, but the platform absorbs shipping costs — offset by restocking fees on non-defective returns
ML fraud detection over rule-based: Catches sophisticated fraud patterns (wardrobing, organized return fraud rings) that rules miss, but requires labeled training data and periodic retraining — false positives require a human appeal process