
System Design: Route Optimization Engine

Design a route optimization engine for logistics and delivery platforms — covering vehicle routing problems, constraint solving, real-time re-optimization, and scalable architecture.

16 min read · Updated Jan 15, 2025
Tags: system-design, route-optimization, logistics, algorithms, vrp, real-time

Requirements

Functional Requirements:

  • Assign delivery stops to vehicles and compute optimal route ordering
  • Respect constraints: time windows, vehicle capacity, driver hours-of-service
  • Re-optimize routes in real time when new orders arrive or drivers are delayed
  • Support multi-depot scenarios (orders can be fulfilled from multiple warehouses)
  • Provide ETA per stop for customer notifications
  • Handle driver breaks and lunch windows in route planning

Non-Functional Requirements:

  • Initial route optimization for 1,000 stops across 50 vehicles completes in under 30 seconds
  • Re-optimization on new order insertion returns a result in under 5 seconds
  • System processes 100,000 deliveries/day across a metropolitan region
  • Routes must be feasible — constraint violations are never acceptable in output
  • 99.9% availability; a delayed route plan halts dispatch operations

Scale Estimation

A large logistics operation: 100,000 deliveries/day, 2,000 drivers/vehicles, operating 12 hours/day. At ~50 stops per route, that's ~2,000 route optimization requests/day. Peak is morning dispatch at 6 AM, when ~500 routes are computed simultaneously. Re-optimization requests (new orders, delays): ~10,000/day ≈ 14/minute sustained over the operating day. Routing computation is CPU-intensive: a 50-stop VRP solved with a metaheuristic takes tens of seconds of CPU time, and each solver job occupies 4 cores, so 500 simultaneous computations need ~2,000 CPU cores (or a pre-warmed solver pool).
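The back-of-envelope arithmetic above can be sketched directly; the constants are the article's illustrative figures, not measurements:

```python
# Capacity math for the solver fleet, using the article's illustrative figures.
DELIVERIES_PER_DAY = 100_000
STOPS_PER_ROUTE = 50
PEAK_SIMULTANEOUS_ROUTES = 500
CORES_PER_SOLVER_JOB = 4          # each worker pins one job to 4 vCPUs
OPERATING_HOURS = 12

routes_per_day = DELIVERIES_PER_DAY // STOPS_PER_ROUTE          # 2,000 routes
peak_cores = PEAK_SIMULTANEOUS_ROUTES * CORES_PER_SOLVER_JOB    # 2,000 cores

reopt_per_day = 10_000
reopt_per_minute = reopt_per_day / (OPERATING_HOURS * 60)       # ~14/minute

print(routes_per_day, peak_cores, round(reopt_per_minute, 1))
```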

High-Level Architecture

Route optimization is an NP-hard problem (Capacitated Vehicle Routing Problem, CVRP). The system wraps a solver engine behind a scalable job queue and result store. The architecture has three layers: the Problem Builder (assembling the VRP instance from orders and driver data), the Solver Fleet (running optimization algorithms), and the Route Distributor (delivering optimized routes to driver apps and downstream systems).

When dispatch operators trigger route planning, the Problem Builder queries the Order Service for all pending deliveries in the region, the Driver Service for available vehicles and their capacities, and the Maps API for a pre-computed distance/time matrix between all stops. It serializes this into a VRP problem specification and publishes it to a Solver Job Queue (SQS or Kafka).
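A VRP problem specification published to the queue might look like the following; the field names and structure are hypothetical, not a real schema:

```python
import json

# Hypothetical VRP job payload the Problem Builder might publish to the
# Solver Job Queue. All field names here are illustrative assumptions.
problem = {
    "job_id": "job-2025-01-15-morning",
    "region_id": "metro-east",
    "vehicles": [
        {"id": "veh-101", "capacity_kg": 800, "shift": ["06:00", "18:00"]},
    ],
    "stops": [
        {"id": "stop-1", "demand_kg": 12,
         "time_window": ["09:00", "12:00"], "service_min": 5},
    ],
    "matrix_ref": "dtm:metro-east:2025-01-15",  # pointer to the cached distance matrix
    "time_budget_sec": 30,                      # batch budget from the NFRs
}

payload = json.dumps(problem)   # serialized message body for SQS/Kafka
```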

A pool of Solver Workers picks up jobs and runs a metaheuristic optimization — typically a hybrid of Greedy Initial Solution + Large Neighborhood Search (LNS) with time-window feasibility repair. LNS iteratively destroys and rebuilds portions of the solution, accepting improvements. After the time budget (30 seconds for batch, 5 seconds for re-optimization), the best feasible solution found so far is returned. Results are stored in PostgreSQL and distributed to driver apps via the Route Distribution Service.
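The destroy/repair/accept loop can be shown on a toy single-route instance. This is a minimal Python sketch with no capacity or time-window constraints (the production solver is Java-based, per the components below); all parameters are illustrative:

```python
import random

def route_cost(route, dist):
    return sum(dist[a][b] for a, b in zip(route, route[1:]))

def cheapest_insertion(route, stop, dist):
    # try every edge (route[i-1], route[i]); insert where the detour is smallest
    best_i, best_delta = 1, float("inf")
    for i in range(1, len(route)):
        a, b = route[i - 1], route[i]
        delta = dist[a][stop] + dist[stop][b] - dist[a][b]
        if delta < best_delta:
            best_i, best_delta = i, delta
    return route[:best_i] + [stop] + route[best_i:]

def lns(route, dist, iters=200, destroy_k=2, seed=42):
    rng = random.Random(seed)
    best = route[:]
    for _ in range(iters):
        interior = best[1:-1]                 # depot stays fixed at both ends
        removed = rng.sample(interior, min(destroy_k, len(interior)))
        partial = [s for s in best if s not in removed]   # destroy
        for s in removed:                                 # repair greedily
            partial = cheapest_insertion(partial, s, dist)
        if route_cost(partial, dist) < route_cost(best, dist):
            best = partial                                # accept improvements only
    return best

# Toy instance: depot 0 plus four stops, Euclidean distances.
coords = {0: (0, 0), 1: (1, 5), 2: (2, 1), 3: (5, 4), 4: (6, 0)}
dist = {a: {b: ((xa - xb) ** 2 + (ya - yb) ** 2) ** 0.5
            for b, (xb, yb) in coords.items()}
        for a, (xa, ya) in coords.items()}
start = [0, 3, 1, 4, 2, 0]        # deliberately poor ordering
improved = lns(start, dist)
```

Real LNS variants also accept some worsening moves (simulated-annealing style) to escape local optima; the improvement-only acceptance here keeps the sketch short.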

Core Components

Distance Matrix Service

The VRP solver needs travel times between all pairs of stops. For N stops, that's N² entries; at N=1,000, a million travel times. Routing APIs (Google Maps, OSRM) serve these as batched matrix requests, each returning hundreds of origin-destination pairs, so the full matrix is a few thousand API calls — several seconds at a 500-request/second rate limit. The Distance Matrix Service caches previously computed pairs in Redis (keyed by rounded lat/lng pair, ~100m precision) to avoid re-querying for nearby stops. Cache hit rates exceed 60% in dense urban areas with recurring delivery addresses.
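The rounded-key caching scheme can be sketched as below. Three decimal places of latitude is roughly 111 m; applying the same rounding to longitude is a simplification, since its ground distance shrinks with latitude. An in-memory dict stands in for Redis, and `fetch` stands in for the routing API call:

```python
# Cache keyed by coordinates rounded to ~100 m, so nearby stops share entries.
_cache = {}

def matrix_key(origin, dest, precision=3):
    (lat1, lng1), (lat2, lng2) = origin, dest
    return (f"dtm:{round(lat1, precision)}:{round(lng1, precision)}:"
            f"{round(lat2, precision)}:{round(lng2, precision)}")

def cached_travel_time(origin, dest, fetch):
    key = matrix_key(origin, dest)
    if key not in _cache:
        _cache[key] = fetch(origin, dest)   # routing API call in production
    return _cache[key]

# Two origins ~40 m apart collapse onto one key, so the second call is a hit.
calls = []
fetch = lambda o, d: calls.append(1) or 540   # pretend the API says 540 s
t1 = cached_travel_time((40.71234, -74.00567), (40.73, -74.01), fetch)
t2 = cached_travel_time((40.71198, -74.00580), (40.73, -74.01), fetch)
```

In Redis proper this maps to a plain `SET key value EX 86400` per pair, matching the 24-hour TTL described in the database design below.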

VRP Solver Engine

The solver is implemented in Java using the OptaPlanner or Google OR-Tools library. Each Solver Worker is a container with 4 vCPUs dedicated to one optimization job. The solver initializes with a constructive heuristic (Clarke-Wright Savings algorithm) to get a feasible solution, then runs LNS with a 30-second time budget for batch jobs. Constraint categories: hard (capacity, time windows — never violated), medium (driver hours — violated only if no feasible solution exists), soft (minimize total distance/time — the objective). The solution quality is measured by total distance deviation from the theoretical lower bound.
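The Clarke-Wright construction step works by starting with one route per stop and merging route ends in descending order of savings s(i, j) = d(0, i) + d(0, j) - d(i, j). A minimal Python sketch, checking only capacity (real solvers also enforce time windows during merges):

```python
def clarke_wright(dist, demand, capacity):
    # node 0 is the depot; every other node starts in its own singleton route
    routes = {s: [s] for s in demand if s != 0}
    savings = sorted(
        ((dist[0][i] + dist[0][j] - dist[i][j], i, j)
         for i in routes for j in routes if i < j),
        reverse=True,
    )
    for _, i, j in savings:
        ri, rj = routes[i], routes[j]
        if ri is rj:
            continue                      # already on the same route
        if ri[-1] == i and rj[0] == j:    # i ends its route, j starts its route
            merged = ri + rj
        elif rj[-1] == j and ri[0] == i:
            merged = rj + ri
        else:
            continue                      # neither stop is a mergeable route end
        if sum(demand[s] for s in merged) <= capacity:
            for s in merged:
                routes[s] = merged
    return list({id(r): r for r in routes.values()}.values())

# Two stop clusters far apart; capacity 8 forces one vehicle per cluster.
coords = {0: (0, 0), 1: (0, 10), 2: (2, 10), 3: (10, 0), 4: (10, 2)}
demand = {0: 0, 1: 4, 2: 4, 3: 4, 4: 4}
dist = {a: {b: ((xa - xb) ** 2 + (ya - yb) ** 2) ** 0.5
            for b, (xb, yb) in coords.items()}
        for a, (xa, ya) in coords.items()}
fleet_routes = clarke_wright(dist, demand, capacity=8)
```

The resulting feasible routes are then handed to LNS as the initial solution.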

Real-Time Re-Optimization Service

When a new order arrives mid-day or a driver reports a delay, the re-optimizer runs a focused LNS on the affected routes only (not the full fleet). It receives the current route state (which stops are completed, current driver positions), inserts the new stop into the affected route(s), and runs a 5-second LNS to find the best insertion point. This produces a minimal-change re-route to avoid confusing drivers with radical changes. Results are pushed to affected driver apps via WebSocket.
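The minimal-change insertion step can be sketched as follows: completed stops stay frozen, and the new order is placed at the cheapest position among the remaining legs, starting from the driver's current position. A production re-optimizer would follow this with the 5-second LNS pass and check time-window feasibility, both omitted here for brevity:

```python
def insert_new_stop(route, completed_count, new_stop, dist):
    """route includes the depot at both ends; completed_count counts
    already-visited nodes (depot departure included)."""
    fixed, remaining = route[:completed_count], route[completed_count:]
    best_pos, best_delta = 0, float("inf")
    for i in range(len(remaining)):
        a = remaining[i - 1] if i > 0 else fixed[-1]   # previous node on the route
        b = remaining[i]
        delta = dist[a][new_stop] + dist[new_stop][b] - dist[a][b]
        if delta < best_delta:
            best_pos, best_delta = i, delta
    return fixed + remaining[:best_pos] + [new_stop] + remaining[best_pos:]

# Toy square route 0 -> 1 -> 2 -> 3 -> 0; new stop 9 sits near the 1 -> 2 leg.
coords = {0: (0, 0), 1: (2, 0), 2: (2, 2), 3: (0, 2), 9: (2.2, 1)}
dist = {a: {b: ((xa - xb) ** 2 + (ya - yb) ** 2) ** 0.5
            for b, (xb, yb) in coords.items()}
        for a, (xa, ya) in coords.items()}
# Driver has left the depot and finished stop 1, so those two nodes are frozen.
updated = insert_new_stop([0, 1, 2, 3, 0], completed_count=2, new_stop=9, dist=dist)
```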

Database Design

Route plans in PostgreSQL: routes table (route_id, vehicle_id, driver_id, planned_date, status), route_stops table (stop_id, route_id, sequence, address_id, time_window_start, time_window_end, planned_eta, actual_eta, status). A spatial index on stop coordinates enables proximity queries for re-optimization. The distance matrix cache in Redis uses a hash structure: key = "dtm:{lat1_r}:{lng1_r}:{lat2_r}:{lng2_r}", value = travel_time_seconds, TTL = 24 hours. Historical route performance (planned vs. actual ETAs) is stored in Redshift for ML model training to improve future ETA predictions.

API Design

  • POST /v1/routes/optimize — Accepts region_id, date, and constraint overrides; queues a batch optimization job; returns job_id
  • GET /v1/routes/optimize/{job_id}/status — Polls job status; returns QUEUED, RUNNING, COMPLETED (with route_ids), or FAILED
  • POST /v1/routes/{route_id}/reoptimize — Triggers re-optimization for a specific route given new order or delay event; synchronous with 5s timeout
  • GET /v1/routes/{route_id}/stops — Returns ordered stop list with planned ETAs for driver app rendering
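A dispatch client would drive the batch endpoints with a submit-then-poll loop. The sketch below assumes a generic `api` object with `post`/`get` methods standing in for an HTTP client; the paths mirror the list above, but the response shapes (`state`, `route_ids`, `error`) are assumptions:

```python
import time

def run_batch_optimization(api, region_id, date, poll_sec=2.0, timeout_sec=60.0):
    # submit the batch job; the endpoint returns a job_id immediately
    job = api.post("/v1/routes/optimize", {"region_id": region_id, "date": date})
    deadline = time.monotonic() + timeout_sec
    while time.monotonic() < deadline:
        status = api.get(f"/v1/routes/optimize/{job['job_id']}/status")
        if status["state"] == "COMPLETED":
            return status["route_ids"]
        if status["state"] == "FAILED":
            raise RuntimeError(status.get("error", "optimization failed"))
        time.sleep(poll_sec)              # back off between status polls
    raise TimeoutError("optimization job did not finish within timeout")
```

Polling suits the 30-second batch path; the re-optimize endpoint is synchronous precisely because its 5-second budget fits inside an ordinary request timeout.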

Scaling & Bottlenecks

The solver fleet is the compute bottleneck. Each solver job occupies 4 CPU cores for up to 30 seconds. For 500 simultaneous batch jobs at morning dispatch, 2,000 CPU cores are needed. Kubernetes auto-scaling spins up solver pods from a warm pool (pre-initialized with OR-Tools) within 30 seconds of a demand spike. Pod startup time (JVM + library init) is the critical path; a pool of 100 always-warm pods handles normal operations, with burst scaling for dispatch peaks.

The distance matrix computation is the latency bottleneck for new jobs. The cache reduces API calls by 60%, but for new regions or addresses, pre-warming the cache by geocoding and computing matrices for anticipated delivery addresses during off-peak hours (2–4 AM) eliminates cold-start latency at dispatch time.

Key Trade-offs

  • Optimal vs. fast vs. good-enough — exact solvers (branch-and-bound) guarantee optimality but are impractical for N>20; metaheuristics (LNS, genetic algorithms) find near-optimal solutions in fixed time budgets
  • Batch re-planning vs. continuous re-optimization — re-running the full fleet optimization on every new order is too expensive; focused single-route re-optimization with minimal disruption to other drivers balances quality and driver experience
  • Time window strictness — hard time windows (never violated) produce longer routes; relaxing windows to soft constraints (penalties) finds shorter routes at the cost of occasional late deliveries
  • Centralized solver vs. distributed — a centralized solver sees the full fleet and finds globally optimal assignments; distributed (per-vehicle) solvers are faster but miss cross-vehicle improvements
