System Design: Nutrition & Diet Tracking App
Design a nutrition and diet tracking app like MyFitnessPal supporting food logging, barcode scanning, macro tracking, and meal planning for millions of users. Covers food database design, barcode lookup at scale, and personalized nutritional insights.
Requirements
Functional Requirements:
- Log meals with food items from a crowdsourced + USDA food database (5M+ items)
- Barcode scanning for packaged foods with nutritional data auto-fill
- Custom food and recipe creation with nutritional calculation
- Daily and weekly macro/micronutrient tracking against user goals
- Meal planning: create weekly meal plans and auto-generate shopping lists
- Social features: share meals, follow friends, and participate in nutrition challenges
Non-Functional Requirements:
- Barcode lookup returns results within 300ms for 95th percentile
- Food search returns results within 500ms for 99th percentile across 5M+ items
- Support 10M DAU with peak load at breakfast, lunch, and dinner hours
- User dietary data is health-sensitive; strict PII and HIPAA-adjacent handling
- Food database entries require moderation: crowdsourced items verified before appearing globally
Scale Estimation
For a platform with 10M DAU:
- Food log writes: 3 meals/day × 10M users = 30M entries/day ≈ 347/second average, with 10x meal-time spikes ≈ 3,470/second
- Barcode scans: 30% of food logs use barcode = 9M lookups/day ≈ 104/second
- Food search: 10M users × 5 searches/day = 50M queries/day ≈ 579/second
- Food database size: 5M items × 2KB metadata = 10GB, comfortably in-memory on a search cluster
- Daily nutritional summaries: 10M users × 1 computation/day = 10M aggregations/night
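The back-of-envelope numbers above can be reproduced with a few lines of arithmetic:

```python
# Back-of-envelope scale estimation for 10M DAU (figures from the text above).
DAU = 10_000_000
SECONDS_PER_DAY = 86_400

log_entries_per_day = 3 * DAU                            # 3 meals/day
avg_log_writes_per_sec = log_entries_per_day / SECONDS_PER_DAY
peak_log_writes_per_sec = 10 * avg_log_writes_per_sec    # 10x meal-time spike

barcode_lookups_per_day = int(0.30 * log_entries_per_day)  # 30% of logs
barcode_lookups_per_sec = barcode_lookups_per_day / SECONDS_PER_DAY

search_queries_per_day = 5 * DAU
search_queries_per_sec = search_queries_per_day / SECONDS_PER_DAY

food_db_bytes = 5_000_000 * 2_000                        # 5M items x ~2KB metadata

print(round(avg_log_writes_per_sec))    # ~347
print(round(peak_log_writes_per_sec))   # ~3472
print(round(barcode_lookups_per_sec))   # ~104
print(round(search_queries_per_sec))    # ~579
print(food_db_bytes / 1e9)              # 10.0 GB
```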
High-Level Architecture
The platform is organized around a Food Database Service, a Food Log Service, a Nutrition Analytics Service, and a Meal Planning Service. The Food Database Service is the catalog of all known foods — it is read-heavy and can be aggressively cached. The Food Log Service handles the high-frequency meal logging workflow, writing to a time-series-friendly log store. The Nutrition Analytics Service aggregates logs into daily and weekly nutritional summaries.
Barcode scanning is handled client-side (camera decode) with an immediate HTTP lookup to the Food Database Service using the UPC code. Unrecognized barcodes fall back to the Open Food Facts API or a manual entry flow. Successfully identified items are suggested with one-tap add. New crowd-contributed food items enter a moderation queue before being visible to all users (though they are immediately visible to the contributor).
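A minimal sketch of the resolution chain, with dicts standing in for the catalog and the external API (function and field names are illustrative, not a real client):

```python
# Barcode resolution flow: local catalog first, then the Open Food Facts
# fallback, then manual entry. Stores are stand-in dicts for illustration.
CATALOG = {"012345678905": {"name": "Oat Bar", "calories": 190}}
OPEN_FOOD_FACTS = {"098765432109": {"name": "Sparkling Water", "calories": 0}}

def resolve_barcode(upc: str) -> dict:
    item = CATALOG.get(upc)
    if item:
        return {"source": "catalog", "item": item}
    item = OPEN_FOOD_FACTS.get(upc)          # async lookup in the real system
    if item:
        CATALOG[upc] = item                  # store for future scans
        return {"source": "open_food_facts", "item": item}
    return {"source": "manual_entry", "item": None}   # present manual flow

print(resolve_barcode("012345678905")["source"])  # catalog
print(resolve_barcode("098765432109")["source"])  # open_food_facts
print(resolve_barcode("000000000000")["source"])  # manual_entry
```

Each fallback tier also enriches the tier above it, so the database improves with every unrecognized scan.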
Food search uses Elasticsearch with a 5M-document index. The index is optimized for prefix and fuzzy matching — users type partial food names like "chick" and expect to see "Chicken Breast, Grilled" as the top result. A custom scoring function boosts: USDA-verified items, items the user has logged before (personalized recency), and items with high community log frequency.
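One way to express that ranking is an Elasticsearch `function_score` query; the field names and boost weights below are illustrative assumptions, to be tuned against real query logs:

```python
# Sketch of the search query: fuzzy match on the name field only, with
# boosts for USDA-verified items, the user's own history, and popularity.
def build_food_search_query(prefix: str, user_food_ids: list[str]) -> dict:
    return {
        "query": {
            "function_score": {
                "query": {
                    "match": {
                        "name": {"query": prefix, "fuzziness": "AUTO"}
                    }
                },
                "functions": [
                    # Boost USDA-verified items.
                    {"filter": {"term": {"source": "usda"}}, "weight": 2.0},
                    # Boost items this user has logged before.
                    {"filter": {"terms": {"food_id": user_food_ids}},
                     "weight": 3.0},
                    # Boost by community log frequency, damped with log1p.
                    {"field_value_factor": {"field": "log_count",
                                            "modifier": "log1p"}},
                ],
                "score_mode": "sum",
                "boost_mode": "multiply",
            }
        },
        "size": 10,
    }

q = build_food_search_query("chick", ["food_123"])
```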
Core Components
Food Database Service
The authoritative catalog. Food items have: food_id, name, brand, upc_code[], serving_size, calories, macros {protein, carbs, fat, fiber}, micros {sodium, potassium, vitamins...}, source ENUM(USDA, branded, user_contributed), status ENUM(active, pending_review, rejected). USDA data is imported from the FoodData Central API and refreshed quarterly. Brand/packaged food data is sourced from commercial databases (Nutritionix, Open Food Facts). User-contributed items go through a moderation queue.
The service is backed by PostgreSQL (source of truth) and Elasticsearch (search). Redis caches hot items (top 100k barcodes cover ~90% of all scans — the Pareto principle applies strongly to packaged food consumption). Barcode lookup hits Redis L1 cache (sub-millisecond) → Elasticsearch L2 (5-10ms) → PostgreSQL L3 (20-50ms). A cache miss on an unknown barcode triggers an async lookup against Open Food Facts and caches the result when the response arrives, so the next scan of that barcode is a hit.
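The tiered lookup can be sketched as follows, with plain dicts simulating the three stores; a miss at one tier falls through and back-fills the hot cache:

```python
# Tiered barcode lookup (Redis L1 -> Elasticsearch L2 -> PostgreSQL L3),
# simulated with dicts. Names and data are illustrative.
redis_l1 = {}
elastic_l2 = {"036000291452": {"name": "Greek Yogurt"}}
postgres_l3 = {"036000291452": {"name": "Greek Yogurt"},
               "041420027161": {"name": "Trail Mix"}}

def barcode_lookup(upc: str):
    if upc in redis_l1:                      # sub-millisecond path
        return redis_l1[upc]
    item = elastic_l2.get(upc) or postgres_l3.get(upc)   # 5-50ms path
    if item is not None:
        redis_l1[upc] = item                 # back-fill the hot cache
    return item

barcode_lookup("041420027161")       # served from PostgreSQL, then cached
print("041420027161" in redis_l1)    # True
```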
Food Log Service
Handles meal logging with high write throughput at meal times. Log entries: log_id, user_id, food_id, meal_type ENUM(breakfast, lunch, dinner, snack), serving_quantity, log_date, created_at. Log writes go to PostgreSQL via a write-through cache — the current day's log is cached in Redis per user for instant read-back and daily total calculations. Edits and deletions update both the database and cache. The daily log cache is pre-populated on first meal log of the day and expires at midnight.
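The midnight expiry on the per-user daily cache amounts to a TTL computed at first write. A small helper, assuming for simplicity that midnight is taken in a single server timezone (key format is illustrative):

```python
# Compute the cache key and the TTL (seconds until midnight) for the
# per-user daily food log cache described above.
from datetime import datetime, timedelta

def seconds_until_midnight(now: datetime) -> int:
    next_midnight = datetime.combine(now.date() + timedelta(days=1),
                                     datetime.min.time())
    return int((next_midnight - now).total_seconds())

def daily_log_cache_key(user_id: int, now: datetime) -> str:
    return f"daily_log:{user_id}:{now.date().isoformat()}"

now = datetime(2024, 3, 1, 18, 30)
print(daily_log_cache_key(42, now))    # daily_log:42:2024-03-01
print(seconds_until_midnight(now))     # 19800 (5.5 hours)
```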
Nutrition Analytics Service
Aggregates food logs into daily and weekly nutritional totals. Real-time daily totals are maintained in Redis as the user logs food — each food log write increments the day's macro counters atomically. Weekly and historical trend computation runs as a nightly batch job, writing results to a nutrition_summaries table: (user_id, period_type, period_start, calories, protein, carbs, fat, fiber, ...). These power the weekly review charts and goal attainment tracking. Micronutrient gap analysis (identifying consistent deficiencies) runs weekly per user and surfaces recommendations.
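The real-time counter path can be sketched as below. In Redis this would be `HINCRBYFLOAT` on a per-user, per-day hash; here a dict stands in for the hash, and the key and field names are illustrative:

```python
# Real-time daily totals: each food log write atomically increments the
# day's macro counters (HINCRBYFLOAT equivalent on a per-user hash).
from collections import defaultdict

daily_totals = defaultdict(lambda: defaultdict(float))  # key -> macro -> total

def record_log(user_id: int, date: str, macros: dict) -> None:
    key = f"nutrition:{user_id}:{date}"
    for macro, amount in macros.items():
        daily_totals[key][macro] += amount

record_log(42, "2024-03-01",
           {"calories": 420, "protein": 35, "carbs": 30, "fat": 16})
record_log(42, "2024-03-01",
           {"calories": 180, "protein": 6, "carbs": 28, "fat": 5})
print(dict(daily_totals["nutrition:42:2024-03-01"]))
# calories 600.0, protein 41.0, carbs 58.0, fat 21.0
```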
Database Design
Food items in PostgreSQL: partitioned by source for efficient USDA vs. user-contributed management. Elasticsearch index mirrors search-relevant fields. User food logs in PostgreSQL: food_logs (log_id UUID, user_id, food_id, meal_type, log_date DATE, serving_grams DECIMAL, calories DECIMAL, protein DECIMAL, carbs DECIMAL, fat DECIMAL, logged_at TIMESTAMP). Macro values are denormalized from the food item at log time — food item data can change, but historical logs reflect what was true when the user logged it.
Custom recipes: recipes (recipe_id, user_id, name, servings, ingredients JSONB) where each ingredient is {food_id, quantity_grams}. Recipe nutritional totals are computed on save and stored as a derived food item. Meal plans: meal_plans (plan_id, user_id, week_start, meals JSONB) where meals is a nested structure mapping {day, meal_type, food_id, quantity}. Shopping list generation queries the meal plan and aggregates ingredient quantities.
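Both derivations above are simple aggregations. A sketch of per-serving recipe totals and shopping-list rollup, with illustrative per-100g food data and ingredient shapes:

```python
# Recipe nutrition per serving, and shopping-list aggregation across a
# meal plan. FOODS holds illustrative per-100g macro data.
FOODS = {
    "chicken": {"calories": 165, "protein": 31},
    "rice":    {"calories": 130, "protein": 3.0},
}

def recipe_totals(ingredients: list[dict], servings: int) -> dict:
    totals = {"calories": 0.0, "protein": 0.0}
    for ing in ingredients:
        food = FOODS[ing["food_id"]]
        factor = ing["quantity_grams"] / 100
        for macro in totals:
            totals[macro] += food[macro] * factor
    return {m: round(v / servings, 1) for m, v in totals.items()}

def shopping_list(meals: list[dict]) -> dict:
    needed: dict[str, float] = {}
    for meal in meals:
        needed[meal["food_id"]] = (
            needed.get(meal["food_id"], 0) + meal["quantity_grams"])
    return needed

per_serving = recipe_totals(
    [{"food_id": "chicken", "quantity_grams": 400},
     {"food_id": "rice", "quantity_grams": 200}], servings=4)
print(per_serving)   # {'calories': 230.0, 'protein': 32.5}
```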
API Design
GET /api/v1/foods/barcode/{upc} — returns food item for a barcode; 300ms SLA from cache/Elasticsearch.
GET /api/v1/foods/search?q={query}&limit=10 — typeahead food search; personalized ranking.
POST /api/v1/logs — body: {food_id, meal_type, serving_quantity, log_date}; updates daily macro totals.
GET /api/v1/users/{userId}/nutrition/summary?date={date} — returns daily nutritional totals vs. goals from Redis cache.
Scaling & Bottlenecks
Meal-time traffic spikes (7-9 AM breakfast, 12-1 PM lunch, 6-7 PM dinner) create predictable burst patterns at 10x average load. Auto-scaling the Food Log Service and pre-warming the Redis cache for the expected user cohort (timezone-based prediction of active users) handles the spikes. The Food Database Service is the most read-heavy component — the top 100k foods by log frequency are pinned in Redis with no TTL, serving 90%+ of lookup traffic without hitting Elasticsearch or PostgreSQL.
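The timezone-based prediction reduces to selecting, an hour ahead of each meal window, the zones whose local clock is approaching it. A stdlib-only sketch (meal hours and the one-hour lead are assumed parameters):

```python
# Pick timezones to pre-warm: those whose local time is one hour ahead
# of a meal window start. Uses only the stdlib zoneinfo database.
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

MEAL_HOURS = {7, 12, 18}   # breakfast, lunch, dinner start hours

def zones_to_prewarm(utc_now: datetime, zones: list[str]) -> list[str]:
    out = []
    for name in zones:
        local = utc_now.astimezone(ZoneInfo(name))
        if (local.hour + 1) % 24 in MEAL_HOURS:   # one hour before a meal
            out.append(name)
    return out

now = datetime(2024, 3, 1, 11, 0, tzinfo=timezone.utc)
print(zones_to_prewarm(now, ["UTC", "America/New_York", "Asia/Tokyo"]))
# ['UTC', 'America/New_York']  (11:00 UTC -> lunch; 06:00 EST -> breakfast)
```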
Food search latency is sensitive to index size and query complexity. Fuzzy matching on 5M documents requires careful Elasticsearch tuning: fuzziness: AUTO only on the primary name field, not on all fields; edge n-gram analyzers for prefix matching, which is more common than fuzzy matching in food search since users type the beginning of a food name. A dedicated Elasticsearch coordinating node handles query parsing and aggregation without competing for data node resources.
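A possible index settings fragment for the prefix-first strategy: edge n-grams are applied at index time only, with a plain analyzer at query time so the query itself is not n-grammed. Gram lengths and analyzer names are illustrative:

```python
# Elasticsearch index settings sketch: edge_ngram filter for index-time
# prefix expansion on the name field, standard analyzer at search time.
food_index_settings = {
    "settings": {
        "analysis": {
            "filter": {
                "name_edge_ngram": {
                    "type": "edge_ngram",
                    "min_gram": 2,
                    "max_gram": 15,
                }
            },
            "analyzer": {
                "name_prefix": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "name_edge_ngram"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "name": {
                "type": "text",
                "analyzer": "name_prefix",        # index-time n-grams
                "search_analyzer": "standard",    # query stays whole
            }
        }
    },
}
```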
Key Trade-offs
- Crowdsourced vs. verified-only food data: Allowing user-contributed items dramatically expands the database but introduces inaccurate nutritional data; a moderation pipeline with community flagging and automated anomaly detection (calorie density outliers) balances coverage and accuracy.
- Real-time vs. eventual consistency for daily totals: Real-time Redis counters give instant feedback as users log meals; eventual consistency via end-of-day batch aggregation is simpler but users won't see real-time macro tracking — a non-starter for the core use case.
- Denormalized macros in log vs. join to food table: Denormalizing macro values at log time adds storage but ensures historical logs are accurate even if the food database entry is later corrected; a live join would show updated nutritional values but could make historical trends misleading.
- Barcode-first vs. search-first design: Barcode scanning is faster for packaged foods but fails for fresh/restaurant items; the design must gracefully degrade from barcode to search to manual entry, with each step collecting data to improve the database for future users.
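The automated anomaly detection mentioned for crowdsourced entries can be as simple as a calorie-density sanity check: pure fat is about 9 kcal/g, so no food can plausibly exceed that. A minimal sketch with an assumed threshold:

```python
# Flag crowdsourced food entries whose calorie density is physically
# implausible. Threshold is illustrative; real moderation would combine
# this with community flagging.
MAX_KCAL_PER_GRAM = 9.0   # upper bound: pure fat

def flag_calorie_outlier(calories: float, serving_grams: float) -> bool:
    if serving_grams <= 0:
        return True                       # malformed entry
    density = calories / serving_grams
    return density > MAX_KCAL_PER_GRAM    # physically implausible

print(flag_calorie_outlier(250, 100))    # False: 2.5 kcal/g, plausible
print(flag_calorie_outlier(2500, 100))   # True: 25 kcal/g, impossible
```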