System Design: Nutrition & Diet Tracking App
Design a nutrition and diet tracking app like MyFitnessPal supporting food logging, barcode scanning, macro tracking, and meal planning for millions of users. Covers food database design, barcode lookup at scale, and personalized nutritional insights.
Requirements
Functional Requirements:
- Log meals with food items from a crowdsourced + USDA food database (5M+ items)
- Barcode scanning for packaged foods with nutritional data auto-fill
- Custom food and recipe creation with nutritional calculation
- Daily and weekly macro/micronutrient tracking against user goals
- Meal planning: create weekly meal plans and auto-generate shopping lists
- Social features: share meals, follow friends, and participate in nutrition challenges
Non-Functional Requirements:
- Barcode lookup returns results within 300ms for 95th percentile
- Food search returns results within 500ms for 99th percentile across 5M+ items
- Support 10M DAU with peak load at breakfast, lunch, and dinner hours
- User dietary data is health-sensitive; strict PII and HIPAA-adjacent handling
- Food database entries require moderation: crowdsourced items verified before appearing globally
Scale Estimation
For a platform with 10M DAU:
- Food log writes: 3 meals/day × 10M users = 30M entries/day ≈ 347/second average, with 10x meal-time spikes ≈ 3,470/second
- Barcode scans: 30% of food logs use barcode = 9M lookups/day ≈ 104/second
- Food search: 10M users × 5 searches/day = 50M queries/day ≈ 579/second
- Food database size: 5M items × 2KB metadata = 10GB, comfortably in-memory on a search cluster
- Daily nutritional summaries: 10M users × 1 computation/day = 10M aggregations/night
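The back-of-envelope numbers above can be reproduced with a few lines of arithmetic:

```python
# Back-of-envelope scale estimation for 10M DAU (figures from the text above).
DAU = 10_000_000
SECONDS_PER_DAY = 86_400

log_entries_per_day = 3 * DAU                            # 3 meals/day
avg_log_writes_per_sec = log_entries_per_day / SECONDS_PER_DAY
peak_log_writes_per_sec = 10 * avg_log_writes_per_sec    # 10x meal-time spike

barcode_lookups_per_day = int(0.30 * log_entries_per_day)  # 30% of logs
barcode_lookups_per_sec = barcode_lookups_per_day / SECONDS_PER_DAY

search_queries_per_day = 5 * DAU
search_queries_per_sec = search_queries_per_day / SECONDS_PER_DAY

food_db_bytes = 5_000_000 * 2_000                        # 5M items x ~2KB metadata

print(round(avg_log_writes_per_sec))    # ~347
print(round(peak_log_writes_per_sec))   # ~3472
print(round(barcode_lookups_per_sec))   # ~104
print(round(search_queries_per_sec))    # ~579
print(food_db_bytes / 1e9)              # 10.0 GB
```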
High-Level Architecture
The platform is organized around a Food Database Service, a Food Log Service, a Nutrition Analytics Service, and a Meal Planning Service. The Food Database Service is the catalog of all known foods — it is read-heavy and can be aggressively cached. The Food Log Service handles the high-frequency meal logging workflow, writing to a time-series-friendly log store. The Nutrition Analytics Service aggregates logs into daily and weekly nutritional summaries.
Barcode scanning is handled client-side (camera decode) with an immediate HTTP lookup to the Food Database Service using the UPC code. Unrecognized barcodes fall back to the Open Food Facts API or a manual entry flow. Successfully identified items are suggested with one-tap add. New crowd-contributed food items enter a moderation queue before being visible to all users (though they are immediately visible to the contributor).
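A minimal sketch of the resolution chain, with dicts standing in for the catalog and the external API (function and field names are illustrative, not a real client):

```python
# Barcode resolution flow: local catalog first, then the Open Food Facts
# fallback, then manual entry. Stores are stand-in dicts for illustration.
CATALOG = {"012345678905": {"name": "Oat Bar", "calories": 190}}
OPEN_FOOD_FACTS = {"098765432109": {"name": "Sparkling Water", "calories": 0}}

def resolve_barcode(upc: str) -> dict:
    item = CATALOG.get(upc)
    if item:
        return {"source": "catalog", "item": item}
    item = OPEN_FOOD_FACTS.get(upc)          # async lookup in the real system
    if item:
        CATALOG[upc] = item                  # store for future scans
        return {"source": "open_food_facts", "item": item}
    return {"source": "manual_entry", "item": None}   # present manual flow

print(resolve_barcode("012345678905")["source"])  # catalog
print(resolve_barcode("098765432109")["source"])  # open_food_facts
print(resolve_barcode("000000000000")["source"])  # manual_entry
```

Each fallback tier also enriches the tier above it, so the database improves with every unrecognized scan.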
Food search uses Elasticsearch with a 5M-document index. The index is optimized for prefix and fuzzy matching — users type partial food names like "chick" and expect to see "Chicken Breast, Grilled" as the top result. A custom scoring function boosts: USDA-verified items, items the user has logged before (personalized recency), and items with high community log frequency.
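One way to express that ranking is an Elasticsearch `function_score` query; the field names and boost weights below are illustrative assumptions, to be tuned against real query logs:

```python
# Sketch of the search query: fuzzy match on the name field only, with
# boosts for USDA-verified items, the user's own history, and popularity.
def build_food_search_query(prefix: str, user_food_ids: list[str]) -> dict:
    return {
        "query": {
            "function_score": {
                "query": {
                    "match": {
                        "name": {"query": prefix, "fuzziness": "AUTO"}
                    }
                },
                "functions": [
                    # Boost USDA-verified items.
                    {"filter": {"term": {"source": "usda"}}, "weight": 2.0},
                    # Boost items this user has logged before.
                    {"filter": {"terms": {"food_id": user_food_ids}},
                     "weight": 3.0},
                    # Boost by community log frequency, damped with log1p.
                    {"field_value_factor": {"field": "log_count",
                                            "modifier": "log1p"}},
                ],
                "score_mode": "sum",
                "boost_mode": "multiply",
            }
        },
        "size": 10,
    }

q = build_food_search_query("chick", ["food_123"])
```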
Core Components
Food Database Service
The authoritative catalog. Food items have: food_id, name, brand, upc_code[], serving_size, calories, macros {protein, carbs, fat, fiber}, micros {sodium, potassium, vitamins...}, source ENUM(USDA, branded, user_contributed), status ENUM(active, pending_review, rejected). USDA data is imported from the FoodData Central API and refreshed quarterly. Brand/packaged food data is sourced from commercial databases (Nutritionix, Open Food Facts). User-contributed items go through a moderation queue.
The service is backed by PostgreSQL (source of truth) and Elasticsearch (search). Redis caches hot items (top 100k barcodes cover ~90% of all scans — the Pareto principle applies strongly to packaged food consumption). Barcode lookup hits Redis L1 cache (sub-millisecond) → Elasticsearch L2 (5-10ms) → PostgreSQL L3 (20-50ms). A cache miss on an unknown barcode triggers an async lookup against Open Food Facts and caches the result when the response arrives, so the next scan of that barcode is a hit.
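The tiered lookup can be sketched as follows, with plain dicts simulating the three stores; a miss at one tier falls through and back-fills the hot cache:

```python
# Tiered barcode lookup (Redis L1 -> Elasticsearch L2 -> PostgreSQL L3),
# simulated with dicts. Names and data are illustrative.
redis_l1 = {}
elastic_l2 = {"036000291452": {"name": "Greek Yogurt"}}
postgres_l3 = {"036000291452": {"name": "Greek Yogurt"},
               "041420027161": {"name": "Trail Mix"}}

def barcode_lookup(upc: str):
    if upc in redis_l1:                      # sub-millisecond path
        return redis_l1[upc]
    item = elastic_l2.get(upc) or postgres_l3.get(upc)   # 5-50ms path
    if item is not None:
        redis_l1[upc] = item                 # back-fill the hot cache
    return item

barcode_lookup("041420027161")       # served from PostgreSQL, then cached
print("041420027161" in redis_l1)    # True
```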
Food Log Service
Handles meal logging with high write throughput at meal times. Log entries: log_id, user_id, food_id, meal_type ENUM(breakfast, lunch, dinner, snack), serving_quantity, log_date, created_at. Log writes go to PostgreSQL via a write-through cache — the current day's log is cached in Redis per user for instant read-back and daily total calculations. Edits and deletions update both the database and cache. The daily log cache is pre-populated on first meal log of the day and expires at midnight.
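The midnight expiry on the per-user daily cache amounts to a TTL computed at first write. A small helper, assuming for simplicity that midnight is taken in a single server timezone (key format is illustrative):

```python
# Compute the cache key and the TTL (seconds until midnight) for the
# per-user daily food log cache described above.
from datetime import datetime, timedelta

def seconds_until_midnight(now: datetime) -> int:
    next_midnight = datetime.combine(now.date() + timedelta(days=1),
                                     datetime.min.time())
    return int((next_midnight - now).total_seconds())

def daily_log_cache_key(user_id: int, now: datetime) -> str:
    return f"daily_log:{user_id}:{now.date().isoformat()}"

now = datetime(2024, 3, 1, 18, 30)
print(daily_log_cache_key(42, now))    # daily_log:42:2024-03-01
print(seconds_until_midnight(now))     # 19800 (5.5 hours)
```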
Nutrition Analytics Service
Aggregates food logs into daily and weekly nutritional totals. Real-time daily totals are maintained in Redis as the user logs food — each food log write increments the day's macro counters atomically. Weekly and historical trend computation runs as a nightly batch job, writing results to a nutrition_summaries table: (user_id, period_type, period_start, calories, protein, carbs, fat, fiber, ...). These power the weekly review charts and goal attainment tracking. Micronutrient gap analysis (identifying consistent deficiencies) runs weekly per user and surfaces recommendations.
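The real-time counter path can be sketched as below. In Redis this would be `HINCRBYFLOAT` on a per-user, per-day hash; here a dict stands in for the hash, and the key and field names are illustrative:

```python
# Real-time daily totals: each food log write atomically increments the
# day's macro counters (HINCRBYFLOAT equivalent on a per-user hash).
from collections import defaultdict

daily_totals = defaultdict(lambda: defaultdict(float))  # key -> macro -> total

def record_log(user_id: int, date: str, macros: dict) -> None:
    key = f"nutrition:{user_id}:{date}"
    for macro, amount in macros.items():
        daily_totals[key][macro] += amount

record_log(42, "2024-03-01",
           {"calories": 420, "protein": 35, "carbs": 30, "fat": 16})
record_log(42, "2024-03-01",
           {"calories": 180, "protein": 6, "carbs": 28, "fat": 5})
print(dict(daily_totals["nutrition:42:2024-03-01"]))
# calories 600.0, protein 41.0, carbs 58.0, fat 21.0
```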
Database Design
Food items in PostgreSQL: partitioned by source for efficient USDA vs. user-contributed management. Elasticsearch index mirrors search-relevant fields. User food logs in PostgreSQL: food_logs (log_id UUID, user_id, food_id, meal_type, log_date DATE, serving_grams DECIMAL, calories DECIMAL, protein DECIMAL, carbs DECIMAL, fat DECIMAL, logged_at TIMESTAMP). Macro values are denormalized from the food item at log time — food item data can change, but historical logs reflect what was true when the user logged it.
Custom recipes: recipes (recipe_id, user_id, name, servings, ingredients JSONB) where each ingredient is {food_id, quantity_grams}. Recipe nutritional totals are computed on save and stored as a derived food item. Meal plans: meal_plans (plan_id, user_id, week_start, meals JSONB) where meals is a nested structure mapping {day, meal_type, food_id, quantity}. Shopping list generation queries the meal plan and aggregates ingredient quantities.
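Both derivations above are simple aggregations. A sketch of per-serving recipe totals and shopping-list rollup, with illustrative per-100g food data and ingredient shapes:

```python
# Recipe nutrition per serving, and shopping-list aggregation across a
# meal plan. FOODS holds illustrative per-100g macro data.
FOODS = {
    "chicken": {"calories": 165, "protein": 31},
    "rice":    {"calories": 130, "protein": 3.0},
}

def recipe_totals(ingredients: list[dict], servings: int) -> dict:
    totals = {"calories": 0.0, "protein": 0.0}
    for ing in ingredients:
        food = FOODS[ing["food_id"]]
        factor = ing["quantity_grams"] / 100
        for macro in totals:
            totals[macro] += food[macro] * factor
    return {m: round(v / servings, 1) for m, v in totals.items()}

def shopping_list(meals: list[dict]) -> dict:
    needed: dict[str, float] = {}
    for meal in meals:
        needed[meal["food_id"]] = (
            needed.get(meal["food_id"], 0) + meal["quantity_grams"])
    return needed

per_serving = recipe_totals(
    [{"food_id": "chicken", "quantity_grams": 400},
     {"food_id": "rice", "quantity_grams": 200}], servings=4)
print(per_serving)   # {'calories': 230.0, 'protein': 32.5}
```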
API Design
GET /api/v1/foods/barcode/{upc} — returns food item for a barcode; 300ms SLA from cache/Elasticsearch.
GET /api/v1/foods/search?q={query}&limit=10 — typeahead food search; personalized ranking.
POST /api/v1/logs — body: {food_id, meal_type, serving_quantity, log_date}; updates daily macro totals.
GET /api/v1/users/{userId}/nutrition/summary?date={date} — returns daily nutritional totals vs. goals from Redis cache.
Scaling & Bottlenecks
Meal-time traffic spikes (7-9 AM breakfast, 12-1 PM lunch, 6-7 PM dinner) create predictable burst patterns at 10x average load. Auto-scaling the Food Log Service and pre-warming the Redis cache for the expected user cohort (timezone-based prediction of active users) handles the spikes. The Food Database Service is the most read-heavy component — the top 100k foods by log frequency are pinned in Redis with no TTL, serving 90%+ of lookup traffic without hitting Elasticsearch or PostgreSQL.
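The timezone-based prediction reduces to selecting, an hour ahead of each meal window, the zones whose local clock is approaching it. A stdlib-only sketch (meal hours and the one-hour lead are assumed parameters):

```python
# Pick timezones to pre-warm: those whose local time is one hour ahead
# of a meal window start. Uses only the stdlib zoneinfo database.
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

MEAL_HOURS = {7, 12, 18}   # breakfast, lunch, dinner start hours

def zones_to_prewarm(utc_now: datetime, zones: list[str]) -> list[str]:
    out = []
    for name in zones:
        local = utc_now.astimezone(ZoneInfo(name))
        if (local.hour + 1) % 24 in MEAL_HOURS:   # one hour before a meal
            out.append(name)
    return out

now = datetime(2024, 3, 1, 11, 0, tzinfo=timezone.utc)
print(zones_to_prewarm(now, ["UTC", "America/New_York", "Asia/Tokyo"]))
# ['UTC', 'America/New_York']  (11:00 UTC -> lunch; 06:00 EST -> breakfast)
```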
Food search latency is sensitive to index size and query complexity. Fuzzy matching on 5M documents requires careful Elasticsearch tuning: fuzziness: AUTO only on the primary name field, not on all fields; edge n-gram analyzers for prefix matching, which is more common than fuzzy matching in food search since users type the beginning of a food name. A dedicated Elasticsearch coordinating node handles query parsing and aggregation without competing for data node resources.
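A possible index settings fragment for the prefix-first strategy: edge n-grams are applied at index time only, with a plain analyzer at query time so the query itself is not n-grammed. Gram lengths and analyzer names are illustrative:

```python
# Elasticsearch index settings sketch: edge_ngram filter for index-time
# prefix expansion on the name field, standard analyzer at search time.
food_index_settings = {
    "settings": {
        "analysis": {
            "filter": {
                "name_edge_ngram": {
                    "type": "edge_ngram",
                    "min_gram": 2,
                    "max_gram": 15,
                }
            },
            "analyzer": {
                "name_prefix": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "name_edge_ngram"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "name": {
                "type": "text",
                "analyzer": "name_prefix",        # index-time n-grams
                "search_analyzer": "standard",    # query stays whole
            }
        }
    },
}
```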
Key Trade-offs
- Crowdsourced vs. verified-only food data: Allowing user-contributed items dramatically expands the database but introduces inaccurate nutritional data; a moderation pipeline with community flagging and automated anomaly detection (calorie density outliers) balances coverage and accuracy.
- Real-time vs. eventual consistency for daily totals: Real-time Redis counters give instant feedback as users log meals; eventual consistency via end-of-day batch aggregation is simpler but users won't see real-time macro tracking — a non-starter for the core use case.
- Denormalized macros in log vs. join to food table: Denormalizing macro values at log time adds storage but ensures historical logs are accurate even if the food database entry is later corrected; a live join would show updated nutritional values but could make historical trends misleading.
- Barcode-first vs. search-first design: Barcode scanning is faster for packaged foods but fails for fresh/restaurant items; the design must gracefully degrade from barcode to search to manual entry, with each step collecting data to improve the database for future users.
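The automated anomaly detection mentioned for crowdsourced entries can be as simple as a calorie-density sanity check: pure fat is about 9 kcal/g, so no food can plausibly exceed that. A minimal sketch with an assumed threshold:

```python
# Flag crowdsourced food entries whose calorie density is physically
# implausible. Threshold is illustrative; real moderation would combine
# this with community flagging.
MAX_KCAL_PER_GRAM = 9.0   # upper bound: pure fat

def flag_calorie_outlier(calories: float, serving_grams: float) -> bool:
    if serving_grams <= 0:
        return True                       # malformed entry
    density = calories / serving_grams
    return density > MAX_KCAL_PER_GRAM    # physically implausible

print(flag_calorie_outlier(250, 100))    # False: 2.5 kcal/g, plausible
print(flag_calorie_outlier(2500, 100))   # True: 25 kcal/g, impossible
```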