
System Design: Reporting & BI System

Design a scalable reporting and business intelligence system that supports scheduled report generation, ad hoc querying, and self-service analytics for thousands of business users. Covers report scheduling, query optimization, caching, and delivery mechanisms.

12 min read · Updated Jan 15, 2025

Tags: system-design, bi, reporting, analytics, data-engineering

Requirements

Functional Requirements:

  • Create and schedule reports to run on cron schedules and deliver via email, Slack, or S3
  • Support interactive ad hoc SQL queries via a web-based query editor
  • Build dashboards by composing multiple charts and KPI tiles from saved queries
  • Embed reports and dashboards in external applications via iframe or API
  • Manage user access: organization-wide, team-level, and individual report permissions
  • Export reports in PDF, Excel, CSV, and PNG formats

Non-Functional Requirements:

  • Interactive query responses under 5 seconds for 80% of queries
  • Scheduled reports must complete within 30 minutes of their scheduled time
  • Support 10,000 concurrent users with 500 simultaneous query executions
  • Report delivery success rate of 99.5%
  • Multi-region deployment with data residency controls for EU customers

Scale Estimation

With 10,000 active users, peak concurrency is 500 simultaneous queries, and the average query scans 50 GB of warehouse data. A 10-node Trino cluster (200 cores total) sustains about 50 concurrent queries at full parallelism, roughly 4 cores per query. A query cache with an 80% hit rate cuts the effective load to 100 concurrent queries against the warehouse. Scheduled reports: 5,000 run daily, 1,000 of them concentrated in the morning peak hour, which averages about 17 report starts per minute with bursts up to 100.
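The back-of-envelope numbers above can be checked in a few lines (a sketch; every input is an assumed figure from this section, not a measurement):

```python
# Capacity estimate using the assumptions stated in the text.
peak_concurrent_queries = 500    # simultaneous query executions at peak
cache_hit_rate = 0.80            # assumed query-cache hit rate

# Only cache misses reach the warehouse.
warehouse_concurrency = round(peak_concurrent_queries * (1 - cache_hit_rate))

daily_reports = 5_000
morning_peak_reports = 1_000     # concentrated in the single peak hour
avg_reports_per_minute = round(morning_peak_reports / 60)

print(warehouse_concurrency)     # 100 queries hit the warehouse at peak
print(avg_reports_per_minute)    # 17 report starts per minute on average
```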

High-Level Architecture

The BI system consists of three main services: Query Engine, Report Scheduler, and Delivery Service. The Query Engine translates semantic layer definitions (metrics, dimensions) into SQL, submits it to the connected data warehouse (Snowflake, BigQuery, Redshift, or Trino), caches results, and serves them to visualization components. The Report Scheduler triggers report regeneration based on cron schedules and dependency completion events from the data pipeline.

A semantic layer (LookML, dbt metrics, or a custom YAML format) defines business metrics as named measures and dimensions, decoupling report definitions from raw SQL and enabling consistent metric definitions across all reports. The query builder UI translates drag-and-drop metric selections into semantic layer queries, which the query engine compiles to optimized SQL. This prevents 50 different SQL variants for the same KPI.

The delivery service renders report outputs (PDF via headless Chrome, Excel via Apache POI, PNG via chart rendering service) and dispatches them to delivery channels. A delivery receipt system tracks email open rates, bounce handling, and retry logic with DLQ for failed deliveries. Embedded analytics generates signed JWT tokens for tenant-isolated iframe embedding.
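The retry-with-DLQ path described above can be sketched as follows. This is a minimal illustration: the `send` callable, the attempt limit, and the in-memory list standing in for the dead-letter queue are all assumptions, not the real delivery API.

```python
def deliver_with_retries(send, delivery, max_attempts=3, dlq=None):
    """Attempt a delivery up to max_attempts times, then dead-letter it."""
    for attempt in range(1, max_attempts + 1):
        try:
            send(delivery)
            return "delivered"
        except Exception as exc:
            last_error = exc
            # Production code would back off here (e.g., sleep 2**attempt s).
    if dlq is not None:
        dlq.append({"delivery": delivery, "error": str(last_error)})
    return "dead-lettered"

# A send function that fails twice, then succeeds on the third attempt.
calls = {"n": 0}
def flaky_send(delivery):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("smtp timeout")

print(deliver_with_retries(flaky_send, {"report_id": "rpt-1"}))  # delivered
```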

Core Components

Semantic Layer

The semantic layer stores metric definitions as YAML, e.g. revenue_total: {type: sum, sql: ${orders.amount}, filters: [{status: completed}]}. A compiled metric is always translated to the same SQL regardless of who queries it, ensuring consistent numbers across all reports and dashboards. The layer handles fan-out joins (preventing double-counting when joining multiple fact tables) through symmetric aggregations.
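As a sketch of how such a definition compiles, the toy compiler below (the dict schema, metric names, and SQL fragments are illustrative, not a real semantic-layer format) produces identical SQL for every caller:

```python
# Hypothetical in-memory metric registry mirroring the YAML example above.
METRICS = {
    "revenue_total": {
        "type": "sum",
        "sql": "orders.amount",
        "filters": ["orders.status = 'completed'"],
    },
}

def compile_metric(metric_name, dimensions, table="orders"):
    """Compile a metric + dimension selection into deterministic SQL."""
    m = METRICS[metric_name]
    select = ", ".join(dimensions + [f"{m['type'].upper()}({m['sql']}) AS {metric_name}"])
    sql = f"SELECT {select} FROM {table}"
    if m["filters"]:
        sql += " WHERE " + " AND ".join(m["filters"])
    if dimensions:
        sql += " GROUP BY " + ", ".join(dimensions)
    return sql

print(compile_metric("revenue_total", ["orders.region"]))
```

Because every report goes through the same compilation path, two dashboards asking for revenue_total by region always issue byte-identical SQL, which is what makes the cache keys in the next section effective.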

Query Cache

A two-tier cache: L1 is an in-memory LRU cache per query service instance (last 1,000 results, 5-minute TTL); L2 is Redis with results keyed by (query_hash, data_snapshot_id). L2 TTL is configurable per report: 1 minute for real-time dashboards, 24 hours for daily reports. Cache invalidation is triggered by the data pipeline publishing table refresh events; any cached query touching a refreshed table is invalidated.
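The L1 tier can be sketched as a small LRU keyed the same way as L2, by (query_hash, data_snapshot_id). This is a minimal illustration: TTL expiry and the Redis tier are omitted, and all names are assumptions.

```python
import hashlib
from collections import OrderedDict

class L1QueryCache:
    """In-memory LRU result cache keyed by (query_hash, snapshot_id)."""

    def __init__(self, max_entries=1000):
        self.max_entries = max_entries
        self._entries = OrderedDict()

    @staticmethod
    def key(sql, snapshot_id):
        query_hash = hashlib.sha256(sql.encode()).hexdigest()
        return (query_hash, snapshot_id)

    def get(self, sql, snapshot_id):
        k = self.key(sql, snapshot_id)
        if k not in self._entries:
            return None
        self._entries.move_to_end(k)        # mark as most recently used
        return self._entries[k]

    def put(self, sql, snapshot_id, result):
        k = self.key(sql, snapshot_id)
        self._entries[k] = result
        self._entries.move_to_end(k)
        if len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict least recently used

cache = L1QueryCache()
cache.put("SELECT 1", "snap-1", [[1]])
print(cache.get("SELECT 1", "snap-1"))   # [[1]]
print(cache.get("SELECT 1", "snap-2"))   # None: a new snapshot id misses
```

Including the snapshot id in the key means a pipeline refresh naturally invalidates stale entries: the same SQL against fresh data produces a different key.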

Report Scheduler

The scheduler stores cron schedules in PostgreSQL and uses a distributed lock (Redis SET with NX and an expiry) to prevent duplicate executions across scheduler instances. Each scheduled report creates a job in the execution queue (SQS or Kafka). Workers pull jobs, execute queries, render outputs, and invoke the delivery service. SLA tracking records expected vs. actual execution times; jobs exceeding 120% of their historical runtime trigger a warning alert.
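The duplicate-execution guard can be sketched as below. A dict-backed stand-in replaces Redis so the logic runs locally; in production the same semantics come from Redis SET with the NX and PX options. Key names and the TTL are illustrative.

```python
class FakeRedis:
    """Dict-backed stand-in for Redis SET key value NX PX ttl."""

    def __init__(self):
        self._data = {}

    def set_nx_px(self, key, value, ttl_ms, now_ms):
        entry = self._data.get(key)
        if entry is None or entry[1] <= now_ms:   # key absent or lock expired
            self._data[key] = (value, now_ms + ttl_ms)
            return True
        return False

def try_acquire_run(redis, report_id, scheduled_at, worker, now_ms):
    # One lock per (report, schedule tick): only the first instance wins.
    return redis.set_nx_px(f"lock:report:{report_id}:{scheduled_at}",
                           worker, 60_000, now_ms)

r = FakeRedis()
print(try_acquire_run(r, "rpt-42", "2025-01-15T09:00", "sched-a", 0))  # True
print(try_acquire_run(r, "rpt-42", "2025-01-15T09:00", "sched-b", 1))  # False
```

The expiry matters: if the winning scheduler crashes mid-run, the lock lapses after the TTL and the next tick is not blocked forever.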

Database Design

PostgreSQL stores: reports (id, owner_id, name, query_definition_json, schedule_cron, last_run_at, avg_duration_ms), dashboards (id, owner_id, name, layout_json), report_deliveries (id, report_id, run_id, channel, recipient, status, delivered_at), and permissions (resource_type, resource_id, principal_type, principal_id, access_level). Query results are cached in Redis and S3 (for large result sets exceeding 10 MB). Usage analytics (who viewed what, when) are written to a Kafka topic and aggregated into the data warehouse for capacity planning.
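Two of the tables above, sketched in SQLite so the example is self-contained (PostgreSQL in production; types are simplified and columns follow the list in this section):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reports (
    id INTEGER PRIMARY KEY,
    owner_id INTEGER NOT NULL,
    name TEXT NOT NULL,
    query_definition_json TEXT NOT NULL,
    schedule_cron TEXT,            -- NULL for on-demand-only reports
    last_run_at TEXT,
    avg_duration_ms INTEGER
);
CREATE TABLE permissions (
    resource_type TEXT NOT NULL,   -- 'report' or 'dashboard'
    resource_id INTEGER NOT NULL,
    principal_type TEXT NOT NULL,  -- 'org', 'team', or 'user'
    principal_id INTEGER NOT NULL,
    access_level TEXT NOT NULL     -- e.g. 'view', 'edit', 'admin'
);
""")
conn.execute("INSERT INTO reports VALUES "
             "(1, 7, 'Daily Revenue', '{}', '0 8 * * *', NULL, NULL)")
row = conn.execute("SELECT name, schedule_cron FROM reports WHERE id = 1").fetchone()
print(row)   # ('Daily Revenue', '0 8 * * *')
```

The polymorphic permissions shape (resource_type + resource_id, principal_type + principal_id) keeps org-, team-, and user-level grants in one table at the cost of losing foreign-key enforcement on those columns.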

API Design

  • POST /queries — Submit an ad hoc SQL or semantic layer query; returns query_id and estimated execution time.
  • GET /queries/{query_id}/results — Fetch query results as JSON, CSV, or Arrow IPC with pagination.
  • POST /reports/{report_id}/runs — Trigger an immediate report run and delivery.
  • GET /embed/token — Generate a signed JWT for embedding a specific dashboard in an external application with row-level security context.
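The embed-token endpoint can be sketched by signing the claims by hand. This is illustrative only: the claim names and secret are assumptions, and production code should use a vetted JWT library rather than hand-rolled HS256.

```python
import base64, hashlib, hmac, json, time

def b64url(data):
    """Base64url without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def embed_token(dashboard_id, tenant_id, secret, ttl_s=300):
    """Build a short-lived HS256 JWT scoping an embed to one dashboard/tenant."""
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {
        "dashboard_id": dashboard_id,
        "rls": {"tenant_id": tenant_id},   # row-level security context
        "exp": int(time.time()) + ttl_s,   # short TTL limits token replay
    }
    signing_input = (b64url(json.dumps(header).encode()) + "." +
                     b64url(json.dumps(claims).encode()))
    sig = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    return signing_input + "." + b64url(sig)

token = embed_token("dash-7", "tenant-1", b"embed-secret")
print(token.count("."))   # 2: header.payload.signature
```

The embedding host verifies the signature and applies the rls claims as query filters, so a tenant can never widen its own scope by editing the iframe URL.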

Scaling & Bottlenecks

Warehouse query concurrency is the primary bottleneck. A concurrency management layer uses a Redis-backed counting semaphore (an atomic counter capped at the warehouse's optimal concurrency limit; a single SETNX lock would only admit one query at a time) to gate warehouse queries. Excess queries queue with priority based on report SLA urgency. Materialized views in the warehouse pre-compute common aggregations, reducing scan cost for popular reports by 95%.
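The admission-control layer can be sketched as a counting semaphore plus an SLA-priority wait queue. This is an in-process stand-in for the Redis-backed version; the limit, names, and priority scale (lower number = more urgent) are illustrative.

```python
import heapq

class WarehouseGate:
    """Cap in-flight warehouse queries; queue the rest by SLA urgency."""

    def __init__(self, max_concurrency):
        self.max_concurrency = max_concurrency
        self.in_flight = 0
        self._waiting = []    # heap of (priority, seq, query_id)
        self._seq = 0         # tie-breaker keeps equal priorities FIFO

    def submit(self, query_id, priority):
        # Admit immediately while under the cap; otherwise queue by urgency.
        if self.in_flight < self.max_concurrency:
            self.in_flight += 1
            return "running"
        self._seq += 1
        heapq.heappush(self._waiting, (priority, self._seq, query_id))
        return "queued"

    def complete(self):
        # A slot has freed: admit the most urgent waiting query, if any.
        self.in_flight -= 1
        if not self._waiting:
            return None
        _, _, query_id = heapq.heappop(self._waiting)
        self.in_flight += 1
        return query_id

gate = WarehouseGate(max_concurrency=1)
print(gate.submit("q1", priority=5))   # running
print(gate.submit("q2", priority=5))   # queued
print(gate.submit("q3", priority=1))   # queued, but more urgent than q2
print(gate.complete())                 # q3 admitted ahead of q2
```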

PDF report rendering is CPU-intensive (headless Chrome): a dedicated rendering worker pool sized to handle 50 concurrent renders prevents this from blocking query execution. Reports with many pages (>50) are rendered asynchronously with email delivery of the completed PDF; a status polling endpoint and webhook callback allow the UI to notify users when large reports are ready.
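Isolating renders in their own bounded pool can be sketched with a thread pool. The render function is a stand-in for driving headless Chrome, and the output path is hypothetical; the pool size mirrors the 50 concurrent renders above.

```python
from concurrent.futures import ThreadPoolExecutor

def render_pdf(report_id):
    # A real implementation would drive headless Chrome and upload to S3.
    return f"s3://reports/{report_id}.pdf"

# Renders run in a dedicated bounded pool, so headless-Chrome CPU load
# cannot starve the query-execution workers.
render_pool = ThreadPoolExecutor(max_workers=50)
future = render_pool.submit(render_pdf, "rpt-42")
print(future.result())   # s3://reports/rpt-42.pdf
```

For the >50-page asynchronous path, the caller would keep the future (or a job id) and surface completion through the status endpoint or webhook rather than blocking on result().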

Key Trade-offs

  • Semantic layer vs. raw SQL: Semantic layer ensures metric consistency and simplifies report creation for non-technical users but adds abstraction and compilation overhead; advanced users often find it limiting for complex analytical queries.
  • Push (scheduled) vs. pull (interactive) reports: Scheduled reports pre-compute results during off-peak hours, delivering near-instant load times; interactive reports require on-demand query execution but offer full flexibility.
  • Embedded vs. warehouse caching: Embedding results in the BI tool's cache reduces warehouse load but creates stale data risk; query-time warehouse hits guarantee freshness at the cost of latency.
  • Multi-warehouse federation vs. single warehouse: Federating across multiple warehouses enables data residency compliance but complicates joins across sources; a centralized warehouse with data replication is simpler but may violate data sovereignty requirements.
