Time-Series Data Modeling Explained: Storage, Indexing, and Query Patterns
How to model time-series data effectively — partitioning by time, downsampling, retention policies, and choosing between TimescaleDB, InfluxDB, and Cassandra.
Time-series data modeling is the practice of designing schemas, storage, and query patterns for data that is indexed primarily by time — metrics, logs, IoT sensor readings, financial ticks, and event streams.
What It Really Means
Time-series data has unique characteristics that make general-purpose databases a poor fit at scale. The data is append-heavy (historical metrics are rarely updated). Queries almost always filter by time range. Old data loses value over time and can be downsampled or deleted. And the volume is enormous — a fleet of 10,000 servers reporting 100 metrics every 10 seconds generates 100,000 data points per second, or 6 million per minute.
General-purpose databases like PostgreSQL can handle small-scale time-series workloads, but they struggle with the write throughput and query patterns at scale. Specialized time-series databases (TimescaleDB, InfluxDB, QuestDB, ClickHouse) optimize for these access patterns with columnar storage, automatic time-based partitioning, built-in downsampling, and efficient time-range queries.
Understanding time-series data modeling is essential for designing monitoring systems, IoT platforms, financial trading systems, and analytics pipelines.
How It Works in Practice
Data Characteristics
- Append-heavy: historical data points are rarely updated
- Time-range filters dominate the query workload
- Value decays with age, enabling downsampling and deletion
- High, sustained write volume
Schema Design Approaches
Wide table (one metric per column):
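A wide table keeps one column per metric, which assumes a fixed, known metric set. A minimal sketch (table and column names here are illustrative, not from the original):

```sql
-- Wide table: one row per (time, host), one column per metric.
-- Compact to store and fast to query when the metric set is stable.
CREATE TABLE server_metrics (
    time        TIMESTAMPTZ NOT NULL,
    host_id     INT         NOT NULL,
    cpu_percent DOUBLE PRECISION,
    mem_used_mb DOUBLE PRECISION,
    disk_io_ops DOUBLE PRECISION,
    PRIMARY KEY (time, host_id)  -- composite key: many hosts share a timestamp
);
```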
Narrow table (EAV style):
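A narrow (entity-attribute-value) table stores one row per reading, trading storage efficiency for flexibility when metric names change often. A sketch with illustrative names:

```sql
-- Narrow table: one row per individual metric reading.
-- Flexible (new metrics need no schema change), but repeats the
-- metric name on every row and produces many more rows per scrape.
CREATE TABLE metrics (
    time        TIMESTAMPTZ NOT NULL,
    host_id     INT         NOT NULL,
    metric_name TEXT        NOT NULL,
    value       DOUBLE PRECISION NOT NULL,
    PRIMARY KEY (time, host_id, metric_name)
);
```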
Time-Based Partitioning
Time-based partitioning gives two major benefits:
- Fast range queries: a query for "the last 24 hours" scans only 1-2 partitions instead of the entire table
- Efficient data retention: old partitions are dropped instantly, instead of running a bulk DELETE (which is slow and creates bloat)
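Both benefits can be seen in plain PostgreSQL declarative partitioning (names are illustrative):

```sql
-- Range-partition the table by time; each partition covers one day.
CREATE TABLE metrics (
    time    TIMESTAMPTZ NOT NULL,
    host_id INT NOT NULL,
    value   DOUBLE PRECISION
) PARTITION BY RANGE (time);

CREATE TABLE metrics_2024_01_01 PARTITION OF metrics
    FOR VALUES FROM ('2024-01-01') TO ('2024-01-02');

-- A "last 24 hours" query touches only the 1-2 partitions in range;
-- the planner prunes the rest.

-- Retention: dropping a whole partition is instant metadata work,
-- unlike DELETE, which rewrites rows and bloats indexes.
DROP TABLE metrics_2024_01_01;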
Implementation
TimescaleDB setup and queries:
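A minimal TimescaleDB sketch: create a regular table, convert it to a hypertable (TimescaleDB then partitions it into time chunks automatically), and run a typical time-range aggregation. Table and column names are illustrative:

```sql
CREATE TABLE conditions (
    time        TIMESTAMPTZ      NOT NULL,
    device_id   INT              NOT NULL,
    temperature DOUBLE PRECISION
);

-- Convert to a hypertable partitioned on the time column.
SELECT create_hypertable('conditions', 'time');

-- Typical dashboard query: per-device averages in 5-minute buckets
-- over the last 24 hours. Only recent chunks are scanned.
SELECT time_bucket('5 minutes', time) AS bucket,
       device_id,
       avg(temperature) AS avg_temp
FROM conditions
WHERE time > now() - INTERVAL '24 hours'
GROUP BY bucket, device_id
ORDER BY bucket;
```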
Downsampling strategy:
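One way to implement tiered downsampling in TimescaleDB is a continuous aggregate (an incrementally maintained materialized view) plus retention policies — raw data kept briefly, rollups kept much longer. A sketch building on the `conditions` table above (intervals are illustrative):

```sql
-- Hourly rollup, maintained incrementally as new data arrives.
CREATE MATERIALIZED VIEW conditions_hourly
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', time) AS bucket,
       device_id,
       avg(temperature) AS avg_temp,
       max(temperature) AS max_temp
FROM conditions
GROUP BY bucket, device_id;

-- Tiered retention: raw data for 30 days, hourly rollups for a year.
SELECT add_retention_policy('conditions', INTERVAL '30 days');
SELECT add_retention_policy('conditions_hourly', INTERVAL '1 year');
```

Dashboards query `conditions_hourly` for anything older than a month, keeping long-range charts fast while storage stays bounded.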
Trade-offs
Time-series database comparison:
| Aspect | TimescaleDB | InfluxDB | ClickHouse | Cassandra |
|---|---|---|---|---|
| SQL support | Full PostgreSQL | InfluxQL/Flux | SQL-like | CQL |
| Write throughput | High | Very high | Very high | Very high |
| Aggregation speed | Good | Good | Excellent | Moderate |
| Ecosystem | PostgreSQL tools | Purpose-built | OLAP tools | Distributed |
| Complexity | Low (PostgreSQL) | Medium | Medium | High |
When to use a time-series database:
- Ingesting more than 100K data points per second
- Queries are primarily time-range based
- You need built-in downsampling and retention
- Data is append-only or append-heavy
When PostgreSQL is sufficient:
- Fewer than 10K data points per second
- Time-series is a small part of the application (mixed workload)
- You need ACID transactions on time-series data
- Team expertise is in PostgreSQL
Common Misconceptions
- "You always need a specialized time-series database" — PostgreSQL with proper partitioning handles small to medium time-series workloads well. TimescaleDB adds time-series features while keeping full PostgreSQL compatibility.
- "Schemaless is better for time-series" — Typed schemas (wide tables) are more efficient for storage and queries. Use schemaless (narrow tables) only when metric names change frequently.
- "Store everything at maximum resolution forever" — This is economically impractical. Downsample older data to reduce storage costs by 99%+ with minimal information loss.
- "Time-series databases replace relational databases" — They complement each other. Store metadata (sensor locations, thresholds) in PostgreSQL and time-series data in a specialized store.
- "Timestamps are sufficient as a primary key" — Multiple sources can report at the same timestamp. Use (timestamp, source_id) as a composite key.
How This Appears in Interviews
- "Design a monitoring system for 10,000 servers" — Time-series database with time-based partitioning, downsampling for retention, and pre-aggregated dashboards.
- "How do you handle millions of IoT sensor readings per second?" — An LSM-tree-based time-series database, partitioning by time, batched inserts, and handling for out-of-order writes.
- "Your metrics dashboard is slow. How do you speed it up?" — Pre-aggregate data into the bucket sizes dashboards actually query, using materialized views or continuous aggregates.
- "How do you manage storage costs for time-series data?" — Tiered retention: raw data for days, downsampled for months, aggregated for years.
Related Concepts
- Database Indexing — LSM Trees dominate time-series write patterns
- Database Partitioning — time-based partitioning is fundamental to time-series
- Materialized Views — continuous aggregates for fast dashboard queries
- Change Data Capture — streaming time-series data to downstream systems
- SLOs, SLIs, and SLAs — time-series data powers SLI measurement
- Back-of-Envelope Estimation — calculate storage requirements for time-series
- System Design Interview Guide