Time-Series Data Modeling Explained: Storage, Indexing, and Query Patterns
How to model time-series data effectively — partitioning by time, downsampling, retention policies, and choosing between TimescaleDB, InfluxDB, and Cassandra.
Time-series data modeling is the practice of designing schemas, storage, and query patterns for data that is indexed primarily by time — metrics, logs, IoT sensor readings, financial ticks, and event streams.
What It Really Means
Time-series data has unique characteristics that make general-purpose databases a poor fit at scale. The data is append-heavy (historical metrics are rarely updated). Queries almost always filter by time range. Old data loses value over time and can be downsampled or deleted. And the volume is enormous — a fleet of 10,000 servers reporting 100 metrics every 10 seconds generates 100,000 data points per second, or 6 million per minute.
General-purpose databases like PostgreSQL can handle small-scale time-series workloads, but they struggle with the write throughput and query patterns at scale. Specialized time-series databases (TimescaleDB, InfluxDB, QuestDB, ClickHouse) optimize for these access patterns with columnar storage, automatic time-based partitioning, built-in downsampling, and efficient time-range queries.
Understanding time-series data modeling is essential for designing monitoring systems, IoT platforms, financial trading systems, and analytics pipelines.
How It Works in Practice
Data Characteristics
- Append-heavy: historical data points are rarely updated
- Time-range filters dominate the query workload
- Value decays with age, enabling downsampling and deletion
- High, sustained write volume
Schema Design Approaches
Wide table (one metric per column):
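A wide table keeps one column per metric, which assumes a fixed, known metric set. A minimal sketch (table and column names here are illustrative, not from the original):

```sql
-- Wide table: one row per (time, host), one column per metric.
-- Compact to store and fast to query when the metric set is stable.
CREATE TABLE server_metrics (
    time        TIMESTAMPTZ NOT NULL,
    host_id     INT         NOT NULL,
    cpu_percent DOUBLE PRECISION,
    mem_used_mb DOUBLE PRECISION,
    disk_io_ops DOUBLE PRECISION,
    PRIMARY KEY (time, host_id)  -- composite key: many hosts share a timestamp
);
```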
Narrow table (EAV style):
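A narrow (entity-attribute-value) table stores one row per reading, trading storage efficiency for flexibility when metric names change often. A sketch with illustrative names:

```sql
-- Narrow table: one row per individual metric reading.
-- Flexible (new metrics need no schema change), but repeats the
-- metric name on every row and produces many more rows per scrape.
CREATE TABLE metrics (
    time        TIMESTAMPTZ NOT NULL,
    host_id     INT         NOT NULL,
    metric_name TEXT        NOT NULL,
    value       DOUBLE PRECISION NOT NULL,
    PRIMARY KEY (time, host_id, metric_name)
);
```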
Time-Based Partitioning
Time-based partitioning gives two major benefits:
- Fast range queries: a query for "the last 24 hours" scans only 1-2 partitions instead of the entire table
- Efficient data retention: old partitions are dropped instantly, instead of running a bulk DELETE (which is slow and creates bloat)
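Both benefits can be seen in plain PostgreSQL declarative partitioning (names are illustrative):

```sql
-- Range-partition the table by time; each partition covers one day.
CREATE TABLE metrics (
    time    TIMESTAMPTZ NOT NULL,
    host_id INT NOT NULL,
    value   DOUBLE PRECISION
) PARTITION BY RANGE (time);

CREATE TABLE metrics_2024_01_01 PARTITION OF metrics
    FOR VALUES FROM ('2024-01-01') TO ('2024-01-02');

-- A "last 24 hours" query touches only the 1-2 partitions in range;
-- the planner prunes the rest.

-- Retention: dropping a whole partition is instant metadata work,
-- unlike DELETE, which rewrites rows and bloats indexes.
DROP TABLE metrics_2024_01_01;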
Implementation
TimescaleDB setup and queries:
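A minimal TimescaleDB sketch: create a regular table, convert it to a hypertable (TimescaleDB then partitions it into time chunks automatically), and run a typical time-range aggregation. Table and column names are illustrative:

```sql
CREATE TABLE conditions (
    time        TIMESTAMPTZ      NOT NULL,
    device_id   INT              NOT NULL,
    temperature DOUBLE PRECISION
);

-- Convert to a hypertable partitioned on the time column.
SELECT create_hypertable('conditions', 'time');

-- Typical dashboard query: per-device averages in 5-minute buckets
-- over the last 24 hours. Only recent chunks are scanned.
SELECT time_bucket('5 minutes', time) AS bucket,
       device_id,
       avg(temperature) AS avg_temp
FROM conditions
WHERE time > now() - INTERVAL '24 hours'
GROUP BY bucket, device_id
ORDER BY bucket;
```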
Downsampling strategy:
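One way to implement tiered downsampling in TimescaleDB is a continuous aggregate (an incrementally maintained materialized view) plus retention policies — raw data kept briefly, rollups kept much longer. A sketch building on the `conditions` table above (intervals are illustrative):

```sql
-- Hourly rollup, maintained incrementally as new data arrives.
CREATE MATERIALIZED VIEW conditions_hourly
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', time) AS bucket,
       device_id,
       avg(temperature) AS avg_temp,
       max(temperature) AS max_temp
FROM conditions
GROUP BY bucket, device_id;

-- Tiered retention: raw data for 30 days, hourly rollups for a year.
SELECT add_retention_policy('conditions', INTERVAL '30 days');
SELECT add_retention_policy('conditions_hourly', INTERVAL '1 year');
```

Dashboards query `conditions_hourly` for anything older than a month, keeping long-range charts fast while storage stays bounded.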
Trade-offs
Time-series database comparison:
| Aspect | TimescaleDB | InfluxDB | ClickHouse | Cassandra |
|---|---|---|---|---|
| SQL support | Full PostgreSQL | InfluxQL/Flux | SQL-like | CQL |
| Write throughput | High | Very high | Very high | Very high |
| Aggregation speed | Good | Good | Excellent | Moderate |
| Ecosystem | PostgreSQL tools | Purpose-built | OLAP tools | Distributed |
| Complexity | Low (PostgreSQL) | Medium | Medium | High |
When to use a time-series database:
- Ingesting more than 100K data points per second
- Queries are primarily time-range based
- You need built-in downsampling and retention
- Data is append-only or append-heavy
When PostgreSQL is sufficient:
- Fewer than 10K data points per second
- Time-series is a small part of the application (mixed workload)
- You need ACID transactions on time-series data
- Team expertise is in PostgreSQL
Common Misconceptions
- "You always need a specialized time-series database" — PostgreSQL with proper partitioning handles small to medium time-series workloads well. TimescaleDB adds time-series features while keeping full PostgreSQL compatibility.
- "Schemaless is better for time-series" — Typed schemas (wide tables) are more efficient for storage and queries. Use schemaless (narrow tables) only when metric names change frequently.
- "Store everything at maximum resolution forever" — This is economically impractical. Downsample older data to reduce storage costs by 99%+ with minimal information loss.
- "Time-series databases replace relational databases" — They complement each other. Store metadata (sensor locations, thresholds) in PostgreSQL and time-series data in a specialized store.
- "Timestamps are sufficient as a primary key" — Multiple sources can report at the same timestamp. Use (timestamp, source_id) as a composite key.
How This Appears in Interviews
- "Design a monitoring system for 10,000 servers" — Time-series database with time-based partitioning, downsampling for retention, and pre-aggregated dashboards.
- "How do you handle millions of IoT sensor readings per second?" — An LSM-tree-based time-series database, partitioning by time, batched inserts, and handling for out-of-order writes.
- "Your metrics dashboard is slow. How do you speed it up?" — Pre-aggregate data into the bucket sizes dashboards actually query, using materialized views or continuous aggregates.
- "How do you manage storage costs for time-series data?" — Tiered retention: raw data for days, downsampled for months, aggregated for years.
Related Concepts
- Database Indexing — LSM Trees dominate time-series write patterns
- Database Partitioning — time-based partitioning is fundamental to time-series
- Materialized Views — continuous aggregates for fast dashboard queries
- Change Data Capture — streaming time-series data to downstream systems
- SLOs, SLIs, and SLAs — time-series data powers SLI measurement
- Back-of-Envelope Estimation — calculate storage requirements for time-series
- System Design Interview Guide