Database Partitioning Explained: Range, Hash, List, and Composite Strategies

Master database partitioning strategies — range, hash, list, and composite — with practical examples, partition pruning, and system design applications.

Tags: database-partitioning, sharding, horizontal-scaling, databases, distributed-systems

Database Partitioning

Database partitioning is the technique of dividing a large table into smaller, more manageable pieces called partitions, where each partition holds a subset of the data based on a defined rule. Queries that target a specific partition can skip scanning the rest of the table entirely.

What It Really Means

As tables grow beyond millions or billions of rows, even well-indexed queries slow down. Index trees become deeper, buffer pool hit rates drop, and maintenance operations like VACUUM or OPTIMIZE TABLE take longer. Partitioning solves this by splitting one logical table into multiple physical segments.

There is an important distinction between partitioning and sharding. Partitioning typically refers to dividing data within a single database server — PostgreSQL's declarative partitioning or MySQL's partitioned tables. Sharding refers to distributing partitions across multiple database servers. Sharding is partitioning + distribution. In system design interviews, the terms are sometimes used interchangeably, but understanding the difference demonstrates depth.

Partitioning works because of a property called partition pruning: the query optimizer examines the WHERE clause and determines which partitions could possibly contain matching rows, skipping all others. A query for January 2026 orders on a table partitioned by month only scans the January partition, ignoring the other 11 months of data.
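Pruning can be sketched as an interval-overlap check. The toy planner below (hypothetical partition names, not a real optimizer) returns only the monthly partitions a date-range query can touch:

```python
from datetime import date

# Hypothetical monthly partitions for 2026: name -> [start, end) range.
partitions = {}
for m in range(1, 13):
    start = date(2026, m, 1)
    end = date(2027, 1, 1) if m == 12 else date(2026, m + 1, 1)
    partitions[f"orders_2026_{m:02d}"] = (start, end)

def prune(query_start: date, query_end: date) -> list[str]:
    """Keep only partitions whose [start, end) range overlaps [query_start, query_end)."""
    return [name for name, (start, end) in partitions.items()
            if start < query_end and query_start < end]

# A query for January 2026 scans one partition, not twelve.
print(prune(date(2026, 1, 1), date(2026, 2, 1)))  # ['orders_2026_01']
```

Real optimizers do the same overlap reasoning against partition bound metadata at plan time (or, for parameterized queries, at execution time).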

How It Works in Practice

Range Partitioning

Range partitioning divides data based on a continuous range of values — typically dates or numeric IDs. Each partition covers a non-overlapping range.

Best for: Time-series data, logs, events, orders — any data with a natural chronological or sequential dimension. This is the most common partitioning strategy in practice.

Real example: Stripe partitions transaction records by month. Queries for a specific billing period hit only the relevant monthly partition. Old partitions can be archived to cold storage or dropped entirely without affecting active data.
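Routing a row to its range partition is a binary search over sorted boundaries. A minimal sketch, with hypothetical partition names and 2026-only boundaries:

```python
import bisect
from datetime import date

# Partition i holds rows with bounds[i] <= key < bounds[i + 1].
bounds = [date(2026, m, 1) for m in range(1, 13)]
names = [f"transactions_2026_{m:02d}" for m in range(1, 13)]

def route(created_at: date) -> str:
    """Range routing: binary-search the sorted boundary list."""
    if created_at < bounds[0] or created_at >= date(2027, 1, 1):
        raise ValueError("no partition covers this key")
    return names[bisect.bisect_right(bounds, created_at) - 1]

print(route(date(2026, 3, 15)))  # transactions_2026_03
```

The out-of-range error mirrors what databases do when a row's key falls outside every declared range and no default partition exists.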

Hash Partitioning

Hash partitioning applies a hash function to the partition key and assigns rows to partitions based on the hash value. This distributes data uniformly across partitions regardless of the key's distribution.

Best for: Evenly distributing data when there is no natural range dimension. Prevents hot partitions caused by skewed key distributions. Commonly used when sharding across multiple database servers.

Real example: DynamoDB uses consistent hashing to distribute items across partitions based on the partition key. A user_id-based partition key ensures that data for each user lands on a specific partition, and the hash function prevents all active users from landing on the same partition.

Limitation: Hash partitioning does not support range queries efficiently. A query for WHERE created_at BETWEEN '2026-01-01' AND '2026-03-31' must scan all hash partitions because the hash function scatters date values across all partitions.
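A sketch of hash routing, using md5 only because it is stable across processes (Python's built-in hash() is randomized per process, which makes it a poor partition hash):

```python
import hashlib

NUM_PARTITIONS = 8

def hash_partition(key: str) -> int:
    """Stable hash routing: hash the key, take it modulo the partition count."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % NUM_PARTITIONS

# Even perfectly sequential keys spread out roughly uniformly:
counts = [0] * NUM_PARTITIONS
for i in range(10_000):
    counts[hash_partition(f"user-{i}")] += 1
print(counts)  # each bucket lands near 10_000 / 8 = 1250
```

Note the modulo: naive `hash % N` forces most keys to move when N changes, which is why systems that reshard frequently use consistent hashing instead.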

List Partitioning

List partitioning assigns rows to partitions based on discrete values of the partition key. You explicitly list which values belong to each partition.

Best for: Data with a small set of categorical values — geographic regions, product categories, tenant IDs in multi-tenant systems.

Real example: A global SaaS application partitions customer data by region: US, EU, APAC. Each region's data lives in its own partition, which can be hosted in a data center closest to those customers for data residency compliance. Queries scoped to a single region hit only one partition.
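List routing is a plain lookup against explicitly enumerated values. A sketch, with illustrative country codes and partition names:

```python
# Explicit value lists, one set per partition, as in list partitioning.
LIST_PARTITIONS = {
    "customers_us":   {"US", "CA", "MX"},
    "customers_eu":   {"DE", "FR", "NL"},
    "customers_apac": {"JP", "SG", "AU"},
}

def route(country: str) -> str:
    for partition, values in LIST_PARTITIONS.items():
        if country in values:
            return partition
    # A value in no list either raises an error or falls into a DEFAULT partition.
    return "customers_default"

print(route("DE"))  # customers_eu
```

The default branch matters in practice: without it, inserting a row with an unlisted value fails outright.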

Composite (Sub-Partitioning)

Composite partitioning combines two strategies. The table is first partitioned by one method, then each partition is further sub-partitioned by another method.

Best for: Very large datasets that benefit from both range and hash semantics. Example: partition by date range first (monthly partitions), then hash each monthly partition by user_id for even distribution.

Real example: A ride-sharing company partitions trip records by month (range) and then by city_id (list) within each month. A query for "all trips in San Francisco in March 2026" prunes to exactly one sub-partition.
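The ride-sharing example can be sketched as two-level routing (city IDs and partition names are hypothetical):

```python
from datetime import date

# Level 2: hypothetical city lists inside each monthly range partition.
CITY_LISTS = {"san_francisco": {7, 12}, "new_york": {3, 9}}

def composite_route(started_at: date, city_id: int) -> str:
    """Level 1: range by month. Level 2: list by city_id within that month."""
    month = f"trips_{started_at.year}_{started_at.month:02d}"
    for name, cities in CITY_LISTS.items():
        if city_id in cities:
            return f"{month}_{name}"
    raise ValueError(f"city_id {city_id} is in no list partition")

# "All trips in San Francisco (city 7) in March 2026" -> one sub-partition.
print(composite_route(date(2026, 3, 14), 7))  # trips_2026_03_san_francisco
```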

Implementation

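As a sketch of what the four strategies look like in PostgreSQL's declarative partitioning syntax (table names, columns, and bounds are illustrative):

```sql
-- Range: monthly partitions on a timestamp column.
CREATE TABLE orders (
    id         bigint      NOT NULL,
    created_at timestamptz NOT NULL,
    amount     numeric
) PARTITION BY RANGE (created_at);

CREATE TABLE orders_2026_01 PARTITION OF orders
    FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');

-- Hash: evenly sized buckets on user_id.
CREATE TABLE events (
    user_id bigint NOT NULL,
    payload jsonb
) PARTITION BY HASH (user_id);

CREATE TABLE events_p0 PARTITION OF events
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);

-- List: one partition per region.
CREATE TABLE customers (
    id     bigint NOT NULL,
    region text   NOT NULL
) PARTITION BY LIST (region);

CREATE TABLE customers_eu PARTITION OF customers
    FOR VALUES IN ('DE', 'FR', 'NL');

-- Composite: range by month, then hash by user_id inside each month.
CREATE TABLE trips (
    user_id    bigint      NOT NULL,
    started_at timestamptz NOT NULL
) PARTITION BY RANGE (started_at);

CREATE TABLE trips_2026_03 PARTITION OF trips
    FOR VALUES FROM ('2026-03-01') TO ('2026-04-01')
    PARTITION BY HASH (user_id);

CREATE TABLE trips_2026_03_p0 PARTITION OF trips_2026_03
    FOR VALUES WITH (MODULUS 2, REMAINDER 0);
```

Note that dropping `orders_2026_01` removes a month of data in one metadata operation, which is the lifecycle advantage discussed below.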

Trade-offs

When to Partition

  • Tables exceeding tens of millions of rows where queries naturally filter by partition key
  • Time-series data where old data needs archiving or deletion (drop a partition vs. DELETE millions of rows)
  • Multi-tenant applications where tenant isolation is required
  • When maintenance operations (vacuum, reindex) on the full table are too slow

When Not to Partition

  • Small tables (under a few million rows) — partitioning adds complexity without meaningful performance gain
  • Queries that frequently span all partitions — partitioning offers no pruning benefit and adds overhead
  • When proper indexing solves the performance problem — always try indexes first
  • Early in product development — premature partitioning is premature optimization

Advantages

  • Dramatic query speedup through partition pruning
  • Efficient data lifecycle management (drop old partitions instantly)
  • Parallel query execution across partitions
  • Smaller indexes per partition (faster index scans)
  • Reduced lock contention (operations on different partitions do not conflict)

Disadvantages

  • Cross-partition queries are slower than single-table queries
  • Unique constraints must include the partition key
  • Foreign keys referencing partitioned tables have limitations in PostgreSQL
  • Schema changes require updating all partitions
  • Choosing the wrong partition key can cause hot partitions (data skew)
  • ACID transactions spanning multiple shards require distributed transaction protocols (2PC) which add latency and complexity

Common Misconceptions

  • "Partitioning and sharding are the same thing" — Partitioning divides data within one database server. Sharding distributes partitions across multiple servers. Sharding adds network communication, distributed transactions, and cross-shard query complexity on top of partitioning.

  • "You should partition every large table" — Partitioning only helps if queries naturally filter by the partition key. A table with 500 million rows where most queries use WHERE user_id = ? benefits more from a B-Tree index on user_id than from partitioning.

  • "Hash partitioning prevents all hot spots" — Hash partitioning distributes data evenly by key, but if a single key generates disproportionate traffic (a celebrity's user_id), that key's partition still becomes hot. Application-level mitigations such as caching the hot key, salting it across sub-keys, or rate limiting are needed.

  • "You can change the partition key later" — Changing the partition key requires recreating the entire partitioned table structure and migrating all data. Choose the partition key carefully upfront based on your most critical query patterns.

  • "More partitions is always better" — Too many partitions create overhead: the query planner must evaluate all partitions for pruning, each partition has its own indexes and metadata, and maintenance operations multiply. Hundreds to low thousands of partitions is typical; millions is problematic.
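The hot-key problem mentioned above is commonly mitigated by salting: split one hot key into several sub-keys so its traffic spreads across partitions, at the cost of fan-out reads. A sketch with an illustrative key format and bucket count:

```python
import random

SALT_BUCKETS = 4  # illustrative; sized to the hot key's traffic

def salted_key(user_id: str) -> str:
    """Writes: append a random salt so one hot key spreads over several sub-keys."""
    return f"{user_id}#{random.randrange(SALT_BUCKETS)}"

def fan_out_keys(user_id: str) -> list[str]:
    """Reads: query every salted sub-key and merge the results."""
    return [f"{user_id}#{i}" for i in range(SALT_BUCKETS)]

print(fan_out_keys("celebrity-42"))
```

The trade-off is explicit: writes stay O(1), but every read of a salted key becomes SALT_BUCKETS lookups plus a merge.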

How This Appears in Interviews

Partitioning comes up in system design interviews for virtually any data-intensive application:

  • "Design a system to handle 1 billion events per day" — discuss time-based range partitioning for writes, hash partitioning across shards for distribution, and partition pruning for efficient queries. See our system design interview guide.
  • "How would you handle a hot partition?" — discuss resharding, consistent hashing, salting the partition key, and read replicas for hot partitions. See our interview questions on databases.
  • "What is the difference between partitioning and sharding?" — explain that sharding is partitioning across servers, adding network, consistency, and query routing complexity.
  • "How would you migrate from a single table to a partitioned table?" — discuss pg_partman for PostgreSQL, online schema migration tools, dual-write strategies, and rollback plans.
