Database Indexing Explained: B-Tree, LSM Tree, and Hash Index Internals
Understand database indexing internals — B-Tree, LSM Tree, and Hash indexes — with performance characteristics, SQL examples, and interview guidance.
Database Indexing
A database index is a data structure that improves the speed of data retrieval operations on a table at the cost of additional storage space and slower writes. It works like a book index — instead of scanning every page, you look up the topic in the index to find the exact page number.
What It Really Means
Without an index, a database must perform a full table scan to find matching rows — reading every single row in the table and checking whether it matches the query predicate. For a table with 10 million rows, this means reading 10 million rows even if only 3 match. An index lets the database jump directly to the matching rows, reducing the work from millions of reads to a handful.
The core trade-off is simple: indexes make reads faster and writes slower. Every time you insert, update, or delete a row, the database must also update every index on that table. A table with 5 indexes requires 6 writes per insert (1 for the table + 5 for the indexes). This is why over-indexing is as dangerous as under-indexing.
There are three major index data structures, each optimized for different workloads: B-Tree (the default in most relational databases), LSM Tree (optimized for write-heavy workloads), and Hash Index (optimized for exact-match lookups). Understanding when to use each is critical for database performance tuning and system design.
How It Works in Practice
B-Tree Index
B-Trees are the most widely used index structure. PostgreSQL, MySQL InnoDB, Oracle, and SQL Server all default to B-Tree indexes. A B-Tree is a self-balancing tree where each node contains multiple keys and pointers, keeping the tree shallow (typically 3-4 levels deep even for billions of rows).
How lookup works: To find a row where email = 'alice@example.com', the database starts at the root node, compares the search key with the keys in the node, follows the appropriate child pointer, and repeats until it reaches a leaf node containing a pointer to the actual data row. This takes O(log n) comparisons — for 100 million rows, that is roughly 4 node reads.
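The descent described above can be sketched in pure Python. This is a toy node layout for illustration, not any engine's actual page format: each node holds sorted keys, internal nodes hold child pointers, and `bisect` picks which branch to follow at each level.

```python
from bisect import bisect_right

class Node:
    """One B-Tree node: sorted keys, plus either children or leaf values."""
    def __init__(self, keys, children=None, values=None):
        self.keys = keys          # sorted keys
        self.children = children  # child nodes (internal node only)
        self.values = values      # row pointers (leaf node only)

def btree_lookup(node, key):
    """Descend from the root to a leaf, one node read per level: O(log n)."""
    while node.children is not None:
        # bisect_right finds which child subtree could contain `key`
        node = node.children[bisect_right(node.keys, key)]
    # at the leaf, look for an exact match among the leaf's keys
    if key in node.keys:
        return node.values[node.keys.index(key)]
    return None

# tiny two-level tree: the root separates the two leaves at key "m"
leaf1 = Node(keys=["alice@example.com", "bob@example.com"], values=[101, 102])
leaf2 = Node(keys=["zoe@example.com"], values=[103])
root = Node(keys=["m"], children=[leaf1, leaf2])

print(btree_lookup(root, "alice@example.com"))  # 101
print(btree_lookup(root, "zoe@example.com"))    # 103
```

A real B-Tree node holds hundreds of keys per page, which is why the tree stays only a few levels deep.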
Key property: B-Trees maintain sorted order, so they support both equality queries (WHERE email = ?) and range queries (WHERE created_at BETWEEN ? AND ?). They also support prefix searches (WHERE name LIKE 'Ali%') but not suffix searches (WHERE name LIKE '%ice').
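Sorted order is what makes range and prefix queries cheap: each is just two binary searches for the boundaries. A rough model, using a plain sorted list to stand in for the B-Tree leaf level:

```python
from bisect import bisect_left

names = sorted(["Alice", "Alina", "Bob", "Brice", "Carol"])

def range_query(keys, lo, hi):
    """All keys with lo <= key < hi: find both boundaries, slice between."""
    return keys[bisect_left(keys, lo):bisect_left(keys, hi)]

# WHERE name >= 'Alice' AND name < 'Bob'
print(range_query(names, "Alice", "Bob"))   # ['Alice', 'Alina']

# WHERE name LIKE 'Ali%' is really the range ['Ali', 'Alj')
prefix = "Ali"
upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)   # 'Alj'
print(range_query(names, prefix, upper))    # ['Alice', 'Alina']
```

A suffix search (`LIKE '%ice'`) cannot be rewritten as a contiguous key range, which is why sorted indexes cannot serve it.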
Where B-Trees are used: PostgreSQL, MySQL, Oracle, SQL Server, SQLite — essentially every relational database defaults to B-Tree indexes.
LSM Tree (Log-Structured Merge Tree)
LSM Trees are optimized for write-heavy workloads. Instead of updating an index in-place (like B-Trees), LSM Trees batch writes in memory (memtable) and periodically flush them to sorted, immutable files on disk (SSTables). Background compaction merges these files to maintain read performance.
How it works:
- Writes go to an in-memory sorted structure (memtable)
- When the memtable reaches a size threshold, it is flushed to disk as a sorted SSTable file
- Reads check the memtable first, then search SSTables from newest to oldest
- Background compaction merges SSTables to reduce the number of files and remove deleted/overwritten entries
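The four steps above can be sketched as a toy LSM store (names like `MEMTABLE_LIMIT` and `ToyLSM` are invented for illustration; real systems binary-search SSTables rather than scanning them):

```python
MEMTABLE_LIMIT = 2  # flush threshold; real systems use megabytes

class ToyLSM:
    def __init__(self):
        self.memtable = {}   # in-memory write buffer
        self.sstables = []   # immutable sorted runs, oldest first

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= MEMTABLE_LIMIT:
            # flush: write a sorted, immutable "SSTable" and start fresh
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        # 1. check the memtable first (most recent writes)
        if key in self.memtable:
            return self.memtable[key]
        # 2. then each SSTable, newest to oldest
        for table in reversed(self.sstables):
            for k, v in table:
                if k == key:
                    return v
        return None

    def compact(self):
        # merge all runs into one; later (newer) runs overwrite older values
        merged = {}
        for table in self.sstables:
            merged.update(dict(table))
        self.sstables = [sorted(merged.items())]

db = ToyLSM()
db.put("a", 1)
db.put("b", 2)      # hits the threshold, triggers a flush
db.put("a", 9)      # newer value, still sitting in the memtable
print(db.get("a"))  # 9 (the memtable shadows the old SSTable entry)
db.compact()
```

Note that `put` never touches existing files: every write is an in-memory insert or a sequential flush, which is the source of the LSM Tree's write throughput.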
Key property: Writes are sequential (appending to the memtable and flushing sorted files), which is much faster than the random I/O of B-Tree page splits. However, reads may need to check multiple SSTables, and compaction consumes CPU and I/O in the background.
Where LSM Trees are used: Apache Cassandra, RocksDB, LevelDB, HBase, ScyllaDB, InfluxDB. These are systems designed for high write throughput — time-series data, event logs, messaging systems.
Hash Index
A hash index maps keys to row locations using a hash function. It provides O(1) lookups for exact-match queries but cannot support range queries, sorting, or partial matches at all.
How it works: The index computes hash(key) to determine the bucket where the row pointer is stored. Lookups are constant time on average but degrade to O(n) in the worst case due to hash collisions.
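A minimal sketch of the bucket scheme just described, with collisions handled by chaining (each bucket is a list of key/row-pointer pairs; `NUM_BUCKETS` and the class name are invented for the example):

```python
NUM_BUCKETS = 8

class HashIndex:
    def __init__(self):
        self.buckets = [[] for _ in range(NUM_BUCKETS)]

    def insert(self, key, row_ptr):
        # hash(key) picks the bucket; colliding keys chain within it
        self.buckets[hash(key) % NUM_BUCKETS].append((key, row_ptr))

    def lookup(self, key):
        # O(1) on average; O(n) only if everything lands in one bucket
        for k, ptr in self.buckets[hash(key) % NUM_BUCKETS]:
            if k == key:
                return ptr
        return None

idx = HashIndex()
idx.insert("alice@example.com", 101)
idx.insert("bob@example.com", 102)
print(idx.lookup("alice@example.com"))  # 101
# No range queries: hashing scatters keys, destroying any sort order.
```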
Where Hash Indexes are used: PostgreSQL supports explicit hash indexes (though B-Trees are usually preferred). Memcached and Redis use hash tables as their primary data structure. MySQL Memory engine supports hash indexes. Hash indexes are most valuable for in-memory databases where the overhead of B-Tree traversal is proportionally significant.
Implementation
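A runnable end-to-end demonstration using Python's built-in sqlite3 (the table and index names are invented for the example): create an index, then confirm with `EXPLAIN QUERY PLAN` that the same query switches from a full scan to an index search. The exact plan wording varies by SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany("INSERT INTO users (email) VALUES (?)",
                 [(f"user{i}@example.com",) for i in range(1000)])

query = "SELECT id FROM users WHERE email = 'user500@example.com'"

# Before indexing: the planner has no choice but a full table scan
before = conn.execute("EXPLAIN QUERY PLAN " + query).fetchone()[3]
print(before)   # e.g. "SCAN users"

conn.execute("CREATE INDEX idx_users_email ON users (email)")

# After indexing: the planner searches the B-Tree instead
after = conn.execute("EXPLAIN QUERY PLAN " + query).fetchone()[3]
print(after)    # e.g. "SEARCH users USING COVERING INDEX idx_users_email (email=?)"
```

PostgreSQL and MySQL offer the same workflow through `EXPLAIN ANALYZE`, which additionally executes the query and reports actual row counts and timings.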
Trade-offs
| Property | B-Tree | LSM Tree | Hash Index |
|---|---|---|---|
| Read speed | Fast (O(log n)) | Slower (check multiple SSTables) | Fastest (O(1)) |
| Write speed | Moderate (random I/O) | Fast (sequential I/O) | Fast (O(1)) |
| Range queries | Yes | Yes | No |
| Space overhead | Moderate | Higher (temporary duplication) | Low |
| Write amplification | Low | High (compaction rewrites data) | None |
| Best for | General OLTP | Write-heavy, time-series | Exact-match cache |
Indexing Anti-Patterns
- Over-indexing: Every index slows writes. A table with 10 indexes turns each insert into 11 writes, roughly an order of magnitude more expensive.
- Indexing low-cardinality columns: An index on a boolean `is_active` column is usually useless — the database will scan half the table anyway.
- Wrong column order in composite indexes: `INDEX(a, b)` serves queries on `(a)` and `(a, b)` but NOT `(b)` alone.
- Ignoring index-only scans: If a query only needs columns in the index, adding them as `INCLUDE` columns avoids the table lookup entirely.
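The composite-index rule is easy to verify empirically. A sketch with sqlite3 (the schema is invented for the example): the planner uses the `(a, b)` index for the leading column but falls back to a scan for the non-leading column alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (a INTEGER, b INTEGER, c TEXT)")
conn.execute("CREATE INDEX idx_ab ON orders (a, b)")

def plan(sql):
    """Return the detail column of EXPLAIN QUERY PLAN for a query."""
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchone()[3]

# Leading column: served by the composite index
print(plan("SELECT c FROM orders WHERE a = 1"))
# Both columns: also served by the index
print(plan("SELECT c FROM orders WHERE a = 1 AND b = 2"))
# Non-leading column alone: full scan, the index cannot help
print(plan("SELECT c FROM orders WHERE b = 2"))
```

The index stores entries sorted by `a` first and `b` second, so filtering on `b` alone gives the B-Tree no contiguous range to search.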
Common Misconceptions
- "More indexes means faster queries" — Each index speeds up specific query patterns but slows down all writes. Over-indexing is one of the most common database performance problems. Profile your actual query patterns before adding indexes.
- "The database always uses my index" — The query optimizer may choose a full table scan if it estimates the index would return too many rows (low selectivity). Use `EXPLAIN ANALYZE` to verify index usage.
- "Primary keys are the only clustered index" — In InnoDB, the primary key is the clustered index. In PostgreSQL, there is no clustered index by default — you can use `CLUSTER` to physically reorder the table once, but it is not maintained automatically.
- "B-Tree indexes only help equality queries" — B-Trees excel at range queries, sorting (avoiding a sort step), and prefix matching. They are the most versatile index type.
- "Hash indexes are always faster than B-Trees" — For exact-match lookups, hash indexes are O(1) vs O(log n). But the constant factors matter: a B-Tree with 4 levels reading cached pages is extremely fast. PostgreSQL's documentation recommends B-Trees over hash indexes in most cases.
How This Appears in Interviews
Database indexing is a top interview topic for backend and data engineering roles:
- "Your query is slow. How do you investigate?" — walk through `EXPLAIN ANALYZE`, check for missing indexes, evaluate selectivity, look for index-only scan opportunities. See our interview questions on databases.
- "Design the schema and indexes for an e-commerce order system" — discuss composite indexes, partial indexes for common query patterns (e.g., pending orders), and the write cost of over-indexing. See our system design interview guide.
- "When would you choose an LSM Tree over a B-Tree?" — discuss write amplification, sequential vs random I/O, compaction overhead, and workload characteristics.
- "What is write amplification?" — explain how LSM Trees trade read performance for write throughput, and how B-Trees cause random I/O for page splits.
Related Concepts
- ACID Properties — Transactions interact with index locking strategies
- Database Partitioning — Partitioning affects which indexes are useful
- Normalization vs Denormalization — Normalized schemas need more joins and therefore more indexes
- BASE Properties — Eventually consistent systems use different indexing approaches
- CAP Theorem — Distributed indexes add consistency challenges
- System Design Interview Guide — Apply indexing knowledge in design interviews