Courses 0%
28
Database Fundamentals · Chapter 28 of 42

Database Indexing

Akhil
Akhil Sharma
10 min

Database Indexing

The single most impactful database optimization — indexes can turn a 10-second full table scan into a 1-millisecond lookup.

Database Indexing - The Library Card Catalog

🎯 Challenge 3: The Phonebook Problem

Scenario: You need to find "John Smith" in a phonebook with 100,000 names.

Method A: Start at page 1, check every single name until you find John Smith

  • Time: Could take 50,000 comparisons on average! 😰

Method B: Use alphabetical organization:

  • Open to middle (probably around "M")
  • Too far? Go left
  • Not far enough? Go right
  • Repeat until found
  • Time: About 17 comparisons! ⚡

Question: How is Method B so much faster?

The Answer: Indexing!

A database without indexes = Reading every single row to find what you need

A database with indexes = Jump directly to the data you want


📚 What Is an Index?

Think of it like:

  • 📖 Index at the back of a textbook
  • 🗂️ Card catalog in a library
  • 📇 Contacts list sorted by name on your phone

Example: Finding a book in a library

Without Index:

Walk through EVERY aisle Check EVERY book title Found it after searching 50,000 books! ⏰ 2 hours

With Index (Card Catalog):

Look up "Pride and Prejudice" in catalog Card says: "Aisle 12, Shelf 5, Position 3" Walk directly there ⚡ 2 minutes


🗄️ How Database Indexes Work

Imagine a USERS table without index:

Finding user with email "bob@example.com":

Row 1: alice@example.com ❌ Row 2: charlie@example.com ❌ Row 3: david@example.com ❌ ... Row 50,000: bob@example.com ✓ FOUND!

Time: Scanned 50,000 rows (SLOW! 😰)

Same table WITH index on email:

Email Index (automatically sorted): alice@example.com → Row 1 bob@example.com → Row 50,000 ✓ charlie@example.com → Row 2 david@example.com → Row 3 ...

Database uses index:

  1. Binary search in index (fast!)
  2. Index points to Row 50,000
  3. Jump directly to that row

Time: Scanned ~17 entries (FAST! ⚡)


🎮 Interactive Exercise: When to Add an Index?

You have a PRODUCTS table with 1 million products:

sql

Common queries:

sql

Question: Which columns should you index? Think about it...


🎯 Indexing Guidelines: When to Index

✅ GOOD candidates for indexing:

  • Primary keys (usually auto-indexed)
    • Reason: Constantly used for lookups
  • Foreign keys (columns that reference other tables)
    • Reason: Used in JOIN operations
  • Columns frequently used in WHERE clauses
    • Example: WHERE email = '...'
    • Reason: Speeds up filtering
  • Columns used for sorting (ORDER BY)
    • Example: ORDER BY created_at DESC
    • Reason: Avoids sorting entire table

❌ POOR candidates for indexing:

  • Rarely queried columns
    • Example: description text field
    • Reason: Index overhead not worth it
  • Columns with many duplicates
    • Example: gender (only M/F values)
    • Reason: Index doesn't narrow down much
  • Very small tables
    • Example: Table with 50 rows
    • Reason: Full scan is already fast

💰 The Cost of Indexes:

Benefits:

  • ⚡ Dramatically faster SELECT queries
  • 🎯 Efficient searching and filtering
  • 📊 Faster sorting and joining

Costs:

  • 💾 Extra storage space (indexes take up room)
  • ⏱️ Slower INSERT/UPDATE/DELETE (must update indexes too)
  • 🔧 Maintenance overhead

Mental model: Think of indexes like shortcuts in a video game. They get you places faster, but they take time to build and maintain!


📊 Index Types: Quick Overview

Primary Index (Clustered):

  • Determines physical order of data on disk
  • Like books sorted on library shelves
  • One per table

Secondary Index (Non-Clustered):

  • Separate structure pointing to data
  • Like card catalog pointing to books
  • Multiple per table

Unique Index:

  • Ensures no duplicate values
  • Like social security numbers

Composite Index:

  • Index on multiple columns together
  • Like phonebook sorted by (LastName, FirstName)

Example:

Index on single column

sql

Composite index

sql

Unique index

sql

🔥 Real-World Performance Example

Query without index:

sql

Execution time: 8 seconds (full table scan)

Rows examined: 10,000,000

Same query WITH index

sql

Execution time: 0.003 seconds

⚡ Rows examined: 50 (via index)

Speed improvement: 2,667x faster!

Visual analogy:

img1


🎯 Final Synthesis: Putting It All Together

The Complete Picture: Building a Library System

Let's apply everything we've learned!

1️⃣ Database Choice:

  • Need structured data (books, members, loans)
  • Need relationships (members ←→ loans ←→ books)
  • Need accuracy (can't lose loan records!)
  • Choice: SQL Database with ACID properties

| 2️⃣ Table Design with Keys

sql

3️⃣ Add Indexes for Performance

sql

4️⃣ CRUD Operations:

Create: New member joins

sql

Read: Find overdue books

sql

Update: Return a book

UPDATE loans

sql

Delete: Remove old loan records

sql

5️⃣ ACID Transaction: Loan a book

sql

-- All or nothing!**


🏆 Quick Recap: Test Your Understanding

Without looking back, can you explain:

  1. When would you choose SQL vs NoSQL?
  2. What does each letter in ACID stand for and why does it matter?
  3. Why would adding an index make a query faster?
  4. What's the difference between a Primary Key and a Foreign Key?
  5. What do the letters in CRUD stand for?

Mental check: If you can answer these clearly, you've mastered database fundamentals! 🎓


Common Mistakes

MistakeWhy it's wrongCorrect approach
Indexing every columnSlows down writes and wastes storageOnly index columns used in WHERE, JOIN, and ORDER BY
Wrong column order in composite indexesIndex isn't used for queries that don't match the leftmost prefixPut the most selective, most queried column first
Not using EXPLAINGuessing whether queries use indexesAlways verify with EXPLAIN ANALYZE before and after adding indexes
Indexing low-cardinality columnsBoolean or enum columns have too few distinct values for an index to helpIndexes work best on columns with many distinct values
Never removing unused indexesEvery index slows writesAudit and drop indexes that queries no longer use

Key Takeaways

  1. Indexes speed up reads dramatically at the cost of slower writes — like a book's index, they let the database jump directly to relevant rows
  2. B-tree indexes are the default and handle most query patterns — efficient for equality, range queries, and sorting
  3. Only index columns you actually query frequently — each index adds overhead to INSERT, UPDATE, and DELETE operations
  4. Composite indexes cover queries on multiple columns — column order in the index matters (leftmost prefix rule)
  5. Use EXPLAIN to verify your queries actually use indexes — a missing or unused index is a common source of slow queries
Chapter complete!

Course Complete!

You've finished all 42 chapters of

Introduction to System Design

Browse courses
Up next Database Read Write Operations
Continue