Chunking Strategies for RAG Explained: How to Split Documents for Optimal Retrieval
Learn RAG chunking strategies — fixed-size, semantic, recursive, and parent-document chunking with practical guidelines for chunk size and overlap.
Chunking Strategies for RAG
Chunking is the process of splitting documents into smaller segments for embedding and retrieval in RAG systems, directly impacting retrieval quality, relevance, and generation accuracy.
What It Really Means
A RAG pipeline retrieves relevant chunks of text and feeds them to an LLM. The quality of the final answer depends heavily on whether the retrieved chunks contain the right information at the right granularity.
Too large: A 2,000-token chunk might contain the answer, but bury it in irrelevant context. The embedding model averages the semantics of the entire chunk, diluting the signal, so the chunk matches broadly but lacks precision.
Too small: A 50-token chunk might contain a key fact but lack the context needed to interpret it. "The retention rate improved by 40%" means nothing without knowing what product, time period, or comparison baseline.
Chunking strategy is the art of finding the right granularity — chunks that are self-contained enough to be meaningful but focused enough to be retrievable. This is one of the most impactful and underappreciated decisions in RAG system design.
How It Works in Practice
Strategy 1: Fixed-Size Chunking
Split text into chunks of N tokens with M tokens of overlap.
- Chunk size: 256-1024 tokens (512 is a common default)
- Overlap: 10-20% of chunk size (64-128 tokens)
- Pros: Simple, predictable, works everywhere
- Cons: Splits mid-sentence, ignores document structure
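A minimal token-based sketch of this strategy, assuming the tiktoken package (swap in whatever tokenizer matches your embedding model):

```python
import tiktoken

def fixed_size_chunks(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into windows of `chunk_size` tokens, sharing `overlap` tokens."""
    enc = tiktoken.get_encoding("cl100k_base")  # pick the tokenizer your models use
    tokens = enc.encode(text)
    chunks = []
    step = chunk_size - overlap  # advance by less than a full chunk to create overlap
    for start in range(0, len(tokens), step):
        chunks.append(enc.decode(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # avoid a trailing chunk wholly contained in the previous one
    return chunks
```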
Strategy 2: Recursive Character Splitting
Split by a hierarchy of separators: paragraphs → lines → sentences → words.
- First try splitting on "\n\n" (paragraphs)
- If chunks are too large, split on "\n" (lines)
- If still too large, split on ". " (sentences)
- If still too large, split on " " (words)
- Pros: Respects natural text boundaries
- Cons: Uneven chunk sizes
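A simplified sketch of the idea. It measures size in characters, discards the separators themselves, and omits the merge step that production splitters use to pack small pieces back up to the size limit:

```python
def recursive_split(text: str, max_len: int = 1000,
                    separators: tuple[str, ...] = ("\n\n", "\n", ". ", " ")) -> list[str]:
    """Split on the coarsest separator; recurse with finer ones on oversized pieces."""
    if len(text) <= max_len or not separators:
        return [text]
    sep, finer = separators[0], separators[1:]
    chunks: list[str] = []
    for piece in text.split(sep):
        if len(piece) <= max_len:
            chunks.append(piece)
        else:
            chunks.extend(recursive_split(piece, max_len, finer))
    return chunks
```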
Strategy 3: Semantic Chunking
Split where the topic changes. Embed consecutive sentences and split where cosine similarity drops.
- Embed each sentence
- Compare consecutive sentence embeddings
- Split where similarity drops below a threshold
- Pros: Chunks are topically coherent
- Cons: Expensive (requires embedding every sentence), unpredictable chunk sizes
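A sketch of the algorithm. Here `embed` is a stand-in for your embedding client and is assumed to return one unit-normalized vector per sentence; the 0.75 threshold is illustrative and needs tuning per model and corpus:

```python
import numpy as np

def semantic_chunks(sentences: list[str], embed, threshold: float = 0.75) -> list[str]:
    """Group consecutive sentences, starting a new chunk where similarity drops."""
    if not sentences:
        return []
    vecs = embed(sentences)  # assumed: (n_sentences, dim) array of unit vectors
    chunks, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        cosine = float(np.dot(vecs[i - 1], vecs[i]))  # dot of unit vectors = cosine
        if cosine < threshold:  # topic shift: close the current chunk
            chunks.append(" ".join(current))
            current = []
        current.append(sentences[i])
    chunks.append(" ".join(current))
    return chunks
```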
Strategy 4: Document-Structured Chunking
Use document structure (headings, sections, code blocks) as natural boundaries.
- Markdown: split on headings (##, ###)
- HTML: split on semantic tags (section, article, h2)
- Code: split on functions, classes, or modules
- Pros: Preserves author's intended structure
- Cons: Requires format-specific parsers
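For the markdown case, a lookahead split keeps each heading attached to the body that follows it; a minimal sketch:

```python
import re

def split_markdown_sections(doc: str) -> list[str]:
    """Split a markdown document at ## / ### headings, one chunk per section."""
    sections = re.split(r"(?=^#{2,3} )", doc, flags=re.MULTILINE)
    return [s.strip() for s in sections if s.strip()]
```

Sections that come out oversized can then be fed through one of the splitters above, so chunk boundaries still respect headings first.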
Strategy 5: Parent-Document Retrieval
Index small chunks for retrieval but return larger parent documents.
- Create small chunks (256 tokens) for precise matching
- Each small chunk references its parent chunk (1024 tokens)
- Retrieve using small chunks, return parent chunks to the LLM
- Pros: Precise retrieval + sufficient context
- Cons: More complex indexing, higher storage
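A sketch of the indexing side, reusing the `fixed_size_chunks` helper sketched under the fixed-size strategy above:

```python
def build_parent_index(doc: str) -> tuple[list[tuple[str, int]], list[str]]:
    """Small chunks for precise matching, each pointing at a context-rich parent."""
    parents = fixed_size_chunks(doc, chunk_size=1024, overlap=0)
    small_index = []  # (small_chunk_text, parent_id) pairs to embed and store
    for parent_id, parent in enumerate(parents):
        for small in fixed_size_chunks(parent, chunk_size=256, overlap=32):
            small_index.append((small, parent_id))
    return small_index, parents

# Query time: retrieve the best-matching small chunk by embedding similarity,
# then hand parents[parent_id] to the LLM instead of the small chunk itself.
```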
Implementation
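In practice, most teams start from a library splitter rather than hand-rolling one. A minimal example, assuming the langchain-text-splitters package is installed and `manual.md` stands in for your source document:

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

with open("manual.md", encoding="utf-8") as f:  # placeholder source document
    document_text = f.read()

splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,                        # target size, measured by len() by default
    chunk_overlap=64,                      # ~12% overlap between consecutive chunks
    separators=["\n\n", "\n", ". ", " "],  # coarsest separator first
)
chunks = splitter.split_text(document_text)
```

The same class can measure size in tokens rather than characters via its `from_tiktoken_encoder` constructor, which is usually preferable when budgeting against an embedding model's context window.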
Trade-offs
Chunk Size Guidelines
| Use Case | Recommended Size | Rationale |
|---|---|---|
| Q&A / FAQ | 256-512 tokens | Short, focused answers |
| Technical docs | 512-1024 tokens | Need enough context for procedures |
| Legal documents | 1024-2048 tokens | Clauses need surrounding context |
| Code | Function/class level | Natural semantic boundaries |
When to Use Each Strategy
- Fixed-size: Default starting point, works for most cases
- Recursive: When the document has natural paragraph/section structure
- Semantic: When topic coherence matters more than fixed boundaries
- Document-structured: Markdown, HTML, or code with clear structure
- Parent-document: When you need both precise retrieval and rich context
Advantages of Good Chunking
- Directly improves retrieval precision and recall
- Reduces noise in LLM context, improving generation quality
- Enables efficient token budgeting
Disadvantages of Over-Engineering
- Semantic chunking is expensive at scale (embedding every sentence)
- Complex chunking strategies are harder to debug
- Domain-specific chunking requires custom parsers for each format
Common Misconceptions
- "There is one optimal chunk size" — The optimal size depends on document type, query patterns, embedding model, and use case. Always test multiple sizes on your specific data.
- "Overlap is always necessary" — Overlap helps when chunks split mid-topic, but adds redundancy and cost. With semantic or structure-based chunking, overlap is often unnecessary.
- "Smaller chunks are always more precise" — Tiny chunks lose context. The sentence "It increased by 40%" is useless without knowing what "it" refers to. Context is essential for both embedding quality and LLM comprehension.
- "Chunking is a one-time setup" — As your documents, queries, and models evolve, your chunking strategy should be re-evaluated. What works for v1 may not work for v2.
How This Appears in Interviews
Chunking strategy questions test practical RAG engineering knowledge:
- "How would you chunk a 500-page technical manual for a RAG system?" — discuss structure-based chunking on headings, appropriate chunk sizes for technical content, and parent-document retrieval. See our interview questions on RAG systems.
- "Your RAG system returns correct documents but the LLM gives wrong answers. What could be wrong?" — chunks may be splitting relevant context, chunks may be too large (diluted embeddings), or overlap may be insufficient.
- "How do you handle tables and images in chunking?" — discuss multimodal embeddings, table serialization, and image captioning.
Related Concepts
- RAG — Chunking is a critical step in the RAG pipeline
- Vector Embeddings — Chunk quality affects embedding quality
- Semantic Search — Retrieval performance depends on chunking
- Embedding Models — Different models handle different chunk sizes
- Token Budgeting — Chunk size directly impacts token usage