TECH_COMPARISON

Approximate Nearest Neighbor vs Exact Search: Vector Search Trade-offs

ANN vs exact vector search: compare query latency, recall, index size, and use cases to choose the right vector search strategy for production AI systems.

8 min read · Updated Jan 15, 2025
ann · vector-search · hnsw · similarity-search

Overview

Approximate Nearest Neighbor (ANN) search finds vectors that are close to a query vector in high-dimensional space, trading a small probability of missing the exact nearest neighbors for dramatically faster query times. HNSW (Hierarchical Navigable Small World), IVF-PQ, and ScaNN are the dominant ANN algorithms powering production vector databases. HNSW builds a multi-layer graph where each layer is a small-world network of vector connections, enabling logarithmic-time traversal to find approximate neighbors.

Exact search (also called flat search or brute-force search) computes the exact distance between a query vector and every vector in the corpus to find the true nearest neighbors. FAISS's IndexFlatL2/IndexFlatIP implements this — it guarantees perfect recall but scales as O(n) with corpus size. On modern CPUs with SIMD acceleration or GPUs, exact search handles millions of vectors per second, making it practical for small-to-medium corpora.
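Exact search is simple enough to write out directly. The sketch below is a minimal NumPy version of what FAISS's IndexFlatL2 computes (squared L2 distance from the query to every corpus vector, then the k smallest); the corpus and query here are synthetic illustrations.

```python
import numpy as np

# Brute-force exact nearest-neighbor search: the same guarantee
# FAISS's IndexFlatL2 provides, written out in NumPy for clarity.
def exact_search(corpus: np.ndarray, query: np.ndarray, k: int = 5):
    # Squared L2 distance from the query to every corpus vector: O(n * d).
    dists = np.sum((corpus - query) ** 2, axis=1)
    # argpartition selects the k smallest distances without a full sort.
    idx = np.argpartition(dists, k)[:k]
    # Sort only those k candidates by distance.
    idx = idx[np.argsort(dists[idx])]
    return idx, dists[idx]

rng = np.random.default_rng(0)
corpus = rng.standard_normal((10_000, 128)).astype(np.float32)
# A query that is a slightly perturbed copy of vector 42.
query = corpus[42] + 0.01 * rng.standard_normal(128).astype(np.float32)
idx, dists = exact_search(corpus, query, k=5)
```

Because every distance is computed, `idx[0]` is guaranteed to be the true nearest neighbor; there is no recall parameter to tune.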

Key Technical Differences

HNSW's graph structure is the key to ANN efficiency. During index construction, each vector is connected to M nearest neighbors in multiple hierarchical layers. At query time, traversal starts at the top layer (sparse long-range connections) and greedily navigates toward the query vector, descending layers until reaching the base graph with full density. This greedy graph search achieves O(log n) complexity, enabling 10-100ms queries on billion-scale indexes that would take minutes with exact search.
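The per-layer traversal can be sketched in a few lines. This toy greedy search operates on a single layer of a hypothetical adjacency dict (not a real HNSW index, which also keeps a candidate beam of size ef_search); the full algorithm runs this routine per layer, using the best node found as the entry point for the layer below.

```python
import numpy as np

# Toy sketch of HNSW's greedy traversal on ONE layer: from an entry
# point, repeatedly move to the neighbor closest to the query until no
# neighbor improves. The graph here is a hand-built adjacency dict for
# illustration only.
def greedy_search(vectors, graph, entry, query):
    def dist(i):
        return float(np.sum((vectors[i] - query) ** 2))
    current = entry
    while True:
        # Best neighbor of the current node, by distance to the query.
        best = min(graph[current], key=dist, default=current)
        if dist(best) >= dist(current):
            return current  # local minimum: no neighbor is closer
        current = best

# Tiny example: 1-D points on a line, connected in a chain.
vectors = np.array([[0.0], [1.0], [2.0], [3.0]])
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
```

Starting from node 0 with a query near 3.0, the walk hops 0 → 1 → 2 → 3 and stops, which is the logarithmic-descent behavior the hierarchical layers exploit at scale.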

The recall-latency trade-off is controlled by the ef_search parameter in HNSW: higher values explore more candidate nodes, increasing recall toward 100% at the cost of latency. Production systems typically tune ef_search to achieve 95-99% recall — meaning 1-5% of queries may miss the true nearest neighbor, returning a very similar but not optimal result. For semantic search and RAG, this recall loss is negligible: the difference between the 1st and 2nd nearest neighbor in embedding space is rarely meaningful.
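Recall@k is measured by comparing approximate results against exact ground truth. The harness below does exactly that; as a crude stand-in for varying ef_search (no HNSW library is assumed here), the "approximate" search scans a random fraction of the corpus. Real HNSW explores graph neighborhoods instead, but the shape of the trade-off, more candidates explored for higher recall at higher cost, is the same; with hnswlib you would sweep `set_ef()` in the same loop.

```python
import numpy as np

rng = np.random.default_rng(1)
corpus = rng.standard_normal((5_000, 64)).astype(np.float32)
queries = rng.standard_normal((100, 64)).astype(np.float32)

def knn(sub_idx, q, k):
    # Exact k-NN restricted to a subset of the corpus (by index).
    d = np.sum((corpus[sub_idx] - q) ** 2, axis=1)
    return sub_idx[np.argsort(d)[:k]]

def recall_at_k(approx, exact):
    # Fraction of the true neighbors the approximate result recovered.
    return len(set(approx) & set(exact)) / len(exact)

k = 10
all_idx = np.arange(len(corpus))
scores = {}
for frac in (0.1, 0.5, 1.0):
    # Stand-in for ef_search: scan a random fraction of the corpus.
    sub = rng.choice(all_idx, size=int(frac * len(corpus)), replace=False)
    scores[frac] = np.mean([
        recall_at_k(knn(sub, q, k), knn(all_idx, q, k)) for q in queries
    ])
```

Scanning the full corpus recovers recall 1.0 by construction; smaller candidate sets trade recall for work, which is the curve you tune ef_search along in production.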

Index construction cost is a one-time investment. HNSW construction for 1M 768-dimensional vectors (typical BERT embeddings) takes 2-10 minutes and requires O(n * M) memory for the graph connections on top of the raw vectors. DiskANN (Microsoft Research) reduces this memory overhead by storing the graph on disk with an SSD-optimized access pattern, enabling billion-scale indexes on commodity hardware.
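The O(n * M) overhead is easy to put numbers on. The arithmetic below is a back-of-envelope estimate only (real allocators and upper layers add some overhead); it assumes float32 vectors and the common hnswlib layout of roughly 2*M neighbor ids per vector in the base layer.

```python
# Back-of-envelope memory estimate for an in-RAM HNSW index.
# Illustrative arithmetic only; real implementations add overhead.
n, dim, M = 1_000_000, 768, 16        # vectors, dimensions, HNSW M

raw = n * dim * 4                     # float32 vectors: 4 bytes/component
# The base layer keeps ~2*M neighbor ids (4-byte ints) per vector;
# upper layers add a small fraction on top, ignored here.
graph = n * 2 * M * 4

print(f"raw vectors: {raw / 1e9:.2f} GB")    # ~3.07 GB
print(f"graph links: {graph / 1e9:.2f} GB")  # ~0.13 GB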

Performance & Scale

FAISS benchmarks put exact search (IndexFlatL2) at roughly 1M QPS on GPU for 1M 128-dimensional vectors, but that throughput falls linearly as the corpus grows. HNSW sustains high throughput at 1B vectors with >95% recall, keeping per-query latency in the low milliseconds at 1000x the scale. For production RAG systems the gap is decisive: at 10M document chunks, exact search takes 50-500ms per query while HNSW takes 5-20ms, and that difference determines whether vector retrieval fits inside a user-facing latency budget.
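The exact-search numbers follow from memory bandwidth alone: a single query must stream the entire corpus through the ALU once. The figures below are illustrative assumptions (768-dimensional float32 embeddings, ~100 GB/s effective bandwidth for a well-vectorized multi-core scan), not measurements.

```python
# Why exact search over 10M chunks costs hundreds of ms on CPU: the
# scan is memory-bandwidth bound. All numbers here are illustrative.
n, dim = 10_000_000, 768
bytes_scanned = n * dim * 4         # float32 corpus, read once per query
bandwidth = 100e9                   # ~100 GB/s effective, assumed
latency_s = bytes_scanned / bandwidth
print(f"scan: {bytes_scanned / 1e9:.1f} GB -> ~{latency_s * 1e3:.0f} ms")
```

Roughly 30 GB scanned per query lands around 300ms, consistent with the upper end of the 50-500ms range quoted above; HNSW avoids the full scan entirely, which is where its 5-20ms figure comes from.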

When to Choose Each

Choose ANN (HNSW, IVF-PQ, ScaNN) for all production vector search applications exceeding 100K vectors. The recall-latency trade-off is overwhelmingly favorable for real-world semantic search and retrieval applications. Choose exact search for small corpora, evaluation benchmarking, or compliance-sensitive applications requiring guaranteed recall.

Bottom Line

ANN search is the production standard for vector databases at any meaningful scale. Exact search remains valuable for correctness testing and small-scale applications. The recall loss from ANN is typically 1-5% and rarely impacts end-user experience in semantic search or RAG applications; beyond roughly 100K vectors, the latency benefit is almost always worth the trade-off.
