14 min

Vector Database Fundamentals

Why Regular Databases Can't Do This

Why Not Just Use PostgreSQL?

You can store vectors in a regular database column. PostgreSQL with pgvector does exactly this. But the moment you need to search millions of vectors for the nearest neighbors, you hit a wall: brute-force search is O(n).

Searching 1 million vectors with brute force takes ~100ms. At 100 million vectors, that's 10 seconds. At a billion, you're waiting minutes. Vector databases solve this with approximate nearest neighbor (ANN) algorithms that trade a tiny bit of accuracy for massive speed gains.

Exact vs Approximate Search

Approach	Speed	Accuracy	When to Use
Brute force (flat)	O(n) — slow at scale	100% recall	< 100K vectors, or when accuracy is critical
ANN (indexed)	O(log n) — fast	95-99% recall	> 100K vectors, production search

Recall measures what percentage of true nearest neighbors the algorithm finds. 95% recall means it finds 95 out of 100 true nearest neighbors — usually good enough.

How ANN Indexes Work

Three dominant indexing strategies:

Loading diagram...

HNSW (Hierarchical Navigable Small World)

The most popular index type. Builds a multi-layer graph where each vector connects to its nearest neighbors. Search starts at the top layer (coarse, few nodes) and drills down to the bottom layer (fine, all nodes).

How it works:

Insert: Each vector connects to M nearest neighbors at each layer

Search: Start at top layer, greedily navigate toward query vector, drop down layers

Result: Find approximate nearest neighbors in O(log n) time

Trade-offs:

Pros: Best recall, fast search, no training step

Cons: High memory (stores graph structure), slow insert for very large datasets

Tuning: M (connections per node, usually 16-64) and efConstruction (build quality, usually 128-512)

IVF (Inverted File Index)

Clusters vectors into buckets using k-means, then only searches relevant buckets at query time.

How it works:

Training: Run k-means to create N centroids (e.g., 1024)

Insert: Assign each vector to its nearest centroid's bucket

Search: Find nprobe nearest centroids, search only those buckets

Trade-offs:

Pros: Good balance of speed/memory, well-understood

Cons: Requires training step, recall depends on nprobe setting

Tuning: nlist (number of clusters, usually sqrt(n)) and nprobe (clusters to search, usually 10-50)

Product Quantization (PQ)

Compresses vectors by splitting them into subvectors and quantizing each. Reduces memory 4-32x but loses some precision.

How it works:

Split each vector into M subvectors

Cluster each subvector space into 256 centroids

Store only centroid IDs (1 byte each) instead of float32 values

Trade-offs:

Pros: Dramatically reduces memory (e.g., 3072 floats → 96 bytes)

Cons: Lower recall, requires training, lossy compression

Often combined with IVF (IVF-PQ) for large-scale deployments

The Build vs Query Trade-off

Every index has tuning parameters that trade build time and memory for query speed and recall:

Parameter	Higher Value =	Lower Value =
HNSW M	Better recall, more memory	Faster build, less memory
HNSW efSearch	Better recall, slower query	Faster query, lower recall
IVF nprobe	Better recall, slower query	Faster query, lower recall
PQ segments	Better recall, more memory	More compression, lower recall

Filtering: The Hidden Challenge

Real applications don't just search by similarity — they filter. "Find similar documents created in the last 30 days by this author in this department."

Vector databases handle filtering differently:

Strategy	How	Trade-off
Pre-filter	Filter first, then ANN search on subset	Accurate filters, but small subsets break ANN assumptions
Post-filter	ANN search first, then filter results	Fast, but may return fewer results than requested
Integrated	Filter-aware ANN search	Best quality, but complex to implement

This is why choosing the right vector database matters — filtering performance varies dramatically.

Storage Architecture

Vector databases must handle three types of data:

Vectors — The embedding arrays (bulk of storage)

Metadata — Structured fields for filtering (dates, tags, user IDs)

Payloads — Original text or references (for returning results)

Storage layout choices:

In-memory: Fastest search, but limited by RAM. Good for < 10M vectors.

Memory-mapped: Vectors on disk, OS manages caching. Scales to 100M+.

On-disk with caching: Largest scale, but slowest cold queries.

Consistency and Durability

Unlike traditional databases, most vector databases prioritize availability over consistency:

Write path: Vectors are indexed asynchronously. A newly inserted vector may not be searchable for milliseconds to seconds.

Replication: Most support replicas for read scaling, but replication lag exists.

Durability: Write-ahead logs (WAL) prevent data loss, but rebuilding indexes after a crash can take minutes to hours.

If your application needs strong consistency (e.g., a vector must be searchable immediately after insert), your options are more limited — pgvector with PostgreSQL provides this.

Key Takeaways

Vector databases use ANN algorithms for sub-linear search over millions of vectors

HNSW offers best recall, IVF balances speed/memory, PQ minimizes storage

Every index trades build cost for query quality via tuning parameters

Filtering is the hidden challenge — pre-filter vs post-filter vs integrated

Consistency trade-offs differ from traditional databases

For < 100K vectors, brute-force (flat index) is often good enough

This is chapter 3 of Vector Databases & Embeddings.

Get the full hands-on course — free during early access. Build the complete system. Your projects become your portfolio.

View course details

Ch. 2: How Embedding Models Work

Ch. 4: The Vector DB Landscape