Back to guides
3
14 min

Vector Database Fundamentals

Why Regular Databases Can't Do This

Why Not Just Use PostgreSQL?

You can store vectors in a regular database column. PostgreSQL with pgvector does exactly this. But the moment you need to search millions of vectors for the nearest neighbors, you hit a wall: brute-force search is O(n).

Searching 1 million vectors with brute force takes ~100ms. At 100 million vectors, that's 10 seconds. At a billion, you're waiting minutes. Vector databases solve this with approximate nearest neighbor (ANN) algorithms that trade a tiny bit of accuracy for massive speed gains.

Exact vs Approximate Search

ApproachSpeedAccuracyWhen to Use
Brute force (flat)O(n) — slow at scale100% recall< 100K vectors, or when accuracy is critical
ANN (indexed)O(log n) — fast95-99% recall> 100K vectors, production search

Recall measures what percentage of true nearest neighbors the algorithm finds. 95% recall means it finds 95 out of 100 true nearest neighbors — usually good enough.

How ANN Indexes Work

Three dominant indexing strategies:

Loading diagram...

HNSW (Hierarchical Navigable Small World)

The most popular index type. Builds a multi-layer graph where each vector connects to its nearest neighbors. Search starts at the top layer (coarse, few nodes) and drills down to the bottom layer (fine, all nodes).

How it works:

  • Insert: Each vector connects to M nearest neighbors at each layer
  • Search: Start at top layer, greedily navigate toward query vector, drop down layers
  • Result: Find approximate nearest neighbors in O(log n) time
  • Trade-offs:

  • Pros: Best recall, fast search, no training step
  • Cons: High memory (stores graph structure), slow insert for very large datasets
  • Tuning: M (connections per node, usually 16-64) and efConstruction (build quality, usually 128-512)
  • IVF (Inverted File Index)

    Clusters vectors into buckets using k-means, then only searches relevant buckets at query time.

    How it works:

  • Training: Run k-means to create N centroids (e.g., 1024)
  • Insert: Assign each vector to its nearest centroid's bucket
  • Search: Find nprobe nearest centroids, search only those buckets
  • Trade-offs:

  • Pros: Good balance of speed/memory, well-understood
  • Cons: Requires training step, recall depends on nprobe setting
  • Tuning: nlist (number of clusters, usually sqrt(n)) and nprobe (clusters to search, usually 10-50)
  • Product Quantization (PQ)

    Compresses vectors by splitting them into subvectors and quantizing each. Reduces memory 4-32x but loses some precision.

    How it works:

  • Split each vector into M subvectors
  • Cluster each subvector space into 256 centroids
  • Store only centroid IDs (1 byte each) instead of float32 values
  • Trade-offs:

  • Pros: Dramatically reduces memory (e.g., 3072 floats → 96 bytes)
  • Cons: Lower recall, requires training, lossy compression
  • Often combined with IVF (IVF-PQ) for large-scale deployments
  • The Build vs Query Trade-off

    Every index has tuning parameters that trade build time and memory for query speed and recall:

    ParameterHigher Value =Lower Value =
    HNSW MBetter recall, more memoryFaster build, less memory
    HNSW efSearchBetter recall, slower queryFaster query, lower recall
    IVF nprobeBetter recall, slower queryFaster query, lower recall
    PQ segmentsBetter recall, more memoryMore compression, lower recall

    Filtering: The Hidden Challenge

    Real applications don't just search by similarity — they filter. "Find similar documents created in the last 30 days by this author in this department."

    Vector databases handle filtering differently:

    StrategyHowTrade-off
    Pre-filterFilter first, then ANN search on subsetAccurate filters, but small subsets break ANN assumptions
    Post-filterANN search first, then filter resultsFast, but may return fewer results than requested
    IntegratedFilter-aware ANN searchBest quality, but complex to implement

    This is why choosing the right vector database matters — filtering performance varies dramatically.

    Storage Architecture

    Vector databases must handle three types of data:

  • Vectors — The embedding arrays (bulk of storage)
  • Metadata — Structured fields for filtering (dates, tags, user IDs)
  • Payloads — Original text or references (for returning results)
  • Storage layout choices:

  • In-memory: Fastest search, but limited by RAM. Good for < 10M vectors.
  • Memory-mapped: Vectors on disk, OS manages caching. Scales to 100M+.
  • On-disk with caching: Largest scale, but slowest cold queries.
  • Consistency and Durability

    Unlike traditional databases, most vector databases prioritize availability over consistency:

  • Write path: Vectors are indexed asynchronously. A newly inserted vector may not be searchable for milliseconds to seconds.
  • Replication: Most support replicas for read scaling, but replication lag exists.
  • Durability: Write-ahead logs (WAL) prevent data loss, but rebuilding indexes after a crash can take minutes to hours.
  • If your application needs strong consistency (e.g., a vector must be searchable immediately after insert), your options are more limited — pgvector with PostgreSQL provides this.

    Key Takeaways

  • Vector databases use ANN algorithms for sub-linear search over millions of vectors
  • HNSW offers best recall, IVF balances speed/memory, PQ minimizes storage
  • Every index trades build cost for query quality via tuning parameters
  • Filtering is the hidden challenge — pre-filter vs post-filter vs integrated
  • Consistency trade-offs differ from traditional databases
  • For < 100K vectors, brute-force (flat index) is often good enough
  • This is chapter 3 of Vector Databases & Embeddings.

    Get the full hands-on course — free during early access. Build the complete system. Your projects become your portfolio.

    View course details