The Vector DB Landscape
Choosing Your Storage Layer
The Landscape at a Glance
The vector database market has exploded. Choosing the right one depends on your scale, infrastructure, and use case. Here's the decision tree:
Database Profiles
pgvector (PostgreSQL Extension)
What it is: A PostgreSQL extension that adds a vector column type, approximate nearest-neighbor indexes (HNSW and IVFFlat), and distance operators.
Strengths:
Limitations:
Best for: Teams already on PostgreSQL who need vector search without adding infrastructure. The default choice for most startups.
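To make "distance operators" concrete, here is a minimal pure-Python sketch of what pgvector's three core operators compute: `<->` (Euclidean distance), `<=>` (cosine distance), and `<#>` (negative inner product). The function names and sample vectors are illustrative assumptions, not part of pgvector; in a real deployment this math runs inside PostgreSQL.

```python
import math

def l2_distance(a, b):
    """What pgvector's `<->` operator computes: Euclidean (L2) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """What `<=>` computes: 1 minus cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def negative_inner_product(a, b):
    """What `<#>` computes: the inner product times -1, so smaller means closer."""
    return -sum(x * y for x, y in zip(a, b))

# Hypothetical 2-dimensional "embeddings", just to show the ordering.
query = [1.0, 0.0]
docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [3.0, 4.0]}

# Equivalent in spirit to ORDER BY embedding <=> query: nearest first.
ranked = sorted(docs, key=lambda name: cosine_distance(query, docs[name]))
print(ranked)
```

All three operators return a value where smaller means more similar, which is why an ascending sort returns the nearest neighbors first.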
Pinecone
What it is: Fully managed, serverless vector database. No infrastructure to manage.
Strengths:
Limitations:
Best for: Teams that want vector search without managing infrastructure and are comfortable depending on a managed service with no self-hosted option.
Qdrant
What it is: Open-source vector database written in Rust. Available self-hosted or managed (Qdrant Cloud).
Strengths:
Limitations:
Best for: Teams that want high performance with the option to self-host. Great for latency-sensitive applications.
Weaviate
What it is: Open-source vector database with built-in vectorization modules.
Strengths:
Limitations:
Best for: Teams that want an all-in-one solution including vectorization, especially for multi-modal data.
Chroma
What it is: Lightweight, open-source embedding database focused on developer experience.
Strengths:
Limitations:
Best for: Prototyping and small-scale applications. Often the first vector DB developers try.
Milvus / Zilliz
What it is: Open-source vector database designed for billion-scale. Zilliz is the managed cloud version.
Strengths:
Limitations:
Best for: Large-scale deployments with billions of vectors and high throughput requirements.
Comparison Matrix
| Feature | pgvector | Pinecone | Qdrant | Weaviate | Chroma | Milvus |
|---|---|---|---|---|---|---|
| Max vectors | ~5M | Unlimited | 100M+ | 100M+ | ~1M | Billions |
| Index types | HNSW, IVFFlat | Proprietary | HNSW | HNSW | HNSW | HNSW, IVF, DiskANN |
| Filtering | SQL WHERE | Metadata | Payload | Filters | Where | Boolean expr |
| Hybrid search | Manual | No | Sparse+dense | BM25+vector | No | Sparse+dense |
| Self-hosted | Yes | No | Yes | Yes | Yes | Yes |
| Managed | Supabase, Neon | Yes | Yes | Yes | Planned | Zilliz |
| Pricing model | Free (DB cost) | Per query | Per node | Per node | Free | Per node |
| Language | C | Unknown | Rust | Go | Python | Go/C++ |
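One way to use the matrix is as a first-pass filter. The sketch below encodes three rows from the table (Max vectors, Self-hosted, Hybrid search) and returns the databases that survive a set of requirements. The function name `shortlist` and its parameters are hypothetical, "100M+" is treated conservatively as a ceiling of 100 million, and "Unlimited" and "Billions" are treated as having no ceiling. The figures are the chapter's rough estimates, not benchmarks.

```python
# Approximate ceilings copied from the "Max vectors" row of the matrix.
MAX_VECTORS = {
    "pgvector": 5_000_000,
    "Pinecone": float("inf"),   # listed as "Unlimited"
    "Qdrant": 100_000_000,      # listed as "100M+", treated conservatively
    "Weaviate": 100_000_000,    # listed as "100M+", treated conservatively
    "Chroma": 1_000_000,
    "Milvus": float("inf"),     # listed as "Billions"
}

# From the "Self-hosted" row: every option except Pinecone.
SELF_HOSTABLE = {"pgvector", "Qdrant", "Weaviate", "Chroma", "Milvus"}

# From the "Hybrid search" row: entries that are not "Manual" or "No".
NATIVE_HYBRID = {"Qdrant", "Weaviate", "Milvus"}

def shortlist(num_vectors, need_self_hosted=False, need_hybrid=False):
    """Return databases from the matrix that meet the given requirements."""
    candidates = []
    for name, ceiling in MAX_VECTORS.items():
        if num_vectors > ceiling:
            continue
        if need_self_hosted and name not in SELF_HOSTABLE:
            continue
        if need_hybrid and name not in NATIVE_HYBRID:
            continue
        candidates.append(name)
    return candidates

print(shortlist(2_000_000))
print(shortlist(50_000_000, need_self_hosted=True, need_hybrid=True))
```

A filter like this only narrows the field. The "Best for" notes in each profile and the cost figures still decide between the remaining candidates.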
Cost Modeling
For a corpus of 1 million documents (1536-dim vectors):
| Database | Storage Cost | Query Cost (1K qps) | Total Monthly |
|---|---|---|---|
| pgvector (Supabase Pro) | Included | Included | ~$25 |
| Pinecone (Serverless) | ~$8 | ~$200 | ~$208 |
| Qdrant Cloud | ~$65 (1 node) | Included | ~$65 |
| Self-hosted Qdrant | Server cost | Server cost | ~$30-100 |
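For a sense of the data volume behind these prices, the raw vector size can be estimated directly. This is a back-of-the-envelope sketch that assumes each dimension is stored as a 4-byte float32 and ignores index overhead, metadata, and replication, all of which add to the real footprint. The helper name `raw_vector_bytes` is illustrative.

```python
def raw_vector_bytes(num_vectors, dims, bytes_per_value=4):
    """Raw storage for the vectors alone, assuming float32 (4 bytes) values."""
    return num_vectors * dims * bytes_per_value

one_million = raw_vector_bytes(1_000_000, 1536)
hundred_million = raw_vector_bytes(100_000_000, 1536)

print(f"1M vectors:   {one_million / 1e9:.2f} GB")      # 6.14 GB
print(f"100M vectors: {hundred_million / 1e9:.1f} GB")  # 614.4 GB
```

Under these assumptions, 1 million vectors take roughly 6 GB, which fits in memory on a modest server, while 100 million take roughly 614 GB before any index overhead.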
At 100 million documents, the picture changes dramatically:
The Migration Question
Switching vector databases is painful because:
Mitigation strategies:
Key Takeaways
This is chapter 4 of Vector Databases & Embeddings.