Your First Vector Pipeline

End-to-End: Ingest, Embed, Store, Search, Evaluate

The Scenario

You're building a customer support knowledge base for a SaaS company. The system needs to:

Ingest 500 help articles (varying lengths, with categories and dates)

Let support agents search by natural language queries

Return relevant articles with highlighted passages

Handle filters by product area and recency

This module walks through every decision in the pipeline.

Step 1: Document Ingestion

Key Decisions

Text extraction: HTML articles need boilerplate removal (nav, footer, sidebar). Keep headings as context markers: they tell the embedding model which topic a section covers.
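A minimal sketch of that extraction step, using only Python's standard-library `html.parser`. The set of tags treated as boilerplate and the `# ` heading prefix are assumptions for illustration; real help-center HTML usually needs site-specific rules.

```python
from html.parser import HTMLParser

# Assumed boilerplate containers; adjust to the markup of your help center.
SKIP_TAGS = {"nav", "footer", "aside", "script", "style"}
HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class ArticleTextExtractor(HTMLParser):
    """Collects visible text, dropping boilerplate and marking headings."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0      # > 0 while inside a boilerplate element
        self.in_heading = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in HEADING_TAGS:
            self.in_heading = True

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag in HEADING_TAGS:
            self.in_heading = False

    def handle_data(self, data):
        text = data.strip()
        if not text or self.skip_depth:
            return
        # Prefix headings so they survive as context markers in plain text.
        self.parts.append(f"# {text}" if self.in_heading else text)


def extract_text(html: str) -> str:
    parser = ArticleTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)
```

Headings that contain nested inline tags would be emitted as several marked lines here, which is acceptable for a sketch but worth handling in production.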

Metadata extraction: Pull structured fields (category, product area, last_updated) into separate metadata. These become filter fields, not part of the embedded text.

Chunking choice: For help articles averaging 1,500 tokens each:

Strategy: Recursive (try heading → paragraph → sentence splits)

Target size: 300 tokens (enough context, not too diluted)

Overlap: 50 tokens (captures cross-paragraph concepts)

Result: ~5 chunks per article × 500 articles = ~2,500 chunks
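The chunking settings above can be sketched as follows. This is an illustration, not a reference implementation: token counts are approximated by whitespace-separated words (a real pipeline would use the embedding model's tokenizer), headings are assumed to be marked with a leading `# `, and all function names are hypothetical.

```python
import re

# Coarse-to-fine split points: heading, paragraph, sentence (assumed markers).
SEPARATORS = [r"\n(?=# )", r"\n\s*\n", r"(?<=[.!?])\s+"]


def count_tokens(text):
    # Crude stand-in for a tokenizer: count whitespace-separated words.
    return len(text.split())


def split_recursive(text, max_tokens, separators=SEPARATORS):
    """Split text into pieces of at most max_tokens, coarsest separator first."""
    text = text.strip()
    if not text:
        return []
    if count_tokens(text) <= max_tokens:
        return [text]
    if not separators:
        # Nothing left to split on: fall back to a hard cut by words.
        words = text.split()
        return [" ".join(words[i:i + max_tokens])
                for i in range(0, len(words), max_tokens)]
    pieces = []
    for part in re.split(separators[0], text):
        pieces.extend(split_recursive(part, max_tokens, separators[1:]))
    return pieces


def merge_with_overlap(pieces, max_tokens, overlap):
    """Greedily pack pieces into chunks, carrying the last `overlap` words forward."""
    chunks, current = [], []
    for piece in pieces:
        words = piece.split()
        if current and len(current) + len(words) > max_tokens:
            chunks.append(" ".join(current))
            current = current[-overlap:] if overlap else []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks


def chunk_text(text, max_tokens=300, overlap=50):
    # Pieces are capped at max_tokens - overlap so that a carried-over
    # overlap plus one new piece never exceeds max_tokens.
    pieces = split_recursive(text, max_tokens - overlap)
    return merge_with_overlap(pieces, max_tokens, overlap)
```

Because pieces are merged greedily, chunks tend to end on sentence or paragraph boundaries rather than at exactly 300 tokens.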

What to Store per Chunk

Field	Purpose	Example
chunk_id	Unique identifier	"article-42-chunk-3"
embedding	Vector (1536-dim)	[0.02, -0.15, ...]
text	Original chunk text	"To reset your password..."
article_id	Parent document reference	"article-42"
title	Article title for display	"Password Reset Guide"
category	Filter field	"account-management"
product	Filter field	"web-app"
updated_at	Recency filter	"2026-04-15"
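One way to keep these fields together in code is a small record type. The sketch below mirrors the table; the class name and the `make_chunk_id` helper are hypothetical, and the ID format simply follows the example in the first row.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ChunkRecord:
    chunk_id: str           # e.g. "article-42-chunk-3"
    embedding: list[float]  # 1536 floats for the model chosen in this chapter
    text: str               # original chunk text
    article_id: str         # parent document reference
    title: str              # article title for display
    category: str           # filter field
    product: str            # filter field
    updated_at: date        # recency filter


def make_chunk_id(article_id: str, index: int) -> str:
    """Build a chunk ID in the "article-42-chunk-3" style."""
    return f"{article_id}-chunk-{index}"
```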

Step 2: Embedding Generation

Model Selection for This Use Case

Applying the decision framework from Module 5:

Factor	Requirement	Choice
Languages	English only	No multilingual needed
Quality	High (support accuracy matters)	Top-tier model
Scale	2,500 chunks (small)	Cost isn't a concern
Latency	Sub-100ms query time	Any model works
Infrastructure	Already using Supabase	pgvector available

Decision: OpenAI text-embedding-3-small (1536 dims). Excellent quality, well supported, and low cost at this scale: about 750,000 tokens (2,500 chunks × 300 tokens), which comes to a few cents in total.
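The cost claim is simple arithmetic and worth redoing whenever prices or corpus size change. The sketch below assumes a price of $0.02 per million tokens; that figure is an assumption, so check the provider's current pricing page before relying on it.

```python
def embedding_cost_usd(num_chunks, tokens_per_chunk, usd_per_million_tokens):
    """Estimate the one-off cost of embedding a corpus."""
    total_tokens = num_chunks * tokens_per_chunk
    return total_tokens / 1_000_000 * usd_per_million_tokens


# Assumed price per 1M tokens; verify against current pricing.
ASSUMED_USD_PER_MILLION = 0.02

cost = embedding_cost_usd(2_500, 300, ASSUMED_USD_PER_MILLION)
print(f"Estimated embedding cost: ${cost:.3f}")
```

Under that assumption the whole corpus costs a cent or two, so re-embedding after a chunking change is cheap enough to do freely.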

Embedding Pipeline Details

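Batching: Embed in batches of around 100 chunks per request (the maximum batch size varies by provider). Sending one chunk per request means roughly 100x as many network round trips, and those round trips dominate the run time.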
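A minimal batching sketch. `embed_batch` is a hypothetical callable that takes a list of texts and returns one vector per text; in practice it would wrap your provider's embeddings endpoint.

```python
def batched(items, size=100):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_all(texts, embed_batch, batch_size=100):
    """Embed every text with one call to embed_batch per batch."""
    vectors = []
    for batch in batched(texts, batch_size):
        vectors.extend(embed_batch(batch))
    return vectors
```

With 2,500 chunks and a batch size of 100, this makes 25 requests instead of 2,500.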

Error handling: API calls can fail. Implement retries with exponential backoff. Track which chunks succeeded so you can resume.
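The retry-and-resume idea can be sketched like this. The function names are hypothetical, `embed_batch` again stands in for a real API call, and `done` is a plain dict that the caller is assumed to persist (to disk or a table) between runs.

```python
import random
import time


def with_retries(fn, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying failures with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            # Delays grow as roughly 1s, 2s, 4s, ... with a little jitter.
            sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))


def embed_resumable(chunks, embed_batch, done, batch_size=100, sleep=time.sleep):
    """Embed only chunks missing from `done`.

    chunks: dict of chunk_id -> text
    done:   dict of chunk_id -> vector, updated after each successful batch
    """
    pending = [cid for cid in chunks if cid not in done]
    for start in range(0, len(pending), batch_size):
        ids = pending[start:start + batch_size]
        vectors = with_retries(
            lambda: embed_batch([chunks[c] for c in ids]), sleep=sleep
        )
        done.update(zip(ids, vectors))
    return done
```

A production version would catch only retryable errors (rate limits, timeouts) rather than every exception.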

Versioning: Store the model name and version alongside vectors. When you upgrade models, you'll need to re-embed everything — knowing which model generated which vectors prevents mixing incompatible embeddings.
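A small guard makes the versioning rule enforceable rather than a convention. This is a sketch with hypothetical names: it assumes each stored vector carries the model name and dimension count it was generated with.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddingVersion:
    model: str
    dimensions: int


def check_same_model(query_version, stored_versions):
    """Raise if any stored vector came from a different model than the query."""
    others = set(stored_versions) - {query_version}
    if others:
        names = ", ".join(sorted(v.model for v in others))
        raise ValueError(f"stored vectors use other models: {names}")
```

Running this check at startup, or before a bulk search, catches a half-finished re-embedding job before it silently degrades results.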

Step 3: Storage Setup

For this scale (2,500 chunks), pgvector in Supabase is the obvious choice, since it is already part of the stack.

Schema Design

The table needs: vector column (with HNSW index), text storage, and metadata columns (with B-tree indexes for filtering).
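A possible table definition, kept as SQL strings inside Python so it can be applied from a setup script. The table name `article_chunks` and the column names are illustrative assumptions; `apply_statements` works with any DB-API style connection (for example one from psycopg).

```python
SCHEMA_STATEMENTS = [
    # pgvector ships as a Postgres extension.
    "CREATE EXTENSION IF NOT EXISTS vector",
    """
    CREATE TABLE IF NOT EXISTS article_chunks (
        chunk_id        text PRIMARY KEY,
        article_id      text NOT NULL,
        title           text NOT NULL,
        chunk_text      text NOT NULL,
        category        text NOT NULL,
        product         text NOT NULL,
        updated_at      date NOT NULL,
        embedding_model text NOT NULL,
        embedding       vector(1536) NOT NULL
    )
    """,
]


def apply_statements(conn, statements):
    """Run each statement on a DB-API connection, then commit."""
    cur = conn.cursor()
    try:
        for statement in statements:
            cur.execute(statement)
        conn.commit()
    finally:
        cur.close()
```

The `embedding_model` column is there so that every row records which model produced its vector.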

Index Tuning

For 2,500 vectors, a flat (brute-force) scan takes on the order of a millisecond, so you could skip the vector index entirely. As good practice, though:

HNSW index with m=16, ef_construction=64 (conservative, fast build)

B-tree indexes on category, product, updated_at for filtering

At this scale, index build takes seconds. At 1M+ vectors, it takes minutes to hours.
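The index settings above translate into statements like the ones this sketch generates. The table name `article_chunks` and the index naming scheme are assumptions; `hnsw`, `vector_cosine_ops`, `m`, and `ef_construction` are pgvector's own names.

```python
def hnsw_index_sql(table, column, m=16, ef_construction=64):
    """Build a pgvector HNSW index statement for cosine distance.

    Identifiers are interpolated directly, so pass only trusted names.
    """
    return (
        f"CREATE INDEX IF NOT EXISTS {table}_{column}_hnsw "
        f"ON {table} USING hnsw ({column} vector_cosine_ops) "
        f"WITH (m = {m}, ef_construction = {ef_construction})"
    )


def btree_index_sql(table, column):
    """Build a plain B-tree index statement for a filter column."""
    return f"CREATE INDEX IF NOT EXISTS {table}_{column}_idx ON {table} ({column})"


INDEX_STATEMENTS = [hnsw_index_sql("article_chunks", "embedding")] + [
    btree_index_sql("article_chunks", col)
    for col in ("category", "product", "updated_at")
]
```

The operator class has to match the distance operator used at query time; `vector_cosine_ops` pairs with cosine distance.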

Step 4: Search Implementation

Query Pipeline

Receive query — "How do I export my data?"

Embed query — Same model as documents (critical!)

Search with filters — Vector similarity + optional metadata filters

Re-rank (optional at this scale) — Cross-encoder for top results

Return results — Chunk text + article title + similarity score
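The "search with filters" step might look like the query builder below. It is a sketch under assumptions: a table named `article_chunks` with a `chunk_text` column, `%s` placeholders as used by psycopg, and pgvector's `<=>` cosine distance operator (similarity is computed as 1 minus distance).

```python
def build_search_query(query_vector, category=None, product=None,
                       updated_after=None, limit=20):
    """Return (sql, params) for a filtered cosine-similarity search."""
    # pgvector accepts a '[x,y,...]' text literal cast to vector.
    vector_literal = "[" + ",".join(str(x) for x in query_vector) + "]"

    where, filter_params = [], []
    if category is not None:
        where.append("category = %s")
        filter_params.append(category)
    if product is not None:
        where.append("product = %s")
        filter_params.append(product)
    if updated_after is not None:
        where.append("updated_at >= %s")
        filter_params.append(updated_after)
    where_sql = ("WHERE " + " AND ".join(where)) if where else ""

    sql = (
        "SELECT chunk_id, title, chunk_text, "
        "1 - (embedding <=> %s::vector) AS similarity "
        "FROM article_chunks "
        f"{where_sql} "
        "ORDER BY embedding <=> %s::vector "
        "LIMIT %s"
    )
    # Parameter order follows placeholder order in the SQL text.
    params = [vector_literal, *filter_params, vector_literal, limit]
    return sql, params
```

Filter values always travel as parameters, never as interpolated strings, so agent-supplied filters cannot inject SQL.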

Hybrid Search Setup

Even at small scale, hybrid search is worth implementing:

Vector search: pgvector cosine similarity → top 20 candidates

Full-text search: PostgreSQL tsvector/tsquery → top 20 candidates

RRF fusion: Merge with reciprocal rank fusion → top 5 results

This catches exact-match queries ("ERR_EXPORT_FAILED") that pure vector search might miss.
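Reciprocal rank fusion itself is only a few lines. In this sketch each input is a list of chunk IDs ordered best-first, and every ID scores the sum of 1 / (k + rank) over the lists it appears in. The constant k = 60 is a commonly used default and an assumption here, since the chapter does not specify one.

```python
def rrf_fuse(rankings, k=60, top_n=5):
    """Merge ranked ID lists with reciprocal rank fusion.

    rankings: iterable of lists, each ordered best-first.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for stable output.
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_n]
```

For example, `rrf_fuse([vector_ids, fulltext_ids])` rewards chunks that appear in both lists, while still letting a top exact-match hit from the full-text list through.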

Result Enrichment

Don't just return the matching chunk — provide context:

Article title and link

Surrounding chunks from the same article (context expansion)

Similarity score (helps agents gauge confidence)

Highlighted passage (the specific chunk that matched)
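Context expansion can be driven by the chunk ID alone if IDs follow the "article-42-chunk-3" pattern. The helper below is a hypothetical sketch that assumes zero-based, consecutive chunk indices within an article.

```python
def neighbor_chunk_ids(chunk_id, chunks_in_article, window=1):
    """Return IDs of the matched chunk plus up to `window` neighbors on each side."""
    article_id, _, index_str = chunk_id.rpartition("-chunk-")
    index = int(index_str)
    first = max(0, index - window)
    last = min(chunks_in_article - 1, index + window)
    return [f"{article_id}-chunk-{i}" for i in range(first, last + 1)]
```

Fetching these IDs and joining their text in order gives the agent the paragraph before and after the highlighted passage.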

Step 5: Evaluation

Building Your Eval Set

Create 50 pairs, each consisting of a query and its expected result(s):

Query	Expected Article(s)	Type
"how to reset password"	Password Reset Guide	Direct match
"account locked after too many attempts"	Password Reset Guide, Account Security	Semantic match
"ERR_EXPORT_FAILED"	Data Export Troubleshooting	Exact match
"cancel my subscription"	Billing FAQ, Account Deletion	Intent match
"GDPR data request"	Privacy Policy, Data Export	Domain-specific

Metrics to Track

Metric	Your Score	Target	Action if Below
Recall@5	?	> 0.90	Add hybrid search or re-ranking
MRR	?	> 0.80	Improve chunking or add title embeddings
Latency P95	?	< 100ms	Add or tune HNSW index
Filter accuracy	?	1.00	Check metadata extraction pipeline
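Recall@5 and MRR can be computed with a short harness. This sketch assumes the eval set is a list of (query, expected article titles) pairs and that `search` is a hypothetical function returning article titles ordered best-first, with duplicates from the same article already collapsed. Recall here is the fraction of expected articles found in the top k.

```python
from statistics import mean


def recall_at_k(retrieved, expected, k=5):
    """Fraction of expected articles that appear in the top k results."""
    expected = set(expected)
    if not expected:
        return 0.0
    return len(set(retrieved[:k]) & expected) / len(expected)


def reciprocal_rank(retrieved, expected):
    """1 / rank of the first relevant result, or 0.0 if none is found."""
    for rank, item in enumerate(retrieved, start=1):
        if item in expected:
            return 1.0 / rank
    return 0.0


def evaluate(eval_set, search, k=5):
    """Average both metrics over a non-empty eval set."""
    recalls, ranks = [], []
    for query, expected in eval_set:
        retrieved = search(query)
        recalls.append(recall_at_k(retrieved, expected, k))
        ranks.append(reciprocal_rank(retrieved, expected))
    return {"recall_at_k": mean(recalls), "mrr": mean(ranks)}
```

Re-run the same harness after every pipeline change so that score movements can be attributed to a single decision.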

Common Issues and Fixes

Problem	Symptom	Fix
Chunks too small	Results lack context	Increase chunk size to 400-500 tokens
Chunks too large	Wrong passages match	Decrease chunk size, add title embedding
Missing keywords	Exact queries fail	Add hybrid search (keyword/full-text)
Redundant results	Top 5 are from same article	Apply MMR (λ=0.7)
Stale results	Outdated articles rank high	Add recency boost or filter
Cross-topic matches	"billing" matches "building"	Try a larger embedding model
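The MMR fix for redundant results can be sketched in pure Python. Maximal marginal relevance picks results one at a time, scoring each candidate as λ × (similarity to the query) minus (1 − λ) × (highest similarity to anything already selected). The function names and the dict-of-vectors input are assumptions for illustration.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr(query_vec, candidates, lam=0.7, top_n=5):
    """Select up to top_n candidate IDs by maximal marginal relevance.

    candidates: dict of chunk_id -> embedding vector
    lam:        1.0 means pure relevance; lower values favor diversity
    """
    remaining = dict(candidates)
    selected = []
    while remaining and len(selected) < top_n:
        def score(cid):
            relevance = cosine(query_vec, remaining[cid])
            redundancy = max(
                (cosine(remaining[cid], candidates[s]) for s in selected),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        del remaining[best]
    return selected
```

Applied to the top 20 fused candidates, this keeps the best match but pushes near-duplicates of it below chunks that add something new.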

The Complete Architecture

[Diagram: the complete pipeline architecture]

Decision Memo Template

After completing this pipeline, document your decisions:

Embedding model: OpenAI text-embedding-3-small (1536 dims)

Why: Best quality/cost at this scale, English-only requirement met

Alternative considered: nomic-embed-text (free and open-weight, but you have to host and run it yourself)

Vector database: pgvector (Supabase)

Why: Already in stack, 2,500 vectors is trivial, ACID consistency

Migrate when: > 5M vectors or need > 1,000 QPS

Search strategy: Hybrid (vector + PostgreSQL full-text) with RRF

Why: Catches both semantic and exact-match queries

Re-ranking skipped: Not needed at this scale

Chunking: Recursive, 300 tokens, 50 overlap

Why: Balanced context and precision for help articles

Would change for: Long technical docs (increase to 500)

Key Takeaways

A complete vector pipeline has 5 stages: ingest → embed → store → search → evaluate

Start with pgvector and hybrid search — optimize only when you have evidence

Always use the same embedding model for documents and queries

Build an evaluation set of 50+ queries before optimizing

Document decisions in a decision memo for your future self

The pipeline is never "done" — articles change, models improve, queries evolve

This is chapter 6 of Vector Databases & Embeddings.