
Your First Vector Pipeline

End-to-End: Ingest, Embed, Store, Search, Evaluate

The Scenario

You're building a customer support knowledge base for a SaaS company. The system needs to:

  • Ingest 500 help articles (varying lengths, with categories and dates)
  • Let support agents search by natural language queries
  • Return relevant articles with highlighted passages
  • Handle filters by product area and recency

This module walks through every decision in the pipeline.

    Step 1: Document Ingestion


    Key Decisions

    Text extraction: HTML articles need boilerplate removal (nav, footer, sidebar). Keep headings as context markers — they help embeddings understand the section topic.
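
As a minimal sketch of the boilerplate removal described above, the following uses only Python's standard `html.parser`. The set of tags treated as boilerplate and the `# ` prefix used to mark headings are assumptions for illustration; real help-center HTML may wrap the sidebar in a `div` with a class name instead, which this sketch does not handle.

```python
from html.parser import HTMLParser

# Assumption: boilerplate lives in these containers.
SKIP_TAGS = {"nav", "footer", "aside", "script", "style"}
HEADING_TAGS = {"h1", "h2", "h3"}


class ArticleTextExtractor(HTMLParser):
    """Collects visible text, dropping boilerplate and marking headings."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0      # > 0 while inside a boilerplate container
        self.in_heading = False
        self.lines = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in HEADING_TAGS:
            self.in_heading = True

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag in HEADING_TAGS:
            self.in_heading = False

    def handle_data(self, data):
        text = data.strip()
        if not text or self.skip_depth > 0:
            return
        # Keep headings as context markers for the chunker.
        self.lines.append(f"# {text}" if self.in_heading else text)


def extract_text(html: str) -> str:
    parser = ArticleTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)
```

Inline tags such as `<b>` split a paragraph across several output lines in this sketch, so a production version would likely buffer text per block element.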

    Metadata extraction: Pull structured fields (category, product area, last_updated) into separate metadata. These become filter fields, not part of the embedded text.

    Chunking choice: For help articles averaging 1,500 tokens each:

  • Strategy: Recursive (try heading → paragraph → sentence splits)
  • Target size: 300 tokens (enough context, not too diluted)
  • Overlap: 50 tokens (captures cross-paragraph concepts)
  • Result: ~5 chunks per article × 500 articles = ~2,500 chunks
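
The recursive strategy above can be sketched roughly as follows. Token counts are approximated by whitespace-separated words, which is an assumption; a real pipeline would count with the embedding model's tokenizer. The function names and the split patterns are illustrative, not part of any library.

```python
import re

# Coarse-to-fine split points: heading, blank line (paragraph), sentence end.
SPLIT_PATTERNS = [r"\n(?=#+ )", r"\n\s*\n", r"(?<=[.!?])\s+"]


def count_tokens(text: str) -> int:
    # Assumption: one whitespace-separated word is roughly one token.
    return len(text.split())


def split_recursive(text, max_tokens, patterns=SPLIT_PATTERNS):
    """Split text into pieces of at most max_tokens, coarsest separator first."""
    if count_tokens(text) <= max_tokens:
        return [text.strip()] if text.strip() else []
    if not patterns:
        # No separators left: fall back to a hard split on word boundaries.
        words = text.split()
        return [" ".join(words[i:i + max_tokens])
                for i in range(0, len(words), max_tokens)]
    pieces = []
    for part in re.split(patterns[0], text):
        pieces.extend(split_recursive(part, max_tokens, patterns[1:]))
    return pieces


def merge_with_overlap(pieces, max_tokens=300, overlap=50):
    """Greedily pack pieces into chunks; each new chunk repeats the
    last `overlap` words of the previous one."""
    chunks, current = [], []
    for piece in pieces:
        words = piece.split()
        if current and len(current) + len(words) > max_tokens:
            chunks.append(" ".join(current))
            current = current[-overlap:] if overlap else []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks


def chunk_text(text, max_tokens=300, overlap=50):
    # Pieces are capped at max_tokens - overlap so that a carried-over
    # overlap plus one piece never exceeds max_tokens.
    pieces = split_recursive(text, max_tokens - overlap)
    return merge_with_overlap(pieces, max_tokens, overlap)
```

With a 50-token overlap, each chunk after the first contributes only about 250 new tokens, so a 1,500-token article tends to produce closer to 6 chunks than 5 under these assumptions.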

    What to Store per Chunk

    Field      | Purpose                   | Example
    -----------+---------------------------+----------------------------
    chunk_id   | Unique identifier         | "article-42-chunk-3"
    embedding  | Vector (1536-dim)         | [0.02, -0.15, ...]
    text       | Original chunk text       | "To reset your password..."
    article_id | Parent document reference | "article-42"
    title      | Article title for display | "Password Reset Guide"
    category   | Filter field              | "account-management"
    product    | Filter field              | "web-app"
    updated_at | Recency filter            | "2026-04-15"
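
One way to hold these fields in code is a small dataclass. This is a sketch: the class name, the `make_chunk_id` helper, and the length check are illustrative assumptions; only the field names and the ID format come from the table above.

```python
from dataclasses import dataclass
from datetime import date

EMBEDDING_DIMS = 1536


def make_chunk_id(article_id: str, index: int) -> str:
    # Follows the "article-42-chunk-3" pattern shown in the table.
    return f"{article_id}-chunk-{index}"


@dataclass
class ChunkRecord:
    chunk_id: str
    embedding: list[float]
    text: str
    article_id: str
    title: str
    category: str      # filter field
    product: str       # filter field
    updated_at: date   # recency filter

    def __post_init__(self):
        # Catch dimension mismatches before they reach the database.
        if len(self.embedding) != EMBEDDING_DIMS:
            raise ValueError(
                f"expected {EMBEDDING_DIMS} dims, got {len(self.embedding)}"
            )
```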

    Step 2: Embedding Generation

    Model Selection for This Use Case

    Applying the decision framework from Module 5:

    Factor         | Requirement                     | Choice
    ---------------+---------------------------------+------------------------
    Languages      | English only                    | No multilingual needed
    Quality        | High (support accuracy matters) | Top-tier model
    Scale          | 2,500 chunks (small)            | Cost isn't a concern
    Latency        | Sub-100ms query time            | Any model works
    Infrastructure | Already using Supabase          | pgvector available

    Decision: OpenAI text-embedding-3-small (1536 dims). It offers strong retrieval quality, negligible cost at this scale (under $0.05 to embed all 2,500 chunks), and broad tooling support.

    Embedding Pipeline Details

    Batching: Embed in batches of 100 (the per-request limit varies by provider). Sending one chunk per request means roughly 100 times as many round trips, and the run is correspondingly slower.

    Error handling: API calls can fail. Implement retries with exponential backoff. Track which chunks succeeded so you can resume.
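
The batching, retry, and resume behavior described above might look like the sketch below. `embed_fn` is a hypothetical callable that takes a list of texts and returns one vector per text (for example, a thin wrapper around your provider's client); the delays, attempt count, and the broad `except Exception` are illustrative assumptions.

```python
import random
import time


def batched(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_with_retry(embed_fn, texts, max_attempts=5, base_delay=1.0,
                     sleep=time.sleep):
    """Call embed_fn, retrying with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return embed_fn(texts)
        except Exception:
            # In real code, catch only the provider's transient errors
            # (rate limits, timeouts) rather than every exception.
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt + random.uniform(0, 0.5))


def embed_all(chunks, embed_fn, done, batch_size=100, sleep=time.sleep):
    """Embed every chunk not already in `done`.

    chunks: dict of chunk_id -> text
    done:   dict of chunk_id -> vector, updated after each batch so an
            interrupted run can resume from what already succeeded.
    """
    pending = [cid for cid in chunks if cid not in done]
    for ids in batched(pending, batch_size):
        vectors = embed_with_retry(embed_fn, [chunks[c] for c in ids],
                                   sleep=sleep)
        done.update(zip(ids, vectors))
    return done
```

Persisting `done` between runs (to a file or the database) is what makes the resume step useful; this sketch keeps it in memory.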

    Versioning: Store the model name and version alongside vectors. When you upgrade models, you'll need to re-embed everything — knowing which model generated which vectors prevents mixing incompatible embeddings.

    Step 3: Storage Setup

    For this scale (2,500 chunks), pgvector in Supabase is the obvious choice because it is already part of the stack.

    Schema Design

    The table needs: vector column (with HNSW index), text storage, and metadata columns (with B-tree indexes for filtering).

    Index Tuning

    For 2,500 vectors, a flat (brute-force) scan takes on the order of a millisecond, so you could skip the vector index entirely. For good practice, create the indexes anyway:

  • HNSW index with m=16, ef_construction=64 (conservative, fast build)
  • B-tree indexes on category, product, updated_at for filtering

    At this scale, index build takes seconds. At 1M+ vectors, it takes minutes to hours.
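
The schema and index settings above could be expressed as the following statements, kept here as Python strings so they can be executed one by one through any PostgreSQL driver. The table name `help_chunks`, the index names, and the `embedding_model` column (added for the versioning advice in Step 2) are assumptions; the HNSW parameters are the ones listed above.

```python
TABLE = "help_chunks"   # hypothetical table name
EMBEDDING_DIMS = 1536

SCHEMA_STATEMENTS = [
    # pgvector ships as the "vector" extension.
    "CREATE EXTENSION IF NOT EXISTS vector",
    f"""CREATE TABLE IF NOT EXISTS {TABLE} (
    chunk_id        text PRIMARY KEY,
    article_id      text NOT NULL,
    title           text NOT NULL,
    category        text NOT NULL,
    product         text NOT NULL,
    updated_at      date NOT NULL,
    text            text NOT NULL,
    embedding_model text NOT NULL,
    embedding       vector({EMBEDDING_DIMS}) NOT NULL
)""",
    # HNSW index for cosine distance with conservative build settings.
    f"CREATE INDEX IF NOT EXISTS {TABLE}_embedding_hnsw ON {TABLE} "
    "USING hnsw (embedding vector_cosine_ops) "
    "WITH (m = 16, ef_construction = 64)",
    # B-tree indexes on the filter columns.
    f"CREATE INDEX IF NOT EXISTS {TABLE}_category_idx ON {TABLE} (category)",
    f"CREATE INDEX IF NOT EXISTS {TABLE}_product_idx ON {TABLE} (product)",
    f"CREATE INDEX IF NOT EXISTS {TABLE}_updated_at_idx ON {TABLE} (updated_at)",
]
```

The operator class (`vector_cosine_ops`) should match the distance operator used at query time, here cosine distance.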

    Step 4: Search Implementation

    Query Pipeline

  • Receive query — "How do I export my data?"
  • Embed query — Same model as documents (critical!)
  • Search with filters — Vector similarity + optional metadata filters
  • Re-rank (optional at this scale) — Cross-encoder for top results
  • Return results — Chunk text + article title + similarity score
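
The "search with filters" step can be sketched as a parameterized query builder. The table name `help_chunks` and the function names are hypothetical; `<=>` is pgvector's cosine distance operator, and the `%(name)s` placeholders follow the named-parameter style used by psycopg. The caller is assumed to add the embedded query under the `query_vec` key.

```python
TABLE = "help_chunks"  # hypothetical table name


def to_vector_literal(vec):
    """Format a Python list as a pgvector text literal, e.g. "[0.5,-1.0]"."""
    return "[" + ",".join(str(x) for x in vec) + "]"


def build_search_sql(category=None, product=None, updated_after=None, limit=20):
    """Return (sql, params) for a filtered cosine-similarity search."""
    filters = []
    params = {"limit": limit}
    if category is not None:
        filters.append("category = %(category)s")
        params["category"] = category
    if product is not None:
        filters.append("product = %(product)s")
        params["product"] = product
    if updated_after is not None:
        filters.append("updated_at >= %(updated_after)s")
        params["updated_after"] = updated_after
    where_clause = ("WHERE " + " AND ".join(filters)) if filters else ""
    sql = (
        "SELECT chunk_id, title, text, "
        # Cosine similarity = 1 - cosine distance.
        "1 - (embedding <=> %(query_vec)s::vector) AS similarity "
        f"FROM {TABLE} {where_clause} "
        "ORDER BY embedding <=> %(query_vec)s::vector "
        "LIMIT %(limit)s"
    )
    return sql, params
```

A caller would then set `params["query_vec"] = to_vector_literal(query_embedding)` and pass both values to the driver's `execute` call.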

    Hybrid Search Setup

    Even at small scale, hybrid search is worth implementing:

  • Vector search: pgvector cosine similarity → top 20 candidates
  • Full-text search: PostgreSQL tsvector/tsquery → top 20 candidates
  • RRF fusion: Merge with reciprocal rank fusion → top 5 results

    This catches exact-match queries ("ERR_EXPORT_FAILED") that pure vector search might miss.
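
Reciprocal rank fusion itself is only a few lines. In this sketch each input is a list of IDs ordered best-first, and every ID scores the sum of 1 / (k + rank) over the lists it appears in. The constant k = 60 is a commonly used default and an assumption here, since the text above does not specify one.

```python
def rrf_fuse(rankings, k=60, top_n=5):
    """Merge several best-first rankings with reciprocal rank fusion.

    rankings: list of lists of IDs, e.g. [vector_hits, fulltext_hits]
    Returns the top_n (id, score) pairs, highest score first.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    fused = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return fused[:top_n]
```

Because RRF uses only ranks, the cosine similarities and full-text scores never need to be put on a common scale.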

    Result Enrichment

    Don't just return the matching chunk — provide context:

  • Article title and link
  • Surrounding chunks from the same article (context expansion)
  • Similarity score (helps agents gauge confidence)
  • Highlighted passage (the specific chunk that matched)
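
Context expansion can be sketched by deriving neighbor IDs from the chunk ID format used in Step 1. This assumes chunk indexes are consecutive integers starting at 0 and that `chunks_by_id` is an in-memory lookup of stored rows; both are illustrative assumptions. The article link is omitted because the source does not define a URL scheme.

```python
def neighbor_chunk_ids(chunk_id, window=1):
    """IDs of the chunk and up to `window` chunks on either side."""
    article_id, _, index_str = chunk_id.rpartition("-chunk-")
    index = int(index_str)
    first = max(0, index - window)
    return [f"{article_id}-chunk-{i}" for i in range(first, index + window + 1)]


def enrich(hit, chunks_by_id, window=1):
    """Build the result shown to the agent from a raw search hit.

    hit:          dict with "chunk_id" and "similarity"
    chunks_by_id: dict of chunk_id -> {"title": ..., "text": ...}
    """
    matched = chunks_by_id[hit["chunk_id"]]
    # Skip neighbors that do not exist (past the end of the article).
    ids = [cid for cid in neighbor_chunk_ids(hit["chunk_id"], window)
           if cid in chunks_by_id]
    return {
        "title": matched["title"],
        "score": round(hit["similarity"], 3),
        "highlight": matched["text"],
        "context": " ".join(chunks_by_id[cid]["text"] for cid in ids),
    }
```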

    Step 5: Evaluation

    Building Your Eval Set

    Create 50 pairs of query and expected result, covering different match types:

    Query                                    | Expected Article(s)                    | Type
    -----------------------------------------+----------------------------------------+----------------
    "how to reset password"                  | Password Reset Guide                   | Direct match
    "account locked after too many attempts" | Password Reset Guide, Account Security | Semantic match
    "ERR_EXPORT_FAILED"                      | Data Export Troubleshooting            | Exact match
    "cancel my subscription"                 | Billing FAQ, Account Deletion          | Intent match
    "GDPR data request"                      | Privacy Policy, Data Export            | Domain-specific

    Metrics to Track

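    Metric          | Your Score | Target  | Action if Below
    ----------------+------------+---------+-----------------------------------------
    Recall@5        | ?          | > 0.90  | Add hybrid search or re-ranking
    MRR             | ?          | > 0.80  | Improve chunking or add title embeddings
    Latency P95     | ?          | < 100ms | Add or tune HNSW index
    Filter accuracy | ?          | 1.00    | Check metadata extraction pipeline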
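
A minimal sketch of how Recall@5, MRR, and P95 latency could be computed over the eval set. It assumes `search_fn` is a hypothetical callable returning article titles in ranked order (deduplicated, since several chunks can come from one article), and it uses the nearest-rank definition of a percentile.

```python
def recall_at_k(retrieved, relevant, k=5):
    """Fraction of the relevant articles that appear in the top k."""
    top = set(retrieved[:k])
    return len(top & set(relevant)) / len(relevant)


def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant result, or 0.0 if none is found."""
    for rank, item in enumerate(retrieved, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(eval_set, search_fn, k=5):
    """eval_set: list of (query, set_of_expected_articles) pairs."""
    recalls, rranks = [], []
    for query, relevant in eval_set:
        retrieved = search_fn(query)
        recalls.append(recall_at_k(retrieved, relevant, k))
        rranks.append(reciprocal_rank(retrieved, relevant))
    n = len(eval_set)
    return {"recall@k": sum(recalls) / n, "mrr": sum(rranks) / n}


def p95(latencies_ms):
    """95th percentile by the nearest-rank method."""
    ordered = sorted(latencies_ms)
    # Integer ceiling of 0.95 * n avoids floating-point rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]
```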

    Common Issues and Fixes

    Problem             | Symptom                      | Fix
    --------------------+------------------------------+-----------------------------------------------
    Chunks too small    | Results lack context         | Increase chunk size to 400-500 tokens
    Chunks too large    | Wrong passages match         | Decrease chunk size, add title embedding
    Missing keywords    | Exact queries fail           | Add hybrid search (keyword or full-text match)
    Redundant results   | Top 5 are from same article  | Apply MMR (λ=0.7)
    Stale results       | Outdated articles rank high  | Add recency boost or filter
    Cross-topic matches | "billing" matches "building" | Try a larger embedding model
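
For the redundant-results fix, maximal marginal relevance (MMR) can be sketched as below. Each step picks the candidate maximizing λ × (similarity to the query) minus (1 − λ) × (highest similarity to anything already selected). The function names and the in-memory dict of candidate vectors are assumptions for illustration.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query_vec, candidates, lam=0.7, top_n=5):
    """Select up to top_n candidate IDs, trading relevance for diversity.

    candidates: dict of chunk_id -> embedding vector
    lam:        1.0 means pure relevance; lower values penalize redundancy more.
    """
    selected = []
    remaining = dict(candidates)

    def mmr_score(cid):
        relevance = cosine(query_vec, remaining[cid])
        # Similarity to the closest already-selected result (0.0 if none yet).
        redundancy = max(
            (cosine(remaining[cid], candidates[s]) for s in selected),
            default=0.0,
        )
        return lam * relevance - (1 - lam) * redundancy

    while remaining and len(selected) < top_n:
        best = max(remaining, key=mmr_score)
        selected.append(best)
        del remaining[best]
    return selected
```

In the pipeline above, the candidates would be the top 20 hits from the vector search, with MMR choosing the final 5.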

    The Complete Architecture

    [Diagram: complete pipeline architecture]

    Decision Memo Template

    After completing this pipeline, document your decisions:

    Embedding model: OpenAI text-embedding-3-small (1536 dims)

  • Why: Best quality/cost at this scale, English-only requirement met
  • Alternative considered: nomic-embed-text (free, but you have to host it yourself)

    Vector database: pgvector (Supabase)

  • Why: Already in stack, 2,500 vectors is trivial, ACID consistency
  • Migrate when: > 5M vectors or need > 1,000 QPS

    Search strategy: Hybrid (vector + PostgreSQL full-text search) with RRF

  • Why: Catches both semantic and exact-match queries
  • Re-ranking skipped: Not needed at this scale

    Chunking: Recursive, 300 tokens, 50 overlap

  • Why: Balanced context and precision for help articles
  • Would change for: Long technical docs (increase to 500)

    Key Takeaways

  • A complete vector pipeline has 5 stages: ingest → embed → store → search → evaluate
  • Start with pgvector and hybrid search — optimize only when you have evidence
  • Always use the same embedding model for documents and queries
  • Build an evaluation set of 50+ queries before optimizing
  • Document decisions in a decision memo for your future self
  • The pipeline is never "done" — articles change, models improve, queries evolve

    This is chapter 6 of Vector Databases & Embeddings.
