
Retrieval System

Hybrid Search

Why Retrieval Matters Most

Retrieval is the R in RAG, and it's where most systems succeed or fail. A brilliant AI model with bad retrieval will confidently generate wrong answers. A mediocre model with great retrieval will give useful, grounded responses.

The goal: given a user's question, find the 5-10 most relevant chunks from the vector store — fast, accurately, and with enough context for the AI to synthesize a good answer.

Key Concepts

Semantic Search

Embed the user's query using the same model that embedded the chunks, then find the closest vectors using cosine distance:

User: "What are Acme Corp's main concerns about our product?"

→ Embed query → [0.31, -0.08, 0.55, ...]
→ Find nearest chunks in pgvector
→ Returns: Acme call transcript chunks, support tickets, CRM notes

Semantic search understands meaning — it will match "concerns" with "worries," "issues," and "pain points" even though the words are different. But it can miss exact terms and acronyms.
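
As a sketch of what pgvector's `<=>` cosine-distance operator computes, here is a pure-Python version (the vectors are illustrative, not real embeddings):

```python
import math

def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity; 0 = identical direction, 2 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

query = [0.31, -0.08, 0.55]
chunk_close = [0.30, -0.10, 0.50]   # similar direction -> distance near 0
chunk_far = [-0.31, 0.08, -0.55]    # opposite direction -> distance near 2
```

In production the database computes this for you; the point is that only the *direction* of the vectors matters, not their magnitude.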

Keyword Search

Traditional text matching using trigram similarity or BM25 scoring. Catches what semantic search misses:

  • Exact product names — "DataSync Pro" vs "data synchronization product"
  • Acronyms — "MRR" won't semantically match "monthly recurring revenue"
  • Code/technical terms — exact string matching matters
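
In Postgres you would lean on pg_trgm or full-text search, but as a standalone illustration of how BM25 rewards exact term matches, here is a minimal scorer (the tokenized documents are made up):

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each doc (a list of tokens) against the query terms with classic BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter()                       # document frequency per term
    for d in docs:
        for term in set(d):
            df[term] += 1
    scores = []
    for d in docs:
        tf = Counter(d)                  # term frequency in this doc
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(score)
    return scores

docs = [
    "acme renewed their mrr contract".split(),
    "monthly recurring revenue grew last quarter".split(),
]
scores = bm25_scores(["mrr"], docs)
# Only the doc containing the literal token "mrr" gets a nonzero score
```

Note how the second document, despite meaning the same thing, scores zero: exactly the gap semantic search fills.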

Hybrid Search: Best of Both

Combine semantic and keyword results using Reciprocal Rank Fusion (RRF):

score(doc) = 1/(k + rank_semantic) + 1/(k + rank_keyword)

where k is a smoothing constant (typically 60). Documents that rank high in both lists get the best combined scores. This consistently outperforms either method alone.
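
The RRF formula can be implemented in a few lines; the chunk ids here are hypothetical:

```python
def rrf_merge(semantic_ids, keyword_ids, k=60):
    """Merge two ranked lists of chunk ids with Reciprocal Rank Fusion."""
    scores = {}
    for ranked in (semantic_ids, keyword_ids):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["c3", "c1", "c7"]   # ranked by cosine distance
keyword = ["c1", "c9", "c3"]    # ranked by BM25/trigram score
merged = rrf_merge(semantic, keyword)
# c1 and c3 appear in both lists, so they rise above single-list hits
```

Because only ranks are used, the two score scales (cosine distance vs BM25) never need to be normalized against each other, which is a big part of RRF's appeal.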

SQL Filters

The hidden superpower of using pgvector in PostgreSQL — you can filter by metadata before or during vector search:

SELECT * FROM chunks
WHERE source_type = 'transcript'
  AND account_name = 'Acme Corp'
  AND date > '2024-01-01'
ORDER BY embedding <=> query_embedding
LIMIT 10;

This turns a generic "search everything" into a precise "search Acme's recent call transcripts" — dramatically improving relevance.
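
A minimal sketch of composing such a filtered query with placeholder parameters, psycopg-style; the helper name and the column-to-operator mapping are illustrative assumptions, not a library API:

```python
def build_search_sql(filters):
    """Compose a filtered pgvector query with %s placeholders (illustrative helper).

    Column names follow the example above; the query embedding is bound as
    the final parameter at execute() time.
    """
    where = []
    params = []
    for column, value in filters.items():
        op = ">" if column == "date" else "="   # assumption: date filters are lower bounds
        where.append(f"{column} {op} %s")
        params.append(value)
    sql = "SELECT * FROM chunks"
    if where:
        sql += " WHERE " + " AND ".join(where)
    sql += " ORDER BY embedding <=> %s LIMIT 10"
    return sql, params

sql, params = build_search_sql({
    "source_type": "transcript",
    "account_name": "Acme Corp",
    "date": "2024-01-01",
})
```

Using placeholders rather than string interpolation keeps the account name and other user-derived values safely parameterized.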

Reranking

After retrieving candidates, rerank them using multiple signals:

  • Relevance — semantic similarity score (primary signal)
  • Recency — newer documents score higher (configurable decay)
  • Authority — some sources are more trustworthy (CRM > meeting notes)
  • Diversity — penalize multiple chunks from the same document

The reranking formula is tunable per use case. For pre-call briefings, recency matters a lot. For product comparisons, authority and diversity matter more.
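
One way to sketch such a reranker; the weights, the recency half-life, and the authority table are illustrative assumptions, not a prescribed formula:

```python
from datetime import date

AUTHORITY = {"crm": 1.0, "transcript": 0.8, "notes": 0.6}  # assumed trust weights

def rerank(candidates, today, half_life_days=90):
    """Combine relevance, recency, authority, and a per-document diversity penalty."""
    seen = {}
    scored = []
    # Walk candidates in relevance order so repeats of a document are penalized
    for c in sorted(candidates, key=lambda c: c["relevance"], reverse=True):
        age = (today - c["date"]).days
        recency = 0.5 ** (age / half_life_days)            # exponential decay
        authority = AUTHORITY.get(c["source_type"], 0.5)
        diversity = 0.5 ** seen.get(c["doc_id"], 0)        # halve each repeat
        seen[c["doc_id"]] = seen.get(c["doc_id"], 0) + 1
        score = c["relevance"] * (0.6 + 0.2 * recency + 0.2 * authority) * diversity
        scored.append((score, c))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored]

cands = [
    {"doc_id": "d1", "relevance": 0.90, "date": date(2024, 3, 15), "source_type": "crm"},
    {"doc_id": "d1", "relevance": 0.88, "date": date(2024, 3, 15), "source_type": "crm"},
    {"doc_id": "d2", "relevance": 0.70, "date": date(2024, 3, 1), "source_type": "transcript"},
]
ranked = rerank(cands, today=date(2024, 4, 1))
# The second chunk from d1 drops below d2 thanks to the diversity penalty
```

For a pre-call briefing you might raise the recency weight; for a product comparison, the authority and diversity terms.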

Context Window Assembly

The final step: take the top-ranked chunks and assemble them into a context block for the AI, with source attribution:

[Source: CRM - Acme Corp, 2024-03-15]
Deal stage: Negotiation. Main contact: Jane Smith, VP Engineering.
Revenue potential: $240K ARR. Key concern: integration timeline.

[Source: Call Transcript - Acme Q4 Review, 2024-02-28]
Jane mentioned they need the integration complete before their
Q2 board meeting. Budget is approved but timeline is the blocker.

[Source: Support Ticket #1847, 2024-03-01]
Acme's team reported latency issues with the current API connector...

Each chunk carries its source citation, so the AI can reference where its information came from — critical for enterprise trust.
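
Assembling that block can be sketched as follows; the field names and the character budget are assumptions:

```python
def assemble_context(chunks, max_chars=4000):
    """Join top-ranked chunks into one context block, each prefixed with its citation."""
    blocks = []
    used = 0
    for c in chunks:
        block = f"[Source: {c['source']}, {c['date']}]\n{c['text']}"
        if used + len(block) > max_chars:
            break                      # stop before blowing the token/char budget
        blocks.append(block)
        used += len(block)
    return "\n\n".join(blocks)

chunks = [
    {"source": "CRM - Acme Corp", "date": "2024-03-15",
     "text": "Deal stage: Negotiation. Key concern: integration timeline."},
    {"source": "Support Ticket #1847", "date": "2024-03-01",
     "text": "Acme's team reported latency issues with the API connector."},
]
context = assemble_context(chunks)
```

Because chunks arrive in rank order, truncation drops the least relevant material first.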

Architecture Pattern

Query ──→ Embed ──→ Semantic Search ──┐
  │                                    ├──→ RRF Merge ──→ Rerank ──→ Context Assembly
  └────→ Keyword Search ──────────────┘
             │
        SQL Filters (account, date, source_type)
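
The pipeline shape in the diagram, with every stage replaced by a canned stub so the flow is runnable end to end; all helpers here are hypothetical stand-ins:

```python
# Hypothetical stand-ins for the real stages
def embed(query):                  return [0.1, 0.2]          # same model as the chunks
def semantic_search(vec, flt):     return ["c1", "c2"]        # pgvector + SQL filters
def keyword_search(query, flt):    return ["c2", "c3"]        # trigram/BM25 + same filters

def rrf_merge(a, b, k=60):
    scores = {}
    for ranked in (a, b):
        for rank, doc in enumerate(ranked, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

def retrieve(query, filters=None):
    """Both search legs run on the same query and filters, then merge via RRF."""
    vec = embed(query)
    merged = rrf_merge(semantic_search(vec, filters), keyword_search(query, filters))
    return merged  # a real system would rerank, then assemble the context block

result = retrieve("Acme concerns")
```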

What You'll Build

  • Cosine similarity search on pgvector
  • Keyword matching with hybrid rank fusion
  • Relevance × recency × authority reranking
  • Context window assembly with source citations

Glossary

  • RAG: Retrieval Augmented Generation — ground AI responses in real data
  • Cosine distance: how far apart two vectors are (0 = identical, 2 = opposite)
  • BM25: classic keyword relevance scoring algorithm
  • RRF: Reciprocal Rank Fusion — combines ranked lists from different search methods
  • Reranking: re-scoring search results using multiple relevance signals
  • Context window: the text fed to the AI model alongside the user's question
  • Source attribution: citing which document each piece of information came from

This is chapter 3 of AI Sales Companion.

Get the full hands-on course for $100 and build the complete system. Your projects become your portfolio.
