
Retrieval System

Hybrid Search

Why Retrieval Matters Most

Retrieval is the R in RAG, and it's where most systems succeed or fail. A brilliant AI model with bad retrieval will confidently generate wrong answers. A mediocre model with great retrieval will give useful, grounded responses.

The goal: given a user's question, find the 5-10 most relevant chunks from the vector store — fast, accurately, and with enough context for the AI to synthesize a good answer.

Key Concepts

Semantic Search

Embed the user's query using the same model that embedded the chunks, then find the closest vectors using cosine distance:

User: "What are Acme Corp's main concerns about our product?"

→ Embed query → [0.31, -0.08, 0.55, ...]
→ Find nearest chunks in pgvector
→ Returns: Acme call transcript chunks, support tickets, CRM notes

Semantic search understands meaning — it will match "concerns" with "worries," "issues," and "pain points" even though the words are different. But it can miss exact terms and acronyms.
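
As a sketch of what pgvector's `<=>` cosine-distance operator computes, here is a pure-Python version (the vectors are illustrative, not real embeddings):

```python
import math

def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity; 0 = identical direction, 2 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

query = [0.31, -0.08, 0.55]
chunk_close = [0.30, -0.10, 0.50]   # similar direction -> distance near 0
chunk_far = [-0.31, 0.08, -0.55]    # opposite direction -> distance near 2
```

In production the database computes this for you; the point is that only the *direction* of the vectors matters, not their magnitude.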

Keyword Search

Traditional text matching using trigram similarity or BM25 scoring. Catches what semantic search misses:

  • Exact product names — "DataSync Pro" vs "data synchronization product"
  • Acronyms — "MRR" won't semantically match "monthly recurring revenue"
  • Code/technical terms — exact string matching matters
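
In Postgres you would lean on pg_trgm or full-text search, but as a standalone illustration of how BM25 rewards exact term matches, here is a minimal scorer (the tokenized documents are made up):

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each doc (a list of tokens) against the query terms with classic BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter()                       # document frequency per term
    for d in docs:
        for term in set(d):
            df[term] += 1
    scores = []
    for d in docs:
        tf = Counter(d)                  # term frequency in this doc
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(score)
    return scores

docs = [
    "acme renewed their mrr contract".split(),
    "monthly recurring revenue grew last quarter".split(),
]
scores = bm25_scores(["mrr"], docs)
# Only the doc containing the literal token "mrr" gets a nonzero score
```

Note how the second document, despite meaning the same thing, scores zero: exactly the gap semantic search fills.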

Hybrid Search: Best of Both

Combine semantic and keyword results using Reciprocal Rank Fusion (RRF):

score(doc) = 1/(k + rank_semantic) + 1/(k + rank_keyword)

where k is a smoothing constant (typically 60). Documents that rank high in both lists get the best combined scores. This consistently outperforms either method alone.
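
The RRF formula can be implemented in a few lines; the chunk ids here are hypothetical:

```python
def rrf_merge(semantic_ids, keyword_ids, k=60):
    """Merge two ranked lists of chunk ids with Reciprocal Rank Fusion."""
    scores = {}
    for ranked in (semantic_ids, keyword_ids):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["c3", "c1", "c7"]   # ranked by cosine distance
keyword = ["c1", "c9", "c3"]    # ranked by BM25/trigram score
merged = rrf_merge(semantic, keyword)
# c1 and c3 appear in both lists, so they rise above single-list hits
```

Because only ranks are used, the two score scales (cosine distance vs BM25) never need to be normalized against each other, which is a big part of RRF's appeal.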

SQL Filters

The hidden superpower of using pgvector in PostgreSQL — you can filter by metadata before or during vector search:

SELECT * FROM chunks
WHERE source_type = 'transcript'
  AND account_name = 'Acme Corp'
  AND date > '2024-01-01'
ORDER BY embedding <=> query_embedding
LIMIT 10;

This turns a generic "search everything" into a precise "search Acme's recent call transcripts" — dramatically improving relevance.
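
A minimal sketch of composing such a filtered query with placeholder parameters, psycopg-style; the helper name and the column-to-operator mapping are illustrative assumptions, not a library API:

```python
def build_search_sql(filters):
    """Compose a filtered pgvector query with %s placeholders (illustrative helper).

    Column names follow the example above; the query embedding is bound as
    the final parameter at execute() time.
    """
    where = []
    params = []
    for column, value in filters.items():
        op = ">" if column == "date" else "="   # assumption: date filters are lower bounds
        where.append(f"{column} {op} %s")
        params.append(value)
    sql = "SELECT * FROM chunks"
    if where:
        sql += " WHERE " + " AND ".join(where)
    sql += " ORDER BY embedding <=> %s LIMIT 10"
    return sql, params

sql, params = build_search_sql({
    "source_type": "transcript",
    "account_name": "Acme Corp",
    "date": "2024-01-01",
})
```

Using placeholders rather than string interpolation keeps the account name and other user-derived values safely parameterized.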

Reranking

After retrieving candidates, rerank them using multiple signals:

  • Relevance — semantic similarity score (primary signal)
  • Recency — newer documents score higher (configurable decay)
  • Authority — some sources are more trustworthy (CRM > meeting notes)
  • Diversity — penalize multiple chunks from the same document

The reranking formula is tunable per use case. For pre-call briefings, recency matters a lot. For product comparisons, authority and diversity matter more.
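
One way to sketch such a reranker; the weights, the recency half-life, and the authority table are illustrative assumptions, not a prescribed formula:

```python
from datetime import date

AUTHORITY = {"crm": 1.0, "transcript": 0.8, "notes": 0.6}  # assumed trust weights

def rerank(candidates, today, half_life_days=90):
    """Combine relevance, recency, authority, and a per-document diversity penalty."""
    seen = {}
    scored = []
    # Walk candidates in relevance order so repeats of a document are penalized
    for c in sorted(candidates, key=lambda c: c["relevance"], reverse=True):
        age = (today - c["date"]).days
        recency = 0.5 ** (age / half_life_days)            # exponential decay
        authority = AUTHORITY.get(c["source_type"], 0.5)
        diversity = 0.5 ** seen.get(c["doc_id"], 0)        # halve each repeat
        seen[c["doc_id"]] = seen.get(c["doc_id"], 0) + 1
        score = c["relevance"] * (0.6 + 0.2 * recency + 0.2 * authority) * diversity
        scored.append((score, c))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored]

cands = [
    {"doc_id": "d1", "relevance": 0.90, "date": date(2024, 3, 15), "source_type": "crm"},
    {"doc_id": "d1", "relevance": 0.88, "date": date(2024, 3, 15), "source_type": "crm"},
    {"doc_id": "d2", "relevance": 0.70, "date": date(2024, 3, 1), "source_type": "transcript"},
]
ranked = rerank(cands, today=date(2024, 4, 1))
# The second chunk from d1 drops below d2 thanks to the diversity penalty
```

For a pre-call briefing you might raise the recency weight; for a product comparison, the authority and diversity terms.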

Context Window Assembly

The final step: take the top-ranked chunks and assemble them into a context block for the AI, with source attribution:

[Source: CRM - Acme Corp, 2024-03-15]
Deal stage: Negotiation. Main contact: Jane Smith, VP Engineering.
Revenue potential: $240K ARR. Key concern: integration timeline.

[Source: Call Transcript - Acme Q4 Review, 2024-02-28]
Jane mentioned they need the integration complete before their
Q2 board meeting. Budget is approved but timeline is the blocker.

[Source: Support Ticket #1847, 2024-03-01]
Acme's team reported latency issues with the current API connector...

Each chunk carries its source citation, so the AI can reference where its information came from — critical for enterprise trust.
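
Assembling that block can be sketched as follows; the field names and the character budget are assumptions:

```python
def assemble_context(chunks, max_chars=4000):
    """Join top-ranked chunks into one context block, each prefixed with its citation."""
    blocks = []
    used = 0
    for c in chunks:
        block = f"[Source: {c['source']}, {c['date']}]\n{c['text']}"
        if used + len(block) > max_chars:
            break                      # stop before blowing the token/char budget
        blocks.append(block)
        used += len(block)
    return "\n\n".join(blocks)

chunks = [
    {"source": "CRM - Acme Corp", "date": "2024-03-15",
     "text": "Deal stage: Negotiation. Key concern: integration timeline."},
    {"source": "Support Ticket #1847", "date": "2024-03-01",
     "text": "Acme's team reported latency issues with the API connector."},
]
context = assemble_context(chunks)
```

Because chunks arrive in rank order, truncation drops the least relevant material first.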

Architecture Pattern

Query ──→ Embed ──→ Semantic Search ──┐
  │                                    ├──→ RRF Merge ──→ Rerank ──→ Context Assembly
  └────→ Keyword Search ──────────────┘
             │
        SQL Filters (account, date, source_type)
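
The pipeline shape in the diagram, with every stage replaced by a canned stub so the flow is runnable end to end; all helpers here are hypothetical stand-ins:

```python
# Hypothetical stand-ins for the real stages
def embed(query):                  return [0.1, 0.2]          # same model as the chunks
def semantic_search(vec, flt):     return ["c1", "c2"]        # pgvector + SQL filters
def keyword_search(query, flt):    return ["c2", "c3"]        # trigram/BM25 + same filters

def rrf_merge(a, b, k=60):
    scores = {}
    for ranked in (a, b):
        for rank, doc in enumerate(ranked, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

def retrieve(query, filters=None):
    """Both search legs run on the same query and filters, then merge via RRF."""
    vec = embed(query)
    merged = rrf_merge(semantic_search(vec, filters), keyword_search(query, filters))
    return merged  # a real system would rerank, then assemble the context block

result = retrieve("Acme concerns")
```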

What You'll Build

  • Cosine similarity search on pgvector
  • Keyword matching with hybrid rank fusion
  • Relevance × recency × authority reranking
  • Context window assembly with source citations

Glossary

  • RAG: Retrieval Augmented Generation — ground AI responses in real data
  • Cosine distance: how far apart two vectors are (0 = identical, 2 = opposite)
  • BM25: classic keyword relevance scoring algorithm
  • RRF: Reciprocal Rank Fusion — combines ranked lists from different search methods
  • Reranking: re-scoring search results using multiple relevance signals
  • Context window: the text fed to the AI model alongside the user's question
  • Source attribution: citing which document each piece of information came from

This is chapter 3 of AI Sales Companion.

Get the full hands-on course for $100 and build the complete system. Your projects become your portfolio.
