# Retrieval System: Hybrid Search
## Why Retrieval Matters Most
Retrieval is the R in RAG, and it's where most systems succeed or fail. A brilliant AI model with bad retrieval will confidently generate wrong answers. A mediocre model with great retrieval will give useful, grounded responses.
The goal: given a user's question, find the 5-10 most relevant chunks from the vector store — fast, accurately, and with enough context for the AI to synthesize a good answer.
## Key Concepts

### Semantic Search
Embed the user's query using the same model that embedded the chunks, then find the closest vectors using cosine distance:
```
User: "What are Acme Corp's main concerns about our product?"
→ Embed query → [0.31, -0.08, 0.55, ...]
→ Find nearest chunks in pgvector
→ Returns: Acme call transcript chunks, support tickets, CRM notes
```

Semantic search understands meaning — it will match "concerns" with "worries," "issues," and "pain points" even though the words are different. But it can miss exact terms and acronyms.
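The nearest-neighbor step can be sketched in plain Python, assuming the query has already been embedded with the same model as the chunks. The 3-dimensional vectors and chunk records below are toys, not real embeddings:

```python
import math

def cosine_distance(a, b):
    """Cosine distance: 0 = identical direction, 2 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

def semantic_search(query_vec, chunks, top_k=3):
    """Rank chunks by cosine distance to the query vector, closest first."""
    ranked = sorted(chunks, key=lambda c: cosine_distance(query_vec, c["embedding"]))
    return ranked[:top_k]

# Toy 3-dim embeddings; a real system embeds query and chunks with one model.
chunks = [
    {"text": "Acme raised integration-timeline concerns", "embedding": [0.3, -0.1, 0.5]},
    {"text": "Quarterly revenue report",                  "embedding": [-0.4, 0.8, 0.1]},
]
query_vec = [0.31, -0.08, 0.55]
print(semantic_search(query_vec, chunks, top_k=1)[0]["text"])
```

In production, pgvector's `<=>` operator does this ranking inside the database instead of in application code.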
### Keyword Search
Traditional text matching using trigram similarity or BM25 scoring. It catches what semantic search misses: exact product names, acronyms, ticket numbers, and other literal strings.
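To illustrate, trigram similarity (the idea behind PostgreSQL's pg_trgm extension) fits in a few lines. This toy version only approximates what the extension does:

```python
def trigrams(text):
    """Break a string into overlapping 3-character sequences."""
    padded = f"  {text.lower()} "  # pad the ends, as pg_trgm does
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

def trigram_similarity(a, b):
    """Jaccard similarity of the two trigram sets (0 = no overlap, 1 = identical)."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)

# Literal strings like acronyms match even when embeddings might miss them.
print(trigram_similarity("SOC2 audit", "SOC2 audit report"))
print(trigram_similarity("SOC2 audit", "pricing page"))
```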
### Hybrid Search: Best of Both
Combine semantic and keyword results using Reciprocal Rank Fusion (RRF):
```
score(doc) = 1/(k + rank_semantic) + 1/(k + rank_keyword)
```

Here k is a smoothing constant (typically 60). Documents that rank high in both lists get the best combined scores. This consistently outperforms either method alone.
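The merge itself is only a few lines of Python; the document IDs below are placeholders:

```python
def rrf_merge(semantic_ids, keyword_ids, k=60):
    """Fuse two ranked lists of doc IDs with Reciprocal Rank Fusion."""
    scores = {}
    for ranked in (semantic_ids, keyword_ids):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["acme_call", "acme_crm", "pricing"]
keyword  = ["acme_crm", "ticket_1847", "acme_call"]
print(rrf_merge(semantic, keyword))
# "acme_crm" wins: it ranks high in both lists.
```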
### SQL Filters
The hidden superpower of using pgvector in PostgreSQL — you can filter by metadata before or during vector search:
```sql
SELECT * FROM chunks
WHERE source_type = 'transcript'
  AND account_name = 'Acme Corp'
  AND date > '2024-01-01'
ORDER BY embedding <=> query_embedding
LIMIT 10;
```

This turns a generic "search everything" into a precise "search Acme's recent call transcripts" — dramatically improving relevance.
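In application code, the filter clauses might be assembled before the vector query runs. This builder is a hypothetical sketch (psycopg-style `%s` placeholders and an assumed `chunks` schema), not the course's exact implementation:

```python
def build_filtered_query(filters, limit=10):
    """Compose metadata filters ahead of a pgvector nearest-neighbor query.

    Column names must come from a fixed allowlist, never from user input,
    since they are interpolated into the SQL text.
    """
    clauses, params = [], []
    for column, op, value in filters:
        clauses.append(f"{column} {op} %s")
        params.append(value)
    where = " AND ".join(clauses) or "TRUE"
    sql = (
        f"SELECT * FROM chunks WHERE {where} "
        f"ORDER BY embedding <=> %s LIMIT {int(limit)}"
    )
    return sql, params  # caller appends the query embedding when executing

sql, params = build_filtered_query([
    ("source_type", "=", "transcript"),
    ("account_name", "=", "Acme Corp"),
    ("date", ">", "2024-01-01"),
])
print(sql)
```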
### Reranking
After retrieving candidates, rerank them using multiple signals: the original similarity score plus factors like recency, source authority, and result diversity.

The reranking formula is tunable per use case. For pre-call briefings, recency matters a lot. For product comparisons, authority and diversity matter more.
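A minimal reranker might look like the following; the weights, field names, and recency decay are illustrative assumptions, not a prescribed formula:

```python
from datetime import date

def rerank(candidates, weights, today=date(2024, 3, 20)):
    """Combine similarity with a recency signal; weights vary per use case."""
    def score(c):
        age_days = (today - c["date"]).days
        recency = 1.0 / (1.0 + age_days / 30.0)  # decays over a scale of months
        return weights["similarity"] * c["similarity"] + weights["recency"] * recency
    return sorted(candidates, key=score, reverse=True)

candidates = [
    {"id": "old_call", "similarity": 0.90, "date": date(2023, 6, 1)},
    {"id": "new_note", "similarity": 0.80, "date": date(2024, 3, 15)},
]
# Pre-call briefing profile: recency weighted heavily.
briefing = rerank(candidates, {"similarity": 0.5, "recency": 0.5})
print([c["id"] for c in briefing])
```

With similarity-only weights the older, more similar chunk would win instead, which is exactly the per-use-case tuning described above.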
### Context Window Assembly
The final step: take the top-ranked chunks and assemble them into a context block for the AI, with source attribution:
```
[Source: CRM - Acme Corp, 2024-03-15]
Deal stage: Negotiation. Main contact: Jane Smith, VP Engineering.
Revenue potential: $240K ARR. Key concern: integration timeline.

[Source: Call Transcript - Acme Q4 Review, 2024-02-28]
Jane mentioned they need the integration complete before their
Q2 board meeting. Budget is approved but timeline is the blocker.

[Source: Support Ticket #1847, 2024-03-01]
Acme's team reported latency issues with the current API connector...
```

Each chunk carries its source citation, so the AI can reference where its information came from — critical for enterprise trust.
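Assembly can be a simple formatting loop; the field names and character budget here are assumptions for the sketch:

```python
def assemble_context(chunks, max_chars=4000):
    """Join top-ranked chunks into a cited context block, within a size budget."""
    blocks, used = [], 0
    for c in chunks:
        block = f"[Source: {c['source']}, {c['date']}]\n{c['text']}"
        if used + len(block) > max_chars:
            break  # stop before overflowing the model's context budget
        blocks.append(block)
        used += len(block)
    return "\n\n".join(blocks)

chunks = [
    {"source": "CRM - Acme Corp", "date": "2024-03-15",
     "text": "Deal stage: Negotiation. Key concern: integration timeline."},
    {"source": "Support Ticket #1847", "date": "2024-03-01",
     "text": "Latency issues with the current API connector."},
]
print(assemble_context(chunks))
```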
## Architecture Pattern
```
Query ──→ Embed ──→ Semantic Search ──┐
  │                                   ├──→ RRF Merge ──→ Rerank ──→ Context Assembly
  └─────→ Keyword Search ─────────────┘
                 │
     SQL Filters (account, date, source_type)
```

## What You'll Build
## Glossary
| Term | Meaning |
|---|---|
| RAG | Retrieval Augmented Generation — ground AI responses in real data |
| Cosine distance | How far apart two vectors are (0 = identical, 2 = opposite) |
| BM25 | Classic keyword relevance scoring algorithm |
| RRF | Reciprocal Rank Fusion — combines ranked lists from different search methods |
| Reranking | Re-scoring search results using multiple relevance signals |
| Context window | The text fed to the AI model alongside the user's question |
| Source attribution | Citing which document each piece of information came from |
This is chapter 3 of AI Sales Companion.
Get the full hands-on course for $100 and build the complete system. Your projects become your portfolio.
View course details