Search & Retrieve
Finding the Right Context
Semantic Search vs Keyword Search
You now have chunks stored as vectors. Time to search them.
| Approach | How It Works | Strength | Weakness |
|---|---|---|---|
| Keyword | Match exact words (SQL `LIKE`, full-text search) | Fast, predictable | Misses synonyms ("PTO" vs "vacation") |
| Semantic | Compare embedding vectors (cosine similarity) | Understands meaning | Can return topically similar but wrong results |
| Hybrid | Combine both result lists, then rerank | Catches exact terms and paraphrases | More complex to implement |
For this course, we'll build semantic search first (it's the core of RAG), then you can add keyword search as an enhancement.
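If you do add keyword search later, you need a way to merge its ranked list with the semantic one. One common choice is Reciprocal Rank Fusion (RRF), which scores each chunk as the sum of 1 / (k + rank) over every list it appears in. This is a minimal sketch of RRF, not the course's implementation; it assumes each search returns chunk IDs ordered best-first, the chunk IDs are made up, and `k = 60` is a commonly used default rather than a tuned value (this `k` is unrelated to top-k).

```typescript
// Reciprocal Rank Fusion: merge several best-first rankings of chunk IDs.
// score(id) = sum over rankings of 1 / (k + rank), with rank starting at 1.
function reciprocalRankFusion(rankings: string[][], k: number = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first; ties keep first-seen order (stable sort).
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// Hypothetical results for "How much PTO do I get?"
const keywordHits = ["pto-policy", "holidays", "payroll"];
const semanticHits = ["vacation-policy", "pto-policy", "remote-work"];
console.log(reciprocalRankFusion([keywordHits, semanticHits]));
// order: pto-policy, vacation-policy, holidays, payroll, remote-work
```

Because RRF only looks at ranks, it sidesteps the fact that keyword scores and cosine similarities live on different scales. A chunk found by both searches, like `pto-policy` here, rises to the top.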
Query Embedding
The search flow is straightforward: embed the user's question with the same model used for chunks, then find the nearest vectors.
```
User: "Can I work from home on Fridays?"
        │
        ▼
┌────────────────┐
│ Embed question │ → [0.23, -0.45, 0.67, ...]
└───────┬────────┘
        │
        ▼
┌────────────────┐
│ Compare with   │ → Find nearest chunk vectors
│ all chunk      │
│ embeddings     │
└───────┬────────┘
        │
        ▼
Top 3 most similar chunks
```

Critical rule: you must use the same embedding model for queries and chunks. Vectors from different models live in incompatible spaces, so comparing them produces meaningless results.
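The "compare with all chunk embeddings" step can be sketched in plain TypeScript. This brute-force version assumes embeddings are plain `number[]` arrays; it scores every chunk with cosine similarity and keeps the top k. `Chunk` and `nearestChunks` are illustrative names, not course code. In practice the database does this work (as `match_chunks` does with pgvector); the sketch just makes the arithmetic visible.

```typescript
interface Chunk {
  id: string;
  embedding: number[];
}

// Cosine similarity = dot(a, b) / (|a| * |b|).
// pgvector's <=> operator returns cosine distance, i.e. 1 minus this value.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    // Catches models with different output sizes; two models with the
    // same dimension would still slip through, so this is only a guard.
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute force: score every chunk against the query, sort best-first, keep k.
function nearestChunks(queryEmbedding: number[], chunks: Chunk[], k: number = 3) {
  return chunks
    .map((chunk) => ({
      id: chunk.id,
      similarity: cosineSimilarity(queryEmbedding, chunk.embedding),
    }))
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, k);
}
```

With toy 2-D vectors, a query of `[1, 0]` against chunks `[1, 0]`, `[1, 1]`, and `[0, 1]` scores them 1.0, about 0.707, and 0, in that order.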
Top-k Retrieval
"Top-k" means retrieving the k most similar chunks. Typical values:
| k | Use Case |
|---|---|
| 3 | Focused Q&A — when you expect one clear answer |
| 5 | General questions — cast a wider net |
| 10 | Research/exploration — gather multiple perspectives |
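In code, k is just a parameter of the search call:

```typescript
import { createClient } from "@supabase/supabase-js";

// Placeholder env var names: use whatever holds your project URL and key.
const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_ANON_KEY!);

// embed() is the helper used to index the chunks (it must use the same model).
async function search(query: string, topK: number = 5) {
  const queryEmbedding = await embed(query);
  const { data, error } = await supabase.rpc("match_chunks", {
    query_embedding: queryEmbedding,
    match_count: topK,
  });
  if (error) throw error; // surface RPC failures instead of returning null data
  return data; // [{ id, content, similarity, metadata }]
}
```

`match_chunks` is a PostgreSQL function, called through Supabase's `rpc()`, that runs the vector similarity query. pgvector's `<=>` operator returns cosine distance, so `1 - distance` gives the cosine similarity: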
```sql
create or replace function match_chunks(
  query_embedding vector(1536),  -- must match your embedding model's output size
  match_count int default 5
) returns table (
  id text,
  content text,
  similarity float,
  metadata jsonb
) as $$
  select
    id,
    content,
    1 - (embedding <=> query_embedding) as similarity,  -- <=> is cosine distance
    metadata
  from chunks
  order by embedding <=> query_embedding  -- nearest chunks first
  limit match_count;
$$ language sql stable;
```

Relevance Scoring
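Not all retrieved chunks are equally relevant. A similarity score of 0.92 is a strong match; a score of 0.65 might be noise. Set a threshold to filter out weak matches. Score ranges vary between embedding models, so treat 0.7 as a starting point and tune it against real questions: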
```typescript
// results: the rows returned by search()
const SIMILARITY_THRESHOLD = 0.7;
const relevant = results.filter(
  (r) => r.similarity >= SIMILARITY_THRESHOLD
);
```

If no chunks pass the threshold, that's a signal the question is out of scope: your system should say "I don't have information about that" instead of guessing.
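The out-of-scope rule can be made explicit so the caller never sends an empty context to the LLM. A minimal sketch, assuming each result carries a `similarity` field like the rows from `match_chunks`; `ScoredChunk`, `ContextDecision`, and `selectContext` are hypothetical names, not part of the course code.

```typescript
interface ScoredChunk {
  content: string;
  similarity: number;
}

// Either usable context for the LLM, or a fixed "out of scope" reply.
type ContextDecision =
  | { kind: "context"; chunks: ScoredChunk[] }
  | { kind: "out_of_scope"; message: string };

// Keep chunks at or above the threshold (0.7 mirrors SIMILARITY_THRESHOLD);
// if none survive, answer with a refusal instead of letting the LLM guess.
function selectContext(results: ScoredChunk[], threshold: number = 0.7): ContextDecision {
  const relevant = results.filter((r) => r.similarity >= threshold);
  if (relevant.length === 0) {
    return { kind: "out_of_scope", message: "I don't have information about that." };
  }
  return { kind: "context", chunks: relevant };
}
```

Returning a tagged result rather than an empty array forces the caller to handle the out-of-scope case explicitly.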
Filtering by Metadata
Remember the metadata you attached during chunking? Now it pays off. Users might want to search within a specific document or topic:
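```typescript
// Assumes search() gains an optional third argument: a metadata filter
// that is forwarded to match_chunks as filter_source.

// Only search remote work policies
const remoteResults = await search(query, 5, {
  source: "remote-work.md",
});

// Only search FAQ entries
const faqResults = await search(query, 5, {
  source: "faq.json",
});
```

This combines vector similarity (meaning match) with structured filters (metadata match). In SQL, `match_chunks` gets an extra `filter_source` parameter, and the query adds a `WHERE` clause in front of the ordering: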
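```sql
-- filter_source is an extra match_chunks parameter: filter_source text default null
where filter_source is null or metadata->>'source' = filter_source
order by embedding <=> query_embedding
limit match_count;
```

The `filter_source is null` check keeps the filter optional, so calls without a source still search every chunk. With an approximate pgvector index (HNSW or IVFFlat), the filter is applied after the index scan, so a narrow filter can return fewer than `match_count` rows.

Building the Search Function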
Putting it all together:
```typescript
interface SearchResult {
  id: string;
  content: string;
  similarity: number;
  source: string;
  heading?: string;
}

// embed() is the embedding helper; matchChunks() wraps the match_chunks RPC
// and maps each row's metadata to the source/heading fields above.
async function searchDocs(
  query: string,
  options: { topK?: number; threshold?: number; source?: string } = {}
): Promise<SearchResult[]> {
  const { topK = 5, threshold = 0.7, source } = options;
  const queryEmbedding = await embed(query);

  // Vector search with optional metadata filter
  const results = await matchChunks(queryEmbedding, topK, source);

  // Drop weak matches; an empty array means the question is out of scope
  return results.filter((r) => r.similarity >= threshold);
}
```

This function is the bridge between your user's question and the LLM. In the next module, you'll feed these results into a prompt to generate grounded answers.
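`searchDocs` calls a `matchChunks` helper that this chapter does not define. One possible sketch, assuming `match_chunks` has been extended with the optional `filter_source` parameter from the metadata section and returns `{ id, content, similarity, metadata }` rows. `RpcClient`, `ChunkRow`, and `toSearchResult` are hypothetical names, the `"unknown"` source fallback is an assumption, and the client is passed in as the first argument so the mapping can be tested without a database.

```typescript
interface SearchResult {
  id: string;
  content: string;
  similarity: number;
  source: string;
  heading?: string;
}

// Row shape returned by match_chunks: { id, content, similarity, metadata }
interface ChunkRow {
  id: string;
  content: string;
  similarity: number;
  metadata: { source?: string; heading?: string } | null;
}

// Hypothetical minimal client shape: anything with rpc(name, args)
// that resolves to { data, error }.
interface RpcClient {
  rpc(fn: string, args: Record<string, unknown>): PromiseLike<{ data: unknown; error: unknown }>;
}

// Flatten metadata into top-level fields; "unknown" is a placeholder
// for chunks that were indexed without a source.
function toSearchResult(row: ChunkRow): SearchResult {
  return {
    id: row.id,
    content: row.content,
    similarity: row.similarity,
    source: row.metadata?.source ?? "unknown",
    heading: row.metadata?.heading,
  };
}

async function matchChunks(
  client: RpcClient,
  queryEmbedding: number[],
  topK: number,
  source?: string
): Promise<SearchResult[]> {
  const { data, error } = await client.rpc("match_chunks", {
    query_embedding: queryEmbedding,
    match_count: topK,
    filter_source: source ?? null, // null means "no filter" in the extended SQL
  });
  if (error) throw error;
  return ((data ?? []) as ChunkRow[]).map(toSearchResult);
}
```

To get the three-argument form `searchDocs` uses, close over your client, e.g. `(e: number[], k: number, s?: string) => matchChunks(supabase, e, k, s)`. The supabase-js client's `rpc(name, args)` has this general shape, though depending on your generated database types you may need a cast to satisfy `RpcClient`.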
This is chapter 4 of RAG in 60 Minutes.