Search & Retrieve

Finding the Right Context

Semantic Search vs Keyword Search

You now have chunks stored as vectors. Time to search them.

Approach	How It Works	Strength	Weakness
Keyword	Match exact words (SQL `LIKE`, full-text search)	Fast, predictable	Misses synonyms ("PTO" vs "vacation")
Semantic	Compare meaning-vectors (cosine similarity)	Understands meaning	Can return topically similar but wrong results
Hybrid	Combine both, rerank results	Best of both worlds	More complex to implement

For this course, we'll build semantic search first (it's the core of RAG), then you can add keyword search as an enhancement.
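If you add keyword search later, the two result lists have to be merged somehow. One common option (an assumption here, not necessarily what the course builds) is reciprocal rank fusion, which scores each chunk by its rank in every list. The sketch below uses hypothetical chunk ids and the conventional constant k = 60:

```typescript
// Reciprocal rank fusion (RRF): each list contributes 1 / (k + rank) per id.
// The function name and the sample ids are illustrative, not from the course.
function reciprocalRankFusion(rankings: string[][], k: number = 60): string[] {
  const scores: Record<string, number> = {};
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores[id] = (scores[id] ?? 0) + 1 / (k + rank);
    });
  }
  // Highest fused score first
  return Object.keys(scores).sort((a, b) => scores[b] - scores[a]);
}

// Hypothetical ranked ids from a keyword search and a semantic search
const keywordHits = ["pto-policy", "holidays", "remote-work"];
const semanticHits = ["vacation-faq", "pto-policy", "sick-leave"];
const fused = reciprocalRankFusion([keywordHits, semanticHits]);
```

A chunk that appears in both lists (pto-policy here) outranks chunks that appear in only one.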

Query Embedding

The search flow is straightforward: embed the user's question with the same model used for chunks, then find the nearest vectors.

User: "Can I work from home on Fridays?"
                │
                ▼
        ┌───────────────┐
        │ Embed question │ → [0.23, -0.45, 0.67, ...]
        └───────┬───────┘
                │
                ▼
        ┌───────────────┐
        │ Compare with   │ → Find nearest chunk vectors
        │ all chunk      │
        │ embeddings     │
        └───────┬───────┘
                │
                ▼
     Top 3 most similar chunks

Critical rule: You must use the same embedding model for queries and chunks. Vectors from different models live in incompatible spaces — comparing them produces meaningless results.
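To make "find the nearest vectors" concrete, here is a minimal in-memory sketch of cosine similarity and a brute-force top-k scan. The names and the 3-dimensional toy vectors are illustrative assumptions; a real system hands this work to the vector database. The length check catches one symptom of mixing models, but two different models can share a dimension, so it is not a complete safeguard.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|), ranging from -1 to 1.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    // Different lengths usually mean the vectors came from different models
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat zero vectors as unrelated
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface EmbeddedChunk {
  id: string;
  embedding: number[];
}

// Brute-force scan: score every chunk, sort descending, keep the first k.
function nearestChunks(query: number[], chunks: EmbeddedChunk[], k: number) {
  return chunks
    .map((c) => ({ id: c.id, similarity: cosineSimilarity(query, c.embedding) }))
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, k);
}

// Toy vectors standing in for real embeddings
const toyChunks: EmbeddedChunk[] = [
  { id: "remote-work", embedding: [0.9, 0.1, 0.0] },
  { id: "expenses", embedding: [0.0, 0.2, 0.9] },
  { id: "pto", embedding: [0.5, 0.8, 0.1] },
];
const top2 = nearestChunks([1, 0, 0], toyChunks, 2);
```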

Top-k Retrieval

"Top-k" means retrieving the k most similar chunks. Typical values:

k	Use Case
3	Focused Q&A — when you expect one clear answer
5	General questions — cast a wider net
10	Research/exploration — gather multiple perspectives

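// Assumes `embed` and `supabase` are the helper and client set up earlier in the course.
async function search(query: string, topK: number = 5) {
  // Embed the question with the same model used for the chunks
  const queryEmbedding = await embed(query);

  // Call the match_chunks Postgres function through Supabase RPC
  const { data, error } = await supabase.rpc("match_chunks", {
    query_embedding: queryEmbedding,
    match_count: topK,
  });

  // Surface database errors instead of silently returning null
  if (error) throw new Error(`match_chunks failed: ${error.message}`);

  return data ?? []; // [{ id, content, similarity, metadata }]
}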

The match_chunks function is a PostgreSQL function that runs the vector similarity query. Supabase exposes it to the client through rpc():

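create function match_chunks(
  query_embedding vector(1536),  -- must match the embedding model's dimension
  match_count int default 5
) returns table (
  id text,
  content text,
  similarity float,
  metadata jsonb
) as $$
  select
    chunks.id,
    chunks.content,
    -- <=> is pgvector's cosine distance; 1 - distance = cosine similarity
    1 - (chunks.embedding <=> query_embedding) as similarity,
    chunks.metadata
  from chunks
  -- ascending distance puts the most similar chunks first
  order by chunks.embedding <=> query_embedding
  limit match_count;
$$ language sql;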
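On the client side, the rows returned by the RPC arrive untyped. The sketch below mirrors the returns table columns of match_chunks and drops anything that does not fit that shape. The names MatchChunkRow and parseMatchChunkRows are hypothetical, and this is one possible way to validate the response, not part of the Supabase API:

```typescript
// Shape of one row, mirroring the `returns table (...)` columns of match_chunks
interface MatchChunkRow {
  id: string;
  content: string;
  similarity: number;
  metadata: Record<string, unknown>;
}

// Runtime type guard: true only if every expected column has the right type
function isMatchChunkRow(value: unknown): value is MatchChunkRow {
  if (typeof value !== "object" || value === null) return false;
  const row = value as Record<string, unknown>;
  return (
    typeof row.id === "string" &&
    typeof row.content === "string" &&
    typeof row.similarity === "number" &&
    typeof row.metadata === "object" &&
    row.metadata !== null
  );
}

// Accepts whatever the RPC returned and keeps only well-formed rows.
// Malformed rows are silently dropped; log or throw instead if you prefer.
function parseMatchChunkRows(data: unknown): MatchChunkRow[] {
  if (!Array.isArray(data)) return [];
  return data.filter(isMatchChunkRow);
}
```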

Relevance Scoring

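Not all retrieved chunks are equally relevant. As a rough guide, a similarity score of 0.92 is a strong match, while a score of 0.65 might be noise. Score ranges vary between embedding models, so treat any threshold as a starting point and tune it against your own questions. Set a threshold to filter out weak matches: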

const SIMILARITY_THRESHOLD = 0.7;

const relevant = results.filter(
  (r) => r.similarity >= SIMILARITY_THRESHOLD
);

If no chunks pass the threshold, that's a signal the question is out of scope — your system should say "I don't have information about that" instead of guessing.
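One way to make the out-of-scope case explicit is to return a tagged result, so the caller cannot forget to handle it. This is a minimal sketch; the names ScoredChunk, Retrieval, and gateByThreshold are hypothetical, and the default of 0.7 simply reuses the threshold above:

```typescript
interface ScoredChunk {
  content: string;
  similarity: number;
}

// Either we have usable context, or we decline to answer
type Retrieval =
  | { kind: "answerable"; chunks: ScoredChunk[] }
  | { kind: "out_of_scope"; message: string };

function gateByThreshold(results: ScoredChunk[], threshold: number = 0.7): Retrieval {
  // Keep only chunks at or above the threshold
  const relevant = results.filter((r) => r.similarity >= threshold);
  if (relevant.length === 0) {
    // Nothing is relevant enough: decline instead of guessing
    return { kind: "out_of_scope", message: "I don't have information about that." };
  }
  return { kind: "answerable", chunks: relevant };
}

const weak = gateByThreshold([{ content: "Parking rules", similarity: 0.65 }]);
const strong = gateByThreshold([
  { content: "Remote work is allowed on Fridays", similarity: 0.92 },
  { content: "Parking rules", similarity: 0.65 },
]);
```

Here weak is tagged out_of_scope, while strong is answerable and carries only the 0.92 chunk.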

Filtering by Metadata

Remember the metadata you attached during chunking? Now it pays off. Users might want to search within a specific document or topic:

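// Assumes search() has been extended with an optional third argument for metadata filters.

// Only search remote work policies
const remoteWorkResults = await search(query, 5, {
  source: "remote-work.md",
});

// Only search FAQ entries
const faqResults = await search(query, 5, {
  source: "faq.json",
});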

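This combines vector similarity (meaning match) with structured filters (metadata match). The SQL adds a WHERE clause to the vector query, where filter_source is an extra text parameter on match_chunks. Checking for null keeps the filter optional:

where filter_source is null or metadata->>'source' = filter_source
order by embedding <=> query_embedding
limit match_count;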
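The order of operations matters: in SQL the WHERE clause is logically applied before ORDER BY and LIMIT, so for an exact (non-indexed) scan every one of the k results comes from the requested source. The in-memory sketch below contrasts that with filtering after the limit, which can return fewer than k rows. All names and similarity values are made up for illustration:

```typescript
interface RankedChunk {
  id: string;
  similarity: number; // precomputed similarity to the query
  source: string;     // stands in for metadata->>'source'
}

const bySimilarityDesc = (a: RankedChunk, b: RankedChunk) => b.similarity - a.similarity;

// Mirrors WHERE ... ORDER BY ... LIMIT: filter first, then take the top k
function filterThenLimit(chunks: RankedChunk[], k: number, source?: string): RankedChunk[] {
  return chunks
    .filter((c) => source === undefined || c.source === source)
    .sort(bySimilarityDesc)
    .slice(0, k);
}

// The alternative: take the top k overall, then discard other sources
function limitThenFilter(chunks: RankedChunk[], k: number, source?: string): RankedChunk[] {
  return chunks
    .slice() // copy so the caller's array is not reordered
    .sort(bySimilarityDesc)
    .slice(0, k)
    .filter((c) => source === undefined || c.source === source);
}

const candidates: RankedChunk[] = [
  { id: "a", similarity: 0.91, source: "faq.json" },
  { id: "b", similarity: 0.88, source: "remote-work.md" },
  { id: "c", similarity: 0.84, source: "faq.json" },
  { id: "d", similarity: 0.8, source: "remote-work.md" },
];

const filteredFirst = filterThenLimit(candidates, 2, "remote-work.md").map((c) => c.id);
const filteredLast = limitThenFilter(candidates, 2, "remote-work.md").map((c) => c.id);
```

With these values, filtering first yields b and d, while filtering last yields only b because a took one of the two slots.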

Building the Search Function

Putting it all together:

interface SearchResult {
  id: string;
  content: string;
  similarity: number;
  source: string;
  heading?: string;
}

async function searchDocs(
  query: string,
  options: { topK?: number; threshold?: number; source?: string } = {}
): Promise<SearchResult[]> {
  const { topK = 5, threshold = 0.7, source } = options;
  const queryEmbedding = await embed(query);

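  // Vector search with optional metadata filter.
  // matchChunks is an assumed helper that wraps the match_chunks RPC
  // and returns rows shaped like SearchResult.
  let results = await matchChunks(queryEmbedding, topK, source);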

  // Filter by similarity threshold
  results = results.filter((r) => r.similarity >= threshold);

  return results;
}
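The matchChunks helper is not defined in this chapter. One piece it probably needs is a step that flattens the metadata column returned by match_chunks into the source and heading fields of SearchResult. A minimal sketch, assuming the metadata object stores those values under the keys source and heading (ChunkRow and toSearchResult are hypothetical names):

```typescript
interface SearchResult {
  id: string;
  content: string;
  similarity: number;
  source: string;
  heading?: string;
}

// Row as returned by match_chunks; the metadata keys are assumptions
interface ChunkRow {
  id: string;
  content: string;
  similarity: number;
  metadata: { source?: string; heading?: string };
}

function toSearchResult(row: ChunkRow): SearchResult {
  const result: SearchResult = {
    id: row.id,
    content: row.content,
    similarity: row.similarity,
    // Fall back to a placeholder when the chunk has no recorded source
    source: row.metadata.source ?? "unknown",
  };
  // Only set heading when it exists, so the optional field stays absent otherwise
  if (row.metadata.heading !== undefined) {
    result.heading = row.metadata.heading;
  }
  return result;
}

const withHeading = toSearchResult({
  id: "1",
  content: "Employees may work remotely on Fridays.",
  similarity: 0.92,
  metadata: { source: "remote-work.md", heading: "Schedule" },
});
const withoutMetadata = toSearchResult({
  id: "2",
  content: "Untitled note",
  similarity: 0.75,
  metadata: {},
});
```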

This function is the bridge between your user's question and the LLM. In the next module, you'll feed these results into a prompt to generate grounded answers.

This is chapter 4 of RAG in 60 Minutes.
