Semantic Search
Finding Meaning, Not Keywords
The Problem With Keyword Search
You saved a bookmark about "flow states and deep focus techniques." A week later you search for "how to concentrate better." Keyword search returns nothing — none of those exact words appear in the bookmark.
Semantic search solves this. It recognizes that "concentrate better" and "deep focus techniques" are close in meaning, even though they share no words.
How Embeddings Work
An embedding is a list of numbers (a vector) that represents the *meaning* of a piece of text. Two texts with similar meanings produce vectors that point in similar directions.
"how to concentrate better" → [0.23, 0.87, -0.14, 0.56, ...]
"deep focus techniques" → [0.21, 0.85, -0.11, 0.59, ...] ← similar!
"quarterly revenue projections" → [-0.67, 0.12, 0.93, -0.31, ...] ← different
The embedding model (like OpenAI's text-embedding-3-small or Cohere's embed-v3) has learned these meaning-to-number mappings from billions of text examples.
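
To make "point in similar directions" concrete, here is a minimal sketch that compares the three example vectors, truncated to the four values shown. The helper names `normalize` and `dot` are illustrative, not part of any library, and real embeddings have hundreds or thousands of dimensions rather than four.

```typescript
// Toy 4-dimensional vectors copied from the example above.
const concentrate = [0.23, 0.87, -0.14, 0.56];
const deepFocus = [0.21, 0.85, -0.11, 0.59];
const revenue = [-0.67, 0.12, 0.93, -0.31];

// Scale a vector to length 1 so that only its direction remains.
function normalize(v: number[]): number[] {
  const length = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return v.map((x) => x / length);
}

// Dot product of two equal-length vectors.
function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

const sameTopic = dot(normalize(concentrate), normalize(deepFocus));
const otherTopic = dot(normalize(concentrate), normalize(revenue));

console.log(sameTopic.toFixed(2));  // close to 1: nearly the same direction
console.log(otherTopic.toFixed(2)); // below 0: pointing elsewhere
```

The dot product of two unit-length vectors is the cosine of the angle between them, which is the measure the next section builds on.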
Cosine Similarity
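Once everything is a vector, you need a way to compare them. Cosine similarity measures the cosine of the angle between two vectors, so it depends on their direction and not on their length. As a rough guide (exact thresholds vary by embedding model):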
| Score | Meaning |
|---|---|
| 1.0 | Identical meaning |
| 0.8-0.9 | Strongly related |
| 0.5-0.7 | Somewhat related |
| 0.0-0.4 | Unrelated |
The formula is straightforward — dot product divided by the product of magnitudes:
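function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0, magA = 0, magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];   // running dot product
    magA += a[i] * a[i];  // squared magnitude of a
    magB += b[i] * b[i];  // squared magnitude of b
  }
  const denom = Math.sqrt(magA) * Math.sqrt(magB);
  // A zero vector has no direction; return 0 rather than NaN.
  return denom === 0 ? 0 : dot / denom;
}
The Search Pipeline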
Semantic search follows a simple flow:
User Query → Embed Query → Compare vs All Chunk Embeddings → Rank by Similarity → Return Top-K
At query time, you embed the query once, then compare it against every chunk embedding. Each comparison is just multiplication and addition, so scanning every chunk stays fast at the scale of a personal knowledge base.
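
The compare, rank, and return steps of this flow can be sketched as follows. The `Chunk` and `SearchHit` shapes and the `searchTopK` name are assumptions made for illustration, and the embeddings are toy values rather than real model output. In a real system the query embedding would come from the same embedding model that produced the chunk embeddings.

```typescript
interface Chunk {
  id: string;
  text: string;
  embedding: number[];
}

interface SearchHit {
  chunk: Chunk;
  score: number;
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, magA = 0, magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  return dot / (Math.sqrt(magA) * Math.sqrt(magB));
}

// Score every chunk against the query, sort best-first, keep the top k.
function searchTopK(queryEmbedding: number[], chunks: Chunk[], k: number): SearchHit[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(queryEmbedding, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy data reusing the truncated example vectors from earlier.
const chunks: Chunk[] = [
  { id: "a", text: "deep focus techniques", embedding: [0.21, 0.85, -0.11, 0.59] },
  { id: "b", text: "quarterly revenue projections", embedding: [-0.67, 0.12, 0.93, -0.31] },
];
const queryEmbedding = [0.23, 0.87, -0.14, 0.56]; // "how to concentrate better"

const hits = searchTopK(queryEmbedding, chunks, 1);
console.log(hits[0].chunk.text); // "deep focus techniques"
```

Sorting the full score list is the simplest approach and is fine for small collections; it is a design choice for readability, not a requirement of the technique.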
Filtered Search
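Pure semantic search compares the query against every chunk. Often you want to narrow the scope first with a plain metadata filter.
Filtering first, then searching semantically over the filtered set, is faster because fewer vectors are compared, and usually more relevant because chunks outside the scope cannot crowd out the ones you want.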
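
A minimal sketch of filter-then-rank, assuming each chunk carries a hypothetical `tags` field and that embeddings are stored at unit length, so a plain dot product equals cosine similarity. The `TaggedChunk` shape, the `filteredSearch` name, and the two-dimensional toy vectors are all illustrative.

```typescript
interface TaggedChunk {
  text: string;
  tags: string[];      // hypothetical metadata field
  embedding: number[]; // assumed to be unit-length
}

// Dot product; equals cosine similarity when both vectors are unit-length.
function dotProduct(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

// Apply the metadata filter first, then rank only the survivors.
function filteredSearch(
  queryEmbedding: number[],
  chunks: TaggedChunk[],
  requiredTag: string,
  k: number,
): TaggedChunk[] {
  return chunks
    .filter((chunk) => chunk.tags.includes(requiredTag))
    .map((chunk) => ({ chunk, score: dotProduct(queryEmbedding, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((hit) => hit.chunk);
}

const library: TaggedChunk[] = [
  { text: "deep focus techniques", tags: ["productivity"], embedding: [0.6, 0.8] },
  { text: "focus stacking in macro photography", tags: ["photography"], embedding: [0.8, 0.6] },
  { text: "weekly planning ritual", tags: ["productivity"], embedding: [0, 1] },
];

const results = filteredSearch([1, 0], library, "productivity", 2);
console.log(results.map((r) => r.text));
```

In this toy data the photography chunk has the highest raw similarity to the query, but the filter removes it before ranking, so it never appears in the results.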
Hybrid Search: Best of Both
Sometimes keyword search catches what semantic search misses (exact names, codes, acronyms) and vice versa. Hybrid search combines both:
final_score = 0.7 * semantic + 0.3 * keyword
The weighting (70/30 toward semantic) works well for personal knowledge bases where you rarely search by exact terms.
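
The weighted blend can be sketched as below. The keyword score here is a deliberately simple stand-in (the fraction of query words found in the text), chosen only for illustration; the source does not specify how the keyword score is computed. Both scores are assumed to lie between 0 and 1, and the function names are hypothetical.

```typescript
// Hypothetical keyword score: the fraction of query words that appear in the text.
function keywordScore(query: string, text: string): number {
  const words = (s: string) => s.toLowerCase().split(/\W+/).filter((w) => w.length > 0);
  const textWords = new Set(words(text));
  const queryWords = words(query);
  if (queryWords.length === 0) return 0;
  const matches = queryWords.filter((w) => textWords.has(w)).length;
  return matches / queryWords.length;
}

// Weighted blend of the two scores; defaults to the 70/30 split described above.
function hybridScore(semantic: number, keyword: number, semanticWeight = 0.7): number {
  return semanticWeight * semantic + (1 - semanticWeight) * keyword;
}

// Two of the three query words ("acme", "q3") appear in the text.
const kw = keywordScore("ACME Q3 report", "Meeting notes: ACME Q3 planning");
const blended = hybridScore(0.5, kw);
console.log(kw.toFixed(2), blended.toFixed(2));
```

Exposing the weight as a parameter makes it easy to shift toward keyword matching for collections where exact names and codes matter more.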
Practical Considerations
Key Takeaways
This is chapter 3 of AI-Powered Second Brain.