Retrieval System
Hybrid Search + Temporal Reranking
The Retrieval Challenge for Marketing Intelligence
Marketing intelligence retrieval is harder than general-purpose RAG because of three compounding requirements:
No single search technique handles all three. You need a hybrid system that combines semantic search, keyword matching, and temporal reranking into a unified retrieval pipeline.
Semantic Search
The foundation. Given a natural language query, embed it as a vector and find the nearest chunks in pgvector using cosine similarity.
```
query: "How is CompetitorX positioning against us?"
→ embed query → find nearest vectors → return top-K results with scores
```
Semantic search handles paraphrasing well. "Positioning against us" matches "competitive differentiation" and "how they compare to our offering" because the embeddings capture meaning, not just keywords.
But it has blind spots. Exact names, dates, and metric values are poorly captured by general-purpose embeddings. "Acme Corp Q4 2024" might match chunks about "Beta Inc Q3 2023" because the semantic meaning (competitor + time period) is similar.
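Here is a minimal in-memory sketch of the nearest-neighbor step. It assumes each chunk already carries an embedding. The `Chunk` class, the `semantic_search` function, and the tiny 3-dimensional vectors are hypothetical stand-ins. In the real system this ranking runs inside PostgreSQL, where pgvector's `<=>` operator returns cosine distance.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    chunk_id: str
    text: str
    embedding: list[float]  # hypothetical: precomputed by an embedding model


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_search(query_vec: list[float], chunks: list[Chunk], top_k: int = 10) -> list[tuple[str, float]]:
    """Return (chunk_id, similarity) pairs for the top_k most similar chunks."""
    scored = [(c.chunk_id, cosine_similarity(query_vec, c.embedding)) for c in chunks]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors: c1 and c3 point roughly the same way as the query; c2 does not.
chunks = [
    Chunk("c1", "CompetitorX competitive differentiation", [0.9, 0.1, 0.0]),
    Chunk("c2", "Office party photos", [0.0, 0.2, 0.9]),
    Chunk("c3", "How they compare to our offering", [0.8, 0.3, 0.1]),
]
print(semantic_search([1.0, 0.2, 0.0], chunks, top_k=2))
```

In SQL the same ranking is roughly `ORDER BY embedding <=> $1 LIMIT 10`. You can recover similarity as `1 - distance`.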
Keyword Search
Keyword search covers semantic search's blind spots. PostgreSQL full-text search with tsvector/tsquery finds exact term matches. For marketing queries, this catches exact competitor names, period labels such as "Q4 2024", and specific metric values.
Keyword search alone is too brittle — it misses paraphrases and synonyms. But layered on top of semantic search, it provides critical precision.
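PostgreSQL does this with `to_tsvector`, `plainto_tsquery`, and `ts_rank`. The version below is a deliberately simplified stand-in that runs without a database. The hypothetical `keyword_search` ranks documents by how many query terms they contain verbatim. It has none of the stemming, stop-word handling, or ranking functions that Postgres applies.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase and split on non-alphanumerics; keeps tokens like 'q4' and '2024'."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_search(query: str, docs: dict[str, str], top_k: int = 10) -> list[tuple[str, int]]:
    """Rank docs by how many distinct query terms they contain exactly.

    Toy stand-in for tsvector/tsquery: no stemming, stop words, or weighting.
    Documents that share no terms with the query are dropped.
    """
    q_terms = tokenize(query)
    scored = []
    for doc_id, text in docs.items():
        hits = len(q_terms & tokenize(text))
        if hits > 0:
            scored.append((doc_id, hits))
    # Sort by hit count descending, then doc_id for a deterministic order.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


docs = {
    "a": "Acme Corp Q4 2024 pricing update",
    "b": "Beta Inc Q3 2023 pricing update",
    "c": "Brand voice guidelines",
}
print(keyword_search("Acme Corp Q4 2024", docs))
```

Semantic search might confuse the "Beta Inc Q3 2023" chunk with this query. Here it shares no terms with the query, so it drops out entirely.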
Hybrid Fusion with Reciprocal Rank Fusion
Combining two ranked lists into one requires a fusion algorithm. Reciprocal Rank Fusion (RRF) is the standard approach because it's simple, effective, and doesn't require score normalization.
The formula for each result:
```
RRF_score = sum(1 / (k + rank_i)) for each list where the result appears
```
Here k is a smoothing constant (typically 60) and rank_i is the result's 1-based position in list i. A result ranked #1 in semantic search and #3 in keyword search gets:
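```
score = 1/(60+1) + 1/(60+3) = 0.0164 + 0.0159 = 0.0323
```
A result that appears in only one list collects a single term (at most 1/61 ≈ 0.0164). It therefore generally scores lower than a result that both lists rank near the top. This naturally promotes results that both search methods agree are relevant.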
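The fusion step is short enough to show in full. This is a plain-Python sketch of standard RRF over lists of chunk IDs. The function name and sample IDs are illustrative.

```python
def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of IDs with RRF: score = sum over lists of 1 / (k + rank).

    Ranks are 1-based. Only ranks are used, so raw scores never need normalizing.
    """
    scores: dict[str, float] = {}
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


semantic = ["doc_a", "doc_b", "doc_c"]
keyword = ["doc_d", "doc_e", "doc_a"]
for doc_id, score in reciprocal_rank_fusion([semantic, keyword]):
    print(f"{doc_id}: {score:.4f}")
```

`doc_a` is ranked #1 semantically and #3 by keyword. It comes out on top with 0.0323, matching the worked example. Only ranks enter the formula, so the two lists' raw scores never have to be put on a common scale.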
Weighting by Query Type
Not all queries should weight semantic and keyword equally:
The query intent classifier determines these weights dynamically. A heuristic version is built in Module 3, and Module 4 upgrades it to an LLM-powered classifier.
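One common way to apply per-intent weights is weighted RRF. Each list's 1/(k + rank) term is multiplied by a weight for that list. The sketch below assumes that approach. The `INTENT_WEIGHTS` values are placeholders chosen to echo the keyword-heavy, balanced, and semantic-heavy labels in the intent table later in this chapter. They are not numbers from the course.

```python
# Hypothetical per-intent weights (placeholders, not values from the course).
INTENT_WEIGHTS: dict[str, dict[str, float]] = {
    "competitive": {"semantic": 0.3, "keyword": 0.7},  # keyword-heavy
    "trend": {"semantic": 0.5, "keyword": 0.5},        # balanced
    "content": {"semantic": 0.7, "keyword": 0.3},      # semantic-heavy
}
DEFAULT_WEIGHTS = {"semantic": 0.5, "keyword": 0.5}


def weighted_rrf(ranked: dict[str, list[str]], intent: str, k: int = 60) -> list[tuple[str, float]]:
    """Weighted RRF: each list's 1/(k + rank) term is scaled by its intent weight."""
    weights = INTENT_WEIGHTS.get(intent, DEFAULT_WEIGHTS)
    scores: dict[str, float] = {}
    for source, ids in ranked.items():
        w = weights.get(source, 0.0)
        for rank, doc_id in enumerate(ids, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + w / (k + rank)
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


# Semantic search prefers a paraphrase; keyword search prefers an exact match.
lists = {"semantic": ["para_1", "exact_1"], "keyword": ["exact_1", "para_1"]}
print(weighted_rrf(lists, "competitive")[0][0])
print(weighted_rrf(lists, "content")[0][0])
```

The two lists are the same in both calls. The keyword-heavy `competitive` weights put `exact_1` first. The semantic-heavy `content` weights put `para_1` first.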
Temporal Reranking
This is the key differentiator for marketing intelligence. After hybrid fusion produces a ranked list, temporal reranking adjusts scores based on three factors: recency, source authority, and temporal diversity.
Recency Boost
Newer data scores higher for most marketing queries. The boost uses exponential decay with a 90-day half-life:
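```
recency_boost = weight * e^(-0.693 * days_old / 90)
```
Because 0.693 ≈ ln 2, the boost halves every 90 days. A chunk from yesterday gets nearly the full boost. A chunk from 6 months ago (two half-lives) gets about a quarter of it, and a chunk from a year ago gets roughly 6%.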
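Here is a direct translation of the formula, as a sketch. `recency_boost` is a hypothetical helper, and `weight` defaults to a placeholder of 1.0. Chunks dated in the future are clamped to age zero, which is an assumption the formula does not address.

```python
import math
from datetime import date

HALF_LIFE_DAYS = 90


def recency_boost(published: date, today: date, weight: float = 1.0) -> float:
    """Exponential decay: weight * e^(-0.693 * days_old / 90).

    0.693 ~= ln(2), so the boost halves every HALF_LIFE_DAYS days.
    """
    days_old = max((today - published).days, 0)  # clamp future-dated chunks to age 0
    return weight * math.exp(-0.693 * days_old / HALF_LIFE_DAYS)


today = date(2024, 12, 31)
for label, published in [("yesterday", date(2024, 12, 30)),
                         ("90 days", date(2024, 10, 2)),
                         ("~6 months", date(2024, 7, 4))]:
    print(label, round(recency_boost(published, today), 3))
```

For chunks 1, 90, and 180 days old, this prints roughly 0.992, 0.5, and 0.25, which confirms the 90-day half-life.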
Source Authority
Different source types carry different weight depending on the query type:
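The chapter's actual authority values are not reproduced in this section, so the sketch below uses placeholder numbers. It only shows the shape of the idea: a lookup keyed by (query intent, source type) that returns a score multiplier. The source-type names (`competitor`, `campaign`, `brand_guidelines`, `industry_blog`) are assumptions loosely based on the intent table later in the chapter. `authority_weight` and `apply_authority` are hypothetical helpers.

```python
# Placeholder multipliers: NOT the course's values. Keys are (intent, source_type).
AUTHORITY: dict[tuple[str, str], float] = {
    ("competitive", "competitor"): 1.5,
    ("performance", "campaign"): 1.5,
    ("content", "brand_guidelines"): 1.5,
}


def authority_weight(intent: str, source_type: str, default: float = 1.0) -> float:
    """Return the multiplier for this source type under this intent (default 1.0)."""
    return AUTHORITY.get((intent, source_type), default)


def apply_authority(results: list[tuple[str, str, float]], intent: str) -> list[tuple[str, float]]:
    """Scale each (chunk_id, source_type, score) by its authority weight and re-sort."""
    rescored = [(cid, score * authority_weight(intent, stype)) for cid, stype, score in results]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


results = [("blog_1", "industry_blog", 0.030), ("camp_1", "campaign", 0.025)]
print(apply_authority(results, "performance"))
```

Under a `performance` query the campaign chunk overtakes the blog chunk. Under other intents the original order stands.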
Temporal Diversity
For trend queries, you NEED results from multiple time periods. If all 10 results are from October, you can't detect a trend. The temporal diversity penalty reduces scores for results from over-represented time periods:
```
diversity_penalty = 0.05 * (count_of_results_from_same_period - 1)
```
This pushes a query like "how has competitor positioning changed?" toward results from Q1, Q2, Q3, and Q4 rather than only the most recent quarter.
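The formula leaves open exactly how the count is taken. One reading, sketched here, is progressive: walk the list in score order, and the n-th result from a period loses 0.05 * (n - 1). Under this reading, the first result from each period keeps its score. The sketch also assumes scores on a roughly 0-to-1 scale. `apply_diversity_penalty` and the period labels are hypothetical.

```python
from collections import Counter


def apply_diversity_penalty(results: list[tuple[str, str, float]], step: float = 0.05) -> list[tuple[str, float]]:
    """Penalize results from over-represented periods, then re-sort.

    results: (chunk_id, period, score), assumed already sorted by score descending.
    The n-th result from a period (1-based) loses step * (n - 1), so the first
    result from each period is unpenalized.
    """
    seen = Counter()
    adjusted = []
    for chunk_id, period, score in results:
        seen[period] += 1
        adjusted.append((chunk_id, score - step * (seen[period] - 1)))
    return sorted(adjusted, key=lambda pair: pair[1], reverse=True)


results = [
    ("oct_1", "2024-Q4", 0.90),
    ("oct_2", "2024-Q4", 0.88),
    ("oct_3", "2024-Q4", 0.86),
    ("jul_1", "2024-Q3", 0.84),
]
print([cid for cid, _ in apply_diversity_penalty(results)])
```

The July chunk moves up past two October chunks, so the top two results now span two quarters.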
Trend-Aware Context Assembly
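The final step is assembling retrieved chunks into a context window for the LLM. For marketing intelligence, this means ordering chunks chronologically and labeling each with its source, time period, and source type. It also means marking where one period gives way to the next.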
The assembled context reads like a briefing document:
```
[Source: CompetitorX · Q2 2024 · competitor]
Positioned as "the affordable alternative" with emphasis on SMB pricing...
--- [CHANGE: Q2 → Q4] ---
[Source: CompetitorX · Q4 2024 · competitor]
Shifted to "enterprise-grade at every scale" with new enterprise tier launch...
```
This chronological, annotated context enables the LLM to synthesize trend narratives that raw search results cannot.
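Here is a minimal sketch of this assembly step. It assumes each retrieved chunk carries a source name, a period label, a sortable period start date, and a source type. The `RetrievedChunk` fields and the `assemble_context` name are hypothetical. The output follows the briefing format shown above, except that the change marker repeats the full period labels (year included).

```python
from dataclasses import dataclass


@dataclass
class RetrievedChunk:
    source: str        # e.g. "CompetitorX"
    period: str        # e.g. "Q2 2024"
    period_start: str  # ISO date used for chronological sorting (assumption)
    source_type: str   # e.g. "competitor"
    text: str


def assemble_context(chunks: list[RetrievedChunk]) -> str:
    """Order chunks chronologically, cite each one, and mark period changes."""
    lines: list[str] = []
    previous_period = None
    for chunk in sorted(chunks, key=lambda c: c.period_start):
        # Insert a change marker only when the period differs from the last chunk's.
        if previous_period is not None and chunk.period != previous_period:
            lines.append(f"--- [CHANGE: {previous_period} → {chunk.period}] ---")
        lines.append(f"[Source: {chunk.source} · {chunk.period} · {chunk.source_type}]")
        lines.append(chunk.text)
        previous_period = chunk.period
    return "\n".join(lines)


chunks = [
    RetrievedChunk("CompetitorX", "Q4 2024", "2024-10-01", "competitor",
                   'Shifted to "enterprise-grade at every scale"...'),
    RetrievedChunk("CompetitorX", "Q2 2024", "2024-04-01", "competitor",
                   'Positioned as "the affordable alternative"...'),
]
print(assemble_context(chunks))
```

The chunks arrive in relevance order, with Q4 first. Sorting on `period_start` restores the timeline before the change markers are added.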
Query Intent Classification
Before retrieval even begins, classify the query intent to optimize the search strategy:
| Intent | Retrieval Strategy |
|---|---|
| `competitive` | Keyword-heavy (exact competitor names), temporal diversity ON |
| `trend` | Balanced semantic/keyword, temporal diversity ON, recency moderate |
| `performance` | Campaign-source priority, recency HIGH, temporal diversity OFF |
| `content` | Brand guidelines priority, recency LOW, semantic-heavy |
In Module 3, intent classification is a simple heuristic (keyword detection plus pattern matching). Module 4 upgrades it to an LLM-powered classifier as part of the AI Gateway.
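The heuristic itself is not shown in this section. The sketch below is one hypothetical way to do keyword-and-pattern intent detection and map each intent to the strategy flags in the table. The regex lists, the check order, the flag encoding, and the `trend` fallback are all assumptions.

```python
import re

# Hypothetical patterns per intent, checked in dict order (trend first).
INTENT_PATTERNS: dict[str, list[str]] = {
    "trend": [r"\bchang(e|ed|ing)\b", r"\btrend", r"\bover time\b", r"\bevolv"],
    "competitive": [r"\bcompetitor", r"\bagainst us\b", r"\bvs\b", r"\bpositioning\b"],
    "performance": [r"\bctr\b", r"\bconversion", r"\broas\b", r"\bcampaign"],
    "content": [r"\bbrand voice\b", r"\bguideline", r"\btone\b", r"\bcopy\b"],
}

# Strategy flags mirroring the intent table (this encoding is an assumption).
STRATEGY = {
    "competitive": {"weighting": "keyword-heavy", "temporal_diversity": True},
    "trend": {"weighting": "balanced", "temporal_diversity": True, "recency": "moderate"},
    "performance": {"source_priority": "campaign", "recency": "high", "temporal_diversity": False},
    "content": {"weighting": "semantic-heavy", "source_priority": "brand_guidelines", "recency": "low"},
}


def classify_intent(query: str, default: str = "trend") -> str:
    """Return the first intent whose patterns match the query; else the default."""
    q = query.lower()
    for intent, patterns in INTENT_PATTERNS.items():
        if any(re.search(p, q) for p in patterns):
            return intent
    return default


for q in ["How is CompetitorX positioning against us?",
          "How has competitor positioning changed?",
          "What was the CTR on the spring campaign?"]:
    intent = classify_intent(q)
    print(intent, STRATEGY[intent])
```

Rule order matters here. Trend questions about competitors also mention competitors, so `trend` is checked first. That routes them to the balanced, recency-moderate strategy instead of the keyword-heavy one.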
What You'll Build
Glossary
| Term | Meaning |
|---|---|
| Hybrid search | Combining semantic and keyword search for better recall and precision |
| RRF | Reciprocal Rank Fusion — a method for merging ranked lists without score normalization |
| Temporal reranking | Adjusting search scores based on data recency, authority, and time diversity |
| Temporal diversity | Ensuring search results span multiple time periods for trend detection |
| Context assembly | Building a structured, cited text block from search results for LLM consumption |
| Query intent | Classifying what type of answer a query needs (competitive, trend, performance, content) |
This is chapter 3 of AI Marketing Intelligence.