Retrieval System

Hybrid Search + Temporal Reranking

The Retrieval Challenge for Marketing Intelligence

Marketing intelligence retrieval is harder than general-purpose RAG because of three compounding requirements:

Semantic understanding — "companies shifting to product-led growth" should match chunks about "self-serve onboarding" and "freemium conversion"

Exact matching — "Acme Corp" must find that specific competitor, not similar-sounding companies

Temporal awareness — "this quarter" means entirely different data than "last year," and "how has X changed" requires data from MULTIPLE time periods

No single search technique handles all three. You need a hybrid system that combines semantic search, keyword matching, and temporal reranking into a unified retrieval pipeline.

Semantic Search

The foundation. Given a natural language query, embed it as a vector and find the nearest chunks in pgvector using cosine similarity.

query: "How is CompetitorX positioning against us?"
→ embed query → find nearest vectors → return top-K results with scores

Semantic search handles paraphrasing well. "Positioning against us" matches "competitive differentiation" and "how they compare to our offering" because the embeddings capture meaning, not just keywords.

But it has blind spots. Exact names, dates, and metric values are poorly captured by general-purpose embeddings. "Acme Corp Q4 2024" might match chunks about "Beta Inc Q3 2023" because the semantic meaning (competitor + time period) is similar.
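To make the "embed, find nearest vectors, return top-K" flow concrete, here is a minimal in-memory sketch of the ranking step. It is not the pgvector implementation: the `chunks` list, the toy 3-dimensional vectors, and the function names are assumptions made for illustration, and the embedding step itself is skipped.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)

def semantic_search(query_vec, chunks, top_k=3):
    """Return the top_k (chunk_id, score) pairs, best first.

    `chunks` is a hypothetical list of (chunk_id, embedding) pairs
    standing in for rows in a pgvector table.
    """
    scored = [(cid, cosine_similarity(query_vec, emb)) for cid, emb in chunks]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy embeddings; real ones have hundreds of dimensions.
chunks = [
    ("positioning", [0.9, 0.1, 0.0]),
    ("pricing", [0.1, 0.9, 0.0]),
    ("hiring", [0.0, 0.1, 0.9]),
]
results = semantic_search([1.0, 0.2, 0.0], chunks, top_k=2)
```

In the real pipeline the database does this ranking for you; the sketch only shows what "nearest by cosine similarity" means.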

Keyword Search

Covers semantic search's blind spots. PostgreSQL full-text search with tsvector/tsquery finds exact term matches. For marketing queries, this catches:

Competitor names: "Acme Corp" must match exactly

Campaign names: "Summer Launch 2024" is a specific entity

Metric values: "ROI above 300%" requires matching the terms as written (full-text search does not evaluate the numeric comparison)

Brand terms: "product-led growth" as a specific concept

Keyword search alone is too brittle — it misses paraphrases and synonyms. But layered on top of semantic search, it provides critical precision.
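A phrase lookup with PostgreSQL full-text search might be built roughly as follows. This is a sketch under assumptions: the table name `chunks` and the tsvector column `search_vec` are hypothetical, and the `%s` placeholders assume a DB-API driver that uses that parameter style. The function only builds the statement, so you can inspect it without a database.

```python
def build_keyword_query(phrase, limit=10):
    """Build a parameterized full-text query for a phrase.

    Table `chunks` and column `search_vec` are hypothetical names.
    Returns (sql, params) to pass to a DB-API cursor.execute call.
    """
    sql = (
        "SELECT id, ts_rank(search_vec, q) AS score "
        "FROM chunks, phraseto_tsquery('english', %s) AS q "
        "WHERE search_vec @@ q "
        "ORDER BY score DESC "
        "LIMIT %s"
    )
    return sql, (phrase, limit)

sql, params = build_keyword_query("Acme Corp", limit=5)
```

`phraseto_tsquery` requires the words to appear next to each other and in order, which is what you want for names like "Acme Corp". Note that the `'english'` configuration stems words before matching, so a configuration without stemming may suit proper nouns better.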

Hybrid Fusion with Reciprocal Rank Fusion

Combining two ranked lists into one requires a fusion algorithm. Reciprocal Rank Fusion (RRF) is a widely used choice because it is simple, effective, and works on ranks alone, so the two lists' scores never need to be normalized against each other.

The formula for each result:

RRF_score = sum(1 / (k + rank_i)) for each list where the result appears

Where rank_i is the result's 1-based position in list i and k is a smoothing constant, typically 60. A result ranked #1 in semantic search and #3 in keyword search gets:

score = 1/(60+1) + 1/(60+3) = 0.0164 + 0.0159 = 0.0323

A result appearing in only one list contributes a single term, so even at rank #1 it scores at most 1/61 ≈ 0.0164. This naturally promotes results that both search methods agree are relevant.
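The formula translates almost directly into code. The sketch below assumes each search method returns a best-first list of chunk IDs; the function name and the sample IDs are made up for illustration.

```python
def rrf_fuse(ranked_lists, k=60):
    """Merge best-first lists of IDs with Reciprocal Rank Fusion.

    Ranks start at 1, so the top item in a list contributes 1 / (k + 1).
    """
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

semantic = ["a", "b", "c"]  # "a" is rank 1 in semantic search
keyword = ["d", "b", "a"]   # "a" is rank 3 in keyword search
fused = rrf_fuse([semantic, keyword])
top_id, top_score = fused[0]  # "a", with a score of about 0.0323
```

Here "a" reproduces the worked example: 1/61 + 1/63. Results "c" and "d" each appear in only one list and land at the bottom.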

Weighting by Query Type

Not all queries should weight semantic and keyword results equally (the ratios below are semantic/keyword):

Open questions ("what trends are emerging?") → weight semantic higher (70/30)

Specific lookups ("CompetitorX pricing changes") → weight keyword higher (40/60)

Trend queries ("how has engagement changed?") → equal weight (50/50) with temporal boost

The query intent classifier (a simple heuristic in Module 3, upgraded to an LLM-powered classifier in Module 4) determines these weights dynamically.
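One way to apply these ratios is to scale each list's RRF contribution by its weight. That is an assumption of this sketch, not something the chapter prescribes, and the intent labels `open`, `lookup`, and `trend` are hypothetical names for the three cases above.

```python
# Hypothetical intent labels mapped to (semantic_weight, keyword_weight).
WEIGHTS = {
    "open": (0.7, 0.3),
    "lookup": (0.4, 0.6),
    "trend": (0.5, 0.5),
}

def weighted_rrf(semantic, keyword, intent, k=60):
    """RRF where each list's 1 / (k + rank) term is scaled by a weight."""
    w_sem, w_kw = WEIGHTS[intent]
    scores = {}
    for weight, ranking in ((w_sem, semantic), (w_kw, keyword)):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# The two methods disagree about which chunk is best.
semantic = ["trend_report", "pricing_page"]
keyword = ["pricing_page", "trend_report"]
open_top = weighted_rrf(semantic, keyword, "open")[0][0]
lookup_top = weighted_rrf(semantic, keyword, "lookup")[0][0]
```

With identical inputs, the open question surfaces the semantic favorite and the specific lookup surfaces the keyword favorite.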

Temporal Reranking

This is the key differentiator for marketing intelligence. After hybrid fusion produces a ranked list, temporal reranking adjusts scores based on:

Recency Boost

Newer data scores higher for most marketing queries. The boost uses exponential decay with a 90-day half-life:

recency_boost = weight * e^(-0.693 * days_old / 90)

A chunk from yesterday gets nearly the full boost. A chunk from 6 months ago (two half-lives) gets about a quarter of it, and a year-old chunk gets about 6%.
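A small sketch of the decay formula helps you check the numbers. The constant 0.693 in the formula is ln 2 rounded, so the code uses `math.log(2)` directly; the function name and the default `weight` of 1.0 are assumptions.

```python
import math

def recency_boost(days_old, weight=1.0, half_life_days=90.0):
    """Exponential decay: the boost halves every `half_life_days`."""
    return weight * math.exp(-math.log(2) * days_old / half_life_days)

# Boost at one day, one half-life, two half-lives, and one year.
boosts = {days: round(recency_boost(days), 3) for days in (1, 90, 180, 365)}
```

Each additional 90 days halves the boost, so old data fades gradually and never drops to exactly zero.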

Source Authority

Different source types carry different weight depending on the query type:

Performance questions ("What campaigns are working?") → campaign reports > social metrics > competitor profiles

Strategic questions ("Where is the market heading?") → industry reports > competitor profiles > campaign reports

Content drafting ("Draft a post about X") → brand guidelines > social metrics > industry reports
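These orderings can be kept in a lookup table. In the sketch below the orderings come from the three lists above, while the source-type identifiers and the numeric boost values are illustrative assumptions.

```python
# Orderings follow the text; identifiers are hypothetical.
AUTHORITY_ORDER = {
    "performance": ["campaign_report", "social_metrics", "competitor_profile"],
    "strategic": ["industry_report", "competitor_profile", "campaign_report"],
    "content": ["brand_guidelines", "social_metrics", "industry_report"],
}
# Hypothetical boosts for the 1st, 2nd, and 3rd source in each ordering.
RANK_BOOSTS = [0.15, 0.10, 0.05]

def authority_boost(query_type, source_type):
    """Return the boost for a source type, or 0.0 if it is not listed."""
    order = AUTHORITY_ORDER.get(query_type, [])
    if source_type in order:
        return RANK_BOOSTS[order.index(source_type)]
    return 0.0

top = authority_boost("performance", "campaign_report")
low = authority_boost("strategic", "campaign_report")
```

The same campaign report earns the largest boost for a performance question and the smallest for a strategic one.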

Temporal Diversity

For trend queries, you NEED results from multiple time periods. If all 10 results are from October, you can't detect a trend. The temporal diversity penalty reduces scores for results from over-represented time periods:

diversity_penalty = 0.05 * (count_of_results_from_same_period - 1)

This ensures that "how has competitor positioning changed?" returns results from Q1, Q2, Q3, and Q4 — not just the most recent quarter.
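The penalty formula leaves room for interpretation. This sketch takes one reading: walk the list best-first and count how many results from the same period have been seen so far, so the first result from a period is never penalized. It also assumes scores normalized to roughly 0 to 1; the period labels and scores are invented.

```python
from collections import Counter

def apply_diversity_penalty(results, step=0.05):
    """Penalize repeated periods in a best-first list of (id, period, score).

    The n-th result from a given period loses step * (n - 1).
    """
    seen = Counter()
    adjusted = []
    for chunk_id, period, score in results:
        seen[period] += 1
        adjusted.append((chunk_id, period, score - step * (seen[period] - 1)))
    adjusted.sort(key=lambda r: r[2], reverse=True)
    return adjusted

ranked = [
    ("oct_a", "2024-Q4", 0.90),
    ("oct_b", "2024-Q4", 0.88),
    ("oct_c", "2024-Q4", 0.86),
    ("jul_a", "2024-Q3", 0.84),
    ("apr_a", "2024-Q2", 0.80),
]
reranked = apply_diversity_penalty(ranked)
order = [chunk_id for chunk_id, _, _ in reranked]
```

After the penalty, the Q3 result moves up to second place and the third Q4 result drops to last, so the top of the list spans more periods.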

Trend-Aware Context Assembly

The final step: assembling retrieved chunks into a context window for the LLM. For marketing intelligence, this means:

Chronological ordering — arrange chunks by observation date so the LLM can see the progression

Change markers — insert annotations between time periods: "[CHANGE: Q2 → Q3]" to highlight transitions

Source attribution — every chunk gets a citation: "Source: CompetitorX Profile · Q4 2024 · competitor"

Context budget — limit total characters to fit within the LLM's effective context window (6000-8000 chars typically)

The assembled context reads like a briefing document:

[Source: CompetitorX · Q2 2024 · competitor]
Positioned as "the affordable alternative" with emphasis on SMB pricing...

--- [CHANGE: Q2 → Q4] ---

[Source: CompetitorX · Q4 2024 · competitor]
Shifted to "enterprise-grade at every scale" with new enterprise tier launch...

This chronological, annotated context enables the LLM to synthesize trend narratives that raw search results cannot.
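The assembly steps above could be sketched like this. The dictionary keys (`source`, `period`, `observed`, `source_type`, `text`) and the observation dates are assumptions, and the budget here counts only the cited chunks, not the short change markers.

```python
def assemble_context(chunks, budget_chars=8000):
    """Build a chronological, cited context block from best-first chunks."""
    # 1. Keep the highest-ranked chunks that fit the character budget.
    selected, used = [], 0
    for chunk in chunks:  # assumed to arrive best-first
        block = (
            f"[Source: {chunk['source']} · {chunk['period']} · "
            f"{chunk['source_type']}]\n{chunk['text']}"
        )
        if used + len(block) > budget_chars:
            continue  # skip chunks that would overflow; try smaller ones
        selected.append((chunk["observed"], chunk["period"], block))
        used += len(block)

    # 2. Order by observation date and mark each period transition.
    selected.sort(key=lambda item: item[0])
    parts, prev_period = [], None
    for _, period, block in selected:
        if prev_period is not None and period != prev_period:
            old_q, new_q = prev_period.split()[0], period.split()[0]
            parts.append(f"--- [CHANGE: {old_q} → {new_q}] ---")
        parts.append(block)
        prev_period = period
    return "\n\n".join(parts)

chunks = [
    {"source": "CompetitorX", "period": "Q4 2024", "observed": "2024-11-02",
     "source_type": "competitor",
     "text": 'Shifted to "enterprise-grade at every scale".'},
    {"source": "CompetitorX", "period": "Q2 2024", "observed": "2024-05-15",
     "source_type": "competitor",
     "text": 'Positioned as "the affordable alternative".'},
]
context = assemble_context(chunks)
```

Selecting by relevance first and sorting by date afterwards means a tight budget drops the weakest chunks, not whichever period happens to come last.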

Query Intent Classification

Before retrieval even begins, classify the query intent to optimize the search strategy:

Intent	Retrieval Strategy
`competitive`	Keyword-heavy (exact competitor names), temporal diversity ON
`trend`	Balanced semantic/keyword, temporal diversity ON, recency moderate
`performance`	Campaign-source priority, recency HIGH, temporal diversity OFF
`content`	Brand guidelines priority, recency LOW, semantic-heavy

Intent classification is a simple heuristic (keyword detection + patterns) in Module 3. Module 4 upgrades it to an LLM-powered classifier as part of the AI Gateway.
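A keyword-and-pattern heuristic of this kind can be very small. The patterns below, their order, and the fallback intent are all assumptions for illustration; you would tune them against real queries.

```python
import re

# Hypothetical patterns, checked in order; the first match wins.
INTENT_PATTERNS = [
    ("content", re.compile(r"\b(draft|write|compose)\b", re.I)),
    ("trend", re.compile(r"\b(changed?|trends?|over time|evolv\w*)\b", re.I)),
    ("performance", re.compile(r"\b(campaigns?|roi|working|performance)\b", re.I)),
    ("competitive", re.compile(r"\b(competitors?|positioning|versus|vs)\b", re.I)),
]

def classify_intent(query, default="competitive"):
    """Return the first intent whose pattern matches, else `default`."""
    for intent, pattern in INTENT_PATTERNS:
        if pattern.search(query):
            return intent
    return default

labels = [
    classify_intent("Draft a post about X"),
    classify_intent("How has engagement changed?"),
    classify_intent("What campaigns are working?"),
    classify_intent("How is CompetitorX positioning against us?"),
]
```

Pattern order matters: "Draft a post about campaign trends" should be a content request, so the drafting verbs are checked first.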

What You'll Build

Implement semantic search against pgvector with cosine similarity

Add keyword search with PostgreSQL full-text search

Combine them with Reciprocal Rank Fusion

Build temporal reranking with recency, authority, and diversity scoring

Create trend-aware context assembly with chronological ordering and change markers

Glossary

Term	Meaning
Hybrid search	Combining semantic and keyword search for better recall and precision
RRF	Reciprocal Rank Fusion — a method for merging ranked lists without score normalization
Temporal reranking	Adjusting search scores based on data recency, authority, and time diversity
Temporal diversity	Ensuring search results span multiple time periods for trend detection
Context assembly	Building a structured, cited text block from search results for LLM consumption
Query intent	Classifying what type of answer a query needs (competitive, trend, performance, content)

This is chapter 3 of AI Marketing Intelligence.
