
Retrieval System

Hybrid Search + Temporal Reranking

The Retrieval Challenge for Marketing Intelligence

Marketing intelligence retrieval is harder than general-purpose RAG because of three compounding requirements:

  • Semantic understanding — "companies shifting to product-led growth" should match chunks about "self-serve onboarding" and "freemium conversion"
  • Exact matching — "Acme Corp" must find that specific competitor, not similar-sounding companies
  • Temporal awareness — "this quarter" means entirely different data than "last year," and "how has X changed" requires data from MULTIPLE time periods

    No single search technique handles all three. You need a hybrid system that combines semantic search, keyword matching, and temporal reranking into a unified retrieval pipeline.

    Semantic Search

    The foundation. Given a natural language query, embed it as a vector and find the nearest chunks in pgvector using cosine similarity.

    query: "How is CompetitorX positioning against us?"
    → embed query → find nearest vectors → return top-K results with scores

    Semantic search handles paraphrasing well. "Positioning against us" matches "competitive differentiation" and "how they compare to our offering" because the embeddings capture meaning, not just keywords.

    But it has blind spots. Exact names, dates, and metric values are poorly captured by general-purpose embeddings. "Acme Corp Q4 2024" might match chunks about "Beta Inc Q3 2023" because the semantic meaning (competitor + time period) is similar.
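The flow above can be sketched in a few lines. The SQL in the comment shows the pgvector form (the `chunks` table and `embedding` column names are assumptions); the in-memory functions below reproduce the same cosine-similarity ranking so the logic is visible:

```python
import math

# In production this ranking runs inside PostgreSQL via pgvector's cosine
# distance operator, e.g.:
#   SELECT id, 1 - (embedding <=> %(query_vec)s) AS score
#   FROM chunks ORDER BY embedding <=> %(query_vec)s LIMIT %(top_k)s;
# Here the same math is done in memory over (chunk_id, embedding) pairs.

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def semantic_search(query_vec, chunks, top_k=5):
    """chunks: list of (chunk_id, embedding). Returns (chunk_id, score), best first."""
    scored = [(cid, cosine_similarity(query_vec, vec)) for cid, vec in chunks]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:top_k]
```

The query embedding itself would come from whatever embedding model the pipeline uses; it is omitted here.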

    Keyword Search

    Covers semantic search's blind spots. PostgreSQL full-text search with tsvector/tsquery finds exact term matches. For marketing queries, this catches:

  • Competitor names: "Acme Corp" must match exactly
  • Campaign names: "Summer Launch 2024" is a specific entity
  • Metric values: "ROI above 300%" requires exact matching
  • Brand terms: "product-led growth" as a specific concept

    Keyword search alone is too brittle — it misses paraphrases and synonyms. But layered on top of semantic search, it provides critical precision.
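A minimal sketch of the keyword side. The comment shows the PostgreSQL full-text form (again with assumed table and column names); the function below approximates `ts_rank` with a simple term-overlap count so it can run standalone:

```python
# In production this is PostgreSQL full-text search, e.g.:
#   SELECT id, ts_rank(search_vec, plainto_tsquery('english', %(q)s)) AS score
#   FROM chunks
#   WHERE search_vec @@ plainto_tsquery('english', %(q)s)
#   ORDER BY score DESC LIMIT %(top_k)s;
# The stand-in below counts exact term occurrences, which captures the key
# property: only chunks containing the literal terms can match at all.

def keyword_search(query, chunks, top_k=5):
    """chunks: list of (chunk_id, text). Returns (chunk_id, score) for exact-term hits."""
    terms = query.lower().split()
    scored = []
    for cid, text in chunks:
        words = text.lower().split()
        score = sum(words.count(t) for t in terms)
        if score:
            scored.append((cid, score))
    return sorted(scored, key=lambda s: s[1], reverse=True)[:top_k]
```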

    Hybrid Fusion with Reciprocal Rank Fusion

    Combining two ranked lists into one requires a fusion algorithm. Reciprocal Rank Fusion (RRF) is the standard approach because it's simple, effective, and doesn't require score normalization.

    The formula for each result:

    RRF_score = sum(1 / (k + rank_i)) for each list where the result appears

    Where k is typically 60 (a smoothing constant). A result ranked #1 in semantic search and #3 in keyword search gets:

    score = 1/(60+1) + 1/(60+3) = 0.0164 + 0.0159 = 0.0323

    A result appearing in only one list gets a lower combined score. This naturally promotes results that both search methods agree are relevant.
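The formula translates directly into code. This sketch fuses any number of ranked ID lists and reproduces the worked example above:

```python
def rrf_fuse(ranked_lists, k=60):
    """Merge ranked lists of doc IDs via Reciprocal Rank Fusion.

    ranked_lists: iterable of lists of IDs, best first (ranks are 1-based).
    Returns (doc_id, score) pairs sorted by fused score, best first.
    """
    scores = {}
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            # Each appearance contributes 1 / (k + rank); absence contributes 0.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda s: s[1], reverse=True)
```

A document ranked #1 in one list and #3 in the other scores `1/61 + 1/63 ≈ 0.0323`, matching the example.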

    Weighting by Query Type

    Not all queries should weight semantic and keyword equally:

  • Open questions ("what trends are emerging?") → weight semantic higher (70/30)
  • Specific lookups ("CompetitorX pricing changes") → weight keyword higher (40/60)
  • Trend queries ("how has engagement changed?") → equal weight (50/50) with temporal boost

    The query intent classifier (built in Module 4) determines these weights dynamically.
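One way to apply these weights is to scale each list's RRF contribution. The sketch below uses shorthand intent labels for the three bullets above; the labels and the default fallback are illustrative, not from the course:

```python
# (semantic_weight, keyword_weight) per query type, from the bullets above.
FUSION_WEIGHTS = {
    "open": (0.7, 0.3),      # open questions
    "lookup": (0.4, 0.6),    # specific lookups
    "trend": (0.5, 0.5),     # trend queries (temporal boost applied later)
}

def weighted_rrf(semantic_ids, keyword_ids, intent, k=60):
    """RRF where each list's contribution is scaled by the intent's weight."""
    w_sem, w_kw = FUSION_WEIGHTS.get(intent, (0.5, 0.5))
    scores = {}
    for weight, results in ((w_sem, semantic_ids), (w_kw, keyword_ids)):
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    return sorted(scores.items(), key=lambda s: s[1], reverse=True)
```

For a specific lookup, a document topping the keyword list now outranks one topping the semantic list, as intended.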

    Temporal Reranking

    This is the key differentiator for marketing intelligence. After hybrid fusion produces a ranked list, temporal reranking adjusts scores based on:

    Recency Boost

    Newer data scores higher for most marketing queries. The boost uses exponential decay with a 90-day half-life:

    recency_boost = weight * e^(-0.693 * days_old / 90)

    The constant 0.693 is ln 2, which gives the 90-day half-life: a chunk from yesterday gets nearly the full boost, a chunk from 90 days ago gets half of it, and a chunk from six months ago is down to about a quarter.
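The decay is one line of code:

```python
import math

LN2 = 0.693  # ln(2): makes `half_life` an actual half-life
HALF_LIFE_DAYS = 90

def recency_boost(days_old, weight=1.0, half_life=HALF_LIFE_DAYS):
    """Exponential decay: full boost at age 0, half the boost at `half_life` days."""
    return weight * math.exp(-LN2 * days_old / half_life)
```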

    Source Authority

    Different source types carry different weight depending on the query type:

  • Performance questions ("What campaigns are working?") → campaign reports > social metrics > competitor profiles
  • Strategic questions ("Where is the market heading?") → industry reports > competitor profiles > campaign reports
  • Content drafting ("Draft a post about X") → brand guidelines > social metrics > industry reports

    Temporal Diversity

    For trend queries, you NEED results from multiple time periods. If all 10 results are from October, you can't detect a trend. The temporal diversity penalty reduces scores for results from over-represented time periods:

    diversity_penalty = 0.05 * (count_of_results_from_same_period - 1)

    This ensures that "how has competitor positioning changed?" returns results from Q1, Q2, Q3, and Q4 — not just the most recent quarter.
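A sketch of the penalty: walking the ranked list in score order, each result loses 0.05 for every higher-ranked result already taken from its time period, which is exactly `0.05 * (count - 1)` for the nth result from that period:

```python
from collections import Counter

PENALTY_PER_DUPLICATE = 0.05

def apply_temporal_diversity(results):
    """results: list of (chunk_id, score, period), sorted by score desc.

    Returns the list re-sorted after penalizing over-represented periods.
    """
    seen = Counter()
    adjusted = []
    for cid, score, period in results:
        penalty = PENALTY_PER_DUPLICATE * seen[period]  # 0 for the first from a period
        adjusted.append((cid, score - penalty, period))
        seen[period] += 1
    return sorted(adjusted, key=lambda r: r[1], reverse=True)
```

With this in place, a slightly lower-scored Q3 chunk can leapfrog a second Q4 chunk, spreading results across quarters.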

    Trend-Aware Context Assembly

    The final step: assembling retrieved chunks into a context window for the LLM. For marketing intelligence, this means:

  • Chronological ordering — arrange chunks by observation date so the LLM can see the progression
  • Change markers — insert annotations between time periods: "[CHANGE: Q2 → Q3]" to highlight transitions
  • Source attribution — every chunk gets a citation: "Source: CompetitorX Profile · Q4 2024 · competitor"
  • Context budget — limit total characters to fit within the LLM's effective context window (6000-8000 chars typically)

    The assembled context reads like a briefing document:

    [Source: CompetitorX · Q2 2024 · competitor]
    Positioned as "the affordable alternative" with emphasis on SMB pricing...
    
    --- [CHANGE: Q2 → Q4] ---
    
    [Source: CompetitorX · Q4 2024 · competitor]
    Shifted to "enterprise-grade at every scale" with new enterprise tier launch...

    This chronological, annotated context enables the LLM to synthesize trend narratives that raw search results cannot.
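A minimal assembler covering the four requirements above. The chunk field names (`source`, `period`, `kind`, `text`, `date`) are assumptions for illustration:

```python
def assemble_context(chunks, budget=8000):
    """Order chunks chronologically, insert change markers between periods,
    attribute every chunk, and stop at the character budget."""
    chunks = sorted(chunks, key=lambda c: c["date"])  # chronological ordering
    parts, last_period, used = [], None, 0
    for c in chunks:
        block = f"[Source: {c['source']} · {c['period']} · {c['kind']}]\n{c['text']}"
        marker = None
        if last_period and c["period"] != last_period:
            marker = f"--- [CHANGE: {last_period} → {c['period']}] ---"
        cost = len(block) + (len(marker) or 0 if marker else 0)
        if used + cost > budget:  # context budget: drop the rest
            break
        if marker:
            parts.append(marker)
        parts.append(block)
        used += cost
        last_period = c["period"]
    return "\n\n".join(parts)
```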

    Query Intent Classification

    Before retrieval even begins, classify the query intent to optimize the search strategy:

    | Intent | Retrieval Strategy |
    | --- | --- |
    | `competitive` | Keyword-heavy (exact competitor names), temporal diversity ON |
    | `trend` | Balanced semantic/keyword, temporal diversity ON, recency moderate |
    | `performance` | Campaign-source priority, recency HIGH, temporal diversity OFF |
    | `content` | Brand guidelines priority, recency LOW, semantic-heavy |

    Intent classification is a simple heuristic (keyword detection + patterns) in Module 3. Module 4 upgrades it to an LLM-powered classifier as part of the AI Gateway.
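A heuristic classifier in this spirit can be a first-match pattern table. The specific patterns and the fallback intent below are illustrative assumptions, not the course's actual Module 3 code:

```python
import re

# First matching pattern wins; order matters (trend checked before competitive
# so "how has X changed" is not swallowed by competitor keywords).
INTENT_PATTERNS = [
    ("trend", re.compile(r"\b(trend|chang(e|ed|ing)|over time|shift)\b", re.I)),
    ("performance", re.compile(r"\b(performance|working|roi|engagement|results)\b", re.I)),
    ("content", re.compile(r"\b(draft|write|post|copy|caption)\b", re.I)),
    ("competitive", re.compile(r"\b(competitor|positioning|pricing|against)\b", re.I)),
]

def classify_intent(query, default="trend"):
    for intent, pattern in INTENT_PATTERNS:
        if pattern.search(query):
            return intent
    return default
```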

    What You'll Build

  • Implement semantic search against pgvector with cosine similarity
  • Add keyword search with PostgreSQL full-text search
  • Combine them with Reciprocal Rank Fusion
  • Build temporal reranking with recency, authority, and diversity scoring
  • Create trend-aware context assembly with chronological ordering and change markers

    Glossary

    | Term | Meaning |
    | --- | --- |
    | Hybrid search | Combining semantic and keyword search for better recall and precision |
    | RRF | Reciprocal Rank Fusion — a method for merging ranked lists without score normalization |
    | Temporal reranking | Adjusting search scores based on data recency, authority, and time diversity |
    | Temporal diversity | Ensuring search results span multiple time periods for trend detection |
    | Context assembly | Building a structured, cited text block from search results for LLM consumption |
    | Query intent | Classifying what type of answer a query needs (competitive, trend, performance, content) |

    This is chapter 3 of AI Marketing Intelligence.

    Get the full hands-on course for $100 and build the complete system. Your projects become your portfolio.
