Retrieval System
Hybrid Search with Financial Awareness
Why Financial Retrieval Is Hard
General-purpose RAG retrieval treats every query the same: embed the question, find similar chunks, return the top K. For financial data, this approach fails in three critical ways.
Problem 1: Exact Financial Terms
A query for "NVDA gross margin Q3 2024" requires exact matching on the ticker (NVDA), the metric (gross margin), and the period (Q3 2024). Semantic search might return AMD's margins or NVDA's Q2 data because the embeddings are similar. You need keyword search and structured filters alongside vector similarity.
Problem 2: Multi-Company Comparisons
"Compare margins across our top 5 competitors" requires retrieving data from 5 different companies, all for the same time period. Pure vector search returns the most semantically similar chunks, which might be 5 chunks from the same company. You need diversity-aware retrieval that ensures each company is represented.
Problem 3: Numerical Reasoning
"Which company has the highest operating margin?" requires extracting numbers from retrieved chunks and comparing them. The retrieval system needs to surface chunks with actual numerical data — not just narrative descriptions of margins.
Hybrid Search Architecture
The retrieval system combines three search strategies:
Semantic Search (Vector Similarity)
Query pgvector using cosine similarity on embeddings. This excels at finding thematically relevant content:
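A minimal sketch of this leg, assuming a `chunks` table with an `embedding` vector column and `%s`-style query parameters (table and column names are illustrative, not prescribed by the text):

```python
def semantic_search_sql(top_k: int = 20) -> str:
    """Build the pgvector cosine-similarity query for the semantic leg.

    `<=>` is pgvector's cosine-distance operator; `%s` is bound to the
    query embedding at execution time. Names here are illustrative.
    """
    return (
        "SELECT id, content, 1 - (embedding <=> %s::vector) AS similarity "
        "FROM chunks "
        "ORDER BY embedding <=> %s::vector "
        f"LIMIT {top_k}"
    )
```

Ordering by raw distance (ascending) and reporting `1 - distance` as similarity keeps the SQL index-friendly while returning an intuitive score.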
Keyword Search (Full-Text)
PostgreSQL ts_vector/ts_query catches exact terms that semantic search might miss:
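A sketch of the full-text leg, assuming a precomputed `content_tsv` tsvector column on the same `chunks` table (again, illustrative names):

```python
def keyword_search_sql(top_k: int = 20) -> str:
    """Full-text leg: rank chunks by ts_rank against the query.

    Assumes a precomputed `content_tsv` tsvector column (illustrative);
    `%s` is bound to the raw query text, e.g. "NVDA gross margin".
    """
    return (
        "SELECT id, content, ts_rank(content_tsv, q) AS rank "
        "FROM chunks, plainto_tsquery('english', %s) AS q "
        "WHERE content_tsv @@ q "
        f"ORDER BY rank DESC LIMIT {top_k}"
    )
```

Because tickers like NVDA survive tokenization verbatim, this leg catches the exact-term matches that embeddings blur together.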
Structured Filters (SQL WHERE)
Direct database queries for structured attributes:
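One way to sketch this: compile a dict of attribute filters into a parameterized WHERE clause. The column names mirror the metadata fields listed below; the helper itself is an assumption, not from the text.

```python
def build_filter_clause(filters: dict) -> tuple[str, list]:
    """Turn structured filters into a parameterized WHERE clause.

    Column names must come from a trusted allow-list (they are
    interpolated directly); values are always passed as parameters.
    """
    clauses, params = [], []
    for column, value in filters.items():
        clauses.append(f"{column} = %s")
        params.append(value)
    where = " AND ".join(clauses) if clauses else "TRUE"
    return where, params

where, params = build_filter_clause({"ticker": "NVDA", "fiscal_period": "Q3-2024"})
# where  -> "ticker = %s AND fiscal_period = %s"
# params -> ["NVDA", "Q3-2024"]
```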
ticker = 'NVDA' — specific company
fiscal_period = 'Q3-2024' — specific time period
filing_type = '10-K' — annual filings only
source_type = 'transcript' — earnings calls only
is_table = true — financial tables only
section_name = 'Risk Factors' — specific filing section
Reciprocal Rank Fusion (RRF)
Combine results from all three strategies using RRF, which is robust to different score scales:
RRF_score(doc) = sum(1 / (k + rank_in_list)) for each list containing doc
With k=60 (standard), a document ranked #1 in both the semantic and keyword lists gets a combined score of 1/61 + 1/61 ≈ 0.033, while a document ranked #1 in semantic but absent from keyword gets only ≈ 0.016. This naturally boosts documents that match on multiple dimensions.
Financial tuning: Weight keyword search higher (1.2x) for queries containing ticker symbols or specific metric names. Weight semantic higher (1.2x) for thematic queries without explicit tickers.
Numerical-Aware Reranking
After RRF fusion, apply financial-specific reranking:
Recency Boost
Newer data is more relevant for most financial queries. A Q3 2024 filing should rank above a Q1 2024 filing when no specific period is requested. The decay function:
recency_score = max(0, 1 - days_old / 180)
This gives full weight to today's data and decays linearly to zero at 180 days (about six months); anything older contributes no recency boost.
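As a one-line sketch of the decay function:

```python
def recency_score(days_old: float) -> float:
    """Linear decay: 1.0 for today's data, 0.0 at 180 days and beyond."""
    return max(0.0, 1.0 - days_old / 180)
```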
Source Authority
Different sources carry different weight depending on the query type:
| Query Type | Highest Authority | Lowest Authority |
|---|---|---|
| Factual (revenue, margins) | SEC Filings | Analyst Notes |
| Outlook (guidance, forecasts) | Earnings Transcripts | Internal Reports |
| Opinion (ratings, targets) | Analyst Notes | SEC Filings |
| Competitive (positioning) | Internal Reports | Market Data |
Numerical Density
For quantitative queries ("compare margins", "revenue growth"), boost chunks with higher numerical density. Count occurrences of patterns like $X.XM, XX.X%, $X,XXX and use them as a ranking signal.
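A sketch of the density signal; the exact regexes and the per-100-words normalization are assumptions chosen to match the patterns named above:

```python
import re

# Dollar amounts ($4.2M), percentages (56.3%), comma-grouped numbers (1,234,567)
NUMERIC_PATTERNS = [
    r"\$\d+(?:\.\d+)?[MBK]?",
    r"\d+(?:\.\d+)?%",
    r"\$?\d{1,3}(?:,\d{3})+",
]

def numerical_density(text: str) -> float:
    """Numeric matches per 100 words, used as a ranking boost signal."""
    matches = sum(len(re.findall(p, text)) for p in NUMERIC_PATTERNS)
    words = max(1, len(text.split()))
    return 100.0 * matches / words
```

A chunk from a financial table scores high; a narrative paragraph about "margin pressure" scores near zero.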
Period Matching
If the query mentions a specific period ("Q3 2024"), boost chunks whose fiscal_period metadata matches exactly. This prevents returning Q2 data when Q3 was explicitly requested.
Diversity Enforcement
For comparison queries, ensure the top results include data from multiple companies. If the top 5 results are all NVDA, demote duplicates and promote other tickers until at least 3 companies are represented.
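One way to sketch the demote-and-promote pass: admit a duplicate ticker into the head of the list only while enough slots remain to still reach the company quota (function and field names are my own):

```python
def enforce_diversity(ranked: list[dict], top_n: int = 5,
                      min_companies: int = 3) -> list[dict]:
    """Reorder so the top_n slots cover at least min_companies tickers.

    `ranked` is best-first; each chunk is a dict with a "ticker" key.
    Duplicates beyond the quota are demoted below other tickers.
    """
    seen: set[str] = set()
    head: list[dict] = []
    deferred: list[dict] = []
    for chunk in ranked:
        is_new = chunk["ticker"] not in seen
        slots_left = top_n - len(head)
        still_needed = min_companies - len(seen)
        # Admit a repeat ticker only if the quota can still be met afterwards
        if len(head) < top_n and (is_new or slots_left > still_needed):
            head.append(chunk)
            seen.add(chunk["ticker"])
        else:
            deferred.append(chunk)
    return head + deferred
```

If fewer than `min_companies` tickers exist in the candidates, the pass degrades gracefully and simply returns the original order.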
Context Assembly
The final step before passing data to the LLM is assembling the context window from the reranked chunks.
Budget management matters: with a 128K context window, you could include everything. Don't. More context means more noise and higher cost. Target 4,000-6,000 tokens of focused, relevant financial data.
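A sketch of greedy budget packing toward that 4,000-6,000 token target, approximating tokens as words × 1.3 (a rough heuristic; swap in a real tokenizer such as tiktoken in practice):

```python
def assemble_context(chunks: list[dict], budget_tokens: int = 5000) -> str:
    """Greedily pack top-ranked chunks until the token budget is hit.

    Each chunk is a dict with "source" and "content" keys (illustrative);
    token cost is approximated as word count * 1.3, not a real tokenizer.
    """
    parts: list[str] = []
    used = 0
    for chunk in chunks:
        cost = int(len(chunk["content"].split()) * 1.3)
        if used + cost > budget_tokens:
            break  # stop at the first chunk that would bust the budget
        parts.append(f"[{chunk['source']}] {chunk['content']}")
        used += cost
    return "\n\n".join(parts)
```

Labeling each chunk with its source lets the model cite where a number came from, which matters for financial answers.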
Test Queries
Build your retrieval system to handle the representative queries from this chapter:
"NVDA gross margin Q3 2024" — exact ticker, metric, and period matching
"Compare margins across our top 5 competitors" — multi-company, diversity-aware retrieval
"Which company has the highest operating margin?" — numerical extraction and comparison
This is chapter 3 of AI Finance Analyst.