
What is RAG?

Retrieval-Augmented Generation

Why LLMs Hallucinate

Large language models generate text by predicting the most likely next token. They don't "know" facts; they've learned statistical patterns from training data. When asked about your company's PTO policy or last quarter's revenue, they can produce a confident-sounding answer that's completely made up. This is hallucination, and it's one of the main reasons enterprises can't just drop an LLM into production.

The core problem: the model's knowledge is frozen at training time. It doesn't know what happened yesterday, can't read your internal docs, and can't reliably tell when it should say "I don't have that information."
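To make the failure mode concrete, here is a toy sketch of next-token prediction. It is a tiny bigram counter, not a real LLM, and the corpus and the function names `train_bigram` and `generate` are made up for illustration. The point is only that a model like this always emits the statistically most common continuation, whether or not it matches your company's actual policy.

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count which token follows which across the training sentences."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def generate(counts, start, max_tokens=5):
    """Greedily append the most frequent next token, up to max_tokens."""
    tokens = [start]
    for _ in range(max_tokens):
        followers = counts.get(tokens[-1])
        if not followers:
            break  # nothing ever followed this token in training
        tokens.append(followers.most_common(1)[0][0])
    return " ".join(tokens)


# Hypothetical "training data": most sentences say 20 days.
corpus = [
    "employees get 20 days of pto",
    "employees get 20 days of pto",
    "employees get 15 days of pto",
]
counts = train_bigram(corpus)

# Your company may offer 15 days, but the model repeats the majority pattern.
print(generate(counts, "employees"))  # prints: employees get 20 days of pto
```

The output is fluent and confident, and nothing in the generation step checks it against a source of truth.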

The Retrieval + Generation Pattern

RAG solves this with a simple two-step architecture:

┌─────────────────────────────────────────────────────┐
│                    User Question                     │
└──────────────────────┬──────────────────────────────┘
                       │
                       ▼
              ┌────────────────┐
              │  1. RETRIEVE   │  Search your documents
              │   relevant     │  for matching content
              │   documents    │
              └───────┬────────┘
                      │
                      ▼
              ┌────────────────┐
              │  2. GENERATE   │  Feed retrieved docs
              │   an answer    │  to the LLM as context
              │   with context │
              └───────┬────────┘
                      │
                      ▼
         ┌────────────────────────┐
         │  Grounded Answer with  │
         │  Source Citations      │
         └────────────────────────┘

Instead of asking the LLM to recall facts from memory, you retrieve the relevant documents first, then generate an answer grounded in those documents. The LLM becomes a reasoning engine over your data, not a memorization engine.
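The two steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the course's implementation: retrieval here is plain keyword overlap rather than vector search, the document IDs and texts are invented, and `llm` is a hypothetical callable standing in for whatever model API you use.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents, top_k=2):
    """Step 1: rank documents by how many words they share with the question."""
    q_words = tokenize(question)
    scored = [(len(q_words & tokenize(doc["text"])), doc) for doc in documents]
    scored = [pair for pair in scored if pair[0] > 0]  # drop non-matches
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def build_prompt(question, retrieved):
    """Step 2a: put the retrieved text in front of the question."""
    context = "\n".join(f"[{doc['id']}] {doc['text']}" for doc in retrieved)
    return (
        "Answer using only the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


def answer(question, documents, llm):
    """Step 2b: generate, and return the source IDs alongside the answer."""
    retrieved = retrieve(question, documents)
    prompt = build_prompt(question, retrieved)
    return {"answer": llm(prompt), "sources": [doc["id"] for doc in retrieved]}


# Hypothetical documents and a stub in place of a real model call.
documents = [
    {"id": "handbook-pto", "text": "Full-time employees accrue 15 days of PTO per year."},
    {"id": "handbook-expenses", "text": "Submit expense reports within 30 days."},
    {"id": "handbook-remote", "text": "Remote work requires manager approval."},
]


def stub_llm(prompt):
    return "(model output would go here)"


result = answer("How many days of PTO do employees get?", documents, stub_llm)
print(result["sources"])  # prints: ['handbook-pto', 'handbook-expenses']
```

Because the source IDs travel with the answer, citations come almost for free. The instruction to admit when the context lacks the answer is what gives the model a way to decline instead of guessing.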

When RAG Beats Fine-tuning

Scenario	RAG	Fine-tuning
Data changes frequently	Best choice: update the documents and re-embed them	Must retrain the model
Need source citations	Built in: you know which docs were retrieved	Model can't point to where an answer came from
Small dataset (< 1,000 docs)	Works well	Usually not enough data to fine-tune well
Domain-specific language/tone	Handles factual grounding, not style	Better for style and format
Cost	Embed each document once, then pay for retrieval and a longer prompt per query	Training cost up front, plus inference cost per query

Rule of thumb: Use RAG when the model needs to *know* things. Use fine-tuning when the model needs to *behave* a certain way. Many production systems use both.

Real-World Examples

Internal Q&A bot: Employees ask questions about company policies, and the bot searches the handbook to answer with citations.

Customer support: An agent searches knowledge base articles to draft responses grounded in the actual documentation.

Legal research: Lawyers query case law databases and get summaries that cite specific cases and statutes.

Product documentation: Users ask "how do I configure X?" and get answers pulled from the latest docs.

What You'll Build

In this course, you'll build a complete RAG pipeline over company documents:

Upload & Chunk: Split documents into searchable pieces

Embed & Store: Convert text to vectors and store them in pgvector

Search & Retrieve: Find relevant chunks for any question

Generate Answers: Use an LLM to synthesize grounded responses

Add Citations: Show users exactly where answers came from
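As a preview of the first step, chunking can be as simple as a sliding window over words. The sketch below is one common approach, not necessarily the one the course uses; the function name `chunk_text` and the default sizes are assumptions chosen for illustration.

```python
def chunk_text(text, chunk_size=40, overlap=10):
    """Split text into word windows of chunk_size that share `overlap` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


# A 100-word dummy document yields three overlapping chunks.
document = " ".join(f"w{i}" for i in range(100))
chunks = chunk_text(document)
print(len(chunks))  # prints: 3
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing some words twice.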

By Module 6, you'll have a working Q&A system that answers questions about your company handbook with cited sources.
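The embed, store, and search steps can be previewed with a toy version that runs without any services. Everything here is a stand-in and an assumption: `embed` returns word counts instead of the dense vectors a real embedding model produces, the in-memory list stands in for a pgvector table, and the chunk texts and source labels are invented.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: a sparse word-count vector (a real model returns dense floats)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_store(chunks):
    """Embed each (source, text) pair once and keep the vector next to the text."""
    return [{"source": s, "text": t, "embedding": embed(t)} for s, t in chunks]


def search(question, store, top_k=1):
    """Embed the question and return the top_k most similar rows with scores."""
    q_vec = embed(question)
    scored = [(cosine(q_vec, row["embedding"]), row) for row in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical handbook chunks.
store = build_store([
    ("handbook.md#pto", "Employees accrue 15 days of paid time off each year."),
    ("handbook.md#expenses", "Expense reports are due within 30 days of purchase."),
    ("handbook.md#remote", "Remote work requires written approval from a manager."),
])

score, row = search("How much paid time off do employees accrue?", store)[0]
print(row["source"])  # prints: handbook.md#pto
```

Keeping the source label on every stored row is what makes the citation step possible later: whatever chunk is retrieved already knows where it came from.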

This is chapter 1 of RAG in 60 Minutes.
