
What is RAG?

Retrieval-Augmented Generation

Why LLMs Hallucinate

Large language models generate text by predicting the most likely next token. They don't "know" facts the way a database does; they reproduce statistical patterns learned from training data. Asked about your company's PTO policy or last quarter's revenue, a model that never saw that information can still produce a confident-sounding answer that's completely made up. This is hallucination, and it's a major reason enterprises can't just drop an LLM into production.

The core problem: the model's knowledge is frozen at training time. It doesn't know what happened yesterday, can't read your internal docs, and often won't say "I don't have that information" when it should.

The Retrieval + Generation Pattern

RAG addresses this with a simple two-step architecture:

┌─────────────────────────────────────────────────────┐
│                    User Question                     │
└──────────────────────┬──────────────────────────────┘
                       │
                       ▼
              ┌────────────────┐
              │  1. RETRIEVE   │  Search your documents
              │   relevant     │  for matching content
              │   documents    │
              └───────┬────────┘
                      │
                      ▼
              ┌────────────────┐
              │  2. GENERATE   │  Feed retrieved docs
              │   an answer    │  to the LLM as context
              │   with context │
              └───────┬────────┘
                      │
                      ▼
         ┌────────────────────────┐
         │  Grounded Answer with  │
         │  Source Citations      │
         └────────────────────────┘

Instead of asking the LLM to recall facts from memory, you retrieve the relevant documents first, then generate an answer grounded in those documents. The LLM becomes a reasoning engine over your data, not a memorization engine.
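
The two steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the course's implementation: the documents are made-up toy data, `retrieve` ranks by plain word overlap as a stand-in for the vector search a real system would use, and `build_prompt` only assembles the text you would send to an LLM (the model call itself is left out). All function names here are hypothetical.

```python
import re

# Made-up toy documents. In a real system these would be chunks of your own files.
DOCS = {
    "handbook.md": "Employees accrue 1.5 days of paid time off per month.",
    "expenses.md": "Submit expense reports within 30 days of purchase.",
    "security.md": "Rotate your VPN password every 90 days.",
}


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, docs, top_k=2):
    """Step 1: rank documents by word overlap with the question.

    Word overlap is a deliberately crude stand-in for embedding search.
    """
    q = tokenize(question)
    scored = [(len(q & tokenize(body)), name) for name, body in docs.items()]
    scored = [(score, name) for score, name in scored if score > 0]
    # Highest score first; ties broken by file name for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_k]]


def build_prompt(question, docs, sources):
    """Step 2 (input side): place the retrieved text in front of the question."""
    context = "\n".join(
        f"[{i}] ({name}) {docs[name]}" for i, name in enumerate(sources, 1)
    )
    return (
        "Answer using only the sources below and cite them by number. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )


question = "How many days of paid time off do employees accrue?"
sources = retrieve(question, DOCS)
prompt = build_prompt(question, DOCS, sources)
print(prompt)
```

Numbering the sources in the prompt is what makes citations possible later: the model can refer to `[1]`, and you can map that number back to a file name.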

When RAG Beats Fine-tuning

| Scenario | RAG | Fine-tuning |
|---|---|---|
| Data changes frequently | Best choice: just update documents | Must retrain the model |
| Need source citations | Built-in: you know which docs were used | Model can't trace its reasoning |
| Small dataset (< 1000 docs) | Works great | Not enough data to fine-tune well |
| Domain-specific language/tone | Handles factual grounding | Better for style and format |
| Cost | Embedding once + retrieval per query | Training costs + inference costs |

Rule of thumb: Use RAG when the model needs to *know* things. Use fine-tuning when the model needs to *behave* a certain way. Many production systems use both.

Real-World Examples

  • Internal Q&A bot — Employees ask questions about company policies, and the bot searches the handbook to answer with citations
  • Customer support — Agent searches knowledge base articles to draft responses grounded in actual documentation
  • Legal research — Lawyers query case law databases and get summaries with specific statute references
  • Product documentation — Users ask "how do I configure X?" and get answers pulled from the latest docs

What You'll Build

In this course, you'll build a complete RAG pipeline over company documents:

  • Upload & Chunk — Split documents into searchable pieces
  • Embed & Store — Convert text to vectors in pgvector
  • Search & Retrieve — Find relevant chunks for any question
  • Generate Answers — Use an LLM to synthesize grounded responses
  • Add Citations — Show users exactly where answers came from

By Module 6, you'll have a working Q&A system that answers questions about your company handbook with cited sources.
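
To make the first step of that pipeline concrete, here is a rough sketch of chunking. It assumes fixed-size word windows with a small overlap, which is one common approach and not necessarily the one the course uses. The function name `chunk_document`, the parameter defaults, and the `start_word` field are all hypothetical. Each chunk keeps its source name so a later citation step can point back to where the text came from.

```python
def chunk_document(source, text, size=50, overlap=10):
    """Split text into overlapping word windows, tagging each with its origin."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    words = text.split()
    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append({
            "source": source,      # kept for citations later
            "start_word": start,   # position of the chunk in the document
            "text": " ".join(words[start:start + size]),
        })
        # Stop once a window has reached the end of the document.
        if start + size >= len(words):
            break
    return chunks


# Tiny demonstration with placeholder words w0..w11.
text = " ".join(f"w{i}" for i in range(12))
chunks = chunk_document("handbook.md", text, size=5, overlap=2)
for c in chunks:
    print(c["source"], c["start_word"], c["text"])
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some words twice.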

This is chapter 1 of RAG in 60 Minutes.
