Data Lake
Ingest & Normalize Support Data
Why a Support Data Lake?
An AI support agent is only as good as the data behind it. But support data is notoriously fragmented — tickets live in Zendesk, knowledge base articles in Confluence, product docs in Notion, escalation rules in a spreadsheet, and satisfaction scores in a survey tool.
Before any AI can answer "My account login isn't working after the password reset — can you help?", you need a unified ingestion pipeline that normalizes all of this into a common format the AI can search, classify, and reason over.
Key Concepts
The Document Interface
The core abstraction. Every piece of support data — whether it's a ticket, KB article, product doc, escalation rule, or CSAT survey — becomes a Document with:
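- A unique ID prefixed with its source type (e.g. ticket_TKT-00001, kb_KB-001)
- A content field holding the text used for search and classification
- Metadata: structured fields such as priority, tags, and channel

This is the universal contract that every downstream module depends on. The classification system doesn't care whether a document came from Zendesk or Confluence; it just processes Document[].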
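As a rough illustration of the contract described above, here is a minimal Python sketch. The field names `id`, `content`, and `metadata`, the helper `source_of`, and the idea that the ID prefix encodes the source type are assumptions made for this example, not the course's actual definition.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Document:
    """Normalized unit of support data (field names are illustrative)."""
    id: str                      # e.g. "ticket_TKT-00001" or "kb_KB-001"
    content: str                 # text used for search and classification
    metadata: dict[str, Any] = field(default_factory=dict)


def source_of(doc: Document) -> str:
    """Derive the source type from the ID prefix (an assumed convention)."""
    return doc.id.split("_", 1)[0]


# Downstream code only ever sees a list of Documents, whatever the origin.
docs: list[Document] = [
    Document("ticket_TKT-00001", "Login fails after password reset",
             {"priority": "high", "channel": "email"}),
    Document("kb_KB-001", "How to reset your password"),
]

for doc in docs:
    print(source_of(doc), doc.id)
```

Keeping the structure this small is deliberate: anything source-specific goes into `metadata`, so downstream modules can rely on the same three fields for every document.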
Support-Specific Metadata
Unlike generic document systems, support data carries metadata critical for triage and routing:
| Metadata Field | Why It Matters |
|---|---|
| `priority` | Routes urgent tickets ahead of low-priority ones |
| `channel` | Email, chat, phone need different response styles |
| `tags` | Enable topic-based routing and analytics |
| `customer_id` | Links to customer context (plan, history, CSAT) |
| `resolution_time` | Tracks SLA compliance |
| `csat_score` | Measures response quality |
Getting metadata right at ingestion time saves enormous complexity downstream. If you don't tag a ticket with its channel during ingestion, you can't personalize the response style later.
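To make the point concrete, the sketch below shows how downstream routing might lean on these fields. The priority labels, the style descriptions, and the extra ticket IDs are hypothetical; the chapter does not list the actual values.

```python
# Assumed priority labels and channel styles; the chapter does not list them.
PRIORITY_ORDER = {"urgent": 0, "high": 1, "normal": 2, "low": 3}
CHANNEL_STYLE = {
    "email": "formal, complete answer",
    "chat": "short, conversational",
    "phone": "script for a spoken reply",
}


def triage(tickets: list[dict]) -> list[dict]:
    """Order tickets so the most urgent come first; unknown priorities go last."""
    return sorted(
        tickets,
        key=lambda t: PRIORITY_ORDER.get(t.get("priority"), len(PRIORITY_ORDER)),
    )


def response_style(ticket: dict) -> str:
    """Pick a response style, falling back when the channel was never recorded."""
    return CHANNEL_STYLE.get(ticket.get("channel"), "unknown channel: generic style")


tickets = [
    {"id": "ticket_TKT-00002", "priority": "low", "channel": "chat"},
    {"id": "ticket_TKT-00001", "priority": "urgent", "channel": "email"},
    {"id": "ticket_TKT-00003", "priority": "high"},  # channel lost at ingestion
]

ordered = triage(tickets)
for t in ordered:
    print(t["id"], "->", response_style(t))
```

The third ticket illustrates the cost of missing metadata: without a recorded channel, the only option left is a generic fallback.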
Loaders
One loader per data source. Each loader knows how to:
- Read its source's specific format
- Build a content field useful for search and classification

Multi-Channel Normalization
Tickets arrive via email, chat, phone, and web forms. Each channel has a different data shape.
The loader normalizes all of these into the same Document format. Downstream systems see a unified stream.
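A sketch of what that normalization could look like is shown below. Every raw field name here (`subject`, `body`, `messages`, `transcript`, `description`, `ticket_id`) is a hypothetical stand-in, since the real per-channel shapes depend on the export you are working with.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Document:
    id: str
    content: str
    metadata: dict[str, Any] = field(default_factory=dict)


def normalize_ticket(raw: dict[str, Any]) -> Document:
    """Map one raw ticket to a Document. All raw field names are hypothetical."""
    channel = raw["channel"]
    if channel == "email":
        content = f"{raw['subject']}\n{raw['body']}"
    elif channel == "chat":
        content = "\n".join(f"{m['from']}: {m['text']}" for m in raw["messages"])
    elif channel == "phone":
        content = raw["transcript"]
    elif channel == "web_form":
        content = raw["description"]
    else:
        raise ValueError(f"unsupported channel: {channel!r}")
    return Document(
        id=f"ticket_{raw['ticket_id']}",
        content=content.strip(),
        # Record the channel now so response style can be chosen later.
        metadata={"channel": channel, "priority": raw.get("priority")},
    )


raw_tickets = [
    {"ticket_id": "TKT-00001", "channel": "email", "priority": "high",
     "subject": "Login broken", "body": "Cannot log in after password reset."},
    {"ticket_id": "TKT-00002", "channel": "chat",
     "messages": [{"from": "customer", "text": "Hi, my login fails"},
                  {"from": "agent", "text": "Did you reset your password?"}]},
]

docs = [normalize_ticket(r) for r in raw_tickets]
for d in docs:
    print(d.id, d.metadata["channel"], repr(d.content))
```

Raising on an unknown channel is one possible choice; a production loader might instead log and skip the record.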
Architecture Pattern
JSON ──→ Ticket Loader ──────┐
JSON ──→ KB Loader ──────────┤
JSON ──→ Product Doc Loader ─┤──→ Validate ──→ Document[]
JSON ──→ Escalation Loader ──┤
CSV ───→ CSAT Loader ────────┘

Each loader is independent. Adding a new data source (chat transcripts, internal notes, changelog) means writing one new loader; nothing else changes.
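The pattern above can be sketched as a small loader registry feeding a shared validation step. This is an illustration under assumptions, not the course's implementation: the input field names, the `csat_` ID prefix, the `LOADERS` registry, and the specific validation rules are all made up for the example, and only two of the five loaders are shown.

```python
import csv
import io
import json
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Document:
    id: str
    content: str
    metadata: dict[str, Any] = field(default_factory=dict)


def load_kb(text: str) -> list[Document]:
    """JSON loader: hypothetical KB export with id/title/body fields."""
    return [Document(f"kb_{a['id']}", f"{a['title']}\n{a['body']}", {"source": "kb"})
            for a in json.loads(text)]


def load_csat(text: str) -> list[Document]:
    """CSV loader: hypothetical columns ticket_id, score, comment."""
    return [Document(f"csat_{row['ticket_id']}", row["comment"],
                     {"source": "csat", "csat_score": int(row["score"])})
            for row in csv.DictReader(io.StringIO(text))]


def validate(docs: list[Document]) -> list[Document]:
    """Reject empty content and duplicate IDs before anything goes downstream."""
    seen: set[str] = set()
    for doc in docs:
        if not doc.content.strip():
            raise ValueError(f"{doc.id}: empty content")
        if doc.id in seen:
            raise ValueError(f"{doc.id}: duplicate id")
        seen.add(doc.id)
    return docs


# Adding a source means adding one entry here; ingest() stays unchanged.
LOADERS: dict[str, Callable[[str], list[Document]]] = {
    "kb": load_kb,
    "csat": load_csat,
}


def ingest(raw_by_source: dict[str, str]) -> list[Document]:
    """Run each source through its loader, then validate the combined list."""
    docs: list[Document] = []
    for source, text in raw_by_source.items():
        docs.extend(LOADERS[source](text))
    return validate(docs)


raw = {
    "kb": '[{"id": "KB-001", "title": "Reset your password", "body": "Open Settings."}]',
    "csat": "ticket_id,score,comment\nTKT-00001,4,Quick fix\n",
}
documents = ingest(raw)
print([d.id for d in documents])
```

Because `ingest` only looks loaders up by name, a new source such as chat transcripts would need one new function and one new registry entry.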
What You'll Build
Glossary
| Term | Meaning |
|---|---|
| Document | Normalized unit of data from any support source |
| Loader | Function that reads a specific format and returns Documents |
| Metadata | Structured fields (priority, tags, channel) for filtering and routing |
| Data lake | Unified storage that combines all support data sources |
| SLA | Service Level Agreement — target response/resolution times by priority |
This is chapter 1 of AI Customer Support Agent.