Response Engine
Generating the Right Response
Why Response Quality Matters
Response quality IS the product. The customer doesn't see your intent classifier or your vector database. They see the response. A technically correct but cold response loses customers. A confident but wrong response destroys trust.
The response engine balances three concerns: helpfulness (does this solve the problem?), accuracy (is this information correct?), and safety (should we escalate instead?).
Key Concepts
Template + LLM Hybrid
The most effective approach combines structured templates with LLM generation:
| Component | Role | Example |
|---|---|---|
| Template | Structure, required elements | "Start with empathy, then steps, then escalation offer" |
| LLM | Natural variation, personalization | Different wording for the 500th password reset response |
| KB snippet | Factual content | The actual steps from the KB article |
| Entity injection | Personalization | "Account ACC-4521" from entity extraction |
Templates ensure every response covers the right points. The LLM adds natural variation so responses don't feel robotic. Neither alone is sufficient.
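The hybrid can be sketched as a few lines of Python. This is illustrative only: the function names, template wording, and the stubbed `llm_vary` step are assumptions, with the LLM call replaced by a placeholder so the structure is visible.

```python
# Sketch of the template + LLM hybrid: the template fixes structure,
# entity injection and the KB snippet supply facts, and an LLM pass
# (stubbed here) adds natural variation.

def fill_template(template: str, entities: dict, kb_snippet: str) -> str:
    """Inject extracted entities and the KB snippet into the template."""
    return template.format(snippet=kb_snippet, **entities)

def llm_vary(text: str) -> str:
    """Placeholder for an LLM call that rewords without changing facts."""
    return text  # in production, call your generation model here

template = (
    "Sorry you're having trouble with account {account_id}.\n"
    "{snippet}\n"
    "If this doesn't resolve it, reply and we'll connect you with a specialist."
)

response = llm_vary(fill_template(
    template,
    entities={"account_id": "ACC-4521"},
    kb_snippet="1. Open Settings. 2. Click 'Reset password'.",
))
print(response)
```

Note the division of labor: facts come only from the KB snippet and entities, so the LLM pass can never introduce new claims.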
Intent-Based Templates
Each intent has a dedicated template that defines the structure and required elements for that response type.
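A minimal intent-to-template registry might look like the following. The specific intents and wording are assumptions for illustration, not the course's exact set; the key idea is that template lookup is a plain mapping with a generic fallback.

```python
# Intent-keyed templates with a generic fallback for unknown intents.

TEMPLATES = {
    "password_reset": (
        "Sorry you're locked out.\n{steps}\n"
        "Still stuck? Reply here and we'll escalate."
    ),
    "billing_question": (
        "Thanks for flagging this charge.\n{steps}\n"
        "If the amount still looks wrong, we can route you to billing."
    ),
}

FALLBACK = "Thanks for reaching out.\n{steps}\nWe're happy to help further."

def select_template(intent: str) -> str:
    """Return the dedicated template, or the fallback if none exists."""
    return TEMPLATES.get(intent, FALLBACK)

print(select_template("password_reset").format(steps="1. Open Settings."))
```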
Confidence Scoring
The three-factor confidence model:
| Factor | Weight | High Signal | Low Signal |
|---|---|---|---|
| Intent confidence | 40% | Classifier is 90% sure | Classifier is 40% sure |
| Search quality | 40% | Top result scored 0.8 | Top result scored 0.2 |
| Source diversity | 20% | 3+ source types in top-5 | All from one source type |
The combined score determines the action: above the escalation threshold, the AI responds automatically; below it, the ticket routes to a human. The escalation threshold (default 0.6) is the most important tunable parameter. Set it too low and wrong answers go out; set it too high and everything escalates.
Escalation Logic
Escalation rules define when the AI should hand off to a human:
| Rule | Trigger | Action |
|---|---|---|
| Low confidence | AI confidence < 0.6 | Route to Tier 2 queue |
| Billing dispute > $100 | Intent + amount | Route to billing specialist |
| VIP customer | Enterprise plan | Route to dedicated queue, 30min SLA |
| Security incident | Security intent detected | Page on-call, 15min SLA |
| Repeat contact | 3+ tickets in 7 days | Route to Tier 2 with history |
| Negative sentiment | Sentiment < -0.7 | Route to retention team |
| SLA breach warning | <30 min remaining | Alert team lead |
| Outage detected | 5+ similar tickets in 30 min | Create engineering incident |
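One way to encode the table is an ordered list of (name, predicate, action) rules evaluated against a ticket context. This sketch uses first-match-wins ordering and invented context field names; a production engine might fire multiple rules or attach SLAs.

```python
# Escalation rules as ordered predicates over a ticket context dict.
# A subset of the table above; field names are assumptions about what
# upstream stages provide.

RULES = [
    ("security_incident",  lambda c: c.get("intent") == "security",
     "page_oncall"),
    ("billing_dispute",    lambda c: c.get("intent") == "billing_dispute"
                                     and c.get("amount", 0) > 100,
     "billing_specialist"),
    ("vip_customer",       lambda c: c.get("plan") == "enterprise",
     "dedicated_queue"),
    ("repeat_contact",     lambda c: c.get("tickets_7d", 0) >= 3,
     "tier2_with_history"),
    ("negative_sentiment", lambda c: c.get("sentiment", 0) < -0.7,
     "retention_team"),
    ("low_confidence",     lambda c: c.get("confidence", 0) < 0.6,
     "tier2_queue"),
]

def check_escalation(ctx: dict):
    """Return the first matching (rule name, action), or None if safe."""
    for name, predicate, action in RULES:
        if predicate(ctx):
            return name, action
    return None  # no rule fired: safe to auto-respond

print(check_escalation({"intent": "billing_dispute", "amount": 250,
                        "confidence": 0.9}))
```

Ordering matters: the security rule sits first so a security incident pages on-call even if the same ticket would also match a softer rule further down.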
Escalation is a feature, not a failure. The best support AIs know when they don't know. A graceful handoff to a human who has the full context is infinitely better than a confidently wrong auto-response.
Personalization
Customer context shapes the response: plan tier affects routing and SLA, sentiment affects tone, and contact history affects how much ground to re-cover.
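A small sketch of context-driven tone adjustment, using the plan and sentiment signals from the escalation table. The field names and phrasing are assumptions for illustration.

```python
# Prepend tone adjustments based on customer context before sending.

def personalize(base: str, customer: dict) -> str:
    """Adjust the opening of a drafted response to the customer context."""
    parts = []
    if customer.get("plan") == "enterprise":
        parts.append("Thanks for being an enterprise customer.")
    if customer.get("sentiment", 0) < -0.5:
        parts.append("We're sorry for the frustration this has caused.")
    parts.append(base)
    return " ".join(parts)

print(personalize("Here are the reset steps.",
                  {"plan": "enterprise", "sentiment": -0.8}))
```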
Architecture Pattern
Classification + Retrieval Results
│
├──→ Template Selection (by intent)
│
├──→ Personalization (by customer context)
│
├──→ Template Filling (entities, snippets, conditionals)
│
├──→ Confidence Scoring (intent + search + diversity)
│
├──→ Escalation Check (10 rules)
│
└──→ Final Response (or escalation handoff)
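The diagram above can be realized as a ticket context dict threaded through the stages in order. Every stage body here is a minimal stub standing in for the real component; the stage names mirror the diagram, everything else is an assumption.

```python
# The pipeline from the diagram: each stage reads and enriches a shared
# context dict, and the last stage decides respond vs. escalate.

def pick_template(ctx):
    ctx["template"] = "Hi {name}, {answer}"          # template selection
    return ctx

def apply_personalization(ctx):
    if ctx.get("plan") == "enterprise":              # personalization
        ctx["answer"] = "thanks for being with us. " + ctx["answer"]
    return ctx

def render(ctx):
    ctx["draft"] = ctx["template"].format(**ctx)     # template filling
    return ctx

def score(ctx):                                      # confidence scoring
    ctx["confidence"] = (0.4 * ctx["intent_conf"]
                         + 0.4 * ctx["search"]
                         + 0.2 * ctx["diversity"])
    return ctx

def route(ctx):                                      # escalation check
    ctx["action"] = "respond" if ctx["confidence"] >= 0.6 else "escalate"
    return ctx

PIPELINE = [pick_template, apply_personalization, render, score, route]

ctx = {"name": "Ana", "answer": "try the reset steps below.",
       "plan": "free", "intent_conf": 0.9, "search": 0.8, "diversity": 1.0}
for stage in PIPELINE:
    ctx = stage(ctx)
print(ctx["action"], "->", ctx["draft"])
```

Threading one dict through small single-purpose stages keeps each step independently testable and makes it easy to insert a new stage (say, a policy filter) without touching the others.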
Glossary
| Term | Meaning |
|---|---|
| Template | Structured response format keyed to an intent |
| Confidence scoring | Multi-factor assessment of response reliability |
| Escalation | Handing off to a human when AI confidence is low |
| Personalization | Adjusting tone and content based on customer context |
| Circuit breaker | Confidence threshold that stops auto-responding |
This is chapter 4 of AI Customer Support Agent.