
Deploy & Monitor

Production Operations & Feedback Loops

Why Monitoring Matters

Deploying an AI support agent is not the finish line — it's the starting line. The critical question isn't "does it work?" but "how well does it work, and is it getting better?"

The key metric is auto-resolution rate — what percentage of tickets does the AI resolve without human intervention? Industry average is 20-30%. A well-tuned system with good KB coverage hits 60-80%.

Key Concepts

Multi-Channel Deployment

A support agent that only works in a web UI misses most interactions. Production deployment means:

| Channel | Integration | Key Consideration |
| --- | --- | --- |
| Web chat | Next.js app (Module 5) | SSE streaming, rich formatting |
| Slack | `@slack/bolt` | Ephemeral messages for sensitive data |
| Email | Inbound parsing | Auto-respond for high confidence only |
| API | MCP server | Headless access for other tools |

The same pipeline (classify → retrieve → respond → escalate) serves all channels. Only the delivery format changes.
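A minimal sketch of that separation, assuming hypothetical names (`SupportResult`, `deliver`, the `formatters` map) rather than the course's actual code — the pipeline output is fixed, and only the formatter varies per channel:

```typescript
// One pipeline result, many delivery formats. The classify → retrieve →
// respond → escalate pipeline itself is stubbed; only delivery differs.
interface SupportResult {
  intent: string;
  answer: string;
  escalated: boolean;
}

type Channel = "web" | "slack" | "email" | "api";

const formatters: Record<Channel, (r: SupportResult) => string> = {
  web: (r) => r.answer,                  // rich formatting in the real app
  slack: (r) => r.answer.slice(0, 3000), // Slack messages have size limits
  email: (r) => `Hello,\n\n${r.answer}\n\n— Support`,
  api: (r) => JSON.stringify(r),         // headless consumers get raw JSON
};

function deliver(result: SupportResult, channel: Channel): string {
  return formatters[channel](result);
}
```

Adding a new channel means adding one formatter, not a new pipeline.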

Slack Integration

The Slack bot has three modes:

  • Slash command: /support "I can't log in" — quick queries
  • Direct message: private support conversations
  • @mention: in-channel questions, responses in threads

Critical: use ephemeral messages for sensitive topics. A billing inquiry response with account details should only be visible to the requester, not the entire channel.
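One way to sketch that rule, assuming an illustrative `sensitiveIntents` set and `responseType` helper (not the course's actual code) — in a Slack slash-command reply, the returned value maps to the `response_type` field:

```typescript
// Intents whose responses may contain account details (illustrative list).
const sensitiveIntents = new Set(["billing", "account", "security"]);

// "ephemeral" replies are visible only to the requester;
// "in_channel" replies are visible to everyone in the channel.
function responseType(intent: string): "ephemeral" | "in_channel" {
  return sensitiveIntents.has(intent) ? "ephemeral" : "in_channel";
}
```

Defaulting to ephemeral for anything account-related is the safe choice; a public reply can always be reposted, but a leaked one can't be unsent.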

Monitoring Dashboard

Four categories of metrics:

Resolution Metrics:

  • Auto-resolution rate (target: 60-80%)
  • Escalation rate (alarm if > 30%)
  • Average response time (target: < 2 seconds)
  • Average resolution time (target: < 5 minutes for auto-resolved)

Quality Metrics:

  • CSAT by intent — which intents satisfy customers?
  • CSAT by channel — is Slack performing differently than web?
  • CSAT: AI vs human — the critical comparison
  • Response accuracy — spot-check sampled responses

Operational Metrics:

  • Escalation reasons — why is the AI escalating?
  • KB coverage gaps — queries with zero good results
  • Top intents by volume — where to invest in automation
  • Token usage and cost per query

Trend Metrics:

  • CSAT trend over time — is it improving?
  • Resolution rate trend — is automation increasing?
  • New intent emergence — are customers asking new things?
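The resolution metrics above reduce to simple aggregates over a ticket log. A minimal sketch, assuming a hypothetical `Ticket` shape rather than the course's actual schema:

```typescript
// Illustrative ticket record; real systems track more fields.
interface Ticket {
  resolvedByAI: boolean;
  escalated: boolean;
  responseMs: number;
}

function resolutionMetrics(tickets: Ticket[]) {
  const n = tickets.length || 1; // avoid division by zero on an empty log
  const auto = tickets.filter((t) => t.resolvedByAI).length;
  const esc = tickets.filter((t) => t.escalated).length;
  const avgMs = tickets.reduce((sum, t) => sum + t.responseMs, 0) / n;
  return {
    autoResolutionRate: auto / n, // target: 0.6–0.8
    escalationRate: esc / n,      // alarm if > 0.3
    avgResponseMs: avgMs,         // target: < 2000
  };
}
```

The same pattern, grouped by intent or channel, yields the quality and operational breakdowns.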
The Feedback Loop

The most important operational pattern:

Deploy → Monitor → Identify Gaps → Fix → Redeploy
   ↑                                          │
   └──────────────────────────────────────────┘

Weekly cycle:

  • Review the 10 lowest-CSAT interactions
  • Diagnose: classification error? Retrieval miss? Bad template?
  • Fix: add a KB article, adjust the classifier, improve the template
  • Re-test with the original queries
  • Deploy and monitor

Each iteration makes the system better. After 4-8 weeks, the auto-resolution rate typically jumps from 40% to 65%+.
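Step 1 of the cycle is a one-liner over the interaction log. A sketch, assuming an illustrative `Interaction` shape and `lowestCsat` helper:

```typescript
// Illustrative interaction record for the weekly review.
interface Interaction {
  id: string;
  csat: number; // 1–5
  intent: string;
}

// Copy before sorting so the original log order is preserved.
function lowestCsat(interactions: Interaction[], count = 10): Interaction[] {
  return [...interactions].sort((a, b) => a.csat - b.csat).slice(0, count);
}
```

Grouping the result by intent usually points straight at the diagnosis: one intent dominating the bottom 10 means a classifier or template problem, not a retrieval one.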

KB Coverage Gaps

The highest-leverage improvement: adding missing KB articles. When the retrieval system returns no good results (top score < 0.3), log the query. After a week, cluster these "no answer" queries by topic. The top clusters are your missing KB articles.

Example: if 50 queries per week ask about "API pagination" but there's no KB article for it, writing one article resolves all 50 going forward.
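The logging-and-clustering step can be sketched as follows. This is a deliberate simplification under assumed names (`logIfGap`, `topGaps`): real clustering would use embeddings, while here queries are grouped by their first significant keyword.

```typescript
const GAP_THRESHOLD = 0.3; // top retrieval score below this = coverage gap

const gapLog: string[] = [];

// Called after each retrieval; records queries the KB couldn't answer.
function logIfGap(query: string, topScore: number): void {
  if (topScore < GAP_THRESHOLD) gapLog.push(query);
}

// Group logged queries by a crude topic key (first word longer than 3 chars),
// then rank clusters by size: the biggest clusters are the missing articles.
function topGaps(log: string[], limit = 5): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const q of log) {
    const key = q
      .toLowerCase()
      .split(/\s+/)
      .filter((w) => w.length > 3)
      .slice(0, 1)
      .join(" ");
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return [...counts.entries()].sort((a, b) => b[1] - a[1]).slice(0, limit);
}
```

In the "API pagination" example, 50 logged queries would surface as one dominant cluster at the top of `topGaps`.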

MCP Server

The Model Context Protocol exposes the support agent to other AI tools:

| MCP Tool | Purpose |
| --- | --- |
| `search_kb` | Search the knowledge base from any MCP client |
| `classify_ticket` | Run classification from external workflows |
| `check_escalation` | Evaluate escalation rules programmatically |
| `draft_response` | Generate response drafts from other tools |

This is the "headless support agent" pattern — the intelligence is decoupled from any specific UI.
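The headless shape of this can be sketched as a name → handler registry; the MCP transport itself (JSON-RPC, tool schemas) is omitted, and the handlers here are stubs, not the course's implementations:

```typescript
type ToolHandler = (args: Record<string, string>) => string;

// The four support tools as a dispatch table, the way an MCP server
// would expose them. Handler bodies are stand-in stubs.
const tools: Record<string, ToolHandler> = {
  search_kb: ({ query }) => `results for: ${query}`,
  classify_ticket: ({ text }) => (text.includes("refund") ? "billing" : "general"),
  check_escalation: ({ intent }) => (intent === "billing" ? "escalate" : "auto"),
  draft_response: ({ intent }) => `Draft reply for ${intent} ticket`,
};

function callTool(name: string, args: Record<string, string>): string {
  const handler = tools[name];
  if (!handler) throw new Error(`unknown tool: ${name}`);
  return handler(args);
}
```

Any MCP client — another agent, a workflow engine, a CLI — gets the same four capabilities without touching the web UI or Slack bot.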

Operations Runbook

Essential documentation for production:

  • Deployment checklist: environment variables, dependencies, health checks
  • Monitoring checklist: daily metrics to review
  • Incident playbook: what to do when CSAT drops, escalation spikes, or response time increases
  • Improvement cycle: weekly process for analyzing gaps and fixing them

What You'll Build

  • Slack bot with slash commands, DMs, and @mentions
  • CSAT tracking with intent-level and channel-level breakdowns
  • Resolution rate dashboard with trend monitoring
  • MCP server for headless access
  • Operations runbook for the production system

Glossary

| Term | Meaning |
| --- | --- |
| Auto-resolution rate | % of tickets resolved by AI without human help |
| Escalation rate | % of tickets handed to humans |
| CSAT | Customer Satisfaction score (1-5) |
| MCP | Model Context Protocol — universal tool interface |
| Feedback loop | Continuous cycle of monitoring, diagnosing, and improving |
| KB coverage gap | Customer questions with no matching KB article |
| Ephemeral message | Slack message visible only to the requester |

This is chapter 6 of AI Customer Support Agent.

Get the full hands-on course — free during early access. Build the complete system. Your projects become your portfolio.

View course details