
Deploy & Connect

Slack + MCP

From Local to Production

A working prototype on localhost isn't a product. Module 6 takes the Sales Companion from "works on my machine" to "available in Slack, accessible from Claude, monitored in production."

Key Concepts

Production Deployment

Deploying an AI application has unique concerns beyond traditional web apps:

Cold starts — AI endpoints are slow to initialize. Serverless functions (Vercel, AWS Lambda) may add 1-3 seconds of cold-start delay before the first request is served. Strategies: keep-alive pings, provisioned concurrency, or an always-on server for the AI endpoint.

Memory pressure — Embedding models and vector operations use more memory than typical web handlers. Size your instances accordingly (1GB+ per worker).

Secrets management — API keys for OpenAI, Supabase, and other services must be injected via environment variables, never committed to code.
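One way to enforce the env-var rule is a fail-fast check at startup so a missing secret crashes the deploy instead of surfacing later as a cryptic API error. A minimal sketch (the variable names in the usage comment are illustrative, not prescribed here):

```typescript
// Fail fast at startup if a required secret is missing from the environment.
// Never hardcode keys; inject them via the deployment platform's env settings.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Example usage at boot (names are illustrative):
// const OPENAI_API_KEY = requireEnv("OPENAI_API_KEY");
// const SUPABASE_URL = requireEnv("SUPABASE_URL");
```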

Health checks — A /health endpoint that verifies:

  • Database connectivity (can we query pgvector?)
  • AI API reachability (can we generate embeddings?)
  • Response time within acceptable bounds
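The checks above can be sketched as a small aggregator that runs each dependency probe with a deadline and returns 503 when any fails. Everything here (function names, the 2-second timeout) is an illustrative assumption, not the course's exact implementation:

```typescript
// Aggregate dependency checks into a single health verdict.
type CheckResult = { name: string; ok: boolean; latencyMs: number };

async function runCheck(
  name: string,
  fn: () => Promise<void>,
  timeoutMs = 2000
): Promise<CheckResult> {
  const start = Date.now();
  try {
    // Fail the check if the dependency doesn't answer within the deadline.
    await Promise.race([
      fn(),
      new Promise<never>((_, rej) =>
        setTimeout(() => rej(new Error("timeout")), timeoutMs)
      ),
    ]);
    return { name, ok: true, latencyMs: Date.now() - start };
  } catch {
    return { name, ok: false, latencyMs: Date.now() - start };
  }
}

// The /health handler returns 200 only when every dependency passes.
async function healthHandler(checks: Record<string, () => Promise<void>>) {
  const results = await Promise.all(
    Object.entries(checks).map(([name, fn]) => runCheck(name, fn))
  );
  const healthy = results.every((r) => r.ok);
  return { status: healthy ? 200 : 503, body: { healthy, results } };
}
```

Wire `healthHandler` to your `/health` route with probes like "run `SELECT 1` against pgvector" and "embed a one-word string."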

Slack Bot Integration

    Sales reps live in Slack. Meeting them where they are is the difference between a tool they use and a tool they forget.

    Slash commands:

    /brief acme-corp                         → Pre-call briefing for Acme Corp
    /ask What's our pricing for enterprise?  → Quick question to the companion
    /compare acme gong                       → Competitive positioning analysis

    Architecture:

    Slack ──→ Webhook Handler ──→ AI Gateway ──→ Response ──→ Slack
                  │
             Auth check
             Rate limit
             Format for Slack (blocks, threading)

    The webhook handler receives Slack events, authenticates the request (verify Slack signing secret), passes the query through the existing AI Gateway, then formats the response using Slack Block Kit for rich display.
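Slack's signing-secret scheme is well defined: compute an HMAC-SHA256 over the string `v0:<timestamp>:<raw body>` and compare it against the `x-slack-signature` header. A sketch using Node's built-in crypto module:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Verify a Slack request: HMAC-SHA256 over "v0:<timestamp>:<raw body>".
function verifySlackSignature(
  signingSecret: string,
  timestamp: string, // x-slack-request-timestamp header
  rawBody: string,   // unparsed request body
  signature: string  // x-slack-signature header, e.g. "v0=abc123..."
): boolean {
  // Reject stale requests to prevent replays (Slack suggests a 5-minute window).
  const ageSeconds = Math.abs(Date.now() / 1000 - Number(timestamp));
  if (ageSeconds > 60 * 5) return false;

  const expected =
    "v0=" +
    createHmac("sha256", signingSecret)
      .update(`v0:${timestamp}:${rawBody}`)
      .digest("hex");

  // Constant-time comparison avoids leaking the signature via timing.
  const a = Buffer.from(expected);
  const b = Buffer.from(signature);
  return a.length === b.length && timingSafeEqual(a, b);
}
```

Note the raw body requirement: verify the bytes Slack sent, before any JSON or form parsing middleware rewrites them.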

    Threading — Long responses go in a thread to avoid cluttering the channel. Briefings post as collapsible sections with expandable detail.
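Threaded, Block Kit-formatted replies come down to a `chat.postMessage` payload with `thread_ts` set and `blocks` populated. A minimal payload builder (the helper name is ours, not a Slack API):

```typescript
// Build a chat.postMessage payload that replies in a thread with Block Kit sections.
function buildThreadedReply(channel: string, parentTs: string, sections: string[]) {
  return {
    channel,
    thread_ts: parentTs, // reply in-thread instead of cluttering the channel
    blocks: sections.map((text) => ({
      type: "section",
      text: { type: "mrkdwn", text },
    })),
  };
}
```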

    MCP Server

    Model Context Protocol (MCP) lets other AI tools — Claude, GPT, custom agents — access your Sales Companion as a tool. Instead of building separate integrations for each AI, you build one MCP server and every MCP-compatible client can use it.

    {
      "tools": [
        {
          "name": "get_account_briefing",
          "description": "Get a pre-call briefing for a sales account",
          "parameters": { "account_name": "string" }
        },
        {
          "name": "search_knowledge",
          "description": "Search across all enterprise data sources",
          "parameters": { "query": "string", "filters": "object" }
        }
      ],
      "resources": [
        {
          "name": "account_list",
          "description": "List of all CRM accounts with deal stages"
        }
      ]
    }

    When Claude Desktop or another MCP client connects to your server, it discovers these tools and can call them on behalf of the user. Your Sales Companion becomes a building block that other AI systems can compose with.
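To make the discovery-and-call cycle concrete, here is a deliberately simplified dispatcher. It is not the real MCP wire protocol (which is JSON-RPC over stdio or HTTP, usually via the official SDK); it just illustrates the tools/list and tools/call shape a client sees:

```typescript
// Simplified illustration of the MCP tool cycle: a client lists tools, then calls one.
type ToolDef = {
  name: string;
  description: string;
  handler: (args: Record<string, unknown>) => Promise<string>;
};

class ToyMcpServer {
  private tools = new Map<string, ToolDef>();

  register(tool: ToolDef) {
    this.tools.set(tool.name, tool);
  }

  // Discovery: what a client sees when it asks for the tool list.
  listTools() {
    return [...this.tools.values()].map(({ name, description }) => ({ name, description }));
  }

  // Invocation: call a tool by name with arguments.
  async callTool(name: string, args: Record<string, unknown>) {
    const tool = this.tools.get(name);
    if (!tool) throw new Error(`unknown tool: ${name}`);
    return tool.handler(args);
  }
}
```

In a real server the handlers would call the same AI Gateway the Slack integration uses, so both surfaces share one retrieval and guardrail path.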

    Monitoring Dashboard

    Production AI systems need different monitoring than traditional apps:

    Latency breakdown:

  • Embedding generation: ~200ms
  • Vector search: ~50ms
  • LLM generation: ~1-3s
  • Total end-to-end: ~2-4s

    Token usage:

  • Input tokens per request (retrieved context + prompt)
  • Output tokens per request (AI response)
  • Daily/weekly trends and per-user breakdown
  • Cost per request in USD
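Cost per request falls out of the token counts directly. A sketch with placeholder per-1K-token prices (the numbers are assumptions for illustration; check your provider's current rates):

```typescript
// Cost per request = input tokens at the input rate + output tokens at the output rate.
// Illustrative placeholder prices in USD per 1K tokens — not any provider's real rates.
const PRICE_PER_1K = { input: 0.0025, output: 0.01 };

function requestCostUsd(inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens / 1000) * PRICE_PER_1K.input +
    (outputTokens / 1000) * PRICE_PER_1K.output
  );
}
```

Logged per request alongside user ID and timestamp, this is enough to drive the daily/weekly trend and per-user breakdown views.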
    Error rates:

  • Retrieval failures (no relevant chunks found)
  • LLM errors (rate limits, timeouts)
  • Guardrail triggers (PII detected, topic boundary hit)

    Quality signals:

  • User feedback (thumbs up/down on responses)
  • Citation accuracy (do cited sources support the claim?)
  • Response relevance scores

Operations Runbook

    A runbook answers: "It's 2 AM and the system is broken — what do I do?"

    Key scenarios:

  • LLM API is down → Fallback to cached responses, notify team
  • Vector DB is slow → Check index health, connection pool
  • Costs spiking → Check per-user budgets, look for runaway queries
  • Bad responses reported → Check retrieval quality, review guardrails
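The first scenario, falling back to cached responses when the LLM API is down, can be sketched as a wrapper around the LLM call. The names and the cache policy here are illustrative assumptions:

```typescript
// Fallback cache: serve the last good answer for a query when the LLM API is down.
class ResponseCache {
  private store = new Map<string, string>();

  save(query: string, answer: string) {
    this.store.set(query.toLowerCase().trim(), answer);
  }

  // Returns a stale-but-useful answer, or null if the query was never seen.
  fallback(query: string): string | null {
    return this.store.get(query.toLowerCase().trim()) ?? null;
  }
}

async function answerWithFallback(
  query: string,
  callLlm: (q: string) => Promise<string>,
  cache: ResponseCache
): Promise<{ answer: string; degraded: boolean }> {
  try {
    const answer = await callLlm(query);
    cache.save(query, answer); // remember the last good answer
    return { answer, degraded: false };
  } catch {
    const cached = cache.fallback(query);
    if (cached) return { answer: cached, degraded: true }; // flag stale response
    throw new Error("LLM unavailable and no cached answer");
  }
}
```

The `degraded` flag is what lets the Slack handler label the reply as cached and notify the team.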

What You'll Build

  • Production deployment with health checks
  • Slack slash command integration
  • MCP server with tools and resources
  • Monitoring dashboard with latency, tokens, errors, and cost tracking

Glossary

    Cold start — Delay when a serverless function initializes from scratch
    Health check — Endpoint that verifies all system dependencies are working
    Slack Block Kit — Slack's framework for rich message formatting
    MCP — Model Context Protocol; lets AI tools discover and call your APIs
    Tool — An MCP capability that takes parameters and returns results
    Resource — An MCP data source that AI clients can read
    Runbook — Step-by-step guide for handling production incidents
    P95 latency — Response time that 95% of requests are faster than

    This is chapter 6 of AI Sales Companion.

    Get the full hands-on course for $100 and build the complete system. Your projects become your portfolio.

    View course details