
Deploy & Connect

Slack + MCP

From Local to Production

A working prototype on localhost isn't a product. Module 6 takes the Sales Companion from "works on my machine" to "available in Slack, accessible from Claude, monitored in production."

Key Concepts

Production Deployment

Deploying an AI application has unique concerns beyond traditional web apps:

Cold starts — AI endpoints are slow to initialize. Serverless functions (Vercel, AWS Lambda) may add 1-3 seconds of cold-start delay before the first request is served. Strategies: keep-alive pings, provisioned concurrency, or an always-on server for the AI endpoint.

Memory pressure — Embedding models and vector operations use more memory than typical web handlers. Size your instances accordingly (1GB+ per worker).

Secrets management — API keys for OpenAI, Supabase, and other services must be injected via environment variables, never committed to code.
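One way to enforce the env-var rule is a fail-fast check at startup so a missing secret crashes the deploy instead of surfacing later as a cryptic API error. A minimal sketch (the variable names in the usage comment are illustrative, not prescribed here):

```typescript
// Fail fast at startup if a required secret is missing from the environment.
// Never hardcode keys; inject them via the deployment platform's env settings.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Example usage at boot (names are illustrative):
// const OPENAI_API_KEY = requireEnv("OPENAI_API_KEY");
// const SUPABASE_URL = requireEnv("SUPABASE_URL");
```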

Health checks — A /health endpoint that verifies:

  • Database connectivity (can we query pgvector?)
  • AI API reachability (can we generate embeddings?)
  • Response time within acceptable bounds
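The checks above can be sketched as a small aggregator that runs each dependency probe with a deadline and returns 503 when any fails. Everything here (function names, the 2-second timeout) is an illustrative assumption, not the course's exact implementation:

```typescript
// Aggregate dependency checks into a single health verdict.
type CheckResult = { name: string; ok: boolean; latencyMs: number };

async function runCheck(
  name: string,
  fn: () => Promise<void>,
  timeoutMs = 2000
): Promise<CheckResult> {
  const start = Date.now();
  try {
    // Fail the check if the dependency doesn't answer within the deadline.
    await Promise.race([
      fn(),
      new Promise<never>((_, rej) =>
        setTimeout(() => rej(new Error("timeout")), timeoutMs)
      ),
    ]);
    return { name, ok: true, latencyMs: Date.now() - start };
  } catch {
    return { name, ok: false, latencyMs: Date.now() - start };
  }
}

// The /health handler returns 200 only when every dependency passes.
async function healthHandler(checks: Record<string, () => Promise<void>>) {
  const results = await Promise.all(
    Object.entries(checks).map(([name, fn]) => runCheck(name, fn))
  );
  const healthy = results.every((r) => r.ok);
  return { status: healthy ? 200 : 503, body: { healthy, results } };
}
```

Wire `healthHandler` to your `/health` route with probes like "run `SELECT 1` against pgvector" and "embed a one-word string."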

Slack Bot Integration

    Sales reps live in Slack. Meeting them where they are is the difference between a tool they use and a tool they forget.

    Slash commands:

    /brief acme-corp                         → Pre-call briefing for Acme Corp
    /ask What's our pricing for enterprise?  → Quick question to the companion
    /compare acme gong                       → Competitive positioning analysis

    Architecture:

    Slack ──→ Webhook Handler ──→ AI Gateway ──→ Response ──→ Slack
                  │
             Auth check
             Rate limit
             Format for Slack (blocks, threading)

    The webhook handler receives Slack events, authenticates the request (verify Slack signing secret), passes the query through the existing AI Gateway, then formats the response using Slack Block Kit for rich display.
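Slack's signing-secret scheme is well defined: compute an HMAC-SHA256 over the string `v0:<timestamp>:<raw body>` and compare it against the `x-slack-signature` header. A sketch using Node's built-in crypto module:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Verify a Slack request: HMAC-SHA256 over "v0:<timestamp>:<raw body>".
function verifySlackSignature(
  signingSecret: string,
  timestamp: string, // x-slack-request-timestamp header
  rawBody: string,   // unparsed request body
  signature: string  // x-slack-signature header, e.g. "v0=abc123..."
): boolean {
  // Reject stale requests to prevent replays (Slack suggests a 5-minute window).
  const ageSeconds = Math.abs(Date.now() / 1000 - Number(timestamp));
  if (ageSeconds > 60 * 5) return false;

  const expected =
    "v0=" +
    createHmac("sha256", signingSecret)
      .update(`v0:${timestamp}:${rawBody}`)
      .digest("hex");

  // Constant-time comparison avoids leaking the signature via timing.
  const a = Buffer.from(expected);
  const b = Buffer.from(signature);
  return a.length === b.length && timingSafeEqual(a, b);
}
```

Note the raw body requirement: verify the bytes Slack sent, before any JSON or form parsing middleware rewrites them.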

    Threading — Long responses go in a thread to avoid cluttering the channel. Briefings post as collapsible sections with expandable detail.
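Threaded, Block Kit-formatted replies come down to a `chat.postMessage` payload with `thread_ts` set and `blocks` populated. A minimal payload builder (the helper name is ours, not a Slack API):

```typescript
// Build a chat.postMessage payload that replies in a thread with Block Kit sections.
function buildThreadedReply(channel: string, parentTs: string, sections: string[]) {
  return {
    channel,
    thread_ts: parentTs, // reply in-thread instead of cluttering the channel
    blocks: sections.map((text) => ({
      type: "section",
      text: { type: "mrkdwn", text },
    })),
  };
}
```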

    MCP Server

    Model Context Protocol (MCP) lets other AI tools — Claude, GPT, custom agents — access your Sales Companion as a tool. Instead of building separate integrations for each AI, you build one MCP server and every MCP-compatible client can use it.

    {
      "tools": [
        {
          "name": "get_account_briefing",
          "description": "Get a pre-call briefing for a sales account",
          "parameters": { "account_name": "string" }
        },
        {
          "name": "search_knowledge",
          "description": "Search across all enterprise data sources",
          "parameters": { "query": "string", "filters": "object" }
        }
      ],
      "resources": [
        {
          "name": "account_list",
          "description": "List of all CRM accounts with deal stages"
        }
      ]
    }

    When Claude Desktop or another MCP client connects to your server, it discovers these tools and can call them on behalf of the user. Your Sales Companion becomes a building block that other AI systems can compose with.
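To make the discovery-and-call cycle concrete, here is a deliberately simplified dispatcher. It is not the real MCP wire protocol (which is JSON-RPC over stdio or HTTP, usually via the official SDK); it just illustrates the tools/list and tools/call shape a client sees:

```typescript
// Simplified illustration of the MCP tool cycle: a client lists tools, then calls one.
type ToolDef = {
  name: string;
  description: string;
  handler: (args: Record<string, unknown>) => Promise<string>;
};

class ToyMcpServer {
  private tools = new Map<string, ToolDef>();

  register(tool: ToolDef) {
    this.tools.set(tool.name, tool);
  }

  // Discovery: what a client sees when it asks for the tool list.
  listTools() {
    return [...this.tools.values()].map(({ name, description }) => ({ name, description }));
  }

  // Invocation: call a tool by name with arguments.
  async callTool(name: string, args: Record<string, unknown>) {
    const tool = this.tools.get(name);
    if (!tool) throw new Error(`unknown tool: ${name}`);
    return tool.handler(args);
  }
}
```

In a real server the handlers would call the same AI Gateway the Slack integration uses, so both surfaces share one retrieval and guardrail path.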

    Monitoring Dashboard

    Production AI systems need different monitoring than traditional apps:

    Latency breakdown:

  • Embedding generation: ~200ms
  • Vector search: ~50ms
  • LLM generation: ~1-3s
  • Total end-to-end: ~2-4s

    Token usage:

  • Input tokens per request (retrieved context + prompt)
  • Output tokens per request (AI response)
  • Daily/weekly trends and per-user breakdown
  • Cost per request in USD
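Cost per request falls out of the token counts directly. A sketch with placeholder per-1K-token prices (the numbers are assumptions for illustration; check your provider's current rates):

```typescript
// Cost per request = input tokens at the input rate + output tokens at the output rate.
// Illustrative placeholder prices in USD per 1K tokens — not any provider's real rates.
const PRICE_PER_1K = { input: 0.0025, output: 0.01 };

function requestCostUsd(inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens / 1000) * PRICE_PER_1K.input +
    (outputTokens / 1000) * PRICE_PER_1K.output
  );
}
```

Logged per request alongside user ID and timestamp, this is enough to drive the daily/weekly trend and per-user breakdown views.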
    Error rates:

  • Retrieval failures (no relevant chunks found)
  • LLM errors (rate limits, timeouts)
  • Guardrail triggers (PII detected, topic boundary hit)

    Quality signals:

  • User feedback (thumbs up/down on responses)
  • Citation accuracy (do cited sources support the claim?)
  • Response relevance scores

Operations Runbook

    A runbook answers: "It's 2 AM and the system is broken — what do I do?"

    Key scenarios:

  • LLM API is down → Fallback to cached responses, notify team
  • Vector DB is slow → Check index health, connection pool
  • Costs spiking → Check per-user budgets, look for runaway queries
  • Bad responses reported → Check retrieval quality, review guardrails
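The first scenario, falling back to cached responses when the LLM API is down, can be sketched as a wrapper around the LLM call. The names and the cache policy here are illustrative assumptions:

```typescript
// Fallback cache: serve the last good answer for a query when the LLM API is down.
class ResponseCache {
  private store = new Map<string, string>();

  save(query: string, answer: string) {
    this.store.set(query.toLowerCase().trim(), answer);
  }

  // Returns a stale-but-useful answer, or null if the query was never seen.
  fallback(query: string): string | null {
    return this.store.get(query.toLowerCase().trim()) ?? null;
  }
}

async function answerWithFallback(
  query: string,
  callLlm: (q: string) => Promise<string>,
  cache: ResponseCache
): Promise<{ answer: string; degraded: boolean }> {
  try {
    const answer = await callLlm(query);
    cache.save(query, answer); // remember the last good answer
    return { answer, degraded: false };
  } catch {
    const cached = cache.fallback(query);
    if (cached) return { answer: cached, degraded: true }; // flag stale response
    throw new Error("LLM unavailable and no cached answer");
  }
}
```

The `degraded` flag is what lets the Slack handler label the reply as cached and notify the team.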

What You'll Build

  • Production deployment with health checks
  • Slack slash command integration
  • MCP server with tools and resources
  • Monitoring dashboard with latency, tokens, errors, and cost tracking

Glossary

    Cold start — Delay when a serverless function initializes from scratch
    Health check — Endpoint that verifies all system dependencies are working
    Slack Block Kit — Slack's framework for rich message formatting
    MCP — Model Context Protocol; lets AI tools discover and call your APIs
    Tool — An MCP capability that takes parameters and returns results
    Resource — An MCP data source that AI clients can read
    Runbook — Step-by-step guide for handling production incidents
    P95 latency — Response time that 95% of requests are faster than

    This is chapter 6 of AI Sales Companion.

    Get the full hands-on course for $100 and build the complete system. Your projects become your portfolio.

    View course details