Production Pipeline
Monitoring, Retraining, and Scale
The Production Gap
A demo processes 5 documents and works perfectly. Production processes 50,000 per month and breaks in ways you never anticipated: new vendors, changed formats, scanned documents with coffee stains, multi-language invoices, and files that are really two documents stapled together.
Production engineering bridges this gap with four systems: accuracy monitoring, human review, retraining triggers, and throughput optimization.
Accuracy Monitoring
The Metrics That Matter
Three metrics define extraction quality:
Per-Field Tracking
Overall F1 can hide problems. If vendor name extraction is at 0.99 but tax extraction is at 0.70, the average might look acceptable (0.85) while tax processing is severely broken. Track F1 per field to catch localized degradation.
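As a minimal sketch of per-field tracking, the function below computes F1 separately for each field from parallel lists of predicted and ground-truth extractions. The function name, the dict-per-document layout, and the exact-match scoring rule are assumptions made for illustration; a real pipeline may normalize values or use fuzzy matching before comparing.

```python
from collections import defaultdict


def per_field_f1(predictions, ground_truth):
    """Return {field: F1} for a batch of documents.

    predictions and ground_truth are parallel lists of dicts mapping
    field name -> extracted value. A missing key or None means "no value".
    Scoring is exact match (an assumption, not a rule from the text).
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for pred, truth in zip(predictions, ground_truth):
        for field in set(pred) | set(truth):
            p, t = pred.get(field), truth.get(field)
            if p is not None and p == t:
                counts[field]["tp"] += 1
            else:
                if p is not None:
                    counts[field]["fp"] += 1  # wrong or spurious value
                if t is not None:
                    counts[field]["fn"] += 1  # true value was missed

    scores = {}
    for field, c in counts.items():
        denom = 2 * c["tp"] + c["fp"] + c["fn"]
        scores[field] = 2 * c["tp"] / denom if denom else 0.0
    return scores


predictions = [
    {"vendor": "Acme", "tax": "10.00"},
    {"vendor": "Globex", "tax": "5.00"},
]
ground_truth = [
    {"vendor": "Acme", "tax": "10.00"},
    {"vendor": "Globex", "tax": "7.50"},
]
scores = per_field_f1(predictions, ground_truth)
```

Here the vendor field scores perfectly while the tax field does not, which is exactly the kind of localized problem an averaged score would blur.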
Drift Detection
Accuracy doesn't degrade overnight — it drifts. A vendor changes their template. A new document type starts arriving. The scanner gets misaligned. Drift detection compares current F1 to historical baselines and flags fields that have degraded:
| Severity | Trigger | Action |
|---|---|---|
| Low | 1 field below 0.80 F1 | Log and monitor |
| Medium | 2 fields degraded | Schedule retraining |
| High | 3+ fields degraded | Immediate intervention |
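The severity table can be sketched as a small classifier over per-field scores. The 0.80 floor comes from the table; the `max_drop` tolerance against the baseline, the function name, and the returned dict shape are assumptions for illustration, since the text does not define exactly when a field counts as "degraded".

```python
def detect_drift(current_f1, baseline_f1, floor=0.80, max_drop=0.05):
    """Classify drift severity from per-field F1 scores.

    A field counts as degraded if it is below the absolute floor or has
    dropped more than max_drop from its baseline (max_drop is a
    hypothetical tolerance, not a value from the text).
    """
    degraded = sorted(
        field
        for field, score in current_f1.items()
        if score < floor or baseline_f1.get(field, score) - score > max_drop
    )
    if len(degraded) >= 3:
        severity, action = "high", "Immediate intervention"
    elif len(degraded) == 2:
        severity, action = "medium", "Schedule retraining"
    elif len(degraded) == 1:
        severity, action = "low", "Log and monitor"
    else:
        severity, action = "none", "No action"
    return {"severity": severity, "action": action, "degraded_fields": degraded}


report = detect_drift(
    current_f1={"vendor": 0.99, "tax": 0.70, "total": 0.95},
    baseline_f1={"vendor": 0.99, "tax": 0.93, "total": 0.96},
)
```

Fields without a baseline are judged only against the floor, so a newly added field cannot trigger a false "drop" alert.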
Human Review Queue
Routing Logic
Every processed document gets one of three destinations:
Priority Tiers
Not all review items are equal. The queue assigns priority:
The Dual Purpose
The review queue serves two functions:
This dual purpose is what makes the human-in-the-loop pattern economically viable. You're not just paying reviewers to fix errors — you're paying them to generate training data that reduces future errors.
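One way to picture the dual purpose is a single function that both fixes the output and records a training example. This is a hedged sketch: the `Correction` dataclass, the `apply_correction` name, and the training-example fields are all hypothetical and not taken from the course's codebase.

```python
from dataclasses import dataclass


@dataclass
class Correction:
    """A reviewer's fix for one field of one document (hypothetical schema)."""
    document_id: str
    field: str
    predicted: str
    corrected: str


def apply_correction(extracted, correction, training_set):
    """Return the corrected extraction and append a training example."""
    fixed = dict(extracted)  # copy so the original extraction is untouched
    fixed[correction.field] = correction.corrected
    training_set.append(
        {
            "document_id": correction.document_id,
            "field": correction.field,
            "wrong_value": correction.predicted,
            "label": correction.corrected,
        }
    )
    return fixed


training_set = []
extracted = {"vendor": "Acme", "tax": "5.00"}
fix = Correction(document_id="doc-1", field="tax", predicted="5.00", corrected="7.50")
output = apply_correction(extracted, fix, training_set)
```

The reviewer does the work once; the pipeline gets both the corrected output and a labeled example.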
Retraining Triggers
When to Retrain
The retrain trigger evaluates four signals:
If any signal fires, the system recommends retraining. Urgency depends on severity:
What Retraining Means
For template-based extraction: update regex patterns to handle new formats. Review the corrections from the human review queue, identify pattern failures, and add new regex variants.
For LLM-based extraction: fine-tune on the new ground truth data. The corrections from reviewers become the training examples.
The Safety Guard
Never retrain on too few samples. If you processed 3 documents over a holiday weekend and 1 had an error, that's a 33% error rate — but it's meaningless. The system requires a minimum sample size (50+ documents) before triggering retraining decisions.
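The guard can be sketched as a check that runs before any retraining signal is evaluated. Only the 50-document minimum comes from the text; the single error-rate signal, the `max_error_rate` threshold, and the function name are stand-in assumptions, since the actual signals are not spelled out here.

```python
MIN_SAMPLES = 50  # minimum sample size stated in the text


def evaluate_retraining(num_processed, num_errors,
                        max_error_rate=0.10, min_samples=MIN_SAMPLES):
    """Return (should_retrain, reason).

    max_error_rate is a hypothetical threshold used as a stand-in signal.
    """
    if num_processed < min_samples:
        # Too few documents: any error rate here is statistical noise.
        return False, f"only {num_processed} samples, need {min_samples}"
    error_rate = num_errors / num_processed
    if error_rate > max_error_rate:
        return True, f"error rate {error_rate:.1%} exceeds {max_error_rate:.0%}"
    return False, f"error rate {error_rate:.1%} within tolerance"


holiday = evaluate_retraining(num_processed=3, num_errors=1)
busy_week = evaluate_retraining(num_processed=200, num_errors=30)
```

The holiday-weekend case from the text (3 documents, 1 error) is rejected by the guard before its 33% error rate is ever compared to a threshold.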
Throughput Optimization
The Scale Problem
Each document takes ~200ms to process through the full pipeline. Sequentially processing 50,000 documents takes 10,000 seconds (about 2.8 hours). With 10x concurrency, and assuming throughput scales linearly, that drops to about 17 minutes. Throughput optimization makes the difference between a pipeline that blocks your workflow and one that finishes before your coffee.
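The arithmetic above, plus a simple way to get concurrency, can be sketched as follows. The estimate assumes ideal linear scaling, and the thread pool assumes the per-document work is I/O-bound (for example, waiting on an OCR or model API); CPU-bound work would need processes instead. `estimated_runtime_seconds`, `process_all`, and the `process_document` callable are hypothetical names.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def estimated_runtime_seconds(num_docs, seconds_per_doc=0.2, concurrency=1):
    """Idealized runtime: documents run in waves of `concurrency` at a time."""
    waves = math.ceil(num_docs / concurrency)
    return waves * seconds_per_doc


def process_all(documents, process_document, concurrency=10):
    """Run process_document over all documents with a thread pool.

    Results come back in the same order as the input documents.
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(process_document, documents))


sequential = estimated_runtime_seconds(50_000)                 # in seconds
concurrent = estimated_runtime_seconds(50_000, concurrency=10)  # in seconds
results = process_all(["a", "b", "c"], str.upper, concurrency=2)
```

Real speedups are usually lower than the estimate, because rate limits, shared resources, and slow outlier documents all eat into the ideal figure.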
Three Optimization Levers
Metrics to Watch
The Data Flywheel
The complete production loop:
Documents arrive
→ Throughput optimizer batches and processes
→ Pipeline: ingest → classify → extract → validate
→ High confidence: auto-accept → output
→ Low confidence: review queue → human correction → output
→ Accuracy monitor tracks per-field F1
→ Drift detected → retrain trigger fires
→ Template update or model fine-tune
→ Improved extraction → fewer documents need review
→ Loop continues
This is the data flywheel. More documents → more corrections → better templates → fewer errors → less review needed → lower cost per document. The pipeline gets better the more you use it.
The Economic Argument
Document processing pricing: manual processing costs $2-5 per document. Automated processing with human review costs $0.10-0.50 per document (amortized over infrastructure and reviewer time). At 50,000 documents per month, that's the difference between $100K-250K per month (manual) and $5K-25K per month (automated with review), or roughly $1.2M-3M versus $60K-300K per year. The ROI of the pipeline, and of this course, is on the order of $1M or more annually for a mid-size organization.
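The per-document prices can be turned into totals with a few multiplications. A minimal sketch, assuming a volume of 50,000 documents per month; the `cost_range` function name is hypothetical.

```python
def cost_range(docs_per_month, low_per_doc, high_per_doc, months=1):
    """Return (low, high) total cost in dollars over the given number of months."""
    volume = docs_per_month * months
    return volume * low_per_doc, volume * high_per_doc


DOCS_PER_MONTH = 50_000

manual_monthly = cost_range(DOCS_PER_MONTH, 2, 5)
automated_monthly = cost_range(DOCS_PER_MONTH, 0.10, 0.50)
manual_yearly = cost_range(DOCS_PER_MONTH, 2, 5, months=12)
automated_yearly = cost_range(DOCS_PER_MONTH, 0.10, 0.50, months=12)

for label, (low, high) in [
    ("manual / month", manual_monthly),
    ("automated / month", automated_monthly),
    ("manual / year", manual_yearly),
    ("automated / year", automated_yearly),
]:
    print(f"{label}: ${low:,.0f} to ${high:,.0f}")
```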
This is chapter 6 of AI Document Processing.