Statistical Baselines
Teaching Your System What Normal Looks Like
The Definition of Normal
Before you can detect anomalies, you need a definition of "normal." A baseline is a mathematical model of expected behavior for a metric. Anything that deviates significantly from the baseline is a candidate anomaly.
The simplest baseline: the average value over recent history, plus a band of expected variation. But simple doesn't mean naive — the choice of baseline method, window size, and sensitivity threshold determines whether your system catches real incidents or drowns you in false alerts.
Moving Averages
The moving average is the workhorse of statistical monitoring. Given a window of N recent data points, it computes:
mean = sum(values) / N
stdDev = sqrt(sum((v - mean)^2) / N)

Values outside the expected band are candidate anomalies:

[mean - k*stdDev, mean + k*stdDev]

const baseline = computeMovingAverage("api_latency_search", points, 24, 2.0);
// { mean: 156.3, stdDev: 24.1, upperBound: 204.5, lowerBound: 108.1 }

The multiplier k (default 2.0) controls sensitivity. At 2 sigma, about 5% of normal data falls outside the bounds (assuming a Gaussian distribution). At 3 sigma, only 0.3%.
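For readers who want the mechanics in code, here is a minimal sketch of what a computeMovingAverage-style function could look like. The MetricPoint and Baseline shapes are assumptions inferred from the example output above, not the course's actual types.

// Sketch only: MetricPoint and Baseline are assumed shapes, inferred from the
// example output above rather than taken from the course code.
interface MetricPoint {
  timestamp: number; // epoch milliseconds
  value: number;
}

interface Baseline {
  mean: number;
  stdDev: number;
  upperBound: number;
  lowerBound: number;
}

function computeMovingAverage(
  metricId: string,     // kept for parity with the call above; unused here
  points: MetricPoint[],
  windowSize: number,   // trailing points to average, e.g. 24 hourly samples
  k: number = 2.0       // sigma multiplier controlling sensitivity
): Baseline {
  const window = points.slice(-windowSize).map((p) => p.value);
  const n = window.length;
  const mean = window.reduce((sum, v) => sum + v, 0) / n;
  // Population standard deviation: sqrt(sum((v - mean)^2) / N)
  const variance = window.reduce((sum, v) => sum + (v - mean) ** 2, 0) / n;
  const stdDev = Math.sqrt(variance);
  return { mean, stdDev, upperBound: mean + k * stdDev, lowerBound: mean - k * stdDev };
}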
Limitation: moving averages treat all hours equally. A latency of 180 ms at 3 AM (while a nightly batch job runs) might be perfectly normal, but the same value during an otherwise quiet overnight hour could indicate trouble.
Z-Scores: The Universal Anomaly Language
A z-score converts any metric value into a standardized "how unusual is this?" measure:
z = (value - mean) / stdDev

Z-scores are powerful because they're unit-agnostic. You can compare a z-score of 3.5 on latency (milliseconds) directly with a z-score of 3.5 on error rate (a ratio) — both mean "equally unusual."
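To make the severity mapping concrete, here is a minimal sketch of a computeZScores-style function, reusing the MetricPoint and Baseline shapes from the moving-average sketch. The Anomaly type and the exact cutoffs mirror the example call below and are assumptions, not the course API.

// Sketch only: severity cutoffs mirror the example output below.
type Severity = "critical" | "high" | "medium" | "low";

interface Anomaly {
  timestamp: number;
  value: number;
  zScore: number;
  severity: Severity;
}

function computeZScores(
  metricId: string,
  points: MetricPoint[],
  baseline: Baseline,
  threshold: number
): { anomalies: Anomaly[] } {
  const severityFor = (absZ: number): Severity =>
    absZ > 5 ? "critical" : absZ > 4 ? "high" : absZ > 3 ? "medium" : "low";

  const anomalies = points
    // Standardize each point against the baseline.
    .map((p) => ({ ...p, zScore: (p.value - baseline.mean) / baseline.stdDev }))
    // Keep only points beyond the sensitivity threshold.
    .filter((p) => Math.abs(p.zScore) > threshold)
    .map((p) => ({ ...p, severity: severityFor(Math.abs(p.zScore)) }));

  return { anomalies };
}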
const { anomalies } = computeZScores("api_latency_search", points, baseline, 2.5);
// Flags all points with |z| > 2.5, assigns severity:
// |z| > 5 → critical, |z| > 4 → high, |z| > 3 → medium, else → low

Seasonal Decomposition
API traffic follows patterns: higher during business hours, lower at night, different on weekends. A flat baseline flags every night as "unusually low" and every afternoon as "unusually high."
Seasonal decomposition separates these expected patterns from real anomalies:
value = trend + seasonal + residual

Anomalies live in the residual. By analyzing only the residual, you avoid flagging normal daily patterns while still catching genuine deviations.
const result = computeSeasonalBaseline("api_latency_search", points);
// seasonalPattern: [45, 42, 40, 38, ...] — hourly coefficients
// residual stdDev used for anomaly thresholds

The seasonal pattern for API latency typically shows a clear diurnal shape: values 20-30% higher during 9 AM to 5 PM UTC, dropping to baseline overnight. Removing this pattern means a spike at 3 AM (when values are normally low) gets flagged correctly, while elevated values at 2 PM don't trigger false alarms.
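One simple way to implement this is sketched below, under the assumption that the seasonal pattern is a per-hour-of-day average with the trend folded into the hourly means. The real computeSeasonalBaseline may use a more sophisticated decomposition; this version just buckets points by UTC hour and measures the spread of what's left over.

// Sketch only: hour-of-day decomposition with trend folded into the hourly means.
interface SeasonalBaseline {
  seasonalPattern: number[]; // 24 expected values, indexed by UTC hour
  residualStdDev: number;
}

function computeSeasonalBaseline(
  metricId: string,
  points: MetricPoint[]
): SeasonalBaseline {
  // Average the observed values in each hour-of-day bucket.
  const sums = new Array(24).fill(0);
  const counts = new Array(24).fill(0);
  for (const p of points) {
    const hour = new Date(p.timestamp).getUTCHours();
    sums[hour] += p.value;
    counts[hour] += 1;
  }
  const seasonalPattern = sums.map((s, h) => (counts[h] > 0 ? s / counts[h] : 0));

  // Residual = observed value minus the expected value for that hour.
  const residuals = points.map(
    (p) => p.value - seasonalPattern[new Date(p.timestamp).getUTCHours()]
  );
  const mean = residuals.reduce((a, r) => a + r, 0) / residuals.length;
  const variance = residuals.reduce((a, r) => a + (r - mean) ** 2, 0) / residuals.length;

  return { seasonalPattern, residualStdDev: Math.sqrt(variance) };
}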
Threshold Tuning
The threshold is the most consequential parameter in your entire monitoring system. Too low → alert fatigue. Too high → missed incidents. The threshold tuner evaluates multiple options:
const results = tuneThreshold(points, baseline, knownAnomalyTimestamps);
// [
//   { threshold: 2.0, anomalyCount: 45, falsePositives: 38, missed: 0, score: 38 },
//   { threshold: 2.5, anomalyCount: 18, falsePositives: 12, missed: 0, score: 12 },
//   { threshold: 3.0, anomalyCount: 8,  falsePositives: 3,  missed: 1, score: 6 },
//   { threshold: 3.5, anomalyCount: 4,  falsePositives: 1,  missed: 2, score: 7 },
// ]

The scoring function weights missed anomalies 3x more than false positives — because a missed P1 outage costs orders of magnitude more than checking a false alarm. Threshold 3.0 minimizes the score in this example.
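Here is a sketch of how such a tuner might work, assuming flagged points are matched to known incidents by exact timestamp and using the score = falsePositives + 3 * missed weighting described above. The candidate threshold list is an assumption based on the example output.

// Sketch only: candidate thresholds and exact-timestamp matching are assumptions.
interface TuningResult {
  threshold: number;
  anomalyCount: number;
  falsePositives: number;
  missed: number;
  score: number;
}

function tuneThreshold(
  points: MetricPoint[],
  baseline: Baseline,
  knownAnomalyTimestamps: number[],
  candidates: number[] = [2.0, 2.5, 3.0, 3.5]
): TuningResult[] {
  const known = new Set(knownAnomalyTimestamps);
  return candidates.map((threshold) => {
    // Flag every point whose |z| exceeds the candidate threshold.
    const flagged = points.filter(
      (p) => Math.abs((p.value - baseline.mean) / baseline.stdDev) > threshold
    );
    const flaggedTs = new Set(flagged.map((p) => p.timestamp));
    const falsePositives = flagged.filter((p) => !known.has(p.timestamp)).length;
    const missed = knownAnomalyTimestamps.filter((t) => !flaggedTs.has(t)).length;
    // Missed incidents are weighted 3x more heavily than false positives.
    return {
      threshold,
      anomalyCount: flagged.length,
      falsePositives,
      missed,
      score: falsePositives + 3 * missed,
    };
  });
}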
When Statistics Aren't Enough
Statistical baselines work well when a metric's normal behavior is roughly Gaussian around a stable or seasonal mean: single metrics, clear daily patterns, sudden spikes and drops.

They struggle with non-Gaussian or multi-modal distributions, slow drift that the moving window quietly absorbs, and anomalies that only show up across several correlated metrics.
Module 3 introduces ML-based detection methods that handle these cases without distributional assumptions.