
Grading & Sorting

Computer Vision for Produce Grading, NIR Spectroscopy, and Automated Sorting Line Calibration

Why Manual Grading Is a Bottleneck

India grades and sorts approximately 8% of its processed fruits and vegetables by machine, versus 70%+ in developed markets. The gap is not a technology availability problem — commercial sorting machines from companies like Tomra, Key Technology, and CFTRI-licensed Indian manufacturers are accessible. The gap is in the trained vision models and NIR calibration curves adapted to Indian varieties, growing conditions, and the wide quality variation that characterizes Indian produce markets.

A Kesar mango from Girnar and a Kesar from Junagadh are the same variety but can differ significantly in Brix (sugar content), skin color at maturity, and fiber content, all of which matter for export grading. A vision model trained on Israeli or Brazilian mango varieties misclassifies Indian mangoes at a 15-25% error rate. Indian variety-specific training datasets are the foundational investment.

Open data/mango-grading-images.json. It contains metadata and feature vectors from 12,000 Alphonso and Kesar mango images collected at packing houses in Ratnagiri and Junagadh, labeled by APEDA-certified graders: weight class (A/B/C), color grade (full yellow, 50% color, 25% color), external defect flags (stem end rot, skin bruising, latex staining, lenticel browning), and Brix (refractometer measurement on a 20% sample).

Computer Vision Architecture for Produce Grading

A production mango grading line vision system has four components working in sequence:

Component	Technology	Speed Requirement	Output
Acquisition	4K line scan camera, telecentric lens, structured LED illumination	5-12 mangoes/sec	Raw image frames
Pre-processing	Background subtraction, perspective correction, color calibration to D65	Real-time GPU	Normalized fruit image
Defect detection	CNN (ResNet-50 or EfficientNet-B3) → per-pixel segmentation	<80ms/fruit	Defect mask + type classification
Grade assignment	Rule engine consuming vision outputs + load cell weight	<20ms/fruit	Diverter trigger signal (Grade A/B/C/Reject)

The CNN architecture choice depends on deployment hardware. On a Jetson AGX Orin (common in Indian packing houses due to price point), EfficientNet-B3 at INT8 quantization achieves 45ms inference time vs. ResNet-50 at 78ms, with comparable accuracy.

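# Transfer learning setup for mango defect detection
import torch.nn as nn
import torchvision.models as models
from torchvision import transforms as T

# Channel statistics of the ImageNet data the backbone was pretrained on
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

# torchvision >= 0.13: `weights=` replaces the deprecated `pretrained=True`
backbone = models.efficientnet_b3(weights=models.EfficientNet_B3_Weights.IMAGENET1K_V1)
# Replace classifier head for multi-label defect detection
n_defect_classes = 8  # stem_end_rot, bruise, lenticel_browning, latex_stain,
                       # anthracnose, insect_damage, dry_skin, color_uneven
in_features = backbone.classifier[1].in_features  # read before the head is replaced
backbone.classifier = nn.Sequential(
    nn.Dropout(0.3),
    nn.Linear(in_features, n_defect_classes),
    nn.Sigmoid()  # Multi-label: each defect independently present or absent
)
# The head outputs probabilities, so train with nn.BCELoss (not BCEWithLogitsLoss)

# Training data augmentation for Indian packing house conditions (PIL image input)
train_transforms = T.Compose([
    T.RandomRotation(180),           # Samples from [-180, +180] degrees: fruit orientation is random on conveyor
    T.ColorJitter(brightness=0.3),   # LED illumination variation between packing houses
    T.RandomHorizontalFlip(),
    T.GaussianBlur(kernel_size=3),   # Camera vibration from conveyor
    T.ToTensor(),                    # Normalize operates on tensors, not PIL images
    T.Normalize(mean=IMAGENET_MEAN, std=IMAGENET_STD),
])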
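
The grade assignment stage in the component table can be sketched as a small rule engine that consumes the per-defect sigmoid outputs and the load cell weight. This is a minimal illustration, not a production rule set: the threshold values, the list of rejecting defects, and the weight floors below are all hypothetical and would need to be set from the packing house's own grading standard.

```python
# Hypothetical per-defect probability thresholds; real values would be tuned
# per defect on validation data and grading economics.
DEFECT_THRESHOLDS = {
    "stem_end_rot": 0.30,
    "anthracnose": 0.30,
    "bruise": 0.50,
    "latex_stain": 0.60,
}
# Assumed policy: any of these defects sends the fruit to the reject lane.
REJECT_DEFECTS = {"stem_end_rot", "anthracnose"}
# Assumed minimum weight in grams for grades A and B.
MIN_WEIGHT_G = {"A": 250.0, "B": 200.0}


def detected_defects(probs):
    """Return the defects whose probability meets that defect's own threshold."""
    return {
        name
        for name, p in probs.items()
        if p >= DEFECT_THRESHOLDS.get(name, 0.5)  # 0.5 default for unlisted defects
    }


def assign_grade(probs, weight_g):
    """Map CNN defect probabilities and load cell weight to a diverter grade."""
    defects = detected_defects(probs)
    if defects & REJECT_DEFECTS:
        return "Reject"
    if not defects and weight_g >= MIN_WEIGHT_G["A"]:
        return "A"
    if len(defects) <= 1 and weight_g >= MIN_WEIGHT_G["B"]:
        return "B"
    return "C"


print(assign_grade({"bruise": 0.7, "stem_end_rot": 0.05}, weight_g=280.0))
```

Keeping thresholds in a per-defect table means each one can be moved independently, which is the practical benefit of the multi-label sigmoid head.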

Rice and Wheat Grain Quality: Broken, Discoloured, Foreign Matter

Grain quality assessment is the highest-volume grading application in India — the Food Corporation of India (FCI) grades 60+ million tonnes of rice and wheat annually. Traditional visual inspection by human graders is slow (2-5 kg/min), subjective, and vulnerable to fatigue. AI vision systems operate at 100-500 kg/hour with <1% misclassification on trained categories.

Key quality attributes for rice:

Attribute	Visual Signal	Detection Challenge
Head rice (whole)	L/W ratio > 2.5 for long-grain	Low contrast against background
Broken (large)	L/W ratio 1.5-2.5	Gradation boundary ambiguity
Broken (small/brokens)	L/W ratio < 1.5	Distinguish from husks
Chalky grain	White opaque spot in otherwise translucent grain	Requires transmitted light
Red-striped grain	Pink/red coloration from pericarp	Color variation across varieties
Immature grain	Green coloration, low density	Color camera + density separation
Foreign matter	Non-rice: husk, stone, mud ball	Multi-class: train separately per foreign type
Paddy (unhusked)	Golden brown hull	Reliable; consistent texture
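
The length/width (L/W) boundaries in the table translate directly into a size classifier once each grain has been segmented and measured. The sketch below assumes the length and width in millimetres are already available from the vision pipeline; the function and class names are illustrative only.

```python
from collections import Counter


def classify_by_lw_ratio(length_mm, width_mm):
    """Classify one long-grain kernel using the L/W boundaries from the table."""
    if length_mm <= 0 or width_mm <= 0:
        raise ValueError("grain dimensions must be positive")
    ratio = length_mm / width_mm
    if ratio > 2.5:
        return "head_rice"
    if ratio >= 1.5:
        return "broken_large"
    return "broken_small"


def class_fractions(grains):
    """Return the share of each size class for a list of (length, width) pairs."""
    counts = Counter(classify_by_lw_ratio(l, w) for l, w in grains)
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


sample = [(7.0, 1.8), (6.8, 1.7), (4.0, 1.9), (2.0, 1.8)]
print(class_fractions(sample))
```

Grains that fall close to 1.5 or 2.5 are where the "gradation boundary ambiguity" in the table shows up, so a real system would likely add a tolerance band or a second check there.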

CFTRI has developed an open-access rice quality assessment system (RQAS) with labeled datasets for Basmati, Sona Masuri, and Ponni varieties — the best starting point for training before collecting proprietary data.

Prompt: "I have a batch of Basmati rice images captured on a white background conveyor under
diffuse LED lighting [data/rice-grain-images.json — 8,000 labeled grains]. The current model
accuracy is 94% on head rice identification but only 78% on chalky grain detection (our target
is >92% for all classes). Analyze the confusion matrix [attached], identify the failure modes
for chalky grain, and recommend: (1) a data augmentation strategy to address the specific
failure cases, (2) whether a second model with transmitted light images would close the gap,
(3) a threshold calibration approach that minimizes false reject rate (economic cost: ₹18/kg
misclassified as lower grade) while keeping chalky grade-up rate <0.5%."
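
Part (3) of the prompt, threshold calibration, can be illustrated with a simple search over candidate thresholds. This is a sketch under stated assumptions: a grain is flagged chalky when its model score is at or above the threshold, the "grade-up rate" is taken to be the share of truly chalky grains that slip below the threshold, and the false reject rate is the share of sound grains flagged as chalky. The function name and the toy data are hypothetical.

```python
def pick_threshold(scores, is_chalky, max_grade_up_rate=0.005):
    """Return (threshold, false_reject_rate) with the lowest false reject rate
    among thresholds whose chalky grade-up rate stays within the limit."""
    n_chalky = sum(1 for c in is_chalky if c)
    n_sound = len(is_chalky) - n_chalky
    if n_chalky == 0 or n_sound == 0:
        raise ValueError("need at least one chalky and one sound grain")

    best = None
    for t in sorted(set(scores)):  # candidate thresholds are the observed scores
        missed = sum(1 for s, c in zip(scores, is_chalky) if c and s < t)
        false_rejects = sum(1 for s, c in zip(scores, is_chalky) if not c and s >= t)
        grade_up_rate = missed / n_chalky
        false_reject_rate = false_rejects / n_sound
        if grade_up_rate <= max_grade_up_rate:
            if best is None or false_reject_rate < best[1]:
                best = (t, false_reject_rate)
    return best


# Toy validation set: model scores and grader labels (True = chalky).
scores = [0.1, 0.2, 0.7, 0.6, 0.8, 0.9]
labels = [False, False, False, True, True, True]
print(pick_threshold(scores, labels, max_grade_up_rate=0.0))
```

The lowest observed score is always feasible because it misses no chalky grain, so the search always returns a threshold. Multiplying the resulting false reject rate by throughput and the ₹18/kg downgrade cost gives the economic figure the prompt asks about.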

NIR Spectroscopy: Composition Analysis Without Sampling

Near-Infrared (NIR) spectroscopy measures the absorption of NIR light (780-2500nm) by food samples — different chemical bonds (O-H for water, N-H for protein, C-H for fat/starch) absorb at characteristic wavelengths. The result: a full composition analysis in 30 seconds without sample destruction, at ₹1-3/measurement vs. ₹500-2000 for wet chemistry lab analysis.

NIR calibration model development workflow:

1. Collect reference samples (n=200+ for robust calibration)
   → Cover full range of expected composition variation (variety, origin, season)
   → Each sample: NIR spectrum + wet chemistry reference analysis

2. Pre-process spectra
   → Standard Normal Variate (SNV) correction for particle size variation
   → Savitzky-Golay smoothing (2nd derivative) to resolve overlapping peaks
   → Multiplicative Scatter Correction (MSC) for scattering baseline drift

3. Model fitting
   → Partial Least Squares Regression (PLSR): n_components = 6-12 for food matrices
   → Cross-validate: leave-one-sample-out or k-fold (k=10)
   → Report: RMSECV (cross-validation error), R² ≥ 0.95 for quantitative analysis

4. Validation on independent hold-out set (n=50+)
   → RMSEP (prediction error on unseen samples) should be ≤ RMSECV × 1.15
   → Bias check: systematic over/under-prediction indicates a sample subset not represented in the calibration set
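
Two pieces of the workflow above are simple enough to sketch with the standard library alone: the SNV correction from step 2 and the hold-out acceptance check from step 4. The PLSR fit itself is left out here (in practice it would come from a chemometrics or machine learning library), and the helper names and sample values are illustrative assumptions.

```python
import math
from statistics import mean, stdev


def snv(spectrum):
    """Standard Normal Variate: centre and scale one spectrum by its own
    mean and sample standard deviation."""
    m = mean(spectrum)
    s = stdev(spectrum)
    if s == 0:
        raise ValueError("flat spectrum cannot be SNV-corrected")
    return [(x - m) / s for x in spectrum]


def rmse(predicted, reference):
    """Root mean squared error between predictions and lab reference values."""
    return math.sqrt(mean([(p - r) ** 2 for p, r in zip(predicted, reference)]))


def bias(predicted, reference):
    """Mean signed error; a value far from zero suggests systematic offset."""
    return mean([p - r for p, r in zip(predicted, reference)])


def holdout_acceptable(predicted, reference, rmsecv, factor=1.15):
    """Step 4 rule: RMSEP on the hold-out set should be <= RMSECV * 1.15."""
    return rmse(predicted, reference) <= factor * rmsecv


# Hypothetical hold-out protein predictions (%) versus wet chemistry values.
pred = [12.1, 11.9, 13.2, 10.8]
ref = [12.0, 12.0, 13.0, 11.0]
print(round(rmse(pred, ref), 3), round(bias(pred, ref), 3))
print(holdout_acceptable(pred, ref, rmsecv=0.15))
```

SNV is applied per spectrum, so it needs no reference set. MSC, by contrast, corrects each spectrum against a mean spectrum and has to be refit whenever the calibration set changes.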

Open data/nir-spectra-calibration.csv — it contains NIR spectra (1100-2500nm, 2nm resolution) and reference lab values for 350 wheat flour samples from 8 flour mills: moisture%, protein%, ash%, wet gluten%, water absorption (Farinograph). The task: build a robust PLSR model that can replace the Farinograph measurement with a 30-second NIR scan.

NIR applications by commodity in India:

Commodity	Parameters Measured	Accuracy (RMSEP)	Economic Value
Wheat flour	Moisture, protein, ash, gluten	Protein: ±0.2%, Moisture: ±0.1%	Grade/price decision on every lot
Rice	Moisture, amylose, chalky%	Moisture: ±0.15%	Milling yield optimization
Edible oil	FFA, moisture, adulteration	FFA: ±0.05%	Quality gating at receipt
Milk powder	Fat, protein, lactose, moisture	Fat: ±0.15%	Formula compliance for export
Spices	Moisture, essential oil, adulteration	Moisture: ±0.2%	FSSAI and Spices Board specs
Sugar	Moisture, ash, color (ICUMSA)	Color: ±12 IU	Premium grade eligibility

Automated Sorting Line Calibration and OEE

A sorting line that runs at 90% of rated speed due to miscalibration loses ₹500-2000/hour in processing capacity (product value × throughput gap). Calibration drift occurs from:

LED illumination aging (luminosity drops 15-20% over 5,000 hours)

Camera lens contamination (dust, sugar/salt deposits)

Background belt wear (color change affects foreground segmentation)

Seasonal variety changes (color, size distribution shifts requiring model recalibration)
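
The capacity loss figure quoted at the start of this section is product value multiplied by the throughput gap. A short worked example makes the arithmetic concrete; the rated throughput and the value per kilogram used here are hypothetical inputs, not figures from a specific line.

```python
def capacity_loss_per_hour(rated_kg_per_hour, speed_fraction, value_per_kg):
    """Loss per hour = throughput gap (kg/h) x value per kg (rupees/kg)."""
    if not 0 < speed_fraction <= 1:
        raise ValueError("speed_fraction must be in (0, 1]")
    throughput_gap = rated_kg_per_hour * (1 - speed_fraction)
    return throughput_gap * value_per_kg


# Assumed line: rated 2,000 kg/h, running at 90% of rated speed, ₹5/kg at stake.
loss = capacity_loss_per_hour(2000, 0.90, 5.0)
print(f"₹{loss:.0f} per hour")
```

With these assumed inputs the result is about ₹1,000 per hour, inside the ₹500-2000 range cited above.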

AI-assisted calibration uses reference standard objects (certified color tiles, NIST-traceable size standards) run through the line at shift start. The vision system measures deviation from expected values and adjusts:

Exposure/gain correction — compensate for LED aging

White balance recalibration — correct for illumination spectrum drift

Model threshold adjustment — slide grade boundary to match current fruit distribution
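
The first two adjustments reduce to ratios between the baseline tile reading and today's reading. The sketch below shows a scalar exposure gain and a per-channel white balance gain; it is a diagonal correction only, which is simpler than a full 3×3 color correction matrix, and the tile values are hypothetical.

```python
def exposure_gain(baseline_luminance, measured_luminance):
    """Gain that restores the white tile to its baseline luminance."""
    if measured_luminance <= 0:
        raise ValueError("measured luminance must be positive")
    return baseline_luminance / measured_luminance


def white_balance_gains(reference_rgb, measured_rgb):
    """Per-channel gains mapping the measured white tile onto its reference."""
    return tuple(ref / meas for ref, meas in zip(reference_rgb, measured_rgb))


def apply_gains(pixel_rgb, gains, max_value=255):
    """Apply per-channel gains to one pixel, clipping at the sensor maximum."""
    return tuple(min(max_value, round(c * g)) for c, g in zip(pixel_rgb, gains))


# LED output down 18% from baseline: the tile reads 82 instead of 100.
print(round(exposure_gain(100.0, 82.0), 3))

# Assumed white tile readings: reference (240, 240, 240), today (200, 240, 220).
gains = white_balance_gains((240, 240, 240), (200, 240, 220))
print(apply_gains((200, 240, 220), gains))
```

Raising gain also amplifies sensor noise, so a large correction is usually a signal to clean or replace the LED bank rather than to keep compensating in software.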

Prompt: "Our mango sorting line vision system has been running for 8 months without recalibration.
Calibration tile measurements today [data/calibration-tile-readings.json] show: D65 white tile
luminance down 18% from baseline, red tile chroma shift ΔE=4.2 (CIEDE2000), green tile ΔE=2.8.
Current operational data shows grade A assignment rate dropped from 42% to 31% over the last
6 weeks (we believe actual incoming quality is stable). Calculate: (1) the correction matrix
needed to restore colorimetric accuracy, (2) whether the grade A rate drop is fully explained
by illumination drift or indicates a model recalibration is needed, (3) a recalibration schedule
recommendation based on this degradation rate."

Tomato and Grape Grading for Export

Tomato (Namdhari/Kolar): EU export grade requires color uniformity (CIELAB a* value), size (diameter 47-102mm), and absence of blossom end rot, catfacing, and cracking. AI grades 20,000+ tomatoes/hour with <2% misclassification, versus 800/hour at a >5% error rate for a trained human grader.

Grapes (Nashik Seedless for EU export): APEDA export protocol requires berry diameter, cluster weight, total soluble solids (Brix ≥16), and defect assessment. On-line NIR Brix measurement covers 100% of throughput without destructive sampling.

Key Takeaways

Indian variety-specific training data is the foundational investment: vision models trained on non-Indian produce varieties misclassify Indian varieties at a 15-25% error rate. CFTRI and ICAR datasets are the starting points; packing house data collection is the moat.

Multi-label defect classification outperforms single-class approaches — defects co-occur (a bruised mango may also have lenticel browning), and treating each defect independently with sigmoid outputs allows each defect's threshold to be calibrated separately for economic optimization.

NIR PLSR calibration requires 200+ reference samples per commodity per origin region — a calibration built on Nashik wheat performs poorly on Punjab wheat due to variety and climate-driven composition differences. Calibration scope must match deployment scope.

Sorting line calibration drift is systematic and predictable — LED aging follows known degradation curves. Proactive calibration scheduling based on cumulative operating hours, plus daily tile checks, prevents the grade boundary drift that erodes product value.

This is chapter 5 of AI for Food Processing & Agri.
