Grading & Sorting

Computer Vision for Produce Grading, NIR Spectroscopy, and Automated Sorting Line Calibration

Why Manual Grading Is a Bottleneck

India grades and sorts approximately 8% of its processed fruits and vegetables by machine, versus 70%+ in developed markets. The gap is not a technology availability problem — commercial sorting machines from companies like Tomra, Key Technology, and CFTRI-licensed Indian manufacturers are accessible. The gap is in the trained vision models and NIR calibration curves adapted to Indian varieties, growing conditions, and the wide quality variation that characterizes Indian produce markets.

A Kesar mango from Girnar and a Kesar from Junagadh are the same variety but can differ significantly in Brix (sugar content), skin color at maturity, and fiber content, all of which matter for export grading. A vision model trained on Israeli or Brazilian mango varieties misclassifies Indian mangoes at a 15-25% error rate. Indian variety-specific training datasets are the foundational investment.

Open data/mango-grading-images.json — it contains metadata and feature vectors from 12,000 Alphonso and Kesar mango images collected at packing houses in Ratnagiri and Junagadh, labeled by APEDA-certified graders: weight class (A/B/C), color grade (full yellow, 50% color, 25% color), external defect flags (stem end rot, skin bruising, latex staining, lenticel browning), and Brix (refractometer measurement on 20% sample).

Computer Vision Architecture for Produce Grading

A production mango grading line vision system has four components working in sequence:

| Component | Technology | Speed Requirement | Output |
| --- | --- | --- | --- |
| Acquisition | 4K line scan camera, telecentric lens, structured LED illumination | 5-12 mangoes/sec | Raw image frames |
| Pre-processing | Background subtraction, perspective correction, color calibration to D65 | Real-time GPU | Normalized fruit image |
| Defect detection | CNN (ResNet-50 or EfficientNet-B3) → per-pixel segmentation | <80ms/fruit | Defect mask + type classification |
| Grade assignment | Rule engine consuming vision outputs + load cell weight | <20ms/fruit | Diverter trigger signal (Grade A/B/C/Reject) |
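The grade-assignment stage can be sketched as a small rule function. Only the defect names and the A/B/C/Reject outputs come from this guide; the weight bands, the colour cut-off, and the split between rejecting and downgrading defects are illustrative assumptions, not APEDA or buyer specifications.

```python
from dataclasses import dataclass, field

# Hypothetical defect groups: which flags force a reject vs. a downgrade.
REJECT_DEFECTS = {"stem_end_rot", "anthracnose", "insect_damage"}
DOWNGRADE_DEFECTS = {"bruise", "latex_stain", "lenticel_browning"}


@dataclass
class FruitReading:
    weight_g: float                            # from the load cell
    color_fraction: float                      # share of skin at target colour, 0..1
    defects: set = field(default_factory=set)  # defect names flagged by the CNN


def assign_grade(fruit: FruitReading) -> str:
    """Return 'A', 'B', 'C', or 'Reject' using illustrative rules."""
    if fruit.defects & REJECT_DEFECTS:
        return "Reject"
    # Assumed weight bands in grams; replace with the buyer's specification.
    if fruit.weight_g >= 250:
        grade = "A"
    elif fruit.weight_g >= 200:
        grade = "B"
    else:
        grade = "C"
    # Each cosmetic defect, and poor colour, drops the fruit one grade.
    penalties = len(fruit.defects & DOWNGRADE_DEFECTS)
    if fruit.color_fraction < 0.5:
        penalties += 1
    order = ["A", "B", "C", "Reject"]
    return order[min(order.index(grade) + penalties, len(order) - 1)]
```

Keeping the rules outside the CNN means a grade boundary can be changed for a new buyer or season without retraining the model.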

The CNN architecture choice depends on deployment hardware. On a Jetson AGX Orin (common in Indian packing houses due to price point), EfficientNet-B3 at INT8 quantization achieves 45ms inference time vs. ResNet-50 at 78ms, with comparable accuracy.
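To see what those inference times mean at line speed, the arithmetic can be written out. This sketch assumes a single inference stream that processes one image per fruit at batch size 1; multi-view capture or batching would change the numbers.

```python
def max_fruits_per_second(inference_ms: float) -> float:
    """Fruits per second one inference stream can sustain at batch size 1."""
    return 1000.0 / inference_ms


def headroom(inference_ms: float, line_rate_per_s: float) -> float:
    """Ratio of sustainable rate to line rate; below 1.0 the model falls behind."""
    return max_fruits_per_second(inference_ms) / line_rate_per_s


peak_rate = 12  # mangoes/sec, upper end of the acquisition range quoted above
for name, latency_ms in [("EfficientNet-B3 INT8", 45), ("ResNet-50", 78)]:
    print(f"{name}: {max_fruits_per_second(latency_ms):.1f} fruit/s, "
          f"headroom x{headroom(latency_ms, peak_rate):.2f} at {peak_rate}/s")
```

Under these assumptions both models keep up at 12 mangoes/sec, but ResNet-50 has only a few percent of margin, while EfficientNet-B3 leaves room for pre-processing jitter or a second camera view.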

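# Transfer learning setup for mango defect detection
import torch.nn as nn
import torchvision.models as models
from torchvision import transforms as T

# `pretrained=True` is deprecated in recent torchvision; request weights explicitly
backbone = models.efficientnet_b3(weights=models.EfficientNet_B3_Weights.DEFAULT)

# Replace classifier head for multi-label defect detection
n_defect_classes = 8  # stem_end_rot, bruise, lenticel_browning, latex_stain,
                       # anthracnose, insect_damage, dry_skin, color_uneven
in_features = backbone.classifier[1].in_features
backbone.classifier = nn.Sequential(
    nn.Dropout(0.3),
    nn.Linear(in_features, n_defect_classes),
    nn.Sigmoid(),  # Multi-label: each defect independently present or absent
)
# Sigmoid outputs pair with nn.BCELoss; drop the Sigmoid if training with nn.BCEWithLogitsLoss

IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

# Training data augmentation for Indian packing house conditions
train_transforms = T.Compose([
    T.RandomRotation(180),           # ±180°: fruit orientation is random on conveyor
    T.ColorJitter(brightness=0.3),   # LED illumination variation between packing houses
    T.RandomHorizontalFlip(),
    T.GaussianBlur(kernel_size=3),   # Camera vibration from conveyor
    T.ToTensor(),                    # PIL image -> float tensor, required before Normalize
    T.Normalize(mean=IMAGENET_MEAN, std=IMAGENET_STD),
])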

Rice and Wheat Grain Quality: Broken, Discoloured, Foreign Matter

Grain quality assessment is the highest-volume grading application in India — the Food Corporation of India (FCI) grades 60+ million tonnes of rice and wheat annually. Traditional visual inspection by human graders is slow (2-5 kg/min), subjective, and vulnerable to fatigue. AI vision systems operate at 100-500 kg/hour with <1% misclassification on trained categories.

Key quality attributes for rice:

| Attribute | Visual Signal | Detection Challenge |
| --- | --- | --- |
| Head rice (whole) | L/W ratio > 2.5 for long-grain | Low contrast against background |
| Broken (large) | L/W ratio 1.5-2.5 | Gradation boundary ambiguity |
| Broken (small/brokens) | L/W ratio < 1.5 | Distinguish from husks |
| Chalky grain | White opaque spot in otherwise translucent grain | Requires transmitted light |
| Red-striped grain | Pink/red coloration from pericarp | Color variation across varieties |
| Immature grain | Green coloration, low density | Color camera + density separation |
| Foreign matter | Non-rice: husk, stone, mud ball | Multi-class: train separately per foreign type |
| Paddy (unhusked) | Golden brown hull | Reliable; consistent texture |
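The three length-to-width (L/W) bands in the table translate directly into a geometric rule. This sketch assumes the grain's length and width have already been measured upstream (for example from the segmentation mask); how the exact boundary values 2.5 and 1.5 are assigned is an assumption, since the table leaves them ambiguous.

```python
def classify_by_aspect_ratio(length_mm: float, width_mm: float) -> str:
    """Map a grain's L/W ratio to the size classes from the table above."""
    if length_mm <= 0 or width_mm <= 0:
        raise ValueError("length and width must be positive")
    ratio = length_mm / width_mm
    if ratio > 2.5:
        return "head_rice"
    if ratio >= 1.5:          # assumed: boundary values fall in the large-broken band
        return "broken_large"
    return "broken_small"


for length, width in [(7.5, 1.8), (4.0, 2.0), (2.0, 2.0)]:
    print(length, width, classify_by_aspect_ratio(length, width))
```

Geometry alone cannot separate small brokens from husk fragments or detect chalkiness, which is why the remaining rows of the table need colour or transmitted-light features.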

CFTRI has developed an open-access rice quality assessment system (RQAS) with labeled datasets for Basmati, Sona Masuri, and Ponni varieties — the best starting point for training before collecting proprietary data.

Prompt: "I have a batch of Basmati rice images captured on a white background conveyor under
diffuse LED lighting [data/rice-grain-images.json — 8,000 labeled grains]. The current model
accuracy is 94% on head rice identification but only 78% on chalky grain detection (our target
is >92% for all classes). Analyze the confusion matrix [attached], identify the failure modes
for chalky grain, and recommend: (1) a data augmentation strategy to address the specific
failure cases, (2) whether a second model with transmitted light images would close the gap,
(3) a threshold calibration approach that minimizes false reject rate (economic cost: ₹18/kg
misclassified as lower grade) while keeping chalky grade-up rate <0.5%."
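Item (3) of the prompt, threshold calibration, can be sketched without any model code. The function below assumes each grain has a chalky score from the classifier and a ground-truth label, flags a grain as chalky when its score is above the threshold, and defines the chalky grade-up rate as the share of accepted grains that are actually chalky. That definition and the function name are assumptions for illustration.

```python
def pick_threshold(scores, is_chalky, max_grade_up_rate=0.005):
    """Return (threshold, false_rejects) with the fewest sound grains rejected
    while the chalky share among accepted grains stays below the cap.

    A grain is flagged chalky when score > threshold.
    """
    pairs = sorted(zip(scores, is_chalky))
    total_sound = sum(1 for _, chalky in pairs if not chalky)
    accepted = accepted_chalky = 0
    # Starting point: flag every grain. Always feasible, but rejects all sound grains.
    best = (float("-inf"), total_sound)
    for i, (score, chalky) in enumerate(pairs):
        accepted += 1
        accepted_chalky += int(chalky)
        if i + 1 < len(pairs) and pairs[i + 1][0] == score:
            continue  # a threshold cannot split grains with equal scores
        false_rejects = total_sound - (accepted - accepted_chalky)
        feasible = accepted_chalky / accepted < max_grade_up_rate
        if feasible and false_rejects < best[1]:
            best = (score, false_rejects)
    return best
```

Multiplying the returned false-reject count by the mass those grains represent and by the ₹18/kg penalty gives the economic cost of the chosen threshold, so candidate models can be compared in rupees and not only in accuracy.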

NIR Spectroscopy: Composition Analysis Without Sampling

Near-Infrared (NIR) spectroscopy measures the absorption of NIR light (780-2500nm) by food samples — different chemical bonds (O-H for water, N-H for protein, C-H for fat/starch) absorb at characteristic wavelengths. The result: a full composition analysis in 30 seconds without sample destruction, at ₹1-3/measurement vs. ₹500-2000 for wet chemistry lab analysis.

NIR calibration model development workflow:

1. Collect reference samples (n=200+ for robust calibration)
   → Cover full range of expected composition variation (variety, origin, season)
   → Each sample: NIR spectrum + wet chemistry reference analysis

2. Pre-process spectra
   → Standard Normal Variate (SNV) correction for particle size variation
   → Savitzky-Golay smoothing (2nd derivative) to resolve overlapping peaks
   → Multiplicative Scatter Correction (MSC) for scattering baseline drift

3. Model fitting
   → Partial Least Squares Regression (PLSR): n_components = 6-12 for food matrices
   → Cross-validate: leave-one-sample-out or k-fold (k=10)
   → Report: RMSECV (cross-validation error), R² ≥ 0.95 for quantitative analysis

4. Validation on independent hold-out set (n=50+)
   → RMSEP (prediction error on unseen samples) should be ≤ RMSECV × 1.15
   → Bias check: systematic over- or under-prediction indicates a sample subgroup that the calibration set does not cover
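Steps 2 and 4 of the workflow above can be sketched with NumPy. The function names are illustrative. SNV and MSC address similar scatter effects, so many workflows apply one or the other; the Savitzky-Golay derivative is omitted here and is available as `scipy.signal.savgol_filter` with `deriv=2`.

```python
import numpy as np


def snv(spectra):
    """Standard Normal Variate: centre and scale each spectrum (row) on its own."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def msc(spectra, reference=None):
    """Multiplicative Scatter Correction against a reference (default: mean spectrum)."""
    spectra = np.asarray(spectra, dtype=float)
    ref = spectra.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(spectra)
    for i, row in enumerate(spectra):
        slope, intercept = np.polyfit(ref, row, 1)   # fit row ≈ intercept + slope * ref
        corrected[i] = (row - intercept) / slope
    return corrected


def validation_report(y_ref, y_pred, rmsecv, tolerance=1.15):
    """Hold-out check from step 4: RMSEP against RMSECV x tolerance, plus bias."""
    residuals = np.asarray(y_pred, dtype=float) - np.asarray(y_ref, dtype=float)
    rmsep = float(np.sqrt(np.mean(residuals ** 2)))
    return {
        "rmsep": rmsep,
        "bias": float(residuals.mean()),
        "passes": rmsep <= tolerance * rmsecv,
    }
```

A bias far from zero with an acceptable RMSEP is the pattern to watch for: the model is precise but shifted, which usually points at a sample population outside the calibration range.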

Open data/nir-spectra-calibration.csv — it contains NIR spectra (1100-2500nm, 2nm resolution) and reference lab values for 350 wheat flour samples from 8 flour mills: moisture%, protein%, ash%, wet gluten%, water absorption (Farinograph). The task: build a robust PLSR model that can replace the Farinograph measurement with a 30-second NIR scan.
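A minimal sketch of the model-fitting step for one response (for example water absorption) is shown below. It implements single-response PLS with the NIPALS algorithm in NumPy so the block is self-contained; in practice a library implementation such as scikit-learn's `PLSRegression` is the more usual choice. The column layout of the CSV is not specified here, so the functions take plain arrays: `X` is samples × wavelengths and `y` is the reference value per sample.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit single-response PLS by NIPALS; return (coefficients, intercept)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean          # centred, then deflated per component
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)               # weight vector
        t = Xr @ w                           # scores
        tt = t @ t
        p = Xr.T @ t / tt                    # X loadings
        qk = yr @ t / tt                     # y loading
        Xr = Xr - np.outer(t, p)
        yr = yr - qk * t
        W.append(w); P.append(p); q.append(qk)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)   # regression vector in original X space
    return coef, y_mean - x_mean @ coef


def rmsecv(X, y, n_components, k=10, seed=0):
    """k-fold cross-validated RMSE for a given number of PLS components."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    idx = np.random.default_rng(seed).permutation(len(y))
    squared_error = 0.0
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        coef, intercept = pls1_fit(X[train], y[train], n_components)
        squared_error += np.sum((X[fold] @ coef + intercept - y[fold]) ** 2)
    return float(np.sqrt(squared_error / len(y)))
```

Calling `rmsecv` for each candidate in the 6-12 component range and picking the smallest count beyond which the error stops improving is one common way to choose `n_components` without overfitting.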

NIR applications by commodity in India:

| Commodity | Parameters Measured | Accuracy (RMSEP) | Economic Value |
| --- | --- | --- | --- |
| Wheat flour | Moisture, protein, ash, gluten | Protein: ±0.2%, Moisture: ±0.1% | Grade/price decision on every lot |
| Rice | Moisture, amylose, chalky% | Moisture: ±0.15% | Milling yield optimization |
| Edible oil | FFA, moisture, adulteration | FFA: ±0.05% | Quality gating at receipt |
| Milk powder | Fat, protein, lactose, moisture | Fat: ±0.15% | Formula compliance for export |
| Spices | Moisture, essential oil, adulteration | Moisture: ±0.2% | FSSAI and Spices Board specs |
| Sugar | Moisture, ash, color (ICUMSA) | Color: ±12 IU | Premium grade eligibility |

Automated Sorting Line Calibration and OEE

A sorting line that runs at 90% of rated speed due to miscalibration loses ₹500-2000/hour in processing capacity (product value × throughput gap). Calibration drift occurs from:
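The "product value × throughput gap" estimate can be written as a one-line function. The inputs in the example call are hypothetical, chosen only to show the calculation; they are not measurements from any line.

```python
def capacity_loss_per_hour(rated_kg_per_hour: float,
                           speed_fraction: float,
                           value_per_kg: float) -> float:
    """Rupees lost per hour: throughput gap (kg/h) times value added per kg."""
    throughput_gap = rated_kg_per_hour * (1.0 - speed_fraction)
    return throughput_gap * value_per_kg


# Hypothetical line: rated 2,000 kg/h, running at 90%, Rs 5/kg processing value.
loss = capacity_loss_per_hour(2000, 0.90, 5.0)
print(f"Estimated loss: Rs {loss:,.0f}/hour")
```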

  • LED illumination aging (luminosity drops 15-20% over 5,000 hours)
  • Camera lens contamination (dust, sugar/salt deposits)
  • Background belt wear (color change affects foreground segmentation)
  • Seasonal variety changes (color, size distribution shifts requiring model recalibration)

AI-assisted calibration uses reference standard objects (certified color tiles, NIST-traceable size standards) run through the line at shift start. The vision system measures deviation from expected values and adjusts:

  • Exposure/gain correction — compensate for LED aging
  • White balance recalibration — correct for illumination spectrum drift
  • Model threshold adjustment — slide grade boundary to match current fruit distribution

Prompt: "Our mango sorting line vision system has been running for 8 months without recalibration.
Calibration tile measurements today [data/calibration-tile-readings.json] show: D65 white tile
luminance down 18% from baseline, red tile chroma shift ΔE=4.2 (CIEDE2000), green tile ΔE=2.8.
Current operational data shows grade A assignment rate dropped from 42% to 31% over the last
6 weeks (we believe actual incoming quality is stable). Calculate: (1) the correction matrix
needed to restore colorimetric accuracy, (2) whether the grade A rate drop is fully explained
by illumination drift or indicates a model recalibration is needed, (3) a recalibration schedule
recommendation based on this degradation rate."
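The correction matrix in item (1) is commonly estimated by least squares from the tile readings. The sketch below assumes linear (not gamma-encoded) RGB values scaled to 0..1 and at least three tiles with linearly independent colours; the schema of the JSON file is not specified here, so the functions take plain arrays, and the function names are illustrative.

```python
import numpy as np


def fit_color_correction(measured_rgb, reference_rgb):
    """Least-squares 3x3 matrix M such that measured_rgb @ M ≈ reference_rgb.

    Each input has one row per calibration tile and columns R, G, B.
    """
    measured = np.asarray(measured_rgb, dtype=float)
    reference = np.asarray(reference_rgb, dtype=float)
    M, *_ = np.linalg.lstsq(measured, reference, rcond=None)
    return M


def apply_color_correction(pixels_rgb, M):
    """Correct an (N, 3) array of linear RGB pixels and clip to the valid range."""
    return np.clip(np.asarray(pixels_rgb, dtype=float) @ M, 0.0, 1.0)


# Illustrative tiles: certified values vs. what an aged illumination setup reads.
reference = np.array([[0.9, 0.9, 0.9], [0.6, 0.1, 0.1], [0.1, 0.5, 0.1], [0.1, 0.1, 0.6]])
measured = reference * np.array([0.82, 0.85, 0.80])   # assumed per-channel dimming
M = fit_color_correction(measured, reference)
print(np.round(M, 3))
```

If the fitted matrix is close to diagonal, the drift is mostly per-channel dimming that exposure and white balance can absorb; large off-diagonal terms suggest a spectral shift, and residual error after correction is a signal that the model thresholds themselves need attention.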

Tomato and Grape Grading for Export

Tomato (Namdhari/Kolar): EU export grade requires color uniformity (CIELAB a* value), size (diameter 47-102mm), and absence of blossom end rot, catfacing, and cracking. AI grades 20,000+ tomatoes/hour with <2% misclassification, versus 800/hour at >5% error for a trained human grader.
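The a* value used for tomato colour comes from converting camera RGB to CIELAB. A minimal sketch follows, assuming colour-calibrated 8-bit sRGB pixels and a D65 white point. The size limits are the 47-102mm range quoted above; no a* acceptance threshold is given in this guide, so none is hard-coded.

```python
def srgb_to_lab(r: int, g: int, b: int):
    """Convert one 8-bit sRGB pixel to CIELAB (L*, a*, b*) under D65."""
    def linearize(c):
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = linearize(r), linearize(g), linearize(b)
    # Linear sRGB -> CIE XYZ (D65)
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    def f(t):
        return t ** (1 / 3) if t > 216 / 24389 else (24389 / 27 * t + 16) / 116

    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


def mean_a_star(pixels):
    """Average a* over an iterable of (r, g, b) fruit pixels; higher means redder."""
    values = [srgb_to_lab(r, g, b)[1] for r, g, b in pixels]
    return sum(values) / len(values)


def meets_size_spec(diameter_mm: float) -> bool:
    """Diameter window quoted above for EU export grade."""
    return 47 <= diameter_mm <= 102
```

Positive a* indicates red and negative a* indicates green, so the spread of a* across the fruit surface is a direct measure of the colour uniformity the grade requires.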

Grapes (Nashik Seedless for EU export): APEDA export protocol requires berry diameter, cluster weight, total soluble solids (Brix ≥16), and defect assessment. On-line NIR Brix measurement covers 100% of throughput without destructive sampling.

Key Takeaways

  • Indian variety-specific training data is the foundational investment: vision models trained on non-Indian produce misclassify Indian varieties at a 15-25% error rate. CFTRI and ICAR datasets are the starting points; packing house data collection is the moat.
  • Multi-label defect classification outperforms single-class approaches — defects co-occur (a bruised mango may also have lenticel browning), and treating each defect independently with sigmoid outputs allows each defect's threshold to be calibrated separately for economic optimization.
  • NIR PLSR calibration requires 200+ reference samples per commodity per origin region — a calibration built on Nashik wheat performs poorly on Punjab wheat due to variety and climate-driven composition differences. Calibration scope must match deployment scope.
  • Sorting line calibration drift is systematic and predictable — LED aging follows known degradation curves. Proactive calibration scheduling based on cumulative operating hours, plus daily tile checks, prevents the grade boundary drift that erodes product value.

This is chapter 5 of AI for Food Processing & Agri.
