Crop Yield & Quality Prediction

Satellite NDVI, Soil Sensor Fusion, and Agro-Climatic Regression for Indian Agriculture

The Prediction Stack: From Satellite to Harvest Estimate

Crop yield prediction is a multi-scale, multi-source problem. A single-sensor approach — relying only on NDVI or only on weather data — consistently underperforms ensemble methods that fuse satellite imagery, in-field sensors, and historical mandi records. The challenge in India is scale: 140 million farm holdings, average size 1.1 hectares, distributed across 15 distinct agro-climatic zones with wildly different soil profiles, rainfall regimes, and crop mixes.

The architecture that works at ICAR-NAARM and state agriculture departments follows a three-tier design:

Tier | Data Sources | Temporal Resolution | Spatial Resolution
--- | --- | --- | ---
Satellite | Sentinel-2, Landsat-8, RESOURCESAT-2 (LISS-IV) | 5-16 day revisit | 5.8-30 m/pixel
Ground sensors | Soil NPK probes, moisture sensors, weather stations | 15-min intervals | Point + interpolation
Agro-historical | eNAM prices, mandi arrivals, Agmarknet, district crop calendars | Weekly/seasonal | Tehsil level

Open data/ndvi-field-data.csv — it contains NDVI time-series for 200+ plots across Punjab wheat, Maharashtra soybean, and Andhra Pradesh rice. Each row has plot_id, date, NDVI, EVI, LAI (Leaf Area Index), rainfall_7d, and GDD (Growing Degree Days). The seasonal signature — steep NDVI rise at tillering, plateau at grain fill, rapid decline at maturation — is your primary yield signal.
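A minimal sketch of reading such a file into per-plot NDVI time series, using only the standard library. It assumes the CSV header uses exactly the field names listed above and ISO-formatted dates; the inline sample rows are made up for illustration and are not taken from the dataset.

```python
import csv
import io
from collections import defaultdict
from datetime import date

# Hypothetical sample rows; column names are assumed from the field list above.
SAMPLE_CSV = """plot_id,date,NDVI,EVI,LAI,rainfall_7d,GDD
PB_0001,2023-12-01,0.21,0.15,0.4,3.0,110
PB_0001,2023-12-11,0.35,0.24,0.9,0.0,205
PB_0001,2023-11-21,0.15,0.10,0.2,5.5,40
MH_0001,2023-07-05,0.30,0.22,0.7,42.0,310
"""

def load_ndvi_series(handle):
    """Group rows by plot_id and return {plot_id: [(date, ndvi), ...]} sorted by date."""
    grouped = defaultdict(list)
    for row in csv.DictReader(handle):
        observation = (date.fromisoformat(row["date"]), float(row["NDVI"]))
        grouped[row["plot_id"]].append(observation)
    for observations in grouped.values():
        observations.sort()  # rows are not guaranteed to arrive in date order
    return dict(grouped)

# For the real file, replace the StringIO with open("data/ndvi-field-data.csv", newline="").
series = load_ndvi_series(io.StringIO(SAMPLE_CSV))
for plot_id, observations in series.items():
    print(plot_id, [(d.isoformat(), v) for d, v in observations])
```

Sorting per plot matters because every curve-shape feature that follows depends on the observations being in time order.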

NDVI Time-Series Feature Engineering

Raw NDVI values are not directly predictive. What matters is the shape of the seasonal curve relative to baseline:

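# Core features extracted per plot per season
import numpy as np

def season_features(ndvi, das, ndvi_75th_percentile):
    # ndvi, das: equal-length arrays for one season (das = days after sowing); last entry = harvest
    # ndvi_75th_percentile: historical baseline for the plot, not a percentile of this season's own series
    # Assumes the peak is neither the first nor the last observation (otherwise the rates divide by zero)
    ndvi, das = np.asarray(ndvi, float), np.asarray(das, float)
    i_peak = int(np.argmax(ndvi))
    ndvi_peak, ndvi_base = ndvi[i_peak], ndvi[0]
    return {
        "peak_ndvi": ndvi_peak,                                    # Proxy for max biomass
        "peak_timing_das": das[i_peak],                            # Late peak → stress or late rains
        "integral_ndvi": np.sum((ndvi[1:] + ndvi[:-1]) / 2 * np.diff(das)),   # Season-long photosynthesis (trapezoid rule)
        "greenup_rate": (ndvi_peak - ndvi_base) / (das[i_peak] - das[0]),     # Crop establishment vigor
        "senescence_rate": (ndvi_peak - ndvi[-1]) / (das[-1] - das[i_peak]),  # Drydown rate, peak to harvest
        "ndvi_cv": np.std(ndvi) / np.mean(ndvi),                   # Variability across the season's observations
        "stress_days": int(np.sum(ndvi < ndvi_75th_percentile)),   # Observations below baseline (stress frequency)
    }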

The integral NDVI (sometimes called NDVI-based LAI integral or green-area duration) is consistently the highest-importance feature in gradient boosting models trained on ICAR district datasets. For wheat in Punjab, RMSE on district-level yield prediction drops from 8.2% to 3.1% when integral NDVI replaces peak NDVI alone.
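To see why the integral carries information that the peak misses, here is a small worked sketch with two hypothetical plots. The NDVI values are invented for illustration: both plots reach the same peak, but one holds its canopy through grain fill while the other senesces early.

```python
def trapezoid(values, days):
    """Trapezoidal integral of values over days (equal length, days increasing)."""
    return sum(
        (values[i] + values[i + 1]) / 2 * (days[i + 1] - days[i])
        for i in range(len(values) - 1)
    )

# Hypothetical NDVI observations every 30 days after sowing.
days = [0, 30, 60, 90, 120, 150]
plot_a = [0.15, 0.45, 0.80, 0.78, 0.60, 0.25]  # long plateau after the peak
plot_b = [0.15, 0.30, 0.80, 0.50, 0.30, 0.20]  # same peak, early senescence

for name, ndvi in (("A", plot_a), ("B", plot_b)):
    print(name, "peak:", max(ndvi), "integral:", round(trapezoid(ndvi, days), 2))
```

A model that sees only the peak cannot tell these two plots apart, whereas the integral is noticeably larger for plot A.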

Soil Sensor Fusion: NPK + Moisture + pH

India's soil health card scheme (115 million cards issued under PM Soil Health Card) has created a nation-scale soil database that historically sat in PDFs. Modern AI pipelines convert it into a geospatial raster:

Soil Parameter | Sensor Type | Prediction Target
--- | --- | ---
N (Nitrogen) | Ion-selective electrode, NIR | NDVI boost probability
P (Phosphorus) | Colorimetric probe | Root biomass proxy
K (Potassium) | Flame photometry / IoT electrode | Stress tolerance
Moisture (volumetric) | Capacitance probe, TDR | Irrigation trigger, yield drag
pH | pH electrode | Nutrient availability multiplier
EC (Electrical Conductivity) | EC probe | Salinity stress flag

Open data/soil-sensor-fusion.json — it contains point-level soil readings from 500 farms enrolled in DeHaat's precision agriculture program, linked to their Sentinel-2 plot polygons. The critical relationship to model: nitrogen depletion typically shows as NDVI decline 12-18 days after application deficit, which matches the leaf nitrogen translocation timeline.
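One way to check the deficit-to-NDVI lag on your own plots is a lagged correlation. The sketch below is an illustration under assumptions, not the method used in the dataset: it assumes both signals have been resampled to daily values, that `ndvi_drop` holds the magnitude of NDVI decline, and it uses a synthetic series built with a 15-day delay. The function names are hypothetical.

```python
def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def best_lag(n_deficit, ndvi_drop, max_lag):
    """Lag (in days) at which n_deficit[t] correlates most strongly with ndvi_drop[t + lag]."""
    scores = {}
    for lag in range(max_lag + 1):
        x = n_deficit[:len(n_deficit) - lag]
        y = ndvi_drop[lag:]
        scores[lag] = pearson(x, y)
    return max(scores, key=scores.get), scores

# Synthetic daily series: a nitrogen-deficit pulse, and an NDVI drop that repeats it 15 days later.
n_deficit = [0.0] * 10 + [1.0, 2.0, 3.0, 2.0, 1.0] + [0.0] * 35
ndvi_drop = [0.0] * 15 + n_deficit[:35]

lag, scores = best_lag(n_deficit, ndvi_drop, max_lag=25)
print("best lag (days):", lag)
```

On real plots the peak will be broader and noisier. A lag that lands well outside the 12-18 day window suggests another driver, such as water stress, is at work.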

Fusion architecture uses spatial Kriging to interpolate point readings to field-level rasters, then concatenates with satellite bands as additional channels in a CNN or as tabular features in XGBoost:

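# Kriging interpolation for NPK raster
from pykrige.ok import OrdinaryKriging

# sensor_lons, sensor_lats, N_readings: 1-D arrays of equal length (one entry per probe)
ok_N = OrdinaryKriging(
    x=sensor_lons, y=sensor_lats, z=N_readings,
    variogram_model='spherical',
    nlags=12,
    coordinates_type='geographic',  # inputs are lon/lat in degrees, so use great-circle distances
)
# grid_lon, grid_lat: 1-D arrays giving the grid axes
N_raster, N_variance = ok_N.execute('grid', grid_lon, grid_lat)
# Output: N availability map plus kriging variance; resolution follows the grid spacing
# (use a 10 m grid to align with Sentinel-2)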
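Where there are too few probes to fit a variogram, inverse distance weighting (IDW) is a simpler alternative for the same interpolation step. The following is a minimal pure-Python sketch, assuming sensor and grid coordinates are already in a projected system with units such as metres; the function name and sample readings are hypothetical.

```python
import math

def idw(sensor_points, grid_points, power=2.0):
    """Inverse distance weighting.

    sensor_points: list of (x, y, value); grid_points: list of (x, y).
    Returns one interpolated value per grid point.
    """
    interpolated = []
    for gx, gy in grid_points:
        weighted_sum = weight_total = 0.0
        for sx, sy, value in sensor_points:
            dist = math.hypot(gx - sx, gy - sy)
            if dist == 0.0:
                # Grid point coincides with a sensor: use its reading directly.
                weighted_sum, weight_total = value, 1.0
                break
            weight = dist ** -power
            weighted_sum += weight * value
            weight_total += weight
        interpolated.append(weighted_sum / weight_total)
    return interpolated

# Hypothetical nitrogen readings (kg/ha) at two probes 10 m apart.
sensors = [(0.0, 0.0, 100.0), (10.0, 0.0, 200.0)]
grid = [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0)]
print(idw(sensors, grid))
```

Unlike Kriging, IDW gives no variance estimate, so there is no per-pixel uncertainty to pass downstream.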

Weather-Crop Regression: GDD and Stress Indices

Growing Degree Day (GDD) accumulation governs crop phenology more reliably than calendar date:

# Daily GDD from maximum and minimum air temperature (°C)
def gdd_daily(t_max, t_min, t_base):
    return max(0.0, (t_max + t_min) / 2 - t_base)

# Crop-specific base temperatures for India (°C)
T_base = {"wheat": 0, "rice": 10, "cotton": 15, "sugarcane": 10}

# Cumulative GDD (°C·day) at which each stage is reached
gdd_stages = {
    "wheat": {"emergence": 120, "tillering": 350, "heading": 900, "maturity": 1500}
}
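A short sketch of how the cumulative thresholds turn a temperature record into predicted stage dates. The wheat thresholds and the 0 °C base come from the snippet above; the constant 22/8 °C temperature record is a made-up simplification, and the function names are hypothetical.

```python
def accumulate_gdd(daily_temps, t_base):
    """daily_temps: list of (t_max, t_min) in °C. Returns the running cumulative GDD."""
    total, cumulative = 0.0, []
    for t_max, t_min in daily_temps:
        total += max(0.0, (t_max + t_min) / 2 - t_base)
        cumulative.append(total)
    return cumulative

def stage_days(cumulative, thresholds):
    """First day after sowing (1-based) on which each threshold is reached, or None."""
    return {
        stage: next((i + 1 for i, gdd in enumerate(cumulative) if gdd >= threshold), None)
        for stage, threshold in thresholds.items()
    }

WHEAT_STAGES = {"emergence": 120, "tillering": 350, "heading": 900, "maturity": 1500}

# Hypothetical season: every day has T_max 22 °C and T_min 8 °C (15 GDD/day at base 0 °C).
temps = [(22.0, 8.0)] * 110
stages = stage_days(accumulate_gdd(temps, t_base=0.0), WHEAT_STAGES)
print(stages)
```

With real weather-station data, a cold spell pushes the heading date later in calendar terms while the GDD threshold stays fixed. That is why GDD tracks phenology better than calendar date.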

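Beyond GDD, the water stress index (WSI) combines irrigation and rainfall data to predict yield drag. It is defined here as the ratio of actual to potential evapotranspiration, so lower values mean more stress (the conventional crop water stress index is the complement, 1 - ET_actual/ET_potential):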

WSI = ET_actual / ET_potential   # Values < 0.75 trigger significant yield loss
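A minimal sketch of applying this ratio and the 0.75 threshold to weekly evapotranspiration totals. The weekly values are invented for illustration, and the function name is hypothetical.

```python
STRESS_THRESHOLD = 0.75  # threshold quoted above

def water_stress_index(et_actual, et_potential):
    """Ratio of actual to potential evapotranspiration; 1.0 means no water limitation."""
    if et_potential <= 0:
        raise ValueError("et_potential must be positive")
    return et_actual / et_potential

# Hypothetical weekly totals in mm: (ET_actual, ET_potential).
weeks = [(32.0, 35.0), (24.0, 36.0), (30.0, 38.0)]
flags = [water_stress_index(actual, potential) < STRESS_THRESHOLD for actual, potential in weeks]
print(flags)
```

Counting flagged weeks within a growth stage gives a simple stress-exposure feature for the regression.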

For sugarcane in Maharashtra's Marathwada region, where the assured sugarcane price (formally the Fair and Remunerative Price, FRP, which plays the role MSP plays for other crops) pushes farmers toward water-intensive cultivation, WSI-based models attribute 22% of yield variance to groundwater depth alone. This creates a policy signal: the assured price distorts crop choice in agro-climatic zones unsuited to sugarcane (rainfall < 750 mm/year), exacerbating groundwater depletion.

Indian Agro-Climatic Zones and Model Transfer

India's 15 agro-climatic zones (Planning Commission classification) have distinct soil-weather-crop interactions. A yield model trained on Indo-Gangetic Plains wheat cannot be transferred to Deccan Plateau sorghum without domain adaptation:

Zone | Key Crops | Rainfall Regime | Soil Type | Major Stress
--- | --- | --- | --- | ---
Western Himalayan | Maize, wheat, potato | Monsoon + snow melt | Alfisols, Entisols | Frost, landslide
Eastern Himalayan | Tea, cardamom, rice | High rainfall (2000-4000 mm) | Oxisols | Landslide, flooding
Lower Gangetic Plains | Jute, rice | High humidity, monsoon | Inceptisols | Flooding, submergence
Middle Gangetic Plains | Rice-wheat rotation | Bimodal rainfall | Inceptisols | Waterlogging, salinity
Upper Gangetic Plains (Uttar Pradesh) | Wheat, rice, sugarcane | Irrigated | Entisols, Aridisols | Groundwater depletion
Trans-Gangetic Plains (Punjab/Haryana) | Wheat, cotton, bajra | Low rainfall, irrigation dependent | Aridisols | Water stress
Eastern Plateau (Chhattisgarh) | Rice, millets | Moderate monsoon | Alfisols, Ultisols | Drought, micro-nutrient deficiency
Central Plateau (MP) | Soybean, wheat, cotton | Variable | Vertisols | Soil shrink-swell, drought
Western Plateau (Maharashtra) | Soybean, sugarcane, jowar | Low and erratic | Vertisols | Drought, input cost
Southern Plateau (Deccan) | Cotton, groundnut, sunflower | Low, bimodal | Alfisols | Drought
East Coast Plains | Rice, groundnut, horticulture | Cyclone-prone | Inceptisols, Entisols | Cyclone, salinity
West Coast Plains (Kerala/Goa) | Coconut, rubber, spices | Very high rainfall | Oxisols, Ultisols | Flooding, waterlogging
Gujarat Plains | Groundnut, cotton, wheat | Low and variable | Aridisols, Entisols | Drought, salinity
Western Dry (Rajasthan) | Bajra, moth bean | Very low | Aridisols | Severe drought
Islands | Coconut, spices | Tropical, humid | Entisols | Cyclone, salinity

Transfer learning strategy: pre-train on pan-India historical district yield data (40+ years from the Directorate of Economics and Statistics, DES), then fine-tune on zone-specific sensor data. ICAR-NAARM's implementation uses a shared encoder for spectral features with zone-specific heads for yield regression.
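The shared-encoder, zone-specific-head layout can be sketched in a few lines of plain Python. This is a toy illustration of the structure only, not ICAR-NAARM's implementation. The class name, the layer sizes, the fixed encoder weights standing in for pre-training, and the choice to freeze the encoder during fine-tuning are all assumptions made for the example.

```python
class ZoneYieldModel:
    """Shared linear+ReLU encoder with one linear regression head per zone (toy sizes)."""

    def __init__(self, n_features, n_hidden, zones):
        # Fixed weights stand in for a pre-trained encoder and keep the sketch deterministic.
        self.encoder = [[0.1 * (i + j + 1) for j in range(n_features)] for i in range(n_hidden)]
        self.heads = {zone: [0.0] * n_hidden for zone in zones}

    def encode(self, x):
        return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in self.encoder]

    def predict(self, x, zone):
        return sum(w * h for w, h in zip(self.heads[zone], self.encode(x)))

    def fine_tune_step(self, x, y, zone, lr=0.5):
        """One SGD step on squared error; only the chosen zone's head is updated."""
        hidden = self.encode(x)
        error = self.predict(x, zone) - y
        self.heads[zone] = [w - lr * 2 * error * h for w, h in zip(self.heads[zone], hidden)]
        return error ** 2

model = ZoneYieldModel(n_features=3, n_hidden=2, zones=["trans_gangetic", "western_plateau"])
encoder_before = [row[:] for row in model.encoder]

# Hypothetical sample: three spectral features and an observed yield of 45 q/ha.
x, y = [0.8, 0.4, 0.2], 45.0
losses = [model.fine_tune_step(x, y, "trans_gangetic") for _ in range(50)]
print(round(model.predict(x, "trans_gangetic"), 2), model.heads["western_plateau"])
```

Fine-tuning one zone leaves the other heads and the shared encoder untouched, which is what lets data-poor zones reuse spectral features learned from data-rich ones.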

MSP Impact on Crop Choice Modeling

The Minimum Support Price system creates feedback loops that AI models must capture: when wheat MSP rises relative to bajra MSP, Punjab/Haryana farmers shift area from coarse grains to wheat, impacting groundwater use, stubble burning (and PM 2.5 events in Delhi), and input demand.

Prompt: "Given NDVI time-series for plot_id PB_2234 showing peak NDVI 0.82 at DAS 95,
integral NDVI 42.3 over 150 days, soil NPK [185, 28, 145 kg/ha], rainfall deficit
-40mm vs normal in Jan-Feb, and 2024 wheat MSP at ₹2275/quintal — estimate:
(1) expected yield in quintals/hectare with confidence interval
(2) probability this farmer switches to bajra next season if MSP gap narrows to <₹200
(3) recommended urea top-dress date to recover yield deficit."

Key Takeaways

  • Integral NDVI consistently outperforms peak NDVI for yield prediction — it captures the full season's photosynthetic activity, not just the maximum greenness moment.
  • Soil sensor fusion requires spatial interpolation — raw point readings need Kriging or IDW to produce field-level rasters that align with satellite imagery.
  • MSP-driven crop choice is a systemic model input — ignoring policy signals produces yield forecasts that miss the area-allocation shifts that dominate district-level production variance.
  • Zone-specific transfer learning is essential — a single national model underperforms zone-tuned models by 30-40% on RMSE, primarily because soil-rainfall interactions differ dramatically across India's 15 agro-climatic zones.
  • This is chapter 1 of AI for Food Processing & Agri.
