
Crop Yield & Quality Prediction

Satellite NDVI, Soil Sensor Fusion, and Agro-Climatic Regression for Indian Agriculture

The Prediction Stack: From Satellite to Harvest Estimate

Crop yield prediction is a multi-scale, multi-source problem. A single-sensor approach — relying only on NDVI or only on weather data — consistently underperforms ensemble methods that fuse satellite imagery, in-field sensors, and historical mandi records. The challenge in India is scale: 140 million farm holdings, average size 1.1 hectares, distributed across 15 distinct agro-climatic zones with wildly different soil profiles, rainfall regimes, and crop mixes.

The architecture that works at ICAR-NAARM and state agriculture departments follows a three-tier design:

Tier	Data Sources	Temporal Resolution	Spatial Resolution
Satellite	Sentinel-2, Landsat-8, RESOURCESAT-2 (LISS-IV)	5-10 days revisit	10-56m/pixel
Ground sensors	Soil NPK probes, moisture sensors, weather stations	15-min intervals	Point + interpolation
Agro-historical	eNAM prices, mandi arrivals, Agmarknet, district crop calendars	Weekly/seasonal	Tehsil level
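The three tiers report at very different cadences, so a fusion pipeline first has to bring them onto a common time step. A minimal sketch, assuming hypothetical 15-minute soil-moisture readings and a single satellite overpass date, averages the sensor values over the days leading up to the overpass. The function name, the 5-day window, and the sample values are illustrative assumptions, not part of any named pipeline.

```python
from datetime import datetime, timedelta

def mean_before_overpass(readings, overpass, window_days=5):
    """Average sensor readings in the window_days before a satellite overpass.

    readings: list of (datetime, value) tuples, e.g. 15-minute soil moisture.
    overpass: datetime of the satellite acquisition.
    Returns None when no reading falls inside the window.
    """
    start = overpass - timedelta(days=window_days)
    values = [v for t, v in readings if start <= t <= overpass]
    return sum(values) / len(values) if values else None

# Hypothetical readings: one value every 15 minutes for two days.
t0 = datetime(2024, 1, 1)
readings = [(t0 + timedelta(minutes=15 * i), 0.20 + 0.0001 * i) for i in range(192)]
overpass = datetime(2024, 1, 3)
moisture_at_overpass = mean_before_overpass(readings, overpass)
```

The same windowing idea would apply to weekly mandi data, with the window widened to match the coarser cadence.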

Open data/ndvi-field-data.csv — it contains NDVI time-series for 200+ plots across Punjab wheat, Maharashtra soybean, and Andhra Pradesh rice. Each row has plot_id, date, NDVI, EVI, LAI (Leaf Area Index), rainfall_7d, and GDD (Growing Degree Days). The seasonal signature — steep NDVI rise at tillering, plateau at grain fill, rapid decline at maturation — is your primary yield signal.
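For readers new to the NDVI and EVI columns, the sketch below shows how both indices are derived from surface reflectance using their standard formulas. The reflectance values are illustrative assumptions for a dense canopy, not rows from the dataset; for Sentinel-2 the bands involved are B8 (near-infrared), B4 (red), and B2 (blue).

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index from surface reflectance (0-1)."""
    denom = nir + red
    return (nir - red) / denom if denom != 0 else 0.0

def evi(nir, red, blue):
    """Enhanced Vegetation Index with the standard coefficients (G=2.5, C1=6, C2=7.5, L=1)."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)

# Illustrative reflectances for a dense crop canopy (Sentinel-2 B8, B4, B2).
sample_ndvi = ndvi(nir=0.45, red=0.05)
sample_evi = evi(nir=0.45, red=0.05, blue=0.03)
```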

NDVI Time-Series Feature Engineering

Raw NDVI values are not directly predictive. What matters is the shape of the seasonal curve relative to baseline:

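# Core features extracted per plot per season.
# Assumes ndvi and das (days after sowing) are equal-length NumPy arrays for one season,
# with the last observation taken at harvest, and that ndvi_75th_percentile is a
# precomputed reference (for example from the plot's multi-year baseline).
import numpy as np

i_peak = int(np.argmax(ndvi))
features = {
    "peak_ndvi": ndvi[i_peak],                                               # Proxy for max biomass
    "peak_timing_das": das[i_peak],                                          # Late peak → stress or late rains
    "integral_ndvi": np.sum((ndvi[1:] + ndvi[:-1]) / 2 * np.diff(das)),      # Season-long photosynthesis (trapezoid rule)
    "greenup_rate": (ndvi[i_peak] - ndvi[0]) / (das[i_peak] - das[0]),       # Crop establishment vigor
    "senescence_rate": (ndvi[i_peak] - ndvi[-1]) / (das[-1] - das[i_peak]),  # Drydown rate, peak to harvest
    "ndvi_cv": np.std(ndvi) / np.mean(ndvi),                                 # Variability across the season
    "stress_days": int(np.sum(ndvi < ndvi_75th_percentile)),                 # Observations below the reference level
}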

The integral NDVI (sometimes called NDVI-based LAI integral or green-area duration) is consistently the highest-importance feature in gradient boosting models trained on ICAR district datasets. For wheat in Punjab, RMSE on district-level yield prediction drops from 8.2% to 3.1% when integral NDVI replaces peak NDVI alone.
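One way to see why the integral carries more information than the peak is to compare two seasons that reach the same maximum greenness but differ in how long the canopy stays green. The sketch below uses two synthetic NDVI curves (assumed values, not ICAR data) and a hand-written trapezoid rule.

```python
def trapezoid(y, x):
    """Trapezoidal-rule integral of y over x."""
    return sum((y[i] + y[i + 1]) / 2 * (x[i + 1] - x[i]) for i in range(len(x) - 1))

das = [0, 30, 60, 90, 120, 150]                  # days after sowing
healthy = [0.15, 0.45, 0.75, 0.80, 0.70, 0.30]   # long grain-fill plateau
stressed = [0.15, 0.45, 0.75, 0.80, 0.40, 0.20]  # same peak, early senescence

peak_gap = max(healthy) - max(stressed)                             # peak cannot tell them apart
integral_gap = trapezoid(healthy, das) - trapezoid(stressed, das)   # integral can
```

Here the peak difference is zero while the integrals differ by 10.5 NDVI-days, which is the kind of late-season signal a peak-only feature discards.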

Soil Sensor Fusion: NPK + Moisture + pH

India's soil health card scheme (115 million cards issued under PM Soil Health Card) has created a nation-scale soil database that historically sat in PDFs. Modern AI pipelines convert it into a geospatial raster:

Soil Parameter	Sensor Type	Prediction Target
N (Nitrogen)	Ion-selective electrode, NIR	NDVI boost probability
P (Phosphorus)	Colorimetric probe	Root biomass proxy
K (Potassium)	Flame photometry / IoT electrode	Stress tolerance
Moisture (volumetric)	Capacitance probe, TDR	Irrigation trigger, yield drag
pH	pH electrode	Nutrient availability multiplier
EC (Electrical Conductivity)	EC probe	Salinity stress flag

Open data/soil-sensor-fusion.json — it contains point-level soil readings from 500 farms enrolled in DeHaat's precision agriculture program, linked to their Sentinel-2 plot polygons. The critical relationship to model: nitrogen depletion typically shows as NDVI decline 12-18 days after application deficit, which matches the leaf nitrogen translocation timeline.
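A simple way to check a lag relationship like this on your own data is a lagged correlation scan. The sketch below is a minimal illustration on synthetic data: the soil nitrogen series is random and the NDVI series is constructed to respond exactly 15 days later, so the scan should recover that lag. The function names and the 30-day search range are assumptions for illustration.

```python
import random

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def best_lag(driver, response, max_lag):
    """Return (lag, scores): the lag in samples where driver best correlates with response."""
    scores = {}
    for lag in range(max_lag + 1):
        d = driver[:len(driver) - lag]   # driver values at time t - lag
        r = response[lag:]               # response values at time t
        scores[lag] = pearson(d, r)
    return max(scores, key=scores.get), scores

# Synthetic daily series: soil N (kg/ha) and an NDVI that follows it 15 days later.
rng = random.Random(42)
soil_n = [rng.uniform(150, 250) for _ in range(120)]
ndvi = [0.3 + 0.002 * soil_n[max(t - 15, 0)] for t in range(120)]

lag, scores = best_lag(soil_n, ndvi, max_lag=30)
```

On real plots the correlation peak will be broader and noisier, so the scan is better read as a range of plausible lags than as a single value.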

Fusion architecture uses spatial Kriging to interpolate point readings to field-level rasters, then concatenates with satellite bands as additional channels in a CNN or as tabular features in XGBoost:

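# Kriging interpolation for NPK raster
from pykrige.ok import OrdinaryKriging

ok_N = OrdinaryKriging(
    x=sensor_lons, y=sensor_lats, z=N_readings,
    variogram_model='spherical',
    nlags=12,
    coordinates_type='geographic'   # inputs are lon/lat in degrees, not planar metres
)
# grid_lon and grid_lat are 1-D arrays of target cell centres
N_raster, N_variance = ok_N.execute('grid', grid_lon, grid_lat)
# Output: N availability map and kriging variance, each shaped (len(grid_lat), len(grid_lon)).
# The map aligns with the 10m Sentinel-2 grid only if grid_lon/grid_lat are built from that grid.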
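Where a variogram cannot be fitted reliably (for example with only a handful of probes per field), inverse distance weighting (IDW) is the simpler alternative named in this chapter's takeaways. The sketch below is a plain-Python illustration under assumed conditions: four hypothetical sensors at the corners of a 100 m field, coordinates already projected to metres, and a 10 m output grid.

```python
def idw(x, y, samples, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    samples: list of (sx, sy, value) in a projected coordinate system (metres).
    """
    num = den = 0.0
    for sx, sy, value in samples:
        d2 = (x - sx) ** 2 + (y - sy) ** 2
        if d2 == 0:
            return value              # exactly on a sensor: return its reading
        w = 1.0 / d2 ** (power / 2)   # weight = 1 / distance^power
        num += w * value
        den += w
    return num / den

# Hypothetical nitrogen readings (kg/ha) at the four corners of the field.
sensors = [(0, 0, 180.0), (100, 0, 200.0), (0, 100, 160.0), (100, 100, 220.0)]

# 11 x 11 grid of 10 m cells, row index = y, column index = x.
grid = [[idw(gx, gy, sensors) for gx in range(0, 101, 10)] for gy in range(0, 101, 10)]
```

Unlike Kriging, IDW gives no variance estimate, so it offers no direct measure of where the interpolated map is least trustworthy.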

Weather-Crop Regression: GDD and Stress Indices

Growing Degree Day (GDD) accumulation governs crop phenology more reliably than calendar date:

def gdd_daily(t_max, t_min, t_base):
    """Daily growing degree days (°C·day), floored at zero."""
    return max(0.0, (t_max + t_min) / 2 - t_base)

# Crop-specific base temperatures for India, in °C
T_base = {"wheat": 0, "rice": 10, "cotton": 15, "sugarcane": 10}

# Cumulative GDD triggers (°C·day)
gdd_stages = {
    "wheat": {"emergence": 120, "tillering": 350, "heading": 900, "maturity": 1500}
}
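The thresholds above can be turned into predicted stage dates by accumulating daily GDD until each trigger is reached. The sketch below assumes a constant hypothetical temperature regime (24 °C maximum, 10 °C minimum) purely to keep the arithmetic easy to follow; the function and constant names are illustrative.

```python
T_BASE_C = {"wheat": 0.0, "rice": 10.0, "cotton": 15.0, "sugarcane": 10.0}
GDD_STAGES = {
    "wheat": {"emergence": 120, "tillering": 350, "heading": 900, "maturity": 1500}
}

def daily_gdd(t_max, t_min, t_base):
    """Daily growing degree days, floored at zero."""
    return max(0.0, (t_max + t_min) / 2 - t_base)

def stage_days(temps, crop):
    """Map each stage to the first day after sowing on which cumulative GDD reaches its trigger.

    temps: list of (t_max, t_min) in °C, one tuple per day starting the day after sowing.
    """
    reached = {}
    total = 0.0
    for day, (t_max, t_min) in enumerate(temps, start=1):
        total += daily_gdd(t_max, t_min, T_BASE_C[crop])
        for stage, needed in GDD_STAGES[crop].items():
            if stage not in reached and total >= needed:
                reached[stage] = day
    return reached

# Hypothetical season: 17 °C daily mean, so wheat gains 17 GDD per day.
temps = [(24.0, 10.0)] * 120
stages = stage_days(temps, "wheat")
```

With 17 GDD per day, emergence falls on day 8, tillering on day 21, heading on day 53, and maturity on day 89.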

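Beyond GDD, the water stress index (WSI) combines irrigation and rainfall data to predict yield drag. It is defined here as the ratio of actual to potential evapotranspiration, so lower values mean more stress: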

WSI = ET_actual / ET_potential   # Values < 0.75 trigger significant yield loss
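As a small worked example, the ratio can be computed per period and compared against the 0.75 threshold. The weekly evapotranspiration figures below are made-up values in millimetres, and the function names are assumptions for illustration.

```python
def water_stress_index(et_actual, et_potential):
    """WSI as the ratio of actual to potential evapotranspiration."""
    if et_potential <= 0:
        raise ValueError("et_potential must be positive")
    return et_actual / et_potential

def stressed_periods(et_pairs, threshold=0.75):
    """Indices of periods whose WSI falls below the threshold."""
    return [
        i for i, (eta, etp) in enumerate(et_pairs)
        if water_stress_index(eta, etp) < threshold
    ]

# Hypothetical weekly (ET_actual, ET_potential) pairs in mm.
weekly_et_mm = [(30, 32), (28, 35), (20, 36), (18, 38), (33, 36)]
flags = stressed_periods(weekly_et_mm)
```

In this series the third and fourth weeks (indices 2 and 3) fall below 0.75 and would be flagged.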

For sugarcane in Maharashtra's Marathwada region, where price-support incentives (for sugarcane, the Fair and Remunerative Price rather than a Minimum Support Price, MSP) push farmers toward water-intensive cultivation, WSI-based models show 22% of yield variance explained by groundwater depth alone. This creates a policy signal: assured sugarcane prices distort crop choice in agro-climatic zones unsuited to the crop (rainfall < 750mm/year), exacerbating groundwater depletion.

Indian Agro-Climatic Zones and Model Transfer

India's 15 agro-climatic zones (Planning Commission classification) have distinct soil-weather-crop interactions. A yield model trained on Indo-Gangetic Plains wheat cannot be transferred to Deccan Plateau sorghum without domain adaptation:

Zone	Key Crops	Rainfall Regime	Soil Type	Major Stress
Western Himalayan	Maize, wheat, potato	Monsoon + snow melt	Alfisols, Entisols	Frost, landslide
Eastern Himalayan	Tea, cardamom, rice	High rainfall (2000-4000mm)	Oxisols	Landslide, flooding
Lower Gangetic Plains	Jute, rice	High humidity, monsoon	Inceptisols	Flooding, submergence
Middle Gangetic Plains	Rice-wheat rotation	Bimodal rainfall	Inceptisols	Waterlogging, salinity
Upper Gangetic Plains (Uttar Pradesh)	Wheat, rice, sugarcane	Irrigated	Entisols, Aridisols	Groundwater depletion
Trans-Gangetic Plains (Punjab/Haryana)	Wheat, cotton, bajra	Low rainfall, irrigation dependent	Aridisols	Water stress
Eastern Plateau (Chhattisgarh)	Rice, millets	Moderate monsoon	Alfisols, Ultisols	Drought, micro-nutrient deficiency
Central Plateau (MP)	Soybean, wheat, cotton	Variable	Vertisols	Soil shrink-swell, drought
Western Plateau (Maharashtra)	Soybean, sugarcane, jowar	Low and erratic	Vertisols	Drought, input cost
Southern Plateau (Deccan)	Cotton, groundnut, sunflower	Low, bimodal	Alfisols	Drought
East Coast Plains	Rice, groundnut, horticulture	Cyclone-prone	Inceptisols, Entisols	Cyclone, salinity
West Coast Plains (Kerala/Goa)	Coconut, rubber, spices	Very high rainfall	Oxisols, Ultisols	Flooding, waterlogging
Gujarat Plains	Groundnut, cotton, wheat	Low and variable	Aridisols, Entisols	Drought, salinity
Western Dry (Rajasthan)	Bajra, moth bean	Very low	Aridisols	Severe drought
Islands	Coconut, spices	Tropical, humid	Entisols	Cyclone, salinity

Transfer learning strategy: pre-train on pan-India historical district yield data (40+ years from the Directorate of Economics and Statistics, DES), then fine-tune on zone-specific sensor data. ICAR-NAARM's implementation uses a shared encoder for spectral features with zone-specific heads for yield regression.
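The shared-encoder, zone-specific-head pattern can be illustrated with a deliberately tiny linear stand-in. This is not ICAR-NAARM's implementation: the encoder below is a frozen linear projection with hypothetical weights standing in for pre-training, each head is a one-dimensional least-squares fit, and all feature and yield values are synthetic.

```python
def encode(features, weights):
    """Shared encoder: a frozen linear projection to a single embedding value."""
    return sum(w * f for w, f in zip(weights, features))

def fit_head(embeddings, yields):
    """Zone-specific head: 1-D ordinary least squares, returns (slope, intercept)."""
    n = len(embeddings)
    mz, my = sum(embeddings) / n, sum(yields) / n
    sxx = sum((z - mz) ** 2 for z in embeddings)
    sxy = sum((z - mz) * (y - my) for z, y in zip(embeddings, yields))
    slope = sxy / sxx
    return slope, my - slope * mz

# Hypothetical weights for [integral_ndvi, peak_ndvi]; assumed to come from pre-training.
SHARED_WEIGHTS = [0.5, 10.0]

# Synthetic per-zone samples: (feature rows, yields in quintals/hectare).
zone_data = {
    "trans_gangetic": ([[90.0, 0.85], [80.0, 0.80], [70.0, 0.70]], [50.0, 44.5, 38.5]),
    "western_plateau": ([[60.0, 0.70], [50.0, 0.60], [40.0, 0.50]], [20.5, 17.5, 14.5]),
}

# Fine-tuning step: the encoder stays fixed, only the per-zone heads are fitted.
heads = {}
for zone, (rows, yields) in zone_data.items():
    embeddings = [encode(row, SHARED_WEIGHTS) for row in rows]
    heads[zone] = fit_head(embeddings, yields)

def predict(zone, features):
    """Predict yield by passing the shared embedding through the zone's own head."""
    slope, intercept = heads[zone]
    return slope * encode(features, SHARED_WEIGHTS) + intercept

example = predict("western_plateau", [55.0, 0.65])
```

The design point is that the same spectral embedding maps to different yields per zone, so each zone learns its own slope and intercept while the expensive encoder is trained once.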

MSP Impact on Crop Choice Modeling

The Minimum Support Price system creates feedback loops that AI models must capture: when wheat MSP rises relative to bajra MSP, Punjab/Haryana farmers shift area from coarse grains to wheat, which affects groundwater use, stubble burning (and PM2.5 events in Delhi), and input demand.

Prompt: "Given NDVI time-series for plot_id PB_2234 showing peak NDVI 0.82 at DAS 95,
integral NDVI 42.3 over 150 days, soil NPK [185, 28, 145 kg/ha], rainfall deficit
-40mm vs normal in Jan-Feb, and 2024 wheat MSP at ₹2275/quintal — estimate:
(1) expected yield in quintals/hectare with confidence interval
(2) probability this farmer switches to bajra next season if MSP gap narrows to <₹200
(3) recommended urea top-dress date to recover yield deficit."
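In a pipeline, a prompt like this would normally be assembled from the extracted features rather than typed by hand. The sketch below shows one possible template; the dictionary keys and the function name are hypothetical, and the values are copied from the example prompt above.

```python
def build_yield_prompt(plot):
    """Format per-plot features into a yield-estimation prompt string."""
    n, p, k = plot["npk_kg_ha"]
    return (
        f"Given NDVI time-series for plot_id {plot['plot_id']} showing peak NDVI "
        f"{plot['peak_ndvi']:.2f} at DAS {plot['peak_das']}, integral NDVI "
        f"{plot['integral_ndvi']:.1f} over {plot['season_days']} days, "
        f"soil NPK [{n}, {p}, {k} kg/ha], rainfall deficit "
        f"{plot['rain_deficit_mm']}mm vs normal, and wheat MSP at "
        f"₹{plot['msp_inr_per_quintal']}/quintal, estimate expected yield in "
        "quintals/hectare with a confidence interval."
    )

plot = {
    "plot_id": "PB_2234",
    "peak_ndvi": 0.82,
    "peak_das": 95,
    "integral_ndvi": 42.3,
    "season_days": 150,
    "npk_kg_ha": (185, 28, 145),
    "rain_deficit_mm": -40,
    "msp_inr_per_quintal": 2275,
}
prompt = build_yield_prompt(plot)
```

Keeping the template in code makes it straightforward to run the same question across every plot and to version the wording alongside the feature definitions.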

Key Takeaways

Integral NDVI consistently outperforms peak NDVI for yield prediction — it captures the full season's photosynthetic activity, not just the maximum greenness moment.

Soil sensor fusion requires spatial interpolation — raw point readings need Kriging or IDW to produce field-level rasters that align with satellite imagery.

MSP-driven crop choice is a systemic model input — ignoring policy signals produces yield forecasts that miss the area-allocation shifts that dominate district-level production variance.

Zone-specific transfer learning is essential — a single national model underperforms zone-tuned models by 30-40% on RMSE, primarily because soil-rainfall interactions differ dramatically across India's 15 agro-climatic zones.

This is chapter 1 of AI for Food Processing & Agri.
