Crop Yield & Quality Prediction
Satellite NDVI, Soil Sensor Fusion, and Agro-Climatic Regression for Indian Agriculture
The Prediction Stack: From Satellite to Harvest Estimate
Crop yield prediction is a multi-scale, multi-source problem. A single-sensor approach — relying only on NDVI or only on weather data — consistently underperforms ensemble methods that fuse satellite imagery, in-field sensors, and historical mandi records. The challenge in India is scale: 140 million farm holdings, average size 1.1 hectares, distributed across 15 distinct agro-climatic zones with wildly different soil profiles, rainfall regimes, and crop mixes.
The architecture that works at ICAR-NAARM and state agriculture departments follows a three-tier design:
| Tier | Data Sources | Temporal Resolution | Spatial Resolution |
|---|---|---|---|
| Satellite | Sentinel-2, Landsat-8, RESOURCESAT-2 (LISS-IV) | 5-24 days revisit (sensor-dependent) | 5.8-30 m/pixel |
| Ground sensors | Soil NPK probes, moisture sensors, weather stations | 15-min intervals | Point + interpolation |
| Agro-historical | eNAM prices, mandi arrivals, Agmarknet, district crop calendars | Weekly/seasonal | Tehsil level |
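Because the tiers arrive at different temporal resolutions, they have to be brought onto a common time key before fusion. The following is a minimal sketch of one way to do that, assuming 15-minute soil-moisture readings are averaged per day and joined to satellite overpass dates. The function names (`daily_means`, `join_on_overpass`) and all sample values are hypothetical and not part of the course data.

```python
from collections import defaultdict
from datetime import date, datetime
from statistics import mean

def daily_means(readings):
    """Collapse (timestamp, value) sensor readings into {date: mean value}."""
    buckets = defaultdict(list)
    for ts, value in readings:
        buckets[ts.date()].append(value)
    return {day: mean(values) for day, values in buckets.items()}

def join_on_overpass(ndvi_by_date, sensor_daily):
    """Attach the same-day sensor mean to each satellite observation.

    Overpass dates with no sensor data get None rather than a guessed value.
    """
    return [
        {"date": day, "ndvi": ndvi, "soil_moisture": sensor_daily.get(day)}
        for day, ndvi in sorted(ndvi_by_date.items())
    ]

# Hypothetical readings: four 15-minute soil-moisture samples on one day
readings = [
    (datetime(2024, 1, 5, 6, 0), 0.30),
    (datetime(2024, 1, 5, 6, 15), 0.32),
    (datetime(2024, 1, 5, 6, 30), 0.34),
    (datetime(2024, 1, 5, 6, 45), 0.36),
]
# Hypothetical NDVI values on two overpass dates
ndvi_by_date = {date(2024, 1, 5): 0.61, date(2024, 1, 10): 0.68}

rows = join_on_overpass(ndvi_by_date, daily_means(readings))
```

Leaving missing days as `None` keeps the gap visible to the downstream model instead of hiding it behind an interpolated value; whether to impute is a separate modelling decision.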
Open data/ndvi-field-data.csv — it contains NDVI time-series for 200+ plots across Punjab wheat, Maharashtra soybean, and Andhra Pradesh rice. Each row has plot_id, date, NDVI, EVI, LAI (Leaf Area Index), rainfall_7d, and GDD (Growing Degree Days). The seasonal signature — steep NDVI rise at tillering, plateau at grain fill, rapid decline at maturation — is your primary yield signal.
NDVI Time-Series Feature Engineering
Raw NDVI values are not directly predictive. What matters is the shape of the seasonal curve relative to baseline:
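```python
import numpy as np

# Core features extracted per plot per season.
# ndvi: NDVI values from sowing to harvest; das: matching days after sowing (DAS).
# ndvi_75th_percentile: reference threshold, assumed to come from a baseline.
def ndvi_features(ndvi, das, ndvi_75th_percentile):
    ndvi, das = np.asarray(ndvi, dtype=float), np.asarray(das, dtype=float)
    i_peak = int(np.argmax(ndvi))
    rise, fall = das[i_peak] - das[0], das[-1] - das[i_peak]  # both must be > 0
    return {
        "peak_ndvi": ndvi[i_peak],                            # Proxy for max biomass
        "peak_timing_das": das[i_peak],                       # Late peak → stress or late rains
        "integral_ndvi": np.sum((ndvi[1:] + ndvi[:-1]) / 2 * np.diff(das)),  # Trapezoidal rule: season-long photosynthesis
        "greenup_rate": (ndvi[i_peak] - ndvi[0]) / rise,      # Crop establishment vigor
        "senescence_rate": (ndvi[i_peak] - ndvi[-1]) / fall,  # Drydown rate
        "ndvi_cv": ndvi.std() / ndvi.mean(),                  # Coefficient of variation over the season
        "stress_days": int(np.sum(ndvi < ndvi_75th_percentile)),  # Observations below the threshold
    }
```
The integral NDVI (sometimes called the NDVI-based LAI integral or green-area duration) is consistently the highest-importance feature in gradient boosting models trained on ICAR district datasets. For wheat in Punjab, RMSE on district-level yield prediction drops from 8.2% to 3.1% when integral NDVI is used instead of peak NDVI alone.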
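To see why the integral can carry information that the peak misses, consider two seasons that reach the same peak NDVI but differ in how long the canopy stays green. The sketch below uses made-up NDVI values sampled every 30 days; the series and the `trapezoid` helper are illustrative assumptions, not course data.

```python
def trapezoid(values, days):
    """Trapezoidal integral of values sampled at the given days."""
    return sum(
        (values[i] + values[i + 1]) / 2 * (days[i + 1] - days[i])
        for i in range(len(values) - 1)
    )

# Hypothetical NDVI curves sampled every 30 days after sowing
days = [0, 30, 60, 90, 120, 150]
long_season = [0.2, 0.5, 0.8, 0.8, 0.6, 0.3]   # broad plateau through grain fill
short_season = [0.2, 0.3, 0.8, 0.5, 0.3, 0.2]  # same peak, early senescence

peak_long, peak_short = max(long_season), max(short_season)
area_long = trapezoid(long_season, days)
area_short = trapezoid(short_season, days)

print(f"peaks: {peak_long} vs {peak_short}")
print(f"integrals: {area_long:.1f} vs {area_short:.1f}")
```

Both curves have an identical `peak_ndvi`, so a peak-only model cannot separate them, while the integral differs substantially between the two.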
Soil Sensor Fusion: NPK + Moisture + pH
India's soil health card scheme (115 million cards issued under PM Soil Health Card) has created a nation-scale soil database that historically sat in PDFs. Modern AI pipelines convert it into a geospatial raster:
| Soil Parameter | Sensor Type | Prediction Target |
|---|---|---|
| N (Nitrogen) | Ion-selective electrode, NIR | NDVI boost probability |
| P (Phosphorus) | Colorimetric probe | Root biomass proxy |
| K (Potassium) | Flame photometry / IoT electrode | Stress tolerance |
| Moisture (volumetric) | Capacitance probe, TDR | Irrigation trigger, yield drag |
| pH | pH electrode | Nutrient availability multiplier |
| EC (Electrical Conductivity) | EC probe | Salinity stress flag |
Open data/soil-sensor-fusion.json — it contains point-level soil readings from 500 farms enrolled in DeHaat's precision agriculture program, linked to their Sentinel-2 plot polygons. The critical relationship to model: nitrogen depletion typically shows as NDVI decline 12-18 days after application deficit, which matches the leaf nitrogen translocation timeline.
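One hedged way to check the lag between a nitrogen deficit and the NDVI response in your own data is a lagged correlation scan. The sketch below assumes both series have been resampled to daily values and uses a synthetic example in which a 15-day lag is built in deliberately; `pearson` and `best_lag` are hypothetical helper names, and the numbers are not from the DeHaat dataset.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def best_lag(soil_n, ndvi, lags):
    """Return (lag, scores): the lag in days where soil N best tracks later NDVI."""
    scores = {
        # Pair soil N on day t with NDVI on day t + lag
        lag: pearson(soil_n[: len(soil_n) - lag], ndvi[lag:])
        for lag in lags
    }
    return max(scores, key=scores.get), scores

# Synthetic daily soil N (kg/ha): slow depletion plus a fluctuation
soil_n = [180 + 25 * math.sin(0.2 * t) - 0.3 * t for t in range(90)]
# Synthetic NDVI that responds to soil N exactly 15 days later (assumed lag)
ndvi = [0.5] * 15 + [0.3 + 0.002 * n for n in soil_n[:75]]

lag, scores = best_lag(soil_n, ndvi, range(10, 21))
print(f"best lag: {lag} days")
```

On real plots the peak will be much flatter than in this synthetic case, so the recovered lag is better read as a window than as a single day.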
Fusion architecture uses spatial Kriging to interpolate point readings to field-level rasters, then concatenates with satellite bands as additional channels in a CNN or as tabular features in XGBoost:
```python
# Kriging interpolation for NPK raster
from pykrige.ok import OrdinaryKriging

# sensor_lons, sensor_lats, N_readings: 1-D arrays of sensor positions and nitrogen values
ok_N = OrdinaryKriging(
    x=sensor_lons, y=sensor_lats, z=N_readings,
    variogram_model='spherical',
    nlags=12,
)
# grid_lon, grid_lat: 1-D axes of the target grid
N_raster, N_variance = ok_N.execute('grid', grid_lon, grid_lat)
# Output: 10 m resolution N availability map aligned with the Sentinel-2 grid
```
Weather-Crop Regression: GDD and Stress Indices
Growing Degree Day (GDD) accumulation governs crop phenology more reliably than calendar date:
```python
# Crop-specific base temperatures for India (°C)
T_base = {"wheat": 0, "rice": 10, "cotton": 15, "sugarcane": 10}

def gdd_daily(T_max, T_min, crop):
    """Daily GDD from max/min air temperature (°C), clipped at zero."""
    return max(0, (T_max + T_min) / 2 - T_base[crop])

# Cumulative GDD thresholds that trigger each stage
gdd_stages = {
    "wheat": {"emergence": 120, "tillering": 350, "heading": 900, "maturity": 1500}
}
```
Beyond GDD, the water stress index (WSI) integrates irrigation data with rainfall to predict yield drag:
```python
WSI = ET_actual / ET_potential  # Values < 0.75 trigger significant yield loss
```
For sugarcane in Maharashtra's Marathwada region, where Minimum Support Price (MSP) incentives push farmers toward water-intensive cultivation, WSI-based models attribute 22% of yield variance to groundwater depth alone. This creates a policy signal: MSP for sugarcane distorts crop choice in agro-climatic zones unsuited for it (rainfall < 750mm/year), exacerbating groundwater depletion.
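The GDD thresholds and the WSI cutoff above can be combined into a small phenology and stress check. This is a sketch under stated assumptions: the base temperatures, wheat stage thresholds, and the 0.75 cutoff are taken from this section, while the function names (`current_stage`, `water_stressed`) and the temperature and ET values are hypothetical.

```python
# Base temperatures (°C) and wheat stage thresholds as given in this section
T_BASE_C = {"wheat": 0.0, "rice": 10.0, "cotton": 15.0, "sugarcane": 10.0}
WHEAT_STAGES = {"emergence": 120, "tillering": 350, "heading": 900, "maturity": 1500}

def gdd_daily(t_max, t_min, t_base):
    """Daily growing degree days, clipped at zero."""
    return max(0.0, (t_max + t_min) / 2 - t_base)

def current_stage(daily_temps, crop, stages):
    """Accumulate GDD over (t_max, t_min) pairs; return (total, latest stage reached)."""
    total = sum(gdd_daily(t_max, t_min, T_BASE_C[crop]) for t_max, t_min in daily_temps)
    reached = None
    for name, threshold in sorted(stages.items(), key=lambda item: item[1]):
        if total >= threshold:
            reached = name
    return total, reached

def water_stressed(et_actual, et_potential, threshold=0.75):
    """True when the ET ratio falls below the yield-loss threshold."""
    return et_actual / et_potential < threshold

# Hypothetical: 30 days with T_max = 22 °C and T_min = 8 °C
total_gdd, stage = current_stage([(22.0, 8.0)] * 30, "wheat", WHEAT_STAGES)
print(total_gdd, stage)
print(water_stressed(et_actual=3.0, et_potential=5.0))
```

Returning the latest stage reached, rather than a date, keeps the lookup independent of the sowing calendar, which is the point of using GDD in place of calendar days.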
Indian Agro-Climatic Zones and Model Transfer
India's 15 agro-climatic zones (Planning Commission classification) have distinct soil-weather-crop interactions. A yield model trained on Indo-Gangetic Plains wheat cannot be transferred to Deccan Plateau sorghum without domain adaptation:
| Zone | Key Crops | Rainfall Regime | Soil Type | Major Stress |
|---|---|---|---|---|
| Western Himalayan | Maize, wheat, potato | Monsoon + snow melt | Alfisols, Entisols | Frost, landslide |
| Eastern Himalayan | Tea, cardamom, rice | High rainfall (2000-4000mm) | Oxisols | Landslide, flooding |
| Lower Gangetic Plains | Jute, rice | High humidity, monsoon | Inceptisols | Flooding, submergence |
| Middle Gangetic Plains | Rice-wheat rotation | Bimodal rainfall | Inceptisols | Waterlogging, salinity |
| Upper Gangetic Plains (Uttar Pradesh) | Wheat, rice, sugarcane | Irrigated | Entisols, Aridisols | Groundwater depletion |
| Trans-Gangetic Plains (Punjab/Haryana) | Wheat, cotton, bajra | Low rainfall, irrigation dependent | Aridisols | Water stress |
| Eastern Plateau (Chhattisgarh) | Rice, millets | Moderate monsoon | Alfisols, Ultisols | Drought, micro-nutrient deficiency |
| Central Plateau (MP) | Soybean, wheat, cotton | Variable | Vertisols | Soil shrink-swell, drought |
| Western Plateau (Maharashtra) | Soybean, sugarcane, jowar | Low and erratic | Vertisols | Drought, input cost |
| Southern Plateau (Deccan) | Cotton, groundnut, sunflower | Low, bimodal | Alfisols | Drought |
| East Coast Plains | Rice, groundnut, horticulture | Cyclone-prone | Inceptisols, Entisols | Cyclone, salinity |
| West Coast Plains (Kerala/Goa) | Coconut, rubber, spices | Very high rainfall | Oxisols, Ultisols | Flooding, waterlogging |
| Gujarat Plains | Groundnut, cotton, wheat | Low and variable | Aridisols, Entisols | Drought, salinity |
| Western Dry (Rajasthan) | Bajra, moth bean | Very low | Aridisols | Severe drought |
| Islands | Coconut, spices | Tropical, humid | Entisols | Cyclone, salinity |
Transfer learning strategy: pre-train on pan-India historical district yield data (40+ years from DES), fine-tune on zone-specific sensor data. ICAR-NAARM's implementation uses a shared encoder for spectral features with zone-specific heads for yield regression.
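The shared-encoder, zone-specific-head idea can be illustrated with a deliberately tiny model. The sketch below is not ICAR-NAARM's implementation: it assumes a fixed two-unit ReLU encoder standing in for pre-trained weights, one linear head per zone, and plain SGD that updates only the selected zone's head. The class name, weights, features, and yields are all hypothetical.

```python
import copy

class SharedEncoderModel:
    """Shared ReLU encoder with one linear regression head per zone."""

    def __init__(self, encoder_weights, zones):
        self.encoder = [list(row) for row in encoder_weights]
        self.heads = {zone: [0.0] * len(self.encoder) for zone in zones}

    def encode(self, x):
        return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in self.encoder]

    def predict(self, x, zone):
        return sum(w * h for w, h in zip(self.heads[zone], self.encode(x)))

    def fine_tune_head(self, samples, zone, lr=0.05, epochs=200):
        """SGD on one zone's head only; the shared encoder stays frozen."""
        head = self.heads[zone]
        for _ in range(epochs):
            for x, y in samples:
                hidden = self.encode(x)
                error = sum(w * h for w, h in zip(head, hidden)) - y
                for i, h in enumerate(hidden):
                    head[i] -= lr * error * h  # gradient of 0.5 * error**2

def mse(model, samples, zone):
    return sum((model.predict(x, zone) - y) ** 2 for x, y in samples) / len(samples)

# Stand-in for weights learned during pan-India pre-training (hypothetical values)
pretrained = [[0.6, 0.4, 0.0], [0.0, 0.5, 0.5]]
model = SharedEncoderModel(pretrained, zones=["trans_gangetic", "western_plateau"])

# Hypothetical zone samples: ([peak NDVI, scaled integral NDVI, WSI], yield in t/ha)
zone_samples = [([0.82, 0.9, 0.8], 4.8), ([0.70, 0.7, 0.6], 3.9), ([0.60, 0.5, 0.9], 3.2)]

encoder_before = copy.deepcopy(model.encoder)
loss_before = mse(model, zone_samples, "trans_gangetic")
model.fine_tune_head(zone_samples, "trans_gangetic")
loss_after = mse(model, zone_samples, "trans_gangetic")
print(f"MSE before: {loss_before:.2f}, after: {loss_after:.2f}")
```

Freezing the encoder means a zone with few sensor-equipped plots only has to fit a handful of head weights, which is the usual argument for this split; a real system would more likely use a deep-learning framework and may unfreeze the encoder at a lower learning rate.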
MSP Impact on Crop Choice Modeling
The Minimum Support Price system creates feedback loops that AI models must capture. When wheat MSP rises relative to bajra MSP, Punjab/Haryana farmers shift area from coarse grains to wheat, which affects groundwater use, stubble burning (and the resulting PM2.5 events in Delhi), and input demand.
Prompt: "Given NDVI time-series for plot_id PB_2234 showing peak NDVI 0.82 at DAS 95,
integral NDVI 42.3 over 150 days, soil NPK [185, 28, 145 kg/ha], rainfall deficit
-40mm vs normal in Jan-Feb, and 2024 wheat MSP at ₹2275/quintal — estimate:
(1) expected yield in quintals/hectare with confidence interval
(2) probability this farmer switches to bajra next season if MSP gap narrows to <₹200
(3) recommended urea top-dress date to recover yield deficit."
Key Takeaways
This is chapter 1 of AI for Food Processing & Agri.