Crop Yield & Quality Prediction
Satellite NDVI, Soil Sensor Fusion, and Agro-Climatic Regression for US & EU Agriculture
The Prediction Stack: From Satellite to Harvest Estimate
Crop yield prediction is a multi-scale, multi-source problem. A single-sensor approach — relying only on NDVI or only on weather data — consistently underperforms ensemble methods that fuse satellite imagery, in-field sensors, and historical price records. The challenge in the US is scale: roughly 2 million farm operations covering 880 million acres, average size 440 acres, distributed across distinct growing regions — the Corn Belt, the Great Plains, California's Central Valley, the Mississippi Delta — with wildly different soil profiles, rainfall regimes, and crop mixes. The EU adds further fragmentation across member-state climates.
The architecture that works at USDA NASS, land-grant university extension programs, and large ag retailers follows a three-tier design:
| Tier | Data Sources | Temporal Resolution | Spatial Resolution |
|---|---|---|---|
| Satellite | Sentinel-2, Landsat-8/9, Planet, USDA NAIP | Daily (Planet) to 16 days (Landsat); NAIP every 2-3 years | 0.6-30 m/pixel |
| Ground sensors | Soil NPK probes, moisture sensors, weather stations | 15-min intervals | Point + interpolation |
| Agro-historical | USDA NASS Quick Stats, CME futures, county yield records, Climate FieldView | Weekly/seasonal | County level |
Open data/ndvi-field-data.csv — it contains NDVI time-series for 200+ plots across Iowa corn, Illinois soybean, and California rice. Each row has plot_id, date, NDVI, EVI, LAI (Leaf Area Index), rainfall_7d, and GDD (Growing Degree Days). The seasonal signature — steep NDVI rise at canopy closure, plateau at grain fill, rapid decline at maturation — is your primary yield signal.
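A minimal sketch of loading a file like this into per-plot time series. It assumes the header names match the column list above exactly and that dates are ISO formatted; neither is confirmed here, so adjust to the real file. The inline sample rows are made up purely to show the expected shape, and `load_ndvi_series` is an illustrative helper name.

```python
import csv
import io
from collections import defaultdict
from datetime import date

def load_ndvi_series(fileobj):
    """Group rows by plot_id and return {plot_id: [(date, ndvi), ...]} sorted by date."""
    series = defaultdict(list)
    for row in csv.DictReader(fileobj):
        # Column names are assumed to match the description above.
        obs_date = date.fromisoformat(row["date"])
        series[row["plot_id"]].append((obs_date, float(row["NDVI"])))
    return {plot: sorted(obs) for plot, obs in series.items()}

# Made-up sample rows, only to illustrate the assumed layout.
SAMPLE = """plot_id,date,NDVI,EVI,LAI,rainfall_7d,GDD
P001,2023-06-15,0.61,0.45,2.9,18.0,620
P001,2023-05-15,0.25,0.18,0.8,22.5,210
P002,2023-05-15,0.22,0.16,0.7,20.1,205
"""

plots = load_ndvi_series(io.StringIO(SAMPLE))
```

For the real file, the same function should work with `open("data/ndvi-field-data.csv", newline="")` in place of the in-memory sample.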
NDVI Time-Series Feature Engineering
Raw NDVI values are not directly predictive. What matters is the shape of the seasonal curve relative to baseline:
# Core features extracted per plot per season
import numpy as np

def season_features(ndvi, das, ndvi_75th_percentile):
    # ndvi: NDVI samples; das: days after sowing for each sample;
    # ndvi_75th_percentile: reference NDVI level supplied by the caller
    ndvi, das = np.asarray(ndvi, float), np.asarray(das, float)
    i = int(np.argmax(ndvi))  # index of the seasonal peak
    return {
        "peak_ndvi": ndvi[i],  # Proxy for max biomass
        "peak_timing_das": das[i],  # Late peak → stress or late planting
        "integral_ndvi": np.sum((ndvi[1:] + ndvi[:-1]) / 2 * np.diff(das)),  # Season-long photosynthesis (trapezoid rule)
        "greenup_rate": (ndvi[i] - ndvi[0]) / (das[i] - das[0]),  # Crop establishment vigor
        "senescence_rate": (ndvi[i] - ndvi[-1]) / (das[-1] - das[i]),  # Drydown rate
        "ndvi_cv": ndvi.std() / ndvi.mean(),  # Variability across the season
        "stress_days": int(np.sum(ndvi < ndvi_75th_percentile)),  # Observations below the reference level
    }
The integral NDVI (sometimes called NDVI-based LAI integral or green-area duration) is consistently the highest-importance feature in gradient boosting models trained on USDA county datasets. For corn in Iowa, RMSE on county-level yield prediction drops from 8.2% to 3.1% when integral NDVI replaces peak NDVI alone.
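To see why the integral can carry information that the peak misses, here is a small sketch comparing two seasons with the same peak NDVI but different canopy duration. The NDVI values are invented for illustration, and `trapezoid_integral` is a hypothetical helper written in plain Python.

```python
def trapezoid_integral(values, days):
    """Trapezoid-rule area under an NDVI curve sampled on the given days."""
    return sum(
        (values[i] + values[i + 1]) / 2 * (days[i + 1] - days[i])
        for i in range(len(values) - 1)
    )

# Two made-up seasons: identical peak, different time spent near the peak.
days = [0, 30, 60, 90, 120, 150]
long_season = [0.20, 0.50, 0.80, 0.80, 0.70, 0.30]
short_season = [0.20, 0.30, 0.80, 0.60, 0.40, 0.30]

peak_long, peak_short = max(long_season), max(short_season)
area_long = trapezoid_integral(long_season, days)    # NDVI-days
area_short = trapezoid_integral(short_season, days)  # NDVI-days
```

Both seasons peak at 0.80, so a peak-only feature cannot tell them apart, while the integrals differ (91.5 versus 70.5 NDVI-days for these toy numbers).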
Soil Sensor Fusion: NPK + Moisture + pH
The USDA NRCS Web Soil Survey and SSURGO database provide nationwide soil mapping that historically sat in static map units. Modern AI pipelines fuse it with live probe data into a geospatial raster:
| Soil Parameter | Sensor Type | Prediction Target |
|---|---|---|
| N (Nitrogen) | Ion-selective electrode, NIR | NDVI boost probability |
| P (Phosphorus) | Colorimetric probe | Root biomass proxy |
| K (Potassium) | Flame photometry / IoT electrode | Stress tolerance |
| Moisture (volumetric) | Capacitance probe, TDR | Irrigation trigger, yield drag |
| pH | pH electrode | Nutrient availability multiplier |
| EC (Electrical Conductivity) | EC probe | Salinity stress flag |
Open data/soil-sensor-fusion.json — it contains point-level soil readings from 500 farms enrolled in a John Deere Operations Center / Climate FieldView precision program, linked to their Sentinel-2 plot polygons. The critical relationship to model: nitrogen depletion typically shows as NDVI decline 12-18 days after application deficit, which matches the leaf nitrogen translocation timeline.
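One way to check this lag in your own data is a lagged correlation between the soil nitrogen series and NDVI. The sketch below assumes both series are daily, gap-free, and of equal length; `pearson` and `best_lag` are illustrative helpers, not functions from any library mentioned in this chapter.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def best_lag(soil_n, ndvi, max_lag):
    """Return (lag, r) for the lag in days at which NDVI best tracks earlier soil N."""
    scores = {}
    for lag in range(max_lag + 1):
        # Pair soil N on day t with NDVI on day t + lag.
        scores[lag] = pearson(soil_n[: len(soil_n) - lag], ndvi[lag:])
    lag = max(scores, key=scores.get)
    return lag, scores[lag]
```

If the 12-18 day relationship holds for a field, `best_lag(soil_n, ndvi, 30)` should return a lag in that window. A lag far outside it may point to a different stressor, such as moisture, or to sensor drift.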
Fusion architecture uses spatial Kriging to interpolate point readings to field-level rasters, then concatenates with satellite bands as additional channels in a CNN or as tabular features in XGBoost:
# Kriging interpolation for NPK raster
from pykrige.ok import OrdinaryKriging
ok_N = OrdinaryKriging(
    x=sensor_lons, y=sensor_lats, z=N_readings,
    variogram_model='spherical',
    nlags=12
)
N_raster, N_variance = ok_N.execute('grid', grid_lon, grid_lat)
# Output: 10m resolution N availability map aligned with Sentinel-2 grid
Weather-Crop Regression: GDD and Stress Indices
Growing Degree Day (GDD) accumulation governs crop phenology more reliably than calendar date:
GDD_daily = max(0, (T_max + T_min)/2 - T_base)
# Crop-specific base temperatures in °C (US growing regions)
T_base = {"wheat": 0, "rice": 10, "corn": 10, "soybean": 10}
# Cumulative GDD triggers; these corn values match the commonly published
# Fahrenheit-based scale (base 50°F = 10°C), so divide by 1.8 for °C-based GDD
gdd_stages = {
    "corn": {"emergence": 120, "V6": 475, "silking": 1400, "maturity": 2700}
}
Beyond GDD, the water stress index (WSI) integrates irrigation data with rainfall to predict yield drag:
WSI = ET_actual / ET_potential  # Values < 0.75 trigger significant yield loss
For irrigated crops in California's Central Valley and the Ogallala-dependent High Plains, where federal crop insurance and water-rights incentives shape cropping decisions, WSI-based models show 22% of yield variance explained by aquifer depth and allocation alone. This creates a policy signal: subsidized irrigation in arid zones (rainfall < 250mm/year) distorts crop choice toward water-intensive cultivation, exacerbating Ogallala Aquifer depletion.
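The GDD accumulation and the WSI threshold from this section can be sketched together in plain Python. This is an illustration under stated assumptions: temperatures are in °C, the simple-average method is used with no upper temperature cap (the formula above has none), and stage thresholds must be in the same units as the accumulated GDD. The function names are hypothetical.

```python
def daily_gdd(t_max_c, t_min_c, t_base_c=10.0):
    """Simple-average GDD in degC-days; negative values are clipped to zero."""
    return max(0.0, (t_max_c + t_min_c) / 2 - t_base_c)

def stage_days(daily_temps, thresholds, t_base_c=10.0):
    """Return {stage: day_index} for the first day cumulative GDD reaches each threshold.

    daily_temps is a sequence of (t_max_c, t_min_c) pairs starting at sowing.
    """
    reached, total = {}, 0.0
    for day, (t_max, t_min) in enumerate(daily_temps):
        total += daily_gdd(t_max, t_min, t_base_c)
        for stage, needed in thresholds.items():
            if stage not in reached and total >= needed:
                reached[stage] = day
    return reached

def water_stress_index(et_actual, et_potential, threshold=0.75):
    """Return (WSI, stressed) where WSI = ET_actual / ET_potential."""
    wsi = et_actual / et_potential
    return wsi, wsi < threshold
```

For example, `stage_days([(25.0, 15.0)] * 30, {"emergence": 60, "V6": 250})` accumulates 10 GDD per day and reports the day index at which each illustrative threshold is crossed. `water_stress_index(3.0, 5.0)` gives a WSI of 0.6, below the 0.75 cutoff.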
US & EU Growing Regions and Model Transfer
US growing regions (USDA farm resource regions) have distinct soil-weather-crop interactions. A yield model trained on Corn Belt corn cannot be transferred to Central Valley almonds without domain adaptation:
| Region | Key Crops | Rainfall Regime | Soil Type | Major Stress |
|---|---|---|---|---|
| Corn Belt (IA/IL/IN) | Corn, soybean | Reliable summer rain | Mollisols (prairie loam) | Heat at silking, derecho |
| Northern Great Plains | Spring wheat, canola, sunflower | Low, semi-arid | Mollisols, Aridisols | Drought, frost |
| Southern Great Plains | Winter wheat, sorghum, cotton | Erratic, irrigation-dependent | Aridisols | Water stress, dust |
| Mississippi Delta | Rice, cotton, soybean | High humidity, abundant rain | Alfisols, Vertisols | Flooding, heat |
| California Central Valley | Almonds, tomatoes, rice, grapes | Mediterranean, irrigated | Entisols, Alfisols | Drought, water allocation |
| Pacific Northwest (Columbia Basin) | Potato, wheat, apple | Dry, irrigated | Andisols, Aridisols | Water stress, frost |
| Southeast Coastal Plain | Peanut, cotton, citrus | Humid, hurricane-prone | Ultisols, Spodosols | Hurricane, micro-nutrient deficiency |
| Northeast / Great Lakes | Dairy forage, apple, grape | Cool, humid | Inceptisols | Frost, short season |
| EU — Northern Europe (DE/PL/UK) | Wheat, barley, rapeseed, potato | Temperate, oceanic | Cambisols, Luvisols | Wet harvest, lodging |
| EU — Mediterranean (ES/IT/GR) | Olive, grape, durum wheat, citrus | Hot dry summer | Calcisols, Vertisols | Drought, heat |
| EU — Continental (FR/HU/RO) | Maize, sunflower, sugar beet | Variable continental | Chernozems, Luvisols | Drought, frost |
| Texas High Plains | Cotton, sorghum, corn | Very low, irrigated | Aridisols | Ogallala depletion |
| Florida / Gulf | Citrus, sugarcane, vegetables | Subtropical, humid | Spodosols, Histosols | Hurricane, salinity |
| Mountain West | Potato, alfalfa, sugar beet | Low, snowmelt-irrigated | Aridisols, Mollisols | Frost, water allocation |
| Northern Plains (Red River) | Sugar beet, spring wheat, edible bean | Cold, moderate rain | Vertisols | Flooding, salinity |
Transfer learning strategy: pre-train on national historical county yield data (USDA NASS, 40+ years), fine-tune on region-specific sensor data. Land-grant extension implementations use a shared encoder for spectral features with region-specific heads for yield regression.
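The shared-encoder, per-region-head layout can be sketched as a toy forward pass. This is not the extension programs' implementation: the class name, layer sizes, and random initial weights are assumptions, and training (pre-training the encoder, fine-tuning the heads) is left out entirely.

```python
import random

class SharedEncoderModel:
    """Toy shared encoder with one linear yield head per region (forward pass only)."""

    def __init__(self, n_features, n_hidden, regions, seed=0):
        rng = random.Random(seed)
        # Shared encoder weights: these would be pre-trained on national county data.
        self.encoder = [[rng.uniform(-0.5, 0.5) for _ in range(n_features)]
                        for _ in range(n_hidden)]
        # Region-specific heads: these would be fine-tuned on regional sensor data.
        self.heads = {r: [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
                      for r in regions}

    def encode(self, x):
        # One linear layer followed by ReLU, shared by every region.
        return [max(0.0, sum(w * xi for w, xi in zip(row, x)))
                for row in self.encoder]

    def predict(self, x, region):
        # Region-specific linear readout on top of the shared representation.
        hidden = self.encode(x)
        return sum(w * h for w, h in zip(self.heads[region], hidden))
```

The design point is that adding a new region only adds one small head, while the spectral representation learned from the long national record is reused unchanged.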
Policy Impact on Crop Choice Modeling
US farm policy creates feedback loops that AI models must capture: when crop insurance guarantees and ethanol mandate (RFS) demand raise corn returns relative to soybean, Corn Belt farmers shift area toward corn, impacting nitrogen runoff (Gulf hypoxia), tile drainage demand, and input markets. The EU CAP and its eco-scheme payments create analogous incentives.
Prompt: "Given NDVI time-series for plot_id IA_2234 showing peak NDVI 0.82 at DAS 95,
integral NDVI 42.3 over 150 days, soil NPK [185, 28, 145 kg/ha], rainfall deficit
-40mm vs normal in June-July, and 2024 December corn futures at $4.65/bushel — estimate:
(1) expected yield in bushels/acre with confidence interval
(2) probability this grower switches to soybean next season if the soybean-to-corn price ratio
rises above 2.4
(3) recommended sidedress nitrogen date to recover yield deficit."
Key Takeaways
This is chapter 1 of AI for Food Processing & Agri (Global).