Crop Yield & Quality Prediction

Satellite NDVI, Soil Sensor Fusion, and Agro-Climatic Regression for US & EU Agriculture

The Prediction Stack: From Satellite to Harvest Estimate

Crop yield prediction is a multi-scale, multi-source problem. A single-sensor approach — relying only on NDVI or only on weather data — consistently underperforms ensemble methods that fuse satellite imagery, in-field sensors, and historical price records. The challenge in the US is scale: roughly 2 million farm operations covering 880 million acres, average size 440 acres, distributed across distinct growing regions — the Corn Belt, the Great Plains, California's Central Valley, the Mississippi Delta — with wildly different soil profiles, rainfall regimes, and crop mixes. The EU adds further fragmentation across member-state climates.

The architecture that works at USDA NASS, land-grant university extension programs, and large ag retailers follows a three-tier design:

| Tier | Data Sources | Temporal Resolution | Spatial Resolution |
| --- | --- | --- | --- |
| Satellite | Sentinel-2, Landsat-8/9, Planet, USDA NAIP | 5-10 day revisit | 3-30 m/pixel |
| Ground sensors | Soil NPK probes, moisture sensors, weather stations | 15-min intervals | Point + interpolation |
| Agro-historical | USDA NASS Quick Stats, CME futures, county yield records, Climate FieldView | Weekly/seasonal | County level |

Open data/ndvi-field-data.csv — it contains NDVI time-series for 200+ plots across Iowa corn, Illinois soybean, and California rice. Each row has plot_id, date, NDVI, EVI, LAI (Leaf Area Index), rainfall_7d, and GDD (Growing Degree Days). The seasonal signature — steep NDVI rise at canopy closure, plateau at grain fill, rapid decline at maturation — is your primary yield signal.
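
As a starting point, the rows can be grouped into one NDVI time series per plot. The sketch below uses only the Python standard library. Only the column names come from the description above; the sample rows, the plot IDs, and the ISO date format are made up for illustration, so adjust them to the real file.

```python
import csv
import io
from collections import defaultdict

# Illustrative rows only: column names follow the description above, values are invented
SAMPLE_CSV = """plot_id,date,NDVI,EVI,LAI,rainfall_7d,GDD
IA_0001,2023-07-01,0.78,0.61,3.9,25.5,820
IA_0001,2023-06-01,0.41,0.30,1.2,18.0,310
IL_0002,2023-06-01,0.35,0.26,0.9,12.0,295
IL_0002,2023-07-01,0.72,0.55,3.4,30.1,805
"""

def load_ndvi_series(handle):
    """Group rows by plot_id and return {plot_id: [(date, ndvi), ...]} sorted by date."""
    series = defaultdict(list)
    for row in csv.DictReader(handle):
        series[row["plot_id"]].append((row["date"], float(row["NDVI"])))
    for observations in series.values():
        observations.sort()  # assumes ISO dates, which sort correctly as strings
    return dict(series)

series_by_plot = load_ndvi_series(io.StringIO(SAMPLE_CSV))
for plot_id, observations in series_by_plot.items():
    print(plot_id, observations)
```

To read the real file, replace the `io.StringIO` handle with `open("data/ndvi-field-data.csv", newline="")`.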

NDVI Time-Series Feature Engineering

Raw NDVI values are not directly predictive. What matters is the shape of the seasonal curve relative to baseline:

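# Core features extracted per plot per season
# Assumes ndvi_series and time_axis are equal-length NumPy arrays, and that the scalar
# inputs (ndvi_peak, ndvi_base, ndvi_harvest, ndvi_75th_percentile, day counts) are precomputed
import numpy as np
from scipy.integrate import trapezoid

features = {
    "peak_ndvi": np.max(ndvi_series),                    # Proxy for max biomass
    "peak_timing_das": day_of_peak - sowing_date,        # Days after sowing; late peak → stress or late planting
    "integral_ndvi": trapezoid(ndvi_series, time_axis),  # Season-long photosynthesis
    "greenup_rate": (ndvi_peak - ndvi_base) / days_to_peak,  # Crop establishment vigor
    "senescence_rate": (ndvi_peak - ndvi_harvest) / days_to_harvest,  # Drydown rate; days counted from the peak
    "ndvi_cv": np.std(ndvi_series) / np.mean(ndvi_series),   # Variability of the plot's NDVI over the season
    "stress_days": int(np.sum(ndvi_series < ndvi_75th_percentile)),  # Stress event frequency
}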

The integral NDVI (sometimes called NDVI-based LAI integral or green-area duration) is consistently the highest-importance feature in gradient boosting models trained on USDA county datasets. For corn in Iowa, RMSE on county-level yield prediction drops from 8.2% to 3.1% when integral NDVI replaces peak NDVI alone.
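
To see why the integral carries information that the peak misses, consider two seasons that reach the same maximum greenness but hold it for different lengths of time. The sketch below uses synthetic bell-shaped curves, not real data; the curve shape, the 5-day sampling, and all parameter values are assumptions chosen only to illustrate the point.

```python
import math

def trapezoid(y, x):
    """Trapezoidal-rule integral of y over x."""
    return sum((x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2 for i in range(len(x) - 1))

def synthetic_ndvi(days, peak_day, width, base=0.2, peak=0.85):
    """Bell-shaped seasonal curve: a hypothetical stand-in for a real NDVI series."""
    return [base + (peak - base) * math.exp(-((d - peak_day) / width) ** 2) for d in days]

days = list(range(0, 151, 5))  # days after sowing, sampled every 5 days
long_season = synthetic_ndvi(days, peak_day=80, width=45)   # canopy stays green longer
short_season = synthetic_ndvi(days, peak_day=80, width=25)  # early senescence

for name, curve in [("long", long_season), ("short", short_season)]:
    print(name, "peak:", round(max(curve), 3), "integral:", round(trapezoid(curve, days), 1))
```

Both curves report the same peak NDVI, so a peak-only model cannot tell them apart, while the integral separates the long green season from the short one.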

Soil Sensor Fusion: NPK + Moisture + pH

The USDA NRCS Web Soil Survey and SSURGO database provide nationwide soil mapping that historically sat in static map units. Modern AI pipelines fuse it with live probe data into a geospatial raster:

| Soil Parameter | Sensor Type | Prediction Target |
| --- | --- | --- |
| N (Nitrogen) | Ion-selective electrode, NIR | NDVI boost probability |
| P (Phosphorus) | Colorimetric probe | Root biomass proxy |
| K (Potassium) | Flame photometry / IoT electrode | Stress tolerance |
| Moisture (volumetric) | Capacitance probe, TDR | Irrigation trigger, yield drag |
| pH | pH electrode | Nutrient availability multiplier |
| EC (Electrical Conductivity) | EC probe | Salinity stress flag |

Open data/soil-sensor-fusion.json — it contains point-level soil readings from 500 farms enrolled in a John Deere Operations Center / Climate FieldView precision program, linked to their Sentinel-2 plot polygons. The critical relationship to model: nitrogen depletion typically shows as NDVI decline 12-18 days after application deficit, which matches the leaf nitrogen translocation timeline.
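
One simple way to check the lag between a nitrogen deficit and the NDVI response is to correlate the two series at a range of offsets and keep the best one. The sketch below is a minimal illustration on synthetic daily data with a built-in 15-day delay; the series, the helper names `pearson` and `best_lag`, and the 30-day search window are all assumptions, not part of the dataset.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def best_lag(driver, response, max_lag):
    """Return (lag, correlation) where response[t] best tracks driver[t - lag]."""
    scores = {}
    for lag in range(max_lag + 1):
        scores[lag] = pearson(driver[: len(driver) - lag], response[lag:])
    lag = max(scores, key=scores.get)
    return lag, scores[lag]

# Synthetic daily series: NDVI follows soil nitrogen with a 15-day delay
TRUE_LAG = 15
soil_n = [100 + 20 * math.sin(t / 9) - 0.3 * t for t in range(120)]
ndvi = [0.3 + 0.004 * soil_n[max(t - TRUE_LAG, 0)] for t in range(120)]

lag, corr = best_lag(soil_n, ndvi, max_lag=30)
print("estimated lag (days):", lag)
```

On real probe data the correlation peak will be broader and noisier, so an estimated lag inside the 12-18 day window is a plausibility check rather than a precise measurement.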

Fusion architecture uses spatial Kriging to interpolate point readings to field-level rasters, then concatenates with satellite bands as additional channels in a CNN or as tabular features in XGBoost:

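# Kriging interpolation for NPK raster
from pykrige.ok import OrdinaryKriging

# Note: lon/lat in degrees are treated as planar coordinates by default;
# pass coordinates_type='geographic' or project to metres first if that matters at your scale
ok_N = OrdinaryKriging(
    x=sensor_lons, y=sensor_lats, z=N_readings,
    variogram_model='spherical',
    nlags=12
)
# grid_lon and grid_lat are 1-D arrays of grid coordinates; execute returns the
# interpolated values and the kriging variance at each grid node
N_raster, N_variance = ok_N.execute('grid', grid_lon, grid_lat)
# Output: 10m resolution N availability map aligned with Sentinel-2 grid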
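
Where fitting a variogram is not practical (too few probes, for example), inverse distance weighting (IDW) is a simpler alternative to Kriging. The sketch below is a dependency-free IDW, not the Kriging method shown above; the probe positions, the nitrogen readings, and the 100 m field are hypothetical values in projected metres.

```python
import math

def idw(points, values, target, power=2.0):
    """Inverse distance weighted estimate at `target` from scattered (x, y) readings."""
    weights, weighted_sum = 0.0, 0.0
    for (x, y), value in zip(points, values):
        distance = math.hypot(target[0] - x, target[1] - y)
        if distance == 0.0:
            return value  # target sits exactly on a probe: return its reading
        w = 1.0 / distance ** power
        weights += w
        weighted_sum += w * value
    return weighted_sum / weights

# Hypothetical probe positions (metres, projected coordinates) and nitrogen readings (kg/ha)
probe_xy = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (100.0, 100.0)]
probe_n = [180.0, 150.0, 165.0, 140.0]

# 10 m grid over the 100 m x 100 m field, matching the Sentinel-2 10 m pixel size
grid = [[idw(probe_xy, probe_n, (col * 10.0, row * 10.0)) for col in range(11)]
        for row in range(11)]
print("centre of field:", round(grid[5][5], 2))
```

Unlike Kriging, IDW gives no variance estimate, so it cannot tell you where the interpolated raster is least reliable.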

Weather-Crop Regression: GDD and Stress Indices

Growing Degree Day (GDD) accumulation governs crop phenology more reliably than calendar date:

GDD_daily = max(0, (T_max + T_min)/2 - T_base)

# Crop-specific base temperatures in °C (US growing regions)
T_base = {"wheat": 0, "rice": 10, "corn": 10, "soybean": 10}

# Cumulative GDD triggers. These corn values match US growing degree units
# computed in °F (base 50°F, the same as the 10°C base); divide by 1.8 for °C-days
gdd_stages = {
    "corn": {"emergence": 120, "V6": 475, "silking": 1400, "maturity": 2700}
}
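
The accumulation and stage lookup can be sketched as follows. This is a minimal example that assumes the corn thresholds are expressed in °F-based growing degree units (base 50°F, equal to the 10°C corn base), so temperatures are supplied in °F. The function names and the 30-day temperature run are hypothetical.

```python
def daily_gdd(t_max, t_min, t_base):
    """Simple-average GDD for one day; inputs and result share one temperature unit."""
    return max(0.0, (t_max + t_min) / 2 - t_base)

def stage_reached(cumulative_gdd, stages):
    """Return the latest stage whose threshold has been met, or None before the first."""
    reached = None
    for name, threshold in sorted(stages.items(), key=lambda item: item[1]):
        if cumulative_gdd >= threshold:
            reached = name
    return reached

CORN_STAGES_F = {"emergence": 120, "V6": 475, "silking": 1400, "maturity": 2700}
CORN_BASE_F = 50.0  # 50°F equals the 10°C corn base

# Hypothetical run of 30 identical days: high 86°F, low 64°F
temps_f = [(86.0, 64.0)] * 30
cumulative = sum(daily_gdd(hi, lo, CORN_BASE_F) for hi, lo in temps_f)
stage = stage_reached(cumulative, CORN_STAGES_F)
print(cumulative, stage)
```

Each day here contributes 25 units, so 30 days accumulate 750 and the crop is past V6 but well short of silking. Many corn models also cap T_max at 86°F; that refinement is left out to keep the sketch aligned with the formula above.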

Beyond GDD, the water stress index (WSI) integrates irrigation data with rainfall to predict yield drag:

WSI = ET_actual / ET_potential   # Values < 0.75 trigger significant yield loss
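
A short sketch of how the ratio and the 0.75 threshold might be applied to a season of evapotranspiration totals. The weekly values, the clipping to the 0-1 range, and the function name are assumptions for illustration.

```python
def water_stress_index(et_actual, et_potential):
    """Ratio of actual to potential evapotranspiration, clipped to [0, 1]."""
    if et_potential <= 0:
        raise ValueError("et_potential must be positive")
    return min(1.0, max(0.0, et_actual / et_potential))

STRESS_THRESHOLD = 0.75  # from the text: values below this signal significant yield loss

# Hypothetical weekly ET totals in mm: (actual, potential)
weekly_et = [(38.0, 40.0), (30.0, 42.0), (25.0, 45.0), (41.0, 44.0)]
wsi_by_week = [water_stress_index(actual, potential) for actual, potential in weekly_et]
stressed_weeks = [week for week, wsi in enumerate(wsi_by_week) if wsi < STRESS_THRESHOLD]
print("stressed weeks (0-indexed):", stressed_weeks)
```

Note that with this definition a lower value means more stress, the opposite direction from indices defined as one minus the ET ratio.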

For irrigated crops in California's Central Valley and the Ogallala-dependent High Plains, where federal crop insurance and water-rights incentives shape cropping decisions, aquifer depth and water allocation alone explain 22% of yield variance in WSI-based models. This creates a policy signal: subsidized irrigation in arid zones (rainfall < 250 mm/year) distorts crop choice toward water-intensive cultivation, which exacerbates Ogallala Aquifer depletion.

US & EU Growing Regions and Model Transfer

US growing regions (USDA farm resource regions) have distinct soil-weather-crop interactions. A yield model trained on Corn Belt corn cannot be transferred to Central Valley almonds without domain adaptation:

| Region | Key Crops | Rainfall Regime | Soil Type | Major Stress |
| --- | --- | --- | --- | --- |
| Corn Belt (IA/IL/IN) | Corn, soybean | Reliable summer rain | Mollisols (prairie loam) | Heat at silking, derecho |
| Northern Great Plains | Spring wheat, canola, sunflower | Low, semi-arid | Mollisols, Aridisols | Drought, frost |
| Southern Great Plains | Winter wheat, sorghum, cotton | Erratic, irrigation-dependent | Aridisols | Water stress, dust |
| Mississippi Delta | Rice, cotton, soybean | High humidity, abundant rain | Alfisols, Vertisols | Flooding, heat |
| California Central Valley | Almonds, tomatoes, rice, grapes | Mediterranean, irrigated | Entisols, Alfisols | Drought, water allocation |
| Pacific Northwest (Columbia Basin) | Potato, wheat, apple | Dry, irrigated | Andisols, Aridisols | Water stress, frost |
| Southeast Coastal Plain | Peanut, cotton, citrus | Humid, hurricane-prone | Ultisols, Spodosols | Hurricane, micro-nutrient deficiency |
| Northeast / Great Lakes | Dairy forage, apple, grape | Cool, humid | Inceptisols | Frost, short season |
| EU: Northern Europe (DE/PL/UK) | Wheat, barley, rapeseed, potato | Temperate, oceanic | Cambisols, Luvisols | Wet harvest, lodging |
| EU: Mediterranean (ES/IT/GR) | Olive, grape, durum wheat, citrus | Hot dry summer | Calcisols, Vertisols | Drought, heat |
| EU: Continental (FR/HU/RO) | Maize, sunflower, sugar beet | Variable continental | Chernozems, Luvisols | Drought, frost |
| Texas High Plains | Cotton, sorghum, corn | Very low, irrigated | Aridisols | Ogallala depletion |
| Florida / Gulf | Citrus, sugarcane, vegetables | Subtropical, humid | Spodosols, Histosols | Hurricane, salinity |
| Mountain West | Potato, alfalfa, sugar beet | Low, snowmelt-irrigated | Aridisols, Mollisols | Frost, water allocation |
| Northern Plains (Red River) | Sugar beet, spring wheat, edible bean | Cold, moderate rain | Vertisols | Flooding, salinity |

Transfer learning strategy: pre-train on national historical county yield data (USDA NASS, 40+ years), fine-tune on region-specific sensor data. Land-grant extension implementations use a shared encoder for spectral features with region-specific heads for yield regression.
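
The shared-encoder, per-region-head layout can be sketched structurally without a deep learning framework. The example below is a forward pass only, with random weights standing in for a pre-trained encoder; the class name, layer sizes, region keys, and the option to freeze the encoder during fine-tuning are all assumptions for illustration, not a description of any extension program's implementation.

```python
import random

class RegionalYieldModel:
    """Shared linear+ReLU encoder with one linear regression head per region."""

    def __init__(self, n_features, n_hidden, regions, seed=0):
        rng = random.Random(seed)
        # Shared encoder: in the strategy above these weights would come from national pre-training
        self.encoder = [[rng.uniform(-0.5, 0.5) for _ in range(n_features)]
                        for _ in range(n_hidden)]
        # One head (weights + bias) per region: the part fine-tuned on regional sensor data
        self.heads = {r: [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for r in regions}
        self.bias = {r: 0.0 for r in regions}

    def encode(self, x):
        """Region-independent representation of the input features."""
        return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in self.encoder]

    def predict(self, x, region):
        """Yield estimate from the shared encoding and the chosen region's head."""
        hidden = self.encode(x)
        return sum(w * h for w, h in zip(self.heads[region], hidden)) + self.bias[region]

    def trainable_parameters(self, region, freeze_encoder=True):
        """Count the parameters updated when fine-tuning one region."""
        n_head = len(self.heads[region]) + 1  # head weights plus bias
        n_encoder = 0 if freeze_encoder else sum(len(row) for row in self.encoder)
        return n_head + n_encoder

model = RegionalYieldModel(n_features=4, n_hidden=8, regions=["corn_belt", "central_valley"])
features = [0.82, 0.45, 0.6, 0.1]  # made-up, pre-scaled feature vector
print(model.predict(features, "corn_belt"), model.predict(features, "central_valley"))
print(model.trainable_parameters("corn_belt"))
```

With the encoder frozen, fine-tuning a region touches only its head, which is what makes the approach workable for regions with little sensor data.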

Policy Impact on Crop Choice Modeling

US farm policy creates feedback loops that AI models must capture: when crop insurance guarantees and ethanol demand under the Renewable Fuel Standard (RFS) raise corn returns relative to soybean, Corn Belt farmers shift acreage toward corn, which affects nitrogen runoff (Gulf hypoxia), tile drainage demand, and input markets. The EU Common Agricultural Policy (CAP) and its eco-scheme payments create analogous incentives.

Prompt: "Given NDVI time-series for plot_id IA_2234 showing peak NDVI 0.82 at DAS 95,
integral NDVI 42.3 over 150 days, soil NPK [185, 28, 145 kg/ha], rainfall deficit
-40mm vs normal in June-July, and 2024 December corn futures at $4.65/bushel — estimate:
(1) expected yield in bushels/acre with confidence interval
(2) probability this grower switches to soybean next season if the soybean-to-corn price ratio
rises above 2.4
(3) recommended sidedress nitrogen date to recover yield deficit."

Key Takeaways

  • Integral NDVI consistently outperforms peak NDVI for yield prediction — it captures the full season's photosynthetic activity, not just the maximum greenness moment.
  • Soil sensor fusion requires spatial interpolation — raw point readings need Kriging or IDW to produce field-level rasters that align with satellite imagery.
  • Policy-driven crop choice is a systemic model input — ignoring signals like crop insurance, the Renewable Fuel Standard, and EU CAP eco-schemes produces yield forecasts that miss the area-allocation shifts that dominate county-level production variance.
  • Region-specific transfer learning is essential — a single national model underperforms region-tuned models by 30-40% on RMSE, primarily because soil-rainfall interactions differ dramatically across US and EU growing regions.
  • This is chapter 1 of AI for Food Processing & Agri (Global).
