AI for Construction Project Scheduling & Cost Estimation
Delay Prediction, Earned Value Forecasting, Unit-Price Estimation, and Resource Optimization
Schedule Delay Prediction: Modelling What Every PM Already Knows
Construction delay on major infrastructure is not an anomaly — it is the baseline. Large public programs track thousands of projects across federal and state portfolios. Megaprojects routinely slip 12-18 months, and cost overruns of 20-30% are common on complex transit and highway work — patterns documented across decades of FHWA and academic megaproject research.
The causes are well-understood. What is missing is systematic quantification of delay risk at the activity level — which specific activities on which specific project will slip, and by how much.
Feature Engineering for Activity-Level Delay Prediction
Open data/project-schedule-data.csv in the code panel. Each row represents a schedule activity: project_id, activity_id, activity_name, wbs_code, planned_start, planned_finish, actual_start, actual_finish, planned_duration_days, predecessor_ids, resource_type, location_km, highway_section, season_at_start.
A gradient boosted model (LightGBM) trained on 50,000+ activities from 120 completed DOT highway projects predicts activity delay with the following features:
| Feature | Source | Importance (SHAP) |
|---|---|---|
| **season_at_midpoint** (winter/wet/dry) | Schedule + calendar | 0.19 |
| **activity_type** (earthwork/structural/pavement/utility) | WBS classification | 0.16 |
| predecessor_delay_cumulative | Schedule network | 0.14 |
| row_acquisition_status | Land records | 0.12 |
| labour_availability_index | Regional labor market data | 0.10 |
| utility_relocation_pending | Utility owner status | 0.09 |
| contractor_historical_performance | Past project data | 0.08 |
| planned_duration_days | Schedule | 0.05 |
| soil_type_at_location | Geotechnical data | 0.04 |
| distance_to_nearest_batch_plant | Project logistics | 0.03 |
The model achieves MAE of 18 days for activity-level delay prediction (median activity duration: 45 days). More usefully, it identifies the critical path activities most likely to cause project-level delay — enabling project managers to allocate management attention before delays cascade.
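As a minimal sketch of how the delay target and one network feature might be derived from the schedule columns listed earlier: the rows below are made up, and the ISO date format, the semicolon separator in predecessor_ids, and the definition of predecessor_delay_cumulative (sum of late-finish days over direct predecessors) are all assumptions rather than properties of the dataset.

```python
from datetime import date

# Hypothetical rows shaped like data/project-schedule-data.csv (values are made up).
activities = [
    {"activity_id": "A1", "planned_finish": "2023-03-10", "actual_finish": "2023-03-20", "predecessor_ids": ""},
    {"activity_id": "A2", "planned_finish": "2023-04-01", "actual_finish": "2023-03-30", "predecessor_ids": "A1"},
    {"activity_id": "A3", "planned_finish": "2023-05-15", "actual_finish": "2023-06-04", "predecessor_ids": "A1;A2"},
]

def finish_delay_days(row):
    """Target variable: actual finish minus planned finish, in calendar days."""
    planned = date.fromisoformat(row["planned_finish"])
    actual = date.fromisoformat(row["actual_finish"])
    return (actual - planned).days

delay = {r["activity_id"]: finish_delay_days(r) for r in activities}

def predecessor_delay_cumulative(row):
    """Assumed definition: sum of late-finish days over direct predecessors."""
    preds = [p for p in row["predecessor_ids"].split(";") if p]
    return sum(max(delay[p], 0) for p in preds)

features = {r["activity_id"]: predecessor_delay_cumulative(r) for r in activities}
print(delay)
print(features)
```

Early finishes are clipped to zero in the feature so that a predecessor finishing ahead of plan does not mask another that finished late; whether that matches the trained model's definition is not stated in the source.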
Weather Impact Modelling
Weather is not a binary event. A Pacific Northwest highway project loses 45-60 working days to rain between October and March. An Arizona project loses 10-15 days to extreme heat and monsoon storms. The model encodes this as a continuous feature — historical precipitation and temperature at the project location (NOAA gridded climate normals) crossed with activity type. Earthwork activities are 4x more sensitive to rainfall than structural concrete activities (which can continue under temporary shelters and cold-weather protection per ACI 306).
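One way to read "crossed with activity type" is as an explicit interaction feature: one precipitation column and one temperature column per activity type, non-zero only for the row's own type. The sketch below assumes that reading; the function name, column names, and climate values are hypothetical.

```python
ACTIVITY_TYPES = ["earthwork", "structural", "pavement", "utility"]

def cross_weather_with_activity(activity_type, precip_mm, mean_temp_c):
    """Return one precipitation and one temperature column per activity type.

    Only the columns for the row's own activity type are non-zero, which lets
    a model learn a separate weather response for each type.
    """
    row = {}
    for t in ACTIVITY_TYPES:
        active = 1.0 if t == activity_type else 0.0
        row[f"precip_x_{t}"] = active * precip_mm
        row[f"temp_x_{t}"] = active * mean_temp_c
    return row

# Hypothetical climate-normal values for the month containing the activity.
feats = cross_weather_with_activity("earthwork", precip_mm=150.0, mean_temp_c=6.0)
print(feats)
```

Tree-based models can often discover such interactions on their own, so explicit crossing is a design choice rather than a requirement.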
Right-of-Way and Utility Relocation
ROW acquisition and utility relocation are the two largest delay drivers on highway projects, together responsible for over half of total delay in many DOT programs. The model learns that activities with pending ROW within 500 m have a 72% probability of delay exceeding 30 days. This is not a new insight; every highway PM knows it. The value is in quantifying it per activity and feeding it into schedule risk analysis.
Earned Value Management with AI: CPI/SPI Trend Forecasting
EVM is standard on large public projects — federal agencies, state DOTs, and major contractors all track CPI (Cost Performance Index) and SPI (Schedule Performance Index) per ANSI/EIA-748. The limitation of classical EVM: it reports current performance but forecasts linearly. If CPI = 0.92 today, the EAC (Estimate at Completion) assumes CPI stays at 0.92 for the remainder. In practice, CPI on infrastructure projects shows nonlinear patterns — it typically degrades during adverse-weather seasons, recovers afterward, and degrades again in the final 10% of the project (punch-list items, rework, demobilization inefficiency).
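The constant-CPI assumption corresponds to the standard EVM formula EAC = BAC / CPI, where BAC is the budget at completion. A short worked example, with a hypothetical budget figure:

```python
def eac_constant_cpi(bac, cpi):
    """Classical EVM forecast: assumes today's CPI holds for all remaining work."""
    if cpi <= 0:
        raise ValueError("CPI must be positive")
    return bac / cpi

bac = 100_000_000  # hypothetical budget at completion, USD
eac = eac_constant_cpi(bac, cpi=0.92)
print(f"EAC = {eac:,.0f} USD, overrun = {eac / bac - 1:.1%}")
```

With CPI = 0.92 the forecast overrun is roughly 8.7% of budget, and it stays at that figure whatever season or project phase lies ahead, which is the limitation described above.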
Time-Series Forecasting of CPI/SPI
An LSTM model trained on monthly CPI/SPI time series from 200+ completed projects (EPC and design-build delivery) captures these seasonal and phase-dependent patterns:
Input features (per month):
- cpi, spi (current month)
- cpi_3m_avg, spi_3m_avg (rolling average)
- percent_complete
- season (winter/wet/dry/peak)
- activity_mix (% earthwork, % structural, % pavement, % finishing)
- change_order_volume (cumulative approved COs as % of contract)
Output: cpi_forecast, spi_forecast (next 3 months)
On a validation set of 40 projects, the LSTM EAC was within 5% of actual final cost on 72% of projects, compared with 51% for linear EVM extrapolation. The improvement is largest for projects in the 40-70% completion range, where nonlinear effects are strongest.
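The rolling-average inputs can be derived from the raw monthly series before they reach the model. A minimal sketch, assuming a trailing window that is simply shorter for the first two months; the monthly values are made up and the handling of early months is an assumption.

```python
def rolling_mean(values, window=3):
    """Trailing mean over up to `window` most recent values (shorter at the start)."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

# Hypothetical monthly history for one project.
cpi = [1.00, 0.97, 0.94, 0.91, 0.95]
spi = [1.00, 0.98, 0.93, 0.90, 0.92]

cpi_3m_avg = rolling_mean(cpi)
spi_3m_avg = rolling_mean(spi)

# One feature row per month, using the field names from the input list.
rows = [
    {"cpi": c, "spi": s, "cpi_3m_avg": ca, "spi_3m_avg": sa}
    for c, s, ca, sa in zip(cpi, spi, cpi_3m_avg, spi_3m_avg)
]
print(rows[-1])
```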
Cost Estimation Using Unit-Price Books with Regional Factors
State DOT unit-price bid tabulations (historical winning bid prices by pay item) and RSMeans-style cost databases are the foundation of public sector cost estimation. Every state publishes its own weighted-average unit prices with location-specific multipliers. The challenge: published unit prices are updated annually, but actual market rates fluctuate quarterly. Regional factors (mountainous terrain, remote area, coastal) add 10-40% to base rates. Escalation (steel, cement, fuel, and labor indices from the Bureau of Labor Statistics PPI and ENR Construction Cost Index) needs forward-looking estimation.
Open data/cost-estimation-data.json — it contains item-level estimates: item_code, item_description, unit_price_usd, quantity, unit, location_factor, escalation_factor_applied, actual_rate_achieved, contractor_quoted_rate.
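A book-based line cost can be sketched from these fields as follows. The item codes and values are made up, and treating location_factor and escalation_factor_applied as simple multipliers on the base price is an assumption about how the file is meant to be used.

```python
# Hypothetical items shaped like data/cost-estimation-data.json (values are made up).
items = [
    {"item_code": "203-01", "unit_price_usd": 12.50, "quantity": 40_000,
     "location_factor": 1.20, "escalation_factor_applied": 1.08},
    {"item_code": "401-02", "unit_price_usd": 95.00, "quantity": 8_000,
     "location_factor": 1.10, "escalation_factor_applied": 1.05},
]

def book_line_cost(item):
    """Book-based line cost: base price x quantity x location x escalation."""
    return (item["unit_price_usd"] * item["quantity"]
            * item["location_factor"] * item["escalation_factor_applied"])

line_costs = {i["item_code"]: book_line_cost(i) for i in items}
total = sum(line_costs.values())
print(line_costs, round(total, 2))
```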
ML-Enhanced Cost Estimation
The ML model does not replace the unit-price book — it learns the gap between book-based estimates and actual bid/execution costs:
| Predictor | Effect on Book-Actual Gap |
|---|---|
| Location remoteness index | +5 to +25% (terrain, access road quality) |
| Market cycle position | ±10% (steel/cement price vs base-year index) |
| **Contractor competition** (number of bidders) | -5 to -15% (more bidders = lower quotes) |
| **Project scale** (contract value) | -3 to -8% (economies of scale on large jobs) |
| Season of letting | ±3% (early-season lettings attract more bidders) |
A Random Forest model trained on 5,000+ item-level comparisons (engineer's estimate vs low bid) from DOT bid tabulations predicts the estimate-to-bid ratio with MAE of 7%. This translates to ±7% accuracy on total project cost — comparable to AACE Class 3 estimate quality at the pre-bid stage.
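To make the metric concrete, the sketch below computes MAE on the estimate-to-bid ratio for a handful of made-up items. It assumes the ratio is defined as low bid divided by engineer's estimate; the source does not state which way round it is taken, and the predictions are invented for illustration.

```python
# Hypothetical (engineer's estimate, low bid) unit prices for a few pay items.
pairs = [(100.0, 92.0), (50.0, 55.0), (200.0, 180.0), (80.0, 80.0)]
# Hypothetical model predictions of the bid/estimate ratio for the same items.
predicted_ratio = [0.95, 1.05, 0.93, 1.02]

# Assumed definition: ratio = low bid / engineer's estimate.
actual_ratio = [bid / est for est, bid in pairs]

# MAE on the ratio, i.e. average absolute error as a fraction of the estimate.
mae = sum(abs(p - a) for p, a in zip(predicted_ratio, actual_ratio)) / len(pairs)
print(f"MAE = {mae:.1%}")
```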
Escalation Forecasting
For projects with 3-5 year execution periods, escalation can add 15-25% to base cost. The standard approach: apply a fixed escalation rate post-hoc. The AI approach: forecast steel, cement, and labor indices using ARIMA + external features (global steel prices, domestic cement capacity utilization, prevailing-wage determinations as a labor floor) and compute expected escalation at the estimate stage. This gives the estimator a realistic cost range rather than a point estimate that ignores future price movements.
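The difference between the two approaches can be illustrated by weighting a price index by when the money is actually spent. In this sketch the spend profile, the 5% fixed rate, and the forecast index path are all made up; in practice the path would come from the ARIMA-style forecast described above.

```python
def escalation_uplift(spend_share_by_year, index_by_year):
    """Cost uplift vs base-year prices, weighting each year's index by its spend share."""
    if abs(sum(spend_share_by_year) - 1.0) > 1e-9:
        raise ValueError("spend shares must sum to 1")
    return sum(s * i for s, i in zip(spend_share_by_year, index_by_year)) - 1.0

# Hypothetical 4-year project: share of base cost spent in each year.
spend = [0.10, 0.30, 0.40, 0.20]

# Fixed-rate approach: 5% per year, compounded from the base year.
fixed_index = [1.05 ** (year + 1) for year in range(4)]

# Forecast approach: a made-up index path standing in for a model forecast.
forecast_index = [1.03, 1.10, 1.19, 1.24]

print(f"fixed-rate uplift: {escalation_uplift(spend, fixed_index):.1%}")
print(f"forecast uplift:   {escalation_uplift(spend, forecast_index):.1%}")
```

With these illustrative numbers the fixed rate gives about 14.2% and the forecast path about 15.7%. Running the same calculation over several forecast scenarios would yield the cost range rather than a single point.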
Resource Optimization: Equipment and Labour Productivity
Open data/resource-utilization.json — it contains daily resource logs: project_id, date, resource_type (excavator/paver/batching_plant/transit_mixer/labour_crew), resource_id, planned_output, actual_output, unit, idle_hours, breakdown_hours, weather_code, shift_type.
Equipment Utilization Optimization
Construction equipment on highway projects typically achieves 55-65% utilization (actual productive hours / available hours). The gap comes from: breakdown (10-15%), idle waiting for material (10-15%), weather (5-10%), and poor sequencing (5-10%).
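Utilization per machine can be computed from the daily logs along these lines. The available_hours field is an assumption (for example, shift length); it is not in the listed schema. The sketch also assumes weather downtime is recorded inside idle_hours, and the log values are made up.

```python
# Hypothetical daily logs shaped like data/resource-utilization.json.
# available_hours is an assumed field (shift length); it is not in the listed schema.
logs = [
    {"resource_id": "PAVER-1", "available_hours": 10, "idle_hours": 2.5, "breakdown_hours": 1.0},
    {"resource_id": "PAVER-1", "available_hours": 10, "idle_hours": 3.0, "breakdown_hours": 0.0},
    {"resource_id": "EXC-3", "available_hours": 8, "idle_hours": 1.0, "breakdown_hours": 3.0},
]

def utilization_by_resource(rows):
    """Productive hours / available hours, where productive = available - idle - breakdown."""
    totals = {}
    for r in rows:
        avail, prod = totals.get(r["resource_id"], (0.0, 0.0))
        productive = r["available_hours"] - r["idle_hours"] - r["breakdown_hours"]
        totals[r["resource_id"]] = (avail + r["available_hours"], prod + productive)
    return {rid: prod / avail for rid, (avail, prod) in totals.items()}

util = utilization_by_resource(logs)
print(util)
```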
An optimization model using constraint programming (CP-SAT solver) with ML-predicted activity durations:
Objective: minimize total equipment-days across project
Constraints:
- Activity precedence (from CPM network)
- Equipment capacity (paver output ≤ 400 cy/shift, batch plant ≤ 80 cy/hr)
- Resource availability (equipment fleet size, crew count)
- Season restrictions (no earthwork during heavy rain or frozen subgrade)
- Concurrent activity limits (single-lane working under traffic)
ML inputs:
- Predicted activity duration (from delay model)
- Predicted weather days lost per month (from NOAA historical + forecast)
- Predicted breakdown probability per equipment type/age
On a 60 km, 4-lane DOT highway project, this approach reduced asphalt plant idle time by 22% and raised paver utilization from 58% to 71%, achieved by better synchronizing aggregate stockpiling, plant production scheduling, and paving sequence.
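The shape of the formulation can be shown on a toy instance. The sketch below is a brute-force stand-in, not CP-SAT: it enumerates activity orders, schedules each at its earliest feasible start, and keeps the order with the fewest equipment-days. The activities, durations, one-machine fleets, and the definition of equipment-days as each machine's first-start-to-last-finish span are all assumptions for illustration.

```python
from itertools import permutations

# Hypothetical activities: duration (days), predecessors, required equipment (fleet of 1 each).
ACTS = {
    "clear":    {"dur": 3, "preds": [],                "equip": "excavator"},
    "subgrade": {"dur": 5, "preds": ["clear"],         "equip": "excavator"},
    "drain":    {"dur": 6, "preds": ["clear"],         "equip": "excavator"},
    "base":     {"dur": 4, "preds": ["subgrade"],      "equip": "paver"},
    "surface":  {"dur": 3, "preds": ["base", "drain"], "equip": "paver"},
}

def schedule(order):
    """Serial schedule: place each activity at its earliest feasible start, in the given order."""
    start, finish, free_at = {}, {}, {}
    for a in order:
        info = ACTS[a]
        if any(p not in finish for p in info["preds"]):
            return None  # order violates precedence
        est = max([finish[p] for p in info["preds"]] + [free_at.get(info["equip"], 0)])
        start[a], finish[a] = est, est + info["dur"]
        free_at[info["equip"]] = finish[a]
    return start, finish

def equipment_days(start, finish):
    """Days each machine stays on site: first start to last finish, summed over machines."""
    span = {}
    for a, info in ACTS.items():
        lo, hi = span.get(info["equip"], (start[a], finish[a]))
        span[info["equip"]] = (min(lo, start[a]), max(hi, finish[a]))
    return sum(hi - lo for lo, hi in span.values())

best = None
for order in permutations(ACTS):
    result = schedule(order)
    if result is None:
        continue
    cost = equipment_days(*result)
    if best is None or cost < best[0]:
        best = (cost, order, result[0])

print(best)
```

In this toy case, doing drainage before subgrade removes a gap in the paver's work and lowers total equipment-days, at the price of a later finish. A constraint solver would search start times directly and scale to real networks; enumeration is only workable for a handful of activities.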
Labour Productivity Prediction
Labour productivity on construction sites varies by 40-60% depending on season, overtime, skill level, and supervision intensity. A regression model predicting daily output per crew:
Key Takeaways
This is chapter 2 of AI for Civil & Infrastructure (Global).