AI for Construction Project Scheduling & Cost Estimation

Delay Prediction, Earned Value Forecasting, Unit-Price Estimation, and Resource Optimization

Schedule Delay Prediction: Modelling What Every PM Already Knows

Construction delay on major infrastructure is not an anomaly; it is the baseline. Large public programs track thousands of projects across federal and state portfolios. Across them, megaprojects routinely slip 12-18 months, and cost overruns of 20-30% are common on complex transit and highway work. These patterns have been documented across decades of FHWA and academic megaproject research.

The causes are well understood. What is missing is systematic quantification of delay risk at the activity level: which specific activities on which specific project will slip, and by how much.

Feature Engineering for Activity-Level Delay Prediction

Open data/project-schedule-data.csv in the code panel. Each row represents a schedule activity: project_id, activity_id, activity_name, wbs_code, planned_start, planned_finish, actual_start, actual_finish, planned_duration_days, predecessor_ids, resource_type, location_km, highway_section, season_at_start.
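The raw columns need some work before they become model inputs. Here is a minimal sketch of that step for one schedule row. It derives the delay target and the `season_at_midpoint` feature the delay model uses. It assumes ISO-formatted dates and a hypothetical month-to-season mapping (`SEASON_BY_MONTH`). Activities that are still in progress have no `actual_finish`, so they come back with no target.

```python
import csv
import io
from datetime import date

# Hypothetical month-to-season mapping for a wet-winter climate; adjust per region.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "wet", 4: "wet", 10: "wet", 11: "wet",
    5: "dry", 6: "dry", 7: "dry", 8: "dry", 9: "dry",
}


def activity_features(row: dict) -> dict:
    """Derive the delay target and a midpoint-season feature from one schedule row."""
    planned_start = date.fromisoformat(row["planned_start"])
    planned_finish = date.fromisoformat(row["planned_finish"])
    # Midpoint of the planned window: the season most of the work falls in.
    midpoint = planned_start + (planned_finish - planned_start) / 2
    delay_days = None  # no target yet for in-progress activities
    if row.get("actual_finish"):
        # Target: finish slippage in calendar days (negative = early).
        delay_days = (date.fromisoformat(row["actual_finish"]) - planned_finish).days
    return {
        "activity_id": row["activity_id"],
        "delay_days": delay_days,
        "season_at_midpoint": SEASON_BY_MONTH[midpoint.month],
    }


# Two illustrative rows (only the columns this sketch reads).
sample = io.StringIO(
    "activity_id,planned_start,planned_finish,actual_finish\n"
    "A-100,2023-10-01,2024-01-31,2024-03-05\n"
    "A-101,2024-06-03,2024-06-28,\n"
)
rows = [activity_features(r) for r in csv.DictReader(sample)]
print(rows)
```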

A gradient-boosted model (LightGBM) trained on 50,000+ activities from 120 completed DOT highway projects predicts activity delay from the following features:

| Feature | Source | Importance (SHAP, share of total) |
|---|---|---|
| season_at_midpoint (winter/wet/dry) | Schedule + calendar | 0.19 |
| activity_type (earthwork/structural/pavement/utility) | WBS classification | 0.16 |
| predecessor_delay_cumulative | Schedule network | 0.14 |
| row_acquisition_status | Land records | 0.12 |
| labour_availability_index | Regional labor market data | 0.10 |
| utility_relocation_pending | Utility owner status | 0.09 |
| contractor_historical_performance | Past project data | 0.08 |
| planned_duration_days | Schedule | 0.05 |
| soil_type_at_location | Geotechnical data | 0.04 |
| distance_to_nearest_batch_plant | Project logistics | 0.03 |
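Unlike most features, `predecessor_delay_cumulative` depends on the schedule network rather than on the activity itself. The sketch below shows one way to compute it. Its definition is an assumption, not the model's documented one: the largest delay carried into an activity along any predecessor chain, with upstream delay propagating in full (float absorption is ignored). It also assumes `predecessor_ids` has already been parsed into lists, for example by splitting on a delimiter.

```python
def predecessor_delay_cumulative(
    own_delay: dict[str, float], predecessors: dict[str, list[str]]
) -> dict[str, float]:
    """Worst-case delay carried into each activity along any predecessor chain.

    Assumes an acyclic (CPM) network and full propagation of upstream delay.
    """
    memo: dict[str, float] = {}

    def upstream(act: str) -> float:
        if act not in memo:
            # A predecessor passes on its own delay plus whatever it inherited.
            memo[act] = max(
                (own_delay.get(p, 0.0) + upstream(p) for p in predecessors.get(act, [])),
                default=0.0,
            )
        return memo[act]

    return {act: upstream(act) for act in own_delay}


# Illustrative network: A -> B -> D and A -> C -> D, delays in days.
delays = {"A": 5, "B": 10, "C": 0, "D": 2}
preds = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
print(predecessor_delay_cumulative(delays, preds))
```

Recursion keeps the sketch short. For networks with very long chains, iterating in topological order (for example with the standard library's `graphlib.TopologicalSorter`) avoids Python's recursion limit.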

The model achieves an MAE of 18 days for activity-level delay prediction (median activity duration: 45 days). More usefully, it identifies the critical-path activities most likely to cause project-level delay. Project managers can then direct management attention to those activities before delays cascade.
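Turning per-activity predictions into a list of critical-path risks requires schedule float. The sketch below shows one plausible approach, not necessarily the one behind the model above:

- A CPM forward and backward pass computes each activity's total float.
- Any activity whose predicted delay exceeds its float is flagged, because the excess would push out the project finish if nothing else changed.

The sketch treats each activity in isolation. Joint effects are the job of a full schedule risk analysis.

```python
from graphlib import TopologicalSorter


def total_float(duration: dict[str, int], preds: dict[str, list[str]]) -> dict[str, int]:
    """Total float per activity from a CPM forward and backward pass (finish-to-start links)."""
    order = list(TopologicalSorter({a: preds.get(a, []) for a in duration}).static_order())
    early_finish: dict[str, int] = {}
    for a in order:  # forward pass: earliest start = latest predecessor finish
        early_start = max((early_finish[p] for p in preds.get(a, [])), default=0)
        early_finish[a] = early_start + duration[a]
    project_end = max(early_finish.values())

    successors: dict[str, list[str]] = {a: [] for a in duration}
    for a in duration:
        for p in preds.get(a, []):
            successors[p].append(a)

    late_start: dict[str, int] = {}
    for a in reversed(order):  # backward pass: latest finish = earliest successor late start
        late_finish = min((late_start[s] for s in successors[a]), default=project_end)
        late_start[a] = late_finish - duration[a]
    # Total float = late finish - early finish.
    return {a: late_start[a] + duration[a] - early_finish[a] for a in duration}


durations = {"A": 10, "B": 20, "C": 5, "D": 10}
preds = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
predicted_delay = {"A": 3, "B": 0, "C": 12, "D": 2}  # illustrative model output, days

tf = total_float(durations, preds)
at_risk = sorted(
    ((a, predicted_delay[a] - tf[a]) for a in durations if predicted_delay[a] > tf[a]),
    key=lambda item: -item[1],
)
print(tf)       # float per activity
print(at_risk)  # (activity, days of project-level slip if it alone is late)
```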

Weather Impact Modelling

Weather is not a binary event. A Pacific Northwest highway project typically loses 45-60 working days to rain between October and March, while an Arizona project loses 10-15 days to extreme heat and monsoon storms. The model encodes this as a continuous feature: historical precipitation and temperature at the project location (NOAA gridded climate normals), crossed with activity type. Earthwork activities are 4x more sensitive to rainfall than structural concrete activities, which can continue under temporary shelters and with cold-weather protection per ACI 306.
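Below is a minimal sketch of such a continuous weather feature. It assumes monthly climate-normal rain-day counts are already available for the project location. The `normals` values are illustrative, not NOAA figures.

The feature is the expected number of rain days inside the activity's window, weighted by an activity-type sensitivity. Only the 4:1 earthwork-to-structural ratio comes from the text; the other coefficients are placeholders. Temperature could be added the same way.

```python
import calendar
from datetime import date, timedelta

# Relative rainfall sensitivity. The 4:1 earthwork-to-structural ratio follows the
# text; the pavement and utility values are placeholders.
RAIN_SENSITIVITY = {"earthwork": 1.0, "pavement": 0.6, "utility": 0.5, "structural": 0.25}


def weather_exposure(start: date, finish: date, activity_type: str,
                     rain_days_normal: dict[int, float]) -> float:
    """Expected rain days in [start, finish], scaled by activity-type sensitivity."""
    exposure = 0.0
    day = start
    while day <= finish:
        days_in_month = calendar.monthrange(day.year, day.month)[1]
        # Spread the month's normal rain-day count evenly across its days.
        exposure += rain_days_normal[day.month] / days_in_month
        day += timedelta(days=1)
    return exposure * RAIN_SENSITIVITY[activity_type]


# Illustrative monthly rain-day normals for a wet-winter site (not real NOAA data).
normals = {1: 18, 2: 16, 3: 17, 4: 14, 5: 11, 6: 8,
           7: 4, 8: 5, 9: 8, 10: 13, 11: 18, 12: 19}
earth = weather_exposure(date(2023, 11, 1), date(2023, 11, 30), "earthwork", normals)
struct = weather_exposure(date(2023, 11, 1), date(2023, 11, 30), "structural", normals)
print(round(earth, 2), round(struct, 2))
```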

Right-of-Way and Utility Relocation

ROW acquisition and utility relocation are the two largest delay drivers on highway projects. Together they account for over half of total delay in many DOT programs. The model learns that activities with pending ROW within 500 m have a 72% probability of delay exceeding 30 days. This is not a new insight; every highway PM knows it. The value lies in quantifying it per activity and feeding it into schedule risk analysis.
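The 500 m rule can be encoded as a boolean feature. The schedule columns do not include ROW parcels, so this sketch assumes a separate land-records table describing each parcel by chainage range along the alignment. The `Parcel` record and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Parcel:
    """Hypothetical ROW record: chainage range along the alignment, in km."""
    start_km: float
    end_km: float
    status: str  # e.g. "acquired" or "pending"


def row_pending_within(location_km: float, parcels: list[Parcel],
                       buffer_km: float = 0.5) -> bool:
    """True if any pending parcel lies within buffer_km of the activity location."""
    for p in parcels:
        if p.status != "pending":
            continue
        # Point-to-interval distance along the alignment (zero if inside the parcel).
        gap = max(p.start_km - location_km, 0.0, location_km - p.end_km)
        if gap <= buffer_km:
            return True
    return False


parcels = [Parcel(12.0, 12.3, "pending"), Parcel(14.0, 14.2, "acquired")]
print(row_pending_within(12.6, parcels))  # 0.3 km from the pending parcel
print(row_pending_within(14.1, parcels))  # only near an acquired parcel
```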

Earned Value Management with AI: CPI/SPI Trend Forecasting

EVM is standard on large public projects. Federal agencies, state DOTs, and major contractors all track CPI (Cost Performance Index) and SPI (Schedule Performance Index) under ANSI/EIA-748. The limitation of classical EVM is that it reports current performance but forecasts linearly. If CPI = 0.92 today, the CPI-based EAC (Estimate at Completion) assumes CPI stays at 0.92 for the rest of the project.

In practice, CPI on infrastructure projects follows nonlinear patterns. It typically degrades during adverse-weather seasons, recovers afterward, and degrades again in the final 10% of the project (punch-list items, rework, demobilization inefficiency).
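For reference, here are the standard EVM quantities behind that linear forecast as a short sketch. EV, PV, AC and BAC are earned value, planned value, actual cost and budget at completion. The CPI-based EAC assumes the remaining work is performed at the current CPI.

```python
def evm_snapshot(bac: float, pv: float, ev: float, ac: float) -> dict[str, float]:
    """Classical EVM indices and the CPI-based (linear) estimate at completion."""
    cpi = ev / ac  # cost efficiency to date
    spi = ev / pv  # schedule efficiency to date
    # Remaining work (BAC - EV) is assumed to cost 1/CPI per unit of budget.
    eac = ac + (bac - ev) / cpi  # algebraically equal to bac / cpi
    return {"cpi": cpi, "spi": spi, "eac": eac}


# Illustrative figures in $M: 46 earned against 50 planned and 50 spent.
snap = evm_snapshot(bac=100.0, pv=50.0, ev=46.0, ac=50.0)
print({k: round(v, 3) for k, v in snap.items()})
```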

Time-Series Forecasting of CPI/SPI

An LSTM model trained on monthly CPI/SPI time series from 200+ completed projects (EPC and design-build delivery) captures these seasonal and phase-dependent patterns:

Input features (per month):
  cpi, spi (current month)
  cpi_3m_avg, spi_3m_avg (rolling average)
  percent_complete
  season (winter/wet/dry/peak)
  activity_mix (% earthwork, % structural, % pavement, % finishing)
  change_order_volume (cumulative approved COs as % of contract)

Output: cpi_forecast, spi_forecast (next 3 months)
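The per-month inputs have to be shaped into supervised training samples for a sequence model. This sketch shows one way:

- Compute trailing 3-month averages.
- Build sliding windows that pair a lookback of monthly features with the next three months of CPI/SPI.

The 6-month lookback is an assumption. Only four of the listed features are included; percent_complete, season (one-hot encoded), activity_mix and change_order_volume would be appended to each monthly vector in the same way. The LSTM itself is not shown.

```python
def rolling_mean(values: list[float], window: int = 3) -> list[float]:
    """Trailing mean; the first months average over whatever history exists."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def make_windows(cpi: list[float], spi: list[float], lookback: int = 6, horizon: int = 3):
    """Pair `lookback` months of features with the next `horizon` months of (cpi, spi)."""
    features = [list(f) for f in zip(cpi, spi, rolling_mean(cpi), rolling_mean(spi))]
    samples = []
    for t in range(lookback, len(cpi) - horizon + 1):
        x = features[t - lookback:t]  # inputs: months t-lookback .. t-1
        y = list(zip(cpi[t:t + horizon], spi[t:t + horizon]))  # targets: months t .. t+horizon-1
        samples.append((x, y))
    return samples


cpi = [1.00, 0.98, 0.97, 0.95, 0.93, 0.92, 0.94, 0.96, 0.95, 0.93]
spi = [1.00, 0.99, 0.96, 0.94, 0.95, 0.97, 0.98, 0.97, 0.95, 0.94]
samples = make_windows(cpi, spi)
print(len(samples), len(samples[0][0]), len(samples[0][1]))
```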

On a validation set of 40 projects, the LSTM-based EAC was within 5% of actual final cost on 72% of projects, versus 51% for linear EVM extrapolation. The improvement is largest for projects in the 40-70% completion range, where nonlinear effects are strongest.
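The headline metric is a hit rate: the share of validation projects whose forecast EAC lands within 5% of actual final cost. Here is a minimal sketch with illustrative numbers:

```python
def within_tolerance_rate(eac_forecast: list[float], actual_cost: list[float],
                          tol: float = 0.05) -> float:
    """Share of projects whose forecast EAC is within `tol` (relative) of actual final cost."""
    if len(eac_forecast) != len(actual_cost):
        raise ValueError("forecast and actual lists must align project by project")
    hits = sum(abs(f - a) / a <= tol for f, a in zip(eac_forecast, actual_cost))
    return hits / len(actual_cost)


# Illustrative: four projects, actual final cost $100M each.
rate = within_tolerance_rate([104.0, 97.0, 112.0, 100.0], [100.0] * 4)
print(rate)
```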

Cost Estimation Using Unit-Price Books with Regional Factors

State DOT unit-price bid tabulations (historical winning bid prices by pay item) and RSMeans-style cost databases are the foundation of public-sector cost estimation. Each state publishes its own weighted-average unit prices, often with location-specific multipliers.

The challenge is that published unit prices are updated annually, while actual market rates fluctuate quarterly. Regional factors (mountainous terrain, remote area, coastal) add 10-40% to base rates. Escalation in steel, cement, fuel and labor costs also requires forward-looking estimation. It is tracked by indices such as the Bureau of Labor Statistics Producer Price Index (PPI) and the ENR Construction Cost Index.

Open data/cost-estimation-data.json — it contains item-level estimates: item_code, item_description, unit_price_usd, quantity, unit, location_factor, escalation_factor_applied, actual_rate_achieved, contractor_quoted_rate.
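This sketch turns one record into a book-based amount and an observed gap. It assumes the file holds a list of item objects, and that `location_factor` and `escalation_factor_applied` are multipliers (1.25 meaning +25%). The sample item and its values are illustrative.

```python
import json


def book_vs_actual(item: dict) -> dict:
    """Book-based amount for one pay item and the gap to the rate actually achieved."""
    # Assumption: both factors are multiplicative adjustments to the book unit price.
    book_rate = (item["unit_price_usd"] * item["location_factor"]
                 * item["escalation_factor_applied"])
    return {
        "item_code": item["item_code"],
        "book_amount_usd": book_rate * item["quantity"],
        # Positive gap: the achieved rate came in above the book-based rate.
        "gap": item["actual_rate_achieved"] / book_rate - 1.0,
    }


sample = json.loads("""[
  {"item_code": "EX-01", "item_description": "Unclassified excavation",
   "unit_price_usd": 12.0, "quantity": 50000, "unit": "cy",
   "location_factor": 1.25, "escalation_factor_applied": 1.04,
   "actual_rate_achieved": 16.5, "contractor_quoted_rate": 16.0}
]""")
results = [book_vs_actual(item) for item in sample]
print(results)
```

The `gap` column is the kind of target the ML model in the next subsection learns.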

ML-Enhanced Cost Estimation

The ML model does not replace the unit-price book — it learns the gap between book-based estimates and actual bid/execution costs:

| Predictor | Effect on book-to-actual gap |
|---|---|
| Location remoteness index | +5 to +25% (terrain, access road quality) |
| Market cycle position | ±10% (steel/cement price vs. base-year index) |
| Contractor competition (number of bidders) | -5 to -15% (more bidders, lower quotes) |
| Project scale (contract value) | -3 to -8% (economies of scale on large jobs) |
| Season of letting | ±3% (early-season lettings attract more bidders) |

A Random Forest model trained on 5,000+ item-level comparisons (engineer's estimate vs. low bid) from DOT bid tabulations predicts the estimate-to-bid ratio with an MAE of 7%. Applied across a project's pay items, this corresponds to roughly ±7% on total project cost at the pre-bid stage. That meets or exceeds AACE Class 3 accuracy expectations (-10% to -20% on the low side, +10% to +30% on the high side).
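Applying the item-level model to a new estimate is a rollup. This sketch assumes the ratio is defined as engineer's estimate divided by low bid, so each item's predicted bid is its estimate divided by the predicted ratio. The items are illustrative.

```python
def predicted_bid_total(items: list[tuple[float, float]]) -> float:
    """Sum of predicted bid amounts over pay items.

    Each item is (engineer_estimate_usd, predicted_ratio), where the ratio is
    estimate / low bid, so the predicted bid for the item is estimate / ratio.
    """
    return sum(estimate / ratio for estimate, ratio in items)


items = [(1_000_000.0, 1.10), (500_000.0, 0.95), (250_000.0, 1.00)]
engineer_total = sum(estimate for estimate, _ in items)
bid_total = predicted_bid_total(items)
print(round(engineer_total), round(bid_total))
```

How item-level error carries into the total depends on correlation. Independent item errors partly cancel in the sum, while a shared market-wide bias does not.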

Escalation Forecasting

For projects with 3-5-year execution periods, escalation can add 15-25% to base cost.

- The standard approach applies a fixed escalation rate post hoc.
- The AI approach forecasts steel, cement and labor indices at the estimate stage, using ARIMA with external regressors (global steel prices, domestic cement capacity utilization, and prevailing-wage determinations as a labor floor). It then computes expected escalation from those forecasts.

This gives the estimator a realistic cost range rather than a point estimate that ignores future price movements.
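Index forecasts could come from, for example, statsmodels' `ARIMA` class, which accepts exogenous regressors. Once they exist, converting them into expected escalation is spend-weighted arithmetic. This sketch assumes the forecast index and the share of base cost spent are given per period; all numbers are illustrative.

```python
def expected_escalation(base_index: float, forecast_index: list[float],
                        spend_share: list[float]) -> float:
    """Spend-weighted escalation over the execution period, as a fraction of base cost.

    forecast_index[t]: forecast cost index for period t (e.g. a steel or labor index).
    spend_share[t]: fraction of base cost spent in period t.
    """
    if len(forecast_index) != len(spend_share):
        raise ValueError("one spend share per forecast period")
    if abs(sum(spend_share) - 1.0) > 1e-9:
        raise ValueError("spend_share must sum to 1")
    return sum(s * (idx / base_index - 1.0) for idx, s in zip(forecast_index, spend_share))


# Illustrative 4-year job: index forecast per year and an S-curve-like spend profile.
esc = expected_escalation(100.0, [105.0, 110.0, 116.0, 121.0], [0.2, 0.3, 0.3, 0.2])
print(f"{esc:.1%}")
```

Running the same function on the lower and upper bounds of the forecast's prediction interval gives the low and high ends of the escalation range.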

Resource Optimization: Equipment and Labour Productivity

Open data/resource-utilization.json — it contains daily resource logs: project_id, date, resource_type (excavator/paver/batching_plant/transit_mixer/labour_crew), resource_id, planned_output, actual_output, unit, idle_hours, breakdown_hours, weather_code, shift_type.

Equipment Utilization Optimization

Construction equipment on highway projects typically achieves 55-65% utilization (actual productive hours / available hours). The gap comes from:

- breakdown (10-15%)
- idle time waiting for material (10-15%)
- weather (5-10%)
- poor sequencing (5-10%)
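This sketch computes these shares from the daily logs. It assumes productive hours equal available hours minus `idle_hours` and `breakdown_hours`. The logs record `shift_type` but not available hours, so the `SHIFT_HOURS` mapping is a hypothetical placeholder. In practice the records would first be grouped by `resource_id`.

```python
# Hypothetical shift lengths; the logs carry shift_type, not available hours.
SHIFT_HOURS = {"single": 10.0, "double": 20.0}


def utilization(logs: list[dict]) -> dict[str, float]:
    """Utilization and loss shares for one resource's daily log records."""
    available = sum(SHIFT_HOURS[rec["shift_type"]] for rec in logs)
    idle = sum(rec["idle_hours"] for rec in logs)
    breakdown = sum(rec["breakdown_hours"] for rec in logs)
    productive = available - idle - breakdown
    return {
        "utilization": productive / available,
        "idle_share": idle / available,
        "breakdown_share": breakdown / available,
    }


# Three illustrative days for one paver.
paver_logs = [
    {"resource_id": "PV-01", "shift_type": "single", "idle_hours": 2.0, "breakdown_hours": 1.0},
    {"resource_id": "PV-01", "shift_type": "single", "idle_hours": 3.0, "breakdown_hours": 0.0},
    {"resource_id": "PV-01", "shift_type": "single", "idle_hours": 1.0, "breakdown_hours": 2.0},
]
print(utilization(paver_logs))
```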

An optimization model built on constraint programming (the CP-SAT solver in Google OR-Tools) and fed with ML-predicted activity durations:

Objective: minimize total equipment-days across project
Constraints:
  - Activity precedence (from CPM network)
  - Equipment capacity (paver output ≤ 400 cy/shift, batch plant ≤ 80 cy/hr)
  - Resource availability (equipment fleet size, crew count)
  - Season restrictions (no earthwork during heavy rain or frozen subgrade)
  - Concurrent activity limits (single-lane working under traffic)

ML inputs:
  - Predicted activity duration (from delay model)
  - Predicted weather days lost per month (from NOAA historical + forecast)
  - Predicted breakdown probability per equipment type/age

On a 60 km, 4-lane DOT highway project, this approach reduced asphalt plant idle time by 22% and raised paver utilization from 58% to 71%. The gains came from better synchronization of aggregate stockpiling, plant production scheduling and paving sequence.
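CP-SAT searches for an optimal schedule under the full constraint set. In OR-Tools, equipment capacity is typically modelled with interval variables and a cumulative constraint.

The sketch below uses a different technique so it can run without dependencies: a serial schedule-generation scheme, a standard greedy heuristic for resource-constrained project scheduling. It covers only the two core constraint types, precedence and equipment capacity. It yields a feasible schedule rather than an optimal one and minimizes nothing explicitly. Every activity, duration and fleet size in it is illustrative.

```python
from collections import defaultdict


def serial_sgs(duration, preds, equipment, fleet, priority):
    """Serial schedule-generation scheme: greedy and feasible, not optimal.

    duration: activity -> days; preds: activity -> predecessor list;
    equipment: activity -> equipment type it occupies (one unit for its whole duration);
    fleet: equipment type -> units available; priority: activity -> rank (lower first).
    """
    usage = defaultdict(int)  # (equipment type, day) -> units in use
    start, finish = {}, {}
    unscheduled = set(duration)
    while unscheduled:
        # Eligible: every predecessor already scheduled (network assumed acyclic).
        eligible = [a for a in unscheduled if all(p in finish for p in preds.get(a, []))]
        act = min(eligible, key=lambda a: (priority[a], a))
        t = max((finish[p] for p in preds.get(act, [])), default=0)
        kind = equipment[act]
        # Shift later until a unit of this equipment type is free on every day of the activity.
        while any(usage[(kind, d)] >= fleet[kind] for d in range(t, t + duration[act])):
            t += 1
        for d in range(t, t + duration[act]):
            usage[(kind, d)] += 1
        start[act], finish[act] = t, t + duration[act]
        unscheduled.remove(act)
    return start, max(finish.values())


# Illustrative: two earthwork sections share one excavator, then one paver follows each.
duration = {"E1": 5, "E2": 4, "P1": 3, "P2": 3}
preds = {"P1": ["E1"], "P2": ["E2"]}
equipment = {"E1": "excavator", "E2": "excavator", "P1": "paver", "P2": "paver"}
fleet = {"excavator": 1, "paver": 1}
priority = {"E1": 0, "E2": 1, "P1": 2, "P2": 3}
start, makespan = serial_sgs(duration, preds, equipment, fleet, priority)
print(start, makespan)
```

In the setup described in the text, `duration` would hold ML-predicted durations and weather-day allowances rather than fixed planning values.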

Labour Productivity Prediction

Labour productivity on construction sites varies by 40-60% depending on season, overtime, skill level, and supervision intensity. A regression model predicting daily output per crew:

  • Skilled-to-helper ratio has the strongest effect: crews with >40% journeyman-level workers produce 30% more per worker
  • Consecutive overtime days show diminishing returns: productivity drops 15% after 3 consecutive 12-hour shifts
  • Foreman-to-crew ratio below 1:3 degrades quality metrics (rework rate increases)
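The first two effects above can be expressed as step multipliers on a baseline output per worker. This is a rule-of-thumb sketch built from the listed figures, not the fitted regression. The step thresholds and the baseline rate are assumptions. The supervision effect is left out because the text ties it to quality rather than output.

```python
def output_per_worker(base_output: float, journeyman_share: float,
                      prior_consecutive_12h_shifts: int) -> float:
    """Adjust a baseline daily output per worker using the reported effects as step multipliers."""
    factor = 1.0
    if journeyman_share > 0.40:
        factor *= 1.30  # reported +30% for crews with >40% journeyman-level workers
    if prior_consecutive_12h_shifts >= 3:
        factor *= 0.85  # reported -15% once 3 consecutive 12-hour shifts have been worked
    return base_output * factor


# Illustrative baseline of 10 units per worker-day.
fresh_mixed_crew = output_per_worker(10.0, journeyman_share=0.30, prior_consecutive_12h_shifts=0)
tired_skilled_crew = output_per_worker(10.0, journeyman_share=0.50, prior_consecutive_12h_shifts=3)
print(fresh_mixed_crew, round(tired_skilled_crew, 2))
```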

Key Takeaways

  • Activity-level delay prediction enables proactive management — knowing which activities will slip (and why) before they start is more valuable than detecting delay after it occurs. Weather and ROW are the dominant drivers, but their impact varies by activity type and location.
  • EVM forecasting with ML captures nonlinear CPI/SPI patterns — seasonal and phase-dependent performance changes are real and systematic. Linear EAC extrapolation underperforms.
  • Unit-price estimation benefits from market gap modelling — the published unit price is a reference, not a prediction. ML models that learn the book-to-actual gap improve estimate accuracy from ±15-20% to ±7%.
  • Equipment optimization has immediate ROI — even modest utilization improvements (5-10 percentage points) save millions on large highway projects where major equipment costs run tens of thousands of dollars per month per item.

This is chapter 2 of AI for Civil & Infrastructure (Global).
