
AI for Construction Project Scheduling & Cost Estimation

Delay Prediction, Earned Value Forecasting, DSR-Based Estimation, and Resource Optimization

Schedule Delay Prediction: Modelling What Every PM Already Knows

Construction delay in India is not an anomaly; it is the baseline. The Ministry of Statistics and Programme Implementation tracks 1,800+ central sector projects above ₹150 crore. As of 2024, 44% are delayed, with an average cost overrun of 21%. Highway projects under Bharatmala consistently slip 12-18 months. Metro projects (Kochi, Lucknow, Nagpur) run 2-4 years behind schedule.

The causes are well-understood. What is missing is systematic quantification of delay risk at the activity level — which specific activities on which specific project will slip, and by how much.

Feature Engineering for Activity-Level Delay Prediction

Open data/project-schedule-data.csv in the code panel. Each row represents a schedule activity: project_id, activity_id, activity_name, wbs_code, planned_start, planned_finish, actual_start, actual_finish, planned_duration_days, predecessor_ids, resource_type, location_km, highway_section, season_at_start.
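
Before any model is trained, each row needs a delay target and a season label. The sketch below derives both from the date columns listed above. It assumes ISO-formatted dates (YYYY-MM-DD) and a three-season calendar; only the June to September monsoon window is taken from this chapter, and the sample row is made up rather than read from the real file.

```python
import csv
import io
from datetime import date, timedelta

def season_of(d: date) -> str:
    # Only the June-September monsoon window comes from the chapter;
    # the winter/summer split is an assumed convention.
    if 6 <= d.month <= 9:
        return "monsoon"
    if d.month >= 10 or d.month <= 2:
        return "winter"
    return "summer"

def activity_targets(row: dict) -> dict:
    """Derive the finish delay and the midpoint season for one schedule row."""
    planned_start = date.fromisoformat(row["planned_start"])
    planned_finish = date.fromisoformat(row["planned_finish"])
    actual_finish = date.fromisoformat(row["actual_finish"])
    half_span = timedelta(days=(planned_finish - planned_start).days // 2)
    return {
        "activity_id": row["activity_id"],
        "delay_days": (actual_finish - planned_finish).days,
        "season_at_midpoint": season_of(planned_start + half_span),
    }

# Made-up sample row standing in for data/project-schedule-data.csv.
sample = io.StringIO(
    "activity_id,planned_start,planned_finish,actual_finish\n"
    "A-101,2023-06-01,2023-07-16,2023-08-03\n"
)
targets = [activity_targets(r) for r in csv.DictReader(sample)]
```

A negative delay_days value means the activity finished early; whether to clip it at zero is a modelling choice left open here.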

A gradient boosted model (LightGBM) trained on 50,000+ activities from 120 completed NHAI highway projects predicts activity delay from the following features:

Feature	Source	Importance (SHAP)
season_at_midpoint (monsoon/winter/summer)	Schedule + calendar	0.19
activity_type (earthwork/structural/pavement/utility)	WBS classification	0.16
predecessor_delay_cumulative	Schedule network	0.14
row_acquisition_status	Land records	0.12
labour_availability_index	District-level migration data	0.10
utility_shifting_pending	Utility agency status	0.09
contractor_historical_performance	Past project data	0.08
planned_duration_days	Schedule	0.05
soil_type_at_location	Geotechnical data	0.04
distance_to_nearest_batch_plant	Project logistics	0.03
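
The predecessor_delay_cumulative feature has to be computed from the schedule network. The chapter does not define it, so the sketch below assumes one plausible reading: the sum of positive finish delays over all upstream activities, each counted once. The semicolon-separated format for predecessor_ids is also an assumption.

```python
def parse_predecessors(field: str) -> list:
    # Assumed format: "A-100;A-101" (semicolon-separated activity ids).
    return [p.strip() for p in field.split(";") if p.strip()]

def predecessor_delay_cumulative(activity_id, predecessors, delay_days):
    """Sum positive delays over every transitive predecessor of an activity.

    predecessors: {activity_id: [direct predecessor ids]}
    delay_days:   {activity_id: finish delay in days, negative if early}
    """
    seen = set()
    stack = list(predecessors.get(activity_id, []))
    while stack:
        pred = stack.pop()
        if pred in seen:
            continue
        seen.add(pred)
        stack.extend(predecessors.get(pred, []))
    return sum(max(delay_days.get(p, 0), 0) for p in seen)

# Made-up three-activity network: C depends on A and B, B depends on A.
network = {"A": [], "B": parse_predecessors("A"), "C": parse_predecessors("A;B")}
delays = {"A": 10, "B": -3, "C": 5}
upstream_delay_for_c = predecessor_delay_cumulative("C", network, delays)
```

For prediction before an activity starts, only delays already observed upstream would be available, so the delay_days mapping would be restricted to completed activities.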

The model achieves MAE of 18 days for activity-level delay prediction (median activity duration: 45 days). More usefully, it identifies the critical path activities most likely to cause project-level delay — enabling project managers to allocate management attention before delays cascade.

Monsoon Impact Modelling

Monsoon is not a binary event. A Mumbai highway project loses 45-60 working days between June and September. A Rajasthan project loses 15-20 days. The model encodes this as a continuous feature — historical rainfall at the project location (IMD gridded data at 0.25° resolution) crossed with activity type. Earthwork activities are 4x more sensitive to rainfall than structural concrete activities (which can continue under temporary shelters).
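
As a rough illustration of crossing rainfall with activity type, the sketch below scales mean monthly rainfall over an activity's span by a per-type sensitivity. Only the 4:1 earthwork-to-structural ratio comes from the text; the other weights and the rainfall figures are placeholders. In a tree model the weights would be learned from the two raw features rather than fixed by hand.

```python
# Relative rainfall sensitivity by activity type. Only the 4:1 ratio of
# earthwork to structural is from the chapter; the rest are placeholders.
RAIN_SENSITIVITY = {
    "earthwork": 4.0,
    "structural": 1.0,
    "pavement": 2.5,
    "utility": 1.5,
}

def rain_exposure(monthly_rain_mm, start_month, end_month, activity_type):
    """Mean monthly rainfall over the activity span, scaled by sensitivity."""
    months = range(start_month, end_month + 1)  # assumes no wrap past December
    mean_rain = sum(monthly_rain_mm[m] for m in months) / len(months)
    return mean_rain * RAIN_SENSITIVITY[activity_type]

# Made-up monthly rainfall normals (mm) for one grid cell.
normals = {6: 500.0, 7: 800.0, 8: 550.0, 9: 350.0}
earthwork_exposure = rain_exposure(normals, 6, 7, "earthwork")
structural_exposure = rain_exposure(normals, 6, 7, "structural")
```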

Right-of-Way and Utility Shifting

ROW acquisition and utility shifting are the two largest delay drivers in Indian highway projects — together responsible for 55% of total delay in Bharatmala Phase-I projects. The model learns that activities with pending ROW within 500m have a 72% probability of delay exceeding 30 days. This is not insight — every highway PM knows this. The value is in quantifying it per activity and feeding it into schedule risk analysis.

Earned Value Management with AI: CPI/SPI Trend Forecasting

EVM is standard on large Indian projects — CPWD, NHAI, and L&T all track CPI (Cost Performance Index) and SPI (Schedule Performance Index). The limitation of classical EVM: it reports current performance but forecasts linearly. If CPI = 0.92 today, the EAC (Estimate at Completion) assumes CPI stays at 0.92 for the remainder. In practice, CPI on Indian infrastructure projects shows nonlinear patterns — it typically degrades during monsoon, recovers post-monsoon, and degrades again in the final 10% of the project (punch-list items, rework, demobilization inefficiency).
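
The linear assumption is easy to make concrete. With CPI = EV / AC, the classical forecast is EAC = AC + (BAC - EV) / CPI, which reduces to BAC / CPI. The figures below are made up to reproduce the CPI of 0.92 used in the example above.

```python
def cpi(ev: float, ac: float) -> float:
    """Cost Performance Index: earned value over actual cost."""
    return ev / ac

def eac_linear(bac: float, ev: float, ac: float) -> float:
    """Classical EAC: remaining work is costed at today's CPI."""
    return ac + (bac - ev) / cpi(ev, ac)

# Made-up project, values in crore: budget 500, earned 230, spent 250.
bac, ev, ac = 500.0, 230.0, 250.0
current_cpi = cpi(ev, ac)            # 0.92
forecast = eac_linear(bac, ev, ac)   # same value as bac / current_cpi
```

Any forecast that lets CPI vary over the remaining months replaces the single division by CPI with a sum over future periods, which is where a time-series model comes in.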

Time-Series Forecasting of CPI/SPI

An LSTM model trained on monthly CPI/SPI time series from 200+ completed projects (L&T EPC + NHAI BOT) captures these seasonal and phase-dependent patterns:

Input features (per month):
  cpi, spi (current month)
  cpi_3m_avg, spi_3m_avg (rolling average)
  percent_complete
  season (monsoon/post-monsoon/winter/summer)
  activity_mix (% earthwork, % structural, % pavement, % finishing)
  change_order_volume (cumulative approved COs as % of contract)

Output: cpi_forecast, spi_forecast (next 3 months)
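
The feature list above implies a windowing step before training. A minimal sketch for the CPI columns only, assuming a 3-month rolling average and a 3-month forecast horizon as listed; the monthly values are made up, and the remaining features would be joined in the same way.

```python
def rolling_mean(values, window=3):
    """Trailing mean; early months use however many values exist."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def build_samples(cpi_series, horizon=3, window=3):
    """Pair each month's features with the next `horizon` months of CPI."""
    avg = rolling_mean(cpi_series, window)
    samples = []
    for t in range(window - 1, len(cpi_series) - horizon):
        features = {"cpi": cpi_series[t], "cpi_3m_avg": avg[t]}
        target = cpi_series[t + 1: t + 1 + horizon]
        samples.append((features, target))
    return samples

# Made-up monthly CPI history for one project.
history = [1.00, 0.98, 0.96, 0.90, 0.88, 0.93, 0.95]
samples = build_samples(history)
```

Each sample uses only data up to month t, so the targets never leak into the features.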

On a validation set of 40 projects, the LSTM EAC was within 5% of actual final cost on 72% of projects — versus 51% for linear EVM extrapolation. The improvement is largest for projects in the 40-70% completion range, where nonlinear effects are strongest.
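
The "within 5% of actual final cost" figure is a hit rate rather than an average error. A small sketch of how such a metric could be computed, with made-up numbers:

```python
def within_tolerance_rate(forecast_eac, actual_cost, tol=0.05):
    """Share of projects whose forecast EAC is within `tol` of actual cost."""
    hits = sum(
        abs(f - a) / a <= tol for f, a in zip(forecast_eac, actual_cost)
    )
    return hits / len(actual_cost)

# Made-up forecasts and outcomes for four projects (crore).
forecasts = [104.0, 120.0, 96.0, 100.0]
actuals = [100.0, 100.0, 100.0, 100.0]
hit_rate = within_tolerance_rate(forecasts, actuals)
```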

Cost Estimation Using DSR/SOR with Regional Factors

CPWD Delhi Schedule of Rates (DSR) and MoRTH Standard Data Book are the foundation of public sector cost estimation in India. Every state PWD publishes its own SOR with location-specific multipliers. The challenge: DSR rates are updated annually, but actual market rates fluctuate quarterly. Regional factors (hilly terrain, tribal area, coastal) add 10-40% to base rates. Escalation clauses (IEEMA formula for steel, cement, fuel, labour indices) need forward-looking estimation.

Open data/cost-estimation-data.json — it contains item-level estimates: item_code, dsr_description, dsr_rate_inr, quantity, unit, location_factor, escalation_factor_applied, actual_rate_achieved, contractor_quoted_rate.
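
A sketch of how the listed fields might combine into an item amount and a DSR-to-bid ratio. It assumes that location_factor and escalation_factor_applied are multiplicative (for example 1.2 for a 20% uplift), which the field list does not confirm. The item shown is made up, including its item_code.

```python
import json

def adjusted_rate(item: dict) -> float:
    """DSR rate after location and escalation multipliers (assumed multiplicative)."""
    return (
        item["dsr_rate_inr"]
        * item["location_factor"]
        * item["escalation_factor_applied"]
    )

def item_amount(item: dict) -> float:
    """Estimated amount for the item in INR."""
    return adjusted_rate(item) * item["quantity"]

def dsr_to_bid_ratio(item: dict) -> float:
    """Contractor's quoted rate relative to the adjusted DSR rate."""
    return item["contractor_quoted_rate"] / adjusted_rate(item)

# Made-up item standing in for one record of data/cost-estimation-data.json.
item = json.loads("""
{
  "item_code": "EXAMPLE-1",
  "dsr_rate_inr": 5000.0,
  "quantity": 120.0,
  "unit": "cum",
  "location_factor": 1.2,
  "escalation_factor_applied": 1.05,
  "contractor_quoted_rate": 6615.0
}
""")
amount = item_amount(item)
ratio = dsr_to_bid_ratio(item)
```

A ratio above 1 means the bid came in above the adjusted DSR rate; this ratio is the quantity a gap model would be trained to predict.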

ML-Enhanced Cost Estimation

The ML model does not replace DSR — it learns the gap between DSR-based estimates and actual tender/execution costs:

Predictor	Effect on DSR-Actual Gap
Location remoteness index	+5 to +25% (terrain, access road quality)
Market cycle position	±10% (steel/cement price vs DSR base year)
Contractor competition (number of bidders)	-5 to -15% (more bidders = lower quotes)
Project scale (contract value)	-3 to -8% (economies of scale on large EPC)
Season of tendering	±3% (post-monsoon tenders attract more bidders)

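A Random Forest model trained on 5,000+ item-level comparisons (DSR estimate vs actual L1 bid) from CPWD and state PWD projects predicts the DSR-to-bid ratio with an item-level MAE of 7%. If item errors carry through to the total, this corresponds to roughly ±7% on total project cost, comparable to AACE Class 3 estimate quality at the pre-tender stage. MAE is an average rather than a bound, so the error on any one project total depends on how item errors offset or compound across the bill of quantities.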
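
How item-level error relates to total-cost error can be checked by computing both on the same items. In the made-up example below every item is off by exactly 7%, yet the signs differ, so the error on the total is smaller.

```python
def item_mae_pct(predicted_rates, actual_rates):
    """Mean absolute percentage error across items."""
    errors = [abs(p - a) / a for p, a in zip(predicted_rates, actual_rates)]
    return sum(errors) / len(errors)

def total_error_pct(predicted_rates, actual_rates, quantities):
    """Signed percentage error on the quantity-weighted total."""
    predicted_total = sum(p * q for p, q in zip(predicted_rates, quantities))
    actual_total = sum(a * q for a, q in zip(actual_rates, quantities))
    return (predicted_total - actual_total) / actual_total

# Made-up rates (INR) and quantities for three items.
actual = [100.0, 200.0, 50.0]
predicted = [107.0, 186.0, 53.5]   # +7%, -7%, +7%
quantities = [10.0, 5.0, 20.0]

mae = item_mae_pct(predicted, actual)
total_err = total_error_pct(predicted, actual, quantities)
```

If the errors were all in the same direction, the total error would equal the item error, so a systematic bias matters more than scatter.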

Escalation Forecasting

For projects with 3-5 year execution periods, escalation can add 15-25% to base cost. The standard approach: apply IEEMA formula post-hoc. The AI approach: forecast steel, cement, and labour indices using ARIMA + external features (global steel prices, domestic cement capacity utilization, MGNREGA wage rates as labour floor) and compute expected escalation at the estimate stage. This gives the estimator a realistic cost range rather than a point estimate that ignores future price movements.
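
Once index forecasts exist, turning them into an escalation factor is simple arithmetic. The sketch below uses a generic weighted-index price-variation formula with a fixed (non-adjustable) share. It is not claimed to match the exact IEEMA clause, and the weights and index values are made up. The forecasting step itself (ARIMA with external features) is not shown.

```python
def escalation_factor(weights, base_index, forecast_index, fixed_share=0.15):
    """Weighted-index price-variation factor.

    factor = fixed_share + sum(w_k * forecast_index_k / base_index_k)
    fixed_share and the weights must sum to 1.
    """
    if abs(fixed_share + sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("fixed_share plus weights must sum to 1")
    factor = fixed_share
    for component, weight in weights.items():
        factor += weight * forecast_index[component] / base_index[component]
    return factor

# Made-up cost weights and index levels (base year = 100).
weights = {"steel": 0.25, "cement": 0.20, "labour": 0.40}
base = {"steel": 100.0, "cement": 100.0, "labour": 100.0}
forecast = {"steel": 120.0, "cement": 110.0, "labour": 125.0}

factor = escalation_factor(weights, base, forecast)   # about 1.17
escalation_pct = (factor - 1.0) * 100.0               # about 17%
```

Running the same function on a low and a high index scenario gives the cost range the paragraph above describes.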

Resource Optimization: Equipment and Labour Productivity

Open data/resource-utilization.json — it contains daily resource logs: project_id, date, resource_type (excavator/paver/batching_plant/transit_mixer/labour_gang), resource_id, planned_output, actual_output, unit, idle_hours, breakdown_hours, weather_code, shift_type.

Equipment Utilization Optimization

Construction equipment on Indian highway projects typically achieves 55-65% utilization (actual productive hours / available hours). The gap comes from: breakdown (10-15%), idle waiting for material (10-15%), weather (5-10%), and poor sequencing (5-10%).
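
Utilization as defined above can be computed straight from the daily logs. The sketch assumes one log record per shift and a fixed shift length, which is a hypothetical parameter since the field list has no available-hours column. The records are made up.

```python
def utilization(logs, shift_hours=10.0):
    """Productive hours over available hours for a list of daily log records."""
    available = shift_hours * len(logs)
    lost = sum(r["idle_hours"] + r["breakdown_hours"] for r in logs)
    return (available - lost) / available

def loss_shares(logs, shift_hours=10.0):
    """Idle and breakdown hours as fractions of available hours."""
    available = shift_hours * len(logs)
    return {
        "idle": sum(r["idle_hours"] for r in logs) / available,
        "breakdown": sum(r["breakdown_hours"] for r in logs) / available,
    }

# Made-up two-day log for one paver.
paver_logs = [
    {"resource_id": "PAVER-1", "idle_hours": 3.0, "breakdown_hours": 1.0},
    {"resource_id": "PAVER-1", "idle_hours": 2.0, "breakdown_hours": 2.0},
]
paver_utilization = utilization(paver_logs)
paver_losses = loss_shares(paver_logs)
```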

The optimization model uses constraint programming (CP-SAT solver) and takes ML-predicted activity durations as inputs:

Objective: minimize total equipment-days across project
Constraints:
  - Activity precedence (from CPM network)
  - Equipment capacity (paver output ≤ 300 m³/shift, batching plant ≤ 60 m³/hr)
  - Resource availability (equipment fleet size, labour gang count)
  - Season restrictions (no earthwork during heavy rain days)
  - Concurrent activity limits (single-lane working on live highway)

ML inputs:
  - Predicted activity duration (from delay model)
  - Predicted weather days lost per month (from IMD historical + forecast)
  - Predicted breakdown probability per equipment type/age
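
To show how precedence and fleet-size constraints interact, here is a greedy list-scheduling sketch in plain Python. It is a stand-in for illustration, not the CP-SAT formulation: it honours the two constraints but does not search for an optimal schedule. Activity names, durations, and fleet sizes are made up.

```python
def greedy_schedule(activities, fleet):
    """Assign each activity the earliest start that respects precedence
    and equipment fleet size.

    activities: {id: {"duration": days, "preds": [ids], "equipment": type}}
    fleet:      {equipment type: number of units available}
    """
    usage = {}   # (equipment, day) -> units in use
    start = {}
    remaining = dict(activities)
    while remaining:
        ready = sorted(
            a for a, spec in remaining.items()
            if all(p in start for p in spec["preds"])
        )
        if not ready:
            raise ValueError("cycle in precedence network")
        for a in ready:
            spec = remaining.pop(a)
            # Earliest start allowed by predecessors.
            t = max(
                (start[p] + activities[p]["duration"] for p in spec["preds"]),
                default=0,
            )
            eq = spec["equipment"]
            days = spec["duration"]
            # Shift right until the equipment is free for the whole span.
            while any(usage.get((eq, d), 0) >= fleet[eq] for d in range(t, t + days)):
                t += 1
            for d in range(t, t + days):
                usage[(eq, d)] = usage.get((eq, d), 0) + 1
            start[a] = t
    return start

# Made-up network: two earthwork sections share one excavator.
activities = {
    "earthwork_1": {"duration": 5, "preds": [], "equipment": "excavator"},
    "earthwork_2": {"duration": 4, "preds": [], "equipment": "excavator"},
    "paving_1": {"duration": 3, "preds": ["earthwork_1"], "equipment": "paver"},
}
plan = greedy_schedule(activities, {"excavator": 1, "paver": 1})
makespan = max(plan[a] + activities[a]["duration"] for a in plan)
```

A constraint solver would explore alternative orderings and add season and lane-closure restrictions as further constraints, which a greedy pass cannot do.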

On a Tata Projects highway project in MP (NH-30, 60 km 4-lane), this approach reduced hot-mix plant idle time by 22% and raised paver utilization from 58% to 71%. The gains came from better synchronization of aggregate stockpiling, HMP production scheduling, and paving sequence.

Labour Productivity Prediction

Labour productivity on Indian sites varies by 40-60% depending on season, overtime, skill level, and supervision intensity. A regression model predicting daily output per gang finds the following:

Skilled-to-unskilled ratio has the strongest effect — gangs with >40% skilled workers produce 30% more per worker

Consecutive overtime days show diminishing returns: productivity drops 15% after 3 consecutive 12-hour shifts

Supervisor-to-gang ratio below 1:3 degrades quality metrics (rework rate increases)
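
The three effects above can be encoded as simple rules for a quick sanity check. This is a rule-based sketch, not the fitted regression. It assumes the skill and overtime effects combine multiplicatively and treats the supervision threshold as a rework flag rather than an output change. The base output is a made-up figure.

```python
def gang_output(base_output, skilled_share, overtime_days, supervisors, gangs):
    """Return (expected daily output, rework risk flag) for one gang.

    base_output:   output of an unadjusted gang, in any consistent unit
    skilled_share: fraction of skilled workers in the gang (0 to 1)
    overtime_days: consecutive 12-hour shifts already worked
    """
    output = base_output
    if skilled_share > 0.40:
        output *= 1.30   # +30% with more than 40% skilled workers
    if overtime_days >= 3:
        output *= 0.85   # -15% after 3 consecutive 12-hour shifts
    # Supervisor-to-gang ratio below 1:3 raises rework risk.
    rework_risk = supervisors * 3 < gangs
    return output, rework_risk

# Made-up base of 100 units per day.
fresh_skilled = gang_output(100.0, 0.50, 0, supervisors=1, gangs=3)
tired_unskilled = gang_output(100.0, 0.30, 4, supervisors=1, gangs=4)
```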

Key Takeaways

Activity-level delay prediction enables proactive management — knowing which activities will slip (and why) before they start is more valuable than detecting delay after it occurs. Monsoon and ROW are the dominant drivers, but their impact varies by activity type and location.

EVM forecasting with ML captures nonlinear CPI/SPI patterns — seasonal and phase-dependent performance changes are real and systematic. Linear EAC extrapolation underperforms.

DSR-based estimation benefits from market gap modelling — the DSR rate is a reference, not a prediction. ML models that learn the DSR-to-actual gap improve estimate accuracy from ±15-20% to ±7%.

Equipment optimization has immediate ROI — even modest utilization improvements (5-10 percentage points) save crores on large highway projects where equipment costs run ₹50-80 lakh/month per major item.

This is chapter 2 of AI for Civil & Infrastructure.
