EV Battery & Powertrain

BMS with ML, SoC/SoH Estimation, Thermal Runaway Prediction, and Battery Second Life

The Battery Is the Vehicle

In an EV, the battery pack is not a component — it is the platform. It determines range, performance, charge time, longevity, and residual value. Battery management systems (BMS) have evolved from basic cell balancing circuits to sophisticated embedded compute platforms. The next evolution is ML-augmented BMS: algorithms that estimate internal cell states more accurately than physics-based models alone, predict remaining useful life (RUL) months in advance, and detect thermal anomalies before they become runaway events.

India's EV market adds its own constraints. The FAME II (Faster Adoption and Manufacturing of Hybrid and Electric Vehicles) subsidy program has driven aggressive price competition, pushing OEMs toward cheaper cells with tighter margins. Ola Electric, Ather Energy, Tata Motors EV, and Mahindra's electric portfolio all operate in a thermal environment that spans 0°C (Shimla in winter) to 48°C (Rajasthan in summer), one of the widest ambient ranges in any major EV market.

Safety-critical disclaimer: Battery management systems are safety-critical components. Incorrect SoC/SoH estimation causes over-charge or over-discharge events that accelerate cell degradation and create fire hazards. Thermal runaway prediction models must be validated against the full operating envelope and must never be the sole safety mechanism — hardware protection circuits (OVP, UVP, OTP) are mandatory independent layers. All battery packs sold in India must comply with AIS-038 (EV safety) and AIS-048 (battery specific) standards.

SoC Estimation: Kalman vs Neural

State of Charge (SoC) is the fraction of remaining charge capacity relative to fully charged capacity. It is not directly measurable — it must be estimated from measurable quantities: terminal voltage, current, and temperature.

Physics-Based Approaches

Coulomb counting integrates measured current over time. It is simple, but it accumulates error from current-sensor offset and from uncertainty in the initial SoC, and it has no mechanism to correct either.
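
The drift described above can be seen in a few lines. This is a minimal sketch, assuming a fixed sample period, a known rated capacity, and a sign convention where positive current means discharge. The function name, the 2 Ah cell, and the 50 mA offset are illustrative values, not taken from any BMS library or datasheet.

```python
def coulomb_count(soc_init, currents_a, dt_s, capacity_ah):
    """Integrate current to track SoC as a fraction in [0, 1].

    Sign convention (an assumption): positive current = discharge.
    """
    capacity_as = capacity_ah * 3600.0  # amp-hours to amp-seconds
    soc = soc_init
    history = []
    for i_a in currents_a:
        soc -= i_a * dt_s / capacity_as
        soc = min(max(soc, 0.0), 1.0)  # clamp to the physical range
        history.append(soc)
    return history


# One hour at 1 A from a 2 Ah cell, sampled at 1 Hz.
true_current = [1.0] * 3600
ideal = coulomb_count(1.0, true_current, dt_s=1.0, capacity_ah=2.0)

# A 50 mA sensor offset (hypothetical value) is integrated along with the signal.
biased = coulomb_count(1.0, [i + 0.05 for i in true_current], dt_s=1.0, capacity_ah=2.0)

print(f"ideal SoC after 1 h:  {ideal[-1]:.3f}")
print(f"biased SoC after 1 h: {biased[-1]:.3f}")
```

The gap between the two results grows linearly with time, which is why Coulomb counting is usually paired with a voltage-based correction.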

Extended Kalman Filter (EKF) uses an equivalent circuit model (ECM) — typically a Thevenin model with RC pairs — as the state equation and corrects the state estimate using voltage measurements. The EKF is the production standard today and is well-understood by automotive safety engineers.
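
The predict/update cycle can be sketched as follows for a Thevenin model with one RC pair. This is an illustrative sketch, not a production filter: the cell parameters, the quadratic OCV curve, and the noise covariances are all assumptions chosen for the demo, and a real BMS would use a measured OCV lookup table and identified ECM parameters.

```python
import numpy as np

# Hypothetical cell parameters for a 1-RC Thevenin model (not from a datasheet).
Q_AS = 2.0 * 3600.0            # capacity in amp-seconds
R0, R1, C1 = 0.05, 0.02, 2000.0
DT = 1.0                       # sample period in seconds
A_RC = np.exp(-DT / (R1 * C1))


def ocv(soc):
    """Assumed smooth OCV curve in volts; stands in for a measured lookup table."""
    return 3.0 + 1.0 * soc + 0.2 * soc ** 2


def docv_dsoc(soc):
    """Slope of the assumed OCV curve, used to linearize the measurement."""
    return 1.0 + 0.4 * soc


def ekf_step(x, P, current_a, v_meas, Qn, Rn):
    """One predict/update step. State x = [SoC, V_rc]; positive current = discharge."""
    # Predict with the ECM state equation.
    F = np.array([[1.0, 0.0], [0.0, A_RC]])
    x_pred = np.array([x[0] - current_a * DT / Q_AS,
                       A_RC * x[1] + R1 * (1.0 - A_RC) * current_a])
    P_pred = F @ P @ F.T + Qn
    # Correct with the terminal-voltage measurement.
    H = np.array([[docv_dsoc(x_pred[0]), -1.0]])
    v_pred = ocv(x_pred[0]) - x_pred[1] - R0 * current_a
    S = H @ P_pred @ H.T + Rn
    K = P_pred @ H.T / S
    x_new = x_pred + (K * (v_meas - v_pred)).ravel()
    P_new = (np.eye(2) - K @ H) @ P_pred
    return x_new, P_new


# Simulate a 1 A discharge; the filter starts with a deliberately wrong SoC.
rng = np.random.default_rng(0)
x_true = np.array([0.8, 0.0])
x_est, P = np.array([0.5, 0.0]), np.diag([0.1, 0.01])
Qn, Rn = np.diag([1e-8, 1e-6]), 1e-4   # tuning assumptions
for _ in range(1800):
    i_a = 1.0
    x_true = np.array([x_true[0] - i_a * DT / Q_AS,
                       A_RC * x_true[1] + R1 * (1.0 - A_RC) * i_a])
    v_meas = ocv(x_true[0]) - x_true[1] - R0 * i_a + rng.normal(0.0, 0.01)
    x_est, P = ekf_step(x_est, P, i_a, v_meas, Qn, Rn)

print(f"true SoC {x_true[0]:.3f}, estimated SoC {x_est[0]:.3f}")
```

Unlike Coulomb counting alone, the voltage correction pulls the estimate back toward the true SoC even when the initial guess is far off.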

| Method | SoC error (RMSE) | Compute | Temperature robustness |
| --- | --- | --- | --- |
| Coulomb counting | 3–8% | Very low | Poor (drift over time) |
| EKF (1 RC pair ECM) | 1–3% | Low | Moderate |
| EKF (2 RC pairs ECM) | 0.8–2% | Low-medium | Good |
| LSTM neural network | 0.5–1.5% | Medium | Excellent (if trained across T range) |
| Transformer-based | 0.4–1.2% | High | Excellent |

Neural SoC Estimation

LSTM and Transformer networks for SoC estimation take a time-window of [V, I, T] measurements and predict SoC. They outperform EKF when cell aging shifts the ECM parameters — neural nets learn the relationship implicitly from data without needing explicit model re-identification.

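# Training a simple LSTM SoC estimator
# Open data/battery-cycle-data.json for 500 cycle records

import json
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

with open("data/battery-cycle-data.json") as f:
    cycles = json.load(f)

# Each cycle: {"voltage_v": [...], "current_a": [...], "temp_c": [...], "soc_true": [...]}
# Assumption: soc_true is a fraction in [0, 1], matching the sigmoid output below.
SEQ_LEN = 50  # timesteps per window at 1 Hz

def make_windows(cycle, seq_len=SEQ_LEN):
    """Slice one cycle into [V, I, T] windows; the target is SoC at each window's last step."""
    feats = np.column_stack([cycle["voltage_v"], cycle["current_a"], cycle["temp_c"]])
    soc = np.asarray(cycle["soc_true"], dtype=np.float32)
    starts = range(max(len(soc) - seq_len + 1, 0))  # empty if the cycle is too short
    x = np.array([feats[s:s + seq_len] for s in starts], dtype=np.float32).reshape(-1, seq_len, 3)
    y = np.array([soc[s + seq_len - 1] for s in starts], dtype=np.float32).reshape(-1, 1)
    return x, y

class SoCLSTM(nn.Module):
    def __init__(self, input_size=3, hidden_size=64, num_layers=2):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True, dropout=0.1)
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):  # x: (batch, seq, 3)
        out, _ = self.lstm(x)
        return torch.sigmoid(self.fc(out[:, -1, :]))  # SoC in [0, 1]

model = SoCLSTM()

# Minimal training loop: MSE loss on SoC. Optimizer, batch size, and epoch count are illustrative.
# In practice, normalize inputs and validate on unseen cycles at 45°C (Indian summer condition).
pairs = [make_windows(c) for c in cycles]
x_train = torch.from_numpy(np.concatenate([p[0] for p in pairs]))
y_train = torch.from_numpy(np.concatenate([p[1] for p in pairs]))
loader = DataLoader(TensorDataset(x_train, y_train), batch_size=256, shuffle=True)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
for epoch in range(10):
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        optimizer.step()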

For the Ola S1 Pro and Ather 450X operating in Chennai (35–42°C ambient during charging), temperature-robust SoC estimation is a commercial differentiator — inaccurate SoC causes the BMS to terminate charging early, reducing effective range.

SoH Estimation and RUL Prediction

State of Health (SoH) quantifies battery degradation. It is defined as current capacity / rated capacity × 100%: SoH is 100% at beginning of life (BoL) and typically falls to 80% at end of life (EoL), the threshold at which range reduction becomes commercially unacceptable.

Remaining Useful Life (RUL) predicts how many more charge-discharge cycles (or calendar days) before SoH crosses the EoL threshold. Accurate RUL prediction enables:

  • Proactive warranty management (contact customer before failure, not after)
  • Second-life battery sorting (cells above 80% SoH → grid storage, below → recycling)
  • Fleet management for Ola and Ather's B2B delivery partnerships
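
As a baseline for the RUL idea above, the sketch below computes SoH from measured capacity and extrapolates a straight-line fit to the 80% threshold. Linear fade is an assumption made for brevity (real cells often show a knee where degradation accelerates), and the function names and synthetic numbers are illustrative.

```python
import numpy as np


def soh_percent(capacity_ah, rated_capacity_ah):
    """SoH as defined in the text: current capacity / rated capacity x 100."""
    return 100.0 * capacity_ah / rated_capacity_ah


def rul_cycles(cycle_idx, soh_history, eol_soh=80.0):
    """Extrapolate a straight-line fit of SoH vs. cycle number to the EoL threshold.

    Returns cycles remaining after the last observed cycle, 0 if EoL is already
    reached, or None if the fit shows no downward trend.
    """
    if soh_history[-1] <= eol_soh:
        return 0
    slope, intercept = np.polyfit(cycle_idx, soh_history, 1)
    if slope >= 0:
        return None
    eol_cycle = (eol_soh - intercept) / slope
    return max(0, int(round(eol_cycle - cycle_idx[-1])))


# Synthetic example: a 2.0 Ah cell losing 0.4 mAh per cycle (hypothetical numbers).
cycles = np.arange(0, 501, 50)
capacity = 2.0 - 0.0004 * cycles
soh = soh_percent(capacity, rated_capacity_ah=2.0)
remaining = rul_cycles(cycles, soh)
print(f"SoH now: {soh[-1]:.1f}%, estimated RUL: {remaining} cycles")
```

A production pipeline would replace the line fit with a model that also reports uncertainty, since warranty and second-life decisions depend on how confident the estimate is.
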
Degradation Features

Key measurable features that correlate with SoH:

| Feature | Physical mechanism | Measurement method |
| --- | --- | --- |
| Capacity fade | SEI layer growth, Li plating | Full charge-discharge cycle |
| Internal resistance increase | SEI growth, electrode porosity | EIS or DC pulse test |
| Incremental capacity (dQ/dV) peaks | Phase transitions in cathode | Slow-rate charge measurement |
| Coulombic efficiency | Side reactions | Ratio of discharge to charge capacity |
| Voltage relaxation time constant | Solid-state diffusion slowdown | OCV recovery after pulse |

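
Three of these features reduce to short calculations. The sketch below is illustrative only: the function names are hypothetical, the pulse values are made up, and the synthetic charge curve simply places a dQ/dV peak at 3.7 V to show how a peak position would be read off.

```python
import numpy as np


def dc_pulse_resistance_mohm(v_before, v_during, i_before, i_during):
    """Internal resistance from the voltage step caused by a current step (R = |dV / dI|)."""
    return abs((v_during - v_before) / (i_during - i_before)) * 1000.0


def coulombic_efficiency(discharge_ah, charge_ah):
    """Ratio of discharge capacity to the charge capacity that preceded it."""
    return discharge_ah / charge_ah


def incremental_capacity(voltage_v, capacity_ah):
    """dQ/dV along a slow-rate charge curve; voltage must be strictly increasing."""
    return np.gradient(capacity_ah, voltage_v)


# DC pulse: terminal voltage sags from 3.70 V to 3.65 V when a 2 A load is applied.
r_mohm = dc_pulse_resistance_mohm(3.70, 3.65, 0.0, 2.0)

# Coulombic efficiency for one cycle (hypothetical capacities).
ce = coulombic_efficiency(discharge_ah=1.98, charge_ah=2.00)

# Synthetic slow-charge curve with a single phase-transition plateau near 3.7 V.
voltage = np.linspace(3.4, 4.2, 81)
capacity = 1.0 + np.tanh((voltage - 3.7) / 0.05)
dqdv = incremental_capacity(voltage, capacity)
peak_v = voltage[np.argmax(dqdv)]

print(f"R = {r_mohm:.1f} mOhm, CE = {ce:.3f}, dQ/dV peak at {peak_v:.2f} V")
```

Tracking how the peak voltage shifts between cycles gives the "dQ/dV peak shift" feature without requiring a full capacity test.
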
Prompt: "I have cycle data for 200 NMC cells aged to varying SoH levels (70–100%).
Features available at each cycle: capacity (Ah), internal resistance (mΩ),
mean voltage plateau (V), dQ/dV peak shift (mV), ambient temperature during cycling (°C).
Target: SoH (%) and RUL (cycles remaining to 80% threshold).

Design a two-stage ML pipeline:
1. Feature engineering: which features to compute from raw V-I-T measurements
2. Model selection: compare Gaussian Process, Random Forest, and LSTM for SoH and RUL
Specify the train/validation split strategy that prevents data leakage across cell batches."
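
One common way to meet the last requirement in the prompt is a group-wise split, where every record from a given cell batch lands on the same side of the split. The sketch below shows the idea with the standard library only. The `batch_id` field name and the record layout are hypothetical, since the prompt does not specify a schema.

```python
import random


def split_by_group(records, group_key, val_fraction=0.2, seed=0):
    """Split records so that no group appears in both train and validation."""
    groups = sorted({r[group_key] for r in records})
    rng = random.Random(seed)
    rng.shuffle(groups)
    n_val = max(1, round(len(groups) * val_fraction))
    val_groups = set(groups[:n_val])
    train = [r for r in records if r[group_key] not in val_groups]
    val = [r for r in records if r[group_key] in val_groups]
    return train, val


# Synthetic records: 10 batches x 4 cells x 5 cycles ("batch_id" is a hypothetical field).
records = [
    {"batch_id": f"B{b}", "cell_id": f"B{b}-C{c}", "cycle": k}
    for b in range(10) for c in range(4) for k in range(5)
]
train, val = split_by_group(records, "batch_id")
overlap = {r["batch_id"] for r in train} & {r["batch_id"] for r in val}
print(len(train), len(val), overlap)
```

A random per-cycle split would put neighbouring cycles of the same cell in both sets, so validation error would look better than it is on cells from an unseen batch.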

Thermal Runaway Prediction

Thermal runaway is the self-accelerating exothermic reaction sequence: electrolyte decomposition → separator melt → internal short circuit → venting → fire. It is the primary safety failure mode for lithium-ion batteries and the driver of high-profile EV fires that have damaged consumer confidence in India.

Early warning requires detecting the precursor signatures before thermal runaway initiates:

  • Cell temperature divergence — one cell warming faster than its neighbors while pack temperature appears normal
  • Internal resistance spike — localized electrochemical anomaly
  • Micro short circuit detection — subtle capacity fade acceleration, reduced coulombic efficiency
  • Gas sensing — CO, H₂ evolution from electrolyte decomposition (requires gas sensors in pack design)
ML Anomaly Detection Approach

The BMS sees hundreds of temperature channels at 10–100 Hz. A multivariate anomaly detection model (isolation forest, autoencoder, or LSTM-based predictor) monitors all channels and flags deviations from the learned normal distribution.
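
Before reaching for the learned models named above, it helps to have a statistical baseline for the cell temperature divergence signature. The sketch below is that simpler stand-in, not an isolation forest or autoencoder: it scores each cell's temperature rise against the pack median using a robust z-score. The pack size, noise level, window, and threshold of 6 are assumptions chosen for the demo.

```python
import numpy as np


def divergence_scores(temps_c, window):
    """Robust z-score of each cell's temperature rise relative to the rest of the pack.

    temps_c: array of shape (timesteps, cells). The rise is measured over the
    last `window` samples; median/MAD keeps one hot cell from skewing the baseline.
    """
    rise = temps_c[-1] - temps_c[-1 - window]
    median = np.median(rise)
    mad = np.median(np.abs(rise - median))
    return (rise - median) / (1.4826 * mad + 1e-6)


# Synthetic pack: 96 cells warming together, with cell 17 warming faster.
rng = np.random.default_rng(1)
t = np.arange(600)[:, None]                       # 600 samples
temps = 30.0 + 0.005 * t + rng.normal(0.0, 0.05, size=(600, 96))
temps[:, 17] += 0.01 * t[:, 0]                    # extra 0.01 °C per sample

scores = divergence_scores(temps, window=300)
flagged = np.flatnonzero(scores > 6.0)            # threshold is a tuning assumption
print("flagged cells:", flagged)
```

Because the score is relative to the pack median, a uniform rise in pack temperature during fast charging does not raise an alarm, while a single cell pulling away from its neighbours does.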

Open data/bms-telemetry-samples.json. It contains BMS telemetry from 50 normal packs and 5 packs with documented cell-level anomalies (simulated and anonymized from field failure data). The anomaly detection benchmark is to detect precursor events more than 60 seconds before thermal runaway initiation.

A key engineering decision is the false-positive rate. Triggering an emergency shutdown on a false positive leaves a customer stranded, which is particularly problematic for Ola's app-based scooter users without a dealer network nearby. The precision-recall tradeoff must be calibrated with operational context.
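
One way to make that calibration explicit is to fix a false-positive budget first and then read off the recall it buys. The sketch below assumes anomaly scores where higher means more anomalous. The score distributions and the 0.1% budget are synthetic placeholders, not values from the telemetry file.

```python
import numpy as np


def threshold_for_fpr(normal_scores, max_false_positive_rate):
    """Threshold at the (1 - rate) quantile of scores from known-normal data."""
    return float(np.quantile(normal_scores, 1.0 - max_false_positive_rate))


def precision_recall(scores, labels, threshold):
    """Precision and recall when every score above the threshold raises an alarm."""
    alarms = scores > threshold
    tp = int(np.sum(alarms & (labels == 1)))
    fp = int(np.sum(alarms & (labels == 0)))
    fn = int(np.sum(~alarms & (labels == 1)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Synthetic scores: 10,000 normal windows and 50 anomalous windows.
rng = np.random.default_rng(2)
normal = rng.normal(0.0, 1.0, size=10_000)
anomalous = rng.normal(5.0, 1.0, size=50)
scores = np.concatenate([normal, anomalous])
labels = np.concatenate([np.zeros(10_000, dtype=int), np.ones(50, dtype=int)])

thr = threshold_for_fpr(normal, max_false_positive_rate=0.001)
fpr = float(np.mean(normal > thr))
precision, recall = precision_recall(scores, labels, thr)
print(f"threshold {thr:.2f}, FPR {fpr:.4f}, precision {precision:.2f}, recall {recall:.2f}")
```

Expressing the budget per window also makes the operational cost visible: at 1 Hz scoring, even a 0.1% rate is several alarms per hour per vehicle, so a graded response (warn, derate, then shut down) is usually layered on top of the raw threshold.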

Battery Second Life

By convention, first automotive life ends at around 80% SoH, the EoL threshold described earlier. Cells that exit EV packs at 80–85% SoH still have substantial energy storage capacity suitable for:

  • Stationary energy storage: Grid backup at BSNL telecom towers, agricultural pump stations in UP/Bihar
  • Commercial EV fleet buffers: Charge buffer stations for Yulu/Bounce bike fleets in Bengaluru
ML's role in second life:

| Task | Approach |
| --- | --- |
| Cell sorting by SoH | RF or LSTM from discharge signatures, avoids full cycle test |
| Pack reassembly optimization | Constraint-satisfaction: match cells by internal resistance within ±5% |
| Remaining second-life prediction | Transfer learning from first-life degradation model |
| Grid storage dispatch optimization | Reinforcement learning on grid pricing + cell temperature |
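
The pack reassembly row can be illustrated with a greedy heuristic, which is a simpler stand-in for a full constraint-satisfaction solver. The sketch interprets "within ±5%" as all cells in a group fitting within ±5% of a common reference value. That interpretation, the function name, and the cell values are assumptions.

```python
def group_cells_by_resistance(resistance_mohm, group_size, tolerance=0.05):
    """Greedy grouping of cells with similar internal resistance.

    resistance_mohm: dict of cell id -> internal resistance in milliohms.
    A group is accepted when all members fit within +/- tolerance of a common
    reference value, i.e. max <= min * (1 + tolerance) / (1 - tolerance).
    Returns (groups, leftover_cell_ids).
    """
    ordered = sorted(resistance_mohm, key=resistance_mohm.get)
    ratio = (1.0 + tolerance) / (1.0 - tolerance)
    groups, leftover, i = [], [], 0
    while i < len(ordered):
        window = ordered[i:i + group_size]
        lo, hi = resistance_mohm[window[0]], resistance_mohm[window[-1]]
        if len(window) == group_size and hi <= lo * ratio:
            groups.append(window)
            i += group_size
        else:
            leftover.append(ordered[i])  # no valid group starts at this cell
            i += 1
    return groups, leftover


# Hypothetical measured resistances for seven retired cells.
cells = {"A": 20.0, "B": 20.5, "C": 21.0, "D": 25.0, "E": 25.5, "F": 26.0, "G": 40.0}
groups, leftover = group_cells_by_resistance(cells, group_size=3)
print("groups:", groups)
print("leftover:", leftover)
```

Sorting first keeps the heuristic simple, but it matches on resistance alone. A solver-based approach would be needed to balance capacity and SoH at the same time.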

FAME II policy is indirectly driving second-life markets: OEMs must report on battery disposal as part of subsidy compliance, creating structured end-of-life data that enables ML model training.

Key Takeaways

  • Neural SoC/SoH models outperform EKF on aged cells but require broad temperature coverage in training data. Indian OEMs operating in 0–48°C ambient ranges need models validated at both extremes.
  • Thermal runaway early warning is a product differentiator — 60+ seconds of warning time enables safe shutdown vs. uncontrolled failure. Build the anomaly detection stack with explicit false-positive rate management.
  • RUL prediction enables second-life economics — accurate degradation modelling is the prerequisite for sorting cells into second-life use cases, which changes the unit economics of the battery pack significantly.
  • AIS-038/048 compliance is the legal floor — ML algorithms are in addition to, not instead of, the mandatory hardware protection circuits. Document the ML system's role in the safety case per ISO 26262 ASIL requirements.
This is chapter 3 of AI for Automotive & EV.
