April 17, 2026 • AI & Machine Learning
Surrogate Model Data Preparation: 4 Failure Modes and How to Fix Each One
Most surrogate model projects fail at data prep. The 4 failure modes, what each one looks like, and how to fix it.
Most surrogate model projects fail silently.
The model trains. The validation accuracy looks reasonable. You deploy it. A few weeks later, the predictions start drifting. Engineers stop trusting it. Someone flags that the model "doesn’t work on new designs."
The standard response is to retrain with more data, tune the architecture, or swap the algorithm. None of those things are the problem.
The problem is the data preparation stage. NAFEMS research on reduced order modelling and surrogate deployment reaches the same conclusion consistently: modelling techniques are mature and well-understood. Data preparation is where production deployments stall.
This article covers the four failure modes most teams hit at the data preparation stage, what each looks like in practice, and how to fix each one.
The 4 failure modes
When surrogate models fail post-deployment, the failure mode is almost always one of four things. Each has a specific signature and a specific fix.
Failure mode 1 - Inconsistent inputs across runs
The same parameter is named differently in different simulations. Units do not match. Boundary conditions are recorded in free-text comments instead of structured metadata. The model trains on what it thinks are comparable runs, but they are not.
Signature: The model looks accurate on the training set but performance varies wildly between programs or between engineers on the same team.
Fix: Enforce a canonical schema on data ingest, not on model training. The schema has to be the source of truth that every solver output is normalised into, not an afterthought applied during feature engineering.
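A minimal sketch of what ingest-time normalisation can look like in Python. The parameter names, alias table, and unit conversions here are illustrative assumptions, not taken from any particular solver:

```python
# Hypothetical canonical schema: every solver output is normalised
# into this shape on ingest, before any model ever sees it.
CANONICAL_UNITS = {"inlet_velocity": "m/s", "inlet_pressure": "Pa"}

# Aliases seen across teams and solvers, mapped to canonical names.
NAME_ALIASES = {
    "v_in": "inlet_velocity",
    "Vin": "inlet_velocity",
    "p_inlet": "inlet_pressure",
}

# (source unit, target unit) -> multiplication factor.
UNIT_FACTORS = {
    ("mm/s", "m/s"): 1e-3,
    ("kPa", "Pa"): 1e3,
    ("m/s", "m/s"): 1.0,
    ("Pa", "Pa"): 1.0,
}

def ingest(raw: dict) -> dict:
    """Normalise one solver output into the canonical schema.

    `raw` maps a parameter name to a (value, unit) pair. Unknown
    names or units are rejected at write time, not discovered
    weeks later during feature engineering.
    """
    clean = {}
    for name, (value, unit) in raw.items():
        canonical = NAME_ALIASES.get(name, name)
        if canonical not in CANONICAL_UNITS:
            raise ValueError(f"unknown parameter: {name}")
        target = CANONICAL_UNITS[canonical]
        if (unit, target) not in UNIT_FACTORS:
            raise ValueError(f"cannot convert {unit} to {target}")
        clean[canonical] = value * UNIT_FACTORS[(unit, target)]
    return clean
```

Because the check runs on ingest, a run with an unrecognised parameter name or unit fails loudly at write time instead of silently contaminating the next training set.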
Failure mode 2 - Missing physical parameters
The model was trained on geometry and operating conditions, but the material properties, mesh density, or turbulence model settings, which materially affect the solver’s output, were not included in the training features. The model learned the average behaviour across those hidden variables.
Signature: The model works well when the hidden variables happen to match the training distribution, and fails catastrophically when they don’t. Often appears as "worked fine until we tried it on the new mesh."
Fix: Capture every variable that materially affects solver output as structured metadata on ingest: solver version, turbulence model, mesh characteristics, boundary conditions, material models. Make them queryable features, not comments.
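One way to make those variables first-class is a typed metadata record that is validated on ingest. This is a sketch with hypothetical field names, not a prescribed schema:

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class RunMetadata:
    """Every variable that materially affects solver output,
    captured as a queryable attribute rather than a comment."""
    solver: str            # e.g. "Fluent"
    solver_version: str    # e.g. "2024R1"
    turbulence_model: str  # e.g. "k-omega-SST"
    mesh_cell_count: int
    material_model: str

def validate(meta: RunMetadata) -> dict:
    """Reject runs with missing context instead of training on them."""
    record = asdict(meta)
    missing = [k for k, v in record.items() if v in ("", None)]
    if missing:
        raise ValueError(f"missing metadata fields: {missing}")
    return record
```

A run that arrives without, say, its turbulence model is rejected at ingest, so the training set can never contain a hidden variable the model cannot see.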
Failure mode 3 - Training-deployment distribution gap
The model was trained on data from one range of operating conditions and deployed on a new design that falls outside that range. The underlying physics changed, but the data structure did not capture enough context for the model to know it was extrapolating.
Signature: The model gives confident predictions that are physically wrong. No warnings, no uncertainty estimates, just silently bad output.
Fix: Track and store the operating envelope of the training data as metadata on the model itself. Build deployment-time checks that flag when inputs fall outside the training envelope before returning a prediction.
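The simplest version of that envelope check is a per-feature bounding box stored alongside the model. A sketch, with illustrative feature names; in practice a convex-hull or density-based envelope is tighter, but the bounding box already catches the worst extrapolation cases:

```python
def training_envelope(X: list) -> dict:
    """Per-feature (min, max) of the training inputs.
    Stored as metadata on the model at training time."""
    keys = X[0].keys()
    return {k: (min(r[k] for r in X), max(r[k] for r in X)) for k in keys}

def out_of_envelope(x: dict, envelope: dict) -> list:
    """Names of features that fall outside the training range.
    Called at deployment time, before returning a prediction."""
    return [k for k, (lo, hi) in envelope.items() if not lo <= x[k] <= hi]
```

A deployment wrapper can then refuse, or at least flag, any prediction where `out_of_envelope` returns a non-empty list, turning a silently wrong answer into an explicit extrapolation warning.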
Failure mode 4 - Silent schema drift
The data schema that produced the training set has changed since training. A new engineer introduced a slightly different variable name. The deployment pipeline is feeding the model data in a slightly different shape. The model makes predictions without warnings, but they are based on corrupted inputs.
Signature: Model performance degrades over weeks or months without any obvious trigger. Engineers blame "model drift" when the issue is actually upstream data drift.
Fix: Version the data schema itself, not just the model. Every model version is tied to a specific schema version. Schema changes trigger validation checks before new data can be routed to the model.
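A lightweight way to tie a model to a schema version is a deterministic fingerprint of the schema definition, checked before any data is routed to the model. A sketch under the assumption that the schema is representable as a name-to-unit mapping:

```python
import hashlib
import json

def schema_fingerprint(schema: dict) -> str:
    """Deterministic hash of a schema definition.
    Stored with each trained model version."""
    canonical = json.dumps(schema, sort_keys=True)  # key order must not matter
    return hashlib.sha256(canonical.encode()).hexdigest()[:12]

def check_schema(model_schema_fp: str, live_schema: dict) -> None:
    """Refuse to route data to a model trained under a different schema."""
    fp = schema_fingerprint(live_schema)
    if fp != model_schema_fp:
        raise RuntimeError(
            f"schema drift: model expects {model_schema_fp}, got {fp}"
        )
```

With this in place, a renamed variable or changed unit fails the pipeline immediately and visibly, instead of degrading predictions over weeks.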
In every one of these cases, the model is working exactly as designed. The data it was given is the problem.
What the fix actually looks like
The four failure modes all have the same root cause: data preparation is treated as an ML engineering problem when it is actually an infrastructure problem. Here is what the fix looks like when you treat it correctly.
Schema consistency enforced on ingest
Every simulation run writes into the same structured schema, regardless of which solver produced it. Velocity fields always stored under the same variable name. Pressure fields always stored under the same variable name. Mesh metadata always stored in the same location. A CFD engineer using Fluent and another using Star-CCM+ both write into the same schema without knowing the other exists.
Metadata captured automatically
Boundary conditions, material models, mesh settings, solver version, simulation run date, engineer owner: all captured as structured metadata on the dataset, queryable and filterable. Not embedded in file names. Not in a notes column.
Queryable training set assembly
An ML engineer can pull "every run with design variant X, operating condition Y, solver version Z or later" and get a clean, consistent training set without a preparation phase. Weeks of data prep become a single query.
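In-memory, that query reduces to a filter over metadata records. A sketch with hypothetical field names; in a real deployment this would be a database or catalogue query, but the logic is the same:

```python
def assemble_training_set(runs: list, *, design_variant: str,
                          operating_condition: str,
                          min_solver_version: str) -> list:
    """One query replaces a manual data preparation phase:
    'every run with variant X, condition Y, solver version Z or later'."""
    def version_tuple(v: str) -> tuple:
        # "2024.1" -> (2024, 1), so versions compare numerically
        return tuple(int(p) for p in v.split("."))

    return [
        r for r in runs
        if r["design_variant"] == design_variant
        and r["operating_condition"] == operating_condition
        and version_tuple(r["solver_version"]) >= version_tuple(min_solver_version)
    ]
```

The query only works because ingest already enforced the schema: every record is guaranteed to carry the same fields, in the same units, under the same names.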
Workflow as a reusable artefact
The preparation logic, feature engineering, and validation checks are stored as reusable workflows. The next training run on the next program does not start from scratch. It starts from the workflow that trained the last model. This is the same pattern we describe in our CAE data management post.
The Tier 1 e-motor result
A Tier 1 automotive supplier working on e-motor development wanted to accelerate their RFQ process. Historically, producing an RFQ required running multiple CFD simulations across design variants, a process that took 1.5 months per RFQ.
The bottleneck was not the simulations. It was assembling the training data for a surrogate model that could predict the simulation outcomes without running the full solver each time. All four failure modes were present simultaneously.
Using Key Ward, the team structured their historical CFD data into a consistent schema: variable names aligned, units made canonical, and metadata captured on every boundary condition and material model. They trained a surrogate model on that structured dataset. The model predicts performance across the design space for new RFQ requests in minutes.
The result: RFQ turnaround from 1.5 months to days, with a 90% reduction in simulation compute cost because the solver is now only used for final validation, not for exploration. The surrogate model itself was not novel. The infrastructure that made it trainable and deployable was.
Why this matters beyond one pilot
Surrogate modelling is the entry point. Once the data preparation is solved, every downstream AI capability becomes viable: anomaly detection, design space exploration, predictive analytics, ROMs. McKinsey’s 2025 State of AI shows this divide in the numbers: the 5.5% of organisations seeing real EBIT impact from AI have almost universally built the data foundation first. The 95% without that foundation cannot move past their first model.
That is why we build Key Ward the way we do. Not as a surrogate modelling tool. As the engineering data infrastructure that makes surrogate modelling, and every other AI use case, tractable.

Stop blaming the model when the data layer is the problem
Two ways to move forward: pick what fits your stage.



