Can You Teach a Neural Network to Stay Consistent? DARSM and the Distribution-Shift Trap in Data-Driven RANS

A review of Dehtyriov, MacArt & Sirignano (arXiv:2605.26358, May 2026) — and why the most interesting thing about it isn’t the neural network.

There’s a failure mode in data-driven turbulence modelling that almost nobody puts in their headline figure.

You take a neural network. You train it on beautiful, high-fidelity DNS data to predict the Reynolds stresses from the local mean-flow gradients. On the held-out DNS fields, it’s gorgeous — anisotropy tensors that line up, error bars that make your supervisor smile. Then you plug it into an actual RANS solver, press run, and the thing either diverges in twenty iterations or quietly converges to something physically wrong.

I’ve watched versions of this happen. It’s the kind of result that doesn’t make it into a talk, because the story it tells is uncomfortable: the model was never the problem. The deployment was.

A new paper out of Notre Dame and Oxford — Daniel Dehtyriov, Jonathan MacArt, and Justin Sirignano, posted to arXiv on 25 May — names this failure mode precisely and then does something about it. The model they propose, DARSM (Deep Algebraic Reynolds Stress Model), is interesting. But what I find genuinely worth writing about is the diagnosis underneath it, because it reframes what we should even be asking of a data-driven closure.

Let me unpack the whole thing, and then I’ll tell you where I think the real ceiling is.

The closure problem, in one breath

If you’ve read anything else on this blog you can skip this paragraph, but for completeness: the RANS equations buy you something like ten orders of magnitude in cost over resolving every eddy directly. The price is a single unclosed term — the Reynolds stress tensor — that encodes the effect of all the turbulent fluctuations you averaged away. You don’t get the equations to close themselves. You have to model that tensor, and a century of turbulence modelling is, at heart, a century of arguments about how.
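For readers who want the definition in concrete form, here is a minimal sketch of how the Reynolds stress tensor R_ij = <u'_i u'_j> and its normalised anisotropy b_ij = R_ij/(2k) − δ_ij/3 are computed from velocity samples at a single point. It uses plain NumPy with tiny synthetic sample sets standing in for DNS data; the function name `reynolds_stress_and_anisotropy` is illustrative, not taken from any particular code.

```python
import numpy as np

def reynolds_stress_and_anisotropy(samples):
    """samples: (N, 3) array of velocity samples (u, v, w) at one point."""
    samples = np.asarray(samples, dtype=float)
    fluct = samples - samples.mean(axis=0)        # Reynolds decomposition: u' = u - <u>
    R = fluct.T @ fluct / samples.shape[0]        # R_ij = <u'_i u'_j>
    k = 0.5 * np.trace(R)                         # turbulent kinetic energy
    b = R / (2.0 * k) - np.eye(3) / 3.0           # normalised anisotropy (traceless)
    return R, k, b

# Two synthetic limits: equal energy in all three components, and a single component.
iso = [[1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, -1, 0], [0, 0, 1], [0, 0, -1]]
one = [[1, 0, 0], [-1, 0, 0]]
for name, data in [("isotropic", iso), ("one-component", one)]:
    _, k, b = reynolds_stress_and_anisotropy(data)
    print(name, "k =", k, "diag(b) =", np.round(np.diag(b), 3))
```

Isotropic turbulence gives b = 0, and the one-component limit gives b_11 = 2/3; anisotropy-invariant maps are built from the invariants of this same tensor.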

The default answer, the Boussinesq hypothesis, says the anisotropic part of the Reynolds stress is proportional to the mean strain rate through a scalar eddy viscosity. It’s cheap, it’s robust, and it’s wrong in any flow with meaningful anisotropy. (Regular readers know this is the hill I do my own research on.) So the obvious modern temptation is: throw the Boussinesq assumption out, and let a neural network learn the stress–strain relationship directly from data.
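To make the "wrong in any flow with meaningful anisotropy" point concrete: under the Boussinesq hypothesis R_ij = (2/3)kδ_ij − 2ν_t S_ij, so for incompressible flow the anisotropy is simply b_ij = −(ν_t/k)S_ij. The sketch below evaluates that for simple shear; the values of k and ν_t are arbitrary placeholders.

```python
import numpy as np

def boussinesq_anisotropy(grad_u, k, nu_t):
    """Linear eddy-viscosity anisotropy b_ij = -(nu_t / k) S_ij (incompressible flow).

    grad_u[i, j] = dU_i/dx_j; k and nu_t are the local TKE and eddy viscosity.
    """
    S = 0.5 * (grad_u + grad_u.T)   # mean strain-rate tensor
    return -(nu_t / k) * S

# Simple shear dU/dy = 1 with placeholder k and nu_t.
grad_u = np.zeros((3, 3))
grad_u[0, 1] = 1.0
b = boussinesq_anisotropy(grad_u, k=1.0, nu_t=0.1)
print(np.round(b, 3))
```

Every diagonal entry comes out zero: in shear, the model predicts equal normal stresses in all three directions, whereas in real wall-bounded shear flows the streamwise fluctuations clearly dominate. The same defect is why linear eddy-viscosity models cannot produce the secondary flow in a square duct, which is driven by differences between the cross-stream normal stresses.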

The temptation is right. The naive execution is where it goes wrong.

Two ways to fail, and why both are real

The paper frames the field as caught between two failure modes, and I think the framing is exactly correct.

Failure mode one: bypass the equations. You train a network to map flow features straight to Reynolds stresses, with no physical scaffolding. These models are flexible, but they need enormous amounts of high-fidelity data to generalise, and high-fidelity data is the one thing we never have enough of. Ask them to extrapolate to a Reynolds number or a geometry they didn’t see, and they fold.

Failure mode two: keep the equations, but train the closure offline. This is the more popular and more insidious one. You retain the RANS machinery, but you train the network on inputs sampled from DNS fields. The trouble is that at deployment, the network never sees DNS inputs — it sees the solver’s own, still-converging, slightly-wrong RANS solution. The input distribution it’s asked to predict on is not the distribution it trained on. In machine-learning terms, that’s distribution shift, and the consequence is either a destabilised solver or quietly degraded accuracy.

This second point is the part I want to sit on, because it’s so easy to miss. The network isn’t broken. The training pipeline is internally inconsistent. You optimised for a world (clean DNS inputs) that doesn’t exist at inference time (messy RANS inputs). It’s the data-driven equivalent of validating your solver on the exact solution and then being surprised it struggles on a real mesh.
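The inconsistency can be reproduced in a toy problem small enough to read in one sitting. Everything below is hypothetical and is not DARSM’s training procedure: a scalar "flow" u = s + g(u), an assumed true closure g(u) = 0.5 sin u, and a deliberately too-simple linear model closure m(u) = θu. The sketch fits θ two ways: offline, by regressing the closure on "DNS" states, and through the solver, by matching the solution the model actually produces.

```python
import numpy as np

def true_closure(u):
    # Hypothetical "true" closure term (a stand-in for the Reynolds-stress effect).
    return 0.5 * np.sin(u)

def solve_truth(s, iters=200):
    # "DNS": solve u = s + g(u) by fixed-point iteration; it contracts because |g'(u)| <= 0.5.
    u = np.zeros_like(s)
    for _ in range(iters):
        u = s + true_closure(u)
    return u

def solve_model(s, theta):
    # "RANS" with the linear model closure m(u) = theta * u, i.e. u = s + theta * u.
    return s / (1.0 - theta)

def rms(x):
    return np.sqrt(np.mean(x ** 2))

s = np.linspace(0.5, 3.0, 20)            # forcing for 20 hypothetical training cases
u_dns = solve_truth(s)
tau_dns = true_closure(u_dns)

# Offline (a priori): least-squares fit of the closure on DNS inputs and outputs.
theta_off = np.sum(u_dns * tau_dns) / np.sum(u_dns ** 2)

# Through the solver (a posteriori): fit theta so the model *solution* matches DNS.
# With w = 1 / (1 - theta) the model solution is w * s, so least squares in w is closed-form.
w_on = np.sum(s * u_dns) / np.sum(s ** 2)
theta_on = 1.0 - 1.0 / w_on

u_deployed = solve_model(s, theta_off)   # states the offline closure actually sees in use
err_off = rms(u_deployed - u_dns)
err_on = rms(solve_model(s, theta_on) - u_dns)
print(f"offline: theta = {theta_off:.3f}, solution RMS error = {err_off:.3f}")
print(f"in-loop: theta = {theta_on:.3f}, solution RMS error = {err_on:.3f}")
print(f"max shift between DNS and deployed states: {np.max(np.abs(u_deployed - u_dns)):.3f}")
```

Because the in-loop fit minimises the solution error directly over every admissible θ, it cannot do worse than the offline fit on these training cases, while the offline fit was tuned on states the deployed solver never visits. In a real RANS code the through-the-solver step typically needs gradients of the converged solution with respect to the closure parameters (for example from an adjoint), which is far harder than this closed-form case.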

What DARSM actually does

Here’s the move. Instead of asking a neural network to predict the Reynolds stresses — or even the anisotropy tensor — directly, DARSM asks it to predict the parameters of an implicit algebraic equation for the anisotropy, an equation derived from the Reynolds-stress transport equations under the weak-equilibrium assumption.

If “weak-equilibrium” rings a bell, it should: it’s Rodi’s 1976 assumption, the foundation of every algebraic Reynolds stress model (ARSM) we’ve used for fifty years. The idea is that the anisotropy of the Reynolds stresses doesn’t change much following the flow. Formally, Rodi takes the net transport (advection minus diffusion) of each Reynolds stress to be proportional to that of the turbulent kinetic energy, which amounts to neglecting the transport of the anisotropy even while the stresses themselves are still advected and diffused through k. That single assumption collapses the six coupled, stiff transport equations for the Reynolds stresses into an algebraic relation you can actually solve.
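Applying Rodi’s assumption to the Reynolds-stress transport equations leaves, for a_ij = R_ij/k − (2/3)δ_ij, the algebraic balance a_ij(P − ε) = P_ij − (2/3)Pδ_ij + Π_ij, where P_ij is the stress production, P = P_kk/2 and Π_ij is the pressure-strain term. Which pressure-strain model and which free parameters DARSM actually uses is specified in the paper; purely for illustration, the sketch below assumes the classic LRR-IP form Π_ij = −C1 ε a_ij − C2 (P_ij − (2/3)Pδ_ij), which turns the balance into a_ij = (1 − C2)(P_ij − (2/3)Pδ_ij) / [ε(C1 − 1 + P/ε)]. Because P_ij and P themselves depend on a, the relation is implicit, so it is solved here by damped fixed-point iteration.

```python
import numpy as np

def solve_arsm(grad_u_hat, c1=1.8, c2=0.6, relax=0.5, tol=1e-12, max_iter=500):
    """Solve the implicit weak-equilibrium relation for a_ij = R_ij/k - (2/3) delta_ij.

    grad_u_hat[i, j] = (k/eps) dU_i/dx_j (velocity gradient scaled by k/eps).
    Assumes an LRR-IP pressure-strain model with constants c1, c2. Returns (a, P/eps).
    """
    I = np.eye(3)
    a = np.zeros((3, 3))
    for _ in range(max_iter):
        r = a + (2.0 / 3.0) * I                                # R_ij / k
        prod = -(r @ grad_u_hat.T + grad_u_hat @ r)            # P_ij / eps
        p = 0.5 * np.trace(prod)                               # P / eps
        denom = c1 - 1.0 + p
        if denom <= 0.0:
            raise ValueError("C1 - 1 + P/eps must stay positive")
        a_new = (1.0 - c2) * (prod - (2.0 / 3.0) * p * I) / denom
        a_next = (1.0 - relax) * a + relax * a_new             # damped update
        if np.max(np.abs(a_next - a)) < tol:
            return a_next, p
        a = a_next
    raise RuntimeError("fixed-point iteration did not converge")

# Simple shear dU/dy = S, with S k / eps chosen so that P/eps = 1 at the solution.
s_hat = np.sqrt(243.0 / 28.0)
grad_u_hat = np.zeros((3, 3))
grad_u_hat[0, 1] = s_hat
a, p = solve_arsm(grad_u_hat)
print("P/eps =", round(p, 6))
print(np.round(a, 4))
```

With the textbook LRR-IP constants C1 = 1.8, C2 = 0.6 and Sk/ε = √(243/28) ≈ 2.95, the iteration lands on P/ε = 1 with a_11 = 8/27 and a_22 = a_33 = −4/27, which can be checked by hand. Note that this particular pressure-strain model cannot distinguish the two cross-stream components in simple shear.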

So DARSM is a hybrid with a specific division of labour:

The physics provides the structure — the algebraic closure equation, with its built-in tensor representation and invariance properties. The network can’t violate it.
The network provides the coefficients — it maps the local flow invariants to the empirical parameters that classical ARSMs traditionally fixed by hand or fit to a handful of canonical flows.

This is a meaningfully different philosophy from “let the network learn the stress.” The network isn’t being trusted with the physics. It’s being trusted with the calibration the physics always left open. That’s a much smaller, much better-posed job — and crucially, it’s why the model can be trained on small datasets. You’re not asking it to discover turbulence from scratch; you’re asking it to fill in the constants the theory never pinned down.
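Here is a sketch of the other half of that division of labour, under loud assumptions. Suppose the open parameters are a return-to-isotropy constant c1 and an isotropisation-of-production constant c2, and that the network sees the two simplest invariants of the normalised velocity gradient, tr(Ŝ²) and tr(Ω̂²). The tiny multilayer perceptron below, its random weights, and the output bounds (c1 > 1, 0 < c2 < 1) are all hypothetical choices, not DARSM’s architecture. The point is only that the network emits a few bounded scalars and the algebraic equation does all of the tensor work.

```python
import numpy as np

def invariants(grad_u_hat):
    """Two scalar invariants of the normalised velocity gradient."""
    S = 0.5 * (grad_u_hat + grad_u_hat.T)   # strain rate
    W = 0.5 * (grad_u_hat - grad_u_hat.T)   # rotation rate
    return np.array([np.trace(S @ S), np.trace(W @ W)])

def coefficient_map(lam, weights):
    """Hypothetical one-hidden-layer MLP: invariants -> (c1, c2)."""
    W1, b1, W2, b2 = weights
    h = np.tanh(W1 @ lam + b1)
    z = W2 @ h + b2
    c1 = 1.0 + np.logaddexp(0.0, z[0])      # softplus keeps c1 > 1
    c2 = 1.0 / (1.0 + np.exp(-z[1]))        # sigmoid keeps 0 < c2 < 1
    return c1, c2

rng = np.random.default_rng(0)
weights = (rng.normal(size=(8, 2)), np.zeros(8), rng.normal(size=(2, 8)), np.zeros(2))

grad_u_hat = np.zeros((3, 3))
grad_u_hat[0, 1] = 3.0                      # simple shear with S k / eps = 3
lam = invariants(grad_u_hat)
c1, c2 = coefficient_map(lam, weights)
print("invariants:", lam, " c1 =", round(c1, 3), " c2 =", round(c2, 3))
```

Training would adjust `weights` so that the RANS solution computed with these coefficients matches reference data; that step is where the consistency argument lives, and it is not shown here.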

And because the closure stays embedded in the governing equations during training, the distribution-shift problem largely dissolves. The network learns on the inputs it will actually face.

The results, and the one that matters

The headline numbers are strong. On the canonical square-duct and periodic-hill benchmarks, DARSM reduces the average test velocity error relative to baseline RANS by a factor of two to four across a range of Reynolds numbers, geometries, and flow regimes, with case-level reductions reported as high as a factor of twelve. It also beats five established data-driven approaches the authors line up against it: offline-trained networks, tensor-basis neural networks, field-inversion machine learning, DeepONets, and physics-informed neural networks.

Beating the offline-trained baseline is the whole point: that’s the distribution-shift argument made empirical. Beating tensor-basis neural networks is the more telling comparison, because TBNNs are also a physics-structured approach (they expand the anisotropy in Pope’s tensor basis, the same tensor representation that Galilean invariance demands). DARSM coming out ahead suggests the gain isn’t only from respecting invariance; it’s from keeping the closure consistent with the solver throughout.
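For comparison, a TBNN writes the anisotropy as b = Σ g_n T^(n), where the T^(n) are Pope’s basis tensors built from the normalised strain and rotation rates, and the scalar coefficients g_n come from a network fed with invariants. A minimal sketch of the first four of the ten basis tensors, with the g_n simply passed in as placeholder numbers rather than predicted:

```python
import numpy as np

def pope_basis_first_four(grad_u_hat):
    """First four of Pope's ten basis tensors from the normalised velocity gradient."""
    S = 0.5 * (grad_u_hat + grad_u_hat.T)    # normalised strain rate
    W = 0.5 * (grad_u_hat - grad_u_hat.T)    # normalised rotation rate
    I = np.eye(3)
    T1 = S
    T2 = S @ W - W @ S
    T3 = S @ S - np.trace(S @ S) / 3.0 * I
    T4 = W @ W - np.trace(W @ W) / 3.0 * I
    return [T1, T2, T3, T4]

def tensor_basis_anisotropy(grad_u_hat, g):
    """b = sum_n g_n T^(n); in a TBNN the g_n would be network outputs."""
    return sum(gn * Tn for gn, Tn in zip(g, pope_basis_first_four(grad_u_hat)))

grad_u_hat = np.zeros((3, 3))
grad_u_hat[0, 1] = 2.0                       # simple shear
basis = pope_basis_first_four(grad_u_hat)
print(np.diag(basis[1]))                     # T2 carries normal-stress anisotropy
b = tensor_basis_anisotropy(grad_u_hat, g=[-0.1, 0.05, 0.0, 0.0])   # placeholder g_n
```

Unlike the linear T1 term on its own, T2 to T4 put non-zero entries on the diagonal in simple shear, which is exactly the normal-stress anisotropy a linear eddy viscosity misses. Every basis tensor is symmetric and traceless, so any choice of g_n yields a symmetric, traceless anisotropy (realisability is a separate constraint).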

But the result I keep coming back to is the generalisation one. They train DARSM on the square duct — an attached, anisotropy-driven secondary-flow case — and then, without retraining, apply it to the periodic hills, where the flow separates. That’s not an interpolation. Separation is a genuine change in the governing physics: adverse pressure gradients, recirculation, a shear layer that doesn’t exist in the duct. A model that transfers across that boundary is doing something more than curve-fitting. It’s the closest thing to evidence that the physical structure is carrying the model into regimes the data never showed it.

So here’s my honest take

I want to be clear about how much I like this, because the critique that follows is not a dismissal.

The diagnosis is correct, and it’s the part the field most needed to hear out loud. Most of the “ML for RANS” literature has been an arms race on architecture — bigger networks, fancier embeddings — when a large share of the real-world failures were never about expressiveness. They were about consistency: training the closure in a setting that doesn’t match deployment. DARSM treats that as the primary problem rather than a footnote, and the offline-baseline comparison turns that argument into a measurement. That’s good science.

I also think the philosophy — let physics own the structure, let the network own the calibration — is the right long-term bet. It’s the same instinct behind the symbolic-regression closures I wrote about last time: the most durable data-driven models will be the ones that constrain the network’s job down to something the network is actually good at, and hand the rest to theory. DARSM is a clean instance of that principle.

But.

The ceiling on this model is not set by the neural network. It’s set by the weak-equilibrium assumption the whole algebraic framework rests on.

Weak-equilibrium says the anisotropy is in quasi-equilibrium — that you can neglect the material transport of the anisotropy tensor. That’s an excellent approximation in attached, slowly-developing shear flows. It is a much worse approximation exactly where RANS already hurts most: strong non-equilibrium, rapid distortion, streamline curvature, the recovery region downstream of separation, the near field of a developing jet — anywhere the turbulence has memory and hasn’t caught up to the local mean strain. A perfectly trained network sitting on top of an algebraic closure cannot represent physics that the algebraic closure assumed away. No amount of data fixes a structural assumption.
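A toy model makes the memory point quantitative. Suppose, purely as an illustrative assumption, that the anisotropy relaxes toward its local-equilibrium value b_eq over a turbulence time scale τ, db/dt = (b_eq − b)/τ, while the mean strain oscillates so that b_eq = sin(ωt). An algebraic closure sets b = b_eq instantly; the sketch below (forward Euler, hypothetical function name) measures how far that is from the relaxing b once transients have died out.

```python
import numpy as np

def algebraic_lag_error(omega_tau, n_periods=20, steps_per_period=2000):
    """Max |b_eq - b| over the final period for db/dt = (b_eq - b)/tau, with tau = 1."""
    tau = 1.0
    omega = omega_tau / tau
    dt = (2.0 * np.pi / omega) / steps_per_period
    n = n_periods * steps_per_period
    t = np.arange(n + 1) * dt
    b_eq = np.sin(omega * t)        # what an algebraic (weak-equilibrium) closure predicts
    b = np.zeros(n + 1)             # anisotropy with memory: relaxes toward b_eq
    for k in range(n):
        b[k + 1] = b[k] + dt * (b_eq[k] - b[k]) / tau
    last = slice(n - steps_per_period, n + 1)
    return np.max(np.abs(b_eq[last] - b[last]))

for omega_tau in (0.1, 1.0, 10.0):
    print(f"omega*tau = {omega_tau:5.1f}: algebraic-model error = {algebraic_lag_error(omega_tau):.3f}")
```

For steady oscillation the answer is ωτ/√(1 + (ωτ)²): about 0.1 when the strain varies slowly relative to the turbulence (ωτ = 0.1), and close to the full amplitude when it varies fast (ωτ = 10). Weak equilibrium is the ωτ → 0 limit, and no single-valued function of the instantaneous strain can reproduce the lag, because the same b_eq maps to different b on the rising and falling halves of the cycle.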

Which makes the periodic-hill generalisation both the most impressive result and the most interesting one to interrogate. Separation is precisely a non-equilibrium event. So the question I’d want answered isn’t “does the average error go down?” (it clearly does). It’s this: where does the residual error live? My strong prior, from staring at anisotropy-invariant maps in free-shear flows for the past year, is that whatever error remains will concentrate in the regions where the weak-equilibrium assumption is weakest: the separation point, the early shear layer, the spots where the anisotropy is being actively reshaped rather than locally equilibrated. If the error is uniform, that would genuinely surprise me and would say something deep about how forgiving the algebraic form is. If it clusters in the non-equilibrium regions, then DARSM has found the best possible calibration of a structurally limited model, which is a real and useful thing, but a different claim from “we solved data-driven closure.”
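The test proposed here is cheap once the fields exist. A minimal sketch, assuming you have a pointwise velocity-error field and some scalar non-equilibrium indicator on the same cells (|P/ε − 1| is one plausible choice; it is an assumption here, not something the paper reports): bin the error by the indicator and compare the bin means. The numbers in the example are made up purely to show the mechanics.

```python
import numpy as np

def conditional_mean_error(error, indicator, edges):
    """Mean error within each indicator bin [edges[i], edges[i+1]); NaN if a bin is empty.

    Points with indicator outside [edges[0], edges[-1]) are ignored.
    """
    error = np.asarray(error, dtype=float)
    idx = np.digitize(indicator, edges) - 1
    means = np.full(len(edges) - 1, np.nan)
    for i in range(len(edges) - 1):
        mask = idx == i
        if np.any(mask):
            means[i] = error[mask].mean()
    return means

# Tiny synthetic example (made-up values, only to show how the binning works).
indicator = np.array([0.05, 0.1, 0.6, 0.7, 1.5])
error = np.array([1.0, 3.0, 5.0, 7.0, 10.0])
print(conditional_mean_error(error, indicator, edges=[0.0, 0.5, 1.0, 2.0]))
```

A flat profile across bins would be the surprising outcome; bin means that rise with the indicator would point at the weak-equilibrium structure rather than the calibration.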

That’s not a knock on the paper. It’s the question the paper sets up, and I suspect the authors know it. The honest framing is that DARSM extracts close to the maximum performance available within the weak-equilibrium family. The next frontier isn’t a better network on the same algebra. It’s whether we can let the data-driven part reach into the transport of the anisotropy itself — the term the algebra throws away — without losing the consistency and the small-data trainability that make DARSM work in the first place. That’s a much harder problem, and it’s where I think the genuinely new physics is hiding.

The thing worth keeping

Strip away the architecture and DARSM leaves behind a principle I think will outlast it: train the closure in the environment it will be deployed in, and let physics carry the structure the data can’t. That’s a correction to years of offline-trained, distribution-shifted models that looked wonderful on DNS and fell apart in the solver.

The eddy-viscosity assumption took the field a remarkably long way on a hypothesis we knew was wrong. Weak-equilibrium has done the same. DARSM is, in a sense, the most sophisticated thing you can build on top of weak-equilibrium — a learned, consistent, generalising calibration of a fifty-year-old approximation. That’s an achievement. It also tells you exactly where to dig next.

When I meet God, I’m still going to ask about turbulence. But I’d settle, for now, for knowing where the anisotropy stops equilibrating.

Read the paper: Dehtyriov, MacArt & Sirignano, “Deep Learning-based Algebraic Reynolds Stress Closures for RANS Simulations of Turbulent Flows,” arXiv:2605.26358 (2026).
