Stop Trusting the Model to Remember. Gate the Substrate Instead

The bug is boring. A user reads another tenant's data. Nobody defends it in a design review, and yet broken access control sits at number one on the OWASP Top 10. It ships because the rule lives in the wrong place: a prompt, a checklist, a shared assumption that every future engineer, and now every model invocation, will remember and reapply it correctly.

With AI generating most of the code, that assumption fails outright.

Reuben Brooks calls this the behavioral gate problem. You put "authorization IS VERY IMPORTANT" in your agent instructions. You populate CLAUDE.md. The model follows the rule often enough to feel safe and breaks it often enough to be dangerous. The model has to remember the rule, recognize where it applies, and resist the pull of local context. Then a human reviewer has to maintain that invariant across sixteen thousand generated lines. That is too many places for the constraint to live.

His answer is structural gates. Compilers, type checkers, linters, proof checkers. Each one produces a concrete answer about the artifact in front of it. When the code is wrong, it refuses. That refusal is the point. It moves the enforcement out of the model's instruction space and into the substrate the model is building against.

The core bet, stated plainly in his post, is this: for a wide class of production software, structural backpressure beats incremental improvements in agent intelligence. Existing models can already write almost all of your code. The limiting factor is whether you can know they did what you wanted. That knowledge comes from the substrate, not from waiting for a smarter model.

Tests are not enough on their own. Tests are empirical. They check the cases you and the model remembered to write. They say nothing about the handler someone adds next week.

Brooks built a tool and methodology called Shen-Backpressure to explore this in practice. The framing is a loop: the agent writes code, formal verification gates check the artifact, and failure feeds back as a signal the agent must resolve before continuing. The gates live in the language you are already shipping, not in a separate annotation system you have to maintain.

The practical implication is direct. Before your next AI coding sprint, ask where your invariants actually live. If the answer is "in the system prompt" or "in code review," they are behavioral gates. They will drift. Pick one critical invariant, broken access control is a good candidate, and move it into something that compiles, type-checks, or formally verifies. Make the agent unable to ship code that violates it, not just unlikely to. That is the shift. Smarter models are coming, but the substrate you give them today is the thing you control.