Part 4 of 10 · February 6, 2026 · 15 min read

The Verification Mindset: Why "Trust But Verify" Is Wrong

Never trust, always verify with proof

The incidents that proved "trust but verify" is too lenient for AI code. How a zero-trust verification system with mandatory proof replaced hope-based development.

It was a Tuesday afternoon and I was feeling good. We'd just finished a big implementation push: five pages needed a new component added, the AI said it was done, and I was ready to move on to the next feature.

On a whim, I decided to spot-check. I opened one of the five pages. The component was there. I opened another. Also there. Great. Then I opened the third, and it wasn't. I checked the fourth --- also missing. The fifth? Missing too.

The AI had told me, with complete confidence, that it had added the component to all five pages. It had added it to two.

This wasn't a one-time thing. Over the following weeks, I started cataloging every instance where the AI claimed something was done and it wasn't. Backend features built but never wired to the frontend. Database columns referenced in code that didn't actually exist in the schema. Components created in their own files but never rendered on any page. A build that "passes," except the AI never actually ran the build.

That's when I realized something that fundamentally changed how I work: "trust but verify" is the wrong mental model for AI development. It implies a default state of trust, with occasional checking. The correct model is the opposite. The default state is zero trust. Every claim requires proof. No exceptions.


The Confidence Trap

Here's the thing about working with AI that nobody warns you about: it's confident about everything. Right or wrong, the tone is identical. There's no hedging, no uncertainty, no "I think I did this but let me double-check." It's always "I've completed the implementation" or "The build passes" or "I've verified the database schema."

This confidence is seductive. After a few hours of productive work where the AI has been getting things right, you start relaxing. You stop checking. You start trusting. And that's exactly when things go wrong.

I call this the confidence trap, and it works like this: the AI builds credibility through a series of correct actions. Your vigilance decreases proportionally. Then the AI makes a mistake with exactly the same confidence it used for everything it got right, and you don't catch it because you've been lulled into trust.

The dangerous part isn't that the AI makes mistakes. Everyone makes mistakes. The dangerous part is that there's absolutely no signal that distinguishes a correct claim from an incorrect one. A human developer who's unsure will say "I think I got all of them, let me double-check." An AI that missed three out of five will say "I've added the component to all five pages" with the same certainty it uses to tell you what two plus two equals.

Once I understood this, I stopped trying to figure out when to trust and when to verify. The answer is: never trust, always verify. Not because the AI is bad. Because the absence of uncertainty signals means you can't calibrate trust even if you wanted to.


Building a Verification Taxonomy

Okay, so you need to verify everything. But what does "verify" actually mean in practice?

This is where most people stop. They know they should check things, but "checking" is vague. So they eyeball a few things, scroll through some code, and call it good. That's not verification. That's a vibes check.

Real verification requires specific proof for specific claims. And different claims require different kinds of proof. Over months of development, I've built up a taxonomy of twenty-one verification types. Each one maps a specific kind of claim to a specific kind of evidence.

Here are some examples to give you the flavor:

Existence claims: "I created the file." Proof: show me the directory listing. Not "I created it"; show me the output of a command that lists the file with its path, size, and timestamp.

Addition claims: "I added the function to the module." Proof: show me a search result that finds that function name in that file. Not a screenshot of an editor, but a reproducible command that returns the match.

Schema claims: "The database table has the right columns." Proof: run a query against the information schema and show me the actual column names and types. Not from memory, not from a migration file that might not have been applied. From the live database.

Build claims: "The build passes." Proof: run the build command and show me exit code zero. The actual output. The actual exit code.

Type claims: "There are no type errors." Proof: run the type checker and show me zero errors. Again, actual output, not a claim about output.

Removal claims: This one is special, and I'll come back to it.

The point is that every category of claim has a matching category of evidence, and the evidence is always a concrete, reproducible command output --- never a statement, never a claim about having checked, never "I already verified that."
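To make the claim-to-proof mapping concrete, here's a minimal shell sketch of three of the proof commands. Every name in it (the `src/utils/format.ts` path, the `formatPrice` function) is invented for illustration, and `true` stands in for whatever your real build command is:

```shell
# Sketch of proof commands for three claim types. All names here
# (src/utils/format.ts, formatPrice) are hypothetical examples.
WORKDIR="$(mktemp -d)"
mkdir -p "$WORKDIR/src/utils"
printf 'export function formatPrice(n) { return n; }\n' \
  > "$WORKDIR/src/utils/format.ts"

# Existence claim -> list the file with path, size, and timestamp
ls -l "$WORKDIR/src/utils/format.ts"

# Addition claim -> a reproducible search that returns the match
grep -n 'formatPrice' "$WORKDIR/src/utils/format.ts"

# Build claim -> run the real command and show the actual exit code
true   # stand-in for e.g. `npm run build`
BUILD_EXIT=$?
echo "build exit code: $BUILD_EXIT"
```

The point isn't these particular commands; it's that each one produces output anyone can re-run and get the same answer from.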

In Massu, these categories are formalized as the VR-* verification requirements --- a taxonomy of twenty-one types. VR-FILE for existence claims. VR-GREP for addition claims. VR-BUILD for build claims. VR-TYPE for type safety. VR-NEGATIVE for removal claims. VR-COUNT for quantity claims. Each one maps a specific kind of claim to a specific proof command, removing any ambiguity about what "verified" means.


When I first started building this taxonomy, I felt like I was being paranoid. Now I realize I was being rational. The cost of running a verification command is seconds. The cost of discovering an unverified claim was wrong is hours, sometimes days.


Negative Verification: Proving Absence

Of all the verification types I've developed, negative verification is the one that took me longest to appreciate. It's also the one that catches the most bugs.

Here's the scenario: you're refactoring, and you need to remove an old component. The AI says "I've removed all references to the old component." How do you verify that?

Most people would look at the files where they know the component was used and confirm it's gone. But that only proves it's gone from the places you already knew about. What about the places you didn't know about? The import in that utility file three directories deep? The reference in a test file? The mention in a configuration object?

Negative verification means running a codebase-wide search for the thing that should be gone and confirming that the search returns zero results. Zero. Not "a few results that are probably fine" --- zero. In Massu, this is VR-NEGATIVE --- one of the most frequently triggered verification types, and for good reason.

This is counterintuitive because most verification is about proving something is present. You added a feature? Prove it's there. Negative verification flips that: you removed something? Prove it's not there. Anywhere. In the entire codebase.

The number of times negative verification has saved me is staggering. "I removed the old API endpoint," except it was still referenced in the client-side routing. "I removed the deprecated function," except it was still imported in three test files. "I migrated to the new pattern everywhere," except seven files deep in a subdirectory still used the old pattern.

Every single one of those would have become a bug in production. Every single one was caught by a two-second search command that returned a number other than zero.
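In practice, negative verification boils down to one codebase-wide search and one comparison against zero. Here's a sketch; `OldWidget` is an invented symbol name, and the two files just simulate a small tree:

```shell
# Sketch of negative verification: prove a removed symbol ('OldWidget',
# a hypothetical name) appears nowhere in the tree.
WORKDIR="$(mktemp -d)"
MATCHES="$(mktemp)"
mkdir -p "$WORKDIR/src" "$WORKDIR/tests"
printf 'import { NewWidget } from "./new-widget";\n' > "$WORKDIR/src/page.ts"
printf 'render(NewWidget);\n' > "$WORKDIR/tests/page.test.ts"

# Codebase-wide search; grep exits 1 on zero matches, so capture first.
grep -rn 'OldWidget' "$WORKDIR" > "$MATCHES" || true
COUNT=$(wc -l < "$MATCHES")
echo "references to OldWidget: $COUNT"

if [ "$COUNT" -ne 0 ]; then
  echo "FAIL: removal claim is false" >&2
  cat "$MATCHES"
  exit 1
fi
echo "PASS: zero references"
```

Note the `|| true`: `grep` exits with status 1 when it finds nothing, which here is the success case, so the script has to capture the match count rather than the exit code.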


Count Verification: "All" Means All

This one sounds almost insultingly simple, and yet it catches real bugs constantly.

When a plan says "add this component to all five pages," the verification isn't just "I see the component on some pages." The verification is: run a search, count the results, confirm the count equals five. Not four. Not three. Five.

I cannot overstate how often the count is wrong. The AI will say "added to all pages" and the count comes back at three. Not because the AI is being deceptive; it genuinely seems to lose track. It adds the component to three pages, the context shifts to something else, and when it returns to the task it believes it finished all five.

Count verification is dead simple to implement. If the plan says N items, the verification command should return N matches. Any other number means the job isn't done. In Massu, this is VR-COUNT, and it's enforced automatically by the /massu-commit and /massu-loop workflow commands. The system won't let you claim "added to all five pages" unless a grep returns exactly five matches.

I've extended this principle to everything that has a defined quantity. The configuration has twelve settings? I should see twelve entries. The form has eight fields? I should count eight field components. The navigation menu has six items? I should find six menu entries.

It feels tedious. It takes an extra ten seconds per item. And it catches real, actual, would-have-shipped-broken bugs on a regular basis.
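The whole check fits in a few lines of shell. In this sketch, the page names and the `AnnouncementBanner` component are invented stand-ins; the only part that matters is comparing the found count against the planned count:

```shell
# Sketch of count verification: the plan says 5 pages, so the
# search must return exactly 5 files. All names are hypothetical.
WORKDIR="$(mktemp -d)"
mkdir -p "$WORKDIR/pages"
for p in home about pricing blog contact; do
  printf '<AnnouncementBanner />\n' > "$WORKDIR/pages/$p.tsx"
done

EXPECTED=5
ACTUAL=$(grep -rl 'AnnouncementBanner' "$WORKDIR/pages" | wc -l)
echo "expected $EXPECTED, found $ACTUAL"

if [ "$ACTUAL" -ne "$EXPECTED" ]; then
  echo "FAIL: count mismatch" >&2
  exit 1
fi
echo "PASS"
```

If the AI had stopped at three pages, `ACTUAL` comes back as 3 and the check fails loudly instead of shipping a 60%-done feature.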


The Verification Template

All of this comes together in what I think of as a verification template: a checklist that gets populated before work even begins.

Here's how it works. Before starting implementation on any task, I enumerate which verification types apply. Not all twenty-one apply to every task. A purely frontend change doesn't need database schema verification. A backend-only change doesn't need UI rendering verification. But you need to decide which ones are relevant up front, not after the fact.

This matters because post-hoc verification is biased. If you decide what to check after you think you're done, you'll unconsciously skip the checks most likely to find problems. You'll verify the easy things and convince yourself the hard things are fine.

Pre-declaring your verification requirements eliminates this bias. You're not deciding what to check based on your confidence level. You're checking a predetermined list regardless of how confident you feel.

A typical verification template for a feature that touches the database, backend, and frontend might look like:

  • Schema verification: query the database to confirm columns exist
  • Procedure verification: search for the backend function and confirm it exists
  • Input validation: confirm the function validates its inputs
  • Frontend integration: search for the component that calls the backend function
  • Rendering verification: confirm the component appears on the actual page
  • Build verification: run the build
  • Type verification: run the type checker

That's seven checks. Each takes seconds to run. The total time investment is under a minute. The alternative, shipping without verification and discovering the backend function exists but nothing in the frontend calls it, costs hours.
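One way to make a pre-declared template executable is a tiny runner that takes a check name and a command, runs everything on the list regardless of confidence, and summarizes. The three checks below are placeholders (`true` stands in for your real build, type-check, and grep commands):

```shell
# Minimal checklist runner: pre-declare the checks, run them all,
# report a summary. The check commands here are placeholders.
run_check() {
  local name="$1"; shift
  if "$@" > /dev/null 2>&1; then
    echo "PASS  $name"
  else
    echo "FAIL  $name"
    FAILURES=$((FAILURES + 1))
  fi
}

FAILURES=0
run_check "build passes"       true   # e.g. npm run build
run_check "types clean"        true   # e.g. npx tsc --noEmit
run_check "component rendered" true   # e.g. grep -q 'MyWidget' pages/home.tsx

echo "failures: $FAILURES"
```

Because the list is written before implementation starts, a failing check can't be quietly rationalized away after the fact.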

In Massu, this verification template is built into the workflow commands. When you run /massu-commit, six automated gates run in sequence: pattern compliance, type safety, tests, hook compilation, secrets scan, and credential check. When you run /massu-push, three additional tiers add regression detection, npm audit, and full plan coverage verification. The template isn't something you remember to fill out; it's something the system executes for you.



What Verification Actually Catches

Let me give you some real patterns, anonymized but drawn from actual incidents, of what verification catches that you'd never find by eyeballing code.

The phantom component. The AI creates a beautiful, well-architected component in its own file. It exports it properly. It has great types. It's a genuinely good piece of code. One problem: nothing imports it. It exists in the codebase but is never rendered anywhere. Without render verification (a search that confirms the component name appears in an actual page file) this ships to production as dead code while the feature it was supposed to provide simply doesn't exist.

The orphaned backend. Similar to the phantom component but on the server side. The AI builds a complete backend procedure: database queries, business logic, input validation, error handling. Beautiful work. But no frontend component calls it. The feature is "built" in the sense that the backend code exists, but it's completely invisible to users. Coupling verification, confirming that backend procedures are actually called from the frontend, catches this immediately.

The ghost column. The AI writes a query that references a database column. The column name looks reasonable. The query is syntactically correct. But the column doesn't exist in the actual database schema. Maybe it existed in an earlier version. Maybe the AI hallucinated it based on the table's naming patterns. Schema verification --- querying the actual database for the actual column names before writing any code that references them --- prevents this entirely.

The partial migration. The plan says to update a value everywhere it appears. The AI finds the most obvious locations and updates them. But the value appears in thirty places across the codebase, and the AI only found and updated eight. A blast radius search --- Massu's VR-BLAST-RADIUS verification type --- finding every single reference to the old value across the entire codebase, reveals the other twenty-two places that would have been missed. Without this, you ship a partially migrated codebase where some parts use the old value and some use the new one, creating bugs that are extremely difficult to diagnose.

The confident "it builds." The AI makes a series of changes and reports that the build passes. Except it didn't run the build. It inferred that the changes were safe and reported success based on that inference. Running the actual build reveals a type error that would have been caught immediately. Build verification isn't about trusting the build will pass --- it's about proving it.

Every single one of these patterns has shown up multiple times in my development process. Every single one was caught by a specific, named verification type. And every single one, if it had shipped, would have cost orders of magnitude more time to diagnose and fix than the verification took to run.


The Speed Paradox

I know what you're thinking: this sounds incredibly slow. Running twenty-one types of checks? Populating verification templates? Counting grep results? Isn't the whole point of AI development to move fast?

Here's the paradox: verification is faster than debugging.

Finding a missing component through a grep command takes three seconds. Finding it because a user reports that a feature doesn't work takes three hours, minimum. You have to reproduce the issue, investigate the cause, identify the fix, implement it, verify the fix, and deploy it. And that's assuming the bug report is clear and the issue is straightforward.

A verification pass on a completed feature takes one to two minutes. Debugging a production issue caused by a partially completed feature takes one to two days. The math is not close.

But it goes deeper than just time savings. Verification changes the entire rhythm of development. Without verification, you're in a constant state of low-grade anxiety. Did that actually work? Is the build going to break? Did I miss something? You don't know, so you're always slightly worried.

With verification, you know. You have proof. The build passes --- here's the output. The component renders --- here's the grep result. All five pages are updated --- here's the count. That certainty is worth more than the time savings alone, because it lets you move forward with complete confidence that the foundation is solid.

I've been building with this system for months now, and I can tell you without hesitation: I ship faster with verification than I ever did without it. Not because verification is fast (though it is), but because the absence of rework is what actually determines your velocity. The fastest developer isn't the one who writes the most code. It's the one who writes the least code twice.


From Mindset to Culture

The hardest part of adopting a verification-first approach isn't the tooling or the taxonomy. It's the mindset shift.

You have to internalize, really internalize, that a claim without proof is not information. "I added it" is not information. "Here's the search result showing it's there" is information. "The build passes" is not information. "Here's the build output showing exit code zero" is information.

This applies to your own work too, not just the AI's. When I'm reviewing what the AI built, I don't look at the code and think "that seems right." I run the verification commands and confirm it is right. Seeming right and being right are different things, and the gap between them is where bugs live.

Once this mindset clicks, it's hard to go back. You start seeing unverified claims everywhere, in documentation, in status reports, in conversations. Someone says "it works" and your first thought is "show me." Not because you don't trust them, but because you've learned that confidence and correctness are uncorrelated.

And that, ultimately, is what the verification mindset is about. It's not about paranoia. It's not about slowing down. It's about understanding that in AI-assisted development, the information asymmetry is total. The AI knows what it did (or thinks it knows). You don't. The only way to close that gap is proof.


What's Next

In the next article, I'll cover automated enforcement: how to build hooks and gates that catch problems before you even see them. Because verification is powerful, but it still requires you to remember to do it. The real magic happens when the system verifies itself.


This is Part 4 of a 10-part series on building enterprise software with AI:

  1. How I Stopped Vibe Coding and Built a System That Actually Ships
  2. The Protocol System: How I Turned AI From a Chatbot Into a Development Partner
  3. Memory That Persists: How I Made AI Actually Learn From Its Mistakes
  4. The Verification Mindset: Why "Trust But Verify" Is Wrong When Building With AI (this article)
  5. Automated Enforcement: Building Hooks and Gates That Catch Problems Before You Even See Them
  6. The Incident Loop: How Every Bug Makes Your AI Development System Permanently Stronger
  7. Planning Like an Architect: Why AI Needs a Blueprint Before Writing a Single Line of Code
  8. Context Is the Bottleneck: Managing AI's Most Precious and Most Fragile Resource
  9. Solo Worker, Enterprise Quality: The New Economics of AI-Assisted Development
  10. The Knowledge Graph: Teaching AI to Understand Your Codebase as a Living System

I'm the Co-founder and COO of Limn, where we create luxury furniture and fixtures for large-scale architectural and building projects. Alongside the physical work, we design the systems required to manage a complex, global lifecycle --- from development and production to shipping and final delivery. The governance system I built for Limn's software is now Massu AI, an open-source AI engineering governance platform.


Here, I share what I've learned about making AI development actually work in the real world.

Have questions or want to share your own AI development setup? I'd love to hear from you in the comments.
