I want to talk about context. More specifically, what do you actually know about the thing you ship?
That question slapped me in the face the other day while I was talking to the proto-writer’s-room AI I built. I told it, “it needs to have better context,” and then realized I had no idea whether the code that implemented that request actually did what I meant. I did not write that code. I did not have the mental map.
This is not a hot take. It’s a slightly panicked observation from a 50-year-old who still likes knowing how the sausage is made. When you wrote everything yourself, you could answer a bug report with, “yeah, I know where that’s coming from.” Now? Sometimes not even close.
Why this feels weird
When I was younger I wrote low-level stuff. Assembler to C to whatever language was fashionable. Each step abstracted away cognitive load. You move faster, but you also become reliant on abstractions. You trust the layers below you. You accept that you no longer need to think about register allocation.
AI is the same kind of abstraction, just more aggressive. Now the machine writes higher level flows and components for you. It pulls together libraries, wiring, tests, and maybe deployment scripts. You can get to working software faster. But you lose the internal map of how things fit together.
And that loss matters because software is fragile. Bugs happen. In the old world, when someone filed a bug you had a hunch, a path to reproduce, a likely cause. With AI-authored code you can still triage, but it feels like being handed a stranger’s car and asked to fix the engine.
How I actually felt
Nervousness. Not drama. Nervousness because shipping things I cannot fully reason about is a legit risk. I’m curious enough to try it, and I can see a future where this is fine. But right now it’s complicated, and that complication deserves respect.
What’s at stake
- Shipping something you cannot reason about. If a feature goes wrong, can you own the fix? Can you explain it to customers or stakeholders? Can you backport a safe patch without breaking ten other things? Maybe, maybe not.
- Unknown bug surface. Is it a typo or an auth regression? If an AI recomposed the route handling and introduced a subtle security hole, you might never guess that by eyeballing the code.
- Team trust and velocity. Engineers need to trust the system they deploy. If they cannot form a coherent internal model, people slow down or stop pushing things.
You can mitigate these, but the mitigation looks different than the old days of reading every line of code.
What might replace the old mental model
I do not think you get the same 1:1 mental model back. And that might be okay. But if you are going to let AI write parts of your system, you have to build better instruments than just hoping the code “feels right.” Here are the patterns I would demand from any system I let write my guts. These are practical, not theoretical.
- Tests to nausea
Not unit tests as religion. Multi-layered, automated, and aggressive. Contract tests, integration tests, end-to-end tests that exercise real flows, property-based tests when it fits. Run them in CI and pre-prod. If you cannot test it thoroughly, do not ship it.
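Here is a sketch of what I mean by aggressive, in plain Python. The `slugify` function is invented for illustration, but the pattern is real: generate hundreds of random inputs and assert invariants that must hold for every one of them, instead of blessing a couple of hand-picked examples.

```python
import random
import string

def slugify(title: str) -> str:
    # Stand-in for an AI-generated helper we want to pin down with
    # properties rather than line-by-line reading.
    cleaned = "".join(c.lower() if c.isalnum() else "-" for c in title.strip())
    while "--" in cleaned:
        cleaned = cleaned.replace("--", "-")
    return cleaned.strip("-")

def check_slug_properties(trials: int = 500) -> None:
    # Hand-rolled property-based testing: random inputs, universal assertions.
    rng = random.Random(42)
    alphabet = string.ascii_letters + string.digits + " -_!?."
    for _ in range(trials):
        title = "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 40)))
        slug = slugify(title)
        assert slug == slug.lower()        # no uppercase survives
        assert "--" not in slug            # no doubled separators
        assert not slug.startswith("-") and not slug.endswith("-")
        assert slugify(slug) == slug       # applying it twice changes nothing

check_slug_properties()
```

A dedicated library such as hypothesis does this with much better input shrinking, but even twenty lines of stdlib buys you confidence that no hand-written example list can.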
- Reasoning across large codebases
Your tools need to answer, “what could break if I change this line?” Not just lint rules. Real impact analysis: call graphs, dependency and data flow. If your AI cannot explain blast radius, you need better instrumentation.
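The core of blast-radius analysis is small enough to sketch. Real tools derive the graph from the AST or the build system; the function names below are invented, but the reachability computation is the actual technique.

```python
from collections import deque

# Toy call graph: for each function, the set of functions that call it.
# In a real system this comes from static analysis, not a hand-written dict.
CALLERS_OF = {
    "parse_config": {"load_app"},
    "load_app": {"main"},
    "render": {"main"},
    "main": set(),
}

def blast_radius(changed: str) -> set[str]:
    """Everything that could break if `changed` changes: its transitive callers."""
    seen, queue = set(), deque([changed])
    while queue:
        fn = queue.popleft()
        for caller in CALLERS_OF.get(fn, ()):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen

# Touching parse_config can break load_app, and through it, main.
assert blast_radius("parse_config") == {"load_app", "main"}
```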
- Provenance and explainability
Who wrote this snippet? When? Which prompt made it? Treat AI outputs like generated artifacts with metadata. You want to trace the origin of surprising behavior, not just stare at the compiled blob and guess.
- Human in the loop and red teams
AI writes the first pass. Humans review. But reviews need structure. Red teams should actively try to break the AI’s assumptions: security reviews, performance checks, UX sanity checks. Automate the mundane parts of review, but keep skeptical humans around.
- Orchestrators and automated remediation
If a bug is reported, let the system do triage for you. Have the AI propose fixes, run them in sandboxed CI, and present diffs with test results and risk assessments. This does not remove human ownership, but it buys time and data to make a sane decision.
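The shape of that loop, with every expensive piece stubbed out: in a real system `propose_patches` would call an AI and `run_sandboxed_tests` would run the suite in isolated CI. Everything here is a made-up stand-in; only the workflow is the point.

```python
def propose_patches(bug_report: str) -> list[str]:
    # Stub for "ask the AI for candidate fixes."
    return ["patch-a", "patch-b"]

def run_sandboxed_tests(patch: str) -> dict:
    # Stub for running the full suite in an isolated environment.
    passed = patch == "patch-a"  # pretend only one candidate passes
    return {"patch": patch, "tests_passed": passed, "risk": "low" if passed else "high"}

def triage(bug_report: str) -> list[dict]:
    """Hand humans diffs plus test results and risk, not just a bug report."""
    return [run_sandboxed_tests(p) for p in propose_patches(bug_report)]

for result in triage("context missing from generated articles"):
    print(result)
```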
- Specs, not surprises
Use spec-driven development. If your AI fills in implementation from a clear spec, you might not know every line of code, but you know the contract it must satisfy. Contracts are legible. Legibility buys options.
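A spec does not have to be elaborate to be legible. The field names below are invented for illustration, but the move is the real one: whatever the AI wrote, you can still judge its output against a contract you can read in ten seconds.

```python
# The contract the generated implementation must satisfy.
ARTICLE_SPEC = {
    "title": str,
    "body": str,
    "contexts_used": list,
}

def satisfies_spec(article: dict, spec: dict) -> bool:
    """Check a generated output against the contract, key by key."""
    return all(
        key in article and isinstance(article[key], expected)
        for key, expected in spec.items()
    )

draft = {"title": "On context", "body": "...", "contexts_used": ["style notes"]}
assert satisfies_spec(draft, ARTICLE_SPEC)
assert not satisfies_spec({"title": "missing everything else"}, ARTICLE_SPEC)
```

In practice you would reach for something like JSON Schema or pydantic, but the principle is identical: the spec stays small and human-readable even when the implementation is neither.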
- Logs, metrics, and contract-level monitoring
If you lack a mental model, you need instruments that tell you when reality deviates from expectation. Feature-level metrics, request tracing, contract violation alerts. These are the closest thing to a mental model after the code itself.
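One way to get contract-violation alerts without trusting the code itself: wrap any function, AI-written or not, in a runtime postcondition check. The names here are illustrative; the design choice worth noting is that violations are logged rather than raised, so production keeps running while you collect the signal.

```python
import functools
import logging

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("contract-monitor")

def monitor_contract(postcondition, name):
    """Decorator: check a postcondition on every return value, log violations."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            result = fn(*args, **kwargs)
            if not postcondition(result):
                log.warning("contract %s violated by %s", name, fn.__name__)
            return result
        return wrapper
    return decorator

@monitor_contract(lambda r: isinstance(r, str) and r != "", "non_empty_summary")
def summarize(text: str) -> str:
    # Stand-in for a generated component we cannot fully reason about.
    return text[:80]

summarize("hello")  # passes silently
summarize("")       # logs a contract violation, but still returns
```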
Concrete example from the trenches
The writer’s-room AI I’m building is a tiny case study. I told it what contexts to capture. I did not write the code that wires that context into every article. So when I asked it, “do you know this context?” I literally did not know the answer. The AI might be bookmarking it correctly, or it might be ignoring half the list. I cannot tell without diving into generated code, or proving it with tests and logs.
I could write a set of acceptance tests that assert the generated agents have access to the named contexts. I could also require the AI to produce a one-paragraph explanation of how it wired context into the output. Both are sane. But I did not do that the first time. That mistake is the point.
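What that acceptance test could look like, under invented assumptions about the system's shape: suppose each generated agent carries the set of context names it was wired with. None of these names come from the real project; the point is that a missing context becomes a test failure instead of a shrug.

```python
# Hypothetical context names and agent shape, for illustration only.
REQUIRED_CONTEXTS = {"voice-and-tone", "past-articles", "audience-notes"}

class Agent:
    def __init__(self, name: str, contexts: set[str]):
        self.name = name
        self.contexts = contexts

def check_context_wiring(agents: list[Agent]) -> list[str]:
    """Return the agents missing any required context."""
    return [a.name for a in agents if not REQUIRED_CONTEXTS <= a.contexts]

agents = [
    Agent("drafter", {"voice-and-tone", "past-articles", "audience-notes"}),
    Agent("editor", {"voice-and-tone"}),  # missing two contexts
]
assert check_context_wiring(agents) == ["editor"]
```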
Final thought
I do not think we will ever hold the whole mental model again in the way we used to. And I think that is okay. The reality of abstraction is we accept some unknowns to move faster. But we cannot be naive. We need systems that give us confidence even when we cannot reason about every line.