
Update 2025-11-20: Improved article structure.

All Your Coworkers Are Probabilistic Too

When people complain about large language models, I often feel like they're also complaining about their coworkers without realizing it.

People Are Probabilistic Too

If you've worked in software long enough, you've lived through this: you write a ticket, explain a feature in a meeting, and a week later you look at the result and think, "This is technically related to what I said, but it is not what I meant at all."

There is a very specific dialogue I've had more than once in my career. It usually happens after a feature demo.

PM: "This isn't quite what I wanted."

Dev: "Well, it is what I thought we agreed on."

Every time, it's a symptom of the same thing: the words we exchanged did not capture the actual intent and context in our heads. We thought we had an agreement, but we only had a rough sketch.

I've worked with enough people to know that many struggle to express their wishes well, often because those wishes aren't even clear to themselves in the first place. Most of the time people have only a rough vision. Maybe that's all they should offer, given that many problems only yield solutions through constant exploration and refinement. Vague specifications are almost the norm in our industry.

Nobody is surprised when the first attempt at a problem needs multiple iterations while humans are involved. We shrug, we sigh, we clarify the spec, refine the user story, comment on the pull request, and we fix it. Sometimes multiple times.

For over twenty years I've watched work get critiqued and revised, every single day. Now that we have the exact same problem with LLMs, some people somehow consider it a critical failure that makes working with these models pointless.

Machines Used to Be Predictable

The real problem is that we spent decades with machines that behaved in a very specific way. You send code into a compiler and it either accepts it or it doesn't. The machine never argues back, never claims it fixed something it didn't fix, never improvises. We built entire mental models around that determinism.

Now we have machines that behave much more like people: probabilistic, hungry for context, and occasionally confidently wrong. We expected deterministic tools but got probabilistic coworkers instead. The behavior isn't new. We've always worked with humans who act this way. It's just coming from an unexpected source.

Some people still struggle with this shift. They want LLMs to be as reliable as compilers or sorting algorithms. But that misses the point. Creativity implies the capacity to make errors. Filling in gaps that weren't explicitly specified is inherently creative work, even when, and especially when, it occasionally fails. The discomfort isn't a flaw in the technology. It's friction from an outdated mental model.

Treat It Like a Teammate

I've written before about building this site with LLM assistance in I Let AI Build My Website. This article focuses less on that specific project and more on the patterns that emerged from it.

LLMs improvise when instructions are vague. They produce something that's technically correct in a broad sense but not quite aligned with everything else in the system. Once I stopped expecting deterministic behavior and started treating the whole setup as something that needs guidance and constraints, the frustration went away.

The same practices that work with humans work with models. When I work with colleagues, I don't throw goals at them and hope for the best. We talk about constraints, trade-offs, what success looks like and what we explicitly don't care about. We look at examples. We clarify edge cases. When I do the same in a prompt, the outcome improves in exactly the same way. People used to call that "prompt engineering." I still think of it as writing a better spec.

A lot of people tell me they "tried" using an LLM for coding and gave up because it kept making the same mistakes. When I ask what the interaction looked like, it usually boils down to: long prompt, long wait, big blob of code, disappointment.

When I had Copilot change some CSS for the share button on this site, the raw interaction was painful. It would claim it had fixed the problem while doing nothing useful. I'd point at the issue and it would adjust something else. It felt like arguing with a stubborn coworker who hasn't fully understood the problem but won't admit it.

The moment I forced myself into a different loop, things improved. I made it write a plan first. I commented on the plan before it touched anything. I instructed it to run the standard build and check tasks and to inspect the output itself. With that in place, the interaction looked almost exactly like working with a teammate who is still getting familiar with the system. A bit slow at first, but steadily converging.

Context Is Everything

In any nontrivial system, nobody has the whole picture in their head. You rely on diagrams, design docs, comments in the code and, unfortunately, a lot of institutional memory. If a new teammate struggles, it's not because they're stupid. It might be because half the relevant information is smeared across people's brains and a decade of Git history.

Humans at least have agency. They actively look for information you didn't tell them about. LLMs can't do that. They only see what you show them, and they perform poorly when important context is missing.

LLMs have a hard context window and zero ability to peek outside of it. If I don't show the model a particular module, then as far as it is concerned that module doesn't exist. Early on, I assumed that working inside my editor meant the agent would "just know" where everything lived. It didn't. It would happily apply a pattern in one part of the codebase and ignore three other places where the same pattern existed, because those files weren't in view.

Things got noticeably better once I started treating context as something I had to curate. I wrote down project-wide rules in places like docs/system-design.md and docs/coding-style.md, then forced the LLM to read them for every new task via .github/copilot-instructions.md and later AGENTS.md. I kept short, high-density overview documents and pointed the model at relevant files explicitly instead of hoping it would guess what mattered.
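As a sketch of what such an instruction file can look like: the file paths match the ones above, but the rules themselves are invented for illustration, not the actual contents of this project's files.

```markdown
<!-- .github/copilot-instructions.md (illustrative, not the real file) -->
Before starting any task:

1. Read docs/system-design.md for the overall architecture.
2. Read docs/coding-style.md and follow its conventions.
3. Run the standard build and check tasks; inspect their output yourself
   before claiming a task is done.

Hard rules:

- Prefer composition over inheritance.
- Do not add new third-party dependencies without asking first.
```

The point is not the specific rules but that they live in a file the agent is forced to read on every task, instead of in anyone's head.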

Eventually I made it easy to discover all relevant documentation from a single starting point, the README.md, and made sure everything there is to know about how to work within the project can be found by following links from that document. Now it's almost enough to just tell the agent to look at the README and follow instructions to figure out how to work correctly within the codebase.

Incidentally, that is the same thing I'd have to do for a human colleague joining the project. With this approach I don't have to tell either an LLM or a person anything. I can just give them the repository. And if an LLM works well in this environment, I've implicitly demonstrated that a person could easily succeed as well.
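An entry point like that only stays trustworthy if the links in it keep resolving. A minimal sketch of a check for that, in Python: the regex handles plain Markdown links only, and the file names in the example are hypothetical.

```python
import re
from pathlib import Path

# Matches Markdown links like [text](target); good enough for plain inline
# links, not for reference-style links or targets with nested parentheses.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)#]+)")

def local_link_targets(markdown: str) -> list[str]:
    """Extract link targets, keeping only local (non-URL) ones."""
    return [t for t in LINK_RE.findall(markdown)
            if not t.startswith(("http://", "https://", "mailto:"))]

def broken_links(readme_path: str) -> list[str]:
    """Return local link targets in the README that don't exist on disk."""
    root = Path(readme_path).parent
    text = Path(readme_path).read_text(encoding="utf-8")
    return [t for t in local_link_targets(text) if not (root / t).exists()]

# Example on an inline string, since the real README isn't available here:
sample = "See [design](docs/system-design.md) and [style](docs/coding-style.md)."
print(local_link_targets(sample))
# ['docs/system-design.md', 'docs/coding-style.md']
```

Run as a CI step, a check like this catches a dangling link before an agent, or a new colleague, follows it into nothing.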

In my career, this wasn't abstract. I spent years in companies with large, aging, poorly maintained codebases where onboarding new engineers was predictably miserable. Every new hire had to reverse engineer the system from scattered comments, half-remembered conversations and whoever still happened to be around. What we were missing was exactly what we now have to build for LLMs to give them a chance to work at all: coherent docs, a clear entry point into the system, tests that tell you what's safe to change, some written-down sense of "how we do things here".

People shouldn't have to rely on tribal knowledge to deal with this kind of technical debt, but we tolerate it until the problem becomes unbearable. LLMs just make it unbearable faster.

Machines Are Still Machines

This might read like "LLMs are just people, treat them nicely." While the parallels are real, there are important differences.

The first is memory. When I explain an architectural decision to a colleague, I expect them to remember the important parts. If we have the same conversation three times and the same mistake keeps happening, we have a different discussion. With LLMs, every fresh session starts as if we've never met. They don't remember that I prefer composition over inheritance in this codebase, or that I don't want another YAML parser dependency. If that knowledge matters, it has to be somewhere the system can see every time.

Over time, this pushed me away from the idea of "teaching" the model and towards teaching the environment around it. Instead of hoping a preference will stick, I encode it explicitly: in instruction files, in reusable prompts, in automated checks that reject rule-violating changes. I'm not building a relationship with an entity. I'm building rails that a stochastic process has to run on.
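A toy example of such a rail, turning the "no new YAML parser dependency" preference mentioned above into a check instead of a repeated conversation. The forbidden module list and the rule itself are illustrations, not rules from any real project.

```python
import ast

# Modules this (hypothetical) codebase has decided not to depend on.
FORBIDDEN_MODULES = {"yaml", "ruamel.yaml"}

def forbidden_imports(source: str) -> list[str]:
    """Return names of forbidden modules imported by the given source."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        for name in names:
            # Catch both "import yaml" and "import yaml.something".
            if name in FORBIDDEN_MODULES or name.split(".")[0] in FORBIDDEN_MODULES:
                hits.append(name)
    return hits

# A CI step or pre-commit hook would run this over changed files and fail
# loudly on any hit, so the rule survives every fresh LLM session.
print(forbidden_imports("import yaml\nimport json"))
# ['yaml']
```

The mechanism matters more than this particular rule: once a preference is enforced by a check, neither a forgetful model nor a forgetful human can quietly drop it.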

The second is judgment. The engineers I respect most will occasionally look at a requirement and say "this doesn't make sense" or "this contradicts what we decided last week." LLMs are surprisingly good at exploring a design space when you ask them to, but you need to ask. By default they cheer for your ideas and are happy to make any silly thing work exactly as requested. Some humans can be sycophantic too, but they at least have the option of calling nonsense nonsense. And because of that, one of my standard instructions is: criticize any design and technology decisions I make, lay out alternatives and advise me on better approaches. There are also significant differences between models on this front that are worth paying attention to.

The third is responsibility. People come with motives, egos, fears, ambitions. Models have none of that. They don't care whether I like them and they don't take it personally when I discard their work. That's convenient, but it also makes it dangerously easy to project responsibility onto them. "The AI did it" is a tempting sentence. It is also meaningless. The model didn't decide to deploy untested code. You allowed it to. If something goes into production that shouldn't, the fault lies with whoever wired the system together.

The last is scale. Doubling the throughput of a human team means making the same people more effective or adding more people, both of which are slow and expensive. With LLMs, you can spin up several parallel attempts at the same feature and review the results. Simon Willison talks about running multiple coding agents in parallel and then doing the human work around that. I haven't seriously tried that yet. I might soon.

The Same Boring Practices

If you accept that all your coworkers, whether carbon-based or silicon-based, are probabilistic, the question becomes how to make that tolerable. The reassuring answer is that you don't need an entirely new discipline.

Being clearer about requirements is a good starting point. "Build a blog" is not a requirement, it's wishful thinking. Writing down what you actually care about, what you don't care about, and which corners you're fine cutting helps both the person sitting next to you and the model running in some datacenter.

Shorter feedback loops help as well. Ask for a plan, poke holes in the plan, let the model implement one small piece, run the checks, see what happens, repeat. It's the same pattern I try to follow with humans: don't disappear for two weeks and present a surprise, keep the steps small and visible.

Some people look at this approach, where you write a spec, discuss it, refine it, and only then let anything touch the code, and call it a return of the waterfall. I don't see it that way. The problem with classic waterfall was never that someone dared to write a spec. It was that the spec was treated as holy scripture and that feedback came far too late. What I'm describing is closer to the agile projects that actually worked: rough outline, thin slice, learn, update, repeat. The specs evolve with every iteration.

Externalizing knowledge is another old idea that becomes unavoidable. Humans can muddle through with half-remembered context and "ask Bob" as a strategy. Models cannot. If the way something works lives only in somebody's head or a long-gone Slack thread, it's invisible to the system. Writing down the shape of the system, the non-obvious constraints and the decisions that shaped it helps everyone, including your future self.

And then there's automated skepticism. Tests, linters, CI, code review rules, branch protections. All the things that make it harder for a rushed human to ship garbage are exactly the mechanisms that make LLMs at scale remotely sane. Simon Willison points this out in his article on "vibe engineering": if you already have solid tests, documentation, automation and review culture, agents become very useful. If you don't, plugging them in just amplifies whatever mess you already have.

None of this is new. We've been using these tools for decades to cope with humans being inconsistent, forgetful and occasionally overconfident. The noise source just got faster and cheaper. That makes the old answers more urgent, not less relevant.

On a Hacker News thread about specification-driven development, someone described using an LLM as an "unreliable compiler." I like that analogy, not because it flatters the models, but because humans are unreliable compilers too. Two engineers implement the same spec, you get two different designs. The same engineer, given the same task three months apart, will not write the same code.

Humans are not good at sustained vigilance. Machines are very good at running the same check a thousand times without getting bored. Wiring that vigilance into the pipeline instead of relying on willpower is probably the most leverage you can get, regardless of who or what is writing the code.

If you do all that, the real people working with you will also have a much easier time. The problems LLMs introduce into a software project are mostly older problems turned up to eleven. The upside is that we already know how to make those tolerable. We just have to apply what we know more rigorously.

If you won't do it for the machines, at least do it for the people you work with.
