I Let AI Build My Website
I am guilty of vibe coding. Look, I know what you think: some dude, probably a wannabe developer with no skills, used some pseudo-AI/LLM to create something simple. It probably sucks, maybe looks good on the surface, barely works, and is full of bugs.
But before you leave, please give me a chance to explain.
Meet Your Unlikely Vibe-Coding Guinea Pig
So, I am actually quite a seasoned developer, if I may humbly say so. If you care to check me out, look at my CV or LinkedIn. Otherwise, you'll just have to take my word that I have over 20 years of heavy-duty software engineering experience. I know C++, Python, C#, Java, and a bunch of other languages quite well. The projects I've built from scratch or worked on have been very diverse: 3D engines, high-availability data analytics platforms, and various video streaming and computer vision solutions – just to name a few.
One thing I have comparatively little experience with is web development. I did some PHP 20 years ago, React and Node like 10 years ago, and some very basic HTML and CSS once a year or so. But that's it. I've mostly been a library, server software, and classic desktop software developer.
I had wanted a blog to write down and publish some stuff for a long time. I looked at options every once in a while: WordPress, Medium, this and that. But a voice in my head nagged me: "You should build this thing yourself. You can probably do that. It is not that hard, and you'll learn something valuable from the experience." I made a few attempts but gave up quickly because I'm the kind of guy who wants to do things right and use the "state of the art" from the beginning. Not that smart, and rather self-sabotaging – I know.
And there were so many options: so many technologies, frameworks, etc. The web development world seems to evolve very quickly, and I couldn't keep up. I lost motivation quite fast.
When AI Agent Mode Entered My Life
Then something happened. In March 2025, I heard of the term "vibe-coding" – as did most developers.
Vibe coding is a software development approach that heavily relies on AI tools to generate code, with developers primarily providing high-level intent through natural language prompts.
The term describes that instead of writing code yourself and setting up environments, etc., you just tell an LLM in "agent mode" (semi-autonomous operation) what you want, and it will just do it for you.
Actually, what happens is that, like a magical "monkey's paw," it often does something kind of like what you wanted, but not quite. To a degree, this is simply because most people don't express their intent in sufficient detail, which leaves enough room for bad assumptions to be made. This is a problem with people and machines alike – until you are sufficiently aligned, that is. You'll often have to tell an LLM exactly what to do differently, fix build issues, and so on. But in far less time than it would have taken you to learn how to do all these things yourself, it creates what you want. That is the ideal case. A lot of the time, it doesn't work out so well. Not yet.
In April 2025, VS Code made the "Agent mode" for GitHub Copilot available to the public. I watched a YouTube video of someone from the VS Code development team making himself a coffee and reading his emails while his VS Code instance just worked on its own on some instructions it was given to create some website – with passable results. So I thought to myself, let's give that a try. If it didn't work out, I wouldn't have even lost a lot of time.
So I updated VS Code, went to Copilot chat, and told it I wanted a blog. I don't have the exact prompt anymore, but I basically said:
I want a personal homepage and blog, fully static generated HTML and JS, no backend. I'll provide the content via Markdown files because I like Markdown. Write the generator code in Python because I like Python and can maintain that myself easily, and because I know Python has great support for such things. Use modern web technologies.
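To make the request concrete, the kind of generator I asked for can be sketched in a few dozen lines of Python. This is a hypothetical minimal version for illustration – not the code Copilot actually produced. It hand-rolls a tiny Markdown subset (headings and paragraphs); a real generator would use a proper Markdown parser (e.g., the `markdown` or `mistune` package) and a templating engine.

```python
# Hypothetical sketch of a minimal static site generator:
# Markdown files in, standalone HTML files out, no backend.
from pathlib import Path
import html

# A real generator would use a template engine (e.g., Jinja2) here.
PAGE = "<!DOCTYPE html><html><head><title>{title}</title></head><body>{body}</body></html>"

def md_to_html(text: str) -> str:
    """Convert a tiny Markdown subset: '# ' headings and blank-line-separated paragraphs."""
    blocks = []
    for block in text.strip().split("\n\n"):
        block = block.strip()
        if block.startswith("# "):
            blocks.append(f"<h1>{html.escape(block[2:])}</h1>")
        else:
            blocks.append(f"<p>{html.escape(block)}</p>")
    return "\n".join(blocks)

def build_site(content_dir: Path, out_dir: Path) -> None:
    """Render every .md file in content_dir to a matching .html file in out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for md_file in content_dir.glob("*.md"):
        body = md_to_html(md_file.read_text(encoding="utf-8"))
        # Derive a human-readable title from the filename, e.g. "my-post" -> "My Post".
        title = md_file.stem.replace("-", " ").title()
        (out_dir / f"{md_file.stem}.html").write_text(
            PAGE.format(title=title, body=body), encoding="utf-8")
```

The actual generated code was considerably larger (index pages, feeds, assets), but the core loop – read Markdown, render HTML, write files – is essentially this.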
Let It Cook
And it went to work. I had to allow it to run various console commands, but it mostly worked on its own for a while. As it was describing what it would do and then doing it, I would already spot issues with the architecture of the code it was creating. But I let it finish. I'd get rate-limited every once in a while and had to switch models.
And yet, to my surprise, it created something that superficially worked: a very generic-looking website, as if generated straight from Hugo with a really boring and simple theme.
But I was also a little impressed. It actually worked as described and produced something that didn't look entirely terrible. I hadn't given a lot of detailed instructions, so it made a lot of assumptions (bad and good ones). There was no linting, no tests, only very basic build instructions, and an overly detailed README.
The AI Started Lying to My Face
If I were a non-engineer, I might not have noticed or cared about many of the issues and would have left it that way. But then I asked it to improve and change various things the way I wanted them to be. It didn't really work well at first - sometimes not at all.
It would sometimes "lie" to me (I know the term implies consciously saying something untrue, but you know what I mean): when I told it to change some layout in the HTML/CSS, it would claim it had found the issue and fixed it while doing "something" that didn't actually fix or change it the way I asked. The Python code it generated was full of issues a linter would complain about. There were many bad redundancies and very questionable system design choices. When I told it that some design or code element should also appear in some other part of the code, it would often just duplicate it instead of finding a way to reuse or refactor it.

It didn't really have an overview of the full scope, especially when I started a new chat. Copilot actually tells you what context (files) it uses in the current conversation, so I could often anticipate that things wouldn't go well when some important context wasn't automatically picked up. You can provide context manually, but I found that quite tiresome.
So, I was reviewing code, telling it what it should do, waiting, then telling it again - often with hints or specific instructions about what I already knew was wrong - waiting again, and hoping it would get it right this time. Telling it to change designs/CSS for a certain effect was especially frustrating. I don't know CSS well, so I couldn't help it much. And obviously, it can't see the rendered website at all by default, so it does a lot of guesswork. For some issues, like the "share" button on the article page, I spent a full hour just repeatedly instructing Copilot and waiting for it to achieve the design goals. But it did work. I could have intervened manually in many situations, but I almost never did, not even for simple stuff. I wanted that true vibe-coding experience, and I got it. It was great and horrible at the same time.
Technical Difficulties (And My Sanity)
The models available in agent mode had different strengths and weaknesses - some better at Python, others at CSS, all with their own quirks. But the real problem wasn't the models themselves.
VS Code would bug out occasionally and print API calls into the chat window instead of executing them. Models would get errors and just... stop. They'd forget which linter the project used or fail to run basic terminal commands.
At times, these bugs got VERY frustrating, and I often had to just stop for the sake of my own sanity.
Update 2025-08-01: The tooling has gotten much better, and newer models are significantly more capable. But the core experience remains the same.
The missing-context problem got better when I eventually made use of `.github/copilot-instructions.md` and made sure this file was always included as context in every chat. More about that later.
Why LLMs Are Just Digital Humans with Amnesia
The view I had of LLMs has actually not changed much in all of this. I've been an early adopter of ChatGPT and have been paying for it basically since ChatGPT Plus has existed. For coding, it has always felt like working with a junior developer - more on that comparison later.
And this is one of the main differences in perspective that I have with many of my peers. They often view and expect these LLMs to work as smarter and more capable versions of highly precise algorithmic solutions, like an upgraded calculator. These people point to an LLM's inability to count the number of words in a sentence it generated or to multiply two medium-sized numbers.
And that is, I believe, where they are wrong and need to adjust their perspective. LLMs are modeled more closely after how human brains work than traditional computing solutions are. They make very similar mistakes as humans do! And you can't have creativity without the capacity for mistakes. These models are creative because they can make mistakes.
Some would argue that LLMs are not creative at all - they're just regurgitating, at best recombining, things they were trained with. This is true. But humans are not fundamentally different. We're also "trained" by our environment. Artists have inspirations, whether they know it or not. Each generation builds on the training data provided by the previous generation.
And interestingly, humans struggle with pretty much the same tasks an LLM struggles with. I'm not going to multiply large numbers in my head or count letters in a sentence either - I'd make mistakes. I use tools: calculators, programming languages, pen and paper. LLMs now do the same thing - they write and run programs for mathematical calculations instead of "guessing" with their flawed general reasoning.
But here's the key difference that became crystal clear during my vibe-coding experience: humans learn and remember. If I had tasked a human junior developer with creating a static site generator and blog template, I might have gotten a very similar result. But I can explain to that person what they did wrong, and they'll generally remember it and improve over time. LLMs don't do that - yet. They will make the same mistakes in every new conversation.
And that is what is ultimately frustrating about it.
I guess we have to wait for some more technological breakthroughs and engineering to fix this. What we are already seeing is that LLMs are getting a standardized protocol to interact with other services (and the world by extension) via, e.g., MCP, and are retaining memories via, e.g., vector databases.
What I am personally waiting and hoping for is a model that I can teach and train on my own and that just remembers everything I taught it on the fly. I'm sure we are getting there. I'm even sure we aren't far from it.
And once our artificial brains have many extensions to interact with the world, once they have memories and continuously enhance themselves, we are very close to a true AGI. It actually might just happen without us noticing it.
I, for one, welcome our new AI overlords.
VS Code + GitHub Copilot tips
In the meantime, I have some tips to make working with agentic AI in VS Code a little less frustrating. I learned a lot about how to work with an LLM agent along the way; here is some of it:
Use .github/copilot-instructions.md
When the agent gets something wrong repeatedly or does something in a way you don't want, put a specific instruction in there. Curiously, it doesn't always work, but it often does. Here are some of the instructions I used for this project (an excerpt):
- Always consider the info from the `README.md` and linked documents.
- Don't be so agreeable. If you think something is not a good practice, say it. If you think something is missing, say it. If you think something is not clear, say it. Discuss it with me.
- Discuss and provide options before altering architecture or introducing new dependencies.
- Ask questions to get clarification - especially before implementing something I asked for that is questionable from a design or engineering perspective.
- After you make changes, check for problems using the build task.
- In order to avoid inconsistencies and bugs, be on the lookout for code and configuration that can be reused, or refactored to be made reusable, and then do it. But do maintain architectural boundaries.
There are now also instruction files which can teach Copilot to apply certain instructions to specific file types.
Make use of the `#file:<somefile>` tool
Copilot will try to find all the relevant files and context on its own, but it often fails to see or understand which those would be. And because of that, it often won't see a redundancy or won't remove all parts of something if you don't specifically tell it which files to look at.
Not all models are equal. Some are more equal than others
Try different models to figure out their strengths. They really do perform noticeably differently on some tasks. There is no shame in giving the same task to different models and comparing results; VS Code has a very nice "undo" feature to help with that. For example, Gemini 2.5 Pro is very fast, and I found Claude 3.7 Sonnet to be best at working with HTML and CSS.
Let it work with you
I know the idea of having the agent do all the work unattended and getting a good result an hour later is alluring. Except that won't work. I had a much more interactive experience when I instructed Copilot to ask me questions and lay out plans before executing them. Yes, you have to sit there and watch it work, but I would have lost even more time fixing the mistakes it made by not presenting multiple solution options or asking for missing information, and just making assumptions instead. And you'll learn something along the way too, when it explains things to you and gives you options. Obviously, that is also how you should work with a human: frequent alignment and discussion, so that both of you learn something.
Was It Worth It?
I estimate I spent about 40 hours creating the Python static website generator from scratch plus the example blog's HTML and CSS. That is considerably faster than I would have been able to do it entirely on my own. The codebase is actually not terrible (anymore); I did prevent the worst stuff from happening - at least the stuff I was aware of. Also, a lot of that time was spent exploring the agentic-LLM technology itself and figuring out how to work well with it.
Since then, I've spent even more time on a lot more stuff. I don't know how much - another 40 hours, maybe? By now, this site has lots more CSS and JavaScript. And a Go backend for serving all of it. And a Podman container and a Terraform deployment. And a cloud-init config, an nginx setup, a TLS certificate with auto-renewal, and a task system for me to manage all of that. I can now clone a single repo from my GitHub and deploy this entire site from scratch into a Hetzner Cloud project within 5 minutes by running 2-3 simple commands. I can do everything I want locally - update Markdown content, just the Go backend, or just the TLS certificate - then upload and have the changes running in under a minute. Everything is automated.
An LLM helped me with all of this to varying degrees. And even though LLMs, specifically Copilot, helped a lot, I also learned a lot myself.
I feel like I worked with a very fast junior developer with curiously broad and detailed knowledge of all kinds of technologies but sketchy methodology, craftsmanship, and discipline - which, luckily, I was able to provide in many instances. And in fact, the experience really is not that different from doing exactly that.
One thing I am sure of now, though, is that you should have a good understanding of the technology you are asking the LLM to help you with. Because if you don't, it's one junior dev reviewing the code of another: you might get something that works but would induce pure horror in a true expert in the field. And because I'm still me, I still dove deep into some of the things I had initially let the LLM do for me.
Overall, this is a success for me. Yet I feel affirmed in the suspicions about AI/LLMs that I've had since the day ChatGPT 3.5 was made public: it can be a productivity tool for experienced developers if used right, and very dangerous in the hands of novices. You can still use it as a tool to discover technology - just don't let it do all the work for you if that is your situation. You won't learn much if you do.
I'm happy that LLMs provide access to knowledge and ability to many people. More people will now be able to do things on their own that they couldn't before. Like the internet in general and search engines later, it is another democratizing force and a force multiplier.
As another consequence, there is going to be a lot of shitty code out there which no one will fix. I still don't like the idea of this being used for critical software - not one bit. But for my blog, it is enough, and this blog likely wouldn't exist if I hadn't used it. I assume the same is true for many other projects out there.