> Modern high-end transformers have no problem producing code, while benefiting from a wealth of knowledge that far surpasses any individual.
It will also still happily turn your whole codebase into garbage rather than undo its first attempt and try something else. I've yet to see one that can back itself out of a logical corner.
This is it for me. If you ask these models to write something new, the result can be okay.
But the second you start iterating with them... the codebase goes to shit, because they never delete code. Never. They always bolt new shit on to solve any problem, even when there's an incredibly obvious path to achieve the same thing in a much more maintainable way with what already exists.
Show me a language model that can turn rube goldberg code into good readable code, and I'll suddenly become very interested in them. Until then, I remain a hater, because they only seem capable of the opposite :)
That's not true in my experience. Several times now I've given Claude Code a too-challenging task, and after trying repeatedly it eventually gave up, removed all the previous work on that subject, and chose an easier solution instead.
...unfortunately that was not at all what I wanted lol. I had told it "implement X feature with Y library", i.e. specifically the implementation I wanted to make progress towards, and after a while it just decided that was too difficult and did it differently.
You'd be surprised what a combination of structured review passes and agent rules (even simple ones such as "please consider whether old code can be phased out") might do for your agentic workflow.
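For instance, a rules file for a coding agent might contain lines like the following (a generic sketch, not any particular vendor's format):

```
- Before adding new code, check whether existing code already solves the problem.
- When a fix supersedes an earlier change, delete the superseded code in the same patch.
- Prefer reverting a failed approach over layering workarounds on top of it.
- After each task, look for duplicated logic that can be consolidated.
```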
> Show me a language model that can turn rube goldberg code into good readable code, and I'll suddenly become very interested in them.
They can already do this. If you have any specific code examples in mind, I can experiment for you and return my conclusions if it means you'll earnestly try out a modern agentic workflow.
An LLM which was capable of refactoring all the duplicated logic into the common core and restructuring all the drivers to be simpler would be very very useful for me. It ought to be able to remove a few thousand lines of code there.
It needs to do it iteratively, in a string of small patches that I can review and prove to myself are correct. If it spits out a giant single patch, that's worse than nothing, because I do systems work that actually has to be 100% correct, and I can't trust it.
That's a combination of current context limitations and a lack of quality tooling and prompting.
A well-designed agent can absolutely roll back code if given proper context and access to tooling such as git. Even flushing context/message history becomes viable for agents if the functionality is exposed to them.
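A minimal sketch of what "access to tooling such as git" could look like, assuming the agent is given explicit checkpoint/rollback primitives (the function names and repo layout here are hypothetical):

```python
import os
import subprocess
import tempfile

def git(repo, *args):
    """Run a git command in the given repo and return its stdout."""
    return subprocess.run(["git", "-C", repo, *args],
                          check=True, capture_output=True, text=True).stdout

def checkpoint(repo, label):
    """Commit the current tree so the agent can return to this state."""
    git(repo, "add", "-A")
    git(repo, "commit", "--allow-empty", "-m", f"checkpoint: {label}")
    return git(repo, "rev-parse", "HEAD").strip()

def rollback(repo, sha):
    """Hard-reset to a known-good checkpoint instead of patching over it."""
    git(repo, "reset", "--hard", sha)

# Demo: checkpoint a working state, make a bad attempt, back out of it.
repo = tempfile.mkdtemp()
git(repo, "init")
git(repo, "config", "user.email", "agent@example.com")
git(repo, "config", "user.name", "agent")
with open(os.path.join(repo, "mod.py"), "w") as f:
    f.write("def f(): return 1\n")
good = checkpoint(repo, "working baseline")
with open(os.path.join(repo, "mod.py"), "w") as f:
    f.write("garbage attempt\n")
checkpoint(repo, "failed attempt")
rollback(repo, good)  # the file is restored to the baseline
```

Exposing `rollback` as a tool the model can call is the key step: the agent doesn't have to reason its way out of the corner, it just has to recognize it's in one.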