> Modern high-end transformers have no problem producing code, while benefiting from a wealth of knowledge that far surpasses any individual.
It will also still happily turn your whole codebase into garbage rather than undo its first attempt and try something else. I've yet to see one that can back itself out of a logical corner.
This is it for me. If you ask these models to write something new, the result can be okay.
But the second you start iterating with them... the codebase goes to shit, because they never delete code. Never. They always bolt new shit on to solve any problem, even when there's an incredibly obvious path to achieve the same thing in a much more maintainable way with what already exists.
Show me a language model that can turn rube goldberg code into good readable code, and I'll suddenly become very interested in them. Until then, I remain a hater, because they only seem capable of the opposite :)
That's not true in my experience. Several times now I've given Claude Code a too-challenging task, and after trying repeatedly it eventually gave up, removed all the previous work on that subject, and chose an easier solution instead.
...unfortunately that was not at all what I wanted lol. I had told it "implement X feature with Y library", i.e. specifically the implementation I wanted to make progress towards, and after a while it just decided that was too difficult and did it differently.
You'd be surprised what a combination of structured review passes and agent rules (even simple ones such as "please consider whether old code can be phased out") might do for your agentic workflow.
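For instance, a rules file for a coding agent might contain lines like the following (a generic sketch, not any particular vendor's format):

```
- Before adding new code, check whether existing code already solves the problem.
- When a fix supersedes an earlier change, delete the superseded code in the same patch.
- Prefer reverting a failed approach over layering workarounds on top of it.
- After each task, look for duplicated logic that can be consolidated.
```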
> Show me a language model that can turn rube goldberg code into good readable code, and I'll suddenly become very interested in them.
They can already do this. If you have any specific code examples in mind, I can experiment for you and return my conclusions if it means you'll earnestly try out a modern agentic workflow.
An LLM which was capable of refactoring all the duplicated logic into the common core and restructuring all the drivers to be simpler would be very very useful for me. It ought to be able to remove a few thousand lines of code there.
It needs to do it iteratively, in a string of small patches that I can review and prove to myself are correct. If it spits out a giant single patch, that's worse than nothing, because I do systems work that actually has to be 100% correct, and I can't trust it.
That's a combination of current context limitations and a lack of quality tooling and prompting.
A well-designed agent can absolutely roll back code if given proper context and access to tooling such as git. Even flushing context/message history becomes viable for agents if the functionality is exposed to them.
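A minimal sketch of what "access to tooling such as git" could look like, assuming the agent is given explicit checkpoint/rollback primitives (the function names and repo layout here are hypothetical):

```python
import os
import subprocess
import tempfile

def git(repo, *args):
    """Run a git command in the given repo and return its stdout."""
    return subprocess.run(["git", "-C", repo, *args],
                          check=True, capture_output=True, text=True).stdout

def checkpoint(repo, label):
    """Commit the current tree so the agent can return to this state."""
    git(repo, "add", "-A")
    git(repo, "commit", "--allow-empty", "-m", f"checkpoint: {label}")
    return git(repo, "rev-parse", "HEAD").strip()

def rollback(repo, sha):
    """Hard-reset to a known-good checkpoint instead of patching over it."""
    git(repo, "reset", "--hard", sha)

# Demo: checkpoint a working state, make a bad attempt, back out of it.
repo = tempfile.mkdtemp()
git(repo, "init")
git(repo, "config", "user.email", "agent@example.com")
git(repo, "config", "user.name", "agent")
with open(os.path.join(repo, "mod.py"), "w") as f:
    f.write("def f(): return 1\n")
good = checkpoint(repo, "working baseline")
with open(os.path.join(repo, "mod.py"), "w") as f:
    f.write("garbage attempt\n")
checkpoint(repo, "failed attempt")
rollback(repo, good)  # the file is restored to the baseline
```

Exposing `rollback` as a tool the model can call is the key step: the agent doesn't have to reason its way out of the corner, it just has to recognize it's in one.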