The only proper way to code with an LLM is to run its code, give it feedback on what's working and what isn't, and tell it what to change. Then repeat.

The problem with automating this is that the number of environments you'd need to support in order to run arbitrary code is practically infinite, and with local dependencies it's genuinely impossible unless there's direct integration, which means running it on your machine. And that means giving an opaque service full access to your environment. Or, at best, a local model that's still a binary blob capable of outputting virtually anything, but at least it won't spy on you.
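One middle ground is to sandbox the execution rather than hand the agent your whole machine. A minimal sketch, assuming Docker is installed and the generated code is a self-contained Python snippet (no local dependencies, which is exactly the hard part):

    import os
    import subprocess
    import tempfile

    def run_sandboxed(code: str, timeout: int = 30) -> str:
        """Run LLM-generated Python in a throwaway container with no
        network and a read-only mount, returning combined output."""
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "snippet.py")
            with open(path, "w") as f:
                f.write(code)
            result = subprocess.run(
                ["docker", "run", "--rm",
                 "--network", "none",        # no network access
                 "-v", f"{tmp}:/work:ro",    # mount only the snippet, read-only
                 "python:3.10-slim",
                 "python", "/work/snippet.py"],
                capture_output=True, text=True, timeout=timeout,
            )
            return result.stdout + result.stderr

This sidesteps the spying concern, but it falls apart the moment the code needs anything from your local environment.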




Any LLM coding agent that doesn't work inside the same environment as the developer will be a dead end or a toy.

I use ChatGPT to ask for code examples or to sketch out pieces of code, but it's just not going to be nearly as good as anything in an IDE. And once it runs in the IDE, it has access to what it needs to be in a feedback loop with itself. The user doesn't need to see the intermediate steps you'd go through with a chatbot, where you say "The code compiles but fails two tests; what should I do?"
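A sketch of that inner loop, with ask_llm standing in for whatever model call the IDE integration uses (a hypothetical name, not a real API):

    import subprocess
    import sys

    def ask_llm(prompt: str) -> str:
        """Placeholder for the IDE's model call (hypothetical)."""
        raise NotImplementedError

    def iterate_until_it_runs(task: str, max_rounds: int = 5) -> str:
        """Generate code, run it, and feed any error straight back to
        the model; the user only ever sees the final version."""
        code = ask_llm(task)
        for _ in range(max_rounds):
            proc = subprocess.run(
                [sys.executable, "-c", code],
                capture_output=True, text=True, timeout=30,
            )
            if proc.returncode == 0:
                return code
            code = ask_llm(
                f"This code failed:\n{code}\n\nError:\n{proc.stderr}\nFix it."
            )
        return code  # give up and surface the last attempt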


Don't they? It depends heavily on the errors. They could range from a simple syntax error to a library version mismatch or a functionality deprecation that requires some genuine work to resolve and would need at least some judgment from the user.

Furthermore, LLMs make those kinds of "simple" errors less and less, especially if the environment is well defined. "Write a Python script" can go horribly wrong, but "Write a Python 3.10 script" is most likely going to run fine yet have semantic issues, where it made assumptions about the problem because the instructions were vague. Performance should increase with more user input, not less.


They could, but if the LLM can iterate and solve it, then the user might not need to know. So when user input is needed, at least it's not merely to do what I do now: feed the compiler messages or test failures back to ChatGPT, which then gives me a slightly modified version. And of course sometimes it will still fail, and that will need manual intervention.

I find that ChatGPT often reasons itself to a better solution (perhaps not correct or final, but better) if it just gets some feedback from e.g. compiler errors. Usually it goes like this:

Me: "Write a function that does X and satisifies this test code"

LLM: responds with function (#1)

Me: "This doesn't compile. Compiler says X and Y"

LLM: Apologies: here is the fixed version (#2)

Me: "Great, now it compiles but it fails one of the two test methods, here is the output from the test run: ..."

LLM: I understand. Here is an improved version that should pass the tests (#3)

Me: "Ok now you have code that could theoretically pass the tests BUT you introduced the same syntax errors you had in #1 again!"

LLM: I apologize, here is a corrected version that should compile and pass the tests (#4)

etc etc.

After about 4-5 iterations with nothing but gentle nudging, it's often working. And there usually isn't more nudging than returning the output from compiler or test runs. The code at the 4th step might not be perfect, but it's a LOT better than it was at first.

The problem with this workflow is that it's like pair programming with a bad intern over the phone. Copying and pasting code back and forth and telling the LLM what the problem with it is just isn't very quick. If the iterations were automatic, so that the only thing I see is step #4, then at least I could focus on the manual intervention needed there. But fixing a trivial syntax error between #1 and #2 is just a chore. I think ChatGPT is simply pretty bad here, and better models like Opus probably don't have these issues to the same extent.
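That back-and-forth is mechanical enough to script. A rough sketch of the automated version, assuming the tests already live in a file and ask_llm is a placeholder for the chat API (both hypothetical names):

    import subprocess

    def fix_until_green(code_file: str, test_file: str, ask_llm,
                        max_iters: int = 5) -> bool:
        """Re-run the tests and feed failures back to the model until
        they pass or we run out of iterations; only the final state
        needs to be shown to the user."""
        for i in range(max_iters):
            result = subprocess.run(
                ["python", "-m", "pytest", test_file, "-x", "--tb=short"],
                capture_output=True, text=True,
            )
            if result.returncode == 0:
                return True   # tests pass: this is "step #4"
            with open(code_file) as f:
                current = f.read()
            fixed = ask_llm(
                f"Iteration {i + 1}: the tests failed.\n"
                f"Code:\n{current}\n\nPytest output:\n{result.stdout}\n"
                "Return only the corrected file contents."
            )
            with open(code_file, "w") as f:
                f.write(fixed)
        return False  # still red: manual intervention needed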


> The problem with this workflow is that it's like pair programming with a bad intern over the phone.

Even worse than that - an intern has a chance to learn from this experience, get better and become a senior one day.



