> Except... you can run generated code to see if it’s correct.
I have seen this idea in many different places. I really think we should stop saying it.
Running code obviously does not confirm that it is correct. Even running a comprehensive set of tests isn’t enough if you haven’t developed a mental model of your system.
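To make the point concrete, here is a minimal, hypothetical sketch (the function and its tests are mine, not from the thread): every test below passes, yet the implementation is still wrong, because the tests never exercise the case the author didn't think of.

```python
def days_in_month(month: int, year: int) -> int:
    """Return the number of days in a month; forgets the century leap-year rule."""
    if month == 2:
        # Bug: 1900 and 2100 are NOT leap years, but this says they are.
        return 29 if year % 4 == 0 else 28
    return 30 if month in (4, 6, 9, 11) else 31

def test_days_in_month():
    assert days_in_month(1, 2023) == 31
    assert days_in_month(2, 2023) == 28
    assert days_in_month(2, 2024) == 29   # ordinary leap year
    assert days_in_month(4, 2023) == 30
    # Missing case: days_in_month(2, 1900) should be 28, but this code returns 29.

test_days_in_month()
print("All tests pass, but the code is still wrong for 1900, 2100, ...")
```

Without a mental model of the leap-year rules, "the tests pass" tells you nothing about that missing case.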
The lack of mental model development is what concerns me most about LLM-driven development. Copilot has the right idea, because it generates less code and so the user theoretically has more of an opportunity to grok its output.
Sure, if you blindly accept code just because the LLM has executed it (and maybe written tests for it too) without actually understanding what it does, you'll run into trouble.
But if you use it responsibly, LLMs can help you get to that accurate mental model. Code explanation is one of the things they are really good at.
My personal rule is that I won't commit code generated by an LLM unless I'm confident I can explain how that code works to someone else.
I agree. And this is why I wish that the narrative wasn't "LLMs will make you faster", and instead was "LLMs will improve the quality of your work".
There are certainly times when you just need to pump out code that happens to be very easy to validate, and then you can actually achieve the 10x throughput improvement that so many like to claim.
But generally, reviewing code is going to be at least as hard as writing it, if not harder. If we view a coding assistant as a pair programmer, the collaboration has a throughput overhead, but two programmers should produce higher quality work than one isolated programmer.
I suspect that this problem can be solved with greater computing power: instead of trying to squeeze everything into the context, LLMs will be fine-tuned to the design and architecture of each codebase.
Essentially, we're relying only on "short-term" memory for code generation at the moment, but we should be using a mix of long- and short-term memory, just like human programmers.
Copilot and similar use cases are like asking a random stranger a question on Stack Overflow. They may or may not be able to provide a useful answer, and that depends heavily on the amount and quality of the context provided.
If you want an “AI coworker”, then proper “learning” will be needed…
> I have seen this idea in many different places. I really think we should stop saying it.
Stripped of context, sure. But in the context of the article it makes perfect sense: it's about how hallucination isn't as much of a problem for coding as you might expect.