Here's a very small piece of I code I generated quickly (i.e. <5 min) for a small task (I generated some data and wanted to check the best way to compress it):
Is it correct? Sort of; I don't trust the duration benchmark because benchmarking is hard, but the size should be approximately right. It gave me a pretty clear answer to the question I had and did it quickly. I could have done it myself but it would have taken me longer to type it out.
I don't use it in large codebases (all agentic tools for me choke quickly), but part of our skillset is taking large problems and breaking them into smaller testable ones, and I give that to the agents. It's not frequent (~1/wk).
> I don't use it in large codebases (all agentic tools for me choke quickly), but part of our skillset is taking large problems and breaking them into smaller testable ones, and I give that to the agents. It's not frequent (~1/wk).
Where is the breakpoint here? What number of lines of code or tokens in a codebase when it becomes not worth it?
30 to 40ish in my experience. Current state of the art seems to lack thinking well about programming tasks with a layer of abstraction or zooming out a little bit in terms of what might be required.
I feel like as a programmer I have a meta-design in my head of how something should work, and the code itself is a snapshot of that, and the models currently struggle with this big picture view, and that becomes apparent as they make changes. Entirely willing to believe that Just Add Moar Parameters could fix that (but also entirely willing to believe that there's some kind of current technical dead-end there)
https://gist.github.com/rachtsingh/e3d2e2b495d631b736d24b56e...
Is it correct? Sort of; I don't trust the duration benchmark because benchmarking is hard, but the size should be approximately right. It gave me a pretty clear answer to the question I had and did it quickly. I could have done it myself but it would have taken me longer to type it out.
I don't use it in large codebases (all agentic tools for me choke quickly), but part of our skillset is taking large problems and breaking them into smaller testable ones, and I give that to the agents. It's not frequent (~1/wk).