I've used Claude with a large, mature codebase and it did fine. Not for every possible task, but for many.
Probably, Mercury isn't as good at coding as Claude is. But even if it's not, there are lots of small tasks that LLMs can do without needing senior-engineer-level skills. Adding test coverage, fixing low-priority bugs, adding nice animations to the UI, etc. Stuff that isn't critical, so if a PR turns up and it's DOA you just close it, but which otherwise works.
Note that many projects already use this approach with bots like Renovate. Such bots also consume a ton of CI time, but it's generally worth it.
IMHO LLMs are notoriously bad at test coverage. They usually hard-code a value to make the test pass, since they lack the reasoning required to understand why the test exists, or the concept of an assertion, really.
I don’t know, Claude is very good at writing that utterly useless kind of unit test where every dependency is mocked out and the test is just the inverted dual of the original code. 100% coverage, nothing tested.
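A minimal sketch of that anti-pattern, with hypothetical names (`OrderService`, `repo`, `mailer` are illustrative, not from any real codebase): every collaborator is mocked, so the assertions just restate the implementation line by line. The test passes, the coverage tool reports the method as covered, and no real behavior is checked.

```python
import unittest
from unittest.mock import MagicMock

class OrderService:
    """Illustrative class under test."""
    def __init__(self, repo, mailer):
        self.repo = repo
        self.mailer = mailer

    def place(self, order):
        saved = self.repo.save(order)
        self.mailer.send(saved.customer_email, "Order received")
        return saved

class TestOrderService(unittest.TestCase):
    def test_place(self):
        # Every dependency is a mock...
        repo = MagicMock()
        mailer = MagicMock()
        service = OrderService(repo, mailer)

        result = service.place(MagicMock())

        # ...so the "assertions" merely mirror the code under test:
        repo.save.assert_called_once()
        mailer.send.assert_called_once()
        self.assertIs(result, repo.save.return_value)
```

If `place` saved the wrong data, emailed the wrong address, or returned garbage, this test would still go green, which is exactly the "inverted dual" problem.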
Yeah and that's even worse because there's not an easy metric you can have the agent work towards and get feedback on.
I'm not that into "prompt engineering" but tests seem like a big opportunity for improvement. Maybe something like (but much more thorough):
1. "Create a document describing all real-world actions which could lead to the code being used. List all methods/code which get called before it (in order) along with their exact parameters and return values. Enumerate all potential edge cases and errors that could occur, and whether they end up influencing this task. After that, write a high-level overview of what needs to occur in this implementation. Don't make it top-down where you think about which functions/classes/abstractions are created, just the raw steps that will need to occur"
2. Have it write the tests
3. Have it write the code
Maybe TDD ends up worse, but I suspect the initial plan, which is already somewhat close to code, prevents that.
Writing the initial doc yourself would definitely be better, but I suspect just writing one really good one, then giving it as an example in each subsequent prompt captures a lot of the improvement
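The three-step loop above could be wired up as a simple pipeline. This is only a sketch: `complete` is a stub standing in for whatever LLM API you use (the name and prompts are assumptions, not a real interface), and each step feeds its output into the next prompt.

```python
def complete(prompt: str) -> str:
    """Stub for a real LLM call; swap in your provider's client here."""
    return f"<model response to a {len(prompt)}-char prompt>"

def build_spec_prompt(task: str) -> str:
    # Step 1: the behavior document -- real-world actions, call order,
    # parameters/return values, edge cases, then raw implementation steps.
    return (
        "Create a document describing all real-world actions which could "
        "lead to the code being used. List all methods/code which get "
        "called before it, enumerate edge cases, then write the raw steps "
        "the implementation must perform.\n\nTask: " + task
    )

def tdd_pipeline(task: str) -> tuple[str, str, str]:
    spec = complete(build_spec_prompt(task))                      # 1. spec
    tests = complete("Write tests for this spec:\n" + spec)       # 2. tests
    code = complete(                                              # 3. code
        "Write code that passes these tests:\n" + tests
        + "\n\nSpec:\n" + spec
    )
    return spec, tests, code
```

Keeping the one really good hand-written spec around and pasting it into `build_spec_prompt` as a worked example is where most of the suspected improvement would come from.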
Don't want to put words in the parent commenter's mouth, but I think the key word is "unsupervised". Claude doesn't know what it doesn't know, and will keep going round the loop until the tests go green, or until the heat death of the universe.