
Then you'll get code that passes the tests you generate, where "tests" includes whatever you feed the fuzzer to detect problems. (Just crashes? Timeouts? Comparison with a gold standard?)
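For concreteness, a minimal sketch of that loop in Python; target_sort stands in for the generated code, reference_sort for the gold standard, and the timeout trick is Unix-only:

    import random
    import signal

    def reference_sort(xs):
        # The "gold standard": slow but trusted.
        return sorted(xs)

    def target_sort(xs):
        # Stand-in for the code under test (e.g. the generated code).
        return sorted(xs)

    def _timeout(signum, frame):
        raise TimeoutError("target hung")

    signal.signal(signal.SIGALRM, _timeout)  # Unix-only

    def fuzz_once():
        xs = [random.randint(-1000, 1000)
              for _ in range(random.randint(0, 100))]
        signal.alarm(1)                  # detect timeouts
        try:
            got = target_sort(list(xs))  # crashes surface as exceptions
        finally:
            signal.alarm(0)
        # Detect wrong answers by comparison with the gold standard.
        assert got == reference_sort(xs), (xs, got)

    if __name__ == "__main__":
        for _ in range(10_000):
            fuzz_once()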

Sorry, I'm failing to see your point.

Are you implying that the above is good enough, for a useful definition of good enough? I'm not disagreeing, and in fact that was my starting assumption in the message you're replying to.

Crap code can pass tests. Slow code can pass tests. Weird code can pass tests. Sometimes it's fine for code to be crap, slow, and/or weird. If that's your situation, then go ahead and use the code.
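A contrived illustration of "slow but passing": the dedupe below is quadratic, yet every test it ships with is green:

    def dedupe(xs):
        out = []
        for x in xs:
            if x not in out:   # linear scan per element: quadratic overall
                out.append(x)
        return out

    # The entire test suite it passes:
    assert dedupe([]) == []
    assert dedupe([1, 1, 2]) == [1, 2]
    assert dedupe(list(range(5))) == list(range(5))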

To expand on why someone might not want such code, think of your overall codebase as having a time budget, a complexity budget, a debuggability budget, an incoherence budget, and a maintenance budget. Yes, those overlap a bunch. A pile of AI-written code has a higher chance of exceeding some of those budgets than a human-written codebase would. Yes, there will be counterexamples. But humans will at least attempt to optimize for such things. AIs mostly won't. The AI-and-AI-using-human system will optimize for making it through your lint-fuzz-test cycle successfully and little else.

Different constraints, different outputs. Only you can decide whether the difference matters to you.



> Then you'll get code that passes the tests you generate

Just recently, I think here on HN, there was a discussion about how neural networks optimize toward whatever goal they are given. In this case that means exactly what you wrote: the code will do things in wrong ways just to pass the given tests.
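The degenerate version of that looks something like this contrived sketch (names made up): the implementation memorizes the test inputs instead of implementing the function:

    def is_prime(n):
        # "Passes" by memorizing exactly the cases the tests probe.
        return n in {2, 3, 5, 7, 11, 13}

    # The whole test suite:
    assert is_prime(7)
    assert is_prime(13)
    assert not is_prime(8)
    assert not is_prime(9)
    # is_prime(17) -> False: wrong, but no test ever asks.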

Where do the tests come from? Initially from a specification of what "that thing" is supposed to do, and also not supposed to do. Everyone who has had to deal with specifications in a serious way knows how insanely difficult they are to get right: there are things left unsaid, corner cases not covered, and so on. So the problem of correctness is just shifted, and as for the assumption that this may require less time than actually coding ... I wouldn't bet on it.
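A classic toy example of "things unsaid": specify sorting only as "the output must be non-decreasing", and this cheat satisfies the spec perfectly:

    def sorted_by_spec(xs):
        # Satisfies the stated property perfectly...
        return []          # ...by discarding the input.

    def check_spec(xs):
        out = sorted_by_spec(xs)
        # The spec as written: output must be non-decreasing.
        assert all(a <= b for a, b in zip(out, out[1:]))
        # Unsaid: output must be a permutation of the input.
        # assert out == sorted(xs)   # only this catches the cheat

    check_spec([3, 1, 2])  # passes; the spec was incomplete, not the code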

Conceptually the idea should work, though.


What if you thought of your codebase as something similar to human DNA, the LLM as nature, and the entire process as a sort of evolution? The fitness function would be no panics, no exceptions, and low latency, instead of some random KPI or OKR, or who likes working with whom, or who made whom laugh. A toy sketch of what I mean is below.
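Here llm_mutate is hypothetical and just returns its input; a real version would ask the model for a code variant:

    import time

    def fitness(candidate, inputs):
        # Fitness = no exceptions/panics first, then low latency.
        start = time.perf_counter()
        try:
            for x in inputs:
                candidate(x)
        except Exception:
            return float("-inf")              # any crash is maximally unfit
        return -(time.perf_counter() - start)  # faster = fitter

    def llm_mutate(candidate):
        # Hypothetical: ask the LLM ("nature") for a variant of this code.
        # Stand-in here: return the candidate unchanged.
        return candidate

    def evolve(seed, inputs, generations=100, brood=8):
        best = seed
        for _ in range(generations):
            pool = [llm_mutate(best) for _ in range(brood)] + [best]
            best = max(pool, key=lambda c: fitness(c, inputs))
        return best

    best = evolve(lambda x: x * x, inputs=list(range(1000)))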

It's what our lord and savior Jesus Christ uses for us humans; if it's good enough for him, it's good enough for me. And surely Google is not laying off 25k people because it believes humans are better than its LLMs :)



