>So, we're supposed to rely on LLM not hallucinating that it is allowed to do what it wants?
Yes. Frontier models have been moving at light speed over the last year. Hallucinations are almost completely solved, particularly with Anthropic models.
It won't be long before statements like this sound the same as "so you mean I have to trust that my client will always have a stable internet connection to reach out to this remote server for data?".
This is missing the human-language ambiguity problem. If you don't perfectly specify your requirements and the model misinterprets what you're asking for, that's going to be a problem regardless of how smart it is. That's fine for code editing, since you've got version control, but not so great when it's running commands in your terminal that can't be as trivially reverted.
As a regular Claude user, incorrectness is nowhere near solved; it may be domain-dependent, but just yesterday Claude 3.5 invented a Mac CLI tool that does not exist (and would've been pretty useful if it had). I can't take anything factual at face value, which is actually OK as long as, net/net, I'm still more productive.