>So, we're supposed to rely on LLM not hallucinating that it is allowed to do what it wants?
Yes. Frontier models have been moving at light speed over the last year. Hallucinations are almost completely solved, particularly with Anthropic models.
It won't be long before statements like this sound the same as "so you mean I have to trust that my client will always have a stable internet connection to reach out to this remote server for data?".
This is missing the human-language ambiguity problem. If you don't perfectly specify your requirements and the model misinterprets what you're asking for, that's going to be a problem regardless of how smart it is. That's fine for code editing, since you've got version control, but not so great when it's running commands in your terminal that can't be as trivially reverted.
As a regular Claude user, incorrectness is nowhere near solved; it may be domain-dependent, but just yesterday Claude 3.5 invented a Mac CLI tool that does not exist (and would've been pretty useful if it had). I can't take anything factual at face value, which is actually OK as long as, net/net, I'm still more productive.