
99.99% seems off by orders of magnitude to me. I don't have an exact number, but I routinely see GPT-3.5 hallucinate, which is inconsistent with that level of confidence.

I've noticed this discussion tends to get too theoretical too quickly. I'm not interested in perfection; 99.99% would be good enough, while 70% wouldn't. The actual number is something specific, knowable, and hopefully improving.
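For concreteness, here's roughly what I mean by "knowable": hand-label a sample of outputs and put an interval around the observed rate. A minimal sketch; every count in it is invented purely for illustration.

    fn main() {
        // Assumed numbers, purely for illustration.
        let samples = 2000.0_f64;     // outputs checked by hand
        let hallucinated = 14.0_f64;  // outputs with a fabricated fact
        let p = hallucinated / samples;

        // Wald 95% interval; rough, but fine when counts aren't tiny.
        let se = (p * (1.0 - p) / samples).sqrt();
        let (lo, hi) = (p - 1.96 * se, p + 1.96 * se);

        println!(
            "accuracy ~{:.2}%, 95% CI [{:.2}%, {:.2}%]",
            (1.0 - p) * 100.0,
            (1.0 - hi) * 100.0,
            (1.0 - lo) * 100.0
        );
        // These made-up counts give ~99.3%: far above 70%, well short
        // of 99.99%. Sampling like this is how you tell those apart.
    }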



I think it's way better than 70%: probably 95%+ even with bad data and poor prompts. I'd have to run more numbers to be sure.

You can get to 99.9%+ with good data and well-designed prompts. I'm sure it would stay above 90% even with almost intentionally bad prompts, tbh.


It's definitely not that good if we share a definition of poor data/prompts.

This afternoon I tried to use Codium to autocomplete some capnproto Rust code. Everything it generated was totally wrong. For example, it used member functions on non-existent structs rather than the correct free functions.

But I'll give it some credit: that's an obscure library in a less popular language.
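Roughly the shape of the mistake, sketched from memory rather than the exact completion: the capnp crate does serialization through free functions like capnp::serialize::write_message, but the model kept inventing methods instead. The point_capnp module, write_point helper, and field setters below stand in for what capnpc would generate from a hypothetical schema.

    use capnp::message::Builder;
    use capnp::serialize;

    // Hypothetical helper; point_capnp is capnpc-generated code
    // from an assumed point.capnp schema.
    fn write_point(buf: &mut Vec<u8>) -> capnp::Result<()> {
        let mut message = Builder::new_default();
        {
            let mut point = message.init_root::<point_capnp::point::Builder>();
            point.set_x(1.0);
            point.set_y(2.0);
        }
        // Correct: serialization is a free function in capnp::serialize.
        serialize::write_message(buf, &message)?;

        // The hallucinated style (no such method exists):
        // message.write_message(buf)?;
        Ok(())
    }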


> This afternoon I tried to use Codium to autocomplete some capnproto Rust code.

That isn't the use case I was describing at all. I was talking about summarizing data, not code autocompletion.


I don’t have hard numbers, but anecdotally hallucination has gone down significantly with GPT-4. It certainly still happens, though.



