My team has measured ~94% accuracy for our LLM feature across extensive, reliable tests. I'm fairly confident in that number, though I'm speaking as an SWE, not a DS or ML engineer.
Yeah, I've had similar results. Even with GPT-o1, I find almost all errors at this point come from the web search functionality and the model treating some random source as an authority. Interestingly, my human intelligence in the process is most useful for hand-collecting the sources and data to analyze -- and, of course, for directing the process across multiple LLM queries.