
Lexis+ AI and Ask Practical Law AI systems produced incorrect information more than 17% of the time, while Westlaw’s AI-Assisted Research hallucinated more than 34% of the time:

https://hai.stanford.edu/news/ai-trial-legal-models-hallucin...

Just out of curiosity, what's the human lawyer baseline on that?

In my experience, the failure modes are different.

Human lawyers fail by not being very zealous, by being, for the most part, pretty average, by not having enough time to spend on any given filing, and by not having strong research skills. So really, depth of knowledge and talent. They generally won't get things wrong per se; they just won't find a good answer.

AI gets it wrong by making up whole cases that it wishes existed to match the arguments it came up with, or the arguments you're hinting you want, perhaps subconsciously. AI just wants to "please you" and invents something to fit. Its depth of knowledge is unreal, its "talent" is unreal, but it has to be checked over.

It's the same argument with AI-generated code. I had an AI write some amazing functions last night, but it kept hallucinating the name of a method that didn't exist. Luckily, with code that kind of error is easier to spot because it simply won't compile, and in this case I got luckier than usual: the correct function did exist under another name.
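
To illustrate, here's a minimal TypeScript sketch of that failure mode (the contains/includes names are a hypothetical example, not the actual code in question): the hallucinated method looks plausible because it exists in other languages, the compiler flags it before it ever runs, and the intended behavior turns out to exist under another name.

    // An assistant plausibly "remembers" a method from another language:
    // contains() exists on Java's List, but not on JS/TS arrays.
    function hasAdmin(roles: string[]): boolean {
      // What the assistant originally wrote, which tsc rejects immediately:
      //   return roles.contains("admin");
      //   error: Property 'contains' does not exist on type 'string[]'.
      //
      // The intended behavior exists under another name:
      return roles.includes("admin");
    }

    console.log(hasAdmin(["editor", "admin"])); // true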


> Just out of curiosity, what's the human lawyer baseline on that?

Largely depends on how much money the client has.


It's the self-driving car problem. Humans aren't perfect either, but people like to ignore that.

True, they're similar... But what's also similar is that people make the mistake of focusing on differences in failure rates while glossing over failure modes.

Human imperfections are a family of failure modes that we have a gajillion years of collective experience detecting, analyzing, preventing, and repairing. Quirks in ML models... not so much.

A quick thought experiment to illustrate the difference: imagine a self-driving car that is exactly half as likely to cause death or injury as a human driver. That's a good failure rate. The twist is that its major failure mode is totally alien, where units inexplicably attempt to chase-murder random pedestrians. It would be difficult to get people to accept that tradeoff.


No, people have the correct intuition that human errors at human speeds are very different in nature from human-rate errors at machine speeds.

It's one thing if a human makes a wrong financial decision or a wrong driving decision; it's another thing entirely if a model distributed to ten million computers around the world makes that decision five million times in one second, before you can even notice it's happening.

It's why, if your coworker makes a weird noise, you ask what's wrong, but if the industrial furnace you're standing next to makes a weird noise, you take a few steps back.


I'm sure it's nowhere near good enough yet, but a legal model getting the answer right 83% of the time is still quite impressive, imo.



