I don't want to single you out, but what should I be taking away from the often-recited "well, people sometimes can't do thing X either" counterargument? Is all of this just fine? Can't we expect just a little bit more, I don't know, accuracy or rigor from a computer than from a living person?
We've already passed the point where LLMs are better than human experts at medical diagnosis. In fact, according to this study, LLMs alone are even more accurate than human experts + LLMs, meaning any input the humans added was only a detriment to the accuracy.
Computers are already perfectly accurate, and have been for decades, in explicitly quantifiable fields. In medicine, since a computer cannot perfectly replicate every single cell in the human body, its abstractions will be lower resolution than reality, but what matters is whether that low-resolution abstraction is better than the alternative (human doctors).
A human doctor couldn't instantly bring up a list of citations in the literature regarding a diagnosis. An LLM can.
Even if that paper hadn't said

> "We are therefore very cautious to extrapolate our findings toward any implications about the LLM’s utility as a standalone diagnostic tool"

your post would be an extraordinary claim and would need extraordinary evidence, not a specific study of a specific scenario.
Lots of data is pointing to the same conclusion: GPT-4 is at least as good as, if not better than, human experts at medical diagnosis, at least in the areas studied. Thus the probability of a correct diagnosis is higher, and the outcome therefore safer, with GPT-4 than with any individual human expert.
This is so silly: the one study you linked to says that GPT-4 may have been trained on the answers to the test they gave it. So smart.
And since GPT-4 can't examine a patient's body, the claim that it's better at diagnosis than a human doctor seems like such a wacky thing to search the internet for "studies" to prove in the first place.
A nurse can examine a patient's body. Medical tools can too, and report their diagnostics with high precision. GPT-4 is multi-modal.
I feel you are nitpicking because you don't like the idea of an LLM being better than a human expert. Even if they aren't better than doctors today, the chance they won't be in 1-2 years is tiny.
What I'm doing isn't nitpicking. I don't know what the point of linking to studies is when you draw conclusions that have nothing to do with the study.
I just watched a video saying people are confused about what these models can do because:

1) Tech companies don't tend to say what they can do and leave users to figure it out, and

2) tech enthusiasts tend to exaggerate what they can do.
In your case I'm sure ChatGPT itself will tell you your comments are wrong, but for tech enthusiasts like yourself the AI is only wrong when it tells you it isn't all-knowing, apparently.
It's like the bit in Monty Python's Life of Brian where the protagonist says he's not the messiah and a woman shouts "Only the true messiah would deny his divinity!"
> A human doctor couldn't instantly bring up a list of citations in the literature regarding a diagnosis. An LLM can.
TFA is, literally, about LLMs spouting out erroneous medical references. I have no use for made-up medical references or court cases.
I'm sure there are ways to instantly bring up a list of publications regarding a diagnosis (which an LLM may, now or in the future, correctly give: the diagnosis, I mean), but I'm really not sure an LLM is what's needed to then generate the list of related publications. I mean, FFS, they are compressed, lossy knowledge.
LLMs are going to become tools as part of a toolchain. They're not a panacea.
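To make that concrete, here is roughly the shape such a toolchain could take: the model is only asked for a candidate diagnosis and a search query, while the citation list comes from a real literature index (PubMed's E-utilities), so no reference is ever generated by the model. This is a minimal sketch; the `llm` callable and the prompt are assumptions for illustration, and only the esearch/esummary endpoints are real.

```python
# Sketch of an LLM-in-a-toolchain setup: the model proposes a diagnosis and a
# search query; the citation list comes from PubMed's E-utilities, never from
# the model itself. `llm` is an assumed callable (prompt in, text out).
import requests

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def pubmed_citations(query: str, max_results: int = 5) -> list[dict]:
    """Return real PubMed records (PMID + title) matching a query string."""
    ids = requests.get(f"{EUTILS}/esearch.fcgi", params={
        "db": "pubmed", "term": query, "retmax": max_results, "retmode": "json",
    }).json()["esearchresult"]["idlist"]
    if not ids:
        return []
    summaries = requests.get(f"{EUTILS}/esummary.fcgi", params={
        "db": "pubmed", "id": ",".join(ids), "retmode": "json",
    }).json()["result"]
    return [{"pmid": pmid, "title": summaries[pmid]["title"]} for pmid in ids]

def cited_diagnosis(llm, case_description: str) -> dict:
    """Model suggests a diagnosis + query; the database supplies the citations."""
    answer = llm(
        "Give the single most likely diagnosis on the first line and a PubMed "
        "search query for it on the second line.\n\nCase: " + case_description
    )
    lines = [l.strip() for l in answer.splitlines() if l.strip()]
    diagnosis = lines[0]
    query = lines[1] if len(lines) > 1 else diagnosis
    return {"diagnosis": diagnosis, "citations": pubmed_citations(query)}
```

The point of the split is that the lossy part (the model) never invents bibliographic data; it can only steer a lookup against a database that actually exists.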
They could, but they don't, at least that is what I'm getting from the article.
Even if they do, someone with the capability and understanding required (ie. not me) needs to bring that source up and verify that the claims align with the citation; the authors decided to use GPT-4 for this: "We adapted GPT-4 to verify whether sources substantiate statements and found the approach to be surprisingly reliable." I'm not happy with that either.
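For what it's worth, the check the authors describe presumably has roughly this shape. The prompt and the `llm` interface below are my assumptions, not their actual setup, and the obvious caveat stands: it grades an LLM's citations with another LLM call.

```python
# Sketch of the verification step described in the quote: ask a model whether a
# cited source actually supports a claim. The prompt and the `llm` callable are
# illustrative assumptions, not the authors' actual setup.
def source_substantiates(llm, claim: str, source_text: str) -> bool:
    verdict = llm(
        "Does the source below substantiate the claim? Answer YES or NO.\n\n"
        f"Claim: {claim}\n\nSource:\n{source_text}"
    )
    return verdict.strip().upper().startswith("YES")
```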