> a single axis on which an ML system underperforms a frigging doctor You do rea...

tinco · on Aug 22, 2023

Are we reading the same paper? In the graph I'm looking at, the axis where the model underperforms the doctors is labeled "No inaccurate/irrelevant information", which has nothing to do with making the correct diagnosis.

On the three important axes, "Answer supported by consensus", "Possible harm extent = No harm" and "Low likelihood of harm", it performs very similarly to the doctors, probably about where a single middle-of-the-pack doctor's graph would land.

Are you reading a different graph or am I misunderstanding something about it?

cristiancavalli · on Aug 22, 2023

I think the axis OP is looking at is “more inaccurate information”, on which Med-PaLM does perform more poorly.