I was lucky to date a girl who was into math, and who was coding those "machine learning" algorithms for a radiology startup here in Shenzhen.
She had a lot of scepticism about what she did. One of the biggest showstoppers, she said, was the unpredictability of errors.
An algo can catch 99% of tumors, including tiny ones, but can randomly pass over very obvious ones that a human radiologist would spot with his eyes closed.
They had a demo day with radiologists throwing tricky edge-case x-rays at the computer. The edge cases were all fine, but then one radiologist pulled his own x-ray from his bag, showing a 100% obvious, terminal-stage tumor, and to the company's embarrassment the algo failed to detect it no matter how they twisted and scaled the x-ray. The guy just walked out.
Had a similar problem ages ago, and ended up adding a "blindingly obvious tumor" detector pass before the regular pipeline, just to avoid this cognitive dissonance.
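A minimal sketch of that pre-pass idea, with made-up names and toy rules standing in for real detectors (the actual ones would be image models, not dict lookups): a cheap, high-sensitivity check for blindingly obvious findings runs alongside the regular model, and the final output is the union of both.

```python
def detect_obvious(scan):
    # Deliberately crude rule: flag any region far above background
    # intensity. Cheap, interpretable, and hard to fool.
    return [r for r in scan["regions"] if r["intensity"] > 0.9]

def detect_subtle(scan):
    # Stand-in for the learned model, which may catch tiny tumors
    # but can randomly miss obvious ones.
    return [r for r in scan["regions"] if r["model_score"] > 0.5]

def detect(scan):
    # Union of both passes: the obvious-case detector acts as a
    # safety net for the ML model's blind spots.
    found = {r["name"]: r for r in detect_subtle(scan)}
    for r in detect_obvious(scan):
        found.setdefault(r["name"], r)
    return sorted(found)  # region names, sorted for a stable result
```

The point is that the safety-net pass never depends on the main model's training distribution, so it can't share its blind spots.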
This is one of the (many) reasons that practical classification systems, as opposed to research systems, tend to turn into Frankenstein's monsters over time. It's naive to think that a single approach and pipeline will cover your domain well.
It seems to me the use case should be to have the radiologist look at a scan for tumors, then have the algo look. Wherever they disagree, the radiologist reviews the discrepancy.
It'll be the best of both.
And for the scans where the algo is wrong, add the scan to the algo's training database.
Unfortunately, a lot of the time hospitals can only afford one or the other. These systems are very expensive, and radiologists and cytologists aren't exactly cheap either. But I agree, both would be good, especially considering the volume a cytologist is expected to screen in a single day.
You point out another business opportunity: a developer who understands exactly what regulatory hurdles you need to jump in order to release medical software. I'm not sure exactly what's required in this case, but I'm doubtful there are many cloud providers who are HIPAA compliant.
I'm not sure it would need to be regulated, any more than a medical textbook needs to be regulated. The radiologist would still be making all the decisions. As for privacy, only the x-ray is sent, with no personal information whatsoever.
If the radiologist has to look at and double-check every scan the algo looked at, then what is the point of the algo? Seems like a useless middleman that gets in the way.
Screening is hard, tedious work, so even trained professionals regularly miss things. The true-positive incidence rate is under 1% in the screening population.
There have been studies showing significant improvement from double-reading mammo, for example (i.e. two radiologists, independently). Using an ML approach is trying to give you some or all of this benefit without the cost of redundant reads.
Better to implement a system with a high rate of false positives (and, more importantly, a low rate of false negatives) from the machine learning component, with all positive findings passed on to the radiologist. If the system can reliably (big if) filter out 98% of the chaff, then the radiologist can spend a lot more time separating the false positives from the true positives. This approach has worked well for me so far.
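One way to sketch that trade-off (made-up validation scores and a hypothetical helper, not a production recipe): choose the classifier's decision threshold on a labeled validation set so that sensitivity meets a target, accept whatever false-positive rate falls out, and send everything above the threshold to the radiologist.

```python
import math

def threshold_for_recall(scores, labels, target_recall):
    """Highest threshold t such that the rule `score >= t` still catches
    at least target_recall of the positives in this validation set."""
    pos = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    k = math.ceil(target_recall * len(pos))  # positives we must catch
    return pos[k - 1]

def flag_for_review(scores, t):
    # Indices of the scans the radiologist still has to read.
    return [i for i, s in enumerate(scores) if s >= t]
```

Driving the false-negative rate toward zero this way usually pushes the threshold low, which is exactly the "lots of chaff for the human" regime described above.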
This approach is problematic in medical screening applications, mainly because you don't want to increase the work-up rate for false positives: when work-ups involve biopsies across a large screening population, you will eventually (indirectly) kill people this way, so there is pressure to control the FP rate.
I guess this falls into a category where the ML algo learns a particular type very well but cannot recognise obvious cases if they look different from the training data. Human intelligence is a mix of pattern matching and attention focusing that is hard to replicate with a single pattern-matching ML model. Aren't there projects that try to combine multiple pattern-matching models to decrease the number of false negatives?
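One simple version of that combining idea, sketched with toy stand-in models (the names, scores, and thresholds here are invented for illustration): OR-combine several independently trained detectors, so a scan is flagged if any one model flags it. Sensitivity goes up, at the cost of more false positives for the human to triage.

```python
def ensemble_flag(scan, models, thresholds):
    # OR-combination over per-model verdicts: a single positive vote
    # is enough to send the scan for human review.
    return any(m(scan) >= t for m, t in zip(models, thresholds))

# Toy stand-ins: each "model" keys on a different image statistic,
# so their failure modes are (hopefully) not identical.
models = [
    lambda scan: scan["mass_score"],
    lambda scan: scan["texture_score"],
]
thresholds = [0.5, 0.5]
```

The benefit depends on the models' errors being decorrelated; an ensemble of models trained on the same data with the same architecture tends to share blind spots.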