Using Deep Learning to Help Pathologists Find Tumors

whafro · on June 21, 2018

I work in this field (not directly on the ML, for the company born out of the winner of Camelyon16) and the last two years of progress has been amazing to watch. Tumor detection has become incredibly accurate, across basically every tissue/tumor type, and we're now making real progress on the next major goal: determining the best therapy for a given patient.

It's a bit of a dirty secret in this space that pathologists have a pretty high error rate on a lot of these tasks — it's just tough work for human eyes to do literally hundreds of times every day. Applying computer vision techniques can not only improve accuracy and reproducibility over human assessment, but you can do types of analysis in seconds-to-minutes that would literally take years for a human. We're just scratching the surface.

There are lots of ML challenges here, but just as many general tech/engineering/design challenges. So if you're interested in working on bringing work like this to the masses, we'd love to talk at PathAI.

ska · on June 21, 2018

That's a bit of an optimistic take. For example, I'm familiar with a number of areas with the imaging community is getting much better at academic challenges, but the resulting models all generalize quite poorly. This has a lot to do with the lack of sufficient labeled data, but it doesn't help that the understanding of tumor morphology is changing pretty rapidly.

It's true that screening is a particularly interesting application because of the issues of fatigue and low true positive rates. On the other hand, decades ago (i.e. well before deep learning approaches) we had clinically approved classifiers that did better than average radiologists for some of these tasks and the uptake still hasn't been that impressive. Lot's of non-technical issues around making stuff like this standard of care.

whafro · on June 21, 2018

I'm definitely an optimist about this, and I have an interest in it. So, grains of salt. But things worth noting:

The lack of labeled data is definitely a challenge, as you call out. But a sizable chunk of what we do is power a platform and network of pathologists to get this data within hours for training purposes. We think there will always be a very real need for human pathologists, but that the bread-and-butter work in pathology can be better handled by well-trained and thoroughly-validated algorithms.

And yeah, the non-technical issues are just as important as the technical ones:

* There's very limited use of digital imagery in clinical pathology at all. Fortunately, that's not the case in research pathology, and the success we've had in that field has been moving clinical labs toward an investment in digital pathology.

* Reimbursement (in the US) will be an issue. There are only a few options for billing payers for pathology reads, and they aren't necessarily in lock-step with the potential future of the industry.

* Like I mentioned, this opens up a class of analysis that just isn't feasible for humans to perform. It's up to us to show the value of this type of analysis.

* The regulatory environment is a real thing. We aren't hiding from this, and are creating processes that allow us to build and iterate software like we'd like to, while still faithfully meeting our regulatory burden.

So far, we've found our approach to be viable, and we've had some really strong early results with our customers (and solid revenue!). So I'm pretty optimistic, for sure.

ska · on June 21, 2018

I think we are largely in agreement.

I believe these techniques will have a huge impact on how we do pathology as well as things like screening radiology, and that part of that will be by breaking down the silos such specialties work in, at least to a agree. I also think we have quite a way to go on the technical side but it is achievable (not to do everything people dream of, but to make significant improvements).

I also think it will take much, much longer than most people on the research side believe (hope?) to even approach standard of care. These systems are not built to move fast.

I'm glad to hear you are getting good/interesting results, and hope you are focusing more on validation and breadth of data acquisition than a lot of groups do :)

ihnorton · on June 21, 2018

> the resulting models all generalize quite poorly

In this field, this kind of technology will be augmentative for the near-term at least, so generalization matters less than elsewhere provided the false negative rate is kept low enough. Outliers can be flagged for direct inspection.

ska · on June 21, 2018

It's a mistake to think that generalization doesn't matter much. In this case your false positive rate can go through the roof also, to the degree the system becomes useless. More generally it means you don't understand well how your system will work on real world data, so can end up with lots of unfortunate surprises.

It's true that if you are assuming a human/algorithm team some things are easier, but that doesn't make the problem go away.

poutrathor · on June 21, 2018

To give a counter point of view here, a close relative of mine is a pathologists. We talked a lot about this subject and so far he is unimpressed by the results. Moreover, it seems that the ability to provide fast (in seconds to minutes) analysis implies powerful computational ability and data network. Finally, the data acquisition right now takes more or less 20 to 45 minutes depending on the patient and the targeted area, which means the promised analysis quickness is not such a real pro.

I trust that science, research and ongoing work will end up providing interesting results, but many many startups will burn their cash before being able to provide real world usage services.

Like you said, there are LOTs of challenges here. Certainly not a "low hanging fruit", not that a profitable business either since most countries will squash costs anywhere they can because the health cost keeps growing, and a very difficult legal environment to deal with.

However, I am very thankful for your hard work and will to push toward a better future for health.

ihnorton · on June 21, 2018

> Moreover, it seems that the ability to provide fast (in seconds to minutes) analysis implies powerful computational ability and data network. Finally, the data acquisition right now takes more or less 20 to 45 minutes depending on the patient and the targeted area, which means the promised analysis quickness is not such a real pro.

In surgical pathology, the patient is still under surgery while a pathologist makes a rapid, preliminary diagnosis on fresh tissue. This can be done in under 10 minutes, but typically takes 20-30 in practice -- the bulk of which time is spent flash-freezing and sectioning. Pathologists have only a few minutes to review after prep, so any digital augmentation technology must run on the order of a few minutes to be of use in guiding the surgery. This guidance may be crucial because many tumors cannot be characterized ahead of time due to the lack of non-invasive diagnostic tools (the most promising is gene profiling of bloodborne cells, but that is very challenging due to low circulating concentrations). Tumor type and grade are a large factor in a surgeon’s aggressiveness-vs-risk calculation, and this is especially true in brain tumors where cognitive damage poses a quality-of-life risk which must be weighed against potential survival gains.

(not a pathologist, but I've done research, classifier, and software dev in the field -- very small, hands-on lab, so spent a huge amount of time in frozen section, OR, and with paths reviewing sections)

taneq · on June 21, 2018

I thought the primary importance of machine vision for this kind of cancer screening was that it doesn't give so many false negatives due to fatigue? IIRC there was a study that went back over cancer patients' screening tests and found that most of them had failed to identify visible, diagnosable tumors well before their disease was actually identified.

raducu · on June 21, 2018

A very important use case is that a lot of places don't have easy access to good diagnosticians, and software + hardware are easy to scale while humans, not so much.

maxerickson · on June 21, 2018

The "years for a human" implies a different context than rapid individual results.

ccarter84 · on June 25, 2018

Do you think as we continue to get better about detection, that we'll find an increasing array of slow-growing tumors present in much of the body? I.e. ones that may not need a full round of radiation / chemo for perhaps 5 years and perhaps more targeted approaches can be deployed?

Just something i've been wondering about since I've got cancer on both sides of family and have been pondering doing full-body scans (which still seem quick excessive on the risk/reward)

mkstowegnv · on June 21, 2018

I went to a talk by someone who had switched fields to one that involved analyzing portions of cells. After showing a series of slides with diverse, confusing blobs and lines, he said "when I first started this work I would look at a section and not see anything at all. But I have improved to the point that now I can look at a section and see anything I want to".

toolslive · on June 21, 2018

I did a project like this early 2000s, and it's amazing how far you get by just combining frequency filtering and knn-clustering. Nothing fancy required. really.

vowelless · on June 21, 2018

To your point: https://twitter.com/ShalitUri/status/1009534668880928769

The recent paper out from Google, "Scalable and accurate deep learning with electronic health records", has an notable result in the supplement: regularized logistic regression essentially performs just as well as Deep Nets

sacado2 · on June 21, 2018

Another virtue of non-fancy methods is that they can provide a basic explanation of their reasoning, which is very important in the medical field. I think one of the main drawbacks of deep learning is its opacity.

ogrisel · on June 21, 2018

You can do k-NN in the feature space of the network (activation of one of the last layers) to generate partial "explanations" of the neural network decisions.

Gatsky · on June 21, 2018

At the moment, a big limitation of this approach is the input data. Images of tumours are generally very thin sections of a complex 3D tissue that is processed in a way that introduces artefacts and then stained with 2 colours.

To truly leverage the power of machine learning, an end to end solution where the tissue is processed in a more data rich manner would be better (eg spatially aware single cell assays, non destructive thick slice imaging). This would feasibly replace the current system entirely, as it truly would do something no human could do, not just do it more accurately.

phonebucket · on June 21, 2018

While open sourcing the model is nice, it would be better still to open source the data set for the wider community to make more meaningful contributions.

Their GitHub repo states the following: "You need to apply for data access, and once it's approved, you can download from either Google Drive, or Baidu Pan."

yil8 · on June 21, 2018

We, Baidu Research, do not own the Camelyon16 Challenge dataset, and people need to apply on Camelyon16 Challenge website to download the original pathology slides. I guess my wording was bit confusing on github, which has been corrected, lol

ihnorton · on June 21, 2018

The data from Camelyon '16 and '17 apparat to be available without registration on GigaScience:

http://gigadb.org/dataset/100439

yil8 · on June 27, 2018

Cool! Thanks for pointing out.

phonebucket · on June 29, 2018

Ah, perfect. Thank you for the clarification!

louden · on June 21, 2018

It would be nice to see the sensitivity and specificity of the technique and for humans. False positives and false negatives are not equal in medicine, so we should report in such a way that people can evaluate them.

In this type of cancer, a lower specificity is an acceptable trade off for a very high sensitivity.

yil8 · on June 21, 2018

There was indeed a professional pathologist involved in the Camelyon 16 Challenge, where s/he spent 30 hours reviewing 130 slides, and ended up 72.4% sensitivity with 0 false positives. Our algorithm achieves ~91% sensitive at 8 false positives per slides, seems a win according to your "a lower specificity is an acceptable trade off for a very high sensitivity."

sooheon · on June 21, 2018

How is the "grid of patches" different from one more level of convolution?

yorwba · on June 21, 2018

Using another level of convolution would produce outputs that are statistically independent if they are farther apart than the size of the convolution kernel. In a conditional random field, the dependence of outputs on each other can be modeled as well.

For example, a conditional random field could express "either these patches both contain a tumor or none of them does" (which is helpful when there's something suspicious on the patch boundary) and the consequences of committing to either possibility can propagate over the whole field. In contrast, a convolutional layer would have to make the decision independently for each local area.

curiousgal · on June 21, 2018

Hi yorwba, are there any ressources you recommend for a deep understanding of neural networks? Something theoretical and beyond practical frameworks?

yorwba · on June 21, 2018

My personal approach is to read papers that seem interesting. Of course I usually do not have the necessary background in everything that's mentioned, but I treat those cases as black boxes. E.g. if the paper says that they use X to do Y I'll simply assume that you can do Y using X. If I think that the details of X are important, I dig deeper. Sometimes just by reading the corresponding Wikipedia article, sometimes by looking at the references in the paper. Then repeat recursively.

That approach has the advantage that you'll learn about techniques roughly proportional to their current popularity, but it has the disadvantage that explanations in papers tend to be brief and you have to put them into a coherent whole yourself.

If you prefer textbooks, I heard about http://www.deeplearningbook.org/ but didn't get around to reading it. In addition to neural networks, you'll probably also want to read about classical statistics and probability theory, since that's the origin of concepts like conditional random fields, which can be mixed with neural networks but are unlikely to be covered by literature on deep learning.

yil8 · on June 21, 2018

You could use more levels of convolution with larger receptive field. But this corresponds to larger patches, e.g. 512x512 pixels, and larger patches sometimes may not just be pure tumor cells or pure normal cells. And if you are just predicting 1 label for larger patches, it sometimes confuses the learning. What we propose with CRF, is larger receptive field with dense predictions, e.g. predicting more than one labels, and we use CRF to model the correlation between labels.

bitL · on June 21, 2018

Doesn't a grid of patches contain much more information than just one more level of convolution?

gwenzek · on June 21, 2018

Maybe computation cost? Not sure.

godzillabrennus · on June 21, 2018

Glad to see they open sourced their work.

leozou · on June 21, 2018

great work