Nvidia Scientists Take Top Spots in 2021 Brain Tumor Segmentation Challenge

littlestymaar · on Oct 11, 2021

Does it perform better than pigeons[1] though? ;)

[1]: https://twitter.com/emollick/status/1388594078837878788/phot...

tmabraham · on Oct 11, 2021

haha this a project my current Ph.D. advisor worked on (he's the lead author of that study).

He likes to say AI=avian intelligence...

mdp2021 · on Oct 11, 2021

Specialized tasks requiring training for which we use machines could be performed by available brains previously not thought capable. Genius. One of the most genial ideas I have met in a long time.

Actual article (2015): https://journals.plos.org/plosone/article?id=10.1371/journal...

jcims · on Oct 11, 2021

That's got to be a tough pill to swallow for a radiologist. Getting beat by a massive computing capability built over decades through millions of hours of applied human science and engineering is one thing, but a bird brain?

Pretty incredible. I wonder if you forced a radiologist to make a decision within a second or two if it would improve their scores...allowing them to rely on the instinct they've developed vs the cognitive wet blanket that factors in all of the externalities of making a decision.

ngcc_hk · on Oct 11, 2021

Very confused. I heard about the pigeon net in rfc, when in good old day they sometimes could have fun. But cancer spotting.

Really not fake … ???

TomVDB · on Oct 11, 2021

40 years ago, The US Coast Guard used to have a pigeon powered sensor to detect sailors lost at sea.

https://medium.com/war-is-boring/that-time-the-u-s-coast-gua...

pelario · on Oct 11, 2021

C'mon. Everyone knows than pigeons don't scale ;-)

WithinReason · on Oct 11, 2021

What if you feed them a lot?

rich_sasha · on Oct 11, 2021

Mixing males and females works too. Exponential growth

PartiallyTyped · on Oct 11, 2021

Only if predators are eliminated and there is no [food] scarcity.

_coveredInBees · on Oct 11, 2021

Well, that blog post was rather uninspiring. The winning solution basically amounted to a whole bunch of hyperparameter tweaking using tons of GPU compute on a UNet architecture that has existed for many years now. We've definitely reached a plateau of sorts when it comes to vision-based deep learning. There is definitely progress still happening, but a lot of it is very incremental and there hasn't really been any big fundamental overall performance improvements in a while. That being said, there has been a bunch of work on improving efficiency. Which is nice, but it only buys you so much when the only way to "improve" overall performance is to throw in several million more parameters with larger models to eke out another few percent in accuracy or whatever metric you are tracking.

andrewon · on Oct 11, 2021

"Training was done on eight NVIDIA A100 GPUs for 1000 epochs" "trained on four NVIDIA V100 GPUs for 300 epochs" "model was trained on a NVIDIA DGX-1 cluster using eight GPUs"

No doubt these are model innovations. But can't help wondering how much ready access to arrays of GPU might give them advantage... You know how many different models need to be tested before getting to the winning ones.

boulos · on Oct 11, 2021

Disclosure: I used to work on Google Cloud.

I dunno, their A100 results took about 20-30 minutes on 8 x A100s [1]. 8xA100s is like $24/hr on GCP at on-demand rates [2]. So like $12/run.

The efficiency was okay but not linear, so if you were more cost constrained you might go with 1xA100 for $3/hr and have ~2.5hr training times.

Getting that performance out of a GPU is more challenging than getting access to the GPUs. All the major cloud providers offer them.

(Nit: GCP deployed the 40 GiB cards rather than the later 80 GiB parts, but let's ignore that).

but it often doesn't matter

[1] https://github.com/NVIDIA/DeepLearningExamples/tree/master/P...

[2] https://cloud.google.com/compute/gpus-pricing

teruakohatu · on Oct 11, 2021

That is the final model right? Not all the training while hyper params, or structure, was developed.

PeterisP · on Oct 11, 2021

No matter if it takes you a single run or ten thousand runs; qualified manpower to code and supervise it costs more than $12/run, so with prices like that the limiting factor is how much people-time you can devote to it.

kettleballroll · on Oct 11, 2021

I disagree. I've participated and even won a few of those "ML in sciences" competitions in the past, and all other things being equal, the one with the most compute wins. 12$/hour sounds doable, but as soon as you need to grid over more than 4 hyperparameters (and that's small), you're looking at 100s to 1000s of runs in parallel. And you'll iterate over that process several times. Eg It's no coincidence that deepmind (with all of alphabets compute power) won CASP with alphafold. Yes they have a lot of very smart people, but so does everyone else. But no-one has the power to experiment with such large models so extensively for so long.

brilee · on Oct 11, 2021

If you're doing grid search on more than 4 hyperparameters, that probably explains why you think computation wins. Even random search with a constrained budget will do pretty well without breaking the bank.

kettleballroll · on Oct 13, 2021

Pretty well? Sure. Win a competition? No.

If you're not running standard models (where you only have learning rate and regularization to grid over) but you're trying more specialized stuff/developing new models, 4 hyperparameters are not too many.

irjustin · on Oct 11, 2021

A non-trivial amount. But, this is true of basically anything. Teams are resource constrained on hardware, man-power, intelligence, etc.

Intelligence to create innovative models, hardware to test/iterate models, etc.

AlphaFold is a clear example of what's possible with big money.

bigbillheck · on Oct 11, 2021

One of the other contenders agrees: https://twitter.com/HassanMohyudDin/status/14466175365614018... (Saving you a click, they trained on a single 2080Ti)

setr · on Oct 11, 2021

Should GPU choice affect model performance? AFAIK it should only impact training time

amitport · on Oct 11, 2021

When you can train and evaluate quickly you can iterate through experiments faster and produce better models...

aix1 · on Oct 11, 2021

The choice of hardware (here, type of GPU) could also impact the model architecture, potentially affecting model quality. For example, if GPU RAM is limited, one might have to reduce a model's depth or its receptive field just to be able to train on smaller hardware.

rawoke083600 · on Oct 11, 2021

i wonder if the average professional coder, still has. chance to beat the big teams ?

Reminded me of years ago(It might have been on HN) where some company/group trained a state of the art nlp model to classify if a financial statment or press release was possitive or negative. They had some good results until someone did:

Lets look how far down the press-release-statement the 'numbers' are :)

If it was positive-sentiment it usually was the case that the numbers were high up the page. If it was bad thr bulk of the numbers were much lower in the page.

Almost make sense,if you had goof numbers you want to screM it out. If you had bad numbers you want to preface/explain why first ??

OhHiMarkos · on Oct 11, 2021

> i wonder if the average professional coder, still has. chance to beat the big teams ?

I shared the sentiment but I have began to think otherwise. First of all, we need solutions to big problems. I wouldn't want to tackle Brain tumor problems using DL but someone with the brainpower and resources such as NVIDIA, why not?

But besides that, the whole scientific field is pretty young. An average coder or DL researcher isn't useless because Google has all the machines. There are fields where DL isn't used much. Even in the DL field there are problems to solve. Algorithms to optimize.

PartiallyTyped · on Oct 11, 2021

> i wonder if the average professional coder, still has. chance to beat the big teams ?

The common sentiment in r/ML is that a) big labs need some accountability because they keep getting credit for work they published but never released source code and therefore third party audit is impossible *. Furthermore, Resnet-50 is back to sota because of up-to-date training protocols, and every paper that claimed superiority was using outdated training techniques (or even none at all) for the baseline.

Furthermore, there are many suspicions that the current generation of DL/ML has reached a local minimum because we have collectively pushed for models, architectures, functions, and optimizers that did well on existing hardware. For example, one hypothesis is that the Adam and AdamW optimizers are doing so well on average because we optimized model architectures for them.

* Before somebody claims that this is part of the research process and we should be able to validate and implement models on our own, I would like to point out that there have been multiple instances where the math appeared sound, but the implemented models were bugged, and outperformed the existing models *because* they were bugged (NB biased) and the test sets didn't capture that. When corrected, the models didn't perform any better than existing unbiased methods.

joconde · on Oct 11, 2021

> Furthermore, Resnet-50 is back to sota because of up-to-date training protocols, and every paper that claimed superiority was using outdated training techniques (or even none at all) for the baseline.

That's interesting, do you have a source? I've been thinking about switching to some fancy new architecture in prod, but wasn't sure it would be worth it.

PartiallyTyped · on Oct 11, 2021

Current SOTA afaik are:

- Patches is all you need [1]

- ResNet-50 with new protocols [2]

- MLP Mixers [3]

[1] https://old.reddit.com/r/MachineLearning/comments/q35lex/r_p...

[2] https://old.reddit.com/r/MachineLearning/comments/q0vt2b/r_r...

[3] https://old.reddit.com/r/MachineLearning/comments/n59kjo/r_m...

ShamelessC · on Oct 11, 2021

I think it's somewhat confusing to compare an MLP Mixer architecture to a vanilla ResNet-50.

PartiallyTyped · on Oct 11, 2021

What do you mean confusing?

ResNet-50 with updated training protocol seems to be doing pretty well!

bigbillheck · on Oct 11, 2021

> i wonder if the average professional coder, still has. chance to beat the big teams ?

I attended the MICCAI workshops this year, and while I didn't go to the brain segmentation challenge, in the ones I did go to the winners were generally academic individuals or groups.

My MICCAI login is on my work computer, so I'm not going to go and check right now, but a quick google shows that the winners of FeTS2021 were all academics: https://twitter.com/FeTS_Challenge/status/144401094937199411... and HEKTOR2021: https://www.aicrowd.com/challenges/miccai-2021-hecktor/leade... looks similar.

EDIT: Academic groups are of course not the same as 'the average professional coder', but this is a fast-moving research field and if an individual is competing with the state-of-the-art that doesn't sound particularly 'average' to me.

EDIT 2: The DiSCO challenge appears to have been dominated by academics, both individuals and small teams : https://twitter.com/GabrielPGirard/status/144397287871477356...

hopfenspergerj · on Oct 11, 2021

Where do the correct answers come from?

sxg · on Oct 11, 2021

Hey! I actually handled the data coordination for the BraTS data sets. We used a combination of the best algorithms from prior years of the BraTS competition to pre-segment the data sets, and then we had experts (fully-trained neuroradiologists) make manual corrections, which were then further reviewed by my team before finalization.

The three tissue types of interest are fairly easy to identify in most cases. Edema is bright on the FLAIR sequence, enhancing tumor is bright on T1 post-contrast and dark on pre-contrast, and necrosis is relatively dark on T1 pre- and post-contrast while also being surrounded by enhancing tumor. These rules hold true in most cases, so it’s really just a matter of having the algorithm find these simple patterns. The challenge in doing this manually is the amount of time it takes to create a really high quality 3D segmentation. It’s painful and very tough to do with just a mouse and 3 orthogonal planes to work with.

wjnc · on Oct 11, 2021

Oh wow, the joys of the HN-community. Do you know the take of the neuroradiologists on this type of modelling? Are the models in a challenge like this already usable for enhanced decision-making by the experts?

sxg · on Oct 11, 2021

With the segmentations these models create, you can create reports that quantitatively describe the changes in different tumor tissues. That info can be useful for guiding chemotherapy and radiotherapy decisions.

Currently, the accepted practice is to report these changes qualitatively without using segmentations (the way it’s been done for years). While the segmentations created by the models are probably good enough to use in practice today, the logistical challenges of integrating the model with the clinical workflow impede its actual use.

Sure, you could manually export your brain MR to run the model, but that’s a pain to do when you’re reading ~25 brain MR cases/day.

wjnc · on Oct 11, 2021

Thanks Satyam! That's glass half full if I read it correctly. Working models that need to be integrated into a workflow. What kind of firms are we talking about that could do that?

(I know nothing of this tbh, except I once had a demo of a radiologist back when the gamma knife was introduced, have a colleague who became a radiotherapist and a friend who works in ML for Philips medical.)

sxg · on Oct 11, 2021

It’s definitely possible to do, and many companies are able to do it (eg RapidAI). I’m also not an expert in this specific problem, but there are HIPAA/privacy/security concerns that need to be addressed with the radiology department and IT team. Once those have been handled, there is some kind of API available to integrate the model.

shoo · on Oct 11, 2021

> All the imaging datasets have been segmented manually, by one to four raters, following the same annotation protocol, and their annotations were approved by experienced board-certified neuro-radiologists. Annotations comprise the GD-enhancing tumor (ET — label 4), the peritumoral edematous/invaded tissue (ED — label 2), and the necrotic tumor core (NCR — label 1), as described both in the BraTS 2012-2013 TMI paper [1] and in the latest BraTS summarizing paper [2]. The ground truth data were created after their pre-processing, i.e., co-registered to the same anatomical template, interpolated to the same resolution (1 mm^3) and skull-stripped.

- from the "data" section of http://www.braintumorsegmentation.org/

[1] https://ieeexplore.ieee.org/document/6975210

[2] https://arxiv.org/abs/1811.02629

cinntaile · on Oct 11, 2021

Perhaps they should impose rules that constrain the number of GPU resources you can use to achieve your result, that levels the playing field and keeps competitions like these open to everyone instead of just institutions with access to tons of capital. They do that for Formula 1 as well.

xvilka · on Oct 11, 2021

When Nvidia scientists will figure out how to opensource their drivers?

amelius · on Oct 11, 2021

The day their scientists are employed by academia.

joejohns · on Oct 11, 2021

Is there any other information about the challenge except for the Nvidia article?