With AIMI, we released EchoNet-Dynamic, the largest open dataset of echocardiograms (cardiac ultrasounds) and expert cardiologist labels as part of a paper published last year. The dataset went through a rigorous review to make sure no identifying information was leaked as part of the process. Happy to answer any questions.
The data license seems to be research-only. How would people be able to build products/medical device software with this license? Or is that not a goal of releasing this data?
It’s a research dataset, similar to MNIST or CIFAR. Stanford does not want to be in the business of monetizing patient data, so it restricts commercial use.
You just stated a paradox like it makes sense. If you _didn’t_ want to be in the business of monetizing data, while providing data, you _wouldn’t_ restrict commercial use.
How broad is the non-commercial use clause?
I can imagine, e.g., some Big Pharma company buying other datasets and using yours for, who knows, validating the acquired ones, improving metadata, etc. No commercial product in the area of imaging/diagnosis, but maybe some commercial drug 10-15 years down the road.
Do you think that such use is also forbidden by the licence?
Just look at how agitated people here are getting at the prospect of GitHub Copilot using tiny code snippets from their work in potentially commercial works.
Then imagine it’s not your unique way to loop over a file in Python, but your medical information.