Hacker News

Right, but "the training data is bad" is a very ML-centric way of looking at the issue. It pushes all the difficult parts of the problem into the "data prep" sphere of responsibility.



Note that there are different ways in which data can be bad: (i) the image resolution is not good enough, or there are too many artifacts and too much noise; (ii) it's woefully incomplete. Doctors collect and use information from other channels that aren't in the image at all: regular conversations, sizing up the patient, and, if the doctor has known the patient for a long time, a sense of what is not normal for that patient given his/her history, and so on.

Some of the issues that have been discussed in the thread can be incorporated into a Bayesian prior for the patient, but there is still this incompleteness issue to deal with.
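To make the "Bayesian prior for the patient" idea concrete, here is a minimal sketch of how the same positive image finding could be read differently depending on what is already known about the patient. The function name and every number below are made up for illustration; nothing here comes from a real model or dataset.

```python
def posterior_given_positive(prior, sensitivity, specificity):
    """P(disease | positive finding) via Bayes' rule.

    prior:       P(disease) for this patient before looking at the image
    sensitivity: P(positive | disease) for the image model
    specificity: P(negative | no disease) for the image model
    """
    true_pos = sensitivity * prior
    false_pos = (1.0 - specificity) * (1.0 - prior)
    return true_pos / (true_pos + false_pos)


# Hypothetical numbers: same model, same image finding, two different priors.
population_prior = 0.01   # assumed base rate with no patient context
high_risk_prior = 0.20    # assumed prior for a patient with relevant history

p_population = posterior_given_positive(population_prior, 0.9, 0.9)
p_high_risk = posterior_given_positive(high_risk_prior, 0.9, 0.9)

print(f"population prior: {p_population:.3f}")
print(f"high-risk prior:  {p_high_risk:.3f}")
```

The prior only helps with context you have already managed to encode as a number. It does nothing for information that never entered the pipeline in the first place, which is the incompleteness problem.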


The first step would be to build an information collection pipeline that is in the same league as what doctors have. That alone will be a monumental effort, because doctors have shared human experiences to draw from and they are allowed to collect information iteratively.

I'm just complaining that it seems fantastically reductive to call the absence of such a pipeline "bad data", because developing such a pipeline would be a thousand times the effort of implementing an image detection model. Maybe a million times. It will require either NLP like none we have seen before, or an expert system with so much buy-in from the experts and investors that it survives the thousand rounds of iterative improvement it needs to address 99% of the requirements.

Comparing issues like low resolution and noise to such a development effort seems like comparing apples to... jet fighters.


How else would you describe the issue?


Structural. The problem hasn't even been correctly formulated yet -- and it will take an enormous amount of work to do so.



