How a mathematician constructed a decision tree to solve a medical problem (fastml.com)
113 points by tomaskazemekas on June 28, 2014 | 54 comments



"Even if I don’t know much about the medical aspects of the problem, I could try to learn his methodology by following his decision-making process, and then I could use this knowledge to come up with a set of rules."

As the song says, "everything old is new again":

https://en.wikipedia.org/wiki/Knowledge_engineering

https://en.wikipedia.org/wiki/Rule_based_system

https://en.wikipedia.org/wiki/Expert_system


I wrote one of these (a decision tree engine) for our support team, to stop them asking me questions every two minutes. Works wonders.

I'll eventually automate myself out of the office and then go sit on a beach somewhere.


This reminds me of those 20 Questions apps. Have you tried a good one lately? "Akinator" on iOS has an incredible ability to predict the person or character you're thinking of. It successfully guessed "one of the twins from The Shining" and "Calvin's dad from Calvin & Hobbes" when I played.

I can't imagine there are more possible diagnoses than there are fictional characters and famous people.


Getting the right answer in 20 questions actually relies on a rough implementation of binary search over a huge possible "thing space". A problem I can see with applying this to medicine is that medical conditions don't always present in the same way and measurable factors are not binary, so you often can't rule out a hypothesis with a certain result, even if it reduces the probability of that hypothesis.
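A toy sketch of that binary-search idea (candidates and attributes invented; Akinator's real algorithm is probabilistic, not hard elimination like this):

    # Greedily ask the yes/no question that splits the remaining
    # candidates most evenly, so each answer roughly halves the space.
    candidates = {
        "Sauron":       {"fictional": True,  "human": False, "villain": True},
        "Ned Stark":    {"fictional": True,  "human": True,  "villain": False},
        "Grace Hopper": {"fictional": False, "human": True,  "villain": False},
    }

    def best_question(remaining):
        attrs = next(iter(remaining.values()))
        return min(attrs, key=lambda a: abs(
            sum(v[a] for v in remaining.values()) - len(remaining) / 2))

    def play(answer, remaining):
        while len(remaining) > 1:
            q = best_question(remaining)
            remaining = {k: v for k, v in remaining.items()
                         if v[q] == answer(q)}
        return next(iter(remaining), None)

    # The player is thinking of Sauron:
    profile = {"fictional": True, "human": False, "villain": True}
    print(play(lambda q: profile[q], candidates))  # -> Sauron

With noisy, non-binary answers you'd keep a probability per candidate and downweight instead of eliminate, which is exactly why hard elimination fails for medical conditions.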

The decision algorithm in the article is amazing in its accuracy and simplicity. There is probably some other low-hanging fruit: reproducing this method in other areas of medicine by working with someone who already has a high degree of accuracy in their diagnoses but doesn't fully understand their own process. Outside of areas where someone already has the ability to make such accurate diagnoses, though, it would be a lot more complicated.

You'd need to determine which factors are important, which of those are covariant, and how that relationship works. Then, from there, work over tons of data to establish a Bayesian model with weightings for each factor, so that you can observe the evidence, plug it into the model, and get out a probability distribution over possible diagnoses. It's no easy task, but I believe this kind of approach is the future of medicine.
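As a toy example of the last step (all conditions, factors, and numbers invented), a naive Bayes version that ignores the covariance problem entirely:

    # Invented priors and per-condition likelihoods; naive independence
    # between factors, i.e. the covariance issue above is ignored.
    priors = {"condition_a": 0.7, "condition_b": 0.3}
    likelihoods = {                      # P(factor present | condition)
        "condition_a": {"fever": 0.8, "rash": 0.1},
        "condition_b": {"fever": 0.3, "rash": 0.7},
    }

    def posterior(observed):             # observed: factor -> bool
        scores = {}
        for cond, prior in priors.items():
            p = prior
            for factor, present in observed.items():
                lk = likelihoods[cond][factor]
                p *= lk if present else 1 - lk
            scores[cond] = p
        total = sum(scores.values())
        return {c: s / total for c, s in scores.items()}

    print(posterior({"fever": True, "rash": False}))
    # {'condition_a': ~0.95, 'condition_b': ~0.05}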


Akinator is also available online here: http://en.akinator.com


Pretty cool. I beat it with 'Jane Whitefield' and was only the second person to play 'Jim Chee' (24 questions). I think it's a learning program since it asked me to choose the answer when it lost.


I tried to make it guess Sauron, but to no avail.


I'm not really a LoTR fan, but it got it on the first try when I played.

The game report is interesting: http://imgur.com/5YRk8jv


I said 'yes' to servant, since Sauron is a servant of Morgoth (or so). I guess the program had another idea.


It guessed "Maria von Trapp" for Admiral Hopper.


It can also guess every Touhou Project or Ender's Game character I tried. I basically don't have enough imagination to beat it.

BTW, try to make it guess itself ;)


It got Ned Stark in 20 questions, and Jacques DeMolay in 22.


Interesting, this seems like a great example of the kind of diagnostic patent or 'natural law' patent that the Supreme Court struck down in 2012 in the Prometheus Laboratories case.


Also, the book by Frenkel that this comes from (Love and Math: The Heart of Hidden Reality) is completely awesome: a motivation for doing math professionally, with the rare ability to distill modern mathematical work (mainly his own) without distorting it, and to convey its importance outside the original circles. I can only say +1 to him!


I find this article insanely interesting. Is there software to help doctors create decision trees like this these days?

Shame about the patent - why would someone do that?


Of course. They're called expert systems [1]. Yesterday on HN someone posted a link to a book [2] on probabilistic models of cognition which includes a section [3] on medical diagnosis.

[1] http://en.wikipedia.org/wiki/Expert_system [2] https://probmods.org/ [3] https://probmods.org/conditioning.html#example-causal-infere...


For coders - Peter Norvig's PAIP (which might be The Best Programming Book, but no room to get into that here) talks about these and leads you through building something like EMYCIN, the base of a functioning expert system without anything particularly domain-specific.

But Norvig also put them in context, and left me with the impression that Expert Systems were proven inferior to Bayesian approaches for a lot of the problems they had been developed for.

I can well imagine, though, that there are problems where expert systems are a good fit.

As an interaction model, a way of guiding control flow, and a weird, different way of handling user interaction and building extensible applications, I think the EMYCIN example from PAIP is pretty cool. I've wondered how feasible it would be to use something like EMYCIN to write webapps ...
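For a flavor of it outside Lisp, here's a minimal EMYCIN-ish backward chainer in Python (rules, findings, and certainty factors invented; the real thing also asks the user for unknown findings):

    # Certainty of a goal: a known finding, or the best supporting rule.
    # EMYCIN combines premise CFs with min(), then scales by the rule CF.
    rules = [
        # (conclusion, rule CF, premises that must all hold)
        ("bacterial_infection", 0.8, ["fever", "high_wbc"]),
        ("needs_antibiotics",   0.9, ["bacterial_infection"]),
    ]
    facts = {"fever": 1.0, "high_wbc": 0.7}   # finding -> certainty

    def cf(goal):
        if goal in facts:
            return facts[goal]
        return max((rcf * min(cf(p) for p in premises)
                    for concl, rcf, premises in rules if concl == goal),
                   default=0.0)

    print(cf("needs_antibiotics"))  # 0.9 * (0.8 * min(1.0, 0.7)) = 0.504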


We should get in touch. I've started translating Mycin into CoffeeScript, in the hope of creating a crowdsourced medical expert system over the summer.


This sounds very cool. If you're planning on open-sourcing any or all of this, I'd like to contribute.


Yes, it will be open sourced! I want to show engineers just how inanely dumb some of the algorithms in medicine are, and how (I believe) we can start creating publicly usable algorithms to pipe knowledge to the consumers once they get access to a bunch of data from consumer medical devices. Lofty goals, but I will attempt to execute on the idea!


Thanks Rob! Would love for you to participate and help me iterate once I release a prototype in a week or two. I have a colleague at Weill Cornell who may help test drive the system as well. Got your email in my notebook!


As a (new) doctor, I would love to help out if I can as well.


I think it was an "author's certificate", also called an "inventor's certificate". It didn't give the author the right to exclusive use of the invention. There were no exclusive patents in the Soviet Union, if I recall correctly.

http://en.wikipedia.org/wiki/Glossary_of_patent_law_terms#Au...


I hope I remember it correctly: a story from a couple of years ago about students from some Polish university (was it PUT?) working on such a system. It was meant to be only an assist to a doctor, because legally that was all it could ever be...

Anyway, the students loaded the system with rules from literature, interviews, etc., and the testing started. Soon it was clear that there was a mismatch between what the system suggested and what the doctors diagnosed. Not always, but more often than expected. The rules were updated and it got a bit better, but still wasn't there. After some back and forth, the doctors were finally asked to think aloud about what they were doing while examining the patients, and it became clear that the doctors used additional criteria not mentioned in the interviews (even when directly asked about that stuff). And even then it was not enough to explain the differences! Simply put, the doctors were using additional rules that they themselves were not fully aware they were using.

Sadly, I don't know what happened to that project or any more details, but the implications of this story always make me think... even highly trained individuals using a very strict and well-defined decision process end up with results they can't fully explain! What if we could make this hidden expertise explicit, to better train future doctors or just to check whether it is even valid?


This is really interesting. Do you have a link to the article?


I heard about it from people working on that project; it was an hour-long presentation, but this single anecdote is all I remember from it after 9(?) years...


You might be interested in http://en.wikipedia.org/wiki/Decision_tree_learning

(Wikipedia makes it sound very complex, but you can do decision tree learning by hand from tabulated data easily; it is just tedious)
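The by-hand procedure is just: for each attribute, tabulate how well it splits the outcomes, pick the best, and recurse on each subset. The scoring step, with made-up data:

    # Information gain of splitting tabulated rows on one attribute:
    # the step you repeat (by hand or in code) to grow the tree.
    from collections import Counter
    from math import log2

    rows = [("high", "reject"), ("high", "reject"),
            ("low", "accept"), ("low", "accept"), ("low", "reject")]

    def entropy(labels):
        n = len(labels)
        return -sum(c / n * log2(c / n) for c in Counter(labels).values())

    def info_gain(rows):
        gain = entropy([y for _, y in rows])
        for value in {v for v, _ in rows}:
            subset = [y for v, y in rows if v == value]
            gain -= len(subset) / len(rows) * entropy(subset)
        return gain

    print(info_gain(rows))  # ~0.42 bits gained by splitting here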


I can see how a decision tree with 2 layers may be manageable, but once it gets to 4 or 5 layers, I think the conditional information would be very hard to keep track of. It's like looking at a heat map in 5 dimensions; it gets really taxing.


A huuuuge problem is reconciling the hundreds of DSLs physicians use to talk about different things: you'd think they'd use the same terms to talk about the same body parts and diseases, but you'd think wrong. :(


There is lots of software for decision analysis style decision trees. See: https://www.informs.org/ORMS-Today/Public-Articles/October-V...

There is currently some confusion around decision trees in decision analysis, because the term "decision trees" is also used in machine learning, but that approach is unrelated to decision analysis.
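To make the distinction concrete: the decision-analysis kind is about choosing actions by expected value over chance nodes, not about learning splits from data. A tiny sketch with invented numbers:

    # Decision-analysis tree: pick the action with the best expected
    # value over its chance outcomes. All numbers are invented.
    chance = {  # action -> [(probability, payoff), ...]
        "operate":    [(0.9, 10), (0.1, -40)],
        "medication": [(1.0, 4)],
    }

    def expected_value(outcomes):
        return sum(p * payoff for p, payoff in outcomes)

    best = max(chance, key=lambda a: expected_value(chance[a]))
    print(best, expected_value(chance[best]))  # operate 5.0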


Yeah, of course. There are general tools like the ones mentioned here, and also the entire field of health informatics and medical decision systems. It's a very well-developed field, but there are lots of social issues preventing widespread use.


> patent - why would someone do that?

It's usually to protect their idea, so they can use it in a product or business venture. There's some good information on patents and the way they are used on Wikipedia.


I've been using https://bigml.com/ for this sort of thing. It has a pretty easy-to-use API with bindings for most languages.


"One of the stories is about how during his studies in the 80s he built a decision tree to help with kidney transplants."

And here we are almost thirty years later, and doctors in the US are still getting paid $250k or more to do stuff software could do, were it allowed, better, faster, and much cheaper.

My prediction is medicine will be one of the last fields to benefit from automation, simply because its greedy practitioners have a monopoly and won't give it up without a terrible fight. Sure, you'll increasingly see them rely on expert systems, but you won't be allowed to cut out the middlemen and go straight to Dr. Watson itself. They'll still be extracting their pound of flesh from us for many decades to come.


I will not be happy the day I go to the E.R. with some emergency and I have to talk to a machine...


So if we had evidence that a machine was more effective or made better decisions (or had better all-cause mortality evidence) than a human clinician, you would choose the human clinician in all cases?

How is this any different from comparing Drug A to Drug B? You should choose the one that's more effective (and/or cheaper); why should emotion or the human touch come into it?

Having said that, if you're paying the bill yourself, go nuts; but I'm in an environment where the public foots most of these bills, and therefore has some say, in my opinion.


The human clinician would (hopefully) be operating the machine.


"Please state the nature of the medical emergency..."

It depends on how sophisticated the machine is, I think. Doctors make mistakes all the time.


Software can't grab your testicles and feel while you cough.


And why wouldn't the software be able to take the input from hardware, and then do some statistical analysis based on the last X grabs and feels it got, along with the diagnoses, interventions, and outcomes of those X grabs and feels?

Where X is a big number, I think the self-correcting software will do better than any human. Not perfectly, but better, which is enough to make me choose the machine.


Yeah man, the emperor totally has no clothes. Unfortunately the entire field is insulated from disruption until we can get clinical data into the consumer's hands.


I am wondering if decision trees ever get used in game code. A quick search isn't exactly turning up an answer, though I did find a piece on decision trees on AI Horizon: http://www.aihorizon.com/essays/generalai/decision_trees.htm And there is also this: http://en.wikipedia.org/wiki/Game_tree But I am thinking more of a game like SimCity, for example, and I am not really finding an answer.


They do [1], but not as often as you'd expect. Most games don't actually use AI, as the goal is to make in-game characters appear intelligent rather than actually being so.

[1] http://www.sauropodstudio.com/dev-diary-number-fifteen-ai-ba...


Thx!


What would SimCity be deciding?


Technically, it wouldn't be SimCity. I am just wondering if a decision tree could be incorporated into a simulation-style game like SimCity (or some other Sim) and, if so, how that would look, code-wise.
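Code-wise, it could be as simple as a nested tree of tests driving each agent's next action; a hypothetical sketch (all names and thresholds made up):

    # A hand-built decision tree for a sim citizen: a node is
    # (test, yes_branch, no_branch), a leaf is an action string.
    tree = (lambda c: c["energy"] < 20, "sleep",
            (lambda c: c["hunger"] > 60, "eat",
             (lambda c: c["money"] < 50, "go_to_work", "socialize")))

    def decide(node, citizen):
        while isinstance(node, tuple):
            test, yes, no = node
            node = yes if test(citizen) else no
        return node

    print(decide(tree, {"energy": 80, "hunger": 30, "money": 20}))
    # -> go_to_work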


Rather than have them decide which questions are the most important themselves, I wonder what results they could have gotten from running simulations with various weights on the parameters and question responses to try to come up with a correct diagnosis. At worst, the doctor's approach turns out to be the best and they're already using it. Maybe a Monte Carlo-type decision tree would be better, though.


There is really nothing too intriguing about 'clinical reasoning'. Diagnosis is simply a collection of crude algorithms: some decision-tree based, some rule based, others a crude criteria- or point-based system (a poor man's Bayesian). I'm in the process of building a web app on this theme, so get in contact if you are interested!
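The point-based kind really is that crude. An illustrative, Wells-score-style tally (items and cutoffs made up, not a real clinical score):

    # Tally points for present criteria, then bucket by cutoffs.
    criteria = {"recent_surgery": 1.5, "leg_swelling": 3.0,
                "tachycardia": 1.5}

    def risk(findings):
        score = sum(pts for name, pts in criteria.items()
                    if findings.get(name))
        return "high" if score > 4 else "moderate" if score > 1.5 else "low"

    print(risk({"leg_swelling": True, "tachycardia": True}))  # -> high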


The real problem in the field is that diagnosis is often a relatively poor gold standard to learn against in many more complex (or poorly understood) diseases.


Decision trees are also used in pricing stock options: http://www.investopedia.com/terms/l/lattice-model.asp

Effectively, a lattice model is a decision tree about the different factors affecting an option's value, and thus the resulting value.
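A compact sketch of the binomial flavor (arbitrary example parameters): build the lattice of prices, then roll the option values back from the leaves.

    # CRR-style binomial lattice for a European call.
    def binomial_call(S, K, r, u, d, steps):
        """S: spot, K: strike, r: per-step rate, u/d: up/down factors."""
        q = (1 + r - d) / (u - d)          # risk-neutral up-probability
        values = [max(S * u**i * d**(steps - i) - K, 0)  # leaf payoffs
                  for i in range(steps + 1)]
        for _ in range(steps):             # roll back, discounting EVs
            values = [(q * values[i + 1] + (1 - q) * values[i]) / (1 + r)
                      for i in range(len(values) - 1)]
        return values[0]

    print(binomial_call(S=100, K=100, r=0.01, u=1.1, d=0.9, steps=3))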


Why don't they link to the paper? This article is really crappy and doesn't explain what they did at all. For example, why didn't they mention probabilistic networks, which dominate this field? And I doubt that 240 data points is enough to write a paper.


> Why don't they link to the paper?

I see nothing in http://scholar.google.com/scholar?q=frenkel+Arutyunyan+kidne... and given the context, it was likely buried in an obscure Russian journal & so of no value to English readers anyway.

> This article is really crappy and doesn't explain what they did at all.

Seems like a good explanation to me.

> For example, why didn't they mention probabilistic networks, which dominate this field?

Because those weren't used much back then.

> And I doubt that 240 data points is enough to write a paper.

Sure it is. Why wouldn't n=240 work?


Given the amount of data available, it is not unreasonable to expect n > 1000.


They obviously did not think that reasonable; their results seem to be fine, and I still don't see what the problem is with n=240.



