
Interesting work.

For a slightly different take using a similar intuition, see our ACL 2024 paper on ranking LLMs, which may be of interest: https://arxiv.org/abs/2402.14860

Our HuggingFace space has some examples: https://huggingface.co/spaces/ibm/llm-rank-themselves


thank you, will check out the paper, the hf space is very cool!


Pointers to said tools would be beneficial. Robust optimization is somewhat different from what the author states.


The ISO link[1] says this standard has been withdrawn, however.

[1] https://www.iso.org/obp/ui/#iso:std:iso:3103:ed-1:v1:en


Washington D.C. is a great example of this.

Looking at residential location choice around 10 years ago, you could discern a clear boundary (Georgia Ave in those days).

Netflix used to show the most popular movies by zip code, and on one side of the boundary it was "Mamma Mia" and on the other "Tyler Perry".


Across the river in Alexandria and the surrounding suburbs, this was true as well (not sure about Netflix, but certainly the housing boundaries).

The bureaucrats, lawyers, doctors, and career officers (Pentagon) were highly concentrated in a few neighborhoods that border the Potomac (Old Town through Fort Hunt). A few blocks away, the Rt 1 corridor was largely working class or lower class, with some young enlisted families from Ft Belvoir.

This extended to my high school: there were three buildings. The first was arts/music, the second was STEM and honors, and the third was the gym, auto shop, wood shop, and cooking. It's not hard to imagine there were students who never entered building 1, and the only reason I was ever in building 3 was the weight room and the one semester of auto shop I took.


I also went to West Potomac and spent hardly any time in Gunston.


Always jarring to hear the claim

> Technology is fundamentally neutral

It is in the same way guns are fundamentally neutral. You can't view it without context. Include that and it is clear that tech (or guns) isn't neutral at all.


Yes, Corbusier's vision of cities was tied to speed, and by consequence, to cars.

But Chandigarh today has fantastic segregated bike infrastructure! No other Indian city comes close. The city itself is pan-flat and relatively small (shaped as a rectangle where the longer side is 10 kilometers).

Cycle the routes and you see a vast array of tradesmen, workers, maids, and vegetable vendors travelling by bicycle. Big apartment complexes have bike stands full of pink bicycles that the maids use.

Granted, the city has far too many cars, but the article doesn't adequately reflect the town and all its people.


Man, I recall arriving in Chandigarh in good old 2008 after almost a month of backpacking in the rest of India. What a shock it was! Straight roads in a grid, traffic lights everywhere. It looked and felt so surreal, detached from the rest of the Indian subcontinent, which by comparison felt like some very distant parallel universe.


I was there in 2006, but I started my journey recovering from jet lag there, and I kept thinking "this seems very different from what I expected India to be like". My traveling companion explained it is in fact completely different from most of India (and so I found in the rest of my visit).


Many causes have been put forward for this. However, I'm not sure all the safety aspects here are well understood.

Here in Dublin, Ireland, the few weeks immediately after the lockdowns were horrendous for driver behaviour and aggression. It took a while for behaviour to return to the norm. It was almost as if people needed to get reacquainted with driving.


That gives me hope that things may calm down in the USA.


From the full paper[1]:

> All models were trained on 70% of the data and tested on the remaining 30% of the data. Note that each month record for each participant was considered an independent data point.

I'm almost certain that this is a mining leak.

Data from one patient will end up in both the training and test sets and result in fantastic accuracy. Of course, it will be from different months. The correct way to do this is to cross-validate/split across the population, so that each patient appears on only one side of the split. It seems unlikely, from this description, that the authors have done so.

[1] https://alzres.biomedcentral.com/articles/10.1186/s13195-021...
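
To make the leak concrete, here is a minimal sketch (Python with scikit-learn; the 20 participants and 12 monthly records are made-up numbers, not figures from the paper) showing that a random 70/30 split over per-month records almost always puts the same person on both sides:

    import numpy as np
    from sklearn.model_selection import train_test_split

    # One row per participant-month: 20 hypothetical participants, 12 months each.
    participant_id = np.repeat(np.arange(20), 12)

    # The 70/30 random split described in the paper, applied to row indices.
    train_rows, test_rows = train_test_split(
        np.arange(len(participant_id)), test_size=0.3, random_state=0)

    # Nearly every participant ends up with months in both sets, so the "unseen"
    # test months come from people the model has already trained on.
    overlap = set(participant_id[train_rows]) & set(participant_id[test_rows])
    print(f"{len(overlap)} of 20 participants appear in both train and test")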


Definitely.

It's a very common error and it should be easy to catch, but... I've even seen a study that treated individual slices of an MRI scan as independent, which is laughably wrong.

I think part of the problem is that the "analysts" are increasingly uninvolved in the data collection and just treat it as a tuple of (X, y). If you think about what the rows actually mean, even for a second ("Oh, Mr. Smith is always an awful driver"), the problem is obvious.


I'm somewhat unfamiliar with the problem, do you think you could explain why this is bad? Or maybe just point me in the right direction? Thanks!


One way to tell if a Machine Learning model is any good is to see how it does on unseen/new patients.

Of course, we don't want to find out by trying it on real patients, so typically you partition the data you already have into (a) what you show to the machine learner (training data), and (b) what you hide from the learner (test data). The latter is only used to evaluate, i.e. you get the answer from the ML model and compare it to the real answer you already have. If information about the test data somehow makes it into the training data, it's referred to as leakage, or a mining leak [1].

In this paper, they treat each month of a patient as an independent observation. However, GPS driving behaviour will be very similar from one month to the next for the same person, and genetic information is exactly the same. So for every month the model is tested on (test data), the learner has already seen very similar data during training: the other months for that same person that happen to land in the training set. Since the split is typically done randomly, the model will appear to do well.

The test results are therefore optimistic and do not support the conclusions.

[1] https://en.wikipedia.org/wiki/Leakage_(machine_learning)
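
If it helps, here is a minimal sketch of the population-level split I mean (using scikit-learn's GroupShuffleSplit; the participant counts, features, and labels below are synthetic stand-ins, not data from the study):

    import numpy as np
    from sklearn.model_selection import GroupShuffleSplit

    rng = np.random.default_rng(0)
    groups = np.repeat(np.arange(20), 12)      # participant ID for each monthly record
    X = rng.normal(size=(len(groups), 5))      # stand-in monthly driving features
    y = rng.integers(0, 2, size=len(groups))   # stand-in diagnosis labels

    # Hold out whole participants, so no person has months in both sets.
    splitter = GroupShuffleSplit(n_splits=1, test_size=0.3, random_state=0)
    train_idx, test_idx = next(splitter.split(X, y, groups=groups))

    # The two sets now share no participants; test performance reflects how the
    # model would do on genuinely new people.
    assert set(groups[train_idx]).isdisjoint(set(groups[test_idx]))

GroupKFold does the same thing when you want to cross-validate across the population rather than use a single split.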


Suppose that you have 3 data points, on June 14, 15, and 16, that due to personal driving quirks all appear to belong to the same person. If the 14th and 16th are in your training set, and both correspond to Alzheimer-free Bob, that is a strong hint that the data from the 15th is also Alzheimer-free.

But this doesn't help you in the real world where you won't necessarily have near neighbors corresponding to the same person, with a known diagnosis.


Overfits to the study participants. Will not necessarily give the same results on the general population.


Health policy is fraught with counter-intuitive phenomena, and screening is one of them.

Seems like it should help, but in practice leads to over-diagnosis.

For example, thyroid cancer rates jumped in Korea after screening began, with no impact on patient outcomes [1]. There are several other examples.

[1] Lee, J. H., & Shin, S. W. (2014). Overdiagnosis and screening for thyroid cancer in Korea. The Lancet, 384(9957), 1848.


You can hardly conclude from this study that broad population screening is ineffective. You have to consider, among other things, the treatments available for the disease being screened and the cost of the screening program. If treatments for the disease already have a low success rate (what counts as low?), the timing of detection doesn't really help. Additionally, if the cost of the screening program is negligible (what counts as negligible?), then even successfully treating a few patients may be worth it.


The current consensus about over-diagnosis (as I understand it) is that when there is a significant false-positive rate and the cost of proving a positive false is high (in money, time, effort, and worry), the screening program is not helpful. Some go further and say that low-cost screening drives some of the high cost-to-outcome ratio in the US. I'll try to find a citation in my textbooks if you are interested.
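
As a rough back-of-the-envelope illustration, with numbers that are purely made up and not from any study: when the condition is rare, even a modest false-positive rate means most positive results are false, and each one incurs the cost of being proven false.

    # Illustrative numbers only: a rare condition and a reasonably good test.
    prevalence = 0.005          # 0.5% of the screened population has the condition
    sensitivity = 0.90          # true-positive rate of the test
    false_positive_rate = 0.05  # 5% of healthy people still test positive

    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * false_positive_rate
    ppv = true_pos / (true_pos + false_pos)
    print(f"Share of positive results that are real: {ppv:.1%}")  # about 8%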


I think the issues are deeper than false positives alone. It's possible that transient conditions get detected that would have resolved themselves without any treatment. Instead of no treatment, one now has to deal with the side effects of the interventions applied.


This is exacerbated by the fact that if the AI tells the doctor there is any doubt, no doctor will take the risk of not doing a biopsy / scan / MRI / surgery (depending on the case). How would you defend yourself in front of a judge otherwise? This is something we always have in mind.

This is how you end up with false positives and over-diagnosis.


This is a false blanket statement, and one that could change as we start to see human+AI performance exceed human performance alone.

For lung cancer screening, NLST showed a 20% reduction in mortality and now NELSON has shown even stronger results in Europe.

This “all screening is bad” line is FUD in the medical field, frankly. Yes, it has to be studied and implemented carefully, but making blanket statements about screening as a whole is factually incorrect.


I have not stated "all screening is bad".

Broad-based population screening of the kind the parent comment suggests is, in my opinion.

I've yet to see any clinically valid evidence suggesting AI would add value to screening. Curious to hear the evidence that drives your optimism about human+AI.

Just to note: the NELSON study [1] focuses on high-risk segments. Their paper also recommends a "personalized risk-based approach" to screening, which seems reasonable.

[1] https://www.nejm.org/doi/full/10.1056/nejmoa1911793


The general thread here is about AI helping with a more proactive approach to medicine. Screening for high risk populations certainly falls under that.

You certainly said that screening leads to over-diagnosis.

I think for screening, the strongest results will probably come from the upcoming prospective study from Kheiron.

https://www.kheironmed.com/news/press-release-new-results-sh...


I suspect, btw, that the Google model in this paper https://www.nature.com/articles/s41586-019-1799-6 will show stronger performance. But Kheiron appears to be ahead as far as proving the value of the tool, since they have actually validated prospectively.


One place this is abundantly clear is in the humanitarian sector, which I peeked into for a couple of years for a project.

For all the talk about participatory processes, the entire system is donor-focused. It's an army of well-meaning folks from the Global North executing on a vision set by donors.

There is grumbling about a lack of "local capacity", while somehow disregarding that expat staff move countries every few years and are nonetheless assumed to have a strong grasp of local conditions.

Of course, the situation isn't that binary and things are a bit more complex in real life, but it makes you wonder why and how these structures persist.

