What Data Can’t Do (newyorker.com)
170 points by RiderOfGiraffes on March 31, 2021 | 101 comments



I am increasingly worried about people applying ML to everything without any rigour.

Statistical inference generally only works well under very specific conditions:

1 - You know the distribution of the phenomenon under study (or make an explicit assumption and assume the risk of being wrong)

2 - Using (1), you calculate how much data you need so you get an estimation error below x%

Even though most ML models are essentially statistics and have all the same limitations (issues with convergence, fat-tailed distributions, etc.), it seems the industry standard is to pretend none of that exists and hope for the best.

IMO the best moneymaking opportunities of the decade will involve exploiting unsecured IoT devices and naive ML models; we will have plenty of both.


As currymj commented, this isn't accurate for ML, only for classical statistics.

In ML (or more specifically deep learning), we make no distribution-based assumptions, other than the fundamental assumption that our training data is "distributed like" our test data. Thus, there aren't issues with fat-tailed distributions since we make no such normality assumptions. Indeed, with the use of autoencoders, we don't assume a single distribution, but rather a stochastic process.

I suppose you could say statistics is less "empirical" than ML in the sense that it is axiom-based, whether that is a normality assumption on the errors around a regression line or stock prices following a Wiener process. By contrast, ML is less rationalist, simply reflecting the data.


It is absolutely untrue that DL is immune to fat-tail problems, and it is important that no one operate mission-critical systems under this assumption.

The two fat-tail questions one has to engage are:

- is it possible that a catastrophic input might be lurking in the wild that would not be present in a typical training set? Even with a 1M-instance training set, a one-in-a-million situation will on average appear (and affect your objective function) only once, and could very well not appear at all (see the quick check after these bullets).

- can I bound how badly I will suffer if my system is allowed to operate in the wild on such an input?
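A quick sanity check on the first point, assuming the rare situation is i.i.d. with probability one in a million per example (a toy calculation, nothing model-specific):

```python
# Chance that a one-in-a-million situation shows up at least once
# in a training set of one million i.i.d. examples.
p = 1e-6          # per-example probability of the rare situation
n = 1_000_000     # training set size

p_at_least_once = 1 - (1 - p) ** n
print(f"appears at least once: {p_at_least_once:.3f}")      # ~0.632
print(f"never appears:         {1 - p_at_least_once:.3f}")  # ~0.368
```

So roughly a third of the time the situation never makes it into the training set at all.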

DL gives no additional tools to engage these questions.


> It is absolutely untrue that DL is immune to fat-tail problems

In fact, working on fat tail problems is currently a hot topic in ML.


I don't quite follow: isn't what you described a flaw fundamental to all forecasting, namely the occurrence of a gross outlier? I should clarify that DL doesn't suffer from the same problem a normality assumption has with fat tails: a failure to capture the skew of the distribution.


It's not characteristic of all forecasting, only purely empirical forecasting.

Definitionally, the only way to reason about risk that doesn't appear in training data is non-empirical (e.g. a priori assumptions about distributions, or worst cases, or out-of-paradigm tools like refusing to provide predictions for highly non-central inputs).

DL is not any better (or worse) than any other purely empirical method at answering questions about fat-tail risk, and the only way to do better is to use non-empirical/a-priori tools. Obviously the tradeoff here is that your a priori assumptions can be wrong, and that too needs to be included in your risk model (see e.g. Robust Optimization / Robust Control).


I think it's wrong to assume that non-empirical methods can be reliably trusted to give better results. Humans are terrible at avoiding bias or evaluating risks, especially for uncommon events.


Food for thought: if every method for predicting event x is terrible, then you might as well not try to predict x and build your life in such a way that you never expose yourself to the risk of x happening.


From a Bayesian point of view, that amounts to a "prediction" that the probability of event x is so significant that you should build your life around it. But I guess if you knew enough for that sentence to make sense you wouldn't have posted your comment. So, suffice it to say that Bayesian decision theory cuts the knot you're talking about.


I agree that ML tends to put weaker assumptions on the data than classical statistics and that it's a good thing.

However, most ML certainly makes distributional assumptions; they are just weaker. When you're training a huge deep net with an L2 loss on a regression task, you have a parametric conditional Gaussian distribution under the hood. Being overparametrized doesn't mean there's no distributional assumption. Vanilla autoencoders also work under a multivariate Gaussian setup. Most classifiers are trained under a multinomial distribution assumption, etc.
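For concreteness, here is a minimal sketch (my own illustration, with made-up predictions) of why an L2 loss is a Gaussian assumption in disguise: the mean negative log-likelihood under a fixed-variance conditional Gaussian is just half the MSE plus a constant, so the two objectives share the same minimizer.

```python
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.normal(size=1000)
y_pred = y_true + rng.normal(scale=0.5, size=1000)  # stand-in for a model's outputs

# Ordinary L2 regression loss.
mse = np.mean((y_true - y_pred) ** 2)

# Mean negative log-likelihood of y_true under N(y_pred, sigma^2), sigma fixed at 1.
sigma = 1.0
nll = np.mean(0.5 * ((y_true - y_pred) / sigma) ** 2
              + 0.5 * np.log(2 * np.pi * sigma ** 2))

print(mse, nll, 0.5 * mse + 0.5 * np.log(2 * np.pi))  # last two agree exactly
```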

And fat-tailed distributions are definitely a thing. It's just less of a concern for the mainstream CV problems on which people apply DL.


> In ML (or more specifically deep learning), we make no distribution-based assumptions, other than the fundamental assumption that our training data is "distributed like" our test data.

Okay, so that's about the same as classical statistics. You're just waiving the requirement to know what the distribution is. You are still assuming there exists a distribution and that it holds in the future when you apply the model. Sure you may not be trying to estimate parameters of a distribution, but it is still there and all standard statistical caveats still apply.

> Indeed, with the use of autoencoders, we don't assume a single distribution, but rather a stochastic process.

Classical statistics frequently makes use of multiple distributions and stochastic processes.


Of course there's a distribution behind the data. The parent commenter was saying that not all machine learning techniques need to know that distribution, as a rebuttal to their parent comment.


I know what they're saying, I even reiterate it in my second sentence. My point is that doesn't protect you from the distribution changing, which is a problem that applies to machine learning and classical statistics.

This is in support of the GP comment: while you can loosen your assumptions about what the underlying distribution is and don't literally need to know it, you can't get away from the fundamental limitations of statistics. Which is the original topic we're talking about.


I dunno, there are definitely distribution-based assumptions—good luck working with skewed data. Most old-school techniques are kinda additive, so nobody's really been assuming a single distribution for practical applications.

Current ML techniques just work well for the kinds of problems people are applying them to, which is kind of a tautology. We should definitely seek to understand the theory behind stuff like dropout and not consider our lack of understanding a strength.


> I suppose you could say statistics is less "empirical" than ML in the sense that it is axiom-based, whether that is a normality assumption of predictions about a regression line or stock prices following a Wiener process. By contrast, ML is less rationalist by simply reflecting data.

I don't think that's true (or maybe I misunderstood?). I take your comment about "simply reflecting data" to mean fitting data with a very flexible function (curve)? There are very flexible distributions that fit almost any kind of data, e.g. https://en.wikipedia.org/wiki/Gamma_distribution, or compositions of them, but as a practitioner you still need to interpret the model and check whether it represents the underlying process well. Both statistical inference and ML are getting there, using different methods.


The only reason that this may not be accurate for ML is because machine learners generally make no attempt to quantify their uncertainty in their predictions with e.g. confidence intervals or prediction intervals.

And there is a whole field of non-parametric statistics that doesn't make distribution assumptions.


I agree -- as ML becomes increasingly easy for non-experts or people without a heavy math/stats background to apply, I've seen an increasing volume of arguments against the data science profession (someone the other day called DS the "gate-keepers"), but: there be dragons.

Anyone can use SOTA deep learning models today, but in my experience, it's more important to understand the answers to "what are the shortcomings/consequences of using a particular method to solve this problem?", "what are (or could be) the biases in this dataset?", etc. It requires a non-trivial understanding of the underlying methodology and statistics to reliably answer these questions (or at least to worry about them).

Can you apply deep reinforcement learning to your problem? Maybe. Should you? Well, it depends, and you should understand the pros and cons, which requires more than just the knowledge of how to make API calls. There are consequences to misusing ML/AI, and they may not even be obvious from offline testing and cross validation.


Personally, I think the main problem with ML is simpler: it works well for interpolation, and is crap for extrapolation.

If the outputs you want are well within the bounds of your training data set, ML can do wonders. If they aren't, it'll tell you that in 20 years everyone will be having -0.2 children and all the other species on the planet will start having to birth human babies just so they can be thrown into the smoking pit of bad statistical analysis.
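A minimal sketch of that failure mode (the degree-9 polynomial is just a stand-in for any flexible model fit on a bounded range):

```python
import numpy as np

rng = np.random.default_rng(1)
x_train = np.linspace(0, 10, 50)
y_train = np.sin(x_train) + rng.normal(scale=0.1, size=50)

coeffs = np.polyfit(x_train, y_train, deg=9)  # flexible fit on x in [0, 10]

print(np.polyval(coeffs, 5.0))   # interpolation: close to sin(5) ~= -0.96
print(np.polyval(coeffs, 20.0))  # extrapolation: typically off by orders of magnitude
```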


I agree, but that's equivalent to my original claim.

Being bad at extrapolation is a consequence of assuming all your training data can describe your phenomenon's distribution, and being wrong.


Outside of simple time series, I'm not aware of any good way to extrapolate.


One way to extrapolate is to use a mechanistic or semi-mechanistic model. The recent advances in neural differential equations are a really cool example of this.


> If they aren't, it'll tell you that in 20 years everyone will be having -0.2 children and all the other species on the planet will start having to birth human babies just so they can be thrown into the smoking pit of bad statistical analysis.

https://xkcd.com/605/


i think this actually gets at what makes applied ML distinct from statistics as a practice, even though there is a ton of overlap.

statisticians make assumptions 1 and 2, and think of themselves as trying to find the "correct" parameters of their model.

people doing applied ML typically assume they don't know 1 (although they might implicitly make some weak assumptions like sub-gaussian to avoid fat tails, etc.) and also typically don't care about being able to do 2. and they don't care about their parameters; in a sense to an ML practitioner, every parameter is a nuisance parameter.

instead you assume you have some reliable way of evaluating performance on the task you care about -- usually measuring performance on an unseen test set. as long as this is actually reliable, then things are fine.

but you are right that in the face of a shifting distribution or an adversary crafting bad inputs, ML models can break down -- but there is actually a lot of research on ways to deal with this, which will hopefully reach industry sooner rather than later.


> instead you assume you have some reliable way of evaluating performance on the task you care about -- usually measuring performance on an unseen test set. as long as this is actually reliable, then things are fine.

This is the part that often fails in practice. Think of all the benchmarks that show superhuman performance and compare that to how good those same models really aren't. Constructing a good set of holdouts to evaluate on is really hard and gets back to similar issues. In practice, doing what you're describing reliably (in a way that actually implies you should have confidence in your model once you roll it out) is rarely as simple as holding out some random bit of your dataset and checking performance on it.

On the other hand, what you often see is people just holding out a random bunch of rows.
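A hedged sketch of the difference using scikit-learn's splitters (the `patient_id` grouping column is an assumed example, not anything from the thread): holding out whole groups gets you closer to "performance on genuinely unseen cases" than holding out random rows.

```python
import numpy as np
from sklearn.model_selection import train_test_split, GroupShuffleSplit

rng = np.random.default_rng(0)
X = rng.random((1000, 5))
y = rng.integers(0, 2, size=1000)
patient_id = rng.integers(0, 100, size=1000)  # assumed grouping variable

# Naive holdout: random rows. Rows from the same patient leak across the split,
# so the test score flatters the model.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Group-aware holdout: entire patients are held out, which is closer to how the
# model will actually be used on people it has never seen.
gss = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(gss.split(X, y, groups=patient_id))
```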


I disagree, every ML model has some implicit statistical assumption, which is often not well understood by practitioners.

At minimum you must assume your underlying process is not fat tailed. If it is, then your training/validation/test data might never be enough to make reliable predictions and your model might break constantly in prod.

BTW shifting distributions and fat tailed distributions are sort of equivalent, at least mathematically.


I don't disagree with any of that, but I still think a responsible, clear-thinking ML practitioner can avoid having to assume the form of the data-generating process, depending on their application.

In some cases if you care about PAC generalization bounds, it's even the case that the bounds do actually hold for all possible distributions.
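For reference, the finite-hypothesis-class version of such a bound (Hoeffding's inequality plus a union bound, losses in [0, 1]); note that it holds for every data distribution, needing only the i.i.d. assumption:

```latex
% With probability at least 1 - \delta over an i.i.d. sample of size n,
% simultaneously for every hypothesis h in a finite class H:
R(h) \;\le\; \widehat{R}(h) + \sqrt{\frac{\ln\lvert H\rvert + \ln(1/\delta)}{2n}}
```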


I think it's more meaningful to have this discussion in a specific problem domain, since statistical inference and ML are just tools to better model a problem or phenomenon. The domain (prior) knowledge -- everything that isn't stats/ML -- is the key to building a more robust model. Leave the problem domain out and we are left with just pure mathematical theories, and the points can only be proved on simulated data.


Yes - this is pretty much exactly how I explain the difference between machine learning and statistics.

Despite using similar models, the expertise required for 'doing statistics' (statistical inference) is actually very different from machine learning. Machine learning fits the 'hacker mentality' well: try stuff out, see what works. To do statistical inference effectively, you really do need to spend time learning the theory. They both require deep skills, but the skills are surprisingly different considering it's often the same underlying model.


But without some statistical knowledge, isn’t there a risk of a lack of understanding about the robustness of “what works”?


Statistical knowledge doesn’t remove that risk. The extent to which it even lowers the risk is a question that could be answered empirically.


yeah, agreed - a good understanding of the model's statistical assumptions can often help you make the model more robust and also give you ideas for what types of feature engineering are likely to work.


"Every parameter is a nuisance parameter" is a great way to put it.


ML looks (to many people) like a way to circumvent your grumpy statistician saying that the underlying data is worthless and/or that you should focus on getting the data pipeline done properly for a logit model on your churn rate.


"Scientist free science," -- being able to optimize systems without understanding them, has been a dream of the business world since the dawn of time. There's always been a market for cookbook recipes that automate the collection of data, and interpretation of results. Before ML, there were "design of experiments," and "statistical quality control."


>Before ML, there were "design of experiments," and "statistical quality control."

Statistical quality control, at least the way I know it, is very useful in finding problems in your process. I'm also not sure how this fits with your premise. It's about optimizing systems by first finding out where to look, and then looking there in detail with expert knowledge, i.e. deep understanding of your system.


I'm definitely with you there, but I've also seen the side of it where it turns into a cargo cult and runs headlong into the replication crisis.

Perhaps the good thing is that as the new things gain popular attention, the old techniques such as SPC are under less pressure to support success theater, and revert to being actual useful, solid tools.


Isn't the point of ML exactly that you don't know the underlying distribution? How is this ever assumed in any way? ML is not parametric statistics.


(Some) ML is non-parametric, but there are always some questions you need to be able to answer about your data. At bare minimum, is the generating process ergodic, what is the error of your measurement procedure, how representative of the true underlying distribution is your sampling procedure? All use of data should start with some exploratory analysis before you ever get to the modeling stage.

Once you have a model, at minimum understand how to tune for the tradeoffs of different types of error and don't naively optimize for pure accuracy. At the obvious extremes, if you're trying to prevent nuclear attack, false negatives are much more costly than false positives, if you're trying to figure out whether to execute someone for murder, false positives are much more costly than false negatives. Understand the relative costs of different types of error for whatever you're trying to predict and proceed accordingly.
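A minimal sketch of what "proceed accordingly" can look like (the probabilities and the 50:1 cost ratio are made up for illustration): instead of thresholding predicted probabilities at 0.5, pick the threshold that minimizes expected cost.

```python
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=5000)
# Crude stand-in for a classifier's predicted probabilities.
p_pred = np.clip(0.3 * y_true + 0.7 * rng.random(5000), 0, 1)

COST_FN = 50.0  # assumed cost of missing a true positive
COST_FP = 1.0   # assumed cost of a false alarm

thresholds = np.linspace(0.01, 0.99, 99)
costs = []
for t in thresholds:
    y_hat = (p_pred >= t).astype(int)
    fn = np.sum((y_true == 1) & (y_hat == 0))
    fp = np.sum((y_true == 0) & (y_hat == 1))
    costs.append(COST_FN * fn + COST_FP * fp)

print(f"cost-minimizing threshold: {thresholds[np.argmin(costs)]:.2f}")
# With misses 50x as costly as false alarms, the optimum sits well below 0.5.
```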


Well, all optimization problems are equivalent to a maximum likelihood estimate for a corresponding probability distribution, so you may be making more implicit assumptions than you think.

Typical ML methods just have a huge distribution space that can fit almost anything from which they pick just 1 option. This has two downsides:

Since your distribution space is several times too large by design you lose the ability to say anything useful about the accuracy of your estimate, other than that it is not the only option by far.

Since you must pick 1 option from your parameter space you may miss slightly less likely explanations that may still have huge consequences, which means your models tend to end up overconfident.


I mean yes, there is parametric ML (maximum likelihood, MAP, GMMs, ...) and there is non-parametric ML (everything neural network, SVMs, GBMs, random forests, ...).

I'd argue that the latter has had bigger success in the past, since the prior on the data distribution is usually wrong in real life. Think about a prior for image data distributions, or the same in NLP. Forget about it.


[Disclosure: I'm an IBMer - not involved with this work]

With regard to exploitation, IBM research has done some interesting work in the form of an open source "Adversarial Robustness Toolbox" [0]. "The open source Adversarial Robustness Toolbox provides tools that enable developers and researchers to evaluate and defend machine learning models and applications against the adversarial threats of evasion, poisoning, extraction, and inference."

It's fascinating to think through how to design the 2nd- and 3rd-order side effects using targeted data poisoning to achieve a specific outcome. Interestingly, poisoning could be used to force a specific outcome for a one-time gain (e.g. feed data in a way that ultimately triggers an action that elicits some gain/harm) or to alter outcomes over a longer time horizon (e.g. teach the bot to behave in a socially unacceptable way).

[0] https://art360.mybluemix.net/


The problem is that in high dimensions, knowing the distribution or even characterizing it fully with data is incredibly difficult (the curse of dimensionality). I think the real assumption in ML is just that there is some low-dimensional space that characterizes the data well, and ML algorithms find the directions along which the data is constant.
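A quick illustration of that curse (the standard distance-concentration demo, nothing specific to any ML method): as the dimension grows, the nearest and farthest points from a query become almost equally far away, so no realistic sample "covers" the distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000):
    x = rng.random((500, d))              # 500 uniform points in the d-dimensional unit cube
    q = rng.random(d)                     # one query point
    dists = np.linalg.norm(x - q, axis=1)
    print(d, round(dists.min() / dists.max(), 2))  # ratio creeps toward 1 as d grows
```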


Wait until you find out how many studies have been published in medical journals with serious statistical flaws.


> 1 - You know the distribution of the phenomenon under study (or make an explicit assumption and assume the risk of being wrong)

Nonparametric methods say 'hi'.


> You know the distribution of the phenomenon under study

If you know the distribution of the phenomenon under study you don't need ML; that is what probability is for.

> or make an explicit assumption and assume the risk of being wrong

No. You have the bias/variance tradeoff here. You can make an explicit assumption about your model or not.

> Using (1), you calculate how much data you need so you get an estimation error below x%

This is extremely complicated for anything except the most trivial toy examples, probably not solvable at all and definitely not the way biological intelligent systems (aka some humans) do it.


This author has published a couple of articles like this at the New Yorker. They all have this in common: the author works through some interesting and in some ways unusual cases where data or statistics have been improperly or naively applied, with some social costs. I really enjoy the articles themselves.

Then the New Yorker packages it up with a cartoon and a headline and subheadline like "Big Data: When will it eat our children?" or "Numbers: Do they even have souls?", and serves it up to their technophobic audience in a palatable way.

https://www.newyorker.com/contributors/hannah-fry


> *Numbers don’t lie, except when they do. Harford is right to say that statistics can be used to illuminate the world with clarity and precision. They can help remedy our human fallibilities. What’s easy to forget is that statistics can amplify these fallibilities, too. As Stone reminds us, “To count well, we need humility to know what can’t or shouldn’t be counted.”*

I do have a problem with her conclusion here. Are numbers really lying if it's actually an incorrect data collection method or conflicting definitions of criteria for generation of certain numbers (like the example used in the second to last paragraph)? She seems to be pointing out a more important fact, which is that people don't question underlying data, how it was collected, and the choices those data collectors made when making a data set. People tend to take data and conclusions drawn from it as objective realities, when in reality data is way more subjective.


> Are numbers really lying if it's actually an incorrect data collection method or conflicting definitions of criteria for generation of certain numbers

Obviously it's a metaphor, but it's pretty clearly a case of "this supposedly objective factual calculation is presenting an untruth."


You can still be very misleading with objectively true calculations. "There is very low stress on the patient's arteries and only a very small tear." - said patient bled to death and has only atmospheric stress now that their veins are bloodless. Less than 0.1% of their total vein area has a rip in it.


One of the points is that the act of collecting the numbers and making decisions based on them can change the underlying behavior. The numbers can be perfectly correct (how many cases does the IT department get? How long does it take on average to resolve an issue?). The goal can be correct (we want to get issues resolved faster). But as soon as you try to manage people based on those perfectly valid numbers, bad things often happen.


Hannah Fry is a mathematics communicator working in quite a few other media. She's been on a couple of BBC documentaries, and in a few videos on the Numberphile YouTube channel (which is also very good regardless of who's on it).


It's ironic that you accuse them of sensationalizing headlines by making up a sensationalized headline. The headline for this article is pretty neutral and sets up an interesting article that teases a very deep topic with references to books that explain it further. And the cartoon was pretty funny. This is exactly what I expect from the New Yorker and I'm rarely disappointed.


The article seems to stop pretty early, as if something is missing.

It’s an anecdote about a government incentive to have doctors see patients within 48 hours causing doctors to refuse scheduling patients later than 48 hours in order to get the incentive bonus.

This is not an example of limits of data, but an example of perverse incentives.


> It’s an anecdote about a government incentive to have doctors see patients within 48 hours causing doctors to refuse scheduling patients later than 48 hours in order to get the incentive bonus.

That part is probably just the first 1/8th or so of the article (rough guess). Sounds like it was cut short for you?


When a measure becomes a target, it ceases to be a good measure.

This case could be said to be creating misleading data. If the doctor's offices aren't recording appointments more than 48 hours in advance, the System is losing visibility on the total number of people who want appointments. Every office will appear to be 100% efficient even though there is effectively still an invisible waiting list.


Did you use reader mode? Because I noticed that when I use Firefox's reader mode it cuts off part of the article on the New Yorker site.


When I first opened the article in Firefox, I only saw the first two paragraphs (same as the OP). This was true whether it was in reader mode or not. I opened it in Chrome and saw the whole article.

I just tried opening it in Firefox now (a couple hours later) and I see the whole article. If I switch to reader mode I do see it's truncated about halfway through, but I think that's a separate issue from what the OP was seeing.


On iOS. Initially used reader mode, but switched because it seemed cut off.

But also without reader mode I can’t see more than the NHS anecdote.


We clicked on the link and were presented with two paragraphs. New Yorker articles usually don't show up correctly; I don't know why they're allowed to be posted here. Acting like we don't know how to read a webpage is gaslighting.


Well, that wasn't my intention at all; it was just a problem I encountered a few hours before I read your comment, so it was fresh on my mind.


Data is not a substitute for good judgment, for empathy, for proper incentives.

The article focuses on governments and bureaucracies but there's no better example than "data-driven" tech companies, as we A/B test our key engagement metrics all the way to soulless products (with, of course, a little machine learning thrown in to juice the metrics).

I wrote about this before: https://somehowmanage.com/2020/08/23/data-is-not-a-substitut...


I think it's almost worse in tech because it largely works. If the government sets a flawed metric, their real goal of pleasing their constituents has failed and theoretically they either have to fix it or lose political support.

But in tech, if your goal is just to make money, soullessly following data will often get you there, to the detriment of everyone else. Clickbait headlines will get you more views. Full-page popup ads will get you more ad clicks/newsletter subscriptions. Microtransactions will get you more sales. Gambling mechanics will get you more microtransactions.

You can say it's a flawed metric, but I think in the end, most people just actually care more about making money than they do about building a good product.


I've written the same sentence before! This is so cool! pardon the wall of text.

Here's my thesis, curious to hear your thoughts.

At some time around 2005, when efficient persistence and computation became cheap enough that any old f500-corp could afford to endlessly collect data forever, something happened.

Before 2005, if a company needed to make a big corporate decision, there was some data involved in making the decision, but it was obviously riddled with imperfections and aggregations biases.

Before 2005, executives needed to be seasoned by experience, to develop this thing we call "Good Judgement", that allows them to make productive inferences from a paucity of data. The Corporate Hierarchy was a contest for who could make the best inferences.

Post-2005, data collection is ubiquitous. Individuals and companies realized that you don't need to pay people with experience any more; you can simply collect better data and outsource decision-making to interpretations of this data. The corporate hierarchy now is all about who can gather the "best" data, where "best" means grow the money pile by X% this quarter.

"Good Judgement" used to be expected from the CEO, down to at least 1-3 levels of middle management above the front-line people. Now, it appears (to me) to be mostly a feature of the C-Suite and Boards, and it's disappeared elsewhere. Long-term, high-performing companies seem to have a more diffused sense of good judgement. But these are rare. maybe they always have been?

Anyways, as we agree, this has a tendency to lead in problematic directions. Here's my thesis on "why".

Fundamentally, any "data" is reductive of human experience. It's like a photograph that captures a picture by excluding the rest of the world.

Few people seem to understand this analogy, because they think photographs are the ultimate record of an event. Lawyers understand this analogy. With the right framing, angle, lighting (and of course, with photoshop), you can make a photograph tell any story you want.

It's the same issue with data, arguably worse since we don't have a set of standard statistics. We have no GAAP-equivalent for data science (yet?).

Our predecessors understood that data was unreliable, and compensated for this fact by selecting for "Good Judgement". The modern mega-corps demonstrate that we don't have a good understanding of this today, evidenced by religious "data-driven" doctrine, as you describe.

People will say "hey! at least some data is better than no data!", to which I'll say data is useless and even harmful in the absence of capable interpreters. In 2021, we have an abundance of data, but a paucity of people who are capable of critically interpreting it.

I don't know if it's a worse situation than we had 20 years ago. But it's definitely a different situation, that requires a new approach. I think people are taking notice of it, so I'm hopeful.


Thank you for writing this, I enjoyed reading it and largely agree.


Same, but the sentence "Individuals and companies realized that you don't need to pay people with experience any more, you can simply collect better data, and outsource decision-making to interpretations of this data" is probably demonstrably false, and at a minimum it'd be very difficult to prove it's true over the null hypothesis that we're as good as, if not marginally better than, before as a species.

To take it to an extreme, it's "well, we got horse carriages and cars, so no one is ever gonna run a fast marathon ever again; we're outsourcing everything to the hay eaters."

Yeah, no. And not that we shouldn't perhaps be more contemplative as a society, but as a species we generally don't atrophy capabilities that would otherwise be complementary and beneficial to us.

Case in point: it's relatively well known that "management ability" is a thing across cultures, experience, industries and training. The metrics point, at least in a narrow sense, to data-driven executives outperforming the "solely intuitive" bunch.


Kind of love the initial story in the article about 48-hour wait times.

I had a stint writing conferencing software for quite some time, and every once in a while we'd come across a customer requirement that had capabilities which were obvious to us developers "would be misused". As a result, we did the "Thinking, Fast and Slow" pre-mortem to help surface other ways that the system could be attacked (along with what we would do to prevent it and how it impacted the original feature).

If you create something, and open it to the public, and there's any way for someone to misuse it for financial incentive (especially if they can do so without consequence), it will be misused. In fact, depending on the incentive, you may find that the misuse becomes the only way that the service is used.


When the doctor's office is inundated with patient visits, you cannot fix scheduling back-logs by fiddling with the scheduling algorithm, no matter how much data you have.

Say the calendar is initially empty and 1000 people want to see the doc, right now. You can fill them all into the calendar, or you can play games that solve nothing, like only filling tomorrow's schedule with 10 people, asking 990 of them to call back. That doesn't change the fact that it takes 100 days to see 1000 patients. All it does is cause unfair delays; the original 1000 can be pre-empted by newcomers who get earlier appointments since their place in line is not being maintained.
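The arithmetic behind that, as a toy sketch (10 visits a day assumed, as in the example): the booking policy only reshuffles who waits, not how long the backlog takes to clear.

```python
backlog = 1000   # patients who want to be seen now
per_day = 10     # the practice's daily capacity

# Policy A: book everyone into the calendar immediately.
days_policy_a = -(-backlog // per_day)   # ceil(1000 / 10) = 100

# Policy B: only fill tomorrow's 10 slots and tell the other 990 to call back.
# Capacity is unchanged, so the queue drains at exactly the same rate; the last
# of the original 1000 still waits ~100 days (longer if newcomers jump the line).
days_policy_b = -(-backlog // per_day)

print(days_policy_a, days_policy_b)  # 100 100
```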


How do we read more than the initial story? Do we have to pay to read it? There is no indication on the webpage there is more than two paragraphs other than the advertisement for the author's book.


Not what I see, including in a Chrome incognito window (which shows me the whole article). Could you have an ad blocker or other extension that is going wrong? Do you use an unusual browser?


What is 'the "Thinking, Fast and Slow" pre-mortem'?


conferencing as in 'ComicCon' or 'Zoom'? Can you give an example?


Haha, hadn't even thought of that -- Conferencing as in developing bespoke (and some white-label) software for organizations deploying Office Communications Server (and R2), Lync and ultimately Skype for Business (I do a little Teams work these days but I am focused on other areas, presently).


> doctors would be given a financial incentive to see patients within forty-eight hours.

Not measuring that from the first contact that the patient made is simply dishonest.

"Call back in three days to make the appointment, so I can claim you were seen within 48 hours, and therefore collect a bonus" amounts to fraud because the transaction for obtaining that appointment has already been initiated.

I mean, they could as well just give the person the appointment in a secret, private appointment registry, and then copy the appointments from that registry into the public one in such a way that it appears most of the appointments are being made within the 48 hour window. Nothing changes, other than that bonuses are being fraudulently collected, but at least the doctor's office isn't being a dick to the patients.


It may amount to fraud, but in the context of that service, no record was kept of calls that didn't result in an appointment. My wife was a receptionist at a GP's around the time the article mentions, and in some cases it was worse than that: if you phoned and asked for an appointment and they couldn't give you one within 48 hours, they wouldn't offer one at all, telling you to call back later / the next day.

Although the New Yorker piece has leaned on the bonus angle, the way it was discussed publicly was that doctors weren't allowed to offer you appointments outside the 48-hour window [0].

It was a very silly interpretation of the rules, but I think GPs felt it was too rigid and therefore stuck to the letter rather than the spirit.

0 - http://news.bbc.co.uk/1/hi/health/3682920.stm


It's really hard to design a metric that can't be gamed. The problem here seems to be that doctors are under-provisioned for some reason, and long wait times are a form of load shedding for the system. Individual clinics have little control over that core issue, because they are generally boxed in by regulations over who can administer medical care, other than rushing appointments (which they're probably already doing). Without addressing it, there's not much they can do to solve the problem, so all they can do is try to game the rules or not get the bonuses.


Doctors don’t get to bill for idle time. Being less than fully utilized is leaving money on the table. The idea here is presumably to compensate them for leaving gaps in their schedules.


What you're describing is effectively what doctors did: they left their entire calendar free until the last moment and only took appointments then.

It turns out this is not actually what people want; people want this availability to exist, but also do not want to be turned away if they book ahead of time, which points to this being a capacity problem, not a scheduling problem.


Not to defend the system, but I don't think they were trying to fraudulently get the 48-hour payment. Rather, if they accepted advance bookings (which was, after all, the old system), then almost nobody would be seen within 48 hours. They could have offered advance bookings for follow-ups, but since most appointments are taken by the sickest people, many of them will be follow-ups, so this probably wouldn't have helped.

If you want to reduce the queuing time in a system you need to reduce the processing time (i.e. the duration of an appointment) or increase the number of servers (i.e. doctors). You can't do it by edict.


In hindsight, yes. But did you think of that before hearing of the problem? Even if you did, could you think through -- without much time to think -- how every possible metric I could propose, on every possible topic, might go wrong?

Tony Blair was trying to solve a real problem that needed solving. That he opened up a different problem is something we should think of as normal, and not blame him for trying to solve the original problem. The question should be how we change the metric until the unintended consequences are ones we can live with. That will probably take more than a lifetime to work out.

Note that there will be a lot of debate. There are predicted consequences that don't happen in the real world for whatever reason. There are consequences that some feel we can live with that others will not accept. Politics is messy.


Data always needs to be paired with empathy. ML/AI simply doesn't have empathy so it will always be missing a piece of the overall pie.

Let AI crunch the numbers, but combine it with a human who can understand the "why" of things and you can really kick butt.


I agree with you, although, unfortunately, most -- if not all -- engineers I know would respond to this by complaining about how "a human who can understand the 'why'" cannot be automated.


Confusing performance metrics with strategic objectives is not a data problem, it is a human problem. It happens to a lot of people outside the usual Blair-WhiteNationalist-IQ crowd. I do not think that advanced technical knowledge in ML or stats is required to avoid this mistake; what's required is the ability to make valid counterfactual statements.

A good example of what I mean can be found on Wikipedia:

His instinctive preference for offensive movement was typified by an answer Patton gave to war correspondents in a 1944 press conference. In response to a question on whether the Third Army's rapid offensive across France should be slowed to reduce the number of U.S. casualties, Patton replied, "Whenever you slow anything down, you waste human lives."[103]

https://en.wikipedia.org/wiki/George_S._Patton

Here, US general Patton is not confounding a performance metric (number of casualties) with the strategic goal (winning the war). His counterfactual statement could be that "if we slow things down, we are simply delaying future battles and increasing the total number of casualties needed to achieve victory."

I'm not surprised at Blair's decision. When we choose leaders, do we favor long-term strategic thinkers, or opportunistic pretty faces?


From the ungated archive [1]:

> Whenever you try to force the real world to do something that can be counted, unintended consequences abound. That’s the subject of two new books about data and statistics: “Counting: How We Use Numbers to Decide What Matters”, by Deborah Stone, which warns of the risks of relying too heavily on numbers, and “The Data Detective”, by Tim Harford, which shows ways of avoiding the pitfalls of a world driven by data.

Data is a powerful feedback mechanism that can enable system gamification; it can also expose it. The evil is extracting unearned value from a system through gamification, not the tools employed to do so. I'm looking forward to reading both books.

[1] https://archive.is/ynOm2


Data is very limited, indeed. We can't predict outside the distribution, or unrelated events (without a causal link), or random events in the future. We should be humble about the limits of data.


Sure, but coding a human-equivalent response to such events is trivial (because no one and nothing responds well to such events).


Does the use of statistics actually amplify misunderstanding, or merely reveal misunderstandings that were already there? In any of these examples given - predicting rearrests, infant mortality, or so on - it's hard to imagine that someone not using numbers would have reached a conclusion that was any closer to the truth.

Data has its limits, but the solution is usually - maybe even always - more data, not less.


It's pretty trivial to predict things without "data". Data just means using some measurement system to obtain measurements of some target phenomenon. Many targets cannot be measured, or have not occurred to be measured.

Reasoning counter-factually is trivial: What would happen if I dropped this object in this place in which an object, of this kind, has never been dropped before?

Well apply relevant models, etc. and "the object falls, rolls, pivots, etc.".

This is reasoning-forward from models, rather than backwards from data. And it's the heart of anything that makes any sense.

Data is not a model and provides no model. The "statistics of mere measurement" is a dangerously utopian misunderstanding of what data is. The world does not tell you, via measurement, what it is like.


But where does that model come from if not from data? We might use some logical principles to inform our model - but don't those principles themselves also ultimately have to be inferred from data about the world?


measurement of our bodies engaged in deliberate action

measurements of the world resolve ambiguities; they do not 'contain' descriptions of the world, nor can they provide any

measurements of objects must be interpreted by models


This article is total gibberish. It's a terrible mixture of many unrelated things. Just because those things all have something to do with data (anything can be presented in numeric form) does not mean their issues are about data.

First, the Tony Blair example is not about data. It is a failure of government planning: wrong politics and wrong economics.

The G.D.P. example is laughable. G.D.P. is never intended to be used to compare individual cases. What kind of nonsense is this?

And the IQ example: the results are backed by decades of extensive studies, yet the author thinks picking a few critics can invalidate the whole field. And look! The white supremacist who gave Asians the highest IQ, what a disgrace to his own ideology.

Many more. I feel it's kind of a tactic for producing this kind of article: just glue a bunch of stuff together, throw in some things that seem to be related, and bam, you've got an article.


Gotta disagree here. This article is acknowledging a pattern, that data is misused in many different areas.

I think the problem goes even deeper, which is a misunderstanding of the scientific method. Good discussion about this topic here: https://news.ycombinator.com/item?id=26122712


"data is misused in many different areas" is not a valuable / informative point.

There are many wrongs that seem to have something to do with data, but in fact they do not.

Take socialist economic planning: it will eventually fail, and then you would say they misused data. It seems relevant, but misusing the data is not the real cause of their failure at all.


Replace "socialist" with "large company". The companies gather data, establish metrics, and manage to those numbers, and often bad things result. Ever been in a company where some internal support function goes to hell because its top manager's bonus depends on a metric, and they can improve that metric by refusing to support the users (find excuses to close IT support calls without fixing the issue, etc).


Yes. But then some would say the company failed because it "misused data", when that was not the real cause of the company's failure. Any project that involves using data could blame its failure on "misused data", which is a useless conclusion.


An absolutely fantastic article that captures my concerns as a user, purveyor, and automater of systems that help with numbers. I'm always very cautious regarding the jump from numbers informing to numbers deciding.


This article is not so much about the data, as it is about rules and thresholds used to divide that data into groups.


"once a useful number becomes a measure of success, it ceases to be a useful number"

Two other unintended consequences of incentives I learned in economics:

1. Increasing fuel efficiency does not reduce gas consumption. People just use their car more often.

2. Asking people to pay-per-bag for garbage pickup resulted in people dumping trash on the outskirts of town.

Edit: Did more research after the downvote. Definitely double-check things you learn in college:

1. The jury is still out: https://en.wikipedia.org/wiki/Jevons_paradox

2. Seems false https://en.wikipedia.org/wiki/Pay_as_you_throw#Diversion_eff...


They might as well have included the granddaddy example (as far as the age of computing goes): the Vietnam War, McNamara, and body counts.



All observation is theory-laden; data cannot speak for itself.


Way around the paywall: https://archive.vn/ynOm2





