Note that the projects were not chosen based on their results, and certainly not because the results are counterintuitive. Rather, they are the ten best-understood social programs we could find.
Hey, this is really neat. It reminds me of the lesswrong.com community, where the goal is to figure out ways of offsetting cognitive biases.
One thing I've always wanted to see is a mandatory A/B test built into any new legislation. That is, funding for a study after X years of the new law or program being in place would have to be part of the law. It could even include a threshold for the desired controlled effect of the law, or else the law is immediately revoked at that point.
This way we'd force people to argue not about second-order issues ("I think mandatory sentencing is good" vs. "No, I think it's bad") but about first-order issues they could agree on ("We all want to see a 10% decrease in robberies within 3 years"). A bad program would then automatically expire if it didn't meet its goals, and the goals would have to be open enough that people could throw out bad laws ("Goal: incarcerate 20% of the population" wouldn't pass, no matter what your political leaning).
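A minimal sketch of what such a sunset-clause check could look like; all names and numbers here are hypothetical, not from any real statute:

```python
# Hypothetical sketch: automatic sunset-clause evaluation for a law.
# All names and numbers are illustrative, not from any real statute.

from dataclasses import dataclass

@dataclass
class SunsetClause:
    goal: str                # e.g. "10% decrease in robberies"
    required_effect: float   # agreed target, e.g. -0.10 for a 10% drop
    review_after_years: int  # when the mandatory study is due

def law_survives(clause: SunsetClause, measured_effect: float) -> bool:
    """The law stays on the books only if the measured effect meets
    the threshold that was agreed on before the law was passed."""
    return measured_effect <= clause.required_effect

clause = SunsetClause("10% decrease in robberies", -0.10, 3)
print(law_survives(clause, measured_effect=-0.12))  # True: law is kept
print(law_survives(clause, measured_effect=-0.05))  # False: law is revoked
```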
> One thing I've always wanted to see is a mandatory A/B test built into any new legislation
This is what I find great about the US and the EU (I am from the Czech Republic): in a federal system, you can test legislation in one state first, see the results, and let the other states adopt or avoid that approach.
Unfortunately, strong moneyed interests break that model by lobbying hard in Washington and Brussels from the start, which I think is a bad trend, but I have no idea how to prevent it. The effect of having more cultural diversity is very hard to quantify economically. (There is also the other side of the coin, though: too much state competition can lead to polarization and war.)
Sometimes you have to be careful with this approach, though. For example, if one state lowers taxes or adopts pro-export policies, it may gain, but the same policy enacted by everybody will make everyone lose.
There are a lot of things which work better as federal programs than as state/local programs. Homeless populations spring to mind: if some cities treat their homeless populations poorly, they can 'export' the problem to another city or state, while a state with overly generous benefits may see in-migration.
Education is another: states that export their most educated population have little incentive to subsidize their higher education systems, which has likely led to the dismantling of state-funded schools as people became more mobile.
> which has likely led to the dismantling of state-funded schools
I disagree that this is a necessary outcome. It may just as well happen that backward states take notice and improve their higher education systems in various ways.
But I think what you are saying is a fair point, and I actually mentioned it. What I would add to my comment is that the common solution to this is called the "principle of subsidiarity", and it's used in the EU and in Switzerland (which is another example of a federation that often tests different variations of a law at a smaller scale first).
It is unethical NOT to A/B test, because you may be doing the worst thing for all patients. You're right that it should be generally known this is done.
I certainly agree for trials of treatments for specific ailments. But insurers, hospital administrators, and others have always made decisions that balance clinical outcomes against the cost of care without patients knowing such a decision has been made. The experiments I refer to are similar to those choices, and typically the funding will go to testing choices that haven't been made because they are more costly in the short run.
For certain types of legislation, the A/B test itself would create a perverse incentive for opponents of the legislation to make the negative effects occur in order to get it revoked.
For example, if you said "we're going to put twice as many cops on patrol, but we'll revoke this if crime rates don't decrease" then a rational criminal would commit more crime, in order to get the law revoked.
Or if you said "we're going to tax carbon emissions, but we'll revoke the law if domestic oil production drops by more than 20%" you've just invited oil companies to shut down their wells and wait you out.
It is not in a criminal's best interest to commit more crime. One criminal is probably only responsible for 0.0001% of the crime rate in any given city, so the incentive to commit more crime to get the law revoked is infinitesimal. The disincentive created by the increased risk of incarceration would completely overwhelm the incentive to create more crime.
This isn't true when the effects are measured in an industry with only a few major actors. For example, say you decide to test out net neutrality to see what effect it has on broadband cost, so you create a bill that says "net neutrality, unless broadband costs go up." Then Comcast could just increase their prices to deactivate the bill. But situations like these would be easy to stop, and would be the exception.
What if robberies decreased by 20% for some unrelated reason and the legislation increased robberies by 10%? It would still look like a 10% decrease overall.
I would think the law would have to be designed to be compatible with an experiment, such as being phased in at different times in different places. You'd really have to design the experiment and set objective pass/fail criteria before passing the law; otherwise all kinds of interested groups will distort it.
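One way to separate the law's effect from unrelated trends (the concern above) is a difference-in-differences comparison between places that got the law early and places that haven't yet. A minimal sketch with invented numbers:

```python
# Hypothetical difference-in-differences sketch with invented numbers.
# "Treated" regions got the law early; "control" regions got it later.

def did_effect(treated_before, treated_after, control_before, control_after):
    """Change in treated regions minus change in control regions.
    The control change absorbs trends unrelated to the law."""
    return (treated_after - treated_before) - (control_after - control_before)

# Robberies per 100k: both groups fell due to an unrelated trend,
# but treated regions fell less -- the law actually *increased* robberies.
effect = did_effect(treated_before=500, treated_after=440,   # -12%
                    control_before=500, control_after=400)   # -20%
print(effect)  # +40 per 100k attributable to the law, despite the overall drop
```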
I would like to quote your comment in a blog post I am writing about the test. With your permission, I would quote the last two paragraphs as an example of how the test results show change is needed and, voila, here's a way to change things. Great idea, btw. A twist on sunsetting.
I enjoyed the quiz, but I am not a big fan of the positive_effect/no_effect/negative_effect classification, because it hides the measurement error.
Instead, I propose to use a slider, like:
> "In your estimate, how does the second response program influence the rate of repeat offenses?"
> Underneath, a slider on a one-dimensional axis, with appropriate labels. For example, on the leftmost side: "50% increase in repeat offenses", and on the rightmost side "50% decrease in repeat offenses".
> Clicking the "reveal" button superimposes the measured effect and its error bar (this is important!) onto the slider.
That way everyone can see if the measured effect is relevant, or if the measurement error is so large that the study might have missed a relevant effect.
The downside of a slider is that the slider range will influence ("prime") the responses. But I doubt that it would influence the direction of the guess, so you could still extract useful information out of the responses.
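A minimal sketch of how the reveal step might grade a slider guess against the measured effect and its error bar (the function name and all numbers are made up):

```python
# Hypothetical sketch: compare a slider guess to a study's point
# estimate and error bar. All values are illustrative.

def grade_guess(guess: float, estimate: float, stderr: float) -> str:
    """Treat roughly +/- 2 standard errors as the study's error bar."""
    low, high = estimate - 2 * stderr, estimate + 2 * stderr
    if low <= guess <= high:
        return "consistent with the measurement"
    if low <= 0 <= high:
        return "outside the error bar, but the study can't rule out no effect"
    return "contradicted by the measurement"

# Slider in [-0.5, +0.5]: change in repeat offenses (negative = decrease).
print(grade_guess(guess=-0.30, estimate=-0.10, stderr=0.08))
```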
That would be better, though quite a bit more effort!
Usually these studies only have enough resolution to say "probably worked", "no measurable effect", or "probably negative", and even then they are often wrong.
It's very interesting how this quiz plays so well into your political and personal biases.
What I take away is reinforcing my bias that intentions and consequences are separate and that we should not conflate the two. However, I imagine many others see this and think "X is bad but Y is good. Let's do more of X and less of Y"
I was thinking about why this website triggered my skeptic alarms and came up with a few things.
1. The web interface makes it hard to go back. The first thing I want to do when I see a surprising answer is reread the question. I understand this is likely just bad engineering, but it makes the whole site feel less trustworthy.
2. Some questions are summarized poorly. For example, the mindfulness question asks "What effect does mindfulness based stress reduction have on self-reported mental health (rates of anxiety, stress, and depression)?" but then summarizes the choices as "reduction rates of mental health issues." You can't drop a word like "self-reported" from a question; after all, physically disabled people self-report being happier after their disability (e.g. http://www.ncbi.nlm.nih.gov/pubmed/21870935).
Also in the "Drug Substitution Programs" question the text indicates that the research is based off of cases where "Addicts were given heroin or substitutes such as methadone or buprenorphine, based on their needs," however the choices are formed "Positive effect - Prescribing heroin to addicts reduces crime rates," [note that it dropped the or substitutes]. This feels like going for shock value.
3. At the end, the website pushes a newsletter very hard. Apparently the site is focused around a career guide? If you truly have no axe to grind, then present high-quality information and I'll naturally explore the site more.
4. If your objective is really to raise awareness about how often media publicity for social interventions doesn't reflect efficacy as measured in journals, then I would expect you to propose a plan of action at the end, such as "When hearing about a social program, you can use Google Scholar to tap into research findings..."
This is great! One of the interesting things about social sciences (and engineering, although just like computer hacking, social engineering has unfortunately strong negative connotations) is that the real world is often counter-intuitive. In particular, people tend to believe in punishment a lot, while it commonly escalates the problem (I like the quip "emphasis on punishment is the sign of an obedience frame").
I am dismayed that I didn't do better than random chance, even though I like to read about social issues. I think this really shows the importance of empiricism in social sciences and engineering.
The problem with reward/punishment strategies is that rewards are inevitably deferred, and people who are in problematic situations likely have a much lower ability to deal with deferred satisfaction. That's just how it is.
Unfortunately, because bureaucracy is mostly run by people with great skills in the area, this fact is really hard to get across, or is even rejected outright. After all, if I-Court-Judge or I-Politician could keep it together enough to get where I am, why can't anybody else manage just a little bit? And then the morality discourse kicks in and there is no way out.
Why not? You don't think the outcomes of the presented solutions were evaluated using scientific method? It would be nice if you could elaborate.
I understand your point, but if we were fair then software engineering shouldn't be called engineering either. Just like in social sciences, there is a lot of normative judgement and opinion compared to observation and experimentation. But I think both are getting better with time, and as we continue discourse, we employ empiricism more and more.
It would be nice if you explained in more detail what exactly you mean. IMHO, this is too reductionist. I can only guess:
I imagine punishment works better on children than on adults. I imagine it works better on psychopaths than on people who have empathy. I imagine it works better on people who coldly calculate than on people who impulsively give in to their emotions. I imagine it works better on people from the same ingroup than on those from an outgroup.
Each of these items alone can explain why you can see the effect psychologically but not socially.
The elderly care question said that there was no effect on 3-year mortality, which is fine, and I'm sure that is accurate, but it doesn't answer the question "Did it have a net positive/negative effect on quality of life during those 3 years?"
Not being a negative Nelly, just an observation that something as complex as a social interaction problem can't always be summed up with a clear net gain/loss.
Having conclusive evidence that a program doesn't work (or even is harmful) is significant. How effective is it for actually getting such programs stopped? Have any been?
Scared Straight is highly effective PR. Hood-looking young people getting yelled at by scary men? Possibly crying? It's like schadenfreude for "civilized" society.
I don't understand how policy leaders can look at hard numbers and say, "fuck it, let's keep funding it."
Tucker: I don’t know, but I can get one by this afternoon. The thing is, you’ve been listening to the wrong expert. You need to listen to the right expert. And you need to know what an expert is going to advise you before he advises you.
The "home visits for older adults" question is a little disingenuous. The purpose is not really to reduce death risk, but to increase quality of life. I don't think that counts as "no effect", just "no effect on mortality", which is unsurprising.
> The purpose is not really to reduce death risk, but to increase quality of life.
Not necessarily. There are many at home services geared around attempting to improve medication adherence in elderly patients for example, as they frequently are on several medications concurrently and have trouble keeping track of dosing requirements due to cognitive dysfunction. Non-adherence can have a significant effect on longevity.
I got way fewer of these right than I would have hoped. It's clear that I need to update my understanding of the questions presented. I owe a debt to those who put this together. Thanks very much.
If it helps, any time you choose "no effect", you're making a guess about how thorough the experiment was, not about the program itself. Every program will have some effect, but it may be too small to have been detected. Similarly, if the correct answer is "no effect", that just means the studies weren't precise enough to detect whatever effect there was.
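A minimal sketch of that point, with invented numbers: the same true effect is invisible to a small study and obvious to a large one.

```python
# Hypothetical sketch: why "no effect" often means "effect too small
# for this sample size to detect". All numbers are invented.

import math

def ci_half_width(p: float, n: int) -> float:
    """Rough 95% confidence half-width for a difference of two
    proportions, each estimated from n participants."""
    return 1.96 * math.sqrt(2 * p * (1 - p) / n)

true_effect = 0.03  # the program really cuts recidivism by 3 points
for n in (100, 1000, 10000):
    hw = ci_half_width(p=0.30, n=n)
    verdict = "detectable" if hw < true_effect else "reads as no effect"
    print(f"n={n:>6}: +/-{hw:.3f} -> {verdict}")
```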
It would be nice to see a summary of the projects and their effects after finishing the quiz. By that time, I had already forgotten about some and wondered what the answers were.
I find it a bit unintuitive that "no change" errors are valued the same as "opposite" errors (e.g. answering "positive" when the change was actually negative). A discrepancy over whether there is any change at all is far less interesting than an effect that is actually opposite to what one would apparently expect.
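A minimal sketch of the kind of asymmetric scoring I mean (the weights are invented):

```python
# Hypothetical asymmetric scoring: getting the direction backwards
# costs more than missing "no effect". Weights are invented.

PENALTY = {
    ("positive", "negative"): 2,   # opposite-direction error: worst
    ("negative", "positive"): 2,
    ("positive", "no_effect"): 1,  # existence-of-change error: milder
    ("no_effect", "positive"): 1,
    ("negative", "no_effect"): 1,
    ("no_effect", "negative"): 1,
}

def penalty(guess: str, actual: str) -> int:
    return 0 if guess == actual else PENALTY[(guess, actual)]

print(penalty("positive", "negative"))   # 2: expected the opposite effect
print(penalty("positive", "no_effect"))  # 1: the effect just wasn't there
```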
Additionally, as others have pointed out, some of the questions and answers aren't very clear (though maybe it's just my reading comprehension that's failing). I too am unhappy about the way the elderly question is posed: it's not clear whether all of the programs were actually focused on reducing mortality. In fact, the introduction mentions that it's merely one of several goals, and from anecdotal evidence I would expect the mortality metric to be thrown off by patients stabilizing once their health has deteriorated enough to require permanent care. That you're less (or equally) likely to die doesn't mean you're more (or equally) healthy.
This may be specific to me (though I'm using the latest chrome, latest mac os, so I doubt it), but the "share on facebook" link didn't work at all for me, and the facebook "like" link took me to a page to share the post on my timeline, not just "like" it. Didn't see any easy route to report such things, so here it is!
While the content is great, the tech seems bloody awful. Why is it so slow to load? It takes 5-10 seconds to load on both my laptop and my desktop (both < 1 year old) and I have nice fast fibre. Why would you start with a loading screen in this day and age? And worse a loading screen that doesn't tell you what it's loading. Flash is dead.
It severely impacts the usability: it looks like nothing is happening, and I can imagine a lot of people just closing it before it loads.
And it's (virtually) static content! Why doesn't it load with the opening page at least?
You can read about the findings of our research into whether people can guess what works and what doesn't ahead of time: http://www.vox.com/2015/8/13/9148123/quiz-which-programs-wor...