Is there a list of well-known studies that have been overturned because they haven't been reproducible? I'm wondering how much "common knowledge" that has come from those studies is actually true or not.
There are, but the "replication studies" tend to change little details for whatever reason, so they aren't really checking for reproducibility anyway.
I don't mean details like where/when the study was performed, because obviously the original authors thought it would generalize beyond their exact sample. I mean they will change the methods to ask different questions on a survey, etc.
Also, they use whether or not a result reaches statistical significance to decide if the same result was observed (rather than checking for quantitatively similar effect sizes)... The entire situation is ridiculous.
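To illustrate why "significant vs. not" is a poor replication criterion, here is a minimal sketch (the effect size, sample sizes, and the two-group z-test approximation are all illustrative assumptions, not from any actual replication project). Both "studies" sample the exact same true effect, yet the underpowered replication frequently "fails":

```python
import numpy as np

rng = np.random.default_rng(42)

def significant(true_effect, n):
    """One simulated two-group study: is the difference 'significant' (|z| > 1.96)?"""
    treat = rng.normal(true_effect, 1.0, n)
    ctrl = rng.normal(0.0, 1.0, n)
    se = np.sqrt(treat.var(ddof=1) / n + ctrl.var(ddof=1) / n)
    return abs(treat.mean() - ctrl.mean()) / se > 1.96

# A real effect (d = 0.3): a well-powered original (n = 200/group)
# and an underpowered replication (n = 50/group) of the same effect.
trials = 2000
disagree = sum(
    significant(0.3, 200) and not significant(0.3, 50)
    for _ in range(trials)
) / trials
print(f"'failed to replicate' rate despite an identical true effect: {disagree:.0%}")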
> One evening at the SIPS conference, after sessions had been concluded and beers consumed, a researcher asked Heathers for his opinion of a study that seemed suspicious. When Heathers decided he’d heard enough to render a verdict, he took a few steps back and began to shout.
> "Give me a B!"
> "B!" the assembled scientists replied.
> "Give me a U!"
And here we see how more bullshit gets started. Deciding that something sounds bad means it is wrong is exactly the same error as deciding that something sounds good means it is right.
> Deciding that something sounds bad means it is wrong is exactly the same error as deciding that something sounds good means it is right.
False equivalence. A correct paper inherently has no flaws, while a wrong paper inherently has at least one flaw. It is very possible to be able to immediately detect some classes of flaws, and that in no way means that one can immediately find all of the flaws.
When I was a TA, I found several people who cheated. One person I discovered because there's no way a struggling student in an introductory C course is going to pull out code with gotos and register keyword usage plastered everywhere. Do I claim to have caught everyone who cheated? Hell no.
I don't know if you've ever been subject to scientific peer review or not. One of the most frustrating things is reviewers who misunderstand the paper but are convinced they've found some sort of problem with it. Admittedly, these sorts of issues usually point to places where the paper could be clearer. However, the point still stands that it is far easier to misunderstand something than it is to understand it and find a flaw.
I would be very, very suspicious of any scientist claiming to have found a flaw in a work they've only been given a superficial description of. It may be accurate to say they have a hunch about where a flaw might be. But without really spending time with a paper, it would be foolish to claim any degree of certainty. Only very low-quality papers can be rejected so quickly.
There was an interesting study recently showing that prediction markets actually did a better job than journals of identifying 'fake' papers in the social sciences. [1][2] A group of researchers took 21 papers published in Nature and Science from 2010 to 2015; 13 could be replicated. The prediction market correctly identified all 13 replicable papers and 5 of the 8 'fake' papers, and gave the remaining 3 about 50/50 odds. One caveat: even among the studies that did replicate, the effect size was found to be, on average, only 50% of the stated effect size.
I think there's long been a perception that the social sciences are heavily influenced by people who take whatever their biases are, design an experiment specifically to confirm them, and then play with the numbers or the experiment's parameters until they manage to do so. This goes all the way back to (and certainly before) Zimbardo's Stanford Prison Experiment. There was nothing organic there: participants, both prisoners and guards, were heavily coached on how to act and, in their own words, saw the experiment more as an acting role than as emergent natural behavior. It seems that this perception is accurate.
In a society where people are increasingly radicalizing on social views, we ought to expect the social sciences to become even more dysfunctional in the years to come. This, in turn, casts a very dangerous cloud over the rest of science, since people tend to extrapolate from these actions and behaviors in the social sciences to science as a whole. In my opinion we need to start drawing a strong distinction between science that yields falsifiability and predictability and is driven exclusively by direct experimental results, and not-quite-science that is based on models and abstract experimentation, is not falsifiable, and does not provide meaningful predictions. By meaningful I mean that the whole point of a prediction is not to have something to encourage political action on, as is often the case in social science, but to serve as a litmus test for the accuracy of a hypothesis: if the prediction holds, that strengthens the hypothesis; if it fails, the hypothesis is false. Without falsifiability, predictions are worthless.
The replication crisis is larger than the social "sciences". Microbiology is also affected, and I've even seen evidence that occasionally computer science is affected, though to nowhere near the same extent and for different reasons.
Are there examples of these "data thugs" causing quality scientific work to be unduly criticized?
For example, good faith research that cannot be replicated is part of science. Are there researchers who have done such good faith research (no p-hacking, no egregious statistical errors) who now have to endure harassment and low-quality criticism because of the work of the "data thugs?"
"Bad faith" research is almost certainly non-replicable, but non-replicable research isn't necessarily in "bad faith".
But that doesn't matter, because non-replicable research is still non-replicable research.
If you study a nonextant phenomenon enough times, then you can produce a null-hypothesis rejection anyway; and if you study a nonextant phenomenon once, then that one study has a chance (albeit minute) of rejecting the null hypothesis anyway. The first is bad faith, the second is either bad luck or incompetence, but neither is an accurate representation of whatever was being studied.
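To put rough numbers on the first case: under a conventional alpha = 0.05 threshold, the chance that repeated studies of a nonexistent effect yield at least one "significant" result grows quickly. A minimal sketch (the study counts are arbitrary, chosen only for illustration):

```python
def false_positive_chance(k, alpha=0.05):
    """Chance that at least one of k independent studies of a null
    (nonexistent) effect rejects the null at significance level alpha."""
    return 1 - (1 - alpha) ** k

# With 14 attempts, a nonexistent effect is more likely than not
# to produce at least one publishable "finding".
for k in (1, 5, 14, 60):
    print(f"{k:>2} studies: {false_positive_chance(k):.0%} chance of a false positive")
```

This is the textbook multiple-comparisons arithmetic; it assumes independent studies and an exact alpha, both of which are simplifications.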
> good faith research that cannot be replicated is part of science.
>"But that doesn't matter, because non-replicable research is still non-replicable research"
Exactly, the people who say direct replications are unnecessary are the ones fostering an environment for fraud. To the scientist it doesn't matter much why you couldn't describe what you did well enough for other people to repeat it. Maybe you made it up, maybe it depended upon the (unmeasured) magnetic field in your room, whatever.
There is a replication crisis in every field that is not disciplined by a need to achieve externally-defined goals. Chemistry is in good shape because people apply it in order to e.g. create steel.
The difference between a scientist and an engineer is that the scientist answers the question "if I do X, what will happen?", and the engineer answers the question "what do I do in order to get Y to happen?". But if there are no engineers working off of the results, the scientist is free to say anything.
Sadly, as pointed out elsewhere in the subthread, this is a sufficient but not a necessary condition. Medicine has plenty of well-defined goals, but it doesn't know how to meet them. The cheap wisdom there is: if you want something badly enough, there will always be someone willing to sell it to you, whether or not they actually have it.
That's a criticism of a bunch of heuristics we've had for selecting promising theories. No popular theory has been proven wrong. It's just that we've been stuck for a really long time. And we need some kind of heuristics, otherwise we're just going to overfit to our data.
If you're interested in machine learning/deep learning/artificial intelligence, for example, I've got bad news about a lot of those papers that appear at NIPS...
I've yet to see anything from the current boom be anything more than empirical curve fitting in fancy clothes. I've lived through enough AI winters to not be surprised. Still disappointed.
UMAP is pretty cool. It's like the realization of a dream that's been hanging around (when it was called "multidimensional scaling") since the mid-20th century.
This[0] is a great video talking about problems with lots of studies in biology-adjacent fields including, but not limited to, psychology and nutrition.
Many of our field's greatest contributions to date are due to the perpetual nagging of pedantic zealots who condemn the laxness of other mathematicians accepting aesthetic brevity as a permissible excuse for unspecified leaps in reasoning.
The history of logic is a completely uninterrupted episode of crisis in replication, where the creative genius of intuition is targeted for termination as we insist upon rigorous definitions and disciplined mechanistic proof as the official workhorses while only regarding the art of surreal hypothesis divination and strongarmed notational analogies as just temporarily embarrassing stopgaps until we can tease out the procedure for automating the prodigies while managing the resultant verbosity of minutely incremental 'drudgery' through the factorization of symmetry/redundancy and begin the descent into madness of hierarchies of meta proof and possibilities of hypercomputation.
Maybe this is out of envy against the more naturally gifted such as Ramanujan, or rather out of desire to retire to the beach someday without neglecting one's responsibility in the search for ultimate knowledge, but more likely just the practicality of desiring an objective reality check when the deepest dives into the abyss of specialization and abstraction render oneself completely lonely and isolated from the checks and balances of haphazard social consensus in peer review or even the _physical_ universe's fairly arbitrary and low-precision demands in constraining and verifying empirical modeling/prediction that more casual disciplines such as sociology or physics typically set as the highest caliber of evidence.
There's a lot of discussion here on why Psychology has this problem more so than fields outside the social science realm.
I think we're all missing what is fundamentally flawed about academic psychology; and it's not their methodology.
In North America (perhaps elsewhere) you are required to have at least a Master's degree to practise Psychology, and you should have a doctorate if you want any mobility with your practice.
This leads to people who have no interest in academia having to find a way to convince people they've discovered something new and novel, just so that they can go apply what has already been discovered.
It's no surprise many of these studies can't be replicated! They were designed from the beginning to lead to significant findings so that someone could write their dissertation, get their doctorate and get the hell out of there.
And you know what? I do not blame them in the slightest. Academia is a nightmare and it's holding a whole profession hostage.
The problem being described is that becoming a clinical psychologist often requires doing social psychology research to get a graduate degree.
There are Doctor of Psychology (PsyD rather than PhD) programs that focus on clinical psychology training rather than doing research, but the majority of clinical psychologists still do a traditional research-oriented graduate program and there are far more schools offering them.
Premise:
> In North America (perhaps elsewhere) you are required to have at least a Master's degree to practise Psychology, and you should have a doctorate if you want any mobility with your practice.
Conclusion:
> This leads to people who have no interest in academia having to find a way to convince people they've discovered something new and novel, just so that they can go apply what has already been discovered.
I'm not rejecting the premise, I'm saying the conclusion is not supported by this article. None of the figures mentioned in the article (Daryl Bem, John Bargh, Susan Fiske, Brian Wansink, Amy Cuddy, Simine Vazire, etc) are clinical psychologists. None of the research described in the article is clinical psychology, or even appears to have been performed for clinical psychology.
Maybe clinical psychology has a replication crisis, I don't know, but there is no evidence here for the idea that clinical psychology degree candidates are causing the replication crisis in social psychology.
Heck, computer science has the same problem, and probably will for as long as psychology does, since just as every person is different, every business is different. Replication difficulties follow naturally.
One difference, though: despite the name, computer scientists aren't calling themselves scientists (I mean, what science do most of us do?). Calling yourself a scientist comes with a large amount of responsibility and accountability.
Academia is certainly not holding computer science hostage like this. Nowhere near as many people have graduate degrees in CS as people practicing their Psychology degrees.
I think computer science is still held hostage to academia if you are interested in doing more creative work. There is a difference between "practical research" where you are developing new ways to solve problems or apply technology and "academic research" where you are developing computer science theory. However, if you want to do practical research, then opportunities are limited if you don't have a PhD. Without one it seems like the primary options are CRUD apps.
It doesn't seem like this is the case in other engineering disciplines or even outside of engineering.
I am not really sure how you are defining 'research' but I think your definition might be wildly different from mine.
I have a feeling in the pit of my stomach that what you define as 'practical research' I would NOT define as research in any commonly accepted sense of the word.
> There is a difference between "practical research" where you are developing new ways to solve problems or apply technology and "academic research" where you are developing computer science theory.
this distinction doesn't hold water for me. look at adaboost! but moreover I am very curious what you think 'practical research' looks like versus 'academic research'. There is a spectrum in how abstract versus how concrete/empirical computer science research is, but 'practical research' is often still pretty complicated, with lots of theoretical concerns, etc.
> if you want to do practical research, then opportunities are limited if you don't have a PhD
research is hard, and people don't want to hire people for research roles unless they have a track record of doing research. the vast, vast majority of people with a history of doing 'good' research tend to have PhDs. However, this is not always the case.
>It doesn't seem like this is the case in other engineering disciplines or even outside of engineering.
I wonder if MA's and PhD's should now be minted for those simply trying to reinforce or reexamine certain areas. Or maybe trying them in different contexts: what happens to the principles if we repeat the experiments with African villagers, for example?
>"It's no surprise many of these studies can't be replicated! They were designed from the beginning to lead to significant findings so that someone could write their dissertation, get their doctorate and get the hell out of there."
"It’s a hard enough life to be a scientist," she says. "If we want our best and brightest to be scientists, this is not the way to do it."
And he’s a skeptic of this new generation of skeptics. For starters, Nisbett doesn’t think direct replications are efficient or sensible; instead he favors so-called conceptual replication, which is more or less taking someone else’s interesting result and putting your own spin on it.
Btw, I don't consider either psychology or medical research to be a science at this point. I just call them "research", which seems neutral enough to me.
disagreement and people being wildly wildly wrong in good faith [and other people pointing it out!] is an essential part of the scientific process. it is difficult because academics often closely associate themselves with the status of their scientific contributions.
No one really likes killing your heroes but sometimes it must be done.
If psychology claims to be a science it MUST collectively accept this. However we must be sure to deal kindly with the people behind the ideas.
> If psychology claims to be a science it MUST collectively accept this
I'm an undergrad in psychology. To my dismay, there is no consensus that psychology should be a science. You still hear claims that humans are too great to be measured, etc. That we are more than material, and therefore can never be studied objectively.
Of course, I disagree with this (everything real can be measured in some way), but the problem runs way deeper. It is an epistemic disaster, where most students do not try to learn the first thing about epistemology. Without exaggeration, some psychologists just want to tell nice stories. I remain baffled. I'm hoping time solves the issue, because I sure don't have a solution.
The transformation from a medieval guild to evidence-based practice was a painful journey in medicine, too.
For example, Semmelweis, 170 years ago. He gathered data and published findings that patient mortality was greatly reduced when doctors disinfected their hands before treating patients. Other doctors took this suggestion as offensive, rejected his ideas, and attacked him for suggesting them. He had a mental breakdown and died in an asylum.
> Rush continued to advocate his depletion therapy during the yellow fever epidemics in Philadelphia in 1794 and 1797, although his reputation and practice were already waning. By 1797, William Cobbett, the satiric journalist who frequently targeted Rush, was in full cry. He reviewed the 1793 bills of mortality for Philadelphia and showed that the mortality rates increased significantly following the institution of Rush's remedies. He characterized Rush's work as “… one of those great discoveries which have contributed to the depopulation of the earth.” When Rush referred to calomel as the “Samson of medicine,” Cobbett wrote:
>> Dr. Rush in that emphatical style which is peculiar to himself calls mercury the Samson of medicine. In his hands and those of his partisans it may indeed be justly compared to Samson: for I verily believe they have slain more Americans with it than ever Samson slew of the Philistines. The Israelite slew his thousands, but the Rushites have slain their tens of thousands (28).
> Rush sued Cobbett for libel in 1797. The case dragged on for 2 years, probably due to political maneuvering by Rush's enemies. Cobbett was found guilty and fined $5000 (later reduced to $4250), at the time the largest award ever made in Pennsylvania. The damage had long since been done, however, and Rush's practice had vanished by 1797.
We don't take calomel anymore, but it was used for over a century more, including in "teething powders" until 1954. There's a 1965 "Perry Mason" episode in which an attempted murder victim is given lemonade laced with mercuric chloride, with the dubious idea that this will be written off by investigators as a product of a reaction between his habitual calomel and a very weak acid.
Here's a fundamental problem with psychology reproducibility even in principle: experiments often depend on the subject not knowing about the experiment; therefore, the more well-known an experiment, the harder it is to perform.
For example, it would be difficult to perform a large Stanford Prison Experiment without any of the participants knowing about the Stanford Prison Experiment (which knowledge would taint the results). And if you select for ignorant participants, your sample is no longer random.
An interesting point and not untrue, but it's only important if we're trying to get at "human nature" in itself, which to me seems kind of silly as we're social animals by nature. In practice, there's nothing wrong with testing different levels of awareness, and if an experiment is part of popular culture then it's part of us too.
It should be noted that the SPE's methodology was totally fucked in the first place, and before reproducing it we'd need to actually perform it "right" at least once (which can't be done now that ethics* boards are a thing).
>if an experiment is part of popular culture then it's part of us too.
Sure, but so what? The point is that certain psychological experiments can, just by being known to the participants, have their results change, and this means in some sense there are fundamental problems with the replicability of those experiments.
I'm of the opinion that the complexity of possible human behavior and phenomena is too great to allow for some experiments to be replicated and controlled.
There are some scientific facts about humans that you can establish because you can do a replicated, controlled experiment and there are others that you can't, for ethical (no one should allow this experiment to be done) or material reasons (this experiment is very easy to conduct assuming you have multiple copies of the planet earth).
> I'm of the opinion that the complexity of possible human behavior and phenomena is too great to allow for some experiments to be replicated and controlled.
If these elements of human behavior cannot be confirmed by replicable experiments, what chance do we have of knowing about them?
Claims about such behavior are nothing more than stories, for they are not based on evidence.
> There are some scientific facts about humans that you can establish because you can do a replicated, controlled experiment and there are others that you can't
There is no such thing as a 'scientific fact' that cannot be established by a replicated controlled experiment. The stated dichotomy is really important though, and one that it seems psychology has partially failed to make. For it lost its reputation by presenting 'stories' as scientific fact.
That is not to say there is no value in studying human behavior that is beyond science, but we need to realize that we cannot treat the result of this as 'scientifically true'. Instead, it is something like 'intuitively true based on anecdotal experience'.
How many different ways are there to form a solar system? I am sure the requirements for experiment and possible combinations of factors are just as daunting.
So, I don't think that's the real problem. I think the main problem is that in the failing fields they look for differences between groups, while in the successful fields they look for "universal" rules.
That has always been the impression I got, but I kind of dismissed it as being a thing of a past that you only read about in old psychology textbooks. It is kind of baffling that this persist in the 21st century.
There is really no excuse for them not trying to replicate each other's studies this whole time, to the point that it was allowed to build into a crisis. That is a very old and well-known part of the scientific method. So maybe they were somehow so ignorant of science that it was "good faith", which doesn't sound any better than doing it on purpose to me.
And psychology is hardly the only area of research, or even the worst, when it comes to this. E.g., medical research seems to be worse.
>There is really no excuse for them not trying to replicate each other's studies this whole time
I think that this statement fails to account for the incentives for researchers: what kind of things they get success, praise, and support for doing. Despite their importance to the scientific project of any field, replicating experiments and publishing negative results are not rewarded the way that new studies with positive results are.
I was in academia dealing with the same stuff. I worked constantly for years to save my project once I figured out what was going on. I got it to the point where at least it wasn't BS, but after that was too burned out to go further. At every step there were social (not scientific) obstacles to me doing a good job.
After completing the degree, I quit rather than produce fake science or spend my time trying to fight against those who are. I have been there, and have no pity for people who choose to produce fake science.
Sure, in some areas of physics they reproduce studies millions or billions of times by figuring out how to get it done cheaper and cheaper until students can do it:
> disagreement and people being wildly wildly wrong in good faith
Unfortunately, the most important foundational studies of psychiatry were bullshit. And maybe they were in good faith, but it's incredibly hard to believe that. Freud, for instance, was either a serial sex offender and mass accomplice to sexual abuse, or an incredible rube ...
You see Freud saw sexual fantasies everywhere in women. They would fantasize about getting raped by this or that family member. There's a tiny little problem with that ...
It's come out in more than a few cases that these patients were in fact raped, assaulted and ... by family members and others. Afterwards these "patients" saw everyone's intentions as sexual and expressed that by seeing sexual abuse everywhere and worrying about what would happen to them.
Strange, isn't it? Freud never once saw it. God knows how many women tried to convince him to help them get justice; it must have been over 100 at least. Never once did he help.
It's generally accepted that Science increases accuracy over time, but I don't think people understand what that looks like.
Intuitively, people tend to think a random idea is 50/50 true or false. But the space of possible ideas is vastly larger than the space of true ones, so random ideas are almost always false. That is how fields like medicine demonstrate you can spend generations looking into something and still be wildly wrong.
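The base-rate point can be made concrete with standard positive-predictive-value arithmetic (the power and alpha values below are illustrative assumptions, not figures from the thread):

```python
def ppv(prior, power=0.8, alpha=0.05):
    """P(effect is real | a study found p < .05), by Bayes' rule.
    prior: fraction of tested hypotheses that are actually true."""
    true_pos = prior * power         # real effects correctly detected
    false_pos = (1 - prior) * alpha  # null effects falsely "detected"
    return true_pos / (true_pos + false_pos)

# If only 1 in 20 tested ideas is true, most "findings" are false alarms,
# even with decent power and a conventional significance threshold.
for prior in (0.5, 0.05, 0.01):
    print(f"prior {prior:>4}: PPV = {ppv(prior):.2f}")
```

Under these assumed numbers, a field testing mostly long-shot hypotheses will publish mostly false positives even if every individual study is run honestly.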
Freud like all early researchers had no idea what was going on. But, as a baseline to be improved upon he still provides part of a useful foundation for the discipline.
if after introduction of such a system, the number of "hysteria" diagnoses or whatever modern equivalent starts dropping substantially, it would prove Rush right.
> Freud like all early researchers had no idea what was going on.
I would like to point out that the more you read about this issue, the harder this is to believe.
I hadn't mentioned this, but obviously Freud had a clear financial incentive to enable and hide the sexual abuse that the people who paid for his services committed. And of course, there's the possibility (as some of his patients claim) that he himself also committed those acts, so he may have been protecting himself. (and I don't mean the weird and obviously sexual treatments used, I mean he may have actually raped his patients)
That's one.
Second, even now there are very atrocious signs about psychiatry and related fields, like child services, both academic and in practice, as both have seen systematic, often legally supported abuse. Just read that link. Some of it, like the abuse in the Soviet Union, is in the past.
Most of it is not.
The very basis of child services, the "show me on this anatomically correct doll ..." study, was shown to be a fraud. Kids simply point out what they find interesting, and the outcome is random: no correlation to abuse at all. In the study, researchers simply lied about what the kids pointed out. Oops.
How many kids were locked away by child services because they pointed at the wrong part? There can be no doubt that we're talking tens of thousands, minimum.
And of course, https://www.ncjrs.gov/App/Publications/abstract.aspx?ID=2335... . There are no words. The government is unwilling to stop its use even when fraud was shown. And let's just not mention the question of retracting and fixing decisions made in the past.
Very similar issues exist for mental services.
On the practice side there is open abuse as well. The main child services decision maker in Norway "used child porn" for 20 years (of course this person has also assigned kids to himself, because of course he did ... note that these children have not been taken away from him after he was found out, nor has he been jailed for what he did. Add to that that the organization he was part of is regularly accused of endemic racism, excessive violence, kidnapping, torture, and worse, causing the deaths of children)
That was 1 year ago. I mean, I guess he was doing it for the past 20 years (minimum), but this person got caught 1 year ago. And the system, of course, supported him after he got caught. How can anyone, at this point, believe that the child services organization has, in the most generous interpretation, not been infiltrated and is being used to enable and legalize paedophilia rather than prevent it? The worst interpretation, of course, is that there never was any other goal. (In Norway, but equally horrifying scandals have occurred in the UK, the Netherlands, Belgium, France, Spain and Italy. And I'm pretty damn sure that's only because I haven't checked many other countries, or just don't speak the language.)
The only reason it "provides a useful foundation for the discipline" is that there are many people whose livelihood depends on that being true ... even when we know it's not just a lie: it was at least partly a bunch of criminals hiding their crimes. And today it's no different; it at least somewhat serves to help disgusting abusers and torturers commit and hide their deeds.
Nor was it different in the past, as we now know of plenty of scandals. Nor was it different under other systems. Democracies, dictatorships, communist states, monarchies, ... it doesn't matter, it happened in all of them. Christian, Muslim, atheist, militant atheist states ... and let's just quickly mention what Japan got caught doing: chemical weapons research on mental patients ... whose status as mental patients was established under very dubious circumstances. None have received compensation, or an apology, because of course not.
> you can spend generations looking into something and still be wildly wrong.
Unfortunately that is very clearly not what happened. Criminals and immoral politicians decided what sort of theory would be useful and the result of that is still used today as authoritative.
And even then we're disregarding the fact that even today results in psychology are so often rather thinly veiled frauds.
Note how "trustworthy" this person was. Before he was found out, here's his biography:
> Stapel obtained an M.A. in psychology and communications from the University of Amsterdam (UvA) in 1991.[3] In 1997 he obtained his Ph.D. cum laude in social psychology from the UvA.[3] He became professor at the University of Groningen in 2000[3] and moved to Tilburg University in 2006, where he founded TiBER, the Tilburg Institute for Behavioral Economics Research.[3] In September 2010, Stapel became dean of the social and behavioral sciences faculty.[3]
> Stapel received the "Career Trajectory Award" from the Society of Experimental Social Psychology in 2009.
Unfortunately, all of this was the result of about 15 years of constant fraud, a fact complained about by double-digit numbers of his students, with zero results.
Needless to say, his book is used by child services in the Netherlands, and they have not made any retractions, apologies, or changes to any of their decisions (which result in incarcerating children) as a result of this.
At what point do we say "this is a total failure" and call these people off? I guess they're just too useful for corrupt politicians.
I have absolutely no idea why your comment is being downvoted; I guess people incorrectly read it as saying that any kind of self-described child service (or caretaker) is necessarily abusive. The problem is when people blindly assume good faith without any provable and verifiable system to ensure this is so.
Or perhaps they think you hold this opinion because they blindly suspect you don't think kids (or people in general) in trouble deserve care.
Under such criticism, how is it possible to demand publicly verifiable care, if the moment one demands it you are branded as cheaping out on poor souls?
>No one really likes killing your heroes but sometimes it must be done.
It's true. You've got to get your hands dirty. I'm confident I learned as much about the Dark Triad from repeatedly stabbing Delroy Paulhus as I did from merely reading his work.