What's wrong with social science and how to fix it (fantasticanachronism.com)
142 points by michael_fine on Sept 11, 2020 | 101 comments



> Publishing your own weak papers is one thing, but citing other people's weak papers? This seemed implausible...

This is practically required by reviewers and editors. If you wade into a topic area, you need to review the field and explain where you fit in, even though you know full well many of those key citations are garbage. You basically need to pay homage to the "ground breakers" who claimed that turf first, even if they did it via fraud. They got there first, got cited by others, and so are now the establishment you are operating under.

And making a negative reference to them is not a trivial alternative. For one thing, you need to be certain, not just deeply suspicious of the paper, which adds work; and taking a stand may bring a fight with reviewers that hurts you anyway.


Citing a paper needn't be a celebration of it; you can cite a paper to say "these guys find X, but..."


> You basically need to pay homage to the "ground breakers" who claimed that turf first, even if they did it via fraud.

Even referring to it as “science” is fraudulent. Testable theories and repeatable outcomes, anyone? Time this whole field was defunded.


Might be that I am older than most of you, but when I was in high school the term was "social studies"; later it evolved into "social science" so the university crowd could legitimize it and get more funding. There obviously is no science in it.


"Social studies" refers to a variety of different social science and humanities subjects taught in public education. The term comes from US lawmakers around 1900.

"Social sciences" are generally considered the "science of society," and the term comes from philosophers in the mid-1800's.


> "Social sciences" are generally considered the "science of society," and the term comes from philosophers in the mid-1800's.

Some of the philosophers named it "science" of society in order to piggyback off the reputation of real science.

Because real science has such a great reputation, everyone from creationists ( creation science ) to social "scientists" ( social science ) has tried to associate themselves with science to gain credibility.

Social science isn't science. Neither is creationism.

Real science ( as most people understand it ) deals with natural laws and the natural world.


Society is an observable phenomenon that we are able to run experiments on and acquire empirical evidence about. Those experiments have so far been much more difficult to run, rely on poor measuring tools, and the limited data we have acquired is given too much weight. That's still science though.

They very much are a part of the natural world, and presumably have some kind of "natural laws" guiding them. Creationism is not, as it is about things outside our observable world.


> Society is an observable phenomenon that we are able to run experiments on and acquire empirical evidence about.

Yes, we can. But not in the scientific way. Society changes, so you cannot replicate your test on a society. Natural laws do not change, so we can reliably hypothesize about them and test them.

> That's still science though.

It's not a science. By definition.

> They very much are a part of the natural world, and presumably have some kind of "natural laws" guiding them.

Yes. They have natural laws guiding them. It's called physics, chemistry, biology, etc. Social "science" doesn't delve into "natural laws". Society is run not on natural laws, but on human laws.

> Creationism is not, as it is about things outside our observable world.

No it is not. It is about our observable world but uses the bible ( another thing in the natural world ).

Laws, history, political science, economics, etc. aren't real sciences. They are political/propaganda/etc tools to shape society.

Using your logic, just because literature exists in the natural world, it must mean literature is a science. Obviously it is not.

Just because you can observe society doesn't make it a science, no more than the fact that you can observe a play or a concert makes them a science.

In order for your claim to hold, society has to be constant. But it is not. We must have multiple copies of the same society to test on. We do not. And societies are guided not by natural law, but by human law. Which leads to an absurdity, because different human societies have different laws. If we follow your logic, that means there are different natural laws depending on where you live and what country you are from.

For example, that there are laws against interracial or interreligious marriages in some societies isn't a natural law, it's a human law. There is no science in social science. No more than there is science in religion or astrology.


Math is the foundation of science, right? My perception is that things in science are provable because of the rigorous mathematical proofs applied to data, correlating with reproducible results.

There are many models of societies currently and previously in existence that can be measured, compared, and correlated with predictable cause and effects.

Society is not constant, it's probabilistic, like many topics in physics or other hard sciences.

Society has natural laws that can be mathematically derived and applied. Why can't science be applied to non-natural processes? The point of science is to build a model that can be used to build other models, or ultimately be applied to game the subject.

Are you saying we learn physics just to learn it, not to manipulate it to somehow benefit ourselves? Don't we use research in physics as a tool to shape our world in exactly the same way we use research in social science to shape society?

Your definition of science seems to only exclude some folks from the term that are doing the exact same work in a different subject. The word science doesn't preclude a subject. I don't think your contrast is sound.


> Math is the foundation of science, right?

No. The scientific method is the foundation of science. Math is a wonderful tool that is used in science and other fields like business, architecture, music, etc.

> Things in science are provable because of the rigorous mathematical proofs applied to data, correlating with reproducible results, is my perception.

Nothing in science is "provable". Science is not math or logic. Go look up what a scientific theory is.

> The point of science is to build a model that can be used to build other models, or ultimately be applied to game the subject.

That is not the point of science. It is a benefit of science but not the point of science.

> Your definition of science seems to only exclude some folks from the term that are doing the exact same work in a different subject.

Not my definition. I didn't make it up. Just because you haven't a clue what science really is doesn't mean it's "my" definition. How about you take a few seconds and relearn what was taught in high school.

You have a bizarre notion that science exists to benefit ourselves. Religion, art, literature, music, business, etc also benefit us. Doesn't make it science.

Society is a human creation with human laws. That is no more "natural law" than the man-made laws governing chess or monopoly are "natural law".

Now that doesn't mean social "science" isn't worthy of study. Of course the humanities are worthy of study. But it isn't a science. I think politics is just as valid a field of study as physics or art or literature. But that doesn't make political science a science.

But your real worry is that removing science from social science takes away much of the credibility and respect that real scientists in physics, biology, chemistry, etc have earned, right? In my experience, it's usually the social "scientists" who are the most vocal about latching onto "science" because they so desperately want to cling to that unearned authority and credibility.

But then again, it's also why creationists ( creation science ) also want to latch on to science. Pretty much frauds who want to manipulate society want something to give them credibility.


>No. The scientific method is the foundation of science

Simple question then, which aspect of the scientific method is not compatible with "human law?" I can develop a question about how society works, form a hypothesis, develop experiments, and analyze the results.

I may develop poor experiments and as a result only be able to provide limited analysis permanently tainted by a "human law" bias, but it would still be science.


That's just ignorance of how science works. There are no experiments in astronomy either ... or are there?

Natural experiments. Just as much as they happen in economics and sociology. And even though stars and black holes might be simpler than humans, it doesn't mean the soft sciences are not science.

That said, exactly because of the additional complexity, the bar should be higher for what counts as acceptable research, viable methods and appreciable results.


There were some stunning claims being made on Twitter last month based on a recently published study. Instantly skeptical, I dug into the methodology section and found this gem:

"It should be noted that the results cannot be estimated using a physician fixed effect due to a numeric overflow problem in Stata 15 which cannot be overcome without changing the assumptions of the logit model."

... The sad part was they didn't even choose a reasonable model in the first place.


(Ignore my previous reply, I found it myself.) To be fair to the authors, it is not their primary specification; that was a linear probability model. The logit model is just a robustness check to make sure the linearity assumption isn't driving the results.


Yes, the primary specification was a linear probability model for the likelihood of a binary dependent variable conditioned on two binary input variables. As far as I could tell, the fit was max likelihood without regularization and the paper's bombshell conclusion was based on the regression coefficients' p-values.

The Stata thing was just one of many, many red flags.
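For anyone curious what that kind of robustness check looks like outside Stata, here's a minimal sketch in Python with statsmodels; the column and file names are hypothetical, not taken from the paper:

    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical data: one row per encounter, a 0/1 outcome,
    # two binary regressors and a physician identifier.
    df = pd.read_csv("encounters.csv")

    # Primary specification: linear probability model (OLS on the binary outcome)
    lpm = smf.ols("outcome ~ treat1 + treat2", data=df).fit(cov_type="HC1")

    # Robustness check: logit with physician fixed effects. Thousands of dummy
    # variables can make maximum likelihood slow or numerically unstable, which
    # is roughly the trouble the quoted methodology section is describing.
    logit_fe = smf.logit("outcome ~ treat1 + treat2 + C(physician_id)", data=df).fit()

    print(lpm.summary())
    print(logit_fe.summary())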


When your robustness check fails because of numerical overflow...


I mean they couldn’t just do it in Matlab? or Python? So incredibly lazy.


Python, R, Matlab etc. are outside of many people's skillsets. I've tried to evangelize to many fellow researchers, but they simply don't have the time or the interest in novel programming languages when the tools they have (largely) work.


Matlab is quite expensive.


Which paper was this?


Is there a similar study done on the physical sciences? I’m getting a bit of holier-than-thou feeling from this article.

Edit: from all this talk of reproducibility, I wonder what percentage of cutting edge ML research is reproducible (either from lack of public training sets / not enough compute)


There are definitely studies criticizing ML publications similarly. As a kind of statistics (but often without the rigor), screwups make ML methods appear better than they really are. Hence the literature is packed with screwups.

Other CS subfields that get a lot of criticism are "network science" and bioinformatics.


I've been playing with some ML lately. I'm sure some of this is because I only have a high-level understanding of what I'm doing, but it really feels like throwing things at the problem and seeing what sticks and is fast. Add features, remove features, try different network architectures, different activation functions, etc. I think the best understood thing might be overfitting, which is oddly reassuring.


> ...throwing things at the problem and see what sticks and is fast.

That generally isn't enough to get published nowadays though, at least in a simple sense (in a broader sense that process might describe all research, of course). To get published requires some deeper demonstration of a new kind of method that not only works, but is superior to all that preceded it in some important way. In other words you show your new method compared to other methods, where yours must be better. Obviously, bad research here can show a supposed advantage by either doing an unfair job applying competing methods, or overfitting the new methods. Or both. As I understand it, the first one is quite common: comparing a poorly-tuned old method to a carefully-tuned new method.


Well yes. I remember hearing that there are now meta-networks, networks that optimize other networks by mechanisms you described.


ML has the benefit though of a rapid turnaround time from Academia -> Industry. Things that work/replicate will be immediately used to make money. Things that don't work will be abandoned pretty quickly (at least outside of ML researchers).


There are tons of replication issues across the sciences, they are just most salient in the Social Sciences because the topic is just really hard to study well.

Clinical trials can often be flawed, even if the stats are fine, just in how they sample. For example, women are often excluded from trials due to hormonal changes, but how drugs impact women is really important! Participants are also typically drawn from specific locations, and so may not be representative of people with different diets, lifestyles, and environmental factors.

Physics has its own controversies, though not always directly related to replication. For example, Harry Collins recounts the social factors involved in the discovery of gravitational waves: https://blogs.sciencemag.org/books/2017/03/28/harry-collins-...


Biological sciences are more often than not just as difficult to reproduce, mostly due to the difficulty of controlling living organisms, the somewhat random nature of the outcome, and p-hacking.



He mentions that epidemiology actually has more severe problems than economics. Having read some epi papers, I understand why. Not sure if you'd count that as a physical or social science though: at least theoretically it's biologically based, but in reality the data it works with is mostly social and demographic.


This guy overstates his case somewhat. Consider:

"If the original study says an intervention raises math scores by .5 standard deviations and the replication finds that the effect is .2 standard deviations (though still significant), that is considered a success that vindicates the original study!"

Why the exclamation point here? The replication study isn't magically more accurate than the original study. If the original paper finds an 0.5 standard deviation effect and the replication study finds an 0.2 standard deviation effect, that increases our confidence that a real effect was measured, but there's no reason to believe that the replication study is more accurate than the original study. Maybe the true effect is less than measured, but maybe not. So yes, it should be considered a success.


When I advise decision makers on reading statistics (in my case, state-wide health data), I urge them to focus on effect size and only use significance as a filter. Two reasons:

1. Effect size is the most important thing. The point of the study is (usually) to guide decisions. Sticking with the article's example, let's say combining both studies shows the increase is likely 0.35 standard deviations. Is the intervention still worth the cost? Is it still the best option?

2. If there's enough data (e.g., an observational study) or a good chance of omitted variables, there's going to be a "statistically significant" difference. No matter what's measured. I would bet my life's savings there's a statistically significant difference in profits of New York businesses depending on whether the owner's named Jim or Bob. A replication of the experiment with all Jim and Bob businesses in another state would also guarantee significance. So it's a coin toss whether the second study would "successfully replicate" the same direction of effect.
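Point 2 is easy to demonstrate with a toy simulation (entirely made-up numbers): given a big enough sample, a difference far too small to matter still comes out "statistically significant".

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n = 1_000_000

    # Two groups whose true means differ by a trivial 0.01 standard deviations
    jim = rng.normal(loc=100.0, scale=10.0, size=n)
    bob = rng.normal(loc=100.1, scale=10.0, size=n)

    t, p = stats.ttest_ind(jim, bob)
    print(f"p-value: {p:.1e}")                                   # astronomically small
    print(f"effect size: {(bob.mean() - jim.mean()) / 10:.3f}")  # ~0.01 SD, negligible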


I think his point here is that the effect in replication is closer to 0 than to the original claim. It might be more obvious if he chose an order of magnitude difference as an example - going from the dominant factor to technically-not-nothing might be replication but it's not vindication.


They reported two facts, statistical significance and effect size, one of which was not replicated. Of course this doesn't prove anything definitively, but it still arouses suspicion. As for whether it's exclamation-point-worthy, at worst it depends. For example, 0.5 standard deviation improvement in math scores might normally require a ton more expensive effort or even have been thought previously impossible, while a 0.2 was easy and not publication-worthy.


If someone is selling you on rethinking an entire discipline they are probably overstating their case a bit. Or you’re ignoring them outright.

That doesn’t mean they’re wrong, necessarily. Overcoming inertia is a huge challenge. Daunting, even.


Agree. In the face of institutional inertia - here crossing industry and academic fields - your starting point is "when I see a fly, I use a cannon." It is extremely difficult to budge. And so the usual reminder: the first 51% of communication is repetition.


> the first 51% of communication is repetition

Is this a famous saying? It sounds nice but I hadn't heard it before. (And Google doesn't pop up anything obvious.)


That's my own turn of phrase.


I have found that the repetition usually only works if it comes from multiple sources. It's the primary reason I encourage the 'new guy' to take their "this doesn't make sense" questions up the chain of command or to the groups we collaborate with.

One, I hate to crush a spirit, although I'm just sending them to someone else to do it. Two, about one time in ten they come back with a preliminary roadmap to a solution.

I can tell Steve until I'm blue in the face that this code is nuts and get nowhere, or I can send three other people to tell him once and finally he'll take it seriously.

I don't think it's malicious, I think it's a combination of basic human psychology with learning strategies. Sometimes you have to shop for new instruction because the perspective your teacher brings simply doesn't resonate. Each person reinforces the neural connections and phrases it a little differently (perhaps especially with new people, because they are fresh off the street and don't use our jargon yet?)


It depends on how the replication is done, but the big replication projects typically use very large sample sizes compared to the original studies, so their error bars are much smaller.
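A quick sketch of why (made-up numbers): the standard error of a difference in means shrinks like 1/sqrt(n), so a replication with 10x the sample has error bars a little under a third as wide.

    import numpy as np

    def ci_half_width(n_per_group, sd=1.0):
        # ~95% confidence interval half-width for a difference in group means
        return 1.96 * sd * np.sqrt(2.0 / n_per_group)

    for n in (50, 500, 5000):
        print(f"n per group = {n:5d} -> half-width ~ {ci_half_width(n):.3f} SD")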


Indeed, and even if the replication isn't significant, it doesn't mean that the replication and the original study significantly differ from each other.

Overall, the condescending trash talking in this article led me to flag it.


Could you speak a little more on that point?


bruh


I'm sorry, but this is such a ridiculous counter argument I'm speechless.

An exclamation point as criticism?

>How dare he wear tweed, his argument is invalid.

>The replication study isn't magically more accurate than the original study. If the original paper finds an 0.5 standard deviation effect and the replication study finds an 0.2 standard deviation effect, that increases our confidence that a real effect was measured

It also increased our confidence that the effect is small enough to be ignored. You can't pretend that the two studies are independent from each other. The second is directly the result of the first and you need to use Bayesian methods to calculate your belief of the result. The questions of 'is there an effect' and 'the effect size is >= 0.5 sd' give you two vastly different probabilities and vastly different policy responses.


> An exclamation point as criticism

As in "eats, shoots, and leaves", a little punctuation can totally change the meaning of a sentence. In this case, a period would have expressed agreement while an exclamation point expresses incredulity.


!


The "social sciences" include a lot. Wrt Sociology, I'd say one problem is the overemphasis on quantitative methods - they try to be as serious as the big boys.

The best sociological research I've read was qualitative though. Questionable replicability is of course built-in in this type of research but the research dealt with relevant questions. Most quantitative sociology seems rather irrelevant to me.

Another problem is of course that most quantitative sociologists don't have a clue what they are doing. They don't even know the basics of statistics and then use statistical methods they don't understand. It's some kind of overcompensation, I think. Although psychologists are even worse in this respect. It's really fun to watch a psychologist torturing SAS.

I write this as someone who was originally trained as a sociologist and over the years turned into a data scientist.


I’m really interested in what you feel about the potential applications of CS/ML to sociology. Or if you might have any resources that talk about that.

I ask because I’m enrolled in a research program in “computational humanities”. My initial feeling towards the program is that it’s kind of a sham.

Computational Humanities seems to be as computational as an accountant using Excel for their work. Not that I particularly mind, I’m not very interested in the computational aspect at all.


There is a Springer "Journal of Computational Social Science". That could be a source of inspiration. CS/ML in Social Sciences gets interesting where a great amount of routine data is generated -- i.e. areas close to public administration.

Why did you enroll in the program when you're not interested in the computational aspect? Or are you more interested in some kind of grand social theory/philosophy of computation? If you read German, Armin Nassehi "recently" published "Muster. Theorie der digitalen Gesellschaft" (Patterns. Theory of the Digital Society). He is not the first, but I find his stance interesting - based on several interviews; I haven't read the book though. Many sociologists deal with the Internet & AI but I find those works less inspiring because they usually lack an adequate technical understanding. To me it often feels like bushmen theorizing about empty Coca-Cola bottles (you probably don't know the movie?).


I've tried to understand this (obviously quite angry/ranty) article and cannot actually figure out what data it has.

It seems to not be based on actual replication results, but predicted replication results? But then the first chart isn't even predictions from the market, but just the author's predictions?

The author clearly has a real hatred for practices in the social sciences. But I don't see any actual proof of the magnitude of the problem, the article is mostly just a ton of the author's opinions.

Is there any actual "meat" here that I'm missing? Or is all this just opinions based on further opinions?


It's based on this: https://www.replicationmarkets.com, which is linked in the first paragraph of the article.

Per https://www.replicationmarkets.com/index.php/rules, volunteers are predicting whether 3000 social science papers are replicable. According to the rules, of those 3000 papers, ~5% will be resolved (i.e. attempts will be made to replicate). According to the article, 175 will be resolved. It's unclear to me who exactly will do that work but I would guess it's the people behind replication markets dot com (they are funded by DARPA). The rules say that no one knows ahead of time which papers will be resolved so I assume the ~5% (or 175) will be chosen at random.

The data in the article seems to be based on what the forecasters predicted, not which papers actually replicated (that work hasn't been done yet...or at least hasn't been made public). The author of the article is assuming that the forecasters are accurate. To back up this assumption, he cites previous studies showing that markets are good at this kind of thing.

The tone is ranty but, by participating in the markets, the author is putting his money where his mouth is.


I think you're right. Take a look at the before/after curves for "this is what the predictions look like after the papers".

The before curves are gaussian+ distributed and pessimistic, but the after curves are all distinctly bimodal (or worse). This suggests that one population of the participants was made broadly more pessimistic by their surveys and another population broadly more optimistic.

This could instead be a measurement of how people's trust in science is predicated on how well it matches their own prior beliefs.

+ A sharper eye shows they aren't quite bimodal in the prior belief. Even in those cases, the separation between the modes gets much wider.


No, you are exactly right.


I try not to look down on social science, for the most part data is data as long as you can reason about how it was collected and who by.

The only thing that worries me a little (or a lot sometimes) is that there doesn't seem to be much "bone" for the meat to hang off of - that is, in physics, if your theory doesn't match experiment it's wrong, whereas in social science you're never going to have a (mathematical) theory like that, so you have to start (in effect) guessing. The data is really muddy, and thanks to recent (good) political developments, whatever conclusions can be drawn from it may not be acceptable in some people's eyes. For example, (apparently) merely commenting on the variability hypothesis can get you fired [https://en.wikipedia.org/wiki/Variability_hypothesis#Contemp...].


Requiring social science theories to have a mathematical foundation might be a little too much to ask of social scientists because, unlike physicists, their command of mathematics is far from adequate for any serious exploration.

I majored in Mathematics but out of curiosity I took some Psychology modules when I was in university. What I found baffling was their lack of attention to detail. They just seem to have an intuitive model of their subject, and they were reinforcing that intuition while overlooking any details that could have challenged it. Coming from a field where every symbol and punctuation mark matters, I realised that to Psychologists the exact details of a curve don't seem to matter much as long as the general trend makes sense.

Someone who really impressed me was Dan Ariely, who is a behavioral economist. Even though I didn't see any mathematics in his lectures, I loved his approach to the field. I'd be quite happy if more of social science took a similar approach even if they didn't back it up with rigorous mathematics.


I read one guess that 2/3 of the published results in social science are wrong. Suppose you tried to develop a deeper theory of these things and derived consequences from these “results” as one does in math and physics. If your corollary depends on 4 prior results, each with a 1/3 chance of actually being true — assuming no logical errors on your part — then the chance your result really is correct is (1/3)^4 ≈ 0.012. With results like this it’s not going to be easy to get much depth that holds water.


[flagged]


This is a bit of an injustice to the history of sociology, and science writ large. What Weber does is help establish, axiomatically not experimentally, the base definitions for the system of sociology. So yes, it is a fiction from another system, but it must be assumed true in that system. This is, ironically given your accusations, a purely mathematical approach to creating a field.

It is taught in 101 because it is important when teaching a subject for the learner to be able to answer “And what again are we doing here?”


The physical sciences have no less "old white man conjecture" in their history.


They have, but it is not taught as science. Physics 101 does not study Aristotle's ideas as part of physics. Not because they are wrong but because they are not scientific. It starts with Galileo and Newton.


The first chemistry lecture I ever had started with something to the effect of "Chemistry emerged from alchemy, and we'll not speak of this again!"



Threadreader version: https://threadreaderapp.com/thread/1304399437641461760.html

He mentions changing the threshold for significance as a possible tweak but the issue is something more fundamental. Humans have flaws - like political biases or a tendency to favor one’s own hypotheses (confirmation bias). Humans also operate within systems that have incentives that can motivate them away from truth seeking (publication bias). All this exacerbates the fundamental problem that statistical techniques are easy to manipulate. Virtually all academic (university) studies, in their published format, simply lack the necessary information, controls, and processes a reader would need to easily detect flawed statistical claims. Instead a reader has to blindly trust - assuming that data was not selectively included/excluded or that the parameters of the experiment were rigorously (neutrally) chosen or whatever else. There is no incentive for the academic world to correct for this - there isn’t, for example, a financial consequence for a decision based on bad statistics, as a private company might face.


I am glad this topic is getting attention. There is significant bias in academia in social science even outside flaws in statistical techniques. The field has been weaponized to build foundational support for political stances and blind institutional trust granted to academia is enabling it. This author mentions the implicit association test (IAT) as an example of a social science farce that is well known to be a farce, and notes that most social science work is undertaken in good faith.

However the damage has been done and it doesn’t matter if MOST work is done in good faith if the bad work has big impact. As an example, IATs have been used to make claims about unconscious biases and form the academic basis of books like “White Fragility” by Robin DiAngelo. Quillette wrote about problems with White Fragility and IAT as early as 2018 (https://quillette.com/2018/08/24/the-problem-with-white-frag...), and others continue to write about it even recently in 2020 (https://newdiscourses.com/2020/06/flaws-white-fragility-theo...). However few people are exposed to these critical analyses, and the flaws in the scientific/statistical underpinnings have not mattered, and they have not stopped books like White Fragility from circulating by the millions.

We need a drastic rethink of academia, the incentives within it, and the controls that regulate it to stop the problem. Until then, it’s simply not worth taking fields like social science seriously.


It could just be abolished. If the NSF was scrapped, what would the scientific world look like? Probably like the world before WW2, when most science was done by hobbyists or "inventors". Einstein did his seminal work whilst a patent clerk, etc etc.

Most analyses of the problems in science are really analyses of the problems in academia. There's no iron law that states academia has to be funded to the level it is today, and for most of history it wasn't. And let's recall, that these meta-studies are all about science, which is one of the better parts of academia. Once you get outside that into English Lit, gender studies, etc, the whole idea of replicating a paper ceases to be meaningful because the papers often aren't making logically coherent claims to begin with.

A lot of people look down on corporate funded science, but it has the huge advantage that discoveries are intended to be used for something. If the discovery is entirely false the resulting product won't work, so there is a built-in incentive to ensure research is telling you true things. The downside is there's also no incentive to share the findings, but that's what patents are for.

Of course a lot of social psych and other fields wouldn't get corporate funding. But that's OK. That's because they aren't really useful except for selling self-help books, which is unlikely to be a big enough market to fund the current level of correlational studies. That would hardly be a loss, though.


> If the NSF was scrapped, what would the scientific world look like? Probably like the world before WW2, when most science was done by hobbyists or "inventors".

There were scientists who received financial backing from wealthy individuals in a manner not so different than VCs operate today; Tesla among them.

Regardless, I tend to agree that science that exists for the sake of publishing, because publishing is a requirement of receiving grants, has diluted the respectability of science.


Does anyone have links to the Replication Prediction Market results mentioned in the article? That sounds super interesting.

As an amusing nudge, I bet you could do some ML to predict replicability of a paper (per author's suggestion that it's laughably easy to predict) and release that as a tool for authors to do some introspection on their experimental design (assuming they're not maliciously publishing junk).
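As a rough sketch of what such a tool might look like (purely illustrative: the features, labels and data file are hypothetical, not anything from the article or Replication Markets), you could start with a plain classifier over surface features of each paper:

    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    # Hypothetical dataset: one row per paper with hand-coded surface features
    # and a 0/1 label for whether a replication attempt succeeded.
    papers = pd.read_csv("papers.csv")
    features = ["p_value", "sample_size", "effect_size", "n_hypotheses", "is_interaction_effect"]

    X, y = papers[features], papers["replicated"]

    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    print("cross-validated AUC:", cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean())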


Here’s the paper, published only recently: https://royalsocietypublishing.org/doi/10.1098/rsos.200566

> I bet you could do some ML to predict replicability of a paper (per author's suggestion that it's laughably easy to predict)

I am betting any such ML system could be gamed and addressing the issue would ultimately still need humans in the loop. For example, what if I am selective with my data, beyond the visibility of ML evaluating the final published paper? I don’t think this is “laughably easy” to predict. It may be easy to spot telltale signs today that predict replicability, but as soon as those markers are understood, I imagine authors will simply squeeze papers through the cracks in a different way.

Another issue is this bit from the author on Twitter:

> Just because it replicates doesn't mean it's good. A replication of a badly designed study is still badly designed. There are tons of papers doing correlational analyses yet drawing causal conclusions, and many of them will successfully replicate. Doesn't mean they're justified.


IIRC from prior discussions of this, a lot of the accuracy of the markets comes from people just applying common sense - like, if a really surprising claim that people should really have noticed before now comes with a huge effect size, it's probably false. ML can't judge that because it doesn't have the ability to do basic sanity checks on claims like that. It takes a sceptical human with life experience to do that.


Huh? That sounds exactly like the thing an ML system would learn quickly from the data. You probably don't even need shiny deep learning (though it helps).

Just like with the Netflix Prize stuff, where the conclusion was very similar, ie. just dump in as much data as you can, crank up the ML machinery, and it'll discover the features (better than you can engineer them) and learn what to use for recommendation ranking. And that's basically what we see with GPT-3 too. If you have some useful labels in the data it'll learn them even without supervision, because it has so many parameters, it basically sticks.

Get some papers, run them through a supervised training phase where every paper is scored based on how retracted/bad/non-replicating it is, and you'll get a great predictor. Then run it to find papers that stick out, have a human look at them, and try to replicate some of them to fine-tune the predictor. Plus continue to feed it new replication results.

That said, using an ML system as the gatekeeper as OP suggested is a bad idea, as it'll quickly result in the loss of proxy variables' predictive power.

Though ultimately a GPT-like system has the capacity to encode "common sense".


Even GPT-3 doesn't encode common sense, which is why it can't do a lot of basic physical reasoning. It's "just" word prediction, albeit very impressive word prediction.

If you look at what GPT produces closely, a lot of it is simply bullshit. It sounds plausible but is wrong. That's exactly the wrong type of AI to detect plausible-but-wrong-bullshit papers, which are the most common type.


Right, I worded that a bit lazily. There's no confidence score output from GPT-3, but if there were and if the user would select to only get high confidence results then it would shut up quickly. And that's what I meant by common sense. Of course it depends on the corpus. It's really-really just text, as you said. (It's possible that it can somehow eventually encode high level things like arithmetic, but so far it seems, even if it does have that model somewhere embedded, it doesn't know how/when to use it.)

The language model (GPT-3) doesn't have to understand physics, it just has to help extract some of the semantics of the paper.

There needs to be a classifier on top trained with a labeled set of good and bad papers.


I think there is a confidence score actually! Most blogs about it don't show them but this one went into it:

https://arr.am/2020/07/25/gpt-3-uncertainty-prompts/

It's really cool how the uncertainty prompts alter the confidence associated with the next words.

I guess I'm not disagreeing with you in the abstract that a theoretically strong enough AI could identify bad papers, especially if it had some help for 'real' arithmetic. It at least could flag the most basic issues like plagiarism, cited documents that don't contain the cited fact, etc. Detecting claims that are themselves implausible seems like the hardest task possible, however. That's very close to general AI.


> Detecting claims that are themselves implausible seems like the hardest task possible, however. That's very close to general AI.

Yes, of course. I was simply trying to say that an AI can be quite successful in detecting the usual no-nos, eg. multiple comparisons without correcting for it, p-hacking, or ... who knows what "feature" the classifier would find. Maybe there's simply none, so it'll be really up to subject matter experts to review them. (But it's unlikely, because there are quite successful blogs devoted to simply picking apart shoddy papers simply based on looking at the controls, and other parts of experiment design and the methods sections, and of course the aforementioned stats.)


> Even if all the statistical, p-hacking, publication bias, etc. issues were fixed, we'd still be left with a ton of ad-hoc hypotheses based, at best, on (WEIRD) folk intuitions.

This is the quiet part which most social scientists, particularly psychologists, don't want to discuss or admit: WEIRD [0] selection bias massively distorts which effects are inherent to humans and which are socially learned. You'll hear people today crowing about how Big Five [1] is globally reproducible, but never explaining why, and never questioning whether personality traits are shaped by society; it's hard not to look at them as we look today at Freudians and Jungians, arrogantly wrong about how people think.

[0] https://en.wikipedia.org/wiki/Psychology#WEIRD_bias

[1] https://en.wikipedia.org/wiki/Big_Five_personality_traits


I'm not sure that psychologists really even make the distinction between "what is socially learned" and what is "inherent to humans" to be honest. I want to say no one really denies traits are influenced by social factors, but I'm sure you could find some citation to the contrary somewhere.

The Big Five are pretty reproducible in part or in whole, but it's a strawman to say psychologists are "never questioning whether personality traits are shaped by society." That's just not true, nor is it even clear what that question means. Go to Google Scholar and search for "Big Five" and terms like "measurement invariance" or "cultural" or "social" or "societies" and take a look.

The Big Five are meant to be descriptive; the "why" is a different issue. (Just to explain it a different way, let's say you do unsupervised learning of cat images, and find over and over and over and over and over again over decades and different databases that the algorithms always return the same 5 types of cats, plus or minus a little. Wouldn't you make a note of it if you were interested in visual types of cats?) And it's important to remember that some consensus around the Big Five wasn't really reached until the 90s (even today I'm not even sure there's "consensus" around the Big Five).

I agree that there's a problem with selection of participants, but the only way to do that is to increase participation of the scientific community worldwide. And there are whole fields (cultural psychology) dedicated to the problems surrounding this issue.

The Freudian comparison is also worth commenting on in two respects: first, Freudians got in trouble for not pursuing falsifiable empirical research, which is simply not the case for the things you're talking about. Second, everyone loves to hate on Freud, but the basic tenets of unconscious versus conscious processes that sometimes conflict are still a bedrock of neurobehavioral research, including two-system theories ("fast and slow"), which won someone a Nobel prize and are a darling of cognitive researchers. There are legitimate discussions to be had about the utility of two-system theories but those discussions are far more sophisticated than the criticisms I think you're referring to.


You're right that I'm thinking of very basic criticisms. In particular, there's zero evidence that humans aren't p-zombies [0] and no definitive rejection of the Dodo Bird hypothesis [1]; in combination, this suggests that psychologists are both wrong to imagine that there's anything interesting going on inside of a human's mental states, and also wrong to try to classify those mental states into appropriate and inappropriate states. Instead, what ends up getting studied is society's own idea of what ought to be happening inside our homoncular Cartesian theater [2].

Given these foundational issues, it's folly to try to support Big Five or any other descriptive model just by saying that it's a good fit for the numbers. Any principal component analysis will find something which factors out as if it were a correlative component. This dooms Big Five just as reliably as it dooms g-factors or Myers-Briggs or any other astrology-like navel-gazing.

(If you want an example of actual five things showing up again and again and again, mathematics has examples [3][4][5], but it turns out that when actual five things show up, then the reaction is not to serenely admire the correlation, but to admit terror before cosmic uncertainty. Psychologists do not seem to go insane and kill themselves like the pioneers of statistical mechanics or set theory; have they really seen the face of god?)

[0] https://en.wikipedia.org/wiki/Philosophical_zombie

[1] https://en.wikipedia.org/wiki/Dodo_bird_verdict

[2] https://en.wikipedia.org/wiki/Cartesian_theater

[3] https://en.wikipedia.org/wiki/ADE_classification

[4] https://en.wikipedia.org/wiki/Monstrous_moonshine

[5] https://en.wikipedia.org/wiki/Classification_of_finite_simpl...

[6] https://en.wikipedia.org/wiki/Hard_problem_of_consciousness


There's also zero evidence that humans are p-zombies, and plenty of criticism that p-zombies don't work as a thought experiment.

It's a pretty big leap to throw away consciousness on the back of equal outcomes in psychotherapy. There are partial rejections of the Dodo bird verdict in your link.

The Cartesian theatre doesn't account for the mind's ability to imagine things that never were. As soon as you account for that via some emergent material property we have an opening to inject the properties of consciousness back into the discussion.

It's easy to say that the Big Five's cross-cultural statistical correlations are not good enough to describe people, though to dismiss it entirely off your grounding is not really going to work?

Repetition of natural structures is common. Many idealised aesthetic styles rely on that, like the Fibonacci spiral. Why would a fixed and repetitive uncertainty be any less terrifying than any other kind of uncertainty? We don't know what's before the big bang or what colour people really see in their mind when we say red.


I don't really think that you're cogent here; it seems like you just wanted to say things which stand in opposition.

While it's true that there's no empirical evidence within humans for the question of p-zombies, Occam's Razor neatly shows that the world without souls is the simpler one; both worlds look just like ours, but one of them requires all of these additional unfalsifiable unverifiable claims about souls and consciousness and inner experiences and etc.

The Dodo Bird's strength comes from the multitude of different models of therapy which have existed over the decades. We know, from history, that the memes of psychotherapists leaked out into popular culture and slowly altered how we talk about thinking. Nonetheless people are more neurotic (more diagnosed with mental disorders) than ever before! So the memes of psychotherapy do not on their own decrease mental disorders. People will look back on our current decade and think how silly we were to focus on "medications", "hormones", "chemical imbalances", "neurotransmitters", etc. just like we now think it's silly to focus on "repressed memories", "dream interpretation", "hysteria", "oedipal urges", etc.

Humans cannot imagine anything completely novel. Every thought which a human ever has is unoriginal and hopelessly tied to the human experience. This is known as the anthropic bias and has been known about for millennia. If you believe in souls, you have an uphill battle in terms of evidence, including here.

Please stop believing in souls. There's no evidence for it and it turns your arguments to mush.


The Big Five is a much less impressive accomplishment than you’d think for how much people talk about it.

https://carcinisation.com/2020/07/04/the-ongoing-accomplishm...

> The interesting thing about the Five Factor Model is what it gets away with, in terms of being considered a theory, even though it is not causal, and makes no predictions. What counts as a “replication” of the Five Factor Model, as in Soto (2019), is the following: a correlation is found between one or more factors of the Five Factor Model and some other construct, and that correlation is found again in another sample, regardless of the size of the correlation. In almost all cases, and in 100% of Soto (2019)’s measures, the construct compared to a Big Five factor is derived from an online survey instrument.

> What counts as a “consequential life outcome” is also fascinating. In most cases, the life outcome constructs are vague abstractions measured with survey instruments, much like the Big Five themselves. For instance, the life outcome “Inspiration” is measured with the Inspiration Scale, which asks the subject in four ways how often and how deeply inspired they are. Amazingly, this scale correlates a little bit with Extraversion and with Open-mindedness. Do these personality traits “predict” the life outcome of inspiration? Is “Inspiration” as instrumentalized here meaningfully different from the Big Five constructs, such that this correlation is meaningful?


The people who use Hanlon's razor to explain away malice are both incompetent and malicious. Only someone who is an idiot would ever think to use 'I'm very stupid' as an excuse or explanation why they did something very damaging. If you are smart enough to realize you are incompetent after the fact you were smart enough to realize it before the fact, and that means you were malicious in not recusing yourself.


The way this was stated in one discrete thought leads me to a problem with human nature I dwell on: how much of what our society and culture is, and what authority is, is just an effort to disguise our frailty and fallibility? It is tremendously hard to be reliable and competitive across multiple disciplines and for the majority of tasks involved in basic human life. There is too big a trade-off between available time and location and doing any task well. We are primarily hunting-gathering the easiest ways to meet our needs. How can you blame people for not recusing themselves from participating, or for misrepresenting themselves as competent, when our culture values that and expresses it so dramatically at the highest levels of public performance, from IPOs to high office and everywhere else? Storytelling in the tradition of the heroic myth is mostly about becoming qualified to assume a social role, as an upward struggle.

It seems built into human character to bite off far more than we can chew, as in free real estate, and then leverage the social value of holding something others are willing to compete for. I think it amounts to a social survival instinct, and I lament how there's very little chance of discouraging people from doing it because of the potential payoff. If anything I think it's a failure of institutions for being built to exploit that competition rather than guard against its excesses.


Imho we also tend to underestimate the impact of cognitive biases on our own views and behaviors. We are often largely unaware of this. In this case, I find that Hanlon's razor is too simple, with its black and white distinction between incompetence and malice. Biases often fall in neither category.

People who view themselves as rational / technical might be even more prone to not realizing how much they are affected by this. If your self-image is that you are a very rational person (more rational than others), you might be especially prone to denying and therefore not being aware of biases.


I guess I fall under the field of "Progress Studies" though I think I'm much less concerned with the replication crisis than most.

Most new social science research is wrong. But the research that survives over time will have a higher likelihood of being true. This is because a) it is more likely to have been replicated, b) it's more likely to have been incorporated into prevailing theory, or, even better, to have survived a shift in theory, and c) it is more likely to have informed practical applications or policy, with noticeable effect.

Physics and other hard sciences have a quick turnaround from publication to "established knowledge". But good social science is Lindy. So skip all the Malcolm Gladwell books and fancy psych findings, and prioritize findings that are still in use after 10 or 20 years.


> This is because a) it is more likely to have been replicated, b) its more likely to have been incorporated into prevailing theory, or even better, have survived a shift in theory

Not if this article is to be believed! He claims that studies that could not be replicated are about as likely to be cited as studies which are. That implies the problem may instead get worse and worse, the structure more and more shaky as time goes on.


Citation is not an endorsement; plenty of things are cited in order to disagree with them, reference the history of a field, or contextualize a result against past findings.

Here, the author seems to only look at recent papers, and so we don't really get to see how the citation patterns have evolved over 10, 20, or 30 years. But even then, established ideas tend to not be cited at all— the concept of "knowledge spillovers", for example, is common in Economics and other fields, yet the original reference is rarely used. Other times, more established claims will be encoded in a book or some work of theory—and people will cite the theory rather than the paper that made the original claim.


It's common to see this topic: what's "wrong" with social science. But there are always some things wrong with every science. If nothing was wrong, there wouldn't be any science left to do.

Social science asks more of us than any other science. Physics demands that we respect electricity and don't increase the infrared opacity of the atmosphere. Chemistry requires that we not emit sulfur and nitrogen compounds into the air. But the social sciences will, not rarely, call for the restructuring of the whole society.

This is the "problem" with social science, or more properly, with the relationship between the social sciences and the society at large. When we call for "scientific" politics, it is a relatively small ask from the natural sciences, but it is a revolution -- even the social scientists themselves use this word -- when the social sciences are included in the list (Economics is no different). Psychology, as usual, falls somewhere in between.

So the relationship between the social scientists and the politicians may never be as cordial as the relationship between the natural sciences and the politicians. The "physics envy", where social scientists lament that they do not receive the kind of deference that natural scientists do, will have to be tempered by the understanding that the cost of such deference differs widely.

(All of this is ignoring that physics had a 200-year head start)


Social scientists turn the microscope on themselves also. When the microscope turns elsewhere you see similar patterns to differing extents (cf. recent articles on reanalysis of fMRI data, pharmacology replication rates, Theranos or hydroxychloroquine).

Meta-science has always been the gift of social science. This will all eventually funnel down elsewhere, just like meta-analysis.

But you're right, in that social science hits very close to home, more so than other sciences. Imagine that it suddenly worked very very well, and someone in the field of neuropsychology could manipulate behavior just like you might a lightbulb. Isn't that what critics are really asking for?


>Physics demands that we respect electricity and don't increase the infrared opacity of the atmosphere.

Physics does no such thing. It tells us that increasing the heat retained in the atmosphere increases the planet's surface temperature. It is a descriptive science, not a prescriptive one. Wanting to have industrial civilization possible in the next century is why you don't increase the infrared opacity of the atmosphere. But that is a value judgment far outside the scope of physics, and one the social sciences claim is theirs by right of ... something.

The metaphors people use to think about the natural world are terrible, or as Carl Sagan put it Demon-Haunted.

The reason why physics, and other hard sciences, are so useful and respected is that you can switch dependent and independent variables around with a lot of success.

If I have the ideal gas law:

PV = nRT

Then I can rearrange it and be fairly confident it still works.

P = nRT/V

If you are an engineer this is a godsend. You want to set a hard value for P but can only directly control V or T? Try the second equation! You have a chance at succeeding without having to spend decades building machines that blow up and kill everyone around them!

Politicians see that and are jealous. Surely if those lame eggheads can get things to work like that we can too. So the social sciences give you equations as well. After a bunch of statistics we see that:

time spent in school = a*wealth - c

We can't control wealth, but we can control how long people spend in school:

wealth = (time spent in school + c)/a

So if we force everyone to stay in school until they are 50 everyone will have 20 million dollars in their bank accounts.

And to anyone who asks how this works, politicians say: Why are you against science and hate poor people?


This is why knowledge of causal inference is essential.

Causality is not established via tweaking a correlation or regression analysis, and we social scientists should know that.


Causal inference is the bottommost rung of what gives the hard sciences their power. It is that we understand the objects we are manipulating at a much deeper level, so we don't sound like idiots.

Suppose that we take:

F = mg

Measuring the force of gravity on an object and its mass, then rearranging to g = F/m, is a perfectly valid way to find experimental values for gravity at a location. But that doesn't mean that if we push down on the object really hard we increase the gravity in that location, or decrease it if we pull up on it. Just because symbol manipulation gives us an answer doesn't mean that the answer makes sense; you need to keep track of all the implicit state of the universe.


> so we don't sound like idiots.

I'm fine sounding like an idiot so long as my slope is increasing :)

Right, Judea Pearl covers this in "The Book of Why".


> Stupidity: they can't tell which papers will replicate even though it's quite easy.

I am not familiar with this work. What exactly makes a paper predictably replicable?


There is a footnote for that claim - https://journals.sagepub.com/doi/full/10.1177/25152459209196... - but basically "things that are hard to believe" and "things that barely passed statistical analysis" (high p-values).
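The second heuristic is easy to see in a toy simulation (completely made-up numbers, just to show the mechanism): when many tested hypotheses are null, originals that only just cleared p < .05 replicate noticeably less often than originals with very small p-values.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n_studies, n = 20000, 50

    # Toy world: 80% of tested hypotheses are null, 20% have a real 0.5 SD effect
    true_effect = rng.choice([0.0, 0.5], size=n_studies, p=[0.8, 0.2])

    def run_study(d):
        a = rng.normal(0.0, 1.0, n)
        b = rng.normal(d, 1.0, n)
        return stats.ttest_ind(a, b).pvalue

    p_orig = np.array([run_study(d) for d in true_effect])
    p_rep = np.array([run_study(d) for d in true_effect])   # independent replications

    barely = (p_orig > 0.01) & (p_orig < 0.05)   # originals that barely passed
    strong = p_orig < 0.001                      # originals that passed comfortably

    print("replication rate, barely significant originals:", round((p_rep[barely] < 0.05).mean(), 2))
    print("replication rate, strongly significant originals:", round((p_rep[strong] < 0.05).mean(), 2))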


Of malice vs stupidity, I'm pretty certain it's stupidity. Or more precisely, self-delusion.

The story of Millikan's oil drop experiment replications and also James Randi's (and CSICOP's) battle with pseudo-scientists convince me of this.


There's probably a mix of both. At some point, most people probably realize that there's something fundamentally wrong - but by then, they're a few decades in and too much of their career and personal life depends on it being true, so openly changing course is very daunting. When your identity and career depends on something wrong being considered true, you have no incentive to point out problems, and every incentive to mislead.


Shameless plug with the ten relevant problems I scooped from a very recent literature review: interculturalism, introspection, truth, authenticity, human enhancement, critical thinking, technocracy, privilege, ethics, higher education. Link to free intro: https://www.tenproblems.com/2020/08/01/ten-problems-for-soci...


That's a lot to say before even the replication results are actually out.


It's not science.


The epistemic value of epidemiological studies is very low. Nothing can be done to fix it.


Most people just don't have a clue about what they are doing and have no passion for their research whatsoever. When you have money as the main driver for science, this kind of stuff is exactly what you should expect. There are homeless people and crackheads etc. within a 3 km radius of the majority of social science schools around the world. It's a complete failure and scam. Scientific development is analogous to social development, and nothing is going to change by appealing to scientists not to cite weak research lmao


Lots of social science is crap, for sure, no arguing about it; dunno how to make it better other than not to do it. Though some of it is interesting if you have the patience for it, e.g. Linguistics, Psychology and Economics. Even things like critical theory are sort of useful; think of it like the abstract algebra of social science: just people pulling apart concepts to see if they can be put back together in another way to create something new. I know a lot of CS researchers and they do shit work and cite each other's excrement; honestly, CS is the sociology of STEM. There, I said it.



