I am still hoping that some day economics will find its place where it belongs: next to astrology, numerology, and alchemy.
This is tongue in cheek, but as a mathematician (financial math) I find economic equations and theorems laughable, and its scientific method just a thin veil over ideology.
I see economics in the same place as psychology and sociology. There is real science, but the results are so impactful to politics that the science cannot withstand the political pressure generated by those who want to treat these as political tools instead of scientific fields of research.
Part of the problem is that whenever a clear-cut rule of economics is found, investors take advantage of it, changing its behavior. It's almost a quantum-mechanics-like problem in that the observer (scientists) changes the behavior of the thing being observed (investors and consumers).
This is a problem math majors have with pretty much every other science out there. You can rank them if you like, and there are parts that are better and worse (in economics you have game theory, which I'm sure you're not going to find too ideological, and you have experimental economics ... which is little better than social sciences).
True, but consider utilitarianism. Have you ever tried modeling the economy using a more reasonable set of assumptions?
Not so easy.
But yeah I get it. Obviously the basic economic assumptions ignore:
1) government in general, but especially tax law. It's WAY too complex to model, unfortunately (and I doubt that's a coincidence), nor is it even constant.
2) it ignores that the vast majority of markets are supply- or demand-constricted. In reality, demand curves flatten off at some point. Take bread: there is only demand for so much bread, and lowering the price past a point no longer increases demand at all. So you can see the constraint exists (a toy sketch of this flattening is below, after the list).
The supply bound is similar. We are just not going to produce more land than there is. That market is absolutely supply-bound.
The weird thing economics glosses over is that nearly every market is either supply constricted or demand constricted, and the common equations just don't work there.
3) information. Economics assumes perfect information, when that obviously doesn't exist. Information spreads though, so over very long time frames perhaps there's a case to be made that perfect information does exist as long as the information is old enough.
The thing is, whilst I might agree that the first one is at least theoretically solvable, it's just too much work. The second and third are impossible to model. And I bet you could come up with 10 more.
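On point 2, here is a toy sketch (in Python) of the flattening mentioned above: a textbook linear demand curve versus one that saturates. The saturation level and both functional forms are made-up illustrations, not estimates of any real market.

    # Toy comparison: linear demand vs. demand that saturates (the "bread" case).
    # Q_MAX and both functional forms are illustrative assumptions.
    import numpy as np

    Q_MAX = 100.0  # hypothetical point past which nobody wants more bread

    def linear_demand(price):
        # textbook form: quantity keeps rising at a constant rate as price falls
        return np.maximum(0.0, 120.0 - 20.0 * price)

    def saturating_demand(price):
        # flattens near Q_MAX: once the market is saturated, cuts barely add buyers
        return Q_MAX * (1.0 - np.exp(-2.0 / max(price, 1e-9)))

    for p in [5, 2, 1, 0.5, 0.25]:
        print(p, linear_demand(p), round(saturating_demand(p), 1))

The exact numbers don't matter; the point is only that the second curve stops responding to price once quantity approaches the saturation level, while the linear one keeps responding at a constant rate all the way down.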
Your general point that economics models greatly simplify the real economy is a good one. However, the field has made significant advances to introduce real-world constraints into the models.
Even if there are a few game-theoretic models in advanced papers that work with limited information, it would be a rather large stretch to say that economics incorporates limited information at this point in time. Maybe it's coming, I don't know ... but it's not going to be soon.
A similar problem exists for (2). Yes, I can find a paper or two, probably even more, that talk about it, but they don't exactly have a wide following, because it just makes things too hard.
Incorporating both? I can't even find a paper on it.
This Wikipedia article does not state that around 40% of the experiments in psychology and medicine are not reproducible.
Out of 45 medical studies: "16% were contradicted by subsequent studies, 16% had found stronger effects than did subsequent studies [...]". This is not conclusive given the very small sample, but even setting that aside, the non-replicated share would be 16% (counting only outright contradictions) or 32% (counting weaker effects too, i.e. requiring exact replication). Another source of uncertainty is that some of the 45 studies were never challenged, and we don't know why.
Out of 100 studies in psychology, "studies in the field of cognitive psychology had a higher replication rate (50%) than studies in the field of social psychology (25%)". Again, each of these rates is very uncertain, considering the sample size is around 50.
It seems that the replication rate is around 70% for medical studies, 50% in cognitive psychology, and under 30% for social psychology. Economics, with a replication rate of 60%, would do much better than psychology overall.
I'm not sure it's a crisis. Nobody who understands research expects every study to replicate, so what is a reasonable standard? 40% having weaker effects isn't a surprise to me, a complete amateur, for whatever that's worth. The point of research is to push into novel territories of knowledge; like new tech at startups, that's necessarily going to include many failures.
If it's a critique of the scientific method(s), I'm all for very serious work on improving it. But I think the question is, how? Does someone know something that works better that they are holding back from us? Has any method in the history of humanity produced better results (or made more sense)? Should all science stop until it's perfect? What should we do going forward?
EDIT: I meant to add: Maybe social sciences are just really hard. They are a bit less deterministic than many aspects of physics, for example, where gravity is so predictable that a small perturbation thousands of light years away can tell us what's happening there.
Well, for one, because the standard for publication in many fields is “there’s a less-than-5% chance that we observed these effects because of coincidence”.
Combine that with people not publishing negative-results studies, and all it takes is someone doing 20 studies (or worse, 20 analyses on the same data set) in order to find that 1-in-20 chance occurring...by chance.
Of course, this has little to do with research itself and much more to do with the standards researchers hold each other to.
While you're right about the 5%-or-less chance, it's still a 5% chance we're aiming for. If we end up with 40% over a large number of papers, something went wrong.
Not publishing the negative results shouldn't affect this number. They're not included in the 5% chance in the first place.
There is a 5% chance of observing the effect even if the effect is not there.
I think the difference is best illustrated by an example:
If you have 20 researchers investigating a hypothesis that turns out to be false, on average 1 of them will report evidence for that false hypothesis, which will likely not replicate. Thus, if for every true hypothesis investigated, 20 false hypotheses are investigated, and the 19 researchers whose false hypotheses yielded nothing do not report their results, that means roughly half of the reported positive results will not replicate.
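A rough simulation of that arithmetic, for anyone who wants to see it play out (the 80% power for true effects and the 20-to-1 ratio of false to true hypotheses are illustrative assumptions, not claims about any particular field):

    # Rough simulation of the scenario above: for every true hypothesis, 20 false
    # ones are tested; only "significant" (p < 0.05) results get published.
    import random

    random.seed(0)
    ALPHA = 0.05            # false-positive rate per false hypothesis
    POWER = 0.80            # assumed chance of detecting a true effect
    FALSE_PER_TRUE = 20     # assumed ratio of false to true hypotheses

    published_true = published_false = 0
    for _ in range(100_000):
        if random.random() < POWER:          # the one true hypothesis
            published_true += 1
        for _ in range(FALSE_PER_TRUE):      # the twenty false ones
            if random.random() < ALPHA:
                published_false += 1

    share_false = published_false / (published_true + published_false)
    print(f"share of published positives that are false: {share_false:.2f}")
    # comes out around 0.55 -- the same ballpark as the "1/2 will not replicate" figure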
What is not being considered here at all is that different sciences are different.
First, you have purely theoretical sciences: "math" (even though quite a few of these disciplines actually sit not in math but in physics, economics, philosophy, or computer science; there are others). Obviously their results are not just replicable, they would never get published if they weren't. Furthermore, replicability is absolute, because the work is theoretical.
Then you have positivist sciences, essentially physics and chemistry. These are sciences where you can actually experiment, and therefore you can impose hard thresholds (like the 5- or 6-sigma standards in physics). Replicability is not absolutely guaranteed like in math, but it's going to work.
Then you have statistical sciences, like medicine, climate science, and experimental economics, where due to practical limitations (you need actual people who might die as a result of the study, we don't have many test planets, ...) the number of samples is extremely limited, and in general far more limited than the complexity of the system would require according to actual statistics.
So replicability is going to take a further BIG hit.
And then you have the sciences where we are studying the existence of a phenomenon in the first place: biology, social sciences, psychology, political science, language studies, parts of experimental economics. In these sciences something gets studied because it exists. We study Siberian tigers because they exist. We read Hamlet because it exists. Replication? What do you even mean? Replicability is essentially zero, and everything is just based on the judgement of famous characters.
Add to that that quite a few of these sciences (social sciences, political sciences, English) are up-front biased. They start from the point that they want to find/push X. This can be a value system, a way of thinking, or even spelling or grammar. Needless to say, these studies can be criticized from a neutral perspective.
But critically, thinking about it, you will realize: it would not make sense for social studies or English studies to be neutral.
The further you move down this list, the weaker the field's demands on replication are, and the more problems replicability has.
And yes, by the time you hit social studies, the norm is far past the point where a physicist would be fired for scientific fraud.
That does not make social sciences useless, it just means that people's evaluation of scientific results needs to go deeper than "oh scientist said X" (or even "consensus is Y").
> Replicability is essentially zero, and everything is just based on the judgement of famous characters.
That doesn't fit what we see even in the linked article and the older story about psychology replication: all these experiments had methods that could be repeated (that's what the replication project people did), the effects of almost all the experiments were replicated, and for about half the research the effects were as strong as before. That's a long way from methods that are "just ... judgement of famous characters" and "essentially zero" replication.
> quite a few of these sciences (social sciences, political sciences, English) are up-front biased. They start from the point that they want to find/push X
What is that based on? Is there someone in those fields that has written about it? Have you done any work in them?
Having some minor, intimate familiarity with these fields, though from an outside perspective, I don't agree. Political science, for example, aims to identify political phenomena and how they function. Every experiment in any science starts with a hypothesis, the theory the experimenter sets out to prove, and generally experimenters have a career based on promoting a certain model or idea. That is true of mathematicians and chemists too.
Also, I wouldn't group English, one of the humanities, with the social sciences.
The entire premise behind so-called scientific experimentation is reproducibility: the demonstration of consistent causes and effects is what allowed humans to move past superstition.
This is why economics, psychology, etc. are called "soft" sciences. This absolutely is a crisis, because by definition ALL of these experiments are supposed to be replicable and reproduced before they can scientifically be taken as truths of any certainty, and we're seeing that many modern fields, particularly medicine, are operating on potentially dangerously misleading experimental results.
And, since this is HN, possibly even maliciously misleading.
The crisis, then, is that people take results as truth before they've been replicated.
The more reasonable expectation is not that the results will be successfully replicated, but that the experiment can be replicated. In other words: That there is enough detail that someone else can try to confirm the results.
The problem is we've come to rely on first publication of results far too much. A first publication of a result should be seen as "here's an interesting result; someone please confirm." Not as "here are some new facts for you."
I’m no academic but I’ve read a few papers specifically because I wanted to implement what they talk about. None have been useful for this purpose. I don’t know how you can possibly replicate them if there is insufficient detail on the “apparatus”.
Or maybe human psychology is too complicated, with too many factors influencing results, to be easily reproducible. The crisis is then more in the expectations we hold, and in initial small-scale experiments being treated as "truth" and expected to hold across cultures, social groups, and social situations. An apple always falls down, but how people respond is sensitive to a lot of context.
Some of the crisis is fraud or shoddy science (the prison experiment). Some of it is the simplistic expectation that since experiments in basic physics are simple, psychology and sociology should be equally simple.
Lastly, these sciences are called soft because they use less math and deal with fuzzier issues.
I think the critique and crisis stems more from research that presents optimistic, or potentially embellished results, rather than failed experiments, or experiments with small sample sizes. There seems to be pressure to generate “novel” research that presents profound results rather than admitting an experiment yielded nothing. Admitting nothing came from a study would prevent more people from wasting hours, days and years of time trying to replicate or study something that isn’t true or doesn’t seem to work.
A lot of resources are poured into science and what science recommends (sometimes). Given the stakes, I’d prefer a track record that’s more than slightly better than flipping a coin.
> I’d prefer a track record that’s more than slightly better ...
So would I, but do you know of a better alternative? What if this is the best option we have for now?
> slightly better than flipping a coin
Also, to be clear, that doesn't accurately depict the results of the replication study. Almost every replication produced the same effects as the original experiment; in about half the replications, the effects were weaker. It's not that half the experiments were wrong; they were less conclusive.
Sounds snarky, but perhaps this is true, in the sense that economics cannot be a science in the Popperian sense: our experiments are usually too noisy to reasonably falsify any hypothesis.
Still, I don’t think we’ve come upon a better method of knowledge acquisition in this realm.
I think there are profound limits to what can be understood from the application of natural scientific methods to the social world, because it's an open and non-linear system in which the units - people, groups - reflect on and learn from their experience. Also, while in the natural world you can easily create operational codings of variables because they follow deductively from proven theories (e.g. temperature), you cannot do that for the social world. There is no single and authoritative concept of 'power', for instance.
All that said, there is no one scientific model. Evolutionary biology has never pretended to be anything like physics. It's an open and non-linear system. It cannot make predictions, and nearly every generalisation it makes is either relatively trivial or has a great many exceptions. For the most part, it's a historical science.
With the current publish-or-perish zeitgeist in science, this number can be seen as high.
Also, even assuming legitimate significant results, it's hard to control for every possible confounding variable: maybe part of the effect is due to some sociological parameters of the test group, maybe it's the heat/cold, maybe it only happens on Tuesdays. Even the same stimulus can change its meaning over time (like the Williams video). Humans make for finicky test subjects.
IMO a single scientific paper shouldn't be considered much more than a small indicator that an effect might exist and it will require further work. I think popular media does Science a disservice by ignoring this and headlining single papers as "fact".
True, but also not that good. All of the heat/cold/only-on-Tuesdays stuff should have been baked into the original experiment, really. Not that they can take everything into account, but 60% replication does not indicate a very high standard. I mean, it's not much better than an educated guess with no data, since you could probably do better than 50/50. The whole point of a study is that it is supposed to be better than an educated guess.
Given that in most fields publication bias and p-hacking are standard and basically written into the way science is published (i.e. you only publish your results after you've done your study, so no one knows what you intended to study when you started), it's normal to assume that most things published are simply wrong.
I think he was implying that better than 50% means it's at least not mostly bullshit, or that we could perhaps draw useful inferences if we are right at least most of the time. But agreed, none of this is particularly enthusing.
Isn't that how you know something is based in science? If people are given the procedures to replicate it, they come out with the same results consistently? So then this would mean these are non-scientific results; I guess they're not scientifically sound studies.
Hypotheses must be continually verified anew by experience. In an experiment they can generally be subjected to a particular method of examination. Various hypotheses are linked together into a system, and everything is deduced that must logically follow from them. Then experiments are performed again and again to verify the hypotheses in question. One tests whether new experience conforms to the expectations required by the hypotheses. Two assumptions are necessary for these methods of verification: the possibility of controlling the conditions of the experiment, and the existence of experimentally discoverable constant relations whose magnitudes admit of numerical determination. If we wish to call a proposition of empirical science true (with whatever degree of certainty or probability an empirically derived proposition can have) when a change of the relevant conditions in all observed cases leads to the results we have been led to expect, then we may say that we possess the means of testing the truth of such propositions.
With regard to historical experience, however, we find ourselves in an entirely different situation. Here we lack the possibility not only of performing a controlled experiment in order to observe the individual determinants of a change, but also of discovering numerical constants. We can observe and experience historical change only as the result of the combined action of a countless number of individual causes that we are unable to distinguish according to their magnitudes. We never find fixed relationships that are open to numerical calculation. The long cherished assumption that a proportional relationship, which could be expressed in an equation, exists between prices and the quantity of money has proved fallacious; and as a result the doctrine that knowledge of human action can be formulated in quantitative terms has lost its only support.
Looking backward to past events involving conscious, acting people is history, not economics. Because human beings are not falling stones, the methods of physics are a poor fit.
The study's abstract (http://science.sciencemag.org/content/351/6280/1433) says: "We found a significant effect in the same direction as in the original study for 11 replications (61%); on average, the replicated effect size is 66% of the original."
On average, the replicated effect size is only 66% of the original effect size. What explains this?
> We found a significant effect in the same direction as in the original study for 11 replications (61%); on average, the replicated effect size is 66% of the original.
EDIT: Similar results to the well-known psychology replication study of a few years ago:
Strictly on the basis of significance — a statistical measure of how likely it is that a result did not occur by chance — 35 of the studies held up, and 62 did not. (Three were excluded because their significance was not clear.) The overall “effect size,” a measure of the strength of a finding, dropped by about half across all of the studies. Yet very few of the redone studies contradicted the original ones; their results were simply weaker.
Assume you measure an effect and get two values, the estimated effect size in standard deviations and a p-value representing the probability that, if the effect size were zero in reality, you would have estimated an effect size at least as large as the one you did get.
In phase two, assume you publish a paper if the p-value is less than 0.05, regardless of the effect size. If p >= 0.05, you don't publish anything.
We are now done with the assumptions.
p-values get smaller as the effect size increases, and they get smaller as the sample size increases. There is a related concept, statistical power, which determines the smallest effect size a study can reliably detect at a given p-value threshold, given the (fixed) sample size that study has to work with. Larger sample sizes mean more statistical power, which means a smaller effect size can be detected at the same p-value threshold.
Adding this all up, we can see that:
- if the true effect size is small;
- AND if the sample size is "small", defined relative to the true effect size;
- AND if we filter studies by whether they meet a p-value threshold ("reach statistical significance");
- THEN the only studies that can be published will find effect sizes that are too large. They do not have the power to detect the true effect size; the only possible results are "no effect" and "unrealistically large effect".
The quick summary is that a reduced effect size on replication is a strong indicator that the original finding was spurious. As replications continue they will trend toward the higher of the true effect size or the floor set by the available statistical power.
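Here is a sketch of that filter in action, under the assumptions above: a small true effect, small samples, and publication only when p < 0.05. The specific numbers (0.2 SD effect, n = 20) are made up for illustration.

    # Significance-filter sketch: small true effect, underpowered studies,
    # only p < 0.05 results get "published". Numbers are illustrative assumptions.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    TRUE_EFFECT, N, ALPHA = 0.2, 20, 0.05

    published = []
    for _ in range(20_000):
        sample = rng.normal(TRUE_EFFECT, 1.0, N)   # one underpowered study
        t, p = stats.ttest_1samp(sample, 0.0)
        if p < ALPHA and t > 0:                    # the publication filter
            published.append(sample.mean())        # estimated effect size, in SDs

    print("true effect:          ", TRUE_EFFECT)
    print("mean published effect:", round(float(np.mean(published)), 2))
    # the published average lands well above 0.2 -- exactly the inflation that a
    # later, better-powered replication walks back toward the true value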
Actually, any strength of effect is possible, from tiny and misreported as big to huge and under-reported.
That is what it means for the studies not to have the statistical power to measure the effect size.
Reproducing the results and averaging over all of them, with corrections for multiple comparisons and proper pooling (a meta-analysis), could then give a valid estimate of the effect size.
The p-value only checks for a non-null result, and only if it hasn't been circumvented.
It would be good to produce a funnel plot of the effect sizes reported in those underpowered studies. Perhaps the ones showing small (but non-null) effect sizes do not get published.
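A minimal sketch of what such a funnel plot could look like, on simulated studies only; the cutoff that mimics "only significant results get published" is an assumption:

    # Funnel-plot sketch: effect estimate on x, precision (1/SE) on y.
    # Publication bias shows up as a missing corner of small, imprecise
    # studies with small estimated effects. Simulated data only.
    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(1)
    true_effect = 0.2
    sample_sizes = rng.integers(10, 200, 400)      # heterogeneous study sizes

    effects, precisions = [], []
    for n in sample_sizes:
        se = 1.0 / np.sqrt(n)                      # standard error of the estimate
        est = rng.normal(true_effect, se)          # that study's effect estimate
        if est / se > 1.96:                        # crude "significant only" filter
            effects.append(est)
            precisions.append(1.0 / se)

    plt.scatter(effects, precisions, s=10)
    plt.axvline(true_effect, linestyle="--")
    plt.xlabel("estimated effect size")
    plt.ylabel("precision (1 / SE)")
    plt.title("Funnel plot of 'published' simulated studies")
    plt.show()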
That is not actually an explanation; you can only apply regression to the mean if you know what the mean is. The explanation I give correctly predicts that replicated experiments will see their effect sizes decline. Saying "regression to the mean" does not.
(It is quite possible to interpret this as regression driven by the p-value threshold, but if you do that, you're relying on the explanation I gave.)
That's not true. If I take the top 5 students based on performance on a test and put them in a group, they will likely do worse the second time around. No need to know the mean.
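A tiny simulation of that student example, with made-up ability and noise levels, shows the effect without ever computing the population mean:

    # Regression to the mean: observed score = stable ability + test-day noise.
    # All parameters are made up for illustration.
    import numpy as np

    rng = np.random.default_rng(2)
    ability = rng.normal(70, 10, 1000)            # each student's "true" level
    test1 = ability + rng.normal(0, 10, 1000)     # noisy first test
    test2 = ability + rng.normal(0, 10, 1000)     # independent noise on the retest

    top5 = np.argsort(test1)[-5:]                 # selected on test 1 performance
    print("top-5 average on test 1:", round(float(test1[top5].mean()), 1))
    print("same students on test 2:", round(float(test2[top5].mean()), 1))
    # the second number is reliably lower: the top scorers were partly lucky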