> The problem in the research community is that scientists have little incentive to duplicate earlier work just to check if it’s correct. Many journals have explicit policies forbidding the publication of work that attempts to replicate previous experiments.
This drive for unending novelty in the sciences is a shame on many levels. The good and useful work of duplicating and verifying results goes undone, and scientists are driven ever more forcefully towards designing studies only on the basis of what will attract grant agencies.
Duplicating major results carefully would be useful to the scientific record. Trying things that will probably fail, and then publishing negative results, would be useful too. But, for most researchers, doing this useful work appears to be career suicide.
I've never encountered a field in science where fundamental results aren't sufficiently replicated by other labs.
That nobody spent $500-$2000 to estimate leg hair drag probably speaks more about people not wanting to be caught measuring leg hair drag than about the entire field's tendency to replicate findings. It's also really weird for people to consider $500/hour tunnel time to be expensive for a sport with as much money as cycling.
Really, I thought it was weird, and probably inappropriate, to mix so much of an outsider's amateur and unsupported opinion about science into an otherwise interesting story about leg hair drag.
If there's a serious story somewhere about results not getting replicated when they should in many fields of science, I'd like to hear about it. There's often stuff on the fringe, but any serious result will be replicated soon. Take, for example, the acid-bath stem cell retraction from not long ago. Where else are important results not replicated? The Higgs? It's a baffling perspective.
So far, my favorite explanation is the direct, obvious one, which is that we're sometimes -- perhaps more often than we'd expect -- falling victim to statistics. Given the very large number of experiments and studies being conducted everywhere, and the selection for positive results, statistical anomalies are being accidentally selected for and then not being re-evaluated thoroughly enough later.
The article discusses that as a possible explanation, but the journalist does his job, igniting reader interest, by suggesting the effect somehow "defies the laws of statistics". I favor the simpler explanation, which is that our meat brains simply underestimate the vast number of ways in which statistics would like to bugger us.
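To make that concrete, here's a quick simulation sketch (mine, not from the article, and the effect sizes are arbitrary): run a pile of experiments on a true effect of zero, let the "journal" accept only the significant positive ones, then replicate each accepted finding once. The published effects look real and then "decline"; no law of statistics is defied.

```python
# Minimal publication-bias sketch: selection on p < 0.05 manufactures
# a "decline effect" even when the true effect is exactly zero.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_experiments, n_subjects = 1000, 30

published_effects, replication_effects = [], []
for _ in range(n_experiments):
    sample = rng.normal(0.0, 1.0, n_subjects)   # true effect is zero
    t, p = stats.ttest_1samp(sample, 0.0)
    if p < 0.05 and sample.mean() > 0:          # journal takes positive hits only
        published_effects.append(sample.mean())
        replication = rng.normal(0.0, 1.0, n_subjects)
        replication_effects.append(replication.mean())

print(f"published 'effects':  mean = {np.mean(published_effects):+.3f}")
print(f"replication attempts: mean = {np.mean(replication_effects):+.3f}")
# Typical output: published effects cluster around +0.4 to +0.5 sigma,
# while the replications hover near zero.
```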
Some of the things that both you and the parent commenter mention are discussed in the article. For example:
> Jennions, similarly, argues that the decline effect is largely a product of publication bias, or the tendency of scientists and scientific journals to prefer positive data over null results, which is what happens when no effect is found.
and
> Richard Palmer, a biologist at the University of Alberta, who has studied the problems surrounding fluctuating asymmetry, suspects that an equally significant issue is the selective reporting of results—the data that scientists choose to document in the first place.
(aside: I'm not one of HN's vocal "science is junk" members. I love science, or, at least, I like to admire its butt (http://explosm.net/comics/3557/). I don't believe there's something fundamentally wrong with the scientific method. But, I do think more emphasis needs to be placed on repeatability of experimental results over longer periods of time, and I hope that this will become more of a trend as the world economy continues to grow.)
This is very fascinating stuff! And it's also familiar from GWAS where there's a fantastic number of statistical tests.
However, it's all examples of things being replicated in the sense of (1) being tested again, and "not replicated" in the sense of (2): being tested again and found not to repeat the initial reports.
I think the article's complaint was about the lack of (1), though I find (2) far more interesting and will pursue these threads, thanks!
> Given the very large number of experiments and studies being conducted everywhere, and the selection for positive results, statistical anomalies are being accidentally selected for and then not being re-evaluated thoroughly enough later.
I once heard a natural language processing professor observe that a typical NLP conference might have twenty presenters, each giving results to a 95% confidence. "So even if everyone does all their work flawlessly, every conference will have on average one set of bad results."
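The arithmetic behind that quip (assuming independent results and an exact 5% false-positive rate, which is of course a simplification):

```python
# Back-of-the-envelope for the professor's observation: twenty flawless
# presenters, each reporting at 95% confidence (a 5% false-positive rate).
n, alpha = 20, 0.05

expected_false = n * alpha               # linearity of expectation
p_at_least_one = 1 - (1 - alpha) ** n    # chance a conference has >= 1

print(expected_false)                    # 1.0 -> "on average one set of bad results"
print(round(p_at_least_one, 3))          # 0.642 -> most conferences will have one
```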
It's a very common phenomenon in science that a finding is published in a glamour journal, e.g. Nature/Science, and then, if not immediately controversial (e.g. acid bath stem cells), it becomes very difficult to publish something contradictory.
Journal referees will set the bar for such findings very, very high... and you are quite likely to be refereed by the original author or their friends.
I know this to my cost having gone to extraordinary lengths to publish something slightly contradictory in a far lower impact journal than the work deserved.
In general, if your results contradict existing science, your work for publication will be passed immediately to those scientists for "peer review".
Those same scientists are motivated to protect their own existing and future work. The academic standards you are subjected to will be far higher than work supporting the status quo.
Unfortunately, they are usually also the best equipped to find flaws in the new work. True, this is not ideal, but it's also not ideal to send a paper to less expert reviewers.
All of which goes to say that pre-print publication is the way to gooooo.
I agree with your sentiment, but there is one flaw: cycling is a dirt-poor sport that has traditionally been very blue collar. There are racers in the Tour who make less than $30k per year, live at home with their parents in the off season, and wrench as mechanics. Only a small handful of them make "pro athlete" money.
That's a lot of money to most of them, especially when they can be testing fit and position, which have well-known advantages for both aero and power output, AND they already shave.
There are two problems, only one of which is addressed by duplication.
1. The original study may be plain wrong. Author 1 claims to describe phenomenon X; author 2 can't find any evidence for X using the exact same methodology.
2. The original study is wrong because the methodology is bad, interpretation is incorrect, reagents are not specific, techniques are not clean, etc.
There's a strong argument to be made that it is more important to do orthogonal work which indirectly verifies the original results, as opposed to simply trying to reproduce the original work. This solves both #1 and #2.
The paper cited in the article is a great example. Gluing hair onto a plastic leg and putting it in a miniature wind tunnel is probably an exercise in bad methodology—but they might have still gotten the right result for their experiment. And that's useful because now we know that there may be something different about either plastic legs vs. real legs or miniature wind tunnels vs. big wind tunnels. This is a trite example, but it should be clear how this might be more important in other studies.
"Ioannidis's 2005 paper "Why Most Published Research Findings Are False"[5] has been the most downloaded technical paper from the journal PLoS Medicine.[8] A profile of his work in this area appears in the November 2010 issue of The Atlantic.[9] The Atlantic article notes Ioannidis analyzed "49 of the most highly regarded research findings in medicine over the previous 13 years". In the paper Ioannidis compared the 45 studies that claimed to have uncovered effective interventions with data from subsequent studies with larger sample sizes: 7 (16%) of the studies were contradicted, 7 (16%) the effects were smaller than in the initial study and 31 (68%) of the studies remained either unchallenged or the findings could not be replicated.[5]
Statisticians Goodman and Greenland agreed that "many medical research findings are less definitive than readers suspect" but disputed his headline claims as unsupportable by the methods used.[10][11] Ioannidis responded to this critique[12] and other researchers have generally supported the general thrust of his findings.[13][14] Ioannidis' work is focused on improving research design standards."
I agree. You can't even publish negative results, although the data would be incredibly useful to the scientific community. The whole point of science is consistent replication to aid prediction.
Some rich person needs to start a program incentivizing this kind of work.
It seems more likely that some rich person has already duplicated some "key" discoveries and is happily reaping the benefits of knowing what does work, without making much noise about it.
I would like to think that there is a self correcting element to this. If someone produces genuinely high impact novelty science, others will want to build on that - and will end up replicating it anyway. If it isn't possible to replicate or isn't important, then it will be forgotten about.
I agree about publishing negative results. The experiments that don't work, but might have had a high impact, will likely be repeated needlessly, as no one knows they've already been tried.
If only science publishers would agree. But they're in a money-making business and novel results, "breakthrough" results, even if unverified and soon to be disproven, sell journals much more efficiently than null results or falsifications of earlier work.
It seems like a drive for unending novelty would cause people to want to gather evidence that contradicts well-established studies. That's by definition novel, and would probably result in good press.
The Specialized video[1] is worth watching - it's pretty funny how astonished they were over how much power it saved.
If you are interested in this, then the book Faster: The Obsession, Science and Luck Behind the World's Fastest Cyclists by Michael Hutchinson is really good (and very well written).
He notes that human intuition about aerodynamics just isn't very good (you have to test) and that the current state of the art is no longer wind tunnel testing but computational fluid dynamics (CFD) followed by testing with a power meter on the road.
CFD lets designers iterate much quicker on designs and try things outside the norm (avoiding the local maxima problem). Power meters plus riding is better than wind tunnel testing because things like variable cross winds are very hard to test in wind tunnels.
The annoying thing with using CFD to analyze viscous phenomena is the question of reproducibility. Flow structures in this regime (low Re, that is, low speed and small length scales) are notoriously sensitive to surface defects, e.g. the hairs being analyzed. A CFD sim will show nearly the same results every time for a given problem, and current CFD lacks the fidelity to really model tiny turbulent features at these scales, as well as the necessary fine surface definition. Experiments, on the other hand, may show drastically different results depending on minute upstream flow variations and surface quality: in this case, legs with hair/sweat/dust/oils could all trip the boundary layer to varying degrees. Stories abound of clumsy wind tunnel technicians who left skin grease on a wing model for a low-speed test and neglected to polish it off with a rag...
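For a sense of the scales involved, here's a rough Reynolds-number estimate (my ballpark figures for speed, viscosity, and dimensions, not anything from a study):

```python
# Re = v * L / nu for a whole calf versus a single hair at time-trial speed.
v  = 11.0        # m/s, roughly 40 km/h
nu = 1.5e-5      # m^2/s, kinematic viscosity of air at ~20 C

for label, L in [("calf, ~11 cm diameter", 0.11),
                 ("leg hair, ~50 micron diameter", 50e-6)]:
    print(f"{label:32s} Re = {v * L / nu:10.0f}")
# calf -> Re ~ 8e4 (boundary layer easily tripped by surface details)
# hair -> Re ~ 40  (viscous regime, where tiny defects dominate the flow)
```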
I had thought that it was already established by swimmers that hair is a detriment to speed. Why would it be surprising to cyclists? I do ride, but not competitively, so I will keep the razor to my face only, thank you.
Exactly this; it's already well established that skin hair causes drag. Firstly, it acts as insulation. Secondly, it works remarkably well to evaporate sweat, and since we were originally savanna-dwelling primates that endurance-hunted our prey, it would only make sense that the drag was not just something we traded away, but something that contributes directly to our ability to cool our body temperature.
> The tests showed that shaving his legs reduced Thomas’s drag by about 7 per cent, allowing him to exert 15 watts less power and still go at the same speed. In theory, that translates to a 79-second advantage over a 40-kilometre time trial that takes about one hour.
Shaving reduces the drag, but the original study measured the reduction at around 0.6 per cent while this one found a 7 per cent reduction. So, more than ten times bigger than anticipated.
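As a sanity check on the new figure, here's a back-of-the-envelope sketch assuming the crude model that aero drag dominates and pedalling power scales as v^3 (numbers mine, not the article's):

```python
# Crude model: all resistance is aerodynamic and P ~ v^3, so at constant
# power a 7% drag (CdA) cut raises speed by a factor of (1/0.93)^(1/3).
drag_reduction = 0.07
baseline_time_s = 3600.0                      # the ~1 h, 40 km time trial

speed_ratio = (1.0 / (1.0 - drag_reduction)) ** (1.0 / 3.0)
time_saved = baseline_time_s * (1.0 - 1.0 / speed_ratio)
print(f"~{time_saved:.0f} s saved")           # ~86 s
# The article's 79 s is in the same ballpark; the gap is roughly what
# you'd expect once rolling resistance (untouched by shaving) is included.
```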
When I read this "Even more confounding was that the results contradicted earlier findings" it totally colored my reading of the article. (Emphasis mine.)
One thing they don't tell you in the Specialized video is that they're not testing someone who is pedaling. There's so much more turbulence from just moving your legs that shaved or not shaved makes almost zero difference. No one is coasting for 40 km. Even the time savings they do cite, in perfect, almost platonic conditions, are seconds over hours of riding.
Bicycling manufacturers always skew the numbers for how much savings (time, watts, whatever) their new top of the line frames will give you. It's all smoke and mirrors.
> One thing they don't tell you in the Specialized video is that they're not testing someone who is pedaling.
This isn't true. Watch the video - he's pedaling when they test.
> There's so much more turbulence from just moving your legs that shaved or not shaved makes almost zero difference.
That's almost exactly wrong. It's true that pedalling creates turbulence, but the speed your legs move and the turbulence created by that movement are almost entirely swamped by other effects.
For example, the fact a bike is asymmetrical (because of the drivetrain) is a much bigger factor than turbulence because of moving legs.
It is important that testing includes pedaling, because there can be particular positions that work better on some bikes (or for some people) than others.
But in this case the testing used sensible protocols and the difference is a real thing.
> Bicycling manufacturers always skew the numbers for how much savings (time, watts, whatever) their new top of the line frames will give you. It's all smoke and mirrors.
That may be the case, but all Specialized is selling here is their aerodynamic expertise (at least until you can buy a Specialized razor blade).
IMHO, you're absolutely nuts to believe a bike company when it comes to these sorts of things - that's what an independent research company is for - or, at least, to forget a basic part of experiments: they need to be reproducible (and then verified), and this "shaved legs" result hasn't seen that.
Specialized has something to sell - their business practices, especially around their trademark protection, have been ridiculous, which has been a turn-off to people who think clearly.
The market for high-end bikes is people who can afford the bikes, which is usually not the people who would see any difference between buying the $2k bike and the $15k bike. Specialized really doesn't want you to understand this, so they make wildly exaggerated claims about their equipment. The people who benefit from fractions of a percent better performance are usually sponsored by the bike company (and thus get the bikes for free). A difference of a few seconds over a 50 km course matters to the elite time trialist in the Pro Tour; it makes zero difference to your weekend warrior doing an hour-long crit.
Who the heck am I? I'm someone who rides bikes, often for very long distances, to break records (which I do). Don't believe the hype, unless you don't especially have an interest in holding onto your money.
I'll concede only that my views are unpopular, but that certainly doesn't mean they're incorrect.
I agree you need to be very careful when you evaluate claims by manufacturers. Independent testing is very hard to find, too (outside the German Tour magazine, which does a decent job).
But in this case it is unlikely their findings are particularly biased. It's worth noting that the point of the original article was that the Specialized testing actually verified an earlier finding that hadn't been tested properly since.
> One thing they don't tell you in the Specialized video is that they're not testing someone who is pedaling.
Yes they are. Chris Yu (who runs Specialized's wind tunnel; they're the only bicycle company with their own) is a Caltech/Stanford-trained aerodynamicist and competitive cyclist. Give him some credit.
Ah, that makes much more sense. If they had been pedaling, I was wondering how they got rid of bias, since it would be pretty hard to blind the rider as to whether their legs were shaved!
Another cycling company, Trek bicycles, has a pedaling mannequin that they use for some of their wind tunnel tests([1] page 14). That's an older report (2010), but they have another from 2013 if you're interested in that stuff[2].
In this case they measure the wind resistance of the rider directly in the wind tunnel. The fact the rider knows their legs are shaved doesn't seem to be a factor in that measurement.
Exactly. I'm an active racer and have witnessed people get yelled at in the medic tent for not shaving their legs. Treating road rash is awful with leg hair.
What's interesting here is that the practice of cyclists doing so goes back a century. By contrast, the fashion of women shaving their legs dates back primarily to around World War II, and the influence of pinup model Betty Grable. See: http://www.straightdope.com/columns/read/625/who-decided-wom...
The gremlin that is low Re, viscous drag. This is how cacti work; the needles slow surrounding air to reduce moisture loss through convective evaporation.
Previous test: "leg-shaving reduced drag by 0.6 per cent"
New test: "The tests showed that shaving his legs reduced Thomas’s drag by about 7 per cent"
Yet the blurb says: "Even more confounding was that the results contradicted earlier findings". What's contradictory about those results? Be it 0.6% or 7%, shaving your legs clearly reduces drag.
I believe that was a reference to the "insignificant difference" claims. 0.6% wasn't seen as a significant enough difference, so shaving became a "fashion choice" rather than an efficiency one.
I find it odd that they only report the results in percent in the first place. Maybe the baseline drag of a rider back in the '80s was much higher than today's, making 0.6% of the old baseline the same absolute force as 7% of the new one.
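For a sense of what those percentages mean in absolute terms, here's a ballpark using standard drag-equation numbers of my own choosing (not figures from either study):

```python
# Aero power at a steady time-trial pace: P = 0.5 * rho * CdA * v^3.
rho, CdA, v = 1.2, 0.25, 11.0            # kg/m^3, m^2, m/s (~40 km/h)
aero_power = 0.5 * rho * CdA * v**3      # W needed to overcome air drag

for pct in (0.006, 0.07):
    print(f"{pct:>5.1%} of ~{aero_power:.0f} W -> {pct * aero_power:4.1f} W saved")
# 0.6% -> ~1 W, 7% -> ~14 W; the latter lines up with the 15 W the
# article quotes, so the new number is at least internally consistent.
```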
A family friend, the aerodynamicist Chester Kyle, was mentioned in this article. He was a lecturer at Long Beach State (I think) and did practical wind-tunnel testing with my father (an engineer and manufacturer) on bicycle shapes for many years.
He pushed the edges of conventional design, and participated in the human powered vehicle races quite a bit.
That's fantastic; I never thought I would hear about him on here.
It bugs me that the test is run with the vertical wheel supports in place. It's conceivable that they alter airflow in a way that affects how it goes over the cyclist's legs.
Interesting. I wonder how dramatic the effect is with swimming; there are swim flumes like this: https://www.youtube.com/watch?v=feUhyCklHL0. They're usually used to analyze your stroke, but maybe you could put a bar in there to hold on to and test unshaven vs. shaven.
As a former swimmer, I can say that shaving is not only done for drag. Once shaved and in the water, even while casually swimming in warmup, I would feel as if I were actually floating better. We'd call it the "feel for the water" or something similar. It's hard to describe unless you've experienced it, but I suppose it has something to do with a newly exposed layer of skin in contact with the water.
And, of course, there are also the benefits of less drag and psychological placebo effect.
As a former national-level swimmer, I always shaved my legs/arms and tapered my training for a big meet twice a year.
I found that combination to lower my times by about 1 second per 50m, or about 5%. I don't know how much of that is attributable to the shaving, but I will say it helped me "feel" way faster.
Summary: a team at Amgen discovered that 47 of 53 "landmark" studies published in high-quality journals could not be reproduced. A team at Bayer did an internal review of programs they had initiated based on journal studies and found that less than a quarter of those findings could be reproduced.
Three very damning quotes:
"Some authors [of the journal articles] required the Amgen scientists sign a confidentiality agreement barring them from disclosing data at odds with the original findings."
"'We went through the paper line by line, figure by figure,' said Begley. 'I explained that we re-did their experiment 50 times and never got their result. He said they'd done it six times and got this result once, but put it in the paper because it made the best story. It's very disillusioning.'"
"The problem goes beyond cancer. On Tuesday, a committee of the National Academy of Sciences heard testimony that the number of scientific papers that had to be retracted increased more than tenfold over the last decade; the number of journal articles published rose only 44 percent."
Academics are pressured to produce publications, not to produce science, and their studies are not always rigorous (not blinded to experimenters, etc.). People with high integrity and ability do produce good science that gets published, but unfortunately that appears to be the minority, even in highly prestigious journals.
So I'm curious: how would you fix it?
One thing the article mentions that might improve things is every journal dedicating one complete issue a year to reproducing the most influential studies of the year. Another could be getting a consortium of pharmas (who try to reproduce studies all the time, because if you're going to successfully make drugs you need the thing to work) to publish their internal data for the benefit of all. Does something like that exist?
I think it's healthy that this discussion expresses amazement, skepticism, methodological questions, and observations about competition for public media attention, and frequent instances of journals cynically manipulating publication for maximal sensationalism under a scientific patina.
Strictly my own imagination, but I'm anticipating the introduction down the road of the world's most advanced hair removal system that gives cyclists a big aerodynamic advantage proven by scientific tests. Yes buy the XYZ system and fly to victory! Wow, I really am jaded, but too many times we've seen it happen.
So much bad science obscures the good work because the latter doesn't make the news; it's far too boring to attract attention. It's the small, incremental, tedious, repetitious, careful, persistent work that provides the real advances. Edison, I think, said we "find 20,000 ways it doesn't work", and we learn something every time we try.
I propose a simple set of remedies. Journals should give highest priority to publishing careful replications of prior studies, whether the result is yes/no/maybe. Negative studies, whether original or replications, are given as much priority as confirmatory ones. Novel associations are of course welcome if they meet standards of adequate power to discern something beyond quirky results.
Ultimately, the impact of hyperbolic claims about the meaning of research causes the greatest distortions of scientific process and progress. Journal editors can eliminate the great bulk of the misinformation their journals promulgate. It's remarkably simple: all they have to do is issue an edict that authors may write about only the history, background, what they did, and the factual results of their work. These "rules" apply to observational and experimental work all the same. IOW, it would be forbidden to draw conclusions about what the outcomes mean, what is proven, or what "causes" what.
Sure, giving context about past results is necessary and usefully informative, but conclusions are for us, the readers, to determine. If this were the case, the strident "answers" and premature incorporation of findings into practice would suddenly be sharply reduced. Hysteria would subside. We could then actually use our scientific talent to accomplish honest goals, and solve the real and daunting problems we actually have.
Perhaps this seems a radical view, and maybe it is. It's not about whether I'm right or whether anybody will make any of these substantial changes. It's about getting back to careful, thoughtful scientific inquiry and reducing misdirected human energies.
I speculate here that this could be a second reason why we lost our big-ape hair. The preferred method of hunting was running after prey until it fell or slowed down, thanks to the fact that, unlike other animals, man can run with his mouth open, shedding excess heat and taking in more oxygen. Having less hair than the prey would also have made us faster at lower energy cost.
At the speed we run, it would make no difference I would imagine. The benefits of shaving your legs only really come into effect for cyclists at the pro level ... those of us who can manage a steady 25km/h see far less benefit.
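The v^3 scaling makes that point concrete; these CdA and air-density numbers are my own ballpark, not from the article:

```python
# Aero power is ~0.5 * rho * CdA * v^3, so the absolute watts returned
# by a 7% drag cut collapse as speed drops.
rho, CdA, saving = 1.2, 0.30, 0.07            # air density, upright-ish CdA

for kmh in (25, 40, 50):
    v = kmh / 3.6                             # km/h -> m/s
    watts_saved = saving * 0.5 * rho * CdA * v**3
    print(f"{kmh} km/h -> ~{watts_saved:4.1f} W saved")
# 25 km/h -> ~4 W, 40 km/h -> ~17 W, 50 km/h -> ~34 W: a rounding error
# for the weekend rider, a real margin at pro time-trial speeds.
```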
I always thought it was an urban myth that it made you go faster, and that the real reason was that road rash was easier to treat.
I am not a competitive cyclist but remove the hair on my legs with a depilator in the Spring because my leg pelt retains too much heat in the Summer. There is a noticeable improvement in cooling after it is done.
While not the most pleasant process, it avoids stubble, and the delayed regrowth means the new hair is sufficiently long to begin providing warmth just in time for when cooler Fall temps arrive. I always considered that to be the real reason why the pros do it, with road rash being of secondary concern. Those with shorter leg hair just went along and copied their hairier peers.
Shaving your legs doesn't take much time at all. I race a few times a week (during the season) and shave once, at most twice, a week. Probably takes 10 minutes to shave. There's nothing I could do for 10 minutes a week that would improve my time by up to eight percent (most amateur cyclists already overtrain as it is). Not part of your question, but shaving is also largely about being able to treat road rash and avoid infection.
They are claiming "it contradicts previous results", but certainly the savings must depend on the athlete's pre-shave hirsuteness. Unless they controlled for that, the previous results could have just been because of a relatively smooth group of subjects.
The article states that the previous results didn't involve any subjects at all, but a "fake lower leg in a miniature wind tunnel with or without hair glued onto it".
That's my guess -- they must be quite expensive to build, with quite a bit of specialized hardware involved (not exactly off-the-shelf stuff!) and highly specialized people to run it... but then only one experiment can use the whole shebang at a time (including setup/teardown time), so they can't spread out the costs across lots of concurrent users.
Compared to numbers from earlier studies (0.6), the new number (7) looks suspiciously like a floating point error arising from software/hardware issues!
I appreciate that in medicine it can be. But this is not medicine. In this case the 'research' was obviously done by first letting someone cycle WITH hair, then shaving them (with time to recover, I imagine), and then trying again. I say 'obviously' because that few participants is not enough; you'd need more people to be able to compare mean differences in performance.
The problem I see, however, is that after being shaved the participant would feel faster and thus ride faster. A sort of placebo effect, if you will. At the same time, the cyclist-cultural effect of being 'unshaved' in the first test would make them feel sluggish. It's the same effect as seen in experiments where participants are presented with negative vs. positive messages and asked to perform tasks: the participants exposed to negative messages, as expected, perform poorly in comparison to the other group.
In conclusion, I feel that this 'research' can only be used as something to base your hypothesis on; it is in no way conclusive in its current setup. You'd need random groups of cyclists, some of whom are shaved to begin with and some who are not. It would maybe be easier not to use cyclists, in order to find a diverse enough set of participants. Then again, I don't know how many regular cyclists go around unshaven.
This wasn't "how fast can you get from A to B" - they were cycling at a steady rate on a stationary bike in a wind tunnel. Pedalling faster would not trivially make the results better (it might have some complicated aerodynamic effect, but I don't have any clue of the likely size or direction).
The number of people needed to achieve statistically significant results more or less solely depends on the size of the effect being measured[1]. It's actually not unusual to design experimental studies with just a single person, provided the "treatment" (shaven legs in this case) is reversible. Then you just have to run repeated phases with and without the treatment, e.g. ABABAB, or even ABACABAC if you have an alternative treatment you want to compare against.
[1] If you design a study you can actually start from the other end: If you can estimate the size of the effect to be measured, you can calculate the number of people needed for significance beforehand.
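As an illustration of that footnote, here's a small sketch with statsmodels' power solver (the effect sizes are illustrative, not from any cycling study):

```python
# Fix the expected effect size, alpha, and required power, and the
# sample size falls out of the solver.
from statsmodels.stats.power import TTestIndPower

solver = TTestIndPower()
for d in (0.2, 0.5, 1.2):                # small, medium, huge effects (Cohen's d)
    n = solver.solve_power(effect_size=d, alpha=0.05, power=0.8)
    print(f"d = {d:3.1f}  ->  n = {n:6.1f} per group")
# A huge effect (d = 1.2) needs ~12 people per group; a small one
# (d = 0.2) needs ~394 -- which is why big-effect questions can be
# settled with very few subjects, or one reversible-treatment subject
# in an ABAB design.
```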
Massage and reducing wear-and-tear in the event of crashing are the most plausible reasons I've heard for it. I'd never heard anybody seriously suggest it reduces drag in a meaningful way.
(Edited to describe my experience, per "let's not read the article" comment).