> 7% higher risk in white women (HR = 1.07, 95% CI: 0.99–1.16)
The confidence interval clearly includes 1.0. I'm tired of all these studies being uncritically reported on in the press with no mention of the giant error bars.
IMO medical researchers need to raise their standards. 95% is way too low. Also, journalists should be ashamed at the state of science and medical reporting. It's ironic that you can read articles blaming tech for the spread of fake news and then turn the page to the health section and read garbage like this article.
Won't argue about the validity of the 95% confidence interval here, but the 95% CI of the HR for women as a whole excludes 1, and the one for African Americans clearly also excludes 1.
The overall conclusion is still sound (if you accept the validity of 95% CI as a measurement).
Only for dye, not for straighteners, and only for "Nonprofessional application of semipermanent dye to others", and black women. If you count up the specific claims made in the article, a majority are not actually supported by the study. And unless the study was preregistered for the cohorts they analyzed, the statistical analysis is suspect even at the 95% level which I still contend is too low.
Anyway, they get 6 interesting results, slightly more than the expected number of false positives with 95% CIs across 55 cases (55 × 0.05 ≈ 2.75).
Moreover, in the last case, 1.00 is excluded because the whole interval lies below it, i.e. in this study the premenopausal women who applied hair dye to another person between 1 and 3 times a year got less cancer than the women who didn't. This is probably just noise, but note that this case is not highlighted in the abstract, even though they highlight a few cases where the 95% interval includes 1.00.
In supplementary table 2, classified by estrogen receptor status, they also report 55 results. The intervals that exclude 1.00 are:
0.78 (0.63, 0.97)
0.82 (0.70, 0.95)
Both are below 1.00, and neither is highlighted in the abstract. This number is close to the expected number of false positives with 95% CIs across 55 cases.
I guess there is another table in the paper where they classify the women as black or white, and perhaps even more classifications.
So the evidence looks weak and close to the noise level.
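To put a rough number on "expected false positives", here is a minimal sketch. It assumes the 55 comparisons are independent and that every true hazard ratio is exactly 1.0. Neither assumption is likely to hold exactly for overlapping subgroups, so treat it as a ballpark. The function names are made up for illustration.

```python
from math import comb


def expected_false_positives(n_tests: int, alpha: float = 0.05) -> float:
    """Expected number of intervals that exclude 1.0 purely by chance."""
    return n_tests * alpha


def prob_at_least(k: int, n_tests: int, alpha: float = 0.05) -> float:
    """P(at least k of n_tests intervals exclude 1.0 by chance), binomial model.

    Assumes independent tests, which is an idealization.
    """
    return sum(
        comb(n_tests, i) * alpha**i * (1 - alpha) ** (n_tests - i)
        for i in range(k, n_tests + 1)
    )


n = 55
print("expected by chance:", expected_false_positives(n))  # about 2.75
for k in (1, 2, 6):
    print(f"P(at least {k} by chance) = {prob_at_least(k, n):.3f}")
```

Under these assumptions, seeing a couple of "significant" intervals out of 55 is close to what pure noise would produce.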
You can't reuse the same data to get a 95% CI for all women and another for white women, and then deduce something from 1.0 being excluded for all women but included for white women. They have to calculate a confidence for the joint event that white women are affected AND all women are not.
1. The model says there exists some true value of a parameter, say, how much more likely you are to get cancer if you use hair dye.
2. You run an experiment with two groups, hair dye, and no hair dye.
3. If you were to run this experiment 100 times, you would expect that 5 of those times, the true value wouldn't be within the confidence interval, and 95 times, it would be.
This type of analysis doesn't tell you what you really want to know: how likely it is that hair dye causes cancer. What you do know is that if you run a bunch of experiments, you should expect 5% of your 95% confidence intervals to be "wrong."
To have a degree of belief in something, you need a prior. If I run a hazard ratio analysis for skydiving without a parachute, and the 95% CI includes a hazard ratio of 1.0, I'm still not going to believe that this is a safe activity. If I run it on drinking a glass of water, and the 95% CI doesn't include 1.0, I'm still going to believe that drinking a glass of water is safe.
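The role of the prior can be sketched with Bayes' rule in odds form: posterior odds = prior odds × likelihood ratio. The numbers below are invented purely for illustration (they are not from the study), and `posterior_probability` is a hypothetical helper.

```python
def posterior_probability(prior: float, likelihood_ratio: float) -> float:
    """Update a prior probability with a likelihood ratio (Bayes' rule, odds form).

    likelihood_ratio = P(data | hypothesis) / P(data | not hypothesis).
    """
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)


# Hypothetical: near-certain prior that parachute-free skydiving is dangerous,
# combined with weak evidence pointing the other way (likelihood ratio 0.2).
print(posterior_probability(prior=0.999, likelihood_ratio=0.2))

# Same evidence, but starting from a 50/50 prior.
print(posterior_probability(prior=0.5, likelihood_ratio=0.2))
```

With the strong prior the belief barely moves, while the same evidence shifts the 50/50 prior a lot, which is the point of the skydiving and glass-of-water examples.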
Not sure if the answer will be clear, but that's inherent in the weird and counter-intuitive nature of the concept itself (which I consider a major disadvantage of it).
It is an interval that was obtained using a confidence interval generation procedure, for which the following is true: for any true value of the quantity you're estimating, the procedure yields an interval that contains that correct value 95% of the time, when run on different random samples collected conditioned on that true value.
You want to estimate some quantity, let's say the cancer risk ratio of hair-dyers to non-hair dyers. You make some (noisy) experimental measurements on people.
Then, let's say someone hands you a confidence interval generating procedure. You feed it with your measurement data and it spits out an interval. To claim that this interval is a 95% confidence interval is actually a claim about the confidence interval generating procedure, not about the particular interval. And that claim is that for any true cancer risk ratio value, if we ran lots of experiments and fed all the results to this procedure, then 95% of the resulting intervals would contain the correct corresponding value. So in more detail: Assume the true ratio is, say, 1.05. Then if we repeated our noisy experiment in a 1.05-ratio world and applied the confidence interval generating procedure to the results, we would expect to get 95% of those intervals containing the number 1.05 and 5% of them not containing it. If we assume the ratio is 1.2, the same thing should hold. For this analysis to work, we need an idealized measurement model that tells us what the distribution of measurements looks like given any particular underlying value of the quantity of interest (say the cancer ratio). Then using this model and the description of a given confidence interval generating procedure, we may be able to prove mathematically that it indeed has the above property. Most of the time, statisticians use standard procedures, for which this property has been proved and the proof is widely known. They rarely invent new such procedures.
So, in summary: we don't know what the true value is in reality. But we can prove that the confidence interval generating procedure has the above explained property. Then, we will call the interval that this procedure gives us on our real measurement data a 95% confidence interval.
What it definitely does not mean is that the true value is in the interval with 95% probability, and that's the most common misunderstanding.
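One way to see the "property of the procedure" claim is to simulate it. This is a minimal sketch under assumptions that are not in the thread: measurements are normally distributed with a known standard deviation, and the procedure is the textbook interval sample mean ± z·σ/√n. The true values, sample size, and function names are arbitrary choices for illustration.

```python
import random
from statistics import NormalDist, fmean


def confidence_interval(sample, sigma, level=0.95):
    """Textbook z-interval for a mean when the standard deviation is known."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    half_width = z * sigma / len(sample) ** 0.5
    mean = fmean(sample)
    return mean - half_width, mean + half_width


def coverage(true_value, sigma=1.0, n=30, trials=20_000, level=0.95, seed=0):
    """Fraction of repeated experiments whose interval contains true_value."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_value, sigma) for _ in range(n)]
        low, high = confidence_interval(sample, sigma, level)
        hits += low <= true_value <= high
    return hits / trials


# The property should hold whatever the true value is.
for true_value in (1.05, 1.2):
    print(true_value, coverage(true_value))
```

Each printed fraction should land near 0.95 regardless of the true value plugged in. That is a statement about the long run of intervals, not about any single one of them.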
I'm not sure if this helps at all. These things are really counter-intuitive and twisted against our natural way of thinking.
Framing it as a statement about an interval generating procedure is really interesting! Do you know what an example would be? Appreciate anything you can point me towards to learn more.
Honestly, I feel like the whole concept is straight up weird and fascinating.
Our sample comes from a true population that we don’t know. Looking at the data we got (primarily the variance and the sample size) we try to infer what the true population mean is. If the spread is small and the sample size is large, odds are pretty good that our sample mean is close to the population mean. If the sample size is small and the variance in the sample is really high, then odds are good our sample mean is pretty far from the true population mean.
We can quantify that. Basically saying, if the true population mean is x, how improbable would a sample as different as ours have been? When we construct a 95% confidence interval we’re saying “assuming the randomness of my sample is no more improbable than a 1/20 dice roll, the true population mean should be this far or closer to my sample mean.”
For an intuitive example, consider a 100 sided die going from n to n+100 (e.g. 7 to 107) where we are trying to guess the middle value (n+50) based on a single roll. We get 100. If we want to be 100% confident, then the confidence interval should be 50 to 150, as that’s the range the middle value must be in. If we say, nah I’m sure I got it exactly right, our 0% confidence interval is just 100 to 100. If we put on our scientist hat, we say we doubt our particular roll was in the top or bottom 5% of the die, so let’s call it a 90% confidence interval at 55 to 145.
Note that you don’t have any particular reason for that last statement, and you could be wrong.
In other words, a 95% interval says here’s the range that the true population mean should be in, assuming our particular sample was no more improbable from the truth than 1 in 20. If the null value (in the article, 1.00 for no effect) is within those bounds, then even though your sample estimate might be 7%, your data says it wouldn’t be outrageous to get a sample estimate of 7% by chance from a true population value of no effect.
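As a rough check on the quoted white-women figure (HR = 1.07, 95% CI 0.99–1.16), one can back out an approximate standard error and p-value from the interval. This is a sketch assuming the interval is symmetric on the log scale and comes from a normal approximation, which is common for hazard ratios but not something confirmed from the paper; `approx_p_value` is a hypothetical helper.

```python
from math import log
from statistics import NormalDist


def approx_p_value(hr, ci_low, ci_high, level=0.95):
    """Approximate two-sided p-value for HR = 1, recovered from a reported CI.

    Assumes a normal approximation on the log hazard ratio scale.
    """
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    # The CI spans 2 * z_crit standard errors on the log scale.
    se = (log(ci_high) - log(ci_low)) / (2 * z_crit)
    z = log(hr) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


print(round(approx_p_value(1.07, 0.99, 1.16), 3))
```

Under those assumptions the p-value comes out above 0.05, consistent with the interval just barely including 1.00.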
It's a plausible mechanism but a bogus study. With most of their confidence intervals including hazard ratios of 1.0 (or very close to it), they need to rule out even slight correlations like smoking and (low) income. Recall bias of the survey participants is another major factor in cancer cases. Overall this reads like they collected data on a bunch of chemical exposures, set their threshold at p < 0.05, and found this one that they could publish on.
I'd like to see antiperspirant usage as a variable, but it seems like it would be hard to find a control. Having tried several times to buy deodorant without antiperspirant for my wife, I can say how hard it is to find an armpit product without aluminum compounds. It's not so hard to find masculine scented deodorants.
I am glad that they have statistical evidence; however, this has been believed for a long time.
Beauty salons use a wide variety of products that are biologically active and unregulated. Many are known carcinogens, and working at a beauty salon puts you at increased risk of cancer. See https://www.ncbi.nlm.nih.gov/pubmed/19755396 for random verification.
The question is what it does to the customers. It has long been suspected that increased cancer rates among blacks are tied to the fact that they are more likely to use beauty salons than whites. And if they do use them, they use them more often.
> “The research was based on the medical records of more than 46,000 women aged 35 to 74 from the Sister Study, meaning all women involved had a close relative who had died of breast cancer.”
Can someone explain to me how this isn’t a huge problem for this study?
If you want to be careful, you can say the study found that among a cohort with increased cancer risk due to genetics, using those hair products increased cancer risk even further.
You go with the data that you have, not the data you wish you could get.
It would be nice to redo the study with a large population that is randomly selected. However there is also no reason to believe that the results will be significantly different in direction. Therefore it would also be unwise to discount this result.
If the risk factor is homogenized, then the variable researched can be isolated theoretically.
However, this doesn’t allow you to test for interaction between those variables.
In the study for example, they identify an apparent interaction between ethnicity and the chemicals mentioned; we don’t know what the effect size would be (whether additive, multiplicative, or something else entirely).
(This is without any domain knowledge of the mechanisms of breast cancer to be fair.)
The only thing surprising here is that it isn't scalp cancer.
Hair straighteners work by breaking and reforming the disulphide bonds in proteins. Proteins perform many important roles related to DNA repair, chromosome formation, DNA replication, and other cancer-relevant actions. It can't be good to have bonds in those proteins getting changed somewhat randomly.
Skin has millions of years of incremental improvement at keeping stuff out. It is not unheard of for a carcinogen applied to the body to not cause skin cancer directly but to cause cancer elsewhere because being on the body makes it likely to get in through the various holes where it then finds its way to some more susceptible tissue.
Is there a source for this that isn't Newsweek? Newsweek isn't really a trustworthy source; as I understand it, it was acquired by a publication several tiers below Buzzfeed (which does good work sometimes) that has turned Newsweek into a Newsweek suit, Buffalo-Bill-style, and wears it around the Internet.
Just guessing here, but I suppose it's because the effect has less to do with where the chemicals get into the body than where they accumulate and/or have an effect. As it turns out, scalp tissue is not sensitive to those chemicals but breast tissue is. Could have been the liver or the prostate instead, but it wasn't.
And if a woman has long hair at least some of the chemicals from the hair dye will wash off over their breasts when they shower.
Also guessing here, but home hair dye is usually quickly washed off in the sink, while a salon will not only wash your hair more thoroughly but also tends to wash it twice after dyeing. So I would assume that when a person showers after a home hair dye, a lot more of the chemicals are rinsed off over their breasts than after a salon dye.
This is due to the tendency for bioaccumulation in lipids, which breast tissue and organs tend to have a high percentage of. I'm not an expert, just what I remember reading, so someone please chime in for the exact mechanisms.
There are likely to be some other contributing factors, like their role in producing breast milk and the biochemistry related to that (speculating: it accumulates in the mother but doesn't get passed on to the child).