Imagine that you are a scientist, nearing the end of a twenty-year study. Two decades ago, you thought that mothers drinking milk during pregnancy might lead to benefits for their children. When those babies were born, you weighed them and found that the milk-drinkers’ newborns were indeed a little bigger. Just wait twenty years, you said. We’ll see that those kids are taller as adults.
Now that the twenty years have gone by, you measure the grown-up tykes. As you plug the numbers into your computer, a sudden realization makes you shiver: you’ve lost almost a third of the original participants. The sample that’s left is too small to give the study enough statistical power for a valid result.
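(Quick aside for the curious: “statistical power” is the probability that a study detects an effect of a given size, assuming the effect is real. Here’s a minimal sketch of the arithmetic in Python with statsmodels; the effect size and sample sizes are invented, since we don’t know the real study’s numbers.)

```python
# Hypothetical numbers: a small true effect (Cohen's d = 0.2)
# and two equal groups, compared with a two-sample t-test.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# With 300 participants per group, the chance of detecting the effect:
print(analysis.power(effect_size=0.2, nobs1=300, alpha=0.05))  # ~0.69

# Lose a third of them, and detection becomes roughly a coin flip:
print(analysis.power(effect_size=0.2, nobs1=200, alpha=0.05))  # ~0.52
```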
It gets worse: you have detailed records on the mothers and their milk consumption during pregnancy, but that was twenty years ago. You have no records of what the children themselves were eating or drinking as they grew up, which can certainly affect their growth.
You shrug, and finish punching in the numbers. The computer spits out two things:
(1) The kids of milk-swilling moms are a teensy bit taller.
(2) The p-value is 0.19, but you were hoping for 0.05 or less.
Bzzt. Not significant.
Uh-oh. A twenty-year project down the drain. So, let me ask you…
As this scientist, would you then write a paper declaring that maternal milk consumption makes kids grow taller?
As the editor of a journal, would you publish such a paper?
As a science writer at the freaking New York Times, would you give this study any space in your esteemed publication?
Of course n–Oh. Wait.
If you click that link, you can see the little disclaimer buried at the end of the fourth paragraph: “But these trends did not achieve statistical significance.”
Now, people often misunderstand what significance means in this sense. It doesn’t mean that the groups had very different heights, or that the result is important to know. “Significance” means only that the result passed a simple yes-or-no statistical test for whether it counts.
Imagine you have a skeptical friend. A skeptical owl.
“I think something is up with these new nickels,” you tell the owl. “Watch this.” You toss one of them three times and it comes up heads every time. “They always land heads!”
Skeptical owl is skeptical. “Three heads? That’s not so unusual. Happens all the time with completely normal coins.”
You toss some more, and get ten heads in a row.
“OK,” says skeptical owl. “That would be really unlikely if the coin were fair.”
“So you believe me?” you ask the owl, hopefully.
“I’m just saying that’s really unlikely,” says the owl, ever skeptical. “Your p-value is, like, 0.00097.” Owls can calculate p-values in their heads.
This is what significant means: you convince the owl that your result is “really unlikely” when the p-value drops below 0.05, which the owl would describe, precisely, as less than a 5% chance of seeing a result at least that extreme if nothing were really going on. Computer statistics programs are all written by owls. They calculate this number for you, and you have a clear, black-and-white answer to the question of whether the result is statistically significant.
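If you don’t have an owl handy, you can check the math yourself. Here’s a minimal sketch in Python using scipy (which, I assume, was written by owls); the toss counts are just the ones from the story:

```python
# The owl's arithmetic: if the nickel were fair, how often would we
# see a run at least this heads-heavy? (One-sided binomial test.)
from scipy.stats import binomtest

# Ten tosses, ten heads:
print(binomtest(k=10, n=10, p=0.5, alternative="greater").pvalue)
# -> 0.0009765625, the owl's "like, 0.00097"

# Three tosses, three heads, for comparison:
print(binomtest(k=3, n=3, p=0.5, alternative="greater").pvalue)
# -> 0.125: happens one time in eight, just as the owl shrugged it off
```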
Just say no to non-significant results!
This is where the milk-hyping researchers went wrong: they wrote up their results in spite of the owl’s disapproval. It’s true that the .05 cutoff is arbitrary, but really, their p-value of 0.19 is slightly less impressive than a three-heads streak (a perfectly fair coin does that one time in eight, p = 0.125). Skeptical owl is skeptical.
This study is not the only one where researchers hype non-significant results as if they were findings; it’s an epidemic of bad judgment and/or bad math. Take a look, for example, at this study, published earlier today in the American Journal of Clinical Nutrition, which should really know better. The researchers hoped that Vitamin D supplements would result in people getting fewer infections, as measured by prescriptions for antibiotics. Guess what: they had to begin their conclusion with “Although this study was a post hoc analysis and statistically nonsignificant…”
Friends, don’t fall for it. A non-significant result is one that doesn’t deserve to have lofty conclusions based on it. Not even if you really want it to be significant; not even if you are wishing super hard for it to be significant. If you feel the need to describe your result as “tantalisingly close to significant,” you already know the truth.