Statistics can be unintuitive. What's a large difference? What's a large sample size? When is something statistically significant? You might think you know, based on experience and intuition, but you don't really know until you run the proper statistical tests and see what the data are telling you.
Even experts can get tripped up by their hunches, as we’ll see.
In my family, we’re huge fans of Discovery’s MythBusters. This fun show mixes science and experiments to prove or disprove various myths, urban legends, and popular beliefs. Are daddy longlegs spiders really super-poisonous? Can diving underwater protect you from an explosion or from being shot? Are toilets really the cleanest place in the house? What is the fastest way to cool a beer? They often find a way to work in impressive explosions, one of their hallmarks. Thanks to MythBusters, my 7-year-old daughter was able to explain to me that you can identify the explosive ANFO because it’s made out of pellets!
I love MythBusters because it makes science fun. The show’s hosts extensively plan and execute small-scale tests before conducting the full-sized experiment, and they make huge efforts to rule out competing variables. The MythBusters’ skilled crew and well-stocked workshop can build a rig or robot to test virtually anything in a controlled and repeatable fashion. They also place a strong focus on collecting data and using those data to make decisions about the myths. This show is a fun way to bring the scientific method alive for our young daughter. Good stuff!
Having said that, I did catch them making a statistical mistake during an episode we watched recently. I’m pointing this mistake out only to highlight how nonintuitive statistics can be, and not to put down the hard work of the MythBusters crew.
This episode tested the myth that yawning is contagious—so if you see someone yawn, you’re more likely to yawn yourself. The crew recruited 50 people who thought they were being considered for an appearance on the show. One by one, each subject spoke with the recruiter, who either yawned, or not, during the spiel. The subjects then sat by themselves in an isolation room and were told to wait. While in the isolation room for a set amount of time, unbeknownst to them, they were watched to see if they yawned. Here are the results:
• 25 percent of the subjects who were not exposed to a yawn (4 out of 16) yawned while waiting. I’ll call this the non-yawn group.
• 29 percent of the subjects who were exposed to a yawn (10 out of 34) yawned while waiting. I’ll call this the yawn group.
Jamie Hyneman, one of the MythBusters hosts, concluded that because of their large sample size (n = 50), the difference of 4 percentage points was meaningful. They didn’t run a statistical test; the decision was based on his intuition about the statistical power that the sample size gave them. Let’s test this out a bit more rigorously.
To test their data, we’ll need to use the two proportions test in Minitab (Stat > Basic Statistics > 2 Proportions). We can use summarized data rather than data in a worksheet.
Fill in the main 2 Proportions dialog like this:
MythBusters wanted to test whether the proportion for the yawn group was greater than the non-yawn group. So we need to perform a one-sided test, which also provides a little more statistical power.
Click Options and choose greater than as the alternative hypothesis to determine whether the first proportion is greater than the second proportion. We get the following output:
You’ll see that there are two p-values. Fisher’s exact test is designed for small sample sizes. The note about the normal approximation and small sample sizes indicates that we should use the Fisher’s exact test p-value of 0.513. This value is greater than any reasonable alpha value (typically 0.05), so we can’t reject the null hypothesis.
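If you don’t have Minitab handy, you can reproduce this result with a one-sided Fisher’s exact test in Python using SciPy. This is a sketch: the 2×2 table below is my arrangement of the counts from the study (yawners and non-yawners in each group), with the yawn group in the first row so that the "greater" alternative tests whether its proportion is higher.

```python
from scipy.stats import fisher_exact

# 2x2 table of counts from the study:
#                   yawned   did not yawn
#   yawn group        10          24       (34 subjects)
#   non-yawn group     4          12       (16 subjects)
table = [[10, 24],
         [4, 12]]

# One-sided test: is the proportion of yawners in the yawn group
# (first row) greater than in the non-yawn group (second row)?
odds_ratio, p_value = fisher_exact(table, alternative="greater")
print(f"Fisher's exact one-sided p-value: {p_value:.3f}")  # 0.513
```

The p-value matches Minitab's Fisher's exact result of 0.513, so the same conclusion follows: we can't reject the null hypothesis.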
Conclusion: The data do not show that there is a higher proportion of yawning subjects in the yawn group than in the non-yawn group. Further, rather than having a large sample, Minitab indicates that the sample is small.
Fans of the show know that when they can’t confirm a myth, the MythBusters find an exaggerated way to replicate the myth to show the extreme conditions that are necessary to make the myth happen. This method is a great way to increase the number of explosions they get to show. As much as I want to, I can’t give you an impressive explosion for this column’s finale. However, I can give you a startling answer to the question of how large a sample MythBusters needed to have a good chance to detect a difference of 29 percent vs. 25 percent. The answer is so large that you might just end up waving your arms around like Adam Savage does on the show.
To figure this out, we’ll use Minitab’s Power and Sample Size calculation for 2 Proportions (Stat > Power and Sample Size > 2 Proportions). We’ll use the proportions from the study and a power of 0.8, which is a good standard value, as I’ve discussed previously.
In a nutshell, a power of 0.8 indicates that a study has an 80-percent chance of detecting a difference between the two populations if that difference truly exists. Fill in the dialog like this:
Under Options, choose greater than (p1 > p2). We get the following results:
The results show that MythBusters needed a whopping 1,523 subjects per group (3,046 total) to have an 80-percent chance of detecting the small difference in population proportions. That’s a far cry from the 50 subjects that they actually had. Why is this so large? There are two main reasons.
First, the effect size is small, and that requires a larger sample. Second, the data for this test are categorical rather than continuous. The subjects either yawned or did not yawn while in the isolation room. Generally speaking, any given amount of categorical data represents less useful information than the same amount of continuous data. Consequently, you need a larger sample size when you’re analyzing categorical data.
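You can check Minitab’s figure by hand with the standard pooled-variance sample-size formula for a one-sided two-proportions test. This is a sketch, and I’m assuming a one-sided alpha of 0.05, which is Minitab’s default significance level:

```python
import math
from scipy.stats import norm

p1, p2 = 0.29, 0.25          # estimated proportions from the study
alpha, target_power = 0.05, 0.80   # one-sided test, standard power target

z_alpha = norm.ppf(1 - alpha)      # ~1.645 for a one-sided test
z_beta = norm.ppf(target_power)    # ~0.842
p_bar = (p1 + p2) / 2              # pooled proportion (equal group sizes)

# Pooled-variance formula for n per group:
numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
             + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
n_per_group = math.ceil(numerator / (p1 - p2) ** 2)
print(n_per_group)  # 1523 per group
```

The hand calculation lands on the same 1,523 subjects per group that Minitab reports.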
We can also take the results of the study and use them to determine how much power the study had. To do this, we input the sample size and the estimate of each proportion from the study into the power and sample size dialog. Of course, we don’t know the true values of the population proportions, but the study provides the best estimates that we have at this point.
For this study, Minitab calculates a power of 0.09. This value indicates that there was less than a 10-percent chance of detecting such a small difference, assuming that the difference truly exists. Therefore, insignificant results are to be expected for this study regardless of whether the difference truly exists or not.
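The retrospective power calculation can also be sketched by hand using the normal approximation with the study’s actual, unequal group sizes. This simple unpooled-variance approximation is an assumption on my part (Minitab’s internal method may differ slightly), but it lands on the same answer:

```python
import math
from scipy.stats import norm

p1, n1 = 0.29, 34   # yawn group: proportion and sample size
p2, n2 = 0.25, 16   # non-yawn group
alpha = 0.05        # one-sided test

se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
z_alpha = norm.ppf(1 - alpha)

# Probability that the observed z-statistic clears the one-sided
# critical value, assuming the true difference really is p1 - p2:
power = norm.cdf((p1 - p2) / se - z_alpha)
print(f"{power:.2f}")  # 0.09
```

With power this low, a nonsignificant result was the expected outcome no matter what the truth is.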
Given the results of the 2 Proportions Test and the power analysis, we can conclude:
• There is no evidence that yawns are contagious.
• The study had inadequate power to detect a difference.
Coming from the university world of academic research projects, I would say that MythBusters conducted a pilot study. These are small experiments designed to gather initial estimates (such as the proportions) and determine the feasibility of conducting a larger study. At this point, the main result is that the study, as it was performed, was not up to the task at hand. It could not reasonably detect the size of the difference that is likely to exist, if there is even a difference.
That does not mean that this project was a waste of time, though, because you don’t know this until you at least do some research. In the research world, the question now would be whether further research is worthwhile. This determination is different for each research project. You must balance the effect size (small in this case), the benefits (negligible), and the additional costs (very large for a much larger sample size). So, I’d guess that a large follow-up study is unlikely to happen.
We remain huge fans of the MythBusters. This case study only serves to highlight the fact that conducting research and data analysis is a tricky business that can trip up even the experts.