In a recent blog post, “Assessing Variability for Quality Improvement,” I showed that measuring variability is just as important as measuring the mean for a product or service in a quality improvement initiative. The mean, by itself, often tells an incomplete story. Additionally, quality management veterans know that controlling the variability is often more difficult than controlling the mean. Shifting the mean often entails simply adjusting a manufacturing setting or target, whereas reducing variability often requires new technology or procedures.
For example, in the image below, it’s generally easier to re-center the tightly clustered measurements on a target than it is to reduce the spread of the dots that are centered.
This rule of thumb applied to a research project that I was involved in.
The study looked at bone density in teenage girls. Specifically, we wanted to determine whether jumping from 24-inch steps, 30 times, every other school day would increase their bone density compared to the control group. For the jumping intervention, we wanted the subjects to experience an impact of six times their body weight (BW) for each jump.
I conducted a pilot study to quantify the impacts by having each subject jump five times onto a force plate. After analyzing the data, I found that the average impact—the mean—across all subjects was 6.13 BWs, which sounds great. However, I also found that the variability was too high: the standard deviation was 1.08 BWs. While the overall mean exceeded our target, nearly half the subjects had means below 6 BWs. Subject means ranged from 4.7 to 8.4 BWs.
The probability distribution plot in figure 1 shows the distribution of landing forces using the distribution, mean, and standard deviation estimates from the pilot study. The shaded area indicates that we can expect that only 55 percent of the subjects will average more than 6 BWs, which is our target.
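You can reproduce that shaded-area calculation yourself. The short sketch below uses only the normal-distribution estimates reported above (mean 6.13 BWs, standard deviation 1.08 BWs) and the standard error function from Python's standard library; it is an illustration of the arithmetic, not the software used in the study.

```python
# Probability that a subject's average impact exceeds the 6-BW target,
# assuming landing impacts are normally distributed with the mean and
# standard deviation estimated from the pilot study.
import math

mu, sigma, target = 6.13, 1.08, 6.0

def normal_cdf(x, mu, sigma):
    """Cumulative probability of a normal distribution via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

p_above_target = 1 - normal_cdf(target, mu, sigma)
print(f"P(impact > {target} BW) = {p_above_target:.2f}")  # about 0.55
```

The result, roughly 0.55, matches the 55 percent shaded area in figure 1: even though the mean is above target, almost half the subjects can be expected to fall short.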
If you don’t know the nature of your process inputs, the outcomes may be unpredictable. Landing impacts were the key treatment for our study. High variability here would be equivalent to studying the effects of a drug but giving each subject wildly different, unknown doses—or building a bridge without being sure of the strength of your materials.
Although the subjects jumped from a fixed height, the magnitude of the peak impact depended on how much the jumpers flexed their knees. Theoretically, if jumpers did not flex their knees at all, they could exceed 50 BWs and injure their knees. The fact that the impacts were down in the 4–8 BW range was due to the subjects bending their knees on landing.
We found that the amount of knee-bending was very sensitive to external factors. You can read here about how I found that subjects who did not wear shoes in previous jumps had significantly reduced impacts in later jumps, even when wearing shoes. Also, I compared impacts for jumping on a mat vs. no mat. Surprisingly, the peak impacts on the mat were actually slightly higher. We hypothesized that seeing the mat subconsciously suggested to the subjects that they didn’t need to cushion their impact as much.
I dug a bit deeper and used an X-bar-S control chart (figure 2) in a slightly unconventional manner. Each subgroup represents a subject. Consequently, each plotted point represents the mean and standard deviation of each subject’s five trials (except for subject 2, who had a missing value).
The in-control S chart shows that each subject has a consistent landing style that produces impacts of a consistent magnitude. However, the out-of-control X-bar chart indicates that different subjects have very different means. Collectively, the chart shows that some subjects consistently land hard while others consistently land softly. The control chart suggests that the variability is not inherent in the process (i.e., common-cause variation) but rather assignable to differences between subjects (special-cause variation). This raises the question: Can we train the subjects to land a certain way?
If we simply wanted to increase the mean, we could have easily made the steps higher. In the range that I tested (8 in., 16 in., and 24 in.), I found that adding 8 in. to the jumping height increased the impact by an average of 2 BWs. However, our mean was acceptable, and we needed to control the variability between subjects, which entailed a more involved, ongoing process change.
We decided that we needed to train the subjects how to land. For the training, we created a short video that demonstrated the proper way to land. The subjects were shown this video several times during a school year. Additionally, the nurse for the research study observed all of the jumping sessions, looked for deep knee bends, and corrected each subject as needed. This ongoing training and corrective action reduced the variability enough so that the impacts were consistently greater than 6 BWs.
This simple jumping example illustrates how reducing the variability is often more difficult than improving the mean, which would have merely required higher steps. Next time, I'll highlight another aspect of too much variation: how it can obscure statistical significance.