## The Famous DOE Question

### How many experiments should I run?

Published: Sunday, August 14, 2016 - 23:00

I hope the little diversion into design of experiments (DOE) that I’ve taken in my last few columns has helped clarify some things that may have been confusing. Even if you don’t use DOE, there are still some good lessons here about understanding the ever-present, insidious, lurking cloud of variation.

Building on my June column, consider another of C. M. Hendrix’s “ways to mess up an experiment”: Insufficient data to average out random errors (aka, a failure to appreciate the toxic effect of variation).

This is where the issue of sample size comes in, and it’s by no means trivial.

### How many experiments should I run? It depends.

The ability to detect effects depends on your process’s standard deviation, which in the tar-scenario simulation from my May column was ±4 (the real process was actually ±8).

Here’s a surprising reality for many: The number of variables doesn’t necessarily directly determine the number of experiments. But let’s continue the tar scenario:

“Three variables? It’s obvious: Let’s run a 2 x 2 x 2 factorial.” (Eight experiments.)
Most people might not realize that this design would allow detection of an approximate 8 percent to 9 percent difference between the high and low levels of a variable, e.g., the average difference in tar if one goes from 55° to 65°, from 26 percent to 31 percent copper sulfate, or from 0 percent to 12 percent excess nitrite.

“I want to do only four experiments, so I’ll do a 2 x 2 factorial and study the other variables later.”
There are consequences! Running an unreplicated 2 x 2 (four experiments) on two of the variables (e.g., omitting excess nitrite) would allow detection of an 11 percent to 13 percent difference between the high and low variable settings. Interactions between excess nitrite and each of the other two variables would be unknown.

“What do you mean, replicate it?”
Replication of your 2 x 2 x 2 factorial (16 experiments) would then allow detection of an approximate 5.5 percent to 6.5 percent difference. To get this same accuracy with two variables, the 2 x 2 factorial would have to be repeated three more times (16 total experiments)—a wasted opportunity.
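The detectable-difference figures above can be reproduced with the standard rule of thumb for two-level factorials: the standard error of an effect is 2σ/√N for N total runs, and the smallest reliably detectable difference is that standard error times a multiplier that depends on the significance level and power. This sketch assumes α = 0.05 (two-sided) and 80 to 90 percent power, which are my assumptions; the column doesn’t state them, but they reproduce its ranges.

```python
from statistics import NormalDist

def detectable_difference(sigma, n_runs, alpha=0.05, power=0.80):
    """Smallest high-vs-low difference a two-level factorial with
    n_runs total runs can reliably detect, given process sigma.
    Effect standard error in a two-level factorial: 2*sigma/sqrt(n_runs).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return z * 2 * sigma / n_runs ** 0.5

sigma = 4  # the tar simulation's standard deviation

for name, n in [("2 x 2 x 2, unreplicated", 8),
                ("2 x 2, unreplicated", 4),
                ("2 x 2 x 2, replicated", 16)]:
    low = detectable_difference(sigma, n)              # 80% power
    high = detectable_difference(sigma, n, power=0.90) # 90% power
    print(f"{name}: ~{low:.1f}% to ~{high:.1f}%")
```

Running this gives roughly 7.9 to 9.2 percent for the unreplicated 2 x 2 x 2, 11.2 to 13.0 percent for the unreplicated 2 x 2, and 5.6 to 6.5 percent for the replicated 2 x 2 x 2, matching the ranges quoted above. Note that the detectable difference depends on the total run count, not the number of variables, which is exactly the point of the five-variables-in-16-runs remark below.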

If you knew up front that 16 experiments would be needed for your objective, you could now easily include excess nitrite. And you could easily add two additional variables (five total, perhaps the two variables you were planning to “study later”?) with no serious consequences.

Wouldn’t it be nice to discover up front that the excess nitrite could subsequently be set to zero? It’s your decision, and it depends on the answer to this question: What size effect must you detect to take a desired action?

I’ve often had this conversation in various guises:

Client: I have three variables I can test. Given the potential cost savings for each percent of tar reduction, I need to detect a 1 percent difference.

Davis: Sit down. I’m afraid I have some bad news. That would require 500 to 680 experiments, depending on how badly you want to detect that effect.

Client: Ohhh... what if I cut it down to two variables?

Davis: Sorry. Still 500 to 680.

Client: Really? OK, I’ll settle for detecting 2 percent.

Davis: You’d better stay seated. That would now require 130 to 170 experiments. But wait, let’s chat some more. Under the right circumstances, I might be able to recommend a 15-run design (a three-variable Box-Behnken) that will map the region. Or, should you wish to study two additional variables, there is a five-variable design that would let you study them and map the region in 33 experiments (believe it or not, four variables would also take about 30 experiments).
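The run counts in this conversation follow from inverting the same detectable-difference formula: to detect a high-vs-low difference Δ, a two-level factorial needs roughly N = ((z·2σ)/Δ)² total runs, where z again bundles the significance level and power. As before, α = 0.05 and 80 to 90 percent power are my assumed settings, chosen because they reproduce the column’s 500-to-680 and 130-to-170 ranges.

```python
from math import ceil
from statistics import NormalDist

def runs_needed(sigma, delta, alpha=0.05, power=0.80):
    """Total runs a two-level factorial needs to detect a high-vs-low
    difference of delta, given process sigma (normal approximation).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return ceil((z * 2 * sigma / delta) ** 2)

sigma = 4  # the tar simulation's standard deviation

for delta in (1, 2):
    lo = runs_needed(sigma, delta)              # 80% power
    hi = runs_needed(sigma, delta, power=0.90)  # 90% power
    print(f"detect a {delta}% difference: about {lo} to {hi} runs")
```

This yields roughly 500 to 670 runs for a 1 percent difference and 125 to 170 for a 2 percent difference. Notice that delta appears squared in the denominator: halving the effect you must detect quadruples the run count, and the number of variables never enters the formula at all, which is why dropping to two variables didn’t help the client.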

Why the dramatic difference in numbers? It depends on your objective, which brings me to another Hendrix “mess up”: Establishing effects (factorial) when the objective was to optimize (response surface), or vice versa.

People think it’s as simple as running a factorial design based on the number of variables, and then performing statistical t-tests. It’s not.

### A healthcare example—for everyone

Suppose you’re interested in examining three components of a weight-loss intervention:
• Keeping a food diary (yes or no)
• Increasing activity (yes or no)
• Home visit (yes or no)

You plan to randomly assign individuals to one of the eight experimental conditions, each representing a different treatment protocol. For example, the individuals randomly assigned to Condition 2 would receive a home visit, but neither of the other two intervention components. Those randomly assigned to Condition 7 would receive the “keeping a food diary” and “increasing physical activity” components, but wouldn’t receive a home visit. People assigned to Condition 1 will have to rely on sheer willpower.
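The eight conditions are just the 2 x 2 x 2 combinations of the three yes/no components, and they can be enumerated mechanically. The condition numbering in this sketch is my own illustrative ordering; the column’s Condition 2 and Condition 7 imply a specific numbering it doesn’t spell out.

```python
from itertools import product

components = ["food diary", "increased activity", "home visit"]

# All 2 x 2 x 2 combinations of the three yes/no components.
conditions = list(product([False, True], repeat=3))

for i, levels in enumerate(conditions, start=1):
    assigned = [c for c, on in zip(components, levels) if on]
    print(f"Condition {i}: {', '.join(assigned) or '(control: willpower only)'}")
```

Eight conditions, one of which is the all-no control: the factorial structure means every person’s result contributes information about all three components at once, rather than one component at a time.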

Sounds simple enough.

I happen to be visiting your facility, and you ask me for a sample size recommendation for the number of people needed.

I smile and say, “Please sit down.”

To be continued next time.