The Famous DOE Question
How many experiments should I run? It depends.
Davis Balestracci
Published: Sunday, August 14, 2016 - 23:00

I hope this little diversion into design of experiments (DOE) that I’ve explored in my last few columns has helped clarify some things that may have been confusing. Even if you don’t use DOE, there are still some good lessons about understanding the ever-present, insidious, lurking cloud of variation.
Building on my June column, consider another of C. M. Hendrix’s “ways to mess up an experiment”: insufficient data to average out random errors (aka a failure to appreciate the toxic effect of variation). This is where the issue of sample size comes in, and it’s by no means trivial. The ability to detect effects depends on your process’s standard deviation, which in the tar-scenario simulation from my May column was +/– 4 (the real process was actually +/– 8).

Here’s a surprising reality for many: The number of variables doesn’t necessarily directly determine the number of experiments. But let’s continue the tar scenario:

“Three variables? It’s obvious: Let’s run a 2 x 2 x 2 factorial.” (Eight experiments.)
Most people might not realize that this design would allow detection of an approximate 8 percent to 9 percent difference between the high and low levels of a variable, e.g., the average difference in tar if one goes from 55° to 65°, or 26 percent to 31 percent copper sulfate, or 0 percent to 12 percent excess nitrite.

“I want to do only four experiments, so I’ll do a 2 x 2 factorial and study the other variables later.”
There are consequences! Running an unreplicated 2 x 2 (four experiments) on two of the variables (e.g., omitting excess nitrite) would allow detection of an 11 percent to 13 percent difference between the high and low variable settings. Interactions between excess nitrite and each of the other two variables would be unknown.

“What do you mean, replicate it?”
Replication of your 2 x 2 x 2 factorial (16 experiments) would then allow detection of an approximate 5.5 percent to 6.5 percent difference. To get this same accuracy with two variables, the 2 x 2 factorial would have to be repeated three more times (16 total experiments)—a wasted opportunity.

If you knew up front that 16 experiments would be needed for your objective, you could now easily include excess nitrite. And you could easily add two additional variables (five total, perhaps those two variables you were planning to “study later”?) with no serious consequences. Wouldn’t it be nice to discover up front that the excess nitrite could subsequently be set to zero?

It’s your decision, and it depends on the answer to this question: What size effect must you detect to take a desired action? I’ve often had this conversation in various guises:

Client: I have three variables I can test. Given the potential cost savings for each percent of tar reduction, I need to detect a one-percent difference.

Davis: Sit down. I’m afraid I have some bad news. That would require 500 to 680 experiments, depending on how badly you want to detect that effect.

Client: Ohhh... what if I cut it down to two variables?

Davis: Sorry. Still 500 to 680.

Client: Really? OK, I’ll settle for detecting 2 percent.

Davis: You’d better stay sitting. That would now require 130 to 170 experiments. But wait; let’s chat some more. Under the right circumstances, I might be able to recommend a 15-run design (a three-variable Box-Behnken) that will map the region; or, should you wish to study two additional variables, there is a five-variable design that would let you study them and map the region in 33 experiments (believe it or not, four variables would also take ~30 experiments).
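For the curious, ranges like these follow from the standard normal-approximation power formula for comparing the averages at the high and low levels of a factor in a balanced two-level design. Here is a minimal sketch in Python; the assumptions are mine (two-sided alpha of 0.05, power between 80 and 90 percent, and the +/– 4 standard deviation from the simulation), so treat it as an illustration rather than the author's exact method:

# Minimal sketch, assuming two-sided alpha = 0.05, power = 80% or 90%,
# and sigma = 4 (the simulation's standard deviation).
from scipy.stats import norm

def runs_needed(delta, sigma=4.0, alpha=0.05, power=0.80):
    """Total runs N in a balanced two-level design needed to detect a
    difference delta between the high- and low-level averages. With N/2
    runs at each level, the effect estimate has variance 4*sigma**2/N,
    so N = 4 * ((z_alpha + z_power) * sigma / delta)**2."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return 4 * (z * sigma / delta) ** 2

def detectable_difference(n_runs, sigma=4.0, alpha=0.05, power=0.80):
    """The same formula inverted: smallest detectable difference for N runs."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return 2 * z * sigma / n_runs ** 0.5

for power in (0.80, 0.90):
    print(f"power {power:.0%}: "
          f"N for a 1% effect ~ {runs_needed(1, power=power):.0f}; "
          f"N for a 2% effect ~ {runs_needed(2, power=power):.0f}")
    for n in (4, 8, 16):
        print(f"  {n} runs detect ~ {detectable_difference(n, power=power):.1f}%")

At 80-percent power the formula gives roughly 500 runs for a 1-percent effect and 125 for a 2-percent effect; at 90-percent power, roughly 670 and 170. It also reproduces the detection limits quoted earlier: about 11 to 13 percent with four runs, 8 to 9 percent with eight, and 5.5 to 6.5 percent with 16. Note that the run count depends only on the standard deviation and the effect size, not on the number of variables, which is why dropping a variable didn't help the client.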
Why the dramatic difference in numbers? It depends on your objective, which brings me to another Hendrix “mess up”: establishing effects (factorial) when the objective was to optimize (response surface), or vice versa. People think it’s as simple as running a factorial design based on the number of variables and then performing statistical t-tests. It’s not.

A healthcare example—for everyone

Suppose you’re interested in examining three components of a weight-loss intervention:

• Keeping a food diary (yes or no)
• Increasing activity (yes or no)
• Home visit (yes or no)
You plan on randomly assigning individuals to one of the eight experimental conditions, each representing a different treatment protocol. For example, the individuals randomly assigned to Condition 2 would receive a home visit but neither of the other two intervention components. Those randomly assigned to Condition 7 would receive the “keeping a food diary” and “increasing physical activity” components but wouldn’t receive a home visit. People assigned to Condition 1 will have to rely on sheer willpower.
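As an aside, these eight conditions are simply the runs of an unreplicated 2 x 2 x 2 factorial. The short sketch below enumerates them; the condition numbering is an assumption on my part (home visit varying fastest), chosen because it reproduces Conditions 1, 2, and 7 as described above:

# Enumerate the eight conditions of the 2 x 2 x 2 weight-loss design.
# The numbering scheme is an assumption (home visit varies fastest);
# it matches the conditions cited in the text.
from itertools import product

components = ("keeping a food diary", "increasing physical activity", "home visit")

for number, levels in enumerate(product(("no", "yes"), repeat=3), start=1):
    received = [c for c, lvl in zip(components, levels) if lvl == "yes"]
    print(f"Condition {number}: {', '.join(received) or 'sheer willpower'}")

Running it prints, for example, “Condition 2: home visit” and “Condition 7: keeping a food diary, increasing physical activity.”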
Sounds simple enough. I happen to be visiting your facility, and you ask me for a sample size recommendation for the number of people needed. I smile and say, “Please sit down.”

To be continued next time.
About The Author
Davis Balestracci is a past chair of ASQ’s statistics division. He has synthesized W. Edwards Deming’s philosophy as Deming intended—as an approach to leadership—in the second edition of Data Sanity (Medical Group Management Association, 2015), with a foreword by Donald Berwick, M.D. Shipped free or as an ebook, Data Sanity offers a new way of thinking using a common organizational language based in process and understanding variation (data sanity), applied to everyday data and management. It also integrates Balestracci’s 20 years of studying organizational psychology into an “improvement as built in” approach, as opposed to most current “quality as bolt-on” programs. Balestracci would love to wake up your conferences with his dynamic style and entertaining insights into the places where process, statistics, organizational culture, and quality meet.