Davis Balestracci
Published: Monday, September 19, 2016 - 14:14

Referring back to June’s column, I hope you’ve found C. M. Hendrix’s “ways to mess up an experiment” helpful in putting your design of experiments training into a much better perspective. Today, I’m going to add two common mess-ups from my consulting experience. If you’re not careful, it’s all too easy to end up with data that’s worthless.
Underestimating the unintended ingenuity of human psychology will mess up your experiment, and this includes the study planners! Trust me, there is no way you could make up the things busy people will do to (unintentionally) mess up your design and its data.

“But I just want to run a simple 2 × 2 × 2 factorial,” someone might say. To refresh your memory: Suppose you’re interested in examining three components of a weight-loss intervention:

It’s easy to set up the 2 × 2 × 2 factorial design matrix, and you might think the only remaining issue is how many people should participate. Remember from last column, when a client’s initial request for “just a design template” turned into three consults totaling more than two hours? Using this weight-loss experiment scenario, let’s see why.

Item 1: Some things need to be clarified

“Weight loss” is pretty vague. What exactly do you mean by that (i.e., the operational definition)? Pounds? Percent of original weight? Meeting a pre-established goal? Something else? What time period is going to be studied? One month? Two months? Three months? Six months? A year?

And what exactly do you mean by “keeping a food diary,” “increasing activity,” and “home visit”? Do you mean daily? Weekly? Do you want them to formally record the activity? Does a phone call count as a “visit”?

What characterizes the people you would like to study?

Are you going to make it absolutely clear to the study participants what is expected?

Don’t underestimate a very serious “headache factor” that I’ve often encountered: Is this study going to collect data from several sites or departments? If so, how are you going to make sure everyone agrees on the answers to the questions above? How would you know? Would a simple designed data-collection sheet (with definitions on it) help reduce this variation? (Hint: Yes.)
Should you PLAN a brief initial study whose only objectives are to reduce the human variation in perceptions of 1) the execution of the protocol, 2) how to define and collect the data, and 3) the ease of using the designed data-collection sheet? (Hint: Yes! But these data won’t be used in the study itself.) How will you choose people for such a study?

Item 2: What threshold of weight loss would result in proceeding with your investment in such a program? Or is it a matter of “met personal goal” (1) or “did not meet personal goal” (0)? The latter measure of “percent of patients who met goal” isn’t recommended. Read on to see why.

Item 3: How badly do you want to detect this difference?

Item 4: How many people do you need? Key question: What is the ratio of this desired result from item 2 above, relative to your standard deviation? Are any actual data available from similar weight-loss research studies (preferably using the same time period) to measure the weight losses as well as unintended gains? If so, what are some of the reported standard deviations for such a group of people? Are they consistent enough to come up with an approximate value?

Remember the client dialogue from last newsletter regarding the tar scenario?

If you obtained sample sizes like these, might you consider the possibility of studying two additional variables’ effects on your results? Perhaps consider: “Person does not set a goal” (0) or “Person sets goal” (1)?

These sample sizes come from answering three questions. The first is answered pretty much by default: The other two are: Finding the answers to the three questions above automatically answers the implicit fourth question: What size sample do I need?
If you have a specific sample size in mind and plan on using p = 0.05 for significance (that’s two questions answered), you can work backward to calculate various combinations of the other two, either:

Important point: If all you do is default to p = 0.05 to detect effects, your design will, by default, answer questions to items 2, 3, and 4 (as in the calculated sample sizes above). This could possibly waste your hard work unless you reconsider your objectives.

For those of you who are wedded to the “rapid cycle PDSA” methodology, have I brought up some things that might need consideration when you PLAN? (Hint: Yes.)

Here’s an additional mess-up, which, unlike the first, isn’t necessarily guaranteed if you carefully PLAN: Vague planning on a proposed vague solution to a vague problem usually results in vague data, on which vague analyses are performed, yielding vague results.

Many times, I see data collection addressed as an afterthought, usually ad hoc. Collecting poorly designed data (or not even collecting data at all) virtually guarantees non-trivial human variation seeping in, an open door for introducing the toxic and very human “Constant Repetition of Anecdotal Perceptions.” (CRA...).

More discussion of sample size next time.

Davis Balestracci is a past chair of ASQ’s statistics division.
He has synthesized W. Edwards Deming’s philosophy as Deming intended, as an approach to leadership, in the second edition of Data Sanity (Medical Group Management Association, 2015), with a foreword by Donald Berwick, M.D. Shipped free or as an ebook, Data Sanity offers a new way of thinking using a common organizational language based in process and understanding variation (data sanity), applied to everyday data and management. It also integrates Balestracci’s 20 years of studying organizational psychology into an “improvement as built in” approach as opposed to most current “quality as bolt-on” programs. Balestracci would love to wake up your conferences with his dynamic style and entertaining insights into the places where process, statistics, organizational culture, and quality meet.

Two More Lurking Mess-Ups for Any Experiment, Designed or Not
The ‘human variation’ factor is always present
Balestracci’s Mess-Up No. 1
• Keeping a food diary (yes or no)
• Increasing activity (yes or no)
• Home visit (yes or no)
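
The three yes/no components above define the eight treatment combinations of a 2 × 2 × 2 factorial. As a minimal sketch of what that design matrix could look like, the Python snippet below enumerates every combination. The column names and the 0 = no / 1 = yes coding are assumptions made for illustration; they are not taken from an actual study protocol.

```python
from itertools import product

# Hypothetical column names for the three yes/no components (assumed labels).
FACTORS = ("food_diary", "increased_activity", "home_visit")


def full_factorial(factors):
    """Return one dict per treatment combination, coded 0 = no, 1 = yes."""
    return [
        dict(zip(factors, levels))
        for levels in product((0, 1), repeat=len(factors))
    ]


design = full_factorial(FACTORS)

# Print the eight runs of the 2 x 2 x 2 design, numbered from 1.
for run, row in enumerate(design, start=1):
    print(run, row)
```

Generating the matrix really is the easy part. Each of the eight rows still needs the operational definitions discussed in Item 1 before anyone can execute it consistently.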
• Is a weight-loss goal of any kind going to be involved? For everyone?
• Imagine you have the data in hand. What are you going to do with it? Will it allow you to take the action you desire?
• Are counting tallies of any kind going to be recorded?
• Is the threshold between any “nonevent” (0) and “event” (1) clear?
• If two people were evaluating a participant, would they get the exact same number(s)?
• How will you decide who is admitted into your study?
• How will you sample to obtain these people?
• How will you know that they understand?
• Should assessing this understanding be an ongoing part of the home visit?
• How will you get this information from the people not having home visits? Would the knowledge of a scheduled check-in phone call bias their results?
• DO the brief study.
• STUDY the results with everyone’s input, both planners and participants. What mess-ups occurred during both the study’s execution and data collection? How could they be avoided? Should the data-collection sheet be redesigned or simplified for easier recording?
• ACT on these conclusions and begin your next PLAN—another process/data study or begin the experiment?
• Should you PLAN some type of ongoing data-collection assessment during the study to prevent data contamination from human variation? (Probably.)
• In wanting to detect an effect of 1, its resulting ratio relative to the tar-process standard deviation of 4 was (1/4) = 0.25, which required 500 to 680 experiments. If this was your desired ratio as well, then, like them, you would need 500 to 680 people.
• Similarly, when wanting to detect an effect of 2, (2/4) = 0.5. If this was your desired ratio, then you would need 130 to 170 people.
Experimental logistics aside, where do these sample size numbers come from?
1. What risk are you willing to take for declaring an effect significant when it isn’t? Usually, 5 percent (p < 0.05).
2. What size effect must I detect?
3. How badly do I want to detect it?
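
The three questions above are the usual inputs to a power calculation. As a rough sketch, and not necessarily the method behind the figures quoted in this column, a common normal approximation for a balanced two-level design puts the total number of runs at about 4(z_alpha + z_power)^2 / (effect/sd)^2. The function names and the 80 percent and 90 percent detection probabilities below are assumptions for illustration. With them, the approximation lands close to the 500 to 680 and 130 to 170 ranges from the tar scenario.

```python
from math import ceil, sqrt
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def total_runs(ratio, alpha=0.05, power=0.80):
    """Approximate total N needed to detect effect/sd = ratio (two-sided test)."""
    z_alpha = _Z.inv_cdf(1 - alpha / 2)   # question 1: false-alarm risk
    z_power = _Z.inv_cdf(power)           # question 3: how badly to detect it
    return ceil(4 * (z_alpha + z_power) ** 2 / ratio ** 2)


def detectable_ratio(n_total, alpha=0.05, power=0.80):
    """Work backward: smallest effect/sd detectable with n_total runs."""
    z_alpha = _Z.inv_cdf(1 - alpha / 2)
    z_power = _Z.inv_cdf(power)
    return 2 * (z_alpha + z_power) / sqrt(n_total)


def detection_probability(n_total, ratio, alpha=0.05):
    """Work backward: approximate chance of detecting effect/sd = ratio."""
    z_alpha = _Z.inv_cdf(1 - alpha / 2)
    return _Z.cdf(ratio * sqrt(n_total) / 2 - z_alpha)


# Ratios from the tar scenario: 1/4 = 0.25 and 2/4 = 0.5.
for ratio in (0.25, 0.5):
    print(ratio, total_runs(ratio, power=0.80), total_runs(ratio, power=0.90))
# prints:
# 0.25 503 673
# 0.5 126 169
```

The last two functions cover the "work backward" case: fix the sample size and the significance level, then solve for either the detectable effect or the probability of detecting it.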
• What effect you can reasonably detect (2), given your specific answer to (3), i.e., the desired probability of detecting it
• The probability of detecting a desired effect (3), given your specific answer to item (2), i.e., the desired effect to detect
Balestracci’s Mess-Up No. 2
© 2023 Quality Digest. Copyright on content held by Quality Digest or by individual authors. Contact Quality Digest for reprint information.
“Quality Digest” is a trademark owned by Quality Circle Institute, Inc.