Rip Stauffer

Six Sigma

The Importance of Understanding Conditional Probability

It helps to build a table

Published: Wednesday, June 6, 2018 - 12:03

A lot of people in my classes struggle with conditional probability. If you do too, don’t feel alone. A lot of people get this (and simple probability, for that matter) wrong. If you read Innumeracy by John Allen Paulos (Hill and Wang, 1989), or The Power of Logical Thinking by Marilyn vos Savant (St. Martin’s Griffin, 1997), you’ll see examples of how misunderstanding or misusing conditional probability has put innocent people in prison and ruined many careers. It’s one of the reasons I’m passionate about statistics, but it’s hard for me, too, because it’s not easy to work out in your head. I always have to build a table.

The best thing to do is to be completely process-driven; identify what’s given, then follow the process and the formulas religiously. After a while, you can start to see it intuitively, but it does take a while.

In my MBA stats class, one of the problems that always stumped the students was a conditional probability problem:

“Pregnancy tests, like almost all health tests, do not yield results that are 100-percent accurate. In clinical trials of a blood test for pregnancy, the results shown in the accompanying table were obtained for the Abbot blood test (based on data from ‘Specificity and Detection Limit of Ten Pregnancy Tests’ by Tiitinen and Stenman, Scandinavian Journal of Clinical Laboratory Investigation, 53, Supplement 216). Other tests are more reliable than the test with results given [in figure 1].

| | Positive result | Negative result |
|---|---|---|
| Subject is pregnant | 80 | 5 |
| Subject is not pregnant | 3 | 11 |

Figure 1

“1. Based on the results in the table, what is the probability of a woman being pregnant if the test indicates a negative result?”

“2. Based on the results in the table, what is the probability of a false positive; that is, what is the probability of getting a positive result if the woman is not actually pregnant?”

Everyone would just try to look at it as though there were no conditions... they would say, 5/80 for question 1, and 3/80 for question 2.

The first question, though, is asking, “What is the chance of being pregnant, given a negative result?” There were 16 negative results, and of those, five were pregnant. So the answer is 5/16, or 31.25 percent. For the second question, it’s, “What is the probability of a positive, given that the woman is not pregnant?” In this case, there are 14 nonpregnant women, and three of those got a positive result. So that’s 3/14, or about 21.43 percent.
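As a minimal sketch of the "identify what's given, then follow the process" approach, the two answers can be reproduced by filtering the figure 1 counts on the given condition first. The dictionary layout and the `conditional` helper are illustrative choices, not part of the original problem.

```python
from fractions import Fraction

# Counts from figure 1, keyed by (actual condition, test result).
counts = {
    ("pregnant", "positive"): 80,
    ("pregnant", "negative"): 5,
    ("not pregnant", "positive"): 3,
    ("not pregnant", "negative"): 11,
}

def conditional(counts, event, given):
    """Return P(event | given) from a table of counts.

    event and given are (axis, label) pairs: axis 0 is the actual
    condition, axis 1 is the test result. (Hypothetical helper.)
    """
    e_axis, e_label = event
    g_axis, g_label = given
    # Denominator: everyone who satisfies the given condition.
    given_total = sum(n for key, n in counts.items() if key[g_axis] == g_label)
    # Numerator: those who satisfy the given condition AND the event.
    both = sum(
        n for key, n in counts.items()
        if key[g_axis] == g_label and key[e_axis] == e_label
    )
    return Fraction(both, given_total)

# Question 1: pregnant, given a negative result.
p_pregnant_given_negative = conditional(counts, (0, "pregnant"), (1, "negative"))
# Question 2: positive result, given not pregnant.
p_positive_given_not_pregnant = conditional(counts, (1, "positive"), (0, "not pregnant"))

print(p_pregnant_given_negative, float(p_pregnant_given_negative))
print(p_positive_given_not_pregnant, float(p_positive_given_not_pregnant))
```

The denominator is always the total for the given condition (16 negatives, 14 nonpregnant women), which is exactly the step that answers like 5/80 and 3/80 skip.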

These numbers, and this idea, are really important. Some statisticians make their living explaining these concepts to juries. People get fired or arrested because of false positives on urinalysis and other tests, because there is a general impression that they are far more reliable than they actually are.

It’s all about what you are given, and how you define things. Let’s look at a different example. In the military, people are given random drug screenings. The test is “certified 99-percent accurate.” I was always told that this means that if you do drugs, and you’re tested, it will catch you 99 percent of the time.

We think, “logically,” that this means there is only a 1-percent false negative rate... that someone who does drugs doesn’t get caught 1 percent of the time. Worse, we assume that if the “false negative rate” is only 1 percent, the false positive rate must also be 1 percent... it’s just common sense, right?

But “common sense” isn’t... it’s neither common nor truly sensical. Look at it this way... suppose we test 100,000 service members. Suppose further that 0.1 percent, or one in a thousand, service members actually do drugs, and that the test really does flag 1 percent of those who don’t. We might get the table shown in figure 2.

 

| | Do Drugs | Don’t Do Drugs |
|---|---|---|
| Test Positive | 99 | 999 |
| Test Negative | 1 | 98,901 |

Figure 2
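The counts in figure 2 follow directly from the stated assumptions. Here is a rough sketch of that arithmetic. The variable names are mine, and the 1-percent rate of positives among nonusers is the assumption carried over from the "common sense" reasoning above, not a measured property of any real test.

```python
from fractions import Fraction

tested = 100_000
prevalence = Fraction(1, 1000)           # assumed: 0.1 percent actually do drugs
catch_rate = Fraction(99, 100)           # users who test positive ("99-percent accurate")
nonuser_positive_rate = Fraction(1, 100) # assumed: nonusers who test positive anyway

users = tested * prevalence              # 100 people
nonusers = tested - users                # 99,900 people

# Cells keyed by (test result, actual behavior), as laid out in figure 2.
table = {
    ("positive", "user"): users * catch_rate,
    ("positive", "nonuser"): nonusers * nonuser_positive_rate,
    ("negative", "user"): users * (1 - catch_rate),
    ("negative", "nonuser"): nonusers * (1 - nonuser_positive_rate),
}

for cell, n in table.items():
    print(cell, int(n))
```

Exact fractions are used instead of floats so that the cell counts come out as whole people rather than values like 98900.99999.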

Tables like this are informative, but they don’t tell the whole story. You can see from this that the “99-percent accurate” claim is technically correct... at least in this case, of 100 people who did drugs, 99 were caught and one was not. But what a positive or a negative result actually tells you depends on more than that. To get to the whole story, it’s also good to do the marginals, or row and column totals, as shown in figure 3.

 

| | Do Drugs | Don’t Do Drugs | Totals |
|---|---|---|---|
| Test Positive | 99 | 999 | 1,098 |
| Test Negative | 1 | 98,901 | 98,902 |
| Totals | 100 | 99,900 | 100,000 |

Figure 3
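For anyone who would rather not add the marginals by hand, a short sketch along these lines produces the same row and column totals from the four cell counts. The dictionary keys are illustrative labels, not anything from the testing program.

```python
# Cell counts from figure 2, keyed by (test result, actual behavior).
cells = {
    ("positive", "user"): 99,
    ("positive", "nonuser"): 999,
    ("negative", "user"): 1,
    ("negative", "nonuser"): 98_901,
}

row_totals = {}  # totals by test result
col_totals = {}  # totals by actual behavior
for (result, behavior), n in cells.items():
    row_totals[result] = row_totals.get(result, 0) + n
    col_totals[behavior] = col_totals.get(behavior, 0) + n

grand_total = sum(cells.values())

# Sanity check: rows and columns must both add up to everyone tested.
assert sum(row_totals.values()) == sum(col_totals.values()) == grand_total

print(row_totals)
print(col_totals)
print(grand_total)
```

The row totals are the denominators when the given is a test result. The column totals are the denominators when the given is what the person actually did.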

Numbers like these, the numbers of people tested, are very important. They help us figure out our givens. The advertised 1 percent is the share of drug users who test negative: one out of 100. But if what you’re given is a negative result, the relevant number is how many of all the people who tested negative actually did drugs. In this case, that is much better than advertised... it’s 1/98,902, or about 0.00001. Only about one in 100,000 people who test negative is a drug user who got away with it.

The consequences, though, are on the false positive side... this is where people get turned away for employment or get fired. In the case of the military, a lot of people end up in a lot of trouble with the random urinalysis program. While we want to be cautious, and we don’t want a lot of druggies flying or controlling aircraft or tanks or other deadly weapons, we should also be concerned that we might be ruining careers unnecessarily. If we look at the table, the “common sense” reading is that false positives make up 999/100,000, or 0.999 percent, of everyone tested, very close to the 1 percent that we assumed initially, so a positive result must be right about 99 percent of the time. But, as astounding as it may seem, considering the number of people who are convicted each year because of this assumption, this is entirely incorrect!

The number that matters to someone who tests positive is conditional on the positive result: the number of people incorrectly identified as drug users, that is, the number of nonusers out of the total number of positives. (Statisticians call this the false discovery rate; the false positive rate proper, 999 of the 99,900 nonusers, really is 1 percent.) In this case, that’s 999 out of 1,098, or 90.98 percent! In other words, your chance of actually being a drug user, given a positive result on this “99-percent accurate” test, is only 9.02 percent!
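The 9.02 percent figure can also be reached without counting heads, by applying Bayes’ theorem to the same assumptions. In the worked equation below, the notation is mine: D stands for “does drugs,” a bar over D for “doesn’t do drugs,” and + for “tests positive.” The inputs are the 0.1-percent usage rate, the 99-percent catch rate, and the assumed 1-percent rate of positives among nonusers.

```latex
\begin{aligned}
P(D \mid +) &= \frac{P(+ \mid D)\,P(D)}{P(+ \mid D)\,P(D) + P(+ \mid \bar{D})\,P(\bar{D})} \\
            &= \frac{0.99 \times 0.001}{0.99 \times 0.001 + 0.01 \times 0.999} \\
            &= \frac{0.00099}{0.01098} \approx 0.0902
\end{aligned}
```

The denominator is just the overall share of positives, 1,098 out of 100,000, so the formula and the table are the same calculation written two ways.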

Yes, it’s tricky. No, it’s not intuitive. But it’s important. It touches lives. Juries, lab technicians, doctors and nurses, lawyers, employers, employees, and patients who don’t understand this put either themselves or others in peril every day.

About The Author

Rip Stauffer

Rip Stauffer uses his extensive experience in total quality and Six Sigma to educate and counsel at all career levels with specific experience in government, manufacturing, medical devices, financial services, and healthcare organizations. Stauffer is Senior Consultant at MSI and CEO of Woodside Quality LLC. Stauffer is an ASQ senior member, an ASQ Statistics Division member, a certified quality engineer, a manager of quality and organizational excellence, and a Six Sigma Black Belt and Master Black Belt. He is an adjunct faculty member at Walden University, teaching graduate and undergraduate business statistics courses and international business courses. 

Comments

This is one of the best short articles I have ever read on this topic.  Good examples and clear writing.  I will use it in my classes.  Thanks.  RCL

PS  Note that probability is misspelled in the title.  It is printed as 'probablity.'

Thanks

Welll... we did say that probablity... probabababilty... probability was a problem