The Importance of Understanding Conditional Probability
It helps to build a table
Published: Wednesday, June 6, 2018  12:03
A lot of people in my classes struggle with conditional probability. Don’t feel alone, though. A lot of people get this (and simple probability, for that matter) wrong. If you read Innumeracy by John Allen Paulos (Hill and Wang, 1989), or The Power of Logical Thinking by Marilyn vos Savant (St. Martin’s Griffin, 1997), you’ll see examples of how a misunderstanding or misuse of this has put innocent people in prison and ruined many careers. It’s one of the reasons I’m passionate about statistics, but it’s hard for me, too, because it’s not easy to work out in your head. I always have to build a table.
The best thing to do is to be completely process-driven; identify what’s given, then follow the process and the formulas religiously. After a while, you can start to see it intuitively, but it does take a while.
In my MBA stats class, one problem that always stumped the students was a conditional probability question:
“Pregnancy tests, like almost all health tests, do not yield results that are 100 percent accurate. In clinical trials of a blood test for pregnancy, the results shown in the accompanying table were obtained for the Abbot blood test (based on data from ‘Specificity and Detection Limit of Ten Pregnancy Tests’ by Tiitinen and Stenman, Scandinavian Journal of Clinical Laboratory Investigation, 53, Supplement 216). Other tests are more reliable than the test with results given [in figure 1].”


Figure 1 
“1. Based on the results in the table, what is the probability of a woman being pregnant if the test indicates a negative result?”
“2. Based on the results in the table, what is the probability of a false positive; that is, what is the probability of getting a positive result if the woman is not actually pregnant?”
Everyone would just try to look at it as though there were no conditions... they would say, 5/80 for question 1, and 3/80 for question 2.
The first question, though, is asking, “What is the chance of being pregnant, given a negative result?” There were 16 negative results, and of those, five were pregnant. So the answer is 5/16, or 31.25 percent. For the second question, it’s, “What is the probability of a positive, given that the woman is not pregnant?” In this case, there are 14 nonpregnant women, and three of those got a positive result. So that’s 3/14, or about 21.43 percent.
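If it helps to see the table logic spelled out, here is a minimal Python sketch. The counts of 5, 11, and 3 are the ones quoted in the text; the 80 pregnant women with positive results is an assumption inferred from the 5/80 guess above, and neither answer depends on it. The helper name `total` is illustrative only.

```python
from fractions import Fraction

# Counts keyed by (actual status, test result).
# ASSUMPTION: the 80 is inferred from the "5/80" guess; the other three
# counts follow from the 16 negatives and 14 nonpregnant women in the text.
counts = {
    ("pregnant", "positive"): 80,
    ("pregnant", "negative"): 5,
    ("not pregnant", "positive"): 3,
    ("not pregnant", "negative"): 11,
}

def total(status=None, result=None):
    """Sum the cells that match the given status and/or result."""
    return sum(
        n for (s, r), n in counts.items()
        if status in (None, s) and result in (None, r)
    )

# Question 1: the given is a negative result, so divide by all negatives.
p_pregnant_given_negative = Fraction(
    total("pregnant", "negative"), total(result="negative")
)

# Question 2: the given is "not pregnant", so divide by all nonpregnant women.
p_positive_given_not_pregnant = Fraction(
    total("not pregnant", "positive"), total(status="not pregnant")
)

print(p_pregnant_given_negative, float(p_pregnant_given_negative))
print(p_positive_given_not_pregnant, float(p_positive_given_not_pregnant))
```

The point of the sketch is the denominator: in both questions it is the total for the given condition, not the largest number in the table.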
These numbers, and this idea, are really important. Some statisticians make their living explaining these concepts to juries. People get fired or arrested because of false positives on urinalysis and other tests, because there is a general impression that they are far more reliable than they actually are.
It’s all about what you are given, and how you define things. Let’s look at a different example. In the military, people are given random drug screenings. The test is “certified 99 percent accurate.” I was always told that this means that if you do drugs, and you’re tested, it will catch you 99 percent of the time.
We think, “logically,” that this means there is only a 1 percent false negative rate... that the fact that someone who does drugs doesn’t get caught 1 percent of the time indicates a 1 percent false negative rate. Worse, we assume that if the “false negative rate” is only 1 percent, the false positive rate must also be 1 percent... it’s just common sense, right?
But “common sense” isn’t... it’s neither common nor truly sensical. Look at it this way... suppose we test 100,000 service members. Suppose further that 0.1 percent, or one in a thousand, of those service members actually do drugs. We might get the table shown in figure 2.


Figure 2 
Tables like this are informative, but they don’t tell the whole story. You can see from this that the company is technically correct... at least in this case, of 100 people who did drugs, 99 were caught and one was not. But a false positive rate and a false negative rate depend on more than those 100 people. To get the whole story, it’s also good to compute the marginals, or row and column totals, as shown in figure 3.


Figure 3 
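The table and its marginals can be rebuilt from the stated assumptions. This is a minimal Python sketch, not the original figure: the 100,000 tested, the one-in-a-thousand usage rate, and the 99 percent catch rate come from the text, while the 1 percent positive rate among non-users is an assumption inferred from the 999 false positives quoted later. Variable names are illustrative.

```python
# Integer arithmetic keeps every cell an exact head count.
tested = 100_000
users = tested // 1000             # one in a thousand do drugs -> 100
non_users = tested - users         # 99,900

user_pos = users * 99 // 100       # users the test catches -> 99
user_neg = users - user_pos        # users the test misses -> 1

# ASSUMPTION: 1 percent of non-users test positive (inferred, not stated).
clean_pos = non_users // 100       # non-users flagged in error -> 999
clean_neg = non_users - clean_pos  # non-users correctly cleared -> 98,901

rows = [
    ("Uses drugs", user_pos, user_neg),
    ("Does not", clean_pos, clean_neg),
]
# Column totals: everyone who tested positive, everyone who tested negative.
col_totals = ("Total", user_pos + clean_pos, user_neg + clean_neg)

print(f"{'':<12}{'Positive':>10}{'Negative':>10}{'Total':>10}")
for label, pos, neg in rows + [col_totals]:
    # The last column is the row total, i.e. the row marginal.
    print(f"{label:<12}{pos:>10,}{neg:>10,}{pos + neg:>10,}")
```

The column totals (all positives, all negatives) are the denominators that matter once the given is a test result rather than a person’s actual drug use.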
Numbers like these, the numbers of people tested, are very important. They help us figure out our givens. The false negative rate is not the number of people who did drugs and tested negative. It’s the number out of all the people who tested negative who actually did drugs. In this case, the false negative rate is much better than advertised... it’s 1/98,902, or about 0.00001; only about one in 100,000 of the people who test negative actually does drugs.
The consequences, though, are on the false positive side... this is where people get turned away for employment or get fired. In the case of the military, a lot of people end up in a lot of trouble with the random urinalysis program. While we want to be cautious, and we don’t want a lot of druggies flying or controlling aircraft or tanks or other deadly weapons, we should also be concerned that we might be ruining careers unnecessarily. If we look at the table, the “common sense” interpretation of the false positive rate would be 999/100,000, or 0.999 percent, very close to the 1 percent that we assumed initially. But, as astounding as it may seem, considering the number of people who are convicted each year because of this assumption, this is entirely incorrect!
The actual false positive rate is the number of people incorrectly identified as drug users, that is, the number of non-drug users out of the total number of positives. In this case, that’s 999 out of 1,098, or 90.98 percent! In other words, your chance of actually being a drug user, given a positive result on this “99 percent accurate” test, is only 9.02 percent!
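The same percentages fall out of Bayes’ theorem without counting heads. The sketch below is one way to check the arithmetic, assuming one user per thousand, a 99 percent catch rate, and a 1 percent positive rate among non-users; that last figure is inferred from the 999 false positives and is not stated outright. Variable names are illustrative.

```python
from fractions import Fraction

prevalence = Fraction(1, 1000)        # P(user), from the text
p_pos_given_user = Fraction(99, 100)  # P(positive | user), the advertised 99 percent
p_pos_given_clean = Fraction(1, 100)  # ASSUMPTION: P(positive | non-user), inferred

# Total probability of a positive result, over users and non-users.
p_pos = p_pos_given_user * prevalence + p_pos_given_clean * (1 - prevalence)

# Bayes' theorem: P(user | positive) = P(positive | user) * P(user) / P(positive).
p_user_given_pos = p_pos_given_user * prevalence / p_pos
p_clean_given_pos = 1 - p_user_given_pos

# The same reasoning for a negative result.
p_neg = 1 - p_pos
p_user_given_neg = (1 - p_pos_given_user) * prevalence / p_neg

print(f"P(user | positive)     = {float(p_user_given_pos):.2%}")
print(f"P(non-user | positive) = {float(p_clean_given_pos):.2%}")
print(f"P(user | negative)     = {p_user_given_neg}")
```

Under these assumptions the fractions reduce to 99/1,098, 999/1,098, and 1/98,902, matching the table. Note that many textbooks call the last two quantities the false discovery rate and the false omission rate, and reserve “false positive rate” for the share of non-users who test positive.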
Yes, it’s tricky. No, it’s not intuitive. But it’s important. It touches lives. Juries, lab technicians, doctors and nurses, lawyers, employers, employees, and patients who don’t understand this put either themselves or others in peril every day.
Comments
This is one of the best short articles I have ever read on this topic. Good examples and clear writing. I will use it in my classes. Thanks. RCL
PS Note that probability is misspelled in the title. It is printed as 'probablity.'