It’s better to measure things when we can; that’s been well-established in the quality literature over the years. The use of go/no-go gauges will always provide much less information for improvement than measuring the pieces themselves. However, we don’t always have the luxury of using continuous or variables data. Sometimes, the only way to track the important events we want to track is to count them. Numbers of defectives, exceptions, reschedules, readmissions, rework rates, scrap rates… all these processes are vital to our operations, and all have to be counted. The performance of numerous transactional and other business processes can only be assessed using counts.
The counts generally fall into two types: counts of items or counts of events. In counts of items, you are counting objects, and some of them possess some attribute of interest. Late, defective, empty, full: all these might be attributes we care about. If we had 20 deliveries today and four of them were late, then we could say we had four late deliveries and 16 on-time deliveries. We can count the occurrence of the attribute, and also its non-occurrence. Because the measurement units are the same for the numerator and the denominator (area of opportunity), we can express the count as a proportion. It's important to note, though, that to be a true area of opportunity, the counts in the denominator must be related to the counts in the numerator. This requirement is sometimes violated with scrap numbers, when scrap from an earlier time period is included in the numerator and production from the current time period is included in the denominator.
Counts of events happen when the area of opportunity is some finite region of space or time, instead of another count. Accidents per day, scratches per square foot of windshield (or per windshield), IT system crashes per month: all these are examples of counts of events. You can count the occurrences, but you can’t count the non-occurrences. Counts of events can be expressed as ratios, too, but because the numerator and denominator are in different units, and the result can easily be greater than one, the ratio is a rate, not a proportion. Counts of events are usually characterized for analysis using the Poisson distribution; counts of items (those that can be expressed as a proportion) are usually assumed to be modeled by the binomial distribution.
One of the primary tools used to deal with binomial data in a process improvement scenario is the attribute chart, usually a p- or np-chart. It can be a very useful chart when the data meet the assumptions for the binomial. These assumptions were enumerated by Donald Wheeler [1].
While p- and np- charts can be very useful, and I highly recommend them when the conditions are correct, they aren't always the best charts to use, and should be used with some caution. There are a few inherent problems that seem to crop up a lot. This article will illustrate a couple of the foibles observed over many years of wrangling with these interesting charts.
Let’s open with a brief thought experiment: Consider the four data sets in figure 1.
The np-charts for these data are illustrated in figure 2 below. Data set A, tracked with an np chart, would look like many other control charts. The data from column B admittedly plot rather strangely. The data in column C sweep a narrow band around the centerline, and the column D data are out of control.
Note sets C and D. One contains only 11 and 9, the other contains 19 and 1. It's readily apparent that column C’s data vary significantly less from each other than do column D’s. If you were tracking these data, would you consider using a chart that found these sets’ standard deviations to be identical? In other words, how much value does a chart have if it cannot detect the dramatic difference in variation between these two situations?
Welcome to the wild, wacky, wonderful world of attributes charts!
In each of these situations, the np-chart (often used when the areas of opportunity or denominators for the proportions are equal) yields the same limits: 1.51 for the lower control limit (LCL) and 18.49 for the upper control limit (UCL).
This is due to the math for the dispersion statistic used with binomial data:

$$\hat{\sigma}_{np} = \sqrt{n\bar{p}(1-\bar{p})}, \qquad \text{limits} = n\bar{p} \pm 3\sqrt{n\bar{p}(1-\bar{p})}$$

where $\bar{p}$ is the average proportion for the set of data from which the limits are derived, and $n$ is the number of items in the denominator for each proportion. The only basis used to estimate expected dispersion is binomial theory and the average proportion; no actual measure of between-point variation is used.
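The limit calculation described here can be sketched in a few lines of Python. This is a minimal illustration, not the original analysis: the subgroup size n = 50 is not stated in the text and is back-calculated from the quoted limits of 1.51 and 18.49, and the alternating sequences standing in for sets C and D are hypothetical (figure 1 gives the actual values and ordering).

```python
import math
import statistics


def np_chart_limits(counts, n):
    """Return (LCL, center, UCL) for an np-chart with constant subgroup size n."""
    p_bar = sum(counts) / (n * len(counts))      # average proportion
    center = n * p_bar
    # Binomial sigma: depends only on n and p_bar, never on point-to-point spread.
    sigma = math.sqrt(n * p_bar * (1 - p_bar))
    return center - 3 * sigma, center, center + 3 * sigma


N = 50                  # assumption: inferred from the quoted limits
set_c = [11, 9] * 10    # hypothetical stand-in for data set C
set_d = [19, 1] * 10    # hypothetical stand-in for data set D

limits_c = np_chart_limits(set_c, N)
limits_d = np_chart_limits(set_d, N)

print("C limits:", limits_c, "observed std dev:", statistics.pstdev(set_c))
print("D limits:", limits_d, "observed std dev:", statistics.pstdev(set_d))
```

Both sets average 10, so both get the same binomial limits, even though the observed spread of the second set is nine times that of the first.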
In another example, a favorite data set for introducing the other common binomial data chart, the p-chart, comes from Making Sense of Data (SPC Press, 2003) by Don Wheeler. This set tracks the proportion of premium freight shipped on each of 24 days (Monday through Saturday for four weeks). Because the number of shipments (the denominator) changes each day, a new set of limits is calculated for each day [2]. Figure 3 is a p-chart of the premium freight data.
In this chart we see that the proportion of premium freight displays evidence of stability for this time period, averaging about 24.6 percent. There are no out-of-control points. The interesting pattern in the control limits themselves is instructive; the math for the limits of a p-chart is:

$$\bar{p} \pm 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n_i}}$$

where $\bar{p}$ is the average proportion and $n_i$ is the area of opportunity for the i-th proportion.

The denominator for the limits for each of the proportions is the denominator for that proportion. This means that, as the area of opportunity grows, the limits will shrink. So the chart also demonstrates that the largest number of shipments appears to be on Saturdays, and the smallest number on Mondays. The p-chart, then, offers some insight that you might not get from, say, an individuals chart for the proportions, and when the assumptions for the binomial distribution are met, the chart can be very useful.
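To make the shrinking-limits behavior concrete, here is a minimal sketch of the standard p-chart limit calculation. The average proportion of 0.246 comes from the "about 24.6 percent" quoted in the text; the daily shipment counts in the loop are illustrative values, not the actual premium freight data.

```python
import math


def p_chart_limits(p_bar, n):
    """Return (LCL, UCL) for one point on a p-chart with area of opportunity n."""
    half_width = 3 * math.sqrt(p_bar * (1 - p_bar) / n)
    # A proportion cannot go below 0 or above 1, so clamp the limits.
    return max(0.0, p_bar - half_width), min(1.0, p_bar + half_width)


P_BAR = 0.246  # "about 24.6 percent" from the text

# Illustrative daily shipment counts (assumed, not the published data).
for n in (100, 200, 300, 400, 500):
    lcl, ucl = p_chart_limits(P_BAR, n)
    print(f"n={n:3d}  LCL={lcl:.4f}  UCL={ucl:.4f}  width={ucl - lcl:.4f}")
```

The width falls with the square root of n, which is why busy days plot with tighter limits than quiet ones.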
When those binomial assumptions are not met, the theoretical distribution will not be in place, and the chart becomes much less useful; it may show nothing but false signals. Charts such as the one in figure 4 are not uncommon.
The chart in figure 4 was presented by a Black Belt candidate as the tentative baseline for the data measuring the process of interest in her project. It appears wildly out of control. The first question to be asked when presented with a chart like this one is “How large were the areas of opportunity?” In this case, the answer was “they were all greater than 100,000.” Remember that in the assumptions for the binomial distribution, each sample had to be independent and the probability had to be the same for each. When considering large numbers of items, the likelihood of over-aggregation is large, and the chance for within-sample homogeneity is small. Within-sample clustering is very likely.
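A small simulation can illustrate why very large areas of opportunity tend to produce charts like this. The sketch below is entirely hypothetical: the subgroup size, base proportion, and amount of day-to-day drift are assumed values, and the binomial count is approximated by a normal draw to stay within the standard library. It models a process whose underlying proportion wanders slightly from day to day, which violates the constant-probability assumption.

```python
import math
import random

random.seed(1)

N = 100_000      # hypothetical area of opportunity per day
DAYS = 24        # hypothetical number of days
BASE_P = 0.25    # hypothetical long-run proportion
DAY_SD = 0.02    # hypothetical day-to-day drift in the true proportion

proportions = []
for _ in range(DAYS):
    # The true proportion differs a little each day (non-constant probability).
    p_day = min(max(random.gauss(BASE_P, DAY_SD), 0.0), 1.0)
    # Normal approximation to a binomial count with n = N, p = p_day.
    count = random.gauss(N * p_day, math.sqrt(N * p_day * (1 - p_day)))
    proportions.append(count / N)

# With equal n every day, the mean of the proportions is the pooled proportion.
p_bar = sum(proportions) / DAYS
half_width = 3 * math.sqrt(p_bar * (1 - p_bar) / N)
outside = sum(abs(p - p_bar) > half_width for p in proportions)

print(f"p-chart half-width: {half_width:.4f}")
print(f"points outside the binomial limits: {outside} of {DAYS}")
```

Under these assumptions the binomial limits are only a few tenths of a percentage point wide, so ordinary day-to-day drift will typically push many of the points outside them.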
We can actually simulate that situation with the Premium Freight data.
Figure 5 contains the premium freight data. The original data are in the first two columns; you can see that the area of opportunity (total orders) ranges from about 100 to about 500 each day, and the number of items shipped premium freight varies from 20 or so to roughly 100. It’s not difficult to imagine that, with growth and all other conditions remaining the same, the same company, a couple of years later, might be shipping 10 times what they had been. It’s also not hard to think that the overall proportion of orders sent premium freight might be about the same. To simulate that, we multiply the original numbers by 10 (third and fourth columns). Whether you divide the numbers in the second column by those in the first, or the numbers in the fourth column by those in the third, the result is the same, and is recorded in the fifth column.
While the original data plotted into a nice, stable p-chart, the transformed data come out looking much less stable in figure 6.
Jacking the size of the areas of opportunity up deflates the limits: note the LCL and UCL for the last day (0.2256 to 0.2661 for this chart, versus 0.1818 to 0.3099 for the original data). The theoretical limits no longer contain the variability in the process. Unless the point-to-point variation remains very tight or the areas of opportunity remain small, the chart will not provide a reliable way to sort the signals from the routine variation.
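The last-day limits quoted here can be approximately reproduced with the standard p-chart formula. In the sketch below, the average proportion (0.2459) and the last-day order count (407) are assumptions back-calculated from the quoted limits; neither is read from the data table in figure 5.

```python
import math


def half_width(p_bar, n):
    """Three-sigma half-width of p-chart limits for area of opportunity n."""
    return 3 * math.sqrt(p_bar * (1 - p_bar) / n)


P_BAR = 0.2459   # assumption: inferred from the quoted limits
N_LAST = 407     # assumption: inferred last-day order count

original = half_width(P_BAR, N_LAST)
scaled = half_width(P_BAR, 10 * N_LAST)

print(f"original: {P_BAR - original:.4f} to {P_BAR + original:.4f}")
print(f"10x data: {P_BAR - scaled:.4f} to {P_BAR + scaled:.4f}")
print(f"shrink factor: {original / scaled:.4f}")
```

Multiplying every count by 10 leaves each proportion unchanged but narrows the limits by a factor of the square root of 10, a little more than 3.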
If you want to continue using the attribute chart, you could take a random sample of orders each day and plot the proportion premium freight on a p- or np-chart. That might keep the limits from being deflated. When using attribute data, however, your information is already sparse; cutting the amount of information per sample may not be desirable. In addition, there’s still the problem of the binomial assumptions. If your areas of opportunity are very large, the chances for clustering grow, which may render the chart useless as well.
There is another approach that works well, though: track the data on an Individual values and Moving Range (ImR or XmR) chart for the proportions. Remember, the p- and np-charts are also charts for individual values. The ImR chart uses the average empirical difference from point to point as its measure of dispersion. As long as the counts are high enough, the binomial distribution is reasonably unimodal and symmetrical, and bears enough resemblance to a normal distribution that the ImR chart works pretty well most of the time. You miss some of the insight that you might get from the varying control limits, but you gain insight into the consistency of the point-to-point variation. See figure 7 for an individuals chart of the proportion premium freight.
If your initial p-chart had looked like figure 6, you might have concluded that your process was hopelessly out of control. In figure 7, however, we see that the average proportion premium freight is 0.2448, and that the day-to-day variation (the average moving range) is about 3.27 percentage points. This results in natural process limits for the day-to-day operation of roughly 15.8 percent and 33.2 percent (0.2448 ± 2.66 × 0.0327). This would provide a reasonable baseline for tracking.
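For readers who want to compute these limits themselves, the following is a minimal sketch of the usual XmR calculation. The scaling factors 2.66 and 3.268 are the standard constants for moving ranges of two consecutive points. The daily proportions are hypothetical, not the premium freight data, and the alternating sequences standing in for data sets C and D are likewise assumed.

```python
def xmr_limits(values):
    """Return the center line, average moving range, and XmR chart limits."""
    mean = sum(values) / len(values)
    # Moving range: absolute difference between consecutive points.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    return {
        "mean": mean,
        "mr_bar": mr_bar,
        "lnpl": mean - 2.66 * mr_bar,   # lower natural process limit
        "unpl": mean + 2.66 * mr_bar,   # upper natural process limit
        "url": 3.268 * mr_bar,          # upper limit for the moving-range chart
    }


# Hypothetical daily proportions of premium freight.
daily_props = [0.24, 0.27, 0.22, 0.25, 0.21, 0.26, 0.23, 0.28]
print(xmr_limits(daily_props))

# Hypothetical stand-ins for data sets C and D from the thought experiment.
print(xmr_limits([11, 9] * 10))
print(xmr_limits([19, 1] * 10))
```

Because the limits are built from the observed point-to-point differences, the two stand-in sets get very different limits here, unlike on the np-chart.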
This technique is not without its detractors, most of whom state that the XmR (or ImR) chart isn't as robust to departures from normality as, for instance, XbarR charts. Walter Shewhart pointed out that normality was neither a necessary nor sufficient condition for the existence of a state of statistical control [3], and Davis Balestracci demonstrated pretty well the lack of a solid connection between normality and control [4]. This idea will no doubt continue to be debated, sometimes causing the inevitable fistfight between the theoretician and the practitioner at a particularly wild statistician party.
If you insist on normality for your ImR charts, consider this: The proportions of premium freight, when checked for normality using the Anderson-Darling test, the Ryan-Joiner test, and the Kolmogorov-Smirnov test, yield p-values of 0.829, >0.10, and >0.15, respectively. So even though we know the data aren't continuous, they don’t differ significantly in functional form from a normal distribution. It would certainly make sense in this case to characterize behavior with the XmR chart. More generally, in practice, the XmR chart has proven to be very useful, and more robust to (at least theoretical) departures from normality than the p- and np-charts are to departures from binomial assumptions.
For attribute data, the p- and np-charts are good charts to use when appropriate. Often, though, you find that the data don’t fit the binomial assumptions well enough to make the charts useful. When confronted with that case, consider using what Ed Halteman and Dan Greer called “the Swiss Army Knife of control charts,” the XmR chart. It may well be a better chart for your data.
1. Wheeler, D.J. (1995). Advanced Topics in Statistical Process Control. Knoxville, TN: SPC Press, Inc.
2. Wheeler, D.J. (2003). Making Sense of Data. Knoxville, TN: SPC Press, Inc.
3. Shewhart, W.A. (1980). Economic Control of Quality of Manufactured Product. Milwaukee, WI: ASQ Quality Press (Original work published in 1931).
4. Balestracci, D. (1998). Data “Sanity”: Statistical Thinking Applied to Everyday Data. Special Publication, ASQ Statistics Division, Summer 1998. Cedarburg, WI: Quality Information Center.