© 2021 Quality Digest. Copyright on content held by Quality Digest or by individual authors. Contact Quality Digest for reprint information.

“Quality Digest” is a trademark owned by Quality Circle Institute, Inc.

Published on *Quality Digest* (https://www.qualitydigest.com)

**Published:** 09/30/2011

All charts for count-based data are charts for individual values. Regardless of whether we are working with a count or a rate, we obtain one value per time period and want to plot a point every time we get a value. This is why four specialty charts for count-based data had been developed before a general approach for charting individual values was discovered. These four charts are the *p*-chart, the *np*-chart, the *c*-chart, and the *u*-chart. The question addressed in this column is when to use these and other specialty charts with your count-based data.

The first of these specialty charts, the *p*-chart, was created by Walter Shewhart in 1924. At that time the idea of using the two-point moving range to measure the dispersion of a set of individual values had not yet occurred to anyone. (W. J. Jennett would have this idea in 1942.) So the problem Shewhart faced was how to create a process behavior chart for individual values based on counts. While he could plot the data in a running record, and while he could use an average value as the central line for this running record, the obstacle was how to measure the dispersion so as to filter out the routine variation. With individual values he did not see how to use the within-subgroup variation, and he knew better than to try to use the global standard deviation statistic, which would be inflated by any exceptional variation present. So he decided to use theoretical limits based on a probability model.

The classic probability models for simple count data are the Binomial and the Poisson, and Shewhart knew that both of these models have a dispersion parameter that is a function of their location parameter. This meant that the estimate of location obtained from the data could also be used to estimate the dispersion. Thus, with one location statistic he could estimate both the central line and the three-sigma distance.
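As a rough illustration of how one location statistic can yield both the central line and the three-sigma distance, the following Python sketch computes theoretical values for an *np*-chart and a *c*-chart. The function names and the example counts are hypothetical and do not come from this column; the formulas are the standard Binomial and Poisson relationships, and they are only appropriate when those models actually fit the data.

```python
import math


def np_chart_center_and_three_sigma(counts, n):
    """Theoretical np-chart values, assuming a Binomial model.

    counts: number of items with the attribute in each period
    n: fixed area of opportunity (items examined per period)
    """
    center = sum(counts) / len(counts)      # average count = central line
    p_bar = center / n                      # average proportion
    # Binomial: the dispersion is a function of the location alone
    three_sigma = 3 * math.sqrt(n * p_bar * (1 - p_bar))
    return center, three_sigma


def c_chart_center_and_three_sigma(counts):
    """Theoretical c-chart values, assuming a Poisson model."""
    center = sum(counts) / len(counts)      # average count = central line
    # Poisson: the variance equals the mean
    three_sigma = 3 * math.sqrt(center)
    return center, three_sigma


# Hypothetical counts, made up for illustration only
example_counts = [30, 32, 31, 33, 29]
center, three_sigma = np_chart_center_and_three_sigma(example_counts, n=35)
# Computed limits are returned unclipped; values below 0 or above n
# are simply not attainable and are usually not drawn on the chart.
print(center - three_sigma, center, center + three_sigma)
```

Note that neither function looks at how the counts vary from period to period. The width of the limits comes entirely from the average and the assumed model.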

**Figure 1:** Specialty charts for count-based data

This dual use of an average to characterize both location and dispersion means that *p*-charts, *np*-charts, *c*-charts, and *u*-charts all have limits that are based upon a theoretical relationship between the mean and the dispersion. Hence these specialty charts can all be said to use theoretical limits. If the counts can be reasonably modeled by either a Binomial distribution or a Poisson distribution, then one of these specialty charts will provide appropriate limits for the data. Over the years many textbooks and standards have forgotten that the assumption of a Binomial model or a Poisson model is a prerequisite for the use of these specialty charts. This is a problem because there are many types of count-based data that cannot be characterized by either a Binomial or a Poisson distribution. When such data are placed on a *p*-chart, *np*-chart, *c*-chart or *u*-chart the theoretical limits obtained will be wrong.

So what are we to do? The problem with the theoretical limits lies in the assumption that we know the exact relationship between the central line and the three-sigma distance. The solution is to obtain a separate estimate of dispersion, which is what the *XmR* chart does: While the average will characterize the location and serve as the central line for the *X* chart, the average moving range will characterize dispersion and serve as the basis for computing the three-sigma distance for the *X* chart.
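A minimal Python sketch of the *XmR* computation described above might look like the following. The function name is hypothetical. The scaling factors 2.66 and 3.268 are not stated in this column; they are the conventional constants for two-point moving ranges (2.66 is 3 divided by the bias correction factor 1.128).

```python
def xmr_limits(values):
    """Empirical limits for an XmR chart from a list of individual values."""
    if len(values) < 2:
        raise ValueError("need at least two values to form a moving range")

    # Two-point moving ranges: absolute differences of successive values
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]

    average = sum(values) / len(values)                      # location
    average_mr = sum(moving_ranges) / len(moving_ranges)     # dispersion

    three_sigma = 2.66 * average_mr      # conventional scaling factor
    return {
        "center": average,
        "lower": average - three_sigma,
        "upper": average + three_sigma,
        "mr_center": average_mr,
        "mr_upper": 3.268 * average_mr,  # conventional upper range limit
    }


# Hypothetical values, made up for illustration only
limits = xmr_limits([10, 12, 11, 13])
print(limits)
```

Here the three-sigma distance depends on the point-to-point variation in the data rather than on an assumed relationship between mean and dispersion.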

Thus, the major difference between the specialty charts and the *XmR* chart is the way in which the three-sigma distance is computed. The *p*-chart, *np*-chart, *c*-chart, and *u*-chart will have the same running record, and essentially the same central lines, as the *X* chart. But when it comes to computing the three-sigma limits the specialty charts use an assumed theoretical relationship to compute theoretical values while the *XmR* chart actually measures the variation present in the data and constructs empirical limits.

To compare the specialty charts with the *XmR* chart we shall use three examples. The first of these will use the data of figure 2. These values come from an accounting department which keeps track of how many of their monthly closings of departmental accounts are finished “on time.” The counts shown are the monthly numbers of closings, out of 35 closings, that are completed on time.

**Figure 2:** The *X* chart and *np*-chart for the on-time closing data

Here both the *np*-chart and the *X* chart computations give essentially the same limits. (The upper limit value of 36.8 is not shown since it exceeds the maximum value of 35 on-time closings.) Here the two approaches are essentially identical because these counts seem to be appropriately modeled by a Binomial distribution. If you are sophisticated enough to determine when this happens, then you will know when the *np*-chart will work and can use it successfully. On the other hand, if you are not sophisticated enough to know when a Binomial model is appropriate, then you can still use an *XmR* chart. As may be seen here, when the *np*-chart would have worked, the empirical limits of the *X* chart will mimic the theoretical limits of the *np*-chart, and you will not have lost anything by using the *XmR* chart instead of the *np*-chart.

Our next example will use the on-time shipments for a plant. The data are shown in figure 3 along with both the *X* chart and the *p*-chart for these data.

**Figure 3:** The *X* chart and *p*-chart for the on-time shipments

The *X* chart shows a process with three points at or below the lower limit. The variable width *p*-chart limits are five times wider than the limits found using the moving ranges. No points fall outside these limits. This discrepancy between the two sets of limits is an indication that the data of figure 3 do not satisfy the Binomial conditions. Specifically, the probability of a shipment being on time is not the same for all of the shipments in any given month. Because the Binomial model is inappropriate the theoretical *p*-chart limits are incorrect. However, the empirical limits of the *XmR* chart, which do not depend upon the appropriateness of a particular probability model, are correct.
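For readers who want to see where variable-width limits come from, here is a hedged Python sketch of the standard *p*-chart calculation. The function name and the example numbers are hypothetical and are not the figure 3 data. The formula assumes a Binomial model, which is exactly the assumption that fails in this example.

```python
import math


def p_chart_limits(counts, sizes):
    """Theoretical variable-width p-chart limits, assuming a Binomial model.

    counts: items with the attribute in each period
    sizes: area of opportunity (total items) in each period
    Returns the central line and a list of (lower, upper) pairs.
    """
    p_bar = sum(counts) / sum(sizes)        # overall proportion = central line
    limits = []
    for n in sizes:
        # The width depends only on p_bar and n, not on the observed variation
        three_sigma = 3 * math.sqrt(p_bar * (1 - p_bar) / n)
        limits.append((p_bar - three_sigma, p_bar + three_sigma))
    return p_bar, limits


# Hypothetical monthly data, made up for illustration only
p_bar, limits = p_chart_limits(counts=[80, 320], sizes=[100, 400])
print(p_bar, limits)
```

Because the width is driven by the area of opportunity, a period with four times as many items gets limits half as wide, whether or not the data actually behave that way.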

Our final comparison will use the data of figure 4. There we have the percentage of incoming shipments for one electronics assembly plant that were shipped using air freight. Two points fall outside the variable-width *p*-chart limits, while no points fall outside the *X* chart limits.

**Figure 4:** The *X* chart and *p*-chart for the premium freight data

Figure 4 is typical of what happens when the area of opportunity for a count of items gets excessively large. The Binomial model *requires* that all of the items in any given time period will have the same chance of possessing the attribute being counted. Here this requirement is not satisfied. With thousands of shipments each month, the probability of a shipment being shipped by air is not the same for all of the shipments. Thus, the Binomial model is inappropriate, and the theoretical *p*-chart limits which depend upon the Binomial model are incorrect. The *X* chart limits, which here are twice as wide as the *p*-chart limits, properly characterize both the location and dispersion of these data and are the correct limits to use.

Thus, the difficulty with using a *p*-chart, *np*-chart, *c*-chart, or *u*-chart is the difficulty of determining whether the Binomial or Poisson models are appropriate for the data. As seen in figures 3 and 4, if you overlook the prerequisites for a specialty chart you will risk making a serious mistake in practice. This is why you should avoid using the specialty charts if you do not know how to evaluate the appropriateness of these probability models.

In contrast to this use of theoretical models, which may or may not be correct, the *XmR* chart provides us with empirical limits that are actually based upon the variation present in the data. This means that you can use an *XmR* chart with count-based data any time you wish. Since the *p*-chart, the *np*-chart, the *c*-chart, and the *u*-chart are all special cases of the chart for individual values, the *XmR* chart will mimic these specialty charts when they are appropriate and will differ from them when they are wrong. (In the case of specialty charts that have variable-width limits, the *XmR* chart will mimic limits based on the average-sized area of opportunity. Also, in making these comparisons I prefer to have at least 24 counts in the baseline period.)

**Figure 5:** An assumption-free approach for count-based data

Thus, if you do not have advanced degrees in statistics, or if you simply have a hard time determining if your counts can be characterized by a Binomial or a Poisson distribution, you can still verify your choice of specialty chart for your count-based data by comparing the theoretical limits with the empirical limits of an *XmR* chart. If the empirical limits are approximately the same as the theoretical limits, then the probability model works. If the empirical limits do not approximate the theoretical limits, then the probability model is wrong.
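The comparison described above can be sketched in Python as follows, here for counts with a fixed area of opportunity (the *np*-chart case). The function name and the data are hypothetical, and 2.66 is the conventional *XmR* scaling factor rather than a value given in this column. The sketch only reports the two three-sigma distances and their ratio; the column gives no numerical cutoff for "approximately the same," so that judgment is left to the reader.

```python
import math


def compare_three_sigma(counts, n):
    """Compare theoretical (Binomial) and empirical (XmR) three-sigma distances.

    counts: number of items with the attribute in each period
    n: fixed area of opportunity per period
    Returns (theoretical, empirical, empirical / theoretical).
    """
    average = sum(counts) / len(counts)
    p_bar = average / n

    # Theoretical: assumes the Binomial mean-dispersion relationship
    theoretical = 3 * math.sqrt(n * p_bar * (1 - p_bar))

    # Empirical: measured from the two-point moving ranges
    moving_ranges = [abs(b - a) for a, b in zip(counts, counts[1:])]
    empirical = 2.66 * sum(moving_ranges) / len(moving_ranges)

    return theoretical, empirical, empirical / theoretical


# Hypothetical counts, made up for illustration only. A real comparison
# would use a longer baseline (the column suggests at least 24 counts).
theoretical, empirical, ratio = compare_three_sigma([30, 32, 31, 33, 29], n=35)
print(theoretical, empirical, ratio)
```

A ratio near 1 suggests the probability model is workable for these counts, while a ratio far from 1 in either direction, as in figures 3 and 4, suggests the theoretical limits should not be trusted.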

Of course, you can guarantee that you have the right limits for your count-based data by simply using the *XmR* chart to begin with. The empirical approach will always be right.
