## What About *p*-Charts?

### When should we use the specialty charts for count data?

Published: Friday, September 30, 2011 - 12:26

All charts for count-based data are charts for individual values. Regardless of whether we are working with a count or a rate, we obtain one value per time period and want to plot a point every time we get a value. This is why four specialty charts for count-based data had been developed before a general approach for charting individual values was discovered. These four charts are the *p*-chart, the *np*-chart, the *c*-chart, and the *u*-chart. The question addressed in this column is when to use these and other specialty charts with your count-based data.

The first of these specialty charts, the *p*-chart, was created by Walter Shewhart in 1924. At that time the idea of using the two-point moving range to measure the dispersion of a set of individual values had not yet occurred to anyone. (W. J. Jennett would have this idea in 1942.) So the problem Shewhart faced was how to create a process behavior chart for individual values based on counts. While he could plot the data in a running record, and while he could use an average value as the central line for this running record, the obstacle was how to measure the dispersion so as to filter out the routine variation. With individual values he did not see how to use the within-subgroup variation, and he knew better than to try to use the global standard deviation statistic, which would be inflated by any exceptional variation present. So he decided to use theoretical limits based on a probability model.

The classic probability models for simple count data are the Binomial and the Poisson, and Shewhart knew that both of these models have a dispersion parameter that is a function of their location parameter. This meant that the estimate of location obtained from the data could also be used to estimate the dispersion. Thus, with one location statistic he could estimate both the central line and the three-sigma distance.
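For reference, the standard textbook forms of these relationships can be written out as below. The notation is assumed here rather than taken from the article: $n$ is the area of opportunity, $\bar{p}$, $\bar{c}$, and $\bar{u}$ are the average proportion, count, and rate, and $n_i$ and $a_i$ are the areas of opportunity for period $i$.

$$
\begin{aligned}
\text{Binomial count:}\quad & \mu = np, & \sigma &= \sqrt{np(1-p)} \\
\text{Poisson count:}\quad & \mu = \lambda, & \sigma &= \sqrt{\lambda}
\end{aligned}
$$

Substituting the average from the data for the location parameter gives the usual theoretical three-sigma limits:

$$
\begin{aligned}
np\text{-chart:}\quad & n\bar{p} \pm 3\sqrt{n\bar{p}\,(1-\bar{p})} \\
p\text{-chart:}\quad & \bar{p} \pm 3\sqrt{\bar{p}\,(1-\bar{p})/n_i} \\
c\text{-chart:}\quad & \bar{c} \pm 3\sqrt{\bar{c}} \\
u\text{-chart:}\quad & \bar{u} \pm 3\sqrt{\bar{u}/a_i}
\end{aligned}
$$

In every case a single average supplies both the central line and the three-sigma distance, which is why the limits are only as good as the assumed model.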

**Figure 1:** Specialty charts for count-based data

This dual use of an average to characterize both location and dispersion means that *p*-charts, *np*-charts, *c*-charts, and *u*-charts all have limits that are based upon a theoretical relationship between the mean and the dispersion. Hence these specialty charts can all be said to use theoretical limits. If the counts can be reasonably modeled by either a Binomial distribution or a Poisson distribution, then one of these specialty charts will provide appropriate limits for the data. Over the years many textbooks and standards have forgotten that the assumption of a Binomial model or a Poisson model is a prerequisite for the use of these specialty charts. This is a problem because there are many types of count-based data that cannot be characterized by either a Binomial or a Poisson distribution. When such data are placed on a *p*-chart, *np*-chart, *c*-chart, or *u*-chart, the theoretical limits obtained will be wrong.

So what are we to do? The problem with the theoretical limits lies in the assumption that we know the exact relationship between the central line and the three-sigma distance. The solution is to obtain a separate estimate of dispersion, which is what the *XmR* chart does: While the average will characterize the location and serve as the central line for the *X* chart, the average moving range will characterize dispersion and serve as the basis for computing the three-sigma distance for the *X* chart.
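The computation described above can be sketched in a few lines of Python. This is a minimal illustration, not the author's own code: the function name `xmr_limits` is hypothetical, and the scaling factors 2.66 and 3.268 are the standard textbook constants for two-point moving ranges, which the article does not quote.

```python
from statistics import mean

# Standard XmR scaling factors (textbook constants, not quoted in the article):
# 2.66 = 3 / 1.128 converts the average two-point moving range into a
# three-sigma distance; 3.268 gives the upper limit for the moving ranges.
X_FACTOR = 2.66
MR_FACTOR = 3.268


def xmr_limits(values):
    """Return the central line and empirical limits for an XmR chart."""
    if len(values) < 2:
        raise ValueError("need at least two values to form a moving range")
    # Two-point moving ranges: absolute differences of successive values.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = mean(values)          # location: central line of the X chart
    mr_bar = mean(moving_ranges)   # dispersion: average moving range
    return {
        "center": center,
        "lower": center - X_FACTOR * mr_bar,
        "upper": center + X_FACTOR * mr_bar,
        "mr_bar": mr_bar,
        "mr_upper": MR_FACTOR * mr_bar,
    }
```

Note that the moving ranges depend on the time order of the values, so the function should be given the data in the order in which they occurred.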

Thus, the major difference between the specialty charts and the *XmR* chart is the way in which the three-sigma distance is computed. The *p*-chart, *np*-chart, *c*-chart, and *u*-chart will have the same running record, and essentially the same central lines, as the *X* chart. But when it comes to computing the three-sigma limits the specialty charts use an assumed theoretical relationship to compute theoretical values while the *XmR* chart actually measures the variation present in the data and constructs empirical limits.

To compare the specialty charts with the *XmR* chart we shall use three examples. The first of these will use the data of figure 2. These values come from an accounting department which keeps track of how many of their monthly closings of departmental accounts are finished “on time.” The counts shown are the monthly numbers of closings, out of 35 closings, that are completed on time.

**Figure 2:** The *X* chart and *np*-chart for the on-time closing data

Here both the *np*-chart and the *X* chart computations give essentially the same limits. (The upper limit value of 36.8 is not shown since it exceeds the maximum value of 35 on-time closings.) Here the two approaches are essentially identical because these counts seem to be appropriately modeled by a Binomial distribution. If you are sophisticated enough to determine when this happens, then you will know when the *np*-chart will work and can use it successfully. On the other hand, if you are not sophisticated enough to know when a Binomial model is appropriate, then you can still use an *XmR* chart. As may be seen here, when the *np*-chart would have worked, the empirical limits of the *X* chart will mimic the theoretical limits of the *np*-chart, and you will not have lost anything by using the *XmR* chart instead of the *np*-chart.
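As a rough sketch of the theoretical side of this comparison, the *np*-chart limits for counts out of a fixed number of opportunities could be computed as follows. The function name `np_chart_limits` and the example counts are hypothetical; only the value of 35 closings per month comes from the article, and the figure 2 data are not reproduced here.

```python
from math import sqrt


def np_chart_limits(counts, n):
    """Theoretical np-chart limits for counts out of a fixed n per period."""
    if n <= 0 or not counts:
        raise ValueError("need a positive n and at least one count")
    p_bar = sum(counts) / (n * len(counts))   # average proportion
    center = n * p_bar                        # central line (average count)
    three_sigma = 3 * sqrt(n * p_bar * (1 - p_bar))
    lower = center - three_sigma
    upper = center + three_sigma
    return {
        "center": center,
        # A limit outside the possible range 0..n is not shown (None).
        "lower": lower if lower > 0 else None,
        "upper": upper if upper < n else None,
        "three_sigma": three_sigma,
    }


# Hypothetical monthly on-time counts out of 35 closings (illustration only).
example = np_chart_limits([32, 33, 31, 34, 33, 32], n=35)
```

With these made-up counts the computed upper limit exceeds 35, so it is returned as `None`, which parallels the way the article omits an upper limit that exceeds the maximum possible count.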

Our next example will use the on-time shipments for a plant. The data are shown in figure 3 along with both the *X* chart and the *p*-chart for these data.

**Figure 3:** The *X* chart and *p*-chart for the on-time shipments

The *X* chart shows a process with three points at or below the lower limit. The variable-width *p*-chart limits are five times wider than the limits found using the moving ranges. No points fall outside these limits. This discrepancy between the two sets of limits is an indication that the data of figure 3 do not satisfy the Binomial conditions. Specifically, the probability of a shipment being on time is not the same for all of the shipments in any given month. Because the Binomial model is inappropriate, the theoretical *p*-chart limits are incorrect. However, the empirical limits of the *XmR* chart, which do not depend upon the appropriateness of a particular probability model, are correct.

Our final comparison will use the data of figure 4. There we have the percentage of incoming shipments for one electronics assembly plant that were shipped using air freight. Two points fall outside the variable-width *p*-chart limits while no points fall outside the *X* chart limits.

**Figure 4:** The *X* chart and *p*-chart for the premium freight data

Figure 4 is typical of what happens when the area of opportunity for a count of items gets excessively large. The Binomial model *requires* that all of the items in any given time period will have the same chance of possessing the attribute being counted. Here this requirement is not satisfied. With thousands of shipments each month, the probability of a shipment being shipped by air is not the same for all of the shipments. Thus, the Binomial model is inappropriate, and the theoretical *p*-chart limits which depend upon the Binomial model are incorrect. The *X* chart limits, which here are twice as wide as the *p*-chart limits, properly characterize both the location and dispersion of these data and are the correct limits to use.

Thus, the difficulty with using a *p*-chart, *np*-chart, *c*-chart, or *u*-chart is the difficulty of determining whether the Binomial or Poisson models are appropriate for the data. As seen in figures 3 and 4, if you overlook the prerequisites for a specialty chart you will risk making a serious mistake in practice. This is why you should avoid using the specialty charts if you do not know how to evaluate the appropriateness of these probability models.

In contrast to this use of theoretical models which may or may not be correct, the *XmR* chart provides us with empirical limits that are actually based upon the variation present in the data. This means that you can use an *XmR* chart with count-based data any time you wish. Since the *p*-chart, the *np*-chart, the *c*-chart, and the *u*-chart are all special cases of the chart for individual values, the *XmR* chart will mimic these specialty charts when they are appropriate and will differ from them when they are wrong. (In the case of specialty charts that have variable width limits, the *XmR* chart will mimic limits based on the average-sized area of opportunity. Also, in making these comparisons I prefer to have at least 24 counts in the baseline period.)
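To make the parenthetical remark about variable-width limits concrete, here is a small sketch that computes *p*-chart limits for each period and also the single pair of limits at the average-sized area of opportunity, which is the pair the *X* chart limits would be compared against. The function name `p_chart_limits` and its return structure are assumptions made for illustration.

```python
from math import sqrt


def p_chart_limits(counts, sizes):
    """Variable-width p-chart limits plus limits at the average area size."""
    if len(counts) != len(sizes) or not counts:
        raise ValueError("counts and sizes must be non-empty and equal length")
    p_bar = sum(counts) / sum(sizes)   # overall proportion: central line

    def limits_for(n):
        # Binomial three-sigma distance for an area of opportunity of size n,
        # truncated to the possible range of a proportion.
        three_sigma = 3 * sqrt(p_bar * (1 - p_bar) / n)
        return (max(0.0, p_bar - three_sigma), min(1.0, p_bar + three_sigma))

    n_avg = sum(sizes) / len(sizes)
    return {
        "center": p_bar,
        "per_period": [limits_for(n) for n in sizes],
        "at_average_n": limits_for(n_avg),
    }
```

The `per_period` limits widen and narrow with each period's area of opportunity, while `at_average_n` gives one fixed pair of limits that can be set side by side with the empirical *X* chart limits.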

**Figure 5:** An assumption-free approach for count-based data

Thus, if you do not have advanced degrees in statistics, or if you simply have a hard time determining if your counts can be characterized by a Binomial or a Poisson distribution, you can still verify your choice of specialty chart for your count-based data by comparing the theoretical limits with the empirical limits of an *XmR* chart. If the empirical limits are approximately the same as the theoretical limits, then the probability model works. If the empirical limits do not approximate the theoretical limits, then the probability model is wrong.
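One way to carry out this check, sketched here under stated assumptions, is to compute the ratio of the empirical three-sigma distance to the theoretical one. The function name `three_sigma_ratio` and the example counts are hypothetical, the 2.66 factor is the standard *XmR* constant, and the article gives no numerical cutoff for "approximately the same", so none is built in.

```python
from math import sqrt
from statistics import mean


def three_sigma_ratio(values, theoretical_three_sigma):
    """Ratio of the empirical XmR three-sigma distance to a theoretical one."""
    if len(values) < 2 or theoretical_three_sigma <= 0:
        raise ValueError("need two or more values and a positive distance")
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    empirical_three_sigma = 2.66 * mean(moving_ranges)
    return empirical_three_sigma / theoretical_three_sigma


# Hypothetical event counts, treated as candidates for a c-chart (Poisson model).
# A real baseline would be longer; the article prefers at least 24 counts.
counts = [4, 7, 5, 9, 6, 3, 8, 5]
poisson_three_sigma = 3 * sqrt(mean(counts))   # c-chart: 3 * sqrt(c-bar)
ratio = three_sigma_ratio(counts, poisson_three_sigma)
```

A ratio near 1 suggests that the probability model is doing a reasonable job. Ratios such as the factor of five in figure 3 or the factor of two in figure 4 are the kind of discrepancy that signals an inappropriate model.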

Of course, you can guarantee that you have the right limits for your count-based data by simply using the *XmR* chart to begin with. The empirical approach will always be right.

## Comments

## Order of Production

But the purpose of plotting data sequentially is to detect changes OVER TIME. You can't just arbitrarily change the running order at whim to make a statistical point. A different time series would be a different process.

Rich DeRoeck

## Variability of the Opportunity area

Dear Donald, thank you for this masterpiece. I have one question about the area of opportunity in general.

If the area of opportunity varies a lot, how do I have to take this variability into account? For example, suppose that in your example no. 4 one month's total shipments were... 343, one order of magnitude less than the others. Would the percentage of air freight shipments be as significant as the others? Could I safely place it on the XmR chart together with the other percentages? In my experience people tend to discard percentages which arise from areas of opportunity which are "very different" from the others.

Thank you for any advice!

Giuseppe

## XmR Control Limits Highly Variable

What I never see mentioned in the advocacy of XmR charts is that the control limits can be highly sensitive to the chronological order of the data. In Dr. Wheeler's Figure 4 Premium Freight Data example, the order of the data can make the XmR chart control limits as wide as 1.5 & 9.2, or as narrow as 4.2 & 6.5 (roughly the same as the p-chart limits in this case). To claim that the limits obtained for the particular order presented are the "correct" limits ignores this variability. What if the various percentages had occurred in a different order? Similarly, in the Figure 3 example of On-time shipments, the data suffered particularly from autocorrelation, making the XmR chart limits much narrower than if the autocorrelation had been absent. If the order of the data had been different (reflecting less autocorrelation), the three points below the lower limits would then have been above the lower limits.

## Reply to Steve

The story you relate did not come from Dr. Deming, it came from David S. Chambers and is found in Exercise 10.4 of our popular book, Understanding Statistical Process Control. Moreover, it was David that hit me over the head with the fact that all charts for count-based data are charts for individual values. This one instance does not justify the thousands of mistaken np charts and p charts that are created in error. As I said in the article, if you are sophisticated enough to know when the data ought to be binomial or Poisson, then you may use the specialty charts. But for everyone else using count-based data, the only safe approach is the XmR Chart.

(By the way, the inspector also kept the percentage above 8 percent in order to keep from losing his job.)

## Thanks

Thanks for the reply. Since I use the story in training, I can now attribute it to the correct source, and the 8% minimum is interesting. Non-statistical folks are not good at falsifying data . . .

I know Dr. Deming drew in stories and ideas from others, though usually he was good at providing attribution to the source.

I do agree that there are a multitude of folks not applying the specialized charts properly.

## Deming and defective shoes

There is an interesting story in Quality, Productivity, and Competitive Position, pages 208-209. Here is a percent defective for a process, and the XmR chart would say it was in control at very tight limits. However, Dr. Deming did look at the p-chart and noticed the p-chart UCL and LCL were 15% and 4%, and the data ranged from 8% to 10%. This discrepancy led him to ask questions that determined the data were being falsified. It was thought that if the percent defective ever exceeded 10%, management would shut down the plant. Therefore, the head inspector made sure that the reported percent defective never exceeded 10%.

The vast majority of the data I work with are ESH&QA counts / counts per / and percentages. At least in that realm, I find the p, c, and u charts to be good "safety checks" for falsification or other manipulation of reporting.

Of course, I do agree that when in doubt, go XmR.

Thanks,