Scott A. Hindle


What Is the Penalty in Being Wrong?

More on the use of Cp and Cpk

Published: Wednesday, October 19, 2016 - 16:10

In all walks of life, being wrong can come with a penalty. It’s also true that, if you’re lucky, you sometimes get away with it without anybody being the wiser. To understand what this means in relation to the capability indexes Cp and Cpk, read on.


In part 3 of “Process Capability: What It Is and How It Helps,” I wrote regarding the interpretation of the two most commonly used capability indexes:
• Predictable processes: Cp and Cpk can be considered reliable indicators of future performance.
• Unpredictable processes: Cp and Cpk may be false, or very misleading, indicators of what the process will give in the future.

Processes characterized as predictable (or “in control”)—using the control chart as an operational definition of a predictable process—provide trustworthy Cp and Cpk statistics given that the process remains effectively unchanged in future operation. The word process encompasses incoming materials and their quality, actual operation of the manufacturing line, performance of equipment in the line, and the introduction of variation due to measurement (i.e., measurement error).

Two examples are given below. In both cases, the data are judged to have been collected so as to represent the routine, common-cause variation in the process, and the commonly used minimum capability standard of 1.33 is in place.

Example one

A process characteristic under study for capability is considered to have the potential to change in a way worth knowing about every 15 minutes or so in routine operation. Data collected at this frequency and as “one value per time period” are appropriate for an XmR chart, which is used for individual values with a moving range.

The process was run twice, yielding 45 data values. With the data arranged in time order, the average moving range method was used to estimate the process’s standard deviation. The following inputs were known or calculated:

Requirement of the process (known):
• Upper specification limit (USL) = 56 and lower specification limit (LSL) = 44
• Process target value: 50, the midpoint of the specifications (The target value is important but is not required to compute Cp and Cpk.)

Statistics from the 45 data (calculated):
• Average: 50.39
• Standard deviation, SDwithin: 1.44 (average moving range of 1.62 ÷ d2 of 1.128)

Cp and Cpk were calculated from these inputs as approximately 1.39 and 1.30, respectively. These values would likely be interpreted as evidence of success in a study of capability if two production runs were considered sufficient.
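As a minimal sketch, the standard formulas Cp = (USL - LSL) ÷ (6 × SDwithin) and Cpk = min(USL - average, average - LSL) ÷ (3 × SDwithin) can be applied to the statistics listed above. The function name capability_indexes is illustrative only and is not taken from Minitab or any other package.

```python
def capability_indexes(mean, sd_within, lsl, usl):
    """Return (Cp, Cpk) from summary statistics and specification limits."""
    cp = (usl - lsl) / (6 * sd_within)
    cpk = min(usl - mean, mean - lsl) / (3 * sd_within)
    return cp, cpk

# Example one: SDwithin = average moving range / d2 (d2 = 1.128 for n = 2)
sd_within = round(1.62 / 1.128, 2)  # 1.44, as reported in the article
cp, cpk = capability_indexes(mean=50.39, sd_within=sd_within, lsl=44, usl=56)
print(f"Cp = {cp:.2f}, Cpk = {cpk:.2f}")  # Cp = 1.39, Cpk = 1.30
```

Both values sit close to the 1.33 standard, which is why they could easily be read as good news.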

A one-sample t-test, with a p-value of 0.775, finds no evidence that this process is operating off-target. (This procedure compares the average of 50.39 with the target of 50 and uses the variation in the data.) Hence, Cp and Cpk can be interpreted as two estimates of the same quantity, giving a best estimate of capability of around 1.3 to 1.4. The conclusion:
• Capable process
• Centered and on-target process

Under the commonly used assumption of normality, statistical software can be used to create a representation of the above as shown in figure 1, for which Minitab software was used. To generate a normal distribution—the red curve—we need only estimate its mean and standard deviation parameters, which are 50.39 and 1.44, respectively.

Figure 1: This is the customized capability output in Minitab software using example one’s specifications and a normal distribution based on the average and SDwithin statistics. (PPM stands for parts per million.)

Figure 1 provides a good picture to show that, if the third operation of this process were comparable to the first two production runs, only on-target, conforming product would be the expected result for this characteristic under study. More on example one later.

Example two

As might happen, let us assume that you’ve got some process data which, because of a few urgent and unexpected jobs, have sat unopened in your inbox for a couple of weeks. These data, 204 observations in total, are arranged into 51 subgroups of size four and provide the following statistics:
• Average: 4498.18
• Standard deviation, SDwithin: 319.88 (average range of 658.63 ÷ d2 of 2.059)

Upper and lower specification limits in example two are USL = 5,500 and LSL = 3,500, giving Cp and Cpk values of approximately 1.04 each.
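For reference, the standard formulas applied to example two’s statistics work out as follows. The notation (a hat on sigma for the within-subgroup estimate, R-bar for the average range, x-bar for the average) is an assumption of this sketch rather than the article’s own.

```latex
\begin{aligned}
\hat{\sigma}_{\text{within}} &= \frac{\bar{R}}{d_2} = \frac{658.63}{2.059} \approx 319.88 \\
C_p &= \frac{\text{USL} - \text{LSL}}{6\,\hat{\sigma}_{\text{within}}}
     = \frac{5500 - 3500}{6 \times 319.88} \approx 1.04 \\
C_{pk} &= \frac{\min(\text{USL} - \bar{x},\ \bar{x} - \text{LSL})}{3\,\hat{\sigma}_{\text{within}}}
        = \frac{\min(1001.82,\ 998.18)}{3 \times 319.88} \approx 1.04
\end{aligned}
```

Because the average of 4498.18 is almost exactly at the midpoint of the specifications, Cp and Cpk are nearly identical.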

Arguing that all is in order wouldn’t be too easy if the expectation is Cp and Cpk ≥ 1.33. The Cp and Cpk statistics cause you concern, and the delay in having done this job only compounds this concern. As with figure 1, the output of figure 2 could be generated under the same assumption of normality.

The normal curve shown by the software extends beyond the specifications, and the fraction nonconforming is estimated as 0.18 percent. Given the tardiness in your response, should this be communicated as a process in need of more attention and improvement, or is it better to keep a low profile?

Figure 2: Customized capability output for example two as per figure 1.
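The estimated fraction nonconforming under a normal model can be sketched with the Python standard library alone. This is an illustration of the calculation, assuming a normal distribution with the reported average and SDwithin; it is not Minitab's implementation, and the function names are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def fraction_nonconforming(mean, sd, lsl, usl):
    """Fraction of a normal distribution falling outside [lsl, usl]."""
    below = normal_cdf((lsl - mean) / sd)
    above = normal_cdf(-(usl - mean) / sd)  # upper tail, by symmetry
    return below + above

example_two = fraction_nonconforming(mean=4498.18, sd=319.88, lsl=3500, usl=5500)
example_one = fraction_nonconforming(mean=50.39, sd=1.44, lsl=44, usl=56)
print(f"Example two: {100 * example_two:.2f} percent")  # about 0.18 percent
print(f"Example one: {100 * example_one:.3f} percent")  # about 0.005 percent
```

These figures are only as trustworthy as the assumption that the process is predictable and that the normal model describes it.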

Back to example one

Comparing the prediction of figure 1 with the actual observed fraction nonconforming, we have:
• Predicted: 0.005 percent (from the probability model)
• Observed: 100 percent (all 45 values are out of specification)

This is a worst-case example of a penalty for being wrong. Everything is thought to be good when, in fact, it couldn’t be worse. Just looking at the individual data values would have uncovered this problem of woeful process performance. If, however, your Cp and Cpk values are directly uploaded onto a dashboard, you might be led terribly astray.

The XmR chart for these data is shown in figure 3. Table 1 at the end of this article shows the 45 individual data values and the 44 moving range values.

Figure 3: XmR chart for example one’s data.

Every individual data value falls the wrong side of the natural process limits in figure 3’s X chart. Conversely, just one moving range value falls the wrong side of the upper range limit: of 44 moving range values—for 45 total data—43 of these are small, and one is massive in comparison due to the upward shift between production runs of around 18 units. This massive moving range value does inflate the average moving range, but not sufficiently for the average moving range statistic that is behind SDwithin to reveal a problem when calculating Cp and Cpk.
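The behavior described above can be reproduced with a small sketch. The data below are hypothetical stand-ins, not the values in table 1: two runs of 22 and 23 observations placed roughly 18 units apart, with a repeating pattern of small within-run variation. The constants 1.128 (d2) and 3.268 (D4) are the usual XmR chart factors for moving ranges of two values.

```python
import statistics

# Hypothetical stand-in for table 1: two production runs about 18 units apart
pattern = [0.0, 1.0, -0.5, 0.8, -0.9, 0.3]
run_one = [41.0 + pattern[i % 6] for i in range(22)]  # observations 1-22
run_two = [59.0 + pattern[i % 6] for i in range(23)]  # observations 23-45
values = run_one + run_two

moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
mean = statistics.mean(values)
mr_bar = statistics.mean(moving_ranges)

sd_within = mr_bar / 1.128           # d2 for moving ranges of two values
lower_limit = mean - 3 * sd_within   # natural process limits (X chart)
upper_limit = mean + 3 * sd_within
upper_range_limit = 3.268 * mr_bar   # D4 for moving ranges of two values

outside_x = sum(v < lower_limit or v > upper_limit for v in values)
outside_mr = sum(mr > upper_range_limit for mr in moving_ranges)
global_sd = statistics.stdev(values)

print(f"Points outside natural process limits: {outside_x} of {len(values)}")
print(f"Moving ranges above upper range limit: {outside_mr} of {len(moving_ranges)}")
print(f"SDwithin = {sd_within:.2f}, global StDev = {global_sd:.2f}")
```

With these made-up values, every point falls outside the natural process limits and only the single between-run moving range exceeds the upper range limit, yet SDwithin stays small enough to yield Cp and Cpk above 1.33 against specifications of 44 to 56.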

The good Cp and Cpk values were therefore totally oblivious to the fact that:
• Production run one gave nothing but nonconforming output below the lower specification limit
• Production run two gave nothing but nonconforming output above the upper specification limit

Hence, if you have a habit of jumping straight to statistics, you might not know until it’s too late that your good capability statistics are associated with the worst possible penalty.

Back to example two

Figure 4 shows an average and range chart because the 204 observations were arranged into 51 subgroups of size four. Only detection rule one, a point beyond the 3-sigma limits, is applied in the search for assignable cause variation.

Figure 4: Average and range chart for example two’s data.
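As a rough sketch, the 3-sigma limits for an average and range chart can be computed from the grand average and the average range reported earlier, using the standard chart constants for subgroups of size four (A2 = 0.729, D3 = 0, D4 = 2.282). The limits drawn in figure 4 may differ slightly because of rounding; the function name is hypothetical.

```python
# Standard control chart constants for subgroups of size four
A2 = 0.729  # scales the average range into 3-sigma limits for averages
D3 = 0.0    # lower range limit factor
D4 = 2.282  # upper range limit factor

def average_and_range_limits(grand_average, average_range):
    """Return 3-sigma limits for the average chart and the range chart."""
    half_width = A2 * average_range
    return {
        "averages": (grand_average - half_width, grand_average + half_width),
        "ranges": (D3 * average_range, D4 * average_range),
    }

limits = average_and_range_limits(grand_average=4498.18, average_range=658.63)
for chart, (low, high) in limits.items():
    print(f"{chart}: {low:.1f} to {high:.1f}")
```

Detection rule one then flags any subgroup average outside the first pair of limits, or any subgroup range above the upper range limit.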

The unpredictability in this process is evident at first glance: 10 of the 51 subgroup averages fall the wrong side of the limits for averages. If you’d kept quiet based on the estimated fraction nonconforming of 0.18 percent from the assumption of normality (see figure 2), you might now be regretting that. With 8 of 204 observations out of specification, the observed fraction nonconforming equals 3.92 percent, about 22 times higher than the expectation based on the probability model. How would this outcome be received by those not yet convinced of the value of statistics in industry?

However, keeping a low profile in this case would have almost certainly gone without penalty because these data were collected before improvement to the process. A second average and range chart is shown in figure 5, which shows the impact of the improvements on the quality of process output.

Figure 5: Average and range chart for example two’s data showing both before and after improvement.

Figure 5 clearly shows the after-improvement data to be representative of a different, better, and more consistent process. Knowing this, how could the before-improvement data be suitable to assess the current, or actual, capability of this process?

Using after-improvement data to assess current process capability results in figure 6’s output, which this time comes with the blessing of a predictable process behind it (see figure 5).

Figure 6: Customized capability output using Minitab software based on example two’s after-improvement data.

None of the 64 after-improvement data values are out of specification; hence, the observed PPM nonconforming in figure 6 is zero. Moreover, with Cp and Cpk statistics of 1.87 and 1.71, respectively, the estimation of nonconforming output in future operation is essentially zero so long as the process remains effectively unchanged.

If the process were to change, the statistics from these 64 data would no longer describe the process after the change. Two types of change can occur, whether temporary or permanent:
• Desired change: Any fraction nonconforming estimates would have to come from new data representing the process after the change (given a state of predictability)
• Undesired change: The process is telling you it can change without warning, so predicting what it will do next is a tough sell. If you’ve learned of the process change through a control chart, investigate the cause of the change, and take action to remove its effect.

Given the context of example two:
• It’s a mistake to use capability statistics to predict what a process will do when the process is actually unpredictable.
• It’s also a mistake to compute capability statistics without knowing if the data you’re using are the most recent data on the process.
• Both of these mistakes expose you to an unnecessary risk of a penalty (see figure 7).

But if things panned out as per example two, you’d have hopefully learned what to do better next time. However, no one but you would have been the wiser... probably.


To avoid being wrong with capability indexes like Cp and Cpk, you need the following:
1. Well-defined specifications for the product or process characteristic under study
2. Characterization of process behavior as predictable
3. Data that represent the current process
4. No known plan to make a change to the current process in the near future

Can software help to address summary points 3 and 4?

Hence, while example one represents the maximum possible penalty for being wrong, example two shows that there’s also a chance of getting away with the mistake, as shown in figure 7. A key point is that an incorrect use of Cp or Cpk means you’ll find yourself somewhere on the penalty axis shown in figure 7.

Figure 7: Penalty axis associated with a wrong use of Cp and/or Cpk

If the aim is to wipe out a wrong use of process capability, consider that no capability index can be safely interpreted without a graphical plot of the data. Safeguarding these graphical plots are the statistical process control principles of rational subgrouping and rational sampling. The most common approach to plotting the data is:
• First characterize a process as predictable or unpredictable through the use of a process behavior chart.
• Then assess the capability of a predictable process using a histogram of the individual values on which the specifications are included, and possibly include the process target value and the natural process limits.
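The two steps above can be sketched as a simple gate for individual values: capability indexes are reported only when an XmR chart shows no rule-one signal. This is an illustration under stated assumptions, not a substitute for looking at the chart and the histogram. The function name assess_capability and both data sets are hypothetical, and only detection rule one is applied.

```python
def assess_capability(values, lsl, usl):
    """Compute Cp and Cpk only when an XmR chart shows no rule-one signal."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mean = sum(values) / len(values)
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    sd_within = mr_bar / 1.128  # d2 for moving ranges of two values

    # Rule one: any point beyond the 3-sigma limits on either chart
    x_signals = [v for v in values if abs(v - mean) > 3 * sd_within]
    mr_signals = [mr for mr in moving_ranges if mr > 3.268 * mr_bar]
    if x_signals or mr_signals:
        return {"predictable": False, "cp": None, "cpk": None}

    cp = (usl - lsl) / (6 * sd_within)
    cpk = min(usl - mean, mean - lsl) / (3 * sd_within)
    return {"predictable": True, "cp": cp, "cpk": cpk}

# Hypothetical data, for illustration only
pattern = [0.0, 1.0, -0.5, 0.8, -0.9, 0.3]
steady = [50.0 + pattern[i % 6] for i in range(24)]
shifted = ([41.0 + pattern[i % 6] for i in range(12)]
           + [59.0 + pattern[i % 6] for i in range(12)])

steady_result = assess_capability(steady, lsl=44, usl=56)
shifted_result = assess_capability(shifted, lsl=44, usl=56)
print(steady_result)
print(shifted_result)
```

Returning no index at all for the shifted data is a deliberate design choice: it keeps a misleading Cp or Cpk from reaching a report or dashboard.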

In conclusion, dealing with process capability statistics can be like having a conversation with a liar: You know what you’ve been told, but can you believe it? The calculated statistics can make a fool of us if we fail to look at the actual data skeptically. The skeptical approach is to use graphical methods like control charts and histograms. Capability statistics can only complement these graphs, not replace them.


The past performance and hypothetical capability of an unpredictable process can still be assessed using the same graphical approach described above. Hypothetical capability refers to an estimate of the capability to expect if, and only if, the process is brought into a predictable state by taking successful action on the assignable causes behind the unpredictability. As shown in figure 5, the capability attained after bringing the process into a predictable state may be better than the hypothetical capability estimated from an unpredictable process. In figure 5 the after-improvement limits are much narrower than those before improvement.

For those wondering, example one’s data were made up (see table 1 below for the data). Because SDwithin is an average dispersion statistic (here based on the average of 44 moving range values), the process shift of around +18 units inflates SDwithin far less than it inflates the global standard deviation statistic (StDev): the two values are 1.44 and 9.15, respectively.

Table 1: Example one’s data. Observations 1–22 represent production run one, and observations 23–45 represent production run two.

Moreover, the p-value of 0.775 from the one-sample t-test in example one was nothing but misleading, even though it was a correctly computed statistic. The data had an average of close to 50, but the process operated far away from this level.
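The size of that p-value can be checked by hand. The sketch below assumes the test used the global standard deviation (StDev) of 9.15 quoted above, which is the usual choice for a one-sample t-test; the symbols are this sketch's notation.

```latex
t = \frac{\bar{x} - \text{target}}{s / \sqrt{n}}
  = \frac{50.39 - 50}{9.15 / \sqrt{45}}
  \approx \frac{0.39}{1.364}
  \approx 0.29,
\qquad \text{df} = n - 1 = 44
```

A t value this close to zero corresponds to a large two-sided p-value, in line with the reported 0.775. The between-run shift inflates the standard deviation in the denominator, so the test has almost no chance of flagging the problem.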

The data behind example two are data on the electrical resistance of insulation in megohms from Walter A. Shewhart’s book published in 1931, Economic Control of Quality of Manufactured Product (Martino Fine Books, rev. ed. 2015). Shewhart used the 204 data before improvement to show what can be achieved if the causes behind detected process changes, or assignable causes, are identified through investigation, and their effect is then removed from the process through action on these causes. See the reduction in width of the limits in figure 5.

In both examples, running a test of non-normality should have put the brakes on using the normal probability model in figures 1 and 2 because in both cases the p-value was < 0.005. How many times do process data have either a bell-shaped-looking histogram or give a p-value ≥ 0.05, yet be anything but consistent with a stable, predictable process?


About The Author

Scott A. Hindle

Scott Hindle supports R&D and factory operations on process capability studies for new products and processes, statistical process control (SPC) for use in routine production, and the use of online measurement devices as a part of both SPC and engineering process control.


Cp/CpK and Pp/Ppk; Rational Subgroups

Hey Scott - Another detail that often goes unmentioned is the topic of rational subgroups. Generally speaking, Cp and CpK are calculated using an Rbar over d2 or an sbar over c4 approximation of the standard deviation. This makes sense if the data are collected in rational subgroups. What makes up a rational subgroup? Not the setting on the statistical software! They are inherent in the data collection. If an established sample size is collected each time samples are measured, rational subgroups are established. They cannot be "created" after random data is recorded.

The use of rational subgroups enforces the assumption that the process is under control; variation within subgroups is considered while variation between subgroups is essentially ignored.

The alternate indices, Pp and PpK, should also be considered. In these indices the sample standard deviation is the actual value based on individual measurements. All variation is considered. One benefit of considering Pp/PpK along with Cp/CpK (which again can only be used if the data is collected in rational subgroups!!) is that individual outliers can be set aside. When one "Plots the Dots" there may be obvious outliers from otherwise in-control data. The special causes of the outliers can be sought (and hopefully eliminated!) while the otherwise in-control process can be left alone and untampered with.

Capability Indices

The use of tools like a one-sample t-test is impressive, but the whole debacle would have been revealed immediately if the individuals chart had been the FIRST thing done.

Your comment

Fully agree. The quote synonymous with Ellis Ott comes to mind: "PLOT THE DATA" (https://www.qualitydigest.com/may07/articles/05_article.shtml). Most data have a time component, so a simple time-series plot ensures the kind of silly mistake made above with Cpk wouldn't happen in practice.