## The Problem of Long-Term Capability

### Poor labels lead to incorrect ideas

Published: Monday, July 8, 2013 - 13:38

Based on some recent inquiries, there seems to be a need to review the four capability indexes in common use today. A clear understanding of what each index does, and does not do, is essential to clear thinking and good usage. To see how to use the four indexes to tell the story contained in your data, and to learn how to avoid a common pitfall, read on.

### The four indexes

Four indexes in common use today are the capability ratio, *Cp*; the performance ratio, *Pp*; the centered capability ratio, *Cpk*; and the centered performance ratio, *Ppk*. The formulas for these four ratios are:
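Written out from the four components this section defines (the specified tolerance, *DNS*, *Sigma(X)*, and *s*), the ratios take the following form. This is a reconstruction consistent with those definitions, not a copy of the original display:

$$
C_p = \frac{USL - LSL}{6\,\operatorname{Sigma}(X)}, \qquad
P_p = \frac{USL - LSL}{6\,s}
$$

$$
C_{pk} = \frac{2\,DNS}{6\,\operatorname{Sigma}(X)}, \qquad
P_{pk} = \frac{2\,DNS}{6\,s}
$$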

To understand these ratios we need to understand the four components used in their construction. The difference between the specification limits, *USL* – *LSL*, is the specified tolerance. It defines the *total space available* for the process.

The distance to the nearer specification, *DNS*, is the distance from the average to the nearer specification limit. Operating with an average that is closer to one specification than the other effectively narrows the space available to the process. It is like having a process that is centered within limits that have a specified tolerance = 2 *DNS*. Thus, the numerator of both the centered capability ratio and the centered performance ratio characterizes the *effective space available* due to the fact that the process is not centered within the actual specification limits.

*Sigma(X)* denotes any one of several within-subgroup measures of dispersion. One such measure would be the average of the subgroup ranges divided by the appropriate bias correction factor. Another such measure is the average of the subgroup standard deviation statistics divided by the appropriate bias correction factor. The quantity denoted by 6 *Sigma(X)* represents the *generic space required by a process* when that process is operated up to its full potential.

The global standard deviation statistic, *s*, is the descriptive statistic introduced in every statistics class. Since it is computed using all of the data, it effectively treats the data as one homogeneous group of values. This descriptive statistic is useful for summarizing the past, but if the process is not being operated up to its full potential the changes in the process will tend to inflate this global measure of dispersion. Thus, this measure of dispersion simply describes the past without respect to whether the process has been operated up to its full potential or not. The denominators of 6*s* define the *space used by the process in the past*.

A glance at the formulas above will reveal that the only difference between the capability indexes and the corresponding performance indexes is simply which measure of dispersion is used. The performance indexes use the global standard deviation statistic to describe the past. The capability indexes use a within-subgroup measure of dispersion to approximate the process potential. Whenever and wherever this profound difference between these measures of dispersion is not appreciated it is inevitable that capability confusion will follow.
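The difference between the two measures of dispersion can be seen in a minimal Python sketch. The function names and the small data set are hypothetical, chosen only for illustration; the bias correction factors are the standard *d2* values for the average range.

```python
import statistics

# Standard bias correction factors d2 for the average range, by subgroup size.
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326}

def sigma_x_from_ranges(subgroups):
    """Within-subgroup dispersion: average subgroup range divided by d2."""
    n = len(subgroups[0])
    if any(len(g) != n for g in subgroups):
        raise ValueError("all subgroups must have the same size")
    avg_range = statistics.mean([max(g) - min(g) for g in subgroups])
    return avg_range / D2[n]

def global_s(subgroups):
    """Global standard deviation statistic computed from all values at once."""
    return statistics.stdev([v for g in subgroups for v in g])

# Hypothetical data (an assumption, not from the article): every subgroup has
# the same spread, but in `shifted` the last two subgroups sit 5 units higher.
base = [[9, 10, 11, 10, 10], [10, 9, 10, 11, 10]]
steady = base + base
shifted = base + [[v + 5 for v in g] for g in base]

for name, data in (("steady", steady), ("shifted", shifted)):
    print(f"{name}: Sigma(X)={sigma_x_from_ranges(data):.3f}  s={global_s(data):.3f}")
```

In this toy case the within-subgroup measure is identical for both data sets, while the global statistic grows severalfold once the shift is included.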

Depending upon what is happening with the underlying process, the four indexes above can be four estimates of one quantity, four estimates of two different quantities, or even four estimates of four different quantities. This variable nature of what these index numbers represent has complicated their interpretation in practice. As a result, many different explanations have been offered. Unfortunately, some of these explanations have been flawed and even misleading.

### What the four indexes measure

Using these four components defined above, we see that the capability ratio, *Cp*, expresses the space available within the specifications as a multiple of the space required by the process when it is centered within the specifications and is operated predictably. It is the *space available* divided by the *space required* under the best possible circumstances.

The performance ratio, *Pp*, expresses the *space available* within the specifications as a multiple of the *space used in the past* by this process. If the process has been operated up to its full potential, the space used in the past and the space required by the process will be essentially the same, and the performance ratio will be quite similar to the capability ratio. If the process has not been operated up to its full potential then the space used by the process in the past will always exceed the space required by the process, and the performance ratio will be smaller than the capability ratio. Thus, the agreement between the capability ratio and the performance ratio will characterize the extent to which the process is, or is not, being operated predictably.

The centered capability ratio, *Cpk*, expresses the *effective space available* as a multiple of the *space required* by the process when it is operated predictably at the current average. It is the effective space available divided by the space required. The extent to which the centered capability ratio is smaller than the capability ratio will characterize how far off-center the process is operating.

The centered performance ratio, *Ppk*, expresses the *effective space available* as a multiple of the *space used by the process in the past*. This ratio essentially describes the process as it is, where it is, without any consideration of what the process has the potential to do. The extent to which the centered performance ratio is smaller than the performance ratio is a characterization of how far off-center the process has been operated.

The relationship between these four indexes may be seen in figure 1. There the top tier represents either the actual capability of a process that is operated predictably, or the hypothetical capability of a process that is operated unpredictably. The bottom tier represents the actual performance of a process that is operated unpredictably. The left side represents what happens when the process is centered at the mid-point of the specifications, while the right side takes into account the effect of having an average value that is not centered at the midpoint of the specifications.

Thus, the top tier of figure 1 is concerned with the process potential, and the bottom tier describes the process performance. As a process is operated ever more closely to its full potential, the values in the bottom tier will move up to be closer to those in the top tier.

The left side implicitly assumes the process is centered within the specifications; the right side takes into account the extent to which the process may be off-center. As a process is operated closer to the center of the specifications the values on the right will move over to be closer to those on the left.

Thus, when a process is operated predictably and on target, the four indexes will be four estimates of the same thing. This will result in the four indexes being close to each other. (Since the indexes are all statistics, they will rarely be exactly the same.)

When a process is operated predictably but is not centered within the specifications, the discrepancy between the right and left sides of figure 1 will quantify the effects of being off-center. With a predictable process, the two indexes on the right side of figure 1 will both estimate the same thing while the two indexes on the left side will be two estimates of another quantity.

When a process is operated unpredictably, the indexes in the bottom row of figure 1 will be smaller than those in the top row, and these discrepancies will quantify the gap due to unpredictable operation.

When a process is operated unpredictably and off-target, the four indexes will represent four different quantities.

Thus, the capability ratio, *Cp*, is the best-case value, and the centered performance ratio, *Ppk*, is the worst-case value. The gap between these two values is the opportunity that exists for improving the current process by operating it up to its full potential.

The capability ratio, *Cp*, approximates what can be done without reengineering the process. If this best-case value is good enough, then the current process can be made to operate in such a way as to meet the process requirements. Experience has repeatedly shown that it is cheaper to learn how to operate the existing process predictably and on-target than it is to try to upgrade or reengineer that process.

Thus, by comparing the four capability and performance indexes you can quickly and easily get some idea about how a process is being operated. How close is it to being operated up to its full potential? Is it being operated on-target? Will it be necessary to reengineer the process, or can it be made to meet the process requirements without the trouble and expense of reengineering?

### Example one

Figure 2 contains 260 observations from a predictable process. The corresponding average and range chart is shown in figure 3. The specifications for this process are 10.0 ± 3.5.

This process has a grand average of 10.15. The specification limits are 6.5 and 13.5. Thus, the distance to nearer specification will be *DNS* = 13.5 – 10.15 = 3.35. The average range is 4.25. With subgroups of size 5 this latter value results in a value for *Sigma(X)* of 4.25/2.326 = 1.83. Finally, the global standard deviation statistic is *s* = 1.847. Thus, the four capability and performance ratios are:
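The arithmetic can be checked with a short Python sketch that uses only the summary values quoted in this example. The function name `four_indexes` is a hypothetical helper, not something from the original article.

```python
def four_indexes(lsl, usl, average, sigma_x, s):
    """Return (Cp, Cpk, Pp, Ppk) from the four components described above."""
    tolerance = usl - lsl                      # total space available
    dns = min(usl - average, average - lsl)    # distance to nearer specification
    cp = tolerance / (6 * sigma_x)             # space available / space required
    cpk = 2 * dns / (6 * sigma_x)              # effective space / space required
    pp = tolerance / (6 * s)                   # space available / space used
    ppk = 2 * dns / (6 * s)                    # effective space / space used
    return cp, cpk, pp, ppk

# Example one: average range 4.25 with subgroups of size 5 (d2 = 2.326).
sigma_x = 4.25 / 2.326
cp, cpk, pp, ppk = four_indexes(6.5, 13.5, 10.15, sigma_x, 1.847)
print(f"Cp={cp:.2f}  Cpk={cpk:.2f}  Pp={pp:.2f}  Ppk={ppk:.2f}")
```

With these inputs the four values come out at roughly 0.64, 0.61, 0.63, and 0.60.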

Here all four indexes tell the same story. They all might be taken to be estimates of the same quantity. Even without the average and range chart of figure 3 we could tell that this process was being operated predictably and is fairly well-centered within the specifications. The fact that these indexes are all near 60 percent implies that this process is not capable of meeting the specifications even though it is being operated up to its full potential.

### Example two

Raw materials for a compound are dry-mixed in a pharmaceutical blender. The recipe calls for batches that are supposed to weigh 1,000 kg. If the weight of a batch is off, then presumably the recipe is also off. As each batch is dumped out of the blender the weight is recorded. Figure 4 shows the weights of all 259 batches produced during one week. The values are in time-order by rows. The *XmR* chart for these values is shown in figure 5. The limits shown were based on the first 45 values. There are points outside the limits within this baseline period, and the process deteriorates as the week progresses.

The specifications for the batch weights are 900 kg to 1,100 kg. With an average moving range of 27.84, the value for *Sigma(X)* is 27.84/1.128 = 24.7 kg. The global standard deviation statistic for all 259 values is *s* = 61.3 kg. With an average of 936.9, the *DNS* value is 36.9 kg. Thus, the four indexes are:
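As a rough check, the same ratios can be computed in Python from the summary values quoted in this paragraph. The variable names are illustrative, and `gap` is simply the difference between the best-case and worst-case values.

```python
LSL, USL = 900.0, 1100.0
average = 936.9
sigma_x = 27.84 / 1.128   # average moving range / d2 for n = 2
s = 61.3                  # global standard deviation statistic

# Distance from the average to the nearer specification limit.
dns = min(USL - average, average - LSL)

indexes = {
    "Cp": (USL - LSL) / (6 * sigma_x),
    "Cpk": 2 * dns / (6 * sigma_x),
    "Pp": (USL - LSL) / (6 * s),
    "Ppk": 2 * dns / (6 * s),
}
gap = indexes["Cp"] - indexes["Ppk"]   # best case minus worst case

for name, value in indexes.items():
    print(f"{name} = {value:.2f}")
print(f"gap between Cp and Ppk = {gap:.2f}")
```

The results round to a capability ratio of 1.35, a performance ratio of 0.54, and a centered performance ratio of 0.20, which agree with the values cited in the long-term capability discussion. The centered capability ratio comes out near 0.50.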

The discrepancy between the capability ratio and the performance ratio shows that this process is being operated unpredictably. The discrepancy between the centered performance ratio and the performance ratio shows that the average is not centered within the specifications. The capability ratio describes what the current process is capable of doing when operated predictably and on target. The centered performance ratio describes the train wreck of what they actually accomplished during this week, and the gap between these two indexes describes the opportunity that exists for this process.

### Long-term capability

As shown in these examples, each of the four index numbers makes a specific comparison between the specified tolerance or the effective space available and either the within-subgroup variation or the global standard deviation statistic. In an effort to distinguish between the capability indexes and the performance indexes the performance indexes have sometimes been called “long-term capability indexes.” This nomenclature is misleading and inappropriate.

The idea behind the terminology of long-term capability is that if you just collect enough data over a long enough period of time you will end up with a good estimate of the process capability. To illustrate how this is supposed to work we will use data from example one to perform a sequence of computations using successively more and more data at each step. Although we would not normally perform the computations in this way in practice, we do so here to see how increasing amounts of data affect the computation of performance and capability ratios.

We begin with the first eight subgroups. The global standard deviation statistic for these 40 values is 1.974. The specifications are 6.5 to 13.5, so our *USL – LSL = 7.0*. Using these values we get a performance ratio of 0.591. The average range for these eight subgroups is 4.375, so *Sigma(X)* is 1.881, and with this value we get a capability ratio of 0.620. It is instructive to note how close these values are to the values found using all the data in example one above.

The first 12 subgroups contain 60 values. The global standard deviation statistic for these 60 values is 1.742. Using this value we get a performance ratio of 0.670. The average range for these 12 subgroups is 3.833, so *Sigma(X)* is 1.648, and with this value we get a capability ratio of 0.708.

The first 16 subgroups contain 80 values. The global standard deviation statistic for these 80 values is 1.678. Using this value we get a performance ratio of 0.691. The average range for these 16 subgroups is 3.875, so *Sigma(X)* is 1.666, and with this value we get a capability ratio of 0.700.

Continuing in this manner, adding 20 more values at each step, we get the performance ratios and capability ratios shown in figure 6. There we see that as we use greater amounts of data in the calculations these ratios settle down and get closer and closer to a value near 0.640.

Of course, as may be seen above, when a process is operated predictably, the capability ratio and the performance ratio both estimate the same quantity. Thus, when a process is operated up to its full potential there is no distinction to be made between the short-term capability and the long-term capability. Both computations describe the actual capability of the predictable process.

The convergence seen in figure 6, where a statistic approaches some asymptotic value as the amount of data increases, is the idea behind many things we do in statistics. Unfortunately, this convergence only happens when the data are homogeneous. In order to see what happens with a process that is not operated up to its full potential, we shall repeat the exercise above using the data from example two.

The first 40 batch weights have a global standard deviation statistic of 41.60. The specifications are 900 to 1,100, so our specified tolerance is *USL – LSL = 200.* Using these values we get a performance ratio of 0.801. The average moving range for these 40 values is 29.10, so *Sigma(X)* is 25.80, and with this value we get a capability ratio of 1.292.

The first 60 batch weights have a global standard deviation statistic of 44.20. Using this value we get a performance ratio of 0.754. The average moving range for these 60 values is 25.76, so *Sigma(X)* is 22.84, and with this value we get a capability ratio of 1.459.

Continuing in this manner, adding 20 more values at each step, we get the performance ratios and capability ratios shown in figure 7. For the sake of comparison, both figure 6 and figure 7 use the same horizontal and vertical scales.

To what value is the performance ratio curve in figure 7 converging? After 120 values it appears to be approaching 0.80, then with 20 additional values it suddenly drops down to the neighborhood of 0.70. After 180 values it seems to be approaching 0.70, then with 20 more values it drops down to the neighborhood of 0.60. After 240 values we are still in the vicinity of 0.60, but then with 259 values we drop down to 0.54. So which value are you going to use as your long-term capability? 0.80? 0.70? 0.60? or 0.54?

Here we see that even though we use ever greater amounts of data, the ratios do not settle down to any particular value. Neither do we see the agreement between the performance ratio and the capability ratio that was evident in figure 6. Clearly these two ratios characterize different aspects of the data in this case. Both the migration and the estimation of different things happen because this process is changing over time. Because of these changes there is no magic amount of data that will result in a “good number.” The computations are chasing a moving target. The question “What is the long-term capability of this process?” is meaningless simply because there is no such quantity to be estimated regardless of how many data we might use.
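The stepwise computation can be sketched in Python for individual values, with *Sigma(X)* taken as the average moving range divided by 1.128. The data below are simulated under stated assumptions and are NOT the article's batch weights. The function name `running_ratios` and its `start` and `step` parameters are hypothetical.

```python
import random
import statistics

def running_ratios(values, lsl, usl, start=40, step=20):
    """Recompute Pp and Cp on the first k values for growing k.

    Returns a list of (k, Pp, Cp) tuples; the last tuple uses all values.
    """
    tolerance = usl - lsl
    cut_points = list(range(start, len(values), step)) + [len(values)]
    rows = []
    for k in cut_points:
        head = values[:k]
        moving_ranges = [abs(b - a) for a, b in zip(head, head[1:])]
        sigma_x = statistics.mean(moving_ranges) / 1.128   # within measure
        s = statistics.stdev(head)                         # global measure
        rows.append((k, tolerance / (6 * s), tolerance / (6 * sigma_x)))
    return rows

# Hypothetical process (an assumption for illustration): noise with a standard
# deviation of 10 around a level that drops by 50 after every 60 values.
rng = random.Random(1)
values = [rng.gauss(level, 10.0) for level in (1000, 950, 900) for _ in range(60)]

rows = running_ratios(values, lsl=900, usl=1100)
for k, pp, cp in rows:
    print(f"n={k:3d}  Pp={pp:.2f}  Cp={cp:.2f}")
```

In a simulation of this kind the performance ratio keeps falling as each shift enters the calculation, while the capability ratio stays roughly where it started. That is the pattern described for figure 7.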

With an unpredictable process, as we use greater amounts of data in our computation we eventually combine values that were obtained while the process was acting differently. This combination of unlike values does not prevent us from computing our summary statistics, but it does complicate the interpretation of those statistics. With an unpredictable process there is no single value for the process average, or the process variation, or the process capability. All such notions of process characteristics become chimeras, and any attempt to use our statistics to estimate these nonexistent process characteristics is an exercise in frustration. This is why the idea of long-term capability is just so much nonsense.

However, once we understand that we are working with an unpredictable process, we are free to use our statistics to characterize different aspects of the *data* (as opposed to the process). As noted earlier, the capability ratio of 1.35 computed from the first 45 values of example two provides an approximation of what this process has the potential to do. In the same manner, the centered performance ratio of 0.20 describes what was done during this week. And the difference between these two statistics characterizes the gap between performance and potential. Thus, we may use the capability and performance indexes to identify opportunities even when they do not estimate fixed aspects of the underlying process.

Thus, referring to the performance indexes as long-term capabilities confuses the issue and misleads everyone. They are descriptive statistics that summarize the past. They do not estimate any fixed quantity unless the process is being operated predictably. And they definitely do not describe the indescribable “long-term capability of an unpredictable process.”

## Comments

## Reasons why you WOULDN'T use Capability for Quarterly reports

Once a process is stable, capability index numbers won't change significantly unless one of two things happens:

1. There is a shift in the process (signalled by the control chart), or

2. someone decides to change the specification limits.

Having said that, the number might change a little...maybe Cpk is originally reported as 1.31, based on 100 observations in a control chart indicating a state of statistical control. Maybe someone takes a snapshot three months later, and Cpk is 1.30. Have things gotten worse? No...

What management needs to know is whether there has been a shift in the process that has adversely affected production. That signal will come from the control chart first, and should be acted upon as soon as the assignable cause signal happens.

Maybe you had chartered a project to improve the performance of the system; had found some dominant but hidden causes and addressed them, or re-engineered some portion of the process. You would expect, then, to see a significant shift in the process, moving the mean closer to the nominal value and reducing variation. That shift in capability might be reported at a quarterly meeting, but with the process stable again after the changes, the Cpk isn't going to change significantly quarter-to-quarter. A management group that wants to look at capability numbers quarterly simply does not understand what those numbers represent. I'd suggest a thorough reading of Don's article for everyone involved...

## Quarterly Capability tracking

Linda - I understand why you want to do this. Management tends to think in quarterly increments as that is the frequency of financial updates, taxes, etc. Management has a financial obligation to understand process capability and to take appropriate actions to improve processes. Unfortunately, the methods (e.g. rote use of capability indices) by which we try to meet this need are often counterproductive. This occurs because we are a 'Cliff's notes', just-give-the-bottom-line type of culture. We want straightforward yes or no answers. We need the answer to the question: are things getting better, worse or staying the same? Additionally, management really does need to focus on the actions (or non-action) that are required given the answer to the question regarding capability. It is our job to provide this focus using the appropriate methods.

So the first message is that capability indexes simply have too many issues to use them in a productive manner. Quarterly comparisons of summary data are also inappropriate (Rip articulated this fairly well). In my experience, the best way is to use a control chart for defect rate data and a *multi-vari* (individual values plotted against the specification limits) for continuous data. The subgrouping (X-axis time frame) should be appropriate for the data. I always have the control chart for the continuous data multi-vari to ensure that the assessment of stability is correct, but the spec limits are what gives us the 'capability' assessment. (Why do I advocate the multi-vari? It is data rich - without data summaries - and is intuitive for the audience to interpret.)

Often when I am reporting on multiple metrics I will use the *small multiple* approach (many thumbnails of the charts with a table-like assessment of the stability and capability). The processes that I want to focus management on are highlighted and their graphs are presented in large format for discussion. (I keep the management team focused on the items they need to take action on or that have just demonstrated a much improved state...) For large metric sets I have started using *spark lines* and simple color coding to communicate a large amount of information in a concise format that is not misleading at all.

I hope this gives you some useful things to think about.

\*Italics: a simple internet search should turn up a plethora of useful links for information on the italicized methods...

## Trending Capabilities

For a quarterly review I would like to see what the capability of a process is and compare its capability to its original validated capability and also see if there is any change in the capability quarter to quarter. This is for a high level management overview process. How would you design this metric so that it is meaningful for finding trends?

## Tracking Cpk on a XmR Chart

Well of course it will change each month even if the underlying process is stable (in-control). You can track them (Cpk values) on an XmR chart, and if the values remain within the process limits AND the underlying process is in control, nothing has changed. I wrote an article for the STANDARD newsletter (ASQ-MQD) way back in 2000 (Reporting Capability Numbers) that describes this method. I can send you a copy if you like.

Rich DeRoeck

## tracking capability indices on a XMR chart

I think the question isn't how to do it, but rather what value do you get out of doing it?

Of course you can track quarterly index numbers on an XmR chart, and it will provide a decent indication of the stability of the index. HOWEVER, why would you?

First, quarterly subgroups provide only 4 data points per year, so to have 'good' control limits one would need several years of data.

Secondly, what - really - is the value add? You already have control charts for the process with the appropriate subgrouping, so calculating the capability index and then charting them on a control chart is a lot of extra work.

And quarterly values will tend to suppress signals and delay the recognition (proof without resorting to the natural control chart) that a change has occurred, for better or worse.

In the worst case, calculating the indexes themselves can be very misleading if the process is not naturally Normal. (Many processes are non-normal and non-homogeneous, yet capable and stable, but for these processes the calculations tend to substantially overestimate the standard deviation and make capable processes look incapable.)

"Just because we can, doesn't mean we should"

## Here's Why

I once had a customer who requested Cpk numbers with every shipment we sent them. Even though the process was relatively stable, the numbers would change (1.048, 1.039, 1.117, 1.250, etc.). In order to avoid his questions as to why the capability numbers were getting worse or better (interpreting noise as a signal), I placed them on an XmR chart and explained to him that these changes were meaningless. For me it was worth it. My experience is that most managers have NO understanding of variation. If the numbers get better they are happy, but if they get worse they want an answer as to why (not much fun).

To quote Wheeler:

"The purpose of placing capability indexes on an XmR chart is to not monitor the process but to establish the amount of uncertainty that is inherent in the capability indexes themselves. Knowing the extent of this uncertainty will help you to avoid becoming excited about meaningless changes in the value of the capability index. It might even help you to calm down your customer when he thinks your numbers have taken a turn for the worse"

It did for me!

Rich DeRoeck
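The idea in this comment can be sketched in Python using only the four Cpk values quoted above. The function name `xmr_limits` is hypothetical, and 2.66 is the usual scaling factor for the natural process limits of an X chart. Four points are far too few for dependable limits, so this shows only the mechanics.

```python
import statistics

def xmr_limits(values):
    """Natural process limits for individual values:
    mean plus or minus 2.66 times the average moving range."""
    center = statistics.mean(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    avg_mr = statistics.mean(moving_ranges)
    return center - 2.66 * avg_mr, center, center + 2.66 * avg_mr

cpk_values = [1.048, 1.039, 1.117, 1.250]   # the four values quoted in the comment
lower, center, upper = xmr_limits(cpk_values)

# Any value outside the limits would be a signal worth investigating.
signals = [v for v in cpk_values if not lower <= v <= upper]
print(f"limits: {lower:.3f} to {upper:.3f}, center {center:.4f}, signals: {signals}")
```

For these four values the limits are roughly 0.92 to 1.31, so none of the shipment-to-shipment changes would count as a signal.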

## Tracking Cpk

Rich - I totally agree that if your customer is requesting monthly or quarterly Cpk numbers and you can't have a rational conversation with them to eliminate that requirement, then it is certainly in your best interest to track the indices on control charts to attempt to avoid further non-value-add churn about 'variation'.

Linda's question was for reporting to her internal management. As quality professionals, sometimes our obligation is to provide our management what they actually need, not what they ask for (a well-intended but misguided request). It is our obligation to educate on best practices, and the process capability indices are simply not best practice.

## Tracking Cpk on a XmR Chart

Rich,

Would appreciate a pdf copy of your article. Email is sahindle@hotmail.com

Thanks, Scott.

## Why?

Why would you do that? The practice of comparing this quarter to last quarter, and this quarter to the same quarter last year, is poor practice. The old saying still works: "Any fool can make a trend out of two points." Management by fool's trends is a harmful practice. I recommend you look up Don's article on "Avoiding Man-Made Chaos" to see a much better approach.

As long as the process stays in control, these indexes won't change significantly, unless some significant work is done to the system to change its performance. Looking at quarterly numbers, when you have performance this well-defined, is a waste of everyone's time.

## Like this article

1) Would it be possible to email this article separately so that I can digest it more fully? My email is tjohnson9916@charter.net.

2) What do you mean, "operate a process to its full potential?" This is a key term, but I do not leave the article feeling I understand it fully.

Regards,

Tom Johnson

## Full Potential

## Emailing an article