## Process Capability: What It Is and How to Ensure It Helps, Part 5

### Why is it important to keep the process stable?

Published: Wednesday, February 13, 2019 - 13:03

“Process Capability: What It Is and How It Helps,” parts one, two, three, and four, discussed Alan’s development in the field of process capability.^{1} He’d learned about the mistakes that can be made and how to avoid them in practice, which made him better at his job. Alan had since passed on his learning to colleagues, one of whom, Owen, had led some successful assessments of process capability.

### Sometimes process capability doesn’t help…

Owen, a member of the plant’s technical team, felt somewhat alone and frustrated as he listened to line three’s production manager, Fiona, condemning process capability and (he felt) him, too. “How has this helped us?” she asked, looking menacingly in his direction. “You stood here four months ago and said we’d produce in-spec, but recently I’ve been hammered every second or third week because the reworked batches meant we couldn’t meet the production schedule.”

How did it get to this? Owen asked himself. To understand why, we have to go back four months.

### Four months earlier

Owen was confident as he embarked on the process capability assessment for line three. Key for him was the paragraph from part 3, which came from Lloyd Provost’s book, *The Health Care Data Guide: Learning from Data for Improvement* (Jossey-Bass, 2011):

“Process capability is a prediction of future performance of the process. It can be stated as the range of outcomes expected for any measure of interest. Before considering capability, the process for the measure of interest must be stable (no special causes). This gives us a rational basis for the prediction. Thus, developing an appropriate Shewhart chart for the measure of interest is a prelude to a capability analysis. The process capability can be compared to the requirements or specifications for the measure. This is best done graphically.”

Owen, most familiar with production lines one and two, moved into newer territory with the job on line three. Working on product 13 D, he agreed with Fiona, newly appointed as production manager, to study product characteristic 12 for process capability. For this characteristic, the voice of the customer (i.e., the specifications) was:

• LSL: 63 (lower specification limit)

• USL: 79 (upper specification limit)

Owen learned that, in production, product is routinely sampled every second batch for measurement of characteristic 12, which is roughly once per hour. Because these data consist of one value per time period, Owen judged the appropriate Shewhart chart to be a process behavior chart, or control chart, for individual values.

### Baseline data and the voice of the process

Owen pondered the set of baseline data he needed to assess the stability of the process. With production run lengths on the order of 15 to 20 hours, Owen could count on 15 to 20 measurements per run. Because he wanted two or more production runs to ensure the inclusion of *between*-production variation, Owen wasn’t expecting to be short on data.

After some discussion with colleagues, the decision was taken to use data from the last five productions, totaling 85 measurements. Owen plotted the data for characteristic 12 in time order of production on a process behavior chart, as shown in figure 1. (The baseline data and computations for the process behavior chart are found in Table 1—see Note 2.)

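Figure 1: Process behavior chart for individual values of characteristic 12, baseline data from the last five productions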

Owen, like Alan in part 3, routinely applied two detection rules to operationally define signals of instability in process data (meaning unstable, or unpredictable, process behavior):

Signal 1: A point that falls beyond a natural process limit (or control limit)

Signal 2: Nine or more consecutive points on the same side of the central line
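The two detection rules are simple enough to express as code. Here is a minimal Python sketch that assumes the central line and natural process limits are already known. The function name `detect_signals` is hypothetical. So is the convention that a point lying exactly on the central line breaks a run; the article does not specify this.

```python
def detect_signals(values, lcl, ucl, center, run_length=9):
    """Apply the two detection rules to a series of individual values.

    Rule 1: a point beyond a natural process limit.
    Rule 2: run_length (default 9) or more consecutive points on the
            same side of the central line.
    Assumption: a point exactly on the central line ends any run.
    Returns (indices of rule 1 points, list of (start, end) index pairs for rule 2 runs).
    """
    rule1 = [i for i, x in enumerate(values) if x < lcl or x > ucl]

    rule2 = []
    start, side = 0, 0  # side: +1 above, -1 below, 0 on the central line
    for i, x in enumerate(values):
        s = (x > center) - (x < center)
        if s != side or s == 0:
            # The current run has ended; keep it if it was long enough.
            if side != 0 and i - start >= run_length:
                rule2.append((start, i - 1))
            start, side = i, s
    # A qualifying run may continue to the end of the series.
    if side != 0 and len(values) - start >= run_length:
        rule2.append((start, len(values) - 1))
    return rule1, rule2


# Hypothetical series: nine points above the central line (rule 2),
# then one point above the UCL and one below the LCL (rule 1).
series = [72, 73, 72.5, 72, 71.5, 72.2, 73, 72.8, 71.9, 70, 77, 65]
print(detect_signals(series, lcl=66.1, ucl=76.5, center=71.3))
```

Both rules look only at the process data and its own limits. The specification limits play no part in detecting instability.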

With neither detection rule finding a signal of instability in the data, and with no unnatural pattern evident in the data,^{3} Owen characterized the process as stable over time (or predictable, or in control). As per Provost, the stable process (no special causes) gives the rational basis for prediction.

The prediction is a range of values, not an exact value, and comes from the natural process limits of 66.1 to 76.5 shown in figure 1 (LCL and UCL, respectively^{4}). These limits define the “voice of the process” and are used to predict the output from future productions under the condition that the process continues to display stable behavior.
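For a chart of individual values, the natural process limits are conventionally the average of the values plus or minus 2.66 times the average moving range (2.66 ≈ 3/1.128). The following minimal Python sketch shows that calculation. The function name `natural_process_limits` is hypothetical. Table 1 is not reproduced here, so the example runs on a short hypothetical series rather than Owen's data.

```python
def natural_process_limits(values):
    """Natural process limits for an individuals (XmR) chart.

    Central line = average of the individual values.
    Limits = average +/- 2.66 * average moving range (2.66 = 3 / 1.128).
    Returns (lower limit, central line, upper limit).
    """
    if len(values) < 2:
        raise ValueError("need at least two values to form a moving range")
    mean = sum(values) / len(values)
    # Moving range: absolute difference between successive values.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    return mean - 2.66 * mr_bar, mean, mean + 2.66 * mr_bar


# Hypothetical data, not Owen's baseline values.
lcl, cl, ucl = natural_process_limits([70, 72, 71, 73, 69])
print(f"LCL = {lcl:.3f}, central line = {cl:.3f}, UCL = {ucl:.3f}")
```

Suppose figure 1 uses these conventional limits. Working backward from 66.1 and 76.5 then implies a central line of about 71.3 and an average moving range of about (76.5 − 66.1)/(2 × 2.66) ≈ 1.95.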

### Process capability of the stable process

Process capability is based on a comparison of the voice of the process with the voice of the customer. For product characteristic 12:

• Voice of the process: 66.1 to 76.5 = Predicted range of values of process output

• Voice of the customer: 63.0 to 79.0 = Range of acceptable process output

Does the voice of the process fit inside the voice of the customer? Yes, and comfortably so, as shown in figure 2’s histogram.

Owen proceeded to compute the capability indexes Cp and Cpk. (See part 1 for a first discussion of these indexes and part 3 for a discussion of which standard deviation is used and why.)
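Here is a hedged sketch of that computation, using the textbook definitions. It assumes that σ̂ is the within-process estimate behind the natural process limits, so that 6σ̂ = UCL − LCL = 10.4 and the central line is their midpoint, 71.3:

$$
C_p = \frac{\text{USL} - \text{LSL}}{6\hat{\sigma}} = \frac{79 - 63}{76.5 - 66.1} = \frac{16}{10.4} \approx 1.54
$$

$$
C_{pk} = \frac{\min(\text{USL} - \bar{X},\; \bar{X} - \text{LSL})}{3\hat{\sigma}} \approx \frac{\min(79 - 71.3,\; 71.3 - 63)}{5.2} = \frac{7.7}{5.2} \approx 1.48
$$

Both values round to 1.5. The exact figures depend on the unrounded baseline statistics in Table 1.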

With Cp and Cpk both rounding to 1.5, Owen was happy with the outcome: He judged the process to be capable.

### Out-of-specification product

The observed, or actual, out-of-specification fraction for the baseline data is: *Out of spec* = 0/85 = 0.00

Two digits are used after the decimal because Owen had 85 values. He could therefore legitimately compute the fraction to parts per hundred, and certainly no further than parts per thousand (three digits after the decimal).

Thinking about the process’s future performance, Owen looked again at figure 2’s histogram and asked himself, “How likely is it that the *same stable process* gives a measurement that falls out of specification?”

Owen summarized the capability assessment simply, as follows: Stable and capable process for characteristic 12, with a best estimate of zero produced units out of specification.

(For further discussion on out-of-specification product, see note 5 at the end of this article.)

### Communication

Owen was given a few minutes during the weekly production meeting to communicate the outcome of the capability assessment. He passed around copies of figure 2. The take-home message was that this process would cause no trouble and could be expected to make fully conforming product. Fiona gave Owen the thumbs-up and thanked him for a job well done. Owen had added another successful assessment of process capability to his credit.

### Fast forward four months

Owen had gone back to working on production lines one and two after the capability assessment. He then got wind of some quality problems in line three’s product 13 D. In the roughly 30 production runs during the last four months, five batches had been out of specification for characteristic 12.

A total of 507 measurements had been made since the baseline capability assessment, so the observed out-of-specification occurrence was 5/507 × 100 = 1.0 percent.^{6} Owen wondered what had gone wrong. A 1-percent rate of out-of-specification measurements was not in line with his predicted “best estimate” of zero.

One percent out of specification spelled trouble. Product 13 D was in high demand, and keeping up with demand was hard enough when everything went well, never mind when nonconforming batches had to be reworked. An improvement project was already underway because no solution to the problem had been found.

Owen plotted characteristic 12’s measurement data collected since the baseline period on a time series graph, as shown in figure 3. The out-of-specification measurements occurred in production runs 12, 18, 23, 27, and 32 (the points above the upper specification limit).

Owen thought back to his communication, or “sales pitch,” of the capability assessment and realized he hadn’t mentioned anything related to *sustaining process capability*, nor had he shown the process behavior chart (figure 1). Yet, he’d learned from Alan that the only way to sustain process capability over time is by 1) using a process behavior chart to monitor the process’s ongoing stability; and 2) taking action on the causes of detected instability to remove their effect. As stated by Provost, “...the process for the measure of interest must be stable....”

Did characteristic 12’s data display continued stable behavior over time? To find out, Owen plotted the measurement data on the same process behavior chart he had created with the baseline data, as shown in figure 4. (The natural process limits are therefore the same as those in figure 1.)

With 12 measurements above the upper natural process limit, this process did not display continued stability over time. The extent to which several of the points fall above the upper limit leaves this interpretation beyond doubt.

What does this mean for the baseline process capability demonstrated four months earlier? Quite simply, it no longer exists. Consequently, Owen’s rationale for the “best estimate” of zero out of specification had ceased to exist, and the process had been operating under false expectations.

As seen in figure 4, detection rule 2 signaled a downward shift in production 6, immediately after the baseline period. Then, in production 7, a point fell above the upper natural process limit. As Owen realized, these signals were warnings that this process was not performing in line with its capability. But they were neither detected nor heeded. Such “warnings” are often ignored,^{7} even though they are opportunities to drive learning and improvement, thereby preventing trouble and the need to “firefight.”

Following Fiona’s apparent condemnation of process capability (see the start of this article), Owen arranged a meeting with her. In it, she confirmed that product characteristic 12 had not been monitored on a process behavior chart. In routine production, measurement data were used essentially only for comparison with the specification limits, so only out-of-specification points that led to rework were deemed trouble. Moreover, the workforce had lost some faith in process capability because the expected trouble-free, fully conforming product just hadn’t materialized. Fiona also explained that the improvement project was necessary because no root cause had yet been found for the five out-of-specification batches.^{8}

Owen discussed figure 4’s process behavior chart with Fiona, showing that there were many more than five signals—the five out-of-specification measurements^{9}—in the data. He explained the signals 1 and 2 in figure 4:

Signal 1—Points outside the process limits: These are the sudden, big changes in the process that stand out like a sore thumb.

Signal 2—A run on one side of the central line: These are smaller changes in the process that indicate a sustained shift up or down.

Owen stated that only by detecting and then responding to signals of instability on a process behavior chart can process capability be sustained. Fiona asked Owen why he hadn’t stressed the importance of process stability for process capability to stay relevant after the baseline assessment. The best Owen could do was to look somewhat apologetic. Fiona encouraged Owen to look forward, not backward, and asked if he’d help the team working on the improvement project. She also said that she wanted to know more about the use of process behavior charts in routine production. Owen said a good start would be to discuss the paper “Sustaining Predictable and Economic Operation: What Does It Take?” (*Quality Digest*, Nov. 6, 2017). Putting the content of this paper into practice, he said, would ensure that process capability helps more next time.

### Insights from this story

1. Owen, responsible for the baseline process capability assessment for product characteristic 12, mistakenly overlooked the importance of maintained process stability beyond the baseline period in his communication to the production team. Unfortunately, without a sustained stable process, there was nothing to sustain Owen’s prediction of zero out-of-specification product.

2. The consequence of overlooking process stability is that the demonstrated process capability was lost almost immediately. The process never recovered to perform up to its capability during the four months after the baseline assessment. Fiona and her production team therefore had unjustified, overly optimistic expectations of what the process would produce. (See discussion of figure 4.)

3. To justify a high degree of belief in predictions of future process performance from process capability:

• A suitable baseline period is chosen over which stable process behavior is displayed (see figure 1 and discussion thereof).

• After the baseline period, stable process behavior is expected to continue; knowledge of the process is paramount here.

• Process limits—obtained from the baseline data—are extended into the future to monitor process behavior as new data are obtained. The process behavior chart monitors the process’s ongoing stability and, at the same time, tells you if process capability is sustained or not.

• When signals of unstable behavior are detected by the process behavior chart—and, inevitably, this *will* happen—a plan is in place to use the chart to drive investigations and actions, with the aim of restoring the stable process. This means the process behavior chart also drives decision-making in routine production.

4. Although specification limits are a must to calculate the process capability indexes Cp and Cpk, specification limits are not required to monitor the behavior of the process, or to tell you if an achieved level of capability is sustained or not.

### Your opinion…

When introduced to the capability study on line three, Owen learned that product is routinely sampled every second batch for measurement of characteristic 12:

• Question 1: Based on the learnings from the baseline period, which rationale can be used to justify sampling from every second batch produced, and not every batch?

• Question 2: What happens to this rationale after the baseline period?

See note 10 below for a plausible explanation.

### Notes

1. See also the discussion during the Sept. 2, 2016, episode of *Quality Digest Live*.

2. Table 1 shows the baseline data and the computations for figure 1’s process behavior chart.

3. Two examples of an unnatural, or nonrandom, pattern are:

• The first measurement value of each production run always being high in comparison with the other values

• A pattern that repeats itself eight times or more, such as low-high-high, low-high-high, low-high-high...

4. LCL and UCL stand for lower and upper control limits in line with the traditional name of “control chart.”

5. Out-of-specification product for the 85 baseline values:

Owen’s statistical software reported that the fraction of process output *expected* to be out of specification was 0.000005, which might be expressed as 0.0005 percent, or five parts per million.

Where does the value 0.000005 come from? To start, the statistical software assumes a normal distribution as the theoretical probability model for the underlying process. Using the Shapiro-Wilk lack-of-fit test, the *p*-value for the baseline data is 0.89, a value that is commonly, and also mistakenly, interpreted to mean that the data are *proven* to be normally distributed. (The lack-of-fit test can only detect departures from an assumed probability model. Hence, the *p*-value of 0.89 finds no evidence to reject the initial assumption of normality.)

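Continuing with the assumption of normality, z-scores can be computed vs. each specification:

$$
z_{\text{LSL}} = \frac{\bar{X} - \text{LSL}}{\hat{\sigma}} = \frac{\bar{X} - 63}{\hat{\sigma}} = 4.758
$$

and

$$
z_{\text{USL}} = \frac{\text{USL} - \bar{X}}{\hat{\sigma}} = \frac{79 - \bar{X}}{\hat{\sigma}} = 4.459
$$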

Tables for the standard normal distribution, or the Excel function “NORMSDIST,” can then be used to determine the theoretical proportion of process output that would fall on the wrong side of the two specification limits. For the z-scores of 4.758 and 4.459, the theoretical fraction below the lower and above the upper specification limits is 0.000001 and 0.000004, respectively, which gives the value of 0.000005.
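The tail areas quoted above can be checked without tables or Excel. Below is a minimal Python sketch that assumes a normal model, as the article does, and takes the z-scores 4.758 and 4.459 as given. The helper name `normal_upper_tail` is hypothetical, and the standard library's `math.erfc` stands in for NORMSDIST.

```python
import math


def normal_upper_tail(z):
    """P(Z > z) for a standard normal Z, via the complementary error function.

    Equivalent to Excel's 1 - NORMSDIST(z), but it avoids the loss of
    precision that comes from subtracting a number very close to 1 from 1.
    """
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# z-scores from this note (normality assumed)
z_lower, z_upper = 4.758, 4.459

below_lsl = normal_upper_tail(z_lower)  # by symmetry, equals P(Z < -4.758)
above_usl = normal_upper_tail(z_upper)
total = below_lsl + above_usl

print(f"below LSL: {below_lsl:.6f}")
print(f"above USL: {above_usl:.6f}")
print(f"total:     {total:.6f}")
```

Because the normal distribution is symmetric, the area below −4.758 equals the area above +4.758, so one helper serves both specification limits.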

Is the value 0.000005 so precise as to warrant six digits after the decimal? Because the baseline data set consists of 85 values, the value 0.000005 is appropriately rounded to 0.00, which is in line with Owen’s best estimate of zero for the predicted out-of-specification rate (see discussion of figure 2).

How uncertain is Owen’s best estimate of zero? One option is to use a 95-percent confidence interval, or interval estimate, for Cpk, which is 1.19 to 1.78. Again assuming normality, the theoretical fraction out of specification at the lower confidence bound of 1.19 is 0.000178491 [in Excel, “=1-NORMSDIST(3*1.19)”]. Rounding 0.000178491 to reflect the 85 baseline values gives, once again, 0.00, or at most 0.000, but nonetheless zero. Hence, there is no computable risk of out-of-specification product. Eighty-five values do not justify the apparent precision of 0.000178491, nor of its equivalents, 178 parts per million or 0.0178 percent.
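The Excel formula above can be mirrored in a few lines of Python, which also makes the rounding argument concrete. This is a minimal sketch that assumes normality and the one-sided relationship fraction = 1 − Φ(3 × Cpk). The function name `fraction_beyond_nearest_spec` is hypothetical.

```python
import math


def fraction_beyond_nearest_spec(cpk):
    """Theoretical fraction beyond the nearer specification limit,
    assuming a normal distribution: 1 - Phi(3 * Cpk).
    Mirrors the Excel formula =1-NORMSDIST(3*Cpk)."""
    return 0.5 * math.erfc(3.0 * cpk / math.sqrt(2.0))


p = fraction_beyond_nearest_spec(1.19)  # lower 95% confidence bound for Cpk
print(f"{p:.9f}")            # full, but unjustified, precision
print(f"{p * 1e6:.0f} ppm")  # the same value in parts per million
print(f"{p:.2f}")            # rounded to what 85 values can support
```

The last line prints 0.00, which matches Owen's best estimate of zero.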

A second option could be based on a different probability model but, if so, which one and why? (Why, one could also challenge, is it appropriate to assume normality for this process?)

A third, and also simpler option because it bypasses the need to assume a probability model for the underlying process, is to use an interval estimate of a binomial proportion, such as the Agresti-Coull interval estimate. (For the baseline data, the binomial proportion, or point estimate, is 0.00 as shown above.) For further details see Agresti and Coull’s paper, “Approximate Is Better Than ‘Exact’ for Interval Estimation of Binomial Proportions” (*The American Statistician*, May 1998).
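Below is a minimal Python sketch of the Agresti-Coull interval for the baseline count of 0 out-of-specification values in 85. It assumes a 95-percent interval (z = 1.96), and the function name is hypothetical.

```python
import math


def agresti_coull_interval(successes, n, z=1.96):
    """Approximate Agresti-Coull interval for a binomial proportion
    (95 percent by default).

    Adds z^2/2 successes and z^2/2 failures to the counts, then applies
    the usual Wald formula to the adjusted counts. Bounds are clipped to [0, 1].
    """
    n_tilde = n + z * z
    p_tilde = (successes + z * z / 2.0) / n_tilde
    half_width = z * math.sqrt(p_tilde * (1.0 - p_tilde) / n_tilde)
    return max(0.0, p_tilde - half_width), min(1.0, p_tilde + half_width)


# Baseline: 0 of 85 measurements out of specification
low, high = agresti_coull_interval(0, 85)
print(f"{low:.3f} to {high:.3f}")
```

The point estimate stays at 0.00. However, the upper bound of roughly 0.05 shows how weakly 85 values constrain a small proportion. Like every calculation in this note, it presumes that the process stays stable.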

Note that meaningful predictions from these out-of-specification calculations are conditioned upon a stable process. Without a stable process, these computations can give values that are meaningless and, worse still, misleading. For a broader, more in-depth discussion of this subject see, for example, Donald J. Wheeler’s “The Parts Per Million Problem: When does a computation result in a number?” (*Quality Digest Daily*, May 11, 2015) and “Invisible Probability Models: The gap between computations and reality” (*Quality Digest*, June 4, 2018).

6. With 507 measurements, the observed out-of-specification fraction since the baseline period is: Out of spec = 5/507 = 0.009862. With 507 values, rounding to thousandths (three decimal places) is appropriate, which gives 0.010, or 1.0 percent. Expressing the fraction as 9,862 parts per million would overstate its precision, because there are only enough data to support rounding to thousandths.

7. See “SPC: Hunting the Big Picture and the Big Payoff,” by Douglas Fair (*Quality Magazine*, Nov. 5, 2018), which notes that “Data that are within specification limits, for example, get saved to the database and rarely (if ever) viewed again for improvement purposes.”

8. Examination of figure 3 suggests that a reduction in the average of about one to two units might help to reduce the risk of out-of-specification product.

9. These five points were signals because they fell above the upper natural process limit (see figure 4), and not because they were out-of-specification measurements.

10. Question 1: As per the baseline data, process behavior was stable (figure 1). With stability, the amount of variation from one time period to another is predictable, providing the rationale to predict the range of outcomes for characteristic 12, which is the range 66.1 to 76.5 (see figure 1’s process limits). Hence, the nonsampled batches from this period—i.e., every second batch—would be expected to have measurements in the range 66.1 to 76.5. Because this range comfortably fits inside the specifications (figure 2), a high degree of belief can be held that *all* product made in the baseline period was in specification for characteristic 12. With this degree of belief, why sample every batch?

Question 2: After the baseline period, process behavior was no longer stable (figure 4). Without stability, the amount of variation from one time period to another is unpredictable, and the process limits no longer serve to predict a range of expected process outcomes. With this, the rationale to sample only every second batch is weakened, perhaps lost altogether. If assurance is needed that only in-specification product is shipped, what else to do but sample every produced batch, thereby doubling the cost of inspection?