



© 2023 Quality Digest. Copyright on content held by Quality Digest or by individual authors. Contact Quality Digest for reprint information.
“Quality Digest” is a trademark owned by Quality Circle Institute, Inc.
Published: 05/01/2023
Ever since 1935 people have been trying to fine-tune Walter Shewhart’s simple but sophisticated process behavior chart. One of these embellishments is the use of two-sigma “warning” limits. This column will consider the theoretical and practical consequences of using two-sigma warning limits.
British statistician Egon Sharpe Pearson wanted to use warning limits set at plus or minus 1.96 sigma on either side of the central line. Others simply round the 1.96 off to 2.00. Either way, these warning limits are analogous to the 95-percent confidence intervals encountered in introductory courses in statistics. However, the use of such warning limits fails to consider the difference between the objective of a confidence interval and the purpose of using a process behavior chart.
A confidence interval is a one-time analysis that seeks to describe the properties of a specific lot or batch. It’s all about how many or how much is present. A confidence interval describes the uncertainty in an estimate of some static quantity.
A process behavior chart is a sequential analysis that characterizes a process as operating either predictably or unpredictably. Every time we plot a new point on a process behavior chart, we’re carrying out an act of analysis. Here, the outcome isn’t a static quantity but a continuing interaction with the process. As long as the process is being operated predictably, it will operate up to its full potential with no interventions needed. Whenever the process shows evidence of unpredictable operation, we can be confident that some assignable cause is affecting our process. To eliminate the excess costs due to the assignable cause, and to get the process back to operating at full potential, we’ll need to identify and control the assignable cause. Thus, the objective of a process behavior chart is to know when to take action on our process and when to refrain from taking action on it.
Decision theory shows that optimum decision rules for a one-time analysis technique will tend to have about a 5-percent risk of a false alarm. Thus, 95-percent confidence intervals are reasonable estimates for the question of how many or how much. However, with sequential techniques, optimum decision rules will require less than a 1-percent risk of a false alarm for each act of analysis. This is why a process behavior chart uses three-sigma limits rather than two-sigma limits.
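To make the per-point risks concrete, here is a minimal sketch that computes the chance of a single point falling outside symmetric limits when nothing has changed. It assumes independent, normally distributed values and known limits; the function names are hypothetical and used only for illustration.

```python
import math

def normal_cdf(z: float) -> float:
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def per_point_false_alarm(limit_in_sigma: float) -> float:
    """Chance that one in-control point falls outside +/- limit_in_sigma.

    Assumes a normal distribution; real data need not follow it.
    """
    return 2.0 * (1.0 - normal_cdf(limit_in_sigma))

for width in (1.96, 2.0, 3.0):
    print(f"{width:.2f}-sigma limits: {per_point_false_alarm(width):.4f}")
```

Under these assumptions the 1.96-sigma and two-sigma limits carry a per-point risk of roughly 5 percent, while three-sigma limits carry a risk well under 1 percent.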
Shewhart’s original three-sigma limits have been thoroughly proven in years of use for all kinds of applications and in all kinds of industries. Most of the time we don’t need to increase the sensitivity. On those rare occasions when increased sensitivity is desired, two-sigma warning limits aren’t the right way to proceed.
Points outside the limits call for action. However, we only want to take action when it’s economical to do so. So while we really want to know about the larger process changes, we don’t need to know about every little process hiccup. And history has shown that points outside the three-sigma limits tend to be economically interesting.
Theory tells us that the a posteriori probability of a point outside three-sigma limits actually representing a real process change is approximately 90 percent. However, when you have a point between a two-sigma warning limit and a three-sigma limit, the a posteriori probability that it represents a change in your process is only about 60 percent. So when you use two-sigma warning limits you should expect about 40 percent of your “signals” to be false alarms. (That’s only slightly better than tossing a coin.)
Since 1956 the recognized and accepted way to increase the sensitivity of a process behavior chart has been to use the Western Electric run-tests in addition to the primary detection rule of a point falling outside the three-sigma limits. For clarity’s sake, these detection rules are:
Detection Rule 1: A point outside the three-sigma limits is likely to signal a large process change.
Detection Rule 2: Two out of three successive values that are both on the same side of the average and are beyond one of the two-sigma lines are likely to signal a moderate process change.
Detection Rule 3: Four out of five successive values that are all on the same side of the average and are beyond one of the one-sigma lines are likely to signal a moderate, sustained shift in the process.
Detection Rule 4: Eight successive values on the same side of the average are likely to signal a small, sustained shift in the process.
Detection rules 2, 3, and 4 are the Western Electric run-tests. Collectively, all four rules are often referred to as the Western Electric zone tests. Because these run-tests look for smaller signals, they increase the sensitivity of a process behavior chart when they are used with Rule 1.
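The four detection rules above can be sketched in code as follows. This is an illustrative implementation, not a reference one: the function name `detect_signals` is hypothetical, the central line and sigma are assumed to be already computed, and a run is reported at the point that completes the pattern (a design choice, not something the rules themselves dictate).

```python
def detect_signals(values, center, sigma):
    """Return (index, rule) pairs marking where each detection rule fires.

    values : plotted values (individual values or subgroup averages)
    center : central line of the chart
    sigma  : one-sigma distance for the plotted values (assumed known)
    """
    # Express each point as a signed distance from the central line in sigma units.
    z = [(v - center) / sigma for v in values]
    signals = []
    for i, current in enumerate(z):
        # Rule 1: a single point outside the three-sigma limits.
        if abs(current) > 3:
            signals.append((i, 1))
        for side in (1, -1):  # check above the average, then below it
            last3 = z[max(0, i - 2): i + 1]
            last5 = z[max(0, i - 4): i + 1]
            last8 = z[max(0, i - 7): i + 1]
            # Rule 2: two of three successive values beyond the same two-sigma line.
            if side * current > 2 and sum(side * x > 2 for x in last3) >= 2:
                signals.append((i, 2))
            # Rule 3: four of five successive values beyond the same one-sigma line.
            if side * current > 1 and sum(side * x > 1 for x in last5) >= 4:
                signals.append((i, 3))
            # Rule 4: eight successive values on the same side of the average.
            if len(last8) == 8 and all(side * x > 0 for x in last8):
                signals.append((i, 4))
    return signals

print(detect_signals([0.2, -0.4, 3.5], center=0.0, sigma=1.0))  # [(2, 1)]
print(detect_signals([2.5, 0.1, 2.2], center=0.0, sigma=1.0))   # [(2, 2)]
```

Each signal is tagged with the rule that produced it, so a user can run Rule 1 alone or add the run-tests only when extra sensitivity is wanted.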
In the sections that follow, I’ll compare the use of these detection rules with the use of two-sigma limits as a means of increasing the sensitivity of a process behavior chart. This will be done in three ways. First, we’ll look at the power functions. Next, we’ll look at the average run length curves. Finally, we’ll look at the probabilities of a false alarm.
The power function for a statistical technique describes the probability of detecting a signal. Of course this probability will depend upon the size of the signal, the number of data available, and the technique itself. When the signal is large, useful techniques will have a 100-percent probability of detecting that signal. However, as the size of the signal gets smaller, the probability of detection will generally drop. Finally, in the limiting case where there’s no signal present, desirable techniques will have a small probability of a false alarm. If we plot the probability of detecting a signal on the vertical axis, and plot the size of the signal on the horizontal axis, then we would like to see a curve that starts near zero on the left and climbs rapidly up to 1.00 on the right.
(I first published the formulas for the power functions for a process behavior chart 40 years ago. They may be found in my text Advanced Topics in Statistical Process Control [SPC Press, 2004] or downloaded in manuscript 321 on my website.) These formulas are for detecting a shift in location using either an X-chart or an average chart. To remove the effects of subgroup size, the shifts are expressed in standard error units. The curves shown here are the power functions for exactly k = 10 subgroups. Ten subgroups were used because Rule 4 can’t be used with fewer than eight subgroups.
Figure 1: Power function for detection Rule 1 alone
Figure 1 shows the power function for using Detection Rule 1 alone. The left end-point of the power function defines the risk of a false alarm. Here we find that there is a 2.7-percent chance of a false alarm for an average chart using 10 subgroups. When a shift occurs, the probability of detecting that shift within 10 subgroups of when it actually occurs climbs as the size of the shift increases. An X-chart or average chart using Detection Rule 1 alone will have a 100-percent chance of detecting a 3.0 standard error shift in location within 10 subgroups of when that shift occurs. Since the objective is to find those shifts that are large enough to justify the expense of fixing the problem, this curve shows why Rule 1 is usually sufficient.
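The end-points of the Rule 1 curve can be approximated with a short calculation. The sketch below is not the published power function formulas; it is a simplified version that assumes independent, normally distributed values, known limits, and a step shift expressed in standard error units. The function name `power_rule1` and its parameters are hypothetical.

```python
import math

def normal_cdf(z: float) -> float:
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def power_rule1(shift: float, k: int = 10, limit: float = 3.0) -> float:
    """Chance that at least one of k points falls outside +/- limit
    after a step shift of `shift` standard errors (simplified model)."""
    # Probability that a single shifted point still lands inside the limits.
    inside = normal_cdf(limit - shift) - normal_cdf(-limit - shift)
    # Detection means at least one of the k independent points is outside.
    return 1.0 - inside ** k

for shift in (0.0, 1.0, 2.0, 3.0):
    print(f"shift = {shift:.1f} SE: power = {power_rule1(shift):.3f}")
```

With no shift, this simplified model gives a false alarm risk of about 2.7 percent in 10 subgroups, and for a 3.0 standard error shift the detection probability is above 99 percent, in line with the curve described above.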
Figure 2: Power function for Detection Rules 1 and 2
Figure 2 shows the power function for using Detection Rules 1 and 2. The increased sensitivity can be seen in the steeper power function curve. With Rules 1 and 2, you have a 100-percent chance of detecting a 2.5 standard error shift within 10 subgroups of when that shift occurred. However, as is always the case, using additional detection rules results in an increased risk of a false alarm. Here it’s 4.3 percent for an average chart with 10 subgroups.
Figure 3: Power functions for Western Electric zone tests
Figure 3 shows the power function for using Rules 1, 2, and 3, and the power function for using all of the Western Electric zone tests. These curves are slightly steeper than the curve for Rules 1 and 2 combined. They both show a 100-percent probability of detecting a 2.0 standard error shift in location within 10 subgroups of when it occurred. The false alarm risks for these two curves for an average chart with 10 subgroups are approximately 6 percent and 8 percent. In fact, these last two power function curves show probabilities that differ by less than 0.10 for shifts smaller than 2.0 standard errors. Such small differences in power are hard to detect in practice. Using Rules 1, 2, and 3 will work about as well as using all of the detection rules. Rule 4 will only add some sensitivity to small and sustained shifts.
The curves in figure 3 are getting squeezed together because there’s a limit to the steepness of the power function, and these curves are approaching that limit. Once you hit this limit, the only way to raise the power function curve is by raising the left-hand endpoint of the curve, which is the risk of a false alarm. We essentially see this beginning to happen in the last two curves of figure 3.
Figure 4: Power function for two-sigma limits
Figure 4 shows the power function curve for using two-sigma warning limits. Here we see that with warning limits we’re 4 percent more likely to detect a 2.0 standard error shift than we would be using Detection Rules 1 and 2. For this very slight increase in sensitivity to changes of economic importance, these warning limits increase our risk of a false alarm nearly tenfold, from 4 percent to 38 percent!
A different perspective is provided by the average run length (ARL) curves. The average run length is the average number of subgroups between the occurrence of a signal and the detection of that signal. When these ARL values are plotted against the size of the signal, we end up with the curves in figure 5.
Figure 5: ARL curves for two-sigma limits and Western Electric zone tests
For shifts in excess of 2.0 standard errors, the use of two-sigma warning limits will shorten the average run length by less than one subgroup compared with using all four detection rules. However, when there are no signals, the use of two-sigma warning limits will result in one false alarm every 22 subgroups on average. In contrast, using all four detection rules will result in one false alarm every 91 subgroups on average. So, by using two-sigma limits, you’re increasing your false alarm rate fourfold in return for a very slight advantage in detecting a signal that is large enough to be of any practical consequence.
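For a rule that looks at one point at a time, the average run length is simply the reciprocal of the per-point probability of a point falling outside the limits. The sketch below uses that relationship under the usual assumptions (independent, normally distributed values and a step shift). It does not cover the run-tests, whose run lengths depend on patterns across several points and need a more involved calculation; the function name `arl_single_point` is hypothetical.

```python
import math

def normal_cdf(z: float) -> float:
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def arl_single_point(limit: float, shift: float = 0.0) -> float:
    """Average number of points until one falls outside +/- limit,
    given a step shift of `shift` standard errors (simplified model)."""
    # Probability that a single point lands outside the limits.
    outside = 1.0 - (normal_cdf(limit - shift) - normal_cdf(-limit - shift))
    # The waiting time for the first such point is geometric, with mean 1/p.
    return 1.0 / outside

print(f"two-sigma limits, no shift:   {arl_single_point(2.0):.0f} subgroups")
print(f"three-sigma limits, no shift: {arl_single_point(3.0):.0f} subgroups")
```

With no shift, this gives about 22 subgroups between false alarms for two-sigma limits, matching the figure quoted above, and about 370 subgroups for Rule 1 alone.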
Figure 4 showed a comparison while holding the number of subgroups constant. Figure 5 showed the average number of subgroups between the signal and the detection of that signal. As noted earlier, the risk of a false alarm on each step of a sequential procedure isn’t the same as the overall risk of a false alarm across several steps. Here we’ll look at how the false alarm probability increases as the number of subgroups increases.
Figure 6 shows the probability of a false alarm on the vertical axis and the number of subgroups considered on the horizontal axis. Here we compare the probabilities of false alarms for the traditional process behavior charts and a chart using two-sigma limits.
Figure 6: False alarms for two-sigma limits and Western Electric zone tests
The use of two-sigma warning limits will result in a dramatic increase in the number of false alarms compared to the other charts. This dramatic increase begins immediately, and just gets bigger as more data are collected. Since, in order to use the charts effectively, you must investigate each and every out-of-limits point in search of an assignable cause, this excessive number of false alarms will inevitably undermine the credibility of both the chart and the person using it. In short, an excessive number of false alarms will kill the use of the charts.
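The way the false alarm probability grows with the number of subgroups can be sketched for the two single-point rules. This is a simplified calculation that assumes independent, normally distributed values and no process change; it covers only three-sigma limits (Rule 1 alone) and two-sigma limits, not the run-tests. The function name `cumulative_false_alarm` is hypothetical.

```python
import math

def normal_cdf(z: float) -> float:
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def cumulative_false_alarm(limit: float, n: int) -> float:
    """Chance of at least one point outside +/- limit among n
    in-control points (simplified model)."""
    per_point = 2.0 * (1.0 - normal_cdf(limit))
    return 1.0 - (1.0 - per_point) ** n

print(" n   three-sigma   two-sigma")
for n in (1, 5, 10, 25, 50):
    three = cumulative_false_alarm(3.0, n)
    two = cumulative_false_alarm(2.0, n)
    print(f"{n:2d}   {three:11.3f}   {two:9.3f}")
```

Under this model, 10 subgroups give roughly a 3-percent chance of a false alarm with three-sigma limits and roughly a 37-percent chance with two-sigma limits, close to the 38 percent read from the power function, and the gap keeps widening as more subgroups are added.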
In practice, most people who use process behavior charts effectively find that they have plenty of signals using Detection Rule 1. In fact, the problem is usually one of needing a procedure that is less sensitive, rather than more sensitive. However, for those situations where an increased sensitivity is desired, the addition of Detection Rules 2, 3, or 4 will suffice.
The only reason to collect data is to use them to take action. To use data to take action, you must have a properly balanced decision rule for interpreting those data. Otherwise, you’ll either err in the direction of missing signals or else you’ll err in the direction of taking action based on noise.
The purpose of a process behavior chart is to tell you when to take action and when to refrain from taking action; when to look for an assignable cause of exceptional variation and when not to do so. The idea behind the chart is as old as Aristotle, who taught us that the time to identify a cause is at that point where a change occurs. This is why we’re only concerned with changes that are large enough to be of economic consequence. And Shewhart’s three-sigma limits are sufficient to detect these changes virtually every time.
Tightening up limits on a process behavior chart will not improve things. You can’t squeeze the voice of the process. Tightening up the limits on a process behavior chart will only increase the false alarm rate. When you look for nonexistent assignable causes, you’ll be wasting time and effort while undermining the usefulness of the process behavior chart. Three-sigma limits strike a balance between the economic consequences of the dual mistakes of missing signals and getting false alarms.
Finally, in this column I’ve used the power functions, ARL curves, and false alarm probabilities computed in the usual way simply because that’s the only way to obtain valid comparisons between different techniques. These usual assumptions are that the shift in location can be modeled by a step function, that the measurements are continuous, that the measurements are independent of each other, and that the measurements are normally distributed. These assumptions are necessary to carry out the mathematics. In practice none of these assumptions are realistic. This is why theory only provides a starting place for practice. So, while theory suggests that three-sigma limits should work, almost 100 years of practice has proven beyond any doubt that they do work as expected. Make no changes. Accept no substitutes.
Links:
[1] https://www.spcpress.com/book_advanced_topics_in_spc.php