Donald J. Wheeler

Six Sigma

Properties of Probability Models, Part 1

What they forgot to tell you about Weibull distributions

Published: Monday, August 3, 2015 - 16:04

Some commonly held ideas about skewed probability models are incorrect. These incorrect ideas are one source of complexity and confusion regarding the analysis of data. By examining the basic properties of skewed distributions this article can help you to greater clarity of thought and may even simplify your next data analysis.

How would you characterize a skewed distribution? When asked this question most will answer, “A skewed distribution is one that has a heavy, elongated tail.” This idea is expressed by saying that a distribution becomes more heavy-tailed as its skewness and kurtosis increase. To examine these ideas we shall use a popular family of skewed distributions, the Weibulls.

The Weibull family of distributions

Weibull distributions are widely used in reliability theory and are generally found in most statistical software packages. This makes these distributions easy to use without having to work with complicated equations. The following equations are included here in the interest of clarity. The Weibull distributions depend upon two parameters: alpha, α, and beta, β. The cumulative distribution function for the Weibull family has the form:

F(x) = 1 − exp[−(x/β)^α],  for x ≥ 0

The Weibull alpha parameter determines the shape of the distribution while the beta parameter determines the scale. Since we will consider the Weibulls in standardized form, where the distribution is shifted to have a mean of zero and stretched or shrunk to have a standard deviation of 1.00, the value for the beta parameter will not matter. Changing the value of β will not affect any of the results noted here. Thus, the horizontal scale for the probability models shown in the figures that follow is in standard deviation units.

When the alpha parameter is 1.00 or less the Weibull model will be J-shaped. When the alpha parameter is between 1.00 and 3.60 the Weibull distributions will be mound-shaped with positive skewness. When the parameter α has the value 3.60 the Weibull model will have a skewness that is very near to zero. When the parameter α is greater than 3.60 the Weibull models will be mound-shaped and negatively skewed. Since the shape of these negatively skewed Weibull models essentially stops changing as the alpha parameter increases beyond 10.0, the negatively skewed Weibulls are of little practical interest. This freezing in shape can be seen by comparing the standardized distributions for α = 15 and α = 800 in figure 1.
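The way skewness changes sign near α = 3.60 can be checked from the standard moment formulas for a Weibull model, E[X^k] = Γ(1 + k/α) when β = 1. The following is a minimal sketch under that assumption; the function name and the list of alpha values are illustrative and are not taken from the figures.

```python
import math

def weibull_skewness(alpha):
    """Skewness of a Weibull model with shape alpha (the scale cancels out)."""
    g1 = math.gamma(1 + 1 / alpha)   # E[X]   with beta = 1
    g2 = math.gamma(1 + 2 / alpha)   # E[X^2] with beta = 1
    g3 = math.gamma(1 + 3 / alpha)   # E[X^3] with beta = 1
    variance = g2 - g1 ** 2
    third_central_moment = g3 - 3 * g1 * g2 + 2 * g1 ** 3
    return third_central_moment / variance ** 1.5

# Illustrative shapes: J-shaped, mound-shaped, near-symmetric, negatively skewed.
for alpha in (0.8, 1.0, 2.0, 3.6, 10.0, 15.0):
    print(f"alpha = {alpha:5.1f}   skewness = {weibull_skewness(alpha):+.3f}")
```

Because the skewness depends only on α, the same values result for any choice of β.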

Figure 1: Eight standardized Weibull distributions

So what is changing as you select different Weibull probability models? To answer this question figure 2 considers 23 different Weibull models. For each model we have the skewness and kurtosis, the areas within fixed-width central intervals (encompassing one, two, and three standard deviations on either side of the mean), and the z-score for the most extreme part per thousand of the model.

The z-scores in the last column of figure 2 would seem to validate the idea that increasing skewness corresponds to elongated tails. As the skewness gets larger in magnitude the z-score for the most extreme part per thousand also increases in magnitude. This may be seen in figure 3 which plots the skewness vs. the z-scores for the most extreme part per thousand. So skewness is directly related to elongation, as is commonly thought. But what about the weight of the tails?
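For the positively skewed models the long tail is the upper one, so the z-score for the most extreme part per thousand can be sketched by inverting the cumulative distribution function at 0.999 and standardizing. This is an illustration with β = 1 assumed and hypothetical function names; it does not handle the negatively skewed models, where the extreme part per thousand lies in the lower tail.

```python
import math

def weibull_mean_sd(alpha):
    """Mean and standard deviation of a Weibull model with beta = 1."""
    mean = math.gamma(1 + 1 / alpha)
    sd = math.sqrt(math.gamma(1 + 2 / alpha) - mean ** 2)
    return mean, sd

def upper_tail_z(alpha, tail_area=0.001):
    """Standardized value that cuts off tail_area in the upper tail."""
    mean, sd = weibull_mean_sd(alpha)
    # Solve 1 - exp(-x**alpha) = 1 - tail_area for x.
    x = (-math.log(tail_area)) ** (1 / alpha)
    return (x - mean) / sd

# Illustrative shapes, ordered from less skewed to more skewed.
for alpha in (3.0, 2.0, 1.5, 1.0, 0.8):
    print(f"alpha = {alpha:4.1f}   z for upper 0.001 = {upper_tail_z(alpha):.2f}")
```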

Figure 2: Characteristics for various Weibull models

Figure 3: Skewness and elongation for Weibull models

Figure 4 plots the areas for the fixed-width central intervals against the skewness of models from figure 2. The bottom curve of figure 4 shows that the areas found within one standard deviation of the mean of a Weibull distribution increase with increasing skewness. Since the tails of a probability model are traditionally defined as those regions that are more than one standard deviation away from the mean, the bottom curve of figure 4 shows us that the areas in the tails must decrease with increasing skewness. This contradicts the common notion about skewness and a heavy tail.
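The shrinking of the classically defined tails can be reproduced numerically. The sketch below assumes β = 1, uses hypothetical function names, and evaluates only a handful of illustrative alpha values rather than the 23 models of figure 2.

```python
import math

def weibull_cdf(x, alpha):
    """Weibull cumulative distribution function with beta = 1."""
    return 1.0 - math.exp(-(x ** alpha)) if x > 0 else 0.0

def tail_area_beyond_one_sd(alpha):
    """Total area more than one standard deviation away from the mean."""
    mean = math.gamma(1 + 1 / alpha)
    sd = math.sqrt(math.gamma(1 + 2 / alpha) - mean ** 2)
    inside = weibull_cdf(mean + sd, alpha) - weibull_cdf(mean - sd, alpha)
    return 1.0 - inside

# Skewness increases as alpha decreases through these illustrative values.
for alpha in (3.6, 1.5, 1.0, 0.5):
    print(f"alpha = {alpha:4.1f}   tail area = {tail_area_beyond_one_sd(alpha):.4f}")
```

For α = 1.00 the mean and standard deviation are both 1, so the tail area is exactly exp(−2), or about 13.5 percent.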

Figure 4: How the coverages vary with skewness for Weibull distributions

So while the infinitesimal areas under the extreme tails will move further away from the mean with increasing skewness, the classically defined tails do not get heavier; they actually get much lighter with increasing skewness. To move the outer few parts per thousand further away from the mean you have to compensate by moving a much larger percentage closer to the mean. This compensation is unavoidable and inevitable. To stretch the long tail you have to pack an ever increasing proportion into the center of the distribution!

Figure 5: How the tails get lighter with skewness for Weibull distributions

So while skewness is associated with one tail being elongated, that elongation does not result in a heavier tail, but rather in a lighter tail. Increasing skewness is rather like squeezing toothpaste up to the top of the tube: while concentrating the bulk at one end, little bits get left behind and are squeezed down toward the other end. As these little bits become more isolated from the bulk, the “tail” becomes elongated.

However, there are a couple of surprises about this whole process. The first of these is the middle curve of figure 4 which shows the areas within the fixed-width, two-standard-deviation central intervals. The flatness of this curve shows that the areas within two standard deviations of the mean of a Weibull stay around 95 percent to 96 percent regardless of the skewness.

In statistics classes students are taught that having approximately 95 percent within two standard deviations of the mean is a property of the normal distribution. While this is true, the fact that this property also applies to most of the Weibull distributions is unexpected. From the negatively skewed Weibulls, through the positively skewed mound-shaped Weibulls, and for all but the most extreme of the J-shaped Weibulls there will be approximately 95 percent to 96 percent within two standard deviations of the mean.

Figure 6: What Weibull distributions have in common

The second unexpected characteristic of the Weibulls is seen in the top curve of figure 4, which shows the areas within the fixed-width, three-standard-deviation central intervals. While these areas do drop slightly at first, they stabilize for the J-shaped Weibulls at about 98 percent. This means that a fixed-width, three-standard-deviation central interval for a Weibull distribution will always contain approximately 98 percent or more of that distribution.

So if you think your data are modeled by a Weibull distribution, then even without any specific knowledge as to which Weibull distribution is appropriate, you can safely say that approximately 98 percent or more will fall within three standard deviations of the mean, and that approximately 95 percent or more will fall within two standard deviations of the mean. Fitting a particular Weibull probability model to your data will not change either of these statements to any practical extent.
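One way to check these two generic statements is to scan a range of shape parameters and record the smallest coverage found. The grid from 0.5 to 15.0 below is an assumption chosen for illustration, as are the function names; β is set to 1 because it does not affect the result.

```python
import math

def weibull_cdf(x, alpha):
    """Weibull cumulative distribution function with beta = 1."""
    return 1.0 - math.exp(-(x ** alpha)) if x > 0 else 0.0

def fixed_width_coverage(alpha, k):
    """Area within k standard deviations of the mean for shape alpha."""
    mean = math.gamma(1 + 1 / alpha)
    sd = math.sqrt(math.gamma(1 + 2 / alpha) - mean ** 2)
    return weibull_cdf(mean + k * sd, alpha) - weibull_cdf(mean - k * sd, alpha)

# Hypothetical grid of shapes from 0.5 to 15.0 in steps of 0.1.
alphas = [0.5 + 0.1 * i for i in range(146)]
worst_two_sigma = min(fixed_width_coverage(a, 2) for a in alphas)
worst_three_sigma = min(fixed_width_coverage(a, 3) for a in alphas)

print(f"smallest two-sigma coverage on the grid:   {worst_two_sigma:.4f}")
print(f"smallest three-sigma coverage on the grid: {worst_three_sigma:.4f}")
```

On this grid both minima occur among the J-shaped models, which is where the curves of figure 4 level off.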

For many purposes these two results will be all you need to know about your Weibull model. Without ever actually fitting a Weibull probability model to your data, you can filter out either 95 percent or 98 percent of the probable noise using generic, fixed-width central intervals.

What gets stretched?

If the tail gets both elongated and thinner at the same time, something has to get stretched. To visualize how skewness works for Weibull models we can compare the widths of various fixed-coverage central intervals. These fixed-coverage central intervals will be symmetrical intervals of the form:

MEAN(X) ± Z SD(X)

While this looks like the formula for the earlier fixed-width intervals, the difference is in what we are holding constant and what we are comparing. With the fixed-width intervals we compared the areas covered. With the fixed-coverage intervals we compare the widths of the intervals. These widths are characterized by the z-scores in figure 7. For example, a Weibull model with an alpha parameter of 1.000 will have 92 percent of its area within 1.52 standard deviations of the mean, and it will have 99 percent of its area within 3.61 standard deviations of the mean.
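The width of a fixed-coverage central interval has no simple closed form in general, but it can be found numerically. The sketch below uses bisection on the z-score, with β = 1 assumed and hypothetical function names; the search range of 0 to 50 standard deviations is an arbitrary choice that is wide enough for the shapes discussed here.

```python
import math

def weibull_cdf(x, alpha):
    """Weibull cumulative distribution function with beta = 1."""
    return 1.0 - math.exp(-(x ** alpha)) if x > 0 else 0.0

def central_coverage(alpha, z):
    """Area within z standard deviations of the mean."""
    mean = math.gamma(1 + 1 / alpha)
    sd = math.sqrt(math.gamma(1 + 2 / alpha) - mean ** 2)
    return weibull_cdf(mean + z * sd, alpha) - weibull_cdf(mean - z * sd, alpha)

def fixed_coverage_z(alpha, coverage, lo=0.0, hi=50.0, iterations=100):
    """Smallest z whose symmetric interval holds the requested coverage."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if central_coverage(alpha, mid) < coverage:
            lo = mid   # interval too narrow
        else:
            hi = mid   # interval wide enough
    return (lo + hi) / 2

for coverage in (0.92, 0.95, 0.98, 0.99):
    print(f"alpha = 1.0, coverage = {coverage:.2f}: z = {fixed_coverage_z(1.0, coverage):.2f}")
```

For α = 1.000 this gives about 1.53 and 3.61 standard deviations for 92 percent and 99 percent, which agrees with the values quoted above to within rounding.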

Figure 7: Widths of fixed-coverage central intervals for Weibull models

Figure 8 shows each column in figure 7 plotted against skewness. The bottom curve shows that the middle 92 percent of a Weibull will shrink with increasing skewness. The 95-percent fixed-coverage intervals are remarkably stable until the increasing mass near the mean eventually begins to pull this curve down. The 98-percent fixed-coverage intervals initially grow, and then they plateau near three standard deviations.

Figure 8: Widths of fixed-coverage central intervals for Weibull models

The spread of the top three curves shows that for the Weibull models it is primarily the outermost two percent that gets stretched into the extreme upper tail. While 920 parts per thousand are moving toward the mean, and another 60 parts per thousand get slightly shifted outward and then stabilize, it is primarily the outer 20 parts per thousand that bear the brunt of the stretching and elongation that goes with increasing skewness.

The benefits of fitting a Weibull distribution

So what do you gain by fitting a Weibull model to your data? The value for the alpha parameter may be estimated from the average and standard deviation statistics, and this will, in turn, determine the shape of the specific Weibull model you fit to your data. Since these statistics will be more dependent upon the middle 95 percent of the data than the outer one percent or less, you will end up primarily using the middle portion of the data to choose a Weibull model. Since the tails of a Weibull model become lighter with increasing skewness, you will end up making a much stronger statement about how much of the area is within one standard deviation of the mean than about the size of the elongated tail. Fitting a Weibull distribution is not so much about the tails as it is about how much of the model is found within one standard deviation of the mean. So, while we generally think of fitting a model as matching the elongated tail of a histogram, the reality is quite different.
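One common way to estimate alpha from the average and standard deviation is the method of moments: the ratio of the standard deviation to the mean depends only on α, so α can be found by matching that ratio. The sketch below is an illustration of this idea and not necessarily the procedure the author has in mind. It assumes a two-parameter Weibull with its origin at zero, uses hypothetical function names, and searches only between α = 0.2 and α = 50.

```python
import math

def coefficient_of_variation(alpha):
    """Standard deviation divided by mean for a Weibull with shape alpha."""
    g1 = math.gamma(1 + 1 / alpha)
    g2 = math.gamma(1 + 2 / alpha)
    return math.sqrt(g2 / g1 ** 2 - 1)

def weibull_from_moments(average, sd, lo=0.2, hi=50.0, iterations=100):
    """Return (alpha, beta) whose mean and standard deviation match the inputs."""
    target = sd / average
    for _ in range(iterations):
        mid = (lo + hi) / 2
        # The coefficient of variation falls as alpha rises.
        if coefficient_of_variation(mid) > target:
            lo = mid
        else:
            hi = mid
    alpha = (lo + hi) / 2
    beta = average / math.gamma(1 + 1 / alpha)
    return alpha, beta

# Hypothetical statistics: an average of 10.0 and a standard deviation of 6.0.
alpha_hat, beta_hat = weibull_from_moments(10.0, 6.0)
print(f"alpha = {alpha_hat:.3f}, beta = {beta_hat:.3f}")
```

Both inputs are summary statistics driven by the bulk of the data, which is why the fitted shape says more about the middle of the distribution than about its extreme tail.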

Once you have a specific Weibull model, you can then use that model to extrapolate out into the extreme tail (where you are unlikely to have any data) to compute critical values that correspond to infinitesimal areas under the curve. However, as may be seen in figure 3, even small errors in estimating the parameter alpha can have a large impact upon the critical values computed for the infinitesimal areas under the extreme tail of your Weibull model. As a result, the critical values you compute for the upper one or two percent of your Weibull model will have virtually no contact with reality. Such computations will always be more of an artifact of the model used than a characteristic of either the data or the process that produced the data. To understand the problems attached to this extrapolation from the region where we have data to the region where we have no data see “Why We Keep Having 100-Year Floods,” (QDD, June 4, 2013) and “The Parts Per Million Problem” (QDD, May 11, 2015).

Industrial data analysis

What impact does all this have on how we analyze data? It helps to provide some perspective on how and why there are two distinctly different approaches to data analysis. For clarity call these the statistical approach and Shewhart’s approach.

The statistical approach uses fixed-coverage intervals for the analysis of experimental data. In some cases these fixed-coverage intervals are not centered on the mean, but rather involve fixed coverages for the tail areas, but this is still analogous to the fixed-coverage central intervals used above. Fixed coverages are used because experiments are designed and conducted to detect specific signals, and we want the analysis to detect these signals in spite of the noise present in the data. By using fixed coverages statisticians can fine-tune just how much of the noise is being filtered out. This fine-tuning is important because additional data are not generally going to be available and they need to get the most out of the limited amount of experimental data. Thus, the complexity and cost of most experiments will justify a fair amount of complexity in the analysis. Moreover, to avoid missing real signals within the experimental data, it is traditional to filter out only 95 percent of the probable noise.

Shewhart’s approach was created for the continuing analysis of observational data that are the by-product of operations. To this end Shewhart used a fixed-width interval rather than a fixed-coverage interval. His argument was that we will never have enough data to ever fully specify a particular probability model for the original data. Moreover, since additional data will typically be available, we do not need to fine-tune our analysis—as long as the analysis is reasonably conservative the exact value of the coverage is no longer critical. This approach allows us to find those signals that are large enough to be of economic importance without getting too many false alarms. So, for the real-time analysis of observational data, Shewhart chose to use a fixed-width, three-sigma central interval. As we have seen, such an interval will routinely filter upwards of 98 percent of the probable noise for any Weibull distribution.

Figure 9: How three-sigma limits work with Weibull distributions

What we have discovered here is that Shewhart’s simple, generic, three-sigma limits will provide a conservative analysis for any and every data set that might logically be considered to be modeled by a Weibull distribution. This is why finding exact critical values for a specific probability model is not a prerequisite for using a process behavior chart. Once you filter out at least 98 percent of the probable noise, anything left over is a potential signal.


About The Author


Donald J. Wheeler

Dr. Wheeler is a fellow of both the American Statistical Association and the American Society for Quality who has taught more than 1,000 seminars in 17 countries on six continents. He welcomes your questions; you can contact him at djwheeler@spcpress.com.



Real World

Meanwhile, out in the land of business, where it seems impossible to make the basics simple enough ... Client: "... we are only interested in the results, not the Six Sigma math." Consultant: "I take them through Shewhart's control charts, +/- 3 Std Dev, and why the +/- 1.5 process shift allowance is such nonsense, and it just gets blank looks."

1.5 sigma shift and the Yeti

I don't know why this is being discussed here.  Dr. Wheeler did not mention this in his article...

Six Sigma math

I can't see how a 1.5 sigma shift could last very long noting that, for an x-bar chart with a sample size of 4, the average run length is 2. That is, the UCL is actually 3/SQRT(4) or 1.5 sigma from the center line, so you have a 50:50 chance of being outside it for any sample.
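The arithmetic in this comment can be checked with a short calculation. The sketch below assumes normally distributed subgroup averages, a shift expressed in units of the individual-value sigma, and detection only by a point falling outside the three-sigma limits; the function names are illustrative.

```python
from statistics import NormalDist

def signal_probability(shift_sigma, n, limit=3.0):
    """Chance that one subgroup average falls outside the limits after a shift."""
    std_normal = NormalDist()
    shift_in_standard_errors = shift_sigma * n ** 0.5
    above = 1 - std_normal.cdf(limit - shift_in_standard_errors)
    below = std_normal.cdf(-limit - shift_in_standard_errors)
    return above + below

def average_run_length(shift_sigma, n, limit=3.0):
    """Expected number of subgroups until the first point outside the limits."""
    return 1 / signal_probability(shift_sigma, n, limit)

print(f"ARL for a 1.5 sigma shift, n = 4: {average_run_length(1.5, 4):.2f}")
print(f"ARL with no shift, n = 4:         {average_run_length(0.0, 4):.1f}")
```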

This brings up the issue of frequency in the context of risk

"Moreover, to avoid missing real signals within the experimental data, it is traditional to filter out only 95 percent of the probable noise." This will actually help me illustrate the issue of frequency, or exposure, in the context of risk-based thinking for ISO 9001:2015.

Frequency of exposure is not from standard FMEA practice, in which we consider only the individual probability of occurrence, but from the Army's risk management process (ATP 5-19)--a public domain document that easily meets the requirements of ISO 31000, which is rather expensive.

"Probability is assessed as frequent if a harmful occurrence is known to happen continuously, regularly, or inevitably because of exposure. Exposure is the frequency and length of time personnel and equipment are subjected to a hazard or hazards. For example, given about 500 exposures, without proper controls, a harmful event will occur. Increased exposure—during a certain activity or over iterations of the activity—increases risk. An example of frequent occurrence is a heat injury during a battalion physical training run, with a category 5 heat index and nonacclimated Soldiers."

This is something traditional FMEA does NOT consider.

In the case of DOE, we have a traditional 5% chance of wrongly rejecting the null hypothesis, but we are exposed to this risk only once because the experiment is a one-time event. In SPC, however, we are exposed to our false alarm risk every time we take a sample. A 5% alpha risk that is acceptable for a one-time experiment is definitely not acceptable for process management. Even a 2% risk will give us, on average, one false alarm per 50 occurrences. 0.27% gives us, on average, 2.7 per 1000 samples (but more if we throw in the Western Electric zone tests). The frequency of exposure issue makes a 5% Type I risk acceptable for most DOE applications, but not for SPC where we are exposed to the risk hundreds or thousands of times.
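The effect of repeated exposure can be put into numbers. The sketch below assumes independent samples with a constant false alarm probability, which is a simplification, and the function names are illustrative.

```python
def expected_false_alarms(alpha_risk, samples):
    """Average number of false alarms over a given number of samples."""
    return alpha_risk * samples

def chance_of_any_false_alarm(alpha_risk, samples):
    """Probability of at least one false alarm, assuming independent samples."""
    return 1 - (1 - alpha_risk) ** samples

# The three risk levels mentioned in the comment, each over 1,000 samples.
for alpha_risk in (0.05, 0.02, 0.0027):
    count = expected_false_alarms(alpha_risk, 1000)
    any_alarm = chance_of_any_false_alarm(alpha_risk, 1000)
    print(f"risk = {alpha_risk:.4f}: {count:.1f} expected per 1,000 samples, "
          f"P(at least one) = {any_alarm:.3f}")
```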

As for actually fitting a Weibull (or gamma) distribution, I would not use the average and standard deviation even though one can estimate the parameters this way. The maximum likelihood method, which is used by Minitab and StatGraphics, is much better. In addition, with regard to the long tails of the Weibull distribution, you can get a 95% confidence limit on the nonconforming fraction, which makes the calculations meaningful despite the uncertainty in the data. We have the same issue, by the way, with process performance indices for normal distributions, in which our "Six Sigma" process could be as little as four sigma if we don't have enough measurements. One can similarly get lower confidence limits for PP (chi square distribution for the confidence limits for the process standard deviation), PPU and PPL (noncentral t distribution, foundation for the tolerance interval), and PPk (somewhat harder).

The bottom line is that, if we have enough data, we can get meaningful confidence limits on the nonconforming fraction from a normal or non-normal distribution. If we don't have enough data, we cannot get meaningful estimates of PP or PPk from any distribution.

"exposure" and risk

While I agree with the premise that even a small rate of occurrence for a large number of opportunities results in a large number of events AND that our tolerance for defects has narrowed over the years, I'm not sure the analogy applies to SPC. One can make the argument about the risk of a false alarm for a single sample, but SPC concerns itself with time series data, and the use of additional rules and the persistence of a shift will address the possibility of a 'single' false alarm. This provides us far more protection than any increase in the precision of a distributional model to real world data.