
## What You Need to Know About Gamma Probability Models

### The more you know, the easier it becomes to use your data

Published: Wednesday, February 7, 2024 - 12:03

Clear thinking and simplicity of analysis require concise, clear, and correct notions about probability models and how to use them. Here, we’ll examine the basic properties of the family of gamma and chi-square distributions that play major roles in the development of statistical techniques. An appreciation of how these probability models function will allow you to analyze your data with confidence that your results are reliable, solid, and correct.

### The gamma family of distributions

Gamma distributions are widely used in all areas of statistics, and are found in most statistical software. Since software facilitates our use of the gamma models, the following formulas are given here only in the interest of clarity of notation. Gamma distributions depend upon two parameters, denoted here by alpha (α) and beta (β). The probability density function for the gamma family has the form:

$$f(x) = \frac{x^{\alpha - 1}\, e^{-x/\beta}}{\beta^{\alpha}\, \Gamma(\alpha)} \quad \text{for } x > 0$$

where the symbol Γ(α) denotes the gamma function (for α > 0):

$$\Gamma(\alpha) = \int_{0}^{\infty} t^{\alpha - 1}\, e^{-t}\, dt$$

The mean and variance for a gamma distribution are:

$$\text{mean} = \alpha\beta, \qquad \text{variance} = \alpha\beta^{2}$$

The alpha parameter determines the shape of the gamma model, and the beta parameter determines the scale. When the value for alpha is 1.00 or less, the gamma distributions will be J-shaped. Alpha values greater than 1.00 result in mound-shaped gamma models. As the value for alpha gets large, the gammas approach the normal distribution. Since we consider these distributions in standardized form, the value for the beta parameter won’t affect any of the following results. Five standardized gamma distributions are shown in Figure 1.

Figure 1: Five standardized gamma distributions

Chi-square distributions are a subset of the family of gamma distributions. A chi-square distribution with k degrees of freedom is a gamma distribution with beta = 2 and alpha = k/2 (for integer values of k). Thus, the distributions above are standardized chi-square distributions with 1, 2, 4, 8, and 32 degrees of freedom.
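The correspondence between the two families can be sketched in a few lines of Python. The function name `chi_square_as_gamma` is hypothetical, and the skewness formula 2/√α is the standard textbook result for gamma distributions rather than something stated in this article.

```python
import math

def chi_square_as_gamma(k):
    """Describe a chi-square model with k degrees of freedom as a gamma model."""
    alpha = k / 2.0   # shape parameter
    beta = 2.0        # scale parameter
    return {
        "alpha": alpha,
        "beta": beta,
        "mean": alpha * beta,                # works out to k
        "variance": alpha * beta ** 2,       # works out to 2k
        # Assumption: standard gamma skewness formula, not taken from the article.
        "skewness": 2.0 / math.sqrt(alpha),
    }

# The five degrees-of-freedom values named for Figure 1.
for k in (1, 2, 4, 8, 32):
    model = chi_square_as_gamma(k)
    print(k, model["alpha"], model["mean"], model["variance"],
          round(model["skewness"], 3))
```

Because the skewness depends only on alpha, changing beta rescales the horizontal axis without changing the shape of the model.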

So what’s changing as you select different gamma probability models? To answer this question, Figure 2 considers 19 different gamma models. For each model, we have the skewness and kurtosis, the areas within one, two, and three standard deviations on either side of the mean, and the z-score for the 99.9th percentile of the model.

Figure 2: Some properties of gamma models

The z-scores for the last part per thousand show that the upper tails get elongated with increasing skewness. But there is a surprise contained in the other columns of Figure 2.

Figure 3 plots the areas for the three fixed-width central intervals. The bottom curve of Figure 3 (k = 1) shows that the areas found within one standard deviation of the mean of a gamma distribution will increase with increasing skewness. Since the tails of a probability model are traditionally defined as those regions that are more than one standard deviation away from the mean, the bottom curve of Figure 3 shows us that the areas in the tails must decrease with increasing skewness. This contradicts the common notion about skewness being associated with a heavy tail.

Figure 3: How the coverages vary with skewness for gamma distributions

While the infinitesimal areas under the extreme tails will move farther away from the mean with increasing skewness, the tail as a whole does not get heavier. Rather, it actually gets much lighter with increasing skewness. To move the outer few parts per thousand farther away from the mean, you must compensate by moving a much larger percentage of the area closer to the mean. This compensation is unavoidable. To stretch the long tail, you have to pack an ever-increasing proportion into the center of the distribution. An illustration of this compensation is shown in Figure 4.

Figure 4: The compensation for increasing skewness for gammas

So while skewness is associated with one tail being elongated, that elongation doesn’t result in a heavier tail but rather in a lighter tail. Moreover, Figure 3 also contains a couple of additional surprises about this family of distributions. The first of these is the middle curve (k = 2), which shows the areas within two standard deviations of the mean. The flatness of this curve shows that, regardless of the skewness, a gamma distribution will always have about 95% to 96% of its area within two standard deviations of the mean.

The second unexpected characteristic of the family of gamma distributions is seen in the top curve of Figure 3 (k = 3), which shows the areas within three standard deviations of the mean. Although this area does drop slightly at first, it stabilizes for the J-shaped gammas at about 97.6%. This means that a fixed-width, three-standard-deviation central interval for a gamma distribution will always contain at least 97.6% of that distribution.

Figure 5: What gamma distributions have in common
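The coverages discussed above can be checked numerically. The following is a minimal sketch, not the computation used to build the figures: it evaluates the gamma tail area with a standard series and continued-fraction expansion of the incomplete gamma function, and sets beta = 1 because the standardized results do not depend on beta. The names `gamma_sf` and `coverage` are hypothetical.

```python
import math

def gamma_sf(x, alpha, beta=1.0):
    """Upper-tail area P(X > x) for a gamma model with shape alpha, scale beta."""
    if x <= 0.0:
        return 1.0
    t = x / beta
    # log of t^alpha * exp(-t) / Gamma(alpha), shared by both expansions
    log_front = alpha * math.log(t) - t - math.lgamma(alpha)
    if t < alpha + 1.0:
        # Power series for the lower-tail area converges quickly here.
        term = total = 1.0 / alpha
        n = alpha
        for _ in range(10_000):
            n += 1.0
            term *= t / n
            total += term
            if term < total * 1e-16:
                break
        return 1.0 - total * math.exp(log_front)
    # Otherwise use a continued fraction (modified Lentz) for the upper tail.
    tiny = 1e-300
    b = t + 1.0 - alpha
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - alpha)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = c * d
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(log_front) * h

def coverage(alpha, k):
    """Area within k standard deviations of the mean (beta = 1)."""
    mean, sd = alpha, math.sqrt(alpha)
    return gamma_sf(mean - k * sd, alpha) - gamma_sf(mean + k * sd, alpha)

# Larger alpha means less skewness; smaller alpha means more.
for alpha in (16.0, 4.0, 2.0, 1.0, 0.5):
    print(alpha, [round(coverage(alpha, k), 4) for k in (1, 2, 3)])
```

Reading down the printed rows, the one-sigma coverage rises as alpha falls, while the two-sigma coverage stays near 95% and the three-sigma coverage stays above 97.6% for these five models.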

### So what gets stretched?

If the tail gets both elongated and thinner at the same time, something has to get stretched. To visualize what gets stretched, we’ll look at the radii for intervals centered on the mean that contain a specified area under the curve. The columns in Figure 6 show different fixed areas, while the rows correspond to different gamma distributions.

For example, a gamma model with an alpha parameter of 64 will have 92% of its area within 1.74 standard deviations of the mean, and it will have 95% of its area within 1.95 standard deviations of the mean. Additionally, a gamma model with an alpha parameter of 1.25 will have 92% of its area within 1.53 standard deviations of the mean, and it will have 98% of its area within 2.84 standard deviations of the mean.

Figure 6: Radii for central intervals covering fixed areas
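Radii of this kind can be approximated by simulation. The sketch below is an illustration under stated assumptions, not the method used to build Figure 6: it draws standardized gamma values with the standard library's `random.gammavariate` and reads off the empirical quantile of the distance from the mean. The function name, sample size, and seed are all hypothetical choices.

```python
import random

def central_radius(alpha, area, n=200_000, seed=1):
    """Approximate radius (in standard deviations) of the interval centered
    on the mean that contains the requested area, by simulation."""
    rng = random.Random(seed)
    mean, sd = alpha, alpha ** 0.5   # beta = 1, so mean = alpha, variance = alpha
    distances = sorted(
        abs(rng.gammavariate(alpha, 1.0) - mean) / sd for _ in range(n)
    )
    # Empirical quantile of the standardized distance from the mean.
    return distances[min(n - 1, int(area * n))]

# Two of the cases quoted in the text.
for alpha, area in ((64.0, 0.95), (1.25, 0.92)):
    print(alpha, area, round(central_radius(alpha, area), 2))
```

The printed radii should land close to the values quoted above, within simulation error. An exact calculation would invert the gamma cumulative distribution function instead.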

Figure 7 shows the values in each column of Figure 6 plotted against skewness. The bottom curve shows that the middle 92% of a gamma will shift toward the mean with increasing skewness. The 95% fixed-coverage intervals are remarkably stable until the increasing mass near the mean eventually begins to pull this curve down. The 97.5% fixed-coverage intervals initially grow until they plateau near three standard deviations. The spread of the top three curves shows that for the gamma models it’s primarily the outermost 2% that gets stretched into the extreme upper tail.

Figure 7: Widths of fixed-coverage central intervals

So, while the central 920 parts per thousand are shifting toward the mean, and while another 60 parts per thousand get slightly shifted outward and then stabilize, it’s primarily the outer 20 parts per thousand that bear the brunt of the stretching and elongation that goes with increasing skewness.

### Fitting a gamma distribution to your data

To fit a gamma distribution to your data, you may estimate the alpha parameter by squaring the ratio of your average to your standard deviation statistic. To estimate the beta scale parameter, you then divide your average by the estimate of alpha. With these two estimated parameter values, your software will provide you with critical values or computed areas beyond specification limits. Easy as can be. However, your estimated parameters will depend upon a couple of ratios, which makes them—and your results—highly variable.
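The two estimates described above can be written as a short function. This is a minimal sketch with a hypothetical function name and made-up data, using the sample standard deviation from the standard library.

```python
import statistics

def fit_gamma_moments(data):
    """Estimate gamma parameters from the average and standard deviation."""
    average = statistics.mean(data)
    sd = statistics.stdev(data)          # sample standard deviation statistic
    alpha_hat = (average / sd) ** 2      # squared ratio of average to SD
    beta_hat = average / alpha_hat       # average divided by estimated alpha
    return alpha_hat, beta_hat

# Made-up values, for illustration only.
alpha_hat, beta_hat = fit_gamma_moments([2, 4, 4, 4, 5, 5, 7, 9])
print(round(alpha_hat, 3), round(beta_hat, 3))
```

By construction, the fitted model reproduces the average and standard deviation of the data: the product of the two estimates equals the average.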

You can investigate this uncertainty with a simple simulation study. I used an exponential distribution (where alpha and beta were both equal to 1.00) to generate 5,000 data sets of size n = 100. For each data set, I estimated the value of alpha and beta as described above. The estimates for the shape parameter alpha ranged from 0.495 to 2.103.
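A simulation along these lines can be sketched with the standard library alone. The seed and function name are assumptions, so the range it prints will not match the 0.495 to 2.103 reported above exactly, but it should show comparable spread around the true value of 1.00.

```python
import random
import statistics

def simulate_alpha_estimates(n_sets=5000, n=100, seed=2024):
    """Estimate alpha for many exponential data sets (true alpha = beta = 1)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_sets):
        data = [rng.expovariate(1.0) for _ in range(n)]
        average = sum(data) / n
        variance = sum((x - average) ** 2 for x in data) / (n - 1)
        estimates.append(average ** 2 / variance)   # (average / SD) squared
    return estimates

estimates = simulate_alpha_estimates()
print("smallest:", round(min(estimates), 3))
print("median:  ", round(statistics.median(estimates), 3))
print("largest: ", round(max(estimates), 3))
```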

As a result, even though we may compute exact critical values based on our estimated parameters, the results will never have the expected precision. While we may want to filter out exactly 95% of the noise, we’ll only filter out approximately 95%. The uncertainty is hidden by the complexity of the computations, but it remains there all the same.

For an example of this uncertainty, assume that the upper specification limit is six standard deviations above the mean of the original model. The original model with alpha = 1.0 would then have 912 ppm nonconforming. But with alpha = 2.1, the fitted model would predict only 8 ppm nonconforming. And with alpha = 0.5, the fitted model would predict 8,151 ppm nonconforming. These two fitted estimates differ by a factor of about a thousand.
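These three values can be reproduced under one reading of the example, which is an assumption on my part: the specification limit stays fixed at 7.0 in the units of the data (six standard deviations above the mean of the exponential model), and each fitted model keeps the mean at 1.0, so that beta = 1/alpha. The sketch below uses a standard series and continued-fraction evaluation of the gamma tail area. The names `gamma_sf` and `ppm_beyond` are hypothetical.

```python
import math

def gamma_sf(x, alpha, beta=1.0):
    """Upper-tail area P(X > x) for a gamma model with shape alpha, scale beta."""
    if x <= 0.0:
        return 1.0
    t = x / beta
    log_front = alpha * math.log(t) - t - math.lgamma(alpha)
    if t < alpha + 1.0:
        # Power series for the lower-tail area.
        term = total = 1.0 / alpha
        n = alpha
        for _ in range(10_000):
            n += 1.0
            term *= t / n
            total += term
            if term < total * 1e-16:
                break
        return 1.0 - total * math.exp(log_front)
    # Continued fraction (modified Lentz) for the upper-tail area.
    tiny = 1e-300
    b = t + 1.0 - alpha
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - alpha)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = c * d
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(log_front) * h

def ppm_beyond(usl, alpha, mean=1.0):
    """Parts per million above usl for a gamma model with the given mean.
    Assumption: the fitted mean is held at `mean`, so beta = mean / alpha."""
    return 1e6 * gamma_sf(usl, alpha, mean / alpha)

USL = 7.0  # six standard deviations above the mean when mean = SD = 1
for alpha in (1.0, 2.1, 0.5):
    print(alpha, round(ppm_beyond(USL, alpha), 1))
```

Under these assumptions the printed values should round to the 912, 8, and 8,151 ppm quoted above, which shows how strongly a tail area responds to the estimated shape parameter.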

When you fit a model to your data you will be fitting the central portion, not the tails. And the uncertainty in the estimated parameter values will always result in huge differences in the infinitesimal areas under the extreme tails. This is what makes fitting a model to your data and then using that model to compute a parts per million nonconforming value completely bogus. You can do the computation, but the results have nothing to do with the underlying process. This is why, when people start talking about parts per million nonconforming, you can be sure that reality has left the building.

### The purpose of analysis

The purpose of analysis is insight. To gain insight, we have to detect any potential signals within the data. To detect potential signals, we must filter out the probable noise. And filtering out the probable noise is the objective of statistical analysis.

When working with experimental data, where researchers are spending time and money trying to detect potential signals, it’s reasonable to very carefully model the routine variation. By modeling the routine variation, we can package some specified percentage of the probable noise to be ignored (usually 95%) and then look for values outside that package that look like potential signals. This approach is like using the table in Figure 6. Fit a model, choose a specific area to filter out, and then find the exact width of interval to use in packaging the noise.

With industrial data, a simpler approach is feasible. Here, we’re trying to do the same thing over and over, and the signals of interest are changes that are large enough to have an economic effect. In this case, we bundle up nearly all of the routine variation as probable noise and react only to potential signals that are clearly not part of the routine variation.

From Figure 2 we see that the mean plus-or-minus three standard deviations will filter out 97.6% or more of any and every gamma distribution. Thus, this one-size-fits-all approach will filter out virtually all of the probable noise for any set of data that might be modeled by a gamma distribution.

Figure 8: How three-sigma limits work with gamma distributions

### Summary

So how do you filter out the noise when you think your data are modeled by a gamma distribution?

You could find bespoke values for the parameters of a gamma distribution based on your data, and then find an exact interval that you hope will wrap up a specific amount of the probable noise. (This approach becomes unreliable in the extreme tail.)

Or, you could use the one-size-fits-all approach of three-sigma limits. This approach is guaranteed to filter out at least 97.6% of the probable noise regardless of which gamma model may fit your data. This is why a process behavior chart for individual values will work even when you think the data might be modeled by a skewed gamma distribution.
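For individual values, three-sigma limits are conventionally computed from the average moving range. The sketch below assumes the usual scaling factor of 2.66 for a chart of individual values, which is not given in this article. The function names and data are made up for illustration.

```python
def natural_process_limits(values):
    """Three-sigma limits for individual values from the average moving range."""
    average = sum(values) / len(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    average_mr = sum(moving_ranges) / len(moving_ranges)
    half_width = 2.66 * average_mr   # assumption: conventional XmR scaling factor
    return average - half_width, average, average + half_width

def potential_signals(values):
    """Values falling outside the limits, to be investigated as potential signals."""
    lower, _, upper = natural_process_limits(values)
    return [v for v in values if v < lower or v > upper]

data = [10, 12, 11, 13, 12, 30]   # made-up values
print(natural_process_limits(data))
print(potential_signals(data))
```

No distributional fit is needed here: the limits come directly from the data, and anything outside them is treated as a potential signal.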

Either way, regardless of whether we construct a complex filter that we hope may be right or use a simple filter that we know will work, we’re talking about packaging that portion of the data that will be of little interest. We don’t need to argue about how to package the trash. The interesting parts of our data will be the potential signals that are left over after we filter out the noise. This is where the insights will be found. And the best analysis will always be the simplest analysis that allows us to gain these insights.

### Donald J. Wheeler

Dr. Wheeler is a fellow of both the American Statistical Association and the American Society for Quality who has taught more than 1,000 seminars in 17 countries on six continents. He welcomes your questions; you can contact him at djwheeler@spcpress.com.