When do we need to fit a lognormal distribution to our skewed histograms? This article considers the basic properties of the lognormal family of distributions and reveals some interesting and time-saving characteristics that are useful when analyzing data.


### The lognormal family of distributions

While software facilitates the use of lognormal distributions, the following formulas are given here in the interest of clarity of notation. If *X* is a lognormally distributed random variable with parameters alpha and beta, then *Y* = *ln(X)* will be a normally distributed variable with mean alpha and standard deviation beta.
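As a quick numerical sketch of this relationship (not part of the original article; the values of alpha and beta below are arbitrary, chosen only for illustration), exponentiating normal variates yields lognormal variates, and taking logs recovers the normal:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta = 1.0, 0.5

# If Y = ln(X) is normal with mean alpha and sd beta,
# then X = exp(Y) is lognormal with parameters alpha and beta.
y = rng.normal(alpha, beta, size=100_000)  # normal samples
x = np.exp(y)                              # lognormal samples

# The median of X should be close to exp(alpha)...
print(np.median(x), np.exp(alpha))
# ...and ln(X) recovers a variable with mean near alpha and sd near beta.
print(np.log(x).mean(), np.log(x).std())
```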

While the value for the alpha parameter defines both the median and the scale for the distribution of *X*, it’s the value for the beta parameter that defines the shape of the distribution of *X*. The skewness and kurtosis of the lognormal distribution will increase as beta increases. Below, Figure 1 shows the standardized versions of the lognormal distributions for beta values of 0.25, 0.50, 0.75, 1.00, and 1.25. While all lognormal distributions are said to be mound-shaped, Figure 1 shows that the distinction between J-shaped and mound-shaped blurs for large values of beta.
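The dependence of shape on beta alone can be checked with SciPy (a sketch, not from the original article). The skewness of a lognormal has the closed form (exp(β²) + 2)·√(exp(β²) − 1), which involves only beta, and both skewness and kurtosis grow rapidly as beta increases:

```python
from scipy.stats import lognorm

# Skewness and excess kurtosis for the beta values plotted in Figure 1.
# Alpha plays no role here, so the default scale (alpha = 0) is used.
for beta in (0.25, 0.50, 0.75, 1.00, 1.25):
    skew, kurt = lognorm(s=beta).stats(moments="sk")
    print(f"beta={beta:.2f}  skewness={float(skew):7.3f}  "
          f"excess kurtosis={float(kurt):9.3f}")
```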

To compare different lognormal distributions, the table in Figure 2 uses 18 different models with beta values ranging from 0.10 to 1.50. For each model we have the skewness and kurtosis, the areas within one, two, and three standard deviations of the mean, and the z-score for the 99.9th percentile of the model.
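The kinds of values tabulated in Figure 2 can be recomputed directly; the following sketch (using SciPy, not taken from the article) does so for four of the 18 beta values. Alpha is set to zero, since it does not affect these standardized quantities:

```python
from scipy.stats import lognorm

# For each model: areas within one, two, and three standard deviations
# of the mean, plus the z-score of the 99.9th percentile.
for beta in (0.10, 0.50, 1.00, 1.50):
    dist = lognorm(s=beta)          # alpha = 0, i.e., scale = exp(0) = 1
    mean, sd = dist.mean(), dist.std()
    areas = [dist.cdf(mean + k * sd) - dist.cdf(mean - k * sd)
             for k in (1, 2, 3)]
    z_999 = (dist.ppf(0.999) - mean) / sd
    print(f"beta={beta:.2f}  areas={['%.4f' % a for a in areas]}  "
          f"z(99.9%)={z_999:.2f}")
```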

The z-scores in the last column of Figure 2 would seem to validate the idea that increasing skewness corresponds to elongated tails. As the skewness gets larger, the z-score for the most extreme part per thousand also increases. But what about the weight of the tails?

Figure 3 plots the areas for the three central intervals against the skewness of the models from Figure 2. The bottom curve of Figure 3 (*k* = 1) shows that the areas found within one standard deviation of the mean of a lognormal distribution will increase with increasing skewness. Since the tails of a probability model are traditionally defined as those regions that are more than one standard deviation away from the mean, the bottom curve of Figure 3 shows us that the areas in the tails decrease with increasing skewness. This contradicts the common notion about skewness and a heavy tail.

So while the infinitesimal areas under the extreme tails move farther away from the mean with increasing skewness, the classically defined tails don’t get heavier. They actually get much lighter with increasing skewness. To move the outer few parts per thousand farther from the mean, a much larger percentage must move closer to the mean. This compensation is required because the variance acts as the rotational inertia of the probability model, and it is unavoidable. To stretch the long tail, you have to pack an ever-increasing proportion into the center of the distribution!

An illustration of this compensation is shown in Figure 4.

So, while skewness is associated with one tail being elongated, that elongation doesn’t result in a heavier tail but rather in a lighter tail. Moreover, Figure 3 also contains a couple of additional surprises about this family of distributions. The first of these is the middle curve (*k* = 2), which shows the areas within two standard deviations of the mean. The flatness of this curve shows that, regardless of the skewness, your lognormal distribution will always have approximately 96% of its area within two standard deviations of the mean.

The second unexpected characteristic of the lognormals is seen in the top curve of Figure 3 (*k* = 3), which shows the areas within three standard deviations of the mean. While these areas drop slightly at first, they stabilize around 98% before beginning to climb back up. This means that a fixed-width, three-standard-deviation central interval for a lognormal distribution will always contain at least 98% of that distribution.

### So what gets stretched?

If the tail gets both elongated and thinner at the same time, something has to get stretched. To visualize what gets stretched, we’ll look at the widths of intervals centered on the mean that contain specified areas under the curve. Each column in Figure 6 is for a fixed area, and the values shown are the widths of the corresponding intervals for each of the 18 lognormal models. For example, a lognormal model with a beta parameter of 0.50 will have 92% of its area within 1.49 standard deviations of the mean, and it will have 95% of its area within 1.89 standard deviations of the mean.
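The widths in Figure 6 can be recovered numerically by inverting the coverage function. This sketch (using SciPy root finding; not part of the original article) reproduces the two worked values quoted above for a beta of 0.50:

```python
from scipy.optimize import brentq
from scipy.stats import lognorm

def central_width(beta, area):
    """Half-width, in standard deviations, of the interval centered on
    the mean that contains the given area of a lognormal with shape beta."""
    dist = lognorm(s=beta)
    mean, sd = dist.mean(), dist.std()
    covered = lambda k: dist.cdf(mean + k * sd) - dist.cdf(mean - k * sd) - area
    return brentq(covered, 0.01, 20.0)   # coverage is monotone in k

# The worked example from the text: beta = 0.50 should give roughly
# 1.49 sd for 92% coverage and 1.89 sd for 95% coverage.
print(round(central_width(0.50, 0.92), 2))
print(round(central_width(0.50, 0.95), 2))
```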

Figure 7 shows values from the columns of Figure 6 plotted against skewness. The bottom curve shows that the middle 92% of a lognormal will shrink down into the central zone as skewness increases. The 96% curve remains remarkably flat, hovering near 2.0 standard deviations until the increasing mass in the center of the distribution eventually begins to pull it down. The 98% curve initially grows, and then plateaus just below 3.0 standard deviations. The spread of the top three curves shows that for the lognormal models it’s primarily the outermost 2% that gets stretched into the extreme upper tail.

So, while 920 parts per thousand are moving toward the mean, and while another 60 parts per thousand get slightly shifted outward and then stabilize, it’s primarily the outer 20 parts per thousand that bear the brunt of the stretching and elongation that goes with increasing skewness.

### The purpose of analysis

The purpose of analysis is insight, and to gain insight from our data, we have to filter out the probable noise to find any potential signals. This is the objective of statistical analysis. Statistical techniques seek to wrap up the routine variation so we can identify the unusual values within our data.

When working with experimental data that may have taken months to obtain, statisticians tend to very carefully model the routine variation to be sure that the interval associated with the probable noise will be unlikely to contain any of the potential signals that the researchers have just spent time and money trying to find. This approach is like using the table in Figure 6. Fit a model, fix the area to filter out, and then find the exact width of interval to use.

With industrial data, a simpler approach is feasible: Here, we’re trying to do the same thing over and over, and the signals of interest are signals of changes that are substantial enough to have an economic impact. So we bundle up nearly everything as probable noise and react only to potential signals that are clearly not part of the routine variation. From Figure 2 we see that the mean plus or minus three standard deviations will filter out 98% or more of every lognormal distribution. This one-size-fits-all approach will filter out virtually all of the probable noise for any set of data that might be modeled by a lognormal distribution.

### Summary

So, how do you filter out the noise when you think your data are modeled by a lognormal distribution?

You could find bespoke values for the parameters of a lognormal distribution based on your data, and then find the exact interval that will wrap up a specific amount of the probable noise.

Or you could use the one-size-fits-all approach of three-sigma limits. This approach is guaranteed to filter out at least 98% of the probable noise regardless of which lognormal model may fit your data, which is why a process behavior chart for individual values will work even when you think the data might be lognormally distributed.
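As a minimal sketch of the one-size-fits-all approach, the code below computes limits for an individuals (XmR) chart using the conventional moving-range estimate (limits at the mean plus or minus 2.66 times the average moving range). The data here are simulated lognormal values standing in for a real process stream, and the constant 2.66 is the standard XmR scaling factor, not something derived in this article:

```python
import numpy as np

# Simulated "industrial" data: lognormal values as a stand-in for a
# real process stream (parameters chosen arbitrarily for illustration).
rng = np.random.default_rng(1)
data = rng.lognormal(mean=1.0, sigma=0.5, size=200)

center = data.mean()
mr_bar = np.abs(np.diff(data)).mean()        # average moving range
upper = center + 2.66 * mr_bar               # upper natural process limit
lower = center - 2.66 * mr_bar               # lower natural process limit

# Points outside the limits are the potential signals; everything
# inside is bundled up as probable noise.
signals = data[(data > upper) | (data < lower)]
print(f"limits: [{lower:.2f}, {upper:.2f}]  potential signals: {len(signals)}")
```

Consistent with the coverage results above, only a small fraction of routine lognormal values should fall outside such limits.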

Either way, regardless of whether we construct a complex filter or use a simple filter, we’re talking about packaging that portion of the data that will be of little interest. The interesting parts of our data will be the potential signals that are left over after we filter out the noise. This is where the insights will be found. And the best analysis will always be the simplest analysis that allows us to gain these insights.
