We all know what happens when we assume. For example, traditional designed experiments assume that residuals, the differences between the actual and modeled data, follow the normal distribution (as seen in figure 1). These experiments include t tests, analysis of variance (ANOVA), factorial designs, and linear regression. ANOVA is said to be robust to this assumption—that is, it will work even if the normality requirement is not strictly met. However, there are limits even to this.
The results are far more trustworthy, and the test may be better able to detect differences between treatments or levels, if a transformation can make the residuals conform to the normal distribution.
The adage about “assume” certainly holds if we perform a designed experiment and fail to test the residuals for normality. The same is true if we report a process performance index without making sure that the process data follow the assumed distribution; the estimated nonconforming fraction can easily be off by orders of magnitude.
A wide variety of tests for normality are available, of which the Anderson-Darling test is among the most powerful; that is, it’s best able to detect departures from normality. Consider, for example, this simulated experiment to reduce impurities (measured in ppm) in a process. Impurities generally follow a gamma distribution, which was the basis of this simulation.
Table 1. Simulated impurity data (ppm)

Control | Experiment
0.931 | 0.091
0.334 | 0.354
1.597 | 0.140
0.912 | 0.291
0.883 | 1.428
0.373 | 0.043
2.503 | 0.755
1.090 | 0.445
0.367 | 0.438
0.274 | 0.030
1.140 | 1.152
0.390 | 0.253
2.318 | 0.044
1.437 | 0.321
0.370 | 1.073
0.807 | 0.417
2.105 | 0.743
1.108 | 0.536
0.784 | 0.921
2.241 | 0.456
0.648 | 0.328
0.536 | 0.548
2.808 | 0.605
0.355 | 0.408
0.858 | 0.191
0.202 | 0.216
0.644 | 0.807
0.327 | 0.297
0.649 | 0.046
1.778 | 0.027
If we perform one-way ANOVA, StatGraphics returns an F statistic of 14.80 with one numerator and 58 denominator degrees of freedom. It’s not adequate, however, to simply put the data into a software package and “assume” that the conclusions are correct.
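For readers who want to reproduce this outside StatGraphics, here is a from-scratch one-way ANOVA in Python on the table 1 data (assuming the table pairs control and experiment values row by row). No statistics package is needed for the F statistic itself.

```python
# One-way ANOVA on the table 1 impurity data, computed from first principles.
control = [0.931, 0.334, 1.597, 0.912, 0.883, 0.373, 2.503, 1.090, 0.367, 0.274,
           1.140, 0.390, 2.318, 1.437, 0.370, 0.807, 2.105, 1.108, 0.784, 2.241,
           0.648, 0.536, 2.808, 0.355, 0.858, 0.202, 0.644, 0.327, 0.649, 1.778]
experiment = [0.091, 0.354, 0.140, 0.291, 1.428, 0.043, 0.755, 0.445, 0.438, 0.030,
              1.152, 0.253, 0.044, 0.321, 1.073, 0.417, 0.743, 0.536, 0.921, 0.456,
              0.328, 0.548, 0.605, 0.408, 0.191, 0.216, 0.807, 0.297, 0.046, 0.027]

def mean(xs):
    return sum(xs) / len(xs)

groups = [control, experiment]
grand = mean(control + experiment)

# Between-group sum of squares: how far each group mean sits from the grand mean
ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
# Within-group sum of squares: scatter of each value around its own group mean
ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

df_between = len(groups) - 1                            # 1
df_within = sum(len(g) for g in groups) - len(groups)   # 58

f_stat = (ss_between / df_between) / (ss_within / df_within)
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")   # F(1, 58) = 14.80
```

This reproduces the F statistic reported above, but as the next section explains, the F test alone is not the whole story.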
Always check the residuals
Assessment of the residuals is a mandatory part of any designed experiment. The residual for the ith measurement is defined as ei = yi − ŷi, where ŷi is the value predicted from the model. The latter can be a function of the regressor variables (Xi) in linear regression or, in ANOVA, the average for the combination of treatments that produced the response yi. In this case, the residual for a control datum is the individual measurement minus the average of all the controls; the residual for an experimental measurement is that measurement minus the average of all the experimental measurements. The residuals should therefore reflect random error exclusively, and these random errors should follow a normal distribution with a mean of zero. Figure 2 shows their histogram, and figure 3 their normal probability plot. Both figures are from StatGraphics, as is figure 4.
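As a minimal illustration of this definition, the sketch below computes the residuals for the first five control values from table 1 (a subset, just to keep the example short); each residual is the measurement minus its group average.

```python
# Residuals in ANOVA: each measurement minus the average of its own
# treatment group. Illustrated with the first five control values from table 1.
values = [0.931, 0.334, 1.597, 0.912, 0.883]
group_mean = sum(values) / len(values)        # 0.9314
residuals = [x - group_mean for x in values]

# The residuals sum to zero by construction; whether they follow a normal
# distribution is what the goodness-of-fit tests below must check.
print(residuals)
```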
Figure 1.
Figure 2.
Another good check on a designed experiment is a plot of the residuals vs. the level of the factor, as seen in figure 4. Because the residual is simply the difference between the measurement and the average of its level, choice of level should not affect the distribution of the residuals. Figure 4 shows clearly, however, that there’s less spread in the residuals from the experiment than in those from the control.
Figure 3.
A visual assessment of figures 2 through 4 suggests that the residuals do not meet the requirements of the normality assumption. The modified Anderson-Darling test statistic of 1.774 has a p value of 0.000154, which means we can reject the normality assumption with better than 99.9-percent confidence. The Shapiro-Wilk test also rejects the normality assumption. In addition, the chi-square test for goodness of fit returns 13.03 with five degrees of freedom (after combining the end cells to meet the requirement that the expected count be five or more in each cell), which also soundly rejects the hypothesis that the residuals follow the normal distribution.
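For readers without StatGraphics, here is a from-scratch sketch of the Anderson-Darling normality test on the pooled residuals, using the common small-sample modification for the case where the mean and standard deviation are estimated from the data. Implementation details vary between packages, so the value may not match StatGraphics’ 1.774 exactly, but anything well above the roughly 1.0 critical value rejects normality at the 1-percent level.

```python
import math

control = [0.931, 0.334, 1.597, 0.912, 0.883, 0.373, 2.503, 1.090, 0.367, 0.274,
           1.140, 0.390, 2.318, 1.437, 0.370, 0.807, 2.105, 1.108, 0.784, 2.241,
           0.648, 0.536, 2.808, 0.355, 0.858, 0.202, 0.644, 0.327, 0.649, 1.778]
experiment = [0.091, 0.354, 0.140, 0.291, 1.428, 0.043, 0.755, 0.445, 0.438, 0.030,
              1.152, 0.253, 0.044, 0.321, 1.073, 0.417, 0.743, 0.536, 0.921, 0.456,
              0.328, 0.548, 0.605, 0.408, 0.191, 0.216, 0.807, 0.297, 0.046, 0.027]

def residuals(group):
    m = sum(group) / len(group)
    return [x - m for x in group]

res = sorted(residuals(control) + residuals(experiment))
n = len(res)

# Standardize with the sample mean (zero by construction) and sample
# standard deviation of the pooled residuals.
mu = sum(res) / n
sd = math.sqrt(sum((r - mu) ** 2 for r in res) / (n - 1))
z = [(r - mu) / sd for r in res]

def phi(x):
    # Standard normal cumulative distribution function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Anderson-Darling statistic for normality
a2 = -n - sum((2 * i + 1) * (math.log(phi(z[i])) + math.log(1.0 - phi(z[n - 1 - i])))
              for i in range(n)) / n
# Small-sample modification for estimated mean and variance
a2_mod = a2 * (1.0 + 0.75 / n + 2.25 / n ** 2)
print(f"modified A-D statistic: {a2_mod:.3f}")
```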
The experiment was, in fact, superior to the control in terms of fewer impurities, as confirmed by ANOVA on the cube-root transformation of the data. The cube root is known to be a good transformation for the gamma distribution, and nothing about the residuals of the transformed data suggested that they might be nonnormal. The nonparametric Kruskal-Wallis rank sum test, which relies on no assumptions about the distribution of the data or their residuals, delivers the same conclusion.
I recall, however, an actual workplace example that involved four factors, with defects as the response variable. Assessment of the factorial design failed to reject the null hypothesis for all four factors, even though it was obvious from the data that some level combinations delivered far more defects than others. Needless to say, the normal probability plot and histogram of the residuals showed that there was a problem. The recommended square-root transformation for defect (Poisson) data, however, resulted in textbook-perfect results that identified the significant factors, as well as non-rejection of the normality assumption for the residuals.
I use “non-rejection” deliberately because we can never prove the null hypothesis that the residuals follow the normal distribution, just as acquittal in a criminal trial does not prove the defendant innocent. The basic idea is that we start by assuming that the residuals follow the normal distribution but, to avoid the situation depicted in figure 1, we perform goodness-of-fit tests to determine whether this hypothesis can be rejected beyond a specified reasonable doubt. The threshold, i.e., the significance level or Type I risk, is generally 5 percent, although a judgment call could be made even if the p value is somewhat greater.
What if the assumptions aren’t met?
If we must reject the null hypothesis that the residuals follow the normal distribution, two remedies are available. The first is to use a nonparametric method such as the previously mentioned Kruskal-Wallis test, which relies solely on the ranks (ordinal data) of the control and experimental measurements. The drawback is that a nonparametric test is less powerful, i.e., it’s less able to detect differences between the experiment and the control than the tests that do rely on the normality assumption.
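As a sketch of how little machinery the Kruskal-Wallis test needs, the following Python ranks the pooled table 1 data and computes the H statistic directly. The small correction for tied ranks is omitted for brevity, which makes the test slightly conservative; H is compared to a chi-square distribution with one degree of freedom, whose 1-percent critical value is 6.63.

```python
control = [0.931, 0.334, 1.597, 0.912, 0.883, 0.373, 2.503, 1.090, 0.367, 0.274,
           1.140, 0.390, 2.318, 1.437, 0.370, 0.807, 2.105, 1.108, 0.784, 2.241,
           0.648, 0.536, 2.808, 0.355, 0.858, 0.202, 0.644, 0.327, 0.649, 1.778]
experiment = [0.091, 0.354, 0.140, 0.291, 1.428, 0.043, 0.755, 0.445, 0.438, 0.030,
              1.152, 0.253, 0.044, 0.321, 1.073, 0.417, 0.743, 0.536, 0.921, 0.456,
              0.328, 0.548, 0.605, 0.408, 0.191, 0.216, 0.807, 0.297, 0.046, 0.027]

pooled = sorted(control + experiment)

def avg_rank(x):
    # Average rank of value x in the pooled sample (ties get the mean rank)
    first = pooled.index(x) + 1
    count = pooled.count(x)
    return first + (count - 1) / 2.0

n = len(pooled)
h = 0.0
for group in (control, experiment):
    rank_sum = sum(avg_rank(x) for x in group)
    h += rank_sum ** 2 / len(group)
h = 12.0 / (n * (n + 1)) * h - 3 * (n + 1)
print(f"Kruskal-Wallis H = {h:.2f}")  # compare to chi-square, 1 df
```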
Another remedy is to use the Box-Cox transformation, for which StatGraphics and Minitab both have built-in procedures. The basic idea is as follows: Use

y(λ) = (y^λ − 1) / (λ × g^(λ−1))

as the response variable in your ANOVA or regression problem, where g is the geometric mean of the responses.
The geometric mean is the 1/n root of the product of n data, g = (y1 × y2 × … × yn)^(1/n), or alternatively g = exp[(1/n) × Σ ln(yi)].
Find the lambda that minimizes the sum of squares of errors, and use it to transform the response variables. If lambda equals zero, use the natural logs of the data. Note, however, that all the responses must be positive because you can’t take a negative number to a non-integral power. If we do this with the data in table 1, we get figure 5 (from Excel). This suggests that the optimum transformation is roughly the 0.28 power rather than the cube root, although the cube root worked quite well.
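The lambda search itself is easy to sketch. The following Python applies the geometric-mean-normalized Box-Cox transform above to the table 1 data and scans a grid of lambda values for the one that minimizes the ANOVA error sum of squares. A finer grid or a proper optimizer would refine the answer, and the exact optimum depends on implementation details.

```python
import math

control = [0.931, 0.334, 1.597, 0.912, 0.883, 0.373, 2.503, 1.090, 0.367, 0.274,
           1.140, 0.390, 2.318, 1.437, 0.370, 0.807, 2.105, 1.108, 0.784, 2.241,
           0.648, 0.536, 2.808, 0.355, 0.858, 0.202, 0.644, 0.327, 0.649, 1.778]
experiment = [0.091, 0.354, 0.140, 0.291, 1.428, 0.043, 0.755, 0.445, 0.438, 0.030,
              1.152, 0.253, 0.044, 0.321, 1.073, 0.417, 0.743, 0.536, 0.921, 0.456,
              0.328, 0.548, 0.605, 0.408, 0.191, 0.216, 0.807, 0.297, 0.046, 0.027]

data = control + experiment
# Geometric mean: exp of the average log (all responses must be positive)
gmean = math.exp(sum(math.log(y) for y in data) / len(data))

def boxcox(y, lam):
    # Normalized Box-Cox transform; the geometric-mean factor keeps the
    # error sums of squares comparable across lambda values
    if abs(lam) < 1e-12:
        return gmean * math.log(y)
    return (y ** lam - 1.0) / (lam * gmean ** (lam - 1.0))

def sse(lam):
    # ANOVA error sum of squares: scatter of the transformed values
    # around their own group means
    total = 0.0
    for group in (control, experiment):
        t = [boxcox(y, lam) for y in group]
        m = sum(t) / len(t)
        total += sum((v - m) ** 2 for v in t)
    return total

# Coarse grid search over -1.00 to 2.00 for the SSE-minimizing lambda
lams = [i / 100.0 for i in range(-100, 201)]
best = min(lams, key=sse)
print(f"optimal lambda ~ {best:.2f}")
```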
Figure 4.
StatGraphics, in fact, optimizes lambda at 0.282 (as seen in figure 6) if the ANOVA problem is presented as a regression problem with indicator variables—that is, x = 0 for the control and 1 for the experiment.
Figure 5.
Another example involves linear regression to find viscosity (a measure of resistance to flow) as a function of temperature. Liquids become less viscous as their temperature increases, as modeled by the Andrade equation, viscosity = A × exp(B/T), where T is the absolute temperature (Kelvin or Rankine), and A and B are constants. It is, in fact, similar to the Arrhenius equation for chemical reaction rate constants, except the argument of the exp operator is negative.
If we pretend to know only that viscosity is somehow related to the reciprocal temperature, StatGraphics optimizes lambda as 0.02 for 10 data pairs for glycerol. This suggests that the natural log of the viscosity is, in fact, the best transformation, which takes us back to the Andrade Equation.
In summary, then, we should never assume. Instead, keeping figure 1 in mind, we should test the assumption that the residuals from the experiment follow a normal distribution. If they don’t, our conclusions are likely to be invalid. If an optimized transformation won’t do the job, though, a nonparametric method is the next best alternative.