An Alternative Test for Randomness of Error Terms in a Regression Model

There’s more than one way to monitor key variables

Regression analysis is used in a variety of manufacturing applications. An example of such an application would be to learn the effect of process variables on output quality variables. This allows the process control people to monitor those key variables and keep the output variables at the desired level.

Regression analysis is also used in design of experiments (DOE) to identify the key process variables that have the most effect on the quality of the end product or service. In addition, if the process is autocorrelated and we want to perform statistical process control (SPC), regression models (i.e., autoregressive models) could help model the autocorrelation in the process and help modify the SPC application accordingly so that the right questions can be tested on the control charts.

…


Comments

Question about +/- 3

I'm curious why you would promote +/- 3 rather than a 95% confidence interval for the test statistic (which would be essentially +/- 2). We're much more conservative with a single observation, whatever the subgroup size, so +/- 3 makes sense there. But it doesn't make sense to me here, since in most cases our default confidence level for a test statistic is 95%. Why be that conservative (essentially 99.73% confidence for rejecting H0)? Thanks!

Response

Thanks for the comment. You can use +/- 2; we used +/- 3 as an example. Just keep in mind that with +/- 2 the Type I error rate is higher.
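
To put numbers on that trade-off, here is a minimal sketch. It assumes the test statistic is approximately standard normal under the null hypothesis; the function name `two_sided_alpha` is illustrative and does not come from the article.

```python
import math

def two_sided_alpha(k: float) -> float:
    """Two-sided tail probability P(|Z| > k) for a standard normal Z.

    Assumption: the test statistic is approximately N(0, 1) under H0.
    """
    # P(|Z| > k) = 2 * (1 - Phi(k)) = erfc(k / sqrt(2))
    return math.erfc(k / math.sqrt(2))

for k in (2, 3):
    print(f"+/-{k} limits: Type I error rate = {two_sided_alpha(k):.4f}")
```

Under that normality assumption, +/- 2 limits give a Type I error rate of about 4.6%, and +/- 3 limits give about 0.27% (the 99.73% figure quoted in the question).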

Power of Randomness Tests

Tukey (1991) observed:

Statisticians classically asked the wrong question - and were willing to answer with a lie, one that was often a downright lie. They asked "Are the effects of A and B different?" and they were willing to answer "no".

All we know about the world teaches us that the effects of A and B are always different - in some decimal place - for any A and B. (p. 100)

Most of us know the procedure for guaranteeing that we find a statistically significant difference. We select a huge sample and evaluate the data from this sample. The same effect size (absolute difference between s² and q²) could be statistically significant for a sample of n=50 and not statistically significant for a sample of n=10. The p value provides no information about the magnitude of the effect size (e.g., Cohen, 1994; Cook, 2010).
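
The sample-size point can be sketched numerically. The article's equations are not reproduced in this thread, so the statistic below is an assumption: the familiar von Neumann-type form z = (1 - q²/s²) / sqrt((n - 2)/(n² - 1)). The variance values are hypothetical and chosen only for illustration.

```python
import math

def z_statistic(s2: float, q2: float, n: int) -> float:
    """Assumed test statistic: (1 - q2/s2) scaled by its null standard error.

    Assumption: the standard error is sqrt((n - 2) / (n**2 - 1)), the usual
    approximation for the mean-square-successive-difference ratio.
    """
    return (1.0 - q2 / s2) / math.sqrt((n - 2) / (n ** 2 - 1))

# Hypothetical variances: the same effect size is used for both sample sizes.
s2, q2 = 1.0, 0.6

for n in (10, 50):
    z = z_statistic(s2, q2, n)
    print(f"n={n}: z={z:.2f}, outside +/-2 limits: {abs(z) > 2}")
```

With these made-up values, the identical difference between s² and q² stays inside +/- 2 limits at n=10 and falls outside them at n=50, which is the behavior described above.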

Thus, the question "Is the difference between s² (the 'regular variance' in equation 1, calculated from the sum of squared deviation scores divided by the degrees of freedom) and q² (the 'MSSD variance' in equations 2 and 3, calculated from the mean square successive differences) statistically significant?" is not, to my mind, the most important question.
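
For readers who want to see the two variance estimates side by side, here is a minimal sketch. The article's equations 1 to 3 are not shown in this thread, so the divisor 2(n - 1) for q² is an assumption based on the usual MSSD convention, and the function names are illustrative.

```python
def regular_variance(e):
    """s2: sum of squared deviations from the mean, divided by n - 1."""
    n = len(e)
    mean = sum(e) / n
    return sum((x - mean) ** 2 for x in e) / (n - 1)

def mssd_variance(e):
    """q2: sum of squared successive differences, divided by 2 * (n - 1).

    Assumption: this divisor follows the common MSSD convention.
    """
    n = len(e)
    return sum((b - a) ** 2 for a, b in zip(e, e[1:])) / (2 * (n - 1))

# Hypothetical residuals with a steady upward drift (not random).
residuals = [1.0, 2.0, 3.0, 4.0]
print(regular_variance(residuals), mssd_variance(residuals))
```

For a drifting series like this one, q² comes out well below s²; for residuals that are truly random, the two estimates should be close.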

The question that interests me more is "What is the magnitude of a meaningful (concerning) difference between s² and q²?" In other words, "What is the magnitude of a 'meaningful' effect size?"

The next question that interests me is "What is the power of the test?" Power refers to the proportion of times we would reject the null hypothesis when the effect size has at least a "meaningful" magnitude.
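
One way to approach the power question is simulation. The sketch below is only an illustration under stated assumptions: the errors follow a first-order autoregressive process with coefficient `phi`, the statistic has the von Neumann-type form z = (1 - q²/s²) / sqrt((n - 2)/(n² - 1)), and the null hypothesis is rejected when |z| exceeds the chosen limit. None of these choices, nor the function names, are taken from the article.

```python
import math
import random

def z_statistic(e):
    """Assumed statistic: (1 - q2/s2) divided by sqrt((n - 2) / (n**2 - 1))."""
    n = len(e)
    mean = sum(e) / n
    ss = sum((x - mean) ** 2 for x in e)               # sum of squared deviations
    ssd = sum((b - a) ** 2 for a, b in zip(e, e[1:]))  # squared successive differences
    c = 1.0 - ssd / (2.0 * ss)                         # equals 1 - q2/s2
    return c / math.sqrt((n - 2) / (n ** 2 - 1))

def ar1_series(n, phi, rng, burn_in=50):
    """Simulate n values of x[t] = phi * x[t-1] + noise, after a burn-in."""
    x, out = 0.0, []
    for i in range(n + burn_in):
        x = phi * x + rng.gauss(0.0, 1.0)
        if i >= burn_in:
            out.append(x)
    return out

def estimated_power(n, phi, limit=3.0, reps=2000, seed=1):
    """Proportion of simulated series for which |z| exceeds the limit."""
    rng = random.Random(seed)
    rejections = sum(
        abs(z_statistic(ar1_series(n, phi, rng))) > limit for _ in range(reps)
    )
    return rejections / reps

for n in (10, 50):
    print(f"n={n}, phi=0.6: estimated power = {estimated_power(n, 0.6):.3f}")
```

Choosing `phi` is where the "meaningful effect size" question enters: the simulation can only report power against whatever degree of autocorrelation one decides is worth detecting.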

Is there any information concerning the power of tests of the randomness of error terms?

References

Cohen, J. (1994). The Earth is round (p<0.05). American Psychologist, 49(12), 997–1003.

Cook, C. (2010). Five per cent of the time it works 100 per cent of the time: the erroneousness of the P value. Journal of Manual and Manipulative Therapy, 18(3), 123–125.

Tukey, J. W. (1991). The philosophy of multiple comparisons. Statistical Science, 6(1), 100–116.

Response

Thanks for the comments. They hold for all significance tests. The American Statistical Association, for example, recently published a statement on the misuse of the p-value (“The ASA's statement on p-values: context, process, and purpose,” by Ronald L. Wasserstein & Nicole A. Lazar, The American Statistician, March 2016). The intent of this paper was not to discuss “the magnitude of a meaningful difference between s² and q²” or to analyze the power of the proposed method. It merely offers an alternative to the Durbin-Watson (DW) test. The issues you raised, including the power of the test, are well taken and could be the topic of another research paper.

Type I error

I recognize the Type I error. I actually cited it in my response, which can't be said for the article. My question remains. Why promote +/- 3 for a hypothesis test? I get it on a single value basis, but not on a hypothesis test of a test statistic. What you have is a test statistic. Most of the world chooses 90%, 95% and 99% for confidence levels in hypothesis tests. Your article doesn't suggest anything about alternatives to +/-3 so the inference is that you believe that +/- 3 is the best choice (or 99.73%). Why?

Response

We did not say that we believe +/- 3 (or 99.73%) is the best choice; +/- 3 sigma limits are commonly used in SPC applications (e.g., control charts). When Walter Shewhart introduced control charts, he stated that +/- 3 sigma limits balance the costs of Type I and Type II errors. That said, one can use whatever limits one prefers.