Comments
jims 1/24/2001
A nifty way is to transform the data so that it _does_ follow a normal distribution, then do the inference, back-transform, and there you are!
When done properly it is technically valid. When you are in an 'exploratory' mode, it will save your hide.
I did one once where the prediction came out to -4 defects per 10,000 joints. Whoops! Those are the boards that fix themselves :)
Transform the data toward normality, predict, then back-transform to get a prediction of 50 defects per 10,000 joints. The confirmation trial (did I mention that? We need one) came out at 53. Fortuitously close, but certainly what we expected. The prediction was sound.
--Jay (by way of qdigest)
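A minimal sketch of the transform, predict, back-transform idea described above. The numbers are made up purely for illustration (they are not Jay's data), and the log transform and the helper `fit_line` are assumptions chosen to keep the example short. A straight-line fit on the raw scale extrapolates to a negative defect rate, while the same fit on the log scale back-transforms to a positive one.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical data: a process setting vs. defects per 10,000 joints.
setting = [1, 2, 3, 4, 5]
defects = [800, 400, 200, 100, 50]
new_setting = 6

# Fit and predict on the raw scale: the line runs below zero.
a_raw, b_raw = fit_line(setting, defects)
raw_pred = a_raw + b_raw * new_setting

# Fit on the log scale, predict, then back-transform with exp().
a_log, b_log = fit_line(setting, [math.log(d) for d in defects])
log_pred = math.exp(a_log + b_log * new_setting)

print(f"raw-scale prediction:        {raw_pred:.1f}")
print(f"back-transformed prediction: {log_pred:.1f}")
```

The back-transformed value can never be negative because exp() is always positive. As Jay says, a confirmation trial is still needed to check the prediction.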
pyzdek 2/26/2001
I've written an article about performing process capability analysis with non-normal data. It's on the web at http://www.pyzdek.com/non-normal.htm.
Tom
statpoint 2/23/2001
Some statistical software packages include automatic Box-Cox transformations to help achieve approximate normality. Incorrectly assuming normality can have a huge impact on the results, as you demonstrate. But you need to get past the fear of transforming data which so many engineers seem to have when doing statistics. More often than not, the choice of how to measure something is arbitrary (for example, as a time or as a rate, as a weight or as a dimension). How to analyze it is not: it must be analyzed in whatever metric shows approximate normality.
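For readers who have not seen a Box-Cox transformation, here is a rough standard-library sketch of what such packages do. The function names, the grid of candidate lambda values, and the sample data are all assumptions made for illustration; a real package (for example `scipy.stats.boxcox`) uses a proper optimizer rather than a coarse grid.

```python
import math

def boxcox(xs, lam):
    """Box-Cox transform of positive data; lam == 0 means the natural log."""
    if abs(lam) < 1e-12:
        return [math.log(x) for x in xs]
    return [(x ** lam - 1.0) / lam for x in xs]

def inverse_boxcox(y, lam):
    """Back-transform a single value from the Box-Cox scale."""
    if abs(lam) < 1e-12:
        return math.exp(y)
    return (lam * y + 1.0) ** (1.0 / lam)

def boxcox_loglik(xs, lam):
    """Profile log-likelihood of lam, assuming normality after the transform."""
    n = len(xs)
    ys = boxcox(xs, lam)
    mean = sum(ys) / n
    var = sum((y - mean) ** 2 for y in ys) / n
    return (lam - 1.0) * sum(math.log(x) for x in xs) - 0.5 * n * math.log(var)

def best_lambda(xs):
    """Pick the lam on a coarse grid from -2 to 2 with the highest likelihood."""
    grid = [i / 10 for i in range(-20, 21)]
    return max(grid, key=lambda lam: boxcox_loglik(xs, lam))

# Hypothetical right-skewed measurements (must all be positive).
data = [1.2, 1.5, 1.9, 2.4, 3.1, 4.8, 7.5, 12.0]

lam = best_lambda(data)
transformed = boxcox(data, lam)
round_trip = [inverse_boxcox(y, lam) for y in transformed]

print("chosen lambda:", lam)
print("round trip ok:", all(abs(a - b) < 1e-9 for a, b in zip(data, round_trip)))
```

The analysis is done on `transformed`, and any estimate or interval limit is carried back to the original units with `inverse_boxcox`.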
statpoint 2/23/2001
P.S. When I said "you" need to get past the fear, I meant we all need to help OTHERS get past the fear. So many people think we are trying to pull a fast one if we transform the data.
qualityvision 1/22/2001
This depends on the data itself and on the confidence level the user wishes to achieve. The distribution type can be of great help in determining which methodology to employ, and it can give clues about what could be happening with that set of data. Once these preliminary ideas have been applied to the data, statistical tests can be used to determine where to go next. Usually the experimenter will have a good idea of how the experimental test data will behave; I am assuming the experimenter is familiar with the type of test or process from which the data came. Data from a particular type of test typically have recognizable characteristics, so much so that other work on that type of test can be researched and used. Following in the footsteps of others is especially useful when the number of test results is limited, as in destructive testing. In closing, there is a series of steps and questions to work through, and they depend on the data and information available. One good free software package called EasyStat is located at URL: http://www.geocities.com/Tokyo/Ginza/1276/
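As one possible first look at the "clues" a distribution gives, the sketch below computes moment-based skewness for a sample before and after a log transform. The sample values and the helper name `skewness` are hypothetical, and this is only a quick screen, not a formal normality test.

```python
import math

def skewness(xs):
    """Moment-based sample skewness; near 0 for symmetric data."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed test results (positive values only).
sample = [1, 2, 4, 8, 16, 32, 64]

raw_skew = skewness(sample)
log_skew = skewness([math.log(x) for x in sample])

print(f"skewness on the raw scale: {raw_skew:.2f}")
print(f"skewness on the log scale: {log_skew:.2f}")
```

A clearly positive value on the raw scale that drops toward zero after the transform suggests the transformed metric is the more reasonable one to analyze. With only a handful of results, as in destructive testing, such a check is rough at best.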