One thing burned into the brains of those who survive a statistics class is that you have to specify an alpha-level before you do anything statistical. And when it comes to statistical inference, that lesson is correct. But just what does the alpha-level represent? What does it mean in practice? Read on to find out.
The decision problem
With experimental data, the essence of statistical analysis is the decision regarding the relationship between a factor and some response variable. For example, if changes in the level of Factor A do not result in changes in the level of the Response Variable Y, then Factor A does not have an effect upon Variable Y. (This is usually called the null state.) If changes in Factor A do result in changes in the level of Variable Y, then Factor A does have an effect upon the response variable. (Call this the alternate state.) The experimenter must decide which state of nature exists. Thus we have the decision matrix shown in figure 1. On the left side we have the two possible decisions we can make, and across the top we have the two states of nature. The resulting two-by-two grid contains two correct decisions and two incorrect decisions.
A study will be said to be conservative when the analysis is performed in such a way that there should be very few false alarms. Here it is more important to be right than to find all of the signals, so we want to be confident that we are likely to be correct when we say Factor A affects the Response Y.
Given a finite amount of data, the only way to minimize false alarms will be to minimize the number of times that Decision No. 1 is made. By avoiding the first decision, we will automatically minimize the opportunities for a false alarm. A conservative analysis is appropriate for experimental data when confirmation from subsequent experiments is not readily available.
A study will be said to be exploratory when the analysis is performed in such a way that there should be few missed signals. Here it is more important to find all of the signals than to avoid false alarms, so we want to be confident that we are likely to be correct when we say that Factor A does not affect Response Y.
Given a finite amount of data, we obtain an exploratory analysis by traditionally using decision rules which encourage the first decision: “Factor A affects Y.” By making Decision No. 1 more often, Decision No. 2 will be made less often, and there will be fewer opportunities for a missed signal.
Equation 1
In order to characterize the four outcomes for the decision matrix in figure 1 more completely, let Θ represent the a priori probability that the null state is true. This means that the a priori probability for the alternate state will be [ 1 – Θ ].
Probability that null state is true = Θ
Probability that alternate state is true = 1 – Θ
Equation 2
Furthermore, under the condition that the null state is true, let the probability of Decision No. 1 be denoted by α. This will result in a conditional probability for Decision No. 2 of [ 1 – α ].
Probability of making Decision No. 1 given that null state is true = α
Probability of making Decision No. 2 given that null state is true = 1 – α
Equation 3
Finally, under the condition that the alternate state is true, let the probability of Decision No. 2 be denoted by β. This will result in a conditional probability for Decision No. 1 of [ 1 – β ].
Probability of making Decision No. 1 given that alternate state is true = 1 – β
Probability of making Decision No. 2 given that alternate state is true = β
Using the three values of α, β, and Θ we can characterize the probabilities of each of the four outcomes in the decision matrix as shown in figure 2.
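The four joint probabilities of figure 2 can be written out directly. A minimal sketch, using illustrative values of α, β, and Θ that are assumptions for the example (they do not come from the article):

```python
# Joint probabilities for the four outcomes of the decision matrix,
# for illustrative values of alpha, beta, and Theta.
alpha = 0.05   # Pr{ Decision No. 1 | null state }      (false-alarm rate)
beta = 0.20    # Pr{ Decision No. 2 | alternate state } (missed-signal rate)
theta = 0.50   # a priori Pr{ null state }

outcomes = {
    "false_alarm": alpha * theta,               # Decision No. 1, null state true
    "signal_found": (1 - beta) * (1 - theta),   # Decision No. 1, alternate state true
    "no_effect_correct": (1 - alpha) * theta,   # Decision No. 2, null state true
    "missed_signal": beta * (1 - theta),        # Decision No. 2, alternate state true
}

for name, p in outcomes.items():
    print(f"{name}: {p:.3f}")

# The four outcomes are exhaustive and mutually exclusive,
# so the probabilities sum to 1.
print(f"total: {sum(outcomes.values()):.3f}")
```

Since the four cells cover every possible combination of decision and state of nature, their probabilities always sum to one, whatever values are chosen for α, β, and Θ.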
Equation 4
Before we make a decision we have four possible outcomes. Two of these are correct, and two are incorrect. Thus, prior to making a decision, the probability of making an incorrect decision is:
Pr{ False Alarm } + Pr{ Missed Signal } = αΘ + β (1-Θ)
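Equation 4 is simple enough to compute directly. A sketch, with illustrative values of α, β, and Θ chosen for the example:

```python
# Prior probability of an incorrect decision (equation 4):
#   Pr{ False Alarm } + Pr{ Missed Signal } = alpha*Theta + beta*(1 - Theta)
def prior_error_probability(alpha, beta, theta):
    return alpha * theta + beta * (1 - theta)

# With alpha = 0.05, beta = 0.20, and Theta = 0.50, the two error
# terms are 0.025 and 0.100, for a total of 0.125.
print(prior_error_probability(0.05, 0.20, 0.5))
```

Note how the weight Θ determines which error term dominates: when Θ is large the αΘ term matters most, and when Θ is small the β(1 – Θ) term matters most.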
Other considerations require that β must be less than [ 1 – α ]. Because of this, for any fixed amount of data, any increase in α will result in a decrease in β, and any decrease in α will result in an increase in β. This means that you cannot simultaneously reduce both terms in equation 4 except by collecting more data. However, equation 4 shows the rationale behind exploratory and conservative analyses.
If we believe the null state is correct, then Θ will be large, and to avoid a false alarm we will want a small value for α.
If we believe the alternate state is correct, then [ 1 – Θ ] will be large, and to avoid a missed signal we will want a small value for β. Since β must be less than [ 1 – α ], we can always make β smaller by choosing a larger value for α.
If we must perform our analysis in such a way that we can convince a skeptic, then we will adopt the skeptic’s position, assume that the null state is true, and perform a conservative analysis using a small value for α.
If we have the luxury of easily confirming our decision when we make Decision No. 1, so the problem of an occasional false alarm is not so severe, we might take a more exploratory approach and use a larger alpha-level in order to reduce the risk of missing a signal.
What does the alpha-level do?
As noted in the last two statements, the alpha-level specifies our attitude to the analysis. We use the value of alpha to characterize our approach to the decision problem. The value of alpha does not, by itself, characterize any aspect of the final results of our decision-making process.
Consider, for example, what happens when we make Decision No. 1. Here we hope that the alternate state is true, and according to the calculus of probabilities the probability of having made a correct decision is:
Equation 5
Prob{ Alternate State is true given Decision No. 1 } = (1 – β)(1 – Θ) / [ αΘ + (1 – β)(1 – Θ) ]
while the probability that we have made an incorrect decision is:
Equation 6
Prob{ Null State is true given Decision No. 1 } = αΘ / [ αΘ + (1 – β)(1 – Θ) ]
On the other hand, when we make Decision No. 2 we hope that the null state is true, and the probability that we have made a correct decision becomes:
Equation 7
Prob{ Null State is true given Decision No. 2 } = (1 – α)Θ / [ (1 – α)Θ + β(1 – Θ) ]
while the probability that we have made an incorrect decision is:
Equation 8
Prob{ Alternate State is true given Decision No. 2 } = β(1 – Θ) / [ (1 – α)Θ + β(1 – Θ) ]
The point of writing out these four formulas is to make it clear that none of these four probabilities is uniquely determined by the value of α. Therefore, when we call α the “significance level for the test,” we are saying one thing while meaning something else. Whenever we specify the alpha-level for a decision we are not defining the probability that we will make the right or wrong decision; rather, we are characterizing our attitude toward the analysis by specifying our willingness to risk a false alarm under the condition that the null state is true.
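The dependence of these conditional probabilities on β and Θ, and not on α alone, can be illustrated numerically. A minimal sketch, with illustrative values of α, β, and Θ (the formulas follow from the definitions in equations 1, 2, and 3 by the usual calculus of probabilities):

```python
# Conditional probabilities of equations 5 through 8, built from the
# definitions of alpha, beta, and Theta. The numeric values below are
# illustrative assumptions, not values taken from any real study.
def decision_probabilities(alpha, beta, theta):
    # Total probability of each decision (denominators of eqs. 5-8).
    p_decision1 = alpha * theta + (1 - beta) * (1 - theta)
    p_decision2 = (1 - alpha) * theta + beta * (1 - theta)
    return {
        "alternate_given_d1": (1 - beta) * (1 - theta) / p_decision1,  # eq. 5
        "null_given_d1": alpha * theta / p_decision1,                  # eq. 6
        "null_given_d2": (1 - alpha) * theta / p_decision2,            # eq. 7
        "alternate_given_d2": beta * (1 - theta) / p_decision2,        # eq. 8
    }

# With the same alpha = 0.05, the probability of being right after
# Decision No. 1 still changes with Theta -- alpha alone does not fix it.
for theta in (0.5, 0.9):
    probs = decision_probabilities(alpha=0.05, beta=0.20, theta=theta)
    print(f"Theta = {theta}: Pr{{alternate | Decision 1}} = "
          f"{probs['alternate_given_d1']:.3f}")
```

Holding α at 5 percent while moving Θ from 0.5 to 0.9 changes the probability in equation 5 substantially, which is exactly the point of the paragraph above.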
In traditional inference techniques we have to specify a value for alpha in order to use a probability model to obtain critical values that we can use to make the decision. In finding these critical values we usually assume that:
1. The measurements do not display any discreteness
2. The measurements come from a process that does not change over time
3. The measurements are normally and independently distributed
Under these assumptions it is possible to obtain decision rules that will yield specific theoretical values for α and β. Unfortunately, actual data are always discrete, they are never normally distributed, and processes do change over time. However, given the necessity of filtering out at least some of the noise, and lacking any other guidelines on how to obtain a decision rule for separating potential signals from background noise, we use decision rules based upon the probability models. The probability models can be thought of as providing a theoretical approximation that can be used as a guide for practice. The decision rules obtained in this manner are probably not the optimum decision rules for any given real-world situation, but they at least have the property that they are logical and appropriate in the theoretical world, and they are reasonably efficient in practice.
Thus, in describing the decision rules based upon probability models, we have to specify a value for alpha. In practice, the alpha-level for a procedure simply tells the reader how the experimenter defined the decision rule for interpreting the data. Conservative analyses will use a small alpha-level, and exploratory analyses will use a large alpha-level. Therefore, while the alpha-level does not actually give the probability of a false alarm, or define the confidence level for a set of results, it does rank the sensitivity of the analysis. For a given data set, larger alpha-levels will yield a more sensitive analysis, while smaller alpha-levels will yield a less sensitive analysis. Conservative analyses will generally use a 1-percent alpha-level. Traditional analyses will generally use a 5-percent alpha-level, and exploratory analyses will generally use a 10-percent alpha-level.
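The sense in which the alpha-level controls the false-alarm rate under the null state can be seen in a small simulation. This is a sketch under simplifying assumptions (a two-sided z-test with known mean and standard deviation, so the three usual normal critical values apply); it is not a recipe for any particular test:

```python
import math
import random

# When the null state is true, a decision rule built on a given
# alpha-level triggers Decision No. 1 at roughly that rate.
random.seed(1)

# Two-sided normal critical values for the alpha-levels named above.
critical = {0.01: 2.576, 0.05: 1.960, 0.10: 1.645}
n, trials = 25, 10_000
observed = {}

for alpha, z_crit in critical.items():
    false_alarms = 0
    for _ in range(trials):
        # Null state: the data are pure noise around a known mean of 0
        # with known standard deviation 1.
        sample = [random.gauss(0, 1) for _ in range(n)]
        z = (sum(sample) / n) * math.sqrt(n)  # z = xbar / (sigma / sqrt(n))
        if abs(z) > z_crit:
            false_alarms += 1
    observed[alpha] = false_alarms / trials
    print(f"alpha = {alpha:.2f}: observed false-alarm rate = {observed[alpha]:.3f}")
```

The observed rates track the nominal 1-percent, 5-percent, and 10-percent levels, and the larger alpha-levels plainly give the more trigger-happy, more sensitive rule.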
In the industrial context, where today’s experimental results can often be verified by tomorrow’s production, most experimental studies should use the exploratory approach. With the exploratory approach you are less likely to miss any effects that might be of interest, and any false alarms will be quickly identified as such by the accumulation of subsequent data.
In those situations where subsequent experiments are difficult to arrange it is common to build the replication into the experiment by using multiple experimental units in a single study. This model is frequently used in agricultural and biomedical experiments. Here a traditional or conservative approach to analysis is appropriate.
How does the alpha-level differ from the p-value?
Say you have an F-test or a t-test. When performing this test manually you would compare the observed test statistic (F-observed) with a critical value (F-critical). The critical value would depend upon the alpha-level for your test. If the observed test statistic (F-observed) is more extreme than the critical value (F-critical), you would say that you have found a detectable effect at whatever alpha-level you were using. Here you would make Decision No. 1. Otherwise you would have failed to detect an effect and would make Decision No. 2.
Most software packages make the comparison above by reporting a p-value for the test. This p-value is the probability of exceedance for the test statistic. This means that the p-value is a transformed version of the observed test statistic (F-observed). Instead of comparing the observed test statistic with a critical value, the p-value becomes the test statistic and it is directly compared to the alpha-level chosen by the investigator. When the p-value is smaller than the alpha-level you make Decision No. 1, and when the p-value is larger than the alpha-level you make Decision No. 2.
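As a sketch of this comparison, assume a simple two-sided z-test (t- and F-tests work the same way in principle, with their own exceedance probabilities); the function and variable names here are illustrative:

```python
import math

def two_sided_p_value(z_observed):
    # Probability of exceedance under the null state:
    # Pr{ |Z| >= |z_observed| } for a standard normal Z.
    return math.erfc(abs(z_observed) / math.sqrt(2))

def decide(z_observed, alpha):
    p = two_sided_p_value(z_observed)
    # p-value smaller than alpha -> Decision No. 1 (detectable effect);
    # otherwise -> Decision No. 2 (no detectable effect).
    return "Decision No. 1" if p < alpha else "Decision No. 2"

print(two_sided_p_value(1.96))   # approximately 0.05
print(decide(2.3, alpha=0.05))   # Decision No. 1
print(decide(2.3, alpha=0.01))   # Decision No. 2
```

The same observed statistic leads to different decisions at different alpha-levels, which is precisely why the alpha-level characterizes the investigator rather than the data.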
So, once again, the alpha-level defines the investigator’s attitude to the analysis. It defines the user’s cut-off between the potential signals and the probable noise. A p-value that is smaller than the chosen alpha-level will lead to Decision No. 1, and a p-value that is larger than the chosen alpha-level will lead to Decision No. 2.
Neither the p-value nor the alpha-level defines the significance (importance) of the result; they merely allow you to separate the potential signals from the probable noise. They allow you to identify the detectable signals. With a lot of data you may detect trivial signals, and with small amounts of data you may fail to detect important signals. This is why I do not use the language of “statistical significance.” It is completely misleading and a continual source of confusion. To say that something is “statistically significant at a 5-percent alpha-level” simply means that you have detected a potential signal by using critical values that will give a false alarm 5 percent of the time when the null state is true. The actual probability that you have made the correct decision is given by equation 5 above. While a small value for alpha will tend to make equation 5 large, we cannot always compute the value of equation 5 in practice. So stop talking about “significant results” and start talking about “detectable signals.” The resulting clarity will be appreciated by your clients.