The Imaginary Theorem of Large Samples
How many data samples do you need?
Donald J. Wheeler
Published: Monday, April 5, 2010 - 10:10
Courses in statistics generally emphasize the problem of inference. In my December column, “The Four Questions of Data Analysis,” I defined this problem in the following manner:
Given a single unknown universe, and a sample drawn from that universe, how can we describe the properties of that universe?
In general, we attempt to answer this question by estimating characteristics of the universe using statistics computed from our sample.
One of the lessons that most students of statistics manage to learn is that, in questions of inference, the uncertainty of an estimator is inversely related to the amount of data used. To illustrate this relationship, I will draw beads from my bead box. A paddle with holes on one side is used to obtain a sample of 50 beads. The number of yellow beads in each sample of 50 beads is recorded. The beads are replaced in the bead box, the beads are stirred up, and the whole sequence begins again. After 10 such drawings, I have drawn 500 beads and have found a total of 65 yellow beads.
My point estimate for the proportion of yellow beads in the bead box is thus:
p = 65/500 = 0.1300 (or 13%)
and the usual 90-percent interval estimate for the proportion of yellow beads is:
0.1300 ± 0.0247, or 0.1053 to 0.1547
I repeat the experiment and find 43 yellow beads out of 500 beads sampled. Combining the results from both experiments we have a point estimate for the proportion of yellow beads in the bead box of:
p = 108/1000 = 0.1080 (or 10.8%)
and the usual 90-percent interval estimate for the proportion of yellow beads is:
0.1080 ± 0.0161, or 0.0919 to 0.1241
While the point estimate changed, the element of interest here is how the uncertainty decreased from 0.0247 to 0.0161 as we went from using 10 samples to using 20 samples in our estimate. With increasing amounts of data, our estimates come to have lower levels of uncertainty. Figure 1 shows the results of 20 repetitions of this experiment of drawing 10 samples of 50 beads each from my bead box. The second column gives the cumulative number of yellow beads. The third column gives the cumulative number of beads sampled. The fourth column lists the cumulative point estimates of the proportion of yellow beads, while the last two columns list the end points for the 90-percent interval estimates for the proportion of yellow beads. Figure 2 shows these last three columns plotted against the number of the experiment, which is listed in the first column of figure 1.
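The half-widths of 0.0247 and 0.0161 quoted above follow from the usual normal-approximation interval for a proportion, p-hat ± 1.645·sqrt(p-hat(1 − p-hat)/n); treating this as the formula behind figure 1 is an assumption, but it does reproduce the values shown. A minimal sketch:

# Sketch: 90-percent interval estimate for a proportion, using the
# normal approximation p_hat +/- 1.645 * sqrt(p_hat * (1 - p_hat) / n).
from math import sqrt

def interval_90(yellow, n):
    p_hat = yellow / n
    half_width = 1.645 * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, half_width, p_hat - half_width, p_hat + half_width

# First experiment: 65 yellow beads out of 500 drawn
print(interval_90(65, 500))    # approx (0.1300, 0.0247, 0.1053, 0.1547)

# Both experiments combined: 108 yellow beads out of 1000 drawn
print(interval_90(108, 1000))  # approx (0.1080, 0.0161, 0.0919, 0.1241)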
Figure 1: Twenty bead box experiment
Figure 2: Cumulative proportion yellow and 90% interval estimates
As we look at figure 2, we see the point estimate converge on a value near 0.11 and stabilize there while the uncertainty keeps dropping and the interval estimate gets tighter and tighter. This is the picture shown in textbook after textbook, and the source of the theorem of large samples. Based on this graph, it would appear that this experiment will yield an average of 11-percent yellow beads.
But is this a reasonable estimate of the proportion of yellow beads in my bead box? Because there are only 4,800 beads in the box, the 20 repetitions of our experiment effectively looked at every bead in the box twice. Yet, by actual count, the box contains only 10-percent yellow beads, a value that was outside the interval estimate from experiment No. 5 on. As we collected more and more data our point estimate did converge, but it did not converge to the “true value.”
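This claim can be rechecked directly. The sketch below uses the same normal-approximation interval as above, together with the cumulative yellow counts from figure 1 (also reproduced in the first reader comment below):

# Sketch: check whether the true proportion of 0.10 falls inside each
# cumulative 90-percent interval estimate (cumulative counts from figure 1).
from math import sqrt

cum_yellow = [65, 108, 164, 219, 276, 336, 389, 448, 506, 553,
              609, 655, 714, 770, 817, 874, 927, 981, 1040, 1105]

for k, yellow in enumerate(cum_yellow, start=1):
    n = 500 * k                      # 500 beads drawn per experiment
    p_hat = yellow / n
    half_width = 1.645 * sqrt(p_hat * (1 - p_hat) / n)
    inside = (p_hat - half_width) <= 0.10 <= (p_hat + half_width)
    print(k, round(p_hat, 4), inside)
# 0.10 lies below the lower interval limit at experiment 1,
# and again from experiment 5 onward.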
So here we come to the first problem with the theorem of large samples. The whole body of computations involved with estimation is built on certain assumptions. One of these is the assumption that we have drawn random samples from the universe. However, random samples are nothing more than a concept. There is no rigorous mathematical definition of random. In practice, we always have to use some sort of sampling system or sampling device. Here we used mechanical sampling. And regardless of how careful we may be, mechanical sampling is not the same as the assumption of random sampling.
Here we ended up with an excess number of yellow beads in our samples. Nothing in our computations can compensate for this bias. Moreover, in practice, where we cannot stop the experiment and find the “true value” by counting all the beads, there will be no way to even detect this bias. Thus, in practice, the first problem with the theorem of large samples is that, because of the way we obtain our data, our estimates may not converge to the values that we expect.
If this problem is not enough to give you pause, there is an even bigger problem with the theorem of large samples.
To illustrate this second problem, I shall use the batch weight data shown in figure 3. There you will find the weights, in kilograms, of 259 successive batches produced during one week at a plant in Scotland. For purposes of this example, assume that the specifications are 850 kg to 990 kg. After every tenth batch, the capability ratio is computed using all of the data for that week. Thus, just like the proportion of yellow beads, we would expect to see these capability ratios converge to some value as the uncertainty drops with the increasing amounts of data used. Figure 4 shows the number of batches used for each computation, the capability ratios found, and the 90-percent interval estimates for the process capability. These values are plotted in sequence in figure 5.
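A sketch of how such cumulative capability ratios can be computed follows. The formula Cp = (USL − LSL)/(6·sigma-hat) is standard, but the choice of sigma-hat here (the average-moving-range estimate, mR-bar/1.128) is an assumption about how figure 4 was actually calculated:

# Sketch: capability ratio Cp = (USL - LSL) / (6 * sigma_hat), recomputed
# after every tenth batch using all of the batch weights seen so far.
# sigma_hat is estimated from the average moving range (mR_bar / 1.128);
# figure 4 may have used a different sigma estimate.
USL, LSL = 990.0, 850.0

def capability_ratio(weights):
    moving_ranges = [abs(b - a) for a, b in zip(weights, weights[1:])]
    sigma_hat = (sum(moving_ranges) / len(moving_ranges)) / 1.128
    return (USL - LSL) / (6.0 * sigma_hat)

def cumulative_ratios(batch_weights):
    # batch_weights would hold the 259 successive weights of figure 3
    return [capability_ratio(batch_weights[:n])
            for n in range(10, len(batch_weights) + 1, 10)]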
Figure 3: The batch weight data
Figure 4: Cumulative estimates of capability ratio for batch weight data
Figure 5: Cumulative estimates of capability ratio for the batch weight data
There we see that these capability ratios do not converge to any one value. Instead they meander over time. While the uncertainties decrease with increasing amounts of data, this reduction in uncertainty is meaningless when the target itself is uncertain. We get better and better estimates of something, but that something may have already changed by the time we have the estimate.
To understand the behavior of these capability ratios we need to look at the XmR chart for these data in figure 6. The limits shown are based on the first 60 values. This baseline was chosen to obtain a reasonable characterization of the process potential. Here we see a process that is not only unpredictable, but one that gets worse as the week wears on.
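For reference, here is a minimal sketch of how XmR limits are computed from a baseline (the first 60 values, as in figure 6); the scaling factors 2.66 and 3.268 are the standard constants for individuals and moving-range charts:

# Sketch: XmR (individuals and moving range) chart limits from a baseline.
def xmr_limits(values, baseline=60):
    base = values[:baseline]
    moving_ranges = [abs(b - a) for a, b in zip(base, base[1:])]
    x_bar = sum(base) / len(base)                     # average of X
    mr_bar = sum(moving_ranges) / len(moving_ranges)  # average moving range
    return {
        "X central line": x_bar,
        "X natural process limits": (x_bar - 2.66 * mr_bar,
                                     x_bar + 2.66 * mr_bar),
        "mR central line": mr_bar,
        "mR upper range limit": 3.268 * mr_bar,
    }
# Points outside these limits are the evidence of the unpredictability
# described in the article.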
Figure 6: XmR chart for the batch weight data
The theorem of large samples implicitly assumes that there is one universe. When there are multiple universes, and these universes are changing around without warning, no amount of data will ever be sufficient to provide a good estimate of any process characteristic.
This takes us back to the question of homogeneity, which is the fundamental question of data analysis. Did these data come from a process or system that appears to be operating in the same way over time? Or do these data show evidence that the underlying process has changed in some manner while the data were collected? The only statistical technique that can answer this question of homogeneity is the process behavior chart.
If the process behavior chart doesn’t show any evidence of a lack of homogeneity, then our process may be predictable, and the theorem of large samples may only suffer from the problem of estimating the wrong thing. (Drawings out of a bead box are a classic example of a predictable process.)
But if we find evidence of a lack of homogeneity within our data, then we know that our process has a multiple personality disorder, and any attempt to use the theorem of large samples is merely wishful thinking—no amount of data will ever be sufficient to provide a reliable estimate of a process parameter.
About The Author
Donald J. Wheeler
Dr. Wheeler is a fellow of both the American Statistical Association and the American Society for Quality who has taught more than 1,000 seminars in 17 countries on six continents. He welcomes your questions; you can contact him at djwheeler@spcpress.com.
Comments
Adding samples
I have always read your articles with the greatest interest. In this case, I have doubts that 2 samples of size 500 each are the same as 1 sample of size 1000. In fact, if you calculate the proportion for each of the samples and the interval estimate for the population proportion, we get:
Experiment   # yellow cum   # yellow   Point estimate   Low     High    0.1 inside?
 1               65            65          0.130        0.105   0.155   Fail
 2              108            43          0.086        0.065   0.107   Correct
 3              164            56          0.112        0.089   0.135   Correct
 4              219            55          0.110        0.087   0.133   Correct
 5              276            57          0.114        0.091   0.137   Correct
 6              336            60          0.120        0.096   0.144   Correct
 7              389            53          0.106        0.083   0.129   Correct
 8              448            59          0.118        0.094   0.142   Correct
 9              506            58          0.116        0.092   0.140   Correct
10              553            47          0.094        0.073   0.115   Correct
11              609            56          0.112        0.089   0.135   Correct
12              655            46          0.092        0.071   0.113   Correct
13              714            59          0.118        0.094   0.142   Correct
14              770            56          0.112        0.089   0.135   Correct
15              817            47          0.094        0.073   0.115   Correct
16              874            57          0.114        0.091   0.137   Correct
17              927            53          0.106        0.083   0.129   Correct
18              981            54          0.108        0.085   0.131   Correct
19             1040            59          0.118        0.094   0.142   Correct
20             1105            65          0.130        0.105   0.155   Fail
where 2 out of the 20 intervals don't include the correct population proportion of 0.1, which completely matches our 90-percent confidence level.
Spread index Cp
If one were to succeed in stabilizing this process, then, using the 6-sigma process spread from the individual-X chart, a Cp = 140/137.4 = 1.019 is computed; it is indeed amazing that the cumulatively estimated Cp does not converge to this value but to a significantly lower level! Could we not consider Cp = 1.019 as the expected (target) value, since Cp is computed/estimated from the process sigma, which reflects only the random process noise?
Comment
The punch line here is one that I have believed in for a long time: The Process Behavior Chart is a good tool to utilize as early as possible in the analysis of a problem. This is, of course, contrary to the Six Sigma DMAIC methodology, which utilizes Process Behavior Charts late in the overall process.
Use of XmR Chart
Just a note on the prior comment. Actually, the process behavior chart is used throughout DMAIC. The first step is to define the project ('D'). Often, one sees a chart like the one shown in figure 6 in the article and would certainly want to place this process step on the list of possible projects. If the cost impact vs. the probability of solving places the project as one of the best improvement opportunities, then a project would be started. SPC charts are great sources of historical data for determining that a project needs to be started. Of course, they are then used later in the 'I' and 'C' phases to measure the project's success.