August 19, 2022

## Please Don't Feed Averages

An understanding of sample size could save you a trip to the zoo.

Michael J. Cleary, Ph.D.
mcleary@qualitydigest.com

The behavior of averages can be as fascinating as that of animals. Hartford Simsack, intrepid quality manager for Greer Grate & Gate, sometimes thinks of himself as a visitor to the zoo, watching his averages and trying to anticipate behavior that, to him, always comes as a complete surprise. "I never know if this is going to turn out to be a normal distribution or not," he told his mentor, Dr. Stan Deviation.

Deviation cleared his throat and reminded Simsack yet again that one of the keys to predicting the shape of the averages lies in sample size. Regardless of the distribution shape of the parent population, the distribution of sample means from that population will be approximately normal if the sample size is large enough, for example, five. You'll remember that in July's column, Deviation ran a simulation model with 1,000 samples of sizes one and two, drawn from two populations, to show the shape of the resulting distributions. Unfortunately, Simsack doesn't recall either the column or his mentor's lecture about this point.
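Deviation's point about sample size can be tried out with a few lines of Python. The sketch below is only an illustration, not the simulation model from the column: the bimodal population (two normal humps centered at 3 and 7), the function names, and the text histogram are all assumptions made for the example. It draws 1,000 samples at each of the sizes one, two, and five and reports how many sample means land in the "valley" between the two humps.

```python
import random
import statistics

def draw_bimodal(rng):
    """One value from a hypothetical two-humped population (humps at 3 and 7)."""
    center = 3.0 if rng.random() < 0.5 else 7.0
    return rng.gauss(center, 1.0)

def sample_means(n, num_samples=1000, seed=0):
    """Means of num_samples samples, each of size n, from the bimodal population."""
    rng = random.Random(seed)
    return [statistics.fmean([draw_bimodal(rng) for _ in range(n)])
            for _ in range(num_samples)]

def fraction_near_center(values, low=4.0, high=6.0):
    """Share of values that fall in the valley between the two humps."""
    return sum(low <= v <= high for v in values) / len(values)

def text_histogram(values, low=0.0, high=10.0, bins=10, scale=10):
    """Histogram as text lines; each '#' stands for `scale` values (rounded down)."""
    counts = [0] * bins
    width = (high - low) / bins
    for v in values:
        # Values outside [low, high) are clamped into the first or last bin.
        index = min(max(int((v - low) / width), 0), bins - 1)
        counts[index] += 1
    return [f"{low + i * width:4.1f} | {'#' * (c // scale)}"
            for i, c in enumerate(counts)]

if __name__ == "__main__":
    for n in (1, 2, 5):
        means = sample_means(n)
        share = fraction_near_center(means)
        print(f"n = {n}: {share:.2f} of sample means fall between 4 and 6")
        print("\n".join(text_histogram(means)))
        print()
```

With a sample size of one, the histogram simply reproduces the two humps of the parent population. As the sample size grows, the valley fills in and the means pile up around the population mean.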

The charts below demonstrate the concept:

Simsack can never quite get his mind around this concept. Unfortunately for him, however, his boss can and often asks him for an explanation. Rock DeBote not only wants an answer, but he also wants to understand the concept well enough so that he can derive his own understanding. After his conversation with Deviation, Simsack is quick to respond to nearly every question about distribution, "It all depends on sample size." DeBote isn't content with this superficial answer and presses Simsack for the statistical concept responsible for this outcome.

"It's the central limit theorem," Simsack responds smugly.

This is a term with which he's familiar, and it's what comes immediately to his lips. Is he correct?

Amazingly, Simsack drops the right term this time. The behavior of averages in this case is indeed related to the central limit theorem. In June's column, we examined the rules for determining out-of-control situations, noting that the important caveat is not which set of rules to use but rather to use them consistently.

Some disagree about whether the central limit theorem is needed as a basis for these sets of rules. Some note that Walter Shewhart never cited the central limit theorem in his seminal work, Economic Control of Quality of Manufactured Product (D. Van Nostrand Co. Inc., 1931). In my university experience, students have proved able to understand the central limit theorem once they grasp the difference between averages (X̄) and individual values (Xi) and the ways in which the two behave.

The easiest way to demonstrate the central limit theorem is by using PQ Systems' Quality Gamebox. As noted above, if one takes 1,000 samples from a known population, then creates a distribution of sample means (n = 2), that distribution will be different from the population. Using Quality Gamebox, but taking a sample size of five, the following results ensue:

The most interesting result lies in the appearance of the distribution of sample means from the bimodal parent population. This clearly demonstrates the application of the central limit theorem:

The mean of the sample means' distribution is close to the population's mean.

The shape of the distribution of sample means is normal-looking.

The sample means' distribution variability is less than the parent population.

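Note: The standard deviation of the sample means is equal to the standard deviation of the population divided by the square root of the size of the sample used to create the distribution of sample means:

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$

where $\sigma$ is the population standard deviation and $n$ is the sample size.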
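The first and third properties can be checked numerically. The following is a minimal sketch under stated assumptions: the bimodal population is invented for illustration (it is not the Quality Gamebox data), the function name `summarize_sample_means` is hypothetical, and samples are drawn with replacement. It compares the mean and standard deviation of 1,000 sample means (n = 5) against the population mean and the population standard deviation divided by the square root of n.

```python
import math
import random
import statistics

rng = random.Random(42)

# Hypothetical bimodal parent population: an even mix of two normal humps.
population = [rng.gauss(3.0 if rng.random() < 0.5 else 7.0, 1.0)
              for _ in range(50_000)]

def summarize_sample_means(population, n, num_samples=1000, rng=rng):
    """Compare simulated sample means with the central limit theorem's predictions."""
    # rng.choices draws each sample of size n with replacement.
    means = [statistics.fmean(rng.choices(population, k=n))
             for _ in range(num_samples)]
    sigma = statistics.pstdev(population)
    return {
        "population_mean": statistics.fmean(population),
        "mean_of_means": statistics.fmean(means),
        "population_sd": sigma,
        "observed_sd_of_means": statistics.stdev(means),
        "predicted_sd_of_means": sigma / math.sqrt(n),
    }

if __name__ == "__main__":
    for key, value in summarize_sample_means(population, n=5).items():
        print(f"{key:>22}: {value:.3f}")
```

The observed and predicted standard deviations should agree closely but not exactly, because 1,000 samples give only an estimate.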

Once one understands the central limit theorem, the three basic rules for an out-of-control situation are easily derived. The most commonly accepted out-of-control rules follow directly from it:

Any point outside the control limits is a signal that the process is out of control. With conventional three-sigma limits, the probability that this will happen when a system is in control is about 0.0027.
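The size of this false-alarm probability can be computed directly from the normal distribution. The sketch below assumes conventional three-sigma control limits and exactly normal sample means, neither of which is stated explicitly above; the function name is hypothetical. Under those assumptions the two-sided tail probability works out to about 0.0027.

```python
import math

def prob_outside_limits(k=3.0):
    """Two-sided normal probability of landing beyond +/- k standard errors."""
    # For a standard normal Z, P(|Z| > k) = erfc(k / sqrt(2)).
    return math.erfc(k / math.sqrt(2.0))

if __name__ == "__main__":
    p = prob_outside_limits(3.0)
    print(f"P(point outside three-sigma limits) = {p:.4f}")
    print(f"about one false alarm per {1 / p:.0f} in-control points")
```

Because the probability is so small, a point beyond the limits is far more likely to reflect a real change in the process than chance alone.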

A run of seven consecutive averages above the mean, or seven below it, is an out-of-control signal. For a process that is in control, the probability of seven in a row on a given side of the mean is about 0.0078. If seven averages in a row become steadily larger or smaller, this is called "runs up" or "runs down." Such an occurrence is also unlikely for a process that is in control, so it too would be considered a signal for an out-of-control situation.
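Both run rules are easy to check mechanically. The following sketch makes several assumptions that go beyond the text: a run length of seven is used for both rules, a point exactly on the center line breaks a run, "seven in a row" is counted in points rather than in increases, and all function names and data are hypothetical.

```python
def longest_run_one_side(points, center):
    """Longest run of consecutive points strictly on one side of the center line."""
    longest = current = 0
    last_side = 0
    for x in points:
        side = (x > center) - (x < center)  # +1 above, -1 below, 0 on the line
        if side != 0 and side == last_side:
            current += 1
        else:
            current = 1 if side != 0 else 0
        last_side = side
        longest = max(longest, current)
    return longest

def longest_trend(points):
    """Largest number of consecutive points that steadily rise or steadily fall."""
    if not points:
        return 0
    longest = current = 1
    last_direction = 0
    for previous, x in zip(points, points[1:]):
        direction = (x > previous) - (x < previous)
        if direction != 0 and direction == last_direction:
            current += 1
        elif direction != 0:
            current = 2  # a new trend starts with two points
        else:
            current = 1  # equal neighbors break the trend
        last_direction = direction
        longest = max(longest, current)
    return longest

def run_signals(points, center, run_length=7):
    """Flag the two run-based out-of-control signals for a series of averages."""
    return {
        "run_one_side": longest_run_one_side(points, center) >= run_length,
        "trend": longest_trend(points) >= run_length,
    }

if __name__ == "__main__":
    # Made-up subgroup averages, for illustration only.
    averages = [10.1, 9.8, 10.2, 10.3, 10.4, 10.1, 10.2, 10.5, 10.3, 10.2]
    print(run_signals(averages, center=10.0))
    # If each in-control point is equally likely to fall on either side of the
    # mean, seven in a row on one given side has probability 0.5 ** 7.
    print(f"P(seven in a row on one side) = {0.5 ** 7:.4f}")
```

In the made-up data, eight consecutive averages sit above the center line, so the first rule fires even though no single point is extreme.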

Because the distribution of sample means (for a sample size of five or more) forms an approximately normal distribution, one would expect the points on the X̄ chart to reflect that pattern. A pattern such as those below suggests the process is out of control.