Inside Six Sigma

  |  04/05/2005

Recognizing Statistical Error in Quality

Is the truth out there?

As quality professionals, we frequently report such facts as proportions (percentages), process capability indices, averages and standard deviations as if we know what we’re talking about. But what’s our actual level of confidence? All of our observations and our understanding are based upon data or, more accurately, a sample of the universe that we’re attempting to describe. Just as the late astronomer Carl Sagan described the universe as being made up of billions and billions of stars (more or less infinite), so our “quality” universe offers a nearly limitless number of opportunities for observation. The magnitude of the astronomical universe didn’t deter Dr. Sagan’s desire to understand it, and the size of our “quality” universe shouldn’t diminish our desire to understand it. We can’t observe either universe, astronomical or quality, in its totality.

Just as Dr. Sagan sampled the universe, we need to sample the quality universe. However, it’s important to be careful in making absolute judgments regarding the nature of our universe.

By definition, a statistic is a value derived from a sample of a larger universe (our process or quality universe). All statistics are subject to error, and there’s no way to get around this. We can, however, recognize, acknowledge and report this error.

Statisticians don’t really know anything for certain; they’re just highly confident. Confidence is the probability of being right, and risk is the probability of being wrong. Being wrong in this discussion means that the interval we report doesn’t actually contain the truth; at a 95-percent level of confidence, for example, that risk is 5 percent. Let’s take a look at several statistics we frequently encounter in pursuit of Six Sigma quality.

Percentage: P
Consider the following example: You’ve just taken a survey of 200 individuals in your town and found that 13 percent of them aren’t happy with the overall service of the public library. This doesn’t mean that 13 percent of all the people who use the library are unhappy with the services provided, because not all library users were surveyed.

The truth is out there
We quality professionals are like Mulder in The X-Files, searching for the truth but never quite finding it. In most cases, we don’t have the time or resources to sample the entire quality universe. We must obtain a sample and therefore pay the price of error.

The response of 13 percent unhappy library users isn’t the true result, only an estimate of it. As good statisticians, we should add a disclaimer. We can state that we don’t know the truth, but we’re 90-, 95- or 99-percent confident that it’s between two limits. The 90, 95 and 99 percent are levels of confidence, and these values correspond to the risk of being wrong (10, 5 and 1 percent respectively). By adding and subtracting a specific amount of error to this estimate, we can attach a level of confidence to the estimate of the truth. This is the confidence interval. The statistic for this first example is percentage. Typical confidence intervals are determined by: statistic ± error

In many cases, the amount of error is determined by the number of standard deviations of the statistic needed to cover the prescribed portion of the normal distribution. One standard deviation on either side of the mean covers approximately 68 percent of the normal distribution, two standard deviations cover approximately 95 percent, and so on. For levels of confidence of 90, 95 and 99 percent, we need 1.645, 1.960 and 2.576 standard deviations respectively. Every statistic has its own unique calculation for its standard deviation. The sample standard deviation for individuals is calculated:

$$s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$$

$\sum$ = sum
$X_i$ = individual
$\bar{X}$ = average
n = sample size
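
The factors 1.645, 1.960 and 2.576 don’t have to be taken on faith. The following is a minimal sketch, using only Python’s standard library, of how they follow from the normal distribution. The function names `coverage` and `two_sided_factor` and the sample readings are illustrative assumptions, not part of the original article.

```python
from statistics import NormalDist, stdev

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1


def coverage(k):
    """Fraction of a normal distribution within +/- k standard deviations."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


def two_sided_factor(confidence):
    """Number of standard deviations that covers `confidence` (e.g. 0.95)."""
    return std_normal.inv_cdf(0.5 + confidence / 2)


print(f"1 standard deviation covers {coverage(1):.1%}")
print(f"2 standard deviations cover {coverage(2):.1%}")
for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence needs {two_sided_factor(level):.3f}")

# Sample standard deviation of individuals (n - 1 in the denominator).
# These readings are made-up values used only for illustration.
readings = [9.8, 10.1, 10.4, 9.9, 10.3]
print(f"s = {stdev(readings):.3f}")
```

Note that two standard deviations actually cover slightly more than 95 percent, which is why the 95-percent factor is 1.960 rather than 2.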

In the initial example, we’re dealing with the percent of unhappy customers. The outcome of an individual inquiry is either yes or no, making this ideal for the binomial distribution. The standard deviation for the binomial expressed as a percent is determined by:

$$\sigma_P = \sqrt{\frac{P(100 - P)}{n}}$$

where P = percent = 13.0
n = sample size = 200

$$\sigma_P = \sqrt{\frac{13.0(100 - 13.0)}{200}} = 2.4$$

With a level of confidence of 95 percent, for example, we calculate an interval and are 95-percent confident that the true percentage of those unhappy with the library service falls within the calculated limits. The confidence interval for a percentage is given by:

$$P \pm Z_{\alpha/2}\,\sigma_P$$

Here $\sigma_P$ is the standard deviation of the percentage, and $Z_{\alpha/2}$ is the factor defining the number of standard deviations required for the specified level of confidence.

Level of confidence (percent)    $Z_{\alpha/2}$
90                               1.645
95                               1.960
99                               2.576

With an estimate of 13.0 percent and a standard deviation of 2.4, the 95-percent confidence interval is 13.0 ± 1.960(2.4), or 8.3 to 17.7. The result can be expressed by the following statement:

“We don’t know the true percentage of those who are unhappy with the service provided by the library, but we’re 95-percent confident that it’s between 8.3 percent and 17.7 percent. There’s a 5 percent probability that the truth is outside these limits.”
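
The library calculation can be sketched in a few lines of Python. This is an illustration under the article’s own assumptions (binomial standard deviation, normal-distribution factor); the function name `percent_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def percent_interval(p, n, confidence=0.95):
    """Two-sided confidence interval for a percentage.

    p is the observed percentage (0-100), n is the sample size.
    """
    sigma_p = sqrt(p * (100 - p) / n)                # binomial std. dev., in percent
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # 1.960 for 95 percent
    return p - z * sigma_p, p + z * sigma_p


low, high = percent_interval(13.0, 200)
print(f"95-percent confident the truth is between {low:.1f} and {high:.1f} percent")
```

Calling `percent_interval(13.0, 200, confidence=0.99)` shows the price of more confidence: the same data yield a wider interval.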

Process capability index, Cpk
In the case of Cpk, it doesn’t matter how high the index is; what matters is that it isn’t less than a prescribed value. Instead of calculating a two-sided confidence interval, we should focus on the lower confidence limit.

“We don’t know the true Cpk, but we’re _____ percent confident that it isn’t less than _____”

We used a $Z_{\alpha/2}$ factor to determine the number of standard deviations for a two-sided confidence interval. For a one-sided confidence limit, we can use a $Z_{\alpha}$ factor to determine the number of standard deviations to subtract from the point estimate to yield the lower confidence limit.

Level of confidence (percent)    $Z_{\alpha}$
90                               1.282
95                               1.645
99                               2.326

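The standard deviation for the Cpk is given by:

$$\sigma_{C_{pk}} = \sqrt{\frac{1}{9n} + \frac{C_{pk}^2}{2n - 2}}$$

Cpk = observed Cpk from a sample
n = sample size used to determine the Cpk

For example: If we measure 45 units and determine that the Cpk is 1.10, the standard deviation of the calculated Cpk is:

$$\sigma_{C_{pk}} = \sqrt{\frac{1}{9(45)} + \frac{(1.10)^2}{2(45) - 2}} = 0.13$$

The lower 95-percent confidence limit is given by:

$$C_{pk} - Z_{\alpha}\,\sigma_{C_{pk}} = 1.10 - 1.645(0.13) = 0.89$$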

An appropriate statement regarding the Cpk calculation would be: “We don’t know the true Cpk, but we’re 95-percent confident that it isn’t less than 0.89.”

An alternative way of reporting this result is: “If we want to be 95-percent confident that our true Cpk isn’t less than 0.89, we must obtain a measured Cpk of 1.10 from a sample size of 45 units.”
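
The lower confidence limit can be sketched as follows. The standard deviation used here is a commonly used normal approximation for Cpk, assumed because it reproduces the 0.13 in the example above; the function names `cpk_sigma` and `cpk_lower_limit` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def cpk_sigma(cpk, n):
    """Approximate standard deviation of a Cpk observed from n units.

    Assumption: the common normal approximation sqrt(1/(9n) + Cpk^2/(2n - 2)).
    """
    return sqrt(1 / (9 * n) + cpk ** 2 / (2 * n - 2))


def cpk_lower_limit(cpk, n, confidence=0.95):
    """One-sided lower confidence limit for the true Cpk."""
    z = NormalDist().inv_cdf(confidence)  # one-sided factor, 1.645 for 95 percent
    return cpk - z * cpk_sigma(cpk, n)


print(f"standard deviation of Cpk: {cpk_sigma(1.10, 45):.2f}")
print(f"lower 95-percent limit:    {cpk_lower_limit(1.10, 45):.2f}")
```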

We can reduce the statistical error in the Cpk (or any statistic for that matter) by:

  • Increasing the sample size
  • Lowering the level of confidence

The following table gives the Cpk that must be observed in a sample of n units to be 95-percent confident that the true Cpk isn’t less than the desired Cpk.

Table 1. Required Sample Cpk to Demonstrate a Desired Cpk With 95-Percent Confidence

Desired Cpk:      0.90   1.00   1.20   1.30   1.33   1.40   1.50   1.60
Sample size, n
30                1.17   1.30   1.55   1.68   1.71   1.80   1.93   2.06
50                1.10   1.22   1.45   1.57   1.61   1.69   1.81   1.93
80                1.05   1.16   1.39   1.51   1.54   1.62   1.74   1.85
100               1.03   1.14   1.37   1.48   1.52   1.59   1.66   1.82
150               1.01   1.10   1.33   1.44   1.48   1.55   1.66   1.77
250               0.98   1.09   1.30   1.41   1.44   1.52   1.62   1.73
500               0.96   1.06   1.27   1.38   1.41   1.48   1.59   1.69
1,000             0.94   1.04   1.25   1.35   1.38   1.46   1.56   1.66

Example using Table 1:
The calculated Cpk from a sample of n = 80 must be ≥ 1.51 to be 95-percent confident that the true Cpk isn’t less than 1.30.

From Row = 80 and Column = 1.30
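
For sample sizes or targets not listed in Table 1, the required sample Cpk can be found by working the lower-limit calculation backward. The sketch below does this with a simple bisection search. It assumes the same normal approximation for the standard deviation of Cpk as the worked example and a sample size of roughly 30 or more; the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def cpk_lower_limit(cpk, n, confidence=0.95):
    """One-sided lower limit, assuming sigma = sqrt(1/(9n) + Cpk^2/(2n - 2))."""
    z = NormalDist().inv_cdf(confidence)
    return cpk - z * sqrt(1 / (9 * n) + cpk ** 2 / (2 * n - 2))


def required_cpk(desired, n, confidence=0.95):
    """Smallest sample Cpk whose lower confidence limit reaches `desired`.

    Bisection works because the lower limit rises steadily with the sample
    Cpk for the sample sizes considered here (about 30 units or more).
    """
    lo, hi = desired, desired + 5.0  # the answer lies somewhere in this bracket
    for _ in range(60):
        mid = (lo + hi) / 2
        if cpk_lower_limit(mid, n, confidence) < desired:
            lo = mid  # sample Cpk too small to demonstrate the desired Cpk
        else:
            hi = mid
    return hi


# Row n = 80, column desired Cpk = 1.30 of Table 1
print(f"{required_cpk(1.30, 80):.2f}")  # about 1.51
```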

Averages, X̄

The average is somewhat unusual in that the multiple of the standard deviation required for a specific level of confidence depends upon the sample size. Samples of fewer than 30 observations use the t-distribution, and samples of 30 or more rely upon the normal distribution. The confidence interval factors for an average taken from 30 or more observations are the same as those for the confidence interval for a proportion (percentage).

Level of confidence (percent)    $Z_{\alpha/2}$
90                               1.645
95                               1.960
99                               2.576


The standard deviation for the distribution of averages is determined by:

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$

n = sample size
σ = standard deviation of individuals

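A sample of 45 observations was found to have a standard deviation of 25.0 and a sample average of 110.0. The 90-percent confidence interval for an estimate of the true average would be:

$$\bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 110.0 \pm 1.645\left(\frac{25.0}{\sqrt{45}}\right) = 110.0 \pm 6.1$$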

The statement would be: “We don’t know the true average, but we’re 90 percent confident that it’s between 103.9 and 116.1.”
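
The same calculation as a short Python sketch, valid under the article’s condition of 30 or more observations (smaller samples would need the t-distribution instead). The function name `mean_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mean_interval(average, std_dev, n, confidence=0.90):
    """Two-sided confidence interval for the true average (n of 30 or more)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # 1.645 for 90 percent
    error = z * std_dev / sqrt(n)                   # z times std. dev. of averages
    return average - error, average + error


low, high = mean_interval(110.0, 25.0, 45)
print(f"90-percent confident the true average is between {low:.1f} and {high:.1f}")
```

Because the error shrinks with the square root of n, quadrupling the sample size only halves the width of the interval.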

If this interval is too wide to be satisfactory, it’s possible to accept a lower level of confidence or to take a larger sample. The latter is the more appropriate action.

Standard deviation
The confidence interval for the standard deviation is determined from the Chi-square distribution.

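The equation for determining the confidence interval is:

$$\sqrt{\frac{(n-1)s^2}{\chi^2_{\alpha/2}}} < \sigma < \sqrt{\frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}}$$

n = sample size
s = sample standard deviation
$\chi^2$ is found in any chi-square table, using n-1 for the degrees of freedom. The subscript is the area in the upper tail of the distribution.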

Consider the following question:

What is the 95-percent confidence interval for a standard deviation of 12.5 using a sample of 16 observations?

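The chi-square values for upper-tail areas of 0.025 and 0.975, with 15 degrees of freedom, are looked up in a chi-square table: 27.49 and 6.26 respectively.

$$\sqrt{\frac{15(12.5)^2}{27.49}} < \sigma < \sqrt{\frac{15(12.5)^2}{6.26}}$$

95-percent confidence interval = 9.23 < σ < 19.35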

Statement: “We don’t know the true standard deviation, but we’re 95-percent confident that it’s between 9.23 and 19.35.”
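
A minimal Python sketch of this interval follows. To stay within the standard library, the chi-square values are passed in as looked up from a table for 15 degrees of freedom rather than computed; the function name `sigma_interval` and its parameter names are hypothetical.

```python
from math import sqrt


def sigma_interval(s, n, chi2_upper, chi2_lower):
    """Confidence interval for the true standard deviation.

    chi2_upper: chi-square value with upper-tail area alpha/2 (the larger value)
    chi2_lower: chi-square value with upper-tail area 1 - alpha/2 (the smaller)
    Both are for n - 1 degrees of freedom.
    """
    sum_of_squares = (n - 1) * s ** 2
    return sqrt(sum_of_squares / chi2_upper), sqrt(sum_of_squares / chi2_lower)


# Table values for 15 degrees of freedom at upper-tail areas 0.025 and 0.975
low, high = sigma_interval(12.5, 16, chi2_upper=27.488, chi2_lower=6.262)
print(f"95-percent confident the true standard deviation is between "
      f"{low:.2f} and {high:.2f}")
```

Unlike the intervals for percentages and averages, this one isn’t symmetric about the estimate: 12.5 sits closer to the lower limit than to the upper one.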

There are many other quality parameters that weren’t discussed, such as percentage GR&R from measurement error evaluations, mean time between failures (MTBF) from reliability studies and correlation coefficients, to name a few. All of these, as with any statistic, have their own error and unique method for calculating it.

It’s important to acknowledge the inherent error that results from sampling when calculating a statistic. The right thing is to include a “statistical disclaimer” with any report. It’s no disgrace to admit the truth isn’t known; it’s an acknowledgement of reality. The truth is out there; just keep on looking.
