Harish Jose

Statistics

Rules of Three and Five

Tips for sample sizes

Published: Wednesday, August 23, 2017 - 11:03

It’s been a while since I’ve written about statistics. So in this column, I will be looking at the rules of three and five. These are heuristics, or rules of thumb, that can help us out. They are associated with sample sizes.

Rule of three

Let’s assume that you are looking at a binomial event (pass or fail). You took 30 samples and tested them to see how many passes or failures you get. The results yielded no failures. Then, based on the rule of three, you can state that at a 95-percent confidence level, the upper bound for the failure rate is 3/30 = 10 percent; in other words, the reliability is at least 90 percent. The rule is written as:

p = 3/n

where p is the upper bound of the failure rate, and n is the sample size (with no failures observed in the sample).

Thus, if you used 300 samples, then you could state with 95-percent confidence that the process is at least 99-percent reliable based on p = 3/300 = 1%. Another way to express this is to say that with 95-percent confidence, fewer than 1 in 100 units will fail under the same conditions.

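This rule can be derived from the binomial distribution: with zero failures in n samples, the upper bound p satisfies (1 – p)^n = 0.05, and because the natural log of 0.05 is about –3, p works out to roughly 3/n. The 95-percent confidence comes from the alpha value of 0.05. The calculated value from the rule-of-three formula gets more accurate with a sample size of 20 or more.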
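As a rough check on the rule of three, the sketch below compares 3/n with the exact zero-failure bound obtained by solving (1 – p)^n = 0.05 for p. The function names are my own, chosen for illustration, and the sample sizes are arbitrary.

```python
def rule_of_three_bound(n):
    """Approximate 95% upper bound on the failure rate after n samples with zero failures."""
    return 3.0 / n


def exact_upper_bound(n, confidence=0.95):
    """Exact zero-failure bound: solve (1 - p)**n = 1 - confidence for p."""
    alpha = 1.0 - confidence
    return 1.0 - alpha ** (1.0 / n)


# Arbitrary sample sizes, chosen only to show how the two values converge.
for n in (10, 20, 30, 100, 300):
    approx = rule_of_three_bound(n)
    exact = exact_upper_bound(n)
    print(f"n={n:4d}  3/n={approx:.4f}  exact={exact:.4f}")
```

For n = 30 the exact bound is about 9.5 percent against the rule's 10 percent, so the rule errs slightly on the conservative side, and the gap narrows as n grows.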

Rule of five

I came across the rule of five in Douglas Hubbard’s informative book, How to Measure Anything (Wiley, third edition 2014). Hubbard states the rule of five as: “There is a 93.75% chance that the median of a population is between the smallest and largest values in any random sample of five from that population.”

This is a really neat heuristic because you can actually tell a lot from a sample size of five. The median is the 50th percentile value of a population, the point where half of the population is above it and half of the population is below it. Hubbard points out that the probability of a randomly picked value falling above the median is 50 percent, the same as getting heads in a coin toss. Thus, the probability of all five values falling above the median is the same as the probability of getting five heads in a row, which is 0.5^5 or 3.125%. This would be the same for getting five tails in a row, that is, all five values falling below the median.

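Then the probability of not getting all heads or all tails is (100 – (3.125+3.125)) or 93.75%. Thus, we can state that the chance of at least one value out of five being above the median and at least one value being below the median is 93.75 percent, which is exactly the condition for the median to lie between the smallest and largest values in the sample.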
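The coin-toss arithmetic can be checked with a short sketch. It computes the probability directly and then estimates it by simulation, assuming a uniform population on 0 to 1 (so the true median is 0.5). That population, the function names, the trial count, and the seed are my own choices for illustration and are not from Hubbard's book.

```python
import random


def prob_median_within_range(n):
    """Chance that the population median lies between the min and max of n random draws."""
    return 1.0 - 2.0 * 0.5 ** n


def simulate(n=5, trials=100_000, seed=0):
    """Estimate the same chance by repeated sampling from a uniform(0, 1) population."""
    rng = random.Random(seed)
    true_median = 0.5  # median of the assumed uniform(0, 1) population
    hits = 0
    for _ in range(trials):
        sample = [rng.random() for _ in range(n)]
        if min(sample) < true_median < max(sample):
            hits += 1
    return hits / trials


print(f"calculated: {prob_median_within_range(5):.4f}")
print(f"simulated:  {simulate():.4f}")
```

The same formula shows how quickly the chance rises with a few more samples: it is 1 – 2(0.5^n) for any sample size n.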

Final words

Readers should keep in mind that both of the rules require the use of randomly selected samples. The rule of three is a version of Bayes’ success run theorem and Wilks’ one-sided tolerance calculation. I invite readers to check out my articles, “Relationship Between AQL/RQL and Reliability/Confidence,” “Reliability/Confidence Level Calculator (With c = 0, 1....., n),” and “Wilk’s One-Sided Tolerance Spreadsheet,” which shed more light on this.

When we use random samples to represent a population, we are calculating a statistic, a value computed from the sample that stands in for the parameter. The parameter is the true value for the population, and the statistic is an estimate of it. The larger the sample size, the better the statistic can represent the parameter, and the better your estimation will be.
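To see the effect of sample size on an estimate, here is a minimal sketch that draws many random samples of several sizes and measures how much the sample mean varies from one sample to the next. The normal population with a mean of 100 and a standard deviation of 15, the sample sizes, the repeat count, and the seed are all hypothetical values picked for illustration.

```python
import random
import statistics


def spread_of_sample_means(n, repeats=2000, seed=1):
    """Standard deviation of the sample mean across repeated random samples of size n."""
    rng = random.Random(seed)
    means = [
        # Hypothetical population: normal with mean 100 and standard deviation 15.
        statistics.fmean(rng.gauss(100, 15) for _ in range(n))
        for _ in range(repeats)
    ]
    return statistics.stdev(means)


for n in (5, 20, 80):
    print(f"n={n:3d}  spread of sample mean = {spread_of_sample_means(n):.2f}")
```

The spread shrinks as the sample size grows, which is the sense in which a larger sample gives a statistic that better represents the parameter.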

I will finish with a story based on chance and probability:
It was the day of the final exam, and an undergraduate psychology major was totally hung over from the previous night. He was somewhat relieved to find that the exam was a true/false test. He had taken a basic stat course and did remember his professor once performing a coin-flipping experiment. In a moment of clarity, he decided to flip a coin he had in his pocket to determine the answer for each question. The psychology professor watched the student for the entire two hours of the exam as he was flipping the coin... writing the answer... flipping the coin... writing the answer, on and on.

At the end of the two hours, everyone else had left the room except for this one student. The professor walked up to his desk and angrily interrupted the student, saying: “Listen, it is obvious that you did not study for this exam since you didn’t even open the question booklet. If you are just flipping a coin for your answer, why is it taking you so long?”

The stunned student looked up at the professor and replied bitterly (still flipping the coin): “Shhh! I am checking my answers!”

Always keep on learning....

About The Author

Harish Jose

Harish Jose has more than seven years of experience in the medical device field. He is a graduate of the University of Missouri-Rolla (U.S.), where he obtained a master’s degree in manufacturing engineering and published two articles. Harish is an ASQ member with multiple ASQ certifications, including Quality Engineer, Six Sigma Black Belt, and Reliability Engineer. He is a subject matter expert in lean, data science, database programming, and industrial experiments. Harish publishes frequently on his blog, harishnotebook. He can be reached on LinkedIn.