
## Rules of Three and Five

### Tips for sample sizes

Published: Wednesday, August 23, 2017 - 12:03

It’s been a while since I’ve written about statistics. So in this column, I will be looking at the rules of three and five. These are heuristics, or rules of thumb, that tell us what we can conclude from a given sample size.

### Rule of three

Let’s assume that you are looking at a binomial event (pass or fail). You took 30 samples and tested them to see how many passes or failures you get. The results yielded no failures. Then, based on the rule of three, you can state that at a 95-percent confidence level, the upper bound for the failure rate is 3/30 = 10%; in other words, the reliability is at least 90 percent. The rule is written as:

p = 3/n

where p is the upper bound of failure, and n is the sample size.

Thus, if you used 300 samples, then you could state with 95-percent confidence that the process is at least 99-percent reliable based on p = 3/300 = 1%. Another way to express this is to say that with 95-percent confidence, fewer than 1 in 100 units will fail under the same conditions.
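The arithmetic above can also be run in reverse to plan a sample size. The following is a minimal sketch in Python; the function names `rule_of_three_upper_bound` and `samples_needed` are my own illustrative choices, and both functions assume that every unit tested passes (zero failures).

```python
import math


def rule_of_three_upper_bound(n):
    """Approximate 95% upper bound on the failure rate after n trials with zero failures."""
    if n <= 0:
        raise ValueError("n must be a positive number of samples")
    return 3 / n


def samples_needed(max_failure_rate):
    """Smallest n for which 3/n does not exceed the target failure rate."""
    if not 0 < max_failure_rate <= 1:
        raise ValueError("max_failure_rate must be in (0, 1]")
    # round() guards against floating-point noise before taking the ceiling
    return math.ceil(round(3 / max_failure_rate, 9))


for n in (30, 300):
    p = rule_of_three_upper_bound(n)
    print(f"n = {n}: failure rate <= {p:.2%}, reliability >= {1 - p:.2%}")

print("Samples needed for 99% reliability:", samples_needed(0.01))
```

Keep in mind that a single failure in the sample invalidates the calculation; the rule applies only to a zero-failure result.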

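This rule can be derived from the binomial distribution. With zero failures in n samples, the 95-percent upper bound p is the failure rate at which the chance of seeing n passes in a row drops to the alpha value of 0.05, that is, (1 – p)^n = 0.05. Because the natural log of 0.05 is very close to –3, this works out to approximately p = 3/n. The value calculated from the rule-of-three formula gets more accurate with a sample size of 20 or more.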
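To see how close the heuristic is, here is a short Python sketch that compares 3/n with the exact zero-failure bound obtained by solving (1 – p)^n = 0.05 for p. The function names are hypothetical, and the exact formula assumes independent pass/fail trials with no failures observed.

```python
def exact_upper_bound(n, confidence=0.95):
    """Exact upper bound on the failure rate for n trials with zero failures.

    Solves (1 - p)^n = alpha for p, where alpha = 1 - confidence.
    """
    alpha = 1 - confidence
    return 1 - alpha ** (1 / n)


def rule_of_three(n):
    """Rule-of-three approximation of the same bound."""
    return 3 / n


for n in (5, 10, 20, 30, 100, 300):
    exact = exact_upper_bound(n)
    approx = rule_of_three(n)
    print(f"n = {n:>3}  exact = {exact:.4f}  rule of three = {approx:.4f}  "
          f"difference = {approx - exact:.4f}")
```

Under these assumptions the rule-of-three value sits slightly above the exact bound, so it errs on the conservative side, and the gap narrows as n grows.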

### Rule of five

I came across the rule of five in Douglas Hubbard’s informative book, How to Measure Anything (Wiley, third edition 2014). Hubbard states the rule of five as: “There is a 93.75% chance that the median of a population is between the smallest and largest values in any random sample of five from that population.”

This is a really neat heuristic because you can actually tell a lot from a sample size of five. The median is the 50th percentile value of a population, the point where half of the population is above it, and half of the population is below it. Hubbard points out the probability of picking a value above or below the median is 50 percent—the same as a coin toss. Thus, we can calculate that the probability of getting five heads in a row is 0.5^5 or 3.125%. This would be the same for getting five tails in a row.

Then the probability of not getting all heads or all tails is (100 – (3.125 + 3.125)) or 93.75%. Thus, we can state that the chance of at least one value out of five being above the median and at least one value being below it is 93.75 percent. When that happens, the median lies between the smallest and largest values in the sample.
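The coin-toss reasoning can be checked with a quick simulation. This is only a sketch: the lognormal population (whose median is exactly 1.0), the trial count, and the function names are my own assumptions, not something taken from Hubbard’s book.

```python
import random


def exact_rule_of_five(k=5):
    """Probability that a random sample of k values straddles the population median."""
    return 1 - 2 * 0.5 ** k


def simulate_rule_of_five(trials=50_000, k=5, seed=0):
    """Estimate the same probability by repeated random sampling.

    Assumed population: lognormal with mu=0, sigma=1, so the true median is exp(0) = 1.0.
    """
    rng = random.Random(seed)
    median = 1.0
    hits = 0
    for _ in range(trials):
        sample = [rng.lognormvariate(0, 1) for _ in range(k)]
        # Count the trial if the median falls between the sample's extremes
        if min(sample) < median < max(sample):
            hits += 1
    return hits / trials


print(f"Exact:     {exact_rule_of_five():.4%}")
print(f"Simulated: {simulate_rule_of_five():.4%}")
```

The population here is deliberately skewed, because the rule depends only on each draw having a 50-percent chance of landing above the median, not on the shape of the distribution.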

### Final words

Readers should keep in mind that both of the rules require the use of randomly selected samples. The rule of three is a version of Bayes’ Success Run Theorem and Wilks’ One-sided Tolerance calculation. I invite readers to check out my articles, “Relationship Between AQL/RQL and Reliability/Confidence,” “Reliability/Confidence Level Calculator (With c = 0, 1....., n),” and “Wilk’s One-Sided Tolerance Spreadsheet,” which shed more light on this.

When we use random samples to represent a population, we are calculating a statistic, a value computed from the sample that stands in for the parameter. The parameter is the true value for the population, and the statistic is our estimate of it. The larger the sample size, the better the statistic can represent the parameter, and the better your estimate will be.

I will finish with a story based on chance and probability:
It was the day of the final exam, and an undergraduate psychology major was totally hung over from the previous night. He was somewhat relieved to find that the exam was a true/false test. He had taken a basic stat course and did remember his professor once performing a coin-flipping experiment. In a moment of clarity, he decided to flip a coin he had in his pocket to determine the answer for each question. The psychology professor watched the student for the entire two hours of the exam as he was flipping the coin... writing the answer... flipping the coin... writing the answer, on and on.

At the end of the two hours, everyone else had left the room except for this one student. The professor walked up to his desk and angrily interrupted the student, saying: “Listen, it is obvious that you did not study for this exam since you didn’t even open the question booklet. If you are just flipping a coin for your answer, why is it taking you so long?”

The stunned student looked up at the professor and replied bitterly (still flipping the coin): “Shhh! I am checking my answers!”

Always keep on learning....