
Recently I have had several questions about which bias correction factors to use when working with industrial data. Some books use one formula, other books use another, and the software may use a third formula. Which one is right? This article will help you find an answer.

Before we can meaningfully discuss different bias correction factors we need to understand what they do. To this end we must make a distinction between parameters for a probability model and statistics computed from the data. So we shall go back to the origin of our data and move forward.

A statistic is simply a function of the data. Data plus arithmetic equals a statistic. Since arithmetic cannot create meaning, it is the context for the data that gives specific meaning to any statistic. Thus, we will have to begin with the progression from a physical process to a probability model, and then we can look at how the notion of a probability model frames the way we use our statistics.
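The idea that "data plus arithmetic equals a statistic" can be made concrete with a minimal sketch. The numbers below are made up; the point is that the same arithmetic yields the same statistics regardless of what the values represent, so only context supplies meaning.

```python
# A statistic is just arithmetic applied to data. These values are
# hypothetical; without knowing what was measured, and how, the
# resulting statistics mean nothing in particular.
data = [10.2, 9.8, 10.5, 10.1, 9.9]

average = sum(data) / len(data)       # a statistic of location
data_range = max(data) - min(data)    # a statistic of dispersion
```

Whether `average` describes a batch weight or a fill volume depends entirely on the context for the data, not on the arithmetic.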

Assume that we have a process that is producing some product, and assume that periodic checks are made upon some product characteristic. These checks will result in a sequence of values that could be written as X1, X2, X3, and so on.

During the past three months James Beagle and I presented columns that made extensive use of analysis of means techniques. Since these techniques may be new to some, this column explains when to use each technique and where to find tables of the appropriate scaling factors.

In 1967, Ellis R. Ott published his analysis of means technique (ANOM) for comparing treatment averages with their grand average. This technique is a generalized version of the average and range chart. However, the assumption that allows this generalization also imposes a restriction on where this technique can be used. The generalization allows us to compute limits with a fixed overall alpha level (the user-specified risk of a false alarm). The restriction is that we can only use ANOM for the *one-time analysis of a finite amount of data* (such as occurs in experimental studies).
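The comparison of treatment averages with their grand average can be sketched as follows. This is a minimal illustration, not Ott's published procedure: the scaling factor `h` must come from published ANOM tables for the chosen alpha level, number of treatments, and degrees of freedom; the value used below, like the data, is purely illustrative.

```python
import math

def anom_limits(treatment_means, s_p, n, h):
    """ANOM-style decision limits around the grand average.

    treatment_means: k treatment averages, each based on n observations
    s_p: pooled within-treatment standard deviation estimate
    h: scaling factor, to be read from published ANOM tables
       (the value passed in below is illustrative only)
    """
    k = len(treatment_means)
    grand_mean = sum(treatment_means) / k
    half_width = h * s_p * math.sqrt((k - 1) / (k * n))
    return grand_mean - half_width, grand_mean + half_width

# Illustrative use with made-up numbers and an assumed h:
means = [10.2, 9.7, 10.6, 9.9]
lower, upper = anom_limits(means, s_p=0.5, n=5, h=2.81)
signals = [m for m in means if m < lower or m > upper]
```

Treatment averages falling outside the decision limits are detectably different from the grand average at the chosen overall alpha level.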

Last month we provided an operational definition of when measurement systems are equivalent in terms of bias. Here we will look at comparing the within-instrument measurement error between two or more systems.

Once again we must emphasize that it makes no sense to compare measurement systems that do not display a reasonable degree of consistency. Consistency must be demonstrated; it cannot be assumed, and a consistency chart is the simplest way to demonstrate it.

As soon as we have two or more instruments for measuring the same property the question of equivalence raises its head. This paper provides an operational definition of when two or more instruments are equivalent in practice.

Churchill Eisenhart, Ph.D., while working at the U.S. National Bureau of Standards in 1963, wrote: “Until a measurement process has been ‘debugged’ to the extent that it has attained a state of statistical control it cannot be regarded, in any logical sense, as measuring anything at all.” Before we begin to talk about the equivalence of measurement systems we need to know whether we have yardsticks or rubber rulers. And the easiest way to answer this question is to use a consistency chart.

Managers the world over want to know if things are “in control.” This usually is taken to mean that the process is producing 100-percent conforming product, and to this end an emphasis is placed upon having a good capability or performance index. But a good index by itself does not tell the whole story. So, if you want to learn how to be sure that you are shipping 100-percent conforming product, read on.

There are four capability and performance indexes that are in common use today. While many other ratios have been proposed, these four indexes effectively summarize the relationship between a process and the product specifications.

The capability ratio uses the difference between the watershed specifications to define the space available and compares this with the generic space required by any process that is operated with minimum variance. This generic space required is computed as six times an appropriate within-subgroup measure of dispersion, *Sigma(X)*.
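The computation described above can be sketched directly. This is a hedged illustration, not the column's worked example: the subgroup data and specification limits below are made up, and *Sigma(X)* is estimated here as the average subgroup range divided by the bias correction factor d2 (2.326 for subgroups of size five).

```python
# Capability ratio: Cp = (USL - LSL) / (6 * Sigma(X)), where Sigma(X)
# is a within-subgroup measure of dispersion. Here Sigma(X) is estimated
# as the average subgroup range divided by d2 (d2 = 2.326 for n = 5).
# All data and specification values are illustrative only.

subgroups = [
    [10.1, 9.8, 10.3, 10.0, 9.9],
    [10.2, 10.0, 9.7, 10.1, 10.4],
    [9.9, 10.2, 10.0, 9.8, 10.1],
]
usl, lsl = 11.0, 9.0   # specification limits (made up)
d2 = 2.326             # bias correction factor for subgroups of size 5

avg_range = sum(max(g) - min(g) for g in subgroups) / len(subgroups)
sigma_x = avg_range / d2                 # within-subgroup dispersion
cp = (usl - lsl) / (6 * sigma_x)         # space available / space required
```

A ratio above 1.0 says the space available exceeds the generic space required; it says nothing, by itself, about where the process is centered.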

*Story update 1/15/2019: Thanks to the sharp eye of Dr. Stan Alekman, who spotted an inconsistent value in figure 2, I discovered an error in the program used to construct the table of critical values for the prediction ratio. I have now corrected that problem and updated the entries in the table in figure 2. If you previously downloaded this column, you might want to download the corrected version below.*

Software packages use *p*-values to report the results of many statistical procedures. As a result some people have come to expect a *p*-value as the outcome of any statistical analysis. This column will tell you how to compute and use a *p*-value for a process behavior chart.

Process behavior charts are the interface between your data and your brain. But you have to begin by making a choice about which type of chart to use. You can either plot the individual values themselves, or you can organize your data into rational subgroups and plot the subgroup averages. This paper will discuss the issues involved and provide guidelines for when to use each chart.

Your data almost always possess some sort of time-order sequence. In most cases this order will be linked to the operation of some underlying process. Yet most statistical techniques ignore this time-order sequence. Process behavior charts use this temporal order to characterize the behavior of these underlying processes. In this regard they are fundamentally different from virtually all other statistical procedures. Rather than trying to fit some type of mathematical model to the data, they use the time order in the data to characterize the underlying process as being either predictable or unpredictable.
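One common way a process behavior chart puts the time order to work is the chart for individual values, where the point-to-point moving ranges set the natural process limits. The following is a minimal sketch under that approach; the data are made up, and the constant 2.66 is the standard scaling factor for moving ranges of two successive values.

```python
# Sketch of an individuals (XmR) chart computation: the time-ordered
# moving ranges characterize routine variation, and points outside the
# natural process limits signal unpredictable behavior.
# The values below are illustrative only.

values = [10.1, 9.9, 10.2, 10.0, 9.8, 10.3, 10.1, 12.0]  # time-ordered

moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
mean_x = sum(values) / len(values)
mean_mr = sum(moving_ranges) / len(moving_ranges)

ucl = mean_x + 2.66 * mean_mr   # upper natural process limit
lcl = mean_x - 2.66 * mean_mr   # lower natural process limit

# Points outside the limits are evidence of unpredictable behavior:
signals = [(i, x) for i, x in enumerate(values) if x > ucl or x < lcl]
```

Notice that no model is fit to the data: the limits come from the time-order structure itself, which is what lets the chart classify the underlying process as predictable or unpredictable.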

In Part One and Part Two of this series we discovered some of the pitfalls of data snooping. In Part Three we discovered how listening to the voice of the process differs from the model-based approach and how it also provides a way to understand when our models do and do not work. Here we conclude the series with a case history of how big data often works in practice.

Daniel Boorstin summarized the essence of distilling knowledge out of a database when he wrote: “Information is random and miscellaneous, but knowledge is orderly and cumulative.” As we seek to organize our miscellaneous data we have to be careful to make a distinction between signals and noise. The following is the story of one attempt to turn data into knowledge.