Have you ever met people who “do” statistical process control (SPC) only to get some screwy-looking control chart, and then text: OMG I H8 SPC! (If you don’t understand that, ask your nine-year-old child or grandchild.)
Last month we saw how it is not a failure of SPC, but rather an EBKAC (error between keyboard and chair). As I wrote in my last article, “Why Doesn't SPC Work?” perhaps they are not doing the measurement system analysis first, or perhaps there is autocorrelation in a continuous process. But you batch-process folks are not off the hook, which is what this month’s article is about.
A batch process makes a bunch of stuff all at once, like baking a batch of cookies. The batter was mixed at one time, the cookies were put onto the pan and into the oven at the same time, so we expect that all the cookies in that batch are pretty much the same. (If you have ever baked cookies, or run a batch process, I’ll bet you can identify at least three sources of variability within this batch—don’t tell anyone yet though. We already know you're smart since you are reading my article, and you would be stepping on my punchline.)
So let’s say you are running a batch process baking cookies (or heat treating, or forming, or whatever). You want to put some SPC in place, so you naturally start with the quality characteristic of the process. You take a sample of five from each batch (yum!) to test the critical characteristic “color.” The textbook tells you to do an X-bar and R chart, and you end up with a chart as shown in figure 1.
Figure 1: Mean and Range Chart: Color
Now at this point, you have a decision to make. You could say, “I am not going to show this to anyone! There is no way that our people are going to use this chart to react—they would go crazy!” (Lest you are thinking I am making this up, this exact situation came up at a business where I consulted. The practitioner had been doing this chart for four years and had not shown anyone.)
Or you could say, “This is my process screaming at me—maybe I should listen.”
So what is going on here? It all hinges on understanding how control charts work. They are not magic—they work for sound scientific reasons. The control limits on the top chart are intended to show the expected range of variation of the five-sample averages, and the limits on the bottom chart show the expected variation of the sample ranges. However, the limits for the averages are calculated from the average range. I know, it seems weird, but the reasoning makes sense. Here’s why:
The purpose of a control chart is to identify when a process is being affected by common cause variability inherent to the current process or if it is being affected by common and special cause variability: something unusual to the process is affecting output. We use a control chart to help identify what these special causes might be so that we can eliminate them, leaving us with the true underlying process variation. Deming called this “finding the process.” Once a process is in control, it’s a lot easier to take the next step and improve the process by reducing that inherent variation or moving the average on target.
Calculating the expected common cause variation of the process is a bit trickier than you might think. If I had a process with some special causes in it and I were to use all the data, including the special causes, to calculate my expected variation, that would tend to inflate the limits and make it harder to detect that there actually are special causes. Makes sense, right? If I calculate the standard deviation across all the data points, including ones that are different due to special causes, I’ll end up with a larger standard deviation than what the underlying process really has—leading to wider limits—which leads to classifying some events as within the control limits when in fact they are not.
So how can I use my data to generate the expected limits of variation for the process when the data themselves might have special cause variation in them? It sounds like a chicken-and-egg situation, but by using our brains, we can see a possible way out.
In our batch process, we are taking five samples from each batch. If a process is totally unaffected by special cause variation, then the variation within each sample and between each sample is just coming from the same source: the within variation. Another way of saying this is that the only reason the average of sample one is different from the average of sample two is because of sampling error within each sample, not because there is any real difference between the two samples. Add to that the fact that the samples from each batch are probably as similar as we know how to make them, and we can hope that the variation within each sample is the minimum variation uninfluenced by special causes. Even if there are a few within-sample oddballs, we will be using the average range, so the effect will tend to be damped out.
So it turns out that the variation within each sample (the average range in this case) is actually a better way to get an estimate of the underlying process variation than the raw data, because the raw data itself would contain the special causes.
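The arithmetic behind this is easy to sketch. Here is a minimal illustration (with made-up color data, not the data in figure 1) of how the limits for the averages are derived from the average range rather than from the spread of the averages themselves, using the standard chart constants for subgroups of five:

```python
import numpy as np

# Hypothetical color readings: 8 batches, 5 samples per batch (illustrative only)
rng = np.random.default_rng(0)
data = rng.normal(loc=50, scale=2, size=(8, 5))

xbar = data.mean(axis=1)             # subgroup averages (top chart)
rbar = np.ptp(data, axis=1).mean()   # average within-subgroup range (bottom chart)

# Standard control-chart constants for subgroups of n = 5
A2, D3, D4 = 0.577, 0.0, 2.114

grand_mean = xbar.mean()
ucl_x = grand_mean + A2 * rbar   # limits for the averages come from R-bar,
lcl_x = grand_mean - A2 * rbar   # not from the spread of the averages themselves
ucl_r = D4 * rbar
lcl_r = D3 * rbar
```

Note that nowhere do we take the standard deviation of `xbar` itself; the within-sample ranges alone set the width of the limits on the averages chart.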
Which still leaves us with the messed-up chart in figure 1.
Now that I have reminded you about where those limits come from, can you guess what is going on? The tight limits indicate that the “within variability” is much smaller than the “between variability.” How would that come to occur?
OK, now those of you who have made cookies and run batch processes can shout out the answer. Go ahead—your co-habitants won’t mind. They already know you are prone to random verbal outbursts when you are on your computer.
What if there was an additional source of variability between batches? Maybe I mix up each batch but am sloppy on the measurements, so each batch is essentially a somewhat different recipe. How about this: each time I open the oven to take out some cookies the temperature changes, and the thermal control system either under- or over-corrects, leading to a different thermal profile for each batch. If so, this process is batch or setup dependent.
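You can simulate this situation to see why a chart like figure 1 comes out the way it does. In this sketch (with invented numbers), each batch gets its own random offset, on top of much smaller within-batch noise. The limits, computed from the average range, reflect only the within-batch noise, so many of the batch averages land outside them:

```python
import numpy as np

rng = np.random.default_rng(1)
n_batches, n_samples = 20, 5

# Within-batch noise is small; each batch also gets its own offset
# (sloppy recipe measurement, a different thermal profile, etc.)
batch_offsets = rng.normal(0, 3, size=n_batches)        # batch-to-batch variation
within = rng.normal(0, 1, size=(n_batches, n_samples))  # within-batch variation
data = 50 + batch_offsets[:, None] + within

xbar = data.mean(axis=1)
rbar = np.ptp(data, axis=1).mean()
A2 = 0.577  # chart constant for subgroups of 5
ucl = xbar.mean() + A2 * rbar
lcl = xbar.mean() - A2 * rbar

out = int(((xbar > ucl) | (xbar < lcl)).sum())
print(f"{out} of {n_batches} subgroup averages outside the limits")
```

With the batch-to-batch spread several times the within-batch spread, a large fraction of the points plot out of control, just like the chart the practitioner was hiding.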
This control chart is screaming at you about this batch-to-batch variability and, like a filthy car with “Clean Me!” written on the window, it is telling you what to do. To get this process to produce to its underlying variability, you will have to investigate to determine the sources of the batch-to-batch changes and eliminate them. The chart has given you a huge hint to help out in the search; whatever it is, it changes from batch to batch, so talking to your operators and looking at your process logs (You do have process logs when you bake cookies, right?) would be the first step in figuring this out.
By the way, you can test for this situation on a control chart by doing a random-effects, one-way analysis of variance (ANOVA) using each batch as a level. If the between variability is larger than that predicted from the within variability, you will find significance with your ANOVA. This is built right into MVPstats, which is what I used to generate these charts. The ANOVA will also find much more subtle differences between the within variability and the between variability than a control chart will detect, so I use it as a diagnostic on all X-bar charts.
Using similar reasoning, can you figure out how the following situation, as charted in figure 2, might occur? (The limits are calculated from the data you see, so it’s not that we have improved the process over where it used to be.)
Figure 2: Mean and Range Chart: Thickness
The first one who posts the correct answer on the discussion page gets bragging rights! (This effect happened on a control chart at that same business, too. Go figure.)
Next month, I’ll reveal the answer and finish up with some other errors that I have seen when people make control charts without using the most important tool in the SPC tool box… their brains.
Comments
Why Doesn't SPC Work
Either the machine which controls the dimension has a self-correcting feature or, more likely, the operator is adjusting the process based on the last control chart point. This is shown because there is less variation between subgroups than expected.
We had this happen when a customer called to complain of out-of-round on a turned part. Whoever heard of out-of-round on a turned part? - a ground part, yes; but not a turned part. We checked and sure enough, they were right. What happened was that the feed mechanism vibrated out of alignment with the machine. But what has this to do with SPC?
Well, this part was a critical fuel injection housing and had about 14 features that had to be charted. Because the part was so critical, we had one of our better operators on the job. He was extremely conscientious, so much so that he was continuously tweaking the machine to keep everything in the middle. If he'd left it to run the way it wanted to, it would have been obvious that a problem existed. As it was, he controlled it too much.
Thanks for responding
Thanks for responding Richard!
The two reasons you list in your first paragraph would probably not result in the pattern we saw on the second chart, though like Bruce above, you have correctly identified that the problem lies in the sources of variation.
Automatic control is one of those areas that is tricky, and shows up in different ways on a control chart. But, just as with operator adjustment, a process that is in control is only going to get worse as people or a control system adjust it (this is the lesson of the "Funnel Experiment"). Adjusting the process based on the last point results in rapidly increasing the variation in the means, not in decreasing it. When we used to do the Funnel Experiment, the team that used the control method of "adjust by the amount you were off target in the opposite direction" usually ended up having to walk out of the room and into the hall to do a drop to hit the target on the floor in the middle of the room!
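For the curious, the simplest tampering rule in Deming's Funnel Experiment ("adjust by the amount you were off, in the opposite direction," from the funnel's current position) is easy to simulate, and in this sketch the compensated process ends up with roughly twice the variance of the untouched one:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
noise = rng.normal(0, 1, size=n)  # common-cause variation of each drop

# Rule 1: never adjust -- results land around the target with variance sigma^2
rule1 = noise

# Rule 2: after each drop, move the funnel opposite to the last error
rule2 = np.empty(n)
pos = 0.0
for i in range(n):
    rule2[i] = pos + noise[i]
    pos -= rule2[i]  # "compensate" for the last result

print(f"variance ratio (tampered/untouched): {rule2.var() / rule1.var():.2f}")
```

Each tampered result works out to the difference of two successive noise draws, so its variance is double the inherent variance; tampering made things worse, never better.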
The key on this one was coming up with a reason why the within-sample variation was so large as compared to the between-sample variation. See the next article for the answer!
The Second Control Chart Comments
The data indicate the process is demonstrating a stable and predictable pattern of variation, aka "In Control". The range chart indicates the process variation is being dominated by the within-subgroup variation. There are no statistically significant differences between sample groups, as seen in the averages being within the control limits on the averages chart. The averages are "hugging" the process average line, indicating further that the bulk of variation is within the sample groups/batches, not between them.
Assuming the sampling strategy is the same as was used for the first graph, further investigation can (and should) be confined to identifying the sources of variation within any given sample/batch of data. You could look at an individual moving range chart of the samples to see if there are any patterns within each batch over time or gather additional samples from a batch to see if variation could be traced to time after a batch is mixed, location of samples within the oven, differences between ovens (if there are several) etc. You can continue to modify and refine your sub-grouping of sample data until you can identify a statistically significant difference in the averages. You can use graphical Components of Variation techniques and ANOVA to quantify the sources of variation.
Hi Bruce, and thanks for
Hi Bruce, and thanks for reading and responding!
No, the second chart is out of control - it is showing too little variation in the mean based on what we have seen in the past. (This is one way that the term "control" misleads people - it is out of control because it is unpredictably lacking in variability in the means.) You do correctly identify that the source of variation is different within and between sample points. A simple one-way random effects ANOVA on the data would show something weird, but would NOT indicate a significant effect. Check out the next article for more on that!