## On Alleviating Tortured Data

### And a proposal for a ‘new’ run chart rule

Published: Monday, September 29, 2014 - 11:26

Up until a few years ago, I wasn’t a big fan of run charts. Why not just go ahead and construct a process behavior chart and move on? Well, sometimes a run chart is more appropriate for certain data structures.

For example, some data are “chunky”—see Donald Wheeler’s treatment of chunky data in this QD column from 2011. Davis Balestracci has written about run charts in recent columns, including this one from August 2014. But my aim here is to present the progression from trying to make sense of chunky data with a process behavior chart to using a run chart to track a system improvement. I will present a new run chart rule for your consideration, and your feedback regarding it would be appreciated.

Everyone in industry is interested in improving safety. The key universal metric is the total incident rate (TIR), which the government uses to track overall safety performance throughout U.S. industry. You can find these data at www.OSHA.gov.

For any given time period, TIR is calculated as follows:

$$\text{TIR} = \frac{\text{number of OSHA-recordable injuries} \times 200{,}000}{\text{total man-hours worked}}$$

So, a company that experiences 20 OSHA-recordable injuries in a year in which 2 million man-hours are worked has a TIR of 2.0 for the year. Generally, 1.0 or less is considered world class. All companies under the jurisdiction of OSHA are required to annually report injury data to OSHA via what is called the OSHA 300 Log.
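The TIR formula is simple arithmetic. The minimal Python sketch below reproduces the worked example; the function name `total_incident_rate` is illustrative, not an OSHA or library API.

```python
def total_incident_rate(recordable_injuries: int, hours_worked: float) -> float:
    """Total incident rate: recordable injuries per 200,000 hours worked.

    200,000 hours corresponds to 100 full-time employees working
    40 hours a week for 50 weeks.
    """
    if hours_worked <= 0:
        raise ValueError("hours_worked must be positive")
    return recordable_injuries * 200_000 / hours_worked


print(total_incident_rate(20, 2_000_000))  # 2.0
```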

Many companies track and post their injury data monthly throughout their organizations to keep their workforces aware of the data and to motivate them to work more safely. One month without a recordable injury, and management is smiling. Two months in a row with no injuries, and management is ecstatic. The third month in a row might even bring a celebration like a catered meal, drawings for prizes, the boss shaves her head, whatever.

But alas, as usually happens, the next month brings one or two recordable injuries, and we start all over again. Every injury is treated as coming from special cause (alpha error), and we continue to tamper with the system that produces recordable injuries. We forget that every system is perfectly designed to get the results that it gets. So, how *do* we know when we are getting better (or worse)? Let's look at some real safety data.

A manufacturing facility experiences monthly recordable injuries as shown in figure 1.

**Figure 1:** Monthly recordable injuries

From these data, we can construct a process behavior chart as seen in figure 2.

**Figure 2:** I-MR process behavior chart of monthly recordable injuries

This chart indicates special causes may have caused excessive recordable injuries during months 29 and 38. However, the data are “chunky”—i.e., there are four or fewer possible values within the limits in the range chart. Note also that there is an abundance of zero ranges, and the limits on the individuals and range charts are very similar—two more indicators of “chunky” data, which, as Wheeler points out, are susceptible to false signals. Thus, an I-MR process behavior chart is probably inappropriate for these data.
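One rough way to check for chunkiness is to compute the I-MR limits and count how many possible range values fit under the upper range limit. The sketch below assumes count data recorded in whole units, uses the conventional XmR scaling constants 2.66 and 3.268, and applies the "four or fewer values" criterion quoted above. The monthly counts are hypothetical, not the facility's data.

```python
import math
from statistics import mean


def imr_limits(values):
    """Natural process limits for an I-MR (XmR) chart."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    x_bar = mean(values)
    mr_bar = mean(moving_ranges)
    return {
        "center": x_bar,
        "lnpl": x_bar - 2.66 * mr_bar,  # may be negative; for counts, read as "no lower limit"
        "unpl": x_bar + 2.66 * mr_bar,
        "url": 3.268 * mr_bar,          # upper range limit for the moving ranges
        "moving_ranges": moving_ranges,
    }


def possible_range_values(url, unit=1):
    """How many range values (0, unit, 2*unit, ...) fall at or below the upper range limit."""
    return math.floor(url / unit) + 1


monthly_counts = [0, 1, 0, 0, 2, 0, 1, 0, 0, 0]  # hypothetical monthly injury counts
limits = imr_limits(monthly_counts)
n_values = possible_range_values(limits["url"])
zero_ranges = limits["moving_ranges"].count(0)
print(f"URL = {limits['url']:.2f}, possible range values = {n_values}, zero ranges = {zero_ranges}")
print("chunky" if n_values <= 4 else "not chunky")  # the article's "four or fewer" criterion
```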

Another approach might be to consider a different subgrouping of the data to avoid chunkiness. Days between recordable injuries might work. Figure 3 shows the resulting I-MR process behavior chart with this approach.
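Converting injury dates into days between injuries takes only the standard library. The dates below are hypothetical; the article does not publish the underlying dates, and `days_between` is an illustrative helper name.

```python
from datetime import date


def days_between(injury_dates):
    """Gaps in days between consecutive injuries (dates are sorted first).

    Two injuries on the same day produce a gap of 0.
    """
    ordered = sorted(injury_dates)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]


injuries = [date(2014, 1, 10), date(2014, 2, 3), date(2014, 5, 1)]  # hypothetical dates
print(days_between(injuries))  # [24, 87]
```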

**Figure 3:** I-MR process behavior chart of days between recordable injuries

Utilizing the four Western Electric rules, it appears that there was a period of poor safety performance between observations 28 and 33, followed by a “record” gap of 166 days between recordable injuries; a celebration cookout was held at the facility on the 150th day. The moving range chart also shows a lack of stability at observations 20–26 (low ranges), immediately followed by a range above the upper range limit. Another “record” of 182 days between recordable injuries was experienced at observation 50, this time outside the upper natural process limit. Unfortunately, safety performance appears to have returned to “normal” right after the two “records” were set.
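For readers who want to reproduce this kind of check, here is a minimal sketch of the four Western Electric zone rules applied to an individuals chart. It assumes sigma is estimated as the average moving range divided by 1.128 and flags the point that completes each pattern; statistics packages may differ in details such as how overlapping signals are reported. The data are hypothetical.

```python
from statistics import mean


def western_electric_signals(values):
    """Check an individuals chart against the four Western Electric zone rules.

    Returns {rule: [indices]}, where each index is the point that completes a signal.
    """
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = mean(moving_ranges)
    if mr_bar == 0:
        raise ValueError("all moving ranges are zero; sigma cannot be estimated")
    center = mean(values)
    sigma = mr_bar / 1.128  # d2 constant for moving ranges of two points
    z = [(x - center) / sigma for x in values]  # distance from center in sigma units

    signals = {1: [], 2: [], 3: [], 4: []}
    for i in range(len(z)):
        # Rule 1: a single point beyond 3 sigma.
        if abs(z[i]) > 3:
            signals[1].append(i)
        for side in (1, -1):
            # Rule 2: 2 of 3 consecutive points beyond 2 sigma on the same side.
            if i >= 2 and sum(side * v > 2 for v in z[i - 2:i + 1]) >= 2:
                signals[2].append(i)
            # Rule 3: 4 of 5 consecutive points beyond 1 sigma on the same side.
            if i >= 4 and sum(side * v > 1 for v in z[i - 4:i + 1]) >= 4:
                signals[3].append(i)
            # Rule 4: 8 consecutive points on the same side of the center line.
            if i >= 7 and all(side * v > 0 for v in z[i - 7:i + 1]):
                signals[4].append(i)
    return signals


gaps = [10, 11, 10, 11, 10, 11, 10, 11, 10, 30]  # hypothetical days between injuries
print(western_electric_signals(gaps))  # {1: [9], 2: [], 3: [], 4: [7, 8]}
```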

Knowing that sustained safety performance is dependent on the system that the workforce is subjected to, and that the system is perfectly designed to achieve the results that it gets, it is unlikely that the analysis so far has yielded any new knowledge regarding the safety performance at this facility. Perhaps there was just some “good luck” and “bad luck” along the way, better described as random variation.

Perhaps a run chart would shed a different light on the subject? Figure 4 is a run chart of the days between recordable injuries for this facility.

**Figure 4:** Run chart of days between recordable injuries

At first glance, there seems to be no further information to be gleaned from this chart: p-values for clustering, mixtures, trends, and oscillations show no significance. The median is 28.5 days. By definition, the median of a data set represents the 50th percentile: half the data are greater than the median, and half are less. If the system is stable with a median of 28.5, then each successive data point should have a 0.50 probability of being greater than 28.5 and a 0.50 probability of being less. In other words, being greater or less than the median should be like flipping a fair coin.
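The median split behind the coin-flip argument can be sketched as follows, together with the run count that run-chart software typically uses for its clustering and mixture tests. The expected-runs formula is the standard one for a random ordering of the observed heads and tails counts. The helper names and the data are hypothetical.

```python
from statistics import median


def above_below(values):
    """Return the median and a heads/tails sequence: 'H' = above, 'T' = below.

    Points that fall exactly on the median are dropped, as run charts usually do.
    """
    m = median(values)
    signs = ["H" if x > m else "T" for x in values if x != m]
    return m, signs


def count_runs(signs):
    """A run is a maximal streak of consecutive identical signs."""
    return sum(1 for i, s in enumerate(signs) if i == 0 or s != signs[i - 1])


def expected_runs(signs):
    """Expected number of runs for a random ordering of the same H/T counts."""
    n_h, n_t = signs.count("H"), signs.count("T")
    return 1 + 2 * n_h * n_t / (n_h + n_t)


gaps = [5, 40, 12, 33, 60, 20, 90, 8]  # hypothetical days between injuries
m, signs = above_below(gaps)
print(m, "".join(signs), count_runs(signs), expected_runs(signs))  # 26.5 THTHHTHT 7 5.0
```

Noticeably fewer runs than expected points toward clustering; noticeably more points toward mixtures.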

If we look closely at the last 16 data points on the run chart, there are 12 points greater than (“heads”) and four points less than (“tails”) the median of 28.5. If a fair coin is flipped 16 times, how likely are we to get 12 heads and four tails, or a split at least that lopsided? This question can be answered with the chi-square goodness-of-fit test for one variable. This test compares the expected number of heads and tails (eight each) with the observed number of heads and tails and calculates a chi-square statistic as follows:

$$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$

The critical chi-square value for alpha = 0.05 and 1 degree of freedom is 3.84. The total chi-square statistic for this example is (12 − 8)²/8 + (4 − 8)²/8 = 2.0 + 2.0 = 4.0, yielding a p-value of 0.046. Thus, if the safety performance of this facility were stable with a median of 28.5, it would be unlikely to see 12 out of 16 consecutive points greater than 28.5. Figure 5 is a new run chart constructed from these 16 data points.
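The chi-square calculation can be checked in a few lines. For 1 degree of freedom, the upper-tail probability equals erfc(sqrt(χ²/2)), so no statistics library is needed. The helper name `heads_tails_chi_square` is hypothetical.

```python
import math


def heads_tails_chi_square(n_above, n_below):
    """Chi-square goodness-of-fit test of an above/below split against a fair coin (1 df)."""
    n = n_above + n_below
    expected = n / 2
    chi2 = sum((obs - expected) ** 2 / expected for obs in (n_above, n_below))
    # For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


chi2, p = heads_tails_chi_square(12, 4)
print(round(chi2, 2), round(p, 3))  # 4.0 0.046

_, p_11 = heads_tails_chi_square(9, 2)  # the 9-of-11 case discussed below
print(round(p_11, 3))  # 0.035
```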

**Figure 5:** Run chart of the last 16 data points

Safety performance has shifted to a new median of 60 days between recordable injuries! Note that had this new rule been in effect earlier, the first 11 of the last 16 data points would already have revealed the shift: 9 of those 11 points lie above the median of 28.5, with a p-value of 0.035.

Figure 6 is an updated run chart that includes all the data used in this study.

**Figure 6:** Updated run chart including all data

The shift in performance is clear to everyone and serves as an excellent visual aid for safety meetings at all levels.

Figure 7 is a table of p-values for determining whether the number of points greater than or less than the median could reasonably have happened by chance alone, at an alpha level of 0.05.

**Figure 7:** P-values for the number of points above or below the median

The p-values indicating a statistical shift in performance from the overall median are highlighted in yellow. P-values for more than 16 consecutive data points can be easily computed by virtually any statistics software package.
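A table like figure 7 can be regenerated for any number of consecutive points. The sketch below assumes the table uses the same chi-square approximation as the worked example (it reproduces 0.046 for 12 of 16 and 0.035 for 9 of 11). For comparison, it also reports the exact two-sided binomial p-value for a fair coin. At these small counts the exact value is noticeably larger (about 0.077 for 12 of 16), so the choice of test affects which cells fall below 0.05. The function names are hypothetical.

```python
import math


def chi_square_p(k, n):
    """Chi-square (1 df) p-value for k of n points on one side of the median."""
    expected = n / 2
    chi2 = 2 * (k - expected) ** 2 / expected  # both cells contribute the same amount
    return math.erfc(math.sqrt(chi2 / 2))


def exact_binomial_p(k, n):
    """Exact two-sided binomial p-value for k of n, with P(above) = 0.5."""
    tail = max(k, n - k)
    upper = sum(math.comb(n, j) for j in range(tail, n + 1)) / 2 ** n
    return min(1.0, 2 * upper)


def p_value_table(max_n=20, alpha=0.05):
    """Rows of (n, larger count, smaller count, chi-square p, exact p, flagged)."""
    rows = []
    for n in range(2, max_n + 1):
        for k in range((n + 1) // 2, n + 1):  # k = count on the more populated side
            p = chi_square_p(k, n)
            rows.append((n, k, n - k, p, exact_binomial_p(k, n), p < alpha))
    return rows


for n, k, rest, p_chi, p_exact, flagged in p_value_table(max_n=16):
    if flagged:
        print(f"{k:2d} vs {rest:2d} of {n:2d}: chi-square p = {p_chi:.3f}, exact p = {p_exact:.3f}")
```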

Finally, I find it useful to approach the data in any study from several directions to double-check my conclusions or gain another perspective. In this case, I decided to look at the two periods of performance shown in figure 6 through the lenses of Mood's median test (nonparametric) and simple one-way ANOVA (even though I know the data are not normally distributed). The results are as follows:

**Mood's median test for days between**

**One-way ANOVA for days between**
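For two periods, Mood's median test reduces to a chi-square test on a 2×2 table of counts above and at-or-below the grand median. The pure-Python sketch below omits any continuity correction and counts ties with the "at or below" group; both are assumptions, and statistics packages may handle them differently (for example, `scipy.stats.median_test` applies Yates' correction by default when there are two samples). The two periods are hypothetical.

```python
import math
from statistics import median


def moods_median_test(group_a, group_b):
    """Mood's median test for two groups, without continuity correction."""
    grand = median(list(group_a) + list(group_b))
    groups = (group_a, group_b)
    table = [
        [sum(x > grand for x in g) for g in groups],   # above the grand median
        [sum(x <= grand for x in g) for g in groups],  # at or below it
    ]
    row_totals = [sum(row) for row in table]
    col_totals = [len(g) for g in groups]
    total = sum(col_totals)
    if 0 in row_totals:
        raise ValueError("every value falls on one side of the grand median")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square tail with 1 degree of freedom
    return grand, chi2, p_value


before = [1, 2, 3, 4]  # hypothetical days between injuries, period 1
after = [5, 6, 7, 8]   # hypothetical days between injuries, period 2
print(moods_median_test(before, after))
```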

In both cases above, we get very small p-values, indicating that the medians and means of the two periods likely differ. Now that the data have been properly tortured until they confessed, I would like to submit a new “rule” to be considered to help determine whether a run chart detects a shift in performance. The rule is this: A stable system operating at a given median value will exhibit only those “heads/tails” patterns in consecutive data points that can be obtained by random chance alone (at a given alpha level). If nonrandom patterns occur, then special causes may have shifted the performance of the system in a positive or negative direction.
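One way to mechanize the proposed rule is to scan windows of consecutive points, count how many fall above and below the baseline median, and stop at the first window whose split would be unlikely for a fair coin. The sketch below uses the chi-square approximation from the worked example; `first_signal`, the minimum window size `min_n = 8`, and the scanning order are illustrative choices, not part of the proposal. Because many overlapping windows are tested, the overall false-alarm rate will be higher than the nominal alpha.

```python
import math


def first_signal(values, baseline_median, min_n=8, alpha=0.05):
    """Return the first window of consecutive points (in time order) whose
    above/below split differs from 50/50 at the given alpha (chi-square, 1 df).

    Returns (start, end, n_above, n_below, p_value) with inclusive indices, or None.
    """
    for end in range(len(values)):
        for start in range(end + 1):  # longest window ending at `end` is tried first
            window = values[start:end + 1]
            above = sum(x > baseline_median for x in window)
            below = sum(x < baseline_median for x in window)
            n = above + below  # points exactly on the median are ignored
            if n < min_n:
                continue
            expected = n / 2
            chi2 = 2 * (above - expected) ** 2 / expected
            p_value = math.erfc(math.sqrt(chi2 / 2))
            if p_value < alpha:
                return start, end, above, below, p_value
    return None


gaps = [10, 20, 40, 15, 50, 60, 45, 70, 80, 55, 65, 90]  # hypothetical days between injuries
print(first_signal(gaps, baseline_median=28.5))
```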

It is important to understand why the shift in safety performance occurred. In this case, it was easy to determine. A few months prior to the shift, the organization had engaged a third-party industrial company with an exemplary safety record going back more than 20 years to teach the organization how to rebuild the safety system with a high level of employee involvement and ownership. Management and labor worked together to develop an injury-free culture. Now the system appears to be perfectly designed to achieve a median value of 60 days between recordable injuries. Before, it was perfectly designed for a median of 28.5. The challenge now is to achieve a median of 70, 80, 90, and beyond.

Naturally, when a change to a system is made for improved performance, management is anxious to declare victory, especially when it comes to safety performance. Safety is an emotional issue, and we must be disciplined to make sure we have statistical evidence that supports any declaration of improved performance. Too many times in the past we have celebrated success when the system was merely exhibiting its own natural variability.

As mentioned above, I'm asking for your feedback regarding my new rule to help interpret run charts. Perhaps what I am proposing is nothing new at all. Perhaps my logic is faulty. Perhaps I have helped you interpret run charts in a different way. In any case, I would welcome hearing from you.

## Comments

## Skewed data with Injuries

You can calculate all sorts of different rules based upon an alpha and k points. It might already be in a non-parametric book table (I don't have one handy but I remember doing such calculations in grad school).

For this particular example with time between injuries, IF (big IF) your data are truly Poisson and injuries are "independent," then the time between injuries is by definition exponentially distributed. Nelson (can't remember if it was Lloyd or Wayne) found that the ideal "transform" of such data to get it "close" to normal was X^0.27. He had an article either in Technometrics or something similar back in the 1990s. I had independently found with data from my company that taking the square root of the square root (or X^0.25) worked pretty well, and that's what I used.

Before I get comments from people about not needing to transform: I'm generally one of those people myself, with rare exceptions, such as when the data are heavily skewed because they are bounded. We plotted the time between injuries (transformed to X^0.25) on an IMR chart, and it worked extremely well. You could easily use run rules to tell whether safety was getting worse or better after a safety initiative was introduced. We plotted the current time since the last injury using a different symbol so that people knew it was still ongoing. If it crossed the UCL for the I chart, we would ask ourselves whether we had introduced a change that truly caused that point. We found it to be an effective tool to combat the managerial issue you brought up: feeling good about no injuries when in fact the process hadn't really changed.
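The commenter's approach can be sketched as follows: raise each gap to a power (0.25 here, as in the comment), compute XmR limits on the transformed values, and convert the limits back to days for display. The function name and the sample gaps are hypothetical, and the transform is only appropriate if the gaps are roughly exponential, as the comment notes.

```python
from statistics import mean


def transformed_imr_limits(days_between, power=0.25):
    """XmR limits computed on days**power, then converted back to days."""
    y = [d ** power for d in days_between]
    mr_bar = mean([abs(b - a) for a, b in zip(y, y[1:])])
    center = mean(y)
    lower = max(center - 2.66 * mr_bar, 0.0)  # transformed gaps cannot be negative
    upper = center + 2.66 * mr_bar

    def to_days(v):
        # Back-transform; the center in days is not the arithmetic mean of the gaps.
        return v ** (1 / power)

    return {"lnpl": to_days(lower), "center": to_days(center), "unpl": to_days(upper)}


gaps = [16, 16, 81, 81]  # hypothetical days between injuries
print(transformed_imr_limits(gaps))
```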

The Hawthorne effect is pretty easy to pick up but it is the sustained drive that makes a difference.

Our workforce didn't have too much trouble understanding that time was transformed when we explained it. They understood the skewness of the data pretty easily and had enough comfort with SPC that it was easy to grasp, and we showed it in both manager meetings and production meetings. So, yes, you can use medians to develop your own set of runs rules (and likely you can find what you need in a non-parametric book), but a simple transformation can work as well and is a lot less work in the end.