## Working with Rare Events

### What happens when the average count gets very small?

Published: Friday, October 28, 2011 - 08:23

From the perspective of data analysis, rare events are problematic. Until we have an event, there is nothing to count, and as a result many of our time periods will end up with zero counts. Since zero counts contain no real information, we need to consider alternatives to counting the rare events. This article will consider simple and complex ways of working with rare events.

Our first example will involve spills at a chemical plant. While spills are not desirable, and
while everything possible is done to prevent them, they do occasionally happen. Over the past
few years one plant has averaged one spill every eight months. Of course, if the plant averages
one spill every eight months, then those months with a spill will be 700 percent above average!
(When dealing with small counts a one unit change can result in a huge percentage difference.)
Assuming that these counts are reasonably modeled by a Poisson distribution we could put these counts on a *c*-chart. The central line for this * c*-chart would be the average count. During the first four years there were a total of six spills. Six spills in 48 months gives an average of 0.125 spills per month.

For a *c*-chart the upper limit is found by multiplying the square root of the average count by 3.0 and adding the result to the central line. This gives the upper limit of 1.186 shown on the *c*-chart in figure 1.

In spite of the fact that a single spill is 700 percent above the average, the *c*-chart does not
show any points outside the limits. Here it would take two spills in a single month to make this
chart signal a change. (Only 1500 percent above average!) So, while the use of a *c*-chart might be justified with these counts, it is of little practical use because it is so insensitive.

What about using an *XmR* chart with these counts as I suggested last month? Using the first
four years as our baseline the moving range chart would have an upper limit of 0.83 and the *X*
chart would have an upper limit of 0.80. This makes every month with a spill into a signal of a
change in the system! Clearly this is not a reasonable interpretation of these data. The problem is that this *XmR* chart suffers from the problem of chunky data. (Chunky data can occur with any
type of data. Count data tend to be chunky whenever the average count falls below 1.00.
Chunky data will artificially tighten the limits of a process behavior chart and will result in an
excess number of false alarms. More about the problem of chunky data next month.)

Thus, with counts of rare events the specialty charts become insensitive and the *XmR* chart
breaks down. This is not a problem with the charts, but rather a problem with the data
themselves. *Counts* of rare events are inherently insensitive and weak. No matter how these *
counts* are analyzed, there is nothing to discover by placing the * counts* on a chart of any type. Yet
there are other ways to characterize rare events. Instead of *counting* the number of spills each
month (*counting* events), you could instead *measure* the number of days between the spills
(*measure* the area of opportunity between the rare events). For these data the time intervals
between the spills are computed as follows.

One spill in 322 days converts into a spill rate of 0.0031 spills per day. Multiplying this daily spill rate by 365 gives us a yearly spill rate of 1.13 spills per year. Thus, the interval between the first spill and the second spill is equivalent to having spills at the rate of 1.13 spills per year. In the same way the interval of 247 days between the second and third spills is converted into a spill rate of 1.48 spills per year. Continuing in this manner, every time we have an event we obtain an instantaneous spill rate.

The average spill rate during the first four years is 1.42 spills per year. The average moving
range is 0.2475. Multiplying this latter value by 2.66 and adding and subtracting the result to and
from the 1.42 we get the *X* chart limits shown in Figure 4. The upper range limit is found by
multiplying the average moving range by 3.27. While the use of five values to create an *XmR*
chart is minimal, it took four years to get these five values!

If a future point falls above the upper limit it will mean that the spill rate is increasing. A
future point below the lower limit will mean that the spill rate is decreasing. Points within the
limits will be interpreted as meaning that there has been no change in the overall spill rate.
The two spills in the current year had intervals of 172 days and 115 days respectively. These
intervals convert into spill rates of 2.12 spills per year and 3.17 spills per year. When these values are added to the *XmR* chart the result is figure 5.

While the first spill in the current year is outside the limit, it is barely outside. Given the softness of limits based on five values, we might be slow to interpret the sixth point as a clear signal of a change. However, the seventh point is far enough outside the limits to be safely interpreted as a definite signal—there has been an increase in the spill rate during the current year. If we return to figure 1 we can see that the spills are getting closer together, but we cannot detect this change until we shift from counting the rare events to measuring the area of opportunity between events.

Notice that while both figure 1 and figure 5 are looking at spill rates, there has been a change in the variable between figure 1 and figure 5. In figure 1 the variable was the number of spills per month. Here the numerator was allowed to vary (the umber of spills) while the denominator was held constant (one month). In figure 5 we have instantaneous spill rates where the numerator is held constant (one spill) and the denominator is allowed to vary (days between spills).

Instead of using the instantaneous spill rates, figure 6 uses the number of days between spills to create an XmR chart. While this is feasible, this chart suffers from being the chart of an inverse measure. As the spills become more frequent the points in figure 6 move downward. This simple inversion creates a cognitive dissonance for those who have to interpret this chart. While this is not an insurmountable obstacle, it is still a hurdle that is unnecessary here. The instantaneous spill rates of figure 5 are easier to use and easier to interpret than the number of days between spills in figure 6.

In addition to being an inverse measure, the time between events results in a chart that is less sensitive than the chart for the instantaneous spill rates. Figure 5 will detect an increased spill rate whenever that rate exceeds 2.08 spills per year. The lower limit of figure 6 corresponds to a spill rate of 2.76 spills per year. Given that these are techniques for rare events and that we want to detect any increase in the spill rate in as timely a manner as possible, this lower sensitivity of figure 6 is undesirable.

While the chart for the instantaneous rates will generally be the preferred chart, there is one situation where the chart for the times between events is useful. This is when the lower limit of figure 5 goes below zero. When this happens the instantaneous rate chart will no longer show improvements. If you are involved in taking action to reduce the rate of the rare events, so that detecting improvements is important, then you may need to resort to charting both the instantaneous rates and the time between events. The chart for instantaneous rates will allow you to detect increases in the rate of rare events, while the chart for the time between events will allow you to detect decreases in the rate as points above the upper limit. This will be illustrated by the next example.

Figure 7 contains the number of consecutive cases between post-operative sternal wound infections in one cardiac care unit. These data represent 3106 patients treated at this one facility. With a total of 75 sternal wound infections this unit has an overall infection rate of 2.4 percent. While this summary statistic describes the past, the question of interest is whether we can use it as a prediction of what to expect in the future. To answer this question we will need to look at the way the data behave over time. Figure 7 should be read in columns.

The *X* chart for these counts is shown in figure 8. Here the median count is 26 (which
corresponds to an infection rate of 3.8%) and the median moving range is also equal to 26.
Multiplying the median moving range by 3.145, and adding the result to the median value of 26
results in an upper limit of 107.8. The points above this upper limit represent eight periods
where the infection rate was detectably lower than the rate of 3.8 percent associated with the median count of 26. So while this chart detects eight periods of improved operation, these data will not allow us to detect any increase in the infection rate of this coronary care unit.

In order to detect periods of increased infection rates we will need to invert the counts of
figure 7 to obtain instantaneous infection rates. The *X* chart for these rates is shown in figure 9.
The median rate is 0.040, and the median moving range is 0.056. Here we find seven periods
where the infection rate was detectably higher than four percent. (The eight periods with
detectably lower infection rates from figure 8 are also shown as circled points along the bottom
of figure 9.)

All in all, these charts tell a story of a hospital that is not yet in control of all of the potential sources of infection in its coronary care unit. While their historical average infection rate has been 2.4 percent, this 2.4 percent is not a characteristic of a predictable process, but merely the average over periods with higher and lower infection rates.

Thus, between these two charts we can identify periods of higher than average infection rates and periods with lower than average infection rates. Unfortunately, we can do so only after the fact. Because these charts do not add a point until after an event has occurred, they will always remain essentially report card charts.

The vertical scale of figure 9 is nonlinear on the upper end simply because there are only certain values that occur when we invert counts smaller than 10. This is a natural consequence of having areas of opportunity that are counts.

In figure 8 and figure 9, I used the median moving range to compute the limits because the
average moving ranges were inflated by the extreme values and did not capture the routine
variation within the data. I also used the medians as the central line in both *X* charts simply
because the averages fell at the 64th and 73rd percentiles respectively, making them poor
measures of location.

### Specialty charts for times between events

In the 1990s, specialty charts for the times between events were created. These charts are
commonly known as g-charts and *t*-charts. As with the *p*-chart, *np*-chart, *c*-chart, and *u*-chart
discussed last month, these charts use a probability model to construct theoretical limits for the
times between events. However, unlike the traditional specialty charts, the *g*-chart and *t*-chart
have an implicit assumption of global homogeneity built into the computations. This assumption
of homogeneity is equivalent to the erroneous computation of limits for a regular process
behavior chart using the global standard deviation statistic. (See my columns for January and
February of 2010 for more on this mistake.)

The *g*-chart is based on a geometric distribution, and is applied to counts like those in figure
7, the number of items observed between items having some attribute. The use of this probability model once more allows the computation of a theoretical three-sigma distance directly from the average value. In the case of figure 7 the average value is 41.4. The theoretical three-sigma distance is found by squaring the average value, subtracting the average from this squared value, taking the square root, and multiplying by 3 to get 122.7. When this quantity is added to the average value we find an upper limit of 164. The resulting *g*-chart is shown in figure 10.

While the nature of these data preclude this chart from detecting when things get worse, the
assumption of global homogeneity makes it difficult for this chart to detect when things get
better. So while any chart for the time between events will usually be a report-card chart, this *g*-chart suffers from an inability to tell you when things are changing (compare figure 10 with
figure 8).

Before the counts in figure 10 can be considered to be geometric counts, the series of cases
will need to be a series of Bernoulli events where the probability of an infection remains the same for all of the cases in the baseline period. In short, the computations for the limit of figure 10 assume the data set to be homogeneous. (A series of *n* Bernoulli events will result in a binomial count only when the probability of the counted outcomes p remains constant across the n events. A series of values for the numbers of Bernoulli events between counted outcomes will be a geometric variable only when *p* remains the same for each of the intervals. Thus, in order to use the geometric probability model to obtain appropriate limits for the data in figure 10, the probability of an infection must remain the same throughout the baseline period. This assumption is inconsistent with these data and unrealistic for the context of these data.) Any use of a *g*-chart imposes an assumption of homogeneity upon the data prior to looking for evidence of a lack of homogeneity. As such it is an example of the triumph of computations over common sense. It simply does not provide any opportunity to learn from the data.

Now before you write to tell me how your g-chart had points above the upper limit, you need
to know that the geometric distributions will have an absolute minimum of 1.8 percent false alarms with three-sigma limits, and that some of these false alarm points will be comfortably beyond the upper limit. (It is interesting to note that those who want to transform data in order to avoid having a false alarm rate of one or two percent on an *XmR* chart do not make the same objection to the *g*-chart. Perhaps they are simply unaware of the properties of the techniques they are using.)

### The *t*-chart

While the *g*-chart was intended for those cases where the time between events is a count, the *
t*-chart was intended for those cases where the time between events is a measurement. For an
example return to the first example where the spills occurred in time and the time between events was measured in days. The *t*-chart would use the times between spills (as shown in figure 2) as the raw data.

Now there are two different versions of the t-chart. One version uses a probability model to
compute theoretical limits and the other version transforms the data and places the transformed
data on an *XmR* chart. In both versions the initial assumption is that the rare events are
characterized by a Poisson probability model. (This means that the likelihood of an event is
proportional to the size of the area of opportunity and that the rate of occurrence for the rare
events is constant throughout the area of opportunity.) Next, given that the counts are Poisson,
the times between events will be modeled by an exponential distribution. The mean value for
these exponential variables will depend upon the rate of occurrence of the Poisson events. Both versions of the *t*-chart assume that the times between events come from one and the same
exponential distribution. (In order for all of these exponential variables to have the same mean value the Poisson rate must remain constant across all the occurrences.) The first version uses
this assumption of global homogeneity to create theoretical limits for the data.

The exponential distribution starts at zero and has a standard deviation that is equal to the
mean. This means that there will be no lower limit, and that the upper three-sigma limit will be
equal to four times the mean value. Thus, in constructing this version of the *t*-chart, the central
line will be equal to the average time between events, and the theoretical upper limit will be
equal to four times the average value. Using the first five times between events of figure 2 as our baseline we get figure 11. Compare this with the graph in figure 6. The absurdity of figure 11 is sufficient to warn the experienced analyst that these data are not exponentially distributed and that the theoretical approach based on the assumption of an exponential distribution is incorrect.

The second version of the *t*-chart still assumes that all of the times between events in the
baseline period come from a single exponential distribution, but this version transforms the data
prior to placing them on a chart. Since exponential random variables can be converted into
Weibull random variables by a power transformation, and since a Weibull distribution with
parameter 3.6 has zero skewness, the times between events are all raised to the 1/3.6 = 0.27778 power in order to remove the assumed skewness from the data. Then these transformed values are placed on an *XmR* chart.

As before we use the first five intervals as our baseline and place the transformed intervals on
an *X *chart. The average is 4.689 and the average moving range is 0.239. This gives the chart and limits shown in figure 13. The limits here are substantially different from figure 11 because
these limits are empirical limits computed from the transformed data, rather than being
theoretical limits computed for an assumed probability model.

As in figure 6 the last point in figure 13 falls below the lower limit signifying an increase in the spill rate. However, unlike either figure 5 or figure 6, the chart in figure 13 assumes that the rate of occurrence for the spills is constant over the whole baseline period. While this may be reasonable for these data, it will not always be so. This assumption of a constant rate is particularly troublesome when we are using these report chart charts to evaluate improvement efforts that will hopefully change the rate of occurrence for the rare events. (At the minimum, this assumption will argue against the use of long baselines for computing limits.)

For a final example I shall use a data set coming from a document in my possession. It is the
time in days between the occurrences of an unspecified (unfavorable) event at a hospital. The
times are shown in figure 14. In the document the data were all raised to the 0.27778 power,
placed on an *XmR* chart, and the limits were computed using all of the data. (This corresponds to a baseline of 4.5 years.) The average moving range for the transformed values is 0.751, which
gives the limits shown in figure 15.

**Figure 14:** Events per year and time between events

The only glimmer of a signal is the long run above the central line in figure 15. Unfortunately, when we transform the data in a nonlinear manner we effectively shift the central line relative to the rest of the data (compare the central lines with the running records in figures 15 and 16). This makes the interpretation of long runs tricky and undermines the use of the traditional run tests. While this run of 12 points is likely to be some sort of a signal, this interpretation is subjective and owes more to the following analysis than to the chart in figure 15.

If we simply place the days between events on an *XmR* chart we get the chart shown in
figure 16. For this 4.5 year period the average is 81.9 and the average moving range is 38.8. The
point above the limit and the run leading up to that point suggest that there was a change in the
rate of these events sometime during Year Two. Moreover, the point above the upper range limit
suggests that Year Five is going to have a detectably lower rate for this event than what was seen in Years Two, Three, and Four. However, waiting until you have over four years worth of data before computing limits is simply not realistic for these data. If we assume that an effort to reduce the rate of this event was started at the beginning of Year Two, we might have used Year One as our baseline. This would
produce the chart and limits shown in figure 17.

**F**

Here we detect a lowered rate for this event in Year Three. Interpreting the run in which this point above the limit occurs we see that this improvement might have begun as early as the start of Year Two. And Year Five still looks like it will be better than the previous years. Contrast the clarity and interpretability of the signals in figures 16 and 17 with the lack of any clear signal in figure 15.

### Summary

Whenever the average count per time period drops below 1.00 you are working with rare events. When this happens the *p*-chart, * np*-chart, *c*-chart, and *u*-chart will all become very insensitive. At the same time the problem of chunky data will prevent you from using the *XmR* chart with the counts of items or the counts of events. When this happens you should shift from counting the events per time period and instead measure the area of opportunity between the rare events. Here you cease to get a value every time period, and instead get a value every time you have an event. (This shift in how you collect the data argues against using this approach except in the case of rare events.)

When working with the times between events you may wish to compute instantaneous rates
for each event and place these on an *XmR* chart as illustrated in figures 4 and 5, or you may
work directly with the times between events as shown in figure 6. When these charts become
one-sided, you may need to work with both charts in order to detect both improvements and
deterioration, as shown in figures 8 and 9.

While specialty charts have been created for times between events, they suffer the logical
flaw of making a strong assumption that the data are homogeneous prior to examining those data for homogeneity. As a result the *g*-chart in figure 10 failed to show any of the signals found in figures 8 and 9, and the simple *t*-chart in figure 11 failed to detect the signal found in figure 6.

The complex *t*-chart does a better job than the simple *t*-chart simply because it uses empirical
limits rather than theoretical limits. For this reason figure 13 found the same signal shown in
figure 6. However, the complex *t*-chart still assumes that all of the times between events are
modeled by one and the same exponential distribution, and it applies the same nonlinear
transformation to all of the values prior to placing them on a chart. As is generally the case, this
nonlinear transformation will tend to hide the signals within the data. Thus, figure 15 failed to
show the signals found in figures 16 and 17.

The *g*-chart and the *t*-chart seek to provide exact solutions to specific problems.

Unfortunately, these specific problems do not match the realities of the data we actually encounter. The *XmR* chart simply takes the data as they exist and examines them to see if they
show evidence of a change in the underlying process. This empirical approach may use
approximate limits, but it is a robust approach that works with all types of data. (See my column
for November of 2010 for more on this topic.) And as the famous statistician John Tukey once
said: “It is better to have an approximate answer to the right question than an exact answer to the
wrong question.”

## Comments

## Reply to Jstats

The question addressed by any process behavior chart is more basic than "What is the shape of the histogram?" or "What is the probability model?" It has to do with whether we can meaningfully use any probability model with our data. In starting with distributional questions you beg the question of homogeneity which the chart was created to address. This mistake was made by E. S. Pearson in 1935, and has been repeated many times since. You need to reread Shewhart's two books more carefully.

## Distributions

Don - thank you for your reply to my comments. This seems to be a chicken and egg situation. If you assume data to be homogenous, then you may only detect a future issue that has deviated from the current homogenous state. However, if the data were not homogenous to begin with, then this assumption may be masking signals. So the best decision would seem to be to use some other knowledge to consider whether your assumption is reasonable or not.

In this case I would use two pieces of knowledge. The first is that a Poisson process is known to have "opportunities between occurences" to be geometrically distributed. In this case it may be reasonable to believe this is a Poisson process but we probably want other evidence as well. Which would bring me to the second piece of knowledge which is the graph itself...removing centerline and control limits, the data simply look random. As a practitioner, I would have serious doubts about the points labelled as out-of-control actually having special causes and would not waste time pursuing special causes for 20% of my points.

I have certainly read Shewhart's books and articles and understand he did not want control charts to be interpreted as fitted specific probabilities associated with the normal (or any other) distribution. However, among all of the "example" datasets he uses, there is never a sample that comes out as skewed as data from time between events typically does. They are all roughly "moundish". I don't think considering data to have potentially come from a highly skewed distribution is rejecting Shewhart's ideas so much as evaluating an area in which Shewhart did not publish work.

In determing the best way to plot data from what we have good reason to believe is a highly-skewed distribution, the exponential or geometric distributions may not be perfect but they provide more reasonable limits than Shewhart's methods and only require one chart. I find this to be much more useful and to provide more balanced and reasonable false alarm rates.

Also, I'm interested in investigating the methods you described for G- and T-Charts as they are not the ones I am familiar with. Is there a reference I can consult for more information? Your help would be greatly appreciated!

## Great article

I actually met Don Wheeler at an ASQ annual conference in 1996. I went to his booth asking the question of what to do about rare events. His answer was "Buy this book" (Understanding Variation The Key to Managing Chaos) and even got his autograph (though I am sad to report someone at some point borrowed the book, and I don't have it on hand).

I've used the rate between events chart described in the beginning of the article with good success. It does get a bit hard to explain to management, as the average of the rates ends up being skewed higher than might be expected by the layman. I am convinced this "skewing" is good for the analysis.

The charts do tend to (in my empirical experience) be a little more susceptible to false alarms than others, but in the safety arena I'd rather have a few more false alarms.

One suggestion I have, which removes the past looking "report card" feature is to plot a "ghost point" for today when the chart is updated. That is - if an event were to happen 1 minute after I updated the chart, what would it look like? That gives the context of how long have we gone since the last event. Stretching things a little, I've even converted that to the exponential probability of what is the likelihood that we could have gone that long given the current average rate that is plotted. That hasn't been as beneficial especially in layman conversations and may be questionable statistically, but might be worth considering.

Thanks,

## Rare Events and Safety Statistics

I have been using the methodology described by Dr. Wheeler to analyze Safety data (OSHA Recordable Injury Rates, in particular) for several years with good success in seeing the signals which are otherwise difficult to detect.

## Another Classic from the Master

Andrew Torchia Principal Quality Consultant www.qaexp.com

Once again Dr. Wheeler condenses a chapter's worth material into a single column that is highly educational yet easy to read.

## A classic?

Looking at the graph of infection rates, my instincts told me that the process was stable as nothing about it looked nonrandom. So I did some fact-checking, and the data certainly appears to fit the distributional assumption of geometric (and distribution fit is important for G and T Charts). So then I took Dr. Wheeler's control limits, and calculated the odds of randomd data from that geometric distribution falling outside of the control limits - the result: 18.3%. 18.3%! That's your false alarm rate. He then circles 20% of the points and claims there is a lack of control.

If you think this is "a classic" then have fun chasing false alarms all day. Apparently no research at all was done on the properties of the method he is proposing and as a statistician, I am deeply concerned about anyone reading this and blindly believing this is a better method. As this method additionally requires calculations on your data, I do not see how it is easier either.

Further, the methods described for how standard G and T Charts are constructed are not common methods, and chances are very good that your statistical package uses a much better method and does not require the use of two different charts to detect shifts up and down.

If you have this type of data, check to see if it meets the distribution assumption of geometric or exponential (just a histogram should be a good enough check). If it does, use the standard G or T Chart that Minitab or other packages provide. If it does not then it is most likely more symmetric, in which case an I-MR Chart would work fine although you should ensure you have a posisitive LCL.

Use the method described in this article at your own peril.