© 2022 Quality Digest. Copyright on content held by Quality Digest or by individual authors. Contact Quality Digest for reprint information.
Published: 03/19/2019
Within maintenance management, MTBF (mean time between failures) is the most important key performance indicator after physical availability.
Unlike MTTF (mean time to failure), which relates directly to available equipment time, MTBF also adds up the time spent in repair. That is, the count starts at a given failure and includes the time until the fault is remedied and the equipment is restarted and performing again. According to ISO 12489, this indicator can only be used for repairable equipment, and MTTF is the equivalent for nonrepairable equipment.
The graphic below illustrates these occurrences:

To calculate the MTBF in figure 1, we add the times T1 and T2 and divide by two. That is, we calculate the average of all the times between one failure and the next, including the return to operation. It is, therefore, a simple mathematical calculation. But what does MTBF mean?
Generally speaking, this indicator is associated with the reliability of assets or asset systems, and it can even be applied to an individual repairable item, although it is rare to have data available at that level of detail. Maintenance managers set benchmark numbers and track performance on a chart over time. In general, the higher the MTBF, the better; that is, the fewer breakdowns and repairs over the analyzed period.
With the concepts established, some specific questions need to be answered:
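The averaging described above can be sketched in a few lines of Python. The failure timestamps are invented for illustration, and the function name `mtbf_from_failures` is hypothetical. Each interval runs from one failure to the next, so repair time is included.

```python
from statistics import mean

def mtbf_from_failures(failure_times_h):
    """Average the intervals between consecutive failures (hours).

    Each interval runs from one failure to the next, so it includes
    the repair time, as in the definition used in this article.
    """
    if len(failure_times_h) < 2:
        raise ValueError("need at least two failures to form an interval")
    ordered = sorted(failure_times_h)
    intervals = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return mean(intervals)

# Illustrative data only: three failures give two intervals, T1 and T2.
failures = [0.0, 180.0, 400.0]       # hours on the plant clock
print(mtbf_from_failures(failures))  # (180 + 220) / 2
```

With these made-up timestamps, T1 is 180 hours and T2 is 220 hours, so the result is 200 hours.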
1. Can we establish periodicity of a maintenance plan based on MTBF time?
2. Can we calculate failure rates based on MTBF?
3. Can we calculate probability of failure based on MTBF?
4. If the MTBF of an asset or system is 200 hours, after that time will it fail?
Let’s answer each of these questions separately.
The MTBF is an average calculated from a set of values. These values can be grouped into a histogram to generate a data distribution whose mean is the MTBF. Imagine that this distribution follows the Gaussian law, and we have a normal curve that was modeled based on the failure data. The chart below shows that the MTBF is positioned in the middle of the curve.

In a modeled PDF (probability density function) curve of this kind, the mean value, or MTBF, is reached after 50 percent of the failures have occurred. If we implement the preventive plan with an interval equal to the MTBF, the equipment will already have a 50 percent probability of having failed. Therefore, the MTBF is not a number that indicates the optimal time for a scheduled intervention.
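As a rough check of the 50 percent figure, the sketch below uses Python's standard `statistics.NormalDist`. The mean of 200 hours and the standard deviation of 40 hours are assumptions chosen only for illustration, as is the 10 percent target. None of these values comes from real failure data.

```python
from statistics import NormalDist

# Assumed, illustrative parameters: MTBF (mean) of 200 h, standard deviation of 40 h.
failure_model = NormalDist(mu=200.0, sigma=40.0)

# Probability that a failure has already occurred by the MTBF.
p_fail_at_mtbf = failure_model.cdf(200.0)

# Time by which only 10 percent of failures are expected (hypothetical target).
t_10_percent = failure_model.inv_cdf(0.10)

print(f"P(failure by MTBF) = {p_fail_at_mtbf:.2f}")
print(f"10% of failures expected by t = {t_10_percent:.1f} h")
```

Under these assumptions, an interval tied to a chosen probability of failure falls well before the MTBF. The exact value depends entirely on the spread of the fitted curve.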
When failure data are modeled to calculate the MTBF, only the exponential distribution has a fixed failure rate, and that rate is the inverse of the MTBF:
MTBF = 1 / λ
In this distribution, the MTBF time already corresponds to 63.2 percent probability of failure.
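The 63.2 percent figure can be checked empirically with a small simulated life test. This is a minimal sketch that assumes exponentially distributed lifetimes. The function name, the sample size, the seed, and the 200-hour MTBF are all illustrative choices.

```python
import random

def fraction_failed_by_mtbf(mtbf_h, n_units=100_000, seed=42):
    """Simulate exponential lifetimes and return the share that fail by the MTBF."""
    rng = random.Random(seed)
    failure_rate = 1.0 / mtbf_h          # lambda = 1 / MTBF
    lifetimes = (rng.expovariate(failure_rate) for _ in range(n_units))
    failed = sum(1 for t in lifetimes if t <= mtbf_h)
    return failed / n_units

print(fraction_failed_by_mtbf(200.0))   # close to 1 - exp(-1), about 0.632
```

The share stays near 0.632 whatever MTBF is passed in, because the ratio t / MTBF equals one at that point.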
Figure 3: 
With any modeling other than exponential, the failure rate will be variable and time-dependent, so its calculation will also depend on factors such as the probability density function f(t) and the reliability function R(t):
λ(t) = h(t) = f(t) / R(t)
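To illustrate a time-dependent failure rate, the sketch below evaluates h(t) = f(t) / R(t) for a normal failure model, taking R(t) = 1 - F(t). The mean of 200 hours and the standard deviation of 40 hours are assumed values for illustration, and `hazard_rate` is a hypothetical helper name.

```python
from statistics import NormalDist

def hazard_rate(model, t):
    """Failure rate h(t) = f(t) / R(t), with R(t) = 1 - F(t)."""
    reliability = 1.0 - model.cdf(t)
    return model.pdf(t) / reliability

# Assumed, illustrative model: normal failure times, mean 200 h, std dev 40 h.
model = NormalDist(mu=200.0, sigma=40.0)

for t in (100.0, 200.0, 300.0):
    print(f"t = {t:5.0f} h   h(t) = {hazard_rate(model, t):.5f} failures/h")
```

For this model the failure rate grows with time, so no single constant value can stand in for it.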
Although the exponential distribution, which assumes a constant failure rate over time, is the most widely adopted in reliability projects, most assets show variations along their “bathtub curve,” as exemplified by Moubray:

This means that the exponential distribution is not the best suited to reflect the behavior of most assets in an industrial plant.
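As seen above, only in the exponential distribution is there a constant failure rate that can be calculated as the inverse of the MTBF. In this case, yes, we can calculate the probability of failure of an asset from the probability density function below:
f(t) = λ·exp(−λt)
Integrating it from 0 to t gives the cumulative probability of failure, F(t) = 1 − exp(−λt), which equals 63.2 percent at t = MTBF.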
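For the exponential case only, the probability of failure by a given time can be computed directly from the MTBF. The sketch below uses the standard cumulative form F(t) = 1 − exp(−t / MTBF) and its inverse. The function names and the 10 percent target are hypothetical, and the 200-hour MTBF is borrowed from question 4.

```python
import math

def prob_failure_by(t_h, mtbf_h):
    """Cumulative probability of failure F(t) = 1 - exp(-t / MTBF)."""
    return 1.0 - math.exp(-t_h / mtbf_h)

def time_to_reach(prob, mtbf_h):
    """Invert F(t): the time by which a share `prob` of units has failed."""
    if not 0.0 <= prob < 1.0:
        raise ValueError("prob must be in [0, 1)")
    return -mtbf_h * math.log(1.0 - prob)

MTBF_H = 200.0  # assumed value, taken from question 4 of the article

for t in (50.0, 100.0, 200.0):
    print(f"F({t:.0f} h) = {prob_failure_by(t, MTBF_H):.3f}")
print(f"10% failed by t = {time_to_reach(0.10, MTBF_H):.1f} h")
```

These numbers hold only if the constant failure rate assumption is valid for the asset in question.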
For other models where the failure rate depends on the time, it is only possible to calculate the probability of failure through data modeling and determination of a parametric statistical curve.
What exactly does an MTBF of 200 hours mean? It was shown that the MTBF should not be used as a maintenance plan frequency. According to the concepts explained above, this time means nothing if it is not compared with its own history over the months. If the parametric model governing the behavior of the assets has not been determined in a reliability study, the time of 200 hours has no meaning as a probability of failure. The case of an MTBF provided by an equipment manufacturer is different: through life tests it's possible to determine exponential curves and thus calculate the time by which 63.2 percent of the sample will have failed.
I hope this discussion has helped us to reflect on the definitions of an indicator that is both so widely used and so misunderstood within industrial maintenance management.