In part one of this article, the technique of root cause analysis (RCA) was explained using simple examples. Part two contains a detailed list of critical success factors to get maximum results from your RCA. These are based on lessons learned from experience, including mistakes made. Because RCA and other quality techniques are more common in manufacturing, several of the examples in this article are drawn from service industries to offer a broader perspective. The article is equally relevant in manufacturing and service businesses as well as government and nonprofit organizations.
The technique of RCA is summarized in figure 1 below. We will continue to refer to it in this article as we look at the various critical success factors and lessons learned in RCA.
Years of experience applying root cause analysis (RCA) to dozens of business problems taught us many lessons. We also made quite a few mistakes. Hopefully, we have learned from these, too. Listed below are what we found to be critical success factors for RCA to be effective. I have also described our mistakes so that you don’t make them (or at least so that you can have a good laugh at our expense).
In part one, we saw how the RCA process almost stopped at several stages before getting to the actual root cause—once after spraying the air freshener, then after the lizard was found and removed, and again after fixing a hole in the air-conditioning duct. At none of these stages had we actually reached the root cause, which was the absence of preventive maintenance. We were merely attacking the symptoms instead of getting to the root cause and eliminating it. This happened in several business problems as well. We sometimes found ourselves almost unwilling to get to the bottom of a problem. In some cases, we were fortunate to have someone like Dev who gently pushed us along. In others, we learned the hard way: The defect kept recurring, or customers continued to complain, forcing us to eventually identify the root cause and eliminate it.
Time and again, we found that despite doing the RCA and identifying opportunities for improving the process, there was resistance to actually making the process change—or introducing a new process where none existed, as in the lizard example. We were happy to take one-time actions like removing the lizard or even fixing the hole in the duct, but committing to the discipline of following a process repeatedly in the future seemed daunting. Or was it the accountability the process would bring that we were secretly afraid of? Unfortunately, as we learned, the improvement could never be permanent unless the prevention part of RCA is institutionalized. And any institutionalization means going back to the process. In other words, we learned that RCA without process improvement is meaningless.
In fact, we realized from applying RCA in various businesses that there are only two choices for how to eliminate the root cause of any problem—process or people (or often both). By process, we mean an absent or faulty process. By people, we mean that the process exists on paper, but it was not followed. This could be due to lack of either training or discipline. In manufacturing industries, the cause could also be defective machines, parts, or raw materials, but these can usually be traced back to either a process or people issue.
Once the correct root cause is identified, the solution is to introduce a process if none exists, improve the process if it is faulty, or train people to follow the process. Wherever possible, poka-yoke or mistake-proofing, sometimes with the help of technology, can be used to minimize the chance of human error. A familiar example of mistake-proofing in computer data entry is the use of an online calendar to select a date instead of actually typing in a date, which significantly reduces the chances of entering it incorrectly.
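The calendar example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `parse_typed_date` and `selectable_days` are hypothetical, and a real data-entry screen would use whatever date-picker widget its platform provides. The point is simply that typed entry depends on catching a bad value after the fact, whereas picker-style entry never offers the bad value at all.

```python
import calendar
from datetime import date, datetime


def parse_typed_date(text: str) -> date:
    """Free-text entry: anything can be typed, so errors surface only at parse time."""
    return datetime.strptime(text, "%Y-%m-%d").date()


def selectable_days(year: int, month: int) -> list:
    """Calendar-style entry: offer only the dates that actually exist in the month."""
    _, days_in_month = calendar.monthrange(year, month)
    return [date(year, month, day) for day in range(1, days_in_month + 1)]


# Typed entry lets an impossible date reach the parser, which must then reject it.
try:
    parse_typed_date("2023-02-30")
    typed_entry_rejected = False
except ValueError:
    typed_entry_rejected = True

# Picker-style entry never offers the impossible date in the first place.
february_2023 = selectable_days(2023, 2)
print(typed_entry_rejected)                  # True
print(len(february_2023))                    # 28
print(date(2023, 2, 28) in february_2023)    # True
```

Constraining the choices moves the safeguard from detection to prevention, which is the same shift RCA aims for in the process itself.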
One of the biggest hurdles in RCA is the tendency to mistake the question “Who?” (as in whodunit) for “Why?” We often found people trying to assign blame for the problem rather than getting to the root cause. We had to repeatedly tell people, and prove to them, that the company was not interested in knowing whose fault it was but only in preventing the problem from happening again. This has a lot to do with organizational culture. It is senior management’s responsibility to make everybody comfortable with discussing defects and complaints without getting into a blame game.
Companies like Toyota actually encourage employees to make quality problems visible. They even empower workers to stop production, if necessary, as soon as they detect a defect so that they can fix the problem and perform RCA to determine preventive steps.
Treating a cause as beyond our control is another common roadblock to RCA. The moment the “why-why” trail led us outside our own department or company, we tended to drop it, saying (with a sense of relief), “That’s beyond our control; there’s nothing we can do.” Almost every business problem we encountered involved more than one department within our company, and often external entities such as vendors, distributors, and partners to whom certain types of work were outsourced.
After months of losing improvement opportunities on the pretext of “third-party dependency,” somebody plucked up the courage to ask, “So what?” Why couldn’t we get the relevant “outsiders” to work with us in identifying the root cause and reducing defects? We laughed at the person who suggested this, but we let her go ahead and try it anyway. (We couldn’t wait to see her get egg on her face.)
The first time this was tried was with an external company to which some key processes were outsourced. In the past, most customer complaints were blamed, conveniently, on this third party. To everybody’s surprise, at our request this external partner was happy to work with us in reducing defects and customer complaints. We knew the company was serious when it appointed three of its best people, who were experienced with the process it was running for us, to become part of the RCA team. Both companies found that not only did working together help reduce and prevent several types of customer complaints, but the process also brought productivity improvements and cost savings. These benefits were shared by both companies, increasing profits for each.
The best time to do RCA and eliminate a defect is as soon as it is first detected. However, defects often don’t get the attention they deserve until customers start complaining. In some cases, we almost refused to accept the existence of a defect until customers forced us to. Like Raj in part one of this article, even after complaints started to surface, we would sometimes respond, “But nobody’s complained before.” We sometimes needed a flood of complaints to actually start paying heed to a defect. In the process, we lost time, money, and at times, customers.
A common response we get once a root cause and solution are identified is, “But that’s the way it’s always been done,” or “That’s the industry practice.” The preventive action to eliminate the root cause of most problems is a process or behavior change, and as with any other change, there is often resistance. We learned from experience that the best way to manage this change is to make sure that the people who actually do the process are involved in the RCA. This way, they are part of the solution rather than the problem.
Have the right attitude. Use the template. Ask the customer. Observe the process.

In some cases where RCA was being done, we would discover that we had lost our way or reached a dead end at one of the whys. There would be no apparent further answer to why, and neither would the point we had reached look like a root cause with a clear preventive action.
In one instance, an insurance company was getting a number of customer complaints about errors on policy documents. The company decided to find the root cause to help reduce the errors. The RCA team’s answer to the first why was that some of the application forms submitted by customers contained errors in the first place, and these errors were carried onto the policy document. The answer to the next why, “Why were there errors on application forms?”, was that customers completed the forms in a hurry. (Who likes to fill in forms?) Most of the team members felt there was nothing we could do about customer behavior. Some of them said that this RCA was getting us nowhere and that errors on some insurance policy documents were a “normal industry occurrence” that we had to live with.
Fortunately, one team member said, “Wait a minute! Is there really nothing we can do? Is the customer being in a hurry really the reason for errors? After all, the customer is more interested than anybody else in getting an error-free insurance document, so why would so many of them be hasty and put wrong information on the application form? Why don’t we speak to a few customers who complained and find out what really happened?”
When we did this, we realized that we had gotten the answer to the first why wrong. Customers told us that they had never filled in the application in the first place. The company’s agents who sold them the insurance had asked them to just sign the blank application and had collected certain mandatory documents for proof of age and address. The agents had told the customers that they would fill in the details later at the company’s office, to save customers the trouble. Because the agent was usually known to the customers, most of them agreed to this.
The RCA team then decided to observe what the agents did next. Several agents were hurriedly filling out the application forms on behalf of the customers. They would rely on their own knowledge or the documents given by the customer, some of which were barely decipherable photocopies. An agent’s priority, of course, was to hand over the application to the company and collect the commission.
The RCA team realized its folly. It had lost its way by answering the first why incorrectly. The correct reason for the errors was not that the customer was completing the forms in a hurry, but that the customer was not completing the form at all; it was the agent who did this, based on secondhand knowledge.
Once the real cause was identified, the RCA team realized that there was something the company could do about it. The preventive action was to make it mandatory for agents to get the form completed by the customer, and to educate all agents about this requirement. The company saw a drastic reduction in the errors and complaints when this was done.
We learned from this example that it is important for the RCA team to have the right attitude of prevention, not blame. To arrive at the right answers to each why, RCA team members must never forget that they are trying to identify what can be done to prevent the problem. Blaming someone outside the “circle of influence” may be convenient but will usually lead to a dead end, as it did in this example. An easy way to avoid this is to pause for a moment after answering each why and ask, “Does this really explain why the previous step happened, and is it leading us closer to finding out how to prevent this problem?” Do this before asking the next why. Using a simple template for RCA (see figure 1) helps a great deal in staying on the right path.
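The pause-and-check habit can also be written down as a small data structure. The sketch below is not the template from figure 1; it is a hypothetical Python illustration, and the names `WhyStep`, `RcaTemplate`, and `first_doubtful_step` are assumptions made for this example. Each answer carries the team's own judgment on the two check questions, and the helper reports the first step where either check fails. The sample entries paraphrase the insurance case as the team first recorded it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class WhyStep:
    question: str
    answer: str
    explains_previous: bool  # Does this really explain why the previous step happened?
    aids_prevention: bool    # Is it leading us closer to preventing the problem?


@dataclass
class RcaTemplate:
    problem: str
    steps: List[WhyStep] = field(default_factory=list)

    def add_why(self, question: str, answer: str,
                explains_previous: bool, aids_prevention: bool) -> None:
        """Record one why together with the team's answers to the two check questions."""
        self.steps.append(WhyStep(question, answer, explains_previous, aids_prevention))

    def first_doubtful_step(self) -> Optional[int]:
        """Return the 1-based number of the first step failing either check, else None."""
        for number, step in enumerate(self.steps, start=1):
            if not (step.explains_previous and step.aids_prevention):
                return number
        return None


# Illustrative entries only, paraphrasing the team's first attempt.
rca = RcaTemplate(problem="Errors on policy documents")
rca.add_why("Why were there errors on policy documents?",
            "The application forms contained errors.",
            explains_previous=True, aids_prevention=True)
rca.add_why("Why were there errors on application forms?",
            "Customers completed the forms in a hurry.",
            explains_previous=True, aids_prevention=False)

print(rca.first_doubtful_step())  # 2
```

A flagged step is a prompt to go back and verify the answer, for example by asking the complainant or observing the process, rather than a reason to abandon the analysis.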
We also learned from this experience that two easy ways of arriving at a root cause, or at least remaining on the right path, are to ask the complainant for more details and to observe the process as it actually happens. Some of us were at first afraid of speaking to customers who had complained, but we found after talking to several of them that most appreciated that the company was taking the complaint seriously and involving the customer in the prevention efforts. Several customers said that the company’s efforts to eliminate defects at the root gave them more confidence in the company despite the recent bad experience, and they promised more business in the future.
We’ll look at more critical success factors and lessons learned in part three of this article.