Brian Hughes

Quality Insider

To Root or Not To Root: Part 1

A case study

Published: Thursday, November 6, 2008 - 20:16

Jodi Ullman glanced at her watch: 10:25 a.m. She pushed back from her desk, stood up, and stretched, after first peeking out the door of her office toward the cubicle maze to make sure no one was watching. She’d been staring at the screen of her laptop for two hours, poring over the latest qualification test data for a new component. As the quality director for Kulshan Industries, a midsized aerospace manufacturer, she had been spending hundreds of hours working toward the rollout of their latest product: a special electronic control system for a new type of unmanned aircraft. The system was formally named “Natural Instrumentation True Matching Response.” Internally, it was referred to by its acronym, NITMAR. However, in private, the project had begun to be referred to as “Nightmare.” After the initial elation at winning the multimillion-dollar bid wore off, the realities of actually designing, building, testing, and integrating the system began to set in. “Nightmare” was truly a more accurate moniker.

The NITMAR project was a risk for Kulshan. The founder had started the company out of a small machine shop back in the 1940s as a war-time supplier of precision aircraft instrumentation. The rapid growth of commercial aerospace in the 1950s coupled with dogged determination by the founder transformed Kulshan into a diverse component manufacturer known for its simplicity, quality, and reliability.

The aerospace business changed rapidly in the 1990s due to evolving commercial and military requirements. Smaller companies had been predominantly replaced by offshore alternatives or absorbed by larger entities through acquisition. Pressure to increase quality, decrease price, and shorten lead times forced companies to make radical changes. Kulshan had survived these changes so far, but this was a new world for them. They now had to design and build entire electronic systems—not just high-quality components. While the Kulshan brand remained robust in the market, everything was on the line—they had to prove that they could evolve to satisfy the demands of their changing market.

As Jodi walked to the break room for a new cup of coffee, she was intercepted by Jerry Smith, a senior quality engineer and one of her direct reports. Jerry was agitated. He quickly told her about a new problem that had been discovered during testing of a NITMAR component. A resistor on the main control board had failed inspection that morning. Initial observations showed that the solder on the resistor lead had likely been contaminated, which caused the solder to create an imperfect connection between the resistor lead and the circuit board. Resistors and other components are hand-soldered into place by an electronics specialist. Contamination of the solder (externally by dust or grease, or internally by impurities in the solder itself) creates an incomplete bond, which could impair the electrical connection. Jerry wanted a complete root cause analysis (RCA) run on the contaminated solder. He was insistent about how lucky they were to find this particular problem before the board was assembled into the rest of the system. Once assembled, only a failure of the system upon testing or in service would have brought the problem to light.

Six months prior, Kulshan had hired a provider of RCA training to train 50 employees, including 10 who completed an additional advanced course. Jodi actually had prior experience with two advanced root cause analysis methods, as well as with the remedial 5 Whys and fishbone diagrams. She liked root cause analysis and was open-minded to Jerry’s request. Technically, he was correct; they should do a full RCA to ensure they didn’t miss anything. However, she had been through too many boardroom arguments about time, schedule, and margin. Everyone on the Kulshan leadership team vocally supported the quality program—including running an RCA—but sometimes had difficulties reconciling the additional costs with the “potential” benefits of finding solutions that prevent recurrence.

When do you account for these additional costs? When the sales team puts together a competitive bid, it’s understandable that they don’t allocate a line in the bid to accommodate anomalies like this. Who could expect to win a project when they plan for failure? Could Jodi support Jerry’s request for an RCA without affecting and upsetting a host of other areas, including operations, production scheduling, integration, and testing? What if the customer found out about the investigation, which might not present Kulshan in the most positive light, particularly when there seemed to be more serious problems with NITMAR?

Jerry estimated the RCA would take three full days and require five key people, with others called in periodically to provide input. That seemed like a lot for a simple case of contaminated solder, even to Jodi, but Jerry’s timing was perfect. He’d caught Jodi on the upswing in her coffee cycle and she was feeling optimistic. She gave him the go-ahead to conduct the RCA.

Two hours later, she received an e-mail from Jerry that he had distributed the meeting request for the RCA. On the “required to attend” list, he included another quality engineer, the technician who installed the resistor, a production manager, a design engineer, and two advanced-training RCA students who needed additional practical experience.

Within five minutes, Nancy Murray, the director of operations, was standing in Jodi’s office doorway. Nancy and Jodi got along most of the time, but there was constant tension between production and quality. Nancy believed in holding production-line employees personally accountable for errors, yet she rewarded successes generously. She was accustomed to this carrot-and-stick strategy, and she didn’t fully trust any process that didn’t include personal accountability. In this case, she and her staff believed that the electronics specialist had made an error in using the contaminated solder, because this brand of solder was known to have periodic contamination issues. A decision had been made three weeks earlier in the production meeting: employees were to be informed of the problem, and they were to watch for telltale signs of contamination until the remaining stock of solder was used up.

Nancy couldn’t afford to tie up her employees for three days for what she deemed an obviously cut-and-dried case. At most, couldn’t they just do a fishbone exercise and be done in an hour? What could possibly take three days? Jodi tried to explain that three days was just an estimate, and that a full investigation might uncover other issues, even significant ones. She also tried to explain that the fishbone method could be confusing and that it typically led to ineffective solutions (retrain the worker, counsel the worker, change the procedure, etc.), all steps she had heard a thousand times but that never seemed to make a sustainable difference.

Nancy was adamant that she couldn’t afford to have people away from their jobs for that amount of time. Jodi saw that the issue wasn’t going to be resolved with each of them entrenched in their positions. She told Nancy that they needed to take the issue to the vice president to decide how to proceed.

What decision should the vice president make? Is a full RCA required in this case or is it overkill given the business environment?

Leave your comments in the “comments” box below. There will be a follow-up article in a couple of weeks on how industry experts would have dealt with this issue.


About The Author

Brian Hughes

Brian Hughes is vice president of Apollo Associated Services, a provider of root cause analysis consulting, training, and software. Brian has led significant safety-related incident investigations, including those related to major explosions, chemical releases, consumer product contamination, and supply chain processes. Brian has helped clients achieve significant savings and improvements in safety, reliability, and quality. For more information, visit www.apollorca.com.


Not to root cause

In this case the fault was found on one small part, and it would be wrong to assume that all electrical contacts are at fault. For a simple issue like this, whose cause could be found with 5 Whys or a fishbone diagram, those tools are more appropriate than a full root cause analysis. If a trend aligned with this fault were appearing in the quality data, then yes, a more intensive root cause effort would be a valued step.

Root Cause Analysis & Spelling, Grammar, etc.

The answer/comment is relatively easy: Constant tension between Production & Quality? Ineffective Solutions? Boardroom arguments? "Potential" benefits of finding solutions that prevent recurrence? A Production meeting decided to use up the current stock of suspect solder? Etc. Serious problems with the whole Quality System and serious lack of buy-in. Need to look at the whole (Hole?) system of Quality/Production/Commitment/Leadership NOW! I think that Kulshan is fortunate that the "bad" solder joint was found now - 3+ days of investigation now will, hopefully, forestall considerable grief (and much greater costs) later in the process, including the negative perception by their customer when they start receiving defective assemblies.

Jodi may have been "peaking" (lead-in article) at 10:25am, but I think the full article has it right - she was probably peeking... "Council" the worker? Maybe that's also right - maybe the workers should be heading up some of the councils that make the key decisions for Kulshan?

Not absolutely sure of who Brian Hughes works for - Appollo or Appolo or Apollo. According to the Web, he apparently works for Apollo Associated Industries, LLC.


To Root or Not To Root

I believe there is a much bigger problem than the bad solder. My theory is that Management is responsible for not knowingly putting suspect product on the production floor. Too many errors of man and method are just waiting to happen in this type of situation. It seems to me that we have failure of the purchasing process, the production process and the monitoring and measurement processes. There is a lot of additional information that I would want, but as a manager, I already see process failures that are likely affecting other products or processes. I would certainly want the RCA to take place.

To Root or Not to Root

I agree as well that a full RCA is necessary, partly because you invested the money and resources in RCA training. Use it.
I also don't believe this would take anywhere near 3 days' worth of analysis. And given that the contaminated solder was allowed to stay in use, how do you tell the operator to "watch out" for bad solder? Get the solder off the floor immediately. Then go back and review the environment the soldering is done in and what method was given to check for bad solder.

To Root or Not To Root

In light of the other, more serious issues on this project, the costs associated with a full 3-day, 5-person team effort to perform root cause analysis may not be justified on this issue at this point. Having said that, as an executive, I would really want to know more details on how extensive this issue is before making any decision. I feel Jodi jumped the gun in authorizing the full-blown investigation without more detailed backup.

If I understood the details in the case study, the contamination issue has already had an analysis performed that determined the existing stock of solder was at fault. If this was a "critical application", use of the potentially contaminated solder should have been immediately discontinued at that point in time. Instead, a decision was made (assumed to be by competent persons knowing the impact risk) to continue use of the existing solder until depleted.

Regardless of the criticality of the application, I would definitely want to know why the company is still ordering a brand of solder that has periodic contamination issues.

David Thuillier
Quality Manager
OASYS Technology, LLC

To root or not to root - understand the problems & risks

Define the problem.

For example, ask how many times this has happened before. Then ask what hazards and risks it poses to reputation, commercial viability, legal requirements (statutory & regulatory), and people. Use an Enterprise Risk Management approach (e.g., AS/NZS 4360 or ISO/DIS 31000).

Affinity Diagrams, Interrelationship Digraph type tools are useful to sort out real issues to spend investigation dollars on. Management always has the option to Do Nothing, Do a Quick Fix, or Do a Full RCA "appropriate to the effects of the nonconformities encountered" as it says in ISO 9001/8.5.2. Once you understand the problem and the exposure, then get into problem solving tools such as PDPC or FMEA or the Deming tools.

The basic assumption is that no one comes to work to do a bad job, so an investigation needs to sort out where the system failed (not the operator). You need to understand what those involved understood at the time; see Prof. Sidney Dekker, "The Field Guide to Human Error," and Prof. James Reason, "Human Error."

And of course, there's the "WYMIWYG - YOGWYM" syndrome, "what you measure is what you get - you only get what you measure". If production is measured on units completed, completed units is what management will get; whether or not the units work is another parameter. Which is why any organisation that has any conflict at all between QA and production functions has totally lost the plot.

Hope this helps.

To Root or Not to Root

Yes, I agree that a full RCA is necessary, based on the critical operation of the component, the general risk of hand-soldered electrical connections, and the admission that the operators were "using up" known contaminated solder. This is a disaster waiting to happen.
The operators were told to "look out" for contaminated solder; what analysis tools were they given to assist them? What is the potential for other joint failures?
IMO the RCA should be conducted from purchasing to point of use.