This post is a guest entry written by Eric H., an electrical engineer living in Las Cruces, NM. Hey, that’s where I live too. If that weren’t coincidental enough, I make his dinner (nearly) every night and he does all the grocery shopping; a fair trade in my estimation.
Think of the last few times that you solved (or thought you solved) a production problem, only to have it rear its ugly head again. And again. Each time it crops up, you panic and pull team members together to come up with ideas for a quick fix to get the product out the door. Why didn’t the quick fix prevent the problem from cropping up again?
Root Cause Analysis is a tool for solving problems. Caution: there is a difference between “solving problems” and “applying quick fixes”! We tend to emphasize the latter because we are pressed for time even though, ironically, the former saves more time than it takes. Cognitive bias is playing a role here, so you need to learn to overcome your innate tendency to emphasize the immediate, the easy, the temporary. Properly done, Root Cause Analysis (RCA) addresses those cognitive biases, uncovers the underlying issues, and prevents problem recurrence.
- What is RCA?
- Why is it done?
- How is it done?
What is RCA?
Many different tools have been developed to try to solve problems systematically. A mass-production line such as Toyota’s might use Taiichi Ohno’s “5 Whys”, the Fishbone (Ishikawa) diagram, or Pareto analysis. Engineers may use Failure Mode and Effects Analysis (FMEA). Safety specialists would use accident analysis and risk management. These tools are not mutually exclusive; any of them can be combined.
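To make one of these tools concrete, here is a minimal sketch of a Pareto analysis in Python. The defect log, its cause labels, and the 80% cutoff are all hypothetical, made up for illustration; the idea is simply to tally problems by cause, sort from most to least frequent, and see which few causes account for most of the trouble.

```python
from collections import Counter

# Hypothetical defect log: one entry per rejected unit, labeled by cause.
# These causes and counts are invented for illustration only.
defect_log = (
    ["misaligned part"] * 42
    + ["wrong label"] * 7
    + ["loose connector"] * 18
    + ["scratched finish"] * 5
    + ["missing fastener"] * 28
)


def pareto_table(log):
    """Return (cause, count, cumulative_share) rows, most frequent cause first."""
    counts = Counter(log)
    total = sum(counts.values())
    rows = []
    running = 0
    for cause, count in counts.most_common():
        running += count
        rows.append((cause, count, running / total))
    return rows


def vital_few(log, threshold=0.8):
    """Return the top causes needed for the cumulative share to reach threshold."""
    selected = []
    for cause, _count, cumulative in pareto_table(log):
        selected.append(cause)
        if cumulative >= threshold:
            break
    return selected


for cause, count, cumulative in pareto_table(defect_log):
    print(f"{cause:20s} {count:3d}  {cumulative:6.1%}")
```

With these made-up numbers, three of the five causes cover more than 80% of the rejects, which suggests where to aim the investigation first. A tally like this only tells you where to look; it does not tell you why the problem happens.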
Why is it done?
When you are in the midst of dealing with a crisis, chances are that you are blinded or misled by a cognitive bias. Think again of that list of recurring problems: did you ever finally discover an underlying problem that, once solved, kept the recurring problem from coming back? Why didn’t you see that the first time? Look at this list of cognitive biases and check off all the factors that were playing against you.
The “recency effect” can be expressed as, “Generals always fight the last war”. The “clustering illusion” occurs when the observer believes they see a pattern that really does not exist. “Confirmation bias” is the tendency to pay attention to facts which support our theories and ignore those which don’t. The “déformation professionnelle”, or “professional bias”, causes one to interpret errors from the point of view of one’s own profession. “Groupthink”, a problem where the group feels its way to a popular but incorrect consensus, is thought to have played a part in the Challenger shuttle disaster.
There’s nothing wrong with people making these mistakes; these are common biases hardwired into our genetic makeup. RCA is a tool intended to systematically overcome bias.
A problem arises; you don’t have time for this right now. What do you do? You and your team quickly analyze the problem, spot the immediate cause, apply a band-aid, and move product out. You can’t afford the time to convene a formal investigation; time is money. However, suppose you were to objectively add up the time that you and everyone around you spend devising and implementing “quick” fixes, plus the lost materials and sales that result from the problems themselves. Compare that total with the time an RCA investigation actually takes, and you would probably be astounded at how much such investigations save you in the long run.
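The tally described above is simple arithmetic, and a short Python sketch shows its shape. Every figure here (hours, head counts, hourly rate, losses) is a hypothetical placeholder, and the sketch assumes that one RCA investigation fully stops the recurrence, which real life does not guarantee. Substitute your own numbers.

```python
import math


def cost_per_occurrence(hours_per_fix, people, hourly_rate, loss_per_occurrence):
    """Labor for one quick fix plus the materials and sales lost to that occurrence."""
    return hours_per_fix * people * hourly_rate + loss_per_occurrence


def rca_cost(hours, people, hourly_rate):
    """Labor cost of one root cause investigation."""
    return hours * people * hourly_rate


def break_even_occurrences(rca_total, per_occurrence):
    """Smallest number of prevented recurrences at which the RCA has paid for itself."""
    return math.ceil(rca_total / per_occurrence)


# Hypothetical figures for illustration only.
per_occurrence = cost_per_occurrence(
    hours_per_fix=3, people=4, hourly_rate=40, loss_per_occurrence=500
)
quick_fix_total = 6 * per_occurrence  # the problem came back six times
rca_total = rca_cost(hours=16, people=4, hourly_rate=40)

print(f"Six rounds of quick fixes: ${quick_fix_total}")
print(f"One RCA investigation:     ${rca_total}")
print(f"RCA pays for itself after {break_even_occurrences(rca_total, per_occurrence)} prevented recurrences")
```

The useful number is the last one: it tells you how many times a problem has to come back before the investigation you "couldn't afford" would have been the cheaper choice.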
Quick fixes are known in management literature as “first-order problem solving”; root cause analysis and underlying source elimination is “second-order problem solving” (Kathleen has written about this here, here and here). Many supervisors and managers will agree with both of the following statements:
- I should spend more time on second-order problem solving: it is more productive. It eliminates the time thieves that eat into my day.
- I don’t have time for second-order problem solving because I am too busy trying to do first-order problem solving.
See? We recognize the benefit, but still can’t force ourselves to invest our time into high-yield activities like RCA (the result of “hyperbolic discounting” bias!).
A quick fix is always reactive: you wait for the problem to appear, then you decide how to fix it. RCA is proactive: its practitioners treat a failure as an opportunity to find everything that led to it and then fix all of those problems. For every real problem, you will find several lesser problems, and dozens of potential problems. In the safety world, for every accident you have, there are a dozen incidents, and a hundred near-hits. By treating a problem as an opportunity to exercise your RCA method, you may well solve many more problems than just the one, so the payoff can be many times greater than the cost.
In my next entry, I’ll explain how to use simple techniques to solve any problem. In the meantime, we thought it would be fun if some of you could provide examples describing recurring situations you’ve had (via comments, e-mails, or the forum). What fixes did you use that turned out to be band-aids, i.e. didn’t really solve the problem? Feel free to describe problems you are currently having …for the third time! It will be fun to use those (with permission, of course) as demonstrations of the RCA technique.