Root cause analysis
Root cause analysis (RCA) is a method of problem solving that tries to identify the root causes of faults or problems. A root cause is a cause that, once removed from the problem fault sequence, prevents the final undesirable event from recurring. A causal factor is a factor that affects an event's outcome but is not a root cause. Although removing a causal factor can benefit an outcome, it does not prevent recurrence with certainty.
RCA arose in the 1950s as a formal study following the introduction of Kepner-Tregoe Analysis, which had limitations in the highly complex arena of rocket design, development, and launch in the United States by the National Aeronautics and Space Administration (NASA). New methods of problem analysis developed by NASA included a high-level assessment practice called MORT (Management Oversight Risk Tree). MORT differed from RCA by assigning causes to common classes of shortcomings, which were summarized into a short list. These classes included work practice, procedures, management, fatigue, and time pressure, along with several others. For example, an aircraft accident could occur as a result of weather augmented by pressure to leave on time. Failure to observe weather precautions could indicate a management or training problem, while lack of any weather concern might indict work practices.
RCA practice solves problems by attempting to identify and correct the root causes of events, as opposed to simply addressing their symptoms. Focusing correction on root causes has the goal of preventing problem recurrence. RCFA (Root Cause Failure Analysis) recognizes that complete prevention of recurrence by one corrective action is not always possible. Conversely, there may be several effective measures (methods) that address the root causes of a problem. Thus, RCA is an iterative process and a tool of continuous improvement.
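The distinction between a root cause and a causal factor can be illustrated with a counterfactual test on a small cause-and-effect model. The sketch below is only an illustration under stated assumptions: the pump example, the `RULES` table, and the helper functions are hypothetical, every listed condition is assumed to be a contributor, and each effect is assumed to occur when any one of its alternative cause sets is fully present.

```python
# Hypothetical cause-and-effect model: each effect occurs when ANY one of
# its alternative cause sets is fully present (an OR of ANDs).
RULES = {
    "inspection_missed": [{"operator_overtime"}, {"no_lubrication_schedule"}],
    "bearing_runs_dry": [{"no_lubrication_schedule", "inspection_missed"}],
    "pump_seizes": [{"bearing_runs_dry"}],
}

# Conditions assumed to have been present before the event.
INITIAL_CONDITIONS = {"operator_overtime", "no_lubrication_schedule"}


def derive(conditions):
    """Forward-chain the rules until no new effect can be added."""
    facts = set(conditions)
    changed = True
    while changed:
        changed = False
        for effect, alternatives in RULES.items():
            if effect not in facts and any(alt <= facts for alt in alternatives):
                facts.add(effect)
                changed = True
    return facts


def classify(conditions, final_event):
    """Label each condition by whether removing it alone prevents the event."""
    labels = {}
    for condition in sorted(conditions):
        remaining = conditions - {condition}
        prevented = final_event not in derive(remaining)
        labels[condition] = "root cause" if prevented else "causal factor"
    return labels


labels = classify(INITIAL_CONDITIONS, "pump_seizes")
for condition, label in labels.items():
    print(f"{condition}: {label}")
```

In this toy model, removing the missing lubrication schedule interrupts the sequence, so it is labeled a root cause. Removing operator overtime does not, because the inspection is still missed for another reason, so it is labeled a causal factor.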
RCA is typically used as a reactive method of identifying the causes of events, revealing problems, and solving them; the analysis is done after an event has occurred. Insights from RCA may also make it useful as a preemptive method, in which case it can be used to forecast probable events before they occur. Although one follows the other, RCA is a completely separate process from incident management.
Root cause analysis is not a single, sharply defined methodology; there are many different tools, processes, and philosophies for performing RCA. However, several very broadly defined approaches or "schools" can be identified by their basic approach or field of origin: safety-based, production-based, process-based, failure-based, and systems-based.
- Safety-based RCA descends from the fields of accident analysis and occupational safety and health.
- Production-based RCA has its origins in the field of quality control for industrial manufacturing.
- Process-based RCA is basically a follow-on to production-based RCA, but with a scope that has been expanded to include business processes.
- Failure-based RCA is rooted in the practice of failure analysis as employed in engineering and maintenance.
- Systems-based RCA has emerged as an amalgamation of the preceding schools, along with ideas taken from fields such as change management, risk management, and systems analysis.
Despite the different approaches among the various schools of root cause analysis, there are some common principles. It is also possible to define several general processes for performing RCA.
General principles of root cause analysis
- The primary aim of root cause analysis is to identify the factors that resulted in the nature, magnitude, location, and timing of the harmful outcomes (consequences) of one or more past events. This shows which behaviors, actions, inactions, or conditions need to be changed to prevent recurrence of similar harmful outcomes, and which lessons can be learned to promote better consequences. ("Success" is defined as the near-certain prevention of recurrence.)
- To be effective, root cause analysis must be performed systematically, usually as part of an investigation, with conclusions and identified root causes backed up by documented evidence. Usually a team effort is required.
- There may be more than one root cause for an event or a problem; the difficult part is demonstrating the persistence and sustaining the effort required to determine them.
- The purpose of identifying all solutions to a problem is to prevent recurrence at the lowest cost and in the simplest way. If there are alternatives that are equally effective, then the simplest or lowest-cost approach is preferred.
- Root causes identified depend on the way in which the problem or event is defined. Effective problem statements and event descriptions (as failures, for example) are helpful, or even required.
- One logical way to trace root causes is to use a hierarchical clustering solution from data mining (such as GT data mining). A root cause is defined in that context as "the conditions that enable one or more causes". Root causes can be deduced from the upper groups that contain the groups exhibiting a specific cause.
- To be effective, the analysis should establish a sequence of events or timeline to understand the relationships between contributory (causal) factors, root cause(s), and the defined problem or event to be prevented in the future.
- Root cause analysis can help transform a reactive culture (that reacts to problems) into a forward-looking culture that solves problems before they occur or escalate. More importantly, it reduces the frequency of problems occurring over time within the environment where the root cause analysis process is used.
- Root cause analysis can be perceived as a threat in many cultures and environments, and threats to a culture often meet with resistance. Other forms of management support may be required for root cause analysis to be effective and successful. For example, a "non-punitive" policy toward problem identifiers may be required.
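The hierarchical-clustering principle in the list above can be sketched in code. This is a minimal illustration, not the GT data mining method itself: it assumes a cluster hierarchy has already been produced by some clustering tool, and the `Group` class, the incident records, and the attribute names are all hypothetical. The sketch picks the lowest-level groups whose records all exhibit a given cause, walks up to their upper groups, and reports the attribute values shared by every record in each upper group as candidate enabling conditions for experts to review.

```python
class Group:
    """A node in an already-computed cluster hierarchy (hypothetical structure)."""

    def __init__(self, name, records=None, children=None):
        self.name = name
        self.records = records or []    # only leaf groups hold records
        self.children = children or []  # upper groups hold sub-groups
        self.parent = None
        for child in self.children:
            child.parent = self

    def all_records(self):
        found = list(self.records)
        for child in self.children:
            found.extend(child.all_records())
        return found

    def leaves(self):
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


def record(causes, **attributes):
    return {"causes": set(causes), "attributes": attributes}


def groups_exhibiting(root, cause):
    """Leaf groups in which every record shows the given cause."""
    return [g for g in root.leaves()
            if g.records and all(cause in r["causes"] for r in g.records)]


def upper_groups(groups):
    """Distinct parents of the selected groups, in first-seen order."""
    seen, uppers = set(), []
    for g in groups:
        if g.parent is not None and id(g.parent) not in seen:
            seen.add(id(g.parent))
            uppers.append(g.parent)
    return uppers


def consistent_characteristics(group):
    """Attribute values shared by every record under the group."""
    records = group.all_records()
    shared = set(records[0]["attributes"].items())
    for r in records[1:]:
        shared &= set(r["attributes"].items())
    return dict(shared)


# Hypothetical incident data, already grouped by a clustering tool.
night_a = Group("night/line A", records=[
    record({"seal leak"}, shift="night", line="A", supplier="S1"),
    record({"seal leak"}, shift="night", line="A", supplier="S1"),
])
night_b = Group("night/line B", records=[
    record({"seal leak", "misalignment"}, shift="night", line="B", supplier="S1"),
])
day_a = Group("day/line A", records=[
    record({"misalignment"}, shift="day", line="A", supplier="S2"),
])
night = Group("night shift", children=[night_a, night_b])
day = Group("day shift", children=[day_a])
root = Group("all incidents", children=[night, day])

selected = groups_exhibiting(root, "seal leak")
for upper in upper_groups(selected):
    candidates = consistent_characteristics(upper)
    # Candidates are hypotheses only; experts still have to validate them.
    print(upper.name, sorted(candidates.items()))
```

With this sample data, the shared attributes of the "night shift" group (the shift and the supplier) come out as candidates. Whether either one actually enables the cause is a question for expert validation, not something the code can decide.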
General process for performing and documenting an RCA-based Corrective Action
RCA (in steps 3, 4 and 5) forms the most critical part of successful corrective action, because it directs the corrective action at the true root cause of the problem. Knowing the root cause is secondary to the goal of prevention, but without knowing the root cause, it is not possible to determine what an effective corrective action for the defined problem would be.
1. Define the problem or describe the event to be prevented in the future. Include the qualitative and quantitative attributes (properties) of the harmful outcomes. This usually includes specifying the nature, magnitude, location, and timing of events. In some cases, "lowering the risks of recurrence" may be a reasonable target. For example, "lowering the risks" of future automobile accidents may be a more reasonable target than "preventing all" future automobile accidents.
2. Gather data and evidence, classifying it along a timeline of events leading to the final failure or crisis. For every behavior, condition, action, and inaction, specify in the timeline what should have been done when it differs from what was done.
   - In data mining hierarchical clustering models, use the clustering groups instead of classifying: (a) pick the groups that exhibit the specific cause, (b) find their upper groups, (c) find group characteristics that are consistent, (d) check with experts and validate.
3. Ask "why" and identify the causes associated with each step in the sequence towards the defined problem or event. "Why" is taken to mean "What were the factors that directly resulted in the effect?"
4. Classify causes into causal factors, which relate to an event in the sequence, and root causes, which, if eliminated, can be agreed to have interrupted that step of the sequence chain.
5. Identify all other harmful factors that have an equal or better claim to be called "root causes." If there are multiple root causes, which is often the case, reveal those clearly for later optimum selection.
6. Identify corrective action(s) that will with certainty prevent recurrence of each harmful effect, including outcomes and factors. Check that each corrective action would, if implemented before the event, have reduced or prevented specific harmful effects.
7. Identify solutions that, when effective and with the consensus agreement of the group, prevent recurrence with reasonable certainty, are within the institution's control, meet its goals and objectives, and do not cause or introduce other new, unforeseen problems.
8. Implement the recommended root cause correction(s).
9. Ensure effectiveness by observing the implemented solutions.
10. Identify other methodologies for problem solving and problem avoidance that may be useful.
11. Identify and address the other instances of each harmful outcome and harmful factor.
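The evidence-gathering and solution-selection steps in the process above can be sketched in code. This is a minimal illustration under assumptions: the `TimelineEntry` and `Action` records, their field names, and the sample data are hypothetical, and the yes/no judgments on each action stand in for the consensus of the RCA team. The first function lists the points on the timeline where what was done differs from what should have been done; the second keeps only actions judged to prevent recurrence, to be within the institution's control, and to introduce no new problems, and then prefers the lowest-cost one.

```python
from dataclasses import dataclass


@dataclass
class TimelineEntry:
    time: str      # ISO-style timestamp, so string order equals time order
    actual: str    # what was done or observed
    expected: str  # what should have been done


@dataclass
class Action:
    name: str
    prevents_recurrence: bool      # team judgment: would it have prevented the event?
    within_control: bool           # can the institution implement it?
    introduces_new_problems: bool  # team judgment on side effects
    cost: float                    # relative cost in arbitrary units


def deviations(timeline):
    """Return entries, in time order, where actual differs from expected."""
    ordered = sorted(timeline, key=lambda e: e.time)
    return [e for e in ordered if e.actual != e.expected]


def select_action(candidates):
    """Keep acceptable actions and return the lowest-cost one, or None."""
    acceptable = [a for a in candidates
                  if a.prevents_recurrence and a.within_control
                  and not a.introduces_new_problems]
    return min(acceptable, key=lambda a: a.cost, default=None)


# Hypothetical sample data.
timeline = [
    TimelineEntry("2024-03-01T08:00", "pre-start checklist completed",
                  "pre-start checklist completed"),
    TimelineEntry("2024-03-01T09:30", "high-temperature alarm silenced",
                  "machine stopped on high-temperature alarm"),
    TimelineEntry("2024-03-01T08:15", "lubrication skipped",
                  "lubrication performed"),
]
candidates = [
    Action("retrain operators", False, True, False, 2.0),
    Action("add lubrication to the pre-start checklist", True, True, False, 1.0),
    Action("install automatic lubrication", True, True, False, 8.0),
    Action("require supplier redesign", True, False, False, 20.0),
]

for entry in deviations(timeline):
    print(f"{entry.time}: did '{entry.actual}', expected '{entry.expected}'")

chosen = select_action(candidates)
print("selected:", chosen.name if chosen else "no acceptable action")
```

Representing the team's judgments as plain booleans keeps the sketch short; in practice each judgment would be backed by the documented evidence gathered earlier in the process.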
Limitations of Root Cause Analysis
RCA is one of the most widely used methods for improving patient safety, but few data exist to support its effectiveness. The quality of RCA varies across facilities, and its effectiveness in lowering risk or improving medical safety has not been systematically established. The quality of an RCA depends on the accuracy of the input data and on the capability of the RCA team to use these data appropriately to create an action plan. In some cases, only one or a few sources of error are emphasized when in reality the situation might be more complex. The thoughts, conversations, and relationships of team members play an important role in determining the effectiveness of an RCA team. People tend to select and interpret data to support their prior opinions. An atmosphere of trust, openness, and honesty is critical to encourage members to share what they know without fear of being criticized or unacknowledged. In addition, RCA does not by itself determine the probability, criticality, and severity of events, which can be useful for prioritizing management and preventing future undesirable events. RCA can also be very time-consuming, because data gathering takes considerable time and the accuracy of the research is crucial. Organizations should ensure that adequate resources, time, and feedback are provided during the RCA process so that the team can carry out its task effectively.
See also
- Factor analysis
- Failure mode and effects analysis
- Fault tree analysis
- Forensic engineering
- Eight Disciplines Problem Solving
- Multiple regression and multivariate linear regression
- Orthogonal Defect Classification
- Barrier Analysis
- A3 Problem Solving