System accident

A system accident, or normal accident, is an "unanticipated interaction of multiple failures" in a complex system. This complexity can either be technological or organizational, and often has elements of both.[1] A system accident can be very easy to see in hindsight, but very difficult to see in foresight. Ahead of time, there are simply too many possible action pathways.

These accidents often resemble Rube Goldberg devices in the way that small errors of judgment, flaws in technology, and insignificant damages combine to form an emergent disaster. System accidents were described in 1984 by Charles Perrow, who termed them "normal accidents", as having such characteristics as interactive complexity, tight coupling, cascading failures, and opaqueness. James T. Reason extended this approach with human reliability[2] and the Swiss cheese model, now widely accepted in aviation safety and healthcare.

Once an enterprise passes a certain point in size, with many employees, specialization, backup systems, double-checking, detailed manuals, and formal communication, employees can all too easily resort to protocol, habit, and "being right." Rather like attempting to watch a complicated movie in a language one is unfamiliar with, the narrative thread of what is going on can be lost. Other phenomena, such as groupthink, can occur at the same time. Real-world accidents almost always have multiple causes. In particular, it is a mark of a dysfunctional organization to simply blame the last person who touched something.

In 2012 Charles Perrow wrote, "A normal accident is where everyone tries very hard to play safe, but unexpected interaction of two or more failures (because of interactive complexity), causes a cascade of failures (because of tight coupling)."[3]

There is an aspect of an animal devouring its own tail, in that more formality and effort to get it exactly right can make the situation worse.[4] For example, the more organizational rigmarole involved in adjusting to changing conditions, the more employees will delay in reporting the changing conditions. The more emphasis on formality, the less likely employees and managers will engage in real communication. New rules can actually make the situation worse, both by adding another layer of complexity and by reminding employees yet again that they are not to think but are just to follow the rules.

In a 1999 article primarily focusing on health care, J. Daniel Beckham wrote, "It is ironic how often tightly coupled devices designed to provide safety are themselves the causes of disasters. Studies of the early warning systems set up to signal missile attacks on North America found that the failure of the safety devices themselves caused the most serious danger: false indicators of an attack that could have easily triggered a retaliation. Accidents at both Chernobyl and Three Mile Island were set off by failed safety systems."[5]

Perhaps anticipating the concept of system accident, the Apollo 13 Review Board wrote in 1970, "It was found that the accident was not the result of a chance malfunction in a statistical sense, but rather resulted from an unusual combination of mistakes, coupled with a somewhat deficient and unforgiving design."[6]

Possible system accidents

Apollo 13 space flight, 1970

For more details on this topic, see Apollo 13.

From the Apollo 13 Review Board ("Cortright Report"):

e. Although Beech did not encounter any problem in detanking during acceptance tests, it was not possible to detank oxygen tank no. 2 using normal procedures at KSC. Tests and analyses indicate that this was due to gas leakage through the displaced fill tube assembly [emphasis added].

f. The special detanking procedures at KSC subjected the tank to an extended period of heater operation and pressure cycling. These procedures had not been used before [emphasis added], and the tank had not been qualified by test for the conditions experienced. However, the procedures did not violate the specifications which governed the operation of the heaters at KSC.

g. In reviewing these procedures before the flight, officials of NASA, NR, and Beech did not recognize the possibility of damage due to overheating. Many of these officials were not aware of the extended heater operation. In any event, adequate thermostatic switches might have been expected to protect the tank [emphasis added].[6]

Three Mile Island, 1979

For more details on this topic, see Three Mile Island accident.

The 1979 Three Mile Island accident inspired Perrow's book Normal Accidents, which describes how a nuclear accident can result from an unanticipated interaction of multiple failures in a complex system. TMI was an example of a normal accident because it was "unexpected, incomprehensible, uncontrollable and unavoidable".[7]

Perrow concluded that the failure at Three Mile Island was a consequence of the system's immense complexity. Such modern high-risk systems, he realized, were prone to failures however well they were managed. It was inevitable that they would eventually suffer what he termed a 'normal accident'. Therefore, he suggested, we might do better to contemplate a radical redesign, or if that was not possible, to abandon such technology entirely.[8]

When systems exhibit both "high complexity" and "tight coupling", as at Three Mile Island, the risk of failure becomes high. Worse still, according to Perrow, "the addition of more safety devices — the stock response to a previous failure — might further reduce the safety margins if it adds complexity".[8]
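
One simple way to see why an added safety device also adds complexity is to count the pairs of components that could, in principle, interact. The sketch below is only an illustration under stated assumptions: the component names are hypothetical, and counting pairs is a rough proxy for interactive complexity, not Perrow's own measure.

```python
from itertools import combinations


def pairwise_interactions(components):
    """Return every unordered pair of components that could, in principle, interact."""
    return list(combinations(components, 2))


# Hypothetical component names, chosen only for illustration.
base_system = ["pump", "valve", "sensor", "operator"]

# The "stock response" to a failure: bolt on one more safety device.
with_safety_device = base_system + ["relief_alarm"]

before = len(pairwise_interactions(base_system))
after = len(pairwise_interactions(with_safety_device))

# Each added component introduces one new potential interaction
# with every component that was already present.
print(before, after, after - before)
```

In this toy count, a single new device in a four-component system adds four new potential pairings, and the number of pairs grows roughly with the square of the number of components.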

ValuJet (AirTran) 592, Everglades, 1996

For more details on this topic, see ValuJet Flight 592.

Step 2. The unmarked cardboard boxes, stored for weeks on a parts rack, were taken over to SabreTech's shipping and receiving department and left on the floor in an area assigned to ValuJet property.

Step 3. Continental Airlines, a potential SabreTech customer, was planning an inspection of the facility, so a SabreTech shipping clerk was instructed to clean up the work place. He decided to send the oxygen generators to ValuJet's headquarters in Atlanta and labelled the boxes "aircraft parts". He had shipped ValuJet material to Atlanta before without formal approval. Furthermore, he misunderstood the green tags to indicate "unserviceable" or "out of service" and jumped to the conclusion that the generators were empty.

Step 4. The shipping clerk made up a load for the forward cargo hold of the five boxes plus two large main tires and a smaller nose tire. He instructed a co-worker to prepare a shipping ticket stating "oxygen canisters - empty". The co-worker wrote, "Oxy Canisters" followed by "Empty" in quotation marks. The tires were also listed.

Step 5. A day or two later the boxes were delivered to the ValuJet ramp agent for acceptance on Flight 592. The shipping ticket listing tires and oxygen canisters should have caught his attention but didn't. The canisters were then loaded against federal regulations, as ValuJet was not registered to transport hazardous materials. It is possible that, in the ramp agent's mind, the possibility of SabreTech workers sending him hazardous cargo was inconceivable.[9]


In a 1998 article in The Atlantic regarding the lack of healthy interplay between theory and practice, William Langewiesche wrote, "Such pretend realities extend even into the most self-consciously progressive large organizations, with their attempts to formalize informality, to deregulate the workplace, to share profits and responsibilities, to respect the integrity and initiative of the individual. The systems work in principle, and usually in practice as well, but the two may have little to do with each other. Paperwork floats free of the ground and obscures the murky workplaces where, in the confusion of real life, system accidents are born."[4]

Possible Future Applications of Concept

In an article entitled "The Human Factor", William Langewiesche discusses the 2009 crash of Air France 447 over the mid-Atlantic. He points out that since the 1980s, when the transition to automated cockpit systems in airliners began, the safety record has improved fivefold. Langewiesche writes, "In the privacy of the cockpit and beyond public view, pilots have been relegated to mundane roles as system managers." He cites engineer Earl Wiener, who added to the humorous statement attributed to the Duchess of Windsor that one can never be too rich or too thin, "or too careful about what you put into a digital flight-guidance system." Wiener says that the effect of automation is typically to reduce the workload when it is light, but to increase it when it is heavy. Boeing engineer Delmar Fadden said that once capacities are added to flight management systems, they become impossibly expensive to remove because of certification requirements. Yet if left unused, they may in a sense lurk in the depths unseen.[10]

Langewiesche cites industrial engineer Nadine Sarter who writes about "automation surprises," often related to modes the pilot does not fully understand or that the system switches into on its own. In fact, one of the more common questions asked in cockpits today is, "What’s it doing now?"[10]

A fivefold increase of safety is not going to be given up. Perhaps one can hope that systems will not talk down to pilots as if they're less intelligent and thereby jumble the complex and the simple? Perhaps what Earl Wiener talked about will be focused on, with the goal of streamlining the workload during times of high workload? Regardless of specifics, there does seem to be room for the more thoughtful approaches advocated by Wiener, Fadden, and Sarter.

References

Notes

  1. ^ Perrow, Charles (1984). Normal Accidents: Living with High-Risk Technologies, With a New Afterword and a Postscript on the Y2K Problem, Princeton, New Jersey: Princeton University Press, ISBN 0-691-00412-9, 1984, 1999 (first published by Basic Books 1984)
  2. ^ Reason, James (1990-10-26). Human Error. Cambridge University Press. ISBN 0-521-31419-4. 
  3. ^ GETTING TO CATASTROPHE: CONCENTRATIONS, COMPLEXITY AND COUPLING, Charles Perrow, The Montréal Review, December 2012.
  4. ^ a b Langewiesche, William (March 1998). The Lessons of Valujet 592, The Atlantic. See especially the last three paragraphs of this long article: “ . . . Understanding why might keep us from making the system even more complex, and therefore perhaps more dangerous, too.”
  5. ^ The Crash of ValuJet 592: Implications for Health Care, J. Daniel Beckham, Jan. '99. DOC file --> http://www.beckhamco.com/41articlescategory/054_crashofvalujet592.doc Mr. Beckham runs a health care consulting company, and this article is included on the company website.
  6. ^ a b REPORT OF APOLLO 13 REVIEW BOARD ("Cortright Report"), Chair Edgar M. Cortright, CHAPTER 5, FINDINGS, DETERMINATIONS, AND RECOMMENDATIONS.
  7. ^ Perrow, C. (1982), ‘The President’s Commission and the Normal Accident’, in Sils, D., Wolf, C. and Shelanski, V. (Eds), Accident at Three Mile Island: The Human Dimensions, Westview, Boulder, pp.173–184.
  8. ^ a b Pidgeon, Nick (22 September 2011). "In retrospect: Normal accidents". Nature. Vol. 477.
  9. ^ Stimpson, Brian (October 1998). "Operating Highly Complex and Hazardous Technological Systems Without Mistakes: The Wrong Lessons from ValuJet 592". Manitoba Professional Engineer. Archived from the original (reprint) on 2007-09-27. Retrieved 2008-03-06. 
  10. ^ a b The Human Factor, Vanity Fair, William Langewiesche, Sept. 17, 2014. One big problem with the automation was, "After Dubois arrived, the stall warning temporarily stopped, essentially because the angle of attack was so extreme that the system rejected the data as invalid. This led to a perverse reversal that lasted nearly to the impact: each time Bonin happened to lower the nose, rendering the angle of attack marginally less severe, the stall warning sounded again—a negative reinforcement that may have locked him into his pattern of pitching up, assuming he was hearing the stall warning at all."

Further reading