{{Short description|Unanticipated interaction of multiple failures in a complex system}}
{{Multiple issues|
{{overquote|date=July 2023}}
{{Tone|article|date=July 2023|reason=Style approaches a news or magazine article: '''"chatty"'''. Examples and views are not knitted together for coherent whole}}
}}


A '''system accident''' (or '''normal accident''') is an "unanticipated interaction of multiple failures" in a [[complex system]].<ref>Perrow (1999), p. 70</ref> This complexity can either be of technology or of human organizations, and is frequently both. A system accident can be easy to see in hindsight, but extremely difficult to see in foresight because there are simply too many action pathways to seriously consider all of them. [[Charles Perrow]] first developed these ideas in the mid-1980s.<ref>Perrow (1984)</ref>

Pilot and author [[William Langewiesche]] used Perrow's concept in his analysis of the factors at play in a 1996 aviation disaster. He wrote in ''The Atlantic'' in 1998: "the control and operation of some of the riskiest technologies require organizations so complex that serious failures are virtually guaranteed to occur."<ref name="langew atlantic"/>{{efn|In the same article, Langewiesche continued: [emphasis added]<ref name="langew atlantic"/> {{bq|1=Charles Perrow's thinking is more difficult for pilots like me to accept. Perrow came unintentionally to his theory about normal accidents after studying the failings of large organizations. His point is not that some technologies are riskier than others, which is obvious, but that ''the control and operation of some of the riskiest technologies require organizations so complex that serious failures are virtually guaranteed to occur''. Those failures will occasionally combine in unforeseeable ways, and if they induce further failures in an operating environment of tightly interrelated processes, the failures will spin out of control, defeating all interventions.|2=William Langewiesche (March 1998)|3="The Lessons of Valujet 592", p. 23 [Section: "A 'Normal Accident{{'}}"], ''The Atlantic''}} }}

Safety systems themselves are sometimes the added complexity which leads to this type of accident.<ref>Perrow (1999)</ref>


== Characteristics and overview ==


In 2012 Charles Perrow wrote, "A normal accident [system accident] is where everyone tries very hard to play safe, but unexpected interaction of two or more failures (because of interactive complexity), causes a cascade of failures (because of tight coupling)." Perrow uses the term ''normal accident'' to emphasize that, given the current level of technology, such accidents are highly likely over a number of years or decades.<ref>{{cite magazine |last1=Perrow |first1=Charles |title=Getting to Catastrophe: Concentrations, Complexity and Coupling |url=https://www.themontrealreview.com/2009/Normal-Accidents-Living-with-High-Risk-Technologies.php |magazine=The Montréal Review |date=December 2012}}</ref> [[James Reason]] extended this approach with [[human reliability]]<ref>{{cite book|title=Human Error|last= Reason| first= James|publisher=[[Cambridge University Press]]|date=1990-10-26|isbn= 0-521-31419-4}}</ref> and the [[Swiss Cheese Model|Swiss cheese model]], now widely accepted in [[aviation safety]] and healthcare.
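As a simple illustration of why such accidents become likely over years or decades (a back-of-the-envelope calculation under assumed figures, not one given by Perrow): if each year of operation is taken, independently, to carry a small probability <math>p</math> of an unanticipated combination of failures, then the probability of at least one such accident over <math>n</math> years is

:<math>P(\text{at least one accident in } n \text{ years}) = 1 - (1 - p)^n.</math>

Even with <math>p = 0.01</math> per year, this gives roughly 26% over 30 years and roughly 40% over 50 years, consistent with the claim that such accidents are highly likely over a number of years or decades.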


These accidents often resemble [[Rube Goldberg device]]s in the way that small errors of judgment, flaws in technology, and insignificant damages combine to form an [[emergence|emergent]] disaster. Langewiesche writes of "an entire pretend reality that includes unworkable chains of command, unlearnable training programs, unreadable manuals, and the fiction of regulations, checks, and controls."<ref name="langew atlantic"/> At times, more formality and effort to get things exactly right can actually make failure more likely.<ref name="langew atlantic">{{cite news |last1=Langewiesche |first1=William |title=The Lessons of ValuJet 592 |url=https://www.theatlantic.com/magazine/archive/1998/03/the-lessons-of-valujet-592/306534/ |work=The Atlantic |date=1 March 1998 |language=en}}</ref>{{efn|See especially the last three paragraphs of this 30-plus-page ''Atlantic'' article: "... Understanding why might keep us from making the system even more complex, and therefore perhaps more dangerous, too."<ref name="langew atlantic"/>}} For example, employees are more likely to delay reporting changes, problems, and unexpected conditions when the organizational procedures for adjusting to changing conditions are complex, difficult, or laborious.


A contrasting idea is that of the [[high reliability organization]].<ref>{{cite journal |last1=Christianson |first1=Marlys K |last2=Sutcliffe |first2=Kathleen M |last3=Miller |first3=Melissa A |last4=Iwashyna |first4=Theodore J |title=Becoming a high reliability organization |journal=Critical Care |date=2011 |volume=15 |issue=6 |pages=314 |doi=10.1186/cc10360 |pmid=22188677 |url=https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3388695/ |pmc=3388695}}</ref> In multiple publications, [[Scott Sagan]], for example, has assessed the reliability and vulnerabilities of complex systems, especially regarding nuclear weapons. His book ''The Limits of Safety'' (1993) provided an extensive review of close calls during the [[Cold War]] that could have resulted in a nuclear war by accident.<ref>{{cite book|last1=Sagan|first1=Scott D.|title=The Limits of Safety: Organizations, Accidents, and Nuclear Weapons|date=1993|publisher=Princeton University Press|isbn=0-691-02101-5}}</ref>

==System accident examples==
{{Multiple issues|section=yes|
{{Over-quotation|date=June 2011}}
{{synthesis|date=June 2011}}}}
<!-- Regarding the issue of whether or not this article contains unpublished synthesis of previously published material . . .
Charles Perrow directly states that both Apollo 13 and Three Mile Island were 'normal accidents' (this seems to be his preferred phrase, and I just don't remember how often he also uses the phrase 'system accident'). William Langewiesche directly states that the crash of Valujet 592 was a system accident.
It's true that the excellent quote from the REPORT OF APOLLO 13 REVIEW BOARD ("Cortright Report") does not use the phrase 'system accident,' but they come about as close as they can: "not the result of a chance malfunction in a statistical sense, but rather resulted from an unusual combination of mistakes, coupled with a somewhat deficient and unforgiving design." And the phrase 'system accident' may not have been in use in 1970 when they issued their report within months of the near disaster.-->


===Apollo 13===
{{Further|Apollo 13#Investigation and response}}

The Apollo 13 Review Board stated in the introduction to chapter five of their report: [emphasis added]<ref name="Apollo13 rb">[http://nssdc.gsfc.nasa.gov/planetary/lunar/apollo_13_review_board.txt "Chapter 5. Findings, Determinations, and Recommendations".] ''Report of Apollo 13 Review Board'' ("Cortright Report"), chaired by Edgar M. Cortright.</ref>
{{poem quote|text=... It was found that the accident was not the result of a chance malfunction in a statistical sense, but rather ''resulted from an unusual combination of mistakes, coupled with a somewhat deficient and unforgiving design''...
* (g): In reviewing these procedures before the flight, officials of NASA, ER, and Beech did not recognize the possibility of damage due to overheating. Many of these officials were not aware of the extended heater operation. In any event, adequate thermostatic switches might have been expected to protect the tank.}}

===Three Mile Island accident===
{{Further|Three Mile Island accident}}
Perrow considered the Three Mile Island accident ''normal'':<ref>{{cite book |last1=Perrow |first1=Charles |editor1=David L. Sills |editor2=C. P. Wolf |editor3=Vivien B. Shelanski |title=Accident at Three Mile Island : The human dimensions |date=1982 |publisher=Westview Press |location=Boulder, Colorado, U.S |isbn=978-0-86531-165-7 |pages=173–184 |url=https://archive.org/details/accidentatthreem0000unse/page/172/mode/2up |chapter=16. The President's Commission and the Normal Accident |quote-page=173}}</ref>
{{bq|It resembled other accidents in nuclear plants and in other high risk, complex and highly interdependent operator-machine systems; none of the accidents were caused by management or operator ineptness or by poor government regulation, though these characteristics existed and should have been expected. I maintained that the accident was normal, because in complex systems there are bound to be multiple faults that cannot be avoided by planning and that operators cannot immediately comprehend.}}


===ValuJet Flight 592===
{{Further|ValuJet Flight 592}}

On May 11, 1996, [[ValuJet Flight 592]], a regularly scheduled ValuJet Airlines flight from Miami International to Hartsfield–Jackson Atlanta, crashed about 10 minutes after taking off as a result of a fire in the cargo compartment caused by improperly stored and labeled hazardous cargo. All 110 people on board died. The airline had a poor safety record before the crash. The accident brought widespread attention to the airline's management problems, including inadequate training of employees in the proper handling of hazardous materials. The maintenance manual for the MD-80 aircraft documented the necessary procedures and was "correct" in a sense. However, it was so huge that it was neither helpful nor informative.<ref name="langew atlantic"/>


===Financial crises and investment losses===

In a 2014 monograph, economist Alan Blinder stated that complicated financial instruments made it hard for potential investors to judge whether the price was reasonable. In a section entitled "Lesson # 6: Excessive complexity is not just anti-competitive, it's dangerous", he further stated, "But the greater hazard may come from opacity. When investors don't understand the risks that inhere in the securities they buy (examples: the mezzanine tranche of a [[CDO-Squared]]; a [[credit default swap|CDS]] on a [[synthetic CDO]]{{nbsp}}...), big mistakes can be made–especially if rating agencies tell you they are triple-A, to wit, safe enough for grandma. When the crash comes, losses may therefore be much larger than investors dreamed imaginable. Markets may dry up as no one knows what these securities are really worth. Panic may set in. Thus complexity ''per se'' is a source of risk."<ref>{{cite journal |last1=Blinder |first1=Alan S. |title=What Did We Learn from the Financial Crisis, the Great Recession, and the Pathetic Recovery? |journal=Griswold Center for Economic Policy Studies Working Papers |date=November 2014 |id=No. 243 |url=https://www.princeton.edu/~ceps/workingpapers/243blinder.pdf |publisher=Princeton University |quote-page=10}}</ref>

==Continuing challenges==
===Air transport safety===
Despite a significant increase in airplane safety since the 1980s, there is concern that automated flight systems have become so complex that they both add to the risks that arise from overcomplication and are incomprehensible to the crews who must work with them. As an example, professionals in the aviation industry note that such systems sometimes switch or engage on their own; the crew in the cockpit are not necessarily privy to the rationale for the auto-engagement, which causes perplexity. Langewiesche cites industrial engineer [[Nadine Sarter]], who writes about "automation surprises," often related to system modes the pilot does not fully understand or that the system switches to on its own. In fact, one of the more common questions asked in cockpits today is, "What's it doing now?" In response to this, Langewiesche points to the fivefold increase in aviation safety and writes, "No one can rationally advocate a return to the glamour of the past."<ref name=Langewiesche-2014/>


In an article entitled "The Human Factor", Langewiesche discusses the 2009 crash of [[Air France Flight 447]] over the mid-Atlantic. He points out that, since the 1980s when the transition to automated cockpit systems began, safety has improved fivefold. Langewiesche writes, "In the privacy of the cockpit and beyond public view, pilots have been relegated to mundane roles as system managers." He quotes engineer Earl Wiener, who takes the humorous statement attributed to the Duchess of Windsor that one can never be too rich or too thin and adds, "or too careful about what you put into a digital flight-guidance system." Wiener says that the effect of automation is typically to reduce the workload when it is light, but to increase it when it is heavy.

Boeing engineer Delmar Fadden said that once capacities are added to flight management systems, they become impossibly expensive to remove because of certification requirements; if unused, they may in a sense lurk in the depths unseen.<ref name=Langewiesche-2014>[https://www.vanityfair.com/news/business/2014/10/air-france-flight-447-crash The Human Factor], ''Vanity Fair'', William Langewiesche, September 17, 2014. "... pilots have been relegated to mundane roles as system managers{{nbsp}}... Since the 1980s, when the shift began, the safety record has improved fivefold, to the current one fatal accident for every five million departures. No one can rationally advocate a return to the glamour of the past."</ref>

===Theory and practice interplay===
Human factors in the implementation of safety procedures play a role in the overall effectiveness of safety systems. Maintenance problems are common with redundant systems. Maintenance crews can fail to restore a redundant system to active status. They may be overworked, or maintenance may be deferred due to budget cuts, because managers know that the system will continue to operate without fixing the backup system.<ref>Perrow (1999)</ref> In practice, steps in procedures may be changed and adapted from the formal safety rules, often in ways that seem appropriate and rational, and that may be essential in meeting time constraints and work demands. In a 2004 ''Safety Science'' article, reporting on research partially supported by the National Science Foundation and NASA, Nancy Leveson writes:<ref name="Leveson 2004">{{cite journal|url=http://sunnyday.mit.edu/accidents/safetyscience-single.pdf |title=A New Accident Model for Engineering Safer Systems |first=Nancy |last=Leveson |journal=Safety Science |volume=42 |issue=4 |date=April 2004 |quote=... In fact, a common way for workers to apply pressure to management without actually going out on strike is to 'work to rule,' which can lead to a breakdown in productivity and even chaos{{nbsp}}...}}
* Citing: {{cite book |last1=Rasmussen |first1=Jens |last2=Pejtersen |first2=Annelise Mark |last3=Goodstein |first3=L. P. |title=Cognitive systems engineering |date=1994 |publisher=Wiley |location=New York |isbn=978-0-471-01198-9}}</ref>
{{bq|However, instructions and written procedures are almost never followed exactly as operators strive to become more efficient and productive and to deal with time pressures{{nbsp}}... even in such highly constrained and high-risk environments as nuclear power plants, modification of instructions is repeatedly found and the violation of rules appears to be quite rational, given the actual workload and timing constraints under which the operators must do their job. In these situations, a basic conflict exists between error as seen as a deviation from the ''normative procedure'' and error as seen as a deviation from the rational and normally used ''effective procedure''.}}
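As a simple numerical illustration of the maintenance point above (a back-of-the-envelope calculation under assumed figures, not one drawn from Perrow or Leveson): if a primary system and an independent backup each fail on demand with probability <math>q = 0.01</math>, the chance that both fail together is

:<math>q^2 = 0.0001,</math>

but if the backup has quietly been left out of service, the protection collapses to the primary's own failure probability of <math>q = 0.01</math>, a hundredfold increase. An unrestored redundant system can therefore silently erase most of the safety margin the design assumes.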


==See also==
* [[Unintended consequences]]


==Notes==
{{notelist}}

=== Sources ===
* <!-- Charles Perrow (1984) Normal accidents: Living with high-risk technologies-->{{cite Q|Q114963622}}
* <!-- Charles Perrow (1999) Normal accidents: Living with high-risk technologies with a new afterword and a postscript on the Y2K problem-->{{cite Q|Q114963670}}

===References===
{{Reflist}}


== Further reading ==
{{ref begin}}
* {{cite news |last1=Beckham |first1=J. Daniel |title=The Crash of ValuJet 592 |work=Quality |publisher=The Beckham Company |url=http://www.beckhamco.com/41articlescategory.html |date=January 1999 |archive-date=4 March 2016 |archive-url=https://web.archive.org/web/20160304043031/http://www.beckhamco.com/41articlescategory/054_crashofvalujet592.doc |quote=Accidents at both Chernobyl and Three Mile Island were set off by failed safety systems. |id=Article from a health care consulting company}} [http://www.beckhamco.com/41articlescategory/054_crashofvalujet592.doc Direct article download]
* {{cite book| last=Cooper| first=Alan| title=The Inmates Are Running the Asylum: Why High Tech Products Drive Us Crazy and How to Restore the Sanity| publisher=Sams; [[Pearson Education]] | date=2004-03-05| location=Indianapolis|isbn=0-672-31649-8}}
* Gross, Michael Joseph (May 29, 2015). [https://www.vanityfair.com/culture/2015/05/life-and-death-at-cirque-du-soleil "Life and Death at Cirque du Soleil"], ''Vanity Fair''. The article states: "... A system accident is one that requires many things to go wrong in a cascade. Change any element of the cascade and the accident may well not occur, but every element shares the blame{{nbsp}}..."
* {{cite journal |last=Helmreich |first=Robert L. |title=Anatomy of a system accident: The crash of Avianca Flight 052 |journal=[[International Journal of Aviation Psychology]] |volume=4 |issue=3 |pages=265–284 |date=1994 |doi=10.1207/s15327108ijap0403_4 |pmid=11539174}}
* {{cite journal|last=Hopkins |first=Andrew |title=Was Three Mile Island A Normal Accident? |journal=[[Journal of Contingencies and Crisis Management]] |volume=9 |issue=2 |pages=65–72 |date=June 2001 |url=http://regnet.anu.edu.au/program/review/Publications/HopkinsP3.pdf |access-date=2008-03-06 |doi=10.1111/1468-5973.00155 |url-status=dead |archive-url=https://web.archive.org/web/20070829211912/http://regnet.anu.edu.au/program/review/Publications/HopkinsP3.pdf |archive-date=August 29, 2007 }}
* ''Beyond Engineering: A New Way of Thinking About Technology'', Todd La Porte, Karlene Roberts, and Gene Rochlin, Oxford University Press, 1997. This book provides counter-examples of complex systems which have good safety records.
* Pidgeon, Nick (September 22, 2011). "In retrospect: Normal accidents". ''Nature''.
* {{citation| first=Charles| last=Perrow| author-link=Charles Perrow| title=Organizationally Induced Catastrophes| publisher = [[University Corporation for Atmospheric Research]]; Institute for the Study of Society and Environment| date=29 May 2000| url=http://www.isse.ucar.edu/extremes/papers/perrow.PDF |archive-date=5 December 2008 |archive-url=https://web.archive.org/web/20181205185152/http://www.isse.ucar.edu:80/extremes/papers/perrow.PDF | access-date=February 6, 2009}}
* {{cite thesis|last=Roush |first=Wade Edmund |title=Catastrophe and Control: How Technological Disasters Enhance Democracy |type=Ph.D. dissertation |publisher=Massachusetts Institute of Technology |date=1994 |quote-page=15 |quote=''Normal Accidents'' is essential reading today for industrial managers, organizational sociologists, historians of technology, and interested lay people alike, because it shows that a major strategy engineers have used in this century to keep hazardous technologies under control—multiple layers of 'fail-safe' backup devices—often adds a dangerous level of unpredictability to the system as a whole{{nbsp}}...}}
* {{cite news | title=Test shows oxygen canisters sparking intense fire| work=[[CNN]] |date=1996-11-19 | url=http://www.cnn.com/US/9611/19/valujet.final |access-date=2008-03-06}}
* {{cite book| last=Wallace| first=Brendan| title=Beyond Human Error| publisher=CRC Press| date=2009-03-05| location=Florida| isbn=978-0-8493-2718-6}}
{{ref end}}


[[Category:Safety engineering]]
