Safety-critical system

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Flamminifra (talk | contribs) at 08:51, 5 June 2008 (added internal link to the International Journal of Critical Computer-Based Systems). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

A life-critical system or safety-critical system is a system whose failure or malfunction may result in:

  • death or serious injury to people, or
  • loss or severe damage to equipment, or
  • environmental harm.

Risks of this sort are usually managed with the methods and tools of safety engineering. A life-critical system is designed to lose less than one life per billion (10^9) hours of operation.[1] Typical design methods include probabilistic risk assessment, which combines failure modes and effects analysis with fault tree analysis. Safety-critical systems are increasingly computer-based.
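The arithmetic behind a fault tree can be sketched in a few lines. The example below is a minimal illustration, assuming independent component failures and small per-hour failure probabilities; the components (two redundant pumps and a shared controller), their failure rates, and the function names are hypothetical and are not taken from any standard.

```python
from math import prod


def and_gate(probabilities):
    """All inputs must fail: multiply the independent failure probabilities."""
    return prod(probabilities)


def or_gate(probabilities):
    """Any single input failing is enough: 1 minus the chance that none fail."""
    return 1.0 - prod(1.0 - p for p in probabilities)


# Hypothetical per-hour failure probabilities (illustrative values only).
pump_a = 1e-4
pump_b = 1e-4
controller = 1e-10

# Top event: both redundant pumps fail, OR the shared controller fails.
both_pumps = and_gate([pump_a, pump_b])
top_event = or_gate([both_pumps, controller])

# Design target of less than one failure per billion hours of operation.
TARGET_PER_HOUR = 1e-9
meets_target = top_event < TARGET_PER_HOUR
```

With these assumed numbers the redundant pair dominates the result (about 1e-8 per hour), so the hypothetical design misses the target even though the controller alone would satisfy it. A real assessment would also have to account for common-cause failures, which the independence assumption ignores.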

Reliability regimes

Several reliability regimes for life-critical systems exist:

  • Fail-operational systems continue to operate when they fail. Examples of these include elevators, the gas thermostats in most home furnaces, and passively safe nuclear reactors. Fail-operational mode is sometimes unsafe. A launch-on-loss-of-communications scheme for nuclear weapons was rejected as a control system for the U.S. nuclear forces because it is fail-operational: a loss of communications would cause a launch, so this mode of operation was considered too risky.
  • Fail-safe systems become safe when they cannot operate. Many medical systems fall into this category. For example, an infusion pump can fail, and as long as it alerts the nurse and ceases pumping, it will not threaten loss of life because its safety interval is long enough to permit a human response. In a similar vein, an industrial or domestic burner controller can fail, but it must fail in a safe mode (i.e. turn combustion off when it detects a fault); thousands of such controllers operate in homes and workplaces, each capable of causing an explosion or poisoning. Famously, nuclear weapon systems that launch-on-command are fail-safe, because if the communications systems fail, launch cannot be commanded. Railway signalling is designed to be fail-safe.
  • Fail-secure systems maintain maximum security when they cannot operate. For example, while fail-safe electronic doors unlock during power failures, fail-secure ones lock, possibly trapping people in a burning building.
  • Fault-tolerant systems continue to operate correctly when subsystems operate incorrectly. Examples include autopilots on commercial aircraft and control systems for ordinary nuclear reactors. The normal method to tolerate faults is to have several computers continually test the parts of a system, and switch in hot spares for failing subsystems. As long as faulty subsystems are replaced or repaired at normal maintenance intervals, these systems have excellent safety. Note that the computers, power supplies and control terminals used by human beings must all be duplicated in these systems in some fashion.
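The hot-spare approach described for fault-tolerant systems can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of any real autopilot or reactor controller: the `Unit` and `HotSpareController` classes, the `self_test` method, and the `NoHealthyUnitError` exception are all hypothetical names introduced here.

```python
class NoHealthyUnitError(Exception):
    """Raised when every unit has failed and a safe state must be entered."""


class Unit:
    """A hypothetical subsystem with a built-in self-test."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def self_test(self):
        return self.healthy

    def compute(self, x):
        # Stand-in for the unit's real control computation.
        return 2 * x


class HotSpareController:
    """Tests the active unit on every step and switches to a spare on failure."""

    def __init__(self, units):
        self.units = list(units)
        self.active = 0
        self.failed = []  # names of units awaiting repair at maintenance

    def step(self, x):
        while self.active < len(self.units):
            unit = self.units[self.active]
            if unit.self_test():
                return unit.compute(x)
            # Record the fault and switch in the next hot spare.
            self.failed.append(unit.name)
            self.active += 1
        raise NoHealthyUnitError("all units failed; enter a safe state")


primary, spare = Unit("primary"), Unit("spare")
controller = HotSpareController([primary, spare])
first = controller.step(3)    # served by the primary
primary.healthy = False       # simulate a fault in the primary
second = controller.step(3)   # served by the spare after switchover
```

The caller sees the same result before and after the simulated fault, while `controller.failed` records which unit needs repair. When no spare remains, the sketch raises an exception so that the surrounding system can fall back to fail-safe behaviour.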

Software engineering for life-critical systems

Software engineering for life-critical systems is particularly difficult, but the avionics industry has succeeded in establishing standard methods for producing life-critical avionics software. The standard approach is to carefully code, inspect, document, test, verify and analyse the system. Another approach is to certify a production system, a compiler, and then generate the system's code from specifications. Another approach uses formal methods to generate proofs that the code meets requirements. All of these approaches improve the software quality in safety-critical systems by testing or eliminating manual steps in the development process, because people make mistakes, and these mistakes are the most common cause of potentially life-threatening errors.
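As a rough illustration of checking code against a stated requirement, the sketch below exhaustively tests a tiny piece of control logic over every possible input. This is a much weaker relative of a formal proof and is not the avionics industry's method; the burner logic, the requirement, and the function names are hypothetical and deliberately simplified (for instance, the ignition phase is ignored).

```python
from itertools import product


def valve_open(heat_demand: bool, flame_detected: bool, fault: bool) -> bool:
    """Hypothetical burner logic: open the gas valve only when it is safe."""
    return heat_demand and flame_detected and not fault


def find_violations():
    """Return every input combination that violates the safety requirement."""
    violations = []
    for heat_demand, flame_detected, fault in product([False, True], repeat=3):
        opened = valve_open(heat_demand, flame_detected, fault)
        # Requirement: the valve is never open during a fault or without a flame.
        if opened and (fault or not flame_detected):
            violations.append((heat_demand, flame_detected, fault))
    return violations


violations = find_violations()
```

Because the input space has only eight combinations, the check covers every case, and an empty `violations` list means the requirement holds for this logic. Real systems have far larger state spaces, which is where formal methods and tool support become necessary.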

Examples of life-critical systems

Infrastructure

Medicine

The technology requirements can go beyond avoidance of failure: such systems can also facilitate medical intensive care (which deals with healing patients) and life support (which stabilizes patients).

Nuclear engineering

Recreation

Transport

Automotive

Aviation

Spaceflight

See also

References

  1. AC 25.1309-1A