User:Avi Harel/Resilience engineering

From Wikipedia, the free encyclopedia

Resilience engineering is an emerging discipline, formed by integrating concepts and techniques from three disciplines:

  • Systems Engineering, focusing primarily on providing functionality and performance
  • Software Engineering, focusing on the design, development, implementation and maintenance of the system software
  • Cognitive Engineering, focusing on the operator's performance and on the operator's role in resolving complex situations.

Resilience engineering plays a key role in Safety Engineering.

The integration of the disciplines is described in the interactive guide for resilience assurance.

Resilience-Oriented Systems Engineering (ROSE)

The methods for resilience assurance are integrated into the traditional cycle of proactive and reactive system development.

Design

Methodology

The interactive guide proposes an iterative approach to resilience assurance, combining two leading approaches (learning cycles ...):

  • The proactive approach, intended to prevent failure by design
  • The reactive approach, intended to assure learning from incidents.

The iterative approach to system design is described here ...

Content

The requirement specification includes lists of hazards, defense add-ons, interaction styles, resilience modules ... and a description of the operational rules.

The top-level design is based on a resilience-oriented system architecture ...

The unit design aims to assure error prevention by means of specialized control and supervision stations; detection of component faults and of operators' slips and mistakes by component-level add-ons; and detection of unexpected activity, based on the operational rules.
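The rule-based detection of unexpected activity can be illustrated with a minimal Python sketch, assuming that operational rules can be expressed as predicates over logged events. The names `Event`, `OperationalRule` and `detect_unexpected`, and the example rule, are hypothetical and are not taken from the interactive guide.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Event:
    """One observed operator action or system event (hypothetical model)."""
    source: str   # "operator" or a component name
    action: str   # what was done, e.g. "start"
    state: str    # system operating mode when it happened


@dataclass(frozen=True)
class OperationalRule:
    """A named predicate that returns True when an event is permitted."""
    name: str
    allows: Callable[[Event], bool]


def detect_unexpected(events, rules):
    """Return (event, rule name) for every rule that an event violates."""
    return [
        (event, rule.name)
        for event in events
        for rule in rules
        if not rule.allows(event)
    ]


# Example rule: starting the system while it is in maintenance mode is not permitted.
rules = [
    OperationalRule(
        "no-start-during-maintenance",
        lambda e: not (e.action == "start" and e.state == "maintenance"),
    )
]
events = [
    Event("operator", "start", "idle"),
    Event("operator", "start", "maintenance"),
]
violations = detect_unexpected(events, rules)
for event, rule_name in violations:
    print(f"{rule_name}: {event.source} did {event.action!r} in {event.state!r}")
```

Keeping each rule as a separate named predicate makes it possible to report which rule an activity violated, not merely that something unexpected happened.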

Architecture

A key feature of proactive resilience assurance is a resilience-oriented architecture, which extends the functional unit with special add-ons.
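The add-on architecture can be pictured as a wrapper around the functional unit, so that every command passes through the add-ons before it reaches the unit. This minimal Python sketch assumes that an add-on is a function returning a rejection reason or `None`; `ResilientUnit`, `Pump` and `reject_unknown` are hypothetical names, and blocking is only one possible reaction (an add-on could equally warn the operator or only log the event).

```python
from typing import Callable, List, Optional, Protocol


class FunctionalUnit(Protocol):
    def execute(self, command: str) -> str: ...


# Hypothetical add-on signature: return a reason to reject the command, or None.
AddOn = Callable[[str], Optional[str]]


class ResilientUnit:
    """Functional unit extended by add-ons that screen each command first."""

    def __init__(self, unit: FunctionalUnit, add_ons: List[AddOn]) -> None:
        self.unit = unit
        self.add_ons = add_ons
        self.log: List[str] = []

    def execute(self, command: str) -> str:
        for add_on in self.add_ons:
            reason = add_on(command)
            if reason is not None:
                self.log.append(f"blocked {command!r}: {reason}")
                return "rejected"
        # No add-on objected: pass the command to the functional unit unchanged.
        return self.unit.execute(command)


class Pump:
    """Hypothetical functional unit."""

    def execute(self, command: str) -> str:
        return f"pump: {command} done"


def reject_unknown(command: str) -> Optional[str]:
    """Hypothetical add-on catching operator slips such as mistyped commands."""
    return None if command in {"start", "stop"} else "unknown command"


unit = ResilientUnit(Pump(), [reject_unknown])
print(unit.execute("start"))  # pump: start done
print(unit.execute("strat"))  # rejected
print(unit.log)               # ["blocked 'strat': unknown command"]
```

The functional unit itself is left untouched; resilience features are added or removed by changing the list of add-ons.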

Testing

Key features

  • Iterative assurance

Subjects

  • Alarm generation ... - aiming at zero missed alarms and a minimum of improper alarms
  • Alarm perception ... - the operators' awareness of new operational risks
  • Recovery - the system activity during the transition from exceptional or unpredictable situations to normal operation
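The alarm-generation goal becomes measurable once missed and improper alarms are counted. This minimal Python sketch assumes that each reviewed interval records whether a hazard was actually present (established afterwards, for example in an incident review) and whether an alarm was raised. It treats an improper alarm as one raised when no hazard was present, which is only one possible reading; `Observation` and `alarm_quality` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    hazard_present: bool  # ground truth, assumed to come from a later review
    alarm_raised: bool    # what the alarm system actually did


def alarm_quality(observations):
    """Count missed alarms (hazard, no alarm) and improper alarms (alarm, no hazard)."""
    missed = sum(1 for o in observations if o.hazard_present and not o.alarm_raised)
    improper = sum(1 for o in observations if o.alarm_raised and not o.hazard_present)
    return {"missed": missed, "improper": improper}


log = [
    Observation(hazard_present=True, alarm_raised=True),    # correct alarm
    Observation(hazard_present=True, alarm_raised=False),   # missed alarm
    Observation(hazard_present=False, alarm_raised=True),   # improper alarm
    Observation(hazard_present=False, alarm_raised=False),  # correctly quiet
    Observation(hazard_present=False, alarm_raised=True),   # improper alarm
]
print(alarm_quality(log))  # {'missed': 1, 'improper': 2}
```

The target "zero missed alarms" corresponds to `missed == 0`, while "minimum improper alarms" is a quantity to reduce without driving `missed` above zero.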

Goals

  • Unit testing - verify that the system can identify any component fault and any rule violation.
  • Integration - test the system architecture to ensure that all the resilience features work as intended.
  • Verification - ensure that operators can tackle all the expected situations involved in failure modes, including those related to exceptional situations.
  • Validation - graying out black swans: ensure that the system captures at least some unpredictable situations and events, such as unexpected operational errors, and identify instances in which the system's resilience does not meet the stakeholders' expectations.
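The unit-testing goal can be read as a fault-injection loop: inject each known component fault in turn and check that the detection add-ons report it. This minimal Python sketch assumes a simulated component whose state is a dictionary of readings; the fault names, the readings, the assumed pressure limit and the functions `inject`, `detect` and `undetected_faults` are all hypothetical. The example detector deliberately lacks an overpressure check, so the test exposes that gap.

```python
from typing import Callable, Dict, Iterable, List, Set

State = Dict[str, float]


def undetected_faults(
    faults: Iterable[str],
    inject: Callable[[str], State],
    detect: Callable[[State], Set[str]],
) -> List[str]:
    """Inject each fault in turn; return the faults the detector did not report."""
    return [fault for fault in faults if fault not in detect(inject(fault))]


# Hypothetical simulated component: start from nominal readings, then apply one fault.
def inject(fault: str) -> State:
    state = {"sensor_delta": 1.0, "closed_valve_flow": 0.0, "pressure": 5.0}
    if fault == "sensor_stuck":
        state["sensor_delta"] = 0.0        # sensor output no longer changes
    elif fault == "valve_leak":
        state["closed_valve_flow"] = 0.3   # flow through a valve that should be shut
    elif fault == "overpressure":
        state["pressure"] = 12.0           # above an assumed limit of 10.0
    return state


# Hypothetical detection add-ons; the overpressure check is deliberately missing.
def detect(state: State) -> Set[str]:
    found = set()
    if state["sensor_delta"] == 0.0:
        found.add("sensor_stuck")
    if state["closed_valve_flow"] > 0.0:
        found.add("valve_leak")
    return found


FAULTS = ["sensor_stuck", "valve_leak", "overpressure"]
print(undetected_faults(FAULTS, inject, detect))  # ['overpressure']
```

A unit passes this kind of test only when the returned list is empty, that is, when every injected fault is identified.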