User:Avi Harel/Resilience engineering
Resilience Engineering
Resilience engineering is an emerging discipline, formed by integrating concepts and techniques from three disciplines:
- Systems Engineering, focusing primarily on providing functionality and performance
- Software Engineering, focusing on the design, development, implementation and maintenance of the system software
- Cognitive Engineering, focusing on the operator's performance and on the operator's role in resolving complex situations.
Resilience engineering plays a key role in Safety Engineering.
The integration of the disciplines is described in the interactive guide for resilience assurance.
Resilience-oriented Systems Engineering (ROSE)
The methods for resilience assurance are integrated in the traditional cycle of proactive and reactive system development.
Design
Methodology
The interactive guide proposes an iterative approach to resilience assurance, combining two leading approaches (learning cycles ...):
- The proactive approach, intended to prevent failure by design
- The reactive approach, intended to assure learning from incidents.
The iterative approach to system design is described here ...
Content
The requirement specification includes lists of hazards, defense add-ons, interaction styles, resilience modules ... and a description of the operational rules.
The top-level design is based on a resilience-oriented system architecture ...
The unit design aims to assure three things: error prevention, by specialized control and supervision stations; detection of component faults and of operators' slips and mistakes, by component-level add-ons; and detection of unexpected activity, based on the operational rules.
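Detecting unexpected activity from the operational rules can be illustrated with a minimal sketch. It assumes that each rule can be written as a predicate over a snapshot of the system state. The rule names, state fields and thresholds below are hypothetical and do not come from the interactive guide.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, float]  # hypothetical snapshot of the unit's state


@dataclass(frozen=True)
class OperationalRule:
    name: str
    holds: Callable[[State], bool]  # True when the state complies with the rule


def find_violations(state: State, rules: List[OperationalRule]) -> List[str]:
    """Return the names of all rules that the given state violates."""
    return [rule.name for rule in rules if not rule.holds(state)]


# Hypothetical rules for an illustrative pump unit (assumption, not from the source)
RULES = [
    OperationalRule("pressure_within_limit", lambda s: s["pressure_bar"] <= 10.0),
    OperationalRule(
        "valve_open_while_pumping",
        lambda s: s["pump_on"] == 0 or s["valve_open"] == 1,
    ),
]

normal = {"pressure_bar": 6.5, "pump_on": 1, "valve_open": 1}
unexpected = {"pressure_bar": 12.0, "pump_on": 1, "valve_open": 0}

print(find_violations(normal, RULES))
print(find_violations(unexpected, RULES))
```

Keeping the rules as data, separate from the checking function, means the same list could in principle serve both the requirement specification and the unit tests.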
Architecture
A key feature of proactive resilience assurance is a resilience-oriented architecture, which extends the functional unit with special add-ons.
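One way to picture an add-on that extends a functional unit is a wrapper that leaves the unit's code untouched and adds fault detection around it. The sketch below is only an illustration under that assumption. The class name `ResilienceAddOn`, the output-range check and the `reciprocal` unit are hypothetical.

```python
from typing import Callable, List, Optional


class ResilienceAddOn:
    """Wraps a functional unit and records faults instead of letting them pass silently."""

    def __init__(self, unit: Callable[[float], float], low: float, high: float):
        self.unit = unit            # the unmodified functional unit
        self.low = low              # lowest acceptable output (assumed limit)
        self.high = high            # highest acceptable output (assumed limit)
        self.reports: List[str] = []

    def __call__(self, x: float) -> Optional[float]:
        try:
            y = self.unit(x)
        except Exception as exc:
            # The unit itself failed: treat it as a component fault.
            self.reports.append(f"component fault: {type(exc).__name__}")
            return None
        if not (self.low <= y <= self.high):
            # The unit ran, but its output breaks the expected range.
            self.reports.append(f"out-of-range output: {y}")
            return None
        return y


def reciprocal(x: float) -> float:
    """Stand-in functional unit."""
    return 1.0 / x


guarded = ResilienceAddOn(reciprocal, low=0.0, high=10.0)
print(guarded(4.0))      # normal operation
print(guarded(0.0))      # unit raises, add-on records a component fault
print(guarded(0.0625))   # unit returns 16.0, outside the accepted range
print(guarded.reports)
```

The design choice here is that the add-on owns the reporting channel (`reports`), so a supervision station could read it without the functional unit knowing it is being monitored.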
Testing
Key features
- Iterative assurance
Subjects
- Alarm generation ... - towards zero missed alarms and a minimum of improper alarms
- Alarm perception ... - the operators' awareness of new operational risks.
- Recovery - the system activity during the transition from exceptional or unpredictable situations to normal operation
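The alarm-generation goal can be made measurable during testing. The sketch below assumes that a test log records, for each observation, whether a hazard was actually present and whether an alarm was raised. It also assumes that an "improper alarm" means an alarm raised with no hazard present. Both the log format and that reading are assumptions made for illustration.

```python
from typing import Iterable, Tuple


def alarm_rates(observations: Iterable[Tuple[bool, bool]]) -> Tuple[float, float]:
    """Compute (missed_alarm_rate, improper_alarm_rate).

    Each observation is a (hazard_present, alarm_raised) pair.
    missed_alarm_rate   = hazards with no alarm / all hazards
    improper_alarm_rate = alarms with no hazard / all alarms
    """
    hazards = missed = alarms = improper = 0
    for hazard_present, alarm_raised in observations:
        if hazard_present:
            hazards += 1
            if not alarm_raised:
                missed += 1
        if alarm_raised:
            alarms += 1
            if not hazard_present:
                improper += 1
    missed_rate = missed / hazards if hazards else 0.0
    improper_rate = improper / alarms if alarms else 0.0
    return missed_rate, improper_rate


# Hypothetical test log: 4 hazards (1 missed), 4 alarms (1 improper)
log = [
    (True, True),
    (True, False),
    (False, True),
    (False, False),
    (True, True),
    (True, True),
]

print(alarm_rates(log))  # (0.25, 0.25)
```

Tracking both rates together matters because lowering one tends to raise the other: a detector tuned to never miss a hazard usually raises more improper alarms.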
Goals
- Unit testing - verify that the system can identify any component fault and any rule violation
- Integration - test the system architecture, to ensure that all the resilience features work as intended.
- Verification - ensure that operators can tackle all the expected situations involved in failure modes, including those related to exceptional situations.
- Validation - graying out black swans: ensure that the system captures some unpredictable situations and events, such as unexpected operational errors, and identify instances in which the system's resilience does not meet the stakeholders' expectations.