Resilient control systems
In our modern society, computerized or digital control systems have been used to reliably automate many of the industrial operations that we take for granted, from the power plant to the automobiles we drive. However, the complexity of these systems and how the designers integrate them, the roles and responsibilities of the humans that interact with the systems, and the cyber security of these highly networked systems have led to a new paradigm in research philosophy for next-generation control systems. Resilient control systems consider all of these elements, along with the disciplines that contribute to a more effective design, such as cognitive psychology, computer science, and control engineering, to develop interdisciplinary solutions. These solutions consider such things as how to tailor the control system operating displays to best enable the user to make an accurate and reproducible response, how to design in cyber security protections such that the system defends itself from attack by changing its behaviors, and how to better integrate widely distributed computer control systems to prevent cascading failures that result in disruptions to critical industrial operations. In the context of cyber-physical systems, resilient control systems are an aspect that focuses on the unique interdependencies of a control system, as compared to information technology computer systems and networks, due to its importance in operating our critical industrial operations.
Introduction
Originally intended to provide a more efficient mechanism for controlling industrial operations, digital control systems allowed for flexibility in integrating distributed sensors and operating logic while maintaining a centralized interface for human monitoring and interaction. This ease of readily adding sensors and logic through software, which was once done with relays and isolated analog instruments, has led to wide acceptance and integration of these systems in all industries. However, these digital control systems have often been integrated in phases to cover different aspects of an industrial operation and connected over a network, leading to a complex interconnected and interdependent system. While the control theory applied is often nothing more than a digital version of its analog counterpart, the dependence of digital control systems upon communications networks has precipitated the need for cybersecurity due to potential effects on the confidentiality, integrity and availability of the information. To achieve resilience in the next generation of control systems, therefore, addressing the complex control system interdependencies, including human systems interaction and cyber security, will be a recognized challenge.
Defining resilience
Research in resilience engineering over the last decade has focused on two areas, organizational and information technology. Organizational resilience considers the ability of an organization to adapt and survive in the face of threats, including the prevention or mitigation of unsafe, hazardous or compromising conditions that threaten its very existence. Information technology resilience has been considered from a number of standpoints. Networking resilience has been considered as quality of service. Computing has considered such issues as dependability and performance in the face of unanticipated changes. However, based upon the application of control dynamics to industrial processes, functionality and determinism are primary considerations that are not captured by the traditional objectives of information technology.
Considering the paradigm of control systems, one suggested definition is that "Resilient control systems are those that tolerate fluctuations via their structure, design parameters, control structure and control parameters". However, this definition is taken from the perspective of control theory application to a control system. The malicious actor and cyber security are not directly considered, which might suggest the definition that was also proposed, "an effective reconstitution of control under attack from intelligent adversaries." However, this definition focuses only on resilience in response to a malicious actor. To cover the cyber-physical aspects of a control system, a definition of resilience must consider both benign and malicious human interaction, in addition to the complex interdependencies of the control system application.
The term “recovery” has been used in the context of resilience, paralleling the ability of a rubber ball to stay intact when a force is exerted on it and to recover its original dimensions after the force is removed. Considering the rubber ball in terms of a system, resilience could then be defined as its ability to maintain a desired level of performance or normalcy without irrecoverable consequences. While resilience in this context is based upon the yield strength of the ball, control systems require an interaction with the environment, namely the sensors, valves and pumps that make up the industrial operation. To be reactive to this environment, control systems require an awareness of its state in order to make corrective changes to the industrial process and maintain normalcy. With this in mind, and in consideration of the cyber-physical aspects of human systems integration and cyber security discussed above, as well as other definitions of resilience at a broader critical infrastructure level, the following can be deduced as a definition of a resilient control system:
- "A resilient control system is one that maintains state awareness and an accepted level of operational normalcy in response to disturbances, including threats of an unexpected and malicious nature"
Considering the flow of a digital control system as a basis, a resilient control system framework can be designed. Referring to the left side of Fig. 1, a resilient control system holistically considers the measures of performance or normalcy for the state space. At the center, an understanding of performance and priority provides the basis for an appropriate response by a combination of human and automation, embedded within a multi-agent, semi-autonomous framework. Finally, to the right, information must be tailored to the consumer to address the need and position a desirable response. Several examples or scenarios of how resilience differs from and benefits control system design are available in the literature.
Areas of resilience
Some primary tenets of resilience, as contrasted to traditional reliability, have presented themselves in considering an integrated approach to resilient control systems. These cyber-physical tenets complement the fundamental concept of dependable or reliable computing by characterizing resilience in regard to control system concerns, including design considerations that provide a level of understanding and assurance in the safe and secure operation of an industrial facility. These tenets are discussed individually below to summarize some of the challenges to address in order to achieve resilience.
The benign human has an ability to quickly understand novel solutions and to adapt to unexpected conditions. This behavior can provide additional resilience to a control system, but reproducibly predicting human behavior is a continuing challenge. Historic human preferences can be captured and applied to Bayesian inference and Bayesian belief networks, but ideally a solution would consider direct understanding of human state using sensors such as an EEG. Considering control system design and interaction, the goal would be to tailor the amount of automation necessary to achieve some level of optimal resilience for this mixed-initiative response. The human would be presented with the actionable information that provides the basis for a targeted, reproducible response.
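As a minimal sketch of how logged operator choices could feed Bayesian inference, the Python below applies a Beta-Bernoulli update to estimate how often an operator accepts an automated recommendation. The class name `OperatorPreference`, the uniform Beta(1, 1) prior, the accept/override framing and the decision history are all illustrative assumptions rather than anything prescribed by the work described here.

```python
from dataclasses import dataclass


@dataclass
class OperatorPreference:
    """Beta-Bernoulli belief about one operator choice (hypothetical model)."""
    accepts: float = 1.0    # Beta alpha; 1.0 encodes a uniform prior (assumed)
    overrides: float = 1.0  # Beta beta

    def observe(self, accepted: bool) -> None:
        """Update the belief with one logged operator decision."""
        if accepted:
            self.accepts += 1.0
        else:
            self.overrides += 1.0

    def p_accept(self) -> float:
        """Posterior mean probability that the operator accepts."""
        return self.accepts / (self.accepts + self.overrides)


belief = OperatorPreference()
for decision in [True] * 8 + [False] * 2:   # made-up decision history
    belief.observe(decision)
print(round(belief.p_accept(), 3))  # 0.75
```

An estimate like this could inform how much automation to offer in a mixed-initiative response, although a Bayesian belief network would capture dependencies that this single-variable sketch ignores.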
In contrast to the challenges of prediction and integration of the benign human with control systems, the abilities of the malicious actor (or hacker) to undermine desired control system behavior also create a significant challenge to control system resilience. Application of dynamic probabilistic risk analysis used in human reliability can provide some basis for the benign actor. However, the decidedly malicious intentions of an adversarial individual, organization or nation make the human difficult to model, as both objectives and motives are variable. Even so, in defining a control system response to such intentions, the malicious actor relies on some level of recognized behavior to gain an advantage and provide a pathway to undermining the system. Whether performed separately in preparation for a cyber attack, or on the system itself, these behaviors can provide opportunity for a successful attack without detection. Therefore, in considering resilient control system architecture, atypical designs that embed actively and passively implemented randomization of attributes would be suggested to reduce this advantage.
Complex networks and networked control systems
While much of the current critical infrastructure is controlled by a web of interconnected control systems, whether the architecture is termed a distributed control system (DCS) or supervisory control and data acquisition (SCADA), the application of control is moving toward a more decentralized state. In moving to a smart grid, the complex interconnected nature of individual homes, commercial facilities and diverse power generation and storage creates both an opportunity and a challenge in ensuring that the resulting system is more resilient to threats. The ability to operate these systems to achieve a global optimum for multiple considerations, such as overall efficiency, stability and security, will require mechanisms to holistically design complex networked control systems. Multi-agent methods suggest a mechanism to tie a global objective to distributed assets, allowing for management and coordination of assets for optimal benefit, and for semi-autonomous but constrained controllers that can react rapidly to maintain resilience under rapidly changing conditions.
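The idea of tying a global objective to constrained, distributed assets can be illustrated with a deliberately simple sketch. Here a coordinator splits a total power demand across agents in proportion to each agent's capacity, so no agent is asked to exceed its local limit. The function name `allocate`, the proportional rule and the numbers are assumptions chosen for illustration; real multi-agent schemes negotiate over far richer objectives.

```python
def allocate(demand_kw, capacities_kw):
    """Split a global demand across agents in proportion to capacity.

    Returns (setpoints, shortfall). Each setpoint is the same fraction of
    that agent's own capacity, so local limits are never exceeded.
    """
    total = sum(capacities_kw.values())
    if total <= 0:
        raise ValueError("no capacity available")
    demand = max(demand_kw, 0.0)
    served = min(demand, total)
    loading = served / total  # common fraction of capacity in use
    setpoints = {name: cap * loading for name, cap in capacities_kw.items()}
    return setpoints, demand - served


# Hypothetical agents and capacities in kW.
agents = {"rooftop_pv": 50.0, "battery": 100.0}
setpoints, shortfall = allocate(90.0, agents)
print(setpoints, shortfall)
```

If the demand exceeds the combined capacity, every agent is driven to its limit and the unmet remainder is returned as `shortfall`, which a supervisory layer could treat as a loss of adaptive capacity.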
Base Metrics for Resilient Control Systems
Establishing a metric that can capture the resilience attributes can be complex, at least if considered based upon differences between the interactions or interdependencies. Evaluating the control, cyber and cognitive disturbances, especially if considered from a disciplinary standpoint, leads to measures that have already been established. However, if the metric were instead based upon a normalizing dynamic attribute, such as a performance characteristic that can be impacted by degradation, an alternative is suggested. Specifically, applications of base metrics to resilience characteristics are given as follows for each type of disturbance:
- Physical Disturbances:
  - Time Latency Affecting Stability
  - Data Integrity Affecting Stability
- Cyber Disturbances:
  - Time Latency
  - Data Confidentiality, Integrity and Availability
- Cognitive Disturbances:
  - Time Latency in Response
  - Data Digression from Desired Response
Such performance characteristics exist with both time and data integrity. Time, both in terms of delay of mission and communications latency, and data, in terms of corruption or modification, are normalizing factors. In general, the idea is to base the metric on “what is expected” and not necessarily on the actual initiator of the degradation. Considering time as a metrics basis, resilient and un-resilient systems can be observed in Fig. 2.
Depending upon the abscissa metrics chosen, Fig. 2 reflects a generalization of the resiliency of a system. Several common terms are represented on this graphic, including robustness, agility, adaptive capacity, adaptive insufficiency, resiliency and brittleness. Each of these terms is explained below:
- Agility: The derivative of the disturbance curve. Its average defines the ability of the system to resist degradation on the downward slope, and also to recover on the upward slope. Primarily considered a time-based term that indicates impact to mission. Considers both short-term system and longer-term human responder actions.
- Adaptive Capacity: The ability of the system to adapt or transform from impact and maintain minimum normalcy. Considered a value between 0 and 1, where 1 is fully operational and 0 is the resilience threshold.
- Adaptive Insufficiency: The inability of the system to adapt or transform from impact, indicating an unacceptable performance loss due to the disturbance. Considered a value between 0 and -1, where 0 is the resilience threshold and -1 is total loss of operation.
- Brittleness: The area under the disturbance curve as intersected by the resilience threshold. This indicates the impact from the loss of operational normalcy.
- Phases of Resilient Control System Preparation and Disturbance Response:
  - Recon: Maintaining proactive state awareness of system conditions and degradation
  - Resist: System response to recognized conditions, both to mitigate and counter
  - Respond: System degradation has been stopped and system performance is returning
  - Restore: Longer term performance restoration, which includes equipment replacement
- Resiliency: The converse of brittleness, which for a resilient system is “zero” loss of minimum normalcy.
- Robustness: A positive or negative number associated with the area between the disturbance curve and the resilience threshold, indicating either the capacity or insufficiency, respectively.
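The terms above can be made concrete with a small numerical sketch. It assumes a disturbance curve sampled at a uniform interval and normalized so that 1 is fully operational, 0 is the resilience threshold and -1 is total loss of operation, following the scaling given for adaptive capacity and adaptive insufficiency. The function names, the use of the curve's lowest point as the capacity or insufficiency value, the signed-area reading of robustness and the sample curves are all assumptions made for illustration.

```python
def trapezoid(values, dt):
    """Trapezoidal integral of uniformly spaced samples."""
    return sum((a + b) / 2.0 * dt for a, b in zip(values, values[1:]))


def mean(values):
    """Arithmetic mean, or 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def resilience_metrics(perf, dt=1.0):
    """Summarize a normalized disturbance curve.

    perf: samples where 1 = fully operational, 0 = resilience threshold,
          -1 = total loss of operation.
    """
    slopes = [(b - a) / dt for a, b in zip(perf, perf[1:])]
    lowest = min(perf)
    return {
        # Agility, split into the average downward and upward slopes.
        "degradation_rate": mean([s for s in slopes if s < 0]),
        "recovery_rate": mean([s for s in slopes if s > 0]),
        # Lowest point of the curve, reported on whichever side it falls.
        "adaptive_capacity": lowest if lowest >= 0 else None,
        "adaptive_insufficiency": lowest if lowest < 0 else None,
        # Area of the curve lying below the resilience threshold.
        "brittleness": trapezoid([max(-p, 0.0) for p in perf], dt),
        # Signed area between the curve and the threshold (one reading).
        "robustness": trapezoid(perf, dt),
    }


resilient = resilience_metrics([1.0, 0.6, 0.4, 0.6, 1.0])
brittle = resilience_metrics([1.0, 0.2, -0.6, -0.2, 1.0])
print(resilient["adaptive_capacity"], brittle["adaptive_insufficiency"])
```

For the first curve the brittleness is zero, matching the description of resiliency as zero loss of minimum normalcy. The second curve dips below the threshold and accumulates a non-zero brittleness. Clipping the samples at the threshold before integrating is a coarse approximation wherever the curve crosses zero between two samples.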
On the abscissa of Fig. 2, it can be recognized that cyber and cognitive influences can affect both the data and the time, which underscores the relative importance of recognizing these forms of degradation in resilient control designs. For cybersecurity, a single cyberattack can degrade a control system in multiple ways. Additionally, control impacts can be characterized as indicated. While these terms are fundamental and may seem of little value to those correlating impact in terms like cost, the development of use cases provides a means by which this relevance can be codified. For example, given the impact to system dynamics or data, the performance of the control loop can be directly ascertained, showing the approach to instability and the operational impact.
Resilience Manifold for Design and Operation
The very nature of control systems implies a starting point for the development of resilience metrics. That is, the control of a physical process is based upon quantifiable performance and measures, including first-principles and stochastic ones. The ability to provide this measurement, which is the basis for correlating operational performance and adaptation, then also becomes the starting point for correlation of the data and time variations that can come from cognitive and cyber-physical sources. Effective understanding is based upon developing a manifold of adaptive capacity that correlates the design (and operational) buffer. For a power system, this manifold is based upon the real and reactive power assets, the controllable assets having the latitude to maneuver, and the impact of disturbances over time. For a modern distribution system (MDS), these assets can be aggregated from the individual contributions as shown in Fig. 3. For this figure, these assets include: a) a battery, b) an alternate tie line source, c) an asymmetric P/Q-conjectured source, d) a distribution static synchronous compensator (DSTATCOM), and e) a low latency, four quadrant source with no energy limit.
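A rough sketch can show how individual real (P) and reactive (Q) power contributions might be aggregated over time. It is not the manifold construction itself. It assumes each asset's capability is a simple rectangle in the P/Q plane, that an asset contributes nothing until its latency has elapsed, and that an energy-limited asset loses its positive real power once its stored energy would be used up. The `Asset` fields, the function name `aggregate_latitude` and every number are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    p_min_kw: float
    p_max_kw: float
    q_min_kvar: float
    q_max_kvar: float
    energy_kwh: float = math.inf  # inf means no energy limit
    latency_s: float = 0.0        # delay before the asset can respond


def aggregate_latitude(assets, t_s):
    """Summed P/Q box available t_s seconds into a disturbance.

    Returns (p_min, p_max, q_min, q_max). Energy-limited assets are
    assumed to have p_min_kw <= 0 (they can absorb as well as supply).
    """
    p_lo = p_hi = q_lo = q_hi = 0.0
    for a in assets:
        if t_s < a.latency_s:
            continue  # asset has not yet come online
        p_max = a.p_max_kw
        # Running at p_max for t_s seconds would exceed the stored energy.
        if p_max > 0 and p_max * t_s / 3600.0 > a.energy_kwh:
            p_max = 0.0
        p_lo += a.p_min_kw
        p_hi += p_max
        q_lo += a.q_min_kvar
        q_hi += a.q_max_kvar
    return p_lo, p_hi, q_lo, q_hi


fleet = [
    Asset("battery", -50.0, 50.0, -30.0, 30.0, energy_kwh=25.0, latency_s=1.0),
    Asset("tie line", 0.0, 100.0, 0.0, 20.0, latency_s=10.0),
    Asset("DSTATCOM", 0.0, 0.0, -40.0, 40.0),
]
for t in (0, 5, 60, 3600):
    print(t, aggregate_latitude(fleet, t))
```

Summing the bounds is exact for rectangular capabilities, since the combined region of independent boxes is again a box. Real inverter limits are closer to circles or asymmetric shapes, so this sketch overstates the corners of the available region.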
Examples of Resilient Control System Developments
1) When considering the current digital control system designs, the cyber security of these systems is dependent upon what are considered border protections, i.e., firewalls, passwords, etc. If a malicious actor compromised the digital control system for an industrial operation through a man-in-the-middle attack, data could be corrupted within the control system. The industrial facility operator would have no way of knowing the data had been compromised until someone, such as a security engineer, recognized that the attack was occurring. As operators are trained to provide a prompt, appropriate response to stabilize the industrial facility, there is a likelihood that the operator would react to the corrupt data and cause a plant upset. In a resilient control system, as per Fig. 1, cyber and physical data are fused to recognize anomalous situations and warn the operator.
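The fusion idea in this example can be sketched with two simple indicators: a physical one (does the reported value agree with what a process model expects?) and a cyber one (did the message arrive with unusual latency?). The `Observation` fields, the thresholds, the availability of a model prediction and the message strings are all assumptions for illustration; an actual data fusion engine would weigh many more signals.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    measured: float    # process value as received over the network
    predicted: float   # value expected from a process model (assumed available)
    latency_ms: float  # observed message latency


def fuse(obs, value_tol, latency_limit_ms):
    """Combine a physical and a cyber indicator into one operator message."""
    physical_anomaly = abs(obs.measured - obs.predicted) > value_tol
    cyber_anomaly = obs.latency_ms > latency_limit_ms
    if physical_anomaly and cyber_anomaly:
        return "WARN: data and timing both abnormal; treat readings as suspect"
    if physical_anomaly:
        return "CHECK: reading departs from model; possible process or sensor fault"
    if cyber_anomaly:
        return "CHECK: communications delayed; data may be stale"
    return "OK"


# Hypothetical reading: far from the model prediction and delayed in transit.
print(fuse(Observation(measured=75.0, predicted=50.2, latency_ms=480.0),
           value_tol=2.0, latency_limit_ms=100.0))
```

Neither indicator alone distinguishes an attack from an ordinary fault. The point of the sketch is only that seeing both together gives the operator a reason to question the data before acting on it.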
2) As our society becomes more automated for a variety of drivers, including energy efficiency, the need to implement ever more effective control algorithms naturally follows. However, advanced control algorithms are dependent upon data from multiple sensors to predict the behaviors of the industrial operation and make corrective responses. This type of system can become very brittle, insofar as any unrecognized degradation in a sensor can lead to incorrect responses by the control algorithm and potentially a worsened condition relative to the desired operation for the industrial facility. Therefore, implementation of advanced control algorithms in a resilient control system also requires the implementation of diagnostic and prognostic architectures to recognize sensor degradation, as well as failures with industrial process equipment associated with the control algorithms.
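One very simple diagnostic of the kind this example calls for compares redundant sensors against their median and flags any that stray too far. The function name `flag_degraded`, the tolerance and the readings below are hypothetical, and median voting is only one of many possible techniques; it assumes that most of the redundant sensors are healthy.

```python
from statistics import median


def flag_degraded(readings, tol):
    """Return the names of sensors that deviate from the median by more than tol.

    readings: mapping of sensor name to its current value, all measuring
              the same quantity (redundancy is assumed).
    """
    reference = median(readings.values())
    return sorted(name for name, value in readings.items()
                  if abs(value - reference) > tol)


# Three hypothetical temperature sensors on the same process line.
print(flag_degraded({"T1": 300.1, "T2": 299.8, "T3": 312.4}, tol=2.0))  # ['T3']
```

A control algorithm could exclude flagged sensors from its inputs, or fall back to a more conservative mode, rather than act on degraded data. Slow drift that affects all sensors together would not be caught by this check and needs a model-based or prognostic approach.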
Resilient Control System Solutions and the Need for Interdisciplinary Education
In our world of advancing automation, our dependence upon these advancing technologies grows, as does the need for the skill sets that keep the United States at the forefront of innovation. The challenges may appear rooted in the design of better means to control our infrastructures for greater safety and efficiency in the generation and use of energy. However, the evolution of the technologies developed to achieve the current design of automation has created a complex environment in which a cyber-attack, human error in design or operation, or a damaging storm can wreak havoc on the infrastructure we depend on as a nation. The next generation of systems will need to consider the broader picture to ensure that, as a path forward, failures do not lead to ever greater catastrophic events. A critical resource is the students of tomorrow, who will be expected to advance these designs and who require both a perspective on the challenges and an appreciation of the contributions of others to fulfill the need. Addressing this need, courses have been developed at universities such as George Mason University and Northeastern to provide the perspectives and relevant examples that overview the issues and give the opportunity to create resilient solutions. The tie to critical infrastructure operations is an important aspect of these courses.
Through the development of technologies designed to set the stage for next-generation automation, it has become evident that effective teams comprise several disciplines. However, developing a level of effectiveness can be time consuming, and when done in a professional environment can expend a lot of energy and time that provides little obvious benefit to the desired outcome. It is clear that the earlier these STEM disciplines can be successfully integrated, the more effective they are at recognizing each other’s contributions and working together to achieve a common set of goals in the professional world. Team competition at venues such as Resilience Week will be a natural outcome of developing such an environment, allowing interdisciplinary participation and providing an exciting challenge to motivate students to pursue a STEM education.
Standardizing Resilience and Resilient Control System Principles
Standards and policy that define resilience nomenclature and metrics are needed to establish a value proposition for investment, which includes government, academia and industry. The IEEE Industrial Electronics Society has taken the lead in forming a technical committee toward this end. The purpose of this committee will be to establish metrics and standards associated with codifying promising technologies that promote resilience in automation. This effort is distinct from the supply chain community's focus on resilience and security, such as the efforts of ISO and NIST.
- This article incorporates public domain material from websites or documents of the United States Government. Rieger, C.G.; Gertman, D.I.; McQueen, M.A. (May 2009), Resilient Control Systems: Next Generation Design Research, Catania, Italy: 2nd IEEE Conference on Human System Interaction