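Failing badly and failing well are concepts in systems security and network security describing how a system reacts to failure. The terms have been popularized by Bruce Schneier, a cryptographer and security consultant.
A system that fails badly is one in which a single failure has catastrophic, system-wide consequences. Examples include: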
- Databases (such as credit card databases) protected only by a password. Once the password is compromised, all of the data can be accessed
- Buildings depending on a single column or truss, whose removal would cause a chain reaction collapse under normal loads
- Security checks which concentrate on establishing identity, not intent (thus allowing, for example, suicide attackers to pass)
- Internet access provided by a single service provider. If the provider's network fails, all Internet connectivity is lost
- Ring networks in which the failure of a single node or connection between nodes brings down the entire network
- Systems, including social ones, that rely on a single person, whose absence or permanent unavailability halts the entire system
- Brittle materials, such as "over-reinforced concrete", which fail suddenly and catastrophically, with no warning, when overloaded
- Keeping the only copy of data in one central place. That data is lost forever when that place is damaged, as happened in the 1836 U.S. Patent Office fire.
A system that fails well is one that compartmentalizes or contains failure. Examples include:
- Compartmentalized hulls in watercraft, ensuring that a hull breach in one compartment will not flood the entire vessel
- Databases that do not allow downloads of all data in one attempt, limiting the amount of compromised data
- Structurally redundant buildings designed to resist loads beyond those expected under normal circumstances, or to keep resisting loads when the structure is damaged
- Concrete structures, which show fractures long before breaking under load, thus giving early warning
- Armoured cockpit doors on airplanes, which confine a potential hijacker within the cabin even if they are able to bypass airport security checks
- Internet connectivity provided by more than one vendor or discrete path, known as multihoming
- Star or mesh networks, which can continue to operate when a node or connection has failed (though for a star network, failure of the central hub will still cause the network to fail)
- Ductile materials, such as "under-reinforced concrete", which fail gradually when overloaded: they yield and stretch, giving some warning before ultimate failure
- Making a backup copy of all important data and storing it in a separate place. The data can then be recovered from the other location if either place is damaged, as after the 1877 U.S. Patent Office fire.
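The database example above (refusing to hand over all data in one attempt) can be sketched in code. The following is a minimal illustration only, not a description of any real product: the class `CappedStore`, the exception `DownloadLimitExceeded`, and the per-session cap of 100 records are all hypothetical names and values chosen for the example, and an in-memory list stands in for the database.

```python
from dataclasses import dataclass, field


class DownloadLimitExceeded(Exception):
    """Raised when a session has already received its full quota of records."""


@dataclass
class CappedStore:
    # Hypothetical in-memory stand-in for a database table.
    records: list
    # Assumed cap: the most records any one session may ever receive.
    max_per_session: int = 100
    # Running count of records already served, keyed by session id.
    _served: dict = field(default_factory=dict)

    def fetch(self, session_id: str, offset: int, count: int) -> list:
        """Return up to `count` records, never exceeding the session's cap."""
        already = self._served.get(session_id, 0)
        remaining = self.max_per_session - already
        if remaining <= 0:
            raise DownloadLimitExceeded(session_id)
        # Serve only as many records as the remaining quota allows.
        batch = self.records[offset:offset + min(count, remaining)]
        self._served[session_id] = already + len(batch)
        return batch


store = CappedStore(records=list(range(1000)), max_per_session=100)
first = store.fetch("session-a", 0, 60)    # 60 records served
second = store.fetch("session-a", 60, 60)  # truncated to the 40 left in the quota
```

In this sketch a compromised session can expose at most `max_per_session` records rather than the whole table, which is the containment property the list describes. The cap applies per session only, so a real design would also have to limit how many sessions a single credential can open.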
Designing a system to 'fail well' has also been alleged to be a better use of limited security funds than the typical quest to eliminate every potential source of error and failure.