In systems design, a fail-fast system is one which immediately reports at its interface any condition that is likely to indicate a failure. Fail-fast systems are usually designed to stop normal operation rather than attempt to continue a possibly flawed process. Such designs often check the system's state at several points in an operation, so any failures can be detected early. The responsibility of a fail-fast module is detecting errors, then letting the next-highest level of the system handle them.
Hardware and software
Fail-fast systems or modules are desirable in several circumstances:
- When building a fault-tolerant system by means of redundant components, the individual components should be fail-fast to give the system enough information to successfully tolerate a failure.
- Fail-fast components are often used in situations where failure in one component might not be visible until it leads to failure in another component.
- Finding the cause of a failure is easier in a fail-fast system, because the system reports the failure with as much information as possible as close to the time of failure as possible. In a fault-tolerant system, the failure might go undetected, whereas in a system that is neither fault-tolerant nor fail-fast the failure might be temporarily hidden until it causes some seemingly unrelated problem later.
- A fail-fast system that is designed to halt as well as report the error on failure is less likely to erroneously perform an irreversible or costly operation.
Developers also refer to code as fail-fast if it tries to fail as soon as possible at variable or object initialization. In object-oriented programming, a fail-fast-designed object initializes the internal state of the object in the constructor, launching an exception if something is wrong (rather than allowing non-initialized or partially initialized objects that will fail later due to a wrong "setter"). The object can then be made immutable if no more changes to the internal state are expected. In functions, fail-fast code will check input parameters in the precondition. In client-server architectures, fail-fast will check the client request just upon arrival, before processing or redirecting it to other internal components, returning an error if the request fails (incorrect parameters, ...). Fail-fast-designed code decreases the internal software entropy, and reduces debugging effort.
The term has been widely employed as a metaphor in business, dating back to at least 2001, meaning that businesses should undertake bold experiments to determine the long-term viability of a product or strategy, rather than proceeding cautiously and investing years in a doomed approach. It became adopted as a kind of "mantra" within startup culture.
- Crash-only software
- Design by contract
- Failing badly vs. failing well
- Fail-silent system
- Khanna, Rajat; Guler, Isin; Nerkar, Atul (2016-04-01). "Fail Often, Fail Big, and Fail Fast? Learning from Small Failures and R&D Performance in the Pharmaceutical Industry". Academy of Management Journal. 59 (2): 436–459. doi:10.5465/amj.2013.1109. ISSN 0001-4273.
- "Epic Fails of the Startup World". The New Yorker. Retrieved 2017-08-14.
- Gray, Jim. "Why Do Computers Stop And What Can Be Done About It?". CiteSeerX 10.1.1.110.9127, Cite journal requires
|journal=(help) introducing 'Fail Fast'
- "Fail Fast" Article by Jim Shore explaining using 'Fail Fast' concept in software development (from 'columns for IEEE software' edited by Martin Fowler)