Crash-only software

Crash-only software refers to computer programs that handle failures by simply restarting, without attempting any sophisticated recovery.[1] Correctly written components of crash-only software can microreboot to a known-good state without user intervention. Because failure handling and normal startup use the same code path, bugs in the failure-handling code are more likely to be noticed. The exception is leftover artifacts, such as data corruption from a severe failure, which do not occur during a normal startup and so are not exercised by it.
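The idea that recovery and normal startup share one code path can be illustrated with a minimal Python sketch. The class name JournaledCounter, the one-JSON-record-per-line journal format, and the file layout are all assumptions made for illustration; they do not come from the cited paper. The component has no shutdown method: every start replays the journal, discards a torn final record if one exists, and continues.

```python
import json
import os


class JournaledCounter:
    """Hypothetical crash-only component: startup and recovery are the same code."""

    def __init__(self, journal_path):
        self.journal_path = journal_path
        # There is no separate "clean start" path; every start is a recovery.
        self.value = self._recover()

    def _recover(self):
        total = 0
        good_end = 0  # byte offset just past the last complete record
        if not os.path.exists(self.journal_path):
            return total
        with open(self.journal_path, "rb") as f:
            for raw in f:
                if not raw.endswith(b"\n"):
                    break  # torn final record left by a crash mid-write
                try:
                    delta = json.loads(raw)["delta"]
                except (ValueError, KeyError, TypeError):
                    break  # unreadable record: stop at the last good one
                total += delta
                good_end += len(raw)
        # Drop anything after the last good record so later appends stay valid.
        with open(self.journal_path, "r+b") as f:
            f.truncate(good_end)
        return total

    def add(self, delta):
        # Make the record durable first, then update the in-memory state.
        with open(self.journal_path, "ab") as f:
            f.write(json.dumps({"delta": delta}).encode("utf-8") + b"\n")
            f.flush()
            os.fsync(f.fileno())
        self.value += delta
```

Because a crash is the only way the component ever stops, the replay logic runs on every start, which is what gives it regular exercise. This sketch handles only a torn trailing record; it does not address the more severe corruption mentioned above.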

A familiar example of crash-only behavior is unplugging a computer and plugging it back in. Any data being written at that moment may be corrupted, and unsaved data and settings held in RAM are lost. However, if the user waits until the computer is idle (no data being written), has saved all the data they need, and has not changed any operating system settings they want to keep, then unplugging the computer is faster than shutting it down.

Crash-only software also has benefits for end users. Many applications do not save their data and settings while running, only when they exit. For example, word processors usually save settings when they are closed. A crash-only application is designed to save each changed user setting soon after it changes, so that the persistent state matches the state of the running application. However the application terminates (a clean close or the sudden failure of a laptop battery), the state persists.
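One way to save a setting as soon as it changes is sketched below in Python. The Settings class, its JSON file format, and its method names are hypothetical and chosen only for illustration. Each change is written to a temporary file, flushed to disk, and then renamed over the real file, so a crash at any point leaves either the old settings or the new ones on disk, never a half-written file.

```python
import json
import os
import tempfile


class Settings:
    """Hypothetical settings store that persists every change immediately."""

    def __init__(self, path):
        self.path = path
        try:
            with open(path, "r", encoding="utf-8") as f:
                self._data = json.load(f)
        except FileNotFoundError:
            self._data = {}  # first run: nothing saved yet

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value):
        self._data[key] = value
        self._save()  # persist now rather than at exit

    def _save(self):
        directory = os.path.dirname(os.path.abspath(self.path))
        # Write to a temporary file in the same directory as the target.
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as f:
                json.dump(self._data, f)
                f.flush()
                os.fsync(f.fileno())
            # Swap the complete new file into place over the old one.
            os.replace(tmp_path, self.path)
        except BaseException:
            if os.path.exists(tmp_path):
                os.unlink(tmp_path)
            raise
```

With this design there is nothing left to do at exit, so a clean close and a sudden power loss leave the same settings on disk. The cost is one small disk write per change, which may matter for settings that change very frequently.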

Erlang

Erlang is a programming language originally built by Ericsson for fault-tolerant telephone switches. Programs are structured as modules that can be replaced (hot swapped) without restarting the entire program. If a module crashes or needs to be updated, it can be restarted or replaced without affecting any other part of the program. The Open Telecom Platform, which is often used together with Erlang, provides frameworks that simplify and automate this task.
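The restart-and-replace behavior described above can be approximated in Python as a rough analogy. This sketch is not Erlang and does not reproduce any Open Telecom Platform API; the Supervisor, Averager, and Tally names are hypothetical. A supervisor keeps a factory for each component. When a component raises an exception, the supervisor makes no attempt to repair it and simply builds a fresh instance, leaving the other components untouched.

```python
class Supervisor:
    """Hypothetical supervisor: restarts a failed component from its factory."""

    def __init__(self):
        self._factories = {}
        self._workers = {}
        self.restarts = {}

    def add(self, name, factory):
        self._factories[name] = factory
        self._workers[name] = factory()
        self.restarts[name] = 0

    def replace(self, name, factory):
        # Swap in a new implementation without touching other components.
        self._factories[name] = factory
        self._workers[name] = factory()

    def call(self, name, *args):
        try:
            return self._workers[name](*args)
        except Exception:
            # No repair attempt: discard the instance and start a fresh one.
            self._workers[name] = self._factories[name]()
            self.restarts[name] += 1
            return None


class Averager:
    """Example component that fails on non-numeric input."""

    def __init__(self):
        self.values = []

    def __call__(self, x):
        self.values.append(float(x))  # raises ValueError on bad input
        return sum(self.values) / len(self.values)


class Tally:
    """Example component that counts how often it is called."""

    def __init__(self):
        self.count = 0

    def __call__(self):
        self.count += 1
        return self.count
```

For example, after adding both components with sup.add("avg", Averager) and sup.add("tally", Tally), the call sup.call("avg", "oops") restarts only the averager with an empty history, while the tally keeps its count. Erlang processes are isolated far more strongly than Python objects that share one interpreter, so this shows only the control flow, not the fault isolation.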

References

  1. Candea, George; Fox, Armando (May 2003). "Crash-Only Software". 9th Workshop on Hot Topics in Operating Systems. Lihue, Hawaii, USA.
