In computer science, a heartbeat is a periodic signal generated by hardware or software to indicate normal operation or to synchronize other parts of a computer system. A heartbeat message is usually sent between machines at a regular interval on the order of seconds. If the endpoint does not receive a heartbeat for a time, usually a few heartbeat intervals, the machine that should have sent the heartbeat is assumed to have failed. Heartbeat messages are typically sent continuously, on a periodic or recurring basis, from the originator's start-up until its shutdown. When the destination detects a lack of heartbeat messages during an anticipated arrival period, it may determine that the originator has failed, has shut down, or is otherwise no longer available. Heartbeat messages may be used for high-availability and fault-tolerance purposes.
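The timeout rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than any particular product's implementation: the class name `HeartbeatMonitor`, the parameters `interval` and `missed_allowed`, and the peer names are all assumptions made for the example. A peer is presumed failed once it has been silent for longer than `interval * missed_allowed` seconds.

```python
import time
from typing import Callable, Dict, List


class HeartbeatMonitor:
    """Hypothetical receiver-side tracker: remembers when each peer was last heard."""

    def __init__(self, interval: float = 1.0, missed_allowed: int = 3,
                 clock: Callable[[], float] = time.monotonic) -> None:
        # A peer is presumed failed after this many seconds of silence.
        self.timeout = interval * missed_allowed
        # The clock is injectable so the logic can be exercised without sleeping.
        self.clock = clock
        self.last_seen: Dict[str, float] = {}

    def record(self, peer: str) -> None:
        """Call whenever a heartbeat message arrives from `peer`."""
        self.last_seen[peer] = self.clock()

    def failed_peers(self) -> List[str]:
        """Return the peers whose silence has exceeded the timeout."""
        now = self.clock()
        return sorted(peer for peer, seen in self.last_seen.items()
                      if now - seen > self.timeout)


# Usage with a fake clock: node-a goes silent, node-b keeps sending.
fake_now = [0.0]
monitor = HeartbeatMonitor(interval=1.0, missed_allowed=3,
                           clock=lambda: fake_now[0])
monitor.record("node-a")
monitor.record("node-b")
fake_now[0] = 2.0
monitor.record("node-b")
fake_now[0] = 4.5
print(monitor.failed_peers())  # node-a silent for 4.5 s, node-b for 2.5 s
```

A monotonic clock is used by default so that adjustments to the wall-clock time cannot produce spurious timeouts.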
A heartbeat protocol is generally used to negotiate and monitor the availability of a resource, such as a floating IP address. Typically, when a heartbeat starts on a machine, it performs an election process with the other machines on the heartbeat network to determine which machine, if any, owns the resource. On heartbeat networks of more than two machines, it is important to take partitioning into account: two halves of the network may both be functioning but unable to communicate with each other. In such a situation, the resource must be owned by only one machine, not by one machine in each partition.
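One common way to guarantee a single owner across a partition is to require a strict majority (a quorum) of the cluster before any machine may claim the resource. The article does not prescribe this technique, so the sketch below is only an illustration under that assumption. The function names and the "lowest node name wins" tie-break are likewise hypothetical.

```python
from typing import Iterable, Optional


def has_quorum(reachable: Iterable[str], cluster_size: int) -> bool:
    """True if the reachable nodes (this node included) are a strict majority."""
    return len(set(reachable)) > cluster_size // 2


def elect_owner(reachable: Iterable[str], cluster_size: int) -> Optional[str]:
    """Pick the resource owner for one partition, or None if it lacks quorum.

    Assumed tie-break rule: the lexicographically smallest node name wins.
    """
    members = set(reachable)
    if not has_quorum(members, cluster_size):
        return None  # a minority partition must not claim the resource
    return min(members)


# A five-node cluster split into partitions of three and two nodes.
print(elect_owner({"a", "b", "c"}, cluster_size=5))  # majority side elects "a"
print(elect_owner({"d", "e"}, cluster_size=5))       # minority side gets None
```

Because at most one partition can hold a strict majority, at most one machine ends up owning the resource. The cost is that an even split, such as two nodes against two in a four-node cluster, leaves no owner at all.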
Because a heartbeat is intended to indicate the health of a machine, it is important that the heartbeat protocol and the transport it runs on are as reliable as possible. Causing a failover because of a false alarm may, depending on the resource, be highly undesirable. It is also important to react quickly to an actual failure, which again requires a reliable heartbeat. For this reason, it is often desirable to run the heartbeat over more than one transport, for instance an Ethernet segment using UDP/IP and a serial link.
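The benefit of redundant transports can be shown with a small sketch: the peer is declared failed only when every transport has gone quiet, so the loss of a single link does not trigger a failover. The function name `peer_alive`, the transport labels, and the timestamps are assumptions chosen for the example.

```python
from typing import Dict


def peer_alive(last_seen_by_transport: Dict[str, float],
               now: float, timeout: float) -> bool:
    """True if a heartbeat arrived recently on at least one transport."""
    return any(now - seen <= timeout
               for seen in last_seen_by_transport.values())


# The UDP link has been silent for 5 s, but the serial link delivered a
# heartbeat 1 s ago, so the peer is still considered healthy.
last_seen = {"udp": 10.0, "serial": 14.0}
print(peer_alive(last_seen, now=15.0, timeout=3.0))  # serial is fresh
print(peer_alive(last_seen, now=20.0, timeout=3.0))  # both links are stale
```

A real deployment would probably also report the stale transport to an operator, since running on a single remaining link removes the redundancy.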
- Watchdog timer, electronic timer that is used to detect and recover from computer malfunctions
- Keepalive, a common generalization of this feature in various protocols
- Heartbleed vulnerability