Backup rotation scheme
A backup rotation scheme is a system for backing up data to computer media (such as tapes) that minimizes, through re-use, the number of media required. The scheme determines how and when each piece of removable storage is used for a backup job and how long it is retained once it holds backup data. Different techniques have evolved over time to balance data retention and restoration needs against the cost of extra storage media. Such a scheme can become quite complicated if it takes incremental backups, multiple retention periods, and off-site storage into consideration.
First In, First Out
A First In, First Out (FIFO) backup scheme saves new or modified files onto the "oldest" media in the set, i.e. the media that contain the oldest and thus least useful previously backed-up data. With a daily backup onto a set of 14 media, the backup depth is 14 days: each day, the oldest medium is inserted for that day's backup. This is the simplest rotation scheme and is usually the first to come to mind.
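The FIFO rule above can be sketched in a few lines of Python. This is only an illustration: the function names, the 0-based medium index, and the 1-based day count are assumptions made for the example, not part of any particular backup product.

```python
def fifo_medium(day, n_media=14):
    """Return the 0-based index of the medium to overwrite on a 1-based day."""
    if day < 1 or n_media < 1:
        raise ValueError("day and n_media must be positive")
    # The media are reused in a fixed order, so the oldest one is
    # always the next in line.
    return (day - 1) % n_media


def retained_days(day, n_media=14):
    """List the days whose backups still exist after the backup on `day`."""
    oldest = max(1, day - n_media + 1)
    return list(range(oldest, day + 1))
```

With 14 media, `fifo_medium(15)` selects the same medium as `fifo_medium(1)`, and `retained_days(20)` covers days 7 through 20, which is the 14-day backup depth described above.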
This scheme has the advantage that it retains the longest possible tail of daily backups. It can be used when archived data is unimportant (or is retained separately from the short-term backup data) and data before the rotation period is irrelevant.
However, this scheme carries a risk of data loss. Suppose an error is introduced into the data but is not identified until several generations of backups and revisions have taken place. If the error goes undetected for longer than the rotation period, every backup in the set contains the error by the time it is found. It would then be useful to have at least one older version of the data, made before the error was introduced.
Grandfather-father-son
Grandfather-father-son backup refers to a common rotation scheme for backup media. In this scheme there are three or more backup cycles, such as daily, weekly and monthly. The daily backups are rotated on a daily basis using a FIFO system as above. The weekly backups are similarly rotated on a weekly basis, and the monthly backups on a monthly basis. In addition, quarterly, half-yearly, and/or annual backups could also be separately retained. Often some of these backups are removed from the site for safekeeping and disaster recovery purposes.
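As a rough illustration of how a date might be assigned to one of the three cycles, the sketch below classifies a calendar date as a daily, weekly, or monthly backup. The specific rule is an assumption made for this example (monthly on the first day of the month, weekly on Sundays); the scheme itself does not prescribe which days are used, and `gfs_tier` is a hypothetical name.

```python
from datetime import date


def gfs_tier(d, weekly_weekday=6):
    """Classify a date as a 'monthly', 'weekly', or 'daily' backup.

    Assumed rule for this sketch: the 1st of a month is a monthly
    (grandfather) backup, the chosen weekday (6 = Sunday in
    date.weekday() numbering) is a weekly (father) backup, and every
    other day is a daily (son) backup.
    """
    if d.day == 1:
        return "monthly"
    if d.weekday() == weekly_weekday:
        return "weekly"
    return "daily"


# Example: classify the first ten days of January 2024.
tiers = [gfs_tier(date(2024, 1, day)) for day in range(1, 11)]
```

Each tier would then rotate its own set of media in FIFO order, so the number of media per tier determines how far back that tier reaches.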
Tower of Hanoi
The Tower of Hanoi rotation method is more complex. It is based on the mathematics of the Tower of Hanoi puzzle, using a recursive method to optimize the backup cycle. Every tape corresponds to a disk in the puzzle, and every movement of a disk to a different peg corresponds to a backup to that tape. So the first tape is used every other day (1, 3, 5, 7, 9, ...), the second tape every fourth day (2, 6, 10, ...), and the third tape every eighth day (4, 12, 20, ...).
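The pattern of days listed above matches the number of trailing zero bits in the binary form of the day number: odd days have none, days 2, 6, 10 have one, days 4, 12, 20 have two, and so on. A minimal sketch under that observation follows. The function names and the letter labels are assumptions for the example, and the last tape is assumed to take every day that would otherwise fall to a tape beyond the set.

```python
def hanoi_tape(day, n_tapes):
    """Return the 0-based tape index used on a 1-based day."""
    if day < 1 or n_tapes < 1:
        raise ValueError("day and n_tapes must be positive")
    # (day & -day) isolates the lowest set bit; its position is the
    # number of trailing zero bits in the day number.
    trailing_zeros = (day & -day).bit_length() - 1
    # Days that would need a tape beyond the set fall to the last tape.
    return min(trailing_zeros, n_tapes - 1)


def hanoi_schedule(days, n_tapes):
    """Return the tape letters used on days 1..days as one string."""
    labels = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    return "".join(labels[hanoi_tape(d, n_tapes)] for d in range(1, days + 1))
```

For example, `hanoi_schedule(8, 3)` yields `ABACABAC`, and `hanoi_schedule(16, 4)` yields `ABACABADABACABAD`.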
A set of n tapes (or other media) will allow backups for 2^(n-1) days before the last set is recycled. So, three tapes will give four days' worth of backups and on the fifth day Set C will be overwritten; four tapes will give eight days, and Set D is overwritten on the ninth day; five tapes will give 16 days, etc. Files can be restored from 1, 2, 4, 8, 16, ..., 2^(n-1) days ago.
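The restore ages can be checked with a short simulation. The sketch below assumes the trailing-zero-bits rule for choosing a tape, with the last tape taking all remaining days, and reports the age of the backup held on each tape just before a given day's backup runs. The names are hypothetical, and the neat powers-of-two spread appears on days that are multiples of 2^(n-1); on other days the ages differ.

```python
def tape_index(day, n_tapes):
    """0-based tape for a 1-based day (trailing zero bits, capped at last tape)."""
    return min((day & -day).bit_length() - 1, n_tapes - 1)


def backup_ages(today, n_tapes):
    """Ages in days of the backups on each tape just before today's backup."""
    last_used = {}
    for day in range(1, today):
        # Later days overwrite earlier entries, as a reused tape would.
        last_used[tape_index(day, n_tapes)] = day
    return sorted(today - day for day in last_used.values())
```

For four tapes, `backup_ages(16, 4)` gives `[1, 2, 4, 8]`, and for five tapes `backup_ages(32, 5)` gives `[1, 2, 4, 8, 16]`.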
The following tables show which tapes are used on which days of various cycles. A disadvantage of the method is that half the backups are overwritten after only two days.
Three-tape Hanoi schedule
|Day of the cycle|1|2|3|4|5|6|7|8|
|Tape|A|B|A|C|A|B|A|C|
Four-tape Hanoi schedule
|Day of the cycle|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|
|Tape|A|B|A|C|A|B|A|D|A|B|A|C|A|B|A|D|
Five-tape Hanoi schedule
|Day of the cycle|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|
|Tape|A|B|A|C|A|B|A|D|A|B|A|C|A|B|A|E|
Weighted random approach
An alternative approach to keeping generations distributed across all points in time is to delete (or overwrite) past generations, except the oldest and the n most recent, in a weighted-random fashion whenever space is needed. For each deletion, the weight assigned to each deletable generation is proportional to its probability of being deleted. One acceptable weight is the multiplicative inverse of the time (for example, in days) between a generation and the generation available before it, raised to a constant exponent (possibly 2).
Using a larger exponent leads to a more uniform distribution of generations, whereas a smaller exponent leads to a distribution with more recent and fewer older generations. This technique makes it likely, though not certain, that past generations remain distributed across all points in time as desired.
However, this approach has no advantage over a more systematic approach.
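The weighted-random selection can be sketched as follows. Everything here is an illustrative assumption rather than a reference implementation: generations are represented by strictly increasing day numbers, the function name and parameters are hypothetical, and the weight is the inverse of the gap to the previous generation raised to `exponent`.

```python
import random


def pick_generation_to_delete(days, keep_recent=3, exponent=2.0, rng=random):
    """Return the index in `days` of the generation to delete, or None.

    `days` must be strictly increasing day numbers of existing generations.
    The oldest generation (index 0) and the `keep_recent` newest ones are
    never candidates.
    """
    candidates = list(range(1, len(days) - keep_recent))
    if not candidates:
        return None
    # A generation close to its predecessor gets a large weight, so
    # tightly clustered generations are thinned out first.
    weights = [(1.0 / (days[i] - days[i - 1])) ** exponent for i in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]
```

With `days = [1, 2, 10, 20, 21, 22, 23]` and the defaults, the candidates are the generations from days 2, 10, and 20. The generation from day 2 sits only one day after its predecessor, so it carries almost all of the weight and is by far the most likely to be chosen.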
Incremented media method
This method has many variations and names. A set of numbered media is used until the end of the cycle. Then the cycle is repeated using media numbered the same as in the previous cycle, but incremented by one. The lowest numbered tape from the previous cycle is retired and kept permanently. For example, if one cycle uses media 1 to 5, the next cycle uses media 2 to 6 and medium 1 is retired. Thus, one has access to every backup for one cycle, and one backup per cycle before that. This method has the advantage of ensuring even media wear, but requires a schedule to be precalculated.
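The numbering rule can be precalculated with a small helper. This sketch assumes media are numbered from 1 and cycles are counted from 0; both function names are hypothetical.

```python
def media_for_cycle(cycle, set_size):
    """Return the media numbers used during a 0-based cycle."""
    if cycle < 0 or set_size < 1:
        raise ValueError("cycle must be >= 0 and set_size must be >= 1")
    first = cycle + 1  # each new cycle starts one number higher
    return list(range(first, first + set_size))


def retired_media(cycle):
    """Return the media already retired when a 0-based cycle begins."""
    # One medium, the lowest numbered of the previous cycle, is retired
    # at the end of every completed cycle.
    return list(range(1, cycle + 1))
```

With a set of five media, `media_for_cycle(0, 5)` is `[1, 2, 3, 4, 5]`, `media_for_cycle(1, 5)` is `[2, 3, 4, 5, 6]`, and `retired_media(2)` is `[1, 2]`.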