|This is the talk page for discussing improvements to the Backup article.|
|Backup was a good articles nominee, but did not meet the good article criteria at the time. There are suggestions below for improving the article. Once these issues have been addressed, the article can be renominated. Editors may also seek a reassessment of the decision if they believe there was a mistake.|
|WikiProject Computing / Software / Hardware||(Rated C-class, Mid-importance)|
- 1 Old comments
- 2 Paris bank fire
- 3 Verb versus Noun
- 4 Differential backup and Types of backup changes
- 5 Formats
- 6 Incrementals
- 7 Storage Media
- 8 Link spam for Veritas NetBackup
- 9 Grammar
- 10 Failed GA
- 11 Too much like a list?
- 12 A few points based on experience..
- 13 backup window
- 14 truncating the introduction as suggested
- 15 Binary differential method
- 16 manipulation
- 17 References and citations
- 18 Measuring the process subsection
- 19 where is discussion of archival issues like read-only retention time?
- 20 HDD stability unknown ?
- 21 Backup or Archive or both!
- 22 What's with the "Law" section?
- 23 A different formatting on title? so ease the reading on Windows Phone or smart phone
Old comments
There is a new article at Data backup, which may need to be merged into this one. I left a note on the author's talk page, and recommended either merging it here or changing its focus enough to make a standalone article. 126.96.36.199 09:42, 18 Dec 2004 (UTC)
"A backup should not use compression. Compression reduces data redundancy. Redundancy might be useful when restoring data from damaged media." Does this apply to CD-R–based backups? —Wins oddf
- I've just rephrased that to be a bit more straightforward, but yes, regardless of media, it still applies. If you scratch the CD and the data on it is uncompressed, you'll lose part of a file or so. If you scratch it and the data on it is compressed you may lose the entire compressed archive. — mendel ☎ 20:43, 14 September 2005 (UTC)
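The point in the reply above can be illustrated with a minimal sketch. It assumes a single inverted byte stands in for a scratch and uses Python's `zlib` as a stand-in for whatever compression a backup tool applies; the function names are hypothetical. Real archive formats differ, and some add recovery records that change the outcome.

```python
import zlib


def flip_byte(data: bytes, index: int) -> bytes:
    """Return a copy of data with one byte inverted (a simulated scratch)."""
    damaged = bytearray(data)
    damaged[index] ^= 0xFF
    return bytes(damaged)


def bytes_lost_uncompressed(original: bytes, index: int) -> int:
    """Count how many bytes differ after damaging the raw data at index."""
    damaged = flip_byte(original, index)
    return sum(a != b for a, b in zip(original, damaged))


def compressed_survives(original: bytes, index: int) -> bool:
    """True only if a damaged zlib stream still yields the original data."""
    packed = zlib.compress(original)
    damaged = flip_byte(packed, index % len(packed))
    try:
        return zlib.decompress(damaged) == original
    except zlib.error:
        # The stream could not be decoded at all: everything is lost.
        return False
```

With uncompressed data the damage stays local to the byte that was hit, whereas a damaged compressed stream generally fails to decode as a whole.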
Paris bank fire
Which is the bank mentioned in the Backup article : "A few years earlier (to 2001), during a fire at the headquarters of a major bank in Paris, system administrators ran into the burning building to rescue backup tapes because they didn't have offsite copies." ? Jay 14:01, 22 September 2005 (UTC)
- In the last decade there have been two Paris bank fires. The Credit Lyonnais headquarters in 1996. And a fire at Banque de France in 1999, which I don't think anyone cared about. I can't verify the above anecdote. lots of issues | leave me a message 05:05, 23 September 2005 (UTC)
- Confirmed that the headquarters of the Lyonnais burnt on May 5, 1996, apparently arson. It is rumoured that the disappearance of archives was intentional (the Lyonnais was, at the time, caught up in major scandals). The 1996 fire seems to be cited as an example of what should not be done on sites promoting data backups. David.Monniaux 16:18, 26 September 2005 (UTC)
- Yes, Credit Lyonnais seems to be the one since it gave many Google hits on "fire" and "Credit Lyonnais". Got a case study for data backup using this fire incident as an example. Jay 07:55, 23 September 2005 (UTC)
The source that is referenced to support the claim about IT admins running into the burning building to rescue the tapes is unreliable. It looks like something copied from some kind of usenet group, and the author of that source even admits that everything in his post is unconfirmed. The citation I'm referring to is number 16 in the endnotes. Schmungles (talk) 01:18, 9 September 2009 (UTC)
Verb versus Noun
I have often seen the verb form written as two words "to back up the system", whereas the noun is always one. Not sure if this is worth mentioning in the article.
188.8.131.52 16:10, 25 December 2005 (UTC)
Differential backup and Types of backup changes
A few changes I just made to the article. First, the references to the "archive bit" in the "Types of backup" section have been removed: the archive bit is a system-specific feature of WinNT systems and is not relevant to backups in general (besides, most modern backup software tends not to use it, keeping its own record of what has been backed up). Second, I have limited the description of restoring from differential backups to the simple case (full backup + differential backup). There had been mentions of incrementals as well, which had to be dropped for the purposes of exposition: someone not familiar with the terms would be confused as to whether incrementals taken before the differential are required, whereas the author (I assume) meant any incrementals newer than the differential. For clarity, keep it simple; then the reader can grasp the concepts, and the more complex cases will suggest themselves without clouding the descriptions. -- unsigned comment made by Special:Contributions/184.108.40.206 at 23:19, 26 August 2006
Formats
"A backup should rely on standard, well-established formats."
- Such as?... — Omegatron 00:50, 29 January 2006 (UTC)
On Unix and most NAS: tar, gtar, cpio, dump. On NetWare: SIDF. On Windows: MTF (Microsoft Tape Format). Vendor-specific: OTF ("Open" Tape Format; depends on your definition of open, cf. the EMC Networker man pages), NetBackup multiplexed gtar, various vendor-specific implementations of MTF, and whatever TSM uses.
That's the great thing about standards: there are so many to choose from.
--Sharkspear 00:37, 4 January 2007 (UTC)
Incrementals
This article is wrong. The discussion of differential versus incremental needs to be corrected. It is fuzzy and misleading as it is now. I suspect the author doesn't really understand the concepts properly. Some examples of backup/restore strategies using "1) full backup + incremental" and "2) full backup + differential", bringing out the pros and cons of each, are also necessary. 220.127.116.11 09:47, 25 December 2006 (UTC)
- Which part of the article is wrong? There is no explicit discussion of differential versus incremental. The only mention of differential backups is in the Glossary section and each of those entries looks good to me. I suggest putting the details of incremental vs. differential in the Incremental backup article. My opinion is that differentials are rarely used and that writing a lot about them in the general backup article would only serve to obfuscate the larger issues. -- Austin Murphy 15:46, 27 December 2006 (UTC)
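The two strategies asked about in this thread can be compared with a small sketch. It assumes the usual definitions (a differential holds everything changed since the last full backup; an incremental holds everything changed since the most recent backup of any kind); the `Backup` type and `restore_chain` function are hypothetical names, not taken from any backup product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backup:
    day: int
    kind: str  # "full", "incremental", or "differential"


def restore_chain(history: list[Backup]) -> list[Backup]:
    """Return the backups needed to restore the latest state, oldest first.

    Raises ValueError if the history contains no full backup.
    """
    ordered = sorted(history, key=lambda b: b.day)
    # Every restore starts from the most recent full backup.
    last_full = max(i for i, b in enumerate(ordered) if b.kind == "full")
    chain = [ordered[last_full]]
    later = ordered[last_full + 1:]
    # The newest differential replaces everything between the full and itself.
    diff_positions = [i for i, b in enumerate(later) if b.kind == "differential"]
    if diff_positions:
        newest = diff_positions[-1]
        chain.append(later[newest])
        later = later[newest + 1:]
    # Only incrementals newer than that point are still required.
    chain.extend(b for b in later if b.kind == "incremental")
    return chain
```

Under these assumptions a week of daily incrementals needs every backup in the chain to restore, while a week of daily differentials needs only the full backup and the newest differential. The trade-off is that each differential grows larger as the week goes on.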
Storage Media
With regard to the Optical Media section, is "(This is equivalent to 12,000 images or 200,000 pages of text.)" really true? It depends greatly on formats, quality, compression etc.
Also I was considering changing the format of this section to an "Advantages/Disadvantages" format. Ozstrike 01:34, 9 January 2007 (UTC)
- I agree that the 12,000/200,000 comment is basically baloney and suggest removing it. As to changing it the whole section, I would mostly be interested in including the characteristic features of each medium rather than turning it into a face-off of sorts. Feel free to contribute! -- Austin Murphy 01:57, 11 January 2007 (UTC)
Link spam for Veritas NetBackup
I see a lot of proprietary language on this article based on Veritas NetBackup. Is this a really, really hot product or does this reek as much as I think? Marc W. Abel 16:44, 19 April 2007 (UTC)
- Hi Marc, I just delinked NetBackup from two of the glossary entries. The point in mentioning NetBackup by name 5 times in the Glossary is to make sense of some of the unique terms used for different functions. NetBackup is one of the Big Three commercial unix backup packages. The other two are Tivoli and Legato. BackupExec is pretty big on Windows. I don't know the terminology for them so I didn't include it. If you have other terms that could be added, please do. -- Austin Murphy 18:33, 19 April 2007 (UTC)
Grammar
The grammar in the 'backup' article appears to be erroneous in a few places. Take for example the opening line: "backup refers to the copying of data so that these additional copies may be restored after a data loss event." I have issues, particularly with the text "restored after a data loss event". From what I understand, "data loss" is the result of failure (software/hardware, electrical infrastructure anomaly, fire, etc), while the "event" refers to the instance in which the said data loss occurred. I suggest rephrasing the text like so: "backup refers to the copying of data onto supplementary media and facilitate data restoration in the event of a failure leading to data loss." (18.104.22.168 01:37, 12 May 2007 (UTC))
- I agree the language was a bit clumsy, so I've updated it. However, I don't think "supplementary media" really makes any more sense than the previous wording. Check out the data loss page. There is more to data loss than failures. Time plays a crucial role in how data is handled. I think the phrase "data loss event" accurately conveys this. -- Austin Murphy 15:21, 14 May 2007 (UTC)
Failed GA
This article has failed its GA nomination because it is written like a list and uses jargon in various places. If you disagree with this decision, feel free to take it to WP:GA/R. Tarrettalk 20:53, 10 September 2007 (UTC)
Too much like a list?
The GA criticism seems to indicate that a way to improve the article would be to make it less like a list. Wikipedia:Embedded list and Wikipedia:Lists have a bit of official info on the subject. I think they generally support the way the article is laid out. Still, it may work better if there was more prose. Comments? -- Austin Murphy 15:02, 22 October 2007 (UTC)
A few points based on experience..
I often ask myself why many (if not most) users fail to maintain any sort of backups, and the main reason is most likely the massive level of complexity of the commercially-available backup systems. Complexity which for the most part is totally unneeded. A key example is the 'media pooling' regime of Windows servers, which through its complexity and troublesomeness is a very frequent cause of backup failure.
It would be beneficial to explain in straightforward terms WHY rotational backups are needed; most users do not grasp the fact that repeatedly using the same media only protects against losses which are immediately noticed. A few pictorial examples of rotation might be helpful in explaining the principle.
Another point worth touching on is that many proffered backup 'solutions' are OK for backing-up documents, but woefully inadequate when it comes to a hard-disk failure, in that they are incapable of fully restoring the OS or system-partition from a backup.
The point about standard formats is a good one, and to this I would add that a backup is of little use for disaster-recovery unless the format in which it was made, and the disk-partitions it represents, are documented. As is a copy of the backup software itself, especially if this is proprietary.
Perhaps the point about verification could be made more strongly, in that many backup systems are notorious for failing to notify the operator that they have started to 'write blanks' and will continue to do so indefinitely unless a periodic manual check is made that the backup actually contains data.
Finally, it might be relevant to mention that since backup processes typically run under a specific user account (e.g. root or Administrator), a frequent pitfall is that of forcing a change of this account's password as part of a security policy, and thereby knocking out the backup. Since this typically also knocks out any error-notification process, the fact may go unnoticed until a data loss occurs. --Anteaus (talk) 11:00, 6 December 2007 (UTC)
- Hi Anteaus, I'm not exactly sure what you're getting at here, but you are welcome to make the edits yourself. Wikipedia's guideline on this is called "Be BOLD!" If you would like to "test-drive" some edits, you can leave them here for some feedback. Also, consider that this is an encyclopedia, not a how-to manual, and it is directed toward a general audience. Deep levels of detail are welcome, but they must fit into the context. -- Austin Murphy (talk) 16:44, 6 December 2007 (UTC)
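The point raised in this thread about why rotation is needed can be sketched as a small simulation. The model is an assumption made for illustration: one backup runs at the end of each day, media are reused round-robin, the damage happens during day `corrupted_on`, and it is noticed during day `noticed_on` before that day's backup runs. The function names are hypothetical.

```python
def surviving_backup_days(last_backup_day: int, media_count: int) -> list[int]:
    """Days whose backups still exist after the backup on last_backup_day."""
    slots: dict[int, int] = {}
    for day in range(last_backup_day + 1):
        # Each medium is overwritten when its turn comes round again.
        slots[day % media_count] = day
    return sorted(slots.values())


def can_recover(corrupted_on: int, noticed_on: int, media_count: int) -> bool:
    """True if some surviving backup was taken before the damage occurred."""
    held = surviving_backup_days(noticed_on - 1, media_count)
    return any(day < corrupted_on for day in held)
```

In this model a single reused medium recovers the file only if the damage is noticed on the day it happens. Seven rotating media extend that window to about a week, after which every surviving copy already contains the damaged file.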
backup window
The backup window is not necessarily the same as doing a cold backup of a database or application. Fuzzy backups are a risk when doing hot backups or open file backups improperly. Cold backups require a strict backup window, but the term backup window is more broad than just that. -- Austin Murphy (talk) 19:08, 12 April 2008 (UTC)
- Well, I have different understanding of the term, but I won't argue. But could you at least mention fuzzy backup somewhere in the article? Do you feel this is non-issue? --Kubanczyk (talk) 07:43, 14 April 2008 (UTC)
- Sorry, I got distracted after that edit and forgot to move fuzzy backup to where I thought it fit. Open file backup is an important topic. I'm thinking of starting a new page to better describe the process for different types of data and the problem of getting fuzzy backups. -- Austin Murphy (talk) 18:14, 15 April 2008 (UTC)
truncating the introduction as suggested
In information technology, backups are typically made to avoid loss by creating copies that can be used to restore original data after disasters (called disaster recovery), accidents or a corrupted disc. Backup storage devices have evolved to concentrate on geographic redundancy, data security, and portability. Techniques have been developed for handling, for example ... open files; live data sources; compression, encryption and de-duplication. Procedures are still evolving. Backups and backup systems differ from archives and archival systems in the sense that archives are the primary copy of data, typically kept as a historical reference and for future use, and backups are a secondary copy to guarantee replacement in case of loss. Backup systems differ from fault-tolerant systems in the sense that backup systems assume that a fault will cause a data loss event and fault-tolerant systems assume a fault will not.
- Hi 22.214.171.124, WP:LEAD suggests that a long article like this one should have 3 or 4 paragraphs in the lead section. It also suggests that "The lead should be able to stand alone as a concise overview of the article." I think that the existing lead section meets these goals. I'm not sure that your suggested text does that. -- Austin Murphy (talk) 14:30, 26 August 2008 (UTC)
Binary differential method
Hi Mike A Quinn, Here's why I reverted some of your recent changes.
De-dupe is performed long after the backup software decides what data should be backed up. Like compression and encryption, it is just an alternate way to represent the same data on the storage media. I think that fits better into the manipulation section.
Staging is a little more complicated because in one sense, it is a temporary storage spot for data and in another sense it is the combination of on-line and near-line media management methods. I've added some text to this effect to the managing the data repository section. I think holding a temporary copy of data qualifies as manipulation. The individual backup datasets don't get modified in staging, but the way they are clustered on the final destination media can be changed dramatically.
Thanks for opening the talk topic, I agree with your comments to a certain extent, however the wiki article is entitled 'backup' and not 'backup software'. A general concern I have with the backup article is the manipulation section. During backup ALL data is manipulated to a certain extent as it flows from one medium to another however I think it is acceptable to keep encryption, compression, duplication and refactoring as these tasks actually transform (manipulate) the data being backed up into a different or separate form.
Deduplication does not rely on 'backup software' to perform a backup, although the process of deduplicating backup data can be performed in association with backup software. I do agree that deduplication manipulates data to a certain extent, as it leaves a pointer within the data set that points to the location of the unique file. However, I would suggest that leaving a pointer manipulates data to about the same extent as the OS manipulates data when it changes the file status after a backup has been performed. Although deduplication is an alternate way to represent the data, it selects data to store, and that selected data (although most deduplication technologies are proprietary) is still in essence the same data as resides on the primary storage device and could in theory be directly accessible by the OS. To this extent backup software manipulates data to a far greater extent, as generally the data is not directly accessible by the OS once backed up, and so backup software might need to reside under the manipulation heading too.
Data is staged almost constantly throughout the backup process as it flows from one component to another, and so all backup data could be regarded as being manipulated. There is however a section in this article that describes data repositories, and when data is staged it is staged in a data repository awaiting its final destination, just like water is staged in a reservoir. You are correct that data is not manipulated on the staged media, but it is arranged and clustered more efficiently on the storage media. Arranging and clustering data more efficiently on backup media is what backup does; all selection options of a backup assist in clustering and arranging data to store it more efficiently. If staging is a manipulation of data, then differential and incremental backup selections should be considered manipulation too.
I do not intend to revert the article page again at this point in time, but would value further input from yourself or any other interested party, and maybe we could come up with sections in the article that include and cover our POVs.
- Hi Mike, I appreciate your comments and interest in the subject. Let me try to explain what I was thinking when I organized this article. In section 1 I tried to have a high-level view of all the different kinds of backup architectures. In section 2 I tried to describe all the different techniques that are necessary or useful for backups. Section 3 was supposed to cover all the planning and policy issues and section 4 is stuff that seems related but I can't really fit in somewhere else. I had a hard time thinking of proper names for the sections and subsections and I'm pretty sure they can be improved. I'm sure there are many other ways to improve the article too and I welcome your input in such.
- I don't follow your argument about dedupe. Dedupe can dramatically change the storage requirements for backups. In this context, it is just a fancy way to compress data that is especially useful for backups. I think replacing a large dataset with a bunch of pointers is a pretty significant type of manipulation. Most commercial dedupe products are either virtual tape style or a NAS box of some type. These kinds of products are generally implemented as black boxes and have absolutely no say over what data gets copied from the computers getting backed up. They just accept the data that is sent to them and replace most of it with pointers. Some software like Rsync or BackupPC can do tricks with hard links that might cross over into the selection field, but I think that it still makes sense to separate the two concepts of selection and manipulation.
- Staging is not quite as clear to me. D2D2T is/was a big buzzword and my thinking was that it ought to be covered somewhere. As you point out though, staging is not strictly a data transformation technique. Neither is it a unique architecture. It is more of an optimization technique that is employed to make the overall scheme more effective. The same holds true for duplication, refactoring, and multiplexing. I've updated the section name to be more general, but I'm open to other ideas.
- --Austin Murphy (talk) 16:07, 3 December 2008 (UTC)
Hi Austin, Hey I just skimmed through the history of the page and noticed all your input to this article. Overall I think it is a good article. I have just added one word to the section name as we both seem to agree on what happens to the data. Mike (talk) 17:04, 3 December 2008 (UTC)
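The "replace most of it with pointers" behaviour discussed in this thread can be sketched as follows. This is an illustration under stated assumptions, not how any particular product works: chunks are taken as given, SHA-256 digests serve as the pointers, and `deduplicate` and `rehydrate` are hypothetical names.

```python
import hashlib


def deduplicate(chunks: list[bytes]) -> tuple[dict[str, bytes], list[str]]:
    """Store each distinct chunk once and return (store, pointers)."""
    store: dict[str, bytes] = {}
    pointers: list[str] = []
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        # Keep the first copy of a chunk; later copies become pointers only.
        store.setdefault(digest, chunk)
        pointers.append(digest)
    return store, pointers


def rehydrate(store: dict[str, bytes], pointers: list[str]) -> list[bytes]:
    """Rebuild the original chunk sequence from the store and pointers."""
    return [store[p] for p in pointers]
```

The sketch accepts every chunk it is handed and never decides what gets backed up, which matches the argument that deduplication changes how the data is represented on the media and is not a form of selection.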
References and citations
References (11&12) from the two topics - Cold & Hot Data Backup are leading nowhere. Please suggest an alternative. Lakshmi VB Narsimhan 08:50, 19 August 2009 (UTC) —Preceding unsigned comment added by Lakshmin (talk • contribs)
The sentence "66% of internet users have suffered from serious data loss." should be removed or needs a good citation. The current citation (http://www.kabooza.com/globalsurvey.html) is based on "4257 respondents from 129 countries". This is by no means representative of the hundreds of millions of Internet users. Not to mention that the survey is presented by a company that needs to sell its products and might easily have faked the numbers. Martin Zuther (talk) 17:30, 21 February 2010 (UTC)
Measuring the process subsection
where is discussion of archival issues like read-only retention time?
I was trying to figure out how to archive (copy once and hide somewhere) some data I want to keep forever but rarely use. I wanted the safest media possible but couldn't find a discussion right away. Now obviously these things change with technology, but an article on archival media with refs to current and historical datasheets and real-world tests would be quite helpful, and it would describe a notable feature of this topic. Finally, under the flash discussion I found some discussion of thermal decay and energy barriers, but no tables comparing claimed or measured figures for current or past flash devices against hard disks or optical media. Thanks. Nerdseeksblonde (talk) 17:56, 21 July 2010 (UTC)
HDD stability unknown ?
"The main disadvantages of hard disk backups are (...) that their stability over periods of years is a relative unknown."
This is blatantly false. HDDs can last for a very long time if treated right: I have several working HDDs which are between 10 and 16 years old. On the other hand, I also have newer models, between 2 and 8 years old, which crashed, mostly with little or no prior warning. So the most important thing to keep in mind when working with HDDs for backups is that you never know if or when an HDD is going to die. It will happen all of a sudden, and you can't do anything about it. So always keep at least two backups on identical HDDs. -- Alexey Topol (talk) 01:03, 17 November 2010 (UTC)
Backup or Archive or both!
I believe the wording you have for your 'backup' article is misleading! The term 'backup' is a generic term and you've gone into specifics of certain types of backups. If I have a backup running each day which overwrites the previous day's backup, and a corrupt file is then introduced into this backup, I have no way of recovering the original file (or a good copy)! However, I still have what is known as a 'backup' (even if it is corrupt). The description you give for a backup would lead me to believe I could still recover my original 'good' copy of this. For this to happen, I would need a backup and archiving solution. Would you agree? If my backup solution were to somehow create a 'new' copy of the data (in whatever form) and retain the original document and/or any changes, then surely this is an 'archive'! Depending on whatever solution one chooses to use for their so-called 'backup', they might not have the full capabilities that you suggest they have. If you agree with this, then would you kindly amend your description please? DiveO2 (talk) 11:05, 6 February 2012 (UTC)
What's with the "Law" section?
The so-called "Law" section in this article seems to me like it has nothing to do with law at all. It talks about confusion of terminology, then "Advice" for backing stuff up (which actually seems way out of place even if it wasn't under a "Law" section), then events related to backups. 126.96.36.199 (talk) 14:32, 23 March 2012 (UTC)
A different formatting on title? so ease the reading on Windows Phone or smart phone
I found that some titles get mixed up with the text when reading on Windows Phone or a smartphone, e.g. 1.3 -> Backup site or disaster recovery center (DR center); 2.3 Cold database backup; Hot database backup; etc. — Preceding unsigned comment added by Kmchanw (talk • contribs) 22:22, 3 November 2012 (UTC)