File comparison


This is an old revision of this page, as edited by M4gnum0n (talk | contribs) at 10:14, 9 December 2010 (only one reference). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

In computing, file comparison is the process of comparing the contents of computer files to find what they have in common and where they differ. The result of the comparison may be presented in a graphical user interface or used as part of larger tasks in networks, file systems, or revision control.

Some widely used file comparison programs are diff, cmp, FileMerge, Araxis Merge, WinMerge, Beyond Compare, and Microsoft File Compare.

Many text editors and word processors perform file comparison to highlight the changes to a document.

Method types

Most file comparison tools find the longest common subsequence of two files and present everything outside it as a sequence of insertions and deletions. Very few file comparison programs detect block moves.
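The longest-common-subsequence approach can be sketched in a few lines of Python. This is a minimal illustration using the textbook dynamic-programming table, not the optimized algorithm of any particular tool; the function name lcs_diff and the sample data are assumptions made for the example.

```python
def lcs_diff(a, b):
    """Return (op, line) pairs: ' ' for common, '-' for deleted, '+' for inserted."""
    n, m = len(a), len(b)
    # table[i][j] holds the length of the LCS of a[i:] and b[j:].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk the table from the top-left corner to build the edit script.
    script = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            script.append((" ", a[i]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            script.append(("-", a[i]))
            i += 1
        else:
            script.append(("+", b[j]))
            j += 1
    # Whatever is left on either side is pure deletion or pure insertion.
    script.extend(("-", line) for line in a[i:])
    script.extend(("+", line) for line in b[j:])
    return script


old = ["alpha", "beta", "gamma"]
new = ["alpha", "gamma", "delta"]
for op, line in lcs_diff(old, new):
    print(op, line)
```

The table costs time and memory proportional to the product of the two file lengths, which is acceptable for a sketch but is one reason practical tools use more refined algorithms.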

Some specialized file comparison tools find the longest increasing subsequence between two files (U.S. patent 7,031,972). The rsync protocol uses a rolling hash function to compare two files on two distant computers with low communication overhead.
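The key property of a rolling hash is that the checksum of a window shifted by one byte can be derived from the previous checksum without rereading the whole window. The sketch below illustrates that property with a simple two-part sum in the spirit of rsync's weak checksum. It is not rsync's actual implementation, and the names weak_checksum, roll, and MOD are assumptions made for the example.

```python
MOD = 1 << 16  # keep both checksum parts within 16 bits (an illustrative choice)


def weak_checksum(block):
    """Compute the two-part checksum of a byte block from scratch."""
    size = len(block)
    a = sum(block) % MOD
    # The first byte is weighted by the window size, the last byte by 1.
    b = sum((size - i) * byte for i, byte in enumerate(block)) % MOD
    return a, b


def roll(a, b, out_byte, in_byte, size):
    """Slide the window one byte: drop out_byte at the front, append in_byte."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - size * out_byte + a) % MOD
    return a, b


data = b"the quick brown fox"
size = 4
a, b = weak_checksum(data[:size])
for start in range(1, len(data) - size + 1):
    a, b = roll(a, b, data[start - 1], data[start + size - 1], size)
    # The rolled value must equal a checksum recomputed from scratch.
    assert (a, b) == weak_checksum(data[start:start + size])
```

Because each step costs constant time, a receiver can test every byte offset of a file against a set of block checksums cheaply. A weak checksum like this one can collide, so a candidate match would still need confirmation by a stronger hash.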

File comparison in word processors is typically at the word level, while comparison in most programming tools is at the line level. Byte or character-level comparison is useful in some specialized applications.
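The effect of granularity can be seen by running the same comparison over lines and over words. The sketch below uses Python's standard difflib.SequenceMatcher, which applies its own matching heuristic rather than a strict longest common subsequence; the helper name changed_units and the sample text are assumptions made for the example.

```python
import difflib


def changed_units(old_units, new_units):
    """Return (tag, old_slice, new_slice) for every region that is not equal."""
    matcher = difflib.SequenceMatcher(None, old_units, new_units, autojunk=False)
    return [
        (tag, old_units[i1:i2], new_units[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


old_text = "the quick brown fox\njumps over the lazy dog\n"
new_text = "the quick brown fox\njumps over the sleepy dog\n"

# Line level: the whole second line is reported as replaced.
line_changes = changed_units(old_text.splitlines(), new_text.splitlines())
# Word level: only the single changed word is reported.
word_changes = changed_units(old_text.split(), new_text.split())

print(line_changes)
print(word_changes)
```

The line-level result flags an entire line for a one-word edit, which suits source code review, while the word-level result isolates the edit, which suits prose.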

Reasoning

Comparison tools are used for various reasons. For binary files, a byte-level comparison is usually the most appropriate. For text files, a side-by-side visual comparison is usually more useful; this includes program source files and scripts, which are human-readable. A visual comparison lets the user decide which file to keep, whether to merge the files into one that contains all of the differences, or whether to keep both as they are for later reference under some form of version control. Versioning is also important for backup purposes.

File comparison is an important, and arguably integral, part of file synchronization and backup. Corruption is a concern even when backups exist: it usually occurs without warning and goes unnoticed until it is too late to recover the missing parts. Often a corrupted file is discovered only when it is next opened or used. Short of that, a comparison tool is needed to recognize that a difference has arisen at all. For this reason, a file synchronization or backup program needs some form of file comparison if it is to be useful and trusted.

When used in automated processes, file comparison can be configured to choose how a changed file is saved. A sensible default is usually to create another version of the file automatically, so that the user does not have to monitor the process as it runs. Unneeded versions can then be reviewed and removed later, at a more convenient time.
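A byte-level comparison can be as simple as reporting the first offset at which two byte sequences differ, similar in spirit to what a tool such as cmp reports. The following is a minimal sketch; the function name first_difference is an assumption made for the example, and it works on in-memory bytes rather than on files.

```python
def first_difference(data_a, data_b):
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (x, y) in enumerate(zip(data_a, data_b)):
        if x != y:
            return offset
    # One input is a prefix of the other: they differ where the shorter ends.
    if len(data_a) != len(data_b):
        return min(len(data_a), len(data_b))
    return None


print(first_difference(b"\x00\x01\x02", b"\x00\x01\x03"))  # differs at offset 2
print(first_difference(b"abc", b"abc"))                    # identical: None
```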
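One common way for a backup or synchronization program to recognize that two copies no longer match is to compare cryptographic digests of their contents. The sketch below is an illustration under stated assumptions: the choice of SHA-256, the function names file_digest and same_content, and the temporary file names are all made up for the example, and nothing here describes how any particular backup tool works.

```python
import hashlib
import os
import tempfile


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """Report whether two files have matching digests."""
    return file_digest(path_a) == file_digest(path_b)


with tempfile.TemporaryDirectory() as workdir:
    original = os.path.join(workdir, "original.bin")  # hypothetical file names
    backup = os.path.join(workdir, "backup.bin")
    with open(original, "wb") as handle:
        handle.write(b"important data")
    with open(backup, "wb") as handle:
        handle.write(b"important dat\x00")  # simulate one corrupted byte
    print(same_content(original, backup))  # False: the copies differ
```

A digest mismatch shows only that the copies differ. It does not say which copy is the damaged one, so a stored digest taken when the file was known to be good is needed to decide that.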

Historical uses

Before file comparison software existed, there were machines that compared magnetic tapes or punched cards. The IBM 519 Card Reproducer could determine whether two decks of punched cards were equivalent. In 1957, John Van Gardner developed a system that compared the checksums of loaded sections of Fortran programs to debug compilation problems on the IBM 704.[1]
