Remote Differential Compression
Remote Differential Compression (RDC) is a client–server synchronization algorithm that allows the contents of two files to be synchronized by communicating only the differences between them. It was introduced with Microsoft Windows Server 2003 R2 and is included with later Windows client and server operating systems.
Unlike Binary Delta Compression (BDC), which is designed to operate only on known versions of a single file, RDC makes no assumptions about file similarity or versioning. Differences between files are computed on the fly, so RDC is suitable for efficiently synchronizing files that have been updated independently, where network bandwidth is limited, or where the files are large but the differences between them are small.
The algorithm is based on fingerprinting blocks of each file locally at both replication partners. Many types of file changes cause file contents to move (for example, a small insertion or deletion at the beginning of a file misaligns the rest of the file relative to the original content), so the blocks used for comparison are not based on static, arbitrary cut points but on cut points defined by the contents of each file segment. If part of a file changes in length, or blocks of content move to other parts of the file, the block boundaries for the unchanged parts stay fixed relative to the contents. The series of fingerprints for those blocks therefore does not change either; the fingerprints only change position. By comparing all the hashes of a file with the hashes of the same file at the other end of the replication pair, RDC can identify which blocks have changed and which have not, even if the contents of the file have been significantly reshuffled. Because comparing large files could require a large number of signature comparisons, the algorithm is applied recursively to the hash sets to detect which blocks of hashes have changed or moved, significantly reducing the amount of data that must be transmitted to compare the files.
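The idea of content-defined cut points can be illustrated with a minimal Python sketch. It is not RDC's actual cut-point rule or hash function: the window size, the mask, the use of BLAKE2 and SHA-256, and the function names `chunks`, `fingerprints` and `chunks_to_send` are all assumptions chosen for brevity. The sketch only shows why a boundary that depends on nearby bytes survives an insertion earlier in the file.

```python
import hashlib

WINDOW = 16          # trailing bytes examined at each position (assumed value)
MASK = (1 << 6) - 1  # cut when the low 6 hash bits are zero: ~64-byte chunks on average


def _window_hash(window: bytes) -> int:
    # Hash of the trailing window only, so the result depends on local content,
    # not on the absolute offset within the file.
    return int.from_bytes(hashlib.blake2b(window, digest_size=4).digest(), "big")


def chunks(data: bytes) -> list:
    """Split data at cut points chosen by content rather than at fixed offsets."""
    out, start = [], 0
    for end in range(WINDOW, len(data) + 1):
        if _window_hash(data[end - WINDOW:end]) & MASK == 0:
            out.append(data[start:end])
            start = end
    if start < len(data):
        out.append(data[start:])  # trailing chunk with no cut point at its end
    return out


def fingerprints(data: bytes) -> list:
    """One strong hash per chunk; this list is what the two ends would exchange."""
    return [hashlib.sha256(c).digest() for c in chunks(data)]


def chunks_to_send(source: bytes, target_fingerprints) -> list:
    """Chunks of the source whose fingerprint the other end does not already hold."""
    known = set(target_fingerprints)
    return [c for c in chunks(source) if hashlib.sha256(c).digest() not in known]
```

With this sketch, prepending a few bytes to a buffer changes only the chunk or chunks at the start. Every later cut point falls on the same content as before, so the remaining fingerprints match and `chunks_to_send` returns only the changed beginning. The recursive step described above, which compares the fingerprint lists themselves in the same way, is omitted.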
Later versions of Windows support cross-file RDC, which finds files similar to the one being replicated, and uses blocks of the similar files that are identical to the replicating file to minimize data transferred over the WAN. Cross-file RDC can use blocks of up to five similar files.
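The selection step in cross-file RDC can be pictured with a small Python sketch. It assumes each file is already represented as a set of block fingerprints and ranks candidates by plain overlap count. This ranking is a stand-in, not the similarity measure Windows actually uses, and the names `pick_similar` and `still_needed` are hypothetical. Only the limit of five files comes from the text above.

```python
def pick_similar(target: set, candidates: dict, limit: int = 5) -> list:
    """Return up to `limit` candidate names, ordered by shared fingerprint count."""
    scored = [(len(target & fps), name) for name, fps in candidates.items()]
    scored = [(n, name) for n, name in scored if n > 0]  # ignore unrelated files
    scored.sort(key=lambda t: (-t[0], t[1]))             # most overlap first, then by name
    return [name for _, name in scored[:limit]]


def still_needed(target: set, candidates: dict, chosen: list) -> set:
    """Fingerprints of the target that none of the chosen files can supply."""
    have = set().union(*(candidates[name] for name in chosen))
    return target - have


# Hypothetical fingerprints, shortened to single bytes for readability.
target = {b"a", b"b", b"c", b"d"}
candidates = {"x": {b"a", b"b"}, "y": {b"c"}, "z": {b"q"}}
chosen = pick_similar(target, candidates)
missing = still_needed(target, candidates, chosen)
```

Only the blocks in `missing` would have to cross the WAN. The others can be copied from files already present at the receiving end.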
Where files are similar, RDC can significantly reduce the amount of data transferred. In one test, two similar but not identical 2.4 MB bitmap files of the same photograph were used, one of which had a watermark superimposed. Because the files were uncompressed, their contents were largely the same. Transferring one with RDC required only 217 kB, a 92% reduction. For smaller files, the RDC processing overhead may outweigh the bandwidth saving.
RDC is implemented in Windows operating systems essentially as an API, but very little software invokes it, particularly on non-server systems. A myth has arisen that RDC significantly slows local file transfers and should be switched off. A Microsoft TechNet page by a member of the Microsoft Directory Services Team comprehensively debunks this with detailed timings; in any case, a service that no software invokes cannot have any effect, detrimental or otherwise.
- Microsoft TechNet: DFS Replication: Frequently Asked Questions, section "What is cross-file RDC?", pub. 16 October 2006, updated 30 January 2013
- Remote Differential Compression (aka rsync algorithm for Windows), David Jade, Programming, 15 February 2013
- Microsoft TechNet: Ask the Directory Services Team - Debunking the Vista Remote Differential Compression Myth, Ned Pyle [MSFT], 26 Jun 2009
- Introduction to DFS replication
- About Remote Differential Compression
- Optimizing File Replication over Limited-Bandwidth Networks using Remote Differential Compression