Distributed revision control
In computer programming, distributed revision control, also known as distributed version control or decentralized version control, allows many software developers to work on a given project without requiring them to share a common network. The software revisions are stored in a distributed revision control system (DRCS), also known as a distributed version control system (DVCS).
Distributed vs. centralized
Distributed revision control takes a peer-to-peer approach to version control, as opposed to the client-server approach of centralized systems. Rather than a single, central repository on which clients synchronize, each peer's working copy of the codebase is a complete repository. Distributed revision control synchronizes repositories by exchanging patches (sets of changes) from peer to peer. This results in some important differences from a centralized system:
- No canonical, reference copy of the codebase exists by default; only working copies.
- Common operations (such as commits, viewing history, and reverting changes) are fast, because there is no need to communicate with a central server.
- Communication is necessary only when sharing changes with other peers.
- Each working copy effectively functions as a remote backup of the codebase and of its change-history, protecting against data loss.
Other differences include:
- Multiple "central" repositories.
- Code from disparate repositories is merged based on a web of trust, i.e., historical merit or quality of changes.
- Numerous different development models are possible, such as development / release branches or a Commander / Lieutenant model, allowing for efficient delegation of topical developments in very large projects. Lieutenants are project members who have the power to dynamically decide which branches to merge.
- Network is not involved for common operations.
- A separate set of "sync" operations is available for sending changes to, or receiving changes from, remote repositories.
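The split between local operations and explicit synchronization can be sketched with a toy model. This is only an illustration under simplifying assumptions: the `Repo` and `Change` classes and the peer names are hypothetical, history is treated as an unordered set of changes, and merging and conflicts are ignored entirely. No real DVCS stores its data this way.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Change:
    """One recorded change (a patch), identified by a unique id."""
    change_id: str
    description: str


@dataclass
class Repo:
    """A peer's repository: it holds the complete history it knows about."""
    name: str
    history: dict = field(default_factory=dict)  # change_id -> Change

    def commit(self, change_id, description):
        # Purely local: no other peer is contacted.
        self.history[change_id] = Change(change_id, description)

    def log(self):
        # Also local: history is read from this peer's own copy.
        return sorted(self.history)

    def pull(self, other):
        # The "sync" step: copy over the changes this peer lacks.
        missing = [c for cid, c in other.history.items()
                   if cid not in self.history]
        for change in missing:
            self.history[change.change_id] = change
        return [c.change_id for c in missing]


alice = Repo("alice")
bob = Repo("bob")
alice.commit("a1", "add parser")   # local to alice
bob.commit("b1", "fix typo")       # local to bob
pulled = bob.pull(alice)           # the only step that involves another peer
```

In this sketch only `pull` touches a second repository; `commit` and `log` never leave the local copy, which is the property the lists above describe.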
DVCS proponents point to several advantages of distributed version control systems over the traditional centralised model:
- Allows users to work productively when not connected to a network.
- Makes most operations much faster.
- Allows participation in projects without requiring permissions from project authorities, and thus arguably better fosters a culture of meritocracy instead of requiring "committer" status.
- Allows private work, so users can use their changes even for early drafts they do not want to publish.
- Avoids relying on one physical machine as a single point of failure.
- Permits centralized control of the "release version" of the project.
- In FLOSS projects, makes it much easier to fork a project that is stalled because of leadership conflicts or design disagreements.
Software development author Joel Spolsky, the owner of a commercial DVCS, described distributed version control as "possibly the biggest advance in software development technology in the [past] ten years."
A disadvantage is that the initial clone of a repository is slower than a centralized checkout, because all branches and the full revision history are copied. This can be significant when the connection is slow and the repository is large. For instance, the cloned Git repository for the Linux kernel (all history, branches, tags, etc.) is approximately the size of the checked-out, uncompressed HEAD, whereas the equivalent checkout of a single branch from a centralized system transfers only the compressed contents of HEAD (without any history, branches, or tags). Another problem with DVCS is the lack of the locking mechanisms that are part of most centralized VCS and that still play an important role for non-mergeable binary files such as graphic assets.
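To illustrate why file locking fits the centralized model more naturally, here is a minimal sketch of an exclusive lock registry. The `LockServer` class, file name, and user names are hypothetical and do not correspond to any particular VCS; the point is only that a lock is meaningful when every client consults the same single authority, which a peer-to-peer system does not have by default.

```python
class LockServer:
    """A single authority that records who holds the lock on each file."""

    def __init__(self):
        self._holders = {}  # path -> user currently holding the lock

    def acquire(self, path, user):
        # Grant the lock if the file is free or already held by this user.
        holder = self._holders.setdefault(path, user)
        return holder == user

    def release(self, path, user):
        # Only the current holder may release the lock.
        if self._holders.get(path) == user:
            del self._holders[path]
            return True
        return False


server = LockServer()
got_alice = server.acquire("logo.psd", "alice")      # file is free
got_bob = server.acquire("logo.psd", "bob")          # refused while alice holds it
server.release("logo.psd", "alice")
got_bob_retry = server.acquire("logo.psd", "bob")    # free again
```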
An "open system" of distributed revision control is characterized by its support for independent branches, and its reliance on merge operations. Its general characteristics include:
- Every working copy is effectively a fork.
- The system implements each branch as a working copy, with merges conducted by ordinary patch exchange, from branch to branch.
- Code forking therefore occurs more readily, where desired, because every working copy is a fork. (By the same token, undesirable forks are easier to mend because, if the dispute can be resolved, re-merging the code is easy.)
- It may be possible to "cherry-pick" single changes, selectively pulling them from peer to peer.
- New peers can freely join, without applying for access to a server.
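Cherry-picking by patch exchange can be sketched as follows. This is a simplified illustration, not the mechanism of any specific tool: the `Patch` type, the `apply_patch` helper, and the file contents are all hypothetical, and a patch is modelled as a single text replacement in one file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patch:
    """A single change: replace `old` with `new` in the file at `path`."""
    patch_id: str
    path: str
    old: str
    new: str


def apply_patch(files, patch):
    """Return a new file mapping with the patch applied.

    Raises ValueError if the text the patch expects is not present.
    """
    content = files.get(patch.path, "")
    if patch.old not in content:
        raise ValueError(f"{patch.patch_id} does not apply to {patch.path}")
    updated = dict(files)
    updated[patch.path] = content.replace(patch.old, patch.new, 1)
    return updated


# Patches published by one peer.
carol_patches = [
    Patch("c1", "util.py", "retries = 1", "retries = 3"),
    Patch("c2", "main.py", "timeout = 30", "timeout = 60"),
]

# Another peer's working copy.
dave_files = {"main.py": "timeout = 30\n", "util.py": "retries = 1\n"}

# Pull only the single change that is wanted, leaving the other one behind.
wanted = next(p for p in carol_patches if p.patch_id == "c2")
dave_files = apply_patch(dave_files, wanted)
```

After this step the second peer has the `c2` change in `main.py`, while `util.py` is untouched because `c1` was never pulled.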
For a list of distributed revision control systems, see the comparison of revision control software.
A replicated system of distributed revision control depends on a replicated database. A check-in is equivalent to a distributed commit. Successful commits create a single baseline, which reduces the need for merges. An example of a replicated distributed system is Code Co-op.
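The idea of a check-in as an all-or-nothing distributed commit can be sketched as below. This is an assumption-laden toy, not a description of how Code Co-op or any other replicated system is implemented: the `Replica` class and `check_in` function are hypothetical, and a check-in is accepted only if it was made against the baseline that every replica currently holds.

```python
class Replica:
    """One copy of the replicated database."""

    def __init__(self, name):
        self.name = name
        self.baseline = []  # ordered list of accepted check-ins

    def can_accept(self, base_len):
        # Accept only check-ins made against the current baseline.
        return base_len == len(self.baseline)


def check_in(replicas, base_len, change):
    """Apply the change to every replica, or to none of them."""
    if not all(r.can_accept(base_len) for r in replicas):
        return False
    for r in replicas:
        r.baseline.append(change)
    return True


replicas = [Replica("a"), Replica("b"), Replica("c")]
first = check_in(replicas, 0, "add README")    # made against the empty baseline
stale = check_in(replicas, 0, "rename file")   # baseline has moved on, so refused
```

Because a stale check-in is refused rather than recorded as a separate branch, all replicas keep the same single baseline, which is why such a system needs fewer merges.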
The distributed model is generally better suited for large projects with partly independent developers, such as the Linux kernel project, because developers can work independently and submit their changes for merge (or rejection). The distributed model flexibly allows adopting custom source code contribution workflows. The integrator workflow is the most widely used.
In the centralized model, developers must serialize their work, to avoid problems with different versions.
Closed-source DVCSs such as Sun WorkShop TeamWare were widely used in enterprise settings in the 1990s and inspired BitKeeper (1998), one of the first open systems. BitKeeper went on to serve in the early development of the Linux kernel.
Some originally centralized systems are starting to offer distributed features. For example, Subversion is able to do many operations with no network. As a result, it may become more difficult to distinguish natively distributed systems from centralized ones.
- Revision control
- List of revision control software
- Comparison of revision control software
- Category:Software using distributed revision control
- Repository clone
- Git, an open-source DVCS developed for Linux kernel development
- Mercurial, a cross-platform system similar to Git
- Fossil, a distributed version control system, bug tracking system and wiki software
- GNU Bazaar
- Concurrent Versions System, a predecessor of distributed version control systems
- TortoiseHg, a graphical interface for Mercurial
- Code Co-op, a peer-to-peer version control system