Distributed computing
From Wikipedia, the free encyclopedia
Distributed computing is a field of computer science that studies distributed systems. A distributed system consists of multiple autonomous computers that communicate through a computer network. The computers interact with each other in order to achieve a common goal. A computer program that runs in a distributed system is called a distributed program, and distributed programming is the process of writing such programs.[1]
Distributed computing also refers to the use of distributed systems to solve computational problems. In distributed computing, a problem is divided into many tasks, each of which is solved by one computer.[2]
Contents |
[edit] Background
The word distributed in terms such as "distributed computing", "distributed system", "distributed programming", and "distributed algorithm" originally referred to computer networks where individual computers were physically distributed within some geographical area.[3] The terms are nowadays used in a much wider sense, even when referring to autonomous processes that run on the same physical computer and interact with each other by message passing.[4]
While there is no single definition of a distributed system,[5] the following defining properties are commonly used:
- There are several autonomous processes, each of which has its own local memory.[6]
- The processes communicate with each other by message passing.[7]
The system may have a common goal, such as solving a large computational problem.[8] Alternatively, each computer may have its own user with individual needs, and the purpose of the distributed system is to coordinate the use of shared resources or provide communication services to the users.[9]
Other typical properties of distributed systems include the following:
- The system has to tolerate failures in individual computers.[10]
- The structure of the system (network topology, network latency, number of computers) is not known in advance, the system may consist of different kinds of computers and network links, and the system may change during the execution of a distributed program.[11]
- Each computer has only a limited, incomplete view of the system. Each computer may know only one part of the input.[12]
[edit] Concurrent, parallel, and distributed computing
The terms "concurrent computing", "parallel computing", and "distributed computing" have a lot of overlap, and no clear distinction exists between them.[13] The same system may be characterised both as "parallel" and "distributed"; the processors in a typical distributed system run concurrently in parallel.[14] Parallel computing may be seen as a particular tightly-coupled form of distributed computing,[15] and distributed computing may be seen as a loosely-coupled form of parallel computing.[16] Nevertheless, it is possible to roughly classify concurrent systems as "parallel" or "distributed" using the following criteria:
- In parallel computing, all processors have access to a shared memory. Shared memory can be used to exchange information between processors.
- In distributed computing, each processor has its own private memory (distributed memory). Information is exchanged by passing messages between the processors.
The figure on the right illustrates the difference between distributed and parallel systems. Figure (a) is a schematic view of a typical distributed system; as usual, the system is represented as a graph in which each node is a computer and each edge is a communication link. Figure (b) shows the same distributed system in more detail: each computer has its own local memory, and information can be exchanged only by passing messages from one node to another by using the available communication links. Figure (c) shows a parallel system in which each processor has a direct access to a shared memory.
[edit] Theoretical foundations
Even though parallel and distributed systems have a lot in common, traditionally these two fields have studied algorithms and computational complexity by using different models:
- In distributed computing, a standard model is a graph with one finite-state machine per node. A central complexity measure is the number of communication rounds required to complete the task.[17]
- In parallel computing, models such as Boolean circuits and parallel random access machines are used. These models give rise to complexity classes such as NC.
A Boolean circuit can be seen as a particular distributed system. Therefore these two models take seemingly opposite directions:
- In distributed computing, a distributed algorithm must solve the computational problem in any network topology.
- In parallel computing, the designer of a parallel algorithm can choose the network topology.
Nevertheless, many algorithms and results can be applied in both fields. An efficient distributed algorithm is usually also an efficient parallel algorithm. Conversely, many central results in distributed computing were originally presented as parallel algorithms; examples include the Cole–Vishkin algorithm for graph colouring.
There are also challenges that are specific to one of these fields. For example, the concept of speedup is central in parallel computing. Unique challenges in distributed computing include the following:
- Challenges that are related to fault-tolerance, for example, consensus problems, Byzantine fault tolerance, and self-stabilisation.[18]
- The asynchronous nature of a distributed system may necessitate clock synchronisation.
- Information can only be transferred hop-by-hop from one node to another in the communication network, and reaching a distant node may require several transmissions. Therefore the design of an efficient distributed algorithm must take the network topology and the shortest-path distances between the nodes into account.[19]
[edit] History
The use of concurrent processes that communicate by message-passing has its roots in operating system architectures studied in 1960s.[20] The first wide-spread distributed systems were local-area networks such as Ethernet that was invented in 1970s.[21]
The study of distributed computing became its own branch of computer science in late 1970s and early 1980s. The first conference in the field, Symposium on Principles of Distributed Computing (PODC), dates back to 1982, and its European counterpart International Symposium on Distributed Computing (DISC) was first held in 1985.
[edit] Applications
Examples of distributed systems and applications of distributed computing include the following:[22]
- Telecommunications, e.g., telephone networks, cellular networks, and computer networks. Among others, distributed algorithms are used to route and broadcast messages in communication networks.
- Internet, world wide web and peer-to-peer networks.
- Distributed database systems and network file systems.
- Distributed information processing, e.g., banking systems and airline reservation systems.
- Scientific computing, including cluster computing and grid computing and various distributed computing projects.
- Real-time process control, e.g., aircraft control systems and industrial control systems.
- Wireless sensor networks.
[edit] Distributed computing projects
A variety of distributed computing projects have grown up in recent years. Many are run on a volunteer basis, and involve users donating their unused computational power to work on interesting computational problems. Examples of such projects include the Stanford University Chemistry Department Folding@home project, which is focused on simulations of protein folding to find disease cures and to understand biophysical systems; World Community Grid, an effort to create the world's largest public computing grid to tackle scientific research projects that benefit humanity, run and funded by IBM; SETI@home, which is focused on analyzing radio-telescope data to find evidence of intelligent signals from space, hosted by the Space Sciences Laboratory at the University of California, Berkeley (the Berkeley Open Infrastructure for Network Computing (BOINC), was originally developed to support this project); OurGrid, which is a free-to-join peer-to-peer grid provided by the idle resources of all participants; LHC@home, which is used to help design and tune the Large Hadron Collider, hosted by CERN in Geneva; and distributed.net, which is focused on finding optimal Golomb rulers and breaking various cryptographic ciphers.[23]
Distributed computing projects also often involve competition with other distributed systems. This competition may be for prestige, or it may be a matter of enticing users to donate processing power to a specific project. For example, stat races are a measure of the work a distributed computing project has been able to compute over the past day or week. This has been found to be so important in practice that virtually all distributed computing projects offer online statistical analyses of their performances, updated at least daily if not in real-time.
[edit] Architectures
Various hardware and software architectures are used for distributed computing. At a lower level, it is necessary to interconnect multiple CPUs with some sort of network, regardless of whether that network is printed onto a circuit board or made up of loosely-coupled devices and cables. At a higher level, it is necessary to interconnect processes running on those CPUs with some sort of communication system.
Distributed programming typically falls into one of several basic architectures or categories: Client-server, 3-tier architecture, N-tier architecture, Distributed objects, loose coupling, or tight coupling.
- Client-server — Smart client code contacts the server for data, then formats and displays it to the user. Input at the client is committed back to the server when it represents a permanent change.
- 3-tier architecture — Three tier systems move the client intelligence to a middle tier so that stateless clients can be used. This simplifies application deployment. Most web applications are 3-Tier.
- N-tier architecture — N-Tier refers typically to web applications which further forward their requests to other enterprise services. This type of application is the one most responsible for the success of application servers.
- Tightly coupled (clustered) — refers typically to a cluster of machines that closely work together, running a shared process in parallel. The task is subdivided in parts that are made individually by each one and then put back together to make the final result.
- Peer-to-peer — an architecture where there is no special machine or machines that provide a service or manage the network resources. Instead all responsibilities are uniformly divided among all machines, known as peers. Peers can serve both as clients and servers.
- Space based — refers to an infrastructure that creates the illusion (virtualization) of one single address-space. Data are transparently replicated according to application needs. Decoupling in time, space and reference is achieved.
Another basic aspect of distributed computing architecture is the method of communicating and coordinating work among concurrent processes. Through various message passing protocols, processes may communicate directly with one another, typically in a master/slave relationship. Alternatively, a "database-centric" architecture can enable distributed computing to be done without any form of direct inter-process communication, by utilizing a shared database.[24]
[edit] See also
- Software
- Category:Concurrent programming languages
- Distributed design patterns
- Protocol (computing)
- Fallacies of Distributed Computing
- Research
- List of important publications in computer science#Distributed computing
- Edsger W. Dijkstra Prize in Distributed Computing
- Category:Researchers in distributed computing
- List of distributed computing conferences
[edit] References
- Andrews, Gregory R. (2000), Foundations of Multithreaded, Parallel, and Distributed Programming, Addison-Wesley, ISBN 0-201-35752-6.
- Dolev, Shlomi (2000), Self-Stabilization, MIT Press, ISBN 0-262-04178-2.
- Ghosh, Sukumar (2007), Distributed Systems – An Algorithmic Approach, Chapman & Hall/CRC, ISBN 978-1-58488-564-1.
- Lynch, Nancy A. (1996), Distributed Algorithms, Morgan Kaufmann, ISBN 1-55860-348-4.
- Peleg, David (2000), Distributed Computing: A Locality-Sensitive Approach, SIAM, ISBN 0-89871-464-8, http://www.ec-securehost.com/SIAM/DT05.html.
- Anderson, David P. (May 2006). A million years of computing. http://boinc.berkeley.edu/talks/singapore_public.pdf. Retrieved on 2009-07-16.
- Godfrey, Bill (2002). "A primer on distributed computing". http://www.bacchae.co.uk/docs/dist.html.
[edit] Notes
- ^ Andrews (2000). Dolev (2000). Ghosh (2007), p. 10.
- ^ Godfrey (2002).
- ^ Lynch (1996), p. 1.
- ^ Andrews (2000), p. 291–292. Dolev (2000), p. 5.
- ^ Ghosh (2007), p. 10.
- ^ Andrews (2000), p. 8–9, 291. Dolev (2000), p. 5. Ghosh (2007), p. 3. Lynch (1996), p. xix, 1. Peleg (2000), p. xv.
- ^ Andrews (2000), p. 291. Ghosh (2007), p. 3. Peleg (2000), p. 4.
- ^ Ghosh (2007), p. 3–4. Peleg (2000), p. 1.
- ^ Ghosh (2007), p. 4. Peleg (2000), p. 2.
- ^ Ghosh (2007), p. 4, 8. Lynch (1996), p. 2–3. Peleg (2000), p. 4.
- ^ Lynch (1996), p. 2. Peleg (2000), p. 1.
- ^ Ghosh (2007), p. 7. Lynch (1996), p. xix, 2. Peleg (2000), p. 4.
- ^ Ghosh (2007), p. 10.
- ^ Lynch (1996), p. xix, 1–2. Peleg (2000), p. 1.
- ^ Peleg (2000), p. 1.
- ^ Ghosh (2007), p. 10.
- ^ Lynch (1996), p. 17–23.
- ^ Dolev (2000).
- ^ Peleg (2000).
- ^ Andrews (2000), p. 348.
- ^ Andrews (2000), p. 32.
- ^ Ghosh (2007), p. 4–6. Lynch (1996), p. xix, 1. Peleg (2000), p. xv.
- ^ Anderson (2005).
- ^ A database-centric virtual chemistry system, J Chem Inf Model. 2006 May-Jun;46(3):1034-9
[edit] Further reading
- Tel, Gerard (1994). Introduction to Distributed Algorithms. Cambridge University Press.
- Attiya, Hagit and Welch, Jennifer (2004). Distributed Computing: Fundamentals, Simulations, and Advanced Topics. Wiley-Interscience. ISBN 0471453242.
[edit] External links
- Distributed computing at the Open Directory Project
- Distributed computing journals at the Open Directory Project
- MIT's Open Course – Distributed Algorithms
- Melbourne's Masters Course in Distributed Computing
|
||||||||||||||||||||||||||||||||

