Reliable multicast

A reliable multicast is any computer networking protocol that provides a reliable sequence of packets to multiple recipients simultaneously, making it suitable for applications such as multi-receiver file transfer.

Overview

Multicast is a network addressing method for delivering information to a group of destinations simultaneously. It uses the most efficient strategy, sending each message over each link of the network only once and creating copies only where the paths to the destinations split (typically at network switches and routers). However, like the User Datagram Protocol, multicast does not guarantee the delivery of a message stream. Messages may be dropped, delivered multiple times, or delivered out of order. A reliable multicast protocol adds the ability for receivers to detect lost or out-of-order messages and take corrective action (similar in principle to TCP), resulting in a gap-free, in-order message stream.
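The receiver-side behaviour described above can be illustrated with a minimal sketch. It assumes that every message carries a monotonically increasing sequence number, which is a common design but not something mandated by multicast itself. The class name `ReorderBuffer` and its methods are hypothetical and do not come from any particular protocol or library.

```python
from typing import Dict, List


class ReorderBuffer:
    """Hypothetical receiver buffer: turns a lossy, duplicated, reordered
    packet stream into a gap-free, in-order one (assumes sequence numbers)."""

    def __init__(self, first_seq: int = 0) -> None:
        self.next_seq = first_seq             # next sequence number to deliver
        self.pending: Dict[int, bytes] = {}   # packets that arrived early

    def receive(self, seq: int, payload: bytes) -> List[bytes]:
        """Accept one packet and return the payloads that became deliverable."""
        if seq < self.next_seq or seq in self.pending:
            return []                         # duplicate: delivered or buffered already
        self.pending[seq] = payload
        delivered: List[bytes] = []
        # Deliver the longest contiguous run starting at next_seq.
        while self.next_seq in self.pending:
            delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return delivered

    def missing(self) -> List[int]:
        """Sequence numbers below the highest buffered one that have not arrived."""
        if not self.pending:
            return []
        highest = max(self.pending)
        return [s for s in range(self.next_seq, highest) if s not in self.pending]
```

A receiver would call `receive` for every incoming packet and use `missing` to decide which sequence numbers to request again; how that request is sent is protocol-specific and is not shown here.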

Reliability

The exact meaning of reliability depends on the specific protocol. A minimal definition of reliable multicast is eventual delivery of all the data to all the group members, without enforcing any particular delivery order.^[1] However, not all reliable multicast protocols ensure even this level of reliability; many of them trade reliability for efficiency in different ways. For example, while TCP makes the sender responsible for transmission reliability, multicast protocols based on negative acknowledgements (NAKs) shift the responsibility to receivers: the sender never knows for sure that all the receivers have in fact received all the data.^[2] RFC 2887 explores the design space for bulk data transfer, briefly discussing the issues involved and hinting at the different possible meanings of "reliable".
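The sender side of a NAK-based scheme might look roughly like the following sketch. It assumes a bounded retransmission history and sequence-numbered packets; the class `NakSender`, its methods, and the `history_size` parameter are hypothetical names chosen for illustration and are not taken from any specific protocol. Note that the sender keeps no per-receiver acknowledgement state, which is why it cannot know whether every receiver has all the data.

```python
from collections import OrderedDict
from typing import List, Tuple


class NakSender:
    """Hypothetical NAK-based sender: remembers recent packets and
    retransmits only those that a receiver reports as missing."""

    def __init__(self, history_size: int = 1024) -> None:
        self.next_seq = 0
        self.history_size = history_size
        self.history: "OrderedDict[int, bytes]" = OrderedDict()

    def send(self, payload: bytes) -> Tuple[int, bytes]:
        """Assign a sequence number and remember the packet for later repair."""
        seq = self.next_seq
        self.next_seq += 1
        self.history[seq] = payload
        if len(self.history) > self.history_size:
            self.history.popitem(last=False)   # evict the oldest packet
        return seq, payload                    # caller multicasts this pair

    def handle_nak(self, missing: List[int]) -> List[Tuple[int, bytes]]:
        """Return the requested packets that are still in the history.
        Packets already evicted cannot be repaired by this sender."""
        return [(s, self.history[s]) for s in missing if s in self.history]
```

Under these assumptions, a receiver that asks for a packet after it has left the history simply gets nothing back, which is one concrete way such a protocol can give up some reliability in exchange for bounded sender state.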

Reliable Group Data Delivery

Reliable Group Data Delivery (RGDD) is a form of multicasting in which an object is moved from a single source to a fixed set of receivers known before transmission begins.^[3]^[4] A variety of applications may need such delivery: the Hadoop Distributed File System (HDFS) replicates each chunk of data two additional times to specific servers; VM replication to multiple servers may be required to scale out applications; and data replication to multiple servers may be necessary for load balancing, allowing multiple servers to serve the same data from their local cached copies. Such delivery is frequent within datacenters because of the large number of servers that communicate while running highly distributed applications.

RGDD may also occur across datacenters and is sometimes referred to as inter-datacenter Point to Multipoint (P2MP) Transfers.^[5] Such transfers deliver huge volumes of data from one datacenter to multiple datacenters for various applications: search engines distribute search index updates periodically (e.g. every 24 hours), social media applications push new content to many cache locations across the world (e.g. YouTube and Facebook), and backup services make several geographically dispersed copies for increased fault tolerance. To maximize bandwidth utilization and reduce completion times of bulk transfers, a variety of techniques have been proposed for selection of multicast forwarding trees.^[5]^[6]

Virtual synchrony

Modern systems like the Spread Toolkit, Quicksilver, and Corosync can achieve data rates of 10,000 multicasts per second or more, and can scale to large networks with huge numbers of groups or processes.

Most distributed computing platforms support one or more of these models. For example, the widely supported object-oriented CORBA platforms all support transactions and some CORBA products support transactional replication in the one-copy-serializability model. The "CORBA Fault Tolerant Objects standard" is based on the virtual synchrony model. Virtual synchrony was also used in developing the New York Stock Exchange fault-tolerance architecture, the French Air Traffic Control System, the US Navy AEGIS system, IBM's Business Process replication architecture for WebSphere and Microsoft's Windows Clustering architecture for Windows Longhorn enterprise servers.^[7]

Systems that support virtual synchrony

Virtual synchrony was first supported by Cornell University in a system called the "Isis Toolkit".^[8] Cornell's most current version, Vsync, was released in 2013 under the name Isis2 (the name was changed from Isis2 to Vsync in 2015 in the wake of a terrorist attack in Paris by an extremist organization called ISIS), with periodic updates and revisions since that time. The most current stable release is V2.2.2020, released on November 14, 2015; the V2.2.2048 release is currently available in beta form.^[9] Vsync is aimed at the massive data centers that support cloud computing.

Other such systems include the Horus system,^[10] the Transis system, the Totem system, an IBM system called Phoenix, a distributed security key management system called Rampart, the "Ensemble system",^[11] the Quicksilver system, "The OpenAIS project",^[12] its derivative the Corosync Cluster Engine, and a number of products (including the IBM and Microsoft ones mentioned earlier).

Other existing or proposed protocols

Pragmatic General Multicast (PGM)
Tibco Software's TRDP (part of RV). Note: when Tibco acquired Talarian, it inherited a PGM implementation with SmartSockets (SmartPGM). TRDP predates the development of SmartPGM.
OpenDDS, an open source implementation, since its 0.12 release
Scalable Reliable Multicast (SRM)
Reliable Multicast Transport To Large Groups, Jorg Nonnenmacher, EPFL Thesis 1832^[13]
QuickSilver Scalable Multicast (QSM)
SMART Multicast (Secure Multicast for Advanced Repeating of Television)
Reliable Stream Protocol^[14] (RSP), a high-performance open source protocol for compute clusters
TIPC Communication Groups

Library support

JGroups (Java API): popular project/implementation
Spread: C/C++ API, Java API
RMF (C# API)
hmbdc open source (headers only) C++ middleware, ultra-low latency/high throughput, scalable and reliable inter-thread, IPC and network messaging

References

1. Floyd, S.; Jacobson, V.; Liu, C.-G.; McCanne, S.; Zhang, L. (December 1997). "A reliable multicast framework for light-weight sessions and application level framing". IEEE/ACM Transactions on Networking. 5 (6): 784–803. doi:10.1109/90.650139. S2CID 221634489.
2. Diot, C.; Dabbous, W.; Crowcroft, J. (April 1997). "Multipoint communication: A survey of protocols, functions, and mechanisms" (PDF). IEEE Journal on Selected Areas in Communications. 15 (3): 277–290. doi:10.1109/49.564128.
3. C. Guo; et al. (November 1, 2012). "Datacast: A Scalable and Efficient Reliable Group Data Delivery Service For Data Centers". ACM. Retrieved July 26, 2017.
4. T. Zhu; et al. (October 18, 2016). "MCTCP: Congestion-aware and robust multicast TCP in Software-Defined networks". 2016 IEEE/ACM 24th International Symposium on Quality of Service (IWQoS). IEEE. pp. 1–10. doi:10.1109/IWQoS.2016.7590433. ISBN 978-1-5090-2634-0. S2CID 28159768.
5. M. Noormohammadpour; et al. (July 10, 2017). "DCCast: Efficient Point to Multipoint Transfers Across Datacenters". USENIX. Retrieved July 26, 2017.
6. M. Noormohammadpour; et al. (2018). "QuickCast: Fast and Efficient Inter-Datacenter Transfers using Forwarding Tree Cohorts". Retrieved January 23, 2018.
7. K. P. Birman (July 1999). "A Review of Experiences with Reliable Multicast". Software: Practice and Experience. 29 (9): 741–774. doi:10.1002/(SICI)1097-024X(19990725)29:9<741::AID-SPE259>3.0.CO;2-I. hdl:1813/7380.
8. "Isis Toolkit".
9. "Vsync Cloud Computing Library".
10. "Horus system".
11. "Ensemble system".
12. "The OpenAIS project".
13. https://infoscience.epfl.ch/record/32309
14. RSP; info needed.