InfiniBand's features include high throughput, low latency, quality of service and failover, and it is designed to be scalable. The InfiniBand architecture specification defines a connection between processor nodes and high-performance I/O nodes such as storage devices. InfiniBand host bus adapters and network switches are manufactured by Mellanox and Intel (which acquired QLogic's InfiniBand business in January 2012).
InfiniBand forms a superset of the Virtual Interface Architecture (VIA).
Like Fibre Channel, PCI Express, Serial ATA, and many other modern interconnects, InfiniBand offers point-to-point bidirectional serial links intended for the connection of processors with high-speed peripherals such as disks. On top of the point-to-point capabilities, InfiniBand also offers multicast operations. It supports several signaling rates and, as with PCI Express, links can be bonded together for additional throughput. The technology is promoted by the InfiniBand Trade Association.
An InfiniBand link is a serial link operating at one of five data rates: single data rate (SDR), double data rate (DDR), quad data rate (QDR), fourteen data rate (FDR), and enhanced data rate (EDR).
The SDR signaling rate is 2.5 gigabits per second (Gbit/s) in each direction per lane. DDR is 5 Gbit/s and QDR is 10 Gbit/s. FDR is 14.0625 Gbit/s and EDR is 25.78125 Gbit/s per lane.
For SDR, DDR and QDR, links use 8b/10b encoding (every 10 bits sent carry 8 bits of data), making the effective data transmission rate four-fifths the raw rate. Thus single, double, and quad data rates carry 2, 4, or 8 Gbit/s of useful data per lane, respectively. For FDR-10, FDR and EDR, links use 64b/66b encoding (every 66 bits sent carry 64 bits of data). Neither of these calculations takes into account the additional physical layer overhead for comma characters or protocol requirements such as StartOfFrame and EndOfFrame. FDR-10 is not an IBTA-standard data rate; it is proprietary to Mellanox.
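The encoding arithmetic above can be checked with a short calculation. The following is a minimal sketch, not part of any InfiniBand software: the dictionary and function names are illustrative, and the figures are only the per-lane signaling rates and line codes quoted in the text. As the text notes, framing and other physical layer overhead are ignored.

```python
from fractions import Fraction

# Per-lane signaling rates in Gbit/s, as quoted in the text.
SIGNALING_GBPS = {
    "SDR": Fraction("2.5"),
    "DDR": Fraction("5"),
    "QDR": Fraction("10"),
    "FDR": Fraction("14.0625"),
    "EDR": Fraction("25.78125"),
}

# Line-code efficiency: payload bits carried per bit sent on the wire.
EFFICIENCY = {
    "SDR": Fraction(8, 10),   # 8b/10b
    "DDR": Fraction(8, 10),
    "QDR": Fraction(8, 10),
    "FDR": Fraction(64, 66),  # 64b/66b
    "EDR": Fraction(64, 66),
}


def effective_gbps(rate: str) -> Fraction:
    """Return the useful per-lane data rate in Gbit/s (encoding overhead only)."""
    return SIGNALING_GBPS[rate] * EFFICIENCY[rate]


if __name__ == "__main__":
    for name in SIGNALING_GBPS:
        print(f"{name}: {float(effective_gbps(name)):.3f} Gbit/s per lane")
```

Exact fractions are used instead of floats so that the results can be compared directly with the round figures in the text.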
Implementers can aggregate links in units of 4 or 12, called 4X or 12X. A 12X QDR link therefore carries 120 Gbit/s raw, or 96 Gbit/s of useful data. As of 2009, most systems used a 4X aggregate, implying a 10 Gbit/s (SDR), 20 Gbit/s (DDR) or 40 Gbit/s (QDR) connection. Larger systems with 12X links are typically used for cluster and supercomputer interconnects and for inter-switch connections.
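The aggregate figures follow from multiplying the per-lane rate by the link width. A minimal sketch, assuming 8b/10b encoding by default (which applies to SDR, DDR and QDR only); the function name `link_rates` is hypothetical, and accepting a width of 1 for a single unaggregated lane is an assumption of this example:

```python
from fractions import Fraction

EIGHT_TEN = Fraction(8, 10)  # 8b/10b efficiency, used by SDR, DDR and QDR


def link_rates(lane_gbps, width, efficiency=EIGHT_TEN):
    """Return (raw, useful) Gbit/s for a link made of `width` lanes.

    `lane_gbps` is the per-lane signaling rate; widths 4 and 12 are the
    4X and 12X aggregates, and 1 stands for a single lane.
    """
    if width not in (1, 4, 12):
        raise ValueError("width must be 1, 4 or 12")
    raw = Fraction(str(lane_gbps)) * width
    return raw, raw * efficiency


if __name__ == "__main__":
    for name, lane in (("SDR", "2.5"), ("DDR", 5), ("QDR", 10)):
        raw, useful = link_rates(lane, 4)
        print(f"4X {name}: {float(raw):g} Gbit/s raw, {float(useful):g} Gbit/s useful")
```

For FDR or EDR lanes, `Fraction(64, 66)` would be passed as `efficiency` instead.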
The InfiniBand roadmap also includes "HDR" (High Data Rate), with a signaling rate of 50 Gbit/s per lane, expected in 2017, and "NDR" (Next Data Rate), due "some time later". As of March 2014, NDR had not yet been tied to a specific speed.
The single data rate switch chips have a latency of 200 nanoseconds, DDR switch chips have a latency of 140 nanoseconds and QDR switch chips have a latency of 100 nanoseconds. The end-to-end latency range spans from 1.07 microseconds MPI latency (Mellanox ConnectX QDR HCAs) to 1.29 microseconds MPI latency (QLogic InfiniPath HCAs) to 2.6 microseconds (Mellanox InfiniHost DDR III HCAs). As of 2009, various InfiniBand host channel adapters (HCAs) existed in the market, each with different latency and bandwidth characteristics. InfiniBand also provides RDMA capabilities for low CPU overhead. The latency for RDMA operations is less than 1 microsecond (Mellanox ConnectX HCAs).
InfiniBand uses a switched fabric topology, as opposed to early shared media Ethernet. All transmissions begin or end at a "channel adapter." Each processor contains a host channel adapter (HCA) and each peripheral has a target channel adapter (TCA). These adapters can also exchange information for security or quality of service (QoS).
InfiniBand transmits data in packets of up to 4 KB that are taken together to form a message. A message can be:
- a direct memory access read from, or write to, a remote node (RDMA)
- a channel send or receive
- a transaction-based operation (that can be reversed)
- a multicast transmission
- an atomic operation
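The relationship between messages and packets can be illustrated with a toy segmentation routine. This is only a sketch of the idea, not how an adapter implements it: it assumes that "4 KB" means a 4096-byte payload limit, it ignores packet headers and checksums, and the names `segment` and `reassemble` are hypothetical.

```python
from typing import List

MAX_PAYLOAD = 4096  # assumed meaning of "up to 4 KB" per packet


def segment(message: bytes, max_payload: int = MAX_PAYLOAD) -> List[bytes]:
    """Split a message into payload chunks of at most `max_payload` bytes."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    return [message[i:i + max_payload]
            for i in range(0, len(message), max_payload)]


def reassemble(packets: List[bytes]) -> bytes:
    """Join in-order payload chunks back into the original message."""
    return b"".join(packets)


if __name__ == "__main__":
    msg = bytes(10000)
    parts = segment(msg)
    print([len(p) for p in parts])
```

A 10,000-byte message therefore travels as two full packets and one partial packet, which the receiver joins back into a single message.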
InfiniBand has been adopted in enterprise datacenters (for example Oracle Exadata Database Machine, Oracle Exalogic Elastic Cloud, Oracle SPARC SuperCluster and Teradata), in the financial sector, by IaaS cloud providers (such as OrionVM and ProfitBricks), in cloud computing (an InfiniBand-based system won the best of VMworld for Cloud Computing), and in scalable database systems like IBM DB2 pureScale. InfiniBand has mostly been used for high-performance computer cluster applications. A number of the TOP500 supercomputers have used InfiniBand, including the former fastest supercomputer, the IBM Roadrunner.
SGI, LSI, DDN, NetApp, Oracle, Nimbus Data and Rorke Data, among others, have also released storage utilizing InfiniBand "target adapters". These products compete with architectures such as Fibre Channel, SCSI, and other more traditional connectivity methods. Such target adapter-based disks can become a part of the fabric of a given network, in a fashion similar to DEC VMS clustering. The advantage of this configuration is lower latency and higher availability to nodes on the network (because of the fabric nature of the network). In 2009, the Oak Ridge National Laboratory Spider storage system used this type of InfiniBand-attached storage to deliver over 240 gigabytes per second of bandwidth.
Military applications such as UAVs, UUVs and electronic warfare are taking this technology into the rugged application space to enhance capabilities. InfiniBand is used in high-performance embedded computing systems such as radar, sonar and SIGINT applications. Companies such as GE Intelligent Platforms and Mercury Computer Systems produce military-grade single-board computers that are InfiniBand capable.
Early InfiniBand used copper CX4 cable for SDR and DDR rates with 4x ports. The same cable is also commonly used to connect SAS (Serial Attached SCSI) HBAs to external SAS disk arrays. With SAS, this is known as an SFF-8470 connector, and is referred to as an "InfiniBand-style" connector. For 12x ports, SFF-8470 12x is used.
InfiniBand has no standard programming API within the specification. The standard only lists a set of "verbs", functions that must exist. The syntax of these functions is left to the vendors. The de facto standard has been the syntax developed by the OpenFabrics Alliance, which was adopted by most of the InfiniBand vendors for GNU/Linux, FreeBSD, and MS Windows. The InfiniBand software stack developed by the OpenFabrics Alliance is released as "OpenFabrics Enterprise Distribution (OFED)" under a choice of two licenses, GPL2 or BSD, for GNU/Linux and FreeBSD, and as "WinOF" under a BSD license for Windows.
InfiniBand originated from the 1999 merger of two competing designs:
- Future I/O, developed by Compaq, IBM, and Hewlett-Packard
- Next Generation I/O (NGIO), developed by Intel, Microsoft, and Sun
InfiniBand was originally envisioned by the authors of its specification as a comprehensive "system area network" that would connect CPUs and provide all high speed I/O for "back-office" applications. In this role it would potentially replace just about every datacenter I/O standard including PCI, Fibre Channel, and various networks like Ethernet. Instead, all of the CPUs and peripherals would be connected into a single pan-datacenter switched InfiniBand fabric. This vision offered a number of advantages in addition to greater speed, not the least of which is that I/O workload would be largely lifted from computer and storage. In theory, this should make the construction of clusters much easier, and potentially less expensive, because more devices could be shared and they could be easily moved around as workloads shifted. Proponents of a less comprehensive vision saw InfiniBand as a pervasive, low latency, high bandwidth, low overhead interconnect for commercial datacenters, albeit one that might perhaps only connect servers and storage to each other, while leaving more local connections to other protocols and standards such as PCI.
As of 2009, InfiniBand had become a popular interconnect for high-performance computing, and its adoption in the TOP500 supercomputers list was growing faster than that of Ethernet. InfiniBand has also been increasingly adopted in enterprise datacenters.
In 2008 Oracle Corporation released its HP Oracle Database Machine, built as a RAC (Real Application Clusters) database with storage provided by its Exadata Storage Server, which uses InfiniBand as the backend interconnect for all I/O and interconnect traffic. Updated versions of the Exadata Storage system, now using Sun computing hardware, continue to use InfiniBand infrastructure.
In 2009, IBM announced a December 2009 release date for their DB2 pureScale offering, a shared-disk clustering scheme (inspired by parallel sysplex for DB2 z/OS) that uses a cluster of IBM System p servers (POWER6/7) communicating with each other over an InfiniBand interconnect.
In 2010, scale-out network storage manufacturers increasingly adopted InfiniBand as the primary cluster interconnect for modern NAS designs, such as Isilon IQ or IBM SONAS. Since scale-out systems run distributed metadata operations without a "master node", low-latency internal communication is critical for scalability and performance.
In 2010, Oracle released Exadata, Exalogic and SPARC SuperCluster machines that implement QDR InfiniBand at 40 Gbit/s (32 Gbit/s effective) using Sun switches (Sun Network QDR InfiniBand Gateway Switch). The InfiniBand fabric is used to connect compute nodes to each other and to the storage, and also to connect several Exadata and Exalogic machines.
Chinese researchers have reportedly developed an interconnect that "can handle data at about twice the speed of InfiniBand." This technology was used in the Tianhe-I supercomputer, which took first place in the TOP500 listing in October 2010.
- InfiniBand Trade Association (IBTA)
- RDMA over Converged Ethernet (RoCE)
- SCSI RDMA Protocol (SRP)
- iSCSI Extensions for RDMA (iSER)
- List of device bandwidths
- Optical interconnect
- Interconnect bottleneck
- Optical fiber cable
- Optical communication
- Parallel optical interface
- An Introduction to the InfiniBand Architecture, O’Reilly, 2002-02-04.
- The InfiniBand Trade Association.
- An InfiniBand Technology Overview, The InfiniBand Trade Association.
- Dissecting a Small InfiniBand Application Using the Verbs API (tutorial), Arxiv
- "Is InfiniBand poised for a comeback?", Infostor 10 (2).
- "InfiniBand edging into storage market", Infostor 10 (11).
- OpenFabrics Alliance.