iWARP

iWARP is a computer networking protocol that implements remote direct memory access (RDMA) for efficient data transfer over Internet Protocol (IP) networks. It is sometimes referred to simply as "RDMA", although RDMA is not exclusive to iWARP.

History

In 2007, the Internet Engineering Task Force (IETF) published the RDMA Consortium's RDMA over Transmission Control Protocol (TCP) standard.[1] iWARP is a superset of the Virtual Interface Architecture (VIA) that permits zero-copy transmission over TCP and the Stream Control Transmission Protocol (SCTP). It is often compared to InfiniBand and RoCE, both of which are also based on VIA, and differs from them in that it runs on top of IP networks (typically over Ethernet).

Protocol

The main component in the iWARP protocol is the Direct Data Placement Protocol (DDP),[2] which permits the actual zero-copy transmission. DDP itself does not perform the transmission; the underlying protocol (TCP or SCTP) does.
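The idea of direct placement can be illustrated with a small sketch. It is a conceptual toy, not the DDP wire format: the class name BufferTable and its methods are hypothetical, and a Python bytearray stands in for registered memory. It assumes the tagged-buffer model, in which each incoming segment names a destination buffer (a steering tag) and an offset, so the receiver can write the payload straight to its final location without staging it in an intermediate buffer, even when segments arrive out of order.

```python
class BufferTable:
    """Toy model of receiver-side buffers addressed by a steering tag.

    All names here are illustrative assumptions, not part of any real API.
    """

    def __init__(self):
        self._buffers = {}
        self._next_stag = 1

    def register(self, size: int) -> int:
        """Advertise a buffer of `size` bytes and return its steering tag."""
        stag = self._next_stag
        self._next_stag += 1
        self._buffers[stag] = bytearray(size)
        return stag

    def place(self, stag: int, offset: int, payload: bytes) -> None:
        """Write `payload` directly at `offset` in the tagged buffer."""
        buf = self._buffers.get(stag)
        if buf is None:
            raise KeyError(f"unknown steering tag {stag}")
        if offset < 0 or offset + len(payload) > len(buf):
            raise ValueError("placement outside the registered buffer")
        buf[offset:offset + len(payload)] = payload

    def snapshot(self, stag: int) -> bytes:
        """Return a copy of the buffer contents (for inspection only)."""
        return bytes(self._buffers[stag])


table = BufferTable()
stag = table.register(10)
# Segments may be placed in any order because each carries its own offset.
table.place(stag, 5, b"world")
table.place(stag, 0, b"hello")
```

Because every segment is self-describing, the receiver in this sketch needs no reassembly queue; the bounds check stands in for the access validation a real implementation would perform.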

However, TCP does not preserve message boundaries; it delivers data as a sequence of bytes without regard to protocol data units (PDUs). In this regard, DDP may be better suited to SCTP, and indeed the IETF has published a proposed standard for RDMA over SCTP.[3] Running DDP over TCP requires an adaptation layer known as Marker PDU Aligned (MPA) framing[4] to guarantee message boundaries.
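The boundary problem and the framing remedy can be sketched as follows. This is a simplified illustration under stated assumptions, not an implementation of MPA: it keeps only a 2-byte length field, padding to a 4-byte multiple, and a trailing 4-byte checksum, and it omits the periodic markers entirely. The function names frame and deframe are hypothetical, and zlib.crc32 is used as a stand-in for the CRC32c checksum that MPA specifies.

```python
import struct
import zlib


def frame(ulpdu: bytes) -> bytes:
    """Wrap one upper-layer PDU: 2-byte length, payload, padding, checksum."""
    header = struct.pack("!H", len(ulpdu))
    pad = b"\x00" * (-(2 + len(ulpdu)) % 4)  # pad header+payload to 4 bytes
    body = header + ulpdu + pad
    # Assumption: zlib.crc32 stands in for CRC32c, which is a different CRC.
    return body + struct.pack("!I", zlib.crc32(body))


def deframe(stream: bytes):
    """Extract complete payloads; return (payloads, unconsumed bytes)."""
    payloads = []
    pos = 0
    while len(stream) - pos >= 2:
        (length,) = struct.unpack_from("!H", stream, pos)
        padded = 2 + length + (-(2 + length) % 4)
        total = padded + 4
        if len(stream) - pos < total:
            break  # frame is incomplete; wait for more bytes
        body = stream[pos:pos + padded]
        (crc,) = struct.unpack_from("!I", stream, pos + padded)
        if zlib.crc32(body) != crc:
            raise ValueError("checksum mismatch")
        payloads.append(body[2:2 + length])
        pos += total
    return payloads, stream[pos:]


messages = [b"hello", b"", b"x" * 1000]
wire = b"".join(frame(m) for m in messages)

# Simulate TCP delivering the stream in arbitrary 7-byte pieces.
received, pending = [], b""
for i in range(0, len(wire), 7):
    done, pending = deframe(pending + wire[i:i + 7])
    received.extend(done)
```

The receiver recovers the original messages regardless of how the byte stream was cut, because each frame carries its own length.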

Furthermore, DDP is not intended to be accessed directly. Instead, a separate RDMA protocol (RDMAP) provides the services to read and write data. Therefore, the entire RDMA over TCP specification is really RDMAP over DDP over MPA over TCP. All of these protocols can be implemented in hardware.
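The order of encapsulation in this stack can be made concrete with a toy sketch. The text labels below are placeholders chosen for illustration; they are not the real headers of any of these protocols, and the function names encapsulate and decapsulate are hypothetical.

```python
# Layers from top to bottom; TCP then carries the resulting bytes.
LAYERS = ["RDMAP", "DDP", "MPA"]


def encapsulate(payload: bytes) -> bytes:
    """Wrap the payload in each layer in turn, top layer first."""
    for name in LAYERS:
        payload = name.encode() + b"|" + payload
    return payload


def decapsulate(segment: bytes) -> bytes:
    """Strip the layers in the opposite order, outermost first."""
    for name in reversed(LAYERS):
        prefix = name.encode() + b"|"
        if not segment.startswith(prefix):
            raise ValueError(f"expected {name} header")
        segment = segment[len(prefix):]
    return segment


wire = encapsulate(b"data")
```

In this sketch the MPA label ends up outermost, mirroring the fact that MPA is the layer sitting directly on TCP.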

Unlike InfiniBand (IB), iWARP offers only reliable connected communication, because this is the only service that TCP and SCTP provide. The iWARP specification also omits many of the special features of IB, such as atomic remote operations.

Implementation

Because a kernel implementation of the TCP stack can be a bottleneck, the protocol is typically implemented in hardware, in RDMA network interface controllers (rNICs). Because data loss is rare in tightly coupled network environments, the error-recovery mechanisms of TCP may be handled in software, while the more frequently performed communication is handled entirely by logic embedded on the rNIC. Similarly, connections are often established entirely in software and then handed off to the hardware. Furthermore, the handling of iWARP-specific protocol details is often isolated from the TCP implementation, allowing rNICs to be used for both RDMA offload and TCP offload (in support of traditional sockets-based TCP/IP applications). The portion of the hardware that implements TCP is known as the TCP offload engine (TOE).

A TOE by itself does not prevent copying on the receive side; it must be combined with RDMA hardware to achieve zero-copy transfers. The RDMA over TCP specification is a set of wire protocols intended to be implemented in hardware, although they can also be emulated in software for compatibility, without the performance benefits.

Interfaces

iWARP is a protocol, not an implementation, but it defines protocol behavior in terms of the operations that are legal for the protocol, known as Verbs. As such, iWARP does not have a single standard programming interface. However, programming interfaces tend to correspond very closely to the Verbs.

Several programming interfaces have been proposed and implemented in different forms, including uDAPL, kDAPL, IT-API, RNICPI, and OpenFabrics Verbs. Implementations of these interfaces are available for different platforms, including Windows and Linux.

Services available

Networking services implemented over iWARP include those offered in the OpenFabrics Enterprise Distribution (OFED) by the OpenFabrics Alliance for Linux operating systems, and the Winsock Direct protocol for Microsoft Windows.

References

  1. ^ R. Recio et al. (October 2007). "A Remote Direct Memory Access Protocol Specification". RFC 5040. 
  2. ^ H. Shah et al. (October 2007). "Direct Data Placement over Reliable Transports". RFC 5041. 
  3. ^ C. Bestler et al. (October 2007). "Stream Control Transmission Protocol (SCTP) Direct Data Placement (DDP) Adaptation". RFC 5043. 
  4. ^ P. Culley et al. (October 2007). "Marker PDU Aligned Framing for TCP Specification". RFC 5044. 