RTP-MIDI (also known as AppleMIDI) is a protocol to transport MIDI messages within RTP (Real-time Transport Protocol) packets over Ethernet and WiFi networks. It is completely open and free (no license is needed), and is suitable for both LAN and WAN applications. Compared to MIDI 1.0, RTP-MIDI adds features such as session management, device synchronization and detection of lost packets (with automatic regeneration of lost data). RTP-MIDI is compatible with real-time applications, and supports sample-accurate synchronization for each MIDI message.
- 1 History of RTP-MIDI
- 2 Implementations
- 3 Protocol
- 4 Apple's session protocol
- 5 Latency
- 6 Configuration
- 7 Companies/Projects using RTP-MIDI
- 8 References
History of RTP-MIDI
In 2004, John Lazzaro and John Wawrzynek, from UC Berkeley, gave a presentation to the AES titled "An RTP payload for MIDI". In 2006, the document was submitted to the IETF and was published as RFC 4695. In parallel, Lazzaro and Wawrzynek released another document giving details about the practical implementation of the RTP-MIDI protocol, especially the journalling mechanism.
The MMA (MIDI Manufacturers Association) has created a page on its website to provide basic information about the RTP-MIDI protocol. A central information website (aimed at end users and implementors) has also been created to answer questions related to RTP-MIDI.
Implementations
Apple Computer introduced RTP-MIDI as a part of their operating system, Mac OS X v10.4, in 2005. The RTP-MIDI driver is reached using the Network icon in the MIDI/Audio Configuration tool (in the Utilities folder). Apple's implementation strictly follows the RFC 4695 for RTP payload and journalling system, but uses a dedicated session management protocol (they do not follow the RFC 4695 session management proposal). This protocol is displayed in Wireshark as "AppleMIDI" (see below).
Apple also created a dedicated class in their mDNS implementations (known as "Bonjour"). Devices which comply with this class appear automatically in Apple's RTP-MIDI configuration panel as the Participants directory, making the Apple MIDI system fully 'Plug & Play'. However, it's possible to manually enter IP addresses and ports in this directory to connect to devices which do not support Bonjour.
Apple also introduced RTP-MIDI support in iOS 4, but such devices cannot be session initiators.
The RTP-MIDI driver from Apple creates virtual MIDI ports (named "Sessions") which are available as MIDI ports in any software (like sequencers or software instruments) using CoreMIDI, where they appear as a pair of MIDI IN / MIDI OUT ports (like any other MIDI 1.0 port or USB MIDI ports).
In 2006, the Dutch company Kiss-Box presented a first embedded implementation of RTP-MIDI, in different products like MIDI or LTC interfaces. These devices comply with AppleMIDI implementation (using the same session management protocol), in order to be compatible with the other devices and operating system using this protocol.
A proprietary driver was initially developed by this company for Windows XP, but it was restricted to the communication with their devices (it was not possible to connect a PC with a Mac computer using this driver). The support of this driver was dropped in 2012 in favor of the standard approach when rtpMIDI driver for Windows became available.
In 2012, Kiss-Box announced the release of a new generation of CPU boards (named "V3") which support session initiator functionality (these models are able to establish sessions with other RTP-MIDI devices without requiring a computer as a control point).
During NAMM2013, the Canadian company iConnectivity presented a new interface named iConnectivity4+ which supports RTP-MIDI and allows direct bridging between USB and RTP-MIDI devices.
Tobias Erichsen released in 2010 a Windows implementation of Apple's RTP-MIDI driver. This driver works under Windows XP, Vista, Windows 7 and Windows 8, in both 32-bit and 64-bit versions. The driver uses a configuration panel very similar to Apple's, and is fully compliant with Apple's implementation. It can therefore be used to connect a Windows machine with a Macintosh computer, but also with embedded systems. As with Apple's driver, the Windows driver creates virtual MIDI ports, which become visible to any MIDI application running on the PC (access is done through the mmsystem layer, like all other MIDI ports).
RTP-MIDI support for Linux was reactivated in February 2013 after an idle period. The availability of drivers has been announced on some forums, based on the original work of Nicolas Falquet and Dominique Fober. A specific implementation for the Raspberry Pi computer is also being developed, based on the MIDIKit open source project. A full implementation of RTP-MIDI (including the journalling system) is available within the Ubuntu distribution, in the Scenic software package.
Apple added full CoreMIDI support to its iOS devices in 2010, allowing the development of MIDI applications for iPhone, iPad and iPod. MIDI then became available from the docking port in the form of a USB controller (to allow connection of USB MIDI devices using the "Apple Camera Kit"), but it was also available in the form of an RTP-MIDI session listener over WiFi. iOS devices do not support session initiation functionality, so an external session initiator is required on the network to open an RTP-MIDI session with the iPad. This session initiator can be a Mac computer or a Windows computer with the RTP-MIDI driver activated, or an embedded RTP-MIDI device. The RTP-MIDI session appears under the name "Network MIDI" to all CoreMIDI applications on iOS, and no specific development is required to add RTP-MIDI support to an iOS application (the MIDI port is virtualized by CoreMIDI, so the programmer just needs to open a MIDI connection, without having to care whether the port is connected to USB or RTP-MIDI).
It should be noted that some complaints arose about the use of MIDI over USB with iOS devices, since the iPad/iPhone must supply power to the external device. Some USB MIDI adapters draw too much current for the iPad, which limits the current and blocks the startup of the device (which then does not appear as available to the application). This problem is avoided by the use of RTP-MIDI.
Cross-platform Java implementations of RTP-MIDI are possible, notably with the 'nmj' library.
The WinRTP-MIDI project is an open-source implementation of the RTP-MIDI protocol stack under Windows RT. The code was initially designed to be portable between the various versions of Windows, but the latest version has been optimized for WinRT, in order to simplify the design of applications for the Windows Store.
RTP-MIDI was included in the Arduino open platform in November 2013 (under the name "AppleMIDI library"). The software module can run either on Arduino modules with an integrated Ethernet adapter (like the Intel Galileo) or on the "Ethernet shield".
Another solution for Arduino has been presented in an Application Note published by KissBox for their RTP-MIDI OEM module. This solution is similar to the one used by the MIDIBox community (external communication processor board, connected over a fast SPI link). It makes it possible to use Arduino boards with a very limited amount of RAM, like the Arduino Uno, and completely frees the microprocessor for the user's tasks, since the external module implements the whole communication stack and acts as a buffer between the network and the Arduino.
In December 2013, two members of the MIDIBox DIY (Do It Yourself) group started to work on a first software version of MIOS (MIDIBox Operating System) including RTP-MIDI support over a fast SPI link. In order to simplify integration, it was decided to use an external network processor board handling the whole protocol stack. A first beta version was released in the second week of January 2014. The first official software was released during the first week of March 2014.
The protocol used on the SPI link between the MIOS processor and the network processor is based on the same format as USB (using 32-bit words containing a complete MIDI message) and has been proposed as an open standard for communication between network processor modules and MIDI application boards.
The Axoloti is an open-source hardware synthesizer based on an STM32F427 ARM processor. This synthesizer is fully programmable using a virtual patch concept, similar to Max/MSP, and includes full MIDI support. A node.js extension has been developed to allow RTP-MIDI connection of an Axoloti with any RTP-MIDI device. The Axoloti hardware can also be equipped with an external RTP-MIDI coprocessor, connected via the SPI bus available on the expansion port of the Axoloti core. The approach is the same as the one described for Arduino and MIDIBox.
Since RTP-MIDI is based on UDP/IP, any application can implement the protocol directly, without needing any driver. The drivers are needed only when users want to make the networked MIDI ports appear as a standard MIDI port. For example, some Max/MSP objects and VST plugins have been developed following this methodology.
RTP-MIDI over AVB
AVB is a set of technical standards which define specifications for extremely low latency streaming services over Ethernet networks. AVB networks are able to provide latencies down to one audio sample across a complete network.
RTP-MIDI is natively compatible with AVB networks (like any other IP protocol), since AVB switches (also known as "IEEE802.1 switches") automatically manage the priority between real-time audio/video streams and IP traffic (IP traffic is assigned a lower priority to avoid any disturbance on the streams).
The RTP-MIDI protocol can also use the real-time capabilities of AVB if the device implements the RTCP payload described in the IEEE-1733 document. RTP-MIDI applications can then correlate the "presentation" timestamp (provided by the IEEE-802.1 Master Clock) with the RTP timestamp, ensuring a sample-accurate time distribution of the MIDI events.
Protocol
RFC 4695/RFC 6295 split the RTP-MIDI implementation into different parts. The only mandatory one (which defines compliance with the RTP-MIDI specification) is the payload format. The journalling part is optional (but RTP-MIDI packets shall indicate that they have an empty journal, so the journal is always present in the RTP-MIDI packet, even if it is empty). The session initiation/management part is purely informational (and was not used by Apple, who created its own session management protocol).
|Section||Bit offset||Fields (in transmission order)|
|RTP||0||V (2 bits)||P (1 bit)||X (1 bit)||CC (4 bits)||M (1 bit)||Payload type (PT, 7 bits)||Sequence number (16 bits)|
|RTP||32||Timestamp (32 bits)|
|RTP||64||Synchronization source (SSRC) identifier|
|RTP||96||Contributing source (CSRC) identifiers (optional)|
|MIDI commands||…||B||J||Z||P||LEN…||MIDI messages list…|
|Journal (optional depending on J flag)||…||S||Y||A||H||TOTCHAN||Checkpoint Packet Seqnum||System journal (optional)…|
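The header layout in the table can be illustrated with a minimal Python sketch. It decodes the fixed 12-byte RTP header (as defined for RTP in general) and the flag bits of the first octet of the MIDI command section. The names `RtpHeader`, `parse_rtp_header` and `parse_midi_flags` are illustrative only and do not come from any particular library. CSRC entries and the journal are not decoded here.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the fixed 12-byte RTP header (network byte order)."""
    if len(packet) < 12:
        raise ValueError("an RTP header needs at least 12 bytes")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=b0 >> 6,              # V: 2 bits
        padding=bool(b0 & 0x20),      # P: 1 bit
        extension=bool(b0 & 0x10),    # X: 1 bit
        csrc_count=b0 & 0x0F,         # CC: 4 bits
        marker=bool(b1 & 0x80),       # M: 1 bit
        payload_type=b1 & 0x7F,       # PT: 7 bits
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
    )


def parse_midi_flags(first_octet: int) -> dict:
    """Split the first octet of the MIDI command section.

    Only the four LEN bits carried in this octet are returned; when B is
    set, the length field continues in the following octet.
    """
    return {
        "B": bool(first_octet & 0x80),
        "J": bool(first_octet & 0x40),   # J set: a journal follows
        "Z": bool(first_octet & 0x20),
        "P": bool(first_octet & 0x10),
        "LEN": first_octet & 0x0F,
    }
```

For instance, a header starting with the bytes `0x80 0x61` decodes as version 2 with payload type 97 (an arbitrary dynamic payload type chosen for this example).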
RTP-MIDI sessions are in charge of creating a virtual path between two RTP-MIDI devices, and they appear as a MIDI IN / MIDI OUT pair from the application point of view. RFC 6295 proposes to use SIP (Session Initiation Protocol) and SDP (Session Description Protocol), but Apple decided to create its own session management protocol. Apple's protocol makes it possible to link sessions with the names used on Bonjour, and also offers a clock synchronization service.
A given session is always created between two, and only two, participants (each session being used to detect potential message loss between the two participants). However, a given session controller can open multiple sessions in parallel, which enables the capabilities described hereafter (splitting / merging / distributed patchbay). In the diagram given here, device 1 therefore has two sessions open at the same time (one with device 2 and another one with device 3), but the two sessions in device 1 appear as the same virtual MIDI interface to the final user.
Sessions vs. endpoints
A common mistake is to confuse RTP-MIDI endpoints with RTP-MIDI sessions, since they both represent a pair of MIDI IN / MIDI OUT ports.
An endpoint is used to exchange MIDI data between the element (software and/or hardware) in charge of decoding the RTP-MIDI transport protocol and the element using the MIDI messages. In other words, only MIDI data are visible at endpoint level. For devices with MIDI 1.0 DIN connectors, there is one endpoint per connector pair (e.g. 2 endpoints for KissBox MIDI2TR, 4 endpoints for iConnectivity4+, etc.). Devices using other communication links (like SPI or USB) can offer more endpoints (for example, a device using the 32-bit encoding of the USB MIDI Class can represent up to 16 endpoints using the Cable Identifier field). An endpoint is represented on the RTP-MIDI side by a pair of UDP ports when the AppleMIDI session protocol is used.
A session defines a connection between two endpoints (the MIDI IN of one endpoint being connected to the MIDI OUT of the remote endpoint, and vice versa). A single endpoint can accept multiple sessions, depending on the software configuration. Each session for a given endpoint appears as a singleton to the remote session handler (a remote session handler does not know whether the endpoint it is connected to is being used by other sessions at the same time). If multiple sessions are active for a given endpoint, the different MIDI streams reaching the endpoint are merged before the MIDI data are sent to the application. In the other direction, MIDI data produced by an application are sent to all session handlers which are connected to this endpoint.
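The relationship between one endpoint and several sessions can be sketched in a few lines of Python. This is only an illustration of the merge and fan-out behaviour described above: `Session` and `Endpoint` are hypothetical classes, and the network side is replaced by a plain list so that the example stays self-contained.

```python
from typing import Callable, List


class Session:
    """Hypothetical stand-in for one session with a single remote partner."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.sent: List[bytes] = []  # what would be sent over the network

    def send(self, midi_data: bytes) -> None:
        self.sent.append(midi_data)


class Endpoint:
    """One MIDI IN / MIDI OUT pair that may be shared by several sessions."""

    def __init__(self, on_midi_in: Callable[[bytes], None]) -> None:
        self._sessions: List[Session] = []
        self._on_midi_in = on_midi_in  # application-side MIDI IN callback

    def attach(self, session: Session) -> None:
        self._sessions.append(session)

    def receive_from(self, session: Session, midi_data: bytes) -> None:
        # Merge: data from every session reaches the same application port.
        self._on_midi_in(midi_data)

    def send_from_application(self, midi_data: bytes) -> None:
        # Fan-out: every attached session gets a copy of the outgoing data.
        for session in self._sessions:
            session.send(midi_data)
```

With two sessions attached, a note received from either partner arrives at the same application callback, and a note sent by the application is copied to both partners.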
AppleMIDI session participants
The AppleMIDI implementation defines two kinds of session controllers: session initiators and session listeners. Session initiators are in charge of inviting the session listeners, and are responsible for the clock synchronization sequence. Session initiators can generally also be session listeners, but some devices (e.g. iOS devices) can be session listeners only.
RTP-MIDI devices are able to merge different MIDI streams without needing any specific component (MIDI 1.0 devices require "MIDI mergers"). As it can be seen on the diagram, when a session controller is connected to two or more remote sessions, it merges automatically the MIDI streams coming from the distant devices, without requiring any specific configuration.
MIDI splitting ("MIDI THRU")
RTP-MIDI devices are able to duplicate MIDI streams from one session to any number of remote sessions without requiring any "MIDI THRU" support device. When an RTP-MIDI session is connected to two or more remote sessions, all the remote sessions receive a copy of the MIDI data sent from the source.
Distributed patchbay concept
RTP-MIDI sessions are also able to provide intrinsically the "patchbay" feature, which required a separate hardware device with MIDI 1.0 connections. A MIDI 1.0 patchbay is a hardware device which allows dynamic connections between a set of MIDI inputs and a set of MIDI outputs, most of the time in the form of a matrix. The concept of "dynamic" connection stands in opposition to the classical use of MIDI 1.0 lines, where cables were connected "statically" between two devices. Rather than establishing the data path between devices in the form of a cable, the patchbay becomes a central point to which all MIDI devices are connected. The software in the MIDI patchbay is then configured to define which MIDI input goes to which MIDI output, and the user can change this configuration at any moment, without needing to disconnect the MIDI DIN cables.
The "patchbay" hardware modules are no longer needed with RTP-MIDI, thanks to the session concept. The sessions are, by definition, virtual paths established over the network between two MIDI ports. No specific software is needed to perform the patchbay functions, since the configuration process precisely defines the destinations of each MIDI stream produced by a given MIDI device. These virtual paths can be changed at any time just by changing the destination IP addresses used by each session initiator. The "patch" configuration formed in this way can be stored in non-volatile memory, so that the patch re-forms automatically when the setup is powered up, but it can also be changed directly in RAM (for example with the RTP-MIDI Manager software tool or with the control panels of the RTP-MIDI drivers).
The "distributed patchbay" term comes from the fact that the different RTP-MIDI devices can be distributed geographically all over the complete MIDI setup, while MIDI 1.0 patchbays forced the different MIDI devices to be physically located directly around the patchbay device itself.
Apple's session protocol
The RFC 6295 document proposes using SDP (Session Description Protocol) and SIP (Session Initiation Protocol) to establish and manage sessions between RTP-MIDI partners. These two protocols are however quite heavy to implement, especially on small systems, and they do not constrain any of the parameters enumerated in the session descriptor (like the sampling frequency, which in turn defines all fields related to timing data both in the RTP headers and in the RTP-MIDI payload). Moreover, the RFC 6295 document only suggests using these protocols and allows any other protocol to be used, leading to potential incompatibilities between suppliers.
Apple therefore decided to create its own protocol, imposing all parameters related to synchronization, such as the sampling frequency. This session protocol is called "AppleMIDI" in the Wireshark software. Session management with the AppleMIDI protocol requires two UDP ports: the first one is called the "Control Port", the second one the "Data Port". In a multithreaded implementation, only the Data Port requires a "real-time" thread; the other port can be handled by a normal-priority thread. These two ports must be located at two consecutive locations (n / n+1); the first one can be any valid UDP port number that leaves n+1 available.
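The n / n+1 port rule can be illustrated with a short Python sketch based on the standard `socket` module. The function names `port_pair` and `open_port_pair` are hypothetical, and the bind address is an assumption chosen for the example.

```python
import socket
from typing import Tuple


def port_pair(control_port: int) -> Tuple[int, int]:
    """Return (control, data) port numbers following the n / n+1 rule."""
    if not 1 <= control_port <= 65534:
        raise ValueError("control port must leave room for the data port at n+1")
    return control_port, control_port + 1


def open_port_pair(control_port: int, host: str = "0.0.0.0"):
    """Bind two UDP sockets on consecutive ports: control first, then data."""
    control_port, data_port = port_pair(control_port)
    control = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    data = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        control.bind((host, control_port))
        data.bind((host, data_port))
    except OSError:
        # One of the two ports is unavailable: release both and report it.
        control.close()
        data.close()
        raise
    return control, data
```

For example, `port_pair(5004)` yields `(5004, 5005)`; the value 5004 is only an example here. In a multithreaded design, the socket returned second would be the one served by the real-time thread.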
There is no constraint on the number of sessions which can be opened simultaneously on a given set of UDP ports with the AppleMIDI protocol. It is therefore possible either to create one port group per session manager, or to use only one group for multiple sessions (which limits the memory footprint in the system). In the latter case, the IP stack provides resources to identify partners from their IP addresses and port numbers (this functionality is called "socket reuse" and is available in most modern IP implementations).
All AppleMIDI protocol messages use a common structure of 4 words of 32 bits, with a header containing two bytes with value 255, followed by two bytes describing the meaning of the message:
|Description||Wireshark header definition||Field value (hex)||Field value (chars)|
These messages control a state machine related to each session (for example, this state machine forbids any MIDI data exchange until a session reaches the "opened" state).
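As a minimal sketch of the common message header described above (two bytes with value 255 followed by a two-byte message code), the following Python helpers build and recognize the first 32-bit word of a session message. The two-letter codes are the ones named in this article (IN, OK, NO, CK, RS, BY); their numeric values are simply the ASCII codes of the two characters. The CK0/CK1/CK2 messages are treated here as a single CK code, since the text does not describe how the index is encoded, and the remaining words of each message are left out for the same reason. The function names are illustrative.

```python
import struct

SIGNATURE = 0xFFFF  # two header bytes with value 255
KNOWN_CODES = ("IN", "OK", "NO", "CK", "RS", "BY")


def command_value(code: str) -> int:
    """16-bit value of a two-letter message code, e.g. "IN" -> 0x494E."""
    raw = code.encode("ascii")
    if len(raw) != 2:
        raise ValueError("message codes are exactly two ASCII characters")
    return int.from_bytes(raw, "big")


def build_header(code: str) -> bytes:
    """First 32-bit word of a session message: signature then code."""
    return struct.pack("!HH", SIGNATURE, command_value(code))


def parse_header(packet: bytes) -> str:
    """Return the two-letter code of a session message."""
    if len(packet) < 4:
        raise ValueError("packet too short for a session header")
    signature, command = struct.unpack("!HH", packet[:4])
    if signature != SIGNATURE:
        raise ValueError("not a session protocol message")
    return command.to_bytes(2, "big").decode("ascii")
```

Checking the signature first also gives a receiver a simple way to tell session messages apart from other traffic arriving on the same port.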
Opening a session starts with an invitation sequence. The first session partner (the "Session Initiator") sends an IN message to the control port of the second partner. The second partner answers with an OK message if it accepts to open the session, or with a NO message if it does not accept the invitation. If the invitation is accepted on the control port, the same sequence is repeated on the data port. Once invitations have been accepted on both ports, the state machine goes into the synchronization phase.
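The invitation sequence can be modelled as a small state machine. The Python sketch below covers the initiator side only; the state names, the `InitiatorSession` class and the `send` callback are assumptions made for illustration, and timeouts and retries are left out.

```python
from enum import Enum, auto
from typing import Callable


class State(Enum):
    IDLE = auto()
    INVITING_CONTROL = auto()
    INVITING_DATA = auto()
    SYNCHRONIZING = auto()
    CLOSED = auto()


class InitiatorSession:
    """Initiator-side invitation sequence: control port first, then data port."""

    def __init__(self, send: Callable[[str, str], None]) -> None:
        self.state = State.IDLE
        self._send = send  # send(port_name, message_code)

    def start(self) -> None:
        self.state = State.INVITING_CONTROL
        self._send("control", "IN")

    def on_reply(self, port: str, code: str) -> None:
        if code == "NO":
            # The remote partner refused the invitation.
            self.state = State.CLOSED
        elif code == "OK" and self.state is State.INVITING_CONTROL and port == "control":
            # Accepted on the control port: repeat the invitation on the data port.
            self.state = State.INVITING_DATA
            self._send("data", "IN")
        elif code == "OK" and self.state is State.INVITING_DATA and port == "data":
            # Accepted on both ports: move on to clock synchronization.
            self.state = State.SYNCHRONIZING
```

Replies that do not match the current state are ignored in this sketch, which keeps a late or duplicated OK from moving the session forward twice.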
The synchronization sequence allows both session participants to share information about their local clocks. This phase makes it possible to compensate for the latency induced by the network, and also to support "future timestamping" (see the "Latency" section below).
The session initiator sends a first message (named CK0) to the remote partner, giving its local time on 64 bits. (Note that this is not an absolute time, but a time relative to a local reference, generally the time elapsed since the startup of the operating system kernel.) This time is expressed in units of a 10 kHz sampling clock (100 microseconds per increment). The remote partner must answer this message with a CK1 message, containing its own local time on 64 bits. Both partners then know the difference between their respective clocks and can determine the offset to apply to the Timestamp and Deltatime fields in the RTP-MIDI protocol. The session initiator finishes this sequence by sending a last message called CK2, containing the local time when it received the CK1 message. This technique makes it possible to compute the average latency of the network, and also to compensate for a potential delay introduced by a slow-starting thread (this situation can occur with non-real-time operating systems like Linux, Windows or OS X).
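The arithmetic behind the CK0 / CK1 / CK2 exchange can be sketched as follows. This is a simplified model, not Apple's implementation: it assumes that the network delay is the same in both directions, and the names `sync_estimate`, `t1`, `t2` and `t3` are introduced here for illustration. All times are in ticks of the 10 kHz clock (100 microseconds each).

```python
from typing import Tuple

TICKS_PER_MS = 10  # 10 kHz clock: 10 ticks per millisecond


def sync_estimate(t1: int, t2: int, t3: int) -> Tuple[float, float]:
    """Estimate one-way latency and clock offset from one exchange.

    t1: initiator clock when CK0 was sent
    t2: remote clock reported in CK1
    t3: initiator clock when CK1 was received (reported in CK2)

    Returns (one_way_latency, offset), both in ticks, where offset is the
    remote clock minus the initiator clock. Assumes a symmetric path.
    """
    round_trip = t3 - t1
    one_way = round_trip / 2
    # Under the symmetry assumption, the remote read its clock when the
    # initiator clock showed t1 + one_way.
    offset = t2 - (t1 + one_way)
    return one_way, offset


def ticks_to_ms(ticks: float) -> float:
    return ticks / TICKS_PER_MS
```

With `t1 = 1000`, `t2 = 50020` and `t3 = 1040`, the round trip is 40 ticks, so the estimated one-way latency is 20 ticks (2 ms) and the remote clock is 49000 ticks ahead of the initiator clock.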
Apple recommends repeating this sequence a few times just after opening the session, in order to get better synchronization accuracy (in case one of the sequences has been delayed accidentally because of a temporary network overload or a latency peak in a thread activation).
This sequence must be repeated cyclically (typically between 2 and 6 times per minute), and always by the session initiator, in order to maintain long-term synchronization accuracy by compensating for local clock drift, and also to detect the loss of the communication partner. A session initiator that receives no answer to several consecutive CK0 messages shall consider that the remote partner is disconnected. In most cases, session initiators switch their state machine into the "Invitation" state in order to re-establish communication automatically as soon as the distant partner reconnects to the network. Some implementations (especially on personal computers) also display an alert message and let the user choose between a new connection attempt and closing the session.
The journalling mechanism makes it possible to detect the loss of MIDI messages and allows the receiver to regenerate missing data without needing any retransmission. The journal keeps in memory "MIDI images" for the different session partners at different moments. However, it is useless to keep in memory the journalling data corresponding to events received correctly by a session partner. Each partner therefore cyclically sends to the other partner the RS message, indicating the last sequence number received correctly (in other words, without any gap between two sequence numbers). The sender can then free the memory containing old journalling data if necessary.
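The memory-release role of the RS message can be illustrated with a deliberately simplified Python sketch. It keeps one history entry per sent packet and drops the entries acknowledged by the receiver. A real recovery journal encodes MIDI state rather than raw packets, and sequence numbers wrap around at 16 bits, which this sketch ignores. The class name `SenderJournal` is hypothetical.

```python
from collections import OrderedDict
from typing import List


class SenderJournal:
    """Simplified sender-side history, pruned by receiver feedback (RS)."""

    def __init__(self) -> None:
        # sequence number -> data that may still be needed for recovery
        self._entries: "OrderedDict[int, bytes]" = OrderedDict()

    def record(self, seqnum: int, midi_data: bytes) -> None:
        self._entries[seqnum] = midi_data

    def on_receiver_feedback(self, last_received_seqnum: int) -> None:
        # Everything up to the acknowledged sequence number was received
        # without a gap, so it no longer needs to be kept.
        acknowledged = [s for s in self._entries if s <= last_received_seqnum]
        for seqnum in acknowledged:
            del self._entries[seqnum]

    def pending(self) -> List[int]:
        """Sequence numbers still held in memory."""
        return list(self._entries)
```

After recording packets 10 to 13 and receiving an RS message for sequence number 11, only packets 12 and 13 remain in the history.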
Disconnection of session's partner
A session partner can ask at any moment to leave a session (which will close the session in return). This is done using the BY message. When a session partner receives this message, it closes immediately the session with the remote partner which sent the message, and it frees all resources allocated to this session. It must be noted that this message can be sent by the session initiator or by the session listener ("invited" partner).
Latency
The most common concern about RTP-MIDI is related to latency issues (which is basically the most common discussion topic related to all modern Digital Audio Workstations), mainly because it uses the IP stack. It can however easily be shown that a correctly programmed RTP-MIDI application or driver does not exhibit more latency than other communication methods.
Moreover, RTP-MIDI as described in RFC 6295 contains a latency compensation mechanism. (A similar mechanism is found in most plugins, which can inform the host of the latency they add to the processing path; the host can then send samples to the plugin in advance, so that the samples are ready and sent synchronously with other audio streams.) The compensation mechanism described in RFC 6295 uses a relative timestamp system, based on the MIDI deltatime. Each MIDI event transported in the RTP payload has a leading deltatime value, relative to the current payload time origin (defined by the Timestamp field in the RTP header).
Each MIDI event in the RTP-MIDI payload can then be strictly synchronized with the global clock. The synchronization accuracy directly depends on the clock source defined when opening the RTP-MIDI session. RFC 6295 gives some examples based on an audio sampling clock, in order to get sample-accurate timestamping of MIDI events. Apple's RTP-MIDI implementation (like all other related implementations, such as the rtpMIDI driver for Windows or the KissBox embedded systems) uses a fixed clock rate of 10 kHz rather than an audio sampling rate. The timing accuracy of all MIDI events is therefore 100 microseconds for these implementations. Sender and receiver clocks are synchronized when the session is initiated, and they are kept synchronized during the whole session by the regular synchronization cycles controlled by the session initiators. This mechanism is able to compensate any latency, from a few hundred microseconds (as seen in LAN applications) to seconds (making it possible to compensate the latency introduced by the Web, for example, and allowing real-time performance of a music piece over the Internet).
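The timestamp arithmetic can be sketched in Python as follows. The sketch assumes the cumulative interpretation used by RFC 6295, in which each deltatime is added to the running time, starting from the RTP timestamp of the packet, with 32-bit wraparound. The 10 kHz clock rate is the one quoted above for Apple-compatible implementations; the function names are illustrative.

```python
from typing import Iterable, List

CLOCK_RATE_HZ = 10_000  # fixed clock rate of Apple-compatible implementations


def event_times(rtp_timestamp: int, deltatimes: Iterable[int]) -> List[int]:
    """Timestamp (in clock ticks) of each MIDI event in one payload."""
    times = []
    current = rtp_timestamp
    for delta in deltatimes:
        current = (current + delta) & 0xFFFFFFFF  # RTP timestamps are 32 bits
        times.append(current)
    return times


def ticks_to_seconds(ticks: int) -> float:
    return ticks / CLOCK_RATE_HZ
```

For a packet with RTP timestamp 1000 and deltatimes 0, 5 and 10, the three events fall at ticks 1000, 1005 and 1015, that is 0.5 ms and then 1 ms apart at 10 kHz.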
This mechanism is however mainly designed for pre-recorded MIDI streams, like those coming from a sequencer track. When RTP-MIDI is used for real-time applications (e.g. controlling devices from an RTP-MIDI compatible keyboard), deltatime is mostly set to the specific value of 0, which means that the related MIDI event shall be interpreted as soon as it is received. With such a use case, the latency compensation mechanism described previously cannot be used.
The latency which can be obtained is then directly related to the different networking components involved in the communication path between the RTP-MIDI devices:
- MIDI application processing time
- IP communication stack processing time
- Network switches/routers packet forwarding time
Application processing time
Application processing time is generally tightly controlled, since MIDI tasks are most often real-time tasks. In most cases, the latency comes directly from the thread latency which can be obtained on a given operating system (typically 1-2 ms at most on Windows and Mac OS systems; systems with a real-time kernel can achieve much better results, down to 100 microseconds). This time can be considered constant whatever the communication medium (MIDI 1.0, USB, RTP-MIDI, etc.), since the processing threads operate at a different level than the communication related threads/tasks.
IP stack processing time
IP stack processing time is the most critical one, since the communication process goes under operating system control. This applies to any communication protocol (IP related or not), since most operating systems (including Windows, Mac OS or Linux) do not allow direct access to the Ethernet adapter. In particular, a common mistake is to conflate "raw sockets" with "direct access to network" (sockets being the entry point to send and receive data over the network in most operating systems). A "raw socket" is a socket which allows an application to send any packet using any protocol (the application is then responsible for building the packet following the rules of the given protocol), while "direct access" would require system-level access which is restricted to the operating system kernel. A packet sent using a raw socket can therefore be delayed by the operating system if the network adapter is currently being used by another application (thus, an IP packet can perfectly well be sent to the network before a packet related to a raw socket). Technically speaking, access to a given network card is controlled by "semaphores".
IP stacks need to correlate Ethernet addresses (MAC addresses) and IP addresses, using a specific protocol named ARP (Address Resolution Protocol). When an RTP-MIDI application wants to send a packet to a remote device, the device must first be located on the network (since Ethernet does not understand IP related concepts), in order to create the transmission path between the routers/switches. This is done automatically by the IP stack, which first sends an ARP request. When the destination device recognizes its own IP address in the ARP packet, it sends back an ARP reply with its MAC address. The IP stack can then send the RTP-MIDI packet. The following RTP-MIDI packets do not need the ARP sequence anymore (unless the link becomes inactive for a few minutes, which clears the ARP entry in the sender's ARP table).
This ARP sequence can take a few seconds, which can in turn introduce noticeable latency, at least for the first RTP-MIDI packet. However, Apple's implementation solved this issue in an elegant manner, using the session control protocol. The session protocol uses the same ports as the RTP-MIDI protocol itself. The ARP sequence therefore takes place during the session initiation sequence. When the RTP-MIDI application wants to send the first RTP-MIDI packet, the computer's ARP table is already initialized with the correct destination MAC addresses, which avoids any latency for the first packet.
Besides the ARP sequence, the IP stack itself requires computations to prepare the packet headers (IP header, UDP header and RTP header). With modern processors, this preparation is extremely fast and takes only a few microseconds, which is negligible compared to the application latency itself. As described before, once prepared, an RTP-MIDI packet can only be delayed on its way to the network adapter if the adapter is already transmitting another packet (whether the socket is an IP one or a "raw" one). However, the latency introduced at this level is generally extremely low, since the driver threads in charge of the network adapters have very high priority. Moreover, most network adapters have FIFO buffers at hardware level, so the packets can be stored for immediate transmission in the network adapter itself without needing the driver thread to be executed first. The best solution to keep the latency related to "adapter access competition" as low as possible is to reserve the network adapter for MIDI communication only and use a different network adapter for other network usages (like file sharing or Internet browsing).
Network components routing time
The different components used to transmit Ethernet packets between computers (whatever the protocols being used) introduce latency too. All modern network switches use "store and forward" technology, in which packets are stored in the switch before they are sent to the next switch. However, the switching times are most often negligible: a 64-byte packet on a 100 Mbit/s network takes around 5.1 microseconds to be forwarded by each network switch, so a complex network with 10 switches on a given path introduces a latency of about 51 microseconds.
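The figures above follow from the time needed to receive a complete frame before forwarding it. A minimal Python sketch of this arithmetic is given below; it only counts the serialization time per switch and ignores queuing, internal switch processing and cable propagation. The function name is illustrative.

```python
def store_and_forward_delay_us(frame_bytes: int,
                               link_bits_per_second: int,
                               hops: int = 1) -> float:
    """Serialization delay, in microseconds, added by store-and-forward hops."""
    frame_bits = frame_bytes * 8
    return hops * frame_bits * 1_000_000 / link_bits_per_second
```

A 64-byte frame is 512 bits, so on a 100 Mbit/s link each switch adds 512 / 100,000,000 s = 5.12 microseconds, and ten switches in a row add 51.2 microseconds.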
The latency is however directly related to the network load itself (since the switches will delay a packet until the previous one is transmitted). Computing or measuring the real latency introduced by the network components can be a hard task, and must involve representative use cases (for example, measuring the latency between two networked devices connected to the same network switch will always give excellent results). As said in the previous section, the best solution to limit the latency introduced by the network components is to use separate networks. However, this is far less critical for network components than for network adapters in computers.
Expected latency for real-time applications
As can be seen, the exact latency obtained for an RTP-MIDI link depends on many parameters, most of them being related to the operating systems themselves (this also applies to any kind of network communication). Measurements made by the different RTP-MIDI actors give latency times from a few hundred microseconds for embedded systems using real-time operating systems, up to 3 milliseconds when computers running general purpose operating systems (Windows, Mac OS, Linux) are involved.
Latency enhancement (sub millisecond latency)
The AES started a working group named SC-02-12H in 2010 in order to demonstrate the capability of using RTP payloads in IP networks for very low latency applications. The draft proposal issued by the group in May 2013 demonstrates that it is possible to achieve RTP streaming for live applications with a latency value as low as 125 microseconds. This shows that the arguments used against RTP-MIDI because it uses the IP stack are not valid, since all the standard solutions required to reach very low latency for RTP payloads on a LAN exist.
Configuration
Configuration
The other most common concern related to RTP-MIDI is the configuration process, since physically connecting a device to a network is not enough to ensure communication with another device. Since RTP-MIDI is based on the IP protocol stack, the different layers involved in the communication process must be configured (IP address and UDP ports). In order to simplify this configuration, different solutions have been proposed, the most common being the "Zero Configuration" set of technologies, also known as Zeroconf.
RFC 3927 describes a common method to automatically assign IP addresses, which is used by most RTP-MIDI compatible products. Once connected to the IP network, such a device is able to assign itself an IP address, with automatic resolution of IP address conflicts. If the device also follows the port assignment recommendation from the RTP specification, it becomes "Plug&Play" from the network point of view. It is then possible to create a complete RTP-MIDI network without defining any IP address or UDP port number. Note, however, that these methods are generally reserved for small setups. Complete automation of the network configuration is generally avoided on big setups, since locating faulty devices can become complex: there is no direct relationship between the IP address selected by the Zeroconf system and the physical location of the device. A minimum configuration would then be to assign a name to the device before connecting it to the network, which voids the "true Plug&Play" concept in that case.
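The self-assignment step can be illustrated with a short sketch. RFC 3927 has a host pick a pseudo-random address in the range 169.254.1.0 to 169.254.254.255 and check on the link that nobody else is using it before claiming it. In the code below the function names, the `address_in_use` callback (a stand-in for the ARP probing a real stack performs) and the `max_attempts` limit are assumptions made for illustration; probe timing, announcements and later conflict defence are left out.

```python
import ipaddress
import random
from typing import Callable, Iterator

# Usable IPv4 link-local range (the first and last /24 of 169.254/16 are reserved).
FIRST = int(ipaddress.IPv4Address("169.254.1.0"))
LAST = int(ipaddress.IPv4Address("169.254.254.255"))


def candidate_addresses(mac: bytes) -> Iterator[ipaddress.IPv4Address]:
    """Yield pseudo-random link-local candidates.

    Seeding the generator from the MAC address makes a device try the
    same sequence of addresses every time it starts.
    """
    rng = random.Random(mac)
    while True:
        yield ipaddress.IPv4Address(rng.randint(FIRST, LAST))


def claim_address(
    mac: bytes,
    address_in_use: Callable[[ipaddress.IPv4Address], bool],
    max_attempts: int = 10,  # illustrative limit, chosen for this sketch
) -> ipaddress.IPv4Address:
    """Return the first candidate that the (hypothetical) probe reports as free."""
    for attempt, candidate in enumerate(candidate_addresses(mac)):
        if attempt >= max_attempts:
            raise RuntimeError("no free link-local address found")
        if not address_in_use(candidate):
            return candidate
    raise AssertionError("unreachable: the candidate generator is infinite")


if __name__ == "__main__":
    taken = set()  # addresses already claimed by other devices on the link
    mac = bytes.fromhex("001122334455")
    print(claim_address(mac, lambda ip: ip in taken))
```

Because the chosen address depends only on the seed and on which addresses happen to be taken, it says nothing about where the device is installed, which is the fault-finding drawback described above for large setups.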
Note that the "Zero Configuration" concept is restricted to the network communication layers. It is technically impossible to perform the complete installation of any networked device (related to MIDI or not) just by abstracting the addressing layer. A practical use case which illustrates this limitation is an RTP-MIDI sound generator that has to be controlled from a MIDI master keyboard connected to an RTP-MIDI interface. Even if the sound generator and the MIDI interface integrate the "Zero Configuration" services, they are unable to know by themselves that they need to establish a session together, because the IP configuration services act at a different level. Any networked MIDI system, whatever the protocol used to exchange MIDI data (based on IP or not), therefore requires a configuration tool to define the exchanges that have to take place between the devices after they have been connected to the network. This configuration tool can be an external management tool running on a computer, or can be embedded in the application software of a device in the form of a configuration menu if the device integrates a Human-Machine Interface.
Companies/Projects using RTP-MIDI
- Apple Computer (RTP-MIDI driver integrated in Mac OS X and iOS for the whole range of products) - RTP-MIDI over Ethernet and WiFi
- Yamaha (Motif synthesizers, UD-WL101 adapter) - RTP-MIDI over Ethernet and WiFi
- Behringer (X-Touch Control Surface)
- KissBox (RTP-MIDI interfaces with MIDI 1.0, LTC, I/O and ArtNet, VST plugins for hardware synthesizer remote control)
- Tobias Erichsen Consulting (Free RTP-MIDI driver for Windows / Utilities)
- GRAME (Linux driver)
- HRS (MIDI Timecode distribution on Ethernet / Synchronization software)
- iConnectivity (MIDI interface with USB and RTP-MIDI support)
- Merging Technologies (Horus, Pyramix, Ovation) - RTP-MIDI for LTC/MTC and MicPre control
- Zivix PUC (Wireless RTP-MIDI interface for iOS devices)
- Cinara (MIDI interface with USB and RTP-MIDI support)
- BEB (DSP modules for modular synthesizers based on RTP-MIDI backbone)
- Axoloti (Hardware open-source synthesizer with RTP-MIDI connectivity)
References
- An RTP Payload format for MIDI. The 117th Convention of the Audio Engineering Society, October 28-31, 2004, San Francisco, CA.
- RTP Payload format for MIDI - RFC 4695
- Implementation Guide for RTP MIDI - RFC 4696
- RTP Payload format for MIDI - RFC 6295
- 'About RTP-MIDI' page on MMA website
- http://www.rtp-midi.com
- Kiss-Box website (hardware devices using RTP-MIDI protocol)
- RTP-MIDI driver for Windows
- http://www.grame.fr/ressources/publications/falquet05.pdf Implementing a MIDI stream over RTP
- http://www.grame.fr/ressources/publications/TR-050622.pdf Recovery journal and evaluation of alternative proposal
- https://code.google.com/p/midikit MIDIKit open source RTP-MIDI library
- http://manpages.ubuntu.com/manpages/oneiric/man1/midistream.1.html#contenttoc0 User's manual of RTP-MIDI object called "midistream" under Linux Ubuntu
- support.apple.com/kb/HT4106 Apple page about USB MIDI connectivity problems
- Website of open-source WinRTP-MIDI project
- RTP-MIDI/AppleMIDI library for Arduino
- MIDIBox forum announcement of RTP-MIDI support in MIOS
- IEEE Standard for Layer 3 Transport Protocol for Time-Sensitive Applications in Local Area Networks
- MIDI 1.0 Specification - Section 4 - Standard MIDI Files
- http://www.cme-pro.com/en/partner.php RTP-MIDI expansion kit for CME keyboards
- http://en.wikibooks.org/wiki/Operating_System_Design/Processes/Semaphores Operating systems semaphores
- http://www.x192.org/about AES standard group for audio interoperability over IP networks
- Automatic configuration of IPv4 Link-Local addresses - RFC 3927
- "lathoub/Arduino-AppleMidi-Library". GitHub. Retrieved 2016-05-28.
- MIDIBox homepage
- Cinara homepage
- HorusDSP Homepage
- Axoloti main page