Real Time Messaging Protocol

From Wikipedia, the free encyclopedia

Real Time Messaging Protocol (RTMP) was initially a proprietary protocol developed by Macromedia for streaming audio, video and data over the Internet, between a Flash player and a server. Macromedia is now owned by Adobe, which has released an incomplete version of the specification of the protocol for public use.

The RTMP protocol has multiple variations:

  1. The "plain" protocol, which works on top of TCP and uses port number 1935 by default.
  2. RTMPS, which is RTMP over a TLS/SSL connection.
  3. RTMPE, which is RTMP encrypted using Adobe's own security mechanism. While the details of the implementation are proprietary, the mechanism uses industry standard cryptography primitives.[1]
  4. RTMPT, which is encapsulated within HTTP requests to traverse firewalls. RTMPT is frequently found using cleartext requests on TCP ports 80 and 443 to bypass most corporate traffic filtering. The encapsulated session may carry plain RTMP, RTMPS, or RTMPE packets within.

While the primary motivation for RTMP was to be a protocol for playing Flash video, it is also used in some other applications, such as the Adobe LiveCycle Data Services ES.

Basic operation

RTMP is a TCP-based protocol which maintains persistent connections and allows low-latency communication. To deliver streams smoothly and transmit as much information as possible, it splits streams into fragments. The fragment size is negotiated dynamically between the client and server, although it is sometimes kept unchanged; the default fragment sizes are 64 bytes for audio data and 128 bytes for video data and most other data types. Fragments from different streams may then be interleaved and multiplexed over a single connection. With longer data chunks, the protocol thus carries only a one-byte header per fragment, incurring very little overhead. In practice, however, individual fragments are not typically interleaved. Instead, the interleaving and multiplexing is done at the packet level: RTMP packets across several different active channels are interleaved in such a way as to ensure that each channel meets its bandwidth, latency, and other quality-of-service requirements. Packets interleaved in this fashion are treated as indivisible and are not interleaved at the fragment level.

RTMP defines several virtual channels on which packets may be sent and received, and which operate independently of each other. For example, there is a channel for handling RPC requests and responses, a channel for video stream data, a channel for audio stream data, a channel for out-of-band control messages (fragment size negotiation, etc.), and so on. During a typical RTMP session, several channels may be active simultaneously. When RTMP data is encoded, a packet header is generated. The packet header specifies, among other things, the ID of the channel on which the packet is to be sent, a timestamp of when it was generated (if necessary), and the size of the packet's payload. The header is followed by the actual payload content of the packet, which is fragmented according to the currently agreed-upon fragment size before it is sent over the connection. The packet header itself is never fragmented, and its size does not count towards the data in the packet's first fragment. In other words, only the actual packet payload (the media data) is subject to fragmentation.
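The fragmentation rule above can be sketched in Python. This is a minimal illustration, not a reference implementation: the helper name `fragment_message` is hypothetical, it assumes the caller has already built the full header, and it assumes a chunk stream ID small enough for a one-byte Basic Header. Each continuation fragment carries only a one-byte type 3 (b11) header, as described in the packet-structure section.

```python
def fragment_message(full_header: bytes, payload: bytes, chunk_stream_id: int,
                     chunk_size: int = 128) -> list:
    """Split one RTMP message into fragments (hypothetical helper).

    The header is never fragmented and does not count towards the first
    fragment's payload; later fragments start with a one-byte type 3 header.
    """
    if not 2 <= chunk_stream_id <= 63:
        raise ValueError("sketch only handles one-byte Basic Headers (IDs 2-63)")
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not payload:
        return [full_header]
    # b11 in the two most significant bits, chunk stream ID in the other six
    continuation_header = bytes([0xC0 | chunk_stream_id])
    fragments = []
    for offset in range(0, len(payload), chunk_size):
        piece = payload[offset:offset + chunk_size]
        prefix = full_header if offset == 0 else continuation_header
        fragments.append(prefix + piece)
    return fragments


# Usage: a 300-byte payload with the default 128-byte fragment size
header = bytes([0x03]) + bytes(11)          # placeholder 12-byte type 0 header
parts = fragment_message(header, bytes(300), chunk_stream_id=3)
print([len(p) for p in parts])              # [140, 129, 45]
```

Only the first fragment pays for the 12-byte header; the other two add a single byte each, which is where the low overhead comes from.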

At a higher level, RTMP encapsulates MP3 or AAC audio and FLV1 video multimedia streams, and can make remote procedure calls (RPCs) using the Action Message Format. Any RPC services required are made asynchronously, using a single client/server request/response model, such that real-time communication is not required.[clarification needed][2]

Encryption

RTMP sessions may be encrypted using either of two methods:

  • Using industry standard TLS/SSL mechanisms. The underlying RTMP session is simply wrapped inside a normal TLS/SSL session.
  • Using RTMPE, which wraps the RTMP session in a lighter-weight encryption layer.

It is generally understood that the TLS/SSL handshake at the beginning of a session is very computationally intensive. Adobe developed RTMPE as a lighter-weight alternative,[3] to make it more practical for high-traffic sites to serve encrypted content. Adobe advertises RTMPE as a method for secure content delivery that protects against client impersonation,[4] but this claim is false. RTMPE uses only[1] anonymous Diffie-Hellman, which provides no verification of either party's identity and is therefore vulnerable to trivial man-in-the-middle attacks at session initialization.

HTTP tunneling

In RTMP Tunneled (RTMPT), RTMP data is encapsulated and exchanged via HTTP, and messages from the client (the media player, in this case) are addressed to port 80 (the default for HTTP) on the server.

While the messages in RTMPT are larger than the equivalent non-tunneled RTMP messages due to HTTP headers, RTMPT may facilitate the use of RTMP in scenarios where the use of non-tunneled RTMP would otherwise not be possible, such as when the client is behind a firewall that blocks non-HTTP and non-HTTPS outbound traffic.

The protocol works by sending commands through the POST URL and AMF messages through the POST body. For example, the following request opens a connection:

POST /open/1 HTTP/1.1

Specification document

Adobe released what it claimed was the RTMP specification on 15 June 2009. That specification, however, omits crucial details of the protocol's implementation. It would be impossible to write a program incorporating the RTMP protocol based on the released specification alone; many essential details are omitted, and only limited additional facts can be determined by studying other implementations that use the protocol (such as librtmp), and by carrying out test TCP/IP packet captures.

The Adobe license to use this protocol requires that implementations of RTMP servers meet this specification.

Details missing from Adobe's published specification include:

  • The real RTMP handshake is not described. If it is done incorrectly, a server implementation is unable to deliver H.264/AAC content: Flash Player silently fails to play the H.264 content. Client implementations, however, will all work, because RTMP servers (including FMS) are usually more permissive in this regard.
  • Chunks are sent only up to a maximum chunk size. A message that exceeds that size is still sent, with a header giving its total length, but once the maximum chunk size has been reached, a one-byte type 3 (b11) chunk header is sent to start the next part of the fragmented message.
  • Explanations of some stream control messages (31 and 32) are missing, although FMS sends them from time to time.

Packet structure

RTMP Packet Diagram

Packets are sent over a TCP connection that is first established between client and server. They contain a header and a body which, in the case of connection and control commands, is encoded using the Action Message Format (AMF). The header is split into the Basic Header (shown detached from the rest in the diagram) and the Chunk Message Header. The Basic Header is the only constant part of the packet and usually consists of a single composite byte, in which the 2 most significant bits are the Chunk Type (fmt in the specification) and the remaining 6 bits form the Stream ID. Depending on the Chunk Type, some fields of the Message Header can be omitted and their values derived from previous packets; depending on the Stream ID, the Basic Header can be extended to 2 or 3 bytes (the diagram shows the 3-byte form (c)).

If the value of the 6 least significant bits of the Basic Header (BH) is 0, the BH is 2 bytes long and represents Stream IDs 64 to 319 (64+255); if the value is 1, the BH is 3 bytes long (the last 2 bytes encoded as a 16-bit little-endian value) and represents Stream IDs 64 to 65599 (64+65535); if the value is 2, the BH is 1 byte long and is reserved for low-level protocol control messages and commands. The Chunk Message Header contains metadata such as the message size (measured in bytes), the Timestamp Delta and the Message Type. The Message Type is a single byte and defines whether the packet is an audio, video, command or "low level" RTMP packet such as an RTMP Ping.
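The Basic Header rules just described (2-bit Chunk Type, 6-bit Stream ID, and the 2- and 3-byte extended forms) can be decoded as in the following sketch. The function name is hypothetical and error handling is kept minimal.

```python
def parse_basic_header(data: bytes) -> tuple:
    """Return (chunk_type, stream_id, header_length) for an RTMP Basic Header."""
    if not data:
        raise ValueError("no data")
    chunk_type = data[0] >> 6        # 2 most significant bits (fmt)
    low_bits = data[0] & 0x3F        # remaining 6 bits
    if low_bits == 0:                # 2-byte form: Stream IDs 64..319
        if len(data) < 2:
            raise ValueError("truncated 2-byte Basic Header")
        return chunk_type, 64 + data[1], 2
    if low_bits == 1:                # 3-byte form: Stream IDs 64..65599
        if len(data) < 3:
            raise ValueError("truncated 3-byte Basic Header")
        # the last 2 bytes are a 16-bit little-endian value
        return chunk_type, 64 + int.from_bytes(data[1:3], "little"), 3
    return chunk_type, low_bits, 1   # 1-byte form (2 = low-level control)


print(parse_basic_header(b"\x03"))          # (0, 3, 1)
print(parse_basic_header(b"\x41\xff\xff"))  # (1, 65599, 3)
```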

An example is shown below, as captured when a Flash client executes the following code:

var stream:NetStream = new NetStream(connectionObject);

This generates the following chunk:

Hex Code                                            ASCII
03 00 0b 68 00 00 19 14 00 00 00 00                 . . . h . . . . . . . .
02 00 0c 63 72 65 61 74 65 53 74 72 65 61 6d 00 40  . . . c r e a t e S t r e a m . @
00 00 00 00 00 00 00 05                             . . . . . . . .

The packet starts with a Basic Header of a single byte (0x03, binary b00000011), where the 2 most significant bits (b00) define a chunk header type of 0 and the remaining 6 bits (b000011) define a Chunk Stream ID of 3. The 4 possible values of the header type and their significance are:

  • b00 = 12-byte header (full header).
  • b01 = 8 bytes: like type b00, but without the message stream ID (the last 4 bytes).
  • b10 = 4 bytes: only the Basic Header and the timestamp (3 bytes) are included.
  • b11 = 1 byte: only the Basic Header is included.

The last type (b11) is always used in the case of aggregate messages. In the example above, a second message would start with a header byte of 0xC3 (b11000011), meaning that all Message Header fields should be derived from the previous message on chunk stream 3 (which would be the message right above it). The Stream ID can take values between 3 and 65599. Some values of the 6 least significant bits have special meaning: 1 stands for an extended ID format, in which case 2 bytes follow, and a value of 2 is used for low-level messages such as Ping and Set Client Bandwidth.

The bytes of the RTMP header in the example packet above are decoded as follows:

  • byte #1 (0x03) = Basic Header: chunk header type 0 and chunk stream ID 3.
  • byte #2-4 (0x000b68) = Timestamp delta (2920).
  • byte #5-7 (0x000019) = Packet length; in this case 0x000019 = 25 bytes.
  • byte #8 (0x14) = Message Type ID; 0x14 (20) defines an AMF0-encoded command message.
  • byte #9-12 (0x00000000) = Message Stream ID. Unusually, this field is stored in little-endian byte order.

The Message Type ID byte defines whether the packet contains audio/video data, a remote object or a command. Some possible values are:

  • 0x01 = Set Packet Size Message.
  • 0x04 = Ping Message.
  • 0x05 = Server Bandwidth
  • 0x06 = Client Bandwidth.
  • 0x08 = Audio Packet.
  • 0x09 = Video Packet.
  • 0x11 = An AMF3 type command.
  • 0x12 = Invoke (onMetaData info is sent as such).
  • 0x14 = An AMF0 type command.
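The header decoding and the type list above can be combined into a small parser for the captured packet. This sketch assumes a one-byte Basic Header and a full type 0 header; the function and dictionary names are hypothetical.

```python
MESSAGE_TYPES = {
    0x01: "Set Packet Size",
    0x04: "Ping",
    0x05: "Server Bandwidth",
    0x06: "Client Bandwidth",
    0x08: "Audio",
    0x09: "Video",
    0x11: "AMF3 command",
    0x12: "Invoke",
    0x14: "AMF0 command",
}


def parse_type0_header(data: bytes) -> dict:
    """Decode a 12-byte type 0 header that uses a one-byte Basic Header."""
    if len(data) < 12:
        raise ValueError("a type 0 header needs 12 bytes")
    if data[0] >> 6 != 0:
        raise ValueError("not a type 0 (b00) header")
    type_id = data[7]
    return {
        "chunk_stream_id": data[0] & 0x3F,
        "timestamp": int.from_bytes(data[1:4], "big"),
        "length": int.from_bytes(data[4:7], "big"),
        "type_id": type_id,
        "type_name": MESSAGE_TYPES.get(type_id, "unknown"),
        # the Message Stream ID is little-endian, unlike the other fields
        "message_stream_id": int.from_bytes(data[8:12], "little"),
    }


packet = bytes.fromhex(
    "03 00 0b 68 00 00 19 14 00 00 00 00 "                  # header
    "02 00 0c 63 72 65 61 74 65 53 74 72 65 61 6d 00 40 "   # AMF0 body
    "00 00 00 00 00 00 00 05"
)
header = parse_type0_header(packet)
print(header["timestamp"], header["length"], header["type_name"])  # 2920 25 AMF0 command
```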

Following the header, 0x02 denotes a string of length 0x000C (12) with the values 0x63 0x72 ... 0x6D (the "createStream" command). Next comes 0x00 (a number marker) followed by the 8-byte IEEE 754 double 0x4000000000000000, which is the transaction ID with value 2.0. The last byte is 0x05 (null), which means there are no arguments.
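The body walk-through above can be reproduced with a small AMF0 decoder. This sketch handles only the markers needed here (number 0x00, boolean 0x01, string 0x02, null 0x05); AMF0 defines more types, and the function name is hypothetical.

```python
import struct


def decode_amf0_values(body: bytes) -> list:
    """Decode a flat sequence of simple AMF0 values."""
    values, pos = [], 0
    while pos < len(body):
        marker = body[pos]
        pos += 1
        if marker == 0x00:                     # number: 8-byte big-endian double
            (number,) = struct.unpack_from(">d", body, pos)
            values.append(number)
            pos += 8
        elif marker == 0x01:                   # boolean: one byte
            values.append(body[pos] != 0)
            pos += 1
        elif marker == 0x02:                   # string: 16-bit length + UTF-8
            (length,) = struct.unpack_from(">H", body, pos)
            pos += 2
            values.append(body[pos:pos + length].decode("utf-8"))
            pos += length
        elif marker == 0x05:                   # null
            values.append(None)
        else:
            raise ValueError(f"unsupported AMF0 marker 0x{marker:02x}")
    return values


body = bytes.fromhex("02 00 0c 63 72 65 61 74 65 53 74 72 65 61 6d "
                     "00 40 00 00 00 00 00 00 00 05")
print(decode_amf0_values(body))   # ['createStream', 2.0, None]
```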

Invoke Message Structure (0x14, 0x11)

Some of the message types shown above, such as Ping and Set Client/Server Bandwidth, are considered low-level RTMP protocol messages that do not use the AMF encoding format. Command messages, on the other hand, whether AMF0 (Message Type 0x14) or AMF3 (0x11), do use this format and have the general form shown below:

(String) <Command Name>
(Number) <Transaction Id>
(Mixed)  <Argument> ex. Null, String, Object: {key1:value1, key2:value2 ... }

The transaction ID is used for commands that can have a reply. The argument can be either a simple value such as null or a string, as in the example above, or one or more objects, each composed of a set of key/value pairs where the keys are always encoded as strings while the values can be any AMF data type, including complex types like arrays.
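Going the other way, a command in the general form above can be serialised to AMF0. This is a sketch under stated assumptions: only null, boolean, number, string and object (marker 0x03, terminated by the bytes 00 00 09) are supported, Python dicts stand in for AMF objects, and the helper names are hypothetical.

```python
import struct


def encode_amf0(value) -> bytes:
    """Encode one value as AMF0 (subset of types)."""
    if value is None:
        return b"\x05"
    if isinstance(value, bool):                 # check before int: bool is an int
        return b"\x01" + (b"\x01" if value else b"\x00")
    if isinstance(value, (int, float)):
        return b"\x00" + struct.pack(">d", float(value))
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return b"\x02" + struct.pack(">H", len(raw)) + raw
    if isinstance(value, dict):                 # anonymous object
        out = bytearray(b"\x03")
        for key, item in value.items():
            raw_key = key.encode("utf-8")       # keys are always strings
            out += struct.pack(">H", len(raw_key)) + raw_key + encode_amf0(item)
        out += b"\x00\x00\x09"                  # empty key + object-end marker
        return bytes(out)
    raise TypeError(f"unsupported type: {type(value).__name__}")


def encode_command(name: str, transaction_id: float, *arguments) -> bytes:
    """<Command Name> <Transaction Id> <Argument>..., each AMF0-encoded."""
    parts = [encode_amf0(name), encode_amf0(transaction_id)]
    parts += [encode_amf0(arg) for arg in arguments]
    return b"".join(parts)


# Reproduces the 25-byte body of the captured createStream example
print(encode_command("createStream", 2.0, None).hex(" "))
```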

Ping Message Structure (0x04)

Ping messages are not AMF encoded. They are sent on chunk stream ID 2, so their Basic Header byte is 0x02, which also implies a full (type 0) header, and they have a message type of 0x04. The header is followed by 6 bytes, which are interpreted as follows:

  • #0-1 - Ping Type.
  • #2-3 - Second Parameter (this has meaning in specific Ping Types)
  • #4-5 - Third Parameter (same)

The first two bytes of the message body define the Ping Type which can apparently[5] take 6 possible values.

  • Type 0 - Clear Stream: Sent when the connection is established and carries no further data
  • Type 1 - Clear the Buffer.
  • Type 3 - The client's buffer time. The third parameter holds the value in milliseconds.
  • Type 4 - Reset a stream.
  • Type 6 - Ping the client from server. The second parameter is the current time.
  • Type 7 - Pong reply from client. The second parameter is the time when the client receives the Ping.

Pong is the name for a reply to a Ping; it uses the values described above for type 7.
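Reading and answering Ping messages with the 6-byte layout listed above could look like the sketch below. Two assumptions are not confirmed by the text: the three 2-byte fields are big-endian (like most RTMP fields), and a Pong carries the receive time in the second parameter, truncated to 16 bits to fit that layout. The helper names are hypothetical.

```python
import struct

PING_TYPES = {
    0: "Clear Stream",
    1: "Clear the Buffer",
    3: "Client buffer time",
    4: "Reset a stream",
    6: "Ping",
    7: "Pong",
}


def parse_ping(body: bytes) -> tuple:
    """Return (ping_type, second_parameter, third_parameter)."""
    if len(body) < 6:
        raise ValueError("a Ping body has 6 bytes")
    return struct.unpack(">HHH", body[:6])


def make_pong(receive_time: int) -> bytes:
    """Build a type 7 reply; the time is truncated to the 16-bit field."""
    return struct.pack(">HHH", 7, receive_time & 0xFFFF, 0)


ping_type, second, third = parse_ping(b"\x00\x03\x00\x00\x0b\xb8")
print(PING_TYPES[ping_type], third)   # Client buffer time 3000
```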

ServerBw/ClientBw Message Structure (0x05, 0x06)

These messages relate to the client upstream and server downstream bit rates. The body is composed of 4 bytes giving the bandwidth value, optionally extended by one byte that sets the Limit Type. The Limit Type can take one of 3 values: hard, soft or dynamic (either soft or hard).
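Parsing such a body might look like the following sketch. The numeric mapping of the Limit Type byte (0 = hard, 1 = soft, 2 = dynamic) is taken from Adobe's published RTMP specification rather than from the text above; the big-endian byte order and the function name are assumptions.

```python
import struct
from typing import Optional, Tuple

LIMIT_TYPES = {0: "hard", 1: "soft", 2: "dynamic"}


def parse_bandwidth(body: bytes) -> Tuple[int, Optional[str]]:
    """Return (bandwidth, limit_type); limit_type is None if the byte is absent."""
    if len(body) < 4:
        raise ValueError("bandwidth messages carry at least 4 bytes")
    (bandwidth,) = struct.unpack(">I", body[:4])
    if len(body) == 4:
        return bandwidth, None
    return bandwidth, LIMIT_TYPES.get(body[4], "unknown")


print(parse_bandwidth(bytes.fromhex("002625a0")))     # (2500000, None)
print(parse_bandwidth(bytes.fromhex("002625a002")))   # (2500000, 'dynamic')
```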

Set Chunk Size (0x01)

The new chunk size is carried in the 4 bytes of the message body. The default is 128 bytes, and the message is sent only when a change is wanted.
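As an illustration, a complete Set Chunk Size message could be assembled as below: a full (type 0) header on chunk stream 2, which the packet-structure section reserves for low-level control messages, message type 0x01, and the new size as a 4-byte body. The big-endian size encoding, the zero timestamp and message stream ID 0 are assumptions of this sketch, and the function name is hypothetical.

```python
import struct


def set_chunk_size_message(new_size: int, timestamp: int = 0) -> bytes:
    """Build a Set Chunk Size (0x01) message with a 12-byte type 0 header."""
    if not 0 < new_size <= 0xFFFFFFFF:
        raise ValueError("chunk size must be positive and fit in 4 bytes")
    body = struct.pack(">I", new_size)
    header = (
        bytes([0x02])                          # fmt b00, chunk stream ID 2
        + (timestamp & 0xFFFFFF).to_bytes(3, "big")
        + len(body).to_bytes(3, "big")         # payload length: 4
        + bytes([0x01])                        # Message Type ID: Set Packet Size
        + (0).to_bytes(4, "little")            # Message Stream ID (little-endian)
    )
    return header + body


print(set_chunk_size_message(4096).hex(" "))
# 02 00 00 00 00 00 04 01 00 00 00 00 00 00 10 00
```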

The protocol

RTMP Handshake Diagram

Handshake

After a TCP connection is established, an RTMP connection is set up by first performing a handshake in which each side sends 3 packets (also referred to as chunks in the official documentation). The official specification refers to these as C0-C2 for the packets sent by the client and S0-S2 for those sent by the server; they are not to be confused with RTMP packets, which can be exchanged only after the handshake is complete. These packets have a structure of their own. C1 contains a field setting the "epoch" timestamp, but since this can be set to zero, as is done in third-party implementations, the packet can be simplified. The client initialises the connection by sending the C0 packet with a constant value of 0x03, representing the current protocol version. Without waiting to receive S0, it follows straight away with C1, which contains 1536 bytes: the first 4 represent the epoch timestamp, the second 4 are all 0, and the rest are random (and can be set to 0 in third-party implementations). The server answers with S0 and S1, which have the same structure. C2 and S2 are echoes of S1 and C1 respectively, except that the second 4 bytes hold the time the respective message was received (instead of 0). After C2 and S2 are received, the handshake is considered complete.
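The handshake packets described above can be built as follows. This is a sketch, not a complete handshake: it uses the simplified form the text mentions (epoch allowed to be zero), assumes the 4-byte time fields are big-endian, ignores the "real" handshake variant that the specification section says is undocumented, and uses hypothetical function names. Socket I/O is left out.

```python
import os
import struct

RTMP_VERSION = 0x03
HANDSHAKE_SIZE = 1536


def make_c0_c1(epoch: int = 0, randomise: bool = True) -> bytes:
    """C0 (version byte) followed by C1 (epoch, four zero bytes, filler)."""
    filler_size = HANDSHAKE_SIZE - 8
    filler = os.urandom(filler_size) if randomise else bytes(filler_size)
    c1 = struct.pack(">II", epoch & 0xFFFFFFFF, 0) + filler
    return bytes([RTMP_VERSION]) + c1


def make_echo(peer_packet: bytes, receive_time: int) -> bytes:
    """Build C2 from S1 (or S2 from C1): copy it, with bytes 4-7 set to the receive time."""
    if len(peer_packet) != HANDSHAKE_SIZE:
        raise ValueError("handshake packets are 1536 bytes")
    return peer_packet[:4] + struct.pack(">I", receive_time & 0xFFFFFFFF) + peer_packet[8:]


c0c1 = make_c0_c1(randomise=False)
print(len(c0c1), c0c1[0])   # 1537 3
```

Because the client sends C0 and C1 together without waiting, a real client would write `make_c0_c1()` in one call, then read 1 + 1536 bytes (S0, S1) and reply with `make_echo(s1, now)`.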

Connect

At this point, the client and server can negotiate a connection by exchanging AMF-encoded messages. These messages include key-value pairs relating to variables that are needed for a connection to be established. An example message from the client is:

(Invoke) "connect"
(Transaction ID) 1.0
(Object1) { app: "sample", flashVer: "MAC 10,2,153,2", swfUrl: null,
            tcUrl: "rtmpt://127.0.0.1/sample ", fpad: false,
            capabilities: 9947.75, audioCodecs: 3191, videoCodecs: 252,
            videoFunction: 1, pageUrl: null, objectEncoding: 3.0 }

Flash Media Server and other implementations use the concept of an "app" to define a container for audio/video and other content, implemented as a folder on the server root which contains the media files to be streamed. The first variable contains the name of this app, "sample", which is the name provided by the Wowza Server for testing. The flashVer string is the same as that returned by the ActionScript getVersion() function. The audioCodecs and videoCodecs values are encoded as doubles, and their meaning can be found in the original specification. The same is true for the videoFunction variable, which in this case is the self-explanatory SUPPORT_VID_CLIENT_SEEK constant. Of special interest is objectEncoding, which defines whether the rest of the communication will use the extended AMF3 format or not. As version 3 is the current default, the Flash client has to be told explicitly in ActionScript code to use AMF0 if that is required. The server then replies with a ServerBW, a ClientBW and a SetPacketSize message sequence, finally followed by an Invoke, as in the following example:

(Invoke) "_result"
(Transaction ID) 1.0
(Object1) { fmsVer: "FMS/3,5,5,2004", capabilities: 31.0, mode: 1.0 }
(Object2) { level: "status", code: "NetConnection.Connect.Success",
            description: "Connection succeeded",
            data: (array) { version: "3,5,5,2004" },
            clientId: 1728724019, objectEncoding: 3.0 }

Some of the values above are serialised into properties of a generic ActionScript Object, which is then passed to the NetConnection event listener. The clientId is a number identifying the session started by the connection. The object encoding must match the value previously set.

Play video

To start a video stream, the client sends a "createStream" invocation followed by a ping message, followed by a "play" invocation with the file name as argument. The server will then reply with a series of "onStatus" commands followed by the video data as encapsulated within RTMP messages.

After a connection is established, media is sent by encapsulating the content of FLV tags into RTMP messages of type 8 and 9 for audio and video respectively.
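Mapping an FLV tag onto an RTMP message could be sketched as follows. The FLV tag layout used here (an 11-byte tag header holding the tag type, a 24-bit data size, a 24-bit timestamp plus an extension byte for the upper 8 bits, and a 24-bit stream ID) comes from the FLV file format rather than from this article, and the function name is hypothetical. Audio tags (type 8) and video tags (type 9) keep their type number as the RTMP message type; the 4-byte previous-tag-size field that follows each tag in a file is not included.

```python
def flv_tag_to_message(tag: bytes) -> tuple:
    """Return (message_type_id, timestamp, payload) for an FLV audio/video tag."""
    if len(tag) < 11:
        raise ValueError("an FLV tag header is 11 bytes")
    tag_type = tag[0] & 0x1F                    # low 5 bits hold the tag type
    if tag_type not in (8, 9):
        raise ValueError("only audio (8) and video (9) tags are handled")
    data_size = int.from_bytes(tag[1:4], "big")
    # 24-bit timestamp, with tag[7] supplying the upper 8 bits
    timestamp = int.from_bytes(tag[4:7], "big") | (tag[7] << 24)
    payload = tag[11:11 + data_size]            # bytes 8-10 are the stream ID
    if len(payload) != data_size:
        raise ValueError("truncated tag")
    return tag_type, timestamp, payload


video_tag = bytes([9]) + (3).to_bytes(3, "big") + (1000).to_bytes(3, "big") + bytes(4) + b"abc"
print(flv_tag_to_message(video_tag))   # (9, 1000, b'abc')
```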

HTTP tunneling (RTMPT)

This refers to the HTTP-tunneled version of the protocol. It communicates over port 80 and passes the AMF data inside HTTP POST requests and responses. The sequence for connection is as follows:

POST /fcs/ident2 HTTP/1.1
Content-Type: application/x-fcs\r\n

HTTP/1.0 404 Not Found

POST /open/1 HTTP/1.1
Content-Type: application/x-fcs\r\n

HTTP/1.1 200 OK
Content-Type: application/x-fcs\r\n
    1728724019

The first request has an /fcs/ident2 path, and the correct reply is a 404 Not Found error. The client then sends an /open/1 request, to which the server must reply with 200 OK and a random number in the body that will be used as the session identifier for the rest of the communication. In this example, 1728724019 is returned in the response body.

POST /idle/1728724019/0 HTTP/1.1
HTTP/1.1 200 OK
   0x01

From now on, /idle/<session id>/<sequence #> is a polling request, where the session ID is the one generated and returned by the server and the sequence number increments by one for every request. The appropriate response is 200 OK with an integer in the body signifying the polling interval. AMF data is sent through /send/<session id>/<sequence #>.
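The request paths above can be generated with a small helper. This is a sketch: the class and function names are hypothetical, it assumes a single sequence counter shared by /idle and /send requests (the text only says the number increments for every request), and the Content-Length header is an assumption added to make each POST a well-formed HTTP/1.1 request.

```python
IDENT_PATH = "/fcs/ident2"   # expected reply: 404 Not Found
OPEN_PATH = "/open/1"        # reply body: the session ID


class RtmptSession:
    """Tracks the session ID and sequence number of an RTMPT exchange."""

    def __init__(self, session_id: str):
        self.session_id = session_id.strip()   # the reply body may be padded
        self.sequence = 0

    def _next_path(self, command: str) -> str:
        path = f"/{command}/{self.session_id}/{self.sequence}"
        self.sequence += 1
        return path

    def idle_path(self) -> str:
        return self._next_path("idle")      # polling request

    def send_path(self) -> str:
        return self._next_path("send")      # carries AMF data in the body


def build_post(path: str, body: bytes = b"") -> bytes:
    """Serialise a POST request with the application/x-fcs content type."""
    head = (f"POST {path} HTTP/1.1\r\n"
            "Content-Type: application/x-fcs\r\n"
            f"Content-Length: {len(body)}\r\n"
            "\r\n")
    return head.encode("ascii") + body


session = RtmptSession("1728724019")
print(session.idle_path())   # /idle/1728724019/0
print(session.send_path())   # /send/1728724019/1
```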

Software implementations

Client software

The most widely adopted RTMP client is Adobe Flash Player, which supports playback of audio and video streamed from RTMP servers (when installed as a web browser plug-in).

  • XBMC media player has partial support for playing RTMP streams (but not RTMPE).
  • Stream Transport is Windows software that can download videos that stream over RTMP.
  • Gnash, an open source replacement for the Flash Player on the Linux platform, intends to support RTMP streaming for Linux.[6]

rtmpdump

The open-source command-line tool rtmpdump is designed to play back or save to disk the full RTMP stream, including the RTMPE protocol Adobe uses for encryption. RTMPdump runs on Linux, Android, Solaris, Mac OS X, and most other Unix-derived operating systems, as well as Microsoft Windows. It originally supported all versions of 32-bit Windows including Windows 98; from version 2.2 the software runs only on Windows XP and above (although earlier versions remain fully functional).

Packages of the rtmpdump suite of software are available in the major open-source repositories (GNU/Linux distros). These include the front-end apps "rtmpdump", "rtmpsrv" and "rtmpsuck."

Development of RTMPdump was restarted in October 2009, outside the United States, at the MPlayer site.[7] The current version features greatly improved functionality, and has been rewritten to take advantage of the benefits of the C programming language. In particular, the main functionality was built into a library (librtmp) which can easily be used by other applications. The RTMPdump developers have also written support for librtmp for MPlayer, FFmpeg, XBMC, cURL, VLC and a number of other open source software projects. Use of librtmp provides these projects with full support of RTMP in all its variants without any additional development effort.

FLVstreamer

FLVstreamer is a fork of RTMPdump, without the code which Adobe claims violates the DMCA in the USA. This was developed as a response to Adobe's attempt in 2008 to suppress RTMPdump. FLVstreamer will save to disk ("download") a stream of audio or video content from any RTMP server, if encryption (RTMPE) is not enabled on the stream.

Server software

Some full implementation RTMP servers are:

Research and development

  • Researchers at crtmpserver are reverse engineering the RTMFP protocol. This is currently a work-in-progress.
  • Blue5 - A project to create open source versions of RTMPE and RTMFP.
  • kbmMW Enterprise Edition N-tier development tool for Delphi/C++Builder supports RTMP.

See also

References

  1. "RTMPE". Adobe Flash Lite 4 Help. Adobe. Retrieved 29 December 2013.
  2. Using RPC services in Flex Data Services 2. Archived from the original on 3 April 2007. Retrieved 16 April 2007.
  3. Adobe Media Server Help. Livedocs.adobe.com (28 February 2013). Retrieved 22 January 2014.
  4. [dead link]
  5. The Red5 Project (2009). Ping. http://trac.red5.org/wiki/Documentation/Tutorials/Ping. Retrieved 25 December 2011.
  6. "Linux Funding". Retrieved 1 January 2010.
  7. "Updates:2009-11-01". Retrieved 1 November 2009.

External links