Session Initiation Protocol

From Wikipedia, the free encyclopedia

The Session Initiation Protocol (SIP) is a communications protocol for signaling and controlling multimedia communication sessions. The most common applications of SIP are in Internet telephony for voice and video calls, as well as instant messaging, over Internet Protocol (IP) networks.

The protocol defines the messages that are sent between endpoints, which govern establishment, termination and other essential elements of a call. SIP can be used for creating, modifying and terminating sessions consisting of one or several media streams. SIP is an application layer protocol designed to be independent of the underlying transport layer. It is a text-based protocol, incorporating many elements of the Hypertext Transfer Protocol (HTTP) and the Simple Mail Transfer Protocol (SMTP).[1]

SIP works in conjunction with several other application layer protocols that identify and carry the session media. Media identification and negotiation are achieved with the Session Description Protocol (SDP). For the transmission of media streams (voice, video), SIP typically employs the Real-time Transport Protocol (RTP) or the Secure Real-time Transport Protocol (SRTP). For secure transmission of SIP messages, the protocol may be encrypted with Transport Layer Security (TLS).


SIP was originally designed by Mark Handley, Henning Schulzrinne, Eve Schooler and Jonathan Rosenberg in 1996. The protocol was standardized as RFC 2543 in 1999 (SIP 1.0). In November 2000, SIP was accepted as a 3GPP signaling protocol and permanent element of the IP Multimedia Subsystem (IMS) architecture for IP-based streaming multimedia services in cellular systems. As of 2014, the latest version (SIP 2.0) of the specification is RFC 3261, published in June 2002,[2] with extensions and clarifications since then.[3]

The U.S. National Institute of Standards and Technology (NIST), Advanced Networking Technologies Division provides a public-domain Java implementation[4] that serves as a reference implementation for the standard. The implementation can work in proxy server or user agent scenarios and has been used in numerous commercial and research projects. It supports RFC 3261 in full and a number of extension RFCs including RFC 6665 (event notification) and RFC 3262 (reliable provisional responses).

Although originally developed for voice applications, the protocol was envisioned for, and supports, a diverse array of applications, including video conferencing, streaming multimedia distribution, instant messaging, presence information, file transfer, fax over IP and online games.[5][6][7]

Protocol operation

SIP is independent from the underlying transport protocol. It runs on the Transmission Control Protocol (TCP), the User Datagram Protocol (UDP) or the Stream Control Transmission Protocol (SCTP).[8] SIP can be used for two-party (unicast) or multiparty (multicast) sessions.

SIP employs design elements similar to the HTTP request/response transaction model.[9] Each transaction consists of a client request that invokes a particular method or function on the server and at least one response. SIP reuses most of the header fields, encoding rules and status codes of HTTP, providing a readable text-based format.

Each resource of a SIP network, such as a user agent or a voicemail box, is identified by a uniform resource identifier (URI), based on the general standard syntax also used in Web services and e-mail.[10] The URI scheme used for SIP is sip, and a typical SIP URI has the form sip:username@domainname or sip:username@hostport. Here domainname requires DNS SRV records to locate the servers for the SIP domain, while hostport can be an IP address or a fully qualified domain name of the host, together with a port.[11][12]
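The two URI forms described above can be illustrated with a minimal parsing sketch. The class and function names (SipUri, parse_sip_uri) are hypothetical, and the sketch assumes simple URIs only: it discards URI parameters and headers and does not handle bracketed IPv6 literals.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SipUri:
    scheme: str            # "sip" or "sips"
    user: Optional[str]    # part before "@", if present
    host: str              # domain name or IP address
    port: Optional[int]    # explicit port, if one was given


def parse_sip_uri(uri: str) -> SipUri:
    """Split a simple SIP URI into scheme, user, host and port."""
    scheme, sep, rest = uri.partition(":")
    scheme = scheme.lower()
    if not sep or scheme not in ("sip", "sips"):
        raise ValueError(f"not a SIP URI: {uri!r}")
    # Drop URI parameters (;...) and headers (?...), which this sketch ignores.
    rest = rest.split(";", 1)[0].split("?", 1)[0]
    user, at, hostport = rest.rpartition("@")
    host, colon, port = hostport.partition(":")
    if not host:
        raise ValueError(f"missing host in {uri!r}")
    return SipUri(scheme, user if at else None, host, int(port) if colon else None)


if __name__ == "__main__":
    print(parse_sip_uri("sip:alice@example.com"))
    print(parse_sip_uri("sips:bob@192.0.2.10:5061"))
```

When no port is present, a real client would typically go on to resolve the domain (for example through DNS SRV records, as noted above); that step is outside this sketch.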

If secure transmission is required, the scheme sips is used and mandates that each hop over which the request is forwarded up to the target domain must be secured with Transport Layer Security (TLS). The last hop from the proxy of the target domain to the user agent has to be secured according to local policies. TLS protects against attackers who try to listen on the signaling link but it does not provide real end-to-end security to prevent espionage and law enforcement interception, as the encryption is only hop-by-hop and every single intermediate proxy has to be trusted.

SIP works in concert with several other protocols and is only involved in the signaling portion of a communication session. SIP clients typically use TCP or UDP on port numbers 5060 or 5061 to connect to SIP servers and other SIP endpoints. Port 5060 is commonly used for non-encrypted signaling traffic, whereas port 5061 is typically used for traffic encrypted with Transport Layer Security (TLS).

SIP is primarily used in setting up and tearing down voice or video calls. It also allows modification of existing calls, which can involve changing addresses or ports, inviting more participants, and adding or deleting media streams. SIP has also found use in messaging, such as instant messaging, and in event subscription and notification. A suite of SIP-related Internet Engineering Task Force (IETF) rules define behavior for such applications.

The voice and video streams in SIP applications are carried over another application protocol, the Real-time Transport Protocol (RTP). Parameters (port numbers, protocols, codecs) for these media streams are defined and negotiated using the Session Description Protocol (SDP), which is transported in the SIP message body.
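As an illustration of how media parameters travel in an SDP body, the following sketch extracts the port, transport protocol and codec mappings from the media ("m=") and "a=rtpmap" lines. The sample body, its addresses and the function name parse_sdp_media are illustrative assumptions, and the parser handles only the lines shown, not the full SDP grammar.

```python
# A small SDP body of the kind carried in an INVITE (values are made up).
EXAMPLE_SDP = (
    "v=0\r\n"
    "o=alice 2890844526 2890844526 IN IP4 192.0.2.1\r\n"
    "s=-\r\n"
    "c=IN IP4 192.0.2.1\r\n"
    "t=0 0\r\n"
    "m=audio 49170 RTP/AVP 0 8\r\n"
    "a=rtpmap:0 PCMU/8000\r\n"
    "a=rtpmap:8 PCMA/8000\r\n"
)


def parse_sdp_media(sdp: str) -> list:
    """Return one dict per media line, with its port, protocol and codecs."""
    media = []
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            # m=<media> <port> <proto> <fmt> ...
            kind, port, proto, *formats = line[2:].split()
            media.append({"kind": kind, "port": int(port), "proto": proto,
                          "formats": formats, "codecs": {}})
        elif line.startswith("a=rtpmap:") and media:
            # a=rtpmap:<payload type> <encoding>/<clock rate>
            payload, encoding = line[len("a=rtpmap:"):].split(" ", 1)
            media[-1]["codecs"][payload] = encoding
    return media


if __name__ == "__main__":
    for stream in parse_sdp_media(EXAMPLE_SDP):
        print(stream)
```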

A motivating goal for SIP was to provide a signaling and call setup protocol for IP-based communications that can support a superset of the call processing functions and features present in the public switched telephone network (PSTN). SIP by itself does not define these features; rather, its focus is call setup and signaling. The features that permit familiar telephone-like operations (such as dialing a number, causing a phone to ring, and hearing ringback tones or a busy signal) are performed by proxy servers and user agents. Implementation and terminology are different in the SIP world, but to the end user the behavior is similar.

SIP-enabled telephony networks often implement many of the call processing features of Signaling System 7 (SS7), although the two protocols themselves are very different. SS7 is a centralized protocol, characterized by a complex central network architecture and dumb endpoints (traditional telephone handsets). SIP is a client-server protocol; however, most SIP-enabled devices can perform both the client and the server role. In general, the session initiator acts as the client, and the call recipient performs the server function. SIP features are implemented in the communicating endpoints, in contrast to the traditional SS7 architecture, in which features are implemented in the network core.

SIP is distinguished by its proponents for having roots in the IP community rather than in the telecommunications industry. SIP has been standardized and governed primarily by the IETF, while other protocols, such as H.323, have traditionally been associated with the International Telecommunication Union (ITU).

Network elements

SIP defines user agents as well as several types of server network elements. Two SIP endpoints can communicate without any intervening SIP infrastructure. However, this approach is often impractical for public services, which need directory services to locate available nodes in the network.

User agent

A SIP user agent (UA) is a logical network end-point used to create or receive SIP messages and thereby manage a SIP session. A SIP UA can perform the role of a user agent client (UAC), which sends SIP requests, and the user agent server (UAS), which receives the requests and returns a SIP response. These roles of UAC and UAS only last for the duration of a SIP transaction.[6]

A SIP phone is an IP phone that implements client and server functions of a SIP user agent and provides the traditional call functions of a telephone, such as dial, answer, reject, call hold, and call transfer.[13][14] SIP phones may be implemented as a hardware device or as a softphone. As vendors increasingly implement SIP as a standard telephony platform, the distinction between hardware-based and software-based SIP phones is blurred and SIP elements are implemented in the basic firmware functions of many IP-capable devices. Examples are devices from Nokia and BlackBerry.[15]

In SIP, as in HTTP, the user agent may identify itself using a message header field User-Agent, containing a text description of the software, hardware, or the product name. The user agent field is sent in request messages, which means that the receiving SIP server can see this information. SIP network elements sometimes store this information,[16] and it can be useful in diagnosing SIP compatibility problems.

Proxy server

The proxy server is an intermediary entity that acts as both a server and a client for the purpose of making requests on behalf of other clients. A proxy server primarily plays the role of routing, meaning that its job is to ensure that a request is sent to another entity closer to the targeted user. Proxies are also useful for enforcing policy, such as for determining whether a user is allowed to make a call. A proxy interprets, and, if necessary, rewrites specific parts of a request message before forwarding it.


[Figures: SIP user agent registration to a SIP registrar with authentication; call flow through a redirect server and proxy; establishment of a session through a back-to-back user agent.]

Registrar

A registrar is a SIP endpoint that accepts REGISTER requests, recording the address and other parameters from the user agent, and that provides a location service for subsequent requests. The location service links one or more IP addresses to the SIP URI of the registering agent. Multiple user agents may register for the same URI, with the result that all registered user agents receive the calls to the URI.
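The binding of several contact addresses to one SIP URI can be sketched as a small in-memory table. The class name LocationService and its methods are hypothetical, and the expiry handling (a lifetime in seconds, with zero removing a binding) is an assumption based on common registrar behavior rather than something stated in this article.

```python
from collections import defaultdict


class LocationService:
    """Toy location service: maps a registered SIP URI to contact addresses."""

    def __init__(self):
        # registered URI -> {contact address: absolute expiry time in seconds}
        self._bindings = defaultdict(dict)

    def register(self, uri, contact, expires, now):
        """Record (or refresh) a binding; expires == 0 removes it (assumed)."""
        if expires == 0:
            self._bindings[uri].pop(contact, None)
        else:
            self._bindings[uri][contact] = now + expires

    def lookup(self, uri, now):
        """Return every contact for the URI whose binding has not expired."""
        live = (c for c, expiry in self._bindings.get(uri, {}).items() if expiry > now)
        return sorted(live)


if __name__ == "__main__":
    service = LocationService()
    service.register("sip:alice@example.com", "sip:alice@192.0.2.10:5060", 3600, now=0)
    service.register("sip:alice@example.com", "sip:alice@192.0.2.20:5060", 60, now=0)
    # Both devices are registered, so a call to the URI would reach both.
    print(service.lookup("sip:alice@example.com", now=30))
```

The current time is passed in explicitly so that the behavior is easy to follow; a real registrar would read a clock and persist its bindings.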

SIP registrars are logical elements, and are commonly co-located with SIP proxies. To improve network scalability, location services may instead be located with a redirect server.

Redirect server

A redirect server is a user agent server that generates 3xx (redirection) responses to requests it receives, directing the client to contact an alternate set of URIs. A redirect server allows proxy servers to direct SIP session invitations to external domains.
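A 3xx response of this kind might be assembled as in the sketch below. The function name, the to_tag parameter and the sample header values are hypothetical. The sketch assumes the request headers are already available as a dictionary and copies only the fields a response normally echoes back.

```python
def build_redirect_response(request_headers, contacts, to_tag="redirect-1"):
    """Build a 302 response text pointing the client at alternate URIs."""
    lines = ["SIP/2.0 302 Moved Temporarily"]
    for name in ("Via", "From", "To", "Call-ID", "CSeq"):
        value = request_headers[name]
        # A response carries a tag on the To header; add one if it is missing.
        if name == "To" and ";tag=" not in value:
            value = f"{value};tag={to_tag}"
        lines.append(f"{name}: {value}")
    # Each Contact header names one alternate URI for the client to try.
    for contact in contacts:
        lines.append(f"Contact: <{contact}>")
    lines.append("Content-Length: 0")
    # Header lines end with CRLF; an empty line terminates the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


if __name__ == "__main__":
    headers = {
        "Via": "SIP/2.0/UDP client.example.com;branch=z9hG4bK74bf9",
        "From": "<sip:alice@example.com>;tag=9fxced76sl",
        "To": "<sip:bob@example.com>",
        "Call-ID": "3848276298220188511@client.example.com",
        "CSeq": "1 INVITE",
    }
    print(build_redirect_response(headers, ["sip:bob@example.net"]))
```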

Session border controller

Session border controllers serve as middleboxes between user agents and SIP servers for various types of functions, including network topology hiding and assistance in NAT traversal.


Gateway

Gateways can be used to interconnect a SIP network to other networks, such as the public switched telephone network, which use different protocols or technologies.

SIP messages

SIP is a text-based protocol with syntax similar to that of HTTP. There are two different types of SIP messages: requests and responses. The first line of a request has a method, defining the nature of the request, and a Request-URI, indicating where the request should be sent.[17] The first line of a response has a response code.
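The difference between the two first-line formats can be shown with a short sketch. The function name parse_start_line is hypothetical, and the sketch assumes a well-formed line with all three fields present.

```python
def parse_start_line(line: str) -> dict:
    """Classify the first line of a SIP message as a request or a response."""
    parts = line.strip().split(" ", 2)
    if len(parts) != 3:
        raise ValueError(f"malformed start line: {line!r}")
    if parts[0].upper().startswith("SIP/"):
        # Response: <version> <response code> <reason phrase>
        version, code, reason = parts
        return {"type": "response", "version": version,
                "code": int(code), "reason": reason}
    # Request: <method> <Request-URI> <version>
    method, request_uri, version = parts
    return {"type": "request", "method": method,
            "request_uri": request_uri, "version": version}


if __name__ == "__main__":
    print(parse_start_line("INVITE sip:bob@example.com SIP/2.0"))
    print(parse_start_line("SIP/2.0 180 Ringing"))
```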

SIP request

For SIP requests, RFC 3261 defines the following methods:[18]

  • REGISTER: Used by a UA to register to the registrar.
  • INVITE: Used to establish a media session between user agents.
  • ACK: Confirms that the client has received a final response to an INVITE request.
  • CANCEL: Terminates a pending request.
  • BYE: Terminates an existing session.
  • OPTIONS: Requests information about the capabilities of another user agent or server without the need to set up a session. Often used as a keepalive message.
  • REFER: Indicates that the recipient (identified by the Request-URI) should contact a third party using the contact information provided in the request, as in a call transfer. REFER is defined in RFC 3515 rather than in RFC 3261.

RFC 3262 introduced a further method:

  • PRACK (Provisional Response Acknowledgement): Improves reliability by adding an acknowledgement mechanism for provisional responses (1xx). A PRACK is sent in response to a provisional response.

SIP response

The SIP response types defined in RFC 3261 fall in one of the following categories:[19]

  • Provisional (1xx): Request received and being processed.
  • Success (2xx): The action was successfully received, understood, and accepted.
  • Redirection (3xx): Further action needs to be taken (typically by sender) to complete the request.
  • Client Error (4xx): The request contains bad syntax or cannot be fulfilled at the server.
  • Server Error (5xx): The server failed to fulfill an apparently valid request.
  • Global Failure (6xx): The request cannot be fulfilled at any server.
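The mapping from a response code to its category follows directly from the first digit, as this short sketch shows. The names RESPONSE_CLASSES, classify_response and is_final are illustrative.

```python
# First digit of the response code -> category name from the list above.
RESPONSE_CLASSES = {
    1: "Provisional",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
    6: "Global Failure",
}


def classify_response(code: int) -> str:
    """Return the category of a SIP response code such as 180 or 486."""
    if not 100 <= code <= 699:
        raise ValueError(f"not a SIP response code: {code}")
    return RESPONSE_CLASSES[code // 100]


def is_final(code: int) -> bool:
    """Everything except a 1xx provisional response is a final response."""
    return classify_response(code) != "Provisional"


if __name__ == "__main__":
    for code in (100, 180, 200, 302, 486, 503, 603):
        print(code, classify_response(code), "final" if is_final(code) else "provisional")
```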


Example: User1’s UAC uses an Invite Client Transaction to send the initial INVITE (1) message. If no response is received after a timer-controlled wait period, the UAC may choose to terminate the transaction or retransmit the INVITE. Once a response is received, User1 is confident the INVITE was delivered reliably. User1’s UAC must then acknowledge the response. On delivery of the ACK (2), both sides of the transaction are complete. In this case, a dialog may have been established.[20]

SIP defines a transaction mechanism to control the exchanges between participants and deliver messages reliably. A transaction is a state of a session, which is controlled by various timers. Client transactions send requests and server transactions respond to those requests with one or more responses. The responses may include provisional responses, which have a response code of the form 1xx, and one or more final responses (2xx – 6xx).

Transactions are further categorized as either type Invite or type Non-Invite. Invite transactions differ in that they can establish a long-running conversation, referred to as a dialog in SIP, and so include an acknowledgment (ACK) of any non-failing final response, e.g., 200 OK.

Because of these transactional mechanisms, unreliable transport protocols, such as the User Datagram Protocol (UDP), are sufficient for SIP operation.
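The timer-driven retransmission that makes UDP workable can be sketched as a schedule calculation. The sketch assumes the INVITE client transaction timers of RFC 3261: a base interval T1 of 0.5 seconds that doubles after each retransmission, and an overall timeout of 64 × T1. These values are not given in this article, and the function name is illustrative.

```python
def invite_retransmit_schedule(t1: float = 0.5) -> list:
    """Times, in seconds after the first send, at which an unanswered
    INVITE would be retransmitted over an unreliable transport."""
    timeout = 64 * t1    # assumed overall transaction timeout
    interval = t1        # assumed first retransmission interval
    elapsed = interval
    times = []
    while elapsed < timeout:
        times.append(elapsed)
        interval *= 2    # back off: wait twice as long before the next send
        elapsed += interval
    return times


if __name__ == "__main__":
    print(invite_retransmit_schedule())
```

With the default T1, the request is re-sent at 0.5, 1.5, 3.5, 7.5, 15.5 and 31.5 seconds, and the transaction gives up at 32 seconds if no response has arrived.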

Instant messaging and presence

The Session Initiation Protocol for Instant Messaging and Presence Leveraging Extensions (SIMPLE) is the SIP-based suite of standards for instant messaging and presence information. MSRP (Message Session Relay Protocol) allows instant message sessions and file transfer.

Conformance testing

The TTCN-3 test specification language is used to specify conformance tests for SIP implementations. A SIP test suite was developed by a Specialist Task Force at ETSI (STF 196).[21] The SIP developer community meets regularly at the SIP Forum SIPit events to test interoperability and to test implementations of new RFCs.


A SIP connection is a marketing term for voice over Internet Protocol (VoIP) services offered by many Internet telephony service providers (ITSPs). The service provides routing of telephone calls from a client's private branch exchange (PBX) telephone system to the public switched telephone network (PSTN). Such services may simplify corporate information system infrastructure by sharing Internet access for voice and data, and removing the cost for Basic Rate Interface (BRI) or Primary Rate Interface (PRI) telephone circuits.

Many VoIP phone companies allow customers to use their own SIP devices, such as SIP-capable telephone sets, or softphones.

SIP-enabled video surveillance cameras can make calls to alert the owner or operator that an event has occurred; for example, to notify that motion has been detected out-of-hours in a protected area.

SIP is used in audio over IP for broadcasting applications where it provides an interoperable means for audio interfaces from different manufacturers to make connections with one another.[22]

SIP-ISUP interworking

SIP-I, or the Session Initiation Protocol with encapsulated ISUP, is a protocol used to create, modify, and terminate communication sessions based on ISUP using SIP and IP networks. Services using SIP-I include voice, video telephony, fax and data. SIP-I and SIP-T[23] are two protocols with similar features, notably to allow ISUP messages to be transported over SIP networks. This preserves all of the detail available in the ISUP header, which is important as there are many country-specific variants of ISUP that have been implemented over the last 30 years, and it is not always possible to express all of the same detail using a native SIP message. SIP-I was defined by the ITU-T, whereas SIP-T was defined via the IETF RFC route.[24]

Deployment issues

If the call traffic runs on the same connection with other traffic, such as email or Web browsing, voice and even signaling packets may be dropped and the voice stream may be interrupted.

To mitigate this, many companies split voice and data between two separate Internet connections. Alternatively, some networks use the Differentiated Services (DiffServ) field (previously defined as the Type of Service (ToS) field) in the header of IPv4 packets to mark the relative time-sensitivity of SIP and RTP as compared to web, email, video and other types of IP traffic. This precedence marking method requires that all routers in the SIP and RTP paths support separate queues for different traffic types. Other options to control delay and loss include using multiple VLANs (virtual local area networks) and traffic shaping to avoid this resource conflict, but the efficacy of these measures depends on the number of packets dropped between the Internet and the PBX.
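The DiffServ marking described above can be sketched as follows. The DiffServ code point (DSCP) occupies the upper six bits of the former ToS byte, so the byte value is the code point shifted left by two bits. The specific code points (46 for media, 24 for signaling) are common conventions assumed here for illustration, not values given in this article, and whether the IP_TOS socket option takes effect depends on the operating system.

```python
import socket

DSCP_EF = 46     # Expedited Forwarding, commonly chosen for RTP voice (assumed)
DSCP_CS3 = 24    # Class Selector 3, often chosen for call signaling (assumed)


def dscp_to_tos(dscp: int) -> int:
    """Place a 6-bit DSCP value in the upper bits of the ToS/DiffServ byte."""
    if not 0 <= dscp <= 63:
        raise ValueError(f"DSCP must be in 0..63, got {dscp}")
    return dscp << 2


def open_marked_udp_socket(dscp: int) -> socket.socket:
    """Create an IPv4 UDP socket whose outgoing packets carry the marking."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock


if __name__ == "__main__":
    print(hex(dscp_to_tos(DSCP_EF)), hex(dscp_to_tos(DSCP_CS3)))
```

Marking packets at the endpoint only helps if the routers along the path honor the marking, which is the queueing requirement noted above.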

Registration is required if the end user has a dynamic IP address, if the provider does not support static hostnames, or if NAT is used. In order to share several DID numbers on the same registration, the IETF has defined additional headers (for example "P-Preferred-Identity", see RFC 3325). This avoids multiple registrations from one PBX to the same provider. Using this method, the PBX can indicate what identity should be presented to the called party and what identity should be used for authenticating the call. This feature is also useful when the PBX redirects an incoming call to a PSTN number, for example a cell phone, to preserve the original caller ID.

Users should also be aware that a SIP connection can be used as a channel for attacking the company's internal networks, similar to Web and email attacks. Users should consider installing appropriate security mechanisms to prevent malicious attacks.


Increasing concerns about the security of calls that run over the public Internet have made SIP encryption more popular and more sought after. Because a VPN is not an option for most service providers, most of those that offer secure SIP (SIPS) connections use TLS to secure signaling. The relationship between SIP (port 5060) and SIPS (port 5061) is similar to that between HTTP and HTTPS, and SIPS uses URIs with the sips scheme. The media streams, which run on different connections from the signaling stream, can be encrypted with SRTP. The key exchange for SRTP is performed with SDES (RFC 4568), or with the newer and often more user-friendly ZRTP (RFC 6189), which can automatically upgrade RTP to SRTP using dynamic key exchange (and a verification phrase). One can also add a MIKEY (RFC 3830) exchange to SIP and in that way determine session keys for use with SRTP.

References


  1. Johnston, Alan B. (2004). SIP: Understanding the Session Initiation Protocol, Second Edition. Artech House. ISBN 1-58053-168-7.
  2. "SIP core working group charter". 2010-12-07. Retrieved 2011-01-11.
  3. "Search Internet-Drafts and RFCs". Internet Engineering Task Force.
  4. "JAIN SIP project". Retrieved 2011-07-26.
  5. "What is SIP?". Network World. May 11, 2004.
  6. "RFC 3261 – SIP: Session Initiation Protocol". IETF. 2002.
  7. Margaret Rouse. "Session Initiation Protocol (SIP)". TechTarget.
  8. RFC 4168, The Stream Control Transmission Protocol (SCTP) as a Transport for the Session Initiation Protocol (SIP), IETF, The Internet Society (2005)
  9. William Stallings, p.209
  10. RFC 3986, Uniform Resource Identifiers (URI): Generic Syntax, IETF, The Internet Society (2005)
  11. Miikka Poikselkä et al. 2004.
  12. Brian Reid & Steve Goodman 2015.
  13. Azzedine (2006). Handbook of algorithms for wireless networking and mobile computing. CRC Press. p. 774. ISBN 978-1-58488-465-1.
  14. Porter, Thomas; Andy Zmolek; Jan Kanclirz; Antonio Rosela (2006). Practical VoIP Security. Syngress. pp. 76–77. ISBN 978-1-59749-060-3.
  15. "BlackBerry MVS Software". Retrieved 2011-01-11.
  16. "User-Agents We Have Known "VoIP
  17. Stallings, p.214
  18. Stallings, pp.214-215
  19. Stallings, pp.216-217
  20. James Wright. "SIP - An Introduction" (PDF). Konnetic. Retrieved 2011-01-11.
  21. Experiences of Using TTCN-3 for Testing SIP and also OSP. Archived March 30, 2014, at the Wayback Machine.
  22. Jonsson, Lars; Mathias Coinchon (2008). "Streaming audio contributions over IP" (PDF). EBU Technical Review. Retrieved 2010-12-27.
  23. "RFC3372: SIP-T Context and Architectures". September 2002. Retrieved 2011-01-11.
  24. White Paper: "Why SIP-I? A Switching Core Protocol Recommendation"


  • Brian Reid; Steve Goodman (22 January 2015), Exam Ref 70-342 Advanced Solutions of Microsoft Exchange Server 2013 (MCSE), Microsoft Press, p. 24, ISBN 978-0-73-569790-4 
  • Miikka Poikselkä; Georg Mayer; Hisham Khartabil; Aki Niemi (19 November 2004), The IMS: IP Multimedia Concepts and Services in the Mobile Domain, John Wiley & Sons, p. 268, ISBN 978-0-47-087114-0 
