Host media processing
A telephony system based on host media processing (HMP) is one that uses a general-purpose computer to process a telephony call’s media stream rather than using digital signal processors (DSPs) to perform the task. When telephony call streams started to be digitized for time-division-multiplexed (TDM) transport, processing of the media stream, to enhance it in some way, became common. For example, digital echo cancellers were added to long-haul circuits, and transport channels were shaped to improve modem performance. Then, in the mid-‘80s, computer-based systems that implemented messaging, for example, used DSPs to compress the audio for storage, and fax servers used DSPs to implement fax modems.
However, since the late ‘90s, the millions of instructions per second (MIPS) of processing power available on low-cost PCs have been adequate to process several media streams, while still leaving enough processing power to handle the application. And, following Moore’s Law, PC capacity continues to double every 18 months, while the MIPS required to process a call’s media stream have remained relatively constant. Now, in the latter half of the century’s first decade, a single PC can handle well over 100 simultaneous calls.
Prior to IP telephony, when you wanted to connect a telecommunications system to a telecom network it was necessary to have a telecom-specific physical interface. This could mean an analog interface (POTS/DS-0), for low-density non-network systems, or a digital interface, such as a T-1 or E-1 line (DS-1, delivering 24 or 32 DS-0s). A DS-4 connection delivers 274.176 Mbit/s or 4032 DS-0s. In each case, telecom-specific electronic interfaces, which were proprietary and, therefore, relatively expensive, were necessary. The situation changes dramatically with an all-IP telecom infrastructure. The network interfaces move from being a significant proprietary component to off-the-shelf high-performance IP interfaces, an inherent feature in every modern computing system. Today, 10-Gigabit Ethernet telephony systems are being deployed.
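The channel counts quoted above can be related to line rates with simple arithmetic, since each DS-0 carries 64 kbit/s. The following is a minimal sketch; the 1,544 and 2,048 kbit/s line rates for T-1 and E-1 are the standard figures rather than values taken from this article, and the function names are illustrative only.

```python
DS0_KBPS = 64  # one digitized voice channel (DS-0) carries 64 kbit/s

# name: (DS-0 channels, line rate in kbit/s).
# Channel counts and the DS-4 rate are as quoted in the text; the T-1 and
# E-1 line rates are the standard values and are an addition here.
LINKS = {
    "T-1 (DS-1)": (24, 1_544),
    "E-1": (32, 2_048),
    "DS-4": (4032, 274_176),
}


def payload_kbps(channels):
    """Aggregate rate of the DS-0 channels alone."""
    return channels * DS0_KBPS


def overhead_kbps(channels, line_rate_kbps):
    """Line rate left over for framing once the DS-0s are accounted for."""
    return line_rate_kbps - payload_kbps(channels)


if __name__ == "__main__":
    for name, (channels, line_rate) in LINKS.items():
        print(f"{name}: {channels} x {DS0_KBPS} = {payload_kbps(channels)} kbit/s, "
              f"overhead {overhead_kbps(channels, line_rate)} kbit/s")
```

Note that counting an E-1 as 32 DS-0s includes the timeslots normally used for framing and signalling, which is why no separate overhead appears for it in this calculation.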
The term Host Media Processing was first used in a product name by Intel in the early 2000s. It was quickly adopted as a generic term for software-based telephony products, used by many companies including Aculab, Pika, Eicon Networks, Uniqall, Commetrex, and NMS. Intel's Host Media Processing product line (still called HMP) exists today under the Dialogic banner.
The concept of using an industry standard PC to do telephony processing is now widely understood and accepted, with open-source platforms like Asterisk, YATE and FreeSWITCH using the same principle. The rise of interest in VoIP and Fax-over-IP (FoIP) has driven demand for open, host-based solutions that can be molded into a variety of different communications solutions. HMP components are used today to implement many different kinds of solutions including PBX, conference servers, unified communications servers and IVR. The emergence of virtualization in recent years also increases the appeal of HMP, since it is then possible to think of telephony resources as being virtual channels (rather than dedicated hardware boards), which offer the same benefits as virtual processors and servers, i.e. resilience, less hardware, space saving, and lower maintenance.
Network connectivity through low-cost industry-standard interfaces influences the consideration of whether to use DSPs or server blades for media processing, especially in media servers, where packet-delays are not as troublesome and TDM interfaces are not required. Without telephony interface blades and their attendant chassis and power systems available to host the DSPs, the addition of DSPs on proprietary blades must be independently justified. They will continue to be justified for the highest-density applications. However, with the semiconductor industry continuing to follow Moore’s law, host media processing will support 1500 channels on one blade in 2010. DSPs will always offer even higher densities, but if 1500 channels meets the system requirement, higher densities will have little incremental value.
Not every use of the term “HMP” means the same thing. There are, for example, HMP systems that do no actual media processing, so it is important to understand how the term is being used today.
Modern digital-media telephony systems require signal processing to transform a call stream or extract information from it. Transformation includes the processing required to send or receive a fax and to transcode the stream from one speech codec to another for capability matching or bandwidth reduction. DTMF detection, caller ID, and in-band call-progress analysis are good examples of information extraction.
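As an illustration of information extraction performed on the host, DTMF detection is commonly done with the Goertzel algorithm, which measures signal power at a handful of known frequencies. The sketch below is not taken from any particular HMP product; it assumes 8 kHz linear samples, and the function names, the 205-sample block length, and the test-tone generator are choices made for this example. It picks the strongest row and column frequency and omits the threshold and twist checks a production detector would need.

```python
import math

DTMF_LOW = (697, 770, 852, 941)       # row frequencies in Hz
DTMF_HIGH = (1209, 1336, 1477, 1633)  # column frequencies in Hz
KEYPAD = ("123A", "456B", "789C", "*0#D")


def goertzel_power(samples, freq_hz, sample_rate=8000):
    """Return the squared magnitude of `samples` at `freq_hz` (Goertzel)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / sample_rate)
    s_prev = 0.0
    s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def detect_dtmf(samples, sample_rate=8000):
    """Return the key whose row and column tones carry the most power.

    Simplification: no energy threshold is applied, so this always returns
    a key, even for silence or speech.
    """
    row = max(range(4),
              key=lambda i: goertzel_power(samples, DTMF_LOW[i], sample_rate))
    col = max(range(4),
              key=lambda i: goertzel_power(samples, DTMF_HIGH[i], sample_rate))
    return KEYPAD[row][col]


def make_tone(f1, f2, n=205, sample_rate=8000):
    """Generate n samples of a two-tone test signal (for demonstration)."""
    return [math.sin(2 * math.pi * f1 * t / sample_rate)
            + math.sin(2 * math.pi * f2 * t / sample_rate)
            for t in range(n)]


if __name__ == "__main__":
    print(detect_dtmf(make_tone(770, 1336)))  # the 770 Hz / 1336 Hz pair is key "5"
```

Because only eight frequencies are evaluated per block, the per-channel cost is small, which is consistent with the article's point that a general-purpose processor can handle many channels.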
There are many limited-function media servers on the market that don’t actually do any media (signal) processing. An Internet Engineering Task Force (IETF) document, RFC 2833, defines how a gateway can perform the in-band-tone analysis itself and pass along some of the embedded information, such as DTMF and caller ID, in RTP packets. In this case, all the media server need do is parse the RTP buffers from a gateway to derive the tone information.
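To show how little work this parsing involves, here is a minimal sketch of reading an RFC 2833 telephone-event payload out of an RTP packet. The field layout (a 12-byte fixed RTP header followed by a 4-byte event payload) follows the RFCs; the function name, the returned dictionary, and the default payload type of 101 are assumptions for this example, since the payload type is dynamic and negotiated per session.

```python
import struct

# Event codes 0-15 map to the DTMF keys.
EVENT_NAMES = {**{i: str(i) for i in range(10)},
               10: "*", 11: "#", 12: "A", 13: "B", 14: "C", 15: "D"}


def parse_telephone_event(packet, event_payload_type=101):
    """Parse one RTP packet; return event details, or None if it is not
    a telephone-event packet.

    `event_payload_type` is an assumption: the real value is negotiated
    during session setup.
    """
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    b0, b1, seq, timestamp, _ssrc = struct.unpack("!BBHII", packet[:12])
    if b0 >> 6 != 2:
        raise ValueError("not RTP version 2")
    if (b1 & 0x7F) != event_payload_type:
        return None  # ordinary audio or some other payload

    offset = 12 + 4 * (b0 & 0x0F)  # skip any CSRC identifiers
    if b0 & 0x10:                  # header extension present
        (ext_words,) = struct.unpack("!H", packet[offset + 2:offset + 4])
        offset += 4 + 4 * ext_words

    payload = packet[offset:]
    if len(payload) < 4:
        raise ValueError("telephone-event payload truncated")
    event, flags, duration = struct.unpack("!BBH", payload[:4])
    return {
        "event": event,
        "digit": EVENT_NAMES.get(event),  # None for non-DTMF events
        "end": bool(flags & 0x80),        # E bit: last packet of the event
        "volume_dbm0": -(flags & 0x3F),   # power level, 0 to -63 dBm0
        "duration": duration,             # in RTP timestamp units
        "seq": seq,
        "timestamp": timestamp,
    }
```

No sample of audio is examined anywhere in this code, which is the sense in which such a server does no media processing.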
But what about transcoding, where one voice-compression scheme (vocoder) is transcoded to another? Some media servers, for example, simply process buffers, and, therefore, cannot perform any transcoding, limiting them to low-function voice messaging. RTP packets are simply stored and played back as they are received. This means no AGC, volume control, time-scale modification (playback speedup and slowdown), or capabilities matching with endpoint terminals, making this type of so-called HMP media server a viable option only in the most functionally constrained applications.
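The reason a buffer-only server cannot offer even volume control is that compressed samples must be expanded to linear PCM before they can be scaled, then compressed again. The sketch below illustrates this with G.711 mu-law, which is assumed here only as a familiar example codec; the article does not name one. The function names are illustrative.

```python
BIAS = 0x84   # 132, added before mu-law segment search
CLIP = 32635  # largest magnitude that can be encoded


def ulaw_decode(byte):
    """Expand one G.711 mu-law byte to a linear PCM sample."""
    byte = ~byte & 0xFF
    sign = byte & 0x80
    exponent = (byte >> 4) & 0x07
    mantissa = byte & 0x0F
    sample = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -sample if sign else sample


def ulaw_encode(sample):
    """Compress one linear PCM sample to a G.711 mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def apply_gain(payload, gain):
    """Scale a mu-law payload: decode, multiply, re-encode.

    The multiply must happen on linear samples; scaling the compressed
    bytes directly would not produce a quieter or louder signal.
    """
    return bytes(ulaw_encode(int(round(ulaw_decode(b) * gain)))
                 for b in payload)
```

A server that only stores and replays RTP payloads never performs the decode step, so operations such as `apply_gain` are unavailable to it.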
For years, the terms “signal processing” and “media processing” have been used interchangeably, so, most appropriately, the term HMP is reserved for those systems where host MIPS are actually used to perform digital signal-processing tasks.
- "Intel Announces Two Standards-based Software Building Blocks" (Press release). Speechtekmag. September 1, 2003. Retrieved 2010-08-10.