Regardless of what’s in between, a phone conversation between two people requires that each has both a microphone and a speaker. In the traditional telephone, the microphone is located in the mouthpiece and the speaker is located in the earpiece. In an analog telephone (like the one you have at home), the voice signal produced by the mouthpiece is sent directly along the wire to a telephone exchange or a local PBX.
If you’re going to use IP telephony, you’ll still need a microphone and speaker. Those could be the microphone and speaker supplied with your PC or built into a PC-attached headset. But they could equally well be provided by a traditional analog telephone attached to an IP-telephony-enabled PBX (a so-called iPBX) or by a telephone plugged into a data port that supports IP telephony directly (an IP telephone). Whether the endpoint is a PC, a traditional telephone attached to an iPBX or an IP telephone, the basic mechanics of an IP telephone call are the same.
SIP, the Session Initiation Protocol, is the IETF (Internet Engineering Task Force) protocol for VOIP and other text and multimedia sessions, such as instant messaging, video, online games and other services. SIP is an application-layer control (signaling) protocol for creating, modifying, and terminating sessions with one or more participants. These sessions include Internet telephone calls, multimedia distribution, and multimedia conferences.
SIP invitations used to create sessions carry session descriptions that allow participants to agree on a set of compatible media types. SIP makes use of elements called proxy servers to help route requests to the user’s current location, authenticate and authorize users for services, implement provider call-routing policies, and provide features to users. SIP also provides a registration function that allows users to upload their current locations for use by proxy servers. SIP runs on top of several different transport protocols.
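The registration function can be pictured as a small table kept by a registrar and consulted by proxy servers when they route a request. The Python sketch below is a hypothetical, in-memory illustration: the class name `LocationService`, its methods and the example addresses are assumptions, not part of any SIP library, and a real registrar would also authenticate users, honor `Contact: *` wildcard removals and persist its bindings.

```python
import time
from typing import Callable, Dict, List


class LocationService:
    """Hypothetical in-memory location service fed by SIP REGISTER requests.

    Maps an address-of-record (e.g. sip:alice@example.com) to the contact
    addresses where that user can currently be reached, each with an expiry time.
    """

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._bindings: Dict[str, Dict[str, float]] = {}

    def register(self, aor: str, contact: str, expires: int) -> None:
        """Apply one REGISTER binding; expires=0 removes that binding."""
        contacts = self._bindings.setdefault(aor, {})
        if expires == 0:
            contacts.pop(contact, None)
        else:
            contacts[contact] = self._clock() + expires

    def lookup(self, aor: str) -> List[str]:
        """Return the unexpired contacts a proxy could forward a request to."""
        now = self._clock()
        live = {c: t for c, t in self._bindings.get(aor, {}).items() if t > now}
        if live:
            self._bindings[aor] = live
        else:
            self._bindings.pop(aor, None)  # forget users with no live bindings
        return sorted(live)


if __name__ == "__main__":
    loc = LocationService()
    loc.register("sip:alice@example.com", "sip:alice@192.0.2.10:5060", 3600)
    print(loc.lookup("sip:alice@example.com"))
```

A proxy receiving an INVITE for `sip:alice@example.com` would call `lookup` and forward the request to each returned contact; passing the clock in as a parameter simply makes expiry easy to test.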
- SIP is a text-based protocol that uses UTF-8 encoding
- SIP uses port 5060 for both UDP and TCP, and may also use other transports
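To show what a text-based, UTF-8 encoded protocol looks like in practice, here is a hedged Python sketch that assembles a minimal INVITE request carrying an SDP body. The header set loosely follows the SIP specification, but the function name `build_invite`, the tag and branch values and the addresses (from the 192.0.2.0/24 documentation range) are illustrative assumptions; a real user agent generates unique tags, branches and Call-IDs for every request. Note that `Content-Length` counts bytes of the UTF-8 body, which differs from the character count as soon as non-ASCII text appears.

```python
def build_invite(caller: str, callee: str, call_id: str, cseq: int,
                 via_host: str, sdp: str) -> bytes:
    """Assemble a minimal SIP INVITE carrying an SDP body (illustrative only)."""
    body = sdp.encode("utf-8")
    headers = [
        f"INVITE {callee} SIP/2.0",
        # Hypothetical branch value; real stacks generate a unique one per request.
        f"Via: SIP/2.0/UDP {via_host}:5060;branch=z9hG4bK-example",
        "Max-Forwards: 70",
        f"From: <{caller}>;tag=example-tag",
        f"To: <{callee}>",
        f"Call-ID: {call_id}",
        f"CSeq: {cseq} INVITE",
        f"Contact: <{caller}>",
        "Content-Type: application/sdp",
        # Content-Length counts bytes of the UTF-8 body, not characters.
        f"Content-Length: {len(body)}",
    ]
    # Header lines end in CRLF; an empty line separates headers from the body.
    return ("\r\n".join(headers) + "\r\n\r\n").encode("utf-8") + body


SDP = "\r\n".join([
    "v=0",
    "o=alice 2890844526 2890844526 IN IP4 192.0.2.10",
    "s=Café call",              # non-ASCII session name: "é" is two bytes in UTF-8
    "c=IN IP4 192.0.2.10",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0",  # RTP payload type 0 = PCMU (G.711 mu-law)
    "a=rtpmap:0 PCMU/8000",
]) + "\r\n"

if __name__ == "__main__":
    msg = build_invite("sip:alice@example.com", "sip:bob@example.com",
                       "a84b4c76e66710@192.0.2.10", 1, "192.0.2.10", SDP)
    print(msg.decode("utf-8"))
```

Because the whole message is readable text, it can be inspected in a packet capture or typed by hand for testing, which is part of what the source means by SIP being easier to customize.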
So what happens when you want to make a call? First, after you’ve dialed a telephone number or clicked on a name, signaling is required to determine the status of the called party (available or busy) and to establish the call (signaling is used for many other things too, but more on that later). Next, when the conversation starts, the analog signal produced by the microphone needs to be encoded in a digital format suitable for transmission across an IP network. The IP network itself must then ensure that your real-time conversation is transported across the available media in a manner that produces acceptable voice quality. Finally, it may be necessary for a gateway to convert the IP telephony stream to another format, either for interoperation with a different IP-based multimedia scheme or because you are placing a call to the traditional public telephone network. The overall technology requirements of an IP telephony solution can therefore be split into four categories: signaling, encoding, transport and gateway control.
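The encoding step can be made concrete with G.711, one of the encoding standards listed in the protocol tables. The sketch below shows μ-law companding (the G.711 variant used mainly in North America and Japan; A-law is the other) in the widely used bias-and-segment formulation: each 16-bit linear sample is compressed to one 8-bit code, so 8,000 samples per second yield 64 kbps. The function names are hypothetical, and this is an illustration rather than a certified codec implementation.

```python
BIAS = 0x84      # 132, added to the magnitude before the segment lookup
CLIP = 32635     # largest magnitude that survives the bias without overflow


def linear_to_ulaw(sample: int) -> int:
    """Compress one signed 16-bit PCM sample to an 8-bit G.711 mu-law code."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(-sample if sample < 0 else sample, CLIP) + BIAS
    # Segment (exponent) = position of the highest set bit above bit 7.
    exponent = (magnitude >> 7).bit_length() - 1
    # The four bits just below the leading bit become the mantissa.
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # mu-law codes are transmitted with all bits inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def ulaw_to_linear(code: int) -> int:
    """Expand an 8-bit mu-law code back to an approximate 16-bit sample."""
    code = ~code & 0xFF
    sign = code & 0x80
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude
```

The quantization step grows with the segment, so quiet sounds keep fine resolution while loud ones are coded coarsely; that trade-off is what lets 8 bits per sample deliver toll-quality voice.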
For each of these areas there are standards. Unfortunately, in the key area of signaling there are two competing sets of standards: H.323 (an ITU standard) and SIP (Session Initiation Protocol, an IETF standard). This is why discussions of IP telephony often seem to boil down to an H.323 vs. SIP argument. It is important to remember, however, that neither H.323 nor SIP alone makes up a complete set of IP telephony protocols; they are simply the competing standards for signaling. Until recently there was a similar situation with gateway control protocols, with competition between IPDC (IP Device Control) and SGCP (Simple Gateway Control Protocol). Now, however, the industry appears to have agreed on a third standard, called MGCP (Media Gateway Control Protocol), which combines the advantages of its two predecessors. For details on which standards relate to signaling, encoding, transport and gateway control, see the protocol tables below.
| Standard | Body | Description |
|---|---|---|
| H.323 Protocol Suite | | |
| H.323 v2 | ITU | Packet-based multimedia communications systems |
| H.225.0 | ITU | Call signaling protocols and media stream packetization for packet-based multimedia (includes Q.931 and RAS) |
| H.225.0 Annex G | ITU | Gatekeeper-to-gatekeeper (inter-domain) communications |
| H.245 | ITU | Control protocol for multimedia communications |
| H.235 | ITU | Security and encryption for H-series multimedia terminals |
| H.450.x | ITU | Supplementary services for multimedia (call transfer, diversion, hold, park and pickup, call waiting, message waiting) |
| H.323 Annex D | ITU | Real-time fax using T.38 |
| H.323 Annex E | ITU | Call connection over UDP |
| H.323 Annex F | ITU | Single-use device |
| T.38 | ITU | Procedures for real-time Group 3 facsimile communication over IP networks |
| T.120 series | ITU | Data protocols for multimedia conferencing |
| SIP Protocol Suite | | |
| SIP (RFC 2543) | IETF | Session Initiation Protocol |
| SDP (RFC 2327) | IETF | Session Description Protocol |
| SAP (Internet Draft) | IETF | Session Announcement Protocol |
| Pulse Code Modulation (PCM) Variants | | |
| G.711 | ITU | Pulse Code Modulation (PCM), 64 kbps |
| G.726 | ITU | Adaptive Differential PCM (ADPCM), 16 to 40 kbps |
| Codebook Excited Linear Prediction (CELP) Variants | | |
| Transport Protocols | | |
| RTP (RFC 1889) | IETF | Real-time Transport Protocol |
| RTCP (RFC 1889) | IETF | Real-time Transport Control Protocol |
| RTSP (RFC 2326) | IETF | Real-Time Streaming Protocol |
| Gateway Control Protocols | | |
| MGCP | IETF | Media Gateway Control Protocol (Internet Draft) |
| MEGACO | IETF | MEGACO protocol (Internet Draft) |
| SGCP | IETF | Simple Gateway Control Protocol (Internet Draft) |
| IPDC | IETF | IP Device Control (Internet Draft) |
| H.GCP | ITU | Proposed recommendation for a gateway control protocol |
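RTP (RFC 1889), listed in the table above, is what actually carries the encoded voice across the IP network. As a minimal sketch, assuming only the 12-byte fixed header (no padding, header extension or contributing-source list), the header can be packed and parsed with Python’s standard `struct` module; the function names are illustrative, not from any RTP library.

```python
import struct

RTP_VERSION = 2
RTP_HEADER = struct.Struct("!BBHII")  # 12-byte fixed header, network byte order


def pack_rtp_header(payload_type: int, seq: int, timestamp: int, ssrc: int,
                    marker: bool = False) -> bytes:
    """Build the fixed RTP header (no padding, no extension, no CSRC list)."""
    byte0 = RTP_VERSION << 6                       # V=2, P=0, X=0, CC=0
    byte1 = (0x80 if marker else 0) | (payload_type & 0x7F)
    # Sequence numbers wrap at 16 bits, timestamps and SSRC at 32 bits.
    return RTP_HEADER.pack(byte0, byte1, seq & 0xFFFF,
                           timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def unpack_rtp_header(data: bytes) -> dict:
    """Parse the fields of a fixed RTP header from the start of a packet."""
    byte0, byte1, seq, timestamp, ssrc = RTP_HEADER.unpack_from(data)
    return {
        "version": byte0 >> 6,
        "csrc_count": byte0 & 0x0F,
        "marker": bool(byte1 & 0x80),
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }
```

The sequence number lets the receiver detect loss and reordering, while the timestamp (in sampling-clock units) lets it play the audio out at the right moments despite network jitter.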
H.323 vs. SIP
Given that two standards currently compete for dominance in IP telephony signaling, how do you decide which is more appropriate? The good news is that the two protocol suites appear to be converging, each picking up good ideas from the other. In particular, the third and latest incarnation of H.323 (H.323 v3) has addressed some key performance issues (call setup delay, and stateless processing to support UDP) that were initially SIP advantages.
Most importantly, each suite supports (pretty much equally well) the majority of required end-user functions, including call setup and tear-down, call holding, call transfer, call forwarding, call waiting, conferencing and click-to-dial. The only functional differences are message waiting indication (which only H.323 supports), third-party control (e.g., a secretary placing a call on behalf of a manager, which only SIP supports) and certain conferencing functions. While the range of functions supported is similar, the H.323 v3 suite (by means of H.245) provides a somewhat more robust mechanism for “capabilities exchange” than does SIP, which relies on the less descriptive Session Description Protocol (SDP). Capabilities exchange is the process by which the participating entities determine whether a particular feature is supported by both of them.
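As a rough illustration of capabilities exchange with SDP, the sketch below compares the audio payload types listed on each side’s `m=audio` line and keeps those both support, in the offering side’s order of preference. It is deliberately simplified and hypothetical: it looks only at static RTP payload numbers (in the RTP audio/video profile, 0 is PCMU, 3 is GSM and 8 is PCMA) and ignores `a=rtpmap` and all other attributes, which hints at why SDP is considered less descriptive than H.245.

```python
from typing import List


def audio_payload_types(sdp: str) -> List[int]:
    """Return the RTP payload types listed on the first m=audio line."""
    for line in sdp.splitlines():
        if line.startswith("m=audio "):
            # m=<media> <port> <transport> <fmt> <fmt> ...
            return [int(fmt) for fmt in line.split()[3:]]
    return []


def common_audio_codecs(offer: str, answer: str) -> List[int]:
    """Payload types both sides listed, kept in the offerer's preference order."""
    accepted = set(audio_payload_types(answer))
    return [pt for pt in audio_payload_types(offer) if pt in accepted]


if __name__ == "__main__":
    offer = "v=0\r\nm=audio 49170 RTP/AVP 0 8 3\r\n"
    answer = "v=0\r\nm=audio 3456 RTP/AVP 8 0\r\n"
    print(common_audio_codecs(offer, answer))
```

An empty result means the two endpoints share no audio codec and the call cannot proceed as offered.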
However, functionality is by no means the only consideration in the H.323 vs. SIP debate. Equally important are Quality of Service (QOS), Scalability/Flexibility and Interoperability. Indeed, whereas the suites are relatively similar in terms of functionality, they differ quite substantially in these areas, as seen in the following table. Because SIP is a significantly less complex protocol, it is argued that it should scale better. This is an important consideration given that the Internet may well come to support 500 million IP telephony devices. However, it is my opinion that this potential advantage doesn’t sufficiently compensate for the protocol’s current weaknesses in terms of QOS and interoperability. Most importantly, SIP doesn’t provide for redundancy (making it unsuitable for carrier applications), doesn’t support the emerging Differentiated Services/Policy Management approach to QOS and has a limited interoperability testing track record (largely because the protocol is new, having only been ratified in February 1999).
| Features | Similar | H.323 v3 Better | SIP Better |
|---|---|---|---|
| Quality of Service and Management | Call setup delay, packet loss recovery, lack of resource reservation capability | Fault tolerance (H.323 supports redundant gatekeepers and endpoints), Admission Control (SIP relies on other protocols for bandwidth management, call management and bandwidth control), Policy Control (H.323 has limited DiffServ support vs. none for SIP) | Loop detection (SIP’s algorithm using the Via header is somewhat superior to H.323’s PathValue approach) |
| Scalability and Flexibility | Stateless processing, UDP support, inter-server communications for endpoint location | Location of endpoints in other administrative domains (SIP does not define a method, but suggests use of DNS) | Complexity (SIP is less complex), Extensibility (SIP’s use of hierarchical feature names and error codes, which can be IANA-registered, is more flexible than H.323’s vendor-specific single extension field “NonStandardParam”), Ease of customization (SIP is less complex and offers text-based protocol encoding) |
| Interoperability | | PSTN signaling interoperability (SIP: Internet Draft only; H.323 uses Q.931-like messages, which are SS7 compatible, though only a subset of ISUP messages), Inter-vendor interoperability (H.323 is more mature, has a greater interoperability track record, and has the IMTC iNOW! profile to assist implementation) | |
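The Via-based loop detection mentioned in the table works because every proxy adds a Via header containing its own address as it forwards a request. The sketch below, with hypothetical function names and host names, shows only the core check: a proxy that finds its own address already in the Via list treats the request as looped. A production proxy also has to tell a genuine loop apart from a request that legitimately passes through it more than once (for example by comparing branch parameters or the Request-URI), answers a true loop with a 482 (Loop Detected) response, and parses Via values far more carefully than this string splitting, which ignores IPv6 literals and whitespace variations.

```python
from typing import List


def sent_by(via: str) -> str:
    """Extract host:port from a Via value like 'SIP/2.0/UDP host:port;branch=x'."""
    host_port = via.split()[1].split(";")[0].lower()
    # Assume the default SIP port when none is given (simplification).
    return host_port if ":" in host_port else host_port + ":5060"


def is_looped(via_headers: List[str], my_host: str, my_port: int = 5060) -> bool:
    """Simplified check: has this request already passed through this proxy?"""
    me = f"{my_host.lower()}:{my_port}"
    return any(sent_by(via) == me for via in via_headers)
```

Because the check relies only on information already carried in the request, a proxy can run it without keeping per-call state.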
Does the H.323 vs. SIP Debate Even Matter?
From a practical perspective, if you go out and research the IP telephony products that are available today, you’ll find that many are still vendor-specific, several support H.323 v1, some support H.323 v2 and very few support either SIP or H.323 v3. This is particularly true of products targeted at enterprise solutions as opposed to carrier solutions (where H.323 support is more widespread).
In practice, what is likely to happen is that major vendors will support both approaches until it becomes clear either that one standard is going to die or that the two are going to merge. That said, it is certainly worth clarifying vendors’ intended strategies with respect to signaling. If you are particularly concerned with high availability and interoperability, then a SIP-oriented solution might be too bleeding-edge right now. Otherwise, both approaches deliver much the same functionality. The only area where there is a noticeable difference is the implementation of conferencing capabilities. Because SIP can be used to invite multiple parties to join a call, simple conference calls can be initiated without a conferencing server (whereas H.323 requires one). In practice, however, whether or not this is a constraint depends on the full vendor solution and approach rather than just the protocols that happen to be used.
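A quick calculation suggests why serverless SIP conferencing suits simple calls best. Assuming, for illustration, that each participant sends its audio directly to every other participant (a full mesh, which is only one possible arrangement; multicast is another), the number of media sessions and each endpoint’s upstream bandwidth grow with the number of parties, whereas with a conferencing server that mixes the audio each endpoint sends a single stream. The function names and the 80 kbps per-stream figure (roughly G.711 plus packet headers) are assumptions.

```python
def mesh_media_sessions(participants: int) -> int:
    """Point-to-point media sessions needed if every party talks to every other."""
    return participants * (participants - 1) // 2


def mesh_uplink_kbps(participants: int, per_stream_kbps: float) -> float:
    """Upstream bandwidth each endpoint needs: one stream to each other party."""
    return (participants - 1) * per_stream_kbps


def bridged_uplink_kbps(per_stream_kbps: float) -> float:
    """With a conferencing server mixing audio, each endpoint sends one stream."""
    return per_stream_kbps


if __name__ == "__main__":
    for n in (3, 5, 10):
        print(f"{n} parties: {mesh_media_sessions(n)} sessions, "
              f"{mesh_uplink_kbps(n, 80.0):.0f} kbps up per endpoint "
              f"(vs {bridged_uplink_kbps(80.0):.0f} kbps with a server)")
```

A three-way call needs only three sessions, but ten parties would need forty-five, which is where a conferencing server starts to earn its keep.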
Strategies for Building VOIP Networks
How do you actually implement Voice over IP (VOIP)? There are a few different strategies available, including the following:
Simple toll bypass: Perhaps you just want to use IP to transport calls between offices within the corporate network. Such an approach requires little or no change to existing PBX, cabling and handset infrastructures, is relatively easy to implement and has no PSTN integration issues to consider.
Total IP telephony: Throw out your existing voice systems, replace the phone handsets with IP telephones that plug straight into 10BASE-T ports and implement LAN servers to provide (most of) the features your PBX once provided. Not for the faint of heart, but absolutely feasible with today’s technology.
IP-enabled PBXs: This is the intermediate route. Don’t change the existing cabling or handset infrastructure, but upgrade the PBXs so that the organization’s core telephony systems speak IP telephony protocols. That means PBX users can speak with other IP telephony users (e.g., PC-based NetMeeting users) as they become more prevalent, but it also means that your PBXs will rely on IP telephony gateways to communicate with the outside world (unless the PBXs themselves provide such functionality). There are two ways to do this: either upgrade your existing PBXs or replace them with purpose-built IP-PBXs.
The simplest of these strategies from an implementation perspective is probably the first, so we’ll begin with that approach and then explore the additional requirements of the other two strategies.
Simple Toll Bypass
IP telephony toll bypass solutions are relatively straightforward to implement and much like other forms of toll bypass (good old Time Division Multiplexing, Voice over Frame Relay, and Voice over ATM). Before we get into the alternative approaches, let’s examine what it is that we’re likely to be replacing. The following diagram shows two interconnected PBXs.
The basic function of a PBX (Private Branch Exchange), as the name suggests, is to connect phone calls coming in on trunk lines from the Public Switched Telephone Network (PSTN) to the particular extension associated with the called party (and similarly, in reverse, for outbound calls). However, PBXs are not limited to switching calls between the PSTN and the extensions; they are equally capable of switching calls to extensions on other connected PBXs. In the good old days, these interconnections took the form of leased analog voice circuits: if there were likely to be 10 conversations occurring between two offices at any one time, you needed 10 separate leased lines. While that approach may still be taken for interconnecting smaller offices, most PBX interconnections today are digital. These digital connections might be T1 circuits dedicated purely to the interconnection of PBXs or, more likely, channels allocated on a Time Division Multiplexing (TDM) backbone, which divides bandwidth between voice, various data streams (probably including IP) and perhaps video conferencing.
The problem with both dedicated voice tie lines and multiservice TDM backbones is that bandwidth must be permanently allocated (and paid for) for each voice circuit even though the voice circuits are not used all the time. A better solution is to split the traffic into packets (or “cells” or “frames”) so that all the traffic types can be interwoven in the most efficient manner. Each of the so-called “voice over” technologies (voice over Frame Relay, voice over ATM and voice over IP) is able to achieve this improved efficiency, but it is Voice over IP that best fits most organizations’ convergence strategies.
How do you implement Voice over IP for toll bypass? The easiest approach to illustrate is simply to unplug the existing PBX tie line(s) and to plug them into a separate unit that converts the voice signaling and transport to an IP format. I call such units VOIP relays (they’re sometimes also referred to as VOIP gateways, but the gateway term is more commonly used for PSTN interconnection, as we’ll discuss later). The VOIP relay connects directly to a router for transport over the IP network, as shown in the following diagram.
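Inside a VOIP relay, the digitized voice arriving from the tie line has to be cut into packets. Assuming the relay uses G.711 (one byte per sample at 8,000 samples per second) and 20 ms frames, which are common choices but not a description of any particular product mentioned here, a packetizer might look like the hypothetical sketch below. Each frame gets an incrementing sequence number and a timestamp that advances by the number of samples carried, mirroring the sequence and timestamp fields of RTP.

```python
from typing import Iterator, Tuple

SAMPLE_RATE_HZ = 8000      # telephone-band sampling rate used by G.711
BYTES_PER_SAMPLE = 1       # G.711 produces one 8-bit code per sample


def packetize(pcm: bytes, frame_ms: int = 20, first_seq: int = 0,
              first_ts: int = 0) -> Iterator[Tuple[int, int, bytes]]:
    """Split a G.711 byte stream into (sequence, timestamp, payload) frames."""
    samples_per_frame = SAMPLE_RATE_HZ * frame_ms // 1000     # 160 at 20 ms
    frame_bytes = samples_per_frame * BYTES_PER_SAMPLE
    seq, ts = first_seq, first_ts
    # A trailing partial frame is dropped in this sketch; a real relay
    # would buffer it until enough further samples arrive.
    for offset in range(0, len(pcm) - frame_bytes + 1, frame_bytes):
        yield seq & 0xFFFF, ts & 0xFFFFFFFF, pcm[offset:offset + frame_bytes]
        seq += 1                       # one per packet
        ts += samples_per_frame        # timestamps count samples, not packets
```

One second of audio becomes 50 packets of 160 bytes each; shorter frames reduce delay but, as every packet carries its own headers, increase overhead.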
Examples of stand-alone products that can be used in this way include Nortel’s V/IP and Nokia’s IP Relay. [Incidentally, as we move through this chapter I’ll mention products that illustrate concepts, to help readers tie theory back to practice (and to provide a flavor of some of the vendors involved in different areas). However, there are numerous vendors involved in IP telephony, and no attempt is made here to provide a comprehensive list.]
There is no reason why the VOIP relay functionality needs to be provided in a separate unit; you might instead want to implement the functionality in the PBX or in the router itself. For example, both Lucent’s Definity PBXs and Nortel’s Magellan offer IP trunking, while various Cisco routers (including the 1750, 2600 and 3600) can provide direct PBX interfaces.
Regardless of which approach you take, there are three practical design issues that you’ll need to address. First, you’ll need to make sure that, from a functional perspective, the VOIP relay will relay sufficient signaling information to support the features in use on the PBXs. Second, you’ll need to consider whether standards are important in your situation. While some of the products claim H.323 compliance, many still use proprietary schemes. This may not matter if your network is relatively small and you feel comfortable that the vendor is committed to standards, but give it consideration. Remember also that H.323 comes in three versions. I’ve not come across any products that currently support H.323 v3 (or SIP), but be sure to check whether the H.323 support is v1 or v2. Third, and perhaps most importantly, you’ll need to figure out how you’re going to offer the voice quality that’s required. This last aspect depends on the encoding scheme you choose, the QOS capabilities of the IP network and your VOIP relay’s ability to work with those QOS mechanisms. Encoding and QOS requirements are substantial practical considerations in all IP telephony implementations.
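The encoding choice also determines how much IP bandwidth each call consumes, because every packet carries header overhead as well as voice. The sketch below assumes IPv4 (20-byte header), UDP (8 bytes) and a fixed RTP header (12 bytes), and excludes link-layer framing and any header compression; under those assumptions, G.711 at 64 kbps in 20 ms packets needs 80 kbps per direction. The function name is illustrative.

```python
IPV4_HEADER_BYTES = 20   # without options
UDP_HEADER_BYTES = 8
RTP_HEADER_BYTES = 12    # fixed header, no CSRC list


def ip_bandwidth_kbps(codec_kbps: float, frame_ms: float) -> float:
    """One-way IP-layer bandwidth of a voice stream (link-layer framing excluded)."""
    packets_per_second = 1000.0 / frame_ms
    payload_bytes = codec_kbps * 1000.0 / 8.0 / packets_per_second
    packet_bytes = (payload_bytes + IPV4_HEADER_BYTES
                    + UDP_HEADER_BYTES + RTP_HEADER_BYTES)
    return packet_bytes * 8.0 * packets_per_second / 1000.0


if __name__ == "__main__":
    for name, kbps in (("G.711", 64), ("G.726 at 32 kbps", 32)):
        for frame in (10, 20, 30):
            print(f"{name}, {frame} ms frames: "
                  f"{ip_bandwidth_kbps(kbps, frame):.1f} kbps")
```

The fixed 40 bytes of headers per packet add 16 kbps at 20 ms frames but 32 kbps at 10 ms, so lower-rate codecs and shorter frames both make overhead a larger share of each call.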