VoIP Technology and Glossary
VoIP stands for Voice over IP (Internet Protocol), a variety of methods for establishing two-way multi-media communications over the Internet or other IP-based packet switched networks. Although VoIP systems are capable of some unique functions (for example: video conferencing, instant messaging, and multicasting), this appendix concentrates on the ways in which VoIP can be used to replicate the voice conversation functionality of the public switched telephone network (PSTN).
There are several competing approaches to implementing VoIP. Each makes use of a variety of protocols to handle signaling, data transfer, and other tasks. To help describe the similarities and differences between these approaches, consider the following simplified description of a telephone call under VoIP:
- Caller picks up the phone (his terminal), hears a dial tone and dials a destination number.
- Destination number is mapped to a destination IP address.
- Call setup routines are invoked, handled by signaling protocols. Depending on the VoIP standard in use, this may involve a device (or function) known as a Gateway, and may also involve a Gatekeeper.
- Destination phone generates a ring, the called party picks up the phone, and a two-way conversation is established.
- Data is moved between the two endpoints using a media protocol, the Real-time Transport Protocol (RTP). A codec (coder/decoder) is used to convert the sound of each caller’s voice to digital data, then back to analog audio signals at the other end.
- Conversation ends and the call is torn down. Again, this involves the signaling protocols appropriate to the particular implementation of VoIP, along with any Gateway or Gatekeeper functions.
Note that the instructions governing the call (call setup and call teardown) are handled separately from the encoding, packetization, and transmission of the voice media that make up the actual data content of the call.
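The split between signaling and media in the steps above can be sketched as a small state machine. This is an illustrative model only: the `Call` class, the state names, and the `DIRECTORY` lookup table are hypothetical and do not come from any VoIP standard.

```python
from enum import Enum, auto


class CallState(Enum):
    IDLE = auto()
    SETUP = auto()       # signaling: call setup in progress
    CONNECTED = auto()   # media: voice packets flow between endpoints
    TORN_DOWN = auto()   # signaling: call has been released


# Hypothetical directory mapping dialed numbers to IP addresses.
DIRECTORY = {"5551234": "192.0.2.10"}


class Call:
    def __init__(self, dialed_number):
        self.dialed_number = dialed_number
        self.destination_ip = None
        self.state = CallState.IDLE
        self.media_packets = 0

    def setup(self):
        """Signaling step: map the number to an IP address and start setup."""
        self.destination_ip = DIRECTORY[self.dialed_number]
        self.state = CallState.SETUP

    def answer(self):
        """Signaling step: the called party picks up."""
        if self.state is not CallState.SETUP:
            raise RuntimeError("call is not being set up")
        self.state = CallState.CONNECTED

    def send_media(self, payload):
        """Media step: only counted once signaling has connected the call."""
        if self.state is not CallState.CONNECTED:
            raise RuntimeError("no media path until the call is connected")
        self.media_packets += 1

    def teardown(self):
        """Signaling step: release the call."""
        self.state = CallState.TORN_DOWN
```

In this sketch, calling `send_media` before `answer` raises an error, which mirrors the point that media only flows after the signaling protocols have established the call.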
VoIP Network Hardware
VoIP systems make use of specialized hardware such as terminals (VoIP phones or other endpoints), and may include Gateways, Gatekeepers, or Multipoint Control Units (MCUs).
Terminal: An endpoint device that provides communications services (User Interface).
Gateway: A translation device that provides real-time bi-directional communication between terminals.
Gatekeeper: An H.323 device that performs call control duties for terminals.
MCU: Multipoint Control Unit, used to coordinate between three or more terminals.
Figure J.1 Gateway and Gatekeeper (GK) in H.323 VoIP call
A Gateway acts as the interface between the packet switched network (IP) and the circuit switched network (PSTN), translating formats between the two. It is responsible for call setup and teardown, compression/decompression and packetization of the voice or other media, and conversion between signaling and media types. A Gateway is sometimes a dedicated device but, more commonly, routers with “voice modules” act as gateways. Software in the router handles call setup/teardown, voice encoding, and so forth, with LAN connectivity provided through the regular router ports.
There are several different types of gateways. The Media Gateway (MG) terminates voice calls from the PSTN, packetizes and compresses voice data into data packets, and delivers the data packets to the IP network. The Media Gateway Controller controls registration and manages resources for Gateways. It communicates with the Central Office Switch via Signaling Gateways. A Signaling Gateway provides transparent connections between IP networks and switched networks (including SS7 termination), and may provide additional translation.
A Gatekeeper provides management for groups of H.323 devices known as zones. There is typically only one Gatekeeper per zone, but an installation may have one or more alternates for backup and load balancing. A Gatekeeper provides address translation, admission control, and bandwidth control for its zone. It may also provide call authorization and management services, as well as bandwidth management and directory services.
Gatekeepers are optional (Microsoft NetMeeting, for example, does not use Gatekeepers by default). A Gatekeeper is most often a software application, but it can also be integrated into a Gateway or terminal. If Gatekeepers are not used, Gateways must be configured to talk directly to one another.
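The address translation, admission control, and bandwidth control duties of a Gatekeeper can be illustrated with a toy model. The sketch below is not an H.323 implementation: the `ToyGatekeeper` class, its method names, and the single per-zone bandwidth budget are assumptions made purely for illustration.

```python
class ToyGatekeeper:
    """Toy model of a zone's Gatekeeper; not an H.323 implementation."""

    def __init__(self, zone_bandwidth_kbps):
        self.available_kbps = zone_bandwidth_kbps
        self.aliases = {}  # alias (for example a phone number) -> IP address

    def register(self, alias, ip_address):
        """Record where an endpoint in the zone can be reached."""
        self.aliases[alias] = ip_address

    def admit(self, callee_alias, requested_kbps):
        """Return the callee's IP address if the call is admitted, else None."""
        ip_address = self.aliases.get(callee_alias)   # address translation
        if ip_address is None:
            return None                               # unknown endpoint
        if requested_kbps > self.available_kbps:
            return None                               # bandwidth control
        self.available_kbps -= requested_kbps
        return ip_address

    def release(self, kbps):
        """Return bandwidth to the zone when a call ends."""
        self.available_kbps += kbps
```

For example, a zone created with 128 kbit/s admits one 80 kbit/s call and refuses a second until the first is released.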
A Multipoint Control Unit (MCU) is an endpoint that typically supports conferences between three or more stations. It can be a stand-alone device, or integrated into a Gateway, Gatekeeper or terminal. The MCU consists of two functional entities: the Multipoint Controller (MC) and the Multipoint Processor (MP). The MC handles control and signaling for conference support. The MP receives and processes streams from endpoints, and returns them to the endpoints in the conference.
VoIP Protocols
Like every other aspect of Internet communications, VoIP has evolved rapidly since its introduction in 1995, and continues to evolve today. The standards show the influence of their creators: the traditional telecommunications players, the Internet community, and the communications equipment manufacturers such as Cisco and 3Com.
In rough chronological order of introduction, the most widely used VoIP systems are:
H.323: Developed by the International Telecommunication Union (ITU-T)
MGCP: Grew out of SGCP and was published through the IETF as an alternative to H.323. Megaco (H.248) is a related but separate protocol developed jointly by the IETF and the ITU-T.
SIP: Developed within the IETF as an alternative to H.323
SKINNY: A Cisco proprietary system allowing skinny clients to communicate with H.323 systems, by off-loading some functions to a Call Manager.
Each of these approaches involves the use of multiple protocols. In the sections below, we split these software tools into three groups: Signaling protocols, Media protocols, and Codecs. The media protocols (RTP and RTCP) are common to all types of VoIP, and the codecs are also widely used. The principal distinction between one VoIP setup and another is their use of signaling protocols and related devices or functions, such as Gateways and Gatekeepers.
Signaling protocols
In VoIP communication, the signaling that controls the conversation is distinct from the actual stream of data carrying the voice content of the conversation. The principal families of VoIP signaling protocols are described briefly below.
Note: The data streams of VoIP are carried in connectionless UDP packets. Many setups use UDP for signaling also, but some require the connection-oriented TCP instead, and a few permit either TCP or UDP for signaling.
H.323 protocols suite
H.323 is an ITU-T standard that provides multimedia video conferencing, voice, and data capability for use over packet-switched networks. It is the most widely deployed VoIP protocol in enterprise and carrier markets.
- H.225.0 defines the call signaling between endpoints and the Gatekeeper
- H.225.0 Annex G and H.501 define the procedures and protocol for communication within and between Peer Elements
- H.245 is the protocol used to control establishment and closure of media channels within the context of a call and to perform conference control
- H.460.x is a series of version-independent extensions to the base H.323 protocol
- T.120 specifies how to do data conferencing
- T.38 defines how to relay fax signals
- V.150.1 defines how to relay modem signals
- H.235 defines security within H.323 systems
- X.680 defines the ASN.1 syntax used by the Recommendations
- X.691 defines the Packed Encoding Rules (PER) used to encode messages for transmission on the network
MGCP
Media Gateway Control Protocol is used for controlling telephony gateways from external call control elements called media gateway controllers or call agents. A telephony gateway is a network element that provides conversion between the audio signals carried on telephone circuits and data packets carried over the Internet or over other packet networks.
MEGACO (H.248)
Media Gateway Control protocol (H.248) is used between elements of a physically decomposed multimedia gateway. This protocol creates a general framework suitable for gateways, multipoint control units (MCUs) and interactive voice response units (IVRs).
SGCP
Simple Gateway Control Protocol (SGCP) is used to control telephony gateways from external call control elements.
SIP
Session Initiation Protocol (SIP) is used to initiate VoIP connections. SIP provides the necessary protocol mechanisms so that the end user systems and proxy servers can provide different services such as call forwarding, called and calling number identification, and caller and called authentication. See IETF RFC 2543 (since obsoleted by RFC 3261).
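To give a feel for how SIP initiates a connection, the sketch below assembles the text of a minimal INVITE request, following the message layout of RFC 3261 (the successor to RFC 2543). The function name `build_invite`, its parameters, and the example addresses are illustrative assumptions, and the request omits the session description body that a real call would normally carry.

```python
def build_invite(caller, callee, caller_host, call_id, branch, tag):
    """Return a bodiless SIP INVITE request as text with CRLF line endings."""
    caller_user = caller.split("@")[0]
    lines = [
        f"INVITE sip:{callee} SIP/2.0",                      # request line
        f"Via: SIP/2.0/UDP {caller_host};branch={branch}",   # return path
        "Max-Forwards: 70",
        f"To: <sip:{callee}>",
        f"From: <sip:{caller}>;tag={tag}",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{caller_user}@{caller_host}>",
        "Content-Length: 0",                                 # no body here
    ]
    # Headers end with an empty line.
    return "\r\n".join(lines) + "\r\n\r\n"


# Example values are placeholders, not addresses from this document.
message = build_invite(
    caller="alice@example.com",
    callee="bob@example.net",
    caller_host="pc33.example.com",
    call_id="a84b4c76e66710",
    branch="z9hG4bK776asdhds",
    tag="1928301774",
)
```

The request is plain text, which is one reason SIP traffic is comparatively easy to read in a packet capture.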
SKINNY (SCCP)
As a generic computing term, “skinny” refers to a device with fewer features or functions than the common or “fat” version of the same device. In VoIP, SKINNY is a proprietary Cisco system intended to allow skinny clients to communicate with H.323 VoIP systems, by placing most of the required H.323 processing capabilities in an intervening device called a Call Manager. The skinny client and the Call Manager use a simple messaging set called Skinny Client Control Protocol (SCCP) to communicate with each other over TCP/IP. SKINNY systems use a proxy for the H.225 and H.245 signaling, and use RTP/UDP/IP for audio.
Media protocols
RTP and RTCP (RFC 3550) are used to transmit media such as audio and video over IP networks. RTP and RTCP are carried in UDP packets.
RTP
The Real-time Transport Protocol (RTP) provides end-to-end network transport functions suitable for applications transmitting real-time data such as audio, video or simulation data, over multicast or unicast network services. RTP does not address resource reservation and does not guarantee quality-of-service for real-time services. The data transport is augmented by a control protocol (RTCP) to allow monitoring of the data delivery in a manner scalable to large multicast networks, and to provide minimal control and identification functionality. RTP and RTCP are designed to be independent of the underlying transport and network layers. The protocol supports the use of RTP-level translators and mixers.
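As a concrete illustration, the sketch below packs and unpacks the 12-byte fixed RTP header defined in RFC 3550 (version, marker, payload type, sequence number, timestamp, and SSRC). The function names are hypothetical, and the code assumes no padding, no header extension, and no CSRC entries.

```python
import struct

RTP_VERSION = 2
HEADER = struct.Struct("!BBHII")  # 12-byte fixed header, network byte order


def pack_rtp(payload_type, sequence, timestamp, ssrc, payload, marker=False):
    """Build an RTP packet with no padding, extension, or CSRC entries."""
    byte0 = RTP_VERSION << 6                        # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return HEADER.pack(byte0, byte1, sequence & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF) + payload


def unpack_rtp(packet):
    """Split a packet into header fields and payload (CSRC list not handled)."""
    byte0, byte1, sequence, timestamp, ssrc = HEADER.unpack_from(packet)
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": packet[HEADER.size:],
    }
```

The sequence number lets a receiver detect loss and reordering, and the timestamp lets it reconstruct playout timing; RTP itself does nothing to prevent either problem, consistent with the lack of any quality-of-service guarantee.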
RTCP
The RTP Control Protocol (RTCP) is based on the periodic transmission of control packets to all participants in the session, using the same distribution mechanism as the data packets. The underlying protocol must provide multiplexing of the data and control packets, for example using separate port numbers with UDP.
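One common way to separate the two streams over UDP is the port-pair convention recommended in RFC 3550: RTP uses an even port and RTCP uses the next higher odd port. The helper below is a minimal sketch of that convention; the function name is hypothetical.

```python
def rtp_rtcp_ports(base_port):
    """Return (rtp_port, rtcp_port) following the even/odd convention."""
    rtp_port = base_port & ~1   # an odd number is rounded down to the even port
    return rtp_port, rtp_port + 1
```

For example, a base port of 5004 or 5005 both yield RTP on 5004 and RTCP on 5005.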
Codecs
A codec (coder/decoder) handles the conversion of analog signals to digital form, and back again. VoIP systems may use any of a wide variety of codecs for voice, video, or both. In VoIP, the codec used is often referred to as the encoding method or the payload type for the RTP packet. Codec designers seek to optimize among three primary factors: the speed of the encoding/decoding operations (packetization delay), the quality and fidelity of sound and/or video signal, and the size of the resulting encoded data stream. In Table J.1, note that the Data Rate column refers to the compressed (encoded) data, while the Bandwidth column describes the uncompressed audio data equivalent delivered by the codec.
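The trade-off between encoded stream size and packetization can be made concrete with some arithmetic. The sketch below estimates the bit rate a voice stream occupies at the IP layer, assuming IPv4, UDP, and RTP headers without options (20 + 8 + 12 bytes) and a caller-chosen packet interval. The function names and the 20 ms interval used in the example are assumptions, and this on-the-wire figure is a different quantity from either column of the codec table.

```python
IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP, no options


def payload_bytes(codec_rate_bps, packet_ms):
    """Encoded voice bytes carried in one packet."""
    return codec_rate_bps * packet_ms // 8000


def ip_bandwidth_bps(codec_rate_bps, packet_ms):
    """Bit rate at the IP layer, excluding link-layer framing."""
    packet_bytes = payload_bytes(codec_rate_bps, packet_ms) + IP_UDP_RTP_HEADER_BYTES
    return packet_bytes * 8 * 1000 // packet_ms


# Nominal codec rates: G.711 at 64 kbit/s, G.729 at 8 kbit/s.
g711 = ip_bandwidth_bps(64000, 20)   # 160-byte payload per packet
g729 = ip_bandwidth_bps(8000, 20)    # 20-byte payload per packet
```

With 20 ms packets, the header overhead is the same 40 bytes for both codecs, so it adds 25% to a G.711 stream but triples the size of a G.729 stream. Longer packet intervals reduce that overhead at the cost of added packetization delay.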
OmniPeek can correctly identify and perform analysis based on a wide range of VoIP codecs. It can also play back and perform passive MOS (Mean Opinion Score) analysis on the most commonly used voice codecs, as shown in Table J.3.