H.323 versus SIP: A Comparison
To counter this misinformation, we decided to put together this thorough, up-to-date comparison. As with ours, please consider the financial interests of the source of any information on this subject, be it an author, speaker, institution, forum, company, web site, or conference. Are the people providing information on this issue involved with both of these protocols (and others), with nothing more than perhaps an honest academic interest in either one, or have they "hitched their wagon" to one of them?
Like everything else on the web, this is a living document which we will be updating as the standards evolve. In fact, there is much work in progress for both H.323 and SIP, but, in order to compare apples to apples and make this comparison meaningful, we have chosen to focus on what is currently defined rather than on what might be defined in the future. Also, note that commentary that is not vital to the main comparison text appears in a smaller font immediately below it.
Feature | H.323 | SIP |
---|---|---|
Philosophy |
H.323 was designed with a good understanding of the requirements
for multimedia communication over IP networks, including audio,
video, and data conferencing. It defines an entire, unified system
for performing these functions, leveraging the strengths of the
IETF and
ITU-T protocols.
As a result, it might be reasonable for users to expect about the same level of robustness and interoperability as is found on the PSTN today, although this admittedly varies across the globe. H.323 was also designed to scale and to accommodate new functionality. The most widely deployed use of H.323 is "Voice over IP", followed by "Videoconferencing", both of which are described in the H.323 specifications. |
SIP was designed to set up a "session" between two points
and to be a modular, flexible component of the Internet
architecture. It has a loose concept of a call (that being a
"session" with media streams), has no support for
multimedia conferencing, and the integration of sometimes
disparate standards is largely left up to each vendor.
As a result, SIP, now a 14-year-old protocol, still has a vast number of interoperability problems. While SIP has been successfully deployed in some environments, those are generally "closed" environments where interoperability with other networks has been achieved through PSTN gateways. |
Complexity |
H.323 is limited to multimedia conferencing, so the complexity of
the system is constrained accordingly. No communication system is
simple, but H.323 attempts to clearly define the basic set of
functionality that all devices must support.
|
SIP was initially focused on voice communication and then expanded
to include video, application sharing, instant messaging, presence, etc.
With each capability, complexity increases and,
unfortunately, there are no strict guidelines as to what functionality
any given device must support. This leads to more complex systems
with more interoperability problems. Since SIP was "marketed" as a
simple protocol, even though it only looks simple on
the surface, we suggest you refer to the
SIP Myths page.
|
Reliability |
H.323 has defined a number of features to handle failure of
intermediate network entities, including "alternate
gatekeepers", "alternate endpoints", and a means of
recovering from connection failures.
|
SIP has not defined procedures for handling device failure. If a
proxy fails, the user agent detects this through timer expiration.
It is the responsibility of the user agent to resend the INVITE to
another proxy, leading to long delays in call establishment.
|
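To put a number on the "long delays" in the SIP column: RFC 3261 retransmits an unanswered INVITE over UDP on Timer A, which starts at T1 (500 ms by default) and doubles after each firing, and abandons the transaction when Timer B (64*T1) fires. The sketch below assumes those default values and a failed proxy that sends no response at all. It only illustrates the timer arithmetic and is not a transaction-layer implementation.

```python
T1 = 0.5  # seconds; RFC 3261 default estimate of the round-trip time

def invite_schedule(t1: float = T1) -> tuple[list[float], float]:
    """Send times of an unanswered INVITE over UDP, plus the Timer B deadline."""
    timer_b = 64 * t1                  # the transaction gives up at 64*T1
    sends = [0.0]                      # initial transmission
    interval = t1                      # Timer A starts at T1 ...
    next_send = interval
    while next_send < timer_b:
        sends.append(next_send)        # Timer A fired: retransmit
        interval *= 2                  # ... and doubles after each firing
        next_send += interval
    return sends, timer_b

sends, timeout = invite_schedule()
print(sends)    # [0.0, 0.5, 1.5, 3.5, 7.5, 15.5, 31.5]
print(timeout)  # 32.0
```

Under these assumptions the INVITE is sent seven times, and the user agent only learns of the failure after 32 seconds. It can try another proxy only after that point.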
Message Definition |
ASN.1,
a standardized, extremely precise, easy-to-understand
structural notation that is used by many other systems.
|
ABNF, or Augmented Backus-Naur Form, a syntactical
notation. SIP uses the ABNF as defined in RFC 2234.
|
Message Encoding |
H.323 encodes messages in a compact binary format that is suitable
for narrowband and broadband connections. Messages are
efficiently encoded and decoded by machines, with decoders widely
available (e.g., Ethereal, now known as Wireshark).
|
SIP messages are encoded in ASCII text format, suitable for
humans to read. As a consequence, the messages are large and less
suitable for networks where bandwidth, delay, and/or processing
are a concern. SIP messages can become so large that they sometimes exceed the MTU when going over WAN links, resulting in delays, packet loss, etc. As a result, effort has been made to compress SIP messages (e.g., RFC 3485 and RFC 3486, which apply SigComp to SIP). |
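RFC 3261 (Section 18.1.1) acknowledges the size problem directly. A request within 200 bytes of the path MTU, or larger than 1300 bytes when the path MTU is unknown, must be sent over a congestion-controlled transport such as TCP. The sketch below applies that rule. The 1500-byte MTU (typical Ethernet), the sample INVITE (modeled on the RFC 3261 examples, with example.com hosts and made-up tags), and the filler standing in for an SDP body are all assumptions.

```python
from typing import Optional

def needs_congestion_controlled_transport(request: bytes, path_mtu: Optional[int]) -> bool:
    """RFC 3261 section 18.1.1 size rule for choosing TCP (or similar) over UDP."""
    if path_mtu is None:
        return len(request) > 1300            # path MTU unknown
    return len(request) >= path_mtu - 200     # "within 200 bytes of the path MTU"

# Hypothetical INVITE, headers only (no SDP body).
invite = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP pc33.example.org;branch=z9hG4bK776asdhds\r\n"
    "Max-Forwards: 70\r\n"
    "To: Bob <sip:bob@example.com>\r\n"
    "From: Alice <sip:alice@example.org>;tag=1928301774\r\n"
    "Call-ID: a84b4c76e66710@pc33.example.org\r\n"
    "CSeq: 314159 INVITE\r\n"
    "Contact: <sip:alice@pc33.example.org>\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
).encode("ascii")

print(needs_congestion_controlled_transport(invite, 1500))  # False

# Stand-in for an SDP body plus Via/Record-Route headers added by proxies.
# Content-Length is not updated because only the size matters here.
grown = invite + b"x" * 1100
print(needs_congestion_controlled_transport(grown, 1500))   # True
```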
Media Transport |
RTP/RTCP, SRTP
|
RTP/RTCP, SRTP
|
Extensibility - Vendor Specific |
H.323 is extended with non-standard features in such a way as to
avoid conflicts between vendors. Globally unique identifiers
prevent feature and data element collision.
|
SIP is extended by adding new header lines or message bodies that
may be used by different vendors to serve different purposes, thus
risking interoperability problems.
The risk is admittedly small, but this problem has already been seen in the real world with similar extension schemes. |
Extensibility - Standard |
H.323 is extended by the standards community to add new features
to H.323 in such a way as to not impact existing features.
However, new revisions of H.323 are published periodically, which
introduce new functionality that is mandatory, yet done in such a
way as to preserve backward compatibility.
|
SIP is extended by the standards community to add new features to
SIP in such a way as to not impact existing features. However, new
revisions of SIP are potentially not backward compatible (e.g.,
RFC 3261 was not entirely compatible with RFC 2543). In
addition, several extensions are "mandatory" in some
implementations, which causes interoperability problems.
|
Scalability - Load Balancing |
H.323 has the ability to load balance endpoints across a number of
alternate gatekeepers in order to scale a local point of presence.
In addition, endpoints report their available and total capacity
so that calls going to a set of gateways, for example, may be best
distributed across those gateways.
|
SIP has no notion of load balancing, except "trial and
error" across pre-provisioned devices or devices learned from
DNS SRV records. There is no means of detecting the load on a
particular gateway or of knowing whether a device has failed, meaning
that proxies simply have to try a PSTN gateway, wait for the call
to time out, and then try another.
|
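The "trial and error" across DNS SRV records can be sketched with the target-selection rule from RFC 2782. Targets are sorted by priority, a weighted random draw is made within each priority level, and targets are then tried in that order until one answers. The record values, host names, and the `try_gateway` callback are hypothetical. A real proxy would obtain the records from a DNS resolver and would treat a transaction timeout as the failure signal.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class SrvRecord:
    priority: int
    weight: int
    target: str
    port: int

def order_targets(records: list[SrvRecord], rng: random.Random) -> list[SrvRecord]:
    """RFC 2782 ordering: lowest priority first; within a priority, weighted random."""
    ordered: list[SrvRecord] = []
    for prio in sorted({r.priority for r in records}):
        # Zero-weight records are placed first, as RFC 2782 requires before each draw.
        remaining = sorted((r for r in records if r.priority == prio),
                           key=lambda r: r.weight)
        while remaining:
            pick = rng.randint(0, sum(r.weight for r in remaining))  # inclusive range
            running = 0
            for i, r in enumerate(remaining):
                running += r.weight
                if running >= pick:           # first running sum >= the random number
                    ordered.append(remaining.pop(i))
                    break
    return ordered

def route_call(records: list[SrvRecord],
               try_gateway: Callable[[str, int], bool],
               rng: random.Random) -> Optional[str]:
    """Trial and error: with no load or liveness data, move on only after a failure."""
    for r in order_targets(records, rng):
        if try_gateway(r.target, r.port):     # in practice, failure means waiting for a timeout
            return r.target
    return None

records = [
    SrvRecord(10, 60, "gw1.example.com", 5060),
    SrvRecord(10, 40, "gw2.example.com", 5060),
    SrvRecord(20, 0, "gw3.example.com", 5060),
]
failed = {"gw1.example.com"}  # hypothetical: gw1 is down
print(route_call(records, lambda host, port: host not in failed, random.Random(1)))
# gw2.example.com, whether gw1 or gw2 happens to be drawn first
```

Nothing in this loop knows how busy a gateway is. The only information the proxy gets is whether an attempt succeeded.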
Scalability - Call Signaling |
When an H.323 gatekeeper is used, it may simply provide address
resolution through one RAS message exchange, or it may route all
call signaling traffic. In large networks, the direct call model
may be used so that endpoints connect directly to one another.
|
When using a SIP proxy to perform address resolution for the SIP
device, the proxy is required to handle at least 3 full message
exchanges for every call. In large networks, such as
IMS networks, the number of messages on the wire may be
excessive. A basic call between two users may require as many as
30 messages on the wire!
|
Scalability - Statelessness |
An H.323 gatekeeper can be stateless using the direct call model.
|
A SIP proxy can be stateless if it does not fork, use TCP, or use
multicast.
|
Scalability - Address Resolution |
H.323 defines an interface between the endpoint and gatekeeper for
address resolution using ARQ or LRQ. The H.323 gatekeeper may use
any number of protocols to discover the destination address of the
callee, including LRQs to other gatekeepers, Annex G/H.225.0,
TRIP, ENUM, and/or DNS. The endpoint does not have to be
concerned with the mechanics of this process, and the processing
requirements for address resolution placed on the gatekeeper by
H.323 are for just a single message exchange.
Although out of scope of H.323, an H.323 endpoint may perform its own address resolution using ENUM and/or DNS and then place a direct call to the resolved address or provide the resolved address to the gatekeeper as an "alias". |
While SIP has no address-resolution protocol, per se, a SIP user
agent may route its INVITE message through a proxy or redirect
server in order to resolve addresses. The SIP proxy may use
various protocols to discover the destination address of the
callee, including TRIP, ENUM, and/or DNS. The endpoint does
not have to be concerned with the mechanics of this process.
Unfortunately, the processing requirements placed on the SIP proxy
are higher than with H.323 because at least 3 message exchanges
must take place between the SIP device, SIP proxy, and the next
hop.
Although out of scope of SIP, a SIP user agent may perform its own address resolution using ENUM and/or DNS and then place a direct call to the resolved address or through a proxy. |
Addressing |
Flexible addressing mechanisms, including URIs, e-mail addresses,
and E.164 numbers.
H.323 supports these aliases:
|
SIP only understands URI-style addresses. This works fine for
SIP-SIP devices, but causes some confusion when trying to
translate various dialed digits. The unofficial
convention is that a "+" sign is inserted in the SIP URI
(e.g., "sip:+18005551212@example.com") in order to
indicate that the number is in E.164 format, versus a user ID that
might be numeric. SIP has support for overlap signaling defined in RFC 3578, though each additional digit received requires transmission of three messages on the wire (a new INVITE, a 484 response to indicate that the address is incomplete, and an ACK). |
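The "+" convention and the overlap-dialing cost in the SIP column come down to simple string handling and arithmetic. This minimal sketch assumes the dialed string already contains a complete international number, including its country code. The helper names are hypothetical, and `example.com` is the domain used in the text.

```python
import re

def e164_sip_uri(dialed: str, domain: str) -> str:
    """SIP URI whose user part is an E.164 number marked with a leading '+'."""
    digits = re.sub(r"\D", "", dialed)  # drop spaces, dashes, parentheses, any '+'
    if not digits:
        raise ValueError("no digits in dialed string")
    return f"sip:+{digits}@{domain}"

def overlap_messages(additional_digits: int) -> int:
    """Messages on the wire for digits arriving after the first INVITE:
    each one costs a new INVITE, a 484 response, and an ACK."""
    return 3 * additional_digits

print(e164_sip_uri("1 (800) 555-1212", "example.com"))  # sip:+18005551212@example.com
print(overlap_messages(7))                              # 21
```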
Billing |
Even with H.323's direct call model, the ability to successfully
bill for the call is not lost because the endpoint reports to the
gatekeeper the beginning and end time of the call via the RAS
protocol. Various pieces of billing information may be
present in the ARQ and DRQ messages at the start and end of the
call.
|
If the SIP proxy wants to collect billing information, it has no
choice but to stay in the call signaling path for the entire
duration of the call so that it can detect when the call
completes. Even then, the statistics may be skewed by delays in the
call signaling. Otherwise, there is no mechanism in SIP to perform
any accounting/billing function.
|
Call Setup |
A call can be established in as few as 1.5 round trips using UDP:
Setup ->
<- Connect
Ack ->
Of course, more elaborate call establishment procedures may be
required to negotiate complex capabilities, negotiate complex
video modes, etc.
|
A call can be established in as few as 1.5 round trips using UDP:
INVITE ->
<- 200 OK
ACK ->
Most real-world flows are more complex, as they often pass through
one or more proxy devices, have intermediary response messages,
and "negotiate" capabilities through a "trial and
error" process that is far from scientific.
|
Capability Negotiation |
H.323 entities may exchange capabilities and negotiate which
channels to open, including audio, video, and data channels.
Individual channels may be opened and closed during the call
without disrupting the other channels.
|
SIP entities have limited means of exchanging capabilities.
RFC 3407 is the state of the art, which is more or less a
"declaration" mechanism, not a negotiation procedure.
The end result is still a "trial and error" approach in
case the called party does not support the proposed media.
|
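The "trial and error" outcome can be sketched with the SDP offer/answer model of RFC 3264, which SIP uses alongside the RFC 3407 declarations, in simplified form. The answerer keeps only the offered formats it supports. A stream with nothing in common is rejected (port 0 in the answer), which leaves the offerer to guess again. The format names and the two-offer retry loop are illustrative assumptions.

```python
def answer_formats(offered: list[str], supported: set[str]) -> list[str]:
    """Simplified RFC 3264 answer for one media stream: the offered formats the
    answerer supports, in the offerer's order. Empty means the stream is rejected."""
    return [fmt for fmt in offered if fmt in supported]

# Hypothetical video negotiation: the caller guesses, is rejected, then tries again.
callee_supports = {"H264/90000", "PCMU/8000"}
attempts = [["H261/90000"], ["H264/90000", "H263-1998/90000"]]
for n, offer in enumerate(attempts, start=1):
    accepted = answer_formats(offer, callee_supports)
    print(n, accepted if accepted else "rejected (port 0)")
    if accepted:
        break
# 1 rejected (port 0)
# 2 ['H264/90000']
```

By contrast, H.245 lets H.323 entities exchange their capability sets before any channel is opened, so a common mode can be chosen without a rejected first attempt.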
Call Forking |
An H.323 gatekeeper can control the call signaling and may fork the
call to any number of devices simultaneously.
|
SIP proxies can control the call signaling and may fork the call
to any number of devices simultaneously.
|
PSTN Interworking |
H.323 borrows from traditional PSTN protocols, e.g., Q.931, and is
therefore well suited for PSTN integration. However, H.323 does
not employ the PSTN's circuit-switched technology--like
SIP, H.323 is completely packet-switched. How Media Gateway
Controllers fit into the overall H.323 architecture is
well-defined within the standard.
|
SIP has no commonality with the PSTN and such signaling must be
"shoe-horned" into SIP. SIP has no architecture that describes
the decomposition of the gateway into the Media Gateway Controller
and the Media Gateways. This has recently been studied by 3GPP and
others in the form of IMS. Presently, there are about 4
"IMS" variants: 3GPP, ITU NGN, 3GPP2, and PacketCable.
Pick the architecture you like best, I suppose.
|
Services |
Services may be provided to the endpoint through a web-browser
interface using HTTP or a feature server using Megaco/H.248. In
addition, services may be provided to an endpoint as it places a
call, as a call arrives, or during the middle of a call by a
gatekeeper or other entity that routes the call signaling. As a
result, H.323 is well-suited to providing new services.
|
SIP devices can receive service from a SIP proxy as the endpoint
places a call, as a call arrives, or during the middle of a call.
There is no defined way within SIP of providing services via a web
browser or a feature server, as everything is done within the
context of a "session".
One may provide ad-hoc services through other means, such as XML, SOAP, or CPL. However, there are no standards for this. |
Video and Data Conferencing |
H.323 fully supports video and data conferencing. Procedures are
in place to provide control for the conference as well as lip
synchronization of audio and video streams.
|
SIP has limited support for video and no support for data
conferencing protocols like T.120. SIP has no protocol to
control the conference and there is no mechanism within SIP
for lip synchronization. There is no standard means of recovering
from packet loss in a video stream (to parallel H.323's
"video fast update" command).
|
Administrative Requirements |
H.323 does not require a gatekeeper. A call can be made directly
between two endpoints.
However, most devices do utilize a gatekeeper for the purpose of registration and address resolution. |
SIP does not require a proxy. A call can be made directly between
two user agents. However, most devices do utilize a SIP proxy for the purpose of registration, address resolution, and call routing. |
Codecs |
H.323 supports any codec, standardized or proprietary. No
registration authority is required to use any codec in H.323.
|
SIP supports any IANA-registered codec (as a legacy feature) or
other codec whose name is mutually agreed upon.
|
Firewall/NAT support | Provided by H.323 "proxy" or by the endpoint, both in conjunction with a gatekeeper residing in the public network. H.323 also supports direct point-to-point media flows between devices that are located behind a NAT/FW. Refer to H.460.17, H.460.18, H.460.19, H.460.23, and H.460.24. |
SIP does not define a NAT/FW traversal mechanism, as
this is left to other standards. Some standards that have been
defined or are being defined are STUN,
TURN, ANAT, and ICE.
ANAT is popular as a means of addressing IPv4/IPv6 interworking
and appears to be widely implemented. As of January 2011,
ICE was still not widely adopted.
|
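STUN, the first of the standards listed in the SIP column, starts with a Binding request. The server reports the client's public address back in an XOR-MAPPED-ADDRESS attribute. The sketch below builds the 20-byte Binding request header defined in RFC 5389 (type, length, magic cookie, 96-bit transaction ID) with no attributes, and decodes an IPv4 XOR-MAPPED-ADDRESS value. Sending the request over UDP (conventionally to port 3478) and parsing the rest of the response are omitted. The helper names are hypothetical.

```python
import os
import socket
import struct
from typing import Optional

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001

def binding_request(transaction_id: Optional[bytes] = None) -> bytes:
    """STUN Binding request header with no attributes (RFC 5389)."""
    tid = transaction_id if transaction_id is not None else os.urandom(12)
    if len(tid) != 12:
        raise ValueError("transaction ID must be 96 bits")
    # type (16 bits), message length excluding header (16), magic cookie (32), transaction ID (96)
    return struct.pack("!HHI12s", BINDING_REQUEST, 0, MAGIC_COOKIE, tid)

def decode_xor_mapped_ipv4(value: bytes) -> tuple[str, int]:
    """Decode an IPv4 XOR-MAPPED-ADDRESS attribute value into (address, port)."""
    _reserved, family, xport, xaddr = struct.unpack("!BBHI", value[:8])
    if family != 0x01:
        raise ValueError("not an IPv4 address")
    port = xport ^ (MAGIC_COOKIE >> 16)   # port is XORed with the cookie's top 16 bits
    addr = xaddr ^ MAGIC_COOKIE           # IPv4 address is XORed with the whole cookie
    return socket.inet_ntoa(struct.pack("!I", addr)), port

print(binding_request().hex())  # 000100002112a442 followed by a random transaction ID
```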
Transport protocol |
Reliable or unreliable, e.g., TCP or UDP. Most H.323 entities use
a reliable transport for signaling.
|
Reliable or unreliable, e.g., TCP or UDP. Most SIP entities use an
unreliable transport for signaling.
|
Loop Detection |
Routing gatekeepers can detect loops by looking at the
CallIdentifier and destinationAddress fields in call-processing
messages. If the combination of these matches an existing call, it
is a loop. Infinite loops may be prevented by utilizing the
hopCount field in the SETUP message.
|
The Via header facilitates this. However, there has been talk
about deprecating Via as a means of loop detection due to its
complexity. Instead, the Max-Forwards header seems to be the
preferred method of limiting hops and therefore loops. In November
2005, a
presentation
was given on issues with Max-Forwards. So, what is the right solution?
|
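Both loop-prevention approaches in this row fit in a few lines. For SIP, RFC 3261 has a proxy refuse to forward a request whose Max-Forwards is zero (answering 483 Too Many Hops), decrement it otherwise, and add the header with a value of 70 when it is missing. For H.323, the sketch follows the text: a routing gatekeeper flags a SETUP whose CallIdentifier and destinationAddress pair matches a call already in progress. Plain strings stand in for the real identifiers, the OPTIONS special case and the H.323 hopCount field are not modeled, and all names are hypothetical.

```python
from typing import Optional

DEFAULT_MAX_FORWARDS = 70  # value RFC 3261 recommends when a proxy must add the header

def forwarded_max_forwards(received: Optional[int]) -> Optional[int]:
    """Max-Forwards for the forwarded copy of a SIP request, or None if the
    proxy must not forward it (and should answer 483 Too Many Hops)."""
    if received is None:      # header absent: add one
        return DEFAULT_MAX_FORWARDS
    if received == 0:         # hop budget exhausted
        return None
    return received - 1

class GatekeeperLoopCheck:
    """Routing-gatekeeper check: a SETUP whose (CallIdentifier,
    destinationAddress) pair matches an active call is a loop."""

    def __init__(self) -> None:
        self._active: set[tuple[str, str]] = set()

    def is_loop(self, call_identifier: str, destination_address: str) -> bool:
        key = (call_identifier, destination_address)
        if key in self._active:
            return True
        self._active.add(key)
        return False

    def call_ended(self, call_identifier: str, destination_address: str) -> None:
        self._active.discard((call_identifier, destination_address))

print(forwarded_max_forwards(70))              # 69
print(forwarded_max_forwards(0))               # None, so reply 483
gk = GatekeeperLoopCheck()
print(gk.is_loop("call-1", "gw.example.com"))  # False
print(gk.is_loop("call-1", "gw.example.com"))  # True
```

Max-Forwards only bounds how long a loop can run. The gatekeeper check recognizes the loop itself on its second pass.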
Multicast Signaling |
Yes, location requests (LRQ) and auto gatekeeper discovery (GRQ).
|
Yes, e.g., through group INVITEs.
|
Third-party Call Control |
Yes, through third-party pause and re-routing which is defined
within H.323. More sophisticated control is defined by the
related H.450.x series of standards.
|
Yes, through SIP as described in RFC 3725.
|
Minimum Ports for VoIP Call | 3 (Call signaling, RTP, and RTCP.) | 3 (SIP, RTP, and RTCP.) |
Conferencing Entity |
Yes, an MC is required for this, but it could be co-located in a
participating endpoint, or all endpoints could contain an MC. A
stand-alone conference bridge may provide this functionality and
H.323 has well-defined procedures for such entities.
What distinguishes H.323 is not that it requires yet another onerous physical entity for conferencing (it does not) but that it just has a name for this functionality, an "MC," and that it provides a flexible means of implementing that functionality. |
No; however, SIP user agents may perform conferencing
themselves. A stand-alone conference bridge may also provide
this functionality.
|
Original Title |
"VISUAL TELEPHONE SYSTEMS AND EQUIPMENT FOR LOCAL AREA NETWORKS WHICH PROVIDE A NON-GUARANTEED QUALITY OF SERVICE"
It is now, "Packet-based multimedia communications systems." Despite the word, "VISUAL," in the original title, H.323 has never described just a videoconferencing solution--support for video and data has always been optional. And the reference to LANs may be misleading because H.323 was intended from the start to support simple and "complex topologies" and not just single-segment networks, which "LOCAL AREA NETWORKS" may imply. |
"Application-level protocol for inviting users to
multimedia conferences [emphasis ours]" It is now, "SIP: Session Initiation Protocol." Note that the "multimedia conferences" referred to in the original title are loosely coupled multicast conferences, à la MBone. This is because SIP was intended to be just a point-to-point version of SAP and not the "carrier-class solution addressing a wide area" that many would have you believe. |
Lineage |
H.323 is based on H.324, not H.320.
However, H.324 was designed to be a better H.320.
|
SIP is frequently allied with the Internet and the World Wide Web
by way of HTTP.
|
Open-source projects |
Yes, e.g., H.323 Plus.
|
Yes, e.g., Opal.
|
Media Topology | Unicast, multicast, star, and centralized. | Unicast, multicast, star, and centralized. |
Authentication |
Yes, via H.235.
|
Yes, via HTTP (Digest and Basic), SSL, PGP, S/MIME, or various
other means.
|
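SIP's HTTP Digest option reuses the RFC 2617 computation. In SIP, the method is the SIP method (e.g., REGISTER or INVITE) and the URI is the Request-URI. The sketch below computes the "response" value with the default MD5 algorithm, both without qop and with qop=auth, and checks itself against the HTTP example published in RFC 2617. The SIP credentials and nonce in the usage lines are hypothetical.

```python
import hashlib
from typing import Optional

def _md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def digest_response(username: str, realm: str, password: str, method: str, uri: str,
                    nonce: str, qop: Optional[str] = None,
                    nc: Optional[str] = None, cnonce: Optional[str] = None) -> str:
    """RFC 2617 Digest "response" value (MD5)."""
    ha1 = _md5_hex(f"{username}:{realm}:{password}")
    ha2 = _md5_hex(f"{method}:{uri}")          # in SIP: the SIP method and Request-URI
    if qop is None:                              # form without qop
        return _md5_hex(f"{ha1}:{nonce}:{ha2}")
    if nc is None or cnonce is None:
        raise ValueError("qop requires nc and cnonce")
    return _md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")

# RFC 2617 HTTP example values:
print(digest_response("Mufasa", "testrealm@host.com", "Circle Of Life", "GET",
                      "/dir/index.html", "dcd98b7102dd2f0e8b11d0f600bfb0c093",
                      qop="auth", nc="00000001", cnonce="0a4f113b"))
# 6629fae49393a05397450978507c4ef1

# Hypothetical SIP REGISTER challenge:
print(digest_response("alice", "example.com", "secret", "REGISTER",
                      "sip:example.com", "hypothetical-nonce"))
```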
Encryption |
Yes, via H.235 (including use of SRTP, TLS, IPSec, etc.).
|
Yes, via SSL, PGP, S/MIME, or various other means.
|
DTMF Carriage |
H.245 User Input Indication, RFC 4733, or via the audio stream.
The alphanumeric choice of the H.245 UserInputIndication message is the
baseline carriage common to all H.323 endpoints, so
interoperability is assured.
|
There is no baseline carriage, which presents issues of
interoperability. Transport of DTMF via the INFO method,
RFC 4733, KPML, or the audio stream are all options.
|
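RFC 4733 carries each key press inside RTP as a 4-byte telephone-event payload. It is listed in both columns above. The payload holds an 8-bit event code, an end bit, a reserved bit, a 6-bit volume, and a 16-bit duration in RTP timestamp units. The sketch below packs and parses that payload for the 16 DTMF events. The RTP header and the dynamic payload type negotiated in SDP are left out, and the helper names are hypothetical.

```python
import struct

# RFC 4733 DTMF event codes: '0'-'9' -> 0-9, '*' -> 10, '#' -> 11, 'A'-'D' -> 12-15.
DTMF_EVENTS = {**{str(d): d for d in range(10)},
               "*": 10, "#": 11, "A": 12, "B": 13, "C": 14, "D": 15}
DTMF_DIGITS = {code: digit for digit, code in DTMF_EVENTS.items()}

def telephone_event_payload(digit: str, duration: int, end: bool = False,
                            volume: int = 10) -> bytes:
    """Pack one telephone-event payload.

    duration is in RTP timestamp units (8 per millisecond at 8 kHz);
    volume is the power level in -dBm0 (0..63); the R bit is left at 0.
    """
    event = DTMF_EVENTS[digit.upper()]
    if not 0 <= volume <= 63:
        raise ValueError("volume must fit in 6 bits")
    if not 0 <= duration <= 0xFFFF:
        raise ValueError("duration must fit in 16 bits")
    flags = (0x80 if end else 0) | volume      # E bit, R bit (0), 6-bit volume
    return struct.pack("!BBH", event, flags, duration)

def parse_telephone_event(payload: bytes) -> tuple[str, bool, int, int]:
    """Return (digit, end, volume, duration); raises KeyError for non-DTMF events."""
    event, flags, duration = struct.unpack("!BBH", payload[:4])
    return DTMF_DIGITS[event], bool(flags & 0x80), flags & 0x3F, duration

# Final packet for a 100 ms press of '5' at 8 kHz:
print(telephone_event_payload("5", 800, end=True).hex())  # 058a0320
```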
Standards Documents |
Refer to the H.323 Information Site.
|
Refer to the SIP Information Site. |