Provided by Colasoft Co., Ltd.

RTP Audio ( Real-time Transport Protocol Audio )

Update: 2007-02-26 11:45:11
SUMMARY
Protocol : Real-time Transport Protocol Audio
Protocol suite : TCP/IP
Layer : Application Layer
SNMP MIBs : iso.org.dod.internet.mgmt.mib-2.rtpMIB (1.3.6.1.2.1.10.87).
iso.org.dod.internet.mgmt.mib-2.rohcRtpMIB (1.3.6.1.2.1.114)
Ports : 5004 (UDP)
Related protocols : RTP, UDP, TCP, RTSP, RTCP
Working groups : AVT, Audio/Video Transport
DESCRIPTION
Encoding-Independent Rules
Since the ability to suppress silence is one of the primary motivations for using packets to transmit voice, the RTP header carries both a sequence number and a timestamp to allow a receiver to distinguish between lost packets and periods of time when no data was transmitted. Discontinuous transmission (silence suppression) may be used with any audio payload format. Receivers must assume that senders may suppress silence unless this is restricted by signaling specified elsewhere. (Even if the transmitter does not suppress silence, the receiver should be prepared to handle periods when no data is present, since packets may be lost.)
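The sequence number counts packets while the timestamp counts media time, so comparing the two gaps separates loss from silence. The sketch below assumes every packet carries a fixed number of clock ticks (`ts_per_packet`) and that packets arrive in order without duplicates; the function name is hypothetical. A loss that coincides with the start or end of a silence period cannot be told apart this way.

```python
def classify_gap(prev_seq, prev_ts, seq, ts, ts_per_packet):
    """Return (lost_packets, silent_ticks) between two consecutively received packets.

    Sketch assumptions: fixed packet duration of `ts_per_packet` clock ticks,
    in-order arrival, no duplicate packets.
    """
    seq_gap = (seq - prev_seq) % 2**16  # RTP sequence numbers wrap at 16 bits
    ts_gap = (ts - prev_ts) % 2**32     # RTP timestamps wrap at 32 bits
    lost_packets = seq_gap - 1
    # Media time not covered by the packets that were sent (received or lost)
    # is time during which the sender transmitted nothing: suppressed silence.
    silent_ticks = max(0, ts_gap - seq_gap * ts_per_packet)
    return lost_packets, silent_ticks


# 20 ms packets at 8,000 Hz => 160 ticks per packet
print(classify_gap(100, 16000, 103, 16480, 160))  # (2, 0): two packets lost
print(classify_gap(100, 16000, 101, 24160, 160))  # (0, 8000): one second of silence
```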

Some payload formats define a "silence insertion descriptor" or "comfort noise" frame to specify parameters for artificial noise that may be generated during a period of silence to approximate the background noise at the source. For other payload formats, a generic Comfort Noise (CN) payload format is specified in RFC 3389 [9]. When the CN payload format is used with another payload format, different values in the RTP payload type field distinguish comfort-noise packets from those of the selected payload format.

For applications which send either no packets or occasional comfort-noise packets during silence, the first packet of a talkspurt, that is, the first packet after a silence period during which packets have not been transmitted contiguously, should be distinguished by setting the marker bit in the RTP data header to one. The marker bit in all other packets is zero. The beginning of a talkspurt MAY be used to adjust the playout delay to reflect changing network delays. Applications without silence suppression MUST set the marker bit to zero.
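The marker-bit rule can be sketched as a small piece of sender state. The class and method names are hypothetical, and treating the very first packet of the stream as the start of a talkspurt is an assumption of this sketch.

```python
class TalkspurtMarker:
    """Hypothetical sender-side helper that chooses the RTP marker bit."""

    def __init__(self, silence_suppression: bool):
        self.silence_suppression = silence_suppression
        # Assumption: the first packet sent is treated as the start of a talkspurt.
        self.in_talkspurt = False

    def silence_started(self) -> None:
        """Call when the sender stops transmitting (or sends only comfort noise)."""
        self.in_talkspurt = False

    def next_marker(self) -> int:
        """Marker bit for the next speech packet to be sent."""
        if not self.silence_suppression:
            return 0  # applications without silence suppression MUST send 0
        marker = 0 if self.in_talkspurt else 1
        self.in_talkspurt = True
        return marker


m = TalkspurtMarker(silence_suppression=True)
print([m.next_marker() for _ in range(3)])  # [1, 0, 0]
m.silence_started()
print(m.next_marker())                      # 1: first packet of the next talkspurt
```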

The RTP clock rate used for generating the RTP timestamp is independent of the number of channels and the encoding; it usually equals the number of sampling periods per second. For N-channel encodings, each sampling period (say, 1/8,000 of a second) generates N samples. (This terminology is standard, but somewhat confusing, as the total number of samples generated per second is then the sampling rate times the channel count.)
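As a worked example of the clock-rate rule: a 20 ms packet at an 8,000 Hz clock advances the timestamp by 160 ticks whether it carries one channel or two, while the number of samples it carries scales with the channel count. The helper names below are illustrative only.

```python
from fractions import Fraction


def timestamp_increment(clock_rate_hz: int, packet_ms) -> int:
    """RTP clock ticks spanned by one packet; does not depend on channel count."""
    ticks = Fraction(clock_rate_hz) * Fraction(str(packet_ms)) / 1000
    if ticks.denominator != 1:
        raise ValueError("packet duration is not a whole number of clock ticks")
    return int(ticks)


def samples_per_packet(clock_rate_hz: int, packet_ms, channels: int) -> int:
    """Total samples carried: N samples (one per channel) per sampling period."""
    return timestamp_increment(clock_rate_hz, packet_ms) * channels


print(timestamp_increment(8000, 20))    # 160, mono or stereo alike
print(samples_per_packet(8000, 20, 2))  # 320 samples in a stereo packet
```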

If multiple audio channels are used, channels are numbered left-to-right, starting at one. In RTP audio packets, information from lower-numbered channels precedes that from higher-numbered channels.

For more than two channels, the convention followed by the AIFF-C audio interchange format should be followed, using the following notation, unless some other convention is specified for a particular encoding or payload format:

l   left
r   right
c   center
S   surround
F   front
R   rear


Channels  Description  Ch. 1  Ch. 2  Ch. 3  Ch. 4  Ch. 5  Ch. 6
_______________________________________________________________
2         stereo       l      r
3                      l      r      c
4                      l      c      r      S
5                      Fl     Fr     Fc     Sl     Sr
6                      l      lc     c      r      rc     S


Note: RFC 1890 defined two conventions for the ordering of four audio channels. Since the ordering is indicated implicitly by the number of channels, this was ambiguous. In this revision, the order described as "quadrophonic" has been eliminated to remove the ambiguity. This choice was based on the observation that quadrophonic consumer audio format did not become popular whereas surround-sound subsequently has.
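Because the channel order is implied only by the channel count, a receiver can hold the channel table as a simple lookup. The dictionary and function names below are hypothetical; the labels are copied from the table in this section.

```python
# Channel order implied by the channel count (AIFF-C convention, per the table above).
CHANNEL_ORDER = {
    2: ("l", "r"),
    3: ("l", "r", "c"),
    4: ("l", "c", "r", "S"),
    5: ("Fl", "Fr", "Fc", "Sl", "Sr"),
    6: ("l", "lc", "c", "r", "rc", "S"),
}


def channel_number(label: str, channels: int) -> int:
    """1-based channel number carrying `label` in an N-channel stream."""
    return CHANNEL_ORDER[channels].index(label) + 1


print(channel_number("c", 3))  # 3
print(channel_number("c", 4))  # 2: with four channels, center comes second
```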


Operating Recommendations
The following recommendations are default operating parameters. Applications SHOULD be prepared to handle other values. The ranges given are meant to give guidance to application writers, allowing a set of applications conforming to these guidelines to interoperate without additional negotiation. These guidelines are not intended to restrict operating parameters for applications that can negotiate a set of interoperable parameters, e.g., through a conference control protocol.

For packetized audio, the default packetization interval should have a duration of 20 ms or one frame, whichever is longer, unless otherwise noted in Table 1 (column "default ms/packet"). The packetization interval determines the minimum end-to-end delay; longer packets introduce less header overhead but higher delay and make packet loss more noticeable. For non-interactive applications such as lectures or for links with severe bandwidth constraints, a higher packetization delay MAY be used. A receiver should accept packets representing between 0 and 200 ms of audio data. (For framed audio encodings, a receiver should accept packets with a number of frames equal to 200 ms divided by the frame duration, rounded up.) This restriction allows reasonable buffer sizing for the receiver.
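The 200 ms acceptance limit translates directly into receive-buffer sizes. This sketch computes the maximum frame count for frame-based codecs (G.728 with 2.5 ms frames gives 80; G.723 with 30 ms frames gives 7) and the maximum payload size for sample-based codecs; the function names are hypothetical and the sample-based case assumes 200 ms is a whole number of sampling periods.

```python
import math
from fractions import Fraction

MAX_PACKET_MS = 200  # receivers should accept 0 to 200 ms of audio per packet


def max_frames_per_packet(frame_ms) -> int:
    """Frame-based codecs: 200 ms divided by the frame duration, rounded up."""
    return math.ceil(Fraction(MAX_PACKET_MS) / Fraction(str(frame_ms)))


def max_payload_octets(rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Sample-based codecs: octets needed to hold 200 ms of audio."""
    samples = rate_hz * MAX_PACKET_MS // 1000 * channels
    return (samples * bits_per_sample + 7) // 8  # round up to whole octets


print(max_frames_per_packet(2.5))        # 80 (G728)
print(max_frames_per_packet(30))         # 7  (G723)
print(max_payload_octets(8000, 8))       # 1600 (PCMU, mono)
print(max_payload_octets(44100, 16, 2))  # 35280 (L16, stereo)
```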

Guidelines for Sample-Based Audio Encodings
In sample-based encodings, each audio sample is represented by a fixed number of bits. Within the compressed audio data, codes for individual samples may span octet boundaries. An RTP audio packet may contain any number of audio samples, subject to the constraint that the number of bits per sample times the number of samples per packet yields an integral octet count. Fractional encodings produce less than one octet per sample.

The duration of an audio packet is determined by the number of samples in the packet.
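The octet-alignment constraint and the duration rule can both be checked with integer arithmetic. For example, G726-24 uses 3 bits per sample, so a packet must hold a multiple of 8 samples; an 80-octet G726-32 payload holds 160 four-bit samples, or 20 ms at 8,000 Hz. The helpers below are a hypothetical sketch for fixed-width encodings only (VDVI, with variable bits per sample, is out of scope).

```python
from fractions import Fraction
from math import gcd


def min_samples_per_packet(bits_per_sample: int) -> int:
    """Smallest sample count whose bits add up to a whole number of octets."""
    return 8 // gcd(8, bits_per_sample)


def packet_duration_ms(payload_octets: int, bits_per_sample: int,
                       rate_hz: int, channels: int = 1) -> Fraction:
    """Duration implied by the number of samples in a sample-based payload."""
    periods = Fraction(payload_octets * 8, bits_per_sample * channels)
    if periods.denominator != 1:
        raise ValueError("payload does not hold a whole number of sampling periods")
    return periods * 1000 / rate_hz


print(min_samples_per_packet(3))          # 8 (G726-24)
print(packet_duration_ms(80, 4, 8000))    # 20 (G726-32)
print(packet_duration_ms(160, 8, 8000))   # 20 (PCMU)
```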

For sample-based encodings producing one or more octets per sample, samples from different channels sampled at the same sampling instant should be packed in consecutive octets. For example, for a two-channel encoding, the octet sequence is (left channel, first sample), (right channel, first sample), (left channel, second sample), (right channel, second sample), .... For multi-octet encodings, octets should be transmitted in network byte order (i.e., most significant octet first).
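For a 16-bit linear encoding such as L16, the interleaving and byte-order rules look like this. `pack_l16` and `unpack_l16` are illustrative names; the sketch uses Python's `struct` module with the `!` (network byte order) prefix.

```python
import struct


def pack_l16(channels: list[list[int]]) -> bytes:
    """Interleave per-channel signed 16-bit samples, lowest-numbered channel first,
    each sample in network byte order (most significant octet first)."""
    if len({len(c) for c in channels}) != 1:
        raise ValueError("every channel needs the same number of samples")
    out = bytearray()
    for instant in zip(*channels):  # one tuple per sampling instant
        out += struct.pack(f"!{len(instant)}h", *instant)
    return bytes(out)


def unpack_l16(payload: bytes, num_channels: int) -> list[list[int]]:
    """Inverse of pack_l16: split an interleaved payload back into channels."""
    values = [v for (v,) in struct.iter_unpack("!h", payload)]
    return [values[ch::num_channels] for ch in range(num_channels)]


left, right = [1, 2], [-1, 3]
payload = pack_l16([left, right])
print(payload.hex())           # 0001ffff00020003: L1 R1 L2 R2
print(unpack_l16(payload, 2))  # [[1, 2], [-1, 3]]
```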

The packing of sample-based encodings producing less than one octet per sample is encoding-specific.

The RTP timestamp reflects the instant at which the first sample in the packet was sampled, that is, the oldest information in the packet.

Guidelines for Frame-Based Audio Encodings
Frame-based encodings encode a fixed-length block of audio into another block of compressed data, typically also of fixed length. For frame-based encodings, the sender may choose to combine several such frames into a single RTP packet. The receiver can tell the number of frames contained in an RTP packet, if all the frames have the same length, by dividing the RTP payload length by the audio frame size which is defined as part of the encoding. This does not work when carrying frames of different sizes unless the frame sizes are relatively prime. If not, the frames must indicate their size.
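When all frames share one size, the frame count is a single division, and a non-zero remainder signals a malformed payload. In this sketch the frame sizes are assumptions taken from the codec definitions (10 octets per G.729 frame, 33 octets per GSM frame), and `count_frames` is a hypothetical name.

```python
def count_frames(payload_len: int, frame_size: int) -> int:
    """Number of equal-size frames in an RTP payload."""
    count, leftover = divmod(payload_len, frame_size)
    if leftover:
        raise ValueError("payload is not a whole number of frames of this size")
    return count


print(count_frames(20, 10))  # 2 G.729 frames (assumed 10 octets each)
print(count_frames(66, 33))  # 2 GSM frames (assumed 33 octets each)
```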

For frame-based codecs, the channel order is defined for the whole block. That is, for two-channel audio, right and left samples should be coded independently, with the encoded frame for the left channel preceding that for the right channel.

All frame-oriented audio codecs should be able to encode and decode several consecutive frames within a single packet. Since the frame size for frame-oriented codecs is fixed, there is no need for a separate designation for the same encoding with a different number of frames per packet.

RTP packets shall contain a whole number of frames, with frames inserted according to age within a packet, so that the oldest frame (to be played first) occurs immediately after the RTP packet header. The RTP timestamp reflects the instant at which the first sample in the first frame was sampled, that is, the oldest information in the packet.
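A sender-side packetizer for frame-based codecs can enforce both rules: frames are appended in capture order so the oldest comes first, and each packet's timestamp is that of the first sample of its first frame. `FramePacketizer` is a hypothetical helper; the example assumes G.729 (80 samples per 10 ms frame at 8,000 Hz) with two frames per packet.

```python
class FramePacketizer:
    """Hypothetical sender helper: groups whole frames into RTP payloads."""

    def __init__(self, samples_per_frame: int, frames_per_packet: int, initial_ts: int):
        self.samples_per_frame = samples_per_frame
        self.frames_per_packet = frames_per_packet
        self.next_ts = initial_ts % 2**32  # timestamp of the oldest pending frame
        self.pending: list[bytes] = []

    def push(self, frame: bytes):
        """Add one encoded frame (in capture order).

        Returns (timestamp, payload) once a packet is full, otherwise None.
        """
        self.pending.append(frame)
        if len(self.pending) < self.frames_per_packet:
            return None
        ts = self.next_ts                 # first sample of the first (oldest) frame
        payload = b"".join(self.pending)  # oldest frame sits right after the header
        self.pending.clear()
        self.next_ts = (ts + self.frames_per_packet * self.samples_per_frame) % 2**32
        return ts, payload


p = FramePacketizer(samples_per_frame=80, frames_per_packet=2, initial_ts=1000)
p.push(b"A" * 10)
print(p.push(b"B" * 10)[0])  # 1000
p.push(b"C" * 10)
print(p.push(b"D" * 10)[0])  # 1160
```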

Audio Encodings
Table 1: Properties of Audio Encodings (N/A: not applicable; var.: variable)

Name      sample/frame  bits/sample  sampling rate (Hz)  ms/frame  default ms/packet
__________________________________________________________________________________
DVI4      sample        4            var.                          20
G722      sample        8            16,000                        20
G723      frame         N/A          8,000               30        30
G726-40   sample        5            8,000                         20
G726-32   sample        4            8,000                         20
G726-24   sample        3            8,000                         20
G726-16   sample        2            8,000                         20
G728      frame         N/A          8,000               2.5       20
G729      frame         N/A          8,000               10        20
G729D     frame         N/A          8,000               10        20
G729E     frame         N/A          8,000               10        20
GSM       frame         N/A          8,000               20        20
GSM-EFR   frame         N/A          8,000               20        20
L8        sample        8            var.                          20
L16       sample        16           var.                          20
LPC       frame         N/A          8,000               20        20
MPA       frame         N/A          var.                var.
PCMA      sample        8            var.                          20
PCMU      sample        8            var.                          20
QCELP     frame         N/A          8,000               20        20
VDVI      sample        var.         var.                          20


The characteristics of the audio encodings described in this document are shown in Table 1; they are listed in order of their payload type in Table 4 of RFC 3551. While most audio codecs are specified only for a fixed sampling rate, some sample-based algorithms (indicated by an entry of "var." in the sampling rate column of Table 1) may be used with different sampling rates, resulting in different coded bit rates. When such an encoding is used with a sampling rate other than the one for which a static payload type is defined, non-RTP means beyond the scope of RFC 3551 must be used to define a dynamic payload type and to indicate the selected RTP timestamp clock rate, which for audio is usually the same as the sampling rate.
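For the fixed-width sample-based encodings in Table 1, the coded bit rate is bits per sample times sampling rate times channel count, which is why a "var." sampling rate changes the bit rate. The dictionary and function below are a hypothetical sketch; G722 is deliberately omitted because its RTP clock rate (8,000 Hz) differs from its 16,000 Hz sampling rate, so the simple product does not apply to its table entry, and VDVI is omitted because its bits per sample vary.

```python
# bits/sample for fixed-width sample-based encodings from Table 1 (G722 and VDVI omitted)
BITS_PER_SAMPLE = {
    "DVI4": 4, "G726-40": 5, "G726-32": 4, "G726-24": 3, "G726-16": 2,
    "L8": 8, "L16": 16, "PCMA": 8, "PCMU": 8,
}


def coded_bit_rate(encoding: str, rate_hz: int, channels: int = 1) -> int:
    """Bit rate in bit/s = bits per sample x sampling rate x channels."""
    return BITS_PER_SAMPLE[encoding] * rate_hz * channels


print(coded_bit_rate("PCMU", 8000))      # 64000
print(coded_bit_rate("DVI4", 8000))      # 32000
print(coded_bit_rate("DVI4", 16000))     # 64000: same codec, different rate
print(coded_bit_rate("L16", 44100, 2))   # 1411200
```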



EXAMPLES



PROTOCOL RELATIONS
Parent layer
Child layer

GLOSSARY
CSRC
CSRC (Contributing source) is a source of a stream of RTP packets that has contributed to the combined stream produced by an RTP mixer. The mixer inserts a list of the SSRC identifiers of the sources that contributed to the generation of a particular packet into the RTP header of that packet. This list is called the CSRC list. An example application is audio conferencing where a mixer indicates all the talkers whose speech was combined to produce the outgoing packet, allowing the receiver to indicate the current talker, even though all the audio packets contain the same SSRC identifier.

H.323
H.323 is an umbrella recommendation from the ITU-T that defines the protocols to provide audio-visual communication sessions on any packet network. It is currently implemented by various Internet real-time applications such as NetMeeting and GnomeMeeting. It is part of the H.32x series of protocols, which also address communications over ISDN, PSTN or SS7. H.323 is commonly used in Voice over IP (VoIP) and IP-based videoconferencing.

Mixer
Mixer is an intermediate system that receives RTP packets from one or more sources, possibly changes the data format, combines the packets in some manner and then forwards a new RTP packet. Since the timing among multiple input sources will not generally be synchronized, the mixer will make timing adjustments among the streams and generate its own timing for the combined stream. Thus, all data packets originating from a mixer will be identified as having the mixer as their synchronization source.

Monitor
Monitor is an application that receives RTCP packets sent by participants in an RTP session, in particular the reception reports, and estimates the current quality of service for distribution monitoring, fault diagnosis and long-term statistics. The monitor function is likely to be built into the application(s) participating in the session, but may also be a separate application that does not otherwise participate and does not send or receive the RTP data packets. These are called third-party monitors.


Non-RTP means
Non-RTP means are protocols and mechanisms that may be needed in addition to RTP to provide a usable service. In particular, for multimedia conferences, a control protocol may distribute multicast addresses and keys for encryption, negotiate the encryption algorithm to be used, and define dynamic mappings between RTP payload type values and the payload formats they represent for formats that do not have a predefined payload type value.

RTCP
The RTP control protocol (RTCP) is based on the periodic transmission of control packets to all participants in the session, using the same distribution mechanism as the data packets. The underlying protocol must provide multiplexing of the data and control packets.

RTCP packet
RTCP packet is a control packet consisting of a fixed header part similar to that of RTP data packets, followed by structured elements that vary depending upon the RTCP packet type. Typically, multiple RTCP packets are sent together as a compound RTCP packet in a single packet of the underlying protocol; this is enabled by the length field in the fixed header of each RTCP packet.

RTP
RTP (Real-Time Transport Protocol) is an Internet protocol for transmitting real-time data such as audio and video. RTP itself does not guarantee real-time delivery of data, but it does provide mechanisms for the sending and receiving applications to support streaming data. Typically, RTP runs on top of the UDP protocol, although the specification is general enough to support other transport protocols.

RTP packet
RTP packet is a data packet consisting of the fixed RTP header, a possibly empty list of contributing sources (see CSRC), and the payload data. Some underlying protocols may require an encapsulation of the RTP packet to be defined.
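The layout just described (fixed header, CSRC list, payload) can be decoded with a few shifts and masks. This is a minimal sketch of the RFC 3550 fixed header using Python's `struct` module; `parse_rtp_header` is a hypothetical name, the header extension is skipped rather than decoded, and padding is not stripped.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Decode the fixed RTP header (RFC 3550) and the CSRC list that follows it."""
    if len(packet) < 12:
        raise ValueError("shorter than the 12-octet fixed header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    csrc_count = b0 & 0x0F                  # CC: number of CSRC identifiers
    offset = 12 + 4 * csrc_count
    if len(packet) < offset:
        raise ValueError("truncated CSRC list")
    csrc = list(struct.unpack(f"!{csrc_count}I", packet[12:offset]))
    if b0 & 0x10:  # X bit: skip extension (16-bit profile field, 16-bit length in words)
        if len(packet) < offset + 4:
            raise ValueError("truncated header extension")
        (ext_words,) = struct.unpack("!H", packet[offset + 2:offset + 4])
        offset += 4 + 4 * ext_words
        if len(packet) < offset:
            raise ValueError("truncated header extension")
    return {
        "version": b0 >> 6,
        "padding": bool(b0 & 0x20),
        "marker": b1 >> 7,
        "payload_type": b1 & 0x7F,
        "sequence": seq,
        "timestamp": ts,
        "ssrc": ssrc,
        "csrc": csrc,
        "payload_offset": offset,  # padding, if any, is not stripped here
    }


# V=2, CC=1, M=1, PT=0, one CSRC from a mixer, then a 3-octet payload
pkt = (struct.pack("!BBHII", 0x81, 0x80, 1, 160, 0x11223344)
       + struct.pack("!I", 0xAABBCCDD) + b"xyz")
h = parse_rtp_header(pkt)
print(h["version"], h["marker"], h["payload_type"], hex(h["csrc"][0]))  # 2 1 0 0xaabbccdd
```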

SIP
Session Initiation Protocol (SIP) is an application-layer control protocol; a signaling protocol for Internet Telephony. SIP establishes sessions for services such as audio/videoconferencing, interactive gaming, and call forwarding over IP networks, enabling service providers to integrate basic IP telephony services with Web, e-mail, and chat services.

SSRC
Synchronization source (SSRC) is the source of a stream of RTP packets, identified by a 32-bit numeric SSRC identifier carried in the RTP header so as not to be dependent upon the network address. All packets from a synchronization source form part of the same timing and sequence number space, so a receiver groups packets by synchronization source for playback.

Translator
Translator is an intermediate system that forwards RTP packets with their synchronization source identifier intact. Examples of translators include devices that convert encodings without mixing, replicators from multicast to unicast, and application-level filters in firewalls.

Transport address
The Transport Address is traditionally defined by Network Layer address, Transport Layer protocol and Transport Layer port number. In the case of SCTP running over IP, a transport address is defined by the combination of an IP address and an SCTP port number (where SCTP is the Transport protocol).

UDP
UDP (User Datagram Protocol) is a connectionless protocol that, like TCP, runs on top of IP networks. Unlike TCP, UDP provides very few error-recovery services, offering instead a direct way to send and receive datagrams over an IP network. It is commonly used where low delay matters more than guaranteed delivery, for example for RTP media streams, DNS queries, and broadcast or multicast messages.

Unicast
Unicast is a communication that takes place over a network between a single sender and a single receiver.


REFERENCES
RFCs:
[RFC 2029] RTP Payload Format of Sun's CellB Video Encoding.
                
[RFC 2032] RTP Payload Format for H.261 Video Streams.
                
[RFC 2190] RTP Payload Format for H.263 Video Streams.
                
[RFC 2198] RTP Payload for Redundant Audio Data.
                
[RFC 2250] RTP Payload Format for MPEG1/MPEG2 Video.
                Obsoletes: RFC 2038.
                
[RFC 2343] RTP Payload Format for Bundled MPEG.
                
[RFC 2429] RTP Payload Format for the 1998 Version of ITU-T Rec. H.263 Video (H.263+).
                
[RFC 2431] RTP Payload Format for BT.656 Video Encoding.
                
[RFC 2435] RTP Payload Format for JPEG-compressed Video.
                Obsoletes: RFC 2035.
                
[RFC 2508] Compressing IP/UDP/RTP Headers for Low-Speed Serial Links.
                
[RFC 2658] RTP Payload Format for PureVoice(tm) Audio.
                
[RFC 2733] An RTP Payload Format for Generic Forward Error Correction.
                
[RFC 2736] Guidelines for Writers of RTP Payload Format Specifications.
                
[RFC 2762] Sampling of the Group Membership in RTP.
                
[RFC 2833] RTP Payload for DTMF Digits, Telephony Tones and Telephony Signals.
                
[RFC 2862] RTP Payload Format for Real-Time Pointers.
                
[RFC 2959] Real-Time Transport Protocol Management Information Base.
                Defines SNMP MIB iso.org.dod.internet.mgmt.mib-2.rtpMIB (1.3.6.1.2.1.10.87).
                
[RFC 3009] Registration of parityfec MIME types.
                
[RFC 3016] RTP Payload Format for MPEG-4 Audio/Visual Streams.
                
[RFC 3047] RTP Payload Format for ITU-T Recommendation G.722.1.
                
[RFC 3095] RObust Header Compression (ROHC): Framework and four profiles: RTP, UDP, ESP, and uncompressed.
                
[RFC 3119] A More Loss-Tolerant RTP Payload Format for MP3 Audio.
                
[RFC 3158] RTP Testing Strategies.
                
[RFC 3189] RTP Payload Format for DV (IEC 61834) Video.
                
[RFC 3190] RTP Payload Format for 12-bit DAT Audio and 20- and 24-bit Linear Sampled Audio.
                
[RFC 3267] Real-Time Transport Protocol (RTP) Payload Format and File Storage Format for the Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate Wideband (AMR-WB) Audio Codecs.
                
[RFC 3389] Real-time Transport Protocol (RTP) Payload for Comfort Noise (CN).
                Defines MIME media subtype audio/CN.
                Defines RTP payload type CN.
                
[RFC 3497] RTP Payload Format for Society of Motion Picture and Television Engineers (SMPTE) 292M Video.
                Defines MIME media subtype video/SMPTE292M.
                
[RFC 3558] RTP Payload Format for Enhanced Variable Rate Codecs (EVRC) and Selectable Mode Vocoders (SMV).
                Defines MIME media subtypes audio/EVRC, audio/EVRC0, audio/SMV and audio/SMV0.
                
[RFC 3545] Enhanced Compressed RTP (CRTP) for Links with High Delay, Packet Loss and Reordering.
                
[RFC 3550] RTP: A Transport Protocol for Real-Time Applications.
                Obsoletes: RFC 1889.
                
[RFC 3551] RTP Profile for Audio and Video Conferences with Minimal Control.
                Obsoletes: RFC 1890.
                
[RFC 3555] MIME Type Registration of RTP Payload Formats.
                Updated by: RFC 3625.
                
[RFC 3557] RTP Payload Format for European Telecommunications Standards Institute (ETSI) European Standard ES 201 108 Distributed Speech Recognition Encoding.
                Defines MIME media subtype audio/dsr-es201108.
                
[RFC 3640] RTP Payload Format for Transport of MPEG-4 Elementary Streams.
                
[RFC 3711] The Secure Real-time Transport Protocol (SRTP).
                Defines RTP profile RTP/SAVP.
                
[RFC 3816] Definitions of Managed Objects for RObust Header Compression (ROHC).
                iso.org.dod.internet.mgmt.mib-2.rohcMIB (1.3.6.1.2.1.112)
                iso.org.dod.internet.mgmt.mib-2.rohcUncmprMIB (1.3.6.1.2.1.113)
                iso.org.dod.internet.mgmt.mib-2.rohcRtpMIB (1.3.6.1.2.1.114)
                
[RFC 3952] Real-time Transport Protocol (RTP) Payload Format for internet Low Bit Rate Codec (iLBC) Speech.
                Defines MIME media subtype audio/iLBC.
                
[RFC 3984] RTP Payload Format for H.264 Video.
                Defines MIME media subtype video/H264.
                
[RFC 4040] RTP Payload Format for a 64 kbit/s Transparent Call.
                Defines MIME media subtype audio/clearmode.
                
[RFC 4060] RTP Payload Formats for European Telecommunications Standards Institute (ETSI) European Standard ES 202 050, ES 202 211, and ES 202 212 Distributed Speech Recognition Encoding.
                Defines MIME media subtypes audio/dsr-es202050, audio/dsr-es202211 and audio/dsr-es202212.
                
[RFC 4103] RTP Payload for Text Conversation.
                Defines MIME media subtype text/t140.
                Obsoletes: RFC 2793.
                
[RFC 4170] Tunneling Multiplexed Compressed RTP (TCRTP).
                BCP: 110.
                
[RFC 4175] RTP Payload Format for Uncompressed Video.
                Defines MIME media subtype video/raw.
                
[RFC 4184] RTP Payload Format for AC-3 Audio.
                
[RFC 4298] RTP Payload Format for BroadVoice Speech Codecs.
                Defines MIME media subtypes audio/BV16 and audio/BV32.
                
[RFC 4348] Real-Time Transport Protocol (RTP) Payload Format for the Variable-Rate Multimode Wideband (VMR-WB) Audio Codec.
                Category: Standards Track.
                Defines MIME media subtype audio/VMR-WB.
                
[RFC 4351] Real-Time Transport Protocol (RTP) Payload for Text Conversation Interleaved in an Audio Stream.
                Category: Historic.
                Defines MIME media subtype audio/t140c.
                
[RFC 4352] RTP Payload Format for the Extended Adaptive Multi-Rate Wideband (AMR-WB+) Audio Codec.
                Category: Standards Track.
                Defines MIME media subtype audio/amr-wb+.
                
Obsolete RFCs:
[RFC 1889] RTP: A Transport Protocol for Real-Time Applications.
                Obsoleted by: RFC 3550.
                
[RFC 1890] RTP Profile for Audio and Video Conferences with Minimal Control.
                Obsoleted by: RFC 3551.
                
[RFC 2035] RTP Payload Format for JPEG-compressed Video.
                Obsoleted by: RFC 2435.
                
[RFC 2038] RTP Payload Format for MPEG1/MPEG2 Video.
                Obsoleted by: RFC 2250.
                
[RFC 2793] RTP Payload for Text Conversation.
                Obsoleted by: RFC 4103.
                Defines MIME media subtype text/t140.
                


© 2006 - 2007 Colasoft Co., Ltd. All rights reserved.