State-of-the-Art Coding Technologies Addressing a Wide Range of Bit Rates and Applications
The MPEG-4 Audio standard (ISO/IEC 14496-3) includes a set of technologies designed to address a broad range of audio coding requirements, from extremely low bit-rate parametric coding and audio synthesis up to high-quality speech and natural audio coding. The objective of MPEG-4 Audio is to provide a scalable system composed of a set of core coders, each suited to a segment of the audio bit-rate range.
In MPEG-4 terminology, these core coders are called "audio objects"; the standard further identifies several "audio profiles," each of which is made up of multiple audio objects.
A copy of the MPEG-4 Audio standard can be purchased from the ISO online store (search for "14496-3").
Another recommended source of information about MPEG-4 Audio is The MPEG-4 Book, edited by Fernando Pereira and Touradj Ebrahimi, published by Prentice Hall PTR in the IMSC Press Multimedia Series, ISBN 0-13-061621-4.
MPEG-4 AAC Standard
MPEG-4 enhancements to AAC introduce new features and improve performance at stereo bit rates of 64 kbps and below.
Advanced Audio Coding (AAC) is a wideband audio coding algorithm that exploits two primary coding strategies to dramatically reduce the amount of data needed to convey high-quality digital audio. First, signal components that are "perceptually irrelevant" and can be discarded without a perceived loss of audio quality are removed. Next, redundancies in the coded audio signal are eliminated. Efficient audio compression is achieved by a variety of perceptual audio coding and data compression tools, which are combined in the MPEG-4 AAC specification.
The MPEG-4 AAC standard incorporates MPEG-2 AAC, forming the basis of the MPEG-4 audio compression technology for data rates above 32 kbps per channel. Additional tools increase the effectiveness of AAC at lower bit rates, and add scalability or error resilience characteristics. These additional tools extend AAC into its MPEG-4 incarnation (ISO/IEC 14496-3, Subpart 4).
The MPEG-4 AAC patent license grants rights for multiple MPEG-4 AAC Object Types, including AAC LC (Low Complexity), AAC LTP (Long-Term Prediction), AAC Scalable, and ER AAC LD (Low Delay).
MPEG-4 BSAC Standard
Bit-Sliced Arithmetic Coding (BSAC) allows the delivery of audio bitstreams that are scalable in fine-grain increments. In contrast to the MPEG-4 AAC Scalable Object Type, which operates in 8 kbps/channel increments, BSAC enables a bitstream to scale in increments of 1 kbps/channel. This fine-grain scalability enables the creation of streaming audio solutions that use network resources efficiently, balancing audio quality and bandwidth usage with high precision.
BSAC is a general audio coder that is most effective in the bit-rate range of 48 to 64 kbps/channel.
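The practical effect of the step size can be sketched with a small calculation. The following is only an illustration, not part of the standard: it assumes that every multiple of the step size up to a chosen ceiling is a decodable rate, and it ignores the base layer and any container overhead. The function name `largest_layer_rate` and the 64 kbps ceiling are hypothetical choices made for this example.

```python
def largest_layer_rate(available_kbps: float, step_kbps: int, max_kbps: int = 64) -> int:
    """Return the highest rate (kbps/channel) that fits in the available bandwidth.

    Simplifying assumption (not from the standard): every multiple of
    step_kbps up to max_kbps is a valid operating point.
    """
    if available_kbps <= 0:
        return 0
    usable = min(available_kbps, max_kbps)
    # Round down to the nearest whole scalability step.
    return int(usable // step_kbps) * step_kbps


AAC_SCALABLE_STEP_KBPS = 8  # step size quoted above for AAC Scalable
BSAC_STEP_KBPS = 1          # step size quoted above for BSAC

if __name__ == "__main__":
    for bandwidth in (37.5, 50.0, 63.0):
        coarse = largest_layer_rate(bandwidth, AAC_SCALABLE_STEP_KBPS)
        fine = largest_layer_rate(bandwidth, BSAC_STEP_KBPS)
        print(f"{bandwidth:5.1f} kbps available: 8-kbps steps use {coarse}, "
              f"1-kbps steps use {fine}, difference {fine - coarse}")
```

Under these assumptions, a channel with 63 kbps available would be served at 56 kbps with 8 kbps steps and at 63 kbps with 1 kbps steps, which is the kind of bandwidth the finer granularity recovers.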
MPEG-4 CELP Standard
MPEG-4 CELP (Code Excited Linear Prediction) is a speech coding technology. Unlike general audio coders, such as AAC, which do not make any assumptions about the types of audio signal that will be represented, CELP coding is specifically designed to represent human speech. Optimizations resulting from a knowledge of the particular type of signal that will be encoded allow MPEG-4 CELP to achieve better compression than can be obtained from a general audio coder when used to encode speech.
MPEG-4 CELP operates in either a narrowband mode (8 kHz sample rate) or a wideband mode (16 kHz sample rate). The narrowband configuration achieves bit rates in the range of 4 to 12 kbps, and the wideband configuration approximately 11 to 24 kbps. In addition, MPEG-4 CELP supports scalable coding, in which multiple low bit-rate streams are embedded in a single stream so that a lower-quality rendering can be played back if the full-rate bitstream is not completely available.
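As a rough illustration of how the two configurations divide the bit-rate range, the sketch below looks up which mode could serve a target bit rate. The table simply restates the figures quoted above (the wideband bounds are approximate in the text); the names `CELP_MODES` and `candidate_modes` are hypothetical and do not come from the standard or from any codec library.

```python
# Bit-rate ranges as quoted in the text; wideband bounds are approximate.
CELP_MODES = {
    "narrowband": {"sample_rate_hz": 8000, "min_kbps": 4.0, "max_kbps": 12.0},
    "wideband": {"sample_rate_hz": 16000, "min_kbps": 11.0, "max_kbps": 24.0},
}


def candidate_modes(target_kbps: float) -> list[str]:
    """Return the CELP modes whose quoted bit-rate range covers target_kbps."""
    return [
        name
        for name, mode in CELP_MODES.items()
        if mode["min_kbps"] <= target_kbps <= mode["max_kbps"]
    ]


if __name__ == "__main__":
    for rate in (2.0, 6.0, 11.5, 20.0):
        modes = candidate_modes(rate) or ["none"]
        print(f"{rate:4.1f} kbps -> {', '.join(modes)}")
```

Because the two ranges overlap between roughly 11 and 12 kbps, the lookup returns both modes there; which one is preferable at that rate depends on whether audio bandwidth or coding precision matters more to the application.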
MPEG-4 HVXC Standard
Harmonic Vector Excitation Coding (HVXC) enables the representation of speech signals at very low bit rates. The standard defines two HVXC bit rates: 2 kbps and 4 kbps. Unlike the Code Excited Linear Prediction (CELP) speech coder, HVXC is a parametric coding system, which means that certain aspects of the coded representation can be manipulated independently. For example, the playback speed of an HVXC-encoded bitstream can be altered without affecting the pitch of the voice. Similarly, the pitch of the voice can be modified without altering the playback speed.
HVXC is useful for a variety of synthetic speech applications in bandwidth-constrained environments.
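The independence of speed and pitch in a parametric representation can be shown with a toy model. The sketch below is not HVXC and does not follow its bitstream syntax: it assumes, purely for illustration, that speech is described as a sequence of frames that each carry a pitch value and a gain. The `Frame` class and both functions are hypothetical names invented for this example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Frame:
    """Toy parametric frame; real HVXC frames carry different parameters."""
    pitch_hz: float
    gain: float


def change_speed(frames: list[Frame], speed: float) -> list[Frame]:
    """Resample the frame sequence in time; pitch values are left untouched.

    speed > 1 yields fewer frames (faster playback), speed < 1 yields more.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    if not frames:
        return []
    n_out = max(1, round(len(frames) / speed))
    last = len(frames) - 1
    # Nearest-earlier-frame selection; a real decoder would interpolate.
    return [frames[min(last, int(i * speed))] for i in range(n_out)]


def change_pitch(frames: list[Frame], ratio: float) -> list[Frame]:
    """Scale the pitch of every frame; the number of frames is unchanged."""
    return [replace(f, pitch_hz=f.pitch_hz * ratio) for f in frames]


if __name__ == "__main__":
    speech = [Frame(100.0, 1.0), Frame(110.0, 1.0), Frame(120.0, 1.0), Frame(130.0, 1.0)]
    print(len(change_speed(speech, 0.5)), "frames at half speed")
    print([f.pitch_hz for f in change_pitch(speech, 2.0)], "Hz after doubling pitch")
```

Changing speed only alters how many frames are played, while changing pitch only alters a parameter inside each frame, so neither operation disturbs the other. A waveform coder has no such separable parameters to edit.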
MPEG-4 TTSI Standard
Text-to-Speech Interface (TTSI) is designed to be used in conjunction with a speech synthesizer to generate synthetic speech from textual data. In addition to the text to be rendered as speech, TTSI provides the means to incorporate nuances in the rendering, such as the pace, energy, and pitch of the speech, as well as the age and gender of the speaker. Parameters that enable synchronization with facial animation controls, and lip shape information for use in video dubbing, are also available. By these means, speech data can be transmitted at extremely low bit rates, between 200 bps and 1.2 kbps.
The MPEG-4 standard does not include a normative speech synthesis method and so only the interface between the textual data and a speech synthesizer is specified.
Philips licenses its essential patents for MPEG-4 Audio through VIA Licensing. For further details or questions, please visit www.vialicensing.com.