Introduction
A powerful audio coding system capable of superior results at stereo bit rates below 128 kb/s.
Advanced Audio Coding, or AAC, is one of several audio coding systems specified in the MPEG-2 standard (ISO/IEC 13818-7).
MPEG-2 AAC is the audio format utilized in the Japanese Digital Broadcast system, known as ISDB (Integrated Services Digital Broadcasting). MPEG-2 AAC is also the basis of the audio coding technology used by XM Radio, one of two satellite radio services currently operating in the United States.
MPEG-2 AAC has been extended with additional features and capabilities in MPEG-4 AAC. However, companies creating products for the applications described above may not require these additional AAC tools, and for such applications we offer the MPEG-2 AAC Patent License Agreement.
A copy of the MPEG-2 AAC standard can be purchased from the ISO online store (search for "13818-7").
The International Organization for Standardization (ISO) recognized the importance of compression algorithms for video and audio signals at an early stage and founded the Moving Picture Experts Group (MPEG) to develop and standardize these methods. The group's highly acclaimed achievements are the MPEG-1 and MPEG-2 standards. MPEG-2 Advanced Audio Coding (AAC) was declared an international standard by MPEG at the end of April 1997.
The major advantages of data compression are the reduced memory and transmission bandwidth required by the compressed signal. Compression is therefore useful whenever these resources are scarce or expensive. Digital radio (e.g. Eureka DAB, WorldSpace) and audio transmission over the internet are thus two major domains of application for audio coding.
The driving force behind the development of AAC was the quest for an efficient coding method for surround signals, such as the 5-channel signals (left, right, center, left-surround, right-surround) used in cinemas today. MPEG-2 has included algorithms for these signals for quite a while. Optimum efficiency, however, was not reached, for technical and historical reasons. The aim was therefore a considerable reduction in the necessary bitrate.
What is MPEG-2 AAC?
MPEG-2 AAC is the logical continuation of the highly successful coding method ISO/MPEG Audio Layer-3. Its combination of high coding gain and great flexibility opens up a wide field of applications. With sampling frequencies between 8 kHz and 96 kHz and any number of channels between 1 and 48, the method is well prepared for future developments in the audio sector. Compared to coding methods such as MPEG-2 Layer-2, it is possible to cut the required bitrate by a factor of two with no loss of subjective quality.
Like all perceptual coding schemes, MPEG-2 AAC makes use of the signal masking properties of the human ear in order to reduce the amount of data. The quantization noise is distributed across frequency bands in such a way that it is masked by the total signal, i.e. it remains inaudible. Even though the basic structure of this coding method hardly differs from that of its predecessors, a closer look at the details reveals some new aspects worth attention.
Philips licenses its essential patents for MPEG-4 Audio through VIA Licensing. For further details or questions please contact www.vialicensing.com
MPEG Business Applications
MPEG Audio is an ISO (International Organization for Standardization) and IEC (International Electrotechnical Commission) world standard, open to everyone on an equal, nondiscriminatory basis. Being a world standard means that all organisations which have contributed to the standard will make their MPEG Audio patent rights available under fair, reasonable and nondiscriminatory conditions, thus preventing the influence of vested interests.
Investments are future-proof: the family approach of MPEG means forward and backward compatibility, and thus perfect audio/video integration. New developments will not necessarily make existing equipment and programming obsolete. Existing equipment may not handle the additional features of a new standard, but it will still operate within the specification of its own standard. MPEG-equipped hardware is therefore future-proof.
MPEG's compatibility between formats allows for great flexibility. Content providers are free to supply their programs in any format (stereo, multichannel) in the knowledge that excellent audio quality is reproduced at the consumer end, whatever equipment is used. In other words, the supplier has a guarantee of, and control over, the quality of their programming. Efficient implementation of MPEG allows for cost-effective consumer equipment with high quality and performance.
Patents / Agreements
|Patent No.||Priority Date(s)|
License will include all foreign counterparts
This page provides background on the theory behind MPEG Audio and the techniques involved. It also covers the more technical aspects of the MPEG standard. The material is presented in question-and-answer format.
What is MPEG2 audio?
MPEG2 audio is a compatible extension to MPEG1 audio encoding, which enables the transmission of mono, stereo, or multichannel audio in a single bitstream. It can operate at a wide range of bitrates (8 kbit/s up to more than 1 Mbit/s) and supports sampling rates of 16, 22.05, 24, 32, 44.1 and 48 kHz. For stereo, a typical application would operate at an average bit rate of 128-256 kbit/s. A multichannel movie soundtrack requires an average bit rate of 320-640 kbit/s, depending on the number of channels and the complexity of the audio to be encoded. MPEG2 defines an extension for five full bandwidth channels plus a low frequency enhancement (LFE) channel, termed 5.1 multichannel. With an additional compatible extension, seven channels are possible (7.1 multichannel).
How does MPEG audio work?
In devising an encoding method, the basis had to be the human ear. Although the ear is not a perfect device for acoustic reception, advantage was taken of one of its characteristics: a non-linear and adaptive threshold of hearing. The threshold of hearing is the level below which a sound is not heard. It varies with frequency and, of course, between individuals. Most people's hearing is most sensitive between 2 and 5 kHz. Whether a person hears a sound or not depends on the frequency of the sound and on whether its amplitude is above or below that person's hearing threshold at that frequency.
The threshold of hearing is adaptive, and is constantly changed by the sounds heard. For example, an ordinary conversation in a room is perfectly audible under normal conditions. However, the same conversation in the vicinity of a loud noise, such as an aircraft passing low overhead, is impossible to hear because the noise raises the hearing thresholds of the individuals concerned. When the aircraft has gone, the hearing thresholds return to normal. Sounds that are inaudible due to dynamic adaptation of the hearing threshold are said to be 'masked'.
This effect is universal but is of particular relevance in music. An orchestral instrument playing fortissimo will, to a greater or lesser extent, make the sound of some other instruments inaudible to the human ear. When the music is recorded, however, all the frequencies go on the medium because the response of the recording device is flat, i.e. it is not dynamically adaptive. When the recording is played, the masked instruments will not be audible to the listener. A linear recording, as used on CD, is inefficient in this respect. To make the best use of a recording medium, the parts of the medium that contain inaudible data are better used to store audible data. In this way a fixed capacity recording medium can contain a considerably increased amount of audio without any loss of quality.
Also, the demands on a transmission link carrying the information are reduced.
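The frequency dependence of the hearing threshold can be illustrated with a short Python sketch. It uses Terhardt's widely quoted approximation of the threshold in quiet, which is not part of the MPEG standard itself. The function names and the masker_raise_db parameter are hypothetical, and raising the threshold by a single fixed amount is only a crude stand-in for the real, frequency-dependent masking models used in encoders.

```python
import math

def threshold_in_quiet_db(freq_hz):
    """Approximate threshold of hearing in quiet, in dB SPL (Terhardt's formula)."""
    khz = freq_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)

def is_audible(freq_hz, level_db, masker_raise_db=0.0):
    """True if a tone lies above the threshold.

    masker_raise_db is a hypothetical, deliberately crude way to model a
    loud masker: it lifts the threshold by a fixed number of dB.
    """
    return level_db > threshold_in_quiet_db(freq_hz) + masker_raise_db

# Search a 100 Hz grid from 100 Hz to 16 kHz for the most sensitive frequency.
grid = range(100, 16001, 100)
most_sensitive_hz = min(grid, key=threshold_in_quiet_db)

# The same 20 dB tone at 1 kHz, first in quiet, then next to a loud masker.
heard_in_quiet = is_audible(1000, 20.0)
heard_with_masker = is_audible(1000, 20.0, masker_raise_db=40.0)

print(most_sensitive_hz, heard_in_quiet, heard_with_masker)
```

Under this approximation the most sensitive frequency falls inside the 2 to 5 kHz range mentioned above, and the tone that is audible in quiet drops below the raised threshold once the masker is present.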
Is MPEG2 compatible with MPEG1?
The MPEG2 standard was designed with compatibility as a major consideration. With the ever-growing number of applications of MPEG1 audio, especially in the entertainment, satellite broadcasting (DSS) and multimedia fields, this compatibility will provide the consumer with a cross-platform format for enjoying high quality audio reproduction. The core of the MPEG2 bitstream is an MPEG1 bitstream, which enables fully compatible decoding by an MPEG1 audio decoder. In addition, the need to transfer two separate bit streams (one for stereo and another one for the multichannel audio program) is avoided. In other words, a future upgrade of e.g. DSS with multichannel audio will not make existing set top boxes obsolete. The existing ones will reproduce stereo, the new ones high quality multichannel sound.
How does an MPEG1 decoder handle multichannel input?
An MPEG1 decoder will be supplied, in the MPEG1 part of the bitstream, with an appropriate (stereo) 'downmix' of all channels in the multichannel frame. The left and right channels of the stereo signal contain components of all the channels, according to the equations in the compatibility matrix. The MPEG1 (stereo) decoder decodes the stereo part of the MPEG2 frame and ignores the multichannel extensions. MPEG2 defines four matrix sets, one of which is selected in the decoder from information in the MC (multichannel) frame header.
What is the purpose of the compatibility matrix?
The compatibility matrix is a set of operations performed on the channels to be encoded, which ensures that every type of decoder (stereo, 5.1, 7.1) will properly decode the correct channel information. The matrix equations are very simple, so the matrix operations do not introduce calculation errors. The floating point representation of information in the MPEG bitstream reduces the need for peak limiting or level reduction, as a much larger dynamic range can be accommodated than with fixed point notation. Any risk of 'unmasking' encoding artifacts after decoding can be avoided by taking precautions in the encoder. The additional encoding effort is more than compensated for by the benefits of the matrix operations, i.e. better audio quality and low cost stereo compatibility. Another important consideration is how many existing television sets are only equipped with mono sound capability. The matrix operations also ensure that a mono signal can be obtained from the decoder through a downmix of the multichannel information, ensuring that mono TV sets will also reproduce the soundtrack as intended.
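A compatibility downmix of this kind can be sketched in a few lines of Python. The coefficients below (centre and surround gains of 1/sqrt(2) and an overall attenuation of 1/(1 + sqrt(2))) are assumptions chosen for illustration and stand in for just one possible matrix set. The function names are hypothetical, the LFE channel is left out, and averaging the two stereo channels is only one simple way to derive mono.

```python
import math

# Assumed coefficients, for illustration only.
CENTRE_GAIN = 1 / math.sqrt(2)
SURROUND_GAIN = 1 / math.sqrt(2)
NORMALISE = 1 / (1 + math.sqrt(2))  # keeps a full-scale mix within [-1, 1]

def stereo_downmix(l, r, c, ls, rs):
    """Fold five channel samples into a compatible stereo pair (lo, ro)."""
    lo = NORMALISE * (l + CENTRE_GAIN * c + SURROUND_GAIN * ls)
    ro = NORMALISE * (r + CENTRE_GAIN * c + SURROUND_GAIN * rs)
    return lo, ro

def mono_downmix(lo, ro):
    """Derive a mono sample from the stereo downmix (simple average, assumed)."""
    return 0.5 * (lo + ro)

# Dialogue only in the centre channel: it appears equally in both outputs.
lo, ro = stereo_downmix(0.0, 0.0, 0.8, 0.0, 0.0)
mono = mono_downmix(lo, ro)

# All five channels at full scale: the normalisation prevents overload.
full_lo, full_ro = stereo_downmix(1.0, 1.0, 1.0, 1.0, 1.0)

print(lo, ro, mono, full_lo, full_ro)
```

With these assumed gains, a centre-only signal lands identically in the left and right outputs, and five full-scale inputs sum to exactly full scale in the downmix.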
Why not encode the channels individually?
First of all it would not provide an MPEG1 decoder with a downmix of all the channels. The matrix is in fact a very clever way of reducing the need to encode additional information. An MPEG2 decoder will use the stereo downmix in the MPEG1 part together with the MPEG2 extension information to reconstruct all channels. With most of the information already in the MPEG1 part, much less additional information needs to be encoded. A large number of bits are saved in this way, which can be put to use in encoding the stereo downmix in the best possible way.
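The reconstruction step can be sketched as follows. This is a minimal illustration, assuming that the extension carries the centre and surround channels and that the decoder knows the downmix coefficients. The gains and the matrix and dematrix function names are hypothetical, and real bitstreams carry quantised rather than exact values.

```python
import math

GAIN = 1 / math.sqrt(2)        # assumed centre/surround gain
NORM = 1 / (1 + math.sqrt(2))  # assumed overall attenuation

def matrix(l, r, c, ls, rs):
    """Encoder side: stereo downmix for the MPEG1 part, three channels for the extension."""
    lo = NORM * (l + GAIN * c + GAIN * ls)
    ro = NORM * (r + GAIN * c + GAIN * rs)
    return (lo, ro), (c, ls, rs)

def dematrix(downmix, extension):
    """Multichannel decoder side: recover all five channels."""
    lo, ro = downmix
    c, ls, rs = extension
    l = lo / NORM - GAIN * c - GAIN * ls
    r = ro / NORM - GAIN * c - GAIN * rs
    return l, r, c, ls, rs

original = (0.30, -0.20, 0.50, 0.10, -0.05)
downmix, extension = matrix(*original)

# A stereo decoder would stop here and play `downmix` as it is.
# A multichannel decoder combines both parts to get the five channels back.
restored = dematrix(downmix, extension)
print(restored)
```

Only three extra channels travel in the extension. Left and right are never sent separately, because they can be derived from the downmix.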
So for low cost applications can I use an MPEG1 decoder?
Current MPEG1 decoders can decode the MPEG2 bitstream into a stereo signal. This is a clear benefit in view of the number of MPEG1 decoders currently installed in PCs, set-top boxes and digital broadcasting decoders. When the program material is supplied in multichannel sound, these systems will still provide the same high quality (stereo) audio. So both customers and the hardware industry can decide for themselves when to invest in multichannel playback capabilities. In the meantime, stereo decoders continue to drop in price, driven by the many MPEG1 applications. There is also the flexibility of being able to make a separate downmix of the multichannel information into, for example, a stereo signal.
What about the analogue surround sound decoder I already own?
With MPEG2, if you don't want to buy a digital surround decoder yet, you can still use your current analogue surround decoder. In fact, Philips DVD players will provide on their stereo outputs a signal that is compatible with your current decoder, provided the program material was encoded in multichannel. Otherwise a normal stereo signal is available. In addition, a special 'digital surround' output provides an encoded signal for a separate multichannel decoder or multichannel receiver, which you can decide to buy later.
What is meant by variable bit rate encoding?
In any given audio section, certain fragments are more complex than others, e.g. a whole orchestra playing compared to a single instrument. As a result, the number of bits needed for faithful encoding varies with the program material. In order to encode in the best possible way, it is advantageous to save bits from the simple sections and use them to encode complex ones. This is what variable bit rate encoding does. Think of movie soundtracks, which can contain a very wide variety of sound complexity: dialogue, music, sound effects, background noise, sections of silence. These variations can occur in any combination at any point in time. In a typical movie soundtrack, a bitrate of approximately 384 kbit/s is sufficient for most of the time to encode the 5.1 channels fully 'transparently' (i.e. indistinguishable from the original). However, the peak bit rate needed for transparency can extend to over 600 kbit/s for some particularly complex sections, e.g. at times when music, sound effects, background noise and dialogue all happen at the same time.
What are the benefits over constant bit rate encoding?
Operating at a certain (average) bitrate, a fixed or constant bit rate encoder provides variable quality. For fragments that are simple to encode, the constant bit rate encoder applies a bitrate that is higher than is really needed. For complex sections the bit rate available is lower than that required, and artefacts may become noticeable. A constant bit rate encoder would have to operate at a bit rate high enough to encode the most complex sections transparently, i.e. around 600 kbit/s, but a variable bit rate encoder can operate at the average bit rate, namely around 384 kbit/s, which represents a significant increase in encoding efficiency. A variable bit rate encoder provides fixed quality: it always applies the number of bits necessary to encode without noticeable artefacts.
It has been suggested that a constant bit rate encoder, constantly operating at the same peak bit rate as the variable bit rate encoder, will sound equally good or better. But the variable bit rate encoder needs the high peak bit rate only for a fraction of the total time, so applying that rate permanently is clearly excessive. Another variable bit rate bitstream, e.g. a second language version, could easily be added instead.
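The trade-off can be made concrete with a small Python sketch. The per-segment demand figures are invented for illustration, chosen so that they average 384 kbit/s and peak at 600 kbit/s like the figures quoted above. Segments are assumed to be of equal duration, and the function names are hypothetical.

```python
# Hypothetical bit rate (kbit/s) each equal-length segment needs for transparency.
demand_kbps = [256, 320, 384, 600, 384, 320, 448, 360]

def starved_segments(demand, cbr_rate):
    """Segments where a constant bit rate falls short of what is needed."""
    return [d for d in demand if d > cbr_rate]

def average_waste_kbps(demand, cbr_rate):
    """Average bit rate a constant bit rate encoder spends beyond what is needed."""
    return sum(max(cbr_rate - d, 0) for d in demand) / len(demand)

vbr_average = sum(demand_kbps) / len(demand_kbps)  # VBR spends exactly the demand
peak = max(demand_kbps)

# CBR at the VBR average: some segments do not get enough bits.
starved = starved_segments(demand_kbps, vbr_average)

# CBR at the peak: no segment is starved, but bits are wasted on average.
waste_at_peak = average_waste_kbps(demand_kbps, peak)
saving_fraction = (peak - vbr_average) / peak

print(vbr_average, peak, starved, waste_at_peak, saving_fraction)
```

In this invented example, a constant bit rate at the average leaves two segments short of bits, while a constant bit rate at the peak spends 216 kbit/s more than necessary on average, which is 36% of the peak rate.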
Can you use a variable bit rate for broadcast?
At the present time, practical limitations restrict broadcast systems to using a fixed bit rate. In this case, the audio and video information must be encoded with a fixed bit rate high enough to ensure that no artefacts occur. However, this does not exclude the use of the more efficient variable bit rate in the future, when more refined and widespread technology allows.
Has MPEG2 audio ever been comparatively tested?
Apart from the many listening tests that were performed during development, the most publicised test was performed by the 'Grand Alliance' for the USA HDTV system. In the test MPEG2 was compared with systems from MIT and Dolby. It was concluded that the Dolby system and MPEG2 sounded equally good (see the Grand Alliance test report and press release of February 24, 1994). The most recent listening tests were carried out for the European RACE dTTb (digital Terrestrial Television broadcasting) project at the BBC Research Labs in Kingswood Warren (Great Britain) and German Telekom FTZ in Berlin (Germany) in January 1996. These tests showed a very high audio quality for MPEG2 Audio Layer II, even for the most critical test sequences. The tests were so critical that in one test 23 out of 36 experienced listeners took no part in the final evaluation of the results, because they could no longer identify the encoded signals. Unfortunately these RACE dTTb multichannel listening tests did not include the Dolby AC-3 encoding system. Although invited to participate, Dolby declined to take part.
What provision does MPEG make for multiple languages?
MPEG2 provides the option to include up to seven different language channels in the multichannel bitstream. This can be done in one of two ways:
1. For each language a multichannel bit stream is encoded. Via the MPEG2 system layer, one of the bit streams can be selected to be reproduced. The advantage of this method is that it is simple at the encoding side.
2. Using the MPEG2 multilingual feature, up to seven different dialogue channels can be included in the multichannel bit stream. The MPEG2 audio decoder can select one of these dialogue channels and mix it into, for example, the centre channel. The advantage of this method is that it is very efficient, and that separate optimised Pro Logic streams can be put in parallel. The disadvantage is that the dialogue is monaural and there is some additional complexity involved.
The content provider has the choice between these two options.
An MPEG2 decoder can select one of the seven dialogue channels, and mix it in, for example, the centre channel. As most movie soundtracks have all the dialogue in the centre channel, this offers the possibility of very efficient encoding in the multilingual case. To add another channel requires only about 64 kbit/s, while without the multilingual feature a complete multichannel soundtrack has to be added, which will increase the bit rate by typically 384 kbit/s. Therefore using the multilingual feature, there is a gain of around 320 kbit/s per language.
In this mode, a player with only a stereo decoder would not reproduce the dialogue. Therefore it is necessary to include at least one stereo bit stream, at an average bit rate of typically 128 kbit/s (it may also be necessary to include a stereo bitstream for each language). On the one hand, the introduction of these stereo streams reduces the gain in bit rate to 192 kbit/s per language, but on the other hand it offers the possibility of using a different downmix for the stereo stream instead of an automatic downmix from the digital multichannel soundtrack.
To minimise the additional complexity involved with the multilingual feature, the selected dialogue channel can be mixed into the centre channel before the subband filter, so that no additional subband filter is needed. The subband filter is the most complex part of the decoder.
The actual use of the multilingual feature is set up by the content provider. The content provider can decide either to include N separate multichannel streams, at N x 384 kbit/s, or to include one multichannel stream, with N dialogue channels, plus one to N stereo (optimised Pro Logic) streams.
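The bit rate arithmetic behind the two options can be sketched in Python using the typical figures quoted above (384, 64 and 128 kbit/s). The function names are hypothetical. The total for the multilingual option assumes that the first language is already part of the 384 kbit/s multichannel stream, so that only each further language adds a 64 kbit/s dialogue channel. The text does not state this explicitly.

```python
MULTICHANNEL_KBPS = 384  # typical average for a 5.1 soundtrack
DIALOGUE_KBPS = 64       # one extra dialogue channel
STEREO_KBPS = 128        # one stereo stream

def gain_per_added_language(with_stereo_stream):
    """Bit rate saved per extra language, compared with a full extra soundtrack."""
    cost = DIALOGUE_KBPS + (STEREO_KBPS if with_stereo_stream else 0)
    return MULTICHANNEL_KBPS - cost

def separate_streams_total(n_languages):
    """Option 1: one complete multichannel stream per language."""
    return n_languages * MULTICHANNEL_KBPS

def multilingual_total(n_languages, n_stereo_streams):
    """Option 2 (assumed accounting): one multichannel stream, one dialogue
    channel per additional language, plus one to N stereo streams."""
    if not 1 <= n_stereo_streams <= n_languages:
        raise ValueError("need between 1 and N stereo streams")
    return (MULTICHANNEL_KBPS
            + (n_languages - 1) * DIALOGUE_KBPS
            + n_stereo_streams * STEREO_KBPS)

print(gain_per_added_language(False), gain_per_added_language(True))
print(separate_streams_total(4), multilingual_total(4, 1), multilingual_total(4, 4))
```

The first line of output reproduces the 320 and 192 kbit/s gains per language given in the text. The second compares the totals for a four-language release under the stated assumption.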
Dolby Surround, Pro Logic and AC-3 are trademarks of Dolby Laboratories Licensing Inc.