Introduction

MPEG is an encoding and compression system for digital multimedia content defined by the Moving Pictures Expert Group (MPEG). MPEG-2 provides compression support for TV-quality transmission of digital video.
Phase Alternating Line (PAL) is the analogue TV standard used in many parts of the world. An uncompressed PAL TV picture requires a massive 216 Mbps. The U.S. uses an analogue TV system called NTSC. An uncompressed NTSC signal requires 68 Mbps. A High Definition TV (HDTV) picture requires a raw bandwidth exceeding 1 Gbps. MPEG-2 provides a way to compress video signals to a manageable rate.
The MPEG-2 compression algorithms include the following features:
· Video compression which is backwards compatible with MPEG-1
· Full-screen interlaced and/or progressive video (for TV and Computer displays)
· Enhanced audio coding (high quality, mono, stereo, and other audio features)
· Transport multiplexing (combining different MPEG streams in a single transmission stream)
· Other services (GUI, interaction, encryption, data transmission, etc.)
Philips licenses its essential patents for MPEG-2 Visual through MPEG LA. For further details or questions please contact MPEG LA: www.MPEGLA.com
This page provides a background of the theory behind, and the techniques involved with, MPEG video. It also covers the more technical aspects of the MPEG standard. This is presented in question-and-answer format.
What exactly is MPEG?
MPEG is a working group within ISO/IEC that grew out of the earlier JPEG (Joint Photographic Experts Group) work on bit-rate reduction of still pictures. To ensure that a single defined standard was followed, the Moving Pictures Expert Group (MPEG) was formed to devise a suitable encoding scheme for transmitting moving pictures and sound over various broadcast links and for recording them on standard digital storage media (Compact Disc, other optical media, magnetic media, solid state, etc.). As moving pictures are almost always interlinked with accompanying sound, MPEG also defines a standard for the encoding of audio information. Work by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC) resulted in 1992 in a standard for audio and video encoding, known as MPEG1 (ISO/IEC 11172). To keep abreast of increasing media capacity and the demand for higher quality, MPEG2 was devised. This is a compatible extension to the MPEG1 standard, giving higher quality pictures and sound. MPEG2 (ISO/IEC 13818) became a bona fide standard on 11th November 1994, after a five-day meeting of ISO and IEC in Singapore.
Why do we need MPEG?
You may ask why there is any need to encode at all. Well, encoding uses low sample and bit rates which results in audio and video information requiring much less storage space (hence the word 'compression' is sometimes used instead of encoding). For example, a CD can contain a maximum of 650 MB of unencoded video. This is sufficient for a program length of just 5 or 6 minutes. When the video signal is encoded the CD can contain up to 74 minutes of video. Even with the increased storage capacity of DVD, there is still a need to encode the audio and video information to give sufficient program length. The benefits of audio and video encoding are not limited to optical or magnetic storage. Encoding also allows audio and video to be transmitted at much lower bit rates. Efficient encoding gives much narrower bandwidths so that many more channels can be transmitted via existing fibre optic, terrestrial and satellite links. Where bandwidth is at a premium this is a very important consideration. In simple terms all that is required is an encoder at one end and a decoder at the other end. Encoding also gives PCs real multimedia capability due to the reduced storage space and bit rates required for high quality audio and video.
Is MPEG an international standard?
MPEG is an ISO/IEC working group, with contributors from all over the world, set up to ensure that the compression of digital audio and video signals followed a defined standard worldwide. Among the contributing companies were Sony, Matsushita, JVC, Toshiba, Thomson, Motorola, C-Cube, LSI Logic, Texas Instruments, Digital Equipment, AT&T and many others, including Philips. MPEG is also part of many other standards, such as the new Digital Audio Visual Council (DAVIC) standard, which will be incorporated in numerous digital television decoders, the ITU-R recommendation (BS 1115) for emission, contribution and distribution, the ETSI Standard on Digital Audio Broadcasting (ETS 300 401, February 1995), and the ITU-R draft recommendation for Digital Terrestrial Television Broadcast.
'Moving Pictures Expert Group' - so is MPEG just for video?
No, MPEG defines standards for the encoding of both audio and video information. In very simple terms the principle is the same for both: the information present is analysed to determine redundancy contained in the data which can be discarded without affecting the quality of the audio or video. However, how these processes are realised for audio and video is very different, due to the inherent difference in the structure of the source information.
Where is MPEG currently being used?
MPEG audio and video is used in a rapidly growing number of applications.
-MPEG2 is the video encoding standard for DVD video players worldwide and one of the audio encoding options. MPEG2 will also be used for DVD-ROM.
-An increasing number of broadcasting applications are based on MPEG technology.
-DSS (Digital Satellite System)
-DAB (Digital Audio Broadcast)
-DVB (Digital Video Broadcast)
-ADR (Astra Digital Radio)
-Satellite feeds to cable networks
-Many cable channels (national and international) are distributed within and between countries using MPEG2 video encoding.
-Current computer systems are already supporting MPEG1 audio and video. MPEG encoded files are available from a growing number of platforms, including CD ROM and the Internet.
-MPEG is used more and more on ISDN to provide very high quality audio and video. MPEG1 (audio and video) has also found many applications in CD-i, Video CD, hard disk recording and solid state audio.
Many manufacturers are now offering MPEG products, such as:
-dedicated encoding and decoding chipsets
-constant and variable bit rate encoders (audio and video)
-PC plug-in audio and video cards
-multichannel audio encoders and decoders
The variety of different MPEG product manufacturers allows content providers and consumer equipment manufacturers the freedom of choice to obtain the best solution for their specific application, in the safe knowledge that the format is an international standard.
What is MPEG2 video?
MPEG2 video is an ISO/IEC standard that specifies the syntax and semantics of an encoded video bitstream. These include parameters such as bit rates, picture sizes and resolutions that may be applied, and how the bitstream is decoded to reconstruct the picture. What MPEG2 does not define is how the decoder and encoder should be implemented, only that they should be compliant with the MPEG2 bitstream. This leaves designers free to develop the best encoding and decoding methods whilst retaining compatibility. The range of possibilities of the MPEG2 standard is so wide that not all features of the standard are used for all applications.
MPEG video is a worldwide standard - so all MPEG equipment is the same?
The MPEG video standard allows MPEG compatible equipment to inter-operate, because the bitstreams are standardised. However, the way the actual encoding process is implemented to generate the bitstream is up to the encoder designer. Therefore, all equipment will not necessarily produce the same quality video (at a given bit rate), there will be a range of products available, at different price levels, which the consumer can choose from to suit their own application.
What is the difference between MPEG1 video and MPEG2 video?
MPEG1 is standardised at a maximum bit rate of 1.856 Mbit/s (for CD based applications) and does not support fully interlaced video. It is, however, possible to use higher bit rates for increased video quality in systems that are not CD based (e.g. broadcast). MPEG2 video is an extension of the MPEG1 video standard which supports fully interlaced video. It has been proven to provide studio quality pictures at bit rates between 4 and 9 Mbit/s. Both MPEG1 and MPEG2 support additional layers which allow for the addition of other types of data along with the video data.
MPEG1 and MPEG2 are different in their throughput of information. MPEG1 is generally recognised to produce video with SIF resolution (352 x 240/288). MPEG2 handles CCIR-601 resolution (720 x 480/576) as well as the MPEG1 defined resolutions. Although MPEG1 can be used at higher resolutions, its greatly reduced bandwidth delivers lower quality video (30 fields/s). The amount of data processed by MPEG2 can be more than four times that of MPEG1, and MPEG2 is recognised as full 60 fields/s video. MPEG2 will be used most in applications where bandwidth is a lesser consideration than quality.
Is MPEG1 video compatible with MPEG2 video?
MPEG video is forwards compatible but not backwards compatible. This means that an MPEG2 decoder will correctly decode an MPEG1 bitstream to produce an MPEG1 quality picture, but an MPEG1 decoder cannot decode an MPEG2 bitstream. As a result, back catalogues of MPEG1 encoded material will not become obsolete when equipment fitted with MPEG2 decoders is installed.
What is interlaced video?
Video pictures can be represented in two different ways: 'progressive' and 'interlaced'. Progressive video, as used in movies and on computer screens, means the picture is constructed top down, line by line. Interlaced video, as used in a television set, displays the odd lines (the odd field) first, then the even lines (the even field). Each pair of fields forms a frame, and there are 50 (60)* of these fields displayed every second (or 25 (30)* frames every second). This is referred to as interlaced video. *PAL (NTSC).
Using interlaced video means that a picture can be encoded as a frame or a field. A frame of interlaced video consists of two fields which are samples of the full vertical image separated by the time of the field period, the lines of one image falling exactly between the lines of the other. The standard for displaying any sort of non-film video is 30 frames per second for NTSC systems, and 25 frames per second for PAL/SECAM systems. This simply means that the video is made up of 30 (or 25) pictures or 'frames' for every second of video. Additionally, these frames are split in half (odd lines and even lines), to form what are called 'fields'.
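The odd/even field split described above can be sketched in a few lines of Python. This is illustrative only: a real frame would be an array of pixel samples, not labelled strings, and the function name is our own.

```python
# Sketch: splitting an interlaced frame into its two fields.
# A "frame" here is just a list of scan lines (index 0 = top line).
def split_fields(frame):
    """Return (odd_field, even_field) using 1-based line numbering,
    as in the text: odd lines displayed first, then even lines."""
    odd_field = frame[0::2]   # lines 1, 3, 5, ... (index 0, 2, 4, ...)
    even_field = frame[1::2]  # lines 2, 4, 6, ...
    return odd_field, even_field

frame = ["line%d" % n for n in range(1, 7)]  # a tiny 6-line frame
odd, even = split_fields(frame)
```

Each field carries half the vertical resolution of the frame, which is why a frame of interlaced video is two time-separated samples of the full image.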
Film format video is progressive and has a frame rate of 24 frames per second. MPEG2 uses a technique called 3:2 pull down to convert film format to interlaced 25 or 30 frames per second.
Why encode as frames and fields?
MPEG2 video uses the benefits of interlaced video to further increase encoding efficiency and picture quality. Slow moving pictures are best encoded by combining the fields into a single frame and then encoding the frame. Where there is a large amount of fast movement, it is most efficient to encode each field separately. MPEG2 allows switching between the two modes on a block-by-block basis.
What is a profile?
Not all parts of the MPEG2 standard are used for MPEG2 video applications. Profiles provide a means of defining subsets of the syntax and semantics of the MPEG2 standard. Profiles are used to create a 'tool set' for a certain specific application. By taking the definitions of the MPEG2 bitstream, the profile is built up for the video encoding process. Profiles can be scalable or non-scalable. Scalable profiles are used to encode video for real time transmission links, because the decoder does not need to decode the whole bitstream to reconstruct the picture. The drawback of scalable profiles is the complexity of the encoding process. Non-scalable profiles are much less complicated, can produce higher quality pictures and are more suited to encoding for storage on fixed-capacity media (optical, magnetic). Profiles are coupled with various levels to completely define the MPEG2 video encoding/decoding behaviour.
What is a level?
A level defines, within the MPEG standard, physical parameters such as bit rates, picture sizes and resolutions. There are four levels specified by MPEG2: High level, High 1440, Main level and Low level. MPEG2 Video Main Profile and Main Level has sampling limits at CCIR 601 parameters (PAL and NTSC). Profiles limit syntax (i.e. algorithms), whereas levels limit encoding parameters (sample rates, frame dimensions, coded bit rates, buffer size etc.). Together, Video Main Profile and Main Level (abbreviated as MP@ML) keep complexity within current technical limits, yet still meet the needs of the majority of applications. MP@ML is the most widely accepted combination for most cable and satellite TV systems; however, different combinations are possible to suit other applications.
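As a rough orientation, the commonly quoted Main Profile limits for the four levels can be written as a small table. These figures are a sketch from memory of the widely cited values; the normative numbers are in ISO/IEC 13818-2.

```python
# Commonly quoted MPEG2 level limits for Main Profile (approximate
# summary; consult ISO/IEC 13818-2 for the normative figures).
LEVELS = {
    "Low":       {"max_size": (352, 288),   "max_bitrate_mbps": 4},
    "Main":      {"max_size": (720, 576),   "max_bitrate_mbps": 15},
    "High 1440": {"max_size": (1440, 1152), "max_bitrate_mbps": 60},
    "High":      {"max_size": (1920, 1152), "max_bitrate_mbps": 80},
}

# MP@ML, the cable/satellite workhorse, tops out at CCIR 601 sizes:
mp_at_ml = LEVELS["Main"]
```

The table makes the profile/level split concrete: the strings of the profile (which coding tools may appear) are orthogonal to these numeric ceilings.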
How is the video information actually encoded?
Encoding of video information is achieved by using two main techniques, termed spatial and temporal compression. Spatial compression involves analysis of a picture to determine redundant information within that picture, for example by discarding frequencies that are not visible to the human eye. Temporal compression is achieved by only encoding the difference between successive pictures. Imagine a scene where at first there is no movement, then an object moves across the picture. The first picture in the sequence contains all the information required until the movement occurs, so nothing after the first picture needs to be encoded until then. Thereafter, all that needs to be encoded is the part of the picture that contains movement. The rest of the scene is not affected by the moving object because it is still the same as the first picture. The means by which the amount of movement between two successive pictures is determined is known as motion estimation prediction. The information obtained from this process is then used by motion compensated prediction to define the parts of the picture that can be discarded. This means that pictures cannot be considered in isolation. A given picture is constructed from the prediction from a previous picture, and may itself be used to predict the next picture. There is also a need for pictures which are encoded without reference to any other picture, to allow random access. Therefore MPEG2 defines three picture types:
- I (Intraframe) pictures. These are encoded without reference to another picture, to allow for random access.
- P (Predictive) pictures are encoded using motion compensated prediction from the previous picture and therefore contain a reference to it. They may themselves be used in subsequent predictions.
- B (Bi-directional) pictures are encoded using motion compensated prediction from the previous and next pictures, which must be I or P pictures. B pictures are not used in subsequent predictions.
The I, P and B pictures can be formed into a group of pictures (GOP).
Each picture type (I, P, B) provides increased opportunity for removing redundancy. An I picture is encoded with little compression (only spatially redundant information is removed). P and B pictures also use motion compensation to remove temporally redundant information. B pictures offer the most compression. Spatial compression is achieved in practice by use of a DCT (Discrete Cosine Transform), which converts the information in the picture to be encoded into the frequency domain. This transform is used to remove redundant information within the picture itself, by removing frequencies with negligible amplitudes and rounding frequency coefficients to standard values. At higher frequencies, contrast is less perceptible to the human eye, so these undetectable frequencies can be removed. More compression can also be achieved by using a process called run length encoding. This is an operation that searches for regularly occurring patterns in the frequency information obtained from the DCT. If a pattern is detected, it can be replaced by a shorter representative pattern, providing even more compression efficiency.
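The spatial compression chain just described (DCT, coefficient rounding, run length encoding) can be sketched as follows. This is a deliberately naive illustration, not an MPEG2-compliant encoder: the quantisation step size and the block contents are invented for the example.

```python
import math

def dct_2d(block):
    """Naive 8x8 2-D DCT-II: converts pixel values to frequency
    coefficients, concentrating a smooth block's energy in few terms."""
    N = 8
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            cu = math.sqrt(0.5) if u == 0 else 1.0
            cv = math.sqrt(0.5) if v == 0 else 1.0
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = 0.25 * cu * cv * s
    return out

def quantise(coeffs, step=16):
    """Round coefficients to multiples of `step`; small high-frequency
    terms collapse to zero, which is where the compression comes from."""
    return [[round(c / step) for c in row] for row in coeffs]

def run_length(values):
    """Run-length encode a flat list as [value, count] pairs."""
    out = []
    for v in values:
        if out and out[-1][0] == v:
            out[-1][1] += 1
        else:
            out.append([v, 1])
    return out

# A flat (low-detail) block: almost all energy ends up in the DC term,
# so run length encoding reduces 64 coefficients to two pairs.
flat = [[128] * 8 for _ in range(8)]
q = quantise(dct_2d(flat))
flat_list = [c for row in q for c in row]
encoded = run_length(flat_list)
```

A detailed block would leave more non-zero coefficients, illustrating why a tree costs more bits than a clear blue sky.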
Motion compensated prediction is used to exploit redundant temporal information that is not changing from picture to picture. The images in a video stream do not generally change much within small time intervals. The idea of motion compensated prediction is to encode a video frame based on other video frames temporally close to it.
What is a group of pictures?
This is the grouping of I, B and P pictures into a specified sequence known as a group of pictures (GOP). The group must start with an I picture, to allow for random access to the group, and contains B and P pictures in a specified sequence (determined by the designer). A group can be made of different lengths to suit the type of video being encoded; for example, it is better to use a shorter group length for a film which contains a lot of fast moving action with complex scenes. A group length is typically between 8 and 24 pictures. Commonly used GOP sizes are 12 for 50 Hz systems and 16 for 60 Hz systems. GOPs are optional in an MPEG2 bitstream, but are mandatory in DVD video, to achieve an SMPTE timebase. A bitstream with no GOP header can be directly accessed at a specific point using the sequence header.
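As an illustration, a typical display-order GOP pattern can be generated like this. The P-picture spacing is an encoder design choice, not fixed by the standard; the helper name is our own.

```python
def make_gop(length=12, p_spacing=3):
    """Build a display-order GOP pattern such as 'IBBPBBPBBPBB':
    an I picture first, a P picture every `p_spacing` positions,
    and B pictures in between. Length 12 with spacing 3 is the
    common 50 Hz choice mentioned above."""
    pattern = []
    for i in range(length):
        if i == 0:
            pattern.append("I")
        elif i % p_spacing == 0:
            pattern.append("P")
        else:
            pattern.append("B")
    return "".join(pattern)

gop_50hz = make_gop()          # 12-picture GOP for 50 Hz systems
gop_fast = make_gop(length=8)  # shorter GOP for fast-moving material
```

A shorter GOP spends more bits on I pictures but gives more frequent random-access points, which is the trade-off the text describes.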
How does motion estimation prediction work?
Motion estimation prediction is a method of determining the amount of movement contained between two pictures. This is achieved by dividing the picture to be encoded into sections known as macroblocks. The size of a macroblock is 16 x 16 pixels. Each macroblock is searched for the closest match in the search area of the picture it is being compared with. Motion estimation prediction is not used on I pictures; however, B and P pictures can refer to I pictures. For P pictures, only the previous picture is searched for matching macroblocks. In B pictures, both the previous and next pictures are searched. When a match is found, the offset (or motion vector) between them is calculated. The matching parts are used to create a prediction picture, by using the motion vectors. The prediction picture is then compared in the same manner to the picture to be encoded. Macroblocks which have a match have already been encoded, and are therefore redundant. Macroblocks which have no match to any part of the search area in the picture to be encoded represent the difference between the pictures, and these macroblocks are encoded.
What is meant by a search area?
A search area is used in the motion compensated prediction process, to determine the area that the encoder searches in the previous picture for each macroblock. When the comparison is made, it can be on a pixel or half pixel basis. A half pixel search is more accurate and produces higher quality pictures, but is more time consuming. The MPEG2 video standard defines that the motion vectors must be transmitted in the half pixel format, even if the search was only pixel accurate. By interpolating adjacent pixels, a much more accurate motion prediction picture is obtained than by using individual pixels. There are many ways of defining the way in which macroblocks are compared in the search area. Three widely recognised methods are:
A full block motion estimation search, where macroblocks are compared across the entire search area to seek a matching macroblock. This process requires a large computational effort.
A telescopic motion estimation search, which reduces the search time by looking for a match initially in every fourth macroblock. When a near match is obtained, every second macroblock is searched, then every macroblock, until the search has 'homed in' on the best match.
A hierarchical motion estimation search, where before the search is made, the two pictures to be compared are filtered to reduce the search area by a factor of four. This is a common technique used in MPEG2 video encoders.
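The first of these, the full block search, is the easiest to sketch. The example below matches small 4 x 4 blocks with a sum-of-absolute-differences cost; real MPEG macroblocks are 16 x 16 and real encoders refine the result to half pixel accuracy, so treat this as an illustration of the principle only.

```python
def sad(block_a, block_b):
    """Sum of absolute differences: the usual block-matching cost."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def full_search(ref, target_block, top, left, radius, size=4):
    """Exhaustively search `ref` around (top, left) for the block that
    best matches `target_block`; return (motion_vector, best_cost).
    `ref` is a 2-D list of pixel values."""
    best = (None, float("inf"))
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > len(ref) or x + size > len(ref[0]):
                continue  # candidate block falls outside the picture
            candidate = [row[x:x + size] for row in ref[y:y + size]]
            cost = sad(candidate, target_block)
            if cost < best[1]:
                best = ((dy, dx), cost)
    return best

# Reference picture: a bright 4x4 square on a dark 12x12 background...
ref = [[0] * 12 for _ in range(12)]
for y in range(2, 6):
    for x in range(3, 7):
        ref[y][x] = 200
# ...and a target block taken where the square sits after moving
# down 1 and right 2 in the current picture.
target = [[200] * 4 for _ in range(4)]
mv, cost = full_search(ref, target, top=3, left=5, radius=4)
```

The returned motion vector points back to where the matching block lies in the reference picture; only the vector (and any residual difference) needs to be transmitted.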
How does a decoder reconstruct the picture?
Decoding the MPEG bitstream is essentially the reverse process to encoding. The spatial information is retrieved from the encoded bitstream by an inverse DCT and dequantizing procedure. This restores the original frequency coefficients (as far as the accuracy of the encoder quantizing process allows). The decoder reconstructs temporal information in the picture by using the transmitted macroblocks which were matched, to replace redundant macroblocks discarded during encoding. The position of the replaced macroblocks is obtained from the motion vectors, which are included in the MPEG bitstream. The decoder needs two memory stores: one to hold the previous picture and one to hold the next picture (to handle bi-directional pictures).
Can you use a variable bit rate for video encoding?
In any given video section, certain parts contain more movement or finer detail than others. For example, a clear blue sky is simpler to encode than a picture of a tree. As a result, the number of bits needed to encode faithfully, without artifacts, varies with the video material. In order to encode in the best possible way, it is advantageous to save bits from the simple sections and use them to encode complex ones. This is, in a simple way, what variable bit rate encoding does; however, the process by which the bit rates are calculated is complex. Variable bit rate encoding can be carried out in one or two passes of the video data. For fixed size storage applications such as DVD, the amount of encoded video information must be known in advance, therefore two passes of the video information are required. This ensures that the amount of data is not too small (quality compromised) or too large (not enough storage space). The first pass is used to analyse and store encoding information about the video data; the second pass uses this information to perform the actual encoding. Where the amount of encoded data produced is not so critical, encoding can be carried out in one pass of the input video.
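In outline, the second pass turns the first pass's complexity measurements into a bit budget. The proportional split below is a hypothetical, much simplified stand-in for what a real two-pass encoder does; the complexity scores and total are invented for the example.

```python
def two_pass_allocate(complexities, total_bits):
    """Sketch of two-pass bit allocation: pass 1 measured a complexity
    score per section; pass 2 hands each section a share of the fixed
    budget proportional to its complexity."""
    total_complexity = sum(complexities)
    return [total_bits * c / total_complexity for c in complexities]

# Pass 1 produced complexity scores for three sections:
# clear sky, a detailed tree, and fast action.
budget = two_pass_allocate([1, 4, 10], total_bits=15_000_000)
```

Because the overall total is fixed in advance, the encoded stream is guaranteed to fit the storage medium, which is exactly the DVD requirement described above.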
What are the advantages of using a variable bit rate?
The advantage of using a variable bit rate is mainly the gain it gives in encoding efficiency. For fixed storage media (e.g. DVD) the variable bit rate is ideal. By reducing the amount of space needed to store the video (whilst retaining very high quality), it leaves more space on the medium for inclusion of other features, e.g. multiple language soundtracks, extra subtitle channels, interactivity, etc. The other important feature of the variable bit rate system is that it gives constant video quality for all complexities of program material. A constant bit rate encoder provides variable quality.
VARIABLE BIT RATE = CONSTANT QUALITY
CONSTANT BIT RATE = VARIABLE QUALITY
It is possible to use a variable bit rate in, for example, a satellite broadcast system by using a technique called joint bit rate control. A satellite transmission has a fixed number of bits at its disposal, say 300 Mbit/s, which is termed the bit pool, to transmit its allocated number of channels. Each channel is transmitted in MPEG2 variable bit rate encoded form. For example, if channel 2 is at a section of low complexity, and therefore low bit rate, the system makes the spare bits available to channel 15, which happens to be in a fast moving, complex section of video that needs a large number of bits. In this way, the bit pool is dynamically shared between the channels, allocating the required bit rate to each individual channel.
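A minimal sketch of sharing the bit pool for one transmission interval, assuming each channel's encoder reports a complexity score for its current section (the scores, channel count and pool size here are invented for the example):

```python
def share_bit_pool(channel_complexities, pool_bits):
    """Hypothetical joint bit rate control for one interval: split the
    fixed bit pool between channels in proportion to how complex each
    channel's current section of video is."""
    total = sum(channel_complexities)
    return [pool_bits * c / total for c in channel_complexities]

# Three channels sharing a 30 Mbit pool this interval: channel 0 is a
# static scene, channel 2 is fast-moving sport.
rates = share_bit_pool([1, 2, 3], pool_bits=30_000_000)
```

The pool total never changes, so the multiplex stays within the transponder's fixed capacity while each channel's instantaneous rate varies.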
So why encode video with a fixed bit rate?
For some applications, it is necessary to transmit the encoded video information at a fixed bit rate. For example, in broadcast media (satellite, cable, terrestrial etc.), practical limitations mean that current transmission is restricted to using a fixed bit rate. This is why fixed bit rate MPEG2 encoders are available. It is true that a fixed bit rate encoder is not as efficient as the variable bit rate system; however, the MPEG2 system still provides very high quality video for both encoding methods. Very importantly, fixed bit rate encoding can also be carried out in real time, i.e. in one pass of the video information. For live broadcasts, satellite link-ups etc., the real time encoding capability is essential.
What video formats can MPEG2 handle?
MPEG2 can encode video for both PAL/SECAM (and PAL+) and NTSC formats. It also allows for different aspect ratios, i.e. 16:9 and 4:3. MPEG2 video includes a system to convert progressive film format video (24 frames/sec) to the interlaced 25 frames/sec of PAL or the 30 frames/sec of NTSC. This process is called 3:2 pull down. It is achieved by taking the progressive video sequence and repeating selected fields to increase the frame rate.
For example, in NTSC every other encoded frame will signal a repeat field: (24 frames/s × 2 fields/frame) × (5 display fields / 4 coded fields) = 60 fields/s = 30 display frames/s.
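The 3:2 repeat pattern itself can be sketched as follows. Frames are labelled with letters, and 't'/'b' mark top and bottom fields; the exact field ordering and cadence details vary between implementations, so this is an illustration of the ratio rather than a normative sequence.

```python
def pulldown_32(film_frames):
    """Sketch of 3:2 pull down: alternate film frames contribute 3 and
    2 display fields, turning 24 film frames/s into 60 NTSC fields/s
    (30 display frames/s). Each input frame is a label; 't'/'b' mark
    its top and bottom fields."""
    fields = []
    for i, frame in enumerate(film_frames):
        top, bottom = frame + "t", frame + "b"
        if i % 2 == 0:
            fields += [top, bottom, top]   # this frame yields 3 fields
        else:
            fields += [bottom, top]        # this frame yields 2 fields
    return fields

# Four film frames A..D become ten display fields (5 fields per
# 2 film frames, i.e. a 5:4 expansion).
out = pulldown_32(["A", "B", "C", "D"])
```

Running a full second of film (24 frames) through the same function yields 60 fields, matching the arithmetic in the paragraph above.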
A key feature of MPEG2 is the scalable extensions that allow the division of a continuous video signal into two or more coded bitstreams representing the video at different resolutions, picture qualities, or picture rates. So for applications such as HDTV, MPEG2 allows the broadcaster to broadcast both the HDTV and the normal resolution signals. The set top box would display the appropriate signal depending upon the receiver's television.
What picture sizes can MPEG video handle ?
MPEG2 defines a range of picture sizes to suit a range of different applications.
What about subtitles?
MPEG1 allows only open caption subtitles. The MPEG2 bitstream makes provision for up to 32 different closed caption subtitle channels in addition to the audio and video information. These subtitles can be used to provide 32 different language subtitle channels, one of which is selected for playback at the decoder using the MPEG system layer.
What are the applications of MPEG2 video?
Cable TV networks are using MPEG2 as the standard for compressing and decompressing video for distribution and for broadcasting. They want high quality video and have the bandwidth needed to handle high bit rates.
DBS (Direct Broadcast Satellite) will use MPEG2 video for direct broadcast. Multi-source channel rate control methods are employed to optimally allocate bits between several programs on one data carrier. An average of 150 channels is planned.
HDTV (High-Definition Television, also known as ATV). The U.S. Grand Alliance, a consortium of companies, has already agreed to use the MPEG2 Video and Systems syntax (including B-pictures). Interlaced and progressive modes will be supported.
The developers of DVD have defined in its specification that MPEG2 video is to be the video encoding standard. This is made possible by the greatly increased capacity of DVD (up to a maximum of 17 GB). DVD can also take advantage of the efficiency increases of variable bit rate encoding, not presently possible in broadcast systems. For DVD, it will also be possible to make playback interactive by including, for example, several different camera angles of the same scene which the viewer can switch between on playback. It is also possible to include several different storylines which the viewer can select between.
Video on Demand (VOD) encompasses nearly all video based applications, but the most common application referred to regarding VOD is movies on demand. Initially in hotels and hospitals, and eventually in our homes, all of us will have an interactive television system from which we can order which movie we want, when we want. The technology exists today for this application although VOD to the home is some time away from a large scale implementation. This application is also planning the use of MPEG2 video. However, VOD in hotels is well underway in many areas of the U.S. and around the world.