MPEG-1 Video Syntax


Some of the many parameters used by MPEG to specify and control the compression of a video sequence are described in this section in detail. Readers who are interested only in the general description of MPEG may skip this section. The concepts of video sequence, picture, slice, macroblock, and block have already been discussed. Figure 6.24 shows the format of the compressed MPEG stream and how it is organized in six layers. Optional parts are enclosed in dashed boxes. Notice that only the video sequence of the compressed stream is shown; the system parts are omitted.

The video sequence starts with a sequence header, followed by a group of pictures (GOP) and optionally by more GOPs. There may be other sequence headers followed by more GOPs, and the sequence ends with a sequence-end-code. The extra sequence headers may be included to help in random access playback or video editing, but most of the parameters in the extra sequence headers must remain unchanged from the first header. A group of pictures (GOP) starts with a GOP header, followed by one or more pictures. Each picture in a GOP starts with a picture header, followed by one or more slices. Each slice, in turn, consists of a slice header followed by one or more macroblocks of encoded, quantized DCT coefficients. A macroblock is a set of six 8×8 blocks, four blocks of luminance samples and two blocks of chrominance samples. Some blocks may be completely zero and may not be encoded. Each block is coded in either intra or nonintra mode. An intra block starts with the difference between its DC coefficient and the previous DC coefficient (of the same type), followed by run-level codes for the nonzero AC coefficients and the zero runs that precede them. The EOB code terminates the block. In a nonintra block, both DC and AC coefficients are run-level coded.
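The difference between intra and nonintra block coding can be illustrated with a short sketch. It assumes that the 64 quantized coefficients of a block are already arranged in zigzag order, and it produces only the (run, level) pairs and the DC difference, not the variable-length codes themselves. The function names are hypothetical and are chosen for this illustration only.

```python
def run_level_pairs(coefficients):
    """Convert a list of coefficients to (run, level) pairs.

    `run` is the number of zeros that precede the nonzero value `level`.
    Trailing zeros produce no pair; in the stream they are implied by EOB.
    """
    pairs = []
    run = 0
    for value in coefficients:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    return pairs


def code_intra_block(zigzag, previous_dc):
    """Intra block: DC is sent as a difference, AC as run-level pairs."""
    dc_difference = zigzag[0] - previous_dc
    return dc_difference, run_level_pairs(zigzag[1:])


def code_nonintra_block(zigzag):
    """Nonintra block: DC and AC coefficients are run-level coded together."""
    return run_level_pairs(zigzag)


# A block with four nonzero coefficients, padded with zeros to 64 entries.
block = [12, 5, 0, 0, -3, 0, 1] + [0] * 57
intra = code_intra_block(block, previous_dc=10)
nonintra = code_nonintra_block(block)
```

With the sample block, the intra coding yields a DC difference of 2 followed by three run-level pairs, while the nonintra coding yields four pairs, the first of which carries the DC coefficient.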

It should be mentioned that in addition to the I, P, and B picture types, there exists in MPEG a fourth type, a D picture (for DC coded). Such pictures contain only DC coefficient information; no run-level codes or EOB is included. However, D pictures are not allowed to be mixed with the other types of pictures, so they are rare and will not be discussed further.

The headers of a sequence, GOP, picture, and slice all start with a byte-aligned 32-bit start code. In addition to these video start codes there are other start codes for the system layer, user data, and error tagging. A start code begins with 23 zero bits, followed by a single bit of 1, followed by a unique byte. Table 6.25 lists all the video start codes. The “sequence_error” code is for cases where the encoder discovers unrecoverable errors in a video sequence and cannot encode it as a result. The run-level codes have variable lengths, so some zero bits normally have to be appended to the video stream before a start code, to make sure the code starts on a byte boundary.
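Because a start code is byte aligned and consists of 23 zeros and a 1, it appears in the stream as the three bytes 00 00 01 followed by the identifying byte. The following sketch scans a byte string for such prefixes and names each code. The function names are hypothetical, and the byte values in the table are the ones commonly quoted for the MPEG-1 video layer; they should be checked against Table 6.25.

```python
def classify_start_code(value):
    """Name the start code whose final byte is `value` (0..255)."""
    if value == 0x00:
        return "picture_start_code"
    if 0x01 <= value <= 0xAF:
        return "slice_start_code"
    names = {
        0xB2: "user_data_start_code",
        0xB3: "sequence_header_code",
        0xB4: "sequence_error_code",
        0xB5: "extension_start_code",
        0xB7: "sequence_end_code",
        0xB8: "group_start_code",
    }
    if value in names:
        return names[value]
    if value >= 0xB9:
        return "system start code"
    return "reserved"


def find_start_codes(data):
    """Return a list of (offset, name) for every 00 00 01 xx in `data`."""
    prefix = b"\x00\x00\x01"
    found = []
    pos = data.find(prefix)
    # Stop when no prefix is left or the identifying byte is missing.
    while pos != -1 and pos + 3 < len(data):
        found.append((pos, classify_start_code(data[pos + 3])))
        pos = data.find(prefix, pos + 3)
    return found


# A toy stream: sequence header, GOP, picture, slice, sequence end,
# separated by filler bytes that stand in for the real payload.
stream = bytes.fromhex(
    "000001b3 ffff 000001b8 ff 00000100 ff 00000101 ff 000001b7"
)
codes = find_start_codes(stream)
```

A real decoder would parse the payload that follows each code instead of skipping it; the scan shown here is the kind of resynchronization step that the byte alignment of start codes makes possible.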

Video Sequence Layer: This starts with start code 000001B3, followed by nine fixed-length data elements. The parameters horizontal_size and vertical_size are 12-bit parameters that define the width and height of the picture. Neither is allowed to be zero, and vertical_size must be even. Parameter pel_aspect_ratio is a 4-bit parameter that specifies the aspect ratio of a pel. Its 16 values are listed in Table 6.26. Parameter picture_rate is a 4-bit parameter that specifies one of 16 picture refresh rates.
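The four parameters named above occupy exactly 32 bits (12 + 12 + 4 + 4), so they can be extracted from the four bytes that follow the start code. The sketch below does only that; it ignores the remaining elements of the sequence header, and the function name is hypothetical.

```python
def parse_sequence_header_prefix(data):
    """Extract the first four parameters of a sequence header.

    `data` must begin with the start code 00 00 01 B3.
    """
    if len(data) < 8 or data[:4] != b"\x00\x00\x01\xb3":
        raise ValueError("not a sequence header")
    word = int.from_bytes(data[4:8], "big")
    header = {
        "horizontal_size": (word >> 20) & 0xFFF,   # 12 bits
        "vertical_size": (word >> 8) & 0xFFF,      # 12 bits
        "pel_aspect_ratio": (word >> 4) & 0xF,     # 4-bit table index
        "picture_rate": word & 0xF,                # 4-bit table index
    }
    if header["horizontal_size"] == 0 or header["vertical_size"] == 0:
        raise ValueError("picture dimensions must be nonzero")
    return header


# 352 x 240 pels, aspect-ratio index 1, picture-rate index 4.
sequence_header = parse_sequence_header_prefix(
    bytes.fromhex("000001b3 1600f014")
)
```

Note that pel_aspect_ratio and picture_rate are returned as raw 4-bit indexes; translating them to actual ratios and rates requires the corresponding tables.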

GOP Layer: This layer starts with nine mandatory elements, optionally followed by extensions and user data, and by the (compressed) pictures themselves. The 32-bit group start code 000001B8 is followed by the 25-bit time_code, which consists of the following six data elements: drop_frame_flag (1 bit), which is zero unless the picture rate is 29.97 Hz; time_code_hours (5 bits, in the range [0, 23]); time_code_minutes (6 bits, in the range [0, 59]); a marker_bit (1 bit, always 1); time_code_seconds (6 bits, in the range [0, 59]); and time_code_pictures (6 bits, in the range [0, 59]). The hours, minutes, and seconds elements indicate the time interval from the start of the sequence to the display of the first picture in the GOP, and time_code_pictures is the number of that picture within the current second. Following the time_code there are two 1-bit parameters. The flag closed_gop is set if the GOP is closed (i.e., its pictures can be decoded without reference to pictures from outside the group). The broken_link flag is set to 1 if editing has disrupted the original sequence of groups of pictures.
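The 25-bit time_code and the two flags that follow it add up to 27 bits, which fit in the four bytes after the group start code. The following sketch unpacks them in stream order. The helper split_fields and the function name parse_gop_header are hypothetical names used for this illustration.

```python
def split_fields(value, total_bits, widths):
    """Split `value` into fields of the given widths, leftmost field first."""
    fields = []
    remaining = total_bits
    for width in widths:
        remaining -= width
        fields.append((value >> remaining) & ((1 << width) - 1))
    return fields


def parse_gop_header(data):
    """Unpack time_code, closed_gop, and broken_link from a GOP header."""
    if len(data) < 8 or data[:4] != b"\x00\x00\x01\xb8":
        raise ValueError("not a GOP header")
    # Keep the first 27 of the 32 bits that follow the start code.
    bits = int.from_bytes(data[4:8], "big") >> 5
    names = ["drop_frame_flag", "time_code_hours", "time_code_minutes",
             "marker_bit", "time_code_seconds", "time_code_pictures",
             "closed_gop", "broken_link"]
    widths = [1, 5, 6, 1, 6, 6, 1, 1]
    return dict(zip(names, split_fields(bits, 27, widths)))


# Time code 01:02:03, picture 4, closed GOP, no broken link.
gop = parse_gop_header(bytes.fromhex("000001b8 04286240"))
```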

Picture Layer: Parameters in this layer specify the type of the picture (I, P, B, or D) and the motion vectors for the picture. The layer starts with the 32-bit picture_start_code, whose hexadecimal value is 00000100. It is followed by a 10-bit temporal_reference parameter, which is the picture number (modulo 1024) in the sequence. The next parameter is the 3-bit picture_coding_type (Table 6.29), and this is followed by the 16-bit vbv_delay that tells the decoder how many bits must be in the compressed data buffer before the picture can be decoded. This parameter helps prevent buffer overflow and underflow. If the picture type is P or B, then this is followed by the forward motion vectors scale information, a 3-bit parameter called forward_f_code (see Table 6.34). For B pictures, there follows the backward motion vectors scale information, a 3-bit parameter called backward_f_code.
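The conditional structure of the picture header can be sketched as follows. Two assumptions go beyond the description above and should be verified against the standard: the picture_coding_type values are taken to be 1, 2, 3, and 4 for I, P, B, and D pictures, and each f_code is assumed to be preceded by a 1-bit flag (full_pel_forward_vector or full_pel_backward_vector) that is not discussed in the text. The function name is hypothetical.

```python
PICTURE_TYPES = {1: "I", 2: "P", 3: "B", 4: "D"}  # assumed code values


def parse_picture_header(data):
    """Unpack the fixed and conditional fields of a picture header."""
    if len(data) < 9 or data[:4] != b"\x00\x00\x01\x00":
        raise ValueError("not a picture header")
    bits = int.from_bytes(data[4:9], "big")  # 40 bits, at most 37 are used
    remaining = 40

    def take(width):
        """Return the next `width` bits, reading from the left."""
        nonlocal remaining
        remaining -= width
        return (bits >> remaining) & ((1 << width) - 1)

    header = {
        "temporal_reference": take(10),
        "picture_coding_type": take(3),
        "vbv_delay": take(16),
    }
    kind = PICTURE_TYPES.get(header["picture_coding_type"])
    if kind in ("P", "B"):
        header["full_pel_forward_vector"] = take(1)   # assumed 1-bit flag
        header["forward_f_code"] = take(3)
    if kind == "B":
        header["full_pel_backward_vector"] = take(1)  # assumed 1-bit flag
        header["backward_f_code"] = take(3)
    return header


b_picture = parse_picture_header(bytes.fromhex("00000100 015ffff908"))
i_picture = parse_picture_header(bytes.fromhex("00000100 0008000000"))
```

The first example describes a B picture with temporal_reference 5 and both f_codes present; the second is an I picture, for which no motion vector scale information is read.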

Slice Layer: There can be many slices in a picture, so the start code of a slice ends with a value in the range [1, 175]. This value defines the macroblock row where the slice starts (a picture can therefore have up to 175 rows of macroblocks). The horizontal position where the slice starts in that macroblock row is determined by other parameters. The quantizer_scale (5 bits) initializes the quantizer scale factor, discussed earlier in connection with the rounding of the quantized DCT coefficients. The extra_bit_slice flag following it is always 0 (the value of 1 is reserved for future ISO standards). Following this, the encoded macroblocks are written.
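A minimal sketch of reading a slice header follows. It assumes that the last byte of the start code counts macroblock rows from 1, so that the zero-based row is that value minus 1, and that quantizer_scale and extra_bit_slice are the first six bits after the start code, as described above. The function name and the key macroblock_row are hypothetical.

```python
def parse_slice_header(data):
    """Read the macroblock row and quantizer scale of a slice."""
    if len(data) < 5 or data[:3] != b"\x00\x00\x01":
        raise ValueError("not a start code")
    vertical_position = data[3]
    if not 1 <= vertical_position <= 175:
        raise ValueError("not a slice start code")
    return {
        "macroblock_row": vertical_position - 1,   # zero-based (assumption)
        "quantizer_scale": data[4] >> 3,           # top 5 bits
        "extra_bit_slice": (data[4] >> 2) & 1,     # next bit, expected 0
    }


# A slice that starts in the third macroblock row with quantizer scale 8.
slice_header = parse_slice_header(bytes.fromhex("00000103 40"))
```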

Macroblock Layer: This layer identifies the position of the macroblock relative to the position of the previous macroblock. It codes the motion vectors for the macroblock, and identifies the zero and nonzero blocks in the macroblock. Each macroblock has an address, or index, in the picture. Index values start at 0 in the upper-left corner of the picture and continue in raster order. When the encoder starts encoding a new picture, it sets the macroblock address to -1. The macroblock_address_increment parameter contains the amount needed to increment the macroblock address in order to reach the macroblock being coded. This parameter is normally 1. If macroblock_address_increment is greater than 33, it is encoded as a sequence of macroblock_escape codes, each incrementing the macroblock address by 33, followed by the code for the remaining increment.
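The address arithmetic can be sketched as follows. The sketch assumes that the escape codes are followed by one ordinary increment code with a value in the range [1, 33], and it works with the numeric values only, not with the variable-length codes. Both function names are hypothetical.

```python
def split_address_increment(increment):
    """Return (number of macroblock_escape codes, final increment in 1..33)."""
    if increment < 1:
        raise ValueError("increment must be at least 1")
    escapes, remainder = divmod(increment - 1, 33)
    return escapes, remainder + 1


def macroblock_addresses(increments, start=-1):
    """Accumulate increments into absolute macroblock addresses.

    `start` is the address before the first macroblock (-1 for a new picture).
    """
    addresses = []
    address = start
    for increment in increments:
        address += increment
        addresses.append(address)
    return addresses


escape_examples = [split_address_increment(n) for n in (1, 33, 34, 70)]
addresses = macroblock_addresses([1, 1, 35])
```

For example, an increment of 70 is split into two escapes (66) and a final increment of 4, and the increments 1, 1, 35 at the start of a picture lead to macroblock addresses 0, 1, and 36.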

Block Layer: This layer is the lowest in the video sequence. It contains the encoded 8×8 blocks of quantized DCT coefficients. The coding depends on whether the block contains luminance or chrominance samples and on whether the macroblock is intra or nonintra. In nonintra coding, blocks that are completely zero are skipped; they don’t have to be encoded. The macroblock_intra flag gets its value from macroblock_type. If it is set, the DC coefficient of the block is coded separately from the AC coefficients.