High Efficiency Video Coding

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by 134.134.139.72 (talk) at 13:30, 15 July 2012 (removed lower bound (there is none) and spammy links). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

High Efficiency Video Coding (HEVC), also known as H.265 and MPEG-H Part 2, is a draft video compression standard, a successor to H.264/MPEG-4 AVC (Advanced Video Coding), currently under joint development by the ISO/IEC Moving Picture Experts Group (MPEG) and ITU-T Video Coding Experts Group (VCEG). MPEG and VCEG have established a Joint Collaborative Team on Video Coding (JCT-VC) to develop the HEVC standard.[1] HEVC is said to improve video quality and double the data compression ratio compared to H.264, and supports resolutions up to 7680 × 4320 pixels.[2]

History

The ITU-T Video Coding Experts Group (VCEG) began significant study of technology advances that could enable creation of a new video compression standard (or substantial compression-oriented enhancements of the H.264/MPEG-4 AVC standard) in about 2004. Various techniques for potential enhancement of the H.264/MPEG-4 AVC standard were surveyed in October 2004. At the next meeting of VCEG, in January 2005, VCEG began designating certain topics as "Key Technical Areas" (KTA) for further investigation. A software codebase called the KTA codebase was established for evaluating such proposals in 2005.[3] The KTA software was based on the Joint Model (JM) reference software that was developed by the MPEG & VCEG Joint Video Team for H.264/MPEG-4 AVC. Additional proposed technologies were integrated into the KTA software and tested in experiment evaluations over the next four years.[4]

Two approaches for standardizing enhanced compression technology were considered: either creating a new standard or creating extensions of H.264/MPEG-4 AVC. The project had tentative names H.265 and H.NGVC (Next-generation Video Coding), and was a major part of the work of VCEG until its evolution into the HEVC joint project with MPEG in 2010. The "H.265" nickname was especially associated with the potential creation of a new standard.

The preliminary requirements for NGVC were a bit rate reduction of 50% at the same subjective image quality compared to the H.264/MPEG-4 AVC High profile, with computational complexity ranging from 1/2 to 3 times that of the High profile. NGVC would be able to provide a 25% bit rate reduction along with a 50% reduction in complexity at the same perceived video quality as the High profile, or to provide greater bit rate reduction with somewhat higher complexity.[5]

"H.265" was used as a nickname for an entirely new standard, as was the "High-performance Video Coding" work by the ISO/IEC Moving Picture Experts Group (MPEG). Although some agreements about the goals of the project had been reached by early 2009, e.g. computational efficiency and high compression performance,[6] the state of technology at the time seemed not yet mature for creation of an entirely new "H.265" standard, as all contributions were essentially modifications closely based on the H.264/MPEG-4 AVC design.

The ISO/IEC Moving Picture Experts Group (MPEG) started a similar project in 2007, tentatively named High-performance Video Coding. Early evaluations were performed with modifications of the KTA reference software encoder developed by VCEG. By July 2009, experimental results showed an average bit rate reduction of around 20% compared with AVC High Profile; these results prompted MPEG to initiate its standardization effort in collaboration with VCEG.

A formal joint Call for Proposals (CfP) on video compression technology was issued in January 2010 by VCEG and MPEG, and proposals were evaluated at the first meeting of the MPEG & VCEG Joint Collaborative Team on Video Coding (JCT-VC), which took place in April 2010. A total of 27 full proposals were submitted. Evaluations showed that some proposals could reach the same visual quality as AVC at only half the bit rate in many of the test cases, at the cost of 2×-10× increase in computational complexity; and some proposals achieved good subjective quality and bit rate results with lower computational complexity than the reference AVC High profile encodings. At that meeting, the name High Efficiency Video Coding (HEVC) was adopted for the joint project. Starting at that meeting, the JCT-VC integrated features of some of the best proposals into a single software codebase and a draft standard text specification, and performed further experiments to evaluate various proposed features.[7]

Schedule

The timescale for completing the HEVC standard is as follows:

  • February 2012: Committee Draft (complete draft of standard)
  • July 2012: Draft International Standard
  • January 2013: Final Draft International Standard (ready to be ratified as a Standard)

Features

HEVC aims to substantially improve coding efficiency compared to AVC High Profile, i.e. to reduce bitrate requirements by half with comparable image quality, at the expense of increased computational complexity. Depending on the application requirements, HEVC should be able to trade off computational complexity, compression rate, robustness to errors, and processing delay time.

HEVC is targeted at next-generation HDTV displays and content capture systems which feature progressive-scan frame rates and display resolutions from QVGA (320×240) up to 1080p (1920×1080) and 4320p (7680×4320), as well as improved picture quality in terms of noise level, color gamut, and dynamic range.[8][5][9][10]

HEVC replaces macroblocks with a flexible scheme based on coding units (CUs), variable-size structures which sub-partition the picture into rectangular regions. Each CU contains variable-block-size prediction units (PUs) of either intra-picture or inter-picture prediction type, and transform units (TUs) which contain coefficients for spatial block transform and quantization.[11]

Coding tools

Prediction block size

Instead of macroblocks, HEVC divides the picture into coding tree blocks, which are 64x64, 32x32 or 16x16 pixel regions. Each coding tree block is either a single coding unit (CU) or is hierarchically subdivided into smaller CUs, all the way down to 8x8. The arrangement of differently sized CUs within a coding tree block is known as a quadtree, since each subdivision results in four smaller regions.
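
The quadtree subdivision can be sketched as a recursive split. This is a minimal Python illustration rather than the draft's syntax: the function name split_ctb and the should_split callback (a stand-in for the encoder's decision) are hypothetical, and only the 8x8 minimum size is taken from the text.

```python
def split_ctb(x, y, size, should_split, min_size=8):
    """Return the leaf coding units of one coding tree block as (x, y, size).

    should_split(x, y, size) is a hypothetical stand-in for the encoder's
    decision; the 8x8 minimum follows the text above.
    """
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves = []
        # A split always yields four equally sized quadrants.
        for dy in (0, half):
            for dx in (0, half):
                leaves += split_ctb(x + dx, y + dy, half, should_split, min_size)
        return leaves
    return [(x, y, size)]


# Example: split the 64x64 block once, then split only its top-left 32x32 again.
def decide(x, y, size):
    return size == 64 or (size == 32 and (x, y) == (0, 0))


cus = split_ctb(0, 0, 64, decide)
print(cus)
```

In this example the block ends up as four 16x16 CUs and three 32x32 CUs, which together still cover all 64x64 samples.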

Higher bit depth

Internal bit depth increase (IBDI) allows the encoding of images that have a pixel bit-depth that is higher than 8.

Parallel processing tools

  • Tiles. The picture can be divided into a grid of rectangular regions that can be decoded independently. The start of each tile is signaled in the bitstream, allowing for multi-threaded decoding.
  • Wavefront parallel processing (WPP). Each row of Coding Tree Blocks can be decoded by a separate thread (as long as it does not get ahead of the row above).
  • Slices. Similar to AVC.
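
The wavefront ordering can be illustrated with a small scheduling sketch. It assumes, beyond what the text states, that every Coding Tree Block takes one time unit and that a block may start only once the block one column to its right in the row above has finished; the function name and the lag parameter are hypothetical.

```python
def wavefront_schedule(rows, cols, lag=1):
    """Finish time of every CTB when each row has its own thread.

    Assumptions (not from the text): each CTB takes one time unit, and a CTB
    may start only once the CTB `lag` columns to its right in the row above
    has finished.
    """
    finish = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            left = finish[r][c - 1] if c > 0 else 0
            above = finish[r - 1][min(c + lag, cols - 1)] if r > 0 else 0
            finish[r][c] = max(left, above) + 1
    return finish


schedule = wavefront_schedule(rows=3, cols=4)
for row in schedule:
    print(row)
# [1, 2, 3, 4]
# [3, 4, 5, 6]
# [5, 6, 7, 8]
```

Under these assumptions the 3x4 picture finishes after 8 time units instead of the 12 a single thread would need, and each row stays behind the row above it.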

Entropy Coding

HEVC specifies a context-adaptive binary arithmetic coding (CABAC) algorithm that is fundamentally very similar to AVC's CABAC, but great care has been taken in the design to ensure that hardware decoder implementations are simpler and much easier to speed up. In particular, context selection of any given bin never depends on the value of the preceding bin, and bins coded in bypass mode are grouped together as much as possible to allow higher decode speeds. Compared to AVC, there are far fewer context states, and more bins end up being coded in bypass mode.

Intra Prediction

HEVC specifies 33 different intra prediction directions as well as a Planar and a DC mode, for a total of 35 modes. In addition to these 35 modes, chroma can be predicted by deriving a best-fit linear transform (scale and offset) between the chroma and luma prediction arrays, then applying this transform to the reconstructed luma samples. This tool, which predicts chroma from luma, is known as LM mode.
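
The best-fit scale and offset used by LM mode can be illustrated with an ordinary least-squares fit. The sketch below uses floating-point arithmetic and hypothetical neighbouring sample values; which samples the draft actually uses and its integer arithmetic are not reproduced here, and the function names are illustrative only.

```python
def fit_linear_model(luma, chroma):
    """Least-squares fit chroma ~ scale * luma + offset over paired samples."""
    n = len(luma)
    mean_l = sum(luma) / n
    mean_c = sum(chroma) / n
    var_l = sum((l - mean_l) ** 2 for l in luma)
    if var_l == 0:
        return 0.0, mean_c  # flat luma: fall back to the chroma mean
    cov = sum((l - mean_l) * (c - mean_c) for l, c in zip(luma, chroma))
    scale = cov / var_l
    return scale, mean_c - scale * mean_l


def predict_chroma(recon_luma, scale, offset):
    """Apply the fitted model to reconstructed luma samples, clipped to 8 bits."""
    return [min(255, max(0, round(scale * l + offset))) for l in recon_luma]


# Hypothetical sample pairs that follow chroma = 0.5 * luma + 20 exactly.
neigh_luma = [40, 80, 120, 160]
neigh_chroma = [40, 60, 80, 100]
scale, offset = fit_linear_model(neigh_luma, neigh_chroma)
print(scale, offset)                                # 0.5 20.0
print(predict_chroma([100, 200], scale, offset))    # [70, 120]
```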

Motion Compensation

Luma is filtered to quarter-pel accuracy with a high precision 8-tap filter. Chroma is filtered with an eighth-pel 4-tap filter. A motion compensated region can be either single or bidirectionally interpolated (one or two motion vectors and reference frames). Each direction can be individually weighted.
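
As an illustration of 8-tap luma interpolation, the sketch below computes half-sample positions along one row. The tap values are an assumption taken from the later published HEVC standard rather than from this text, and the border handling (repeating the edge sample) and the function name are also illustrative choices.

```python
# Assumed half-sample luma taps (they sum to 64), taken from the published
# HEVC standard rather than from the text above.
HALF_PEL_TAPS = (-1, 4, -11, 40, 40, -11, 4, -1)


def interpolate_half_pel(samples):
    """Half-sample values between neighbouring entries of a 1-D luma row.

    Edges are handled by repeating the border sample (an assumption).
    """
    n = len(samples)
    out = []
    for i in range(n - 1):  # position halfway between samples[i] and samples[i + 1]
        acc = 0
        for k, tap in enumerate(HALF_PEL_TAPS):
            idx = min(n - 1, max(0, i - 3 + k))
            acc += tap * samples[idx]
        out.append(min(255, max(0, (acc + 32) >> 6)))  # round, divide by 64, clip
    return out


ramp = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
print(interpolate_half_pel(ramp)[4])   # 45, halfway between 40 and 50
```

Because the taps sum to 64, a flat row is reproduced unchanged, and on a linear ramp the interior half-sample values fall on the midpoints.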

Motion Vector Prediction

Two methods exist:

  • Advanced Motion Vector Prediction (AMVP). A list of two predictors is constructed using neighboring predictors, temporal predictors and possibly zero predictors. An index into this list is signaled, as well as a motion vector delta.
  • Merge mode. A list of up to five predictors is constructed using neighboring and temporal predictors, followed by various bidirectional combinations of these and finally zero vectors. An index into this list is signaled, but no motion vector delta is used.
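
The difference between the two methods on the decoder side can be sketched as follows. The candidate lists are hypothetical values in quarter-sample units; how the lists are built from neighbouring and temporal blocks is omitted, and real merge candidates would also carry reference picture information, which this simplification leaves out.

```python
def decode_amvp(candidates, index, delta):
    """AMVP: the signalled index picks a predictor, then the delta is added."""
    px, py = candidates[index]
    return (px + delta[0], py + delta[1])


def decode_merge(candidates, index):
    """Merge mode: the selected candidate is used as-is, with no delta."""
    return candidates[index]


# Hypothetical candidate lists (motion vectors in quarter-sample units).
amvp_list = [(4, -2), (0, 0)]                             # two predictors
merge_list = [(4, -2), (3, -2), (8, 1), (0, 0), (0, 0)]   # up to five candidates

print(decode_amvp(amvp_list, 0, (1, 3)))   # (5, 1)
print(decode_merge(merge_list, 2))         # (8, 1)
```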

Inverse Transforms

Four different transform sizes are specified: 4x4, 8x8, 16x16 and 32x32. Additionally, the following non-square transforms are available: 32x8, 8x32, 16x4 and 4x16. The transform is an integer approximation of the DCT. Additionally, 4x4 luma blocks that belong to an intra coded region are transformed using the Discrete Sine Transform (DST). The transform of a 4x4 block can optionally be skipped entirely, a useful option for screen content, e.g. GUIs, which often have many thin straight lines.
Chroma is transformed the same way, so there is no 2x2 transform for chroma. Unlike AVC, columns are transformed first, followed by rows. A coding unit can be hierarchically split (quadtree) all the way down to 4x4 regions. For example, a 16x16 coding unit can be transformed with one 16x16 transform unit (TU), or four 8x8 TUs. One of those 8x8 TUs could be split into four 4x4 TUs.
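
The column-then-row order and the idea of an integer DCT approximation can be illustrated on a 4x4 block. The matrix below is an assumption taken from the later published HEVC standard, not from this text, and the scaling is done in floating point instead of the shifts and rounding a real decoder would use; all function names are illustrative.

```python
# Assumed 4x4 integer DCT approximation, taken from the published HEVC
# standard rather than from the text above.
M = [
    [64,  64,  64,  64],
    [83,  36, -36, -83],
    [64, -64, -64,  64],
    [36, -83,  83, -36],
]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def forward(block):
    """Unscaled forward transform, used here only to produce test coefficients."""
    return matmul(matmul(M, block), transpose(M))


def inverse(coeffs):
    """Inverse transform: columns first, then rows, with floating-point scaling."""
    after_columns = matmul(transpose(M), coeffs)   # 1-D transform of each column
    after_rows = matmul(after_columns, M)          # 1-D transform of each row
    scale = 16384 ** 2    # each row of M has a squared norm of about 128**2
    return [[v / scale for v in row] for row in after_rows]


block = [[10, 20, 30, 40], [50, 60, 70, 80],
         [90, 100, 110, 120], [130, 140, 150, 160]]
recon = inverse(forward(block))
print([[round(v) for v in row] for row in recon])
```

Since the integer rows are only approximately orthonormal, the round trip is close to, but not exactly, the identity before rounding.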

Loop Filters

There are three loop filters, applied in the following order:

  • De-blocking filter. While similar to AVC's de-blocking filter, HEVC only filters edges that are on the 8x8 grid, removing many pixel processing dependencies and thus allowing simpler hardware designs and parallel processing. Unlike AVC, all vertical edges of the whole picture are filtered first, then followed by the horizontal edges. As of July 2012, the draft only specifies three boundary strengths: 2, 1 and 0.
  • Sample Adaptive Offset (SAO). This filter is a per-pixel operation that has two basic modes. The first mode is Edge Offset, of which there are 4 variations. It operates by comparing the value of a pixel to two of its eight neighbors (depending on the mode variation). Based on the magnitude differences, one of four possible offsets is added to the pixel. The second mode is referred to as Band Offset, where pixels within a certain magnitude range get one of four offsets added. The filter mode and four offsets are chosen by the encoder for each Coding Tree Block in an attempt to get the pixels in that block to more closely match the source material.
  • Adaptive Loop Filter (ALF). Also a per-pixel filter, ALF is essentially a simplified Wiener filter which can reduce the noise caused by the encoding process. It is a computationally expensive operation, requiring up to 10 multiplications per pixel.
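
The Edge Offset mode of SAO can be sketched for one of its variations, here the horizontal neighbour pair. The category numbering and the exact comparison rules are assumptions, since the text only says that a pixel is compared with two neighbours and receives one of four offsets; the function names and offset values are hypothetical.

```python
def edge_category(left, cur, right):
    """Classify a sample against two neighbours (category numbers are assumed)."""
    if cur < left and cur < right:
        return 1  # local minimum
    if (cur < left and cur == right) or (cur == left and cur < right):
        return 2  # lower than one neighbour, equal to the other
    if (cur > left and cur == right) or (cur == left and cur > right):
        return 3  # higher than one neighbour, equal to the other
    if cur > left and cur > right:
        return 4  # local maximum
    return 0      # flat or monotonic: no offset


def sao_edge_offset_row(row, offsets):
    """Apply edge offsets along one row using the horizontal neighbour pair.

    offsets maps categories 1..4 to the four encoder-chosen values; border
    samples are left unchanged in this sketch.
    """
    out = list(row)
    for i in range(1, len(row) - 1):
        cat = edge_category(row[i - 1], row[i], row[i + 1])
        if cat:
            out[i] = min(255, max(0, row[i] + offsets[cat]))
    return out


row = [50, 48, 50, 50, 53, 50]
print(sao_edge_offset_row(row, {1: 2, 2: 1, 3: -1, 4: -2}))
# [50, 50, 49, 51, 51, 50]
```

Note that the classification always reads the unfiltered input row, so the result does not depend on the order in which samples are processed.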

Interlaced support

HEVC only supports the coding of interlaced pictures at a higher level. Unlike AVC, no interlace-specific tools are present. An interlaced video sequence could be coded by coding each field as a separate picture.
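
Coding each field as a separate picture amounts to de-interleaving the rows of a frame before encoding. A minimal sketch, with a hypothetical helper name and a frame represented as a list of rows:

```python
def split_fields(frame):
    """Split an interlaced frame (a list of rows) into top and bottom fields."""
    top = frame[0::2]      # even-numbered rows
    bottom = frame[1::2]   # odd-numbered rows
    return top, bottom


frame = [[r] * 4 for r in range(6)]   # six rows, each filled with its row index
top, bottom = split_fields(frame)
print([row[0] for row in top])      # [0, 2, 4]
print([row[0] for row in bottom])   # [1, 3, 5]
```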

Profiles

The July 2012 draft of the standard[12] includes a single profile, Main, which is similar to the Progressive High profile in H.264 AVC. The draft standard contains provisions for future extensions similar to Scalable Video Coding and Multiview Video Coding defined in H.264 AVC. Notable constraints specified by Main Profile include:

  • Bit-depth restricted to 8 bits per pixel.
  • Decoded picture buffer size restricted to 6 pictures.
  • LM mode disallowed.
  • ALF disabled.
  • Non-square transforms not allowed.
  • Wavefront and tiles are allowed but not required. If present, tiles must be at least 384 pixels wide.
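
The constraints listed above can be collected into a simple conformance check. Both the StreamParams structure and the function name are hypothetical; they only restate the listed constraints and do not correspond to any syntax in the draft.

```python
from dataclasses import dataclass


@dataclass
class StreamParams:
    """Hypothetical summary of a stream's coding parameters."""
    bit_depth: int = 8
    dpb_pictures: int = 6
    uses_lm: bool = False
    uses_alf: bool = False
    uses_non_square_transforms: bool = False
    tile_widths: tuple = ()   # widths of tile columns in pixels; empty if unused


def main_profile_violations(p):
    """List which of the Main profile constraints named above are violated."""
    problems = []
    if p.bit_depth != 8:
        problems.append("bit depth must be 8")
    if p.dpb_pictures > 6:
        problems.append("decoded picture buffer exceeds 6 pictures")
    if p.uses_lm:
        problems.append("LM mode is disallowed")
    if p.uses_alf:
        problems.append("ALF is disabled")
    if p.uses_non_square_transforms:
        problems.append("non-square transforms are not allowed")
    if any(w < 384 for w in p.tile_widths):
        problems.append("tiles must be at least 384 pixels wide")
    return problems


print(main_profile_violations(StreamParams()))
print(main_profile_violations(StreamParams(bit_depth=10, tile_widths=(256, 512))))
```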

Levels

The February 2012 draft standard defines sixteen Levels, which are a set of constraints for required decoder performance. The levels retain the basic structure of H.264 AVC.

Levels with maximum property values

Level  Max luma pixel rate  Max picture buffer size  Max bit rate  Example picture resolution @ picture rate
       (samples/s)          (samples)                (kbit/s)      (max stored pictures)
1      552,960              36,864                   128           128×96@33.6 (6)
                                                                   176×144@15.0 (6)
2      3,686,400            122,880                  1,000         320×240@45.0 (6)
                                                                   352×288@30.0 (6)
3      13,762,560           458,752                  5,000         352×480@70.0 (6)
                                                                   352×576@62.2 (6)
                                                                   720×480@35.0 (6)
                                                                   720×576@31.1 (6)
                                                                   854×480@30.0 (6)
3.1    33,177,600           983,040                  9,000         720×480@84.3 (6)
                                                                   720×576@75.0 (6)
                                                                   854×480@72.3 (6)
                                                                   960×540@60.0 (6)
                                                                   1280×720@33.7 (6)
4      62,668,800           2,088,960                15,000        960×540@113.3 (6)
                                                                   1280×720@63.7 (6)
                                                                   1920×1080@30.0 (6)
4.1                                                  30,000
4.2    133,693,440          2,228,224                30,000        960×540@241.7 (6)
                                                                   1280×720@136.0 (6)
                                                                   1920×1080@64.0 (6)
                                                                   2048×1080@60.0 (6)
4.3                                                  50,000
5      267,386,880          8,912,896                50,000        1920×1080@128.0 (6)
                                                                   2048×1080@120.0 (6)
                                                                   2560×1920@54.4 (6)
                                                                   3672×1536@46.8 (6)
                                                                   4096×2160@30.0 (6)
5.1                                                  100,000
5.2    534,773,760                                   150,000       1920×1080@256.0 (6)
                                                                   2048×1080@240.0 (6)
                                                                   2560×1920@108.8 (6)
                                                                   3672×1536@93.7 (6)
                                                                   4096×2160@60.0 (6)
6      1,002,700,800        33,423,360               300,000       1920×1080@300.0 (6)
                                                                   2560×1920@204.0 (6)
                                                                   4096×2304@106.2 (6)
                                                                   7680×4320@30.0 (6)
6.1    2,005,401,600                                 500,000       1920×1080@300.0 (6)
                                                                   2560×1920@300.0 (6)
                                                                   4096×2304@212.5 (6)
                                                                   7680×4320@60.0 (6)
6.2    4,010,803,200                                 800,000       1920×1080@300.0 (6)
                                                                   2560×1920@300.0 (6)
                                                                   4096×2304@300.0 (6)
                                                                   7680×4320@120.0 (6)

The maximum number of decoded picture buffers is currently 6 for all Levels.
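
Several of the example picture rates in the table can be reproduced by dividing a level's maximum luma pixel rate by the picture size. The sketch below rests on two assumptions inferred from the table rather than stated in the text: picture dimensions are rounded up to a multiple of 64 samples, and the picture rate is capped at 300. The function name is hypothetical, and not every row of the table is matched exactly by this rule.

```python
def max_picture_rate(width, height, max_luma_rate, align=64, rate_cap=300.0):
    """Highest picture rate a level's luma sample rate allows for a resolution.

    The 64-sample alignment and the 300 pictures/s cap are assumptions
    inferred from the table, not rules stated in the text.
    """
    padded_w = -(-width // align) * align    # round up to a multiple of `align`
    padded_h = -(-height // align) * align
    return min(rate_cap, max_luma_rate / (padded_w * padded_h))


print(max_picture_rate(1920, 1080, 62_668_800))      # Level 4: 30.0
print(max_picture_rate(320, 240, 3_686_400))         # Level 2: 45.0
print(max_picture_rate(1920, 1080, 1_002_700_800))   # Level 6: 300.0 (capped)
```

With this rounding, 1920×1080 is counted as 1920×1088 = 2,088,960 samples, which is also the maximum picture buffer size listed for Level 4.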

References

  1. ^ ITU TSB (23 April 2010). "Joint Collaborative Team on Video Coding". ITU-T. Retrieved 21 May 2010.
  2. ^ "Quad-core SoC supports Android 4.0, 3840 x 1080 video resolution, including up to true 4K resolution 3D video". http://www.linuxfordevices.com/c/a/News/ZiiLabs-ZMS40/
  3. ^ T. Wedi and T.K. Tan, AHG report – Coding Efficiency Improvements, VCEG document VCEG-AA06, October 2005.
  4. ^ Draft meeting report for 31st VCEG Meeting (Marrakech, MA, 15–16 January 2007)
  5. ^ a b ITU-T VCEG, Draft requirements for "EPVC" enhanced performance video coding project, 10 July 2009.
  6. ^ "An Interview With Dr. Thomas Wiegand". in-cites. 18 April 2004. Retrieved 21 May 2010.
  7. ^ "Documents of the first meeting of the Joint Collaborative Team on Video Coding (JCT-VC) – Dresden, Germany, 15–23 April 2010". ITU-T. 23 April 2010. Retrieved 21 May 2010.
  8. ^ Press release of the 88th MPEG meeting, Maui, USA, April 2009.
  9. ^ ISO/IEC JTC1/SC29/WG11/N11872: Vision, Applications and Requirements for High Efficiency Video Coding (HEVC). January 2011, Daegu, KR
  10. ^ ISO/IEC JTC1/SC29/WG11/N10361: Vision and Requirements for High-Performance Video Coding (HVC). January 2009
  11. ^ ISO/IEC JTC1/SC29/WG11/N11822: Description of High Efficiency Video Coding (HEVC). January 2011, Daegu, KR
  12. ^ JCTVC-H1003: High Efficiency Video Coding (HEVC) text specification draft 9. B. Bross, W.-J. Han, G. J. Sullivan, J.-R. Ohm, T. Wiegand (Editors)