Scalable Video Coding
Scalable Video Coding (SVC) is the name for the Annex G extension of the H.264/MPEG-4 AVC video compression standard. SVC standardizes the encoding of a high-quality video bitstream that also contains one or more subset bitstreams. A subset video bitstream is derived by dropping packets from the larger video to reduce the bandwidth required for the subset bitstream. The subset bitstream can represent a lower spatial resolution (smaller screen), lower temporal resolution (lower frame rate), or lower quality video signal. H.264/MPEG-4 AVC was developed jointly by ITU-T and ISO/IEC JTC 1. These two groups created the Joint Video Team (JVT) to develop the H.264/MPEG-4 AVC standard.
The objective of the SVC standardization was to enable the encoding of a high-quality video bitstream that contains one or more subset bitstreams, each of which can be decoded with a complexity and reconstruction quality similar to that achieved by the existing H.264/MPEG-4 AVC design using the same quantity of data as in the subset bitstream. A subset bitstream is derived by dropping packets from the larger bitstream.
A subset bitstream can represent a lower spatial resolution, or a lower temporal resolution, or a lower quality video signal (each separately or in combination) compared to the bitstream it is derived from. The following modalities are possible:
- Temporal (frame rate) scalability: the motion compensation dependencies are structured so that complete pictures (i.e. their associated packets) can be dropped from the bitstream. (Temporal scalability is already enabled by H.264/MPEG-4 AVC. SVC has only provided supplemental enhancement information to improve its usage.)
- Spatial (picture size) scalability: video is coded at multiple spatial resolutions. The data and decoded samples of lower resolutions can be used to predict data or samples of higher resolutions in order to reduce the bit rate to code the higher resolutions.
- SNR/Quality/Fidelity scalability: video is coded at a single spatial resolution but at different qualities. The data and decoded samples of lower qualities can be used to predict data or samples of higher qualities in order to reduce the bit rate to code the higher qualities.
- Combined scalability: a combination of the three scalability modalities described above.
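As an illustration of temporal scalability, the following sketch assigns frames to temporal layers in a dyadic hierarchical prediction structure. This is one common arrangement, not something mandated by the standard, and the function names and the group-of-pictures size of 8 are assumptions chosen for the example. As long as no frame references a frame in a higher layer, all frames above a chosen layer can be dropped, and each layer removed halves the frame rate.

```python
def temporal_id(frame_index: int, gop_size: int = 8) -> int:
    """Temporal layer of a frame in a dyadic hierarchy (hypothetical helper).

    gop_size must be a power of two. Frames at multiples of gop_size are
    key pictures and form layer 0; odd-numbered frames form the top layer.
    """
    if gop_size < 1 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two")
    levels = gop_size.bit_length() - 1          # log2(gop_size)
    pos = frame_index % gop_size
    if pos == 0:
        return 0
    trailing_zeros = (pos & -pos).bit_length() - 1
    return levels - trailing_zeros


def drop_to_layer(frame_indices, max_temporal_id: int, gop_size: int = 8):
    """Keep only the frames whose temporal layer is at most max_temporal_id."""
    return [i for i in frame_indices
            if temporal_id(i, gop_size) <= max_temporal_id]


if __name__ == "__main__":
    frames = range(17)
    for layer in range(4):
        print(layer, drop_to_layer(frames, layer))
```

With a group-of-pictures size of 8 there are four temporal layers, so the sketch can serve the full frame rate, one half, one quarter, or one eighth of it from the same sequence of frames.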
SVC enables forward compatibility for older hardware: the same bitstream can be consumed by basic hardware that can decode only a low-resolution subset (e.g. 720p or 1080i), while more advanced hardware can decode the high-quality video stream (1080p).
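The idea of deriving a subset bitstream by dropping packets can be sketched as a simple filter. In SVC, each enhancement packet carries a dependency (spatial) identifier, a temporal identifier and a quality identifier; the `Packet` class, the `extract_substream` function and the sample packet sizes below are hypothetical names and values used only for illustration. The filter rule is a simplification, and a real extractor would follow the sub-bitstream extraction process defined in the standard.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Packet:
    """Hypothetical stand-in for a packet with SVC layer identifiers."""
    dependency_id: int   # spatial layer (0 = base resolution)
    temporal_id: int     # temporal layer (0 = lowest frame rate)
    quality_id: int      # quality layer (0 = base quality)
    size_bytes: int


def extract_substream(packets: Iterable[Packet],
                      max_dependency_id: int,
                      max_temporal_id: int,
                      max_quality_id: int) -> List[Packet]:
    """Simplified extraction: drop every packet above the target layers."""
    return [p for p in packets
            if p.dependency_id <= max_dependency_id
            and p.temporal_id <= max_temporal_id
            and p.quality_id <= max_quality_id]


if __name__ == "__main__":
    stream = [
        Packet(0, 0, 0, 100),   # base layer
        Packet(0, 1, 0, 40),    # higher frame rate at base resolution
        Packet(1, 0, 0, 300),   # higher resolution
        Packet(1, 1, 0, 120),   # higher resolution and frame rate
        Packet(1, 0, 1, 200),   # higher resolution and quality
    ]
    basic = extract_substream(stream, 0, 0, 0)
    print(len(basic), sum(p.size_bytes for p in basic))
```

A basic device would request the subset with all three limits set to 0, while a more capable device would keep every packet; neither case requires re-encoding the video.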
Background and applications
Bit-stream scalability for video is a desirable feature for many multimedia applications. The need for scalability arises from requirements for graceful degradation during transmission, or from the need to adapt to different spatial formats, bit rates or power constraints. To fulfill these requirements, it is beneficial to transmit or store video simultaneously at a variety of spatial or temporal resolutions or qualities, which is the purpose of video bit-stream scalability.
Traditional digital video transmission and storage systems are based on H.222.0/MPEG-2 TS systems for broadcasting services over satellite, cable, and terrestrial transmission channels, and for DVD storage, or on H.320 for conversational video conferencing services. These channels are typically characterized by a fixed spatio-temporal format of the video signal (SDTV or HDTV or CIF for H.320 video telephone). The application behavior in such systems typically falls into one of the two categories: it works or it doesn't work.
Modern video transmission and storage systems using the Internet and mobile networks are typically based on RTP/IP for real-time services (conversational and streaming) and on computer file formats like mp4 or 3gp. Most RTP/IP access networks are typically characterized by a wide range of connection qualities and receiving devices. The varying connection quality results from adaptive resource sharing mechanisms of these networks addressing the time varying data throughput requirements of a varying number of users. The variety of devices with different capabilities ranging from cell phones with small screens and restricted processing power to high-end PCs with high-definition displays results from the continuous evolution of these endpoints.
Scalable video coding (SVC) is one solution to the problems posed by the characteristics of modern video transmission systems, and a range of video applications can benefit from it.
History and timeline
- October 2003: The Moving Picture Experts Group (MPEG) issued a call for proposals on SVC Technology.
- April 2004: Fourteen proposals were submitted; twelve were based on compression by wavelets, and two were extensions of H.264/MPEG-4 AVC.
- October 2004: The proposal made by the image communication group of the Heinrich-Hertz-Institute (HHI) was chosen by MPEG as the starting point of its SVC standardization project.
- January 2005: MPEG and the Video Coding Experts Group (VCEG) agreed to standardize the SVC project as an amendment of the H.264/MPEG-4 AVC standard.
- July 2007: The SVC project received final approval.
Profiles and levels
As a result of the Scalable Video Coding extension, the standard contains five additional scalable profiles: Scalable Baseline, Scalable High, Scalable High Intra, Scalable Constrained Baseline and Scalable Constrained High Profile. These profiles are defined as a combination of the H.264/MPEG-4 AVC profile for the base layer (named after the word "Scalable" in the profile name) and tools that achieve the scalable extension:
- Scalable Baseline Profile: Mainly targeted at conversational, mobile, and surveillance applications.
  - A bitstream conforming to Scalable Baseline profile contains a base layer bitstream that conforms to a restricted version of Baseline profile of H.264/MPEG-4 AVC.
  - Supports B slices, weighted prediction, CABAC entropy coding, and the 8×8 luma transform in enhancement layers (CABAC and the 8×8 transform are supported only for certain levels), although the base layer has to conform to the restricted Baseline profile, which does not support these tools. Coding tools for interlaced sources are not included.
  - Spatial scalable coding is restricted to resolution ratios of 1.5 and 2 between successive spatial layers in both the horizontal and vertical directions, and to macroblock-aligned cropping.
  - Quality and temporal scalable coding are supported without any restriction.
- Scalable High Profile: Primarily designed for broadcast, streaming, storage and videoconferencing applications.
  - A bitstream conforming to Scalable High profile contains a base layer bitstream that conforms to High profile of H.264/MPEG-4 AVC.
  - Supports all tools specified in the Scalable Video Coding extension.
  - Spatial scalable coding is supported without any restriction, i.e., with arbitrary resolution ratios and cropping parameters.
  - Quality and temporal scalable coding are supported without any restriction.
- Scalable High Intra Profile: Mainly designed for professional applications.
  - Uses Instantaneous Decoder Refresh (IDR) pictures only. IDR pictures can be decoded without reference to previous frames.
  - A bitstream conforming to Scalable High Intra profile contains a base layer bitstream that conforms to High profile of H.264/MPEG-4 AVC with only IDR pictures allowed.
  - All scalability tools are allowed as in Scalable High profile, but only IDR pictures are permitted in any layer.
- Scalable Constrained Baseline Profile
- Scalable Constrained High Profile
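The spatial restriction stated for the Scalable Baseline profile can be expressed as a small check. The sketch below takes the description above literally: each of the horizontal and vertical ratios between successive layers must be 1.5 or 2, and cropping offsets must fall on macroblock boundaries, assumed here to mean multiples of 16 luma samples. The function name and its parameters are hypothetical, and the standard's exact constraints are more detailed than this.

```python
from fractions import Fraction

ALLOWED_RATIOS = {Fraction(3, 2), Fraction(2)}   # 1.5 and 2, per the text
MACROBLOCK = 16                                  # luma samples per side


def baseline_spatial_ok(lower, higher, crop_offset=(0, 0)) -> bool:
    """Check one pair of successive spatial layers (hypothetical helper).

    lower, higher: (width, height) of the two layers in luma samples.
    crop_offset:   (x, y) offset of the cropping window, assumed to be
                   macroblock-aligned when both values are multiples of 16.
    """
    (low_w, low_h), (high_w, high_h) = lower, higher
    # Exact rational arithmetic avoids floating-point surprises at 1.5.
    ratio_x = Fraction(high_w, low_w)
    ratio_y = Fraction(high_h, low_h)
    ratios_ok = ratio_x in ALLOWED_RATIOS and ratio_y in ALLOWED_RATIOS
    crop_ok = all(v % MACROBLOCK == 0 for v in crop_offset)
    return ratios_ok and crop_ok


if __name__ == "__main__":
    print(baseline_spatial_ok((640, 360), (1280, 720)))    # ratio 2
    print(baseline_spatial_ok((640, 360), (960, 540)))     # ratio 1.5
    print(baseline_spatial_ok((640, 360), (1920, 1080)))   # ratio 3
```

Under the Scalable High profile, which places no restriction on resolution ratios or cropping parameters, such a check would not be needed.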
See also
- Adam7 algorithm, used in PNG interlacing
- Bitrate peeling
- Hierarchical modulation
- JPEG 2000
Introduction and overview
- Overview paper on SVC by H. Schwarz, D. Marpe, and T. Wiegand
- HHI presentation of the Scalable Extension of H.264/AVC
- MPEG - Technologies - Overview of Scalable Video Coding (chiariglione.org)