
ZPEG

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by DrWestwater at 20:24, 27 January 2017 (Format Conversion).


Overview

ZPEG is a motion video technology that applies a human visual acuity model to a decorrelated transform-domain representation of video, thereby optimally reducing the redundancy in motion video by removing subjectively imperceptible content. The technology applies to a range of video processing problems, including video optimization, real-time motion video compression, subjective quality monitoring, and format conversion.

Decorrelated Transform Space

Pixel values are well modeled as stochastic processes, and the Karhunen–Loève transform maps them to their ideal decorrelated representation [1]. The Discrete Cosine Transform (DCT) is widely used as a computationally efficient transform that closely approximates the Karhunen–Loève transform for video data, because neighboring pixels in typical video frames are strongly correlated [2][3]. Because correlation in the temporal direction is typically as high as in the spatial directions, a three-dimensional DCT may be used to decorrelate motion video [4].
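The following Python sketch (using NumPy) illustrates a separable three-dimensional DCT of the kind described above. It assumes an orthonormal DCT-II, an 8x8x8 block indexed as (frames, lines, pixels), and a hypothetical static test scene; the function names are illustrative and are not taken from any ZPEG implementation. For a scene that does not change over time, all of the block's energy lands in the temporal DC plane, which is the temporal decorrelation the 3D DCT is meant to exploit.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix: row k is the k-th cosine basis vector."""
    k, i = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0] /= np.sqrt(2.0)  # scale the DC row to 1/sqrt(n) so that C @ C.T = I
    return c

def dct3(block: np.ndarray) -> np.ndarray:
    """Separable 3D DCT of a (frames, lines, pixels) block."""
    ct, cv, ch = (dct_matrix(n) for n in block.shape)
    return np.einsum("ai,bj,ck,ijk->abc", ct, cv, ch, block, optimize=True)

def idct3(coeffs: np.ndarray) -> np.ndarray:
    """Inverse 3D DCT; each matrix is orthonormal, so its inverse is its transpose."""
    ct, cv, ch = (dct_matrix(n) for n in coeffs.shape)
    return np.einsum("ai,bj,ck,abc->ijk", ct, cv, ch, coeffs, optimize=True)

# Hypothetical test input: one random 8x8 frame held still for 8 frames.
rng = np.random.default_rng(0)
frame = rng.uniform(0.0, 255.0, size=(8, 8))
block = np.repeat(frame[None, :, :], 8, axis=0)

coeffs = dct3(block)
# Fraction of the block's energy in each temporal frequency plane.
energy_per_temporal_band = (coeffs ** 2).sum(axis=(1, 2))
print(energy_per_temporal_band / energy_per_temporal_band.sum())
```

Because the transform is orthonormal, total energy is preserved, and the printed fractions show how much of it sits in each temporal band.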

Human Visual Model

A Human Visual Model may be formulated from the contrast sensitivity of the human visual system [5]. A time-varying contrast sensitivity model may also be specified, and it can be applied to the three-dimensional DCT [6]. A three-dimensional contrast sensitivity model is used to generate a quantizer for each three-dimensional DCT basis vector, resulting in near-optimal, visually lossless removal of imperceptible content from motion video [7].
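A minimal sketch of how a sensitivity model can be turned into one quantizer per 3D DCT basis vector is shown below. The sensitivity function is a hypothetical placeholder (exponential decay with spatial and temporal frequency index, controlled by made-up parameters `q_dc`, `spatial_falloff`, and `temporal_falloff`); it is not the ZPEG model, Barten's model, or Watson's technique. Measured contrast sensitivity curves are band-pass, so a real model would behave differently at the lowest frequencies.

```python
import numpy as np

def hypothetical_quantizers(n: int = 8, q_dc: float = 2.0,
                            spatial_falloff: float = 0.35,
                            temporal_falloff: float = 0.25) -> np.ndarray:
    """Quantizer step for every (temporal, vertical, horizontal) 3D DCT basis vector.

    HYPOTHETICAL sensitivity model: relative sensitivity decays exponentially
    with spatial and temporal frequency index. It is a placeholder only.
    """
    w = np.arange(n)[:, None, None]   # temporal frequency index
    v = np.arange(n)[None, :, None]   # vertical frequency index
    u = np.arange(n)[None, None, :]   # horizontal frequency index
    sensitivity = np.exp(-spatial_falloff * np.hypot(u, v) - temporal_falloff * w)
    # Basis vectors the eye is less sensitive to tolerate coarser quantization steps.
    return q_dc / sensitivity

q = hypothetical_quantizers()
print(q.shape, q[0, 0, 0], q[7, 7, 7])
```

The design choice shown is simply "step size inversely proportional to sensitivity"; any calibrated model could be substituted for the `sensitivity` line.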

Perceptual Strength in VisiBels

The perceptual strength of the Human Visual Model quantizer generation process is calibrated in visiBels (vB), a logarithmic scale that roughly corresponds to perceptibility at a viewing distance measured in screen heights. As the viewer moves farther from the screen, the eye becomes less able to perceive detail in the image. The ZPEG model also includes a temporal component, so it is not fully described by viewing distance alone. In terms of viewing distance, the visiBel value decreases by six each time the viewing distance halves. The standard viewing distance for Standard Definition television (about 7 screen heights) is defined as 0 vB. The normal viewing distance for High Definition video, about 4 screen heights, therefore corresponds to roughly -6 vB (exactly -6 vB corresponds to 3.5 screen heights).
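Taking the distance rule above literally (0 vB at 7 screen heights, and 6 vB lower for each halving of distance), the viewing-distance component of the scale can be sketched as follows. This ignores the temporal component of the model and treats the "roughly" in the definition as exact; the constant and function names are illustrative.

```python
import math

SD_REFERENCE_HEIGHTS = 7.0  # 0 vB: standard-definition viewing distance, in screen heights

def visibels_from_distance(screen_heights: float) -> float:
    """Distance-only vB value, assuming exactly 6 vB per doubling of viewing distance."""
    return 6.0 * math.log2(screen_heights / SD_REFERENCE_HEIGHTS)

def distance_from_visibels(vb: float) -> float:
    """Inverse mapping: screen heights at which a given vB value applies."""
    return SD_REFERENCE_HEIGHTS * 2.0 ** (vb / 6.0)

print(visibels_from_distance(3.5))    # -6.0
print(distance_from_visibels(-12.0))  # 1.75
```

Under this rule, 4 screen heights maps to about -4.8 vB, and the -12 vB and -18 vB processing strengths correspond to 1.75 and 0.875 screen heights respectively.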

Video Optimization

The ZPEG pre-processor optimizes motion video sequences for compression by existing motion-estimation-based video compressors, such as AVC (H.264) and HEVC (H.265). The human visual acuity model is converted into quantizers that are applied directly to three-dimensionally transformed blocks of the motion video sequence, followed by an inverse quantization step using the same quantizers. The motion video sequence returned by this process is then used as input to the existing compressor.
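The pre-processing step (forward 3D DCT, quantization, inverse quantization with the same quantizers, inverse 3D DCT) can be sketched for a single 8x8x8 block as follows. The block size, the (frames, lines, pixels) axis order, round-to-nearest quantization, and the quantizer ramp in the example are assumptions for illustration only; a complete pre-processor would also tile the sequence into blocks and deal with the block-edge effects discussed under Deblocking.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix: row k is the k-th cosine basis vector."""
    k, i = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0] /= np.sqrt(2.0)
    return c

def preprocess_block(block: np.ndarray, quantizers: np.ndarray) -> np.ndarray:
    """Quantize, then inverse-quantize with the same steps, in 3D DCT space."""
    ct, cv, ch = (dct_matrix(n) for n in block.shape)
    coeffs = np.einsum("ai,bj,ck,ijk->abc", ct, cv, ch, block, optimize=True)
    levels = np.round(coeffs / quantizers)   # quantization to integer levels
    restored = levels * quantizers            # inverse quantization, same quantizers
    return np.einsum("ai,bj,ck,abc->ijk", ct, cv, ch, restored, optimize=True)

# Hypothetical quantizers that grow with the sum of the frequency indices.
t, v, u = np.indices((8, 8, 8))
quantizers = 4.0 + 4.0 * (t + v + u)

rng = np.random.default_rng(1)
block = rng.uniform(0.0, 255.0, size=(8, 8, 8))
curated = preprocess_block(block, quantizers)  # same shape; high frequencies coarsened
```

The output stays in the pixel domain, which is what allows it to be handed unchanged to a conventional encoder.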

Compression Boost Strength

Applying Human Visual System-generated quantizers to a block-based Discrete Cosine Transform increases the compressibility of a motion video stream by removing imperceptible content from it. The result is a curated stream from which fine spatial and temporal detail, which the compressor would otherwise have to reproduce, has been removed. The curated stream also yields better matches for motion estimation algorithms. The quantizers are generated so that their effect is imperceptible at a specified viewing distance, expressed in visiBels. Typical pre-processing viewing conditions in common use are:

  • Standard Definition video is processed at -6 vB
  • High Definition video is processed at -12 vB
  • Ultra-High Definition video (UHD, 4K) is processed at -12 vB
  • Immersive Ultra-High Definition video (Virtual Reality) is processed at -18 vB

Average compression savings for 6 Mb/s HD video encoded with x264 and processed at -12 vB are 21.88%. Average compression savings for the 16 Mb/s Netflix 4K test suite encoded with x264 and processed at -12 vB are 29.81%. The same Netflix test suite, when processed for immersive viewing (-18 vB), yields 25.72% savings. These results are reproducible using a publicly accessible test bed [8].
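As a worked example of these figures, the sketch below converts each reported saving into an effective bitrate, assuming the percentage is a reduction relative to the stated nominal bitrate (the text does not say exactly how the savings were measured).

```python
# Reported average savings from the text: (nominal bitrate in Mb/s, fractional saving).
REPORTED = {
    "HD, x264, -12 vB": (6.0, 0.2188),
    "Netflix 4K suite, x264, -12 vB": (16.0, 0.2981),
    "Netflix 4K suite, x264, -18 vB": (16.0, 0.2572),
}

def effective_bitrate(nominal_mbps: float, saving: float) -> float:
    """Bitrate after pre-processing, assuming the saving is a fraction of the nominal rate."""
    return nominal_mbps * (1.0 - saving)

for label, (rate, saving) in REPORTED.items():
    print(f"{label}: {rate:.1f} -> {effective_bitrate(rate, saving):.2f} Mb/s")
```

Under that assumption, the three cases come to about 4.69, 11.23, and 11.88 Mb/s respectively.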

Deblocking

While the effects of ZPEG pre-processing are imperceptible to the average viewer at the specified viewing distance, edge effects introduced by block-based transform processing still reduce the performance advantage of the video optimization process [9]. Existing deblocking filters may be applied to recover some of this performance, but optimal results are obtained with a multi-plane deblocking algorithm. Each plane is processed on a block grid offset by either zero or one-half the block size horizontally and vertically, so that for 8x8 blocks [10] the four planes have offsets (0,0), (0,4), (4,0), and (4,4). Pixel values are then chosen according to their distance from the block edge, with interior pixel values preferred to boundary pixel values. The resulting deblocked video yields substantially better optimization over a wide range of pre-processing strengths.
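A minimal NumPy sketch of the multi-plane idea follows. It assumes a grayscale frame whose dimensions are multiples of the block size, an arbitrary block-wise processing function `fn` (a hypothetical stand-in for the transform, quantize, and inverse-quantize step), and an "interior-ness" score equal to the smaller of a pixel's vertical and horizontal distances to its block edge; each output pixel is taken from the plane with the highest score. The shifted grids are implemented with `np.roll`, which wraps blocks around the frame border; that is a simplification, not part of the described method.

```python
import numpy as np

def block_process(img: np.ndarray, block: int, fn) -> np.ndarray:
    """Apply fn independently to each non-overlapping block x block tile."""
    h, w = img.shape
    assert h % block == 0 and w % block == 0, "frame must tile exactly"
    out = np.empty(img.shape, dtype=float)
    for y in range(0, h, block):
        for x in range(0, w, block):
            out[y:y + block, x:x + block] = fn(img[y:y + block, x:x + block])
    return out

def edge_distance(n: int, offset: int, block: int) -> np.ndarray:
    """Distance of coordinates 0..n-1 to the nearest edge of their block, grid shifted by offset."""
    pos = (np.arange(n) - offset) % block
    return np.minimum(pos, block - 1 - pos)

def multiplane_deblock(img: np.ndarray, fn, block: int = 8) -> np.ndarray:
    half = block // 2
    h, w = img.shape
    planes, scores = [], []
    for dy, dx in [(0, 0), (0, half), (half, 0), (half, half)]:
        # Shift the frame so the offset grid aligns with (0, 0), process, shift back.
        shifted = np.roll(img, (-dy, -dx), axis=(0, 1))
        planes.append(np.roll(block_process(shifted, block, fn), (dy, dx), axis=(0, 1)))
        # Interior-ness score: the smaller of the vertical and horizontal edge distances.
        scores.append(np.minimum.outer(edge_distance(h, dy, block), edge_distance(w, dx, block)))
    best = np.argmax(np.stack(scores), axis=0)  # per pixel, the most interior plane
    return np.take_along_axis(np.stack(planes), best[None], axis=0)[0]

# Example: a crude stand-in processor that flattens each block to its mean.
ramp = np.add.outer(np.arange(32.0), np.arange(32.0))
deblocked = multiplane_deblock(ramp, lambda b: np.full_like(b, b.mean()))
```

With half-block offsets, every pixel is at least two samples from a block edge in at least one plane, so no output pixel has to come from a block boundary.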

Real-Time Video Compression

Conventional motion video compression is based on motion estimation [11]. While some transform-domain technologies exist [12], ZPEG is based on the three-dimensional Discrete Cosine Transform (DCT) [13], whose three dimensions are pixel within line, line within frame, and temporal sequence of frames. Redundant visual data is removed by the computationally efficient process of quantizing the transform-domain representation of the video, rather than by the far more computationally expensive process of searching for matching blocks between frames. Quantizer values are derived by applying a Human Visual Model to the basis set of DCT coefficients at a predetermined perceptual processing strength. All perceptually redundant information is thereby removed from the transform-domain representation of the video. Compression is then performed by an entropy coding process [14].

Quantization

Once the viewing conditions under which the compressed content is to be viewed have been chosen, a Human Visual Model generates quantizers for application to the three-dimensional Discrete Cosine Transform (DCT) [15]. These quantizers are tuned to remove all imperceptible content from the motion video stream, greatly reducing the entropy of the representation. The viewing conditions, expressed in visiBels, and the correlation of pixels before transformation are recorded for use by the entropy encoder.

Context-Driven Entropy Coding

While quantized DCT coefficients have traditionally been modeled with Laplace distributions [16], more recent work suggests that the Cauchy distribution better models quantized coefficient distributions [17]. The ZPEG entropy encoder encodes quantized three-dimensional DCT values according to a distribution that is completely characterized by the quantization matrix and the pixel correlations. These parameters are carried in the compressed stream as side information, enabling the decoder to synchronize its internal state with the encoder [18].
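The sketch below compares ideal code lengths for quantized coefficients under zero-centered Laplace and Cauchy models. The scale parameters are free inputs here: the text states that ZPEG derives its distribution from the quantization matrix and pixel correlations but does not give that mapping, so it is not modeled. The function names are illustrative.

```python
import math

def laplace_sf(x: float, b: float) -> float:
    """P(X > x) for a zero-mean Laplace distribution with scale b (x >= 0)."""
    return 0.5 * math.exp(-x / b)

def cauchy_sf(x: float, gamma: float) -> float:
    """P(X > x) for a zero-centered Cauchy distribution with scale gamma (x >= 0)."""
    return math.atan2(gamma, x) / math.pi

def level_probability(sf, level: int, scale: float) -> float:
    """Probability that a coefficient rounds to the integer `level` (symmetric model)."""
    k = abs(level)
    if k == 0:
        return 1.0 - 2.0 * sf(0.5, scale)
    return sf(k - 0.5, scale) - sf(k + 0.5, scale)

def ideal_bits(sf, level: int, scale: float) -> float:
    """Ideal code length in bits (-log2 p), which an arithmetic coder approaches."""
    return -math.log2(level_probability(sf, level, scale))

for level in (0, 1, 2, 5, 20):
    print(level, round(ideal_bits(laplace_sf, level, 1.0), 1),
          round(ideal_bits(cauchy_sf, level, 1.0), 1))
```

With both scales set to 1.0, a coefficient at level 20 costs roughly 10 bits under the Cauchy model but roughly 30 bits under the Laplace model, reflecting the Cauchy distribution's heavier tails. Because the encoder and decoder compute the same probabilities from the same parameters, they stay synchronized.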

Subband Decomposition

Each DCT band is entropy coded separately from all other bands. The coefficients are transmitted in band-wise order, starting with the DC component and followed by successive bands in order from low to high resolution, similar to wavelet decomposition [19]. Following this convention ensures that the receiver always obtains the maximum resolution possible over a channel of any given bandwidth, enabling a no-buffering transmission protocol.
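One way to realize band-wise ordering for an 8x8x8 block is sketched below. The band definition is an assumption chosen to mirror dyadic wavelet subbands: the DC coefficient forms band 0, and a coefficient whose largest frequency index is 1, 2-3, or 4-7 falls in band 1, 2, or 3. Within a band, coefficients are left in raster order; the function names are illustrative.

```python
from itertools import groupby

def band_of(coord: tuple[int, int, int]) -> int:
    """Hypothetical dyadic band: 0 for DC, 1 for max index 1, 2 for 2-3, 3 for 4-7."""
    return max(coord).bit_length()

def bandwise_order(n: int = 8) -> list[tuple[int, int, int]]:
    """All (t, v, u) indices of an n x n x n block, lowest band first."""
    coords = [(t, v, u) for t in range(n) for v in range(n) for u in range(n)]
    return sorted(coords, key=band_of)  # sorted() is stable, so raster order survives

def bands(n: int = 8) -> list[list[tuple[int, int, int]]]:
    """Coefficient indices split into bands, each to be entropy coded on its own."""
    return [list(group) for _, group in groupby(bandwise_order(n), key=band_of)]

for number, members in enumerate(bands()):
    print(f"band {number}: {len(members)} coefficients")
```

With this grouping the bands hold 1, 7, 56, and 448 coefficients, so a receiver that stops after band 2 already holds every coefficient with all indices below 4, enough for a 4x4x4 low-resolution approximation of the block.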

Subjective Quality Metrics

Format Conversion

Statistically ideal format conversion is performed by interpolating video content in Discrete Cosine Transform space [20]. The conversion process, particularly when up-sampling, must account for the ringing artifacts that occur when abrupt discontinuities are present in a sequence of pixels being re-sampled [21]. The resulting algorithm can down-sample or up-sample video, changing the frame dimensions, pixel aspect ratio, and frame rate.
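The following sketch resamples a (frames, lines, pixels) array by zero-padding or truncating its orthonormal DCT along each axis, which changes frame rate and frame dimensions (and hence pixel aspect ratio) in one pass. It illustrates DCT-domain interpolation in general, not the specific patented method cited above, and it does nothing to suppress ringing; the function names are illustrative.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix: row k is the k-th cosine basis vector."""
    k, i = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0] /= np.sqrt(2.0)
    return c

def resample_matrix(n: int, m: int) -> np.ndarray:
    """m x n operator: length-n DCT, keep or zero-pad to m coefficients, length-m inverse DCT."""
    keep = min(n, m)
    # sqrt(m / n) keeps the mean level unchanged under the orthonormal scaling.
    return np.sqrt(m / n) * dct_matrix(m)[:keep].T @ dct_matrix(n)[:keep]

def resample(video: np.ndarray, shape: tuple[int, int, int]) -> np.ndarray:
    """Resample a (frames, lines, pixels) array to `shape`, one axis at a time."""
    out = video
    for axis, m in enumerate(shape):
        r = resample_matrix(out.shape[axis], m)
        out = np.moveaxis(np.tensordot(r, out, axes=(1, axis)), 0, axis)
    return out

# Example: double the frame rate and frame size of a small random clip.
rng = np.random.default_rng(2)
clip = rng.uniform(0.0, 255.0, size=(4, 6, 8))
bigger = resample(clip, (8, 12, 16))
```

Up-sampling followed by down-sampling back to the original shape returns the original samples (up to rounding), because the zero-padded coefficients are simply discarded again.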

  1. ^ "Karhunen–Loève theorem". Wikipedia. 3 December 2016.
  2. ^ Rao, Kamisetty; Yip, P (1990). Discrete Cosine Transform: Algorithms, Advantages, Applications. Academic Press. ISBN 0080925340.
  3. ^ "Discrete cosine transform". Wikipedia. 20 January 2017.
  4. ^ Westwater, Raymond; Furht, Borko (1997). Real Time Video Compression - Techniques and Algorithms. Springer. ISBN 978-0-585-32313-8.
  5. ^ Glenn, William (1993). Digital Image Compression Based on Visual Perception. MIT Press. pp. 63–71. ISBN 0-262-23171-9.
  6. ^ Barten, Peter (1999). Contrast Sensitivity of the Human Eye and its Effects on Image Quality. SPIE Press. ISBN 0-8194-3496-5.
  7. ^ Watson, A.B. (1993). "A technique for visual optimization of DCT quantization matrices for individual images". Society for Information Display Digest of Technical Papers. XXIV: 946–949.
  8. ^ "ZPEG Demonstration Page". ZPEG. Retrieved 27 January 2017.
  9. ^ "Deblocking Filter". Wikipedia. Retrieved 27 January 2017.
  10. ^ "Why was the 8x8 DCT size chosen?". experts123. Retrieved 27 January 2017.
  11. ^ Furht, Borko; Greenberg, Jeffry; Westwater, Raymond (1997). Motion Estimation Algorithms for Video Compression. Springer. ISBN 978-1-4613-7863-1.
  12. ^ "Video codec". Wikipedia. Retrieved 27 January 2017.
  13. ^ Hatim, Anas; Belkouch, Said; Hassani, Moha (May 2014). "Fast 8x8x8 RCF 3D_DCT/IDCT transform for real time video compression and its FPGA Implementation" (PDF). International Journal of Advances in Engineering & Technology. Retrieved 27 January 2017.
  14. ^ Westwater, Raymond. "Transform-Based Video Coding - Motivation for use of the Three-Dimensional Discrete Cosine Transform". researchgate.net. Retrieved 27 January 2017.
  15. ^ Westwater, Raymond. "Transform-Based Video Coding - Computation of Quantizers for the Three-Dimensional Discrete Cosine Transform". researchgate.net. Retrieved 27 January 2017.
  16. ^ Smoot, Stephen; Rowe, Lawrence A (1996). "Study of DCT coefficient distributions". Proceedings of the SPIE Symposium on Electronic Imaging. 2657. Retrieved 27 January 2017.
  17. ^ Kamaci, Nejat; AlRegib, Ghassan (February 2012). "Impact of Video Parameters on The DCT Coefficient Distribution for H.264-Like Video Coders" (PDF). Proceedings of SPIE - The International Society for Optical Engineering. 8305:3. Retrieved 27 January 2017.
  18. ^ Westwater, Raymond. "Transform-Based Video Coding - Correlation-Based Compression Using the Three-Dimensional Discrete Cosine Transform". researchgate.net. Retrieved 27 January 2017.
  19. ^ Gu, Junfeng; Jiang, Yimin; Baras, John. "3D wavelet based video codec with human perceptual model". US Patent 7006568. U.S. Patent Office. Retrieved 27 January 2017.
  20. ^ Westwater, Raymond. "Method for converting the resolution and frame rate of video data using Discrete Cosine Transforms". uspto.gov.
  21. ^ "Ringing artifacts". Wikipedia. Retrieved 27 January 2017.