The structural similarity (SSIM) index is a method for predicting the perceived quality of digital television and cinematic pictures, as well as other kinds of digital images and videos. The first version of the model was developed in the Laboratory for Image and Video Engineering (LIVE) at The University of Texas at Austin and further developed jointly with the Laboratory for Computational Vision (LCV) at New York University.
SSIM is used for measuring the similarity between two images. The SSIM index is a full reference metric; in other words, the measurement or prediction of image quality is based on an initial uncompressed or distortion-free image as reference. SSIM is designed to improve on traditional methods such as peak signal-to-noise ratio (PSNR) and mean squared error (MSE).
The predecessor of SSIM was called Universal Quality Index (UQI), or Wang–Bovik Index, and was developed by Zhou Wang and Al Bovik in 2001. This evolved, through their collaboration with Hamid Sheikh and Eero Simoncelli, into the current version of SSIM, which was published in April 2004 in the IEEE Transactions on Image Processing. In addition to defining the SSIM quality index, the paper provides a general context for developing and evaluating perceptual quality measures, including connections to human visual neurobiology and perception, and direct validation of the index against human subject ratings.
SSIM was rapidly adopted by the image processing community, in part because the March 2000 FRTV Phase I report by the Video Quality Experts Group had concluded that nine previously proposed models for perceptual quality were ineffective. The 2004 SSIM paper has been cited more than 15,000 times according to Google Scholar, making it one of the most highly cited papers in the image processing and video engineering fields. It received the IEEE Signal Processing Society Best Paper Award for 2009 and the IEEE Signal Processing Society Sustained Impact Award for 2017, which recognizes a paper that has had unusually high impact for at least 10 years following its publication. The inventors of SSIM each received an individual Primetime Engineering Emmy Award from the Television Academy in 2015.
Unlike MSE and PSNR, which estimate absolute errors, SSIM is a perception-based model that treats image degradation as a perceived change in structural information, while also incorporating important perceptual phenomena, including both luminance masking and contrast masking terms. Structural information refers to the strong inter-dependencies between pixels, especially pixels that are spatially close. These dependencies carry important information about the structure of the objects in the visual scene. Luminance masking is a phenomenon whereby image distortions (in this context) tend to be less visible in bright regions, while contrast masking is a phenomenon whereby distortions become less visible where there is significant activity or "texture" in the image.
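The difference from MSE can be illustrated with a small experiment. The plain-Python sketch below applies two distortions with identical MSE to a synthetic 8×8 patch. One is a uniform brightness shift and the other is an alternating ±10 pattern that disrupts local structure. SSIM is computed once over the whole patch with the default constants $k_1 = 0.01$, $k_2 = 0.03$ and $L = 255$. The helpers `mse` and `ssim_global`, the test patch, and both distortions are illustrative assumptions, not part of the original method.

```python
from statistics import fmean

# Default constants for 8-bit images: L = 255, k1 = 0.01, k2 = 0.03.
L = 255
C1 = (0.01 * L) ** 2
C2 = (0.03 * L) ** 2

def mse(x, y):
    """Mean squared error between two equally long pixel sequences."""
    return fmean((a - b) ** 2 for a, b in zip(x, y))

def ssim_global(x, y):
    """SSIM evaluated once over the whole sequence, i.e. a single window."""
    mx, my = fmean(x), fmean(y)
    vx = fmean((a - mx) ** 2 for a in x)
    vy = fmean((b - my) ** 2 for b in y)
    cov = fmean((a - mx) * (b - my) for a, b in zip(x, y))
    return ((2 * mx * my + C1) * (2 * cov + C2)) / ((mx ** 2 + my ** 2 + C1) * (vx + vy + C2))

# Synthetic 8x8 patch (flattened row by row): each row is a ramp from 100 to 170.
ref = [100 + 10 * (i % 8) for i in range(64)]
# Distortion A: every pixel 10 levels brighter (a luminance change only).
shifted = [p + 10 for p in ref]
# Distortion B: alternate +10 / -10, which keeps the mean but disturbs structure.
alternating = [p + (10 if i % 2 == 0 else -10) for i, p in enumerate(ref)]

print(mse(ref, shifted), mse(ref, alternating))
print(round(ssim_global(ref, shifted), 3), round(ssim_global(ref, alternating), 3))
```

With these inputs, both distortions have an MSE of 100. SSIM, however, is about 0.997 for the brightness shift and about 0.910 for the alternating pattern, because only the second distortion changes the covariance between the patches.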
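The SSIM index is calculated on various windows of an image. The measure between two windows $x$ and $y$ of common size $N \times N$ is:

$$\text{SSIM}(x,y) = \frac{(2\mu_x\mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}$$

with:
- $\mu_x$ the average of $x$;
- $\mu_y$ the average of $y$;
- $\sigma_x^2$ the variance of $x$;
- $\sigma_y^2$ the variance of $y$;
- $\sigma_{xy}$ the covariance of $x$ and $y$;
- $c_1 = (k_1 L)^2$, $c_2 = (k_2 L)^2$ two variables to stabilize the division with weak denominator;
- $L$ the dynamic range of the pixel values (typically this is $2^{\#\text{bits per pixel}} - 1$);
- $k_1 = 0.01$ and $k_2 = 0.03$ by default.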
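The definition above translates almost line for line into code. The following plain-Python sketch computes SSIM for two equally sized windows passed as flat lists of pixel values. The function name `ssim_window` is hypothetical. The sketch assumes unweighted statistics, dividing by the number of pixels in the window. The authors' own implementation weights pixels with a Gaussian window instead, so values from this sketch need not match theirs exactly.

```python
from statistics import fmean

def ssim_window(x, y, L=255, k1=0.01, k2=0.03):
    """SSIM between two equally sized windows given as flat lists of pixels.

    Means, variances and the covariance are unweighted (divided by the pixel
    count); this is an assumption, not the authors' Gaussian weighting.
    """
    if len(x) != len(y) or not x:
        raise ValueError("windows must be non-empty and of equal size")
    c1 = (k1 * L) ** 2  # stabilizes the luminance part of the ratio
    c2 = (k2 * L) ** 2  # stabilizes the contrast/structure part of the ratio
    mu_x, mu_y = fmean(x), fmean(y)
    var_x = fmean((a - mu_x) ** 2 for a in x)
    var_y = fmean((b - mu_y) ** 2 for b in y)
    cov_xy = fmean((a - mu_x) * (b - mu_y) for a, b in zip(x, y))
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator

# Example: an 8x8 window (flattened) and a contrast-inverted copy of it.
window = [3 * r + 5 * c for r in range(8) for c in range(8)]
inverted = [255 - p for p in window]
print(ssim_window(window, window))    # 1.0, up to floating-point rounding
print(ssim_window(window, inverted))  # negative: the two structures are anti-correlated
```

Identical windows give 1. Swapping the arguments gives the same value, consistent with the symmetry property. A contrast-inverted copy gives a negative value because its covariance with the original is negative, which is why the index can range down to -1.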
The SSIM index satisfies the condition of symmetry: $\text{SSIM}(x,y) = \text{SSIM}(y,x)$.
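The SSIM formula is based on three comparison measurements between the samples of $x$ and $y$: luminance ($l$), contrast ($c$) and structure ($s$). The individual comparison functions are:

$$l(x,y) = \frac{2\mu_x\mu_y + c_1}{\mu_x^2 + \mu_y^2 + c_1}$$

$$c(x,y) = \frac{2\sigma_x\sigma_y + c_2}{\sigma_x^2 + \sigma_y^2 + c_2}$$

$$s(x,y) = \frac{\sigma_{xy} + c_3}{\sigma_x\sigma_y + c_3}$$

with, in addition to the above definitions, $c_3 = c_2 / 2$.

SSIM is then a weighted combination of those comparative measures:

$$\text{SSIM}(x,y) = l(x,y)^{\alpha} \cdot c(x,y)^{\beta} \cdot s(x,y)^{\gamma}$$

Setting the weights $\alpha$, $\beta$, $\gamma$ to 1, the formula can be reduced to the form shown at the top of this section.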
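The reduction can be checked with a short derivation, assuming $\alpha = \beta = \gamma = 1$ and $c_3 = c_2/2$ as defined above. Multiplying the contrast and structure terms first gives:

```latex
\begin{aligned}
c(x,y)\,s(x,y)
  &= \frac{2\sigma_x\sigma_y + c_2}{\sigma_x^2 + \sigma_y^2 + c_2}
     \cdot \frac{\sigma_{xy} + c_2/2}{\sigma_x\sigma_y + c_2/2} \\
  &= \frac{2\sigma_x\sigma_y + c_2}{\sigma_x^2 + \sigma_y^2 + c_2}
     \cdot \frac{2\sigma_{xy} + c_2}{2\sigma_x\sigma_y + c_2}
   = \frac{2\sigma_{xy} + c_2}{\sigma_x^2 + \sigma_y^2 + c_2}, \\
l(x,y)\,c(x,y)\,s(x,y)
  &= \frac{(2\mu_x\mu_y + c_1)(2\sigma_{xy} + c_2)}
          {(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}.
\end{aligned}
```

The cancellation depends on tying $c_3$ to $c_2$. With an independent $c_3$, the three-term product does not collapse to the single fraction.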
Application of the formula
In order to evaluate the image quality, this formula is usually applied only to luma, although it may also be applied to color (e.g., RGB) values or chromatic (e.g., YCbCr) values. The resulting SSIM index is a decimal value between -1 and 1, and the value 1 is reached only when the two sets of data are identical. Typically it is calculated on window sizes of 8×8. The window can be displaced pixel by pixel over the image, but the authors propose using only a subgroup of the possible windows to reduce the complexity of the calculation.
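The windowed procedure can be sketched in plain Python under a few stated assumptions. Images are lists of rows of luma values, and windows are 8×8 with unweighted statistics. The hypothetical `stride` parameter controls how far the window moves between evaluations: 1 gives pixel-by-pixel displacement, and larger values evaluate only a subset of windows. The per-window scores are averaged into a single mean SSIM for the image. The function names are illustrative.

```python
from statistics import fmean

def _ssim_window(x, y, c1, c2):
    """SSIM of one pair of flattened windows (unweighted statistics)."""
    mu_x, mu_y = fmean(x), fmean(y)
    var_x = fmean((a - mu_x) ** 2 for a in x)
    var_y = fmean((b - mu_y) ** 2 for b in y)
    cov = fmean((a - mu_x) * (b - mu_y) for a, b in zip(x, y))
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

def mean_ssim(img_x, img_y, win=8, stride=1, L=255, k1=0.01, k2=0.03):
    """Average SSIM over win x win windows of two same-size luma images.

    stride=1 slides the window pixel by pixel; a larger stride visits
    only a subset of the possible windows.
    """
    h, w = len(img_x), len(img_x[0])
    if (len(img_y), len(img_y[0])) != (h, w) or h < win or w < win:
        raise ValueError("images must have equal size of at least win x win")
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    scores = []
    for top in range(0, h - win + 1, stride):
        for left in range(0, w - win + 1, stride):
            rows, cols = range(top, top + win), range(left, left + win)
            x = [img_x[r][c] for r in rows for c in cols]
            y = [img_y[r][c] for r in rows for c in cols]
            scores.append(_ssim_window(x, y, c1, c2))
    return fmean(scores)

# Example: a synthetic 16x16 pattern and a copy with a checkerboard of +20 offsets.
img = [[(7 * r + 11 * c) % 256 for c in range(16)] for r in range(16)]
degraded = [[min(255, v + (20 if (r + c) % 2 else 0)) for c, v in enumerate(row)]
            for r, row in enumerate(img)]
print(mean_ssim(img, degraded))            # every 8x8 window position
print(mean_ssim(img, degraded, stride=8))  # only 4 non-overlapping windows
```

The nested loops are meant to show the logic, not to be fast. Each window costs $O(N^2)$ work, so the stride directly trades accuracy of the average for computation.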
A more advanced form of SSIM, called Multiscale SSIM (MS-SSIM), is computed over multiple scales through multiple stages of sub-sampling, reminiscent of multiscale processing in the early vision system. It has been shown to perform as well as or better than SSIM on different subjective image and video databases.
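A compact MS-SSIM sketch follows the general multiscale scheme. The contrast and structure terms, combined into one term using $c_3 = c_2/2$, are measured at every scale. The luminance term is measured only at the coarsest scale. Each scale's result is raised to its own exponent. Several details are assumptions of this sketch rather than a reproduction of the published method:

- the five exponents in `MS_WEIGHTS` are the values commonly quoted for MS-SSIM;
- the low-pass filter is a simple 2×2 average before subsampling by 2;
- negative averages are clamped to zero so that fractional powers stay real;
- window statistics are unweighted.

```python
from statistics import fmean

# Exponents for scales 1..5 (finest to coarsest) as commonly quoted for MS-SSIM;
# treat them as an assumption of this sketch.
MS_WEIGHTS = (0.0448, 0.2856, 0.3001, 0.2363, 0.1333)

def _window_terms(x, y, c1, c2):
    """Luminance term and combined contrast*structure term (c3 = c2 / 2) for one window."""
    mu_x, mu_y = fmean(x), fmean(y)
    var_x = fmean((a - mu_x) ** 2 for a in x)
    var_y = fmean((b - mu_y) ** 2 for b in y)
    cov = fmean((a - mu_x) * (b - mu_y) for a, b in zip(x, y))
    lum = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    cs = (2 * cov + c2) / (var_x + var_y + c2)
    return lum, cs

def _mean_terms(img_x, img_y, win, stride, c1, c2):
    """Average both terms over all win x win windows visited with the given stride."""
    lums, css = [], []
    for top in range(0, len(img_x) - win + 1, stride):
        for left in range(0, len(img_x[0]) - win + 1, stride):
            rows, cols = range(top, top + win), range(left, left + win)
            x = [img_x[r][c] for r in rows for c in cols]
            y = [img_y[r][c] for r in rows for c in cols]
            lum, cs = _window_terms(x, y, c1, c2)
            lums.append(lum)
            css.append(cs)
    return fmean(lums), fmean(css)

def _downsample(img):
    """2x2 average pooling: a crude low-pass filter followed by subsampling by 2."""
    return [[(img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4
             for c in range(0, len(img[0]) - 1, 2)]
            for r in range(0, len(img) - 1, 2)]

def ms_ssim(img_x, img_y, weights=MS_WEIGHTS, win=8, stride=1, L=255, k1=0.01, k2=0.03):
    """Multiscale SSIM of two equally sized luma images given as lists of rows."""
    if len(img_x) != len(img_y) or len(img_x[0]) != len(img_y[0]):
        raise ValueError("images must have the same size")
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    result = 1.0
    for scale, beta in enumerate(weights):
        if min(len(img_x), len(img_x[0])) < win:
            raise ValueError("image too small for this many scales")
        lum, cs = _mean_terms(img_x, img_y, win, stride, c1, c2)
        # Clamp so a negative average cannot produce a complex fractional power.
        result *= max(cs, 0.0) ** beta
        if scale == len(weights) - 1:
            # Luminance is compared only at the coarsest scale (alpha_M = beta_M).
            result *= max(lum, 0.0) ** beta
        else:
            img_x, img_y = _downsample(img_x), _downsample(img_y)
    return result

# Example: a synthetic 128x128 pattern and a deterministically "noisy" copy.
img = [[(3 * r + 5 * c) % 256 for c in range(128)] for r in range(128)]
noisy = [[min(255, max(0, v + (15 if (7 * r + 13 * c) % 5 == 0 else -5)))
          for c, v in enumerate(row)] for r, row in enumerate(img)]
print(ms_ssim(img, noisy, stride=8))  # stride=8: non-overlapping windows, for speed
```

With five scales and 8×8 windows, the input must be at least 128×128 pixels, since 128 / 2⁴ = 8 at the coarsest scale.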
Three-component SSIM (3-SSIM) is a form of SSIM that takes into account the fact that the human eye can see differences more precisely on textured or edge regions than on smooth regions. The resulting metric is calculated as a weighted average of SSIM for three categories of regions: edges, textures, and smooth regions. The proposed weighting is 0.5 for edges and 0.25 each for textured and smooth regions. The authors mention that a 1/0/0 weighting (ignoring everything but edge distortions) leads to results that are closer to subjective ratings, which suggests that edge regions play a dominant role in image quality perception.
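The 3-SSIM combination step can be sketched as a weighted average of per-region mean SSIM values. This plain-Python sketch assumes the edge/texture/smooth classification has already been done and is supplied as the hypothetical `window_labels` list, since the region-segmentation rule is not described here. If a region is absent from an image, the remaining weights are renormalized. That renormalization is a design choice of the sketch, not part of the published method.

```python
from statistics import fmean

# Weights proposed for 3-SSIM: 0.5 for edges, 0.25 each for texture and smooth regions.
DEFAULT_WEIGHTS = {"edge": 0.5, "texture": 0.25, "smooth": 0.25}

def three_component_ssim(window_scores, window_labels, weights=DEFAULT_WEIGHTS):
    """Weighted average of the mean SSIM within each region category.

    window_scores: SSIM value of each window (or pixel) in the image.
    window_labels: matching region label, one of the keys of `weights`.
    """
    total, weight_sum = 0.0, 0.0
    for region, weight in weights.items():
        region_scores = [s for s, lab in zip(window_scores, window_labels) if lab == region]
        if region_scores and weight > 0:
            total += weight * fmean(region_scores)
            weight_sum += weight
    if weight_sum == 0:
        raise ValueError("no windows fall into a region with non-zero weight")
    return total / weight_sum

scores = [0.90, 0.80, 0.95, 0.99, 0.60]
labels = ["edge", "edge", "texture", "smooth", "texture"]
print(three_component_ssim(scores, labels))  # about 0.866
print(three_component_ssim(scores, labels, {"edge": 1.0, "texture": 0.0, "smooth": 0.0}))  # about 0.85
```

The second call corresponds to the 1/0/0 weighting, which reduces the metric to the mean SSIM of the edge windows.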
Structural dissimilarity (DSSIM) is a dissimilarity measure derived from SSIM. It is not a true distance metric, because the triangle inequality is not necessarily satisfied.
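A common way to turn an SSIM score into a dissimilarity is $\text{DSSIM}(x,y) = (1 - \text{SSIM}(x,y))/2$. This maps the SSIM range $[-1, 1]$ onto $[0, 1]$, with 0 for identical inputs. The hypothetical helper below assumes that definition and takes an already computed SSIM value; some tools use $1 - \text{SSIM}$ without the halving.

```python
def dssim(ssim_value: float) -> float:
    """Structural dissimilarity, assuming DSSIM = (1 - SSIM) / 2."""
    if not -1.0 <= ssim_value <= 1.0:
        raise ValueError("an SSIM value must lie in [-1, 1]")
    return (1.0 - ssim_value) / 2.0

print(dssim(1.0), dssim(0.5), dssim(-1.0))  # 0.0 0.25 1.0
```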
Video quality metrics
It is worth noting that the original version of SSIM was designed to measure the quality of still images. It does not contain any parameters directly related to temporal effects of human perception and human judgment. However, several temporal variants of SSIM have been developed.
A simple application of SSIM to estimate video quality would be to calculate the average SSIM value over all frames in the video sequence.
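Frame averaging can be sketched as below. Each frame is treated as a flat list of luma values. For brevity, each frame is scored with a single whole-frame SSIM window by `global_ssim`, a hypothetical helper. A windowed mean SSIM per frame would be the more usual choice and can be passed in through the `frame_ssim` parameter. As noted above, this plain average ignores temporal effects entirely.

```python
from statistics import fmean

def global_ssim(x, y, L=255, k1=0.01, k2=0.03):
    """SSIM over a whole frame treated as one window (unweighted statistics)."""
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    mu_x, mu_y = fmean(x), fmean(y)
    var_x = fmean((a - mu_x) ** 2 for a in x)
    var_y = fmean((b - mu_y) ** 2 for b in y)
    cov = fmean((a - mu_x) * (b - mu_y) for a, b in zip(x, y))
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

def video_ssim(ref_frames, dist_frames, frame_ssim=global_ssim):
    """Average of per-frame SSIM scores over a whole sequence."""
    if len(ref_frames) != len(dist_frames) or not ref_frames:
        raise ValueError("sequences must be non-empty and have the same frame count")
    return fmean(frame_ssim(r, d) for r, d in zip(ref_frames, dist_frames))

# Three synthetic 64-pixel frames; only the last distorted frame differs from its reference.
ref = [[(7 * i + 3 * t) % 256 for i in range(64)] for t in range(3)]
dist = [ref[0], ref[1], [255 - v for v in ref[2]]]
print(video_ssim(ref, dist))
```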
Owing to its performance and low computation cost, SSIM has become widely used in the broadcast, cable and satellite television industries. It has become a dominant method of measuring video quality in broadcast and post-production houses throughout the television industry. These achievements were the basis for the team's Emmy Award.
SSIM is included in a number of video quality measurement tools used globally, including those marketed by Video Clarity, National Instruments, Rohde & Schwarz, and SSIMWave. Overall, SSIM and its variants, such as Multiscale SSIM, are among the most widely used full-reference perceptual image and video quality models in the world, as evidenced by their high citation counts, wide industry acceptance, and industry recognition and awards.
Due to its popularity, SSIM is often compared to other metrics, including simpler metrics such as MSE and PSNR as well as other perceptual image and video quality metrics. SSIM has repeatedly been shown to significantly outperform MSE and its derivatives in accuracy, in research by its own authors and by others.
A paper by Dosselmann and Yang claims that the performance of SSIM is “much closer to that of the MSE” than usually assumed. While they do not dispute the advantage of SSIM over MSE, they state an analytical and functional dependency between the two metrics. According to their research, on subjective databases other than those of SSIM's creators, SSIM correlates with subjective ratings only about as well as MSE-based methods do. As an example, they cite Reibman and Poole, who found that MSE outperformed SSIM on a database containing packet-loss–impaired video. In another paper, an analytical link between PSNR and SSIM was identified.
- Wang, Zhou; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. (2004-04-01). "Image quality assessment: from error visibility to structural similarity". IEEE Transactions on Image Processing. 13 (4): 600–612. doi:10.1109/TIP.2003.819861. ISSN 1057-7149.
- "VQEG FRTV Phase I". 2000.
- "IEEE Signal Processing Society, Best Paper Award" (PDF).
- Wang, Z.; Simoncelli, E.P.; Bovik, A.C. (2003-11-01). "Multiscale structural similarity for image quality assessment". Conference Record of the Thirty-Seventh Asilomar Conference on Signals, Systems and Computers, 2004. 2: 1398–1402. doi:10.1109/ACSSC.2003.1292216.
- Søgaard, Jacob; Krasula, Lukáš; Shahid, Muhammad; Temel, Dogancan; Brunnström, Kjell; Razaak, Manzoor (2016-02-14). "Applicability of Existing Objective Metrics of Perceptual Quality for Adaptive Video Streaming". Electronic Imaging. 2016 (13): 1–7. doi:10.2352/issn.2470-1173.2016.13.iqsp-206.
- Dosselmann, Richard; Yang, Xue Dong (2009-11-06). "A comprehensive assessment of the structural similarity index". Signal, Image and Video Processing. 5 (1): 81–91. doi:10.1007/s11760-009-0144-1. ISSN 1863-1703.
- Li, Chaofeng; Bovik, Alan Conrad (2010-01-01). "Content-weighted video quality assessment using a three-component image model". Journal of Electronic Imaging. 19 (1): 011003. doi:10.1117/1.3267087. ISSN 1017-9909.
- Zhang, L.; Zhang, L.; Mou, X.; Zhang, D. (September 2012). "A comprehensive evaluation of full reference image quality assessment algorithms". 2012 19th IEEE International Conference on Image Processing: 1477–1480. doi:10.1109/icip.2012.6467150.
- Wang, Zhou; Li, Qiang. "Information Content Weighting for Perceptual Image Quality Assessment". IEEE Transactions on Image Processing. 20 (5): 1185–1198. doi:10.1109/tip.2010.2092435.
- Channappayya, S. S.; Bovik, A. C.; Caramanis, C.; Heath, R. W. (March 2008). "SSIM-optimal linear image restoration". 2008 IEEE International Conference on Acoustics, Speech and Signal Processing: 765–768. doi:10.1109/icassp.2008.4517722.
- Gore, Akshay; Gupta, Savita (2015-02-01). "Full reference image quality metrics for JPEG compressed images". AEU - International Journal of Electronics and Communications. 69 (2): 604–608. doi:10.1016/j.aeue.2014.09.002.
- Reibman, A. R.; Poole, D. (September 2007). "Characterizing packet-loss impairments in compressed video". 2007 IEEE International Conference on Image Processing. 5: V-77–V-80. doi:10.1109/icip.2007.4379769.
- Hore, A.; Ziou, D. (August 2010). "Image Quality Metrics: PSNR vs. SSIM". 2010 20th International Conference on Pattern Recognition: 2366–2369. doi:10.1109/icpr.2010.579.