Scene statistics

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Almon.David.Ing (talk | contribs) at 20:02, 28 May 2010 (I added the article scene (perception) and transferred the relevant info from this article into that one.). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Scene statistics is a discipline within the field of perception. It is based on the premise that a perceptual system is designed to interpret scenes.

Biological perceptual systems have evolved in response to physical properties of natural environments[1]. Therefore natural scenes receive a great deal of attention[2].

Natural scene statistics can play an important role in Ideal Observer Analysis. They are useful for defining the behavior of an Ideal Observer in a naturalistic task, typically by incorporating signal detection theory, information theory, or estimation theory.

Within-domain vs. across-domain

Geisler (2008)[3] distinguishes between four kinds of domains: (1) Physical environments, (2) Images/Scenes, (3) Neural responses, and (4) Behavior.

Within the domain of images/scenes, one can study the characteristics of information related to redundancy and efficient coding.

Across-domain statistics determine how an autonomous system should make inferences about its environment, process information, and control its behavior. To study these statistics, it is necessary to sample or register information in multiple domains simultaneously.

The image above[4] was generated from a database of segmented leaves that simultaneously registers natural images (scene information) with the exact locations of leaf boundaries (information about the physical environment). Such a database can be used to study across-domain statistics.

Still monocular images

Achromatic

In 1987, Field [5] conducted a classic investigation of luminance contrast in natural images by showing that the amplitude of image intensity is inversely proportional to frequency in the Fourier domain. This suggested a type of redundancy in the way images tend to be coded. In simple terms, the result means that as the distance between any two pixels increases, the expected difference in pixel intensities also increases.

In 1994, Ruderman and Bialek [6] expanded on Field’s work by showing that the amplitude spectrum for a log-luminance image captured in a forested environment tends to be proportional to (1/f)^0.91, where f is the spatial frequency. For any single image, only small deviations from this rule are expected.
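The power-law relationship described above can be checked numerically by fitting a straight line to log amplitude against log frequency. The following is a minimal sketch, not the procedure used in either study: it builds a synthetic one-dimensional signal (a stand-in for an image row) with a known exponent and recovers that exponent with a naive discrete Fourier transform. The function names and the signal length of 128 are illustrative assumptions.

```python
import cmath
import math
import random


def make_power_law_signal(n, exponent, seed=0):
    """Build a 1-D signal whose Fourier amplitude at frequency k scales as k**-exponent."""
    rng = random.Random(seed)
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n // 2)]
    signal = []
    for t in range(n):
        value = 0.0
        for k in range(1, n // 2):
            value += k ** -exponent * math.cos(2.0 * math.pi * k * t / n + phases[k])
        signal.append(value)
    return signal


def amplitude_spectrum(signal):
    """Naive O(n^2) discrete Fourier transform; returns |X[k]| for k = 1 .. n/2 - 1."""
    n = len(signal)
    amplitudes = []
    for k in range(1, n // 2):
        total = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        amplitudes.append(abs(total))
    return amplitudes


def fit_spectral_exponent(amplitudes):
    """Least-squares slope of log amplitude against log frequency, with the sign flipped."""
    xs = [math.log(k) for k in range(1, len(amplitudes) + 1)]
    ys = [math.log(a) for a in amplitudes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return -numerator / denominator


# Synthetic example: the exponent 0.91 is taken from the text, the signal is made up.
signal = make_power_law_signal(n=128, exponent=0.91)
estimated_exponent = fit_spectral_exponent(amplitude_spectrum(signal))
```

For a real image one would typically average the two-dimensional amplitude spectrum over orientation before fitting; the one-dimensional version here only illustrates the log-log fit.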

Brady & Field (2000)[7] measured the distribution of contrasts in natural scenes and derived an optimal contrast response function based on a histogram equalization technique. The derived function matched the observed contrast response function of cortical cells studied by Albrecht & Hamilton (1982)[8], suggesting that neural processing reflects the statistics of natural images.
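Histogram equalization, as used above, amounts to choosing a response function equal to the cumulative distribution of the input, so that every response level is used equally often. The sketch below illustrates the idea on made-up contrast samples. The exponential distribution and the function name are assumptions for illustration and are not taken from Brady & Field.

```python
import bisect
import random


def equalizing_response_function(contrast_samples):
    """Return r(c): the fraction of sampled contrasts that are <= c (the empirical CDF)."""
    ordered = sorted(contrast_samples)

    def response(contrast):
        return bisect.bisect_right(ordered, contrast) / len(ordered)

    return response


# Hypothetical contrast samples, skewed toward low contrast, purely for illustration.
rng = random.Random(1)
samples = [rng.expovariate(5.0) for _ in range(10000)]
response = equalizing_response_function(samples)

# Passing the samples through the response function spreads them evenly over [0, 1].
outputs = [response(c) for c in samples]
low_half = sum(1 for r in outputs if r <= 0.5) / len(outputs)
```

Because the response function is steepest where contrasts are most common, it devotes most of its output range to the contrasts that actually occur.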

Frazor & Geisler (2006)[9] also explored the range of local contrast information present in natural images. They showed that the dynamic range of luminance in natural images, including 95% of luminance samples, is 1.0 log unit (base 10), implying that luminance tends to vary by a factor of 10 in a given natural image. Similarly, the dynamic range of local RMS contrast is 1.5 log units, implying that local RMS contrast tends to vary by a factor of roughly 30 in a given natural image. There was no correlation in the luminance of two image patches separated by distances consistent with the typical lengths of human saccades. There was also no correlation in local contrast. In addition, local luminance and contrast are generally not correlated with each other within a given image (except that patches of sky stand out with high luminance and low contrast), implying that luminance and contrast are two independent sources of information. Mante et al. (2005)[10] showed that neural processing reflects these findings: gain control for the two sources of information is independent in the lateral geniculate nucleus (LGN).
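The arithmetic behind the log-unit figures, and the two local quantities being compared, can be sketched as follows. This is a simplified illustration: RMS contrast is computed here as the unweighted standard deviation of luminance divided by its mean, which is an assumption and may differ from the windowing used in the study. The example patch is made up.

```python
import math


def log_units_to_ratio(log_units):
    """Convert a dynamic range in base-10 log units to a multiplicative factor."""
    return 10.0 ** log_units


def local_luminance(patch):
    """Mean luminance of a patch given as a flat list of pixel luminances."""
    return sum(patch) / len(patch)


def rms_contrast(patch):
    """Standard deviation of luminance divided by mean luminance (unweighted)."""
    mean = local_luminance(patch)
    variance = sum((p - mean) ** 2 for p in patch) / len(patch)
    return math.sqrt(variance) / mean


luminance_ratio = log_units_to_ratio(1.0)  # a factor of 10
contrast_ratio = log_units_to_ratio(1.5)   # about 31.6, i.e. roughly 30
example_patch = [80.0, 120.0, 100.0, 100.0]  # hypothetical luminance values
example_contrast = rms_contrast(example_patch)
```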

Local patterns of contrast in natural images tend to be of a specific variety, and this has important implications for efficient coding of natural images. Using a technique similar to Blind Source Separation, Bell & Sejnowski (1997)[11] revealed that the most informative components of natural scenes, which they called independent components, are similar to two-dimensional log-Gabor filters. Olshausen & Field (1997)[12] achieved the same result using a similar technique. These independent components look like edges, so both groups argue that if the goal of the visual system is to encode contrast, then an efficient method for doing so would be to encode these edge-like patterns. The visual system may follow this principle, because these independent components resemble the receptive fields in V1. Their spatial frequency is too high on average, however, although this may be a result of the whitening procedures used, whose details are unclear.

Real images, however, are not really composed of disjoint edge elements subtending small spatial scales. Instead they are composed of long and smooth contours. Sigman et al. (2001)[13] demonstrated that edges tend to be mutually arranged as if they come from contours of constant curvature, also called cocircular contours. Similarly, Hoyer & Hyvärinen (2002)[14] discovered that the independent components of edge activation ensembles correspond to extended contours and end-stop formations (since some contours tend to reach an end point). Both studies are important because they demonstrate that contours, like those found in natural environments, are a higher order source of variation in natural images than disjoint edge elements.

Geisler et al. (2001)[15] took this a step further by exploring the inferential power of edge ensembles with respect to predicting the path of contours in natural images. By analyzing the co-occurrence statistics of pairs of edges, not only did Geisler et al. confirm the observations of Sigman et al. about cocircular contours, but they precisely quantified the inferential power of the related across-domain statistics. Elder & Goldberg (2002)[16] later confirmed the robustness of that result by using a slightly different method to arrive at a similar result. Geisler et al. also demonstrated that the statistics were important by showing how they accounted for approximately 80% of the variance of human performance in a contour completion task exploring a broad range of contour shapes. The work demonstrated that the human visual system is capable of behaving in accordance with the statistics of contour grouping in natural images, so it has led to a viable hypothesis describing the nature of algorithms in early areas of visual processing.
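The cocircularity relationship discussed above has a simple geometric form: two edge elements tangent to the same circle make equal and opposite angles with the chord joining them. The sketch below measures how far a pair of edges departs from that arrangement. The edge representation (x, y, orientation in radians) and the function name are assumptions for illustration. It is not the co-occurrence analysis performed in the cited studies, which tabulate such geometric relationships over many edge pairs.

```python
import math


def cocircularity_error(edge_a, edge_b):
    """Angular deviation (radians, between 0 and pi/2) of edge_b from the
    orientation it would have if both edges were tangent to one common circle.

    Each edge is a tuple (x, y, orientation) with orientation in radians.
    """
    xa, ya, theta_a = edge_a
    xb, yb, theta_b = edge_b
    # Direction of edge_b as seen from edge_a, relative to edge_a's orientation.
    phi = math.atan2(yb - ya, xb - xa) - theta_a
    # Tangents to a common circle make equal and opposite angles with the chord,
    # so the cocircular orientation at edge_b is theta_a + 2 * phi.
    predicted = theta_a + 2.0 * phi
    # Orientations are axial: theta and theta + pi describe the same edge.
    diff = (theta_b - predicted) % math.pi
    return min(diff, math.pi - diff)


# Two edges sampled from a circle of radius 5 that passes through the origin.
radius, arc = 5.0, 0.8
on_circle_a = (0.0, 0.0, 0.0)
on_circle_b = (radius * math.sin(arc), radius * (1.0 - math.cos(arc)), arc)
error_on_circle = cocircularity_error(on_circle_a, on_circle_b)
```

Collinear edges are the limiting case of a circle with infinite radius and also give an error of zero.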

Chromatic

Krinov (1947)[17] gathered natural materials and demonstrated that the reflectance spectra of these materials tend to vary smoothly and regularly as a function of wavelength to the extent that only three principal components are required to capture most of this variation. Since irradiance spectra and reflectance spectra are so regular, radiance spectra are also very regular. The human visual system may be limited to three kinds of photoreceptors because an additional type of receptor would not add any more knowledge about radiances typically observed in the environment. This leads to the idea of trichromacy, that the color of something can be described by three numbers.

Ruderman, Cronin, & Chiao (1998)[18] studied trichromatic images captured in the natural forest environment. The basis for the trichromatic signal was to filter the image through the wavelength sensitivity functions of human L (long wavelength), M (medium wavelength), and S (short wavelength) cones (as measured by Stockman, MacLeod, and Johnson, 1993[19]) to mimic the trichromatic signal received by the human visual system. They found that for a random image pixel, the expected log of cone responses was Gaussian and contained three principal components. Writing L, M, and S for the log cone responses, the first principal component was denoted l because of its resemblance to the classic idea of luminance and was roughly proportional to L + M + S; the second principal component, α, was roughly proportional to L + M - 2S and resembled a yellow-blue opponent channel; and the third principal component, β, was roughly proportional to L - M and resembled a red-green opponent channel. Therefore, color variation in a forested environment is well described by three numbers, l, α, and β, because the numbers are distributed as Gaussian and not correlated with each other.
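The change of coordinates described above can be sketched as a fixed rotation of mean-subtracted log cone responses. The normalized coefficients used here (sums and differences of L, M, and S divided by the square roots of 3, 6, and 2) are the commonly quoted approximate form of the three components and should be read as an assumption, not as exact values from the study. The function names and pixel values are hypothetical.

```python
import math


def log_cone_deviations(pixels):
    """Take log10 of (L, M, S) cone responses and subtract each channel's mean."""
    logs = [[math.log10(v) for v in pixel] for pixel in pixels]
    means = [sum(column) / len(logs) for column in zip(*logs)]
    return [[v - m for v, m in zip(row, means)] for row in logs]


def opponent_axes(log_l, log_m, log_s):
    """Rotate log cone deviations onto luminance, yellow-blue, and red-green axes."""
    luminance = (log_l + log_m + log_s) / math.sqrt(3.0)
    yellow_blue = (log_l + log_m - 2.0 * log_s) / math.sqrt(6.0)
    red_green = (log_l - log_m) / math.sqrt(2.0)
    return luminance, yellow_blue, red_green


# Hypothetical pixels that differ only in overall intensity, not in color.
pixels = [(0.2, 0.2, 0.2), (0.4, 0.4, 0.4)]
transformed = [opponent_axes(*row) for row in log_cone_deviations(pixels)]
```

A pure intensity change moves a pixel only along the luminance axis, leaving the two opponent coordinates at zero.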

Lee et al. (2002)[20] were interested in explaining why the cone responses are highly correlated. One reason for this must be that the spectral sensitivity functions of the cones overlap, but another explanation might be that natural radiance functions tend to be smooth (as opposed to choppy). To address this question, Lee et al. invented cone sensitivity functions that did not overlap and then simulated the perception of natural radiances with those sensitivity functions. The correlational structure was still apparent in the simulated cone responses. This was important because it demonstrated that the smoothness of natural radiance functions was a major reason why this correlational structure exists.
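The logic of that simulation can be illustrated with a toy version: draw smooth spectra, integrate them over three box-shaped bands that do not overlap, and check whether the band responses are still correlated. Everything below is a hypothetical stand-in (the spectrum model, the band edges, and the sample count) and is not the stimulus set or method of Lee et al.

```python
import math
import random


def smooth_spectrum(rng):
    """Hypothetical smooth radiance: an offset plus a slow tilt and one slow cosine."""
    offset = rng.uniform(0.2, 1.0)
    tilt = rng.gauss(0.0, 0.1)
    bump = rng.gauss(0.0, 0.1)

    def radiance(wavelength_nm):
        x = (wavelength_nm - 400.0) / 300.0  # 0 at 400 nm, 1 at 700 nm
        return offset + tilt * (x - 0.5) + bump * math.cos(math.pi * x)

    return radiance


def band_response(radiance, low_nm, high_nm, step_nm=5.0):
    """Average radiance over a box-shaped sensitivity band [low_nm, high_nm]."""
    count = int((high_nm - low_nm) / step_nm) + 1
    return sum(radiance(low_nm + i * step_nm) for i in range(count)) / count


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up, non-overlapping bands standing in for artificial cone sensitivities.
BANDS = {"S": (420.0, 460.0), "M": (520.0, 560.0), "L": (600.0, 640.0)}

rng = random.Random(3)
responses = {name: [] for name in BANDS}
for _ in range(2000):
    radiance = smooth_spectrum(rng)
    for name, (low, high) in BANDS.items():
        responses[name].append(band_response(radiance, low, high))

corr_lm = pearson(responses["L"], responses["M"])
corr_sl = pearson(responses["S"], responses["L"])
```

In this toy model the correlation comes entirely from the smoothness built into the spectra, since the bands share no wavelengths.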

The three-dimensional distribution of luminance (l), yellow-blue (α), and red-green (β) values may be Gaussian in forested environments, but Fine, MacLeod, and Boynton (2003)[21] showed that for a random pair of image pixels, the expected difference in l, α, and β values is leptokurtic. In other words, when moving around an image, the values of l, α, and β change by either a small or a large magnitude, with relatively few changes of medium magnitude (relative to Gaussian normality). In addition, there were slight correlations in the magnitude of change between l, α, and β, suggesting that when one value changes, the other values are likely to change. This may be attributable to the fact that when both pixels lie on the same surface, the radiance is relatively constant, causing no change in the values; but when the pixels lie on different surfaces, the radiance changes drastically, causing a change in all the values.
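The surface-based explanation above can be illustrated with a toy mixture: most pixel pairs fall on the same surface and differ very little, while a minority straddle a surface boundary and differ a lot. The sketch below computes excess kurtosis for such a mixture and for a plain Gaussian. The mixing proportion and the two standard deviations are made-up values chosen only to show the effect.

```python
import random


def excess_kurtosis(values):
    """Fourth standardized moment minus 3; zero for a Gaussian, positive if leptokurtic."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 * m2) - 3.0


rng = random.Random(2)
gaussian = [rng.gauss(0.0, 1.0) for _ in range(20000)]

# Hypothetical pixel-pair differences: 80% same surface (tiny change),
# 20% different surfaces (large change).
mixture = [
    rng.gauss(0.0, 0.1) if rng.random() < 0.8 else rng.gauss(0.0, 2.0)
    for _ in range(20000)
]

gaussian_kurtosis = excess_kurtosis(gaussian)
mixture_kurtosis = excess_kurtosis(mixture)
```

The mixture has a sharp peak near zero and heavy tails, which is what a positive excess kurtosis measures.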

Lewis and Zhaoping (2006)[22] later demonstrated that the shape of the cone sensitivity functions is suboptimal for capturing the variance in natural radiance spectra. One possible explanation is that it was difficult for evolution to produce the required sensitivity functions, but a different explanation is that the sensitivity functions are optimized for a specific task that is important for primate survival. One possibility is that the shape helps in discriminating red surfaces (e.g. fruit) from green surfaces (e.g. leaves) given all the variation in the environment due to shading. Lovell et al. (2005)[23] studied this problem with respect to hyperspectral images containing red fruit and green backgrounds under various lighting conditions and concluded that the L (long wavelength) and M (medium wavelength) cones are nearly optimal for making this discrimination because their sensitivity functions are broadly tuned with adjacent peaks. The starling (a type of bird), on the other hand, has narrower sensitivity functions with a large gap between the peaks, and this proves to be a less optimal design for solving the task.

A third possible explanation for the shape of cone sensitivity functions may be that it helps to isolate the sources of image contrast due to shading. Endler (1993)[24] studied the phenomenon of shading by measuring the irradiance in several different forest environments, ranging from dense shade (no direct sunlight) to no shade (direct sunlight), from cloudy to sunny days, and through all times of day. The result was that the irradiance varied principally along one major axis, ranging from the color of blue sky at one extreme to green foliage at the other. Another axis of variation was introduced by orange, red, or purple sunsets, but that was considered relatively minor. Since the primary variation could be accounted for in the range of colors from blue to green, the relative activity of the S and M cones is more strongly affected by this variation, while the relative activity between the M and L cones is more immune. Therefore, information about total changes in luminance (l) relative to changes in red-green levels (β) should be valuable for isolating the artifacts of shading in images.

Wachtler et al. (2001) studied the possibility that the spatial distribution of color in images might show regularity. They analyzed image patches, square arrangements of 49 pixels (7 x 7 arrangements), encoded with respect to the log of L, M, and S cone responses. A principal component analysis revealed eigenvectors that were segregated along the dimensions of luminance (l), yellow-blue (α), and red-green (β). Within these three groups, a spatial arrangement resembling checkerboard patterns was observed. The size of the checkers (from large to small) predicted the rank order of the proportion of variance accounted for (from large to small) by each eigenvector, exactly as one would expect from a principal component analysis of grayscale images (encoded with only one color channel). In other words, the principal components of color were completely separable from the principal components of spatial variation.

However, a different result emerges when an independent component analysis is run. Wachtler et al. (2001)[25] borrowed the technique of Bell & Sejnowski (1997), but used color image patches instead of grayscale. The independent components did not show segregation along three color channels as the principal components had. Therefore, unlike the principal components, the independent components of natural images reflect a mixed relationship between spatial variation and color variation. Wachtler suggests that the representation of color in LGN, with segregated luminance (l), yellow-blue (α), and red-green (β) color channels, is more like the representation uncovered by a principal component analysis; but the representation in V1, with mixed color channels, is more like the representation uncovered by an independent component analysis, especially in light of the high similarity between independent components and V1 receptive fields (referring to Van Hateren & Ruderman 1998). Regardless of the possible similarity to neural receptive fields, independent components suggest that edges may have a specific color signature, mixing information across several color channels at once.

Work unrelated to natural scene statistics has shown that human color vision is notoriously difficult to characterize. Researchers have been unable to find a single color space that accommodates the psychophysical phenomena related to color perception. Instead, they have uncovered several phenomena that defy simple explanation. Long, Yang, & Purves (2006)[26] tried to explain these phenomena by demonstrating that the 3-dimensional frequency distribution of trichromatic values in natural images, when mapped onto any color space, is highly irregular (a hyper-surface with irregularly distributed hills and valleys). Histogram equalization is a way of warping this surface to a non-Euclidean space so that it becomes a hyper-plane. This warping seems to predict most of these phenomena (to extents that were not measured). It seems that the visual system somehow represents color in an efficient manner, as the warped space would suggest. This suggests that the human visual system is adapted to the statistics of color in natural images.

Still binocular images

[Content needed.]

Moving monocular images (movies)

Van Hateren & Ruderman (1998)[27] ran an Independent Component Analysis of moving image patches and showed that when the goal is to encode movies instead of still images, the independent components are moving spatial gratings with properties that strongly and precisely mimic the properties of V1 receptive fields.

Moving binocular images

[Content needed.]

Properties of 2-dimensional images and depth

Potetz & Lee (2003)[28] demonstrated a relationship between depth and optical image properties by assembling a database of depth images simultaneously registered with grayscale optical images. Some of the scenes were natural while others contained manmade objects. The images were always captured with the horizontal axis parallel to the horizon. There was a -0.32 correlation between optical intensity and relative depth, but only for rural images. They performed ridge regression on 25 x 25 square image patches and predicted 21% of the variance in relative depth (i.e. log depth). This work demonstrated that there exists a relationship between properties of two-dimensional optical images and the depth at locations within those images.
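Two pieces of arithmetic sit behind the figures above. A correlation of -0.32 between two variables means that a single linear predictor accounts for about 0.32 squared, or roughly 10%, of the variance. Ridge regression is least squares with a penalty that shrinks the fitted weights. The sketch below shows both for a single predictor. It is a toy illustration with made-up data and hypothetical function names, not the patch-based regression used in the study.

```python
def ridge_slope(xs, ys, penalty):
    """Closed-form ridge estimate for one centered predictor:
    sum(x * y) / (sum(x * x) + penalty)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / (sxx + penalty)


def variance_explained(xs, ys, slope):
    """1 - residual sum of squares / total sum of squares, using centered data."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    ss_res = sum(((y - mean_y) - slope * (x - mean_x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Share of variance implied by the reported pixelwise correlation.
variance_from_correlation = (-0.32) ** 2  # about 0.10

# Made-up data: brighter pixels are nearer (smaller log depth).
intensity = [0.0, 1.0, 2.0, 3.0]
log_depth = [3.0, 2.0, 1.0, 0.0]
plain_slope = ridge_slope(intensity, log_depth, penalty=0.0)
shrunk_slope = ridge_slope(intensity, log_depth, penalty=5.0)
```

With a penalty of zero the estimate reduces to ordinary least squares; increasing the penalty pulls the slope toward zero, trading some fit on the training data for stability when there are many correlated predictors, as with the pixels of an image patch.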

A different approach was taken by a group who collected depth images that were not simultaneously registered with any optical images. These depth data could be used to explain several classic optical illusions. Three-dimensional geometry could provide reasons why vertical lines are judged to be longer than horizontal lines of the same length (Howe & Purves 2002[29]), why the perception of angles is biased towards the right angle (Howe & Purves 2005a[30]), several illusions involving arranged circles (Howe & Purves 2004[31]), the Müller-Lyer illusion (Howe & Purves 2005b[32]), and the Poggendorff illusion (Howe et al. 2005[33]). These findings suggest that several biases in the human visual system may be adaptive for making judgments about three-dimensional geometry from the information contained in two-dimensional images.

Monaural auditory scenes

[Content needed.]

Binaural auditory scenes

[Content needed.]

Notes

This article was originally created on Feb. 4, 2009 using excerpts from Almon Ing's PhD Dissertation (as a service to the community).

References

  1. ^ Geisler, W. S., & Diehl, R. L. (2003). A Bayesian approach to the evolution of perceptual and cognitive systems. Cognitive Science, 27, 379-402.
  2. ^ Simoncelli, E. P. and B. A. Olshausen (2001). Natural image statistics and neural representation. Annual Review of Neuroscience 24: 1193-1216.
  3. ^ Geisler, W.S. (2008) Visual perception and the statistical properties of natural scenes. Annual Review of Psychology, 59, 167-192.
  4. ^ Geisler, W.S., Perry, J.S. and Ing, A.D. (2008) Natural systems analysis. In: B. Rogowitz and T. Pappas (Eds.), Human Vision and Electronic Imaging. Proceedings SPIE, Vol 6806, 68060M
  5. ^ Field, D. J. (1987). Relations between the statistics of natural images and the response properties of cortical cells. Journal of the Optical Society of America A 4, 2379-2394.
  6. ^ Ruderman, D. L., & Bialek, W. (1994). Statistics of Natural Images - Scaling in the Woods. Physical Review Letters, 73(6), 814-817.
  7. ^ Brady, N., & Field, D. J. (2000). Local contrast in natural images: normalisation and coding efficiency. Perception, 29, 1041-1055.
  8. ^ Albrecht, D. G., & Hamilton, D. H. (1982). Striate cortex of monkey and cat: Contrast response function. Journal of Neurophysiology, 48(1), 217-237.
  9. ^ Frazor, R.A., Geisler, W.S. (2006) Local luminance and contrast in natural images. Vision Research, 46, 1585-1598.
  10. ^ Mante et al. (2005) Independence of luminance and contrast in natural scenes and in the early visual system. Nature Neuroscience, 8 (12) 1690-1697.
  11. ^ Bell, A. J., & Sejnowski, T. J. (1997). The "independent components" of natural scenes are edge filters. Vision Research, 37, 3327-3338.
  12. ^ Olshausen, B. A., & Field, D. J. (1997). Sparse coding with an overcomplete basis set: A strategy by V1? Vision Research, 37(23), 3311-3325.
  13. ^ Sigman, M., Cecchi, G. A., Gilbert, C. D., & Magnasco, M. O. (2001). On a common circle: Natural scenes and Gestalt rules. PNAS, 98(4), 1935-1940.
  14. ^ Hoyer, PO and Hyvärinen, A. A multi-layer sparse coding network learns contour coding from natural images, Vis. Res., vol. 42, no. 12, pp. 1593-1605, 2002.
  15. ^ Geisler, W. S., Perry, J. S., Super, B. J., & Gallogly, D. P. (2001). Edge co-occurrence in natural images predicts contour grouping performance. Vision Research, 41, 711-724.
  16. ^ Elder JH, Goldberg RM. (2002) Ecological statistics for the Gestalt laws of perceptual organization of contours. J. Vis. 2:324–53.
  17. ^ Krinov, E. (1947). Spectral reflectance properties of natural formations (Technical translation No. TT-439). Ottawa: National Research Council of Canada.
  18. ^ Ruderman, D. L., Cronin, T. W., & Chiao, C. (1998). Statistics of cone responses to natural images: implications for visual coding. Journal of the Optical Society of America A, 15, 2036-2045.
  19. ^ Stockman, A., MacLeod, D. I. A., & Johnson, N. E. (1993). Spectral sensitivities of the human cones. J Opt Soc Am A Opt Image Sci Vis, 10, 1396-1402.
  20. ^ Lee TW, Wachtler, T, Sejnowski, TJ. (2002) Color opponency is an efficient representation of spectral properties in natural scenes. Vision Research 42:2095-2103.
  21. ^ Fine, I., MacLeod, D. I. A., & Boynton, G. M. (2003). Surface segmentation based on the luminance and color statistics of natural scenes. Journal of the Optical Society of America a-Optics Image Science and Vision, 20(7), 1283-1291.
  22. ^ Lewis A, Zhaoping L. (2006) Are cone sensitivities determined by natural color statistics? Journal of Vision. 6:285-302.
  23. ^ Lovell PG et al. (2005) Stability of the color-opponent signals under changes of illuminant in natural scenes. J. Opt. Soc. Am. A 22:10.
  24. ^ Endler, J.A. 1993. The color of light in forests and its implications. Ecological Monographs 63:1-27.
  25. ^ Wachtler T, Lee TW, Sejnowski TJ (2001) Chromatic structure of natural scenes. J. Opt. Soc. Am. A 18(1):65-77.
  26. ^ Long F, Yang Z, Purves D. (2006) Spectral statistics in natural scenes predict hue, saturation, and brightness. PNAS 103(15):6013-6018.
  27. ^ Van Hateren, J. H., & Ruderman, D. L. (1998). Independent component analysis of natural image sequences yields spatio-temporal filters similar to simple cells in primary visual cortex. Proceedings of the Royal Society of London B, 265, 2315-2320.
  28. ^ Potetz, B., & Lee, T. S. (2003). Statistical correlations between two-dimensional images and three-dimensional structures in natural scenes. Journal of the Optical Society of America a-Optics Image Science and Vision, 20(7), 1292-1303.
  29. ^ Howe, C. Q., & Purves, D. (2002). Range image statistics can explain the anomalous perception of length. Proceedings of the National Academy of Sciences of the United States of America, 99(20), 13184-13188.
  30. ^ Howe, C. Q., & Purves, D. (2005a). Natural-scene geometry predicts the perception of angles and line orientation. Proceedings of the National Academy of Sciences of the United States of America, 102(4), 1228-1233.
  31. ^ Howe, C. Q., & Purves, D. (2004). Size contrast and assimilation explained by the statistics of natural scene geometry. Journal of Cognitive Neuroscience, 16(1), 90-102.
  32. ^ Howe, C. Q., & Purves, D. (2005b). The Müller-Lyer illusion explained by the statistics of image-source relationships. Proceedings of the National Academy of Sciences of the United States of America, 102(4), 1234-1239.
  33. ^ Howe, C. Q., Yang, Z. Y., & Purves, D. (2005). The Poggendorff illusion explained by natural scene geometry. Proceedings of the National Academy of Sciences of the United States of America, 102(21), 7707-7712.