Scene statistics: Difference between revisions

From Wikipedia, the free encyclopedia
mNo edit summary
Removed the detailed stuff (i.e. jargon). Removed jargon warning.
{{Cleanup-jargon|date=March 2010}}

'''Scene statistics''' is a discipline within the field of [[perception]]. It is based on the premise that a [[perceptual system]] is designed to interpret [[scene_(perception)|scenes]].




The image above<ref>Geisler, W.S., Perry, J.S. and Ing, A.D. (2008) Natural systems analysis. In: B. Rogowitz and T. Pappas (Eds.), Human Vision and Electronic Imaging. Proceedings SPIE, Vol 6806, 68060M</ref> was generated from a database of segmented leaves that simultaneously registers natural images (scene information) with the exact locations of leaf boundaries (information about the physical environment). Such a database can be used to study across-domain statistics.

==Still monocular images==
===Achromatic===

In 1987, Field<ref>Field, D. J. (1987). Relations between the statistics of natural images and the response properties of cortical cells. Journal of the Optical Society of America A 4, 2379-2394.</ref> conducted a classic investigation of luminance contrast in natural images, showing that the amplitude of image intensity is inversely proportional to spatial frequency in the Fourier domain. This suggested a type of redundancy in the way images tend to be coded. In simple terms, the result means that as the distance between any two pixels increases, the expected difference in pixel intensities also increases.

In 1994, Ruderman and Bialek<ref>Ruderman, D. L., & Bialek, W. (1994). Statistics of Natural Images - Scaling in the Woods. Physical Review Letters, 73(6), 814-817.</ref> expanded on Field’s work by showing that the amplitude spectrum for a log-luminance image captured in a forested environment tends to be proportional to (1/''f'')<sup>0.91</sup>, where ''f'' is the spatial frequency. For any single image, only small deviations from this rule are expected.
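
The power-law relationship above can be checked numerically by fitting a straight line to the amplitude spectrum on log-log axes. The following is a minimal sketch, not the procedure used in either paper: the function name <code>fit_spectral_exponent</code> and the synthetic spectrum are assumptions made for illustration.

```python
import math

def fit_spectral_exponent(freqs, amplitudes):
    """Estimate alpha in amplitude ~ 1 / f**alpha.

    Fits an ordinary least-squares line to log(amplitude) versus
    log(frequency) and returns the negated slope.
    """
    xs = [math.log(f) for f in freqs]
    ys = [math.log(a) for a in amplitudes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx

# Synthetic (hypothetical) spectrum that follows the reported rule exactly.
freqs = [1, 2, 4, 8, 16, 32]
amps = [3.0 * f ** -0.91 for f in freqs]
alpha = fit_spectral_exponent(freqs, amps)
print(round(alpha, 2))
```

With a real image, the amplitudes would first have to be obtained from a two-dimensional Fourier transform and averaged over orientation; that step is omitted here.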

Brady & Field (2000)<ref>Brady, N., & Field, D. J. (2000). Local contrast in natural images: normalisation and coding efficiency. Perception, 29, 1041-1055.</ref> measured the distribution of contrasts in natural scenes and derived an optimal contrast response function based on a histogram equalization technique. The derived function matched the observed contrast response function of cortical cells studied by Albrecht & Hamilton (1982),<ref>Albrecht, D. G., & Hamilton, D. H. (1982). Striate cortex of monkey and cat: Contrast response function. Journal of Neurophysiology, 48(1), 217-237.</ref> demonstrating that neural processing reflects the statistics of natural images.
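
In its generic form, histogram equalization maps each contrast value to the fraction of observed contrasts at or below it, so that every response level is used equally often. The sketch below illustrates that general idea only; it is not Brady & Field's implementation, and the function name and sample contrasts are hypothetical.

```python
from bisect import bisect_right

def equalizing_response(samples):
    """Build a response function from observed contrast samples.

    The returned function is the empirical cumulative distribution:
    response(c) is the fraction of samples less than or equal to c.
    """
    ordered = sorted(samples)
    n = len(ordered)

    def response(contrast):
        return bisect_right(ordered, contrast) / n

    return response

# Hypothetical contrast measurements, for illustration only.
contrasts = [0.05, 0.1, 0.1, 0.2, 0.4, 0.8]
response = equalizing_response(contrasts)
print(response(0.1), response(0.8))
```

Because low contrasts are more common in this sample, the response rises steeply at low contrast and flattens at high contrast.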

Frazor & Geisler (2006)<ref>Frazor, R.A., Geisler, W.S. (2006) Local luminance and contrast in natural images. Vision Research, 46, 1585-1598.</ref> also explored the range of local contrast information present in natural images. They showed that the dynamic range of luminance in natural images, including 95% of luminance samples, is 1.0 log unit (base 10), implying that luminance tends to vary by a factor of 10 in a given natural image. Similarly, the dynamic range of local RMS contrast is 1.5 log units, implying that local RMS contrast tends to vary by a factor of about 30 in a given natural image. There was no correlation in the luminance of two image patches separated by distances consistent with the typical lengths of human saccades. There was also no correlation in local contrast. In addition, local luminance and contrast are generally not correlated with each other within a given image (except that patches of sky stand out with high luminance and low contrast), implying that luminance and contrast are two independent sources of information. Mante et al. (2005)<ref>Mante et al. (2005) Independence of luminance and contrast in natural scenes and in the early visual system. Nature Neuroscience, 8 (12) 1690-1697.</ref> showed that neural processing reflects these findings: gain control for the two sources of information is independent in the lateral geniculate nucleus (LGN).
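
The quantities discussed above can be illustrated with a short sketch. It assumes the common unweighted definition of RMS contrast (standard deviation of luminance divided by mean luminance); the windowing used in the cited study is not reproduced, and the patch values are hypothetical.

```python
import math

def local_luminance(patch):
    """Mean luminance of a patch given as a flat list of pixel values."""
    return sum(patch) / len(patch)

def rms_contrast(patch):
    """Standard deviation of luminance divided by mean luminance."""
    mean = local_luminance(patch)
    variance = sum((p - mean) ** 2 for p in patch) / len(patch)
    return math.sqrt(variance) / mean

def log_units_to_ratio(log_units):
    """Convert a dynamic range in base-10 log units to a multiplicative factor."""
    return 10.0 ** log_units

patch = [80.0, 100.0, 120.0, 100.0]  # hypothetical luminance values
print(local_luminance(patch), rms_contrast(patch))
print(log_units_to_ratio(1.0), log_units_to_ratio(1.5))
```

The last line shows why 1.5 log units corresponds to a factor of roughly 30: ten raised to the power 1.5 is approximately 31.6.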

Local patterns of contrast in natural images tend to be of a specific variety, and this has important implications for efficient coding of natural images. Using a technique similar to blind source separation, Bell & Sejnowski (1997)<ref>Bell, A. J., & Sejnowski, T. J. (1997). The "independent components" of natural scenes are edge filters. Vision Research, 37, 3327-3338.</ref> revealed that the most informative components of natural scenes, which they called independent components, are similar to two-dimensional log-Gabor filters. Olshausen & Field (1997)<ref>Olshausen, B. A., & Field, D. J. (1997). Sparse coding with an overcomplete basis set: A strategy by V1? Vision Research, 37(23), 3311-3325.</ref> achieved the same result using a similar technique. These independent components look like edges, so both groups argue that if the goal of the visual system is to encode contrast, then an efficient method for doing so would be to encode these edge-like patterns. The visual system may follow this principle because these independent components resemble the receptive fields in V1, although their spatial frequency is too high on average (this may be a result of the whitening procedure used, the details of which are unclear).

Real images, however, are not really composed of disjoint edge elements subtending small spatial scales. Instead they are composed of long and smooth contours. Sigman et al. (2001)<ref>Sigman, M., Cecchi, G. A., Gilbert, C. D., & Magnasco, M. O. (2001). On a common circle: Natural scenes and Gestalt rules. PNAS, 98(4), 1935-1940.</ref> demonstrated that edges tend to be mutually arranged as if they come from contours of constant curvature, also called cocircular contours. Similarly, Hoyer & Hyvärinen (2002)<ref>Hoyer, PO and Hyvärinen, A. A multi-layer sparse coding network learns contour coding from natural images, Vis. Res., vol. 42, no. 12, pp. 1593-1605, 2002.</ref> discovered that the independent components of edge activation ensembles correspond to extended contours and end-stop formations (since some contours tend to reach an end point). Both studies are important because they demonstrate that contours, like those found in natural environments, are a higher-order source of variation in natural images than disjoint edge elements.

Geisler et al. (2001)<ref>Geisler, W. S., Perry, J. S., Super, B. J., & Gallogly, D. P. (2001). Edge co-occurrence in natural images predicts contour grouping performance. Vision Research, 41, 711-724.</ref> took this a step further by exploring the inferential power of edge ensembles with respect to predicting the path of contours in natural images. By analyzing the co-occurrence statistics of pairs of edges, not only did Geisler et al. confirm the observations of Sigman et al. about cocircular contours, but they precisely quantified the inferential power of the related across-domain statistics. Elder & Goldberg (2002)<ref>Elder JH, Goldberg RM. (2002) Ecological statistics for the Gestalt laws of perceptual organization of contours. J. Vis. 2:324–53.</ref> later confirmed the robustness of that result by using a slightly different method to arrive at a similar result. Geisler et al. also demonstrated that the statistics were important by showing how they accounted for approximately 80% of the variance of human performance in a contour completion task exploring a broad range of contour shapes. The work demonstrated that the human visual system is capable of behaving in accordance with the statistics of contour grouping in natural images, so it has led to a viable hypothesis describing the nature of algorithms in early areas of visual processing.

===Chromatic===
Krinov (1947)<ref>Krinov, E. (1947). Spectral reflectance properties of natural formations (Technical translation No. TT-439). Ottawa: National Research Council of Canada.</ref> gathered natural materials and demonstrated that the reflectance spectra of these materials tend to vary smoothly and regularly as a function of wavelength, to the extent that only three principal components are required to capture most of this variation. Since irradiance spectra and reflectance spectra are so regular, radiance spectra are also very regular. The human visual system may be limited to three kinds of photoreceptors because an additional type of receptor would not add any more knowledge about radiances typically observed in the environment. This leads to the idea of trichromacy: that the color of something can be described by three numbers.

Ruderman, Cronin, & Chiao (1998)<ref>Ruderman, D. L., Cronin, T. W., & Chiao, C. (1998). Statistics of cone responses to natural images: implications for visual coding. Journal of the Optical Society of America A, 15, 2036-2045.</ref> studied trichromatic images captured in the natural forest environment. The basis for the trichromatic signal was to filter the image through the wavelength sensitivity functions of human L (long wavelength), M (medium wavelength), and S (short wavelength) cones (as measured by Stockman, MacLeod, and Johnson, 1993<ref>Stockman, A., MacLeod, D. I. A., & Johnson, N. E. (1993). Spectral sensitivities of the human cones. J Opt Soc Am A Opt Image Sci Vis, 10, 1396-1402.</ref>) to mimic the trichromatic signal received by the human visual system. They found that for a random image pixel, the log cone responses were Gaussian distributed and had three principal components. The first principal component was denoted <math>l</math> because of its resemblance to the classic idea of luminance and was roughly proportional to <math>\log(L)+\log(M)+\log(S)</math>; the second principal component, <math>\alpha</math>, was roughly proportional to <math>\log(L)+\log(M)-2\log(S)</math> and resembled a yellow-blue opponent channel; and the third principal component, <math>\beta</math>, was roughly proportional to <math>\log(L)-\log(M)</math> and resembled a red-green opponent channel. Therefore, color variation in a forested environment is well described by three numbers, <math>l</math>, <math>\alpha</math>, and <math>\beta</math>, because these values are Gaussian distributed and uncorrelated with each other.
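
The three combinations above can be written as a small transformation of log cone responses. This is a sketch under stated assumptions: the text only gives the components up to proportionality, so the unit-length scaling (dividing by the square roots of 3, 6, and 2) is a choice made here, the function name <code>cone_to_opponent</code> is hypothetical, and any centering of the log responses is omitted.

```python
import math

def cone_to_opponent(L, M, S):
    """Map positive cone responses (L, M, S) to (l, alpha, beta).

    Uses the combinations log L + log M + log S, log L + log M - 2 log S,
    and log L - log M, each scaled to unit length (an assumption; the
    source states them only up to a constant of proportionality).
    """
    log_l, log_m, log_s = math.log10(L), math.log10(M), math.log10(S)
    l = (log_l + log_m + log_s) / math.sqrt(3)
    alpha = (log_l + log_m - 2 * log_s) / math.sqrt(6)
    beta = (log_l - log_m) / math.sqrt(2)
    return l, alpha, beta

# Equal cone responses carry no opponent signal: alpha and beta are zero.
l, alpha, beta = cone_to_opponent(10.0, 10.0, 10.0)
print(l, alpha, beta)
```

The three weight vectors are mutually orthogonal, which is consistent with their interpretation as principal components.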

Lee et al. (2002)<ref>Lee TW, Wachtler, T, Sejnowski, TJ. (2002) Color opponency is an efficient representation of spectral properties in natural scenes. Vision Research 42:2095-2103.</ref> were interested in explaining why the cone responses are highly correlated. One reason for this must be that the spectral sensitivity functions of the cones overlap, but another explanation might be that natural radiance functions tend to be smooth (as opposed to choppy). To address this question, Lee et al. invented cone sensitivity functions that did not overlap and then simulated the perception of natural radiances with those sensitivity functions. The correlational structure was still apparent in the simulated cone responses. This was important because it demonstrated that the smoothness of natural radiance functions was a major reason why this correlational structure exists.

The three-dimensional distribution of <math>l</math>, <math>\alpha</math>, and <math>\beta</math> values may be Gaussian in forested environments, but Fine, MacLeod, and Boynton (2003)<ref>Fine, I., MacLeod, D. I. A., & Boynton, G. M. (2003). Surface segmentation based on the luminance and color statistics of natural scenes. Journal of the Optical Society of America a-Optics Image Science and Vision, 20(7), 1283-1291.</ref> showed that for a random pair of image pixels, the expected difference in <math>l</math>, <math>\alpha</math>, and <math>\beta</math> values is leptokurtic. In other words, when moving around an image, the values of <math>l</math>, <math>\alpha</math>, and <math>\beta</math> change by either a small or a large magnitude, with relatively few changes of medium-sized magnitude (relative to Gaussian normality). In addition, there were slight correlations in the magnitude of change between <math>l</math>, <math>\alpha</math>, and <math>\beta</math>, suggesting that when one value changes, the other values are likely to change. This may be attributable to the fact that when both pixels lie on the same surface, the radiance is relatively constant, causing no change in the values; but when the pixels lie on different surfaces, the radiance changes drastically, causing a change in all the values.
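
Leptokurtosis can be quantified with the excess kurtosis, which is zero for a Gaussian distribution and positive for a distribution with a sharper peak and heavier tails. The following minimal sketch uses the population formula; the difference values are hypothetical and chosen only to mimic many near-zero changes with a few large ones.

```python
def excess_kurtosis(values):
    """Fourth central moment over squared variance, minus 3.

    Positive values indicate a leptokurtic distribution; a Gaussian
    distribution gives 0.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 * m2) - 3.0

# Hypothetical pixel-pair differences: mostly no change, occasionally large.
differences = [0.0] * 8 + [5.0, -5.0]
print(excess_kurtosis(differences))
```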

Lewis and Zhaoping (2006)<ref>Lewis A, Zhaoping L. (2006) Are cone sensitivities determined by natural color statistics? Journal of Vision. 6:285-302.</ref> later demonstrated that the shape of the cone sensitivity functions is suboptimal for capturing the variance in natural radiance spectra. One possible explanation is that it was difficult for evolution to produce the required sensitivity functions, but a different explanation is that the sensitivity functions are optimized for a specific task that is important for primate survival. One possibility is that the shape helps in discriminating red surfaces (e.g. fruit) from green surfaces (e.g. leaves) given all the variation in the environment due to shading. Lovell et al. (2005)<ref>Lovell PG et al. (2005) Stability of the color-opponent signals under changes of illuminant in natural scenes. J. Opt. Soc. Am. A 22:10.</ref> studied this problem with respect to hyperspectral images containing red fruit and green backgrounds under various lighting conditions and concluded that the L (long wavelength) and M (medium wavelength) cones are nearly optimal for making this discrimination because their sensitivity functions are broadly tuned with adjacent peaks. The starling (a type of bird), on the other hand, has narrower sensitivity functions with a large gap between the peaks, and this proves to be a less optimal design for solving the task.

A third possible explanation for the shape of cone sensitivity functions may be that it helps to isolate the sources of image contrast due to shading. Endler (1993)<ref>Endler, J.A. 1993. The color of light in forests and its implications. Ecological Monographs 63:1-27.</ref> studied the phenomenon of shading by measuring the irradiance in several different forest environments, ranging from dense shade (no direct sunlight) to no shade (direct sunlight), from cloudy to sunny days, and through all times of day. The result was that the irradiance varied principally along one major axis, ranging from the color of blue sky at one extreme to green foliage at the other. Another axis of variation was introduced by orange, red, or purple sunsets, but that was considered relatively minor and therefore insignificant. Since the primary variation could be accounted for in the range of colors from blue to green, the relative activity of the S and M cones is more strongly affected by this variation while the relative activity between the M and L cones is more immune. Therefore, information about total changes in luminance (<math>l</math>) relative to changes in red-green levels (<math>\beta</math>) should be valuable for isolating the artifacts of shading in images.

Wachtler et al. (2001) studied the possibility that the spatial distribution of color in images might show regularity. They analyzed image patches, square arrangements of 49 pixels (7 x 7 arrangements), encoded with respect to the log of L, M, and S cone responses. A principal component analysis revealed eigenvectors that were segregated along the dimensions of <math>l</math>, <math>\alpha</math>, and <math>\beta</math>. Within these three groups, a spatial arrangement resembling checkerboard patterns was observed. The size of the checkers (from large to small) predicted the rank order of the proportion of variance accounted for (from large to small) by each eigenvector, exactly as one would expect from a principal component analysis of grayscale images (encoded with only one color channel). In other words, the principal components of color were completely separable from the principal components of spatial variation.

However, a different result emerges when an independent component analysis is run. Wachtler et al. (2001)<ref>Wachtler T, Lee TW, Sejnowski TJ (2001) Chromatic structure of natural scenes. J. Opt. Soc. Am. A 18(1):65-77.</ref> borrowed the technique of Bell & Sejnowski (1997), but used color image patches instead of grayscale. The independent components did not show segregation along three color channels as the principal components had. Therefore, unlike the principal components, the independent components of natural images reflect a mixed relationship between spatial variation and color variation. Wachtler et al. suggest that the representation of color in LGN, with segregated <math>l</math>, <math>\alpha</math>, and <math>\beta</math> color channels, is more like the representation uncovered by a principal component analysis, but that the representation in V1, with mixed color channels, is more like the representation uncovered by an independent component analysis, especially in light of the high similarity between independent components and V1 receptive fields (referring to Van Hateren & Ruderman 1998). Regardless of the possible similarity to neural receptive fields, independent components suggest that edges may have a specific color signature, mixing information across several color channels at once.

Work unrelated to natural scene statistics has shown that human color vision is notoriously difficult to characterize. Researchers are unable to uncover a single color space amenable to the psychophysical phenomena related to color perception. Instead, researchers have uncovered several mysterious phenomena that defy simple explanation. Long, Yang, & Purves (2006)<ref>Long F, Yang Z, Purves D. (2006) Spectral statistics in natural scenes predict hue, saturation, and brightness. PNAS 103(15):6013-6018.</ref> tried to explain these mysteries by demonstrating that the 3-dimensional frequency distribution of trichromatic values in natural images, when mapped onto any color space, is highly irregular (a hyper-surface with irregularly distributed hills and valleys). Histogram equalization is a way of warping this surface to a non-Euclidean space so that it becomes a hyper-plane. This warping seems to predict most of the mysterious phenomena (to extents that were not measured). It seems that the visual system somehow represents color in an efficient manner, as the warped space would suggest. This suggests that the human visual system is built around the statistics of color in natural images.

==Still binocular images==
[Content needed.]

==Moving monocular images (movies)==
Van Hateren & Ruderman (1998)<ref>Van Hateren, J. H., & Ruderman, D. L. (1998). Independent component analysis of natural image sequences yields spatio-temporal filters similar to simple cells in primary visual cortex. Proceedings of the Royal Society of London B, 265, 2315-2320.</ref> ran an Independent Component Analysis of moving image patches and showed that when the goal is to encode movies instead of still images, the independent components are moving spatial gratings with properties that strongly and precisely mimic the properties of V1 receptive fields.

==Moving binocular images==
[Content needed.]

==Properties of 2-dimensional images and depth==
Potetz & Lee (2003)<ref>Potetz, B., & Lee, T. S. (2003). Statistical correlations between two-dimensional images and three-dimensional structures in natural scenes. Journal of the Optical Society of America a-Optics Image Science and Vision, 20(7), 1292-1303.</ref> demonstrated a relationship between depth and optical image properties by assembling a database of depth images simultaneously registered with grayscale optical images. Some of the scenes were natural while others contained manmade objects. The images were always captured with the horizontal axis parallel to the horizon. There was a -0.32 correlation between optical intensity and relative depth, but only for rural images. They performed [[ridge regression]] on 25 x 25 square image patches and predicted 21% of the variance in relative depth (i.e. log depth). This work demonstrated that there exists a relationship between properties of two-dimensional optical images and the depth at locations within those images.
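
Ridge regression estimates linear weights by solving the regularized normal equations (X<sup>T</sup>X + λI)w = X<sup>T</sup>y. The sketch below shows that computation on a toy problem; it is not the authors' code, the function name <code>ridge_fit</code> and the data are hypothetical, and a real application would use one feature per pixel of each 25 x 25 patch.

```python
def ridge_fit(X, y, lam):
    """Solve (X^T X + lam * I) w = X^T y.

    X is a list of feature rows, y a list of targets, lam the ridge
    penalty. Uses Gaussian elimination with partial pivoting.
    """
    k = len(X[0])
    # Augmented matrix [X^T X + lam * I | X^T y].
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(k)]
         + [sum(row[i] * t for row, t in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for c in range(col, k + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    w = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(A[i][j] * w[j] for j in range(i + 1, k))
        w[i] = (A[i][k] - tail) / A[i][i]
    return w

# Toy data: two hypothetical image features per sample, one depth target.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]
w_plain = ridge_fit(X, y, 0.0)   # ordinary least squares
w_ridge = ridge_fit(X, y, 1.0)   # weights shrink toward zero
print(w_plain, w_ridge)
```

Increasing the penalty shrinks the weights, which helps when the number of features is large relative to the number of samples.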

A different approach was taken by a group who collected depth images that were not simultaneously registered with any optical images. These depth data could be used to justify several classic optical illusions. Three-dimensional geometry could provide reasons why vertical lines are judged to be longer than horizontal lines of the same length (Howe & Purves 2002<ref>Howe, C. Q., & Purves, D. (2002). Range image statistics can explain the anomalous perception of length. Proceedings of the National Academy of Sciences of the United States of America, 99(20), 13184-13188.</ref>), why the perception of angles is biased towards the right angle (Howe & Purves 2005a<ref>Howe, C. Q., & Purves, D. (2005a). Natural-scene geometry predicts the perception of angles and line orientation. Proceedings of the National Academy of Sciences of the United States of America, 102(4), 1228-1233.</ref>), several illusions involving arranged circles (Howe & Purves 2004<ref>Howe, C. Q., & Purves, D. (2004). Size contrast and assimilation explained by the statistics of natural scene geometry. Journal of Cognitive Neuroscience, 16(1), 90-102.</ref>), the Müller-Lyer illusion (Howe & Purves 2005b<ref>Howe, C. Q., & Purves, D. (2005b). The Muller-Lyer illusion explained by the statistics of image-source relationships. Proceedings of the National Academy of Sciences of the United States of America, 102(4), 1234-1239.</ref>), and the Poggendorff illusion (Howe et al. 2005<ref>Howe, C. Q., Yang, Z. Y., & Purves, D. (2005). The Poggendorff illusion explained by natural scene geometry. Proceedings of the National Academy of Sciences of the United States of America, 102(21), 7707-7712.</ref>). These findings serve to demonstrate that several biases in the human visual system may be adaptive for making judgments about three-dimensional geometry from the information contained in two-dimensional images.

==Monaural auditory scenes==
[Content needed.]

==Binaural auditory scenes==
[Content needed.]

==Notes==
This article was originally created on Feb. 4, 2009 using excerpts from Almon Ing's PhD Dissertation (as a service to the community).


== References ==

Revision as of 17:25, 5 June 2010

Scene statistics is a discipline within the field of perception. It is based on the premise that a perceptual system is designed to interpret scenes.

Biological perceptual systems have evolved in response to physical properties of natural environments[1]. Therefore natural scenes receive a great deal of attention[2].

Natural scene statistics are useful for defining the behavior of an ideal observer in a naturalistic task, typically by incorporating signal detection theory, information theory, or estimation theory.

Within-domain vs. across-domain

Geisler (2008)[3] distinguishes between four kinds of domains: (1) Physical environments, (2) Images/Scenes, (3) Neural responses, and (4) Behavior.

Within the domain of images/scenes, one can study the characteristics of information related to redundancy and efficient coding.

Across-domain statistics determine how an autonomous system should make inferences about its environment, process information, and control its behavior. To study these statistics, it is necessary to sample or register information in multiple domains simultaneously.

The image above[4] was generated from a database of segmented leaves that simultaneously registers natural images (scene information) with the exact locations of leaf boundaries (information about the physical environment). Such a database can be used to study across-domain statistics.

References

  1. ^ Geisler, W. S., & Diehl, R. L. (2003). A Bayesian approach to the evolution of perceptual and cognitive systems. Cognitive Science, 27, 379-402.
  2. ^ Simoncelli, E. P. and B. A. Olshausen (2001). Natural image statistics and neural representation. Annual Review of Neuroscience 24: 1193-1216.
  3. ^ Geisler, W.S. (2008) Visual perception and the statistical properties of natural scenes. Annual Review of Psychology, 59, 167-192.
  4. ^ Geisler, W.S., Perry, J.S. and Ing, A.D. (2008) Natural systems analysis. In: B. Rogowitz and T. Pappas (Eds.), Human Vision and Electronic Imaging. Proceedings SPIE, Vol 6806, 68060M