Jump to content

Structure tensor

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by 143.106.24.25 (talk) at 21:34, 20 August 2010 (→‎The multi-scale structure tensor/second-moment matrix: Moving the multiscale discussion to a separate section). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

In mathematics, the structure tensor, also referred to as the second-moment matrix, is a matrix derived from the gradient of a function. The structure tensor is often used in image processing and computer vision.[1][2][3].

The 2D structure tensor

Continuous version

For a function of two variables p=(x,y), the structure tensor is the 2×2 matrix

where and are the partial derivatives of with respect to x and y; the integrals range over the plane ; and w is some fixed "window function", a distribution on two variables. Note that the matrix Sw is itself a function of p=(x,y).

The formula above can be written also as , where is the matrix-valued function defined by

If the gradient gradient of is viewed as a 1×2 (single-row) matrix, the matrix can be written as the matrix product , where denotes the 2×1 (single-column) transpose of the gradient. (Note however that the structure tensor cannot be factored in this way.)

Discrete version

In image processing and other similar applications, the function is usually given as a discrete array of samples , where p is a pair of integer indices. The 2D structure tensor at a given pixel is usually taken to be the discrete sum

Here the summation index r ranges over a finite set of index pairs (the "window", typically for some m), and w[r] is a fixed "window weight" that depends on r, such that the sum of all weights is 1. The values are the partial derivatives sampled at pixel p; which, for instance, may be estimated from by by finite difference formulas.

The formula of the structure tensor can be written also as , where is the matrix-valued array such that

Interpretation

The importance of the 2D structure tensor stems from the fact that its eigenvalues (which can be ordered so that ) and the corresponding eigenvectors summarize the distribution of the gradient of within the window centered at .[1][2][3]

Namely, if , then (or ) is the direction that is maximally aligned with the gradient within the window. In particular, if then the gradient is always a multiple of (positive, negative or zero); this is the case if and only if within the window varies along the direction but is constant along .

If , on the other hand, the gradient in has no predominant direction; which happens, for instance, when the image has rotational symmetry within that window. In particular, if and only if the function is constant () within .

More generally, the value of , for k=1 or k=2, is the weighted average of the square of the directional derivative of along within . The relative discrepancy between the two eigenvalues of is an indicator of the degree of anisotropy of the gradient in the window, namely how strongly is its biased towards a particular direction (and its opposite).[4][5] This attribute can be quantified by the coherence, defined as

if . This quantity is 1 when the gradient is totally aligned, and 0 when it has no preferred direction. The formula is undefined, even in the limit, when the image is constant in the window (); but some authors define it as 0 in that case.

Note that the average of the gradient inside the window is not a good indicator of anisotropy. Aligned but oppositely oriented gradients would cancel out in this average, whereas in the structure tensor they are properly added together.[6]

By increasing the window size and spreading out the weights, one can make the structure tensor more robust in the face of noise, at the cost of diminished spatial resolution.[5], which reflects the abovementioned scale-space property of directional data as obtained from the multi-scale structure tensor [7].

The 3D structure tensor

Definition

The structure tensor can be defined also for a function of three variables p=(x,y,z) in an entirely analogous way. Namely, in the continuous version we have , where

where are the three partial derivatives of , and the integral ranges over . In the discrete version,, where

and the sum ranges over a three-dimensional window , usually for some m.

Interpretation

As in the two-dimensional case, the eigenvalues of , and the corresponding eigenvectors , summarize the distribution of gradient directions within the -neighborhood of p. This information can be visualized as an ellipsoid whose semi-axes are equal to the eigenvalues and directed along their corresponding eigenvectors.[8]

Ellipsoidal representation of the 3D structure tensor.

In particular, if the ellipsoid is stretched along one axis only, like a cigar (that is, if is much larger than both and ), it means that the gradient in the window is predominantly aligned with the direction , so that the isosurfaces of tend to be flat and perpendicular to that vector. This situation occurs, for instance, when p lies on a thin plate-like feature, or on the smooth boundary between two regions with contrasting values.

The structure tensor ellipsoid of a surface-like neighborhood ("surfel"), where .
A 3D window straddling a smooth boundary surface between two uniform regions of a 3D image.
The corresponding structure tensor ellipsoid.

If the ellipsoid is flattened in one direction only, like a pancake (that is, if is much smaller than both and ), it means that the gradient directions are spread out but perendicular to ; so that the isosurfaces tend to be like tubes parallel to that vector. This situation occurs, for instance, when p lies on a thin plate-like feature, or on a sharp corner of the boundary between two regions with contrasting values.

The structure tensor of a line-like neighborhood ("curvel"), where .
A 3D window straddling a line-like feature of a 3D image.
The corresponding structure tensor ellipsoid.

Finally, if the ellipsoid is roughly spherical (that is, if ), it means that the gradient directions in the window are more or less evenly distributed, with no marked preference; so that the function is mostly isotropic in that neighborhood. This happens, for instance, when the function has spherical symmetry in the neighborhood of p. In particular, if the ellipsoid degenerates to a point (that is, if the three eigenvalues are zero), it means that is constant (has zero gradient) within the window.

The structure tensor in an isotropic neighborhood, where .
A 3D window containing a spherical feature of a 3D image.
The corresponding structure tensor ellipsoid.

Applications

The eigenvalues of the structure tensor play a significant role in many image processing algorithms, for problems like corner detection, interest point detection, and feature tracking.[9][10][11][12][13][8] The structure tensor also plays a central role in the Lucas-Kanade optical flow algorithm, and in its extensions to estimate affine shape adaptation[14]; where the magnitude of is an indicator of the reliability of the computed result. The tensor has also been used for scale-space analysis,[7], estimation of local surface orientation from monocular or binocular cues[15], non-linear fingerprint enhancement,[16] diffusion-based image processing,[17][18][19][20] and several other image processing problems.

Processing spatio-temporal video data with the structure tensor

The three-dimensional structure tensor has been used to analyze three-dimensional video data (viewed as a function of x, y, and time)[21]. If one in this context aims at image descriptors that are invariant under Galilean transformations to make it possible to compare image measurements that have been obtained under variations of a priori unknown image velocities

,

it is, however, from a computational viewpoint more preferable to parameterize the components in the structure tensor/second-moment matrix using the notion of Galilean diagonalization[22]

where denotes a Galilean transformation of space-time and a two-dimensional rotation over the spatial domain, compared to the abovementioned use of eigenvalues that correspond to an eigenvalue decomposition and a (non-physical) three-dimensional rotation of space-time

.

To obtain true Galilean invariance, however, also the shape of the spatio-temporal window function needs to be adapted[22][23], corresponding to the transfer of affine shape adaptation[14] from spatial to spatio-temporal image data. In combination with local spatio-temporal histogram descriptors,[24] these concepts together allow for Galilean invariant recognition of spatio-temporal events[25].

References

  1. ^ a b J. Bigun and G. Granlund (1986), Optimal Orientation Detection of Linear Symmetry. Tech. Report LiTH-ISY-I-0828, Computer Vision Laboratory, Linkoping University, Sweden 1986; Thesis Report, Linkoping studies in science and technology No. 85, 1986.
  2. ^ a b J. Bigun and G. Granlund (1987.). "Optimal Orientation Detection of Linear Symmetry". First int. Conf. on Computer Vision, ICCV, (London). Piscataway: IEEE Computer Society Press, Piscataway. pp. 433–438. {{cite conference}}: Check date values in: |year= (help); Unknown parameter |booktitle= ignored (|book-title= suggested) (help)CS1 maint: year (link)
  3. ^ a b H. Knutsson (1989). "Representing local structure using tensors". Proceedings 6th Scandinavian Conf. on Image Analysis. Oulu: Oulu University. pp. 244–251. {{cite conference}}: Unknown parameter |booktitle= ignored (|book-title= suggested) (help); line feed character in |booktitle= at position 35 (help)
  4. ^ B. Jahne (1993). Spatio-Temporal Image Processing: Theory and Scientific Applications. Vol. 751. Berlin: Springer-Verlag.
  5. ^ a b G. Medioni, M. Lee and C. Tang (2000). A Computational Framework for Feature Extraction and Segmentation. Elsevier Science. {{cite book}}: Unknown parameter |month= ignored (help)
  6. ^ T. Brox, J. Weickert, B. Burgeth and P. Mrazek (2004). "Nonlinear Structure Tensors" (113): 1–32. {{cite journal}}: Cite journal requires |journal= (help); Unknown parameter |booktitle= ignored (help)CS1 maint: multiple names: authors list (link)
  7. ^ a b Cite error: The named reference lin94book was invoked but never defined (see the help page).
  8. ^ a b M. Nicolescu and G. Medioni (2003). "Motion Segmentation with Accurate Boundaries — A Tensor Voting Approach". Proc. IEEE Computer Vision and Pattern Recognition. Vol. 1. pp. 382–389. {{cite conference}}: Unknown parameter |booktitle= ignored (|book-title= suggested) (help)
  9. ^ W. Förstner (1986). "A Feature Based Correspondence Algorithm for Image Processing". 26: 150–166. {{cite journal}}: Cite journal requires |journal= (help); Unknown parameter |booktitle= ignored (help)
  10. ^ C. Harris and M. Stephens (1988). "A Combined Corner and Edge Detector". Proc. of the 4th ALVEY Vision Conference. pp. 147–151. {{cite conference}}: Unknown parameter |booktitle= ignored (|book-title= suggested) (help)
  11. ^ K. Rohr (1997). "On 3D Differential Operators for Detecting Point Landmarks". 15 (3): 219–233. {{cite journal}}: Cite journal requires |journal= (help); Unknown parameter |booktitle= ignored (help)
  12. ^ B. Triggs (2004). "Detecting Keypoints with Stable Position, Orientation, and Scale under Illumination Changes". Proc. European Conference on Computer Vision. Vol. 4. pp. 100–113. {{cite conference}}: Unknown parameter |booktitle= ignored (|book-title= suggested) (help)
  13. ^ C. Kenney, M. Zuliani and B. Manjunath, (2005). "An Axiomatic Approach to Corner Detection". Proc. IEEE Computer Vision and Pattern Recognition. pp. 191–197. {{cite conference}}: Unknown parameter |booktitle= ignored (|book-title= suggested) (help)CS1 maint: extra punctuation (link) CS1 maint: multiple names: authors list (link)
  14. ^ a b T. Lindeberg and J. Garding (1997). "Shape-adapted smoothing in estimation of 3-D depth cues from affine distortions of local 2-D structure". Image and Vision Computing. 15: pp 415--434. doi:10.1016/S0262-8856(97)01144-X. {{cite journal}}: |pages= has extra text (help)
  15. ^ J. Garding and T. Lindeberg (1996). "Direct computation of shape cues using scale-adapted spatial derivative operators, International Journal of Computer Vision, volume 17, issue 2, pages 163--191.
  16. ^ A. Almansa and T. Lindeberg (2000), Enhancement of fingerprint images using shape-adaptated scale-space operators. IEEE Transactions on Image Processing, volume 9, number 12, pages 2027-2042.
  17. ^ J. Weickert (1998), Anisotropic diffusion in image processing, Teuber Verlag, Stuttgart.
  18. ^ D. Tschumperle and Deriche (2002). "Diffusion PDE's on Vector-Valued Images": 16–25. {{cite journal}}: Cite journal requires |journal= (help); Unknown parameter |booktitle= ignored (help); Unknown parameter |month= ignored (help)
  19. ^ S. Arseneau and J. Cooperstock (2006). "An Asymmetrical Diffusion Framework for Junction Analysis". British Machine Vision Conference. Vol. 2. pp. 689–698. {{cite conference}}: Unknown parameter |booktitle= ignored (|book-title= suggested) (help); Unknown parameter |month= ignored (help)
  20. ^ S. Arseneau, and J. Cooperstock (2006). "An Improved Representation of Junctions through Asymmetric Tensor Diffusion". International Symposium on Visual Computing. {{cite conference}}: Unknown parameter |booktitle= ignored (|book-title= suggested) (help); Unknown parameter |month= ignored (help)
  21. ^ B. Jahne (1993). Spatio-Temporal Image Processing: Theory and Scientific Applications. Vol. 751. Berlin: Springer-Verlag.
  22. ^ a b T. Lindeberg, A. Akbarzadeh, and I. Laptev (2004). "Galilean-corrected spatio-temporal interest operators" (PDF). International Conference on Pattern Recogntion ICPR'04. Vol. I. pp. 57–62. doi:10.1109/ICPR.2004.1334004. {{cite conference}}: Unknown parameter |booktitle= ignored (|book-title= suggested) (help); Unknown parameter |month= ignored (help)CS1 maint: multiple names: authors list (link)
  23. ^ I. Laptev, and T. Lindeberg (2004). "Velocity adaptation of space-time interest points" (PDF). International Conference on Pattern Recogntion ICPR'04. Vol. I. pp. 52–56. doi:10.1109/ICPR.2004.971. {{cite conference}}: Unknown parameter |booktitle= ignored (|book-title= suggested) (help); Unknown parameter |month= ignored (help)
  24. ^ I. Laptev, and T. Lindeberg (2004). "Local descriptors for spatio-temporal recognition" (PDF). ECCV'04 Workshop on Spatial Coherence for Visual Motion Analysis (Prague, Czech Republic) Springer Lecture Notes in Computer Science. Vol. 3667. pp. 91-103. doi:10.1007/11676959. {{cite conference}}: Unknown parameter |booktitle= ignored (|book-title= suggested) (help); Unknown parameter |month= ignored (help)
  25. ^ I. Laptev, B. Caputo, C. Schuldt, and T. Lindeberg (2007). "Local velocity-adapted motion events for spatio-temporal recognition". Computer Vision and Image Understanding. Vol. 108. pp. 207–229. doi:10.1016/j.cviu.2006.11.023. {{cite conference}}: Unknown parameter |booktitle= ignored (|book-title= suggested) (help)CS1 maint: multiple names: authors list (link)

See also

Resources