A spatial-temporal pattern is a type of recurring event or object, described in terms of the topological, geographic or geometric properties of entities, that appears in a sequence of data ordered by time. In this context, the term spatial pattern may refer to points, regions, line segments, curves, or any other space-related pattern of interest that can be observed in the data. Spatial-temporal patterns are often used in computer vision for automatic categorization and localization of human actions in video sequences.


Spatial-temporal patterns are normally used to solve space-time problems. The space-time problem can be formulated as follows: suppose we observe spatial data at each of $m$ time points, i.e., $\{Z(s_{ij}, t_j) : i = 1, \dots, n_j;\ j = 1, \dots, m\}$. Here $s_{1j}, \dots, s_{n_j j}$ are the data locations at time $t_j$, and $t_1 < \dots < t_m$ are the times of observation. The generic space-time problem is to use the data to predict $Z(s_0, t_0)$, where $s_0 \in D$ and $t_0 \in T$ ($D$ is the spatial domain and $T$ the time domain), typically with $t_0$ later than the last observation time $t_m$ [1]. In other words, we are interested in how to predict or recognize the appearance of certain important points in the future by studying the spatial-temporal patterns in the given data.
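As a rough illustration of this prediction task, the sketch below extrapolates the value observed at a single location to a later time using a least-squares linear trend. The linear-trend predictor is an assumption chosen for brevity, not a method taken from the cited lecture notes, and the function names and sample data are hypothetical.

```python
def fit_line(ts, zs):
    """Least-squares fit of z ~ intercept + slope * t; returns (intercept, slope)."""
    n = len(ts)
    t_mean = sum(ts) / n
    z_mean = sum(zs) / n
    var_t = sum((t - t_mean) ** 2 for t in ts)
    cov_tz = sum((t - t_mean) * (z - z_mean) for t, z in zip(ts, zs))
    slope = cov_tz / var_t
    return z_mean - slope * t_mean, slope


def predict_value(series, t0):
    """series: list of (time, value) pairs observed at one location.

    Returns the value predicted at time t0 by a linear trend (an assumption).
    """
    ts = [t for t, _ in series]
    zs = [z for _, z in series]
    intercept, slope = fit_line(ts, zs)
    return intercept + slope * t0


# Hypothetical observations at two locations and four observation times.
observations = {
    "site_a": [(0, 1.0), (1, 1.5), (2, 2.0), (3, 2.5)],
    "site_b": [(0, 4.0), (1, 3.0), (2, 2.0), (3, 1.0)],
}

# Predict each site's value at a time after the last observation.
forecast = {site: predict_value(series, t0=5) for site, series in observations.items()}
```

A realistic predictor would also borrow strength from neighbouring locations; this sketch treats every location independently.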

Spatial-temporal patterns are often used in computer vision as features that represent different categories of actions or events, so that actions or events of interest can be recognized in video sequences. The analysis of spatial-temporal patterns also has diverse applications in other fields: ecologists use it to interpret and predict landscape changes [2] and natural disasters; physicists use it to study the placement of galaxies in the cosmos.

Spatial-temporal interest points

Spatial-temporal patterns are normally represented by a collection of spatial-temporal interest points [3], which are sets of points that distinguish one class from others in the space and time domains. Several techniques have been proposed to represent spatial-temporal interest points, such as spatial-temporal corners, periodic spatial-temporal features, volumetric features and spatial-temporal regions of high entropy.

  • Spatial-temporal corners

Spatial-temporal corners were first proposed by Ivan Laptev and Tony Lindeberg. They detect interest points that are simultaneously maxima of the spatial-temporal corner function they proposed and extrema of the normalized spatial-temporal Laplace operator. Thus, they can detect interest points for a set of sparsely distributed scale values and then track these points in both the space and time domains [4]. (For more details on the corner function and the Laplace operator, please refer to the paper.)

Their work provides automatic scale selection for spatial-temporal interest points, but spatial-temporal corners can be quite rare and are thus too sparse for many types of motion.
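To make the idea of a corner function concrete, the following sketch computes a Harris-style response of the form det(M) - k * trace(M)^3, where M is the 3 x 3 second-moment matrix of the intensity gradients in x, y and t. It is a simplified illustration under stated assumptions: it uses central differences and a uniform 3 x 3 x 3 window instead of Gaussian scale-space smoothing, it performs no scale selection, and the constant k = 0.005 and all function names are choices made for this example.

```python
def corner_response(video, t, y, x, k=0.005):
    """Harris-style spatio-temporal response at an interior voxel.

    video is indexed as video[t][y][x]; the voxel must lie at least two
    samples away from every border of the volume.
    """
    m = [[0.0] * 3 for _ in range(3)]  # second-moment matrix of (Ix, Iy, It)
    for dt in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                tt, yy, xx = t + dt, y + dy, x + dx
                grad = (
                    (video[tt][yy][xx + 1] - video[tt][yy][xx - 1]) / 2.0,
                    (video[tt][yy + 1][xx] - video[tt][yy - 1][xx]) / 2.0,
                    (video[tt + 1][yy][xx] - video[tt - 1][yy][xx]) / 2.0,
                )
                for i in range(3):
                    for j in range(3):
                        m[i][j] += grad[i] * grad[j] / 27.0  # uniform window
    det = (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )
    trace = m[0][0] + m[1][1] + m[2][2]
    return det - k * trace ** 3


def make_volume(fn, size=5):
    """Build a size^3 volume from fn(x, y, t) with coordinates centred at 0."""
    c = size // 2
    return [[[fn(x - c, y - c, t - c) for x in range(size)]
             for y in range(size)] for t in range(size)]


# Intensity varies in x, y and t: all three gradient directions are present.
event = make_volume(lambda x, y, t: float(x * x + y * y + t * t))
# Intensity varies only in space: nothing changes over time.
static = make_volume(lambda x, y, t: float(x * x + y * y))

h_event = corner_response(event, 2, 2, 2)
h_static = corner_response(static, 2, 2, 2)
```

Because the static volume has no temporal gradient, its second-moment matrix is singular and the response is negative, while the volume that changes in all three directions yields a positive response. This is one way to see why purely spatial structure or uniform motion produces few such points.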

  • Periodic spatial-temporal features

This representation of spatial-temporal interest points was also proposed by Ivan Laptev [5]. In contrast to spatial-temporal corners, periodic spatial-temporal features provide a rich set of features to represent each action class, but do not provide automatic scale selection [6].

  • Volumetric features

Volumetric features were studied for action and event classification by Ke et al. Their initial experiments using pixel intensity performed poorly, mainly because changes in the appearance of the actor, the background and the lighting conditions influence pixel intensities. They therefore computed the volumetric features on the video's optical flow instead, separating the optical flow into its horizontal and vertical components and computing volumetric features on each component [7].

Volumetric features provide dense features at many locations and scales, but need to process a video pyramid in order to achieve spatial scale invariance.
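One way to picture a volumetric feature is as a difference between sums over adjacent boxes of a space-time volume, evaluated on one optical-flow component. The sketch below assumes such a box-difference feature and uses a three-dimensional summed-volume table so that each box sum takes a constant number of lookups. The feature layout, the function names and the toy flow data are assumptions for illustration and are not taken from the cited paper.

```python
def integral_volume(vol):
    """Summed-volume table: ii[t][y][x] is the sum of vol over [0,t) x [0,y) x [0,x)."""
    T, H, W = len(vol), len(vol[0]), len(vol[0][0])
    ii = [[[0.0] * (W + 1) for _ in range(H + 1)] for _ in range(T + 1)]
    for t in range(T):
        for y in range(H):
            for x in range(W):
                ii[t + 1][y + 1][x + 1] = (
                    vol[t][y][x]
                    + ii[t][y + 1][x + 1] + ii[t + 1][y][x + 1] + ii[t + 1][y + 1][x]
                    - ii[t][y][x + 1] - ii[t][y + 1][x] - ii[t + 1][y][x]
                    + ii[t][y][x]
                )
    return ii


def box_sum(ii, t0, t1, y0, y1, x0, x1):
    """Sum of the original volume over the half-open box [t0,t1) x [y0,y1) x [x0,x1)."""
    return (
        ii[t1][y1][x1]
        - ii[t0][y1][x1] - ii[t1][y0][x1] - ii[t1][y1][x0]
        + ii[t0][y0][x1] + ii[t0][y1][x0] + ii[t1][y0][x0]
        - ii[t0][y0][x0]
    )


def temporal_feature(ii, t0, t_mid, t1, y0, y1, x0, x1):
    """Later half of the box minus the earlier half (a hypothetical two-box feature)."""
    return (box_sum(ii, t_mid, t1, y0, y1, x0, x1)
            - box_sum(ii, t0, t_mid, y0, y1, x0, x1))


# Toy horizontal-flow component: no motion in frames 0-1, unit motion in frames 2-3.
flow_u = [[[1.0 if t >= 2 else 0.0 for x in range(2)] for y in range(2)]
          for t in range(4)]
ii_u = integral_volume(flow_u)
onset = temporal_feature(ii_u, 0, 2, 4, 0, 2, 0, 2)
```

The same table would be built separately for the vertical flow component, matching the per-component treatment described above.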

  • Spatial-temporal region of high entropy

A spatial-temporal region of high entropy is a collection of spatiotemporal events localized at points that are salient in both space and time. The spatiotemporal salient points are detected by measuring the variations in the information content of pixel neighborhoods not only in space but also in time. Oikonomopoulos et al. introduced a distance metric between two collections of spatiotemporal salient points, based on the chamfer distance and an iterative linear time-warping technique, in order to find the salient region [8].
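The chamfer distance itself can be sketched in a few lines. The version below is a plain symmetric chamfer distance between two sets of (x, y, t) points; it treats space and time as a single Euclidean space and omits the time-warping step, so it is only an illustration of the underlying idea, not the metric from the cited paper. The function names are hypothetical.

```python
import math


def directed_chamfer(a, b):
    """Mean distance from each point in a to its nearest neighbour in b."""
    return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)


def chamfer_distance(a, b):
    """Symmetric chamfer distance: average of the two directed distances."""
    return 0.5 * (directed_chamfer(a, b) + directed_chamfer(b, a))


# Hypothetical salient points as (x, y, t) tuples from two video clips.
clip_a = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
clip_b = [(0.0, 0.0, 0.0)]
distance = chamfer_distance(clip_a, clip_b)
```

In practice the time axis would need to be rescaled or aligned before comparing clips of different speed, which is the role the time-warping step plays in the text above.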

This approach to selecting spatial-temporal interest points provides automatic scale selection, but examples suggest that high-entropy regions are rare [9].
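The information content of a space-time neighborhood can be illustrated with the Shannon entropy of the intensities it contains. The sketch below computes that entropy over a cubic window of a video indexed as video[t][y][x]. The fixed window radius, the use of raw intensity values as histogram bins and the function name are assumptions for this example; a full detector would also search over scales.

```python
import math
from collections import Counter


def neighbourhood_entropy(video, t, y, x, radius=1):
    """Shannon entropy (in bits) of the intensities in a cubic window.

    The window spans radius voxels on each side of (t, y, x) and must lie
    entirely inside the volume.
    """
    values = [
        video[tt][yy][xx]
        for tt in range(t - radius, t + radius + 1)
        for yy in range(y - radius, y + radius + 1)
        for xx in range(x - radius, x + radius + 1)
    ]
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# A video whose intensity never changes carries no information.
constant = [[[5 for _ in range(3)] for _ in range(3)] for _ in range(3)]
# A video whose intensity changes from frame to frame is more informative.
changing = [[[t for _ in range(3)] for _ in range(3)] for t in range(3)]

e_constant = neighbourhood_entropy(constant, 1, 1, 1)
e_changing = neighbourhood_entropy(changing, 1, 1, 1)
```

Here the changing video contains three equally frequent intensity values in the window, so its entropy is log2(3) bits, while the constant video has zero entropy.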

Spatial-temporal analysis

Many techniques, especially in machine learning, have been developed for spatial-temporal analysis. These algorithms can be categorized as supervised learning algorithms, which include the support vector machine [10], the time delay neural network [11] and the hidden Markov model, and unsupervised learning algorithms, which include k-means and the Gaussian mixture model.

  • Supervised learning algorithms

Supervised learning algorithms for spatial-temporal analysis use a set of labeled training data to train a model to recognize human actions. In this approach, the spatial-temporal features extracted from each video are fed to the learning algorithm as input, and the video's label as the target output, in order to fit the parameters of a model such as a support vector machine or a time delay neural network. After training, features detected in a new video can be fed to the model to recognize the human actions it contains.
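The train-then-predict workflow can be sketched with a deliberately simple model. The example below uses a nearest-centroid classifier as a stand-in for a support vector machine or neural network, purely to keep the code short and dependency-free. The two-dimensional feature vectors, the action labels and the function names are all hypothetical.

```python
import math


def train_centroids(features, labels):
    """Fit the model: one mean feature vector (centroid) per action label."""
    sums, counts = {}, {}
    for vec, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, vec):
    """Return the label whose centroid is closest to the feature vector."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))


# Hypothetical per-video feature vectors (e.g. summaries of interest points).
train_features = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]]
train_labels = ["walking", "walking", "waving", "waving"]

model = train_centroids(train_features, train_labels)
action = predict(model, [0.8, 0.3])  # features from a new, unlabeled video
```

Swapping in a stronger classifier changes only the two functions; the flow of labeled features in and predicted labels out stays the same.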

  • Unsupervised learning algorithms

Unlike supervised learning algorithms, unsupervised learning algorithms do not require labeled data for training. They learn clusters from unlabeled data and automatically group different actions, using techniques such as the expectation–maximization algorithm. After a probabilistic model has been trained, actions in a new video can be recognized with it [12].
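As a minimal illustration of learning clusters without labels, the sketch below runs k-means on a handful of feature vectors. It uses k-means rather than expectation-maximization on a probabilistic model because it is shorter to write; the deterministic initialization from the first k points, the fixed iteration count and the sample data are assumptions for this example.

```python
import math


def kmeans(points, k, iterations=10):
    """Plain k-means; returns (centroids, cluster index for each point)."""
    # Assumption: initialise from the first k points to keep the run deterministic.
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment


# Hypothetical unlabeled feature vectors forming two loose groups.
features = [(0.0, 0.0), (5.0, 5.0), (0.2, 0.1), (5.1, 4.9), (0.1, 0.3), (4.8, 5.2)]
centroids, clusters = kmeans(features, k=2)
```

The resulting cluster indices carry no semantic names; deciding that one cluster corresponds to a particular action still requires inspection or a small amount of labeled data.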

Application in computer vision

Spatial-temporal patterns are often used in computer vision for automatic categorization and localization [13] of human actions in video sequences. More specifically, the task is to recognize and locate actions in new video sequences by learning models from a collection of videos. This task can be extended to build a variety of applications.

  • Detect relevant events in surveillance video

For example, in a busy public train station with many people walking around, we can build vision systems that automatically detect dangerous actions such as theft.

  • Retrieve video from large databases

We can build content-based image retrieval systems that allow us to search for videos based on their content rather than on tags or keywords.


  1. Montserrat Fuentes's lecture notes
  2. Monica G. Turner, "Spatial and temporal analysis of landscape patterns", Landscape Ecology, 4 (1990), pp. 21–30
  3. Ivan Laptev and Tony Lindeberg, "Local Descriptors for Spatio-Temporal Recognition", European Conference on Computer Vision, 2004
  4. Ivan Laptev and Tony Lindeberg, "Space Time Interest Points", International Conference on Computer Vision, 2003
  5. Ivan Laptev, "On Space-Time Interest Points", International Journal of Computer Vision, vol. 64, no. 2/3, 2005
  6. Sinisa Todorovic's lecture notes
  7. Y. Ke, R. Sukthankar and M. Hebert, "Efficient Visual Event Detection using Volumetric Features", International Conference on Computer Vision, 2005
  8. Antonios Oikonomopoulos, Ioannis Patras and Maja Pantic, "Human action recognition with spatiotemporal salient point", IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics, June 2006
  9. Sinisa Todorovic's lecture notes
  10. Christian Schuldt, Ivan Laptev and Barbara Caputo, "Recognizing Human Actions: A Local SVM Approach", in Proc. ICPR 2004, Cambridge, UK
  11. M.-H. Yang and N. Ahuja, "Recognizing Hand Gestures Using Motion Trajectories", in Proceedings of the 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 99), Fort Collins, June 1999
  12. J.C. Niebles, H. Wang and L. Fei-Fei, "Unsupervised learning of human action categories using spatial-temporal words", International Journal of Computer Vision, 79(3): 299-318, 2008
  13. Piotr Dollár, Vincent Rabaud, Garrison Cottrell and Serge Belongie, "Behavior Recognition via Sparse Spatio-Temporal Features", ICCV VS-PETS 2005, Beijing, China


  • Gelfand, Alan; Peter Diggle, Peter Guttorp, Montserrat Fuentes (2010). Handbook of Spatial Statistics.
