Mean shift

From Wikipedia, the free encyclopedia
  (Redirected from Mean-shift)
Jump to: navigation, search

Mean shift is a non-parametric feature-space analysis technique for locating the maxima of a density function, a so-called mode-seeking algorithm.[1] Application domains include cluster analysis in computer vision and image processing.[2]


The mean shift procedure was originally presented in 1975 by Fukunaga and Hostetler.[3]


Mean shift is a procedure for locating the maxima of a density function given discrete data sampled from that function.[1] It is useful for detecting the modes of this density.[1] This is an iterative method, and we start with an initial estimate  x . Let a kernel function  K(x_i - x) be given. This function determines the weight of nearby points for re-estimation of the mean. Typically a Gaussian kernel on the distance to the current estimate is used,  K(x_i - x) = e^{-c||x_i - x||^2} . The weighted mean of the density in the window determined by  K

 m(x) = \frac{ \sum_{x_i \in N(x)} K(x_i - x) x_i } {\sum_{x_i \in N(x)} K(x_i - x)}

where  N(x) is the neighborhood of  x , a set of points for which  K(x) \neq 0 .

The difference m(x) - x is called mean shift in Fukunaga and Hostetler.[3] The mean-shift algorithm now sets  x \leftarrow m(x) , and repeats the estimation until  m(x) converges.


Let data be a finite set S embedded in the n-dimensional Euclidean space, X. Let K be a flat kernel that is the characteristic function of the \lambda-ball in X,

K(x) =
1 & \text{if}\ \|x\| \le \lambda\\
0 & \text{if}\ \|x\| > \lambda\\

In each iteration of the algorithm, s \leftarrow m(s) is performed for all s \in S simultaneously. The first question, then, is how to estimate the density function given a sparse set of samples. One of the simplest approaches is to just smooth the data, e.g., by convolving it with a fixed kernel of width h,

f(x) = \sum_{i}K(x - x_i) = \sum_{i}k \left(\frac{\|x - x_i\|^2}{h^2}\right)

where x_i are the input samples and k(r) is the kernel function (or Parzen window). h is the only parameter in the algorithm and is called the bandwidth. This approach is known as kernel density estimation or the Parzen window technique. Once we have computed f(x) from equation above, we can find its local maxima using gradient ascent or some other optimization technique. The problem with this "brute force" approach is that, for higher dimensions, it becomes computationally prohibitive to evaluate f(x) over the complete search space. Instead, mean shift uses a variant of what is known in the optimization literature as multiple restart gradient descent. Starting at some guess for a local maximum, y_k, which can be a random input data point x_1, mean shift computes the gradient of the density estimate f(x) at y_k and takes an uphill step in that direction.

Types of kernels[edit]

Kernel definition: Let X be the n-dimensional Euclidean space,  R^n . Denote the ith component of x by  x_i . The norm of x is a non-negative number.  \|x\|^2=x^Tx A function K:  X\leftarrow R is said to be a kernel if there exists a profile,  k: [0, \infty]\rightarrow R , such that

K(x) = k(\|x\|^2)

  • k is non-negative.
  • k is nonincreasing:  k(a)\ge k(b) if  a < b .
  • k is piecewise continuous and  \int_0^\infty k(r)\,dr < \infty\

The two frequently used kernel profiles for mean shift are:

Flat kernel

k(x) =
1 & \text{if}\ x \le \lambda\\
0 & \text{if}\ x > \lambda\\

Gaussian kernel

k(x) = e^{-\frac{\|x\|^2}{2 \sigma^2}},

where the standard deviation parameter \sigma works as the bandwidth parameter,  h .



Consider a set of points in two-dimensional space. Assume a circular window centered at C and having radius r as the kernel. Mean shift is a hill climbing algorithm which involves shifting this kernel iteratively to a higher density region until convergence. Every shift is defined by a mean shift vector. The mean shift vector always points toward the direction of the maximum increase in the density. At every iteration the kernel is shifted to the centroid or the mean of the points within it. The method of calculating this mean depends on the choice of the kernel. In this case if a Gaussian kernel is chosen instead of a flat kernel, then every point will first be assigned a weight which will decay exponentially as the distance from the kernel's center increases. At convergence, there will be no direction at which a shift can accommodate more points inside the kernel.


The mean shift algorithm can be used for visual tracking. The simplest such algorithm would create a confidence map in the new image based on the color histogram of the object in the previous image, and use mean shift to find the peak of a confidence map near the object's old position. The confidence map is a probability density function on the new image, assigning each pixel of the new image a probability, which is the probability of the pixel color occurring in the object in the previous image. A few algorithms, such as ensemble tracking,[4] CAMshift,[5] expand on this idea.


Let x_i and z_i, i = 1,...,n, be the d-dimensional input and filtered image pixels in the joint spatial-range domain. For each pixel,

  • Initialize j = 1 and y_{i,1} = x_i
  • Compute y_{i,j+1} according to m(\cdot) until convergence, y = y_{i,c}.
  • Assign z_i =(x_i^s,y_{i,c}^r). The superscripts s and r denote the spatial and range components of a vector, respectively. The assignment specifies that the filtered data at the spatial location axis will have the range component of the point of convergence y_{i,c}^r.


  1. Mean shift is an application-independent tool suitable for real data analysis.
  2. Does not assume any predefined shape on data clusters.
  3. It is capable of handling arbitrary feature spaces.
  4. The procedure relies on choice of a single parameter: bandwidth.
  5. The bandwidth/window size 'h' has a physical meaning, unlike k-means.


  1. The selection of a window size is not trivial.
  2. Inappropriate window size can cause modes to be merged, or generate additional “shallow” modes.
  3. Often requires using adaptive window size.

Mean shift and k-means clustering[edit]

The mean shift clustering algorithm has two main drawbacks. First, the algorithm is calculation intensive; it requires in general O(kN^2) operations, where N is the number of data points and k is the number of average iteration steps for each data point. Second, the mean shift algorithm relies on sufficient high data density with clear gradient to locate the cluster centers. In particular, the mean shift algorithm often fails to find appropriate clusters for so called data outliers, or those data points located between natural clusters.

The k-means algorithm does not have the above two problems. The k-means algorithm normally requires only O(kN) operations, so that the k-means algorithm can be applied to relatively large dataset. However, k-means has two significant limitations. First, the k-means algorithm requires that the number of clusters to be pre-determined. In practise, it is often difficult to specify a priori an appropriate cluster number, resulting in some natural clusters being represented by multiple clusters found by the k-means algorithm. Second, the k-means algorithm is, in general, incapable of identifying non-convex clusters. The second limitation makes the k-means algorithm inadequate for complex non-linear data. These problems can be overcome, by simply combining the two algorithms mean shift and k-means together.[6]

See also[edit]


  1. ^ a b c Cheng, Yizong (August 1995). "Mean Shift, Mode Seeking, and Clustering". IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE) 17 (8): 790–799. doi:10.1109/34.400568. 
  2. ^ Comaniciu, Dorin; Peter Meer (May 2002). "Mean Shift: A Robust Approach Toward Feature Space Analysis". IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE) 24 (5): 603–619. doi:10.1109/34.1000236. 
  3. ^ a b Fukunaga, Keinosuke; Larry D. Hostetler (January 1975). "The Estimation of the Gradient of a Density Function, with Applications in Pattern Recognition". IEEE Transactions on Information Theory (IEEE) 21 (1): 32–40. doi:10.1109/TIT.1975.1055330. Retrieved 2008-02-29. 
  4. ^ Avidan, Shai (2005). "Ensemble Tracking". 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) (San Diego, California: IEEE) 2. ISBN 0-7695-2372-2. 
  5. ^ Emami, Ebrahim (2013). "Online failure detection and correction for CAMShift tracking algorithm". 2013 Iranian Conference on Machine Vision and Image Processing (MVIP) (IEEE) 2: 180–183. 
  6. ^ Li, James (2012). "Visualizing High Dimensional Data". 

External links[edit]

Code implementations[edit]

  • Scikit-learn library Numpy/Python implementation uses ball tree for efficient neighboring points lookup
  • EDISON library. C++ implementation of mean-shift-based image segmentation. There is also a Matlab interface for EDISON.
  • OpenCV contains mean-shift implementation via cvMeanShift Method
  • Aiphial. Java-based mean-shift implementation for numeric data clustering and image segmentation
  • Apache Mahout. An map-reduce based implementation of MeanShift clustering written on Apache Hadoop.
  • CAMSHIFT project. A MATLAB implementation of CAMSHIFT algorithm.
  • OTB MeanShift. A C++ implementation using the Orfeo Toolbox.
  • ImageJ Plug-in. Image filtering using the mean shift filter.
  • Mean-shift google code. A simple implementation of mean-shift as image filtering tool.

Short lessons[edit]

  • A lesson from Prof. M. Shah on this topic