From Wikipedia, the free encyclopedia
Jump to: navigation, search
For the urban renewal body, see Scottish Urban Regeneration Forum. For the university computing organisation in the Netherlands, see SURFnet.

SURF (Speeded Up Robust Features) is a robust local feature detector, first presented by Herbert Bay et al. ECCV 9th in International Conference on Computer Vision held in Austria in May 2006, that can be used in computer vision tasks like object recognition or 3D reconstruction. It is partly inspired by the SIFT descriptor. The standard version of SURF is several times faster than SIFT and claimed by its authors to be more robust against different image transformations than SIFT. SURF is based on sums of 2D Haar wavelet responses and makes an efficient use of integral images.

It uses an integer approximation to the determinant of Hessian blob detector, which can be computed extremely quickly with an integral image (3 integer operations). For features, it uses the sum of the Haar wavelet response around the point of interest. Again, these can be computed with the aid of the integral image. This information is treated to perform operations such as locate and recognition of certain objects, people or faces, make 3D scenes, object tracking and extraction of points of interest. This algorithm is part of that artificial intelligence, able to train a system to interpret images and determine the content.

An application of the algorithm is patented in the US.[1]


SURF is a detector and a high-performance descriptor points of interest in an image where the image is transformed into coordinates, using a technique called multi-resolution. Is to make a copy of the original image with Pyramidal Gaussian or Laplacian Pyramid shape and obtain image with the same size but with reduced bandwidth. Thus a special blurring effect on the original image, called Scale-Space is achieved. This technique ensures that the points of interest are scale invariant. The SURF algorithm is based on the SIFT predecessor.

Algorithm and Features[edit]


The SURF algorithm is based on the same principles and steps off SIFT, but it uses a different scheme and should provide better results: it works much faster. In order to detect characteristic points on a scale invariably SIFT approach it uses cascaded filters, where the difference Gaussian, DOG, is calculated on rescaled images progressively.

Integral Image[edit]

Similar to SDoG.

Instead of using Gaussian averaging the image, squares are used (approximation). Making the convolution of the image with a square is much faster if the integral image is used. The integral image is defined as:

S(x, y)=\sum_{i=0}^{i<=x}\sum_{j=0}^{j<=y} I(i,j)

The sum of the original image within a rectangle D image can be evaluated quickly using this integrated image. I(x, y) added over the selected area requires four evaluations S(x, y) (A, B, C, D)

Points of interest in the Hessian matrix[edit]

SURF uses a BLOB detector based on the Hessian to find points of interest. The determinant of the Hessian matrix expresses the extent of the response and is an expression of a local change around the area.

The detector is based on the Hessian matrix, due to its high accuracy. More precisely, BLOB structures are detected in places where the determining factor is the maximum. In contrast to the detector Hess - Laplace Mikolajczyk and Schmid, also is based on the determinant of the Hessian for selecting scale, as it is done by Lindeberg. Given a point x = (x, y) in an image I, the Hessian matrix H (x, σ) in x at scale σ , is defined as follows:

H(x,\sigma)=\begin{pmatrix} Lxx(x,\sigma) & Lxy(x,\sigma) \\ Lxy(x,\sigma) & Lyy(x,\sigma) \end{pmatrix}

where Lxx(x,\sigma) is the convolution of second order derivative \partial x/\partial x^2 g(\sigma)	with the image in the point x, y similarly with Lxy(x,\sigma) and Lyy(x,\sigma).

The Gaussian filters are optimal for scale space analysis, but in practice should be quantized and clipped. This leads to a loss of repeatability image rotations around the odd multiple of π / 4. This weakness is true for Hessian-based detectors in general. Repeatability of peaks around multiples of π / 2. This is due to the square shape of the filter. However, the detectors still work well, the discretization has a slight effect on performance. As real filters are not ideal, in any case, given the success of Lowe with logarithmic approximations, they push the approximation of the Hessian further matrix with square filters. These second order Gaussian filters' approximate can be evaluated with a cost very low with the use of integrated computer images. Therefore, the calculation time is independent of the filter size. Here are some approaches: Gyy and Gxy (1)

The box filters 9x9 are approximations of a Gaussian with σ = 1.2 and represents the lowest level ( higher spatial resolution ) for computerized maps BLOB response. Is denoted Dxx, Dyy, Dxy . The weights applied to the rectangular regions are maintained by the efficiency of the CPU.

Images are calculated :

-Dxx (x, y ) from I ( x, y) and Gxx ( x, y)

-Dxy (x, y ) from I ( x, y) and Gxy (x, y )

-Dyy (x, y ) from I ( x, y) and Gyy (x, y)

Then, the following image is generated:

Det(H aprox) = DxxDyy-(wDxy)^2

The relative weighting (w) of the filter response is used to balance the expression for the Hessian determinant . It is necessary for the conservation of energy between Gaussian kernels and Gaussian kernels approximate.


0.9 factor appears such a correction factor using squares instead of gaussians. it can generate several images det (H) for several filter sizes . This is called multi- resolution analysis.

||x||_F is the Frobenius norm: Frobenius

Changes of weighting depends on scale σ . In practice, this factor is kept constant . How it keep constant? Normalizing the filter response relative to its size. This ensures the Frobenius norm for any other filter.

The approximation of the determinant of the Hessian matrix representing the response BLOB image on location x . These responses are stored in the BLOB response map on different scales.

Then, the local maxima are searched.

Scale-space representation & location of points of interest[edit]

The attractions can be found in different scales, partly because the search for correspondences often requires comparison images where they are seen at different scales. The scale spaces are generally applied as a pyramid image . Images are repeatedly smoothed with a Gaussian filter, then, is sub sampled to achieve a higher level of the pyramid. Therefore, several floors or stairs " det H" with various measures of the masks are calculated:

\sigma approx = Current filter size * \left( \frac{\left(Base Filter scale\right) }{Base Filter Size} \right)

The scale -space is divided into a number of octaves, Where an octave refers to a series of response maps of covering a doubling of scale . In SURF The Lowest level of the Scale- space is Obtained from the output of the 9 × 9 filters.

Scale spaces are implemented by applying box filters of different size. Therefore, the scale space is analyzed by up-scaling the filter size rather than iteratively reducing the image size. The output of the above 9*9 filter is considered as the initial scale layer, to which we will refer as scale s=1.2 (corresponding to Gaussian derivatives withσ=1.2). The following layers are obtained by filtering the image with gradually bigger masks, taking into account the discrete nature of integral images and the specific structure of or filters. Specifically, this results in filters of size 9*9, 15*15, 21*21, 27*27, etc. In order to localize interest points in the image and over scales, non-maximum suppression in a 3*3*3 neighborhood is applied. The maxima of the determinant of the Hessian matrix are then interpolated in scale and image space with the method proposed by Brown et al. Scale space interpolation is especially important in our case, as the difference in scale between the first layers of every octave is relatively large.

After 3D maxima are looking at ( x, y, n) using the cube 3x3x3 neighborhood . From there it is proceed to do the interpolation of the maximum. Lowe rest of the layers of the pyramid to get the DOG (Difference of Gaussian ) find images contours and stains.

Specifically, it is entered by variant a quick and Van Gool Neubecker used . The maximum of the determinant of the Hessian matrix in scale and space interpolated image with Brown and Lowe proposed method. The approach of the determinant of the Hessian matrix represents the response of BLOB in the image to the location x . These responses are stored in the BLOB map of responses on different scales. Them have the principal feature of repetibility, that means if some point is considerated realiable, the detector will find the same point under different perspective (scale, orientation, rotation, etc.).

It has one position (x,y) for each interest point.


The goal of a descriptor is to provide a unique and robust description of an image feature, e.g. by describing the intensity distribution of the pixels within the neighborhood of the point of interest. Most descriptors are computed thus in a local manner; hence, a description is obtained for every point of interest identified previously.

The dimensionality of the descriptor has direct impact on both its computational complexity and point-matching robustness/accuracy. A short descriptor may be more robust against appearance variations, but may not offer sufficient discrimination and thus give too many false positives.

The next three introducing steps are explained in the following: The first step consists of fixing a reproducible orientation based on information from a circular region around the interest point. Then, we construct a square region aligned to the selected orientation and extract the SURF descriptor from it. Finally, features are matched between two images.

The SURF descriptor is based on the similar properties of SIFT, with a complexity stripped down even further. The first step consists of fixing a reproducible orientation based on information from a circular region around the interest point. The second step is constructing as square region aligned to the selected orientation, and extracting the SURF descriptor from it. These two steps are now explained in turn. Furthermore, we also propose an upright version of our descriptor (U-SURF) that is not invariant to image rotation and therefore faster to compute and better suited for application where the camera remains more or less horizontal.

Orientation Assignment[edit]

In order to achieve rotational invariance, the orientation of the point of interest needs to be found. The Haar wavelet responses in both x- and y-directions within a circular neighbourhood of radius 6s around the point of interest are computed, where s is the scale at which the point of interest was detected. The obtained responses are weighed by a Gaussian function centered at the point of interest, then plotted as points in a two-dimensional space, with the horizontal response in the abscissa and the vertical response in the ordenate. The dominant orientation is estimated by calculating the sum of all responses within a sliding orientation window of size π/3. The horizontal and vertical responses within the window are summed. The two summed responses then yield a local orientation vector. The longest such vector overall defines the orientation of the point of interest. The size of the sliding window is a parameter which has to be chosen carefully to achieve a desired balance between robustness and angular resolution.

Descriptor based on Sum of Haar Wavelet Responses[edit]

For the extraction of the descriptor, the first step consist of constructing a square region centred around the interest 6point and oriented along the orientation selected in the previous section. The size of this window is 20s.

The interest region is split up into smaller 4x4 square sub-regions, and for each one, it is computed Haar wavelet responses at 5x5 regularly spaced sample points., and weighted with a Gaussian (to offer more robustness for deformations, noise and translations, This responses are dx and dy.

The select interest point orientation is who defines de horizontality and verticality. For each sub-region the resultts of dx, dy and their absolutes, are summed in a vector, the descriptor:

v=(\sum_{} dx, \sum_{} dy, \sum_{} |dx|, \sum_{} |dy| )

Which is distinctive and, at the same time, robust to noise, geometric and photometric distortions and errors. Surf descriptor obtained by the union of the sub- vectors .


This section details the step back in search of characteristic points that provides the detector. This way it is possible to compare between descriptors and look for matching pairs of images between them. There are two ways to do it:

  • Get the characteristic points of the first image and its descriptor and do the same with the second image . So you will be able to compare the two images descriptor correspondences between points and establish some kind of measure.
  • Get the characteristic points of the first image with the descriptor. Then compare this descriptor with the points of the second image which is believed to be the partner concerned.

Development of an algorithm using the SURF method[edit]

1 - HOME : Initialization of all global variables and necessary to run the program functions.

2 - SEARCH CAMERA : If a camera is connected, go to step : Capture sequence and store on your computer . If not, the searching step begins, if one receives a sequence of video stored on your computer.

3 - VIDEO SEARCH : Only enter if the algorithm did not find any video device connected to your computer . If a route stored video sequence is received, it is passed to the step of capturing sequence. If path is not received, it is closed.

4 - IMAGE CAPTURE : You can capture the image from the camera or stored video. Extracted for each iteration a picture . Program execution if no box ( pointer image is NULL) ends .

5 - THERE IS A PICTURE : Previous stage is reviewed, if the pointer does not mark any memory address, the execution is finished. If no image passes to the step of applying filter to soften the image. Gaussian filters, average, etc. can be used ( they are the clearest ) ...

6 - SHOW VIDEO : Displayed on screen for each loop iteration, the image obtained in the image capture stage .

7 - PAUSE VIDEO: The video can be stopped, and the last image obtained by the capture step is displayed on screen in a separate window that would go on to the next stage.

8 - SELECTION ROI ( Region Of Interest) : With the image in the window duplicated once made the pause, the image area or zone is selected. The selected area will appear in another window to be grasped as a base image that can analyze the rest of the video. This new image will be stored temporarily and parallel algorithm .

9 - SELECTION OF POINTS OF THE MASK : The mask is an image containing two colors: white and black . This image is taken as a basis for finding points of interest in the SURF algorithm in later steps.

10 - IMAGE CONVERSION OF GRAY SCALE : The image is converted to grayscale, because the algorithm needs of the first selected by the user and on the boundary area blank point selection mask.

11 - EXTRACTION POINT SURF : This step extracts the points of interest ( image descriptors ) of the first selected by the user and on the boundary area blank point selection mask.

12 - STORAGE OF IMAGE OF BASE: Temporarily saves images to be picked for future comparison with the rest of the video sequence . These are:

-Image Obtained from the ROI selection gray ladder .

-Black and white mask of stage point selection mask points.

13 - COMPARISON USING SURF TEMPLATES : The images above are not changed and compared with other video sequence . At this stage you can identify image descriptors and descriptors of the video stream, drawing a line between them. Switching between these descriptors, we determine wether the selected object will be determined is partially clogged or has been removed.

14 - OBSTRUCTION COUNTER : This stage is used in case the object has been partially or totally obstructed by a short period of time . It will have two temporary counters:

- Obstructor Counter : Timer or cycles that indicates whether the object has been overlapped or obstructed. While the object is overlapped the counter increases when it reaches the maximum, the end of program counter is activated.

- Counter End Program : Turns the program or show warning if it reaches a certain time or cycles where the object is not seen yet. Having peaked in the counter obstruction. If you have been a temporary obstruction, counters are reseted.

15 - WARNING OBJECT BLOCKED : The video stops if confirmed whether the object is missing or has been blocked for a considerable time . It also indicates the second when the algorithm detects . Spent stop showing the video image and finally ends the program.

16 - VALUES OF DESCRIPTORS : And taking the values taken in the extraction step SURF points, a condition is created to determine whether the object is present or not . Yes 4 descriptors, X <4, where X is the number of descriptors analyzed in each table are taken . Thus, it checks whether or not the object is .

17 - LOCATION PURPOSE . POINT LOCATION FLANN . Relationships between descriptors points of both the image and the video stream are and a line is drawn .

Location object using homography .

18 - Pass the video Tu

Surf vs Sift[edit]

Presentation of the algorithms[edit]

The recognition of images or objects, is one of the most important applications of computer vision, becomes a comparison of local descriptors SIFT (Scale Invariant Feature Transform) [David Lowe, 1999] and SURF (Speeded-UP Feature transform) [Bay and Tuytelaars 2006]. These two local descriptors to detect structures or very significant points in an image for a discriminating description of these areas from its neighboring points, in order to compare them with other descriptors using similar measures. Below are given some basic features of these two algorithms:

SIFT (Scale Invariant Feature Transform) was published by David Lower in 1999 with the idea of proposing an algorithm capable of extracting the characteristics of an image already splitting of these describe the set of objects that were contained in it. This algorithm can be broken into 4 stages:

-Detection of edges on the scale: the first stage performs a search in different scales and dimensions of the image by identifying possible points of interest, invariant to changes in orientation and scaling. This procedure is carried out with the function (DoG (Difference-of-Gaussian) giving different values to the σ, in the following equation):

D \left( x, y, \sigma \right) = L \left( x, y, k_i\sigma \right) - L \left( x, y, k_j\sigma \right),

Where G is the Gaussian function and the I is the image. Now Gaussian images are subtracted in order to produce the DoG, after that is submostrea the Gaussian image by a factor of 2 to obtain a DoG image sampling. Each pixel is set with its neighbors with a mask 3 x 3 to find maxim and minim local of D (x, y, σ).

-Location of the keypoints: key points are chosen on the basis of stability measures, removed the key points with low.

-Contrast or are located in the margins-orientation assignment: invariance to rotation is achieved by assigning to each of the points of orientation based on the local properties of the image and which represents the descriptor for this orientation.

-Descriptor keypoints: the local gradients of the image are measured in the region surrounding the key point. These are processed by means of a representation which allows distortion levels and changes in lighting locally.

Més informació del SIFT

SIFT more

SURF (Speeded-UP Feature Transform) as we have seen in previous sections, the SURF is an algorithm developed by Herbert Bay, Tinne Tuytelaars and Luc Van Gool (2008), works as a detector and descriptor of sites focused on object recognition. This algorithm has 3 stages:

-Detection of points of interest, keypoints.

-Assignment guidance.

-Extraction of those described.More information in the sections of features.

Theoretical differences[edit]

The main differences between the algorithms SIFT and SURF are the speed of implementation of respect for one another, but with both points of interest of the image keep invariants, scale and orientation, and changes in the lighting. The main differences of data in multiple experiments are that SIFT is kept the position, scale and orientation, since it is possible that in a same position (x, and), we find several points of interest to different scale and / or orientation σ. But another side in the SURF in a position (x, and) appears only a single point of interest, by what is not saved the scale and orientation, however if that registers the matrix of second order and the sign of the Laplacian.

Experimental differences[edit]

Test 1[edit]

To make comparisons must take into account the versions of the algorithms that are used in each case, in this article obtained conclusions with version 4 of the Sift and SURF with C code version 1.0.9 and analyzing images with greyscale, are also drawn conclusions from results of experiments with robots and response of the algorithms in platforms Android. The result of tests with 109 images, as announced before, it has been confirmed that the SURF method is much faster than SIFT. From the point of view of invariant points detected, the SIFT algorithm outperforms surfing more than double points detected. This is caused because the SURF allows that multiple invariant points in a same position with different scale or orientation, while in the SIFT if that is possible there are. This fact means that the distance between the number of detected features would decrease if the SIFT not duplication of points. The graphs show that the SIFT method has some 2.68 invariant points that the SURF with respect to time, SIFT more used about 1646 ms to analyze an image and the SURF 485 ms, is almost one-third of the time, efficiency in this factor is considerable. This factor is important in face of the purpose of each project, if you want an algorithm that can get a great amount of data would be more advisable to SIFT, but a quick response is needed and we don't give so much importance to the number of detected features SURF algorithm is the most suitable.

Mètode SURF Mètode SIFT

There is detail the order of the images by comparing the photo is very important, since results may vary if you choose a comparison with X vs and if done with and vs X. The SIFT method shows more points when the images being compared have more points in common, unlike the SURF gives more values when the images are more differentiated, false positive values. This happens with all photographs and leads to the conclusion that the SIFT algorithm has more durability in the time to SURF, obtaining more reliable pairings.


Detected points 1292 482
Average time 1646.53 ms 485.77 ms
  • Data extracted from an analysis made by A. M. Romero and M. Cazorla, in the document "comparison of detectors of Visual characteristics and its application to SLAM ”.

Test 2[edit]

The following study is an experiment of performance, consumption and efficiency of algorithms, in which 100 prepared logos have been and was executed 10 times each test in functionality, in order to see if the algorithms modify their behavior depending on the mobile device. The components of the testing scenario were the following:

Database: is a tuple consisting of a vector characteristics and other points of training Interes.

Training component: saves the database and points of interest generated by test algoritmos.

Try component: allows to get the vector of characteristics and interest points generated by each descriptor (SIFT, SURF) from a search (matching) input.

Search (matching): allows you to obtain a set of correspondences between the obtained tuple and tuple from the database.

Average interest points 222 431
Runtime 2887.13 ms 959.87 ms
Power consumption 0.04 % 0.02 %
Memory used 102.30 KB 385.02 KB
Effectiveness 0.33 0.40

In the previous table, the results obtained with regard to the points of interest is observed the difference from the average of points detected by the SIFT method much less than surfing, because of the difference in the compared images. Regarding runtime SIFT is much slower than the SURF, indicating that for real-time applications this algorithm should be ruled out. In battery consumption the results are proportional to the runtime, so SIFT consumes twice as much as the SURF method.Regarding the memory used SURF method exceeds the resources to store the vector of characteristics of images, almost four times more than the method SIFT on issues of effectiveness the SIFT method is also less effective than the SURF, but slightly lower.

See also[edit]


  1. ^ US 2009238460, Ryuji Funayama, Hiromichi Yanagihara, Luc Van Gool, Tinne Tuytelaars, Herbert Bay, "ROBUST INTEREST POINT DETECTOR AND DESCRIPTOR", published 2009-09-24 

External links[edit]