Structure from motion

From Wikipedia, the free encyclopedia

Structure from motion (SfM) is a range imaging technique; it refers to the process of estimating three-dimensional structures from two-dimensional image sequences which may be coupled with local motion signals. It is studied in the fields of computer vision and visual perception. In biological vision, SfM refers to the phenomenon by which humans (and other living creatures) can recover 3D structure from the projected 2D (retinal) motion field of a moving object or scene.

Obtaining 3D information from 2D images

Digital surface model of motorway interchange construction site
Real photo × SfM with texture color × SfM with a simple shader. Made with Python Photogrammetry Toolbox GUI and rendered in Blender with Cycles.
Bezmiechowa airfield 3D digital surface model, extracted from data collected during a 30-minute flight of a Pteryx UAV

Humans perceive a great deal of information about the three-dimensional structure of their environment by moving through it. As the observer moves and the objects around them move, information is obtained from images sensed over time.[1]

Finding structure from motion presents a problem similar to finding structure from stereo vision. In both cases, the correspondence between images and the reconstruction of the 3D object must be found.

To find correspondences between images, features such as corner points (edges with gradients in multiple directions) are tracked from one image to the next. One of the most widely used feature detectors is SIFT (scale-invariant feature transform). It uses the maxima of a difference-of-Gaussians (DoG) pyramid as features. The first step in SIFT is finding a dominant gradient direction; to make the descriptor rotation-invariant, it is rotated to fit that orientation.[2] Another common feature detector is SURF (speeded-up robust features).[3] In SURF, the DoG is replaced with a Hessian-matrix-based blob detector, and instead of evaluating gradient histograms, SURF computes the sums of gradient components and the sums of their absolute values.[4] The features detected in all the images are then matched. One of the matching algorithms that tracks features from one image to another is the Lucas–Kanade tracker.[5]
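The matching step above can be sketched in a few lines. The snippet below is a minimal, NumPy-only illustration of nearest-neighbour descriptor matching with Lowe's ratio test (the test SIFT uses to discard ambiguous matches); the 8-dimensional descriptors are synthetic stand-ins, since a real pipeline would use SIFT or SURF descriptors from a library such as OpenCV.

```python
import numpy as np

def match_descriptors(desc1, desc2, ratio=0.75):
    """Return (i, j) index pairs where desc1[i] matches desc2[j].

    A match is kept only if its nearest neighbour in desc2 is clearly
    closer than the second-nearest (Lowe's ratio test), which discards
    ambiguous correspondences.
    """
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)  # distance to every candidate
        order = np.argsort(dists)
        best, second = dists[order[0]], dists[order[1]]
        if best < ratio * second:                  # keep unambiguous matches only
            matches.append((i, order[0]))
    return matches

# Synthetic example: desc2 is a shuffled, slightly noisy copy of desc1.
rng = np.random.default_rng(0)
desc1 = rng.normal(size=(5, 8))
perm = rng.permutation(5)
desc2 = desc1[perm] + rng.normal(scale=0.01, size=(5, 8))
matches = match_descriptors(desc1, desc2)
print(matches)  # each pair (i, j) satisfies perm[j] == i
```

The same ratio-test logic applies unchanged to 128-dimensional SIFT descriptors; only the source of the descriptor arrays differs.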

Sometimes some of the matched features are incorrectly matched, so the matches should also be filtered. RANSAC (random sample consensus) is the algorithm usually used to remove the outlier correspondences. In the paper of Fischler and Bolles, RANSAC was used to solve the location determination problem (LDP), where the objective is to determine the points in space that project onto an image as a set of landmarks with known locations.[6]
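The RANSAC idea, hypothesize a model from a minimal random sample, then count how many matches agree with it, can be shown with a deliberately simple motion model. The sketch below assumes matched point pairs related by a pure 2-D translation plus some gross outliers; a real SfM pipeline would fit an essential or fundamental matrix instead, but the consensus loop is identical.

```python
import numpy as np

def ransac_translation(pts1, pts2, n_iters=100, threshold=0.1, seed=0):
    """Estimate a translation pts2 ≈ pts1 + t, robust to outlier matches."""
    rng = np.random.default_rng(seed)
    best_t, best_inliers = None, np.zeros(len(pts1), dtype=bool)
    for _ in range(n_iters):
        i = rng.integers(len(pts1))          # minimal sample: one match
        t = pts2[i] - pts1[i]                # hypothesised translation
        residuals = np.linalg.norm(pts2 - (pts1 + t), axis=1)
        inliers = residuals < threshold      # matches consistent with t
        if inliers.sum() > best_inliers.sum():
            best_t, best_inliers = t, inliers
    # refine by refitting on all inliers of the best hypothesis
    best_t = (pts2[best_inliers] - pts1[best_inliers]).mean(axis=0)
    return best_t, best_inliers

# 20 correct matches translated by (2, 1), plus 5 grossly wrong matches.
rng = np.random.default_rng(1)
pts1 = rng.uniform(0, 10, size=(25, 2))
pts2 = pts1 + np.array([2.0, 1.0])
pts2[20:] += rng.uniform(5, 10, size=(5, 2))  # corrupt the last 5 matches
t, inliers = ransac_translation(pts1, pts2)
print(t, inliers.sum())
```

Swapping in a fundamental-matrix model changes only the minimal sample size (at least 7 or 8 matches) and the residual measure; the vote-and-keep-best structure is unchanged.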

The feature trajectories over time are then used to reconstruct their 3D positions and the camera's motion.[7] An alternative is given by so-called direct approaches, where geometric information (3D structure and camera motion) is directly estimated from the images, without intermediate abstraction to features or corners.[8]
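The reconstruction of 3D positions from tracked features rests on triangulation: once the camera matrices are known, each matched image point pair constrains the 3D point linearly. The sketch below shows the standard direct linear transformation (DLT) triangulation of one point from two synthetic views; the camera intrinsics and poses are made-up values for illustration.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Triangulate one 3-D point from two views via the DLT.

    Each image point (u, v) with projection matrix P gives two linear
    constraints, u*(row 3 of P) - (row 1) and v*(row 3 of P) - (row 2),
    on the homogeneous point X; the least-squares solution is the last
    right singular vector of the stacked 4x4 system.
    """
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]                      # de-homogenise

# Two cameras: identity pose, and a pose translated along the x-axis.
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

X_true = np.array([0.3, -0.2, 4.0])          # ground-truth 3-D point

def project(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]                      # perspective division

X_hat = triangulate(P1, P2, project(P1, X_true), project(P2, X_true))
print(X_hat)
```

In a full SfM system, the camera matrices themselves are unknown and are estimated jointly with the points, typically by initialising from two-view geometry and refining everything with bundle adjustment.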

Further reading

  • Richard Hartley and Andrew Zisserman (2003). Multiple View Geometry in Computer Vision. Cambridge University Press. ISBN 0-521-54051-8. 
  • Olivier Faugeras and Quang-Tuan Luong and Theodore Papadopoulo (2001). The Geometry of Multiple Images. MIT Press. ISBN 0-262-06220-8. 
  • Yi Ma, Stefano Soatto, Jana Kosecka, S. Shankar Sastry (November 2003). An Invitation to 3-D Vision: From Images to Geometric Models. Interdisciplinary Applied Mathematics Series, #26. Springer-Verlag New York, LLC. ISBN 0-387-00893-4. 


  1. ^ Linda G. Shapiro, George C. Stockman (2001). Computer Vision. Prentice Hall. ISBN 0-13-030796-3. 
  2. ^ D. G. Lowe (2004). "Distinctive image features from scale-invariant keypoints". International Journal of Computer Vision. 
  3. ^ H. Bay, T. Tuytelaars, and L. Van Gool (2006). "Surf: Speeded up robust features". 9th European Conference on Computer Vision. 
  4. ^ K. Häming and G. Peters (2010). "The structure-from-motion reconstruction pipeline – a survey with focus on short image sequences". Kybernetika. 
  5. ^ B. D. Lucas and T. Kanade (1981). "An iterative image registration technique with an application to stereo vision". Proceedings of IJCAI 1981. 
  6. ^ M. A. Fischler and R. C. Bolles (1981). "Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography". Communications of the ACM. 
  7. ^ F. Dellaert, S. Seitz, C. Thorpe, and S. Thrun (2000). "Structure from Motion without Correspondence". IEEE Computer Society Conference on Computer Vision and Pattern Recognition. 
  8. ^ Engel, Jakob; Schöps, Thomas; Cremers, Daniel (2014). "LSD-SLAM: Large-Scale Direct Monocular SLAM". European Conference on Computer Vision (ECCV) 2014. 

External links

Structure from Motion software toolboxes

Open-source solutions

  1. Structure from Motion toolbox for Matlab by Vincent Rabaud
  2. Matlab Functions for Multiple View Geometry by Andrew Zisserman
  3. Structure and Motion Toolkit by Phil Torr
  4. Bundler - Structure from Motion for Unordered Photo Collections by Noah Snavely
  5. Libmv - A C++ Structure from Motion library
  6. openMVG An Open Multiple View Geometry library + Structure from Motion demonstrators
  7. MVE - The Multi-View Environment by Simon Fuhrmann, TU Darmstadt.
  8. MicMac, an open-source SfM code released by the Institut national de l'information géographique et forestière
  9. Python Photogrammetry Toolbox GUI - an open-source SfM GUI (easy SfM and dense point-cloud estimation launcher) by Pierre Moulon and Arc-Team
  10. Matlab Code for Non-Rigid Structure from Motion by Lorenzo Torresani
  11. SBA for generic bundle adjustment by Manolis Lourakis.
  12. ceres-solver for general non-linear least squares, with features for bundle adjustment. Previously used internally by Google for Google Maps; released to the public in 2012.
  13. LSD-SLAM: Large-Scale Direct Monocular SLAM in real-time, by Jakob Engel
  14. Theia: A Fast and Scalable Structure-from-Motion Library


Closed-source solutions

  1. Smart3DCapture, a complete photogrammetry solution by Acute3D.
  2. 3DF Samantha - command-line structure-from-motion pipeline for Windows, by 3Dflow srl. Free for non-commercial purposes.
  3. Automatic Camera Tracking System (ACTS) A structure-from-motion system for Microsoft Windows, by State Key Lab of CAD&CG, Zhejiang University.
  4. VisualSFM: A Visual Structure from Motion System, by Changchang Wu
  5. SFMToolkit a complete photogrammetry solution based on open-source software
  6. MountainsMap SEM software for Scanning Electron Microscopes. 3D is obtained by tilting the specimen + photogrammetry.
  7. Voodoo Camera Tracker, non-commercial tool for the integration of virtual and real scenes.
    Original site, archived: Laboratorium für Informationstechnologie, University of Hannover
  8. MetaIO Toolbox SfM for augmented reality on mobile devices.
  9. TacitView by 2d3 Sensing
  10. Catena Python Abstract Workflow Framework with SfM components.