Motion estimation

Motion estimation is the process of determining motion vectors that describe the transformation from one 2D image to another; usually from adjacent frames in a video sequence. It is an ill-posed problem as the motion is in three dimensions but the images are a projection of the 3D scene onto a 2D plane. The motion vectors may relate to the whole image (global motion estimation) or specific parts, such as rectangular blocks, arbitrary shaped patches or even per pixel. The motion vectors may be represented by a translational model or many other models that can approximate the motion of a real video camera, such as rotation and translation in all three dimensions and zoom.

Closely related to motion estimation is optical flow, where the vectors correspond to the perceived movement of pixels. In motion estimation an exact 1:1 correspondence of pixel positions is not a requirement.

Applying the motion vectors to an image to synthesise the transformation to the next image is called Motion compensation. The combination of motion estimation and motion compensation is a key part of video compression as used by MPEG 1, 2 and 4 as well as many other video codecs.

Algorithms

The methods for finding motion vectors can be categorised into pixel based methods ("direct") and feature based methods ("indirect"). A famous debate resulted in two papers from the opposing factions being produced to try to establish a conclusion^[1]^[2].

Direct Methods

Block-matching algorithm
Phase correlation and frequency domain methods
Pixel recursive algorithms
MAP/MRF type "Bayesian" estimators
Optical flow

Evaluation Metrics

In direct methods several evaluation metrics can be used.

Mean Squared Error (MSE)
Sum of Absolute Differences (SAD)
Mean Absolute Difference (MAD)
Sum of Squared Errors (SSE)
Sum of Absolute Transformed Differences (SATD)

Indirect Methods

Indirect methods use features, such as Harris corners, and match corresponding features between frames, usually with a statistical function applied over a local or global area. The purpose of the statistical function is to remove matches that do not correspond to the actual motion.

Statistical functions that have been successfully used include RANSAC.

References

^ Philip H.S. Torr and Andrew Zisserman: Feature Based Methods for Structure and Motion Estimation, ICCV Workshop on Vision Algorithms, pages 278-294, 1999
^ Michal Irani and P. Anandan: About Direct Methods, ICCV Workshop on Vision Algorithms, pages 267-277, 1999.

[1] Philip H.S. Torr and Andrew Zisserman: Feature Based Methods for Structure and Motion Estimation, ICCV Workshop on Vision Algorithms, pages 278-294, 1999

[2] Michal Irani and P. Anandan: About Direct Methods, ICCV Workshop on Vision Algorithms, pages 267-277, 1999.

[1]

[2]