Statistics and Machine Learning
In machine learning, the term "ground truth" refers to the accuracy of the training set's classification for supervised learning techniques. Ground truth is used in statistical models to prove or disprove research hypotheses. The term "ground truthing" refers to the process of gathering the proper objective data for such a test. Compare with gold standard (test).
Bayesian spam filtering is a common example of supervised learning. In this system, the algorithm is manually taught the differences between spam and non-spam. The result depends on the ground truth of the messages used to train the algorithm; inaccuracies in that ground truth will carry over into inaccuracies in the resulting spam/non-spam verdicts.
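The dependence on ground truth can be illustrated with a small sketch. The code below is a minimal naive Bayes classifier with add-one smoothing, not the implementation of any particular spam filter; the function names `train` and `classify` and all of the messages are hypothetical. It trains the same classifier twice, once on correctly labelled messages and once on a copy in which two spam messages have been mislabelled as non-spam ("ham").

```python
import math
from collections import Counter

LABELS = ("spam", "ham")

def train(messages):
    """Count words and documents per label from (text, label) pairs."""
    word_counts = {label: Counter() for label in LABELS}
    doc_counts = Counter()
    for text, label in messages:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set().union(*word_counts.values())
    return word_counts, doc_counts, vocab

def classify(model, text):
    """Return the label with the highest log-probability score."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in LABELS:
        # Smoothed prior, so a label with no training messages never yields log(0).
        score = math.log((doc_counts[label] + 1) / (total_docs + len(LABELS)))
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # words never seen in training are ignored
            # Add-one (Laplace) smoothing for the word likelihood.
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical training data with correct ground truth.
clean = [
    ("win free prize now", "spam"),
    ("free money offer", "spam"),
    ("claim your free prize", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
    ("project agenda attached", "ham"),
]

# The same messages, but two spam messages carry a wrong label.
noisy = [(text, "ham") if text in ("win free prize now", "free money offer") else (text, label)
         for text, label in clean]

verdict_clean = classify(train(clean), "free prize")
verdict_noisy = classify(train(noisy), "free prize")
print(verdict_clean)  # spam
print(verdict_noisy)  # ham
```

With this toy data, the mislabelled training set is enough to flip the verdict for the message "free prize" from spam to non-spam, even though the classifier itself is unchanged.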
Remote sensing
Ground truth is a term used in remote sensing; it refers to information collected on location. Ground truth allows image data to be related to real features and materials on the ground. The collection of ground-truth data enables calibration of remote-sensing data, and aids in the interpretation and analysis of what is being sensed. Fields that use it include cartography, meteorology, analysis of aerial photographs, satellite imagery and other techniques in which data are gathered at a distance.
More specifically, ground truth may refer to a process in which a pixel on a satellite image is compared with what is actually present at that location (at the present time) in order to verify the contents of the pixel. In the case of a classified image, it helps determine the accuracy of the classification performed by the remote sensing software and thereby minimize classification errors such as errors of commission and errors of omission.
Ground truthing is usually done on site, by making surface observations and measurements of various properties of the features in the ground resolution cells being studied on the remotely sensed digital image. It also involves recording the geographic coordinates of each ground resolution cell with GPS technology and comparing them with the coordinates of the corresponding pixel provided by the remote sensing software, in order to understand and analyze the location errors and how they may affect a particular study.
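The coordinate comparison can be sketched as follows. This is an illustration under stated assumptions, not a standard procedure: the function name `location_errors` and the coordinate values are hypothetical, and both sets of coordinates are assumed to be eastings and northings in metres in the same projected coordinate system, so that a straight-line offset is meaningful. The GPS readings themselves carry some error, which the sketch does not model.

```python
import math

def location_errors(gps_points, image_points):
    """Return per-point offsets in metres and their root-mean-square error.

    Both arguments are equal-length lists of (easting, northing) pairs
    in the same projected coordinate system (an assumption of this sketch).
    """
    offsets = [
        math.hypot(ix - gx, iy - gy)
        for (gx, gy), (ix, iy) in zip(gps_points, image_points)
    ]
    rmse = math.sqrt(sum(d * d for d in offsets) / len(offsets))
    return offsets, rmse

# Hypothetical check points: GPS readings taken in the field ...
gps = [(500000.0, 4649776.0), (500120.0, 4649800.0)]
# ... and the coordinates the software reports for the matching pixels.
image = [(500003.0, 4649780.0), (500120.0, 4649794.0)]

offsets, rmse = location_errors(gps, image)
print(offsets)  # [5.0, 6.0]
print(round(rmse, 2))  # 5.52
```

An analyst could compare the resulting error with the ground resolution cell size to judge whether the offset is large enough to matter for the study.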
Ground truth is important in the initial supervised classification of an image. When the identity and location of land cover types are known through a combination of field work, maps, and personal experience, these areas are known as training sites. The spectral characteristics of these areas are used to train the remote sensing software, which applies decision rules to classify the rest of the image. Decision rules such as Maximum Likelihood Classification, Parallelepiped Classification, and Minimum Distance Classification offer different techniques for classifying an image. Additional ground truth sites allow the remote sensing analyst to establish an error matrix, which validates the accuracy of the classification method used. Different classification methods may have different percentages of error for a given classification project. It is important that the analyst choose a classification method that works best with the number of classes used while producing the least error.
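The workflow described above can be sketched for the minimum distance rule. The code is a simplified illustration rather than the behaviour of any particular remote sensing package: each class is represented by the mean spectral vector of its training sites, each pixel is assigned to the class with the nearest mean (Euclidean distance), and a separate set of ground truth sites is used to build the error matrix. The two-band pixel values, class names and function names are all hypothetical.

```python
import math
from collections import defaultdict

def class_means(training_sites):
    """Average the spectral vectors of each class's training sites."""
    means = {}
    for label, samples in training_sites.items():
        n = len(samples)
        means[label] = tuple(sum(band) / n for band in zip(*samples))
    return means

def classify(pixel, means):
    """Minimum distance rule: pick the class whose mean is closest."""
    return min(means, key=lambda label: math.dist(pixel, means[label]))

def error_matrix(validation_sites, means):
    """Count (reference class, classified class) pairs for validation sites."""
    matrix = defaultdict(int)
    for pixel, reference in validation_sites:
        matrix[(reference, classify(pixel, means))] += 1
    return dict(matrix)

def overall_accuracy(matrix):
    """Share of validation sites whose classified class matches the reference."""
    correct = sum(n for (ref, pred), n in matrix.items() if ref == pred)
    return correct / sum(matrix.values())

# Hypothetical training sites: (band 1, band 2) values per class.
training_sites = {
    "water": [(10, 5), (12, 7)],
    "forest": [(30, 80), (34, 84)],
    "urban": [(90, 60), (94, 64)],
}

# Additional ground truth sites kept aside for validation.
validation_sites = [
    ((11, 8), "water"),
    ((33, 79), "forest"),
    ((88, 65), "urban"),
    ((65, 70), "forest"),  # spectrally closer to the urban mean
]

means = class_means(training_sites)
matrix = error_matrix(validation_sites, means)
accuracy = overall_accuracy(matrix)
print(matrix)
print(accuracy)  # 0.75
```

In this toy example one forest site is assigned to the urban class, so the error matrix has one off-diagonal entry and the overall accuracy is 3 out of 4. Repeating the validation with a different decision rule would allow the error percentages of the methods to be compared.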
Ground truth also helps with atmospheric correction. Because the radiation recorded in satellite images has to pass through the atmosphere, the images can be distorted by atmospheric absorption. Ground truth can therefore help identify objects in satellite images more reliably.
Errors of commission
An example of an error of commission is when a pixel reports the presence of a feature (such as trees) that, in reality, is absent (no trees are actually present). Ground truthing helps ensure that the error matrices show a higher accuracy percentage than they would if no pixels were ground truthed.
Errors of omission
An example of an error of omission is when pixels of a certain feature, for example maple trees, are not classified as maple trees. The process of ground truthing helps to ensure that such pixels are classified correctly and that the error matrices are more accurate.
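Both kinds of error can be read off an error matrix. The sketch below assumes the common layout in which rows are the reference (ground truth) classes and columns are the classes assigned by the software; the counts, class names and the function name `omission_commission` are hypothetical. For each class, the omission error is the share of reference pixels that were not assigned to that class, and the commission error is the share of pixels assigned to that class that actually belong to another one.

```python
def omission_commission(matrix, classes):
    """Return {class: (omission error, commission error)}.

    matrix[reference][classified] holds pixel counts (rows are reference
    classes, columns are classified classes; an assumption of this sketch).
    """
    result = {}
    for c in classes:
        correct = matrix[c][c]
        reference_total = sum(matrix[c][p] for p in classes)   # row sum
        classified_total = sum(matrix[r][c] for r in classes)  # column sum
        omission = (reference_total - correct) / reference_total if reference_total else None
        commission = (classified_total - correct) / classified_total if classified_total else None
        result[c] = (omission, commission)
    return result

classes = ["maple", "other"]
# Hypothetical counts from ground truthed pixels.
matrix = {
    "maple": {"maple": 40, "other": 10},  # 10 maple pixels were missed
    "other": {"maple": 5, "other": 45},   # 5 pixels wrongly labelled maple
}

errors = omission_commission(matrix, classes)
for name, (omission, commission) in errors.items():
    print(f"{name}: omission {omission:.1%}, commission {commission:.1%}")
```

Here 10 of the 50 reference maple pixels were missed (an omission error of 20%), while 5 of the 45 pixels labelled maple were not maple (a commission error of about 11%).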
Military usage
US military slang uses "ground truth" to describe the reality of a tactical situation, as opposed to intelligence reports and mission plans. The term appears in the title of the Iraq War documentary film The Ground Truth (2006), and also in military publications, for example Stars and Stripes saying: "Stripes decided to figure out what the ground truth was in Iraq."
The military usage of the term is long-standing but its origins are obscure. It is plausible but difficult to prove that "ground truth" began life as military terminology and then migrated to other domains such as remote sensing control. (The Oxford English Dictionary (s.v. "ground truth") records the use of the word "Groundtruth" in the sense of a "fundamental truth" from Henry Ellison's poem "The Siberian Exile's Tale", published in 1833.)
- Ellison, Henry (1833). Mad moments, or first verse attempts by a born natural. p. 362. Retrieved 2014-10-24. "As the Groundtruth of her own Existence it must be regarded, thro' Him in its highest, purest Aspect shown!"
- Forestry Organization Remote Sensing Technology Project (includes an example of an error matrix)