Multispectral pattern recognition

Multispectral remote sensing is the collection and analysis of reflected, emitted, or back-scattered energy from an object or an area of interest in multiple bands (regions) of the electromagnetic spectrum (Jensen, 2005). Subcategories of multispectral remote sensing include hyperspectral remote sensing, in which hundreds of bands are collected and analyzed, and ultraspectral remote sensing, in which many hundreds of bands are used (Belokon et al., 1997). A main purpose of multispectral imaging is to classify the image using multispectral classification, which is much faster than human interpretation.

The Iterative Self-Organizing Data Analysis Technique (ISODATA) algorithm used for multispectral pattern recognition was developed by Geoffrey H. Ball and David J. Hall, working at the Stanford Research Institute in Menlo Park, CA. They published their findings in a technical report entitled ISODATA, a novel method of data analysis and pattern classification (Ball & Hall, 1965). ISODATA is defined in the abstract as: 'a novel method of data analysis and pattern classification, is described in verbal and pictorial terms, in terms of a two-dimensional example, and by giving the mathematical calculations that the method uses. The technique clusters many-variable data around points in the data's original high-dimensional space and by doing so provides a useful description of the data.' (1965, p. v) ISODATA was developed to facilitate the modelling and tracking of weather patterns.

Multispectral remote sensing systems using ISODATA

Remote sensing systems gather data via instruments typically carried on satellites in orbit around the Earth or on aircraft. The remote sensing scanner detects the energy that radiates from the object or area of interest. This energy is recorded as an analog electrical signal and converted into a digital value through an analog-to-digital (A-to-D) conversion. There are several multispectral remote sensing systems that can be categorized in the following way:

Multispectral Imaging Using Discrete Detectors and Scanning Mirrors

Landsat Multispectral Scanner (MSS)
Landsat Thematic Mapper (TM)
NOAA Geostationary Operational Environmental Satellite (GOES)
NOAA Advanced Very High Resolution Radiometer (AVHRR)
NASA and ORBIMAGE, Inc., Sea-viewing Wide field-of-view Sensor (SeaWiFS)
Daedalus, Inc., Aircraft Multispectral Scanner (AMS)
NASA Airborne Terrestrial Applications Sensor (ATLAS)

Multispectral Imaging Using Linear Arrays

SPOT 1, 2, and 3 High Resolution Visible (HRV) sensors and SPOT 4 and 5 High Resolution Visible Infrared (HRVIR) and Vegetation sensors
Indian Remote Sensing System (IRS) Linear Imaging Self-scanning Sensor (LISS)
Space Imaging, Inc. (IKONOS)
Digital Globe, Inc. (QuickBird)
ORBIMAGE, Inc. (OrbView-3)
ImageSat International, Inc. (EROS A1)
NASA Terra Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER)
NASA Terra Multiangle Imaging Spectroradiometer (MISR)

Imaging Spectrometry Using Linear and Area Arrays

NASA Jet Propulsion Laboratory Airborne Visible/Infrared Imaging Spectrometer (AVIRIS)
Compact Airborne Spectrographic Imager 3 (CASI 3)
NASA Terra Moderate Resolution Imaging Spectroradiometer (MODIS)
NASA Earth Observer (EO-1) Advanced Land Imager (ALI), Hyperion, and LEISA Atmospheric Corrector (LAC)

Satellite Analog and Digital Photographic Systems

Russian SPIN-2 TK-350 and KVR-1000
NASA Space Shuttle and International Space Station Imagery

Multispectral classification methods

A variety of methods can be used for the multispectral classification of images:

Algorithms based on parametric and nonparametric statistics that use ratio- and interval-scaled data, and nonmetric methods that can also incorporate nominal-scale data (Duda et al., 2001),
Supervised or unsupervised classification logic,
Hard or soft (fuzzy) set classification logic to create hard or fuzzy thematic output products,
Per-pixel or object-oriented classification logic, and
Hybrid approaches

Supervised classification

In this classification method, the identity and location of some of the land-cover types are obtained beforehand from a combination of fieldwork, interpretation of aerial photography, map analysis, and personal experience. The analyst locates sites in the image that have characteristics similar to the known land-cover types. These areas are known as training sites because their known characteristics are used to train the classification algorithm for eventual land-cover mapping of the remainder of the image. Multivariate statistical parameters (means, standard deviations, covariance matrices, correlation matrices, etc.) are calculated for each training site. All pixels inside and outside of the training sites are then evaluated and allocated to the class with the most similar characteristics.

Classification scheme

The first step in the supervised classification method is to identify the land-cover and land-use classes to be used. Land-cover refers to the type of material present on the site (e.g. water, crops, forest, wetland, asphalt, and concrete). Land-use refers to the modifications made by people to the land cover (e.g. agriculture, commerce, settlement). All classes should be selected and defined carefully so that remotely sensed data are classified into the correct land-use and/or land-cover information. To achieve this, it is necessary to use a classification system that contains taxonomically correct definitions of classes. If a hard classification is desired, the classes should be:

Mutually exclusive: there is no taxonomic overlap between classes (e.g., rain forest and evergreen forest are distinct classes).
Exhaustive: all land-covers in the area have been included.
Hierarchical: sub-level classes (e.g., single-family residential, multiple-family residential) are created so that they can be grouped into a higher-level category (e.g., residential).
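The three properties above can be illustrated with a small lookup table. This is only a sketch: the class names and the SCHEME dictionary are hypothetical and do not come from any of the published schemes. Because each detailed class is a dictionary key with exactly one parent, the classes are mutually exclusive and hierarchical; a label missing from the table indicates that the scheme is not exhaustive.

```python
# Hypothetical two-level scheme: each detailed class maps to exactly one parent.
SCHEME = {
    "single-family residential": "residential",
    "multiple-family residential": "residential",
    "deciduous forest": "forest",
    "evergreen forest": "forest",
    "water": "water",
}


def generalize(labels, scheme=SCHEME):
    """Map detailed class labels to their higher-level category."""
    missing = sorted({label for label in labels if label not in scheme})
    if missing:
        # A label outside the scheme means the scheme is not exhaustive.
        raise ValueError(f"scheme is not exhaustive, unknown classes: {missing}")
    return [scheme[label] for label in labels]


detailed = ["single-family residential", "evergreen forest", "water"]
general = generalize(detailed)
print(general)
```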

Some examples of hard classification schemes are:

American Planning Association Land-Based Classification System
United States Geological Survey Land-use/Land-cover Classification System for Use with Remote Sensor Data
U.S. Department of the Interior Fish and Wildlife Service Classification of Wetlands and Deepwater Habitats of the United States
U.S. National Vegetation Classification System
International Geosphere-Biosphere Program IGBP Land Cover Classification System

Training sites

Once the classification scheme is adopted, the image analyst selects training sites in the image that are representative of the land-cover or land-use classes of interest. If the environment in which the data were collected is relatively homogeneous, the training data can be applied across the whole image. If conditions vary across the site, training data from one part cannot be extended to the rest. To solve this problem, a geographical stratification should be carried out during the preliminary stages of the project, recording all significant differences (e.g. soil type, water turbidity, crop species). These differences should be marked on the imagery, and training sites selected within each stratum. The final classification map is then a composite of the individual stratum classifications.

After the training sites have been selected, a measurement vector is created for each pixel in each training class. This vector contains the pixel's brightness value in each band. The mean, standard deviation, variance-covariance matrix, and correlation matrix are calculated from the measurement vectors of each class.
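As a minimal sketch of these calculations, the following plain-Python function computes the four statistics for one training class. The function name training_statistics and the brightness values are made up for illustration, and the sample (n - 1) form of the covariance is an assumption, since the text does not say which form is used.

```python
import math


def training_statistics(pixels):
    """Return the mean vector, standard deviations, variance-covariance matrix
    and correlation matrix for a list of measurement vectors (one per pixel)."""
    n = len(pixels)
    bands = len(pixels[0])
    mean = [sum(p[b] for p in pixels) / n for b in range(bands)]
    # Sample variance-covariance matrix (divides by n - 1; an assumption).
    cov = [
        [
            sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in pixels) / (n - 1)
            for j in range(bands)
        ]
        for i in range(bands)
    ]
    # Standard deviations are the square roots of the diagonal of cov.
    std = [math.sqrt(cov[b][b]) for b in range(bands)]
    # Correlation normalizes each covariance by the two standard deviations.
    corr = [
        [cov[i][j] / (std[i] * std[j]) for j in range(bands)]
        for i in range(bands)
    ]
    return mean, std, cov, corr


# Hypothetical brightness values for one training class in three bands.
water = [[10, 20, 5], [12, 22, 4], [11, 21, 6], [13, 23, 5]]
mean, std, cov, corr = training_statistics(water)
print(mean)
```

In this made-up sample the first two bands rise and fall together, so their correlation is close to 1, which is the kind of redundancy the band-selection step looks for.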

Once the statistics from each training site are determined, the most effective bands for discriminating each class should be selected. The objective of this feature selection is to eliminate bands that provide redundant information. Graphical and statistical methods can be used to achieve this objective. Some of the graphic methods are:

Bar graph spectral plots
Cospectral mean vector plots
Feature space plots
Cospectral parallelepiped or ellipse plots
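On the statistical side, one simple measure (chosen here for illustration; the text does not name a specific one) is the normalized distance between two class means in each band, |mean_a - mean_b| / (std_a + std_b). Bands with a larger value separate the two classes better. The class statistics below are hypothetical.

```python
def normalized_distance(mean_a, std_a, mean_b, std_b):
    """Per-band separability of two classes: |mean_a - mean_b| / (std_a + std_b)."""
    return [
        abs(ma - mb) / (sa + sb)
        for ma, sa, mb, sb in zip(mean_a, std_a, mean_b, std_b)
    ]


# Hypothetical per-band training statistics for two classes (three bands).
water_mean, water_std = [20.0, 15.0, 5.0], [2.0, 2.0, 1.0]
forest_mean, forest_std = [22.0, 30.0, 45.0], [2.0, 3.0, 4.0]

scores = normalized_distance(water_mean, water_std, forest_mean, forest_std)
# Band indices ordered from most to least separable.
ranking = sorted(range(len(scores)), key=lambda b: scores[b], reverse=True)
print(scores, ranking)
```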

Classification algorithm

The last step in supervised classification is selecting an appropriate algorithm. The choice of a specific algorithm depends on the input data and the desired output. Parametric algorithms assume that the data are normally distributed. If the data are not normally distributed, nonparametric algorithms should be used. The most common nonparametric algorithms are:

One-dimensional density slicing
Parallelepiped
Minimum distance
Nearest-neighbor
Neural network and expert system analysis
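As an illustration of one entry in this list, the sketch below shows a basic parallelepiped rule: each class is given a box spanning its mean plus or minus k standard deviations in every band, and a pixel is assigned to the first class whose box contains it. The function name, the parameter k, and the class statistics are assumptions made for the example; practical implementations also need a rule for pixels that fall inside overlapping boxes.

```python
def parallelepiped_classify(pixel, class_stats, k=1.0):
    """Return the first class whose box (mean +/- k * std in every band)
    contains the pixel, or None if the pixel falls inside no box."""
    for name, (mean, std) in class_stats.items():
        inside = all(
            m - k * s <= value <= m + k * s
            for value, m, s in zip(pixel, mean, std)
        )
        if inside:
            return name
    return None


# Hypothetical (mean, std) vectors per class, taken from training sites.
class_stats = {
    "water": ([20.0, 15.0, 5.0], [2.0, 2.0, 1.0]),
    "forest": ([22.0, 30.0, 45.0], [2.0, 3.0, 4.0]),
}
image = [[21, 16, 5], [23, 32, 47], [40, 40, 40]]
labels = [parallelepiped_classify(p, class_stats) for p in image]
print(labels)
```

The third pixel lies outside both boxes and stays unclassified (None), which is a typical trait of this rule.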

Unsupervised classification

Unsupervised classification (also known as clustering) is a method of partitioning remote sensor image data in multispectral feature space and extracting land-cover information. Unsupervised classification requires less input information from the analyst than supervised classification because clustering does not require training data. The process consists of a series of numerical operations that search for natural groupings of the spectral properties of pixels. From this process, a map with m spectral classes is obtained. Using the map, the analyst tries to assign or transform the spectral classes into thematic information of interest (e.g. forest, agriculture, urban). This may not be easy because some spectral clusters represent mixed classes of surface materials and may not be useful. The analyst has to understand the spectral characteristics of the terrain to be able to label clusters as a specific information class. There are hundreds of clustering algorithms. Two of the most conceptually simple algorithms are the chain method and the ISODATA method.

Chain method

The algorithm used in this method operates in a two-pass mode (it passes through the multispectral dataset two times). In the first pass, the program reads through the dataset and sequentially builds clusters (groups of points in spectral space). Once the program has read through the dataset, a mean vector is associated with each cluster. In the second pass, a minimum-distance-to-means classification algorithm is applied to the dataset, pixel by pixel, so that each pixel is assigned to one of the mean vectors created in the first pass.
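The two passes can be sketched as follows. This is a simplified illustration rather than a full implementation: the radius parameter (the distance within which a pixel joins an existing cluster) is an assumed name, and any merging of clusters or cap on their number during the first pass is omitted.

```python
import math


def chain_cluster(pixels, radius):
    """Two-pass clustering sketch: build cluster means, then label every pixel."""
    means, counts = [], []
    # Pass 1: read the pixels in order and build clusters sequentially.
    for p in pixels:
        nearest = min(
            range(len(means)),
            key=lambda i: math.dist(p, means[i]),
            default=None,
        )
        if nearest is not None and math.dist(p, means[nearest]) <= radius:
            # Close enough: add the pixel and update the running mean.
            counts[nearest] += 1
            n = counts[nearest]
            means[nearest] = [m + (v - m) / n for m, v in zip(means[nearest], p)]
        else:
            # Too far from every cluster: the pixel starts a new one.
            means.append([float(v) for v in p])
            counts.append(1)
    # Pass 2: minimum distance to means, pixel by pixel.
    labels = [
        min(range(len(means)), key=lambda i: math.dist(p, means[i]))
        for p in pixels
    ]
    return means, labels


# Hypothetical two-band brightness values.
pixels = [[10, 10], [11, 10], [50, 52], [10, 11], [51, 50]]
means, labels = chain_cluster(pixels, radius=5)
print(means, labels)
```

Because clusters are built in reading order, the result of the first pass can depend on the order of the pixels in the dataset.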

ISODATA method

The Iterative Self-Organizing Data Analysis Technique (ISODATA) method uses a set of rule-of-thumb procedures that have been incorporated into an iterative classification algorithm. Many of the steps in the algorithm are based on experience gained through experimentation. ISODATA is a modification of the k-means clustering algorithm that addresses some of its disadvantages, notably the fixed number of clusters: it merges clusters whose separation distance in multispectral feature space is less than a user-specified value, and it includes rules for splitting a single cluster into two. The method makes a large number of passes through the dataset until specified results are obtained.
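A heavily simplified sketch of this loop is given below. It keeps only the three ideas named above (a k-means style reassignment, a split rule, and a merge rule) and is not the full procedure from the 1965 report. The parameter names merge_dist and split_std, the choice to split along the band with the largest standard deviation, and the unweighted averaging of merged means are all assumptions made for the example.

```python
import math


def isodata(pixels, init_means, merge_dist, split_std, max_iter=10):
    """Simplified ISODATA-style clustering with split and merge rules."""
    means = [[float(v) for v in m] for m in init_means]
    for _ in range(max_iter):
        # k-means step: assign each pixel to its nearest mean.
        groups = [[] for _ in means]
        for p in pixels:
            i = min(range(len(means)), key=lambda c: math.dist(p, means[c]))
            groups[i].append(p)
        groups = [g for g in groups if g]  # drop empty clusters
        centers = [[sum(col) / len(g) for col in zip(*g)] for g in groups]
        # Split rule: a cluster that is too spread out in some band is
        # replaced by two means offset along that band.
        candidates = []
        for g, m in zip(groups, centers):
            stds = [
                math.sqrt(sum((p[b] - m[b]) ** 2 for p in g) / len(g))
                for b in range(len(m))
            ]
            widest = max(range(len(m)), key=lambda b: stds[b])
            if len(g) > 1 and stds[widest] > split_std:
                low, high = list(m), list(m)
                low[widest] -= stds[widest]
                high[widest] += stds[widest]
                candidates += [low, high]
            else:
                candidates.append(m)
        # Merge rule: means closer than merge_dist are averaged together.
        merged = []
        for m in candidates:
            for j, other in enumerate(merged):
                if math.dist(m, other) < merge_dist:
                    merged[j] = [(x + y) / 2 for x, y in zip(m, other)]
                    break
            else:
                merged.append(m)
        if merged == means:  # nothing changed: stop iterating
            break
        means = merged
    labels = [
        min(range(len(means)), key=lambda c: math.dist(p, means[c]))
        for p in pixels
    ]
    return means, labels


# Hypothetical two-band data with two obvious groups, started from one mean.
pixels = [[10, 10], [12, 10], [11, 12], [50, 50], [52, 51], [51, 49]]
means, labels = isodata(pixels, [[30, 30]], merge_dist=5, split_std=5)
print(means, labels)
```

Starting from a single mean, the split rule separates the two groups in the first pass; starting from too many means, the merge rule would combine the ones that end up close together.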

References

Ball, G. H., & Hall, D. J. (1965). ISODATA, a novel method of data analysis and pattern classification. Menlo Park: Stanford Research Institute. United States Office of Naval Research, Information Sciences Branch.
Duda, R. O., Hart, P. E., & Stork, D. G. (2001). Pattern Classification. New York: John Wiley & Sons.
Jensen, J. R. (2005). Introductory Digital Image Processing: A Remote Sensing Perspective. Upper Saddle River: Pearson Prentice Hall.
Belokon, W. F. et al. (1997). Multispectral Imagery Reference Guide. Fairfax: Logicon Geodynamics, Inc.