Acoustic landmarks and distinctive features


Acoustic landmarks and distinctive features is the name of a model of speech perception proposed by Kenneth N. Stevens and his colleagues at MIT.


In this model, the incoming acoustic signal is first processed to locate so-called landmarks, which are special spectral events in the signal. For example, vowels are typically marked by a relatively high frequency of the first formant, whereas consonants show up as discontinuities in the signal and have lower amplitudes in the low and middle regions of the spectrum. These acoustic features result from articulation. Secondary articulatory movements may even be used to enhance the landmarks when external conditions such as noise require it. Stevens claims that coarticulation causes only limited, systematic, and therefore predictable variation in the signal, which the listener is able to deal with. Within this model, therefore, the so-called lack of invariance is simply claimed not to exist.
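As a rough illustration of treating landmarks as discontinuities, the following minimal sketch flags frames where the energy in a low-frequency band changes abruptly from one frame to the next. The function name, the per-frame energy list, and the 10 dB threshold are assumptions made for illustration only; they are not taken from Stevens's model.

```python
def find_abrupt_landmarks(band_energy_db, threshold_db=10.0):
    """Return (frame_index, direction) pairs for abrupt energy changes.

    band_energy_db: hypothetical per-frame energy (dB) in one frequency band.
    threshold_db: illustrative threshold, not a value from the source.
    """
    landmarks = []
    for i in range(1, len(band_energy_db)):
        delta = band_energy_db[i] - band_energy_db[i - 1]
        if abs(delta) >= threshold_db:
            direction = "rise" if delta > 0 else "fall"
            landmarks.append((i, direction))
    return landmarks


# Toy energy contour: a vowel, a consonantal closure, then another vowel.
energy = [60.0, 61.0, 45.0, 44.0, 46.0, 62.0, 61.0]
landmarks = find_abrupt_landmarks(energy)
print(landmarks)
```

In this toy contour the sharp fall and the later sharp rise would be the two candidate landmarks, corresponding to the onset and release of the low-amplitude stretch.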

The landmarks are analyzed to determine the articulatory events (gestures) associated with them. In the next stage, acoustic cues are extracted from the signal in the vicinity of the landmarks: the listener is assumed to mentally measure parameters such as the frequencies of spectral peaks, amplitudes in the low-frequency region, or timing.
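The idea of measuring parameters in the vicinity of a landmark can be sketched as a windowed summary around a frame index. The frame dictionaries, their keys (`peak_hz`, `low_band_db`), and the window radius are hypothetical and serve only to make the step concrete.

```python
def cues_near_landmark(frames, landmark_index, radius=2):
    """Average simple per-frame measurements in a window around a landmark.

    frames: list of dicts with hypothetical keys 'peak_hz' and 'low_band_db'.
    radius: number of frames on each side of the landmark (an assumption).
    """
    start = max(0, landmark_index - radius)
    end = min(len(frames), landmark_index + radius + 1)
    window = frames[start:end]
    return {
        "mean_peak_hz": sum(f["peak_hz"] for f in window) / len(window),
        "mean_low_band_db": sum(f["low_band_db"] for f in window) / len(window),
        "n_frames": len(window),
    }


frames = [
    {"peak_hz": 500.0, "low_band_db": 60.0},
    {"peak_hz": 700.0, "low_band_db": 50.0},
    {"peak_hz": 900.0, "low_band_db": 40.0},
]
cues = cues_near_landmark(frames, landmark_index=1, radius=1)
print(cues)
```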

The next processing stage comprises the consolidation of acoustic cues and the derivation of distinctive features. These are binary categories related to articulation, for example [+/- high], [+/- back], [+/- round lips] for vowels, and [+/- sonorant], [+/- lateral], or [+/- nasal] for consonants.
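The step from continuous cues to binary features can be sketched as simple thresholding. The example below relies on the general phonetic observation that high vowels tend to have a low first formant (F1) and back vowels a low second formant (F2); the function name and the cut-off values are illustrative assumptions, not figures from the model.

```python
def derive_vowel_features(f1_hz, f2_hz, f1_high_max=400.0, f2_back_max=1200.0):
    """Map formant measurements to binary features.

    The thresholds are hypothetical round numbers chosen for illustration.
    True stands for '+', False for '-'.
    """
    return {
        "high": f1_hz < f1_high_max,   # low F1 -> [+high]
        "back": f2_hz < f2_back_max,   # low F2 -> [+back]
    }


# An [i]-like vowel: low F1, high F2.
front_high = derive_vowel_features(300.0, 2300.0)
# An [u]-like vowel: low F1, low F2.
back_high = derive_vowel_features(300.0, 800.0)
print(front_high, back_high)
```

A real system would consolidate several cues per feature rather than rely on a single threshold, but the output has the same shape: a bundle of binary values.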

Bundles of these features uniquely identify speech segments (phonemes, syllables, words). These segments are part of the lexicon stored in the listener’s memory. Its units are activated in the process of lexical access and mapped onto the original signal to check whether they match. If they do not, another attempt is made with a different candidate pattern. In this iterative fashion, listeners reconstruct the articulatory events that were necessary to produce the perceived speech signal. The process can therefore be described as analysis-by-synthesis.
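The iterative matching described above can be sketched as a loop over lexical candidates, each of which is compared against the feature bundles derived from the signal. The toy lexicon, its entries, and the function names are hypothetical; a full analysis-by-synthesis model would generate an expected acoustic pattern for each candidate, whereas this sketch compares feature bundles directly.

```python
# Hypothetical toy lexicon: each entry is a sequence of feature bundles.
LEXICON = {
    "bi": [{"sonorant": False, "nasal": False}, {"high": True, "back": False}],
    "ma": [{"sonorant": True, "nasal": True}, {"high": False, "back": True}],
    "mi": [{"sonorant": True, "nasal": True}, {"high": True, "back": False}],
}


def matches(expected, observed):
    """True if every expected feature value is found in the observed bundles."""
    if len(expected) != len(observed):
        return False
    return all(
        all(obs.get(name) == value for name, value in exp.items())
        for exp, obs in zip(expected, observed)
    )


def lexical_access(observed, lexicon):
    """Try candidates one by one; return the first that matches, else None."""
    for word, expected in lexicon.items():
        if matches(expected, observed):
            return word
    return None


observed = [{"sonorant": True, "nasal": True}, {"high": True, "back": False}]
result = lexical_access(observed, LEXICON)
print(result)
```

Here the first two candidates are rejected and the third is accepted, which mirrors the try-and-retry character of the process.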

The theory thus posits that the distal objects of speech perception are the articulatory gestures underlying speech, and that listeners make sense of the speech signal by referring to them. The model accordingly belongs to the family of analysis-by-synthesis models.


Hayward, Katrina (2000). Experimental Phonetics: An Introduction. Harlow: Longman. 

Stevens, K. N. (2002). "Toward a model of lexical access based on acoustic landmarks and distinctive features". Journal of the Acoustical Society of America, 111(4), 1872–1891.