Speech production

From Wikipedia, the free encyclopedia
Human vocal apparatus used to produce speech

Speech production is the process by which spoken words are selected to be produced, have their phonetics formulated, and are finally articulated by the motor system in the vocal apparatus. Speech production can be spontaneous, such as when a person creates the words of a conversation; reactive, such as when they name a picture or read a written word aloud; or a vocal imitation, such as in speech repetition.

Speech production is not the same as language production, since language can also be produced manually, by signing.

In ordinary fluent conversation, people pronounce roughly four syllables, ten or twelve phonemes, and two to three words per second, drawn from a vocabulary that can contain 10,000 to 100,000 words.[1] Errors in speech production are relatively rare, occurring at a rate of about once in every 900 words in spontaneous speech.[2] Words that are commonly spoken, learned early in life, or easily imagined are quicker to say than words that are rarely said, learned later in life, or abstract.[3][4]
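The speaking rate and the error rate quoted above can be combined into a rough estimate of how often a fluent speaker makes an error. The sketch below is only back-of-the-envelope arithmetic: it assumes uninterrupted speech at a constant rate and ignores pauses and turn-taking, so real conversations would space errors further apart in clock time. The function name `seconds_between_errors` is illustrative, not taken from the cited studies.

```python
# Figures quoted in the text; speaking without pauses is a simplifying assumption.
WORDS_PER_ERROR = 900  # about one error per 900 words of spontaneous speech


def seconds_between_errors(words_per_second, words_per_error=WORDS_PER_ERROR):
    """Average time between speech errors for a constant speaking rate."""
    return words_per_error / words_per_second


for wps in (2, 3):  # two to three words per second
    secs = seconds_between_errors(wps)
    print(f"{wps} words/s: one error every {secs:.0f} s ({secs / 60:.1f} min)")
```

Under these assumptions a speaker producing two to three words per second would make an error roughly every five to seven and a half minutes of continuous talk.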

Normally, speech is created with pulmonary pressure provided by the lungs, which generates sound by phonation in the glottis of the larynx; the vocal tract then modifies this sound into different vowels and consonants. However, speech production can occur without the use of the lungs and glottis, in alaryngeal speech, by using the upper parts of the vocal tract. An example of such alaryngeal speech is Donald Duck talk.[5]

The vocal production of speech can be associated with the production of synchronized hand gestures that act to enhance the comprehensibility of what is being said.[6]

Three stages

The production of spoken language involves three major levels of processing: conceptualization, formulation, and articulation.[1][7][8]

The first stage is conceptualization, or conceptual preparation, in which the intention to create speech links a desired concept to the particular spoken word to be expressed. Here the preverbal intended messages are formulated, specifying the concepts to be verbally expressed.[9]

The second stage is formulation, in which the linguistic form required to express the desired message is created. Formulation includes grammatical encoding, morpho-phonological encoding, and phonetic encoding.[9] Grammatical encoding is the process of selecting the appropriate syntactic word, or lemma. The selected lemma then activates the appropriate syntactic frame for the conceptualized message. Morpho-phonological encoding is the process of breaking words down into syllables to be produced in overt speech. This syllabification depends on the preceding and following words; for instance, "I comprehend" is syllabified as I-com-pre-hend, but "I comprehend it" as I-com-pre-hen-dit.[9] The final part of the formulation stage is phonetic encoding. This involves activating the articulatory gestures that correspond to the syllables selected in the morpho-phonological process, creating an articulatory score that fixes the order of movements of the vocal apparatus as the utterance is pieced together.[9]
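The I-com-pre-hend / I-com-pre-hen-dit example can be reproduced with a toy syllabifier. This is a minimal sketch under loud assumptions: letters stand in for phonemes, only a, e, i, o and u count as vowels, the set of legal onsets is a tiny hypothetical list, and consonant clusters are split by the maximal onset principle, a common phonological rule swapped in here for illustration rather than the mechanism the cited model actually uses. The key point it shows is that the words are joined into one phonological word before syllabifying, so the /d/ of "comprehend" can move into the syllable of "it".

```python
VOWELS = set("aeiou")
# Hypothetical, deliberately tiny set of legal syllable onsets. Letters stand in
# for phonemes here, which English spelling does not justify in general.
LEGAL_ONSETS = {"", "c", "d", "h", "m", "n", "p", "r", "t", "pr", "tr", "cr"}


def syllabify(words):
    """Syllabify several words as one phonological word.

    The words are joined first, so a boundary can move across a word edge.
    Each consonant cluster between two vowels is then split by the maximal
    onset principle: the next syllable takes the longest legal onset.
    """
    s = "".join(words)
    nuclei = [i for i, ch in enumerate(s) if ch.lower() in VOWELS]
    if not nuclei:
        return [s]
    syllables, start = [], 0
    for cur, nxt in zip(nuclei, nuclei[1:]):
        cluster = s[cur + 1:nxt]  # consonants between two vowel nuclei
        # Try the longest suffix of the cluster first; "" always matches.
        for k in range(len(cluster), -1, -1):
            onset = cluster[len(cluster) - k:]
            if onset.lower() in LEGAL_ONSETS:
                break
        boundary = nxt - len(onset)
        syllables.append(s[start:boundary])
        start = boundary
    syllables.append(s[start:])
    return syllables


print("-".join(syllabify(["I", "comprehend"])))        # I-com-pre-hend
print("-".join(syllabify(["I", "comprehend", "it"])))  # I-com-pre-hen-dit
```

Because "nd" is not in the onset list but "d" is, the final consonant of "comprehend" stays in the coda when nothing follows and becomes the onset of a new syllable once "it" is added.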

The third stage of speech production is articulation, which is the execution of the articulatory score by the lungs, glottis, larynx, tongue, lips, jaw and other parts of the vocal apparatus, resulting in overt speech.[7][9]
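The serial ordering of the three stages can be sketched as a pipeline of functions, each consuming the previous stage's output. This is only a schematic under stated assumptions: the two-entry lexicon (`CONCEPT_TO_LEMMA`, `LEMMA_TO_SYLLABLES`) and all function names are hypothetical, syllables are looked up per word rather than computed, and the "articulatory score" is reduced to an ordered list of syllables. It does not model activation, competition between candidates, or timing.

```python
from dataclasses import dataclass

# Hypothetical toy lexicon, for illustration only.
CONCEPT_TO_LEMMA = {"INFANT": ("baby", "noun"), "CANINE": ("dog", "noun")}
LEMMA_TO_SYLLABLES = {"baby": ["ba", "by"], "dog": ["dog"]}


@dataclass
class Lemma:
    word: str
    category: str  # syntactic information carried by the lemma


def conceptualize(concepts):
    """Stage 1: the preverbal message is modeled as an ordered list of concepts."""
    return list(concepts)


def formulate(message):
    """Stage 2: grammatical encoding selects a lemma per concept,
    morpho-phonological encoding looks up its syllables, and phonetic
    encoding orders them into a simplified articulatory score."""
    lemmas = [Lemma(*CONCEPT_TO_LEMMA[c]) for c in message]
    syllables = [syl for lem in lemmas for syl in LEMMA_TO_SYLLABLES[lem.word]]
    # Score: (position, syllable) pairs that fix the order of execution.
    return list(enumerate(syllables))


def articulate(score):
    """Stage 3: execute the score in order; here it just emits the syllables."""
    return "-".join(syl for _, syl in sorted(score))


print(articulate(formulate(conceptualize(["INFANT"]))))  # ba-by
```

Naming a picture of a baby, for example, would pass the single concept `INFANT` through all three stages in turn.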


Motor control of speech production in right-handed people depends mostly on areas in the left cerebral hemisphere. These areas include the bilateral supplementary motor area, the left posterior inferior frontal gyrus, the left insula, the left primary motor cortex, and the temporal cortex.[10] Subcortical areas such as the basal ganglia and cerebellum are also involved.[11][12] The cerebellum aids the sequencing of speech syllables into fast, smooth and rhythmically organized words and longer utterances.[12]


Speech production can be affected by several disorders:

See also

References

  1. Levelt, WJ (1999). "Models of word production". Trends in Cognitive Sciences 3 (6): 223–232. doi:10.1016/S1364-6613(99)01319-4. PMID 10354575.
  2. Garnham, A; Shillcock, RC; Brown, GDA; Mill, AID; Cutler, A (1981). "Slips of the tongue in the London–Lund corpus of spontaneous conversation". Linguistics 19 (7–8): 805–817. doi:10.1515/ling.1981.19.7-8.805.
  3. Oldfield, RC; Wingfield, A (1965). "Response latencies in naming objects". Quarterly Journal of Experimental Psychology 17 (4): 273–281. doi:10.1080/17470216508416445. PMID 5852918.
  4. Bird, H; Franklin, S; Howard, D (2001). "Age of acquisition and imageability ratings for a large set of words, including verbs and function words". Behavior Research Methods, Instruments, & Computers 33 (1): 73–79. doi:10.3758/BF03195349. PMID 11296722.
  5. Weinberg, B; Westerhouse, J (1971). "A study of buccal speech". Journal of Speech and Hearing Research 14 (3): 652–658. Bibcode:1972ASAJ...51Q..91W. doi:10.1121/1.1981697. PMID 5163900.
  6. McNeill, D (2005). Gesture and Thought. University of Chicago Press. ISBN 978-0-226-51463-5.
  7. Levelt, WJM (1989). Speaking: From Intention to Articulation. MIT Press. ISBN 978-0-262-62089-5.
  8. Jescheniak, JD; Levelt, WJM (1994). "Word frequency effects in speech production: retrieval of syntactic information and of phonological form". Journal of Experimental Psychology: Learning, Memory, and Cognition 20 (4): 824–843. doi:10.1037/0278-7393.20.4.824.
  9. Levelt, W (1999). "The neurocognition of language", pp. 87–117. Oxford University Press.
  10. Indefrey, P; Levelt, WJ (2004). "The spatial and temporal signatures of word production components". Cognition 92 (1–2): 101–144. doi:10.1016/j.cognition.2002.06.001. PMID 15037128.
  11. Booth, JR; Wood, L; Lu, D; Houk, JC; Bitan, T (2007). "The role of the basal ganglia and cerebellum in language processing". Brain Research 1133 (1): 136–144. doi:10.1016/j.brainres.2006.11.074. PMC 2424405. PMID 17189619.
  12. Ackermann, H (2008). "Cerebellar contributions to speech production and speech perception: psycholinguistic and neurobiological perspectives". Trends in Neurosciences 31 (6): 265–272. doi:10.1016/j.tins.2008.02.011. PMID 18471906.

Further reading