In linguistics, prosody (from Ancient Greek προσῳδία prosōidía [prosɔː(i)díaː], "song sung to music; tone or accent of a syllable") is concerned with those elements of speech that are not individual vowels and consonants but are properties of syllables and larger units of speech. These contribute to such linguistic functions as intonation, tone, stress and rhythm. Prosody may reflect various features of the speaker or the utterance: the emotional state of the speaker; the form of the utterance (statement, question, or command); the presence of irony or sarcasm; emphasis, contrast, and focus; or other elements of language that may not be encoded by grammar or by choice of vocabulary.
Attributes of prosody
In the study of prosodic aspects of speech it is usual to distinguish between auditory measures (subjective impressions produced in the mind of the listener) and acoustic measures (physical properties of the sound wave that may be measured objectively). Auditory and acoustic measures of prosody do not correspond in a linear way. The majority of studies of prosody have been based on auditory analysis using auditory scales.
There is no agreed number of prosodic variables. In auditory terms, the major variables are:
- the pitch of the voice (varying between low and high)
- length of sounds (varying between short and long)
- loudness, or prominence (varying between soft and loud)
- timbre (quality of sound)
In acoustic terms, these correspond reasonably closely to:
- fundamental frequency (measured in hertz, or cycles per second)
- duration (measured in time units such as milliseconds or seconds)
- intensity, or sound pressure level (measured in decibels)
- spectral characteristics (distribution of energy at different parts of the audible frequency range)
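As a rough illustration of the acoustic measures listed above, the following sketch computes duration, a relative level in decibels, and a fundamental-frequency estimate for a synthetic tone. Everything here is an assumption made for the example: the function names, the 8 kHz sample rate, the 75 to 500 Hz search range, and the use of a simple autocorrelation peak as the F0 estimator. The level is expressed relative to an arbitrary full-scale amplitude of 1.0, not as a calibrated sound pressure level, which would require a calibrated recording and a reference pressure of 20 µPa.

```python
import math

def synth_tone(freq_hz, duration_s, sample_rate=8000, amplitude=0.5):
    """Generate a pure sine tone as a list of samples (stand-in for a voiced sound)."""
    n = round(duration_s * sample_rate)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]

def duration_ms(samples, sample_rate):
    """Duration of the signal in milliseconds."""
    return 1000.0 * len(samples) / sample_rate

def level_db(samples, reference=1.0):
    """RMS level in dB relative to `reference` (uncalibrated, not true SPL)."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms / reference)

def estimate_f0(samples, sample_rate, fmin=75.0, fmax=500.0):
    """Estimate F0 in Hz from the lag of the largest autocorrelation value.

    The raw (unnormalised) autocorrelation is used, which slightly favours
    shorter lags and so prefers the true period over its multiples.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(a * b for a, b in zip(samples, samples[lag:]))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

tone = synth_tone(200.0, 0.1)            # 200 Hz tone lasting 100 ms
print(duration_ms(tone, 8000))           # duration in ms
print(round(level_db(tone), 2))          # level in dB re full scale
print(estimate_f0(tone, 8000))           # estimated F0 in Hz
```

Real speech is far less regular than a pure tone, so practical pitch trackers add windowing, voicing decisions and octave-error correction on top of this basic idea.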
Different combinations of these variables are exploited in the linguistic functions of intonation and stress, as well as other prosodic features such as rhythm, tempo and loudness. Additional prosodic variables have been studied, including voice quality and pausing.
Prosodic features are said to be suprasegmental, since they are properties of units of speech larger than the individual segment (though exceptionally a single segment may constitute a syllable, and thus even a whole utterance, e.g. "Ah!"). It is necessary to distinguish between the personal, background characteristics that belong to an individual's voice (for example, their habitual pitch range) and the independently variable prosodic features that are used contrastively to communicate meaning (for example, the use of changes in pitch to indicate the difference between statements and questions). Personal characteristics are not linguistically significant. It is not possible to say with any accuracy which aspects of prosody are found in all languages and which are specific to a particular language or dialect.
Some writers have described intonation entirely in terms of pitch, while others propose that what we call intonation is in fact an amalgam of several prosodic variables. The form of English intonation is often said to be based on three aspects:
- The division of speech into units
- The highlighting of particular words and syllables
- The choice of pitch movement (e.g. fall or rise)
These are sometimes known as Tonality, Tonicity and Tone, and collectively as "the three T's" (Halliday; Wells).
An additional pitch-related variation is pitch range: speakers are capable of speaking sometimes with a wide range of pitch (this is usually associated with excitement), at other times with a narrow range. English has been said to make use of changes in key: shifting one's intonation into the higher or lower part of one's pitch range is believed to be meaningful in certain contexts.
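One common way to quantify how wide or narrow a speaker's pitch range is over a stretch of speech is to express the span between the lowest and highest fundamental frequency in semitones. The sketch below assumes a hypothetical list of F0 values in hertz, with 0 marking unvoiced frames; the function name and the sample values are invented for illustration and do not come from any study cited here.

```python
import math

def pitch_range_semitones(f0_values_hz):
    """Span between the lowest and highest voiced F0 value, in semitones."""
    voiced = [f for f in f0_values_hz if f > 0]   # drop unvoiced frames (0 Hz)
    if not voiced:
        raise ValueError("no voiced frames")
    return 12 * math.log2(max(voiced) / min(voiced))

wide = [100.0, 0.0, 150.0, 200.0]     # hypothetical excited speech
narrow = [110.0, 0.0, 120.0, 130.0]   # hypothetical subdued speech

print(pitch_range_semitones(wide))    # 12.0, i.e. one octave
print(pitch_range_semitones(narrow))  # a much smaller span
```

A logarithmic unit such as the semitone is used here because it makes ranges comparable across speakers with different habitual pitch levels.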
From the perceptual point of view, stress functions as the means of making a syllable prominent; stress may be studied in relation to individual words (named "word stress" or lexical stress) or in relation to larger units of speech (traditionally referred to as "sentence stress" but more appropriately named "prosodic stress"). Stressed syllables are made prominent by several variables, singly or in combination. Stress is typically associated with the following:
- pitch prominence, that is, a pitch level that is different from that of neighbouring syllables, or a pitch movement
- increased length
- increased loudness
- differences in timbre: in English and some other languages, stress is associated with aspects of vowel quality (whose acoustic correlate is the formant frequencies or spectrum of the vowel). Unstressed vowels tend to be centralized relative to stressed vowels, which are normally more peripheral in quality
These cues to stress are not equally powerful. Cruttenden, for example, writes "Perceptual experiments have clearly shown that, in English at any rate, the three features (pitch, length and loudness) form a scale of importance in bringing syllables into prominence, pitch being the most efficacious, and loudness the least so".
When pitch prominence is the major factor, the resulting prominence is often called accent rather than stress.
There is considerable variation from language to language concerning the role of stress in identifying words or in interpreting grammar and syntax.
Although individual speakers differ from others in their personal speaking rate (tempo), all speakers appear to use changes in speaking rate in a meaningful way. See Tempo of speech.
Although rhythm is not a prosodic variable in the way that pitch or loudness are, it is usual to treat a language's characteristic rhythm as a part of its prosodic phonology. It has often been asserted that languages exhibit regularity in the timing of successive units of speech, a regularity referred to as isochrony, and that every language may be assigned one of three rhythmical types: stress-timed (where the durations of the intervals between stressed syllables are relatively constant), syllable-timed (where the durations of successive syllables are relatively constant) and mora-timed (where the durations of successive morae are relatively constant). As explained in the isochrony article, this claim has not been supported by scientific evidence.
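The isochrony hypothesis can be put to a simple numerical test: if a language were strictly stress-timed, the intervals between stressed syllables would vary little, and their coefficient of variation would be close to zero. The sketch below shows one way to compute that statistic. The onset times are invented for illustration, and the coefficient of variation is only one of several possible variability measures; neither is taken from the sources cited in this article.

```python
from statistics import mean, pstdev

def interval_variability(onset_times_s):
    """Coefficient of variation of the intervals between successive onsets.

    `onset_times_s` holds the start times (in seconds) of the units being
    compared, e.g. stressed syllables, syllables, or morae.
    """
    intervals = [b - a for a, b in zip(onset_times_s, onset_times_s[1:])]
    if len(intervals) < 2:
        raise ValueError("need at least three onsets")
    return pstdev(intervals) / mean(intervals)

perfectly_regular = [0.0, 0.5, 1.0, 1.5]   # hypothetical stressed-syllable onsets
irregular = [0.0, 0.3, 1.0, 1.2]           # hypothetical stressed-syllable onsets

print(interval_variability(perfectly_regular))  # 0.0: strictly isochronous
print(interval_variability(irregular))          # clearly above zero
```

The same function could be applied to syllable or mora onsets to examine the other two proposed rhythm types.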
Although pausing is a natural phenomenon related to breathing, it is claimed that pauses may also carry some contrastive linguistic information. In English, pausing is more likely before a word carrying a high information content. Defining pause is not easy: it is necessary to distinguish between silent pauses and "filled" pauses where a hesitation is perceived but the speaker continues to emit sound. In the study of conversational interaction it is normal to note different lengths of pause.
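Silent pauses, unlike filled ones, can be approximated from the signal alone as stretches where the level stays below some threshold for long enough. The following sketch works on a hypothetical sequence of per-frame levels in decibels; the 10 ms frame size, the -50 dB threshold and the 200 ms minimum length are arbitrary assumptions chosen for the example, not values taken from the literature cited here. Filled pauses, in which the speaker keeps emitting sound, would not be found this way.

```python
def find_silent_pauses(frame_levels_db, frame_ms=10,
                       silence_db=-50.0, min_pause_ms=200):
    """Return (start_ms, length_ms) for each sufficiently long silent stretch."""
    pauses = []
    run_start = None
    # A sentinel "loud" frame at the end closes any pause still open.
    for i, level in enumerate(list(frame_levels_db) + [float("inf")]):
        if level < silence_db:
            if run_start is None:
                run_start = i            # a silent stretch begins here
        elif run_start is not None:
            length_ms = (i - run_start) * frame_ms
            if length_ms >= min_pause_ms:
                pauses.append((run_start * frame_ms, length_ms))
            run_start = None
    return pauses

# Hypothetical level contour: speech, 250 ms silence, speech, 50 ms gap, speech.
levels = [-20.0] * 30 + [-70.0] * 25 + [-20.0] * 10 + [-70.0] * 5 + [-20.0] * 20
print(find_silent_pauses(levels))   # [(300, 250)]; the 50 ms gap is too short
```

Returning the length of each pause as well as its position reflects the practice, mentioned above, of noting different lengths of pause in conversational analysis.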
Pausing or its lack is a factor in creating the perception of words being grouped together into a phrase, phraseme, constituent or other multi-word grouping, often highlighting lexical items or fixed-expression idioms. For example, pausing before and after a multi-word grouping, but not within it, groups the words together and separates them from nearby words. Also, within a multi-word grouping, blending the sounds of adjacent words together or speaking them faster than words outside the grouping contributes to the perception of the words as part of a group. A well-known example in English is "Know what I mean?" being said rapidly as if it were a single word ("No-whuta-meen?").
Intonation is said to have a number of perceptually significant functions in English and other languages, contributing to the recognition and comprehension of speech.
It is believed that prosody assists listeners in parsing continuous speech and in the recognition of words, providing cues to syntactic structure, grammatical boundaries and sentence type. Boundaries between intonation units are often associated with grammatical or syntactic boundaries; these are marked by such prosodic features as pauses and slowing of tempo, as well as "pitch reset" where the speaker's pitch level returns to the level typical of the onset of a new intonation unit. In this way potential ambiguities may be resolved. For example, the sentence “They invited Bob and Bill and Al got rejected” is ambiguous when written, although addition of a written comma after either "Bob" or "Bill" will remove the sentence's ambiguity. But when the sentence is read aloud, prosodic cues like pauses and changes in intonation will reduce or remove the ambiguity. Moving the intonational boundary in cases such as the above example will tend to change the interpretation of the sentence. This result has been found in studies performed in both English and Bulgarian. Research in English word recognition has demonstrated an important role for prosody.
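The effect of moving the intonational boundary in the example above can be made concrete with a toy sketch. It simply splits the word string at the position of a hypothetical boundary, which stands in for the pause, tempo and pitch cues described in the text; it does not model prosody itself, and the function name is invented for illustration.

```python
def split_at_boundary(sentence, boundary_after):
    """Split a sentence into two intonation units after the given word."""
    words = sentence.split()
    cut = words.index(boundary_after) + 1   # first occurrence of the word
    return " ".join(words[:cut]), " ".join(words[cut:])

sentence = "They invited Bob and Bill and Al got rejected"
for word in ("Bob", "Bill"):
    first, second = split_at_boundary(sentence, word)
    print(f"{first} | {second}")
# They invited Bob | and Bill and Al got rejected
# They invited Bob and Bill | and Al got rejected
```

With the boundary after "Bob", Bill and Al are both rejected; with the boundary after "Bill", only Al is.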
Intonation and stress work together to highlight important words or syllables for contrast and focus. This is sometimes referred to as the accentual function of prosody. A well-known example is the ambiguous sentence "I have plans to leave", where if the primary accent is placed on "plans" the meaning of the sentence is usually taken to be "I have some plans (drawings, diagrams) to leave" but if the main accent is on "leave" the typical interpretation is "I am planning to leave".
Prosody plays a role in the regulation of conversational interaction and in signalling discourse structure. Much of the work of developing the theory of discourse intonation was done by David Brazil and his associates. In this work it is shown how intonation can indicate such things as whether information is new or already established; whether a speaker is dominant or not in a conversation, and when a speaker is inviting the listener to make a contribution to the conversation.
Prosody is also important in signalling emotions and attitudes. When this is involuntary (as when the voice is affected by anxiety or fear), the prosodic information is not linguistically significant. However, when speakers vary their speech intentionally, for example to indicate sarcasm, this usually involves the use of prosodic features. The most useful prosodic feature in detecting sarcasm is a reduction in the mean fundamental frequency relative to speech conveying humor, neutrality, or sincerity. While prosodic cues are important in indicating sarcasm, context clues and shared knowledge are also important.
Emotional prosody was considered by Charles Darwin in The Descent of Man to predate the evolution of human language: "Even monkeys express strong feelings in different tones – anger and impatience by low, – fear and pain by high notes." Native speakers listening to actors reading emotionally neutral text while projecting emotions correctly recognized happiness 62% of the time, anger 95%, surprise 91%, sadness 81%, and neutral tone 76%. When a database of this speech was processed by computer, segmental features allowed better than 90% recognition of happiness and anger, while suprasegmental prosodic features allowed only 44%–49% recognition. The reverse was true for surprise, which was recognized only 69% of the time by segmental features and 96% of the time by suprasegmental prosody.
In typical conversation (with no actor's voice involved), the recognition of emotion may be quite low, of the order of 50%, hampering the complex interrelationship function of speech advocated by some authors. However, even if emotional expression through prosody cannot always be consciously recognized, tone of voice may continue to have subconscious effects in conversation. This sort of expression stems from neither linguistic nor semantic effects, and can thus be isolated from traditional linguistic content.
The average person's ability to decode the conversational implicature of emotional prosody has been found to be slightly less accurate than their ability to discriminate facial expressions; however, decoding ability varies by emotion. These emotional expressions have been found to be ubiquitous, being used and understood across cultures. Various emotions, and their general experimental identification rates, are as follows:
- Anger and sadness: High rate of accurate identification
- Fear and happiness: Medium rate of accurate identification
- Disgust: Poor rate of accurate identification
The prosody of an utterance is used by listeners to guide decisions about the emotional affect of the situation. Whether a person decodes the prosody as positive, negative, or neutral plays a factor in the way a person decodes a facial expression accompanying an utterance. As the facial expression becomes closer to neutral, the prosodic interpretation influences the interpretation of the facial expression. A study by Marc D. Pell revealed that 600 ms of prosodic information is necessary for listeners to be able to identify the affective tone of the utterance. At lengths below this, there was not enough information for listeners to process the emotional context of the utterance.
Unique prosodic features have been noted in infant-directed speech (IDS), also known as baby talk, child-directed speech (CDS), or motherese. Adults, especially caregivers, speaking to young children tend to imitate childlike speech by using higher and more variable pitch, as well as exaggerated stress. These prosodic characteristics are thought to assist children in acquiring phonemes, segmenting words, and recognizing phrasal boundaries. Although there is no evidence that infant-directed speech is necessary for language acquisition, these specific prosodic features have been observed in many different languages.
Aprosodia is an acquired or developmental impairment in comprehending or generating the emotion conveyed in spoken language. It is often accompanied by an inability to use variations in speech properly, particularly deficits in the ability to modulate pitch, loudness, intonation, and rhythm of word formation accurately. It is sometimes seen in persons with Asperger syndrome.
Brain regions involved
Producing these nonverbal elements requires intact motor areas of the face, mouth, tongue, and throat. This function is associated with Brodmann areas 44 and 45 (Broca's area) of the left frontal lobe. Damage to areas 44/45 produces motor aprosodia, in which the nonverbal elements of speech (facial expression, tone, rhythm of voice) are disturbed.
Understanding these nonverbal elements requires an intact and properly functioning right-hemisphere perisylvian area, particularly Brodmann area 22 (not to be confused with the corresponding area in the left hemisphere, which contains Wernicke's area). Damage to the right inferior frontal gyrus causes a diminished ability to convey emotion or emphasis by voice or gesture, and damage to the right superior temporal gyrus causes problems in comprehending emotion or emphasis in the voice or gestures of others. The right Brodmann area 22 aids in the interpretation of prosody, and damage to it causes sensory aprosodia, in which the patient is unable to comprehend changes in voice and body language.
See also
- Phonological hierarchy
- Prosody (poetry)
- Semantic prosody, or discourse prosody
- Tempo of speech
References
- Hirst, D.; Di Cristo, A. (1998). Intonation systems. Cambridge. pp. 4–7.
- Crystal, D.; Quirk, R. (1964). Systems of Prosodic and Paralinguistic Features in English. Mouton. pp. 10–12.
- Collins, B.; Mees, I. (2013). Practical Phonetics and Phonology (3rd ed.). Routledge. p. 129.
- Cruttenden, A. (1997). Intonation (2nd ed.). Cambridge. p. 13.
- Ashby, M.; Maidment, J. (2005). Introducing Phonetic Science. Cambridge. pp. 167–8.
- Hirst, D.; Di Cristo, A. (1998). Intonation systems. Cambridge. pp. 1–13.
- Cruttenden, A. (1997). Intonation (2nd ed.). Cambridge. pp. 68–125. ISBN 0-521-59825-7.
- Wells, J. (2006). English Intonation. Cambridge. pp. 187–194.
- Stoyneshka, I.; Fodor, J.; Fernández, E. M. (April 7, 2010). "Phoneme restoration methods for investigating prosodic influences on syntactic processing". Language and Cognitive Processes.
- Carroll, David W. (1994). Psychology of Language. Brooks/Cole. p. 87.
- Aitchison, Jean (1994). Words in the Mind. Blackwell. pp. 136–9.
- Wells, John (2006). English Intonation. Cambridge. pp. 116–124.
- Roach, Peter (2009). English Phonetics and Phonology (4th ed.). Cambridge. pp. 153–4.
- Brazil, David; Coulthard, Malcolm; Johns, Catherine (1980). Discourse Intonation and Language Teaching. Longman.
- Cheang, H.S.; Pell, M.D. (May 2008). Speech Communication 50: 366–81. doi:10.1016/j.specom.2007.11.003.
- Darwin, Charles (1871). The Descent of Man. Citing Johann Rudolph Rengger, Natural History of the Mammals of Paraguay, p. 49.
- Barra, R.; Montero, J.M.; Macías-Guarasa, J.; D’Haro, L.F.; San-Segundo, R.; Córdoba, R. "Prosodic and segmental rubrics in emotion identification".
- Teodorescu, H.-N.; Feraru, Silvia Monica (2007). "A Study on Speech with Manifest Emotions". Text, Speech and Dialogue. Lecture Notes in Computer Science 4629. Springer Berlin, Heidelberg. pp. 254–261. ISSN 0302-9743.
- Pittam, J.; Scherer, K.R. (1993). "Vocal Expression and Communication of Emotion". Handbook of Emotions. New York: Guilford Press.
- Pell, M. D. (2005). "Prosody–face Interactions in Emotional Processing as Revealed by the Facial Affect Decision Task". Journal of Nonverbal Behavior 29 (4): 193–215. doi:10.1007/s10919-005-7720-z.
- Gleason, Jean Berko; Ratner, Nan Bernstein (2013). The Development of Language (8th ed.). Pearson.
- Mosby's Medical Dictionary (8th ed.). Elsevier. 2009.
- McPartland J, Klin A (2006). "Asperger's syndrome". Adolesc Med Clin 17 (3): 771–88. doi:10.1016/j.admecli.2006.06.010. PMID 17030291.
- Miller, Lisa A; Collins, Robert L; Kent, Thomas A (2008). "Language and the modulation of impulsive aggression". The Journal of Neuropsychiatry and Clinical Neurosciences 20 (3): 261–73. doi:10.1176/appi.neuropsych.20.3.261. PMID 18806230.
- Nespor, Marina (2010). "Prosody: an interview with Marina Nespor". ReVEL 8 (15).
- Nolte, John. The Human Brain (6th ed.).
External links
- Lessons in Prosody (from the University of Freiburg, preserved by the Internet Archive)
- Prosody on the Web (a tutorial on prosody)