The term, however, is used to refer to two separate concepts.
- Voicing can refer to the articulatory process in which the vocal cords vibrate. This is its primary use in phonetics to describe phones, which are particular speech sounds.
- It can also refer to a classification of speech sounds that tend to be associated with vocal cord vibration but need not actually be voiced at the articulatory level. This is the term's primary use in phonology, where it describes phonemes; in phonetics, its primary use is to describe phones.
At the articulatory level, a voiced sound is one in which the vocal cords vibrate, and a voiceless sound is one in which they do not. For example, voicing accounts for the difference between the pair of sounds associated with the English letters "s" and "z". The two sounds are transcribed as [s] and [z] to distinguish them from the English letters, which have several possible pronunciations depending on context. If one places the fingers on the voice box (i.e. the location of the Adam's apple in the upper throat), one can feel a vibration when one pronounces zzzz, but not when one pronounces ssss. (For a more detailed, technical explanation, see modal voice and phonation.) In most European languages, with a notable exception being Icelandic, vowels and other sonorants (consonants such as m, n, l, and r) are modally voiced.
The International Phonetic Alphabet has distinct letters for many voiceless and voiced pairs of consonants (the obstruents), such as [p b], [t d], [k ɡ], [q ɢ]. In addition, there is a diacritic for voicedness: ⟨◌̬⟩. Diacritics are typically used with letters for prototypically voiceless sounds.
In Unicode, the voicing diacritic is encoded as U+032C ◌̬ combining caron below (HTML &#812;), and the corresponding diacritic for voicelessness as U+0325 ◌̥ combining ring below (HTML &#805;).
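As a small illustration of how these combining characters attach to a base letter, the following Python sketch uses only the standard `unicodedata` module. The helper names `mark_voiced` and `mark_voiceless` are hypothetical and chosen only for this example.

```python
import unicodedata

VOICED_DIACRITIC = "\u032C"     # combining caron below
VOICELESS_DIACRITIC = "\u0325"  # combining ring below


def mark_voiced(letter: str) -> str:
    """Append the voicing diacritic to a base letter (hypothetical helper)."""
    return letter + VOICED_DIACRITIC


def mark_voiceless(letter: str) -> str:
    """Append the voicelessness diacritic to a base letter (hypothetical helper)."""
    return letter + VOICELESS_DIACRITIC


if __name__ == "__main__":
    for text in (mark_voiced("s"), mark_voiceless("n")):
        # Print the combined string and the Unicode name of each code point.
        print(text, [unicodedata.name(ch) for ch in text])
```

Because the diacritic is a separate code point, each marked letter is a two-character string even though it renders as a single glyph.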
The distinction between the articulatory use of voice and the phonological use rests on the distinction between phone (represented between square brackets) and phoneme (represented between slashes). The difference is best illustrated by a rough example. The English word nods is made up of a sequence of phonemes, represented symbolically as /nɒdz/, or the sequence of /n/, /ɒ/, /d/, and /z/. Each of these symbols is an abstract representation of a phoneme. Awareness of these phonemes is an inherent part of speakers' mental grammar that allows them to recognize words.
However, phonemes are not themselves sounds. Rather, phonemes are, in a sense, converted to phones before being spoken. The /z/ phoneme, for instance, can actually be pronounced as either the [s] phone or the [z] phone because /z/ is frequently devoiced in fluent speech, especially at the end of an utterance. The sequence of phones for nods might be transcribed as [nɒts] or [nɒdz], depending on the presence or strength of this devoicing. While the [z] phone has articulatory voicing, the [s] phone does not.
What complicates the matter is that, for English, consonant phonemes are classified as either voiced or voiceless even though this is not the primary distinctive feature between them. Still, the classification is used as a stand-in for phonological processes, such as vowel lengthening that occurs before voiced consonants but not before unvoiced consonants, or changes in vowel quality (i.e. the sound of the vowel) in some dialects of English that occur before unvoiced but not voiced consonants. These processes allow English speakers to continue to perceive a difference between voiced and voiceless consonants when the devoicing of the former would otherwise make them sound identical to the latter.
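The interplay between phonemic voicing, devoicing, and the vowel-length cue can be sketched in Python. This is a toy model rather than a description of English phonology: the small segment inventories, the function name `realize`, and the choice to mark lengthening with the IPA length mark ⟨ː⟩ are all assumptions made for illustration.

```python
# Toy inventories; assumptions for illustration, not a full phoneme set.
DEVOICED = {"b": "p", "d": "t", "ɡ": "k", "v": "f", "z": "s"}
VOWELS = set("ɒɪɛæʌʊ")
LENGTH_MARK = "ː"


def realize(phonemes: list[str], devoice_final: bool) -> list[str]:
    """Map a phoneme sequence to a phone sequence (hypothetical toy rules)."""
    phones = []
    for i, segment in enumerate(phonemes):
        nxt = phonemes[i + 1] if i + 1 < len(phonemes) else None
        # Lengthening depends on the *phonemic* voicing of the following
        # consonant, so it is decided before any devoicing is applied.
        if segment in VOWELS and nxt in DEVOICED:
            segment += LENGTH_MARK
        phones.append(segment)
    if devoice_final:
        # Devoice the run of voiced obstruents at the end of the utterance.
        j = len(phones) - 1
        while j >= 0 and phones[j] in DEVOICED:
            phones[j] = DEVOICED[phones[j]]
            j -= 1
    return phones


if __name__ == "__main__":
    print("".join(realize(list("nɒdz"), devoice_final=False)))  # nɒːdz
    print("".join(realize(list("nɒdz"), devoice_final=True)))   # nɒːts
    print("".join(realize(list("nɒts"), devoice_final=False)))  # nɒts
```

In this sketch, a fully devoiced /nɒdz/ and an underlying /nɒts/ end in the same consonant phones, yet the two outputs still differ in vowel length, which is the kind of cue the paragraph above describes.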
English has four pairs of fricative phonemes which can be divided into a table by place of articulation and voicing. The voiced fricatives can readily be felt to have voicing throughout the duration of the phone, especially when occurring between vowels.
|Articulation||Voiceless||Voiced|
|Pronounced with the lower lip against the teeth:||[f] (fan)||[v] (van)|
|Pronounced with the tongue against the teeth:||[θ] (thin, thigh)||[ð] (then, thy)|
|Pronounced with the tongue near the gums:||[s] (sip)||[z] (zip)|
|Pronounced with the tongue bunched up:||[ʃ] (Confucian)||[ʒ] (confusion)|
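The fricative table amounts to a two-way lookup between phones that differ only in voicing. A minimal Python sketch follows; the names `FRICATIVE_PAIRS`, `toggle_voicing`, and `is_voiced` are hypothetical, and the conventional place-of-articulation labels used as keys are supplied here rather than taken from the table.

```python
# (voiceless, voiced) pairs from the table, keyed by conventional place labels.
FRICATIVE_PAIRS = {
    "labiodental": ("f", "v"),
    "dental": ("θ", "ð"),
    "alveolar": ("s", "z"),
    "postalveolar": ("ʃ", "ʒ"),
}

# Build lookups in both directions.
TO_VOICED = {vl: vd for vl, vd in FRICATIVE_PAIRS.values()}
TO_VOICELESS = {vd: vl for vl, vd in FRICATIVE_PAIRS.values()}


def toggle_voicing(phone: str) -> str:
    """Return the counterpart that differs only in voicing (hypothetical helper)."""
    if phone in TO_VOICED:
        return TO_VOICED[phone]
    if phone in TO_VOICELESS:
        return TO_VOICELESS[phone]
    raise KeyError(f"{phone!r} is not in the fricative table")


def is_voiced(phone: str) -> bool:
    """True if the phone is listed in the voiced column of the table."""
    return phone in TO_VOICELESS
```

For example, `toggle_voicing("s")` returns `"z"` and `toggle_voicing("ð")` returns `"θ"`.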
However, in the class of consonants called stops, such as /p, t, k, b, d, ɡ/, the contrast is more complicated for English. The "voiced" sounds do not typically feature articulatory voicing throughout the sound. The difference between the unvoiced stop phonemes and the voiced stop phonemes is not just a matter of whether articulatory voicing is present or not. Rather, it includes when voicing starts (if at all), the presence of aspiration (airflow burst following the release of the closure), and the duration of the closure and aspiration.
English voiceless stops are generally aspirated at the beginning of a stressed syllable while, in the same context, their voiced counterparts are only voiced partway through. In narrower phonetic transcription, the voiced symbols may be used only to represent the presence of articulatory voicing, while aspiration is represented with a superscript h.
|Articulation||Voiceless||Voiced|
|Pronounced with the lips closed:||[p] (pin)||[b] (bin)|
|Pronounced with the tongue near the gums:||[t] (ten)||[d] (den)|
|Pronounced with the tongue bunched up:||[tʃ] (chin)||[dʒ] (gin)|
|Pronounced with the back of the tongue against the palate:||[k] (coat)||[ɡ] (goat)|
When these consonants come at the end of a syllable, however, what distinguishes them is quite different: voiceless phonemes are typically unaspirated and glottalized, and the closure itself may not even be released, which sometimes makes it difficult to hear the difference between, for example, light and like. However, auditory cues remain that distinguish voiced from voiceless sounds, such as those described above, e.g. the length of the preceding vowel.
Other English sounds, the vowels and sonorants, are normally fully voiced. However, they may be devoiced in certain positions, especially after aspirated consonants, as in Copernicus, tree, and play, where the voicing is delayed to the extent of missing the sonorant or vowel altogether.
Degrees of voicing
There are two variables to degrees of voicing: intensity (discussed under phonation), and duration (discussed under voice onset time). When a sound is described as "half voiced" or "partially voiced", it is not always clear whether that means that the voicing is weak (low intensity), or if the voicing only occurs during part of the sound (short duration). In the case of English, it is the latter.
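The duration sense of "partially voiced" can be made concrete as the fraction of a segment during which voicing is present. The Python sketch below is illustrative only: the function names, the interval representation, and the 5% tolerance used to label a segment are assumptions, and the sketch says nothing about the intensity sense.

```python
def voiced_fraction(
    segment: tuple[float, float],
    voicing: list[tuple[float, float]],
) -> float:
    """Fraction of the segment's duration during which voicing is present.

    `segment` is (start, end); `voicing` is a list of non-overlapping
    (start, end) intervals of vocal cord vibration, in the same time unit.
    """
    start, end = segment
    if end <= start:
        raise ValueError("segment must have positive duration")
    # Sum the overlap of each voicing interval with the segment.
    overlap = sum(
        max(0.0, min(end, v_end) - max(start, v_start))
        for v_start, v_end in voicing
    )
    return overlap / (end - start)


def describe(fraction: float, tolerance: float = 0.05) -> str:
    """Label a voiced fraction; the tolerance is an arbitrary assumption."""
    if fraction <= tolerance:
        return "voiceless"
    if fraction >= 1 - tolerance:
        return "fully voiced"
    return "partially voiced"
```

With made-up timings, a 100 ms segment in which voicing is present only for the first 40 ms gives `voiced_fraction((0, 100), [(0, 40)])` equal to 0.4, which `describe` labels as "partially voiced".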
Juǀʼhoansi and some of the neighboring languages are typologically unusual in having contrastive partially voiced consonants: they have aspirated and ejective consonants, which are normally incompatible with voicing, in voiceless and voiced pairs. These consonants start out voiced but become voiceless partway through, which allows normal aspiration or ejection. They are [b͡pʰ, d͡tʰ, d͡tsʰ, d͡tʃʰ, ɡ͡kʰ] and [d͡tsʼ, d͡tʃʼ], plus a similar series of clicks.
Voice and tenseness
There are languages with two sets of contrasting obstruents that are labelled /p t k f s x …/ vs. /b d ɡ v z ɣ …/ even though there is no involvement of voice (or voice onset time) in that contrast. This happens, for instance, in several Alemannic German dialects. Because voice is not involved, this is explained as a contrast in tenseness, called a fortis and lenis contrast.
One hypothesis holds that the contrast between fortis and lenis consonants is related to the contrast between voiceless and voiced consonants. The relation is based on sound perception as well as on sound production, with consonant voice, tenseness, and length being different manifestations of a common sound feature.