Voice analysis


Voice analysis is the study of speech sounds for purposes other than recovering their linguistic content, which is the concern of speech recognition. Such studies mostly involve medical analysis of the voice (phoniatrics), but also include speaker identification. More controversially, some believe that the truthfulness or emotional state of speakers can be determined using voice stress analysis or layered voice analysis.

Typical voice problems

A medical study of the voice can be, for instance, an analysis of the voice of patients who have had a polyp surgically removed from their vocal cords. Objective evaluation of the improvement in voice quality requires some way to measure voice quality. An experienced voice therapist can evaluate the voice quite reliably, but this requires extensive training and is still subjective.

Another active research topic in medical voice analysis is vocal loading evaluation. The vocal cords of a person who speaks for an extended time become fatigued: the process of speaking exerts a load on the vocal cords and tires the tissue. Among professional voice users (e.g., teachers and salespeople), this fatigue can cause voice failure and sick leave. To evaluate these problems, vocal loading must be measured objectively.

Analysis methods

Voice problems that require voice analysis most commonly originate in the vocal folds or in the laryngeal musculature that controls them. The folds are subject to collision forces with each vibratory cycle and to drying from the air forced through the small gap between them, and the laryngeal musculature is intensely active during speech or singing and is prone to fatigue. However, dynamic analysis of the vocal folds and their movement is physically difficult. The location of the vocal folds effectively prohibits direct, invasive measurement of their movement. Less invasive imaging methods such as X-rays or ultrasound do not work because the vocal folds are surrounded by cartilage, which distorts image quality. The vocal folds also move rapidly: fundamental frequencies are usually between 80 and 300 Hz, which rules out ordinary video. Stroboscopic and high-speed video[citation needed] provide an option, but to see the vocal folds, a fiberoptic probe leading to the camera must be positioned in the throat, which makes speaking difficult. In addition, placing objects in the pharynx usually triggers a gag reflex that stops voicing and closes the larynx. Furthermore, stroboscopic imaging is useful only when the vocal fold vibratory pattern is closely periodic.
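
To make the frame-rate argument concrete, the following minimal sketch computes how many video frames fall within one vibratory cycle, and the apparent slow-motion rate seen under stroboscopic illumination. The function names and the example rates (a 30 frames-per-second camera, a 198.5 Hz flash rate) are illustrative assumptions rather than values from the sources, and the stroboscopy formula assumes perfectly periodic vibration.

```python
def frames_per_cycle(frame_rate_hz: float, f0_hz: float) -> float:
    """Number of video frames captured during one vocal fold vibration."""
    return frame_rate_hz / f0_hz


def strobe_apparent_hz(f0_hz: float, flash_rate_hz: float) -> float:
    """Apparent vibration rate when the folds are lit by periodic flashes.

    Assumes perfectly periodic vibration. The apparent rate is the distance
    from f0 to the nearest integer multiple of the flash rate.
    """
    nearest_multiple = round(f0_hz / flash_rate_hz) * flash_rate_hz
    return abs(f0_hz - nearest_multiple)


# Assumed example: an ordinary 30 frames-per-second camera.
slow_voice = frames_per_cycle(30.0, 80.0)    # fewer than one frame per cycle
fast_voice = frames_per_cycle(30.0, 300.0)   # even fewer

# Assumed example: flashes slightly slower than a 200 Hz voice.
apparent = strobe_apparent_hz(200.0, 198.5)
```

With fewer than one frame per cycle across the whole 80 to 300 Hz range, ordinary video cannot resolve the motion. The stroboscopic case works only because each flash samples a slightly later phase of a repeating cycle, which is why it depends on closely periodic vibration.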

The most important indirect methods are currently inverse filtering of either microphone or oral airflow recordings and electroglottography (EGG). In inverse filtering, the speech sound (the radiated acoustic pressure waveform, as obtained from a microphone) or the oral airflow waveform from a circumferentially vented (CV) mask is recorded outside the mouth and then filtered mathematically to remove the effects of the vocal tract[citation needed]. This method produces an estimate of the waveform of the glottal airflow pulses, which in turn reflects the movements of the vocal folds. The other noninvasive, indirect indicator of vocal fold motion is electroglottography, in which electrodes placed on either side of the subject's throat at the level of the vocal folds record changes in the conductivity of the throat according to how large a portion of the vocal folds is in contact. It thus yields one-dimensional information about the contact area. Neither inverse filtering nor EGG is sufficient to completely describe the complex three-dimensional pattern of vocal fold movement, but both can provide useful indirect evidence of that movement.
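
The idea of inverse filtering can be sketched with linear prediction, which is one common way to model the vocal tract as an all-pole filter and then cancel it. The text does not say which mathematical method is used, so this is an assumption. The sketch below uses only the Python standard library; the helper `all_pole_resonator` is a hypothetical toy stand-in for a vocal tract, and the sketch omits steps a real glottal inverse filter would need, such as lip-radiation compensation.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation values r[0..max_lag] of the sequence x."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc(x, order):
    """Linear prediction coefficients a[0..order], with a[0] = 1.

    Uses the Levinson-Durbin recursion on the autocorrelation of x.
    The all-pole model of the signal is 1 / A(z), A(z) = sum a[k] z^-k.
    """
    r = autocorrelation(x, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k                  # remaining prediction error
    return a


def inverse_filter(x, a):
    """Apply the FIR filter A(z) to x, cancelling the all-pole model."""
    return [sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0)
            for n in range(len(x))]


def all_pole_resonator(excitation, c1=1.3, c2=-0.6):
    """Hypothetical toy 'vocal tract': y[n] = e[n] + c1*y[n-1] + c2*y[n-2]."""
    y = []
    for n, e in enumerate(excitation):
        y1 = y[n - 1] if n >= 1 else 0.0
        y2 = y[n - 2] if n >= 2 else 0.0
        y.append(e + c1 * y1 + c2 * y2)
    return y
```

Passing a source signal through `all_pole_resonator`, estimating coefficients with `lpc(signal, 2)`, and applying `inverse_filter` returns an approximation of the original source, which is the role the glottal airflow estimate plays in the description above.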

Another way to conduct voice analysis is to examine voice characteristics such as phonation, pitch, loudness, and rate. These characteristics can be used to evaluate a person's voice and can aid the voice analysis process. Phonation is typically tested by examining different types of speech samples collected from a person, such as words with long vowels, words with many phonemes, or ordinary speech. A person's pitch can be evaluated by having the person produce the highest and lowest sounds they can, as well as sounds in between; a keyboard can be used to aid this process. Loudness is worth examining because, for some people, loudness affects the way they produce certain sounds: some people need to speak louder for certain phonemes than for others just to produce them. This can be tested by asking the person to maintain the same loudness while singing a scale. Rate is also important because it describes how fast or slow a person speaks.[1]
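
As an illustration of how pitch might be measured from a recording rather than by ear, the sketch below estimates the fundamental frequency of a short voiced segment by autocorrelation and maps it to the nearest key of an 88-key piano. This is a minimal sketch under stated assumptions, not a clinical procedure: the search range of 80 to 300 Hz, the function names, and the A4 = 440 Hz tuning reference are choices made for the example, and the segment must be longer than the longest lag searched.

```python
import math


def estimate_f0(samples, sample_rate, f_min=80.0, f_max=300.0):
    """Estimate fundamental frequency (Hz) by picking the autocorrelation peak.

    Only lags corresponding to f_min..f_max are searched. The input must
    contain more samples than sample_rate / f_min.
    """
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    n = len(samples)
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        score = sum(samples[i] * samples[i + lag] for i in range(n - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


def nearest_piano_key(f0_hz, a4_hz=440.0):
    """Nearest key number on an 88-key piano, where key 49 is A4."""
    return round(12.0 * math.log2(f0_hz / a4_hz)) + 49


# Assumed test signal: a pure 200 Hz tone sampled at 8 kHz.
fs = 8000
tone = [math.sin(2.0 * math.pi * 200.0 * n / fs) for n in range(2000)]
f0 = estimate_f0(tone, fs)
key = nearest_piano_key(f0)
```

Running the estimator on the highest and lowest sustained sounds a person produces would give a numerical pitch range to set beside the keyboard-based judgment. Real voices are less regular than a pure tone, so practical tools add voicing detection and smoothing that this sketch leaves out.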

Use in forensics

Forensic voice analysis is used in a broad range of criminal cases, including those involving murder, rape, drug dealing, bomb threats, and terrorism. A voice is often the sole clue available to police and forensic analysts in identifying criminals, which has made voice analysis one of the newest and fastest-developing branches of the long-established science of crime scene analysis. Voice identification is a particularly prominent component of voice analysis. It uses specially designed software that takes the recording in question, together with a recording of a known person's voice, and compares the two using a series of three tests.[2] Typically, a recording of at least seven seconds is required for the most accurate results. The software conducts a spectrographic analysis, followed by an average pitch analysis, and lastly a statistical analysis that draws on a compiled database of millions of voices. After the voice sample has been run through the program, it generates a percentage from 0 to 100 indicating the likelihood that the two voices are the same. The forensic analyst then compares the accent, syntax, and breathing patterns of the recordings, an analysis that the software is not yet able to make. This technique gained wide attention for its use in the Trayvon Martin case, where a recording of a call made to the police was analyzed to determine whether background screams came from George Zimmerman or from Martin.[3]
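
The first of the three tests, spectrographic analysis, rests on the short-time spectrum of the recording. The sketch below shows one simple way such a spectrogram could be computed, using only the Python standard library and a direct discrete Fourier transform. It is not the forensic software's method, which the sources do not describe; the frame length, hop size, Hann window, and function names are all assumptions made for the example.

```python
import cmath
import math


def hann(n):
    """Hann window of length n, used to reduce spectral leakage."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2 of one frame (direct, O(N^2) DFT)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]


def spectrogram(samples, frame_len=256, hop=128):
    """List of magnitude spectra, one per overlapping windowed frame."""
    window = hann(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(frame_len)]
        frames.append(magnitude_spectrum(frame))
    return frames


def peak_frequency(spectrum, sample_rate, frame_len):
    """Frequency (Hz) of the strongest bin in one spectrum."""
    k = max(range(len(spectrum)), key=spectrum.__getitem__)
    return k * sample_rate / frame_len
```

Each row of the result describes how energy is distributed over frequency during one short stretch of the recording, which is the representation that a questioned sample and a known sample would be compared on. A production tool would use a fast Fourier transform rather than the direct sum shown here.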

Forensic voice

Professionals such as phoneticians, speech-language pathologists, voice teachers, and musicians deal with different aspects of the voice. Experts in forensic voice analyse recordings by examining transmitted and stored speech, enhancing it, and decoding it for criminal investigations, court trials, and federal agencies. The first step is to make the speech in a recording comprehensible, since recordings or samples may come from surveillance, phone calls, 'bugs', or other technology that does not offer the best sound quality. Speech enhancement filters out noise that interferes with the speech or the speaker. These disruptions can come from the environment (wind and other rustling noise), from another speaker or other voices, from a sudden sound such as a car horn, or from the equipment and technology used to record. The adjustments are made on a digital copy so as not to alter the original recording. A spectrogram or waveform display is useful for visualizing areas of concern. The second step is speech decoding, for which a background in language, such as that of a phonetician, is necessary. Anyone who works with voice will be able to analyse the characteristics of a voice, for example accents and dialects, voice disguise, stress in speech, tone of voice, multiple speakers, and a voice affected by intoxication, fatigue, or poor health. Reports must contain detailed information: if a section of the recording is incomprehensible or inaudible, the report needs to explain what was happening in the recording and what is missing.[4]
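
The principle of enhancing a copy while leaving the original untouched can be sketched with a very simple filter. The example below applies a first-order high-pass filter, which attenuates low-frequency rumble of the kind wind can produce, and returns a new list rather than modifying its input. The 100 Hz cutoff and the function names are assumptions for illustration; actual forensic enhancement uses far more capable methods that the sources do not specify.

```python
import math


def high_pass_copy(samples, sample_rate, cutoff_hz=100.0):
    """Return a high-pass filtered copy; the input sequence is not modified.

    First-order filter: y[n] = alpha * (y[n-1] + x[n] - x[n-1]).
    """
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = list(samples)                 # work on a copy of the recording
    for n in range(1, len(out)):
        out[n] = alpha * (out[n - 1] + samples[n] - samples[n - 1])
    return out


def energy(samples):
    """Sum of squared sample values."""
    return sum(v * v for v in samples)


# Assumed test signals at 8 kHz: 20 Hz rumble and a 1000 Hz tone.
fs = 8000
rumble = [math.sin(2.0 * math.pi * 20.0 * n / fs) for n in range(fs)]
tone = [math.sin(2.0 * math.pi * 1000.0 * n / fs) for n in range(fs)]
cleaned_rumble = high_pass_copy(rumble, fs)
cleaned_tone = high_pass_copy(tone, fs)
```

The rumble loses most of its energy while the 1000 Hz tone passes nearly unchanged, and `rumble` itself still holds the unfiltered samples, mirroring the requirement that the original recording remain unaltered.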

Speaker identification

Voice analysis has a role in speaker identification, which is needed when the identity of a speaker is unknown and has to be established from an array of other voices or suspects in a criminal investigation or court trial.[4] Correct identification of speakers and voices, particularly in criminal cases, depends on a number of factors, such as familiarity, exposure, delay, tone of voice, voice disguise, and accent. Familiarity with a speaker increases the chances of correctly identifying and distinguishing a voice. The amount of exposure to a voice also aids correct identification, even when the voice is unfamiliar. A listener who heard a longer utterance or was exposed to a voice more often is better at recognizing it than someone who heard only a single word. A delay between hearing a voice and identifying the speaker decreases the likelihood of identifying the correct speaker. The tone of voice also affects the ability to identify the right speaker: if the tone does not match that of the speaker at the time of comparison, the voice is more difficult to analyse. Disguise of the voice, for example when a speaker is whispering, also hinders the ability to accurately match and identify the speaker. In some cases, individuals who speak the same language as the speaker whose voice is being analysed have an easier time identifying them because of the accent and stress of the voice.[5]

References

  1. Hapner, Edie; Stemple, Joseph (2014). Voice Therapy: Clinical Case Studies. Plural Publishing.
  2. "Voice analysis - an overview - ScienceDirect Topics". www.sciencedirect.com.
  3. "How Audio Forensics Reveals Voices' Secrets".
  4. Harnsberger, James D.; Bahr, Ruth Huntley; Hollien, Harry (2014-03-01). "Issues in Forensic Voice". Journal of Voice. 28 (2): 170–184. doi:10.1016/j.jvoice.2013.06.011. ISSN 0892-1997.
  5. Solan, Lawrence M.; Tiersma, Peter M. (2005). Speaking of Crime. Chicago: The University of Chicago Press. pp. 127–136. ISBN 0226767930.
