Silent speech interface

Silent speech interface is a device that allows speech communication without using the sound made when people vocalize their speech sounds. As such it is a type of electronic lip reading. It works by the computer identifying the phonemes that an individual pronounces from nonauditory sources of information about their speech movements. These are then used to recreate the speech using speech synthesis.^[1]

Information sources

Silent speech interface systems have been created using ultrasound and optical camera input of tongue and lip movements.^[2] Electromagnetic devices are another technique for tracking tongue and lip movements.^[3] The detection of speech movements by electromyography of speech articulator muscles and the larynx is another technique.^[4]^[5] Another source of information is the vocal tract resonance signals that get transmitted through bone conduction called non-audible murmurs.^[6] They have also been created as a brain–computer interface using brain activity in the motor cortex obtained from intracortical microelectrodes.^[7]

Uses

Such devices are created as aids to those unable to create the sound phonation needed for audible speech such as after laryngectomies.^[8] Another use is for communication when speech is masked by background noise or distorted by self-contained breathing apparatus. A further practical use is where a need exists for silent communication, such as when privacy is required in a public place, or hands-free data silent transmission is needed during a military or security operation.^[2]^[9]

In 2002, the Japanese company NTT DoCoMo announced it had created a silent mobile phone using electromyography and imaging of lip movement. "The spur to developing such a phone," the company said, "was ridding public places of noise," adding that, "the technology is also expected to help people who have permanently lost their voice."^[10] The feasibility of using silent speech interfaces for practical communication has since then been shown.^[11]

In fiction

The decoding of silent speech using a computer played an important role in Arthur C. Clarke's story and Stanley Kubrick's associated film 2001: A Space Odyssey (film). In this, HAL 9000, a computer controlling spaceship Discovery One, bound for Jupiter, discovers a plot to deactivate it by the mission astronauts Dave Bowman and Frank Poole through lip reading their conversations.^[12]

In Orson Scott Card’s series (including Ender’s Game), the artificial intelligence can be spoken to while the protagonist wears a movement sensor in his jaw, enabling him to converse with the AI without making noise. He also wears an ear implant.

References

^ Denby B, Schultz T, Honda K, Hueber T, Gilbert J.M., Brumberg J.S. (2010). Silent speech interfaces. Speech Communication 52: 270–287. doi:10.1016/j.specom.2009.08.002
^ ^a ^b Hueber T, Benaroya E-L, Chollet G, Denby B, Dreyfus G, Stone M. (2010). Development of a silent speech interface driven by ultrasound and optical images of the tongue and lips. Speech Communication, 52 288–300. doi:10.1016/j.specom.2009.11.004
^ Wang, J., Samal, A., & Green, J. R. (2014). Preliminary test of a real-time, interactive silent speech interface based on electromagnetic articulograph, the 5th ACL/ISCA Workshop on Speech and Language Processing for Assistive Technologies, Baltimore, MD, 38-45.
^ Jorgensen C, Dusan S. (2010). Speech interfaces based upon surface electromyography. Speech Communication, 52: 354–366. doi:10.1016/j.specom.2009.11.003
^ Schultz T, Wand M. (2010). Modeling Coarticulation in EMG-based Continuous Speech Recognition. Speech Communication, 52: 341-353. doi:10.1016/j.specom.2009.12.002
^ Hirahara T, Otani M, Shimizu S, Toda T, Nakamura K, Nakajima Y, Shikano K. (2010). Silent-speech enhancement using body-conducted vocal-tract resonance signals. Speech Communication, 52:301–313. doi:10.1016/j.specom.2009.12.001
^ Brumberg J.S., Nieto-Castanon A, Kennedy P.R., Guenther F.H. (2010). Brain–computer interfaces for speech communication. Speech Communication 52:367–379. 2010 doi:10.1016/j.specom.2010.01.001
^ Deng Y., Patel R., Heaton J. T., Colby G., Gilmore L. D., Cabrera J., Roy S. H., De Luca C.J., Meltzner G. S.(2009). Disordered speech recognition using acoustic and sEMG signals. In INTERSPEECH-2009, 644-647.
^ Deng Y., Colby G., Heaton J. T., and Meltzner HG. S. (2012). Signal Processing Advances for the MUTE sEMG-Based Silent Speech Recognition System. Military Communication Conference, MILCOM 2012.
^ Fitzpatrick M. (2002). Lip-reading cellphone silences loudmouths. New Scientist.
^ Wand M, Schultz T. (2011). Session-independent EMG-based Speech Recognition. Proceedings of the 4th International Conference on Bio-inspired Systems and Signal Processing.
^ Clarke, Arthur C. (1972). The Lost Worlds of 2001. London: Sidgwick and Jackson. ISBN 0-283-97903-8.

[1] Denby B, Schultz T, Honda K, Hueber T, Gilbert J.M., Brumberg J.S. (2010). Silent speech interfaces. Speech Communication 52: 270–287. doi:10.1016/j.specom.2009.08.002

[Hueber-2] Hueber T, Benaroya E-L, Chollet G, Denby B, Dreyfus G, Stone M. (2010). Development of a silent speech interface driven by ultrasound and optical images of the tongue and lips. Speech Communication, 52 288–300. doi:10.1016/j.specom.2009.11.004

[3] Wang, J., Samal, A., & Green, J. R. (2014). Preliminary test of a real-time, interactive silent speech interface based on electromagnetic articulograph, the 5th ACL/ISCA Workshop on Speech and Language Processing for Assistive Technologies, Baltimore, MD, 38-45.

[4] Jorgensen C, Dusan S. (2010). Speech interfaces based upon surface electromyography. Speech Communication, 52: 354–366. doi:10.1016/j.specom.2009.11.003

[5] Schultz T, Wand M. (2010). Modeling Coarticulation in EMG-based Continuous Speech Recognition. Speech Communication, 52: 341-353. doi:10.1016/j.specom.2009.12.002

[6] Hirahara T, Otani M, Shimizu S, Toda T, Nakamura K, Nakajima Y, Shikano K. (2010). Silent-speech enhancement using body-conducted vocal-tract resonance signals. Speech Communication, 52:301–313. doi:10.1016/j.specom.2009.12.001

[7] Brumberg J.S., Nieto-Castanon A, Kennedy P.R., Guenther F.H. (2010). Brain–computer interfaces for speech communication. Speech Communication 52:367–379. 2010 doi:10.1016/j.specom.2010.01.001

[Deng-8] Deng Y., Patel R., Heaton J. T., Colby G., Gilmore L. D., Cabrera J., Roy S. H., De Luca C.J., Meltzner G. S.(2009). Disordered speech recognition using acoustic and sEMG signals. In INTERSPEECH-2009, 644-647.

[Deng2-9] Deng Y., Colby G., Heaton J. T., and Meltzner HG. S. (2012). Signal Processing Advances for the MUTE sEMG-Based Silent Speech Recognition System. Military Communication Conference, MILCOM 2012.

[10] Fitzpatrick M. (2002). Lip-reading cellphone silences loudmouths. New Scientist.

[11] Wand M, Schultz T. (2011). Session-independent EMG-based Speech Recognition. Proceedings of the 4th International Conference on Bio-inspired Systems and Signal Processing.

[12] Clarke, Arthur C. (1972). The Lost Worlds of 2001. London: Sidgwick and Jackson. ISBN 0-283-97903-8.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

Information sources

Uses

In fiction

See also

References