Imagined speech

From Wikipedia, the free encyclopedia

Imagined speech (also called silent speech or covert speech) is thinking in the form of sound: “hearing” one’s own voice silently, without intentionally moving any extremities such as the lips, tongue, or hands.[1] Logically, imagined speech has been possible since the emergence of language; however, the phenomenon is most closely associated with signal processing[2] and with its detection in electroencephalography (EEG) data,[3] as well as in data obtained using other non-invasive brain–computer interface (BCI) devices.[4]


In 2008, the US Defense Advanced Research Projects Agency (DARPA) provided a $4 million grant to the University of California, Irvine, intended to lay a foundation for synthetic telepathy. According to DARPA, the project “will allow user-to-user communication on the battlefield without the use of vocalized speech through neural signals analysis. The brain generates word-specific signals prior to sending electrical impulses to the vocal cords. These imagined speech signals would be analyzed and translated into distinct words allowing covert person-to-person communication.”[4]

DARPA's program outline has three major goals:[4]

  • To attempt to identify EEG patterns unique to individual words
  • To ensure these patterns are common to different users to avoid extensive device training
  • To construct a prototype that would decode the signals and transmit them over a limited range

Methods for detection

Analyzing a subject's silent speech involves recording the subject's brain waves and then using a computer to process the data and determine the content of the covert speech.


A subject's neural patterns (brain waves) can be recorded using BCI devices;[2] currently, non-invasive devices,[1] specifically EEG, are of greater interest to researchers than invasive and partially invasive types, because non-invasive types pose the least risk to the subject's health.[4] EEG has attracted the greatest interest because it offers the most user-friendly approach and requires far less complex instrumentation than functional magnetic resonance imaging (fMRI),[4] another commonly used non-invasive BCI.[2]


The first step in processing non-invasive data is to remove artifacts, such as those caused by eye movement and blinking, as well as other electromyographic activity.[3] After artifact removal, a series of algorithms is used to translate the raw data into the imagined speech content.[1] Processing is also intended to occur in real time: the information is processed as it is recorded, which allows for near-simultaneous viewing of the content as the subject imagines it.
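As an illustration of the artifact-removal step, the sketch below rejects epochs by peak-to-peak amplitude. This is one simple, commonly used approach and is not necessarily the method used in the cited studies. The function names, the sample values, and the 100 µV threshold are assumptions chosen for the example.

```python
from typing import List, Sequence


def peak_to_peak(epoch: Sequence[float]) -> float:
    """Return the peak-to-peak amplitude (maximum minus minimum) of one epoch."""
    return max(epoch) - min(epoch)


def reject_artifacts(epochs: Sequence[Sequence[float]],
                     threshold_uv: float = 100.0) -> List[List[float]]:
    """Keep only epochs whose peak-to-peak amplitude does not exceed the threshold.

    The 100 microvolt default is an illustrative assumption, not a recommended value.
    """
    return [list(epoch) for epoch in epochs if peak_to_peak(epoch) <= threshold_uv]


# Hypothetical single-channel epochs, in microvolts.
epochs = [
    [2.0, -3.0, 4.0, -1.0],     # ordinary background activity
    [5.0, 180.0, -40.0, 3.0],   # large deflection, as a blink might produce
    [-6.0, 1.0, 7.0, -2.0],
]

clean = reject_artifacts(epochs)
print(f"kept {len(clean)} of {len(epochs)} epochs")  # kept 2 of 3 epochs
```

Thresholding discards whole epochs rather than correcting them, which is easy to reason about but costs data; approaches that subtract the artifact instead are more involved and are not shown here.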


Presumably, “thinking in the form of sound” recruits auditory and language areas whose activation profiles may be extracted from the EEG, given adequate processing. The goal is to relate these signals to a template that represents “what the person is thinking about”. This template could, for instance, be the acoustic envelope (energy) time series that the sound would have if it were physically uttered. Such a mapping from EEG to stimulus is an example of neural decoding.
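The idea of an envelope template can be sketched as follows. The code computes a crude acoustic envelope (rectification followed by a moving average) and scores a candidate reconstruction against it with Pearson correlation. No EEG decoding is performed: the `decoded` series is a hypothetical stand-in for a decoder's output, and the synthetic signal, window length, and function names are all assumptions made for the example.

```python
import math
from typing import List, Sequence


def envelope(signal: Sequence[float], window: int) -> List[float]:
    """Rectify the signal and smooth it with a trailing moving average."""
    rectified = [abs(x) for x in signal]
    smoothed = []
    for i in range(len(rectified)):
        segment = rectified[max(0, i - window + 1):i + 1]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation between two equal-length, non-constant series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Synthetic "speech": a fast carrier whose amplitude varies slowly.
speech = [(1 + math.cos(2 * math.pi * t / 50)) * math.sin(2 * math.pi * t / 5)
          for t in range(200)]
template = envelope(speech, window=5)

# Hypothetical decoder output: a scaled, shifted copy of the template.
decoded = [0.5 * v + 0.1 for v in template]

print(round(pearson(template, decoded), 3))
```

Correlation ignores scale and offset, which is why it is a convenient score here: a reconstruction only has to follow the shape of the envelope over time, not its absolute amplitude.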

A major problem, however, is that the very same message can vary widely under diverse physical conditions (different speakers or noise, for example). Hence the same EEG signal may be observed, but it is uncertain, at least in acoustic terms, which stimulus to map it to. This in turn makes it difficult to train the relevant decoder.

This process could instead be approached using higher-order (‘linguistic’) representations of the message. The mappings to such representations are non-linear and can be heavily context-dependent, so further research may be necessary. Nevertheless, an 'acoustic' strategy can still be maintained by pre-setting a “template”, that is, by letting the listener know exactly which message to think about, even if passively and in a non-explicit form. Under these circumstances it is possible to partially decode the acoustic envelope of the speech message from neural time series if the listener is induced to think in the form of sound.[5]


In the detection of other imagined actions, such as imagined physical movements, greater brain activity occurs in one hemisphere than in the other. This asymmetrical activity is a major aid in identifying the subject's imagined action. In imagined speech detection, however, equal levels of activity commonly occur in both the left and right hemispheres simultaneously. This lack of lateralization poses a significant challenge in analyzing neural signals of this type.[2]
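One common way to quantify such asymmetry is a lateralization index, (L − R) / (L + R), computed from some measure of left- and right-hemisphere activity. The sketch below uses made-up band-power values purely for illustration; the function name and the numbers are assumptions, not data from the cited work.

```python
def lateralization_index(left_power: float, right_power: float) -> float:
    """Return (L - R) / (L + R): +1 is fully left, -1 fully right, 0 symmetric."""
    total = left_power + right_power
    if total <= 0:
        raise ValueError("total power must be positive")
    return (left_power - right_power) / total


# Hypothetical hemisphere powers (arbitrary units).
imagined_movement = lateralization_index(8.0, 2.0)  # clearly asymmetric
imagined_speech = lateralization_index(5.0, 5.0)    # no asymmetry to exploit

print(round(imagined_movement, 2), round(imagined_speech, 2))  # 0.6 0.0
```

An index near zero, as in the second case, gives a classifier no hemispheric cue to work with, which is the difficulty described above.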

Another challenge is the relatively low signal-to-noise ratio (SNR) of the recorded data. The SNR compares the amount of meaningful signal in a data set with the amount of arbitrary or useless signal (noise) in the same set. Artifacts present in EEG data are just one of many significant sources of noise.[1]
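As a worked example, SNR is often expressed in decibels as ten times the base-10 logarithm of the ratio of signal power to noise power. The sketch below assumes that the signal and noise components are available separately, which is rarely true of real EEG recordings; the sample values and the function names are illustrative assumptions.

```python
import math
from typing import Sequence


def mean_power(samples: Sequence[float]) -> float:
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in samples) / len(samples)


def snr_db(signal: Sequence[float], noise: Sequence[float]) -> float:
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    return 10.0 * math.log10(mean_power(signal) / mean_power(noise))


# Hypothetical components: the noise here is larger than the signal of interest.
signal = [1.0, -1.0, 1.0, -1.0]
noise = [2.0, -2.0, 2.0, -2.0]

print(round(snr_db(signal, noise), 2))  # -6.02
```

A negative value in decibels means the noise power exceeds the signal power, which is the kind of situation the low-SNR problem refers to.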

To further complicate matters, the relative placement of EEG electrodes varies among subjects because the anatomical details of people's heads differ. The recorded signals will therefore vary from subject to subject, regardless of individual-specific imagined speech characteristics.[3]

Limitations for practical communication

Foremost, EEG use requires meticulously securing electrodes to a subject’s head; the electrodes are connected through a web of wires tethered to a computer. Creating an everyday, user-friendly communicator therefore requires further development to compact EEG devices and their signal processors into an easy-to-use, lightweight, and fashionable device (e.g. a headband with Wi-Fi or Bluetooth).

In addition, current detection methods cannot distinguish between more than two signals (e.g. /ba/ or /ku/, yes or no). A significant advancement in EEG processing algorithms is therefore still required. This may suggest that human information-processing patterns must first be better understood, as this would offer insight into classifying word-specific neural patterns common to all people.
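To make the two-signal case concrete, the sketch below shows a minimal nearest-centroid classifier that assigns a feature vector to either /ba/ or /ku/. It is a generic illustration of binary classification, not the algorithm used in the cited studies; the two-dimensional feature vectors are invented, and how such features would be extracted from EEG is left out entirely.

```python
import math
from typing import Dict, List, Sequence


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average a set of equal-length feature vectors component by component."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def classify(features: Sequence[float], centroids: Dict[str, List[float]]) -> str:
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))


# Hypothetical training features for each imagined syllable.
training = {
    "/ba/": [[1.0, 0.2], [0.8, 0.1], [1.2, 0.3]],
    "/ku/": [[0.1, 1.0], [0.3, 0.9], [0.2, 1.1]],
}
centroids = {label: centroid(vectors) for label, vectors in training.items()}

print(classify([0.9, 0.25], centroids))  # /ba/
```

With clean, well-separated features two classes are easy to tell apart; the practical difficulty lies in obtaining features that stay separable as the vocabulary grows and as subjects change.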

References


  1. Brigham, K.; Vijaya Kumar, B.V.K. "Imagined Speech Classification with EEG Signals for Silent Communication: A Preliminary Investigation into Synthetic Telepathy". June 2010.
  2. Brigham, K.; Vijaya Kumar, B.V.K. "Subject Identification from Electroencephalogram (EEG) Signals During Imagined Speech". September 2010.
  3. Porbadnigk, A.; Wester, M.; Schultz, T. "EEG-Based Speech Recognition: Impact of Temporal Effects". 2009.
  4. Bogue, R. "Brain-computer interfaces: control by thought". Industrial Robot: An International Journal, Vol. 37, Iss. 2, pp. 126–132, 2010.
  5. Cervantes Constantino, F.; Simon, J.Z. "Restoration and Efficiency of the Neural Processing of Continuous Speech Are Promoted by Prior Knowledge". Frontiers in Systems Neuroscience, 12 (56), 2018. doi:10.3389/fnsys.2018.00056.