Speech-to-text reporter

From Wikipedia, the free encyclopedia

A speech-to-text reporter (STTR), also known as a captioner, is a person who listens to what is being said and inputs it word for word (verbatim) as properly written text. Many captioners use tools, such as a shorthand keyboard, speech recognition software, or a computer-aided transcription (CAT) software system, that convert spoken information into written text.[1] The resulting text can then be read by deaf or hard-of-hearing people, language learners, or people with auditory processing disabilities.


STTRs often start their careers as court reporters, utilizing their skills to capture proceedings and provide transcripts upon request. The expertise acquired in court reporting has made them crucial in providing communication access for deaf or late-deafened individuals, especially through Communication Access Realtime Translation (CART).[citation needed]


Real-time captioning encompasses stenographic, voice writing, and automatic speech recognition methods. Trained and experienced real-time writers, whether using stenography or voice writing, can achieve accuracy rates exceeding 98% at speeds of up to 300 words per minute. An STTR typically aims for a consistent accuracy level of 98.5% or higher.[citation needed]
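To put these figures in perspective, the short sketch below converts an accuracy rate and a speaking speed into an expected number of errors per minute. It assumes that accuracy is measured per word, which the figures above do not state explicitly, and the function name is purely illustrative.

```python
def expected_errors_per_minute(words_per_minute, accuracy):
    """Expected number of wrong words per minute.

    Assumption: `accuracy` is the fraction of words captured correctly
    (for example 0.985 for 98.5%).
    """
    return words_per_minute * (1.0 - accuracy)


# At the top speed quoted above (300 words per minute):
at_98_percent = expected_errors_per_minute(300, 0.98)
at_98_5_percent = expected_errors_per_minute(300, 0.985)

print(round(at_98_percent, 1), round(at_98_5_percent, 1))
```

Under that assumption, 98% accuracy at 300 words per minute corresponds to roughly six errors a minute, and 98.5% to between four and five.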

Voice writing

Voice writers echo spoken language into a stenomask or voice silencer, a hand-held mask equipped with microphones and voice-dampening materials. The mask connects to an external sound digitizer and a laptop running both speech recognition and CAT software.[citation needed] The words spoken by a voice writer are channeled through the mask and converted by the computer's speech recognition engine into streaming text, which can then be delivered in various formats, including internet streaming, subtitling, or direct display to end users.


Palantype and stenotype

Two major chorded keyboards used in speech-to-text reporting are the Palantype and stenotype systems. While both systems are used in the UK, the US predominantly uses the 23-key computerized stenotype machine. STTRs may also be called palantypists or stenographers. Instead of pressing each letter individually, as on a QWERTY keyboard, operators of these systems press multiple keys simultaneously in a chord, or "stroke", that represents a syllable, word, or phrase.
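The idea of a chord can be sketched in a few lines of code. The example below assumes the conventional American stenotype key order (a left-hand bank, a vowel row, and a right-hand bank) and omits the number bar; the function name and the hyphen convention for strokes without vowels are illustrative choices, not a description of any particular machine's firmware.

```python
# Assumed key order of an American stenotype, bank by bank.
LEFT = "STKPWHR"
VOWELS = "AO*EU"
RIGHT = "FRPBLGTSDZ"


def render_stroke(left, vowels, right):
    """Render one chord as text, independent of the order keys were pressed.

    Each argument is a set of key letters pressed in that bank.
    """
    left_part = "".join(k for k in LEFT if k in left)
    vowel_part = "".join(k for k in VOWELS if k in vowels)
    right_part = "".join(k for k in RIGHT if k in right)
    # With no vowel between the banks, a hyphen marks where the right bank starts.
    if not vowel_part and right_part:
        return left_part + "-" + right_part
    return left_part + vowel_part + right_part


# Three keys pressed together form a single stroke.
print(render_stroke({"K"}, {"A"}, {"T"}))
# Sets are unordered, so the stroke is the same however the fingers land.
print(render_stroke({"T", "S"}, set(), {"T"}))
```

Because a stroke is a set of keys rather than a sequence, one press can carry a whole syllable, which is what makes the speeds quoted for real-time writers feasible.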


Stenographers utilize specialized software to convert phonetic strokes from their keyboards into English text. This software employs a context-specific vocabulary and algorithms to match syllable clusters to written forms. Errors may arise from STTRs mishearing words or from ambiguities in the statement that are only clarified by subsequent context.[citation needed]
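A minimal sketch of this kind of translation is a dictionary lookup that prefers the longest matching run of strokes. The dictionary entries, the `translate` function, and the `max_len` parameter below are hypothetical and chosen only for illustration; real CAT software uses far larger, user-specific dictionaries and more elaborate matching.

```python
# Hypothetical stroke dictionary: tuples of strokes map to written forms.
STENO_DICT = {
    ("-T",): "the",
    ("KAT",): "cat",
    ("HROG",): "log",
    ("KAT", "HROG"): "catalog",
}


def translate(strokes, dictionary=STENO_DICT, max_len=2):
    """Translate a list of strokes, trying the longest entry first."""
    words = []
    i = 0
    while i < len(strokes):
        for n in range(min(max_len, len(strokes) - i), 0, -1):
            key = tuple(strokes[i:i + n])
            if key in dictionary:
                words.append(dictionary[key])
                i += n
                break
        else:
            # No entry found: pass the raw stroke through untranslated.
            words.append(strokes[i])
            i += 1
    return " ".join(words)


print(translate(["-T", "KAT"]))
print(translate(["-T", "KAT", "HROG"]))
```

The second call shows why later context matters: the stroke KAT alone would be written as "cat", but once HROG follows it the pair is better read as "catalog", so a translation made too early has to be revised.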

What will a service user see on the screen?

Every word that is spoken will appear on the screen in an accessible format, and the user can request changes to the color and font size. In addition to the spoken words, the label "NEW SPEAKER:" or ">>" will typically appear to mark a change of speaker. If the names of the people attending the conference or meeting are sent to the STTR (voice writer, palantypist, or stenographer) before the event, they can be programmed into the computer, making it easier for the reader to recognize who is speaking. Other phrases in brackets, such as {laughter} or {applause}, may also appear to denote relevant environmental sounds.[citation needed]
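As a rough illustration of these conventions, the sketch below formats a sequence of caption events. The event structure, the `format_captions` function, and the speaker identifiers are assumptions made for the example and do not describe any specific captioning product.

```python
def format_captions(events, names=None):
    """Format (kind, speaker, text) events into display lines.

    kind is "speech" or "sound"; `names` optionally maps speaker ids to
    names supplied before the event.
    """
    names = names or {}
    lines = []
    last_speaker = None
    for kind, speaker, text in events:
        if kind == "sound":
            # Environmental sounds are shown in brackets.
            lines.append("{" + text + "}")
        elif speaker != last_speaker:
            # Label a change of speaker, by name when one was supplied.
            label = names.get(speaker, "NEW SPEAKER")
            lines.append(f"{label}: {text}")
            last_speaker = speaker
        else:
            lines.append(text)
    return lines


events = [
    ("speech", "s1", "Good morning."),
    ("speech", "s1", "Welcome, everyone."),
    ("sound", None, "applause"),
    ("speech", "s2", "Thank you."),
]
for line in format_captions(events, names={"s1": "CHAIR"}):
    print(line)
```

Here the first speaker is labeled by the name supplied in advance, while the second, whose name was not supplied, falls back to the generic "NEW SPEAKER:" label.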

Occasional mondegreen errors may be seen in closed captions when the computer software fails to determine where a word break occurs in the syllable stream. For example, a news report of a "grand parade" might be captioned as a "grandpa raid". Mondegreens in this context arise from the need for captions to keep up with the fast pace of live communication.[citation needed]
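The word-break ambiguity behind such errors can be reproduced with a toy example. The lexicon and syllable spellings below are invented for illustration, and the exhaustive search is only a sketch of the problem, not of how captioning software actually resolves it.

```python
# Hypothetical lexicon: tuples of syllables map to written words.
LEXICON = {
    ("grand",): "grand",
    ("grand", "pa"): "grandpa",
    ("pa", "rade"): "parade",
    ("rade",): "raid",
}


def segmentations(syllables):
    """Return every way to split the syllable stream into lexicon words."""
    if not syllables:
        return [[]]
    results = []
    for n in range(1, len(syllables) + 1):
        word = LEXICON.get(tuple(syllables[:n]))
        if word is not None:
            for rest in segmentations(syllables[n:]):
                results.append([word] + rest)
    return results


for words in segmentations(["grand", "pa", "rade"]):
    print(" ".join(words))
```

Both readings are valid splits of the same three syllables, so software that must commit to one immediately, without waiting for more context, can pick the wrong one.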


Becoming an STTR requires rigorous training. For Palantype or stenography, this involves two years of formal training on the relevant hardware and software, followed by another two years of on-the-job experience focusing on speed, vocabulary, accuracy, and context handling. Voice writing training has a similar structure but is slightly shorter. Only after this training are candidates eligible to take US or UK certification exams. Several levels of certification exist, with bodies such as the National Court Reporters Association (NCRA) and the National Verbatim Reporters Association (NVRA) offering specific certifications that attest to a professional's proficiency and skill level.

See also


  1. "Closed Captioning Web". Captions.org. 2006-02-13. Retrieved 2009-06-11.

External links