Speech-to-text reporter


A speech-to-text reporter (STTR), also known as a captioner, is a person who listens to what is being said and inputs it, word for word (verbatim), using an electronic shorthand keyboard, speech recognition software, or a CAT (Computer Aided Transcription) software system. Their keyboard or speech recognition software is linked to a computer, which converts this information to properly spelled words.[1] The reproduced text can then be read by deaf or hard-of-hearing people, English language learners, or persons with auditory processing disabilities.


Many STTRs began their careers as court reporters. In the courts, the system is used to capture proceedings and provide transcripts when requested. The skills developed there have also made STTRs invaluable in providing communication access for deaf or late-deafened people: they are accustomed to producing work with an extremely high degree of accuracy, and through Communication Access Realtime Translation (CART) they give deaf consumers of captioning individual autonomy.[citation needed]


Real-time captioning methods include stenography, voice writing, and automatic speech recognition. Trained, highly skilled and experienced real-time writers, whether using stenographic or voice writing methods, achieve accuracy rates above 98 percent at speeds commonly as high as 300 words per minute. An STTR expects to reach consistent accuracy levels of 98.5% and above.[citation needed]
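To put these figures in perspective, the arithmetic below estimates how many word errors per minute a given accuracy rate implies. It is a rough sketch that assumes accuracy is simply the percentage of words rendered correctly, which the figures above do not specify; the function name `errors_per_minute` is illustrative only.

```python
def errors_per_minute(words_per_minute: float, accuracy_percent: float) -> float:
    """Expected number of incorrect words per minute.

    Assumes accuracy is the percentage of words rendered correctly
    (an assumption; the article does not define how accuracy is measured).
    """
    if not 0 <= accuracy_percent <= 100:
        raise ValueError("accuracy_percent must be between 0 and 100")
    return words_per_minute * (100 - accuracy_percent) / 100


if __name__ == "__main__":
    # Compare the two accuracy levels quoted in the text at 300 words per minute.
    for accuracy in (98.0, 98.5):
        print(accuracy, errors_per_minute(300, accuracy))
```

Under that assumption, 98 percent accuracy at 300 words per minute corresponds to about six word errors per minute, and 98.5 percent to between four and five.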

Voice writers produce many of the same products as their stenotype/stenographic colleagues, including transcripts in electronic and printed formats. Stenographic real-time verbatim reporters connect their stenography machines to laptops and real-time viewer programs, and can provide attorneys or other clients with a real-time display of the text of proceedings, as well as computer files immediately upon completion of the sessions.[citation needed]

Voice writing

Speech-to-text voice writer reporters are similar to early mask writers who would respeak spoken language into a sound-canceling cone. Voice writing is a method used for court reporting and by some medical transcriptionists. Using the voice writing method, a court reporter speaks directly into a stenomask or voice silencer—a hand-held mask containing one or two microphones and voice-dampening materials. A voice writing system consists of a stenomask, an external sound digitizer, a laptop, speech recognition software, and CAT software.[citation needed]

A voice writer's words go through the mask's cable to an external USB digital signal processor, and then into the computer's speech recognition engine, for conversion into streaming text. Real-time reporters can send the streamed text to a) the Internet; b) a computer file; c) a television station for subtitling; d) an end-user who is reading the captions on a laptop, tablet, or smartphone; or e) software that formats the results in the way most familiar to judges, attorneys, or subtitling consumers.[citation needed]


Palantype and stenotype

There are two types of chorded keyboard widely used in speech-to-text reporting:

  1. the Palantype system;
  2. the stenotype system.

Both are used in the UK; in the US the predominant keyboard is the 23-key computerized stenotype machine used by stenographers. Hence STTRs are also sometimes referred to as palantypists and stenographers. Unlike on a QWERTY keyboard, the letters of a word are not pressed one at a time; several keys are pressed together, like a chord on a piano keyboard. Each such "stroke" represents a whole syllable or word, or a shortform called a "brief" that stands for a long word or phrase.[citation needed]


Specially designed computer software then converts the phonetic strokes from a chorded keyboard back into English, which can then be displayed for someone to read. The software can use a pre-programmed vocabulary specific to the context and information that matches syllable clusters to written forms, and it may suggest alternative captions from which the STTR chooses. Errors arise when the STTR mishears words, and when the STTR has to make a decision before an ambiguous statement is made clear by the context of subsequent words.[citation needed]
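The stroke-to-text conversion described above can be pictured as a dictionary lookup. The following is a minimal sketch, not the behaviour of any particular CAT product: the stroke spellings, the `STROKE_DICT` entries and the `translate` function are all hypothetical and do not follow a real stenotype or Palantype theory. It assumes a greedy rule that prefers the longest multi-stroke entry, which is one plausible way to let a "brief" or multi-stroke word take priority over single-stroke matches.

```python
from typing import Dict, List, Tuple

# Hypothetical stroke dictionary: a sequence of strokes -> written form.
# Entries are illustrative only and do not follow any real steno theory.
STROKE_DICT: Dict[Tuple[str, ...], str] = {
    ("THE",): "the",
    ("KAT",): "cat",
    ("SAT",): "sat",
    ("LOG",): "log",
    ("KAT", "LOG"): "catalogue",  # multi-stroke entry
}

# Longest entry in the dictionary, measured in strokes.
MAX_ENTRY_LEN = max(len(key) for key in STROKE_DICT)


def translate(strokes: List[str]) -> str:
    """Translate strokes to text, preferring the longest matching entry."""
    words: List[str] = []
    i = 0
    while i < len(strokes):
        # Try the longest candidate first, then progressively shorter ones.
        for n in range(min(MAX_ENTRY_LEN, len(strokes) - i), 0, -1):
            key = tuple(strokes[i:i + n])
            if key in STROKE_DICT:
                words.append(STROKE_DICT[key])
                i += n
                break
        else:
            # No entry matched: show the raw stroke so the gap is visible.
            words.append("[" + strokes[i] + "]")
            i += 1
    return " ".join(words)


if __name__ == "__main__":
    print(translate(["THE", "KAT", "SAT"]))
    print(translate(["KAT", "LOG"]))
```

In this sketch an unmatched stroke is shown in square brackets rather than guessed at, which mirrors the idea that a context-specific vocabulary has to be loaded in advance for unusual terms to come out correctly.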

What will a service user see on the screen?

Every word that is spoken will appear on the screen in an accessible format, and one can request a change in the color and font size. In addition to the words spoken, "NEW SPEAKER:" or ">>" will typically appear to mark a change of speaker. If the names of the people attending the conference or meeting are sent to the STTR (voice writer, palantypist, or stenographer) before the event, they can be programmed into the computer as well, making it easier to recognize who is speaking. Other phrases in brackets, such as {laughter} or {applause}, may also appear to denote relevant environmental sounds.[citation needed]
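As an illustration of these display conventions, the sketch below formats a sequence of caption events into screen lines. It is a hypothetical example rather than the output logic of any real captioning system: the event structure, the `format_captions` function and the speaker identifiers are assumptions made for the example. A speaker change is marked with the speaker's name when it was supplied in advance and with ">>" otherwise, and environmental sounds are wrapped in braces.

```python
from typing import Dict, List, Optional, Tuple

# A caption event is (speaker_id, text); speaker_id None marks a sound cue.
Event = Tuple[Optional[str], str]


def format_captions(events: List[Event], known_names: Dict[str, str]) -> List[str]:
    """Render caption events as display lines with speaker-change markers."""
    lines: List[str] = []
    current_speaker: Optional[str] = None
    for speaker, text in events:
        if speaker is None:
            # Environmental sound such as laughter or applause.
            lines.append("{" + text + "}")
            continue
        if speaker != current_speaker:
            # Use the pre-programmed name if one was supplied, else ">>".
            name = known_names.get(speaker)
            prefix = f"{name}: " if name else ">> "
            current_speaker = speaker
        else:
            prefix = ""
        lines.append(prefix + text)
    return lines


if __name__ == "__main__":
    events = [
        ("s1", "Good morning."),
        ("s1", "Welcome to the meeting."),
        (None, "applause"),
        ("s2", "Thank you."),
    ]
    for line in format_captions(events, {"s1": "CHAIR"}):
        print(line)
```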

Occasional mondegreen errors may be seen in closed captions when the computer software fails to determine where a word break occurs in the syllable stream. For example, a news report of a "grand parade" might be captioned as a "grandpa raid". Mondegreens in this context arise from the need for captions to keep up with the fast pace of live communication.[citation needed]
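The word-break ambiguity behind such errors can be shown with a small sketch. It is a simplified illustration, not how any actual captioning software works: the syllable labels, the `LEXICON` entries and the `segmentations` function are hypothetical. Given a stream of syllables and a lexicon that maps syllable groups to written words, it lists every way the stream can be split into known words.

```python
from typing import Dict, List, Tuple

# Hypothetical lexicon: a group of syllables -> written word.
# Syllable labels are rough stand-ins for phonetic units.
LEXICON: Dict[Tuple[str, ...], str] = {
    ("grand",): "grand",
    ("pa", "rade"): "parade",
    ("grand", "pa"): "grandpa",
    ("rade",): "raid",
}


def segmentations(syllables: List[str]) -> List[str]:
    """Return every way to split the syllable stream into lexicon words."""
    if not syllables:
        return [""]
    results: List[str] = []
    for n in range(1, len(syllables) + 1):
        word = LEXICON.get(tuple(syllables[:n]))
        if word is None:
            continue
        # Combine this word with every valid reading of the remainder.
        for rest in segmentations(syllables[n:]):
            results.append((word + " " + rest).strip())
    return results


if __name__ == "__main__":
    for reading in segmentations(["grand", "pa", "rade"]):
        print(reading)
```

Both "grand parade" and "grandpa raid" are valid readings of the same three syllables in this toy lexicon, so software working in real time has to pick one before later context can settle the question.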


Becoming an STTR requires extensive training. For Palantype and stenography, formal training on the associated hardware and software typically takes two years, followed by at least a further two years of on-the-job speed-building, dictionary/vocabulary building, improving accuracy, and gaining experience.

For voice writing, the formal training is somewhat shorter. It takes about two years to learn voice writing on the associated hardware and software, complete the required practice, build up speed and dictionary/vocabulary, improve accuracy, and gain experience.

Only then is one ready to take the US and/or UK certification examinations. In the US, the National Court Reporters Association (NCRA) and the National Verbatim Reporters Association (NVRA) offer certifications that demonstrate the necessary training.

There are many levels of certification. NCRA and NVRA certify stenographic and voice writers, respectively, as court reporters, realtime reporters, CART providers, and broadcast captioners. NCRA or NVRA certification indicates that the reporter has attained a high level of professionalism and skill.

The NCRA and NVRA certification testing programs are available to their respective members. Tests are held at regular intervals throughout the year in various locations across the country.[2]

In the UK, passing the unitised CACDP examinations and holding membership of the CACDP Register confirm that one has reached the required minimum standard. The majority of registered STTRs are also members of the Association of Verbatim Speech-to-Text Reporters.

The professional association for STTRs is the Association of Verbatim Speech-to-Text Reporters. The Council for Advanced Communication with Deaf People and the Royal National Institute for the Deaf also provide information about STTRs.

References

  1. "Closed Captioning Web". Captions.org. 2006-02-13. Retrieved 2009-06-11.
  2. "NVRA certifications". nvra.org. Retrieved 2017-01-05.