Text to speech in digital television

From Wikipedia, the free encyclopedia

Text to speech in digital television refers to digital television products that use speech synthesis (computer-generated speech that makes the product “talk” to the user) to make them accessible to blind or partially sighted people. A digital television product (a television, set-top box, personal video recorder or other type of receiver) can be combined with a speech synthesis engine. This lets blind and partially sighted people access the information that other users see on the screen, so they can operate the receiver's menus and electronic program guides.

User need

Using an audiovisual medium poses obvious problems for certain groups of people with disabilities, notably people with sight or hearing loss. These problems can be divided into barriers to accessing the interface and impediments to using the content itself. Text-to-speech in television products addresses interface accessibility barriers for blind and partially sighted people who are unable to use the standard visual interface, even where it has special features such as large fonts, magnifiers or adjustable colour schemes.

Digital television products are often more complicated than their analogue predecessors.[1] Navigating many menus, reading on-screen programme information and browsing electronic programme guides or on-screen content listings to find out what is available to watch are all essential to using digital TV.

Policy makers across the world have recognized the importance of access to (digital) television:

  • Recital 64 of the EU’s Audiovisual Media Services Directive (AVMS)[2] states: "The right of persons with a disability and of the elderly to participate and be integrated in the social and cultural life of the Community is inextricably linked to the provision of accessible audiovisual media services."
  • The initial report of a European Commission study "Measuring progress of eAccessibility in Europe"[3] refers to television as one of a set of fields "that are now essential elements of social and economic life".
  • The United Nations Convention on the Rights of Persons with Disabilities[4] makes specific reference to television access in Article 30(1) ("Participation in cultural life, recreation, leisure and sport"): "States Parties recognize the right of persons with disabilities to take part on an equal basis with others in cultural life, and shall take all appropriate measures to ensure that persons with disabilities: [...] (b) Enjoy access to television programmes, films, theatre and other cultural activities, in accessible formats".


Text-to-speech software has been widely available for desktop computers since the 1990s, and the growth in CPU and memory capacity described by Moore’s Law has made it increasingly feasible to include it in software and hardware products. Following these trends in information technology, text-to-speech is finding its way into everyday consumer electronics: in addition to text-to-speech solutions for computers, there are now talking watches and clocks, calendars, thermometers, kitchen aids and many other products. Talking books have also been around for some time, and GPS navigation systems with spoken directions are now widely used.[5]

Organisations representing blind and partially sighted people are long-standing supporters of text-to-speech technology in consumer electronics. In the UK, the Royal National Institute of Blind People (RNIB) has been arguing for talking radio and television products since the early 2000s and has supported manufacturers in creating such products.[6][7]

The Digital TV Group (DTG), the UK industry association for digital TV, first discussed the topic in 2007 and in 2009 brought the industry together to write a technical specification for text-to-speech in the horizontal market. This work formed part of the Usability Action Plan of the UK Government's Department for Business, Enterprise and Regulatory Reform (BERR).[8] The completed specification was submitted to DigitalEurope for ETSI standardisation and also published as a White Paper. It was later incorporated into the U-Book (UK Digital TV Usability and Accessibility Guidelines, including Text to Speech).[9]

In 2010, two talking products for digital television came onto the UK market. The Sky Talker is an add-on for the Sky set-top box that speaks programme and channel information and provides spoken feedback for playback control; it is operated through the standard Sky remote control. In the same year, the Smart Talk Freeview (digital terrestrial) set-top box was launched. This Goodmans-branded Freeview set-top box was developed in a partnership between Harvard International Ltd and the RNIB. It was the first complete talking solution for digital television in the UK, speaking the Electronic Programme Guide and menus and providing spoken assistance during setup.

In Japan, both Panasonic and Mitsubishi Electric have been producing television and Blu-ray products with talking features since 2010. According to information compiled by the Japanese blindness organisation Lighthouse for the Blind, there are some 70 such products from Mitsubishi and a similar number from Panasonic.[10]

Around 2011, the Spanish Ministry of Industry, Tourism and Trade distributed a talking Linux-based set-top box, using the free Festival text-to-speech engine, to blind and visually impaired people free of charge. However, this product is no longer available.

In 2012, Panasonic launched its Voice Guidance solution on the UK market.[11] This is a set of talking features for its 2012 Viera range and later models. Voice Guidance announces on-screen information and the most important menus, and supports reminders, recording and playback functions. It is available for Freesat and Freeview receivers. In creating its solution, Panasonic took into account advice from RNIB experts.[12]

Also in 2012, TVonics, a former UK maker of digital video recorders, launched a talking PVR: a twin-tuner Freeview HD recorder based on the Ivona TTS engine, whose voice quality was widely praised by disability groups. The talking features were essentially a software addition to TVonics' existing platform and could be deployed to customers of existing products as a software upgrade. TVonics went into administration in June 2012.[13] The RNIB acquired the core DVR intellectual property, including the text-to-speech system, and the TVonics brand was bought by Peterborough-based Pulse-Eight.

Features of text-to-speech for television[14]

Because a primary purpose of text-to-speech in television products is to make them accessible to blind and partially sighted people, talking features should ideally cover all television operations, from initial setup, through basic and advanced receiver functions, to programming and playback. In practice, significant technical challenges remain, in particular with dynamic information, interactive applications, catch-up and on-demand functions in Connected TVs, and dialogue handling. As a result, none of the current text-to-speech products covers every feature through its talking interface.

The main principle in developing text-to-speech solutions for digital television products should be to create a talking interface that achieves functional equivalence with what a sighted user can do using the default (visual) interface. Specifically, a person operating the product through the text-to-speech system should get the same feedback and be able to perform the same tasks as someone using the default interface (commonly the screen in combination with a remote control).

List of possible text-to-speech enabled features

  • Initial set up and configuration (for Connected TVs this could include the network configuration, including authentication to the home network).
  • Power cycle control (on, off, standby).
  • Announcing the currently showing channel and programme, plus the list of available channels.
  • Assistance and feedback for basic receiver functions such as change channel and volume control.
  • Speaking the Electronic Programme Guide (EPG) and assisting the user in navigation of the EPG and other lists of services and content, including browsing on-demand and catch-up content and previously recorded or downloaded content as well as user customisable lists (favourites etc.).
  • Spoken feedback for reporting and changing the state of access services (in particular Audio Description, see Support for Audio Description/Video Description).
  • Talking features in support of playback and recording, including managing the recording schedule.
  • Notification of pay-per-view and other restricted content, restrictions and conditions and control over these functions, including the authorisation mechanism.
  • Feedback and control for on-screen information banners, dialogues and menus (including modal and other out-of-band prompts).
  • Interaction with interactive services and widgets.

Customisation of talking features

Different consumers have different profiles of abilities and preferences. This is also true for blind and partially sighted people using text-to-speech enabled television products. In addition, novice users tend to require more guidance in the early stages of using a product, whereas more advanced users will prefer to be able to navigate the system as efficiently as possible. Consequently, the text-to-speech part of a talking television solution should allow user control and customisation options over its functions:

  • Users should be able to set the volume level of the text-to-speech output independently of the main television sound level.
  • Users should also be able to adjust properties such as text-to-speech voice type, pitch and output speed.
  • Good implementations also allow the verbosity of what is spoken to be adjusted, from very verbose (usually for novice users) to only the bare essentials (useful for advanced users very familiar with the system).
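The customisation options above can be pictured as a small settings record plus a function that trims what is spoken according to the chosen verbosity. The following Python sketch is illustrative only. The class names, field names, value ranges, the three verbosity levels and the "Info" key hint are all hypothetical assumptions, not taken from any product or from IEC 62731.

```python
from dataclasses import dataclass
from enum import IntEnum


class Verbosity(IntEnum):
    """Hypothetical verbosity levels, from bare essentials to full guidance."""
    TERSE = 0    # advanced users: only the essentials
    NORMAL = 1
    VERBOSE = 2  # novice users: extra hints


@dataclass
class SpeechSettings:
    # Speech volume is a separate field from the main TV volume, so the
    # user can change one without affecting the other.
    tts_volume: int = 70        # 0-100, assumed scale
    tv_volume: int = 40         # 0-100, assumed scale
    voice: str = "voice-1"      # hypothetical voice identifier
    pitch: float = 1.0          # 1.0 = engine default (assumed)
    rate: float = 1.0           # 1.0 = engine default speed (assumed)
    verbosity: Verbosity = Verbosity.NORMAL


def channel_change_announcement(number: int, name: str, programme: str,
                                settings: SpeechSettings) -> str:
    """Return the text to speak after a channel change, trimmed by verbosity."""
    if settings.verbosity == Verbosity.TERSE:
        return f"{number} {name}"
    text = f"Channel {number}, {name}. Now showing: {programme}."
    if settings.verbosity == Verbosity.VERBOSE:
        # Hypothetical hint; the "Info" key is an assumed remote-control button.
        text += " Press Info for programme details."
    return text


terse = SpeechSettings(verbosity=Verbosity.TERSE)
print(channel_change_announcement(101, "Example TV", "Evening News", terse))
# prints: 101 Example TV
```

Keeping speech volume, voice parameters and verbosity in one record lets the talking layer read them at each announcement without touching the main audio path. Real text-to-speech engines expose their own controls for voice, pitch and rate, which this sketch does not model.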

Support for Audio Description/Video Description

As blind and partially sighted users are the main target group for text-to-speech in digital television, particular attention should be given to supporting those features of the product that are of most value to this consumer group.

In particular, the ability to control the product's Audio Description/Video Description settings is of great importance to these users in countries where such services are available. Audio Description/Video Description provides an additional narration describing visual actions or elements that a blind or partially sighted person would not see but which are important for following the story. Typically, the narration describes characters, scene changes, on-screen text and other visual cues not otherwise conveyed by the default soundtrack.

Talking features in the product should fully support the menus and other controls relating to Audio Description/Video Description, including announcing whether this access service is available for content when browsing the Electronic Programme Guide and other content listings.
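Announcing Audio Description availability while browsing a programme guide can be sketched as composing the spoken text for each guide entry. The Python example below is a minimal illustration under stated assumptions. The `EpgEvent` record, its fields and the exact wording spoken are hypothetical. A real receiver would take the AD flag from broadcast signalling, which is not modelled here.

```python
from dataclasses import dataclass


@dataclass
class EpgEvent:
    # Hypothetical EPG record; a real receiver would fill these fields from
    # broadcast service information, which is not modelled here.
    channel: str
    title: str
    start: str                # "HH:MM", kept as plain text for simplicity
    end: str
    audio_description: bool   # True if an AD track is signalled for the event


def speak_epg_event(event: EpgEvent) -> str:
    """Compose the spoken text for one EPG entry, advertising AD availability."""
    parts = [f"{event.title}, {event.channel}, {event.start} to {event.end}"]
    if event.audio_description:
        parts.append("Audio description available")
    return ". ".join(parts) + "."


drama = EpgEvent("Example TV", "Evening Drama", "21:00", "22:00", True)
print(speak_epg_event(drama))
# prints: Evening Drama, Example TV, 21:00 to 22:00. Audio description available.
```

Appending the AD notice to every entry that carries it lets a user find described programmes by ear, much as a sighted user would spot an AD symbol in the guide.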

Digital Television products with text-to-speech support

United Kingdom

United States
Implementation guidance and standardisation

An early effort to capture the user requirements and define a functional specification was undertaken by the Digital TV Group (DTG) in the UK, which published a White Paper on the subject. This White Paper has since been incorporated into the UK Digital TV Usability and Accessibility Guidelines[15] (known as the U-Book). The same White Paper also served as the basis for discussions on text-to-speech for television between disability user groups and DigitalEurope,[16] a European industry body for manufacturers of consumer equipment. The DigitalEurope work stream led the International Electrotechnical Commission (IEC) to set up a project group (IEC 62731) to create an International Standard for text-to-speech in digital television. The first edition of the standard, IEC 62731:2013, was officially published as an International Standard in January 2013.[14] The Standard does not dictate an implementation. Instead, it provides a functional description of how a text-to-speech enabled television product should behave and what should be spoken when.

References


  1. Danker, Daniel (2 Mar 2012). "Me and My TV - How Can we Connect?" (PDF). BBC Internet Blog. Retrieved 2013-02-17.
  2. "Directive 2007/65/EC". 11 Dec 2007. On the coordination of certain provisions laid down by law, regulation or administrative action in Member States concerning the provision of audiovisual media services (Audiovisual Media Services Directive).
  3. Kubitschke, Lutz; Cullen, Kevin; Meyer, Ingo, eds. (October 2007). "MeAC - Measuring Progress of eAccessibility in Europe" (PDF). Assessment of the Status of eAccessibility in Europe - Main Report. Bonn.
  4. United Nations (2006). "Convention on the Rights of Persons with Disabilities". United Nations. Retrieved 2013-02-17.
  5. RNIB. "Top ten talking products". RNIB. Retrieved 2013-02-17.
  6. "Digital TV Equipment: Vulnerable Consumer Requirements" (PDF). A Report by the Consumer Expert Group to Government and Digital UK. London: Consumer Expert Group. March 2006.
  7. RNIB (6 September 2012). "Are you really listening?". RNIB. Retrieved 2013-02-17.
  8. "Usability Action Plan" (PDF).
  9. "UK Digital TV Usability and Accessibility Guidelines, including Text to Speech".
  10. Nippon Lighthouse. "日本ライトハウス情報文化センター - 音声読み上げ機能付き地デジテレビ 品番リスト" [A list of models with digital TV and text-to-speech support] (in Japanese). Retrieved 2013-02-17.
  11. Panasonic (27 March 2012). "Panasonic Launches Range of Talking TVs". Retrieved 2013-02-17.
  12. RNIB (10 July 2012). "Panasonic television with Voice Guidance". Retrieved 2013-02-17. "With advice from RNIB experts".
  13. Whitfield, Nigel (27 June 2012). "Administrator eyes DVR firesale after TVonics collapse - Freeview HD recorder firm founders". The Register. Retrieved 2013-02-17.
  14. International Electrotechnical Commission (29 Jan 2013). "IEC 62731 ed1.0: Text-to-speech for television - General requirements". International Electrotechnical Commission. Retrieved 2013-02-17.
  15. "Books and White Papers" (PDF). UK Digital TV Usability and Accessibility Guidelines, including Text to Speech. Digital TV Group. September 2011. Retrieved 2013-02-17.
  16. "Industry Self-Commitment" (PDF). To Improve The Accessibility of Digital TV Receiving Equipment sold in The European Union. Brussels: DigitalEurope. 30 Nov 2007.