Talk:Speech synthesis

This article is of interest to the following WikiProjects:
WikiProject Robotics (Rated Start-class, Mid-importance)
Speech synthesis is within the scope of WikiProject Robotics, which aims to build a comprehensive and detailed guide to Robotics on Wikipedia. If you would like to participate, you can choose to edit this article, or visit the project page (Talk), where you can join the project and see a list of open tasks.
This article has been rated as Start-Class on the project's quality scale.
This article has been rated as Mid-importance on the project's importance scale.
This article has been marked as needing immediate attention.
WikiProject Linguistics / Applied Linguistics (Rated C-class)
This article is within the scope of WikiProject Linguistics, a collaborative effort to improve the coverage of Linguistics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as C-Class on the project's quality scale.
This article has not yet received a rating on the project's importance scale.
This article is supported by the Applied Linguistics Task Force.
This article has been automatically rated by a bot or other tool because one or more other projects use this class. Please ensure the assessment is correct before removing the |auto= parameter.
WikiProject Disability (Rated C-class, Mid-importance)
Speech synthesis is within the scope of WikiProject Disability. For more information, visit the project page, where you can join the project and/or contribute to the discussion.
This article has been rated as C-Class on the project's quality scale.
This article has been rated as Mid-importance on the project's importance scale.
Version 0.5 (Rated start-Class)
This Engtech article has been selected for Version 0.5 and subsequent release versions of Wikipedia. It has been rated start-Class on the assessment scale.
Speech synthesis is a former featured article. Please see the links under Article milestones below for its original nomination page (for older articles, check the nomination archive) and why it was removed.
This article appeared on Wikipedia's Main Page as Today's featured article on June 3, 2004.

Microsoft Sam glitch

Does anyone know why when you have MS Sam read "soi" or "soy" it makes a really odd airy sound? There are other errors but I can't remember —Preceding unsigned comment added by 24.156.28.128 (talk) 11:30, 13 September 2008 (UTC)

Older comments

I think perhaps the sections on the different synthesis techniques are long enough to warrant their own pages. Nohat 05:20 19 Jun 2003 (UTC)

Could somebody expand on 'Formant synthesis', covering both the technical side and the terminology? Is any system that applies filtering techniques to basic waveforms plus noise considered 'formant' synthesis? Is it a specific technique, or just a general term for synthesising such phenomena?
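To make the "filters on basic waves" idea in the question above concrete, here is a minimal sketch of one common reading of formant synthesis: a periodic source (an impulse train standing in for the glottal pulses) passed through a cascade of two-pole resonators, one per formant. The resonator recurrence follows the digital resonator described in Klatt's published work; the function names and the formant frequencies and bandwidths are illustrative assumptions, not values taken from any particular synthesizer, and the noise source is left out for brevity.

```python
import math

def resonator_coeffs(freq_hz, bandwidth_hz, sample_rate):
    """Coefficients (a, b, c) of a two-pole digital resonator."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = 2.0 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c  # normalises the gain at 0 Hz to 1
    return a, b, c

def resonate(signal, freq_hz, bandwidth_hz, sample_rate):
    """Filter a signal with y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    a, b, c = resonator_coeffs(freq_hz, bandwidth_hz, sample_rate)
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def impulse_train(f0_hz, duration_s, sample_rate):
    """Crude voiced source: one unit impulse per pitch period."""
    period = int(round(sample_rate / f0_hz))
    n = int(duration_s * sample_rate)
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]

def synthesize_vowel(formants, f0_hz=120.0, duration_s=0.5, sample_rate=16000):
    """Pass the source through one resonator per (frequency, bandwidth) pair."""
    signal = impulse_train(f0_hz, duration_s, sample_rate)
    for freq_hz, bandwidth_hz in formants:
        signal = resonate(signal, freq_hz, bandwidth_hz, sample_rate)
    return signal

# Assumed, roughly "ah"-like formant values, chosen only for illustration.
samples = synthesize_vowel([(700.0, 130.0), (1220.0, 70.0), (2600.0, 160.0)])
```

Under this reading, "formant synthesis" names the specific choice of shaping the source with resonances placed at formant frequencies, rather than any filtering of waves and noise in general; whether a given system qualifies would still depend on how its filters are parameterised.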

Trillium Sound Research Inc (now defunct) offered unlimited-vocabulary articulatory speech synthesis on the NeXT computer in 1994, so it is not accurate to say that articulatory speech synthesis is only of academic interest and not far enough advanced for commercial application. It was NeXT Computer that failed, not the synthesis, which was rated the best synthesis available at the time. That software is now the basis of the GnuSpeech project, a port of the original NeXT software to Linux. It is released under the GPL. The basis is an acoustic tube model, so it is low-level articulatory synthesis with the necessary databases for varying the tube cross-sections, using the Fant/Carre research on formant sensitivity analysis and control regions. Provision is made for adding higher-level parameters such as tongue height, jaw opening, etc., but this extension is still undeveloped, and would rely on deriving relationships between these higher-level parameters and the low-level tube cross-section parameters. Other ports are possible/likely.
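For readers unfamiliar with acoustic tube models, the following is a minimal sketch of why the tube cross-sections are the low-level control parameters: in a lossless tube built from uniform sections, the fraction of a wave reflected at each junction depends only on the two adjacent cross-sectional areas. This is the textbook junction formula, not code from GnuSpeech; the function name and the example areas are assumptions made for illustration.

```python
def reflection_coefficients(areas):
    """Pressure-wave reflection coefficient at each junction of a sectioned tube.

    areas: cross-sectional areas of consecutive tube sections, ordered from
    one end of the tube to the other (any consistent unit, all positive).
    Uses the pressure-wave convention; the volume-velocity convention
    flips the sign of every coefficient.
    """
    if any(a <= 0 for a in areas):
        raise ValueError("all cross-sectional areas must be positive")
    return [(a1 - a2) / (a1 + a2) for a1, a2 in zip(areas, areas[1:])]

# Hypothetical three-section tube: wide, narrow, wide again.
example = reflection_coefficients([3.0, 1.0, 3.0])
# → [0.5, -0.5]
```

A constriction (a smaller area following a larger one) gives a positive coefficient in this convention, and equal adjacent areas give zero, meaning the wave passes through unchanged. Varying the areas over time is what moves the tube's resonances.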

use in Weatheradio

Not sure where to put it, but the National Weather Service in the U.S. now uses it on all Weatheradio stations. The new voice sounds excellent, and I think it uses a hybrid of patched voice and true synthesis. The Weather Channel may also use this for their Vocal Local announcements during the local forecast (but not on their Weatherscan channel). –radiojon 02:47, 2004 Jun 4 (UTC)

The NWS "Tom" and "Donna" AKA "Mara" voices are the SpeechWorks Speechify (now merged with Realspeak) American English voices "Tom" and "Mara" (no longer available), which use a purely concatenative system. [1] Nohat 06:57, 14 Apr 2005 (UTC)

External links

I think this article has too many (16) external links in the section, "Examples of current systems". I don't know much about the different systems we link to, but either A) each system we list is important in the topic of speech synthesis and needs to be mentioned in the article, or B) not all of these systems are important. In case A), we should use internal links: Don't use external links where we'll want Wikipedia links. In case B), we should just pick one or two examples, or link to a page which contains these links (Wikipedia is not a link repository.) — Matt 14:08, 17 Jun 2004 (UTC)

  • I'm not sure there's such a thing as "too many external links". "Wikipedia is not a link repository" applies to articles which consist of nothing but links, which this article is clearly not. I don't see what the point of removing all or some of the links would be. Nohat 18:59, 2004 Jun 23 (UTC)
  • I agree that once an article starts "collecting" external links, it's hard to stop ("If website A is listed, why not website B"). Nohat, I don't agree with your assessment that the "Wikipedia is not a link repository" statement only applies to a particular type of article. Link spamming and excessive linking are becoming a major problem (see WP:WPSPAM). There is a major update and serious discussion underway at WP:EL, with more clearly defined do's and don'ts. The trend, I believe, will be toward fewer external links. One recommendation now appearing in the guidelines is to eliminate all but a few highly relevant links and to place a link to DMOZ that points to a directory of websites relating to the article's topic. Speech synthesis in particular had many links to commercial websites promoting products and services, but these were removed per the guidelines. I went ahead and removed several more today because they were promotional (one selling a book, free for a limited time, required registration, etc.). Any discussion on these edits would be welcome. Calltech 15:06, 5 December 2006 (UTC)

Request for references

Hi, I am working to encourage implementation of the goals of the Wikipedia:Verifiability policy. Part of that is to make sure articles cite their sources. This is particularly important for featured articles, since they are a prominent part of Wikipedia. The Fact and Reference Check Project has more information. If some of the external links are reliable sources and were used as references, they can be placed in a References section too. See the cite sources link for how to format them. Thank you, and please leave me a message when a few references have been added to the article. - Taxman 19:43, Apr 22, 2005 (UTC)

Early Voices Described as "Robotic" Seems Circular

Primitive speech synthesis devices sound robotic. A robotic voice is produced by a primitive speech synthesis device. This is circular. The popular idea of what a robot's voice sounds like comes from early attempts at speech synthesis. Film and television makers must have imitated what had been produced by early efforts at synthesis when creating robotic characters. It would be more accurate, I think, to say that the idea of a robotic voice came from efforts to produce speech synthesis. Saying that early speech synthesizers were robotic gets it backwards.

I'm not sure it's quite so simple as that. Interestingly, there has only ever been one speech synthesis system that spoke in a monotone (and not a very popular or often-used one at that), yet the most common feature of "robotic" voices when imitated by humans is a monotone. Clearly this notion of what a robot sounds like was not based on listening to actual synthesized speech. It is more likely that the idea of "robotic" voices came from what people imagined a synthetic voice would sound like, rather than what actual synthetic voices sounded like.
Regardless of all this, to the contemporary reader, the idea of the voice sounding "robotic" is probably a fairly safe, if perhaps literally preposterous, reference point for explaining what old speech synthesis systems sounded like. Nohat 06:50, 25 October 2005 (UTC)

Open source software

Are there any open source speech synthesis projects? It would be great to summarize how the best few are doing or note the lack if there are none. — Hippietrail 17:36, 15 April 2006 (UTC)

Possible copyvio

A possible copyvio concern has arisen in the Featured Article review. User:Marskell wrote "I believe the Concatenative Synthesis section may be a text dump from here". This is a serious concern that should be addressed immediately. Joelito (talk) 19:30, 7 November 2006 (UTC)

External links cleanup

The external links section was getting filled with lots of links to similar websites. WP is not a directory of links (WP:NOT):

"Wikipedia articles are not mere collections of external links or internet directories. There is nothing wrong with adding one or more useful content-relevant links to an article; however, excessive lists can dwarf articles and detract from the purpose of Wikipedia"

I went ahead and removed most of the external links and added DMOZ category for speech synthesis (per WP recommendation). If you feel that any of the deleted links contribute substantially more than the others, please feel free to leave a comment here and we all can discuss. Thanks! Calltech 18:43, 20 December 2006 (UTC)

Fair use rationale for Image:MS Sam.ogg

Image:MS Sam.ogg is being used on this article. I notice the image page specifies that the image is being used under fair use but there is no explanation or rationale as to why its use in this Wikipedia article constitutes fair use. In addition to the boilerplate fair use template, you must also write out on the image description page a specific explanation or rationale for why using this image in each article is consistent with fair use.

Please go to the image description page and edit it to include a fair use rationale. Using one of the templates at Wikipedia:Fair use rationale guideline is an easy way to ensure that your image is in compliance with Wikipedia policy, but remember that you must complete the template. Do not simply insert a blank template on an image page.

If there is other fair use media, consider checking that you have specified the fair use rationale on the other images used on this page. Note that any fair use images lacking such an explanation can be deleted one week after being tagged, as described on criteria for speedy deletion. If you have any questions please ask them at the Media copyright questions page. Thank you.

BetacommandBot (talk) 13:25, 8 March 2008 (UTC)

How on earth could this be copyrighted? It's a voice saying a sentence. You can't copyright arbitrary audio from a text-to-speech synthesizer. You can only copyright a specific recording. 99.14.103.236 (talk) 03:42, 20 September 2009 (UTC)

Text to speech based on Festival in Unix

www.wordtosound.com is installed on a Unix box. Type any text (English only) and the output is a downloadable WAV or MP3 file. The voice has a British accent and is kind of croaky, but understandable. It is clearer in the WAV format. —Preceding unsigned comment added by 69.85.110.110 (talk) 21:04, 27 May 2008 (UTC)

Suggest that Heterogeneous Relation Graph (HRG) and Delta should be described here

These comprise an important phase of most modern TTS systems and should be discussed here. I don't currently have the time to add this section, but if no one else gets around to it, I'll come back and write up a few things when I'm less busy. Twikir (talk) 04:08, 15 April 2009 (UTC)

Craptalker

There's a program online that might count as a speech synthesizer. It's called "CrapTalker". Should it be added to the links?

http://www.computerpranks.com/software/default.cfm?ItemID=1690

Sohzq (talk) 13:47, 26 May 2009 (UTC)

Text-to-speech voices

Can a new section and article be made comparing text-to-speech voices? Besides the conventional Microsoft Sam and Microsoft Anna, do other voices exist?

Also, does a voice like the Monster, Alien, or Amplifier Halloween voice changer exist? These voices were featured in Fun with Dick & Jane (see here and here); they may be somewhat harder to understand, though they could still be used in some applications. —Preceding unsigned comment added by 91.176.6.252 (talk) 11:55, 16 June 2009 (UTC)

Overview of text processing figure

Shouldn't the first block of the linguistic analysis component be "Phrasing" rather than "Phasing"? Broloks (talk) 16:55, 3 October 2009 (UTC)

A new alternative front end?

The current front end to this technology seems to be solely text processing. But a recently reported study here demonstrated a method of reconstructing words based on the brain waves of patients simply thinking of those words, by monitoring the superior temporal gyrus of their brains. The 2012 study by Pasley et al., reported in the journal PLoS Biology [2], used fMRI to track blood flow in the brains of 15 patients who were undergoing surgery for epilepsy or tumours, while playing audio of a number of different speakers reciting words and sentences. With the aid of a computer model, when patients were presented with words to think about, the team was able to guess which word the participants had chosen. Potential therapeutic implications have been suggested. Thanks. Martinevans123 (talk) 19:03, 14 February 2012 (UTC)

Robotics project attention needed

  • Refs - large amounts of text have no refs
  • Content - are all topics covered?
  • MoS compliance
  • Reassess

Chaosdruid (talk) 11:39, 24 March 2012 (UTC)

Needs more examples

Not to state the obvious, here, but this article needs more examples (i.e., sound files) of what kind of results the different types of speech synthesis can give. Right now the article has only two examples, neither tied to a specific section, and only one of which gives enough information to tell how it was generated. - dcljr (talk) 08:13, 14 January 2013 (UTC)

Kurzweil reading machine

According to a source already cited in the article, Klatt, D. (1987) "Review of Text-to-Speech Conversion for English" Journal of the Acoustical Society of America 82(3):737-93, the Kurzweil Reading Machine was the first commercial text-to-speech synthesis system. That is, on p. 770 Klatt has a diagram with a box for "Kurzweil Reading Machine, 1976", and under it it says "first commercial system". Do other sources disagree? If not, where is the best place to put this in the article? Silas Ropac (talk) 16:38, 8 February 2013 (UTC)

NeoSpeech & Natural Voices

After some quick research into the most natural-sounding speech synthesis voices, I found these two. I appreciate both, but NeoSpeech seems more natural.

--TudorTulok (talk) 14:33, 11 March 2013 (UTC)