Talk:Microsoft Speech API

This article is of interest to the following WikiProjects:

WikiProject Computing
This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has not yet received a rating on the project's quality scale.
This article has not yet received a rating on the project's importance scale.

WikiProject Microsoft
This article is within the scope of WikiProject Microsoft, a collaborative effort to improve the coverage of Microsoft on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has not yet received a rating on the project's quality scale.
This article has not yet received a rating on the project's importance scale.

WikiProject Disability (Rated Start-class, Low-importance)
Microsoft Speech API is within the scope of WikiProject Disability. For more information, visit the project page, where you can join the project and/or contribute to the discussion.
This article has been rated as Start-Class on the project's quality scale.
This article has been rated as Low-importance on the project's importance scale.


Untitled

This page is starting to feel complete. Some sections, e.g. on SAPI versions 1 through 4, are not done yet, and it's had only minimal proof-reading. Dave w74 09:43, 9 February 2006 (UTC)

Okay, this has most of the content I think is necessary, and I believe it is all technically accurate. I've also done some basic proof-reading and clean-up. Dave w74 10:31, 10 February 2006 (UTC)

SAPI isn't Microsoft exclusive

I had a problem with it when SAPI only mentioned the Apache definition, but didn't bother to comment on it because of the footnote about Microsoft's SAPI as an alternate meaning. I would think that accepting that the term is ambiguous and having a second page entry for the topic would be better, but frankly, I like this less. It ONLY discusses Microsoft's SAPI, and in more detail than is likely necessary. The problem is that Microsoft SAPI isn't the only SAPI engine available. There is a whole family of them, including the IBM one used in ViaVoice, and they are all called SAPIs, that being the general industry term for the engine type, just as TAPI generally refers to telephony engines, regardless of maker. To focus on the Microsoft SAPI as being exclusive doesn't feel like a 'general encyclopedia' entry to me but rather an advertisement for Microsoft, just as the last version of the other entry for SAPI looked like an advertisement for Apache. If SAPI needs a more detailed explanation than just 'what is a SAPI', it should be in picking it apart into its layers and not focusing on any specific versions of a SAPI.

I'm not a SAPI coder, but I've been researching them in an effort to cobble together a very specialised SAPI (I'd rather not share details on it at this time), so I have a pretty intimate knowledge of how they work. Frankly, the Microsoft version of a SAPI isn't my preferred one of the existing engines out there, and it isn't portable between different OSes like the ViaVoice SAPI is.

Splitting a SAPI into levels can be done two ways. The general levels are high and low, in the same fashion as all coding lingo. High-level SAPI use is using the SAPI for all it can do and trusting the engine and your XML database to do all the actual work for you. Low-level SAPI use is getting into the guts of the engine and doing most of the actual work (defining the voice, laying out the allophony tables, etc.) yourself, in your code, while needing only the most basic abilities of the engine (usually for a custom speech engine).

Splitting the SAPI into technical levels results in 4 common levels, given in the order in which the SAPI processes sound for speech recognition. Creating speech is much simpler and doesn't use all 4 levels.

In the typical method of SAPI operation I am familiar with, the levels that process sound are:

Level 1: Determine the timing of the speaker... The process of trying to figure out when one word ends and the next begins, and starting to make at least some sense of what is being said.

Level 2: Process the word into a usable phonetic code that can be cross-checked against the database of words.

Level 3: Determine the word(s) being said.

Level 4: In the case of multiple possible words with that pronunciation, attempt to determine which word it might be in the context of surrounding words.
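
As a rough illustration of the four levels above, here is a toy sketch in Python. It is only a sketch under heavy assumptions: the input is a list of already-labelled sound symbols rather than real audio, the silence marker, the mini lexicon and the word-pair table are all made up, and none of the function names come from any real speech engine.

```python
# Toy sketch of the four recognition levels. Everything here is hypothetical:
# the input is a list of sound symbols, not audio, and the data is invented.

SILENCE = "_"

# Data for Level 3: phonetic code -> words with that pronunciation (made up).
LEXICON = {
    "T UW": ["two", "too", "to"],
    "R AY T": ["write", "right"],
    "W ER D Z": ["words"],
}

# Data for Level 4: (previous word, candidate) pairs considered plausible.
BIGRAMS = {("<s>", "write"), ("write", "two"), ("two", "words")}


def segment(sounds):
    """Level 1: split the sound stream into word-sized chunks at silences."""
    chunks, current = [], []
    for s in sounds:
        if s == SILENCE:
            if current:
                chunks.append(current)
                current = []
        else:
            current.append(s)
    if current:
        chunks.append(current)
    return chunks


def to_phonetic_code(chunk):
    """Level 2: turn a chunk into a key that can be looked up in the lexicon."""
    return " ".join(chunk)


def candidates(code):
    """Level 3: find every word with that pronunciation."""
    return LEXICON.get(code, ["<unknown>"])


def pick_in_context(previous, options):
    """Level 4: prefer a candidate that fits the preceding word."""
    for word in options:
        if (previous, word) in BIGRAMS:
            return word
    return options[0]  # no context match: fall back to the first candidate


def recognise(sounds):
    words, previous = [], "<s>"  # "<s>" marks the start of the utterance
    for chunk in segment(sounds):
        word = pick_in_context(previous, candidates(to_phonetic_code(chunk)))
        words.append(word)
        previous = word
    return words


if __name__ == "__main__":
    print(recognise("R AY T _ T UW _ W ER D Z".split()))
    # prints ['write', 'two', 'words']
```

Real engines work on audio and use statistical models at every level; the point here is only how the output of each level feeds the next.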

This is a very crude anatomy of a SAPI, and some do parts of it better than others, but really, this all comes down to my not appreciating seeing advertisements for specific products in an encyclopedia... I'd have to double-check my SAPI history, but I'm not even sure Microsoft can get credit for making the first SAPI... They just happened to do the first freely usable SAPI exclusive to their (most popular) OS.

Interesting comments. I agree with several things you say: Dave w74 21:03, 21 March 2006 (UTC)
  • Yes I agree this page could be named better to make it clear we are talking about a Microsoft API. I propose moving this page to "Microsoft Speech Application Programming Interface". The stuff I wrote on this page was absolutely not intended to imply that this is the only Speech API in the world, or that it was the first, or that it was the best.
  • Then in the SAPI disambiguate page one could add references to pages about other Speech APIs {if they exist}. However, to be clear, I think almost always the abbreviation "SAPI" refers to the Microsoft API. Other APIs tend to have slightly different abbreviations such as JSAPI for Java Speech API and SRAPI for Speech Recognition API. But I think it would be reasonable if it helps avoid confusion.
  • I think you're proposing a page about Speech APIs in general. I'm not sure this is necessary - I don't think there's a "Telephony API" page, or a "Mail API" page, so why add one for this? I also think it would be hard to discuss Speech APIs in general - there are so many kinds {recognition, synthesis, desktop, telephony, speaker verification etc.} But if you think it adds value, go for it ... Dave w74 21:03, 21 March 2006 (UTC)
Just switching it to MS SAPI or Microsoft SAPI would be good enough, but SAPI alone is a generic term used by multiple products. Even the most common speech engine used in automated, voice-recognising phone systems (more and more common with customer support these days) calls its product a SAPI, and it does both ends of the job. I am pretty certain (but could be wrong) that the ViaVoice engine can do both, though the actual application they publish for speech recognition only does one. There are also SAPIs in other OSes. I will grant that YES, when most people say SAPI, they mean Microsoft SAPI, but when most people say computer they mean a Winbox too, and I don't see putting up an entry on computers that focuses on Windows. Though I use Windows a lot, I am actually partial to Linux or Solaris myself. As for a general SAPI entry for Speech Application Programming Interface, I don't feel quite qualified to give it the history it would deserve. As far as I am aware, the term SAPI for Speech Application Programming Interface was actually first coined in the mid-80s to describe the program end of a hardware device that could let older 8-bit computers use a VOX chip to talk. I can't find any older history on it, but there may be some. - Original commenter - 12:53, 22 March 2006 (UTC)
If you can locate a "Speech Application Programming Interface" that goes by that very specific name in common usage, that isn't the API discussed in this article, please feel free to link it here in the talk section, and we'll sort it out. The proper capitalisation of the name is no accident; it really is the only API out there in the world that goes by that exact name. Renaming this article to "Microsoft SAPI" is a very bad idea, as it reduces the readability of the article's title -- the fact that it's a speech API is far more important than the fact that Microsoft produced it. The only circumstance by which this article should change names is if there is a specific disambiguation need. Right now, there simply isn't. Warrens 14:20, 22 March 2006 (UTC)

101 Things to do with Microsoft Sam

For a list of fun things to do with Microsoft Sam, see User:Martinultima/101 Things To Do With Microsoft Sam. SpongeSebastian 04:58, 17 August 2006 (UTC)

Why merge with Microsoft Sam???

I don't understand why Microsoft Sam was deleted and now redirects here. I thought the Microsoft Sam page was absolutely fine. Now we have weird sections on this page with people pointing out that certain words sound funny, which really doesn't seem to fit well with the rest of the content.

I agree that getting a perfect arrangement of pages related to Microsoft Speech is a bit tricky, but this move seems counter-productive to me. Dave w74 02:24, 17 October 2006 (UTC)

Microsoft Sam is a TTS engine, and should not be a section in this page. Otherwise this page should list all the various engine versions that have shipped, including speech recognition engines. Maybe the organization should be separated into SAPI and SAPI engines (which includes SR and TTS engines). Charles Oppermann 04:31, 12 September 2007 (UTC)

I also removed the Easter Egg section. This wasn't a deliberate joke by Microsoft, it's just a bug or limitation in the TTS voice, and as such I don't think it's notable. No TTS engine pronounces every word or phrase perfectly - it's just a fact of the technology. User:Martinultima/101 Things To Do With Microsoft Sam is a great place for this kind of fun stuff.

Unfortunately, User:Martinultima/101 Things To Do With Microsoft Sam was first abandoned, then deleted. Precisely because MS Sam is buggy and inferior to other TTS engines, its special shortcomings apparently touch people in different ways. As stated below, MS Sam and his humorous aspects have way more significance in culture than many other parts of Windows. Also, Sam obviously contains easter eggs (deliberate jokes) like the famous "crotch" (he can speak "botch" or "notch" perfectly). But whether these jokes are deliberate or not is not important; how these bugs (or not) are understood by the users is the point that justifies an entry in WP. WP is not only a technical encyclopedia. --213.39.222.75 (talk) 05:08, 17 November 2007 (UTC)

What?

No popular culture references for Microsoft SAM?

I second that question. After deletion of the Microsoft Sam article, no reference to the great popularity of this (almost) useless part of XP is left. But Sam has actually a "career" on Youtube and many forums, reflecting the special humorous value of this TTS engine, which seems to be a noticeable part of "Windows folklore". —Preceding unsigned comment added by 213.39.155.59 (talk) 05:50, 16 November 2007 (UTC)

Streets and trips adds Anna voice but stays SAPI 5.1?

Is it correct that MS Streets and Trips only adds the Anna voice to XP (no SAPI bump)? Reesd27 12:38, 3 July 2007 (UTC)

That is correct. SAPI is middleware that is distributed with the operating system (5.1 in Windows XP, 5.3 in Windows Vista). Microsoft Anna is a TTS engine that implements SAPI interfaces. Charles Oppermann 15:23, 16 November 2007 (UTC)
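
To make the middleware/engine split concrete, here is a minimal Python sketch of the idea. It is only an analogy: the class and method names (`TtsEngine`, `SpeechMiddleware`, `register_voice`, `speak`) are hypothetical and are not the real SAPI interfaces, which are COM-based. It shows that adding a voice registers a new engine with the middleware without changing the middleware's own version.

```python
from abc import ABC, abstractmethod


class TtsEngine(ABC):
    """Hypothetical engine-side interface that every voice implements."""

    @abstractmethod
    def synthesize(self, text: str) -> str:
        """Return a stand-in for the audio the voice would produce."""


class SamEngine(TtsEngine):
    def synthesize(self, text: str) -> str:
        return f"[Sam] {text}"


class AnnaEngine(TtsEngine):
    def synthesize(self, text: str) -> str:
        return f"[Anna] {text}"


class SpeechMiddleware:
    """Stand-in for the API layer; its version is independent of the voices."""

    def __init__(self, version: str):
        self.version = version
        self._voices = {}

    def register_voice(self, name: str, engine: TtsEngine) -> None:
        # Installing a voice only adds an entry; the version is untouched.
        self._voices[name] = engine

    def speak(self, voice: str, text: str) -> str:
        return self._voices[voice].synthesize(text)


if __name__ == "__main__":
    api = SpeechMiddleware("5.1")              # version shipped with the OS
    api.register_voice("Sam", SamEngine())
    api.register_voice("Anna", AnnaEngine())   # a later product adds a voice
    print(api.version, api.speak("Anna", "hello"))
```

Applications talk only to the middleware object, so a newly installed voice becomes available to them without any change on the application side.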

Link

The link for "Microsoft site for SAPI 5" doesn't work anymore. Does anyone know the correct URL? I've been unable to find it. --72.43.103.251 17:59, 3 October 2007 (UTC)


Speech Recognition versus Speech Synthesis

The article contains the information: "Speech recognition support for 8 languages at release time: U.S. English, U.K. English, traditional Chinese, simplified Chinese, Japanese, German, French and Spanish, with more language to be released later." Traditional Chinese and Simplified Chinese are two different ways of writing Chinese characters and have nothing to do with recognizing spoken language. Is this feature supposed to refer to speech synthesis instead of speech recognition? --Preceding unsigned comment added by 169.233.52.52 (talk) 22:53, 6 February 2010 (UTC)