Talk:ARPABET

Military history: Technology / North America / United States

This article is within the scope of the Military history WikiProject. If you would like to participate, please visit the project page, where you can join the project and see a list of open tasks.

This article has been checked against the following criteria for B-class status:
	Referencing and citation: criterion not met
	Coverage and accuracy: criterion not met
	Structure: criterion met
	Grammar and style: criterion met
	Supporting materials: criterion not met

Associated task forces:
	Military science, technology, and theory task force
	North American military history task force
	United States military history task force

Linguistics

	This article is within the scope of WikiProject Linguistics, a collaborative effort to improve the coverage of linguistics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
	This article has not yet received a rating on the project's importance scale.

Computer science

This article is within the scope of WikiProject Computer science, a collaborative effort to improve the coverage of Computer science related articles on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.

This article has not yet received a rating on the project's importance scale.

Software: Computing

	This article is within the scope of WikiProject Software, a collaborative effort to improve the coverage of software on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
	This article has not yet received a rating on the project's importance scale.
	This article is supported by WikiProject Computing.

United States: Military history Low‑importance

	This article is within the scope of WikiProject United States, a collaborative effort to improve the coverage of topics relating to the United States of America on Wikipedia. If you would like to participate, please visit the project page, where you can join the ongoing discussions.
Low	This article has been rated as Low-importance on the project's importance scale.
	This article is supported by WikiProject Military history - U.S. military history task force.

Contradiction

The article says that the higher the digit, the higher the stress, which would be awkward and counter-intuitive. The example shows the opposite: 1 = primary stress and 2 = secondary stress, so the lower the digit, the higher the stress, except for zero. Bostoner (talk) 20:21, 23 March 2009 (UTC)
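
As a rough illustration of the convention described above (0 = no stress, 1 = primary stress, 2 = secondary stress), here is a minimal Python sketch that splits a CMU-style symbol such as AH1 into its base phone and stress level. The function name `split_stress` and the label strings are my own choices for illustration, not part of any ARPABET specification.

```python
# Stress digits as used in CMU-style transcriptions.
# Note that the scale is not monotonic: 1 is the strongest stress, 0 is none.
STRESS_LABELS = {
    "0": "no stress",
    "1": "primary stress",
    "2": "secondary stress",
}

def split_stress(phone):
    """Split a symbol like 'AH1' into ('AH', 'primary stress').

    Symbols without a trailing stress digit (consonants such as 'HH')
    are returned with None as the stress label.
    """
    if phone and phone[-1] in STRESS_LABELS:
        return phone[:-1], STRESS_LABELS[phone[-1]]
    return phone, None

for symbol in ["AH1", "AH2", "AH0", "HH"]:
    print(symbol, split_stress(symbol))
```

The lookup table makes the point of this thread explicit: a lower non-zero digit means stronger stress, and zero sits outside that ordering.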

Missing phone?

I am not hugely familiar with ARPABET, but I am using the BEEP dictionary (http://svr-www.eng.cam.ac.uk/~ajr/wsjcam0/node8.html), which uses ARPABET, and it contains the phone 'AX' for schwa (ə) sounds rather than 'AH'. I can't seem to find an official ARPABET standard, but when I search for ARPABET, I get several references that include 'AX', such as http://www.telecom.tuc.gr/~ntsourak/tutorial_arpabet.htm, http://www.stanford.edu/class/linguist238/fig04.01.pdf, and http://www-rohan.sdsu.edu/~gawron/compling/chap4/fig04.02.pdf. In fact, the only page I've seen in my (brief) search that does combine ʌ and ə into 'AH' is the CMU reference. ChristineInMaryland (talk) 19:01, 25 May 2010 (UTC)

Good points. I haven't been able to find anything that does a good job of describing the differences between the TIMIT and CMU versions of ARPABET either, but TIMIT is far more detailed (syllabic nasals, flaps, unreleased stops, etc.). The distinction between ʌ and ə is still in CMU, but it's preserved in the stress marks (AH1 vs AH0). The same is true for ER and AXR, which are mapped to ER1 and ER0 in CMU. I think that in the article, "her" shouldn't be one of the examples for ɝ, or if it is, it should be transcribed HH ER1, not HH ER0. (CMU has both alternates.) 173.64.164.223 (talk) 14:19, 20 July 2010 (UTC)

Note that CMU does not differentiate between /ʌ/ and /ə/ in the /AH0/ case, as it merges unstressed STRUT vowels with the COMMA vowel, but not stressed STRUT vowels. For example, consider <undone>, /ʌndˈʌn/, and <about>, /əbˈaʊt/. CMU cannot transcribe these cases in a way that keeps them distinct. The /ER/-/AXR/ case is clearer, as English does not have an unstressed NURSE vowel. The CMU transcription is consistent with the modern interpretation of the NURSE vowel as /əː/ in British English, where the /ə/ is only long in stressed positions. Rhdunn (talk) 16:46, 12 January 2015 (UTC)

"The /ER/-/AXR/ case is clearer as English does not have an unstressed NURSE vowel." That's not necessarily true. For instance, the last syllable of Gutenberg is pronounced as a lengthened schwa in RP, i.e. /ˌɜː/. But British dictionaries tend not to bother with secondary stress preceded by primary stress, so they just have /əː/ sans stress mark. CMU seems no different, as it has G UW1 T AH0 N B ER0 G, i.e. /ˈɡutənbɚɡ/. But this would suggest the last vowel would be pronounced as just a normal /ə/ in a non-rhotic accent. Marking the secondary stress could have easily prevented inaccuracy like this. (Granted, CMU doesn't seem to give much crap about dialectal difference anyway, as it has no symbol for the unmerged LOT.) Nardog (talk) 03:18, 9 September 2017 (UTC)
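
The convention discussed in this thread, where CMU folds the AX and AXR of other ARPABET variants into AH0 and ER0, can be sketched in Python as below. This is only an illustration under the assumption that every AH0 is a schwa and every ER0 corresponds to AXR. As noted above, that assumption is lossy, since an unstressed STRUT vowel is also written AH0 in CMU. The function name `cmu_to_ax_style` is hypothetical.

```python
# Assumed mapping, following the thread: unstressed AH/ER correspond to AX/AXR.
# This cannot recover an unstressed STRUT vowel, which CMU also writes as AH0.
UNSTRESSED_MAP = {"AH0": "AX", "ER0": "AXR"}

def cmu_to_ax_style(phones):
    """Rewrite AH0 -> AX and ER0 -> AXR; leave every other symbol unchanged."""
    return [UNSTRESSED_MAP.get(p, p) for p in phones]

# CMU entry for "Gutenberg" as quoted in the thread.
gutenberg = "G UW1 T AH0 N B ER0 G".split()
print(" ".join(cmu_to_ax_style(gutenberg)))  # prints: G UW1 T AX N B AXR G
```

Stressed AH1 and ER1 pass through untouched, which matches the point that the ʌ/ə and ER/AXR distinctions survive in CMU only through the stress digit.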

IPA Phonemes

I'm going to update the chart to include the IPA representation for each phoneme. Note that I'm somewhat familiar with the IPA but am not exactly familiar with the sounds of the Arpabet. Therefore my edits may need to be corrected. Theshibboleth (talk) 10:17, 1 July 2010 (UTC)

Never mind, somehow I didn't see the IPA phones already on the page 0_o Theshibboleth (talk) 10:18, 1 July 2010 (UTC)

Some Notes Based Mostly On Memory

I was involved in the ARPA Speech Understanding Project at CMU. I don't have full documentation, so this is not ready for inclusion, but I'm passing this on in case someone wants to find the documents to confirm my memory.

1) The ARPAbet actually came in two forms: the one-letter and two-letter forms. The one-letter form was only occasionally used. It used both upper and lower case letters. Back then, a lot of work was still being done using low-end punch-card systems and teletypes (in fact, a slang term for CRT terminals was "glass teletypes"), which did not have lower case letters available (they used the ASCII 6-bit encoding). My impression was that the one-letter form was meant to be the preferred form, but the two-letter, all-caps form was there as a lowest common denominator version.

A copy of the chart we used (I was the immediate supervisor of the undergraduates who were hired to do manual phonetic transcriptions) can be found in [1]. It includes both the one- and two-character forms.

The two-letter version almost immediately dominated for three reasons (this is probably not documented in print anywhere): 1) You didn't have to worry about what display or input hardware was available when files or software were exchanged. 2) It was hard to remember the many arbitrary choices of upper vs. lower case (e.g., "s" was "S" but "S" was "SH"). 3) Words containing lexically aRBItrArY but semantically SiGNiFIcaNt mixes of case are hard to read and to type.

2) The two-letter form was supposed to be unambiguously parsable with one-letter lookahead, both forwards and backwards. I have no idea why easy backward parsing was considered desirable, and I don't know of anyone who used that property. That was a good thing, since one of the CMU team (and probably others elsewhere) discovered that the backward parsing was actually ambiguous. I have no idea whether this was published in any form.

3) As the above-referenced chart shows, there were additional symbols for phonetic punctuation. Use of "Punctuation marks ...[used] like in the written language" was not part of the original system, and in fact, using some (e.g., ".") would be in violation of it.

4) Some auxiliary informal standards developed both within and between groups, but I don't remember the details very well. Among other things, these were used for indicating segment timing, sub-phonemic structure (e.g., the attack phase of plosives), diacritics like aspiration and nasalization, and allophonic variations. Some of this was just tacked on, and some represented what should go inside the official annotation brackets. Both square brackets (I think that became the convention for enclosing the entire string) and curly brackets (we may have used them for allophonic variations in dictionaries and alternative matchings in transcription) were used in addition to the official "( )", "< >" and "** *". The use of "_" for word boundaries was pretty much universally replaced with " " quite quickly.

A lot of this was motivated by the ARPAbet being designed with a much narrower scope than the project required. It was intended as a phonetic alphabet for English (essentially, pronunciation templates), but speech recognition required detailed phonic annotation (classification of what speech sounds actually occurred).

5) Traditionally at least, it was always either "ARPAbet" (an acronym in caps portmanteaued to a word fragment treated as a suffix) or "ARPABET" (because back then, computer terms tended to be all in caps), but not "Arpabet" as in the article.

96.233.98.186 (talk) 18:43, 25 September 2012 (UTC) Topher Cooper

References

^ http://www.laps.ufpa.br/aldebaro/papers/ak_arpabet01.pdf

Making an important fix

The example for AO is misleading:

AO ɔ bought

Bought is AA in America and AO in Britain (cite: tophonetics.com, http://svn.code.sf.net/p/cmusphinx/code/trunk/cmudict/cmudict-0.7b)

Switching to "bore", which has a more similar vowel across British and American English. — Preceding unsigned comment added by 129.170.195.68 (talk) 22:25, 13 November 2018 (UTC)

The cot–caught merger wasn't as advanced in the US as it is today when ARPABET was created. I don't disagree that a word with [ɔr] may be better suited as the example, though. Nardog (talk) 17:15, 14 November 2018 (UTC)

The cot–caught merger has made AO absurdly ambiguous. All instances of AO should be replaced with AA or OW. Sandizer (talk) 04:41, 17 February 2023 (UTC)
