Talk:Phonetic symbols in Unicode

WikiProject Linguistics / Phonetics
This article is within the scope of WikiProject Linguistics, a collaborative effort to improve the coverage of Linguistics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has not yet received a rating on the project's quality scale.
This article has not yet received a rating on the project's importance scale.
This article is supported by the Phonetics Task Force.

WikiProject Writing systems (Rated B-class, High-importance)
This article falls within the scope of WikiProject Writing systems, a WikiProject interested in improving the encyclopaedic coverage and content of articles relating to writing systems on Wikipedia. If you would like to help out, you are welcome to drop by the project page and/or leave a query at the project’s talk page.
This article has been rated as B-Class on the project's quality scale.
This article has been rated as High-importance on the project's importance scale.

Lots of Non-Displayed Characters

What's the point of creating this chart if you can't display the characters? Just to see the hex columns or rows? Even Firefox cannot show the hex ranges 02EA~1D2B and 1D66~1DBF (except for a very few)! Until most browsers support, or are capable of displaying, all those characters, a graphical representation through picture/graphic file(s) is necessary. Thanks. ~ Tarikash 00:34, 14 July 2006 (UTC).

That's unrelated to which browser you are using. You have to have Unicode fonts installed. I am using Firefox and most characters render properly for me. Of course you are free to add graphics, and to add a list with an entry for each character (name, purpose, etc.), along the lines of Miscellaneous Symbols/Letterlike Symbols. If you just want to see what the character should look like, you may also click on the external link (unicode.org), where you get PDF files of the ranges. But by all means, do add the graphical representation. I have no motivation to do that, because I see the characters already. dab () 16:09, 14 July 2006 (UTC)
By modifying things, installing font(s), or changing the default setup, many things can be achieved, but normally (with the default setup) the characters cannot be seen. I thought we were trying our best to make pages work and display with minimal or no extra modification for average users. Anyway, the background color approach is very nice: the character categories are easily distinguishable. Great. Editing individual graphics for each character takes up too much time, but I'll try that in the future. Adding the character entity names should be a little easier. ~ Tarikash 21:15, 14 July 2006 (UTC).
To use one of the available Unicode fonts to display the Unicode special characters, we need to specify class="Unicode" in the table's TR tag (or in each TD tag, but using it in each TR is easier than in each TD). For wiki table code, we need to specify it after the "|-" (like |- class="Unicode"). The template code {{Unicode|char}}, <span class="Unicode"> ... </span>, etc. can also be used for each character. I've already updated a few articles with this class, and it looks like a few more characters are showing up than before. Thanks. ~ Tarikash 22:09, 15 July 2006 (UTC).
Well, I am glad you know about {{unicode}}. Note that it is just a temporary thing: it may help certain browsers to display things, but in the long run (say, if you wait for another year or so), most browsers should be able to render all this out of the box. As I said above, you are welcome to add {{unicode}}, of course, as well as other information. dab () 22:42, 16 July 2006 (UTC)
Until most browsers support these by default without any modification, we should still specify and use these. It is only a matter of one bot and maybe an hour to remove all this code when it is no longer needed, am I right? So why should we leave things dysfunctional at this moment and wait a year for them to work? By keeping the chart and symbols displayable for that year, we will help many people to see and understand them better, and thus further the progress of Unicode. ~ Tarikash 22:58, 16 July 2006 (UTC).

"Semantic Phonemes"

The criticism related to "semantic phonemes" added by User:Indexheavy appears to be based on a number of misunderstandings, in particular surrounding the differences between glyphs, characters and referents (signifiés, viz. phonemes etc.) of characters. The topic it appears to address is the canonical names of some "IPA Extensions" characters. Since the Unicode range itself is called "IPA extensions", it somehow stands to reason that the character called "LATIN LETTER BILABIAL CLICK" is really the IPA symbol for a bilabial click, since there is really no Latin letter for a bilabial click. Yes, the character names are often not very happy choices. This points to a lack of professionalism or consistency sadly often observed in the Unicode standard. However, the character names are merely convention anyway, and it is difficult to follow why they should be analysed depending on whether they describe a glyph shape or not, precisely because they are just a rather clumsily chosen convention. I suggest it is enough to just list the names and be done. If there is notable criticism related to the naming of the IPA characters, we should by all means cite it, but as it was, this discussion of "Semantic Phonemes" was simply "original research". dab (𒁳) 11:11, 1 May 2007 (UTC)

I think you misunderstand the purpose of the section. It wasn't meant so much as a critique as an aid in understanding the so often cited difference between glyphs and semantic characters. The phonetic characters serve as an excellent example where many readers will be able to grasp the difference. Moreover, I disagree that the names of the characters are merely convention. The code point means nothing to authors. All that remains to provide guidance on the use of the character are the glyph (which could vary widely from font to font) and the character names (along with script and block, which are usually reflected in the names). The example you give of the bilabial click is an example of a semantic character name (and I do not think it's one of the unfortunate names). Anyway, I'm going to restore the section. Perhaps some discussion here will help us understand what needs to be changed to make the writing more clearly convey what I wanted to convey (which wasn't at all a critique). Indexheavy 19:45, 1 May 2007 (UTC)
Looking again at the change you made, this is the type of issue I'm trying to address. You wrote:
Unicode includes letters and marks from the International Phonetic Alphabet (IPA) and those supporting other phonetic alphabets too. In some instances, the canonical Unicode character name is IPA-centric in the sense that it appears to treat IPA conventions as part of the Latin alphabet, e.g. the character "LATIN LETTER BILABIAL CLICK" (U+0298 ʘ) is in fact an IPA symbol that has nothing to do with the Latin alphabet.
The issue is not with UCS names like that for character U+0298. Rather, it's that so many other characters have UCS names that do not correspond to the semantics (like Latin letter r with tail, U+027D), instead of a semantic phoneme name. Also, to clarify, these phoneme names are not unique to the IPA; they are used across all phonetic disciplines. I suppose the use of Latin in the name may be related to the issue I am trying to shed light on, since semantic phoneme characters would be their own writing system and not tied to any other alphabet. Anyway, it's clearly a confused topic, as are the issues surrounding phonetic characters (I don't like the use of the term symbols in the name of this article, since it also represents the same conflation I'm trying to address). Again, this isn't meant to be a critique of Unicode or UCS, except to the extent that they contribute to the confusion in the general public.
So the problem I'm highlighting is not the use of the term ‘Latin’ in the case of the bilabial click. It's that in terms of Unicode’s goals of semantics (not always shared throughout the Unicode, UCS and ISO communities), the various phonetic writing systems are all a single writing system. The different phonetic alphabets could be handled by changing fonts or through glyph variant selection from the same font.
However, the various phonetic writing systems all borrow glyphs from other writing systems (either unchanged or with minor modification). This all leads to confusion among those authoring with Unicode characters. It also serves as a great example of the difference between glyphs and semantic characters. Again, this is not original research; it's simply an example of the commonly cited confusion between glyphs, characters and referents (as you yourself cited above). However, too often the confusion is merely cited without giving readers any concrete examples. Here, with phonetic characters, there's an opportunity to drive home those distinctions with examples that are too muddled in other writing systems. For example, in the Latin alphabet the name "Capital Letter A" describes a semantic character. However, it also connotes a glyph, so a reader cannot separate those two constructs in that context. In the context of a phonetic writing system, by contrast, the name "Small Capital Letter R" describes only the glyph, not the semantic character. In a sense the name really implies “The same glyph as a Small Latin Capital Letter R”. This disjunction between character and glyph (which does not occur when discussing the same character within the Latin script) provides an opportunity for an example that drives home the glyph/character distinction. I hope this is clearer here. Perhaps this discussion can help clarify the topic in the article. Indexheavy 20:45, 1 May 2007 (UTC)
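For readers following the thread, the canonical names under discussion can be inspected directly. The following is only a minimal sketch using Python's standard `unicodedata` module; the helper name `canonical_name` is made up for illustration. It prints the names of the three characters mentioned above (U+0298, U+027D, and the small capital R at U+0280).

```python
import unicodedata


def canonical_name(code_point: int) -> str:
    """Return the canonical Unicode character name for a code point."""
    return unicodedata.name(chr(code_point))


# Code points mentioned in this discussion (hypothetical selection for illustration).
names = {cp: canonical_name(cp) for cp in (0x0298, 0x027D, 0x0280)}

for cp, name in names.items():
    print(f"U+{cp:04X} {chr(cp)} {name}")
```

Note that `unicodedata.name` returns only the canonical name. The informative aliases and annotations from the NamesList file that are discussed in this thread are not exposed by this module, so comparing the two kinds of names would need the Unicode data files themselves.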

I appreciate the point you are trying to make, but I repeat that your discussion is confused. First things first: what are your sources? I am prepared to read you charitably, but do you have any specific source for your "often cited difference between glyphs and semantic characters"? I do understand your use of "semantic", but it is itself confused and idiosyncratic: whence do you take your term "semantic character"? Proper terminology is simply character (graphemes) vs. glyph. "Capital Letter A" denotes a character just as much as "r with tail" or "IPA symbol for bilabial click" or "Small Capital Letter R": these do not denote glyphs. A glyph is a specific graphical realisation of a character. I am afraid that you are yourself still rather confused on these points, which obviously doesn't help you in making the points you want. Also, this whole discussion would belong on character (computing); it is not really pertinent to Unicode Phonetic Symbols in particular. The problem UCS is facing is always: how do we delimit a single grapheme? There is no simple solution, since graphemes tend to blend into one another. Thus, any encoding standard will have to draw arbitrary lines. If you like, it is arbitrary to give two codepoints to Greek Α and Cyrillic А: the Cyrillic alphabet is "really" just the Greek alphabet with a few extra letters for Slavic phonemes. However, they have evolved apart far enough that there is no question that they should be treated as separate scripts (case in point, Η vs. Н). But the Coptic alphabet is an example where UCS at first opted to consider it an extension of the Greek alphabet, and later (4.1) changed its mind about it (which results in another hair-raising instance of a script's codepoints spread all over the BMP). These are the examples you are looking for. Discussing this with IPA is confusing.
Strictly speaking, IPA a is not the same character as Latin a, but it would have been absurd to allocate a separate codepoint (not more absurd than allocating codepoints for Latin numerals, though; I wouldn't put it beyond the Unicode consortium to come up with "IPA LETTER OPEN FRONT UNROUNDED VOWEL" which will simply map to a glyph). These are -- often difficult -- judgement calls, and the consortium gets it right sometimes, and not quite right at other times. We could discuss this at character (computing), character encoding, Universal Character Set, Unicode Consortium or similar, but I really see no reason to detail it here. dab (𒁳) 09:05, 2 May 2007 (UTC)
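The Greek/Cyrillic example can be checked with a short script. This is only an illustrative sketch using Python's standard `unicodedata` module; the variable names are made up here. It shows that the visually similar capital letters have separate code points and names, and that the Η vs. Н pair behaves differently under lowercasing.

```python
import unicodedata

# Three capital letters that usually share one glyph shape but are distinct characters.
lookalikes = ["\u0041", "\u0391", "\u0410"]  # Latin A, Greek Alpha, Cyrillic A
for ch in lookalikes:
    print(f"U+{ord(ch):04X} {ch} {unicodedata.name(ch)}")

# Greek Eta and Cyrillic En look alike as capitals but lowercase to different letters.
lowercase = {ch: ch.lower() for ch in ("\u0397", "\u041D")}
for upper, lower in lowercase.items():
    print(f"{unicodedata.name(upper)} -> {unicodedata.name(lower)}")
```

The lowercase mapping is one place where the separate encoding has a practical effect: a single unified code point could not lowercase to both η and н.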

reply

I understand what you're saying and I don't disagree with most of it. However, I don't think you're taking the time to understand my point (which is why you think it's confused). Let me take up some of the confusion here. Everything else in your post I agree with completely, but I don't think it's all that related to the point I'm trying to make.
“First things first: what are your sources?”
All the contributions I've made over the last few weeks are drawn from the Unicode Standard 5.0 (with perhaps a few minor exceptions). I haven't added sources yet and that's my mistake. However, the articles I've been working on have virtually no sources cited at all, so it seems a bit unfair to call me out alone for lack of sources.
“I am prepared to read you charitably, but do you have any specific source for your "often cited difference between glyphs and semantic characters"?”
You yourself cite that without a source. To quote your phrasing: “differences between glyphs, characters and referents (signifiés, viz. phonemes etc.)”. This is what I was referring to. If you don't like my terminology, then change the terminology. Don't just make wholesale deletions of another contributor's work. My point is that rather than repeating that phrase, an encyclopedia article should explain what it means with examples and the like.
“Proper terminology is simply character (graphemes) vs. glyph.”
Except when discussing characters that are not semantic. Take for example the Arabic final forms. These are graphemes, but they do not have a semantic distinction from the character they decompose to. So if one is talking about two characters and wants to clarify the distinction between semantic and non-semantic characters, I would think one would want to add the adjective semantic to the other terms you list.
“"Capital Letter A" denotes a character just as much as "r with tail" or "IPA symbol for bilabial click" or "Small Capital Letter R": these do not denote glyphs. A glyph is a specific graphical realisation of a character.”
I understand, but the adjective is needed in order to discuss semantic versus non-semantic characters (a distinction built into the UCS). For example, a "Small Capital Letter R" is a character. That is not in dispute. However, this canonical name suggests it is making a distinction from the "Capital Letter R" that is not a semantic one. It is distinguishing it based on the glyph it is meant to produce: a lowercase-height version of the capital letter.
“I am afraid that you are yourself still rather confused on these points, which obviously doesn't help you in making the points you want.”
Here you indicate you are not reading me charitably at all, as you claimed above. Perhaps you should read a bit about the phonetic alphabets. I think then you might understand the exposition more. I think the average reader would have less trouble understanding the points being made. You're perhaps too close to the topic and need to make an effort to read it more charitably.
“Also, this whole discussion would belong on character (computing), it is not really pertinent to Unicode Phonetic Symbols in particular.”
The points being made here are specifically about the phonetic writing system(s).
I understand the point you make about arbitrary lines. However, the phonetic writing system serves as an excellent place to discuss how those lines get drawn. One of Unicode’s principles is to focus on semantics (semantic characters, if you will). Related to this is to encode plain text and not rich text. Obviously both of these principles rely on arbitrary boundaries where decisions must be made about where to draw lines.
“Discussing this with IPA is confusing. Strictly speaking, IPA a is not the same character as Latin a, but it would have been absurd to allocate a separate codepoint (not more absurd than allocating codepoints for Latin numerals, though; I wouldn't put it beyond the Unicode consortium to come up with "IPA LETTER OPEN FRONT UNROUNDED VOWEL" which will simply map to a glyph).”
Why would it be absurd? Do you have any sources you can cite on this being an absurdity? With the NamesList property this is what Unicode is doing (though only part way). This is precisely what I am getting at. The names of the characters matter. They shape their usage by authors. In my view, the treatment of phonetics as a separate script would not be absurd. In fact it would reduce the misuse of the characters, it would create better interoperability, and it would reduce the total number of characters needed to handle all phonetic writing systems. In any event, my views on that are not germane. However, if you could get over your preconceptions of the absurdity of treating phonetics as a separate script, you would have less trouble digesting the points being made. Keep in mind that the general reader of Wikipedia will not have the same strongly felt preconceptions about the absurdity of an independent phonetic writing system, so they won't have the same problems understanding the examples.
The naming of characters after their phoneme would make the characters semantic, as opposed to naming the phonetic characters after the glyphs they adopt (the phonetic alphabets borrow glyphs from other scripts and not the letters the glyphs represent). I think you should read up on this semantic distinction a little and you might understand why the consortium would encode these. Again, all of the phonetic writing systems share very similar semantics. The differences in the phonetic alphabets are simply glyph differences (plus some very minor semantic differences which would require a few additional characters). Whether IPA or APA, the semantic characters (as listed in Unicode's NamesList property) are the same. However, the canonical glyph-based names are deceptive when using the "wrong" alphabet.
“These are -- often difficult -- judgement calls, and the consortium gets it right sometimes, and not quite right at other times.”
“We could discuss this at character (computing), character encoding, Universal Character Set, Unicode Consortium or similar, but I really see no reason to detail it here.”
Well, I'll take a look at those articles. However, the issue of phonetic characters and the differences between the (typically) semantically focussed NamesList property and the glyph-focussed canonical name property for phonetic characters seems very germane to the present article. These differences are reflected in the Unicode Standard 5.0. These differences are reflected even in the Unicode Character Database and related files. I don't understand why you would want to suppress this discussion. Indexheavy 21:28, 2 May 2007 (UTC)
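The point about Arabic final forms decomposing to a base character can be seen in the character database itself. A minimal sketch, using Python's standard `unicodedata` module and taking U+FE8E (the final form of alef) as one arbitrarily chosen example:

```python
import unicodedata

# U+FE8E is a presentation form; its compatibility decomposition points to U+0627.
alef_final = "\uFE8E"

# Raw decomposition field from the Unicode Character Database: a tag plus code points.
decomp = unicodedata.decomposition(alef_final)

# Compatibility normalization (NFKC) replaces the presentation form with the base letter.
normalized = unicodedata.normalize("NFKC", alef_final)

print(unicodedata.name(alef_final), "|", decomp)
print(unicodedata.name(normalized))
```

The `<final>` tag in the decomposition field marks the difference as one of presentation, which seems to be the kind of non-semantic distinction the reply above is describing.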

replied on your talkpage. dab (𒁳) 08:29, 9 May 2007 (UTC)

On my talk page, you raised an objection to semantic phonemes. Again, I don't understand the objection. However, let me start the dialog by trying to clarify the way I'm using it. Some of the phonetic characters are canonically named (from the UCS) with a name that reflects the semantics of the character from the field of phonetics. Other characters are canonically named according to the glyph borrowed and modified from another script. At times, Unicode adds, through its NamesList properties, another alternate name for the character (sometimes referred to by the Standard as the corrected name) that often reflects the opposite of the UCS canonical name (opposite in the sense that if the UCS gives a name drawn from the field of phonetics, the NamesList uses a name that is descriptive of the glyph, and vice versa). The names related to the field of phonetics are what I'm referring to as semantic phonemes. Again, this is drawn from a reading of The Unicode Standard 5.0 and the Unicode Character Database and related files. I'm not trying to review the standard or disparage or glorify it. I'm simply trying to render this material in a manner accessible to general readers. I'm still not clear whether I'm failing or if your objections are related to some other issue. I'm not trying to coin neologisms here. I'm simply using semantic in the way I understand it to be used in many places, and the word character as it's used by Unicode and elsewhere. I imagine your objection to the two terms together is not so much about citing sources as a misunderstanding, and likely my own poor choice of words. Indexheavy 09:38, 9 May 2007 (UTC)
One more thing I want to underscore here. I think it's important to discuss the way these phonetic characters are used or intended to be used by the Unicode Standard (again, I prefer 'characters' or 'graphemes' or the like to 'symbols' when discussing phonetics). The NamesList property is an important part of that. The fact that many of the phonemes do not have their own characters assigned, but instead use characters (unchanged) from other scripts, is important as well (again, this is not reflected in the properties of those borrowed characters but is covered by the Standard). But the discussion I added to this article is also meant to give readers an understanding of how this particular writing system compares and contrasts with the other writing systems, because understanding the differences and similarities helps a general reader understand the whole topic. Please don't think I'm being flippant. I'm doing my best to respond to the concerns you raise, and hopefully we'll both understand each other's position better in the end. Indexheavy 09:46, 9 May 2007 (UTC)

It's been more than three years

The article page says this section needs the attention of an expert. It seems such an expert has not been found yet. Perhaps the message should be removed, as it discourages people from improving the section.

Until the expert saviour comes, the least that should be done is rewriting the section as an encyclopaedic coverage of a debate - yes, with sources referring to this debate, otherwise it is original research, albeit based on the public Unicode NamesList - and not as the piece of blunt propaganda it is now.

If this cannot be done the section should be removed - not because it's wrong and/or confused, but simply because Wikipedia, which is intended for encyclopaedic coverage of topics and not for original research and propaganda, is not the place for it.

188.169.229.30 (talk) 03:47, 16 September 2010 (UTC)

Diacritics?

Could a table with diacritics/combining characters be added as well? I am currently looking to type the non-syllabic diacritic (which looks like a half-circle below the letter) but I can't find it. CodeCat (talk) 16:57, 4 March 2013 (UTC)
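Until such a table exists, combining marks can be looked up by name. The sketch below uses Python's standard `unicodedata` module to scan the Combining Diacritical Marks block (U+0300 to U+036F); the function name `combining_marks` is made up for illustration. It assumes that the mark being asked about is U+032F COMBINING INVERTED BREVE BELOW, which is the character the IPA uses for "non-syllabic".

```python
import unicodedata


def combining_marks(keyword: str, start: int = 0x0300, end: int = 0x036F):
    """List (code point, name) pairs in a range whose character name contains keyword."""
    found = []
    for cp in range(start, end + 1):
        name = unicodedata.name(chr(cp), "")  # empty string for unnamed code points
        if keyword in name:
            found.append((cp, name))
    return found


breves_below = combining_marks("BREVE BELOW")
for cp, name in breves_below:
    # Show each mark on a dotted circle so the combining character is visible.
    print(f"U+{cp:04X} \u25CC{chr(cp)} {name}")

# Assumed usage: the mark follows the base letter it modifies.
non_syllabic_e = "e" + "\u032F"
```

The mark is typed after the base letter, so a non-syllabic e is the two-character sequence e followed by U+032F.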

"Semantic phonemes and character names" section

I cannot say that I have read this section particularly thoroughly nor that I understand its meaning completely, but it seems to be mainly a criticism of Unicode's naming conventions. If this is the case, I do not believe that it belongs here. Wikipedia is meant to be encyclopaedic and not a forum for debate. Whether or not Unicode chose code points or names poorly is not the issue; it is whether this section is truly an objective documentation of a legitimate issue.

This section, in my opinion, needs to be reworked or else removed. It should refrain from complaining about the state of Unicode and should instead simply present the fact that Unicode assigned names and code points according to shapes from the IPA and other phonetic alphabets instead of the sounds they represent. I would do this, except that I am unsure how to proceed.

Please correct me if I am mistaken about the meaning of this section or the author's intent.

hwalter42 (talk) 00:16, 16 June 2014 (UTC)

Agree, no lamenting. I also get the impression that there is a lot of repetition over the whole article. Why not remove the section you mention, and rewrite it from scratch? -DePiep (talk) 10:43, 16 June 2014 (UTC)
It's an essay, and not appropriate for Wikipedia. I don't think there is any saving it, so agree with removing the section entirely. BabelStone (talk) 11:10, 16 June 2014 (UTC)