Talk:ISO 11940

WikiProject Thailand  
This article is within the scope of WikiProject Thailand, a collaborative effort to improve the coverage of Thailand-related articles on Wikipedia. The WikiProject is also a part of the Counteracting systematic bias group aiming to provide a wider and more detailed coverage on countries and areas of the encyclopedia which are notably less developed than the rest. If you would like to help improve this and other Thailand-related articles, please join the project. All interested editors are welcome.
This article has not yet received a rating on the project's quality scale.
This article has not yet received a rating on the project's importance scale.
 
WikiProject Writing systems (Rated Start-class, Mid-importance)
This article falls within the scope of WikiProject Writing systems, a WikiProject interested in improving the encyclopaedic coverage and content of articles relating to writing systems on Wikipedia. If you would like to help out, you are welcome to drop by the project page and/or leave a query at the project’s talk page.
This article has been rated as Start-Class on the project's quality scale.
This article has been rated as Mid-importance on the project's importance scale.
 

Transliteration

Good thing there is now some info on this standard. Since ISO charges high amounts for their standards, I never had the opportunity of seeing it. It seems to be a transliteration in the narrow sense, from which the original spelling can be reconstructed. Apart from that it seems much stricter than, but quite compatible with RTGS. Perhaps it would be useful to add some comments on the comparison.

I have never seen this standard used in practice anywhere. Do you know of any instances?

Small note: perhaps you missed a diacritic on one of the "y"s.

Do you have any information on the vowels? −Woodstone 14:41, 8 April 2007 (UTC)

I have no idea what I am doing here... As you say, the standard is not in wide use, and I had a hard time googling it. All that turns up are sites trying to sell you the standard for 56 dollars or so. I ask you! My only source for this is the xml file at unicode. It contains more information, and it conscientiously tells us when it is deviating from ISO. As it is, this is all unverified. From what I found elsewhere on the net, yes, this is definitely intended as a standard allowing the reconstruction of the original spelling from the romanization (unlike RTGS). dab (𒁳) 23:29, 9 April 2007 (UTC)

Thanks for finding this file. Very informative. I see you started on the vowel section. I guess that the vowels are written in the same sequence as in Thai? Relative to the consonants vowels are not spoken in the sequence as written in Thai script and many vowel sounds are written with several symbols, located before, after, below and above the consonant (cluster) that precedes them in pronunciation. So deciphering any text from this transliteration will be a challenge. Finding the syllable boundaries will be a nightmare. −Woodstone 09:56, 10 April 2007 (UTC)

no, I am sure the vowels in transliteration will be in phonological order. The xml file has some special rules for this afaics. Anything else would be insane, and without precedent in ISO 15919. It's p̣hās̄ʹāthịy, not p̣hās̄ʹāịthy. dab (𒁳) 13:29, 10 April 2007 (UTC)

I hope you guys find my additions helpful. The 'xml file' (ICU or CLDR to me) has several divergences from the standard - that's what impelled me to buy a copy of the standard. (I'm now working on a bug report against it.) It's based on someone's documentation of the system, not on the standard itself. I added the example of Chiangmai, because that looks bizarre whichever option you choose for vowel reordering.

I'd've liked to have compared the transliterations of ชุมนุม spelt with nikkhahit, but Windows XP doesn't render the sequence <sara u, nikkhahit>. The two implementations give ch̊un̊u and chůnů respectively!

You may wonder why I mentioned dandas. The reason is that the Unicode Technical Committee, or at least, senior members, have agreed that using dandas in Romanised text is perfectly reasonable. The dandas to use are the ones named as Devanagari - the Unicode Character Database categorises them as 'common', i.e. used in multiple scripts.

The table for 'other diacritic marks' looks a mess. Perhaps we need a bearer - น/n probably has fewest typographical problems. RichardW57 20:41, 16 May 2007 (UTC)

Good work. I've always wondered why ISO tries to keep their standards secret by charging high amounts for them. As I guessed above, the vowels are in spelling order (not phonetic), making the transliterations quite difficult to read. In the vowel section the อ is missing: does it re-use the symbol x given for the silent consonant? The other weird choice is v for ฤ, mentioned both as consonant and as vowel. Keeping the h as class indicator is also quaint. The only deviation from RTGS in consonants, apart from the diacritics, seems to be c for จ (RTGS has ch). I need more time to let it sink in. As bearer for some vowels and diacritics, we have used "&ndash;"(–) in other Thai spelling articles. −Woodstone 22:54, 16 May 2007 (UTC)

I didn't follow the standard's partitioning of consonants and vowels. It classifies them into the exclusive categories of the 46 consonants (i.e. including ฤ and ฦ), the 9 free-standing vowels (including sara am and lakkhangyao), 3 'vowels' below the line (sara u, sara uu and phinthu), 7 vowels above the line (including maitaikhu and nikkhahit, but not yamakkan), the 5 marks that can appear above others (tone marks and thanthakhat/karan), the digits, and the 'special markers'. The latter are all punctuation except yamakkan, which I think should have been included in the vowels above the line. I thought it more useful to put ฤ and ฦ in both the consonant and the vowel lists, and less misleading to only show lakkhangyao in the combinations ฟๅ and ฦๅ. However, it wouldn't surprise me if some minority language had an orthography in which lakkhangyao is a proper vowel. There's one language which uses karan as a sort of tone mark!

Mechanically identifying the function of อ as consonant, vowel or class indicator is not easy. Although it is a vowel in the word ผงอบ /phàʔ ŋɔ̀ːp/ 'extremely weak', it's not obvious why one can't translate 'baking powder' as ผงอบ /phǒŋ ʔòp/. I've always wondered about the old spelling 'Yuthia' of 'Ayutthaya'. Was อ being misinterpreted as a class indicator? The consonant combination อย used to be much commoner, and represented a pre-glottalised palatal glide (source: Fang-Kuei Li's 'A Handbook of Comparative Tai'). Similarly, there is no reason to disbelieve that the class indicator ห was once a full consonant, as in Old English hnutu 'nut' or hlud 'loud'. Again, the difference cannot always be detected - there are two words แหน, pronounced /nɛ̌ː/ and /hɛ̌ːn/ respectively!
(preceding unsigned edit is marked 2007-05-18T00:07:54 RichardW57)

I suspect the choice of v for ฤ is inspired by the Thai cursive (Roman) 'r'. The latter can look rather like 'seagull' (e.g. U+033C), possibly following the example of cursive kho khuat, which can look like a cross between small gamma and ram's horns (ɤ). I don't know how to verify this idea. RichardW57 18:39, 18 May 2007 (UTC)

For the transliterations of ศ and ษ, see extract for ว ศ and ษ.

How do you feel about displaying the consonants with the extended vargas lined up? I'm tempted to copy the sibilants to their rightful places, thus:

Thai
ISO k k̄h ḳ̄h kh k̛h ḳh ng c c̄h ṣ̄ ch s c̣h
Thai
ISO ṭ̄h s̛̄ ṯh t̛h d t t̄h th ṭh n
Thai
ISO b p p̄h ph f p̣h m
Thai
ISO y r v l ł w ṣ̄ s̛̄ x

Having one varga (วรรค) per line would make the table too tall. RichardW57 09:35, 20 May 2007 (UTC)

Tabular form

I have found the following layout of Thai consonants useful to see patterns. I think it would show the principles of the ISO system quite well, but I have no time now to work them in. They could replace (or be added to) the RTGS columns.

class  forming      labial         dental            guttural
                    rtgs  thai     rtgs  thai        rtgs  thai
high   aspirated    ph    พ ภ      th    ท ฑ ฒ ธ     kh    ค ฆ
       fricative    f              s                 h
       nasal        m              n     น ณ         ng
       semivowel    w              y     ย ญ
       liquid                      l     ล ฬ         r
       palatalised                 ch    ช ฌ
mid    voiceless    p              t     ต ฎ         k
       voiced       b              d     ด ฏ         kh
       palatalised                 j     -
low    aspirated    ph             th    ถ ฐ         kh
       fricative    f              s     ส ศ ษ       h
       palatalised                 ch

Woodstone 13:28, 20 May 2007 (UTC)

I think you're looking for something like this:

    velar palatal former retroflex dental labial glottal
class manner Thai ISO Thai ISO Thai ISO Thai ISO Thai ISO Thai ISO
mid voiced d b
voiceless k c t p x
high aspirated k̄h c̄h ṭ̄h t̄h p̄h
once fricative ḳ̄h ṣ̄ s̛̄
low (once all voiced)
formerly unaspirated kh ch ṯh th ph
fricative k̛h s f
forever aspirated ḳh c̣h t̛h ṭh p̣h
nasal ng n m
semivowel y r l w
semivowel 2

Including a glottal order might actually be unhelpful. Note that this table doesn't include ฤ and ฦ. RichardW57 23:37, 20 May 2007 (UTC)

horn?

Hi. I'm rather confused by the horn diacritic used for ฅ ↔ k̛h, ฒ ↔ t̛h, and ษ ↔ s̛̄. This is rather strange as to my knowledge the horn U+031B is used in Vietnamese (ơ and ư, and accented), in ALA-LC romanization of Lao (ư ư̄) and Thai (ư ư̄). I would expect the letter prime ʹ U+02B9 as in the Unicode CLDR XML file or perhaps the letter apostrophe ʼ U+02BC (that's what it looks like in the UNGEGN PDF file and book – the HTML page uses the apostrophe ’ U+2019, the punctuation "equivalent" of letter apostrophe ʼ U+02BC). Here's what they look like :

  • horn : ฅ ↔ k̛h, ฒ ↔ t̛h, and ษ ↔ s̛̄
  • letter prime : ฅ ↔ kʹh, ฒ ↔ tʹh, and ษ ↔ s̄ʹ
  • letter apostrophe : ฅ ↔ kʼh, ฒ ↔ tʼh, and ษ ↔ s̄ʼ

--Moyogo/ (talk) 13:46, 19 December 2011 (UTC)
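
For anyone comparing the candidates listed above, here is a minimal Python sketch that prints the code point, Unicode name and general category of each one, using only the standard `unicodedata` module. The dictionary `CANDIDATES` and the helper `describe` are illustrative names, not anything taken from the standard or from CLDR. It shows that the horn is a combining mark, while the letter prime and letter apostrophe are spacing modifier letters and U+2019 is punctuation.

```python
import unicodedata

# The four characters discussed in this thread, keyed by the labels used above.
CANDIDATES = {
    "horn": "\u031B",
    "letter prime": "\u02B9",
    "letter apostrophe": "\u02BC",
    "punctuation apostrophe": "\u2019",
}


def describe(ch):
    """Return (code point, Unicode name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


for label, ch in CANDIDATES.items():
    print(label, *describe(ch), sep=" | ")
```

The general category matters for processing: a combining mark (Mn) attaches to the preceding base letter and takes part in normalisation reordering, whereas a modifier letter (Lm) is a character in its own right.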

U+031B is indeed what the standard lays down - see the extract at ว ศ and ษ. I don't doubt that the occasional use of ơ and ư for Thai led to the use of horn as second preference diacritic.

The ICU transliteration (available at http://demo.icu-project.org/icu-bin/translit) indeed transliterates เชียงใหม่ to cheīyngh̄ım̀ and not to cheīyngh̄m̀ı. Clusters in Thai cannot be identified with 100% reliability. RichardW57 (talk) 23:18, 29 March 2012 (UTC)
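
To make the two orderings concrete, here is a minimal Python sketch. The table is partial and is an assumption: it covers only the nine characters of เชียงใหม่, with values inferred from the ICU output quoted above rather than copied from the standard. `TABLE`, `PREPOSED` and `translit` are illustrative names. With `swap_preposed=False` each character is mapped in spelling order; with `swap_preposed=True` a preposed vowel is exchanged with the single consonant that follows it, which is the behaviour the ICU output suggests.

```python
import unicodedata

# Partial table, inferred only from the example in this thread (an assumption).
TABLE = {
    "\u0E40": "e",         # sara e (written before the consonant)
    "\u0E0A": "ch",        # cho chang
    "\u0E35": "i\u0304",   # sara ii -> i with macron
    "\u0E22": "y",         # yo yak
    "\u0E07": "ng",        # ngo ngu
    "\u0E43": "\u0131",    # sara ai maimuan (preposed) -> dotless i
    "\u0E2B": "h\u0304",   # ho hip -> h with macron
    "\u0E21": "m",         # mo ma
    "\u0E48": "\u0300",    # mai ek -> combining grave
}

# Thai vowels written before the consonant they follow in speech.
PREPOSED = set("\u0E40\u0E41\u0E42\u0E43\u0E44")


def translit(text, swap_preposed=False):
    """Map each Thai character via TABLE, optionally moving preposed vowels."""
    chars = list(text)
    if swap_preposed:
        i = 0
        while i < len(chars) - 1:
            if chars[i] in PREPOSED:
                # Exchange the vowel with the one consonant after it.
                chars[i], chars[i + 1] = chars[i + 1], chars[i]
                i += 2
            else:
                i += 1
    return unicodedata.normalize("NFC", "".join(TABLE[c] for c in chars))


WORD = "\u0E40\u0E0A\u0E35\u0E22\u0E07\u0E43\u0E2B\u0E21\u0E48"  # Chiang Mai
print(translit(WORD))                      # spelling order
print(translit(WORD, swap_preposed=True))  # preposed vowels moved
```

The swap looks at one consonant only, so it cannot produce the third variant mentioned above, where the vowel follows the whole cluster. Doing that would require identifying clusters, which is exactly the step that cannot be made fully reliable.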

The sequence s̛̄ (U+0073 U+0304 U+031B) from [1] is rather odd. Its normalized form is s̛̄ (U+0073 U+031B U+0304), which makes me wonder why a non-normalized character sequence would be in a standard. --Moyogo/ (talk) 07:12, 31 March 2012 (UTC)
The real answer is probably that the authors didn't fully understand normalisation. Also, the standard seems to have sat around for 5 years before being approved in 2003, implying that there were severe doubts as to its usefulness. Besides, Clause 5.2 requires that the macron be 'typed' before the dot below or horn, and ISO 10646 does not require that equivalent sequences be treated identically. Finally, phinthu is transliterated to U+0325 COMBINING RING BELOW, so ษฺ is transliterated and normalised to s̛̥̄ U+0073 U+031B U+0325 U+0304 - normalisation makes automated back-transliteration tricky! − RichardW57 (talk) 12:14, 1 April 2012 (UTC)
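
The reordering described above can be checked with Python's standard `unicodedata` module. This is a minimal sketch: the helper name `codepoints` is illustrative, and the input strings simply follow the typing order given in this thread (macron first, then horn, then the ring below for phinthu). Canonical ordering sorts combining marks by combining class, which is 216 for the horn, 220 for the ring below and 230 for the macron.

```python
import unicodedata


def codepoints(s):
    """Render a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


# Order as typed per the thread: base letter, macron, horn.
as_typed = "s\u0304\u031B"
print(codepoints(unicodedata.normalize("NFC", as_typed)))
# U+0073 U+031B U+0304

# The same letter followed by phinthu (ring below).
with_phinthu = "s\u0304\u031B\u0325"
print(codepoints(unicodedata.normalize("NFC", with_phinthu)))
# U+0073 U+031B U+0325 U+0304

# Combining classes that drive the reordering.
for mark in "\u031B\u0325\u0304":
    print(codepoints(mark), unicodedata.combining(mark))
```

After normalisation the ring below sits between the horn and the macron, so a back-transliterator that expects the consonant's own marks to be contiguous has to undo the reordering first.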