User:Buddhipriya/LanguageTransliterationStyleGuides
This is a Wikipedia user page. This is not an encyclopedia article or the talk page for an encyclopedia article. If you find this page on any site other than Wikipedia, you are viewing a mirror site. Be aware that the page may be outdated and that the user in whose space this page is located may have no personal affiliation with any site other than Wikipedia. The original page is located at https://en.wikipedia.org/wiki/User:Buddhipriya/LanguageTransliterationStyleGuides.
This is a page of notes on how different languages have standardized the use of diacritical marks on Wikipedia. Use of diacritical marks is an aspect of Wikipedia:Romanization standards on Wikipedia.
Keep in mind that "transliteration" and "transcription" don't necessarily require using diacritics; many languages can be transcribed in the Latin alphabet with no diacritics at all. For my notes on Indic issues specifically see: User:Buddhipriya/IASTUsage
General Wikipedia style guides
- Diacritic
- Wikipedia talk:Manual of Style
- Wikipedia:Naming conventions (use English)#Modified letters
- Wikipedia:Manual of Style (spelling)
- You can search in Category:WikiProjects and Category:Wikipedia style guidelines and Category:General style guidelines
Languages with active Wikipedia conventions related to diacritics
Chinese
WP:CHINESE has standards for Chinese romanization.
Pinyin is the standard, including diacritics for tone marks. The article says that while tone marks are often omitted, they can be essential to determine the actual word intended, and it gives a detailed table of diacritics for tone marks. This is similar to the issue with preservation of long and short vowels in Indic speech.
Cyrillic
Wikipedia:Naming conventions (Cyrillic)
French
Wikipedia:Manual of Style (France & French-related)
Page samples
Background
"French proper names and expressions should respect the use of accents and ligatures in French" (Note the table of diacritics. The French policy is clear that French diacritics are to be respected.)
Iceland
Wikipedia:Manual of Style (Iceland-related articles)
In accordance with the Manual of Style and naming conventions, Wikipedia should use the name most commonly used in English. This may differ from the Icelandic name, e.g. Westfjords rather than Vestfirðir and Left-Green Movement rather than Vinstrihreyfingin - grænt framboð; but it may also be the Icelandic name, e.g. Vestmannaeyjar rather than Westman Islands and rímur rather than rhymes (Iceland) or the like.
The use of Icelandic characters like 'æ', 'ö', 'þ', 'ð' is sometimes controversial and their frequency of use in printed text is hard to determine due to OCR errors. In practice most articles on Icelandic subjects do use Icelandic characters but it is still advisable to tread lightly when moving pages.
If an article has a name with one or more characters not used in modern English, diacritics, or both, remember to create redirects from other likely spellings. For example, Súðavík needs a redirect from Sudavik, and probably also from Súdavík since some texts may keep accent marks on vowels but change 'ð' to 'd'.
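The redirect advice above can be sketched in code. The following is a minimal illustration, not a Wikipedia tool: the `FALLBACKS` table and the function names are assumptions made for this example, and the ASCII substitutions chosen for 'þ' and 'æ' are only one plausible convention.

```python
import unicodedata

# Letters with no Unicode decomposition need an explicit ASCII fallback.
# This table is an illustrative assumption, not a Wikipedia standard.
FALLBACKS = {"ð": "d", "Ð": "D", "þ": "th", "Þ": "Th", "æ": "ae", "Æ": "Ae"}


def replace_letters(text):
    """Swap non-English letters for ASCII fallbacks, keeping accent marks."""
    return "".join(FALLBACKS.get(ch, ch) for ch in text)


def strip_marks(text):
    """Drop combining marks (acute accents, diaereses) but keep base letters."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", kept)


def redirect_candidates(title):
    """Return likely alternative spellings that may deserve a redirect."""
    title = unicodedata.normalize("NFC", title)
    letters_only = replace_letters(title)      # accents kept, 'ð' becomes 'd'
    fully_plain = strip_marks(letters_only)    # accents removed as well
    variants = {letters_only, fully_plain}
    variants.discard(title)                    # the title itself needs no redirect
    return sorted(variants)
```

Called on Súðavík, this yields the two spellings mentioned above, Sudavik and Súdavík. A title that is already plain ASCII yields an empty list.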
Indic
Wikipedia:Indic transliteration scheme, Wikipedia:Naming conventions (Indic)
Also see two inactive guides:
Buddhism Project page samples
Irish
Wikipedia:Manual of Style (Ireland-related articles)
Where a subject has both an English and an Irish version of their name use the English version of a name if that is more common among English speakers but mention the Irish name in the first line of the article. Create a redirect page at the Irish version of the name as appropriate.
- Example: Oifig Aicmithe Scannán na hÉireann (redirect page) → Irish Film Classification Office
Conversely, when the Irish version of a name is more common among English speakers use the Irish version of the name for the title of articles. Mention the English name in the first line of the article.
- Example: Irish Rail (redirect page) → Iarnród Éireann
If someone used the Irish version of his or her name use that version when naming the article if it enjoys widespread usage among English speakers. If the Irish version does not enjoy widespread usage among English speakers then use the English version when naming the article. In the latter case, refer to the Irish version of the name in the first sentence of the article. Example:
- Máirtín Ó Cadhain, not Martin Kyne
- Geoffrey Keating, not Seathrún Céitinn
The Ó in surnames always takes an accent and is followed by a space e.g. Tomás Ó Fiaich, not Tomas O'Fiaich.
When transcribing from Irish texts which contain lenited letters (the dot above letters indicating séimhiú), reflect modern usage by replacing the dot with an 'h'. Example:
The síneadh fada (or acute accent) should be used when Irish spelling requires it e.g. "Mary Robinson (Máire Mhic Róibín)", not "Mary Robinson (Maire Mhic Roibin)".
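The rule about lenited letters (replace the dot above with a following 'h') can be sketched as a small function. This is only an illustration under stated assumptions: the function name and the set of lenitable consonants are chosen for this example, and an uppercase dotted letter is simply followed by a lowercase 'h'.

```python
import re
import unicodedata

DOT_ABOVE = "\u0307"              # Unicode combining dot above
LENITABLE = "bcdfgmpstBCDFGMPST"  # consonants assumed to carry the dot


def modernize_lenition(text):
    """Replace a dotted consonant with the consonant followed by 'h'."""
    # NFD splits a precomposed dotted letter into base letter + combining dot.
    decomposed = unicodedata.normalize("NFD", text)
    pattern = "([" + LENITABLE + "])" + DOT_ABOVE
    replaced = re.sub(pattern, r"\1h", decomposed)
    # NFC recomposes the remaining marks, so the síneadh fada is preserved.
    return unicodedata.normalize("NFC", replaced)
```

The acute accent is untouched by this transformation, so the two rules in this section do not interfere with each other.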
Japanese
Multiple methods of transliteration are used. Some of them use elaborations to enable non-native speakers to pronounce Japanese words more correctly. Typical additions include tone marks to note the Japanese pitch accent and diacritic marks to distinguish phonological changes, such as the assimilation of the moraic nasal /n/ (see Japanese phonology).
Kosovo
Wikipedia:Manual of Style (Kosovo-related articles)
The MOS does not spell out a clear policy on diacritics, but it uses them in the body of the page itself and notes that English sources often omit them.
Mongolian
Wikipedia:Naming conventions (Mongolian)
The transliteration system seems to use only two vowel diacritics: Ö (ö) and Ü (ü).
Philippine
Wikipedia:Manual of Style (Philippine-related articles)
Diacritics or accent marks are to be preserved even if they are unused today.
- Example:
- Article name: José Rizal
- First mention: José Protacio Mercado Rizal y Alonzo Realonda
Portuguese
Wikipedia:Manual of Style (Portuguese-related articles)
Singapore
Wikipedia:Manual of Style (Singapore-related articles)
Spanish
Page samples
- Real Academia Española
- Jalapeño (redirects from Jalapeno, note ñ is part of Latin-1)
Background
Spanish is written in the Latin alphabet, with the addition of the character ‹ñ› (eñe, representing the phoneme /ɲ/, a letter distinct from ‹n›, although typographically composed of an ‹n› with a tilde) and the digraphs ‹ch› (che, representing the phoneme /t͡ʃ/) and ‹ll› (elle, representing the phoneme /ʎ/). However, the digraph ‹rr› (erre fuerte, 'strong r', erre doble, 'double r', or simply erre), which also represents a distinct phoneme /r/, is not similarly regarded as a single letter. Since 1994 ‹ch› and ‹ll› have been treated as letter pairs for collation purposes, though they remain a part of the alphabet. Words with ‹ch› are now alphabetically sorted between those with ‹ce› and ‹ci›, instead of following ‹cz› as they used to. The situation is similar for ‹ll›.[1][2]
Thus, the Spanish alphabet has the following 29 letters:
- a, b, c, ch, d, e, f, g, h, i, j, k, l, ll, m, n, ñ, o, p, q, r, s, t, u, v, w, x, y, z.[3]
With the exclusion of a very small number of regional terms such as México (see Toponymy of Mexico), pronunciation can be entirely determined from spelling. Under the orthographic conventions, a typical Spanish word is stressed on the syllable before the last if it ends with a vowel (not including ‹y›) or with a vowel followed by ‹n› or ‹s›; it is stressed on the last syllable otherwise. Exceptions to this rule are indicated by placing an acute accent on the stressed vowel.
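The default stress rule just described can be sketched without syllabifying the word, since it depends only on the word's ending and on whether an acute accent is written. The function name and return labels below are assumptions made for this illustration. The sketch does not handle monosyllables, where "penultimate" has no meaning, or irregular regional spellings.

```python
import unicodedata

ACCENTED_VOWELS = set("áéíóúÁÉÍÓÚ")
PLAIN_VOWELS = set("aeiou")  # 'y' is deliberately excluded, per the rule above


def stress_position(word):
    """Classify where stress falls: 'marked', 'penultimate', or 'last'."""
    word = unicodedata.normalize("NFC", word)
    # A written acute accent marks an exception and overrides the default.
    if any(ch in ACCENTED_VOWELS for ch in word):
        return "marked"
    last = word[-1].lower()
    if last in PLAIN_VOWELS:
        return "penultimate"
    # A final 'n' or 's' counts only when it directly follows a vowel.
    if last in "ns" and len(word) >= 2 and word[-2].lower() in PLAIN_VOWELS:
        return "penultimate"
    return "last"
```

For example, casa and hablan are classified as stressed on the penultimate syllable, hablar and estoy on the last, and canción as carrying a written accent.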
The acute accent is used, in addition, to distinguish between certain homophones, especially when one of them is a stressed word and the other one is a clitic: compare el ('the', masculine singular definite article) with él ('he' or 'it'), or te ('you', object pronoun), de (preposition 'of'), and se (reflexive pronoun) with té ('tea'), dé ('give' [formal imperative/third-person present subjunctive]) and sé ('I know' or imperative 'be').
The interrogative pronouns (qué, cuál, dónde, quién, etc.) also receive accents in direct or indirect questions, and some demonstratives (ése, éste, aquél, etc.) can be accented when used as pronouns. The conjunction o ('or') is written with an accent between numerals so as not to be confused with a zero: e.g., 10 ó 20 should be read as diez o veinte rather than diez mil veinte ('10.020'). Accent marks are frequently omitted in capital letters (a widespread practice in the days of typewriters and the early days of computers when only lowercase vowels were available with accents), although the RAE advises against this.
When ‹u› is written between ‹g› and a front vowel (‹e i›), it indicates a "hard g" pronunciation. A diaeresis (‹ü›) indicates that it is not silent as it normally would be (e.g., cigüeña, 'stork', is pronounced [θiˈɣweɲa]; if it were written ‹cigueña›, it would be pronounced [θiˈɣeɲa]).
Interrogative and exclamatory clauses are introduced with inverted question and exclamation marks (‹¿› and ‹¡›, respectively).
Languages with inactive Wikipedia MOS
- Wikipedia:Manual of Style (Macedonia)/historical
- Wikipedia:Manual of Style (United Kingdom-related articles)
- Wikipedia:Manual of Style (national varieties of English)
- Wikipedia:Manual of Style (Arabic), also see Romanization of Arabic
- Wikipedia:Vernacular_scripts
Languages that use diacritics, without formal Wikipedia guides
The academic standard for Pali is the same as for Sanskrit, but there does not appear to be a Wikipedia style guide for Pali articles. The standard practice is as used here (Source: [1])
Languages that do not require diacritics
Tibetan uses Wylie transliteration, which does not require diacritics.
Languages not yet checked
Arabic
See: Arabic diacritics, Romanization of Arabic, Wikipedia:Manual of Style (Arabic) (inactive), Wikipedia:Naming conventions (Arabic) (inactive)
Greek (See: Greek diacritics and (WP:GREEK))
- From your user subpage I take it you are referring to the handling of diacritics in transliteration, right? Good question, and I think it's being handled less than consistently. The conventions should be at WP:GREEK. Currently it explicitly says that no accent marks should be used for Modern Greek. About Ancient Greek, I don't see any explicit rule, but two examples that do contain acute accents and length marks (Hómēros, Skýthēs). It might be worth clarifying this. Fut.Perf. ☼ 07:15, 18 March 2010 (UTC)
Modern Greek
For Modern Greek the Greek MOS says not to use diacritics:
Modern Greek uses two diacritics: the acute accent (indicating stress) and the diaeresis (indicating that two consecutive vowels should not be combined). In some transliteration systems these are kept, but this is certainly not common practice. No diacritics should be used in Wikipedia article titles.
In addition to the letters, the Greek alphabet features a number of diacritical signs: three different accent marks (acute, grave and circumflex), originally denoting different shapes of pitch accent on the stressed vowel; the so-called breathing marks (rough and smooth breathing), originally used to signal presence or absence of word-initial /h/; and the diaeresis, used to mark full syllabic value of a vowel that would otherwise be read as part of a diphthong. These marks were introduced during the course of the Hellenistic period. Actual usage of the grave in handwriting had seen a rapid decline in favor of uniform usage of the acute during the late 20th century, and it had only been retained in typography.
In the writing reform of 1982, the use of most of them was abolished from official use in Greece[citation needed]. Since then, Modern Greek has been written mostly in the simplified monotonic orthography (or monotonic system), which employs only the acute accent and the diaeresis. The traditional system, now called the polytonic orthography (or polytonic system), is still used internationally for the writing of Ancient Greek.
Greek has occasionally been written in the Latin alphabet in the past, especially in areas under Venetian rule or by Greek Catholics (and called Fragolevantinika or Fragochiotika)[citation needed], and more recently is often written in the Latin alphabet in online communications (called Greeklish).[4]
Hebrew (See: Romanization of Hebrew)
Page examples
Using diacritics in title
- Aukštaitian dialect
- Dené-Yeniseian languages
- Drúedain (Middle Earth)
- Kings of Númenor (Middle Earth) Also see List of rulers of Númenor for table example
- Dzūkian dialect
- Öömrang
- Pirahã language
- Söl'ring
- Uldis Bērziņš (Latvian)
- Edvard Kožušník (Czech)
ü
- Max Müller (German)
- Khün language (Southeast Asian)
References
- Diccionario Panhispánico de Dudas, 1st ed.
- Real Academia Española, Explanation at Spanish Pronto (in Spanish and English)
- "Abecedario". Diccionario panhispánico de dudas (in Spanish). Real Academia Española. 2005. Retrieved 2008-06-23.
- Jannis Androutsopoulos, "'Greeklish': Transliteration practice and discourse in a setting of computer-mediated digraphia" in Standard Languages and Language Standards: Greek, Past and Present online preprint