User:Cscott/LanguageConversion


Some background information on the use of LanguageConverter on Wikimedia projects (zhwiki is the internal database name for zh.wikipedia.org):

  • The Chinese Wikipedia article has a good historical overview of the process that led to LanguageConverter and a merged zhwiki.  It also provides useful statistics on readers and editors of zhwiki.  Competitors to zhwiki include Bǎidù Bǎikē and Hùdòng Zàixiàn.  My understanding is that these are both sites hosted in mainland China, and thus must adhere to legal restrictions preventing them from publishing content in traditional characters or in dialects other than Putonghua (standard PRC Mandarin).
  • English is a Dialect of Germanic; or, The Traitors to Our Common Heritage gives an English speaker a reasonable analogy for how so many different languages can be 'read' from the same written text in Chinese characters.  The Wikipedia article above describes this as "read in local pronunciation but preserving the vocabulary and grammar of Standard Chinese".  This is a reasonable starting point for understanding some of the zhwiki variants.
  • Diglossia and digraphia in Guoyu-Putonghua and in Hindi-Urdu considers zh-cn vs. zh-tw more specifically; these are the "same language" but have diverged since 1949, primarily in writing script but also to some degree in vocabulary (and in pronunciation, though that is less relevant here).

Some language pairs to consider:

Sharing the same wiki:

  • zh-cn / zh-tw : "Mainland" Mandarin in simplified characters versus "Taiwanese" Mandarin in traditional characters.
  • zh-sg / zh-mo : Chinese with Singaporean/Malaysian terms, and Chinese as used in Macau.  I don't fully understand the linguistic issues here; zh-sg is written in simplified characters and zh-mo in traditional characters, and both variants share the same wiki as zh-cn/zh-tw.
  • Wikipedias in Multi-writing Systems lists nine more Wikipedias that use script conversion within a single shared wiki.

Split into different wikis:

LanguageConverter consists of two parts: a script converter and a word-level converter.  Other language pairs that could use this toolbox:

Script conversions:

  • ur/hi : Hindi and Urdu are mutually intelligible, but written in different scripts (Devanagari and a Perso-Arabic script, respectively).  There are large political differences, though.  See [[Urdu]].
  • Pinyin and Bopomofo transcriptions/annotations are often used for learning Chinese (including by native speakers).  See How to learn to read Chinese.  Additionally, Pinyin is widely used as an input method, so it may be worthwhile to allow Pinyin display during authoring/editing.

Word level conversions:

  • ar/arz : Arabic and Egyptian Arabic. See [[Egyptian_Arabic_Wikipedia#Reaction]] which mirrors some of the zhwiki issues.
  • es-es/es-ar : Peninsular (Spain) Spanish and Latin American Spanish.  There are further vocabulary differences among the Latin American countries as well, but es-ar is usually the first split made.  (OLPC has separate localizations, but eswiki hasn't (yet?) split.)
  • pt-pt/pt-br : European Portuguese and Brazilian Portuguese.  A fork has been discussed, but 80% of the contributors to ptwiki are Brazilian; see [[Portuguese Wikipedia]].
  • en-us/en-gb : American and British English.  Current policy is inconsistent.
  • en/sco : English and Scots.  See [[Scots Wikipedia]].

There are probably more, but these are the examples I'm currently familiar with.
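The two-part design described above (a word-level vocabulary converter backed by a character-level script converter) can be sketched roughly as follows.  This is an illustrative toy, not MediaWiki's actual implementation; the tiny conversion tables are made-up examples of simplified-to-traditional (zh-cn to zh-tw) rules:

```python
# Sketch of a two-stage variant converter, loosely modeled on the
# LanguageConverter design: a word-level dictionary pass (longest match
# first) for vocabulary differences, falling back to a character-level
# script conversion.  The tables are toy examples, not real MediaWiki data.

# Stage 1: multi-character vocabulary differences (zh-cn -> zh-tw).
WORD_TABLE = {
    "软件": "軟體",   # "software": a different word, not just a script change
    "信息": "資訊",   # "information"
}

# Stage 2: per-character simplified -> traditional script mappings.
CHAR_TABLE = {
    "软": "軟", "马": "馬", "车": "車",
}

def convert(text: str, word_table=WORD_TABLE, char_table=CHAR_TABLE) -> str:
    """Apply longest-match word rules first, then per-character rules."""
    out = []
    i = 0
    max_len = max((len(w) for w in word_table), default=1)
    while i < len(text):
        # Try the longest word-level match starting at position i.
        for n in range(min(max_len, len(text) - i), 1, -1):
            chunk = text[i:i + n]
            if chunk in word_table:
                out.append(word_table[chunk])
                i += n
                break
        else:
            # No word rule matched: fall back to script conversion,
            # leaving unknown characters unchanged.
            out.append(char_table.get(text[i], text[i]))
            i += 1
    return "".join(out)

print(convert("马车软件"))  # 马车 converted per-character, 软件 by word rule
```

The ordering matters: running the word-level pass first lets vocabulary rules override the plain script mapping (软件 becomes 軟體, not the character-by-character 軟件).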

Putting my cards on the table [ed note: this was written in 2013]: it would be nice to better support parallel texts in wikis -- the machine-translation fans would greatly appreciate it!  Tools to better support parallel wikis might also provide some political cover (for example, for Urdu and Hindi, whose communities are reluctant to admit that they are the same language).  But maintaining parallel texts is a rather speculative experiment at this time.  By contrast, the technology and ideas behind LanguageConverter are known, and the implementation roadmap for full VisualEditor support (by which I mean editing in your native variant) is well understood (members of the Parsoid and VE teams met during the Tech Days to hash out the steps required).

[2017 update: see One World, One Wiki!]

More information

See: mw:Parsoid/Language_conversion