Wikipedia talk:WikiProject Languages/Archive 1

From Wikipedia, the free encyclopedia


I recommend putting the language name as it is written in the language itself on top of the table. Listing its name in English is just redundant and useless. --Jiang 12:57, 22 Dec 2003 (UTC)

I agree. It would also make it similar to the country templates. – Mic 14:34, Dec 22, 2003 (UTC)
Implemented suggestion at Swedish language and Danish language as a test. – Mic 14:45, Dec 22, 2003 (UTC)
The coloured bands within the tables in these two articles are different colours: were these chosen as appropriate to each country (as it would appear), or is it accidental? Phil 15:11, Dec 22, 2003 (UTC)

The Swedish ones were already there. For the ones I added (two?), I looked at the national flag. --Jiang 15:13, 22 Dec 2003 (UTC)

Regarding the new sidebars - this is just my opinion, but I find that they look a little "unbalanced" with two headings ("Official status" and "Language codes") at the bottom and nothing at the top. Should there perhaps be a "Statistics" heading (or some other name) at the top, like this (using the Romanian one as an example):

Spoken in: Romania, Moldova, Ukraine, Israel, Serbia, Hungary, the Balkans
Region: Eastern Europe
Total speakers: 28 million
Ranking: 36
Dialects: 4

   East Romance

Official status
Official language: Romania, Moldova
Language codes
ISO 639-1: ro
ISO 639-2: rum, rou

I don't know why, but that just seems a little more tidy to me. Feel free to ignore me if you disagree - it's not important. - Vardion 15:27, 22 Dec 2003 (UTC)

I agree. --Jiang

Inclusion of countries in table

I noticed that in the French table under "official status", only France is listed. Why not list organizations, such as the UN and IOC? I also noticed in the Poland table under "Spoken in" that a whole bunch of countries follow Poland. What are the criteria for inclusion? "Significant numbers" should probably be defined. And should the list be shortened to "and 20 other countries", as is done for France? --Jiang 16:19, 22 Dec 2003 (UTC)

Language name policy

Each language should be on a page title XXX language, even in cases where the name of the language unambiguously always refers to the language

is contrary to general Wikipedia policy that disambiguation should not be used when there is nothing to disambiguate from. It would also lead to absurdities such as "Inuktitut language" (= Inuit language language). This proposal to undermine Wikipedia disambiguation policy was added without explanation, and there is no discussion on this page, so I've removed it. --Zundark 09:26, 30 Jan 2004 (UTC)

  1. Every language page needs to have "language" in the name because there is always something to disambiguate from, for every language has at the very least the possibility of other pages with which it would need to disambiguate. Using your example, Inuktitut, there could very well be Inuktitut literature, Inuktitut film, Inuktitut music, Inuktitut grammar, Inuktitut speakers, etc.
  2. Furthermore, calling "Inuktitut language" an "absurdity" doesn't really hold any empirical water in English: "the hoi polloi", "the Los Angeles Police Department", "the alligator", "a zucchini", etc. are all "absurdities" according to your standard of cross-linguistic grammatical consistency.
  3. Indeed, you provide your own best argument for the policy: it seems Inuktitut has a suffix that means "language", something English lacks. To be unambiguous, English must therefore append the word "language", making sure the article is about the language and not about the people who speak it.
  4. The explanation given was consistency. Since most languages already require disambiguation, and those that don't will eventually, it makes sense for every language to be on a page called "XXX language". That way other editors don't have to look up whether the language they're referencing happens to be one of the few that isn't yet disambiguated, and can just write [[XXX language|XXX]] whenever they refer to the language. --Nohat 16:20, 2004 Jan 30 (UTC)
  1. The word 'Inuktitut' on its own cannot mean any of these things, so Inuktitut does not need to be disambiguated.
  2. The problem with 'Inuktitut language' is that 'Inuktitut' by itself means the language, and nothing else, so 'Inuktitut language' is not a sensible title for an article. (Nor are the four examples you give, for that matter, though they aren't really analogous.)
  3. How can an article called 'Inuktitut' be about the people who speak the language? They are not called Inuktitut.
  4. A redirect from [[XXX language]] suffices for this, and is obviously preferable to using an inappropriate article title. --Zundark 17:49, 30 Jan 2004 (UTC)

The text which I removed was added by you, without explanation and without discussion. Later, on Talk:Esperanto, you used the existence of the text to justify your moving of the Esperanto article, as though the text represented established Wikipedia policy, when it was simply your own policy and contrary to long-standing Wikipedia disambiguation policy.

So I was fully justified in removing it, and you should not have restored it. --Zundark 18:18, 30 Jan 2004 (UTC)

I think it should be "Whatever (language)", the same as other disambiguation. Unless "Whatever" by itself is unambiguous. anthony (see warning)

I support the use of XX language, especially Esperanto language, because there can also be EO culture, etc. Also, it gives WP a touch of standardisation when we have Esperanto language instead of just Esperanto. Also, for Ido, "Ido language" would sound better. Inuktitut language, also better. Rronline 11:50, 12 Oct 2004 (UTC)

An interesting problem here is that Inuktitut actually doesn't mean "language of the Inuit". Instead, it translates to "like the Inuit". It might possibly be used for something else at some point, but fortunately it only refers to the language at present. --[[User:Eequor|ᓛᖁ♀]] 10:19, 3 Dec 2004 (UTC)

Creoles, Families, and Color Bars

I organized all the languages without infoboxes by family, so it should be faster for people to grab the right title bar. However, I realized there is no color bar for creoles. Creoles can't be genetically classified like other languages: they do not directly descend from another language the way most languages do. I was thinking that, instead of having just one color bar for creoles in general, we should have different colors depending on the primary lexifier language. So all the French-lexifier creoles would have the same color, and all the English-lexifier ones, and so on. --Tox 05:28, 23 Mar, 2004 (UTC)

I also think it helps to put horizontal dividing lines between the major sections of the template. It gives clear boundaries between topics in the text and makes it easier to read and to scroll through quickly. That way subheadings are more clearly subheadings. Some of these languages are starting to get a lot of headings and subheadings. --Tox 05:39, 23 Mar, 2004 (GMT)

I've added the infobox to a few minority Indo-European languages. The articles on many Indo-European topics are an utter mess: separate languages are described as one, language family articles contradict each other, etc. I don't know what was meant by "official" languages; I have used "official" to mean officially recognised or protected. Secretlondon 22:46, 23 Mar 2004 (UTC)

The box shown above is not correct. The correct one is on Wikipedia:WikiProject_Language_Template. It's supposed to say "Official language of" and also have a "Regulated by" section under the Official Status heading. Yeah, the articles are kind of a mess. One problem is that even the experts don't always agree on how to divide up languages. There are also many misconceptions held by people who don't know anything about linguistics (and even by people who do). I'm not sure what the solution is. There are only so many of those controversies that one can know enough about to inject some sense and order into. There may need to be some discussion under the banner of this or another related project on ways to clean up the language articles. --Tox 20:08, 25 Mar 2004 (UTC)

The idea that creoles cannot be genetically classified is a widespread but in my opinion unfounded myth. Hawaiian Creole isn't some sort of random mixture of languages; it's a descendant of English with a few borrowings, and it's far closer to English than English is to its own Anglo-Saxon ancestor. What can't be genetically classified is mixed languages, such as Michif; but those are extremely rare. The Ethnologue classifies creoles separately more because of their being a separate academic field than anything else, I would say. - Mustafaa 17:23, 16 Apr 2004 (UTC)

I think with most creoles, you are probably right in that one language is more predominant in the creole than the other, but the distinction between evenly mixed creoles and unevenly mixed creoles is difficult to pin down and likely to result in edit wars and accusations of POV. Probably the safest thing to do is to label the creoles the same as language isolates (rename the bar "language isolates, creoles, and pidgins" or maybe make a new color for creoles), and describe in the article that the creole is a mix of X and Y languages, with X being dominant or X and Y being evenly mixed as the case may be. Nohat 17:40, 2004 Apr 16 (UTC)
Also, and this is off-topic, but your example of Hawaiian Creole English being closer to English than English is to Old English might be interpreted instead as lending support to the English-as-creole hypothesis. :-) Nohat 17:42, 2004 Apr 16 (UTC)
Touché! But the same applies to Haitian Creole, French, and Latin...
I see your point, but very few creoles can reasonably be described as a mixture of languages at all. English-lexifier creoles have more African words than English does, sure, but usually not even enough to show up on a Swadesh list; the same goes for French-, Portuguese-, Spanish-, and Arabic-lexifier creoles. In fact, I only know of one creole that could reasonably be called a mixture, and that's Copper Island/Mednyy Aleut (sp?). I suppose people do contend that other creoles are mixed, though, on the grounds of syntax and the like. - Mustafaa 17:53, 16 Apr 2004 (UTC)
Yes, the point I was getting at is that pidgins, and hence creoles, are inherently the product of contact between multiple languages, and thus they have a "mixed" genetic inheritance. I suppose one could make the argument that the language contact that results in creoles is not fundamentally different from other kinds of language contact, such as the effect of, say, Gaulish Celtic on French or French on English, but I'm pretty sure that view is outside the mainstream of linguistics and should be identified as such. Nohat 18:16, 2004 Apr 16 (UTC)
Sure - the syntax is often from another language, or if you believe some theories, from the basic settings of the human mind. But syntax is one of the most common ways for languages to influence each other, and is commonly diffused throughout linguistic areas, as in Ethiopia or the Balkans. Still, some linguists do seem to accord this fact great importance, so I suppose we have to as well. But we should at least give creoles a separate color from true mixed languages and from isolates. - Mustafaa 19:11, 16 Apr 2004 (UTC)
Look, that is the true origin of creoles. Normally languages are heard by babies, whose minds are designed to figure out the grammar and listemes of the language. Because things are reanalyzed by newborn babies (and probably even sometimes by adults), the language changes gradually over time. A creole is not a reanalyzed and gradually changed form of an existing language. It is the birth of a new language from a pidgin. Pidgins are collections of words and phrases with very little grammatical structure. Since they do not have consistent rules, babies listening to them (and later talking to each other) end up unconsciously creating their own rules based on the base grammatical structures we're all genetically programmed with at birth. Put simply, by definition a creole is a language created from hearing words with no grammar, while a "normal" language has been passed on with both words and grammar. Of course, creoles then go on to be passed on in the normal manner. (There's a theory that Japanese may be a creole, which is why it's so hard to classify.) This gets even more complicated by the fact that the creole sometimes later has significant contact with the language it took most of its words from, resulting in what are called mesolects. This has happened in the Caribbean in places like Jamaica, so that there's a spectrum with Jamaican Creole at one end and "proper" English at the other, and a range of mesolects in between. A native English speaker would probably not understand the creole, but would understand the mesolects closest to English (and vice versa). Similarly, Ebonics in America may be a mesolect between Southern American English and a creole like Gullah. Okay, this has gotten long-winded, but the subject is complex. I hope this clarifies that creoles cannot be classified in the normal manner, but can (usually) be classified according to the lexifier language (the one most of the words were taken from). Tox 09:54, 8 Oct 2004 (UTC)
The theory that creoles default to "the base grammatical structures we're all genetically programmed with at birth" is often argued to be disproved by cases such as Nubi and Sango, and I'm inclined to agree with Bickerton's critics. But in any event, I would argue that questions of syntax are pretty tangential to linguistic classification in the normal manner; in fact, syntax is generally the last thing one would look to reconstruct. Morphology is generally the most reliable, but only when it's actually present; when it's virtually absent (a point that applies as much to Chinese as to creoles), you look at vocabulary, and by vocabulary most creoles' classification is trivially obvious. There are, however, a very few interesting possible exceptions, like Berbice Dutch Creole. - Mustafaa 01:21, 4 Dec 2004 (UTC)


The term "language" (as opposed to dialect) is highly controversial for Arabic varieties, being vehemently denied by most of their speakers; to avoid accusations of violating NPOV, I suggest not including it in the titles of these pages, and consistently making them "variety name" + Arabic. - Mustafaa 20:49, 15 Apr 2004 (UTC)

OK. There has been a similar argument regarding Chinese language as to what to call the various Chinese dialects. I'm pretty sure the Ethnologue uses the criterion of mutual intelligibility to divide up the languages, although you seem to believe the Ethnologue's treatment of the varieties of Arabic is not very good. I don't have any reason to doubt either you or the Ethnologue, but I think it'd be good if you could show some external support for your breakdown of the different varieties of Arabic. I won't fight you on it, as I don't really know much about the different varieties of Arabic other than the fact that large groups of them are mutually unintelligible. As for whether we should call them "languages" or "dialects", probably the best thing to do would be to call them "varieties" of Arabic, and explain that one of the main criteria used for dividing up languages and dialects is that if they're mutually unintelligible, then they're separate languages, and the varieties of Arabic meet this criterion. However, speakers of the different varieties don't tend to think of them as separate languages but more as dialects. Nohat 17:33, 2004 Apr 16 (UTC)
Sadly, the Ethnologue does not divide them by any such criterion - probably because SIL has virtually no missionaries in the Arab world. Even it acknowledges "Eastern Algerian and Tunisian dialects are close, and Western Algerian and Moroccan dialects are close, but speakers prefer their own varieties." I can personally attest that all three Maghreb dialects are absolutely mutually intelligible, even to illiterates; and Middle Eastern dialects seem far more mutually intelligible than the Ethnologue gives them credit for as well. For dialect divisions of Arabic, more reliable sources would include Kees Versteegh, The Arabic Language [1]; unfortunately, I don't have it at hand, so I'm going by the divisions that seem natural at the moment, apart from the more obscure creoles and so forth. The most basic division in Arabic is really Eastern vs Western vs Andalusi, as Ibn Khaldun said; but that may be a level too high for this. - Mustafaa 17:43, 16 Apr 2004 (UTC)
Oh, and Bedouin. - Mustafaa 17:54, 16 Apr 2004 (UTC)
Like I said, you seem to know much more about this than me or anyone else here, so I would encourage you to go ahead and start articles on the different Arabic varieties and write up what you do know. Go ahead and name the articles whatever you think is appropriate. What you have written here suggests that you have some knowledge about the subject, and your contributions would be very valuable. Nohat 18:10, 2004 Apr 16 (UTC)
Well, I'm not an expert, but I'll see what I can do! - Mustafaa 19:12, 16 Apr 2004 (UTC)
See Varieties of Arabic - Mustafaa 23:05, 22 Apr 2004 (UTC)

Summary table

I've created a wiki template for the summary table. I've already replaced the tables at Tatar language and Japanese language with the template syntax, and added one to Elamite language. The syntax for including it is:

|name=name of language
|nativename=native name of language
|familycolor=appropriate color for language family
|states=nations in which it is primarily spoken
|region=geographic region in which it is primarily spoken
|speakers=number of speakers
|rank=ranking by number of speakers
|family=genetic classification
|nation=nation where it is an official language
|agency=regulatory body
|iso1=iso-639-1 code
|iso2=iso-639-2 code
|sil=SIL code}}

The parameters can appear in any order. They can also be combined on lines as long as they're separated by pipes. I like to put name and nativename on the same line, speakers and rank on another, and the three codes on their own line, but it's not necessary.

Important notes: all parameters must be supplied, or else bits of the template's coding will show through. You can leave a parameter blank, though (like "iso1=" with nothing after the =). "familycolor" must be a valid color name (like "yellowgreen") or hex code (like #dddddd), or else it'll break the table. Also, due to a limitation of MediaWiki, you can't put piped links in any of the parameters, because the vertical pipe character is interpreted as ending the parameter and the rest gets cut off.

Hope this helps! -  – G↭a⇅a | Talk 19:02, 23 Jun 2004 (UTC)
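The rules in the note above (every parameter present, blanks allowed, familycolor must be a colour name or hex code, no pipes inside values) can be checked mechanically before pasting a call into an article. What follows is a minimal Python sketch, not part of the template itself: the template name "Language" is an assumption (the opening line of the syntax above did not survive), and `build_language_call` is a hypothetical helper. The colour check is deliberately loose and does not verify that a name is one browsers actually recognise.

```python
import re

# Parameter names as listed in the summary-table instructions above.
PARAMS = [
    "name", "nativename", "familycolor", "states", "region",
    "speakers", "rank", "family", "nation", "agency",
    "iso1", "iso2", "sil",
]

# Loose check: a #rgb / #rrggbb hex code or a purely alphabetic colour name.
# Whether a given name is a colour the browser knows is NOT checked.
COLOR_RE = re.compile(r"#[0-9a-fA-F]{3}|#[0-9a-fA-F]{6}|[a-zA-Z]+")


def build_language_call(values: dict, template: str = "Language") -> str:
    """Return wikitext for one summary-table call (template name is assumed)."""
    unknown = set(values) - set(PARAMS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    color = values.get("familycolor", "")
    if not COLOR_RE.fullmatch(color):
        raise ValueError(f"familycolor {color!r} is not a colour name or hex code")
    lines = []
    for key in PARAMS:
        # Every parameter is emitted, blank if missing, so no template code shows through.
        value = values.get(key, "")
        # A pipe would end the parameter early, so piped links are rejected.
        if "|" in value:
            raise ValueError(f"{key} contains a pipe character, e.g. a piped link")
        lines.append(f"|{key}={value}")
    return "{{" + template + "\n" + "\n".join(lines) + "}}"


print(build_language_call({"name": "Elamite", "familycolor": "#dddddd"}))
```

Emitting one parameter per line keeps the output easy to compare against an existing call; since order doesn't matter to the template, the fixed order here is only for readability.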

Language codes and Language Families

Does anyone have any strong opinions or thoughts about adding the Linguasphere codes to the language box? Its biggest disadvantage, of course, is that these codes mostly cannot be looked up online... - Mustafaa 02:17, 20 Aug 2004 (UTC)

This is the first time I've heard of this organisation, but a first glance at their website gives me a good impression. I have no objection to adding these Linguasphere codes. Donar Reiskoffer 15:50, 20 Aug 2004 (UTC)

Thank you. Another idea I'm considering is language family/group boxes, something like this:

Songhay languages
Spoken in: Algeria, Benin, Burkina Faso, Mali, Niger, Nigeria
Region: Niger River
Total speakers: 3 million

 Songhay languages

Subgroup codes
Linguasphere 01-

(cf. [2] for the LINGUIST List codes). Any thoughts?- Mustafaa 00:45, 21 Aug 2004 (UTC)

I think that's a good idea.--Tox 11:08, 9 Oct 2004 (UTC)
Yes, the LINGUIST code would be a good idea. But the Linguasphere codes: if they can't be looked up online and everybody has to buy a book, not many contributors could help with that. Wikiacc 21:32, 11 Jan 2005 (UTC)
Change that. It seems the LINGUIST codes cannot be looked up online because the link provided by Mustafaa can only be looked up by SIL code. Am I doing something wrong? Wikiacc 21:38, 11 Jan 2005 (UTC)
I now seem to have it reversed. Linguasphere codes can be looked up, it's just very hard to do. Wikiacc 21:40, 11 Jan 2005 (UTC)

Categories by Country

I object to the creation of tiny categories for each country in the world. It is unnecessary, and they usually contain fewer than 10 articles. -- AllyUnion (talk) 05:21, 20 Dec 2004 (UTC)

Not for PNG they won't! (I assume you mean language categories.) I think they're rather useful myself, especially in extremely multilingual countries like most of Africa, South America, and South and Southeast Asia, though I admittedly wouldn't see the point of something like a "Languages of Liechtenstein" category. - Mustafaa 11:06, 20 Dec 2004 (UTC)

Extremely multilingual countries, yes, but for every nation in the world, no. -- AllyUnion (talk) 04:10, 22 Dec 2004 (UTC)

Most countries of any significant size have more than 10 languages, though, at the level of detail that Wikipedia should have; Ethnologue can be a real eye-opener. - Mustafaa 11:14, 31 Dec 2004 (UTC)

International English

This article has a disagreement, and input from one or more linguists could be useful. Thanks. Maurreen 09:03, 31 Dec 2004 (UTC)

I've got involved on the talk page: I think the problem has been to do with conflicting ideas of what the concept of International English is. I'm looking at approaching it mainly from a TEFL angle, with some historical background and reference to various 'popular' uses of the term. If anyone has TEFL experience or another way of approaching the article, please join in. Gareth Hughes 01:17, 2 Jan 2005 (UTC)

Color bars again

Now it relates to constructed languages.

Constructed languages cannot use the template for a simple reason: the provided bar color is black. Since the text is written in black, the collision makes it impossible to see the titles. Would it be possible to change the template to allow different text colors, which would only have to be set where needed? (That is, unlike other attributes such as familycolor, this one should not make template code show through if it is left out.) If that is not possible, could there be another color for constructed languages? Wikiacc 23:06, 1 Jan 2005 (UTC)

I don't think I've broken it. The infobox can now have different font colours on the colour bars. Simply add |fontcolor=white to give white text. If this attribute does not appear in the article, it should still be rendered in black. I hope. Please don't break! Gareth Hughes 00:30, 3 Jan 2005 (UTC)
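The behaviour described here (an optional fontcolor that falls back to black) is easy to state precisely. What follows is a minimal Python sketch: `resolve_fontcolor` models that fallback, and `suggest_fontcolor` is an added, hypothetical helper that picks white or black text for a #rrggbb bar using the WCAG 2.0 relative-luminance contrast formula. That formula is a swapped-in technique for illustration, not something the infobox itself does.

```python
def resolve_fontcolor(params: dict) -> str:
    """Text colour for the bars: the optional fontcolor parameter, else black."""
    return params.get("fontcolor") or "black"


def _linear(channel: float) -> float:
    # sRGB channel (0..1) to linear light, as in the WCAG 2.0 definition.
    if channel <= 0.03928:
        return channel / 12.92
    return ((channel + 0.055) / 1.055) ** 2.4


def suggest_fontcolor(hex_color: str) -> str:
    """Pick 'white' or 'black' text for a #rrggbb bar, whichever contrasts more."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) / 255 for i in (0, 2, 4))
    lum = 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)
    contrast_white = 1.05 / (lum + 0.05)
    contrast_black = (lum + 0.05) / 0.05
    return "white" if contrast_white > contrast_black else "black"


print(suggest_fontcolor("#000000"))  # white: the constructed-language bar
print(suggest_fontcolor("#dddddd"))  # black: the light grey isolate bar
```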

Thank you! Wikiacc 01:06, 3 Jan 2005 (UTC)
Also, how about adding a separate colour for pidgins and dialects? I used the table for Warsaw dialect and it works quite well, but the problem is that IMHO the dialects of languages should be treated separately. Also, pidgins are often a mixture of completely different languages coming from different language groups. Should we use the colours of the substrate or the superstrate? Or perhaps some different colour..? Halibutt 13:56, Jan 5, 2005 (UTC)

What colourful language! It would be lovely for every idiolect we could imagine to have its own colour, but we are talking about systematic use of the web-safe colours. I've been doing a lot with infoboxes on Modern Aramaic languages, and they're all yellow because they're all Semitic. However, I've put a template at the foot of each page to link related languages together. This kind of thing might be the easiest way to deal with dialects. In the case where a pidgin or creole is so mixed that it cannot really be said to belong to one group or another, it would be best to have a separate colour to indicate mixed-family origin.

A lot of my infoboxes are unsurprisingly blank when it comes to the 'Official language of:' (nation=) line. I've noticed that some boxes (I can't remember which) put historical polities here (e.g. Akkadian, official language of the Assyrian Empire), or non-national organisations (e.g. Syriac, official language of the Syriac Orthodox Church). What is the policy here? Are we to stick to current nation states in the infobox, or include anything that used the language officially?

Gareth Hughes 15:20, 5 Jan 2005 (UTC)
Don't pidgins already have a color bar? Wikiacc 20:40, 5 Jan 2005 (UTC)


What is this project's consensus on using SAMPA, X-SAMPA, IPA and approximative english transcription?--Circeus 22:36, Jan 10, 2005 (UTC)

I think there has been a consensus (by those with better computers than mine) that IPA be brought in. SAMPA does not translate across languages (requiring tables), and X-SAMPA might as well be Klingon: it's a rough and ready way to transcribe phonetics accurately using the basic resources of plain text. It looks like IPA is the way forward, but I imagine many users will not be Unicode compatible. I use a rather haphazard collection of 'symbols' that I can see: I need to upgrade before someone helps my work to comply. Is there a middle way? Gareth Hughes 23:14, 10 Jan 2005 (UTC)
Use the correct Unicode numbers, and then surround phonetics with the {{IPA|/whatever phonetic symbols/}} template. An IPA notice is also recommended. That does it for all but a few obscure characters (most notably the link bar). --Circeus 23:34, Jan 10, 2005 (UTC)
That really isn't a middle way, but I agree that it's the way to go (in fact, I've been converting SAMPA to IPA on many pages). As far as compatibility goes, most browsers on any operating system support Unicode, and the {{IPA}} template does good things by forcing the right font. Mac OS 9 is really the only OS out there that still has serious Unicode issues, and even on that OS, there is Unicode support, and fonts are available. --Marnen Laibow-Koser (talk) 14:35, 11 Jan 2005 (UTC)
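Converting SAMPA-style transcriptions to IPA, as described here, is mostly a longest-match symbol substitution. What follows is a minimal Python sketch covering only a small subset of X-SAMPA. The table and the helper names are illustrative assumptions, not a complete or official converter, and symbols not in the table (including most lowercase letters, which X-SAMPA shares with IPA) pass through unchanged.

```python
# A small subset of X-SAMPA symbols and their IPA equivalents.
XSAMPA_TO_IPA = {
    "r\\": "ɹ",
    "S": "ʃ", "Z": "ʒ", "T": "θ", "D": "ð", "N": "ŋ",
    "@": "ə", "E": "ɛ", "O": "ɔ", "I": "ɪ", "U": "ʊ",
    "{": "æ", "V": "ʌ", "Q": "ɒ", "A": "ɑ", "3": "ɜ",
    "?": "ʔ", ":": "ː", '"': "ˈ", "%": "ˌ",
}
# Try longer symbols first, so the two-character "r\" is matched as one
# symbol rather than as "r" followed by a stray backslash.
_KEYS = sorted(XSAMPA_TO_IPA, key=len, reverse=True)


def xsampa_to_ipa(text: str) -> str:
    out = []
    i = 0
    while i < len(text):
        for key in _KEYS:
            if text.startswith(key, i):
                out.append(XSAMPA_TO_IPA[key])
                i += len(key)
                break
        else:
            # Symbols not in the table pass through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


def ipa_markup(xsampa: str) -> str:
    """Wrap the converted transcription in the {{IPA|/.../}} template."""
    return "{{IPA|/" + xsampa_to_ipa(xsampa) + "/}}"


print(ipa_markup('"T{Nk'))  # {{IPA|/ˈθæŋk/}}
```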

Conflicting color bars

In certain instances two or more color bars are applicable to the infoboxes on a page, e.g. Zuni language, where both lightblue (Amerindian languages) and #dddddd (language isolates) apply. Where a general and a specific color both apply, I have been going with the more specific one, e.g. Tupi languages use the more specific deepskyblue over lightblue. Is this a similar case where #dddddd should be used, or is discretion allowed? Wikiacc 02:23, 30 Jan 2005 (UTC)
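The "prefer the more specific colour" rule described here can be written as a short lookup. What follows is a minimal Python sketch, assuming a language's classification is available as a list ordered from broadest to most specific. The colour table holds only the three colours named in this thread, `bar_color` is a hypothetical helper, and placing "isolate" after "Amerindian" for Zuni encodes exactly the judgment call being asked about, not a settled answer.

```python
# Only the colours mentioned in this thread; a real table would be larger.
BAR_COLORS = {
    "Amerindian": "lightblue",
    "Tupi": "deepskyblue",
    "isolate": "#dddddd",
}


def bar_color(path: list) -> str:
    """Return the colour of the most specific classification level that has one.

    `path` runs from the broadest group to the most specific one.
    """
    for level in reversed(path):
        if level in BAR_COLORS:
            return BAR_COLORS[level]
    raise KeyError(f"no colour defined for any level of {path}")


print(bar_color(["Amerindian", "Tupi"]))     # deepskyblue
print(bar_color(["Amerindian", "isolate"]))  # #dddddd
```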

Unicode transliteration proposal

Code | Character | Unicode (decimal) | Use
glot | ’ | 146 | glottal stop; Protosemitic alpu; Aramaic/Hebrew א; Arabic ء
aa | ā | 257 | long a vowel
ee | ē | 275 | long (open) e vowel
oo | ō | 333 | long (open) o vowel
bh | ḇ | 7687 | soft b
gh | ḡ | 7713 | soft g; Protosemitic gharu; Arabic غ
htie | ḫ | 7723 | Protosemitic harmu
dh | ḏ | 7695 | soft d; Protosemitic dhabu; Arabic ذ
uu | ū | 363 | long u vowel
odot | ọ | 7885 | (short) close o vowel
oodot | ộ | 7897 | long close o vowel
hdot | ḥ | 7717 | Aramaic/Hebrew ח; Arabic ح
hline | ẖ | 7830 | Egyptian
tdot | ṭ | 7789 | emphatic t; Aramaic/Hebrew ט; Arabic ط
ii | ī | 299 | long i vowel
edot | ẹ | 7865 | (short) close e vowel
eedot | ệ | 7879 | long close e vowel
kh | ḵ | 7733 | soft k; Arabic خ
ayn | ‘ | 145 | Aramaic/Hebrew ע; Arabic ع
ph | ṗ | 7767 | soft p
sdot | ṣ | 7779 | emphatic s; Aramaic/Hebrew צ; Arabic ص
sacute | ś | 347 | Protosemitic sannu; Aramaic/Hebrew ש
sh | š | 353 | Protosemitic shetu; Aramaic/Hebrew ש; Arabic ش
th | ṯ | 7791 | soft t; Arabic ث
j | ğ | 287 | Arabic ج
ddot | ḍ | 7693 | emphatic d; Arabic ض
dhdot | ẓ | 7827 | emphatic dh; Arabic ظ
ch | č | 269 | extended Arabic چ
zh | ž | 382 | extended Arabic ژ
reed | ỉ | 7881 | Egyptian

The table above is part of a proposal I have for making transliteration easier to do. I have restricted this to Semitic and Egyptian languages, but it would be possible to expand its use to other groups.

The first problem is that some browsers don't deal with Unicode very well. Template:unicode and Template:IPA are designed to deal with this issue (are the two the same?). It would, thus, be good to use these templates for all work in Unicode.

The second problem is that it is difficult to find the code you want for a particular letter, and that the codes are as memorable as telephone numbers (something else I have a problem with!). It's not so bad when you're writing Syriac, as all the letters are in the same place. However, special characters for transliteration can be spread all over the chart. It would be good to have a template that would convert sensible codes into Unicode for us.

For example, in transliteration you might want to produce ḥ. There are all sorts of ways to avoid the character, but it is a common character in the transliteration of Semitic languages. Looking at Unicode, you find that it is coded 7717. You could include &#7717; in your article, and the job is done, or, being helpful, you could write {{unicode|&#7717;}}.

I propose that it would be even more helpful if you could write {{hdot}}, which would call a template that inserted {{unicode|&#7717;}} without you having to type the whole lot. The table above shows thirty transliteration characters that could be encoded this way. An alternative would be to keep all the templates in related space by calling them {{sem/hdot}}, {{sem|hdot}} or something like that.

Am I completely deranged? Any thoughts? Gareth Hughes 14:25, 5 Feb 2005 (UTC)
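At its core the proposal is a lookup from short codes to code points, wrapped in {{unicode|...}}. What follows is a minimal Python sketch, assuming the wrapper should contain a decimal numeric character reference. The helper names are hypothetical, and this only models what such templates would output; it is not how they would be built in MediaWiki. Ten of the thirty codes are included. glot and ayn are left out on purpose, because 145 and 146 are positions of ‘ and ’ in the Windows-1252 character set rather than Unicode code points (the Unicode quotation marks are 8216 and 8217).

```python
# Ten of the proposed codes, with the decimal code points given in the table.
TRANSLIT = {
    "aa": 257, "ii": 299, "uu": 363,
    "sh": 353, "dh": 7695, "kh": 7733, "th": 7791,
    "hdot": 7717, "sdot": 7779, "tdot": 7789,
}


def translit_markup(code: str) -> str:
    """Expand a short code such as 'hdot' into {{unicode|&#7717;}} (hypothetical helper)."""
    return "{{unicode|&#%d;}}" % TRANSLIT[code]


def translit_char(code: str) -> str:
    """Return the character itself, handy for checking the table entries."""
    return chr(TRANSLIT[code])


for code in ("hdot", "aa", "sh"):
    print(code, translit_char(code), translit_markup(code))
```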

Probably not a bad idea. Windows browsers seem to translate Unicode characters to HTML entities if you type them directly into the edit field, but Safari doesn't do that (I'm not sure about IE for Mac). However, I have an even more deranged idea. I know Wikipedia has a certain amount of TeX processing ability (it uses it for mathematical notation); can this be harnessed to code accented characters (something TeX does very well)? --Marnen Laibow-Koser (talk) 15:03, 7 Feb 2005 (UTC)
That's an interesting idea. I'm not sure how easy it would be to set up a transliteration system in TeX. Virtually all of the characters we need are already available in Unicode, and template:unicode is able to sort out most of the browser problems. I've worked out how to set up templates, so I could do that, but I'm not sure about anything more technical. Some browsers will show TeX as an image, which might disrupt the flow of text. Gareth Hughes 11:28, 8 Feb 2005 (UTC)
TeX already has entities for most of what we need, and something like {{TeX|\=a}} would be a lot easier to remember than &#257; for "ā". As far as the browser interpreting TeX goes, it wouldn't have to – the conversion could (and should) be handled by the wiki software itself, as it currently is for mathematical TeX. --Marnen Laibow-Koser (talk) 14:11, 8 Feb 2005 (UTC)
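One way server-side software could turn TeX accent commands into characters is to append the matching Unicode combining mark and then apply NFC normalization, which composes the pair into a precomposed letter whenever one exists. What follows is a minimal Python sketch of that idea. It covers only six LaTeX text accents, `tex_to_unicode` is a hypothetical helper, and it is not how MediaWiki's math rendering works.

```python
import re
import unicodedata

TEX_ACCENTS = {
    "=": "\u0304",  # \=a   -> a + combining macron       -> ā
    "'": "\u0301",  # \'s   -> s + combining acute        -> ś
    "v": "\u030C",  # \v{s} -> s + combining caron        -> š
    "u": "\u0306",  # \u{g} -> g + combining breve        -> ğ
    "d": "\u0323",  # \d{h} -> h + combining dot below    -> ḥ
    "b": "\u0331",  # \b{t} -> t + combining macron below -> ṯ
}

# Symbol accents (\= and \') may take a bare letter; letter accents need braces.
_PATTERN = re.compile(r"\\(?:([='])\{?([A-Za-z])\}?|([vudb])\{([A-Za-z])\})")


def tex_to_unicode(text: str) -> str:
    def replace(match):
        accent = match.group(1) or match.group(3)
        base = match.group(2) or match.group(4)
        # NFC composes base + combining mark into one precomposed character.
        return unicodedata.normalize("NFC", base + TEX_ACCENTS[accent])
    return _PATTERN.sub(replace, text)


print(tex_to_unicode(r"\d{h}\=ad"))  # ḥād
```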

BTW, there's a LaTeX symbol list at (or if you like A4 paper). --Marnen Laibow-Koser (talk) 14:30, 8 Feb 2005 (UTC)


The vast majority of the world's languages do not have an official status, a standards body, or a known ranking, and never will. I suggest creating a separate template to use for such cases, to avoid cluttering the screen with empty fields. - Mustafaa 01:43, 25 Feb 2005 (UTC)

Like, say, this:

{{{name}}} ({{{nativename}}})
Spoken in: {{{states}}}
Region: {{{region}}}
Total speakers: {{{speakers}}}
Genetic classification: {{{family}}}
Language codes
ISO 639-1 {{{iso1}}}
ISO 639-2 {{{iso2}}}
SIL {{{sil}}}
I agree. Is there already a template for it? Peter Isotalo 09:20, Mar 17, 2005 (UTC)
See template talk:language for some proposals for a modular template design. Gareth Hughes 10:20, 17 Mar 2005 (UTC)
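The effect Mustafaa asks for (no empty rows for official status, regulator, or ranking) amounts to rendering only the fields that have values. What follows is a minimal Python sketch of that behaviour, producing plain label/value lines rather than a real wiki table. The field labels follow the boxes on this page, `render_box` is a hypothetical helper, and whether a modular template should work this way is the open question on template talk:language.

```python
# (label, parameter) pairs, using labels seen in the boxes on this page.
FIELDS = [
    ("Spoken in", "states"),
    ("Region", "region"),
    ("Total speakers", "speakers"),
    ("Ranking", "rank"),
    ("Genetic classification", "family"),
    ("Official language of", "nation"),
    ("Regulated by", "agency"),
    ("ISO 639-1", "iso1"),
    ("ISO 639-2", "iso2"),
    ("SIL", "sil"),
]


def render_box(params: dict) -> str:
    """Render the box as text lines, skipping every field left blank."""
    title = params["name"]
    if params.get("nativename"):
        title += f" ({params['nativename']})"
    lines = [title]
    for label, key in FIELDS:
        value = params.get(key, "").strip()
        if value:
            lines.append(f"{label}: {value}")
    return "\n".join(lines)


print(render_box({"name": "Example", "region": "Niger River", "rank": ""}))
```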

Phonology tables

Since you've done such a good job with the language template here, I would be very interested to hear any input you might have on making fairly standardized phonology tables. I've started a thread over at Wikipedia talk:WikiProject Phonetics if you're interested. Peter Isotalo 13:51, Mar 17, 2005 (UTC)

Chinese language(s)

If we can argue about whether each spoken variant of Chinese is a language or a dialect, and we ended up using "(linguistics)" in titles, then the title of the article Chinese language is also arguable. If the divisions are languages, the article would be titled "Chinese languages". Any opinion? – Instantnood 10:13, Apr 5, 2005 (UTC)

We have a good excuse for the current title regardless: singular titles are preferred to plural ones. Like how webcomic talks about webcomics in general, rather than a specific one.  – Gwalla | Talk 03:34, 6 Apr 2005 (UTC)
Umm... families of languages are usually titled "Something languages", for instance West Germanic languages. – Instantnood 09:51, Apr 6, 2005 (UTC)
Chinese isn't really considered to be a language family because of the common written language. As far as I know, the term is fairly well accepted, and the situation is thoroughly explained in Chinese language. Why do you feel that we need to change the name? Peter Isotalo 16:30, Apr 8, 2005 (UTC)
If whether the spoken variants are languages or dialects is disputed, the same would apply to the parent of the group, whether as a language or as a family of languages. – Instantnood 21:07, Apr 19, 2005 (UTC)

Phonetics project

Hey, everyone. I'm trying to get some activity in the Phonetics Project going. I've organized the project page and added some links and templates. Anyone who feels they're interested in phonetics, come and have a look. The more the merrier. Peter Isotalo 15:01, Apr 13, 2005 (UTC)

Are dialects languages?

I recently stumbled across the article Scanian language. The article is supposed to describe a dialect of Swedish, but since SIL has given it its own language code, Scanian editors have taken this as an excuse to assign it the language template.

Scanian is used as a term for two separate things: one is a dialect group of Swedish that has very few speakers (the 80,000 figure from SIL is most likely inflated); the other describes a variety of southern Standard Swedish, which is not even considered a proper dialect by Swedish linguists and differs from other forms of Standard Swedish only in pronunciation and, to a very small extent, vocabulary. The grammar and syntax are, as far as I know, identical.

SIL is obviously not giving out language codes based on proper linguistic criteria. In the case of Flemish this is done even though the speakers themselves consider their language a dialect of Dutch, and as far as I know the same is true of Scanian. How should this be handled? Should the language template be used for very clear-cut cases of dialects? Peter Isotalo 09:26, Apr 19, 2005 (UTC)

A similar case is with the Alemannic language, which is really considered a group of dialects of the German language in linguistics. For naming the different local dialects, I've adopted the local solution of calling them XXX German, which besides is also how varieties of English are called on wikipedia (why should there be a special way of naming English varieties?).
The Alemannic dialects are a very good example of SIL not adopting linguistic criteria. The entry is called Schwyzerdütsch 'Swiss German' even though, on the one hand, it embraces varieties outside Switzerland (a third of the speakers, according to the Ethnologue).
On the other hand, it excludes certain Swiss German varieties (in the literal and usual sense of "local German varieties of Switzerland"), which I fail to understand: There is a separate entry for Walser, a "group" of Highest Alemannic dialects which can only be grouped together because of the (definitory) fact that they are spoken in Walser settlements. Many of them are more similar to non-Walser Highest Alemannic dialects than to other Walser dialects. – j. 'mach' wust ˈtʰɔ̝ːk͡x 11:06, 6 Jun 2005 (UTC)

Basic sentences in each language

I'm wondering if it would be a good idea to have a section in each language article in which a handful of basic sentences are given in the language. For example:

  • "Hello, my name is _."
  • "I'm sorry, I do not speak <language>." (or "I do not understand", etc.)
  • "Thank you."
  • "Goodbye."

Or something like that. Obviously, care would need to be taken to prevent the list from mushrooming out of control. Perhaps a link could follow to a dedicated page named something like List of useful sentences in <language> so that further contributions could be directed there. My main point is, there should be such a list/page for every language and there should be some measure of standardization among them. That's why I'm suggesting it here. - dcljr (talk) 07:11, 20 May 2005 (UTC)

A good idea, but I think it would be better in place at Wikibooks or Wiktionary. Basic sentences like this are not the best examples for an encyclopedic article about a language (i.e. to exemplify basic features of the grammar of the language). I would say: keep tourist phrasebook content at other Wiki-projects; it's not something for an encyclopedia. – mark 11:02, 20 May 2005 (UTC)
See also Common phrases in various languages and Talk:Common phrases in various languages#Transwiki?. – mark 11:04, 20 May 2005 (UTC)

Prominent link to alphabet

Another suggestion: I'd like to see the language infobox link to the relevant alphabet/script(s) used in writing each language. (Is it POV to relegate info about written language to the text of the article?;) - dcljr (talk) 07:34, 20 May 2005 (UTC)

Tonal languages, Pitch accent and Melodic accent

There is a discussion at Talk:Melodic accent#Pitch/Melodic on which terminology should be used to describe what could be called "tonal elements" in different languages. The participants, including me, are mainly Swedish speakers, and we all seem to be a bit confused regarding the English terminology both in general and on Wikipedia (the latter seems to be somewhat confused as well). The article Melodic accent is written from a very Scandinavian perspective, and as Peter Isotalo has pointed out, the term seems to be home-made or original research. He instead suggests the term tonal word accent, which is sometimes used by Swedish linguists writing in English, but from Googling it seems that Pitch accent is the most common term for similar features in all European languages, including Swedish and Norwegian. The texts of Pitch accent and Tone (linguistics) unfortunately do not do much to explain anything but rather add to the confusion, as they are both rather unclear and somewhat contradictory.

I find myself asking the following questions:

  1. Is there a significant difference between the ways tonal elements ("pitch accent" or "melodic accent") are used in Swedish, Norwegian, Limburgish, Serbo-Croatian, Lithuanian and some dialects of Basque and Slovenian?
  2. Is there a significant difference between "pitch accent" in these languages on one side and the Japanese pitch accent on the other?
  3. Is any difference big enough to merit a distinction in terminology, or should they all be covered in the same article?
  4. And what term(s) should actually be used?
  5. Can any of the languages mentioned above correctly be described as a tonal language, just as Mandarin and others, simply because of this feature? Or is this statement from Tone (linguistics) inaccurate:
Some Indo-European are usually characterized as tonal, such as Lithuanian, Old Church Slavonic, Slovenian, Serbian, Croatian, but they are in fact pitch accent languages; Limburgian, Swedish and Norwegian may be closer to being true tone languages.

If there is anyone out there who knows something on the subject (or has good sources at hand) any input would be very valuable. /Alarm 17:54, 20 May 2005 (UTC)

I don't know this subject as well as I should. As I understand it, the distinction between pitch accent languages and full-fledged tone languages is that, in a pitch accent system, once you know which syllable is "accented" you can predict the pitch of every syllable in a word. For example, according to one of the analyses in our article on Japanese pitch accent, three-syllable Japanese words would have one of three shapes: HLL (accent on first syllable), LHL (accent on second syllable), and LHH (accent on the final syllable or unaccented). In different pitch accent languages, the accented syllable is realized phonetically in different ways, but the pitch is still determined solely by the position of the accent.
In a true tone language, there are fewer restrictions on the use of pitch, so the tonal information must be specified for each syllable (more or less). For example, my phonology text has examples of noun stems in Shona, a two-tone system, which are essentially unrestricted in their use of tone. Three-syllable stems may have any of the eight possible shapes: HHL, LLL, HHH, HLH, HLL, LHL, LHH, LLH. Thus, tonal information has to be specified lexically in Shona. (Exactly how it's represented is a complicated and controversial subject in autosegmental phonology.)
That's my rough understanding of the distinction. Deciding whether a language has pitch accent or tone is non-trivial and requires a great deal of analysis; many languages described as tonal might really have a pitch accent system, and some alleged pitch accent languages may be better described as tone languages. I don't think all linguists accept the distinction as valid or useful, but it is widely used. I haven't encountered the term "melodic accent", but that doesn't mean too much as this isn't really my field. I have seen Scandinavian languages, Serbo-Croatian, etc. described as pitch accent languages. --Chris Johnson 21:52, 20 May 2005 (UTC)
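The counting argument in the comments above can be made concrete. This is a sketch under a simplified rule read off the three-syllable Japanese shapes quoted above (word in isolation, any following particle ignored); the function names are hypothetical, and the code only illustrates the arithmetic, not a full analysis of either language. Under pitch accent, the accent position alone fixes the whole word, while in a free two-tone system every syllable varies independently.

```python
from itertools import product


def pitch_accent_shape(n_syllables, accent=None):
    """H/L pattern of a word in isolation under a simplified Tokyo-Japanese-style rule.

    accent is the 1-based position of the accented syllable, or None if unaccented.
    """
    if accent == 1:
        # Accent on the first syllable: high, then low to the end of the word.
        return "H" + "L" * (n_syllables - 1)
    # Otherwise: low first syllable, high up to and including the accented one, low after it.
    last_high = n_syllables if accent is None else accent
    return "L" + "H" * (last_high - 1) + "L" * (n_syllables - last_high)


def pitch_accent_shapes(n_syllables):
    """All distinct shapes when only the accent position (or its absence) varies."""
    positions = [None] + list(range(1, n_syllables + 1))
    return sorted({pitch_accent_shape(n_syllables, p) for p in positions})


def tone_shapes(n_syllables, tones="HL"):
    """All shapes when each syllable may freely carry any tone."""
    return sorted("".join(p) for p in product(tones, repeat=n_syllables))


if __name__ == "__main__":
    print(pitch_accent_shapes(3))  # ['HLL', 'LHH', 'LHL']
    print(len(tone_shapes(3)))     # 8
```

For three syllables this reproduces the three Japanese shapes and the eight Shona shapes listed above. In general the simplified pitch accent rule gives n shapes for n syllables (final accent and no accent coincide in isolation), against 2^n for a two-tone system.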
As I understand the subject, you're exactly on the mark. When you have pitch accent (as when you have stress accent) there is some feature on a particular place of the word (a syllable, a mora, or between two syllables or morae) that allows you to determine the accent pattern of the whole word. When you have contrasting tone, you can't do this; tonal patterns are assigned to each syllable/mora arbitrarily. I seem to recall that, more specifically, in a pitch accent system you have two elements: (1) a particular syllable/mora, and (2) a pitch pattern (basically dictating what drops and what rises in pitch) that is calculated for every syllable/mora based on the position of (1). More or less. The Japanese system is in fact more complicated when you come to the actual phonetic realization. --Pablo D. Flores 03:14, 21 May 2005 (UTC)
Tonal word accent is, as far as I've understood, the most descriptive term for the use of pitch accent in Swedish and Norwegian, since Swedish features both stress and the tonal characteristics known as pitch accent. Japanese, on the other hand, does not feature stress. I don't feel that we actually need an article called tonal word accent, though. The term "melodic accent" just stems from a misconception that the Swedish prosodic features are so fantastically unique that they need their own terminology. :-p If you google for it, you'll see that it's used far more often to describe accent in the context of music.
Peter Isotalo 12:16, May 21, 2005 (UTC)
Thanks a lot for your responses! Based on what has been said here and my own research, I've made a draft for a new version of Pitch accent at User:Alarm/Pitch accent. My intention is that this should replace both the current article and the Melodic accent one. All kinds of feedback on the draft is very much welcome. / Alarm 17:06, 23 May 2005 (UTC)

Infobox colors

I have to say that I really don't like most of the colors being used in the infoboxes. Paul August 18:11, May 27, 2005 (UTC)

I don't either. Many people complain that it is painful to readers' eyes (see my debate with Hpnadig over that and related subjects at Talk:Kannada language and my talk page.) I understand we have to use web-safe colors, but I find it hard to believe that there is not a better selection to use that is not as painful. Wikiacc 19:34, May 27, 2005 (UTC)
I dunno, I really have a hard time seeing the problem. I would even go so far as to say that the infoboxes actually benefit from being reasonably colorful, since they're supposed to give the reader a quick overview of some of the most basic facts about the language. Someone who pops in at a language article would be expected to look at the infobox first and then read the text, and it seems quite far-fetched to me that people are so easily distracted by colors that they are actually incapable of reading the lead. And the colors do serve a purpose by showing which family each language belongs to.
As for the actual choice of color scheme, I'm sure everyone has opinions about it, but this is usually an unsolvable issue which can only be handled by simply sticking to whatever colors were chosen first. I think the choice is between staying conservative and not having colors at all.
Peter Isotalo 20:06, May 27, 2005 (UTC)

Well, I don't want to offend anybody, but in my opinion these colors are really ugly; they are way too bright. I'm sure any experienced graphic artist could come up with a more pleasing palette of colors. I think pastel tones would be much better, similar to and harmonious with the pale blue and violet used on the main page. Paul August 20:22, May 27, 2005 (UTC)

Template talk:Language is looking at some of the issues with the infobox. I believe the colours were chosen on the KISS principle. You can see the possible colours in the article Web colors, and it might be an idea to ask for thoughts on Talk:Web colors. Have fun picking out the curtains! --Gareth Hughes 16:01, 28 May 2005 (UTC)

For another example of a set of more pleasing colors see: List_of_Presidents_of_the_United_States. Paul August 02:37, Jun 1, 2005 (UTC)

Although I cannot source this claim, I recall from my studies of human-computer interaction that a small patch of bright colour on an otherwise monochrome screen, such as one full of text, will hold the reader's eye so strongly as to interfere with their ability to assimilate the text. The choice of such bright colours in the infobox is hindering readability. --Theo (Talk) 18:41, 1 Jun 2005 (UTC)

If you can't quote a source, you should probably not be making the claim, since it's pretty obvious that people are quite capable of reading the article. This seems to be entirely a matter of taste, and as such the only options are to be conservative or to have no color at all, the latter being a very unsatisfying approach. Personally, I would not wish for the taxoboxes to be subjected to Pastel Hell.
Peter Isotalo 19:17, Jun 1, 2005 (UTC)
I am disappointed that you should give so little weight to my recalled learning simply because I was unable to cite a source (it is probably from Dix, Alan; Finlay, Janet; Abowd, Gregory; Beale, Russell. Human-Computer Interaction. (New York: Prentice Hall, 1993) ISBN 0134582667). I am more disappointed that I failed to state my position clearly. A small patch of bright colour on a screen otherwise filled with monochrome text will hold the reader's eye in such a way that it will be more difficult for the reader to read the text than would be the case were the colour absent. I did not mean to imply that such colour would prevent a reader from reading the article; only that it would make it more difficult (not impossible). This is not "entirely a matter of taste". It is an observed phenomenon about human perception in a significant majority of some sample or samples (although I recognise that I am taking the validity of the original studies on faith). I agree that an entirely monochrome page is undesirable (because the human eye/mind appears to enjoy variety) but to assert your personal preference for bright colours over a known perceptual concern seems inappropriate to me. I do hope that this explanation makes my position clearer. --Theo (Talk) 22:31, 1 Jun 2005 (UTC)
Bright colors certainly make it more difficult for me to read. Paul August 22:58, Jun 1, 2005 (UTC)

Er ... I understand from the discussion at Wikipedia:Featured article candidates/Swedish language that this discussion may be redundant because the infobox colours are already "widely accepted". If the colour scheme is no longer a "suggestion", then that flag should be removed from this section of the project page. If, on the other hand, this is an overstatement of the position, then, in my opinion, the statement at FAC should be modified. --Theo (Talk) 11:38, 2 Jun 2005 (UTC)

Whatever its status, the colour scheme proposed here is widely used. Practically everything linking to Template:Language for example uses this scheme. If anyone feels like designing a better scheme, go ahead! Note however that it would require quite a lot of work since it would involve updating all individual language articles. – mark 11:44, 2 Jun 2005 (UTC)
The scheme is not in discussion. The colours are. Changing the colour for more pastel ones would do the job with edits in one single place. I don't know why some people are so convinced the colours chosen are the best, if so many complain about them. And if needed, we can always vote.-Mariano 10:45, 2005 Jun 10 (UTC)
I don't see how you hope to accomplish this with edits in a single place. The colors are added as literals where the templates are used. For example, each language article on a Sino-Tibetan language includes "familycolor=tomato". If one wanted, say, light pink instead, each language article for Sino-Tibetan languages would have to be updated to "familycolor=lightpink". It is not implemented in a way that can be accomplished with one edit in a single place, such that the articles include "familycolor=sinotibetan" and a single source translates "sinotibetan" to the desired color, where it can be trivially changed from "tomato" to "light pink".
On the other hand, the modularization scheme could be extended so that the colors are defined in one place and can then be easily changed. The decoupling of the language family from being assigned a specific color in the language article could take place without changing the present color scheme; in other words, it could proceed even if there is not yet a consensus on the colors to be chosen.
For example, replace {{{familycolor}}} in the current templates with {{Language/{{{familycolor}}}}}. Template:Language/color can be created for each of the colors currently in use, to substitute the color names. (For example Template:Language/tomato, with the content "tomato"). Then the Template:Language/family could be created. (For example Template:Language/sinotibetan with the content "tomato"). Each of the Sino-Tibetan language articles could then change "familycolor=tomato" to "familycolor=sinotibetan". At that point, the color of the whole family could be changed in one place by changing Template:Language/sinotibetan to the desired color name, and changed again in the future if tastes change.
Would this be going too far? --Tabor 19:18, 14 Jun 2005 (UTC)
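A rough model of the two-level lookup proposed above, written in Python only to show the data flow. The dictionary stands in for template pages, expanding {{Language/{{{familycolor}}}}} is modeled as a plain dictionary lookup, and the article titles and the "lightpink" value are illustrative assumptions rather than actual wiki content.

```python
# Hypothetical model of {{Language/{{{familycolor}}}}}.
# Keys stand for template pages; values are the text each page would contain.
TEMPLATES = {
    # Color-named subpages return their own name, so unmigrated articles keep working.
    "Language/tomato": "tomato",
    "Language/lawngreen": "lawngreen",
    # Family-named subpages hold the family's current color.
    "Language/sinotibetan": "tomato",
}


def expand_familycolor(familycolor, templates):
    """Resolve the familycolor parameter by 'transcluding' Language/<familycolor>."""
    return templates["Language/" + familycolor]


def infobox_colors(articles, templates):
    """Map each article title to the color its infobox would display."""
    return {title: expand_familycolor(param, templates) for title, param in articles.items()}


if __name__ == "__main__":
    articles = {
        "Burmese language": "sinotibetan",  # already migrated to a family name
        "Tibetan language": "tomato",       # still uses a literal color name
    }
    print(infobox_colors(articles, TEMPLATES))
    # Work on a copy: one edit to the family subpage recolors every migrated article.
    templates = dict(TEMPLATES)
    templates["Language/sinotibetan"] = "lightpink"
    print(infobox_colors(articles, templates))
```

Because the color-named subpages map to themselves, articles can be migrated from "familycolor=tomato" to "familycolor=sinotibetan" gradually, and nothing visibly changes until the family subpage itself is edited.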
That's what I was thinking of. Instead of familycolor=lawngreen it should have something like familycolor=Indoeuropean_Colour, where Indoeuropean_Colour is defined as a colour that can later be changed if needed. -Mariano 06:57, 2005 Jun 15 (UTC)
Human Language Families Map (Wikipedia Colors).PNG

I like them as they are, and wish they could be adopted on a wider scale. I created a variant of Industrius' map of language families using the infobox colors, and I think it's actually quite nice to look at:

--Peter Farago 00:52, 14 Jun 2005 (UTC)

Your map is nice, Peter, but because it's schematic, not because of those particular colours. -Mariano 06:57, 2005 Jun 15 (UTC)
Perhaps I should clarify. I feel the use of bright colors is particularly good for a map-like representation, where high contrast is valuable. In my opinion, it would be nice to have a set of colors that can be used consistently throughout Wikipedia, in infoboxes, maps and so forth. --Peter Farago 01:09, 16 Jun 2005 (UTC)

I too like them as they are. There aren't enough bright colors around the language articles... - Mustafaa 00:53, 16 Jun 2005 (UTC)

Swiss German "language"

I'd like to move back Swiss German language to Swiss German. Swiss German is a variety of the German language. Of course there are those who prefer to consider it a language, but it is more often considered a dialect by its speakers and almost exclusively so in German language linguistics.

Talking about "Swiss German language" seems as inappropriate to me as talking about, for instance, "Australian English language". Varieties of English are called XXX English. Why should this be any different with varieties of German (and Italian, and Arabic, and Chinese, and so on)? Of course, the different varieties of English are usually more mutually intelligible than the different varieties of German, but that's a characteristic of the German language, and mutual intelligibility is only one factor among others that are used to define a language. – j. 'mach' wust ˈtʰɔ̝ːk͡x 11:32, 6 Jun 2005 (UTC)

Surely it's more akin to Scots. Considered a language by some, a dialect by others. Scots is likewise often considered a dialect by its speakers. BovineBeast 16:53, 31 July 2005 (UTC)
And there is considerable use of American language. The problem with Swiss language would be: which Swiss language: Swiss German, Swiss French, Ticinese, or Ladin? Septentrionalis 20:00, 13 October 2005 (UTC)

Language families, subgroups etc.

I had been messing about with Oceanic language families for a while before I discovered this page; otherwise I would have floated my thoughts here earlier (and used the tables properly). Anyway, what I was thinking is that the pages for language families with long genetic classifications are generally hard to navigate, so I have been showing all the subgroups of each family two steps down at the bottom of the page. See Oceanic languages to get an idea.

There are two things I would like opinions on. First, I have started using the header Components rather than Classification (see North New Guinea languages for instance) because Classification implies looking up rather than down to me. Which would be better? Would Subclassification be a better alternative than either?

Secondly, what do people think about adopting this format for other families with many subfamilies? I personally find it easy and interesting to navigate with the components listed, but I can understand if others think it is not necessary. Conrad Leviston 16:06, 15 July 2005 (UTC)

Improvement Drive

Acholi language is currently nominated to be improved by Wikipedia: This week's improvement drive. If you are interested in improving this article, you can vote for it on that page.--Fenice 13:25, 16 July 2005 (UTC)

colors for the Americas

Hi. The current colors for American languages don't make much sense. It is probably not practical to make a color for each family (there are about 100).

I don't understand why Tupi gets a unique color and not Nadene. (This doesn't follow Greenberg.)

Maybe there should be a division that is areal and not genetic, i.e. South American languages, etc. Or something else?

Anyone have any ideas? (See Native American languages if you need more information.)

Thank you – ishwar  (speak) 01:30, 2005 July 21 (UTC)

Amendment to Structure

I propose to re-add the following to the point on Consistency in the paragraph on Structure, as the reasoning behind the explicit exception at the bottom of the section. It is as concise, and as neutral, as I can make it.

However, this should not be permitted to overwhelm good reasons for having an article under some other title, including avoiding controversy over whether a given linguistic community speaks a language or a dialect; see below. Wikipedia is inconsistent; this is not merely a compromise, but a virtue.

I further request a retraction of the personal abuse in Peter Isotalo's edit summary. Septentrionalis 19:54, 13 October 2005 (UTC)

I agree that sometimes a language variety cannot be called either a "language" or a "dialect" (or an implied dialect by calling it a variety of a better known language). This is the reason why the Chinese languages/dialects are called Mandarin (linguistics), Cantonese (linguistics), etc. There's a similar argument going on at Talk:Skånska: people don't like Skånska because it's not English, but calling it Scanian language pushes one POV, and calling it Scanian dialect or Scanian Swedish pushes another. Another example I can think of is Valencian: is it a language or a dialect of Catalan? And we should make note of situations like this. However, Peter Isotalo's edit summary contained no personal attack; it was critical of the suggestion, not of the person who made it. --Angr/tɔk tə mi 20:27, 13 October 2005 (UTC)
Sept, I'm sorry if you took offense, but what you added simply wasn't neutral. It was clearly your own opinion of how the guideline should be interpreted and that doesn't really belong on a page that's supposed to be a reflection of the consensus of the participants of this project and people who are active in editing language articles.
Peter Isotalo 20:44, 13 October 2005 (UTC)


I'm removing the usage of the rather pointless disambiguator Arabic language to discourage its use. There's been an attempt to keep all Arab-related adjectives at Arab (disambiguation), which has resulted in a pretty confusing dab page. "Arabic" as a noun can only refer to the language, and considering that we're an encyclopedia, any adjective forms are secondary and should not motivate the use of disambiguators. This is no different from keeping "Norwegian cuisine" out of the dab page Norwegian, for example.

Peter Isotalo 08:34, 21 October 2005 (UTC)

Structure change

I got bold and reduced the reasoning for the use of XXX language titles to the really relevant issues. "Ambiguity" and "Consistency" were somewhat overlapping and the latter isn't even relevant. The latter was also very confusing when trying to claim that EB wasn't consistent by using just "Esperanto" or the likes, since the reason for this is obvious; there's no need for it. I also removed the part about "cultural sensitivity", since using the rule: "XXX language in all cases except where disambiguation is necessary" serves the exact same purpose. No need to add more motivations to something that is already very obvious.

Peter Isotalo 08:45, 21 October 2005 (UTC)

Question: I've noticed there are a small number of languages where the article name does not follow the XXX language title policy. The ones I've noticed are Latin, Hindi, Urdu and Kodava Thakk. These seem unambiguous (and in the case of Kodava Thakk, I would guess Thakk means "talk" or "language", so the suffix language would be redundant). Any views on whether these should be standardised? Martin.Budden 16:15, 23 October 2005 (UTC)