
Talk:List of languages by number of native speakers: Difference between revisions

From Wikipedia, the free encyclopedia
Whiteknox (talk | contribs)
→‎Tajikstan????: new section
Line 919: Line 919:


Why does it say that 300 million people speak Tajik? It only lists it as the official language of [[Tajikstan|Tajikstan, whose population is given as only 6 million]]. The edit was added today. --[[User:Whiteknox|Whiteknox]] ([[User talk:Whiteknox|talk]]) 21:52, 7 March 2008 (UTC)

== English speakers data from 1984? ==

http://en.wikipedia.org/wiki/Ethnologue_list_of_most_spoken_languages

It says the English stats (the same 309m) are from 1984. THAT makes sense, but having that here is insane. It casts doubt over the whole article. Stats from that long ago are useless, it's only slightly more than the current population of the US alone!

Revision as of 11:58, 2 April 2008

WikiProject Languages (Unassessed)
This article is within the scope of WikiProject Languages, a collaborative effort to improve the coverage of languages on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has not yet received a rating on Wikipedia's content assessment scale.
This article has not yet received a rating on the project's importance scale.

Languages

Languages, or Language Families?

The Korean language is not a language isolate; it is related to Altaic. The total Korean-speaking population is about 80 million. —Preceding unsigned comment added by Korea4one (talk | contribs) 12:53, 20 September 2007 (UTC)[reply]

The entire organization of this page is ridiculous. Aside from the dubious statistics, why are certain languages being counted as entities they are not? Chinese and Arabic are not languages; Mandarin and Standard Arabic are. It's odd that "Chinese", which encompasses dialects so different that speakers from one region cannot understand speakers from another, is one language, while Italian and Spanish, languages whose speakers readily understand each other, are counted separately. The large language blocks should be divided.

--DigitalA 04:11, 10 April 2007 (UTC)[reply]

Chinese is separated on this page. I'm not sure how you missed it? Rysten 19:09, 10 April 2007 (UTC)[reply]
Languages are defined based on political, rather than linguistic criteria. Speakers of Cantonese and Mandarin consider themselves to be native speakers of the same language. So, certainly, do Arabic-speakers (and in that case, dialects are very poorly defined). The same cannot be said of Spanish-speakers and Italian-speakers. john k 19:39, 10 April 2007 (UTC)[reply]
I guess a line does need to be drawn somewhere, such as with Serbo-Croatian, which is politically three different languages. Rysten 20:06, 10 April 2007 (UTC)[reply]
Serbo-Croatian is a bit more complicated. Politically, it used to be considered a single language, and there are probably a fair number of people today who still consider it as such. Arabic has always been considered to be a single language. john k 02:11, 11 April 2007 (UTC)[reply]
Yes, wasn't disagreeing with you. Serbo-Croatian is at the opposite end of the spectrum. Rysten 11:00, 11 April 2007 (UTC)[reply]

I missed the Chinese. Regardless, other languages should be broken up more. I guess the only other main worry is the estimates: the estimate for English seems deadly low. --DigitalA 02:49, 11 April 2007 (UTC)[reply]

I don't know about DigitalA, but I'm a Spanish speaker and I do not understand Italian at all. It's a big mistake to call them almost dialects. They're from a family of languages, like English and German are. --Andres rojas22 05:45, 11 April 2007 (UTC)[reply]

Spanish

I think the count for Spanish is a bit misleading. Spanish has many different dialects, and those dialects are very, very different. For example, many of the differences between American and British English exist in pronunciation and idioms, and a few minor spelling differences, but speakers can surely understand each other with no problem. That is not always the case with Spanish. Spanish dialects often have more significant differences, including in some cases different pronouns and conjugations. A person who speaks the Puerto Rican dialect would most likely be completely unable to understand someone who speaks the Argentinian dialect, and vice versa. I think this should be noted, ideally with a reliable source. I would find one myself, but I am about to go to bed. - MK ( talk/contribs) 10:10, 26 July 2007 (UTC)[reply]

While this isn't true for American vs. standard British English, surely one can find dialects within the British Isles that are pretty much incomprehensible to speakers of standard American English. I've never seen Argentinian Spanish listed as its own language anywhere. Also, "completely unable to understand" always strikes me as nonsense in cases like this. Native Spanish speakers with only the most minimal training are able to understand spoken Italian fairly well. Given that, I find it hard to believe that an Argentinian and a Puerto Rican would find it impossible to converse. john k 15:44, 26 July 2007 (UTC)[reply]
Not true at all. I'm Mexican and I can perfectly understand any Spanish speaker. The grammar and syntax are the same. The orthography is the same... sometimes the way of speaking changes, but the elements that constitute the language do not. --189.135.204.243 21:01, 7 August 2007 (UTC)[reply]
I am from Argentina and I can understand a guy from Puerto Rico... I probably won't be able to understand the SLANG of Puerto Rico... but the language itself will remain understandable to me, and to any native Spanish speaker... And I would appreciate it if you didn't call the USA "AMERICA"... because I am as American as a guy native to the USA.... Plus I think a guy from the USA will have a lot more trouble talking to someone from England than someone from Argentina will have talking to someone from Puerto Rico, Mexico, or wherever... And I agree with John Kennedy... we can easily understand ITALIAN and PORTUGUESE... Chau!

I'm Spanish and I must say the first guy is TOTALLY WRONG. The only thing that changes between the Spanish spoken in each country is the "intonation", the way you pronounce words, and that's all. It's the SAME LANGUAGE, and OF COURSE (it's obvious I think) we can understand each other. Were you trying to make a joke?? ^^ —Preceding unsigned comment added by 84.125.57.216 (talk) 12:43, 14 October 2007 (UTC)[reply]

Chinese

That there are different dialects within Mandarin is indisputable, but I think that people are failing to realize that these "dialects" are really closer to separate languages. One of the most defining attributes of a language is that it is mutually intelligible to its speakers but unintelligible to anyone else. From my own experience and those of my Chinese colleagues, I think that based on this definition Mandarin should be broken up into several languages. Just as English, Spanish, and French speakers might be able to discern select words and phrases from the other two languages without knowing them, so might people who have grown up in Shanghai, Xian, and Beijing be able to understand bits and pieces of what people from the other two cities are saying, but in both cases complete comprehension is not possible. Therefore, just as English, Spanish, and French are considered separate languages, so should the various "dialects" of Mandarin. This might have been touched on before, but I think it deserves additional consideration. -- Mike

I think you are mixing the terms "Chinese" and "Mandarin" here. Chinese is indeed more like a macrolanguage composed of several languages; you have Mandarin, Wu, Minnan, Minbei, Hakka, Gan and some more, which are usually regarded as separate languages by linguists. Mandarin or Putonghua is considered by nearly everyone a language with dialects, not a macrolanguage. Your comparison with French, Italian and Spanish (English doesn't fit in here) works for the Chinese macrolanguage, but not for Mandarin. A speaker of the Beijing dialect of Mandarin doesn't have much of a problem understanding a Mandarin speaker from Yunnan or Taiwan. The differences there are comparatively small, so that by the criterion of mutual intelligibility Mandarin is one single language. — N-true 10:52, 6 April 2007 (UTC)[reply]

The number given for native speakers of Mandarin seems unreasonably high. As recently as March 2007, the Chinese government claimed, as reported by Xinhua (http://news.xinhuanet.com/english/2007-03/07/content_5812838.htm), that only somewhat more than half of the population is even a competent oral user of Mandarin, and yet the number given for native speakers accounts for more than three quarters of the Chinese population. Even allowing for a high number of native speakers outside of China, it seems extremely difficult to believe that you could add more than 200 million, the figure necessary even if everyone counted as 'proficient' is labeled also as a native speaker, which is itself indefensible. - Manicsleeper 7:39, 11 June 2007

English

The count for English speakers is absurdly wrong. The population of the United States alone is 300 million: http://en.wikipedia.org/wiki/Population_of_united_states

If there is some scientific calculation or classification decision causing the Ethnologue writers to arrive at this apparently bizarrely wrong figure, it needs to be clearly spelled out, because as it stands the number looks instantly wrong and casts doubt over the entire article.

Sailboatd2 14:04, 27 March 2007 (UTC)[reply]

My first reaction was the same as you. However, not everyone in the US is a native speaker of English; the 2000 census recorded "just" 215,423,557 people aged 5 or more who only spoke English at home. See Table 1, Language Use and English-Speaking Ability: 2000. -- Avenue 05:24, 28 March 2007 (UTC)[reply]
Yes, but just because the remainder do not speak only English at home, it does not mean that they are not native speakers. It would be hard to grow up in the US without being a native speaker. 00:43, 2 April 2007 (UTC)
No, that's what "native speaker" means - a person who speaks the language at home. A person can only have one native language, traditionally. john k 03:16, 2 April 2007 (UTC)[reply]
You're wrong here, John, think about it (or read this article): A native speaker is by definition someone who has spoken a certain language since childhood, having learned it as a child. Your native language doesn't suddenly change just because you move to another country and live with your wife who happens to speak this other language. :D — N-true 11:40, 2 April 2007 (UTC)[reply]
Children who grow up speaking a different language from English at home are not native English-speakers. There probably are not very many people who fit your model. "Language spoken at home" is not a perfect measure of "first language speaker," but it is the closest approximation available. john k 04:24, 3 April 2007 (UTC)[reply]
Untrue. I speak English as my first language, but my first language used to be Polish. I've been in the States far too long to speak English as a second language; when I speak Polish, I speak like a child with an English accent.
Well Mr. Noname, the human brain is able to memorize hundreds of millions of "language information pieces" fully automatically without "learning" - but only within the first years of life, when the child typically still lives at home. If the childhood is followed by a proper education in this language, a very high level of language competence can be acquired, unmatchable by other people that start to learn that (ethnic) language in the age of 10 or above. Though it may be difficult sometimes to draw a precise line between native and not native, in your case it seems obvious that Polish is your native language, even if you haven't developed full Polish competence, and English may now be your primary (= primarily used) language, however it is your second language, eventually with a "near-native level", higher than mine. You can see this for yourself, because a real first language speaker of English would probably not have mixed up "first" with "primary", and "second" with "secondary" as you did. --Allgaeuer 00:13, 24 April 2007 (UTC)[reply]
The population of the United States is approx. 300 million people. The population of the United Kingdom is approx. 61 million. Australia 21 million. Canada 32 million. And in India, English is an official language of their government, which means it is taught as such. In most European countries English is an official language, or a required language in their school systems. The notion that only 300 million people worldwide are native English speakers is just stupid. Don't become stubborn and refuse to accept facts. I'm not saying that these countries only speak English, but I'm saying the vast majority of people in these countries speak English as a native language. http://en.wikipedia.org/wiki/English_speaking_countries This is a list of countries that officially speak English. It's well over 300 million. Please change it. --HomrQT
This is a list of native speakers, not total speakers. There are almost no native English-speakers in India or on the European continent, or in Africa outside South Africa (where there's only a pretty small percentage of native English speakers). A sizeable percentage of the population of Canada has French, or some other language as their native language. The same is true in the UK (Welsh, Urdu, Hindi, etc.) and especially the United States (Spanish, mostly, but other languages too). There's also the very young, who aren't generally counted as having any language, as I understand it. Are you really saying that most French people speak English as a native tongue? john k 17:48, 25 July 2007 (UTC)[reply]
No, I'm not saying most French people speak English as a native tongue. But I am saying there is a percentage of the people in France who do use English as a native language. And so there are in Germany, Italy, Spain, Portugal, etc. in Europe and throughout the world; just as there are native Spanish-speaking people in the US, there are native English-speaking peoples in every one of those countries, and a very considerable number. And for there to be only 300 million native speakers of English, then just in the US, UK, and Australia alone there would have to be less than 80% of all 3 of these locations speaking English as a first language? That's not even counting every other country in the world that does have English as their official language. --HomrQT 25 July 2007 (UTC)
What in the world are you talking about with these supposed native English-speakers in Germany and France? Obviously there are emigrés, but those are a pretty tiny percentage of the over-all population. Pretty much no actual Germans and French people would speak English as a native language, even if they are fluent in it. I agree that 300 million seems low - given an estimate of about 80% in each of the main English speaking countries save Canada, and about 2/3 in Canada, one would imagine at least 240 million or so for the US, 48 million for the UK, 20 million for Canada and 16 million for Australia, for a total of, what 325 million? That might be conservative, and that's ignoring several million more native speakers in Ireland, New Zealand, the anglophone West Indies, and South Africa. One would have to imagine there's about 350 million or so native English speakers. But there's very few in India and continental Europe. john k 18:15, 25 July 2007 (UTC)[reply]
The 309 million figure is wrong. If you are really conservative in your estimations you get 230m in the US, 50m in the UK, 15m in Canada, 15m in Australia, which is already a million over. Adding in all the other English-speaking populations around the world, it is unlikely that English is the native language of less than 330 million people. rsloch 12:45, 22 August 2007 (UTC)[reply]
309 million native English speakers should cover pretty much North America... Now what about the rest of the world? 350 million sounds like a more reasonable number. That's like a 17% increase over what is posted. I wasn't suggesting an additional 150 million speakers or anything like that. But I thought between 350 and 380 would accommodate the native English speakers of the world much better. And you seem so shocked about people in Germany and France speaking English as a native language in their home. Yet you don't seem surprised that Spanish-speaking people in the US would natively speak Spanish in their homes, yet speak fluent English outside of their homes... John, I have family in Europe. I've been to Europe. It's not as crazy as you're making it sound. Because of the United States military presence in these countries (military bases), and our movies, television, music, radio, and our influence through the internet, generations of people in Europe and a lot of the world are a lot more fluent in English than you think, and a lot more of them speak it in their homes than you think. --HomrQT 25 July 2007 (UTC)
I've lived in Paris. There's obviously anglophones in these European countries, especially in the bigger cities. But they're a tiny percentage of the overall population (even if they're a large percentage of the population of Parisian pubs). And I defy you to present any evidence that actual native-born continental Europeans speak English in the home. You can't just assert such a thing. It's simply not true. Many Europeans are fluent in English. But it's simply not their native language. The situation with Spanish-speakers in the United States is not the same at all - there are large immigrant communities in the United States who speak Spanish as their first language. Especially for second generation immigrants, they are also usually fluent in English, but Spanish is their native language. The third generation generally speaks Spanish only as a second language, if that (depending on where they live - assimilation is slower in more Hispanic areas, obviously). There is no large English or Canadian or American immigrant population in continental Europe. What Anglophones there are are basically expats, and I would imagine that the vast majority only live on the continent for a few years, at most. This is not a similar situation at all. William Waddington may have been a Frenchman who was a native English-speaker, but there really aren't very many people like that, and if you want to demonstrate that there are, you need more than anecdotal evidence. john k 22:06, 25 July 2007 (UTC)[reply]
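The back-of-envelope totals quoted in this thread can be checked with a few lines of Python. This is only a sketch: the per-country numbers are copied from john k's and rsloch's comments above (in millions) and are their estimates, not independent data; the names `john_k`, `rsloch`, `LISTED`, `total` and `pct_increase` are made up for illustration.

```python
# Per-country estimates of native English speakers, in millions,
# copied from the comments in this thread (assumed, not sourced data).
john_k = {"US": 240, "UK": 48, "Canada": 20, "Australia": 16}
rsloch = {"US": 230, "UK": 50, "Canada": 15, "Australia": 15}

LISTED = 309  # figure shown in the article at the time, in millions


def total(estimates):
    """Sum a mapping of country -> millions of native speakers."""
    return sum(estimates.values())


def pct_increase(old, new):
    """Percentage increase from old to new."""
    return (new - old) / old * 100


for name, est in (("john k", john_k), ("rsloch", rsloch)):
    # Print each estimate's total and how far it exceeds the listed figure.
    print(name, total(est), total(est) - LISTED)

# How large a jump is 350 million relative to the listed 309 million?
print(round(pct_increase(LISTED, 350), 1))
```

On these inputs john k's four countries sum to 324 million (he rounds to 325) and rsloch's to 310 million, one million over the listed 309. Going from 309 to 350 million is an increase of about 13.3%; the "17%" quoted above matches a baseline of 300 million instead.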

I'd have to agree that number is absurdly low. While I accept that many people living in the United States are English-as-a-first-language speakers, 300 million is an incredibly low figure. Part of it seems to be the absurdly out-of-date figures used in the Ethnologue report; here - the world figure is 8 years old, and that is one of the better ones. The U.K. population figure is from 1984! American Samoa, for example, uses the 1970 census figures! Canada is 1998. This is a bigger problem than for just English speakers. P.S - "Second-language speakers in India: 11,021,610 (1961 census)." Oh come on! Iorek85 01:00, 15 August 2007 (UTC)[reply]

The person(s) that are editing the english speaking numbers are too stubborn to accept that their numbers are low. This web page is for nothing if they have idiots unwilling to accept that they have the wrong information. Anyone that looks at this number immediately recognizes it being too low. 03:00, 01 October 2007 (UTC) —Preceding unsigned comment added by 135.214.154.104 (talk)

Arabic

What happened to arabic? Even if there are dialects, it's still a real language. ZeroFive1 02:50, 24 January 2007 (UTC)[reply]

Arabic was taken out because a lot of these "dialects" aren't even vaguely mutually intelligible. Putting them all together could thus create an erroneous impression for the less informed viewer. This is why all the other Chinese languages besides Mandarin were taken out in the top row. BTW, does anyone know if the different variants of Hindi are mutually intelligible? Pedrassi

Sorry, I've now noticed, Pedrassi, that you have been discussing it. But it looks like an edit war is evolving. In defining a language, what speakers feel themselves to be speaking seems to have a stronger influence than mutual intelligibility (which works in dialect chains anyway: I think everyone in the Arab world can understand their neighbours in the nearest town or village just fine), so my vote for this list is to keep Arabic as a language. But I won't revert as there hasn't been a full discussion. You're right, Pedrassi, about consistency, and as you suggest, Hindi does have the same issues, which is why Ethnologue gives it 180m. But somewhere between 200-300m people feel themselves to be speakers of Arabic, whether they can understand everyone else who thinks the same or not. I don't know about Chinese: do people first consider themselves speakers of Mandarin/Cantonese or of Chinese? My hunch is generally the former, but that's just a hunch from talking to a few Chinese people, nothing scientific. Drmaik 05:45, 29 January 2007 (UTC)[reply]

But Drmaik, a language's purpose is to communicate effectively, and without mutual intelligibility that cannot happen. People may feel they speak Arabic, but if an Egyptian can't talk to a Saudi in his mother tongue then this talk of a "common language" is little more than an illusion. And if we consider otherwise on this list, this article risks bordering on irrelevance, by putting unified languages side by side with ununified ones, which is unfair. That's what I think anyhow. Pedrassi

That is not right. Arabs understand each other even if they speak in their own dialects.--AraLink 02:38, 1 February 2007 (UTC)[reply]

Is there a reason why Arabic isn't on the list anymore? Just something I noticed. --The Fear 01:18, 30 January 2007 (UTC)[reply]

Someone must have vandalised the page and removed it; this is one of the most highly vandalised pages on Wikipedia. I seem to recall it being around fifth or so. I don't have the time right now to sift through old edits looking for where Arabic was taken out, but someone should. —Cuiviénen 01:34, 30 January 2007 (UTC)[reply]
Probably it is an act of vandalism again. I've added it back. Can we request a simi-protection to a page? --AraLink 01:54, 30 January 2007 (UTC)[reply]

I moved the two discussions on Arabic together. Let's keep on chatting, but it seems there's more of an agreement to have Arabic as a united language. Drmaik 05:21, 30 January 2007 (UTC)[reply]

Yes, Aralink, we SHOULD request a "semi-protection" on this page to stop your constant editing. At least I've explained my actions on this page, something you have persistently failed to do. I can only conclude then that you don't actually have an argument for putting Arabic on the list, and that you are doing it on the grounds of Nationalism, etc. We really need an admin here to intervene. After some research, some "variants" of Hindi are not mutually intelligible (the Indian government is even planning on making some of these "variants" different, recognised languages) so I'm going to change that as well.Pedrassi

I agree with Drmaik that Arabic should appear as a single language in this list. Kyle Cronan 21:03, 31 January 2007 (UTC)[reply]

Single language or not, Arabic should appear on the list somewhere. If we insist on separating out dialects, some dialects will still be on the list. I don't know, but I've heard that most of North African Arabic is mutually intelligible, which would certainly put it high on the list. If even half of Egyptians can understand each other, Egyptian Arabic will still be on the list.

Moreover (by the way I made the previous comment too, just not when I was signed in), I think I have a solution that could fix a lot of these dialect problems: put both on. If you look at a CIA factbook ranking of, say, population, it will go something like China, EU, US, etc... but include European countries separately as well. Because the point of this page is not to award prizes to widely spoken languages. The goal of this page is to impart information. That some form of Arabic is spoken by the fifth largest number of people is interesting and important information, as is the breakdown by dialect. For now, not having Arabic at all makes this page worse than unreliable: it reduces it to irrelevancy.

Feel free to add Egyptian Arabic to the list. In fact, it was on a while back (thanks to Drmaik I believe) but was taken out (by Aralink I presume). Aralink continues to revert without explanation so I suggest blocking him indefinitely until he begins to participate in a more contributive manner. Pedrassi 11:13, 2 February 2007 (UTC)[reply]

I don't have the Wikipedia skills to change the table, so if someone else could do that... I did however find a useful site [1]: Egyptian should be at 46 million, and many other dialects will be on the list.

Ok Pederassi, take out Arabic if you want, but please put at least some dialects in to replace it. ~Matveiko

I've added Egyptian Arabic to the list. Let's hope Aralink and other vandals don't come back lurking again... Pedrassi 10:40, 9 February 2007 (UTC)[reply]

I thought this list was meant to be based on political definitions of languages, not linguistic ones. Removing Arabic and splitting it seems bad to me. john k 16:46, 9 February 2007 (UTC)[reply]

With Arabic defined as one language again, which may be inevitable given the difficulty of separating out dialects, I added a note that this includes all dialects, which are not necessarily mutually intelligible. That should clear up any false impressions that readers might get if they assume (like I used to) that Arabic is one uniform language.

It seems to me that there ought to be a decision made that is at least consistent. I understand the challenge in determining whether or not Arabic is unified enough to be considered a single language, but if we're operating on that assumption, why is it only at #4? Given the Encarta estimate, it seems like it would make sense to place it higher. As it stands, it doesn't look right. Mikehoffman 20:36, 14 February 2007 (UTC)[reply]

I believe it would be appropriate to treat each macrolanguage as a single group. Dividing these languages into their smaller dialects/languages can typically cause a lot of problems, and at least we have an established ISO list of macrolanguages to conform to, rather than a separate consensus opinion on our end that such a division/conglomeration is appropriate/inappropriate. I think it shows up as #4 because we have no Ethnologue value reporting number of speakers. As such, we're taking a best guess (CIA factbook + SIL + other aggregate data). I suppose it may be helpful to mention in the article that these are at best estimates, and that the ranking by number of speakers is potentially out-of-date, invalid, or just plain wrong. (Of course, this is the standard Wikipedia disclaimer) --Puellanivis 20:54, 14 February 2007 (UTC)[reply]
But the Ethnologue does give a figure for speakers of all varieties. I even provided a web reference when I put it in, and someone took it out. Here we go again. Drmaik 05:54, 15 February 2007 (UTC)[reply]
My apologies then. Hopefully this won't get edited out. --Puellanivis 06:25, 15 February 2007 (UTC)[reply]

The list mentions that all the Arabic dialects are included, but it does not say the same thing for other languages (such as German, where there are many dialects). Why mention dialects only for Arabic? Another thing: Modern Standard Arabic is the only official language in use in most Arab countries (in the media, education, government...). The article is starting to be about politics, not linguistics. One last thing: there are many wiki users who always try to make the issue of Arabic dialects bigger than it really is. Most satellite TV channels, for example, target all the Arab countries (Pan-Arab) and not only their country of origin; here in the Arab World, it is not a big issue because we understand each other. Bestofmed 18:20, 10 March 2007 (UTC)

It mentions all dialects because the main source, the Ethnologue, classifies Arabic as a macrolanguage, and lists the dialects separately as different varieties. We thought it best however to include all these together in this list, which explains the comment. Drmaik 10:55, 12 March 2007 (UTC)[reply]
In my opinion, guys, just Arabic should be included without adding dialects. I'm Syrian myself and I can understand any Saudi, Egyptian, Lebanese, Algerian, etc. even if he/she is speaking in dialect, because we use the same words. It is just the same way in English. In the USA itself there are many dialects, and everybody knows that the word tomato can be said in two different ways. Also, it's very easy to distinguish between the American, British, and Australian dialects; however, they are all included under the English language because their speakers can understand each other. Same in Arabic. Moreover, Arabic dialects are not written. Every Arabic speaker writes in the Arabic language, so all 300,000 people write the same language. Finally, in the Arabic mass media, formal Arabic is spoken. To sum things up, I believe that we should put just Arabic without adding dialects, because dialects don't have any effect on our ability to understand each other, in addition to the fact that in formal speech, formal Arabic is used in all 22 Arab countries. qawmi 01:41 AM (GMT -8), 12 March 2007
Arabic is a macrolanguage (as said by ISO and not by Ethnologue; correct me if I am wrong), not because it is not the same across multiple regions, but because there are some differences in locales (such as hour/date format, adapted month names, some minor terms); I mean this classification is for political/colonial-past/geographical reasons and not linguistic ones (I am not saying that all Arabs speak one dialect). And remember my question was: why are dialects/varieties mentioned only for Arabic? There are German varieties (even in Switzerland itself, they cannot understand each other across German cantons), and there are Portuguese varieties (even different in writing system). Why only Arabic (which at least has the same writing system with an agreed unified form; Arabic Wikipedia is a great example)? I agree with qawmi; I am a Tunisian, and I can understand a Syrian easily, as can many other Tunisians (more than that, I am a fan of Syrian TV series; you see what I mean). Before closing, Drmaik, if you want I can help you in your work on Tunisian Arabic. BestofMed 02:11, 13 March 2007 (UTC)[reply]

Hey, "There are Portuguese varieties (even different in writing system)", just prove it!!! That is not true. Defend your ideology concerning Arabic, but do not say what you do not know!!! —Preceding unsigned comment added by 200.157.35.20 (talk) 15:43, 14 December 2007 (UTC)[reply]

Hindi/Khariboli

Khariboli is a dialect of Hindi and so are at least 10 other different dialects of Hindi. I have modified the page accordingly. I would request you to quote the correct figure for it. I hope your next step won't be splitting American English and British English into two separate languages. -apurv1980 15:46, 8 February 2007 (UTC)

Well, a lot of these "dialects" aren't even mutually intelligible. In fact, the Indian government is even considering making some of these "dialects" official separate languages in the near future. Therefore, your comparison with the two main variants of English is completely preposterous. I suggest better research before attempting further edits. Cheers Pedrassi 21:05, 9 February 2007 (UTC)[reply]

Can you provide one reliable link from the GOI that says they are considering making these dialects separate languages? I have lived for 18 years in the states where these dialects are spoken. I speak all of them without knowing which one I am speaking. I then lived in England for 5 years and have lived in the US for the last 4 years. Don't tell me there is no difference between British English and American English. In fact, English differs widely within the US itself. People from the northern states find it a little difficult to communicate with those from the Bible Belt. I am reverting your so-called facts again unless you provide a link to the "consideration of the Indian government". apurv1980 18:22, 10 February 2007 (UTC)

English differs even more widely within Britain than it does between Britain and America, I think. At any rate, I don't think mutual intelligibility can ever be the basis for distinction here, due to the problems of dialect continuum. Political identity, I think, is more important. So Arabic, Hindi, Chinese, and so forth probably ought to count as single languages despite their mutual incomprehensibility. I'm willing to ignore this for Chinese, where the various dialects/variant forms are well-defined and well-known. I'm less willing to do this for Arabic and Hindi, where the various forms are much less well-defined (and, in the latter case, where the relationship of these forms to one another seems to be fairly unclear). john k 21:28, 10 February 2007 (UTC)[reply]

This type of linguistic classification should be done by linguists and NOT by wikipedians. If the languages classified under Hindi are marked by linguists as distinct ones, mention that. If not, do that. But we should not depend on original research or depend on people's experiences. --Ragib 21:31, 10 February 2007 (UTC)[reply]

What is a "language" cannot be defined by a linguist, because that is not what linguistics does. There is a dialect continuum, and no clear and agreed upon way to decide what languages are. I agree that languages should not be determined by "wikipedians." They should be determined by (more or less official) political definitions, because the difference between a language and a dialect is entirely political. Linguists don't consider Hindi to be distinct languages. They view there to be a wide variety of different linguistic forms in northern India. To call all these forms "dialects of Hindi" is a political decision of the Indian government. Other people view these as separate languages, and linguists may agree with them that they are not all mutually intelligible. But linguists have no role in deciding whether they are dialects or separate languages, because that's not what linguists do. john k 02:36, 11 February 2007 (UTC)[reply]
Well, I do not see any point in your argument. If neither the GOI nor any major group of linguists recognizes them as separate languages, then why put it that way on Wikipedia? All north Indians (more than 350 million) understand each other perfectly irrespective of dialect. And regarding the POLITICAL standpoint, there are no political/literary movements to recognize them as separate languages. And in my personal experience, the differences between Boston English and South American English are more profound than those between the different dialects of Hindi. But still, if you could provide some reliable references from the GOI or a majority of linguists that they are different languages, then we can put it that way on Wikipedia. -apurv1980 03:57, 11 February 2007 (UTC)
Er, my argument is that you are asking linguists to do something that linguists don't do. Whether something is a language or not is a political and not a linguistic designation. The government of India considers them dialects of Hindi. Other people, basing their arguments on the fact that linguists note that these dialects are often not mutually comprehensible, consider them separate languages. I will tell you that virtually any linguist will say that the dialects of Hindi are more distinct from one another than different variants of American English, or, for that matter, than any two variants of standard English, period. Whether this makes them separate languages is, again, a political and not a linguistic distinction, but let's not exaggerate. john k 04:11, 11 February 2007 (UTC)[reply]

Hebrew - Take your political battle away from Wikipedia, please

It looks like for Hebrew it lists West Bank as a country. Why? It's not a country yet and it's a part of Israel. Once Palestine becomes a country, great I'll be the first to add it myself but until it happens, it should not be there. I'll remove if no argument to keep it will be presented.

Not all of the West Bank belongs to Israel. Some parts belong to the Palestinians and other parts belong to Israel. The West Bank is worth mentioning. It should remain as is. Jerse 22:35, 14 February 2007 (UTC)[reply]

But currently there's no such country as Palestine, sorry. It's a non-existent country. It's just an autonomy, and currently all of the West Bank territory is under Israeli law. I only want to rid Wikipedia of political battles. I'm a left-wing Israeli and fully support the creation of a Palestinian state. However, since such a country does not yet exist, its future territories should not be mentioned separately.

It's still worth mentioning. In the same sentence it states the United States and California and New York. Same concept. If you want put it in parentheses, go ahead, but other than that leave it be.Jerse 00:41, 15 February 2007 (UTC)[reply]
None of the "significant communities in..." entries lists only countries. It lists the West Bank and the USA, then narrows the latter to New York and California, and also lists Gibraltar. Gibraltar is claimed by both Spain and England and is not a country; by the same logic, excluding "West Bank" would exclude "Gibraltar", and we would be left with: "significant communities in USA." As such, West Bank should stand, as the list is not intended to recognize nationality in any way, shape or form, as the other entries show. --Puellanivis 01:25, 15 February 2007 (UTC)
The West Bank is not part of Israel. The State of Israel does not consider the West Bank to be part of Israel, with the exception of East Jerusalem. It is a territory over which no state is recognized to be sovereign, under belligerent occupation by the Israelis. It should stay as it is. I will say that Gibraltar, while not a sovereign state, is a well-recognized dependent territory, and that while the Spanish feel they have a moral right to it, they signed away their legal rights to it in repeated treaties, most notably the Treaty of Utrecht, and as such, cannot claim it as de jure Spanish territory. john k 02:00, 15 February 2007 (UTC)[reply]

Is the "West Bank" a physical location and an accurate way to describe where a significant population of Hebrew speakers lives? The inclusion of the "West Bank" specifically separates it from Israel. As long as the "West Bank" is a valid geographic region with a significant population of Hebrew speakers, it should remain here. --Puellanivis 02:19, 15 February 2007 (UTC)

Ok. I stand corrected. It should be included because now I see it lists different geographical locations. However, the fact remains: West Bank is fully administrated by Israel: "The West Bank (Hebrew: הגדה המערבית‎, Hagadah Hamaaravit, Arabic: الضفة الغربية‎, aḍ-Ḍiffä l-Ġarbīyä), also known as Judea and Samaria, is a landlocked Israeli administered territory on the west bank of the Jordan River in the Middle East. It was occupied by Israel after the conclusion of the Six-Day War of (1967)" No flames here, really. But just quoted the West Bank article.

It is administered by Israel, but it is not part of Israel, by anyone's reckoning. john k 18:00, 16 February 2007 (UTC)[reply]

Welsh

What about Welsh? It's not mentioned at all, yet it has many native speakers.

ASL is also not covered. Unfortunately, many languages are not covered. These are really only the languages that have a significant number of speakers. If you do have information about Welsh, you can maybe even find the Ethnologue entry for it and add it yourself.  :) --Puellanivis 21:03, 17 February 2007 (UTC)

Russian

I used my internet browser's search function to find "Russian" within this article and was unsuccessful. How could it have been entirely overlooked or did I just miss something? Muaddib 23:57, 21 February 2007 (UTC)[reply]

You certainly missed it. Why don't you look into section 1 manually. If your browser can't find it, then perhaps it's broken? --Ragib 23:59, 21 February 2007 (UTC)[reply]
I guess combing through such a long article I'm bound to have overlooked it. So much for relying on IE's search function! Thanks for the help. Muaddib 16:42, 22 February 2007 (UTC)[reply]

Bulgarian

6.6 million in Bulgaria (2005) and ~1 million abroad = 7.5 million native?? There are 7.5 million people in bulgaria and they all speak bulgarian (turkish isn't oficial lenguage in Bulgaria!). They are also about 1 million abroad so there are 8.5-9 million total speakers.

So are you saying that because Turkish is not an official language, all Turkish people in Bulgaria speak Bulgarian as their native/first language? That is a rather novel definition of "native language." john k 07:19, 1 March 2007 (UTC)[reply]
He/She is saying that the total number of people speaking Bulgarian in Bulgaria is higher by 1 million. Although the definition of what "Turkish" means might be disputed (is Muslim = Turkish, e.g.?), almost all of them know Bulgarian as a second language at the least, and for a significant part of them Bulgarian is the native language. The remark that "turkish isn't oficial lenguage in Bulgaria!" is a bit irrelevant (and has spelling errors), but what I think he/she is hinting at is that although about a million citizens of Bulgaria define themselves as Turkish, they are not "Turkish immigrants" or anything like that, and most were born in Bulgaria. Most are bilingual even before going to school (where Bulgarian is the language of instruction and other languages are taught as second languages; "mother-tongue" classes are available, though). -- The preceding unsigned comment was added by 138.16.11.213 (talk) 07:03, 27 April 2007 (UTC).

Uyghur

The official number of Uyghur speakers is approximately 8.5 million in Xinjiang alone, but there is also a significant number in the former Soviet Union. Unofficial figures put the number at over 20 million speakers worldwide! The gap is due to the present situation in Xinjiang (East Turkestan/Uyghurstan), where most Uighur people don't get passports because of the limited child policy, and to nationality changes in the former Soviet Union (Central Asian states) caused by Chinese pressure on those states to crack down on Uighurs.

Portuguese

The number of native speakers in Ethnologue and Encarta is shockingly wrong. They put the number of native speakers at 177 million. Considering that the combined population of Brazil (188 million) and Portugal (11 million), which have 100% speakers, is around 200 million, it is clearly evident that the two sources are immensely inaccurate. And those are just two of the countries that speak Portuguese, leaving out countries such as Mozambique and Angola. I strongly believe that Encarta and Ethnologue should be avoided for ranking. -- The preceding unsigned comment was added by 80.1.72.245 User: WhiteMagick 12:16, 21 March 2007 (UTC)

I agree. Such an incorrectness is typical of Ethnologue. The Portuguese language article speaks of 210 million native speakers, this seems much more likely to me. — N-true 13:41, 21 March 2007 (UTC)[reply]
Wrong, sure, but shockingly wrong? Have you looked at the Ethnologue entry for English? It's almost the same as the population of the United States. Somehow the native English-speaking populations of the United Kingdom, Canada, Australia, Ireland, South Africa, New Zealand, Jamaica and Guyana, combined with the native English-speaking populations in countries with huge rates of English fluency such as India, manage to amount to only a paltry 22 million. I'll save "shockingly wrong" for their data on the English language and call their Portuguese data merely "grossly wrong" ;) If you object that the US has many non-English-speaking people, there are in fact people in both Portugal and Brazil who are non-Portuguese speakers as well.
In any case, Ethnologue may do some valuable work, but general language demographic data doesn't seem to be part of that. As far as I can tell, nobody likes their numbers for *any* of the languages, and they always undercount. Zebulin 19:43, 21 March 2007 (UTC)
The count for English speakers is just absurd, but counting the native speakers in countries such as India is very hard. There may be a lot of fluent speakers of English, but they are still second-language speakers, as people there have different local mother tongues. But this is not the topic of discussion for this subsection! Let's talk about the Portuguese speakers. I strongly suggest that Galician speakers should also be included, because even the EU parliament uses spoken Portuguese rather than Galician; or at least point out that Galician is closely related to Portuguese. User:WhiteMagick
100% of the populations in Portugal and Brazil do not speak Portuguese. But it's got to be pretty close (95%?) My understanding is that Mozambique and Angola do not have that many people who speak Portuguese as their mother tongue, although our article on languages in Angola claims otherwise. It would nonetheless appear that there is something of an undercount for Portuguese. john k 00:08, 22 March 2007 (UTC)[reply]
I begin to suspect that ethnologue systematically undercounts every language. Perhaps they are simply using a lot of outdated data? Since efforts to find their sources have been unencouraging I'd say this portuguese undercount is just one more reason to find another primary source for the numbers used to group the languages in our list.Zebulin 01:04, 22 March 2007 (UTC)[reply]
They're definitely using a lot of outdated data. The key to the Portuguese undercount would appear to be the very swift population growth in Brazil. Ethnologue gives 163 million Portuguese speakers in Brazil, based on numbers from 1998. In the 2000 census, Brazil had a population of 169 million. But, apparently, the current estimate has Brazil's population increasing to 188 million. So the undercount would seem to entirely arise out of the growth in Brazil's population in the last few years. Looking at English, the numbers for the UK are from 1984 (!!), although they appear to be only slightly smaller than you would expect (55 million). But the numbers for the United States (210 million in 1984, again) are particularly out of date. If we used more recent US numbers, we'd apparently go up to about 250 million native English speakers in the US, which pushes the overall English speaker figures to 360 million. This seems to be the basic issue - systemic undercounting based on old statistics. john k 01:49, 22 March 2007 (UTC)
Yes, the numbers for all countries by Ethnologue are seriously outdated, which leads to the question: why do we use Ethnologue's rankings if they are so inaccurate? User:WhiteMagick
Because it's the only source that lists number of users for such a large number of languages. I'd love to be able to find a better source, though. john k 16:16, 27 March 2007 (UTC)
Surely when using a tiered ranking we can use Ethnologue for the languages with fewer speakers, for which we presumably have fewer sources, and use some more up-to-date reference for the largest languages (100 million or more), which surely have many more up-to-date sources available. Zebulin 18:56, 27 March 2007 (UTC)
I would concur. I suggested as much elsewhere. john k 22:55, 27 March 2007 (UTC)[reply]
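
As a rough check of the outdated-data explanation above, the following Python sketch updates Ethnologue's total by Brazil's growth since the 1998 figure. It uses only numbers quoted in this thread (177, 163 and 188 million); the function and variable names are made up for illustration, and it assumes, unrealistically, that the whole difference between the 1998 speaker count and the current population consists of native Portuguese speakers.

```python
def adjust_for_growth(old_total_m, old_country_m, new_country_m):
    """Add one country's growth (in millions) to an outdated world total.

    Assumption for illustration only: every additional inhabitant
    is counted as a native speaker.
    """
    return old_total_m + (new_country_m - old_country_m)


# Figures quoted in the discussion, all in millions.
ethnologue_world_total = 177   # Ethnologue/Encarta native speakers of Portuguese
brazil_speakers_1998 = 163     # Ethnologue's Brazil figure (1998 data)
brazil_population_now = 188    # current population estimate cited above

adjusted = adjust_for_growth(
    ethnologue_world_total, brazil_speakers_1998, brazil_population_now
)
print(f"Adjusted estimate: about {adjusted} million")
```

The result lands close to the "around 200 million" mentioned at the top of this section, which is consistent with the idea that most of the gap comes from old Brazilian data rather than from a counting error.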

Galician language

According to several dictionaries (see for instance http://www.answers.com/topic/galician), the Galician language is "the dialect of Portuguese (sometimes regarded as a dialect of Spanish) spoken in Galicia northwestern Spain". Therefore, it should not be considered an independent language. Probably the best option would be to do something similar to what is done for German, where Swiss speakers are distinguished from speakers of standard German. Similarly, Galician speakers should be in the Portuguese category with a small remark.

I agree completely. Galician, though, is more closely related to Portuguese than to Spanish. User:WhiteMagick
Galician is a language totally independent from Portuguese. They share the same linguistic root, Galician-Portuguese, but over time they have drifted apart to the point of being different languages. See the Wikipedia page on Galician, and let's not ramble. 80.36.174.107 18:25, 28 March 2007 (UTC)
Galician is accepted as Portuguese in the European Union. And with the reform of 2003 the language was brought orthographically closer to Portuguese, because a lot of archaic Galician-Portuguese spelling was reintroduced, spelling which is present in today's Portuguese. Spelling differences are of small importance because the pronunciation of a word remains the same in both languages. Example: Espelho - Espello. There is a considerable effort and growing support to shed the Spanish influence on both the culture and language of the region. User: WhiteMagick
Galician isn't accepted as Portuguese in the EU. Any Galician can communicate with the EU in Galician following the signature of an agreement by the European Ombudsman and the president of the Committee of the Regions with the Spanish ambassador Carlos Sagües Bastarreche. See: Languages of the European Union: Catalan, Galician and Basque, where there is a lot of information from numerous sources. 80.36.174.107 19:31, 3 April 2007 (UTC)


I think WhiteMagick meant the fact that the Galician MPs in the European Parliament were assigned to the Portuguese translation services.

Indonesian

Hi all: I understand that in general Indonesian is very much a part of the native speakers versus "speakers" issue. But nonetheless, even if you only count native speakers of Indonesian (see the Ethnologue list or the Encarta list), you get 17 million native speakers! Yet this page has no mention of the Indonesian language at all. Is there something I'm missing here? Seems like a big oversight. Thewhiterabbit11 21:41, 19 April 2007 (UTC)

Hi, maybe you can read this discussion in the Indonesian language article. For a reason mentioned there (by Kunderemp), it is OK not to put Indonesian in the list. For me, as an Indonesian, it is also OK. Most of us Indonesians do not really care about this, partly because we know that the international community seems unable to accept the fact that Indonesian is used every day by us, from the time we learn to talk (yes, we learn at least two languages at the same time from the very beginning of our lives). I am not a linguist, and I do not know the definitions of native language (and their implications for a language's position in the list). However, I consider myself a native Indonesian: I was born in Jakarta, learned a local Malay creole (Betawi language) when my neighbours played with me, understood/learned Javanese from my mother (she's Javanese), and learned Indonesian from my mother (also) and television (before I went to school). Sounds not native to you? Kembangraps from wiki id: 15:29, 3 May 2007 (UTC)

Farce?

The ranking looks like a farce. If you take the Ethnologue column, you get a ranking that doesn't match the "Ranking" column. If you use the other total speakers column (CIA estimates), you get another ranking which also doesn't match the "ranking" column. This MUST be sorted out ASAP. --Ragib 06:02, 14 February 2007 (UTC)[reply]

I agree. I'm not aware of any decision of how to rank, but it would seem best to do it according to one source, to avoid constant changes. In this case the ethnologue (2005)[2] seems to be the best way to do it. I'd also propose only putting sourced figures in the other column, and to have some principled way of dealing with outliers (a more complicated issue, which need not be sorted out before the other 2 principles). Let's get some consensus on this, so we can have several editors editing and reverting according to the same principles. Drmaik 06:11, 14 February 2007 (UTC)[reply]
I agree using Ethnologue's ranking/numbers for this page. Otherwise, this has become a daily battleground for various language-advocates. I noticed an attempt today at mass scale change of most language rankings by one particular editor. Such unsourced and arbitrary changes are regrettable. --Ragib 06:14, 14 February 2007 (UTC)[reply]
It seems the Ethnologue column was only added recently (within the last week), but no one got around to reordering the rankings. Diego Lee 06:55, 14 February 2007 (UTC)[reply]
You missed this. --Ragib 07:23, 14 February 2007 (UTC)[reply]

I agree that the ranking shown should reflect the Ethnologue counts. The SIL (the organization that maintains and publishes the Ethnologue) is a highly respected organization among linguists - I can go ahead and reorder accordingly. What I'm confused about is why so many people are citing the Joshua Project which (like the SIL) collects information on languages around the world as part of a missionary effort, but unlike the SIL, does not perform surveys on the number of language speakers. Am I wrong about the Joshua Project? Their site doesn't seem to offer much information on numbers. --SameerKhan 07:46, 14 February 2007 (UTC)[reply]

To follow up - I just reordered the ranking to reflect the Ethnologue statistics. As I mentioned before, the SIL is highly respected in the linguistic community, and the statistics provided by them in the Ethnologue are the most widely used in linguistic journals for references on numbers of native speakers. I haven't gone through and verified the statistics provided here on the Ethnologue itself (if someone would like to do that and confirm that these are the correct numbers, that would be really helpful), so they may be inaccurate if someone tampered with the numbers earlier. Anyhow, please let me know if I've made a mistake. --SameerKhan 07:56, 14 February 2007 (UTC)[reply]
Yet another follow-up - I just saw that the Ethnologue list of most spoken languages includes statistics that are vastly different from that which is shown here. Can someone verify the real situation? —The preceding unsigned comment was added by SameerKhan (talkcontribs) 08:02, 14 February 2007 (UTC).[reply]
First, thanks for doing the re-ordering. I don't think Joshua Project do as much direct data collection as SIL, though I think the latter do their direct research mainly for languages they work in, which tend to be smaller. It seems that various advocates of different languages find the biggest figure they can come up with, so will use whichever source gives the biggest number: I think this is why the Joshua Project data is being used. As for what the Ethnologue actually says, I think Ethnologue list of most spoken languages does accurately reflect it (that has been my aim), but by all means check online at [3]. As for the real situation, that's what we're all struggling with! Drmaik 09:27, 14 February 2007 (UTC)[reply]
Ethnologue does things like separate Punjabi and Farsi into multiple separate languages. I'd want to be careful about using it as our basis. john k 18:03, 16 February 2007 (UTC)[reply]
Although Punjabi and Farsi are a bad example, since they're widely accepted as diverse languages, it's not true at all that SIL is "highly respected" among linguists. Actually quite the contrary. The SIL data can quite often be proved inaccurate, sometimes even wrong. Although it's a very large and comprehensive list, every linguist knows that one should always be sceptical about the data from that site. To name a few examples: their naming conventions are sometimes intriguing, numbers of speakers can sometimes be off, and clear dialects (especially those from Germany, which no one, be it linguist or housewife, would consider a "language") are declared "languages", while in other cases a distinction is not made. It's a good source for starting research, though. But: be careful with its data! Crosscheck twice! -- N-true 13:34, 21 March 2007 (UTC)

Seems like after all the discussions above and below, we are back to square one as people add more and more "estimates", and pick one of them arbitrarily to suit their preferred ranking from whatever language group they belong to. It might be better to have a consensus on what data source to use for ranking, as the 3 different sources provide widely varying estimates for a given language. --Ragib 05:56, 17 February 2007 (UTC)[reply]

It does seem that adding the Ethnologue column was a consensus decision, but I'm not sure if adding the Encarta column was. S. Lodovico 10:08, 17 February 2007 (UTC)

Encarta column

I added a column that shows data from Encarta 2006. Maybe this could finally provide an accurate ranking system... Jerse 16:34, 14 February 2007 (UTC)[reply]

I think "other estimates" should be more respected.--220.217.87.84 18:50, 14 February 2007 (UTC)[reply]

Encarta is a copyrighted source. While "facts" are outside of the copyrighted domain, it's giving numbers that are far too specific to be meaningful. The number of Arabic speakers is given as: "422,039,637". How do they know it's not 422,039,638, or 422,039,636? Such specificity is improper at that scale. These numbers should all have no more than 3 or 4 significant digits, and the ones digit should be reported as significant only in the case of hundreds of speakers, not in the millions of speakers. I will ask you please to make the proper changes:
  • Do not reference Encarta 2006, as this is a copyrighted work, rather find out where they got their facts/information from, and use that.
  • Do not include unnecessary and inaccurate specificity in the numbers. Arabic has about 422 million speakers, not 422,039,637.
I will reference to SIL who provided the information to Encarta instead —The preceding unsigned comment was added by Jerse (talkcontribs) 19:53, 14 February 2007 (UTC).[reply]
That sounds much better! Thanks. :) --Puellanivis 20:49, 14 February 2007 (UTC)[reply]
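
The rounding requested above (no more than 3 or 4 significant digits) could be sketched in Python as follows. The helper name `round_sig` is hypothetical and is not part of any tool used on this page; only the 422,039,637 figure comes from the discussion.

```python
def round_sig(n: int, sig: int = 3) -> int:
    """Round a whole-number speaker count to `sig` significant digits."""
    if n == 0:
        return 0
    digits = len(str(abs(n)))        # number of digits in the count
    # A negative ndigits rounds to tens, hundreds, thousands, and so on.
    return round(n, sig - digits)


encarta_arabic = 422_039_637         # figure quoted in the comment above
print(round_sig(encarta_arabic))     # three significant digits
print(round_sig(encarta_arabic, 4))  # four significant digits
```

With three significant digits the Encarta figure becomes 422 million, which is the form suggested in the second bullet.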

Since SIL provides Ethnologue with their language information, isn't it safe to assume that SIL has the same credibility as Ethnologue? In other words, can we finally arrange the chart by information that's up-to-date by using the SIL column instead of the Ethnologue column? Jerse 21:18, 14 February 2007 (UTC)

Well, the Ethnologue is the mouthpiece of SIL, so providing two columns, one Ethnologue, one SIL, doesn't really make sense. The Encarta data for Arabic is so different from the Ethnologue that one has to question where they really got the data from: it is an extreme outlier; all other data I've seen gives Arabic between 170 and 225 million [4]. Even another Encarta site [5] gives 206 million, evidently from the Ethnologue. The CIA has a figure of 323 million for the population of all Arab countries (and there are big non-Arabic-speaking minorities in Morocco, Algeria, Sudan, Iraq), so a 422 million figure is, well, ridiculous. So my proposal is: change the column back to Encarta, rather than SIL, and mark the Arabic figure as an extreme outlier. Arabic is the main problem with the Encarta data, though the Ethnologue data also seems to be out of date in some places. Drmaik 06:50, 15 February 2007 (UTC)
Well, if Arabic is the main problem, as you say, here are a number of factors that could contribute to the increase in recent numbers. 1) Islam is the religion of about 1 billion people, if not more. The Holy Quran is written in Arabic, and it is very good to understand Arabic in order to read the Quran. 2) Even though the events of 9/11 were tragic, they have been a milestone in the increase of Arabic speakers around the world. More people are studying Arabic today than at any other time in the history of the world; I myself am a part of this group. 3) Who are you to say that the estimate of a linguistic corporation as well respected as SIL is incorrect? 4) The 2006 SIL estimate is only 1 year old. All other sources in this article are older, most of which date to the last millennium. The list goes on... -- The preceding unsigned comment was added by Jerse (talk · contribs) 03:28, 16 February 2007 (UTC).
Also, the Ethnologue list already has its own webpage. Why would Wikipedia need two charts of the same information? Jerse 03:33, 16 February 2007 (UTC)


Point number 1 is quite wrong. A lot of Muslims can read Arabic without understanding it at all. With translations available in most languages, it is not necessary to understand Arabic to read the Quran. Point number 2 is original research without supporting stats. As for the latest SIL estimates, any referenced information is quite welcome. But we need to be consistent: we can't use a 1999 stat to compare with a 2006 stat when making a ranking. --Ragib 03:36, 16 February 2007 (UTC)
So are you saying we should use the SIL column? Jerse 23:47, 16 February 2007 (UTC)
No. There is no consensus, so first try to achieve that. --Ragib 10:00, 17 February 2007 (UTC)[reply]
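
The plausibility argument made earlier in this thread (an estimate of native speakers should not exceed the population it is drawn from, and should sit near other published estimates) can be written as a small Python check. This is only a sketch: the function names are hypothetical, the figures are the ones quoted above (in millions), and using the combined population of Arab countries as a ceiling is the commenter's assumption, not an established rule.

```python
def exceeds_population_cap(estimate_m: float, population_cap_m: float) -> bool:
    """True if a speaker estimate is larger than the population ceiling."""
    return estimate_m > population_cap_m


def outside_reported_range(estimate_m: float, low_m: float, high_m: float) -> bool:
    """True if an estimate falls outside the range given by other sources."""
    return not (low_m <= estimate_m <= high_m)


# Figures quoted in the discussion, in millions.
encarta_arabic = 422          # Encarta figure under dispute
ethnologue_arabic = 206       # figure attributed to the Ethnologue
arab_population_cia = 323     # CIA total population of all Arab countries
other_low, other_high = 170, 225

for name, value in [("Encarta", encarta_arabic), ("Ethnologue", ethnologue_arabic)]:
    flagged = (exceeds_population_cap(value, arab_population_cia)
               or outside_reported_range(value, other_low, other_high))
    print(f"{name}: {value} million, outlier={flagged}")
```

Under these assumptions only the Encarta figure is flagged, which matches the proposal to mark it as an extreme outlier rather than rank by it.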

There is no reference to the SIL source. The reference (1) points to encarta, not to SIL. --Ragib 10:00, 17 February 2007 (UTC)[reply]

If you look at the source at the bottom of that link, it says "Source:Summer Institute of Linguistics". Jerse 16:48, 17 February 2007 (UTC)
Then link to THAT directly. It is misleading to link to Encarta and claim SIL 2006 as a source. Thanks. --Ragib 16:50, 17 February 2007 (UTC)[reply]
Why is this so difficult? The source is clearly stated.Jerse 16:52, 17 February 2007 (UTC)[reply]
Well, that's because we can't really see the source to verify it. Right now, we only see that Encarta has this info, and cited SIL as the source. At most, you can claim Encarta as the source of the info and have the column named as such. But you can't link to Encarta and claim SIL as the source. --Ragib 16:55, 17 February 2007 (UTC)[reply]
Well at first I did name it the Encarta column but there was a problem because it's a copyrighted source. So instead I changed it to Encarta's actual source, SIL. But now there is a problem with that as well. The SIL website hasn't been updated to show the current data, or if it is I can't find it (and trust me I looked for it). Encarta seems to have correct sources being an encyclopedia and all, I don't understand the problem. Jerse 17:04, 17 February 2007 (UTC)[reply]
In that case (if you yourself haven't seen SIL/06), you should name the column and the source as encarta. --Ragib 17:10, 17 February 2007 (UTC)[reply]
So can the chart be ranked by Encarta 2006 then? Jerse 17:12, 17 February 2007 (UTC)[reply]

(resetting indent) That's a completely different issue. As mentioned above, the consensus seems to be of using Ethnologue data. If you want, you can start an RFC to gain a consensus on what data source to use. Thanks. --Ragib 17:15, 17 February 2007 (UTC)[reply]

I just took a look into your "Encarta" link. Actually, it is about "languages spoken by more than 10 million people". It doesn't specify *at all* whether they are considering native speakers. Also, you continuously mention SIL 2006. However, the Encarta page *only* refers to "Source: Summer Institute of Linguistics.". That is, there is no mention to the 2006 SIL report you keep mentioning (even though you haven't seen it yourself). Please clarify this. Thank you. --Ragib 18:51, 17 February 2007 (UTC)[reply]

If you scroll down it says, in red, *Data are for first language speakers only*. And I'm still working on the SIL/2006. Anyway, why would Encarta use obsolete information? It's not like it's wikipedia... Jerse 21:07, 17 February 2007 (UTC)[reply]
When Jerse changed the source to SIL, I had been thinking that he was changing the cited source as SIL, not continue to cite Encarta. We should cite SIL, and use whatever information they have released, out-dated or not, and then once SIL/06 information is released publicly, we can then update the information. --Puellanivis 21:09, 17 February 2007 (UTC)[reply]
Arabic is not the only problem. Ethnologue also has strange numbers on Persian. For example, according to the CIA, the number of Persian-speakers in Iran alone is more than 30m. According to Ethnologue, the number of Persian-speakers worldwide (including those in Afghanistan, Tajikistan, etc.) is only 31m, not to mention the large Persian-speaking minority in Uzbekistan. According to experts, the number of Tajiks in Uzbekistan is up to 10m (see: D. Carlson, "Uzbekistan: Ethnic Composition and Discriminations", Harvard University, August 2003). According to Ethnologue, the number is 0! I have sent many e-mails to Ethnologue and asked for their sources. Either they simply ignored the mails, or gave a simple answer: "We are not really sure". Tājik 17:26, 23 February 2007 (UTC)[reply]

Removed influences in language family category

I removed the influences from other families in the language family column. For Norwegian, Swedish and Danish they were mostly wrong (these languages are much more influenced by the Romance languages, especially Latin and French, than by Slavic or Finno-Ugric). I don't know much about Finnish, Lithuanian, Slovak or Afrikaans, and the claimed influences might well be correct, but since this isn't mentioned for other languages, I see no reason to mention it for these languages.

213.225.127.188 02:57, 15 February 2007 (UTC)[reply]

I agree, it's information that is not appropriate for this list. Such information is more appropriate for the distinct articles for the languages themselves, as such there they can give the issue the proper treatment that it deserves. (Influences of a language are a very complicated subject.) --Puellanivis 02:59, 15 February 2007 (UTC)[reply]
Perhaps the language families should be reduced to three, as some are rather specific. PioKuz4 20:40, 21 February 2007 (UTC)[reply]

Hindi again

Hindi being listed as only 182 million native speakers, because that is the number of native speakers of Khariboli, is problematic. We don't list any of the other dialects of Hindi separately, nor the Bihari, etc., languages that are sometimes considered dialects of Hindi. We should either count all the "dialects" of Hindi together when giving the totals for Hindi, or we should list them separately, or some combination of the two (counting, say, Awadhi and Haryanvi as dialects of Hindi, but Maithili and Bhojpuri as separate languages). Whatever solution is agreed upon, though, the current set-up is unacceptable. Either Bhojpuri is its own language, with 25 million odd speakers, in which case it should be listed here as its own language, or else it is a dialect of Hindi, in which case those 25 million odd speakers ought to be counted as Hindi speakers. As it stands, they are not counted as anything. The same goes for Awadhi and Maithili and Haryanvi and Kanauji and so forth. john k 19:11, 18 February 2007 (UTC)[reply]

To expand on this, our article on Hindi lists five groups of dialects/languages which are considered to be "Hindi" - Western Hindi, spoken around Delhi, in Haryana, and in Western Uttar Pradesh and Madhya Pradesh, including standard Hindi; Eastern Hindi, spoken in eastern Uttar Pradesh and Madhya Pradesh, and in Chhattisgarh; Rajasthani, spoken in Rajasthan; Pahari, spoken in Uttarakhand and Himachal Pradesh; and Bihari, spoken in Bihar and Jharkhand. Obviously, these languages are often quite different from one another, and aren't always mutually comprehensible. Western Hindi is closer to Urdu than it is to the other Hindi dialects, and the Pahari dialects are closer to Nepali, also considered a separate language. But I think that we have to employ political definitions of languages in this article, because those are more or less the only definitions that exist. Linguists can tell us that standard Hindi is closer to Urdu than it is to Bhojpuri, but they can't say that Bhojpuri is a language and not a dialect, because that's not what linguistics is concerned with. At any rate, I would suggest alternately a) counting all speakers of Western and Eastern Hindi (other than Urdu speakers) as Hindi speakers, and counting speakers of Rajasthani, Pahari, and Bihari languages separately; or b) counting all speakers of all five, save Urdu and Nepali speakers, as Hindi speakers. john k 19:58, 18 February 2007 (UTC)[reply]

Actually John, 182m is the Ethnologue quote for all of Hindi. I believe Drmaik can confirm this. Bhojpuri and Maithili appear to be the only two listed separately. It seems probable that someone thought the figure was too low and added Khariboli dialect. Ryan Leigh 20:23, 18 February 2007 (UTC)[reply]
My dear friend Ryan, just to give you a little insight into what you are claiming, look at the population of the state of Uttar Pradesh, which is believed to be the home state of Hindi speakers. It is somewhere around 165 million. Go to any Government of India site and it will tell you that the only language spoken at home in UP is Hindi (maybe in different dialects). Now there are at least 5 other states (Bihar, Madhya Pradesh, Jharkhand, Chhattisgarh, Haryana) with population more than 30 million where Hindi is the majority language (though again in different dialects). Cities like Mumbai, Delhi, etc., with populations close to 10 million (not a joke), are predominantly Hindi-speaking. I do not know what you are talking about when you say there are just 182 million speakers of Hindi. The studies quoting figures above 300 million seem more reasonable to me. -zombie_neal 21:22, 18 February 2007 (UTC)[reply]
No, I never claimed any figure. Actually, I was merely answering John's question about how Ethnologue classifies Hindi. I don't have an opinion on Hindi. Ryan Leigh 23:35, 18 February 2007 (UTC)[reply]
Ryan, Ethnologue lists the following languages separately, which are normally considered dialects of Hindi - Haryanvi, Kanauji, Awadhi, Chhattisgarhi, Bagheli, Bundeli, and some others. These languages/dialects account for about 50 million additional speakers to the 180 million Hindi speakers they give. If you include the Bihari languages (65 million or so for Bhojpuri, Maithili, and Magahi), and the Rajasthani (another 35 million), it gets even worse. One way or the other, these languages aren't being counted either in our count for Hindi or on their own. They should be counted one way or the other. john k 16:43, 21 February 2007 (UTC)[reply]
John, the 181m number is listed as simply Hindi [6]. I'm only concerned with what Ethnologue meant when they used the word Hindi. Looking at Arabic as an example, it is important to note that Ethnologue sometimes groups all varieties, other times separates them; but when they list Arabic it means all varieties of Arabic. As for how Hindi should be classified here, that's up to you and the others. Ryan Leigh 17:25, 21 February 2007 (UTC)[reply]
I don't understand your point. The Ethnologue number for "Hindi" clearly excludes Awadhi, Kanauji, etc., because that's how ethnologue works - it specifically tells you if it's double-counting, as it does with Arabic. It doesn't do that with Hindi. At any rate, any kind of review of the population of India makes it fairly clear that 181 million is too low if it is meant to include dialects. The combined population of Uttar Pradesh, Madhya Pradesh, Haryana, Delhi, and Chhattisgarh is something like 260 million, and even if we assume about 10% Urdu speakers that still leaves us with a lot of Hindi speakers unaccounted for. In fact, it leaves us with approximately the 50 million Haryanvi, Kanauji, Awadhi, Chhattisgarhi, Bagheli, Bundeli, and so forth speakers. If one counts the Bihari and Rajasthani languages as dialects of Hindi, as they often are, the numbers change even more. At any rate, I'm not really sure what your point is. Don't you think that the 25 million or so native speakers of Bhojpuri should either be counted in the Hindi totals, or listed on their own? john k 19:13, 21 February 2007 (UTC)[reply]
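For anyone who wants to check the arithmetic in the comment above, here is a minimal sketch. Every input is one of the approximate figures quoted in this thread (about 260 million combined population, roughly 10% Urdu speakers, Ethnologue's 181 million for Hindi); the variable names are only illustrative.

```python
# Rough check of the figures quoted above. All inputs are approximations
# taken from this discussion, not from any census table.
combined_population_m = 260   # UP, MP, Haryana, Delhi, Chhattisgarh (millions)
urdu_percent = 10             # assumed share of Urdu speakers
ethnologue_hindi_m = 181      # Ethnologue's figure for "Hindi" (millions)

# Integer arithmetic keeps the result exact.
non_urdu_m = combined_population_m * (100 - urdu_percent) // 100
unaccounted_m = non_urdu_m - ethnologue_hindi_m

print(f"Non-Urdu speakers: about {non_urdu_m} million")
print(f"Not covered by the Ethnologue Hindi figure: about {unaccounted_m} million")
```

The remainder comes out in the same range as the roughly 50 million speakers of Haryanvi, Kanauji, Awadhi and the rest that Ethnologue lists separately.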
I've never disagreed with anything you've said. I just wasn't sure what is or isn't included under Hindi, that's all. I would've assumed it would be listed as Hindi, Standard or Hindi, Khariboli instead if they meant that. But you and apurv1980 seem passionate about the subject, so perhaps any further discussion should be with each other, not with me. Ryan Leigh 19:50, 21 February 2007 (UTC)[reply]
Guys, I think each one of us is saying the same thing, and that is that the status of Hindi is not reported correctly on this page. Two possible solutions are 1) Report all dialects of Hindi as separate languages with accurate figures 2) Report all dialects as one language 'Hindi' with a total figure. Now it is up to other editors which way they want to report it, but the page in its current form is factually incorrect and misleading. apurv1980 20:17, 21 February 2007 (UTC)[reply]
Yes, or perhaps one could report it as one language, but give figures for each division; in the way Persian is listed now. Ryan Leigh 22:09, 21 February 2007 (UTC)[reply]
I agree totally with your idea Ryan, let's report them as Persian is reported. But the thing is there is a lot of edit warring going on here; how do we reach a consensus? apurv1980 17:32, 22 February 2007 (UTC)[reply]
Hindi - 182m, these guys make me laugh. The figure can only be close to the truth if you count the various dialects of Hindi as separate languages. If they are not counted separately then this figure is a joke. -apurv1980 21:17, 18 February 2007 (UTC)[reply]
John's question wasn't about the number of speakers. Ryan Leigh 23:35, 18 February 2007 (UTC)[reply]
First, if we're using the Ethnologue as the source for ranking, then here's what it says: '180,000,000 in India (1991 UBS). Population total all countries: 180,764,791. Ethnic population: 363,839,000 (1997 IMA).' It also says that 'Alternate names Khari Boli, Khadi Boli'. Ethnic population means the number of people who self identify as Hindi speakers, but evidently the Ethnologue feels that mutual intelligibility is not assured. And if you look at [7], you will find varieties such as Bhojpuri, Haryanvi and Bundeli listed separately, with their own figures. Indian census data 1991 for Hindi was 337m, I believe: I did put that in in the midst of the edit warring, but it seemed to get lost. It should be there somewhere. Drmaik 05:49, 22 February 2007 (UTC)[reply]

Update on Encarta information

I e-mailed SIL at info-sil@sil.org to ask if the information on Encarta was correct and they confirmed. Here is the original message:

"Dear Mike, Yes, we sent them the data. Conrad

Info-SIL/IntlAdmin/WCT Sent by: Jane Pappenhagen 02/19/2007 09:23 AM To Editor Ethnologue/IntlAdmin/WCT@SIL cc Subject Re: Encarta 2006 information

Jane Pappenhagen SIL Information

To <info-sil@sil.org> cc Subject Encarta 2006 information

Hi, I was just wondering whether or not the SIL information on Encarta 2006 is correct?

Here is the link: http://encarta.msn.com/media_701500404/Languages_Spoken_by_More_Than_10_Million_People.html

SIL is cited on the bottom of this page."

It's not much but it's a start. Just thought I would add this to this thread.Jerse 05:28, 21 February 2007 (UTC)[reply]

Ranking

the precise ranking is flawed. Maybe we should try tiers or something. The problem is that there is no central authoritative source. Ethnologue is the best bet, but their data are from rather different periods, and with population doubling every 30 years in some countries, data from 1991 just doesn't cut it (e.g. Tajikistan). We still have to go by ethnologue for the moment, there is no obvious alternative, but maybe we should review the whole idea of "ranking" "languages" by number of speakers. dab (𒁳) 21:10, 25 February 2007 (UTC)[reply]

I don't consider Ethnologue the best bet. Indeed, in many cases, it is the worst source for this kind of statistical data. CIA factbook is a much better source. Jahangard 02:29, 26 February 2007 (UTC)[reply]


The CIA factbook would be monstrously difficult to use for ranking the languages as the data is divided amongst every country in the world. It is also inconsistent in the manner in which it reports linguistic information for each country, sometimes giving a very detailed linguistic profile of a country and other times only stating official languages with a list of some select minority languages with no way to determine raw numbers of speakers. As a result the biggest obstacle to using the World Factbook for ranking is that the effort to derive the ranking from their data renders it essentially original research by wiki standards.Zebulin 03:34, 26 February 2007 (UTC)[reply]
For major languages, using data from the CIA factbook is very straightforward, and adding a couple of numbers which are given for different countries is not original research (because there is no room for different interpretations). For other languages (especially those with fewer than 10 million speakers), there is no reliable source which can be used for all of them, and therefore it's better to just forget the idea of ranking them. Jahangard 05:17, 26 February 2007 (UTC)[reply]
It becomes original research for some languages because it is so difficult to uniquely identify all of the countries' entries which will be consulted to derive a total for a given language. The only way to avoid that sort of judgement call would seem to be to consult *all* of the countries' data for *all* of the languages so as to be sure that no minority population is missed in the total. It might even be less work to script it somehow in that case.Zebulin 22:42, 26 February 2007 (UTC)[reply]

Ranking by comparing information from various sources, from various time periods, is inherently problematic. So, Dab's idea of a tier based listing sounds good. Also, ethnologue itself is sometimes contradictory, especially in separating languages and dialects (e.g. Chinese, Hindi). --Ragib 04:20, 26 February 2007 (UTC)[reply]

The problem of Ethnologue is much more than that. Ethnologue uses different statistical data from different sources, and from different dates. The main problem is that in many cases, Ethnologue mixes these data in the most stupid way, and sometimes generates pure statistical nonsense. Jahangard 05:16, 26 February 2007 (UTC)[reply]
The problem is that there is no source which uses precisely the same criteria for each language, as there is not one organisation that collects all the data. I think before changing the ranking, may be try to have another column for CIA data. But I have not seen any good CIA data for Arabic, for example. (If I've missed it, let me know). Stating the population of the Arab world is irrelevant. So, faute de mieux I'd recommend leaving the criteria as they are: getting rid of ranking would make the article much less clear, and useful, and without established criteria for where a language would be in the tiers, we'd be back at edit warring with proponents of a particular language. Drmaik 06:02, 26 February 2007 (UTC)[reply]
I tend to agree with Drmaik. The ranking does already say that it's based on Ethnologue, so the reader will know the rank is an estimation and not perfect. Tiers may cause more disagreements. Lyc. 00:41, 27 February 2007 (UTC)[reply]

Ethnologue is the only organisation I am aware of that collects and makes readily available data on all (6000+) languages. If we're just going to rank the top 100 (and anything else is pointless anyway), we might find some more up-to-date source. The problem is the "rank" parameter in {{language}}. I think that should go, because the "rank" cannot be established with any certainty for any but a handful of languages. I think we could rank the top 25 or "above 50 M" or so with some confidence. After that, we should drop the ranking and just do tiers of 10-50 M, 5-10 M, 1-5 M. For the top 25, I think we can also manage to compare various sources and look for a consensus estimate. To do this for the entire list would be a nightmare. Regarding Persian, what is the source for the "70-80 million" figure? dab (𒁳) 07:57, 26 February 2007 (UTC)[reply]
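To make the tier idea above concrete, here is a minimal sketch of how entries could be binned. The tier boundaries are the ones proposed in the comment; the handling of exact boundary values and the "below 1 M" bucket are assumptions added for illustration, and the sample figures are ones quoted elsewhere on this page, not vetted estimates.

```python
def tier(speakers_millions: float) -> str:
    """Return the proposed tier label for an estimate given in millions."""
    if speakers_millions >= 50:
        return "ranked (above 50 M)"
    if speakers_millions >= 10:
        return "10-50 M"
    if speakers_millions >= 5:
        return "5-10 M"
    if speakers_millions >= 1:
        return "1-5 M"
    # Not part of the proposal; added so every input gets a label.
    return "below 1 M"


# Figures as quoted in this discussion (millions of native speakers).
quoted = {"Hindi": 182, "Persian": 62, "Bhojpuri": 25, "Navajo": 0.178}

for language, millions in quoted.items():
    print(f"{language}: {tier(millions)}")
```

Within a tier the entries could then simply be alphabetized, so only languages near a boundary remain contentious.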

It seems the other estimates column is mostly from 2004-2005, so I doubt you will receive a response. The 50-100 ranks seems to have remained relatively unchanged for years. Perhaps no one looks at them. Lyc. 00:41, 27 February 2007 (UTC)[reply]
About the whole idea of ranking languages, I think it should be limited to languages with more than 20-30 million speakers (for others, it's not feasible). About Persian language, are you asking me? I've changed the numbers to 62 million (for native speakers) and the source is the CIA factbook (its estimate of the percentages might be old, but it's still the most reliable source that we have). Jahangard 08:10, 26 February 2007 (UTC)[reply]
I just noticed that. 62 million sounds like a reasonable estimate for Persian. This means that the actual ranking of Persian would be closer to 20 than 27. But we cannot deviate from the stated "ordered by SIL" just for Persian. Somebody would have to make the effort to check all the top 30 or so against the CIA factbook, and then order by that. Ranking above 30 or so is increasingly pointless. Already above 12 it is becoming difficult, what with French vs. Wu, Javanese vs. Korean, Cantonese vs. Marathi vs. Tamil. We should give the same "rank" for estimates within 2% or so of one another. I hereby suggest we remove the "ranking" number for the entries below 10 M. We can state how many languages we list in each tier, but maintaining a strict ranking is going nowhere (we'll never get rid of the "disputed" tag). dab (𒁳) 08:37, 26 February 2007 (UTC)[reply]
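The "same rank for estimates within 2%" suggestion above could work roughly as in the sketch below. The function name, the choice to compare each entry against the first (largest) member of its tie group, and the 1, 1, 3 style of numbering are all assumptions made for illustration; the sample figures are hypothetical and do not stand for any real language.

```python
def rank_with_ties(estimates, tolerance=0.02):
    """Rank by descending estimate, sharing a rank within `tolerance`.

    An entry shares the rank of its group if it is no more than
    `tolerance` (as a fraction) below the largest estimate in that group.
    """
    ordered = sorted(estimates.items(), key=lambda item: item[1], reverse=True)
    ranks = {}
    group_rank = 0
    group_leader = None
    for position, (language, speakers) in enumerate(ordered, start=1):
        # Start a new tie group when the gap to the group leader exceeds 2%.
        if group_leader is None or speakers < group_leader * (1 - tolerance):
            group_rank = position
            group_leader = speakers
        ranks[language] = group_rank
    return ranks


# Hypothetical estimates in millions, purely to show the tie handling.
sample = {"A": 100.0, "B": 99.0, "C": 97.0, "D": 80.0}
print(rank_with_ties(sample))
```

Comparing against the group leader rather than the immediate neighbour stops a long chain of 1.9% steps from collapsing many entries into one rank.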
Using the CIA factbook even for just the top 30 will certainly be harder than most people seem to be giving it credit for. I remember trying to use the CIA factbook to just get a rough ballpark estimate for English using their data for just the 4 countries that I assumed would have the most native English speakers. Straight away I found an oddball in the data for the United Kingdom which seemed to suggest that everybody in the country either spoke English or Welsh as their native language. I had expected to make a rough estimate in maybe 5 minutes from US, UK, Canada and Australia CIA World Factbook data but in fact it took something like 10 minutes due in large part to time pondering the obviously odd UK data. Are we going to want to rank the top 30 languages by such half-arsed methodology? Furthermore English is probably not the most difficult case. Deriving a total from World Factbook data for Spanish, Arabic, French, and Russian for instance will surely be much more frustrating. Any number thus derived will surely be continually tweaked up and down due to minor arithmetic errors by the original editor and by would-be revisionist editors attempting to check their work. I'm tentatively backing the tiered ranking idea. We can alphabetize the languages within each tier. The only sensitive points would be those languages that happen to fall on the borderline of a tier but skillful selection of tier ranges may minimize the ambiguity of which languages belong to each tier.Zebulin 22:24, 26 February 2007 (UTC)[reply]


If in fact we do go ahead and use CIA world fact book derived totals we ought to include a listing of all country entries from which the total is derived to facilitate proper checking of the math. A wrong assumption about which countries were considered in deriving the total number of speakers for a language will make the total thus derived seem to have been either grossly in error or doctored/made up.Zebulin 22:34, 26 February 2007 (UTC)[reply]
so far, our options appear to be CIA or SIL. Maybe there are other sources we are missing so far? If we're going to rely on CIA, we need to create a clean table of the CIA data first. I.e. download all 200 or so pages and parse them into a html table sorted by language. Once we have that, addition will be comparatively simple. In cases where SIL and CIA are close, the number is probably reliable, and we'll just have to look further into those cases where the two sources are in significant disagreement. dab (𒁳) 11:44, 28 February 2007 (UTC)[reply]
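As a rough sketch of the "table sorted by language" step suggested above, the snippet below sums per-country shares into per-language totals and keeps the list of contributing countries so the addition can be checked. It assumes the per-country data has already been extracted by hand or by script into simple records; the record layout and function name are hypothetical, and the two sample rows reuse approximate figures quoted on this page (populations and percentages), not values parsed from the Factbook.

```python
from collections import defaultdict


def totals_by_language(records):
    """Sum speakers per language and record each country's contribution.

    Each record is (country, population, {language: percent_of_population}).
    """
    totals = defaultdict(int)
    breakdown = defaultdict(list)
    for country, population, shares in records:
        for language, percent in shares.items():
            speakers = round(population * percent / 100)
            totals[language] += speakers
            breakdown[language].append((country, speakers))
    return dict(totals), dict(breakdown)


# Approximate figures quoted in this discussion, for illustration only.
records = [
    ("Iran", 70_000_000, {"Persian": 58}),
    ("Afghanistan", 30_000_000, {"Persian": 50}),
]

totals, breakdown = totals_by_language(records)
for language in sorted(totals):
    print(language, totals[language], breakdown[language])
```

Publishing the breakdown next to each total would address the concern that a total looks doctored when a reader assumes a different set of countries.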

Official speakers

In addition to my previous comments in Archive 2, can we continue adding the following languages in the bottom table: Tetum, 800k speakers; Venda, 750k speakers; Irish Gaelic, 380k speakers; Maltese, 371,900 speakers; Luxembourgish, 300k speakers; Dhivehi, 300k speakers; Maori, 165k speakers; Dzongkha, 130k speakers; Hiri Motu, 120k speakers; Romansh, 60k speakers; New Zealand Sign Language, 7.7k speakers; Bislama, 6.2k speakers (200k as second language).

Thanks to those for their additions earlier. I would do it myself, but unable to due to lunchtime access only to PC, and lack of editing experience in tables!!!! RAYMI....................80.68.39.212 15:18, 5 March 2007 (UTC)[reply]

Navajo, with 178,000 would also be a good addition. john k 16:53, 5 March 2007 (UTC)[reply]

Old Proposal

I propose deleting the rank from the language template, I think every body who reads the talk page of this article will appreciate it. --Pejman47 19:22, 6 March 2007 (UTC)[reply]

Continue to list the countries in the current order but just eliminate the rank column?Zebulin 01:33, 7 March 2007 (UTC)[reply]
I mean deleting it from the language template that is in all the language articles. --Pejman47 21:18, 8 March 2007 (UTC)[reply]
I can't believe I hadn't thought of how much trouble that is likely causing until just now.Zebulin 21:38, 8 March 2007 (UTC)[reply]
If you agree, we can vote for it here. --Pejman47 20:30, 11 March 2007 (UTC)[reply]

Indian languages

Has anybody else seen the Central Institute of Indian Languages site? It gives very detailed figures for number of speakers of Indian languages, and divides them up in a comprehensible way - it gives "Scheduled languages" and "Non-Scheduled languages" as broad groupings, with total numbers, and then lists "Mothertongues" within each of the broader groupings. It lists every scheduled and non-scheduled language and every mother tongue with more than 10,000 speakers. Thus, for instance, Hindi the scheduled language is listed with 337,272,114 speakers. Within that, it gives 233,432,285 speakers for Hindi as a mother tongue, and then goes on to the various other related languages - 23,102,050 for Bhojpuri, 10,595,199 for Chhattisgarhi, and so forth. It strikes me that it would make sense to use this source for our numbers of speakers of Indian languages. john k 20:31, 8 March 2007 (UTC)[reply]

excellent. God knows we can always use more hard references for this.Zebulin 21:14, 8 March 2007 (UTC)[reply]
The question would be whether we should use the scheduled/non-scheduled language totals or the mother tongue totals for the purposes of this list. This makes a significant difference for Hindi (and also determined whether Bhojpuri, Chhattisgarhi, etc., are listed separately or not), and considerable difference for some of the others, especially ones like Bhili, where there's lots of different dialects and no clear standard form. The other issue would be how to combine the numbers given with data for other countries - Bengali, Punjabi, Sindhi, Nepali, English, Tibetan, and Arabic are spoken primarily outside India, and there are significant numbers of Hindi, Tamil, Urdu, and perhaps Gujarati speakers outside of India as well. The numbers given also appear to exclude Jammu and Kashmir, or do something else that results in an abnormally low number of Kashmiri speakers (which it acknowledges, noting that figures for Kashmiri are partial, but failing to indicate what exactly they cover. Figures for Dogri are also low). john k 21:43, 8 March 2007 (UTC)[reply]

Another issue is how to deal with it on the list, in terms of sourcing. john k 21:48, 8 March 2007 (UTC)[reply]

Persian

Persian should be moved up to the list of more than 100 million speakers. There are approximately 110 million Persian speakers worldwide. The chart itself states that.Dariush4444 20:26, 11 March 2007 (UTC)[reply]

How do you get to 110 million? There's 70 million people in Iran, but a sizeable minority do not speak Persian as their first language (at least 30% - i.e. no more than 49 million native Persian speakers in Iran). Afghanistan, with about 30 million, has about 50% native Persian speakers - that gives us 64 million or so. This is all being rather generous, as most figures I have seen give no more than 60% for the Persian population of Iran, and there's certainly not a 55 million person Persian diaspora. john k 22:23, 11 March 2007 (UTC)[reply]
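The estimate in the comment above can be reproduced with a few lines. The populations and percentages are the generous round figures given there (70 million and at most 70% for Iran, 30 million and about 50% for Afghanistan); nothing here comes from an independent source.

```python
# (country, population in millions, percent native Persian speakers),
# all as stated in the comment above.
figures = [
    ("Iran", 70, 70),         # "at least 30%" are not native speakers
    ("Afghanistan", 30, 50),
]

per_country = {name: pop * pct // 100 for name, pop, pct in figures}
total_m = sum(per_country.values())
diaspora_needed_m = 110 - total_m

print(per_country, total_m, diaspora_needed_m)
```

On these inputs the two countries give 64 million, so reaching 110 million would need a diaspora of about 46 million even before using the lower 60% figure for Iran.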
Yes, I agree with you; Dariush exaggerates and I reverted him. But you forgot to add Tajikistan and Uzbekistan (via the CIA factbook), and about Iran, there is a situation like in Wales or Scotland or some parts of Spain: most of the population are at least bilingual from childhood. --Pejman47 22:47, 11 March 2007 (UTC)[reply]
The CIA numbers do not mention bilingualism either. Many people in Iran, Afghanistan, Tajikistan, Uzbekistan, etc. are bilingual and speak Persian as well as some other language (mostly Azeri, Pashto, and Uzbek) as a "first language". Thus, the number of native speakers is indeed much larger than the 60m mentioned in the text. But this also means that the number of Pashto, Azeri, and Uzbek speakers is larger. The CIA (as well as Ethnologue) counts everyone as "Non-Persian-speaker" who also speaks another language in addition to Persian. Someone with mixed Azeri and Persian origins is automatically labelled "Azeri". That way, the number of Azeris in Iran reaches 20-30% of the total population, while Persian remains at 50% although the "real" number of native Persian-speakers may be up to 90%. Tājik 00:15, 12 March 2007 (UTC)[reply]
Tajik, my understanding is that this page is generally based around the assumption that a person has only one first language, and first language is generally defined on the basis of the common census category of "language used at home." What the exact answers to this question in Iran are, I cannot say, but we certainly shouldn't be double-counting. john k 00:21, 12 March 2007 (UTC)[reply]
I know the problem. But the point is that Ethnologue is not a reliable source at all. Not only in the case of Iran, but generally. The numbers for Iran are simply invented by Ethnologue - they have no sources for it, they have not carried out a census. Maybe you should write them a letter and ask them for information. Believe me: either they will ignore you or they will tell you that they have sources - of course not naming them. Ethnologue's numbers for Uzbekistan contradict all other sources, even that of the Uzbek government: [8] Tājik 21:00, 13 March 2007 (UTC)[reply]
The Uzbek government should hardly be treated as a trustworthy source. I agree, though, that Ethnologue is not particularly reliable. But what source would you suggest we use instead? john k 21:46, 13 March 2007 (UTC)[reply]
I suggest to use either academic sources (for example the Encyclopaedia of Islam) or the CIA Factbook. The CIA factbook is not really reliable either, but at least it is something - and it is official. Ethnologue is the mouth-piece of a religious organization and has certain "agendas". Tājik 23:53, 13 March 2007 (UTC)[reply]
Hi, I just wanted to let you know Ethnologue is not reliable with regard to Iran. I have contacted them directly and they said they cannot locate their sources and they will make an update in the next version. I have the e-mails in this regard if anyone is interested. The e-mails are from Ray Gordon, the major editor of Ethnologue. So the Ethnologue info should be removed altogether with regard to Iran. --alidoostzadeh 02:47, 14 March 2007 (UTC)[reply]
The World Factbook gives 58% Persian speakers in Iran, 50% in Afghanistan, 80% Tajik in Tajikistan (if we are counting Tajik as the same as Persian), and 4.4% in Uzbekistan. There's apparently an additional 33,200 in China. That would come to, er, 62,451,835. Presumably beyond that there's a considerable diaspora - According to various Wikipedia articles, there's 310,000 in the United States, and 94,095 in Canada. I'd assume beyond that large communities in western Europe and the gulf states, at least. But probably no more than, what, a million or so? So no more than 64,000,000, using the CIA numbers. john k
Yes, that sounds much better and much more realistic to me - except for the numbers in Uzbekistan. The 4.4% are directly copied from official Uzbek numbers and do not reflect the opinions of Western scholars and experts on Central Asia. The real number for Uzbekistan is - by estimate - somewhere between 20-50%. The 50% is too high since many Uzbek nationals are naturally bi- or multi-lingual and speak several languages at a native level (including Russian). The best guess for Uzbekistan's Persian-speakers is probably 30% (1/3 of the total population). Keeping in mind that almost all Uzbek cities - except Tashkent - are predominantly Persian-speaking (most of all Bukhara and Samarqand), the 4.4% are totally wrong. Conclusion: estimating the total number of native Persian-speakers at 70m sounds pretty good to me. Another 50-70m (estimate) speak it as a second language, most of all in Iran and Afghanistan where the language is spoken and understood by 90-99% of the population in each country. Tājik 10:46, 14 March 2007 (UTC)[reply]
Well, the list is ranked according to ethnologue. I agree that their figure for Persian is almost certainly too low, but I think we need to keep it for the sake of consistency. Putting other referenced data in is fine, including adding up CIA figures. But coming up with new figures for Uzbekistan seems to me to be original research, which isn't what wikipedia is about. Drmaik 13:31, 14 March 2007 (UTC)[reply]
I do not see why the ranking should be by ethnologue when ethnologue confirms their numbers with regards to Iran are wrong. --alidoostzadeh 01:17, 15 March 2007 (UTC)[reply]
I would concur with this. It seems deeply odd to present readers with information based on a source that itself acknowledges that information is incorrect. john k 04:37, 15 March 2007 (UTC)[reply]
Err, they said they couldn't find their sources, which is very different from saying/admitting it is wrong. It's just that once ranking is done by more than one criterion, everyone will want in, and come up with their own reasons for their own language to be ranked higher, and the page will become essentially worthless. This isn't a competition, people. By all means put criticism of the Ethnologue figure in, point to other sources etc., but deciding to change the basis of the ranking based on one case will disrupt the whole page. Drmaik 05:52, 15 March 2007 (UTC)[reply]
Fair enough on what Ethnologue actually said. Beyond that, I think we should change the basis of the ranking because Ethnologue is pretty bad on a rather wide front. Personally, I'd prefer to use official census type data for as many countries as we can find it for, and for ones where we can't to use the best sources available, and only use Ethnologue when we have no better option, but I fear this might count as "original research." But, at any rate, there's plenty of good reason not to use Ethnologue. Largely because it's shit. If we want to have a list of ethnologue's top languages, that's easy enough to do. This list is at least theoretically meant to be a list of the number of speakers of languages, not a list as determined by Ethnologue. If the general sense of those who should know is that Ethnologue is not terribly good, we shouldn't rely on it when we can avoid it. For instance, as I've noted before, for Indian languages there's a much better source available, in the form of the Central Institute of Indian Languages. For the US, the US Census of 2000 has detailed figures available on language use (although it sometimes groups together several related languages for the smaller groupings of immigrant languages). South Africa's census is online, as well, and includes linguistic data, at least for South Africa's eleven official languages (the rest seem to be grouped together as "other"). If a mishmash list can actually be sourced, I don't really think it counts as OR. Or, at least, if it does, that only shows how out of whack the OR policy has gotten. john k 06:03, 15 March 2007 (UTC)[reply]
BTW, I think this discussion probably needs to be under a different title now, but didn't want to move anyone else's contributions without their permission. I wouldn't oppose a 'properly sourced mishmash list' (don't think that would be WP:OR), but we'd need quite a discussion of the principles first, and state them clearly somewhere (I think it would need to be a little complicated), and have quite a few people on board. And it seems that most editors don't stick around here for very long, probably because of the constant edit warring, which, it seems to me, has been a lot simpler to deal with since this page is ranked according to the ethnologue. So I think I'm mildly supportive of what you're thinking, more so in theory than in practice! Not sure how much I could contribute though... Drmaik 06:28, 15 March 2007 (UTC)[reply]
I agree that in practice this might be difficult to implement. I'd like to hear other opinions about it. john k 19:06, 16 March 2007 (UTC)[reply]
I wonder if Tajik might point us to some links as to the estimates of "western scholars" of the Tajik population of Uzbekistan. john k 14:45, 14 March 2007 (UTC)[reply]
Sources were already given, for example D. Carlson, "Uzbekistan: Ethnic Composition and Discriminations", Harvard University, August 2003, who estimates the total number of Tajiks in Uzbekistan to be somewhere around 11m (40% of the population), and the number of those who speak Persian at home at 30% (the total number also includes Tajiks who speak Uzbek or both languages at home). Some other sources [9] have also picked up this number, even going as far as 14m. Tājik 18:17, 14 March 2007 (UTC)[reply]
citing other sources is fine. But: for the sake of consistency, this list needs to be ordered by one central source, and Ethnologue is our best bet for that. Otherwise, ranking will become a function of the presence of linguistic activists on Wikipedia who go and collect the highest possible estimates: to do this for Persian but not for other languages that may also be under-estimated will lead to biased ranking. If Ethnologue is, as you say, aware their numbers for Persian are too low, it's no big deal, just wait for the next edition and they will give a higher estimate. dab (𒁳) 20:38, 16 April 2007 (UTC)[reply]

Nia / Dene languages

I expected to see some languages from this family on the list, since they are shown on the map that is on this page. I was surprised not to find the Navajo/Dine language on this list, since it is shown on the map (SW United States). Maybe a different map would be more appropriate to illustrate this article? 71.213.139.166 08:42, 18 March 2007 (UTC)[reply]

Navajo could easily be added to the list. john k 00:22, 22 March 2007 (UTC)[reply]

Indonesian language and placing

The Indonesian language article ranks it 8th among the most spoken languages, yet it doesn't appear on this list, and its place is taken by Russian. I'm not fully aware of the complexities of the Indonesian language, so if someone could explain, that would be helpful. Cheers. Aeryck89 17:19, 18 March 2007 (UTC)[reply]

You are quite correct. There are more than 200 million Indonesians, who learn the official language of the republic at school. There are hundreds of local languages, which in places are the language of the home and in some the language of business also. Encarta quotes Indonesian speakers at 17 million. I just cannot understand where they get that number!
I will do my best to research the issues that make quoting language statistics so difficult. Alastair Haines 02:02, 13 April 2007 (UTC)[reply]
I noticed this weird link from the Indonesian language page, too. I think 8th is roughly the correct ranking for Indonesian if you include all fluent (instead of native) speakers. From what I understand, only a few people in Indonesia learn Indonesian *first*, while almost everyone learns it fluently later on in school. So there are very few "native speakers" though there are over 200 million perfectly fluent speakers. That's why you have this absurdly low 17 million number - it's only capturing "true" native speakers, whatever that means.
So 8th appears to be roughly correct with rankings that take this into account. One at the bottom of this page "The Thirty Most Spoken Languages In the World" has Malay, Indonesian at 9th.
Obviously Indonesian is really missing out in most of these language rankings. It's spoken by almost everyone in the fourth most populous country in the world, plus almost everyone in neighboring Malaysia speaks a dialect of the same language (however you want to classify these things, Indonesian is a dialect of Malay or vice-versa). Point being that this all seems to be a huge flaw in the ranking methodology, though I can see how it arose. Thewhiterabbit11 20:39, 19 April 2007 (UTC)[reply]

See my response in the upper part. Kembangraps 15:34, 3 May 2007 (UTC)


This entire article is fundamentally flawed

IMHO the information provided in this table is misleading and ill-defined. Throughout the above discussion, no real consensus has been reached on what constitutes a native language, and even if we fix on a particular definition (which would have to be merely arbitrary), the information will still be useless. Here's a few reasons why:

  • Millions of people grow up in one country, move to another, and eventually end up speaking their adopted tongue with far more ease than their "native" language.
  • A German child may be regularly visited by an Australian au pair but have only a rudimentary knowledge of English. But because he speaks the language "from childhood" and "in the home" he would be classed by some of the above contributors as a "native speaker". Conversely, someone from China may have learned English at university as a third language, and speak it far better.
  • People like Vladimir Nabokov and Joseph Conrad would not register in this list, which renders it meaningless.
No, it doesn't render it meaningless. Nabokov (who wrote novels in Russian, too) was a native speaker of Russian, and Conrad was a native speaker of Polish. That they both wrote well in English tells us nothing of what their native language is. Nor are extraordinary writers statistically significant in what is an effort to have some sense of what the most spoken languages are. john k 04:40, 10 April 2007 (UTC)[reply]

I could give many other illustrations of why, in compiling such a table ranking, half-baked attempts to define "native" are neither interesting, nor of any practical use.

I'm also not sure what "half-baked attempts to define 'native'" means. We have generally used the listings of Ethnologue which, although it is not the greatest source in the world in terms of being rather out of date, is a widely respected source which in fact records the number of native speakers of more or less all languages in the world. Many governments also make similar renderings of their residents (the U.S. census and South African census figures on native language, for instance, are available online. There's an Indian institute, possibly government sponsored, which keeps similar information, also available online). john k 04:45, 10 April 2007 (UTC)[reply]

What would be far more useful would be simply a list of speakers of languages. How many people know English/Arabic/Mandarin/Russian well enough to be able to freely communicate? Admittedly you can't possibly obtain a precise figure for the number of speakers, you can only provide ball-park data. For example, we could provide a ranking of the number of people who live in territories where a given language has official status.

As it stands, this list will do nothing but confuse and irritate people. The global use of English vastly outstrips that of Spanish, so it's a patent absurdity to rank Spanish above English. I have no idea where the figure for Russia came from, but it's complete nonsense. I know from experience that most people in former Soviet states speak Russian fluently, very often without a foreign accent. Above the age of about 35, the majority of Kazakhs, Muscovites, Belarusians, Kiev Ukrainians and Riga Latvians are able to converse with one another in not only the same language, but pretty much in the same idiom and dialect - which is more than can be said for a Liverpudlian and a Texan.

In short, the upper part of this list bears no relation to any kind of reality, and says nothing useful. Palefire 19:55, 8 April 2007 (UTC)[reply]

It would be much much less useful to have simply a list of speakers of languages. Among other things, as you say, this is very difficult to determine. At any rate, the list is not a list of number of people fluent in a language (which is very hard to determine) but a list of languages by native speakers. You may not find such a list useful, but that doesn't really make it so. Lots of governments keep data on people based on "language used at home" or "native language." This is a useful criterion, because it more or less assigns everyone above a certain age to a single language. This also gives some sense of the size of ethnic groups, and so forth. Obviously, it does not do everything that one would wish a list of languages to do, but so what? No single list possibly could. There is nothing stopping you from making a List of languages by number of total speakers, if you can find sources for such a list. But this list is a List of languages by number of native speakers. It is rather unreasonable for people to get "confused and irritated" because this list does what it says it does. For all the difficulty you want to put around the concept of "native speaker," it really isn't that difficult an issue, and it's one that is used by many sources. john k 04:40, 10 April 2007 (UTC)[reply]
A further point, which is that you harp a lot on the supposed difficulties of defining "native language," but this is actually a fairly well-established concept. The idea of listing people based on whether they know a language well enough to "freely communicate" is a lot more difficult to figure out, and there's going to be far less data on this conception. john k 04:43, 10 April 2007 (UTC)[reply]
I feel sympathy for those who are working on this article. There are many issues and they are not easy issues. Reliable sources differ in substantial and significant ways. I do not think the idea of the article is flawed, however. It is very natural to ask, "how many languages are there?" and "how many people speak each one?" More people speak English than Sanskrit, for obvious reasons. I am curious to know, though, if Spanish is indeed more widely spoken in some way than English. For me these questions are merely interesting. For a young person considering which language or languages they might want to learn, there is a practical side to the questions. One thing that would be ridiculous is to suppress numbers of speakers or ranking of languages out of fear of offending smaller language groups. For one thing, near extinct languages cannot be protected unless we know facts. Alastair Haines 02:29, 13 April 2007 (UTC)[reply]

This article is redundant.

Over the past few years I've been watching this article, commenting in the discussions, et cetera. I have thus decided that this article will forever be redundant and utterly moot. Why? Because no consensus can be reached on how many NATIVE speakers there are of a language. This will never change. The number of English speakers is still hundreds of millions below figures I painstakingly laid out almost a year ago. God only knows how far off the other languages are.

Thus, I put it to you that we create an article entitled List of languages by number of speakers, so that there can't be incessant niggling and arguing over figures with no forward progress, as there is with this article. I think this is the only real solution. There will still be quite a bit of niggling, but the number of speakers is much better measured than the number of 'native speakers'.

So, please, let's get a consensus going on this matter and we can try and lay the framework of the new article. Jachin 11:52, 18 April 2007 (UTC)[reply]

I don't think you've checked this page recently. In fact, the problem might not be that there's too many arguments, but that there's too few (or not any). There have been a few difficult-to-define languages, such as Hindi and Arabic, but other than that the article is extremely quiet. Perhaps too quiet, for such a large page. Sèryt 05:04, 2 May 2007 (UTC)[reply]

  • For the creation of the new article as a replacement of this article in purpose, but retainment of this article with redirection of citations of this article as a source of 'speakers' subjugated to that of 'native speakers' clearly with 'speakers' being focused more on the newly created article. Jachin 11:52, 18 April 2007 (UTC)[reply]
Let's not vote yet. Your proposal is not worded very clearly, and some of the statements above are misleading at best. It is generally much harder to find clearly defined and consistent measures of all "speakers" than it is to find figures for native speakers, so it would be harder to arrive at consensus under your proposal. Your English figures were rebutted last year. Also, a good part of the "niggling and arguing" here is about which languages to use, which would still be just as much of an issue under your proposal. -- Avenue 14:24, 18 April 2007 (UTC)[reply]
As Avenue says, it is much harder to find figures for all speakers than it is for native speakers. And the issue of definitions will be just as strong, too, because there will be issues of how fluent one has to be to count as a speaker. And, again, this doesn't solve the issue of when to group and divide macro-languages, and similar business, which is much of the arguing. john k 14:26, 18 April 2007 (UTC)[reply]
This proposal amounts to doing original research. If it's hard to agree on what defines a native speaker, and to find consistent sources for that, it is doubly, nay thrice, so hard to do it for the definition of a speaker. Drmaik 15:08, 18 April 2007 (UTC)[reply]
Hi, I'm new to this article and working on it seems to be an incredibly difficult (but interesting) task. While the arguments about sticking to the number of "native" speakers make a lot of sense on the whole, they also lead to some patent absurdities. For example, Indonesian/Malay is spoken by almost everyone in the fourth most populous country in the world (Indonesia's population is 235 million and Malaysia's is 25 million), yet Indonesian doesn't even appear on this page. For the lay user of Wikipedia (I fall more in this category) this is more confusing than anything when coming onto this page. "Native speakers" may be an unbiased way of looking at it, but it also sacrifices a lot of richness. I wonder if there is some sort of third way? Like leaving a lot of stuff about native speakers, but at least attempting a rough ranking based on fluency? There seem to be some sources with first and second languages that could work. Barring that, perhaps some discussion of the difficulties/controversies surrounding ranking languages? Surely this issue is a hot topic in many academic areas other than just the Wikipedia talk page. Thewhiterabbit11 21:10, 19 April 2007 (UTC)[reply]
I don't think Indonesian has been consciously excluded, it's missing only because no one's added it yet. I've noticed that languages are still being added, so feel free to. It would have a low native speaker count (Ethnologue column) and a high fluent speaker count (other estimates column). Granted, it's a little awkward, but then it's probably the only language in that situation. Sèryt 04:28, 2 May 2007 (UTC)[reply]

arabic

so arabic is num 2 after Chinese —The preceding unsigned comment was added by 89.139.225.36 (talk) 00:03, 5 May 2007 (UTC).[reply]

This puzzles me! The Tamilian population is 62M and the number of people speaking Tamil is 63M, while the population of Karnataka is roughly 53M and the number of people speaking Kannada is 33M... come on, these statistics are plain wrong! Anand.raichur 16:36, 14 May 2007 (UTC)[reply]

Really stupid... Literary Arabic is written in every Arab country, but each Arab country has its own dialect, which means that an Algerian does not understand a Palestinian when they speak together. So, because you are ranking by number of speakers, and because Arabic is only a literary language, you should remove the "Arabic language" from a ranking by number of speakers as soon as possible. --84.47.61.180 18:43, 26 May 2007 (UTC)[reply]
This is not true: an Algerian would understand a Palestinian and any Arab from any Arab country. All Arabs study only Standard Arabic in school and university, and they can use it to communicate instead of dialects if there is a problem. The Quran and Hadith are written in Classical Arabic, of which Standard Arabic is the contemporary version, so religion keeps Standard Arabic preserved; it is the only language that has survived with very few changes in more than 1000 years, according to some academics (refer to some Arabic books). The only difficult dialect is from the Maghreb (Morocco, Algeria and Tunisia) when they speak fast. I am from the Gulf, and when I was in Morocco I did not have much of a problem understanding them unless they were old people from villages or were using French loan words; they could understand my Gulf dialect easily. I used to switch to Standard Arabic if there was any problem in communication.
Really, I'm a Tunisian but I have no problem in understanding a Palestinian or an Iraqi. I am getting tired of explaining this. Moroccans or Algerians use some French loan words because of the colonial past but this is disappearing with new generations. 138.48.213.186 (talk) 22:52, 8 December 2007 (UTC)[reply]

Afrikaans

Please note that Afrikaans has not since its independence in 1990 been an official language of Namibia. It is, however, probably the most widely spoken language.

I agree about the estimate for English speakers being extremely low. I also agree with the part about English and Italian being referred to as dialects. I am a fluent English and Spanish speaker and I do not understand any Italian at all. US-UK-NOR-MEX 01:56, 26 May 2007 (UTC)US-UK-NOR-MEX[reply]

French language

Well, there is a list of languages spoken by the TOTAL number of people, instead of native speakers. See "See Also" of this page! —Preceding unsigned comment added by Oscarch (talkcontribs) 13:33, 16 February 2008 (UTC)[reply]

Several detractors enjoy changing the data, in particular the rank of the French language. Officially it sits between 11th and 13th, but francophobe detractors keep lowering it to 18th; they have to stop this tampering. French is spoken in France (63.5 million native French speakers), in Canada (8 million native French speakers), in Belgium (3-5 million native French speakers), in Switzerland (2 million French speakers), and more generally by various emigrants of French origin in the USA and Europe who speak French.

err, no, you're changing the data. The reference I gave in reverting your edits previously [10] is what the ethnologue actually says, while you seem to be making up the figure in the ethnologue column. Wikipedia is based on sources, not what you or I think. Also, as well as falsifying data, your edit includes some vandalism (changing 30-100 million to 10-30), so I'll have to leave a warning on your talk page. Drmaik 12:17, 28 May 2007 (UTC)[reply]

Sorry Drmaik, I don't understand your changes to the ranking. In all the books French is classified as the 11th language by number of native speakers; it's possible that in recent years languages such as Javanese and Wu, due to demography, have pushed French down to 13th place. But I'm sure that 65 million native French speakers is an erroneous figure! It isn't chauvinism, it's realism: there are 8 million native French speakers in Québec, 4 million in Belgium, 62 million in France, etc. So don't you see a big error somewhere?

--Irrintzi 16:48, 28 May 2007 (UTC)[reply]

No problem. I have two main comments to make. 1) We rely on sources, and it was decided (not unanimously) to base the ranking on the ethnologue, for all its imperfections, as there seems to be nothing that is clearly better. The figures in wikipedia need to state what respected sources say, rather than original research. And you can check... 2) On that figure: yes, it's probably a little low, but let's check assumptions. We've had 2 figures quoted without source for the number of speakers of French in France: 62 and 63.5 million. These seem to be estimates of total population. But look at Languages of France, where research by INSEE (hardly a francophobe organisation) puts native French speakers at 86% of the population, which takes the number of French speakers in France down to around 50 million. Remember we're talking mother tongue speakers, not people who are fluent (and, yes, that is tricky, but this is the best data available).
On another note, where I do feel there is serious undercounting is in Africa, but this is unsourced: many people grow up with French as their first language in Abidjan for example, but would report to the census their ethnic tongue, the tongue of their tribe. But I know of no respected study which makes reliable estimates for this, so we can't put such figures in wikipedia for the moment, as they would be original research. Thank you for your understanding. Drmaik 05:42, 29 May 2007 (UTC)[reply]
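
As a rough check on the arithmetic in point 2 above, here is a minimal sketch. It assumes only the numbers already quoted in this thread: the two unsourced population totals (62 and 63.5 million) and the 86% native-speaker share attributed to INSEE. The function and variable names are made up for illustration.

```python
# Back-of-the-envelope check: apply the quoted 86% native-speaker share
# to the two total-population figures mentioned in this discussion.
# All inputs come from the thread; none are independently sourced here.

NATIVE_SHARE = 0.86  # share of native French speakers, as quoted from INSEE
POPULATION_ESTIMATES_MILLIONS = [62.0, 63.5]  # unsourced totals quoted above


def native_speakers(population_millions, share=NATIVE_SHARE):
    """Return the estimated number of native speakers, in millions."""
    return population_millions * share


for population in POPULATION_ESTIMATES_MILLIONS:
    estimate = native_speakers(population)
    print(f"{population} million total -> about {estimate:.1f} million native speakers")
```

Under these assumptions the estimate comes out at roughly 53 to 55 million, which is well below either total-population figure.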

OK, that's clear. I understand now, though I'm a little pragmatic. I can't manage to find the Ethnologue link about the number of native speakers... I'm lost on the main page, could you help me? Thanks for everything --Irrintzi 12:11, 30 May 2007 (UTC)[reply]

check [11]. And I'll copy something from the page...
French
A language of France
ISO 639-3: fra
Population 51,000,000 in France. Population total all countries: 64,858,311.
So in fact 64.9 would be a better rounding.
Hope that helps. Drmaik 12:28, 30 May 2007 (UTC)[reply]
I think this little discussion here illustrates the fact that ranking languages by native speakers is completely absurd. Henry Kissinger is not a native English speaker, and Shimon Peres is not a native Hebrew speaker, yet they are obviously English and Hebrew speakers now, so it really doesn't matter what language they spoke when they were 5 y/o if they have long discarded these languages. In the case of France, claiming that only 86% of the people are native French speakers is technically correct but practically meaningless. In real life, close to 99% of the inhabitants of France speak French only or almost only in their daily lives. Also note that if we were to use this strict native tongue criterion, the number of standard German native speakers is probably not even half of the German population, yet the list here considers that all the inhabitants of Germany are native standard German speakers. Same with Italian. Strictly speaking, in Italy a large part of the population are native speakers of Sicilian, Neapolitan, Milanese, etc. All I can say is that this list here is deeply flawed. Godefroy 15:06, 18 June 2007 (UTC)[reply]
I wonder what your source would be for 99%? I know many British would say the same about the UK, but they'd be just as wrong. That's the whole problem with what you're saying here and below: sources. The majority always marginalises the minority. Have you been into Arab/Berber homes in France? in the quartiers nords of Marseille, Belleville, Barbes? Listened to conversations in cafes in Alsace? If so, I don't think you'd suggest 99%. But anyway, sources are the thing. Sorry you don't like the results. Drmaik 05:46, 19 June 2007 (UTC)[reply]
In the quartiers nord of Marseille, as in Alsace, everybody speaks French in their daily lives; even the people who speak Arabic or Alsatian at home speak French only, or almost only, outside the home. In fact, in 27 years spent in France, the one time I met someone who wasn't able to speak French was the Indian owner of an internet café in an immigrant neighborhood of Paris, who spoke very little French. And that was really odd (everybody complained: how can you run a business in France and not speak French? Actually, the people who complained the most about this guy were the black immigrants from French-speaking Africa). That's the only time. So 99% is certainly not far-fetched. Besides, remember that 1% of the French population is still 640,000 people, which is a lot of people not being able to speak French. Godefroy 12:53, 19 June 2007 (UTC)[reply]
Hi. I wasn't saying these other people can't speak French, or even that they don't speak it very well, but was talking about their first language. And actually in the quartiers nord lots of people do speak more Arabic/Berber in their daily lives than French, even though that doesn't mean they don't speak good French. I have lived in France myself, and I had no trouble communicating with people there, but I never considered myself a francophone (i.e. a mother-tongue speaker). But there are a large number of other English people there, in Normandy and Brittany, who speak almost no French (my apologies!). In any case, I know we're not debating that, nor who exactly speaks what in the quartiers nord, but comparable definitions of speakers are much trickier than the already tricky issues concerning the mother tongue.

Hey there... Godefroy sent me a wiki-mail to join this particular conversation. It makes me upset when I see a foreign Wikipedian who tries to convince others that in France there are, like in the US, people who are not able to speak the national language.
For my part, I have moved around a lot within my native Alsace, and on this specific point it brooks no argument that people in Alsace speak only FRENCH in their daily lives. Nevertheless, I have to add, and recognize, that after a long working day, or during a family lunch at the weekend, some Alsatian people will speak Alsatian together in a restaurant; but this does not mean that they are only able to speak Alsatian. Alsatian is just a local dialect, a family one. To work, to fill in administrative paperwork, to go shopping or for everything else, French is THE language. Moreover, there are fewer and fewer people who are able to speak Alsatian correctly, certainly because it is outdated and not spoken daily. So to conclude, a person living in France has to speak French... there is no way he could live speaking only a dialect. Paris75000 14:12, 19 June 2007 (UTC)[reply]

Paris75000, you said of me: "I see a foreign Wikipedian who tries to convince others that in France there are, like in the US, people who are not able to speak the national language." Well, I did not say that. Please read comments before making such accusations. Thanks a lot. Drmaik 05:22, 20 June 2007 (UTC)[reply]
Like I said before the list as it stands now is completely flawed because for France the Ethnologue counts only 51 million speakers of French, which is just 83% of the population of metropolitan France, whereas for Germany the Ethnologue counts 75.3 million standard German speakers which is 91.5% of the German population, and for Italy they count 55 million standard Italian speakers which is 94% of the Italian population. How can there be a larger proportion of standard German and Italian speakers in these two countries than the proportion of French speakers in France, when actually large parts of the population in Italy and Germany are dialect speakers and not standard German or Italian speakers? It just shows that the Ethnologue data are not credible and should not be used for this article. Godefroy 17:08, 19 June 2007 (UTC)[reply]
Godefroy makes a good point, these ethnologue figures are inaccurate. It would be reasonable to simply count the entire population of metropolitan France and the DOM-TOMs as 100% French speaking. These people all speak French, as will their children over the age of 24 months. I live in Alsace and I don't know a single Alsatian who doesn't speak French. Does anyone have sources to the contrary? Metropolitan France + the DOM-TOMs (Guadeloupe, Guyane, Martinique, Réunion) + the territoires d'outre-mer = 64.5 million French speakers in France. As for the African nations, we'll have to be more careful because significant portions of their populations don't speak French. However, including Canadian francophones, Belgian and Swiss francophones, we are already at approximately 77 million. This is not individual research, it's just common sense. We must find more reliable statistics which better represent what is obvious to anyone who has studied francophonie. --84.101.142.131 09:11, 7 November 2007 (UTC)[reply]

Is Ethnologue a reliable source ?

The figures given in this page are not up to the standards of Wikipedia. Several figures appear to be contentious. Why the choice of Ethnologue as a primary reference? Is it a politically motivated one? Ethnologue is quite partial and not based on accurate statistics. If we look at Wikipedia in Chinese, Spanish or other languages we can see quite different rankings that match each other. Obviously, there the sources come not only from Ethnologue, but also from the UN and several UN agencies, national statistics and the CIA Factbook. Ethnologue could be cited, but not used as the primary reference. It just lowers the standard of quality and gives a partial point of view.

Original research

There are many problems with this article, but I can't see how original research is one of them. It's mainly cited, and if you read WP:OR you'll see that we are meant to create a synthesis. Please quote something from the policy if you disagree. Otherwise I'll remove the tag. Drmaik 13:25, 4 June 2007 (UTC)[reply]

Spanish native speakers according to Encarta

The Spanish native speakers figure according to Encarta is 322 to 358 million people in the world, not only 322. The 322 figure is incorrect. See Encarta: [12], Encarta/Spanish Language, Encarta/VI.Languages of the World/The 10 most widely spoken languages

No, 322 is correct. Check the reference at the top of the column. For one column, we use one source, and that source is the one I've repeatedly referred to. I can also find a figure (referenced above) of 206 million for Arabic for Encarta, but it's not appropriate to put that in, as it's not from the same page. And we're using that link because it lists all languages above the top 10, rather than just the top 10. Drmaik 11:44, 7 June 2007 (UTC)[reply]

The list is only a summary, but the correct figure is in Encarta's definition of Spanish, and it is still a single source because it is Encarta's own figure, not another source. Spanish is underestimated, because 322 is not the same as 322 to 358.

The figure of 322 to 358 is more in line with Ethnologue [13] and with CIA figures (see World figures). In both sources, Spanish is second in native speakers.

Yes, but that's the list referred to. I'm making the variable figure clearer in the article. Does that keep you happy? And in any case Spanish is listed second as is. Personally, I doubt that this is correct, but that's no justification for me to change it! The sources chosen say 2nd, so 2nd it is! (BTW, that's very old ethnologue data you refer to: it's from the 1996 edition). Oh, and please sign your contributions with four tildes. Drmaik 09:46, 8 June 2007 (UTC)[reply]

Has anyone actually read the references listed?

One of the references states that Spanish will be the fourth most spoken language in the world after Chinese, English and Indian! This reference is clearly not suitable, as Indian isn't even a language.

Lo cual lo que lo sitúa como cuarta lengua del mundo después del chino, el inglés y el indio.

Another reference states that Spanish will be the number one language of the USA in 50 years' time. Where did you people get these references from? Also, according to the same reference, "El castellano ocupa el cuarto lugar después del mandarín, más de 1000 millones; el inglés 500 millones y el hindi 497 millones"; in other words, Spanish occupies fourth place after Mandarin, English and Hindi!

These references seem to be saying that Spanish is the fourth most spoken language yet the people that have included these articles as references have it listed as second in this article. What's going on?

Link

strange behaviour: you question these sources (which are provided as other estimates, rather than as a basis for ranking), and then seem to use them to demote Spanish to 4th place, while the ranking is based on the ethnologue. And you add Spanglish later on... What am I to conclude? Drmaik 09:20, 8 June 2007 (UTC)[reply]

Why ranking languages by native tongue speakers?

Native speaker data are inherently misleading. See what I wrote a few sections above. It would make more sense to rank languages by number of total speakers today (i.e. people who use the language most in their daily lives today; whether they are native or 2nd language users is irrelevant). Or better still, let's rank languages in alphabetical order. Ranking languages by native speakers will always lead to feuds and disagreements. Godefroy 15:08, 18 June 2007 (UTC)[reply]

So will your proposal; you can't satisfy everyone. Enlil Ninlil 19:37, 8 July 2007 (UTC)[reply]

Proposal

Ok, this is my proposal. We shouldn't rank the languages by number of speakers. This can only lead to permanent feuds, because the data are just too poor and inconsistent. What we should do is list languages in alphabetical order, which is more neutral. Then for each language we should present speaker figures from various sources. We should not make the Ethnologue figures prominent, as there are serious credibility problems with the Ethnologue (read what I wrote a few sections above). What do people think about this? Please vote (support, oppose, or neutral), and state the reason for your choice. Godefroy 17:23, 19 June 2007 (UTC)[reply]

Well, there's already an article List of languages by name, please go and edit that if that's what you want to do. This is a different article. Drmaik 05:21, 20 June 2007 (UTC)[reply]

Support

  1. Support - I've already stated why. Godefroy 17:23, 19 June 2007 (UTC)[reply]
  2. Support, the best source for this list is Ethnologue, which is at best outdated. --Pejman47 18:32, 19 June 2007 (UTC)[reply]
  3. Support, this list is pretty messy, that's why I clearly support its deletion Paris75000 18:48, 19 June 2007 (UTC)[reply]
    1. Comment This is not the place to discuss deletion: that's AfD, which is not done on the talk page. (later) Interesting to note that the two supporters besides the proposer were contacted on their talk pages encouraging them to vote. I wonder on what basis they were chosen? Drmaik 05:21, 20 June 2007 (UTC)[reply]
  4. Support, as long as other sources besides Ethnologue are given--its data has been debated numerous times. Perhaps someone leaving a note to the effect of "We know Ethnologue is outdated, but it's the most complete source we have" on the talk page would help clarify the situation. -- Sectori 03:39, 23 June 2007 (UTC)[reply]
  5. Support Our standards have not fallen so low that we start comparing ourselves with trash like Encarta. The rankings here have severe problems. For example, there probably aren't more than a couple of hundred million native speakers of Mandarin Chinese. deeptrivia (talk) 04:43, 5 July 2007 (UTC)[reply]
  6. The ranking will always be debated regardless of the number and the credibility of the sources used. GizzaDiscuss © 02:56, 6 July 2007 (UTC)[reply]
  7. support I fully agree with the reasons stated by Godefroy. The Ethnologue source is completely flawed since the criteria vary according to the language. A serious encyclopedia cannot rely solely on this source.
  8. support Ethnologue is not a reliable source at all. For example, when considering Italy it states that a substantial part of the population is not a native speaker of Italian! FALSE and MISLEADING! I'm Italian and I can assure you that in Italy people, when speaking INFORMALLY, can use regional words and accents, but they are able to understand and speak standard Italian NATIVELY! Then, these data penalize FRENCH a lot: recent data show there are about 200 million 1st or 2nd language speakers of French (and 2nd language is a different thing than "foreign language"), francophone Africa counts more native French speakers every year, and if we count speakers of French as a foreign language this figure rises to 500 million (source: official site of the French Government). PS: anyway, the Italian Wikipedia solution of listing a number of different charts based on different sources is a very good idea. --Easyboy82 23:37, 10 July 2007 (UTC)[reply]
    yeah, but the most useful list is still the Ethnologue one. The others go to rank 17 at most. We cannot reproduce ten lists with 200 entries each, that's madness. You'll just have to accept that the nature of language is fractal as it were: there is no way of defining it objectively, but linguistic communities are still a reality that can be charted in this way. The problem is that there is no better source than Ethnologue, for all its shortcomings. If there was a better source, we'd use it, but as it is, we have to be thankful we at least have the SIL data. dab (𒁳) 13:14, 31 July 2007 (UTC)[reply]
  9. support. Ethnologue sources are not credible and are ridiculous! Ethnologue sources destroy the "List of languages by number of native speakers" (Ethnologue is not a neutral source but only an American point of view). Please help and save this topic about languages in the world. Oldealliance 06:57, 3 August 2007 (UTC)[reply]

oppose

  1. Oppose This is the kind of list that encyclopedias (e.g. Encarta) have. It's useful information. Personally, I quite like the way the one in Spanish Wikipedia is arranged, but that system will only work for the top few languages, as there are very few sources for less-spoken languages. It's also quite a lot of work. The only list that I know that goes as low as 10 million is the Encarta one. Drmaik 05:21, 20 June 2007 (UTC)[reply]
    The ranking here is not based on Encarta. It is based on the Ethnologue, which has serious credibility problems. Godefroy 12:53, 20 June 2007 (UTC)[reply]
  2. Oppose - I oppose deletion wholeheartedly. How about a new proposal: let's put in all available estimates and list languages by the lowest credible estimate available? Cholga 00:19, 22 June 2007 (UTC)[reply]
  3. Oppose It's not that I don't think this article needs a serious change, but I don't think that the proposed replacement has much useful merit. It would be difficult to interpret, and little use to visitors who simply want to compare language statistics.
    1. Equally though, I don't like the page as it is. There is already a page for Ethnologue list of most spoken languages, so this page relying heavily on those statistics is unnecessary (not to mention the debatable merit of the Ethnologue itself). I would be all for a list of speakers of a language, period; native or not. If such figures could be found, it'd be a more favourable replacement. Patch86 15:52, 30 June 2007 (UTC)[reply]
  4. Oppose It seems over-optimistic to me that putting the list in alphabetical order would really deter the nationalists. Furthermore, the basic question "which languages are the biggest?" is a valid question which any self-respecting encyclopaedia SHOULD attempt to answer. The fact that the available sources are not as reliable as we would like doesn't invalidate the question. I personally think the native-speaker criterion is the most meaningful; I don't really understand why some people have a problem with it. Jameswilson 23:01, 5 July 2007 (UTC)[reply]
  5. Oppose If you list it that way you're still presenting the same data, so you have to ask yourself: in an article entitled "List of languages by number of native speakers", would it be more useful and logical to present it in descending order or alphabetically? Frankly, I think most people are coming to this page to get a general idea of the size order of the world's major languages, so alphabetical order would only be counterproductive in presenting the data in the clearest manner. As long as there is a disclaimer at the top saying that all numbers are necessarily rough estimates, there is nothing wrong or POV with presenting the data in such an order. Joshdboz 23:13, 7 July 2007 (UTC)[reply]
  6. Oppose - I'd like to echo what others say - this list is deeply problematic, but it's still basically worth doing, and the way to make it better isn't to give up on it. john k 15:50, 9 July 2007 (UTC)[reply]
  7. Oppose - I oppose for the same reason as everyone else. --Stefán Örvarr Sigmundsson 03:47, 17 July 2007 (UTC)[reply]
  8. Of course oppose - As per Jameswilson and my own personal reasons, I have referred to this page countless times for projects and so forth. Max Naylor 20:32, 17 July 2007 (UTC)[reply]
  9. oppose obviously. This list is useful, it just has to be very clear about its own shortcomings. But we shouldn't let the rank numbers run beyond the 10 M tier: that's supremely pointless. dab (𒁳) 13:10, 31 July 2007 (UTC)[reply]

neutral

Punjabi

Punjabi is ranked at 60 million, the "Western Punjabi" statistic. There are 20 million "Eastern Punjabi" speakers, and that number is not included in the total Punjabi native speakers count. Western and Eastern Punjabi are the same language, separated by a political border (the Indo-Pakistan border). They may differ slightly, but that is to be expected with very similar dialects of the same language. To make an excellent analogy, this is like the variety in Spanish between, say, Spain and Mexico. I think Punjabi should be ranked by the 80 million because of the small difference (mostly religion/politics based) between Western and Eastern Punjabi, and because of the precedent set by including all or most dialects of Spanish under one umbrella figure. Thank you, user:Virsingh 68.38.87.65 12:10, 9 July 2007 (UTC)[reply]

I agree that Punjabi is generally considered to be a single language, whatever Ethnologue may say about it. john k 15:48, 9 July 2007 (UTC)[reply]

I'm a little new to Wikipedia... how would I go about gaining legitimacy for moving Punjabi to the 87 mil mark? Do I wait a few days and see if it's unanimous, or do I make a heading saying Proposal and then see if people agree? Virsingh 21:24, 9 July 2007 (UTC)[reply]

in the interest of consistency, we should not rank "Hindi" according to the narrow Ethnologue definition, and then list Punjabi by a wider definition than Ethnologue. Delineating language boundaries will always be arbitrary, and there is no point in debating this here, we have to take some sort of scheme and stick to it. This article follows Ethnologue, which isn't a great source for this sort of thing, but the best we have readily available. By the same token, we would have to list German not at Ethnologue's "narrow" 95 M, but at the wider 123 M. We have to recognize that there simply is no way of doing this objectively, and that people will be unhappy with the list no matter what we do. dab (𒁳) 13:06, 31 July 2007 (UTC)[reply]

Ordering of some of the Amharic and Hausa.

In reviewing the information presented for Amharic and Hausa, it would seem that the order presented on the list is wrong any way you look at it. I am no expert on the subject, but the list says Amharic is ranked 36 with 17.4 million (ethno), 27 million native (32.7% Ethiopia [1994 census] and 2.7 million emigrants), 10% (7 million) as a second language = 34 million total, and Hausa is ranked 42 with 24.2 million speakers, plus 15 million second language speakers for a total of 40 million. Hausa has more first language speakers and more first+second language speakers than Amharic. Perhaps we should go through all the rankings to make certain the rankings given match the numbers in the explanation. 70.189.39.195 04:09, 30 July 2007 (UTC)[reply]
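The check suggested above (do the listed ranks match the numbers?) can be sketched in a few lines of Python. This is only an illustration: the function name `find_inversions` and the tuple layout are made up here, and the only data used are the Amharic and Hausa figures quoted in the comment above (SIL native figures, and the quoted first+second language totals).

```python
from itertools import combinations

def find_inversions(rows):
    """Return (better_ranked, worse_ranked) pairs where the worse-ranked
    language actually has MORE speakers than the better-ranked one.

    rows: list of (language, listed_rank, speakers_in_millions).
    """
    inversions = []
    for a, b in combinations(rows, 2):
        # 'better' is the entry with the smaller rank number.
        better, worse = (a, b) if a[1] < b[1] else (b, a)
        if worse[2] > better[2]:
            inversions.append((better[0], worse[0]))
    return inversions

# Figures as quoted in the comment above (millions).
sil_native = [("Amharic", 36, 17.4), ("Hausa", 42, 24.2)]
quoted_totals = [("Amharic", 36, 34.0), ("Hausa", 42, 40.0)]

print(find_inversions(sil_native))     # Amharic is listed above Hausa despite fewer speakers
print(find_inversions(quoted_totals))
```

Running the same function over every row of the table would flag each pair that is out of order for whichever column is chosen as the ranking basis.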

Follow up on the rankings on the Amharic/Hausa

I re-ranked the languages in this list with 10-30 million speakers based on the two different estimates given in the table. The first set is based on the SIL estimate, the second set is based on the other information box and the third set is the rankings as originally presented, for comparison. For the second estimate I used the estimate of 'native' speakers only, if given in the 'total speakers' column. I used the upper value when a range of values was presented. Finally I give the original ranking. All columns are sorted in descending order based on the estimate.

Language        SIL estimate   Language        2nd estimate   Language        Rank in list
Sundanese       27.0           Kurdish         31.4           Amharic         36
Romanian        26.3           Sindhi          28             Sundanese       37
Sindhi          24.5           Amharic         27             Romanian        38
Hausa           24.2           Sundanese       27             Kurdish         39
Malay           23.6           Romanian        26             Dutch           40
Pashto          22.8           Dutch           25             Pashto          41
Uzbek           20.1           Pashto          25             Hausa           42
Dutch           20.0           Hausa           24             Indonesian      43
Yoruba          20.0           Oromo           24             Oromo           44
Igbo            18.0           Tagalog         22             Tagalog         45
Amharic         17.4           Uzbek           20             Uzbek           46
Oromo           17.2           Yoruba          19             Sindhi          47
Indonesian      17.1           Lao             19             Yoruba          48
Tagalog         17.0           Cebuano         18.5           Somali          49
Assamese        15.4           Malay           18             Lao             50
Nepali          15.0           Igbo            18             Cebuano         51
Cebuano         15.0           Indonesian      17.1           Malay           52
Hungarian       14.5           Serbo-Croatian  17             Igbo            53
Shona           14.0           Malagasy        17             Serbo-Croatian  54
Zhuang          14.0           Somali          16             Malagasy        55
Madurese        13.7           Nepali          16             Nepali          56
Sinhalese       13.2           Assamese        15             Assamese        57
Greek           12.0           Shona           15             Shona           58
Czech           12.0           Khmer           14             Khmer           59
Fula            11.4           Zhuang          14             Zhuang          60
Serbo-Croatian  11.1           Madurese        14             Madurese        61
Malagasy        10.5           Hungarian       14             Hungarian       62
Somali          9.8            Sinhalese       13             Sinhalese       63
Quechua         8.3            Fula            13             Fula            64
Khmer           8.0            Tamazight       65             Tamazight       65
Kazakh          8.0            Haitian Creole  66             Haitian Creole  66
Haitian Creole  7.8            Czech           12             Czech           67
Kurdish         6.0            Greek           12             Greek           68
Tamazight       3.5            Kazakh          12             Kazakh          69
Lao             3.2            Quechua         10.4           Quechua         70
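For anyone wanting to repeat the re-ranking described above, here is a rough Python sketch of the procedure (take the upper value when a range is given, then sort in descending order). The helper names `upper_value` and `rerank`, and the "low to high" range format, are assumptions made for illustration; the sample figures are the first few SIL estimates from the table, so the ranks printed are relative to that subset only.

```python
def upper_value(text):
    """Parse an estimate such as '27.0' or '322 to 358' (millions),
    returning the upper end when a range is given."""
    return float(text.split("to")[-1])

def rerank(estimates, first_rank=1):
    """Sort languages by estimate, largest first, and number them.

    estimates: dict mapping language name -> estimate string.
    Returns a list of (rank, language, value_in_millions).
    """
    parsed = {name: upper_value(text) for name, text in estimates.items()}
    ordered = sorted(parsed.items(), key=lambda kv: kv[1], reverse=True)
    return [(first_rank + i, name, value) for i, (name, value) in enumerate(ordered)]

# A small subset of the SIL column above, deliberately given out of order.
sample = {"Amharic": "17.4", "Hausa": "24.2", "Sundanese": "27.0", "Romanian": "26.3"}
for rank, name, value in rerank(sample):
    print(rank, name, value)
```

Ties (e.g. two languages at 20.0) keep their input order here, since Python's sort is stable; a real list would need an explicit rule for them.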

Statistics

Do the statistics include languages of Arabic origin and other Arabic languages used among Arabs?

like

Literary Arabic language

Bathari Arabic language

Harsusi Arabic language

Maltese Arabic language

Tigrinya Arabic language

Jibbali Arabic language

Mehri Arabic language

Soqotri Arabic language

Darija Arabic language

Where's Luxembourgish?

??? --MosheA 21:27, 3 August 2007 (UTC)[reply]


Isn't this list original research from beginning to end? People move languages up and down as it suits them, almost never giving any motivations or any sources to justify why one language has certainly acquired a substantial number of new speakers or tragically lost a few million. And what's with Serbo-Croatian? According to this list, there are 11 million who speak Serbo-Croatian. That's the number of speakers for Serbian, so either the language should be called Serbian, with separate listings of Croatian and Bosnian, or the speakers of all three languages formerly known as Serbo-Croatian should be put in, resulting in well over 20 million. JdeJ 18:48, 10 August 2007 (UTC)[reply]

The logical explanation is that it was probably listed as Serbo-Croatian at one time, then a user decided they should be separate, changed it to Serbian and reduced it to 11 million speakers; then a second user switched it back to Serbo-Croatian again but forgot to rechange it to 20 million. You're probably the first person to notice the mistake. Lyreto 05:40, 11 August 2007 (UTC)[reply]
It seems as if for every language we have the Encarta estimate, the Ethnologue estimate and then something called "other estimates". It almost defies belief, but it seems like the ranking is done on these other estimates. So anybody can put in whatever number they want and then the ranking is done on that. Although both Encarta and Ethnologue are external sources, these are disregarded in favour of these mysterious "other estimates". I'll start deleting them in accordance with WP:OR unless sources are given for them. JdeJ 19:14, 10 August 2007 (UTC)[reply]
I think you are being a bit harsh there, JdeJ. The point is that the Encarta and Ethnologue figures are so often wrong. They are just a starting point awaiting improvement by those who have access to better info on a given language. It is right that the "other estimates" column should be treated as the best estimate.
Actually the ranks are supposed to be based on Ethnologue, but it appears no one's had the time to look at the lower-ranked languages. Lyreto 05:40, 11 August 2007 (UTC)[reply]

It is the net result of all the modifications which various contributors have suggested (based on national censuses or whatever) and others have debated on the talk page over a period of several years now. Until the recent clampdown on unsourced statements many of us often didn't bother to state the sources on the main page (if it was in a little-understood language, for example, or fully sourced on the wiki page for the language in question), but you shouldn't assume the figures weren't subject to scrutiny by those involved in the particular discussion at the time. Granted, sometimes people do come along and stick in any figure they want, but those are generally weeded out immediately. Jameswilson 23:18, 10 August 2007 (UTC)[reply]

JdeJ, there isn't a team of editors working on this page. There is basically only one editor, who is Drmaik and he can't keep up with every anonymous change. So you're basically arguing to an empty room! Also, I'm not sure what adding fact tags will accomplish, those figures probably were added years ago. Lyreto 05:40, 11 August 2007 (UTC)[reply]

Lack of consistency

Seems like there are many different ways to do the ranking here. For some languages, only native speakers are counted. For others, second language speakers are included as well. Some consistency please, we can't have it both ways. Almost by definition, the versions including second language speakers should be deleted. It's hard enough to arrive at a good estimate for native speakers; estimating the number of people who have learned it well enough to be considered second language speakers is completely impossible and original research in each and every case. JdeJ 19:10, 10 August 2007 (UTC)[reply]

Well, I think I've checked the top 33 or so, which are based on the Ethnologue ranking, which was a narrowly-carried decision some time ago. Even maintaining that takes quite a bit of work (well, for me anyway!), as there are several people out there who want to inflate the status of their language. Below 33 (maybe down to 50?) I'm not sure that there's much point in ranking languages. This list should have a lower limit: this is not list of languages. 10 million anyone? Top 50? And yes, unsourced 'other estimates' should be removed soon-ish. Drmaik 17:30, 12 August 2007 (UTC)[reply]
All of it sounds good. The further down the list goes, the harder it becomes to get any precise data. Top 100 might still be an idea, but not any more than that. I must admit to being very sceptical about Ethnologue as a source. Being an academic within sociolinguistics myself, I know much too well how often they get things wrong. Everybody does it sometimes, and I can understand it with small languages, but they make such severe mistakes even with big languages at times that it almost defies belief. Of course, the alternatives are a bit limited... ;) Using the list provided by Encarta for languages between 10.000.000 and 50.000.000 might be a good idea. Above that, the Ethnologue figures are probably more up to date.
I'd propose to simply remove the column with other estimates. We'd also need to decide for how to do with some languages that are sometimes counted as dialects. It's pretty clear that almost every person living in Lombardy is counted both as a native speaker of Lombardian and Italian. I doubt any census has ever been made on the number of people who speak Lombardian, nor is it known to what degree the speakers actually use it in everyday life.
Another thing that would be urgently needed is to check the factboxes for at least the major languages. Some of them are mindblowing. :) Looking at Italian language, we are informed that there are 120.000.000 native speakers of Italian and 200.000.000 speakers in total. Where are they hiding? The Italian diaspora is very large, but not that large, given that many (the majority) are no longer native speakers of Italian. Same thing with Greek language, up to 25.000.000 native speakers. Most European languages (sorry for being more ignorant about languages outside Europe) do have sourced factboxes, but some feature these fantasy figures. JdeJ 19:26, 12 August 2007 (UTC)[reply]
well, I think the 'other estimates' column has its place: for census data, for example. Doesn't the encarta data get more similar to the ethnologue the further down one goes? They state SIL as the source, which would explain that (though the Arabic figure in Encarta is just bizarre). In any case, I'd like to limit it to Top 100. I agree on factboxes, but there tend to be edit wars over such things. Drmaik 20:10, 12 August 2007 (UTC)[reply]

ok, the problem, as is recognized by everybody:

  • SIL Ethnologue is often outdated, its estimates are rather conservative, and it tends to classify as separate languages what are usually merely considered dialects
  • SIL Ethnologue is the only source we have that aims at providing statistics for all the world's languages.

The task of sorting we are facing here is of a different nature for large languages than it is for small ones. For smaller languages, say, below 5 million speakers, we'll have no choice but rely on Ethnologue. The question is, should we sort by other, more reliable sources for large languages? I have introduced a "Top 20" tier now. That's as arbitrary as any other tier structure, but it happens to nicely cover languages with 60 million native speakers or more, and it dovetails with our Ethnologue list of most spoken languages article. Now, for the people unhappy with SIL ordering, how about focussing on this "Top 20" tier exclusively for now? You can gather the most up-to-date statistics and estimates, and we could re-order it according to these. Once you have done that, you can attack the 30-50 M tier, and maybe even the 10-30 M one, at which point you should stop and leave the smaller languages to SIL. I think that if this is carried out cleanly for all languages in a tier, and not just for a single favourite, this will be uncontroversial. As such, this isn't a content dispute so much as a suggestion for improvement. Since the problems and possible solutions seem well defined, and admitted by the article itself, what remains is really mostly somebody sitting down and sacrificing the time necessary to do this properly. --dab (𒁳) 14:29, 14 August 2007 (UTC)[reply]

I think your idea is worth trying: we do have the Ethnologue list of most spoken languages already, so this wouldn't need to have the same ranking. The biggest problem is with the big languages, which tend to be spoken in lots of countries. So I would propose that we discuss changes of figures here, and once we've done a few, we could change the ranking basis. It does sound like a lot of work, but I don't mind working with others on it. But I think we need ideally 3 doing it.

Drmaik 14:46, 14 August 2007 (UTC)[reply]

(later) My only worry is that the current system is easy to administer, and seems to have reduced the amount of single-issue fact-changing. The more complex we go, the more open to abuse this whole thing is. Some criteria might help, e.g. census data being the most valued, then studies by serious research bodies (e.g. look at Languages of France for some well-grounded data for France, produced by a French research body (but contested by some French editors, basically because they want to inflate the figure)). Drmaik 06:26, 15 August 2007 (UTC)[reply]
This sounds like a very good idea. Having spent a number of years studying the sociolinguistics of smaller languages in particular, I'd be happy to help out with those below 10M. Of course I'll be happy to help with the bigger ones as well :) JdeJ 17:14, 14 August 2007 (UTC)[reply]
Definitely. I like the idea of a top 50 or 25; or even by >50 million. I'd avoid using Ethnologue altogether. As someone who isn't a linguist, I don't know how respected it is, but the figures are absurdly out of date. 1984 population for the U.K.? 1970 census for American Samoa? 1999 for world English figures? It's crap, pure and simple. I think a list of total speakers might be more useful; you could have one column for native, one for second language, and one for total. I don't know how difficult that would be. It seems we only have either 'official language' or 'native speakers', which doesn't actually tell you how many people speak which language. Iorek85 00:56, 15 August 2007 (UTC)[reply]

I've just worked on linguistic demography, which should be the article discussing the difficulties involved here. We have to recognize that, for major languages, it will not be possible to get an estimate that is accurate to better than perhaps 10%. This isn't so bad, but of course it makes it impossible to give a reliable ranking. The only thing we can do is give "rowspan" rank ranges. E.g., there is no way to be positive on whether Russian or Portuguese has more speakers, but it is clear that together they rank as 7th and 8th, after Bengali and Arabic, and before Japanese:

Rank    Language           Comrie (1998)   Weber (1997)   SIL
1.      Mandarin Chinese   836             1,100          873 (1990s)
2.-4.   Hindi+Urdu         333             250            364 (1997)[1]
        Spanish            332             300            322 (1995)
        English            322             330            309 (1984)
5.-6.   Bengali            189             185            171 (1994)
        Arabic             186             200            206 (1998)
7.-8.   Russian            170             160            145 (2000)
        Portuguese         170             160            178 (1995)
9.      Japanese           125             125            122 (1985)
10.     German             100             100            114 (1990s)

--dab (𒁳) 12:20, 15 August 2007 (UTC)[reply]
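As a rough illustration of how such rank ranges could be derived mechanically, here is a Python sketch that sorts the estimates and treats neighbouring languages as tied whenever the gap between them is within a chosen relative tolerance. The function `rank_ranges` and the tolerance value are assumptions made for this example, not something taken from the sources; the figures are the Comrie (1998) column of the table above. With a 5% tolerance the grouping happens to match the table; a different tolerance, or a different column, can give different groups.

```python
def rank_ranges(estimates, tolerance):
    """Group languages into rank ranges.

    estimates: dict mapping language -> speakers (millions).
    tolerance: relative gap (e.g. 0.05) below which two neighbouring
               languages in the sorted list are treated as tied.
    Returns a list of (first_rank, last_rank, [languages]).
    """
    ordered = sorted(estimates.items(), key=lambda kv: kv[1], reverse=True)
    groups = []
    for name, value in ordered:
        if groups:
            prev_value = groups[-1][-1][1]
            # Join the current group if the drop from the previous entry is small.
            if prev_value - value <= tolerance * prev_value:
                groups[-1].append((name, value))
                continue
        groups.append([(name, value)])

    ranges, rank = [], 1
    for group in groups:
        last = rank + len(group) - 1
        ranges.append((rank, last, [name for name, _ in group]))
        rank = last + 1
    return ranges

# Comrie (1998) column from the table above (millions).
comrie = {
    "Mandarin Chinese": 836, "Hindi+Urdu": 333, "Spanish": 332, "English": 322,
    "Bengali": 189, "Arabic": 186, "Russian": 170, "Portuguese": 170,
    "Japanese": 125, "German": 100,
}

for first, last, names in rank_ranges(comrie, tolerance=0.05):
    label = f"{first}." if first == last else f"{first}.-{last}."
    print(label, ", ".join(names))
```

Note that this compares neighbours only, so a long chain of small gaps would be merged into one wide range; that is a design choice of the sketch, and other tie rules are equally defensible.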


Rank 2 of Hindi

Hi All,

I have calculated the number of native speakers of all the dialects of Hindi as given in Ethnologue tree for Hindi [14] and have come up with this 2nd rank.

Please feel free to recompute again.

Previously some prejudiced and unknowledgeable people had put the number of only one dialect of Hindi (Khariboli), which was only 181 million.

I hope this ends the speculation about the rank of Hindi, as the Ethnologue data clearly shows Hindi as No. 2 among the languages of the world.

thanks and regards,

Bdebbarma —The preceding signed but undated comment was added at 17:27, August 24, 2007 (UTC).

Somebody switched it back. The page Ethnologue list of most spoken languages shows that the Ethnologue figure for Hindi only includes the Khariboli dialect. Also, there are other discussions on this talk page which are about the Hindi figure. Apparently, Ethnologue says that Khariboli is the same thing as Hindi, and gives figures for two other dialects. Whether Khariboli should be called Hindi or not, these dialects are not mutually intelligible, and are listed separately. Someone the Person 21:33, 27 August 2007 (UTC)[reply]

indeed. It's a matter of definition, read the Hindi article, and see the disambiguation notice right on top. Hindustani is a dialect continuum, and there is no objective way of saying how many languages there are. Your "Ethnologue tree" lists all of the Indo-Aryan languages. --dab (𒁳) 17:40, 31 August 2007 (UTC)[reply]

English language figure too low

English is spoken by way more people than this article suggests. I don't speak English at home with my parents but consider myself a native English speaker, as I use it all the time when I leave my house. I think that it's incorrect to think that bilingual children are not native English speakers just because they use another language at home with their parents. —Preceding unsigned comment added by 210.49.197.7 (talk) 01:28, 21 October 2007 (UTC)[reply]

The main problem is that Ethnologue is using ancient census data for their numbers, even in their 2005 version. For example they cite a 1984 estimate of U.S. English-speaking population, which is now about 30 million too low. --Delirium 18:45, 28 October 2007 (UTC)[reply]

Improving the page

Why doesn't someone add an extra column that would contain the total number of people who live in countries where the particular language in the row is official, regardless of whether they are native speakers or not? For example, if you add up the number of people who live in all the countries where English is official, you get a figure that gives you a ranking that actually represents the status of the language today. —Preceding unsigned comment added by 210.49.197.7 (talk) 01:34, 21 October 2007 (UTC)[reply]

Excellent idea!!!

Robledo —Preceding unsigned comment added by 201.73.79.20 (talk) 12:17, 22 November 2007 (UTC)[reply]

Spanish number too high

People that speak Mixtec, Quechua, Guarani, Aymara and so on as a native language should not be counted as Spanish speakers. —Preceding unsigned comment added by 210.49.197.7 (talk) 21:42, 31 October 2007 (UTC)[reply]

That's the problem with using Ethnologue as a primary source. They tend to lump all these dialects together under one language, in this case... Spanish. This whole article has become a joke. The English numbers are absurdly low, and the Spanish numbers are incredibly inflated. It's almost like some politically motivated *****-fest to see that Spanish ranks higher than English on here. As stupid as it sounds... And the motivation for that is beyond me. Abalu (talk) 09:44, 14 January 2008 (UTC)Abalu[reply]

Arabic too high (please comment)

Of course the Arab people number almost 400 million, but that is their ethnicity, and in my opinion we cannot count for the Arabic language a number of speakers similar to the population of the Arab world. I know that it is very difficult to define a language, but we must consider here what is mutually intelligible. We count as Arabic speakers both a speaker from Morocco and a speaker from Iemen, but if these two meet they can't understand each other. If we count them together, then we must count all the Latin languages together, or at least Portuguese and Spanish (Galician is the same as Portuguese): they share 90% of their vocabulary and are fully comprehensible to both speakers, closer than Moroccan Arabic and Iemenite Arabic. We must also count together Norwegian, Swedish and Danish, as well as Indonesian and Malay! I'm very interested in Semitic languages, but, as a matter of fact, Arabic is not the same language everywhere, and if we consider MSA, then we must consider that Portuguese, Spanish and Italian are very close when written; the same holds for Norwegian, Swedish and Danish, as well as Indonesian and Malay. Post your opinion!

Thanks Robledo —Preceding unsigned comment added by 201.73.79.20 (talk) 12:15, 22 November 2007 (UTC)[reply]

What you suggest sounds rather like original research. The issues are messy, but are discussed in the article. Drmaik (talk) 13:01, 22 November 2007 (UTC)[reply]
First, as said, your claim is original research. I am wondering why we find hundreds of Pan-Arab media. Do we find Italian media targeting Portuguese speakers, or a Brazilian channel speaking Portuguese targeting Brazil and Argentina? Italian and Spanish are languages; they have similarities as other languages do, but what you are describing here are dialects. According to you we should change German too (a German cannot understand a Swiss German). By the way, it's Yemen, not Imen. Bestofmed (talk) 19:01, 7 December 2007 (UTC)[reply]

Look, I'm not trying to discuss what it means to be an Arab. In my opinion, if all the Arab World is one and united, that's great; you have your beautiful traditions and customs, which I myself am very fond of. I'm just proposing separating the varieties of Arabic that are not mutually intelligible, as someone did with the Chinese language in this very same article, which, like Arabic, is a macrolanguage. I believe that when it comes to language the least important thing is how many people speak it. I've seen people here trying only to enlarge the number of speakers of their own language just to get a higher position in the ranking, as if it really mattered at all!!! I believe we must spend some effort to make the article as good as we can! What I pointed out is: if we count all the varieties and dialects or creoles and pidgins or whatever an Arab speaks, it's not the same language if it's not mutually intelligible (my opinion). IF we consider them all together (something like "total Arabic"), THEN we must also consider "total Latin" all together, because Latin people have many things in common too, like cuisine, dance and much more, plus the languages, which are very close. That's all!!! PS: I'm not trying to discuss anything related to ethnicity or the language itself, just proposing to count the numbers in a different manner.

Robledo

Is Cantonese a dialect?

Is Cantonese supposed to be a dialect? If not, you should add more languages to the list. In my hometown, at least half a million people speak the Fuzhou dialect. There are also thousands of dialects in China... —Preceding unsigned comment added by 141.155.107.9 (talk) 23:51, 7 December 2007 (UTC)

No, it is classed as a separate language from Mandarin. If they were dialects, there would be no reason to learn the other. The fact that the languages of China use the same script says nothing about their relationship. Enlil Ninlil (talk) 10:44, 31 December 2007 (UTC)

Izon language

The largest of the Ijoid languages is missing. —Preceding unsigned comment added by 88.195.46.112 (talk) 10:38, 31 December 2007 (UTC)

Someone needs to create a cited article or improve the current ones. Enlil Ninlil (talk) 10:47, 31 December 2007 (UTC)

Altaic?

If the table row for Korean says "considered either language isolate or Altaic," then why doesn't the table row for Japanese say anything about how Japanese-Ryukyuan is sometimes considered a subfamily of Altaic? Someone the Person (talk) 21:18, 5 January 2008 (UTC)

Indonesian and Malay

The respective articles for the Indonesian language and the Malay language both describe the former as a variety of the latter. It seems contradictory and inaccurate to keep them separate in this list. I do not believe there is any political controversy about their being the same language; they are simply given different names for convenience when describing the separate official dialects. GSTQ (talk) 00:34, 18 January 2008 (UTC)

Estimates of total speakers

I added up the high estimates of the first 39 languages and got 7.03×10⁹, so there are at least about 500 million people who speak a second language in addition to their first. 88.195.46.112 (talk) 06:37, 21 February 2008 (UTC)

Tajikistan????

Why does it say that 300 million people speak Tajik? It only lists it as the official language of Tajikistan, whose population is given as only 6 million. The edit was added today. --Whiteknox (talk) 21:52, 7 March 2008 (UTC)

English speakers data from 1984?

http://en.wikipedia.org/wiki/Ethnologue_list_of_most_spoken_languages

It says the English stats (the same 309m) are from 1984. THAT makes sense, but having that here is insane. It casts doubt over the whole article. Stats from that long ago are useless; the figure is only slightly more than the current population of the US alone!

  1. ^ "ethnic population"; SIL divides what is considered "Hindi" by other sources into numerous sub-languages. SIL's "Hindi" is Khariboli only.