Talk:List of languages by number of native speakers

WikiProject Languages (Rated List-class, Low-importance)
This article is within the scope of WikiProject Languages, a collaborative effort to improve the coverage of standardized, informative and easy-to-use resources about languages on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
 List  This article has been rated as List-Class on the project's quality scale.
 Low  This article has been rated as Low-importance on the project's importance scale.


A problem of precise definitions more than accurate numbers

Throughout this article there is confusion as to which columns or numbers refer to native speakers, which to secondary speakers (but those for whom a given language, in this case English, has become their primary language), and others. With English, to continue that example, there are in the UK, USA, etc. large numbers of immigrants, and in countries like India and Ghana, for whom English is the national language, there are large numbers for whom the "second language" has become their primary mode of communication. And then, of course, there are those for whom it is only an "international language" that they speak in order to do business, science, or communication with English-speaking foreigners, both native and non-native.

The problem here is less one of accurate numbers than of more precise definitions of the categories of entities being counted, explicitly laid out and applied as consistently as possible across languages. Fair estimates might be made of each, and that could be useful to someone seeking information on this page.

Spanish-Italian intelligibility

I have already responded to a comment on Portuguese. The fact is that Italian and Spanish are mutually intelligible, not 100 percent of course, but to a large extent. Any Spanish speaker knows that. Why is that issue not addressed? In fact a native speaker of Spanish can understand formal, standard Italian with a high degree of intelligibility. Things change a lot when we speak about regional dialects, informal language, slang, etc., but the standard, formal Italian language is understood to a large extent by Spanish speakers. I guess the same applies to Italian speakers with formal, standard Spanish. I have taken this example from the Italian article. Even if you do not speak Spanish, I think that you can see the similarities. If on top of that you take into account that the sound systems are almost the same, the fact is that communication between both languages is pretty acceptable. Under A is Italian, under B the Spanish translation:

A) La seguente tabella si basa su dati provenienti dalla pubblicazione.

B) La siguiente tabla se basa en datos provenientes de la publicación.

"Languages of the World".[1]

A) Molte delle stime si riferiscono ad anni precedenti il 2010.

B) Muchas estimaciones se refieren a años precedentes 2010.

A) Sono state considerate le prime cento lingue parlate al mondo.

B) Han sido considerados los primeros cien idiomas hablados en el mundo.

A) ordinate per numero di madrelingua.

B) ordenados por número de hablantes nativos.

I think you can see the extreme similarities between both languages. Often it is a question of style to use the same or other words, but the same words exist in both languages for the same or similar meanings in many cases.

I think you forgot
A) I sindaci dei villaggi si sono detto: mai più!
B) Los alcaldes de las aldeas se han dicho: ¡Nunca más!
Not that similar. But joking aside, you are of course right. The problem is that while most parts of this article are rather well sourced, almost no claim of mutual intelligibility is sourced. It's not about whether it's true or not, it's about whether it's sourced. If nobody objects, I'll remove the claims that aren't sourced. Jeppiz (talk) 18:21, 18 May 2015 (UTC)

Even in that case there are words that would be recognized in Spanish. See:

A) I sindaci dei villaggi si sono detto: mai più!

B) Los sindicados de las villas se son dicho: jamas mas! — Preceding unsigned comment added by (talk) 13:33, 20 May 2015 (UTC)

Link to old version

with the disputed Ethnologue or whatever sources, before someone decided to delete the smaller languages from the list:

Swahili (140 million+) is missing, should be No. 9 in the list


Semi-protected edit request on 13 January 2015

Hello, I noticed that the number of Malay/Indonesian speakers is far too low. There are about 270 million native speakers; the Wikipedia page on the Malay language shows the same figure. Almost everybody in Indonesia alone is able to speak Bahasa Indonesia, which is approximately 250 million people. Could you therefore change 77 (for Malay/Indonesian speakers) to 250? This way this page and the Wikipedia page on the Malay language will not contradict each other.

great resource Stickee. The thing that's hard to explain is in Indonesia, people grow up speaking 2 langs: Indonesian, and the island dialect. At home they'll use the dialect, but for all official transactions (reading, writing, or speech), and in big cities, it's all in Indonesian. So really there are two bahasa sehari-hari. It's hard to explain if you come from America, where you only grow up with English. I'd like to point out you interpreted a dichotomy between daily and outside languages, while actually they're the same. Finally, there are actually 197m speakers, page 427/732. I see this is a losing battle, so please, for the fourth time asking: how do I add to the Other Estimates section? If I can't change the number of native speakers (250m), I really really would like to at least add a note about growing up with the dialect/national langs simultaneously. (talk) 04:24, 15 May 2015 (UTC)

How about this as the note: (start)Having only been officialized in 1945, Indonesian is one of the youngest languages in the world[1]. Because of this, Western censuses decided to classify hundreds of millions of its speakers as non-native speakers. However, most Indonesians, when asked if they consider themselves native speakers of Indonesian, answer with a clear "yes"; therefore this number may possibly be much higher at around <bold>268 million</bold> native speakers[2].(end) Is that ok? Please read especially the second ref carefully.Bookracoon (talk) 02:45, 18 May 2015 (UTC)

That's because, once again, you editors didn't answer my simple yes-no questions, nor does it appear that you read my questions carefully. I'm asking to add to Other Notes, yet Jeppiz is adamantly against that? Of course it matters if Indonesia has bilingual citizens; the current main estimate is hundreds of millions off. (talk) 18:10, 13 January 2015 (UTC)

Not done: it's not clear what changes you want to be made. Please mention the specific changes in a "change X to Y" format.  B E C K Y S A Y L E 23:40, 13 January 2015 (UTC)
Thanks for the clarification but still not done: The information here is about native speakers, not about how many are able to speak the language. Jeppiz (talk) 17:18, 20 February 2015 (UTC)
Not done: as clearly explained at the top of the article, we ONLY use the figures in the Swedish Nationalencyklopedin (2007, 2010), so the figures are comparable. No other sources, or opinions, will be used. - Arjayay (talk) 08:06, 14 May 2015 (UTC)
Still not done: This is a list about the number of native speakers and it is based on the figures in Nationalencyklopedin. It is not "random", it is the most neutral and accurate estimation available to us, as explained by kwami in the discussion at the bottom of this talk page. Jeppiz (talk) 17:50, 14 May 2015 (UTC)
While you're right that most Indonesians can speak Indonesian (155 million), only 43 million Indonesians are native speakers according to the 2010 census (table 30.9). Stickee (talk) 22:23, 14 May 2015 (UTC)
Not done: please establish a consensus for this alteration before using the {{edit semi-protected}} template. — {{U|Technical 13}} (etc) 18:10, 15 May 2015 (UTC)
Not done: And please stop this repetitive use. Your request has been answered multiple times, always with "no". What you're suggesting is original research (WP:OR). If Indonesia has a different definition of "native speaker", then that changes nothing at all. If the whole world adopts that definition, we will change the article. As for now, we won't. Jeppiz (talk) 12:29, 18 May 2015 (UTC)

Rank column

A rank (position) column would be useful. Tavilis (talk) 09:40, 12 March 2015 (UTC)

We used to have one. We deleted it because it misinformed: language #20 might have more speakers than #15. — kwami (talk) 23:43, 12 March 2015 (UTC)
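To illustrate the point about ranks, here is a minimal Python sketch, assuming each estimate comes with an uncertainty range. The language names and ranges are hypothetical placeholders, not figures from Nationalencyklopedin or any other source, and `rank_range` is an illustrative helper rather than anything used by the article.

```python
def rank_range(name, ranges):
    """Return (best, worst) possible rank for `name`.

    `ranges` maps a language name to a (low, high) speaker estimate.
    All values used below are hypothetical, for illustration only.
    """
    low, high = ranges[name]
    others = [r for n, r in ranges.items() if n != name]
    # Best case: only languages whose low bound exceeds our high bound outrank us.
    best = 1 + sum(1 for (o_low, _) in others if o_low > high)
    # Worst case: every language whose high bound exceeds our low bound outranks us.
    worst = 1 + sum(1 for (_, o_high) in others if o_high > low)
    return best, worst


# Hypothetical ranges in millions of native speakers.
example = {"A": (90, 110), "B": (95, 105), "C": (50, 60)}
for lang in example:
    print(lang, rank_range(lang, example))
```

With overlapping ranges, A and B can each be first or second, so a single rank number for either would claim more than the data supports. Only C has a fixed position.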

Semi-protected edit request on 21 March 2015

| Ukrainian
Українська || 30 || 0.46%|| Ukraine || Partially mutually intelligible with Russian and Belarusian.|-

Please change "українська мова" to correct form "Українська" in Ukrainian language field.
Source: uk:Українська мова
Ukrainian - Українська, Ukrainian language - Українська мова.
Thank you. Deluxeman (talk) 00:05, 22 March 2015 (UTC)

Well, I am not a Ukrainian, but as a Slavic speaker it sounds extremely odd to use only "Українська" as the name of the language.

"Polski" instead of "język polski" also sounds odd to native speakers. — Preceding unsigned comment added by (talk) 20:35, 26 March 2015 (UTC)

Not done: as your request is contradictory:-
First you asked to change "українська мова" to correct form "Українська"
Then you say Ukrainian - Українська, Ukrainian language - Українська мова.
Furthermore, you have not cited reliable sources to back up your request. Wikipedia is not a reliable source, and without such a source no information should be added to, or changed in, any article. - Arjayay (talk) 11:15, 27 March 2015 (UTC)

Semi-protected edit request on 30 March 2015

Where is Hebrew on your list of languages?

Carol (talk) 03:35, 30 March 2015 (UTC)

It is some way off the bottom of the list, which only includes the top 100 languages.
Konkani at No 100 has 7.4 million speakers, but according to Ethnologue, Hebrew only has 5.3 million - Arjayay (talk) 08:21, 30 March 2015 (UTC)

Bring back languages with < 7 million speakers?

I suppose this is something of an edit request, but really more a general comment/question. I teach linguistics and have used this list in the past for a project where students must research a language with between 1-10 million speakers. I understand some of the reasoning for restructuring and trimming this list but I have been unable to find any comparable resource online that clearly lists these smaller (but not endangered) languages. What I have ended up doing is sending my students to an archived version of the page that still contains the more comprehensive list of languages. While I am not a very active or prolific Wikipedia editor, I wonder if there is any interest among those who more actively monitor/update this page to include more languages on this list. I would be happy to add them myself based on Ethnologue or another suitable source but do not want to make edits without consulting with others who are interested in this topic. — Preceding unsigned comment added by Bibliotecaria iluminada (talkcontribs) 17:16, 2 April 2015 (UTC)

It's a good question. My spontaneous reaction would be that there is too little reliable information. Ethnologue is, in my personal opinion, about as trustworthy as the average soothsayer. I exaggerate a bit, but the typical Ethnologue entry can date to anytime between 1975 and 2015, making any comparison moot (and that is without even entering into the question of severe factual errors). So is there a reliable source for languages below 7 million? Jeppiz (talk) 17:24, 2 April 2015 (UTC)

Semi-protected edit request on 14 April 2015

| Tamil || தமிழ் || 210 || 661,500.0000000%||India (Tamil Nadu, Karnataka, Puducherry), Sri Lanka, Singapore, Malaysia, Mauritius||Schedule 8 official language of India. (talk) 07:24, 14 April 2015 (UTC)

Not done: as explained at the top of the list, it ONLY uses the figures in Nationalencyklopedin so that all the figures are compatible. - Arjayay (talk) 08:01, 14 April 2015 (UTC)

Is it time to revisit the preference for Nationalencyklopedin, which seems to be full of incredible figures for various languages? -- WeijiBaikeBianji (talk, how I edit) 14:03, 27 April 2015 (UTC)
Sure, but what do you suggest instead? Ethnologue is filled with errors, and Nationalencyklopedin may well have errors too. If you have a good, reliable source without errors, I'm all ears. Jeppiz (talk) 14:24, 27 April 2015 (UTC)

75 million for French speakers!

This is just ridiculous! France alone has 65 million people and speakers. If you add Belgium, Luxemburg, Switzerland, Quebec and all the African ex-colonies like Ivory Coast, Cameroon, Gabon, the two Congos, Algeria etc. you get a figure which is well over 100 million. See this link for example: And also the statement that English has fewer speakers than Spanish is just as ridiculous. The same link confirms that. This whole article is devoid of scientific meaning.

 — Preceding unsigned comment added by (talk) 09:01, 27 April 2015 (UTC) 
Yes, this article is certainly devoid of scientific meaning. So is all of Wikipedia: try citing Wikipedia in a scientific article and you'll be rejected right away (and rightly so). Wikipedia is not about advancing science, which is WP:OR. Jeppiz (talk) 14:26, 27 April 2015 (UTC)
Not everyone in France has French as their first language (L1): several million immigrants (I've seen figures from 5 to 10 million) have other languages as their first language and French as their second (L2) language, and in francophone Africa very few people have French as their first language, using it as their second language for business etc. And the statistics regarding the number of speakers of different languages only include L1, that is, people who use it as their first language. Thomas.W talk 15:14, 27 April 2015 (UTC)
French Belgians, French Swiss and French Canadians may add a total of up to 12 million to France's 63 million. Luxembourg and the African countries we may safely ignore, as French is a second language there. So the number may not be too precise but it is very close: up to 80 million, but obviously slightly less than German.--Lüboslóv Yęzýkin (talk) 13:16, 10 May 2015 (UTC)

All the other language speakers can also speak English

The list says 955+ million people speak Chinese, hence making it the most spoken language. But here's the thing: out of those 955 million people, I'm damn sure that at least a million or two can also speak English. It's the same with all the other languages. Every other language-speaking population can also speak English, so technically speaking, shouldn't English be the most spoken language rather than anything else? This is weird and insane. Gujaratis can speak English, people who speak Gujarati can also speak Hindi, people who speak Hindi can also speak another regional Indian language, and they in turn can also speak English, so by this count, shouldn't the number of people who actually speak English increase manyfold?

I might be wrong here, please correct me if I am. Thank you. D437 (talk) 06:36, 2 May 2015 (UTC)

I can reassure you that people outside of the former British colonies (you seem to live in one) most probably do not speak English at all, or speak only broken "tourist" or "bazaar" English.--Lüboslóv Yęzýkin (talk) 07:04, 2 May 2015 (UTC)
There is no reason to trade personal anecdotes here, when it should be possible to look up language survey figures in reliable sources. This is the article about counts of native speakers, so that is the focus of this article. -- WeijiBaikeBianji (talk, how I edit) 10:59, 2 May 2015 (UTC)

Why use Nationalencyklopedin as a source?

I see this is one of several articles about languages of the world that use Nationalencyklopedin as a source, but why? The statements in that source are in many cases plainly not comparable between one language and another, and a lot of other sources strenuously disagree with that source. (I speak both English and Chinese, and my sense of how those two languages compare in total number of speakers or in number of native speakers makes me doubt Nationalencyklopedin quite a lot.) When did editors here begin using Nationalencyklopedin as a source, and why? -- WeijiBaikeBianji (talk, how I edit) 20:02, 9 May 2015 (UTC)

WeijiBaikeBianji, you have made this same comment several times, I think. The answer is still the same: we can very well use another source, but please say which source. If you find a better source to use than Nationalencyklopedin, and other users agree it's better, then of course we will use it. Jeppiz (talk) 21:20, 9 May 2015 (UTC)
Thanks for your thoughtful reply. I wasn't sure if there was a lot of water over the dam about this issue in previous years, or if this was just a handy source for some editors (it is not handy for me) that has been heavily used on the basis of convenience. I'll start digging more deeply into sources that I have reason to think are more reliable, now that I have a better sense of the consensus here. -- WeijiBaikeBianji (talk, how I edit) 04:20, 10 May 2015 (UTC)
The main advantage of using Nationalencyklopedin as a source, compared to most other sources, is that all figures are comparable, i.e. all figures are based on the same criteria. So if you suggest a new source make sure that it includes all languages, or at least all major languages, and not just one or two. Thomas.W talk 08:04, 10 May 2015 (UTC)
Thanks for your thoughtful reply too. I gathered that that was the main rationale for using Nationalencyklopedin as a source. With respect to any editors who have used Nationalencyklopedin as a source in the past, I'd have to say that that source's treatment of Chinese, English, and Hindi shows that it is not using comparable criteria to rank the different languages, but rather very much comparing apples to oranges. (I speak Chinese and English fluently, and have studied a little bit of Hindi and of course know many speakers of that language.) But I will have to dig into other reliable sources to have any substantive edits to do here, and meanwhile will continue to observe the discussion here on the talk page and the edits to article text by other editors to guide my understanding of the editing issues here. -- WeijiBaikeBianji (talk, how I edit) 15:58, 10 May 2015 (UTC)
It was discussed here and here. The reason is simple: some person who considers himself "responsible" for the "linguistic sector" of Wikipedia and thinks he knows better than everyone else just decided to delete all other sources (particularly "Ethnologue"). And he ignored any objections, as you can see.--Lüboslóv Yęzýkin (talk) 12:57, 10 May 2015 (UTC)
Thank you for drawing my attention to the links showing previous discussion of sources. I'll study the earlier discussion and ponder that as I look up sources. -- WeijiBaikeBianji (talk, how I edit) 15:58, 10 May 2015 (UTC)

Comment I must protest at Lüboslóv Yęzýkin's snide personal attack on kwami. Kwami did not decide this unilaterally; it was a consensus decision that was perfectly in line with Wikipedia's policies. To get back to the topic, I doubt anyone thinks Nationalencyklopedin is a fantastic source or that it is 100% correct. However, previous discussions have resulted in a consensus that it is the least bad source. Should a better and more reliable source be found, we would seriously consider it. The problem is that there are very few WP:RS that list a large number of languages. Jeppiz (talk) 16:24, 10 May 2015 (UTC)

What Wikipedia policy argues for preferring a source like Nationalencyklopedin over, say, a source like Ethnologue? (I'm genuinely curious, as that will guide my selection of other sources.) It occurs to me, after searching the Worldcat library database for holdings of Nationalencyklopedin, that the Encyclopedia of language & linguistics may be much more accessible to a wider group of Wikipedians than Nationalencyklopedin is, although of course the facts and figures about particular languages are spread over dozens of different articles in that source. But that may still be a feature, rather than a bug, if the Encyclopedia of language & linguistics "shows the work" for how the figures are calculated for each language. -- WeijiBaikeBianji (talk, how I edit) 17:03, 10 May 2015 (UTC)
No Wikipedia policy recommends Nationalencyklopedin over Ethnologue; I was referring to the policy of discussions and consensuses. The two are not necessarily incompatible; for a long time we presented data from both Nationalencyklopedin and Ethnologue. Ethnologue was not removed because of Nationalencyklopedin. Ethnologue was removed because several users felt it is so bad as to be better left out, due to its many (and I mean MANY) inaccuracies. Jeppiz (talk) 17:44, 10 May 2015 (UTC)
Nationalencyklopedin (NE) does not seem to be a major authority when it comes to language statistics. It seems to have been chosen mostly as a convenience. NE isn't a linguistic authority and it doesn't compile its own statistics. As such, the sources that NE uses should be more relevant. Relying entirely on just one source in this way is clearly a POV problem.
Peter Isotalo 20:32, 10 May 2015 (UTC)
I'm afraid the comment by Peter Isotalo shows a lack of understanding of how articles like this work. Peter Isotalo is here because of WP:HOUNDING me after I opposed an edit of his on another article, so he probably didn't check out the archives, where this has been discussed at great length. For any list of this kind, we need to have a single source for all the data, as that is the only way to gain some accuracy. If we just used different sources for different languages, any user with the faintest nationalist WP:POV would find the data that makes their own language bigger, and there would be no way of controlling if the same methodology and definitions have been used. This is not unique to this article, a similar praxis is usually followed on most List of largest X articles. The question of whether Nationalencyklopedin is the best source or not is already ongoing, and I can only repeat what has been said several times by several users: it is the least bad we have to date, but any constructive suggestion of an alternative source would be welcome for discussion.Jeppiz (talk) 08:59, 11 May 2015 (UTC)
I'm not sure that the statement "For any list of this kind, we need to have a single source for all the data, as that is the only way to gain some accuracy" fully convinces me, especially given the problem you mention that there may or may not be some sort of bias that influences the source. We need to have reliable sources, yes, that "show their work" about where the numbers come from, but I'm not as readily convinced that all the numbers have to come from one source, even as a starting point. Meanwhile, I'll look for other sources. -- WeijiBaikeBianji (talk, how I edit) 12:34, 11 May 2015 (UTC)
For the record, we used to have it the way you propose, and that only resulted in eternal edit wars and POV-pushing. Any Turkish user would come up with a source where Turkish was particularly big, French users with a source where French was really big, and the same for Koreans, Italians, Russians and you name it. Whenever there is a ranking, regardless of what is ranked, the methodology needs to be the same. If we took one source that gives a certain number of English speakers and another source that gives a certain number of Spanish speakers and then claimed that one is larger than the other, we would be engaging in original research, as neither of those two sources would say that one is bigger than the other. If we do it for 100 languages, we have original research x100. Jeppiz (talk) 12:43, 11 May 2015 (UTC)
So what exactly is NE's methodology and why is it superior to that of any other sources?
Peter Isotalo 12:59, 11 May 2015 (UTC)
NE satisfies WP:RS, and as has been said repeatedly in the discussion, nobody is claiming it's a fantastic source. On the contrary, it has also been repeated several times that everybody is happy to discuss an alternative source. So what about starting to WP:HEAR that now? What source do you support?Jeppiz (talk) 13:04, 11 May 2015 (UTC)
I have made no claims that it doesn't satisfy WP:RS, but I'm concerned that just one source is used. It's a perfectly valid concern in this context.
You've referred to the "methodology" of NE several times and stressed that it's very consistent. What exactly is that methodology, though? Where did NE's figures come from?
Peter Isotalo 13:11, 11 May 2015 (UTC)
I have absolutely no idea, and I could not care less. A national encyclopaedia is put together by experts, and is WP:RS. As usual, your point is purely disruptive and you contribute nothing to this article, just as you don't contribute to other articles either. Using sources does not require a thorough analysis of the exact methodology of those sources. There are countless articles referring to Encyclopaedias such as Encyclopaedia Britannica without the exact methodology. If you have a source to suggest, then suggest it. At Wikipedia, Competence is required and it's clear from your contributions across several articles that you don't have that competence. I quote Thomas.W, "That is, quite frankly, a load of cr*p, Peter. And you know it. I have pointed out your errors, your flawed reasoning/logic and your obvious lack of knowledge" [1]. The same comment applies here. Now, is there any serious user who has a serious comment to make?Jeppiz (talk) 13:21, 11 May 2015 (UTC)
Encyclopedias are useful, but they're not automatically superior to other sources. We evaluate sources all the time, and in this case, NE has obviously taken these statistics from someone else. NE is not a linguistic institute or a widely recognized authority on language demographics, so there's no reason to trust them blindly. So where do the figures come from? And what methodology and definitions have they actually used?
Peter Isotalo 13:48, 11 May 2015 (UTC)
Go and find out. In the meantime, does anyone have an alternative source to suggest? Jeppiz (talk) 14:05, 11 May 2015 (UTC)
Ethnologue for one, or the sources that Ethnologue refers to. Language has several general sources relating to the world's languages that contain figures, like The World's Major Languages (Comrie, ed. 2009) or Concise Encyclopedia of Languages of the World (Brown & Ogilvie, eds. 2008). Other major encyclopedias seem reasonable as well.
The point here is that sticking to literally just one source is seldom, if ever, neutral or appropriate. Different sources usually give different figures and it's perfectly natural to explain this variation to readers. I see no indication that NE is considered an authority on language statistics and there is no indication that the methodology used is uniquely consistent and accurate. It should be complemented with other sources.
Peter Isotalo 14:47, 11 May 2015 (UTC)

Rather annoying to have to repeat oneself five times just because a user doesn't hear, but nobody argues we should use only Nationalencyklopedin. On the contrary, if there is another source with another ranking that satisfies WP:RS, we should use that source. So far, no such source has been presented. I have Comrie at hand, a great book that I use almost every week, but it does not propose a ranking for the largest languages. Jeppiz (talk) 15:12, 11 May 2015 (UTC)

I see no reason to limit ourselves to sources that publish figures specifically in the form of rankings. The point of a list like this is to provide stats for native speakers of major languages overall. If different sources give different figures, we should report this, not just a single number; anything else is faux accuracy. This is not list of languages by number of native speakers according to unedited lists from single sources after all. There is nothing in WP:RS that bars us from compiling ranked lists from several sources.
Peter Isotalo 17:06, 11 May 2015 (UTC)
Making a claim that language A is larger than language B without a source supporting it is WP:OR, and I think most users understand that (Competence is required). Even if we would engage in original research like that (and we should not), the question still remains which source(s) to use and which ones to disqualify. We used to have more than one ranking in this article, I seem to recall, and there is no reason whatsoever we could not have more rankings again. The way to get there is to be constructive and produce a source and compile a ranking and suggest it for discussion. That was how the present ranking came about. Jeppiz (talk) 17:18, 11 May 2015 (UTC)
You've seriously misunderstood what this list is about, then. It's not so much a fixed ranking but a list of languages with the reported number of native speakers; otherwise it would be titled list of rankings of languages by number of native speakers. These figures vary depending on the source, so the ranking can also vary. And unless you can explain how and why NE has compiled these statistics, there's no reason to insist that it is in any way superior to a compilation of several other sources.
As for lists being original research because they use several sources, I suggest checking these out:
You ought to take your attitude down a few notches, btw. Throwing WP:CIR in people's faces after making such an imaginative interpretation of WP:OR isn't terribly convincing.
Peter Isotalo 18:12, 11 May 2015 (UTC)
Well, I have little time for users who stalk other users to export conflicts across articles, concerning my "attitude". An admin already concluded you edit "with a vengeance" which is not the right reason. But back to the actual topic, I already said that you're more than welcome to make your own ranking and present it here, complete with sources. Just go ahead. If it's good, I will gladly support it. If it's not good, no doubt other users will also point that out. Until you have made the ranking, there's little to discuss.Jeppiz (talk) 18:24, 11 May 2015 (UTC)

Comment. I haven't read all of the above, as it started turning into personal attacks rather than an honest debate of the sources. IMO it would be best to have several lists, NE being one, so that the reader can compare them. We should not rank languages when the article is based on a single source, as that would imply that language #20 has more speakers than language #25, which it might not. But we could rank them if we had multiple sources, each in a separate section. Then readers would see that rankings are close to meaningless.

I've spoken to the editor of the NE article, and his methodology was to find the best source he could for the population of a language in each country it is spoken in, then adjust for the population growth of that country, so that the estimates would at least all be normalized to the same year. Thus all the estimates in NE (2007) are for 2006 (I think). This contrasts with Ethnologue, where one language may have an estimate for 2014, another for 1984, and another for 1954, and the dates given by Ethnologue may not be the dates of the estimates, but the dates of publication, which may be 30 or 40 years later. Peter, if you find other sources like NE, please provide them. They will make the article better. One thing we do not want is for each language population to be estimated by nationalists from that population, which is what happens when we allow different sources for each language. That's just as problematic as letting them classify their languages, so that Croatian is a primary branch of Slavic, unrelated to Serbian, Kurdish is a primary branch of Indo-European, unrelated to the Iranian languages, etc. — kwami (talk) 20:00, 11 May 2015 (UTC)
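The normalization described above can be sketched roughly as follows. This is only an illustration under stated assumptions: it assumes the number of speakers grows at the same constant compound rate as the country's population, and every figure and growth rate below is invented for the example, not taken from NE, Ethnologue, or any census. `CountryEstimate`, `normalize`, and `language_total` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class CountryEstimate:
    country: str
    speakers: float       # native speakers reported by the chosen source
    source_year: int      # year the estimate refers to (not the publication year)
    annual_growth: float  # country's population growth rate, e.g. 0.012 for 1.2%


def normalize(est: CountryEstimate, target_year: int) -> float:
    """Project one country's estimate to target_year.

    Assumption: speakers grow in step with the country's population
    at a constant compound rate.
    """
    years = target_year - est.source_year
    return est.speakers * (1.0 + est.annual_growth) ** years


def language_total(estimates, target_year: int) -> float:
    """Sum the per-country estimates after projecting each to the same year."""
    return sum(normalize(e, target_year) for e in estimates)


# Invented example: one language spoken in two countries, sources from different years.
sample = [
    CountryEstimate("Country X", 10_000_000, 2001, 0.02),
    CountryEstimate("Country Y", 2_000_000, 2006, 0.01),
]
print(round(language_total(sample, 2006)))
```

The point of the design is that every language ends up with a figure for the same reference year, so a 2001 estimate is not compared directly with a 2006 one.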

Is it Mikael Parkvall?
Peter Isotalo 21:46, 11 May 2015 (UTC)
Yes, that's right. We could probably do as good a job, but would run into accusations of bias, OR, etc. — kwami (talk) 23:49, 11 May 2015 (UTC)
I've used one of Parkvall's reports in minority languages of Sweden and Yiddish language#Sweden. He seems to be fairly thorough. It would be good if he's actually named as the author of the lists. It would also be preferable if his methods were detailed because choosing the "best source" is pretty vague.
Peter Isotalo 17:10, 12 May 2015 (UTC)
I agree, provided that information is available in NE (which I don't have at hand right now). To the best of my recollection, though, NE entries mention neither authors nor methodologies. What kwami says is very interesting, and I agree with Peter that Parkvall is thorough. If his methodologies come from personal communication with kwami, I'm not sure it's correct to add them, but I could of course be wrong. Jeppiz (talk) 18:07, 12 May 2015 (UTC)
I added his name to the refs auto-generated by the language info box. — kwami (talk) 18:26, 12 May 2015 (UTC)
Jeppiz, we can never add details we don't have reliable sources for. Personal communication is not a reliable source.
Peter Isotalo 19:52, 12 May 2015 (UTC)
Peter, that's what I assumed, thanks for the confirmation. Jeppiz (talk) 20:02, 12 May 2015 (UTC)
Thanks, Kwami, for your statement, "if you find other sources like NE, please provide them," as I didn't want to act against consensus, but I do think that the article needs new and better sources. Your further statement that "We should not rank languages when the article is based on a single source, as that would imply that language #20 has more speakers than language #25, which it might not. But we could rank them if we had multiple sources, each in a separate section. Then readers would see that rankings are close to meaningless" is also a helpful statement, and it sums up how I've been feeling about the current ranking as I have been watching edits on this article over the course of this calendar year. Best wishes to you and to everyone here as I try to dig up other sources. I'm baffled that the Nationalencyklopedin isn't even held by the best academic library I have access to (not as an online resource nor as a print resource), even though it holds a huge collection of Swedish-language books. -- WeijiBaikeBianji (talk, how I edit) 18:36, 12 May 2015 (UTC)
You're right, WeijiBaikeBianji, that is a bit surprising. Foreign encyclopaedias are usually very high in priority at most academic libraries. As I said yesterday, using Nationalencyklopedin is apparently common on Wikipedia in quite a few languages, which would suggest that not many national encyclopaedias provide a ranking of languages. I would expect any language in which a domestic version is available to prefer that one over a Swedish one, which of course has the double disadvantage of not being widely available and being in a language few people are able to read. Best of luck in your search for sources! Jeppiz (talk) 19:45, 12 May 2015 (UTC)
  • Comment I took a quick look at the corresponding articles in some other Wikipedias to see if there would be a useful and different source among them, but found very little. Most other major languages use Nationalencyklopedin as well, probably influenced by this article, and often combined with Ethnologue, which I tend to dislike for the reasons kwami outlines above. The French article used a third source, from La Francophonie, which was very close to Nationalencyklopedin with the sadly unexpected exception of ranking French much higher. That seems like bias, and I would be reluctant to include that source for that reason. Jeppiz (talk) 00:32, 12 May 2015 (UTC)

We don't use NE in the language info boxes because of the example of this list. The two changes occurred at the same time: We decided, after some discussion, to both add an NE section to this list and to shift from Ethn. to NE in the articles. Many of us had long been dissatisfied with Ethnologue, and when NE came along, we jumped at the chance to have an actual RS. The fact that it took so long to find a reliable source suggests there are not many of them. Ethnologue has long used sources like almanacs, which have no indication of where the data comes from, so they're evidently not finding much either, though recent editions seem to be taking the problem more seriously. Where we do not follow NE is with the Hindi languages, as NE is based on the Indian census, which relies on speaker identification as to whether their language is "Hindi" or not. (The publishers of the census results recognize the problem, and group together the languages whose speakers self-report as Hindi.) — kwami (talk) 02:04, 21 May 2015 (UTC)


Hungarian is also natively spoken in neighbouring countries: Romania, Slovakia, Serbia, Austria, Slovenia, Ukraine. It also has official status in local governments. Shouldn't that be on the list?

Not if the column is described as "Mainly spoken in" and is focused on countries (or large provinces like Zhejiang and Andhra Pradesh).
Peter Isotalo 15:01, 18 May 2015 (UTC)


Berber has between 16 and 30 million native speakers (see here). Should it be added? MassachusettsWikipedian (talk) 02:50, 24 May 2015 (UTC)

Zhuang is not a language[edit]

The list says that Zhuang is a group of many separate languages. The article "Zhuang" makes clear that none of the many Zhuang languages has as many as 3 million speakers. Zhuang has no business being on the list; I will remove it unless someone has a good argument. Arrecife (talk) 04:24, 31 May 2015 (UTC)

Semi-protected edit request on 1 June 2015[edit]

Hindi and Urdu were recently merged in this article, but the data for Urdu was not included in the merged entry. According to the data that was present previously, Hindi has 310 million native speakers, constituting 4.70% of the total world population, and Urdu has 66 million speakers, constituting 0.99% of the total population. If these two were to be merged, the final data would be 376 million speakers and 5.69% of the total population. But the data was not changed, and since the page is protected, I can't change it. Moreover, the 2007 edition of Nationalencyklopedin listed Hindi and Urdu separately, so these two should not have been merged. The merge is not in accordance with the data present in Nationalencyklopedin and ought to be amended. So, either the two languages should be listed separately with their individual figures, or the merged data should be updated, as it is currently incorrect.

Xiaoxin2015 (talk) 17:14, 1 June 2015 (UTC)
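
For anyone checking the figures in the request above, the arithmetic can be sketched as follows. The variable names are only illustrative; the input numbers are the ones quoted in the request, and the sketch assumes the two percentages can simply be added because they refer to disjoint groups of native speakers.

```python
# Figures as quoted in the edit request (millions of native speakers
# and percent of world population).
hindi_speakers_m = 310
urdu_speakers_m = 66
hindi_share_pct = 4.70
urdu_share_pct = 0.99

# Merged entry: add the counts and the shares.
merged_speakers_m = hindi_speakers_m + urdu_speakers_m
merged_share_pct = round(hindi_share_pct + urdu_share_pct, 2)

print(merged_speakers_m, merged_share_pct)  # 376 5.69
```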
