
Languages used on the Internet: Difference between revisions

From Wikipedia, the free encyclopedia
No edit summary
Updating the page from a new study insights. Only the introduction has been changed in order to maintain previous text (although it would deserve a major change).
Line 1: Line 1:
{{Use dmy dates|date=August 2014}}


The most used language on the [[Internet]] is unknown,<ref name="towards">[http://net-lang.net/lang_en ''NET.LANG: Towards a multilingual cyberspace''] MAAYA (coord.), Laurent Vannini and Hervé le Crosnier (eds.), Maaya Network, C&F éditions, March 2012, 446 pp., {{ISBN|978-2-915825-08-4}}</ref> although about half of the homepages of the most visited sites on the Internet are in [[English language|English]], with varying amounts of information available in many other [[language]]s.<ref name="UofCLBWApril2013"/><ref name="12years">{{cite journal | url=http://www.unesco.org/new/en/communication-and-information/resources/publications-and-communication-materials/publications/full-list/twelve-years-of-measuring-linguistic-diversity-in-the-internet-balance-and-perspectives/ | title=Twelve years of measuring linguistic diversity in the Internet: balance and perspectives | author=Pimienta, Daniel, Prado, Daniel and Blanco, Álvaro | journal=United Nations Educational, Scientific and Cultural Organization | year=2009}}</ref>
The most used language on the [[Internet]] is unknown,<ref name="towards">[http://net-lang.net/lang_en ''NET.LANG: Towards a multilingual cyberspace''] Laurent VAnnini and
Hervé le crosnier (eds.), Maaya Network, C&F éditions, March 2012, 446 pp., {{ISBN|978-2-915825-08-4}}</ref> although about half of the homepages of the most visited sites on the Internet are in [[English language|English]], with varying amounts of information available in many other [[language]]s.<ref name="UofCLBWApril2013"/><ref name="12years">{{cite journal | url=http://www.unesco.org/new/en/communication-and-information/resources/publications-and-communication-materials/publications/full-list/twelve-years-of-measuring-linguistic-diversity-in-the-internet-balance-and-perspectives/ | title=Twelve years of measuring linguistic diversity in the Internet: balance and perspectives | author=Pimienta, Daniel, Prado, Daniel and Blanco, Álvaro | journal=United Nations Educational, Scientific and Cultural Organization | year=2009}}</ref>


The two main indicators of languages on the Internet are:
Other top languages, according to W3Techs, are [[Russian language|Russian]], [[German language|German]], [[Japanese language|Japanese]], [[Spanish language|Spanish]], [[French language|French]], [[Chinese language|Chinese]], and [[Portuguese language|Portuguese]].<ref name="UofCLBWApril2013" />

- The language of users of the Internet.

- The language of content on the Internet.

The data about languages can be specified either for [[first language|mother tongue]] (L1) only or for first language plus [[second language|second language]] spoken (L1+L2). Data on second languages are far from consensual, and the differences between them are one of the main causes of discrepancy between data on languages used on the Internet.

- In terms of users, there is a consensus that the top three languages are, in order, [[English language|English]], [[Chinese language|Chinese]] and [[Spanish language|Spanish]]; beyond these three, the consensus is lost.

- In terms of content, there is no consensus on the order of languages beyond the fact that [[English language|English]] is still the first language, although the corresponding percentage varies greatly depending on the source.

'''As for the language of users''', the main and most reliable source for the number of people connected to the Internet by country is the ITU.<ref name="ITU">[http://www.itu.int/en/ITU-D/Statistics/Documents/statistics/2016/Individuals_Internet_2000-2015.xls ''Percentage of Individuals using the Internet''] ITU, 2016</ref> From this authoritative [[United Nations]] source, two sources derive the number of people connected by language, with some differences:

- According to InternetWorldStats, the top 10 languages in terms of connected users are, in order: [[English language|English]], [[Chinese language|Chinese]], [[Spanish language|Spanish]], [[Arabic language|Arabic]], [[Portuguese language|Portuguese]], [[Malay language|Malay]], [[Japanese language|Japanese]], [[Russian language|Russian]], [[French language|French]] and [[German language|German]]. The source also offers statistics per country and region on various aspects.

- According to the FUNREDES/MAAYA Observatory's latest study,<ref name="Alternative">[http://funredes.org/lc2017/ ''An alternative approach to produce indicators of languages in the Internet''] Pimienta, Daniel, June 2017</ref> the top 10 languages are, in order: [[English language|English]], [[Chinese language|Chinese]], [[Spanish language|Spanish]], [[French language|French]], [[German language|German]], [[Portuguese language|Portuguese]], [[Japanese language|Japanese]], [[Russian language|Russian]], [[Hindi language|Hindi]] and [[Arabic language|Arabic]]. The study also offers a set of more detailed indicators for the 140 languages with more than 5 million speakers.

The differences between the figures seem to be related to the data about second languages and to the computation of the L1+L2 populations per language.

'''As for the language of content''', two sources exist, and they differ significantly.

- According to W3Techs, the top languages for content are, in order: [[English language|English]], [[Russian language|Russian]], [[Japanese language|Japanese]], [[German language|German]], [[Spanish language|Spanish]], [[French language|French]], [[Portuguese language|Portuguese]], [[Italian language|Italian]] and [[Chinese language|Chinese]].<ref name="UofCLBWApril2013" />

- According to the FUNREDES/MAAYA Observatory, the top languages for content are: [[English language|English]], [[Chinese language|Chinese]], [[Spanish language|Spanish]], [[French language|French]], [[Russian language|Russian]], [[German language|German]], [[Portuguese language|Portuguese]], [[Japanese language|Japanese]], [[Italian language|Italian]], [[Hindi language|Hindi]], [[Arabic language|Arabic]] and [[Malay language|Malay]].<ref name="Alternative" />

The FUNREDES/MAAYA Observatory argues that W3Techs' use of the Alexa ranking to select the sample of 10 million websites to which it applies a language recognition algorithm causes a large underestimation of many Asian languages, primarily Chinese and the languages of India. The referenced paper and associated presentations set out these arguments and warn about the importance of [[Bias (statistics)|biases]] in the measurement of languages on the Internet.


==Languages used==
Line 201: Line 226:
*[http://unesdoc.unesco.org/images/0018/001870/187016e.pdf Twelve years of measuring linguistic diversity in the Internet], UNESCO (2009).
*[https://web.archive.org/web/20130820111814/http://gii2.nagaokaut.ac.jp/gii/blog/lopdiary.php Language Observatory], Japan Science and Technology Agency (2012).
*[http://funredes.org/LC Observatory of linguistic and cultural diversity on the Internet], Networks and Development Foundation, FUNREDES
*[http://funredes.org/LC Observatory of linguistic and cultural diversity on the Internet], FUNREDES/MAAYA


{{Africa topic|Internet in}}

Revision as of 15:50, 31 August 2017

The most used language on the Internet is unknown,[1] although about half of the homepages of the most visited sites on the Internet are in English, with varying amounts of information available in many other languages.[2][3]

The two main indicators of languages on the Internet are:

     - The language of users of the Internet.
     - The language of content on the Internet.

The data about languages can be specified either for mother tongue (L1) only or for first language plus second language spoken (L1+L2). Data on second languages are far from consensual, and the differences between them are one of the main causes of discrepancy between data on languages used on the Internet.

     - In terms of users, there is a consensus that the top three languages are, in order, English, Chinese and Spanish; beyond these three, the consensus is lost.
     - In terms of content, there is no consensus on the order of languages beyond the fact that English is still the first language, although the corresponding percentage varies greatly depending on the source.

As for the language of users, the main and most reliable source for the number of people connected to the Internet by country is the ITU.[4] From this authoritative United Nations source, two sources derive the number of people connected by language, with some differences:

- According to InternetWorldStats, the top 10 languages in terms of connected users are, in order: English, Chinese, Spanish, Arabic, Portuguese, Malay, Japanese, Russian, French and German. The source also offers statistics per country and region on various aspects.

- According to the FUNREDES/MAAYA Observatory's latest study,[5] the top 10 languages are, in order: English, Chinese, Spanish, French, German, Portuguese, Japanese, Russian, Hindi and Arabic. The study also offers a set of more detailed indicators for the 140 languages with more than 5 million speakers.

The differences between the figures seem to be related to the data about second languages and to the computation of the L1+L2 populations per language.

As for the language of content, two sources exist, and they differ significantly.

- According to W3Techs, the top languages for content are, in order: English, Russian, Japanese, German, Spanish, French, Portuguese, Italian and Chinese.[2]

- According to the FUNREDES/MAAYA Observatory, the top languages for content are: English, Chinese, Spanish, French, Russian, German, Portuguese, Japanese, Italian, Hindi, Arabic and Malay.[5]

The FUNREDES/MAAYA Observatory argues that W3Techs' use of the Alexa ranking to select the sample of 10 million websites to which it applies a language recognition algorithm causes a large underestimation of many Asian languages, primarily Chinese and the languages of India. The referenced paper and associated presentations set out these arguments and warn about the importance of biases in the measurement of languages on the Internet.

Languages used

There is debate over the most-used languages on the Internet. A 2009 UNESCO report that monitored the languages of websites for 12 years, from 1996 to 2008, found a steady year-on-year decline in the percentage of webpages in English, from 75 percent in 1998 to 45 percent in 2005.[3] The authors found that English remained at 45 percent of content from 2005 to the end of the study, but believe this was due to search engines indexing more English-language content rather than to a true stabilization of the percentage of content in English online.[3]

Ongoing monitoring by W3Techs showed that in March 2015, just over 55 percent of the most visited websites had English-language homepages.[2] According to W3Techs, the other top languages, each used on at least 2 percent of the one million most visited websites, are Russian, German, Japanese, Spanish, French, Chinese, and Portuguese.[2]

The figures from the W3Techs study are based on the one million most visited websites (i.e., approximately 0.27 percent of all websites according to December 2011 figures) as ranked by Alexa.com, and in most cases language is identified using only the home page of each site (e.g., all of Wikipedia is classified by the language detected at http://www.wikipedia.org).[6] As a consequence, the figures show a significantly higher percentage for many languages (especially for English) than the figures for all websites would.[citation needed] The figures for all websites are unknown, but some sources estimate below 50 percent for English; see, for instance, Towards a multilingual cyberspace[1] and the 2009 UNESCO report[3] referenced earlier.

The number of non-English pages is rapidly expanding. The use of English online increased by around 281 percent from 2001 to 2011, a lower rate of growth than that of Spanish (743 percent), Chinese (1,277 percent), Russian (1,826 percent) or Arabic (2,501 percent) over the same period.[7]
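A percentage increase is easier to compare when converted into a multiplier. As a minimal sketch (the function name `growth_factor` is an illustrative choice, not something the cited source provides), an increase of 281 percent means that use in 2011 was about 3.81 times the 2001 level, while an increase of 2,501 percent corresponds to about 26 times:

```python
# Percentage increases from 2001 to 2011, as quoted in the paragraph above.
increase_percent = {
    "English": 281,
    "Spanish": 743,
    "Chinese": 1277,
    "Russian": 1826,
    "Arabic": 2501,
}


def growth_factor(percent_increase: float) -> float:
    """Convert a percentage increase into a multiplier of the starting value."""
    return (100 + percent_increase) / 100


factors = {lang: growth_factor(p) for lang, p in increase_percent.items()}

for lang, factor in factors.items():
    print(f"{lang}: 2011 use was about {factor:.2f} times the 2001 level")
```

These multipliers describe relative growth only; they say nothing about the absolute number of users or pages in each language.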

Content languages for websites

Estimated percentages of the top 10 million websites using various content languages as of 4 March 2017:[2]

Content languages for websites as of 12 March 2014[2]
Rank Language Percentage
1 English 51.6%
2 Russian 6.6%
3 Japanese 5.6%
4 German 5.6%
5 Spanish 5.1%
6 French 4.1%
7 Portuguese 2.6%
8 Italian 2.3%
9 Chinese 2.0%
10 Polish 1.7%
11 Turkish 1.6%
12 Persian 1.5%
13 Dutch, Flemish 1.4%
14 Korean 0.9%
15 Czech 0.9%
16 Arabic 0.8%
17 Vietnamese 0.6%
18 Indonesian 0.5%
19 Greek 0.5%
20 Swedish 0.5%
21 Romanian 0.5%
22 Hungarian 0.4%
23 Danish 0.3%
24 Thai 0.3%
25 Slovak 0.3%
26 Finnish 0.3%
27 Bulgarian 0.2%
28 Hebrew 0.2%
29 Lithuanian 0.1%
30 Norwegian 0.1%
31 Ukrainian 0.1%
32 Croatian 0.1%
33 Norwegian Bokmål 0.1%
34 Serbian 0.1%
35 Catalan, Valencian 0.1%
36 Slovenian 0.1%
37 Latvian 0.1%
38 Estonian 0.1%

All other languages are used in less than 0.1% of websites. Even including all languages, percentages may not sum to 100% because some websites contain multiple content languages.
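The reason the percentages can exceed 100% is that a multilingual site is counted once for every language it contains. The following sketch illustrates this with an entirely hypothetical four-site sample (the site names and language sets are invented for illustration and are not W3Techs data):

```python
from collections import Counter

# Hypothetical sample: each site maps to the set of content languages found on it.
sites = {
    "site-a.example": {"English"},
    "site-b.example": {"English", "French"},
    "site-c.example": {"Russian"},
    "site-d.example": {"English", "Spanish"},
}

# Count, for each language, how many sites contain it.
counts = Counter(lang for langs in sites.values() for lang in langs)

# Each percentage is taken over the number of sites, not the number of (site, language) pairs.
percentages = {lang: 100 * n / len(sites) for lang, n in counts.items()}
total_percentage = sum(percentages.values())

print(percentages)
print(total_percentage)
```

Here two of the four sites carry two languages each, so the per-language percentages add up to 150% rather than 100%.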

Internet users by language

Estimates of the number of Internet users by language as of 30 June 2016:[8]

Rank Language Internet users Percentage
1 English 948,608,782 26.3%
2 Chinese 751,985,224 20.8%
3 Spanish 277,125,947 7.7%
4 Arabic 168,426,690 4.7%
5 Portuguese 154,525,606 4.3%
6 Japanese 115,111,595 3.2%
7 Malay 109,400,982 3.0%
8 Russian 103,147,691 2.9%
9 French 102,171,481 2.8%
10 German 83,825,134 2.3%
11–36 Others 797,046,681 22.1%
Total 3.61 billion 100%
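The percentage column can be reproduced from the user counts alone. The sketch below copies the counts from the table and divides each by their sum; the variable and function names are illustrative choices:

```python
# User counts copied from the table above (Internet World Stats, 30 June 2016).
users = {
    "English": 948_608_782,
    "Chinese": 751_985_224,
    "Spanish": 277_125_947,
    "Arabic": 168_426_690,
    "Portuguese": 154_525_606,
    "Japanese": 115_111_595,
    "Malay": 109_400_982,
    "Russian": 103_147_691,
    "French": 102_171_481,
    "German": 83_825_134,
    "Others": 797_046_681,
}

total = sum(users.values())


def share(language: str) -> float:
    """Return the language's percentage of all counted users, rounded to 0.1."""
    return round(100 * users[language] / total, 1)


shares = {language: share(language) for language in users}

print(f"Total users: {total:,}")
for language, pct in shares.items():
    print(f"{language}: {pct}%")
```

The counts sum to 3,611,375,813, which matches the rounded total of 3.61 billion shown in the table.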

See also

References

  1. ^ a b NET.LANG: Towards a multilingual cyberspace MAAYA (coord.), Laurent Vannini and Hervé le Crosnier (eds.), Maaya Network, C&F éditions, March 2012, 446 pp., ISBN 978-2-915825-08-4
  2. ^ a b c d e f "Usage of content languages for websites". W3Techs.com. Retrieved 24 March 2015.
  3. ^ a b c d Pimienta, Daniel, Prado, Daniel and Blanco, Álvaro (2009). "Twelve years of measuring linguistic diversity in the Internet: balance and perspectives". United Nations Educational, Scientific and Cultural Organization.
  4. ^ Percentage of Individuals using the Internet ITU, 2016
  5. ^ a b An alternative approach to produce indicators of languages in the Internet Pimienta, Daniel, June 2017
  6. ^ "Technologies Overview". W3Techs. Retrieved 24 March 2015.
  7. ^ Rotaru, Alexandru. "The foreign language Internet is good for business". Archived from the original on 7 April 2013. Retrieved 21 June 2011.
  8. ^ "Number of Internet Users by Language", Internet World Stats, Miniwatts Marketing Group, 30 June 2016, accessed 15 November 2016

External links
