Template talk:Infobox language/Archive 10

From Wikipedia, the free encyclopedia

Ethnicity parameter: too vague to keep?

" 'When I use a word' ", Humpty Dumpty said in rather a scornful tone, 'it means just what I choose it to mean — neither more nor less.' "

Tl;dr: Do readers benefit in any way from keeping the ethnicity parameter in template {{Infobox language}}? Secondly: does the breadth and vagueness of the term ethnicity encourage inconsistency among articles, even strife among editors, making retaining it a net negative for the project?

Details:

Is the ethnicity parameter worth keeping? Given the extremely broad and varied definitions of ethnicity, it seems like it can mean whatever one chooses it to mean.

As a concrete example: I'm looking at this edit at Catalan language, which removes the values Aragonese people, Balears, and Valencians from the {{{ethnicity}}} parameter of its {{Infobox language}}. I'm trying to decide whether I should undo the edit or leave it as is. This more or less comes down to the question: are the Aragonese, Balears, and Valencians "ethnic groups"? Who knows; it really depends on one's interpretation of ethnicity. But more to the point, is this a question we really want to see broached in an infobox about Catalan?

It's one thing for an article like Ethnic group to be vague or broad (or comprehensive) about the meaning of a term like ethnicity; there's plenty of space in an article to include centuries of scholarship about the issue, cover the evolution of the term, the history and politics behind the changes in meaning of the term in time and place, the majority and minority opinions among ethnologists, anthropologists, and so on. But an infobox parameter simply doesn't have room for all that, and to me it just seems like an anything-goes param, or a strife honeypot.

There are about 1,500 articles that use the Infobox template with the ethnicity param. In many of the infoboxes for major languages, the parameter value seems unhelpful: Italian is spoken by "Italians"; Indonesian language is spoken by Indonesians; Arabic is spoken by "Arabs and several peoples of the MENA region"; Russian is spoken by "Russians and others"; German is spoken by "German people" (but note the many contentious discussions at Talk:German people trying to figure out what the term Germans means, and the debates about whether a 'German ethnic group' even exists); Portuguese is spoken (among others) by "Brazilians" (Brazilians are an ethnic group? And the far-flung Luso-Asians living in London, Toronto, Turkey, Thailand, and Taiwan—they're an ethnic group?).

It just seems like this parameter is not helpful to our readers (what information does it convey?), and beyond that, it is tailor-made to propagate disagreements among editors about the meaning of "ethnic group" and to invite edit warring. I'm hard-pressed to see where the advantage lies in keeping it, or what articles are likely to benefit from it. And I wouldn't know how to advise someone about that edit at Catalan language.

If we decide to keep the param, I feel that minimally, the doc page needs to provide significantly longer and more detailed guidance about how to use it in a way that will encourage some reliability and consistency. We can't have people edit-warring on the parameter values because they don't agree what the parameter itself means. My guess is that even a discussion about how to word such additional doc page guidance would not be an easy one. I think I'd be inclined to remove this parameter from {{Infobox language}}, as a net positive all around, but I haven't heard the counter-arguments yet. @Lambiam, Austronesier, Kwamikagami, SMcCandlish, Maunus, Lectonar, Nardog, and Uanfala: any thoughts on this? Mathglot (talk) 00:16, 30 August 2022 (UTC)

(edit conflict) Listed at: WT:WikiProject Languages, WT:WikiProject Linguistics, WT:WikiProject Ethnic groups. Mathglot (talk) 00:28, 30 August 2022 (UTC)

This param was intended for the smaller, ethnic languages. Agreed that it's usually not helpful for inter-ethnic languages. Maybe narrow the scope to eponymous ethnicity? E.g. English people for English language? Or is that too narrow?
The purpose of this param is to encourage cross-linking the language and ethnicity articles. It had previously sometimes been difficult to locate our article on the people, and because of that we often had duplicates. Things are better now, to a large extent because of this param. If the entry lists anything more complex, then IMO it should be a summary of the speaker-population section of the language article, not just a list of whatever anyone can think of without bothering to justify. I think reverts can be justified per our template MOS, where templates are intended to be summaries of the articles.
This param has also been a handy place to include the ethnic population, which can give an idea of how endangered a language is, and whether that's because the people themselves are endangered or because they've shifted to something else.
IMO it's better to have a few fights over the proper scope of the parameter than for the reader to be unable to find our coverage of the speakers of a language. — kwami (talk) 00:26, 30 August 2022 (UTC)
@Kwamikagami:, wrt your suggestion, what if we added some guidance to the doc, along the lines of:
For eponymous ethnicity, add the link to the ethnic group article (e.g., for "English language" add |ethnicity=English people, or for Limbu language use |ethnicity=Limbu people). For more complex cases, add a link to a section of the article which covers the speaker population of the language.
The downside I see to this is that not all articles may have such a section. I wanted to use Portuguese as an example for the second type, saying something like, "... e.g., for Portuguese language use |ethnicity={{slink||Speaker population|nopage=yes}}", but there isn't a section at Portuguese language of that name or similar to it. (Closest is § Lusophone countries.) Mathglot (talk) 00:48, 30 August 2022 (UTC)
IMO, if there's no such section, then we stick to the eponymous ethnicity. If people complain that's not enough, then they can create a reliably sourced speaker-population section themselves. If they can't be bothered to do that, then IMO they have no right to complain. — kwami (talk) 00:51, 30 August 2022 (UTC)
This is not only an issue whether people might complain that the eponymous ethnicity is not enough. We should not present information that is misleading, which will be the case if we were to present (for example) the ethnicity of Lusophones as "Portuguese". This is worse than leaving the parameter value blank.  --Lambiam 07:44, 30 August 2022 (UTC)
There are only maybe a hundred articles where the ethn param causes this kind of difficulty, whereas there are thousands of stubs where it can provide crucial information and cross-linking. My interest is in small languages like Hadza and Aka-Bea, so that's my bias and I hardly care about Portuguese or English. And really, any language that important internationally should have a section on international/inter-ethnic use, and if it doesn't then it's underdeveloped as an article anyway, and perhaps trying to fix the info in this param will encourage editors to make it more complete.
I think eponymous ethnicity is probably too narrow. There are lots of cases where small languages are spoken by neighboring groups but only named after one of them. A little common sense may be needed. Or maybe we'll be able to ease up a bit after the articles on the world's major languages are cleaned up so they're no longer a problem. — kwami (talk) 00:57, 30 August 2022 (UTC)
Common sense is great, but probably available in greater quantity or quality among habitués here than, for example, in the anon user who mucked with the ethnicity at Catalan language, faced (maybe for the first time) with a template doc page having what must be 80 params (I didn't count) with precious little to say about param |ethnicity=. This user may be an ethnic nationalist with no idea of the difference between a dialect and a morpheme, and no wish to learn about it either, or perhaps just an enthusiastic newb who is on a tear, eating up language/ethnic/peoples articles. I just don't think common sense is going to work well in situations like that, and if for no other reason, we need some specific guidance so that more clueful editors who wish to revert what seem like arbitrary edits have something to rely on, and/or link to if it comes to that. (I had no idea about that user a priori, but as it happens, 186.101.15.32 (talk · contribs) started last week, and has 70 total edits, of which 56 are to people/language/dialect articles, and of those 56, 37 have been reverted so far.)
As a second issue, I get it about smaller languages, but Hadza gets about 100 page views a day, Aka-Bea gets around 10, and Portuguese around 2,400. The smaller languages are by far more numerous of course (it's a typical long tail phenomenon), but we can't ignore the major languages, or at least the ones that get lots of views. (Klingon language gets around 500; although in a shocking oversight, the infobox doesn't include the 'ethnicity' param; draw your own conclusions.) Mathglot (talk) 01:35, 30 August 2022 (UTC)
Ha! Yes, it would be amusing to see an argument over adding it to Klingon. (Or maybe just tiresome.)
Yes, a guideline would be useful when dealing with nationalists.
I get your point about page views of major articles, but this param really isn't intended for them. I wouldn't object to removing it from Portuguese etc. altogether. If it's not there, it's not going to attract those kinds of editors. And if it's not useful, why have it at all? As you say, there is no "Brazilian" ethnicity, yet Brazilians are a large majority of Portuguese speakers. Limiting the field to just "Portuguese people" seems wrong.
Maybe that could go in the guideline: an identification of ethnicity is optional and may be omitted when its inclusion would not be useful. Maybe something like this:

The 'ethnicity' field in language info boxes pairs up with the 'language' field of ethnicity info boxes. It is intended for quick identification and for cross-referencing those ethnicity articles, to facilitate access to and encourage consistency between articles. It should be a bare-bones summary, as appropriate for an info box: it is not intended to be an exhaustive list of the ethnicity of every speaker of the language. The prototype for the 'ethnicity' field is the eponymous people that speaks the language, for example Tatars under Tatar language. That's all that is required for the large majority of the world's languages. It's not necessary to list every sub-group of the Tatar nation: that's what the link is for. Some languages are spoken by several often neighboring groups with no overarching name; in such cases it may be appropriate to list them individually, such as Nama people and Damara people under Khoekhoe language. However, listing ethnicities is not always a useful enterprise, especially with large inter-ethnic and inter-national languages, and in such cases the field might say "see Speaker population" (with a link to that section of the article) or be omitted altogether. Examples of such languages are Portuguese, English and Malay, where the largest groups of native speakers (Brazilians, Americans and Indonesians) are not in general ethnic Portuguese, English and Malay, and are not ethnicities that can be supplied to the field.

— kwami (talk) 03:08, 30 August 2022 (UTC)
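As a rough illustration only, the draft guideline above can be read as a short decision procedure for the |ethnicity= value. The sketch below is hypothetical Python, not part of any existing template or module; the `LanguageInfo` fields are assumptions standing in for an editor's judgment about the article.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LanguageInfo:
    """Hypothetical summary of what an editor knows about one language article."""
    name: str
    eponymous_people: Optional[str] = None  # e.g. "Tatars" for Tatar language
    speaker_groups: List[str] = field(default_factory=list)  # neighbouring groups, no overarching name
    inter_ethnic: bool = False  # large inter-ethnic or international language
    has_speaker_population_section: bool = False


def ethnicity_field(lang: LanguageInfo) -> Optional[str]:
    """Return a value for |ethnicity=, or None to leave the field out."""
    if lang.inter_ethnic:
        # Listing ethnicities is not useful here: point to the article section, or omit.
        return "see Speaker population" if lang.has_speaker_population_section else None
    if lang.eponymous_people:
        return lang.eponymous_people  # the prototype case
    if lang.speaker_groups:
        return ", ".join(lang.speaker_groups)  # e.g. Nama and Damara for Khoekhoe
    return None
```

For instance, `ethnicity_field(LanguageInfo("Tatar", eponymous_people="Tatars"))` gives `"Tatars"`, while Portuguese flagged as inter-ethnic with no speaker-population section gives `None`, i.e. the field is omitted.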
P.S. Where I said that there were relatively few articles where this param creates a problem, I was looking at it from the editor's POV: it's a lot easier for us to clean up coverage in those articles (both because of their small number and because we generally have good refs for them) than it would be to address the remainder of both the language articles and the ethnicity articles, which are often in far worse shape than the language articles and were sometimes orphans before we started linking to them from the language boxes. — kwami (talk) 03:53, 30 August 2022 (UTC)
I lean toward Mathglot's deletion proposal. In particular, I don't buy the "difficult to locate our article on the people" argument, since that's what lead sections are for. Any language article we have that doesn't make it clear early on who actually used/uses the language is not written properly.  — SMcCandlish ¢ 😼  12:51, 30 August 2022 (UTC)
But that's the point. Most of them are not written properly. Most are stubs. And even when they mention who the speakers are, they may not link to the article, because the editor has no idea it exists. Someone may start a new, duplicate article because there's no link to the existing one. The name of the ethnicity article may be quite different from that of the language article, and it may not link to the language article because its editor had no idea that article existed, e.g. if it was based on some colonial source from the 1920s. Once the two are linked to each other, editors can start harmonizing them. The most productive place to link them to each other is in the info boxes, where editors know to look so they can scan hundreds of articles at a time. — kwami (talk) 20:11, 30 August 2022 (UTC)
(edit conflict) Like @kwami, I think there is still a useful baby in the bathwater. For many articles, the parameter is a nice way to navigate between a language and its speaker group. When labelled "ethnicity", it should strictly be used for groups that are commonly considered ethnic groups, nothing beyond that. Yes, it comes with the price of patrolling pages where the parameter is totally inapt (English, Portuguese etc.) or can lead to convoluted entries that violate MOS:INFOBOX (as in Levantine Arabic; the current version is borderline, but it has been worse in the past); but pages with thousands of page views have hundreds of watchers, so we should be able to handle it. Documentation in the guideline is important. Not everyone reads it before fiddling with the infobox, but we will have something to point to when rejecting the edit. –Austronesier (talk) 20:13, 30 August 2022 (UTC)
P.S. For what you "don't buy", many of the Australian articles were a real mess when this parameter was added. Even when they mentioned the people, they usually did not connect to the appropriate article, and often used an entirely different name. It was often not possible to ID which article was the correct one just from the info on WP; in fact, even when reviewing external sources it often wasn't clear what the correct ethnic article would be. I'd call that "difficult". — kwami (talk) 01:59, 31 August 2022 (UTC)
Would it be worth expanding the label to something like (but not necessarily) “originating ethnicity”? I get where Kwami is coming from, in that there is a usefulness to a link between a language and an obviously connected ethnicity or two. Such a more detailed label would make a link to English people less confusing (slash offensive). Even many smaller languages aren’t going to be spoken exclusively by the associated ethnic group. Worth remembering that for every confused editor there are usually ten (slightly less meddling) readers. — HTGS (talk) 00:28, 31 August 2022 (UTC)
I suggested "eponymous ethnicity" above. But I don't know if that would work for articles like Khoikhoi language. Perhaps instruction that if it's anything other than the eponymous ethnicity, it needs to be a summary of the info in the text, a redirect to the text, or simply omitted. — kwami (talk) 01:53, 31 August 2022 (UTC)
I don't think that “originating ethnicity” is an essential piece of information that can be placed in the infobox. The essential information is "Who speaks the language?", and it's nice to have it in the infobox. Currently we only have the parameter "ethnicity", so we can only allow ethnic groups to appear here. A different and more flexible solution (which could accommodate Lusophones etc.) would be to change the current set up to two parameters {{speaker_group}} and {{speaker_group_label}}. {{speaker_group_label}} by default would display "Speakers", to which we could add sensible restricted parameters like "Ethnicity" (can't think of any other now). So in Portuguese language we would have "Speakers: Lusophones", and in Mandar language "Ethnicity: Mandar people". –Austronesier (talk) 20:03, 31 August 2022 (UTC)
That would work. Or we could have those as two separate labels. If combined, I think the default should be 'ethnicity', since that's the one that would generally be used. Also, 'speakers' is what we used to call 'native speakers', and they still sound similar, so ppl might think 'native speakers' is the number of L1, and 'speakers' is the number of L1 + L2.
Or maybe "speaker groups"? That would cover both ethnicity and e.g. Lusophones. — kwami (talk) 20:20, 31 August 2022 (UTC)
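Austronesier's two-parameter idea can be sketched as follows, purely for illustration: a restricted set of labels with a fallback default. This is hypothetical Python, not template code; the allowed-label set is an assumption (only "Ethnicity" is named as an extra value), and the default shown is Austronesier's "Speakers", though kwami would prefer "Ethnicity".

```python
from typing import Optional, Tuple

DEFAULT_LABEL = "Speakers"  # Austronesier's proposed default; kwami suggests "Ethnicity"
ALLOWED_LABELS = {"Speakers", "Ethnicity"}  # assumed restricted set of label values


def speaker_row(speaker_group: Optional[str],
                speaker_group_label: Optional[str] = None) -> Optional[Tuple[str, str]]:
    """Return the (label, value) pair for the infobox row, or None to omit it."""
    if not speaker_group:
        return None
    # Unrecognised labels fall back to the default rather than being displayed verbatim.
    label = speaker_group_label if speaker_group_label in ALLOWED_LABELS else DEFAULT_LABEL
    return (label, speaker_group)
```

Restricting the label to a short list is what keeps this from reintroducing free-form text; changing `DEFAULT_LABEL` is the only edit needed if the default is settled the other way.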
"Speaker groups" is even more vague than "ethnicity", but I agree that both are vague. It may be better to make it "ethnic origin". 1.126.108.0 (talk) 15:23, 23 September 2022 (UTC)
I'm with Kwami and Austronesier on this, following their argumentation that we need this parameter at least for smaller languages, where quite often the relationship between language and ethnicity is not one-to-one. This is worth documenting in the infobox. If we drop all infobox parameters that attract Wiki-troll abuse, we won't have much useful information left in the infoboxes, and it would somehow leave me with the stale taste in the mouth that the trolls have had the upper hand. LandLing 18:23, 31 August 2022 (UTC)

Another possibility would be a 'see also' entry. E.g. for Malay language, we might have:

Ethnicity: Malays (see also Malayophones)

For Portuguese, I don't know if we'd want the same, or if it would be better just to have:

Ethnicity: (See Lusophones)

though the current solution ("Lusophones" with a bulleted list of subgroups) seems like a good one IMO.

Could we perhaps draw up a guideline? Mathglot and I have given our thoughts above. Could Mathglot or someone else draft a guideline, to see how we like it? — kwami (talk) 19:18, 2 September 2022 (UTC)

Note to self: need to get back to this... Mathglot (talk) 19:27, 20 September 2022 (UTC)

A generic thought about this question, one that I've already mentioned in a couple of talk pages: In my opinion, there should be a general WP guideline on infoboxes, something like this: Any information in an infobox that can be misleading or easily misunderstood if you don't read the body of the page, should be left out of the infobox. --Jotamar (talk) 22:43, 21 September 2022 (UTC)

Italicize label?

Shouldn't the parameter label Glottolog be italicized, since the article itself italicizes it? Thrakkx (talk) 19:18, 27 September 2022 (UTC)

Differences in Ethnologue versions in Template:Infobox language/ref

In comparing the different versions of Template:Infobox language/ref, I found the following differences, and I would like to know if there is any actual reason for them. I'm working on a Lua version of Template:Infobox language, and if these were all the same it would make the code simpler.

The differences:

  1. Up to version 17, there is active support for 3 pairs of lc/ld; for later versions, the template supports 6 and has a 7th lc for "Additional references under 'Language codes' in the information box".
  2. For versions 24 and 25, the template checks whether the underlying Ethnologue template exists (yes, it does and presumably won't ever be deleted). This doesn't happen with earlier versions.

Is there any reason for this? Animal lover |666| 14:02, 28 September 2022 (UTC)

I may be responsible for (1). It's because we only cite old eds when a figure is undated, and we wish to show its earliest attestation as a partial substitute for a date. At the time I made those edits, there was only need for 3 pairs in older data. There's no problem with having the same number as later; it just cut down on lines of code. — kwami (talk) 18:48, 28 September 2022 (UTC)
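Since older editions could take the same number of lc/ld pairs as later ones, a unified module would not need to branch on the edition. The sketch below is Python rather than Lua and only illustrates that idea; the function name is hypothetical, and passing the extra seventh code as `lc7` is an assumption.

```python
from typing import Dict, List, Optional, Tuple

MAX_PAIRS = 6  # lc1/ld1 .. lc6/ld6, as in the post-17 editions


def collect_codes(args: Dict[str, str]) -> Tuple[List[Tuple[str, str]], Optional[str]]:
    """Gather lc/ld pairs the same way for every Ethnologue edition.

    Returns the (lc, ld) pairs that have a code, plus the extra seventh code
    (assumed here to be passed as lc7) used for additional references.
    """
    pairs: List[Tuple[str, str]] = []
    for i in range(1, MAX_PAIRS + 1):
        code = args.get(f"lc{i}", "").strip()
        if code:  # skip empty slots instead of branching on the edition
            pairs.append((code, args.get(f"ld{i}", "").strip()))
    extra = args.get(f"lc{MAX_PAIRS + 1}", "").strip() or None
    return pairs, extra
```

A citation to an old edition that only ever sets lc1 to lc3 produces the same pairs as before, because the unused slots are simply empty.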

Add Language Endangerment Status

Categories list: All statuses, Blank, Extinct, Critically Endangered, Severely Endangered, Definitely Endangered, Vulnerable, Not Endangered

Given how the IUCN's Red List status for (endangered & non-endangered) animals is included in {{Speciesbox}}, the status of (endangered & non-endangered) languages according to UNESCO's Atlas of the World's Languages in Danger (formerly the Red Book of Endangered Languages) should also be included in {{Infobox language}}.

The levels of endangered languages from least endangered to most endangered are:[1]

1) Safe
2) Vulnerable
3) Definitely Endangered
4) Severely Endangered
5) Critically Endangered
6) Extinct

The criteria for each level are as follows:

1) Safe
a) language is spoken by all generations
b) intergenerational transmission is uninterrupted (Note: Such languages are not included in the Atlas.)

2) Vulnerable
a) most children speak the language, but it may be restricted to certain domains (e.g. home)

3) Definitely endangered
a) children no longer learn the language as mother tongue in the home

4) Severely endangered
a) language is spoken by grandparents and older generations
b) while the parent generation may understand it, they do not speak it to children or among themselves

5) Critically endangered
a) the youngest speakers are grandparents and older, and they speak the language partially and infrequently

6) Extinct
a) there are no speakers left (Note: The Atlas presumes extinction if there have been no known speakers since the 1950s.)
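The six degrees and their criteria above can be read, in a deliberately simplified way, as a decision based mainly on the youngest generation that still speaks the language. This Python sketch illustrates that reading only; it is not UNESCO's own assessment method, and the input names (`youngest`, `children_learn_at_home`, `restricted_domains`, `elders_speak_fluently`) are hypothetical.

```python
from enum import IntEnum


class Vitality(IntEnum):
    """Atlas degrees, ordered from least to most endangered."""
    SAFE = 1
    VULNERABLE = 2
    DEFINITELY_ENDANGERED = 3
    SEVERELY_ENDANGERED = 4
    CRITICALLY_ENDANGERED = 5
    EXTINCT = 6


GENERATIONS = {"none", "grandparents", "parents", "children"}


def classify(youngest: str,
             children_learn_at_home: bool = True,
             restricted_domains: bool = False,
             elders_speak_fluently: bool = True) -> Vitality:
    """Map a rough description of who speaks the language to an Atlas degree."""
    if youngest not in GENERATIONS:
        raise ValueError(f"unknown generation: {youngest!r}")
    if youngest == "none":
        return Vitality.EXTINCT  # no speakers left
    if youngest == "grandparents":
        # Spoken only by grandparents and older; partial, infrequent use marks the worse level.
        return (Vitality.SEVERELY_ENDANGERED if elders_speak_fluently
                else Vitality.CRITICALLY_ENDANGERED)
    if youngest == "parents" or not children_learn_at_home:
        return Vitality.DEFINITELY_ENDANGERED  # children no longer learn it at home
    if restricted_domains:
        return Vitality.VULNERABLE  # most children speak it, but e.g. only at home
    return Vitality.SAFE  # all generations, uninterrupted transmission
```

Because `Vitality` is an `IntEnum`, the degrees compare in the order listed above, so `Vitality.VULNERABLE < Vitality.EXTINCT` holds.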

While I realise that this method of categorisation has its critics and there are alternatives, because this is how the UN currently categorises language viability, I believe this system should be included in this Template.

Thank you.

1.127.107.224 (talk) 17:31, 26 May 2022 (UTC)

I think this would be a good addition. It would also be good if we made an article to explain and contextualize the list so we don’t have to explain it on every page. Maybe there is information on the Endangered language page that could be expanded to its own article. Bluealbion (talk) 00:55, 27 May 2022 (UTC)
Support
Parts of the information in the Defining and measuring endangerment section of that article could be combined with the Atlas of the World's Languages in Danger article to create an article akin to the Conservation status article for endangered animals.
The table at the bottom of this archived UNESCO Atlas page would serve as an easy reference (if the actual website doesn't solve its technical problem sometime soon). Ggdivhjkjl (talk) 07:54, 27 May 2022 (UTC)
I've created the images on the right to get this started.
Please feel free to use them on language pages under the language infobox until such time as a field is added to include them within the infobox itself.
Anyone who is able to improve upon these images &/or their coding is welcome to do so. That's why I released them into the Public Domain. Ggdivhjkjl (talk) 15:13, 27 May 2022 (UTC)
I support these additions, except for those not-included (NI) in the Atlas; most languages are by default safe and don't need these images in their infoboxes. Another reason some language wasn't included could be that they're not classified yet, so equating non-inclusion with "safety" is also false. Tl;dr if they're not in the book, don't put anything. -Vipz (talk) 15:25, 27 May 2022 (UTC)
Is it the case that "not-included" could not be supported by a citation, and interpreting an omission as "safe" would be original research? Kanguole 16:07, 27 May 2022 (UTC)
Obviously, it cannot be supported by a citation because one doesn't exist, so yes to both questions. An important question: is the Atlas being actively updated and maintained? I also suggest putting this on wait until the site is fixed (bad timing to suggest this at the exact time the site got broken). -Vipz (talk) 16:42, 27 May 2022 (UTC)
You're right that this was probably a bad time to make this suggestion, although it appears that it is simply that the website's security has lapsed. The UNESCO library staff are currently on holidays (as sending an email to them can confirm via their auto-response) so hopefully it will be fixed once they're back.
But regardless, a printed edition of the Atlas was published in 2010 anyway.[2] Therefore, even if the website remains down, it is still possible to provide references from that physical book for languages which are included in it. Consequently, there is no need to wait.
As regards languages which are not included because they are not endangered, I have answered this question in reply to Kanguole already. Ggdivhjkjl (talk) 17:18, 27 May 2022 (UTC)
That is a misinterpretation of the designation. UNESCO itself states that languages are "not included in the Atlas" if their degree of endangerment is "safe". This level within the schema is intended for languages which have been intentionally not included within the Atlas because they are known to not be endangered. Given that such languages are not endangered, it would be very easy to provide a citation supporting the fact that they are not endangered.
For example, the English language article contains numerous citations from David Crystal's publications which verify that English is not an endangered language. By combining such references with the absence of a language from the Atlas, it can be confirmed that this is the reason why a language was not included in the Atlas. This eliminates the problem whereby unclassified languages could mistakenly be included under this status.
There are many other sources (e.g. the SBS World Atlas of Languages) which can provide verification that the world's most commonly spoken languages are not endangered. Ggdivhjkjl (talk) 17:06, 27 May 2022 (UTC)
Thanks for your support but it is not correct to assume that "most languages are by default safe". 94% of the world's approximately 7000 living languages are classified as "Endangered", not to mention languages which are already extinct. As the Atlas was based upon the Red Book, which in turn was based upon the IUCN Red List, these images are designed to follow a system akin to that already used on Wikipedia for the conservation status of animals (see the image at left).
Conservation status according to the IUCN Red List of Threatened Species
The status of "Not Included" is akin to that of "Least Concern" while also being visually similar to that of "Not Threatened" by beginning with an 'N'. Failing to include this status within the schema would leave it incomplete and thus defective. It is essential to include it so that all languages can be designated a status, especially seeing how only about 6% of the world's languages fall into this category anyway.
Furthermore, UNESCO itself states that languages are "not included in the Atlas" if their degree of endangerment is "safe".[3] Thus the use of this abbreviation is derived from the official source.
While I agree with you that some languages may not be included in the Atlas because they have not been classified, that is not the purpose of this ranking within the classification system used by the Atlas. Rather, it indicates that such languages were intentionally not included within the Atlas because they are known to not be endangered. Ggdivhjkjl (talk) 16:51, 27 May 2022 (UTC)
I probably worded that sentence poorly, I meant "most NI (non-included) languages are by default safe". It is obvious that English, Spanish, Portuguese and many other spoken languages are definitely safe, so adding this classification can be bloat to their infoboxes (personally, I'd include them as normal text lines, without these images). The other concern being equation of NI with "safety" (the image on the right captions "NI" as "Safe"), and perhaps that Atlas is not as exhaustive and up to date as IUCN, so there could be a lot of newly discovered languages, or just languages with non-updated classifications. -Vipz (talk) 17:48, 27 May 2022 (UTC)
Thank you for your feedback. To address your concern about "NI" being misunderstood, I have changed the caption on the image to read "Not Included in the Atlas because this language is Safe". This also more accurately reflects UNESCO's own statement as well about why those languages designated as "NI" were not included in the Atlas.
Before compiling the Atlas, UNESCO conducted extensive research surveys all over the world. It even includes data such as how there was 1 living speaker of the Pazeh language at the time. This caused Pazeh to be classified as Critically Endangered, not Extinct. If anything, the IUCN Red List is not as exhaustive as the Atlas.
Nonetheless, I acknowledge that it is possible that undiscovered languages may exist which have not been classified. However, I doubt undiscovered languages have Wikipedia articles about them. But as the United Nations General Assembly (Resolution A/RES/74/135) has proclaimed the period between 2022 and 2032 to be the International Decade of Indigenous Languages,[4] it is quite possible more may be discovered. If that happens, they will be allocated a status in due time.
In regards to updates, the Cornish language was previously classified as "Extinct" but, due to revitalisation efforts by elderly people, it is now classified as "Critically Endangered". Of course, updates such as these take time to be recognised by the compilers of official sources of information. That is nothing new. Even animals which are believed to be extinct take time to be officially designated as such because investigations to verify the case must first be conducted.
On the flip side, it is obvious that the Blue jay and many other animals are definitely of "Least Concern" when it comes to animal species. Nonetheless, they are designated a conservation status for the sake of completeness. Including easily recognisable visual diagrammes which conform to Wikipedia's existing standards serves to enhance the value of a language's infobox by maintaining consistency across topics. Ggdivhjkjl (talk) 18:30, 27 May 2022 (UTC)
Nota bene: The PNG images originally created by Ggdivhjkjl have been superseded by SVG versions; make sure these are used in any future template modifications:
Alhadis (talk) 04:34, 6 November 2022 (UTC)
How do you think revived languages should be displayed?
Although previously extinct, the white-throated rail is listed as being of "Least Concern" with no indication in its conservation status that it was once extinct even though the article states that it was. Following that precedent, as the Atlas classifies Cornish as "Critically endangered" and UNESCO has said that its previous classification of "Extinct" 'does not reflect the current situation for Cornish' and is 'no longer accurate',[5] it would be logical to only indicate it as 'CR' while including that it was formerly extinct in the article.
Another possible option may be to make the ring around the circle a different colour for revived languages. For reference, although it's not currently working, UNESCO's interactive online Atlas indicates revived languages with a square on its map instead of a pin. Ggdivhjkjl (talk) 17:50, 27 May 2022 (UTC)
Just an observation, the debate above is not easy to follow. It isn't immediately obvious what the issue is and what remedy is sought. Roger 8 Roger (talk) 20:24, 27 May 2022 (UTC)
Summary of Discussion Thus Far
Fair point. The only points really being debated were:
  • whether or not languages excluded from the Atlas (on the grounds that they are safe from endangerment) should receive a designation, &
  • if so, how to best describe the criteria for "Safe" languages so as to avoid mistakenly including undocumented languages in that category.
In light of the concerns raised, I rephrased the short description I originally proposed on the box. It now conforms more accurately to UNESCO's stated intent for languages it classes as "Safe" and thus excludes from its endangered list.
In response to other queries, I highlighted how the "Red Book" of Endangered Languages was modelled on the "Red List" of Threatened Species. Consequently, these images have been modelled on the "Conservation status" template so as to promote consistency across endangerment topics on Wikipedia. This includes how animal species of "Least Concern" are designated a category akin to the proposed "Safe" category for languages. Ggdivhjkjl (talk) 20:58, 27 May 2022 (UTC)
This is a long thread so I've only skimmed it while searching for the word "safe". In case it hasn't already been mentioned, the notion that we can mark a language safe because the Atlas doesn't include it is fallacious. A language may also be omitted because it wasn't evaluated, possibly because they'd never heard of it, possibly because they didn't even consider it a separate language from another that they had evaluated while Wikipedia does have a discrete article about it.
If a language isn't listed, then we don't know. When we don't know, the proper treatment is to put nothing in the article, just as, if we don't know a person's religion, we don't enter the value "We don't know" or make up a default like "Christian" for the religion field in {{Infobox person}}, we leave it out.
Of course, we might know for other reasons that a language is safe: Portuguese, Thai, Zulu. We can use more than this one source of information for establishing that. We could, for example, have a parameter for specifying the source, and another, a shortcut, say from-2010-atlas that, when set to "y", "true", or another truthy value, would cause the Atlas to be cited. Largoplazo (talk) 10:30, 6 June 2022 (UTC)

My tuppence: as a source, it seems rather out of date and not apparently maintained; the website is mostly broken, so I'm not sure if this amount of effort - and prominence - should be given to a source which may not be actively maintained. Secondly, it's a very ham-fisted categorisation. The issue of revived languages like Cornish aside, it is totally inadequate for cross-boundary languages. Case in point, Basque, there I just removed the UNESCO label. Basque is split across two nation states (France/Spain) and within those, into at least 4 different administrative regions all with vastly different levels of support ranging from zero (the exclave of Trebiño), some (the French side) and extremely strong (the Autonomous Community) and Navarre itself further subdivides into a Basque/Mixed/Non-Basque zone. Unsurprisingly, the health of the language varies vastly depending on where you are yet it all gets lumped under Vulnerable - Children who speak this language typically only do so in limited circumstances. That's just so ... misleading. And Basque isn't an exception, Catalan has a similar, if not crazier split between Catalonia/Valencia/Fragas/Balearic Islands on the Spanish side, Andorra (state language), Alghero (Italy, where it's probably endangered) and Northern Catalonia (French side) and once again the language health varies from endangered to least concern... It may seem like a nice idea but given the complex nature of language health, too crude in my view to be meaningful. Akerbeltz (talk) 21:12, 27 May 2022 (UTC)

In the November 2016 issue of Science Advances, a research article claimed there are serious inconsistencies in the way species are classified on the IUCN Red List. In spite of that, the 2001 version of that schema is used across Wikipedia on animal articles. This is because the Wikipedia community recognises that it is the purpose of an encyclopedia to relate information as it is documented by official sources even when it is contested.
While I agree with you that the brief definitions for each category that UNESCO has assigned are inaccurate, and that some information may only be accurate as of a certain date, this is not the place to question how UNESCO can improve its schema. When its sources are mistaken, it is up to UNESCO to self-correct based upon its own research. In the meantime, secondary sources of information such as Wikipedia must rely upon the extant documentation given in official sources. Nonetheless, would it help ease your concerns if the date of that source were included with the images?
Regardless of whether or not its online equivalent is currently functional, the Atlas is a printed book published by UNESCO. That book is already referenced as an offline source in numerous Wikipedia articles. Consequently, it may be concluded that the Wikipedia community already considers that book to be a reliable source. In light of that, and just how extensive it is, it can be adopted for this purpose. If there is some other system of endangered language categorisation available, there is no reason why it could not be used concurrently alongside those of the Atlas in a similar manner to how the NatureServe conservation status is used alongside that of the Red List.
The issue of where a language is spoken is irrelevant because the categorisation refers to the language as a whole. Even if all the Welsh speakers in Wales were to suddenly die, the fact that there are communities of native speakers of Welsh in Argentina means that Welsh would not be classified as an extinct language. Just as a given species of animal may be endangered in one location but thriving in another, so too a language may be endangered in one location but thriving in another. In spite of that, neither classification system is "totally inadequate for cross-boundary" application. Animals can still be classified based upon the generalised overall circumstances of their species even if they are protected by one government while having a bounty on their heads across the border. The fact that animals are so classified on Wikipedia in spite of that adds weight to the argument that languages too should be so designated for the sake of consistency. After all, language health is arguably a less complex matter than the health of animals given how there are far fewer languages than there are species of animals. Ggdivhjkjl (talk) 22:16, 27 May 2022 (UTC)

I have added to the sandbox some coding for this that you can check out at Template:Infobox language/testcases (the latest section), with a test for how it would look in the Manx article. Everything seems to be pulling correctly, but I am not a very good coder so if anyone wants to improve on it please feel free Bluealbion (talk) 03:02, 28 May 2022 (UTC)

Of course we can question how well this particular UNESCO scheme works. Just because their stuff may be generally reliable does not mean we have to shut down our brains and wave everything through that has a shiny blue UNESCO sticker on it. What they do is what they do and not our decision, you're right. But it's up to us to decide if this would be a valuable addition or a distraction. Given the lack of regular maintenance of the source, our time may be better spent by coming up with a similar, but more nuanced scheme that could take data from sources such as the Atlas but without attempting to replicate it exactly.
Regarding your Welsh example, nice one. Overall, my guess is UNESCO probably (I can't get the website to work) classifies it as Vulnerable because in Wales, where the vast majority of its speakers live, it's relatively robust. But if a meteorite wiped out Wales, it would immediately flip to Moribund because the language is mostly gone in Patagonia. So lumping both those areas together under one category for Welsh is just plain misleading.
Regarding the comparison with animals, if you look closely, it's applied in a much more nuanced way. The Elephant page doesn't use it, neither does the Elephas page and you have to get down to the level of the species before this classification starts appearing i.e. you have to drill down to African bush elephant or Asian elephant. In addition, biology and conservation has a much better view by and large over how many elephants there are left and how much their territory has shrunk and how much we're decimating the species. A lot of data on language health is based on scientifically very shaky data and often way out of date and used on the basis of this-missionary's-report-is-the-best-we-have. Akerbeltz (talk) 10:39, 28 May 2022 (UTC)
If we came up with a more nuanced scheme instead of the UNESCO model, what changes would you like to see? I will start looking to see if there is a system that is more area specific. Does anyone like the Endangered languages project [1]? It has a little bit more data to explain some of its decisions
I would just like to say, though, that a species can be doing fine as a whole while a species in a particular area, or a subspecies, may not be. The Grey Wolf is not endangered because it is well established across two continents, but the Italian wolf is considered vulnerable. Bluealbion (talk) 11:30, 28 May 2022 (UTC)
From what I can see, whatever the intention is, it will go nowhere. The UN data is just that and it is not our job to tamper with it or to think what it means or what it should mean, which is all original research. I believe a lot of the templates and articles for languages are less than ideal with the biggest problem being the definition of certain common terms such as 'speaker of a language'. I suggest a productive use of time would be to look for reliable secondary sources, bearing in mind that statistical data and graphs are usually primary sources. Roger 8 Roger (talk) 14:06, 28 May 2022 (UTC)
I largely agree with the points made by User:Akerbeltz and User:Roger 8 Roger. Language vitality is complex, and often not adequately captured by coarse one-dimensional scales. Also, it is dynamic, so we shouldn't rely on one static source as a pivotal reference. Given the fact that Ethnologue measures language vitality with a much finer (yet still one-dimensional) scale, the "EGIDS", and is occasionally updated, it is a better choice as a "second-best" default source than the UN data. But as @Roger 8 Roger has said, nothing beats reliable secondary sources here.
If we really want to make statements about vitality/endangerment, a "vitality" parameter in the infobox might be a better solution.
And finally, is it just me who finds the caption "Sumerian is an extinct language according to the classification system of the UNESCO Atlas of the World's Languages in Danger. No living people speak this language." a bit strange for a language that was already ancient even for the ancients? –Austronesier (talk) 19:28, 28 May 2022 (UTC)
LOL re Sumerian - and it's probably not even correct, I'm sure there are Sumerologists who are capable of having a conversation ... there just aren't any native speakers...
I'm not even sure a vitality parameter would be that helpful. I really think the current approach is the most nuanced, i.e. if we CAN say something about it being (severely) endangered, we can say that in the lede in one or two words and then give a nuanced explanation in the Status of the language or some such section. Akerbeltz (talk) 20:47, 28 May 2022 (UTC)
Regarding Sumerian, thanks for seeing the humour. That was just an example because nobody would argue against it being extinct. Even if some scholars have learnt to speak it, there is absolutely no way they could be sure their articulation is completely correct because of how long it has been dead for.
The purpose of an Infobox is for the reader to be able to quickly identify key information. What sort of state a language is in in the grand scheme of things forms part of such general information and as such it belongs in the Infobox. Although I do agree that for many languages providing a more nuanced explanation in the article would be worthwhile. Ggdivhjkjl (talk) 16:28, 30 May 2022 (UTC)
You are correct that EGIDS is a more precise scale but it has 13 levels, which is too many for little circles to tidily show in a quick visual reference. Nonetheless, articles about animals can use both the Red List status (of which the Atlas is the equivalent) and the NatureServe conservation status (of which EGIDS could be seen as being kind of an equivalent).
So, as a compromise, what if your suggested "Vitality" section were added to the Infobox between the existing "Official status" & "Language codes" sections? Place the EGIDS Level and Label in words at the top with the image of its position on the Atlas scale below. That way, the assessments of both authorities can be shown even if they present conflicting categorisations.
As for the captions, I've updated them now to simply state the status of the language. The example texts provided before were just ideas for how it could read and I thought people were less likely to argue about the status of Sumerian being extinct than they were about recently extinct languages. Ggdivhjkjl (talk) 16:24, 30 May 2022 (UTC)
Thank you for the compliment about my Welsh example. How UNESCO comes to its conclusions I don't know, but I am not sure I would say that Welsh is relatively robust in Wales. Just a few years ago, an incorrect government sign in Cardiff stood in full public view for 6 months because nobody could read it. It's really only in the north-west of Wales that Welsh is spoken by the majority of people and even there its use had been declining until the government started promoting the teaching of the language in schools. Nonetheless, if you meant that Welsh is in a relatively healthy state compared to most other endangered languages, then I could agree with you that it is comparatively robust.
You are correct that Wikipedia only applies the Red List categorisation system to animal species, not to higher levels of categorisation such as families or kingdoms. In the same way, the categorisation used by the Atlas only applies to languages, not to higher levels such as language families (e.g. Indo-European) or groupings (e.g. Celtic). So in this sense, they are equal and therefore it is logical to include this information in the Infobox.
I agree that a better schema could be created than what UNESCO uses in the Atlas. If you want to use your time in that way, you are welcome to submit your ideas to UNESCO or any other authority on this subject. But it is the role of Wikipedia to present the determinations of authorities regardless of whether you or I or anyone else believes the information they have published is accurate or not.
Over the weekend, I have identified over 2500 Wikipedia articles about various languages which already use the Atlas as their source regarding the endangerment status of that particular language. Considering that is around 1/3rd of the world's living languages, I think it is fair to conclude that Wikipedia already recognises the authority of the Atlas in spite of whatever flaws it may contain.
As mentioned before, Wikipedia also presents the IUCN Red List status of animals in spite of its flaws. This is because it is not for us here (on Wikipedia) to question whether or not the authority is accurate. Of course we can doubt whether it is correct and state publicly that we do not agree with the categorisation chosen by the authority, but that does not change what the authority has determined.
Many people criticise Wikipedia for its inaccuracies yet users are always instructed to cite reliable sources. One of the problems this creates is that, for many topics, it takes time for a reliable source to gather and publish information, by which time that information may be out of date. Nonetheless, that is the method used by this site.
Wikipedia is not the place to invent new schema. Rather, we are instructed by Wikipedia to cite existing ones. This is the existing system thousands of Wikipedia articles already refer to. Including it in the Infobox would just make finding this information a lot easier for everyone. Ggdivhjkjl (talk) 15:39, 30 May 2022 (UTC)
Thanks for creating those Bluealbion. Your coding skills are better than you give yourself credit for. Only, as it relates to the status of the language, I think it would be better to place the image at the bottom of the "Official status" section (above "Language codes") instead.
Including the line for revived languages where you did works well. Could you encode a line in the same place for Constructed languages too please? Ggdivhjkjl (talk) 15:14, 30 May 2022 (UTC)
Your Welsh sign story is so anecdotal it beggars belief and in no way indicative of language health. The only thing it tells you for certain is that there are government departments using Google Translate (not exclusive to Welsh) and that they're slow to react. And if you've ever tried reporting something like that, then you'd know it's an infuriatingly slow and often futile process. It once took me several hours to find the person responsible for a Gaelic typo on some new Scottish post office vans. I doubt most people have that kind of time. But that doesn't mean nobody spotted it.
At the risk of repeating myself, I didn't suggest we do fieldwork ourselves. I suggested that a different way of presenting data from existing sources, rather than copying one, would be better. Sure, lots of pages use the Atlas as a source but in my experience, there is usually a fairly verbose approach that presents language health in a slightly more nuanced way rather than just a uni-dimensional streetlight system. For instance, Welsh language only references it in this sentence Welsh is the most vibrant of the Celtic languages in terms of active speakers, and is the only Celtic language not considered endangered by UNESCO., similar on the Sardinian language page we have ... the use of which being therefore quite limited, Sardinian has been classified by UNESCO as "definitely endangered". . Galician language, Aromanian language don't mention it at all but give fairly verbose explanations of the usually complex situation.
It would be a little more diplomatic, I'd like to add, if you held off implementing the UNESCO scheme on lots of pages until some sort of consensus is reached, as it's clearly not universally popular. Otherwise it feels like you're trying to achieve implementation by fait accompli. Akerbeltz (talk) 16:23, 30 May 2022 (UTC)
My apologies. The sign was in Swansea, not Cardiff. While you're correct that my example was anecdotal, it doesn't change the fact that only 11% of the population of Cardiff speaks Welsh (compared to 69% of Gwynedd). If you're interested, that sign was the result of a worker not realising that a swift reply to an email was an auto-response. The sign was meant to read "No entry for heavy goods vehicles. Residential site only." But the Welsh translator's automated response caused the sign to instead read (in Welsh), "I am not in the office at the moment. Send any work to be translated." Nonetheless, we can agree that the government is rather incompetent when it comes to these things. I could give other examples yet we are not here to discuss the health of Welsh specifically. It is good to know you also care enough about accuracy to have taken the time to make the government fix a sign. Thank you for your service to the community.
Your comments tend to suggest that you are a detail oriented person but not a visual thinker. The purpose of an Infobox is for the reader to be able to quickly identify key information. For the vast majority of readers, it is currently a great inconvenience to have to wade through articles in order to locate where (if at all) the endangerment status of a language is mentioned. Including this simple visual reference, imperfect though it may be, solves that problem.
I had only added the boxes to about a dozen pages because it is the nature of Wikipedia that once the community sees the usefulness of an improvement they gradually begin implementing it on more pages. Can you really not see how increasing consistency across fields makes Wikipedia easier to use? That is why, when I saw that the equivalent image is typically included in the Infobox for animals, I came here to suggest that it be added to the Infobox for languages. It is obvious that that is where this information should be positioned. By doing this, I was attempting to prevent the situation arising where the boxes would come to be used on a large number of articles and would then have to be moved into the Infobox at a later date. Ggdivhjkjl (talk) 18:20, 30 May 2022 (UTC)
You can stop explaining both that story and the language situation of Wales, I am familiar with both, both from a personal and a professional perspective. Stuff like that happens, even in places where the level of bilingualism is much higher than in South Wales. It makes for fun anecdotes but does not in itself measure the level of bilingualism in a meaningful way.
As for the psychoanalysis, you're totally off target. I'm hugely visual, so much so I was A+ in geometry, D- in algebra... I am fully in favour of visual aids but not when they're misleading. Akerbeltz (talk) 18:36, 30 May 2022 (UTC)
In light of the updates I have made in response to those commenting here, what is it about the diagramme that remains misleading? It now presents only the facts in a manner essentially the same as how the facts of the Red List are presented in the Conservation status section of the Speciesbox for animals. If you are consistent, then what reason can you present for why that section should be removed? For it is illogical to have one on Wikipedia while excluding the other. Ggdivhjkjl (talk) 20:11, 30 May 2022 (UTC)

If you use the Atlas to back up a sentence like "language X enjoys strong state support at all levels in country Y, including several TV channels in X and two daily newspapers, and child acquisition has gone up in recent years but the majority of speakers of X, who live in country Z, continue to suffer state persecution at all levels and language use is low, leading to the language being classified as endangered by the UNESCO Atlas" then that uses the Atlas as a source for it being classified as endangered overall without being misleading at a cursory glance. If you just stick a big red sticker on X saying it's endangered, then a lot of people who are too lazy to read through half a paragraph will just glance at the red sticker and go "oh it's endangered" without realising that that's not the whole story. Such overly simplistic categorisation of languages can foster overall negative perceptions of a language, both by speakers and non-speakers and is the reason (for example) why many indigenous communities prefer to talk about 'sleeping languages' (or similar terms) rather than 'extinct' to avoid propping up this negative perception. I'm not suggesting we stop using the term 'extinct' before you hone in on that, but my point is, once again, that a mono-dimensional visual representation of language health is a dangerous over-simplification. A casual reader may not be as nuanced and aware of the complexities of language health, and indeed my professional experience of working with Scottish Gaelic over the last 20 years is that misconceptions (not necessarily malicious) are the norm and I'd rather we didn't ADD to them. Akerbeltz (talk) 20:50, 30 May 2022 (UTC)

You did not answer the question. Why do you believe that the conservation status of animals should be removed from the Speciesbox? The Red List also simplifies the reality of the situation for animals. Indeed, every system of categorisation always simplifies matters. That is the purpose of categorisation - to help people obtain a simplified general idea. If you are opposed to presenting information in a consistent manner, say so.
Your latest argument is ironic seeing how adding this information would promote public awareness of the status of minority languages just as the conservation status does for animals. In spite of your spurious assertion that people's feelings might be hurt by the addition of a graphic to signify what an article already says anyway, you don't seem to care that your opposition to this addition is detrimental to the public promotion of endangered languages. Ggdivhjkjl (talk) 21:10, 30 May 2022 (UTC)
I never suggested removing the conservation status of animals. It's - in my view - an entirely different kettle of fish.
And don't make me laugh, a colour button system promoting public awareness? Show me ONE scientific article that suggests a RAG status system has a positive impact on language maintenance or perception. Akerbeltz (talk) 21:53, 30 May 2022 (UTC)
Comment I've noticed the additions of these to a number of pages and I'm concerned that these are done without citation and without any date information. Languages are dynamic, and facts like this assessment should clearly state when the assessment was done, with a verifiable citation, because not only does UNESCO probably not update these data daily but Wikipedia, being secondary, doesn't necessarily update the data from UNESCO as soon as an update is made. At present it's quite hard, even for an interested reader, to figure out how outdated this information is. I think it's much better to include no information than to include this information in such a potentially misleading way. AquitaneHungerForce (talk) 20:16, 31 May 2022 (UTC)
Question While this is a generally good idea, I wonder about the wisdom of starting out at this late date when all the data in the source is at least 12 years old. Largoplazo (talk) 21:35, 31 May 2022 (UTC)
Yeah, Largoplazo, I've mentioned that too somewhere above.
We'll be chasing fly-by edits like this forever [2] Akerbeltz (talk) 11:01, 4 June 2022 (UTC)
These unsourced edits (using inappropriate small formatting and obsolete center tags and misusing the |map= parameter) continue. See Special:Contributions/Cecilkilledthedinosaurs for more fun. – Jonesey95 (talk) 23:27, 5 September 2022 (UTC)
On the Galician Wikipedia there is a section titled "Status" in the language infobox. Maybe if that section were added to the English template, the |map= parameter would no longer need to be used. But for now, it is the only available space where it can logically be included. 1.126.108.0 (talk) 15:16, 23 September 2022 (UTC)
Such a field is still under discussion here. You should not be attempting to pre-empt that discussion by abusing |map= in this way. Kanguole 10:48, 19 October 2022 (UTC)
I am late to the discussion, but I drafted this idea a couple of years ago and made some graphs. If you find them useful in developing this, feel free to use them. I also have individual SVG files for each EGIDS level; I won’t post them all here, but you can find them on Commons. Good luck!
(Images: "EGIDS levels"; "EGIDS level 0"; "Languages by EGIDS level")
--Lundgren8 (t · c) 07:55, 19 October 2022 (UTC)

As noted above, endangerment status is complicated. If we do add endangerment status to language infoboxes, we need to take status indicators from consistent external sources, rather than apply their criteria ourselves. We should also indicate the source (and thus the criteria, which vary), and the date of the judgement.

  • One possible source is the UNESCO Atlas of the World's Languages in Danger, published in 2010 with a new edition coming "soon". It provides a scale for endangered and recently extinct languages, with each point having a descriptive name and colour. It therefore is not a source for saying a language is "safe" or for extinctions of languages more than 50 years ago. Also, in many cases it uses a different granularity of "language" from Wikipedia (which generally follows ISO 639-3), and so its judgements may not be directly transferable.
  • Ethnologue uses the EGIDS scale, covering degrees of both development and endangerment, and thus provides judgements for all modern languages corresponding to an ISO 639-3 code. Each point on the scale is identified by a number and descriptive name.

There may be others. Devising a graphic for one particular scale (with locally created abbreviations and/or colours) would give it undue emphasis, and should be avoided. Kanguole 10:48, 19 October 2022 (UTC)

+1 for using Ethnologue's EGIDS scale. a455bcd9 (Antoine) (talk) 10:16, 19 December 2022 (UTC)
By the way, it looks like an IP added some EGIDS images to the "map2" field (see Tese language)... a455bcd9 (Antoine) (talk) 11:53, 19 December 2022 (UTC)

References

  1. ^ Moseley, Christopher (2010), Atlas of the World’s Languages in Danger (3rd ed.), Paris: UNESCO Publishing
  2. ^ https://www.amazon.com/Atlas-Worlds-Languages-Danger-UNESCO/dp/9231040960/
  3. ^ "Endangered Languages: UNESCO Atlas of the World's Languages in Danger". Archived from the original on 2022-04-29.
  4. ^ https://en.unesco.org/idil2022-2032
  5. ^ "Cornish language no longer extinct, says UN". BBC News Online. 7 December 2010. Retrieved 11 November 2012.