Wikipedia talk:WikiProject Languages


Conservation Status

Hello fellow linguists. I wanted to propose a change to the language infobox that adds a section concerning language conservation and vitality. I was hoping we could discuss the idea of making a language conservation template or diagram of some kind that could be used in the infobox on each language's page to help the reader visualize the vitality of the language, similar to the one for endangered species. Each language in the world has a status and vitality, as do species.

Here is my current idea about language conservation statuses we could have in the infobox (I am always open to discussion and other ideas):

  • Global/Least Concern (for international or national languages used for wider communication, like English and French)
  • Developing/Vigorous/near Thriving (for smaller languages that still show considerable vitality; children are still learning them, but they are not yet widespread and may have limited official/regional status)
  • Vulnerable/threatened (the language is spoken by all generations but is a minority language, and its use may be restricted to certain domains; perhaps the language community needs some kind of conservation to maintain its language)
  • Shifting/moribund (the language is no longer spoken/acquired by children as a first language, but is in use among the parent generation and older, who could theoretically turn around and start speaking the language to their children)
  • nearly extinct (only a few elders remain)
  • dormant/dead (no known living first-language speakers, but perhaps revitalization attempts)
  • extinct (completely gone)

Here is the endangered species diagram, which I was hoping the language status diagram might look like: (image: Status iucn3.1 CR.svg)

However, the problem with language conservation status is that there is no concrete source as there is for defining the conservation status of species (the Red List of Endangered Species). UNESCO can be reliable for language conservation status, but it appears to struggle with original research and has trouble differentiating between a language and a dialect. Ethnologue by SIL International is generally reliable and does provide a status for each language, but it is a missionary resource and is thus biased; some of the data is manipulated and linguists do not agree on its accuracy. I was thinking that the Catalogue of Endangered Languages by the University of Michigan and the University of Hawaii looks reliable, and defines the various degrees of language endangerment/vitality well, but I'd like to hear everyone's ideas. We need a source that linguists agree is generally reliable to prevent potential edit warring between users nitpicking various sources.

~~user:Neddy1234~~

  • I think that in principle it is a good idea, and that the template should certainly support it if it doesn't already. I am not sure I would want to make it a requirement, however. Sometimes the status is controversial (for example, calling a language dead when revival efforts are ongoing), and sometimes the authoritative sources are wrong (I have myself brought a "dead language" back to life by writing to the Ethnologue to tell them that I found speakers of a variety they listed as extinct). I also definitely think that we should not tie ourselves to one single source, but use whichever source is best for a given language and use editorial discretion to do so. But having the option and making it the standard is a good idea. User:Maunus ·ʍaunus·snunɐw· 21:32, 19 December 2014 (UTC)
    • Having an infobox display discrete levels of language endangerment strikes me as unsupportable OR. Even Ethnologue doesn't divide all languages into seven neat categories, and neither (AFAIK) does anyone else. Therefore, neither should we. —Aɴɢʀ (talk) 23:44, 19 December 2014 (UTC)
It is only OR if we require it in the absence of sources. There are several resources that are attempting to make global endangerment indexes that we could use; Ethnologue is only one of them. UNESCO is one, and the Google-based endangered language project is another. No reason we couldn't use those categorizations in infoboxes. User:Maunus ·ʍaunus·snunɐw· 23:51, 19 December 2014 (UTC)
  • Thank you guys for all taking my idea into account! I'm very thankful for your feedback! As is the case of individual species whose conservation status is data deficient, we need not include such a template/diagram on every language article if there is nothing known. My vision is that if we have enough data from reliable sources like Ethnologue, the google-based endangered languages database, and UNESCO, we can make an accurate (and non-Original) conservation status evaluation. In describing the categories, I have incorporated the various guidelines for language health and vitality from Ethnologue, UNESCO, and Google's endangered language project. And, by the way, Aɴɢʀ, Ethnologue ABSOLUTELY DOES put languages in neat little categories concerning their vitality: http://www.ethnologue.com/about/language-status. So why should we not if we are using these reliable sources to make status evaluations based solely on the information found there? There is no Original research involved.
--user:Neddy1234

I see nothing wrong with it as long as the data is from a RS. What do ppl think of the Catalogue of Endangered Languages? I would oppose using Ethnologue, as SIL admits that the goal of their assessment is to make the lang look as vital as possible in order to facilitate funding for translating scripture, which you're not going to get if the lang is dying. The result is that non-missionary linguists have been denied funding for documenting endangered languages when the funding agency checks Ethn., which says the lang is not endangered. There are many more endangered langs in Africa than you'd understand from Ethn., compared to other continents, for example, with the result that Africa would be under-funded if ppl relied on Ethn. for funding decisions. Maybe our ref'ing some other source would help remedy that.

As for revitalization, IMO we should have a category for that. But even if ppl want to deny it, once a lang is gone, it's gone. If you are able to bring s.t. back, it won't be the same language. That's even the case for Modern Hebrew, which is arguably relexified Slavic rather than Semitic. And few revitalization efforts actually change ppl's native language like that. — kwami (talk) 17:39, 20 December 2014 (UTC)

Kwami, do you have a reliable source for your claim that SIL admits cooking the vitality books to facilitate better funding of its activities? I guess you don't, and it is plainly not true. Quite the opposite: you could claim that many linguists classify a language as highly endangered in order to get access to grants from Rausing or other organizations that are interested only in endangered languages. Just recently I was reading in an MA thesis that a language is "on the brink of extinction", when the writer knows as well as I do that it is actually quite vital and still generally being passed on to the next generation. It was written that way because the thesis was part of a language documentation program, and those are always supposed to have endangered languages as their subject. Therefore, if there is any possible bias about language endangerment status, I would expect it rather from that side. SIL is not going to invest resources into developing a language that is dying, if they know it is dying. This is not to say that all of Ethnologue's vitality assessments are correct (I know they are not), but that there is no motivation to tamper with the status of a language. Landroving Linguist (talk) 08:47, 25 December 2014 (UTC)
I must admit that that is one accusation against the SIL that I haven't heard myself either. And I have heard and read a lot. As I say, I have myself had to correct languages listed as extinct that are in fact alive. User:Maunus ·ʍaunus·snunɐw· 18:50, 25 December 2014 (UTC)
That's what SIL said when I asked them, after I'd heard complaints. I am aware that there is a lot of exaggeration in the other direction too. These are largely subjective categories, so there could be bias in either direction even without intentional misrepresentation. And there's the anger we'd provoke by saying a language is extinct if there are attempts at reviving it. I guess we'd need to decide which POV we wish to represent, if we're going to add this category. — kwami (talk) 19:36, 25 December 2014 (UTC)
I don't think it is entirely subjective - for most languages there is hard and fast data, such as whether the language is used in education, how many monolingual speakers there are, whether parents pass it on to their children, or whether there is any institutional effort for language development. In the case I mentioned above, the language in question is doing well according to three of these criteria, and any claim that the language is seriously endangered can be easily debunked, based on published sources. I agree that this kind of data may not be available for all languages, and then the situation may be more difficult. In any case, just like the similarly troublesome question about speaker numbers, maybe we can agree here to refer to the best published sources available, and only if nothing else is available default to sources like the Ethnologue. For reasons mentioned above, published sources may be in disagreement, and then this should just be mentioned. Landroving Linguist (talk) 21:48, 25 December 2014 (UTC)
I think the key is not to make the information obligatory but to exclude it on editorial discretion if there is reason to doubt the validity of the extant sources for any reason. User:Maunus ·ʍaunus·snunɐw· 22:02, 25 December 2014 (UTC)
I think that it's messy, subjective, and guaranteed to cause disagreements, but worth it, for the most part. Except for the absolute no-brainers like English or Sumerian, there should always be a link to the section of the article discussing the issue, and there should be a way of marking statuses as disputed and/or unknown (unknown as in "no reliable source has that information", as opposed to "no one at Wikipedia has checked"). It may even be worth it to have multiple statuses possible (with annotation) in cases where there are differences between reliable sources. At any rate, it should always be made clear that it's only a simplified graphical representation of potentially very complex and disputable facts. Chuck Entz (talk) 23:34, 25 December 2014 (UTC)
Oddly enough, as a member of SIL for 35 years, and actively interacting with Ethnologue since they started including EGIDS ratings, I've never heard anyone in SIL suggest that we should or do bias an evaluation of a language's vitality upward so as to make it easier to justify funding for work in that language. Now, I'm not saying that kwami is wrong; I trust him that someone in SIL did actually say that to him. But, given that my experience within the organization is so different, I would guess that the opinion expressed is not widely held, and it certainly is not a matter of policy. AlbertBickford (talk) 23:07, 23 January 2015 (UTC)
One other factor I just thought of, and that is the confusion that can arise over the term "endangered". According to one definition, a language is endangered if there is likelihood that it may disappear within the next century. By that definition, a language can be endangered even when it is still being transmitted to all children. The EGIDS scale used in Ethnologue attempts to rate current level of vitality, rather than "endangerment" in this sense. Other uses of "endangered" that I've seen are more along the lines of languages that are beginning to fade away--where children are no longer learning the language. So, when people use the term in different ways, there is great potential for misunderstanding--especially when money is involved, such as getting funding for research. AlbertBickford (talk) 23:15, 23 January 2015 (UTC)
The word can indeed be ambiguous. I've seen linguists get funding for "endangered" languages that to me seem quite robust, to an extent that many communities could only dream of. That would be the opposite bias to the one I mentioned.
BTW, in Ethn.18, Lyons SL is described as 6a "vigorous", but then there's a note saying that a survey is needed to determine if it's still spoken. Just a heads-up on the problem with copying categories blindly. — kwami (talk) 18:39, 5 March 2015 (UTC)

WikiProject X is live!

Hello everyone!

You may have received a message from me earlier asking you to comment on my WikiProject X proposal. The good news is that WikiProject X is now live! In our first phase, we are focusing on research. At this time, we are looking for people to share their experiences with WikiProjects: good, bad, or neutral. We are also looking for WikiProjects that may be interested in trying out new tools and layouts that will make participating easier and projects easier to maintain. If you or your WikiProject are interested, check us out! Note that this is an opt-in program; no WikiProject will be required to change anything against its wishes. Please let me know if you have any questions. Thank you!

Note: To receive additional notifications about WikiProject X on this talk page, please add this page to Wikipedia:WikiProject X/Newsletter. Otherwise, this will be the last notification sent about WikiProject X.

Harej (talk) 16:57, 14 January 2015 (UTC)

eyes on Habla Congo, please

Ongoing edit-war over gutting the contents, bad phrasing, and unref'd claims (I'm not sure speaking Spanish is part of speaking "Congo"). It could be better sourced anyway. — kwami (talk) 20:39, 16 January 2015 (UTC)

Yes, please check the edit history. Omo Obatalá (talk) 20:44, 16 January 2015 (UTC)

basic language article in need of review

National language is a rather basic article for a lot of what we write, but has been listed as needing attention since 2009. According to the intro, a "national language" is any variety of speech associated with an ethnicity, which is a bit useless as a definition. We should clean it up if fixable, or state the phrase has no meaning if it's not. — kwami (talk) 23:36, 17 January 2015 (UTC)

"Altaic ?" in the Infobox?

Consensus is to eliminate the named controversial material from the infobox. Non admin closure. SamuelDay1 (talk) 02:36, 17 February 2015 (UTC)

The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.

(Note: This is a centralization and reinitiation of an RfC that has occurred in different forms elsewhere) In the Infoboxes for Turkic languages, Mongolic languages, Tungusic languages, Japonic languages, and Koreanic languages and most, if not all, of their daughters, assigning the color "Altaic (areal)" to the infobox automatically fills in the text "Altaic ?" in the "family" line of the classification at the top node unless it is specifically overridden by inserting another value into the "family" slot. This should be eliminated since Altaic has been generally discredited among historical linguists and only a small minority of linguists still cling to it. --Taivo (talk) 08:42, 7 February 2015 (UTC)

  • Support. "Altaic", as a genetic unit, has virtually disappeared from serious historical linguistic consideration. Only a decreasingly small number of linguists still cling to it as to a floating deck chair from the Titanic. "Altaic" should be described in the history of classification sections, of course, but should no longer occur (automatically or otherwise) in the infoboxes of the constituent clades or the daughter languages. --Taivo (talk) 08:42, 7 February 2015 (UTC)
  • Support - Colouring should be Language Isolate with linguistic classification being something along the lines of: Language Isolate (controversially classified as Altaic) or Language Isolate/Altaic (?) Luxure Σ 07:30, 12 February 2015 (UTC)
  • Support - In my understanding, the term Altaic, whether it has many or only a few supporters, has not achieved a wide consensus among linguists. As much as possible, the infobox should only include uncontroversial material. I know that this is not always possible, but I'm sure that classifying a language as Japonic or Mongolic is certainly uncontroversial, whereas Altaic is not. Landroving Linguist (talk) 11:19, 9 February 2015 (UTC)
  • Comment: "Language isolate (generally accepted)" is misleading; it's not that the majority of modern linguists are actively proposing that, say, Basque, Sumerian, Korean (minus Jeju), or Haida has literally no relatives (or even no living ones), but that they're refusing to take a position in any direction owing to lack of evidence. I think "Altaic (controversial)" or the like would be more appropriate. Tezero (talk) 15:04, 9 February 2015 (UTC)
Completely untrue. Why do you make statements like this? HammerFilmFan (talk) 14:13, 15 February 2015 (UTC)
Changed. Luxure Σ 07:30, 12 February 2015 (UTC)
  • Support - per Landroving Linguist. ミーラー強斗武 (StG88ぬ会話) 17:23, 9 February 2015 (UTC)
  • Support Controversial classifications should not be in the infobox. It is not the case that "linguists refusing to take a position" somehow lends credibility to the hypothesis, that is a misunderstanding of how classification works. The default position is "isolate" until any relations have been satisfactorily demonstrated. User:Maunus ·ʍaunus·snunɐw· 20:43, 9 February 2015 (UTC)
  • Support - this needed to be done some time ago. HammerFilmFan (talk) 14:13, 15 February 2015 (UTC)
  • I recognize that; I'm merely arguing that "generally accepted" is misleading given what "isolate" means. It'd be like saying agnostic atheism is the religious position the majority of some subset of philosophers or scientists believe in - even if that's the most common response to a self-identification survey, it's not really "believing in" anything. If the concept is to be included in the infobox with no mention of Altaic, but Altaic is considered to be major enough not to label the proposed member only an isolate, I'd prefer something like "No demonstrable relationship to other languages", which I think Mayan languages used last I checked. Tezero (talk) 21:43, 10 February 2015 (UTC)
  • Support per Maunus. As a rule, only well-accepted classifications should be in the infobox. —Granger (talk · contribs) 21:30, 9 February 2015 (UTC)
  • Comment: For the record, the Altaic languages' article lists noticeably more supporters of the theory than opponents. If it's as fringe an idea as you people are saying, you ought to amend that in the interest of due weight. Tezero (talk) 21:43, 10 February 2015 (UTC)
Most of those Altaic supporters are either dead or have abandoned Altaic. --Taivo (talk) 01:00, 11 February 2015 (UTC)
Yes, your assertions are not based on any facts, and that makes me somewhat suspicious of your motives here, in spite of AGF. HammerFilmFan (talk) 14:11, 15 February 2015 (UTC)

The discussion above is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.


Focus in Spanish

I find the articles on grammar so devoid of linguistics that it is revolting. Instead, they all rely on traditional ideas about grammar, which in the case of Spanish are shared with the RAE. Take the Spanish preterite: why not call it the aorist? In Greek it is called the aorist, not the preterite. So instead of making the teaching of Spanish quick and effective, the articles just stick to the old ways. Most important, however, is what I have not yet mentioned, which is FOCUS. Just as was done in the article for Tuareg, a Berber language which is very related to Spanish and other Southern European languages, like many other Afro-Asiatic languages, I would like to see discussion of focus in Spanish, which literally dominates everything that is said. In English, when we start a sentence with the object we make a passive construction, yet Spanish has its own mechanism for accomplishing such object focus. Nevertheless, I found no such thing. What's worse is this whole coger thing, where it supposedly means to "have relations". In reality there are many dialectal words, as that is a very colloquial word in any language, unlike in English, where it is always the f-word. The old word for it in Spanish is "joder", which comes straight from the Latin futuere and is just as vulgar, although in standard usage it can mean "to bother". Such puritanical self-censorship is natural in language, so there is no standard word for the private parts and sexuality outside of medical formalities. Therefore I find many articles superficially scholarly but in fact devoid of examination of important typological and syntactic features, which are more important than traditional morphological prescriptivism. Furthermore, the differences between Latin American dialects are completely neglected, because Spanish is not just Argentinian voseo, Spaniard vosotros, and Latin American seseo.
Indeed, there is hardly anything similar between Argentinian (standard voseo), Dominican (extreme coda deletion), Mexican (reduced vowels), and Antioquia Colombian (apico-alveolar sibilant) Spanish, or the areas affected by an Indian language substratum or by foreign intonation or isochrony. Some dialects even find it weird that others use usted so heavily, while others find it weird that anyone would use tú. In conclusion, I find these Spanish articles representative of a fake Spanish, a fusion of different characteristics and of standard codified ideas about grammar that are frequently neglected all over Latin America. In reality, the situation in Latin America, and even more so in Spain, is not at all far from the Arabic situation: speaking standard Spanish is as much a matter of much practice and little success as using the Standard is for Arabic speakers. Thank you very much. — Preceding unsigned comment added by 98.254.198.111 (talk) 02:30, 13 February 2015 (UTC)

Language examples in new Palatalization articles

Palatalization was recently split into Palatalization (phonetics) and Palatalization (sound change).

The phonetics article is the place where palatalization as a phonemic feature is described. If you know a language with this feature, please add a section for its language family, and a section on the language, under the Examples section. Adding a few examples of minimal pairs, or notes on typologically unusual features relating to the palatalized phonemes, would be good too.

The sound change article is where palatalization as a historical phonological change is covered. This includes Romance and Slavic historical palatalizations, phonemic splits relating to development of palatal or palatalized consonants, development of alveolar and postalveolar affricates and fricatives, vowel fronting and raising, etc. Examples of palatalization sound changes can be placed in the Examples section. Brief statement of when and where the sound change applied, with a few examples, would be great.

The two must be distinguished: examples of languages with palatalized and unpalatalized phonemes belong in the phonetics article; all examples of historical sound changes, even ones resulting in palatalized phonemes, belong in the sound change article. — Eru·tuon 22:14, 14 February 2015 (UTC)

English language article GA cleanup

Hi, everyone,

I'm writing here to let project participants know that I'd like to work collaboratively with other Wikipedians to bring the article English language, a very high-page-view article, up to good article status and beyond to featured article status. The previous (failed) good article review from January 2009 and the helpful peer review from September 2012 agree on many points of improvement needed in the article. This project's own template for articles on spoken languages is also a good resource for restructuring the English language article to mention the most important issues about English as one language among many. What I have been doing for the last month or so is exhaustively going through every reference now cited in the article, reference by reference, to verify the references, complete the bibliographic description of each reference, and to collect all the references gradually into a bibliography at the end of the article, with a new inline citation format to ease verifying controversial statements and finding examples. I have done very little rewriting of the article text as yet, but that will eventually have to be done by someone to bring the article to GA status. The previous comments point out that the article lists a lot of miscellaneous facts without unifying them through clear prose. I invite you to suggest current, reliable sources for the article, to query dubious statements in the article, and to discuss on the article talk page what restructuring of the article will place due emphasis on the various aspects of the English language. And, yes, feel free to fix problems as sources identify problems to fix. Please let me know what I can do to help. -- WeijiBaikeBianji (talk, how I edit) 16:25, 16 February 2015 (UTC)

I'm interested in phonology, so I will be editing the Pronunciation section to make it more readable, accurate, and so on.
One endemic problem is that some traditional vowel symbols are no longer phonetically accurate. /ʌ ʊ/, for instance, are pronounced as lowered and centralized to [ɐ ɵ] in standard US pronunciation. This is problematic, because the actual vowel [ʊ] occurs in other languages, like German and Hindi, and use of one symbol for both is misleading. There's a certain amount of wiggle-room in phonetic transcription, but the difference between German Stunde /ˈʃtʊndə/ and English could /kʊd/ is great enough that it should be noted in the transcription. Not sure if this observation is supported by reliable sources. — Eru·tuon 06:06, 21 February 2015 (UTC)

Punic language, Tunisian Arabic

Edit-war by editor who believes that Punic is spoken by 50 million people in the Maghreb. Doesn't appear to understand what a substratum is. (Unless his source actually says what he thinks it does, in which case we'll need to address it per RS.) — kwami (talk) 19:42, 16 February 2015 (UTC)

Smallcaps

I have initiated a discussion regarding the use of small caps, which some editors consider to be deprecated by the MOS in general but which are of course necessary for writing interlinear glosses in language and linguistics articles. User:Maunus ·ʍaunus·snunɐw· 19:55, 16 February 2015 (UTC)

Here is the RfC about the issue: Wikipedia_talk:Manual_of_Style/Capital_letters#RfC:_Proposed_exceptions_to_general_deprecation_of_Allcaps Your input will be valued. User:Maunus ·ʍaunus·snunɐw· 21:16, 16 February 2015 (UTC)
It's actually a muddled RFC mostly about two unrelated all-caps style issues, unrelated to linguistics matters. The linguistic issue should be a separate proposal.  — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  11:20, 20 February 2015 (UTC)
No, it is about three unrelated allcaps issues. The question of using small caps in author names in references is, however, also related to linguistics, since the Linguistic Society of America style guide uses this. User:Maunus ·ʍaunus·snunɐw· 22:40, 20 February 2015 (UTC)
Irrelevant; WP does have its own citation styles, and does not use those of the LSA or other organizations.  — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  02:47, 25 February 2015 (UTC)
That is wrong. WP does not have its own citation styles and allows the use of all citation styles. Please read the actual policy WP:CITEVAR.

Ethnologue 18 is out

I've updated the language infobox, so now citations of E17 put the article in Category:Language articles citing Ethnologue 17. There will soon be thousands of articles in that cat that need to be updated. We might want to start with those in Category:Language articles with old Ethnologue 17 speaker data, which should fill up in coming days. I've started with the oldest pop dates (< 1983), but help from anyone who wants to pitch in would be appreciated. (If Ethnologue does not provide a date for its figure, and the figure has not changed since E17, then please leave it as E17, and add a comment that it shouldn't be changed.) — kwami (talk) 21:09, 27 February 2015 (UTC)

(I think most updates were in Eurasia and sign languages, while most of our old data is from outside Eurasia. — kwami (talk) 21:40, 27 February 2015 (UTC))

I've added one update that I knew about because it was me who suggested it. User:Maunus ·ʍaunus·snunɐw· 22:49, 27 February 2015 (UTC)
I moved some links to match what we had elsewhere. If that was wrong, we'll need to correct the statements supporting it. Would be nice if you could create a stub for Chiapas Nahuatl as well, since all I'm working on right now are the new(ish) editions of Glottolog & Ethnologue. — kwami (talk) 00:32, 28 February 2015 (UTC)
Also, if you could ID the remaining INALI names at Nahuan languages#List of Nahuatl dialects recognized by the Mexican government and Wikipedia:WikiProject Languages/INALI names for Mexican languages, that would be wonderful. — kwami (talk) 00:39, 28 February 2015 (UTC)
Yeah, unfortunately Glottolog is weirdly listing Tabasco as part of Isthmus, but really it is not. It does not in fact share any of the main innovations characteristic of the Isthmus dialects; it is closer to Pipil. I will try to make an article on Chiapas Nahuatl as well. User:Maunus ·ʍaunus·snunɐw· 01:06, 28 February 2015 (UTC)

I've been recruited to this job, and I will do some work on it. I'd like some input on something. I updated Abkhaz language; the number didn't change, but appears to have three significant figures rather than two. I might be making a mistake (I'm not exactly a math major), so could one of you glance at it and let me know? — Eru·tuon 03:52, 4 March 2015 (UTC)

Yes, the 101,000 + 4,000 + whatever makes up the remaining 7,740 would be to the nearest 1000 and so 3 figs. But consider that the 4,000 is not from the citation date of 1993, but from 1980, and that we have no idea how old the data adding up to the other 7,740 is. Also, the published Turkish pop. might have been, maybe, a range of 3–5k, and Ethn. just reported the mid-point. (They do that a lot. Old editions of Ethn. are often more reliable in this regard than recent editions.) So, yes, just following the math, it would be 3 sig figs, but I seriously doubt that the data is really that reliable. I generally don't like to report anything greater than 2 sig figs, though other editors might disagree with me. Never more than 3 sigfigs, though: that would be greater than 1% accuracy, and population data is hardly ever going to be that accurate. — kwami (talk) 02:44, 5 March 2015 (UTC)
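For anyone who wants to check the rounding by hand, here is a minimal Python sketch of the arithmetic described above. The helper name round_sig is made up for illustration; it is not part of any infobox template or Ethnologue tool. The three component figures are the ones quoted in the comment above, and the total is simply their sum.

```python
def round_sig(n: int, sig: int) -> int:
    """Round a non-negative integer to `sig` significant figures (halves round up).

    Hypothetical helper for illustration only.
    """
    digits = len(str(n))          # number of digits in n
    exponent = digits - sig       # how many trailing digits to zero out
    if exponent <= 0:
        return n                  # already has `sig` or fewer digits
    factor = 10 ** exponent
    return (n + factor // 2) // factor * factor


# Component figures quoted in the discussion above
parts = [101_000, 4_000, 7_740]
total = sum(parts)

print(total)                # 112740
print(round_sig(total, 2))  # 110000
print(round_sig(total, 3))  # 113000
```

Reporting 110,000 rather than 113,000 reflects the doubt expressed above about how old and how precise the component figures really are; the sketch only shows the rounding step, not a judgment about which figure is right.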
Ethnologue figures for speakers of native languages in the US are still woefully inflated and out of date. Are other published sources acceptable besides Ethnologue? --Vihelik (talk) 20:40, 4 March 2015 (UTC)
Lots of other sources are acceptable. Just follow WP:RS. Ethnologue isn't exactly a RS, really, but it's more complete than anything else, and heads off edit-wars by POV editors cherry-picking sources to inflate the population of their favorite language. But if you can find something that covers the whole US, that would prevent concerns about cherry-picking. You might want to ask a specialist like @Taivo: for the most up-to-date sources. — kwami (talk) 02:44, 5 March 2015 (UTC)

Thanks to @Abrahamic Faiths: and @Miniapolis: on helping with the drudge work.

See also Category:ISO language articles citing sources other than Ethnologue. Some of these could be updated to E18. For most of the top 100 languages of the world, we cite the Swedish national encyclopedia rather than Ethn. Also, a number of langs (esp. in Ethiopia and Canada) are cited directly to the census that Ethn. uses. We might as well leave those alone. But some others might be old, or cherry-picked to maximize the population estimate. — kwami (talk) 23:43, 6 March 2015 (UTC)

UPDATE: 1,400 articles have been updated, including all the ones with old population figures. Someone started updating all the Caucasian languages; that may be an approach for those of you interested in a particular family or region. — kwami (talk) 04:28, 11 March 2015 (UTC)

RfC: The MoS and the generic he

A conversation about the Wikipedia Manual of Style's stance on the generic he and gender-neutral language that started on this talk page has progressed to two RfCs at the village pump. Further opinions are welcome. Darkfrog24 (talk) 18:57, 5 March 2015 (UTC)

Hey, thanks for posting. I've copied your note on the WikiProject Linguistics talk page, since users there will also be interested. — Eru·tuon 00:38, 7 March 2015 (UTC)

Khowar language or Chitrali?

There seems to be a campaign by some editors, mostly IPs, who are trying to say that this language does not exist, or disputing its name, see here and here.

I'm not a student of language, so it's confusing for me (it appears to me that Chitrali is the language of the Khowar people? However, the page is called Khowar language but uses both names in the text and Chitrali in the lead?)

See this edit here (copy-pasting content) and later here at Khowar, a redirect that had the text from Khowar language pasted into it, for example. See also edits at Languages of Chitral and Chitrali language. More information/discussion at Talk:Khowar language#Vandalism. 220 of Borg 02:41, 7 March 2015 (UTC)

I've reverted the last year's edits to the lead. It's not just the name: the population was falsified, as at various times were the refs.
There are several languages in Chitral. Khowar is commonly called "Chitrali" because it is the most populous, but the term can be ambiguous. The ISO name is the more precise "Khowar".
Thanks for catching this. — kwami (talk) 04:06, 7 March 2015 (UTC)
Ah, very good. I came to the right noticeboard then! Just a little POV IP editing on this topic. :-/ - 220 of Borg 06:29, 7 March 2015 (UTC)

Extinction dates needed

For those of you interested in ancient or extinct languages, the lang box now generates two new tracking categories: Category:Language articles with unknown extinction date and Category:Language articles with unreferenced extinction date. Many articles in the latter are ref'd in the text, just not in the box, but many have no ref at all. The first isn't actually unknown (sorry for the poor choice of wording), but just where we haven't yet found a date. — kwami (talk) 23:32, 12 March 2015 (UTC)

Is the extinction date of a language defined as the year of the death of the last native speaker?
Wavelength (talk) 23:38, 12 March 2015 (UTC)
For most of them, presumably, it will be the date of last documentation. The fetishization of last native speakers is pretty much only a North American phenomenon.·maunus · snunɐɯ· 23:41, 12 March 2015 (UTC)
For ancient or historical languages, we have an "era" field that may be more informative than "extinct". And in many cases we can only say "mid-20th century", "some time before 1931", etc. If all we have to go on is a few documents, then we can use their dates. (Should that be under 'era' or 'extinct'?) But if we have the date the last native speaker died, that would be good to include. — kwami (talk) 00:12, 13 March 2015 (UTC)
  • A little weird to list Early Modern English etc as extinct languages.·maunus · snunɐɯ· 23:40, 12 March 2015 (UTC)
It doesn't have an "extinct" field, but an "era" field, and the dates in that field are unreferenced. I lumped in historical languages for two reasons: We already have plenty of tracking categories, and many older articles use the "extinct" field rather than the newer "era" field anyway. Feel free to change the names of the categories if you like. I didn't put much thought into them, since few readers are ever going to see them. I suppose we could create a separate cat for "unreferenced era", which might help us review where we should change the box from "extinct" to "era". — kwami (talk) 00:12, 13 March 2015 (UTC)

Some of these articles link to Linguist List for the ISO code description. If there's a date there, you can ref it by entering "linglist" in the ref field. (Can do s.t. similar with AIATSIS for Australian langs.) Also, the 'unknown date' cat is only populated if there is no ref. If the ref is set to e17 or e18 (in some cases where Ethn. does not give a date), then we won't see it. Should those articles be included? Maybe as a subcat? — kwami (talk) 00:24, 13 March 2015 (UTC)

There are Wikipedia wikis in Old English (ang) and Latin (la). DMOZ has links to web pages in Latin. In a sense, those two languages have current documentation. See also "Revival of the Hebrew language". How are extinction date criteria applied to those three languages?
Wavelength (talk) 02:56, 13 March 2015 (UTC)
Liturgical languages are going to have additional dates of L2 use, but that should be kept distinct from L1 use, as we do for living languages. With Hebrew, you have two periods of L1 use, if you accept that they're the same language. Old English is more straightforward. AFAIK, there's no significant modern usage. — kwami (talk) 03:52, 13 March 2015 (UTC)

Deleted Puntland Arabic

This article may have been incompetent rather than a hoax, in case anyone wants to rescue it. The author seems to be invested, but the info is either fake or unref'd. I turned it into a redirect. — kwami (talk) 20:53, 17 March 2015 (UTC)

"Revival" field in language infobox

If you enter a value for "revived" in the info box, it will now produce a "revival" field. It can be used in conjunction with "speakers" for revitalization efforts of endangered or moribund languages that still have L1 speakers, and with "extinct" for reconstruction or revival of extinct languages. I'm hoping this will encourage greater description of these efforts, as well as take some of the sting out of reporting a language is extinct when the community is trying to maintain it. — kwami (talk) 02:07, 18 March 2015 (UTC)

I think this is a great idea, thanks for implementing it.·maunus · snunɐɯ· 22:35, 20 March 2015 (UTC)

Splits of IPA help pages

Several splits of IPA help pages are being discussed or are in progress.

Also, a question is unresolved: whether Ecclesiastical Latin has four mid vowels, as in Italian, or only two. I think it must only have two, since the difference is not marked in spelling. To comment on this, head over to Wikipedia talk:WikiProject Latin § Pronunciation of Ecclesiastical Latin.

It's helpful to have classical and modern Latin and Greek side by side, so that readers can compare them. — kwami (talk) 17:51, 20 March 2015 (UTC)

Sanskrit article

The illustration of Devanagari as used for writing Sanskrit has associated info/text that looks like this: <<

"My name is 'incomplete third word is the name'" (written) in Sanskrit

>>. I can't figure out what is intended or how to fix it. (Posted at talk:Sanskrit and talk:WikiProject Language). -- Jo3sampl (talk) 19:29, 21 March 2015 (UTC)