Talk:ISO 639-3

Collective languages

The ISO 639-2 code "art" for artificial languages is also a collective code, isn't it? HTH --surueña 10:40, 23 May 2006 (UTC)

Yes, it is. ISO 639-2 actually has far more than eleven collective codes. For example, roa refers to Romance language (other). — Gareth Hughes 12:59, 23 May 2006 (UTC)

I noticed that Gary Simons gave a presentation in Berlin on the ISO 639-3.

In that presentation he makes the claim that TC37 management implies/suggests that the macro languages feature of the ISO 639-3 should be used / only used for compatibility with other code standards in the ISO 639-3 group. Bringing this to the wiki page might add some clarity for those who are wondering why this feature is not more exploited in the standard. Hugh Paterson III (talk) 07:04, 15 January 2015 (UTC)


would like to write about ISO 639 three letter adoption.

  • uses ces->cze, hun->ung, nld->net ; while deu eng spa fra ita pol por are ISO conform

Tobias Conradi (Talk) 11:25, 1 August 2006 (UTC)

Lots of people use nonconforming codes. Even when they try to adopt ISO codes, they make up codes like pob[1] for Brazilian Portuguese, which is just a dialect. (Of course, so are the variants of Serbocroatian that get their own language codes for purely political reasons.) Many, many people don't even know the difference between a language and a country and use country codes. The horror. And don't even try to evangelize them, it's useless.-- 06:58, 3 April 2007 (UTC)

Mapping ISO 639-1 to ISO 639-3

This Linux one-liner generates the mapping:

(echo "iso_639_map_1_3 = {"; wget -O - | cut -f 1,4 | sed -e "1,1d" | grep "......" | awk "{print \"    '\" \$2 \"' : '\" \$1 \"',\"}" | sort; echo "}") > 
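For anyone who would rather not depend on the shell pipeline, here is a rough Python sketch of the same idea. It assumes, as the `cut -f 1,4` step above implies, a tab-separated table with a header row, the ISO 639-3 identifier in column 1 and the ISO 639-1 code in column 4. The sample rows and the function name `build_iso_639_map_1_3` are illustrative only and are not taken from any official file.

```python
# Illustrative sample only: a few tab-separated rows laid out the way the
# one-liner assumes (column 1 = ISO 639-3 id, column 4 = ISO 639-1 code).
SAMPLE_TABLE = (
    "Id\tPart2B\tPart2T\tPart1\tScope\tLanguage_Type\tRef_Name\tComment\n"
    "deu\tger\tdeu\tde\tI\tL\tGerman\t\n"
    "eng\teng\teng\ten\tI\tL\tEnglish\t\n"
    "nan\t\t\t\tI\tL\tMin Nan Chinese\t\n"
)


def build_iso_639_map_1_3(table_text):
    """Return a dict mapping ISO 639-1 codes to ISO 639-3 identifiers."""
    lines = table_text.splitlines()
    mapping = {}
    for line in lines[1:]:  # skip the header row, like `sed -e "1,1d"`
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # malformed row: not enough columns
        code3, code1 = fields[0], fields[3]
        if code1:  # keep only languages that have a two-letter code
            mapping[code1] = code3
    return mapping


iso_639_map_1_3 = build_iso_639_map_1_3(SAMPLE_TABLE)
```

Rows without a two-letter code (such as `nan` in the sample) are skipped, which is what the `grep "......"` filter does in the one-liner.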

Procedure for making changes to the ISO 639-3 standard

It would be helpful to have a paragraph discussing the fact that the standard is not static, but dynamic, and how one might go about proposing changes to it (through the ISO 639-3 registration authority).

- Albert Bickford (talk) 23:24, 20 November 2007 (UTC)

Agreed —Preceding unsigned comment added by (talk) 07:23, 15 March 2009 (UTC)


Since we're talking standards, what the heck is "2007-02-05" supposed to mean? I get the 2007 bit. But the rest could be 2 May or 5 February. Totally ambiguous and therefore totally unacceptable for an encyclopedia. -- Jack of Oz ... speak! ... 20:27, 19 December 2009 (UTC)

When the year is given first in a date, with two-digit month and day, as in "2014-05-18", it is generally done for the purpose of sorting dates with simple algorithms. That purpose demands that the month be given first, then the day. I've personally never seen dates in the form yyyy-dd-mm, and would be very surprised to see them. Therefore, one can safely assume that 2007-02-05 is 5 February, not 2 May. I'd also note that this format is accepted in the Wikipedia manual of style [[2]]. AlbertBickford (talk) 01:23, 19 May 2014 (UTC)
Speaking of standards, the YYYY-MM-DD format conforms to ISO 8601. So, not completely ambiguous. Pconstable (talk) 20:05, 8 June 2014 (UTC)
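
The sorting point can be checked with a small sketch. The three date strings are only examples (two of them are the readings discussed above); the check is that sorting YYYY-MM-DD strings as plain text gives the same order as sorting them as real dates.

```python
from datetime import date

# Example strings in YYYY-MM-DD form; the values are illustrative.
dates = ["2014-05-18", "2007-02-05", "2007-05-02"]

# Plain text sort: no date parsing involved.
sorted_as_text = sorted(dates)

# Chronological sort: parse each string as an ISO 8601 calendar date first.
sorted_as_dates = sorted(dates, key=date.fromisoformat)

# Under ISO 8601, "2007-02-05" parses with month 2 and day 5.
parsed = date.fromisoformat("2007-02-05")
```

This only works because the fields run from most to least significant and are zero-padded; a yyyy-dd-mm layout would not sort chronologically as text.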

A superset of ISO 639-1

The article says: "It is a superset of ISO 639-1".

Since all the actual codes are different, it seems more correct to me to say: "The list of languages covered by ISO 639-3 is a superset of the list of languages covered by ISO 639-1".

Or is it too anal? --Amir E. Aharoni (talk) 08:48, 3 February 2011 (UTC)

I agree that the wording should be more precise, so reworded it. AlbertBickford (talk) 01:27, 19 May 2014 (UTC)

Maintenance processes

In the list of permitted changes, I changed "removed" to "deprecated": for purposes of stability, entries in 639-3 are never removed. The registration authority talks about "retirement" (unfortunately, since that is somewhat ambiguous), but the actual text of the standard does not use that term: it uses "deprecation". Pconstable (talk) 16:48, 21 May 2014 (UTC)

Criticisms / Evaluation of criticisms

The various responses to Morey, Post and Friedman introduced by "Evaluation of this criticism" seem to be synthesis. They also read as arguing the SIL case, which is not what WP is for. Kanguole 16:59, 19 May 2014 (UTC)

What would you suggest be done instead? Is there some way to scale back or adjust the evaluative comments so that they represent what you would consider NPOV? Prior to the evaluations being added, the section seemed to reflect a bias in the other direction, as the descriptions of Morey, Post and Friedman's objections sometimes read as if they were facts, when in at least some cases they were their opinions. I'm all for including a discussion of the controversy, but let's be clear in WP when we're reporting one position and when we're reporting the consensus of the scholarly community. (Oh, and BTW, most of the evaluations were inserted by someone else; the only one I inserted was about the confusion of codes with abbreviated names.) AlbertBickford (talk) 18:50, 19 May 2014 (UTC)
I think the way to discuss criticisms is to attribute what is said, and only say what can be attributed. So far, our only sources are Morey/Post/Friedman and Haspelmath. Another could be Golla, Victor, ed. (2006). "SSILA statement on ISO 639-3 language codes" (PDF). SSILA Bulletin 249. We should be reporting others' evaluations, not doing our own. Kanguole 20:36, 19 May 2014 (UTC)
Yes, that's a nice source, although it is inconclusive, because it describes a motion without g. Although I'm a member of SSILA, I don't recall whether it passed (I wasn't there that year), and I'm fairly sure that little if any action has actually come from it. (Unfortunate, because I personally feel it would have greatly strengthened the quality in the standard in its coverage of the Americas. It could be, though, that people may have just gone straight to the registration authority with the things that bothered them the most. There was a flurry of such change requests submitted back then, and the great majority were accepted. And, once people got their biggest concerns taken care of, the energy for setting up such a Board may have evaporated. Anyway, maybe somebody reading this knows what happened.)
OK, then, what's the way forward? We shouldn't just revert to a couple days ago, because I've made other changes to the page in the meantime that I'd rather not lose, and it could be that there is some of the stuff in the recently-added evaluative comments that can be kept. I'm afraid I don't have much more time to devote to this issue at the moment, and not for about a month. Do you? AlbertBickford (talk) 21:00, 19 May 2014 (UTC)
Maybe in a few days. My feeling is that most of the evaluation should go, or be moved to other sections and recast as a description of the process. And of course the criticisms need to be attributed. (There's more of the discussion in Bulletin numbers 242, 244 and 250.) Kanguole 21:19, 19 May 2014 (UTC)
According to Bulletin number 252, the motion was carried unanimously in January 2007. Kanguole 11:42, 20 May 2014 (UTC)
I introduced the "evaluation" content. I don't mind if we describe that in some way other than "evaluation", but I think my additions are necessary to provide balance.
I was the project editor in the creation of ISO 639-3, I'm an ISO TC37-appointed member of the body that maintains the ISO 639 standards, and also a liaison representative from the Unicode Consortium to ISO TC37 representing industry interests in ISO 639. Hence, I have a deep stake in seeing that the standard is well understood and able to succeed in meeting the needs of many different user segments. I'm deeply concerned with the "criticism" section as it was created for a few reasons.
First, I found it to be unbalanced in representing only opinions of linguists, which is only one of several user segments for the standard, but it didn't reflect in a way that would be obvious to all users that it is a perspective from only one user segment. Someone coming from a different user segment and learning about the topic for the first time could be deeply misled about the impact for them. Related to that is that the criticism section added comprised about 20% of the entire topic, resulting in a significant skew by the volume alone. There are major user segments that have been well-served by the standard; having 20% focus on criticisms from one segment without any critical assessment is misleading.
Secondly, it made statements as though they were representative of the majority or all linguists. Given the number of change requests that the registration authority has been processing each year, generally coming from linguists and most of the time adopted, it is difficult for me to believe that all of the criticisms are representative of the greater majority of even that user segment.
Thirdly, statements are restated from Haspelmath with little if any critical assessment as to their validity. Some are subjective opinions that cannot be objectively refuted -- if someone is unhappy with something, I can't refute their unhappiness. Some may be reasonable concerns -- and I didn't comment on everything. But other statements can be assessed in relation to factual data, and that is what I attempted to add. For instance, in relation to the first criticism, it is a statement of fact that can be verified by the references I provided that the registration authority is publicly documenting every change request and every decision taken in response to those requests, including a rationale for the decision. To state that there is "inadequate transparency and accountability" without also stating all that is made public in the RA's process is to present a very strong bias. I consider that unacceptable.
My intent was not to come across as trying to create a bias the other way; my intent was to provide balance. I'm happy to work on refining so as to best reflect a balance. Note that I did not add any comments that aren't backed with some verifiable facts. For instance, I did not add anything in relation to the statement about ISO 639-3 potentially being misused since I can't provide any factual data calling that into question; and I also didn't add any comments on the issue of identifiers being based on pejorative names. Comments I added were about RA process, the intent of the standard regarding the types of distinctions made, and appropriateness of an industry organization creating a language coding standard. My comments were fact-based, and references were provided in each case.
Btw, I made those changes anonymously since I had lost track of my Wikipedia credentials (which I've since found). My intent wasn't to conceal the source of changes.
Pconstable (talk) 05:35, 20 May 2014 (UTC)
One of the problems is that it reads like an attempt to sweep all problems under the rug. The standard dominates the field; it doesn't need protecting. — kwami (talk) 06:15, 20 May 2014 (UTC)
Much of this can be addressed by attributing the various criticisms, and by building up the rest of the article, but it's not the function of a Wikipedia article to provide a right of reply. Also this section should be retargeted a bit, as it's not about criticism in general, but the reaction from linguists. In addition to criticism, there's been some introspection regarding the fact that, as Dobrin and Good put it, "academic linguistics has virtually nothing to say about an aspect of its object of study that is of intense and legitimate interest outside the discipline: when asked what the languages of the world are, it is only SIL that is ready to answer." Kanguole 11:19, 20 May 2014 (UTC)
@Kwamikagami:: When you reverted, you took out en masse several other changes that I had introduced after PConstable's. Was that your intention, or was that just a side-effect? That is, were you objecting to the other changes also? For example, it seems that the comment about how the standard is used to provide precise reference when names are ambiguous belongs better in the head section, rather than being buried in the section on criticisms. So, would you object if I re-introduced some of those other changes that were independent of the question of how to bring better balance to the article? AlbertBickford (talk) 12:42, 20 May 2014 (UTC)
Kwami restored some of your changes (including this one) in his next edit. Kanguole 13:10, 20 May 2014 (UTC)
Oh, I hadn't seen that yet. Good. I'm going to look back at the others to see if there are any others I care enough about to re-insert--although I'll leave the criticism section alone until we come to a consensus on how to adjust it. AlbertBickford (talk) 14:32, 20 May 2014 (UTC)
Kwami got them all. Thanks, Kwami. AlbertBickford (talk) 14:35, 20 May 2014 (UTC)
I don't see how my additions were sweeping things under the rug since the statements of criticism were left in place. By reverting, the factual statements I provided are disregarded. I'll take another attempt at introducing factual statements in a balanced way. Pconstable (talk) 14:01, 20 May 2014 (UTC)
We need to attribute the criticisms and add more material to other parts of the article, but I don't believe that writing replies to the criticisms would be appropriate. Kanguole 14:18, 20 May 2014 (UTC)
As I've thought about this more, in light of this discussion above, I'm inclined to agree with Kanguole. We can let the criticism section stand more-or-less as it is now, as a summary of the claims of these four scholars (and any others who have published on the matter). Similarly, if people have published replies to these criticisms, we can cite them here. Otherwise, facts that run counter to the criticisms would better be cited in an appropriate other part of the article; this will have the side benefit of expanding those other sections that need expanding.
At the same time, I do still think the criticism section needs some tweaking in the wording. So, for example, the lead sentence of the section now says criticism has been leveled "by linguists". Though factually true--it is linguists who made the criticisms--it seems to imply that the criticisms represent a consensus opinion among linguists, which is probably not true. (As evidence, consider the more measured tone about the standard in the SSILA discussions--though not totally supportive of the fact that SIL was the registration authority, they seem to be supportive of the standard itself. Also the fact, as PConstable points out, that many linguists have submitted change requests to the standard in the intervening years. And, it is common practice in linguistic articles, when discussing a language, to give its 639-3 code; in other words, though there are certainly limitations with the standard, many linguists find it useful.) Therefore, PConstable had inserted "some"--i.e. "by some linguists"--which is also factually true, but gives a more neutral characterization of how widespread the criticism is. I request that this word be re-inserted. AlbertBickford (talk) 14:32, 20 May 2014 (UTC), slightly revised AlbertBickford (talk) 14:43, 20 May 2014 (UTC)
One other aspect of this section that we should look at: What do linguists think now, eight years later? It seems like a lot of the controversy has died down. Should that be reported? AlbertBickford (talk) 14:43, 20 May 2014 (UTC)
Morey, Post and Friedman's presentation was last year. Morey and Post are working on an expanded version. Kanguole 15:32, 20 May 2014 (UTC)
Albert, "various ... by linguists" does not suggest that linguists are in agreement, so "some" is just awkward and sounds weaselly. Also, it was preceded by a statement that linguists do commonly use the ISO code for identification. Haspelmath seems to think that objecting to an ID code for being an ID code is silly; generally the criticisms are not about having a code, but in the details and application.
As for linguists applying for fixes, that doesn't mean they like the system. I once spoke at a conference where I mentioned fixing ISO codes and Ethnologue data, and the unified response was that they had tried and failed, and so had given up on it. There was not one positive experience out of dozens of field linguists in the audience. Also, the quality of the change reports is not very high; many appear to be amateurs or students who come across contradictory data in the lit, and some obviously do not know what they're talking about. I've yet to see an expert on a family submit the corrections they feel are necessary for the family (indeed, the change forms force you to address it in bits and pieces), and ISO/Ethnologue do not appear to seek out such advice. Outside linguistic input is still meagre. — kwami (talk) 15:59, 20 May 2014 (UTC)
Kanguole's change to the section resolves my concerns about "some linguists", in a better way than I had suggested, by citing the specific people who made the criticism. AlbertBickford (talk) 16:28, 20 May 2014 (UTC)
@Kwamikagami: You must not have seen Nora England's proposals in 2008 (such as this one: which argued for merging several varieties of Mayan languages that Ethnologue had treated as separate languages. She's an expert on the family, and most if not all of her proposals were accepted (in fact, over the objections of some people within SIL). AlbertBickford (talk) 16:28, 20 May 2014 (UTC)
With regard to SIL seeking out advice for improving Ethnologue, I have on several occasions posted requests on discussion lists requesting input from sign language linguists for improving Ethnologue. Some have replied, and we've worked the input into Ethnologue, although I acknowledge that, as you note, many seem to have given up on trying. Knowing the people involved, I don't think they were resistant to input, but were overwhelmed by the immensity of the task and plagued by many people who asserted the need for changes without providing adequate documentation. The problem was exacerbated by the fact that for many years Ethnologue, although on the internet, was updated only once every four years at the same time as the print edition--an interminably long time for the internet. People submitted suggestions, and then if they didn't see results in six months or a year, they got discouraged. SIL is working to correct those problems. The publication schedule is now once a year, and further, the most recent version incorporates a mechanism for soliciting comments and corrections. So, SIL is actively working to seek input on Ethnologue now, and further suggestions about how it could do so would, I'm sure, be welcomed. AlbertBickford (talk) 16:28, 20 May 2014 (UTC)
I saw the results of England's intervention, but that was before I started tracking SIL/ISO changes. My impressions are in line with everything you said about people being overwhelmed and recent improvements at SIL. It just hasn't filtered into the wider community AFAICT. One way possibly of improving it would be to outsource the hard work by submitting a proposal to a respected Chadic/Dravidian/Algonquian/whatever conference, requesting that they evaluate ISO/SIL with a workshop, committee, or whatever, and get back with a consensus on what needs to be fixed: spurious distinctions, missing languages, inappropriate isoglosses, bad demographics, inaccurate maps, and a consensus classification (favoring a comb structure where the nodes of the tree haven't been worked out, and explicitly labeling unclassified nodes). That is, do it a chunk at a time, so it's manageable. Or more than one conference, or send out for independent commentary on the response, if they're worried about bias. They could cite the conference/workshop in the new tree. If a conference doesn't respond, then they have only themselves to blame for inaccurate coverage. Giving them credit in the tree will hopefully let people who have given up hope know that SIL/ISO is now receptive to external input.
I've made some fixes to ISO, but the response has been so unprofessional that I've basically given up myself. For example, they get bad info off Multitree, and when the error is pointed out to them, justify it by citing Multitree, so you have to get Multitree corrected before you can do anything about ISO. And with 7,000 ISO codes, doing them one at a time like that will soon eat up the available time and patience of anyone who wants to help. That is not a viable approach. I've also spoken to people about problems with their languages, and been told not to worry, it's been fixed in the upcoming edition, only to have to tell them that no, the correction has been rejected. At which point they throw up their hands and say screw it. IMO ISO/SIL needs to be proactive in getting the most informed input, rather than having spotty improvements only where someone cares enough, and has the time and inclination, to both submit the paperwork and follow through to make sure it gets fixed. — kwami (talk) 16:57, 20 May 2014 (UTC)
@Kanguole: Moving that info into other sections is reasonable. I see that one of the paragraphs I had added regarding change requests was moved into the Language Codes section. I think it makes better sense to have a section documenting the maintenance processes in general, and including the info there. I've removed the paragraph from the Language Codes section and incorporated it into a new section. (Note: I think this paragraph was added by PConstable and that he forgot to sign; at any rate, it wasn't me. AlbertBickford (talk) 15:34, 21 May 2014 (UTC))

One other idea: Since SSILA had extensive discussion of the standard in 2006 and 2007, could we add a summary of that discussion and SSILA's action to the section on criticisms? If so, the section maybe could be retitled as "Reception by linguists", and could then express a range of viewpoints, thus addressing some of PConstable's concerns that the article was lopsided. Anyway, we're making progress, and I'm pleased with how things are going. AlbertBickford (talk) 16:28, 20 May 2014 (UTC)

PConstable had inserted the following sentence in response to Haspelmath's criticism about economic impact: "By aiming to include all natural languages without judgment of economic impact, ISO 639-3 gives equal opportunity for all languages to be used in information technologies, providing an important technology component addressing the Digital divide problem." This seems to be a point that is worthy to be included in the article somewhere. We're staying away from having the criticism section turn into a back-and-forth discussion, but where else might this be placed? Would someone else like to do this, or should I? AlbertBickford (talk) 15:40, 21 May 2014 (UTC)

Thanks for bringing that up; I had forgotten about it. As project editor for 639-3, I can say (though can't provide published references) that the standard was supported by the software industry because it was already encountering limits of 639-2 and there was a desire for something comprehensive that would leave implementers unblocked from their future business needs as well as helping them to get out from being bottlenecks to smaller language communities -- i.e., removing factors that create the digital divide. This is a key reason why Microsoft supported my work in development of the standard. (I presented papers at a couple of Unicode conferences back around 2002, and this industry support was reflected in the discussion.) I think it would be good to include this point, though I'm not sure how to provide citations other than the "Over 7,000 languages" blog on Microsoft's site. Pconstable (talk) 16:40, 21 May 2014 (UTC)
Yes, a good point to include. If we can't find direct support in a ref, can you cite industry running up against the limits of 639-2? — kwami (talk) 18:05, 21 May 2014 (UTC)
Well, anecdotally, yes: I've given you that above. The Microsoft blog post I mentioned is further evidence of that: Windows 8 has support that clearly requires 639-3 and much more than 639-2. Beyond that, a possibility that comes to mind would be to look for cases of specific support in some commercial software or in something like the CLDR project that involves some language not covered in 639-2. Pconstable (talk) 21:44, 21 May 2014 (UTC)
I added the digital divide mention to the intro, and added some details on usage in CLDR and Windows 8 clarifying use of more than ISO 639-2; those items have links / citations, so that is covered. Pconstable (talk) 20:14, 8 June 2014 (UTC)

I'm reviewing what is said about Martin Haspelmath's blog remarks, and it currently doesn't reflect quite accurately what he said. E.g., he doesn't say, "business has no significant interest in the many small, unwritten and often endangered languages with no measurable economic impact," nor does he suggest that all coding be done in terms of languoids. So, I'm going to make some changes; others may want to review and adjust. Pconstable (talk) 18:28, 9 June 2014 (UTC)


Minnan is a Chinese language, a medieval Chinese one, that is one of the ancestors of today's Mandarin or Cantonese, so 639-1 should be zh. (talk) 09:36, 12 September 2014 (UTC)

Minnan is a group of modern Chinese varieties, not an ancestor of other modern varieties. In any case, this article should report what ISO 639-3 says, not try to correct it. Kanguole 09:43, 12 September 2014 (UTC)
That's already been corrected to follow what ISO 639-3 says; there was nothing in 639/639-2 that was a false statement. That's a modern Chinese variety, but generally classed as medieval Chinese. (talk) 08:22, 13 September 2014 (UTC)
This is off-topic for an article about ISO 639-3, but a claim that a modern variety is classified as medieval Chinese makes no sense. Kanguole 10:24, 13 September 2014 (UTC)

SIL referred to this as zh-min and/or zh-minnan for more precision (there are at least Mindong and Minbei), see : — Preceding unsigned comment added by (talk) 09:55, 12 September 2014 (UTC)

I don't understand

I don't understand how a select group of people can decide whether any random language is a language or not. How can they for example decide that those speaking Montenegrin aren't actually speaking a language while their own government calls it one! How can the West accept such an undemocratic organization...?--Michael (talk) 20:48, 14 October 2014 (UTC)

You are quite right, you don't understand. The ISO 639-3 is not a dictatorship, but a consensus opinion among interested scholars. Its use is primarily for library and internet referencing software to track bibliography, collections, etc. And its use is by no means mandatory. In the US, for example, NSF and NEH funding for language-related projects use it for identifying and tracking language research funding. You somehow think this is more than it actually is. Take your tin hat off and go out to look at the fall colors. There's nothing to see here. --Taivo (talk) 23:01, 14 October 2014 (UTC)
Not many autumn colors here yet, but I'll take my hat off anyway.--Michael (talk) 11:47, 16 October 2014 (UTC)

Change management in the ISO with regard to variation leveling

I was reading the talk section above about criticisms, in particular the discussion about Nora England's revisions. I am not an expert in Mayan languages so I am not critiquing her suggestions, but I am asking a process question. How does the ISO handle language mergers? Not just code mergers. Let's say that some daughter languages of historical ancestry are evaluated and they have a low intelligibility score (a classic litmus test for differentiating languages), therefore they are argued to be independent and each assigned an ISO 639-3 code. Then, something happens so that "Bi-dialecticalism" or "bi-lingualism" occurs - say a road is built so that population movement and contact increases. Then another researcher comes along and notices that the languages are no longer "independent" or that mutual intelligibility has risen or that "leveling" has occurred. Then the new researcher reports these findings, saying that the previous language assessment was in error. If the ISO committee accepts the proposal usually what happens is that the old codes are retired. However, why is it that those codes are retired, rather than pointing to a different era in the history of the language? There are codes in the code set which point to historical states of a language, e.g. Old High German, or Old English. If I am asking this question then perhaps the question and the answer should be part of the main page? Hugh Paterson III (talk) 06:55, 15 January 2015 (UTC)

Dead Links

I have noticed that all the links to PDFs in the main page and in the talk page are dead. It seems that some time ago the physical location of the linguistlist servers changed. When this happened, lots of the older features on the linguist list servers ceased to exist. I don't know if the content is in the internet archive or not.

The following items in the further reading section are no longer correctly linked. Perhaps the SSILA has another server now?

  • Aristar, Anthony (2006). "ISO standardized language codes and the Ethnologue". SSILA Bulletin 247.
  • Epps, Patience, et al (2006). "In opposition to adopting Ethnologue's language codes for ISO 639-3". SSILA Bulletin 246.
  • Golla, Victor, ed. (2006). "SSILA statement on ISO 639-3 language codes". SSILA Bulletin 249. — Preceding unsigned comment added by Hugh Paterson III (talkcontribs) 05:52, 8 December 2014 (UTC)