Talk:IETF language tag

From Wikipedia, the free encyclopedia

Mistakes

As it stands today, the article contains many mistakes. Here is a summary, written by Doug Ewell:

A few examples:
  • "It is written that subtags are separated from each other by a hyphen. This is not true for the given examples in cases where the subtags are empty. The example is en not en-----."
    This statement presupposes that tags can contain "empty subtags," which do not in fact exist. There is only one subtag in the tag "en".
  • "The IETF only derives their subtags from ISO standards, they are therefore not ISO conform."
    The RFC 4646 system does not claim to be "ISO-conformant," and explicitly seeks to mitigate some of the instability of the ISO standards, so this statement is a red herring. Country-code TLDs are also not ISO-conformant since they use ".uk" instead of ".gb", and use other extensions to ISO 3166 such as ".ac".
  • "It also reserves some tag parts that currently do not exist."
    Section 2.2.1, point 4 reads:
    "4. All four-character language subtags are reserved for possible future standardization.
    "At the same time ISO 15924 is a four-character subtag, all ready."
    This is a non sequitur. There is no syntactical or namespace collision between language subtags and script subtags, just as there is none between 2-letter language subtags and region subtags.
  • "Since the ISO 3166-1 alpha-2 can change from time to time there is ambiguity in the use. E.g. CS could refer to Serbia and Montenegro or to Czechoslovakia. Section 2.2.4. point 3 C&D solve this. For ambiguous ISO 3166-1 codes the UN M.49 code shall be used."
    This is oversimplified to the point of inaccuracy. While ISO code elements may be ambiguous for the reason given, RFC 4646 does not assign a UN M.49-based subtag in place of *each* of the "ambiguous codes," only the more recently assigned. Going forward, if ISO 3166/MA reassigned "CS" to yet another country, that country would get a UN-based subtag but the existing "CS" subtag would not be changed.
    I'll be making the necessary factual corrections to the article, but others are of course free to jump in and do it first. I thought it would be good to let the list know that these misconceptions exist and may be widespread, because of the wide use of Wikipedia.

--The above unsigned comment was added by 63.252.121.133 at 15:40 UTC on 19 November 2006.
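
To illustrate the first and third points above, here is a minimal Python sketch, not the full RFC 4646 grammar. It assumes a tag of the simple shape language[-extlang][-script][-region] and ignores variants, extensions, and private-use subtags. The function name classify_subtags and the helper _letters are hypothetical. It shows that "en" consists of exactly one subtag, and that a four-letter script subtag cannot be confused with a language subtag, because the language subtag is always the first one.

```python
def _letters(s, lo, hi):
    """True if s consists of between lo and hi ASCII letters."""
    return s.isascii() and s.isalpha() and lo <= len(s) <= hi


def classify_subtags(tag):
    """Label the subtags of a simple language[-extlang][-script][-region] tag.

    Simplified sketch: variants, extensions, and private use are not handled.
    """
    subtags = tag.split("-")
    if any(s == "" for s in subtags):
        # "en-----" is not a tag; there is no such thing as an empty subtag.
        raise ValueError("empty subtag in %r" % tag)
    if not _letters(subtags[0], 2, 8):
        raise ValueError("bad primary language subtag in %r" % tag)

    result = [("language", subtags[0])]
    rest = subtags[1:]
    i = 0
    extlang_count = 0
    # Up to three extended language subtags of three letters each.
    while i < len(rest) and extlang_count < 3 and _letters(rest[i], 3, 3):
        result.append(("extlang", rest[i]))
        i += 1
        extlang_count += 1
    # A four-letter subtag in this position is a script, never a language.
    if i < len(rest) and _letters(rest[i], 4, 4):
        result.append(("script", rest[i]))
        i += 1
    # A region is two letters (ISO 3166-1) or three digits (UN M.49).
    if i < len(rest):
        s = rest[i]
        if _letters(s, 2, 2) or (s.isascii() and s.isdigit() and len(s) == 3):
            result.append(("region", s))
            i += 1
    if i != len(rest):
        raise ValueError("subtag not covered by this sketch in %r" % tag)
    return result
```

For example, classify_subtags("sr-Latn-RS") labels the three subtags as language, script, and region, while classify_subtags("en-----") raises ValueError.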

I've rewritten most of the article. I think my rewrite addresses all your points, but if you see any further problems, just go ahead and edit the article. --Zundark 17:48, 11 March 2007 (UTC)
I've just reformatted this to use minimal MediaWiki syntax and to follow talk-page conventions (avoiding preformatted paragraphs and broken paragraphs) for easier attribution. I also added a title to this discussion.
Note that Doug Ewell is one of the authors of the revision of RFC 4646, published as RFC 5646 in September 2009. He is also an active member of the Unicode working groups.
I was not an author of either RFC 4646 or RFC 5646. I was the WG editor for RFC 4645 and RFC 5645, which defined the initial Registry contents as specified by 4646 and 5646 respectively. My activity on the Unicode mailing list (which has since ended) is not relevant. There is no conflict of interest in my making corrections to this page, and I plan to make more. --Doug Ewell 19:07, 2 January 2014 (UTC)
When posting discussions, please use minimal formatting and add a section title, so that the discussion does not take up two full screens. And don't forget to sign your message, even if you don't have a Wiki account here. verdy_p (talk) 13:11, 1 November 2009 (UTC)

Updating examples

This article uses two-letter ISO 639-1 tags as examples throughout, though current standards favor three-letter ISO 639-3 tags. Most examples should be changed to show current best practice. Bcharles (talk) 03:02, 8 March 2011 (UTC)

This article is about BCP 47, which doesn't offer any such choice - you have to use the tag specified in the registry. --Zundark (talk) 08:30, 8 March 2011 (UTC)

Missing Content

I'm curious whether anyone knows of a reasonably comprehensive list of commonly used IETF language tags. My sense is that because there is a theoretically endless number of possible combinations, no one wants to publish an exact list, and that's why usually only examples are provided. But I do think there is a set of most commonly used tags that would be practical to have a list of for those doing everyday i18n work, and it would be a useful addition to this article. Does anyone have a good reference for a list, or also think this would be useful? (Note this is obviously an unofficial edit, just convenient to stay logged into my staff acct!) Sbouterse (WMF) (talk) 22:43, 23 September 2011 (UTC)

CLDR has a lot of this sort of thing. Try http://www.unicode.org/cldr/charts/latest/supplemental/index.html . Doug Ewell (talk) 02:46, 15 January 2014 (UTC)

Corrections

I've made some factual corrections to this article, specifically with regard to misuse of well-defined BCP 47 terms like "redundant" and "deprecated" and "standard track." More such changes will be forthcoming. Doug Ewell (talk) 02:48, 15 January 2014 (UTC)

"* up to three optional extended language subtags composed of three letters each, separated by hyphens; (There is currently no extended language subtag registered in the Language Subtag Registry without an equivalent and preferred primary language subtag. This component of language tags is preserved for backwards compatibility and to allow for future parts of ISO 639.)" Are you sure? Cantonese Chinese requires an extlang subtag, represented by "zh-yue"[1] (language-extlang). There's no Cantonese distinction in ISO 639. --Ebarcell (talk) 10:50, 2 September 2014 (UTC)

ISO 639-3 does distinguish between Chinese dialects, including Mandarin, Cantonese, Wu, and many others. Cantonese can be represented by either "yue" or "zh-yue" in BCP 47. Doug Ewell (talk) 13:08, 2 September 2014 (UTC)
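
The equivalence of "zh-yue" and "yue" can be sketched in Python as follows. This is only an illustration under stated assumptions: the table EXTLANG_SAMPLE is a hypothetical two-entry excerpt written by hand (a real implementation would read the prefix and preferred value from the Language Subtag Registry), and the function name replace_extlang is made up for this example.

```python
# Hypothetical hand-written excerpt, not the actual registry:
# extlang subtag -> (required prefix, preferred primary language subtag).
EXTLANG_SAMPLE = {
    "yue": ("zh", "yue"),  # Cantonese
    "cmn": ("zh", "cmn"),  # Mandarin
}


def replace_extlang(tag):
    """Rewrite a language-extlang tag to its preferred primary-language form.

    Tags whose second subtag is not in the sample table, or whose prefix
    does not match, are returned unchanged.
    """
    subtags = tag.split("-")
    if len(subtags) >= 2:
        entry = EXTLANG_SAMPLE.get(subtags[1].lower())
        if entry is not None and subtags[0].lower() == entry[0]:
            prefix, preferred = entry
            # Drop the prefix and extlang, keep any remaining subtags.
            return "-".join([preferred] + subtags[2:])
    return tag
```

With this table, replace_extlang("zh-yue") gives "yue", and a tag that is already in the shorter form, such as "yue", is left as it is.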