Talk:ISO 639

WikiProject Languages (Rated C-class)
This article is within the scope of WikiProject Languages, a collaborative effort to improve the coverage of standardized, informative and easy-to-use resources about languages on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as C-Class on the project's quality scale.
This article has not yet received a rating on the project's importance scale.
 

To look up codes and allow easy referencing from outside, some redirects have been implemented. The URLs take the form http://en.wikipedia.org/wiki/ISO_639:eng. For more, see Category:Redirects from ISO 639.

Dialects

What about dialects such as en-us? Are those part of this standard? -- AdamRaizen 14:15, 2003 Sep 8 (UTC)

No, I believe that's just in the Internet RFCs (a combination of the ISO 639 and ISO 3166 codes).
Yes, RFC 3066. But it's not only about country codes (ISO 3166). It can be anything that identifies a language/script variant (zh-HK-HanT = Chinese - Hong Kong - Traditional Han ideographs; en-scouse) --84.188.156.201 19:03, 11 August 2005 (UTC)
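
To make the structure of such tags concrete, here is a minimal sketch that splits a tag like zh-HK-HanT into its hyphen-separated subtags. The function name splitSubtags is made up for illustration, and the code only splits the string; it does not check the subtags against ISO 639, ISO 3166, or any registry.

```cpp
#include <string>
#include <vector>

// Split a language tag such as "zh-HK-HanT" on hyphens.
// Hypothetical helper: it only splits and performs no validation.
std::vector<std::string> splitSubtags(const std::string& tag) {
  std::vector<std::string> parts;
  std::string::size_type start = 0;
  while (true) {
    std::string::size_type dash = tag.find('-', start);
    if (dash == std::string::npos) {
      parts.push_back(tag.substr(start)); // Last (or only) subtag
      break;
    }
    parts.push_back(tag.substr(start, dash - start));
    start = dash + 1;
  }
  return parts;
}
```

For "en-us" this yields the two subtags "en" (the ISO 639 language code) and "us" (the ISO 3166 country code); deciding what each later subtag means is left to the caller.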

Scanian

Someone added "scy" as a code for Scanian; however, I wasn't able to find that code or language in [1] or [2]. The loc.gov site appears to me to be normative, so I'm removing it.

If you have newer information (e.g. a mailing list post from a standardisation authority), please provide a source for this new code. -- pne 10:49, 13 Jul 2004 (UTC)

ISO 639 sources

The Ethnologue, the ISO-recognized authority for standard 639-3, is my first go-to reference for language codes, and usually the last one I need. Plugging "scy" into the Ethnologue language-by-code URL format to get
http://www.ethnologue.com/show_language.asp?code=scy
tells us
Invalid language code
scy is not a language code used in the Ethnologue, 16th edition, nor is it a valid ISO 639-3 code.
And that page says under "Comments":
The language has had no recognition since Sweden obtained Scania from Denmark in 1658. It is called 'Southern Swedish' in Sweden, and 'Eastern Danish' in Denmark. Today it is heavily influenced by Swedish in Sweden.
ISO 639-6 assigns Scanian the 4-alpha code scyr, and under that "Scanian Spoken" scys. I've updated Scanian dialects#Status with the recent history (2009-2010) of that change. --Thnidu (talk) 17:12, 21 November 2012 (UTC)
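
The language-by-code lookup described above can be sketched as follows. This is only an illustration: the function names are hypothetical, the URL pattern is the one quoted in this thread and may no longer be served by the site, and the check covers only the shape of a code (three lowercase ASCII letters), not whether it is actually assigned.

```cpp
#include <string>

// True if `code` has the shape of an ISO 639-3 identifier:
// exactly three lowercase ASCII letters. Shape only, not assignment.
bool looksLikeAlpha3(const std::string& code) {
  if (code.size() != 3) return false;
  for (std::string::size_type i = 0; i < code.size(); ++i) {
    if (code[i] < 'a' || code[i] > 'z') return false;
  }
  return true;
}

// Build the lookup URL in the format quoted in this thread.
// Returns an empty string for malformed input (hypothetical convention).
std::string ethnologueUrl(const std::string& code) {
  if (!looksLikeAlpha3(code)) return "";
  return "http://www.ethnologue.com/show_language.asp?code=" + code;
}
```

A well-formed but unassigned code such as "scy" still produces a URL; only the site's answer tells you whether the code is valid.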

Eskimo languages

I see "esk" is listed as a code for "Eskimo languages" (a better term I guess would be Yupik languages), apparently ever since the page has existed. For the same reasons given above for Scanian, I am wondering if this is a legitimate ISO 639 code. Let me know if you have a source for this code. --Iceager 10:47, 18 Aug 2004 (UTC)

Ethnologue (the authority for ISO 639-3; see #ISO_639_sources) lists 10 Eskimo languages, including Northwest Alaska Inupiatun, which has the 639-3 code "esk" (probably based on the US Census listing it as “Eskimo”). The Eskimo group has two branches, Inuit and Yupik; Northwest Alaska Inupiatun is in the Inuit branch. --Thnidu (talk) 17:54, 21 November 2012 (UTC)

The same ISO 639-2 and ISO 639-3 code?

I think the Alpha-3 code space paragraph should mention that languages have the same ISO 639-2 and 639-3 codes (in the case of 639-2, at least the "form for TERMINOLOGICAL applications"). Am I assuming correctly? If not, could you please correct my assumptions? —Preceding unsigned comment added by 213.151.83.161 (talk) 23:06, 14 February 2008 (UTC)

Not universally correct. See #ISO_639_sources. --Thnidu (talk) 17:19, 21 November 2012 (UTC)

What is bibliographic? terminological?

This sentence won't be clear to the average reader: "In these cases, the first code is bibliographic (ISO 639-1/B), and the second code is for terminological use (ISO 639-2/T)." Bibliographic? For use in a book's bibliography if you cite books in another language, maybe? For use in a library? And terminological? What's that? For use in a dictionary, maybe? So if you give the history of how a word came into existence, you can use the code for Middle English? A clarification, please.

AFAIK these denominations, just as the whole mess with 3 different code sets, exist only for historical reasons. You're right, the sentence "For these languages, the first three-letter code is for bibliographic use (ISO 639-2/B), and the second three-letter code is for terminological use (ISO 639-2/T)" is quite obscure. "Bibliographic" codes are those traditionally used by US-American libraries, based on Library of Congress's MARC standards. They are derived from the English names of languages, which is not so cool (read: anglocentric). B codes are deprecated. "Terminological" codes are mostly based on self-denomination of languages, and they cover more languages. Those should be used. If a 2-letter code exists, it should be preferred over the 3-letter code. The table should have separate columns for B and T codes and show T codes first, as they're the preferred ones. --84.188.156.201 18:49, 11 August 2005 (UTC)
There are not more T than B codes.
B should lead, because this is common; see the official reference.
IMO separate columns are not needed. Only a few codes have B/T variants.
Tobias Conradi (Talk) 18:36, 17 October 2005 (UTC)
The current guideline is IETF's BCP 47 (replaces RFC 3066). It states on page 8 that the shortest code should be used, that the ISO 639-2/T code should be used when no ISO 639-1 code exists, and that a divergent B code should not be used:
  Note: For languages that have both an ISO 639-1 two-character code
  and an ISO 639-2 three-character code, only the ISO 639-1 two-
  character code is defined in the IANA registry.
  Note: For languages that have no ISO 639-1 two-character code and for
  which the ISO 639-2/T (Terminology) code and the ISO 639-2/B
  (Bibliographic) codes differ, only the Terminology code is defined in
  the IANA registry.  At the time this document was created, all
  languages that had both kinds of three-character code were also
  assigned a two-character code; it is not expected that future
  assignments of this nature will occur.
So B codes are in fact deprecated.--87.162.0.236 (talk) 19:46, 30 November 2008 (UTC)
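
The selection rule in the two notes quoted above can be sketched as a small function. The struct LanguageCodes and the function preferredSubtag are hypothetical names introduced here for illustration; an empty string stands for "no such code". This is a sketch of the rule as quoted, not of the IANA registry itself.

```cpp
#include <string>

// Hypothetical record for one language; an empty string means "no code".
struct LanguageCodes {
  std::string alpha2;   // ISO 639-1 two-letter code
  std::string alpha3T;  // ISO 639-2/T (terminological) code
  std::string alpha3B;  // ISO 639-2/B (bibliographic) code
};

// Prefer the two-letter code when one exists; otherwise fall back to
// the T code. The B code is never returned.
std::string preferredSubtag(const LanguageCodes& c) {
  if (!c.alpha2.empty()) return c.alpha2;
  return c.alpha3T;
}
```

For German (de, deu/T, ger/B) this returns "de"; for a language without a two-letter code, such as Ainu (ain), it returns the three-letter code.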

Table conversion

Since uniform data like ISO 639 codes ought to be presented in a tabular format, I wrote a quick program to do the conversion:

// File:    convert-iso639.cpp
// License: Public domain
// Author:  Ardonik
#include <fstream>
#include <iostream>
#include <string>
using namespace std;

void generate(istream& in, ostream& out) {
  string line;
  bool tableOpen = false; // True once the first section table has been started
  while (getline(in, line)) {
    if (line.length() < 5) continue; // Blank line
    // Section headings have the form "==X==" with a single-letter title.
    if (line.substr(0, 2) == "==" && line.substr(3, 2) == "==") {
      // New section.
      // End old table, if applicable.
      if (tableOpen) out << "|}\n";
      tableOpen = true;
      // Start a new table.
      out << line << "\n";
      out << "{| border=\"1px\" cellspacing=\"0\" cellpadding=\"2px\"\n";
      out << "|- style=\"background-color: #a0d0ff;\"\n";
      out << "!Alpha-3!!Alpha-2!!Language name\n";
      out << "|-\n";
    } else {
      // Just another entry in the current table.
      // Skip lines too short for the fixed-width columns; substr() would
      // otherwise throw std::out_of_range.
      if (line.length() < 16) continue;
      // Alpha-3 field: "xxx", or "xxx/yyy" when B and T codes differ.
      string alpha3 = line.substr(1, line[4] == '/' ? 7 : 3);
      string alpha2 = line.substr(10, 2); if (alpha2=="  ") alpha2 = " ";
      string language = line.substr(16);
      out << "|" << alpha3 << "||" << alpha2 << "||" << language << "\n";
      out << "|-\n";
    }
  }
  if (tableOpen) out << "|}\n"; // Close last table.
  // Report errors on stderr so they never end up in the generated table.
  if (in.fail() && !in.eof()) cerr << "Could not read from input\n";
  if (out.fail()) cerr << "Could not write to output\n";
}

int main(int argc, char* argv[]) {
  if (argc != 3) {
    cerr << "Usage: " << argv[0] << " [infile] [outfile]\n";
    cerr << "  If infile is \"-\", input will be read from stdin.\n";
    cerr << "  If outfile is \"-\", output will be written to stdout.\n";
    return 1; // A wrong argument count is an error
  }
  string infile = argv[1], outfile = argv[2];
  ifstream fin;
  ofstream fout;
  // Open only the files that were actually requested, and check each one.
  if (infile != "-") {
    fin.open(infile.c_str());
    if (!fin) { cerr << "Could not open " << infile << "\n"; return 1; }
  }
  if (outfile != "-") {
    fout.open(outfile.c_str());
    if (!fout) { cerr << "Could not open " << outfile << "\n"; return 1; }
  }
  // "-" selects the standard stream; otherwise use the opened file.
  istream& in = (infile == "-") ? cin : static_cast<istream&>(fin);
  ostream& out = (outfile == "-") ? cout : static_cast<ostream&>(fout);
  generate(in, out);
  return 0;
}

To operate the program, you should cut the data (headings included) from the old version of the page and paste into a text file like old.txt. Running convert-iso639 old.txt new.txt will give you the tabled version in new.txt, and you can copy and paste that into the article. --Ardonik 01:19, Aug 12, 2004 (UTC)

Serbo-Croatian, Serbian, Croatian

  • The three-letter codes "scr" and "scc" come from Serbo-Croatian and differ by alphabet (scr for Latin script and scc for Cyrillic script). However, both Serbian and Croatian texts from the time of the Serbo-Croatian standard could be written in either alphabet (especially Serbian, whose texts are split roughly 50/50 between the Latin and Cyrillic alphabets). In this table, "scr" refers only to Croatian and "scc" refers only to Serbian. The question is: is this an ISO mistake (because of this possibility I didn't change the codes) or a Wikipedia mistake? --Millosh 07:15, 10 Nov 2004 (UTC)

Including native names in table

Although the English name for a language is important, the native name is equally if not more important. It is arguably preferable to display native names on webpages that want to alert speakers of a language that content is available in their language. For example, the "In other languages" field uses native names, not English ones. I think it would be a worthwhile addition to include a native names column in the ISO 639 table. Many of the native names are already available from their respective language articles.

An example of what I'm thinking: http://people.w3.org/rishida/names/languages.html

Cleanup needed

I looked at the article and was unable to understand most of it. IMO, the entire text needs to be rewritten so that it is accessible to people who don't already know what it's about. --Smack (talk) 21:42, 28 August 2005 (UTC)

It also needs to be checked for accuracy. I just removed Banyumasan from the list, because it's not listed here [3], but there are probably other languages which should be removed too. (I also added Ainu, which is on the list of updates [4], but not the main alphabetical list yet, so please don't delete it.) --Chamdarae 00:32, 30 August 2005 (UTC)

I took a stab at clarifying the discussion of Alpha-x spaces, but a lot more could be done.--A12n 14:33, 26 November 2006 (UTC)

With http://en.wikipedia.org/w/index.php?title=ISO_639&diff=90219310&oldid=89208769 you inserted a false statement. And IIRC in mathematics we call it a "bound", not a "limit". It is ONE upper bound, not THE upper bound. There are zillions of upper bounds. Tobias Conradi4 14:36, 31 October 2007 (UTC)
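
For reference, the size of an alpha-n code space can be computed directly. The sketch below assumes codes are drawn from the 26 letters a to z; the function name codeSpaceSize is made up for illustration. The result is an upper bound on how many codes can exist, not a count of codes actually assigned, and any larger number is also an upper bound.

```cpp
// Number of distinct strings of `length` letters over a 26-letter
// alphabet, i.e. 26^length. Hypothetical helper for illustration.
unsigned long codeSpaceSize(unsigned length) {
  unsigned long n = 1;
  for (unsigned i = 0; i < length; ++i) n *= 26;
  return n;
}
```

This gives 676 for two-letter codes and 17576 for three-letter codes.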


Hey, at de.wikipedia.org they have these very nice pictures which I think would go some way to improving clarity, and shouldn't be hard to translate:

http://de.wikipedia.org/wiki/Bild:ISO_639_Schematische_Darstellung.svg and http://de.wikipedia.org/wiki/Bild:ISO_639_Mengenbeziehungen.svg -- note the delightful bilingual summary below. Both images are creative commons attribution, both are Inkscape SVG so even with notepad, the desperate could edit them. Right now it's almost seven AM, I've been up all night and shouldn't be on addictive Wikipedia at all and my overheating monitor is flashing in my face making me nauseous ;) but yeah. Although I believe that content is far more useful than images in the long run, I think that converting these images would give a lot of bang for the buck. MIGHT have a crack at it later. Probably not but hey. Anyway correct me if I'm wrong on any of these counts -I'm known to be wrong often. 125.236.211.165 (talk) 17:49, 8 March 2008 (UTC)

New RFC

RFC 3066 has been replaced by RFC 4646. — Preceding unsigned comment added by 192.134.4.212 (talk) 07:51, September 13, 2006

List of ISO 639 codes

I think List of ISO 639-3 codes should be renamed List of ISO 639 alpha-3 codes, or simply moved to List of ISO 639 codes. The same set of codes is used not just in ISO 639-3, but also in ISO 639-2 and ISO 639-5.

Many codes that were in Part 2 (i.e., 639-2) have been removed from Part 3. See #ISO_639_sources. --Thnidu (talk) 17:20, 21 November 2012 (UTC)

Furthermore, there is a lot of information about "native names" in the articles List of ISO 639-1 codes and List of ISO 639-2 codes. However, these native names are not included in the ISO standard. I therefore think it would be better to move them into List of languages by name (or its sub-lists), leaving only the ISO 639 codes, English names and French names (French names are part of ISO 639).

My plan is to:

  1. Copy the "native names" column inside List of ISO 639-1 codes and List of ISO 639-2 codes to → List of languages by name
  2. Merge contents inside List of ISO 639-1 codes (which is a relatively shorter list) to → ISO 639-1
  3. Deprecate / delete List of ISO 639-1 codes and List of ISO 639-2 codes
  4. Move List of ISO 639-3 codes to → List of ISO 639 codes
  5. Add ISO 639-2 and ISO 639-5 codes into List of ISO 639 codes

-- Hello World! 08:48, 17 July 2008 (UTC)

See Talk:lists of ISO 639 codes TalkChat (talk) 18:37, 11 November 2008 (UTC)

Template for Ladin needed

Hello. Could you create a template for Ladin, which has official status as a minority language in the Province of Bolzano-Bozen and the Province of Trento, Italy? Please also implement this template at Commons, where it would be of much use, since many of the mountains in the Dolomites actually have Ladin names. Regards Gun Powder Ma (talk) 14:59, 17 February 2009 (UTC)

Unclear

If this was withdrawn, then what is the new standard? This needs clarification. The article says "it was withdrawn" and then stops. It's natural to ask "then what has replaced it?" Qorilla (talk) 21:04, 30 June 2009 (UTC)

Personal request

Apologies for off-topic content, but does anyone know how to contact the maintainers of ISO 639? Have tried their website, but they list a postal address and a phone number, but no email address. Thanks, reply to my talk page please. Mglovesfun (talk) 12:55, 17 December 2009 (UTC)

The different parts of 639 have different Registration Authorities. 639-3's is SIL, the maintainers of the Ethnologue; 639-6's is Geolang. --Thnidu (talk) 17:16, 21 November 2012 (UTC)

What are the # numbers?

The table in the middle of the article has a column called # that isn't explained at all, and further # numbers appear throughout the article. Can someone explain what those are? 84.75.8.21 (talk) 17:08, 12 December 2014 (UTC) (lKj)