ISO 639-3

From Wikipedia, the free encyclopedia

ISO 639-3:2007, Codes for the representation of names of languages — Part 3: Alpha-3 code for comprehensive coverage of languages, is an international standard for language codes in the ISO 639 series. The standard describes three‐letter codes for identifying languages. It extends the ISO 639-2 alpha-3 codes with an aim to cover all known natural languages. The standard was published by ISO on 2007-02-05.[1] It was based on the language codes used in the Ethnologue published by SIL International, which is now the registration authority for ISO 639-3.[2][3]

ISO 639-3 is intended for use in a wide range of applications, in particular computer systems where many languages need to be supported. It provides an enumeration of languages as complete as possible, including living and extinct, ancient and constructed, major and minor, written and unwritten.[1] However, it does not include reconstructed languages such as Proto-Indo-European.[4]

It is a superset of ISO 639-1 and of the individual languages in ISO 639-2. ISO 639-1 and ISO 639-2 focused on major languages, most frequently represented in the total body of the world's literature. Since ISO 639-2 also includes language collections and Part 3 does not, ISO 639-3 is not a superset of ISO 639-2. Where B and T codes exist in ISO 639-2, ISO 639-3 uses the T-codes.


language   639-1   639-2 (B/T)   639-3 type    639-3 code
English    en      eng           individual    eng
German     de      ger/deu       individual    deu
Arabic     ar      ara           macro         ara
                                 individual    arb + others
Minnan                           individual    nan
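The rule that ISO 639-3 uses the T-code wherever ISO 639-2 has both a B and a T code can be sketched as a small lookup. This is only an illustration: the name `prefer_t_code` is hypothetical, and the table is deliberately partial (the German pair comes from the table above; the French, Chinese and Dutch pairs are added as further well-known examples).

```python
# Partial, illustrative table of ISO 639-2 bibliographic (B) codes and
# their terminological (T) equivalents. The standard defines more pairs.
B_TO_T = {
    "ger": "deu",  # German (the pair shown in the table above)
    "fre": "fra",  # French
    "chi": "zho",  # Chinese
    "dut": "nld",  # Dutch
}


def prefer_t_code(code: str) -> str:
    """Return the T-code for a known B-code; pass any other code through."""
    code = code.strip().lower()
    return B_TO_T.get(code, code)


print(prefer_t_code("ger"))  # deu
print(prefer_t_code("eng"))  # eng (no separate B code, unchanged)
```

Codes that are not in the table are returned unchanged, so the function does not validate its input; it only normalizes the B/T distinction.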

As of April 2012, the standard contains 7776 entries.[5] The inventory of languages is based on a number of sources including: the individual languages contained in 639-2, modern languages from the Ethnologue, historic varieties, ancient languages and artificial languages from Anthony Aristar at the Linguist List,[citation needed] as well as languages recommended within the annual public commenting period.

A transition from ISO 639-1 to ISO 639-3 could be done using the data contained in the list of ISO 639-1 codes.
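As a minimal sketch of such a transition, the mapping can be held in a dictionary keyed by the two-letter code. Only the three rows from the table above are included here; the names `ISO639_1_TO_3` and `upgrade_code` are hypothetical, and a real migration would load the complete list rather than hard-code it.

```python
from typing import Optional

# Only the rows shown in the table above; a real table would be loaded
# from the full list of ISO 639-1 codes.
ISO639_1_TO_3 = {
    "en": "eng",
    "de": "deu",
    "ar": "ara",  # note: 'ara' is a macrolanguage code in ISO 639-3
}


def upgrade_code(alpha2: str) -> Optional[str]:
    """Map an ISO 639-1 code to its ISO 639-3 code, or None if unknown."""
    return ISO639_1_TO_3.get(alpha2.strip().lower())


print(upgrade_code("de"))  # deu
print(upgrade_code("xx"))  # None
```

Returning `None` for an unknown code, rather than guessing, leaves the caller to decide how to handle entries that the table does not cover.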

Code space

Since the code is three-letter alphabetic, one upper bound for the number of languages that can be represented is 26 × 26 × 26 = 17576. Since ISO 639-2 defines special codes (4), a reserved range (520) and B-only codes (23), 547 codes cannot be used in Part 3. Therefore a tighter upper bound is 17576 − 547 = 17029.

The upper bound gets even lower if one subtracts the language collections defined in 639-2 and the ones yet to be defined in ISO 639-5.
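The arithmetic behind this bound can be checked in a few lines. The counts (4, 520, 23) are taken from the text; reading the reserved range as qaa–qtz, i.e. 20 second letters times 26 third letters, is an interpretation based on the local-use range the article mentions elsewhere.

```python
import string

letters = len(string.ascii_lowercase)  # 26
total = letters ** 3                   # all three-letter combinations

special = 4          # special codes defined by ISO 639-2
reserved = 20 * 26   # qaa-qtz: second letter a..t (20), third letter a..z (26)
b_only = 23          # B-only codes of ISO 639-2

unavailable = special + reserved + b_only
upper_bound = total - unavailable

print(total)        # 17576
print(unavailable)  # 547
print(upper_bound)  # 17029
```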


Macrolanguages

There are 56 languages in ISO 639-2 which are considered, for the purposes of the standard, to be "macrolanguages" in ISO 639-3.[6]

Some of these macrolanguages had no individual language as defined by ISO 639-3 in the code set of ISO 639-2, e.g. 'ara' (Generic Arabic). Others like 'nor' (Norwegian) had their two individual parts ('nno' (Nynorsk), 'nob' (Bokmål)) already in ISO 639-2.

This means that some languages (e.g. 'arb', Standard Arabic) that ISO 639-2 treated as dialects of a single language ('ara') are, in certain contexts, considered individual languages in their own right in ISO 639-3.

This is an attempt to deal with varieties that may be linguistically distinct from each other, but are treated by their speakers as two forms of the same language, e.g. in cases of diglossia.

See [7] for the complete list of macrolanguage mappings.
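The relationship between a macrolanguage and its individual languages can be sketched as a one-to-many mapping. The entries below are limited to the codes named in this section ('ara' with 'arb', and 'nor' with 'nno' and 'nob'); the Arabic entry is intentionally incomplete, and the names `MACROLANGUAGE_MEMBERS` and `macrolanguage_of` are hypothetical.

```python
from typing import Optional

# Only the codes mentioned in the text. 'ara' has further members that
# are not listed here; see the full macrolanguage mappings.
MACROLANGUAGE_MEMBERS = {
    "ara": ["arb"],         # Arabic: Standard Arabic (plus others)
    "nor": ["nno", "nob"],  # Norwegian: Nynorsk and Bokmål
}

# Reverse index: individual language code -> macrolanguage code.
MEMBER_TO_MACRO = {
    member: macro
    for macro, members in MACROLANGUAGE_MEMBERS.items()
    for member in members
}


def macrolanguage_of(code: str) -> Optional[str]:
    """Return the macrolanguage a code belongs to, or None if it has none."""
    return MEMBER_TO_MACRO.get(code.lower())


print(macrolanguage_of("arb"))  # ara
print(macrolanguage_of("nob"))  # nor
print(macrolanguage_of("eng"))  # None
```

An application that treats Nynorsk and Bokmål content as "Norwegian" for search or fallback purposes could compare the results of `macrolanguage_of` instead of comparing the individual codes directly.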

Collective languages

"A collective language code element is an identifier that represents a group of individual languages that are not deemed to be one language in any usage context."[8] These codes do not precisely represent a particular language or macrolanguage.

While ISO 639-2 includes three-letter identifiers for collective languages, these codes are excluded from ISO 639-3. Hence ISO 639-3 is not a superset of ISO 639-2.

ISO 639-5 defines 3-letter collective codes for language families and groups.

Usage of ISO 639-3

Generic codes

Four codes are set aside for cases where none of the specific codes are appropriate. These are intended primarily for applications like databases where an ISO code is required regardless of whether one exists.

mis uncoded languages
mul multiple languages
und undetermined languages
zxx no linguistic content / not applicable

mis (originally an abbreviation for 'miscellaneous') is intended for languages which have not (yet) been included in the ISO standard.

mul is intended for cases where the data includes more than one language, and (for example) the database requires a single ISO code.

und is intended for cases where the language in the data has not been identified, such as when it is mislabeled or was never labeled. It is not intended for cases such as Trojan, where an unattested language has been given a name.

zxx is intended for data which is not a language at all, such as animal calls.[11]
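The way the four generic codes divide up the cases can be sketched as a decision function for a database field that must always hold exactly one code. This is one possible reading of the descriptions above, not part of the standard; the function name `single_code` and its parameters are hypothetical.

```python
from typing import Optional, Sequence


def single_code(identified: Sequence[Optional[str]],
                has_linguistic_content: bool = True) -> str:
    """Pick one code for a record.

    `identified` holds one entry per language identified in the data:
    its ISO 639-3 code, or None if the language has no code (yet).
    """
    if not has_linguistic_content:
        return "zxx"  # not language at all, e.g. animal calls
    if not identified:
        return "und"  # language has not been identified
    if len(identified) > 1:
        return "mul"  # more than one language, but one code is required
    code = identified[0]
    return code if code is not None else "mis"  # known language, no code


print(single_code([], has_linguistic_content=False))  # zxx
print(single_code([]))                                # und
print(single_code(["eng", "deu"]))                    # mul
print(single_code([None]))                            # mis
print(single_code(["eng"]))                           # eng
```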

In addition, codes in the range qaa–qtz are 'reserved for local use', for example for extinct languages at Linguist List. Linguist List has assigned one of them a generic value:

qnp unnamed proto-language

This is used for proposed intermediate nodes in a family tree that have no name.
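Whether a code falls in the local-use range qaa–qtz can be tested with a plain string comparison, since three-letter lowercase codes sort alphabetically. The helper name `is_local_use` is hypothetical.

```python
def is_local_use(code: str) -> bool:
    """True if `code` lies in the range qaa-qtz reserved for local use."""
    code = code.lower()
    return (
        len(code) == 3
        and code.isascii()
        and code.isalpha()
        and "qaa" <= code <= "qtz"
    )


print(is_local_use("qnp"))  # True  (Linguist List's unnamed proto-language)
print(is_local_use("qua"))  # False (second letter is past 't')
print(is_local_use("eng"))  # False
```

Software that exchanges data with other systems may want to flag such codes, because their meaning is defined only by local agreement.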


Criticism

ISO 639-3 codes are frequently used to identify languages referred to in the linguistic literature. However, a number of criticisms have been leveled:[2][12]

  • The administration of the standard is problematic because SIL is a missionary organization with inadequate transparency and accountability. Decisions as to what deserves to be encoded as a language are made internally. While outside input may or may not be welcomed, the decisions themselves are opaque, and many linguists have given up trying to improve the standard.
  • Languages and dialects often cannot be rigorously distinguished, and such distinctions are often based instead on social and political factors. Codification should therefore be made at the languoid level, rather than forcing arbitrary divisions onto ambiguous cases.
  • ISO 639-3 may be misunderstood and misused by authorities that make decisions about people's identity and language, abrogating the right of speakers to identify or identify with their speech variety. [Though SIL is sensitive to such issues, this problem is inherent in the last.]
  • The three-letter codes themselves are problematic, because they are sometimes used as if they were abbreviations of the language, which they are not. Moreover, they are sometimes based on pejorative names, such as [jnj] for Yemsa, from pejorative "Janejero"; when seen as an abbreviation for the name, rather than an arbitrary technical label, the code itself may become pejorative.
  • Permanent identification of a language is incompatible with the "changeable nature" of languages. [Haspelmath judges this to be unreasonable, as any account of a language requires identifying it, and we can easily identify different stages of a language.]

Indeed, independently of criticisms leveled at SIL and the nature of the existing codes, the very idea of an ISO standard for language identification has come under fire. The ISO is an industrial organization, not a scientific one. Business has an interest in the stable identification of economically significant languages, for example for translation and software localization, and this is why the ISO 639-1 and 639-2 standards were established in the first place. However, those standards are adequate for the needs of industry; business has no significant interest in the many small, unwritten and often endangered languages with no measurable economic impact. For scientific standardization, scientific bodies are normally established, such as the International Mineralogical Association, which regulates the identification and naming of minerals, and the International Astronomical Union, which regulates the identification and naming of astronomical bodies. And while the use of SIL at least means that ISO 639-3 is based on demonstrated linguistic expertise, this is not true of the later standards, 639-5 and 639-6, which were developed in complete ignorance of the necessary science.[2][12]

References


  1. "ISO 639-3 status and abstract". 2010-07-20. Retrieved 2012-06-14.
  2. Morey, Stephen; Post, Mark W.; Friedman, Victor A. (2013). "The language codes of ISO 639: A premature, ultimately unobtainable, and possibly damaging standardization". PARADISEC RRR Conference.
  3. "Maintenance agencies and registration authorities". ISO.
  4. "Types of individual languages - Ancient languages". Retrieved 2012-06-14.
  5. "ISO 639-3 Code Set". 2007-10-18. Retrieved 2012-06-14.
  6. "Scope of denotation: Macrolanguages". Retrieved 2012-06-14.
  7. "Macrolanguage Mappings". Retrieved 2012-06-14.
  8. "Scope of denotation: Collective languages". Retrieved 2012-06-14.
  9. "Languages in the Root: A TLD Launch Strategy Based on ISO 639". 2004-10-05. Retrieved 2012-06-14.
  10. "ICANN Email Archives: [gtld-strategy-draft]". Retrieved 2012-06-14.
  11. "Field Recordings of Vervet Monkey Calls". Entry in the catalog of the Linguistic Data Consortium. Retrieved 2012-09-04.
  12. Haspelmath, Martin (2013-12-04). "Can language identity be standardized? On Morey et al.’s critique of ISO 639-3". Diversity Linguistics Comment.
