ISO basic Latin alphabet

From Wikipedia, the free encyclopedia

The ISO basic Latin alphabet is a Latin-script alphabet consisting of two sets of 26 letters, codified in various national and international standards and widely used in international communication.

The two sets contain the following 26 letters each:[1][2]

Uppercase Latin alphabet (Majuscule forms, also called uppercase or capital letters)
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Lowercase Latin alphabet (Minuscule forms, also called lowercase or small letters)
a b c d e f g h i j k l m n o p q r s t u v w x y z

History

By the 1960s it had become apparent to the computer and telecommunications industries in the First World that a non-proprietary method of encoding characters was needed. The International Organization for Standardization (ISO) encapsulated the Latin script in its 7-bit character-encoding standard, ISO 646 (later ISO/IEC 646). To achieve widespread acceptance, this encapsulation was based on popular usage: the standard was derived from the already published American Standard Code for Information Interchange, better known as ASCII, whose character set included the 26 × 2 letters of the English alphabet. Later ISO standards, for example ISO/IEC 8859 (8-bit character encodings) and ISO/IEC 10646 (the Universal Coded Character Set, kept synchronized with Unicode), have continued to define the 26 × 2 letters of the English alphabet as the basic Latin script, with extensions for the letters of other languages.[1]

Terminology

Name for Unicode block that contains all letters

The Unicode block that contains the alphabet is named "Basic Latin" (U+0000 to U+007F); its code chart is titled "C0 Controls and Basic Latin".

Names for the two subsets

In the Unicode 7.0 code chart for this block, the letters are listed under two subheadings:[3]

  • "Uppercase Latin alphabet": the character name of each letter contains the string LATIN CAPITAL LETTER (for example, U+0041 LATIN CAPITAL LETTER A)
  • "Lowercase Latin alphabet": the character name of each letter contains the string LATIN SMALL LETTER (for example, U+0061 LATIN SMALL LETTER A)
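The naming pattern above can be checked against the character names that ship with Python's standard `unicodedata` module. A minimal sketch, assuming Python 3.9 or later; `letter_names` is a hypothetical helper, not part of any standard. Unicode character names are stable once assigned, so the result does not depend on which Unicode version the interpreter bundles.

```python
import string
import unicodedata


def letter_names(letters: str) -> dict[str, str]:
    """Map each character to its formal Unicode character name."""
    return {ch: unicodedata.name(ch) for ch in letters}


upper = letter_names(string.ascii_uppercase)
lower = letter_names(string.ascii_lowercase)

# Every uppercase letter is named "LATIN CAPITAL LETTER <X>",
# every lowercase letter "LATIN SMALL LETTER <X>".
assert all(name == f"LATIN CAPITAL LETTER {ch}" for ch, name in upper.items())
assert all(name == f"LATIN SMALL LETTER {ch.upper()}" for ch, name in lower.items())

print(upper["A"])  # LATIN CAPITAL LETTER A
print(lower["z"])  # LATIN SMALL LETTER Z
```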

Names for the letters

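The letters are also encoded as fullwidth variants in the "Halfwidth and Fullwidth Forms" block (U+FF00 to U+FFEF), the capitals at U+FF21 to U+FF3A and the small letters at U+FF41 to U+FF5A, for example:[4]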

U+FF21  Ａ  FULLWIDTH LATIN CAPITAL LETTER A
U+FF41  ａ  FULLWIDTH LATIN SMALL LETTER A
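Because the fullwidth letters are laid out in the same order as the basic Latin letters, converting between the two forms amounts to a constant offset (U+FF21 minus U+0041 is 0xFEE0). A minimal Python sketch, assuming the input contains only basic Latin letters plus characters to be left alone; `to_fullwidth` and `FULLWIDTH_OFFSET` are hypothetical names used for illustration. The reverse direction uses Unicode NFKC normalization, which maps these compatibility characters back to their basic Latin counterparts.

```python
import string
import unicodedata

# Fixed distance between a basic Latin letter and its fullwidth form:
# U+FF21 - U+0041 == U+FF41 - U+0061 == 0xFEE0.
FULLWIDTH_OFFSET = 0xFF21 - ord("A")


def to_fullwidth(text: str) -> str:
    """Shift basic Latin letters to their fullwidth forms; leave other characters unchanged."""
    letters = set(string.ascii_letters)
    return "".join(chr(ord(ch) + FULLWIDTH_OFFSET) if ch in letters else ch for ch in text)


wide = to_fullwidth("Aa")
print([f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in wide])
# ['U+FF21 FULLWIDTH LATIN CAPITAL LETTER A', 'U+FF41 FULLWIDTH LATIN SMALL LETTER A']

# NFKC normalization folds the fullwidth forms back to basic Latin.
assert unicodedata.normalize("NFKC", wide) == "Aa"
```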

Timeline for encoding standards

  • 1865: International Morse Code was standardized at the International Telegraphy Congress in Paris, and was later made the standard by the International Telecommunication Union (ITU)
  • 1950s: Radiotelephony spelling alphabet of the International Civil Aviation Organization (ICAO)[1]

Timeline for widely used computer codes supporting the alphabet

  • 1963: ASCII (7-bit character-encoding standard from the American Standards Association, which became ANSI in 1969)
  • 1963/1964: EBCDIC (developed by IBM and supporting the same alphabetic characters as ASCII, but with different code values)
  • 1972: ISO 646 (ISO 7-bit character-encoding standard, using the same alphabetic code values as ASCII, revised in second edition ISO 646:1983 and third edition ISO/IEC 646:1991 as a joint ISO/IEC standard)
  • 1983: ITU-T Rec. T.51 | ISO/IEC 6937 (a multi-byte extension of ASCII)
  • 1987: ISO/IEC 8859-1:1987 (8-bit character encoding)
    • Other parts and later editions of ISO/IEC 8859 were published subsequently.
  • Mid-to-late 1980s: Windows-1250, Windows-1252, and other encodings used in Microsoft Windows (some roughly similar to ISO/IEC 8859-1)
  • 1990: Unicode 1.0 (developed by the Unicode Consortium),[5][6] with the letters in the range U+0000 to U+007F (the block now named "Basic Latin") at the same code values as in ASCII and ISO/IEC 646
    • Later versions of Unicode have since been published, and it also became a joint ISO/IEC standard, as listed below.
  • 1993: ISO/IEC 10646-1:1993, the ISO/IEC standard corresponding to Unicode 1.1
    • Further editions of ISO/IEC 10646-1 and one edition of ISO/IEC 10646-2 were later published. Since 2003, the standard has been published as a single document, "ISO/IEC 10646", without the division into two parts.
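The claims in the timeline about shared and differing code values can be checked with codecs that ship with Python. A sketch under the assumption that Python's `cp037` codec (IBM EBCDIC, US/Canada) is an acceptable stand-in for EBCDIC in general; EBCDIC exists in many code pages, and this is only one of them. `letter_bytes` is a hypothetical helper.

```python
import string

# The 52 letters in the order A-Z, then a-z.
LETTERS = string.ascii_uppercase + string.ascii_lowercase

# Codecs bundled with Python: "ascii" (US-ASCII), "latin-1" (ISO/IEC 8859-1),
# "cp1252" (Windows-1252), "utf-8" (a Unicode encoding form) and "cp037"
# (one EBCDIC code page).
CODECS = ["ascii", "latin-1", "cp1252", "utf-8", "cp037"]


def letter_bytes(codec: str) -> bytes:
    """Encode the 52 basic Latin letters with the given codec (one byte each)."""
    return LETTERS.encode(codec)


reference = letter_bytes("ascii")
for codec in CODECS:
    encoded = letter_bytes(codec)
    print(f"{codec:8} A={encoded[0]:02X} Z={encoded[25]:02X} "
          f"a={encoded[26]:02X} z={encoded[51]:02X} "
          f"same as ASCII: {encoded == reference}")
# ascii    A=41 Z=5A a=61 z=7A same as ASCII: True
# latin-1  A=41 Z=5A a=61 z=7A same as ASCII: True
# cp1252   A=41 Z=5A a=61 z=7A same as ASCII: True
# utf-8    A=41 Z=5A a=61 z=7A same as ASCII: True
# cp037    A=C1 Z=E9 a=81 z=A9 same as ASCII: False
```

In the EBCDIC row the letters are not even contiguous: for example, I is C9 and J is D1.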

Representation

Hindu-Arabic numerals and letters of the ISO basic Latin alphabet on a 16-segment display.

In ASCII the letters are among the printable characters, and in Unicode, since version 1.0, they belong to the Basic Latin block (U+0000 to U+007F). In both, as well as in ISO/IEC 646, ISO/IEC 8859 and ISO/IEC 10646, they occupy the hexadecimal positions 41 to 5A (uppercase) and 61 to 7A (lowercase).
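Because both ranges are contiguous and each lowercase letter sits exactly 0x20 above its uppercase partner, the two cases differ only in bit 5 of the code value. A minimal Python sketch; `toggle_case` and `CASE_BIT` are hypothetical names, and the trick holds only for these 52 letters in ASCII-compatible encodings (in EBCDIC, for instance, the cases differ by 0x40 instead).

```python
import string

# Uppercase letters occupy 0x41-0x5A and lowercase letters 0x61-0x7A.
assert [ord(c) for c in string.ascii_uppercase] == list(range(0x41, 0x5B))
assert [ord(c) for c in string.ascii_lowercase] == list(range(0x61, 0x7B))

# The two cases differ only in bit 5 (value 0x20).
CASE_BIT = 0x20


def toggle_case(ch: str) -> str:
    """Swap the case of one basic Latin letter by flipping bit 5."""
    if ch not in string.ascii_letters:
        raise ValueError(f"not a basic Latin letter: {ch!r}")
    return chr(ord(ch) ^ CASE_BIT)


print(hex(ord("G")), hex(ord("g")))  # 0x47 0x67
print(toggle_case("G"), toggle_case("g"))  # g G
```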

Since neither the ICAO spelling alphabet nor Morse code distinguishes letter case, every letter, uppercase or lowercase, has an ICAO code word and can be represented in Morse code.
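Both systems can be expressed as a lookup keyed by the 26 uppercase letters, with input folded to uppercase first, which is exactly the case-insensitivity described above. A minimal Python 3.9+ sketch; `spell`, `ICAO` and `MORSE` are hypothetical names, and the function assumes its input contains only basic Latin letters (anything else raises `KeyError`).

```python
# ICAO radiotelephony code words ("Alfa" and "Juliett" are the ICAO
# spellings) and International Morse Code, keyed by uppercase letter.
ICAO = dict(zip(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ",
    ["Alfa", "Bravo", "Charlie", "Delta", "Echo", "Foxtrot", "Golf", "Hotel",
     "India", "Juliett", "Kilo", "Lima", "Mike", "November", "Oscar", "Papa",
     "Quebec", "Romeo", "Sierra", "Tango", "Uniform", "Victor", "Whiskey",
     "X-ray", "Yankee", "Zulu"],
))
MORSE = dict(zip(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ",
    [".-", "-...", "-.-.", "-..", ".", "..-.", "--.", "....", "..", ".---",
     "-.-", ".-..", "--", "-.", "---", ".--.", "--.-", ".-.", "...", "-",
     "..-", "...-", ".--", "-..-", "-.--", "--.."],
))


def spell(word: str) -> list[tuple[str, str, str]]:
    """Return (letter, ICAO word, Morse) for each letter, ignoring case."""
    return [(ch, ICAO[ch], MORSE[ch]) for ch in word.upper()]


for row in spell("Sos"):
    print(*row)
# S Sierra ...
# O Oscar ---
# S Sierra ...
```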

Usage

All of the lowercase letters are used in the International Phonetic Alphabet (IPA). In X-SAMPA and SAMPA these letters have the same sound value as in IPA. In Kirshenbaum they have the same value except for the letter r.

Alphabets containing the same set of letters

The list below includes only alphabets that do not contain:

  • letters with diacritical marks that constitute distinct letters.
  • multigraphs that constitute distinct letters.
| Alphabet | Diacritics | Multigraphs (not constituting distinct letters) | Ligatures |
|---|---|---|---|
| Afrikaans alphabet | á, é, è, ê, ë, í, î, ï, ó, ô, ú, û, ý | | |
| Catalan alphabet | à, é, è, í, ï, ó, ò, ú, ü, ç | | |
| Dutch alphabet[dubious] | ä, é, è, ë, ï, ö, ü | The digraph ⟨ij⟩ is sometimes considered to be a separate letter. When that is the case, it usually replaces or is intermixed with ⟨y⟩. | |
| English alphabet | -none- | ⟨sh⟩, ⟨ch⟩, ⟨ea⟩, ⟨ou⟩, ⟨th⟩, ⟨ph⟩, ⟨ng⟩, ⟨zh⟩ | æ, œ |
| French alphabet[citation needed] | à, â, ç, é, è, ê, ë, î, ï, ô, ù, û, ü, ÿ | ⟨ai⟩, ⟨au⟩, ⟨ei⟩, ⟨eu⟩, ⟨oi⟩, ⟨ou⟩, ⟨eau⟩, ⟨ch⟩, ⟨ph⟩, ⟨gn⟩, ⟨an⟩, ⟨am⟩, ⟨en⟩, ⟨em⟩, ⟨in⟩, ⟨im⟩, ⟨on⟩, ⟨om⟩, ⟨un⟩, ⟨um⟩, ⟨yn⟩, ⟨ym⟩, ⟨ain⟩, ⟨aim⟩, ⟨ein⟩, ⟨oin⟩ | æ, œ |
| German alphabet[citation needed] | ä, ö, ü | ⟨sch⟩, ⟨qu⟩, ⟨ch⟩, ⟨ph⟩, ⟨ng⟩, ⟨ie⟩, ⟨ck⟩, ⟨ei⟩, ⟨eu⟩, ⟨äu⟩ | ß |
| Ido alphabet | -none- | ⟨qu⟩, ⟨ch⟩, ⟨sh⟩ | -none- |
| Indonesian alphabet | -none- | ⟨kh⟩, ⟨ng⟩, ⟨ny⟩, ⟨sy⟩ | |
| Interglossa alphabet | -none- | | |
| Interlingua alphabet | -none- | ⟨qu⟩ | -none- |
| Luxembourgish alphabet | ä, é, ë | | |
| Malay alphabet | -none- | ⟨kh⟩, ⟨ng⟩, ⟨ny⟩, ⟨sy⟩ | -none- |
| Occidental alphabet | -none- | | |
| Portuguese alphabet | ã, õ, á, é, í, ó, ú, â, ê, ô, à, ç | ⟨ch⟩, ⟨lh⟩, ⟨nh⟩, ⟨rr⟩, ⟨ss⟩, ⟨am⟩, ⟨em⟩, ⟨im⟩, ⟨om⟩, ⟨um⟩, ⟨ãe⟩, ⟨ão⟩, ⟨õe⟩ | -none- |

English is the only major modern European language requiring no diacritics for native words (although a diaeresis is used by some publishers in words such as "coöperation").[7][8]
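In the alphabets listed above, a letter with a diacritic counts as a variant of one of the 26 base letters rather than as a separate letter, whereas ligatures such as æ, œ and ß are separate characters. For such text, Unicode canonical decomposition (NFD) splits a precomposed letter like ö into its base letter plus a combining mark, while æ, œ and ß have no such decomposition. A minimal Python sketch; `strip_diacritics` and `uses_only_basic_latin` are hypothetical helpers, and stripping marks this way is lossy and unsuitable for languages in which accented letters are distinct letters.

```python
import string
import unicodedata

BASIC_LATIN_LETTERS = set(string.ascii_letters)


def strip_diacritics(text: str) -> str:
    """Decompose (NFD) and drop combining marks, e.g. 'ö' -> 'o' + U+0308 -> 'o'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def uses_only_basic_latin(word: str) -> bool:
    """True if every character of the word is one of the 52 basic Latin letters."""
    return all(ch in BASIC_LATIN_LETTERS for ch in word)


for word in ["coöperation", "façades", "mártir", "Straße", "œuvre"]:
    base = strip_diacritics(word)
    print(word, "->", base, uses_only_basic_latin(base))
# coöperation -> cooperation True
# façades -> facades True
# mártir -> martir True
# Straße -> Straße False
# œuvre -> œuvre False
```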

Note for Portuguese: k, w and y were part of the alphabet until several 20th-century spelling reforms, whose aim was to replace the etymological Portuguese spelling with a simpler phonetic one. These letters were replaced by other letters with the same sound: thus psychologia became psicologia, kioske became quiosque, martyr became mártir, etc. Today k, w and y are found only in foreign words, their derived terms and scientific abbreviations (e.g. km, byronismo). They have again been considered part of the alphabet since the 1990 Portuguese Language Orthographic Agreement, which came into effect in Brazil on January 1, 2009. See Reforms of Portuguese orthography.

References

  1. "Internationalisation standardization of 7-bit codes, ISO 646". Trans-European Research and Education Networking Association (TERENA). Retrieved 2010-10-03.
  2. "RFC1815 – Character Sets ISO-10646 and ISO-10646-J-1". Retrieved 2010-10-03.
  3. "C0 Controls and Basic Latin" (Unicode code chart). http://www.unicode.org/charts/PDF/U0000.pdf
  4. "Halfwidth and Fullwidth Forms" (Unicode code chart). http://www.unicode.org/charts/PDF/UFF00.pdf
  5. "Unicode character database". The Unicode Standard. Retrieved 2013-03-22.
  6. The Unicode Standard Version 1.0, Volume 1. Addison-Wesley Publishing Company, Inc. 1990. ISBN 0-201-56788-1.
  7. Grafton, Anthony (2006-10-23). "Books: The Nutty Professors, The history of academic charisma". The New Yorker. An example of an article containing both a diaeresis ("coöperate") and a cedilla ("façades").
  8. http://dscriber.com/news/121-the-new-yorkers-odd-mark-the-diaeresis