Basic Latin (Unicode block)

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Drmccreedy (talk | contribs) at 00:24, 12 February 2016 (Use U+ prefix consistently). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

C0 Controls and Basic Latin
Range: U+0000..U+007F (128 code points)
Plane: BMP
Scripts: Latin (52 char.), Common (76 char.)
Major alphabets: English, French, Spanish, German, Vietnamese
Symbol sets: Arabic numerals, Punctuation
Assigned: 128 code points (33 Control or Format)
Unused: 0 reserved code points
Source standards: ISO/IEC 8859, ISO 646
Unicode version history: 1.0.0 (1991): 128 (+128)
Unicode documentation: Code chart ∣ Web page
Note: [1][2]

The Basic Latin or C0 Controls and Basic Latin Unicode block is the first block of the Unicode standard, and the only block whose characters are each encoded in a single byte in UTF-8. The block contains all the characters of the ASCII encoding: its control codes, letters, digits, punctuation, and symbols.
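The single-byte property can be checked directly. The following is a minimal sketch in Python; the names `BASIC_LATIN`, `utf8_length`, and `is_basic_latin` are illustrative and not taken from any standard or library.

```python
# Illustrative sketch: every code point in U+0000..U+007F takes one byte in UTF-8.
BASIC_LATIN = range(0x0000, 0x0080)  # the 128 code points of the block


def utf8_length(code_point: int) -> int:
    """Return how many bytes UTF-8 uses to encode a single code point."""
    return len(chr(code_point).encode("utf-8"))


def is_basic_latin(text: str) -> bool:
    """Return True if every character of text lies in the Basic Latin block."""
    return all(ord(ch) in BASIC_LATIN for ch in text)


# True when all 128 code points of the block encode to exactly one byte.
one_byte = all(utf8_length(cp) == 1 for cp in BASIC_LATIN)

# The first code point after the block (U+0080) already needs more than one byte.
first_outside = utf8_length(0x0080)

# For text confined to the block, the UTF-8 and ASCII byte sequences coincide.
sample = "Hello, world!"
same_bytes = sample.encode("utf-8") == sample.encode("ascii")

print(one_byte, first_outside, same_bytes)
```

This is why text limited to this block is byte-for-byte identical whether it is stored as ASCII or as UTF-8.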

The Basic Latin block has been included in its present form since version 1.0.0 of the Unicode Standard, without addition or alteration of the character repertoire.[2]

Table of characters

Code Result Description Acronym
C0 controls
U+0000 Null character NUL
U+0001 Start of Heading SOH
U+0002 Start of Text STX
U+0003 End-of-text character ETX
U+0004 End-of-transmission character EOT
U+0005 Enquiry character ENQ
U+0006 Acknowledge character ACK
U+0007 Bell character BEL
U+0008 Backspace BS
U+0009 Horizontal tab HT
U+000A Line feed LF
U+000B Vertical tab VT
U+000C Form feed FF
U+000D Carriage return CR
U+000E Shift Out SO
U+000F Shift In SI
U+0010 Data Link Escape DLE
U+0011 Device Control 1 DC1
U+0012 Device Control 2 DC2
U+0013 Device Control 3 DC3
U+0014 Device Control 4 DC4
U+0015 Negative-acknowledge character NAK
U+0016 Synchronous Idle SYN
U+0017 End of Transmission Block ETB
U+0018 Cancel character CAN
U+0019 End of Medium EM
U+001A Substitute character SUB
U+001B Escape character ESC
U+001C File Separator FS
U+001D Group Separator GS
U+001E Record Separator RS
U+001F Unit Separator US
ASCII punctuation and symbols
U+0020   Space SP
U+0021 ! Exclamation mark
U+0022 " Quotation mark
U+0023 # Number sign
U+0024 $ Dollar sign
U+0025 % Percent sign
U+0026 & Ampersand
U+0027 ' Apostrophe
U+0028 ( Left parenthesis
U+0029 ) Right parenthesis
U+002A * Asterisk
U+002B + Plus sign
U+002C , Comma
U+002D - Hyphen-minus
U+002E . Full stop
U+002F / Slash
ASCII digits
U+0030 0 Digit Zero
U+0031 1 Digit One
U+0032 2 Digit Two
U+0033 3 Digit Three
U+0034 4 Digit Four
U+0035 5 Digit Five
U+0036 6 Digit Six
U+0037 7 Digit Seven
U+0038 8 Digit Eight
U+0039 9 Digit Nine
ASCII punctuation and symbols
U+003A : Colon
U+003B ; Semicolon
U+003C < Less-than sign
U+003D = Equal sign
U+003E > Greater-than sign
U+003F ? Question mark
U+0040 @ At sign
Uppercase Latin alphabet
U+0041 A Latin Capital Letter A
U+0042 B Latin Capital Letter B
U+0043 C Latin Capital Letter C
U+0044 D Latin Capital Letter D
U+0045 E Latin Capital Letter E
U+0046 F Latin Capital Letter F
U+0047 G Latin Capital Letter G
U+0048 H Latin Capital Letter H
U+0049 I Latin Capital Letter I
U+004A J Latin Capital Letter J
U+004B K Latin Capital Letter K
U+004C L Latin Capital Letter L
U+004D M Latin Capital Letter M
U+004E N Latin Capital Letter N
U+004F O Latin Capital Letter O
U+0050 P Latin Capital Letter P
U+0051 Q Latin Capital Letter Q
U+0052 R Latin Capital Letter R
U+0053 S Latin Capital Letter S
U+0054 T Latin Capital Letter T
U+0055 U Latin Capital Letter U
U+0056 V Latin Capital Letter V
U+0057 W Latin Capital Letter W
U+0058 X Latin Capital Letter X
U+0059 Y Latin Capital Letter Y
U+005A Z Latin Capital Letter Z
ASCII punctuation and symbols
U+005B [ Left Square Bracket
U+005C \ Backslash [A]
U+005D ] Right Square Bracket
U+005E ^ Circumflex accent
U+005F _ Low line
U+0060 ` Grave accent
Lowercase Latin alphabet
U+0061 a Latin Small Letter A
U+0062 b Latin Small Letter B
U+0063 c Latin Small Letter C
U+0064 d Latin Small Letter D
U+0065 e Latin Small Letter E
U+0066 f Latin Small Letter F
U+0067 g Latin Small Letter G
U+0068 h Latin Small Letter H
U+0069 i Latin Small Letter I
U+006A j Latin Small Letter J
U+006B k Latin Small Letter K
U+006C l Latin Small Letter L
U+006D m Latin Small Letter M
U+006E n Latin Small Letter N
U+006F o Latin Small Letter O
U+0070 p Latin Small Letter P
U+0071 q Latin Small Letter Q
U+0072 r Latin Small Letter R
U+0073 s Latin Small Letter S
U+0074 t Latin Small Letter T
U+0075 u Latin Small Letter U
U+0076 v Latin Small Letter V
U+0077 w Latin Small Letter W
U+0078 x Latin Small Letter X
U+0079 y Latin Small Letter Y
U+007A z Latin Small Letter Z
ASCII punctuation and symbols
U+007B { Left Curly Bracket
U+007C | Vertical bar
U+007D } Right Curly Bracket
U+007E ~ Tilde
Control character
U+007F Delete DEL
A The character U+005C (\) may be displayed as a yen sign or a won sign by Japanese or Korean fonts that treat Unicode text (especially UTF-8) as a legacy character set in which the backslash was replaced by one of these signs.[3]

Subheadings

The C0 Controls and Basic Latin block contains six subheadings.[4]

C0 controls

The C0 Controls, referred to as C0 ASCII control codes in version 1.0, are inherited from ASCII and other 7-bit and 8-bit encoding schemes. The alias names for the C0 controls are taken from the ISO/IEC 6429:1992 standard.[4]
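As a rough illustration, Python's standard `unicodedata` module reports every C0 control with the general category `Cc` and without a formal character name, which is consistent with names such as NUL or SOH being aliases. The variable names below are illustrative only.

```python
import unicodedata

# The 32 C0 controls occupy U+0000..U+001F.
C0_CONTROLS = [chr(cp) for cp in range(0x00, 0x20)]

# General categories found among the C0 controls ("Cc" means "Other, control").
categories = {unicodedata.category(ch) for ch in C0_CONTROLS}

# unicodedata.name() has no formal name for control characters; passing a
# default value (None here) avoids the ValueError it would otherwise raise.
names = {unicodedata.name(ch, None) for ch in C0_CONTROLS}

# Count all control characters in the block: the 32 C0 controls plus U+007F.
control_count = sum(
    unicodedata.category(chr(cp)) == "Cc" for cp in range(0x80)
)

print(categories, names, control_count)
```

The count of 33 matches the "33 Control or Format" figure given for the block.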

ASCII punctuation and symbols

This subheading covers standard punctuation characters, simple mathematical operators, and symbols such as the dollar sign, percent sign, ampersand, low line (underscore), and vertical bar (pipe).[4]

ASCII digits

The ASCII Digits subheading contains the standard European number characters 1–9 and 0.[4]

Uppercase Latin alphabet

The Uppercase Latin alphabet subheading contains the standard 26-letter unaccented Latin alphabet in the majuscule.[4]

Lowercase Latin alphabet

The Lowercase Latin Alphabet subheading contains the standard 26-letter unaccented Latin alphabet in the minuscule.[4]

Control character

The Control Character subheading contains the "Delete" character.[4]
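The six subheadings can be expressed as a simple range lookup. The sketch below is one possible way to do it, with ranges read off the table of characters above; the function name `subheading` is hypothetical.

```python
from collections import Counter


def subheading(code_point: int) -> str:
    """Map a Basic Latin code point to the subheading it is listed under."""
    if not 0x00 <= code_point <= 0x7F:
        raise ValueError("code point is outside U+0000..U+007F")
    if code_point <= 0x1F:
        return "C0 controls"
    if code_point == 0x7F:
        return "Control character"
    if 0x30 <= code_point <= 0x39:
        return "ASCII digits"
    if 0x41 <= code_point <= 0x5A:
        return "Uppercase Latin alphabet"
    if 0x61 <= code_point <= 0x7A:
        return "Lowercase Latin alphabet"
    # Everything left over falls in one of the four punctuation/symbol runs:
    # U+0020..U+002F, U+003A..U+0040, U+005B..U+0060, U+007B..U+007E.
    return "ASCII punctuation and symbols"


# Tally how many code points fall under each subheading.
counts = Counter(subheading(cp) for cp in range(0x80))

for name, total in counts.items():
    print(f"{name}: {total}")
```

Note that "ASCII punctuation and symbols" is a single subheading even though its characters are split across four separate runs in the block.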

Compact table

C0 Controls and Basic Latin[a]
Official Unicode Consortium code chart (PDF)
  0 1 2 3 4 5 6 7 8 9 A B C D E F
U+000x NUL SOH STX ETX EOT ENQ ACK BEL  BS   HT   LF   VT   FF   CR   SO   SI 
U+001x DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN  EM  SUB ESC  FS   GS   RS   US 
U+002x  SP  ! " # $ % & ' ( ) * + , - . /
U+003x 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
U+004x @ A B C D E F G H I J K L M N O
U+005x P Q R S T U V W X Y Z [ \ ] ^ _
U+006x ` a b c d e f g h i j k l m n o
U+007x p q r s t u v w x y z { | } ~ DEL
  1. ^ As of Unicode version 16.0
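The grid layout follows directly from the code point values: the row is the high hexadecimal digit and the column is the low one. The following sketch regenerates the rows; `C0_ACRONYMS`, `cell`, and `compact_rows` are illustrative names, and the acronyms are copied from the table of characters above.

```python
# Acronyms for U+0000..U+001F, in code point order.
C0_ACRONYMS = [
    "NUL", "SOH", "STX", "ETX", "EOT", "ENQ", "ACK", "BEL",
    "BS", "HT", "LF", "VT", "FF", "CR", "SO", "SI",
    "DLE", "DC1", "DC2", "DC3", "DC4", "NAK", "SYN", "ETB",
    "CAN", "EM", "SUB", "ESC", "FS", "GS", "RS", "US",
]


def cell(code_point: int) -> str:
    """Return the label shown in the compact table for one code point."""
    if code_point < 0x20:
        return C0_ACRONYMS[code_point]
    if code_point == 0x20:
        return "SP"
    if code_point == 0x7F:
        return "DEL"
    return chr(code_point)


def compact_rows() -> list[str]:
    """Build the eight table rows, sixteen code points per row."""
    rows = []
    for high in range(8):
        cells = [cell(high * 16 + low) for low in range(16)]
        rows.append(f"U+00{high:X}x " + " ".join(cells))
    return rows


for row in compact_rows():
    print(row)
```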

Emoji

The Basic Latin block contains twelve emoji: U+0023, U+002A and U+0030–U+0039.[5][6]

The block has 22 standardized variants defined to specify emoji-style (U+FE0F VS16) or text presentation (U+FE0E VS15) for the following eleven base characters: U+0023 and U+0030–U+0039.[7][8]

All of these base characters default to a text presentation.

Emoji variation sequences
U+ 0023 0030 0031 0032 0033 0034 0035 0036 0037 0038 0039
base codepoint # 0 1 2 3 4 5 6 7 8 9
base+VS15 (text) #︎ 0︎ 1︎ 2︎ 3︎ 4︎ 5︎ 6︎ 7︎ 8︎ 9︎
base+VS16 (emoji) #️ 0️ 1️ 2️ 3️ 4️ 5️ 6️ 7️ 8️ 9️
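Each variation sequence in the table is simply the base character followed by one variation selector. A minimal sketch of building them follows; the constant and function names are illustrative, and whether a given sequence is actually rendered in text or emoji style depends on the fonts and software in use.

```python
VS15 = "\uFE0E"  # variation selector-15: requests text presentation
VS16 = "\uFE0F"  # variation selector-16: requests emoji presentation

# The eleven base characters: U+0023 and U+0030..U+0039.
BASES = ["#"] + [str(digit) for digit in range(10)]


def variation_sequences() -> list[tuple[str, str, str]]:
    """Return (base, base+VS15, base+VS16) for each base character."""
    return [(base, base + VS15, base + VS16) for base in BASES]


sequences = variation_sequences()

for base, text_style, emoji_style in sequences:
    code_points = " ".join(f"U+{ord(ch):04X}" for ch in emoji_style)
    print(f"U+{ord(base):04X}  emoji sequence: {code_points}")
```

Eleven bases with two selectors each give the 22 standardized variants mentioned above.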

References

  1. ^ "Unicode character database". The Unicode Standard. Retrieved 22 March 2013.
  2. ^ a b The Unicode Standard Version 1.0, Volume 1. Addison-Wesley Publishing Company, Inc. 1990. ISBN 0-201-56788-1.
  3. ^ "Sorting it all Out: When is a backslash not a backslash?"
  4. ^ a b c d e f g "Unicode 6.2 code charts" (PDF). The Unicode Standard. Retrieved 1 April 2013.
  5. ^ "UTR #51: Unicode Emoji". Unicode Consortium. 2015-11-12.
  6. ^ "UCD: Emoji Data for UTR #51". Unicode Consortium. 2015-11-11.
  7. ^ "Unicode Character Database: Standardized Variants". The Unicode Consortium.
  8. ^ "Unicode Character Database: Standardized Variation Sequences". The Unicode Consortium.