Basic Latin (Unicode block)
C0 Controls and Basic Latin | |
---|---|
Range | U+0000..U+007F (128 code points) |
Plane | BMP |
Scripts | Latin (52 characters), Common (76 characters) |
Major alphabets | English, French, Spanish, German, Vietnamese |
Symbol sets | Arabic numerals, punctuation |
Assigned | 128 code points (including 33 Control or Format) |
Unused | 0 reserved code points |
Source standards | ISO/IEC 8859, ISO 646 |
Unicode version history | |
1.0.0 (1991) | 128 (+128) |
Note: [1][2] |
The Basic Latin or C0 Controls and Basic Latin Unicode block is the first block of the Unicode standard, and the only block whose characters are each encoded as a single byte in UTF-8. The block contains all 128 characters of the ASCII encoding, including its letters and control codes.
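The single-byte property can be checked directly. The following is a minimal Python 3 sketch (standard library only; the helper name `utf8_length` is hypothetical) that encodes every code point of the block and confirms that each becomes one byte equal to the code point itself, while U+0080, the first code point after the block, already needs two bytes.

```python
def utf8_length(code_point: int) -> int:
    """Number of bytes UTF-8 uses for a single code point (hypothetical helper)."""
    return len(chr(code_point).encode("utf-8"))

basic_latin = range(0x00, 0x80)  # U+0000..U+007F

# Every code point in the block encodes to exactly one byte whose value is the code point.
single_byte = all(chr(cp).encode("utf-8") == bytes([cp]) for cp in basic_latin)

print(single_byte)        # True
print(utf8_length(0x7F))  # 1  (DEL, the last code point of the block)
print(utf8_length(0x80))  # 2  (first code point after the block)

# Consequently, ASCII text and its UTF-8 encoding are byte-for-byte identical.
text = "Basic Latin"
print(text.encode("ascii") == text.encode("utf-8"))  # True
```

This byte-for-byte identity is what lets plain ASCII files be read as UTF-8 without conversion.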
The Basic Latin block has been included in its present form since version 1.0.0 of the Unicode Standard, without addition or alteration of the character repertoire.[2]
Table of characters
Code | Result | Description | Acronym |
---|---|---|---|
C0 controls | |||
U+0000 | | Null character | NUL |
U+0001 | | Start of Heading | SOH |
U+0002 | | Start of Text | STX |
U+0003 | | End-of-text character | ETX |
U+0004 | | End-of-transmission character | EOT |
U+0005 | | Enquiry character | ENQ |
U+0006 | | Acknowledge character | ACK |
U+0007 | | Bell character | BEL |
U+0008 | | Backspace | BS |
U+0009 | | Horizontal tab | HT |
U+000A | | Line feed | LF |
U+000B | | Vertical tab | VT |
U+000C | | Form feed | FF |
U+000D | | Carriage return | CR |
U+000E | | Shift Out | SO |
U+000F | | Shift In | SI |
U+0010 | | Data Link Escape | DLE |
U+0011 | | Device Control 1 | DC1 |
U+0012 | | Device Control 2 | DC2 |
U+0013 | | Device Control 3 | DC3 |
U+0014 | | Device Control 4 | DC4 |
U+0015 | | Negative-acknowledge character | NAK |
U+0016 | | Synchronous Idle | SYN |
U+0017 | | End of Transmission Block | ETB |
U+0018 | | Cancel character | CAN |
U+0019 | | End of Medium | EM |
U+001A | | Substitute character | SUB |
U+001B | | Escape character | ESC |
U+001C | | File Separator | FS |
U+001D | | Group Separator | GS |
U+001E | | Record Separator | RS |
U+001F | | Unit Separator | US |
ASCII punctuation and symbols | |||
U+0020 | | Space | SP |
U+0021 | ! | Exclamation mark | |
U+0022 | " | Quotation mark | |
U+0023 | # | Number sign | |
U+0024 | $ | Dollar sign | |
U+0025 | % | Percent sign | |
U+0026 | & | Ampersand | |
U+0027 | ' | Apostrophe | |
U+0028 | ( | Left parenthesis | |
U+0029 | ) | Right parenthesis | |
U+002A | * | Asterisk | |
U+002B | + | Plus sign | |
U+002C | , | Comma | |
U+002D | - | Hyphen-minus | |
U+002E | . | Full stop | |
U+002F | / | Slash | |
ASCII digits | |||
U+0030 | 0 | Digit Zero | |
U+0031 | 1 | Digit One | |
U+0032 | 2 | Digit Two | |
U+0033 | 3 | Digit Three | |
U+0034 | 4 | Digit Four | |
U+0035 | 5 | Digit Five | |
U+0036 | 6 | Digit Six | |
U+0037 | 7 | Digit Seven | |
U+0038 | 8 | Digit Eight | |
U+0039 | 9 | Digit Nine | |
ASCII punctuation and symbols | |||
U+003A | : | Colon | |
U+003B | ; | Semicolon | |
U+003C | < | Less-than sign | |
U+003D | = | Equals sign | |
U+003E | > | Greater-than sign | |
U+003F | ? | Question mark | |
U+0040 | @ | At sign | |
Uppercase Latin alphabet | |||
U+0041 | A | Latin Capital Letter A | |
U+0042 | B | Latin Capital Letter B | |
U+0043 | C | Latin Capital Letter C | |
U+0044 | D | Latin Capital Letter D | |
U+0045 | E | Latin Capital Letter E | |
U+0046 | F | Latin Capital Letter F | |
U+0047 | G | Latin Capital Letter G | |
U+0048 | H | Latin Capital Letter H | |
U+0049 | I | Latin Capital Letter I | |
U+004A | J | Latin Capital Letter J | |
U+004B | K | Latin Capital Letter K | |
U+004C | L | Latin Capital Letter L | |
U+004D | M | Latin Capital Letter M | |
U+004E | N | Latin Capital Letter N | |
U+004F | O | Latin Capital Letter O | |
U+0050 | P | Latin Capital Letter P | |
U+0051 | Q | Latin Capital Letter Q | |
U+0052 | R | Latin Capital Letter R | |
U+0053 | S | Latin Capital Letter S | |
U+0054 | T | Latin Capital Letter T | |
U+0055 | U | Latin Capital Letter U | |
U+0056 | V | Latin Capital Letter V | |
U+0057 | W | Latin Capital Letter W | |
U+0058 | X | Latin Capital Letter X | |
U+0059 | Y | Latin Capital Letter Y | |
U+005A | Z | Latin Capital Letter Z | |
ASCII punctuation and symbols | |||
U+005B | [ | Left Square Bracket | |
U+005C | \ | Backslash [A] | |
U+005D | ] | Right Square Bracket | |
U+005E | ^ | Circumflex accent | |
U+005F | _ | Low line | |
U+0060 | ` | Grave accent | |
Lowercase Latin alphabet | |||
U+0061 | a | Latin Small Letter A | |
U+0062 | b | Latin Small Letter B | |
U+0063 | c | Latin Small Letter C | |
U+0064 | d | Latin Small Letter D | |
U+0065 | e | Latin Small Letter E | |
U+0066 | f | Latin Small Letter F | |
U+0067 | g | Latin Small Letter G | |
U+0068 | h | Latin Small Letter H | |
U+0069 | i | Latin Small Letter I | |
U+006A | j | Latin Small Letter J | |
U+006B | k | Latin Small Letter K | |
U+006C | l | Latin Small Letter L | |
U+006D | m | Latin Small Letter M | |
U+006E | n | Latin Small Letter N | |
U+006F | o | Latin Small Letter O | |
U+0070 | p | Latin Small Letter P | |
U+0071 | q | Latin Small Letter Q | |
U+0072 | r | Latin Small Letter R | |
U+0073 | s | Latin Small Letter S | |
U+0074 | t | Latin Small Letter T | |
U+0075 | u | Latin Small Letter U | |
U+0076 | v | Latin Small Letter V | |
U+0077 | w | Latin Small Letter W | |
U+0078 | x | Latin Small Letter X | |
U+0079 | y | Latin Small Letter Y | |
U+007A | z | Latin Small Letter Z | |
ASCII punctuation and symbols | |||
U+007B | { | Left Curly Bracket | |
U+007C | \| | Vertical bar | |
U+007D | } | Right Curly Bracket | |
U+007E | ~ | Tilde | |
Control character | |||
U+007F | | Delete | DEL |
- A The character U+005C (\) may be displayed as a yen sign (¥) or won sign (₩) by Japanese or Korean fonts that treat Unicode text (especially UTF-8) as a legacy character set in which that code position held one of these signs instead of the backslash.[3]
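Because the substitution happens only in the font, the stored code point is unaffected. A short Python 3 sketch (standard library `unicodedata`; the sample string is hypothetical) makes the distinction visible. Note that `unicodedata.name()` reports formal Unicode character names, which can differ from the descriptions in the table: the backslash is formally REVERSE SOLIDUS.

```python
import unicodedata

# The backslash and the two signs a Japanese or Korean font may draw in its place
# are three distinct code points with distinct formal names.
for ch in ("\\", "\u00a5", "\u20a9"):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
# U+005C REVERSE SOLIDUS
# U+00A5 YEN SIGN
# U+20A9 WON SIGN

# Hypothetical sample text: whatever glyph a font shows, the byte stays 0x5C.
path = "C:\\Users"
print(path.encode("utf-8")[2])  # 92, i.e. 0x5C
```

Searching or parsing such text by code point therefore still finds the backslash, even when it is rendered as ¥ or ₩.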
Subheadings
The C0 Controls and Basic Latin block contains six subheadings.[4]
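The six subheadings partition the block's 128 code points, with "ASCII punctuation and symbols" split into four separate runs. The sketch below (Python 3, standard library only) encodes the ranges read off the character table in a hypothetical `SUBHEADINGS` dictionary, checks that together they cover U+0000..U+007F exactly once, and cross-checks them against Unicode general categories. The category check also reproduces the infobox count of 33 control characters.

```python
import unicodedata

# Hypothetical mapping of each code-chart subheading to its code point ranges,
# transcribed from the character table.
SUBHEADINGS = {
    "C0 controls": [range(0x00, 0x20)],
    "ASCII punctuation and symbols": [
        range(0x20, 0x30), range(0x3A, 0x41),
        range(0x5B, 0x61), range(0x7B, 0x7F),
    ],
    "ASCII digits": [range(0x30, 0x3A)],
    "Uppercase Latin alphabet": [range(0x41, 0x5B)],
    "Lowercase Latin alphabet": [range(0x61, 0x7B)],
    "Control character": [range(0x7F, 0x80)],
}

def code_points(name: str) -> list[int]:
    """All code points listed under one subheading."""
    return [cp for r in SUBHEADINGS[name] for cp in r]

sizes = {name: len(code_points(name)) for name in SUBHEADINGS}
print(sizes)  # sizes in order: 32, 33, 10, 26, 26, 1

# Together the six subheadings cover U+0000..U+007F exactly once.
all_cps = sorted(cp for name in SUBHEADINGS for cp in code_points(name))
print(all_cps == list(range(0x80)))  # True

# The C0 controls plus DEL are the block's 33 characters of general category Cc.
control_cps = code_points("C0 controls") + code_points("Control character")
print(len(control_cps), {unicodedata.category(chr(cp)) for cp in control_cps})  # 33 {'Cc'}
```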
C0 controls
The C0 Controls, referred to as C0 ASCII control codes in version 1.0, are inherited from ASCII and other 7-bit and 8-bit encoding schemes. The alias names for the C0 controls are taken from the ISO/IEC 6429:1992 standard.[4]
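In practice the control characters carry no character name of their own; their descriptive names exist only as formal aliases, which Unicode publishes in its NameAliases.txt data file. A minimal sketch, assuming Python 3.3 or later (where `unicodedata.lookup()` accepts name aliases), shows both behaviours. DELETE (U+007F) is included for comparison even though it is not a C0 control.

```python
import unicodedata

# Control characters have no Name property, so name() falls back to the default.
print(unicodedata.name("\n", None))  # None

# lookup() resolves the formal aliases used for the controls.
for alias in ("NULL", "CHARACTER TABULATION", "LINE FEED",
              "CARRIAGE RETURN", "ESCAPE", "DELETE"):
    ch = unicodedata.lookup(alias)
    print(f"U+{ord(ch):04X} {alias}")
# U+0000 NULL
# U+0009 CHARACTER TABULATION
# U+000A LINE FEED
# U+000D CARRIAGE RETURN
# U+001B ESCAPE
# U+007F DELETE
```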
ASCII punctuation and symbols
This subheading refers to standard punctuation characters, simple mathematical operators, and symbols like the dollar sign, percent, ampersand, underscore, and pipe.[4]
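The punctuation and symbols fill every gap between the control codes, digits and letters, in four runs. As a sketch in Python 3 (the `RUNS` list is a hypothetical transcription of the character table), rebuilding those runs yields 33 characters, and removing the space gives exactly Python's `string.punctuation` constant.

```python
import string

# Inclusive code point runs of the "ASCII punctuation and symbols" subheading.
RUNS = [(0x20, 0x2F), (0x3A, 0x40), (0x5B, 0x60), (0x7B, 0x7E)]

punct_and_symbols = "".join(
    chr(cp) for lo, hi in RUNS for cp in range(lo, hi + 1)
)

print(len(punct_and_symbols))  # 33 (space included)
# Python's string.punctuation lists the same characters, in code point order, minus the space.
print(punct_and_symbols[1:] == string.punctuation)  # True
```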
ASCII digits
The ASCII Digits subheading contains the ten standard European digits, 0 through 9.[4]
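Because the digits occupy U+0030..U+0039 in numeric order, a digit's value is simply its code point minus 0x30, the code point of "0". The following Python 3 sketch uses that offset in two hypothetical helpers, one for a single digit and one for a whole unsigned decimal string.

```python
def digit_value(ch: str) -> int:
    """Value of one ASCII digit, computed from its code point (hypothetical helper)."""
    cp = ord(ch)
    if not 0x30 <= cp <= 0x39:
        raise ValueError(f"not an ASCII digit: {ch!r}")
    return cp - 0x30

def parse_decimal(text: str) -> int:
    """Parse an unsigned decimal string of ASCII digits (hypothetical helper)."""
    value = 0
    for ch in text:
        value = value * 10 + digit_value(ch)
    return value

print(digit_value("7"))       # 7
print(parse_decimal("2013"))  # 2013
```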
Uppercase Latin alphabet
The Uppercase Latin alphabet subheading contains the standard 26-letter unaccented Latin alphabet in uppercase (majuscule) form.[4]
Lowercase Latin alphabet
The Lowercase Latin Alphabet subheading contains the standard 26-letter unaccented Latin alphabet in lowercase (minuscule) form.[4]
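The two alphabets are laid out in parallel: each lowercase letter sits exactly 0x20 code points after its uppercase counterpart (U+0041 "A" and U+0061 "a"). A minimal Python 3 sketch of ASCII-only case mapping built on that offset follows; the function names are hypothetical, and unlike `str.upper()` they deliberately leave every character outside the two alphabets untouched.

```python
def ascii_upper(text: str) -> str:
    """Uppercase a-z by subtracting 0x20; leave all other characters alone."""
    return "".join(chr(ord(c) - 0x20) if "a" <= c <= "z" else c for c in text)

def ascii_lower(text: str) -> str:
    """Lowercase A-Z by adding 0x20; leave all other characters alone."""
    return "".join(chr(ord(c) + 0x20) if "A" <= c <= "Z" else c for c in text)

print(ord("a") - ord("A"))         # 32
print(ascii_upper("Basic Latin"))  # BASIC LATIN
print(ascii_lower("U+005A"))       # u+005a
```

Restricting the mapping to Basic Latin is useful for case-insensitive protocol keywords, where locale-dependent case rules are unwanted.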
Control character
The Control Character subheading contains the "Delete" character.[4]
Compact table
C0 Controls and Basic Latin: Official Unicode Consortium code chart (PDF)
U+ | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
U+000x | NUL | SOH | STX | ETX | EOT | ENQ | ACK | BEL | BS | HT | LF | VT | FF | CR | SO | SI |
U+001x | DLE | DC1 | DC2 | DC3 | DC4 | NAK | SYN | ETB | CAN | EM | SUB | ESC | FS | GS | RS | US |
U+002x | SP | ! | " | # | $ | % | & | ' | ( | ) | * | + | , | - | . | / |
U+003x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
U+004x | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
U+005x | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \ | ] | ^ | _ |
U+006x | ` | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
U+007x | p | q | r | s | t | u | v | w | x | y | z | { | \| | } | ~ | DEL |
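The compact table is indexed by hexadecimal digits: the row gives the high digit of the code point and the column the low digit, so the cell in row r, column c holds U+00rc. The sketch below (Python 3; `C0_ABBREVIATIONS` and `cell` are hypothetical names, with the abbreviations copied from the character table) regenerates the grid from that rule.

```python
# Control abbreviations for U+0000..U+001F, as listed in the character table.
C0_ABBREVIATIONS = (
    "NUL SOH STX ETX EOT ENQ ACK BEL BS HT LF VT FF CR SO SI "
    "DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS GS RS US"
).split()

def cell(cp: int) -> str:
    """Label shown in the compact table for one code point."""
    if cp < 0x20:
        return C0_ABBREVIATIONS[cp]
    if cp == 0x20:
        return "SP"
    if cp == 0x7F:
        return "DEL"
    return chr(cp)

# Row = high hexadecimal digit, column = low hexadecimal digit.
grid = [[cell(row * 16 + col) for col in range(16)] for row in range(8)]

for row, cells in enumerate(grid):
    print(f"U+00{row:X}x | " + " | ".join(cells) + " |")
```

The printed rows match the table above, except that the vertical bar in row U+007x is emitted unescaped and would need escaping before being pasted into a pipe table.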
Emoji
The Basic Latin block contains twelve emoji: U+0023, U+002A and U+0030–U+0039.[5][6]
The block has 22 standardized variants defined to specify emoji-style (U+FE0F VS16) or text presentation (U+FE0E VS15) for the following eleven base characters: U+0023 and U+0030–U+0039.[7][8]
All of these base characters default to a text presentation.
U+ | 0023 | 0030 | 0031 | 0032 | 0033 | 0034 | 0035 | 0036 | 0037 | 0038 | 0039 |
---|---|---|---|---|---|---|---|---|---|---|---|
base codepoint | # | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
base+VS15 (text) | #︎ | 0︎ | 1︎ | 2︎ | 3︎ | 4︎ | 5︎ | 6︎ | 7︎ | 8︎ | 9︎ |
base+VS16 (emoji) | #️ | 0️ | 1️ | 2️ | 3️ | 4️ | 5️ | 6️ | 7️ | 8️ | 9️ |
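Each standardized variant is a two-code-point sequence: a base character followed by U+FE0E (VS15) or U+FE0F (VS16). A short Python 3 sketch (standard library only; the constant names are hypothetical) builds the 22 sequences for the eleven base characters listed in this section and shows their size in UTF-8, where the base stays one byte and the variation selector adds three.

```python
import unicodedata

VS15 = "\uFE0E"  # requests text presentation
VS16 = "\uFE0F"  # requests emoji presentation

# The eleven base characters: U+0023 and U+0030..U+0039.
BASES = ["#"] + [chr(cp) for cp in range(0x30, 0x3A)]

sequences = [base + vs for base in BASES for vs in (VS15, VS16)]

print(unicodedata.name(VS15), "/", unicodedata.name(VS16))
# VARIATION SELECTOR-15 / VARIATION SELECTOR-16
print(len(BASES), len(sequences))  # 11 22

# Two code points, four UTF-8 bytes (1 for "#", 3 for the selector).
print(len("#" + VS16), len(("#" + VS16).encode("utf-8")))  # 2 4
```

Whether a given sequence actually renders as text or emoji still depends on the font and platform; the selector only expresses a request.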
References
- [1] "Unicode Character Database". The Unicode Standard. Retrieved 22 March 2013.
- [2] The Unicode Standard, Version 1.0, Volume 1. Addison-Wesley Publishing Company, Inc. 1990. ISBN 0-201-56788-1.
- [3] "When is a backslash not a backslash?". Sorting It All Out.
- [4] "Unicode 6.2 code charts" (PDF). The Unicode Standard. Retrieved 1 April 2013.
- [5] "UTR #51: Unicode Emoji". Unicode Consortium. 12 November 2015.
- [6] "UCD: Emoji Data for UTR #51". Unicode Consortium. 11 November 2015.
- [7] "Unicode Character Database: Standardized Variants". The Unicode Consortium.
- [8] "Unicode Character Database: Standardized Variation Sequences". The Unicode Consortium.