Unicode character property: Difference between revisions
Revision as of 20:06, 10 June 2010
Unicode assigns character properties to each code point (a Unicode number such as U+A234).[1] Processes such as line breaking and the handling of right-to-left script direction use these properties to decide how to treat a character. Somewhat inconsistently, some character properties are also defined for code points that have no character assigned and for code points that are designated "not a character".
Each property has a status that indicates how binding it is: normative, informative, contributory, or provisional. Technically, a property value may be assigned to a whole range of code points at once.
Character property
Name
Most Unicode characters are assigned a unique Name (na).[1] The name, in English, is composed of the capital letters A-Z, the digits 0-9, - (hyphen-minus) and <space>. Some sequences are not allowed: a name may not begin or end with a space or hyphen, may not contain repeated spaces or hyphens, and may not contain a space after a hyphen. The name is guaranteed to be unique within Unicode, and can be used to identify a code point and its character. Ideographic characters, of which there are tens of thousands, are named in the pattern "CJK UNIFIED IDEOGRAPH-hhhh", for example "CJK UNIFIED IDEOGRAPH-4E00" for U+4E00. Formatting characters are named too: "NO-BREAK SPACE" (U+00A0).
Starting from Unicode version 2.0, the published name for a code point never changes. If a name is published with a misspelling, a corrected name is later assigned to the code point as a Character Name Alias (na1). An alias is also unique within the whole range of names.
Apart from these normative names, informal names can be assigned. These are usually other commonly used names for a character, given for illustration; such informal names are not guaranteed to be unique.
The following code points do not have a Name (na=""): Controls (General Category: Cc), Private use (Co), Surrogates (Cs), Noncharacters (Cn) and Reserved (Cn). They may be referenced, informally, by a generic or specific meta-name, called a Code Point Label: <control>, <control-0088>, <reserved>, <noncharacter-hhhh>, <private-use-hhhh>, <surrogate>.
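The naming rules above can be explored with Python's standard `unicodedata` module. The following is a minimal sketch, not a definitive implementation: the helper names `is_noncharacter` and `code_point_label` are hypothetical, and the choice to always append the `-hhhh` suffix to a label is an assumption of this sketch that follows the specific label forms mentioned in the text.

```python
import unicodedata


def is_noncharacter(cp: int) -> bool:
    """True for the 66 noncharacters: U+FDD0..U+FDEF and the last two code points of each plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def code_point_label(cp: int) -> str:
    """Return the Name of a code point, or a code point label if it has no Name.

    Hypothetical helper for illustration; the label format is an assumption.
    """
    ch = chr(cp)
    name = unicodedata.name(ch, "")  # "" when the code point has no Name
    if name:
        return name
    category = unicodedata.category(ch)
    if category == "Cc":
        kind = "control"
    elif category == "Cs":
        kind = "surrogate"
    elif category == "Co":
        kind = "private-use"
    elif is_noncharacter(cp):
        kind = "noncharacter"
    else:
        kind = "reserved"  # unassigned (Cn) but not a noncharacter
    return f"<{kind}-{cp:04X}>"


if __name__ == "__main__":
    for cp in (0x00A0, 0x4E00, 0x0088, 0xD800, 0xE000, 0xFFFF):
        print(f"U+{cp:04X}", code_point_label(cp))
```

Because names are unique, the lookup also works in the other direction: `unicodedata.lookup("NO-BREAK SPACE")` returns the character U+00A0.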
- Pre version 2.0
In version 2.0 of Unicode, many names were changed. From then on the rule "a name will never change" came into effect, including the strict (normative) use of alias names. Disused version 1.0 names were moved to the Alias property to provide some backward compatibility.
General Category
Each code point is assigned a value for General Category. This is one of the character properties that are also defined for unassigned code points and for code points that are designated "not a character".
General Category (Unicode Character Property)[a]
Value | Category Major, minor | Basic type[b] | Character assigned[b] | Count[c] (as of 15.1) | Remarks |
---|---|---|---|---|---|
L, Letter; LC, Cased Letter (Lu, Ll, and Lt only)[d] |
Lu | Letter, uppercase | Graphic | Character | 1,831 | |
Ll | Letter, lowercase | Graphic | Character | 2,233 | |
Lt | Letter, titlecase | Graphic | Character | 31 | Ligatures or digraphs containing an uppercase followed by a lowercase part (e.g., Dž, Lj, Nj, and Dz) |
Lm | Letter, modifier | Graphic | Character | 397 | A modifier letter |
Lo | Letter, other | Graphic | Character | 132,234 | An ideograph or a letter in a unicase alphabet |
M, Mark |
Mn | Mark, nonspacing | Graphic | Character | 1,985 | |
Mc | Mark, spacing combining | Graphic | Character | 452 | |
Me | Mark, enclosing | Graphic | Character | 13 | |
N, Number |
Nd | Number, decimal digit | Graphic | Character | 680 | All these, and only these, have Numeric Type = De[e] |
Nl | Number, letter | Graphic | Character | 236 | Numerals composed of letters or letterlike symbols (e.g., Roman numerals) |
No | Number, other | Graphic | Character | 915 | E.g., vulgar fractions, superscript and subscript digits |
P, Punctuation |
Pc | Punctuation, connector | Graphic | Character | 10 | Includes spacing underscore characters such as "_", and other spacing tie characters. Unlike other punctuation characters, these may be classified as "word" characters by regular expression libraries.[f] |
Pd | Punctuation, dash | Graphic | Character | 26 | Includes several hyphen characters |
Ps | Punctuation, open | Graphic | Character | 79 | Opening bracket characters |
Pe | Punctuation, close | Graphic | Character | 77 | Closing bracket characters |
Pi | Punctuation, initial quote | Graphic | Character | 12 | Opening quotation mark. Does not include the ASCII "neutral" quotation mark. May behave like Ps or Pe depending on usage |
Pf | Punctuation, final quote | Graphic | Character | 10 | Closing quotation mark. May behave like Ps or Pe depending on usage |
Po | Punctuation, other | Graphic | Character | 628 | |
S, Symbol |
Sm | Symbol, math | Graphic | Character | 948 | Mathematical symbols (e.g., +, −, =, ×, ÷, √, ∊, ≠). Does not include parentheses and brackets, which are in categories Ps and Pe. Also does not include !, *, -, or /, which despite frequent use as mathematical operators, are primarily considered to be "punctuation". |
Sc | Symbol, currency | Graphic | Character | 63 | Currency symbols |
Sk | Symbol, modifier | Graphic | Character | 125 | |
So | Symbol, other | Graphic | Character | 6,639 | |
Z, Separator |
Zs | Separator, space | Graphic | Character | 17 | Includes the space, but not TAB, CR, or LF, which are Cc |
Zl | Separator, line | Format | Character | 1 | Only U+2028 LINE SEPARATOR (LSEP) |
Zp | Separator, paragraph | Format | Character | 1 | Only U+2029 PARAGRAPH SEPARATOR (PSEP) |
C, Other |
Cc | Other, control | Control | Character | 65 (will never change)[e] | No name,[g] <control> |
Cf | Other, format | Format | Character | 170 | Includes the soft hyphen, joining control characters (ZWNJ and ZWJ), control characters to support bidirectional text, and language tag characters |
Cs | Other, surrogate | Surrogate | Not (only used in UTF-16) | 2,048 (will never change)[e] | No name,[g] <surrogate> |
Co | Other, private use | Private-use | Character (but no interpretation specified) | 137,468 total (will never change)[e] (6,400 in BMP, 131,068 in Planes 15–16) | No name,[g] <private-use> |
Cn | Other, not assigned | Noncharacter | Not | 66 (will not change unless the range of Unicode code points is expanded)[e] | No name,[g] <noncharacter> |
Cn | Other, not assigned | Reserved | Not | 824,652 | No name,[g] <reserved> |
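As a rough illustration of the table above, Python's standard `unicodedata.category` returns the two-letter General Category value of a character. The helper names `major_class` and `count_by_category` are hypothetical, introduced only for this sketch. Most counts depend on the Unicode version bundled with the interpreter (see `unicodedata.unidata_version`), so only the fixed counts for Cc, Cs and Co are expected to match the table regardless of version.

```python
import unicodedata
from collections import Counter

# A few characters mentioned in the table, written as escapes where not ASCII.
SAMPLES = ["A", "a", "\u01C5", "5", "_", "-", "+", "!", "$", " ", "\t", "\u00AD", "\u2028"]


def major_class(ch: str) -> str:
    """Return the major class (first letter of the General Category), e.g. 'L' or 'P'."""
    return unicodedata.category(ch)[0]


def count_by_category() -> Counter:
    """Count all code points U+0000..U+10FFFF by General Category value."""
    return Counter(unicodedata.category(chr(cp)) for cp in range(0x110000))


if __name__ == "__main__":
    for ch in SAMPLES:
        print(f"U+{ord(ch):04X} {unicodedata.category(ch)} (major class {major_class(ch)})")
    counts = count_by_category()
    print("Unicode data version:", unicodedata.unidata_version)
    print("Cc:", counts["Cc"], "Cs:", counts["Cs"], "Co:", counts["Co"])
```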
Other important general characteristics
(whitespace, dash, ideographic, alphabetic, noncharacter, deprecated, and so on)
Display-related properties
(bidirectional class, shaping, mirroring, width, and so on)
Casing
The Case property is normative in Unicode. It pertains to scripts that distinguish uppercase (also called capital or majuscule) letters from lowercase (also called small or minuscule) letters. Case differences occur in the Latin, Greek, Coptic, Cyrillic, Glagolitic, Armenian, Deseret, and archaic Georgian scripts.
(upper, lower, title, folding—both simple and full)
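The four case operations listed above can be tried with Python's built-in string methods, which apply the Unicode case mappings shipped with the interpreter. This is a small sketch only; `case_forms` is a hypothetical helper name, and using `str.title()` on a single character is assumed here as a convenient way to obtain its titlecase form.

```python
def case_forms(text: str) -> dict:
    """Return the upper, lower, title and case-folded forms of a string."""
    return {
        "upper": text.upper(),
        "lower": text.lower(),
        "title": text.title(),
        "fold": text.casefold(),  # case folding, intended for caseless comparison
    }


def same_ignoring_case(a: str, b: str) -> bool:
    """Compare two strings caselessly by folding both sides."""
    return a.casefold() == b.casefold()


if __name__ == "__main__":
    # U+01C6 is a digraph with distinct uppercase (U+01C4) and titlecase (U+01C5) forms.
    for key, value in case_forms("\u01C6").items():
        print(key, [f"U+{ord(c):04X}" for c in value])
    # U+00DF (sharp s) uppercases to two letters, so the mapping is not always one-to-one.
    print(case_forms("\u00DF")["upper"])
```

Folding both sides before comparing avoids surprises with characters whose upper and lower forms do not round-trip.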
Numeric values and types
Characters are classified with a Numeric type.[1] Numeric characters include fractions, subscripts, superscripts, Roman numerals, currency numerators, encircled numbers, and script-specific digits. Each of these has a numeric value, which can be a decimal integer (including zero and negative values) or a vulgar fraction. If a character has no such value, as is the case for most characters, its numeric type is "None".
Numeric characters fall into three groups: Decimal (De), Digit (Di) and Numeric (Nu, i.e. all others). The type is "Decimal" only if the character is a straight decimal digit. Digits that appear in a typographic context, such as superscript or encircled digits, are typed "Digit", while fractions, Roman numerals and other values that are not single decimal digits are typed "Numeric". The intended effect is that a simple parser can use the decimal numeric values without being distracted by, say, a numeric superscript or a fraction. CJK ideographs that represent a number, including those used for accounting, are typed "Numeric".
On the other hand, characters that could have a numeric value as a second meaning are still marked Numeric type "None", and have no numeric value (""). E.g. Latin letters can be used in paragraph numbering like (II.A.1.b), but the letters "I", "A" and "b" are not numeric (type "None") and have no numeric value.
Numeric Type (Unicode character property)[a][b]
Numeric type | Code | Has numeric value | Example | Remarks |
---|---|---|---|---|
Not numeric | <none> | No | | Numeric Value="NaN" |
Decimal | De | Yes | | Straight digit (decimal-radix). Corresponds both ways with General Category=Nd[a] |
Digit | Di | Yes | | Decimal, but in typographic context |
Numeric | Nu | Yes | | Numeric value, but not decimal-radix |
a. ^ "Section 4.6: Numeric Value" (PDF). The Unicode Standard. Unicode Consortium. September 2022.
b. ^ "Unicode 15.1 Derived Numeric Types". Unicode Character Database. Unicode Consortium. 2023-01-05.
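The three types in the table can be approximated with Python's standard `unicodedata` module, which offers separate lookups for decimal, digit and numeric values. The sketch below assumes that these three lookups line up with the types De, Di and Nu; `numeric_type` is a hypothetical helper, not part of the module.

```python
import unicodedata
from typing import Optional


def numeric_type(ch: str) -> Optional[str]:
    """Return 'De', 'Di' or 'Nu' for a numeric character, or None if it is not numeric.

    Assumption of this sketch: a decimal value implies De, a digit value
    without a decimal value implies Di, and any remaining numeric value implies Nu.
    """
    if unicodedata.decimal(ch, None) is not None:
        return "De"
    if unicodedata.digit(ch, None) is not None:
        return "Di"
    if unicodedata.numeric(ch, None) is not None:
        return "Nu"
    return None


if __name__ == "__main__":
    # ASCII digit, Arabic-Indic digit, superscript two, circled one,
    # vulgar fraction one half, Roman numeral eight, CJK ideograph five, Latin letters.
    for ch in ["5", "\u0663", "\u00B2", "\u2460", "\u00BD", "\u2167", "\u4E94", "I", "b"]:
        value = unicodedata.numeric(ch, None)
        print(f"U+{ord(ch):04X} type={numeric_type(ch)} value={value}")
```

A parser that only wants plain decimal digits can therefore accept a character exactly when `numeric_type` returns "De".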
Block
A block is a contiguous range of code points, identified by its first and last code point. It may contain unassigned code points. Each assigned character has a single "block" value from a list of 197 names. Unassigned code points that do not yet belong to a block have the default value "No_Block". Although this is a value of the "block" property, it is not the name of an existing block.
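Python's standard library does not expose the block property, so the lookup described above is sketched here with a small hand-written excerpt in the `first..last; name` layout used by the Unicode data file `Blocks.txt`. The excerpt `SAMPLE_BLOCKS` lists only five blocks and the helpers `parse_blocks` and `block_of` are hypothetical; with the full file the same code would cover every block.

```python
import bisect

# Illustrative excerpt only, not the complete list of blocks.
SAMPLE_BLOCKS = """\
# first..last; block name
0000..007F; Basic Latin
0080..00FF; Latin-1 Supplement
0370..03FF; Greek and Coptic
0400..04FF; Cyrillic
4E00..9FFF; CJK Unified Ideographs
"""


def parse_blocks(text: str) -> list:
    """Parse 'first..last; name' lines into a sorted list of (first, last, name)."""
    blocks = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        span, name = (part.strip() for part in line.split(";", 1))
        first, last = (int(value, 16) for value in span.split(".."))
        blocks.append((first, last, name))
    blocks.sort()
    return blocks


def block_of(cp: int, blocks: list) -> str:
    """Return the block name for a code point, or the default value 'No_Block'."""
    starts = [first for first, _, _ in blocks]
    index = bisect.bisect_right(starts, cp) - 1
    if index >= 0 and blocks[index][0] <= cp <= blocks[index][1]:
        return blocks[index][2]
    return "No_Block"  # spelled this way in the Unicode data files


if __name__ == "__main__":
    blocks = parse_blocks(SAMPLE_BLOCKS)
    for cp in (0x0041, 0x0378, 0x4E00, 0x2FE0):
        print(f"U+{cp:04X}", block_of(cp, blocks))
```

U+0378 is an unassigned code point, yet it still falls inside the Greek and Coptic block, which shows that a block may contain unassigned code points.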
Script
Each assigned character has a single value for its "Script" (sc) property, signifying the script to which it belongs. The value is a four-letter code in the range Aaaa-Zzzz, according to ISO 15924, which is mapped to a writing system. The special code Zyyy for "Common" allows a single value for a character that is used in multiple scripts, such as symbols and formatting characters. Unicode uses the code Zinh for the "Inherited" script; earlier versions used the private-use code Qaai. The code Zzzz "Unknown" is the default value, used for all code points that do not belong to a script, such as unassigned code points. Overall, a single script can be scattered over multiple blocks.
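The shape of the script codes described above can be checked mechanically. The sketch below is an illustration under stated assumptions: `describe_script_code` is a hypothetical helper, it only validates the four-letter form and recognises the three special values and the private-use range Qaaa-Qabx, and it does not verify that a code is actually registered in ISO 15924.

```python
import re

# One capital letter followed by three lowercase letters, i.e. Aaaa-Zzzz.
CODE_PATTERN = re.compile(r"[A-Z][a-z]{3}")

# Special values used by the Unicode Script property.
SPECIAL_VALUES = {"Zyyy": "Common", "Zinh": "Inherited", "Zzzz": "Unknown"}


def describe_script_code(code: str) -> str:
    """Classify a four-letter script code by form only (hypothetical helper)."""
    if not CODE_PATTERN.fullmatch(code):
        raise ValueError(f"not a four-letter script code: {code!r}")
    if code in SPECIAL_VALUES:
        return SPECIAL_VALUES[code]
    # With a fixed four-letter form, plain string comparison follows code order.
    if "Qaaa" <= code <= "Qabx":
        return "private use"
    return "ordinary code"


if __name__ == "__main__":
    for code in ("Latn", "Zyyy", "Zinh", "Zzzz", "Qaai"):
        print(code, "->", describe_script_code(code))
```

The code Qaai lies inside the private-use range, which is how it could serve for "Inherited" before a dedicated code existed.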
ISO 15924 | Script in Unicode[e] |
Code | ISO number | ISO formal name | Directionality | Unicode Alias[f] | Version | Characters | Notes | Description |
---|---|---|---|---|---|---|---|---|
Adlm | 166 | Adlam | right-to-left | Adlam | 9.0 | 88 | | Ch 19.9 |
Afak | 439 | Afaka | varies | — Not in Unicode, proposal is explored[i] |
Aghb | 239 | Caucasian Albanian | left-to-right | Caucasian Albanian | 7.0 | 53 | Ancient/historic | Ch 8.11 |
Ahom | 338 | Ahom, Tai Ahom | left-to-right | Ahom | 8.0 | 65 | Ancient/historic | Ch 15.16 |
Arab | 160 | Arabic | right-to-left | Arabic | 1.0 | 1,368 | | Ch 9.2 |
Aran | 161 | Arabic (Nastaliq variant) | mixed | — Typographic variant of Arabic (see § Arab) |
Armi | 124 | Imperial Aramaic | right-to-left | Imperial Aramaic | 5.2 | 31 | Ancient/historic | Ch 10.4 |
Armn | 230 | Armenian | left-to-right | Armenian | 1.0 | 96 | | Ch 7.6 |
Avst | 134 | Avestan | right-to-left | Avestan | 5.2 | 61 | Ancient/historic | Ch 10.7 |
Bali | 360 | Balinese | left-to-right | Balinese | 5.0 | 124 | | Ch 17.3 |
Bamu | 435 | Bamum | left-to-right | Bamum | 5.2 | 657 | | Ch 19.6 |
Bass | 259 | Bassa Vah | left-to-right | Bassa Vah | 7.0 | 36 | Ancient/historic | Ch 19.7 |
Batk | 365 | Batak | left-to-right | Batak | 6.0 | 56 | | Ch 17.6 |
Beng | 325 | Bengali (Bangla) | left-to-right | Bengali | 1.0 | 96 | | Ch 12.2 |
Bhks | 334 | Bhaiksuki | left-to-right | Bhaiksuki | 9.0 | 97 | Ancient/historic | Ch 14.3 |
Blis | 550 | Blissymbols | varies | — Not in Unicode, proposal is explored[i] |
Bopo | 285 | Bopomofo | left-to-right, right-to-left | Bopomofo | 1.0 | 77 | | Ch 18.3 |
Brah | 300 | Brahmi | left-to-right | Brahmi | 6.0 | 115 | Ancient/historic | Ch 14.1 |
Brai | 570 | Braille | left-to-right | Braille | 3.0 | 256 | | Ch 21.1 |
Bugi | 367 | Buginese | left-to-right | Buginese | 4.1 | 30 | | Ch 17.2 |
Buhd | 372 | Buhid | left-to-right | Buhid | 3.2 | 20 | | Ch 17.1 |
Cakm | 349 | Chakma | left-to-right | Chakma | 6.1 | 71 | | Ch 13.11 |
Cans | 440 | Unified Canadian Aboriginal Syllabics | left-to-right | Canadian Aboriginal | 3.0 | 726 | | Ch 20.2 |
Cari | 201 | Carian | left-to-right, right-to-left | Carian | 5.1 | 49 | Ancient/historic | Ch 8.5 |
Cham | 358 | Cham | left-to-right | Cham | 5.1 | 83 | | Ch 16.10 |
Cher | 445 | Cherokee | left-to-right | Cherokee | 3.0 | 172 | | Ch 20.1 |
Chis | 298 | Chisoi | left-to-right | — Not in Unicode, proposal is mature[ii] |
Chrs | 109 | Chorasmian | right-to-left, top-to-bottom | Chorasmian | 13.0 | 28 | Ancient/historic | Ch 10.8 |
Cirt | 291 | Cirth | varies | — Not in Unicode |
Copt | 204 | Coptic | left-to-right | Coptic | 1.0 | 137 | Ancient/historic, disunified from Greek in 4.1 | Ch 7.3 |
Cpmn | 402 | Cypro-Minoan | left-to-right | Cypro Minoan | 14.0 | 99 | Ancient/historic | Ch 8.4 |
Cprt | 403 | Cypriot syllabary | right-to-left | Cypriot | 4.0 | 55 | Ancient/historic | Ch 8.3 |
Cyrl | 220 | Cyrillic | left-to-right | Cyrillic | 1.0 | 506 | Includes typographic variant Old Church Slavonic (see § Cyrs) | Ch 7.4 |
Cyrs | 221 | Cyrillic (Old Church Slavonic variant) | varies | — Typographic variant of Cyrillic (see § Cyrl); Ancient/historic |
Deva | 315 | Devanagari (Nagari) | left-to-right | Devanagari | 1.0 | 164 | | Ch 12.1 |
Diak | 342 | Dives Akuru | left-to-right | Dives Akuru | 13.0 | 72 | Ancient/historic | Ch 15.15 |
Dogr | 328 | Dogra | left-to-right | Dogra | 11.0 | 60 | Ancient/historic | Ch 15.18 |
Dsrt | 250 | Deseret (Mormon) | left-to-right | Deseret | 3.1 | 80 | | Ch 20.4 |
Dupl | 755 | Duployan shorthand, Duployan stenography | left-to-right | Duployan | 7.0 | 143 | | Ch 21.6 |
Egyd | 070 | Egyptian demotic | mixed | — Not in Unicode |
Egyh | 060 | Egyptian hieratic | mixed | — Not in Unicode |
Egyp | 050 | Egyptian hieroglyphs | right-to-left, left-to-right | Egyptian Hieroglyphs | 5.2 | 1,110 | Ancient/historic | Ch 11.4 |
Elba | 226 | Elbasan | left-to-right | Elbasan | 7.0 | 40 | Ancient/historic | Ch 8.10 |
Elym | 128 | Elymaic | right-to-left | Elymaic | 12.0 | 23 | Ancient/historic | Ch 10.9 |
Ethi | 430 | Ethiopic (Geʻez) | left-to-right | Ethiopic | 3.0 | 523 | | Ch 19.1 |
Gara | 164 | Garay | right-to-left | — Not in Unicode, approved for version 16.0[iii] |
Geok | 241 | Khutsuri (Asomtavruli and Nuskhuri) | left-to-right | Georgian | | | Unicode groups Khutsori, Asomtavruli and Nuskhuri into 'Georgian' (see § Geok). Similarly, Mkhedruli and Mtavruli are 'Georgian' (see § Geor) | Ch 7.7 |
Geor | 240 | Georgian (Mkhedruli and Mtavruli) | left-to-right | Georgian | 1.0 | 173 | In Unicode this also includes Nuskhuri (Geok) | Ch 7.7 |
Glag | 225 | Glagolitic | left-to-right | Glagolitic | 4.1 | 134 | Ancient/historic | Ch 7.5 |
Gong | 312 | Gunjala Gondi | left-to-right | Gunjala Gondi | 11.0 | 63 | | Ch 13.15 |
Gonm | 313 | Masaram Gondi | left-to-right | Masaram Gondi | 10.0 | 75 | | Ch 13.14 |
Goth | 206 | Gothic | left-to-right | Gothic | 3.1 | 27 | Ancient/historic | Ch 8.9 |
Gran | 343 | Grantha | left-to-right | Grantha | 7.0 | 85 | Ancient/historic | Ch 15.14 |
Grek | 200 | Greek | left-to-right | Greek | 1.0 | 518 | Directionality sometimes as boustrophedon | Ch 7.2 |
Gujr | 320 | Gujarati | left-to-right | Gujarati | 1.0 | 91 | | Ch 12.4 |
Gukh | 397 | Gurung Khema | left-to-right | — Not in Unicode, approved for version 16.0[iii] |
Guru | 310 | Gurmukhi | left-to-right | Gurmukhi | 1.0 | 80 | | Ch 12.3 |
Hanb | 503 | Han with Bopomofo (alias for Han + Bopomofo) | mixed | — See § Hani, § Bopo |
Hang | 286 | Hangul (Hangŭl, Hangeul) | left-to-right, vertical right-to-left | Hangul | 1.0 | 11,739 | Hangul syllables relocated in 2.0 | Ch 18.6 |
Hani | 500 | Han (Hanzi, Kanji, Hanja) | top-to-bottom, columns right-to-left (historically) | Han | 1.0 | 99,030 | | Ch 18.1 |
Hano | 371 | Hanunoo (Hanunóo) | left-to-right, bottom-to-top | Hanunoo | 3.2 | 21 | | Ch 17.1 |
Hans | 501 | Han (Simplified variant) | varies | — Subset of Han (Hanzi, Kanji, Hanja) (see § Hani) |
Hant | 502 | Han (Traditional variant) | varies | — Subset of § Hani |
Hatr | 127 | Hatran | right-to-left | Hatran | 8.0 | 26 | Ancient/historic | Ch 10.12 |
Hebr | 125 | Hebrew | right-to-left | Hebrew | 1.0 | 134 | | Ch 9.1 |
Hira | 410 | Hiragana | vertical right-to-left, left-to-right | Hiragana | 1.0 | 381 | | Ch 18.4 |
Hluw | 080 | Anatolian Hieroglyphs (Luwian Hieroglyphs, Hittite Hieroglyphs) | left-to-right | Anatolian Hieroglyphs | 8.0 | 583 | Ancient/historic | Ch 11.6 |
Hmng | 450 | Pahawh Hmong | left-to-right | Pahawh Hmong | 7.0 | 127 | | Ch 16.11 |
Hmnp | 451 | Nyiakeng Puachue Hmong | left-to-right | Nyiakeng Puachue Hmong | 12.0 | 71 | | Ch 16.12 |
Hrkt | 412 | Japanese syllabaries (alias for Hiragana + Katakana) | vertical right-to-left, left-to-right | Katakana or Hiragana | | | See § Hira, § Kana | Ch 18.4 |
Hung | 176 | Old Hungarian (Hungarian Runic) | right-to-left | Old Hungarian | 8.0 | 108 | Ancient/historic | Ch 8.8 |
Inds | 610 | Indus (Harappan) | mixed | — Not in Unicode, proposal is explored[i] |
Ital | 210 | Old Italic (Etruscan, Oscan, etc.) | right-to-left, left-to-right | Old Italic | 3.1 | 39 | Ancient/historic | Ch 8.6 |
Jamo | 284 | Jamo (alias for Jamo subset of Hangul) | varies | — Subset of § Hang |
Java | 361 | Javanese | left-to-right | Javanese | 5.2 | 90 | | Ch 17.4 |
Jpan | 413 | Japanese (alias for Han + Hiragana + Katakana) | varies | — See § Hani, § Hira and § Kana |
Jurc | 510 | Jurchen | left-to-right | — Not in Unicode |
Kali | 357 | Kayah Li | left-to-right | Kayah Li | 5.1 | 47 | | Ch 16.9 |
Kana | 411 | Katakana | vertical right-to-left, left-to-right | Katakana | 1.0 | 321 | | Ch 18.4 |
Kawi | 368 | Kawi | left-to-right | Kawi | 15.0 | 86 | Ancient/historic | Ch 17.9 |
Khar | 305 | Kharoshthi | right-to-left | Kharoshthi | 4.1 | 68 | Ancient/historic | Ch 14.2 |
Khmr | 355 | Khmer | left-to-right | Khmer | 3.0 | 146 | | Ch 16.4 |
Khoj | 322 | Khojki | left-to-right | Khojki | 7.0 | 65 | Ancient/historic | Ch 15.7 |
Kitl | 505 | Khitan large script | left-to-right | — Not in Unicode |
Kits | 288 | Khitan small script | vertical right-to-left | Khitan Small Script | 13.0 | 471 | Ancient/historic | Ch 18.12 |
Knda | 345 | Kannada | left-to-right | Kannada | 1.0 | 91 | | Ch 12.8 |
Kore | 287 | Korean (alias for Hangul + Han) | left-to-right | — See § Hani, § Hang |
Kpel | 436 | Kpelle | left-to-right | — Not in Unicode, proposal is explored[i] |
Krai | 396 | Kirat Rai | left-to-right | — Not in Unicode, approved for version 16.0[iii] |
Kthi | 317 | Kaithi | left-to-right | Kaithi | 5.2 | 68 | Ancient/historic | Ch 15.2 |
Lana | 351 | Tai Tham (Lanna) | left-to-right | Tai Tham | 5.2 | 127 | | Ch 16.7 |
Laoo | 356 | Lao | left-to-right | Lao | 1.0 | 83 | | Ch 16.2 |
Latf | 217 | Latin (Fraktur variant) | varies | — Typographic variant of Latin (see § Latn) |
Latg | 216 | Latin (Gaelic variant) | left-to-right | — Typographic variant of Latin (see § Latn) |
Latn | 215 | Latin | left-to-right | Latin | 1.0 | 1,481 | See also: Latin script in Unicode | Ch 7.1 |
Leke | 364 | Leke | left-to-right | — Not in Unicode |
Lepc | 335 | Lepcha (Róng) | left-to-right | Lepcha | 5.1 | 74 | | Ch 13.12 |
Limb | 336 | Limbu | left-to-right | Limbu | 4.0 | 68 | | Ch 13.6 |
Lina | 400 | Linear A | left-to-right | Linear A | 7.0 | 341 | Ancient/historic | Ch 8.1 |
Linb | 401 | Linear B | left-to-right | Linear B | 4.0 | 211 | Ancient/historic | Ch 8.2 |
Lisu | 399 | Lisu (Fraser) | left-to-right | Lisu | 5.2 | 49 | | Ch 18.9 |
Loma | 437 | Loma | left-to-right | — Not in Unicode, proposal is explored[i] |
Lyci | 202 | Lycian | left-to-right | Lycian | 5.1 | 29 | Ancient/historic | Ch 8.5 |
Lydi | 116 | Lydian | right-to-left | Lydian | 5.1 | 27 | Ancient/historic | Ch 8.5 |
Mahj | 314 | Mahajani | left-to-right | Mahajani | 7.0 | 39 | Ancient/historic | Ch 15.6 |
Maka | 366 | Makasar | left-to-right | Makasar | 11.0 | 25 | Ancient/historic | Ch 17.8 |
Mand | 140 | Mandaic, Mandaean | right-to-left | Mandaic | 6.0 | 29 | | Ch 9.5 |
Mani | 139 | Manichaean | right-to-left | Manichaean | 7.0 | 51 | Ancient/historic | Ch 10.5 |
Marc | 332 | Marchen | left-to-right | Marchen | 9.0 | 68 | Ancient/historic | Ch 14.5 |
Maya | 090 | Mayan hieroglyphs | mixed | — Not in Unicode |
Medf | 265 | Medefaidrin (Oberi Okaime, Oberi Ɔkaimɛ) | left-to-right | Medefaidrin | 11.0 | 91 | | Ch 19.10 |
Mend | 438 | Mende Kikakui | right-to-left | Mende Kikakui | 7.0 | 213 | | Ch 19.8 |
Merc | 101 | Meroitic Cursive | right-to-left | Meroitic Cursive | 6.1 | 90 | Ancient/historic | Ch 11.5 |
Mero | 100 | Meroitic Hieroglyphs | right-to-left | Meroitic Hieroglyphs | 6.1 | 32 | Ancient/historic | Ch 11.5 |
Mlym | 347 | Malayalam | left-to-right | Malayalam | 1.0 | 118 | | Ch 12.9 |
Modi | 324 | Modi, Moḍī | left-to-right | Modi | 7.0 | 79 | Ancient/historic | Ch 15.12 |
Mong | 145 | Mongolian | vertical left-to-right, left-to-right | Mongolian | 3.0 | 168 | Mong includes Clear and Manchu scripts | Ch 13.5 |
Moon | 218 | Moon (Moon code, Moon script, Moon type) | mixed | — Not in Unicode, proposal is explored[i] |
Mroo | 264 | Mro, Mru | left-to-right | Mro | 7.0 | 43 | | Ch 13.8 |
Mtei | 337 | Meitei Mayek (Meithei, Meetei) | left-to-right | Meetei Mayek | 5.2 | 79 | | Ch 13.7 |
Mult | 323 | Multani | left-to-right | Multani | 8.0 | 38 | Ancient/historic | Ch 15.10 |
Mymr | 350 | Myanmar (Burmese) | left-to-right | Myanmar | 3.0 | 223 | | Ch 16.3 |
Nagm | 295 | Nag Mundari | left-to-right | Nag Mundari | 15.0 | 42 | | |
Nand | 311 | Nandinagari | left-to-right | Nandinagari | 12.0 | 65 | Ancient/historic | Ch 15.13 |
Narb | 106 | Old North Arabian (Ancient North Arabian) | right-to-left | Old North Arabian | 7.0 | 32 | Ancient/historic | Ch 10.1 |
Nbat | 159 | Nabataean | right-to-left | Nabataean | 7.0 | 40 | Ancient/historic | Ch 10.10 |
Newa | 333 | Newa, Newar, Newari, Nepāla lipi | left-to-right | Newa | 9.0 | 97 | | Ch 13.3 |
Nkdb | 085 | Naxi Dongba (na²¹ɕi³³ to³³ba²¹, Nakhi Tomba) | left-to-right | — Not in Unicode |
Nkgb | 420 | Naxi Geba (na²¹ɕi³³ gʌ²¹ba²¹, 'Na-'Khi ²Ggŏ-¹baw, Nakhi Geba) | left-to-right | — Not in Unicode, proposal is explored[i] |
Nkoo | 165 | N’Ko | right-to-left | NKo | 5.0 | 62 | | Ch 19.4 |
Nshu | 499 | Nüshu | vertical right-to-left | Nushu | 10.0 | 397 | | Ch 18.8 |
Ogam | 212 | Ogham | bottom-to-top, left-to-right | Ogham | 3.0 | 29 | Ancient/historic | Ch 8.14 |
Olck | 261 | Ol Chiki (Ol Cemet’, Ol, Santali) | left-to-right | Ol Chiki | 5.1 | 48 | | Ch 13.10 |
Onao | 296 | Ol Onal | left-to-right | — Not in Unicode, approved for version 16.0[iii] |
Orkh | 175 | Old Turkic, Orkhon Runic | right-to-left | Old Turkic | 5.2 | 73 | Ancient/historic | Ch 14.8 |
Orya | 327 | Oriya (Odia) | left-to-right | Oriya | 1.0 | 91 | | Ch 12.5 |
Osge | 219 | Osage | left-to-right | Osage | 9.0 | 72 | | Ch 20.3 |
Osma | 260 | Osmanya | left-to-right | Osmanya | 4.0 | 40 | | Ch 19.2 |
Ougr | 143 | Old Uyghur | mixed | Old Uyghur | 14.0 | 26 | Ancient/historic | Ch 14.11 |
Palm | 126 | Palmyrene | right-to-left | Palmyrene | 7.0 | 32 | Ancient/historic | Ch 10.11 |
Pauc | 263 | Pau Cin Hau | left-to-right | Pau Cin Hau | 7.0 | 57 | | Ch 16.13 |
Pcun | 015 | Proto-Cuneiform | left-to-right | — Not in Unicode |
Pelm | 016 | Proto-Elamite | left-to-right | — Not in Unicode |
Perm | 227 | Old Permic | left-to-right | Old Permic | 7.0 | 43 | Ancient/historic | Ch 8.13 |
Phag | 331 | Phags-pa | vertical left-to-right | Phags-pa | 5.0 | 56 | Ancient/historic | Ch 14.4 |
Phli | 131 | Inscriptional Pahlavi | right-to-left | Inscriptional Pahlavi | 5.2 | 27 | Ancient/historic | Ch 10.6 |
Phlp | 132 | Psalter Pahlavi | right-to-left | Psalter Pahlavi | 7.0 | 29 | Ancient/historic | Ch 10.6 |
Phlv | 133 | Book Pahlavi | mixed | — Not in Unicode |
Phnx | 115 | Phoenician | right-to-left | Phoenician | 5.0 | 29 | Ancient/historic[g] | Ch 10.3 |
Piqd | 293 | Klingon (KLI pIqaD) | left-to-right | — Rejected for inclusion in Unicode[iv][v] |
Plrd | 282 | Miao (Pollard) | left-to-right | Miao | 6.1 | 149 | | Ch 18.10 |
Prti | 130 | Inscriptional Parthian | right-to-left | Inscriptional Parthian | 5.2 | 30 | Ancient/historic | Ch 10.6 |
Psin | 103 | Proto-Sinaitic | mixed | — Not in Unicode |
Qaaa-Qabx | 900-949 | Reserved for private use (range) | | — Not in Unicode |
Ranj | 303 | Ranjana | left-to-right | — Not in Unicode |
Rjng | 363 | Rejang (Redjang, Kaganga) | left-to-right | Rejang | 5.1 | 37 | | Ch 17.5 |
Rohg | 167 | Hanifi Rohingya | right-to-left | Hanifi Rohingya | 11.0 | 50 | | Ch 16.14 |
Roro | 620 | Rongorongo | mixed | — Not in Unicode, proposal is explored[i] |
Runr | 211 | Runic | left-to-right, boustrophedon | Runic | 3.0 | 86 | Ancient/historic | Ch 8.7 |
Samr | 123 | Samaritan | right-to-left, top-to-bottom | Samaritan | 5.2 | 61 | | Ch 9.4 |
Sara | 292 | Sarati | mixed | — Not in Unicode |
Sarb | 105 | Old South Arabian | right-to-left | Old South Arabian | 5.2 | 32 | Ancient/historic | Ch 10.2 |
Saur | 344 | Saurashtra | left-to-right | Saurashtra | 5.1 | 82 | | Ch 13.13 |
Sgnw | 095 | SignWriting | vertical left-to-right | SignWriting | 8.0 | 672 | | Ch 21.7 |
Shaw | 281 | Shavian (Shaw) | left-to-right | Shavian | 4.0 | 48 | | Ch 8.15 |
Shrd | 319 | Sharada, Śāradā | left-to-right | Sharada | 6.1 | 96 | | Ch 15.3 |
Shui | 530 | Shuishu | left-to-right | — Not in Unicode |
Sidd | 302 | Siddham, Siddhaṃ, Siddhamātṛkā | left-to-right | Siddham | 7.0 | 92 | Ancient/historic | Ch 15.5 |
Sidt | 180 | Sidetic | right-to-left | — Not in Unicode, proposal is mature[ii] |
Sind | 318 | Khudawadi, Sindhi | left-to-right | Khudawadi | 7.0 | 69 | | Ch 15.9 |
Sinh | 348 | Sinhala | left-to-right | Sinhala | 3.0 | 111 | | Ch 13.2 |
Sogd | 141 | Sogdian | horizontal and vertical writing in East Asian scripts, top-to-bottom | Sogdian | 11.0 | 42 | Ancient/historic | Ch 14.10 |
Sogo | 142 | Old Sogdian | right-to-left | Old Sogdian | 11.0 | 40 | Ancient/historic | Ch 14.9 |
Sora | 398 | Sora Sompeng | left-to-right | Sora Sompeng | 6.1 | 35 | | Ch 15.17 |
Soyo | 329 | Soyombo | left-to-right | Soyombo | 10.0 | 83 | Ancient/historic | Ch 14.7 |
Sund | 362 | Sundanese | left-to-right | Sundanese | 5.1 | 72 | | Ch 17.7 |
Sunu | 274 | Sunuwar | left-to-right | — Not in Unicode, approved for version 16.0[iii] |
Sylo | 316 | Syloti Nagri | left-to-right | Syloti Nagri | 4.1 | 45 | Ancient/historic | Ch 15.1 |
Syrc | 135 | Syriac | right-to-left | Syriac | 3.0 | 88 | Includes typographic variants Estrangelo (see § Syre), Western (§ Syrj), and Eastern (§ Syrn) | Ch 9.3 |
Syre | 138 | Syriac (Estrangelo variant) | mixed | — Typographic variant of Syriac (see § Syrc) |
Syrj | 137 | Syriac (Western variant) | mixed | — Typographic variant of Syriac (see § Syrc) |
Syrn | 136 | Syriac (Eastern variant) | mixed | — Typographic variant of Syriac (see § Syrc) |
Tagb | 373 | Tagbanwa | left-to-right | Tagbanwa | 3.2 | 18 | | Ch 17.1 |
Takr | 321 | Takri, Ṭākrī, Ṭāṅkrī | left-to-right | Takri | 6.1 | 68 | | Ch 15.4 |
Tale | 353 | Tai Le | left-to-right | Tai Le | 4.0 | 35 | | Ch 16.5 |
Talu | 354 | New Tai Lue | left-to-right | New Tai Lue | 4.1 | 83 | | Ch 16.6 |
Taml | 346 | Tamil | left-to-right | Tamil | 1.0 | 123 | | Ch 12.6 |
Tang | 520 | Tangut | vertical right-to-left, left-to-right | Tangut | 9.0 | 6,914 | Ancient/historic | Ch 18.11 |
Tavt | 359 | Tai Viet | left-to-right | Tai Viet | 5.2 | 72 | | Ch 16.8 |
Tayo | 380 | Tai Yo | top-to-bottom, columns right-to-left | — Not in Unicode, proposal is mature[ii] |
Telu | 340 | Telugu | left-to-right | Telugu | 1.0 | 100 | | Ch 12.7 |
Teng | 290 | Tengwar | left-to-right | — Not in Unicode |
Tfng | 120 | Tifinagh (Berber) | left-to-right, right-to-left, top-to-bottom, bottom-to-top | Tifinagh | 4.1 | 59 | | Ch 19.3 |
Tglg | 370 | Tagalog (Baybayin, Alibata) | left-to-right | Tagalog | 3.2 | 23 | | Ch 17.1 |
Thaa | 170 | Thaana | right-to-left | Thaana | 3.0 | 50 | | Ch 13.1 |
Thai | 352 | Thai | left-to-right | Thai | 1.0 | 86 | | Ch 16.1 |
Tibt | 330 | Tibetan | left-to-right | Tibetan | 2.0 | 207 | Added in 1.0, removed in 1.1 and reintroduced in 2.0 | Ch 13.4 |
Tirh | 326 | Tirhuta | left-to-right | Tirhuta | 7.0 | 82 | | Ch 15.11 |
Tnsa | 275 | Tangsa | left-to-right | Tangsa | 14.0 | 89 | | Ch 13.18 |
Todr | 229 | Todhri | right-to-left | — Not in Unicode, approved for version 16.0[iii] |
Tols | 299 | Tolong Siki | left-to-right | — Not in Unicode, proposal is mature[ii] |
Toto | 294 | Toto | left-to-right | Toto | 14.0 | 31 | | Ch 13.17 |
Tutg | 341 | Tulu-Tigalari | left-to-right | — Not in Unicode, approved for version 16.0[iii] |
Ugar | 040 | Ugaritic | left-to-right | Ugaritic | 4.0 | 31 | Ancient/historic | Ch 11.2 |
Vaii | 470 | Vai | left-to-right | Vai | 5.1 | 300 | | Ch 19.5 |
Visp | 280 | Visible Speech | left-to-right | — Not in Unicode |
Vith | 228 | Vithkuqi | left-to-right | Vithkuqi | 14.0 | 70 | Ancient/historic | Ch 8.12 |
Wara | 262 | Warang Citi (Varang Kshiti) | left-to-right | Warang Citi | 7.0 | 84 | | Ch 13.9 |
Wcho | 283 | Wancho | left-to-right | Wancho | 12.0 | 59 | | Ch 13.16 |
Wole | 480 | Woleai | mixed | — Not in Unicode, proposal is explored[i] |
Xpeo | 030 | Old Persian | left-to-right | Old Persian | 4.1 | 50 | Ancient/historic | Ch 11.3 |
Xsux | 020 | Cuneiform, Sumero-Akkadian | left-to-right | Cuneiform | 5.0 | 1,234 | Ancient/historic | Ch 11.1 |
Yezi | 192 | Yezidi | right-to-left | Yezidi | 13.0 | 47 | Ancient/historic | Ch 9.6 |
Yiii | 460 | Yi | left-to-right | Yi | 3.0 | 1,220 | | Ch 18.7 |
Zanb | 339 | Zanabazar Square (Zanabazarin Dörböljin Useg, Xewtee Dörböljin Bicig, Horizontal Square Script) | left-to-right | Zanabazar Square | 10.0 | 72 | Ancient/historic | Ch 14.6 |
Zinh | 994 | Code for inherited script | | Inherited | | 657 | | |
Zmth | 995 | Mathematical notation | | — Not a 'script' in Unicode |
Zsym | 996 | Symbols | | — Not a 'script' in Unicode |
Zsye | 993 | Symbols (emoji variant) | | — Not a 'script' in Unicode |
Zxxx | 997 | Code for unwritten documents | | — Not a 'script' in Unicode |
Zyyy | 998 | Code for undetermined script | | Common | | 8,306 | | |
Zzzz | 999 | Code for uncoded script | | Unknown | | 964,234 | In Unicode: All other code points | |
Notes
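References
2. ^ "Pre version 4". http://www.unicode.org/versions/components-pre4.html
3. ^ "Versions 4.0 and later". http://www.unicode.org/versions/enumeratedversions.html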
Normalization properties
(decompositions, decomposition type, canonical combining class, composition exclusions, and so on)
Age
"Age" is the version of the Standard in which the code point was first designated. The version number is shortened to the form major.minor, even though more detailed version numbers are in use: versions 4.0.0 and 4.0.1 both have the Age 4.0. Given the releases, Age can be one of: 1.0, 1.1, 2.0, 2.1, 3.0, 3.1, 3.2, 4.0, 4.1, 5.0, 5.1 and 5.2.[2][3]
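The shortening to major.minor can be sketched in a few lines. Python's standard library does not expose the Age of a code point, so this example only shows the version arithmetic; `age_label` is a hypothetical helper name.

```python
import unicodedata


def age_label(version: str) -> str:
    """Shorten a full version string such as '4.0.1' to the Age form '4.0'."""
    major, minor, *_ = version.split(".")
    return f"{int(major)}.{int(minor)}"


if __name__ == "__main__":
    for version in ("4.0.0", "4.0.1", "5.2.0"):
        print(version, "->", age_label(version))
    # The Unicode version bundled with this interpreter, shortened the same way.
    print(age_label(unicodedata.unidata_version))
```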
Boundaries
(grapheme cluster, word, line, and sentence)