Unicode character property: Difference between revisions

From Wikipedia, the free encyclopedia
→‎Age: desciption + refs

Revision as of 20:06, 10 June 2010

Unicode assigns character properties to each code point (a Unicode number such as U+A234).[1] Processes that handle text, such as line breaking or determining right-to-left script direction, can use these properties. Somewhat inconsistently, some character properties are also defined for code points that have no character assigned and for code points designated "not a character".

Properties have levels of forcefulness: normative, informative, contributory, or provisional. Technically, a property may be assigned to a whole range of code points at once by specifying that range.
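
As a rough illustration of range-based assignment, the sketch below parses lines in the style used by Unicode Character Database data files, where a single code point or a range written `first..last` is followed by a semicolon and a property value. The function name `parse_property_lines` and the two sample lines are illustrative assumptions, not an excerpt from a specific file release.

```python
from typing import Iterator, Tuple


def parse_property_lines(text: str) -> Iterator[Tuple[int, int, str]]:
    """Yield (first, last, value) for each data line; '#' starts a comment."""
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue  # blank line or comment-only line
        code_points, value = (field.strip() for field in line.split(";", 1))
        first, _, last = code_points.partition("..")
        # A single code point is treated as a range of length one.
        yield int(first, 16), int(last or first, 16), value


# Hypothetical sample in the "range ; value # comment" layout.
SAMPLE = """\
# General_Category=Uppercase_Letter (illustrative excerpt)
0041..005A    ; Lu #  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
00AA          ; Lo #       FEMININE ORDINAL INDICATOR
"""
```

Calling `list(parse_property_lines(SAMPLE))` gives one tuple per data line, so a 26-character range costs a single entry rather than 26.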

Character property

Name

Most Unicode characters are assigned a unique Name (na).[1] The name, in English, is composed of the capital letters A-Z, the digits 0-9, - (hyphen-minus) and <space>. Some sequences are not allowed: a name may not begin or end with a space or hyphen, may not contain repeated spaces or hyphens, and may not contain a space after a hyphen. The name is guaranteed to be unique within Unicode, and can be used to identify a code point and its character. Ideographic characters, of which there are tens of thousands, are named in the pattern "CJK UNIFIED IDEOGRAPH-hhhh", where hhhh is the code point in hexadecimal; for example, U+4E00 is named "CJK UNIFIED IDEOGRAPH-4E00". Formatting characters are named too: "NO-BREAK SPACE" (U+00A0).
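
The naming rules above can be sketched as a small check. This is a minimal illustration of the rules as stated in this section, not the Standard's full name syntax; the helper `follows_name_rules` is a hypothetical name. Python's standard `unicodedata` module is used to fetch real names for comparison.

```python
import re
import unicodedata

# Only capital letters, digits, space and hyphen-minus may appear.
_ALLOWED = re.compile(r"[A-Z0-9 -]+")


def follows_name_rules(name: str) -> bool:
    """Check a candidate name against the rules described in this section."""
    if not _ALLOWED.fullmatch(name):
        return False
    if name[0] in " -" or name[-1] in " -":
        return False  # no leading or trailing space or hyphen
    # No repeated spaces, repeated hyphens, or space after a hyphen.
    return not any(bad in name for bad in ("  ", "--", "- "))


def name_of(code_point: int) -> str:
    """Return the Name of a code point, or '' if it has none."""
    return unicodedata.name(chr(code_point), "")
```

For example, `name_of(0x4E00)` returns the ideograph name following the pattern above, and `unicodedata.lookup("NO-BREAK SPACE")` maps the name back to U+00A0.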

Starting from Unicode version 2.0, the published name for a code point will never change. If a published name contains a misspelling, a corrected name is later assigned to the code point as a Character Name Alias. Within the whole range of names, an alias is unique too.
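
A well-known case is U+FE18, whose published name contains the misspelling "BRAKCET". The sketch below assumes Python 3.3 or later, where `unicodedata.lookup` also accepts name aliases; the constant names are illustrative.

```python
import unicodedata

# The published Name keeps the original misspelling forever.
PUBLISHED = "PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRAKCET"
# The alias carries the corrected spelling.
CORRECTED = "PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET"


def resolve(name: str) -> int:
    """Map a name or alias to its code point (raises KeyError if unknown)."""
    return ord(unicodedata.lookup(name))
```

Both `resolve(PUBLISHED)` and `resolve(CORRECTED)` lead to the same code point, while `unicodedata.name` keeps reporting the published spelling.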

Apart from these normative names, informal names can be assigned. These are usually other commonly used names for a character, used for illustration, but these informal names are not guaranteed to be unique.

The following code points do not have a Name (na=""): Controls (General Category: Cc), Private use (Co), Surrogate (Cs), Noncharacters (Cn) and Reserved (Cn). They may be referenced, informally, by a generic or specific meta-name, called a Code Point Label: <control>, <control-0088>, <reserved>, <noncharacter-hhhh>, <private-use-hhhh>, <surrogate>.
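
A minimal sketch of building such labels with Python's `unicodedata` module follows. The function names are hypothetical, and the label spellings (`control`, `private-use`, `surrogate`, `noncharacter`, `reserved`, each followed by the hexadecimal code point) are assumed from the Standard's code point label table. Results for unassigned code points depend on the Unicode version bundled with the Python build.

```python
import unicodedata

_LABEL_BY_CATEGORY = {"Cc": "control", "Co": "private-use", "Cs": "surrogate"}


def is_noncharacter(cp: int) -> bool:
    """The 66 noncharacters: U+FDD0..U+FDEF and the last two code points of each plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def code_point_label(cp: int) -> str:
    """Return the Name if there is one, otherwise a label such as <control-0088>."""
    ch = chr(cp)
    name = unicodedata.name(ch, "")
    if name:
        return name
    category = unicodedata.category(ch)
    if category == "Cn":
        kind = "noncharacter" if is_noncharacter(cp) else "reserved"
    else:
        kind = _LABEL_BY_CATEGORY.get(category)
        if kind is None:
            # Assigned character whose name this Python build does not expose.
            return ""
    return f"<{kind}-{cp:04X}>"
```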

Pre version 2.0

In version 2.0 of Unicode, many names were changed. From then on the rule "a name will never change" came into effect, including the strict (normative) use of alias names. Disused version 1.0 names were moved to a separate property, Unicode 1 Name (na1), to provide some backward compatibility.

General Category

Each code point is assigned a value for General Category. This is one of the character properties that are also defined for unassigned code points and for code points designated "not a character".

General Category (Unicode Character Property)[a]
Value Category (Major, minor) Basic type[b] Character assigned[b] Count[c] (as of 15.1) Remarks
L, Letter; LC, Cased Letter (Lu, Ll, and Lt only)[d]
Lu Letter, uppercase Graphic Character 1,831
Ll Letter, lowercase Graphic Character 2,233
Lt Letter, titlecase Graphic Character 31 Ligatures or digraphs containing an uppercase followed by a lowercase part (e.g., Dž, Lj, Nj, and Dz)
Lm Letter, modifier Graphic Character 397 A modifier letter
Lo Letter, other Graphic Character 132,234 An ideograph or a letter in a unicase alphabet
M, Mark
Mn Mark, nonspacing Graphic Character 1,985
Mc Mark, spacing combining Graphic Character 452
Me Mark, enclosing Graphic Character 13
N, Number
Nd Number, decimal digit Graphic Character 680 All these, and only these, have Numeric Type = De[e]
Nl Number, letter Graphic Character 236 Numerals composed of letters or letterlike symbols (e.g., Roman numerals)
No Number, other Graphic Character 915 E.g., vulgar fractions, superscript and subscript digits
P, Punctuation
Pc Punctuation, connector Graphic Character 10 Includes spacing underscore characters such as "_", and other spacing tie characters. Unlike other punctuation characters, these may be classified as "word" characters by regular expression libraries.[f]
Pd Punctuation, dash Graphic Character 26 Includes several hyphen characters
Ps Punctuation, open Graphic Character 79 Opening bracket characters
Pe Punctuation, close Graphic Character 77 Closing bracket characters
Pi Punctuation, initial quote Graphic Character 12 Opening quotation mark. Does not include the ASCII "neutral" quotation mark. May behave like Ps or Pe depending on usage
Pf Punctuation, final quote Graphic Character 10 Closing quotation mark. May behave like Ps or Pe depending on usage
Po Punctuation, other Graphic Character 628
S, Symbol
Sm Symbol, math Graphic Character 948 Mathematical symbols (e.g., +, =, ×, ÷). Does not include parentheses and brackets, which are in categories Ps and Pe. Also does not include !, *, -, or /, which despite frequent use as mathematical operators, are primarily considered to be "punctuation".
Sc Symbol, currency Graphic Character 63 Currency symbols
Sk Symbol, modifier Graphic Character 125
So Symbol, other Graphic Character 6,639
Z, Separator
Zs Separator, space Graphic Character 17 Includes the space, but not TAB, CR, or LF, which are Cc
Zl Separator, line Format Character 1 Only U+2028 LINE SEPARATOR (LSEP)
Zp Separator, paragraph Format Character 1 Only U+2029 PARAGRAPH SEPARATOR (PSEP)
C, Other
Cc Other, control Control Character 65 (will never change)[e] No name,[g] <control>
Cf Other, format Format Character 170 Includes the soft hyphen, joining control characters (ZWNJ and ZWJ), control characters to support bidirectional text, and language tag characters
Cs Other, surrogate Surrogate Not (only used in UTF-16) 2,048 (will never change)[e] No name,[g] <surrogate>
Co Other, private use Private-use Character (but no interpretation specified) 137,468 total (will never change)[e] (6,400 in BMP, 131,068 in Planes 15–16) No name,[g] <private-use>
Cn Other, not assigned Noncharacter Not 66 (will not change unless the range of Unicode code points is expanded)[e] No name,[g] <noncharacter>
Reserved Not 824,652 No name,[g] <reserved>
  1. ^ "Table 4-4: General Category" (PDF). The Unicode Standard. Unicode Consortium. September 2022.
  2. ^ a b "Table 2-3: Types of code points" (PDF). The Unicode Standard. Unicode Consortium. September 2022.
  3. ^ "DerivedGeneralCategory.txt". The Unicode Consortium. 2022-04-26.
  4. ^ "5.7.1 General Category Values". UTR #44: Unicode Character Database. Unicode Consortium. 2020-03-04.
  5. ^ a b c d e "Property Value Stability". Unicode Character Encoding Stability Policies. Stability policy: some gc groups will never change. gc=Nd corresponds with Numeric Type=De (decimal).
  6. ^ "Annex C: Compatibility Properties (§ word)". Unicode Regular Expressions. Version 23. Unicode Consortium. 2022-02-08. Unicode Technical Standard #18.
  7. ^ a b c d e "Table 4-9: Construction of Code Point Labels" (PDF). The Unicode Standard. Unicode Consortium. September 2022. A Code Point Label may be used to identify a nameless code point. E.g. <control-hhhh>, <control-0088>. The Name remains blank, which can prevent inadvertently replacing, in documentation, a Control Name with a true Control code. Unicode also uses <not a character> for <noncharacter>.
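
The two-letter values in the table can be read directly with Python's standard `unicodedata.category`. The sketch below is only an illustration; the helper names `major_class` and `category_histogram` are assumptions introduced here. The first letter of each value gives the major class.

```python
import unicodedata
from collections import Counter

# Major classes, keyed by the first letter of the General Category value.
MAJOR = {
    "L": "Letter", "M": "Mark", "N": "Number", "P": "Punctuation",
    "S": "Symbol", "Z": "Separator", "C": "Other",
}


def major_class(ch: str) -> str:
    """Return the major class name for a single character."""
    return MAJOR[unicodedata.category(ch)[0]]


def category_histogram(text: str) -> Counter:
    """Count how often each General Category value occurs in a string."""
    return Counter(unicodedata.category(ch) for ch in text)
```

This also shows some of the distinctions noted in the Remarks column: `_` is Pc rather than a symbol, `-` is Pd while `+` is Sm, and TAB is Cc while the space is Zs.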

Other important general characteristics

(whitespace, dash, ideographic, alphabetic, noncharacter, deprecated, and so on)

(bidirectional class, shaping, mirroring, width, and so on)

Casing

The Case value is normative in Unicode. It pertains to those scripts that distinguish uppercase (also called capital or majuscule) from lowercase (also called small or minuscule) letters. Case difference occurs in the scripts Latin, Greek, Coptic, Cyrillic, Glagolitic, Armenian, Deseret, and archaic Georgian.

(upper, lower, title, folding—both simple and full)
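
To make the upper, lower, title and folding operations concrete, here is a small sketch using Python's built-in string methods, which apply Unicode case mappings. The helper names are illustrative assumptions. The digraph letter dž is a convenient example because it has three distinct case forms.

```python
def case_forms(ch: str) -> dict:
    """Return the upper, lower, title and case-folded forms of a string."""
    return {
        "upper": ch.upper(),
        "lower": ch.lower(),
        "title": ch.title(),
        "fold": ch.casefold(),
    }


def caseless_equal(a: str, b: str) -> bool:
    """Compare two strings while ignoring case, using case folding."""
    return a.casefold() == b.casefold()


# U+01C6 (small dz with caron) has separate uppercase and titlecase partners.
DZ_FORMS = case_forms("\u01c6")
```

Here `DZ_FORMS["upper"]` is U+01C4 and `DZ_FORMS["title"]` is U+01C5. A mapping may also change the length of a string: the German sharp s uppercases to "SS", which is why `caseless_equal("Straße", "STRASSE")` holds.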

Numeric values and types

Characters are classified with a Numeric Type.[1] Numeric characters include fractions, subscripts, superscripts, Roman numerals, currency numerators, encircled numbers, and script-specific digits. Each of these has a numeric value, which can be a decimal integer (including zero and negative values) or a vulgar fraction. If a character has no such value, as is the case for most characters, the numeric type is "None".

The numeric characters are separated into three groups: Decimal (De), Digit (Di) and Numeric (Nu, i.e. all other). A character is "Decimal" if it is a straight decimal digit. Fractions, encircled numbers, superscripts and the like are excluded from this group and end up with type "Digit" or "Numeric". The intended effect is that even a simple parser can use the decimal numeric values without being distracted by, say, a numeric superscript or a fraction. Some 41 CJK ideographs that represent a number, including those used for accounting, are typed "Numeric".

On the other hand, characters that could have a numeric value as a second meaning are still marked Numeric type "None", and have no numeric value (""). E.g. Latin letters can be used in paragraph numbering like (II.A.1.b), but the letters "I", "A" and "b" are not numeric (type "None") and have no numeric value.

Numeric Type[a][b] (Unicode character property)
Numeric type Code Has numeric value Example Remarks
Not numeric <none> No
  • A
  • X (Latin)
  • !
  • Д
  • μ
Numeric Value="NaN"
Decimal De Yes
  • 0
  • 1
  • 9
  • ६ (Devanagari 6)
  • ೬ (Kannada 6)
  • 𝟨 (Mathematical, styled sans serif)
Straight digit (decimal-radix). Corresponds both ways with General Category=Nd[a]
Digit Di Yes
  • ¹ (superscript)
  • (a digit with full stop)
Decimal, but in typographic context
Numeric Nu Yes
  • ¾
  • ௰ (Tamil number ten)
  • (a Roman numeral)
  • 六 (Han number 6)
Numeric value, but not decimal-radix
a. ^ "Section 4.6: Numeric Value" (PDF). The Unicode Standard. Unicode Consortium. September 2022.
b. ^ "Unicode 15.1 Derived Numeric Types". Unicode Character Database. Unicode Consortium. 2023-01-05.
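
The three numeric types can be approximated with Python's `unicodedata` module, which exposes separate `decimal`, `digit` and `numeric` lookups. The function `numeric_type` below is a hypothetical helper that infers the type from which lookup succeeds; it is a sketch, not a reading of the derived data file itself.

```python
import unicodedata
from typing import Optional


def numeric_type(ch: str) -> Optional[str]:
    """Return 'De', 'Di', 'Nu', or None for a single character."""
    if unicodedata.decimal(ch, None) is not None:
        return "De"  # straight decimal digit
    if unicodedata.digit(ch, None) is not None:
        return "Di"  # digit in a typographic context, e.g. superscript
    if unicodedata.numeric(ch, None) is not None:
        return "Nu"  # any other character with a numeric value
    return None


def numeric_value(ch: str) -> Optional[float]:
    """Return the numeric value as a float, or None if there is none."""
    return unicodedata.numeric(ch, None)
```

A simple parser that only wants plain digits can keep characters where `numeric_type(ch) == "De"` and ignore superscripts and fractions.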

Block

A block is a contiguous range of code points, identified by its first and last code point. It may contain unassigned code points. Each assigned character has a single "block" value from the list of 197 names. Unassigned code points that have no block name yet have the default value "No_Block". Although this is a value of the property "block", it is not an existing block name.
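
Because blocks are contiguous and do not overlap, a block lookup can be sketched as a binary search over sorted ranges. Python's standard library does not expose the block property, so the table below is a tiny hand-written excerpt used purely for illustration; a real implementation would load the full list of blocks. The names `BLOCKS` and `block_of` are assumptions.

```python
from bisect import bisect_right

# Illustrative excerpt only: (first, last, block name), sorted by first.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x0400, 0x04FF, "Cyrillic"),
]
_STARTS = [first for first, _, _ in BLOCKS]


def block_of(cp: int) -> str:
    """Return the block name for a code point, or the default 'No_Block'."""
    i = bisect_right(_STARTS, cp) - 1  # last block starting at or before cp
    if i >= 0:
        first, last, name = BLOCKS[i]
        if cp <= last:
            return name
    return "No_Block"
```

Note that `block_of(0x0378)` still reports "Greek and Coptic": the code point is unassigned, but it lies inside the block's range. With this incomplete excerpt, any code point outside the four listed ranges falls back to "No_Block".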

Script

Each assigned character has a single value for its "Script" (sc) property, signifying the script to which it belongs. The value is a four-letter code in the range Aaaa-Zzzz, according to ISO 15924, which is mapped to a writing system. The special code Zyyy for "Common" allows a single value for a character that is used in multiple scripts. Unicode uses the private-use code Qaai for the "Inherited" script. The code Zzzz "Unknown" is the default value, used for code points that do not belong to a script. Overall, a single script can be scattered over multiple blocks.


ISO 15924 Script in Unicode[e]
Code ISO number ISO formal name Directionality Unicode Alias[f] Version Characters Notes Description
Adlm 166 Adlam right-to-left Adlam 9.0 88 Ch 19.9
Afak 439 Afaka varies — Not in Unicode, proposal is explored[i]
Aghb 239 Caucasian Albanian left-to-right Caucasian Albanian 7.0 53 Ancient/historic Ch 8.11
Ahom 338 Ahom, Tai Ahom left-to-right Ahom 8.0 65 Ancient/historic Ch 15.16
Arab 160 Arabic right-to-left Arabic 1.0 1,368 Ch 9.2
Aran 161 Arabic (Nastaliq variant) mixed — Typographic variant of Arabic (see § Arab)
Armi 124 Imperial Aramaic right-to-left Imperial Aramaic 5.2 31 Ancient/historic Ch 10.4
Armn 230 Armenian left-to-right Armenian 1.0 96 Ch 7.6
Avst 134 Avestan right-to-left Avestan 5.2 61 Ancient/historic Ch 10.7
Bali 360 Balinese left-to-right Balinese 5.0 124 Ch 17.3
Bamu 435 Bamum left-to-right Bamum 5.2 657 Ch 19.6
Bass 259 Bassa Vah left-to-right Bassa Vah 7.0 36 Ancient/historic Ch 19.7
Batk 365 Batak left-to-right Batak 6.0 56 Ch 17.6
Beng 325 Bengali (Bangla) left-to-right Bengali 1.0 96 Ch 12.2
Bhks 334 Bhaiksuki left-to-right Bhaiksuki 9.0 97 Ancient/historic Ch 14.3
Blis 550 Blissymbols varies — Not in Unicode, proposal is explored[i]
Bopo 285 Bopomofo left-to-right, right-to-left Bopomofo 1.0 77 Ch 18.3
Brah 300 Brahmi left-to-right Brahmi 6.0 115 Ancient/historic Ch 14.1
Brai 570 Braille left-to-right Braille 3.0 256 Ch 21.1
Bugi 367 Buginese left-to-right Buginese 4.1 30 Ch 17.2
Buhd 372 Buhid left-to-right Buhid 3.2 20 Ch 17.1
Cakm 349 Chakma left-to-right Chakma 6.1 71 Ch 13.11
Cans 440 Unified Canadian Aboriginal Syllabics left-to-right Canadian Aboriginal 3.0 726 Ch 20.2
Cari 201 Carian left-to-right, right-to-left Carian 5.1 49 Ancient/historic Ch 8.5
Cham 358 Cham left-to-right Cham 5.1 83 Ch 16.10
Cher 445 Cherokee left-to-right Cherokee 3.0 172 Ch 20.1
Chis 298 Chisoi left-to-right — Not in Unicode, proposal is mature[ii]
Chrs 109 Chorasmian right-to-left, top-to-bottom Chorasmian 13.0 28 Ancient/historic Ch 10.8
Cirt 291 Cirth varies — Not in Unicode
Copt 204 Coptic left-to-right Coptic 1.0 137 Ancient/historic, disunified from Greek in 4.1 Ch 7.3
Cpmn 402 Cypro-Minoan left-to-right Cypro Minoan 14.0 99 Ancient/historic Ch 8.4
Cprt 403 Cypriot syllabary right-to-left Cypriot 4.0 55 Ancient/historic Ch 8.3
Cyrl 220 Cyrillic left-to-right Cyrillic 1.0 506 Includes typographic variant Old Church Slavonic (see § Cyrs) Ch 7.4
Cyrs 221 Cyrillic (Old Church Slavonic variant) varies — Typographic variant of Cyrillic (see § Cyrl); Ancient/historic
Deva 315 Devanagari (Nagari) left-to-right Devanagari 1.0 164 Ch 12.1
Diak 342 Dives Akuru left-to-right Dives Akuru 13.0 72 Ancient/historic Ch 15.15
Dogr 328 Dogra left-to-right Dogra 11.0 60 Ancient/historic Ch 15.18
Dsrt 250 Deseret (Mormon) left-to-right Deseret 3.1 80 Ch 20.4
Dupl 755 Duployan shorthand, Duployan stenography left-to-right Duployan 7.0 143 Ch 21.6
Egyd 070 Egyptian demotic mixed — Not in Unicode
Egyh 060 Egyptian hieratic mixed — Not in Unicode
Egyp 050 Egyptian hieroglyphs right-to-left, left-to-right Egyptian Hieroglyphs 5.2 1,110 Ancient/historic Ch 11.4
Elba 226 Elbasan left-to-right Elbasan 7.0 40 Ancient/historic Ch 8.10
Elym 128 Elymaic right-to-left Elymaic 12.0 23 Ancient/historic Ch 10.9
Ethi 430 Ethiopic (Geʻez) left-to-right Ethiopic 3.0 523 Ch 19.1
Gara 164 Garay right-to-left — Not in Unicode, approved for version 16.0[iii]
Geok 241 Khutsuri (Asomtavruli and Nuskhuri) left-to-right Georgian Unicode groups Khutsori, Asomtavruli and Nuskhuri into 'Georgian' (see § Geok). Similarly, Mkhedruli and Mtavruli are 'Georgian' (see § Geor) Ch 7.7
Geor 240 Georgian (Mkhedruli and Mtavruli) left-to-right Georgian 1.0 173 In Unicode this also includes Nuskhuri (Geok) Ch 7.7
Glag 225 Glagolitic left-to-right Glagolitic 4.1 134 Ancient/historic Ch 7.5
Gong 312 Gunjala Gondi left-to-right Gunjala Gondi 11.0 63 Ch 13.15
Gonm 313 Masaram Gondi left-to-right Masaram Gondi 10.0 75 Ch 13.14
Goth 206 Gothic left-to-right Gothic 3.1 27 Ancient/historic Ch 8.9
Gran 343 Grantha left-to-right Grantha 7.0 85 Ancient/historic Ch 15.14
Grek 200 Greek left-to-right Greek 1.0 518 Directionality sometimes as boustrophedon Ch 7.2
Gujr 320 Gujarati left-to-right Gujarati 1.0 91 Ch 12.4
Gukh 397 Gurung Khema left-to-right — Not in Unicode, approved for version 16.0[iii]
Guru 310 Gurmukhi left-to-right Gurmukhi 1.0 80 Ch 12.3
Hanb 503 Han with Bopomofo (alias for Han + Bopomofo) mixed — See § Hani, § Bopo
Hang 286 Hangul (Hangŭl, Hangeul) left-to-right, vertical right-to-left Hangul 1.0 11,739 Hangul syllables relocated in 2.0 Ch 18.6
Hani 500 Han (Hanzi, Kanji, Hanja) top-to-bottom, columns right-to-left (historically) Han 1.0 99,030 Ch 18.1
Hano 371 Hanunoo (Hanunóo) left-to-right, bottom-to-top Hanunoo 3.2 21 Ch 17.1
Hans 501 Han (Simplified variant) varies — Subset of Han (Hanzi, Kanji, Hanja) (see § Hani)
Hant 502 Han (Traditional variant) varies — Subset of § Hani
Hatr 127 Hatran right-to-left Hatran 8.0 26 Ancient/historic Ch 10.12
Hebr 125 Hebrew right-to-left Hebrew 1.0 134 Ch 9.1
Hira 410 Hiragana vertical right-to-left, left-to-right Hiragana 1.0 381 Ch 18.4
Hluw 080 Anatolian Hieroglyphs (Luwian Hieroglyphs, Hittite Hieroglyphs) left-to-right Anatolian Hieroglyphs 8.0 583 Ancient/historic Ch 11.6
Hmng 450 Pahawh Hmong left-to-right Pahawh Hmong 7.0 127 Ch 16.11
Hmnp 451 Nyiakeng Puachue Hmong left-to-right Nyiakeng Puachue Hmong 12.0 71 Ch 16.12
Hrkt 412 Japanese syllabaries (alias for Hiragana + Katakana) vertical right-to-left, left-to-right Katakana or Hiragana See § Hira, § Kana Ch 18.4
Hung 176 Old Hungarian (Hungarian Runic) right-to-left Old Hungarian 8.0 108 Ancient/historic Ch 8.8
Inds 610 Indus (Harappan) mixed — Not in Unicode, proposal is explored[i]
Ital 210 Old Italic (Etruscan, Oscan, etc.) right-to-left, left-to-right Old Italic 3.1 39 Ancient/historic Ch 8.6
Jamo 284 Jamo (alias for Jamo subset of Hangul) varies — Subset of § Hang
Java 361 Javanese left-to-right Javanese 5.2 90 Ch 17.4
Jpan 413 Japanese (alias for Han + Hiragana + Katakana) varies — See § Hani, § Hira and § Kana
Jurc 510 Jurchen left-to-right — Not in Unicode
Kali 357 Kayah Li left-to-right Kayah Li 5.1 47 Ch 16.9
Kana 411 Katakana vertical right-to-left, left-to-right Katakana 1.0 321 Ch 18.4
Kawi 368 Kawi left-to-right Kawi 15.0 86 Ancient/historic Ch 17.9
Khar 305 Kharoshthi right-to-left Kharoshthi 4.1 68 Ancient/historic Ch 14.2
Khmr 355 Khmer left-to-right Khmer 3.0 146 Ch 16.4
Khoj 322 Khojki left-to-right Khojki 7.0 65 Ancient/historic Ch 15.7
Kitl 505 Khitan large script left-to-right — Not in Unicode
Kits 288 Khitan small script vertical right-to-left Khitan Small Script 13.0 471 Ancient/historic Ch 18.12
Knda 345 Kannada left-to-right Kannada 1.0 91 Ch 12.8
Kore 287 Korean (alias for Hangul + Han) left-to-right — See § Hani, § Hang
Kpel 436 Kpelle left-to-right — Not in Unicode, proposal is explored[i]
Krai 396 Kirat Rai left-to-right — Not in Unicode, approved for version 16.0[iii]
Kthi 317 Kaithi left-to-right Kaithi 5.2 68 Ancient/historic Ch 15.2
Lana 351 Tai Tham (Lanna) left-to-right Tai Tham 5.2 127 Ch 16.7
Laoo 356 Lao left-to-right Lao 1.0 83 Ch 16.2
Latf 217 Latin (Fraktur variant) varies — Typographic variant of Latin (see § Latn)
Latg 216 Latin (Gaelic variant) left-to-right — Typographic variant of Latin (see § Latn)
Latn 215 Latin left-to-right Latin 1.0 1,481 See also: Latin script in Unicode Ch 7.1
Leke 364 Leke left-to-right — Not in Unicode
Lepc 335 Lepcha (Róng) left-to-right Lepcha 5.1 74 Ch 13.12
Limb 336 Limbu left-to-right Limbu 4.0 68 Ch 13.6
Lina 400 Linear A left-to-right Linear A 7.0 341 Ancient/historic Ch 8.1
Linb 401 Linear B left-to-right Linear B 4.0 211 Ancient/historic Ch 8.2
Lisu 399 Lisu (Fraser) left-to-right Lisu 5.2 49 Ch 18.9
Loma 437 Loma left-to-right — Not in Unicode, proposal is explored[i]
Lyci 202 Lycian left-to-right Lycian 5.1 29 Ancient/historic Ch 8.5
Lydi 116 Lydian right-to-left Lydian 5.1 27 Ancient/historic Ch 8.5
Mahj 314 Mahajani left-to-right Mahajani 7.0 39 Ancient/historic Ch 15.6
Maka 366 Makasar left-to-right Makasar 11.0 25 Ancient/historic Ch 17.8
Mand 140 Mandaic, Mandaean right-to-left Mandaic 6.0 29 Ch 9.5
Mani 139 Manichaean right-to-left Manichaean 7.0 51 Ancient/historic Ch 10.5
Marc 332 Marchen left-to-right Marchen 9.0 68 Ancient/historic Ch 14.5
Maya 090 Mayan hieroglyphs mixed — Not in Unicode
Medf 265 Medefaidrin (Oberi Okaime, Oberi Ɔkaimɛ) left-to-right Medefaidrin 11.0 91 Ch 19.10
Mend 438 Mende Kikakui right-to-left Mende Kikakui 7.0 213 Ch 19.8
Merc 101 Meroitic Cursive right-to-left Meroitic Cursive 6.1 90 Ancient/historic Ch 11.5
Mero 100 Meroitic Hieroglyphs right-to-left Meroitic Hieroglyphs 6.1 32 Ancient/historic Ch 11.5
Mlym 347 Malayalam left-to-right Malayalam 1.0 118 Ch 12.9
Modi 324 Modi, Moḍī left-to-right Modi 7.0 79 Ancient/historic Ch 15.12
Mong 145 Mongolian vertical left-to-right, left-to-right Mongolian 3.0 168 Mong includes Clear and Manchu scripts Ch 13.5
Moon 218 Moon (Moon code, Moon script, Moon type) mixed — Not in Unicode, proposal is explored[i]
Mroo 264 Mro, Mru left-to-right Mro 7.0 43 Ch 13.8
Mtei 337 Meitei Mayek (Meithei, Meetei) left-to-right Meetei Mayek 5.2 79 Ch 13.7
Mult 323 Multani left-to-right Multani 8.0 38 Ancient/historic Ch 15.10
Mymr 350 Myanmar (Burmese) left-to-right Myanmar 3.0 223 Ch 16.3
Nagm 295 Nag Mundari left-to-right Nag Mundari 15.0 42
Nand 311 Nandinagari left-to-right Nandinagari 12.0 65 Ancient/historic Ch 15.13
Narb 106 Old North Arabian (Ancient North Arabian) right-to-left Old North Arabian 7.0 32 Ancient/historic Ch 10.1
Nbat 159 Nabataean right-to-left Nabataean 7.0 40 Ancient/historic Ch 10.10
Newa 333 Newa, Newar, Newari, Nepāla lipi left-to-right Newa 9.0 97 Ch 13.3
Nkdb 085 Naxi Dongba (na²¹ɕi³³ to³³ba²¹, Nakhi Tomba) left-to-right — Not in Unicode
Nkgb 420 Naxi Geba (na²¹ɕi³³ gʌ²¹ba²¹, 'Na-'Khi ²Ggŏ-¹baw, Nakhi Geba) left-to-right — Not in Unicode, proposal is explored[i]
Nkoo 165 N’Ko right-to-left NKo 5.0 62 Ch 19.4
Nshu 499 Nüshu vertical right-to-left Nushu 10.0 397 Ch 18.8
Ogam 212 Ogham bottom-to-top, left-to-right Ogham 3.0 29 Ancient/historic Ch 8.14
Olck 261 Ol Chiki (Ol Cemet’, Ol, Santali) left-to-right Ol Chiki 5.1 48 Ch 13.10
Onao 296 Ol Onal left-to-right — Not in Unicode, approved for version 16.0[iii]
Orkh 175 Old Turkic, Orkhon Runic right-to-left Old Turkic 5.2 73 Ancient/historic Ch 14.8
Orya 327 Oriya (Odia) left-to-right Oriya 1.0 91 Ch 12.5
Osge 219 Osage left-to-right Osage 9.0 72 Ch 20.3
Osma 260 Osmanya left-to-right Osmanya 4.0 40 Ch 19.2
Ougr 143 Old Uyghur mixed Old Uyghur 14.0 26 Ancient/historic Ch 14.11
Palm 126 Palmyrene right-to-left Palmyrene 7.0 32 Ancient/historic Ch 10.11
Pauc 263 Pau Cin Hau left-to-right Pau Cin Hau 7.0 57 Ch 16.13
Pcun 015 Proto-Cuneiform left-to-right — Not in Unicode
Pelm 016 Proto-Elamite left-to-right — Not in Unicode
Perm 227 Old Permic left-to-right Old Permic 7.0 43 Ancient/historic Ch 8.13
Phag 331 Phags-pa vertical left-to-right Phags-pa 5.0 56 Ancient/historic Ch 14.4
Phli 131 Inscriptional Pahlavi right-to-left Inscriptional Pahlavi 5.2 27 Ancient/historic Ch 10.6
Phlp 132 Psalter Pahlavi right-to-left Psalter Pahlavi 7.0 29 Ancient/historic Ch 10.6
Phlv 133 Book Pahlavi mixed — Not in Unicode
Phnx 115 Phoenician right-to-left Phoenician 5.0 29 Ancient/historic[g] Ch 10.3
Piqd 293 Klingon (KLI pIqaD) left-to-right — Rejected for inclusion in Unicode[iv][v]
Plrd 282 Miao (Pollard) left-to-right Miao 6.1 149 Ch 18.10
Prti 130 Inscriptional Parthian right-to-left Inscriptional Parthian 5.2 30 Ancient/historic Ch 10.6
Psin 103 Proto-Sinaitic mixed — Not in Unicode
Qaaa-Qabx 900-949 Reserved for private use (range) — Not in Unicode
Ranj 303 Ranjana left-to-right — Not in Unicode
Rjng 363 Rejang (Redjang, Kaganga) left-to-right Rejang 5.1 37 Ch 17.5
Rohg 167 Hanifi Rohingya right-to-left Hanifi Rohingya 11.0 50 Ch 16.14
Roro 620 Rongorongo mixed — Not in Unicode, proposal is explored[i]
Runr 211 Runic left-to-right, boustrophedon Runic 3.0 86 Ancient/historic Ch 8.7
Samr 123 Samaritan right-to-left, top-to-bottom Samaritan 5.2 61 Ch 9.4
Sara 292 Sarati mixed — Not in Unicode
Sarb 105 Old South Arabian right-to-left Old South Arabian 5.2 32 Ancient/historic Ch 10.2
Saur 344 Saurashtra left-to-right Saurashtra 5.1 82 Ch 13.13
Sgnw 095 SignWriting vertical left-to-right SignWriting 8.0 672 Ch 21.7
Shaw 281 Shavian (Shaw) left-to-right Shavian 4.0 48 Ch 8.15
Shrd 319 Sharada, Śāradā left-to-right Sharada 6.1 96 Ch 15.3
Shui 530 Shuishu left-to-right — Not in Unicode
Sidd 302 Siddham, Siddhaṃ, Siddhamātṛkā left-to-right Siddham 7.0 92 Ancient/historic Ch 15.5
Sidt 180 Sidetic right-to-left — Not in Unicode, proposal is mature[ii]
Sind 318 Khudawadi, Sindhi left-to-right Khudawadi 7.0 69 Ch 15.9
Sinh 348 Sinhala left-to-right Sinhala 3.0 111 Ch 13.2
Sogd 141 Sogdian horizontal and vertical writing in East Asian scripts, top-to-bottom Sogdian 11.0 42 Ancient/historic Ch 14.10
Sogo 142 Old Sogdian right-to-left Old Sogdian 11.0 40 Ancient/historic Ch 14.9
Sora 398 Sora Sompeng left-to-right Sora Sompeng 6.1 35 Ch 15.17
Soyo 329 Soyombo left-to-right Soyombo 10.0 83 Ancient/historic Ch 14.7
Sund 362 Sundanese left-to-right Sundanese 5.1 72 Ch 17.7
Sunu 274 Sunuwar left-to-right — Not in Unicode, approved for version 16.0[iii]
Sylo 316 Syloti Nagri left-to-right Syloti Nagri 4.1 45 Ancient/historic Ch 15.1
Syrc 135 Syriac right-to-left Syriac 3.0 88 Includes typographic variants Estrangelo (see § Syre), Western (§ Syrj), and Eastern (§ Syrn) Ch 9.3
Syre 138 Syriac (Estrangelo variant) mixed — Typographic variant of Syriac (see § Syrc)
Syrj 137 Syriac (Western variant) mixed — Typographic variant of Syriac (see § Syrc)
Syrn 136 Syriac (Eastern variant) mixed — Typographic variant of Syriac (see § Syrc)
Tagb 373 Tagbanwa left-to-right Tagbanwa 3.2 18 Ch 17.1
Takr 321 Takri, Ṭākrī, Ṭāṅkrī left-to-right Takri 6.1 68 Ch 15.4
Tale 353 Tai Le left-to-right Tai Le 4.0 35 Ch 16.5
Talu 354 New Tai Lue left-to-right New Tai Lue 4.1 83 Ch 16.6
Taml 346 Tamil left-to-right Tamil 1.0 123 Ch 12.6
Tang 520 Tangut vertical right-to-left, left-to-right Tangut 9.0 6,914 Ancient/historic Ch 18.11
Tavt 359 Tai Viet left-to-right Tai Viet 5.2 72 Ch 16.8
Tayo 380 Tai Yo top-to-bottom, columns right-to-left — Not in Unicode, proposal is mature[ii]
Telu 340 Telugu left-to-right Telugu 1.0 100 Ch 12.7
Teng 290 Tengwar left-to-right — Not in Unicode
Tfng 120 Tifinagh (Berber) left-to-right, right-to-left, top-to-bottom, bottom-to-top Tifinagh 4.1 59 Ch 19.3
Tglg 370 Tagalog (Baybayin, Alibata) left-to-right Tagalog 3.2 23 Ch 17.1
Thaa 170 Thaana right-to-left Thaana 3.0 50 Ch 13.1
Thai 352 Thai left-to-right Thai 1.0 86 Ch 16.1
Tibt 330 Tibetan left-to-right Tibetan 2.0 207 Added in 1.0, removed in 1.1 and reintroduced in 2.0 Ch 13.4
Tirh 326 Tirhuta left-to-right Tirhuta 7.0 82 Ch 15.11
Tnsa 275 Tangsa left-to-right Tangsa 14.0 89 Ch 13.18
Todr 229 Todhri right-to-left — Not in Unicode, approved for version 16.0[iii]
Tols 299 Tolong Siki left-to-right — Not in Unicode, proposal is mature[ii]
Toto 294 Toto left-to-right Toto 14.0 31 Ch 13.17
Tutg 341 Tulu-Tigalari left-to-right — Not in Unicode, approved for version 16.0[iii]
Ugar 040 Ugaritic left-to-right Ugaritic 4.0 31 Ancient/historic Ch 11.2
Vaii 470 Vai left-to-right Vai 5.1 300 Ch 19.5
Visp 280 Visible Speech left-to-right — Not in Unicode
Vith 228 Vithkuqi left-to-right Vithkuqi 14.0 70 Ancient/historic Ch 8.12
Wara 262 Warang Citi (Varang Kshiti) left-to-right Warang Citi 7.0 84 Ch 13.9
Wcho 283 Wancho left-to-right Wancho 12.0 59 Ch 13.16
Wole 480 Woleai mixed — Not in Unicode, proposal is explored[i]
Xpeo 030 Old Persian left-to-right Old Persian 4.1 50 Ancient/historic Ch 11.3
Xsux 020 Cuneiform, Sumero-Akkadian left-to-right Cuneiform 5.0 1,234 Ancient/historic Ch 11.1
Yezi 192 Yezidi right-to-left Yezidi 13.0 47 Ancient/historic Ch 9.6
Yiii 460 Yi left-to-right Yi 3.0 1,220 Ch 18.7
Zanb 339 Zanabazar Square (Zanabazarin Dörböljin Useg, Xewtee Dörböljin Bicig, Horizontal Square Script) left-to-right Zanabazar Square 10.0 72 Ancient/historic Ch 14.6
Zinh 994 Code for inherited script Inherited 657
Zmth 995 Mathematical notation — Not a 'script' in Unicode
Zsym 996 Symbols — Not a 'script' in Unicode
Zsye 993 Symbols (emoji variant) — Not a 'script' in Unicode
Zxxx 997 Code for unwritten documents — Not a 'script' in Unicode
Zyyy 998 Code for undetermined script Common 8,306
Zzzz 999 Code for uncoded script Unknown 964,234 In Unicode: All other code points
Notes
  1. ^ ISO 15924 publications, as of 12 September 2023
  2. ^ ISO 15924 Normative text file, as of 12 September 2023
  3. ^ ISO 15924 Changes (including Aliases for Unicode; as of 12 September 2023)
  4. ^ Unicode version 15.1
  5. ^
  6. ^ Unicode uses the "Property Value Alias" (Alias) as the script-name. These Alias names are part of Unicode and are published informatively next to ISO 15924. An alias script name may be used in a character name: Palm, Palmyrene → U+10860 𐡠 PALMYRENE LETTER ALEPH.
  7. ^ In Unicode, the Phoenician script is intended for the representation of text in Paleo-Hebrew, Archaic Phoenician, Phoenician, Early Aramaic, Late Phoenician cursive, Phoenician papyri, Siloam Hebrew, Hebrew seals, Ammonite, Moabite, and Punic.[vi]
References
  1. ^ a b c d e f g h i "SEI List of Scripts Not Yet Encoded". Unicode Consortium. March 2023. Retrieved 2023-09-25.
  2. ^ a b c d "Unicode Pipeline § Code Points Provisionally Assigned for Mature Proposals". Unicode Consortium. 2023-09-12. Retrieved 2023-09-25.
  3. ^ a b c d e f g "Unicode Pipeline § Approved for Publication in Version 16.0". Unicode Consortium. 2023-09-12. Retrieved 2023-09-25.
  4. ^ Michael Everson (1997-09-18). "Proposal to encode Klingon in Plane 1 of ISO/IEC 10646-2".
  5. ^ The Unicode Consortium (2001-08-14). "Approved Minutes of the UTC 87 / L2 184 Joint Meeting".
  6. ^ "Middle East-II, Ancient Scripts" (PDF). 15.0.0. The Unicode Consortium. Retrieved 2023-09-25.

Normalization properties

(decompositions, decomposition type, canonical combining class, composition exclusions, and so on)

Age

"Age" is the version of the Standard in which the code point was first designated. The version number is shortened to the form major.minor, even where more detailed version numbers are in use: versions 4.0.0 and 4.0.1 both have Age 4.0. Given the releases so far, Age takes one of the values 1.0, 1.1, 2.0, 2.1, 3.0, 3.1, 3.2, 4.0, 4.1, 5.0, 5.1 and 5.2.[2][3]

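The shortening of version numbers for Age can be sketched in a few lines. Python's standard library does not expose the Age property itself, so the second helper is only a rough stand-in: it uses the frozen Unicode 3.2 database shipped as `unicodedata.ucd_3_2_0` to ask whether a character was already assigned in that version. Both function names are illustrative assumptions.

```python
import unicodedata


def age_label(version: str) -> str:
    """Shorten a full version such as '4.0.1' to the major.minor form '4.0'."""
    major, minor = version.split(".")[:2]
    return f"{major}.{minor}"


def assigned_by_3_2(ch: str) -> bool:
    """True if the character is assigned in Unicode 3.2, so its Age is at most 3.2.

    Note: noncharacters also report Cn here although they do have an Age,
    so this is an approximation for ordinary characters only.
    """
    return unicodedata.ucd_3_2_0.category(ch) != "Cn"
```

For instance, `age_label("4.0.0")` and `age_label("4.0.1")` both give "4.0", and `age_label(unicodedata.unidata_version)` shortens the version of the database bundled with the running Python.
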
Boundaries

(grapheme cluster, word, line, and sentence)

References