Unicode character property: Difference between revisions

Revision as of 20:06, 10 June 2010

Unicode assigns character properties to each code point (a number written like U+A234).^[1] These properties can be used to handle characters in processes such as line-breaking or right-to-left script direction. Somewhat inconsistently, some character properties are also defined for code points that have no character assigned, and for code points that are designated "not a character".

Properties differ in status: normative, informative, contributory, or provisional. Technically, a property may be assigned by specifying a range of code points.

Character property

Name

Most Unicode characters are assigned a unique Name (na).^[1] The name, in English, is composed of the capital letters A-Z, the digits 0-9, - (hyphen-minus) and <space>. Some sequences are not allowed: a leading space or hyphen, a trailing space or hyphen, repeated spaces or hyphens, and a space after a hyphen. The name is guaranteed to be unique within Unicode, and can be used to identify a code point and its character. Ideographic characters, of which there are tens of thousands, are named in the pattern "<CJK UNIFIED IDEOGRAPH-hhhh>", as for U+4E00: "CJK UNIFIED IDEOGRAPH-4E00". Formatting characters are named too: "NO-BREAK SPACE" (U+00A0).

Starting from Unicode version 2.0, the published name for a code point will never change. In the event of a misspelling in a publication, a correct name will later be assigned to the code point as a Character Name Alias (na1). An alias is also unique within the whole range of names.
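
The name lookup described above can be tried with Python's standard `unicodedata` module. This is a minimal sketch: the helper `describe` and the `<no name>` placeholder are illustrative choices, not part of Unicode, and the results depend on the Unicode version bundled with the Python build.

```python
import unicodedata

def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for one character, or a placeholder if it has no Name."""
    # unicodedata.name raises ValueError for nameless code points unless a
    # default is supplied, so pass an empty string as the default.
    name = unicodedata.name(ch, "")
    return f"U+{ord(ch):04X} {name or '<no name>'}"

print(describe("\u00A0"))  # a formatting character with a Name
print(describe("\u4E00"))  # an ideograph named by the algorithmic pattern
print(describe("\u0000"))  # a control character, which has no Name

# Because a Name is unique, it maps back to exactly one character.
print(unicodedata.lookup("NO-BREAK SPACE") == "\u00A0")
```

The reverse direction, `unicodedata.lookup`, relies on the uniqueness guarantee: a name can never resolve to more than one code point.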

Apart from these normative names, informal names can be assigned. These are usually other commonly used names for a character, given for illustration; such informal names are not guaranteed to be unique.

The following code points do not have a Name (na=""): Controls (General Category: Cc), Private use (Co), Surrogates (Cs), Noncharacters (Cn) and Reserved (Cn). They may be referenced, informally, by a generic or specific meta-name, called a Code Point Label: <control>, <control-0088>, <reserved>, <noncharacter-hhhh>, <private-use-hhhh>, <surrogate>.
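
As an illustration of how such labels could be produced, the sketch below falls back to a Code Point Label when a code point has no Name. The function names `is_noncharacter` and `name_or_label`, and the `unknown` fallback prefix, are assumptions made for this example; the General Category values come from Python's `unicodedata` module and reflect the Unicode version of the Python build.

```python
import unicodedata

# Label prefixes by General Category. Cn is handled separately because it
# covers both noncharacters and reserved (unassigned) code points.
_PREFIX = {"Cc": "control", "Co": "private-use", "Cs": "surrogate"}

def is_noncharacter(cp: int) -> bool:
    """True for U+FDD0..U+FDEF and the last two code points of each plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

def name_or_label(cp: int) -> str:
    """Return the Name of a code point, or a Code Point Label if it has none."""
    ch = chr(cp)
    name = unicodedata.name(ch, "")
    if name:
        return name
    gc = unicodedata.category(ch)
    if gc == "Cn":
        prefix = "noncharacter" if is_noncharacter(cp) else "reserved"
    else:
        # "unknown" is a hypothetical fallback for this sketch, used when the
        # local database has no name for an otherwise assigned character.
        prefix = _PREFIX.get(gc, "unknown")
    return f"<{prefix}-{cp:04X}>"

for cp in (0x0041, 0x0088, 0xD800, 0xE000, 0xFFFF):
    print(f"U+{cp:04X}", name_or_label(cp))
```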

Pre version 2.0

In version 2.0 of Unicode, many names were changed. From then on the rule "a name will never change" came into effect, including the strict (normative) use of alias names. Disused version 1.0-names were moved to the property Alias, to provide some backward compatibility.

General Category

Each code point is assigned a value for General Category. This is one of the character properties that are also defined for unassigned code points, and for code points that are designated "not a character".

General Category (Unicode Character Property)^[a]
Value	Category Major, minor	Basic type^[b]	Character assigned^[b]	Count^[c] (as of 15.1)	Remarks

L, Letter; LC, Cased Letter (Lu, Ll, and Lt only)^[d]
Lu	Letter, uppercase	Graphic	Character	1,831
Ll	Letter, lowercase	Graphic	Character	2,233
Lt	Letter, titlecase	Graphic	Character	31	Ligatures or digraphs containing an uppercase followed by a lowercase part (e.g., ǅ, ǈ, ǋ, and ǲ)
Lm	Letter, modifier	Graphic	Character	397	A modifier letter
Lo	Letter, other	Graphic	Character	132,234	An ideograph or a letter in a unicase alphabet
M, Mark
Mn	Mark, nonspacing	Graphic	Character	1,985
Mc	Mark, spacing combining	Graphic	Character	452
Me	Mark, enclosing	Graphic	Character	13
N, Number
Nd	Number, decimal digit	Graphic	Character	680	All these, and only these, have Numeric Type = De^[e]
Nl	Number, letter	Graphic	Character	236	Numerals composed of letters or letterlike symbols (e.g., Roman numerals)
No	Number, other	Graphic	Character	915	E.g., vulgar fractions, superscript and subscript digits
P, Punctuation
Pc	Punctuation, connector	Graphic	Character	10	Includes spacing underscore characters such as "_", and other spacing tie characters. Unlike other punctuation characters, these may be classified as "word" characters by regular expression libraries.^[f]
Pd	Punctuation, dash	Graphic	Character	26	Includes several hyphen characters
Ps	Punctuation, open	Graphic	Character	79	Opening bracket characters
Pe	Punctuation, close	Graphic	Character	77	Closing bracket characters
Pi	Punctuation, initial quote	Graphic	Character	12	Opening quotation mark. Does not include the ASCII "neutral" quotation mark. May behave like Ps or Pe depending on usage
Pf	Punctuation, final quote	Graphic	Character	10	Closing quotation mark. May behave like Ps or Pe depending on usage
Po	Punctuation, other	Graphic	Character	628
S, Symbol
Sm	Symbol, math	Graphic	Character	948	Mathematical symbols (e.g., +, −, =, ×, ÷, √, ∊, ≠). Does not include parentheses and brackets, which are in categories Ps and Pe. Also does not include !, *, -, or /, which despite frequent use as mathematical operators, are primarily considered to be "punctuation".
Sc	Symbol, currency	Graphic	Character	63	Currency symbols
Sk	Symbol, modifier	Graphic	Character	125
So	Symbol, other	Graphic	Character	6,639
Z, Separator
Zs	Separator, space	Graphic	Character	17	Includes the space, but not TAB, CR, or LF, which are Cc
Zl	Separator, line	Format	Character	1	Only U+2028 LINE SEPARATOR (LSEP)
Zp	Separator, paragraph	Format	Character	1	Only U+2029 PARAGRAPH SEPARATOR (PSEP)
C, Other
Cc	Other, control	Control	Character	65 (will never change)^[e]	No name,^[g] <control>
Cf	Other, format	Format	Character	170	Includes the soft hyphen, joining control characters (ZWNJ and ZWJ), control characters to support bidirectional text, and language tag characters
Cs	Other, surrogate	Surrogate	Not (only used in UTF-16)	2,048 (will never change)^[e]	No name,^[g] <surrogate>
Co	Other, private use	Private-use	Character (but no interpretation specified)	137,468 total (will never change)^[e] (6,400 in BMP, 131,068 in Planes 15–16)	No name,^[g] <private-use>
Cn	Other, not assigned	Noncharacter	Not	66 (will not change unless the range of Unicode code points is expanded)^[e]	No name,^[g] <noncharacter>
Cn	Other, not assigned	Reserved	Not	824,652	No name,^[g] <reserved>
a. ^ "Table 4-4: General Category" (PDF). The Unicode Standard. Unicode Consortium. September 2022.
b. ^ "Table 2-3: Types of code points" (PDF). The Unicode Standard. Unicode Consortium. September 2022.
c. ^ "DerivedGeneralCategory.txt". The Unicode Consortium. 2022-04-26.
d. ^ "5.7.1 General Category Values". UTR #44: Unicode Character Database. Unicode Consortium. 2020-03-04.
e. ^ Unicode Character Encoding Stability Policies: Property Value Stability. Stability policy: Some gc groups will never change. gc=Nd corresponds with Numeric Type=De (decimal).
f. ^ "Annex C: Compatibility Properties (§ word)". Unicode Regular Expressions. Version 23. Unicode Consortium. 2022-02-08. Unicode Technical Standard #18.
g. ^ "Table 4-9: Construction of Code Point Labels" (PDF). The Unicode Standard. Unicode Consortium. September 2022. A Code Point Label may be used to identify a nameless code point. E.g. <control-hhhh>, <control-0088>. The Name remains blank, which can prevent inadvertently replacing, in documentation, a Control Name with a true Control code. Unicode also uses <not a character> for <noncharacter>.
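
The two-letter values in the table can be queried directly with Python's standard `unicodedata.category`. The sketch below prints the category of a few sample characters and counts major classes (the first letter of the value) in a string; the helper name `major_classes` is an assumption made for this example.

```python
import unicodedata
from collections import Counter

# One sample character for several of the categories in the table.
samples = ["A", "a", "\u01C5", "5", "_", "-", "(", "+", "$", " ", "\t", "\u00AD"]
for ch in samples:
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)}")

def major_classes(text: str) -> Counter:
    """Count characters per major class (L, M, N, P, S, Z, C)."""
    return Counter(unicodedata.category(ch)[0] for ch in text)

print(major_classes("Hi, 42!"))
```

Grouping on the first letter is a common shortcut, for example to test for "any letter" without listing Lu, Ll, Lt, Lm and Lo separately.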

Other important general characteristics

(whitespace, dash, ideographic, alphabetic, noncharacter, deprecated, and so on)

Display-related properties

(bidirectional class, shaping, mirroring, width, and so on)

Casing

The Case value is Normative in Unicode. It pertains to scripts that distinguish uppercase (also called capital or majuscule) from lowercase (also called small or minuscule) letters. Case differences occur in the Latin, Greek, Coptic, Cyrillic, Glagolitic, Armenian, Deseret, and archaic Georgian scripts.

(upper, lower, title, folding—both simple and full)
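
The difference between upper, lower, title and folded forms can be seen with Python's built-in string methods, which apply full case mappings. This is a small sketch; `caseless_equal` is a hypothetical helper, and a production comparison would usually also normalize the strings first.

```python
import unicodedata

word = "Stra\u00DFe"      # "Straße"
upper = word.upper()      # full mapping: one character may become two
folded = word.casefold()  # case folding, intended for caseless matching

def caseless_equal(a: str, b: str) -> bool:
    """Compare two strings while ignoring case, using full case folding."""
    return a.casefold() == b.casefold()

# A digraph with three case forms: lowercase, titlecase and uppercase.
dz_lower = "\u01C6"
dz_title = dz_lower.title()
dz_upper = dz_lower.upper()

print(upper, folded, caseless_equal("STRASSE", word))
for ch in (dz_lower, dz_title, dz_upper):
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)}")
```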

Numeric values and types

Characters are classified with a Numeric type.^[1] Numeric characters include fractions, subscripts, superscripts, Roman numerals, currency numerators, encircled numbers, and script-specific digits. Each of these has a numeric value, which can be a decimal integer (including zero and negatives) or a vulgar fraction. If there is no such value, as for most characters, the numeric type is "None".

The numeric characters are separated into three groups: Decimal (De), Digit (Di) and Numeric (Nu, i.e. all other). A character is "Decimal" if it is a straight decimal digit. Fractions, encircled numbers, superscripts and the like are excluded from this type and end up with the type "Digit" or "Numeric". The intended effect is that even a simple parser can use the decimal values without being distracted by, say, a numeric superscript or a fraction. Some 41 CJK ideographs that represent a number, including those used for accounting, are typed "Numeric".

On the other hand, characters that could have a numeric value as a second meaning are still marked Numeric type "None", and have no numeric value (""). E.g. Latin letters can be used in paragraph numbering like (II.A.1.b), but the letters "I", "A" and "b" are not numeric (type "None") and have no numeric value.

Numeric Type^[a]^[b] (Unicode character property)
Numeric type	Code	Has numeric value	Example	Remarks
Not numeric	`<none>`	No	A X (Latin) ! Д μ に	Numeric Value="NaN"
Decimal	`De`	Yes	0 1 9 ६ (Devanagari 6) ೬ (Kannada 6) 𝟨 (Mathematical, styled sans serif)	Straight digit (decimal-radix). Corresponds both ways with General Category=Nd^[a]
Digit	`Di`	Yes	¹ (superscript) ① ⒈ (digit with full stop)	Decimal, but in typographic context
Numeric	`Nu`	Yes	¾ ௰ (Tamil number ten) Ⅹ (Roman numeral) 六 (Han number 6)	Numeric value, but not decimal-radix
a. ^ "Section 4.6: Numeric Value" (PDF). The Unicode Standard. Unicode Consortium. September 2022.
b. ^ "Unicode 15.1 Derived Numeric Types". Unicode Character Database. Unicode Consortium. 2023-01-05.
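
Python's `unicodedata` module does not expose the Numeric Type directly, but it offers three lookups (`decimal`, `digit`, `numeric`) that roughly correspond to the De, Di and Nu rows of the table. The sketch below derives an approximate type from them; the function `numeric_type` is an assumption made for this example, not a standard API.

```python
import unicodedata

def numeric_type(ch: str) -> str:
    """Approximate the Numeric Type code of one character.

    The three lookups are tried from most to least restrictive; each returns
    the supplied default (None) when the character has no value of that kind.
    """
    if unicodedata.decimal(ch, None) is not None:
        return "De"
    if unicodedata.digit(ch, None) is not None:
        return "Di"
    if unicodedata.numeric(ch, None) is not None:
        return "Nu"
    return "None"

for ch in ("7", "\u096C", "\u00B9", "\u2460", "\u00BE", "\u2169", "\u516D", "A"):
    value = unicodedata.numeric(ch, None)
    print(f"U+{ord(ch):04X} type={numeric_type(ch)} value={value}")
```

A parser that only accepts characters of type De gets plain positional digits in any script and skips superscripts, circled numbers and fractions.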

Block

A block is a contiguous range of code points, identified by its first and last code point. It may contain unassigned code points. Each assigned character has a single "block" value from the list of 197 names. Unassigned code points that do not yet belong to a named block have the default value "No_Block". Although this is a value of the property "block", it is not the name of an existing block.
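
Because a block is just a first and last code point, a lookup can be a binary search over sorted ranges. The sketch below uses a hand-copied excerpt of four blocks only, so the names `BLOCKS` and `block_of` and the table contents are assumptions for illustration; a real implementation would load the complete block list from the Unicode Character Database.

```python
from bisect import bisect_right

# Tiny excerpt of the block list: (first, last, name), sorted by first.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x0400, 0x04FF, "Cyrillic"),
]
_STARTS = [first for first, _, _ in BLOCKS]

def block_of(cp: int) -> str:
    """Return the block name for a code point, or the default value.

    With the full block list loaded, the fallthrough corresponds to the
    default value (spelled No_Block in the Unicode data files). With this
    excerpt it also fires for every block that was simply left out.
    """
    i = bisect_right(_STARTS, cp) - 1  # last range starting at or before cp
    if i >= 0:
        _, last, name = BLOCKS[i]
        if cp <= last:
            return name
    return "No_Block"

for ch in ("A", "\u00E9", "\u03B1", "\u0416"):
    print(f"U+{ord(ch):04X} {block_of(ord(ch))}")
```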

Script

Each assigned character has a single value for its "Script" (sc) property, signifying to which script it belongs. The value is a four-letter code in the range Aaaa-Zzzz, according to ISO 15924, which is mapped to a writing system. The special code Zyyy for "Common" allows a single value for a character that is used in multiple scripts. Unicode uses the private-use code Qaai for the "Inherited" script. The code Zzzz "Unknown" is used for all characters that do not belong to a script (i.e. the default value), such as symbols and formatting characters. Overall, a single script can be scattered over multiple blocks.

Scripts in ISO 15924^[a]^[b] and in Unicode^[c]^[d]
ISO 15924				Script in Unicode^[e]
Code	ISO number	ISO formal name	Directionality	Unicode Alias^[f]	Version	Characters	Notes	Description
Adlm	166	Adlam	right-to-left script	Adlam	9.0	88		Ch 19.9
Afak	439	Afaka	varies	ZZ— Not in Unicode, proposal is explored^[i]
Aghb	239	Caucasian Albanian	left-to-right	Caucasian Albanian	7.0	53	Ancient/historic	Ch 8.11
Ahom	338	Ahom, Tai Ahom	left-to-right	Ahom	8.0	65	Ancient/historic	Ch 15.16
Arab	160	Arabic	right-to-left script	Arabic	1.0	1,368		Ch 9.2
Aran	161	Arabic (Nastaliq variant)	mixed	ZZ— Typographic variant of Arabic (see § Arab)
Armi	124	Imperial Aramaic	right-to-left script	Imperial Aramaic	5.2	31	Ancient/historic	Ch 10.4
Armn	230	Armenian	left-to-right	Armenian	1.0	96		Ch 7.6
Avst	134	Avestan	right-to-left script	Avestan	5.2	61	Ancient/historic	Ch 10.7
Bali	360	Balinese	left-to-right	Balinese	5.0	124		Ch 17.3
Bamu	435	Bamum	left-to-right	Bamum	5.2	657		Ch 19.6
Bass	259	Bassa Vah	left-to-right	Bassa Vah	7.0	36	Ancient/historic	Ch 19.7
Batk	365	Batak	left-to-right	Batak	6.0	56		Ch 17.6
Beng	325	Bengali (Bangla)	left-to-right	Bengali	1.0	96		Ch 12.2
Bhks	334	Bhaiksuki	left-to-right	Bhaiksuki	9.0	97	Ancient/historic	Ch 14.3
Blis	550	Blissymbols	varies	ZZ— Not in Unicode, proposal is explored^[i]
Bopo	285	Bopomofo	left-to-right, right-to-left script	Bopomofo	1.0	77		Ch 18.3
Brah	300	Brahmi	left-to-right	Brahmi	6.0	115	Ancient/historic	Ch 14.1
Brai	570	Braille	left-to-right	Braille	3.0	256		Ch 21.1
Bugi	367	Buginese	left-to-right	Buginese	4.1	30		Ch 17.2
Buhd	372	Buhid	left-to-right	Buhid	3.2	20		Ch 17.1
Cakm	349	Chakma	left-to-right	Chakma	6.1	71		Ch 13.11
Cans	440	Unified Canadian Aboriginal Syllabics	left-to-right	Canadian Aboriginal	3.0	726		Ch 20.2
Cari	201	Carian	left-to-right, right-to-left script	Carian	5.1	49	Ancient/historic	Ch 8.5
Cham	358	Cham	left-to-right	Cham	5.1	83		Ch 16.10
Cher	445	Cherokee	left-to-right	Cherokee	3.0	172		Ch 20.1
Chis	298	Chisoi	left-to-right	ZZ— Not in Unicode, proposal is mature^[ii]
Chrs	109	Chorasmian	right-to-left script, top-to-bottom	Chorasmian	13.0	28	Ancient/historic	Ch 10.8
Cirt	291	Cirth	varies	ZZ— Not in Unicode
Copt	204	Coptic	left-to-right	Coptic	1.0	137	Ancient/historic, disunified from Greek in 4.1	Ch 7.3
Cpmn	402	Cypro-Minoan	left-to-right	Cypro Minoan	14.0	99	Ancient/historic	Ch 8.4
Cprt	403	Cypriot syllabary	right-to-left script	Cypriot	4.0	55	Ancient/historic	Ch 8.3
Cyrl	220	Cyrillic	left-to-right	Cyrillic	1.0	506	Includes typographic variant Old Church Slavonic (see § Cyrs)	Ch 7.4
Cyrs	221	Cyrillic (Old Church Slavonic variant)	varies	ZZ— Typographic variant of Cyrillic (see § Cyrl); Ancient/historic
Deva	315	Devanagari (Nagari)	left-to-right	Devanagari	1.0	164		Ch 12.1
Diak	342	Dives Akuru	left-to-right	Dives Akuru	13.0	72	Ancient/historic	Ch 15.15
Dogr	328	Dogra	left-to-right	Dogra	11.0	60	Ancient/historic	Ch 15.18
Dsrt	250	Deseret (Mormon)	left-to-right	Deseret	3.1	80		Ch 20.4
Dupl	755	Duployan shorthand, Duployan stenography	left-to-right	Duployan	7.0	143		Ch 21.6
Egyd	070	Egyptian demotic	mixed	ZZ— Not in Unicode
Egyh	060	Egyptian hieratic	mixed	ZZ— Not in Unicode
Egyp	050	Egyptian hieroglyphs	right-to-left script, left-to-right	Egyptian Hieroglyphs	5.2	1,110	Ancient/historic	Ch 11.4
Elba	226	Elbasan	left-to-right	Elbasan	7.0	40	Ancient/historic	Ch 8.10
Elym	128	Elymaic	right-to-left script	Elymaic	12.0	23	Ancient/historic	Ch 10.9
Ethi	430	Ethiopic (Geʻez)	left-to-right	Ethiopic	3.0	523		Ch 19.1
Gara	164	Garay	right-to-left	ZZ— Not in Unicode, approved for version 16.0^[iii]
Geok	241	Khutsuri (Asomtavruli and Nuskhuri)	left-to-right	Georgian			Unicode groups Khutsori, Asomtavruli and Nuskhuri into 'Georgian' (see § Geok). Similarly, Mkhedruli and Mtavruli are 'Georgian' (see § Geor)	Ch 7.7
Geor	240	Georgian (Mkhedruli and Mtavruli)	left-to-right	Georgian	1.0	173	In Unicode this also includes Nuskhuri (Geok)	Ch 7.7
Glag	225	Glagolitic	left-to-right	Glagolitic	4.1	134	Ancient/historic	Ch 7.5
Gong	312	Gunjala Gondi	left-to-right	Gunjala Gondi	11.0	63		Ch 13.15
Gonm	313	Masaram Gondi	left-to-right	Masaram Gondi	10.0	75		Ch 13.14
Goth	206	Gothic	left-to-right	Gothic	3.1	27	Ancient/historic	Ch 8.9
Gran	343	Grantha	left-to-right	Grantha	7.0	85	Ancient/historic	Ch 15.14
Grek	200	Greek	left-to-right	Greek	1.0	518	Directionality sometimes as boustrophedon	Ch 7.2
Gujr	320	Gujarati	left-to-right	Gujarati	1.0	91		Ch 12.4
Gukh	397	Gurung Khema	left-to-right	ZZ— Not in Unicode, approved for version 16.0^[iii]
Guru	310	Gurmukhi	left-to-right	Gurmukhi	1.0	80		Ch 12.3
Hanb	503	Han with Bopomofo (alias for Han + Bopomofo)	mixed	ZZ— See § Hani, § Bopo
Hang	286	Hangul (Hangŭl, Hangeul)	left-to-right, vertical right-to-left	Hangul	1.0	11,739	Hangul syllables relocated in 2.0	Ch 18.6
Hani	500	Han (Hanzi, Kanji, Hanja)	top-to-bottom, columns right-to-left (historically)	Han	1.0	99,030		Ch 18.1
Hano	371	Hanunoo (Hanunóo)	left-to-right, bottom-to-top	Hanunoo	3.2	21		Ch 17.1
Hans	501	Han (Simplified variant)	varies	ZZ— Subset of Han (Hanzi, Kanji, Hanja) (see § Hani)
Hant	502	Han (Traditional variant)	varies	ZZ— Subset of § Hani
Hatr	127	Hatran	right-to-left script	Hatran	8.0	26	Ancient/historic	Ch 10.12
Hebr	125	Hebrew	right-to-left script	Hebrew	1.0	134		Ch 9.1
Hira	410	Hiragana	vertical right-to-left, left-to-right	Hiragana	1.0	381		Ch 18.4
Hluw	080	Anatolian Hieroglyphs (Luwian Hieroglyphs, Hittite Hieroglyphs)	left-to-right	Anatolian Hieroglyphs	8.0	583	Ancient/historic	Ch 11.6
Hmng	450	Pahawh Hmong	left-to-right	Pahawh Hmong	7.0	127		Ch 16.11
Hmnp	451	Nyiakeng Puachue Hmong	left-to-right	Nyiakeng Puachue Hmong	12.0	71		Ch 16.12
Hrkt	412	Japanese syllabaries (alias for Hiragana + Katakana)	vertical right-to-left, left-to-right	Katakana or Hiragana			See § Hira, § Kana	Ch 18.4
Hung	176	Old Hungarian (Hungarian Runic)	right-to-left script	Old Hungarian	8.0	108	Ancient/historic	Ch 8.8
Inds	610	Indus (Harappan)	mixed	ZZ— Not in Unicode, proposal is explored^[i]
Ital	210	Old Italic (Etruscan, Oscan, etc.)	right-to-left script, left-to-right	Old Italic	3.1	39	Ancient/historic	Ch 8.6
Jamo	284	Jamo (alias for Jamo subset of Hangul)	varies	ZZ— Subset of § Hang
Java	361	Javanese	left-to-right	Javanese	5.2	90		Ch 17.4
Jpan	413	Japanese (alias for Han + Hiragana + Katakana)	varies	ZZ— See § Hani, § Hira and § Kana
Jurc	510	Jurchen	left-to-right	ZZ— Not in Unicode
Kali	357	Kayah Li	left-to-right	Kayah Li	5.1	47		Ch 16.9
Kana	411	Katakana	vertical right-to-left, left-to-right	Katakana	1.0	321		Ch 18.4
Kawi	368	Kawi	left-to-right	Kawi	15.0	86	Ancient/historic	Ch 17.9
Khar	305	Kharoshthi	right-to-left script	Kharoshthi	4.1	68	Ancient/historic	Ch 14.2
Khmr	355	Khmer	left-to-right	Khmer	3.0	146		Ch 16.4
Khoj	322	Khojki	left-to-right	Khojki	7.0	65	Ancient/historic	Ch 15.7
Kitl	505	Khitan large script	left-to-right	ZZ— Not in Unicode
Kits	288	Khitan small script	vertical right-to-left	Khitan Small Script	13.0	471	Ancient/historic	Ch 18.12
Knda	345	Kannada	left-to-right	Kannada	1.0	91		Ch 12.8
Kore	287	Korean (alias for Hangul + Han)	left-to-right	ZZ— See § Hani, § Hang
Kpel	436	Kpelle	left-to-right	ZZ— Not in Unicode, proposal is explored^[i]
Krai	396	Kirat Rai	left-to-right	ZZ— Not in Unicode, approved for version 16.0^[iii]
Kthi	317	Kaithi	left-to-right	Kaithi	5.2	68	Ancient/historic	Ch 15.2
Lana	351	Tai Tham (Lanna)	left-to-right	Tai Tham	5.2	127		Ch 16.7
Laoo	356	Lao	left-to-right	Lao	1.0	83		Ch 16.2
Latf	217	Latin (Fraktur variant)	varies	ZZ— Typographic variant of Latin (see § Latn)
Latg	216	Latin (Gaelic variant)	left-to-right	ZZ— Typographic variant of Latin (see § Latn)
Latn	215	Latin	left-to-right	Latin	1.0	1,481	See also: Latin script in Unicode	Ch 7.1
Leke	364	Leke	left-to-right	ZZ— Not in Unicode
Lepc	335	Lepcha (Róng)	left-to-right	Lepcha	5.1	74		Ch 13.12
Limb	336	Limbu	left-to-right	Limbu	4.0	68		Ch 13.6
Lina	400	Linear A	left-to-right	Linear A	7.0	341	Ancient/historic	Ch 8.1
Linb	401	Linear B	left-to-right	Linear B	4.0	211	Ancient/historic	Ch 8.2
Lisu	399	Lisu (Fraser)	left-to-right	Lisu	5.2	49		Ch 18.9
Loma	437	Loma	left-to-right	ZZ— Not in Unicode, proposal is explored^[i]
Lyci	202	Lycian	left-to-right	Lycian	5.1	29	Ancient/historic	Ch 8.5
Lydi	116	Lydian	right-to-left script	Lydian	5.1	27	Ancient/historic	Ch 8.5
Mahj	314	Mahajani	left-to-right	Mahajani	7.0	39	Ancient/historic	Ch 15.6
Maka	366	Makasar	left-to-right	Makasar	11.0	25	Ancient/historic	Ch 17.8
Mand	140	Mandaic, Mandaean	right-to-left script	Mandaic	6.0	29		Ch 9.5
Mani	139	Manichaean	right-to-left script	Manichaean	7.0	51	Ancient/historic	Ch 10.5
Marc	332	Marchen	left-to-right	Marchen	9.0	68	Ancient/historic	Ch 14.5
Maya	090	Mayan hieroglyphs	mixed	ZZ— Not in Unicode
Medf	265	Medefaidrin (Oberi Okaime, Oberi Ɔkaimɛ)	left-to-right	Medefaidrin	11.0	91		Ch 19.10
Mend	438	Mende Kikakui	right-to-left script	Mende Kikakui	7.0	213		Ch 19.8
Merc	101	Meroitic Cursive	right-to-left script	Meroitic Cursive	6.1	90	Ancient/historic	Ch 11.5
Mero	100	Meroitic Hieroglyphs	right-to-left script	Meroitic Hieroglyphs	6.1	32	Ancient/historic	Ch 11.5
Mlym	347	Malayalam	left-to-right	Malayalam	1.0	118		Ch 12.9
Modi	324	Modi, Moḍī	left-to-right	Modi	7.0	79	Ancient/historic	Ch 15.12
Mong	145	Mongolian	vertical left-to-right, left-to-right	Mongolian	3.0	168	Mong includes Clear and Manchu scripts	Ch 13.5
Moon	218	Moon (Moon code, Moon script, Moon type)	mixed	ZZ— Not in Unicode, proposal is explored^[i]
Mroo	264	Mro, Mru	left-to-right	Mro	7.0	43		Ch 13.8
Mtei	337	Meitei Mayek (Meithei, Meetei)	left-to-right	Meetei Mayek	5.2	79		Ch 13.7
Mult	323	Multani	left-to-right	Multani	8.0	38	Ancient/historic	Ch 15.10
Mymr	350	Myanmar (Burmese)	left-to-right	Myanmar	3.0	223		Ch 16.3
Nagm	295	Nag Mundari	left-to-right	Nag Mundari	15.0	42
Nand	311	Nandinagari	left-to-right	Nandinagari	12.0	65	Ancient/historic	Ch 15.13
Narb	106	Old North Arabian (Ancient North Arabian)	right-to-left script	Old North Arabian	7.0	32	Ancient/historic	Ch 10.1
Nbat	159	Nabataean	right-to-left script	Nabataean	7.0	40	Ancient/historic	Ch 10.10
Newa	333	Newa, Newar, Newari, Nepāla lipi	left-to-right	Newa	9.0	97		Ch 13.3
Nkdb	085	Naxi Dongba (na²¹ɕi³³ to³³ba²¹, Nakhi Tomba)	left-to-right	ZZ— Not in Unicode
Nkgb	420	Naxi Geba (na²¹ɕi³³ gʌ²¹ba²¹, 'Na-'Khi ²Ggŏ-¹baw, Nakhi Geba)	left-to-right	ZZ— Not in Unicode, proposal is explored^[i]
Nkoo	165	N’Ko	right-to-left script	NKo	5.0	62		Ch 19.4
Nshu	499	Nüshu	vertical right-to-left	Nushu	10.0	397		Ch 18.8
Ogam	212	Ogham	bottom-to-top, left-to-right	Ogham	3.0	29	Ancient/historic	Ch 8.14
Olck	261	Ol Chiki (Ol Cemet’, Ol, Santali)	left-to-right	Ol Chiki	5.1	48		Ch 13.10
Onao	296	Ol Onal	left-to-right	ZZ— Not in Unicode, approved for version 16.0^[iii]
Orkh	175	Old Turkic, Orkhon Runic	right-to-left script	Old Turkic	5.2	73	Ancient/historic	Ch 14.8
Orya	327	Oriya (Odia)	left-to-right	Oriya	1.0	91		Ch 12.5
Osge	219	Osage	left-to-right	Osage	9.0	72		Ch 20.3
Osma	260	Osmanya	left-to-right	Osmanya	4.0	40		Ch 19.2
Ougr	143	Old Uyghur	mixed	Old Uyghur	14.0	26	Ancient/historic	Ch 14.11
Palm	126	Palmyrene	right-to-left script	Palmyrene	7.0	32	Ancient/historic	Ch 10.11
Pauc	263	Pau Cin Hau	left-to-right	Pau Cin Hau	7.0	57		Ch 16.13
Pcun	015	Proto-Cuneiform	left-to-right	ZZ— Not in Unicode
Pelm	016	Proto-Elamite	left-to-right	ZZ— Not in Unicode
Perm	227	Old Permic	left-to-right	Old Permic	7.0	43	Ancient/historic	Ch 8.13
Phag	331	Phags-pa	vertical left-to-right	Phags-pa	5.0	56	Ancient/historic	Ch 14.4
Phli	131	Inscriptional Pahlavi	right-to-left script	Inscriptional Pahlavi	5.2	27	Ancient/historic	Ch 10.6
Phlp	132	Psalter Pahlavi	right-to-left script	Psalter Pahlavi	7.0	29	Ancient/historic	Ch 10.6
Phlv	133	Book Pahlavi	mixed	ZZ— Not in Unicode
Phnx	115	Phoenician	right-to-left script	Phoenician	5.0	29	Ancient/historic^[g]	Ch 10.3
Piqd	293	Klingon (KLI pIqaD)	left-to-right	ZZ— Rejected for inclusion in Unicode^[iv]^[v]
Plrd	282	Miao (Pollard)	left-to-right	Miao	6.1	149		Ch 18.10
Prti	130	Inscriptional Parthian	right-to-left script	Inscriptional Parthian	5.2	30	Ancient/historic	Ch 10.6
Psin	103	Proto-Sinaitic	mixed	ZZ— Not in Unicode
Qaaa-Qabx	900-949	Reserved for private use (range)		ZZ— Not in Unicode
Ranj	303	Ranjana	left-to-right	ZZ— Not in Unicode
Rjng	363	Rejang (Redjang, Kaganga)	left-to-right	Rejang	5.1	37		Ch 17.5
Rohg	167	Hanifi Rohingya	right-to-left script	Hanifi Rohingya	11.0	50		Ch 16.14
Roro	620	Rongorongo	mixed	ZZ— Not in Unicode, proposal is explored^[i]
Runr	211	Runic	left-to-right, boustrophedon	Runic	3.0	86	Ancient/historic	Ch 8.7
Samr	123	Samaritan	right-to-left script, top-to-bottom	Samaritan	5.2	61		Ch 9.4
Sara	292	Sarati	mixed	ZZ— Not in Unicode
Sarb	105	Old South Arabian	right-to-left script	Old South Arabian	5.2	32	Ancient/historic	Ch 10.2
Saur	344	Saurashtra	left-to-right	Saurashtra	5.1	82		Ch 13.13
Sgnw	095	SignWriting	vertical left-to-right	SignWriting	8.0	672		Ch 21.7
Shaw	281	Shavian (Shaw)	left-to-right	Shavian	4.0	48		Ch 8.15
Shrd	319	Sharada, Śāradā	left-to-right	Sharada	6.1	96		Ch 15.3
Shui	530	Shuishu	left-to-right	ZZ— Not in Unicode
Sidd	302	Siddham, Siddhaṃ, Siddhamātṛkā	left-to-right	Siddham	7.0	92	Ancient/historic	Ch 15.5
Sidt	180	Sidetic	right-to-left	ZZ— Not in Unicode, proposal is mature^[ii]
Sind	318	Khudawadi, Sindhi	left-to-right	Khudawadi	7.0	69		Ch 15.9
Sinh	348	Sinhala	left-to-right	Sinhala	3.0	111		Ch 13.2
Sogd	141	Sogdian	horizontal and vertical writing in East Asian scripts, top-to-bottom	Sogdian	11.0	42	Ancient/historic	Ch 14.10
Sogo	142	Old Sogdian	right-to-left script	Old Sogdian	11.0	40	Ancient/historic	Ch 14.9
Sora	398	Sora Sompeng	left-to-right	Sora Sompeng	6.1	35		Ch 15.17
Soyo	329	Soyombo	left-to-right	Soyombo	10.0	83	Ancient/historic	Ch 14.7
Sund	362	Sundanese	left-to-right	Sundanese	5.1	72		Ch 17.7
Sunu	274	Sunuwar	left-to-right	ZZ— Not in Unicode, approved for version 16.0^[iii]
Sylo	316	Syloti Nagri	left-to-right	Syloti Nagri	4.1	45	Ancient/historic	Ch 15.1
Syrc	135	Syriac	right-to-left script	Syriac	3.0	88	Includes typographic variants Estrangelo (see § Syre), Western (§ Syrj), and Eastern (§ Syrn)	Ch 9.3
Syre	138	Syriac (Estrangelo variant)	mixed	ZZ— Typographic variant of Syriac (see § Syrc)
Syrj	137	Syriac (Western variant)	mixed	ZZ— Typographic variant of Syriac (see § Syrc)
Syrn	136	Syriac (Eastern variant)	mixed	ZZ— Typographic variant of Syriac (see § Syrc)
Tagb	373	Tagbanwa	left-to-right	Tagbanwa	3.2	18		Ch 17.1
Takr	321	Takri, Ṭākrī, Ṭāṅkrī	left-to-right	Takri	6.1	68		Ch 15.4
Tale	353	Tai Le	left-to-right	Tai Le	4.0	35		Ch 16.5
Talu	354	New Tai Lue	left-to-right	New Tai Lue	4.1	83		Ch 16.6
Taml	346	Tamil	left-to-right	Tamil	1.0	123		Ch 12.6
Tang	520	Tangut	vertical right-to-left, left-to-right	Tangut	9.0	6,914	Ancient/historic	Ch 18.11
Tavt	359	Tai Viet	left-to-right	Tai Viet	5.2	72		Ch 16.8
Tayo	380	Tai Yo	top-to-bottom, columns right-to-left	ZZ— Not in Unicode, proposal is mature^[ii]
Telu	340	Telugu	left-to-right	Telugu	1.0	100		Ch 12.7
Teng	290	Tengwar	left-to-right	ZZ— Not in Unicode
Tfng	120	Tifinagh (Berber)	left-to-right, right-to-left script, top-to-bottom, bottom-to-top	Tifinagh	4.1	59		Ch 19.3
Tglg	370	Tagalog (Baybayin, Alibata)	left-to-right	Tagalog	3.2	23		Ch 17.1
Thaa	170	Thaana	right-to-left script	Thaana	3.0	50		Ch 13.1
Thai	352	Thai	left-to-right	Thai	1.0	86		Ch 16.1
Tibt	330	Tibetan	left-to-right	Tibetan	2.0	207	Added in 1.0, removed in 1.1 and reintroduced in 2.0	Ch 13.4
Tirh	326	Tirhuta	left-to-right	Tirhuta	7.0	82		Ch 15.11
Tnsa	275	Tangsa	left-to-right	Tangsa	14.0	89		Ch 13.18
Todr	229	Todhri	right-to-left	ZZ— Not in Unicode, approved for version 16.0^[iii]
Tols	299	Tolong Siki	left-to-right	ZZ— Not in Unicode, proposal is mature^[ii]
Toto	294	Toto	left-to-right	Toto	14.0	31		Ch 13.17
Tutg	341	Tulu-Tigalari	left-to-right	ZZ— Not in Unicode, approved for version 16.0^[iii]
Ugar	040	Ugaritic	left-to-right	Ugaritic	4.0	31	Ancient/historic	Ch 11.2
Vaii	470	Vai	left-to-right	Vai	5.1	300		Ch 19.5
Visp	280	Visible Speech	left-to-right	ZZ— Not in Unicode
Vith	228	Vithkuqi	left-to-right	Vithkuqi	14.0	70	Ancient/historic	Ch 8.12
Wara	262	Warang Citi (Varang Kshiti)	left-to-right	Warang Citi	7.0	84		Ch 13.9
Wcho	283	Wancho	left-to-right	Wancho	12.0	59		Ch 13.16
Wole	480	Woleai	mixed	ZZ— Not in Unicode, proposal is explored^[i]
Xpeo	030	Old Persian	left-to-right	Old Persian	4.1	50	Ancient/historic	Ch 11.3
Xsux	020	Cuneiform, Sumero-Akkadian	left-to-right	Cuneiform	5.0	1,234	Ancient/historic	Ch 11.1
Yezi	192	Yezidi	right-to-left script	Yezidi	13.0	47	Ancient/historic	Ch 9.6
Yiii	460	Yi	left-to-right	Yi	3.0	1,220		Ch 18.7
Zanb	339	Zanabazar Square (Zanabazarin Dörböljin Useg, Xewtee Dörböljin Bicig, Horizontal Square Script)	left-to-right	Zanabazar Square	10.0	72	Ancient/historic	Ch 14.6
Zinh	994	Code for inherited script		Inherited		657
Zmth	995	Mathematical notation		ZZ— Not a 'script' in Unicode
Zsym	996	Symbols		ZZ— Not a 'script' in Unicode
Zsye	993	Symbols (emoji variant)		ZZ— Not a 'script' in Unicode
Zxxx	997	Code for unwritten documents		ZZ— Not a 'script' in Unicode
Zyyy	998	Code for undetermined script		Common		8,306
Zzzz	999	Code for uncoded script		Unknown		964,234	In Unicode: All other code points
Notes
a. ^ ISO 15924 publications, as of 12 September 2023
b. ^ ISO 15924 Normative text file, as of 12 September 2023
c. ^ ISO 15924 Changes (including Aliases for Unicode; as of 12 September 2023)
d. ^ Unicode version 15.1
e. ^ Unicode charts
f. ^ Unicode uses the "Property Value Alias" (Alias) as the script-name. These Alias names are part of Unicode and are published informatively next to ISO 15924. An alias script name may be used in a character name: `Palm`, Palmyrene → U+10860 𐡠 PALMYRENE LETTER ALEPH.
g. ^ In Unicode, the Phoenician script is intended for the representation of text in Paleo-Hebrew, Archaic Phoenician, Phoenician, Early Aramaic, Late Phoenician cursive, Phoenician papyri, Siloam Hebrew, Hebrew seals, Ammonite, Moabite, and Punic.^[vi]
References
i. ^ "SEI List of Scripts Not Yet Encoded". Unicode Consortium. March 2023. Retrieved 2023-09-25.
ii. ^ "Unicode Pipeline § Code Points Provisionally Assigned for Mature Proposals". Unicode Consortium. 2023-09-12. Retrieved 2023-09-25.
iii. ^ "Unicode Pipeline § Approved for Publication in Version 16.0". Unicode Consortium. 2023-09-12. Retrieved 2023-09-25.
iv. ^ Michael Everson (1997-09-18). "Proposal to encode Klingon in Plane 1 of ISO/IEC 10646-2".
v. ^ The Unicode Consortium (2001-08-14). "Approved Minutes of the UTC 87 / L2 184 Joint Meeting".
vi. ^ "Middle East-II, Ancient Scripts" (PDF). 15.0.0. The Unicode Consortium. Retrieved 2023-09-25.
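
Python's standard library has no script lookup, so the sketch below uses a small hand-picked table of sub-ranges to show the idea that one script can span several blocks and that shared characters get the code Zyyy. The names `SCRIPT_RANGES`, `script_of` and `scripts_in` are assumptions for this example, and the table is far from complete; code points outside it yield `None` here instead of any real property value.

```python
from typing import Optional, Set

# Hand-picked sub-ranges for illustration: (first, last, ISO 15924 code).
SCRIPT_RANGES = [
    (0x0030, 0x0039, "Zyyy"),  # ASCII digits are shared, hence Common
    (0x0041, 0x005A, "Latn"),  # A-Z in the Basic Latin block
    (0x0061, 0x007A, "Latn"),  # a-z in the Basic Latin block
    (0x0391, 0x03A1, "Grek"),  # Greek capitals alpha to rho
    (0x0410, 0x044F, "Cyrl"),  # basic Russian alphabet
    (0x1E00, 0x1EFF, "Latn"),  # Latin again, in a different block
]

def script_of(cp: int) -> Optional[str]:
    """Return the script code for a code point, or None if not in the excerpt."""
    for first, last, code in SCRIPT_RANGES:
        if first <= cp <= last:
            return code
    return None

def scripts_in(text: str) -> Set[str]:
    """Collect the script codes used in a string, ignoring Common (Zyyy)."""
    found = {script_of(ord(ch)) for ch in text}
    return {code for code in found if code not in (None, "Zyyy")}

print(script_of(ord("A")), script_of(0x1E9E), script_of(ord("7")))
print(scripts_in("Ab1\u042F"))  # Latin letters, a digit and a Cyrillic letter
```

Ignoring Zyyy when collecting scripts is one way to detect mixed-script strings without flagging digits or punctuation.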

Normalization properties

(decompositions, decomposition type, canonical combining class, composition exclusions, and so on)
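
Several of the properties listed above can be inspected with Python's standard `unicodedata` module. The sketch below looks at the decomposition mapping, the decomposition type tag and the canonical combining class, and shows how the normalization forms use them; the variable names are arbitrary.

```python
import unicodedata

e_acute = "\u00E9"  # e with acute accent, precomposed
half = "\u00BD"     # vulgar fraction one half

# Canonical decomposition: no type tag, just the code point sequence.
print(unicodedata.decomposition(e_acute))
# Compatibility decomposition: the type is given in angle brackets.
print(unicodedata.decomposition(half))
# Canonical combining class of the combining acute accent.
print(unicodedata.combining("\u0301"))

nfd = unicodedata.normalize("NFD", e_acute)   # split into base + mark
nfc = unicodedata.normalize("NFC", nfd)       # recompose
nfkd = unicodedata.normalize("NFKD", half)    # apply compatibility mappings
print([f"U+{ord(c):04X}" for c in nfd], nfc == e_acute, nfkd)
```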

Age

"Age" is the version of the Standard in which the code point was first designated. The version number is shortened to the form major.minor, even though more detailed version numbers are in use: versions 4.0.0 and 4.0.1 both have the Age 4.0. Given the releases, Age can be one of: 1.0, 1.1, 2.0, 2.1, 3.0, 3.1, 3.2, 4.0, 4.1, 5.0, 5.1 and 5.2.^[2]^[3]

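The shortening of version numbers for the Age property, and a lookup by code point range, can be sketched as follows. The sample lines imitate the "range ; value" layout used by Unicode data files but are only two illustrative entries, and the names `short_age`, `parse_ages`, `age_of` and `SAMPLE_LINES` are assumptions for this example.

```python
from typing import List, Optional, Tuple

def short_age(version: str) -> str:
    """Reduce a version such as '4.0.1' to the major.minor form '4.0'."""
    major, minor, *_ = version.split(".")
    return f"{major}.{minor}"

# Two sample entries only, not the real data file.
SAMPLE_LINES = """\
# code point or range ; age
0000..001F    ; 1.1
20AC          ; 2.1
"""

def parse_ages(text: str) -> List[Tuple[int, int, str]]:
    """Parse 'first..last ; age' lines into (first, last, age) tuples."""
    table = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        code_range, age = (part.strip() for part in line.split(";"))
        first, _, last = code_range.partition("..")
        table.append((int(first, 16), int(last or first, 16), short_age(age)))
    return table

def age_of(cp: int, table: List[Tuple[int, int, str]]) -> Optional[str]:
    """Return the Age of a code point, or None if it is not in the table."""
    for first, last, age in table:
        if first <= cp <= last:
            return age
    return None

AGE_TABLE = parse_ages(SAMPLE_LINES)
print(short_age("4.0.0"), short_age("4.0.1"), age_of(0x20AC, AGE_TABLE))
```
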
Boundaries

(grapheme cluster, word, line, and sentence)

References

[fn1-2] "Table 4-4: General Category" (PDF). The Unicode Standard. Unicode Consortium. September 2022.

[fn2-3] "Table 2-3: Types of code points" (PDF). The Unicode Standard. Unicode Consortium. September 2022.

[fn6-4] "DerivedGeneralCategory.txt". The Unicode Consortium. 2022-04-26.

[fn5-5] "5.7.1 General Category Values". UTR #44: Unicode Character Database. Unicode Consortium. 2020-03-04.

[fn3-6] Unicode Character Encoding Stability Policies: Property Value Stability Stability policy: Some gc groups will never change. gc=Nd corresponds with Numeric Type=De (decimal).

[7] "Annex C: Compatibility Properties (§ word)". Unicode Regular Expressions. Version 23. Unicode Consortium. 2022-02-08. Unicode Technical Standard #18.

[fn4-8] "Table 4-9: Construction of Code Point Labels" (PDF). The Unicode Standard. Unicode Consortium. September 2022. A Code Point Label may be used to identify a nameless code point. E.g. <control-hhhh>, <control-0088>. The Name remains blank, which can prevent inadvertently replacing, in documentation, a Control Name with a true Control code. Unicode also uses <not a character> for <noncharacter>.

[cnote_a_grp_ISO_Unicode]
ISO 15924 publications, as of 12 September 2023

[cnote_b_grp_ISO_list]
ISO 15924 Normative text file, as of 12 September 2023

[cnote_c_grp_ISO_changes]
ISO 15924 Changes (including Aliases for Unicode; as of 12 September 2023)

[cnote_d_grp_Asof_Unicode_version] 
Unicode version 15.1

[cnote_e_grp_Unicode_charts] 
Unicode charts

[cnote_f_grp_Aliases_for_Unicode] 
Unicode uses the "Property Value Alias" (Alias) as the script-name. These Alias names are part of Unicode and are published informatively next to ISO 15924. An alias script name may be used in a character name: Palm, Palmyrene → U+10860 𐡠 PALMYRENE LETTER ALEPH.

[cnote_g_grp_Scripts] 
In Unicode, the Phoenician script is intended for the representation of text in Paleo-Hebrew, Archaic Phoenician, Phoenician, Early Aramaic, Late Phoenician cursive, Phoenician papyri, Siloam Hebrew, Hebrew seals, Ammonite, Moabite, and Punic.^[vi]

[uniproposed-9] "SEI List of Scripts Not Yet Encoded". Unicode Consortium. March 2023. Retrieved 2023-09-25.

[pipeline_mature-10] "Unicode Pipeline § Code Points Provisionally Assigned for Mature Proposals". Unicode Consortium. 2023-09-12. Retrieved 2023-09-25.

[pipeline_v16-11] "Unicode Pipeline § Approved for Publication in Version 16.0". Unicode Consortium. 2023-09-12. Retrieved 2023-09-25.

[12] Michael Everson (1997-09-18). "Proposal to encode Klingon in Plane 1 of ISO/IEC 10646-2".

[13] The Unicode Consortium (2001-08-14). "Approved Minutes of the UTC 87 / L2 184 Joint Meeting".

[14] "Middle East-II, Ancient Scripts" (PDF). 15.0.0. The Unicode Consortium. Retrieved 2023-09-25.

[Chapter4-1] Unicode 5.2 chapter 4

[15] Pre version 4

[16] Versions 4.0 and later

@@ Line 51: / Line 51: @@
 (decompositions, decomposition type, canonical combining class, composition exclusions, and so On)
 ===Age===
+"Age" is the version of the Standard in which the code point was first designated. The version number is shortened to the numbering major.minor, although there more detailed version numbers are used: versions 4.0.0 and 4.0.1 both are named 4.0 as Age. Given the releases, Age can be from the range: 1.0, 1.1, 2.0, 2.1, 3.0, 3.1, 3.2, 4.0, 4.1, 5.0, 5.1 and 5.2.<ref>[http://www.unicode.org/versions/components-pre4.html Pre version 4]</ref><ref>[http://www.unicode.org/versions/enumeratedversions.html Versions 4.0 and later]</ref>
-(version of the standard in which the code point was first designated)
 ===Boundaries===
 (grapheme cluster, word, line, and sentence)