Template:General Category (Unicode)

General Category (Unicode Character Property)[a]

| Value | Category (major, minor) | Basic type[b] | Character assigned[b] | Fixed[c] | Remarks |
|---|---|---|---|---|---|
| Lu | Letter, uppercase | Graphic | Character | | |
| Ll | Letter, lowercase | Graphic | Character | | |
| Lt | Letter, titlecase | Graphic | Character | | Ligatures containing uppercase followed by lowercase letters (e.g., Dž, Lj, Nj, and Dz) |
| Lm | Letter, modifier | Graphic | Character | | |
| Lo | Letter, other | Graphic | Character | | |
| Mn | Mark, nonspacing | Graphic | Character | | |
| Mc | Mark, spacing combining | Graphic | Character | | |
| Me | Mark, enclosing | Graphic | Character | | |
| Nd | Number, decimal digit | Graphic | Character | | All these, and only these, have Numeric Type = De[c] |
| Nl | Number, letter | Graphic | Character | | Numerals composed of letters or letterlike symbols (e.g., Roman numerals) |
| No | Number, other | Graphic | Character | | E.g., vulgar fractions, superscript and subscript digits |
| Pc | Punctuation, connector | Graphic | Character | | Includes "_" underscore |
| Pd | Punctuation, dash | Graphic | Character | | Includes several hyphen characters |
| Ps | Punctuation, open | Graphic | Character | | Opening bracket characters |
| Pe | Punctuation, close | Graphic | Character | | Closing bracket characters |
| Pi | Punctuation, initial quote | Graphic | Character | | Opening quotation mark. Does not include the ASCII "neutral" quotation mark. May behave like Ps or Pe depending on usage |
| Pf | Punctuation, final quote | Graphic | Character | | Closing quotation mark. May behave like Ps or Pe depending on usage |
| Po | Punctuation, other | Graphic | Character | | |
| Sm | Symbol, math | Graphic | Character | | Mathematical symbols (e.g., +, =, ×, ÷). Does not include parentheses and brackets, which are in categories Ps and Pe. Also does not include !, *, -, or /, which, despite frequent use as mathematical operators, are primarily considered to be "punctuation" |
| Sc | Symbol, currency | Graphic | Character | | Currency symbols |
| Sk | Symbol, modifier | Graphic | Character | | |
| So | Symbol, other | Graphic | Character | | |
| Zs | Separator, space | Graphic | Character | | Includes the space, but not TAB, CR, or LF, which are Cc |
| Zl | Separator, line | Format | Character | | Only U+2028 LINE SEPARATOR (LSEP) |
| Zp | Separator, paragraph | Format | Character | | Only U+2029 PARAGRAPH SEPARATOR (PSEP) |
| Cc | Other, control | Control | Character | Fixed: 65 | No name[d], <control> |
| Cf | Other, format | Format | Character | | Includes the soft hyphen, control characters to support bi-directional text, and language tag characters |
| Cs | Other, surrogate | Surrogate | Not (but abstract) | Fixed: 2,048 | No name[d], <surrogate> |
| Co | Other, private use | Private-use | Not (but abstract) | Fixed: 137,468 total (6,400 in BMP, 131,068 in Planes 15–16) | No name[d], <private-use> |
| Cn | Other, not assigned | Noncharacter | Not | Fixed: 66 | No name[d], <noncharacter> |
| Cn | Other, not assigned | Reserved | Not | Not fixed | No name[d], <reserved> |
  a. "Table 4-4: General Category" (PDF). The Unicode Standard. Unicode Consortium. July 2017.
  b. "Table 2-3: Types of code points" (PDF). The Unicode Standard. Unicode Consortium. July 2017.
  c. Unicode Character Encoding Stability Policies: Property Value Stability. Some gc groups will never change. gc=Nd corresponds with Numeric Type=De (decimal).
  d. "Table 4-13: Construction of Code Point Labels" (PDF). The Unicode Standard. Unicode Consortium. July 2017. A code point label may be used to identify a nameless code point, e.g. <control-hhhh>, <control-0088>. The Name remains blank, which helps prevent a control name in documentation from being inadvertently replaced with a true control code. Unicode also uses <not a character> for <noncharacter>.
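
The fixed counts listed in the table can be cross-checked with a short script. The following is a minimal sketch, assuming Python's standard `unicodedata` module (its data follows the Unicode version bundled with the interpreter); the names `fixed_counts` and `is_noncharacter` are illustrative only. Because the module reports noncharacters and reserved code points alike as Cn, the noncharacters are counted from their definition (U+FDD0 to U+FDEF, plus the last two code points of each of the 17 planes) rather than from the category value.

```python
import unicodedata
from collections import Counter

def fixed_counts() -> Counter:
    """Count code points in the Cc, Cs and Co categories."""
    counts = Counter()
    for cp in range(0x110000):  # every code point, U+0000..U+10FFFF
        gc = unicodedata.category(chr(cp))
        if gc in ("Cc", "Cs", "Co"):
            counts[gc] += 1
    return counts

def is_noncharacter(cp: int) -> bool:
    """True for U+FDD0..U+FDEF and for code points ending in FFFE or FFFF."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

counts = fixed_counts()
noncharacters = sum(is_noncharacter(cp) for cp in range(0x110000))
print(counts["Cc"], counts["Cs"], counts["Co"], noncharacters)

# Footnote c: gc=Nd goes together with Numeric Type=De. unicodedata.decimal
# returns the supplied default (None here) when there is no decimal value.
for ch in ("5", "\u0663", "\u00B2", "\u2167", "\u00BD"):
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.decimal(ch, None))
```

The first line printed should agree with the Cc, Cs, Co and noncharacter counts in the table. In the second loop only the two Nd characters (the ASCII digit and the Arabic-Indic digit) report a decimal value; the superscript digit, the Roman numeral and the vulgar fraction do not.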


Template documentation

General Category is a Unicode character property, defined in Chapter 4 of the Unicode Standard: "Character Properties".
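
As a rough illustration, the property can be queried programmatically. The sketch below assumes Python's standard `unicodedata` module; the `MAJOR` mapping and the `describe` helper are names invented here for illustration, with the major class derived from the first letter of the two-letter value.

```python
import unicodedata

# Major class names keyed by the first letter of the two-letter value
# (illustrative mapping, not part of the unicodedata module).
MAJOR = {
    "L": "Letter", "M": "Mark", "N": "Number", "P": "Punctuation",
    "S": "Symbol", "Z": "Separator", "C": "Other",
}

def describe(ch: str) -> tuple:
    """Return (General Category value, major class name) for one character."""
    gc = unicodedata.category(ch)
    return gc, MAJOR[gc[0]]

# Uppercase, lowercase, a titlecase digraph, underscore, hyphen-minus, plus,
# exclamation mark, space, TAB, LINE SEPARATOR, and the soft hyphen.
samples = ["A", "a", "\u01C5", "_", "-", "+", "!", " ", "\t", "\u2028", "\u00AD"]
for ch in samples:
    gc, major = describe(ch)
    print(f"U+{ord(ch):04X} {gc} ({major})")
```

The samples were chosen to mirror a few of the general categories: the hyphen-minus is reported as Pd and the exclamation mark as Po rather than Sm, and TAB is Cc while the space is Zs.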


{{General Category (Unicode)|state=}}
  • The state parameter can be set to any of the wikitable collapsing classes. The default is collapsed.

See also