Template:General Category (Unicode)

From Wikipedia, the free encyclopedia
General Category (Unicode Character Property)[a]

| Value | Category (major, minor) | Basic type[b] | Character assigned[b] | Fixed[c] | Remarks |
|---|---|---|---|---|---|
| Lu | Letter, uppercase | Graphic | Character | | |
| Ll | Letter, lowercase | Graphic | Character | | |
| Lt | Letter, titlecase | Graphic | Character | | Ligatures containing an uppercase letter followed by lowercase letters (e.g., ǅ, ǈ, ǋ, and ǲ) |
| Lm | Letter, modifier | Graphic | Character | | |
| Lo | Letter, other | Graphic | Character | | |
| Mn | Mark, nonspacing | Graphic | Character | | |
| Mc | Mark, spacing combining | Graphic | Character | | |
| Me | Mark, enclosing | Graphic | Character | | |
| Nd | Number, decimal digit | Graphic | Character | | All these, and only these, have Numeric Type = De[c] |
| Nl | Number, letter | Graphic | Character | | Numerals composed of letters or letterlike symbols (e.g., Roman numerals) |
| No | Number, other | Graphic | Character | | E.g., vulgar fractions, superscript and subscript digits |
| Pc | Punctuation, connector | Graphic | Character | | Includes "_" underscore |
| Pd | Punctuation, dash | Graphic | Character | | Includes several hyphen characters |
| Ps | Punctuation, open | Graphic | Character | | Opening bracket characters |
| Pe | Punctuation, close | Graphic | Character | | Closing bracket characters |
| Pi | Punctuation, initial quote | Graphic | Character | | Opening quotation mark. Does not include the ASCII "neutral" quotation mark. May behave like Ps or Pe depending on usage |
| Pf | Punctuation, final quote | Graphic | Character | | Closing quotation mark. May behave like Ps or Pe depending on usage |
| Po | Punctuation, other | Graphic | Character | | |
| Sm | Symbol, math | Graphic | Character | | |
| Sc | Symbol, currency | Graphic | Character | | |
| Sk | Symbol, modifier | Graphic | Character | | |
| So | Symbol, other | Graphic | Character | | |
| Zs | Separator, space | Graphic | Character | | Includes the space, but not TAB, CR, or LF, which are Cc |
| Zl | Separator, line | Format | Character | | Only U+2028 LINE SEPARATOR (LSEP) |
| Zp | Separator, paragraph | Format | Character | | Only U+2029 PARAGRAPH SEPARATOR (PSEP) |
| Cc | Other, control | Control | Character | Fixed: 65 | No name[d], <control> |
| Cf | Other, format | Format | Character | | Includes the soft hyphen, control characters to support bi-directional text, and language tag characters |
| Cs | Other, surrogate | Surrogate | Not (but abstract) | Fixed: 2,048 | No name[d], <surrogate> |
| Co | Other, private use | Private-use | Not (but abstract) | Fixed: 137,468 total (6,400 in BMP, 131,068 in Planes 15–16) | No name[d], <private-use> |
| Cn | Other, not assigned | Noncharacter | Not | Fixed: 66 | No name[d], <noncharacter> |
| Cn | Other, not assigned | Reserved | Not | Not fixed | No name[d], <reserved> |
  a. "Table 4-9: General Category" (PDF). The Unicode Standard. Unicode Consortium. July 2016.
  b. "Table 2-3: Types of code points" (PDF). The Unicode Standard. Unicode Consortium. July 2016.
  c. "Unicode Character Encoding Stability Policies: Property Value Stability". Stability policy: some gc groups will never change; gc=Nd corresponds with Numeric Type=De (decimal).
  d. "Table 4-13: Construction of Code Point Labels" (PDF). The Unicode Standard. Unicode Consortium. July 2016. A code point label may be used to identify a nameless code point, e.g. <control-hhhh>, <control-0088>. The Name remains blank, which can prevent inadvertently replacing, in documentation, a control name with a true control code. Unicode also uses <not a character> for <noncharacter>.
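
The fixed counts listed in the table can be cross-checked with a short script. The following is only a sketch in Python, using the standard library's `unicodedata` module; the helper names `is_noncharacter` and `fixed_category_counts` are made up for this illustration and are not part of the template or the Unicode Standard. It assumes that the interpreter's Unicode database reports noncharacters as Cn, so the 66 noncharacters are counted from their code point ranges rather than from a category value.

```python
import unicodedata
from collections import Counter

MAX_CODE_POINT = 0x10FFFF


def is_noncharacter(cp: int) -> bool:
    """True for U+FDD0..U+FDEF and for the last two code points of each plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def fixed_category_counts() -> dict:
    """Count the code points in the categories the table marks as fixed."""
    by_category = Counter()
    private_use_bmp = 0
    noncharacters = 0
    for cp in range(MAX_CODE_POINT + 1):
        gc = unicodedata.category(chr(cp))
        by_category[gc] += 1
        if gc == "Co" and cp <= 0xFFFF:
            private_use_bmp += 1  # private use area inside the BMP
        if is_noncharacter(cp):
            noncharacters += 1
    return {
        "Cc": by_category["Cc"],
        "Cs": by_category["Cs"],
        "Co": by_category["Co"],
        "Co in BMP": private_use_bmp,
        "Co in Planes 15-16": by_category["Co"] - private_use_bmp,
        "noncharacters": noncharacters,
    }


if __name__ == "__main__":
    for label, count in fixed_category_counts().items():
        print(f"{label}: {count}")
```

The loop visits all 1,114,112 code points, so it takes a moment to run. Reserved code points are left out on purpose, because their number is not fixed and changes with each Unicode version.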


Template documentation

General Category is a Unicode character property, defined in Chapter 4 of the Unicode Standard: "Character Properties".
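
As an illustration of how the property is queried in practice, the sketch below looks up the General Category of a few characters with Python's standard `unicodedata.category` function. The `MAJOR_CLASS` mapping and the `general_category` helper are hypothetical names introduced here for the example; the two-letter values themselves come from the Unicode database bundled with the interpreter.

```python
import unicodedata

# First letter of the two-letter value gives the major class.
MAJOR_CLASS = {
    "L": "Letter",
    "M": "Mark",
    "N": "Number",
    "P": "Punctuation",
    "S": "Symbol",
    "Z": "Separator",
    "C": "Other",
}


def general_category(ch: str) -> tuple:
    """Return (two-letter value, major class name) for a single character."""
    value = unicodedata.category(ch)
    return value, MAJOR_CLASS[value[0]]


if __name__ == "__main__":
    samples = ["A", "a", "\u01c5", "5", "_", " ", "\t", "\u2028", "\u00ad", "\ue000"]
    for ch in samples:
        value, major = general_category(ch)
        print(f"U+{ord(ch):04X}: {value} ({major})")
```

For example, `general_category("A")` returns `("Lu", "Letter")`, and `general_category("\t")` returns `("Cc", "Other")`, since TAB is a control character and not a space separator.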


{{General Category (Unicode)|state=}}
  • The state parameter can be set to any of the wikitable collapsing classes. The default is collapsed.

See also