
Unicode subscripts and superscripts

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by AnomieBOT (talk | contribs) at 06:37, 18 April 2016 (Substing templates: {{unicode}} and {{Unicode}}. See User:AnomieBOT/docs/TemplateSubster for info.). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

The difference between superscript/subscript and numerator/denominator glyphs. In many popular fonts the Unicode "superscript" and "subscript" characters are actually numerator and denominator glyphs.

Unicode has subscripted and superscripted versions of a number of characters, including a full set of Arabic numerals. These characters allow polynomials, chemical formulas, and certain other equations to be represented in plain text without markup such as HTML or TeX.
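As an illustration of plain-text use, the following minimal Python sketch maps ASCII digits and a few operators to their Unicode superscript and subscript counterparts. The names SUPERSCRIPTS, SUBSCRIPTS, to_superscript and to_subscript are hypothetical helpers chosen for this example, not part of any standard library.

```python
# Translation tables from ASCII digits and operators to the corresponding
# Unicode superscript and subscript characters (hypothetical helper names).
SUPERSCRIPTS = str.maketrans("0123456789+-=()", "⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾")
SUBSCRIPTS = str.maketrans("0123456789+-=()", "₀₁₂₃₄₅₆₇₈₉₊₋₌₍₎")


def to_superscript(text: str) -> str:
    """Replace every mappable character with its superscript form."""
    return text.translate(SUPERSCRIPTS)


def to_subscript(text: str) -> str:
    """Replace every mappable character with its subscript form."""
    return text.translate(SUBSCRIPTS)


# A polynomial and a chemical formula written without any markup.
polynomial = "x" + to_superscript("2") + " + 3x + 2"
water = "H" + to_subscript("2") + "O"
print(polynomial)
print(water)
```

Characters without a mapping (such as letters) pass through unchanged, so only the part of the text that should be raised or lowered is passed to these helpers.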

The World Wide Web Consortium and the Unicode Consortium have made recommendations on the choice between using markup and using superscript and subscript characters: "When used in mathematical context (MathML) it is recommended to consistently use style markup for superscripts and subscripts...However, when super and sub-scripts are to reflect semantic distinctions, it is easier to work with these meanings encoded in text rather than markup, for example, in phonetic or phonemic transcription."[1]

Uses

Most fonts that include these characters design them for mathematical numerator and denominator glyphs, which are smaller than normal characters but are aligned with the cap line and the baseline, respectively. When used with the solidus, these glyphs are useful for making arbitrary diagonal fractions (similar to the ½ glyph).

This was not the intended use of these characters when Unicode was designed. The intended use was to allow chemical and algebraic formulas to be written without markup. Proper appearance of such formulas requires true superscript and subscript glyphs. H2O written with subscript markup may therefore look better than H₂O written with a Unicode subscript in a font that has repurposed the Unicode subscripts for fractions.

Another Unicode character, the fraction slash (U+2044), is visually similar to the solidus. When used with ordinary digits (not the superscripts and subscripts), it was intended to tell a layout system to render a fraction, such as ¹¹⁄₁₂.[2] Not all font layout systems support this, although where it is supported it may improve appearance, for example through additional slanting.

Display Characters
¹/₂ U+00B9 ¹ SUPERSCRIPT ONE, U+002F / SOLIDUS, U+2082 SUBSCRIPT TWO
1⁄2 U+0031 1 DIGIT ONE, U+2044 FRACTION SLASH, U+0032 2 DIGIT TWO
¹⁄₂ U+00B9 ¹ SUPERSCRIPT ONE, U+2044 FRACTION SLASH, U+2082 SUBSCRIPT TWO
½ U+00BD ½ VULGAR FRACTION ONE HALF
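The sequences in the table above can be assembled programmatically. The sketch below is a minimal illustration that assumes non-negative integers only; the function names diagonal_fraction and plain_fraction are hypothetical and chosen for this example.

```python
# Digit-only translation tables (hypothetical helper names).
SUPERSCRIPT_DIGITS = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")
SUBSCRIPT_DIGITS = str.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")
FRACTION_SLASH = "\u2044"  # U+2044 FRACTION SLASH


def diagonal_fraction(numerator: int, denominator: int) -> str:
    """Superscript numerator + fraction slash + subscript denominator."""
    top = str(numerator).translate(SUPERSCRIPT_DIGITS)
    bottom = str(denominator).translate(SUBSCRIPT_DIGITS)
    return top + FRACTION_SLASH + bottom


def plain_fraction(numerator: int, denominator: int) -> str:
    """Ordinary digits around the fraction slash; a layout system that
    supports U+2044 may render this sequence as a fraction."""
    return f"{numerator}{FRACTION_SLASH}{denominator}"


print(diagonal_fraction(11, 12))
print(plain_fraction(1, 2))
```

Whether either form displays as a well-formed fraction depends on the font and layout system, as described above.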

Superscripts and subscripts block

The most common superscript digits (1, 2, and 3) were in ISO-8859-1 and were therefore carried over into those positions in the Latin-1 range of Unicode. The rest were placed in a dedicated section of Unicode at U+2070 to U+209F. The two tables below show these characters. Each superscript or subscript character is preceded by a normal x to show the subscripting or superscripting. The first table contains the actual Unicode characters; the second contains the equivalents using HTML markup for the subscript or superscript. Gray cells are reserved for future use; white cells are other characters from Latin-1.

Unicode characters
0 1 2 3 4 5 6 7 8 9 A B C D E F
U+00Bx x² x³ x¹
U+207x x⁰ xⁱ x⁴ x⁵ x⁶ x⁷ x⁸ x⁹ x⁺ x⁻ x⁼ x⁽ x⁾ xⁿ
U+208x x₀ x₁ x₂ x₃ x₄ x₅ x₆ x₇ x₈ x₉ x₊ x₋ x₌ x₍ x₎
U+209x xₐ xₑ xₒ xₓ xₔ xₕ xₖ xₗ xₘ xₙ xₚ xₛ xₜ
Simulated using <sup> or <sub> tags
0 1 2 3 4 5 6 7 8 9 A B C D E F
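U+00Bx x<sup>2</sup> x<sup>3</sup> x<sup>1</sup>
U+207x x<sup>0</sup> x<sup>i</sup> x<sup>4</sup> x<sup>5</sup> x<sup>6</sup> x<sup>7</sup> x<sup>8</sup> x<sup>9</sup> x<sup>+</sup> x<sup>−</sup> x<sup>=</sup> x<sup>(</sup> x<sup>)</sup> x<sup>n</sup>
U+208x x<sub>0</sub> x<sub>1</sub> x<sub>2</sub> x<sub>3</sub> x<sub>4</sub> x<sub>5</sub> x<sub>6</sub> x<sub>7</sub> x<sub>8</sub> x<sub>9</sub> x<sub>+</sub> x<sub>−</sub> x<sub>=</sub> x<sub>(</sub> x<sub>)</sub>
U+209x x<sub>a</sub> x<sub>e</sub> x<sub>o</sub> x<sub>x</sub> x<sub>ə</sub> x<sub>h</sub> x<sub>k</sub> x<sub>l</sub> x<sub>m</sub> x<sub>n</sub> x<sub>p</sub> x<sub>s</sub> x<sub>t</sub>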
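The relationship between these characters and their normal counterparts is recorded in the Unicode character database as a compatibility decomposition. The following sketch uses Python's standard unicodedata module to inspect it; the helper names describe and to_plain are hypothetical, and the exact set of characters recognized depends on the Unicode version bundled with the Python interpreter.

```python
import unicodedata


def describe(ch: str):
    """Return the code point, character name and compatibility decomposition."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.decomposition(ch))


def to_plain(text: str) -> str:
    """Fold superscripts and subscripts to normal characters using NFKC.

    Note: NFKC also folds many other compatibility characters,
    not only the superscripts and subscripts."""
    return unicodedata.normalize("NFKC", text)


# One character from the Latin-1 range and three from the dedicated block.
for ch in "\u00b2\u2074\u2082\u2099":
    print(describe(ch))

print(to_plain("x\u00b2 + H\u2082O"))
```

The decomposition strings carry a <super> or <sub> tag followed by the code point of the normal character, which is why compatibility normalization discards the raised or lowered position.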

Other superscript and subscript characters

Unicode also includes subscript and superscript characters that are intended for semantic usage, in the following blocks:

  • The Latin-1 Supplement block contains the feminine and masculine ordinal indicators ª and º.
  • The Combining Diacritical Marks block contains medieval superscript letter diacritics. These letters are written directly above other letters appearing in medieval Germanic manuscripts, and so these glyphs do not include spacing, for example uͤ. They are shown here over the placeholder ⟨◌⟩: ◌ͣ◌ͤ◌ͥ◌ͦ◌ͧ◌ͨ◌ͩ◌ͪ◌ͫ◌ͬ◌ͭ◌ͮ◌ͯ.
  • The Combining Diacritical Marks Supplement block contains additional medieval superscript letter diacritics, enough to complete the basic lowercase Latin alphabet except for q and y, a few small capitals and ligatures (ae, ao, av), and additional letters: ◌ᷓ◌ᷔ◌ᷕ◌ᷖ◌ᷗ◌ᷘ◌ᷙ◌ᷚ◌ᷛ◌ᷜ◌ᷝ◌ᷞ◌ᷟ◌ᷠ◌ᷡ◌ᷢ◌ᷣ◌ᷤ◌ᷥ◌ᷦ◌ᷧ◌ᷨ◌ᷩ◌ᷪ◌ᷫ◌ᷬ◌ᷭ◌ᷮ◌ᷯ◌ᷰ◌ᷱ◌ᷲ◌ᷳ◌ᷴ. There is also a combining subscript: ◌᷊.
  • The Spacing Modifier Letters block has superscripted letters and symbols used for phonetic transcription: ʰ ʱ ʲ ʳ ʴ ʵ ʶ ʷ ʸ ˀ ˁ ˠ ˡ ˢ ˣ ˤ.
  • The Phonetic Extensions block has several sub- and super-scripted letters and symbols: Latin/IPA ᴬ ᴭ ᴮ ᴯ ᴰ ᴱ ᴲ ᴳ ᴴ ᴵ ᴶ ᴷ ᴸ ᴹ ᴺ ᴻ ᴼ ᴽ ᴾ ᴿ ᵀ ᵁ ᵂ ᵃ ᵄ ᵅ ᵆ ᵇ ᵈ ᵉ ᵊ ᵋ ᵌ ᵍ ᵏ ᵐ ᵑ ᵒ ᵓ ᵖ ᵗ ᵘ ᵚ ᵛ ᵢ ᵣ ᵤ ᵥ, Greek ᵝ ᵞ ᵟ ᵠ ᵡ ᵦ ᵧ ᵨ ᵩ ᵪ, Cyrillic ᵸ, other ᵎ ᵔ ᵕ ᵙ ᵜ. These are intended to indicate secondary articulation.
  • The Phonetic Extensions Supplement block has several more: ᶛ ᶜ ᶝ ᶞ ᶟ ᶠ ᶡ ᶢ ᶣ ᶤ ᶥ ᶦ ᶧ ᶨ ᶩ ᶪ ᶫ ᶬ ᶭ ᶮ ᶯ ᶰ ᶱ ᶲ ᶳ ᶴ ᶵ ᶶ ᶷ ᶸ ᶹ ᶺ ᶻ ᶼ ᶽ ᶾ ᶿ.
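The difference between the combining letters and the spacing modifier letters listed above can be checked with Python's standard unicodedata module. This is a minimal sketch; the helper name is_combining_mark and the constant names are hypothetical.

```python
import unicodedata

COMBINING_SMALL_E = "\u0364"  # COMBINING LATIN SMALL LETTER E
MODIFIER_SMALL_H = "\u02b0"   # MODIFIER LETTER SMALL H


def is_combining_mark(ch: str) -> bool:
    """True if the character attaches to the preceding base character."""
    return unicodedata.combining(ch) != 0


# The combining letter is a separate code point that renders above the "u".
u_with_e = "u" + COMBINING_SMALL_E
print(u_with_e, len(u_with_e))

for ch in (COMBINING_SMALL_E, MODIFIER_SMALL_H):
    print(unicodedata.name(ch), unicodedata.category(ch), is_combining_mark(ch))
```

The combining letter is reported as a nonspacing mark (category Mn), while the modifier letter is a spacing character (category Lm) that occupies its own position in the line.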

Taken together, the Unicode standard contains superscript and subscript versions of a rather arbitrary subset of Latin and Greek letters. This subset may grow in future versions of the standard. The characters are arranged below in order for comparison (or for cut-and-paste convenience). Because they come from different ranges, they may differ in size and position, depending on the typeface:

Latin superscripts and subscripts
Letter A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Superscript capital ᴿ
Superscript small cap
Superscript minuscule ʰ ʲ ˡ ʳ ˢ ʷ ˣ ʸ
Subscript minuscule
Overscript small cap ◌ᷛ ◌ᷞ ◌ᷟ ◌ᷡ ◌ᷢ
Overscript minuscule ◌ͣ ◌ᷨ ◌ͨ ◌ͩ ◌ͤ ◌ᷫ ◌ᷚ ◌ͪ ◌ͥ ◌ᷜ ◌ᷝ ◌ͫ ◌ᷠ ◌ͦ ◌ᷮ ◌ͬ ◌ᷤ ◌ͭ ◌ͧ ◌ͮ ◌ᷱ ◌ͯ ◌ᷦ
Underscript minuscule ◌᷊
Greek superscripts and subscripts
Letter Α Β Γ Δ Ε Ζ Η Θ Ι Κ Λ Μ Ν Ξ Ο Π Ρ Σ Τ Υ Φ Χ Ψ Ω
Superscript minuscule ᶿ
Subscript minuscule

Composite characters

Primarily for compatibility with earlier character sets, Unicode contains a number of characters that combine superscripts and subscripts with other symbols. In most fonts these render much better than attempts to construct the same symbols from the characters above or by using markup.

References

  1. Martin Dürst, Asmus Freytag (16 May 2007). "Unicode in XML and other Markup Languages". W3C. Retrieved 13 September 2010.
  2. Martin Dürst, Asmus Freytag (16 May 2007). "Fraction Slash". W3C. Retrieved 13 September 2010.