Phonetic symbols in Unicode


Unicode supports several phonetic scripts and notations through the existing writing systems and the addition of extra blocks with phonetic characters. These phonetic extras are derived from an existing script, usually Latin, Greek or Cyrillic. In Unicode there is no "IPA script". Apart from IPA, extensions to the IPA, and obsolete and nonstandard IPA symbols, these blocks also contain characters from the Uralic Phonetic Alphabet and the Americanist Phonetic Alphabet.

Phonetic scripts

The International Phonetic Alphabet (IPA), like most phonetic scripts, makes use of letters from other writing systems. IPA notably uses Latin, Greek and Cyrillic characters. Combining diacritics also add meaning to the phonetic text. Finally, these phonetic alphabets make use of modifier letters, which are specially constructed for their phonetic meaning. A "modifier letter" is strictly intended not as an independent grapheme but as a modification of the preceding character[1], resulting in a distinct grapheme, notably in the context of the International Phonetic Alphabet. For example, ʰ should not occur on its own but modifies the preceding or following symbol. Thus, tʰ is a single IPA symbol, distinct from t. In practice, however, several of these "modifier letters" are also used as full graphemes, e.g. ʿ transliterating Semitic ayin or Hawaiian ʻokina, or ˚ transliterating Abkhaz ә.
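As a rough illustration of the distinction drawn here, the following Python sketch uses the standard unicodedata module to show that tʰ is stored as two code points, the second of which is a modifier letter (general category Lm), whereas a combining diacritic such as U+0325 is a nonspacing mark with a non-zero combining class. The helper name describe is an assumption made for this example only.

```python
import unicodedata

def describe(text):
    """Return (code point, general category, name) for each character in text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))
        for ch in text
    ]

# "tʰ" spelled with an explicit escape: t followed by MODIFIER LETTER SMALL H
aspirated_t = "t\u02b0"
for row in describe(aspirated_t):
    print(row)

# A combining diacritic (COMBINING RING BELOW) is a nonspacing mark instead.
ring_below = "\u0325"
print(unicodedata.category(ring_below), unicodedata.combining(ring_below))
```

The modifier letter occupies its own position in the text, while the combining mark is rendered on the preceding base character.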

From IPA to Unicode

Consonants

The following tables indicate the Unicode code point sequences for phonemes as used in the International Phonetic Alphabet. A bold code point indicates that the Unicode chart provides an application note such as "voiced retroflex lateral" for U+026D ɭ LATIN SMALL LETTER L WITH RETROFLEX HOOK (HTML &#621;). An entry in bold italics indicates that the character name itself refers to a phoneme, such as U+0298 ʘ LATIN LETTER BILABIAL CLICK (HTML &#664;).
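The two notations used in this section, U+XXXX code point sequences and decimal HTML character references, can both be derived from a string. The Python sketch below shows one way to do so; the function names to_code_points and to_html_reference are hypothetical and chosen only for this example.

```python
def to_code_points(text):
    """Format a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)

def to_html_reference(text):
    """Format a string as decimal HTML numeric character references."""
    return "".join(f"&#{ord(ch)};" for ch in text)

retroflex_lateral = "\u026d"         # LATIN SMALL LETTER L WITH RETROFLEX HOOK
dental_ejective = "t\u032a\u02bc"    # t + COMBINING BRIDGE BELOW + MODIFIER LETTER APOSTROPHE

print(to_code_points(retroflex_lateral))     # U+026D
print(to_html_reference(retroflex_lateral))  # &#621;
print(to_code_points(dental_ejective))       # U+0074 U+032A U+02BC
```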


  Basic Latin/Greek   Latin extended   IPA extension

Bilabial, Labiodental, Dental, Alveolar, Postalveolar, Retroflex, Labialized palatal, Postalveolar-velar

Plosive: p (U+0070), b (U+0062), p̪ (U+0070 U+032A), b̪ (U+0062 U+032A), t̪ (U+0074 U+032A), d̪ (U+0064 U+032A), t (U+0074), d (U+0064), ʈ (U+0288), ɖ (U+0256)
Implosive: ɓ̥ (U+0253 U+0325), ɓ (U+0253), ɗ̪ (U+0257 U+032A), ɗ (U+0257), *
Ejective: pʼ (U+0070 U+02BC), t̪ʼ (U+0074 U+032A U+02BC), tʼ (U+0074 U+02BC), ʈʼ (U+0288 U+02BC)
Nasal: m̥ (U+006D U+0325), m (U+006D), ɱ̊ (U+0271 U+030A), ɱ (U+0271), n̪̊ (U+006E U+032A U+030A), n̪ (U+006E U+032A), n̥ (U+006E U+0325), n (U+006E), ɳ̊ (U+0273 U+030A), ɳ (U+0273)
Trill: ʙ (U+0299), r̥ (U+0072 U+0325), r (U+0072), *
Tap or Flap: ⱱ̟ (U+2C71 U+031F), ⱱ (U+2C71), ɾ (U+027E), ɽ (U+027D)
Lateral flap: ɺ (U+027A), *
Fricative: ɸ (U+0278), β (U+03B2), f (U+0066), v (U+0076), θ (U+03B8), ð (U+00F0), s (U+0073), z (U+007A), ʃ (U+0283), ʒ (U+0292), ʂ (U+0282), ʐ (U+0290), ɧ (U+0267)
Lateral fricative: ɬ (U+026C), ɮ (U+026E), ꞎ (U+A78E)
Ejective fricative: sʼ (U+0073 U+02BC), ʃʼ (U+0283 U+02BC)
Ejective lateral fricative: ɬʼ (U+026C U+02BC)
Percussive: ʬ (U+02AC), ʭ (U+02AD)
Approximant: β̞̊ (U+03B2 U+031E U+030A), β̞ (U+03B2 U+031E), ʋ̥ (U+028B U+0325), ʋ (U+028B), ð̞ (U+00F0 U+031E), ɹ̥ (U+0279 U+0325), ɹ (U+0279), ɻ̊ (U+027B U+030A), ɻ (U+027B), ɥ̊ (U+0265 U+030A), ɥ (U+0265)
Lateral approximant: l̥ (U+006C U+0325), l (U+006C), ɭ (U+026D)
Click consonant: ʘ (U+0298), ǀ (U+01C0), ǃ (U+01C3), ǃ / ǂ (U+01C3 / U+01C2)
Lateral click: *, ǁ (U+01C1)

Alveolo-palatal, Palatal, Labial-velar, Velar, Uvular, Pharyngeal, Epiglottal, Glottal

Plosive: ȶ (U+0236), ȡ (U+0221), c (U+0063), ɟ (U+025F), k͡p (U+006B U+0361 U+0070), ɡ͡b (U+0261 U+0361 U+0062), k (U+006B), ɡ (U+0261), q (U+0071), ɢ (U+0262), ʡ (U+02A1), ʔ (U+0294)
Implosive: ʄ (U+0284), ɠ (U+0260), ʛ (U+029B)
Ejective: cʼ (U+0063 U+02BC), kʼ (U+006B U+02BC), qʼ (U+0071 U+02BC)
Nasal: ȵ (U+0235), ɲ (U+0272), ŋ͡m (U+014B U+0361 U+006D), ŋ (U+014B), ɴ (U+0274)
Trill: ʀ (U+0280), *
Tap or Flap: *
Lateral flap: *, *
Fricative: ɕ (U+0255), ʑ (U+0291), ç (U+00E7), ʝ (U+029D), x (U+0078), ɣ (U+0263), χ (U+03C7), ʁ (U+0281), ħ (U+0127), ʕ (U+0295), ʜ (U+029C), ʢ (U+02A2), h (U+0068), ɦ (U+0266)
Approximant: j (U+006A), ʍ (U+028D), w (U+0077), ɰ (U+0270)
Lateral approximant: ȴ (U+0234), ʎ (U+028E), ʟ (U+029F)
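A code point sequence listed for a consonant can be turned into the corresponding text with a few lines of Python. The sketch below assumes the sequences are written in the "U+XXXX U+XXXX" style used here; the function name from_code_points is hypothetical.

```python
import re

def from_code_points(notation):
    """Build a string from a sequence such as 'U+006B U+0361 U+0070'."""
    hex_values = re.findall(r"U\+([0-9A-Fa-f]{4,6})", notation)
    return "".join(chr(int(value, 16)) for value in hex_values)

# Labial-velar plosive: k + COMBINING DOUBLE INVERTED BREVE + p
labial_velar = from_code_points("U+006B U+0361 U+0070")
# Velar ejective: k + MODIFIER LETTER APOSTROPHE
velar_ejective = from_code_points("U+006B U+02BC")

print(labial_velar, len(labial_velar))      # one symbol on screen, three code points
print(velar_ejective, len(velar_ejective))  # two code points
```

Because a single phonetic symbol may span several code points, string length is not a reliable count of symbols.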

Vowels

The following table lists the phonetic vowels and their Unicode / UCS code points, arranged to represent the phonetic vowel trapezium. Where vowels appear in pairs, the first is unrounded and the second rounded (unrounded • rounded). Again, characters with Unicode names referring to phonemes are indicated by bold text. Those with explicit application notes are indicated by bold italic text. Those borrowed unchanged from another script (Latin, Greek or Cyrillic) are indicated by italics.

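Unicode code points for phonetic vowels

Close: Front i (U+0069) • y (U+0079); Central ɨ (U+0268) • ʉ (U+0289); Back ɯ (U+026F) • u (U+0075)
Near-close: Front ɪ (U+026A) • ʏ (U+028F); Central ɪ̈ (U+026A U+0308) • ʊ̈ (U+028A U+0308); Back ʊ (U+028A)
Close-mid: Front e (U+0065) • ø (U+00F8); Central ɘ (U+0258) • ɵ (U+0275); Back ɤ (U+0264) • o (U+006F)
Mid: Front e̞ (U+0065 U+031E) • ø̞ (U+00F8 U+031E); Central ə (U+0259); Back ɤ̞ (U+0264 U+031E) • o̞ (U+006F U+031E)
Open-mid: Front ɛ (U+025B) • œ (U+0153); Central ɜ (U+025C) • ɞ (U+025E); Back ʌ (U+028C) • ɔ (U+0254)
Near-open: Front æ (U+00E6); Central ɐ (U+0250)
Open: Front a (U+0061) • ɶ (U+0276); Central ä (U+0061 U+0308); Back ɑ (U+0251) • ɒ (U+0252)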
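Some vowel sequences listed here have a precomposed equivalent elsewhere in Unicode, which matters when comparing strings. The Python sketch below, using the standard unicodedata.normalize function, shows that the open central vowel written as U+0061 U+0308 composes to U+00E4 under NFC, whereas U+026A U+0308 has no precomposed form and stays as two code points. The helper same_symbol is a hypothetical name for this example.

```python
import unicodedata

open_central = "a\u0308"             # a + COMBINING DIAERESIS
near_close_central = "\u026a\u0308"  # ɪ + COMBINING DIAERESIS

composed_a = unicodedata.normalize("NFC", open_central)
composed_i = unicodedata.normalize("NFC", near_close_central)

print(len(open_central), len(composed_a))        # 2 1
print(len(near_close_central), len(composed_i))  # 2 2

def same_symbol(left, right):
    """Compare two strings after canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", left) == unicodedata.normalize("NFD", right)

print(same_symbol("\u00e4", open_central))  # True
```

Normalizing both sides before comparison avoids treating the two spellings of the same vowel as different.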

Diacritics

Diacritics may be encoded as either modifier (e.g. ˳) or combining (e.g. ◌̥) characters.

Voiceless: ˳ ◌̥ (U+02F3 • U+0325)
Breathy Voiced: ◌̤ (U+0324)
Dental: ◌̪ (U+032A)
Syllabic: ˌ ◌̩ (U+02CC • U+0329)
Voiced: ˬ ◌̬ (U+02EC • U+032C)
Creaky Voiced: ˷ ◌̰ (U+02F7 • U+0330)
Apical: ˽ ◌̺ (U+02FD • U+033A)
Non-syllabic: ◌̯ (U+032F)
Aspirated: ʰ (U+02B0)
Linguolabial: ◌̼ (U+033C)
Laminal: ◌̻ (U+033B)
More Rounded: ˒ ◌̹ (U+02D2 • U+0339)
Labialized: ʷ (U+02B7)
Nasalized: ◌̃ (U+0303)
Palatalized: ʲ (U+02B2)
Less Rounded: ˓ ◌̜ (U+02D3 • U+031C)
Advanced: ˖ ◌̟ (U+02D6 • U+031F)
Nasal release: ⁿ (U+207F)
Centralized: ¨ ◌̈ (U+00A8[1] • U+0308)
Velarized: ˠ (U+02E0)
Retracted: ˍ ◌̠ (U+02CD • U+0320)
Lateral release: ˡ (U+02E1)
Mid-Centralized: ˟ ◌̽ (U+02DF • U+033D)
Pharyngealized: ˤ (U+02E4)
Advanced Tongue Root: ◌̘ (U+0318)
No audible release: ˺ ◌̚ (U+02FA • U+031A)
Raised: ˔ ◌̝ (U+02D4 • U+031D)
Velarized or Pharyngealized: ◌̴ (U+0334)
Retracted Tongue Root: ◌̙ (U+0319)
Rhoticity: ˞ (U+02DE)
Lowered: ˕ ◌̞ (U+02D5 • U+031E)
Lengthened: ː (U+02D0)
Notes
1.^ The codepoint refers to diaeresis, which takes up space but is not a Spacing Modifier Letter.
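The difference between the spacing and combining encodings of a diacritic can be checked programmatically. The following Python sketch takes a few of the pairs listed above and tests each character's general category with the standard unicodedata module; the names PAIRS, is_combining and attach are assumptions made for this illustration.

```python
import unicodedata

# (spacing form, combining form) pairs copied from the entries above
PAIRS = {
    "voiceless": ("\u02f3", "\u0325"),
    "syllabic": ("\u02cc", "\u0329"),
    "advanced": ("\u02d6", "\u031f"),
    "lowered": ("\u02d5", "\u031e"),
}

def is_combining(ch):
    """True if ch is a combining mark (general category Mn, Mc or Me)."""
    return unicodedata.category(ch).startswith("M")

def attach(base, diacritic_name):
    """Append the combining form of the named diacritic to a base letter."""
    return base + PAIRS[diacritic_name][1]

for name, (spacing, combining) in PAIRS.items():
    print(name, is_combining(spacing), is_combining(combining))

syllabic_n = attach("n", "syllabic")
print(syllabic_n, len(syllabic_n))
```

The combining form attaches to the preceding base letter when rendered, while the spacing form takes up its own position in the line.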

Unicode blocks

Unicode blocks with many phonetic symbols

Six Unicode blocks contain many phonetic symbols:

IPA Extensions (U+0250–02AF)

IPA Extensions[1]
Official Unicode Consortium code chart (PDF)
  0 1 2 3 4 5 6 7 8 9 A B C D E F
U+025x ɐ ɑ ɒ ɓ ɔ ɕ ɖ ɗ ɘ ə ɚ ɛ ɜ ɝ ɞ ɟ
U+026x ɠ ɡ ɢ ɣ ɤ ɥ ɦ ɧ ɨ ɩ ɪ ɫ ɬ ɭ ɮ ɯ
U+027x ɰ ɱ ɲ ɳ ɴ ɵ ɶ ɷ ɸ ɹ ɺ ɻ ɼ ɽ ɾ ɿ
U+028x ʀ ʁ ʂ ʃ ʄ ʅ ʆ ʇ ʈ ʉ ʊ ʋ ʌ ʍ ʎ ʏ
U+029x ʐ ʑ ʒ ʓ ʔ ʕ ʖ ʗ ʘ ʙ ʚ ʛ ʜ ʝ ʞ ʟ
U+02Ax ʠ ʡ ʢ ʣ ʤ ʥ ʦ ʧ ʨ ʩ ʪ ʫ ʬ ʭ ʮ ʯ
Notes
1.^ As of Unicode version 14.0
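A chart in this 16-column layout can be regenerated from the block range alone. The Python sketch below is one possible way to do it; the function name block_chart and the middle-dot placeholder for unassigned code points are assumptions of this example, and the result reflects the Unicode version bundled with the Python interpreter, which may differ from 14.0.

```python
import unicodedata

def block_chart(first, last):
    """Return chart rows for the inclusive code point range first..last.

    Each row covers 16 code points. Unassigned code points (general
    category Cn) are shown as a middle dot placeholder.
    """
    rows = []
    for row_start in range(first & ~0xF, last + 1, 16):
        cells = []
        for code_point in range(row_start, row_start + 16):
            ch = chr(code_point)
            cells.append("\u00b7" if unicodedata.category(ch) == "Cn" else ch)
        rows.append(f"U+{row_start >> 4:03X}x " + " ".join(cells))
    return rows

print("  0 1 2 3 4 5 6 7 8 9 A B C D E F")
for row in block_chart(0x0250, 0x02AF):
    print(row)
```

Calling block_chart(0x1D00, 0x1D7F) or block_chart(0xA700, 0xA71F) produces the rows for the other blocks in the same way.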

Spacing Modifier Letters (U+02B0–02FF)

The characters in the "Spacing Modifier Letters" block are intended to form a unit with the preceding letter (which they "modify"). For example, the character U+02B0 ʰ MODIFIER LETTER SMALL H is not intended simply as a superscript h, but as the mark of aspiration placed after the letter being aspirated, as in pʰ, the "aspirated voiceless bilabial plosive". The block contains:

  • Latin superscript modifier letters: (U+02B0–U+02B8): ʰ aspiration; ʱ breathy voice, murmured; ʲ palatalization; ʳ, ʴ, ʵ, ʶ r-coloring or r-offglides; ʷ labialization; ʸ palatalization, Americanist usage for U+02B2
  • Miscellaneous phonetic modifiers: (U+02B9–U+02D7): ʹ ʺ ʻ ʼ ʽ ʾ ʿ ˀ ˁ ˂ ˃ ˄ ˅ ˆ ˇ ˈ ˉ ˊ ˋ ˌ ˍ ˎ ˏ ː ˑ ˒ ˓ ˔ ˕ ˖ ˗
  • Spacing clones of diacritics: (U+02D8–U+02DD): ˘ breve; ˙ dot above; ˚ ring above; ˛ ogonek; ˜ small tilde; ˝ double acute accent
  • Additions based on 1989 IPA: (U+02DE–U+02E4): ˞ ˟ ˠ ˡ ˢ ˣ ˤ
  • Tone letters: (U+02E5–U+02E9): ˥ ˦ ˧ ˨ ˩
  • Extended Bopomofo tone marks: U+02EA ˪ MODIFIER LETTER YIN DEPARTING TONE MARK; U+02EB ˫ MODIFIER LETTER YANG DEPARTING TONE MARK
  • IPA modifiers: U+02EC ˬ MODIFIER LETTER VOICING; U+02ED ˭ MODIFIER LETTER UNASPIRATED
  • Other modifier letters: U+02EE ˮ MODIFIER LETTER DOUBLE APOSTROPHE for Nenets
  • Uralic Phonetic Alphabet (UPA) modifiers: (U+02EF–U+02FF): ˯ ˰ ˱ ˲ ˳ ˴ ˵ ˶ ˷ ˸ ˹ ˺ ˻ ˼ ˽ ˾ ˿
Spacing Modifier Letters[1]
Official Unicode Consortium code chart (PDF)
  0 1 2 3 4 5 6 7 8 9 A B C D E F
U+02Bx ʰ ʱ ʲ ʳ ʴ ʵ ʶ ʷ ʸ ʹ ʺ ʻ ʼ ʽ ʾ ʿ
U+02Cx ˀ ˁ ˂ ˃ ˄ ˅ ˆ ˇ ˈ ˉ ˊ ˋ ˌ ˍ ˎ ˏ
U+02Dx ː ˑ ˒ ˓ ˔ ˕ ˖ ˗ ˘ ˙ ˚ ˛ ˜ ˝ ˞ ˟
U+02Ex ˠ ˡ ˢ ˣ ˤ ˥ ˦ ˧ ˨ ˩ ˪ ˫ ˬ ˭ ˮ ˯
U+02Fx ˰ ˱ ˲ ˳ ˴ ˵ ˶ ˷ ˸ ˹ ˺ ˻ ˼ ˽ ˾ ˿
Notes
1.^ As of Unicode version 14.0
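The sub-ranges listed for this block lend themselves to a simple lookup table. The Python sketch below maps a character to the group it falls in; the names SUBRANGES and subrange_of are hypothetical, and placing U+02ED in the same group as U+02EC is an assumption of this example.

```python
# Sub-ranges of the Spacing Modifier Letters block, following the list above.
SUBRANGES = [
    (0x02B0, 0x02B8, "Latin superscript modifier letters"),
    (0x02B9, 0x02D7, "Miscellaneous phonetic modifiers"),
    (0x02D8, 0x02DD, "Spacing clones of diacritics"),
    (0x02DE, 0x02E4, "Additions based on 1989 IPA"),
    (0x02E5, 0x02E9, "Tone letters"),
    (0x02EA, 0x02EB, "Extended Bopomofo tone marks"),
    (0x02EC, 0x02ED, "IPA modifiers"),  # assumption: U+02ED grouped with U+02EC
    (0x02EE, 0x02EE, "Other modifier letters"),
    (0x02EF, 0x02FF, "Uralic Phonetic Alphabet (UPA) modifiers"),
]

def subrange_of(ch):
    """Return the sub-range label for ch, or None if it is outside the block."""
    code_point = ord(ch)
    for first, last, label in SUBRANGES:
        if first <= code_point <= last:
            return label
    return None

print(subrange_of("\u02b0"))  # Latin superscript modifier letters
print(subrange_of("\u02d0"))  # Miscellaneous phonetic modifiers
print(subrange_of("a"))       # None
```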

Phonetic Extensions (U+1D00–1D7F)

This block, together with Phonetic Extensions Supplement below, contains:

  • Small capitals "ɢ ɪ ɴ ɶ ʀ ʏ ʙ ʜ ʟ"
  • Turned small letters "ɐ ɥ ɯ ɹ ɺ ɻ ʇ ʌ ʍ ʎ ʞ ʮ ʯ"
  • Extra small capitals "ʁ ʛ ᴀ ᴁ ᴃ ᴄ ᴅ ᴆ ᴇ ᴊ ᴋ ᴌ ᴍ ᴎ ᴏ ᴐ ᴘ ᴙ ᴚ ᴛ ᴜ ᴠ ᴡ ᴢ ᴣ ᴦ ᴧ ᴨ ᴩ ᴪ"
  • Letters with palatal hooks "ƫ ᶀ ᶁ ᶂ ᶃ ᶄ ᶅ ᶆ ᶇ ᶈ ᶉ ᶊ ᶋ ᶌ ᶍ ᶎ ᶪ ᶵ"
  • Letters with retroflex hooks "ᶏ ᶐ ᶒ ᶓ ᶔ ᶕ ᶖ ᶗ ᶘ ᶙ ᶚ ᶩ ᶯ ᶼ"
Phonetic Extensions[1]
Official Unicode Consortium code chart (PDF)
  0 1 2 3 4 5 6 7 8 9 A B C D E F
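U+1D0x ᴀ ᴁ ᴂ ᴃ ᴄ ᴅ ᴆ ᴇ ᴈ ᴉ ᴊ ᴋ ᴌ ᴍ ᴎ ᴏ
U+1D1x ᴐ ᴑ ᴒ ᴓ ᴔ ᴕ ᴖ ᴗ ᴘ ᴙ ᴚ ᴛ ᴜ ᴝ ᴞ ᴟ
U+1D2x ᴠ ᴡ ᴢ ᴣ ᴤ ᴥ ᴦ ᴧ ᴨ ᴩ ᴪ ᴫ ᴬ ᴭ ᴮ ᴯ
U+1D3x ᴰ ᴱ ᴲ ᴳ ᴴ ᴵ ᴶ ᴷ ᴸ ᴹ ᴺ ᴻ ᴼ ᴽ ᴾ ᴿ
U+1D4x ᵀ ᵁ ᵂ ᵃ ᵄ ᵅ ᵆ ᵇ ᵈ ᵉ ᵊ ᵋ ᵌ ᵍ ᵎ ᵏ
U+1D5x ᵐ ᵑ ᵒ ᵓ ᵔ ᵕ ᵖ ᵗ ᵘ ᵙ ᵚ ᵛ ᵜ ᵝ ᵞ ᵟ
U+1D6x ᵠ ᵡ ᵢ ᵣ ᵤ ᵥ ᵦ ᵧ ᵨ ᵩ ᵪ ᵫ ᵬ ᵭ ᵮ ᵯ
U+1D7x ᵰ ᵱ ᵲ ᵳ ᵴ ᵵ ᵶ ᵷ ᵸ ᵹ ᵺ ᵻ ᵼ ᵽ ᵾ ᵿ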
Notes
1.^ As of Unicode version 14.0

Phonetic Extensions Supplement (U+1D80–1DBF)

Phonetic Extensions Supplement[1]
Official Unicode Consortium code chart (PDF)
  0 1 2 3 4 5 6 7 8 9 A B C D E F
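U+1D8x ᶀ ᶁ ᶂ ᶃ ᶄ ᶅ ᶆ ᶇ ᶈ ᶉ ᶊ ᶋ ᶌ ᶍ ᶎ ᶏ
U+1D9x ᶐ ᶑ ᶒ ᶓ ᶔ ᶕ ᶖ ᶗ ᶘ ᶙ ᶚ ᶛ ᶜ ᶝ ᶞ ᶟ
U+1DAx ᶠ ᶡ ᶢ ᶣ ᶤ ᶥ ᶦ ᶧ ᶨ ᶩ ᶪ ᶫ ᶬ ᶭ ᶮ ᶯ
U+1DBx ᶰ ᶱ ᶲ ᶳ ᶴ ᶵ ᶶ ᶷ ᶸ ᶹ ᶺ ᶻ ᶼ ᶽ ᶾ ᶿ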
Notes
1.^ As of Unicode version 14.0

Modifier Tone Letters (U+A700–A71F)

Modifier Tone Letters[1]
Official Unicode Consortium code chart (PDF)
  0 1 2 3 4 5 6 7 8 9 A B C D E F
U+A70x ꜀ ꜁ ꜂ ꜃ ꜄ ꜅ ꜆ ꜇ ꜈ ꜉ ꜊ ꜋ ꜌ ꜍ ꜎ ꜏
U+A71x ꜐ ꜑ ꜒ ꜓ ꜔ ꜕ ꜖ ꜗ ꜘ ꜙ ꜚ ꜛ ꜜ ꜝ ꜞ ꜟ
Notes
1.^ As of Unicode version 14.0

Superscripts and Subscripts (U+2070–209F)

Superscripts and Subscripts[1][2][3]
Official Unicode Consortium code chart (PDF)
  0 1 2 3 4 5 6 7 8 9 A B C D E F
U+207x ⁰ ⁱ     ⁴ ⁵ ⁶ ⁷ ⁸ ⁹ ⁺ ⁻ ⁼ ⁽ ⁾ ⁿ
U+208x ₀ ₁ ₂ ₃ ₄ ₅ ₆ ₇ ₈ ₉ ₊ ₋ ₌ ₍ ₎
U+209x ₐ ₑ ₒ ₓ ₔ ₕ ₖ ₗ ₘ ₙ ₚ ₛ ₜ
Notes
1.^ As of Unicode version 14.0
2.^ Blank cells indicate non-assigned code points
3.^ Refer to the Latin-1 Supplement Unicode block for characters ¹ (U+00B9), ² (U+00B2) and ³ (U+00B3)

Font support for IPA

Input by selection from a screen

Many systems provide a way to select Unicode characters visually. ISO/IEC 14755 refers to this as a screen-selection entry method.

Microsoft Windows has provided a Unicode version of the Character Map program since version NT 4.0, appearing in the consumer edition since XP (to open it, press ⊞ Win+R, type charmap, then press ↵ Enter). This is limited to characters in the Basic Multilingual Plane (BMP). Characters are searchable by Unicode character name, and the table can be limited to a particular code block. More advanced third-party tools of the same type are also available (a notable freeware example is BabelMap).

macOS provides a "character palette" with much the same functionality, along with searching by related characters, glyph tables in a font, etc. It can be enabled in the input menu in the menu bar under System Preferences → International → Input Menu (or System Preferences → Language and Text → Input Sources) or can be viewed under Edit → Emoji & Symbols in many programs.

Equivalent tools – such as gucharmap (GNOME) or kcharselect (KDE) – exist on most Linux desktop environments.
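The search by character name that such tools offer can be approximated in a script. The Python sketch below uses the standard unicodedata.lookup function to find a character by its exact Unicode name and checks whether it lies in the Basic Multilingual Plane; the helper names find_by_name and in_bmp are hypothetical, and unlike a graphical tool this lookup requires the full name rather than a partial match.

```python
import unicodedata

def find_by_name(name):
    """Return the character with the given Unicode name, or None if unknown."""
    try:
        return unicodedata.lookup(name)
    except KeyError:
        return None

def in_bmp(ch):
    """True if ch lies in the Basic Multilingual Plane (U+0000 to U+FFFF)."""
    return ord(ch) <= 0xFFFF

click = find_by_name("LATIN LETTER BILABIAL CLICK")
print(click, f"U+{ord(click):04X}", in_bmp(click))  # ʘ U+0298 True
print(find_by_name("NO SUCH CHARACTER NAME"))       # None
```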

See also

References

  1. ^ "Spacing modifier letters". Everything2.com. 2002-08-29. Retrieved 2016-01-23.

External links