ARIB STD B24 character set

ARIB STD-B24 encoding
Standard: ARIB STD-B24 Volume 1
Classification: ISO 2022 profile/extension
Transforms / Encodes: ARIB STD-B24 Kanji, Kana and mosaic sets, JIS X 0201

ARIB STD-B24 Kanji set
[Figure: Weather symbols, a few of the extended symbols included.]
Language(s): Japanese, English, Russian
Partial support: Greek, Chinese
Standard: ARIB STD-B24 Volume 1
Classification: ISO-2022-structured CJK DBCS
Extends: JIS X 0208
Encoding formats:
  • ARIB STD-B24 encoding (ISO 2022 based)
  • Shift JIS (ARIB variant)[1]

Volume 1 of the Association of Radio Industries and Businesses (ARIB) STD-B24 standard for Broadcast Markup Language[2] specifies, amongst other details, a character encoding for use in Japanese-language broadcasting. It was introduced on 1999-10-26.[2] As of 2016-07-06, the latest revision is version 6.3.

It includes a number of ARIB extended characters (ARIB外字, ARIB gaiji) not found in the base standards (JIS X 0208 and JIS X 0201). It was the source standard for many symbol characters which were added to Unicode, including portions of the Miscellaneous Symbols, Enclosed Alphanumeric Supplement and Enclosed Ideographic Supplement blocks.[3] Its contributions partially overlap the Unicode emoji, but were added a year earlier, in Unicode 5.2.[4]

Fascicle 1 of the ARIB STD-B62 standard, published in 2014, defines Unicode mappings for a selection of the B24 extended characters (excluding, for example, those duplicated by JIS X 0213), as well as a few extended Kanji.[5] It also maps the characters it uses that lie outside the Basic Multilingual Plane (BMP) to the BMP's Private Use Area.

Sets and codes

The ARIB STD-B24 standard defines multiple character sets and a method of switching between them. These include a Kanji set (an extension of JIS X 0208), an Alphanumeric set, a Hiragana set, Katakana sets in two distinct layouts, and four mosaic sets.[6] The sets are selected using ISO 2022 mechanisms for 94-sets, with the following codes (proportional sets use the same layout as the corresponding non-proportional ones):[7]

Set | Type | Code (column/line) | Code (hexadecimal) | Code (ASCII character) | Comments
Kanji | 2-byte | 4/2 | 42 | B | The escape code B used for the ARIB Kanji set[7] is used for the 1983 version of JIS C 6226 (JIS X 0208, of which the ARIB Kanji set is an extension) in ISO-2022-JP.[8][9]
Alphanumeric | 1-byte | 4/10 | 4A | J | JIS_C6220-ro (ISO646-JP, JIS X 0201 Roman set). Similar to ASCII, with two assignments differing. Escape code J matches usage in ISO-2022-JP.[9]
Proportional alphanumeric | 1-byte | 3/6 | 36 | 6 |
Hiragana | 1-byte | 3/0 | 30 | 0 | Hiragana themselves follow the same layout as row 4 of JIS X 0208, but without a lead byte. Also adds several additional assignments for punctuation.
Proportional Hiragana | 1-byte | 3/7 | 37 | 7 |
Katakana | 1-byte | 3/1 | 31 | 1 | Katakana themselves follow the same layout as row 5 of JIS X 0208, but without a lead byte. Also adds several additional assignments for punctuation.
Proportional Katakana | 1-byte | 3/8 | 38 | 8 |
JIS X 0201 Katakana | 1-byte | 4/9 | 49 | I | JIS_C6220-jp (JIS X 0201 Kana set). Escape code matches usage in ISO-2022-JP-3.
Mosaic A | 1-byte | 3/2 | 32 | 2 | Pseudographics
Mosaic B | 1-byte | 3/3 | 33 | 3 |
Mosaic C | 1-byte | 3/4 | 34 | 4 | Non-spacing pseudographics
Mosaic D | 1-byte | 3/5 | 35 | 5 |
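
As an illustration of how the final bytes in the table could be combined with ISO 2022 designation escapes, the following Python sketch builds the escape sequence that designates a set to one of the G0 to G3 elements. The intermediate bytes follow generic ISO 2022 syntax (ESC ( F for single-byte 94-sets, ESC $ F for double-byte sets in G0, as with ESC $ B in ISO-2022-JP); that STD-B24 uses exactly these forms is an assumption here and should be checked against the standard. The names FINAL_BYTES, G_INTERMEDIATE and designate are hypothetical.

```python
ESC = 0x1B

# Final bytes copied from the table above: set name -> (final byte, bytes per character).
FINAL_BYTES = {
    "Kanji": (0x42, 2),
    "Alphanumeric": (0x4A, 1),
    "Proportional alphanumeric": (0x36, 1),
    "Hiragana": (0x30, 1),
    "Proportional Hiragana": (0x37, 1),
    "Katakana": (0x31, 1),
    "Proportional Katakana": (0x38, 1),
    "JIS X 0201 Katakana": (0x49, 1),
    "Mosaic A": (0x32, 1),
    "Mosaic B": (0x33, 1),
    "Mosaic C": (0x34, 1),
    "Mosaic D": (0x35, 1),
}

# Generic ISO 2022 intermediate bytes for designating a 94-set to G0..G3.
G_INTERMEDIATE = {0: 0x28, 1: 0x29, 2: 0x2A, 3: 0x2B}


def designate(set_name: str, g: int = 0) -> bytes:
    """Build a designation escape sequence (assumed generic ISO 2022 form)."""
    final, width = FINAL_BYTES[set_name]
    seq = [ESC]
    if width == 2:
        seq.append(0x24)  # '$' announces a multi-byte set
        if g != 0:
            # Assumption: G0 uses the short form ESC $ F, G1..G3 add an intermediate.
            seq.append(G_INTERMEDIATE[g])
    else:
        seq.append(G_INTERMEDIATE[g])
    seq.append(final)
    return bytes(seq)
```

For example, designate("Kanji") yields ESC $ B and designate("Alphanumeric") yields ESC ( J, the same sequences that ISO-2022-JP uses for JIS X 0208-1983 and the JIS X 0201 Roman set.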

Code charts

Kanji (double-byte) set

This is a double-byte character set extending JIS X 0208.

Lead byte

Each encoded byte is the row or cell number plus 0x20, or 32 in decimal (see below). Hence, the lead byte 0x21 selects row number 1, and cell 1 of that row has a continuation byte of 0x21 (or 33), and so forth. Most of the code corresponds to JIS X 0208.
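
The row/cell arithmetic described above can be sketched in Python as follows. The function names kuten_to_bytes and bytes_to_kuten are hypothetical, and the sketch covers only the two-byte code itself, not the escape sequences that select the Kanji set.

```python
def kuten_to_bytes(row: int, cell: int) -> bytes:
    """Convert a row/cell pair (each 1..94) to the two encoded bytes."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must be in the range 1..94")
    return bytes([row + 0x20, cell + 0x20])


def bytes_to_kuten(pair: bytes) -> tuple:
    """Convert two encoded bytes (each 0x21..0x7E) back to a row/cell pair."""
    if len(pair) != 2 or not all(0x21 <= b <= 0x7E for b in pair):
        raise ValueError("expected two bytes in the range 0x21..0x7E")
    return pair[0] - 0x20, pair[1] - 0x20
```

For instance, row 90 (the first row of ARIB extensions covered in this article) gets the lead byte 0x7A, so kuten_to_bytes(90, 1) returns the bytes 0x7A 0x21.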

ARIB STD-B24 Kanji (double-byte) set (lead bytes)
0 1 2 3 4 5 6 7 8 9 A B C D E F
2x  SP  1-_ 2-_ 3-_ 4-_ 5-_ 6-_ 7-_ 8-_ 9-_ 10-_ 11-_ 12-_ 13-_ 14-_ 15-_
3x 16-_ 17-_ 18-_ 19-_ 20-_ 21-_ 22-_ 23-_ 24-_ 25-_ 26-_ 27-_ 28-_ 29-_ 30-_ 31-_
4x 32-_ 33-_ 34-_ 35-_ 36-_ 37-_ 38-_ 39-_ 40-_ 41-_ 42-_ 43-_ 44-_ 45-_ 46-_ 47-_
5x 48-_ 49-_ 50-_ 51-_ 52-_ 53-_ 54-_ 55-_ 56-_ 57-_ 58-_ 59-_ 60-_ 61-_ 62-_ 63-_
6x 64-_ 65-_ 66-_ 67-_ 68-_ 69-_ 70-_ 71-_ 72-_ 73-_ 74-_ 75-_ 76-_ 77-_ 78-_ 79-_
7x 80-_ 81-_ 82-_ 83-_ 84-_ 85-_ 86-_ 87-_ 88-_ 89-_ 90-_ 91-_ 92-_ 93-_ 94-_ DEL
  Unused lead byte
  Lead byte
  Differences from JIS X 0208

Character sets 0x21-0x74 (row numbers 1-84: punctuation, alphabets, numbers, Kana, Kanji)

Character set 0x7A (row number 90, traffic symbols)

Characters 90-45 through 90-63 and 90-66 through 90-84 (shown below shaded) are listed in the B24 standard only in table 7-10 (the list of extension characters), and are also the only characters in rows 90 through 91 which are not transport-related symbols; this is noted in the B24 standard in an endnote to table 7-10.[10] The remainder of the extensions are listed in both table 7-4 (the double-byte code chart) and table 7-10.[10]

ARIB STD-B24 Kanji (double-byte) set (prefixed with 0x7A)[5][11]
0 1 2 3 4 5 6 7 8 9 A B C D E F
2x ❗︎ ⛔︎
3x 🅿 🆊
4x ⭕︎
5x 🅊 🅌 🄿 🅆 🅋 🈐 🈑 🈒 🈓 🅂 🈔 🈕 🈖 🅍 🄱 🄽
6x ⬛︎ 🈗 🈘 🈙 🈚︎ 🈛 🈜 🈝 🈞 🈟 🈠 🈡 🈢 🈣
7x 🈤 🈥 🅎 🈀
  Additions from table 7-10 not in table 7-4.

Character set 0x7B (row number 91, map symbols)

Characters from ARIB STD-B24 which were not retained in ARIB STD-B62 are shown shaded.

ARIB STD-B24 Kanji (double-byte) set (prefixed with 0x7B)[5][11][12]
0 1 2 3 4 5 6 7 8 9 A B C D E F
2x [a] ⛪︎
3x ⚓︎ ⛲︎ ⛳︎ ⛵︎ 🅗
4x 🅟 🆋 🆍 🆌 🅹 ⛺︎ 🅻 ⛽︎
5x 🅼
6x
7x
  Not in ARIB STD-B62

Character set 0x7C (row number 92, units, enclosed forms, list markers, arrows)

Characters from ARIB STD-B24 which were not retained in ARIB STD-B62 are shown shaded.

ARIB STD-B24 Kanji (double-byte) set (prefixed with 0x7C)[5][11][12]
0 1 2 3 4 5 6 7 8 9 A B C D E F
2x
3x 🄀 [b] [b] [b] [b] [b] [b]
4x 🄁 🄂 🄃 🄄 🄅 🄆 🄇 🄈 🄉 🄊
5x ² ³ 🄭 (vn)[c] (ob)[c] (cb)[c] (ce[c] mb)[c] (hp)[c] (br)[c] (p)[c]
6x (s)[c] (ms)[c] (t)[c] (bs)[c] (b)[c] (tb)[c] (tp)[c] (ds)[c] (ag)[c] (eg)[c] (vo)[c] (fl)[c] (ke[c] y)[c] (sa[c] x)[c]
7x (sy[c] n)[c] (or[c] g)[c] (pe[c] r)[c] 🄬 🄫 🆐 🈦
  Not in ARIB STD-B62

Character set 0x7D (row number 93, game and weather symbols, fractions, units, enclosed forms)

Characters from ARIB STD-B24 which were not retained in ARIB STD-B62 are shown shaded.

ARIB STD-B24 Kanji (double-byte) set (prefixed with 0x7D)[5][11][12]
0 1 2 3 4 5 6 7 8 9 A B C D E F
2x
3x ⚾︎ 🉀 🉁 🉂 🉃 🉄 🉅 🉆 🉇 🉈 🄪 🈧 🈨 🈩 🈔 🈪
4x 🈫 🈬 🈭 🈮 🈯︎ 🈰 🈱
5x ½ ¼ ¾
6x ⛄︎
7x ⛅︎ ☔︎ ⚡︎
  Not in ARIB STD-B62

Character set 0x7E (row number 94, list markers)

Characters from ARIB STD-B24 which were not retained in ARIB STD-B62 are shown shaded.

ARIB STD-B24 Kanji (double-byte) set (prefixed with 0x7E)[5][11][12]
0 1 2 3 4 5 6 7 8 9 A B C D E F
2x
3x
4x 🄐 🄑 🄒 🄓 🄔 🄕 🄖 🄗 🄘 🄙 🄚 🄛 🄜 🄝 🄞
5x 🄟 🄠 🄡 🄢 🄣 🄤 🄥 🄦 🄧 🄨 🄩
6x
7x
  Not in ARIB STD-B62

Single-byte sets

Alphanumeric set

ARIB STD-B24 Alphanumeric set[14]
0 1 2 3 4 5 6 7 8 9 A B C D E F
2x ! " # $ % & ' ( ) * + , - . /
3x 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
4x @ A B C D E F G H I J K L M N O
5x P Q R S T U V W X Y Z [ ¥ ] ^ _
6x ` a b c d e f g h i j k l m n o
7x p q r s t u v w x y z { | } ‾
  Differences from US-ASCII
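
Since the Alphanumeric set is the JIS X 0201 Roman set, decoding it amounts to ASCII decoding with two substitutions: 0x5C is the yen sign and 0x7E is an overline. A minimal Python sketch follows; the name decode_alphanumeric is hypothetical, and the choice of U+203E for the overline is an assumption (other mappings, such as a fullwidth form, are possible).

```python
# The two positions where the JIS X 0201 Roman set differs from US-ASCII.
JIS_ROMAN_OVERRIDES = {
    0x5C: "\u00A5",  # YEN SIGN instead of backslash
    0x7E: "\u203E",  # OVERLINE instead of tilde (assumed Unicode mapping)
}


def decode_alphanumeric(data: bytes) -> str:
    """Decode bytes of the single-byte Alphanumeric set (0x21..0x7E only)."""
    chars = []
    for b in data:
        if not 0x21 <= b <= 0x7E:
            raise ValueError("byte 0x%02X is outside the 94-set range" % b)
        chars.append(JIS_ROMAN_OVERRIDES.get(b, chr(b)))
    return "".join(chars)
```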

Hiragana set

ARIB STD-B24 Hiragana set[15]
0 1 2 3 4 5 6 7 8 9 A B C D E F
2x
3x
4x
5x
6x
7x
  Character allocations not following row 4 of JIS X 0208

Katakana set

ARIB STD-B24 Katakana set[16]
0 1 2 3 4 5 6 7 8 9 A B C D E F
2x
3x
4x
5x
6x
7x
  Character allocations not following row 5 of JIS X 0208
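
Because the kana in the single-byte Hiragana and Katakana sets follow rows 4 and 5 of JIS X 0208 without a lead byte, a single-byte kana code can be turned back into a JIS X 0208 code by restoring the lead byte (0x24 for row 4, 0x25 for row 5). The Python sketch below does this and also derives the Unicode character, relying on the fact that the Unicode hiragana and katakana blocks keep the JIS X 0208 ordering. The function names are hypothetical, and the additional punctuation assignments of the ARIB sets are deliberately left unmapped (the functions return None for them).

```python
def kana_byte_to_jis(byte: int, katakana: bool = False):
    """Return the two-byte JIS X 0208 code for a single-byte kana code, or None."""
    # Row 4 holds 83 hiragana (cells 1..83); row 5 holds 86 katakana (cells 1..86).
    last = 0x76 if katakana else 0x73
    if 0x21 <= byte <= last:
        return bytes([0x25 if katakana else 0x24, byte])
    return None  # ARIB-specific additions or unassigned: not handled in this sketch


def kana_byte_to_unicode(byte: int, katakana: bool = False):
    """Return the Unicode character for a single-byte kana code, or None."""
    if kana_byte_to_jis(byte, katakana) is None:
        return None
    # U+3041 and U+30A1 correspond to cell 1 of rows 4 and 5 respectively.
    base = 0x30A1 if katakana else 0x3041
    return chr(base + (byte - 0x21))
```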

JIS X 0201 Katakana set

ARIB STD-B24 JIS X 0201 Katakana set[17]
0 1 2 3 4 5 6 7 8 9 A B C D E F
2x
3x ソ
4x
5x
6x
7x

Mosaic sets

Shift_JIS variant

In addition to the modified ISO 2022 encoding, the B24 standard also specifies a Shift JIS encoding following JIS X 0208:1997, but with the addition of the extended characters in the kanji set.[1]

First byte
0 1 2 3 4 5 6 7 8 9 A B C D E F
0
1
2 ! " # $ % & ' ( ) * + , - . /
3 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
4 @ A B C D E F G H I J K L M N O
5 P Q R S T U V W X Y Z [ ¥ ] ^ _
6 ` a b c d e f g h i j k l m n o
7 p q r s t u v w x y z { | } ‾
8
9
A
B ソ
C
D
E
F
Second byte
0 1 2 3 4 5 6 7 8 9 A B C D E F
0
1
2
3
4
5
6
7
8
9
A
B
C
D
E
F
 
Non printable ASCII character
Unaltered ASCII character
Modified ASCII character
Single-byte half-width katakana
First byte of a double-byte character, used by JIS X 0208
First byte of an ARIB extended character
Not used as first byte, unallocated space in JIS X 0208
Not used as first byte
Second byte of a double-byte character whose first half of the JIS sequence was odd
Second byte of a double-byte character whose first half of the JIS sequence was even
Unused as second byte of a double-byte character
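
The odd/even distinction in the legend above comes from the usual JIS X 0208 to Shift JIS transformation, in which two consecutive rows share one lead byte. A Python sketch of that standard transformation follows; the function name jis_to_sjis is hypothetical. Applying the same arithmetic to the ARIB extension rows (lead bytes 0x7A to 0x7E) is an assumption made here for illustration and is not confirmed against the B24 text.

```python
def jis_to_sjis(j1: int, j2: int) -> tuple:
    """Convert a two-byte JIS code (each byte 0x21..0x7E) to Shift JIS bytes."""
    if not (0x21 <= j1 <= 0x7E and 0x21 <= j2 <= 0x7E):
        raise ValueError("JIS bytes must be in the range 0x21..0x7E")

    # Two JIS rows are packed into each Shift JIS lead byte.
    s1 = ((j1 + 1) >> 1) + 0x70
    if s1 >= 0xA0:
        s1 += 0x40  # skip 0xA0..0xDF, used for single-byte half-width katakana

    if j1 % 2 == 1:
        # Odd first byte: trail bytes 0x40..0x9E, skipping 0x7F.
        s2 = j2 + 0x1F
        if s2 >= 0x7F:
            s2 += 1
    else:
        # Even first byte: trail bytes 0x9F..0xFC.
        s2 = j2 + 0x7E
    return s1, s2
```

With this sketch, the JIS code 0x21 0x21 becomes 0x81 0x40, and 0x7A 0x21 (row 90, cell 1) would become 0xED 0x9F under the stated assumption.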



Footnotes

  a. Glossed as "temple" (i.e. Buddhist temple) in B24 table 7-10 (the list of extension characters).
  b. Small form (70% size per code chart / table 7-10) of a kanji character. Shown here simulated. Private Use Area code points shown are those used by the Nishiki-teki font.[13]
  c. Musical abbreviation (or half thereof) not present in Unicode, simulated here with multiple characters. Private Use Area code points shown are those used by the Nishiki-teki font.

References

  1. ARIB (2008), p. 105, part 2, section 7.3
  2. ARIB (2008)
  3. Suignard, Michel (2008-03-11). "ISO/IEC JTC1/SC2/WG2 N 3397: Japanese TV Symbols" (PDF).
  4. "Unicode 5.2 Emoji List". Emojipedia.
  5. ARIB (2014), pp. 33–50, part 2, Table 5-2
  6. ARIB (2008), pp. 48–52
  7. ARIB (2008), p. 39, part 2, Table 7-3
  8. "ISO-IR-087" (PDF). Information Technology Standards Commission of Japan (IPSJ/ITSCJ).
  9. RFC 1468 (IETF)
  10. ARIB (2008), p. 72
  11. ARIB (2008), pp. 54–72, part 2, Table 7-10
  12. ARIB (2008), pp. 46–47, part 2, Table 7-4
  13. "Nishiki-teki Version 3.82b (2021-07-23) - 6,416 characters in the Private Use Areas" (PDF).
  14. ARIB (2008), p. 48, part 2, Table 7-5
  15. ARIB (2008), p. 50, part 2, Table 7-7
  16. ARIB (2008), p. 49, part 2, Table 7-6
  17. ARIB (2008), p. 52, part 2, Table 7-9
