Template:JIS X 0208 encoding comparison

Encoding	Alternate name	7-bit?^[a]	ISO 2022?	Stateless?^[b]	Accepts ASCII?	0x00–7F always ASCII?	Superset of 8-bit JIS X 0201?	Supports JIS X 0212?	Bytewise self-synchronizing?	Bitwise self-synchronizing?
ISO-2022-JP	"JIS" (JIS X 0202)	Yes	Yes	No^[c]	Yes	Sequences can be non-ASCII^[c]	No (encoding possible)^[d]	Possible^[e]	No	No
Shift_JIS	"SJIS"	No	No	Yes	Almost^[f]	Isolated bytes can be non-ASCII^[g]	Yes	No	No	No
EUC-JP	"UJIS" (Unixized JIS)	No	Yes^[h]	Yes^[h]	Usually^[i]	Yes	No (encoded)^[j]	Usually available^[k]	No	No
Unicode formats for comparison^[l]
UTF-8		No	No	Yes	Yes	Yes	No (encoded)	Available	Yes	Usually^[m]
UTF-16	"Unicode"^[n]	No	No	Yes	No	No	No (encoded)	Available	Over 16-bit words only	No
GB 18030		No	No^[o]	Yes	Yes	Isolated bytes can be non-ASCII	No (encoded)	Available	No	No
UTF-32		No	No	Yes	No	No	No (encoded)	Available	Usually, in practice^[p]	No
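
Two of the columns above ("7-bit?" and "Accepts ASCII?") can be spot-checked with Python's built-in codecs. The sketch below is illustrative only: the codec names are CPython's, the sample string and the helper names `is_7bit` and `survey` are assumptions made for this example, and a single sample cannot prove a property of an encoding in general.

```python
# Spot-check two table columns using CPython's bundled codecs.
# SAMPLE, CODECS, is_7bit and survey are hypothetical names for this sketch.
SAMPLE = "日本語"
CODECS = [
    "iso2022_jp", "shift_jis", "euc_jp",
    "utf_8", "utf_16_le", "gb18030", "utf_32_le",
]


def is_7bit(data: bytes) -> bool:
    """True if every byte fits in 7 bits (no 8-bit clean channel needed)."""
    return all(b < 0x80 for b in data)


def survey(text: str = SAMPLE) -> dict:
    """Encode `text` and a short ASCII string with each codec and record
    whether the result is 7-bit and whether ASCII passes through unchanged."""
    rows = {}
    for name in CODECS:
        rows[name] = {
            "sample_is_7bit": is_7bit(text.encode(name)),
            "ascii_passthrough": "abc".encode(name) == b"abc",
        }
    return rows


if __name__ == "__main__":
    for name, row in survey().items():
        print(f"{name:12} 7-bit sample: {row['sample_is_7bit']!s:5} "
              f"ASCII unchanged: {row['ascii_passthrough']}")
```

With this sample, only ISO-2022-JP produces purely 7-bit output, and only UTF-16 and UTF-32 alter plain ASCII input, which matches the corresponding table cells.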

^[a] That is, the encoding does not require 8-bit clean transmission.
^[b] That is, the sequence used to encode a given character is always the same, regardless of the preceding character(s). See state (computer science).
^[c] ISO-2022-JP is a stateful encoding: all charsets are encoded over 0x21–7E and are switched between using escape sequences. Hence, while it is ASCII in its initial state, entire sequences of non-ASCII characters can be encoded with bytes in the ASCII range.
^[d] JIS X 0201 katakana are available in JIS X 0202 and ISO 2022, but are not included in the basic ISO-2022-JP profile, although they are a common extension.
^[e] JIS X 0212 is available in JIS X 0202 and ISO 2022, and is included in the ISO-2022-JP-1 and ISO-2022-JP-2 profiles, but not in the basic ISO-2022-JP profile.
^[f] Single-byte characters 0x21–7E in Shift_JIS are properly ISO-646-JP, in order to be a superset of 8-bit JIS X 0201, but are often decoded (not necessarily displayed) as ASCII, which differs in only two places.
^[g] Some (not all) ASCII bytes can appear as second bytes, but not first bytes, of double-byte characters in Shift_JIS. Hence, in a sequence of two or more ASCII bytes, every byte from the second onward is necessarily an ASCII (or ISO-646-JP) character.
^[h] Packed-format EUC is based on ISO 2022 mechanisms, with charset designations pre-arranged. Charset designation escapes and locking shifts are avoided, whereas use of single shifts can be implemented in a non-stateful manner. The constraints of ISO 2022 are nonetheless followed.
^[i] Single-byte characters 0x21–7E in EUC-JP are generally considered ASCII, but are sometimes treated as ISO-646-JP.
^[j] Unlike Shift_JIS, EUC-JP will not handle plain 8-bit JIS X 0201 input without prior conversion, due to the different representation of the JIS X 0201 katakana (with single shifts).
^[k] JIS X 0212 in EUC-JP is not always implemented.
^[l] Besides the properties of the encodings themselves, Unicode formats have further advantages stemming from the underlying character set: they are not limited to JIS coded characters but can represent the entirety of UCS (including the full repertoire of JIS coded characters), and are hence suited to international use. They are also less badly affected by colliding proprietary extensions, due to their greater base repertoire and designated private use areas.
^[m] Most bitwise frameshifts of UTF-8-encoded text will produce invalid UTF-8, but it is possible to construct sequences of characters that remain valid UTF-8 even when frameshifted by one or more bits.
^[n] Called "Unicode" by Microsoft only.
^[o] While GB 18030 and GBK are extensions of the EUC-CN form of GB/T 2312, they do not follow the constraints of EUC or ISO 2022, unlike EUC-JP (or the original EUC-CN).
^[p] Although, in theory, UTF-32 is self-synchronizing over 32-bit dwords only, the use of a 32-bit value to represent a 21-bit value means that, in practice, UTF-32 contains a continuous run of at least 11 zero bits at the high end of each character, which can usually be used to align to character boundaries, depending on the codepoint(s) involved.

Usage