
Talk:ISO/IEC 2022: Difference between revisions

From Wikipedia, the free encyclopedia
Revision as of 03:27, 29 June 2019

ISO 2022 vs ISO 646

To represent large character sets, ISO 2022 builds on ISO 646's property that 1 byte can define 94 graphic (printable) characters (in addition to space and 33 control characters).

So the control characters are always available no matter which character table is currently shifted in? --Abdull 20:17, 7 June 2007 (UTC)

I don't have the ISO 2022 text, but according to JIS X 0202 (which corresponds to ISO 2022), you can designate control character sets (C0, C1) by escape sequences. See the control character set registrations at http://www.itscj.ipsj.or.jp/ISO-IR/ . ESC is guaranteed to have the same code in all control character sets. --Fukumoto 16:42, 8 June 2007 (UTC)
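As a rough illustration of the partition quoted above (94 graphic positions, space, and 33 control characters in a 7-bit byte), here is a small Python sketch. The helper name `classify_7bit` is hypothetical; it only counts code positions and says nothing about which set is currently designated. Note that ESC (0x1B) always falls in the control area, which is consistent with the point that it keeps the same code in every control set.

```python
def classify_7bit(byte: int) -> str:
    """Hypothetical helper: classify a 7-bit code position as ISO 646 / ISO 2022 partition it."""
    if not 0x00 <= byte <= 0x7F:
        raise ValueError(f"not a 7-bit code: {byte:#04x}")
    if byte <= 0x1F or byte == 0x7F:
        return "control"  # 32 C0 positions plus DEL: the 33 control characters
    if byte == 0x20:
        return "space"
    return "graphic"      # 0x21-0x7E: the 94 positions a graphic set occupies in GL


counts: dict[str, int] = {}
for b in range(0x80):
    kind = classify_7bit(b)
    counts[kind] = counts.get(kind, 0) + 1

print(counts)               # {'control': 33, 'space': 1, 'graphic': 94}
print(classify_7bit(0x1B))  # control (ESC)
```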

Comparison with other encodings

The comment appears to indicate that ISO-2022 is not useful except with 7-bit displays. Since both GL and GR are mapped, it applies to 8-bit and 7-bit displays (with the latter requiring extra effort on the part of the application developer). Tedickey (talk) 22:15, 25 July 2009 (UTC)

The comment regarding disadvantages is also misleading, since (apparently applying to cut/paste) it ignores the actual terminal implementations, which may pass selections around as UTF-8. Tedickey (talk) 22:18, 25 July 2009 (UTC)

If a text processor needs random access to the character data, it basically has two options:
  • Normalize the text by repeating the current shift code before every character,
  • Convert everything to UTF-8 — but then why not use UTF-8 in the first place?
Both methods are unwieldy enough to be regarded as a disadvantage.
--Yecril (talk) 13:38, 25 September 2009 (UTC)
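To make the random-access problem concrete, the following Python sketch uses ISO-2022-JP (via Python's built-in `iso2022_jp` codec) as a stand-in for ISO 2022 text in general. It shows that slicing at a byte offset loses the shift state, and then sketches the first option above: encoding each character separately so that every chunk carries its own escape sequence. The helper name `normalize` is hypothetical, and storing a list of per-character chunks is only one way to "repeat the shift code before every character".

```python
def normalize(text: str) -> list[bytes]:
    """Hypothetical helper: encode every character as a self-contained chunk.

    Each call to the iso2022_jp encoder starts in the ASCII state, so a
    non-ASCII character gets its own ESC $ B in front and ESC ( B after it.
    """
    return [ch.encode("iso2022_jp") for ch in text]


text = "A日本"
stream = text.encode("iso2022_jp")  # a single ESC $ B covers both kanji

# Jumping to a byte offset loses the shift state: bytes 4-5 are the JIS X 0208
# code for 日, but read on their own they decode as two ASCII characters.
print(stream[4:6].decode("iso2022_jp"))  # F|

# After normalization, index i holds character i together with its own escape.
chunks = normalize(text)
print(chunks[2].decode("iso2022_jp"))  # 本
assert b"".join(chunks).decode("iso2022_jp") == text
```

The cost is visible in the chunk sizes: every non-ASCII character now carries six bytes of escape sequences, which is part of why the comment calls both options unwieldy.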

"Display" is the wrong word; "system" is slightly better. IIRC, the typical PC console driver is 9-bit (512 glyphs can be used at a time, or so) plus 8 bits of colour (4 background, 4 foreground) plus a few bits for bold/underline/blink, plus some more stuff I've forgotten.

  • ISO-2022-JP is presumably useful over 7-bit transports (traditional SMTP comes to mind). In fact, all the examples listed in "ISO 2022 character sets" appear to be 7-bit.
  • You can always convert generic 8-bit ISO 2022 text (i.e. text that uses GR) into equivalent 7-bit ISO 2022 text by inserting the appropriate control codes and using GL instead. I don't think anyone uses plain ISO 2022, but this may be an advantage. I'm ignoring C1 control codes and DOCS.
  • It's actually easier for a developer to use only the 7-bit range because there's less choice (I have GL mapped to G0 and GR mapped to G1, and now I need a character from G2. What do I do?).
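The 8-bit to 7-bit conversion mentioned in the second bullet can be sketched in Python under a few stated assumptions: G0 is invoked into GL, G1 is a 94-character set invoked into GR, and in the 7-bit output SO (0x0E) invokes G1 into GL while SI (0x0F) returns G0. As in the bullet, C1 controls and DOCS are ignored; here they are simply rejected. The function name `gr_to_7bit` is hypothetical.

```python
SO, SI = 0x0E, 0x0F  # Shift Out invokes G1 into GL; Shift In returns G0 to GL


def gr_to_7bit(data: bytes) -> bytes:
    """Hypothetical helper: rewrite 8-bit text whose GR holds G1 as 7-bit text.

    Assumes G0 in GL, G1 a 94-character set in GR, and no C1 controls
    or locking shifts in the input.
    """
    out = bytearray()
    shifted = False
    for b in data:
        if b < 0x80:
            if shifted:              # return to G0 before any GL byte
                out.append(SI)
                shifted = False
            out.append(b)
        elif 0xA1 <= b <= 0xFE:      # GR graphic position of a 94-character set
            if not shifted:
                out.append(SO)
                shifted = True
            out.append(b - 0x80)     # same G1 character, now addressed through GL
        else:
            raise ValueError(f"C1 control or unused GR position: {b:#04x}")
    if shifted:
        out.append(SI)               # end in the initial shift state
    return bytes(out)


print(gr_to_7bit(b"ab\xc1\xc2c"))  # b'ab\x0eAB\x0fc'
```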
Optimise ...
  • "Actual terminal implementations" — which ones? Why specifically terminals? And no, it's about text processing, not copy/paste (which is simple):
    • A Perl script is parsing text. It reads the next byte, which is an "e". But what shift state am I in? Which character is that? How do I represent a "character"? Some bytes need to be accompanied by a shift state, a character number needs to be accompanied by a charset number, and control codes are a right pain...
    • Your mail client is searching for some text ("Hello world!"). But wait — it might be in GL or GR. It might have random shift codes in the middle. Sigh.
  • Any text encoding which has support for arbitrary future extensions (ESC % / in particular) is broken — it's practically impossible to write an implementation that will fail gracefully when, for example, you switch to EBCDIC. And what's "use ESC % @ to return"? Is that the relevant bytes in EBCDIC or ASCII?
ISO 2022/ECMA-35 is defined in terms of bit patterns, so it's 0x1B 0x25 0x40 whatever the character set. 90.195.73.4 (talk) 21:58, 3 October 2011 (UTC)
  • How nice of them to support "private use F bytes". Suddenly, there's an unknown blob in your string, and you can't even let the user select inside it because you don't know where the character boundaries are. And how are you supposed to compare two private-use blobs for equality?
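The mail-client search problem from the list above can be reproduced with a short Python sketch, again using ISO-2022-JP through Python's `iso2022_jp` codec as a concrete example. A byte-level search gives a false match inside a kanji and misses text split by a redundant escape sequence; decoding first avoids both. Both function names are hypothetical, and decoding the whole message is only the simplest fix, not the only one.

```python
def naive_contains(haystack: bytes, needle: str) -> bool:
    # Byte search that ignores shift state entirely.
    return needle.encode("ascii") in haystack


def decoded_contains(haystack: bytes, needle: str) -> bool:
    # Decode first, so every byte is read in the shift state it actually occurs in.
    return needle in haystack.decode("iso2022_jp")


kanji = "日本".encode("iso2022_jp")  # 日 is the byte pair 0x46 0x7C ("F|") after ESC $ B
split = b"Hel\x1b(Blo world!"        # a redundant ESC ( B (designate ASCII) mid-word

print(naive_contains(kanji, "F|"), decoded_contains(kanji, "F|"))        # True False
print(naive_contains(split, "Hello"), decoded_contains(split, "Hello"))  # False True
```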

The article has many problems, such as the introduction saying that it's a 7-bit encoding (helpfully contradicting the rest of the article!), but not as many problems as ISO 2022. ⇌Elektron 04:13, 29 January 2010 (UTC)

But I agree with the rest of your rant; ISO 2022 is EVIL. 90.195.73.4 (talk) 21:58, 3 October 2011 (UTC)
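Because the escape sequences are defined as fixed bit combinations (ESC % @ is always 0x1B 0x25 0x40, as noted in the thread), a decoder can look for them as raw byte values. The Python sketch below splits a stream at DOCS sequences, handling only ESC % G (switch to UTF-8, with standard return) and ESC % @ (return to ISO 2022). It deliberately rejects anything else, such as ESC % / G, which has no standard return; this mirrors the complaint that an unknown coding system leaves nothing a decoder can safely recover. The function name `split_docs` and the segment labels are hypothetical.

```python
ESC_PCT_G = bytes([0x1B, 0x25, 0x47])   # ESC % G: switch to UTF-8 (standard return)
ESC_PCT_AT = bytes([0x1B, 0x25, 0x40])  # ESC % @: return to ISO 2022


def split_docs(data: bytes) -> list[tuple[str, bytes]]:
    """Hypothetical helper: cut a stream into ISO 2022 runs and UTF-8 runs.

    Only ESC % G and ESC % @ are understood; any other ESC % sequence
    (for example ESC % / G, which has no standard return) is rejected,
    because nothing after it can be interpreted with certainty.
    """
    segments: list[tuple[str, bytes]] = []
    mode, start = "iso2022", 0
    pos = data.find(b"\x1b%")
    while pos != -1:
        seq = data[pos:pos + 3]
        if mode == "iso2022" and seq == ESC_PCT_G:
            new_mode = "utf-8"
        elif mode == "utf-8" and seq == ESC_PCT_AT:
            new_mode = "iso2022"
        else:
            raise ValueError(f"unsupported DOCS sequence at offset {pos}: {seq!r}")
        if pos > start:
            segments.append((mode, data[start:pos]))
        mode, start = new_mode, pos + 3
        pos = data.find(b"\x1b%", start)
    if start < len(data):
        segments.append((mode, data[start:]))
    return segments


print(split_docs(b"abc\x1b%G\xc3\xa9\x1b%@def"))
# [('iso2022', b'abc'), ('utf-8', b'\xc3\xa9'), ('iso2022', b'def')]
```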

DICOM ISO 2022 variation

Reference 4, "DICOM ISO 2022 variation", has an incorrect URL. It points to a simple test email message in a SourceForge project which does not appear to have any relation to DICOM. I've searched for the correct link, with no success. I'd be very interested in the correct target of this link if it could be found. Dlmason (talk) 12:25, 7 April 2012 (UTC)

Rather than that link, this may be useful. TEDickey (talk) 13:31, 7 April 2012 (UTC)
Thanks, those examples are helpful, but are largely directed to VRs of type PN. There should (I hope) be a link that talks about other differences or issues in general DICOM encodings compared with ISO 2022 -- for example, the DICOM standard forbids certain control characters and shifts. I'm hoping there's a nice summary somewhere to list all the differences in simpler language than is used in the DICOM standard. Dlmason (talk) 12:31, 8 April 2012 (UTC)

Missing a history section

Missing a history section. 84.97.14.22 (talk) 19:01, 19 July 2012 (UTC)

removing POV tag with no active discussion per Template:POV

I've removed an old neutrality tag from this page that appears to have no active discussion per the instructions at Template:POV:

This template is not meant to be a permanent resident on any article. Remove this template whenever:
  1. There is consensus on the talkpage or the NPOV Noticeboard that the issue has been resolved
  2. It is not clear what the neutrality issue is, and no satisfactory explanation has been given
  3. In the absence of any discussion, or if the discussion has become dormant.

Since there's no evidence of ongoing discussion, I'm removing the tag for now. If discussion is continuing and I've failed to see it, however, please feel free to restore the template and continue to address the issues. Thanks to everybody working on this one! -- Khazar2 (talk) 04:26, 27 June 2013 (UTC)

Hello fellow Wikipedians,

I have just modified one external link on ISO/IEC 2022. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}} (last update: 5 June 2024).

  • If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
  • If you found an error with any archives or the URLs themselves, you can fix them with this tool.

Cheers.—InternetArchiveBot (Report bug) 02:27, 8 April 2017 (UTC)

UTF-1

The article currently says "UTF-1, the multi-byte Unicode transformation format compatible with ISO/IEC 2022". I don't think it's "compatible" in the way the statement implies because while the standard allows multi-byte encodings, it requires the elements to have constant length (e.g. all elements have to be 3 bytes), while UTF-1 is variable-length. UTF-1 is registered as a "Coding system different from ISO 2022" (see: https://www.itscj.ipsj.or.jp/itscj_english/iso-ir/ISO-IR.pdf), just like UTF-8 is. Escape sequences are provided for both UTF-8 and UTF-1, but using the "designate other coding system" method. If UTF-8 is not "compatible" then neither is UTF-1. --157.52.11.237 (talk) 14:24, 24 October 2017 (UTC)

I agree this statement is confusing. UTF-1 is certainly not ISO-2022 conformant. "Compatible" means... what, exactly? 204.225.215.56 (talk) 03:27, 29 June 2019 (UTC)
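The constant-length requirement mentioned above can be illustrated with a Python sketch: a run of a multi-byte graphic set splits into elements of one fixed width, each byte in the GL graphic range, whereas UTF-8 (and likewise UTF-1) uses a different number of bytes per character. The helper `split_94n` is hypothetical, and the sample bytes `F|K\` are the JIS X 0208 two-byte codes for 日本; Python has no UTF-1 codec, so UTF-8 stands in for the variable-length side.

```python
def split_94n(payload: bytes, width: int) -> list[bytes]:
    """Hypothetical helper: split a run of a multi-byte 94^n set into fixed-width elements.

    Every element must be exactly `width` bytes, each in 0x21-0x7E.
    """
    if len(payload) % width:
        raise ValueError("run length is not a multiple of the element width")
    if any(not 0x21 <= b <= 0x7E for b in payload):
        raise ValueError("byte outside the GL graphic range 0x21-0x7E")
    return [payload[i:i + width] for i in range(0, len(payload), width)]


print(split_94n(b"F|K\\", 2))                             # [b'F|', b'K\\']
print([len(ch.encode("utf-8")) for ch in "A\u00e9\u65e5"])  # [1, 2, 3]
```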

Disadvantages list

 "Because of its escape sequences, it is possible to construct attack byte sequences that round-trip from ISO/IEC 2022 to Unicode and back."

This statement is incredibly scary and also confusing, which is not a good combination. What kinds of attacks? Why do escape sequences make them possible? What other encodings is this in contrast to (UTF-8?)? Why is round-tripping relevant?

There is a link, but there's not any more detail on the destination page, only discussion about popularity of encodings and discussion around measuring if they are in use in a specific piece of software.

Can someone please clarify this part of the article?

01:15, 31 July 2018 (UTC) — Preceding unsigned comment added by 178.197.231.171 (talk)
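This Python sketch does not reconstruct whatever attack the article refers to; it only illustrates one mechanism behind the question "Why do escape sequences make them possible?". After ESC $ B, bytes in the ASCII range are read in pairs as JIS X 0208 characters, so a byte that an ASCII-minded filter sees as a double quote (0x22) can vanish into a two-byte character once the text is decoded. ISO-2022-JP via Python's `iso2022_jp` codec is used as the example; the variable names are illustrative.

```python
plain = b'!"'                # '!' followed by an ASCII double quote (0x22)
shifted = b'\x1b$B!"\x1b(B'  # the same two bytes after ESC $ B (JIS X 0208)

print(plain.decode("iso2022_jp"))    # !"
print(shifted.decode("iso2022_jp"))  # 、 (U+3001): the quote byte became half of one character
```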