Talk:UTF-9 and UTF-18

From Wikipedia, the free encyclopedia
WikiProject Computing
This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has not yet received a rating on the project's quality scale.
This article has not yet received a rating on the project's importance scale.

Untitled discussion

What does "standard communication protocols are built around octets rather than nonets" mean? Is this an assertion that there are no standard protocols for computers with 9-bit bytes, such as the PDP-10? —Preceding unsigned comment added by (talk) 11:09, 20 July 2007

A protocol that requires 9-bit bytes isn't likely to become a standard. 9-bit machines on the Internet use the same octet-based protocols as everyone else, generally with some adaptation. FTP, for instance, can operate in text mode, unpacking each character into its own octet, or in image mode, packing two 36-bit words into 9 octets. Yuubi 20:26, 11 October 2007 (UTC)
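To make the image-mode arithmetic concrete: two 36-bit words are 72 bits, which is exactly 9 octets. The Python sketch below assumes the words are packed high-order bit first with no padding; the names `pack_words` and `unpack_words` are hypothetical and not taken from any actual FTP implementation.

```python
WORD_BITS = 36
WORD_MASK = (1 << WORD_BITS) - 1  # 0o777777777777


def pack_words(w1: int, w2: int) -> bytes:
    """Pack two 36-bit words into 9 octets, most significant bit first (assumed order)."""
    for w in (w1, w2):
        if not 0 <= w <= WORD_MASK:
            raise ValueError("word does not fit in 36 bits")
    combined = (w1 << WORD_BITS) | w2  # 72 bits in total
    return combined.to_bytes(9, "big")


def unpack_words(octets: bytes) -> tuple:
    """Inverse of pack_words: split 9 octets back into two 36-bit words."""
    if len(octets) != 9:
        raise ValueError("expected exactly 9 octets")
    combined = int.from_bytes(octets, "big")
    return combined >> WORD_BITS, combined & WORD_MASK
```

For example, `pack_words(0o777777777777, 0)` yields four 0xFF octets, then 0xF0, then four zero octets: the first word's last 4 bits spill into the fifth octet.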

There is another encoding called "UTF-9" (draft-abela-utf9-00.txt) aimed at a similar target. Roytam1 (talk) 12:04, 2 May 2010 (UTC)

Indeed, that's a rather silly comment, since the whole point of these UTFs is for use on 9-bit, 18-bit and 36-bit systems. I think the only reason someone thought it funny enough for 1 April is that these machines are obsolete, though they (and hence UTF-9 and UTF-18) might be of interest to retrocomputing enthusiasts and esolangers. Obviously you wouldn't seriously use them on octet-based systems - you'd use UTF-8, UTF-16 or UTF-32. As such, I'd be inclined to reword the comment, if not remove it. I'll see what I can come up with. — Smjg (talk) 01:01, 4 September 2013 (UTC)

these machines are obsolete

For what it's worth, while the PDP-10 may indeed be obsolete, Unisys continues to produce, advertise, sell, maintain and operate their 1100/2200 series 36-bit hardware. I develop on such systems, which enjoy most of the usual "comforts" of modern computing environments, including compilers for C and Java, a TCP/IP stack, FTP, email and other TCP/IP-based protocols such as OLTP. To someone accustomed to other hardware, it can be a bit disconcerting to realize that the internal size of a char is 9 bits. Extra care with this issue is needed at interfaces to "the outside world." Carl Smotricz (talk) 13:09, 5 February 2015 (UTC)
"produce, advertise, sell [..] 36-bit hardware": that is VERY interesting; I thought it was long dead. Still, like the ternary Setun, these machines are very unpopular now (and will likely remain so for the foreseeable future). They are already compatible with 8-bit encodings and 8-bit bytes, either through a subset of the nine bits or, theoretically, by forcing 8-bit bytes in C (POSIX demands them, C doesn't), so I find UTF-9 etc. to be a very unlikely future. The point of the standards is communication between systems. No system inherently built on (multiples of) 8 bits will use UTF-9, although it is theoretically possible. YOU want compatibility with us, not vice versa. I find it regrettable that the section was deleted. This is humorous and was never meant otherwise. (Maybe a source for "unlikely" can be found.) comp.arch (talk) 13:15, 12 October 2016 (UTC)


UTF-12 has been invented recently, too. See it here. — Monedula (talk) 11:29, 17 June 2010 (UTC)

Looks like someone's personal invention, not a standard, so probably not worth covering in Wikipedia for now. -- intgr [talk] 14:38, 17 June 2010 (UTC)

UTF-9 and UTF-18 aren't standards either. As for "not worth covering", that applies to this article about a joke RFC ... not at all notable. -- (talk) 01:32, 5 February 2015 (UTC)

UTF-9 alleged problem

I don't think the alleged problem with UTF-9 exists, or else I don't understand the problem. Since one would never search for partial characters (a nonet sequence starting with the second or third nonet of the first character of the search string, or ending before the final nonet of the last character of the search string), an exact, unambiguous match does not require looking at any nonets prior to the first nonet of the match candidate.

If there isn't a clarification and/or an example illustrating this alleged problem, it should be removed from the article. --Brouhaha (talk) 07:07, 21 February 2015 (UTC)

I agree. At the very least this smacks of WP:OR. It's also WP:undue since the problem section is as big as the rest of the article put together. Cut this way back, or eliminate it. Kendall-K1 (talk) 14:04, 21 February 2015 (UTC)

I have removed this section per WP:V: "All material in Wikipedia mainspace, including everything in articles, lists and captions, must be verifiable. All quotations, and any material whose verifiability has been challenged or is likely to be challenged, must include an inline citation that directly supports the material. Any material that needs a source but does not have one may be removed." Kendall-K1 (talk) 12:46, 12 October 2016 (UTC)

Everything in the article, including the unnecessarily removed Problem section (which I restored to improve it and add an example of the UTF-9 encoding flaw), is verifiable by reading the very RFC this article is based on. Please give others the chance to improve the article, as the UTF-9 encoding is a good example of the difficulties of devising encodings with good properties. (talk) 12:51, 12 October 2016 (UTC)
An example of the UTF-9 flaw can be pulled from an example in the RFC itself: U+0041 is represented in octal as 101 and U+E0041 as 416 400 101. (talk) 12:59, 12 October 2016 (UTC)
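A minimal Python sketch of the UTF-9 rule that this example illustrates: each nonet carries one octet of the code point in its low 8 bits, and the high bit (octal 400) is set on every nonet except the last one of a character. It reproduces the octal values quoted above. The function names are hypothetical, and limiting input to U+10FFFF is this sketch's own choice; the RFC's upper limit may differ.

```python
CONTINUATION = 0o400  # high bit of a nonet: "more nonets follow"


def utf9_encode_char(cp: int) -> list:
    """Encode one code point as a list of nonets (ints in 0..0o777)."""
    if not 0 <= cp <= 0x10FFFF:  # sketch limit, not necessarily the RFC's
        raise ValueError("code point out of range")
    # Split into octets, most significant first, with no leading zero octets.
    octets = []
    while True:
        octets.insert(0, cp & 0xFF)
        cp >>= 8
        if cp == 0:
            break
    # Every nonet except the last carries the continuation bit.
    return [o | CONTINUATION for o in octets[:-1]] + [octets[-1]]


def utf9_encode(text: str) -> list:
    nonets = []
    for ch in text:
        nonets.extend(utf9_encode_char(ord(ch)))
    return nonets


def octal(nonets) -> str:
    """Render nonets as 3-digit octal, the notation used above."""
    return " ".join(format(n, "03o") for n in nonets)
```

With this, `octal(utf9_encode("A"))` gives `101` and `octal(utf9_encode("\U000E0041"))` gives `416 400 101`; the final nonet of the second sequence is identical to the whole encoding of U+0041.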
In the future you can avoid this kind of problem by giving a reason for reverting. In this case you reverted my change which had a summary of "remove unsourced OR; see talk page". So you could give your reasoning for reverting me on the talk page before you do the revert, and leave a summary of "see talk page." If you want to be polite you can discuss the change on the talk page and gain consensus before making the change. When you just revert without giving a reason, we don't know why you reverted, and are likely to revert back. Kendall-K1 (talk) 14:27, 12 October 2016 (UTC)
I made some changes to show how the UTF-9 encoding works and to explain why the encoding is not self-synchronizing. Hopefully the way I introduced these changes is more acceptable than the previous Problems section. (talk) 14:42, 13 October 2016 (UTC)
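The searching question from earlier in this section can be checked with a small Python sketch. It compares a naive nonet-by-nonet search with one that also requires the nonet before a match (if any) to have its continuation bit clear. The nonet values come from the RFC example quoted above; the function names are hypothetical, and the sketch assumes the needle is itself a complete, well-formed UTF-9 sequence.

```python
CONTINUATION = 0o400  # high bit of a nonet: "more nonets follow"


def naive_find(haystack, needle):
    """Start indices where the needle's nonets occur, ignoring character boundaries."""
    n = len(needle)
    return [i for i in range(len(haystack) - n + 1) if haystack[i:i + n] == needle]


def boundary_find(haystack, needle):
    """Keep only matches that begin a character: the preceding nonet, if any,
    must not carry the continuation bit."""
    return [i for i in naive_find(haystack, needle)
            if i == 0 or not haystack[i - 1] & CONTINUATION]


tag_a = [0o416, 0o400, 0o101]  # U+E0041
letter_a = [0o101]             # U+0041

print(naive_find(tag_a, letter_a))                # false match inside U+E0041
print(boundary_find(tag_a, letter_a))             # no match
print(boundary_find(tag_a + letter_a, letter_a))  # only the real "A"
```

The check costs only one nonet of lookbehind, but it shows that a match cannot be confirmed from the candidate nonets alone: the nonet 101 by itself does not reveal whether it starts a character.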