Talk:UTF-1

From Wikipedia, the free encyclopedia

however, this talk page is empty. :( 46.113.138.73 (talk) 06:16, 24 February 2015 (UTC)

ASCII Backwards Compatibility

I undid revision 953809602 because the reasoning was incorrect -- UTF-1 is indeed backwards compatible with ASCII, in that all ASCII text is valid UTF-1. The reasons why this is useful should be obvious. Other than that, the additional grammatical and semantic changes in that revision appeared to be unnecessary, and even made things more confusing. Preceding unsigned comment added by Maschinengott (talk · contribs) 20:42, 29 April 2020 (UTC)

You are probably right, but I wanted to fix one error, take a look. Spitzak (talk) 21:57, 29 April 2020 (UTC)
I undid this edit as well, since the error you mention is not actually present. The range of byte values in question is said to be included among the aforementioned *single-byte* encodings of UTF-1, not ASCII. However, I can see how this might be unclear to some people, so I'll make this distinction more explicit. Maschinengott (talk) 18:09, 16 May 2020 (UTC)
I undid your subsequent edit. In addition to the typo, your changes redundantly restate UTF-8's support of ASCII. My version also makes the awkward parenthetical construction unnecessary, since "the former" unambiguously refers to the single-byte encodings of UTF-1 via the antecedent. Preceding unsigned comment added by Maschinengott (talk · contribs) 22:43, 18 May 2020 (UTC)
Your last change had the following summary:

→‎Design: I really don't like this, because it is *like* UTF-8, the text implies that single-byte ascii is "unlike UTF-8". Maybe this will work

The previous text in fact does *not* imply that single-byte ASCII is unlike UTF-8. It's stating that the *single-byte encodings of UTF-1* are unlike those of UTF-8, which is correct, and which I find important enough to warrant a mention. Your interpretation requires that "single-byte encodings" acts as a nominative predicate for "ASCII", which wouldn't even make sense; the former is clearly referring to the UTF-1 encodings described in the preceding sentences.
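For anyone following this thread who wants to check the ASCII claim concretely, here is a rough Python sketch of a UTF-1 encoder. It assumes the modulo-190 scheme as it is usually described for UTF-1; the names `utf1_encode` and `_t` and the range boundaries are written from that description, not taken from the article text, so treat it as illustrative only.

```python
def _t(z: int) -> int:
    """Map a base-190 digit (0..189) to a trail byte in 0x21-0x7E or 0xA0-0xFF."""
    return z + 0x21 if z < 0x5E else z + 0x42


def utf1_encode(cp: int) -> bytes:
    """Encode one code point as UTF-1 (sketch, assumed ranges)."""
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("code point outside the 31-bit range")
    if cp < 0xA0:
        # Single-byte encodings: ASCII plus 0x80-0x9F, each as itself.
        return bytes([cp])
    if cp < 0x100:
        # Two bytes: fixed lead byte 0xA0, then the value itself.
        return bytes([0xA0, cp])
    if cp < 0x4016:
        y = cp - 0x100
        return bytes([0xA1 + y // 190, _t(y % 190)])
    if cp < 0x38E2E:
        y = cp - 0x4016
        return bytes([0xF6 + y // 190**2, _t(y // 190 % 190), _t(y % 190)])
    y = cp - 0x38E2E
    return bytes([
        0xFC + y // 190**4,
        _t(y // 190**3 % 190),
        _t(y // 190**2 % 190),
        _t(y // 190 % 190),
        _t(y % 190),
    ])


# Every ASCII code point encodes to the identical single byte.
ascii_ok = all(utf1_encode(c) == bytes([c]) for c in range(0x80))
```

Under these assumptions, the values 0x80 to 0x9F also come out as single bytes, which is the point where the single-byte encodings of UTF-1 differ from those of UTF-8.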

The table

The last entry in the table is U+7FFFFFFF - FD BF BF BF BF BF - FD BD 2B B9 40. This cannot be correct, as Unicode (and UTF-8 by extension) cuts off at U+10FFFF. The maximum value the four-byte, 21-bit encoding could represent is U+1FFFFF. Anything beyond that is undefined. If you were to logically extend UTF-8 using the existing patterns, then U+7FFFFFFF would have to fit into a 6-byte, 31-bit encoding 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx, and would be encoded as FD BF BF BF BF BF. But that's not UTF-8. That would be something else. Maybe some day there'll be a "UTF-8x" that is fully unlimited and backward-incompatible with other UTF encodings, and we'll be using 64-bit codepoints and have quintillions of characters we can represent, but today is not that day. 97.123.119.6 (talk) 21:54, 15 August 2021 (UTC)
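To make the bit pattern in the comment above easy to check, here is a small Python sketch that extends the UTF-8 lead-byte pattern up to six bytes. The function name `encode_extended` is made up for illustration; this is not Python's built-in UTF-8 codec, which stops at U+10FFFF.

```python
def encode_extended(cp: int) -> bytes:
    """Encode cp using the UTF-8 bit patterns, extended to 6 bytes (31 bits)."""
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("code point outside the 31-bit range")
    if cp < 0x80:
        return bytes([cp])
    # (exclusive upper limit, total length in bytes, lead-byte marker bits)
    for limit, length, lead in (
        (0x800, 2, 0xC0),
        (0x10000, 3, 0xE0),
        (0x200000, 4, 0xF0),
        (0x4000000, 5, 0xF8),
        (0x80000000, 6, 0xFC),
    ):
        if cp < limit:
            break
    # Peel off 6 bits at a time for the continuation bytes (10xxxxxx).
    tail = []
    for _ in range(length - 1):
        tail.append(0x80 | (cp & 0x3F))
        cp >>= 6
    # Whatever is left goes into the low bits of the lead byte.
    return bytes([lead | cp] + tail[::-1])


print(encode_extended(0x7FFFFFFF).hex(" "))  # fd bf bf bf bf bf
print(encode_extended(0x1FFFFF).hex(" "))    # f7 bf bf bf
```

For code points up to U+10FFFF the sketch produces the same bytes as standard UTF-8; everything above that only follows the extended pattern.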