Talk:Universal Character Set

WikiProject Computing (Rated C-class, High-importance)
This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as C-Class on the project's quality scale.
This article has been rated as High-importance on the project's importance scale.

Character Set vs. Character Encoding

We must be very clear about the distinction between a character set and a character encoding. A character set defines a set of characters. For instance, a character set could be the set containing the first four letters of the English alphabet: {a,b,c,d}. An encoding is how the characters in a specific character set are actually stored as binary data. UTF-8, for example, uses sequences of 8-bit units to encode the characters of the UCS.

Anyway, my point is: the first sentence of this article previously equated UCS and character encodings. This is a really (relatively) grave error since it could confuse the bejeesus out of people. GodzillaWax 15:37, 20 September 2007 (UTC)
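
To make the distinction above concrete, here is a minimal Python sketch. The character set {a,b,c,d} comes from the comment above; the two encodings, and the names `CHARSET`, `to_code_points`, `encode_one_byte_each` and `encode_two_bits_each`, are made up purely for illustration and are not part of any standard. The point is only that one character set can be stored as binary data in more than one way.

```python
# Hypothetical four-character set; a character's position is its code point.
CHARSET = ["a", "b", "c", "d"]

def to_code_points(text):
    """Map each character to its abstract code point (an integer)."""
    return [CHARSET.index(ch) for ch in text]

def encode_one_byte_each(points):
    """Toy encoding 1: store every code point in its own byte."""
    return bytes(points)

def encode_two_bits_each(points):
    """Toy encoding 2: pack four 2-bit code points into each byte."""
    out = bytearray()
    for i in range(0, len(points), 4):
        chunk = points[i:i + 4]
        byte = 0
        for p in chunk:
            byte = (byte << 2) | p
        # Pad an incomplete final byte with zero bits on the right.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)

points = to_code_points("abcd")
print(points)                        # same abstract characters...
print(encode_one_byte_each(points))  # ...stored in four bytes
print(encode_two_bits_each(points))  # ...or packed into one byte
```

The UCS and its encoding forms relate in the same way: the code point assigned to a character is fixed, while UTF-8, UTF-16 and UTF-32 each store that code point as different bytes. The 2-bit packing here is deliberately simplistic; a real decoder would also need to know the text length to ignore the padding.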

On a deleted sentence

At the end of the section on the differences between Unicode and ISO 10646, I had written the following sentence:

The Firefox browser and the OpenOffice.org suite can handle such characters, on Linux too, supporting Unicode and not just ISO 10646.

That sentence was deleted with the notice "Deleted a non pertinent sentence wich looked like advertisement." I beg to differ with that verdict.

I can concede that the sentence was not worded the best way. However, it is neither nonpertinent nor an advertisement. I wrote it to contrast applications which support Unicode (Mozilla and OpenOffice.org) with applications which support only ISO 10646 (Linux xterm). It is fully germane to this article and section, and any intention of advertising those applications was totally absent from my mind.

I will leave things as they currently are, but I wish it to be known that the charges are incorrect, and I hope some other user with the inclination for it would rewrite it in a wording that does not lend itself to those charges. --Shlomital 21:29, 2005 Feb 20 (UTC)

Article title change

Since Universal Character Set is a proper noun/proper name, as of today I have moved the article from Universal character set to Universal Character Set. — mjb 23:26, 20 Jun 2005 (UTC)

How about Chinese and Japanese?

Can you add a comment on Chinese and Japanese (and some other languages, like hieroglyphs), which can be written not only horizontally and bidirectionally but also vertically, from top to bottom?

Thanks

Unicode and ISO 10646 distinctions and discussion of the character repertoire

I added a paragraph about the differences between Unicode and ISO 10646. I think the article could use more elaboration on these distinctions to help drive home the particular innovations of Unicode.

I've also been working on a table that nicely summarizes the characters of the UCS (as of 5.0). My thinking is that this table could serve as a departure point to link to other articles (or sections of this article) discussing the various scripts and other character blocks in more detail. Wikipedia already has individual articles covering most of the scripts of the UCS (the article could use a small discussion of the UCS use of the term "script" too). Also, the phonetic blocks could link to articles on the IPA and other relevant articles.

However, I've also been working on drafting portions to discuss the other character blocks: symbols; unified punctuation; unified diacritics; Unihan and CJK supporting characters; compatibility characters; control and formatting characters (such as glyph variant selectors, bidi characters, joiners, non-joiners and language tag characters); surrogates; and private use code points. Compatibility characters are an especially complicated topic that could use some elaboration. The various symbol blocks are also very specialized, and some discussion of how they're used would be helpful. To me this is the type of information that a general audience would expect from an encyclopedia article on the UCS (in addition to the topics already covered). It might help more technical readers as well. There are so many basic concepts surrounding UCS and Unicode that seem to escape implementors of text systems supporting UCS and Unicode.

I'll likely post something here on this discussion page before posting it to the article. I'm still working on the formatting (I'm not that familiar with Wikimedia’s table markup, so it’s in plain old HTML table markup). Indexheavy 09:37, 19 April 2007 (UTC)

I now see that some of what I propose is handled in a separate article: Mapping of Unicode characters. Perhaps that article could be summarized in a section of this article. The summary table I'm preparing might fit better in that article. Indexheavy 15:10, 19 April 2007 (UTC)
I added the summary/categorized table of the UCS as I said I would. I added it to the mapping article. Anyone else is welcome to jump in on these tasks. --Indexheavy 01:20, 25 April 2007 (UTC)

this has nothing to do with the content of the text

Tried for a full five minutes to find an actual character map, to look up the Alt-code for the plus/minus sign. Couldn't link. Did get extensive, verbose, and redundant information on the history of, and subtle differences between the various UTF and ISO standards. Fascinating... but should we make these pages a QuikFix InfoBooth, or a "Jolly good read, wot?!". I'm not doing a project, I just needed a detail, and we should diversify into linked media to demonstrate the explanations and classifications given by the parent article. —Preceding unsigned comment added by 124.185.183.250 (talk • contribs) 00:51, 25 May 2007

We should make this page a "Jolly good read, wot?!", not a QuikFix InfoBooth. Wikipedia is not a complete exposition of all possible details. Perhaps this page should link to the Character Names Index page on the unicode.org Web site, but it shouldn't duplicate any of the character tables. Guy Harris (talk) 19:00, 12 November 2010 (UTC)
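
For a one-off lookup like the plus/minus sign, the character database that ships with Python may be quicker than any table on a wiki page. This is a rough sketch using the standard `unicodedata` module. The helper name `describe` is made up for this example, and the link between legacy code page byte values and Windows Alt codes is an assumption that should be checked against the keyboard documentation.

```python
import unicodedata

def describe(name):
    """Look up a character by its Unicode name; return it with its code point."""
    ch = unicodedata.lookup(name)
    return ch, f"U+{ord(ch):04X}", ord(ch)

ch, label, decimal = describe("PLUS-MINUS SIGN")
print(ch, label, decimal)

# Byte value of the same character in two legacy code pages.
# Assumption: Alt codes on Windows are derived from code pages like these.
for codec in ("cp437", "cp1252"):
    print(codec, ch.encode(codec)[0])
```

`unicodedata.lookup` raises `KeyError` if the name is not a known character name, so the exact name matters.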

ISO/IEC 10646 vs. ISO/IEC 646?

It appears that the ISO/IEC standard number 10646 was deliberately chosen to recall ISO/IEC 646, to which the UCS is arguably a successor. Is this encyclopedic enough to bother looking for a good citation? Sw2k7 (talk) 06:43, 8 September 2008 (UTC)

It has the merit of being true. I don't know if we can find a citation for it however. -- Evertype· 16:59, 8 September 2008 (UTC)
If the email interview at [1] was ever published somewhere, I think that would be a citable source. Google search is your friend... --Alvestrand (talk) 05:08, 9 September 2008 (UTC)
Actually, Hugh McGregor Ross is my friend and I remember him telling me this as well... just didn't know that it'd been published anywhere. I'd consider Bob's interview with Hugh to be "citable" however. -- Evertype· 07:10, 9 September 2008 (UTC)

correction ?

The article's comparison of Unicode to UCS states

Unicode provides: exclusively 16-bit code;

Is this strictly true? What about UTF-8 when encoding, say, 7-bit ASCII text? G. Robert Shiplett 17:29, 5 March 2011 (UTC) — Preceding unsigned comment added by Grshiplett (talk • contribs)
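
One way to check the question above is to count the code units that each encoding form uses for a few sample characters. This is a small Python sketch built on the standard `str.encode` codecs. The helper name `code_units` and the choice of sample characters are made up for this illustration.

```python
def code_units(ch, encoding, unit_bits):
    """Return how many code units of unit_bits each the encoding uses for ch."""
    data = ch.encode(encoding)
    return len(data) * 8 // unit_bits

# An ASCII letter, a Latin-1 symbol, a BMP symbol, and a character beyond the BMP.
for ch in ("A", "\u00b1", "\u20ac", "\U0001d11e"):
    print(
        f"U+{ord(ch):04X}",
        "UTF-8 units:", code_units(ch, "utf-8", 8),
        "UTF-16 units:", code_units(ch, "utf-16-be", 16),
        "UTF-32 units:", code_units(ch, "utf-32-be", 32),
    )
```

In current Unicode, ASCII text in UTF-8 takes one 8-bit unit per character, and a character outside the Basic Multilingual Plane takes two 16-bit units (a surrogate pair) in UTF-16. So "exclusively 16-bit" does not describe today's encoding forms, whatever it may have meant for earlier versions of the standard.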

Where are the UCS abstract codes

"...The UCS contains nearly one hundred thousand abstract characters..."

The first paragraph of the article says that the UCS contains nearly one hundred thousand abstract characters. So where are they, and why are they not listed in the main article?

Alijamal14 (talk) 23:35, 1 January 2012 (UTC)