Talk:Unicode

From Wikipedia, the free encyclopedia
This article is of interest to the following WikiProjects:

WikiProject Typography (Rated C-class, Mid-importance)
This article is within the scope of WikiProject Typography, a collaborative effort to improve the coverage of articles related to Typography on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.

WikiProject Computing (Rated C-class, High-importance)
This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.



misuse of the term Unicode

I recently added a very helpful comment along the lines of "the term unicode is frequently and incorrectly used to refer to UCS-2" and some smart person removed it. My comment is a good one and deserves to remain on the page. Perhaps some people should learn from the experience of people who actually have to program computers and know a lot about how the term 'unicode' is used. —Preceding unsigned comment added by 212.44.43.10 (talk) 10:04, 24 September 2008 (UTC)

I removed the statement because it was an unsupported, subjective comment. In order to be accepted, a statement such as this would need to cite an authoritative source. I am a programmer, and I think that I do know a lot about how the term 'unicode' is used, and I for one do not think that the statement that people commonly confuse Unicode with UCS-2 is correct. UCS-2 is an obsolete encoding form for Unicode, and if somebody does not know what Unicode really means they are hardly likely to know what UCS-2 is (or be able to distinguish between UCS-2 and UTF-16, which is what your comment implies). From my experience with ignorant fellow-programmers, I think that perhaps you mean that some programmers think that Unicode is more or less just 16-bit wide ASCII. BabelStone (talk) 09:04, 25 September 2008 (UTC)
In Windows programming the term "Unicode API" almost always means an API that accepts 16-bit "characters". It may treat them as UTF-16 or as UCS-2, or, in most cases, it does not perform any operation that would differ depending on the encoding. The real confusion is that "Unicode API" often means "not the 8-bit API", even though the 8-bit API can accept UTF-8 encoding. Spitzak (talk) 20:44, 19 November 2010 (UTC)
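
To make the UCS-2 versus UTF-16 distinction in this thread concrete, here is a minimal Python sketch. It is not tied to any Windows API, and the helper name `utf16_code_units` is made up for illustration. It counts the 16-bit units that an API working on 16-bit "characters" would see for a character inside and outside the BMP.

```python
def utf16_code_units(text):
    """Return the number of 16-bit code units in the UTF-16 form of text."""
    # "utf-16-le" writes no byte order mark, so every 2 bytes is one code unit.
    return len(text.encode("utf-16-le")) // 2


bmp_char = "\u00e9"          # U+00E9, inside the Basic Multilingual Plane
astral_char = "\U00013000"   # U+13000, an Egyptian hieroglyph outside the BMP

print(utf16_code_units(bmp_char))        # 1
print(utf16_code_units(astral_char))     # 2 (a surrogate pair)
print(len(astral_char.encode("utf-8")))  # 4 bytes in UTF-8
```

Code that assumes one 16-bit unit per character behaves like UCS-2 and only goes wrong on input such as the second character, which may be why the difference is rarely noticed in practice.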

External Links

The external links section is now full of self-promoting links to various Unicode code chart viewers and character picker applications of dubious quality, many of them restricted to a subset of Unicode characters (e.g. only BMP or an old version of Unicode). I think it would be a good idea to rid the external links section of such self-promoting links, and only link to sites which do give further useful information about Unicode. BabelStone (talk) 10:25, 8 January 2009 (UTC)

I support your idea. Wikipedia is not a link repository. — Emil J. 10:56, 8 January 2009 (UTC)
A lot of Unicadettes seem to like Decode Unicode though. -- Evertype· 12:27, 8 January 2009 (UTC)
Yes, let's weed out any items of this sort which aren't the best of their kind. Michael Z. 2009-01-08 15:21 z
I agree now. -- Evertype· 18:46, 8 January 2009 (UTC)
At a minimum I suggest removing YChartUnicode and Table of Unicode characters from 1 to 65535 (including 64 symbols per page and 100 symbols per page), as these provide little or no useful information and are limited to the BMP. libUniCode-plus is a software library that seems to me to be of peripheral relevance and could also be removed (probably a link to ICU would be much more useful). Ishida's UniView supports 5.1, so I think it can stay, and although DecodeUnicode is only 5.0, it is probably still a good link to keep. BabelStone (talk) 13:47, 9 January 2009 (UTC)

I have written a short overview of UNICODE. I have tried to keep it short and crisp. Please add it to the external links so that more people can use it. Short and Crisp Overview of UNICODE —Preceding unsigned comment added by Skj.saurabh (talk · contribs) 15:26, 5 March 2010 (UTC)

Please do not try to use Wikipedia to promote your personal website. Your page is not appropriate to link to from the article as it does not provide any information which is not already in the article or available directly from the Unicode website. Please read the policy on external links at WP:ELNO. BabelStone (talk) 09:53, 11 March 2010 (UTC)

In reply to this, I would like to say that I have not opened a tutorial site, so I do not need to publicize my site. We use this site internally at our company. We made it publicly available because we thought that some of the resources on the net were not crisp or in-depth. I sincerely think that our article gives a better introduction to UNICODE than the UNICODE site or Wikipedia. If you say that all the information is already on these two sites, then there is no need for Wikipedia either, since all the information on UNICODE is on the UNICODE site. Also, the Wikipedia page is more complex. A casual viewer will find it very technical and confusing. I did, and then had to consult many sources in order to write the page for myself. If you do not think it adds value I do not have any problem, but I think people would have found our article useful. —Preceding unsigned comment added by 115.240.54.153 (talk) 14:48, 11 March 2010 (UTC)

How to enter unicode characters

This article completely fails to explain to average people how to enter a special letter using its Unicode code. There are countless lists of codes, but none explains how to enter a code to insert the desired letter into a text. —Preceding unsigned comment added by 95.88.121.194 (talk) 14:12, 21 April 2009 (UTC)

That is not the purpose of this page. Please see the Unicode input article, which is linked to from this article. BabelStone (talk) 15:12, 21 April 2009 (UTC)
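
Keyboard input methods are covered in the Unicode input article, as noted above. For readers who only need to turn a code from a chart into the character inside a program, a minimal Python sketch follows. The helper name `char_from_hex` is hypothetical, and this says nothing about how any particular keyboard or editor accepts codes.

```python
import unicodedata


def char_from_hex(hex_code):
    """Convert the hex digits after "U+" (e.g. "00E9") into a one-character string."""
    return chr(int(hex_code, 16))


char = char_from_hex("00E9")
print(char, unicodedata.name(char))  # é LATIN SMALL LETTER E WITH ACUTE

# The same character written as an escape sequence in source code.
print(char == "\u00e9")              # True
```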

Use of 16 bits

Twice, "It was later discovered that 16 bits allowed for far more characters than originally hypothesized. This breakthrough[...]" has been added to the article. It's wrong; that same escape mechanism could have been created by any programmer since the 1950s. It was political will, not any new discovery or breakthrough, that created that additional character space.--Prosfilaes (talk) 23:25, 15 June 2009 (UTC)

Well, with Joe Becker's original plan there would have been no room for Egyptian Hieroglyphs, Tangut, Old Hanzi, etc. regardless of political will, so the invention of the surrogate mechanism did make it practically possible to encode such scripts. Of course, as you say, there still needs to be the political will to encode historic scripts, and that is something that Joe Becker and the other founding fathers of Unicode probably did not anticipate. BabelStone (talk) 11:45, 16 June 2009 (UTC)
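
The surrogate mechanism mentioned in this thread is plain arithmetic, which supports the point that it is an escape scheme and not a discovery. A minimal sketch, assuming the standard UTF-16 constants (0xD800, 0xDC00, 0x10000); the function name `to_surrogates` is made up for illustration:

```python
def to_surrogates(code_point):
    """Split a supplementary code point into a (high, low) UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above the BMP need surrogates")
    offset = code_point - 0x10000         # a 20-bit value
    high = 0xD800 + (offset >> 10)        # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)       # bottom 10 bits
    return high, low


high, low = to_surrogates(0x13000)        # an Egyptian hieroglyph
print(f"{high:04X} {low:04X}")            # D80C DC00

# 1024 high surrogates x 1024 low surrogates give the extra code space.
print(1024 * 1024)                        # 1048576 supplementary code points
print(0x10FFFF + 1 == 17 * 0x10000)       # True: 17 planes in total
```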

codepoint-layout graphics on offer

I dropped in a graphic showing the layout of the Unicode planes that I think might be newbie-friendly, just above the table. I'm not expert enough an editor to know all the correct incantations for making it appear at just the right optimal width.

I have also uploaded a graphic of the BMP layout at Basic Multilingual Pane.png, which perhaps could be worked into the BMP article, where there's already a graphic, but this one is maybe a little prettier. Both of them were created in Apple's "Keynote" presentation tool, and I'm volunteering to do two things:

  1. Edit them to bring them up to date with the latest/greatest Unicode revs
  2. Upload the source format so that other people can take care of #1 in case I become evil or die; .key is not exactly an open format, but it's editable on many computers out there.

Is this a reasonable thing to think of doing? Tim Bray (talk) 07:03, 21 August 2009 (UTC)
Re 1: Note that the picture File:Unicode Codespace Layout.png already is outdated. The latest version of the Unicode standard is 5.1.0, and it contains 100,713 characters.
As for "making it appear at just the right optimal width": you can use e.g. [[File:Unicode Codespace Layout.png|thumb|300px|Layout of Unicode]]; the syntax is explained in detail at WP:EIS. — Emil J. 10:28, 21 August 2009 (UTC)
I don't think it is helpful to have an out-of-date graphic, especially as it is so in-your-face. I suggest removing it for now, and putting it back when it has been updated to Unicode 5.1. But be aware that 5.2 is due to be released at the end of September, so it may be best to wait a few weeks, and update a 5.2 friendly version once 5.2 has been released. BabelStone (talk) 12:30, 21 August 2009 (UTC)
Oh, and the labels on the graphic are really bad -- the SMP is not just "dead languages and math", and the SSP is not just "language tags". I strongly suggest changing the labels to simply use the plane names. Anyway, I am going to remove the graphic until it has been fixed to correspond to the current version of Unicode and has suitable labels. BabelStone (talk) 12:36, 21 August 2009 (UTC)
Actually, it occurs to me that it might be smart to remove the version-specific stuff; the large-scale block assignments aren't going to change anyhow. That way it wouldn't have to be re-drafted for each version. Will revise graphic and re-submit. Tim Bray (talk) 16:25, 21 August 2009 (UTC)
Yes, I think that's a good idea. But you should still be aware that Plane 3 (provisionally named the "Tertiary Ideographic Plane") is not yet live, but will be defined in Unicode 6.0 (next year) ... although after that the top-level allocation of code space should be stable for our lifetime (famous last words). BabelStone (talk) 22:48, 21 August 2009 (UTC)
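
As a side note on why the top-level layout is stable: the plane of a code point is just its value divided by 65,536. A minimal sketch follows. The function name `plane_of` and the table are illustrative only; the names listed are the plane names as I understand them from the standard, and Plane 3 is left out because it is described above as provisional.

```python
# Plane names as commonly given; planes not listed here are treated as unnamed.
PLANE_NAMES = {
    0: "Basic Multilingual Plane",
    1: "Supplementary Multilingual Plane",
    2: "Supplementary Ideographic Plane",
    14: "Supplementary Special-purpose Plane",
    15: "Supplementary Private Use Area-A",
    16: "Supplementary Private Use Area-B",
}


def plane_of(code_point):
    """Return the plane number (0-16) that a code point falls in."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    return code_point >> 16   # each plane holds 0x10000 code points


for cp in (0x0041, 0x13000, 0x20000, 0xE0001):
    plane = plane_of(cp)
    print(f"U+{cp:04X} -> plane {plane}: {PLANE_NAMES.get(plane, 'unnamed')}")
```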

..."which uses Unicode as the sole internal character encoding"

I think this needs re-wording - in strict terms, "Unicode" is not an encoding. —Preceding unsigned comment added by 194.106.126.97 (talk) 08:13, 29 September 2009 (UTC)
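
The distinction raised here can be shown in a few lines: a character has one code point, while each encoding form serializes that code point to different bytes. A small Python sketch, using the euro sign purely as an example:

```python
char = "\u20ac"   # U+20AC EURO SIGN

# The abstract code point is a single number, independent of any encoding.
print(f"U+{ord(char):04X}")              # U+20AC

# Each encoding form turns that same code point into different bytes.
print(char.encode("utf-8").hex())        # e282ac
print(char.encode("utf-16-be").hex())    # 20ac
print(char.encode("utf-32-be").hex())    # 000020ac
```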

Classical Greek version of Cyrillic

Where is the uppercase circumflex Omega in the Cyrillic alphabet? Does it even exist? If you turn the circumflex character on its side, it might look a lot like the rough or soft breathing mark, but that is not the same thing as a true circumflex. The uppercase circumflex Omega is not used in Modern Greek, I guess. 216.99.201.190 (talk) 19:26, 25 October 2009 (UTC)

ভাল অভ্যাস গড়ুন (Build good habits)

[Posted in Bengali; English translation:] Research has shown that positive thinking is essential for both body and mind. So produce only positive thoughts in the factory of your mind. Read good books. Stick to good sites on the Internet. If there are bad friends among your friends, weed them out. Religious self-improvement books are good books. Read them regularly. A good companion is someone who is cheerful most of the time and who notices the bright side of life. Such people will stir your heart throughout your life, so never give up their company. Give up a bad friend, no matter how close. Otherwise bad thoughts will take hold of you. Remember, bad thoughts attract people more strongly than good ones. —Preceding unsigned comment added by Monitobd (talk · contribs) 12:34, 13 January 2010 (UTC)

This isn't Devanagari (Hindi). I used script recognition software to find out what language this is, and apparently it's "Bishnupriya Manipuri". Can anyone read it? I searched everywhere, and there's not a single online translator. Should I just ignore it... Indigochild 01:42, 12 April 2010 (UTC)
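
For what it is worth, the script of a comment like the one above can be checked without special software by looking at Unicode character names. The sketch below is a rough heuristic, not a real script detector: Python's standard library does not expose the Script property, so it takes the first word of each character name instead. The function name `guess_script` is made up, and the sample is the first word of the heading above written as escapes.

```python
import unicodedata
from collections import Counter


def guess_script(text):
    """Guess the dominant script from the first word of each character's name."""
    counts = Counter()
    for ch in text:
        # Only letters (L*) and combining marks (M*) say something about the script.
        if unicodedata.category(ch)[0] in ("L", "M"):
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split()[0]] += 1
    return counts.most_common(1)[0][0] if counts else None


print(guess_script("\u09ad\u09be\u09b2"))                    # BENGALI
print(guess_script("\u0939\u093f\u0928\u094d\u0926\u0940"))  # DEVANAGARI
```

Note that this identifies the script only. Several languages are written in the Bengali script, so the language itself cannot be determined this way.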

Who does need to understand the importance of unicode?

this link should be inserted somewhere:

Yeah, somewhere, just not on Wikipedia. See WP:ELNO. BabelStone (talk) 10:06, 11 March 2010 (UTC)

Formatting References

I've taken to formatting some of the bare URLs here, using the templates from WP:CT. Omirocksthisworld(Drop a line) 21:25, 16 March 2010 (UTC)

Use template:code?

Should we use a template like {{code|U+012F}} for U+012F to express Unicode text? To me it looks sound. -DePiep (talk) 01:25, 6 May 2010 (UTC)

Yes, I think that is a good idea. BabelStone (talk) 14:05, 17 July 2010 (UTC)
Done. Somewhat differently. See {{unichar}} -DePiep (talk) 22:02, 19 November 2010 (UTC)
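
For comparison outside wiki markup, the same "U+XXXX plus name" presentation can be produced programmatically. This is only a sketch of the notation; it makes no claim about how the {{unichar}} template works internally, and the function name `describe` is hypothetical.

```python
import unicodedata


def describe(ch):
    """Format one character as 'U+XXXX NAME', using at least four hex digits."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"


print(describe("\u012f"))      # U+012F LATIN SMALL LETTER I WITH OGONEK
print(describe("\U00013000"))  # five hex digits for a code point above the BMP
```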

vandalism?

Shouldn't the edit made at 20:52, 21 May 2010 by 188.249.3.139 be reverted? —Preceding unsigned comment added by 193.226.6.227 (talk)

Unicode is not just text

The lead-in paragraph presents Unicode as something that is just for text, but that is not correct. Consider that there are control code points (e.g., ASCII control codes) and symbols (e.g., dingbats). Also, code points, characters and graphemes are separate things, but the lead-in notes that 107,000 characters (not code points or graphemes) are presently in the standard. I haven't read the rest of the article. —Preceding unsigned comment added by TechTony (talk · contribs) 12:59, 17 July 2010 (UTC)
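
The difference between code points and what a reader perceives as one character, and the presence of control codes and symbols, can be illustrated with Python's standard `unicodedata` module. This is a small sketch with examples chosen only for illustration:

```python
import unicodedata

# One user-perceived character written as two code points: "e" + combining acute.
decomposed = "e\u0301"
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed))          # 2 code points
print(len(composed))            # 1 code point (U+00E9)

# Not every code point is a letter: controls and symbols have their own categories.
print(unicodedata.category("\u0007"))  # Cc (control: BEL)
print(unicodedata.category("\u2702"))  # So (symbol: BLACK SCISSORS, a dingbat)
print(unicodedata.category("A"))       # Lu (uppercase letter)
```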

"Plain text" by Unicode's terminology though, no? — Preceding unsigned comment added by 193.120.165.70 (talk) 10:22, 29 November 2011 (UTC)

Unicode block names capitalization (Rename and Move)

Here is a proposal to rename Unicode block names into regular (Unicode) casing, e.g. C0 controls and basic Latin be renamed and moved to C0 Controls and Basic Latin. Some 18 block pages are affected. -DePiep (talk) 09:08, 6 October 2010 (UTC)

The outcome is: let's do it. See the C0-link. -DePiep (talk) 22:01, 19 November 2010 (UTC)

Categories

Where is the list of categories of unicode characters? I found the arrow category article. I want to see other categories. I go to this article, and no list of categories to be found. Rtdrury (talk) 19:44, 8 February 2011 (UTC)

As the TOC says, here: Unicode#Character_General_Category. -DePiep (talk) 20:31, 8 February 2011 (UTC)
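
To see which General Category values occur in a given range, one can also tally them directly. The sketch below uses the version of the Unicode data bundled with the Python interpreter, which may differ from the version the article describes; the function name `category_counts` is made up for illustration.

```python
import unicodedata
from collections import Counter


def category_counts(start, end):
    """Count General Category values for code points start..end inclusive."""
    return Counter(unicodedata.category(chr(cp)) for cp in range(start, end + 1))


ascii_counts = category_counts(0x00, 0x7F)
for category, count in sorted(ascii_counts.items()):
    print(category, count)
# Among the lines printed: Cc 33, Ll 26, Lu 26, Nd 10, Zs 1
```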

Nomination for deletion of Template:UCS characters

Template:UCS characters has been nominated for deletion. You are invited to comment on the discussion at the template's entry on the Templates for discussion page. DePiep (talk) 11:54, 2 February 2012 (UTC)

This template is now being proposed & developed into an infobox. See Template:UCS characters/sandbox and Template talk:UCS characters. Since my communication with my opponent is not clear, I'd like someone else to join in there. I am not yet convinced of the new form it has, but I don't want to throw away a good idea either. -DePiep (talk) 10:12, 17 February 2012 (UTC)

Nomination for deletion of Summary of Unicode character assignments

Summary of Unicode character assignments has been nominated for deletion. You are invited to comment on the discussion at the article's entry on the Articles for deletion page. BabelStone (talk) 23:36, 3 February 2012 (UTC)

AfD notice Mapping of Unicode graphic characters

For deletion. See Wikipedia:Articles for deletion/Mapping of Unicode graphic characters. -DePiep (talk) 00:39, 12 February 2012 (UTC)
