User talk:Spitzak

From Wikipedia, the free encyclopedia
Hello Spitzak, and welcome to Wikipedia!
(If you find this note to be too bulky, feel free to remove it whenever you want)

Thank you for your contributions; we are pleased to have you here and I hope you stay. I recommend you have a quick look through the Five Pillars of Wikipedia - this page gives a brief summary of what we value here, and if you want a tutorial on how to edit, have a glance at Wikipedia:Welcome. Below is a collection of some other pages that you may find helpful; feel free to read them at your leisure if and when you want. (But of course you don't have to read any of that to contribute!)

If you need help with anything, please feel free to contact me on my talk page. Or alternatively type {{helpme}} here and a user will help you as soon as possible. But remember to sign all your posts on talk/discussion pages with ~~~~; this helps us keep track of who's saying what and when in discussions.

Once again Welcome to Wikipedia, and happy editing!

Konst.able 19:45, 17 October 2006 (UTC)
Getting started
Getting your info out there
Getting more Wikipedia rules
Getting help
Getting along
Getting technical

Image without license

Unspecified source/license for Image:Utf8diagram.png

Thanks for uploading Image:Utf8diagram.png. The image has been identified as not specifying the copyright status of the image, which is required by Wikipedia's policy on images. Even if you created the image yourself, you still need to release it so Wikipedia can use it. If you don't indicate the copyright status of the image on the image's description page, using an appropriate copyright tag, it may be deleted some time in the next seven days. If you made this image yourself, you can use copyright tags like {{PD-self}} (to release all rights), {{self|CC-by-sa-3.0|GFDL}} (to require that you be credited), or any tag here - just go to the image, click edit, and add one of those. If you have uploaded other images, please verify that you have provided copyright information for them as well.

For more information on using images, see the following pages:

This is an automated notice by MifterBot. For assistance on the image use policy, see Wikipedia:Media copyright questions. NOTE: once you correct this, please remove the tag from the image's page. --MifterBot (Talk | Contribs | Owner) 00:29, 5 September 2008 (UTC)

File permission problem with File:Nuke screenshot.png

File Copyright problem

Thanks for uploading File:Nuke screenshot.png. I noticed that while you provided a valid copyright licensing tag, there is no proof that the creator of the file agreed to license it under the given license.

If you created this media entirely yourself but have previously published it elsewhere (especially online), please either

  • make a note permitting reuse under the CC-BY-SA or another acceptable free license (see this list) at the site of the original publication; or
  • send an email from an address associated with the original publication to permissions-en@wikimedia.org, stating your ownership of the material and your intention to publish it under a free license. You can find a sample permission letter here.

If you did not create it entirely yourself, please ask the person who created the file to take one of the two steps listed above, or if the owner of the file has already given their permission to you via email, please forward that email to permissions-en@wikimedia.org.

If you believe the media meets the criteria at Wikipedia:Non-free content, use a tag such as {{non-free fair use in|article name}} or one of the other tags listed at Wikipedia:Image copyright tags#Fair use, and add a rationale justifying the file's use on the article or articles where it is included. See Wikipedia:Image copyright tags for the full list of copyright tags that you can use.

If you have uploaded other files, consider checking that you have provided evidence that their copyright owners have agreed to license their works under the tags you supplied, too. You can find a list of files you have uploaded by following this link. Files lacking evidence of permission may be deleted one week after they have been tagged, as described on criteria for speedy deletion. If you have any questions please ask them at the Media copyright questions page. Thank you. MBisanz talk 02:11, 22 July 2009 (UTC)

MS-DOS 1.0

It used the FAT12 filesystem on 160 KB single-sided, 8-sector 5¼-inch floppies. It was extremely primitive in some respects, yet still a great advance over commonly used CP/M filesystems, since the exact file length, file modification date and time, etc. were recorded. Subdirectories were added in DOS 2.0, yet the DOS 1 directory entry format remained unchanged until the introduction of LFNs in Windows 95... AnonMoos (talk) 12:29, 2 December 2009 (UTC)

UTF-16

Hi, I reverted your deletions in UTF-16; see the edit summary. You probably have a point in some of the deletions, but I did not see it in the whole. By the way, as far as I understand, "word" (as a bit-length unit) is not used in Unicode, which makes it hard for me to understand. -DePiep (talk) 22:48, 21 June 2010 (UTC)

UTF-8's compactness

Hi, I noticed you removed my addition to UTF-8 explaining that UTF-8 is popular in part because it is more compact than UTF-16 and UTF-32. I don't understand why, though, because the current wording (which is the same, or very nearly the same, as what it said before I added this part) suggests that the compactness of UTF-8 for Western European languages is not a significant reason for its popularity, because it cites its ASCII compatibility as the only reason ("for this reason", to me, suggests no other possible reasons), which I have a bit of a hard time believing. You also said "lots of other rejected multibyte encodings are shorter", but I don't understand why that's relevant or even what these encodings are... - furrykef (Talk at me) 00:32, 14 October 2010 (UTC)

I believe the reason for UTF-8's popularity is its ASCII compatibility, not its compactness. An encoding that reused the ASCII bytes as part of larger characters would be more compact, and this is what most alternatives to UTF-8 did. (The other reason is that other multibyte encodings did not map all of Unicode or made the mapping hard to figure out.) Comparing it to UTF-16 for size here does not make sense, as the reason it wins over UTF-16 is certainly compatibility: UTF-16 is incompatible with every single possible ASCII string!
I don't see why non-Unicode encodings are relevant here. When we talk about the popularity of UTF-8, I think one would generally assume "as opposed to other Unicode encodings", since, as you said, non-Unicode encodings generally don't cover the Unicode set.
In any case, I think my main issue here is that you seem insistent on citing compatibility as the only reason for UTF-8's popularity over UTF-16. Surely the size factors into it at least a little? If I were to store big heaps of Japanese text, for example, I would use UTF-16 (unless I thought there was a high probability that the files would need to be used with a program that only understands UTF-8). - furrykef (Talk at me) 03:48, 15 October 2010 (UTC)
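
To put rough numbers on the size question in this thread, here is a minimal C sketch that adds up the encoded size of a run of code points under each encoding. The function names are made up for illustration, and the input is assumed to be valid code points (no surrogates).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed to encode one Unicode code point in UTF-8 (1 to 4). */
size_t utf8_bytes(uint32_t cp)
{
    assert(cp <= 0x10FFFF);
    if (cp < 0x80)    return 1;   /* ASCII range */
    if (cp < 0x800)   return 2;
    if (cp < 0x10000) return 3;   /* rest of the BMP, including most CJK */
    return 4;
}

/* Bytes needed in UTF-16: one 16-bit unit, or a surrogate pair. */
size_t utf16_bytes(uint32_t cp)
{
    assert(cp <= 0x10FFFF);
    return cp < 0x10000 ? 2 : 4;
}

/* Total encoded size of n code points under the given per-code-point rule. */
size_t encoded_size(const uint32_t *cps, size_t n, size_t (*rule)(uint32_t))
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += rule(cps[i]);
    return total;
}
```

With these rules, the five ASCII letters of "Hello" take 5 bytes in UTF-8 and 10 in UTF-16, while the three BMP kanji U+65E5 U+672C U+8A9E take 9 bytes in UTF-8 and 6 in UTF-16. Real Japanese text mixed with ASCII markup or punctuation will land somewhere in between.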

License tagging for File:Unicode 2400 Chrome Ubuntu.png

Thanks for uploading File:Unicode 2400 Chrome Ubuntu.png. You don't seem to have indicated the license status of the image. Wikipedia uses a set of image copyright tags to indicate this information; to add a tag to the image, select the appropriate tag from this list, click on this link, then click "Edit this page" and add the tag to the image's description. If there doesn't seem to be a suitable tag, the image is probably not appropriate for use on Wikipedia.

For help in choosing the correct tag, or for any other questions, leave a message on Wikipedia:Media copyright questions. Thank you for your cooperation. --ImageTaggingBot (talk) 07:05, 26 October 2010 (UTC)

ARF CLI GUI etc

Please respond at: Talk:Abort, Retry, Fail?#Take two. —DragonHawk (talk|hist) 11:38, 24 March 2011 (UTC)

Still waiting. Please respond. —DragonHawk (talk|hist) 02:15, 4 May 2011 (UTC)

Imposter

I blocked and cleaned up after that person who was trying to impersonate you. -- Gogo Dodo (talk) 05:50, 8 April 2011 (UTC)

I don't understand.

Can you explain your checkin note in the NeWS article? You added "Actually pure PS could not no matter how much helped", but I can't understand what this means in the context of the edit. Maury Markowitz (talk) 11:07, 28 June 2011 (UTC)

Actually I'm not sure. It may make sense, but it seemed to me that the added text was just useless filler that provided no information. NeWS itself "needs additional software" (i.e. the operating system, probably other stuff) in order to work, too. I'm guessing you are saying that DPS could be used for windows provided you use something else for creating the windows and handling all the I/O, but when you specify it that way it is a true statement for any library: it could be *part* of a windowing system. You could argue that DPS is designed for output, but when you do that you have to define X11 as being part of it, in which case you might as well claim they are integrated and thus DPS+X11 is capable "without additional software".
Ahhh. OK I do believe this still needs mentioning in the article, but I'll fix the context. Maury Markowitz (talk) 12:12, 29 June 2011 (UTC)
Actually you're right, in the intro it's not needed at all. Maury Markowitz (talk) 12:14, 29 June 2011 (UTC)

about deletion of the information from the section strcat

I don't understand why you deleted the information from the strcat_s section of the strcat page, so I'm going to undo it. If you have any problem with that, please drop a message on my talk page and give your suggestions. Prasannjit Gondchawar (talk) 19:17, 1 October 2011 (UTC)

Disambiguation link notification for March 13

Hi. When you recently edited Code page 437, you added links pointing to the disambiguation pages !! and 1/4 (check to confirm | fix with Dab solver). Such links are almost always unintended, since a disambiguation page is merely a list of "Did you mean..." article titles. Read the FAQ • Join us at the DPL WikiProject.

It's OK to remove this message. Also, to stop receiving these messages, follow these opt-out instructions. Thanks, DPL bot (talk) 10:58, 13 March 2012 (UTC)

"Seek is O(1) in code units."

Can you give an algorithm demonstrating that? To find the nth character, do you not have to examine the preceding ones to determine that you indeed have the nth? (Of course, there are other issues here as well: the O(1) algorithm for char* is obviously at risk for buffer overruns, e.g., unless you have a solid upper bound, and can be fooled even then.) -- Elphion (talk) 16:27, 21 April 2012 (UTC)

Oh, I see: I was misreading "code unit" as "character". But this is not interesting: no one is interested in seeking to the nth code unit unless you already have something like a LUT for the string converting Char(n) into CU(m) (or more broadly, a list of starting points that you are interested in -- beginning of paragraphs, etc. -- i.e., something you get by already having scanned the text). -- Elphion (talk) 16:32, 21 April 2012 (UTC)
I should be clearer: I am "on your side" here -- the argument that UTF8 or UTF16 strings can't be treated as arrays is a red herring, because strings shouldn't be treated as arrays until they have been thoroughly scanned. If one truly needs an array of the characters, it can be built during the scan. But the argument that seeking the nth CU is O(1) is irrelevant to this. -- Elphion (talk) 16:53, 21 April 2012 (UTC)
The mistake you are making is thinking that there is a need to count "characters" at all. First of all, the word "character" is poorly defined in Unicode (it depends on the interpreted normalization, and quite a few code points may not be "characters"), so it is impossible to count them except by string scanning, in any encoding. I suspect, however, that you mean "Unicode code points" when you say "characters". Or perhaps "UTF-16 code units" (where Unicode code points greater than U+FFFF take 2 units). I hope you can see from even these few examples, where I am unsure what you intend, why talking about "characters" is a bad idea.
In any case this makes as much sense as saying there is a need to find the N'th word or letter 'x' or anything else in O(1) time. There is no need for this, and text processing is quite fast despite the inability to do searches in less than linear time. The problem is that you need to remember *offsets* into strings, and it is desirable to turn an offset into a pointer to the character in O(1) time. The obvious solution that any programmer should think of is to use fixed-size units for this "offset"; in fact it is such a no-brainer that it seems hard to believe anybody would ever think otherwise. However, decades of indoctrination where every man page says "characters" when talking about offsets seem to have turned even experienced programmers into complete morons when they encounter UTF-8. Spitzak (talk) 01:57, 24 April 2012 (UTC)
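
A small C sketch of the offset argument above, with hypothetical helper names and assuming a NUL-terminated, valid UTF-8 string: the search is linear, but the byte offset it returns can be stored and turned back into a pointer with plain pointer arithmetic. A code point counter is included only for contrast, since it always needs a full scan.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Byte offset of the first occurrence of needle in s, or (size_t)-1.
   The search itself is linear, as it would be in any encoding. */
size_t find_offset(const char *s, const char *needle)
{
    assert(s != NULL && needle != NULL);
    const char *hit = strstr(s, needle);
    return hit ? (size_t)(hit - s) : (size_t)-1;
}

/* Count code points by skipping continuation bytes (10xxxxxx).
   Needs a full scan of the string; assumes the input is valid UTF-8. */
size_t count_code_points(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}
```

Given an offset `off` saved from `find_offset`, getting back to that position later is just `s + off`, which is O(1) and never needs to know how many code points precede it.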

Talk:UTF-8 #Byte vs Octet

[1] Please refrain from personal attacks and from assuming bad faith, especially when there is no on-wiki evidence. Even if you claim to know something important about the real-life identity of that user, Wikipedia is not a place for spreading such rumours. Incnis Mrsi (talk) 06:04, 8 May 2012 (UTC)

Yes sorry, that was stupid. It was obviously a good-faith edit. Spitzak (talk) 22:59, 8 May 2012 (UTC)

UCS2 and UTF-16

Just curious: why is UTF-16 not an extension of UCS2? While it's true that the codepoints assigned to surrogate pairs are no longer available in UTF-16, those values had no character assignments in UCS2, so they did not lose their "original meaning". Are there other codepoints that were sacrificed in the transition? -- Elphion (talk) 21:09, 13 July 2012 (UTC)
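
For reference while discussing this, a minimal C sketch of how UTF-16 encodes a code point (the function name is hypothetical). Code points below U+10000 come out as the same single 16-bit value UCS-2 would use, and only the D800..DFFF range is set aside for surrogate pairs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-16 code units; returns the unit count (1 or 2). */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    assert(cp <= 0x10FFFF);
    assert(cp < 0xD800 || cp > 0xDFFF);          /* surrogates are not encodable */
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;                   /* same value as in UCS-2 */
        return 1;
    }
    cp -= 0x10000;                               /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate: bottom 10 bits */
    return 2;
}
```

For example, U+0041 stays the single unit 0x0041, while U+1F600 becomes the pair 0xD83D 0xDE00.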

Lexicographic

Hi -- not arguing the merits of your change, but pointing out that since the wording was being discussed on talk page, that's where you should have floated the change. Otherwise it's a quick descent into edit warring! -- Elphion (talk) 13:04, 18 September 2012 (UTC)

EBCDIC

You said

You really think "accustomed to ASCII" is why this was confusing? Really? Give me a break

Before EBCDIC and ASCII were developed, variants of BCD were the most common character codes, and they all have non-contiguous alphabets similar to EBCDIC, so EBCDIC probably wouldn't have been confusing then. I do think it was only after programmers became used to ASCII that anyone even gave it any thought. I've posed it as a question on the EBCDIC talk page. Peter Flass (talk) 22:46, 21 September 2013 (UTC)

Nak

Please don't remove sourced content. A "nak" is a female "yak" - it's in the yak article. Rklawton (talk) 03:13, 23 October 2013 (UTC)

About the slashes

About Slash_(punctuation)#Encoding. The facts are clear (including the misleading Unicode naming), but I think we could improve the prose.

How about the section intro setup like: "Slashes are encoded in Unicode as ... and ...". But the Unicode naming is controversial/disputed.

(then the next paragraph says:) Typographically ... (zoom in on diffs).

One issue is that we should not push both the definition and the naming issue into one "encoding" paragraph. What do you think? -DePiep (talk) 18:26, 17 April 2014 (UTC)

UTF-8 and ASCII backward compatibility

Hello there! Regarding our recent back-and-forth on UTF-8 and ASCII backward compatibility, please allow me to explain.

In a few words, a text editor that is only ASCII-aware can process only the 7-bit ASCII subset of UTF-8 data, for example, and everything else appears garbled to the end user of such an editor. If we take an API client as another example, it too can understand only the 7-bit ASCII subset; everything else is garbage to anything speaking only ASCII, and any changes made outside the 7-bit ASCII subset are going to break UTF-8's multibyte characters.

Hope it makes sense, and of course, I'm more than open to discussing this further. — Dsimic (talk | contribs) 06:58, 22 April 2014 (UTC)

"processing text" does NOT mean "text editor". What I meant is that code like this:
 printf("<some utf-8 here> %d\n", 10);
will work even if the C compiler and library do not know anything about UTF-8. The only requirement is that bytes with the high bit set are passed unchanged by the compiler, printf, and the output driver (it will help if the output is drawn on a terminal that understands UTF-8, but even if it is redirected to a file that is eventually displayed in a UTF-8 aware editor this works). This is true of virtually every language written to handle ISO-8859-1 or even the older IBM PC code pages.
Claiming the program must know something about the code point boundaries is as false as claiming it is impossible to process English text unless the program includes an English dictionary. — Preceding unsigned comment added by Spitzak (talk • contribs)
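
A minimal C sketch of the same point, using a made-up function rather than anything from a real library: a routine that only knows ASCII can still process UTF-8 safely as long as it leaves every byte with the high bit set alone. Here the "processing" is uppercasing ASCII letters in place.

```c
#include <assert.h>
#include <stddef.h>

/* Uppercase ASCII letters in place. Every other byte, including all bytes
   with the high bit set (i.e. all parts of multibyte UTF-8 sequences),
   is passed through unchanged. */
void ascii_upper(char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (*s >= 'a' && *s <= 'z')
            *s = (char)(*s - 'a' + 'A');
    }
}
```

Running it over the UTF-8 bytes of "café naïve" uppercases the ASCII letters and leaves the two-byte sequences for é and ï intact, so the result is still valid UTF-8 (with é and ï not uppercased, since the routine knows nothing about them).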
You're absolutely right with the above printf() example, and a bit later I'll change the wording in the article so it's covered. — Dsimic (talk | contribs) 15:23, 22 April 2014 (UTC)
Just saw that you've already reverted my edits. Well, you can have it that way if you insist, though saying that ASCII-aware software "can process UTF-8 data as well" is somewhat misleading, as "processing" isn't well defined and can mean many things. — Dsimic (talk | contribs) 15:28, 22 April 2014 (UTC)
Well, as always, things aren't that simple. For example, what if we had something like this, which is perfectly reasonable to expect from an ASCII-aware application handling some ISO-8859-1 text:
printf("<some utf-8 here> %s\n", "<some iso-8859-1 here>");
That's happily producing invalid UTF-8 output. Thoughts? — Dsimic (talk | contribs) 15:40, 22 April 2014 (UTC)
That example will still produce correct UTF-8 both before and after the ISO-8859-1 text. Depending on what is reading the output you will either see error indicators inside that text, or it will be recognized as ISO-8859-1 and rendered correctly. Spitzak (talk) 18:38, 22 April 2014 (UTC)
How can UTF-8 be recognized as ISO-8859-1? — Dsimic (talk | contribs) 18:46, 22 April 2014 (UTC)
Because it won't be valid UTF-8 (unless it is ASCII or an extremely unlikely arrangement of two or three letters and symbols in a row). This error can be detected by looking at no more than 4 bytes to determine that the first byte cannot be the start of a UTF-8 character. There is no need to find the "ends" of the ISO-8859-1 text; this is strictly a one-pass algorithm. The display can then do something with this byte, such as show an error indicator, or guess that it is in ISO-8859-1 (or CP1252). It can then continue interpreting with the next byte, which allows this to repeat if there is a long sequence of non-UTF-8 in the text, and it will resynchronize correctly on the next valid UTF-8 code point. Spitzak (talk) 19:13, 22 April 2014 (UTC)
That's all fine, but the point is that using ASCII-aware software provides forward compatibility with UTF-8 only in case 7-bit ASCII characters are used in combination with untouched multibyte UTF-8 characters. Why would those 128+ single-byte characters have to be ISO-8859-1 or CP1252? Why wouldn't they actually be Windows-1253, for example? Anything beyond 7-bit ASCII is a plain guessing game, if you agree. — Dsimic (talk | contribs) 19:33, 22 April 2014 (UTC)
Sure, but I fail to see what a "UTF-8 aware" program could do that is any better if it is given a block of bytes that is not UTF-8. Yes, it can throw an exception, but IMHO that is *worse*, not better. The trivial operation of copying a block of bytes that is IMPOSSIBLE to confuse with valid data is the correct behavior. Spitzak (talk) 23:50, 22 April 2014 (UTC)
Well, I'd say that we're pretty much on the same page, and the whole confusion came from the vague definition of "processing". At the same time, the subset of bytes that can't be mistaken (in both directions) is the 7-bit ASCII. If it goes into 8-bit ASCII, backward/forward compatibility breaks due to design of UTF-8. Agreed? — Dsimic (talk | contribs) 02:59, 23 April 2014 (UTC)
Yes, any program that assigns a meaning to an 8-bit byte will fail to handle UTF-8, mostly because it may change this byte to a different value. The biggest problems are programs that make NEL (0x85) and non-breaking space (0xA0) into whitespace characters. But there are a lot of programs, in particular both compiled and interpreted languages, that assign no meaning to any bytes between quotes in a string constant other than the quotes and backslash, which are ASCII. Spitzak (talk) 03:22, 23 April 2014 (UTC)
... and other than dollar signs and curly brackets – as that's the case for PHP, for example, which is still good. — Dsimic (talk | contribs) 03:42, 23 April 2014 (UTC)
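
The one-pass detection described earlier in this thread can be sketched in C roughly as follows. The function names are hypothetical, and what a display does with a bad byte (error indicator, CP1252 guess) is left out; the sketch only shows that each decision looks at no more than 4 bytes and that scanning resumes at the very next byte.

```c
#include <assert.h>
#include <stddef.h>

/* Length (1 to 4) of the valid UTF-8 sequence starting at p, or 0 if the
   byte at p cannot start one. Looks at no more than 4 bytes. */
size_t utf8_valid_len(const unsigned char *p, size_t avail)
{
    assert(p != NULL);
    if (avail == 0) return 0;
    unsigned char b = p[0];
    size_t len;
    unsigned long cp, min;
    if (b < 0x80) return 1;                       /* plain ASCII */
    else if (b >= 0xC2 && b <= 0xDF) { len = 2; cp = b & 0x1F; min = 0x80; }
    else if (b >= 0xE0 && b <= 0xEF) { len = 3; cp = b & 0x0F; min = 0x800; }
    else if (b >= 0xF0 && b <= 0xF4) { len = 4; cp = b & 0x07; min = 0x10000; }
    else return 0;            /* continuation byte, or 0xC0, 0xC1, 0xF5..0xFF */
    if (avail < len) return 0;                    /* truncated sequence */
    for (size_t i = 1; i < len; i++) {
        if ((p[i] & 0xC0) != 0x80) return 0;      /* not a continuation byte */
        cp = (cp << 6) | (p[i] & 0x3F);
    }
    if (cp < min || cp > 0x10FFFF) return 0;      /* overlong or out of range */
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;   /* surrogate */
    return len;
}

/* One pass over a buffer: count bytes that are not part of valid UTF-8.
   After a bad byte, scanning simply resumes at the next byte. */
size_t count_invalid_bytes(const unsigned char *buf, size_t n)
{
    size_t bad = 0, i = 0;
    while (i < n) {
        size_t len = utf8_valid_len(buf + i, n - i);
        if (len == 0) { bad++; i++; }   /* a display could flag or reinterpret this byte */
        else i += len;
    }
    return bad;
}
```

On the bytes of "café " in UTF-8 followed by "naïve" in ISO-8859-1 and then a UTF-8 euro sign, only the single 0xEF byte is flagged, and everything before and after it is still read as UTF-8.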

You are receiving a medal!

The Detail Medal (Der Detailorden)
For your FLTK change Polluks 12:10, 28 August 2014 (UTC)

Reversion without explanation

When you revert a good-faith edit (as you did here), it's best to provide a reason for the reversion. Otherwise the editor whose edit you reverted can't learn from their mistake. In this case, your reversions seem pretty low-effort, given that you could have also fixed the other occurrences of the word "octet" since you feel it's so obviously obscure and unused. When reverting good-faith edits, I'd encourage you to improve the page if the edit was prompted by inconsistency (as in this case) or some other easily-fixed problem. Electricmuffin11 (talk) 07:28, 24 September 2014 (UTC)

November 2014

{{subst:User:BracketBot/inform|diff=635280213|page=Control character|by= by modifying 1 "()"s|debug=(-1, 0, 0, 0)|list=yes|remaining=*[[newline|CR and LF]] used to separate lines of text. The code 127 ([[Delete character|DEL]])) is also a control character. [[Extended ASCII]] sets defined by [[ISO 8859]] added the codes 128

  • character|escape]], <code>ESC</code>, <code>[[\e]]</code> ([[GCC (software)|GCC]] only), <code>^[</code>).
  • normally. For example, the sequence of code 27, followed by the printable characters <nowiki>"[2;10H", would cause a DEC VT-102 terminal to move its [[cursor (computers)|</nowiki>
  • with 31, forcing bits 6 and 7 to zero. For example, pressing "control" and the letter "g" or "G" (code 103 in [[octal]] or 71 in [[decimal|base 10]], which is 01000111 in [[Binary numeral system|

|lines=4}}