Talk:Shift JIS

This article is of interest to the following WikiProjects:
WikiProject Computing (Rated Start-class, Low-importance)
WikiProject Computer science (Rated Start-class, Low-importance)
WikiProject Typography (Rated Start-class, Low-importance)

Language...

Can someone put this in English for the common man? —Preceding unsigned comment added by 75.108.224.182 (talk) 05:22, 25 February 2008 (UTC)

Sure, just take a look at Evil. Jpatokal (talk) 09:57, 25 February 2008 (UTC)

Typo?

developed by a Japanese company called ASCII 

Sounds like a typo, but I don't know if it is. The first page of Google results doesn't seem to show anything. JamesBrownJr 22:17, 4 May 2006 (UTC)

It's not: the company is for real. Jpatokal 02:03, 5 May 2006 (UTC)


Ken Lunde states on page 175 of his authoritative book "CJKV Information Processing", O'Reilly & Associates, 1999, ISBN 1565922247, that Shift-JIS was "originally developed by Microsoft Corporation". Later, on page 176, he refers to the ASCII Corporation's version of Japanese TeX as one of four examples of computer platforms or environments that use Shift-JIS internally.
Lunde, at least, does not seem to credit the ASCII Corporation with having invented Shift-JIS. Morten Johnsen 23:51, 25 June 2006 (UTC)

The article has an underscore in the title which appears nowhere else in the article. Is this correct or a typo? If it's correct, it should be used throughout. User:Fredhoysted 14:57 (UTC)

Underscores are not used in normal English. However, MIME names cannot contain spaces; this is a common restriction for identifiers in software and is also seen in programming languages, for example. So an underscore is substituted for the space in the MIME name. Shinobu 05:23, 4 November 2007 (UTC)

Umlauts

Why has Shift-JIS no code points for umlauts assigned? --84.61.71.163 15:25, 15 May 2006 (UTC)

Japanese typers rarely have a need to type the words "Mötley Crüe". You could use ISO-2022-JP-2, which has a mechanism to switch to ISO-8859-1 (includes umlauts), but you might as well just use UTF-8. --150.216.151.171 17:59, 9 July 2006 (UTC)
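
The answer above can be checked with a small sketch. It assumes Python's built-in `shift_jis` and `utf-8` codecs; the helper name `can_encode` is made up for this illustration and is not part of any library.

```python
def can_encode(text: str, codec: str) -> bool:
    """Return True if every character of `text` can be encoded with `codec`."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        # At least one character has no code point in this encoding.
        return False


sample = "Mötley Crüe"
for codec in ("shift_jis", "utf-8"):
    # Print whether the sample, umlauts included, fits into each encoding.
    print(codec, can_encode(sample, codec))
```

The same helper accepts any other codec name known to Python, so it could also be used to try the ISO-2022-JP-2 route mentioned above.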


Upper and Lower ASCII

This page uses the terms "upper ASCII" and "lower ASCII". I believe that the writer meant "characters > 127" and "characters <= 127" in a fixed-width 8-bit encoding. But ASCII only defines 128 characters (codes 0 to 127). There is no "upper ASCII".

http://www.xslt.com/html/xsl-list/2002-02/msg00248.html

Browser interpretation of 0x5C

At least Firefox interprets 0x5C in Shift_JIS as '\' and not '¥'. I suspect this is because the '\' character is used to escape characters in Javascript, so having an encoding without a representation of that character would be a security problem. JeffreyYasskin 21:20, 19 December 2006 (UTC)

And you would be wrong to suspect that. 0x5c is commonly used as a special character, regardless of what symbol that character value actually represents. So on a Japanese computer you have filesystem paths like "C:¥Program Files¥", the DOS prompt looks like "C:¥>" and a Hello World program might contain the line 'cout << "Hello World!¥n";'. Shinobu 05:29, 4 November 2007 (UTC)

ASCII corporation: not a typo

Hi, I wrote a large portion of this article (before I had a sign-in) and that is what I meant when I wrote it. Sadly I could not remember where I read it, but I've done some googling so I'll add a link that backs up what I said. —The preceding unsigned comment was added by Tim Band (talkcontribs) 15:38, 7 May 2007 (UTC).

:1997 - what does that mean?

JIS X 0201:1997 (for the single-byte characters) and JIS X 0208:1997 (for the double byte characters)

What does :1997 mean? The linked articles don't yield a clue, and both state that the standards were set in 1969 and 1990(?). Were they revised in 1997? If so, why didn't they simply get a new four-digit number? Shinobu 16:16, 31 August 2007 (UTC)

1969 and 1983, yes; AFAIK there are no later revisions. The one you're thinking of in 1990 is JIS-X-0212 Tacitus Prime 11:22, 11 September 2007 (UTC)

So the ":1997"'s in the article are wrong and should go, right? Shinobu 05:32, 4 November 2007 (UTC)

Bit late to the discussion, but to avoid confusing future readers: at the time of this writing, JIS X 0208 has four versions (JIS C 6226-1978, JIS C 6226-1983, JIS X 0208-1990, JIS X 0208:1997). JIS X 0201 has three (JIS C 6220-1969, JIS C 6220-1976, JIS X 0201:1997). The colon before the 1997 instead of a hyphen is just a change in convention (and technically, so is the “X” classifier; that was a new category created in 1987 for information processing and such, because while it may have made sense to put that under “Electronic and Electrical Engineering” back when they were first writing these standards, it became a field in its own right). This statement is correct as it stands, and at some point in the next few weeks months years, this should all be fleshed out in their own, appropriate articles. -BRPXQZME (talk) 23:27, 13 June 2009 (UTC)

EUC-JP

The article says "the competing 8-bit format EUC-JP, which does not support halfwidth katakana" - but EUC does indeed have the halfwidth katakana (upper half of JIS-X-0201:1976) in G2 (i.e., as two-byte sequences 0x8E 0xA1 .. 0x8E 0xDF) Tacitus Prime 11:22, 11 September 2007 (UTC)

The article means that it doesn't support single-byte encoding of halfwidth katakana. I've added a clarification. Jpatokal 12:53, 11 September 2007 (UTC)
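
The distinction drawn above (one byte in Shift JIS, two bytes with a 0x8E prefix in EUC-JP) can be seen in a minimal sketch. It assumes Python's built-in `shift_jis` and `euc_jp` codecs; the function name `halfwidth_bytes` is invented for this example.

```python
from typing import Tuple


def halfwidth_bytes(ch: str) -> Tuple[bytes, bytes]:
    """Encode one character and return (Shift JIS bytes, EUC-JP bytes)."""
    return ch.encode("shift_jis"), ch.encode("euc_jp")


# U+FF76 HALFWIDTH KATAKANA LETTER KA, position 0xB6 in JIS X 0201.
sjis, eucjp = halfwidth_bytes("\uff76")
print(sjis.hex(), len(sjis))    # single byte in Shift JIS
print(eucjp.hex(), len(eucjp))  # 0x8E prefix plus the same byte in EUC-JP
```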

Recommendation or lobbying?

"it is recommended that Unicode be used instead" recommended by unicode.org, isn't it? Should be added then! —Preceding unsigned comment added by 84.56.91.141 (talk) 01:06, 4 June 2009 (UTC)

Error in the Transformation Formula

The article states some transformation formulas that do not seem to work:

33 \le j_1 \le 94  \Rightarrow s_1 = \left \lfloor \frac{j_1 + 1}{2} \right \rfloor + 112\,
95 \le j_1 \le 126 \Rightarrow s_1 = \left \lfloor \frac{j_1 + 1}{2} \right \rfloor + 176\,
j_1 \mbox{ is odd }  \Rightarrow s_2 = j_2 + 31 + \begin{cases} \left \lfloor \frac{j_2}{95} \right \rfloor & \mbox{if } j_2 \ge 96 \\ 0 & \mbox{otherwise} \end{cases}  \,
j_1 \mbox{ is even } \Rightarrow s_2 = j_2 + 126\,

If you apply these formulas to some random examples you will get e.g.:

  • 朧 (Kuten 59-16) 8E, 2F instead of the correct 9E, 4F
  • 察 (Kuten 27-01) 7E, 20 instead of the correct 8E, 40
  • 鯣 (Kuten 82-40) 99, A6 instead of the correct E9, BE

It appears that you have to increase both j1 and j2 by 32 (0x20) before doing this kind of calculation, which corrects the first two examples and the first byte of the last one, but still gives E) instead of BE for the second byte of 鯣. Do you have an idea how to fix this one as well? --Sannaj (talk) 14:55, 11 November 2012 (UTC)

Interesting. Here's an alternative formula: http://www.sljfaq.org/afaq/encodings.html#encodings-Shift-JIS Jpatokal (talk) 02:50, 12 November 2012 (UTC)
I just realised I read the article wrongly. It stated that the formula needs to be applied to the "double-byte JIS sequence j_1 j_2", but I used the Kuten code. This explains the increase of 0x20 for both bytes. But it still doesn't explain 鯣. --Sannaj (talk) 20:06, 22 November 2012 (UTC)
I think your E9,BE shift JIS code is wrong.
A table for Kuten 82-40 matches your symbol.
My calculation says shift JIS for 82-40 (j1,j2)=(114,72) [decimal] is (s1,s2)=(E9,C6) [hex].
Codepage 932 converts E9,BE to U+9BB4 鮴
Codepage 932 converts E9,C6 to U+9BE3 鯣 symbol matches Kuten table.
Inverting the formula takes (s1,s2) (E9,BE) [hex] to (j1,j2) (114,64) [decimal] which goes to Kuten 82-32.
Kuten 82-32 matches U+9BB4.
The given formula is consistent with codepage 932.
The formula also makes sense for shifting around the kana.
Glrx (talk) 19:18, 3 January 2013 (UTC)
Oh, OK, that would explain my problems with the formula. --Sannaj (talk) 18:31, 5 January 2013 (UTC)
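
The calculations in this thread can be reproduced with a short sketch. It is only an illustration of the formula as discussed here, not a sourced implementation: the function names are made up, the Kuten-to-JIS step is the "add 32 to both bytes" observation from above, and the second-byte adjustment is written as floor(j2/96), which gives the same result as the conditional floor(j2/95) form for every j2 from 33 to 126.

```python
from typing import Tuple


def jis_to_sjis(j1: int, j2: int) -> Tuple[int, int]:
    """Map a double-byte JIS sequence (j1, j2) to Shift JIS bytes (s1, s2)."""
    if not (33 <= j1 <= 126 and 33 <= j2 <= 126):
        raise ValueError("j1 and j2 must both be in the range 33..126")
    # First byte: two JIS rows share one Shift JIS lead byte.
    s1 = (j1 + 1) // 2 + (112 if j1 <= 94 else 176)
    if j1 % 2 == 1:
        # Odd row: lower half of the trail-byte range, skipping 127.
        s2 = j2 + 31 + j2 // 96
    else:
        # Even row: upper half of the trail-byte range.
        s2 = j2 + 126
    return s1, s2


def kuten_to_sjis(ku: int, ten: int) -> Tuple[int, int]:
    """Convert a Kuten pair by first adding 32 (0x20) to both values."""
    return jis_to_sjis(ku + 32, ten + 32)


def sjis_to_jis(s1: int, s2: int) -> Tuple[int, int]:
    """Invert jis_to_sjis for byte pairs that the formula can produce."""
    base = 112 if s1 <= 159 else 176
    if s2 >= 159:
        # Trail bytes 159..252 come from an even j1.
        return (s1 - base) * 2, s2 - 126
    # Trail bytes 64..126 and 128..158 come from an odd j1.
    return (s1 - base) * 2 - 1, s2 - 31 - (1 if s2 >= 128 else 0)


for ku, ten in [(59, 16), (27, 1), (82, 40)]:
    s1, s2 = kuten_to_sjis(ku, ten)
    print(f"Kuten {ku}-{ten:02d} -> {s1:02X}, {s2:02X}")

# Going backwards from E9, BE gives the JIS pair of Kuten 82-32.
print(sjis_to_jis(0xE9, 0xBE))
```

With these definitions the three Kuten examples come out as 9E 4F, 8E 40 and E9 C6, and E9 BE maps back to (114, 64), i.e. Kuten 82-32, which matches the figures given by Glrx.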

Formula Error

I've looked at this for a while now, and I'm pretty sure that the formula for s_2 in the odd j_1 should have \lfloor j_2/96 \rfloor rather than \lfloor j_2/95 \rfloor. Isn't the purpose of that skip to avoid the non-printing DELETE character (code 127) in the second byte? As it stands it skips code 126 instead, which doesn't make sense. Uranographer (talk) 19:33, 1 February 2013 (UTC)

This is the second time the formula has been challenged in the past month. It's completely unsourced and we have no idea where it came from. I propose removing it entirely unless it can be reliably sourced. Without sourcing, it is original research. Regards, Orange Suede Sofa (talk) 20:42, 1 February 2013 (UTC)
I see your point, but it's really just a mathematical way of describing the Shift JIS encoding procedure, so in that sense it really isn't very extensive original research. (Although, I've seen people get papers published with about as much content!) I think it's probably okay to leave it. I won't argue, though, if you want to remove it. Uranographer (talk) 21:56, 1 February 2013 (UTC)
Looks like I introduced the error when I mis-simplified the previous contorted expression in the section above. Floor(j_2/95) was only used if j_2 ≥ 96. Since floor() will then be one, the simpler version is floor(j_2/96) without the conditional (or 1 if j_2 ≥ 96 and 0 otherwise). I thought the expression was just doing the same thing twice. Glrx (talk) 22:54, 1 February 2013 (UTC)
Heh, it's easy to do. I've been coding and writing this stuff up all day and I get those simplifications right about half the time--and that's if I'm lucky. 96.18.211.87 (talk) 00:12, 2 February 2013 (UTC)
(Guess I had logged out--that was me Uranographer (talk) 00:13, 2 February 2013 (UTC))
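
The three variants of the second-byte expression discussed in this section can be compared directly. The sketch below is illustrative only; the function names are made up, and it covers just the odd-j_1 case over the valid range 33 to 126 for j_2.

```python
def s2_conditional_95(j2: int) -> int:
    """Original form: floor(j2/95) is applied only when j2 >= 96."""
    return j2 + 31 + (j2 // 95 if j2 >= 96 else 0)


def s2_floor_96(j2: int) -> int:
    """Simplified form without a conditional."""
    return j2 + 31 + j2 // 96


def s2_floor_95(j2: int) -> int:
    """Mis-simplified form: floor(j2/95) without the conditional."""
    return j2 + 31 + j2 // 95


J2_RANGE = range(33, 127)

# Values of j2 where each unconditional form disagrees with the original.
mismatch_96 = [j2 for j2 in J2_RANGE if s2_floor_96(j2) != s2_conditional_95(j2)]
mismatch_95 = [j2 for j2 in J2_RANGE if s2_floor_95(j2) != s2_conditional_95(j2)]

print("floor(j2/96) differs at:", mismatch_96)
print("floor(j2/95) differs at:", mismatch_95)
print("j2 = 95 ->", s2_floor_96(95), "versus", s2_floor_95(95))
```

The floor(j2/96) form agrees with the conditional form everywhere, while the unconditional floor(j2/95) form differs only at j2 = 95, where it yields 127 (DELETE) and leaves 126 unused.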

Removing "UTF-8 is recommended"

On October 30 2011 user BIL added: https://en.wikipedia.org/w/index.php?title=Shift_JIS&diff=458095114&oldid=455096120 "The same thing is valid for UTF-8 which is a world standard, better supported by software, and is predicted to fully replace Shift-JIS and EUC-JP." On December 8 2010 user 131.107.0.81 added: http://en.wikipedia.org/w/index.php?title=Shift_JIS&diff=401316723&oldid=396679673 "... , conflicting with some code points. This is one reason why applications are recommended to use Unicode such as UTF-8 or UTF-16 instead." There was no explanation or citation. I added "By whom?" markers in May 2013. I'll be happy if someone cites some respected authority. Until then, these UTF-8 endorsements don't belong. I removed them. Peter Gulutzan (talk) 02:09, 21 October 2013 (UTC)

Questionable redirect

The JIS X 0213 article links to Shift JIS-2004 but Shift JIS-2004 redirects here. It is *not* the same encoding and no information about Shift JIS-2004 is present on this page. 58.173.133.147 (talk) 10:56, 20 May 2014 (UTC)