Wikipedia talk:Manual of Style/Japan-related articles/UTF-8 conversion

English Wikipedia is now UTF-8

This will necessitate some changes to these guidelines - for example, the stuff about converting characters to numeric entities is probably not needed anymore. The big issue I think is whether to use macrons in article titles now that we can. People are already moving articles, for example Shōnen, so it would be good to get a consensus quickly. Articles with Czech diacritics in the title are being moved en masse too. I'm in favor of this as long as there are macron-less redirects for people trying to type things in. What do you all think? DopefishJustin (・∀・) June 28, 2005 22:13 (UTC)

Typically, Wikipedia's English web pages are served using the ISO 8859-1 encoding, which does not support Japanese characters,
Shouldn’t this be corrected, instead of arguing over whether to use macrons? —Frungi 04:19, 13 July 2005 (UTC)
That whole section is largely irrelevant now. Kanji and kana are all presented as UTF-8 encoded characters now. I replaced that section with a bit on UTF-8; please have a look and check whether it is correct like that. JeroenHoek 14:21, 13 July 2005 (UTC)
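For anyone wondering what the old numeric-entity guidance amounted to in practice, here is a minimal sketch in Python. It is not taken from the MoS or from MediaWiki; the function name `to_numeric_entities` is made up for illustration. It replaces every non-ASCII character with a decimal numeric character reference, which is roughly what editors had to do by hand before the switch, and compares that with simply storing the text as UTF-8.

```python
import html


def to_numeric_entities(text: str) -> str:
    """Replace each non-ASCII character with a decimal numeric character reference."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


title = "Shōnen"

# Old approach: the macron vowel has to be written as an entity in the wikitext.
encoded = to_numeric_entities(title)
print(encoded)  # Sh&#333;nen

# The entity form decodes back to the same character when rendered.
print(html.unescape(encoded) == title)

# With UTF-8 the character is stored directly as bytes, so no entity is needed.
print(title.encode("utf-8"))
```

With UTF-8 the title can be typed or pasted as-is, which is why the entity-conversion advice looks obsolete.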
I favor the use of ō too. I remember occasional discussions about titles that can be confusing without a macron. (I can't remember a specific one, though.) One issue would be what we are gonna do with Japanese terms that have become English words by today: is it Shogun or Shōgun? If the term is common without a macron, then I suggest we use the common form, as Shōgun, for example, is quite odd-looking. Otherwise, I just can't think of a reason not to use macrons. Anyone? -- Taku June 29, 2005 00:41 (UTC)
I agree. The common English (macronless) spelling should be used if it's in common use, otherwise macrons should be used. If anyone suggests circumflexes, bonk them over the head with a copy of The Unicode Standard 4.0. Gwalla | Talk 29 June 2005 01:09 (UTC)
Certainly, no mācrons for words like "shogun" which are fully naturalized in English. Hokkaido, Honshu, Kyushu should all be native English with no macrons. Tokyo, Kyoto, Osaka likewise. And I am furthermore opposed to the use of macrons in article titles. The macron is not part of the English language. Rather, it is an indicator of how the word is pronounced in another language (Japanese). As such it belongs in the first line of the article, in the place where we indicate the pronunciation of the Japanese term. However it does not belong in the article title regardless of whether the word is fully assimilated into English or not. The article title should be in English, and that means without macrons. Fg2 June 29, 2005 07:44 (UTC)
Well, it appears article titles don't have to be in English. In Wikipedia, there are many titles for European people or places that contain non-English letters. In addition, as I said above, the macron can clear up some confusion that may be caused by the process of transliteration. For example, a name like Mori Yoshio can be either Mori もり(森) or Mōri もうり(毛利). As you can see, without macrons, the two become indistinguishable. Moreover, I remember a discussion on the name Bungoono, Oita; the name with a macron, Bungoōno, Oita, is clearly clearer, and such a debate would simply become moot. True, this is an English encyclopedia. But the bottom line is that article titles may be non-English, and for very good reasons, I think. (I guess then I would not be surprised if someone suggested using Japanese kana or kanji for titles, now that we are not subject to the limitation of the Latin alphabet. Seemingly, though, saying that Latin is OK but non-Latin is bad is a double standard once the technical issue is lifted.) -- Taku June 29, 2005 10:30 (UTC)
Hi Taku, I'm not saying that the Latin alphabet is good and others are bad! What I'm saying is that this is the English Wikipedia, and we should write it in English. We could probably now (or soon) rename the article ja:東京 with a Korean or Arabic name but that wouldn't be right (at least in my opinion) in the Japanese Wikipedia. My principle is simple: In the English Wikipedia, write in English; in the Japanese WP, write in Japanese. By the way, you know Japanese, and you know romanization, so things like Bungoōno make sense to you, but I believe most Americans (and probably most speakers of English) have no idea what the macron means. As a result, the macron is only useful to people who know (some) Japanese. Compare this to IPA, which Wikipedia uses to indicate pronunciation. It has no meaning to me, and I believe that very few people have any clue what the symbols mean. Wikipedia is supposed to make knowledge available to everyone. Adopting an obscure symbol set is a bad way to accomplish that goal. In my opinion, of course. Fg2 June 29, 2005 10:45 (UTC)
I hate to use this description, Fg2, but you sound a little too much like the one Wikipedian who keeps railing against "funny foreign squiggles" in the English-language Wikipedia. Instead of being an encyclopedia that is only for native English-language speakers, Wikipedia has become an international encyclopedia that is written in English. My personal opinion is that the macrons should be used unless a Japanese word or name has been naturalized into the English language (e.g. shogun). For example, almost all of the English-language literature on waka that I've read uses the macrons for Japanese words and names. Specifically, macrons should be used on all historical Japanese names and for all historical and technical terminology unless the word has become a loanword in English. We are an encyclopedia and we should strive to use the most accurate spelling. BlankVerse 29 June 2005 11:58 (UTC)
Please give credit where credit is due :-) I do not "rail"; my argument is "keep it simple". If some cases need a fancier solution, then use the simplest solution that fits, along the lines of Occam's razor. The best place to see a full(er/est) discussion on diacritic marks is Wikipedia talk:Naming conventions (use English). Philip Baird Shearer 10:53, 19 August 2005 (UTC)
Well, it's true that a lot of people wouldn't know what the macron means, but at the same time it wouldn't be hopelessly obscure - they can still see the letter underneath the macron, so they can just ignore the macron, it doesn't do any harm to them. Then, for people who do know a little Japanese, we're conveying additional information. Also, it's not just Japanese, other languages like Latin and Māori use the macron to indicate vowel length, so people familiar with those languages can probably figure out that the macron means the same thing in Japanese. DopefishJustin (・∀・) June 29, 2005 16:30 (UTC)

Let me give some quick comment. This seems to be a larger issue than I (or maybe we) thought. I've been reading the discussion at Talk:Gdansk, whether to rename it to Gdańsk. Judging by that, both mine and yours (Fg2's) are apparently not minor views. Now, I am not so sure about my position. At least, we should never start mass renaming at this point; we have to take time for consideration. (For christ's sake, why naming in wikipedia has to be so complicated!) -- Taku June 29, 2005 11:55 (UTC)

I lose most battles anyway! But I'll just agree that we should use the best spelling --- a spelling being a collection of the letters of the alphabet. We should use macrons when we are not indicating spelling but pronunciation. We don't name the article ja PAN (indicating pronunciation) but rather Japan (in letters) so we shouldn't name an article Ōita Prefecture (indicating pronunciation) but rather Oita Prefecture (indicating spelling using letters). Again, my opinion --- at least today! Fg2 June 29, 2005 12:19 (UTC)
I say it is more of a "spelling" issue than a pronunciation one (bad example follows; Tokyo is the established English spelling): entering "tokyo" (ときょ) in a Japanese IME will likely not result in the same thing as entering "toukyou" (東京). We probably need a page, linked to from some popular Japan-related place, explaining macrons. -- Philip Nilsson 29 June 2005 12:40 (UTC)
Macrons are not a matter of pronunciation. In rōmaji, macrons indicate spelling. "Tōkyō" isn't written for pronunciation's sake but for spelling's sake (which, of course, leads into correct pronunciation); likewise, "Yōko" is not "Yoko" with a pronunciation aid. — J44xm 16:23, July 13, 2005 (UTC)

I'm for macrons, and against the use of non-Latin scripts in titles. This is not a double standard. The technical barrier has been removed, but the language barrier has not. Most literate English-speakers can recognize a letter with a diacritic (although they might not understand the relevance of the diacritic), but can't tell 濁 from 薫. Also, many written works in English containing romanized Japanese include the macrons in running text. Gwalla | Talk 29 June 2005 20:39 (UTC)

If the consensus seems to be that using macrons is a good thing (with the exception of Tokyo, Osaka, etc.) then why not just write the title of an article with macrons (see Wāpuro rōmaji), provided that there is a redirect page with the article name without macrons (see Wapuro romaji)? That way anyone can find the article by looking for its name either with or without macrons. JeroenHoek 6 July 2005 08:54 (UTC)
And if there's not a consensus, what's wrong with keeping them written in letters? Furthermore, the move of wapuro to wāpuro is not supported by the MoS, which only specifies macrons for o and u. It should be at wapuro or waapuro but not at wāpuro. Fg2 July 6, 2005 09:31 (UTC)
Are ō or ū not letters? Using your reasoning, would you rather see Champs-Élysées at Champs-Elysees? (which, like Wapuro romaji, redirects to it?)
Yes. In English it is "General Napoleon Bonaparte" not "Général Napoléon Bonaparte". Philip Baird Shearer 10:53, 19 August 2005 (UTC)
The MoS states that Wikipedia uses Hepburn and that care should be taken for (XXX). It does not exclude the proper use of (revised) Hepburn:
In words of foreign origin, all long vowels are indicated by macrons. (Hepburn)
Wāpuro rōmaji is the correct way of writing ワープロ・ローマ字.
I agree that the user should not be bothered with typing those scary accent circumflexes, accent grave, accent acute, umlauts, cedillas and macrons, but they should be presented with the proper pronunciation of the item concerned, so why not use the proper title and redirects? JeroenHoek 6 July 2005 09:59 (UTC)
Thank you for your opinion about wāpuro being "correct" --- but please be aware that it is not Wikipedia policy, it is an opinion. Wikipedia MoS for Japan-related articles clearly limits its scope to o and u and does not mention a. Fg2 July 6, 2005 10:40 (UTC)
"Wikipedia uses the Hepburn romanization." (MoS)
This seems pretty clear to me. Assuming that Hepburn means "revised Hepburn", as the MoS implicitly states this by mentioning the usage of long vowels using macrons and the syllabic n (ruling out traditional and modified Hepburn) then it is not just my opinion. So, if the Wikipedia MoS is policy, then using revised Hepburn is policy. Thus, Wāpuro rōmaji is correct according to policy is it not? JeroenHoek 6 July 2005 11:08 (UTC)
The MoS does not exhaustively describe Hepburn, and "waapuro" is a bit silly IMO (not to mention the wāpuro-style wa-puro), so wāpuro is fine by me. We should explicitly mention this case in the rules though. The same should apply to ē and ī when written with ー (furī, sērā). DopefishJustin (・∀・) July 6, 2005 18:58 (UTC)
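The redirect idea raised in this thread (a macron title such as Wāpuro rōmaji with a macronless redirect such as Wapuro romaji) can be sketched roughly as follows. This is only an illustration in Python using the standard `unicodedata` module; `strip_macrons` and `redirect_pairs` are hypothetical helper names and not part of MediaWiki or any existing bot. The sketch removes only the combining macron (U+0304), so other diacritics such as the acute accents in Champs-Élysées are left alone.

```python
import unicodedata

COMBINING_MACRON = "\u0304"


def strip_macrons(title: str) -> str:
    """Return the title with macrons removed and all other diacritics kept."""
    # NFD splits a precomposed letter such as ō into o + combining macron.
    decomposed = unicodedata.normalize("NFD", title)
    without_macrons = decomposed.replace(COMBINING_MACRON, "")
    # NFC recomposes the remaining letters (é stays a single character).
    return unicodedata.normalize("NFC", without_macrons)


def redirect_pairs(titles):
    """Map each macronless spelling to the macron title it would redirect to."""
    pairs = {}
    for title in titles:
        plain = strip_macrons(title)
        if plain != title:  # titles without macrons need no redirect
            pairs[plain] = title
    return pairs


if __name__ == "__main__":
    for plain, target in redirect_pairs(["Wāpuro rōmaji", "Shōwa", "Tokyo"]).items():
        print(f"{plain} -> {target}")
```

The same mapping works in the other direction if the macronless spelling is kept as the article title and the macron form becomes the redirect; the sketch says nothing about which of the two should be chosen.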

Macrons are important, but not mentioning macrons in the article title doesn't mean it can't be mentioned in the article body. On the other hand, if we create redirects with macronless titles, the transition would almost be transparent. However, I see some downsides to going macron:

  • Even if we go macron, macronless titles will still be important, either as redirects or disambigs, because some people can't type macrons or they just don't know which vowels are long vowels. Also macronless spellings are already established and have legitimacy, even if titles with macrons are superior in some respects. This means:
    • It won't reduce the need for disambiguations
    • It will greatly increase the number of redirect pages
  • We would have to put a lot of work in renaming articles and fixing double redirects
  • If we go macron, there would be debate on which articles should go macron and which shouldn't. I'm pretty sure Tokyo would fall under the latter. What about place names, people's names, work titles, etc.? There may also be special individual cases. We will need countless mini macrons-vs-non-macrons debates. The task of figuring out what The Right Title is for an article will become more complex and murky. Doing so is difficult already.

That's a lot of trouble to go through for something that can be put in the article body, and we're already gonna be adding kanji/kana in the article body anyway. So in short, I am opposed to going macron. —Tokek 16:55, 31 July 2005 (UTC)

This isn't a unique situation. I agree that people should be able to type "Showa" and get the "Shōwa" disambiguation page, redirects are good for that. A similar situation already exists for other articles that have accents and such in their name; as mentioned above, Champs-Elysees redirects to the article Champs-Élysées, as you would expect. So why not use the proper article titles? It seems to be the convention for a lot of other areas of Wikipedia as well.
The discussion about which well-known names should not have macrons (such as Tokyo) will be held regardless of article titles. I don't think that it makes a good argument against using the full range of characters Unicode offers us.
I agree that it is a lot of work, but it can be done gradually. This bit is important I think: "On the other hand, if we create redirects with macronless titles, the transition would almost be transparent." I think slowly moving pages to their proper article names is beneficial to Wikipedia in the long run. It shouldn't get in the way of the user, and I don't think it does. JeroenHoek 17:26, 31 July 2005 (UTC)
What you are suggesting is native spelling. Champs-Élysées is a reasonable title on the Latin-alphabet-based English Wikipedia, but 東京 wouldn't be, because that would be too complex. On the Japanese website, シャンゼリゼ通り would be an appropriate title (which, btw, doesn't exist yet). —Tokek 20:06, 5 August 2005 (UTC)
I meant to illustrate that it is preferable to use a transliteration instead of a title consisting only of [a-z] and [A-Z]. (I should've used a better example since French isn't transliterated, but the argument stands that accents, macrons and such aren't an objection as long as redirects are in place from the same title without the accents, macrons and such) JeroenHoek 20:27, 5 August 2005 (UTC)
I don't think it does increase the number of redirects. If the correct Hepburn romanization of a word is e.g. "Shōwa", then entering "Shōwa" ought to get you to the right page just as much as "Showa", regardless of which is actually the article title. I.e., the alternative to macronless redirects to macroned articles is macroned redirects to macronless articles. Butsuri 01:37, August 3, 2005 (UTC)
The proper article names are without macrons, so we should move Rōmaji to Romaji (for example), which is a firmly established English spelling of the Japanese word Rōmaji. The English word is romaji (no macron). The Japanese word is rōmaji (with macron), and if there were a Wikipedia in Japanese with romaji as the writing system, it would be appropriate for the article title there. It is natural and correct for the English to be different from the Japanese. It is also natural and correct for the English to be written with the letters a–z. The macron is not part of English spelling. It is a pronunciation indicator, not spelling. We should no more write romaji as rōmaji than we should write Rome as Rōme. (And, by the way, the name of that city is "Rome," not "Roma," in English, further illustrating that it is completely natural and normal for English to use different letters, spellings and pronunciations than do other languages.)
Macrons are an objection. There is a difference between macrons and accents: macrons are not part of the native Japanese way of writing Japanese (nor are they part of any native system of writing English), whereas accents are part of the native way of writing French. So even if some case might be made for using accents for names of articles on French-language subjects, the argument has no bearing on Japanese.
Next, the matter of redirects. This is the English Wikipedia, so the readers who do not know Japanese vastly outnumber those who do, and furthermore, those who do know Japanese know various ways to romanize it. No reader who does not know Japanese will ever enter Shōwa into the search box. They won't even know how to enter a vowel with a macron. And, if a reader who does know Japanese enters it, clicks on Go, and does not get to the desired article, he or she will try again, this time entering Showa. So, it is not necessary to create redirects with macrons in the titles to get readers to articles with no macrons in their titles. Fg2 00:35, August 6, 2005 (UTC)
Those redirects will be created regardless. It does not make sense for an encyclopedia that has the technical means to do so to not produce the Shōwa disambiguation page when requesting "Shōwa". Hepburn is the de facto standard in academic literature on Japanese subjects; not creating a redirect would be silly and have a negative impact on the credibility of Wikipedia (oh look, "Shōwa" doesn't exist? should we type shouwa? or showa? or - heavens forbid - syowa?).
If a language is readable by monolingual English speaking visitors, as is the case with French, then there is no need for "dumbing down" the title to use just [a-zA-Z]. People can just ignore the accents, cedillas and circumflexes. With a hard to read language like Japanese, the most logical fallback is an established transliteration. On Wikipedia, Modified Hepburn was chosen. Keep in mind that for the average monolingual English speaking visitor, there is no significant difference between ó ô õ ō or ö; we should be consistent about that. People who really can't deal with strange foreign looking elements in their encyclopedia do have an alternative.
Consistency is also one of the reasons I think that we should discontinue the non-standard practice of "stripping macrons from a word" for use in titles.
Since the majority of the readers are unfamiliar with Japanese, this encyclopedia offers them a great place to learn more about this land and its culture. In the articles, Hepburn is used to indicate the pronunciation of the word, and it will only confuse people to see "Sonnō jōi" in the body of an article titled "Sonno joi". You refer to "Romaji" as an example, but that has nothing to do with the discussion. We have already established that words such as daimyo and shogun don't need macrons because they are part of the English language. JeroenHoek 11:05, 6 August 2005 (UTC)
Within the Hepburn system of romanisation of Japanese, macrons are not any kind of ancillary "pronunciation indicator" as they are for example in certain English dictionaries. They are every bit as integral to this system as accents are to French spelling. This is the system which has been adopted for use in Wikipedia, and it's the most common system in academic writing about Japanese (in English). It's already standard to use Hepburn, including macrons, for Japanese words within Wikipedia articles. What is under discussion here is whether an exception to this rule should continue to be made in the case of article titles, now that it is no longer technically necessary. All the arguments I have seen against macrons in titles either (a) can be circumvented through use of redirects or (b) apply equally to macrons in body text, and have no special bearing on the issue of titles.
Whichever decision is made on this matter, it would be perverse to adopt a policy such that Japanese words correctly spelled in the standard romanisation do not reach the appropriate articles. That this would be only a minor inconvenience for most users is not an argument for such a policy. The proper thing to do is clearly to have entries for both Showa and Shōwa, with one being a redirect to the other - the only question is which is to be which. Butsuri 02:20, August 7, 2005 (UTC)
There are several ways to correctly romanize Japanese words, and most of them don't have a redirect to the correct article. What can be considered established transliteration methods include Kunrei-shiki, Nihon-shiki, Hepburn without macrons, and Hepburn with macrons. If we add an exception to the rule of sticking with one romanization scheme by allowing titles with macrons in them, we will then be using two romanization schemes for titles instead of the previous one established romanization scheme. This increases unpleasant complexity. Not having macrons in titles does not confuse or take away information from the reader about the Japanese language, or reduce the credibility of Wikipedia. Nobody is arguing against adding macrons in the body of the article. See also: KISS principle. Tokek 03:17, 7 August 2005 (UTC)
"Hepburn without macrons" was a method deployed to use Hepburn in article titles, within the limits of the technical limitations. Now that those limitations are gone, articles and redirects with titles using the greatly extended character set are starting to appear all over Wikipedia. If I type in α as a title, it redirects to Alpha (letter); when I type in ¥ (which I might have copy/pasted from an article I was reading on some website), then naturally I get the Yen article.
Similarly, when someone requests "Fukoku kyōhei" (perhaps copy/pasted from an academic paper in PDF format?), shouldn't he get the correct article? Since the technical limitation to do so is gone, we should reach consensus on which method to choose; that is, should the modified Hepburn title redirect to the non-standard "stripped Hepburn" location of the article, or is it desirable to move a page to its correct location? (provided the old link is a proper redirect)
One advantage for editors is this: when creating interwiki links to articles with macrons, it will no longer be necessary to strip the macron in the link. I agree that we should adhere to the KISS principle, but doesn't that mean that we should use modified Hepburn in both the article body and its title to be consistent and to avoid confusion? JeroenHoek 10:45, 7 August 2005 (UTC)
"'Hepburn without macrons' was a method deployed to use Hepburn in article titles, within the limits of the technical limitations." There are more reasons than technical limitations to use Hepburn without macrons, as already mentioned. Do you believe that some Japanese article titles are better without macrons, or are you suggesting that all Japanese titles should have macrons to indicate long vowels? —Tokek 20:11, 7 August 2005 (UTC)
I believe the usage of Hepburn in titles should match the usage of Hepburn in the article body (my apologies if my wording was confusing). In short: "Daimyo" (English word) and "Fukoku kyōhei" (transliterated Japanese), for example, would be proper titles. Naturally, in this example "Fukoku kyohei" redirects to "Fukoku kyōhei". JeroenHoek 20:26, 7 August 2005 (UTC)
I want to underscore the point that JeroenHoek just made: even if the community decides that articles having titles containing Japanese words with macrons will be moved to new titles (to which I've already expressed my opposition), English words will remain without macrons, since this is the English Wikipedia. Thus, Tokyo, Osaka, Kyoto, Hokkaido, Honshu, Kyushu, Romaji, Daimyo (to give a few examples) will remain the correct titles with no macrons; they are not accommodations to the former limitation on titles but fully nativized English words. (There's no harm in having redirects to them.) The proposal to rename articles applies to terms that have not become English. As before, we should continue to consult reliable references such as the major dictionaries of English-speaking countries. My short list of examples is not exhaustive; cities, geographic entities, general terms etc. not on the list should not automatically be moved but like other words should be subjected to the same test by looking them up in appropriate references. Fg2 20:52, August 7, 2005 (UTC)
Of course, but I don't think anyone is disputing that? In short, to me it seems the above discussion is about which of two situations we prefer:
  • Titles remain stripped Hepburn, modified Hepburn titles are redirects.
  • Titles are modified Hepburn, stripped Hepburn titles are redirects.
Modified Hepburn is the transliteration system of choice for Wikipedia, nobody (presently participating in the discussion) proposes we change that. JeroenHoek 22:41, 7 August 2005 (UTC)

Renaming titles that use macrons, if done, would be a very small minority of cases, since most articles are macronless already. I agree that geographical article titles don't deserve macrons as there is already universal agreement that articles such as Tokyo shouldn't have them. Article titles for Japanese persons shouldn't use macrons either since formal documents such as US or Japanese passports do not romanize a person's name with macrons. The argument shouldn't be about which romanization is more accurate or superior but which is more appropriate for article titles. For example, "Bill Clinton" is a more appropriate article title, even though his name can also be spelled as "William Jefferson Clinton" which is more formal and exact. —Tokek 10:21, 11 August 2005 (UTC)

The longer this issue goes undecided, the more divided the page space becomes as confused authors produce both macroned and macronless (title and content) pages, not knowing which standard to conform to. If anyone is still paying attention, I propose we organize the objections for both and try to move on some sort of compromise in the near future. freshgavin TALK 06:10, 28 November 2005 (UTC)

I favor using macrons in the titles and creating redirect pages out of the macronless versions of the titles. --nihon 17:30, 6 December 2005 (UTC)

I don't like the macrons very much. I would like them better if someone would explain how to pronounce them. I hate not knowing how to pronounce things. Could someone please transliterate the ka (カ), ki (キ), ku (ク), ke (ケ), and ko (コ) into macron for me? Also, why is リチャード transliterated into Richādo in the list of FFII characters article? I always thought the straight line over the "a" indicated a long "a" like the "a" in "take." Jecowa 22:00, 1 June 2006 (UTC)
Macrons are not used on ka, ki, ku, ke, and ko. They are only used on vowel sounds which are held for twice as long as a normal vowel sound in Japanese. You can see more by reading the Help:Japanese page and the Manual of Style page for Japanese on Wikipedia. ···日本穣? · Talk to Nihonjoe 23:30, 1 June 2006 (UTC)
Thank you for the explanation. I like that macron shortens the double vowels. I also like that it doesn't change the vowels. It was confusing for me because I was taught that a long vowel had a different sound than a short vowel, like the "a" in "fat" and the "a" in "fate," the first being a short vowel and the latter being a long vowel. Is this what I get for going to a public school, i.e. was I taught incorrectly? The use of macrons is kind of confusing for anyone unfortunate enough to think that a long vowel is pronounced differently from a short vowel. If we do end up using macrons, it won't confuse me. Thanks! Jecowa 01:42, 2 June 2006 (UTC)
What I said above applies only to Japanese, not to English. Long vowels in English are as you described them, but long vowels in Japanese are not like that. They are said the same, only for twice as long. ···日本穣? · Talk to Nihonjoe 03:55, 2 June 2006 (UTC)


Has consensus been reached? I see an old mediation request regarding this matter. +sj + 20:41, 28 February 2006 (UTC)

Mediation apparently takes forever to get assigned, and then even longer to be looked at. The only person who seems to have concerns regarding the consensus reached is freshgavin. Everyone else seems to agree that a consensus has been reached. So, we're waiting for the mediation to be assigned so we can all move on with the various projects. --日本穣 00:03, 1 March 2006 (UTC)