
Wikipedia talk:Usage of diacritics


This is an old revision of this page, as edited by Aradic-en (talk | contribs) at 11:23, 5 July 2008 (→‎Treating these as four (four and a half?) problems). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

(copying from WP:Use diacritics, which was a previous similar proposal that failed to gain consensus)
See also Wikipedia:Naming conventions (standard letters with diacritics), another proposal that failed to gain consensus

Proposal featuring the differentiation of diacritics and extensions

Summary: A proposal that advocates usage of diacritics in many cases but that differentiates letters with diacritics from "extensions". It calls for relatively wide usage of diacritics but allows for exceptions that cover a number of cases where their use is controversial.

Editorial comments: I am a strict proponent of adhering to WP:UE at Wikipedia, as my record can attest. However, unless a placename is a true exonym or a person's name is truly "English" (e.g., that person is notable primarily in an English-speaking country or is naturalised in such a country), most place or personal names cannot be said to have an English form at all. Many "English verifiable reliable sources", especially web sources but also many older print sources, are or were limited by technological considerations from using diacritics, but these limitations do not exist at Wikipedia. (In some cases, the comparative method can be used to determine whether diacritics are dropped for technical or convenience reasons, but even this is not fool-proof. For example, the Economist style guidelines mentioned above are the worst of geo-bias: they deem Western European languages worthy of carrying diacritics but not others [e.g., Gerhard Schröder but Abdullah Gul].)

Therefore, since an encyclopedia is a reference work of higher calibre than a wire service news story and should aim to be marginally more "highbrow" (for lack of a better word), Wikipedia should, to a degree, reflect the underlying native names of persons and places when these are variations of the Latin alphabet.

The current situation works surprisingly well, but there are inconsistencies. Vietnam's Bac Kan Province, where there are few English speakers, has no diacritics (like the articles on many other Vietnamese places), whereas Pūpūkea, Hawai'i, like many placenames in Hawaii (where WP:UE has been vetoed), carries diacritics even though they are rarely if ever used in English.

The proposal below attempts to standardise diacritic usage at Wikipedia while acknowledging some of the problems that can occur. It assumes regular diacritics do no "harm" to the unfamiliar reader (the name can be read by ignoring them), whereas "extensions" render a term unpronounceable to the unfamiliar reader. AjaxSmack 00:21, 18 June 2008 (UTC)

  • Does [[Ngo

Proposal

For the placename or person that is well known in the English-speaking world, i.e. is widely mentioned in English-language sources:

  1. When a person or place has a name in the Latin alphabet including letters with diacritics (or some ligatures), e.g., Å, Œ, Ř, Ŵ, names should be spelt with them (e.g., Ngô Đình Diệm).
  2. When a name includes Latin "extensions" (and other more obscure ligatures), e.g., Ŋ, ß, Ʌ, Þ, the name should be spelt with the normal Latin substitute for these extensions (e.g., Abülfaz Elçibay, not Əbülfəz Elçibəy).
  3. Certain letters with unusual circumstances follow national conventions. For example, Đđ (D with stroke) is rendered "Dj" in South Slavic contexts following usual English conventions but is rendered "Đ" in Vietnamese contexts. Ðð (eth) is rendered as "Dh" in Icelandic and Faroese contexts due to the complication of the lowercase form, "ð".
(This results in Meissen, but Göttingen and Tudjman but Dvořák.)

For the placename or person that is not well known in the English-speaking world, i.e. is not widely mentioned in English-language sources, the preferred style on Wikipedia is to use diacritics, as this provides maximum information to the reader. This includes article titles; alternatives without diacritics should be set up as redirects.
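The three rules and the redirect suggestion above can be sketched in code. This is only an illustration under stated assumptions, not part of the proposal: the substitution table contains just the substitutes implied by the proposal's own examples (ß to "ss", Ə to "A"), the context labels `south_slavic`, `icelandic` and `vietnamese` are hypothetical names, and "letter with a diacritic" is approximated as "character that Unicode canonically decomposes into a base letter plus combining marks". That approximation does not match the proposal exactly; for instance Œ, which rule 1 lists as an acceptable ligature, has no such decomposition.

```python
import unicodedata

# Rule 2: "extensions" and their plain-Latin substitutes.
# Assumption: only the substitutes implied by the proposal's examples are listed.
EXTENSION_SUBSTITUTES = {"ß": "ss", "Ə": "A", "ə": "a"}

# Rule 3: letters whose rendering depends on context.
# Assumption: the context labels are hypothetical names for this sketch.
CONTEXT_RULES = {
    "south_slavic": {"Đ": "Dj", "đ": "dj"},
    "icelandic": {"Ð": "Dh", "ð": "dh"},
    "vietnamese": {},  # Đ/đ are kept unchanged
}


def has_diacritic(ch):
    """True if ch decomposes into a base character plus combining marks."""
    decomposed = unicodedata.normalize("NFD", ch)
    return len(decomposed) > 1 and any(
        unicodedata.combining(c) for c in decomposed[1:]
    )


def render_name(name, context=None):
    """Apply rules 1-3: keep diacritics, substitute extensions."""
    overrides = CONTEXT_RULES.get(context, {})
    out = []
    for ch in name:
        if ch in overrides:
            out.append(overrides[ch])              # rule 3
        elif ch in EXTENSION_SUBSTITUTES:
            out.append(EXTENSION_SUBSTITUTES[ch])  # rule 2
        else:
            out.append(ch)                         # rule 1: left as written
    return "".join(out)


def strip_diacritics(name):
    """Candidate redirect title: drop combining marks, keep base letters."""
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


if __name__ == "__main__":
    print(render_name("Meißen"))
    print(render_name("Tuđman", context="south_slavic"))
    print(render_name("Ngô Đình Diệm", context="vietnamese"))
    print(strip_diacritics("Dvořák"))
```

Under these assumptions the sketch yields Meissen, Göttingen, Tudjman and Dvořák as in the proposal's summary line. Note that `strip_diacritics` leaves Đ untouched, because Unicode treats it as a separate letter rather than D plus a combining mark, so a fully plain redirect title for a Vietnamese name would need an extra substitution.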

AjaxSmack 00:21, 18 June 2008 (UTC)

Discussion

I would be interested in comments on the above proposal and in hearing from others on how one would differentiate "well known in the English-speaking world" from not well known. I don't consider Google hits from computer-generated weather sites and such to be evidence of usage in English. Even mere mentions in traditional printed texts don't hack it. I prefer "critical commentary" (to borrow a phrase from WP:FAIR) on a subject before it can be considered to be well-known. AjaxSmack 00:21, 18 June 2008 (UTC)

Your proposal is exceedingly complex, so complex as to be unworkable. Aside from that insurmountable problem, I think you need a new project page for this proposal. This project apparently has been closed because no consensus in favor of it emerged. Tennis expert (talk) 02:51, 18 June 2008 (UTC)
Compared with most Wikipedia policy, this proposal doesn't seem particularly complex at all. Moreover it seems to be a) quite easy to follow in practice, and b) quite in line with what already happens.--Kotniski (talk) 06:34, 18 June 2008 (UTC)
Why isn't this just the reopening of the closed discussion where it was apparent that the suggested policy was nowhere near gaining consensus? Haven't we spent enough time on all the various permutations of the suggested policy already? The previous proposal was to change existing Wikipedia policy. Where there is no consensus to change a policy, the policy stays as it is. You (or others) can repeatedly open new project pages every day to achieve what could not be achieved the previous day. But what's to be gained by that? Aren't you just trying the community's patience with these repetitious debates? "Give it a rest" is my recommendation. Tennis expert (talk) 07:03, 18 June 2008 (UTC)
By the way, I oppose this proposal, for the reasons that innumerable others provided during the previous discussion. Tennis expert (talk) 07:06, 18 June 2008 (UTC)
Strange that you should suggest creating a new project page in one comment and then attack the idea in the next. Anyway, consensus has not been reached, so we have to keep trying. Please be constructive in doing so. This is a much better-thought-out proposal than the previous one (mine), so may well be a step on the road to such consensus.--Kotniski (talk) 07:27, 18 June 2008 (UTC)
I suggested nothing of the kind. I was simply saying that the new proposal could not go on the previous project page because it had been closed due to a clear failure to reach consensus. For some reason, you seem to believe that it is OK to keep hammering away to try to obtain consensus for a policy change that the community just rejected the day before. Perhaps you're trying to rely on the "silence equals agreement" principle and hope that the opposition will become silent (and, hence, agreeable) just because of fatigue. Anyway, I personally believe that reopening this debate one day after it was closed is abusive, regardless of whether you have the right to do so. Perhaps my opinion will turn out to be the consensus here. Tennis expert (talk) 07:43, 18 June 2008 (UTC)
You seem to be taking a similar position, making "failure to achieve consensus" equal to rejection. I believe this proposal is much better than the previous one and, though it may not get consensus exactly as it stands, is a valid subject for continued discussion. If you don't want to join in, you don't have to. --Kotniski (talk) 07:55, 18 June 2008 (UTC)
When a proposal is to change an existing policy and when a consensus is needed to adopt the proposal (and hence change the policy), then the proposal fails if the consensus does not exist. That is a rejection of the proposal. Clearly. Tennis expert (talk) 08:02, 18 June 2008 (UTC)
I think the previous proposal was closed too soon. The discussion had just started. And this proposal is not to change existing policy. It is about making it clear for everyone.--Irić Igor -- Ирић Игор -- K♥S (talk) 08:39, 18 June 2008 (UTC)

We need to have a policy page about this. Either we use diacritics or we don't, but either way, there needs to be a working guideline. Generally I support using diacritics as titles as long as there are redirects, thus the first president of South Vietnam should have all those funny diacritics in the title, and the non-diacritic name will be a redirect, and similarly Novak Djokovic should redirect, etc. Yechiel (Shalom) 19:38, 18 June 2008 (UTC)

We have a working guideline: Use diacritics in those words in which English generally uses them. Septentrionalis PMAnderson 20:30, 19 June 2008 (UTC)

This proposal is the closest yet to what I'd consider a good solution for the diacritics. It is closest not only (and not necessarily) in end results, but also in principle, i.e. in the way these end results are obtained. I don't understand objections that it is "exceedingly complex": for one thing, determining the compliant spelling for any given name is made possible almost off the top of one's head - unlike the current policy, where research is required, with somewhat unpredictable results.

The formulation "not widely mentioned in English-language sources" is indeed open to interpretation. This is a problem with WP:UE too. To give a tennis example again: a relatively obscure pro tennis player whose name is originally rendered with diacritics could, per policy, retain these diacritics in Wikipedia. However, there's www.atptennis.com, and if the player has played at the ATP Tour, he will be listed there, no matter how obscure. Their data has no diacritics at all, so this would effectively stonewall diacritics from most (all?) of Wikipedia's tennis biographies, because it could always be argued that atptennis.com is an authoritative English source. (Still "authoritative" ≠ "widely mentioned"!) At the same time, a person with the exact same name who is not a tennis player might get to keep the diacritics. I'd say this is not desirable. GregorB (talk) 10:13, 19 June 2008 (UTC)

Not only the ATP website. There's also the New York Times, the Sydney Morning Herald, the New Yorker, the London Daily Telegraph, the International Tennis Federation, the French Open, the Australian Open, Wimbledon, the US Open, the Times of India, the Gulf News (Bahrain) and many more that do not use diacritics concerning tennis players. The formulation of this new policy would require the ignoring of well established English-language usage in favor of an artificial, "well, if his name has diacritics in the native language, then Wikipedia must have them, too, even if all the English-language sources say not to use them." That's the major problem with the new policy. Unencyclopedic WP:OR running amok. Tennis expert (talk) 13:34, 19 June 2008 (UTC)
I have modified the proposal so that it does not contradict WP:UE. It is still rough and needs more work, but ultimately I think this guideline is not needed, as the use of Ŋ, ß, Ʌ, Þ can, if need be, be added as a short paragraph to Wikipedia:Naming conventions (use English)#No established usage --Philip Baird Shearer (talk)
I think it is important to note here that the same end result (e.g. "Tudjman" instead of "Tuđman") can be achieved through ostensibly very different guidelines. It would be interesting to produce some examples where this proposal yields different solutions than those of current WP:UE. To Tennis expert: I have explained already why WP:OR does not apply to this discussion - WP:UE itself isn't sourced (nor should it be), and that does not make it original research. Furthermore, an encyclopedia does not and should not copy every aspect of English usage: for example, Wikipedia does not emulate newspaper/magazine article tone and style regardless of how widespread it is in English publications. GregorB (talk) 15:41, 19 June 2008 (UTC)
Ngô Đình Diệm, most obviously. Widely discussed in English, and there is real problem finding an English source which ever uses that form. Septentrionalis PMAnderson 20:30, 19 June 2008 (UTC)
After reading all the pages and all the convoluted discussion going around, I think this proposal looks like a good step in the right direction. Although I think we should use diacritics always, in the sense that we should write people's names the same way they do using a Latin alphabet (think the names on football jerseys), I do think this is a good compromise.
To counter the argument used by Tennis expert (talk · contribs) about the news sites: I come from a country with diacritics (Brazil) where our keyboards have the ability to type them correctly, but even then, there are programs that do not accept them, and there are sites where people don't use them because of encoding problems with browsers. But neither of those should be an excuse not to use diacritics when such restrictions do not occur. And if the argument rests on WP:UE, I think it should be amended for this case. Samuel Sol (talk) 15:50, 19 June 2008 (UTC)
Why do you think that we should not use reliable English language sources to decide content? It seems to me from what you have written that you wish to ignore WP:V and WP:NC. Is that true? --Philip Baird Shearer (talk) 18:33, 19 June 2008 (UTC)
Well, while we're citing Wikipedia guidelines: WP:AGF. Indeed, I don't see anyone saying Wikipedia guidelines should be ignored; rather, they should be changed. They are not set in stone, I believe. Also: judging by the current de facto situation with diacritics, I'd say that ignoring the guidelines (mind you, by pro- and anti-diacritics folks alike) has worked quite well... I'm joking, of course, but there's quite a bit of truth in it. GregorB (talk) 20:19, 19 June 2008 (UTC)
WP:V and WP:NC are Wikipedia policies, not guidelines. Is it proposed that this is a policy or a guideline? If it is a guideline then surely to implement it as it is one would have to ignore those Wikipedia policies. --Philip Baird Shearer (talk) 21:36, 19 June 2008 (UTC)
Talk about a gratuitous attack, Philip. No, I'm not saying that we should ignore WP:V, far from it. I said that we should probably change WP:UE to account for diacritics. Simply because if the name uses them in the Latin spelling of the language, we DO have a verifiable source. Simple as that. Samuel Sol (talk) 16:25, 2 July 2008 (UTC)

Strong oppose—This is unnecessary creep. This proposal would allow "ǚ" but ban "þ" ... as if thorn were the more "foreign" of the two. WP editors can be trusted to be big and bold enough to know how to spell. JIMp talk·cont 00:22, 20 June 2008 (UTC)

As an encyclopedia, I believe we need to be as precise as possible. This is similar to the argument for using logical quotation marks. Granted, the dividing line for accepted English usage is arbitrary, but even though it would be silly to move the article on China to Zhōngguó, I feel that where names are inaccurately transcribed merely due to typographic problems, as in our tennis examples, we owe it to our readers to use the original form. I just don't like the added level of arbitrariness introduced by segregating letters into acceptable and unfamiliar categories, and then getting into endless debates over what to do with the latter. kwami (talk) 09:46, 20 June 2008 (UTC)

Extensions

I'm still not okay with substituting "ss" for "ß" every time. If there's no exonym, we can't create one by simply dropping the extended Latin letters. —Nightstallion 19:59, 18 June 2008 (UTC)

We should not do it every time; we should only do it if the English language sources on the subject do. --Philip Baird Shearer (talk) 13:35, 19 June 2008 (UTC)
That is (just) your opinion, with which I disagree. We should the validity of information- that are not necessary in English --Anto (talk) 19:37, 19 June 2008 (UTC)
See WP:V and WP:NC. It is not just my opinion it is Wikipedia policy. --Philip Baird Shearer (talk) 21:43, 19 June 2008 (UTC)

What is an extension, other than a table on an unsourced Wikipedia article? What is the difference between Əbülfəz Elçibəy and Ngô Đình Diệm that we should applaud one and deprecate the other? (And, as a metapoint, we will have exactly this argument if we try to apply this proposal and any Azeri cares; this is not an end to division; it's the beginning.) Septentrionalis PMAnderson 20:26, 19 June 2008 (UTC)

I also dislike (3). The distinction between a recognizable diacritic/letter and an unfamiliar diacritic/letter is completely arbitrary. I'm happy with "Peking" as the capital of China, because it is (or was) well established. But I don't feel we should create some sort of arbitrary hybrid for e.g. Azeri names: The choice should be an anglicized form or the standard orthography, not a bastard of the two. Just look at the crazy "rules" we have on substituting for Đ! This will lend itself beautifully to endless conflict, draining time that we could be spending on more useful endeavors. kwami (talk) 09:33, 20 June 2008 (UTC)

Weak Oppose I guess my overall question is "why have a special policy for diacritics?" There's a general and consistent rule: Wikipedia is the mirror, not the lamp. We just follow what other sources say. When the world realizes that it's being silly and "does it right", then Wikipedia will follow. We could have a separate rule for each condition, but that's just creepy. Somedumbyankee (talk) 17:56, 20 June 2008 (UTC)

Ngo Dinh Diem

I'm sorry GregorB does not see that "use what most other English speakers do" is a working guideline; but it is, and it is the only guidance either necessary or compatible with our fundamental principle for naming articles. This proposal, in fact, encourages Original Research; who calls Diem Ngô Đình Diệm? What source do you have that this mass of squiggles even consists of the right squiggles? Septentrionalis PMAnderson 20:11, 19 June 2008 (UTC)

Nor does National Geographic, for that matter. They are also quite blunt about it: "Although Vietnamese is written in the Latin alphabet, the number of accent marks can be distracting and may therefore be omitted."[1] Personally, I don't mind the diacritics in Vietnamese, but I could concede NG has something of a point. That's why I said that I support the proposal in principle, not necessarily in implementation details. We might still do what NG is doing for Vietnamese; in this particular case I think English sources would agree fully. GregorB (talk) 20:31, 19 June 2008 (UTC)
Also I'd like to point out that "use what most other English speakers do" is not the absolute principle for naming articles, otherwise we wouldn't have titles such as Elizabeth II of the United Kingdom or the like. Newspapers don't call her that. GregorB (talk) 20:47, 19 June 2008 (UTC)
And that exception, which was resolved on long ago because most monarchs don't have unambiguous most common names (the common name is Henry IV, but which of the dozen claimants gets the article?), is more disputed than diacritics themselves; it only remains unaltered because there is no consensus what to change to. WP:NCNT does require that the monarch's name and the name of his country be common English usage. Septentrionalis PMAnderson 20:56, 19 June 2008 (UTC)
@PMAnderson: Ah, yes. Neither Playboy nor Cosmopolitan uses diacritics! You are definitely right! </irony> --Anto (talk) 15:25, 25 June 2008 (UTC)

Zuerich and Goering

I have edited this proposal so that it fits in with WP:UE: if there is a common English usage in reliable sources for a place or a person, we should use it. We do not want a proposal like this being used to suggest that Zuerich is correct, but equally we do not want a proposal that says we must not have Goering. Common English usage takes care of this. --Philip Baird Shearer (talk) 13:33, 19 June 2008 (UTC)

Your edit would change the meaning of the proposal quite radically. Nothing wrong with putting up an alternative proposal underneath or somewhere else, but so that we know what we're discussing, the main proposal ought to be basically stable (like if someone writes that they support the proposal, we don't want confusion as to what version they're looking at when they support it).--Kotniski (talk) 16:11, 19 June 2008 (UTC)
And I don't see how the present proposal would lead to Zuerich. To Zürich, possibly, which is what we have now anyway last time I looked.--Kotniski (talk) 16:13, 19 June 2008 (UTC)
That article begins, surely correctly, by saying: "Zürich (German: Zürich [ˈtsyːʁɪç], Zürich German: Züri [ˈtsyɾi], French: Zurich [zyʁik], Italian: Zurigo [dzu'ɾiːgo]; in English generally Zurich [zjuːɹɪk]) is ...." - so we should just title it Zurich. Johnbod (talk) 11:59, 20 June 2008 (UTC)

I am confused: are you saying that we should not use reliable English language sources to decide the content of our articles? --Philip Baird Shearer (talk) 18:30, 19 June 2008 (UTC)

I'm not saying that, but I don't believe "using" them has to mean unthinking imitation of them. Reliable English sources collectively show us that certain styles of writing (treatment of diacritics in this case) are acceptable in English. Out of those styles, we should prefer to use those which are more suitable for an encyclopedia (which people come to in order to be informed). And we should also prefer to be consistent, particularly where failure to do so is likely to mislead. So we may (and indeed already do) adopt naming conventions and stylistic rules which, while consistent with good English usage, do not necessarily entail reflection of the majority of reliable sources in every single case. This is entirely distinct from questions of fact - when a source uses a particular spelling, it is not stating as a fact that "this is the only correct spelling in English"; it merely implies that "this is one acceptable way of writing it in English, the one most in accordance with our [i.e. their] adopted style and technical limitations". Our (i.e. WP's) style and limitations may well differ. --Kotniski (talk) 07:24, 20 June 2008 (UTC)
I would direct you to the introduction to WP:V: "The threshold for inclusion in Wikipedia is verifiability, not truth—that is, whether readers are able to check that material added to Wikipedia has already been published by a reliable source, not whether we think it is true." This suggested guideline should start with a definitive statement that makes it clear that WP:V, WP:NC and WP:UE are being adhered to and that this is only for exceptions. --Philip Baird Shearer (talk) 07:47, 20 June 2008 (UTC)

Reliable sources indicate that diacritics are common in some words, uncommon in others, and unheard of (except as conscious adoptions of foreign spelling) in yet others. We should do the same: use Besançon (and probably Björn Borg) because reliable sources do; use facade and Handel because reliable sources generally do; use Stanislaw Ulam, like Rome, because reliable sources almost always do. Inflicting the cedilla of Besançon on facade is reinventing English spelling to provide a non-existent consistency; Wikipedia is not an institute for spelling reform.

Roma, like fontana, can be found in English prose; but both are conscious Italianisms. Except to assert a linguistic fact, they are generally held to be bad writing; Mark Twain kidded the bejesus out of that in Roughing It, and it should be deprecated severely. The same applies to all these proposals to use diacritics where our sources do not. Septentrionalis PMAnderson 15:31, 20 June 2008 (UTC)

There are no "national conventions"

There are no "national conventions" for transliteration! Transliteration is used only when the use of diacritics is not possible!

Nor are there unique transliterations. "Đ" is not always transliterated as "DJ", so that argument cannot be accepted.


My proposal is "use personal names in their official form if there is no English equivalent". The only official names for persons are the ones they use themselves. Even some Croats do not have Croatian names, but we use their own versions, not Croatized ones. So it is hr:Ksaver Šandor Gjalski: Gjalski, not Đalski. --Anto (talk) 19:58, 19 June 2008 (UTC)

Just to say the same: There are no "national conventions" in relation to transcribing "Đ" with "Dj". If only ASCII is available, in Serbia it will usually be transcribed with "Dj", but in Croatia it will be transcribed with "D" these days. --millosh (talk (meta:)) 18:01, 24 June 2008 (UTC)
However, there are no conventions and no standards, and using "dj" as a substitute for "đ" (when the latter is available) is considered a non-orthographic form. The only reason "dj" was used is not any kind of transcription or transliteration rule but simply the lack of the character on old typewriters. Books about orthography don't mention usage of "dj" as an option (actually, some of them, like Our alphabet and its norms by Mitar Pešikan, the main author of the Orthography of Serbian language, suggest usage of "dy" if it is not possible to use "đ" on a typewriter or a computer). --millosh (talk (meta:)) 18:01, 24 June 2008 (UTC)
Also, according to the English language writing tradition, it is usual to use the full set of the basic Latin characters with diacritics; and letter "đ" is a basic Latin character with a diacritic. --millosh (talk (meta:)) 18:01, 24 June 2008 (UTC)
So, please, remove this from the rules. --millosh (talk (meta:)) 18:01, 24 June 2008 (UTC)

Is this a change or a clarification?

If this is a change to existing guidance, we should say so; if not, say so. I see people arguing both above. Please make up your collective minds. Septentrionalis PMAnderson 20:48, 19 June 2008 (UTC)

I think it's a change to guidance which actually moves towards what is currently practised.--Kotniski (talk) 07:09, 20 June 2008 (UTC)

Tokyo

Need a clarification. Does the proposal suggest "Tokyo" to be spelled "Tōkyō"? -- Taku (talk) 23:17, 19 June 2008 (UTC)

It would appear to, since ō falls in the table of diacritics in Latin letters; but I await correction. WP:UE would prefer Tokyo, of course, as common usage. Septentrionalis PMAnderson
WP:MOS-JA (see point 9) specifically states that it should be Tokyo as it has been established as an English word that way. ···日本穣? · Talk to Nihonjoe 02:13, 20 June 2008 (UTC)

Which is the whole point, it should be Tokyo in line with normal English. The proposal goes

When person or place has a name in the Latin alphabet ...

Sure it's a transliteration of "東京" but you can argue that it does have the name. This is a point that would have to be fixed. We surely would want to restrict the guideline to people and places the names of which are natively written in the Latin alphabet. JIMp talk·cont 02:56, 20 June 2008 (UTC)

Aye, indeed. —Nightstallion 11:01, 20 June 2008 (UTC)

Handel

When this says it does not support Handel, would it move to George Frideric Händel? Why? Nobody uses, or would use, that hybrid form; Grove's uses George Frideric Handel, as he did; that's why we do. Septentrionalis PMAnderson 02:17, 20 June 2008 (UTC)

Pick your choice:
I'd avoid hybrid combinations like old English middle name combined with German version of the last name,... --Francis Schonken (talk) 11:33, 20 June 2008 (UTC)

Revision

I've revised the wording of the proposal (hopefully without changing its original intention) to deal with some of the cases arising above (Tokyo and Handel for example).--Kotniski (talk) 07:54, 20 June 2008 (UTC)

Please make your proposed guidance consistent. --Francis Schonken (talk) 20:54, 20 June 2008 (UTC)
What do you see as inconsistent at the moment?--Kotniski (talk) 07:22, 21 June 2008 (UTC)
Well, for starters, it doesn't explain what happens when rule #1 and rule #5 lead to a different result (e.g. Händel/Handel, Schönberg/Schoenberg, Jogaila/Jagiełło,...) --Francis Schonken (talk) 15:22, 21 June 2008 (UTC)
OK, I see your point. Rule 5 is intended as an exception to Rule 1; I'll try to rephrase it to make that clear.--Kotniski (talk) 15:54, 21 June 2008 (UTC)
Rule #5 is stated as if that is what we generally do, currently we don't always:
Rule #5 also doesn't explain what happens if it leads to an ambiguous result,
  • e.g. a king of Polish(-Lithuanian) descent becomes ruler of Hungary and Bohemia: what should we do: use the Czech version of the name? The Hungarian? The Polish? Or?
--Francis Schonken (talk) 16:15, 21 June 2008 (UTC)
OK, these are questions I'm sure the proposal was never intended to cover. It's basically only addressing questions of "name with diacritics" vs. "same name without diacritics". I guess it needs slight rewording to make that clear. --Kotniski (talk) 20:14, 21 June 2008 (UTC)
I think what you need is called scope definition. Think e.g. Wikipedia:Naming conventions (standard letters with diacritics)#Scope – not that that one saved that proposal. Probably also you could do with a simpler one. The field is too delicate though not to have a very sturdy one. --Francis Schonken (talk) 20:45, 21 June 2008 (UTC)

Most of these problems go away if the guidance in WP:NC and WP:UE are followed and we use verifiable reliable English language sources to decide these naming issues. --Philip Baird Shearer (talk) 01:22, 22 June 2008 (UTC)

Ulam

Note that Stanislaw Ulam differs from the Polish form only in the l in Stanislaw. I don't see how this proposal supports it; but this is the usage of his autobiography and his coworkers. Septentrionalis PMAnderson 14:59, 20 June 2008 (UTC)

The part about naturalization is intended to cover cases like this.--Kotniski (talk) 07:26, 21 June 2008 (UTC)
I would support removing the roman text from Where a name which is clearly the best-established in English differs in spelling, other than merely in terms of diacritics or ligatures, from the native name, then the English name is used. (But this leaves us where we started.) What reason is there to distinguish between the two cases? Septentrionalis PMAnderson 15:01, 20 June 2008 (UTC)
Because when you see diacritics and ligatures, you know what the spelling is without them. See Zürich and you recognise Zurich. But see Göring and you don't necessarily recognise Goering. Maybe it isn't phrased very well (and nor is this explanation), but I think you get the point.--Kotniski (talk) 07:26, 21 June 2008 (UTC)
Not if you don't know the foreign language in question. Is ö to be represented by o or oe? It varies. But let me rephrase the question: why distinguish between diacritics and ligatures on one side, and everything else on the other? This proposal sensibly would leave Castile at Castile, not Castilla, but would move Aragon to Aragón. This makes no sense, and invites the mannered and illiterate phrase "Aragón and Castile", which nobody ever uses. (One time in ten, pedanticism will use "Aragón and Castilla", or sometimes "Aragon and Castilla".) Septentrionalis PMAnderson 17:38, 21 June 2008 (UTC)
On the first point, we minimize confusion overall by being consistent in our treatment, which is the aim of the proposal. The second point is a good one though; we probably need another exception for historically well-established spellings (that might permit umlautless Zurich as well, which would be fine by me).--Kotniski (talk) 20:48, 21 June 2008 (UTC)

Almost invariably called Boscovich (or Boscovitch), but always a citizen of Ragusa. Septentrionalis PMAnderson 15:34, 20 June 2008 (UTC)

That's what I mean by English version of a name. When English orthography is applied as opposed to simply dropping the diacritics, as has been done with many Serbian and Croatian scientists. Boscovich or Boscovitch is where UE is actually applicable. BalkanFever 22:38, 20 June 2008 (UTC)
Aye, in that case no one here's calling for the original name. —Nightstallion 10:06, 21 June 2008 (UTC)
And a good thing too; but this proposal would, unless it gets another bell and whistle to exclude him; perhaps this ad hoc notion that ch can be English orthography, and c can't. Septentrionalis PMAnderson 17:41, 21 June 2008 (UTC)
My understanding is that the current proposal mandates Boscovich per point #2 ("Where a name which is clearly the best-established in English differs in spelling, other than merely in terms of diacritics or ligatures, from the native name, then the English name is used."). Established anglicized version, more or less the same as Rome vs Roma - no dispute here I suppose. GregorB (talk) 19:04, 21 June 2008 (UTC)

Leave it to regional projects

I think this is something best left to individual regional WikiProjects and language manuals of style as they will have the best idea of how the words affected by this should be used. ···日本穣? · Talk to Nihonjoe 04:44, 21 June 2008 (UTC)

The problem is when members of Wikiprojects like the Tennis one omit all diacritics. BalkanFever 04:57, 21 June 2008 (UTC)
That's a patently false assumption. Whether a particular biography of a tennis player will include or not include diacritics will depend on what reliable English-language sources are doing for that player. How many times and ways do we have to say this before it is clear? Tennis expert (talk) 06:14, 21 June 2008 (UTC)
You can't use a source that doesn't use any diacritics to support your argument. How many times does that have to be repeated? BalkanFever 06:29, 21 June 2008 (UTC)
You still aren't listening. A number of older players with diacritics in their names do not have a biography on tennis websites. If the name of the English-language Wikipedia article for an older player uses diacritics, then the article name will not be touched unless it can be demonstrated that reliable English-language sources (such as books and newspapers) do not use diacritics for that name. Do you now understand the concept? And by the way, no one has explained why a website that never uses diacritics is ipso facto unreliable. Instead, people like you just keep saying it is, which makes me think that it's unreliable only because it never serves the agenda of the always-use-diacritics crowd. Tennis expert (talk) 07:05, 21 June 2008 (UTC)
That's rich, coming from Mr. diacritics-are-scum. If a website does not use diacritics, it is probably due to technical restrictions or a stylistic issue. Unless you can prove it isn't, i.e. show that sometimes they use diacritics, then you cannot use it as a source for omitting the diacritics. Get it? And please tell me, what agenda do I have? I really would like to know. BalkanFever 07:31, 21 June 2008 (UTC)
(1) I've never said "diacritics-are-scum" or anything close to that. NEVER. Got it? In case you don't, let me rephrase it. NEVER EVER EVER in this or any other discussion, on or off Wikipedia, have I ever said that "diacritics-are-scum" or anything remotely similar to that. Got it now? (2) If an English-language website does not use diacritics for WHATEVER reason, what makes the website unreliable as evidence of English-language usage concerning a particular name? I see no logic whatsoever in your arguments. You and others simply provide the bottom line of unreliability without saying WHY. That's unacceptable. (3) Your agenda appears to be "use diacritics always because it's the correct thing to do even if reliable English-language sources don't use them" and to discredit your opponents by putting words in their mouths and deliberately misrepresenting their actions when any reasonable person would know the truth of their words and actions by simply listening to and watching them. Tennis expert (talk) 13:32, 21 June 2008 (UTC)
To the "expert": Yes, I get it. You resort to ranting when you don't get your way. BalkanFever 02:07, 22 June 2008 (UTC)
Don't get my way? Once again, I have no idea what you're talking about. Civility doesn't appear to be your strong suit. Maybe you should read or re-read WP:CIVIL before your next attempt at fictional writing concerning yours truly. Tennis expert (talk) 07:40, 22 June 2008 (UTC)

(reply to TE, edit conflict; and try to remain civil, chaps) I think we get the concept by now, but your proposed way of proceeding is fraught with problems which would make the encyclopedia less good. They have been set out at length in previous discussions; I would summarise them as inconsistency, instability, potential misinformation and constant argument.--Kotniski (talk) 07:34, 21 June 2008 (UTC)

Less good because Wikipedia would then reflect standard English-language usage in the tennis world and would reject individual Wikipedia editors' conception of what's right, wrong, or respectful to nationalistic interests? What would make our procedures "instable"? The instability comes from editors like you who make losing proposal after losing proposal to change a perfectly fine existing policy. The inconsistency and constant argument come from people who for the most part refuse to abide by Wikipedia policy and instead come up with their own rules, unilaterally implement them, and then fight every effort to enforce that policy through edit warring, interminable and repetitive discussions, canvassing, and, worst of all, the recruitment of administrators to threaten nonadministrators that they will be reported, blocked, or banned if they persist in trying to enforce that policy. Tennis expert (talk) 13:32, 21 June 2008 (UTC)
You have argued on a number of occasions that nationalism, original research etc are the real opponents here and the correct names should not be used, ignoring completely the fact that it is their name, they were born with it, they have not legally changed their name, their name is written and pronounced correctly only in its original form, and they are known by a great many English speakers by their correct name. (A comparison is Russian, where their name has been reproduced for English speakers using a transliteration, so Ку́рникова has been rendered "Kournikova", with the unusual character substituted as "ou", rather than "u".) It does not take a nationalist to highlight that correctness should be our end goal, rather than compliance with some other website's style guide that we had no hand in developing - and when the sources are there to indicate what a name should be, there's no problem with using them. If the person has specifically themselves indicated a preference for a different name in the English language - which isn't terribly common, but is not rare either (Martina Navratilova is a good example) - then we go with that and source it. Easy. Orderinchaos 14:37, 21 June 2008 (UTC)
I've never said that "correct names should not be used." NEVER. Thanks for being about the 10th person to misrepresent my position. I challenge all of you to cite reliable English-language sources that use diacritics for the article names I've listed on the tennis moves project page. Place those citations in the appropriate place, which is the discussion page of each tennis player article. It's that simple. Tennis expert (talk) 14:56, 21 June 2008 (UTC)
I'm sure we could find some if we tried, but just suppose we found such citations for some names only. You would allow the diacritics to be kept on those we happened to find, and not on the others? I think it's obvious how this inconsistency, which reflects no truth of any importance in the real world (we know well that either form is always acceptable in good English without actually having to see it), is going to annoy and mislead readers. And what I mean by instability is that if someone comes up with a few new sources, or some existing source changes its style, we'll keep getting proposals to change the name back and forth - again to no encyclopedic purpose. --Kotniski (talk) 15:51, 21 June 2008 (UTC)
Change is what Wikipedia is all about. Otherwise, failed proposal after failed proposal about diacritics wouldn't keep appearing. And if you were truly concerned about annoying readers, you would cease with these repetitive proposals. And, no, there wouldn't be any inconsistency caused by Wikipedia. Wikipedia would simply be reflecting the inconsistencies caused by the real world, which is what a good encyclopedia does anyway, not try to fix them by encouraging editors to engage in original research or, more accurately, original opinionating. Tennis expert (talk) 16:32, 21 June 2008 (UTC)
That is just your POV. The original name is not any kind of "original research"! Claiming that is total nonsense. What do you want to say: that you know somebody's name better than he himself does? --Anto (talk) 18:53, 21 June 2008 (UTC)
Whatever else you may think of the proposal, you can hardly claim it encourages original research or opinionating, when it's far more objective than your proposed way of doing things (and I wonder why your multiple tennis-player nominations don't count among these "repetitive proposals"?)--Kotniski (talk) 20:36, 21 June 2008 (UTC)
How a name is represented in written English is a fact, so policies about facts should apply. Not sure why that's so controversial, but it is. Somedumbyankee (talk) 20:43, 21 June 2008 (UTC)
The original name is also a fact. With diacritics we are in the happy position of being able to give people both facts in one, since their general knowledge of English will tell them that the diacritics are often omitted. --Kotniski (talk) 20:58, 21 June 2008 (UTC)
I guess the problem is that WP:NC and WP:UE recommend "most common name" which in this case I'm reading as "most common representation of that name". Making Wikipedia have "more truthiness than the average reference" is a dangerous step. Better written, sure. More accessible, definitely. Free of obvious errors, we'd like to think so. Any implication that we are "more right" sets off massive alarm bells for me. I don't really care about the diacritics, frankly, but I am an evil government peon and an enemy of WP:TRUTH in all forms. Somedumbyankee (talk) 21:11, 21 June 2008 (UTC)
Let me restate, with emphasis: I think this is something best left to individual regional WikiProjects and language manuals of style as they will have the best idea of how the words affected by this should be used. Please note that my statement does not apply to the Tennis WikiProject. ···日本穣? · Talk to Nihonjoe 22:07, 21 June 2008 (UTC)
I would say it's only a runner if there are very, very precise and exhaustive rules for clashes where names come under the auspices of multiple regional and language WikiProjects. Adam Mickiewicz is almost certainly at the correct title, but when you're dealing with a Polish (diacriticless name) - Lithuanian (diacritised name) born on what is now Belarusian soil (Cyrillic alphabet, five transliteration systems documented here and counting), it's fortunate that the name most used in English is crystal clear. Borderline cases will always be messy, but we don't want to create extra problems in picking between different wikiprojects systems, especially when the remit of a given WikiProject is not really well-defined beyond one editor's drive-by templating - see Talk:Karlovy_Vary#WikiProject_Germany... Knepflerle (talk) 18:29, 23 June 2008 (UTC)
Mickiewicz and Mickevičius are two distinct names; the latter is not a diacritised version of the former. Therefore, UE is applied. If, however, it was simply between Mickevičius and Mickevicius, the point of this proposal is to use the former, not the latter. BalkanFever 03:32, 25 June 2008 (UTC)
In the case of most East Asian languages, there is only one country per language (Chinese being the big exception). The WP:MOS-JA is very clear on how to romanize Japanese words, though. ···日本穣? · Talk to Nihonjoe 03:34, 25 June 2008 (UTC)

Sources

As a long-time reader and researcher, what has always troubled me is that Wikipedia is so often wrong. I typically first use Encarta because for the most part they can at least spell their topics correctly. Of course I understand Wikipedia's position: use the most common form. When it comes to diacritics, they are a pain to type, so they will often be omitted by all except those who care. They will never be the most common. There are often technological restrictions as well. Many journals and newspapers have a zero-diacritic policy for these reasons. And searching the web is almost always a wasted effort. University-level academic publications will often get it right. If you cannot even spell the subject matter correctly, then why even bother having an article on it? It is important to use English when available. However, an English exonym is not created merely by dropping the diacritics. References are also important, and they need to be quoted. However, due to the nature of input, non-diacritic references will surely always outnumber those which properly include them. As long as it is verifiable, commonality should not be an issue. Let redirects take care of the rest. This issue has been holding Wikipedia back from reaching its full potential for a long time. 124.102.8.155 (talk) 22:15, 21 June 2008 (UTC)
That deals with the quality of the sources used; in practice and in policy, we do use the best sources available. This is never raw Google, or a collection of somebody's blog sites; ideally it would always be University-level academic publications (which can indeed be searched; use JSTOR if you have access to it, and Google Scholar if you don't). For tennis players still playing, the newspapers and the sites of tennis organization are often the best available sources.
Using those sources, it is simply not true that non-diacritic spellings always outnumber diacritic'd ones. Septentrionalis PMAnderson 23:08, 21 June 2008 (UTC)
As for Wikipedia being often wrong: yes, we are not a reliable source for anything. WP:V says so; and I do not expect this to change before our publication date, although there is one glimmer of hope. Septentrionalis PMAnderson 23:11, 21 June 2008 (UTC)


124.102.8.155 you wrote "I typically first use Encarta because for the most part they can at least spell their topics correctly." please define correct in this context. --Philip Baird Shearer (talk) 07:29, 22 June 2008 (UTC)

I'd assume he shares my opinion that
  • the original spelling, including all diacritics (in the case of a Latin-based alphabet); or
  • the professional and scientific transliteration, including all diacritics (in the case of a non-Latin-based script)
is the "correct" rendition of a topic. —Nightstallion 10:44, 22 June 2008 (UTC)
What makes that more correct than the usage in verifiable reliable English language sources? --Philip Baird Shearer (talk) 23:07, 22 June 2008 (UTC)
Precisely. Needless to say, it needs to be verifiable in any case. 124.102.8.155 (talk) 23:31, 22 June 2008 (UTC)
There are verifiable, reliable English sources both with and without diacritics. Just like any editor, we need to choose one and try to be consistent. At least with diacritics both those who desire them and those who do not can read it unmodified: both sides win. Missing diacritics are often not possible to recover without further research: one side wins; one side loses. If you have worked in academia, stripping diacritics is generally frowned upon as a form of sloppy, unprofessional writing. That is one issue that I constantly hear about Wikipedia. But it does not need to be so. 124.102.8.155 (talk) 23:31, 22 June 2008 (UTC)
Re "stripping diacritics is [...] unprofessional writing": Yes, exactly. —Nightstallion 10:06, 23 June 2008 (UTC)
Including diacritics where we have no evidence for them is unencyclopedic and deeply unprofessional writing. Writing Foolandish instead of English is pedantry, which is also unprofessional writing. This misbegotten proposal advocates both. (WP:UE opposes stripping diacritics normally used in English, however.) Septentrionalis PMAnderson 15:01, 23 June 2008 (UTC)


Are you really following the conversation? Quote: "Needless to say, it needs to be verifiable in any case." Quote: "There are verifiable, reliable English sources both with and without diacritics." No one is proposing to arbitrarily add diacritics just for fun without any evidence. And I hope that no one is proposing to arbitrarily remove diacritics just for fun without any evidence. 124.102.8.155 (talk) 22:45, 23 June 2008 (UTC)
Both have been suggested; adding them without any evidence of their use in English (or, in some cases, with the slightest possible evidence) has been insisted upon. Septentrionalis PMAnderson 23:29, 23 June 2008 (UTC)

Anto is right. Writing the original name of a person is not OR, so there is no need for citing websites. This is Wikipedia written in the English language, not Wikipedia written for users who speak English. And I see that the English language does allow diacritics, so what is the problem? Disrespect for our cultures? -- Bojan  18:09, 24 June 2008 (UTC)

And what about the surnames Šimić (e.g. Dario Šimić) and Simić (e.g. Nikola Simić, an actor)? If we drop diacritics, both surnames will be written as Simic. -- Bojan  18:28, 24 June 2008 (UTC)

We need diacritics like a hole in the head

We need diacritics like a hole in the head. They should be avoided except where absolutely necessary. --Anticipation of a New Lover's Arrival, The 21:19, 22 June 2008 (UTC)

Well, that was certainly a helpful addition to the dialogue. Unschool (talk) 22:26, 22 June 2008 (UTC)
Actually, it was exactly as dogmatic as some of the "use diacritics, because they're correct" posts we've had. Let the two extreme positions cancel each other out. Septentrionalis PMAnderson 22:34, 22 June 2008 (UTC)
As usual, Pmanderson is misinterpreting other people's statements. Nobody has said that diacritics must always be used! That is just one of your fabrications! We insist that personal names be spelled as the persons spelled their names themselves. If persons have anglicized their names, they use those forms. So we have John Malkovich, George Radanovich, Gary Gabelich, George Chuvalo etc. The extremism here would be to insist that their names be spelled in their original forms (Malković, Radanović, Gabelić, Čuvalo), which makes no sense because they don't use those name forms. So nobody insists on the usage of Croatian forms!! --Anto (talk) 10:44, 25 June 2008 (UTC)
I don't believe there's a lot to add. We've got diacritics even on city names like Zurich, though the umlaut in that city's name is dropped by the native French-speaking Swiss, and the umlaut is seldom used in contemporary English references to Zurich. It's just daft, and only makes it more difficult to search for names in Wikipedia. --Anticipation of a New Lover's Arrival, The 23:06, 22 June 2008 (UTC)
Not if appropriate redirects are used. ···日本穣? · Talk to Nihonjoe 02:11, 23 June 2008 (UTC)
Well, my problem is not with finding articles, it's with reading them. The whole purpose of WP:UE is to make this English-language encyclopedia comfortably readable for all persons who read English, not just the small percentage (yet overrepresented on Wikipedia talk pages) of readers who are comfortable with non-English characters. No one who speaks languages other than English is hurt by Vossstrasse (they already know about how English traditionally handles eszett), but many readers will be thrown off by a spelling that their best guess tells them is VoBstraBe. Those redirects do come in handy—and they should be used to help the person who types in Voßstraße. Unschool (talk) 02:45, 23 June 2008 (UTC)
Can you read them properly with transliteration? No, you can't. I bet that you pronounce 99% of the foreign names improperly. So what is the big deal? --Anto (talk) 17:04, 23 June 2008 (UTC)

Although Diacritic does mention it (not sourced), ß is not what I consider a diacritic. There is no mark to modify the S; it is a base character. While it is of the Latin script, it probably should not be dealt with by this proposal. 207.46.92.16 (talk) 04:07, 23 June 2008 (UTC)
I think that's a semantic quibble. Eszett (ß) poses almost identical problems for readers of English not familiar with German. If ß does not fit the proposal as named, I think it would be better to rename the proposal so that it covers ß to your satisfaction. --Anticipation of a New Lover's Arrival, The 09:35, 23 June 2008 (UTC)
Surely not identical problems - eszett doesn't look anything like double "s", whereas letters with diacritics do look like their alternative forms (readers quickly get used to "not seeing" diacritics they want no truck with). The distinction between the two cases is made in the proposal. Actually, though, much current practice on WP seems to be to include the non-diacritic extended characters like the eszett, Icelandic thorns, Croatian crossed d's and so on. Just to make it quite clear, implementation of the proposal as it stands would actually lead to significantly fewer foreign squiggles in Wikipedia, not more.--Kotniski (talk) 10:00, 23 June 2008 (UTC)

'ß' is indeed not a diacritic, but a ligature. Until about the beginning of the 19th century, in Britain and France as well, a single 's' had the form of, more or less, an 'f' without its horizontal bar. This explains the left side of the letter and then, when you ignore the top right (the ligaturing part), the bottom of the right side looks like a modern 's'. This ligature was in use in other European languages too at the time, but disappeared, except in Germany and (partly) in Austria. Switzerland - another country where German is spoken by a substantial part of the population - abandoned it at the same time as the 'Fraktur', often called 'Gothic script'. It is absent from a Swiss keyboard and was (probably - that's so far away that I don't remember) already absent from typewriters. In addition, 'ß' is always converted to 'ss' in indexes, search arguments and so on. So it is almost an aesthetic display choice, like 'œ' vs. 'oe'. Clpda (talk) 22:23, 29 June 2008 (UTC) (additions/corrections Clpda (talk) 12:34, 30 June 2008 (UTC))

So would implementation of present guidance; the problem is not guidance, it is a handful of nationalist editors who will run to forms familiar to them whatever English does. The downside is that this proposal would, since it relies on common usage for the character, not each individual word, ban diacritics where we should use them, and require diacritics where we do not. Septentrionalis PMAnderson
Haha . Can you read any foreign name which is not English??? --Anto (talk) 17:02, 23 June 2008 (UTC)
I think that's kind of the point of WP:UE, you only have to be able to read English to use it. Somedumbyankee (talk) 17:13, 23 June 2008 (UTC)
It is, after all, the English Wikipedia. Where there is a well established English usage, we should follow it. --Anticipation of a New Lover's Arrival, The 18:21, 23 June 2008 (UTC)
Sorry, guys, but you have missed the point. There are some things that you won't be able to understand no matter how good your English is, such as Differential (calculus), which is not easy to understand. And sorry, we can't simplify it for you with addition and subtraction.

For simplified issues, search the books for children. --Anto (talk) 04:56, 24 June 2008 (UTC)

See Wikipedia:Make technical articles accessible. This isn't technical, but the same logic applies. I would like to believe that you shouldn't need a PhD to use wikipedia, but I am, after all, an American of no particular intelligence. Somedumbyankee (talk) 15:21, 24 June 2008 (UTC)
This is not a technical issue of whether to use diacritics for some names. We can use them; the Wiki software allows it. For Latin-script names there is no need for transliteration. One thing you have to admit: some things will never be clear to you. Some things you will never understand. Same thing for me. But you cannot distort facts in order to make them intelligible to you. Reality is as complex as it is. Some things might be complex. What is the cure for that? LEARNING, LEARNING, LEARNING! Don't blame Newton because you don't understand calculus! Do not blame Bruce Lee because you don't know karate. And don't blame Germans because you can't pronounce German names. It is only YOUR fault. If you don't understand something and don't want to know, go away from it. Don't be destructive.--Anto (talk) 17:41, 24 June 2008 (UTC)
Indeed - and good point. Orderinchaos 17:55, 24 June 2008 (UTC)
(dropping indent count) Actually, I think he missed my argument entirely. The policy I pointed at was about making the articles accessible to the "average user" and has nothing to do with "technical problems". I am neither ignorant nor anti-intellectual and I pronounce German reasonably well from many years of singing it (the average jelly donut would probably find my pronunciation stilted, though). Wikipedia is an encyclopedia built on reliable sources, not the place to push proper spelling because "all of the terrorists communists fascists methodists Hedy (HEDLEY!) Lamar's band of thugs other people are doing it wrong." Somedumbyankee (talk) 23:00, 24 June 2008 (UTC)
You have missed my point entirely! We insist on using the name forms that are used by the persons themselves. And these are the facts! What some half-literate journalists call him/her is a secondary, less relevant issue, precisely because [3] everybody can write a book or build a website in which he can claim whatever he wants.--Anto (talk) 10:58, 25 June 2008 (UTC)
The New York Times, the BBC, the US Department of State, and the United Nations are "half literate"? Uh... yeah. These are all reliable sources about Slobodan Milosevic, and to kick it off, the Serbian Embassy to the United States uses no diacritics. Clearly they're a bunch of idiots who haven't given the topic much thought. Somedumbyankee (talk) 19:38, 25 June 2008 (UTC)
All verifiable, but hardly reliable. Again, here is where Encarta shows its professionalism and accuracy: Slobodan Milošević. Wikipedia too gets it right, but you never know for how long... 124.102.8.155 (talk) 21:32, 25 June 2008 (UTC)
So the Serbian Embassy can't spell his name properly, but Microsoft can. Excuse me while I have a good chuckle at that comment. Somedumbyankee (talk) 21:41, 25 June 2008 (UTC)
Some sources are reliable for certain issues. But not for all. Including the spelling!!! I believe there was never a man in Serbia with name Slobodan Milosevic! --Anto (talk) 17:52, 26 June 2008 (UTC)
Please read some reliable English language sources such as the ICTY website MILOSEVIC Case Information Sheet(IT-02-54) "Bosnia and Herzegovina" then you will be aware that such a person existed. --Philip Baird Shearer (talk) 19:18, 26 June 2008 (UTC)
Between some random guy on the internet and the Serbian Embassy, I think it's obvious which one I find more credible. Somedumbyankee (talk) 18:01, 26 June 2008 (UTC)
It does not change the fact that a man named "Milosevic" did not exist. ICTY might be reliable for the data from his history, but not for the spelling of his name. Just take a look at the book of Carla del Ponte in which she calls "Serbs and Croats sons of bitches" here and here

Are we gonna put this statement somewhere (in the articles Croats, Serbs) as a statement from a "reliable source"?? --Áñtò | Ãňţõ (talk) 10:23, 28 June 2008 (UTC)

By that logic, there never was a man named ar:سلوبودان ميلوسيفيتش, el:Σλόμπονταν Μιλόσεβιτς, ko:슬로보단 밀로셰비치, he:סלובודן מילושביץ', ja:ソロボダン・ミロシェビッチ, or th:สโลโบดัน มิโลเชวิช either. All those people also obviously stupid because they can't spell his name right!!!!one!! English is a distinct language and it has its own customs and traditions and ways to spell foreign words. It has just as much "right" as Hebrew to spell it differently. Somedumbyankee (talk) 14:36, 1 July 2008 (UTC)

If we apply either guideline correctly, because of redirects finding articles should not be a problem. As for the arguments pertaining to difficulty of reading the articles - are we really saying that this article in the New Statesman is fundamentally "less difficult to read" than this one from nine months later, because it doesn't use ö in Schröder? Or is the Guardian's football reporting inherently confusing to English-speakers compared to The Independent's? Are the Economist's articles on Czech subjects easier to understand than those on French ones due to the vagaries of their style guide? When we look at the cost-benefit analysis of using diacritics, diacritics do have benefit to those who understand them; the extent of the cost to readers who do not understand them has not yet been demonstrated. I'm not saying "all diacritics should be allowed because they do less harm than good" - however, the discussion here might benefit from explicit demonstration of how diacritics negatively affect articles, so that we can better focus discussion on specific issues and cases and begin looking at how these problems could be addressed. Knepflerle (talk) 19:06, 23 June 2008 (UTC)

The Economist's style guide, as a crude rule of thumb, is doing more or less what current guidance would encourage. Françoise is English usage, and we should use it (so would the Economist); Plzeň is not; we use Pilsen, the Economist makes do with an easier method and plumps for Plzen. Septentrionalis PMAnderson 19:48, 23 June 2008 (UTC)
Do these differences have any impact on our readers' understanding compared to theirs? Do readers cope with the small differences, perhaps in a similar way readers cope with differences in orthography between UK and US publications, or the way The Times uses Lyons and other press uses Lyon, or is there something fundamentally different in the case of diacritics? Are the negative effects provably worse in names from languages outside the French-German-Spanish-Italian axis? What if the form of the diacritics in other languages is the same as these ones - (eg Jana Novotná) - is it the form of the diacritic or the language of origin which is crucial? Knepflerle (talk) 20:18, 23 June 2008 (UTC)
Is this a demand to adopt this policy unless we can do a scientific study on our readership's comprehension? If so, why should this rhetorical device be confined to this proposal? First show me that our readers benefit from Ngô Đình Diệm; I certainly don't, and I doubt many of the readers of the histories of Vietnam which don't use those diacritics do. Septentrionalis PMAnderson 20:27, 23 June 2008 (UTC)
And above all, the key question: what's wrong with writing this English Wikipedia in English? Septentrionalis PMAnderson 20:29, 23 June 2008 (UTC)
I appear to have hit a nerve. I explicitly pointed out above there is no implicit demand in my questions. This simple cost-benefit analysis is what the major pro-diacritic argument boils down to - small positive benefit to small number of readers and no negative effect on the rest is still a small positive effect overall; I am just highlighting this in the hope that focusing on the details of this argument on both sides leads to new directions instead of infinite facile restatements of old chestnuts.
We've been focusing on your broad-brush "key question" for years now, but there's no agreement yet for a variety of reasons. And yes, readers' comprehension is the correct yardstick against which we should be developing new ideas. What the use of diacritics can add or detract from readers' comprehension is precisely what we should focus on, and this is an invitation for people to do just that. Knepflerle (talk) 20:57, 23 June 2008 (UTC)
PS: in a discussion on the generalities of using diacritics and in a subsection titled like this one, I think it's important to highlight some inherent contradictions and implementation problems with the eradication of diacritics (I wrote about this in more detail in this post to WT:NC(UE), but note the specific wording of the proposal there was quite different to this). However, I am quite happy in both the ideological consistency and practice of the current WP:UE's usage-based rules, sitting comfortably as it does with the core policy of WP:V - verifiability of orthography, not "truth" being the deciding factor. I support what we've got already over any alternative proposed so far. Hope this clarifies things somewhat. Knepflerle (talk) 21:20, 23 June 2008 (UTC)
Using Đoković when our readers are accustomed to Djokovic, or Schroder when they are accustomed to Schroeder or Schröder are both barriers to comprehension. How high they are we cannot tell, so the cost-benefit analysis is unperformable, but both should be avoided as far as practicable. We should not assume away real costs, nor should we claim to know what we cannot; both are all-too-common problems with cost-benefit analyses. Septentrionalis PMAnderson 21:17, 23 June 2008 (UTC)
The first case I might well personally believe. The second I cannot - all three spellings are widespread in English-language literature, just as we expect English-language speakers are accustomed to color/colour and -ise/-ize. Just say, merely for sake of argument, that Schroder were the predominant form seen in English-language sources, say 70/30 over Schröder, I still strongly doubt seeing Schröder would be a "barrier to comprehension", any more than the town of Zzyzx is hard to comprehend because it does not obey standard English orthographical rules. It is hard to believe comprehension is an issue when non-technical commonly-read English-language reliable sources such as the newspapers and magazines highlighted above use the diacritic regularly. It is hard to believe comprehension is impaired to any significant level at the Schröder article when other words might commonly use the same diacritic, say at Göttingen, unless there is a form of transient word-blindness I am unaware of. And yet, in that case we would not use Schröder because of the predominance clause in WP:UE, even though it would be a verifiable, reliably-sourced spelling which might offer extra information to some of our readers without impairing the others. I'm not saying we should change our stance, but it's that kind of situation that has meant the discussion on this topic is still ongoing. Knepflerle (talk) 21:42, 23 June 2008 (UTC)
It's Schroder and Gottingen that would be barriers to comprehension to a reader who expected the umlauts; majority usage works both ways. Septentrionalis PMAnderson 23:56, 23 June 2008 (UTC)
You have somewhat missed my point: if the usage split as we measure it is say 60/40, then I doubt readers expect either version, or find either a barrier to comprehension, as they will be regularly exposed to both. Our readers expectations and understanding are not measured in the black-and-white of our majority decisions, which is why there may be a case for analysing other benefits of particular spellings. Knepflerle (talk) 13:09, 26 June 2008 (UTC)
I agree that in the case of a 60/40 split (i.e. no obvious or consistent English usage) we should stick to the more complete spelling. My problem with this proposed guideline is that it recommends we use that spelling in 95/5 split cases where the spelling with diacritics is obviously not the common English usage. Somedumbyankee (talk) 14:36, 26 June 2008 (UTC)
Instead of "no obvious/consistent usage" I would rather call it "parallel dual usage" - just like -ise and -ize spellings are both widespread in the English language canon taken as a whole, and we expect readers to see both here on en.wp just like they see both in English-language world. I agree that the current proposal goes too far, and that a little tweak to the existing WP:UE could account for this case - a sufficiently common level of use is what is important, not whether Google gives 46% or 52% of the results to one spelling (especially given Google's inherent biases, useless optical recognition of diacritics, patchy coverage, poor counting algorithms which lead to incorrect totals...). But that is a discussion for another day, at WP:UE. Knepflerle (talk) 16:32, 26 June 2008 (UTC)
I also agree. Anyone who argues that we must take the 52% side of a 52/48 split is ignoring the basic justification of WP:UE; our readers will have seen the 48% usage. They are also ignoring the problem that all our search engines result in samples (and samples with unknown biases) of all English usage. What language would you propose? Septentrionalis PMAnderson 16:58, 26 June 2008 (UTC)
As always, it'll be hard to convey the spirit precisely whilst eliminating ambiguity - I'll have a think, and post to here and UE. Knepflerle (talk) 17:40, 26 June 2008 (UTC)

I think the point about Plzeň vs. Pilsen is off the mark: Plzeň is Czech, Pilsen is English. That's an entirely different issue than whether we write the Czech name Plzeň or "Plzen". I don't care whether we use the Czech or English name. However, I do object to the faux-Czech name "Plzen" because it is imprecise. Such easy-to-avoid imprecision is not appropriate in an encyclopedia. True, many readers won't know the difference and won't care. A few will know enough to supply the diacritic themselves. But there are a large number of us who appreciate seeing the actual name, and who don't know enough to fill in the gaps. Take Ngô Đình Diệm: Readers who don't know how to pronounce that won't be able to pronounce "Ngo Dinh Diem" either; however, those who know enough to work out Ngô Đình Diệm will get it completely wrong if we leave off the diacritics. As for Schroder, Schroeder, or Schröder, are we really going to get into an edit war with every Wikipedia article over which spelling is "most familiar" to which groups of people? Why not just write his name as it's spelled and leave it at that?

We don't need to dumb down Wikipedia on these matters. Any encyclopedia worth its salt shouldn't be dumbed down. Are we interested in emulating the EB here, or have we given up on ever achieving any respectability and are willing to settle for World Book? kwami (talk) 21:35, 23 June 2008 (UTC)

Most English speakers can write, and have heard, "Ngo Dinh Diem"; they probably would pronounce it with a vile accent, but diacritics will not solve that; it is even less likely to be fixed by an unsourced spelling, which most English speakers have never seen and will learn nothing from (we can include it in parentheses, if any would). It is mere pedantry, and interferes with comprehension, to use spellings our readers have not seen.
Inventing diacriticed spellings without authority is not merely dumbing down, it is being dumb; this proposal would mandate ignoring the authority of our actual sources to invent terms like Catherine of Aragón.
As you can probably tell, I have had enough. If any editor supports this who actually has English as his native tongue, do let me know; until then, I utterly oppose this effort by aliens to rewrite the English language for their own convenience. Septentrionalis PMAnderson 21:51, 23 June 2008 (UTC)
I gladly forgo any added validity of my arguments that depends only on the language of my parents. Knepflerle (talk) 22:13, 23 June 2008 (UTC)
My native language is English. I'm also not talking about "inventing" spellings. Just the opposite: I'm suggesting that we don't invent spellings by deleting diacritics. I would never write "Catherine of Aragón" because that is English, and the long-assimilated English name of the province is "Aragon". In English, use English. However, when we have a name that is not established in English usage, I think that we should use the actual name. E.g. the provinces of Vietnam, which for the most part are completely foreign to English speakers, and have official Latin spellings, but which are presented here in bastardized form. Just because newspapers are sloppy and drop off the diacritics is no reason for us to be sloppy too. kwami (talk) 00:28, 24 June 2008 (UTC)
See WP:UE#No established usage. Who are we to say when a word becomes established in English -- it sounds like original research. If we see what reliable English language sources use and copy those then we will be following WP:V and WP:NC. For example should we name the country Romania or Rumania or Roumania, before WWII it would probably have been Rumania, but current sources suggest Romania is most common. It may be in the future that names like Aragón become the norm in which case we can change our page name, but until then we should follow the lead in reliable English language sources. --Philip Baird Shearer (talk) 18:30, 26 June 2008 (UTC)
I also take exception, I'm most definitely not "alien" - English is my first, and only, language. I can handle alphabets of about two dozen other languages without being able to speak them, but I think most educated people in Australia can as well, as can many who are not but are exposed to them in other ways (in particular South Slavic languages which are the third largest language minority in my country after Italian and Chinese.) Orderinchaos 16:06, 24 June 2008 (UTC)
They aren't our inventions, they're other people's inventions. My (not so) humble opinion is that if the consensus of authoritative sources is wrong, Wikipedia should be wrong too. It's really the same concern as WP:TRUTH and WP:FRINGE. Somedumbyankee (talk) 01:29, 24 June 2008 (UTC)
          • Yes, they are your inventions! You (a couple of users here on en.wiki) are trying to make some non-existent "law" about the English language by imposing rules that do not exist in any university English grammar tutorial. --Áñtò | Ãňţõ (talk) 10:51, 28 June 2008 (UTC)
We are, after all, relying upon the authority of the best sources in English, not World Book: the New Cambridge Modern History uses Ho Chi-minh (XII, p. 325); Oxford DNB uses Ho Chi Minh (Kingsley Amis); so do our competitors. Who uses "Hồ Chí Minh"? If it is common usage in comparable sources to spell the provinces with diacritics, fine. But we should not redesign English. Septentrionalis PMAnderson 01:41, 24 June 2008 (UTC)
(I didn't say we were relying on World Book.)
Should we at least retain all diacritics if we place a name in italics as a foreign name? kwami (talk) 02:06, 24 June 2008 (UTC)
Yes, as we should represent any foreign word correctly. But we should only do so when necessary; unnecessary foreign words are showing off, like the travel writers who displayed their German by using Bahnhof where "railway station" would have done just fine. For one thing, foreign letters, even single letters with diacritics, can render as little square boxes; I can testify that the same is true of accented Greek, and therefore the FA to which I largely contributed alternates on using the smooth breathing. Septentrionalis PMAnderson 02:47, 24 June 2008 (UTC)
"are we really going to get into an edit war" - well according to WP:UE we should research the usage at every talk page and use that as a binding decision. The edit wars are the unfortunate occasional consequence of conflict with editors' opinions. Normally if one spelling is predominant then using the other would impede understanding, but in cases like the one I mention above the predominance might not give any extra clarity but still cause loss of information useful for others. Whether we can develop a guideline that eliminates this possibility and still satisfies WP:V by not using spellings undocumented in English-language texts is an open question. Knepflerle (talk) 21:53, 23 June 2008 (UTC)
In most cases it is clear what reliable English language sources indicate as common usage. That leaves two categories WP:UE#Divided usage: "When there is evenly divided usage and other guidelines do not apply, leave the article name at the latest stable version. If it is unclear whether an article's name has been stable, defer to the name used by the first major contributor after the article ceased to be a stub" and WP:UE#No established usage "...follow the conventions of the language in which the entity is most often talked about (German for German politicians, Turkish for Turkish rivers, Portuguese for Brazilian towns etc.)." --Philip Baird Shearer (talk) 19:18, 26 June 2008 (UTC)

wow. this discussion must have been going in circles for fully four years now. Without moving an inch forward in terms of reason or common sense. The only guideline we need is "check usage in English language WP:RS", end of debate. A good example of a case where diacritics are actually useful is Pāṇini (not an Italian sandwich). There are lots of English language sources that give Sanskrit terms in full IAST, no debate there. Catherine of Aragón otoh is an excellent example of what not to do. WP:RS, WP:UCS, all further debate on a case-by-case basis please. dab (𒁳) 11:49, 28 June 2008 (UTC)

Technological solution

There is a technological solution that if implemented could please all sides on the issue. Unicode normalization form NFD could be used to decompose characters with diacritics. Then in conjunction with the UCD, diacritics (class Mn etc.) could be stripped. Enabling or disabling this setting could be added to the user preferences. Such a solution would allow those who prefer diacritics to get them, while those who dislike them can opt out at any time. Just a note: If implemented, I would suggest that it be disabled for edit screens. 124.102.8.155 (talk) 12:02, 23 June 2008 (UTC)
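For anyone who wants to see what the NFD-and-strip idea above would do in practice, here is a minimal sketch in Python using only the standard `unicodedata` module. The function name `strip_diacritics` is made up for illustration, and nothing here reflects how MediaWiki is actually implemented; it only removes general category Mn (nonspacing marks), which is one reading of "class Mn etc.".

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Hypothetical helper: decompose with NFD, then drop nonspacing marks (Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = [ch for ch in decomposed if unicodedata.category(ch) != "Mn"]
    # Recompose whatever is left so the result is in the usual NFC form.
    return unicodedata.normalize("NFC", "".join(kept))


if __name__ == "__main__":
    for name in ["Schröder", "Göttingen", "Kimi Räikkönen",
                 "Ngô Đình Diệm", "Lech Wałęsa", "Meißen"]:
        print(name, "->", strip_diacritics(name))
```

One thing the sketch makes visible: letters such as Đ, ł and ß have no canonical decomposition in Unicode, so this approach leaves them untouched (the Vietnamese name comes out as "Ngo Đinh Diem", not "Ngo Dinh Diem"). Handling those would need a separate, hand-maintained mapping, which is a policy choice rather than something Unicode supplies.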

This has three major problems.
  • It won't work for article names, at least not for all purposes, including linking.
  • We do want some diacritics, at least in stating: "the Fooian form of the name is..." or "the Barland alphabet has thirty-five letters, including the variant forms..."
  • It has the potential for unintended side effects, like the long-established but still opposed date auto-formatting convention. Septentrionalis PMAnderson 13:28, 23 June 2008 (UTC)
In principle, I think this solution has a lot of merits, but the technical side may indeed have some problems; I don't know whether points 1 and 3 are really valid, but as far as point 2 is concerned, we could easily implement some kind of environment (like <forcediacritics> or something like that) within which all diacritics would be shown regardless of user preferences. —Nightstallion 13:27, 24 June 2008 (UTC)
I think the above analyses are correct. Comparison with date formatting is valid: diacritics are perhaps more of a presentation than a content issue. However, there isn't an easy solution that would work right, and the one that would work right would necessarily involve some kind of additional tagging. Let's say that {{lang|vi|Ngô Đình Diệm}} (nota bene, this template already exists!) would render the name verbatim (as it does now), but Ngô Đình Diệm by itself would automatically be displayed as Ngo Dinh Diem. Not too alluring, perhaps, but definitely possible. GregorB (talk) 14:35, 26 June 2008 (UTC)
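The tagging idea above can be sketched roughly as follows, again in Python with only the standard library. This is an illustration under assumptions, not a description of MediaWiki: the `render` function, its `strip` preference flag and the simplified `{{lang|code|text}}` pattern are all hypothetical, and real template parsing is far more involved than one regular expression.

```python
import re
import unicodedata

# Simplified, assumed pattern for {{lang|xx|text}}; nested templates are not handled.
LANG_TEMPLATE = re.compile(r"\{\{lang\|([^|{}]+)\|([^{}]*)\}\}")


def strip_marks(text: str) -> str:
    """Decompose with NFD and drop nonspacing marks (category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", kept)


def render(wikitext: str, strip: bool = True) -> str:
    """Show tagged names verbatim; strip marks elsewhere if the reader opted in."""
    out = []
    pos = 0
    for match in LANG_TEMPLATE.finditer(wikitext):
        plain = wikitext[pos:match.start()]
        out.append(strip_marks(plain) if strip else plain)
        out.append(match.group(2))  # tagged text is always kept as written
        pos = match.end()
    tail = wikitext[pos:]
    out.append(strip_marks(tail) if strip else tail)
    return "".join(out)


if __name__ == "__main__":
    sample = "{{lang|vi|Ngô Đình Diệm}} and Gerhard Schröder"
    print(render(sample, strip=True))
    print(render(sample, strip=False))
```

With stripping on, the tagged Vietnamese name is shown as written while the untagged "Schröder" loses its umlaut; with stripping off, the text is unchanged apart from the template wrapper. This also shows the cost mentioned above: every name that must keep its diacritics has to be tagged by hand.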

A joke :)

Application of this rule would probably result in Meissen, but Göttingen; Tudjman and (?)Goering but Dvořák; Lech Wałęsa but Stanislaw Ulam and George Frideric Handel; Munich and Tokyo but Zürich.

This is a joke :) --millosh (talk (meta:)) 18:04, 24 June 2008 (UTC)

To be more precise: This is a very nice example of prescriptive madness and I'll use it in my linguistic works :) --millosh (talk (meta:)) 18:12, 24 June 2008 (UTC)

Erm, this is confusing English names with de-diacritised native names, e.g. "Munich" is the English name for München, not simply a de-diacritised version. - Francis Tyers · 18:20, 24 June 2008 (UTC)
It would perhaps be a good idea to make a side-by-side table: original spelling / current spelling in the Wikipedia article title / WP:UE spelling / proposed new guideline spelling, with a couple of examples such as these. Rationales for individual cases could also be added. Might make everything a bit clearer (if not easier...). GregorB (talk) 19:04, 24 June 2008 (UTC)


Anything that is not a simple case of diacritics vs. diacritic dropping is covered by WP:UE, not by this proposal. The point is that UE is not about omitting or keeping diacritics. As I have said before, and Francis Tyers reaffirmed, a name with diacritics omitted and an English name are two different things. BalkanFever 01:24, 25 June 2008 (UTC)

Yes, it is; please read WP:UE#modified letters; more to the point, no sufficient reason has been given why it should treat diacritics any differently than any other difference between English and a foreign language. The English for Meißen is Meissen, according to descriptive linguistics; that's what English-speakers call the city. Septentrionalis PMAnderson 02:36, 25 June 2008 (UTC)
Actually, substitution ß->ss is usual in modern German, too. --millosh (talk (meta:)) 16:10, 25 June 2008 (UTC)
Only in Switzerland -- everywhere else it's wrong, wrong, wrong. So wrong, in fact, that only recently the Unicode Consortium was convinced to add a capital ß to Unicode. —Nightstallion 18:40, 25 June 2008 (UTC)

Diacritics are in fact necessarily different letters in some languages

Hi, I would again like to point out a certain thing in Swedish, Finnish, Norwegian, Danish eg. alphabets. There are letters like Å, Ö, Ä, Ø there. The fact is, they certainly can't be considered just "accented versions" of the other ones. That is like implying Q is an accented version of O because it just has an additional dash there. Their appearance has nothing to do with the way they are pronounced.

Swedish perhaps provides the best example. If a word has both Å and Ä, they would both be rendered as A here. Very wrong. They are completely different letters whose pronunciation is different. Ä is more like "/ee/" and Å "/oo/".

The article Kimi Räikkönen was some time ago moved to Kimi Raikkonen. I opposed this move with the fact his official name in all papers is Räikkönen. You can't change other person's name here in Wikipedia if his name is other in all legal documents. Again the fact they are different letters, Räikkönen is a last name of 936 persons and Raikkonen of 16 persons ([4]). So it is a different lastname, you can't change it.

I'm sure the problem exists in other languages as well, since in Vietnamese d and đ are different letters as well, but I'm most familiar with these languages I brought examples of. Fully supporting the usage of "diacritics" and proper rendering of names as we have Unicode and redirects here. --Pudeo 12:07, 25 June 2008 (UTC)

The Finnish and Swedish Wikipedias should of course differentiate as they in fact do; but we are the English Wikipedia. We should, in such cases, include the foreign spelling as information, and differentiate when reliable English sources find it necessary to do so. (Quite often they do: Åland Islands is the conventional spelling.)
Nor are Scandinavian languages alone in this; the Os in Orion represent ω; the O in Odysseus represents ο: different letters, with different sounds, in Greek. But English does not distinguish; we, the English Wikipedia, need not, and should not. Septentrionalis PMAnderson 12:38, 25 June 2008 (UTC)


  • en.wiki is Wikipedia in English, not an anglophone-POV Wikipedia. Are you able at all to distinguish those two phrases??
  • No, different Wikipedias should not be different. Unfortunately, a lot of articles (related to politics/history) are de facto the POV of certain nations. But we should make an effort to eliminate them.

--Anto (talk) 15:21, 25 June 2008 (UTC)

Yes, Septentrionalis, your comments on Orion and Odysseus are completely off the mark: these are completely assimilated English names. Of course we don't and shouldn't use diacritics, except in their etymology. Personal names which are not assimilated into English are an entirely different matter. It's like the difference between writing an English-derived word in kana or romāji in Japanese. I don't understand why people insist on confusing these concepts. kwami (talk) 18:53, 25 June 2008 (UTC)

So are many tennis players. What evidence can there be that a name is fully assimilated, and perhaps altered in the process, but usage? (I pass by, as inconsequential, the detail that the assimilated form of the second name is Ulysses; Odysseus is a nineteenth-century learned correction.) Septentrionalis PMAnderson 15:40, 26 June 2008 (UTC)

Indeed this is the English Wikipedia, but not every name in the world is in English. That's why different letters are used as well (of the Latin alphabet) because there simply can't be any substitutive letters. --Pudeo 19:30, 25 June 2008 (UTC)

Most names, including many with diacritics, are spelled the same in English as in the original language. Some are not. Whether a given name is is a question of fact; the way to answer it is to look at what English does with the name, not at which letters are involved (as this proposal would do). Septentrionalis PMAnderson 15:52, 26 June 2008 (UTC)
A personal name is a fixed thing you get at birth. If it has diacritics, they may be lost through emigration in a country ignoring them as in the case of the current French president. The name’s owner may change it her/himself, like taking an artistic name or a pseudonym for public appearance. In all other cases, i.e. nearly all, a personal name is neither alterable nor translatable.
About sources, do not forget that, although Unicode has been technically available here for over a decade now, keyboard drivers have not followed (e.g. many of us have a key for an acute accent but it works only on vowels). I do not think that Wikipedia should reproduce the sloppiness of others who could not render (or did not bother rendering) a personal name in its original form. I disagree as well with the idea that because people may have been used to see a name without diacritics, they should be served that form. It is like putting up wrong beliefs just because they have been frequently quoted. REDIRECTs are definitely needed but the title page should be simply accurate.
The issue may be different with loanwords and places - this discussion might be easier if split in 3 parts... Clpda (talk) 17:03, 26 June 2008 (UTC)
Clpda, do you have any evidence that stripping accent marks is "sloppiness", if so how do you explain the process of anglicisation of words like hotel or should English speaking people still write "hôtel" because they are sloppy? Usage for whatever reason governs English, and if the majority of reliable English language sources strip the accent marks off a word then we should follow their example (see WP:V, WP:OR and WP:NC)
Names are just the same. Napoleon Bonaparte usually written that way in English. The name is not usually written in English as it is in French "Napoléon Bonaparte". Even Encarta strips the diacritic something they do not do for Lech Wałęsa even though most reliable sources do. I do not think that your position is credible if we are to keep within Wikipedia content and naming policies, which is to use what most reliable English language sources use. --Philip Baird Shearer (talk) 18:15, 26 June 2008 (UTC)
I explicitly restricted my comment to personal names, so your example of 'hôtel' is not in my line of discussion. I'm perfectly fine with 'hotel'. By the way, I'm not sure that any other language having imported 'hôtel' from French has taken the circumflex with it, even the languages which, contrary to English, are used to diacritics.
I disagree with 'names are just the same'. However, I fully admit that the name of historical people such as Napoléon could be rendered without its diacritic(s). Other names that were debated above on this page, such as the one of an author of Azerbaijan, are not (yet) historical, whatever his fame within the English speaking world, and should be kept in its original spelling. I understand that drawing the line may be occasionally difficult (what is 'historical' enough?) but the discussion would then be left to their individual pages. If a consensus can be reached for over 95% of the pages concerned, that's already a good result... Clpda (talk) 19:05, 26 June 2008 (UTC)
What's the line between historical and not historical? Is it the point at which a conventional spelling (which may be either with diacritics or without) becomes most common in English? If not, what is it? and how do we determine it without original research? Septentrionalis PMAnderson 19:20, 26 June 2008 (UTC)
When they have been adopted or naturalized to English in historical texts. Almost all languages have their own variants for European monarchs. That's okay, but you can't change Formula One World Champion's name without his permission. :-). Hotel is an English word, with French origin. It is fully adopted, thus naturally acceptable without the diacritic as it has been. Not all words are adopted, like Norse mythology Óðr. Then I don't see any point in crippling the word trying to be "English" using the classical Roman alphabet. --Pudeo 19:30, 26 June 2008 (UTC)
[Adapting the name of a monarch to one's own language is] okay, but you can't change Formula One World Champion's name without his permission. Says who? On the contrary, we anglophones do both all the time; we always have. (Not Formula One, of course; but the spelling of foreign jousters was much more erratic.) You may prefer more moral languages, in which spelling is regulated by government edict; you are free to do so. If so, do leave us in our sloth and heathen folly. Septentrionalis PMAnderson 19:44, 26 June 2008 (UTC)

I absolutely agree with Pudeo. —Nightstallion 19:40, 26 June 2008 (UTC)

Then go ahead and establish MoralWiki, where you can impose any commandment that seems good to you. We have a policy to write in English, and a preference for communicating with our readers. Septentrionalis PMAnderson 19:48, 26 June 2008 (UTC)

Oh my God! Aren't all these (more than 2 million articles) written in English??? Have you been thinking they were in Hungarian???

Perhaps we should change this article about English language:


From this

Regulated by: no official regulation


into this:

Regulated by: User:Pmanderson on wikipedia

LOL

--Anto (talk) 19:50, 27 June 2008 (UTC)

Anderson, your conceit is getting annoying. If you feel you need to insult or make fun of other editors' opinions, I must assume you don't feel your own opinions can stand on their merits. Your habit of repeatedly and evidently purposefully misrepresenting others' arguments is also less than impressive; again, it appears you are unable to address the issue at hand. The more you write, the better you make your opponents look, even when I don't agree with them. kwami (talk) 20:08, 26 June 2008 (UTC)
I quoted an argument and responded to it; I did not knowingly distort it. If I have, please explain. If Pudeo is not asserting a moral imperative to write most current persons with their birth name, whether it is ever so used in English, I do not understand his position at all. If I do understand it, I see no basis for its binding force.
That position would, it seems to me, require a nineteenth century WP to use Napoleón; it would require us to use Franjo Tuđman now. The first is an idiom violation; the second is contrary to the explicit wording of this proposal. What are the three of you defending? There is certainly no consensus to always use diacritics; this is, in its way, a compromise proposal.
I await information. Septentrionalis PMAnderson 01:21, 27 June 2008 (UTC)
What I meant is: there's Charles XIV John of Sweden, although in Swedish it is Karl Johan, Henry IV of France although in French it is Henri. Okay, it's so in almost all languages due to acceptable historical texts. However, you are not allowed to change my name for example without my consent. It is what it is in legal documents. Kimi Räikkönen is Räikkönen, and in fact removing äö results in a different last name used by 16 people! (see earlier link to Name Register Centre). This is an encyclopedia: while removing diacritics improves nothing (we have redirects), removing them erases the only proper way to call them and factuality. Welcome to Unicode age and an encyclopedia that covers the subjects of the whole world. --Pudeo 11:49, 27 June 2008 (UTC)
(Warning, this reply is silly!) "Self-identifying usage" is actually the second criterion used in WP:NCGN when a common English usage cannot be determined. It's definitely a useful tiebreaker, but a common English usage, if it exists, is always preferred as res judicata. A fortiori, Many English language publications will leave off the diacritics simply because English basically doesn't use any, so an anglicization will automatically leave them off without being malum in se. Excessive use of foreign words or forms is frowned upon as pretentious and silly, and I would give a lawyer example but res ipsa loquitur. Somedumbyankee (talk) 15:00, 27 June 2008 (UTC)
The legal distinction here seems particularly weak, because it is as true of the kings as of tennis players. Didn't Bernadotte sign legislation Karl Johan? Then, if this argument were binding, we would have to call the article that, which hardly anyone else does.

Septentrionalis PMAnderson 20:21, 27 June 2008 (UTC)

As for the main comment there's Charles XIV John of Sweden, although in Swedish it is Karl Johan, Henry IV of France although in French it is Henri. Okay, it's so in almost all languages due to acceptable historical texts. Yes, of course; and it is also true for non-royals, and people within their lifetime. Henry Fuseli was so spelled in his own time, and still is; more recent examples are Stanislaw Ulam and Waldemar Matuska; Novak Djokovic is so spelled by the Britannica Book of the Year, which is as close to a historical text as he is likely to get. Septentrionalis PMAnderson 21:18, 27 June 2008 (UTC)

We're not speaking about the names of persons, whose names were adjusted to the respective languages. Adjusting the names of kings, bishops, popes, patriarchs etc. is the thing that's being done in other languages also.
We're speaking about the names of persons that do not belong to that category.
I don't know for you, but on Croatian Wikipedia, we use redirects to proper form of original names - in short: we practice what we preach.
No one among us here on en.wiki knows the grammar and orthography of all possible languages, or how someone's name is properly written.
E.g.: Lech Walesa, Sissel Kirkjebo, Voros Lobogo, Szekesfehervar, Tirgu Mures, Constanta, Sibenik, Besancon, Guimaraes, Rascane, Kizilkoy, Citroen, Hotel des Invalides, Leixoes, Uniao Leiria.
But we have redirects (what is the purpose of redirects, if not for this?), so we can know the proper name. Proper, correct information. Isn't that one of necessary conditions of Wikipedia?
Insisting on not using the diacritics is insisting on illiteracy. Kubura (talk) 07:24, 1 July 2008 (UTC)

If foreign usage is the rule editors on the Croatian Wikipedia have agreed to all well and good, it may well be what people do when writing Croatian (you tell me), but it is not what people do when writing English, and our Wikipedia policies do not support the use of foreign names that are commonly spelt a different way in English. BTW I note that the article hr:Kristofor Kolumbo is not under hr:Cristoforo Colombo so you had better change it if ordinary people are under their foreign spellings on Croatian Wikipedia, or do you make allowances for common usage as well over at Croatian Wikipedia? --Philip Baird Shearer (talk) 13:36, 1 July 2008 (UTC)
Precisely. Also hr:Luj August., regent Francuske, with its odd punctuation, which could also profit from collation with the French or English Wikipedia; this is the Duke of Maine, whose regency was not confirmed; if it existed, it lasted for twenty-four hours. Septentrionalis PMAnderson 13:53, 1 July 2008 (UTC)

Further action?

There is a real question here, but this entire page has become a ridiculous assortment of WP:TRUTH and WP:IDHT. "I don't agree with that source, therefore it must be unreliable" seems to be a lot of the discussion I've been having. Neither side is going to cave, and the only time that there will be an agreement is if one side decides it's just not worth it and gives up (clearly a false consensus). I'm tired of arguing with a brick wall, and I think at this point this proposal needs professional help. A request for comment is usually the first step before seeking outside assistance, so let's try that. Somedumbyankee (talk) 21:16, 26 June 2008 (UTC)

Support. kwami (talk) 21:28, 26 June 2008 (UTC)
Is this RFCpolicy or RFCstyle? Somedumbyankee (talk) 21:56, 26 June 2008 (UTC)
This would revise the placement of a good many articles if enacted; that would seem to be policy. Septentrionalis PMAnderson 01:26, 27 June 2008 (UTC)

Formal Request for Comment

Template:RFCpolicy

Summary of the discussion

Since the argument seems to continue going round in the same eternal circles, let's try to summarise the two main positions (this is my attempt, with the disclaimer that I support the proposal, so someone opposing it should probably rewrite/expand the arguments against):

For the proposed guideline
  1. We have the technology to use diacritics, so we should use them where they add information
  2. WP already uses diacritics far more widely than the majority of English sources do, so to a large extent the proposal documents current practice (although this may not apply to non-diacritic extended characters, which are used much more widely than the proposal would imply)
  3. Readers know that foreign diacritics are optional in English, so are not misled by seeing them
  4. In the vast majority of cases, the spelling of a foreign name with the original diacritics is acceptable in good English, and does not make it less recognisable to readers used to seeing it without diacritics
  5. Although the use of diacritic-less forms is also acceptable in English, use of such forms in WP means subtracting information and thus making the encyclopedia worse for no particular gain
  6. Special cases where the above arguments do not apply are by and large handled through the exceptions stated in the proposal
  7. Implementing a policy of "doing what reliable sources do", though superficially attractive, leaves open the major questions of which sources are to be considered reliable in this matter, what to do where sources clash over a particular name, and how to handle situations where such a policy leads to a confusing clash of styles between different names
Against the proposed guideline
  1. There is already a naming convention guideline that covers this issue called Wikipedia:Naming conventions (use English) (WP:UE) which depends on current Wikipedia policies (and is in harmony with them) to guide on the choice of name to be used and whether it is appropriate to use or not use diacritics.
  2. Adopting hard rules about diacritics leads to forms which are against English idiom, and breaches at least two Wikipedia policies WP:NC "article naming should prefer what the greatest number of English speakers would most easily recognize, ..." and WP:V "Articles should rely on reliable, third-party published sources with a reputation for fact-checking and accuracy."
  3. Use of diacritics where the majority of English sources do not use them is likely to mislead readers into thinking that they are most commonly used in English
  4. Use of diacritics when they are never used in English is not writing in English (This would be equally true of extensions which are always used in English, and which this convention would exclude, but there are fewer of those.) This is the English Wikipedia.
  5. We get into trouble when a foreign name contains both diacritics (which the proposal says should remain) and extended characters (which the proposal says should be transcribed)
  6. Common usage is usually easy to determine if editors are willing to look at reliable third party sources using good faith. Where it is not WP:UE describes how to minimise conflicts: see the sections WP:UE#Divided usage and WP:UE#No established usage.
  7. Most cases where the use of diacritics is reviewed by a large pool of uninvolved editors either move them or provide clear evidence that the diacritics are common in English usage.
  8. Many cases are in fact decided by the standard of common usage; where they are not, they are non-consensus due to a plea that the diacritics are "correct" in some other language, which would also oppose this proposal.
  9. Differences between national varieties of English can distract from the information contained in an article for people familiar with another dialect (hence our rules in the MOS on national varieties of English). The evidence for this is the number of times English is "fixed" by altering spelling and grammar in articles from one variety of English to another, and the requested moves for articles such as Orange (colour). In the same way, diacritics on words that do not usually have them in English, or no diacritics on words that usually have them, can also be distracting (evidence of this can be seen in the requested moves for articles such as Zürich). Using reliable third party sources to indicate common usage reduces the annoyance factor of the "wrong" spelling for the largest number of people.
It's the last of the "against" arguments I find common and faintly ridiculous. There is no evidence whatsoever of "distraction" being caused by the presence of other national forms - people may change between forms from ignorance, habit or any other reason - and likewise for diacritics. Our rules are there to keep the number of changes down no matter what the reason - and there is absolutely nothing in policy about "distraction", being a completely subjective and wildly variable matter between readers and not even a simple function of relative usage.
Comprehensibility is what matters, and any argument which claims understanding (or "annoyance", or "distraction") is a binary-valued quantity based on a simple majority vote on usage is obviously over-simplistic - just look at the example of Schröder given above. The same goes for Zurich/Zürich - English speakers come across both spellings a reasonable amount of the time without massive misunderstanding or incomprehension, just as we assume they are familiar with both color and colour - and that's a nearly 80/20 split on the internet. Saying that merely because a spelling is used by less than 49% of the sites on Google it is suddenly more annoying or incomprehensible to more people is unfounded, unlikely and simplistic. When picking a spelling, if comprehensibility to all is not at risk, then we are justified in starting to look at whether one of the spellings has other possible extra benefits, such as diacritics. Our overriding aim is that we don't use spellings not well-established in English - any talk of majorities is an over-restrictive blunt tool, and talk of "annoyance" and "distraction" are obscurantist on the par of "I like diacritics cos they look exotic" or "it's distracting not to have diacritics cos it's not the right spelling in Foo-ian". The third "against" argument is also equally speculative for much the same reasons. Knepflerle (talk) 16:59, 27 June 2008 (UTC)
I think the main sticking point here for me is whether ć is the same as c is the same as ç. I "see" (pun intended) them as three different letters, and so I would avoid using ć the same way I would avoid using π in an article title for an English encyclopedia, probably because in the Spanish ("Castellano" as I learned it) alphabet n and ñ are different letters, not one letter and a modified version (the "tilde" isn't really a diacritic at all). There are cases where using the non-English character is the sensible thing to do (q.v. El Niño), but it should be avoided when there is an obvious and widely-used alternative.
I agree that "majority use" is a blunt instrument, but "fringe use" (q.v. Milosevic, see my comments above) is easier to identify and should not be encouraged. Somedumbyankee (talk) 17:31, 27 June 2008 (UTC)
If that's a "sticking point", you're focusing on the wrong issues. It is neither here nor there whether these are diacritics or separate letters in a different language - what matters is whether, in a particular word, they have a detrimental effect on comprehensibility to English-speakers or not. Needless to say there is a world of difference between the comprehensibility of ç and that of Greek/Cyrillic letters such as π. But for Göttingen, Schröder, Zürich, François Mitterrand, Dvořák, ångström and El Niño the loss to comprehensibility is practically zero because these forms are seen with quite decent (if not >50%) regularity in English language texts. Knepflerle (talk) 23:44, 27 June 2008 (UTC)
"Focusing on the wrong issues" is a bit misleading. I'm focusing on different issues, which is probably why we don't agree. My point is mostly that ć isn't an English letter any more than ñ or θ are, and the English Wikipedia shouldn't be using foreign spellings when most reliable sources aren't using them. When reliable sources do, I see no problem using them, but the reality is that this policy would force us to use them when our sources do not. Verifiability, not truth is the "wrong" issue that bothers me. Somedumbyankee (talk) 00:14, 28 June 2008 (UTC)
The current proposed guideline does not necessarily require any English-language sources use what you call the "foreign spelling"*, and for this reason I am not in favour of it. Verifiability requires that some reliable English-language sources use the "foreign spelling". Comprehensibility requires that a decent proportion of reliable English-language sources use the "foreign spelling". Our current guideline requires a majority use the "foreign spelling". The middle path which I favour, is that we could (but of course do not have to) accept a spelling as long as it is comprehensible, and this is a stricter requirement than and implies verifiability. It is, however, a slight slackening of the current UE which requires simple majority and which means occasionally we may be not using more information-rich spelling which is just as comprehensible to English-speakers. (* PS: I contend that a spelling which is used in a decent proportion of English-language texts is as English as any other, no matter perceived "foreignness" - what is more English than what a decent amount of English speakers write? Café is certainly English usage, and Zalaegerszeg and Pilisszentkereszt are just as foreign as Hódmezővásárhely!) Knepflerle (talk) 00:51, 28 June 2008 (UTC)
To quote the current policy, "If there is a consensus on spelling in the sources used for the article, this will normally represent a consensus of English usage." That policy isn't looking for a majority, it's looking for a consensus of how reliable sources spell it. It defers judgment to people who know better than Wikipedia editors instead of instruction creep that "fixes" "incorrect usage" by "half-literate" people like the UN.
For the record, café is really more of a foreign branding, intentionally taken raw from the French to sound foreign and gastronomically appealing. It's not really an English spelling, it's a foreign word routinely used in English. Then again, that accounts for the majority of the language depending on how you slice it. Somedumbyankee (talk) 01:30, 28 June 2008 (UTC)

Café is not an English word, just "foreign branding" because of your unfounded and irrelevant speculations on why it entered English use? Or just a case that no true English word would look that foreign? "That policy isn't looking for a majority" - just read the first line: "Use the most commonly used English version of the name of the subject" Knepflerle (talk) 11:24, 28 June 2008 (UTC)

This is more of a WP:ENGVAR issue. Café is far more natural in British English; it would be more speculation to point out that the British spend the most time in France of any English-speaking nation. But even in Britain the frequency of café may well be declining, and using the accent has been controverted at WT:MOS. (The discussion may well have been archived by now.)
WP:UE has never been held to mean a 51% majority, except in cases driven by some other motive; wording to clarify this, if the section on Divided Usage does not, would be welcome. Septentrionalis PMAnderson 18:49, 28 June 2008 (UTC)

Couple of my comments

  • Point number 1-absolutely base -for personal names (excluding royalty ! ) !

Point number 1 in this proposal should apply to all personal names in Latin script, unless it conflicts with point number 5.

  • Point number 3 is nonsense

There are no national conventions for transliteration! So there is no convention for the transliteration of letters like "Đ, đ" either. The only convention that exists for transliteration is in the German language: transliteration of Ä, Ö, Ü and ß into AE, OE, UE and SS where the use of these characters is impossible due to software problems (URLs, e-mail addresses...). I remind you that Wikipedia has no such problem! What is the difference between Tuđman and Ngô Đình Diệm?

  • I support point 5. ( " # When a person has changed his or her name (for example, in the process of naturalization as a citizen of another country), the new form of the name is used." ) This applies only for legal documents (passport etc. not football fan member cards !! )

So ,these rules would apply to the Edward George de Valera who became Éamon de Valera , Wilhelm Oberdank who became Guglielmo Oberdan , Ivan Vučetić who became Juan Vucetich.

I will add these 3 examples to the proposal to the point 5 . --Áñtò | Ãňţõ (talk) 20:28, 27 June 2008 (UTC) 20:12, 27 June 2008 (UTC)

Of course there is "evidence of distraction". It is strongest for ß and đ, where editors have said so; see Talk:Meissen and its archives. But it is clear that TennisExpert finds even single letter variants from the names he is accustomed to distracting; so do I, although I think less so. Septentrionalis PMAnderson 20:25, 27 June 2008 (UTC)
If that will make things easier, we can always make REDIRECTS, i.e. Tudman to Tuđman, Dokovic to Đoković. Perhaps also a little tutorial for English monoglots explaining to them that "There are some things that are not written in English. There are some names written with strange letters! etc. ...." There are always "copy-paste" methods. --Áñtò | Ãňţõ (talk) 20:37, 27 June 2008 (UTC)
Not only are diacritics visually distracting, but they're a barrier to editing on English-language Wikipedia, as I have explained before. Those are important considerations, as is preserving the fundamental principle that this is an English-language encyclopedia and everything in it should be based on reliable English-language sources and not on original research, personal opinions, or emotional appeals to nationalism. Tennis expert (talk) 20:50, 27 June 2008 (UTC)
Personally, I'd never expect of anyone to enter any diacritics while editing. (This is an important issue, and we've missed it in the discussion.) As for "emotional appeals to nationalism", well: turnabout is fair play. GregorB (talk) 21:08, 27 June 2008 (UTC)
@Tennis expert . If you are not familiar with diacritics then don't use them. You can write an article without them and later native speakers might help you with name spelling. And problem solved. --Áñtò | Ãňţõ (talk) 10:47, 28 June 2008 (UTC)
  • Comment I generally support diacritics, but agree with con #4 "when a foreign name contains both diacritics ... and extended characters". If we're going to use one, we should use both.
As an encyclopedia, I think we should retain as much reference information as possible. Newspapers and magazines may not care, since the individual will be disambiguated by being topical at the time of coverage, but that is not the case for us.
Also, please let's not get into the straw man of common English words (we're discussing proper names), or the debate about whether to use an English vs. foreign form for a proper name (this would only apply once it's decided to use the foreign form). kwami (talk) 20:33, 27 June 2008 (UTC)
This proposal proposes that we include misinformation; less than some users would like, I admit, but more than we should. Septentrionalis PMAnderson 20:36, 27 June 2008 (UTC)
What are the misinformation in this proposal??? Explain please!!! --Áñtò | Ãňţõ (talk) 20:46, 27 June 2008 (UTC)
Yes, I am also curious as to what "misinformation" you're talking about. I thought this debate was merely a matter of opinion, not of fact.
One other comment. Using an encyclopedia is part of a person's education. By removing diacritics in order to dumb down the encyclopedia (because they're "distracting", "unfamiliar", "difficult", etc.—in other words, because our readers aren't educated or intelligent enough to handle them), we're not doing our readers any favors. Of course, removing diacritics because the names have been assimilated into English is an entirely different matter. kwami (talk) 21:02, 27 June 2008 (UTC)
Con point 3: Use of diacritics where the majority of English sources do not use them is likely to mislead readers into thinking that they are most commonly used in English. That will do as a summary, but it is too weak. It does mislead, and in some cases (Tudjman is the most obvious, but I'm sure there are others) it misinforms about what is ever used in English (excluding the hopeless pedant and the terminally illiterate). Septentrionalis PMAnderson 21:10, 27 June 2008 (UTC)
Such cases are indeed genuinely dumbing Wikipedia down, where following common usage would not. Septentrionalis PMAnderson 21:10, 27 June 2008 (UTC)
That's hardly "misinformation". All we would need to say is "commonly spelled 'Tudjman'", unless of course in that case we decide "Tudjman" is an assimilated English name and go with 'Use English'. kwami (talk) 21:31, 27 June 2008 (UTC)
Try and include "commonly spelled 'Tudjman'" in the article, and see what happens. But of course "Tudjman" is an assimilated English name; that doesn't stop our nationalists from arguing about it. (Some of these tennis players seem to have had their names assimilated to English, or at least Western European, usage, also; but that is a question of fact, to be decided by evidence.) Septentrionalis PMAnderson 21:36, 27 June 2008 (UTC)
Part of the problem is that many of the comments seem to want to reject the entire WP:UE guideline. I'm hearing many of the comments on this page as "modern assimilated English names are not English usage; they're just misspellings, and using them insults the reader." If there is a reasonable consensus among many prominent and reliable English language sources on how to spell a name, I would take that as clear evidence that there is an assimilated English name and it should be treated the same way as Munich or Napoleon or Venice. If editors really have a problem with the way that established English usage handles a word, they should bring it up with the reliable sources and not try to "correct the world" through Wikipedia. Somedumbyankee (talk) 22:40, 27 June 2008 (UTC)

Pointing out just a few flaws in changing ö->oe and ä->ae in my native Finnish: former Prime Minister Anneli Jäätteenmäki, huh, Jaeaetteenmaeki? Or what about the following sentence, "Mosquitos on Lake Onega's ice": Saeaeskiae Aeaenisjaerven jaeaellae. Great... When removing "diacritics" only, Willow Tit fi:Hömötiainen turns into Homo Tit. It just isn't acceptable to cripple the letters. As I see it, all WikiProjects of countries which use diacritics should quit, as I don't see any point in creating articles under names that are incorrect. At least I wouldn't create / edit any. --Pudeo 23:38, 27 June 2008 (UTC)
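The two transformations contrasted in the comment above (dropping the marks entirely versus the German-style vowel + e expansion) can be sketched in a few lines of Python. This is only an illustration: the function names and the mapping table are assumptions made for the example and are not part of any Wikipedia convention. It relies on the standard `unicodedata` module.

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    """Drop combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", kept)

# Hypothetical table for the German-style convention mentioned in the thread.
GERMAN_STYLE = {"ä": "ae", "ö": "oe", "ü": "ue",
                "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss"}

def expand_german_style(text: str) -> str:
    """Replace umlauts and eszett with their two-letter forms."""
    composed = unicodedata.normalize("NFC", text)
    return "".join(GERMAN_STYLE.get(ch, ch) for ch in composed)

print(strip_diacritics("Jäätteenmäki"))     # Jaatteenmaki
print(expand_german_style("Jäätteenmäki"))  # Jaeaetteenmaeki
print(strip_diacritics("Hömötiainen"))      # Homotiainen
print(strip_diacritics("Tuđman"))           # Tuđman (đ has no decomposition)
```

The last line shows why the thread distinguishes diacritics from "extensions": đ is a separate code point with no canonical decomposition, so mark-stripping leaves it untouched and any ASCII form has to come from an explicit table.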

The name may be incorrect in Finnish, but if a name is spelt a different way to the Finnish in the majority of reliable English language sources, then the spelling is not incorrect in English. "Napoleon Bonaparte" is not an incorrect English spelling of Napoléon Bonaparte. For that matter, neither is Wikipédia an incorrect spelling in French. --Philip Baird Shearer (talk) 12:43, 28 June 2008 (UTC)


Again : I have told this to PMAnderson and I will tell to you as well:Some thing can be "correct" if it is against the system , against the rules! And so far:
  • there are no rules in English language for spelling of foreign names.
  • There are no rules for transliteration of the names from the Latin script.
  • If there are such rules provide me some sources. Those sources can be only English grammar/ortography books from some anglistic studies at some university. They are only experts for English grammar /ortography rules (d . Not NYT, CNN , BBC neither Playboy, Cosmopolitan or FHM because they are not in charge for regulating of English language.--Áñtò | Ãňţõ (talk) 15:53, 28 June 2008 (UTC)
There are no rules for English but usage. Anto does not know this; then again, he misspells "orthography" consistently - ortography is not usage. Enough. Septentrionalis PMAnderson 18:41, 28 June 2008 (UTC)
Point 1. your insinuations about my spelling are pathetic! (A Bushmen is making jokes with Chinese how short he is)Considering the fact you are monoglot and unproven "expert" for any issue. who wants to show himself great "expert". So far-Mission failed!! Point 2. ortography or orthography -whatever.Find me that manual , Mr. Big Expert! I am waiting entire month! --Áñtò | Ãňţõ (talk) 07:32, 29 June 2008 (UTC)
WP:NPA.--Prosfilaes (talk) 12:36, 1 July 2008 (UTC)
For me it is not important which solution will win, but it is only important that we will have 1 rule for all languages. If there will be vote for any universal solution please "call me"--Rjecina (talk) 20:54, 28 June 2008 (UTC)
There shouldn't be one rule for all languages; each language handles things differently.--Prosfilaes (talk) 12:36, 1 July 2008 (UTC)

Who uses English and how ?

If one looks at the user pages of the participants to this debate, there is clearly a gap between a majority of the native English speakers, who live their language and (probably and understandably) want WP to keep it as pure as possible (despite the fact that its purity is already seriously jeopardized by dialectal differences across all continents), and a majority of the non-native English speakers, considering English as a lingua franca and expecting parts of their own culture to be absorbed in that globalized language. Diacritics and special Latin characters (called 'extensions' above) are quite symbolic in this respect. Although thousands of editors develop other linguistic versions of WP (by making original articles or translating existing ones), the English WP is likely to remain the largest and most comprehensive one for many years (also thanks to the fact that non-native English speakers contribute!) and I would not be surprised at all if the number of non-native English speaking users exceeds the number of native English speaking users (even by consultation volume, not just the number of people). Another posting on this page (about a 'Wikipedia in English' against an 'Anglophone Wikipedia', from --Anto) already evoked this issue. English use has already escaped its standards on several occasions, in geographical variants leading to dialects and creoles. It is now facing its next evolution through globalization (WP has no responsibility in that!) where, beside new words – as usual – it is now loaning more than just words. Diacritics and special Latin characters are parts of this process. I'm afraid that native English speakers will have to admit that their language no longer belongs to them alone. This is the price to pay for world dominance (only from a linguistic point of view, of course). Inputs from all around the world (including diacritics and special Latin characters) will enrich it and should be seen as a positive thing.
85.3.21.150 (talk) 23:15, 27 June 2008 (UTC)

That was a thoughtful and rational essay, anon, and you may very well be right about where we are heading. But today, in the here and now, English is still extant as the language of a minority of nations. And it is not the place (and is against the policies) of Wikipedia to lead the charge in the direction you see the world going. Wikipedia follows. If your predictions come true, and English becomes globalized and diacritics become de rigueur (yes, I recognize the irony in my use of a French phrase), then WP:UE will force these new usages upon en.wiki. That's the beauty of WP:UE (for those who are not trying to use it to push an agenda). It says that we use what is recognized as standard. So, if in 20 years, the Washington Post and the Economist and most other English sources are applying these markings and using ßs, then so too will en.wikipedia. And some may be surprised to find just how much more accepting we curmudgeons will be than is expected. We're not all a bunch of xenophobes and bigots, you know. We just want the rules applied as they are written, not twisted into something the opposite of what their intent was. Unschool (talk) 23:46, 27 June 2008 (UTC)

(ec; and I agree with Unschool's favorable impression) I would support a Pidgin Wikipedia for those who feel that they want an international standard of their own invention; but what are we anglophones going to use? Every other language on the planet has a Wikipedia which is intelligible to them; English should too. Septentrionalis PMAnderson 23:50, 27 June 2008 (UTC)

Indeed, and if you look at an article like Kimi Räikkönen and its interwiki links, every single Wikipedia uses diacritics in the title, no problem there. Why do you want English Wikipedia to differ? No bigotry here because of English language's dominant global status, no? :) --Pudeo 23:55, 27 June 2008 (UTC)
The very first interwiki link goes to the Arabic wiki, which puts the name in the Arabic alphabet. Or how about Latvian (Kimi Raikonens) or Basque (Kimi Raikkonen) or Sudanese (Kimi Raikkonen). But hey, evidence just gets in the way of making your point, doesn't it.--Prosfilaes (talk) 01:42, 28 June 2008 (UTC)
The only reason they use that is that they don't know the correct spelling! The majority of Croatian people don't know the exact spelling of his name either. But we put the original form on hr.wiki. I guess some Finnish guy would correct it, but very few Finnish people speak Basque, Latvian or Sudanese, so they cannot argue there. Mentioning the Arabic Wikipedia is meaningless because they don't use Latin script! --Áñtò | Ãňţõ (talk) 11:05, 28 June 2008 (UTC)
When your logic isn't winning, pound on the evidence; when evidence isn't winning, pound on the logic; when both aren't winning, pound on the desk. Frankly, I find the assumption of incompetence to be rude.--Prosfilaes (talk) 12:34, 1 July 2008 (UTC)
English not using any (with about three or four minor exceptions) native diacritics is sort of unusual, so it's really only fair to compare other languages that don't use them for their own words. It's also possible that they copied the English version, which is a bit of a flagship for the wikipedia project. Compare fi:Napoleon I, which leaves off a diacritic in a language that we know uses them... Somedumbyankee (talk) 01:54, 28 June 2008 (UTC)

A call for federalist tolerance

Some parts of this debate seem, to me, to be open to consensus. Others do not. Areas where I think consensus is possible include:

  1. The use of diacritics and extensions when they are commonly used by publications that have neither a blanket "no diacritics or extensions" policy nor a blanket "always diacritics and extensions" policy. This would recommend "Piña colada" rather than "Pina colada".
  2. The non-use of diacritics and extensions when they are not commonly used by publications that have neither a blanket "no diacritics or extensions" policy nor a blanket "always diacritics and extensions" policy. This would recommend "George Frideric Handel" rather than "George Frideric Händel".
  3. Sources that always use diacritics and extensions or never use diacritics and extensions are not helpful--this tells us about the source's convention rather than common educated usage. Thus a source dealing with German subjects that left out umlauts entirely, replaced them wholesale with <vowel> + e, or always used the native spelling, when other sources write "Hermann Goering", "Gerhard Schröder", and "Rudolf Hess", would carry no weight as to what usage WP should adopt.
  4. Articles that do not use diacritics or extensions should indicate at the top of the article the native spelling, and articles that use the native spelling with characters that are unlikely to be understood, such as ß or Ə, should similarly provide an English-characters-only alternative at the top.

These are ways of applying the parts of WP:UE that are generally accepted.

Problems that prevent complete consensus are:

  1. Sometimes, common usage, educated or not, does not exist, because the word is not commonly used in English. This seems to be one of the two principal sticking points. Very few English speakers have heard of the Polish town of Borek Strzeliński (conveniently, there's a German name for it, too: Großburg), so WP:UE is silent on it.
  2. Divided usage is the other biggest sticking point. Z(u/ü)rich, for example.

The debate on these two items--no common usage and divided usage--has raged on and off for half of Wikipedia's lifetime now, and no consensus has emerged. People are getting angry and defensive and dismissive and sarcastic, and have wasted hundreds of person-hours on the matter. The debate has sort of become a prisoner's dilemma: if your opponent disengages from the debate, you can win by pursuing it; but if your opponent pursues the debate, you must also pursue it to avoid losing, and so the most advantageous move for you is always to pursue it. Of course, everyone is worse off when all parties pursue it rather than disengage. So let us disengage, or at least compartmentalize the debate:

If people active in articles on, say, Switzerland, or German-speaking places in general, agree to use umlauts and the eszett when there is no common usage as well as when there is divided usage, then they are to be left alone, and their convention shall hold, but only in their bailiwick. The Swiss or perhaps German-language consensus will carry no weight in arguments relating to, say, İlham Əliyev versus Ilham Aliyev. I believe consensus in a particular subrealm of Wikipedia is more likely to emerge if the debate is circumscribed, and not of global consequence. Also, fewer editors will be involved in each subdebate: Three to one is a consensus, but twelve to four is divided usage. People are less likely to be drawn into a debate if it only looks at, say, Azeri words, or Croatian words, and doesn't affect their preferred convention for Meissen versus Meißen. So there will be less debating, and more consensuses are likely to emerge. It's certainly no worse than the present situation.

Just as the Peace of Augsburg sacrificed uniformity for peace, so does this proposal. Cuius projectum, eius conventio.

This is perhaps a pessimistic view that uniformity will not in the near future be agreed upon, but it is a view that has been borne out by the evidence. As long as the first four points at the top of this post are generally agreed to, which by and large they seem to be, readers will not be inconvenienced or miseducated. Perhaps we can all move on and spend our time on more worthwhile and enjoyable pursuits. --Atemperman (talk) 09:07, 28 June 2008 (UTC)

(Philip, I've moved your responses out from inside my proposal and placed them here, along with, in italics, text that they respond to. I hope you don't mind--it just seemed a little confusing when I came back and looked at the page.--Atemperman (talk) 18:34, 28 June 2008 (UTC))
Sometimes, common usage, educated or not, does not exist, because the word is not commonly used in English. This seems to be one of the two principal sticking points. Very few English speakers have heard of the Polish town of Borek Strzeliński (conveniently, there's a German name for it, too: Großburg), so WP:UE is silent on it.
Divided usage is the other biggest sticking point. Z(u/ü)rich, for example.
  • Zurich may be split, but it is not an even split. Google Books gives 3250 on "Zürich -Zurich" and 11000 on "Zurich -Zürich", which is a ratio of more than three to one in favour of Zurich, so this is not a good example as it clearly should be Zurich under common usage (WP:NC). --Philip Baird Shearer (talk) 16:18, 28 June 2008 (UTC)
It does not matter what the policy on diacritics of a journal, book or newspaper is; what matters is whether they are reliable sources on the subject. At a practical level, how does one ascertain the policy on diacritics in third party publications if they do not publish it?
WP:UE already incorporates your suggestion; see the section WP:UE#Divided usage: "When there is evenly divided usage and other guidelines do not apply, leave the article name at the latest stable version. If it is unclear whether an article's name has been stable, defer to the name used by the first major contributor after the article ceased to be a stub." --Philip Baird Shearer (talk) 16:18, 28 June 2008 (UTC)
In response to Philip. There's a subtle distinction between determining common English usage with regard to particular words and common English usage with regard to diacritics and extensions (D&E) generally. Just as some editors want diacritics and extensions all the time, some want them never, and some want them some of the time depending on the individual word, some publications avoid them entirely or almost entirely, others make a point of using them whenever possible, and others have a more case-by-case approach. The proportion of publications, weighted by their importance or salience or authority, that fall into these categories can tell us where English is as a language on how to deal with diacritics and extensions, but only the ones that make choices on a case-by-case basis can tell us about individual words. The policy each publication has is easy enough to ascertain simply by seeing whether D&E appear all the time, not at all, or somewhere in between.
If we spend a lot of time (which clearly I don't think we should do) to survey publications' use or non-use of D&E, however, we are likely to find 1) inconsistency within a publication, 2) the problem of whether publications that use them exceedingly sparingly are never-users or sometimes-users, and 3) the problem of whether a source, regardless of its authority as regards content, is an authority on foreign orthography. Some editors want to distinguish between the two, others don't. What do we do with pre-Unicode sources on the internet? You may have a clear idea of what you think we should do in all of these cases, but given the debate we've had already, it's unlikely others will agree with you, or with each other.
Fair enough on the no-established-usage case. If it's in WP:UE that uncommonly used words are to have their native orthography, I guess that's simple enough. It seems, though, that there is disagreement over whether native conventions that respell these words when the special characters are not available should be applied to WP. Since these conventions say they only apply when the characters are not available, which is not the case in WP, it seems natural to use the preferred native orthography. This will probably happen anyway, as people who write on Rudolf Höß are likely to have some knowledge of the German language, while people who write on Rudolf Hess may very well not.
It's hard to draw the line on how evenly divided the usage has to be for contemporary English usage not to deliver a verdict. Is 3:1, in the example of Z(u/ü)rich, strong enough? I don't think it's possible to arrive at a consensus ratio, even if we all agreed on which sources count and how we go about determining the number of independent uses of one version versus another version. From the way editors have been arguing over this, it seems people can agree only that we should use Goering since it's spelled that way in an overwhelming proportion of English publications, and that truly 50:50 splits are divided and thus we fall back into the "no-common-usage" territory. And of course, how uncommon does a name have to be for it to be considered to have no common English usage? I don't think it's possible to draw bright lines that can be mechanically implemented on these questions, which is why I think we're better off simply not having this debate. --Atemperman (talk) 18:34, 28 June 2008 (UTC)
To answer the question of "how uncommon does a name have to be for it to be considered to have no common English usage", I bellyfeel that this means "if no reliable English sources talk about it, then it has no common English usage." The mayor of Katowice and the mayor of Nice are notable and the English Wikipedia has articles about them, but reliable English publications probably don't talk about them all that much and I would definitely apply "no common English usage" there. (Actually, the Mayor of Nice may have English usage because of his dashing former life in motorcycle racing, but whatever.) Somedumbyankee (talk) 06:21, 29 June 2008 (UTC)
I agree with Atemperman that this debate is going nowhere. I've said what I have to say about it, and I've gotten to the point where I'm going to put down the stick and walk away from the horse carcass. Pretty much every argument listed on WP:TRUTH and m:How to win an argument has been used on this page already, and it's just kind of silly. I'm just waiting for the appeal to Jimbo to drop, and then I'm going to go home. Somedumbyankee (talk) 17:14, 28 June 2008 (UTC)
  • Atemperman's language may indeed be the consensus of Wikipedia as a whole, although I believe some editors on this discussion have disagreed with each one of them.
  • Similarly, this proposal, while it should not itself be guidance, contains useful suggestions which may form a rule of thumb for those who are uncertain on how to spell an article. I see no reason they cannot be discussed on the relevant talk pages. Septentrionalis PMAnderson 18:57, 28 June 2008 (UTC)
If it is to be used as a rule of thumb then it ought to be rewritten to comply with policy and WP:UE such as this attempt that I made shortly after the proposal was suggested and which was promptly reverted. --Philip Baird Shearer (talk) 07:51, 29 June 2008 (UTC)
I think a lot of editors will be up in arms over blanket proposals to banish ß or þ from WP. Or at least, that's how they'd see it. Why not let the editors of Germanophone- and Icelandophone-related articles decide for themselves? That's what might happen anyway, even if you do get a momentary consensus on your proposal--there'll be some overzealous implementer of it trying to turn every eszett into a double ess, the changes will be reverted by longtime editors of the Germanophone-related pages, the anti-extensionist will cite this consensus, the reverting editors will rebel against the consensus, etc. Or maybe they won't, but I don't think another attempt to achieve WP-wide consensus on this will end the acrimony. I'd much prefer Pudeo's solution below. --Atemperman (talk) 15:23, 29 June 2008 (UTC)
No, a handful of German, or Icelandic and Scandinavian, nationalists will. But again, WP:UE does not require that þ be banished, merely that it be used only where English has failed to adopt th in its stead. We should use Althing and we do; we should use Thingvellir and do not; our articles on obscure Icelandic politicians should probably be mixed. Septentrionalis PMAnderson 20:19, 30 June 2008 (UTC)
Truly non-English characters fail to meet the "does not impair comprehension" argument for retaining diacritics. I'm familiar with ß from exposure to German, but I doubt a substantial number of English speakers would realize that it's not pronounced anything like B. þ means about as much to me as Ж or ₪ or Њ or ℳ. None of these should be retained unless there is no plausible alternative. That is the ά and ω of making this intelligible as an English document. Demanding that people learn a new alphabet to read about topics in Finland is akin to demanding that the article about Arabic be written from right to left. Somedumbyankee (talk) 21:23, 30 June 2008 (UTC)
I don't think there's much support for retaining clearly non-English characters in article titles in any but the most obvious cases, such as El Niño. Interestingly, NASA's "for kids" website strips that quasi-diacritic as well, though the header image and the rest of the site use it. Somedumbyankee (talk) 19:06, 29 June 2008 (UTC)

Perhaps the best solution is what is done for British/American spellings. If the article is about a German city, the only one regulating its name is the city itself. Then most likely a German, or someone who has knowledge of German, has created the article and is most active in editing it. Let's see what criteria from WP:ENGVAR it would fulfil: a) Retaining the existing variety b) Strong national ties to a topic c) Consistency within articles (current situation: diacritics used everywhere). Listen to those who have spent great amounts of time on their national topics in several WikiProjects. See what titles they have used. --Pudeo 10:55, 29 June 2008 (UTC)

Value judgments aside, this seems like what actually happens with most of these articles. It raises interesting questions about WP:OWN, but they seem to be tolerable for the "national varieties of English" process. It obviously causes problems for names that have changed hands multiple times during the history of the English language (q.v. Gdánzkig) when that ownership is contested. Since we cannot find a consensus here, the default is to retain the status quo, and I think recognizing an "International English" for the purpose of presence or absence of diacritics as a separate "variety" is appropriate. If the page was created at Zürich, it stays there unless there's a clear consensus for Zurich and vice versa.
In short, the policy could read "do not propose moving a page solely over the presence or absence of diacritics unless there is an overwhelming consensus to do so." I don't like it, but we're getting nowhere here, this appears to be what actually happens, and it really doesn't matter that much since 99% of English speakers will just ignore them either way. Somedumbyankee (talk) 19:06, 29 June 2008 (UTC)
I agree with Atemperman’s synthesis, and especially with the proposal to renounce an absolute rule applicable to all D&E from all languages. Leaving room for a lack of standardization may hurt the systematic part of our minds, which we all expressed in hoping to find a global solution. I feel, however, that the consensus was closer than many thought.
I find Somedumbyankee’s policy proposal a good compromise.
I’d have a few (partly) technical comments to add, the first of which should be kept in mind for the application of the soft consensus and for interoperability purposes:
  • The Google statistical evaluation of names with diacritics is flawed when performed on google.com only. Google’s national homepages lead to localized indexes taking into account the different use of letters. See also the contribution of Pudeo in the section ‘Diacritics infact necessary different letters in some languages’ above. So if one performs searches on a term containing a letter with diacritic that is considered an individual letter in a specific language, the results will be different between google.com and google.[a country code where that letter with diacritic is considered as an individual character]. At the time of my research on this issue, 2 years ago, I used ‘Łodz’ vs. ‘Lodz’ on .com and .pl and the differences were then statistically significant – today, they are not any more, probably because of the higher number of hits. A test with a more rarely used name would be interesting. In addition, the example of ‘Ł’ is simple because it is unique to Polish. Other letters considered ‘with diacritic’ in English are more complicated because their status may differ from language to language. For instance ‘š’, a frequent letter in many East European languages, is considered a distinct letter in Czech, Estonian, Slovakian and Slovenian, but not in Latvian and Lithuanian. As long as REDIRECTs are done, there is no problem with either policy being chosen, but this issue should be remembered if WP wants to provide non-linguistically restricted search indexes. Then, the conformity of page titles to a standard about the use of the available character set may matter.
  • The Zurich/Zürich example is not really good, I’m afraid, because ‘Zurich’ without Umlaut could easily be considered an exonym; it is used by the city itself on its own website: watch http://www.zuerich.ch/, see the name in large characters, choose English on the top right, and compare! So, although I might have been seen as a supporter of a larger use of original diacritics, I would not object at all about reverting the present ‘Zürich’ page to ‘Zurich’. Anyway, if the discussion goes longer, I’d advise against using Zurich as an example.
  • About the ‘ß’ character, I’ve inserted a complement directly into the section in which it was commented, in reply to ‘Anticipation [...]’ and ‘Kotniski’ (section ‘We need diacritics [...]’). Clpda (talk) 13:08, 30 June 2008 (UTC)
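As a rough illustration of the indexing point above (a minimal sketch in Python, not a description of how Google or the WP search actually works): an index that folds letters by Unicode canonical decomposition treats ‘Zürich’ and ‘Zurich’ as the same key, whereas ‘Ł’ and ‘ß’ have no canonical decomposition and survive the folding. The function name strip_combining_marks is an assumption made for this example.

```python
import unicodedata


def strip_combining_marks(text: str) -> str:
    """Decompose to NFD, then drop combining marks (accents, carons, umlauts)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Letters built from a base letter plus a combining mark fold to the base letter.
print(strip_combining_marks("Zürich"))      # Zurich
print(strip_combining_marks("Tomáš Šmíd"))  # Tomas Smid

# Letters without a canonical decomposition are left untouched.
print(strip_combining_marks("Łódź"))        # Łodz
print(strip_combining_marks("Meißen"))      # Meißen
```

An index built this way would match ‘Tomas Smid’ to ‘Tomáš Šmíd’ without any redirect, but would still need an explicit rule (or a redirect) for ‘Lodz’ and ‘Meissen’.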
Err, yeah, Zurich is a strange example, because they use a transliteration for websites as Zuerich, which isn't the native spelling or the standard English spelling. This entire discussion (most recently) started based on some proposals to move a bunch of tennis players (e.g. Tomáš Šmíd and I forget where the diacritics are in that name) from names with diacritics to undiacriticized (neologisms are cheap) versions, so they're a fair set of examples to use. There's a clear (but not universal) preference in English publications to strip them, notably in the English websites for the major tournaments, and a move was proposed based on WP:UE. Somedumbyankee (talk) 18:22, 30 June 2008 (UTC)

Treating these as four (four and a half?) problems

Potential cases where an English name and the native name are spelled differently:

  • First, spellings where the only difference is presence or absence of a diacritic: Tomáš Šmíd vs. Tomas Smid.
  • First.1, quasi-diacritics such as ñ which the native language treats as a separate character but function like a diacritic and are printed as the "base character" when anglicized: El Niño vs. El Nino (not a great example, the tilde-less version is extremely rare).
  • Second, spellings that use a diacritic where common anglicized spellings add characters: Tuđman vs. Tudjman.
  • Third, spellings that use one or more characters that are incomprehensible to monoglot English speakers: Alþing vs. Althing.
  • Fourth, spellings that consist wholly of characters (or otherwise) that are incomprehensible to monoglot English speakers: these have always been transliterated into anglicized spellings and are not contested.
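
The cases in the list above can be sketched mechanically, at least roughly. The following is only an illustration in Python under stated assumptions: the two lookup tables are hypothetical and hand-picked from the examples in this discussion, the "First.1" quasi-diacritic case is folded into the first case, and the line between "second" and "third" is a judgment call that no code can settle.

```python
import unicodedata

# Hypothetical, hand-picked tables for illustration only (not any WP policy).
RESPELLED = {"đ", "Đ"}        # usually anglicized by adding characters (dj)
UNFAMILIAR = {"þ", "Þ", "ß"}  # assumed opaque to monoglot English readers


def strip_marks(text: str) -> str:
    """Decompose to NFD and drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def is_latin(ch: str) -> bool:
    """True if the character's Unicode name marks it as a Latin letter."""
    return unicodedata.name(ch, "").startswith("LATIN")


def classify(native: str) -> str:
    """Assign a native spelling to one of the cases listed above."""
    letters = [c for c in native if c.isalpha()]
    if letters and not any(is_latin(c) for c in letters):
        return "fourth"  # wholly non-Latin script
    if any(c in UNFAMILIAR for c in native):
        return "third"
    if any(c in RESPELLED for c in native):
        return "second"
    if strip_marks(native) != native:
        return "first"   # differs only by removable diacritics
    return "none"        # plain unaccented Latin letters


for name in ["Tomáš Šmíd", "El Niño", "Tuđman", "Alþing", "Москва", "Zurich"]:
    print(name, "->", classify(name))
```

The point of the sketch is only that the first and fourth cases can be detected from the characters alone, while the second and third depend on a list that somebody has to agree on.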

Pudeo's proposal handles the first case in a fashion that hasn't encountered any opposition so far, and the fourth case is not controversial, so let's talk about the second and third cases. I would prefer to retain existing WP:UE standards on these, resulting in Tudjman and Althing. Somedumbyankee (talk) 22:25, 30 June 2008 (UTC)

Sorry, but how would Pudeo's proposal handle the first case? If he is proposing that "Tomáš Šmíd" be used even when reliable English-language sources use "Tomas Smid," then I am completely opposed to that proposal. Tennis expert (talk) 04:45, 1 July 2008 (UTC)
All the problems disappear if one uses the current guideline WP:UE, which is largely based on the policies WP:NC and WP:V. In the first case use the most common English version and if there is not one, use the last stable version used in the article. --Philip Baird Shearer (talk) 07:18, 1 July 2008 (UTC)
I guess the question is ultimately whether it's worth the blood, sweat, and tears to move all of these articles that are currently at the wrong location per current policy when, as has been pointed out, there isn't much of a barrier to comprehension. The reality is that the current policy is more honored in the breach than the observance. I'd really rather have the pages be consistent, but if there really is no meaningful barrier to comprehension, I don't see how it's that different from the American/British spelling difference. I'd prefer to have them meet the current WP:UE standard as well, but current reality is more of a status quo than current policy. Somedumbyankee (talk) 14:45, 1 July 2008 (UTC)
All the problems will disappear if we show a bit of linguistic tolerance. Even if we strictly follow the rule about "common English nonsense" we will have problems, because if we want to explore a certain topic we should know whether we are talking about vegetables or movies.--Áñtò | Ãňţõ (talk) 19:22, 1 July 2008 (UTC)
The most intolerant people here are the people who can't tolerate the titles as they are in most cases, and must change them to another form despite the fact that's not the English form.--Prosfilaes (talk) 18:54, 2 July 2008 (UTC)
Bingo! Here we go! 99% of the articles related to issues from non-anglophone countries use diacritics! The same thing applies to all other wikis in Latin script. So, instead of inventing some non-existing "rules", pay attention to the accuracy of information, including the names, for the most reliable sources are the persons themselves!--Áñtò | Ãňţõ (talk) 16:23, 3 July 2008 (UTC)
There are customs on English use, however, and while they aren't absolute they do exist. Since they aren't documented, we're left with following what custom has laid out for us in reliable sources. Self-identifying usage has its own problems (mainstream use or what he calls himself)? Regardless, in the debate between what should be and what is, Wikipedia must stick with what is. Should English language publications use diacritics? We don't care. All that matters is whether they do. Somedumbyankee (talk) 16:47, 3 July 2008 (UTC)
Customs are not laws/rules! Instead of inventing new "rules", pay attention to the accuracy of the information. If some person changes his name from A.A. to B.B. then he is B.B. (born A.A.) -end of discussion! That is the rule for all other wikis. --Áñtò | Ãňţõ (talk) 11:23, 5 July 2008 (UTC)
Your stance about English use against original/normalized forms remains problematic. Imagine a writer/journalist writing a paper about a German guy named Georg Busch. S/he decides to anglicize this name to ‘George Bush’ because a) ‘Georg’ is non-English – translations or transpositions of first names are not uncommon – and b) the ‘c’ in ‘Busch’ is totally unneeded in English. S/he probably has some ulterior motive about media coverage and visibility, e.g. a line on the disambiguation page would be needed, but said writer/journalist dutifully adds ‘(from Germany)’ for disambiguation purposes. The initial paper can be considered a reliable source, since the facts mentioned can be retraced in German sources. English-speaking media commenting on the initial paper take/cite the name as used and the form ‘George Bush (from Germany)’ becomes the generally used and recognized form in English, maybe just because one person wanted to boost his/her paper. According to your criteria, there is no reason to reject the form ‘George Bush (from Germany)’: it is English and it is used in most reliable English sources. Would you feel comfortable in this scenario? If not, why would removing a letter (‘c’) be more serious than removing a diacritic, remembering that what we call a diacritic may constitute a separate letter in other languages (just refer to the example higher up where 'Q' could be an 'O' with diacritic)? Clpda (talk) 19:40, 4 July 2008 (UTC)
English has always done this, and English speakers are used to names being spelled differently in English, even when there isn't much logical reason to do so. The problem is when English speakers wouldn't recognize Georg Busch and assume that the article is about someone else. Take, for example, Cristoforo Colombo, the man's probable birth name, or Cristóbal Colón, likely what his patrons called him. Who is this guy? Most English speakers wouldn't recognize him by those names, though I'm almost certain they know who he is. That's why the criterion for WP:NC is "most recognizable", not official name or self-identifying name. It's not a big deal, and I think that trying to move articles based on rigid interpretations of guidelines is a waste of time when there are already redirects and such, but new articles should be placed not where they "should be" but where people will find them. SDY (talk) 20:44, 4 July 2008 (UTC)
I see your point and do not basically disagree but since there are redirects that would lead from all possible forms which people would enter, shouldn’t the title page reflect some standard, i.e. with most accurate information? Think also about interoperability. Clpda (talk) 21:07, 4 July 2008 (UTC)
There is a standard (most common use), it's just that some people don't like it. SDY (talk) 21:23, 4 July 2008 (UTC)
I wouldn't call that a standard! A standard requires rules - there aren't any, as you said yourself earlier - or statistics that would allow one to recognize the 'most common use' - and there aren't any either (Google having already been denied this role above) - so, it's only bias. Clpda (talk) 22:11, 4 July 2008 (UTC)
There are caveats to the guideline for when there isn't an obvious common use, and there are clearly rules (e.g. "use the spelling that the article's sources use"). It isn't prescriptive, nor should it be. I don't see how it's biased when it's essentially just "the common spelling of the name is a fact, follow the same rules you would follow for other facts." SDY (talk) 22:22, 4 July 2008 (UTC)
Err, where does the citation "the common spelling of the name is a fact, follow the same rules you would follow for other facts" come from ? I couldn't see it on the page you cited. Clpda (talk) 22:48, 4 July 2008 (UTC)

(dropping indents) The phrase you're quoting is just my synthesis of the topic, but the argument is generally that the current WP:NC convention meets both WP:NPOV and WP:V because it defers to sources instead of relying on editors.

"Wikipedia does not decide what characters are to be used in the name of an article's subject; English usage does. Wikipedia has no rule that titles must be written in certain characters, or that certain characters may not be used. Versions of a name which differ only in the use or non-use of modified letters should be treated like any other versions: Follow the general usage in English verifiable reliable sources in each case, whatever characters may or may not be used in them."

is the relevant section. SDY (talk) 23:01, 4 July 2008 (UTC)

I couldn't find that quote in the pages you referenced but that doesn't matter too much to me anymore - see also below. Clpda (talk) 00:08, 5 July 2008 (UTC)
So you say the press decides what names should be used, and they can change someone's personal name? Thank god we have a regulator in Finnish so tabloids don't dominate language usage here. :) Anyone can point to several media websites that don't use diacritics. But we can also point to legal documents which use diacritics. In the case of Räikkönen, several news websites call him 'Raikkonen'. Legal documents don't, and authoritative sources such as the FIA (governing F1) use diacritics everywhere in English [5]. I don't see anyone changing that by giving us a link to The Sun or New York Post. --Pudeo 14:09, 4 July 2008 (UTC)
In that case, retaining the diacritics there seems very reasonable. In the case that brought up this particular question, there was overwhelming evidence against their use. Milosevic has overwhelming evidence against use of diacritics. I don't think that many people have problems with using them when they are commonly used, it's just cases where they are almost never used in English publications and someone insists that they be included. SDY (talk) 16:53, 4 July 2008 (UTC)
Another issue is between ‘frequency of use’ vs. ‘reliability’. About the example above:
  • In the Library of Congress Subject Headings, a source that I wouldn’t expect you to deem unreliable, diacritics are retained; and no way of saying that the LoC is elitist or radical about original forms, the popular ‘Hutchinson Encyclopedia 2000’ does retain diacritics as well.
  • [Later addition from Clpda (talk) 22:24, 4 July 2008 (UTC)] The Spanish WP - a language for which you announce a good level of knowledge - keeps diacritics as well [for Milošević]. Both letter+diacritic combinations are absent from that language, i.e. hacek totally unused, acute accent used on vowels only. Doesn't this show a different perspective?
  • Inputting diacritics used to mean some technical difficulties (and partly still does), as has already been mentioned on this page;
  • Having that name spelt ‘Milosevic’ is actually an accident of history: he was a Serb; Serbian is mostly written in the Cyrillic script and his name would then have needed to be transliterated into English as ‘Miloshevich’. We ‘inherited’ a downgraded Latin version only because he was spoken about at the time of former Yugoslavia, where both scripts were equally in use. It is legitimate to keep the Latin script but not the downgraded version.
Clpda (talk) 21:07, 4 July 2008 (UTC)
The LoC is a reliable source, I agree. The Spanish Wikipedia has its own rules, and there is a rule-making body for Spanish, so the rules are probably substantially different. Inputting diacritics isn't prohibited, but it's still not particularly convenient for many users (hunt and peck on the editing page). English has always butchered the spelling of foreign names when anglicizing them, and those changed spellings are the accepted form (q.v. Columbus). It's exactly the same problem as Munich vs. München vs. مونی: English, like Farsi, spells it differently. It's not "wrong", it's just "how it is spelled in English." SDY (talk) 22:38, 4 July 2008 (UTC)
You did not convince me against a general adoption of foreign diacritics but I give up. The pages I intend to update on the English WP are not at all controversial on the diacritics issue. I entered this debate only because of my interest and knowledge about diacritics use - not that much in display but in indexing, processing and access. This input having been ignored or unwelcomed, I'm leaving the scene now, just keeping an eye on what is discussed and maybe coming back only to counter blatantly wrong statements. Clpda (talk) 00:08, 5 July 2008 (UTC)
Sadly, the non-arguments against diacritic use seem to have precisely that end effect on most people who get involved. I have been following the discussion but have been less involved as my efforts are needed more in article space. Agreed in general with the issues you have raised though (as would be mostly obvious from my own input at various stages and places.) Orderinchaos 01:42, 5 July 2008 (UTC)

Individual project exemptions

I see the discussion above, going back and forth on whether WikiProjects should be able to have exceptions. I propose a new idea: individual projects may have exceptions, subject to individual approval. For example: Wikipedia:WikiProject Hawaii works with specialised diacritics, using macrons and okinas that aren't much used outside of Hawaii, and not many other projects are affected by their use or non-use, especially with okinas. If a version of this proposed policy becomes official, wouldn't it be reasonable to tell the Hawaii project that they would be the ultimate authority on the use of Hawaiian (or Hawaiʻian :-) diacritics in a Hawaiian/Hawaiʻian context? Nyttend (talk) 14:29, 3 July 2008 (UTC)

The method for doing this is pretty simple, actually, since there is a process under WP:MOS for having project-specific style guides. The more vicious arguments in this discussion mostly circle around the title of the article itself, which some see as a matter of policy and not a question of style. Somedumbyankee (talk) 16:22, 3 July 2008 (UTC)

Closing this discussion

It appears that no one has convinced anyone of anything and there is no consensus to adopt this, so does anyone mind if I mark this proposed policy as rejected? SDY (talk) 02:07, 5 July 2008 (UTC)

Not so much rejected, as failed to gain consensus. But you're right, it doesn't look like anything's going to be solved by continuing this discussion (though the problem remains).--Kotniski (talk) 08:08, 5 July 2008 (UTC)