Wikipedia talk:Vernacular scripts

From Wikipedia, the free encyclopedia

This discussion started at Wikipedia:Village pump (policy). Please see

utcursch | talk 11:15, 30 November 2006 (UTC)

  • It would help if you explained on the page here what you mean by a "vernacular script". (Radiant) 14:31, 30 November 2006 (UTC)
Vernacular refers to any local or native language of a region. In the case of India, vernacular scripts are the non-Roman scripts used for Tamil, Kannada, Malayalam, Hindi, Bengali, etc. - Parthi talk/contribs 01:53, 1 December 2006 (UTC)
Why is this India-centered? I thought this was for all of Wikipedia.--D-Boy 02:06, 1 December 2006 (UTC)
This can be a global policy. The Indian scripts are just used as an example. - Parthi talk/contribs 02:23, 1 December 2006 (UTC)
So is this global or not global? Please state.--D-Boy 06:17, 1 December 2006 (UTC)
I haven't read all the discussion on this page, but I feel we shouldn't make it global yet. We should just discuss it from the perspective of Indian articles, and once we arrive at a consensus, we should invite other (non-Indian) editors to comment and see how we can weave a global policy out of whatever consensus we arrive at. To start the discussion by focusing on all of Wikipedia's articles will only lead to more chaos. Sarvagnya 07:27, 4 December 2006 (UTC)

Japanese script

Wikipedia:Manual of Style (Japan-related articles)#Using Japanese in the article body provides guidance for articles on topics related to Japan. The {{nihongo}} template is in widespread use in such articles. One feature of the template is that it includes a link to a Help article. The article Yokohama begins with the nihongo template (if you edit the article, you see it after the info box and the {{for}} template). Fg2 02:15, 1 December 2006 (UTC)


Since the only problems that have been raised are specific to the Indian situation, and since the issues involved vary enormously from one context to another, I suggest that this discussion be directed toward Wikipedia:Manual of Style (Indic-related articles), which could really use some attention, and maybe a name change.

I just can't see the value of any catch-all solution here. We've found the box solution to work well in Korea-related articles; on the other hand, in Japan-related articles (where typically only one script and romanization is needed), the vernacular script fits handily into the stream of article text. ...Although perhaps this page could serve to document the various ways that different communities on Wikipedia have resolved this issue in different contexts... -- Visviva 02:22, 1 December 2006 (UTC)

As Parthi said, this is a "global" policy. Just vote yes for scripts and it will make it all go away. ^_-.--D-Boy 02:32, 1 December 2006 (UTC)
Whether it's a catch-all solution or an India-specific policy, the reasons behind the proposal are genuine. The communal, religious and ethnic rivalry and unproductive edit wars that go on in so many India-related articles make WP look like a laughing stock. Imagine reading an article with five scripts in the first sentence plus IPA and IAST transliteration (as was the case with Carnatic music until recently). - Parthi talk/contribs 02:58, 1 December 2006 (UTC) (edit conflict) For an example of inappropriate inclusion of scripts, see Mahesh Bhupathi. For an example of an unreadable lead, see Rajnikanth. For an example of the stupid edit wars that happen, see Vidya Balan. Cheers Parthi talk/contribs 03:23, 1 December 2006 (UTC)
To my Koreanized eyes, it seems like an infobox is what you need (option 3), but that needs to be worked out with regard to the specific Indian situation(s). India's situation is even more unique than most; trying to create a general policy based on that is bound to result in something that is either dangerously permissive or destructively restrictive... -- Visviva 03:18, 1 December 2006 (UTC)
I could live with a box. It just depends on what's in the box. Urdu shouldn't be on half the articles. I always wanted a box for indic articles like the Korean one. It's so simple.--D-Boy 04:15, 1 December 2006 (UTC)
Articles dealing with Korea have a box with multiple scripts and romanizations, which is pretty close to the situation that articles on India face. A box might be right for India as for Korea. Differences between India and other places are specific to India and should not form the basis for a policy for other places. Situations vary so much from country to country that a worldwide policy seems unlikely to happen. I recommend discussing it within the community of editors of articles on India, as Visviva said. Fg2 03:43, 1 December 2006 (UTC)
I agree with Visviva in that a catch-all solution would not work very well. I also like the idea of this page documenting all of the different methods used for the various languages. For some, inline usage works really well (e.g. Japanese articles), and for others, a box would work better due to multiple scripts that would tend to clutter the intro. ···日本穣? · Talk to Nihonjoe 04:27, 1 December 2006 (UTC)

Good idea

This guideline proposal is a great idea. Personally I don't understand how the vernacular scripts help. Can someone please explain to me how they help? Who is the target audience? I am with proposal 4. The interwiki-links can do the same job in a much better way. Regards, Ganeshk (talk) 03:36, 1 December 2006 (UTC)

Please look at the proposal again. I have fleshed out the arguments.--D-Boy 07:30, 1 December 2006 (UTC)
As long as the article has interwiki-links, there is no special advantage in having vernacular scripts. This is the main intent of proposal #4. Utcursch has explained in the proposal with an example: Interlanguage links such as [[mr:आशा भोंसले]], [[fr:Asha Bhosle]], [[gu:આશા ભોંસલે]], [[hi:आशा भोंसले]] already serve the purpose of including vernacular scripts. - KNM Talk 04:17, 1 December 2006 (UTC)
If the East Asian articles have them, why can't we? This is hypocritical just because the articles are Indian.--D-Boy 06:01, 1 December 2006 (UTC)
The most important reason is verifiability; in many cases, people and topics from non-English-speaking countries and regions are documented largely (or exclusively) in the local language. Romanization is frequently problematic, so it is often difficult or impossible to reconstruct the original name if only a romanization is given. Also, many articles don't have interwikis, and even when they do the title of the linked article may not accurately reflect the native-language name (due to disambiguation, NPOV, etc.). More generally, on Wikipedia native-language names have traditionally been considered necessary for full, encyclopedic treatment of a topic. That is why we have Category:Lacking non-English text. Cheers, -- Visviva 07:07, 1 December 2006 (UTC)
I see. Thanks for the explanation. Regards, Ganeshk (talk) 07:12, 1 December 2006 (UTC)
That statement was amazing.--D-Boy 08:04, 1 December 2006 (UTC)
dB, are you referring to my response or Visviva's? I meant to say Visviva got me converted. I no longer believe that the scripts should be totally removed. I will go for a box like they did with Korea. -- Ganeshk (talk) 19:35, 1 December 2006 (UTC)
Yeah. It was Visviva's statement. It was beautiful.--D-Boy 19:51, 1 December 2006 (UTC)

In addition to verifiability, doesn't it make it easier to search for the articles in the first place? In particular, it will make Wikipedia articles more likely to come up in a third-party web search engine like Google when searching on native-script words.

Clarification on Satyameva Jayate

Hi Utcursch, Can you please elaborate on the special circumstances for Satyameva Jayate? Thanks Parthi talk/contribs 05:12, 1 December 2006 (UTC)

my people have no identity or unity...Can't believe there's a problem with even that.--D-Boy 05:22, 1 December 2006 (UTC)
What do you mean by that?? -Parthi talk/contribs 05:25, 1 December 2006 (UTC)
I mean my people have no identity or unity.--D-Boy 05:38, 1 December 2006 (UTC)

I regard it as a special case because it appears in Indic script on the emblem of India (see Image:Emblem of India.svg). utcursch | talk 12:32, 1 December 2006 (UTC)

Opinion/arguments from User:Ragib

Articles are for providing knowledge. IAST or ITRANS or IPA - these phonetic solutions do not work if you want to find out the native name of a person. Why? Because name spellings are fixed, but the same pronunciation can be written using different spellings. An example is Iajuddin Ahmed - his name can be written in Bengali script as ইয়াজুদ্দিন আহমেদ, ইয়াজুদ্দিন আহম্মদ, ইয়াজউদ্দিন আহম্মদ etc. However, he writes it using ইয়াজউদ্দিন আহম্মেদ only. So, the presence of the native script provides this small, but significant information.

Another example is Rabindranath Tagore. The Bengali pronunciation is different, and so is the spelling in Bangla (where it's written as রবীন্দ্রনাথ ঠাকুর, pronounced as Robindronaath Thhakur). You might argue that it is easy to spell out the original Bangla spelling using IPA; however, would someone be able to figure out that "Robi" is actually spelled as "Robee" (long e), due to spelling conventions?

I looked into many other articles which happily have this type of script, and it's not raising an eyebrow anywhere. For example, Aristotle and Plato display Greek script. Li Peng has Chinese script (yes, several different scripts). There doesn't seem to be any problem with those articles.

Time and again, I see the problems only in some Indian articles, involving *only* a particular group of editors. That is not a justification for removal of quite useful scripts from the rest of the biography pages.

10-15 characters per article is not a nuisance, nor a big deal. But these few characters carry a significant piece of info, if not the most important one - the person's name. I strongly support keeping the native script name of the person (not the "vernacular" which is ambiguous). A person's name is his/her identity, we gain nothing by removing it just to solve a few edit wars.

In the end, I'll argue for using the East Asian solution, i.e. boxes, as in Mao Zedong. That won't, however, stop the edit warriors, but that's a different problem. Thanks. --Ragib 06:53, 1 December 2006 (UTC)

8-)--D-Boy 07:13, 1 December 2006 (UTC)

Which scripts to include (Proposal three)

I have added one note to proposal three:

The purpose behind adding scripts is to help the user find relevant documentation. This should be kept in mind in deciding which scripts to add, and scripts shouldn't be added just because the subject is popular in that language.

Maybe someone whose English is better than mine can reword this. What I mean is that when we talk about people like A.M. Rajah, we should write the name in a box in Tamil, Telugu and Kannada scripts, because searching in each script will give you selections of his film songs in those languages. Similarly, for A.E. Manoharan the name should be in both Sinhala and Tamil, because you will find a lot of his songs on Sinhala sites. Geeta Dutt should for the same reason be in Bengali, Hindi and Gujarati, because she sang in all these languages (I think all three have different scripts). I don't know how this applies to Bollywood people in Hindi and Urdu, but it should be easy to decide whether it will help to find more information or not. -- Ponnampalam 12:19, 1 December 2006 (UTC)

Geeta Dutt in Bengali seems fine since she's a Bengali woman.--D-Boy 19:19, 1 December 2006 (UTC)
A great illustration of the absurdity of the need for native scripts. One needs to define the subject's ethnicity to define his or her native language. What if the subject is of mixed parentage? Take for example Nora Jones. She is the daughter of Ravi Shankar and Sue Jones, who is an American. Nora lives in the US and sings in English. But since her parentage is half Bengali, would you consider including Bengali script in that article? The same problem would apply to Zubin Mehta, Iftikhar Ali Khan Pataudi, and many more. The bone of contention is the actions of a number of editors diligently adding Hindi script to articles even remotely related to India, arguing that 'Hindi is the official language' of the Indian Government. I'm sorry, it all looks a bit silly to me. I see no need to label someone or something by their ethnicity, religion or race. - Parthi talk/contribs 23:29, 1 December 2006 (UTC) Look at Rekha for example. Why is there Hindi and Urdu in this article, but not Tamil (her ethnicity)? I'm really surprised that there hasn't been an edit war over that omission! - Parthi talk/contribs 23:31, 1 December 2006 (UTC)
You are right about Nora Jones, but these are exceptional cases. In many cases, the native language is quite obvious (such as William Shakespeare's being English, Rabindranath Tagore's being Bengali). The same goes for Uttam Kumar, Suchitra Sen, Amitabh Bachchan and most other people. Thanks. --Ragib 23:35, 1 December 2006 (UTC)
I respect your opinion Ragib, but my point is that such exceptional circumstances and the need felt by some to include Hindi scripts in all India related articles always lead to long-drawn edit wars. If we can find a via media for this and come up with an acceptable solution such as restricting them in an infobox, I will support option 3. Regards Parthi talk/contribs 00:06, 2 December 2006 (UTC)
I have redesigned the Carnatic music infobox to include the vernacular scripts instead of the lead para. Please take a look: Carnatic music. - — Preceding unsigned comment added by Venu62 (talkcontribs) Parthi talk/contribs 04:05, 2 December 2006 (UTC)
If you're worried about Hindi and that's your main complaint, don't worry. I don't even want Hindi used everywhere. It shouldn't be.--D-Boy 04:03, 2 December 2006 (UTC)
I'm not worried about Hindi. I'm worried about the useless edit wars created by these scripts that waste the time of a lot of useful editors. - Parthi talk/contribs 04:11, 2 December 2006 (UTC)
If they can get all the scripts onto a page like Japanese invasions of Korea (1592-1598), I'm sure we can manage.--D-Boy 05:13, 2 December 2006 (UTC)

Song lyrics

I don't know if the Indian writers want to use these same guidelines or have different ones to decide what script the complete lyrics of songs should be written in. Because some people have had fights about this, maybe we should decide this also? -- Ponnampalam 12:19, 1 December 2006 (UTC)

Surely the more important question is whether articles should include complete song lyrics at all? That seems to raise issues with WP:NOT, to say nothing of the matter of copyright... Perhaps I'm misunderstanding, though. Can you provide an example? -- Visviva 03:09, 2 December 2006 (UTC)
The issue seems to have died down now, so maybe we can leave it at that. -- Ponnampalam 12:39, 3 December 2006 (UTC)

Indian name table?

Test. OK, let me preface this by saying that I am completely ignorant of the India-specific issues at play here. However, since there seems to be a fair amount of interest in an infobox solution, I've borrowed the very nice formatting from {{Carnatic}} and mocked up a general-purpose infobox at User:Visviva/Test. See that page for basic documentation, and the talk page for some miscellaneous examples. I've applied our experience from the Korean and Chinese infoboxes: that arguments should be optional, that it should be possible to request help, that {{lang}} is a wonderful thing, that images should have a second colored bar between the caption & the names, etc.

All arguments are optional, and most of the languages are in alphabetical order (except that Hindi and Sanskrit currently display at the top, in that order). Don't know if that's appropriate -- perhaps everything should be in alphabetical order? Please be advised that making the display order flexible would be a Humongous Pain. Anyway, feel free to adapt this as needed, or not to use it at all. :-) Cheers, -- Visviva 15:26, 2 December 2006 (UTC)

You should post that on Ganeshk's talk page. He's trying to create one.--D-Boy 16:28, 2 December 2006 (UTC)
I feel implementing the table the Korean way will not work. An Indian table cannot have a standard set of languages, and not all articles need to have all the languages. I see the template adding the "Lacking script" category when Bengali is not present. This will not work in the Indian case. Bengali script does not apply, for example, to a South Indian article. This will lead to edit wars about which language is listed and which is not. For example, Gujarathi and Marathi are important languages that are missing from your table. Instead, I was looking for the template to have generic placeholders for up to 5 languages. The template should accept the name of the language (as placeholder #1) and the script of the respective language (as placeholder #2). If the language name is present but the script is not, then the template should add the article to the respective category. Regards, Ganeshk (talk) 17:01, 2 December 2006 (UTC)
I see your point... but just to clarify, in this table *none* of the languages will display unless specified, and the "lacking Foovian text" cat will only appear if the page author has specifically set "Foovian=!", thus explicitly requesting that language's text. (Otherwise this very page would be in Category:Lacking Bengali text). It would be fairly easy to add (an arbitrarily large number of) languages to the template, since the code for each line is basically modular; just copy, paste, and swap in the language name, variable name and ISO code...I'll add Gujarathi and Marathi by way of illustration. However, I can see where the solution you describe might work better; I guess it depends to some extent on whether the languages potentially involved form a closed set of manageable size or not. Cheers, -- Visviva 01:33, 3 December 2006 (UTC)
Another thing that occurs to me: language-coding can't be automated if the languages are user-defined. That may not be a deal-breaker, but it would be problematic, given the gains in accessibility that language-coding allows. -- Visviva 03:35, 3 December 2006 (UTC)
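The argument handling Visviva describes (a language row appears only when its argument is given, and the "Lacking Foovian text" category is added only when that argument is explicitly set to `!`) can be modelled outside MediaWiki. The following is a minimal Python sketch under those assumptions, not the template's actual wikitext; the `LANGUAGES` table, the parameter names, the function `render_rows` and the output strings are all hypothetical. It also shows why fixed rows make language-coding automatic: each row carries its own ISO code.

```python
# Hypothetical Python model of the proposed infobox's argument handling.
# This is NOT the template's wikitext; names and strings are illustrative only.

# (parameter name, display name, ISO 639-1 code); Hindi and Sanskrit first,
# the rest alphabetical, mirroring the display order Visviva describes.
LANGUAGES = [
    ("hindi", "Hindi", "hi"),
    ("sanskrit", "Sanskrit", "sa"),
    ("bengali", "Bengali", "bn"),
    ("gujarati", "Gujarati", "gu"),
    ("marathi", "Marathi", "mr"),
    ("tamil", "Tamil", "ta"),
]


def render_rows(args: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (rows to display, tracking categories) for the given arguments."""
    rows: list[str] = []
    categories: list[str] = []
    for param, name, code in LANGUAGES:
        value = args.get(param, "").strip()
        if not value:
            continue  # unspecified language: nothing shown, no category added
        if value == "!":
            # the editor explicitly requested this language's text
            categories.append(f"Category:Lacking {name} text")
            continue
        # each fixed row carries its own language code, so tagging is automatic
        rows.append(f'{name}: <span lang="{code}">{value}</span>')
    return rows, categories
```

For example, `render_rows({"hindi": "आशा भोंसले", "bengali": "!"})` yields a single Hindi row plus `Category:Lacking Bengali text`, while an empty argument set yields no rows and no categories. Ganeshk's alternative (free-form language-name and script placeholders) would drop the fixed `LANGUAGES` table, which is exactly where the automatic language code would be lost.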

Use of Option 4, globally

I feel option 4 summarises everything quite succinctly. We can do away with vernacular scripts except in cases where they are really needed (which can be determined through consensus on a case-by-case basis). I agree that even in Korea- or Japan-related articles, such practice provides context. However, to a non-local reader like me, for example, it doesn't do any good. Think of me as the reader. I don't understand Korean (or a variation of Korean), Japanese, Chinese, any of the 20-odd other Indian languages (other than my native language), or Greek. In that case, to put it simply, if I can't understand the language, I can't understand what context it relates to either.

The main aim of an encyclopaedia is to convey information neutrally and in an accessible way to all readers. Use of vernacular scripts does no good in that case. Let's accept it. The number of people who don't speak Korean or Japanese or Hindi outnumbers the number of people who speak those languages. Besides, as I mentioned earlier, this is the English Wikipedia.

People give a lot of examples: it is used here, it is used there, so why don't we use it as well? Let's think the other way. We find the use of scripts in Indian languages crappy and annoying when they are used unnecessarily; we've experienced it first hand. And we know that for a person who doesn't know the language, whatever that script may mean and whatever context it may provide, it isn't of any use. Interlanguage links can help, and the Wikipedia in that particular language can give whatever variations in script belong to that language if needed, say, in Korean. If there are two different Korean or Japanese ways of expressing something, let that find its place in the Korean Wikipedia and the Japanese Wikipedia. Maybe we can cite or explain (using an infobox at the bottom) in the English Wikipedia that this word can mean "this" or "that" in English! That would help much more than placing the exact scripts, which most other people in the world can't understand!! -- Chez (Discuss / Email) 21:25, 2 December 2006 (UTC)

It's not like scripts are taking up the entire page. And only some Bollywood articles seem to have a problem. Also, Wikipedia is for spreading free knowledge. Having the scripts is like an Easter egg on a DVD: you get even more, and you increase knowledge of the article's origin, especially if it's not native to English. Not having the scripts in the article would cheat the user. He's not getting the knowledge, and you're trying to withhold it because of a minor edit war, which is insignificant compared to the vandalism that occurs on President Bush's article. And I think Vis's argument just kills yours. It was stated too beautifully.--D-Boy 21:44, 2 December 2006 (UTC)
Also, trying to remove the scripts globally will be a bigger edit war than just trying to remove Indic ones. You'll never get that through.--D-Boy 21:44, 2 December 2006 (UTC)
D-Boy, removing the scripts globally will not be an edit war if a consensus is established. I agree that there may not be interwiki-linked articles available to say what the vernacular scripts say, as Visviva said. That is why I stress that the different contexts may be explained in English, so that everyone can understand them. I'm a reader first and an editor second. And like any other reader, I would like to know what it means in English, in the English Wikipedia, rather than learn hundreds of vernacular scripts and their related pronunciations. Vernacular scripts are simply POV pushing, as they give the article an owning stance and nothing more. All along it has been your view that only Bollywood articles are subject to vernacular scripts. Check out Chalukya Dynasty. It actually expanded up to Maharashtra. In that case, if I were on the other side of this discussion, I'd argue that Marathi and Hindi be included as well. The same is true of a number of other articles. And it is not a case of "if East Asians can do it, we can also do it." We don't know what those letters or characters in vernacular scripts mean, and it is practically impossible for everyone to learn every single vernacular script. Unless the English Wikipedia decides to explain it in English, every vernacular script that stands for context can go. On the other hand, if pronunciation is the problem, let's stick to standard IPA. IPA is independent of language. I'd like to reiterate that I support Proposal 4.
IPA may be independent of language, but it is for pronunciation, not for spelling. A person's name may be pronounced in only one way, but the same pronunciation can be written in various ways. In the examples I have shown above (and elsewhere), the native script provides the spelling used by the person/others, while IPA could only provide the pronunciation. Why we need native language spelling has been explained by others above. So, this argument of IPA being independent of language is not entirely applicable here. Besides, it is not always correct ... Kajol is pronounced differently in Hindi and Bengali languages. Thanks. --Ragib 06:18, 3 December 2006 (UTC)
A name like Thamilselvan can be pronounced in four or five different ways in Tamil itself. Tamil Nadu people pronounce it differently from us. This is also true of names in European languages. If the pronunciation is to be conveyed, it should be how the person himself would pronounce the name. So I don't understand what you mean about Kajol's name having different pronunciations in Hindi and Bengali. Her own pronunciation is correct; the others are wrong. Why is it useful to know, in an English-language Wikipedia, what the native-language spelling of a name is? And even if this is necessary information, I think that the ISO transliteration systems give that information more usefully than the native script does, because they will be accessible to more people. I and most others can't read Russian or Hindi or Bengali, so spelling in that script is meaningless, but a proper transliteration to Latin according to a standard system will be more easily accessible. -- Ponnampalam 12:50, 3 December 2006 (UTC)
The idea that this can't get through 1.5 million articles doesn't look into the future. Cleaning up 1.5 million articles may be tough, but cleaning up 10 million articles in future will be tougher. This proposal aims at stopping further use in those 8.5 million articles and also cleaning this 1.5 million articles. It can be done if we get a consensus here. -- Chez (Discuss / Email) 01:44, 3 December 2006 (UTC)
Chalukya dynasty doesn't have any problems. You're just trying to make it an issue. It definitely was one of the great southern dynasties such as the chola. I see no need to add maratha or hindi there. I disagree with your proposal and I think you are providing less info to the user if you remove the scripts.--D-Boy 01:48, 3 December 2006 (UTC)
That is what I'm trying to make clear. If we want we can simply make an issue out of nothing. Anyone can argue like that for any related article, where POV can be pushed to include such scripts. The solution is to remove such vernacular scripts completely. Alternatively, more info can be provided in English, if the community feels that removing vernacular scripts deprive the reader of important info. -- Chez (Discuss / Email) 02:35, 3 December 2006 (UTC)
I see no problem except with the Bollywood and Bollywood bio articles. Those are the only problems, nothing else. This is where the whole argument stemmed from. Those are easily solvable on a case-by-case basis.--D-Boy 03:16, 3 December 2006 (UTC)
I agree with Proposal 4! Just don't include the scripts, since they're creating so many problems. I don't think that pages dedicated to actors need scripts. As proposal 4 says, interlanguage links have already been provided, so there's really no need for scripts. -- Hariharan91 07:06, 5 December 2006 (UTC)

A global approach

I'm not sure if it's feasible, but I agree that it would be nice to have a general Wikipedia-wide standard for the exclusion and inclusion of non-English name information. I wonder if an acceptable set of rules might go something like this:

  1. For people:
    1. Yes: The person's native-language name(s), if known, should always be included.
    2. Maybe: If the person has self-identified (written, spoken, etc.) extensively in another language, it *may* be reasonable to include the other-language names, particularly if they are substantially different (not a simple transliteration of the English or native name). This aids in verification, adds encyclopedic content, and helps to prevent article duplication. This can be compared to the inclusion of pen names. Edit-warring is not acceptable.
    3. No: Generally, all other language names should be excluded. Even if Belgian performer X is wildly popular in Mongolia, it is still not appropriate to include X's Mongolian name in the article, except perhaps in a section specifically devoted to X's popularity in Mongolia.
  2. For things:
    1. Yes: If the article subject is culturally tied to a specific language community(ies), its name(s) in that language community should be included. For example, the Korean name of kimchi should be included in that article.
    2. Maybe: If the article subject has become deeply rooted in multiple language communities, such that it can be considered a part of the associated culture, it may be appropriate to include those names. Example: Thousand Character Classic, Chopsticks. Edit-warring is not acceptable.
    3. No: Generally, all other names should be excluded.
  3. For places:
    1. Yes: The place's official name in the language(s) of the country(ies) in which it is located should always be included.
    2. Yes: If jurisdiction is subject to an active international dispute, the official name given by all parties should be included. Example: Dokdo.
    3. No: All other names should be excluded, although names of historical interest may be included in the appropriate section or sub-article.
  4. Display:
    1. If the total number of language representations (native scripts + romanizations) is less than 4, it is probably best to use an in-line template, as part of the article lead. Examples (where only one language is in play): Japanese, Russian, Arabic, Greek, all languages using the Roman alphabet.
    2. If the total number is 4 or more, it is probably best to use an infobox, placed to the right of the article lead. Examples: Korean, Chinese(?), many Indic situations.
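Rule 4 is effectively a threshold test on the number of language representations. A minimal Python sketch of it follows; the threshold of 4 comes from the proposal, while the function name `choose_display` and its list inputs are hypothetical, and the rule itself is offered above only as a rough "probably best" guideline.

```python
# Hypothetical sketch of display rule 4; the threshold comes from the proposal,
# the function name and inputs are illustrative only.
INFOBOX_THRESHOLD = 4


def choose_display(native_scripts: list[str], romanizations: list[str]) -> str:
    """Return "inline" for fewer than 4 representations, otherwise "infobox"."""
    total = len(native_scripts) + len(romanizations)
    return "inline" if total < INFOBOX_THRESHOLD else "infobox"


# Japanese: one native form plus one romanization -> 2 representations
print(choose_display(["横浜"], ["Yokohama"]))  # inline
# Korean: Hangul, Hanja, Revised Romanization, McCune-Reischauer -> 4
print(choose_display(["Hangul", "Hanja"], ["Revised", "McCune-Reischauer"]))  # infobox
```

The two calls mirror the examples in the list: a typical Japanese-related article stays inline, while the Korean pattern of two scripts and two romanizations reaches the infobox threshold.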

This is more or less off the top of my head, and I'm not at all attached to it, but I think it broadly corresponds to the way we've already been handling these issues across Wikipedia.

At any rate, I have to agree with the editors above (notably Dangerous-Boy) that the global removal of non-English text will not gain consensus on Wikipedia. *I* certainly would never join such a consensus, since I have added Korean name information to thousands of articles, and know many other editors who have also worked very hard to ensure that such information was present, accurate, and appropriately presented. None of us would have gone to such trouble if we didn't think it was imperative that Wikipedia provide such information.

Here's a crazy thought: Looking through the list of ISO 639-2 codes, I suspect that it would be possible to create a *single* standard infobox that would represent all of these languages, and probably any others that we might deem necessary. This could permanently put the kibosh on the silly edit wars over language precedence that afflict us from time to time; languages could simply be ordered by ISO code, just as the interwiki links are now. It would also provide a welcome level of uniformity across languages, which would make this information somewhat more accessible to general users. There shouldn't be any performance issues... Mediawiki's pre-processing limit for transclusion, if I recall, is 2 megabytes, and an all-languages template, even including multiple writing systems, would surely not exceed 30K. Just an idea... -- Visviva 04:14, 3 December 2006 (UTC)
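The ordering idea in this suggestion (rows sorted by ISO 639-2 code rather than by whichever editor added them first) can be sketched in a few lines of Python. This is a minimal illustration under that assumption only; the function `ordered_rows` and its input format are hypothetical, and a real template would implement the order in wikitext, not Python.

```python
# Hypothetical sketch: rows of an all-languages name box ordered by ISO 639-2
# code, per the suggestion to fix language precedence by code, so that row
# order never depends on the order in which editors added languages.
def ordered_rows(names: dict[str, str]) -> list[str]:
    """Return 'code: name' rows sorted by ISO 639-2 code, skipping empty values."""
    return [f"{code}: {text}" for code, text in sorted(names.items()) if text.strip()]


rows = ordered_rows({"mar": "आशा भोंसले", "hin": "आशा भोंसले", "guj": "આશા ભોંસલે"})
print(rows)  # ['guj: આશા ભોંસલે', 'hin: आशा भोंसले', 'mar: आशा भोंसले']
```

Because the output depends only on the codes, reordering the input (the usual subject of precedence edit wars) cannot change what readers see.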

you forgot Four Heavenly Kings!--D-Boy 05:32, 3 December 2006 (UTC)
Gaaah! That table makes me want to reconsider the whole thing. ;-) But yes, that would fall under criterion 2.2, as something with deep cultural significance across multiple languages. -- Visviva 07:26, 3 December 2006 (UTC)
Well written, Visviva. I still believe that it's better to do away with vernacular scripts. But as others have pointed out, that will never gain consensus, because many users have put these scripts into thousands of articles. utcursch | talk 06:51, 3 December 2006 (UTC)
I'm inclined to go with a set of well-defined ground rules for entering vernacular scripts on articles pertaining to people, places, ethnicities, literature, and arts. I support the box/table idea. On India-related matters, a box with the relevant languages needs to be there. The text should be restricted to official languages only (i.e. these). When necessary, we can make them degenerate: when Marathi and Hindi, which use almost the same script, spell a name the same way, we can put Hindi/Marathi = <whatever>. Only appropriate scripts need be used (I assume this is where the gray area lies). For instance, there is no need to put Urdu everywhere, only where appropriate, such as articles on native Urdu speakers or Urdu literature (also historical articles on the Mughals etc. and Pakistani topics). Similarly, there is no need for Tamil anywhere except on subjects pertaining to Tamil Nadu or Tamil people (native or diaspora, Indian or Sinhala). All India-related articles need to have Hindi (even if the subject is connected to Pakistan as well), as Hindi is the official language of the union government. Likewise, all Pakistan-related articles need to have Urdu (even if the subject is connected to India as well), as Urdu is the national language of Pakistan. Generally, Hindi and Urdu are necessary for subjects connected to both countries. Articles related to Sindh (Pakistan) can have Urdu and Sindhi, Pakistani Punjab can have Urdu and Punjabi, Indian Punjab Punjabi and Hindi, etc. I am against the total removal of vernacular scripts. Any disputes over the appropriateness of including a language in a box can be resolved on a case-by-case basis whenever the ground rules produce ambiguities. Hkelkar 21:29, 3 December 2006 (UTC)
I don't see a need to have Hindi in all India-related articles. Even if Hindi is the official language of the Indian Government, how is that relevant to Wikipedia, which is an English-language encyclopedia? Do you mean to say that the Madurai article, Sangam literature or Arundhati Roy should have Hindi script? Why??? - Parthi talk/contribs 23:01, 3 December 2006 (UTC)
I don't think Hindi has to be in every Indian article. Just be cool. We'll get through this.--D-Boy 05:10, 4 December 2006 (UTC)


You know, tables needn't be as large as the ones that people have been linking. I have noticed a tendency for people designing tables and templates to be much more concerned with the legibility of the table than with its impact on the article or discussion page. Most of the tables/templates are just too dang large! In fact, there should be some sort of policy as to just how large they can be and, furthermore, whether they should be applied at all. I know of several "projects" that consist of one person plastering huge templates on inappropriately chosen articles.

Tightening up columns would help. The size of the type could also be dropped slightly. Designs could be changed. Width is more disturbing than length. A big table aligned center that completely interrupts the text is worse than one that is only 150 px wide and hugs the right margin. It might also be possible to produce the table in two versions, a large version and a kind of thumbnail that, when clicked, would expand to fill the screen. Zora 09:03, 3 December 2006 (UTC)

Go slow please

I want to add a proposal too. But I am very busy for another week and possibly a few weeks. Too much seems to be happening in this discussion too fast. The outcome of this discussion will possibly have ramifications for hundreds of articles, almost all India-related articles. So, I request people to go a little slow on this. Sarvagnya 07:23, 4 December 2006 (UTC)

Don't worry! We are not rushing into anything like voting soon. utcursch | talk 05:58, 5 December 2006 (UTC)

Reg Consensus

I believe we can come to consensus, rather than trying to note every now and then that consensus is not possible here. Let's refocus starting with the situation related to Indian articles.

  • Why do we need a vernacular script added to an article in the first place? Is it really necessary? If so, when and where? It is impossible to use all the scripts in all articles, so what policy can we follow from now on to streamline their use?
  • If they are to be used, what are the implications for the non-local community that doesn't know the language (which, for any local community, is globally larger than that particular local community)?
  • Are local-language spellings necessary, or can they be provided using interwiki links?
  • If pronunciation is the only thing that matters, is it okay to use IPA, which is the standard accepted in Wikipedia?
  • Should people (readers) really learn through the inclusion of vernacular scripts in English Wikipedia, when we've already got Wiktionary for that?

If we address the basic criteria first, I guess, we can move on to the next level, which will be outside the Indian context. Let's not discuss adding Hindi or Urdu or any specific language. Let's generalise the use of LOTE (Language other than English) in English Wikipedia.

Lastly, the argument that this is impossible to implement just because we've put so much effort into adding vernacular scripts is not a strong one. We all know that any article is subject to iterative changes based on new policies, as we come across problems and find solutions to them. We cannot simply maintain the status quo just because we've put in enormous effort. 100 years later, after our death, Wikipedia will be there and someone else will edit the articles, based on a different view as scope changes within the larger global community. At that time, we won't even be alive to argue that we put so much effort into the articles. Please bear that in mind. Cheers. -- Chez (Discuss / Email) 01:50, 5 December 2006 (UTC)

OK, as a non-Indian editor, perhaps I shouldn't be so involved here. But since the discussion seems to be slowing, I'll put in my responses to the above:
  1. Such information is needed for verifiability and encyclopedic content. Wikipedia:Naming conventions (use English) specifically calls for the original language name of the article subject to be included in the article lead. Perhaps this is especially inappropriate in Indian contexts, but no one has yet explained why.
  2. This seems to be a red herring. Interwiki links are not part of the article content; they come and go based on the content of other Wikipedias.
    1. Further, there is no guarantee that an article's proper name in language Y is also the title of the corresponding Y-language Wikipedia article. (And even if it is so at time Z, it may not be at time Z+1).
  3. Per Wikipedia:Accessibility, non-Roman text should always be accompanied by a transliteration. If the burden of scripts and transliterations is obstructing the flow of text, use an infobox.
  4. I'm confused... why would pronunciation be the only thing that matters? In fact, why does it matter at all in most cases? Generally, pronunciation information for other languages belongs on Wiktionary. That said -- if there is no standard transliteration for a particular script, it may be reasonable to use IPA instead. That would certainly be better than using a purely ad hoc system (which is barred under WP:NOR).
  5. No. Per WP:NOT, information about usage, etymology, pronunciation, et al., should be sent to Wiktionary -- unless it is for some reason particularly relevant to the article. There is even a nifty template you can use to link original-text words to their Wiktionary entries.
As far as I'm aware, the Indian context is the only one where the inclusion of original-language names has been a serious matter of contention. If the Indian context is substantially different -- and again I'm still at a loss as to why -- then by that same token, whatever conclusion is reached here will not affect the existing global consensus in favor of including other language names. If the Indian context is not substantially different, then we already have a guideline, backed up by widespread agreement across many editing communities, which dictates that this information be included. -- Visviva 12:55, 9 December 2006 (UTC)

Who has the right to spell a name

A point was raised in favour of multiple scripts, namely that names with the same pronunciation are spelt differently in different scripts. I must at this point ask: who decides how a proper noun is spelt in a particular script? Very often, people use nonstandard spellings for names that are pronounced in exactly the same way as the standard spelling. Again, most people typically decide how their name is to be spelt in one script, and then all other spellings are transcriptions of the first.

The example offered was Kajol. Apparently her name is spelt কাজল in Bangla and काजोल in Hindi, and hence no single method of transliteration could produce both of these spellings. What I want to know is how we know that Kajol spells her name কাজল in Bangla. First, do we know how regularly she has occasion to write her name in Bangla? Second, how can we be certain that she spells her name কাজল and not কাজোল? While it is true that the Bangla word "kaajol", meaning collyrium, is spelt that way, it in no way implies that a person whose name is based on that word would spell it in exactly the same way.

Let me use another example to illustrate my point. I also have a Bengali father and a Marathi mother. My name is pronounced differently in Bangla and Marathi, but since I grew up in Calcutta, the Bangla pronunciation is the one I use. However, I learnt Hindi rather than Bengali in school, and hence the way I write my name in Hindi is an exact transliteration of how I pronounce it. The Bangla spelling, in turn, is a one-to-one transcription from Hindi. If someone were to write my name in Gurmukhi, Tamil or any other script, it would always be a transcription from Hindi. I see no reason to believe that others do not do the same.

Hence, writing a person's name in multiple scripts is pointless unless the person himself/herself actually makes it a point to write his/her own name differently in different scripts. Not only is this unlikely, it is also difficult to ascertain. This is why I believe there is nothing to be gained by adding multiple scripts to a page. Gamesmaster G-9 07:16, 5 December 2006 (UTC)

That's exactly why we should only stick to native script, i.e. the name as written by the person. --Ragib 07:58, 5 December 2006 (UTC)
Typically, that would be fine, but what about cases where nativity is difficult to determine? Gamesmaster G-9 08:03, 5 December 2006 (UTC)
Naming information is not exempt from WP:V or WP:BLP; if we cannot verify that her name is either self-spelled or generally-spelled in the given way, the information does not belong in the article. However, if otherwise trustworthy sources consistently give her name with a certain spelling, then I don't see any reason for Wikipedia not to use that spelling.
I agree with your general point, I think: that non-Roman scripts should not be included when they are simply transliterations of another script; there is no point in having any transliterations that are not romanizations. The only exception might be where the non-Roman transcription is the actual official name, such as when East Asian companies take names that are simply renderings of English words into native script (e.g. CJ Group). -- Visviva 08:36, 5 December 2006 (UTC)

Why have vernacular scripts?

The only main reason I see is correct pronunciation, which is a reasonable point. But whatever script is used on a page, whether native or Hindi etc., there will be people who can't read it, and so it serves no purpose for them. Most of the opponents argue that IPA/ITRANS serves the purpose, yet I doubt the average person knows how to read IPA/ITRANS. I think the best solution for "correct pronunciation" is WikiProject Spoken Wikipedia. The .ogg files don't take long to download, and hearing the name is obviously the best way to learn how to pronounce it.

Another minor reason may be that in exceptional cases, romanisation is misleading. I'll use an example: the Hindu god Shiva. In most of modern India, Shiva is now pronounced as Shiv. Adding an "a" (technically a long "a") changes the word from the masculine form to the feminine form. So Shiva and Shivā (well, actually Śiva and Śivā) in ITRANS mean Shiv the god and the feminine Shiv, who is Parvati, respectively. Since native English speakers always pronounce the "a" at the end in long form, when they say the god's name, they pronounce it in the feminine form and hence are technically referring to a different deity.

Thus, when an Indian sees the article title Shiva, they may think that the article is about Parvati, if they naturally pronounce the "a" at the end in long form. However, when they see the Devanagari (if they can read it), it confirms that the article is in fact about the male god. But this is English Wikipedia, and we have to use the English-speaking scholars' conventions here, not what is technically better for the average Indian in rare circumstances. If you like the beauty of Indian scripts so much, you should start contributing to an Indian-language Wikipedia.

I therefore see no purpose at all for having vernacular scripts. Any problems? GizzaChat © 23:09, 10 December 2006 (UTC)

See my comments under #Reg Consensus above. In sum: there already is a global consensus that original-language names are useful, and necessary, for truly encyclopedic coverage. Maybe the Indian context is different for some reason, but so far no one has explained why. Pronunciation is, I think, a different issue; I agree that sound files are at least part of the solution for that. -- Visviva 02:52, 11 December 2006 (UTC)
To answer your query on why the Indian context is different: it is because it is difficult to find out the native name in many cases. This is because of the political, religious and linguistic tensions within India. For example, what script should be used for a Tamil (a South Indian language) actor who works in Bollywood (the Hindi cinema industry, a North Indian language)? The actor's native language is Tamil, but all of his fans and notability come from Hindi. GizzaChat © 06:21, 11 December 2006 (UTC)
That would seem like a reasonable situation for including both scripts in the article. -- Visviva 08:42, 11 December 2006 (UTC)
My example was actually quite simplistic. For famous Indians who live in areas where more than one language is spoken, e.g. the state of Haryana, where Punjabi, Hindi or a hybrid is spoken, the situation becomes more complicated. Religion also complicates matters. Some Wikipedians here want many biographies to have Urdu, which is essentially the same as Hindi written in a different script and with a different literary vocabulary. Some oppose Urdu except for Indian Muslims, which then sparks an argument based on religion.
Another problem is that the Indian constitution does not make it very clear whether Hindi is the "national language" of India or just "one of the 22 official languages." Even if it were a national language, many Indians don't know how to speak it, but some Wikipedians believe everything Indian-related should have Hindi. So as you can see, scripts on Indian articles aren't the easiest thing to achieve consensus on. GizzaChat © 21:53, 11 December 2006 (UTC)
A more fundamental issue I have with using these vernacular scripts in the Indian context is the apparent need of some editors to label the subject of the article, especially in bio articles, with their ethnicity and/or religious affiliation, thereby leading to ridiculous circumstances. See Rajnikanth, for example. He was born in Maharashtra, lived in Bangalore, became popular in Tamil Nadu and later in Telugu and Hindi movies. Now which script would you include in this article? You will have bigger issues with Norah Jones or Zubin Mehta: both are ethnically at least part Indian, but popular in the West.
Why should we label someone's ethnicity? Why should an English language encyclopedia have non-roman scripts when they are useless to the vast majority of people? Thanks Parthi talk/contribs 22:27, 11 December 2006 (UTC)
Fair enough. At this point I would have to defer to the consensus of editors with local knowledge; but surely it *is* possible to reach such a consensus, at least on general principles governing the relevance of names in various contexts. It seems that the primary problems are over biographical articles, so perhaps specialized criteria are needed for that. On the face of it, I'm not sure why something like this might not work:

Non-English names are included, in the lead text or an infobox, only if they are attested in reliable sources as the original name of the person, or as a primary and self-assigned name. If reliable sources state that a person has performed or written widely in a certain language under a certain name, it is reasonable to include that name in the article. When the original script is non-Roman, a transliteration should also be provided. However, names should not be included if they are simple transliterations or phonetic spellings of the original-language or English name.

I think that more or less describes existing practice across Wikipedia. Probably the above wording is still too weaselly to solve most problems, but it would seem to prevent egregious name-stuffing while still allowing names with encyclopedic value to be kept. -- Visviva 06:31, 12 December 2006 (UTC)

I strongly support Visviva's proposal. It's suitably worded, and allows for the inclusion of a name in vernacular scripts IF there is proof that the person in question used that script to write his name. At the same time, it will avoid the proliferation of scripts. Gamesmaster G-9 16:42, 13 December 2006 (UTC)

Illustrative examples

In a world where we could pin each person's ethnicity down easily, I would be all for including the name in vernacular scripts. However, that is not always the case. Here are some examples.

  1. Indians whose names are best written in Roman script. This is often the case with Indian Christians.
    • John Abraham - Of Parsi/Malayali descent, works in Hindi movies. Relevant vernacular scripts are Gujarati, Devanagari and Malayalam. However, all of them mangle the pronunciation to some extent.
    • Leander Paes - Of Goan descent, but calls Calcutta home. Relevant scripts would be Bengali and either Devanagari or Kannada. Again, this name will be mangled in any script except Roman.
  2. Indians who are multi-ethnic, and/or are famous in a region that they are not native to.
    • Simran Bagga - a Punjabi who made her debut in Hindi films, but is famous for her roles in Tamil cinema. Relevant scripts: Gurmukhi, Tamil, Devanagari.
    • Rajnikant - Ethnically Marathi, lived in Bangalore and worked principally in Tamil cinema, but also in Telugu. His original name is a Marathi one, but his assumed name is more Tamil-sounding. This one is a mess.
    • Riya Sen - Bengali/Tripuri origin. Has worked in Malayalam, Tamil, Hindi and English cinema.
    • Reema Sen - Bengali who is most well-known for roles in Tamil cinema.
    • Kajol - Bengali/Marathi parentage. Famous Hindi film actress.
    • Esha Deol - Tamil mother, Punjabi father. Acts in Hindi cinema.
  3. Indians who are famous because of their work in English
    • Almost all Indo-Anglian authors write their own names in Roman script. Vikram Seth, Amitav Ghosh and Arundhati Roy all speak English as a mother tongue, even though they are Punjabi, Bengali and Malayali/Bengali respectively. In each case, should we use a transliteration of their names that they themselves don't use? What about Salman Rushdie? Should we add an Urdu/Hindi spelling to his name too?

I realise that there are some people who spell their names differently in different scripts. Shobha De, Sonu Nigam, and Vivek Oberoi are examples of idiots who like to add meaningless letters to the English spellings of their names. Also, in the case of people like Rabindranath Tagore, or N. T. Rama Rao, who are associated with exactly one linguistic group, it may not be a problem. However, it seems clear that we need a broad policy.

Gamesmaster G-9 06:26, 12 December 2006 (UTC)

There are, of course, some

I think we should take it on a case-by-case basis. John Abraham doesn't really need scripts.--D-Boy 07:14, 12 December 2006 (UTC)
Wikipedia can't work on a case by case basis. There are just too many Indian-bios, probably in the tens of thousands. GizzaChat © 22:04, 12 December 2006 (UTC)
We need a policy at least to cover India-related articles. Also, can someone give me a good reason to indicate someone's ethnicity through the use of these scripts? - Parthi talk/contribs 22:17, 12 December 2006 (UTC)
I think the main problem with the Indian bios is the Bollywood bios and some of the music ones. Bengali bios usually don't have any problems. Ancient Indians don't usually have any problems. The people Gamesmaster listed are mostly entertainers. I still see nothing wrong with the scripts. The Koreans handle it nicely: they have a nice box with the Hangul and the Hanja. An infobox would fit in enough scripts. And the Urdu for Salman Rushdie is already there. Should we also start removing the Persian scripts for the Mughals and the Tamil scripts for the Cholas?--D-Boy 06:00, 13 December 2006 (UTC)

If we were to follow the rule I tentatively proposed just above, then John Abraham and Leander Paes would not have extra scripts (since, I presume, these would simply be transliterations of the Roman). I don't know enough about the actors to say whether more than one script is appropriate, although in a case such as Rajnikant both the assumed and birth names would surely be included (each in its original script(s)). At any rate, for these and for the authors, I think that the original script should be included, although if the English name is a strict and unambiguous transliteration that might be dispensed with.* In cases such as Salman Rushdie, where the English and Urdu names are quite distinct, it should go without saying that both should be included. -- Visviva 12:31, 28 December 2006 (UTC) * See Wikipedia:Naming conventions (Cyrillic) for a similar case.

I propose a modification of proposal #3

First of all, I believe that original-language names are very useful for cases where you have to search for information but don't have access to a keyboard that would give you those letters, or if you don't know the language. I am strongly against implementation of either 4 or 5 for this reason. However, I can see the problems with 1 and 2 (mainly among the Indian communities, I might add). If I had to choose among the 5 right now, I'd pick #3, because it seems to avoid the major problems in the other solutions while creating few of its own. However, I still think that it can sometimes be unwieldy in cases where there is really no controversy, so I thought about what has been said so far (about how this seems to be a problem for some languages but not for others), and I propose the following:

If there is a clear consensus (or no objections) on which original-language title should be used, then that title should be put in the introduction (example).

If, however, there is more than one possible vernacular spelling (maybe more than two?) then a box should be used as suggested in proposal #3. And the different vernacular names should be listed in alphabetical order (like the interwiki links are), to prevent bias.

Anyway, this is my last contribution to wikipedia before I leave for the holidays - I'll be back in a week. :) Please discuss. Esn 10:45, 24 December 2006 (UTC)

Even if the consensus is to include vernacular scripts, we should evolve a uniform policy governing what scripts are to be used for each person's name, and prevent proliferation. Gamesmaster G-9 03:07, 26 December 2006 (UTC)
This whole proposal is dead.--D-Boy 02:41, 28 December 2006 (UTC)

Not necessarily. It is a holiday season for many people. I haven't been participating here because I've been very short of time. I've been cooking, not writing. This is an issue that should be handled and will be handled. Zora 03:46, 28 December 2006 (UTC)

Template:Infobox Indic names

Usingha (talk · contribs), Amartyabag (talk · contribs), other editors and I have started to use the box created by Visviva. It's quite useful, and we have a test run at Ching-Thang Khomba. Bakaman 05:44, 29 December 2006 (UTC)

Just some views

These are just my views.

  • Roman script is problematic even for unambiguously representing syllables in native languages of Europe, let alone other world languages. In an ideal world, Roman alphabet would not exist.
  • Few people know IPA conventions, probably fewer than can read even the lesser-used Indian scripts, but it is easier to learn them than to learn many scripts.
  • One important consideration in choosing the right script(s) is the language of origin of the word. John Abraham need not be written in Marathi; Savitribai Phule might need to be. Names are often words in a language, not just arrangements of syllables. Seen in this way, Satyameva Jayate is no more eligible for an Indic script than, say, the name Satya. Of course, using this as the sole criterion will also be problematic.
  • Most people will not benefit from seeing alien scripts in the article at all. For a handful it will be useful. Unless we clutter up the lead as in Rajnikanth, will any harm be done to the majority who will not benefit?
  • The number of scripts should be limited to, say, three.
  • A lot of these script additions are not justifiable, like using Devanagari on Farhad Darya just because he sang a Hindi song. We do need a policy on this.
  • "Indic scripts are not completely scientific ".. I haven't seen anything better till now.
  • "Romanization of all the nuances of the Mandarin script is beyond hope".. completely agree, but I fail to see how adding Chinese would help more in that case than say adding Thai script on a Thai article. Those who know Chinese would immediately know what's being talked about anyway, and for those who don't, the script would literally be Chinese to them.

I'd just add that all the names in my history education were garbled because they were romanized, and the teachers talked about the Rastraakutaas, the Vaakaataakaas, the Cholaas, the Cheraas, the Paalaas, etc. Besides, everyone still thinks it's Yogaa. And having learned the correct pronunciation, if I use it now, other unfortunate people who got a similar education don't even understand what I'm talking about. Well, just some things that came to my mind. deeptrivia (talk) 19:28, 21 January 2007 (UTC)