Template talk:Lang/Archive 8

This is an archive of past discussions about Template:Lang. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.


cat=n, and related

Would be nice if |cat=no and other negative values worked the same as |nocat=y and other positive values, so people don't have to try to remember which one this template uses. — SMcCandlish ☏ ¢ 😼 17:20, 1 July 2018 (UTC)

Contrary to the template documentation, any value assigned to |nocat= inhibits categorization. This is a bug that should be fixed (the pre-Lua version used {{category handler}} which accepts 'yes', 'y', 'true', 't', 'on', and '1' as affirmative values). Adding |cat= which accepts the opposite values can be done at the same time.

—Trappist the monk (talk) 12:27, 2 July 2018 (UTC)

@Trappist the monk: Module:Yesno interprets values in the way that you describe. I've come up with a way to add |cat= and check that, if |cat= and |nocat= have boolean values, they don't contradict each other. (For instance, the combination of |nocat=yes and |cat=true will trigger an error, but |nocat=blahblah and |cat=hahaha will not, because they are not boolean.) But probably I'm not throwing an error in the correct way. — Eru·tuon 18:46, 2 July 2018 (UTC)
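The contradiction check described above can be sketched in Python. This is only an illustration, not the module's Lua code: the function names `yesno` and `cat_conflict` are hypothetical, the affirmative values are the ones listed earlier in this thread, and the negative set is assumed to mirror them.

```python
# Affirmative values as listed in the discussion; the negative set is an
# assumed mirror image of it.
TRUE_VALUES = {"yes", "y", "true", "t", "on", "1"}
FALSE_VALUES = {"no", "n", "false", "f", "off", "0"}


def yesno(value):
    """Return True/False for a recognised boolean string, else None."""
    if value is None:
        return None
    text = value.strip().lower()
    if text in TRUE_VALUES:
        return True
    if text in FALSE_VALUES:
        return False
    return None


def cat_conflict(nocat=None, cat=None):
    """True when both parameters are boolean and ask for opposite things."""
    nocat_flag = yesno(nocat)
    cat_flag = yesno(cat)
    if nocat_flag is None or cat_flag is None:
        return False  # at least one value is not boolean: nothing to compare
    # nocat=yes means "do not categorize"; cat=yes means "categorize",
    # so the two parameters contradict each other when their flags are equal.
    return nocat_flag == cat_flag


print(cat_conflict(nocat="yes", cat="true"))         # True
print(cat_conflict(nocat="blahblah", cat="hahaha"))  # False
```

Under this sketch, |nocat=yes with |cat=no is not a conflict, since both ask for categorization to be turned off.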

I've tweaked your code and, I think, made it less confusing. To do that I decided that |nocat= should only accept affirmative values ('yes', 'y', 'true', etc) and that |cat= should only accept negative values ('no', 'n', 'false', etc). By doing that, there is no error message because there cannot be acceptable contradictory values (|nocat=y and |cat=y treated same as |nocat=y and |cat=dog). Examples in the collapse box; to get the examples to display here (outside of mainspace), I have forced local namespace = 0;.

examples

1. {{lang/sandbox|es|casa}}
- casa – categorized
2. {{lang/sandbox|es|casa|nocat=no}}
- casa – categorized
3. {{lang/sandbox|es|casa|nocat=yes}}
- casa – not categorized
4. {{lang/sandbox|es|casa|nocat=}}
- casa – categorized
5. {{lang/sandbox|es|casa|cat=no}}
- casa – not categorized
6. {{lang/sandbox|es|casa|cat=yes}}
- casa – categorized
7. {{lang/sandbox|es|casa|cat=}}
- casa – categorized
8. {{lang/sandbox|es|casa|nocat=yes|cat=no}}
- casa – not categorized
9. {{lang/sandbox|es|casa|nocat=yes|cat=yes}}
- casa – not categorized
10. {{lang/sandbox|es|casa|nocat=yes|cat=}}
- casa – not categorized
11. {{lang/sandbox|es|casa|nocat=no|cat=no}}
- casa – not categorized
12. {{lang/sandbox|es|casa|nocat=no|cat=yes}}
- casa – categorized
13. {{lang/sandbox|es|casa|nocat=no|cat=}}
- casa – categorized
14. {{lang/sandbox|es|casa|nocat=|cat=no}}
- casa – not categorized
15. {{lang/sandbox|es|casa|nocat=|cat=yes}}
- casa – categorized
16. {{lang/sandbox|es|casa|nocat=|cat=}}
- casa – categorized
17. {{lang/sandbox|es|casa|cat=no|nocat=yes}}
- casa – not categorized

—Trappist the monk (talk) 00:07, 3 July 2018 (UTC)

I think it's less confusing to be completely logical and respond to both true and false values for both parameters – though the only change that entails is a warning if the parameters conflict; either way |nocat=false and |cat=true don't change the default behavior. — Eru·tuon 02:40, 3 July 2018 (UTC)

What about ~~{{lang/sandbox|es|casa|cat=no|nocat=yes}}~~ {{lang/sandbox|es|casa|cat=yes|nocat=yes}}, etc.? Is one parameter going to override another universally, or just by which ever one is last? — SMcCandlish ☏ ¢ 😼 01:43, 3 July 2018 (UTC)

@SMcCandlish: Huh. There's no contradiction in {{lang/sandbox|es|casa|cat=no|nocat=yes}}: either |cat=no or |nocat=yes would turn off categorization. So neither can override the other. — Eru·tuon 19:39, 6 July 2018 (UTC)

D'oh! I fixed it. — SMcCandlish ☏ ¢ 😼 20:19, 6 July 2018 (UTC)

Same as example #9 above. Because categorization is the default, |cat=yes is ignored (always) so |nocat=yes controls. Example 11 is the opposite case: |nocat=no is ignored (always) so |cat=no controls.

—Trappist the monk (talk) 20:44, 6 July 2018 (UTC)
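The rule that came out of this thread (|nocat= reacts only to affirmative values, |cat= only to negative ones, and categorization is the default) can be sketched as follows. This is a Python illustration under stated assumptions, not the sandbox's Lua: `categorizes` is a hypothetical name, and the negative value set is assumed to mirror the affirmative one.

```python
AFFIRMATIVE = {"yes", "y", "true", "t", "on", "1"}
NEGATIVE = {"no", "n", "false", "f", "off", "0"}  # assumed mirror of the above


def categorizes(nocat="", cat=""):
    """Return True when the template call would add its category."""
    # |nocat= only reacts to affirmative values; anything else is ignored.
    suppressed_by_nocat = nocat.strip().lower() in AFFIRMATIVE
    # |cat= only reacts to negative values; anything else is ignored.
    suppressed_by_cat = cat.strip().lower() in NEGATIVE
    # Categorization is the default, so either switch alone turns it off.
    return not (suppressed_by_nocat or suppressed_by_cat)


print(categorizes(nocat="yes", cat="yes"))  # False: |cat=yes is ignored
print(categorizes(nocat="no", cat="no"))    # False: |nocat=no is ignored
print(categorizes(nocat="no", cat="yes"))   # True: both values are ignored
```

Because neither parameter can switch categorization back on, there is no combination of values that needs an error message.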

Alternative categorization ideas

Also, it's tedious and error-prone to have to figure out and remember under what circumstances |nocat=y must be added (e.g., when it's used inside a link, an image caption, and various other cases). It's a pain in the butt, and not getting it perfect often breaks display, in-article, which is "reader-hateful". Sometimes it's quite unpredictable. E.g., when I applied it to père (without |nocat=y) at Template:Alexandre Dumas, it rendered fine in the template, but dumped exposed wikimarkup when transcluded at Alexandre Dumas. Given our new-ish section transclusion system, we now have no way of knowing what will be transcluded where, so this would seem to indicate that the only safe choice is to |nocat=y every single instance.

My first thought on a solution: I note that collapsible navboxes have a way to detect the presence of more of them on the same page and auto-collapse when several are present. I would think that the same technique could be used to detect whether a category has already been applied or at least whether the same template is already in use on the page (i.e. the page is already categorized). That might not be the way to do it, since first occurrence on the page might be nocat'ed.

An alternative, and the proposal I'm leaning toward, is to either remove the categorization code or turn it off by default, and instead have a trawling bot manually add the category, once, to the article on the basis of the presence of the template there (e.g. if it contains both French and Spanish, add the category for each, one time each, into the rest of the categories at the bottom). This would actually be much more efficient anyway, since the parser wouldn't have to deal with the same category code being dumped out 100 times for an article that uses ... 100 times in it. This would be more accurate categorization anyway: if the only occurrence of such markup on a page is one requiring |nocat=y then the article will not be correctly categorized, and this would not happen under the bot-catting method.
— SMcCandlish ☏ ¢ 😼 23:22, 1 July 2018 (UTC)

Another idea: a template that uses Lua to retrieve and search the article text for language codes in language-tagging templates ({{lang|... and {{lang-...) and adds the appropriate categories. Searching the article text seems to be an inexpensive operation even with a fairly large article such as English language: my experimental module function returns {{#invoke:Sandbox/Erutuon|search_for_language_codes|English language}}. — Eru·tuon 23:54, 1 July 2018 (UTC)
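The text-search idea can be illustrated with a short Python sketch. It is not the experimental Lua function mentioned above; it only borrows the name `search_for_language_codes`, and the two regular expressions are assumptions that cover the plain {{lang|xx|...}} and {{lang-xx|...}} forms only.

```python
import re

# {{lang|xx|...}}: the code is the first positional parameter.
LANG_TEMPLATE = re.compile(r"\{\{\s*lang\s*\|\s*([A-Za-z0-9-]+)\s*\|")
# {{lang-xx|...}}: the code is part of the template name.
LANG_XX_TEMPLATE = re.compile(r"\{\{\s*lang-([A-Za-z0-9-]+)\s*\|")


def search_for_language_codes(wikitext):
    """Return the sorted, de-duplicated language codes tagged in the text."""
    codes = {code.lower() for code in LANG_TEMPLATE.findall(wikitext)}
    codes.update(code.lower() for code in LANG_XX_TEMPLATE.findall(wikitext))
    return sorted(codes)


sample = "{{lang|fr|père}} and {{lang-es|casa}}, then {{lang|fr|fils}}"
print(search_for_language_codes(sample))  # ['es', 'fr']
```

A category could then be emitted once per returned code, regardless of how many times the language is tagged on the page.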

{{lang}} only categorizes when used in article space; never categorizes anywhere else (this has been true since about April 2009).

I'm not convinced that this is a problem that requires a solution. Navbox auto-collapse is handled by JavaScript in MediaWiki:Common.js, which runs in your browser when you load a page from the Wikipedia servers – long after the {{navbox}} code has run. The bot 'solution' will work as long as there is someone around to write it, run it, and maintain it. The Lua code version of {{lang}} is much more efficient than the wikitext version ever was. The Lua version does not require the use of {{#ifexist:}} (an expensive parser function). I suspect that 'efficiency' gains obtained by limiting the number of category links will be minimal. To test that notion, I created a page that contained 5000 of these and nothing else:

{{lang|fr|fils|nocat=}}

Then I previewed the page. Here is the parser profiling data for that preview

CPU time usage	7.620 seconds
Real time usage	7.874 seconds
Preprocessor visited node count	45,001/1,000,000
Preprocessor generated node count	0/1,500,000
Post-expand include size	1,160,000/2,097,152 bytes
Template argument size	0/2,097,152 bytes
Highest expansion depth	2/40
Expensive parser function count	0/500
Unstrip recursion depth	0/20
Unstrip post-expand size	0/5,000,000 bytes
Number of Wikibase entities loaded	0/400
Lua time usage	4.963/10.000 seconds
Lua memory usage	16.09 MB/50 MB

A second test, this time with all 5000 templates set to |nocat=y:

CPU time usage	6.676 seconds
Real time usage	6.696 seconds
Preprocessor visited node count	45,001/1,000,000
Preprocessor generated node count	0/1,500,000
Post-expand include size	630,000/2,097,152 bytes
Template argument size	0/2,097,152 bytes
Highest expansion depth	2/40
Expensive parser function count	0/500
Unstrip recursion depth	0/20
Unstrip post-expand size	0/5,000,000 bytes
Number of Wikibase entities loaded	0/400
Lua time usage	4.623/10.000 seconds
Lua memory usage	16.09 MB/50 MB

Clearly, for these two tests, the |nocat=y version required less time:

CPU time usage 7.620 vs. 6.676 seconds (1.524 ms/template vs. 1.335 ms/template)

so category handling then accounted for:

1.524 ms − 1.335 ms = 189 μs per template.
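The per-template figures follow directly from the two CPU times and the count of 5000 templates. A small Python check of that arithmetic (the helper name `per_template_ms` is only for illustration):

```python
TEMPLATE_COUNT = 5000
CPU_WITH_CATEGORIES = 7.620     # seconds, first test (|nocat= left empty)
CPU_WITHOUT_CATEGORIES = 6.676  # seconds, second test (|nocat=y)


def per_template_ms(total_seconds, count):
    """Average CPU time per template call, in milliseconds."""
    return total_seconds / count * 1000


with_ms = per_template_ms(CPU_WITH_CATEGORIES, TEMPLATE_COUNT)
without_ms = per_template_ms(CPU_WITHOUT_CATEGORIES, TEMPLATE_COUNT)
category_cost_us = (with_ms - without_ms) * 1000  # microseconds

print(f"{with_ms:.3f} ms per template with categories")        # 1.524
print(f"{without_ms:.3f} ms per template without categories")  # 1.335
print(f"{category_cost_us:.0f} us per template for category handling")  # 189
```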

An alternate form of the Dumas template that renders correctly and supplies correct categorization is:

{{lang|fr|[[Alexandre Dumas|Alexandre Dumas ''père'']]|italic=unset}}

—Trappist the monk (talk) 12:27, 2 July 2018 (UTC)

Okay, so forget the parser efficiency argument. I still think it'd be better to do this via bot, if someone's willing, so no one ever has to try to remember these complicated syntax and nesting tricks. — SMcCandlish ☏ ¢ 😼 01:46, 3 July 2018 (UTC)

A far-fetched possibility occurred to me while editing Czech phonology. Background: {{audio-lang}} uses {{lang}}, and I discovered that categorization has to be turned off because the language-tagged text is inside the link to the audio file. So if all text in a given language on a given page is in {{audio-lang}}, not in {{lang}}, then the category for that language will not be added. Adding the categories by bot or some other way not involving {{lang}} would avoid this. But it's unlikely that all snippets of text in a given language will have sound files. (Russian phonology has an unusually large number of soundfiles, but even it doesn't achieve universal coverage.) — Eru·tuon 23:21, 8 July 2018 (UTC)

You might turn it around so that {{lang}} wraps {{audio}} like this:

{{lang|{{{2|}}}|{{Audio|{{{1|}}}|{{{3|}}}|help={{{help|}}}}}}}

mimicked here with this (sandbox version of lang because, for the time being, it allows categorization outside of mainspace):

{{lang/sandbox|cs|{{audio|cs-zakladatel.ogg|zakl'''a'''datel|help=no}}}}

zakladatel^ⓘ

But, if you want {{audio}}'s help text and links, that's problematic:

{{lang/sandbox|cs|{{audio|cs-zakladatel.ogg|zakl'''a'''datel|help=}}}}

zakladatel^ⓘ

A tweak to {{audio}} will fix that. Change:

to:

Which renders like this:

zakladatel (help·info)

—Trappist the monk (talk) 07:58, 9 July 2018 (UTC)

I guess putting {{audio}} inside {{lang}} can look okay if the language doesn't use a noticeably different font, or if the font properties are overridden, but the HTML output would be bad, because (help·info) would be tagged as something other than English. It's only a cosmetic fix to override the italicization of (help·info), and a complete cosmetic fix would require overriding the font-family property as well, to fix cases where the language is assigned different fonts (like maybe Urdu, which uses Nastaliq: I think some Nastaliq fonts have "fantasy" type Latin glyphs). But it's best not to use inline CSS to hide semantic problems (text being tagged as the wrong language), because font isn't the only purpose of language tagging. A better way to fix categorization in {{audio-lang}} would be to go back to something like the initial version, which didn't use {{audio}} at all and placed the link to the audio file inside {{lang}}. — Eru·tuon 16:24, 9 July 2018 (UTC)

Of course. Writing from scratch is preferable. But, you seem determined to blame {{lang}} for these categorization problems when the problems are interaction problems. Unless the interacting templates were specifically designed to operate together harmoniously, you should expect that templates of any complexity will likely have interaction problems. It is not the fault of the templates when they don't all play well with each other.

—Trappist the monk (talk) 17:06, 9 July 2018 (UTC)

I think it's not just preferable, but essential, to avoid tagging text with the wrong language. But you're right that this is an interaction problem, and the example I chose is not a very good one because there's a way to solve it.

I guess the broader issue is that, with the current behavior of {{lang}}, when categorization is turned off in one instance of {{lang}}, a category will only be added if there's another instance of {{lang}} (or a {{lang-xx}} template) with the same language. So maybe there will be pages on which a category for a language isn't added. Or maybe not. I wonder how to determine that. — Eru·tuon 17:43, 9 July 2018 (UTC)
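One way to look for such pages is to scan the wikitext for languages whose every {{lang}} call has categorization switched off. The Python sketch below is a rough illustration only: the names are hypothetical, the pattern ignores {{lang-xx}} templates and calls with nested templates, and it applies the sandbox rule discussed earlier (affirmative |nocat=, negative |cat=) rather than the live template's behavior.

```python
import re

AFFIRMATIVE = {"yes", "y", "true", "t", "on", "1"}
NEGATIVE = {"no", "n", "false", "f", "off", "0"}  # assumed mirror of the above

# Naive pattern: {{lang|code|...}} with no nested templates in its arguments.
LANG_CALL = re.compile(r"\{\{\s*lang\s*\|\s*([A-Za-z0-9-]+)\s*\|([^{}]*)\}\}")


def suppresses_category(arguments):
    """True if the argument string switches categorization off."""
    for argument in arguments.split("|"):
        name, separator, value = argument.partition("=")
        if not separator:
            continue  # positional parameter, not name=value
        name, value = name.strip(), value.strip().lower()
        if name == "nocat" and value in AFFIRMATIVE:
            return True
        if name == "cat" and value in NEGATIVE:
            return True
    return False


def uncategorized_languages(wikitext):
    """Codes that are tagged on the page but never by a categorizing call."""
    seen, categorized = set(), set()
    for code, arguments in LANG_CALL.findall(wikitext):
        code = code.lower()
        seen.add(code)
        if not suppresses_category(arguments):
            categorized.add(code)
    return sorted(seen - categorized)


page = "[[x|{{lang|cs|zakladatel|nocat=yes}}]] {{lang|fr|fils}} {{lang|fr|père|nocat=y}}"
print(uncategorized_languages(page))  # ['cs']
```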

"the current behavior of {{lang}}": The first attempts to figure out categorization based on language codes occurred with this edit in August 2008. It took a while but category handling in {{lang}} pretty much settled into its final wikitext form with this edit in September 2013, though I think that that edit could have been made earlier (perhaps as early as this edit in May 2012). The 'current' categorization behavior has been a long time in the making. We did not simply dream it up when we converted to Module:lang.

—Trappist the monk (talk) 18:18, 9 July 2018 (UTC)

Okay. I know the template added categories before you Lua-ified it. I don't mean to blame you for this potential issue, if that's the impression you're getting. — Eru·tuon 18:31, 9 July 2018 (UTC)

I don't see you blaming me. Rather, I see you blaming {{lang}} (I said that above) because it doesn't (and, as far as I can tell, was never intended to) always coexist with whatever templates and wiki markup editors choose to mate it with (I suspect that it is just not possible unless MediaWiki grants templates the ability to see beyond their bounding brackets – don't hold your breath for that).

—Trappist the monk (talk) 18:48, 9 July 2018 (UTC)

Okay.... I am not sure how your point relates to the potential issue I brought up. Are you saying that there are cases in which {{lang}} is not meant to be used, and that in those cases some other method should be used to add language-tagging and categories? — Eru·tuon 19:31, 9 July 2018 (UTC)

No, I'm saying that there are markup conditions where {{lang}} will never work and still provide all of its intended output. For example, stuff like including {{lang}} inside a wikilink will cause the wikilink to fail because of categorization:

[[house|{{lang/sandbox|es|casa}}]] – sandbox used here because at present it categorizes outside of mainspace

casa

[[house|{{lang/sandbox|es|casa|nocat=yes}}]] – turn off categorization

casa

Editors have blamed {{lang}} for this, assuming that somehow the template is at fault. It is not; wiki markup does not allow nested wikilinks, be they normal links or category links:

[[Template talk:Lang|[[Category:Articles containing Spanish-language text]]]]

[[Template talk:Lang|]]

[[Template talk:Lang|[[house]]]]

[[Template talk:Lang|house]]

—Trappist the monk (talk) 20:02, 9 July 2018 (UTC)

Okay, but I already understand this. — Eru·tuon 20:34, 9 July 2018 (UTC)

If you already knew this then why are you blaming {{lang}} for 'breaking' {{audio-lang}} and blaming {{lang}} when it becomes necessary to turn off categorization so that adjacent markup works? For the vast majority of {{lang}} usage, there is no problem, and when there is, there is often a solution by rewriting or alternately configuring the markup or template settings, yet you appear to believe that radical categorization-ectomy is the solution to problems that really are not the fault of {{lang}} but limitations of the environment in which it is used.

—Trappist the monk (talk) 12:32, 10 July 2018 (UTC)

Look, I'm interested in whether the category for a given language is always being added when text in that language is present on a page, and in ensuring that it is. If, because categorization in {{lang}} sometimes has to be turned off because of the syntactic issue, the current behavior of {{lang}} has to be changed (even though, as you've explained, it's existed for about ten years), so be it. But blame in itself is not interesting to me because it does not relate to ensuring that categories are added. If you have a way to ensure categories are added without changing the behavior of {{lang}} (or if you think that the problem I'm bringing up is far-fetched), then say so. But I'm not interested in discussing the question of blame on its own. — Eru·tuon 17:31, 10 July 2018 (UTC)

Cirrus searches are never perfect but they can hint at the magnitude of a 'problem'. This search returns about 450 articles where some template in each article (not necessarily {{lang}}) has set |nocat= to something. That's 450 articles out of 211,295 articles that use {{lang}} – about 0.2%. Yet, for the sake of these 450 articles, you would have us remove automatic categorization from the other 210,845 articles and have us invent some new method of categorization; all because you find fault with {{lang}}'s traditional behavior when confronted with certain, apparently rare, markup conditions. Yeah, I think that this is a mountains and molehills issue. Were there thousands upon thousands upon thousands of articles that aren't getting categorized because of |nocat=yes, and were there a common reason for |nocat=yes among those articles, I would suggest that we might write specialized templates and perhaps supporting modules to take care of those. For a few hundred? I don't know.

—Trappist the monk (talk) 19:15, 10 July 2018 (UTC)

Thank you. A more specific search (hastemplate:lang insource:/\{\{lang[^}]+?\| *nocat *= *[^\|\}]+/) to only match instances of the parameter in this template yields about 323 results. That doesn't include transclusions of content that has {{lang}} with categorization turned off, but I wouldn't expect there to be many cases of that. And of these, fewer would have a non-categorizing instance of {{lang}} and no categorizing instance with the same language. So it's probably a very small issue if it is one at all. That's reassuring. Hm, it would be interesting to make a script that would check for this issue and send up a flare. — Eru·tuon 19:44, 10 July 2018 (UTC)
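For readers who want to see what the insource pattern matches, here is the same expression applied with Python's `re` module. This is only an approximation: CirrusSearch uses its own regex dialect, so the assumption is that the pattern behaves the same way on simple cases like these. The function name `sets_nocat` is hypothetical.

```python
import re

# The pattern from the search above, without the surrounding slashes.
NOCAT_IN_LANG = re.compile(r"\{\{lang[^}]+?\| *nocat *= *[^\|\}]+")


def sets_nocat(wikitext):
    """True if some {{lang...}} call in the text gives |nocat= a value."""
    return NOCAT_IN_LANG.search(wikitext) is not None


print(sets_nocat("{{lang|fr|fils|nocat=y}}"))  # True
print(sets_nocat("{{lang|fr|fils|nocat=}}"))   # False: empty value
print(sets_nocat("{{lang|fr|fils}}"))          # False: parameter absent
```

Note that the pattern also matches {{lang-xx}} templates, because their names begin with "lang" as well.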

Same here. It's not a matter of the template being bad or Trappist's hard work on it being poor (to the contrary!), but rather the end result of having the template do the categorization impedes editing and causes a lot of errors, many of which are not detected until much later. — SMcCandlish ☏ ¢ 😼 21:30, 9 July 2018 (UTC)

It's sounding more and more like a bot should be doing the categorization (where people don't do it manually), without the templates forcing it. I don't know enough about our bot stuff to be sure how to get this implemented and approved, though. — SMcCandlish ☏ ¢ 😼 02:40, 9 July 2018 (UTC)

ISO 639-6

Cantonese transliterated: guzh-Latn isn't working, nor is the less mnemonic yye-Latn. This may be affecting other ISO 639-6 constructions. It's not desirable to only be able to do something like {{lang|zh-Latn|cheongsam}} because Chinese script encodes multiple languages and we need to be able to distinguish between them, e.g. in the lead sentence of Cheongsam. — SMcCandlish ☏ ¢ 😼 03:07, 12 July 2018 (UTC)

ISO 639-6 indicates the standard is withdrawn. --Izno (talk) 03:19, 12 July 2018 (UTC)

Then we need to remove it from articles and the language infobox and replace it with something else. — SMcCandlish ☏ ¢ 😼 05:01, 12 July 2018 (UTC)
I've patched up the Cantonese article [1]. — SMcCandlish ☏ ¢ 😼 05:16, 12 July 2018 (UTC)

How about yue-Latn? That refers to Yue Chinese, but disambiguating between Cantonese and other varieties may not be necessary. — Eru·tuon 03:25, 12 July 2018 (UTC)

Yes, that one works. It may be insufficient for distinguishing between Cantonese and other Yue dialects/languages, but if it's all we have then it is. — SMcCandlish ☏ ¢ 😼 05:01, 12 July 2018 (UTC)

Wiktionary language code `gmq-oda`

Wiktionary uses the language code gmq-oda for Old Danish; any chance we could support a fuller range of Wiktionary codes on here, where there isn't an ISO code? — OwenBlacker (talk; please {{ping}} me in replies) 22:52, 15 July 2018 (UTC)

Language codes used with the {{lang}} and {{lang-??}} templates must be understandable by browsers and screen readers so consequently must meet the requirements of the IETF language tag format. The form of gmq-oda suggests that oda is a language extension to North Germanic languages (a collective language code) but there is no oda extlang defined in the IANA language subtag registry file.

We can create private use codes that comply with the IETF format and have done so in the past. You can see them in Module:Lang/data.

—Trappist the monk (talk) 23:24, 15 July 2018 (UTC)

That should work; the template could just "translate" Wiktionary-invented codes to private use ones on the fly, right? — SMcCandlish ☏ ¢ 😼 02:52, 16 July 2018 (UTC)
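The on-the-fly translation suggested here could look something like the Python sketch below. Everything specific in it is hypothetical: the mapping table, the tag gmq-x-oda, and the function names are invented for illustration and are not taken from Module:Lang/data. The pattern checks only the simplest private-use shape of an IETF tag (a two- or three-letter language subtag, the singleton x, then one or more subtags of one to eight alphanumeric characters).

```python
import re

# Hypothetical mapping from a Wiktionary code to a private-use IETF tag.
WIKTIONARY_TO_PRIVATE_USE = {"gmq-oda": "gmq-x-oda"}

# Simplified shape: language subtag, "x", then 1-8 character subtags.
PRIVATE_USE_TAG = re.compile(r"^[a-z]{2,3}-x(-[a-z0-9]{1,8})+$")


def to_ietf_tag(code):
    """Translate a known Wiktionary code; pass anything else through."""
    return WIKTIONARY_TO_PRIVATE_USE.get(code.lower(), code)


def is_private_use_form(tag):
    """True if the tag has the simple language-x-subtag private-use shape."""
    return PRIVATE_USE_TAG.match(tag.lower()) is not None


tag = to_ietf_tag("gmq-oda")
print(tag, is_private_use_form(tag))      # gmq-x-oda True
print(is_private_use_form("gmq-oda"))     # False: "oda" is not after an "x"
```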

Old language-collective categories

Conversation moved here from Module talk:Lang

I just noticed that the old categories, like Category:Articles containing Germanic-language text still exist and, presumably, should be put up for deletion, given any articles that would have gone there will now be in either Category:Articles with text from the Germanic languages collective or Category:Articles containing Proto-Germanic-language text.

Is there an easy way of enumerating the categories that should now be up for deletion? — OwenBlacker (talk; please {{ping}} me in replies) 13:54, 29 June 2018 (UTC)

I would keep the one with a simpler name. "the Germanic languages collective" sounds like a social movement. I've never seen this term before. Maybe it was meant to have a comma in it. Even so, it's misleading in another way, in that it's not a collective (container) category for articles with Germanic language markup, but only those with a specific ISO code, gem. The English and French names of this code are vague, but the German one is more specific and translates to "Germanic Languages (Other)". It appears to be a code for Germanic languages that don't have their own code, and is not for "Germanic language in general" or "the Germanic language family", though I suppose no one's head will explode if it's used that way. I think "Category:Articles containing text in an unspecified Germanic language" would be the accurate category name. — SMcCandlish ☏ ¢ 😼 15:44, 29 June 2018 (UTC)

The term collective as applied to ISO 639-2 language codes is defined at Library of Congress. Collectives are described in our ISO 639-2 article at §Collections of languages.

There was discussion on this page about the collective category name.

—Trappist the monk (talk) 16:57, 29 June 2018 (UTC)

Okay, so WP didn't invent it out of nothing. But how does using odd ISO terminology help readers, or editors doing maintenance? — SMcCandlish ☏ ¢ 😼 17:47, 29 June 2018 (UTC)

Because we are talking about a thing, and because that thing has an internationally defined set of terminologies, we should use the internationally defined set of terminologies for that thing in our own work. To make up our own terminologies will only serve to confuse.

—Trappist the monk (talk) 12:19, 30 June 2018 (UTC)

I could buy that if the purpose is to codify ISO specs in our category structure. But it appears to be to categorize by language. The ISO technical term could be included in a note in the category page, which could live at a name that made sense to normal people. :-) I.e., it's the WP:RECOGNIZABLE factor. — SMcCandlish ☏ ¢ 😼 05:48, 1 July 2018 (UTC)

In general, {{lang}} has always used names from the ISO specs for language categorization. The only difference now is that this subset of category names accurately reflects their collective nature; that several individual languages are associated with these codes. Yeah, the documentation for the categories sucks. It is on my to-do list to improve / replace it.

—Trappist the monk (talk) 12:26, 1 July 2018 (UTC)

It makes sense to use ISO codes for the params, because they're short, and easy to get from the infoboxes at the languages' articles. That doesn't really relate to categories; ISO geekery isn't helpful in them. The "collective" (ambiguous word, and not very accurate – more like "unspecific") nature of a few codes and corresponding categorizations is better explained in more natural English, like "Category:Articles containing text in an unspecified Germanic language". — SMcCandlish ☏ ¢ 😼 23:30, 1 July 2018 (UTC)

@Trappist the monk: I was noticing in the table at Template:ISO 639 name/doc that at least one category that was moved to one of these "collective" names is redlinking in that table. I don't know if the underlying code needs to change to use a different category now, or whether it's just that the /doc needs to be updated to use it. — SMcCandlish ☏ ¢ 😼 18:44, 21 July 2018 (UTC)

I think that bh is the only two-character language code (ISO 639-1) that identifies a language collective; all other language collectives are three-character codes. The category display is created in Template:ISO 639 name/doc by the documentation template:

{{Template:ISO 639 name/doc/row|bh|bih}}

You can make it show a blue-link to the same category that {{lang}} uses by changing the above to:

{{Template:ISO 639 name/doc/row|bh|bih|||[[:Category:Articles with text from the Bihari languages collective]]}}

—Trappist the monk (talk) 19:11, 21 July 2018 (UTC)

Further, I'm not really sure why that template documentation lists categories. Neither {{ISO 639 name}} nor {{llink}} adds categories when used. And, of course, the article link should be to Bihari languages (plural).

—Trappist the monk (talk) 19:18, 21 July 2018 (UTC)

Name parameter

Currently to make a link to a language we have to write [[French language|French]] which is cumbersome even for one or two languages, but really a pain when one needs to list a dozen or more. What if we do {{lang-fr|n=1}} to create a language name with a wikilink?--Lüboslóv Yęzýkin (talk) 14:39, 15 July 2018 (UTC)

It is not the purpose of the {{lang-??}} templates to act as editor typing shortcuts. I do not think that such facility should be added to the {{lang-??}} remit.

—Trappist the monk (talk) 23:27, 15 July 2018 (UTC)

@Любослов Езыкин: Yeah, it would probably make more sense to create something like {{lsc|fr}}, etc. (for "language shortcut"). If it were shoehorned into {{lang}}, it would require a parameter, something like {{lang|fr|sc=only}}, which would eliminate any convenience. — SMcCandlish ☏ ¢ 😼 02:54, 16 July 2018 (UTC)

I suggested "n=1"; what could be shorter? Of course, you can create a parameter as verbose as possible to prove it inconvenient.--Lüboslóv Yęzýkin (talk) 12:30, 16 July 2018 (UTC)

That's contradictory. You want a non-mnemonic "n=1" here – which isn't going to mean much to anyone or be remembered – but object to a short template name below on the grounds that it's non-mnemonic. — SMcCandlish ☏ ¢ 😼 14:44, 16 July 2018 (UTC)

N = name. You create a language name from an ISO code. Exactly like "ISO 639 name", but wiki-linked. Why are you so sure that your "(l)sc" is going to mean much to everyone or be remembered?--Lüboslóv Yęzýkin (talk) 19:20, 16 July 2018 (UTC)

I also don't find lsc very obvious. It suggests "language script" to me because of the |sc= parameters on Wiktionary. I like langname better, but Template:langname is a redirect to Template:ISO 639 name right now. — Eru·tuon 19:33, 16 July 2018 (UTC)

There's pretty much no continuity between WP and Wikt templates. On en.WP, though, all the page-banner templates ({{Guideline}}, etc.) support |sc= parameters so "sc" = "shortcut" in en.Wikipedian minds already. L for language is pretty obvious. That said, I really don't care what it's called. I don't think of "name" when I think of languages so |n= doesn't seem mnemonic. And the primary maintainer (understander, for that matter) of the {{lang}} code isn't interested in integrating such a parameter, so it's probably a dead stick. — SMcCandlish ☏ ¢ 😼 09:11, 17 July 2018 (UTC)

PS: I assume there'd be an easy way to get {{lsc}} to use the same language list as {{lang}}. — SMcCandlish ☏ ¢ 😼 02:55, 16 July 2018 (UTC)

There is {{ISO 639 name}}, but it is indeed verbose and does not create a link. I'm not sure about the "lsc" name; it does not look very mnemonic to me. But I do not see why we should create another template. Why couldn't we make this template multipurpose? There are enough of them to get confused already.--Lüboslóv Yęzýkin (talk) 12:30, 16 July 2018 (UTC)

Module:Lang has the function name_from_tag() which, given a properly formatted IETF tag, returns the appropriate language name from the {{lang}} data set. That function was created for use as a documentation tool and as a possible replacement for {{ISO 639 name}} (because that template uses the expensive parser function call {{#ifexist:}}). name_from_tag() might be modified to accept a second argument so that name_from_tag('fr', link) would return [[French language|French]]. This functionality must not be part of the {{lang-??}} templates. However, that does not mean that you couldn't create {{lsc|fr}} or {{lang link|fr}} (with a redirect from {{langlk|fr}}) to accomplish what you want.

—Trappist the monk (talk) 13:49, 16 July 2018 (UTC)

I've tweaked Module:lang/sandbox so that:

{{#invoke:lang/sandbox|name_from_tag|fr}} → French → French

{{#invoke:lang/sandbox|name_from_tag|fr|}} → French → French

{{#invoke:lang/sandbox|name_from_tag|fr|link}} → French → French

{{#invoke:lang/sandbox|name_from_tag|gem|link}} → Germanic languages → Germanic languages

—Trappist the monk (talk) 15:00, 16 July 2018 (UTC)

Great. I'm not insisting on using "lang-xx"; "lang" is OK. But I'm afraid using simply "link" may cause problems, because it looks not like a parameter but like a word; in theory there can be {{lang|de|link}} > link. Better to make it more obvious that it isn't a word. Why not simply use zero: {{lang|de|0}}? It is the most obvious and shortest. Why is there "languages" in the last example? No need for that.--Lüboslóv Yęzýkin (talk) 19:20, 16 July 2018 (UTC)

No. You will not be using {{lang}} nor any of the {{lang-??}} templates for this. Create a new template where, internally, it looks something like this:

{{#invoke:lang/sandbox|name_from_tag|{{{1|}}}|link}}

where {{{1|}}} holds an IETF language tag and link is a word (could be anything really except spaces) that is used to tell name_from_tag() that it should render a linked output. If you want, you can write:

{{#invoke:lang/sandbox|name_from_tag|{{{1|}}}|{{#ifeq:{{{2|}}}|0||link}}}}

so that your template will default to linked rendering but can be switched to non-linked by setting the template's second positional parameter to 0. See these template mock-ups:

{{lang link|fr}} → French

{{lang link|fr|1}} → French

{{lang link|fr|0}} → French

Test your template using the sandbox version of the module and when you are ready, let me know and I'll update the live module.

—Trappist the monk (talk) 19:59, 16 July 2018 (UTC)
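The behavior of the mock-up can be sketched in Python. This is an illustration, not the module's Lua: the two-entry `LANGUAGE_NAMES` table stands in for the {{lang}} data set, the link target used for collective names is an assumption, and `lang_link` mimics only the simple case of the {{#ifeq:{{{2|}}}|0||link}} test.

```python
# Two-entry stand-in for the {{lang}} data set.
LANGUAGE_NAMES = {"fr": "French", "gem": "Germanic languages"}


def name_from_tag(tag, link=""):
    """Return the language name, wikilinked when `link` is non-empty."""
    name = LANGUAGE_NAMES[tag.lower()]
    if not link:
        return name
    if name.endswith("languages"):
        # Assumed: collective names link to the article of the same name.
        return f"[[{name}]]"
    return f"[[{name} language|{name}]]"


def lang_link(tag, second=""):
    """Linked by default; a second positional value of 0 turns the link off."""
    return name_from_tag(tag, "" if second.strip() == "0" else "link")


print(lang_link("fr"))               # [[French language|French]]
print(lang_link("fr", "1"))          # [[French language|French]]
print(lang_link("fr", "0"))          # French
print(name_from_tag("gem", "link"))  # [[Germanic languages]]
```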

It exactly creates a wiki-linked language name, but just followed by a colon and a word. I do not see any problem in allowing not to write the latter.--Lüboslóv Yęzýkin (talk) 12:30, 16 July 2018 (UTC)

Can anybody explain how this template/module construct is in any way an improvement over typing "French"? -- Michael Bednarek (talk) 01:46, 17 July 2018 (UTC)

It wouldn't be. The idea is that it would be an improvement over typing [[French language|French]], perhaps more to the point [[Proto-Indo-European language|Proto-Indo-European]]. — SMcCandlish ☏ ¢ 😼 02:04, 17 July 2018 (UTC)

a) Linking "French language" is very rarely a good idea. b) Creating and testing a new template {{lang link}} and then typing {{lang link|fr|1}} (18 characters) doesn't seem a great improvement on [[French language|French]] (26), which can be shortened to [[ISO 639:fr|French]] (21). c) I haven't seen "Proto-Indo-European language" in this thread before, but if that is too much effort, typing just Proto-Indo-European will have the same effect. -- Michael Bednarek (talk) 05:40, 17 July 2018 (UTC)

The linking you don't like is a built-in feature of {{lang-fr}}, and that link is used, on first occurrence, in the majority of articles in which the template is used. So, I don't think consensus agrees with you on "very rarely". Even if it did, it wouldn't pertain to, say, Sesotho language, unfamiliar even by name to most English-speakers. You can't make an argument against an entire template class on the basis that one of thousands of potential uses of it might be unnecessary. Why would I want to type [[Proto-Indo-European]] if I can instead do {{lsc|pie}}? The fact that your lang link example isn't much of a savings is why I advocated a three-letter template name (or at least a shortcut redirect). All that said, I'm not entirely convinced we need such a template, either; I just think these are shaky rationales against one. — SMcCandlish ☏ ¢ 😼 15:41, 17 July 2018 (UTC)

@Trappist the monk: Thanks for your explanation, but I still haven't got the answer why using "lang-xx" or "lang" is "verboten". Right now

{{lang-fr|la}}

creates

<a href="/wiki/French_language" title="French language">French</a>: <i><span lang="fr">la</span>

What is wrong if we allow leaving out the span part?

{{lang|fr|la}}

is more tricky because it only creates

<span title="French language text" lang="fr">la</span>

But again, what is wrong if we expand the template making it multi-purpose?--Lüboslóv Yęzýkin (talk) 15:38, 21 July 2018 (UTC)

Yes, I know what the output of these templates looks like. The purpose of the {{lang-??}} templates is to do as you have illustrated: properly mark up non-English text so that browsers and screen readers render/pronounce that non-English text correctly. You desire something different; you desire something that doesn't help browsers/screen readers render/pronounce non-English text; you desire an editor's shortcut template so that you don't have to write out complete wikilinks to language articles. I have some sympathy for that desire and have suggested a way for you to get what you want that does not require {{lang-??}} to deviate from its intended purpose nor complicate Module:lang.

—Trappist the monk (talk) 17:54, 21 July 2018 (UTC)

@Erutuon: We may usurp "langname", as it's practically not used at all; however, it is a little bit too long. My potential candidates are "langn", "langl", "lana", "lali", "liso" or whatever: I'm open to discussion. But I would be happier with the already existing "lang-xx"/"lang".--Lüboslóv Yęzýkin (talk) 15:38, 21 July 2018 (UTC)

@SMcCandlish: "I don't think of 'name' when I think of languages": but the people who created "ISO 639 name" did. But OK, why not have aliases like "n=1" and "l=1" and let others choose what they want? "primary maintainer...isn't interested in integrating such a parameter": who is this guy?

@Michael Bednarek: Alright, let's count. [[|]] and {{|}} seem equal (5 chars). But in what world is a two/three-letter code longer than the full name, which may be quite long? In what world is repeating the full language name and adding the word " language" (9 chars) shorter than any template? And why do we have to use a long template name ("lang name" is just a suggestion), when we can easily limit ourselves to three to five letters? If you do not know why, how and where, just do not use it; what is the problem? As for myself, just recently I had to write such a line:

[[Karachay-Balkar language|Karachay-Balkar]], [[Karaim language|Karaim]], [[Krymchak language|Krymchak]], [[Kumyk language|Kumyk]], [[Urum language|Urum]] and extinct [[Cuman language|Cuman]]

In what world is this short and easy? I had to type every language name, then add [[|]] to every name, then repeat every name again, and then add " language" after every name. And it is always prone to errors/typos. Yes, I can copy-paste (and I do), but why should I have to? Another scenario: I just copy-paste a template as many times as I want and just add two-letter codes. Feel the difference, right? And such cases occur in (tens of) thousands of articles, so I wonder why such an option has not been created before. The need for it is so obvious. We have {{flag}}, let's do this for languages! --Lüboslóv Yęzýkin (talk) 15:38, 21 July 2018 (UTC)

The main maintainer is Trappist the monk, who's done the months of work to get the language templates within the realm of "working properly and sensibly". I'm not sure who understands all the code behind this other than him. — SMcCandlish ☏ ¢ 😼 18:42, 21 July 2018 (UTC)

I, too, end up having to manually enter things like your Karachay-Balkar ... example, and it also drives me nuts. Thus I threw a few minutes at it, and now – for common and semi-common languages, and without extended data like sa-Latn or pt-BR (the second part just gets ignored, except for simplified and traditional Chinese) – you have {{llink}} at your disposal. It safely substitutes. This would probably be better redone as a call to {{lang}}'s smarts; for now it's just a variant of {{ISO 639 name}} (which should likewise be converted to use the better routines that fuel {{lang}} and {{lang-xx}}). If you want to create {{lang link}} or {{langlk}} or whatever redirects, knock yaself out. I skipped {{lsc}}, because it occurs to me that using three-letter shortcut names is a poor idea; ISO might actually pick that combination for a language code at some point! Not likely for any particular string, but still. — SMcCandlish ☏ ¢ 😼 18:42, 21 July 2018 (UTC)

Update

Update: {{ISO 639 name link/sandbox}} (and {{ISO 639 name/sandbox}}) are now using {{#invoke:lang/sandbox|name_from_tag|{{{1}}}|link}} (and ...|{{{1}}}|0}} for the latter), as Trappist the monk has been sandboxing above. It works in general, but partially fails for some specific cases. E.g.:

{{ISO 639 name link/sandbox|en-AU}} correctly outputs Australian English: {{ISO 639 name link/sandbox|en-AU}}
{{ISO 639 name link/sandbox|pt-BR}} outputs only Portuguese instead of Brazilian Portuguese: {{ISO 639 name link/sandbox|pt-BR}}

Still, it should be an incremental improvement, able to handle more language codes than the old, still-live code of {{ISO 639 name}} can. — SMcCandlish ☏ ¢ 😼 14:31, 22 July 2018 (UTC)
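One way the en-AU / pt-BR difference reported above could arise is a lookup that checks a small table of full-tag overrides first and otherwise falls back to the primary language subtag. The Python sketch below illustrates that idea only; the tables and the function body are hypothetical and are not taken from Module:lang or its data pages.

```python
from typing import Optional

# Hypothetical data for illustration; the real lists live in the module's data pages.
OVERRIDES = {"en-au": "Australian English"}
LANGUAGES = {"en": "English", "fr": "French", "pt": "Portuguese"}


def name_from_tag(tag: str) -> Optional[str]:
    """Return a language name for an IETF tag, or None when the code is unknown."""
    tag = tag.lower()
    if tag in OVERRIDES:
        # A full-tag entry wins when one exists.
        return OVERRIDES[tag]
    # Otherwise only the primary subtag is consulted, so the region is ignored.
    primary = tag.split("-")[0]
    return LANGUAGES.get(primary)


print(name_from_tag("en-AU"))
print(name_from_tag("pt-BR"))
```

Under these assumptions, getting "Brazilian Portuguese" would only need another override entry; no change to the lookup itself.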

PS: The table of supported codes at Template:ISO 639 name/doc will need an update. We should probably keep a master list at something like Template:Lang/codelist and transclude it as a collapsed box in the documentation of all the templates in the set. Maybe even auto-generate it. — SMcCandlish ☏ ¢ 😼 14:37, 22 July 2018 (UTC)

{{lang}}, {{ISO 639 name/sandbox}} and {{ISO 639 name link/sandbox}} support 7973 language codes (the full list is at Module:Language/data/iana languages). That's rather a large number of codes to include in the documentation.

—Trappist the monk (talk) 15:28, 22 July 2018 (UTC)

Ah! Just referencing them should be sufficient. — SMcCandlish ☏ ¢ 😼 02:16, 25 July 2018 (UTC)

~~Are you sure? This should not produce linked output, right? But it does:~~

~~{{ISO 639 name/sandbox|fr}} → French → French~~ fixed

I was thinking this morning about changing the {{#invoke:}} so that it required a named parameter to select the linked/unlinked state because, as you have demonstrated, an empty positional parameter is sometimes misunderstood by humans. My original selection mechanism was:

{{#ifeq:{{{2|}}}|0||link}} – where {{{2|}}} is a template parameter, not a module parameter

which sets the {{#invoke:}} positional parameter {{{2|}}} to link when the template's positional parameter {{{2|}}} is anything but 0; else the {{#invoke:}} positional parameter {{{2|}}} gets nil. Naming the {{#invoke:}} link option parameter should prevent that confusion. If I change the module then the templates would read:

{{ISO 639 name/sandbox}} → {{#invoke:lang/sandbox|name_from_tag|{{{1|}}}|link=no|template=ISO 639 name/sandbox}}

{{ISO 639 name link/sandbox}} → {{#invoke:lang/sandbox|name_from_tag|{{{1|}}}|link=yes|template=ISO 639 name/sandbox}}

~~Shall I?~~ done

—Trappist the monk (talk) 15:28, 22 July 2018 (UTC)

It is done.

—Trappist the monk (talk) 15:47, 22 July 2018 (UTC)

Yeah, that's a bit clearer. — SMcCandlish ☏ ¢ 😼 02:17, 25 July 2018 (UTC)

live update

I have updated Module:lang from its sandbox.

—Trappist the monk (talk) 10:49, 25 July 2018 (UTC)

@Trappist the monk: Must be why I feel that sudden sense of vim and vigor! Oh, could you have a look at MOS:TITLES#Typographic conformity and see if it gets at what it needs to get at for CS1 purposes and such? (Yeah, wrong page, but we're both here.) If it's not quite "it", feel free to user-talk ping me or something. — SMcCandlish ☏ ¢ 😼 00:25, 27 July 2018 (UTC)

This is answered at Help_talk:Citation_Style_1#Polluting_COinS_with_markup?

—Trappist the monk (talk) 23:24, 28 July 2018 (UTC)

BUG: Current template/Module does not support multi-paragraph usage...

See:

https://en.wikipedia.org/w/index.php?title=Lojban&action=edit&section=20

Here the lang template is used with multiple paragraphs.

The expansion thereof is

<blockquote class="templatequote" ><i><span lang="jbo" title="Lojban language text">'''la berti brife jo'u la solri'''<br /><br />ni'o la berti brife jo'u la solri pu troci lo ka cuxna lo poi me vo'a vau traji be lo ka vlipa i ca'o bo lo pa litru noi dasni lo glare kosta cu klama

.i le re mei pu simxu lo ka tugni fi lo nu lo traji be lo ka clira fa lo nu ce'u snada lo ka gasnu lo nu le litru co'u dasni le kosta cu traji lo ka vlipa

.i ba bo la berti brife co'a traji cupra lo brife i ku'i lo nu by by zenba lo ka cupra lo xo kau brife cu rinka lo nu le litru zu'e ri zenba lo ka se tagji le kosta i ba bo la berti brife co'u troci i ba bo la solri co'a glare dirce i ba zi bo le litru co'u dasni le kosta

.i se ki'u bo la berti brife cu bilga lo ka tugni fi lo nu la solri cu traji lo ka vlipa</span></i>
</blockquote>

This is CLEARLY malformed, as an implied <p> cannot be placed inside a <span>. ShakespeareFan00 (talk) 09:49, 4 August 2018 (UTC)

This is a known issue... --Izno (talk) 11:31, 4 August 2018 (UTC)

And what's the timescale for it being resolved? ShakespeareFan00 (talk) 11:52, 4 August 2018 (UTC)

No need to shout; demonstrations of anger will not speed a resolution.

When I looked at the rendered html of Lojban#The North Wind and the Sun using my browser's view-page-source context menu, I saw something rather different from the expansion you provided. For me, this:

<blockquote class="templatequote"><i><span lang="jbo" title="Lojban language text"><b>la berti brife jo'u la solri</b><br /><br />ni'o la berti brife jo'u la solri pu troci lo ka cuxna lo poi me vo'a vau traji be lo ka vlipa i ca'o bo lo pa litru noi dasni lo glare kosta cu klama
<p>.i le re mei pu simxu lo ka tugni fi lo nu lo traji be lo ka clira fa lo nu ce'u snada lo ka gasnu lo nu le litru co'u dasni le kosta cu traji lo ka vlipa
</p><p>.i ba bo la berti brife co'a traji cupra lo brife i ku'i lo nu by by zenba lo ka cupra lo xo kau brife cu rinka lo nu le litru zu'e ri zenba lo ka se tagji le kosta i ba bo la berti brife co'u troci i ba bo la solri co'a glare dirce i ba zi bo le litru co'u dasni le kosta
</p></span></i><p><i>.i se ki'u bo la berti brife cu bilga lo ka tugni fi lo nu la solri cu traji lo ka vlipa</i>
</p>
</blockquote>

This is clearly malformed. I have made a tweak to Module:Lang/sandbox that looks for sequential \n characters in the template's text parameter. When this condition is detected, the module switches to <div>...</div> tags which produces this:

<blockquote class="templatequote"><i><div lang="jbo" title="Lojban language text"><b>la berti brife jo'u la solri</b><br /><br />ni'o la berti brife jo'u la solri pu troci lo ka cuxna lo poi me vo'a vau traji be lo ka vlipa i ca'o bo lo pa litru noi dasni lo glare kosta cu klama
<p>.i le re mei pu simxu lo ka tugni fi lo nu lo traji be lo ka clira fa lo nu ce'u snada lo ka gasnu lo nu le litru co'u dasni le kosta cu traji lo ka vlipa
</p><p>.i ba bo la berti brife co'a traji cupra lo brife i ku'i lo nu by by zenba lo ka cupra lo xo kau brife cu rinka lo nu le litru zu'e ri zenba lo ka se tagji le kosta i ba bo la berti brife co'u troci i ba bo la solri co'a glare dirce i ba zi bo le litru co'u dasni le kosta
</p>
.i se ki'u bo la berti brife cu bilga lo ka tugni fi lo nu la solri cu traji lo ka vlipa</div></i>
</blockquote>

Since I was mucking about with span/div switching, I also added a test for list markup (see this archived discussion). For list markup, the module looks for \n immediately followed by one of *, ;, :, or #:

{{lang/sandbox|de|<nowiki />
:*a list item 
:*another list item
:*and yet another list item}}

gives:

:*a list item :*another list item :*and yet another list item

The <nowiki /> tag is required for the time being because Module:lang and its /sandbox use Module:Arguments which, by default, trims whitespace from positional parameters. This can be customized so that ultimately the <nowiki /> will not be required.

—Trappist the monk (talk) 14:37, 4 August 2018 (UTC)
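The span/div switching described above comes down to two tests on the template's text: a blank line (two sequential \n characters) or a \n immediately followed by list markup. A minimal Python sketch of that decision follows; the function name `wrapper_tag` is hypothetical and the module's actual Lua patterns may differ.

```python
import re

PARAGRAPH_BREAK = re.compile(r"\n\n")   # blank line, which MediaWiki turns into <p>
LIST_MARKUP = re.compile(r"\n[*;:#]")   # newline followed by wiki list markup


def wrapper_tag(text: str) -> str:
    """Choose the element that should carry the lang= attribute for this text."""
    if PARAGRAPH_BREAK.search(text) or LIST_MARKUP.search(text):
        return "div"    # block content cannot legally sit inside <span>
    return "span"       # plain inline text


print(wrapper_tag("casa"))
print(wrapper_tag("first paragraph\n\nsecond paragraph"))
print(wrapper_tag("\n:*a list item\n:*another list item"))
```

A single newline does not trigger the switch, which matches the description: only sequential newlines or list markup do.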

<i> must contain only phrasing content. Neither divs, nor paragraphs, nor lists are included as phrasing content. --Izno (talk) 15:45, 4 August 2018 (UTC)

The <i>...</i> tags wrapping the <span>...</span> tags were, I think, dictated by the vagaries of Tidy. One iteration of Module:lang did not use <span>...</span> for italicized text but instead used <i>...</i> tags with all of the appropriate attributes; if I recall correctly, Tidy buggered that up when a {{lang}} template was wrapped in wiki italic markup (nested <i>...</i> tags, one within the other, and Tidy discarded the more specific tags) so we went back to <i><span>...</span></i>. We tried using font-style:italic in the <span> tag's style attribute but editors complained about having to use !important in their personal css when they wanted to override specific languages with special fonts.

—Trappist the monk (talk) 16:42, 4 August 2018 (UTC)

The lang= attribute is valid on all HTML elements (HTML 4.0 and later), without exception; so we may use

<blockquote class="templatequote" lang="jbo" title="Lojban language text" style="font-style:italic;">'''la berti brife jo'u la solri'''<br /><br />ni'o la berti brife jo'u la solri pu troci lo ka cuxna lo poi me vo'a vau traji be lo ka vlipa i ca'o bo lo pa litru noi dasni lo glare kosta cu klama

.i le re mei pu simxu lo ka tugni fi lo nu lo traji be lo ka clira fa lo nu ce'u snada lo ka gasnu lo nu le litru co'u dasni le kosta cu traji lo ka vlipa

.i ba bo la berti brife co'a traji cupra lo brife i ku'i lo nu by by zenba lo ka cupra lo xo kau brife cu rinka lo nu le litru zu'e ri zenba lo ka se tagji le kosta i ba bo la berti brife co'u troci i ba bo la solri co'a glare dirce i ba zi bo le litru co'u dasni le kosta

.i se ki'u bo la berti brife cu bilga lo ka tugni fi lo nu la solri cu traji lo ka vlipa
</blockquote>

which produces:

la berti brife jo'u la solri

ni'o la berti brife jo'u la solri pu troci lo ka cuxna lo poi me vo'a vau traji be lo ka vlipa i ca'o bo lo pa litru noi dasni lo glare kosta cu klama
.i le re mei pu simxu lo ka tugni fi lo nu lo traji be lo ka clira fa lo nu ce'u snada lo ka gasnu lo nu le litru co'u dasni le kosta cu traji lo ka vlipa
.i ba bo la berti brife co'a traji cupra lo brife i ku'i lo nu by by zenba lo ka cupra lo xo kau brife cu rinka lo nu le litru zu'e ri zenba lo ka se tagji le kosta i ba bo la berti brife co'u troci i ba bo la solri co'a glare dirce i ba zi bo le litru co'u dasni le kosta
.i se ki'u bo la berti brife cu bilga lo ka tugni fi lo nu la solri cu traji lo ka vlipa

Notice also how the <i>...</i> tags (which like <span>...</span> can only contain inline content) have been turned into a style="font-style:italic;" attribute. --Redrose64 🌹 (talk) 19:34, 4 August 2018 (UTC)

But, alas, in the configuration that OP presented (one that I suspect most editors would choose), {{lang}} is unaware that it is wrapped in <blockquote>...</blockquote> tags so cannot rewrite the <blockquote> tag as you have done. We might, in a future version of {{lang}}, offer a parameter that would allow editors to specify one of a limited set of tags in which to wrap the text – I have not thought this out so it may not be workable.

Because Tidy has gone away, it appears that we can go back to using <i>...</i> tags for italicized text. I have tweaked Module:lang/sandbox. From this:

''{{lang/sandbox|es|casa}}''

{{lang/sandbox}} produces this:

''casa'' → casa

which is not 'tidied' but is rendered in the html as:

casa

The issue of italics when text contains \n\n, \n*, \n;, \n:, or \n# requires {{lang}}, I think, to convert this:

:{{lang/sandbox|de|
:*a list item 
:*another list item
:*and yet another list item}}

to this:

:<div lang="de" title="German language text">
:*<i>a list item</i>
:*<i>another list item</i>
:*<i>and yet another list item</i></div>

which MediaWiki then renders as this html:

<dl><dd><div lang="de" title="German language text">
<ul><li><i>a list item</i></li>
<li><i>another list item</i></li>
<li><i>and yet another list item</i></li></ul></div></dd></dl>

—Trappist the monk (talk) 14:17, 5 August 2018 (UTC)
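The conversion proposed above (one enclosing <div> carrying the language attributes, with each list item italicized separately) can be sketched in Python as follows. The function `wrap_list` is a hypothetical illustration, not the module's Lua code, and it assumes the attribute values need no HTML escaping.

```python
import re

# One or more list-markup characters, then the item text without surrounding spaces.
LIST_ITEM = re.compile(r"^([*;:#]+)\s*(.*?)\s*$")


def wrap_list(text: str, lang: str, title: str) -> str:
    """Wrap list wikitext in a <div> and italicize each list item individually."""
    lines = []
    for line in text.split("\n"):
        match = LIST_ITEM.match(line)
        if match:
            # Keep the markup outside the <i> so MediaWiki still sees a list item.
            lines.append(f"{match.group(1)}<i>{match.group(2)}</i>")
        else:
            lines.append(line)  # leave non-list lines (such as the leading empty one) alone
    return f'<div lang="{lang}" title="{title}">' + "\n".join(lines) + "</div>"


print(wrap_list("\n:*a list item \n:*another list item", "de", "German language text"))
```

Keeping the italic tags inside each item is what avoids wrapping block-level list elements in an inline element.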

I've hacked at Module:lang/sandbox and have had success with list markup. For example, this:

:{{lang/sandbox|de|
:*a list item 
:*another list item
:*and yet another list item}}

renders like this:

a list item
another list item
and yet another list item

with this html:

<dl><dd><div lang="de" title="German language text">
<ul><li><i>a list item </i></li>
<li><i>another list item</i></li>
<li><i>and yet another list item</i></li></ul></div></dd></dl>

But implied <p>...</p> (from \n\n) does not work, though the failing can't, I think, be laid at the feet of Module:lang/sandbox.

{{lang|de|
a line item 
another line item

and yet another line item}}

makes this html:

<p><i><span lang="de" title="German language text">a line item 
another line item
</span></i></p><p><i>and yet another line item</i>
</p>

the last line is outside of the enclosing span:

a line item another line item

and yet another line item

The same 'text' using Module:lang/sandbox wraps the whole in <div>...</div> tags gives this html:

<div lang="de" title="German language text"><i>a line item  another line item</i>
<i>and yet another line item</i></div>

which renders:

a line item another line item

and yet another line item

What to do? Detect \n\n in 'text' and throw an error? Detect \n\n in 'text' and insert <p>...</p> tags in the appropriate places? If we throw an error, do we recommend <poem>...</poem> markup? Probably not because this:

{{lang/sandbox|de|
<poem>a line item 
another line item

and yet another line item</poem>}}

produces this flawed html (the module doesn't produce <div>...</div> because all it sees is the poem stripmarker):

<i lang="de" title="German language text"><div class="poem">
<p>a line item <br />
another line item<br />
<br />
and yet another line item
</p>
</div></i>

We could detect the poem stripmarker, wrap the whole in <div>...</div> tags and lose auto-italics.

What to do with \n\n in 'text'?

—Trappist the monk (talk) 12:21, 14 August 2018 (UTC)

isolation

Please consider to add style="unicode-bidi:isolate" - this should resolve issues such as the mixup in Mahmud of Ghazni. Thanks, Eran (talk) 20:24, 15 August 2018 (UTC)

I haven't given this any thought but my immediate reaction is that this:

{{lang|fa|محمود غزنوی}}; 2 November 971 – 30 April 1030

محمود غزنوی; 2 November 971 – 30 April 1030

should have been written like this:

{{lang|fa|محمود غزنوی|rtl=yes}}; 2 November 971 – 30 April 1030

محمود غزنوی; 2 November 971 – 30 April 1030

Doing this serves two purposes: first, it makes the rendering correct because |rtl= tells the browser that the <span>...</span> tags in the template rendering enclose right-to-left text; and second, in the wikisource, it disentangles the right-to-left text (the Persian text), the ambiguous text (the digits can go either way), and the left-to-right text (English text).

—Trappist the monk (talk) 21:53, 15 August 2018 (UTC)

|rtl=yes could be partially automated using script detection, like italicization: look up the scripts of the characters in the text and if all of them are right-to-left scripts or not real scripts (Zinh, Zyyy, Zzzz), add dir="rtl". I've created an is_rtl function in Module:Unicode data and added it to Module:Lang/sandbox. It uses rtl_scripts from Module:Lang/data via Module:Unicode data/scripts. With this change, the example above displays correctly without |rtl=yes:

{{lang/sandbox|fa|محمود غزنوی}}; 2 November 971 – 30 April 1030
محمود غزنوی; 2 November 971 – 30 April 1030

|rtl=yes would sometimes still be needed: for instance, when there are Latin-script characters in Arabic-script text. — Eru·tuon 22:44, 15 August 2018 (UTC)

Is there a problem? is_rtl():

if not (rtl[script] or script == "Zyyy" or script == "Zinh"
		or script == "Zzzz") then
	return false
end

A string entirely composed of Zyyy, Zinh, and / or Zzzz characters is, by this definition, right-to-left, right? Yet the same string of characters is also Latin (if I understand what is_Latin() does). Perhaps one or both of those functions need a tweak to require that the source string contains some minimal percentage of characters that are members of the target script(s)?

—Trappist the monk (talk) 23:36, 15 August 2018 (UTC)

@Trappist the monk: Yes, that's the way the functions behaved. I've changed them to be more accurate, meaning now they will return false instead of true if their input consists of nothing but Zinh, Zyyy, or Zzzz characters. Do you think there would be a benefit to requiring a certain percentage of Latin or right-to-left characters?

I intentionally didn't fix is_Latin, because the text supplied to {{lang}} almost always contains letter characters belonging to a particular script. But for is_rtl the inaccuracy would have a greater cost, and is_Latin might as well be more accurate too.

Now that I look into it a little, I wonder if tying right-to-left direction to the script is correct, because bidirectional class is the actual basis for the Unicode bidirectional algorithm (see Bi-directional text § Table of possible BiDi-types). I don't know how exactly the text direction in HTML or CSS relates to the bidirectional classes of characters. But English Wiktionary ties text direction to script (see wikt:MediaWiki:Common.css) and it seems to work well enough there. — Eru·tuon 02:50, 16 August 2018 (UTC)

I don't know what a minimal percentage might be. It could be that a string composed of only a single proper right-to-left character and any number of Zyyy, Zinh, and / or Zzzz characters is sufficient. This, I think is pretty much what is_rtl() now recognizes, right?

I would be perfectly content not to automate rtl detection because that mechanism doesn't fix the mishmash that occurs in the edit window when left-to-right digits follow right-to-left text as is the exemplar case here.

—Trappist the monk (talk) 14:58, 16 August 2018 (UTC)

@Trappist the monk: Yes, that's how the function behaves right now. — Eru·tuon 18:39, 16 August 2018 (UTC)
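The rule agreed on above (at least one genuine right-to-left character, no left-to-right ones, neutral characters ignored) can be sketched in Python. Note the substitution: Python's standard library has no script property, so this sketch uses Unicode bidirectional classes instead of the script lookup that Module:Unicode data performs. It therefore illustrates the rule, not the module's implementation.

```python
import unicodedata

RTL_CLASSES = {"R", "AL"}   # strong right-to-left classes (Hebrew, Arabic, ...)
LTR_CLASSES = {"L"}         # strong left-to-right class (Latin, Cyrillic, ...)


def is_rtl(text: str) -> bool:
    """True when text has at least one strong RTL character and no strong LTR ones."""
    saw_rtl = False
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in LTR_CLASSES:
            return False    # mixed text: leave the decision to |rtl=
        if bidi in RTL_CLASSES:
            saw_rtl = True
        # digits, spaces and punctuation are neutral or weak and are ignored
    return saw_rtl


print(is_rtl("محمود غزنوی"))
print(is_rtl("2 November 971"))
print(is_rtl("971; 1030"))
```

A string of nothing but digits and punctuation returns False here, which corresponds to the corrected behaviour for all-Zyyy input discussed above.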

Incorrect text direction in the textbox doesn't take away from the fact that automatic detection would make the template more convenient. (I'm not seeing a mishmash in the textbox here, unless I copy the example as plain text and paste it into the textbox, but I've seen it elsewhere, for instance in wikt:Module:ckb-translit.)

Editors wouldn't have to add |rtl=yes and there would be fewer text-direction rendering errors, because dir="rtl" would almost always be added when necessary even if the editor knows nothing about it. If automatic detection had already been implemented, this thread wouldn't exist, because the rendering error wouldn't have happened.

The cases in which automatic detection would fail because the text contains both right-to-left and left-to-right scripts seem to be rare. For instance, a search for text containing Arabic-block and basic Latin characters in {{lang}}(hastemplate:"lang" insource:/\{\{ *lang *\|[^|]+\| *([؀-ۿ]+[؀-ۿ ]*[a-zA-Z]|[a-zA-Z][؀-ۿ ]*[؀-ۿ]+)/) only gives a few results, most of which need to be corrected so that English text or transliteration isn't incorrectly tagged. — Eru·tuon 23:41, 18 August 2018 (UTC)
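The mixed-script part of that search pattern can be tried outside the search engine. The Python sketch below reuses only the Arabic-block/basic-Latin alternation and leaves out the part that matches the {{lang|...| prefix; the function name is hypothetical.

```python
import re

ARABIC = "\u0600-\u06FF"  # the Arabic block, as in the search pattern above

# Arabic run followed by a Latin letter, or a Latin letter followed by an Arabic run.
MIXED = re.compile(
    rf"[{ARABIC}]+[{ARABIC} ]*[a-zA-Z]|[a-zA-Z][{ARABIC} ]*[{ARABIC}]+"
)


def mixes_arabic_and_latin(text: str) -> bool:
    """True when Arabic-block and basic Latin letters are adjacent (spaces allowed)."""
    return MIXED.search(text) is not None


print(mixes_arabic_and_latin("محمود غزنوی"))
print(mixes_arabic_and_latin("محمود Mahmud"))
```

Texts flagged this way are the ones where automatic direction detection could not decide and |rtl= or a split into two templates would still be needed.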

In my browser's edit box, latest chrome win7, the exemplar template looks like this:

{{lang|fa|2 ;{{<rtl text> November 971 – 30 April 1030

Auto-detecting rtl in {{lang}} won't unscramble the stuff that editors work with.

—Trappist the monk (talk) 10:37, 19 August 2018 (UTC)

Oh yeah, so Firefox (my default browser) doesn't scramble the text, while Chrome does. — Eru·tuon 18:39, 19 August 2018 (UTC)

Unconverted lang- templates

Please would somebody look at the {{Lang-ug}} template to determine either (i) whether it should be converted to Module:Lang, or (ii) explain why it should not be converted. --Redrose64 🌹 (talk) 19:24, 22 August 2018 (UTC)

Because {{lang-ug}} renders the text using {{ug-textonly}}; because {{lang-ug}} has special code and parameters for romanization and other scripts; in short, because it doesn't fit neatly into the {{lang-??}} model. The {{ug-textonly}} issue might be solvable with template styles; the special parameter functions might be fixable by code tweaks; or not.

—Trappist the monk (talk) 20:01, 22 August 2018 (UTC)

Also can anybody explain why we have both Category:Articles containing Uighur-language text and Category:Articles containing Uyghur-language text? --Redrose64 🌹 (talk) 19:32, 22 August 2018 (UTC)

{{lang|ug|...}} takes the language name from Module:Language/data/iana_languages, which is a list of the accepted language codes used for IETF tagging. Code ug has two accepted spellings, the first of which (and the one used by {{lang}}) is Uighur; the second, Uyghur, is used by {{lang-ug}}. This is fixable given a determination of which name, Uighur or Uyghur, is preferred. (Chrome's spell-checker prefers Uighur; for whatever that is worth.)

—Trappist the monk (talk) 20:01, 22 August 2018 (UTC)

Bug: italicization of non-Latin text

This template has a bug where it italicizes non-Latin text if it is encoded as HTML entities. For example, {{lang|lab-Lina|𐘀}} produces upright 𐘀, but {{lang|lab-Lina|&#x10600;}} produces italicized 𐘀. Gorobay (talk) 00:58, 22 September 2018 (UTC)

One approach here (decoding HTML character references), or maybe in this case italicization could be turned off based on the script subtag (Lina) instead. — Eru·tuon 04:36, 22 September 2018 (UTC)

I've added a test to disable auto-italics when there is a script subtag that is not Latn. To test this I disabled your suggested fix; I tested my tweak with Cyrl because on my browser Linear A is just a little square box:

{{lang/sandbox|sh-cyrl|АБв}} → АБв

{{lang/sandbox|sh|АБв}} → АБв – fails because of the 'x'

{{lang/sandbox|sh|АБВ}} → АБВ

It seems to me that the mw.text.decode() fix for this properly belongs in Module:Unicode data.

—Trappist the monk (talk) 12:37, 22 September 2018 (UTC)
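The script-subtag test described above can be sketched in Python. The helper names are hypothetical and the tag parsing is deliberately simplified (a four-letter alphabetic subtag after the language subtag is treated as the script, and scanning stops at a single-character singleton); it is not a full IETF language tag parser.

```python
import re
from typing import Optional


def script_subtag(tag: str) -> Optional[str]:
    """Return the four-letter script subtag in title case, or None if there is none."""
    for part in tag.split("-")[1:]:
        if len(part) == 1:
            break   # extension or private-use singleton: nothing after it is a script
        if re.fullmatch(r"[A-Za-z]{4}", part):
            return part.title()
    return None


def auto_italic_allowed(tag: str) -> bool:
    """Auto-italics stay on unless the tag names a script other than Latn."""
    script = script_subtag(tag)
    return script is None or script == "Latn"


print(auto_italic_allowed("sh-cyrl"))   # explicit Cyrillic script
print(auto_italic_allowed("sh"))        # no script subtag: decided from the text instead
print(auto_italic_allowed("lab-Lina"))  # Linear A
```

When the tag carries no script subtag, the decision still has to come from inspecting the text itself, which is where the character-reference problem enters.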

I'll work on that. I'd envisioned is_Latin as a function only operating on UTF-8-encoded characters, but on a wiki it's more convenient if it decodes HTML character references. [Edit: Done.] — Eru·tuon 20:11, 22 September 2018 (UTC)
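Why an undecoded character reference looks Latin, and how decoding first fixes it, can be shown with a short Python sketch. This is an approximation under stated assumptions: `is_latin` here is a hypothetical stand-in that recognises Latin letters by their Unicode character names, not the script-data lookup used by is_Latin in Module:Unicode data.

```python
import html
import unicodedata


def is_latin(text: str, decode: bool = True) -> bool:
    """True when every letter in text is Latin and there is at least one letter."""
    if decode:
        text = html.unescape(text)  # turn &#x410; and similar into real characters
    saw_latin = False
    for ch in text:
        if not ch.isalpha():
            continue                # digits, '&', '#', ';' and spaces are ignored
        if unicodedata.name(ch, "").startswith("LATIN "):
            saw_latin = True
        else:
            return False
    return saw_latin


entity_text = "&#x410;&#x411;&#x432;"         # Cyrillic letters as hex references
print(is_latin(entity_text, decode=False))    # the only letter seen is the 'x'
print(is_latin(entity_text))                  # decoded first: Cyrillic, not Latin
```

Without decoding, the only letters in the string are the 'x' characters of the hexadecimal references, which is the failure noted in the examples above.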

Linking "lit."?

If a lang-xx template is called with the addition of a literal translation (using the parameter |lit=), then this translation will be prefixed by the abbreviation "lit.". I've just noticed that this abbreviation will by default get displayed as a wikilink, as in the following example:

{{lang-sa|Savyasācin-|lit = ambidextrous}}: Sanskrit: Savyasācin-, lit. 'ambidextrous'

Now, this linking could be turned off via |link=no (which would also unlink the language name), but is the default behaviour desirable at all? First off, the meaning of the abbreviation is given in a tooltip anyway, so the addition of a link seems like a bit of an overkill. It adds visual clutter, and it doesn't quite fit into our linking philosophy: we don't link common terms whose articles are incidental to the text they appear in. – Uanfala (talk) 18:27, 23 September 2018 (UTC)

Module:Lang does not add a tool tip to the lit. link. The tool tip that you are seeing is the normally provided tool tip that comes with any Wiki link; Sanskrit in your example also has a tooltip. When |link=no Module:lang does not link to Literal translation but instead renders the 'lit.' static text wrapped in <abbr title="literal translation">...</abbr>. Module:lang does not do both.

—Trappist the monk (talk) 20:02, 23 September 2018 (UTC)
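The either/or behaviour described above (a wikilink when linking is on, an <abbr> when it is off, never both) can be sketched in Python. The function `render_literal` is hypothetical; only the link target and the abbr title are taken from the description.

```python
def render_literal(translation: str, link: bool = True) -> str:
    """Render the 'lit.' prefix either as a wikilink or as an <abbr>, never both."""
    if link:
        label = "[[Literal translation|lit.]]"
    else:
        label = '<abbr title="literal translation">lit.</abbr>'
    return f"{label} '{translation}'"


print(render_literal("ambidextrous"))
print(render_literal("ambidextrous", link=False))
```

Changing the default would amount to swapping which branch is taken when no |link= value is supplied.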

Sorry, my mistake: I had in mind the tooltip that usually comes with the abbr tags: that's the one visible when |link= is set to "no", as here:

{{lang-sa|Savyasācin-|lit = ambidextrous|link=no}}: Sanskrit: Savyasācin-, lit. 'ambidextrous'

So the question remains: why is the current default behaviour to have "lit." linked rather than formatted with abbr tags? – Uanfala (talk) 20:33, 23 September 2018 (UTC)

Module:Lang descends from several older templates. One of them is {{Language with name and transliteration}}, the source of the lit. link. Until now, no one has thought to question why lit. is linked – the link has been in that template from the beginning (9 November 2015).

—Trappist the monk (talk) 20:56, 23 September 2018 (UTC)

Are there times when the {{lang}} template should not be used with foreign words?

The template page lists multiple very good reasons for using it in the rationale section and I was wondering if there were cases in an article where use of the template should be specifically avoided or if there is such a thing as overuse of the lang template. —The Editor's Apprentice (talk) (contribs) 14:54, 27 September 2018 (UTC)

{{lang}} should never be used in any of the cs1|2 citation-template parameters listed here. Otherwise, I don't know of any other places where non-English text should not be wrapped in {{lang}}. If the template shows an error message, fix the template; if unfixable, report the circumstances here. It is possible to over-use any template because MediaWiki will only allow template expansion to 2MB; see (Post-expand include size).

—Trappist the monk (talk) 15:10, 27 September 2018 (UTC)

We still need to hash out a block version of this template, which I have been meaning to chase. That's one spot where this template should probably not be used ("technical limitations"). --Izno (talk) 17:37, 27 September 2018 (UTC)

Pretty sure that I did something about that:

{{lang|de|
*a list item 
*another list item
*and yet another list item}}

a list item
another list item
and yet another list item

<div title="German-language text"><div lang="de">
*<i>a list item </i>
*<i>another list item</i>
*<i>and yet another list item</i></div></div>

—Trappist the monk (talk) 18:36, 27 September 2018 (UTC)

There is a bit of guidance for the OP at MOS:FOREIGNITALIC about loanwords that have become English words. – Jonesey95 (talk) 18:38, 27 September 2018 (UTC)

Messed up formats

Could you guys maybe verify what recent changes have done to articles such as Luceafărul (poem)? Dahn (talk) 05:22, 4 October 2018 (UTC)

Done. {{verse translation}} has a |lang= parameter that should be used. — Eru·tuon 06:03, 4 October 2018 (UTC)

One word in multiple languages in one template?

Is there a way to put more than one ISO code into {{lang}}? For example, instead of saying:

The Huli language word is anga, and it is also anga in the Duna language.

The [[Huli language]] word is {{lang|hui|anga}}, and it is also {{lang|duc|anga}} in the [[Duna language]].

I could write: The word is anga in the Duna and Huli languages.

The word is {{lang|hui|duc|anga}} in the [[Duna language|Duna]] and [[Huli language]]s.

I tried in my sandbox, and I couldn't tell if it added the right categories. --Nessie (talk) 15:15, 22 October 2018 (UTC)

This will not work (and cannot be made to work). You will need to use a construction similar to your first, though you might employ semicolons and parentheses or some such. --Izno (talk) 15:31, 22 October 2018 (UTC)

Macrolanguage categories

Moved from User talk:BrownHairedGirl § Macrolanguage redirect

Hi, I'm not sure I immediately see the point of the redirect Nepali (macrolanguage) language. What am I missing? – Uanfala (talk) 12:37, 3 November 2018 (UTC)

Hi @Uanfala

I presume that before posting, you tried Special:WhatLinksHere/Nepali (macrolanguage) language. So you will have seen there that the redirect is linked to by Category:Articles containing Nepali (macrolanguage)-language text. Currently empty, but wasn't when I created it.

Such categories use a template which generates a link to "Foo language".

When "Foo" includes the word "language", the resulting ugly construct needs a redirect, which I created. As I have done for hundreds of other such cases. --BrownHairedGirl (talk) • (contribs) 12:46, 3 November 2018 (UTC)

Ah, it didn't occur to me to check that. Hmm, isn't Category:Articles containing Nepali (macrolanguage)-language text now superseded by Category:Articles containing Nepali-language text, which is what you get when calling {{lang}} with the language code set to either "nep" or "ne"? Using "npi" (the code for the individual language) seems to add the non-existent Category:Articles containing Nepali (individual language)-language text. Pinging Trappist the monk, who knows this system better than anyone else. – Uanfala (talk) 13:02, 3 November 2018 (UTC)

@Uanfala: yes, {{Lang|ne|text in Nepali (macrolanguage) language here}} no longer populates Category:Articles containing Nepali (macrolanguage)-language text, which I guess is why the cat is now empty.

To be honest, I am not much interested in this. I just created those cats to fill redlinks per WP:REDNOT. I presume that since I did so, some lookup table was modified. --BrownHairedGirl (talk) • (contribs) 13:10, 3 November 2018 (UTC)

Some language names defined in ISO 639-1, -2, -3 are parenthetically disambiguated by the custodians so those disambiguations are made part of the language lists that {{lang}} uses. Disambiguators are:

'macrolanguage' and 'individual language' – Dogri (macrolanguage) and Dogri (individual language)

country, state, or other political division names – Aniu (China) and Aniu (Japan)

time periods – Old English (ca. 450-1100), Egyptian (Ancient)

institutional names – Interlingua (International Auxiliary Language Association)

abbreviations – Hawai'i Sign Language (HSL)

Complicating all of this is Module:Language/data/wp languages, which contains a list of ISO 639 codes and names without disambiguation (this is where {{lang|ne|...}} gets 'Nepali'). The provenance of the data in this module is unknown and unclear, and as such, suspect. /wp languages existed before I created Module:Lang to consolidate most of the {{lang}} and {{lang-??}} templates using codes and names from the custodian sources (en.wiki should not be in the business of making up codes and names).

I'm not sure that Module:lang should remove disambiguators when adding categories though I suppose that it could be argued that removing 'macrolanguage' and 'individual language' would be acceptable. This, of course, is a topic for Template talk:Lang.

—Trappist the monk (talk) 14:53, 3 November 2018 (UTC)
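The option floated above, dropping only the 'macrolanguage' and 'individual language' disambiguators when building a category name, can be sketched as follows. This is an illustration in Python, not the Lua in Module:Lang; the function name category_name and the constant REMOVABLE are assumptions made for the example.

```python
# Only these two disambiguators are dropped; country names, time periods,
# institutional names, and abbreviations are left in place.
REMOVABLE = ("macrolanguage", "individual language")


def category_name(language_name):
    """Build a category name, stripping a removable disambiguator if present."""
    for tag in REMOVABLE:
        suffix = " (" + tag + ")"
        if language_name.endswith(suffix):
            language_name = language_name[: -len(suffix)]
            break
    return "Category:Articles containing " + language_name + "-language text"


for name in ("Nepali (macrolanguage)", "Nepali (individual language)", "Aniu (China)"):
    print(category_name(name))
```

Under this rule both Nepali names land in the same category, while a name such as Aniu (China) keeps its disambiguator and stays distinct from Aniu (Japan).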

Moved to the template talk page. – Uanfala (talk) 15:08, 3 November 2018 (UTC)

New parameter idea

Shouldn't there be a new parameter for Lang-XX templates, which will give appropriate punctuations, brackets, styling, etc., in accordance with what the subject is (like Book title, Episode title, Film title, etc.)? JSH-alive/^{talk/cont/mail} 08:09, 9 November 2018 (UTC)

If I understand what it is that you are suggesting, I think that I have already proposed a wrapper template for {{lang}} that would have done that. For want of support, the proposals died.

—Trappist the monk (talk) 09:47, 9 November 2018 (UTC)

Well, suppose I put a Korean title in {{lang-ko}}, and put a parameter named something like |instance=. If I give some value like "film title", "TV title", "album title" and "book title", the title will be wrapped in 《》, while other values like "song title", "episode title", "chapter title" and "article title" will make it wrapped in 〈〉. JSH-alive/^{talk/cont/mail} 14:20, 9 November 2018 (UTC)

This is much better left to the editor using the template. -- Michael Bednarek (talk) 14:37, 9 November 2018 (UTC)

{{lang}} and the {{lang-??}} templates attempt to adhere to en.wiki's manual of style. MOS:QUOTEMARKS prohibits curly (“”), low-high („ “), and guillemet (« ») quotation marks. I suspect that use of 〈〉 and 《》 as quotation marks is similarly inappropriate in en.wiki text, so for {{lang-ko|instance=...}} to add those marks would be a violation of the attempt-to-adhere-to-MOS rubric.

—Trappist the monk (talk) 14:53, 9 November 2018 (UTC)

I see. JSH-alive/^{talk/cont/mail} 15:02, 9 November 2018 (UTC)

Translation issue

In Module:lang, at line 1082, should the word 'language' inside single quotes be translated into the local language when localizing the module into Malayalam? I am getting around a thousand errors, as on the page [Hindi]. Adithyak1997 (talk) 18:44, 24 November 2018 (UTC)

I have done nothing to internationalize Module:Lang. It's on the TODO list but ...

The first thing that I noticed about ml:ഹിന്ദി is the script error. You don't have the current version of Module:Unicode_data/scripts.

—Trappist the monk (talk) 20:13, 24 November 2018 (UTC)

@Adithyak1997: Updated ml:ഘടകം:Unicode data/scripts to fix module error in ml:ഹിന്ദി. — Eru·tuon 02:34, 25 November 2018 (UTC)

Arabic in western alphabet

Why does {{lang|ar|[[Allahu Akbar]]}} render as Allahu Akbar, with a different font to the rest of this post, and no underscore? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 17:25, 12 December 2018 (UTC)

And without the link: {{lang|ar|Allahu Akbar}} as: Allahu Akbar. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 17:27, 12 December 2018 (UTC)

And without italics: {{lang|ar|Allahu Akbar|italic=no}} as: Allahu Akbar. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 17:29, 12 December 2018 (UTC)

For comparison; with no template: Allahu Akbar; and italicised: Allahu Akbar. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 22:54, 12 December 2018 (UTC)

Transliterated non-English text is rendered in italic font; your example is an Arabic transliteration written with Latin script. Perhaps the better template for this example is {{transl}}; cf:

{{lang|ar|[[Allahu Akbar]]}} → [[Allahu Akbar]] → Allahu Akbar

{{transl|ar|[[Allahu Akbar]]}} → [[Allahu Akbar]] → Allahu Akbar

alternatively:

{{lang|ar-Latn|[[Allahu Akbar]]}} → [[Allahu Akbar]] → Allahu Akbar

No underscore because you did not include an underscore in the wikilink text; here I did:

{{lang|ar|[[Allahu_Akbar]]}} → Allahu_Akbar

—Trappist the monk (talk) 18:10, 12 December 2018 (UTC)

Apologies; I meant underline, not underscore. Your first three examples all appear identical (to each other, and to my first example; but not to the template-free version) to me, in the latest version of Firefox, on recently-patched Win 10; and in the latest Opera on the same device. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 22:51, 12 December 2018 (UTC)

Probably the Arabic text has a different font because your browser is rendering text that is marked as Arabic (enclosed in an HTML tag with the attribute lang="ar") in a font appropriate for the Arabic script, even though the characters are actually from the Latin alphabet. I've seen a similar problem: Japanese romanization tagged with lang="ja" or lang="ja-Latn" being rendered in a font appropriate for Japanese script. (I think it makes the Latin characters extra wide.) Adding a script code (lang=ar-Latn) and putting a CSS rule like [lang*="Latn"] {font-family: inherit;} (language attribute includes "Latn") or [lang$="Latn"] {font-family: inherit;} (language attribute ends with "Latn") in your common.css should override the browser's font rules and force the text to render in the font of the surrounding paragraph. — Eru·tuon 20:44, 12 December 2018 (UTC)
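The difference between the two selectors suggested above ("includes" versus "ends with") can be modelled in a few lines of Python. This is only a sketch of the matching rule, not a CSS engine; the function name selector_matches is hypothetical, and the case-insensitive comparison is an assumption about how browsers treat lang values in attribute selectors.

```python
def selector_matches(lang_value, operator, needle="Latn"):
    """Model the CSS attribute selectors [lang*="..."] and [lang$="..."].

    Comparison is case-insensitive here; that is an assumption about how
    HTML treats lang values, so adjust it if your browser behaves otherwise.
    """
    value = lang_value.lower()
    target = needle.lower()
    if operator == "*=":  # attribute value contains the needle anywhere
        return target in value
    if operator == "$=":  # attribute value ends with the needle
        return value.endswith(target)
    raise ValueError("unsupported operator: " + operator)


for tag in ("ar", "ar-Latn", "ar-Latn-EG"):
    print(tag, selector_matches(tag, "*="), selector_matches(tag, "$="))
```

A tag with a trailing region subtag, such as ar-Latn-EG, is caught by the "includes" form but not by the "ends with" form, which may matter when choosing between the two rules.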

It hadn't occurred to me that the difference was browser-specific; I'll do some testing, later. Thanks for the CSS tips, but I'm more concerned about how it appears for general readers, than me personally. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 22:50, 12 December 2018 (UTC)

Use in links

Can anyone tell me why the markup in this diff failed, please? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 13:56, 14 January 2019 (UTC)

Because in mainspace, {{lang}} adds a category link:

[[Trial in absentia|tried <i lang="la" title="Latin language text">in absentia</i>[[Category:Articles containing Latin-language text]]]]

Add |nocat=yes.

—Trappist the monk (talk) 14:09, 14 January 2019 (UTC)
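The failure described above can be reproduced with a simplified model of the template output. The Python below is a sketch, not the code in Module:Lang: render_lang and has_nested_link are hypothetical names, and the markup they produce is copied from the example in this thread.

```python
def render_lang(code, text, language_name, nocat=False):
    """Simplified stand-in for the {{lang}} output in mainspace."""
    out = '<i lang="' + code + '" title="' + language_name + ' language text">' + text + "</i>"
    if not nocat:
        # The category link is what ends up nested inside the outer wikilink.
        out += "[[Category:Articles containing " + language_name + "-language text]]"
    return out


def has_nested_link(wikitext):
    """Return True if a [[...]] opens while another [[...]] is still open."""
    depth = 0
    i = 0
    while i < len(wikitext) - 1:
        pair = wikitext[i:i + 2]
        if pair == "[[":
            depth += 1
            if depth > 1:
                return True
            i += 2
        elif pair == "]]":
            depth = max(depth - 1, 0)
            i += 2
        else:
            i += 1
    return False


broken = "[[Trial in absentia|tried " + render_lang("la", "in absentia", "Latin") + "]]"
fixed = "[[Trial in absentia|tried " + render_lang("la", "in absentia", "Latin", nocat=True) + "]]"
print(has_nested_link(broken), has_nested_link(fixed))
```

With nocat set, no category link is appended, so the label of the outer wikilink no longer contains a second pair of double brackets.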