Template talk:Lang

Template:Lang is permanently protected from editing because it is a heavily used or highly visible template. Substantial changes should first be proposed and discussed here on this page. If the proposal is uncontroversial or has been discussed and is supported by consensus, editors may use {{edit template-protected}} to notify an administrator or template editor to make the requested edit. Usually, any contributor may edit the template's documentation to add usage notes or categories.

Any contributor may edit the template's sandbox. Functionality of the template can be checked using test cases.

This template (Language templates) was considered for deletion on 2006 February 20. The result of the discussion was "keep".


This page has archives. Sections older than 120 days may be automatically archived.

strange linebreak and space created for some languages

Is it possible that something in template:lang is broken?

See example : (တိုင်း in Burmese)

template:my which uses lang looks ok to me... --katpatuka (talk) 17:24, 22 April 2017 (UTC)

The above text looks fine to me. What are you trying to demonstrate? – Jonesey95 (talk) 22:45, 22 April 2017 (UTC)

It has already been fixed: there had been a linebreak in template:my that I didn't see. --katpatuka (talk) 04:59, 23 April 2017 (UTC)

How to display a Japanese ellipsis without resorting to ･･･?

Hello, I would like to display … as wikipedia does here. Wikipedia uses a template with "span lang", so I've tried to do it the same way, but the result is this. Is there any way to display … as in the aforementioned article? Seelentau (talk) 20:30, 1 July 2017 (UTC)[reply]

@Seelentau: What is wrong with using {{lang|ja|…}} which produces …, or indeed … which produces …? --Redrose64 🌹 (talk) 20:41, 1 July 2017 (UTC)[reply]

@Redrose64: I would like to use it in a wikia-wiki, where the lang-template doesn't exist. And … is displayed at the bottom of a letter height, whereas on said wikipedia page, the ellipsis is displayed in a similar fashion as ･･･ (which is just three ･). I would like to avoid using ･･･, but simply using … comes up as … for me. Seelentau (talk) 20:55, 1 July 2017 (UTC)[reply]

Have you tried using <span lang="ja">…</span> which is the HTML expansion of that template, and which produces …? Please note: span is an ordinary HTML element, lang is one of its permitted attributes. --Redrose64 🌹 (talk) 21:33, 1 July 2017 (UTC)

Yes, I've tried using span lang, and it does work here, but not over at wikia. Seelentau (talk) 21:40, 1 July 2017 (UTC)

Then all I can think of is that the fonts are different. --Redrose64 🌹 (talk) 21:55, 1 July 2017 (UTC)

Yup, was just about to write that. :) By changing it to sans-serif, it works: … produces … Seelentau (talk) 21:56, 1 July 2017 (UTC)[reply]

You can contract that - <span lang="ja" style="font-family:sans-serif">…</span> produces … - another reason for doing so is that the font element is obsolete. --Redrose64 🌹 (talk) 22:03, 1 July 2017 (UTC)

Ah, okay, I will do that, thank you! :) But one more problem is that I can't use it in article titles. For example, the song name "Unknown…Despair…a Lost" has to be titled "Unknown…Despair…a Lost". Is there no character that is actually … and not a modified …? Seelentau (talk) 22:15, 1 July 2017 (UTC)[reply]

Set the article title without attempting to style it. Then, at the top of the article, use

{{DISPLAYTITLE:Unknown<span lang="ja" style="font-family:sans-serif">…</span>Despair<span lang="ja" style="font-family:sans-serif">…</span>a Lost}}

- see mw:Help:Magic words#Technical metadata. --Redrose64 🌹 (talk) 22:28, 1 July 2017 (UTC)

Oh yes, displaytitle exists, completely forgot^^ I will do that, but do you know why the ellipsis is actually displayed this way? The background of all of this is Japanese, by the way, and their ellipsis is always displayed as …, but when I copy it (for example, from here), it's simply …. Then again, for some, … is displayed as … from the start... does Firefox not support that? Seelentau (talk) 22:36, 1 July 2017 (UTC)

@Seelentau: It looks like this has to do with the font that the browser chooses. In my browser (Chrome), the ellipsis character is displayed in the font Meiryo when marked as Japanese (…), but in the font Arial otherwise (…). To avoid this inconsistency, you might be able to use the "midline horizontal ellipsis" character (U+22EF, ⋯) instead of the horizontal ellipsis (U+2026, …), which displays the correct way even in Arial. But note that that is probably technically incorrect because U+22EF is in the mathematical operators block and categorized as a symbol rather than a punctuation character (FileFormat.info page). — Eru·tuon 01:17, 11 October 2017 (UTC)[reply]
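The category difference mentioned above can be checked with Python's standard unicodedata module. This is only an illustrative sketch; the helper name describe is made up for the example.

```python
import unicodedata

# The two characters compared in the comment above.
HORIZONTAL_ELLIPSIS = "\u2026"
MIDLINE_ELLIPSIS = "\u22ef"


def describe(ch):
    """Return (code point, Unicode name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


for ch in (HORIZONTAL_ELLIPSIS, MIDLINE_ELLIPSIS):
    print(describe(ch))
```

U+2026 is reported with general category Po (other punctuation) and U+22EF with Sm (math symbol), which is the mismatch that makes the substitution questionable.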

Flemish (Belgian Dutch)

{{Lang-nl-BE}} doesn't work, but should, and should render as "Flemish (Belgian Dutch)" probably. Some linguists classify Flemish as a language (or even multiple languages), not a dialect or dialect continuum of Dutch (much the way Scots is considered a separate language from English, not a variant of it), but I think we're stuck with nl-BE for now. ISO defines separate codes for two forms of Flemish, West Flemish and Limburgish, but not the other two, nor Flemish as a whole. Many sources for this or that which we may need to mark up with a template are not specific and just say "Flemish", so it's going to be original research for a Wikipedian to try to use one of the more specific ISO labels. However, just using {{lang-nl}} is inaccurate and a disservice to readers. Ergo, we need {{lang-nl-BE}}. — SMcCandlish ☺ ☏ ¢ ≽^ʌⱷ҅_ᴥⱷ^ʌ≼ 22:10, 21 August 2017 (UTC)

Does "nl-BE" conform to the existing template instructions at Template:Lang#Indicating regional variant? In other words, is "BE" the correct two-letter abbreviation? A sourced answer would be helpful. – Jonesey95 (talk) 23:16, 21 August 2017 (UTC)[reply]

nl is ISO 639-1 language code for Dutch; BE is ISO 3166-1 country code for Belgium. The 639 table also lists Flemish as a code nl language. See also code nld @ sil.org.

Flemish contributes extensively to the size of Category:CS1 maint: Unrecognized language because it is-a-language-that's-not-a-language. For cs1|2, we could modify Module:Citation/CS1 to accept |language=Flemish but we also require that there be a code from which we can render a language: |language=de → (in German). Code nl will always render as Dutch so we could, in lieu of a decision from recognized authorities, make up our own. nl-BE would work if there is nothing better.

—Trappist the monk (talk) 23:37, 21 August 2017 (UTC)

I have created two templates and a category to begin support for this language/dialect. Let me know if more are needed. – Jonesey95 (talk) 01:06, 22 August 2017 (UTC)

FYI MediaWiki does not recognize dialects and character sets (e.g. Simplified Chinese (a character set), British English (a dialect)). Flow 234 (Nina) talk 15:37, 20 September 2017 (UTC)

Fraternities and this template.

Is there any guidance as to whether this template should be used for Fraternities? For example, should it simply be ΦΒΚ or should it be ΦΒΚ (from {{lang|el|ΦΒΚ}})? Naraht (talk) 08:56, 23 September 2017 (UTC)

Completely incorrect advice

Presently there is some text that says:

Do not use quotation marks in your user style sheet; they may be misinterpreted as wikitext. While they are recommended in CSS, they are only required for font families containing generic-family keywords ('inherit', 'serif', 'sans-serif', 'monospace', 'fantasy', and 'cursive'). See the W3C for more details.

This is wrong on either every detail or almost every detail.

Quotation marks are not needed around generic family keywords. They're only needed around actual font names that contain spaces or other non-alphanum characters, which is a lot of them, or start with digits. These quotation marks are not optional in such a case, though many browsers gracefully decline to choke to death if you leave them out. They're usually double-quotes not single-quotes, unless one is using inline CSS inside HTML, e.g. in ... or whatever.

If it actually is true that "quotation marks in your user style sheet ... may be misinterpreted as wikitext", this is a very severe MediaWiki bug which needs to be addressed immediately. If this were the case, I think we would have heard about it by now. — SMcCandlish ☺ ☏ ¢ ≽^ʌⱷ҅_ᴥⱷ^ʌ≼ 03:48, 29 September 2017 (UTC)[reply]

It goes back twelve years, to this edit at 09:04, 27 January 2005 (UTC) by Mzajac (talk · contribs). In those days, template documentation was on the talk page - we didn't use /doc subpages. Subsequent relevant edits include:

22:13, 30 September 2007 removal from talk page and 22:13, 30 September 2007 added to /doc page by 16@r (talk · contribs) (text unchanged)
16:55, 6 October 2007 by Dan Pelleg (talk · contribs)
22:00, 23 May 2009 by AnOddName (talk · contribs)
13:06, 12 September 2012 by RexxS (talk · contribs)
16:38, 16 March 2013 by Nnemo (talk · contribs)
18:20, 18 March 2015 by SidP (talk · contribs)
17:07, 30 April 2016 by Quoth-22 (talk · contribs)
05:46, 1 October 2016 by Erutuon (talk · contribs) which produced the present version.

Maybe back in 2005 the MediaWiki software behaved differently to now. Or perhaps it was for browsers with an incomplete or improper implementation of CSS 2.1, such as Internet Exploder 6. CSS 2.1 is still largely current: the relevant document in CSS 3 is CSS Fonts Module Level 3 (3 October 2013), which being a W3C Candidate Recommendation is not yet a full W3C Recommendation. The section concerned is 3.1 Font family: the font-family property. --Redrose64 🌹 (talk) 11:15, 29 September 2017 (UTC)[reply]

Redrose64 has indicated the latest advice from W3C. A summary of that advice is:

There are two types of font names: family and generic.
Generic names are serif, sans-serif, cursive, fantasy, and monospace – these fonts are supplied by the user agent (browser) itself.
It is recommended that the list of fonts supplied to font-family has a generic name as the last entry to allow a guaranteed fallback should none of the named family fonts be available to the user agent. The generic font name at the end of the list must not be quoted.
It is recommended that family font names that contain spaces, digits or punctuation (other than hyphens) are quoted.
Font family names that contain the following words must be quoted: inherit, serif, sans-serif, monospace, fantasy, cursive, initial and default.

That is the same as the current W3C recommendation from 7 June 2011. Hope that helps. --RexxS (talk) 13:32, 29 September 2017 (UTC)[reply]
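The quoting rules in the summary above can be expressed as a small helper. This is a rough sketch in Python rather than anything MediaWiki or the W3C provides; the function names quote_family and font_family are hypothetical, and the character test is deliberately conservative (anything other than ASCII letters and hyphens triggers quoting).

```python
import re

GENERIC_FAMILIES = {"serif", "sans-serif", "cursive", "fantasy", "monospace"}
# Words that, per the summary above, force quoting when they occur in a family name.
RESERVED_WORDS = GENERIC_FAMILIES | {"inherit", "initial", "default"}


def quote_family(name):
    """Quote a font family name when the summarized rules call for it."""
    words = name.lower().split()
    needs_quotes = (
        any(word in RESERVED_WORDS for word in words)
        # Spaces, digits, or punctuation other than hyphens.
        or re.fullmatch(r"[A-Za-z-]+", name) is None
    )
    return f'"{name}"' if needs_quotes else name


def font_family(families, generic="sans-serif"):
    """Build a font-family value that ends in an unquoted generic fallback."""
    if generic not in GENERIC_FAMILIES:
        raise ValueError(f"not a generic family: {generic}")
    return ", ".join([quote_family(f) for f in families] + [generic])


print(font_family(["Meiryo", "MS PGothic"]))
```

The generic fallback is appended separately so that it is never passed through the quoting step, which matches the point that the last entry must stay unquoted.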

Recent change

WOSlinker has recently changed some (or all?) lang templates to use html for italics. Has that change been discussed anywhere? Does it improve anything? Because it has caused some problems: forms such as {{lang-it|'Livorno'}} now display as 'Livorno' instead of Livorno. Could someone please fix this (or, if the change is an important improvement, give some hint as to how the various affected pages could be tracked down and fixed)?

What we really need is a font style parameter for this template (yes, I know that bold is technically a font weight); while italics are commonly used for words in other languages, they are not used for proper names – and the templates are often used for proper names, which sometimes need to be bold-faced. Is there any reason why this couldn't or shouldn't be implemented? Justlettersandnumbers (talk) 10:21, 16 October 2017 (UTC)[reply]

I think there might be a broader problem with WOSlinker's changes, see User talk:WOSlinker#Why HTML for italics?. – Uanfala 11:00, 16 October 2017 (UTC)

Hmm, it looks as if this should have been discussed before it was implemented. May I suggest that, pending the outcome of such a discussion, someone with smart rollback and template editor permissions roll back these edits (which I think are these (about 369), plus one a little earlier and three the previous evening, two of them not to {{lang}}-foo templates)? That'd fix the errors for now, without prejudice to doing this properly if there's consensus that it is what's wanted. Justlettersandnumbers (talk) 13:27, 16 October 2017 (UTC)

I would do it, but I have been brought to ANI for unbreaking templates in the past, and there I was accosted by administrators who would not read, could not read, or both. I learned my lesson from that experience, which was "It's better to be happy than right". I support reverting these changes, however. I would like to hear back from WOSlinker, though, whose editing knowledge and skills I respect greatly. – Jonesey95 (talk) 14:07, 16 October 2017 (UTC)[reply]

I've changed all my edits on the lang templates back to the wiki style italics. There are two lang templates still using the html style but I've never edited those. -- WOSlinker (talk) 13:01, 17 October 2017 (UTC)

@WOSlinker: Would you please change Module:Zh back as well? Scriptions (talk) 01:01, 19 October 2017 (UTC)

Done. -- WOSlinker (talk) 08:12, 19 October 2017 (UTC)

Parameter to selectively disable auto-italics in the Lang-`xx` templates

We need to be able to selectively disable (e.g. with |italic=no) the auto-italicization of non-English content in the {{lang-xx}} templates that auto-italicize ({{lang-es}}, etc.), so that the style is not applied to proper names (e.g. placenames, titles of songs, etc.).

For example, the present code of {{lang-es}} is:

{{Language with name|es|Spanish|''{{{1}}}''|links={{{links|{{{link|yes}}}}}}|lit={{{lit|}}}}}

It hard-codes the italics.

The brute-force way around this is to go template-by-template and do something like:

{{Language with name|es|Spanish|{{#if:{{{italic|}}}|{{{1}}}|''{{{1}}}''}}|links={{{links|{{{link|yes}}}}}}|lit={{{lit|}}}}}

A more elegant solution is to:

Put this test into {{Language with name}}, to do italics automatically by default, but exclude it when |italic=no (or |italic=0, etc., etc.) if passed into it.
Change all the {{lang-es}} type templates that should auto-italicize by default, to do:
{{Language with name|es|Spanish|{{{1}}}|italic={{{italic|}}}|links={{{links|{{{link|yes}}}}}}|lit={{{lit|}}}}}
(and whatever other parameters they need, case by case)
Change all the {{lang-ru}} type templates (the non-Latin-script ones) that should not italicize, to do:
{{Language with name|ru|Russian|{{{1}}}|italic=no|links={{{links|{{{link|yes}}}}}}|lit={{{lit|}}}}}
(and whatever other parameters they need, case by case)

— SMcCandlish ☏ ¢ >^ʌⱷ҅_ᴥⱷ^ʌ< 07:09, 30 October 2017 (UTC)
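The centralized approach proposed above can be sketched outside wikitext. The following Python only illustrates the intended logic and is not the template's code: the function names, the set of values treated as "off", and the exact output wikitext are all assumptions.

```python
# Values treated as "italics off"; the exact set is an assumption.
ITALIC_OFF = {"no", "n", "0", "false", "off"}


def language_with_name(code, name, text, italic=""):
    """Render 'Name: text' as wikitext, italicizing unless italic is switched off."""
    if italic.strip().lower() not in ITALIC_OFF:
        text = "''" + text + "''"
    return "[[" + name + " language|" + name + "]]: {{lang|" + code + "|" + text + "}}"


def lang_es(text, italic=""):
    """Latin-script wrapper: italic by default, the caller may pass italic='no'."""
    return language_with_name("es", "Spanish", text, italic=italic)


def lang_ru(text):
    """Non-Latin-script wrapper: italics are always disabled."""
    return language_with_name("ru", "Russian", text, italic="no")


print(lang_es("Di me con quien andas"))
print(lang_es("Don Quixote", italic="no"))
```

Keeping the test in one shared function means each wrapper only forwards the parameter, so the per-language templates stay one-liners.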

I was hoping you could just put italics around the template when you use it in an article, but that doesn't work:

Spanish: Di me con quien andas....

Spanish: Don Quixote

It looks like a systematic solution within {{Language with name}} is necessary. – Jonesey95 (talk) 13:43, 30 October 2017 (UTC)

Yeah, the presence of the language name necessitates a template-internal fix. There is a grotesque hack one can do in situ, but we should not have to do this, and it's so brittle and ugly that later editors are likely to break or revert it: {{lang-es|<nowiki />''Don Quixote''<nowiki />}} – [Don Quixote] Error: {{Lang-xx}}: text has italic markup (help). An even-worse kluge: {{lang-es|1=Don Quixote}} – Spanish: Don Quixote. — SMcCandlish ☏ ¢ >^ʌⱷ҅_ᴥⱷ^ʌ< 00:39, 31 October 2017 (UTC)[reply]

This template's documentation suggests:

{{lang-es|{{noitalic|Don Quixote}}}}

[[Spanish language|Spanish]]: <i lang="es"><span class="noitalic">Don Quixote</span></i>

Spanish: Don Quixote

—Trappist the monk (talk) 11:30, 31 October 2017 (UTC)

converting to lua

Because it amused me to do it, I have hacked up Module:Lang (I was surprised to see that name still available). Not complete but in this first iteration it appears to correctly render {{lang-??}} for languages supported by MediaWiki (not the whole 900+ languages supported by the {{lang-??}} templates; see Category:ISO 639 name from code templates), so the module will need a table of the language names not supported by MediaWiki. The module supports |italic= and appears to correctly render when that parameter is used. It also appears to handle rtl languages when |rtl= is set. The module doesn't deal well with erroneous input and does not yet support categorization; basic rendering of {{lang-??}} and {{lang}} templates first. In these examples, the live {{lang-??}} template is followed by the module {{#invoke:lang|lang_xx}}:

Spanish: Don Quixote – {{lang-es}}
- Script error: The function "lang_xx" does not exist. – |italic=yes
German: Don Quixote – {{lang-de}}
- Script error: The function "lang_xx" does not exist. – |italic=no
Spanish: Don Quixote – {{lang-es}}
- Script error: The function "lang_xx" does not exist. – |italic=
Hebrew: הורביץ, אלוף ("לופי") – {{lang-he}}
- Script error: The function "lang_xx" does not exist. – |italic=no |rtl=yes
- Script error: The function "lang_xx" does not exist.

—Trappist the monk (talk) 14:46, 31 October 2017 (UTC)

Schweet. I'm not sure what the "for languages supported by MediaWiki" means; we'd want it, surely, to try to do the right thing for any arbitrary value given for ?? in {{lang-??}}. We're more apt to need something like {{lang-fy}} or {{lang-hop}} than {{lang-es}} in most contexts (how often do we really need a wikilink explaining what the Spanish language is)? Ideally, {{lang-en-GB}}, etc. would also work after the Lua adaptation, since we have specific articles on various dialects of English. I guess that's a lot of work, but hopefully the {{lang}} code with 900+ of these already worked up can be dumped and munged in a way that makes it easy to adapt to the new Lua code. If there's a convenient way to extrapolate the language code to WP article correspondences in an array that is included that would probably make maintenance and expansion easier. — SMcCandlish ☏ ¢ >^ʌⱷ҅_ᴥⱷ^ʌ< 16:20, 31 October 2017 (UTC)[reply]

"For languages supported by MediaWiki" refers to the languages supported by the magic word {{#language:}}. For example, ISO 639-1 code ar (Arabic) is supported:

{{#language:ar|en}} → Arabic

but ISO 639-2 code ara (also Arabic) is not:

{{#language:ara|en}} → ara

Of those languages that are supported, there are likely to be differences:

West Frisian: Don Quixote – {{lang-fy}}
- Script error: The function "lang_xx" does not exist.

in this case 'Western Frisian' agrees with the ISO 639 custodians; see loc 639-1 and 639-2, and sil 639-3

I think that the rule we can apply to 639-2 and -3 language codes is to fall back on 639-1 when there is a 639-1: code ara → ar; fry → fy; etc. We can keep a table specifically for fall back codes and another table to hold language names for 639-2 and -3 codes that don't fall back to 639-1 (Hopi, for example)

Hopi: Don Quixote
- Script error: The function "lang_xx" does not exist.

—Trappist the monk (talk) 17:21, 31 October 2017 (UTC)
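The fallback rule described above (map a 639-2 or 639-3 code to its 639-1 equivalent when one exists, otherwise consult a separate name table) might look like the following. It is a sketch in Python, not the module's Lua; all three tables hold a few sample entries chosen for illustration, and the function name language_name is hypothetical.

```python
# Sample data only; real tables would be far larger.
NAMES_639_1 = {"ar": "Arabic", "fy": "Western Frisian"}
# 639-2 / 639-3 codes that have a 639-1 equivalent.
FALLBACK_TO_639_1 = {"ara": "ar", "fry": "fy"}
# Codes with no 639-1 equivalent keep their own name table.
NAMES_WITHOUT_639_1 = {"hop": "Hopi"}


def language_name(code):
    """Return the language name for a code, or None when the code is unknown."""
    code = code.lower()
    code = FALLBACK_TO_639_1.get(code, code)  # ara -> ar, fry -> fy
    if code in NAMES_639_1:
        return NAMES_639_1[code]
    return NAMES_WITHOUT_639_1.get(code)


for code in ("ar", "ara", "fry", "hop", "zzz"):
    print(code, "->", language_name(code))
```

Returning None for an unknown code leaves it to the caller to decide whether to emit an error message or a tracking category.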

I haven't been following the discussion, so apologies if this is irrelevant, but there exists Module:Language. – Uanfala 17:48, 31 October 2017 (UTC)

Yep, am aware of that. I haven't given it a close line by line reading but to me it looks to be more tailored to Wiktionary's needs than to Wikipedia's needs. I'm not opposed to merging this with that if it makes sense to do so.

—Trappist the monk (talk) 17:59, 31 October 2017 (UTC)

I support the module-ization of this template, especially if it means that categories like Category:Articles containing unknown ISO 639 language template will be easier to deal with. I spent a while creating (hundreds?) of ISO 639 templates and matching categories for obscure languages; the error category should more properly be used to track actual errors. I would be happy to help create a list of language codes and their matching full language names. – Jonesey95 (talk) 20:05, 31 October 2017 (UTC)[reply]

If there should be an array matching ISO 639-3 codes to language names, then it should ideally be in sync with Module:Language/data/ISO 639-3 as well as – whenever possible – with the comprehensive series of ISO 639:xxx redirects. — Preceding unsigned comment added by Uanfala (talk • contribs) 20:17, 31 October 2017 (UTC)[reply]

Perhaps better for initial experimentation is Module:Language/data/iana_languages which also has 639-1 codes. That file may be dated since a comment at the top of it reads 2014-04-10 and I haven't wrapped my brain around the documentation in Module:Language/name/data.

—Trappist the monk (talk) 21:05, 31 October 2017 (UTC)

The documentation for this template seems to suggest that BCP47 (IETF language tags) should be used when choosing the code for the template. That being the case, Module:Language/name/data would seem to be the best choice ... except that it includes a file called Module:Language/data/wp languages which has, as its accompanying 'documentation', this: "Wikimedia wikis uses some non-standard codes and a subset of IANA codes, plus composite codes". Why? Why 'spoil' the standard that way?

—Trappist the monk (talk) 23:16, 31 October 2017 (UTC)

Erutuon might have an opinion here, as he was the last to work on this module. – Uanfala 23:25, 31 October 2017 (UTC)

And there is more ... There are lang-xx templates that don't use BCP47 codes:

Old Anatolian Turkish: كَیکاوس
- Script error: The function "lang_xx" does not exist.

Presumably we can troll through Category:Articles containing unknown ISO 639 language template and find what appear to be legitimate language codes that aren't part of 639-anything and create a table for use by the module.

—Trappist the monk (talk) 12:56, 1 November 2017 (UTC)

One answer to my 'why spoil the standard' question might be because the 'official' name associated with code el is 'Modern Greek (1453-)' so we use Module:Language/data/wp languages to overwrite the 'official' name with 'Greek'.

—Trappist the monk (talk) 16:56, 1 November 2017 (UTC)

The fallback idea sounds good to me. I have to note that many 639-2 codes do not work, even with the current non-Lua templates (including some of the other Frisian languages/dialects). I think we have a big win if we end up with a system in which none of the lang-family templates will redlink (or break entirely) unless a) we have no article on the language/dialect, or b) the code given is simply invalid. — SMcCandlish ☏ ¢ >^ʌⱷ҅_ᴥⱷ^ʌ< 02:31, 1 November 2017 (UTC)

Module:Language/name/data has flaws. For example, that data would return these language names for these codes:

fy → Frisian

frr → Northern Frisian

frs → Eastern Frisian

fry → West Frisian

stq → Saterfriesisch

So, I've created an override table in Module:Lang/data so that we can override the BCP47 language names if needs be. The initial values assigned produce these results:

fy → Script error: The function "lang_xx" does not exist.

frr → Script error: The function "lang_xx" does not exist.

frs → Script error: The function "lang_xx" does not exist.

fry → Script error: The function "lang_xx" does not exist.

stq → Script error: The function "lang_xx" does not exist.

—Trappist the monk (talk) 15:56, 2 November 2017 (UTC)
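An override table of the kind described above can be layered over the base data without modifying it. The sketch below uses Python's collections.ChainMap; the base names are the ones listed earlier in this comment, while the two override values are invented for illustration because the module's actual overrides are not shown here.

```python
from collections import ChainMap

# Names as returned by the base data, per the list above.
BASE_NAMES = {
    "fy": "Frisian",
    "frr": "Northern Frisian",
    "frs": "Eastern Frisian",
    "fry": "West Frisian",
    "stq": "Saterfriesisch",
}

# Hypothetical overrides; the real values in Module:Lang/data may differ.
OVERRIDES = {
    "fy": "West Frisian",
    "stq": "Saterland Frisian",
}

# Lookups try OVERRIDES first and fall through to BASE_NAMES.
NAMES = ChainMap(OVERRIDES, BASE_NAMES)

for code in ("fy", "frr", "stq"):
    print(code, "->", NAMES[code])
```

Because the base table is untouched, it can be regenerated from the upstream registry at any time and the local corrections survive.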

I saw that my name was mentioned above. It's a wide-ranging discussion, and I'm not sure exactly what I'm being asked.

But I guess I can explain something about Wiktionary's treatment of languages and scripts, which is very different. Language codes that are allowed in language-tagging and linking templates are listed in language data modules. Each language code corresponds to a single language name that we call a "canonical name". The canonical name appears in level-2 headers in entries. There are two subtypes of languages: what could be called "full" language codes are allowed in regular linking or tagging templates, and etymology languages (codes for subtypes of full languages) are allowed in etymology templates: for instance, grc-att for Attic Greek, a dialect of Ancient Greek (grc). Some of the codes are Wiktionary-specific: for instance, ine-pro for Proto-Indo-European.

We also have a script data module that contains information on scripts, such as Ustring patterns for the Unicode characters included in the script. Each language may have an array of script codes indicating which scripts it is written with, either in real life, in linguistic works, or on Wiktionary (for instance, {"Latn", "Brai", "Shaw", "Dsrt"} for English). This list of scripts is used by findBestScript in wikt:Module:scripts to automatically detect the script of text that is being tagged. Thus, script codes are generally not required in tagging templates.

Script codes are used as class names (for instance, word for English). Many script codes are from ISO 15924 (for instance, Arab); others were created to allow wikt:MediaWiki:Common.css to select different fonts for a variant of the script, either for their looks or their character set. (The script code fa-Arab has the same character pattern as Arab, but having a distinct script code for Persian allows it to be displayed in Nastaliq-style fonts. We don't use the ISO 15924 code Aran because it does not involve a different character set.)

We don't allow any modifiers to be appended onto language codes: placing ru-petr1708, ru-Cyrl, or en-US into a linking or tagging template results in a module error.

As you can see, Wiktionary is much more restrictive than Wikipedia. Many of the features are probably not applicable, but at least you have an overview. One feature that would be nice is script recognition, at least if Wikipedia starts adding CSS classes for scripts. (Or the module could add the very verbose inline CSS that is currently found in {{Script}} and its subtemplates. But inline CSS is best avoided because, to overrule it, you have to add !important to every rule in your personal stylesheet that contradicts it.) I started Module:Language/scripts and Module:Language/scripts/data based on wikt:Module:scripts and wikt:Module:scripts/data, but didn't go anywhere with it, because it would only be for my own use until Wikipedia has a coordinated approach to script tagging and the associated CSS.

As to Module:Lang, I have no objections to it being merged with Module:Language eventually if possible. It's unfortunate to have two modules that do similar things. I did attempt to make Module:Language generate the content of {{lang}} and considered the idea of doing the same for the lang-xx templates, but I don't have the motivation to sort out the crazy IETF tags (crazy from my perspective because I don't have to deal with them on Wiktionary), non-Wiktionary language codes, language names, colons, italicization, and the lack of any CSS classes for scripts. But if the distinct purposes of generating a Wiktionary-compatible tagging and linking template ({{wikt-lang}}) and a Wikipedia-style one ({{lang}}) can be coordinated, that would be great. — Eru·tuon 07:24, 4 November 2017 (UTC)[reply]
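The script auto-detection described earlier in this comment (each script has a character pattern, each language lists its candidate scripts, and the best match wins) can be approximated as follows. This is a simplified Python sketch and not a port of wikt:Module:scripts: the Unicode ranges are coarse, the language table is sample data, and find_best_script here is a hypothetical stand-in for the real findBestScript.

```python
import re

# Coarse character patterns per ISO 15924 script code (simplified ranges).
SCRIPT_PATTERNS = {
    "Latn": re.compile(r"[A-Za-z\u00C0-\u024F]"),
    "Cyrl": re.compile(r"[\u0400-\u04FF]"),
    "Grek": re.compile(r"[\u0370-\u03FF]"),
    "Arab": re.compile(r"[\u0600-\u06FF]"),
}

# Candidate scripts per language; sample data only.
LANGUAGE_SCRIPTS = {
    "en": ["Latn"],
    "ru": ["Cyrl"],
    "el": ["Grek"],
    "sr": ["Cyrl", "Latn"],
}


def find_best_script(text, lang_code):
    """Return the candidate script matching the most characters, or None."""
    best, best_count = None, 0
    for script in LANGUAGE_SCRIPTS.get(lang_code, []):
        count = len(SCRIPT_PATTERNS[script].findall(text))
        if count > best_count:
            best, best_count = script, count
    return best


print(find_best_script("Београд", "sr"))
print(find_best_script("Beograd", "sr"))
```

Restricting the search to the scripts listed for the language is what lets editors omit the script code in the common case.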

Thanks for that; it'll take a bit to digest but my initial reaction is that there is a basic lack of compatibility between Wiktionary and en.wiki in that en.wiki attempts, for the most part, to adhere to IETF/IANA language coding and attempts to minimize custom language coding. I do like the css-classes-for-scripting idea.

I think that you were mentioned here because you were the last editor to touch Module:Language/name/data so I guess that the mentioning editor presumed that by doing so, you had become the expert.

—Trappist the monk (talk) 10:09, 4 November 2017 (UTC)

Another feature I forgot to mention is that Wiktionary uses a data module to determine whether a script is RTL. It's probably a bad idea to set text direction for a given language, because languages are written in multiple scripts, and direction is a characteristic of the script, and as script direction can be determined automatically, editors should not have to deal with it at all. (On Wiktionary, this item in the data module is almost never used, because text direction is set for many RTL scripts in wikt:MediaWiki:Common.css with the CSS property direction: rtl;.) I've added script direction data to Module:Language/scripts/data.

Another thing I could mention is that we use language and script objects that have several methods (for basic things like retrieving the code and canonical name, or more complex things like retrieving the scripts used by a language, transliterating, or counting the characters in a string that belong to the script). These methods are shared across all objects of the same type using a metatable. This is convenient, because you can use a single variable for the language or the script and retrieve the code or the name from it when needed, and cleaner, because the code that handles the retrieval of the code and name is removed from the functions that use the code and name. But an object is probably overkill at this point if just the code and name are used. Another possibility would be table containing the code and first name (for instance, { code = "en", name = "English" }). — Eru·tuon 21:20, 4 November 2017 (UTC)[reply]
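Two of the points above, deriving text direction from the script rather than the language and carrying the code and name together in one small record, can be combined in a short sketch. It is written in Python for illustration; the RTL set is a small sample, and the names Language, direction and tag are hypothetical.

```python
from dataclasses import dataclass

# A few right-to-left ISO 15924 script codes; a sample, not a complete list.
RTL_SCRIPTS = {"Arab", "Hebr", "Syrc", "Thaa"}


@dataclass(frozen=True)
class Language:
    """Minimal record holding just the code and the name."""
    code: str
    name: str


def direction(script_code):
    """Direction is a property of the script, not of the language."""
    return "rtl" if script_code in RTL_SCRIPTS else "ltr"


def tag(lang, script_code, text):
    """Wrap text in a span carrying the language code and the script's direction."""
    return f'<span lang="{lang.code}" dir="{direction(script_code)}">{text}</span>'


hebrew = Language(code="he", name="Hebrew")
print(hebrew.name, direction("Hebr"))
print(tag(hebrew, "Hebr", "..."))
```

With this split, an editor never sets a direction parameter by hand; the direction follows from whichever script was detected or declared.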

categorization

I've added categorization code to the module. The live {{lang-??}} and {{lang}} templates use {{lang}} to do their categorization. {{lang}} will add Category:Articles containing unknown ISO 639 language template when there isn't a Category:ISO 639 name from code templates template that matches the language code. The module doesn't use these templates so it uses a different category when the code isn't in Module:Language/name/data: Category:Articles containing unknown language template codes – that name could certainly be less wordy and more concise. Suggestions?

The live templates do not categorize pages that are not in article space. For the time being, I have disabled that discrimination in the module for the purposes of debugging so you will see red-linked categories produced by the module at the bottom of this page (all hidden categories if 'Show hidden categories' is checked at Special:Preferences#mw-prefsection-rendering). If {{lang}} and {{lang-??}} templates ever call Module:Lang, namespace discrimination will be reinstated.

The red-linked categories attached to this page are Category:Articles containing Frisian-language text because 'West Frisian' (the current category name) does not match the code/name defined by BCP47+Module:Language/data/wp languages; Category:Articles containing Hopi-language text because ~~there is no~~ the {{ISO 639 name hop}} template ~~and therefore~~ has no matching category. For the Hopi case, the live {{lang-hop}} dumps all Hopi-language instances into Category:Articles containing non-English-language text. I think that philosophy is misguided. I think that red-linked categories are more likely to get 'fixed' than a blue-linked dumping-ground category.

—Trappist the monk (talk) 09:44, 2 November 2017 (UTC)
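The categorization behaviour described in this comment can be summarized in a few lines. The Python below is only a sketch of the decision logic as described, not the module's code: the name table is sample data, the function name category_for is hypothetical, and the articles_only switch stands in for the namespace check that is temporarily disabled for debugging.

```python
# Sample code-to-name data; the module reads this from its data tables.
KNOWN_NAMES = {"es": "Spanish", "hop": "Hopi"}

ERROR_CATEGORY = "Category:Articles containing unknown language template codes"
ARTICLE_NAMESPACE = 0  # MediaWiki's main (article) namespace


def category_for(code, namespace=ARTICLE_NAMESPACE, articles_only=True):
    """Return the category for a language code, or None when nothing is added."""
    if articles_only and namespace != ARTICLE_NAMESPACE:
        return None  # live templates skip pages outside article space
    name = KNOWN_NAMES.get(code)
    if name is None:
        return ERROR_CATEGORY
    # The category is named even if it does not exist yet, so it shows as a red link.
    return f"Category:Articles containing {name}-language text"


print(category_for("hop"))
print(category_for("zzz"))
print(category_for("es", namespace=11))
```

Emitting the specific category name even when the category page is missing is what produces the red links argued for above, instead of a generic catch-all category.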

Yeah, I wasn't going to get into those yet. Getting all the ISO stuff to work would be first priority, but it would be nice to support codes introduced by others like Glottolog, at least for languages and dialects with no ISO code. — SMcCandlish ☏ ¢ >^ʌⱷ҅_ᴥⱷ^ʌ< 17:27, 1 November 2017 (UTC)[reply]

I'm pretty sure that {{ISO 639 name hop}} has existed since 2011, but it looks like the non-existence of the category causes the generic categorization. You can see a couple hundred other such templates with gaps at Category:ISO 639 name from code templates without a category. I created a bunch of them, but it gets tedious, especially because three other categories are also requested by the documentation for each ISO 639 name xxx template. A bot might be helpful in creating all of these red-linked categories. – Jonesey95 (talk) 00:32, 2 November 2017 (UTC)[reply]

You're right, I've edited my post.

I can now see why this 'simple' task of converting the {{lang}} and {{lang-??}} templates to a module has been started before but never been completed. On the face of it, conversion to a module is simple but then you look under the bonnet ...

—Trappist the monk (talk) 09:44, 2 November 2017 (UTC)[reply]

Keep going! If anyone can do it, you can. Let us know how we can help. – Jonesey95 (talk) 21:45, 2 November 2017 (UTC)[reply]

Category:Articles containing unknown language template codes has become Category:Lang and lang-xx template errors. I have also created Category:Lang and Lang-xx templates using Module:Lang to track those templates that are using the module during the transition period. Once all templates that can be have been changed to use the module, this category can go away.

—Trappist the monk (talk) 13:06, 6 November 2017 (UTC)[reply]

translation and transliteration

The {{lang-??}} templates have support for translation rendering and some support transliteration rendering. I have attempted to add that support to Module:Lang.

Literal translation

{{lang-de|Im Westen nichts Neues|lit=In the West Nothing New}}

German: Im Westen nichts Neues, lit. 'In the West Nothing New'
- [[German language|German]]: Im Westen nichts Neues, [[Literal translation|lit.]] 'In the West Nothing New'

{{#invoke:lang|lang_xx|code=de|text=Im Westen nichts Neues|italic=|translation=In the West Nothing New}}

Script error: The function "lang_xx" does not exist.
- Script error: The function "lang_xx" does not exist.

Literal translation with generic transliteration

{{Lang-el|Θεοτόκος|links=yes|translation=God-bearer|translit=Theotokos}}

Greek: Θεοτόκος, romanized: Theotokos, lit. 'God-bearer'
- [[Greek language|Greek]]: Θεοτόκος, [[Romanization of Greek|romanized]]: Theotokos, [[Literal translation|lit.]] 'God-bearer'

{{#invoke:lang|lang_xx|code=el|text=Θεοτόκος|italic=no|translation=God-bearer|translit=Theotokos}}

Script error: The function "lang_xx" does not exist.
- Script error: The function "lang_xx" does not exist.

Literal translation with ISO 843 transliteration

{{lang-el}} doesn't allow editors to specify the transliteration standard nor does the underlying {{Language with name and transliteration}} which calls {{transl}} which does; confused yet?

Script error: The function "lang_xx" does not exist.
- Script error: The function "lang_xx" does not exist.

—Trappist the monk (talk) 14:06, 2 November 2017 (UTC)[reply]

Well, you were definitely right about this being more complicated than it seemed! Definitely appreciate the effort you're putting into this. We've needed to Lua-ize this for so long (and I don't have the Lua skillz to do it). — SMcCandlish ☏ ¢ >^ʌⱷ҅_ᴥⱷ^ʌ< 17:07, 2 November 2017 (UTC)[reply]

I got to wondering about the html/css markup around transliteration renderings when it occurred to me that the module doesn't (because {{transl}} doesn't) include the lang attribute in the enclosing ...:

{{transl|ar|al-Khwarizmi}} → al-Khwarizmi

al-Khwarizmi

For this example, shouldn't the module output something like this:

<span lang="ar-Latn" title="Arabic transliteration" class="Unicode" style="white-space:normal; text-decoration: none">al-Khwarizmi</span>

As I understand it, in css, white-space:normal and text-decoration:none are the defaults. If they are used here then that suggests that the css class="Unicode" class somehow alters those two properties. Where is class="Unicode" defined? Pinging Editors Dbachmann, the author of {{transl}}, and Ruud Koot, the author of these edits.

—Trappist the monk (talk) 12:53, 14 November 2017 (UTC)[reply]

Found it, and it appears to be gone:

came into existence
moved to common.css/WinFixes.css
moved to common.js
deleted

So then, does that not mean that the html/css markup around transliteration renderings should be:

al-Khwarizmi

—Trappist the monk (talk) 13:46, 14 November 2017 (UTC)[reply]

Changed. Results can be seen in the transliteration example above.

—Trappist the monk (talk) 15:52, 16 November 2017 (UTC)[reply]

links=no

If I have a template that renders like this:

{{lang-he/sandbox|פרת|Perat|lit=Euphrates|links=}} → Hebrew: פרת, romanized: Perat, lit. 'Euphrates'

If I set |links=no, shouldn't that unlink the primary language (Hebrew) and the transliteration and literal translation static texts?

{{lang-he/sandbox|פרת|Perat|lit=Euphrates|links=no}} → Hebrew: פרת, romanized: Perat, lit. 'Euphrates'

—Trappist the monk (talk) 00:03, 5 November 2017 (UTC)[reply]

I would certainly think so. Another issue I was just thinking of again today (and grinding my teeth) is that we need a way to suppress these things entirely e.g. with a |labels=no and |labels=lang; we don't need the language name, the "translit.", or the "lit." labels after the first occurrence in the same block of material, or sometimes we need the language one only, e.g. when comparing cognates. What we're doing now is using the template once, then abandoning it for manual markup with a {{lang|xx}} in it; or reusing the {{lang-xx}} and driving readers nuts by repeating the same crap over and over at them as if they have dain bramage. ;-/ — SMcCandlish ☏ ¢ >^ʌⱷ҅_ᴥⱷ^ʌ< 14:18, 5 November 2017 (UTC)[reply]

For the time being, I'm going to limit 'new features' to the |italic= switch and perhaps unlinking the translation and transliteration static text so that I can think about making the templates function correctly given a variety of inputs. That I think is mostly done so I'm about to take the module live on a handful of {{lang-??}} templates to see what happens – to see if anyone outside of this conversation notices. You should probably start a new wish-list topic for the label thing.

Done, below. — SMcCandlish ☏ ¢ >^ʌⱷ҅_ᴥⱷ^ʌ< 14:28, 6 November 2017 (UTC)[reply]

—Trappist the monk (talk) 21:04, 5 November 2017 (UTC)[reply]

sandbox testing

Category:Lang-x templates lists several templates that have sandboxen. Of those, where the template also has a /testcases page, I have edited the sandbox to use Module:Lang. So far, these:

Template:Lang-ar/testcases

Template:Lang-arc/testcases

Template:Lang-el/testcases

Template:Lang-en/testcases

Template:Lang-es/testcases

Template:Lang-hbs/testcases

Template:Lang-he/testcases

Doing this found a handful of coding errors that have been fixed. The interesting case in these templates is {{lang-hbs}} Serbo-Croatian. This language uses both Latin characters and Cyrillic characters (not at the same time, I think) so the issue of italics arises. Rendering is controllable with |italic=no but it might be better to create another script parameter (|script= is currently used to override |code= when rendering the transliteration tool tip – though I don't know how useful that actually is). In this scheme, if |lang-script= is set to a valid IANA script, then we would write > and if not Latn would override whatever |italic= is to no-italic.

The previous sandbox version of {{lang-hbs}} had some module code that would automatically transliterate the input text to the other script. That apparently didn't ever become live because there are/were problems transliterating Cyrillic to Latin in the presence (or lack – I'm not quite sure) of certain Unicode characters. I don't think that Module:Lang wants to go there.

The other one that I have found, though I've done nothing with it yet, is {{lang-sco}}. That template introduces |l=, an alias of |link=; |i=, to control italic rendering; and |abbr=, to replace the language name with an unlinked abbreviation of the name. I am sure that we really don't need |l= because in the text editor l looks too much like 1 and because to someone unfamiliar with the internals of these templates, |l=no is meaningless; this latter reason applies to |i= as well. Is there a standardized list of language abbreviations? If yes, then perhaps we should support |abbr=; if no, then we should not support |abbr=. Without a standard list, editors can (and will) write whatever suits them but what they concoct may not be understandable by readers and other editors.

—Trappist the monk (talk) 12:55, 3 November 2017 (UTC)[reply]

I suppose one could poke through the hundreds of templates to look for parameters, but another way to do it would be to convert the templates one by one to the new module, and have module code that detects unsupported parameters. Like the proposed |script=, such parameters could be evaluated for their utility and potentially incorporated into the module. Parameters that are determined to be unneeded or non-standard could be removed or converted to standard parameters. – Jonesey95 (talk) 14:53, 3 November 2017 (UTC)[reply]

Isn't [poking] through the hundreds of templates to look for parameters more-or-less the same as [converting] the templates one by one because to do the latter you are in effect doing the former? These templates are basically similar enough that we will see the oddball parameters straight away; no need for the module to detect anything. Compare this edit to {{lang-el/sandbox}} as an example or this apparently more complex edit to {{lang/sandbox}}.

—Trappist the monk (talk) 15:39, 3 November 2017 (UTC)[reply]

Modifying the templates will tell us whether or not the unusual parameters are actually used, not just whether they exist in the template. Unused parameters can be discarded. – Jonesey95 (talk) 20:51, 3 November 2017 (UTC)[reply]

Editing {{lang/sandbox}} to use Module:Lang showed how it is necessary for the module to support IETF language tags so I've modified the module accordingly. When processing {{lang}}, because that template receives its language code directly from the template in wikitext, editors will be creative in how they set that parameter. The module now supports the most commonly used (I think) IETF tags:

primary language code-script-region

where

primary language code is the two- or three-character ISO 639 language code lowercase (ll)

script is the four-character IANA script code; title case (Ssss)

region is the two-character IANA region code; uppercase (RR)

in these forms

ll

ll-Ssss

ll-RR

ll-Ssss-RR

The module emits an error message when IETF tags don't match these forms, or when they look right but have invalid content. These tests should probably be added to the {{lang-??}} support so that we can, if appropriate, create new templates that might make use of it (perhaps {{lang-hbs-Cyrl}} and {{lang-hbs-Latn}}).

—Trappist the monk (talk) 15:55, 3 November 2017 (UTC)[reply]
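The four tag forms listed above can be sketched as a single pattern. This is a minimal illustration in Python, not the Lua code in Module:Lang; the names TAG_RE and parse_tag are hypothetical, and the sketch checks only the shape of the tag (strict case, as described above), not whether each subtag actually exists in the IANA data.

```python
import re

# Shape of the supported forms: ll, ll-Ssss, ll-RR, ll-Ssss-RR
# (ll may be two or three lowercase letters; Ssss is title case; RR is uppercase).
TAG_RE = re.compile(
    r"(?P<language>[a-z]{2,3})"
    r"(?:-(?P<script>[A-Z][a-z]{3}))?"
    r"(?:-(?P<region>[A-Z]{2}))?"
)


def parse_tag(tag):
    """Split a tag into (language, script, region).

    script and region are None when absent. Raises ValueError when the tag
    does not match one of the four forms. Hypothetical helper for illustration.
    """
    match = TAG_RE.fullmatch(tag)
    if match is None:
        raise ValueError("malformed language tag: " + tag)
    return match.group("language"), match.group("script"), match.group("region")
```

A tag such as ru-Cyril fails the shape test here because Cyril is five letters; deciding whether a well-formed subtag such as Cyrl is valid would still need a lookup in the data modules.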

I don't know how the ISO 639 name xx templates fit into all of this, but this list of redirects to Template:ISO 639 name ru might provide some useful examples of scripts that are in use. Some of the redirects appear to be for invalid scripts. – Jonesey95 (talk) 20:51, 3 November 2017 (UTC)[reply]

This is why we want to make a module. The article Film speed transcludes {{lang|ru-Cyrl|ГОСТ}} which transcludes {{ISO 639 name ru-Cyrl}} which redirects to {{ISO 639 name ru}} which returns 'Russian' so that the article is properly categorized in Category:Articles containing Russian-language text. With the module, Film speed transcludes {{lang|ru-Cyrl|ГОСТ}} which invokes Module:Lang which renders and categorizes in one go.

I imagine that the others serve similar purposes. {{ISO 639 name RU}} is wrong-case language code; should be ru because RU is the ISO 3166 country code for Russian Federation. {{ISO 639 name ru-Cyril}} is a misspelling of the IANA script code Cyrl. I have no idea where ru-1708 came from. Its only use is in Russian Empire; the redirect {{ISO 639 name ru-1708}} was created at the same minute, both by Editor OwenBlacker who can perhaps explain.

I think that the module handles all of these correctly:

{{lang/sandbox|ru-Cyrl|ГОСТ}} → [ГОСТ] Error: {{Lang}}: script: cyrl not supported for code: ru (help)

{{lang/sandbox|ru-Cyril|ГОСТ}} → [ГОСТ] Error: {{Lang}}: unrecognized variant: cyril (help)

{{lang/sandbox|ru-Latn|GOST}} → GOST

{{lang-ru|ГОСТ|translit=GOST|script=Latn}} → Russian: ГОСТ, romanized: GOST

{{lang/sandbox|RU|ГОСТ}} → ГОСТ

{{lang/sandbox|ru-1708|ГОСТ}} → [ГОСТ] Error: {{Lang}}: unrecognized variant: 1708 (help)

—Trappist the monk (talk) 22:45, 3 November 2017 (UTC)[reply]

That is an excellent explanation. I look forward to getting rid of the current morass of hundreds of templates, redirects, and other madness. Keep up the good work. – Jonesey95 (talk) 23:01, 3 November 2017 (UTC)[reply]

Hey there, saw your {{ping}}. ru-1708 refers to the 1708 "civil script" reform of the Russian alphabet under Peter the Great. Text written in that specific form of Russian should be tagged ru-1708 to distinguish it from modern Russian. It's a valid IETF language tag, but using a variant subtag, so not the more common types you're covering here. German has the same kind of tags with de-1901 and de-1996; French has fr-1990, Portuguese has pt-1911 and pt-1990; Scottish Gaelic has gd-1767 and gd-1981 and so on. While there will always be variant subtags that won't get recognised by something all-encompassing (though you could just truncate off the last section, especially if it matches the regex -(\d{4}|[a-z]{5,8})), merging templates together like this is an awesome project. Anything that makes it easier for editors to add language tags to content gets my support :) — OwenBlacker (talk) 23:48, 3 November 2017 (UTC)[reply]
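The truncation idea in the post above can be tried out with a few lines of Python. This is only a sketch: the pattern is the one quoted in the post, anchoring it to the end of the tag is an added assumption, and the function name strip_trailing_variant is hypothetical.

```python
import re

# Pattern quoted in the post above; the trailing "$" is an assumption added here
# so that only the last subtag is ever removed.
TRAILING_VARIANT_RE = re.compile(r"-(\d{4}|[a-z]{5,8})$")


def strip_trailing_variant(tag):
    """Drop a final subtag that looks like a variant; leave other tags alone."""
    return TRAILING_VARIANT_RE.sub("", tag)
```

With this pattern ru-1708 and de-1996 are both reduced to the bare language code, while ru-Cyrl is left alone because Cyrl is four letters in title case. A subtag that mixes letters and digits, such as petr1708, matches neither alternative and so is not stripped.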

Are you sure? There does not appear to be a 1708 variant listed. There is this, extracted from the current IANA language-subtag-registry file:

%%
Type: variant
Subtag: petr1708
Description: Petrine orthography
Added: 2010-10-10
Prefix: ru
Comments: Russian orthography from the Petrine orthographic reforms of
  1708 to the 1917 orthographic reform

Same thing? de-1901 and de-1996 yes, but the others that you mentioned, no. The data files that the new Module:Lang depends on aren't necessarily current so at the moment I'm working on code that will extract language, script, and region information from the language-subtag-registry file. Currently there is no 'variant' data file but that could be extracted as well.

—Trappist the monk (talk) 00:44, 4 November 2017 (UTC)[reply]
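For readers unfamiliar with the registry file, the record quoted above can be read with a few lines of code. The following Python sketch is not the extraction tool mentioned in this thread; parse_records is a hypothetical name, and the assumptions are that records are separated by %% lines, that each field is "Name: value", and that a line beginning with whitespace continues the previous field.

```python
SAMPLE = """%%
Type: variant
Subtag: petr1708
Description: Petrine orthography
Added: 2010-10-10
Prefix: ru
Comments: Russian orthography from the Petrine orthographic reforms of
  1708 to the 1917 orthographic reform
"""


def parse_records(text):
    """Return a list of records; each record maps a field name to a list of values.

    Values are lists because a field may occur more than once in one record.
    """
    records = []
    current = None
    last_key = None
    for line in text.splitlines():
        if line.strip() == "%%":
            # Start of a new record.
            current = {}
            records.append(current)
            last_key = None
        elif current is None or not line.strip():
            # Skip anything before the first record, and blank lines.
            continue
        elif line[0].isspace() and last_key is not None:
            # Folded line: append to the most recent value of the previous field.
            current[last_key][-1] += " " + line.strip()
        else:
            key, _, value = line.partition(":")
            last_key = key.strip()
            current.setdefault(last_key, []).append(value.strip())
    return records
```

Calling parse_records(SAMPLE) yields one record whose Subtag is petr1708 and whose Prefix is ru, with the two Comments lines joined into a single string.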

I have extended the iana data extraction tool so that it also extracts variant data. The result is Module:Language/data/iana_variants. With that data module, and a bit of new code, Module:lang can support:

{{lang/sandbox|ru|Россійская Имперія}} → Россійская Имперія

{{lang/sandbox|ru-Cyrl|Россійская Имперія}} → [Россійская Имперія] Error: {{Lang}}: script: cyrl not supported for code: ru (help)

{{lang/sandbox|ru-Cyrl-RU|Россійская Имперія}} → [Россійская Имперія] Error: {{Lang}}: script: cyrl not supported for code: ru (help)

{{lang/sandbox|ru-Cyrl-RU-petr1708|Россійская Имперія}} → [Россійская Имперія] Error: {{Lang}}: script: cyrl not supported for code: ru (help)

{{lang/sandbox|ru-petr1708|Россійская Имперія}} → Россійская Имперія

but rejects improperly formed tags and emits an error message:

{{lang/sandbox|RU|Россійская Имперія}} → Россійская Имперія

{{lang/sandbox|ru-Cyril|Россійская Имперія}} → [Россійская Имперія] Error: {{Lang}}: unrecognized variant: cyril (help)

{{lang/sandbox|ru-Cyrl-ru|Россійская Имперія}} → [Россійская Имперія] Error: {{Lang}}: script: cyrl not supported for code: ru (help)

{{lang/sandbox|ru-Cyrl-RU-Petr1708|Россійская Имперія}} → [Россійская Имперія] Error: {{Lang}}: script: cyrl not supported for code: ru (help)

{{lang/sandbox|ru-1708|Россійская Имперія}} → [Россійская Имперія] Error: {{Lang}}: unrecognized variant: 1708 (help)

The variant data records in the iana language-subtag-registry file include a Prefix item that specifies the language code used with the variant. For variant petr1708 the Prefix is ru so using that variant with another language code is rejected:

{{lang/sandbox|de-petr1708|Россійская Имперія}} → [Россійская Имперія] Error: {{Lang}}: unrecognized variant: petr1708 for code: de (help)

These changes also apply to the {{lang-??}} template support in Module:Lang.

—Trappist the monk (talk) 20:54, 5 November 2017 (UTC)[reply]
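The Prefix check described above amounts to a table lookup. Here is a minimal Python sketch, assuming a small hand-picked subset of variants; VARIANT_PREFIXES and check_variant are hypothetical names, the real data would come from the Prefix fields in the registry, and real prefixes can be longer than a bare language code.

```python
# Illustrative subset only: variant subtag -> language codes it may be used with.
VARIANT_PREFIXES = {
    "petr1708": {"ru"},
    "luna1918": {"ru"},
    "1901": {"de"},
    "1996": {"de"},
}


def check_variant(code, variant):
    """Return None when the variant is allowed for the code, else an error string.

    The wording of the strings follows the error messages shown above.
    """
    prefixes = VARIANT_PREFIXES.get(variant)
    if prefixes is None:
        return "unrecognized variant: " + variant
    if code not in prefixes:
        return "unrecognized variant: %s for code: %s" % (variant, code)
    return None
```

So check_variant("ru", "petr1708") passes, check_variant("de", "petr1708") reports a variant that is not valid for that code, and check_variant("ru", "1708") reports an unknown variant.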

BCP47 says that IETF language tags are case insensitive so I have relaxed the checking to allow any mixture of case. The code does, however, prettify its output (not that anyone will see it):

{{lang/sandbox|RU-cYRL-ru-PeTr1708|Россійская Имперія}} → [Россійская Имперія] Error: {{Lang}}: script: cyrl not supported for code: ru (help)

[Россійская Имперія] <span style="color:#d33">Error: {{Lang}}: script: cyrl not supported for code: ru ([[:Category:Lang and lang-xx template errors|help]])</span>

I have also added support for three-digit region codes:

{{lang/sandbox|es-419|Spanish in Latin America and the Caribbean}} → Spanish in Latin America and the Caribbean

—Trappist the monk (talk) 13:23, 6 November 2017 (UTC)[reply]
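The case 'prettifying' mentioned above could look something like the following. This is a Python sketch under stated assumptions, not the module's Lua: prettify_tag is a hypothetical name, subtags are classified purely by length and character class (four letters is a script, two letters or three digits is a region, anything else is treated as a variant), and nothing is validated against the data modules.

```python
def prettify_tag(tag):
    """Normalize the case of an IETF tag: ll-Ssss-RR-variant.

    Classification is by shape only; this does not check that subtags exist.
    """
    parts = tag.split("-")
    pretty = [parts[0].lower()]              # primary language: lowercase
    for part in parts[1:]:
        if len(part) == 4 and part.isalpha():
            pretty.append(part.title())      # script: title case
        elif (len(part) == 2 and part.isalpha()) or (len(part) == 3 and part.isdigit()):
            pretty.append(part.upper())      # region: uppercase (digits unchanged)
        else:
            pretty.append(part.lower())      # variant and anything else: lowercase
    return "-".join(pretty)
```

For example, prettify_tag("RU-cYRL-ru-PeTr1708") gives ru-Cyrl-RU-petr1708, and a numeric region such as es-419 passes through unchanged.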

Fantastic work. Should we also be warning against or disallowing language tags with suppressed script codes, e.g. ru-Cyrl?

– Quoth (talk) 11:51, 6 November 2017 (UTC)[reply]

I have not thought about that. Can you make a separate wish-list topic to hold this and other ideas so that they don't get lost?

—Trappist the monk (talk) 13:23, 6 November 2017 (UTC)[reply]

I set up a section for that, and put both my and Quoth's items in it. — SMcCandlish ☏ ¢ >^ʌⱷ҅_ᴥⱷ^ʌ< 14:28, 6 November 2017 (UTC)[reply]

iana data

Module:Lang uses Module:Language/data/iana languages, Module:Language/data/iana scripts, and Module:Language/data/iana regions which are, I believe, derived from the 2014-04-10 IANA language-subtag-registry file. There is a new version that is current as of 2017-08-15. I believe that we should update our data files to be in line with the current registry file. To that end I have cobbled-up a data extraction tool that creates the tables held in the data files from the IANA source. You can see the result.

Like the current version of the data modules, the data created by the extraction tool does not have codes that are deprecated, codes that have preferred alternatives, nor codes that are marked as private use. I do not believe that there is a need for these particular codes but I could be wrong. I'm going to update the data files. If anyone knows of a reason to include the codes that the tool skips, let us know.

—Trappist the monk (talk) 16:16, 4 November 2017 (UTC)[reply]
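The skipping rule described above (no deprecated codes, no codes with preferred alternatives, no private-use codes) can be sketched as a filter over already-parsed records. This Python sketch is not the extraction tool itself; keep_record and build_table are hypothetical names, each record is assumed to be a plain dict of field name to value, and the field names Deprecated, Preferred-Value and Scope are used as I understand the registry format.

```python
def keep_record(record):
    """Return False for records the data modules are meant to leave out."""
    if "Deprecated" in record:
        return False                      # deprecated code
    if "Preferred-Value" in record:
        return False                      # a preferred alternative exists
    if record.get("Scope") == "private-use":
        return False                      # private-use code or range
    return True


def build_table(records, record_type):
    """Map subtag -> description for one record type ('language', 'script', ...)."""
    return {
        record["Subtag"]: record["Description"]
        for record in records
        if record.get("Type") == record_type and keep_record(record)
    }
```

Given a list of records, build_table(records, "language") would return only the current, non-private language codes; if a reason to include the skipped codes turns up, the three tests in keep_record are the only place that would need to change.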

Along these lines I've hacked another data extraction tool that will generate a table for Module:Language/data/ISO 639-3. I have used this tool to update that module and the other tool to update the IANA data modules.

But what about Module:Language/data/wp languages? Anyone know where the data in that module came from? Is there an 'official source'?

—Trappist the monk (talk) 20:22, 5 November 2017 (UTC)[reply]

problems with the data set

List of native plants of Flora Palaestina (E-O) times out before it can be fully rendered. I guess I'm not all that surprised because the data set (all of those modules mentioned in §iana data) is recompiled every time a {{lang}} or {{lang-??}} template is called (in this case the template is {{rtl-lang}}). The Lua processing time limit is 10 seconds. As an experiment, I forced the module to use only one of the data modules Module:Language/data/iana languages and 'included' it in Module:Lang with mw.loadData() instead of with require(). The page rendered properly in about 2 seconds. The differences are significant. require() allows the included modules to hold executable code but must be reloaded with every {{#invoke:}} (every 'template' in the wikisource). The modules 'included' with mw.loadData() must not hold executable code but are loaded only once per page.

The obvious solution is to create some sort of static version of the table of tables created by require ('Module:Language/name/data'). These tables don't need to be recompiled for every use because they will only change when the standards from which they were created change.

—Trappist the monk (talk) 17:54, 17 November 2017 (UTC)[reply]

You should be able to do mw.loadData ('Module:Language/name/data'), and the data will not be recompiled each time one of these templates is transcluded. That is the way we load data modules on Wiktionary. — Eru·tuon 20:50, 17 November 2017 (UTC)[reply]

That works. Thanks. Failure on my part to grasp this in the documentation: "The value returned from the loaded module must be a table ... [of] booleans, numbers, strings, and other tables". For a long time I somehow misunderstood that (perhaps not necessarily from the documentation; could have been from other reading or conversation) because modules always return tables (even if they are tables of functions – something that is used quite a bit in Module:Citation/CS1). Clearly it means that it doesn't matter how the table is built, just that when the module returns, it can only return a table containing a limited subset of data types.

—Trappist the monk (talk) 21:08, 17 November 2017 (UTC)[reply]

Exactly. The rationale is that functions can "trap" values from one module invocation that could then be transferred to another, or can otherwise change their behavior each time they are called. (For instance, the iterator function returned by ipairs(array) gives a new index and value from the array each time it's called.) So functions would in many cases make unexpected things happen if they were saved in memory and accessed by multiple invocations. Other types (number, string, boolean, nil) don't behave in this way, so they can safely be saved in a table by mw.loadData, accessed through the metatable of a dummy table, and shared between modules. In any case, you can always try loading a module with mw.loadData, and it'll tell you if you're not allowed to. — Eru·tuon 22:14, 17 November 2017 (UTC)[reply]
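The load-once behaviour and the plain-data restriction discussed above can be mimicked outside Scribunto. The following Python sketch is an analogy only, not how mw.loadData is implemented; check_plain_data, load_data, BUILDERS and BUILD_COUNT are hypothetical names, and functools.lru_cache stands in for the once-per-page cache.

```python
import functools

BUILD_COUNT = {"languages": 0}


def build_languages():
    """Stand-in for a data module: executable code that builds a table."""
    BUILD_COUNT["languages"] += 1
    return {"ru": "Russian", "de": "German"}


BUILDERS = {"languages": build_languages}


def check_plain_data(value, path="data"):
    """Raise TypeError unless value is plain data (bool, number, string, None)
    or a dict/list nesting of plain data; functions are rejected."""
    if value is None or isinstance(value, (bool, int, float, str)):
        return
    if isinstance(value, dict):
        for key, item in value.items():
            check_plain_data(key, path + " key")
            check_plain_data(item, "%s[%r]" % (path, key))
        return
    if isinstance(value, (list, tuple)):
        for index, item in enumerate(value):
            check_plain_data(item, "%s[%d]" % (path, index))
        return
    raise TypeError("%s holds a %s, which is not plain data" % (path, type(value).__name__))


@functools.lru_cache(maxsize=None)
def load_data(name):
    """Build the named table once, verify it is plain data, then reuse it."""
    table = BUILDERS[name]()
    check_plain_data(table, name)
    return table
```

However the table is built, only the returned value is checked, and repeated calls to load_data("languages") run build_languages a single time; that is the saving that matters on a page with hundreds of template calls.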

multiple text scripts in a single template

There are a couple of issues here:

{{lang-abq|Къарча-Черкес автоном область ''Q̇arća-Ćerkes avtonom oblast’''}}

Abaza apparently has both Cyrillic and Latin scripts so the italicized part could be the correct abq-Latn or it could simply be a transliteration of the abq-Cyrl. I don't know how to tell the difference. My gut would say that switching alphabets 'midstream' is inappropriate. The same applies to transliterations; {{{1}}} should not hold text in two alphabets.

Module:Lang detects italic markup in {{{1}}} (also incorrectly finds bold markup – I'll fix that) because the correct way to control italicization of {{{1}}} is with |italic=

All of this suggests that the correct way of writing this would be:

{{lang-abq|Къарча-Черкес автоном область}} {{lang|abq|Q̇arća-Ćerkes avtonom oblast’|italic=yes}}

—Trappist the monk (talk) 11:07, 7 November 2017 (UTC)[reply]

Trappist the monk, some languages use three scripts (at least) – kk.wp is available in Latin, Cyrillic and Farsi script, for example. It would be convenient if all could be accommodated within a single template, but the sort of workaround you illustrate above could work too. Justlettersandnumbers (talk) 16:47, 7 November 2017 (UTC)[reply]

As a solution to this languages-with-multiple-scripts problem, I have renamed the existing {{#invoke:}} parameter |script= to |transl-script= and created a new |script= that applies to the text and to the language code.

In the example above, both alphabets are contained in a single template. That is still wrong and this change does nothing to permit that. But it does start us on the way to supporting multiple alphabets in a single template, as I have suggested at #Wish list for future enhancement.

{{#invoke:Lang|lang_xx|code=abq|text=Къарча-Черкес автоном область|script=Cyrl}}

Script error: The function "lang_xx" does not exist.

<strong class="error"><span class="scribunto-error" id="mw-scribunto-error-c20ce3b4">Script error: The function &quot;lang_xx&quot; does not exist.</span></strong>

{{#invoke:Lang|lang_xx|code=abq|text=Q̇arća-Ćerkes avtonom oblast’|script=Latn}}

Script error: The function "lang_xx" does not exist.

<strong class="error"><span class="scribunto-error" id="mw-scribunto-error-c20ce3b4">Script error: The function &quot;lang_xx&quot; does not exist.</span></strong>

Above, because |script=Cyrl, the text is not italicized. When |italic= is not set and |script= is set, the module will apply italic markup only when the specified script is Latn (case ignored). When |italic= is set, it controls:

{{#invoke:Lang|lang_xx|code=abq|text=Къарча-Черкес автоном область|script=Cyrl|italic=yes}}

Script error: The function "lang_xx" does not exist.

<strong class="error"><span class="scribunto-error" id="mw-scribunto-error-c20ce3b4">Script error: The function &quot;lang_xx&quot; does not exist.</span></strong>

The module emits an error message if the value assigned to |script= is not recognized:

{{#invoke:Lang|lang_xx|code=abq|text=Къарча-Черкес автоном область|script=Cyril}}

Script error: The function "lang_xx" does not exist.

The module does not now, but will, compare the IETF script subtag ~~provided to {{lang}} or~~ received from a {{lang-??}} to |script=. If they are not the same, the module will emit a mismatch error message.

Another reason to do this? So we don't have to fork a bunch of templates to properly support script subtags. —Trappist the monk (talk) 13:55, 9 November 2017 (UTC)[reply]
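The italic rule described above (|italic= wins when set; otherwise italicize only when |script= is Latn, case ignored) can be written as a small decision function. This is a Python sketch rather than the module's Lua; should_italicize is a hypothetical name, treating only the value yes as 'on' is an assumption, and the default parameter covers the case where neither parameter is set, which the description above does not settle.

```python
def should_italicize(italic=None, script=None, default=True):
    """Decide whether the text gets italic markup.

    italic  -- value of |italic= ('yes', 'no', or None/'' when not set)
    script  -- value of |script= (e.g. 'Cyrl', 'Latn', or None/'' when not set)
    default -- assumed behaviour when neither parameter is set
    """
    if italic:
        # |italic= is set, so it controls regardless of |script=.
        return italic.lower() == "yes"
    if script:
        # |italic= not set: italicize only Latin script, case ignored.
        return script.lower() == "latn"
    return default
```

So |script=Cyrl alone suppresses italics, while |script=Cyrl|italic=yes forces them.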

Revision; |script= is not needed with {{lang}}. Because the template gets the language code directly from {{{1}}}, editors can simply add the appropriate IETF script subtag:

abq → abq-Cyrl or abq-Latn

Now emits an error message when the script subtag in |code= does not match the value assigned to |script=:

{{#invoke:Lang|lang_xx|code=abq-latn|text=Къарча-Черкес автоном область|script=Cyrl}}

Script error: The function "lang_xx" does not exist.

This error message should be rare because it should not be necessary to have {{lang-??}} templates that specifically set |code= to a value that includes an IETF script subtag.

I suppose, for completeness, the {{lang-??}} templates should also support |region= and |variant= (also not required in {{lang}}).

—Trappist the monk (talk) 14:40, 9 November 2017 (UTC)[reply]
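The mismatch test between a script subtag in |code= and the value of |script= might be sketched as follows. This is illustrative Python, not the module code; check_script_mismatch is a hypothetical name, the script subtag is found by shape alone (four letters), and the wording of the returned message is invented for the example.

```python
def check_script_mismatch(code, script):
    """Return an error string when code carries a script subtag that differs
    from script (case ignored); return None otherwise.

    code   -- value of |code=, e.g. 'abq' or 'abq-latn'
    script -- value of |script=, e.g. 'Cyrl', or None when not set
    """
    subtags = code.split("-")
    # First four-letter alphabetic subtag after the language code, if any.
    code_script = next(
        (subtag for subtag in subtags[1:] if len(subtag) == 4 and subtag.isalpha()),
        None,
    )
    if code_script and script and code_script.lower() != script.lower():
        # Message text is hypothetical.
        return "script mismatch: %s in code, %s in |script=" % (code_script, script)
    return None
```

With |code=abq-latn and |script=Cyrl the function reports a mismatch; with a bare |code=abq there is nothing to compare, so no error is raised.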

I wonder if |transl-script= should be |trans-script= instead, to match the |trans-title= parameter style used in the popular Citation Style 1 templates. – Jonesey95 (talk) 15:27, 9 November 2017 (UTC)[reply]

Because too close to |transcript=? Because |translit-script= just felt too long? Because {{transl}} is the subsidiary template used by the current {{lang-??}} templates that support transliteration? Of course, none of these are good reasons.

For the most part, there are four different groups, if you will, of parameters in {{lang-??}} templates:

main group:

fixed by the {{lang-??}} template – language code; module parameter |code=

{{{1}}} – text; module parameter |text=

|script= – language script (only templates rendered by the module); module parameter |script=

transliteration group:

|translit= or {{{2}}} – transliteration of the text in {{{1}}}; module parameter |translit=

|script= – not part of {{lang-??}} but introduced in {{Language with name and transliteration}}; module parameter |transl-script=

|std= – transliteration standard (only templates rendered by the module); module parameter |std=

translation group:

|lit= or {{{2}}} – literal translation; module parameter |lit=

control group:

|rtl= – fixed by the template; module parameter |rtl=

|italic= – italic display of {{{1}}} (only templates rendered by the module); module parameter |italic=

Can't do much about existing template parameters here and now (|lit=? who thought that was a good parameter name?)

—Trappist the monk (talk) 16:12, 9 November 2017 (UTC)[reply]

That all looks better to me. If we have both translation and transliteration, we should not have any parameters that are abbreviated "trans" or "transl". That's just begging for confusion. – Jonesey95 (talk) 20:27, 9 November 2017 (UTC)[reply]

Would want |lit= to continue working; lots of us use that, since it's short and mnemonic for what it outputs. — SMcCandlish ☏ ¢ >^ʌⱷ҅_ᴥⱷ^ʌ< 17:34, 10 November 2017 (UTC)[reply]

{{lang-he/sandbox|פרת|Perat|Euphrates}} → Hebrew: פרת, romanized: Perat, lit. 'Euphrates'

—Trappist the monk (talk) 20:40, 10 November 2017 (UTC)[reply]

Following up on my musing that for completeness, the {{lang-??}} templates should also support |region= and |variant=, implemented:

{{#invoke:Lang|lang_xx|code=ru|text=какой-то кириллический текст|script=Cyrl|region=ru|variant=luna1918}}

Script error: The function "lang_xx" does not exist.

<strong class="error"><span class="scribunto-error" id="mw-scribunto-error-c20ce3b4">Script error: The function &quot;lang_xx&quot; does not exist.</span></strong>

—Trappist the monk (talk) 13:53, 10 November 2017 (UTC)[reply]

live testing

I have implemented the module in {{lang-aa}}, {{lang-bn}}, and {{lang-grc}}.

—Trappist the monk (talk) 14:42, 6 November 2017 (UTC)[reply]

+{{lang-ku}}, {{lang-mix}}, and {{lang-sco}}

—Trappist the monk (talk) 13:21, 7 November 2017 (UTC)[reply]

+{{lang-aec}}, {{lang-af}}, {{lang-ain}}, {{lang-ain}}, {{lang-akk}}

—Trappist the monk (talk) 17:16, 11 November 2017 (UTC)[reply]

switching |lang= to the module

I am at the point of switching {{lang}} to use the module. I don't anticipate that this will cause problems. But, with 625,000-ish transclusions, problems may arise. The number is so large because a majority of the {{lang-??}} templates use {{lang}} to create the ... around the text. I have disabled the italic checking for {{lang}} because such checking will detect the hardcoded italic markup added by many (most) of the {{lang-??}} templates that have not been converted to the module.

Objections to proceeding?

—Trappist the monk (talk) 16:54, 13 November 2017 (UTC)[reply]

Sounds good, though it may not be ideal for lang-xx to be transcluding lang this way; better that it does this in Lua with a call to the same function, to reduce the transclusion count. — SMcCandlish ☏ ¢ >^ʌⱷ҅_ᴥⱷ^ʌ< 21:05, 13 November 2017 (UTC)[reply]

The module supports both. The old versions of {{lang-??}} transclude {{lang}}. {{lang-??}} templates that use the module don't transclude {{lang}} because the module does it all.

Because the old templates transclude {{lang}}, the module will be doing the {{lang}} work that is now done by the wikitext version of {{lang}} until all of the {{lang-??}} templates are converted to the module.

—Trappist the monk (talk) 21:41, 13 November 2017 (UTC)[reply]

Switched.

—Trappist the monk (talk) 23:23, 18 November 2017 (UTC)[reply]

what about lang-?? with this ^?

From {{lang-am}}:

[[Help:Multilingual support (Ethiopic)|<sup><span class="t nihongo icon" style="color:#00e;font:bold 80% sans-serif;text-decoration:none;padding:0 .1em;">?</span></sup>]]

which gives us the '?' and a link to Help:Multilingual support (Ethiopic):

{{lang-am|text}} → Amharic: text

An insource search conducted in the template namespace found:

{{Lang-am}}

{{Lang-ti}}

{{Lang-gez}}

All of these are Ethiopic languages. If this is all that use this markup, then, for standardization, it would seem best to discontinue support.

—Trappist the monk (talk) 19:57, 13 November 2017 (UTC)[reply]

Not sure I follow. — SMcCandlish ☏ ¢ >^ʌⱷ҅_ᴥⱷ^ʌ< 21:05, 13 November 2017 (UTC)[reply]

What don't you understand?

—Trappist the monk (talk) 21:45, 13 November 2017 (UTC)[reply]

@Trappist the monk: I work very closely with articles containing Ethiopic script. I agree with discontinuing support. Most modern browsers support rendering Ethiopic script. This is an outdated help page that should be archived. It is no longer necessary. The ^? is not needed or helpful any more. —አቤል ዳዊት^?(Janweh64) (talk) 08:31, 8 December 2017 (UTC)[reply]

In fact, it has become a page for software developers to add promotional spam. —አቤል ዳዊት^?(Janweh64) (talk) 08:44, 8 December 2017 (UTC)[reply]

recent changes and lang-ar

I am minded to revert to this version of the module. A problem was introduced with these edits that made the module ignore the |italic=no setting in {{lang-ar}} so that all Arabic script was rendered in italic font when it should not have been.

The purpose of the module edits was to simplify a handful of if statements. Were this code running on a micro-controller, such optimization might be required. It is not so we can afford to spend some processor cycles and use up memory space evaluating if 'yes' == args.italic then. There is the added benefit that editors who come after us can know specifically what it is that is needed at that particular point in the code.

—Trappist the monk (talk) 11:16, 18 November 2017 (UTC)[reply]
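
For anyone following along, here is a minimal sketch of how a "simplified" test can end up ignoring |italic=no. It is written in Python rather than the module's Lua, and it is an assumption about the general failure mode, not a copy of the edits in question. The function names are made up for illustration. In both languages a non-empty string such as 'no' counts as true in a bare truthiness test.

```python
def wants_italic_simplified(args):
    # Hypothetical "simplified" test: any non-empty value is treated as yes,
    # so italic='no' is (wrongly) treated the same as italic='yes'.
    return bool(args.get("italic"))


def wants_italic_explicit(args):
    # Explicit comparison, in the spirit of: if 'yes' == args.italic then
    return args.get("italic") == "yes"


for value in ("yes", "no", None):
    args = {"italic": value} if value is not None else {}
    print(value, wants_italic_simplified(args), wants_italic_explicit(args))
```

The explicit form costs a string comparison but states exactly which value is expected at that point in the code.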

Because we managed to break the module and because there are currently some 41k transclusions of it, I have protected it and created Module:Lang/sandbox.

—Trappist the monk (talk) 11:32, 18 November 2017 (UTC)[reply]

Additionally, I have started Module:Lang/testcases; results at Module talk:Lang/testcases. The sandbox produces different (correct) results for these tests.

—Trappist the monk (talk) 14:38, 18 November 2017 (UTC)[reply]

Auto-italicization of Latin scripts

The module currently seems to auto-italicize language tags which include a Latn script code, while the previous template didn't. Because the previous template didn't automatically do it, the correct way to format these words was to italicize them using wiki markup, which means that the module now appears to render them with two sets of encapsulating <i> tags (presumably one from the mark-up and one from the module). This also means the module auto-italicizes Latin scripts some of the time, but not most of the time (such as in the common cases where the Latn script is redundant/should be suppressed, e.g. for fr, es, it). I think this should be reverted to the previous behaviour to both avoid this inconsistency and the duplicate HTML.

If, however, anyone wants to go the opposite direction and make the module output for Latin scripts more consistent by auto-italicizing all Latin scripts, I'd also be fine with the relatively small amount of redundant HTML generated by the current formatting in order to remove the need for doing it manually in the future. That might be doable by checking a language's suppressed script codes for Latn when no script tag has been supplied, and italicizing it if true. – Quoth (talk) 16:12, 19 November 2017 (UTC)[reply]
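
A rough sketch of the check suggested above, in Python rather than the module's Lua: use the explicit script subtag when one is supplied, otherwise fall back to the language's suppressed script, and italicize only when the result is Latn. The table is a small hand-copied excerpt and the function name is hypothetical; neither is taken from Module:Lang.

```python
# Hypothetical excerpt of suppress-script data: language code -> script
# that is normally omitted from the tag for that language.
SUPPRESSED_SCRIPT = {"fr": "Latn", "es": "Latn", "it": "Latn", "ru": "Cyrl"}


def should_italicize(tag):
    """Return True when the text tagged with `tag` is in Latin script."""
    subtags = tag.split("-")
    lang = subtags[0].lower()
    # A script subtag, when present, is four letters in second position.
    if len(subtags) > 1 and len(subtags[1]) == 4 and subtags[1].isalpha():
        script = subtags[1].title()
    else:
        # No explicit script: fall back to the suppressed script, if known.
        script = SUPPRESSED_SCRIPT.get(lang)
    return script == "Latn"


for tag in ("cmn-Latn", "fr", "ru", "ru-Latn"):
    print(tag, should_italicize(tag))
```

Languages with no suppressed script and no explicit script subtag come out as not italicized in this sketch, which may or may not be the wanted default.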

Examples of what you mean are always appropriate. Which template are we talking about? Many of the {{lang-??}} templates unconditionally italicize the text in {{{1}}}.

This is a work in progress. It is not possible (for this human, at least) to, in one go, switch all of the {{lang}} and {{lang-??}} templates to use Module:lang.

—Trappist the monk (talk) 18:09, 19 November 2017 (UTC)[reply]

Right, sorry: you can find an example on this page under the Chinese Mandarin entry with its pinyin transliteration bàng, which uses cmn-Latn; and I'm only talking about usage of the main {{lang}} template. – Quoth (talk) 21:59, 19 November 2017 (UTC)[reply]

I'm having a difficult time understanding what the problem is. If I take a step back and view Open back unrounded vowel with the previous version of the template (the last one before Module:lang was introduced), the bàng text looks the same (to me) as it does when that page is rendered with the module. See for yourself:

this link opens the edit window for the previous version of {{lang}}
https://en.wikipedia.org/w/index.php?title=Template:Lang&action=edit&oldid=775049579
in the Preview page with this template box put:
Open back unrounded vowel
click the adjacent Show preview button

That is how it 'used' to look. Compare it against the rendering made by the live template. How are they different? They don't seem different to me.

—Trappist the monk (talk) 23:15, 19 November 2017 (UTC)[reply]

The look hasn't changed, only the HTML markup and the circumstances around when the text will be auto-italicized by {{lang}}. If you inspect the HTML you should see two sets of surrounding <i> tags instead of one; one set from the wiki markup, which was previously required for formatting, and one from the new lang module output. – Quoth (talk) 21:13, 20 November 2017 (UTC)[reply]

I did your experiment. First I viewed Open back unrounded vowel with the template as it was before the switch to the module (old). I right-clicked, chose view source to see the html that en.wiki serves, and copy/pasted the markup for bàng. I repeated the procedure with the current template/module (new). Here are the results:

<a href="/wiki/Pinyin" title="Pinyin">bàng</a> – old

<a href="/wiki/Pinyin" title="Pinyin">bàng</a> – new

These look the same to me. Is it possible that you are looking at a cached version of an older page?

—Trappist the monk (talk) 21:58, 20 November 2017 (UTC)[reply]

Curious. I've cleared my caches, and purged the page, but on the current version of that article I see this markup:

<a href="/wiki/Pinyin" title="Pinyin">bàng</a>

I should note that I'm looking at the publicly available page, because I'm unable to use the template edit or preview functionality due to it being protected. – Quoth (talk) 20:00, 21 November 2017 (UTC)[reply]

I'm seeing the markup <a href="/wiki/Pinyin" title="Pinyin">bàng</a> when I preview the relevant section too. There is no caching involved because I previewed the page before looking at the source code. — Eru·tuon 23:15, 21 November 2017 (UTC)[reply]

most lang-?? templates switched to the module

I have switched most {{lang-??}} templates to use Module:Lang. Most were relatively trivial to switch, the remaining templates less so. These remain to be switched, redirected, deleted, or not:

{{lang-ber}} – this one expects as {{{2}}} an ISO 15924 script identifier – 244 transclusions
{{Lang-cdo-hani}} – includes several fonts in css in a span around {{{1}}} which don't appear to be necessary – 1 transclusion
{{Lang-cdo-latn}} – includes several fonts in css in a span around {{{1}}} which don't appear to be necessary; if really for Latn script, should be italicized (for pinyin, presumably) – 0 transclusions; should be deleted?
{{Lang-deu}} – this appears to be an improper use of deu which sil.org says is the ISO 639-3 code for German; this template uses it for something called Early New High German but named as 'early German' (sic – a redirect) – no article transclusions; should be deleted?
{{Lang-gem}} – probably an improper use of gem defined by sil.org as a collective with the name 'Germanic languages' but used by this template as an individual language named 'Proto German'; we should not be redefining international standards so if there is no international standard code for Proto German, we should not make one up except to perhaps create a private use variant de-x-proto; any private use IETF tags will require special handling by Module:lang or by rewriting the templates to use the lang() function of the module instead of the lang_xx() function – 5 article transclusions
{{Lang-gkm}} – template name uses a code that is not a legitimate IANA / ISO 639 code ostensibly to refer to Medieval Greek (internally the template uses grc, Ancient Greek); the correct solution may be to rename the template to use a private use variant: grc-x-medieval – 23 transclusions
{{Lang-grc-gre}} – appears to be a sort of catch-all for 'hard to define' Greek text or for Greek text that doesn't have a specific IANA/ISO 639 language code; internally the template uses grc; the template labels this text 'Greek' but the documentation implies that this template is to be used with Ancient Greek text so perhaps the labeling is incorrect; this is another case where private use tags may be useful: grc-x-gre as the catch-all; grc-x-koine for Koine Greek; grc-x-attic for Attic Greek (or the linguist list code grc-att); etc – 1424 transclusions
{{Lang-he-n}} – special version of {{lang-he}} to use {{script/Hebrew}} to render Hebrew text with Niqqud diacritical marks; not sure what to do with this one – 3521 transclusions
{{Lang-ka}} – has support for automatic transliteration when {{{2}}} is set to tr; an insource search finds 83 instances of the template that use this functionality; not sure what to do with this one – 3819 transclusions
{{Lang-khb}} – calls {{script|Talu|{{{1}}}}} which calls {{Script/New Tai Lue}} to wrap {{{1}}} in ... tags with several fonts – 1 article transclusion
{{Lang-kmr-at}} – misuses {{lang}} by giving it the result of a call to {{transl}}; no documentation so no clear indication of the purpose – 1 article transclusion; delete?
{{Lang-ksw}} – calls {{Script/ksw-Mymr}} to wrap {{{1}}} in ... tags with several fonts – 31 transclusions
{{Lang-ku-Arab}} – {{Script/Arabic}} to wrap {{{1}}} in ... tags with several fonts – 11 transclusions
{{Lang-ku-at}} – misuses {{lang}} by giving it the result of a call to {{transl}}; no documentation so no clear indication of the purpose – 2 article transclusions; delete?
{{Lang-lij}} – one of two Ligurian languages officially 'Ligurian' but the en.wiki article is at Ligurian (Romance language) (the other officially is 'Ligurian (Ancient)' and its article is at Ligurian language (ancient) – there is no {{lang-xlg}}); may require article renaming or the creation of suitable redirects to make this template work with Module:lang – 26 transclusions
{{Lang-mis-Cyrl}} – mis is the ISO 639-3 code for Uncoded languages; {{Lang-mis-Cyrl}} is used to label Montenegrin which apparently does not have a language code; a search of sil.org finds little mention of Montenegrin – no article transclusions; delete?
{{Lang-mis-Latn}} – see {{Lang-mis-Cyrl}} – no article transclusions; delete?
{{Lang-mnc}} – has support for two simultaneous transliteration renderings – 47 transclusions
{{Lang-mnw}} – calls {{Script/mnw-Mymr}} to wrap {{{1}}} in ... tags with several fonts – 50 transclusions
{{Lang-mol}} – named using retired code mol (see sil.org); internally uses mo which does not exist in ISO 639-1 – 76 transclusions
{{Lang-naz}} – purportedly to be used for North Azerbaijani but uses the code for Coatepec Nahuatl – no article transclusions; delete?
{{Lang-nod}} – calls {{Script/Tai Tham}} to wrap {{{1}}} in ... tags with several fonts – 25 transclusions
{{Lang-nsd}} – purportedly to be used for Dutch Low Saxon but uses the code for Southern Nisu – 1 article transclusion
{{Lang-os}} – has support for IPA rendering plus transliteration none of which is documented and may only be used in a very few articles – 197 transclusions
{{Lang-pra}} – IANA/ISO 639 define code pra as 'Prakrit languages', a collective of individual languages; special handling in Module:lang is required for collections – 2 article transclusions
{{Lang-roa}} – IANA/ISO 639 define code roa as 'Romance languages', a collective of individual languages; special handling in Module:lang is required for collections – no article transclusions; delete?
{{Lang-rus}} – has support for IPA rendering plus transliteration none of which is documented and may only be used in a very few articles – 2073 transclusions
{{Lang-sal}} – IANA/ISO 639 define code sal as 'Salishan languages', a collective of individual languages; special handling in Module:lang is required for collections – 1 article transclusion
{{lang-sh2}} – has support for automatic transliteration when {{{2}}}, mechanism is different from that used in {{lang-ka}} – 3 article transclusions
{{Lang-shn}} – calls {{Script/shn-Mymr}} to wrap {{{1}}} in ... tags with several fonts – 20 transclusions
{{Lang-sla}} – IANA/ISO 639 define code sla as 'Slavic languages', a collective of individual languages; special handling in Module:lang is required for collections – 4 article transclusions
{{Lang-son}} – IANA/ISO 639 define code son as 'Songhai languages', a collective of individual languages; special handling in Module:lang is required for collections – no article transclusions; delete?
{{Lang-su-fonts}} – wraps {{{1}}} in a ... tag that applies special fonts and sizing; does not provide labeling in the manner of most other {{lang-??}} templates – 39 transclusions
{{Lang-tt}} – provides labeling for simultaneous rendering of Cyrillic, Latin, and Arabic scripts; this functionality apparently never documented – 402 transclusions
{{Lang-ug}} – provides for simultaneous rendering of multiple transliterations – 235 transclusions
{{Lang-vi-hantu}} – calls {{vi-nom}} which calls {{lang}} with text wrapped in ... tags with several fonts – 23 transclusions
{{Lang-wen}} – IANA/ISO 639 define code wen as 'Sorbian languages', a collective of individual languages; special handling in Module:lang is required for collections – 8 article transclusions
{{Lang-yuf}} – IANA/ISO 639-3 name is 'Havasupai-Walapai-Yavapai'; this template requires the use of a code in {{{1}}} to choose one for the language label and link – 29 transclusions

—Trappist the monk (talk) 14:04, 9 December 2017 (UTC)[reply]

completed

✓{{Lang-de-AT}} – this and similar templates will require special handling either in Module:Lang or by rewriting the templates to use the lang() function of the module instead of the lang_xx() function – 7 transclusions
✓{{Lang-de-CH}} – see Lang-de-AT – no article transclusions; delete?
✓{{Lang-en-AU}} – see Lang-de-AT – no article transclusions; delete? (previous TfD)
✓{{Lang-en-CA}} – see Lang-de-AT – no article transclusions; delete? (previous TfD)
✓{{Lang-en-IE}} – see Lang-de-AT – no article transclusions; delete? (previous TfD)
✓{{Lang-en-NZ}} – see Lang-de-AT – no article transclusions; delete? (previous TfD)

—Trappist the monk (talk) 17:39, 9 December 2017 (UTC)[reply]

✓{{Lang-en-GB}} – see Lang-de-AT – 21 transclusions (previous TfD)
✓{{Lang-en-US}} – see Lang-de-AT – 16 transclusions (previous TfD & second TfD)
✓{{Lang-en-ZA}} – see Lang-de-AT – no article transclusions; delete? (previous TfD)

—Trappist the monk (talk) 18:27, 9 December 2017 (UTC)[reply]

✓{{Lang-en-emodeng}} – similar to Lang-de-AT, IETF language tags like this will require special handling by Module:lang – 4 transclusions in article space (previous TfD)
✓{{Lang-pan}} – redundant ISO 639-3 version of {{Lang-pa}} – 3 article transclusions; delete? redirect to {{Lang-pa}}?
I think this can be safely redirected. – Uanfala (talk) 14:18, 9 December 2017 (UTC)[reply]
redirected by Editor Jonesey95.
{{lang-zh}} – Module:Zh handles the complexities and nuances of Chinese text; nothing to do here

—Trappist the monk (talk) 19:17, 9 December 2017 (UTC)[reply]

✓{{Lang-lez}} – sort of a version of {{Language with name and transliteration}} without the annotation; could be easily converted to use Module:lang – 32 transclusions
✓{{Lang-phn}} – sort of a version of {{Language with name and transliteration}} without the annotation; could be easily converted to use Module:lang – 14 transclusions
✓{{Lang-scr}} – uses deprecated code scr; the correct code is hbs – 2 article transclusions; redirect to {{lang-hbs}}?
redirected

—Trappist the monk (talk) 20:36, 9 December 2017 (UTC)[reply]

These two templates were not redirected; instead, |script= was set to the appropriate value. The names 'Serbian Cyrillic' and 'Serbian Latin' were not preserved because that usage is inconsistent with other {{lang-??}} templates for languages that use multiple scripts and because it is easy to distinguish one script from the other.
- ✓{{Lang-sr-Cyrl}} – calls {{Script}} to wrap {{{1}}} in ... tags with class="Unicode" attribute; the Unicode class no longer exists (see #Literal translation with ISO 843 transliteration) – 6239 transclusions; redirect to {{Lang-sr}}?
- ✓{{Lang-sr-Latn}} – not a member of Category:Lang-x templates; does nothing special except label the text as 'Serbian Latin' – 50 transclusions; redirect to {{Lang-sr}}?
✓{{Lang-xal-RU}} – see Lang-de-AT – 24 transclusions

—Trappist the monk (talk) 15:08, 11 December 2017 (UTC)[reply]

promoting ISO 639-2/3 codes to ISO 639-1

According to the ISO 639-2 custodian, "Multiple codes for the same language are to be considered synonyms." This would explain why the IANA data set has both ISO 639-1 and 639-3 language codes but does not have both -1 and -3 codes for the same language. This issue was brought to my attention because code ltz was causing a mis-categorization to Letzeburgesch when it should have been Luxembourgish.

It is common practice to promote three-character language codes to equivalent two-character codes. We should adhere to this practice. To that end I have created a tool that creates a Lua table from the data in the table at the custodian's website. The result is Module:Lang/ISO 639 synonyms. Module:Lang uses that table to promote ISO 639-3 codes to ISO 639-1 codes. When this happens, a maintenance category is added so that the template call can be tweaked. Category:Lang and lang-xx code promoted to ISO 639-1 is currently only implemented for {{lang}} and cannot be turned off with |nocat=. If no issues or problems arise, this functionality will be extended to the {{lang-??}} templates and |nocat= control enabled.

—Trappist the monk (talk) 17:54, 13 December 2017 (UTC)[reply]
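
The promotion step described above can be sketched roughly as follows. This is Python, not the module's Lua, and the table is a small hand-picked excerpt rather than the content of Module:Lang/ISO 639 synonyms. The function name and return shape are assumptions for illustration only.

```python
# Hypothetical excerpt: three-character code -> synonymous ISO 639-1 code.
SYNONYMS = {"eng": "en", "rus": "ru", "pan": "pa", "ltz": "lb", "ave": "ae"}


def promote(code):
    """Return (code_to_use, maintenance_message_or_None)."""
    code = code.lower()
    promoted = SYNONYMS.get(code)
    if promoted is None:
        # No two-character synonym: use the code as given.
        return code, None
    return promoted, f"code: {code} promoted to code: {promoted}"


print(promote("ltz"))
print(promote("grc"))
```

Three-character codes with no two-character synonym (grc, for example) pass through unchanged and produce no maintenance message.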

So to fix these codes: I look for a three-letter code in a {{lang}} template within the page in question, then I look in Module:Lang/ISO 639 synonyms to see if there is an equivalent two-letter code. Then I change the three-letter code to the two-letter code. Like this? If that is correct, it would help to have an error message of some sort, perhaps shown in preview mode only, to give the editor a hint about how to fix the error(s). – Jonesey95 (talk) 20:03, 13 December 2017 (UTC)[reply]

Hadn't got there yet. Because it isn't really broken, I had thought to do something akin to the maintenance messages emitted by Module:Citation/CS1 but first I wanted to see if this stuff worked properly.

Yeah, for {{lang}} that is pretty much the fix. When {{lang-??}} gets categorization functionality, the usual fix will be a fix to the template itself – though it is possible to set |code= in a {{lang-??}} template to override its normal rendering:

{{lang-en|text|code=rus}}

Russian: text

Russian: <span lang="ru">text</span><span class="lang-comment" style="font-style: normal; display: none; color: #33aa33; margin-left: 0.3em;">code: rus promoted to code: ru </span>

(not sure why one would want to do that – perhaps that is something that should be prevented for {{lang-??}})

—Trappist the monk (talk) 20:20, 13 December 2017 (UTC)[reply]

The best fix for {{lang-???}} templates may be to redirect them to the appropriate {{lang-??}}. I did a lot of that when cleaning up those template calls in the pre-module days. – Jonesey95 (talk) 20:24, 13 December 2017 (UTC)[reply]

Concur.

—Trappist the monk (talk) 20:26, 13 December 2017 (UTC)[reply]

Hidden messaging added. To see the messages, add this to your preferred css:

.lang-comment {display: inline !important;} /* show lang messages */

—Trappist the monk (talk) 23:02, 13 December 2017 (UTC)[reply]

Categorization limited to article namespace, |nocat= supported.

—Trappist the monk (talk) 00:03, 14 December 2017 (UTC)[reply]

Curious about the construction of Module:Lang/ISO 639 synonyms. Is there a reason for doing ["eng"] = {"en"} rather than ["eng"] = "en"? The latter uses less memory. — Eru·tuon 21:42, 13 December 2017 (UTC)[reply]

Copy/pasta from another of the tools, otherwise no reason.

—Trappist the monk (talk) 23:02, 13 December 2017 (UTC)[reply]

fixed.

—Trappist the monk (talk) 00:03, 14 December 2017 (UTC)[reply]

I'm not quite sure I see the benefit of running this task. On occasions, the 3-letter code is more intuitive than the 2-letter one: if anything we should encourage the use of for example ave for Avestan rather than ae. – Uanfala (talk) 13:15, 16 December 2017 (UTC)[reply]

First sentence of this topic says why: According to the ISO 639-2 custodian, "Multiple codes for the same language are to be considered synonyms." (which see). Promotion to ISO 639-1 is the generally accepted convention. If you look in the IANA language-subtag-registry file for subtag ave you will not find it; Wikipedia's {{#language:}} magic word does not understand eng but does understand en (the magic word code does not support either of ave or ae – which is why Module:Lang has its own data modules):

{{#language:eng}} → eng

{{#language:en}} → English

By promoting synonymous ISO 639-2/-3 codes to ISO 639-1, Module:Lang aligns with this convention.

With regard to your revert: the {{lang}} and {{lang-??}} templates use codes and names from IANA (which gets them from ISO 639, but does sometimes reorder names when there is more than one spelling). IANA and ISO 639 do not distinguish pa from pan; they provide the same names in the same order: Panjabi and then Punjabi so {{lang}}, {{lang-pa}}, {{lang-pan}} all produce the same html markup and the latter two would produce the same visible display and links ({{lang-pan}} redirects to {{lang-pa}}). For completeness in my accounting here, {{lang-pun}} is deprecated, uses an invalid language code in its name, has no article transclusions, so should be deleted.

Most important though, is that w3c specifies the use of language codes from the IANA subtag registry so that browsers and other html readers understand what is meant by the value assigned to the lang= attribute. This is a prime argument for Module:Lang to discontinue support of the two linguist list codes it now supports.

—Trappist the monk (talk) 14:35, 16 December 2017 (UTC)[reply]

So, if I understand correctly, the practical rationale behind the promotion to ISO 639-1 is that these codes are more likely to be understood by browsers? If this is so then it makes sense. But do we really want to have the maintenance burden of having to clean up every time someone uses an ISO 639-3 code instead of the 639-1 one? Won't it be possible for the template to do these conversions internally? – Uanfala (talk) 15:02, 16 December 2017 (UTC)[reply]

The module does do the promotion so that it produces correct html markup:

{{lang|pan|ਮਾਝੀ}}

<span title="Punjabi-language text"><span lang="pa">ਮਾਝੀ</span></span><span class="lang-comment" style="font-style: normal; display: none; color: #33aa33; margin-left: 0.3em;">code: pan promoted to code: pa </span>

ਮਾਝੀ

The maintenance message is only visible to those who turn on the display with the css code above. I have an AWB script that will help to clear the hidden maintenance Category:Lang and lang-xx code promoted to ISO 639-1 (you reverted an edit made by that script). Yesterday there were about 550 pages in that category. Most of what remains is there because I didn't let the script make the edit so that I have the opportunity to fix the italic markup that will cause errors when the italic error checking code for {{lang}} gets reenabled.

—Trappist the monk (talk) 15:45, 16 December 2017 (UTC)[reply]

I might have said this somewhere in one of these threads, but it bears repeating: not all the three-letter codes are a 1:1 correspondence with two-letter ones. I have no issue with synonymous longer ones being made more concise (though yes, the longer ones are often more intuitive) as long as the longer ones aren't rejected as input, and most especially as long as three-letter codes for dialects, historical stages, etc., are never collapsed to the generic language name. — SMcCandlish ☏ ¢ >^ʌⱷ҅_ᴥⱷ^ʌ< 23:07, 17 December 2017 (UTC)[reply]

There is no 1:1 mapping of all three-character codes to two-character codes. There is a 1:1 mapping of all two-character codes (ISO 639-1) to three-character codes (ISO 639-2/3). Three-character codes that have an associated two-character code are omitted from the IANA language-subtag-registry file so browsers and other html readers are not obligated to know about those synonymous three-character codes. We do not reject three-character codes as input but where there is a two-character synonym, we use the synonym.

The relationship between codes and language names is a frustrating one. ISO 639 establishes the base code-to-name mapping. When a code has more than one possible name, ISO 639 lists them in some sort of an order. IANA sometimes chooses to use a different order for the same code and names. Sometimes the ISO 639/IANA names are not suitable for direct use as a label by Wikipedia:

ang → Old English (ca. 450-1100)

So, we have a table of alternate names; of alternate spellings; of names we choose because of ISO 639/IANA list order differences; of codes that improperly redefine the standard's definition:

ISO 639/IANA: mla → Malo

but in Module:Language/data/wp_languages

mla → Medieval Latin (there is no ISO 639/IANA code for Medieval Latin)

The provenance for the codes/names listed in that module is wholly unknown so is suspect. Cleaning that up is just one more task to be done.

—Trappist the monk (talk) 11:43, 18 December 2017 (UTC)[reply]

using private-use tags

I have written elsewhere in these discussions that we should not be making up our own primary language tags; should not be redefining tags that have already been defined by international standards. Instead we should be operating within the permitted uses of the standard. BCP47 (IETF language tags) provides for private use tags. I have tweaked Module:Lang/sandbox to accept private use IETF language tags in the form:

ll-x-private

where:

ll is the standard ISO 639-1, -2, -3 language code

x is the BCP47-required singleton that marks the beginning of a private use tag

private is the private use tag; one to eight alphanumeric characters
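
As a rough illustration of the ll-x-private form just described, here is a Python sketch (not the sandbox's Lua) that accepts a two- or three-letter language code, the x singleton, and a private part of one to eight alphanumeric characters. The pattern and function name are my own assumptions and deliberately ignore script, region, and variant subtags.

```python
import re

# language code (2-3 letters), 'x' singleton, private part (1-8 alphanumerics)
PRIVATE_TAG = re.compile(r"([a-z]{2,3})-x-([a-z0-9]{1,8})", re.IGNORECASE)


def parse_private_tag(tag):
    """Return (language_code, private_part), or None if the tag does not fit."""
    match = PRIVATE_TAG.fullmatch(tag)
    if match is None:
        return None
    return match.group(1).lower(), match.group(2).lower()


print(parse_private_tag("yuf-x-hav"))
print(parse_private_tag("yuf-x-toolongtag"))
```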

I have created three of these tags for yuf:

yuf-x-hav

{{lang-yuf/sandbox|sw=ha|Havasuuw}}

Havasupai: Havasuuw → [[Havasupai language|Havasupai]]: Havasuuw

yuf-x-wal

{{lang-yuf/sandbox|sw=hu|Hàkđugwi:v}}

Walapai: Hàkđugwi:v → [[Walapai language|Walapai]]: Hàkđugwi:v

yuf-x-yav

{{lang-yuf/sandbox|sw=ya|Wi:kaʼi:la}}

Yavapai: Wi:kaʼi:la → [[Yavapai language|Yavapai]]: Wi:kaʼi:la

I use Walapai instead of Hualapai for standardization and because it matches the existing category. The label will link Walapai to Havasupai–Hualapai language because there is an existing redirect. Categorization isn't quite noodled out yet. Simplest and best, I think, is to create three individual categories for the three languages and make them subcategories of Category:Articles containing Havasupai-Walapai-Yavapai-language text.

This sandbox template needs to be implemented as {{lang-yuf}}, {{lang-yuf-x-hav}}, {{lang-yuf-x-wal}}, {{lang-yuf-x-yav}} to be compliant with the other {{lang-??}} templates.

—Trappist the monk (talk) 10:50, 23 December 2017 (UTC)[reply]

Wish list for future enhancement

An issue I was just thinking of again today (and grinding my teeth) is that we need a way to suppress the labels entirely e.g. with a |labels=no and |labels=lang; we don't need the language name, the "translit.", or the "lit." labels after the first occurrence in the same block of material, or sometimes we need the language one only, e.g. when comparing cognates. What we're doing now is using the template once, then abandoning it for manual markup with a {{lang|xx}} in it; or reusing the {{lang-xx}} and driving readers nuts by repeating the same crap over and over at them as if they have dain bramage. — SMcCandlish ☏ ¢ >^ʌⱷ҅_ᴥⱷ^ʌ< 14:18, 5 November 2017 (UTC)[reply]
Should we also be warning against or disallowing language tags with suppressed script codes, e.g. ru-Cyrl? – Quoth (talk) 11:51, 6 November 2017 (UTC). If a warning or error is too heavy-handed, another option could be to just suppress the script code from the output, depending on the language it's attached to. – Quoth (talk) 16:32, 6 November 2017 (UTC)[reply]
I have added to Module:Language/name/data/iana data extraction tool so that it now extracts suppressed script data from the IANA language-subtag-registry file. Those data are now in Module:Language/data/iana suppressed scripts. I have also added code to Module:lang/sandbox that uses the new data but for the moment have left it disabled so that I don't have to rewrite examples elsewhere that are presently being discussed in other topics on this talk page.

—Trappist the monk (talk) 16:09, 21 December 2017 (UTC)[reply]
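
The "just suppress the script code from the output" option could look something like this. It is a Python sketch under stated assumptions: the table is a tiny hand-copied excerpt, not Module:Language/data/iana suppressed scripts, and the function name is hypothetical.

```python
# Hypothetical excerpt of suppressed-script data: language -> redundant script.
SUPPRESSED_SCRIPT = {"ru": "Cyrl", "fr": "Latn", "en": "Latn"}


def drop_redundant_script(tag):
    """Remove the script subtag when it is the language's suppressed script."""
    subtags = tag.split("-")
    has_script = len(subtags) >= 2 and len(subtags[1]) == 4 and subtags[1].isalpha()
    if has_script:
        suppressed = SUPPRESSED_SCRIPT.get(subtags[0].lower(), "")
        if suppressed.lower() == subtags[1].lower():
            del subtags[1]
    return "-".join(subtags)


for tag in ("ru-Cyrl", "ru-Latn", "ru-Cyrl-RU"):
    print(tag, "->", drop_redundant_script(tag))
```

A script subtag that is not the suppressed one (ru-Latn) is kept, since it carries real information.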
Support for bold-face – please see the section #Recent change immediately above this. Thanks to all who are working on this – it's long overdue. Justlettersandnumbers (talk) 16:47, 7 November 2017 (UTC)[reply]
I think that is handled:
{{lang-sco|''''''Dumbairton''''''}} → [''Dumbairton''] Error: {{Lang-xx}}: text has malformed markup (help)

{{lang-sco|'''''Dumbairton'''''}} → [Dumbairton] Error: {{Lang-xx}}: text has italic markup (help)

{{lang-sco|''''Dumbairton''''}} → ['Dumbairton'] Error: {{Lang-xx}}: text has malformed markup (help)

{{lang-sco|'''Dumbairton'''}} → [[Scots language|Scots]]: '''Dumbairton''' → Scots: Dumbairton

{{lang-sco|''Dumbairton''}} → [Dumbairton] Error: {{Lang-xx}}: text has italic markup (help)

{{lang-sco|'Dumbairton'}} → [[Scots language|Scots]]: 'Dumbairton' → Scots: 'Dumbairton'

for bold face without italics:
{{lang-sco|'''Dumbairton'''|italic=no}} → [[Scots language|Scots]]: '''Dumbairton''' → Scots: Dumbairton

—Trappist the monk (talk) 18:19, 7 November 2017 (UTC)[reply]
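
The pattern in the examples above can be summarized by counting the leading apostrophes. The following Python sketch reproduces only the outcomes listed here. It is not the module's actual test, and it looks at the leading run only, ignoring the trailing one.

```python
def classify_markup(text):
    """Classify wiki quote markup by the length of the leading apostrophe run."""
    lead = len(text) - len(text.lstrip("'"))
    if lead in (0, 1, 3):
        # plain text, a literal apostrophe, or bold: accepted
        return "ok"
    if lead in (2, 5):
        # italic, or bold plus italic: rejected as italic markup
        return "italic markup"
    # four apostrophes, or six and more: not a valid combination
    return "malformed markup"


for n in range(1, 7):
    sample = "'" * n + "Dumbairton" + "'" * n
    print(n, classify_markup(sample))
```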
The behavior of initial or final single quotes should be changed; when I do {{lang-nl|'t}} → Dutch: 't on its own, the apostrophe is not italicized.

When I do {{lang-sco|'Dumbairton'}} blah blah {{lang-nl|'t}} → Scots: 'Dumbairton' blah blah Dutch: 't, this paragraph is messed up with uncontrolled bolding and italicization. — Eru·tuon 23:27, 21 November 2017 (UTC)[reply]
fixed.

—Trappist the monk (talk) 15:02, 26 November 2017 (UTC)[reply]
probably a good idea to consider single-template support for languages with multiple writing systems. Kazakh, for example, uses Latin, Cyrillic, and Arabic scripts. One template accepts a language code and |textn= and |scriptn= so for Kazakh {{lang-kk|text=<Latin text>|script=Latn|text2=<Cyrillic text>|script2=Cyrl|text3=<Arabic text>|script3=Arab}} where the text in all three cases is the same thing written in different scripts distinct from transliterations. No idea yet how this might be implemented.—Trappist the monk (talk) 14:15, 8 November 2017 (UTC)[reply]
- @Trappist the monk: If the problem is gathering the parameters, list parameters like |text=, |textN= would be simple to implement using wikt:Module:parameters. It can gather list parameters into an array, or convert a parameter to a boolean. The latter would be useful for |rtl=. It's sort of the Wiktionary equivalent of Module:Arguments. — Eru·tuon 22:41, 10 November 2017 (UTC)[reply]
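
To make the list-parameter idea concrete, here is a small Python sketch of gathering |text=, |text2=, |text3= into one ordered list. It does not use or imitate the real interface of wikt:Module:parameters; the function name and behaviour are assumptions for illustration.

```python
import re


def gather_list_params(args, name):
    """Collect name, name2, name3, ... from args into a list ordered by number."""
    pattern = re.compile(re.escape(name) + r"(\d*)")
    found = {}
    for key, value in args.items():
        match = pattern.fullmatch(key)
        if match:
            # The unnumbered parameter is treated as number 1.
            index = int(match.group(1)) if match.group(1) else 1
            found[index] = value
    return [found[i] for i in sorted(found)]


args = {"text": "A", "script": "Latn", "text2": "B", "script2": "Cyrl",
        "text3": "C", "script3": "Arab"}
print(gather_list_params(args, "text"))
print(gather_list_params(args, "script"))
```

Pairing the two lists by position then gives one (text, script) pair per writing system.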
Language-agnostic script-detection function, to make the |script= parameter unnecessary. It can be built on something similar to the function in wikt:Module:Unicode data that determines the script of a single character by looking up the codepoint in wikt:Module:Unicode data/scripts. It would need some way to determine which script code to return when text consists of multiple scripts (or characters not assigned a script). — Eru·tuon 21:36, 9 November 2017 (UTC)[reply]
How much overhead would that add if, say, the template were used 100 times in a long article? — SMcCandlish ☏ ¢ >^ʌⱷ҅_ᴥⱷ^ʌ< 03:19, 10 November 2017 (UTC)[reply]
The detection function itself is very fast, much faster than the language-dependent script detection function that is used in Wiktionary language- and script-tagging templates. For instance, in one of my sandboxes, detecting the script of each character in about 28,000 bytes of text from Russian language takes less than a second. (The list of scripts is at the bottom.) So, it probably won't add much overhead, assuming the function for deciding which of the scripts to return is relatively simple. — Eru·tuon 18:36, 10 November 2017 (UTC)[reply]

Current version that returns official Unicode script properties uses quite a bit of memory on my massive testcase (8 something MB), or about 2-3 MB with a smaller amount of text. If this is too much memory for a function like this, perhaps it could be reduced by breaking up the data module, or removing Zyyy and Zinh. Or maybe there would be a more creative solution. — Eru·tuon 01:07, 15 November 2017 (UTC)[reply]

Fixed memory problem and sped things up. Memory and time are now at about 1.6 MB and 0.05 second in my giant testcase. — Eru·tuon 11:33, 16 November 2017 (UTC)[reply]
Have the lang-xx templates stop transcluding the lang|xx variant, and instead directly call the same Lua functions to reduce the transclusion count. — SMcCandlish ☏ ¢ >^ʌⱷ҅_ᴥⱷ^ʌ< 21:06, 13 November 2017 (UTC); struck as moot: 19:02, 16 November 2017 (UTC)[reply]
Nothing to do here; not a new feature.

—Trappist the monk (talk) 21:44, 13 November 2017 (UTC)[reply]
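
The language-agnostic script-detection item in the list above can be sketched roughly as follows. This is only an illustration in Python, not the Wiktionary or Module:Lang code: the range table is a hypothetical, heavily abridged stand-in for wikt:Module:Unicode data/scripts, and "return the most frequent script, ignoring characters with no assigned script" is just one assumed answer to the open question of which code to return for mixed text.

```python
from collections import Counter

# Hypothetical, abridged range table: (first codepoint, last codepoint, script code).
# The block boundaries are approximate and only cover a few scripts.
SCRIPT_RANGES = [
    (0x0041, 0x005A, "Latn"),
    (0x0061, 0x007A, "Latn"),
    (0x00C0, 0x024F, "Latn"),
    (0x0370, 0x03FF, "Grek"),
    (0x0400, 0x04FF, "Cyrl"),
    (0x0600, 0x06FF, "Arab"),
]


def char_script(ch):
    """Look up the script of one character; 'Zyyy' stands for common/unknown."""
    cp = ord(ch)
    for first, last, script in SCRIPT_RANGES:
        if first <= cp <= last:
            return script
    return "Zyyy"


def detect_script(text):
    """Return the most frequent script code in text, or None if none is found."""
    counts = Counter(char_script(ch) for ch in text)
    counts.pop("Zyyy", None)  # spaces, digits, punctuation do not vote
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

A majority vote is cheap (one pass over the text), which fits the concern about overhead raised above, but it would need a tie-breaking rule for text that mixes scripts evenly.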

Error

{{langnf}} doesn't work in Israel infobox. I can't figure out how to fix it. --Triggerhippie4 (talk) 04:32, 19 November 2017 (UTC)[reply]

@Triggerhippie4: The first parameter was empty, it needs to be a language code; specifically, the code for the language that the third parameter is written in. This edit should fix it. --Redrose64 🌹 (talk) 10:25, 19 November 2017 (UTC)[reply]

(edit conflict)

{{langnf}} is calling {{lang}} without providing a valid ISO 639 language code. {{lang}} requires (has always required) a language code so that it knows how to correctly supply html markup for the text. In this template, the first positional parameter, {{{1}}}, the language code, is empty:

{{langnf||Hebrew|"The Hope"}}

Many might leap to the conclusion then that they should add the language code that matches the language name. They would be wrong. In this case, the correct code is en because "The Hope" is English.

It appears that the documentation for {{langnf}} is inadequate. I can also imagine, though have not given it sufficient thought to recommend, that in {{langnf}} this line:

}} for {{Lang|{{{1|}}}|{{{3}}}|rtl={{{rtl|}}}}}<noinclude>

might be changed to:

}} for {{Lang|{{{1|en}}}|{{{3}}}|rtl={{{rtl|}}}}}<noinclude>

if {{{3}}} is usually an English translation. If {{{3}}} is always an English translation then there is no need for {{lang}} in {{langnf}}.

Perhaps Editor Hyacinth, the original author of both {{langnf}} and its documentation, can be persuaded to revisit that template.

—Trappist the monk (talk) 10:31, 19 November 2017 (UTC)[reply]

It seems that it used to work in the past without a language code in the first parameter, though. If you go to Template:Language with name/for today, you can see (or at least I can see) that the examples in the documentation do not produce errors. If you null-edit the template, they will produce errors. I have not yet looked at the old {{lang}} code to puzzle out this apparent effect. – Jonesey95 (talk) 23:51, 19 November 2017 (UTC)[reply]

I put the old lang template code in Template:lang/sandbox2 and used that template in the sandbox for {{langnf}}. It appears that leaving the language blank did not produce an error in the past:

{{lang/sandbox2||Foo}} → Template:Lang/sandbox2

{{Language with name/for/sandbox||2=German|3=[[Thuringia]]}} → Error: {{language with name/for}}: missing language tag or language name (help)

{{Language with name/for/sandbox|en|2=German|3=[[Thuringia]]}} → German (English for 'Thuringia')

{{Language with name/for||2=German|3=[[Thuringia]]}} → Error: {{language with name/for}}: missing language tag or language name (help)

{{Language with name/for|en|2=German|3=[[Thuringia]]}} → German (English for 'Thuringia')

It appears to me that even though the language name in {{lang}} was listed as a Required parameter, there may not have been code that enforced that requirement. Still researching. – Jonesey95 (talk) 00:06, 20 November 2017 (UTC)[reply]

(edit conflict)

seems that it used to work. The output of this particular example when rendered by the previous version of {{lang}} looks like this:

[[Hebrew language|Hebrew]] for "The Hope"

The purpose of {{lang}} is to indicate that the text belongs to a language. If the language is going to be English there isn't much sense in calling {{lang}}, no need to wrap the text in <span>...</span> tags. Because {{lang}} expects to have a language code so that it can do its job, the module whines and complains when that important piece is missing. {{lang}} cannot know that the template that's calling it doesn't really need its services. So it complains.

—Trappist the monk (talk) 00:24, 20 November 2017 (UTC)[reply]

I've tweaked {{langnf}} so that it only calls {{lang}} when a language code is provided in {{{1}}}.

—Trappist the monk (talk) 11:28, 20 November 2017 (UTC)[reply]

zh-yue

zh-yue does not seem to be working any more, in e.g. [係] Error: {{Lang}}: unrecognized language tag: zh-yue (help), copied from Written Cantonese.--JohnBlackburne^words_deeds 13:37, 19 November 2017 (UTC)

Isn't the language code supposed to be simply yue rather than zh-yue? – Uanfala 13:42, 19 November 2017 (UTC)[reply]

From the registry 'yue' is the subtag for "Yue Chinese" or "Cantonese", the tag for "Chinese" is 'zh'.--JohnBlackburne^words_deeds 13:54, 19 November 2017 (UTC)[reply]

The subtag registry identifies yue as a language code. See Yue language where yue is identified as the ISO 639-3 language code.

{{lang|yue|係}} → 係

It may once have been true that the correct code was zh-yue (language subtag with an extlang subtag). According to the subtag registry, that is no longer true and the preferred subtag is yue. {{lang}} and Module:lang do not currently (may never) support extlang subtags.

—Trappist the monk (talk) 14:13, 19 November 2017 (UTC)[reply]

Thanks, I see that it’s deprecated now. I can’t remember ever using it, but recalled it from a previous browse of the registry and so thought it OK. I guess with the template previously passing through anything, but now actually checking what’s passed to it, there will be quite a few like this to fix.--JohnBlackburne^words_deeds 16:12, 19 November 2017 (UTC)[reply]

I've been keeping an eye on Category:Lang and lang-xx template errors looking for indications that the new module is doing the wrong things. I haven't seen any zh-yue errors. But, I have seen plenty of zh-han and zh-t errors. grc-gre is fairly common as is jp.

—Trappist the monk (talk) 16:35, 19 November 2017 (UTC)[reply]

Odd error at Dalian Mosque

The following works on its own مسجد داليان but is generating an error when used in a {{Chinese}} template at Dalian Mosque.--JohnBlackburne^words_deeds 00:40, 20 November 2017 (UTC)[reply]

Garbage in, garbage out.

{{Chinese}} is a redirect to {{Infobox Chinese}}. That template calls {{Infobox Chinese/Blank}} with the values provided to the infobox template:

|lang2=Arabic
|lang2_content={{lang|ar|مسجد داليان}}<br/>(''Masjid Dālyān'')</span>

{{Infobox Chinese/Blank}} calls {{lang}} with the content of |lang2= and |lang2_content=:

{{lang|Arabic|{{lang|ar|مسجد داليان}} (''Masjid Dālyān'')}}

The 'inside' {{lang}} produces this:

مسجد داليان

which is partially correct – partially because Arabic script is right-to-left and should be marked as such but the infobox has no support for that.

So now, the outside {{lang}} looks like this:

{{lang|Arabic|مسجد داليان (''Masjid Dālyān'')}}

But, that won't work because the value assigned to {{{1}}} is not an ISO 639 language code. Module:lang rejects 'Arabic' because it expects a code, not a language name so instead of rendering bogus html it emits an error message.

The quick fix? There are at least two and probably more.

|lang2=ar
|lang2_content=مسجد داليان (''Masjid Dālyān'') – were it me, I would remove the <br/> and the transliteration because that is left-to-right and Latn script.

For the time being, because there are a lot of {{lang-??}} templates that call {{lang}} and a lot of them impose italics on the 'text', the italic detection and associated error messages are disabled; in future, the error message will be back.

A long-term fix to properly support the transliteration of the Arabic is needed and will require modifications to {{Infobox Chinese}} and {{Infobox Chinese/Blank}}.

—Trappist the monk (talk) 01:23, 20 November 2017 (UTC)[reply]

{{Infobox Chinese/Blank}}. Ugh. I keep intending to have a go at rewriting Infobox Chinese to use Lua, but every time I’ve looked at it I’ve been put off by things like that. It’s not even clear that belongs in the template, which seems to have grown to do too much over the years. People don’t notice or object as most fields in the template default to hidden, but if they are hidden so no-one sees them they probably aren’t all needed.--JohnBlackburne^words_deeds 13:30, 20 November 2017 (UTC)[reply]

"Module:Language/data/wp languages" ?

The documentation says that Module:Language/data/wp languages supports non-standard language codes (e.g. Linguist List codes), but I added one, and it did not help. Should the documentation be modified to remove that link? Thanks. – Jonesey95 (talk) 13:14, 20 November 2017 (UTC)[reply]

Related: Is there a recommended way to fix templates in Category:Lang-x templates with other than ISO 639? – Jonesey95 (talk) 13:17, 20 November 2017 (UTC)[reply]

For templates that truly don't use IETF tags, I think there is nothing to 'fix'.

There is one, {{lang-ca-valencia}} that has prompted me to tweak Module:lang/sandbox so that when the IETF language tag includes a variant, the module fetches the language name from the variants table:

{{#invoke:lang|lang_xx|code=ca-valencia|text=Lucrezia Borgia|italic=no}} → Script error: The function "lang_xx" does not exist.

{{#invoke:lang/sandbox|lang_xx|code=ca-valencia|text=Lucrezia Borgia|italic=no}} → Script error: The function "lang_xx" does not exist.

Alas, that is the only one of the templates listed at Category:Lang-x templates with other than ISO 639 with a proper variant subtag.

—Trappist the monk (talk) 16:08, 20 November 2017 (UTC)[reply]

I'm having second thoughts about this sandbox tweak. Consider:

{{#invoke:lang/sandbox|lang_xx|code=pt-ao1990|text=some pt text}} → Script error: The function "lang_xx" does not exist.

Not what we really want. Perhaps an alternate language parameter that when concatenated with ' language' can be used to replace the default language name (from the data tables) so that we link to the variant language name article.

—Trappist the monk (talk) 17:03, 20 November 2017 (UTC)[reply]

From my previous work with the ISO 639 name templates, my experience is that every language and dialect has an article or redirect at "XXXX language". The templates and categories depended on that construction. I messed with hundreds of those templates, and I do not recall encountering any missing articles or redirects. See, for example, Middle Scots language, which would be the destination for "sco-smi" below if we could put in some sort of override.

I'll let you continue to think about it. You always come up with something that works. – Jonesey95 (talk) 17:41, 20 November 2017 (UTC)[reply]

I think you meant Module:lang/data and in particular the override table.

You added sco-smi which looks like a valid IETF language tag but is not. Were it valid, smi would be listed as an extlang in the IANA language-subtag-registry file. At present, there are no plans to support extlangs because there are preferred language codes for all of the existing extlangs.

Because Module:lang expects a valid IETF language tag, it emits an error when it disassembles sco-smi into its separate parts, the code sco and this other thing smi which doesn't match the required patterns for script (4 letters), region (2 letters or 3 digits), or variant subtags (4 digits or 5–8 alphanumeric characters).

It may be that we will want to create a table that specifically holds Linguist List codes so that we can handle them. The question that I have about any of these codes that are not in language-subtag-registry file is: What to put in the lang="" attribute of the enclosing <span>...</span>? Browsers and screen readers probably don't know about (aren't required to know about) 'private' language codes that aren't in the registry.

—Trappist the monk (talk) 13:56, 20 November 2017 (UTC)[reply]
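
To make the sco-smi failure concrete, here is a minimal Python sketch of tag disassembly using only the subtag patterns described in the comment above (script: 4 letters; region: 2 letters or 3 digits; variant: 4 digits or 5–8 alphanumeric characters). It is not the Module:lang code; the function name parse_tag and the 2-or-3-letter pattern for the language code are assumptions, and override codes such as 1ca are not handled.

```python
import re

# Patterns taken from the description above; the language pattern is an assumption.
LANGUAGE = re.compile(r"^[a-z]{2,3}$")
SCRIPT = re.compile(r"^[a-z]{4}$")
REGION = re.compile(r"^(?:[a-z]{2}|[0-9]{3})$")
VARIANT = re.compile(r"^(?:[0-9]{4}|[a-z0-9]{5,8})$")


def parse_tag(tag):
    """Split a tag into (language, script, region, variant); raise ValueError if invalid."""
    parts = tag.lower().split("-")
    language, rest = parts[0], parts[1:]
    if not LANGUAGE.match(language):
        raise ValueError(f"unrecognized language code: {language}")
    found = {}
    # Subtags may only appear in this order, each at most once.
    for name, pattern in (("script", SCRIPT), ("region", REGION), ("variant", VARIANT)):
        if rest and pattern.match(rest[0]):
            found[name] = rest.pop(0)
    if rest:
        # Anything left over (for example an extlang such as 'yue' or 'smi') is rejected.
        raise ValueError(f"unrecognized subtag: {rest[0]}")
    return language, found.get("script"), found.get("region"), found.get("variant")
```

With these patterns, smi in sco-smi is three letters, so it matches none of the script, region, or variant shapes and the tag is rejected, which is the behaviour described above.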

Sorry, I should have linked to the documentation in question. I meant Module:Lang/doc, which refers to files that apparently do not work. Should those references be removed in order to avoid confusion? – Jonesey95 (talk) 16:38, 20 November 2017 (UTC)[reply]

What do you mean by files that apparently do not work?

The only place where code 1ca is defined for Module:lang is in Module:lang/data in the override table. That code works as it should (there is no extlang tacked onto it):

{{#invoke:lang|lang_xx|code=1ca|text=كَیکاوس|rtl=yes|italic=no}} → Script error: The function "lang_xx" does not exist.

Adding sco-smi to the override table doesn't work because the module has extracted the language code (sco) from it and cannot find that in the override table and there is, at present, no mechanism to make the module search for the (invalid) extlang either alone (smi) or in combination with the language code (sco-smi).

—Trappist the monk (talk) 17:03, 20 November 2017 (UTC)[reply]

Re "files that do not work": Module:Language/data/wp languages is linked from Module:Lang/doc. It doesn't do anything, right? – Jonesey95 (talk) 03:07, 21 November 2017 (UTC)[reply]

Actually, it does. Module:Language/name/data creates a single table from the /wp languages, /ISO 639-3, and /iana languages modules. The first module read is /wp languages. First language 'code' into the composite table wins so when code exists in all three of the data modules, only the code and data in /wp languages is used. For example, code gem is present in both /wp languages and in /iana languages. Module:Language/name/data reads /wp languages first so its value for code gem (Proto-Germanic) is the value used by Module:lang; not the 'official' Germanic languages:

{{#invoke:lang|lang_xx|code=gem|text=Example text}} → Script error: The function "lang_xx" does not exist.

The 'codes' that do not work in /wp languages are the hyphenated codes.

—Trappist the monk (talk) 03:49, 21 November 2017 (UTC)[reply]
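
The "first code into the composite table wins" behaviour can be illustrated with a short Python sketch. The three dictionaries are tiny illustrative stand-ins for the /wp languages, /ISO 639-3, and /iana languages data modules, and build_name_table is a hypothetical name, not a function in Module:Language/name/data.

```python
def build_name_table(*sources):
    """Merge code-to-name tables; the first source that defines a code wins."""
    merged = {}
    for source in sources:
        for code, name in source.items():
            merged.setdefault(code, name)  # keeps an earlier entry if one exists
    return merged


# Illustrative stand-ins for the three data modules, in reading order.
wp_languages = {"gem": "Proto-Germanic"}
iso_639_3 = {"yue": "Yue Chinese"}
iana_languages = {"gem": "Germanic languages", "yue": "Yue Chinese"}

names = build_name_table(wp_languages, iso_639_3, iana_languages)
print(names["gem"])
```

Because /wp languages is read first, gem resolves to Proto-Germanic rather than the registry's Germanic languages, as in the example above.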

I am writing this mostly as a note-to-self before I jump on the plane to Elsewhere.

It occurs to me that we can make use of the IETF language tag's support for private-use subtags. So, subtags that we have invented, like code grc-gre, might be renamed grc-x-gre. When the module sees the -x-subtag, it knows that subtag is non-standard and will look for that code in a special wp_private_subtags table for the language name. Someone else apparently had a similar idea because be-x-old exists in Module:Language/data/wp languages.

Another thing we might do, if and when we add support for label control (see #Wish list for future enhancement), is to overload that parameter so that |label=none hides all labels (language name, transliteration and translation static-text) and |label=name displays a name different from the name usually associated with the code. I'm not sure if there are any real benefits to this particular idea.

—Trappist the monk (talk) 11:27, 21 November 2017 (UTC)[reply]
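
A rough Python sketch of the private-use idea follows. It assumes a hypothetical wp_private_subtags table keyed by the whole tag; the entry for grc-x-gre is a placeholder label invented for illustration, since no name for it is given in this discussion.

```python
# Placeholder data: the private-use label below is invented for illustration only.
STANDARD_NAMES = {"grc": "Ancient Greek"}
WP_PRIVATE_SUBTAGS = {"grc-x-gre": "Greek (placeholder label)"}


def language_name(tag):
    """Return a display name, consulting the private table when the tag has -x-."""
    tag = tag.lower()
    if "-x-" in tag:
        name = WP_PRIVATE_SUBTAGS.get(tag)
        if name is None:
            raise ValueError(f"unrecognized private-use tag: {tag}")
        return name
    # No private-use subtag: use the primary language code only.
    return STANDARD_NAMES[tag.split("-")[0]]
```

The -x- marker tells the module up front that the subtag is non-standard, so it never has to guess whether an unknown subtag is a typo or a local invention.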

Possibly the usefulness for the |label= idea: the label provided by the current {{lang-de-CH}} is 'Swiss German' but the language name retrieved from the module's language tables is 'German' so in that template we might write: {{#invoke:lang|lang_xx|code=de|label=Swiss German}}.

—Trappist the monk (talk) 11:50, 21 November 2017 (UTC)[reply]

This would also be useful for suppressing the appearance of the same word in successive instances of the template (Foo, Bar, and Baz Quuxian, instead of Foo Quuxian, Bar Quuxian, and Baz Quuxian). Also useful for cases where one ethnic or national group uses one name for the language and a neighboring one uses a different one. — SMcCandlish ☏ ¢ >^ʌⱷ҅_ᴥⱷ^ʌ< 18:15, 21 November 2017 (UTC)

Broken Doxology

In 2014, Doxology started using the lang template with the parameter "grc-gre". I think this was to indicate that it was the same in ancient and modern Greek, but I'm not sure. In any case, the current version of Doxology is broken. --SarekOfVulcan (talk) 13:30, 20 November 2017 (UTC)[reply]

The code grc-gre is not a valid IETF language tag (see my comments at "Module:Language/data/wp languages" ?) so Module:lang emits an error message because it cannot make sense of the 'code'. There is a related template, {{lang-grc-gre}}, which has documentation that, to this reader, is far from clarifying. That template does not emit an error because it drops the -gre thing and calls {{lang}} with only the IANA/ISO 639-3 language code.

It does not appear that grc-gre is a Linguist List code so I would guess that someone here at en.wiki created it.

—Trappist the monk (talk) 14:17, 20 November 2017 (UTC)[reply]

It might make sense to treat any foo-bar as just foo any time the foo-bar combo doesn't resolve. I'm skeptical we can prevent people adding -bar instances that don't resolve to something in our table, since they're introduced (albeit slowly) all the time, e.g. in linguistics papers. PS: grc-gre was previously discussed in an older thread, above: #zh-yue. — SMcCandlish ☏ ¢ >^ʌⱷ҅_ᴥⱷ^ʌ< 22:27, 20 November 2017 (UTC)[reply]

Clearly we differ on what constitutes a 'discussion'. At #zh-yue, I merely mentioned grc-gre as a common cause of error messages displayed by Module:lang.

—Trappist the monk (talk) 11:00, 21 November 2017 (UTC)[reply]

Sure; I was just making sure the threads were connected, in case the earlier post mattered in the context. — SMcCandlish ☏ ¢ >^ʌⱷ҅_ᴥⱷ^ʌ< 18:11, 21 November 2017 (UTC)[reply]

Formatting of first line of multiline text

I ran across this problem at Jacques Dutronc#Discography. When there are multiple lines of foreign language text, the wiki syntax of the first line is not properly displayed.

The template seems to have been used without change on the above page for many years, so I assume that something's changed with the template, rather than the article having been wrong for all that time.

Here's a simple example, where the first bullet-point is shown as a standard asterisk, not as a list item:

een

twee

drie

Is there a problem with the template, or is this now the expected behaviour? (I'm not sure it's really being used properly on that page anyway, but maybe that's a different matter.) --David Edgar (talk) 00:28, 22 November 2017 (UTC)[reply]

This is an issue with all templates. The fix is to do this {{lang|nl|<nowiki />. However, the template shouldn't be used this way, but should be used for the exact content in the other language: *{{lang|nl|een}}, etc. — SMcCandlish ☏ ¢ >^ʌⱷ҅_ᴥⱷ^ʌ< 00:49, 22 November 2017 (UTC)[reply]

I use {{Lang|xx|...}} often for multi-line text, and this report gave me a fright. Turns out, the behaviour as described seems to be triggered by * (and similar), and surprisingly goes away when <poem>...</poem> is used:

*een
*twee
*drie

although that creates undesirable white space. I agree that for list items it's best to use the template for each item. -- Michael Bednarek (talk) 05:42, 22 November 2017 (UTC)[reply]

The problem is twofold: one part is that the MediaWiki parser treats the "*" character as invoking a list only if it occurs in certain defined positions, such as the start of a line; the other part is that the module underlying this template strips leading (and trailing) whitespace, which includes newlines. So you need an initial newline, but also need to prevent that from being stripped as whitespace - all that you need is to hide that newline using a character which doesn't test as whitespace, so won't be stripped off; yet one which won't be visible when rendered by the browser. The entity for a space is ideal:

*een *twee *drie

You can use either &#32; or &#x20;; they behave exactly the same. --Redrose64 🌹 (talk) 08:06, 22 November 2017 (UTC)

Just use the <nowiki /> as the first content in the parameter. This prevents any spacing problems that result from other tricks. Been doing it this way for over a decade, and you'll find it documented at block templates, e.g. Template:Block_indent/doc#Technical issues with block templates. — SMcCandlish ☏ ¢ >^ʌⱷ҅_ᴥⱷ^ʌ< 20:13, 22 November 2017 (UTC)
PS: That documentation snippet actually lives in a page of its own and can be transcluded as needed with {{Block bug documentation}}. — SMcCandlish ☏ ¢ >^ʌⱷ҅_ᴥⱷ^ʌ< 20:15, 22 November 2017 (UTC)[reply]

It might be possible to add a hack to fix this in Lua. Say, if the text starts with an asterisk and contains a newline (or newline plus asterisk), assume it's a bulleted list and add a newline at the beginning. That might result in unintuitive behavior in certain cases, though. — Eru·tuon 22:02, 27 November 2017 (UTC)[reply]

Doesn't have anything to do with * in particular, though. This affects all wikimarkup the effect of which is triggered only by newline followed directly by a special character (#, ;, :, probably others), or by newline then HTML comment then special character. — SMcCandlish ☏ ¢ >^ʌⱷ҅_ᴥⱷ^ʌ< 00:15, 28 November 2017 (UTC)[reply]

That's right. The logic could easily be modified to accommodate #, ;, : along with *. (I think that's all of the list-ish special characters.) It would also be possible, but more costly, to accommodate HTML comments after the newline. I'll test the idea in Module:Lang/sandbox. — Eru·tuon 01:03, 28 November 2017 (UTC)[reply]

Another case is {|, I just remembered. — SMcCandlish ☏ ¢ >^ʌⱷ҅_ᴥⱷ^ʌ< 07:57, 10 December 2017 (UTC)[reply]
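
The hack discussed in this thread (strip whitespace, then restore a leading newline when the text looks like block markup) might look something like the Python sketch below. It is only an illustration of the idea, not the code tested in Module:Lang/sandbox; the marker list comes from the characters named in this thread, and the HTML-comment case is not handled.

```python
# Markers named in this thread that only take effect at the start of a line.
LIST_MARKERS = ("*", "#", ";", ":", "{|")


def prepare_text(text):
    """Strip surrounding whitespace, but keep a leading newline for multi-line block markup."""
    stripped = text.strip()
    if "\n" in stripped and stripped.startswith(LIST_MARKERS):
        # Put back the newline that stripping removed so the parser sees the
        # first marker at the start of a line.
        return "\n" + stripped
    return stripped
```

Single-line text is left alone, so something like {{lang|nl|*een}} would still show a literal asterisk under this sketch, which is one of the unintuitive cases mentioned above.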

As an FYI about this thread, there is a secondary issue with the above invocation of this template: <ul><li></li><li></li></ul> is bad HTML and will not render to HTML correctly on MediaWiki at some point in the future, defeating the purpose of the template (it adds items in the misnested HTML5 or the general misnesting lint errors). Since this template presently generates a span, "detecting the issue and fixing it for these limited cases" doesn't fix a secondary issue of the invocation above (and additionally is inconsistent with every other template that does something like the proposed change). I would suggest that Template:Lang should be able to output a <div></div> rather than a span (while the majority of cases are inline, I've seen quite a few where a block lang template would be helpful--poetry is one). --Izno (talk) 20:58, 28 November 2017 (UTC)[reply]

If a block lang version of this template (or some sort of switch that flips this template from default inline to block) is required, then you should add that to the feature request list. Now, while basic functionality is still at issue, new features should not be getting in the way.

—Trappist the monk (talk) 23:20, 28 November 2017 (UTC)[reply]

Category:Articles containing Pushto-language text

Please be advised, Category:Articles containing Pushto-language text has been nominated to be merged into Category:Articles containing Pashto-language text. Cheers, -- Black Falcon ^(talk) 19:25, 23 November 2017 (UTC)[reply]

Kikuyu language category?

I think something changed with one of the templates but I can't figure out what, specifically. But now using the template Template:Lang-ki in an article makes the Category:Articles containing Gikuyu-language text appear at the bottom of the page. This is unlike other "hidden" categories like Category:Articles containing French-language text. Hopefully this is the right place to let people know. :) Thanks! Umimmak (talk) 00:52, 25 November 2017 (UTC)[reply]

It looks like this is also happening with {{lang-ps}} and Category:Articles containing Pushto-language text and Category:Articles containing Pashto-language text (the former is used by the template after the change to use the module). – Jonesey95 (talk) 04:13, 25 November 2017 (UTC)[reply]

Whether a category is hidden or not has nothing to do with the way that a category is added to an article; it's entirely down to the way the category page itself is set up (see WP:HIDDENCAT). Category:Articles containing French-language text has the template {{Category articles containing non-English-language text|French|fr|fre|fra}} which contains code to set __HIDDENCAT__, whereas Category:Articles containing Gikuyu-language text does not have a similar template. I note that the latter is set up as a redirected category (that's the {{Category redirect|Category:Articles containing Kikuyu-language text}} on that page), so the module needs configuring to use Category:Articles containing Kikuyu-language text instead of Category:Articles containing Gikuyu-language text. --Redrose64 🌹 (talk) 09:38, 25 November 2017 (UTC)[reply]

Well in the meantime I made the redirect category hidden so it won't show up on articles. Umimmak (talk) 10:21, 25 November 2017 (UTC)[reply]

The primary name for the ISO codes ki and kik seems to be Gikuyu and that's why articles got categorised into Category:Articles containing Gikuyu-language text. I've added entries for these two codes in the "override" table in Module:Lang/data, this should make the templates use the preferred name now. – Uanfala 11:00, 25 November 2017 (UTC)

Thanks; there were only three pages tagged as having {K/G}ikuyu text so I purged them all and that should be fixed. I'm not sure what the right way to fix the P{u/a}shto categories Jonesey95 mentions -- should the __HIDDENCAT__ be placed on them in the meantime I guess? Or the actual Template:Category articles containing non-English-language text? Umimmak (talk) 11:38, 25 November 2017 (UTC)[reply]

Apparently, the Pashto categories have been nominated for merging by SMcCandlish, there's some discussion at Wikipedia:Categories for discussion/Speedy. I'm not sure what the due process is, but after it is over I guess the same remedy will work as with Kikuyu. – Uanfala 12:29, 25 November 2017 (UTC)[reply]

Question re: bolding and ' ' marks

G'day, I use template:lang-sh-Latn a fair bit, and I've noticed it is now rendering with the sh word like this 'bold'. See the Background section of Yugoslav coup d'état for examples. Is there a recent change that has caused this? It doesn't comply with MOS:BOLD etc. Thanks, Peacemaker67 (click to talk to me) 08:25, 27 November 2017 (UTC)[reply]

Yes. We're in a transition period where things don't always work as expected. The problem with {{lang-hbs-Latn}} is that its language code, hbs-Latn, includes a script subtag, Latn, that specifies italics and, internally, the template also includes italic markup: ''{{{1}}}'' which together created: ''''{{{1}}}''''. I've converted {{lang-hbs-Latn}} to use the module:

{{lang-hbs-Latn|banovine}} → Serbo-Croatian Latin: banovine

—Trappist the monk (talk) 11:01, 27 November 2017 (UTC)[reply]

Thanks. Peacemaker67 (click to talk to me) 01:19, 28 November 2017 (UTC)[reply]

@Trappist the monk: Here's a weirdness, with {{lang|arc-Latn}} and presumably some others:

1. On this page, this italicizes, but in mainspace the link markup is broken and it's non-italic: [[Frahang-i Pahlavig|{{lang|arc-Latn|hozwārishn}}]] → hozwārishn
2. On this page, this italicizes, but in mainspace the link markup is broken and it's italic: ''[[Frahang-i Pahlavig|{{lang|arc-Latn|hozwārishn}}]]'' → hozwārishn
3. On this page, this italicizes, and it renders (italic) in mainspace: {{lang|arc-Latn|[[Frahang-i Pahlavig|hozwārishn]]}} → hozwārishn
4. On this page, this does not italicize, and it renders (non-italic) in mainspace: ''{{lang|arc-Latn|[[Frahang-i Pahlavig|hozwārishn]]}}'' → hozwārishn
5. On this page, this italicizes, and it renders (italic) in mainspace: {{lang|arc-Latn|[[Frahang-i Pahlavig|''hozwārishn'']]}} → [[[Frahang-i Pahlavig|hozwārishn]]] Error: {{Lang}}: text has italic markup (help)
6. On this page, this does bold and 'single quotes' (non-italic), and it renders (the same way) in mainspace: {{lang|arc-Latn|''[[Frahang-i Pahlavig|hozwārishn]]''}} → [hozwārishn] Error: {{Lang}}: text has italic markup (help)

Another glitch (not namespace dependent):

Italics: {{lang|ar-Latn|shamia}} → shamia
Bold and 'single': {{lang|ar-Latn|''shamia''}} → [shamia] Error: {{Lang}}: text has italic markup (help)
Non-italic: ''{{lang|ar-Latn|shamia}}'' → shamia

None of the {{lang|foo-Latn}} instances should auto-italicize, since {{lang|es}}, etc., do not; only the {{lang-foo}} templates emit italics around Latin script by default.

I think the latter problem is the same as the one reported by Peacemaker67 above, but the namespace problem may be something new. PS: I assume the "bold and 'single'" problem is fixed in the Lua by doing italics directly instead of by wiki ''...'' markup.
— SMcCandlish ☏ ¢ >^ʌⱷ҅_ᴥⱷ^ʌ< 05:29, 28 November 2017 (UTC)[reply]

I took a look at the source code. ''[[Frahang-i Pahlavig|{{lang|arc-Latn|hozwārishn}}]]'' results in

''[[Frahang-i Pahlavig|<span lang="arc-Latn">''hozwārishn''</span>[[Category:Articles containing Aramaic-language text]]]]''

in mainspace. Outside of mainspace, the category isn't included, so the link works, and oddly the italics don't cancel out. (Maybe because they have no displayed text between them.) Three ideas: a |nocat= parameter to remove the category, or a |link= parameter to provide the article that the text should link to, or require that the link be placed inside {{lang}}: ''{{lang|arc-Latn|[[Frahang-i Pahlavig|hozwārishn]]}}''. — Eru·tuon 08:19, 28 November 2017 (UTC)[reply]

I have taken the liberty of changing the list markup above from unordered to ordered.

Items 1 & 2 render as they do for the reason given by Editor Erutuon: a category link inside a wikilink does not work.

Item 3 renders in italic font because the language code specifies a Latin script.

Item 4 renders in upright font because the Latn script italics are reversed by the external wiki markup.

Item 5 renders in italic font because, I suspect, while there is something between the italic markup in the wikilink and the italic markup provided by the template, that 'something' is not displayable text so the markup is not displayed.

Item 6 renders as it does because the template applies italic markup (from the Latn script) to displayable text already wrapped in italic markup.

From the very beginning of this experiment, Module:Lang has supported |italic= so you can write:

{{lang|arc-Latn|[[Frahang-i Pahlavig|hozwārishn]]|italic=no}} → hozwārishn

|italic= always overrides any automatic italic setting.

{{lang}} only auto italicizes when the IETF language tag tells it to italicize with a Latn (case insensitive) subtag.

Those {{lang-??}} templates that have been converted to use Module:Lang emit error messages when the 'text' to be rendered includes italic wiki markup (presumably to override the wiki markup included in the wikitext templates). That same error message code is available to {{lang}} but is disabled for now because all of the unconverted {{lang-??}} templates call {{lang}} and many of them have hard-coded italic wiki markup which would cause an error message for each of the rendered {{lang-??}} that has hard-coded italic wiki markup.

—Trappist the monk (talk) 11:43, 28 November 2017 (UTC)[reply]
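
The italic rules stated above (|italic= always overrides; otherwise {{lang}} italicizes only when the tag carries a Latn subtag, case insensitive) can be summarized in a small Python sketch. The function names and the yes/no values for |italic= are assumptions for illustration, and the markup check is deliberately naive: it looks only for a bare two-apostrophe run and ignores bold or bold-italic runs.

```python
import re

# A run of exactly two apostrophes, i.e. wiki italic markup (not ''' bold).
ITALIC_MARKUP = re.compile(r"(?<!')''(?!')")


def should_italicize(tag, italic=None):
    """Decide italics: an explicit |italic= wins, else italicize only for a Latn subtag."""
    if italic is not None:
        return italic == "yes"
    subtags = tag.lower().split("-")[1:]
    return "latn" in subtags


def has_italic_markup(text):
    """Report whether the text already contains wiki italic markup."""
    return bool(ITALIC_MARKUP.search(text))
```

Checking has_italic_markup before applying italics is what would let the module report "text has italic markup" instead of emitting the doubled markup that renders as bold with stray quote marks.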

Expected behavior of {{lang}} versus {{lang-xx}} is that the former will always produce non-italic output; it's too often used for proper names which are not italicized in most contexts. Having it switch to producing italicized when particular language codes are used will confuse and result in wrong output; pretty much no one will remember that one particular kind of case is going to produce different output. We do expect {{lang-xx}} to italicize unless it's non-Latin material, so this auto-handling of -Latn would make sense.

Another thing: the prescribed method of representing italics within material already italicized is to turn the italics off for nested-italics. The obvious way to do this is ''Blah Blah ''Foo'' Yadda Yadda''; MoS probably actually advises this somewhere. But that's not going to work with the Lua version if it spits an error instead. What to do about this?

On the other matter: The purpose of these templates is proper markup and formatting of text. If insertion of an optional and possibly redundant category by some of them is causing these central purposes of the markup to fail, then there needs to be a parameter for adding a category, not for removing one - it should not be done by default. Who actually uses "Category:Articles containing Foo-language text" and for what? That's a maintenance/tracking category type, not a category for readers. It could be made invisible or even (old-school style) be moved to the article's talk page. In the end it would make more sense for a bot to detect when a particular {{lang|foo}} etc. has been used in an article, using a list of templates and parameters that do this sort of stuff, then add the categories. We don't really need the templates to do it at all.
— SMcCandlish ☏ ¢ >^ʌⱷ҅_ᴥⱷ^ʌ< 00:43, 30 November 2017 (UTC)[reply]

{{lang}}, in most contexts, will not produce italic rendering. Only in the special case, where the editor has taken the trouble to explicitly specify a Latin script by including the Latn IETF subtag, will the module apply italic markup.

Italics markup is being discussed at #html italic markup vs wiki italic markup.

Yes, the purpose of these templates is proper markup and formatting of text, which statement I would clarify: text supplied to the template; the template cannot know what exists outside the opening {{ and closing }}. The insertion of categories does break the wikilinks for your examples 1 & 2 above. This breakage was also true for the old, non-module version (I recreated Test page with your examples, selected this version of {{lang}}, clicked edit, typed Test page into the Preview page with this template text box, and clicked the adjacent Show preview button). From this experiment, I believe that the Module version of {{lang}} breakage of examples 1 & 2 is not new and existed in the old template. I do not know why the templates add hidden categories, nor do I know if no one uses them or if a lot of editors use them. Clearly, someone thought it important. If you wish to do away with these categories, this is probably not the correct venue.

—Trappist the monk (talk) 14:24, 30 November 2017 (UTC)[reply]

Oh, I don't care if the categories exist, they just shouldn't be added by these templates if that breaks the main purpose of the templates [sometimes]; a bot can do the categorizing instead. — SMcCandlish ☏ ¢ >^ʌⱷ҅_ᴥⱷ^ʌ< 07:56, 10 December 2017 (UTC)[reply]

Italicization

Good day! Why do we use italics in this template, but not in others? Take a look here. However, when I tried using the template here, it didn't italicize: Uzbek: Oʻzbek gimnaziyasi/Ўзбек гимназияси; Russian: Узбекская гимназия; Kyrgyz: Өзбек гимназиясы. But the uz template does italicize its content in articles. Can you change it so that it doesn't? Shouldn't these templates be uniform? Nataev ^talk 15:25, 28 November 2017 (UTC)[reply]

I changed the template a couple of hours ago. If you click Edit on an article and then save it without making any changes (this is called a "null edit"), the italics should go away. – Jonesey95 (talk) 15:51, 28 November 2017 (UTC)[reply]

There is more to it than that. The example holds text in two scripts when it should not (Uzbek can be / has been written in three):

{{lang-uz|Oʻzbek gimnaziyasi/Ўзбек гимназияси}}

The first part of that text (left of the solidus) uses Latin script, the second part uses Cyrillic script. The Latin should be italicized but the Cyrillic should not be. {{Lang}} and the {{lang-??}} templates do not support more than one script simultaneously (there is an expectation that in future, multiple scripts will be supported; see #Wish list for future enhancement). The third Uzbek script is Arab which, unlike Latn and Cyrl, is written right-to-left so requires special handling.

We are in a transition so there is a mix of old and new. For the time being, I would write the example:

{{lang-uz|Ўзбек гимназияси}} / {{lang|uz-Latn|Oʻzbek gimnaziyasi}} → Uzbek: Ўзбек гимназияси / Oʻzbek gimnaziyasi

Order reversed because of Editor Jonesey95's template edit. I would note here that that edit doesn't really let lang module manage italics because {{lang-uz}}'s call of {{lang}} (from inside {{language with name}}) doesn't give the module any direction on how italics are to be managed.

—Trappist the monk (talk) 16:32, 28 November 2017 (UTC)[reply]
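The mixed-script workaround described above can be sketched in a few lines. This is only a hypothetical Python illustration, not anything Module:Lang does (the module is Lua and, as noted, does not handle more than one script per call). It assumes the text is split on the solidus and that the script of each part can be guessed from Unicode character names; `guess_script` and `tag_parts` are made-up names, and only Latin and Cyrillic are recognised.

```python
import unicodedata


def guess_script(text):
    """Guess 'Latn' or 'Cyrl' from Unicode character names (hypothetical helper)."""
    counts = {"Latn": 0, "Cyrl": 0}
    for ch in text:
        if not ch.isalpha():
            continue  # skip spaces, digits, punctuation
        name = unicodedata.name(ch, "")
        if name.startswith("LATIN"):
            counts["Latn"] += 1
        elif name.startswith("CYRILLIC"):
            counts["Cyrl"] += 1
    if counts["Latn"] == counts["Cyrl"]:
        return None  # undecidable, including the case of no letters at all
    return max(counts, key=counts.get)


def tag_parts(lang, text, sep="/"):
    """Emit one {{lang}} call per part, adding a script subtag when one is guessed."""
    pieces = []
    for part in text.split(sep):
        part = part.strip()
        script = guess_script(part)
        code = lang + "-" + script if script else lang
        pieces.append("{{lang|" + code + "|" + part + "}}")
    return " / ".join(pieces)


print(tag_parts("uz", "Oʻzbek gimnaziyasi/Ўзбек гимназияси"))
```

A real fix would have to cope with right-to-left scripts such as Arab and with text where the separator is not a solidus, which this sketch ignores.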

OK, I see. The fact that Uzbek is currently written using two (actually, three) scripts is a huge pain in the neck. Nataev ^talk 08:27, 29 November 2017 (UTC)[reply]

html italic markup vs wiki italic markup

Editor Izno has changed Module:Lang/sandbox with this edit.

To me, this looks like the similar change (discussion) that was made to many of the {{lang-??}} templates and since reverted. Because of that, I'm inclined to revert Editor Izno's change unless we first establish that the current template is proven to be causing these unspecified lint errors.

Is there unequivocal evidence of such damage?

—Trappist the monk (talk) 15:37, 29 November 2017 (UTC)[reply]

Because wikitext '' is ambiguous as to opening or closing tag, use like ''{{lang|Latin character language|Name of work}}'' (which the documentation apparently calls for!) causes the future parser plus RemexHTML to emit Name of work, which causes a number of the lang-related errors in Special:LintErrors/html5-misnesting (the other vast majority are related to using an inline template where a block template is called for, as I noted above). The current processor plus HTML Tidy actually nukes all of the italics markup entirely, which is also probably undesirable (Name of work). I have not tested the change to see if it fixes the problem or not.

This can be evidenced on War and Peace with Russkiy Vestnik (''{{lang|ru-Latn|Russkiy Vestnik}}''), which currently generates Russkiy Vestnik; which would generate Russkiy Vestnik in the future (using the Parser migration tool to get that HTML directly from future version source); and which clearly should generate Russkiy Vestnik as the current editor intent.

(Now, whether the external italics should be there as the correct editor intent is irrelevant to the technical question and can't be evaluated by the template anyway, per a whole lot of the discussion about Lua-fying above.)

The concerns stated by the editors in the discussion on WOS's talk page are mostly illegitimate in this context. The reason we use '' in the MOS is because that's primarily an article-driven styling for wikitext and we use wikitext in articles because that's the wiki way (I doubt anyone needs to understand here why we use wikitext in articles). We are not so constrained in the template space where we typically bump into concerns or limitations regarding well-formed HTML.

Justletter's use of {{lang-it|'ABC'}} to add bolding rather than italics is not documented as a valid use case nor do I believe it should be supported internal to the template. Items which should receive bolding for some other reason (MOS:BOLD) should also take the italics where italics are called for as highlighting non-English non-Latin text in my opinion. --Izno (talk) 16:22, 29 November 2017 (UTC)[reply]

And indeed, this revision shows that my change fixes the problem, with correct HTML provided by both the current and future parsers. --Izno (talk) 17:31, 29 November 2017 (UTC)[reply]

(edit conflict)

Yeah, I agree that '' is ambiguous; it exists so will be used. The {{lang}} documentation that suggests italic markup outside the template is, I think, suspect. {{Lang}} did not, at the time the documentation was written, render any italics so some mechanism was necessary to apply those italics. Now comes Module:Lang which can, more-or-less intelligently, read the IETF language tag and |italic= and decide what to do with the 'text' based on that reading. Yeah, it can't see outside of its parent template so it doesn't / can't know that it is contributing to misnested html.

Is this issue only related to instances of {{lang}} templates that specify Latn script in the IETF tag parameter? If so, we can remove that functionality, though I'd rather not, because it seems an obvious thing to have the template/module do (and editors can still override the automatic italics with |italic=no).

Really? This is 'correct'? where one <i>...</i> is nested inside of another? Clearly it 'works', but just as clearly, one of the <i>...</i> is superfluous:

Russkiy Vestnik → Russkiy Vestnik

Perhaps that's a job for a bot down the road: to remove the extraneous wikimarkup outside the template.

Pinging editors from the other discussion because you mentioned some of them without pings. @WOSlinker, Justlettersandnumbers, Uanfala, Jonesey95, and Scriptions:

—Trappist the monk (talk) 18:31, 29 November 2017 (UTC)[reply]

Thank you for the ping, Trappist. I agree with Izno that my use of single primes to create bold, non-italic text is undocumented, but it's a lot simpler than using {{noitalic}} and triple primes. It worked, so I did it; I'd like not to need to do it. I've asked above for proper support of bold text rather than this or any other clumsy work-around. I envisage a |bold= parameter which would default to no but could be set to yes. Is that something that could easily be realised? Ideally, if set to yes it would automatically set |italic=no, unless that is purposely over-ridden. I had assumed that some sort of a bot run would eventually be needed to sort out this "undocumented use" of mine and others like it. Once again, my thanks to those who are working on making this better. Justlettersandnumbers (talk) 19:21, 29 November 2017 (UTC)[reply]

@Justlettersandnumbers: I think a |bold= is quite reasonable and obviously do-able... but why? What is the use case for bold rather than italics, and why is that use case not better met by both bold and italics (bold for whatever meaning you are attempting to assign to the markup and italic for the non-English language meaning)? MOS:BOLD provides for very few uses of bold. --Izno (talk) 22:54, 29 November 2017 (UTC)[reply]

For templates that use Module:Lang, bold font can be achieved with standard wikimarkup:

{{lang-af|'''bold text'''}} → Afrikaans: bold text

Don't want italic? set |italic=no.

—Trappist the monk (talk) 13:07, 30 November 2017 (UTC)[reply]

@Justlettersandnumbers: They're not primes, they're apostrophes. If you use primes, ′′like this′′ or ′′′this′′′, they're displayed literally. --Redrose64 🌹 (talk) 21:28, 30 November 2017 (UTC)[reply]

I can contrive an example that you might see elsewhere (especially in fiction or in briefly writing about some memory): The USS Enterprise was sailing from the port of San Diego when.... You wouldn't close the first italic tag when you reach the second open italic tag because you had not yet reached the end of your voiced thought. Maybe one that might actually appear would be some quotation about the USS Enterprise in another language e.g. <q>Spanish text USS Enterprise Spanish text</q> (mind you, <q lang="es">...</q> might be preferable but not as easy to integrate into this module--Erutuon has a comment in that direction).

In the general case, it's not incorrect for this markup to render like so, and I believe either browsers implement the switching natively in their default CSS files or that we currently have something in one of the CSS files Wikipedia sends to the browser instructing the browser to do so. It's only in this specific case that we have identified that the italics on the inside and the italics on the outside are superfluous.

I believe this issue presents itself where any italics are used, automated or not, not solely with Latn script in the tag. I think it preferable to keep the automated functionality in the template. I agree that we should fix the cases where ''{{lang|Latin character language|Name of work}}'' is used (I am unsure that it is bot-able per WP:CONTEXTBOT, but probably is WP:AWBable). I think we should also detect cases like JLAN's apostrophers (e.g. {{lang|ru-Latn|'Russian in Latin characters'}}) and a suitable fix for that issue programmed also (whether that's removal of the offending characters internal to the template or addition of a separate parameter--the functionality for which I have just queried Justlettersandnumbers). --Izno (talk) 22:54, 29 November 2017 (UTC)[reply]
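For anyone wanting to locate the ''{{lang|...}}'' pattern mentioned above in a dump of wikitext, a crude regular expression might be a starting point. The Python below is a sketch under stated assumptions: it only looks at plain {{lang|...}} calls with no nested templates, it is not a wikitext parser, and `find_wrapped` is a made-up name. Given the WP:CONTEXTBOT concern, every hit would still need human review.

```python
import re

# Two apostrophes (not part of ''' bold) directly around a {{lang|...}} call.
WRAPPED_LANG = re.compile(
    r"(?<!')''(?!')"             # opening '' that is not part of '''
    r"(\{\{lang\|[^{}]*\}\})"    # a {{lang|...}} call without nested templates
    r"''(?!')"                   # closing '' not followed by another apostrophe
)


def find_wrapped(wikitext):
    """Return the {{lang}} calls that are wrapped in external italic markup."""
    return [m.group(1) for m in WRAPPED_LANG.finditer(wikitext)]


sample = "serialized in ''{{lang|ru-Latn|Russkiy Vestnik}}'' and '''{{lang|it|Livorno}}'''"
print(find_wrapped(sample))
```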

I understand that it is sometimes appropriate to nest <i>...</i> tags.

If I understand you, every {{lang-??}} template that does not directly use Module:Lang (they do use the module indirectly through {{lang}}) and that has hard-coded italic wiki markup, will be causing lint errors. Is that correct?

What are JLAN's apostrophers?

—Trappist the monk (talk) 13:07, 30 November 2017 (UTC)[reply]

It is possible that every one will cause lint errors. I do not know for certain, however.

JLAN's apostrophers == Justlettersandnumbers's apostrophes. --Izno (talk) 13:55, 30 November 2017 (UTC)[reply]

Ah, right. Your example ({{lang|ru-Latn|'Russian in Latin characters'}}) differs from Editor Justlettersandnumbers's example ({{lang-it|'Livorno'}}). Still, in both cases, I think that the module is doing the correct thing in that it does not convert a single apostrophe into Roman bold-face font.

—Trappist the monk (talk) 18:57, 30 November 2017 (UTC)[reply]

(←) I believe this search is pretty close to the upper bound of all articles where this bolding hack is present that can be fixed with TTM's suggested fix. In review, I don't think a |bold= is necessary. However, the templates used appear to need to be able to take |italic= as a passthrough to their parent template. (Or is it the case that Template:language with name is going away?) --Izno (talk) 19:27, 1 December 2017 (UTC)[reply]

Well, it's not a suggested fix per se, rather, it's a progression. As we migrate the {{lang-??}} templates to use Module:lang directly, the bolding hack will fail to work and the text that was hacked will be rendered single-quoted in italic font. We cannot / should not change all {{lang-??}} templates in a single operation. It will take time. If anyone is reading this and cares about hacked instances of these templates, keep an eye on the template pages that are of interest to you so that when the template is switched to use the module, you can undo the hacks and write the template instances correctly.

{{Language with name}} isn't going away but it will no longer be used by the {{lang-??}} templates. The functionality currently provided to unconverted {{lang-??}} templates by {{language with name}} has been subsumed into Module:lang.

—Trappist the monk (talk) 20:27, 1 December 2017 (UTC)[reply]

If  is used, it would probably be best to move the language (and any other) attributes there () instead of having two tags . For what it's worth, that's what we do on Wiktionary. — Eru·tuon 22:35, 29 November 2017 (UTC)[reply]

Moving that direction is probably a bit more complex than my suggested change for right now, while the module is a bit in motion. It might help prep the module for a block implementation as well as the current inline implementation. --Izno (talk) 22:59, 29 November 2017 (UTC)[reply]

This sort of markup is perfectly suited to the transliteration rendering of the {{lang-??}} templates because the transliteration is always Latin script and so always italicized:

{{#invoke:lang|lang_xx|code=el|text=Θεοτόκος|italic=no|translation=God-bearer|translit=Theotokos}}

Script error: The function "lang_xx" does not exist.

—Trappist the monk (talk) 18:57, 30 November 2017 (UTC)[reply]

So, it seems that the template has been updated, for which many thanks (due, I believe, to Trappist the monk). But it's been updated with the change of handling of wiki-markup for italic text, such that forms such as {{lang-it|'Livorno'}} now display as 'Livorno' instead of Livorno, exactly the bug I complained of further up this page on 16 October. So I ask again, as I asked then: if the change is an important improvement, can someone please give some hint as to how the various affected pages could be tracked down and fixed? I have tried searching for insource: "{{lang-it|'", for example, but that does not yield useful results (our search engine ignores the crucial apostrophe). Ideas? Justlettersandnumbers (talk) 18:52, 9 December 2017 (UTC)[reply]

Do the search with a regex: insource:/\{\{lang\-it\|'[^']/

—Trappist the monk (talk) 18:59, 9 December 2017 (UTC)[reply]
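To see what that pattern accepts and rejects, it can be tried against a few sample strings. The Python below only illustrates the pattern; the search engine uses its own regex dialect, so treating Python's `re` as equivalent is an assumption, and the sample strings are invented.

```python
import re

# Same pattern as the insource: search, without the surrounding slashes.
SINGLE_APOSTROPHE = re.compile(r"\{\{lang\-it\|'[^']")

samples = [
    "{{lang-it|'Livorno'}}",      # the bolding hack: one apostrophe
    "{{lang-it|''Livorno''}}",    # ordinary italic markup
    "{{lang-it|'''Livorno'''}}",  # ordinary bold markup
    "{{lang-it|Livorno}}",        # no markup
]

for text in samples:
    hit = SINGLE_APOSTROPHE.search(text) is not None
    print(text, "matches" if hit else "no match")
```

Only the first sample matches, because the pattern requires exactly one apostrophe immediately after the pipe.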

I'm still trying to understand: [...] why? What is the use case for bold rather than italics, and why is that use case not better met by both bold and italics (bold for whatever meaning you are attempting to assign to the markup and italic for the non-English language meaning)? MOS:BOLD provides for very few uses of bold. My expectation is that text for which WP:MOSBOLD calls for bolding and WP:MOSITALICS calls for italics should receive both. The search I linked above works. --Izno (talk) 19:34, 9 December 2017 (UTC)[reply]

Izno, because we don't use italics for proper names (hence the requests for the capability to disable italics in the template), but do sometimes need to bold-face them. Thanks, Trappist the monk, will try. It'd be really good if the documentation for our search engine actually gave some advice on how to search for things. Justlettersandnumbers (talk) 19:41, 9 December 2017 (UTC)[reply]

OK, I just happened across this: Rocco and His Brothers. Why does using italics for a proper name in English (in accordance with our MOS) trigger such an alarming warning? I don't see any reason why lang-en needs to be used there, but nor do I see any reason for it to create such a bloodbath. Could we perhaps tone down or eliminate such messages for this non-fatal error? Justlettersandnumbers (talk) 23:07, 9 December 2017 (UTC)[reply]

In general, templates are responsible for the 'style' of the rendered output. All of the 600ish {{lang-??}} templates have a default setting to italicize or to not italicize according to the language's writing system. When that writing system is a Latin script, the templates default to italic rendering in accordance with MOS:FOREIGNITALIC. The English language version of the template {{lang-en}} does not italicize because this is the English Wikipedia where we do not italicize normal English text.

Because of this, use of {{lang-en}} at en.wiki is mostly pointless; the template's documentation says as much. For your example, it is correct to italicize Rocco and His Brothers because it is a film title per WP:ITALIC but there is really no need to use {{lang-en}} there and, my opinion, no need to label Rocco and His Brothers as English text.

The error message is there because, instead of wiki markup, these templates use a parameter to control the italic rendering which is not in keeping with the hard-coded method used by the old templates. It is necessary to know where templates with italic markup are located. You noticed, so the mechanism must work. Each error message has a link to help information at Category:Lang and lang-xx template errors so that editors can learn what the error message means and then take appropriate action. As I write this, of the 636,000ish pages that transclude Module:lang, there are 9200ish pages with errors; a number that appears to be going down.

—Trappist the monk (talk) 00:30, 10 December 2017 (UTC)[reply]

Lang-en (without something more specific) is probably only useful for English-language text inside quoted non-English language text, e.g. {{lang|es|Mi casa es su casa, {{lang|en|brother}}.}}. Even with something more specific, like en-US it's only useful for a) pre-modern dialects, b) linguistic markup of regionalisms, and c) direct comparisons of things like US versus UK English. I sometimes encounter it (in ref citations especially, of all places) simply to mark up something as American or British or whatever, and this is pointless and annoying. I remove such instances on sight. — SMcCandlish ☏ ¢ >^ʌⱷ҅_ᴥⱷ^ʌ< 08:12, 10 December 2017 (UTC)[reply]

I have been wondering of late if the correct solution to the 'italics problem' wouldn't be to use css in the ... tag. For text that is to be italic the templates would write:

and for upright, non-italicized text:

We might extend the |italic= parameter to allow for a third value unset which would have the templates write

 (or just leave out the font-style property)

normal text with italic text in the rendering – like this if |italic=yes
italic text with normal text inside of italic markup – like this if |italic=no
italic text with italic inherit text inside of italic markup – like this if |italic=unset
normal text dictates normal inherit text in the rendering – like this if |italic=unset

—Trappist the monk (talk) 14:07, 14 December 2017 (UTC)[reply]
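The literal markup examples in the comment above did not survive on this page, so the following is only a guess at the shape of the proposal. It assumes a span element carrying a lang attribute and a CSS font-style property set to italic, normal, or inherit, one for each of the three |italic= values discussed. The function name `render_span` and the exact attribute layout are hypothetical, and the sketch is Python rather than the module's Lua.

```python
from html import escape

# Assumed mapping from |italic= values to CSS font-style values.
FONT_STYLE = {"yes": "italic", "no": "normal", "unset": "inherit"}


def render_span(code, text, italic):
    """Build a span whose font-style reflects the |italic= value (hypothetical)."""
    style = FONT_STYLE.get(italic)
    if style is None:
        raise ValueError("unsupported italic value: " + repr(italic))
    return '<span lang="{}" style="font-style: {};">{}</span>'.format(
        escape(code, quote=True), style, escape(text)
    )


print(render_span("ru-Latn", "Russkiy Vestnik", "yes"))
print(render_span("en", "with normal text embedded", "no"))
print(render_span("ru-Latn", "Russkiy Vestnik", "unset"))
```

Switching a property this way means that no can force upright text even inside surrounding italic markup, which nested italic tags cannot do.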

is correct/fine per the spec. --Izno (talk) 14:31, 14 December 2017 (UTC)[reply]

As far as it goes, yes. But we can't do case 2 and use |italic= to control the template's rendering:

''some italic text {{lang|en|with normal text embedded|italic=no}} and more italic text''

some italic text with normal text embedded and more italic text

For this case we have to override the wiki markup somehow. If we keep the wiki markup in the module, as it is now implemented, we can write:

''some italic text {{lang|en|with normal text embedded|italic=yes}} and more italic text''

some italic text with normal text embedded and more italic text

but that is semantically incorrect and confusing. The font-style property allows us to say that no means no and yes means yes and unset means 'use the already set style'. Also, by doing this we switch a property rather than a whole tag which to me seems cleaner.

—Trappist the monk (talk) 15:18, 14 December 2017 (UTC)[reply]

In Module:Lang/sandbox I have reworked italic handling so that it supports the font-style property I described above. This version of the sandbox code introduces two additional accepted values for |italic=. The table attempts to illustrate how the new code works:

lang |italic= parameter operation
\|italic= value	description	example code	result
parameter not present; parameter present, not set; invalid value	module applies default (upright) style; yields to script subtag `latn`; invalid values treated as default	`{{lang/sandbox\|ru\|тундра}}`	тундра
		`{{lang/sandbox\|ru-latn\|tûndra}}`	tûndra
`default`		`{{lang/sandbox\|ru\|тундра\|italic=default}}`	тундра
`default`		`{{lang/sandbox\|ru-latn\|tûndra\|italic=default}}`	tûndra
`no`	module applies upright style; overrides script subtag `latn`	`{{lang/sandbox\|ru\|тундра\|italic=no}}`	тундра
`no`		`{{lang/sandbox\|ru-latn\|tûndra\|italic=no}}`	tûndra
`yes`	module applies italic style; ignores script subtag `latn`	`{{lang/sandbox\|ru\|тундра\|italic=yes}}`	тундра
`yes`	module applies italic style; ignores script subtag `latn`	`{{lang/sandbox\|ru-latn\|tûndra\|italic=yes}}`	tûndra
`unset`	module applies no style; overrides script subtag `latn`; style inherited from external markup	`{{lang/sandbox\|ru\|тундра\|italic=unset}}`	тундра
		`''{{lang/sandbox\|ru\|тундра\|italic=unset}}''`	тундра
		`{{lang/sandbox\|ru-latn\|tûndra\|italic=unset}}`	tûndra
		`''{{lang/sandbox\|ru-latn\|tûndra\|italic=unset}}''`	tûndra

Results are similar for the {{lang-??}} templates. You can see that at my sandbox though the rendering there is a bit more cryptic.

A variant of this table should become part of the documentation for the {{lang}} and {{lang-??}} templates. A better base language is in order because es-Latn is a malformed IANA language tag (Latn is a suppressed script for es – we should detect and do something about that) but it works here as an illustration for now.

—Trappist the monk (talk) 17:57, 20 December 2017 (UTC)[reply]
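The decision logic in the table above can be restated as a short function. This is a Python paraphrase of the table only, not the sandbox code (which is Lua and is not shown on this page). The name `italic_style` is made up, and looking for a latn subtag anywhere after the language subtag is a simplification.

```python
def italic_style(tag, italic=None):
    """Return 'italic', 'normal', or 'inherit' for a language tag and |italic= value."""
    subtags = [s.lower() for s in tag.split("-")]
    has_latn = "latn" in subtags[1:]  # simplification: any later subtag equal to latn
    value = (italic or "").strip().lower()
    if value == "yes":
        return "italic"   # ignores the script subtag
    if value == "no":
        return "normal"   # overrides the script subtag
    if value == "unset":
        return "inherit"  # no style applied; external markup decides
    # Parameter absent, empty, 'default', or invalid: upright unless latn is present.
    return "italic" if has_latn else "normal"


for tag, value in [("ru", None), ("ru-latn", None), ("ru-latn", "no"),
                   ("ru", "yes"), ("ru-latn", "unset"), ("ru-Latn", "bogus")]:
    print(tag, value, "->", italic_style(tag, value))
```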

Simpler subtag parsing

I created a simpler subtag parsing function in Module:Lang/sandbox. It does not require enumerating every combination of language, script, region, and variant code. It doesn't work perfectly yet. See Module talk:Lang/codes/testcases for the result. (I didn't want to add the testcases to Module:Lang/testcases because that page is kind of long already.) More examples would be appreciated. — Eru·tuon 02:03, 30 November 2017 (UTC)[reply]

This is something that I had intended to do but was leaving to later. I have replaced the non-working variant validation/consolidation code at the bottom of get_ietf_parts() with the working code from the live module. This new snippet also includes more helpful error messaging. Apparently there is something wrong with how Lang/sandbox or Lang/codes/testcases evaluates/renders actual column results. In the three failures, where is the fourth table element?

{{lang-??}} templates using the module are allowed to use |script=, |region=, |variant= to supplement the IETF language tag provided by the template (overriding is not currently permitted but is contemplated). The values supplied by these parameters are validated in get_ietf_parts(), which is why script, region, and variant are each individually made lowercase there rather than all at once as you have done in /sandbox.

—Trappist the monk (talk) 12:16, 30 November 2017 (UTC)[reply]

I don't know what's happening to make the tables have only three elements, but I'll look into it.

I don't understand how what you are saying relates to letter case. Could you clarify? At which point is letter case significant in the function get_ietf_parts? ~~I do notice now that language code is never lowercased, so perhaps letter case for it should be preserved, and an error be triggered for something like {{lang|GrC|...}}.~~ — Eru·tuon 20:36, 30 November 2017 (UTC)[reply]

Actually, code is lowercased in Module:Lang before it is validated, so {{lang|GrC|...}} would not return an error. — Eru·tuon 20:40, 30 November 2017 (UTC)[reply]

In Module:Lang, get_ietf_parts() is called at line 665:

get_ietf_parts (args.code, args.script, args.region, args.variant);

In that call, args.script, args.region, and args.variant come from the template parameters |script=, |region=, and |variant= respectively and can be any case (IETF language tags have no standardized case; there is a 'common' way of writing the various subtags that, by some sort of convention, uses particular case – we mimic that in format_ietf_tag())

In get_ietf_parts() (Module:Lang) the parsing (that part you are rewriting) is case insensitive. Once parsed, we look to see if any of the template parameters (|script=, |region=, and |variant=) is set. If any of these is set, and there is no matching subtag in source, then we assign the template parameter's value to the appropriate local variable (lines: 194, 209, and 224). Then we validate. Before we can do that, we down-case the content of the variable in question because the data tables are all indexed with lowercase keys (because of __preprocess() in Module:Language/name/data).

In Module:Lang/sandbox you down-case only the value in source. That works fine for {{lang}} which doesn't support the subtag parameters but won't work for the {{lang-??}} which do/will support them.

—Trappist the monk (talk) 21:18, 30 November 2017 (UTC)[reply]
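The 'common' casing convention mentioned above is the one recommended in RFC 5646: language subtag lowercase, script subtag in title case, region subtag uppercase. What format_ietf_tag() actually does is not shown in this discussion, so the Python below is just a sketch of that convention under a made-up name, `format_tag_sketch`.

```python
def format_tag_sketch(language, script=None, region=None, variant=None):
    """Join subtags using the conventional casing (language, Script, REGION, variant)."""
    parts = [language.lower()]
    if script:
        parts.append(script.capitalize())  # e.g. latn -> Latn
    if region:
        parts.append(region.upper())       # e.g. rs -> RS
    if variant:
        parts.append(variant.lower())
    return "-".join(parts)


print(format_tag_sketch("RU", "latn"))
print(format_tag_sketch("en", region="us"))
print(format_tag_sketch("SR", "LATN", "rs"))
```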

Okay, I see. The supplied subtags need to be lowercased. — Eru·tuon 21:54, 30 November 2017 (UTC)[reply]
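A rough illustration of the point just agreed: parse the tag case-insensitively, let |script=, |region= and |variant= fill in only the subtags that the tag itself lacks, then lowercase everything before any table lookup. This is a hypothetical Python sketch, not get_ietf_parts(). The pattern covers only a language subtag plus optional script, region and a single variant; extlang, extensions and private-use subtags are ignored, and `get_parts` is a made-up name.

```python
import re

# Simplified tag shape: language[-script][-region][-variant], any letter case.
TAG = re.compile(
    r"^(?P<language>[a-z]{2,3})"
    r"(?:-(?P<script>[a-z]{4}))?"
    r"(?:-(?P<region>[a-z]{2}|[0-9]{3}))?"
    r"(?:-(?P<variant>[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))?$",
    re.IGNORECASE,
)


def get_parts(source, script=None, region=None, variant=None):
    """Parse a tag, supplement it from parameters, and lowercase every part."""
    m = TAG.match(source)
    if m is None:
        raise ValueError("unrecognized language tag: " + source)
    parts = m.groupdict()
    supplied = {"script": script, "region": region, "variant": variant}
    for key, value in supplied.items():
        if parts[key] is None and value:
            parts[key] = value  # supplement only; never override the tag
    # Lowercase parsed and supplied values alike, since lookups use lowercase keys.
    return {k: (v.lower() if v else None) for k, v in parts.items()}


print(get_parts("sr", script="LATN", region="RS"))
print(get_parts("ru-Latn", script="Cyrl"))
```

Lowercasing only the source string would leave the supplied LATN and RS in upper case, which is the failure described above.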

Category:Articles containing Semitic languages-language text

It appears that Template:Transl/Template:Lang is the common denominator between the articles in Category:Articles containing Semitic languages-language text rather than Category:Articles containing other Semitic-language text. In Aleph "sem" is only a variable in the Template:Transl but in Adnan, "sem-Latn" is called in Template:Lang. Hyacinth (talk) 22:12, 1 December 2017 (UTC) Hyacinth (talk) 22:19, 1 December 2017 (UTC)[reply]

Template:ISO 639 name sem may contain a problem. Hyacinth (talk) 22:34, 1 December 2017 (UTC)[reply]

The code "sem" should link to Semitic language. I don't know why the word "other" was in the template, but it was breaking things. The ISO 639 templates are a mess. – Jonesey95 (talk) 23:19, 1 December 2017 (UTC)[reply]

I don't think my changes have fixed anything, though. I used to understand how these templates worked, but with the module-ization still in progress, I'm a bit at sea. – Jonesey95 (talk) 23:26, 1 December 2017 (UTC)[reply]

(edit conflict)

sem is an ISO 639-2 collective. See sem @ sil.org and their definition of Collections of languages – which definition, to me anyway, is rather obtuse. The correct name for sem is 'Semitic languages' and this is the name that {{lang}} was getting from the IANA data table and the name that Module:Lang used when creating the category link for that code. For the time being, I have created an entry in the override table in Module:Lang/data. This will work until someone creates a {{lang-sem}} template by which time we may have figured out how to handle collectives. I'll add notes and TODOs in the appropriate places.

—Trappist the monk (talk) 23:48, 1 December 2017 (UTC)[reply]

Not sure that this is the venue to discuss the ISO 639 name templates. {{Lang}} and the {{lang-??}} templates are abandoning all of the ISO 639 name templates along with the templates that might have called them.

—Trappist the monk (talk) 23:48, 1 December 2017 (UTC)[reply]

One issue is that some of these "collections" are language families with a proto-language (for instance, the Semitic languages with Proto-Semitic), and in that case the code for the language family is sometimes used for the proto-language. For example, in {{proto}}, sem is used as the code for Proto-Semitic. Wiktionary distinguishes the proto-language by appending -pro (sem → sem-pro). This is because language and codes must be distinct, as they are both used in etymology templates and there are distinct categories pertaining to each (for example, "Terms derived from Semitic languages" and "Terms derived from Proto-Semitic"). But I don't know if on Wikipedia this polysemy (a code being used for both language family and proto-language) will cause similar problems or what solution would be appropriate. — Eru·tuon 00:35, 2 December 2017 (UTC)[reply]

This makes my brain hurt. I'm not sure that we care what is done at {{proto}}. There, the code just feeds a {{#switch:}} that chooses a wikilinked article name to precede the 'text'. The template takes no care to properly identify the language in metadata as {{lang}} does.

IANA doesn't apparently recognize proto (that word isn't in the registry). The only 'fit' would be as a variant (5-character length) but because proto isn't registered with IANA, shouldn't the proper form be a private use subtag: sem-x-proto? We should not / must not redefine tags that are already defined by international standards organizations.

In Module:Language/data/wp languages there are three 'proto' language codes defined:

cel – IANA name: Celtic languages; WP name: Proto-Celtic, a redirect to Proto-Celtic language (ISO 639-2 collective)

gem – IANA name: Germanic languages; WP name Proto-Germanic, a redirect to Proto-Germanic language (ISO 639-2 collective)

pgl – IANA name: Primitive Irish; WP name: Proto-Irish, a redirect to Primitive Irish (ISO 639-3 individual)

Of those, pgl should probably be deleted from the WP languages table because it is an ISO 639-3 individual language so we should be displaying 'Primitive Irish' with {{lang-pgl}}. The other two inappropriately redefine the international standards organizations' code/name assignments so if we are to keep them as 'Proto-something' then we should create correct private use subtags cel-x-proto and gem-x-proto.

—Trappist the monk (talk) 01:52, 2 December 2017 (UTC)[reply]
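For reference, RFC 5646 private-use subtags follow the singleton x, and each is one to eight ASCII letters or digits, so cel-x-proto and gem-x-proto are well formed. The Python below sketches how such a tag might be split into its standard part and its private-use part. `split_private_use` is a made-up name and nothing here reflects how Module:Lang handles, or will handle, these tags.

```python
def split_private_use(tag):
    """Split a tag at the x singleton: 'cel-x-proto' -> ('cel', ['proto'])."""
    subtags = tag.lower().split("-")
    if "x" not in subtags:
        return "-".join(subtags), []
    i = subtags.index("x")
    private = subtags[i + 1:]
    # Each private-use subtag must be 1 to 8 ASCII alphanumeric characters.
    if not private or any(
        not (1 <= len(s) <= 8 and s.isascii() and s.isalnum()) for s in private
    ):
        raise ValueError("malformed private-use subtag in " + tag)
    return "-".join(subtags[:i]), private


print(split_private_use("sem-x-proto"))
print(split_private_use("gem"))
```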

I have deleted pgl from Module:Language/data/wp languages.

—Trappist the monk (talk) 12:10, 2 December 2017 (UTC)[reply]

Oops, I wrote -proto but meant -pro. Corrected.

Just to clarify, I'm not proposing that Wikipedia use the same convention (-pro) as Wiktionary. Wikipedia wants to follow external standards, while Wiktionary is perfectly comfortable with creating its own idiosyncratic hyphen-containing language codes that have nothing to do with IETF subtags. So Wikipedia and Wiktionary are incompatible here. Your idea of a private use subtag sounds more consistent with Wikipedia's preferences. — Eru·tuon 06:05, 2 December 2017 (UTC)[reply]

I believe that "other" is a reference to transliterated Semitic. Hyacinth (talk) 01:40, 2 December 2017 (UTC) As is "sem-Latn". Hyacinth (talk) 01:41, 2 December 2017 (UTC) See: Template:Category articles containing non-English-language text. Hyacinth (talk) 01:44, 2 December 2017 (UTC)[reply]

Category:Articles containing Eskimo-Aleut languages-language text is on String figure. Hyacinth (talk) 03:22, 5 December 2017 (UTC) Template:Lang-esx. Hyacinth (talk) 03:26, 5 December 2017 (UTC)[reply]

{{lang-esx|put two things together}} is a misuse of the template: 'put two things together' is English. Don't do that.

How the templates deal with language collections is a known unresolved issue. The fix that worked for sem won't work here because {{lang-esx}} exists.

—Trappist the monk (talk) 04:10, 5 December 2017 (UTC)[reply]

Italic related, don't know how to fix

Hi. Can you have a look at Poor Dionis? Two blocks of text, both of which have italics for just one sentence, have vanished, and, looking over the potential fixes, I could find nothing to address the specific problem. Dahn (talk) 12:17, 9 December 2017 (UTC)[reply]

I think I've fixed it now (converting on the way from {{lang-xx}} to simply {{lang}} as I don't think it's necessary to add language labels, but feel free to reinstate them using template-external text). Now, the problem (visible in this old revision) was that lang-xx templates were used for an entire paragraph of text, and this text contained within it italicised phrases. The templates assume that the markup is meant for the whole text and see it as an error, but that's a legitimate use. Is there any way to fix that? – Uanfala (talk) 12:44, 9 December 2017 (UTC)[reply]

Thank you, that's a very good solution. As for the rest: I'm sure the problem can easily pop up in other templates where the same was used, so maybe it's a good idea to add that to the list of potential script errors? Dahn (talk) 12:52, 9 December 2017 (UTC)[reply]

Thousands of articles generating visible errors due to italic markup; solutions?

After fixing a couple of dozen ones at random in Category:Lang and lang-xx template errors, broadly speaking I see four cases:

1. {{lang-xx|''All the text inside is italicised''}}
   - Can be replaced with {{lang-xx|italic=yes|All the text inside is italicised}}
   - Or sometimes {{lang-xx|italic=no|All the text inside is unitalicised}} when "xx" is written in Latin script in the first place
2. {{lang-xx|Тэкст виФ транскрипцион ''Text with transcription''}}
   - Can be replaced with {{lang-xx|Тэкст виФ транскрипцион}} {{lang|xx-Latn|italic=yes|Text with transcription}}
   - Could also use "translit" parameter, though that introduces extra WP:LEADCLUTTER which could be undesirable in some cases
3. {{lang-xx|Name1 ''or'' Name2 ''or'' Name3}}, where the "italics" markup on "or" is actually intended to de-italicise
   - Could be replaced with {{lang-xx|Name1}} or {{lang|xx|italic=yes|Name2}} or {{lang|xx|italic=yes|Name3}}
   - But I suspect bot regex replacement wouldn't be safe; there are probably similar-looking cases where something else is intended
4. Other stuff which will require manual intervention

How shall we go about clearing this backlog? Should I go to Wikipedia:Bot requests, or is someone handling this already? (I think at least the first two cases can be handled by bots, allowing human effort to be focused on more difficult cases). Cheers, 59.149.124.29 (talk) 05:12, 11 December 2017 (UTC)[reply]

Thanks for fixing what you've fixed. Unfortunately it isn't quite as you describe.

1. {{lang-??}} templates do not all default to italic rendering so the italic markup might have been used to negate the default italic or been used to force italics.
2. It is not always clear that the 'text with transcription' you refer to is a transliteration or is a restatement of the text written in the language's 'other' (often Latin) script (Serbian uses both Cyrillic and Latin, for example; there are quite a few others). When the italicized text is a transliteration, and the static text provided by the {{lang-??}} template is not desired, perhaps a better choice is to use the more semantically correct {{transl}} template.
3. Yeah, I think this is the correct solution assuming that the script used for the non-English text is supposed to be italicized. The rather larger issue with your example is that the original template mixes the English 'or' with the non-English text which is counter to the underlying html markup that identifies all text in {{{1}}} as the non-English language.

I suspect then that fixes, rather than being general, are perhaps easier if done on a per-template basis. Recently I fixed several hundred instances of {{lang-so}} which by default rendered text in an upright font even though Somali is written primarily using a Latin script (this is a case where the template should have been fixed long ago, rather than editors 'fixing' each instance to italicize). I fixed the template to italicize and then used a simple search and replace regex in AWB:

find: (\{\{\s*lang\-so\s*\|)''([^'\|\}]+)''(\s*\}\})

replace: $1$2$3
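
For anyone who wants to try the pattern outside AWB, here is a minimal Python sketch of the same search and replace. It assumes Python's re module, where the replacement groups are written \1\2\3 instead of $1$2$3; the function name strip_whole_text_italics is made up for this example.

```python
import re

# Same pattern as the AWB find string above: the whole of {{{1}}} is
# wrapped in '' ... '' and contains no apostrophe, pipe, or closing brace.
LANG_SO_ITALIC = re.compile(r"(\{\{\s*lang-so\s*\|)''([^'|}]+)''(\s*\}\})")


def strip_whole_text_italics(wikitext: str) -> str:
    """Remove italic markup that wraps the entire {{lang-so}} text."""
    return LANG_SO_ITALIC.sub(r"\1\2\3", wikitext)


if __name__ == "__main__":
    print(strip_whole_text_italics("{{lang-so|''Soomaaliya''}}"))
    # Partially italicised text does not match and is left for manual review.
    print(strip_whole_text_italics("{{lang-so|''Name1'' or ''Name2''}}"))
```

Because the character class excludes apostrophes and pipes, the pattern only touches the simple whole-text case; templates with extra parameters or mixed markup are skipped rather than guessed at.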

I suspect that something similar may work for a lot of other templates. Of course, hundreds of editors each fixing the templates on pages that they care about could go a long way to clearing the error category. I know, that's being overly optimistic.

—Trappist the monk (talk) 11:55, 11 December 2017 (UTC)[reply]

-Latn stuff

So, how are we going to move forward for things like {{lang-lo-Latn}}? Do we just start creating these templates (and if so, with what calls to the module?), or do we want to use {{lang-lo}} and pass a parameter to it, or ...? I may have missed some prior discussion on this, but so much has been going on I wanted to explicitly ask about this before taking any action. — SMcCandlish ☏ ¢ >^ʌⱷ҅_ᴥⱷ^ʌ< 16:32, 13 December 2017 (UTC)[reply]

This topic hasn't been discussed. Were it up to me, I would say: don't fork {{lang-lo}} → {{lang-lo-Latn}}. In the old days you would have had to fork but now, there is |script=Latn so use that.

Additionally, I think that by using the parameter we can think about a bot task to rewrite instances of templates like {{lang-sr-Cyrl}} and {{lang-sr-Latn}} to use {{lang-sr|script=<script>}} and then delete {{lang-sr-Cyrl}} and {{lang-sr-Latn}}.

—Trappist the monk (talk) 11:44, 14 December 2017 (UTC)[reply]

Quick test cases:

{{lang|lo|script=Latn|fu}} → fu
{{lang-lo|script=Latn|fu}} → Lao: fu
{{lang|es|casa}} → casa
{{lang-es|casa}} → Spanish: casa

The {{lang-xx}} output is inconsistent. If we're going to italicize by default in that template family, that needs to apply to |script=Latn output or people are going to get confused, and we'll have inconsistent results in our output. There is no use case for Latn script output being non-italic where the same output in a Latin-script language would be italicized, or vice versa. The only variance of any kind I'm aware of is that, per MOS:TITLES, the title of a work that should be italicized is italicized in Greek and Cyrillic (and by extension other alphabets close to Latin) but not in CJK (nor by extension others that have no relationship to Western letterforms, e.g. Arabic and various Indian languages); and that's a manual case-by-case tweak, not something the template needs to "know" about.

Given the amount of |script=Latn I intend to use, now that we have that feature, having to also manually italicize would be seriously onerous.
— SMcCandlish ☏ ¢ >^ʌⱷ҅_ᴥⱷ^ʌ< 17:09, 19 December 2017 (UTC)[reply]

It all goes back to the presumed 'default' states doesn't it? {{lang-??}} templates default to italic rendering unless that condition is overridden by |italic=no. In your example, {{lang-lo}} defaults to italic because it's a {{lang-??}} template but that is overridden with |italic=no because that was the template's state when I converted it to use Module:lang.

When there are competing and possibly contradictory parameters as in your example, {{lang-lo|script=Latn|fu}}, one must prevail or there must be an error message. I chose |italic= to be the winner when competing with |script=Latn (because fu might be a proper name – which was what started all of this). To make your example work, you must write:

{{lang-lo|italic=yes|script=Latn|fu}} → Lao: fu

I think that all of this is documented at Template:Lang-lo#Parameters.

—Trappist the monk (talk) 17:54, 19 December 2017 (UTC)[reply]

I understand that's what's happening now; it's just undesirable, because 99% of the time it's not going to be a proper name and we will want the italics, for any language if we emit Latn. It's already a hassle – but one we're used to – that {{lang|de}}, etc., don't italicize while {{lang-de}}, etc., do when in Latin-based scripts. It gets into "this is so confusing I want to shoot someone" when it veers back in the other direction and doesn't italicize when |script=Latn. It re-inspires the original idea of creating {{lang-foo-Latn}} templates that emit the expected italics that are consistent with {{lang-de}}, etc., etc.

Another way of putting it: for a class of templates that does italicize, the only logical output for |script=Latn is the italics, for the same reason as the original italicization. For such a template group, |script=Latn implies italics, rendering |script=Latn|italic=yes redundant and a waste of time. Or in other words, {{lang-de}} essentially is {{lang-de|script=Latn}} and {{lang-de|script=Latn|italic=yes}}, simultaneously (conceptually speaking).

This is most especially problematic in the {{lang-xx}} case, because we cannot do either of the obvious wikimarkup things: both ''{{lang-lo|script=Latn|fu}}'' and {{lang-lo|script=Latn|''fu''}} will fail (for different reasons). If there's a way to make the latter case stop failing that might be good, too, though it has the benefit of preventing people wrongly italicizing non-Latin-based scripts with {{lang-lo|''[whatever the character is]''}}.

I'm not trying to have a sport argument or philosophical debate with you. I'm super-impressed by all the work so far, but am begging you to fix this one issue, because having to deal with it, as-is, may waste untold person-hours for many editors over the long haul: manually adding |italic=yes a zillion times, and using searches to find a zillion cases where people left it out and then going and fixing them (again and again and again), and writing bots to do it, and arguing with people confused by the docs, and etc. If you're telling me this can only be addressed at the language-specific template level, then that's very depressing, but it sounds like it's not ("I chose |italic= to be the winner ...").

Aside: In the case of don't-italicize-a-proper-name, we'd want {{lang-lo|script=Latn|italic=no|Fu}}, and this would be very, very rare because we generally don't use lang markup around proper names except for titles of works sometimes, and when using a place or personal name in a words-as-words manner, e.g. a cross-language comparison like "Munich (German: München)" – both cases that usually call for italics. Virtually all Latn cases will need the italics. When referring to a Laotian named Fu, we'd just use Fu, without markup at all (or most of us would – there's no policy about it or anything; maybe it will become more common, but I doubt it).
— SMcCandlish ☏ ¢ >^ʌⱷ҅_ᴥⱷ^ʌ< 23:32, 20 December 2017 (UTC)[reply]

An important reason for the existence of these templates is that which is unseen. The templates make for correct html so that your browser or your screen reader does the right thing. You should not rely on |script=Latn as a guaranteed way of rendering text in italics. There are 90 language codes in the IANA subtag registry that expressly include Suppress-Script: Latn – there are 44 other languages that suppress other scripts. One of those things yet to be done is to add support to the module so that we can detect de-Latn so that we don't write ... – we write that now but shouldn't. And an oh-by-the-way: |script= isn't supported by {{lang}} because you can add the script subtag directly to the language code in the template call, something that you can't do with the matching {{lang-??}} template.

Pretty much the last thing that I want to do is break every existing {{lang-??}} template – that's multiple hundreds of thousands of articles. I'm actually astonished that the count of articles in Category:Lang and lang-xx template errors only just peeked over 10,000 after I finished converting most of the 600+ {{lang-??}} templates to use the module. When I did that, in most cases I set |italic= only when the wikitext version of the template did not italicize {{{1}}}. The notion that 'all' {{lang-??}} templates rendered in italic is largely false; a lot of them did/do, a lot of them didn't/don't. But, right now, the vast majority of {{lang-??}} templates are rendering just as they were before the change.

I confess, for all of these words, perhaps I'm losing the plot. If you are expecting that all {{lang-??}} templates act exactly the same way, they don't because they never did. I do wonder if we might create an |initial-italic-state= parameter that might be used where we now use |italic= in the {{lang-??}} templates' {{#invoke:}}. This new parameter would set the template's 'default' italic rather than have the module assume that all {{lang-??}} templates are the same and so have the same 'default'. If we did that, for those language codes that permit |script=Latn, it would not be necessary to write {{lang-lo|script=Latn|italic=yes}}. |initial-italic-state= would be a required parameter in every {{lang-??}} template so that we can replace the current module's global italic setting. {{lang}} would not be changed.

Another way might be to have the templates that aren't italicized call a different initial function in the module so {{#invoke:Lang|lang_xx_normal}} where the lang_xx_normal() function sets initial_italic='no'. This is probably better because it will likely be easier to implement. Or not.

There will still be the issue of |script=Latn competing with |italic=no. I still choose to have |italic= win. Were we wanting to write, say, the Arabic name for a certain recalcitrant mountain using Latin script, we would have to write:

{{lang-ar|Mohamed's Mountain|script=Latn|italic=no}}

We require |script=Latn because we use it to make the correct lang=ar-Latn attribute and to turn off right-to-left support. Because |script=Latn is stronger than initial_italic_state=no, italics would be applied. But, because Mohamed's Mountain is a proper name that should not be rendered in italic font, we require |italic=no which must be stronger than |script=Latn.

—Trappist the monk (talk) 01:38, 21 December 2017 (UTC)[reply]
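
The precedence described above (explicit |italic= beats |script=Latn, which in turn beats the template's initial style) can be sketched in a few lines of Python. This is only an illustration of the ordering as stated in this thread; resolve_italic and its arguments are hypothetical names, and the case-insensitive comparison of the script value is an assumption, not a description of the module's actual code.

```python
from typing import Optional


def resolve_italic(initial_italic: bool,
                   script: Optional[str] = None,
                   italic: Optional[str] = None) -> bool:
    """Decide whether text is rendered in italics (hypothetical helper).

    initial_italic -- the template's own default (True for an italicizing
                      {{lang-??}} template, False for a 'normal' one)
    script         -- value of |script=, if given
    italic         -- value of |italic=, if given ('yes' or 'no')
    """
    # 1. An explicit |italic= always wins.
    if italic is not None:
        if italic not in ("yes", "no"):
            raise ValueError(f"unexpected |italic= value: {italic!r}")
        return italic == "yes"
    # 2. |script=Latn is stronger than the initial style.
    if script is not None and script.lower() == "latn":
        return True
    # 3. Otherwise fall back to the template's initial style.
    return initial_italic


if __name__ == "__main__":
    # A 'normal' template with |script=Latn: italics applied.
    print(resolve_italic(False, script="Latn"))
    # Proper name in Latin script: |italic=no overrides |script=Latn.
    print(resolve_italic(False, script="Latn", italic="no"))
```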

I have implemented the second of my two ideas in Module:lang/sandbox, created {{lang-lo/sandbox}}, and tweaked {{lang-es/sandbox}} to test. Results in my sandbox.

{{lang-lo/sandbox}} specifies an initial font-style:normal because it calls lang_xx_normal(). That style can be overridden by |script=Latn without the need to also set |italic=yes:

{{lang-lo|fu|script=latn}} → Lao: fu

{{lang-lo/sandbox|fu|script=latn}} → Lao: fu

—Trappist the monk (talk) 12:41, 21 December 2017 (UTC)[reply]

No time right now (work calls), but {{lang-ar|Mohamed's Mountain|script=Latn|italic=no}} is not what -Latn is for. 'Mohamed's Mountain' is a gloss, not a Latin transliteration of or encoding of Arabic. That would be ǧbl mḥmd (among many other approaches for latinizing/romanizing Arabic), for Arabic script جبل محمد. Maybe there are {{lang-xx}} templates for languages that usually/always use a Latin-based script and which are not italicizing; most of them do, so this is an inconsistency problem to fix, not an excuse for us to make things even more confusing. — SMcCandlish ☏ ¢ >^ʌⱷ҅_ᴥⱷ^ʌ< 21:57, 21 December 2017 (UTC)[reply]

You know, sometimes I wonder that people can communicate at all. The Mohamed's Mountain example was merely an illustration. I did write the Arabic name assuming, more fool I, that it is understood to be a placeholder and not an accurate transliteration of actual Arabic. The purpose of the illustration is to show that for proper names transliterated into Latin script, we require:

|script=Latn
1. so that the html lang=ar-Latn attribute is correct
2. so that the html dir=rtl attribute, normally associated with code ar is properly omitted
|italic=no
1. to undo the automatic italic font style set by |script=latn

—Trappist the monk (talk) 22:32, 21 December 2017 (UTC)[reply]

Ah, okay. Well, I had been on my way out the door to catch a train, so maybe I didn't read that carefully enough. Anyway, the lack of consistent behavior is the issue, nothing more. I'm not making any kind of philosophical point, or even a preferences one (why were {{lang|xx}} and {{lang-xx}} ever doing something different, italics-wise, for the same language encoding in the first place? Doing either italics or no italics predictably would be preferable, whichever default people want it to be; doing the italics when the script is Latin-based [usually or via -Latn] would be the most practical, since most uses do need to be italicized – that's a "don't waste editors' time" assessment, nothing more). Maybe we just need to have an RfC to normalize it all after the re-coding dust settles. — SMcCandlish ☏ ¢ >^ʌⱷ҅_ᴥⱷ^ʌ< 16:51, 23 December 2017 (UTC)[reply]

doing the italics when the script is Latin-based [usually or via -Latn] would be the most practical is what most of the {{lang-??}} templates do now – languages that are written in multiple scripts are problematic: which script do you choose as the 'default'? Fortunately for me, someone else has already taken the trouble to make those decisions so all that I have done is to continue to use the default that that someone chose.

Making {{lang}} do the same thing is difficult for a couple of reasons: it has always rendered in normal font style so where italic rendering is required, italic wiki markup is applied either inside (where the template can detect it) or outside (where the template cannot detect it). We would need to create some sort of mechanism to know from the language code (a big-damn table listing all language codes that are to default render in Latin script – yuck), or a mechanism to read the content of {{{2}}} and make a decision from that – is all text in {{{2}}}, or all of the text in the display portion of a wikilink, Latin script? We might crib something from Module:Language/scripts (isLatn() looks promising). We should also make sure that the is_latn() test (I hate camelCase) doesn't fail on punctuation because some languages use rather a lot of it: Yavapai: Wi:kaʼi:la. This latter doesn't, of course, solve the problem of the tradition that {{lang}} never automatically italicizes its content.

—Trappist the monk (talk) 17:52, 23 December 2017 (UTC)[reply]
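
To make the punctuation concern concrete, here is a small Python sketch of one way such a test could behave. It is not the isLatn() function from Module:Language/scripts; the approach (checking each letter's Unicode character name for the word LATIN and treating punctuation, digits, spaces, combining marks, and modifier letters as neutral) is an assumption chosen for this example.

```python
import unicodedata

# General categories treated as 'real' letters that must be Latin.
# Modifier letters (Lm), marks, punctuation, digits and spaces are
# deliberately ignored so that text such as Wi:kaʼi:la still passes.
_LETTER_CATEGORIES = {"Lu", "Ll", "Lt", "Lo"}


def is_latn(text: str) -> bool:
    """Return True if every letter in text is Latin and there is at least one."""
    seen_latin = False
    for ch in text:
        if unicodedata.category(ch) in _LETTER_CATEGORIES:
            if "LATIN" not in unicodedata.name(ch, ""):
                return False
            seen_latin = True
    return seen_latin


if __name__ == "__main__":
    for sample in ("Wi:ka\u02bci:la", "München", "Великое посольство", "..."):
        print(repr(sample), is_latn(sample))
```

A test along these lines accepts the Yavapai example because the colons and the modifier apostrophe are skipped, while a string with no letters at all is rejected rather than treated as Latin.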

I have moved the weaker-initial-style code to the live module and have an AWB script that I am using to revise the individual {{lang-??}} templates; 50 done, 550ish to go.

—Trappist the monk (talk) 20:41, 23 December 2017 (UTC)[reply]

italics and proper names

There have been a few comments on this talk page saying that non-English proper names are not to be italicized:

while italics are commonly used for words in other languages, they are not used for proper names – Justlettersandnumbers (diff)
because we don't use italics for proper names – Justlettersandnumbers (diff)
We need to be able to selectively disable ... the auto-italicization of non-English content in ... templates that auto-italicize ..., so that the style is not applied to proper names – SMcCandlish (diff)
Expected behavior of {{lang}} versus {{lang-xx}} is that the former will always produce non-italic output; it's too often used for proper names which are not italicized in most contexts. – SMcCandlish (diff)

I recently stumbled upon WP:NCPLACE#Emphasis which appears to disagree. I went looking for something to support the comments above and did not find it. Where is it written that non-English proper names are not to be italicized?

—Trappist the monk (talk) 15:00, 19 December 2017 (UTC)[reply]

MOS:ETY is what I usually cite for that, Trappist the monk. McCandlish is more likely than I to know if it is also covered elsewhere. Justlettersandnumbers (talk) 15:58, 19 December 2017 (UTC)[reply]

Right in front of my face and I didn't see it. Always been a failing of mine, Mom says so.

—Trappist the monk (talk) 16:10, 19 December 2017 (UTC)[reply]

The shortcut MOS:BADITALICS gets even closer to it (same section, though).

In theory, we shouldn't even need to spell that out, because it's common sense. No one writes "US President Donald Trump had an informal meeting with Mexican President Enrique Peña Nieto and several other Latin American dignitaries in Buenos Aires, Argentina, before returning to Washington, DC." That's not a recognizable style advised in any style guide anywhere. But I guess just enough people have gone around italicizing non-English proper names that we had to insert an "AJ rule" to stop doing it. (That seems plausible – I've encountered just enough italicization of non-English PNs to consider it a nuisance.)

Non-English titles of books, films, etc., take italics, but they would anyway because they're titles (MOS:TITLES). Short works and sub-works go in quotation marks, and should also be italicized inside the quotes as non-English. It's sometimes not uniformly done with song titles and such if they're familiar to English speakers. It probably shouldn't be done if the work itself isn't really in a non-English language except for bits of it; "Que Sera, Sera (Whatever Will Be, Will Be)" seems to be more hair-splitting than anyone cares for. Non-English article titles in journals and such should be italicized within the quotation marks by default; but, per WP:CITEVAR, if our article is consistently using a citation style, and it happens to be one which forbids that [and I'd want to see the rule stating that! People make up too much BS about citation styles.], then the italics could be dropped.

When we're comparing the English and non-English names of things, that's a words-as-words case (MOS:WAW), so italics are employed even for a proper noun in that special case: "Munich (German: München)". MOS:BADITALICS/WP:ETY covers this, I think. Historically, this has often not been done very consistently in leads, so it'll be fine if the template does it by default.
— SMcCandlish ☏ ¢ >^ʌⱷ҅_ᴥⱷ^ʌ< 17:09, 19 December 2017 (UTC)[reply]

pop directional format

A recent change to Module:Lang/sandbox changed this:

table.insert (span, '&lrm;');

to this:

table.insert (span, '&#x202C;');

‬ is Unicode character U+202C Pop Directional Format.

Editor Great Brightstar: please explain why you think that ‬ is a better choice than the existing &lrm;.

—Trappist the monk (talk) 11:21, 20 December 2017 (UTC)[reply]
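
For reference, the two characters under discussion can be inspected with Python's standard unicodedata module. This is just a lookup sketch; the helper name describe is made up for the example, and it says nothing about how browsers actually render the surrounding text.

```python
import unicodedata


def describe(ch: str) -> tuple:
    """Return (code point, Unicode name, bidirectional class) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.bidirectional(ch))


if __name__ == "__main__":
    # &lrm; is U+200E; &#x202C; is U+202C.
    for ch in ("\u200e", "\u202c"):
        print(describe(ch))
```

The lookup shows the difference in kind: U+200E is an invisible strongly left-to-right character, whereas U+202C is a formatting control that closes an earlier embedding or override.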

I just left it as an experimental feature for test purposes, so that anyone can try it. Since you asked me to explain, I decided to make a test case for this.

With PDF:

David; Hebrew: דָּוִד‬, Modern David

With LRM:

David; Hebrew: דָּוִד‎, Modern David

However, both the PDF and LRM characters performed the same (tested on both Chrome and Firefox), ~~so I revert my change.~~ --Great Brightstar (talk) 02:59, 21 December 2017 (UTC)[reply]

OK, I saw this is already reverted. --Great Brightstar (talk) 03:07, 21 December 2017 (UTC)[reply]

A bug with the new non-italics entry point?

I'm just adding language tagging to the lead of Grand Embassy of Peter the Great using {{lang-ru}} and noticed that the Russian is being italicised unless I add |italic=no. Template:Lang-ru does correctly call Trappist the monk's new lang_xx_normal, so I'm not sure what's going on:

{{lang-ru|Великое посольство|translit=Velikoye posolstvo}} → Russian: Великое посольство, romanized: Velikoye posolstvo
{{lang-ru|Великое посольство|translit=Velikoye posolstvo|italic=no}} → Russian: Великое посольство, romanized: Velikoye posolstvo

— OwenBlacker (talk) 01:09, 24 December 2017 (UTC)[reply]

fixed. Thanks for spotting that. I only left out the most important bits.

—Trappist the monk (talk) 01:34, 24 December 2017 (UTC)[reply]

I found a similar error while viewing the page for Vínarterta:

{{lang|is|striped lady cake}}

striped lady cake

Lua error in Module:Lang at line 367: invalid value (boolean) at index 2 in table for 'concat'.

This error only appears for users not logged in. REH11 (talk) 02:45, 24 December 2017 (UTC)[reply]
