Template talk:ISO 15924 script codes and related Unicode data

WikiProject Writing systems (Rated Template-class)
This template falls within the scope of WikiProject Writing systems, a WikiProject interested in improving the encyclopaedic coverage and content of articles relating to writing systems on Wikipedia. If you would like to help out, you are welcome to drop by the project page and/or leave a query at the project’s talk page.
This template does not require a rating on the project's quality scale.

Geok issue

Geok = Khutsuri (Asomtavruli and Nuskhuri). More to follow. -DePiep (talk) 21:27, 16 June 2014 (UTC)

ISO 15924 is published on the Unicode site. Unicode adds a "Property Value Alias" (PVA) to script codes, for scripts in Unicode. The PVA is usually a short name for the script (see the template list for differences).
ISO 15924 is published at ISO 15924 Code Lists
1. ISO: There is a link "Table 5. Alphabetical list of four-letter script names (normative plain-text data file)" (filename: iso15924.txt.zip; the unzipped data file is named iso15924-utf8-20131012.txt).
2. PVA: And "The Property Value Alias is defined as part of the Unicode Standard".

The ISO file contains these rows:

Geor;240;Georgian (Mkhedruli);géorgien (mkhédrouli);Georgian;2004-05-29
Geok;241;Khutsuri (Asomtavruli and Nuskhuri);khoutsouri (assomtavrouli et nouskhouri);Georgian;2012-10-16

The second-to-last field in each row is the "PVA" value, which is "Georgian" for both.
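The field position can be checked mechanically. The following is a minimal sketch, assuming the six-field, semicolon-separated layout seen in the two quoted rows (code; number; English name; French name; PVA; date). The helper name `parse_iso15924_row` and the field labels are made up for illustration and are not part of the data file.

```python
# Sketch only: split the two quoted ISO 15924 rows and read the PVA field.
# The field labels below are assumptions based on the rows quoted above.

ROWS = [
    "Geor;240;Georgian (Mkhedruli);géorgien (mkhédrouli);Georgian;2004-05-29",
    "Geok;241;Khutsuri (Asomtavruli and Nuskhuri);khoutsouri (assomtavrouli et nouskhouri);Georgian;2012-10-16",
]

FIELDS = ("code", "number", "english_name", "french_name", "pva", "date")


def parse_iso15924_row(row):
    """Return a dict for one data row (hypothetical helper)."""
    parts = row.split(";")
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    return dict(zip(FIELDS, parts))


parsed = [parse_iso15924_row(r) for r in ROWS]

# Map each four-letter code to the PVA value found in the second-to-last field.
pva_by_code = {p["code"]: p["pva"] for p in parsed}
print(pva_by_code)
```

Both codes map to the same PVA string, which is the point made above.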

The PVA file says:

# Script (sc)
sc ; Ethi                             ; Ethiopic
sc ; Geor                             ; Georgian
sc ; Glag                             ; Glagolitic

(so, no Geok script data present)
This appears to be a contradiction. For now, I have added "Geok" (PVA: "Georgian" too) to the (PVA/ISO 15924) Alias list, and so it shows in this template table. -DePiep (talk) 22:09, 16 June 2014 (UTC).
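The mismatch between the two lists can be illustrated with a small comparison. This is a sketch under stated assumptions: it uses only the three "sc" lines quoted above (not the full PVA file), the ISO-side values are copied from the two rows quoted earlier, and the function name `parse_pva_script_lines` is hypothetical.

```python
# Sketch only: compare script codes that carry a PVA in the ISO file with
# the "sc" entries in the PVA file. Input is the excerpt quoted above.

PVA_LINES = """\
# Script (sc)
sc ; Ethi                             ; Ethiopic
sc ; Geor                             ; Georgian
sc ; Glag                             ; Glagolitic
"""

# PVA values as they appear in the two ISO rows quoted earlier.
ISO_PVA = {"Geor": "Georgian", "Geok": "Georgian"}


def parse_pva_script_lines(text):
    """Return {short code: long alias} for 'sc' lines (hypothetical helper)."""
    aliases = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = [f.strip() for f in line.split(";")]
        if len(fields) >= 3 and fields[0] == "sc":
            aliases[fields[1]] = fields[2]
    return aliases


unicode_pva = parse_pva_script_lines(PVA_LINES)

# Codes that have a PVA in the ISO file but no entry in the PVA excerpt.
only_in_iso = sorted(code for code in ISO_PVA if code not in unicode_pva)
print(only_in_iso)
```

Run against the full files, the same comparison would list every code where the two sources differ, in either direction.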

(Conversely, the ISO file is not updated for new PVAs; e.g., Bass has no PVA in there. However, this is not a contradiction.
I don't get why the normative, defining file is not updated, while its definitions are used in a published version.) -DePiep (talk) 08:36, 17 June 2014 (UTC)
I don't see a problem here. Just because the ISO 15924 Registration Authority is hosted on the Unicode site does not imply that the Unicode Consortium is responsible for ISO 15924 (the actual ISO 15924 standard is not "published on the Unicode site" but on the ISO site), or that there is necessarily a one-to-one relationship between ISO 15924 script codes and Unicode script property value aliases. ISO 15924 recognises two varieties of Georgian script, Geor and Geok, but the Unicode standard only recognises a single Georgian script with PVA=Geor; therefore there is no "Geok" script code in Unicode, and the Unicode alias column of the template should be left blank. The situation is analogous to that of Latin, Gaelic and Fraktur: ISO 15924 defines Latn, Latg and Latf codes, but Unicode only defines a single Latin script (PVA=Latn). There is no contradiction. BabelStone (talk) 11:13, 17 June 2014 (UTC)
PVA is defined by Unicode, not by ISO 15924. Even in the ISO file. So this way Unicode defines and publishes two different definition lists. That is a contradiction by Unicode. End of story. Whichever Unicode definition list one chooses, it introduces an error. More so in automated applications. -DePiep (talk) 11:56, 17 June 2014 (UTC)


There should be a better way rather than reverting changes to the statement "Not in Unicode". --Shervinafshar (talk) 20:17, 20 May 2015 (UTC)

I already wrote about this on your talkpage. I maintain (as U:Babelstone does, IMO well familiar with Unicode) that we should only publish defined, not proposed, Unicode versions. All these "proposed/projected/to_come/in_the_pipeline" characters do not describe reality (unless in a subsection like Unicode#Future). -DePiep (talk) 20:33, 20 May 2015 (UTC)

Accessibility of table

Including the title of a table as a heading row is not appropriate, as it is not part of the tabular data. Adding a grouping row makes the table a complex table. Wikipedia does not provide WCAG 2.0 accessible markup for complex tables, therefore the table needs to be converted to a simple table, which I did.

I think the accessibility of the table would be further improved if the ISO code and ISO name columns were to be swapped, since the name is more meaningful as a row descriptor. However, if the name cannot be assumed to be unique, then perhaps the code is the most appropriate row descriptor. Since this is potentially controversial I did not include this in my accessibility cleanup.

Also, as a side effect, the VTE links (navbar template) seem to float above the template. I didn't put it into the caption because semantically, the template edit links do not describe the table.

Thisisnotatest (talk) 21:55, 6 June 2015 (UTC)

I reverted. Maybe in minor points you are right, but the general move is wrong.
1. w3c you link to says about that 'every cell must have a row & column header'. That was served, nothing wrong. I note that the T-heading structure exactly and clearly reflects the two offices involved.
2. 'Title should be external'. - We can make that. Use the sandbox.
3. Concluding to 'convert to a simple table' is bad, exactly because we need two top headers.
4. Swapping ISO name and ISO code: no opinion. Yes this is up for discussion/improvement, but this does not matter to the bad edit I reverted :-).
5. v-t-e links must & will always follow any table outcome. No issue.
6. The editsummary with my reversal should read '... ill-concluded'.
-DePiep (talk) 22:21, 6 June 2015 (UTC)
DePiep, thank you for taking this seriously. I disagree on whether the point is minor and whether the table is compliant, so I am adding an {{Accessibility Dispute}} template to this template (point 1 of discussion). I am linking to my accessibility edit of this template for reference. I will also post a link to this discussion at the Village Pump.
1. That every cell must have a row and column header is necessary but not sufficient. The W3C page I linked to also requires that each header cell in multilayer headings have an id attribute and each data cell have a headers attribute. Actually, it appears that there is an alternate way to make complex tables accessible, that of the colgroup. However, Wikipedia doesn't seem to support the colgroup tag.
2. Thank you. I'll do that.
3. I'm not clear why we need two top headers as opposed to distinguishing the ISO headings by adding "ISO" as I did in the reverted table.
4. Awaiting comment from others.
5. No disagreement. — Preceding unsigned comment added by Thisisnotatest (talk • contribs) 23:29, 6 June 2015 (UTC)
About #3: Below the title, the main essential and core point of this table is that column values are defined by ISO or defined by Unicode. So that is what the table must show. IMO W3C exactly allows or wants this. The rest is mice meat for now. -DePiep (talk) 23:40, 6 June 2015 (UTC)
I've made a sandbox version of {{ISO_15924_script_codes_and_Unicode/sandbox}} dealing with a couple pieces of mouse meat. I agree that W3C allows for complex tables. Where we disagree is whether Wikipedia's ability to code complex tables meets the requirements of WCAG 2.0. IMO, if it does not, then it needs to be replaced with a simple table, that is, without column groupings. Anyway, I'll go post at the Village Pump so we can get others to weigh in. Thisisnotatest (talk) 23:50, 6 June 2015 (UTC)
Sandbox looks good. (I require the T-lining to group columns L-R). v-t-e box will end up OK. Unless W3C or WP:ACCESS proves 'unacceptable' (so far, you did not), I'll agree. (Will not / can not respond fast from now). -DePiep (talk) 00:31, 7 June 2015 (UTC)
See the sandbox, I edited. Simply: this is how wiki/w3c does show a table. -DePiep (talk) 00:40, 7 June 2015 (UTC)
Restored accessibility dispute template to sandbox and awaiting input from others. The sandbox no longer reflects my intent as mentioned in my previous comment. I know that a sandbox is a sandbox but now it is once again structurally the same as the original template and now pointless as a display of our differences. (Although it might not reflect your intent either. Your comment on your last revision reads "how wiki-w3c *does* a title (namely, by |+))" but your sandbox revision itself does not reflect the |+ code. I'm guessing this is an accidental omission; I'm still unsure what your remedy was as reflected in your revision comment. And even if the code reflected the change your comment intended, I still dispute the accessibility of the table in the sandbox, your and my versions both.) Thisisnotatest (talk) 01:15, 7 June 2015 (UTC)
I don't get the impression you are interested in w3c/access improvement at all. Happy fly catching. -DePiep (talk) 01:54, 7 June 2015 (UTC)
I think we're miscommunicating. Who was lol'ing about the sandbox in their edit comment? It's hard to tell tone in print, and I, rightly or wrongly, took it as a lack of seriousness about the issue. I suppose it's refreshing to be disagreeing over whether something is accessible rather than over whether it should be accessible, but I'm not feeling particularly refreshed right now.
It is true that W3C accessibility allows for complex tables. It's just that implementing complex tables requires additional tagging (id attributes on th tags, headers attributes on td tags) that is not easy to do or keep maintained and may or may not be supported by Wikicode. Therefore it is better to use a simple table. Anyway, only two of us have commented on this issue so far. It would be helpful to have others whose views are on various sides of the argument. Thisisnotatest (talk) 07:03, 7 June 2015 (UTC)
What tagging do you mean? A colspan makes a complete and correct header. -DePiep (talk) 10:11, 7 June 2015 (UTC)

This has already gotten too back-and-forth to follow. I liked the numbered points. Can we start again with those, based on the current sandbox? What issues remain?  — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  15:56, 14 June 2015 (UTC)

No, the current sandbox [1] is unacceptable. A pity you could not follow this thread, but IMO the line is: current live version is OK, objections are making the table worse. And most of all: those objection sources are not clear or even present. In this thread I've written why the current version is better. -DePiep (talk) 20:50, 14 June 2015 (UTC)
@SMcCandlish, at this point my position is the numbered points stand as I last numbered them, plus that the sandboxed version would be accessible. I believe DePiep's position is the numbered points are where he last numbered them, plus a negative opinion on the sandboxed version.
@DePiep: Please explain how the sandbox version is making the situation worse, aside from aesthetics? Is the Direction column mislabeled? That is, would it more correctly read "Unicode direction"? (If "Unicode direction" is not more correct, then the original table is wrong by applying the Unicode grouping header to the direction column.) Thisisnotatest (talk) 04:45, 15 June 2015 (UTC)
Because it has removed those column headers that span multiple columns. Some columns are ISO, some are Unicode. That is what columns are about, that is what we want to convey. (I still have not gotten why accessibility would prohibit such colspans). -DePiep (talk) 20:26, 17 June 2015 (UTC)
I've now added the ISO or Unicode to every column so that the sandbox version is now conveying the same info as the current version. Accessibility does not prohibit colspans, but WCAG 2.0 AA requires that they be accompanied by id attributes in the headers and headers attributes in the data cells per WCAG 2.0 accessible markup for complex tables. The larger issue is that Wikipedia doesn't support such tables, and if it did it would have to do so in a way it was reasonable to expect editors to follow. I posted the larger discussion at the WikiProject Accessibility talk page. Thisisnotatest (talk) 05:58, 18 June 2015 (UTC)
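For readers unfamiliar with the markup being discussed, here is a minimal sketch of a two-tier header where each header cell carries an id and each data cell lists its headers. It is plain HTML generated from Python, not wikitext, and it makes no claim about what MediaWiki accepts. The column names, the id scheme, the function names and the direction value in the sample row are all assumptions chosen for illustration.

```python
# Sketch only: build an HTML table whose th cells have id attributes and
# whose td cells reference them through headers attributes.
from html import escape

# Assumed grouping: which columns sit under which top-level header.
GROUPS = [
    ("ISO 15924", ["Code", "No.", "Name"]),
    ("Unicode", ["Alias", "Direction"]),
]


def slug(text):
    """Turn a header label into an id fragment (hypothetical scheme)."""
    return "".join(ch.lower() if ch.isalnum() else "-" for ch in text).strip("-")


def build_table(groups, rows):
    top, second, header_ids = [], [], []
    for group, columns in groups:
        gid = "g-" + slug(group)
        top.append(f'<th id="{gid}" colspan="{len(columns)}">{escape(group)}</th>')
        for col in columns:
            cid = f"{gid}-{slug(col)}"
            second.append(f'<th id="{cid}">{escape(col)}</th>')
            # Each data cell in this column points at both header levels.
            header_ids.append(f"{gid} {cid}")
    lines = [
        "<table>",
        "<tr>" + "".join(top) + "</tr>",
        "<tr>" + "".join(second) + "</tr>",
    ]
    for row in rows:
        cells = "".join(
            f'<td headers="{h}">{escape(v)}</td>' for h, v in zip(header_ids, row)
        )
        lines.append("<tr>" + cells + "</tr>")
    lines.append("</table>")
    return "\n".join(lines)


# Sample row: the first three values come from the Geor row quoted earlier on
# this page; the direction value is illustrative.
html_table = build_table(
    GROUPS, [["Geor", "240", "Georgian (Mkhedruli)", "Georgian", "L-to-R"]]
)
print(html_table)
```

The sketch shows why the maintenance concern arises: every data cell repeats two ids, so adding or renaming a column means touching every row.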

Reordering ISO columns

Split off from previous ACCESS-section. Treat as separate topic. -DePiep (talk) 21:01, 21 June 2015 (UTC)
thx. (As an unrelated sidenote, wouldn't it be better & nicer to open a row (lefthand) with the readable ISO name, followed by code and number? Might be even bold/rowheader...). -DePiep (talk) 07:31, 18 June 2015 (UTC)
You're welcome; that said, I'll wait until there's some more response and consensus on the general discussion before I push to promote the sandbox here (and I agree with you on the sidenote; however, the first five columns in each row of the table actually are created by a template {{ISO 15924 script codes and Unicode/5 cells by ISO code}}, so that template would need to be changed. I tried creating a sandbox for that one but it doesn't seem to work. And of course, any table using that template would have to have the column headings rearranged, which suggests replacing the manual headings with their own heading template.) Thisisnotatest (talk) 08:18, 20 June 2015 (UTC)
Yes that's needed. Can do that. But I don't want to interfere with the major sandbox topic now open, so I won't make this edit now. Unless you say it's OK and won't confuse. -DePiep (talk) 21:01, 21 June 2015 (UTC)
Time for a proposal. -DePiep (talk) 20:12, 25 July 2015 (UTC)


I propose to change the column order into: 1. Name 2. code 3. ISO number 4+. unchanged. It should read: "code assigned to a named script". Technically I see no issues. -DePiep (talk) 20:12, 25 July 2015 (UTC)

Bad move

I object to the move from template space into article space. Maintenance is easier in template space. Also, some quirks are introduced. I asked Eldizzino. -DePiep (talk) 08:34, 14 July 2015 (UTC)

Very bad move which has caused havoc with pages that transcluded the template. I have moved it back to template namespace. BabelStone (talk) 22:34, 24 July 2015 (UTC)
Subpages are lost by now (/doc, 5-column tablerow, ....). -DePiep (talk) 00:10, 25 July 2015 (UTC)

Zzzz count

@BabelStone: asked "Why exclude the 66 noncharacters from Zzzz? It is clear from http://unicode.org/Public/UNIDATA/Scripts.txt that noncharacters are also Zzzz". The answer is that I excluded them for the v9.0 update of this page because they were excluded in the v8.0 update of this page. I'm not sure why we've been excluding them but the Scripts.txt verbiage makes me agree that we shouldn't: "All code points not explicitly listed for Script have the value Unknown (Zzzz). @missing: 0000..10FFFF; Unknown". I'm happy to include the noncharacters in Zzzz. DRMcCreedy (talk) 15:01, 23 June 2016 (UTC)
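For reference, the figure of 66 can be reproduced by enumeration. A minimal sketch follows, using the Unicode Standard's designation of noncharacters as U+FDD0..U+FDEF plus the last two code points of each of the 17 planes; the function name is made up for illustration.

```python
# Sketch only: enumerate the Unicode noncharacter code points.

def noncharacters():
    """Return the list of noncharacter code points (hypothetical helper)."""
    points = list(range(0xFDD0, 0xFDF0))  # U+FDD0..U+FDEF: 32 code points
    for plane in range(17):               # planes 0 through 16
        points.append(plane * 0x10000 + 0xFFFE)
        points.append(plane * 0x10000 + 0xFFFF)
    return points


NONCHARS = noncharacters()
print(len(NONCHARS))  # 32 + 2 * 17
```

Whether these 66 are added to the Zzzz total is the editorial question above; the sketch only shows where the number comes from.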

This is about unassigned code points, right? ISO 15924 defines a script as a "set of graphic characters used for the written form of one or more languages". To me that reads: no character, no script. And this should take priority over the "All code points not explicitly listed for Script ..." text quoted above, correctly, from Scripts.txt. Surely Unicode cannot include in a script a "character" that is not a character (by their own definition). And Zzzz is a regular script (say, a list of graphic characters), the only issue being that Unicode (not ISO 15924!) has not encoded them into a Unicode-covered script. -DePiep (talk) 07:13, 24 June 2016 (UTC)
I cannot source this right now, but I am quite convinced that Unicode adheres to the ISO 15924 definition of script. Unicode does not re-define "script"; there is no definition of "Unicode scripts" (so a bad WP article title; I advocate naming it "Scripts in Unicode"). And ISO scripts are defined to consist of graphic characters.
Then, Unicode normatively defines this quality of any code point in its General Category. So by their own rules, Unicode should exclude these other code points from any script (like Zzzz). That's mainly control characters and formatting characters. Some individual border issues might exist (SHY is not a graphic character ...).
The blanket script text in Scripts.txt is not normative, and should be corrected. -DePiep (talk) 08:09, 24 June 2016 (UTC)
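The General Category argument above can be tried out with Python's standard `unicodedata` module. This is a sketch, assuming the Unicode Standard's glossary definition of a graphic character (General Category L, M, N, P, S or Zs); the helper name `is_graphic` and the sample labels are made up for illustration, and the categories reported depend on the Unicode version bundled with the Python build.

```python
# Sketch only: classify a few code points as graphic or not by General Category.
import unicodedata


def is_graphic(ch):
    """True if ch has General Category L*, M*, N*, P*, S* or Zs (hypothetical helper)."""
    cat = unicodedata.category(ch)
    return cat[0] in "LMNPS" or cat == "Zs"


samples = {
    "LATIN CAPITAL A": "A",
    "SOFT HYPHEN (SHY)": "\u00ad",
    "NULL": "\u0000",
    "noncharacter U+FDD0": "\ufdd0",
    "GEORGIAN LETTER AN": "\u10d0",
}

# Map each label to (General Category, graphic?) for inspection.
report = {name: (unicodedata.category(ch), is_graphic(ch)) for name, ch in samples.items()}
for name, (cat, graphic) in report.items():
    print(f"{name}: gc={cat}, graphic={graphic}")
```

Under this definition SHY (a format character) and the noncharacter both come out as non-graphic, which is the distinction the comment above relies on.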
ISO 15924 does not define what characters belong to any given script; only the Unicode Standard does that, and as the character statistics in the Wikipedia table are derived from data in the Unicode Standard, I think we should be consistent and use the implicit count for Zzzz given in the Unicode Standard. BabelStone (talk) 17:59, 24 June 2016 (UTC)
Not disputed. Still, ISO requires that they are (readable) characters, not formatting or control etc. stuff. So Unicode cannot add non-characters to a script (any script) that by ISO definition can only have graphic characters. -DePiep (talk) 00:24, 25 June 2016 (UTC)