Module talk:Tabular data

From Wikipedia, the free encyclopedia

Great

@Mxn: this is great! Now we just need easier ways to edit tables, like Vera's work. Let me try this with this data. – SJ + 12:41, 10 May 2020 (UTC)

@Sj: I didn't realize it during the hackathon, but phab:T251759 is already well underway. Can't wait to see it live! – Minh Nguyễn 💬 19:38, 10 May 2020 (UTC)
hot dog! thanks for the link :) and the illustrative attempt here too, good practice to parse. – SJ + 00:26, 11 May 2020 (UTC)

Multiple fields

@Mxn: what would be nice (and significantly help performance in some cases) is the ability to get multiple fields. Like how Module:Covid19Data is called on User:EProdromou (WMF)/COVID-19 case data as {{#invoke:Covid19Data|regionTable|CA|QC|<tr><td> %s</td><td> %s</td><td> %s</td><td> %s</td></tr>}}.

For example:

{{#invoke:Tabular data|lookup
|search_column=model
|search_value=XYZ
|output_column=brand,year
|format=<li> The XYZ was made by [[%s]] and released in [[%s]].
|Example.tab}}

Not sure how difficult this would be. - Alexis Jazz 06:11, 19 May 2020 (UTC)

@Alexis Jazz: Thanks for the idea! It would certainly be feasible, but if efficiency or tidiness is the primary consideration, then I think it would be even better to refine {{#invoke:Tabular data|wikitable}} or {{Json2table}} to allow for the desired format on each row or create a separate function that outputs the whole table in list form. Or were you thinking of a use case where each row would come from a different Commons data table? – Minh Nguyễn 💬 10:19, 20 May 2020 (UTC)
@Mxn: I was actually thinking of a case where two (or more) fields from the same row are needed. A template like this one has to do two lookups for the same row, only to return a different field each time. This increases the page preview/rendering time and load on the Wikimedia servers. - Alexis Jazz 03:11, 21 May 2020 (UTC)
@Alexis Jazz: Done, although I'd expect that large or complex tables or lists would be better served by a custom Lua function that interacts with the tabular data directly, since that also affords more control over formatting and allows lookups to be reused. – Minh Nguyễn 💬 05:14, 21 May 2020 (UTC)
@Mxn: Thanks, I also updated {{Tabular query}}. I've noticed, though, that [1] seemingly took almost a second to preview after I updated the module, where it took about half a second before. This was before I updated that template to use the new functionality. Now that I've updated it, it's back to taking half a second, whereas I was hoping to get the preview/rendering time down to about 0.3 seconds. - Alexis Jazz 06:31, 21 May 2020 (UTC)
I see the performance has improved: it's down to about 0.4 seconds now. - Alexis Jazz 08:47, 1 June 2020 (UTC)
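
The single-pass idea discussed in this thread (find the row once, then return several of its fields) can be sketched as follows. This is an illustrative Python sketch, not the module's actual Lua code: the function name `lookup_formatted` and the sample rows are hypothetical, and the table is only assumed to be laid out like Commons tabular data (a `schema.fields` list plus `data` rows).

```python
def lookup_formatted(table, search_column, search_value, output_columns, fmt):
    """Find the first matching row once and format several of its fields."""
    names = [field["name"] for field in table["schema"]["fields"]]
    search_idx = names.index(search_column)
    output_idx = [names.index(col) for col in output_columns]
    for row in table["data"]:
        if row[search_idx] == search_value:
            # One scan of the table serves every requested output column.
            return fmt % tuple(row[i] for i in output_idx)
    return ""  # no matching row


# Made-up sample data, for illustration only.
EXAMPLE_TABLE = {
    "schema": {"fields": [{"name": "model"}, {"name": "brand"}, {"name": "year"}]},
    "data": [["ABC", "Acme", 1999], ["XYZ", "Initech", 2005]],
}

result = lookup_formatted(
    EXAMPLE_TABLE, "model", "XYZ", ["brand", "year"],
    "The XYZ was made by %s and released in %s.",
)
```

Compared with two separate lookups, the row search runs once; the cost of loading the table itself is a separate matter.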

Performance

Recently @Johnuniq: developed {{NUMBEROF}}, which uses c:Data:Wikipedia statistics/data.tab generated by GreenC bot. One of the issues we ran into was performance: each time the template is invoked, the Commons file is retrieved via mw.ext.data.get(), which is slow. List of Wikipedias had over 4,000 invocations, which exceeded Lua's 10-second time limit and rendered red errors. @Pppery: suggested loading the Commons file once per page; mw.ext.data.get() does not support this, but mw.loadData() does. So mw.ext.data.get() is called in {{NUMBEROF/data}}, which is then loaded by mw.loadData() in {{NUMBEROF}}. This ensures the file from Commons is loaded once regardless of how often the module is invoked on a page. Is this an issue with this module? Should we recommend that readers use {{NUMBEROF}} instead of this template, since it is being used as an example? -- GreenC 02:26, 24 May 2020 (UTC)

Module:NUMBEROF/data is easily able to provide a cache of the Commons data because the module was written specifically for that data format, and with knowledge of what was wanted by the main module. To do that more generally would be tricky. Using hundreds of calls to Module:Tabular data would consume a lot of resources. If that is ever required, I would think a workable solution would require a custom module like Module:NUMBEROF/data. Re the question: yes, {{NUMBEROF}} should be used although I suppose the example in the docs here was intended to show this module's flexibility. I think the docs should include a "but see {{NUMBEROF}}" note. Johnuniq (talk) 03:43, 24 May 2020 (UTC)
@GreenC: This module is intended to serve a variety of use cases generically, so it's different than {{NUMBEROF}}, but I added a "See also" link to that template, just in case. This module provides {{#invoke:Tabular data|wikitable}} for situations where the entire table is needed on a given page, as opposed to a lookup of a few values. That function could be made more flexible, along the lines of {{Json2table}}, but I think ultimately any use case that requires looking up a lot of values from the same Commons table and including the results on the same page warrants a dedicated Lua module to build that entire portion of the page. Then caching wouldn't be so relevant, because the Commons table would only get loaded once anyways. – Minh Nguyễn 💬 22:44, 25 May 2020 (UTC)
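
The load-once pattern described in this thread can be illustrated with a rough Python analogy. It is not Scribunto code: `slow_fetch` is a hypothetical stand-in for an expensive retrieval such as mw.ext.data.get(), `functools.lru_cache` plays a role loosely similar to mw.loadData(), and the numbers are made up.

```python
from functools import lru_cache

fetch_count = 0  # counts how often the expensive retrieval really runs


def slow_fetch(page):
    """Hypothetical stand-in for an expensive retrieval of a data page."""
    global fetch_count
    fetch_count += 1
    # Made-up numbers, for illustration only.
    return {"enwiki": {"articles": 100}, "dewiki": {"articles": 50}}


@lru_cache(maxsize=None)
def load_once(page):
    """Cache the fetched table so repeated lookups reuse the same copy."""
    return slow_fetch(page)


def number_of(page, site, key):
    """One 'invocation': look up a single value from the cached table."""
    return load_once(page)[site][key]


values = [number_of("Example.tab", site, "articles")
          for site in ["enwiki", "dewiki", "enwiki"]]
```

Three lookups trigger a single fetch here. Note that `lru_cache` caches for the life of the process, whereas the thread describes a cache scoped to one page render.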

Getting row or column data

Just an idea, not sure about the technical feasibility. Similar to getting a cell value, is it possible to get the values of a whole column or row? The output would be CSV (or some other delimited values) instead of a single value. One use I'm looking for is as a data series in {{Graph:Chart}}. - Timbaaa -> ping me 13:31, 14 July 2020 (UTC)

@Timbaaa: That's definitely feasible, though it might be easier to integrate something with {{Graph:Lines}}, which is already pretty usable with tabular data, as seen in COVID-19 pandemic in the San Francisco Bay Area#Cases by county over time. It would look pretty similar to the existing _wikitable() function, but just the part that collects the titles of the elements in data.schema.fields. If you're planning to use this functionality inside a module instead of directly inside a template or article, I'd suggest working with mw.ext.data.get(…).schema.fields directly so you have maximum control over formatting. – Minh Nguyễn 💬 19:46, 19 September 2020 (UTC)
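
A minimal sketch of the column-extraction idea, written in Python rather than Lua so it can be run outside MediaWiki. The function name `column_values` and the sample rows are hypothetical; the table is assumed to follow the Commons tabular-data layout (`schema.fields` plus `data`).

```python
def column_values(table, column, delimiter=","):
    """Return every value of one column as a single delimited string."""
    names = [field["name"] for field in table["schema"]["fields"]]
    idx = names.index(column)
    # Null cells become empty strings so the positions stay aligned.
    return delimiter.join(
        "" if row[idx] is None else str(row[idx]) for row in table["data"]
    )


# Made-up sample data, for illustration only.
SAMPLE = {
    "schema": {"fields": [{"name": "date"}, {"name": "cases"}]},
    "data": [["2020-03-01", 1], ["2020-03-02", None], ["2020-03-03", 4]],
}

series = column_values(SAMPLE, "cases")
```

This is a naive join with no quoting, so it only suits values that cannot contain the delimiter.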

Search as Number

Great job!

For some reason, it doesn't work for me. For example, a request like this:

{{#invoke: Tabular data | lookup | COVID-19 Slovenia cases per capita.tab | search_value = 261 | search_column = cases | output_column = name}}

returns an empty string instead of "Ajdovščina".

Please help. — Preceding unsigned comment added by Игорь Темиров (talk • contribs) 19:14, 8 November 2020 (UTC)

There's no 261 in the cases column of c:Data:COVID-19 Slovenia cases per capita.tab. The cases value for Ajdovščina is 2204. — 𝐆𝐮𝐚𝐫𝐚𝐩𝐢𝐫𝐚𝐧𝐠𝐚 01:05, 24 June 2021 (UTC)

Search more than 1 column

Would it be possible to make it search two (or more) columns?
I.e.:
{{#invoke:Tabular data|lookup|Page name.tab|search_value=|search_column=|search_value2=|search_column2=|...|...|output_format=}}
E.g.:
{{#invoke:Tabular data|lookup|UN:Total population, both sexes combined.tab|search_value=Afghanistan|search_column=Country|search_value2=1950|search_column2=Year|output_column=Value}}

𝐆𝐮𝐚𝐫𝐚𝐩𝐢𝐫𝐚𝐧𝐠𝐚 01:00, 24 June 2021 (UTC)

I've created Module:Tabular_data/sandbox with a function to try to handle the second search requirement. It doesn't work. However, I can't get the existing module to return data from c:Data:UN:Total population, both sexes combined.tab either.
{{#invoke:Tabular data|lookup
|search_column=date
|search_value=2020-03-16
|output_column=totalConfirmedCases
|COVID-19 cases in Santa Clara County, California.tab}}

Lua error in Module:Tabular_data at line 48: Output column “totalConfirmedCases” not found..

{{#invoke:Tabular data|lookup
|search_value=Afghanistan|search_column=Country 
|output_column=Value
|UN:Total population, both sexes combined.tab}}

40754.388

{{#invoke:Tabular data/sandbox|lookup
|UN:Total population, both sexes combined.tab 
|search_value=Afghanistan|search_column=Country 
|output_column=Value}}

40754.388

{{#invoke:Tabular data/sandbox|lookup2
|UN:Total population, both sexes combined.tab 
|search_value=Afghanistan|search_column=Country 
|search_value2=1950|search_column2=Year 
|output_column=Value}}

7752.118

{{#invoke:Tabular data/sandbox|lookup2
|UN:Total population, both sexes combined.tab 
|search_value=Zambia|search_column=Country 
|search_value2=2020|search_column2=Year 
|output_column=Value}}

18383.955

What am I missing? Could it be the page name with a colon that is invalid? — Jts1882 | talk 13:41, 30 June 2021 (UTC)
There were a few issues:
  1. The page name is a numbered parameter so should be trimmed. The sandbox does this now and the non-sandbox example above is edited to remove white space and linefeeds.
  2. The search comparisons assume string values. The population data has numbers so these need to be converted before comparing (as done in the sandbox) or the module modified to use the types (more involved).
Anyway, this shows how the data in the data page at Commons can be retrieved. The two examples (Afghanistan 1950 and Zambia 2020) get the right numbers. — Jts1882 | talk 16:32, 30 June 2021 (UTC)
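
The two fixes listed above (trimming arguments and comparing numbers as numbers) can be sketched like this. It is an illustrative Python sketch, not the sandbox's Lua code: the helper names `matches` and `lookup2` are chosen here for illustration, and the sample table holds only the two figures quoted in this thread plus one clearly marked placeholder row.

```python
def matches(cell, wanted):
    """Compare a cell with a template argument, which always arrives as a string."""
    if isinstance(cell, (int, float)) and not isinstance(cell, bool):
        try:
            return cell == float(wanted)  # numeric cell: compare as numbers
        except ValueError:
            return False
    return str(cell) == wanted


def lookup2(table, criteria, output_column):
    """Return the output cell of the first row satisfying every (column, value) pair."""
    names = [field["name"] for field in table["schema"]["fields"]]
    # Trim whitespace and linefeeds, as stray ones can surround template arguments.
    wanted = [(names.index(col.strip()), val.strip()) for col, val in criteria]
    out_idx = names.index(output_column.strip())
    for row in table["data"]:
        if all(matches(row[i], value) for i, value in wanted):
            return row[out_idx]
    return None


SAMPLE = {
    "schema": {"fields": [{"name": "Country"}, {"name": "Year"}, {"name": "Value"}]},
    "data": [
        ["Afghanistan", 1950, 7752.118],
        ["Zambia", 1950, -1.0],  # placeholder value, not real data
        ["Zambia", 2020, 18383.955],
    ],
}

zambia_2020 = lookup2(SAMPLE, [(" Country ", "Zambia"), ("Year", "2020\n")], "Value")
```

Without the numeric conversion, the string "2020" would never equal the number 2020 and the lookup would come back empty.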
Great! Thanks, Jts1882.
I wrapped it (that particular application) in a template, and started testing it out here. It works, but hits the time limit pretty quickly. Could it be made more efficient? — 𝐆𝐮𝐚𝐫𝐚𝐩𝐢𝐫𝐚𝐧𝐠𝐚 02:46, 1 July 2021 (UTC)
You probably don't need ustring (do you?), and string will do the trick, but I don't see that making a big difference there.
One option is specifying the columns by number so the module doesn't have to search for them by name. Inconvenient, but... faster. (Then again, with only 3 columns in that table, I don't see it making much of a difference.) — 𝐆𝐮𝐚𝐫𝐚𝐩𝐢𝐫𝐚𝐧𝐠𝐚 02:55, 1 July 2021 (UTC)
Oh, I see, you have to pull down the whole table at every call:
local data = args.data or mw.ext.data.get(page)
Yeah, it ain't small (it's at the 2MB limit). What's the alternative; slicing it into a different table for each year? Any batch proc for doing that? — 𝐆𝐮𝐚𝐫𝐚𝐩𝐢𝐫𝐚𝐧𝐠𝐚 07:40, 1 July 2021 (UTC)
That was a concern for memory usage but I think that is cached. I did a few tests on a blank page and the memory usage didn't increase dramatically when calling the template multiple times. It's clearly processing time which goes up with the number of calls. There is no noticeable difference between Afghanistan and Zambia so the looping is fast (as expected). It's still possible mw.ext.data.get() is responsible, even if not loading each time, as dealing with the cache might take time. It needs some more tests.
Incidentally your template seems to have considerable overhead, both doubling the time and causing a high expansion depth. Using invoke gets the time down to just over 100ms each call. — Jts1882 | talk 08:00, 1 July 2021 (UTC)
I've created a function that does the bare minimum (p.lookup2_minimal(), line 208). It doesn't reduce the time substantially (100-150ms depending on run). So I think this sort of template can only be used safely about 50 times on a page.
For generating tables, the alternative is a module that takes a list of countries like Module:Country population. A lot more work, but you can get it to do exactly what is needed with appropriate options. — Jts1882 | talk 09:05, 1 July 2021 (UTC)
"Incidentally your template seems to have considerable overhead, both doubling the time and causing a high expansion depth. Using invoke gets the time down to just over 100ms each call."
Can you pinpoint what it is? I invoke the module once if the given date parameter is a year, and twice to interpolate values for a specific date (and once again before those to verify that the country is on the table). What do you reckon needs to be done to reduce the overhead, Jts1882? — 𝐆𝐮𝐚𝐫𝐚𝐩𝐢𝐫𝐚𝐧𝐠𝐚 08:30, 3 July 2021 (UTC)
It looks like it's invoked twice if the given date parameter is a year and three times if it is a specific date. It's invoked once for the test and then once or twice for the output. Is the test necessary? I've removed it in the template and the output in the documentation is the same, but takes less processing time (a bit more than half). — Jts1882 | talk 09:01, 3 July 2021 (UTC)
I saw that a large chunk of the transclusion expansion time was from {{density}}, so I replaced the call to {{convert}}, which in turn invokes a very general and flexible module, by a simple calc only for km2 and sqmi. {{Density}} is still at 77% of transclusion expansion time (not sure how those pcts add up, with {{UN pop}} at 65%), and Draft:List of countries and dependencies by population density is still timing out Lua (it seems) at the 14th table row (Jersey). I see that 93% of the Lua time is consumed by Scribunto_LuaSandboxCallback::get -- what's that? Cheers. — 𝐆𝐮𝐚𝐫𝐚𝐩𝐢𝐫𝐚𝐧𝐠𝐚 12:24, 3 July 2021 (UTC)
I assume that is the function behind mw.ext.data.get().
There is something odd in Draft:List of countries and dependencies by population density. While it starts giving timeout errors at the 14th line, other lines display without errors down to line 59. What makes those lines avoid the timeout? — Jts1882 | talk 13:46, 3 July 2021 (UTC)
Yeah, I noticed that. Bloody good question. I sort of assumed the invoke queue doesn't follow the order in the code. — 𝐆𝐮𝐚𝐫𝐚𝐩𝐢𝐫𝐚𝐧𝐠𝐚 14:23, 3 July 2021 (UTC)

Null value and output format error

If the lookup points to a null value cell and there is an output format, it gives an error.

A{{#invoke:Tabular data|lookup
|search_column=date
|output_column=hospitalized
|search_value=2020-01-27
|COVID-19 cases in Santa Clara County, California.tab}}B

AB

A{{#invoke:Tabular data|lookup
|output_format=There are %d people
|search_column=date
|output_column=hospitalized
|search_value=2020-01-27
|COVID-19 cases in Santa Clara County, California.tab}}B

ALua error in Module:Tabular_data at line 70: bad argument #2 to 'format' (no value).B

Could you fix this, please? Bean49 (talk) 20:05, 23 January 2024 (UTC)

There could also be several output columns with only one of them null, e.g. with the format "%d out of %d users are administrators". Bean49 (talk) 20:24, 23 January 2024 (UTC)
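
One possible way to handle the cases reported above is to check for missing values before formatting. The sketch below is in Python, not the module's Lua, and `safe_format` is a hypothetical name. Returning an empty string when any value is null is only one policy, chosen here because it matches the unformatted lookup's "AB" output shown above.

```python
def safe_format(fmt, values):
    """Format the values, or return an empty string if any of them is null."""
    if any(value is None for value in values):
        # Formatting a missing value with %d would raise an error, so bail out early.
        return ""
    return fmt % tuple(values)


single_null = safe_format("There are %d people", [None])
one_of_two_null = safe_format("%d out of %d users are administrators", [None, 10])
both_present = safe_format("%d out of %d users are administrators", [3, 10])
```

A different policy, such as substituting a placeholder for only the null field, would need a format specifier that accepts non-numeric text.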