Wikipedia:Bot requests


This is a page for requesting tasks to be done by bots per the bot policy. This is an appropriate place to put ideas for uncontroversial bot tasks, to get early feedback on ideas for bot tasks (controversial or not), and to seek bot operators for bot tasks. Consensus-building discussions requiring large community input (such as request for comments) should normally be held at WP:VPPROP or other relevant pages (such as a WikiProject's talk page).

You can check the "Commonly Requested Bots" box above to see if a suitable bot already exists for the task you have in mind. If you have a question about a particular bot, contact the bot operator directly via their talk page or the bot's talk page. If a bot is acting improperly, follow the guidance outlined in WP:BOTISSUE. For broader issues and general discussion about bots, see the bot noticeboard.

Before making a request, please see the list of frequently denied bots, which are denied either because they are too complicated to program or because they do not have consensus from the Wikipedia community. If you are requesting that a template (such as a WikiProject banner) be added to all pages in a particular category, please be careful to check the category tree for any unwanted subcategories. It is best to give a complete list of categories that should be worked through individually, rather than one category to be analyzed recursively (see example difference). If your task involves only a handful of articles, is straightforward, and/or only needs to be done once, consider making an AWB request at WP:AWBREQ. If it might be solved with a SQL query, try a request at WP:SQLREQ. URL changes may seem deceptively simple; however, a search-and-replace of text within a URL is not advised due to archive URLs and other issues. A number of bots specialized for URL work can be notified of requests at WP:URLREQ.

Note to bot operators: The {{BOTREQ}} template can be used to give common responses, and makes it easier to keep track of the task's current status. If you complete a request, note that you did so with {{BOTREQ|done}}, and archive the request after a few days (WP:1CA is useful here).

Please add your bot requests to the bottom of this page.

Pages using deprecated image syntax

Hello. I was wondering if there was a bot that could go through the backlog of Category:Pages using deprecated image syntax. I have zero bot experience, and I would rather leave it to the experts :) I don't think this falls under any of the commonly requested bots either. Thanks! --MrLinkinPark333 (talk) 17:29, 27 July 2019 (UTC)

There are roughly three types of pages in this category:
  1. Images used without any additional metadata, like |image=[[File:Example.png|thumb]]. These can be easily fixed by removing the excess markup.
  2. Images used with additional information, like |image=[[File:Example.png|thumb|175px|Logo for Example]]. This additional markup should not be removed automatically, but should instead be moved to the appropriate parameters in the template.
  3. Pages where markup is used to do something more complicated, like displaying two images side-by-side. As far as I can tell, there isn't really anything to fix here.
The other problem is that the various infobox templates are not always consistent with their parameters. Module:Infobox supports upright scaling, but not all infoboxes have been updated with the correct parameter. Some infoboxes have multiple different image fields (image, logo, seal, flag), while others alias them together into one. The Type 1 pages could be fixed pretty easily, but the Type 2 ones may have more issues requiring testing and human review. --AntiCompositeNumber (talk) 18:44, 27 July 2019 (UTC)
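For the Type 1 case, the fix is mechanical enough to sketch. A minimal illustration in Python follows; the regex and file names are illustrative only, and a real bot would need to handle more variants and operate on the isolated parameter value rather than raw wikitext:

```python
import re

# Strip excess markup such as |thumb or |thumbnail from a bare infobox
# image parameter, leaving only the file name. Illustrative sketch only;
# this is not the actual logic used by Module:InfoboxImage.
TYPE1 = re.compile(r"\[\[(?:File|Image):([^|\]]+)\|\s*thumb(?:nail)?\s*\]\]")

def fix_type1(value: str) -> str:
    """Rewrite |image=[[File:Example.png|thumb]] as |image=Example.png."""
    return TYPE1.sub(lambda m: m.group(1).strip(), value)

print(fix_type1("[[File:Example.png|thumb]]"))  # Example.png
```

Values that don't match the pattern (including the Type 2 and Type 3 cases) are left untouched, which keeps the sketch safely conservative.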
@AntiCompositeNumber: Hmm. That's complex. Would it be useful to go through #1 only with a bot then #2 done manually? Also, why would #3 be there in the category if nothing needs to be fixed? --MrLinkinPark333 (talk) 19:25, 27 July 2019 (UTC)
@MrLinkinPark333: Currently, Module:InfoboxImage basically says "If the image parameter starts with [[ and it isn't a thumbnail, then it's deprecated image syntax." (Thumbnails get put in their own category). As a first step, it might be a good idea to have the module categorize image parameters with multiple files differently. AntiCompositeNumber (talk) 21:19, 27 July 2019 (UTC)
Is this just a cosmetic change? If it is adding it as a default AWB fix may be a better way to fix this as that would couple it with more substantial edits. --Trialpears (talk) 19:50, 27 July 2019 (UTC)
@Trialpears: I'm just more interested in finding a way to take a chunk out of the backlog. If AWB is more suitable, feel free to point me that way :) --MrLinkinPark333 (talk) 20:12, 27 July 2019 (UTC)
MrLinkinPark333 I've been looking a bit, and it seems all standard AWB fixes are only made outside of templates, so just adding it to Wikipedia:AutoWikiBrowser/Typos wouldn't work. I suggest asking at WT:AWB or the AWB phabricator. If the edit isn't considered cosmetic I would happily make the bot. Pinging @WOSlinker: since they created the module and can probably answer the cosmetic edit question. --Trialpears (talk) 23:03, 12 August 2019 (UTC)
I created the module and added the tracking category, but the category was added at the request of User:Zackmann08; see Module talk:InfoboxImage/Archive 1#Pages_using_deprecated_image_syntax -- WOSlinker (talk) 11:57, 13 August 2019 (UTC)
I had a brief look at this. There is a problem with {{Infobox election}} which uses two or more side-by-side images. They are generally set with a size of 160x160 or 150x150. This will give the images the same height (as they are portraits) and the two columns will be of potentially different widths. This works reasonably well, though identical aspect ratios would be better. Unfortunately, using the recommended syntax we have the upright scaling factor, which only scales width. It's trivial to calculate this factor if constant width were what we required, e.g. 160/220 = 8/11 = 0.727272... But the factor we actually want will require recovering the aspect ratios from the file pages. Still not hard, though not within the scope of AWB-like tools. Unfortunately this is not very user friendly: if someone changes an image, or uses our page as a cut-and-paste basis for something new, they will need to know how to calculate the necessary number. All the best: Rich Farmbrough, 02:00, 3 October 2019 (UTC).
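The calculation described above can be sketched as follows. This assumes the default thumbnail width of 220px that |upright= scales against, and takes the image's native dimensions as inputs (which, as noted, would have to be recovered from the file pages):

```python
# Sketch of the constant-height upright calculation. |upright= scales the
# default thumbnail width (assumed 220px here), so to display an image at
# a target height h, the needed display width is h * (w_img / h_img), and
# the upright factor is that width divided by 220.
DEFAULT_THUMB_WIDTH = 220

def upright_for_height(target_height: int, img_width: int, img_height: int) -> float:
    """Upright factor that renders an image at target_height pixels tall."""
    return target_height * img_width / (img_height * DEFAULT_THUMB_WIDTH)

# e.g. a 600x800 portrait shown at 160px tall:
print(round(upright_for_height(160, 600, 800), 3))  # 0.545
```

As the comment above notes, the factor depends on each file's aspect ratio, which is why a plain search-and-replace tool can't compute it.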

Stub sorting

When an article has multiple stub templates, say {{X-stub}} and {{Y-stub}}, and there exists a template {{X-Y-stub}} or {{Y-X-stub}}, a bot should replace the two stub templates with the combined one. SD0001 (talk) 14:18, 17 August 2019 (UTC)

Example edit. Existing stubs tags should be removed from wherever they are and the new tag should be placed at the very end of the article with 2 blank lines preceding it per WP:STUBSPACING. SD0001 (talk) 14:26, 17 August 2019 (UTC)

SD0001, the number of permutations of name combinations seems huge. Using a brute-force method, the bot would need a list of every category on Wikipedia and then for each article, it would generate every possible permutation of combined names for each category in that article, and check each one against every category name on Wikipedia - sort of like cracking a safe by trying every possible combination. Can you think of a better way to narrow it down? -- GreenC 13:23, 22 August 2019 (UTC)

You could start with a list of templates of the form Foo-Bar-stub, check whether Foo-stub and Bar-stub both exist, and consider editing articles which use both. Beware of false positives: {{DC-stub}} plus {{Comics-stub}} denotes a comic in Washington, not necessarily a {{DC-Comics-stub}}. Certes (talk) 14:39, 22 August 2019 (UTC)
@GreenC: There are about 30,300 stub templates on Wikipedia. Store their names as strings in a hash table (not an array) so that we can check whether a given string is in the list in O(1) time (rather than O(n) time). Most programming languages have built-in support for hash tables. Strictly speaking, it's O(length of string) rather than O(1), though the lengths of the strings are small enough. We could use advanced data structures like ternary search tries that can be searched even faster, but of course they are very difficult to code, and their use would be justified only if we had millions of strings to search.
Additionally, there are about 3400 single-word stub templates (e.g. "castle-stub") which we'd never be looking for, and which can hence be removed from the list. But again, this is not necessary, as the efficiency of search in a hash table doesn't depend on the number of items in the table.
Regarding the generation of permutations: (i) if there are two stub tags, X-stub and Y-stub, there are only two permutations, X-Y-stub and Y-X-stub, and it's really unlikely that both are available, so this is an easy case. (ii) if there are 3 stub tags, X-stub, Y-stub and Z-stub, then first check the 6 all-in-one permutations: X-Y-Z, Z-X-Y, etc. If not found, search for the 6 two-in-one combinations: X-Y, Y-Z, Y-X, etc. If 2 of them match, add both. If 3 of them match (very unlikely), add the page to a list for human review. (iii) if there are 4 or more stub tags (there shouldn't be that many), ignore it and add the page to a list for human review. SD0001 (talk) 15:31, 22 August 2019 (UTC)
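The lookup described above might be sketched like this; the template names in the set are invented placeholders, and a real run would load the full list of roughly 30,300 stub template names:

```python
from itertools import permutations

# Sketch of the hash-table approach: keep all stub template names in a
# set (hash-based, so membership tests are O(1) on average) and test
# each ordering of the combined name. Names here are placeholders.
stub_templates = {"X-Y-stub", "Castle-stub", "Scotland-railstation-stub"}

def combined_stub(parts):
    """Return the first existing combined stub template for the given
    stub prefixes, e.g. ["X", "Y"] -> "X-Y-stub", or None if none exists."""
    for order in permutations(parts):
        candidate = "-".join(order) + "-stub"
        if candidate in stub_templates:
            return candidate
    return None

print(combined_stub(["Y", "X"]))  # X-Y-stub
```

For three tags, the same loop over permutations covers the six all-in-one candidates; the pairwise two-in-one checks would be a second pass over each pair.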
Ok. Any thoughts on the context problem raised by Certes, with {{DC-stub}} + {{Comics-stub}} != {{DC-Comics-stub}}? -- GreenC 16:20, 22 August 2019 (UTC)
I think this would be really rare given the stringent stub type naming conventions which specifically try to avoid this sort of thing. I can't think of any other such exception even though I have been stub-sorting a lot lately. Regarding the one given, clearly DC-stub and Comics-stub won't be present together on any page. So I don't think this is an issue (unless someone finds more such exceptions). SD0001 (talk) 17:05, 22 August 2019 (UTC)
The way to go here is not by analysing stub templates, but by looking at their categories to see if they have a common subcategory. For example, an article might have {{Scotland-stub}} and {{Railstation-stub}} - the former categorises to Category:Scotland stubs, the latter to Category:Railway station stubs - but if you go deep enough in the category tree, these have a common subcategory, Category:Scotland railway station stubs for which the stub template is {{Scotland-railstation-stub}}. --Redrose64 🌹 (talk) 22:17, 22 August 2019 (UTC)
That definitely sounds ideal. But I don't think it is possible, because there is no one-to-one correspondence between stub templates and categories. Example: {{Oman-cricket-bio-stub}} categorises into both Category:Omani sportspeople stubs and Category:Asian cricket biography stubs, both of which have a lot of stuff unrelated to Oman cricket bios. SD0001 (talk) 03:11, 23 August 2019 (UTC)
That is what we call an "upmerged stub template", and pretty much all of these are dead-ends as far as further specialisation goes. There won't be, for example, any decade-specific templates like {{Oman-cricket-bio-1970s-stub}} (compare {{England-cricket-bio-1970s-stub}}). --Redrose64 🌹 (talk) 23:20, 23 August 2019 (UTC)
I see. That is great. But can you think of a way to find whether two cats have a common subcat, Redrose64? SD0001 (talk) 04:35, 26 August 2019 (UTC)
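One possible sketch of the common-subcategory check, with get_subcats as a hypothetical stand-in for whatever subcategory lookup is available (e.g. the API's list=categorymembers with cmtype=subcat, or a database query); the toy tree mirrors the Scotland example above:

```python
from collections import deque

# Walk the subcategory tree below each category and intersect the two
# sets of descendants. A depth/size limit guards against the category
# tree's cycles and sheer size.
def descendants(cat, get_subcats, limit=1000):
    seen, queue = set(), deque([cat])
    while queue and len(seen) < limit:
        for sub in get_subcats(queue.popleft()):
            if sub not in seen:
                seen.add(sub)
                queue.append(sub)
    return seen

def common_subcats(cat_a, cat_b, get_subcats):
    return descendants(cat_a, get_subcats) & descendants(cat_b, get_subcats)

# Toy tree mirroring the Scotland railway station example:
tree = {
    "Scotland stubs": ["Scotland railway station stubs"],
    "Railway station stubs": ["Scotland railway station stubs"],
}
get = lambda c: tree.get(c, [])
print(common_subcats("Scotland stubs", "Railway station stubs", get))
# {'Scotland railway station stubs'}
```

An empty intersection would correspond to the no-refinement-possible case described for Cheltenham High Street Halt below.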
Is there any indication of how many pages have multiple stub templates? Would it be possible to create a report of the most common combinations and knock those off first? Spike 'em (talk) 20:24, 22 August 2019 (UTC)
Not all articles with multiple stub templates have the potential for refinement. For example, an article such as Cheltenham High Street Halt railway station might have {{SouthWestEngland-railstation-stub}} and {{Gloucestershire-struct-stub}}, which categorise to Category:South West England railway station stubs and Category:Gloucestershire building and structure stubs respectively - they have no common subcategory, so no further refinement may be performed by a bot. --Redrose64 🌹 (talk) 22:17, 22 August 2019 (UTC)
BTW, just discovered that a bot was approved for this task long ago. That bot also did resortings based on categorisation and infoboxes (manually triggered by the op for each infobox/category type, I think). SD0001 (talk) 03:15, 23 August 2019 (UTC)
Another complication: templates can have multiple names, i.e. redirects. It might be safe to assume the template's primary name is what should be used, but a database of redirect names mapped to primary template names would also be needed. -- GreenC 03:37, 26 August 2019 (UTC)
Redirects are very uncommon for stub templates. But if they do pop up, I don't think there's a problem whether we use the primary name or redirect name. SD0001 (talk) 14:59, 27 August 2019 (UTC)

Operator to take over Legobot Task 33

See discussion at Wikipedia:Bots/Noticeboard#User:Legobot Request. Legoktm is no longer taking feature requests for User:Legobot (just keeping the bot alive), specifically at WP:GAN. Since Legobot runs many important tasks, it would be helpful if a new operator would be willing to take over control and maintenance of the tasks Legobot performed, either as a whole or as a subset of the task (i.e. only WP:GAN tasks). Legoktm mentioned they are happy to hand off the task(s) to another operator. Anyone interested? « Gonzo fan2007 (talk) @ 16:04, 19 August 2019 (UTC)

Gonzo fan2007, a similar request was made here this past February 6 by Mike Christie: Wikipedia:Bot requests/Archive 77#Take over GAN functions from Legobot, which also has a great deal of information about the work likely involved and a number of the known bugs. Pinging TheSandDoctor and Kees08, who were active in that discussion; the final post was from TheSandDoctor, who had been working on new code and checking the GAN database made available by Legoktm, on June 25. I believe Wugapodes has expressed some interest in further GAN-related coding (they took over the /Reports page last year), though I don't know whether they had this in mind. BlueMoonset (talk) 16:29, 19 August 2019 (UTC)
Thanks for the background discussion BlueMoonset. « Gonzo fan2007 (talk) @ 16:32, 19 August 2019 (UTC)
Looking into the existing code, I agree that the best course is probably a port from PHP to a new language so TheSandDoctor's work so far is probably a good starting point. I don't know PHP at all and the database uses SQL which I don't know, so I am probably not a great candidate for taking this task on. I'm willing to help out where I can because this is a big task for anyone, but I'm pretty limited by my lack of knowledge of the languages. Wug·a·po·des​ 18:43, 19 August 2019 (UTC)
Occasionally bug reports are posted at User talk:Legobot or User talk:Legoktm concerning user talk page notifications suggesting that a GA nom has failed whereas the reality is that it passed. These seem to be second or subsequent attempts at putting a page through GA after the first failed. Looking at some of the bot source, I find SQL code to create tables, to add rows to those tables - but little to update existing rows and nothing to delete rows that are no longer needed. --Redrose64 🌹 (talk) 22:14, 19 August 2019 (UTC)
Speaking as an end user (and not a bot operator) and as the person requesting the bot, if User:Legobot failed at WP:GAN, there would be significant disruptions to the project. Its GA work is completely taken for granted. I think that the preference would be for a new bot to take on just the GA tasks (note that Legobot has other active tasks). It would appear based on my review and a look back at past comments that this would include:
  1. Updating Wikipedia:Good article nominations after {{GAN}} has been added to an article talk page, or if a step in the review process has changed (on-hold, failed, passed, etc)
  2. Notifying nominators of new status of reviews (begin review, on-hold, failed, passed, etc)
  3. Adding {{Good article}} to promoted articles
  4. Update individual topic lists at Wikipedia:Good article nominations/Topic lists
  5. Updating User:GA bot/Stats
  6. Adding |oldid= to {{GA}} when missing (Legobot Task 18)
As previously mentioned, it would also be beneficial to fix some bugs and streamline the process. I'm not sure if it is preferable to go this way, but maybe if a bot owner wants to take this on, that we work on slowly weaning User:Legobot from GA tasks, instead of trying to completely replace it in one shot. As an example, sub-task 3, 5, and 6 are fairly straightforward items (in my limited understanding of coding) and could probably be submitted to WP:BRFA as individual tasks. That way, as individual sub-tasks are brought on-board, we (the end users) could work with the new bot owner to ensure each process is working smoothly. It would be wonderful if a naming structure like User:Good Article Bot (similar to User:FACBot) or something similar could be utilized to specialize the process. Just my input and thoughts on how to go about this. Obviously need an interested party first; I am happy ot assist with manual updating of pages and working through the new process. « Gonzo fan2007 (talk) @ 23:12, 19 August 2019 (UTC)
Before Legobot took over the tasks by taking over the code base, the bot handling the GAN page was known as GAbot, run by Chris G (who I think got the code base from someone else). I'm not sure how easy it would be to peel off some but not all of the update tasks into a new bot while leaving Legobot with the rest; someone who's looked at the code would have a better idea of how to turn off parts of the GAN code (if it can be) as the new bot is activated piece by piece. The one thing that has been long requested that isn't covered above is the ability to split topics into more subtopics. I didn't see that this was a part of the SQL database—there didn't seem to be a table there for topics and their subtopics—so perhaps if someone can take a dive into the code they can figure out how the bot makes those determinations and therefore what modifications we would need to make. Just a thought. (And believe me, Legobot's GAN work is not taken for granted; we've had a few outages over the years that have been extremely disruptive, but Legoktm has been able to patch things together.) BlueMoonset (talk) 05:19, 20 August 2019 (UTC)
Perhaps Hawkeye7 would be interested in expanding out FACbot to include GAbot functionality (depending on Sand Doctors progress etc). Kees08 (Talk) 15:31, 20 August 2019 (UTC)
I had considered it in the past. There are various technical issues here though. Like the others I am not too familiar with PHP or Python (I normally write bot code in Perl and C#) although I do know SQL well. (No deletions is a bad sign; if true it means that the database will continue expanding until we run into limits or performance problems.) The Legobot runs frequently and having another bot performing tasks could result in causing the very problems we have been discussing. Shutting it down is guaranteed to be disruptive and any full replacement is likely to be buggy for a while. (I would personally appreciate a bot updating Wikipedia:Good articles instead of a reviewer having to do it.) Hawkeye7 (discuss) 19:55, 20 August 2019 (UTC)
BlueMoonset, as long as User:Legobot is {{nobots}} compliant, we could fairly easily exclude Legobot from editing specific pages for tasks 1, 4, and 5 from the list above. Task 6 is also a separate Legobot task, so presumably this is separate coding from other GA-related tasks (and could more easily be usurped by the new bot). We could also develop mirrored pages that would allow the new bot to edit concurrently with Legobot for a certain time until all tasks are running smoothly. « Gonzo fan2007 (talk) @ 20:01, 20 August 2019 (UTC)
@BlueMoonset, Hawkeye7, TheSandDoctor, Wugapodes, Legoktm, Kees08, and Mike Christie: any additional ideas or information to add? I would be especially interested to hear from TheSandDoctor on their status, if any. « Gonzo fan2007 (talk) @ 21:01, 26 August 2019 (UTC)
Hello @Gonzo fan2007, BlueMoonset, Hawkeye7, Wugapodes, Legoktm, Kees08, and Mike Christie:. My apologies for my delayed response - I am quite busy "in the real world" at the moment unfortunately. I currently have a GitHub repo relating to this, but haven't been able to dedicate the time required, nor has DatGuy. If someone is willing to assist with this, I would be quite open to the idea of another hand to help out. Most - if not all - of the existing PHP code has been translated/updated to Python, but I have not been able to test it as of yet. It might be ready, but I simultaneously think that it needs further tests of sorts prior to filing a BRFA (thus allowing for testing). --TheSandDoctor Talk 05:25, 28 August 2019 (UTC)

──────────────────────────── @TheSandDoctor: Oh wow, this is great! If all we need to do is test it, I could probably do that over the next couple of weeks. I'll make a pull request if I need to make changes to get it working. Wug·a·po·des​ 05:43, 28 August 2019 (UTC)

@TheSandDoctor: thanks for the update! Appreciate the work you have done so far. Let me know if you need any assistance. « Gonzo fan2007 (talk) @ 16:07, 28 August 2019 (UTC)

Bot to fix gazillions of formatting errors in "Cite news" templates

Recently, apparently due to some change in the way the Template:Cite news works, it is no longer permissible to italicize the publisher name in the "publisher=" parameter. There are therefore now countless articles with error messages in the footnotes saying "Italic or bold markup not allowed in: |publisher=". It would be nice, therefore, if a bot could sweep through and remove the preceding and following '' (and, I guess, ''' or ''''') formatting occurring in these "publisher=" parameters. bd2412 T 01:24, 10 September 2019 (UTC)
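A sweep over the simplest form of this markup might look like the following sketch (illustrative only; real CS1 cleanup must handle many more variants and edge cases, such as markup mixed with plain text):

```python
import re

# Remove wrapping wiki italic/bold markup ('', ''', or ''''') from a
# |publisher= value inside a citation template. Illustrative sketch only.
PUB = re.compile(r"(\|\s*publisher\s*=\s*)'{2,5}([^'|}]+)'{2,5}")

def strip_publisher_markup(wikitext: str) -> str:
    return PUB.sub(r"\1\2", wikitext)

print(strip_publisher_markup("{{cite news|publisher=''The Times''}}"))
# {{cite news|publisher=The Times}}
```

Values without wrapping quote markup are left unchanged, so the substitution is idempotent.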

It is this: Category:CS1 errors: markup. 39k pages. -- GreenC 01:51, 10 September 2019 (UTC)
@BD2412: Wikipedia:Bots/Requests for approval/DannyS712 bot 61 - not just for cite news DannyS712 (talk) 01:56, 10 September 2019 (UTC)
WP:Bots/Requests for approval/Monkbot 14. --Izno (talk) 02:13, 10 September 2019 (UTC)
Good, I'm glad someone is doing this. I started doing it manually and calculated it would take about twenty years by hand. bd2412 T 02:14, 10 September 2019 (UTC)

List of Wikipedians by article count

Hi. This page has been dormant for two years. Would it be possible to re-activate this list so it is updated daily, much like the edits list? I contacted the bot owner who used to do this some time ago, and they suggested I try here. Nothing fancy with this, same as before with a basic count of number of pages and number of redirects. Thanks. Lugnuts Fire Walk with Me 12:47, 16 September 2019 (UTC)

Should we update daily or weekly? --Kanashimi (talk) 22:34, 18 September 2019 (UTC)
@Kanashimi: ideally daily, if possible, but a weekly update would be better than nothing. Thanks. Lugnuts Fire Walk with Me 08:18, 19 September 2019 (UTC)
Hrm, is it really a (socially) good idea to have such a "ranking" of Wikipedians? Jo-Jo Eumerus (talk, contributions) 08:37, 19 September 2019 (UTC)
@Jo-Jo Eumerus: - I can't really answer that, but the sister list of users by edits is updated daily. Lugnuts Fire Walk with Me 07:52, 20 September 2019 (UTC)
Yeah, if memory serves some of the people on that list have been causing problems while pursuing ever increasing edit counts. That's a big part of the reason why I am concerned with the existence of both lists. Jo-Jo Eumerus (talk, contributions) 08:10, 20 September 2019 (UTC)
@Jo-Jo Eumerus: Some of the people in higher positions use AWB to achieve their high counts. Some of these have done things like add non-existent WikiProject banners to large numbers of talk pages; and have objected when I have asked them to preview their edits, claiming that they "don't have time". They apparently also don't have time to read WP:AWBRULES no. 1. --Redrose64 🌹 (talk) 19:39, 20 September 2019 (UTC)
What purpose does it serve to update this daily? Leaky caldron (talk) 08:30, 20 September 2019 (UTC)
@Leaky caldron: Wikipedia:List of Wikipedians by number of edits/1–1000 was updated weekly (on Wednesday mornings European time, late Tuesdays American time) until 25 June 2014. Then it didn't update for several weeks - and when it resumed on 30 July 2014, the update frequency became daily. MZMcBride (talk · contribs) is the person to ask as to why it was changed. --Redrose64 🌹 (talk) 19:18, 20 September 2019 (UTC)
I'm not interested why it was changed. I am only keen to know what purpose it serves to provide a daily running total of articles created. Leaky caldron (talk) 19:34, 20 September 2019 (UTC)
As I noted, that's one for MZMcBride - presumably somebody said "please make it daily". --Redrose64 🌹 (talk) 19:38, 20 September 2019 (UTC)
"Nothing fancy with this, same as before with a basic count of number of pages and number of redirects" -- Is this actually a basic problem eg. a simple API call that responds quickly without using too many resources. And why did the previous bot stop working, was it too hard to keep up and running and too many complications. -- GreenC 14:49, 20 September 2019 (UTC)
According to the old page it worked thusly: "For every page currently in the article namespace on the English Wikipedia, the author of the earliest revision of that page is queried. This information is aggregated by author name". From recent experience (yesterday), querying every page on enwiki (nearing 6 million) with a low-byte-count API call takes upwards of 15 days to complete. It could be done faster by requesting more than 1 article per query. Still, this is a pretty significant amount and probably shouldn't be done more than once a month or less to be kind on resources, not done at max speed if possible. It could also run on the Toolforge Grid so it doesn't use up bandwidth sending data to a remote location. -- GreenC 15:05, 20 September 2019 (UTC)
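The per-page lookup described above can be sketched without the network call. This uses the MediaWiki API's prop=revisions with rvdir=newer and rvlimit=1 to fetch the earliest revision's author; the parameter handling and sample response below are illustrative, and in practice the request would go to https://en.wikipedia.org/w/api.php:

```python
# Build the query for one article's earliest (creating) revision.
def creator_query_params(title):
    return {
        "action": "query",
        "format": "json",
        "prop": "revisions",
        "titles": title,
        "rvlimit": 1,
        "rvdir": "newer",        # oldest revision first
        "rvprop": "user|timestamp",
    }

# Pull the creating author out of the API's JSON response.
def parse_creator(response):
    page = next(iter(response["query"]["pages"].values()))
    return page["revisions"][0]["user"]

# Illustrative response shape (not real data):
sample = {"query": {"pages": {"123": {"revisions": [
    {"user": "ExampleUser", "timestamp": "2004-01-01T00:00:00Z"}]}}}}
print(parse_creator(sample))  # ExampleUser
```

Aggregating these results per author, with the results cached between runs, is the approach discussed in the rest of this thread.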
We may cache the result to avoid querying again, since the creator of an article won't change. --Kanashimi (talk) 00:11, 21 September 2019 (UTC)
That would miss the deletion of older articles and redirects, wouldn't it? bd2412 T 00:28, 21 September 2019 (UTC)
Perhaps we can cache deleted articles (query only articles deleted after the latest operation) as well. --Kanashimi (talk) 01:49, 21 September 2019 (UTC)
BTW my mistake, the creator requests for the full list wouldn't take 15 days but probably only a few hours, because it can retrieve up to 5000 per query (the other project I was working on had to go 1 at a time, which wouldn't be the case here). -- GreenC 00:56, 21 September 2019 (UTC)
Thanks for everyone's input and comments. I can't comment on performance/resource issues with compiling the info. All that I know is that it was updated daily, but that ceased two years ago. If daily is not possible, then a weekly or even a monthly update would be better than the current situation. Thanks again. Lugnuts Fire Walk with Me 11:30, 21 September 2019 (UTC)
I agree with that. bd2412 T 13:23, 21 September 2019 (UTC)

Coding... -- GreenC 14:30, 21 September 2019 (UTC)

@GreenC: Well... I have already started writing the code... I will stop coding; thank you for the fast response. However, I think using the database would be a better idea than using the API. Just an idea. --Kanashimi (talk) 21:51, 21 September 2019 (UTC)
@Kanashimi: I am about 80% done, because a lot of the code is repurposed from other projects, which is why I am using the API. But if you would like to continue coding, please go ahead. If you run into trouble or decide not to pursue it, let me know and I will pick it up. There is plenty of other work for me to do on other projects. Also, I was not worrying about deleted articles; it seems impractical to track deleted articles from the beginning of Wikipedia. Also, so many of the articles started out as short stubs or redirects, and someone else later made them into longer articles, so it's unclear what the list is really showing. Not sure how to address it. -- GreenC 22:21, 21 September 2019 (UTC)
Thanks a lot for your comments. I also have other coding work and will not finish this task soon. I think you will be faster than me. About the problem of article quality and author contribution, we could count the words of the article. But this would significantly increase the burden of the query, and it is not an absolute standard. --Kanashimi (talk) 22:45, 21 September 2019 (UTC)

Oh yeah, people used to get upset that the article count report included redirects and bots in the rankings. Good times. --MZMcBride (talk) 01:10, 22 September 2019 (UTC)

@Kanashimi and Lugnuts: The program (pgcount, i.e. page count) is running. It will take a while (a week?) because the first run is building a cache. I thought it might go faster and retrieve 5000 (or 500) at a time, but the API doesn't support multiple articles for revision information, only 1 article per request. Will post again when it starts posting the tables. Future cache-based runs should go much faster. Have not decided how often it will run, depending on how long the cache runs take. It might actually run faster and use fewer resources to run daily, since the diffs are smaller and the cache hits higher; will see. -- GreenC 17:02, 22 September 2019 (UTC)

Superb - thanks for your work and the update. Lugnuts Fire Walk with Me 17:07, 22 September 2019 (UTC)
Good. If I have time, I may try a database version. I guess it may be faster than the API version and may not even need a cache. --Kanashimi (talk) 22:40, 22 September 2019 (UTC)
@Kanashimi: Being discussed at WP:SQLREQ. -- GreenC 20:59, 23 September 2019 (UTC)
@GreenC: - Do you have a progress update? Thanks. Lugnuts Fire Walk with Me 19:24, 29 September 2019 (UTC)
@Lugnuts: it finished creating the cache today, took about 7 days. It will rebuild the cache every so often to account for username renames. Normal runs will finish quicker, but it won't run every 24hrs. It will not track redirects for now for a couple of reasons. -- GreenC 21:14, 29 September 2019 (UTC)
Excellent work - thank you! Lugnuts Fire Walk with Me 06:54, 30 September 2019 (UTC)

Done -- GreenC 02:02, 22 October 2019 (UTC)

Remove Template:Expert needed when placed without reason

Needs wider discussion

From {{Expert needed}} documentation: Add |talk=, |reason=, or both. Uses of this template with neither may be removed without further consideration.

These maintenance tags appear to have been fairly widely bombed onto articles without adequate explanation. Would be fine if it only attended to tags older than 3 months.

Also remove multiple issues if only one remaining maintenance template. –xenotalk 18:03, 30 September 2019 (UTC)
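Detecting removable instances might be sketched as follows. This is a rough illustration only; robust template parsing (nested templates, template redirects, spacing variants) would need a real parser such as mwparserfromhell:

```python
import re

# Find {{Expert needed}} transclusions that have neither |talk= nor
# |reason=. Illustrative sketch; [^{}]* deliberately refuses to match
# across nested templates rather than handle them.
TAG = re.compile(r"\{\{\s*Expert needed\s*([^{}]*)\}\}", re.IGNORECASE)

def removable(wikitext: str):
    """Yield each bare {{Expert needed}} transclusion in the wikitext."""
    for m in TAG.finditer(wikitext):
        params = m.group(1)
        if "talk=" not in params and "reason=" not in params:
            yield m.group(0)

print(list(removable("{{Expert needed|date=May 2015}}")))
# ['{{Expert needed|date=May 2015}}']
```

The |date= parameter found here would also let a bot apply the suggested three-month age threshold before removing anything.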

Where is the consensus/discussion about mass removal? Headbomb {t · c · p · b} 18:37, 30 September 2019 (UTC)
Could you suggest a good venue for that? WT:Maintenance maybe? –xenotalk 18:52, 30 September 2019 (UTC)
Anywhere that's well advertised. An RFC at WT:Maintenance would work. Cross-posted to WP:VPR and Template talk:Expert needed if it's there. But an RFC directly at Template talk:Expert needed with notices at the other places seem more natural. Headbomb {t · c · p · b} 19:03, 30 September 2019 (UTC)
Yes, that works. Will be back, thank you for the suggestion. –xenotalk 19:29, 30 September 2019 (UTC)

User:xeno: there are fewer than 5,000 instances of {{Expert needed}}. I ran a quick script, and the majority have no |reason= or |talk=. And those that do are mostly of little value, for example

  • "Article needs help"
  • "horribly disjointed"
  • "Expert needed on early 20th century Russian theater"
  • "Expert on science needed"
  • "Needs expansion and more details"
  • "Ambiguous and rambling"
  • "There needs to be more information about this person"

I don't think removals are a good idea. It would wipe out most of them, and those that remain would mostly be, like the above, of little value. Whatever the docs say, in practice the reason/talk field is optional and casual. -- GreenC 01:55, 22 October 2019 (UTC)

@Xeno: again in case the split-line post doesn't trigger a notification. -- GreenC 02:01, 22 October 2019 (UTC)
GreenC: Thanks for the stats. In light of this, I would argue they're almost all of little value then, but perhaps this should be taken up at TfD. Are you able to provide any insight into whether anyone is actually addressing these tags? What is the average age of the tag, for example? –xenotalk

Using a sample of 3500 cases:

Count of {{expert needed}} by year
  • 648 2009
  • 529 2008
  • 468 2011
  • 320 2010
  • 244 2015
  • 241 2018
  • 208 2016
  • 206 2017
  • 206 2012
  • 175 2013
  • 151 2014
  • 106 2019
  • 28 2007

This is a little off, because this example shows a date of February 2009, when a bot added a date, but the template itself was added in January 2006; it took 3 years for a bot to find and date it. The template was created in 2006, so the missing entries for 2006-2007 are probably dated in 2008 and 2009, when the date bots ran. Most of them have dates right now, so the bots seem to be keeping up more recently (80 out of 3500 are missing dates). There is no [easy] way to count how many were resolved/removed; it would require downloading full dumps every month going back to 2006 and grepping through them, a major project requiring significant resources. @Xeno: -- GreenC 16:18, 22 October 2019 (UTC)

Replacement for User:UTRSBot[edit]

UTRSBot has been down for some time, and its maintainer is inactive. A discussion at WP:BOTN seemed to indicate the code was hosted at github and it maybe wouldn't be that hard to replace it. I think this is very important as the bot provided a level of transparency for the UTRS process that is now entirely absent. Beeblebrox (talk) 21:27, 4 October 2019 (UTC)

Beeblebrox, I've got some ideas on this - I'm hoping to start work on it tomorrow. SQLQuery me! 02:40, 5 October 2019 (UTC)
Awesome! Beeblebrox (talk) 17:07, 5 October 2019 (UTC)
TParis responded, btw. --Izno (talk) 17:29, 5 October 2019 (UTC)

WikiProject NRHP project tracking tables and maps[edit]

File:NRHP Articled Counties.svg, one of the four maps displayed at wp:NRHPPROGRESS, which would be updated

The WikiProject NRHP has an extensive javascript-based system supporting wp:NRHPPROGRESS, which is a Wikipedia-space work status report with maps. There are related programs run occasionally by User:Magicpiano and/or User:TheCatalyst31 (using the related User:NationalRegisterBot), but the main update is one program which those two editors plus myself can run as often as we wish, using our own computers. I personally would run it every night if I could, but the main program takes a long time to run, at least on my own computers during normal editing hours (and then often fails to complete, for me). It would then cost me a further 10 minutes or so of focused work (if I was well-practiced and trying hard) to regenerate the four associated maps, whose production is largely but not completely automated, so I don't usually do that. There are numerous editors, however, who would be interested to see regular (daily, I think) complete reporting, for example to see the impact of their own article creation or improvement efforts reflected in the maps. Would any regular bot editor be willing to set up a more centralized system, with update runs scheduled on some server late every night? Some info about how a user like me can run the main updating script and generate new maps is written out at Wikipedia:WikiProject National Register of Historic Places/Progress/Instructions and/or reflected in discussion at wt:NRHPPROGRESS.

Also, right now the updating program is not running for me or for TheCatalyst31, apparently due to an edit-tokens change which is affecting a lot of user scripts right now. See wt:NRHP#Is anyone else having trouble with the scripts lately? and Wikipedia:Interface administrators' noticeboard#editToken --> csrfToken migration. Even if the edit token issue is fixed, there still remain more than 3,650 minutes of editor time to be saved per year, and other advantages, which could be achieved by having a centralized run set up.

The overall system has other functions, too, by the way, supported by further scripts, and also could need to be developed more in some ways now and in the future. It would be great if one or a few regular bot editors would be willing to consider setting up a daily run plus be willing to assist on some refinements. :) --Doncram (talk) 15:41, 20 October 2019 (UTC)

Not saying yes, but could you break down, functionally what this will do? Hasteur (talk) 17:16, 20 October 2019 (UTC)
A daily update would run the javascript which I can usually run by just hitting the "Update Statistics" button on that page (which displays for me because my vector.js is set up to recognize it, i.e. it has "importScript('User:Magicpiano/NRBot/UpdateNRHPProgress.js');"). I used to run a version of that located elsewhere; Magicpiano took over operating the script several years ago and has maintained that version. Last time I ran it successfully, it implemented this diff on the page, which updated various numbers. It consults all the separate county- and city-level NRHP list-articles in mainspace to do this, and its result is to update the Wikipedia-space work status page (not to make any change in mainspace). Also, a daily update would generate four new maps to replace the ones at Commons, which display on that page. To do that, I run the javascript which partly generates new map images, by hitting the other button, "Generate SVG Output", on that page (running 'User:Magicpiano/NRBot/NRHPmap.js'). That script updates the Wikipedia:WikiProject National Register of Historic Places/Progress/SVG page, which has four data sections for the four maps. I would create complete new map files by copy-pasting those data sections into copies of the SVG map files, then upload those to Commons to replace the files there. E.g. one is File:NRHP Articled Counties.svg, whose edit history reflects hundreds of updates, all done manually. Hopefully a bot could generate complete new files, i.e. concatenate the starting part of a file, plus the updated data part, plus the closing part, and post the result to Commons. The process to do this manually is described at Wikipedia:WikiProject National Register of Historic Places/Progress/Instructions#Map update process.
I have posted this request, but Magicpiano would have to be on board too. Rarely, but sometimes, the script fails, for example if the system of NRHP county list-articles in mainspace has been altered in some significant way, and the script then needs to be amended by them. They should be contacted and be willing to be involved if it does fail. I would hope centralization and auto-running of the updates would help, and save time for, Magicpiano, who does many other things for NRHP coverage development. Hopefully they don't mind my initiating this request. --Doncram (talk) 18:24, 20 October 2019 (UTC)
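The "concatenate starting part + updated data part + closing part, then post to Commons" step described above is mechanically simple. A minimal sketch under assumed file names (the function and paths are hypothetical illustrations, not part of the existing scripts; the actual Commons upload would additionally need an approved bot account, e.g. via pywikibot):

```python
def assemble_svg(header_path, data_fragment, footer_path, out_path):
    """Concatenate a boilerplate SVG header, the generated data
    section, and a closing footer into one complete SVG file.
    All file names here are hypothetical placeholders."""
    with open(header_path, encoding="utf-8") as f:
        header = f.read()
    with open(footer_path, encoding="utf-8") as f:
        footer = f.read()
    svg = header + data_fragment + footer
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(svg)
    return svg
```

This would replace the manual copy-paste-and-upload cycle that currently takes 10-15 minutes per run for all four maps.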
That really doesn't tell me anything useful. Here is how I think the script builds the page: e.g. I see the National totals are made up of State totals, and State totals are made up of "subdivision/county" totals. It appears that the subdivision totals come from the linked page/section, and from there, for each line: Illus is generated from the count of non-empty Image rows; % Illus is the Illus count divided by the Total count; Art is 1 if there is a non-redlink to the individual item; % Art is Art divided by Total. Stubs is 1 if there is at least 1 stub template on the article. NRIS is counted as 1 if {{NRIS-only}} is on the page. Start+ is 1 if the WP NRHP talk page banner lists a class other than redirect, stub, unassessed, or blank; % Start+ is Start+ divided by Total. Unass is the total of pages that have the WP NRHP talk page banner with an unassessed or blank class. Not sure what Untag evaluates as, though I think it's not having the WP NRHP banner. Net quality appears to be

rounded to the nearest tenth. If I have that right, it's relatively straightforward to build this. I want to first work on automating the table generation. Once we nail that down, we look at populating the SVG template boxes at Wikipedia:WikiProject National Register of Historic Places/Progress/SVG, and finally we can look at getting a bot provisioned at Commons to extract the data from enWiki and move it over as SVG content to Commons. Hasteur (talk) 19:09, 20 October 2019 (UTC)
S- StartPlus, St - Stubs, Un - Unassessed, Ua - Untagged, NRIS - "NRIS Only" citing, Ill - Illustrated Hasteur (talk) 19:11, 20 October 2019 (UTC)
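If that reading of the table is right, the derived percentage columns are plain ratios rounded to one decimal place. A minimal illustrative sketch (column names are taken from the table discussed above; the function names are hypothetical, not from the actual script):

```python
def percent(count, total):
    """count/total as a percentage rounded to one decimal place,
    guarding against empty counties (total == 0)."""
    if total == 0:
        return 0.0
    return round(100.0 * count / total, 1)

def row_stats(illustrated, articled, start_plus, total):
    """Derived percentage columns for one county row; the column
    names are assumed from the progress table layout."""
    return {
        "% Illus": percent(illustrated, total),
        "% Art": percent(articled, total),
        "% Start+": percent(start_plus, total),
    }
```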
I think your understanding is correct. About Untag, that is indeed about not having the WP NRHP banner. Perhaps the only thing you don't comment on explicitly in the calculations is incorporating the adjustments for duplicates in the totals. Duplicates occur when an NRHP site overlaps into multiple county, city, or state areas. The information about those is reflected in explicit rows, e.g. within the Alabama section there are four rows for Jefferson County: one for the city of Birmingham's table, one for the rest of Jefferson County, one for duplicates spanning the city-county border, and one for the total. In a daily update, the bot also goes to, I guess, Wikipedia:WikiProject National Register of Historic Places/Progress/Duplicates to look up the appropriate values, e.g. for Jefferson County, AL. The duplicates' row information there doesn't change often and never changes by much, so the information is only occasionally updated by a run of User:NationalRegisterBot. (Frankly, in my personal opinion, all usage of "duplicates" rows and all related programming could perhaps be dropped. Then the NRHPPROGRESS report's state and national totals would be slightly over- or under-stated, which IMHO would not be a problem. One could still discern progress being made, and one could zero in wherever of interest. It happens to be the case that detailed treatment of duplicates was built into the system, though.)
It's great that you see how to do all this, in principle, including having a Commons bot run to produce complete map files. That alone would be a great advance, IMHO.
Would you be planning to re-program this into something different than javascript, or would you essentially move the javascript scripts over to somewhere else? I am just wondering whether Magicpiano or anyone else familiar with the javascript (not me) would still be able to have a parallel version of the scripts, which they could run independently on occasion, like if they were testing out some change to be made. I am not sure if this would matter to them, but I think they would want to know if this is being changed to a different programming language that they are not familiar with. They could well have a view about switching over or not, depending. --Doncram (talk) 00:52, 21 October 2019 (UTC)
BEGIN probably unnecessary stuff: I hope the following won't be too much; I think you can probably ignore all of this passage. But I am wondering if it would be necessary or helpful for you to understand the other programming parts of the system. FYI those include:
  • 1) stuff which supports the "yellow boxes display" on each of the NRHP county and city list-article pages, for a number of editors who have it enabled in their account (maybe 10 or 15 editors, including User:Bubba73; I am not sure who else). By clicking within the yellow box we get to see reduced tables, where only the relevant rows are displayed. This enables us to see which specific articles/rows for NRHP listings are "NRIS-only", Stubs, Unassessed, or Untagged. The NRHPPROGRESS page informs us how many of each such type are included in any given county list-article. I don't think the daily updating bot does anything that affects the "yellow boxes display", but it is an important part of the existing system, because it allows us to find our way to articles that need to be improved.
  • 2) occasional runs of NationalRegisterBot which go through all NRHP articles and determine whether "NRIS-only" tag needs to be added or dropped for each article, and make the change where needed
  • 3) other aspects of User:NationalRegisterBot's changes which can be seen in its contributions history. These seem to be involved in updating the duplicates information, and include making changes to:
These all seem to be helpful only indirectly, and I can't point to exactly how all these results affect the NRHPPROGRESS page. (Much of this is about addressing duplicates exactly, though; as I said above, IMHO all usage of "duplicates" rows and the related programming could perhaps be dropped without real harm.)
  • 4) a "Wiki-blame" type script reporting upon editors who created articles having NRIS-only tag. A button to run that script is or was available to me at the User:NationalRegisterBot/NRISOnly/All page. It generated a summary report tallying the numbers of "NRIS-only" articles created by each of about 300 editors. And it generated a list, for each editor, of the specific NRIS-only articles they had created, organized by state. It didn't write to any Wikipedia page, instead it generated into a new window that would be opened. See User:Doncram/SubStubsByOriginalAuthorByState for copies of those results copy-pasted by me to that page, and then re-organized and edited by me, from two or more occasions when I ran it. This could be helpful for someone like me trying to take responsibility to go back and expand all the pages for which I was responsible. I think it was used only by me that way as I brought my own number down from 1500 or some such number to zero at one point; I don't think any other editor was concerned or took on such a personal campaign. (I think i did use its results also when, in combination with some other editors, I helped run campaigns to reduce a few given states' numbers down to zero, but such could probably be supported by the main NRHPPROGRESS report page instead.) This script has effectively been broken for a few years: it does not run any longer, or it might seem to run but then does not generate its results to a new window or anywhere else. Maybe/probably this particular Wiki-blame report does not need to be revived, because the numbers of "NRIS-only" articles have dropped dramatically; the total is now just 1,614 per the wp:NRHPPROGRESS page, i.e. the perceived problem has been largely solved. I mention it in part because a similar "wiki-credit" type report might be useful to run, on the subject of geographical coordinates updating, which I could talk about elsewhere, which would be an enhancement to the system.
  • 5) Okay, let me quickly suggest something about geographical coordinates updating. This is not yet part of the system, but could be done similarly to how NRIS-only counting/tagging is done. Accuracy of geographic coordinates is a long-running issue because original NRIS source coordinates were often bad, and there is no tracking system for which NRHP sites' coordinates have been checked and corrected/improved if necessary. A few NRHP editors have been using a "coordsource=" field in NRHP list-article rows and a "source:" field within {{coord}} templates at NRHP articles, to indicate whether the coordinates shown are known to be from the original, suspect NRIS2013a source, or whether they have been changed/improved, and if so, perhaps by whom. I and a few editors have been putting our usernames in as "sources", so that we are accountable for the coords we have changed. It would be nice if NRHPPROGRESS could have a column for "coords to check" akin to "NRIS-only", with corresponding machinery to make that possible, and a nice "wiki-credit" type report generated by source (usually editor name) and by state. And a different type of bot could be created to cross-check coordinates reported on county list-article pages against coordinates reported in articles, determine, where they differ, the sources of each, and in some cases implement improvements. For example, if one specific responsible editor fixed/improved the county list-article's coordinates for a site, while the coordinates showing in the article are known to be inferior "NRIS2013a" ones, then change the latter. This has not been done or requested yet. But if the reporting system were being reprogrammed, maybe it would not be hard to build this reporting into the NRHPPROGRESS page.
  • 6) Another topic which might be addressed in the future is to accommodate ratings of photos, i.e. whether the main photo given for an NRHP site should really be improved or not, and whether specific additional photos are wanted or not. The latter, especially, could potentially help in the annual Wiki Loves Monuments campaign. There's no one calling for this yet besides myself, though. I raised the possibility of a Wiki Loves Monuments adjustment with User:Multichill, who handles most related programming, but it didn't go further.
  • 7) FYI, potentially this kind of tracking and maps could be used for other nations' historic sites, or for tracking other topics. I personally have thought that creating a parallel system for Australia's historic sites could be helpful, but I haven't properly raised that with editors there, and I don't have the means to implement a new system from scratch myself. I think it all could be generalized, though.
END probably unnecessary stuff. I am probably missing some other aspects of the system to be aware of. Again, I don't think any of this needs to be understood to implement the request as stated above (i.e. just to do the daily update and to generate the maps). --Doncram (talk) 02:01, 21 October 2019 (UTC)
Not to put words in the requester's mouth, but what he seems to want is to have the updating of WP:NRHPPROGRESS (both data and maps) to be more automated than it is. It would be an error to describe the collection of scripts used by editors in the NRHP project as a "system" -- they are just a collection of scripts, most of which are unrelated to each other.
The script that updates the data at WP:NRHPPROGRESS is User:Magicpiano/NRBot/UpdateNRHPProgress.js. I don't know why it takes Doncram so long to run it; my runs rarely take more than five minutes on two-year-old hardware. The bulk of this script was written by Dudemanfellabra, who withdrew from Wikipedia a few years ago. I have maintained it (with minimal changes to its data-collection functions) since then. Some of those changes have been mandated by changes in the wikimedia ecosystem, such as the recent csrfToken business, while others have been focused on improved error handling and logging, which the code I inherited did poorly.
Related to this script is the SVG map generator, User:Magicpiano/NRBot/NRHPmap.js. This script parses the progress page and produces fragments of SVG, which need to be edited on the editor's computer to produce the actual image files. This process is documented at Wikipedia:WikiProject National Register of Historic Places/Progress/Instructions, and takes me about 10-15 minutes to do all four maps. This process could be more fully automated, since it basically involves pasting the created fragments into a boilerplate surround, and then uploading the resulting SVG to Commons.
The duplicates data is produced by NationalRegisterBot, which is run irregularly by me under User:NationalRegisterBot. (It does not need to run frequently, since the two things it does, duplicate gathering and NRIS-only tagging, don't actually change a great deal in any given month.)
The principal issue militating against fully automating some of these processes is a somewhat fragile dependence on the state of the underlying pages. The progress update script parses NRHP list pages, NRHP article pages, and NRHP article talk pages in its data-gathering process, and changes to them can break the update script. For example, it once broke when some "National Register of Historic Places listings in X" pages were moved (with redirects), because it made fragile assumptions (since corrected, I believe) about the relationships between page names and NRHP locations. These sorts of issues lurk in the code base. Some types of changes, notably decisions around splitting long NRHP lists, have to be reflected in the content of WP:NRHPPROGRESS before the script is run, something editors who execute such moves may be unaware of. The script runner is of course unaware that the changes occurred, and the script then fails, requiring a diagnose-and-fix process. (This fragility also affects NationalRegisterBot.)
That said, it may be possible to re-engineer what these two scripts do to be more resilient in the face of those sorts of changes, and of changes editors make to the pages it reads that break its parser (which has also happened on more than one occasion). This work is of a scope I am unwilling to tackle. I am willing to impart what I know about how the existing scripts work to someone willing to take on such a task. Magic♪piano 03:01, 21 October 2019 (UTC)
I don't disagree with any of that. About run-times for me: I ran the NRHPPROGRESS update bot at about 3:00 am U.S. eastern time last night, when it should have been fast, and it reported 4 min 52 sec elapsed time before starting to save the result (at least it finished; thanks for fixing the edit token issue!). It would be nice if the bot recorded the elapsed time in the edit summary it writes, BTW. In recent months at peak times it has taken 20 minutes or more, and was clearly stalled, before I would kill the process and try again later, and maybe have to try again even later. Sometimes that was frustrating; it seemed I had to keep the cursor in the window and not do anything else, or else it would run even slower. Other times it could take as little as about a minute. If an update bot could be scheduled to run late at night and be pretty much guaranteed to complete, runtime length wouldn't matter at all.
I get that some of the scripts are not central to any system, but the duplicates script is central. I understand it records its resulting data table into hidden comments or section(s) within the page(s) it writes, which the daily update bot goes to consult. The overall system would break down if this javascript update bot did not run occasionally, so if the others are moved over to Python, this probably should be too. And/or, it would seem good to me if the data table it generates could be written out visibly, and the update bot could draw from the visible data. That might allow editors to make manual edits there (relatively few are ever required), so running the duplicates script would be more optional. At least (all?) results of the main update script and the SVG script are visible. --Doncram (talk) 20:58, 21 October 2019 (UTC)
@Doncram: "How do we eat a hippopotamus? One bite at a time." Let's first focus on getting the data table updates automated. Then we can focus on the SVG snippets, and then on the Commons bot to update the images. As to re-engineering, I'm planning on re-implementing this in Python3 with the Pywikibot framework. My goal is to develop the script that does the data table updates by drilling down through the data in a structured way. I can envision adding messaging so that if one of the subunit pages fails to parse correctly, the bot reports to the NRHP talk page that the subunit page is malformed and needs review. This will help the project stay on top of the indexes. I'm a python programmer by trade and my goal is to leave enough comments and data behind so that if future changes are needed when I shuffle off my mortal coil (or others want to make improvements) we won't have to unwind the whole page again. By picking pywikibot, I'll also be able to run it on the Wikitech Toolforge compute cluster, which will make it significantly faster to run these numbers instead of streaming data down to your local desktop and then pushing data back to Wikipedia. Hasteur (talk) 03:32, 21 October 2019 (UTC)
@Magicpiano: See above, but my goal is to reach out behind the scenes from Toolforge so that we don't have as much data transiting the wire. Also, if we fail to parse one of the indexes, we can add a notice (either at a purpose-built landing box like User:HasteurBot/Notices or at the NRHP project talk page) to flag down attention. And if one of the index pages loses a significant percentage of its items, the bot could send an additional notice that the membership appears to have gone down quite a bit and the page might have been split unsuccessfully. Hasteur (talk) 03:38, 21 October 2019 (UTC)
Those types of messages sound good; perhaps they could best be sent to the Talk page of the NRHPPROGRESS report, rather than the main Talk page of the WikiProject NRHP page. Hasteur, about going forward this way: I appreciate your willingness, and I guess I hope that you will, though not if Magicpiano has reservations. I worry somewhat about future programmers' willingness to modify/develop the system in Python, but the same worry applies to Javascript. I have little expectation that I could ever contribute programming tweaks in the future, but I vaguely think I'd have a tad more chance in Python (more in the freeware world, right?) than Javascript. User:TheCatalyst31, could you comment? --Doncram (talk) 20:58, 21 October 2019 (UTC)
I have experience with both Python and JavaScript, so that's not an issue for me, though I'm not sure how much time I'd have to help anyway; in practice, I haven't actually done much work with the bot. (I'm also a programmer by trade, but in my case that usually means I want to do things other than programming outside of work.) TheCatalyst31 ReactionCreation 00:08, 22 October 2019 (UTC)