Wikipedia:Bot requests


This is a page for requesting tasks to be done by bots per the bot policy. This is an appropriate place to put ideas for uncontroversial bot tasks, to get early feedback on ideas for bot tasks (controversial or not), and to seek bot operators for bot tasks. Consensus-building discussions requiring large community input (such as a request for comment) should normally be held at WP:VPPROP or other relevant pages (such as a WikiProject's talk page).

You can check the "Commonly Requested Bots" box above to see if a suitable bot already exists for the task you have in mind. If you have a question about a particular bot, contact the bot operator directly via their talk page or the bot's talk page. If a bot is acting improperly, follow the guidance outlined in WP:BOTISSUE. For broader issues and general discussion about bots, see the bot noticeboard.

Before making a request, please see the list of frequently denied bots: tasks that are denied either because they are too complicated to program or because they lack consensus from the Wikipedia community. If you are requesting that a template (such as a WikiProject banner) be added to all pages in a particular category, please be careful to check the category tree for any unwanted subcategories. It is best to give a complete list of categories that should be worked through individually, rather than one category to be analyzed recursively (see example difference).

Note to bot operators: The {{BOTREQ}} template can be used to give common responses, and make it easier to keep track of the task's current status. If you complete a request, note that you did so with {{BOTREQ|done}}, and archive the request after a few days (WP:1CA is useful here).


Please add your bot requests to the bottom of this page.


Take over GAN functions from Legobot

Legobot is an enormously useful bot that performs some critical functions for GAN (among other things). Legoktm, the operator, is no longer able to respond to feature requests, and is not very active; they've asked in the past if someone would be willing to take over the code. I gather from that link that the code is PHP; see here [1]. There would be a lot of grateful people at GAN if we could start addressing a couple of the feature requests, and if we had an operator who was able to spend more time on the bot. This is not to criticize Legoktm at all -- without their work, GAN could not function; Legobot is a core part of its operation.

I left a note on Legoktm's talk page asking if they would mind a request here for a new operator, and Redrose64 responded there with a link to the note I posted above, so I think it's clear they'd be glad for someone else to pick this up. Any takers? Mike Christie (talk - contribs - library) 23:10, 6 February 2018 (UTC)

I've heard from Legoktm and they would indeed be glad to have someone else take this over. If you're capable in PHP, this is your chance to operate a bot that's critical to a very active community. Mike Christie (talk - contribs - library) 00:21, 8 February 2018 (UTC)
I would like to comment that it would be good to expand the functionality of the bot for increased automation, like automatically adding to the GA lists. Perhaps it would be better to rewrite the bot in a different language? I think Legoktm has tried to get people to take over the PHP for a while with no success. Kees08 (Talk) 04:44, 8 February 2018 (UTC)
The problem with adding to the GA lists is knowing which one. There is no indication on the GAN as to where. All we have is the topic. Even the humans have trouble with this. Hawkeye7 (discuss) 20:20, 16 February 2018 (UTC)
To correct for the past, we could add a parameter to the GA template for the 'subtopic' or whatever we want to call that grouping. A bot could go through the current listing and then add that parameter to the GA template. Then, when nominating, that could be in the template, and the bot could carry that through all the way to automatically adding it to the GA page at the end. Kees08 (Talk) 20:23, 16 February 2018 (UTC)
Nominators would need to know those tiny divisions within the subtopics; as it's not something we have on the WT:GAN page, I doubt most are even aware of the sub-subtopics. Even regular subtopics are sometimes too much for nominators, who end up leaving that field blank when creating their nominations. BlueMoonset (talk) 22:15, 26 February 2018 (UTC)

@Hawkeye7: For what it is worth, due to your bot's interactions with FAC, I think it would be best if you took over the GA bot as well. I think at this point it is better to just write a new bot than salvage the old one; no one seems to want to work on salvaging it. Kees08 (Talk) 21:59, 26 February 2018 (UTC)

We'd need to come up with a full list of functionality for whoever takes this on, not only what we have now but what we're looking for and where the border conditions are. BlueMoonset (talk) 22:15, 26 February 2018 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── I might be interested in lending a hand. A features list and functionality details (as mentioned by BlueMoonset) would be nice to affirm that decision though. I shall actively watch this thread. --TheSandDoctor (talk) 21:30, 11 March 2018 (UTC)

Okay, I will attempt to list the features, please modify as needed (a rough sketch of the first item follows the list):

  • Place notifications on the nominator's talk page when their nomination is onreview diff, onhold diff, passed diff, or failed
  • Update GAN page when status of a review changes (new, on hold, on review, passed, failed, also number of reviews editors have performed) diff
  • Update the stats page (related to the last bullet point, this is where the stats are stored) diff
  • Transcludes GA review on article talk page diff
  • Adds GA icon to articles that pass diff
  • Adds the oldid parameter to the GA template diff
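A rough sketch of the first item (nominator notifications), assuming pywikibot; the message wording, section heading, and function name are illustrative only, not Legobot's actual code:

<syntaxhighlight lang="python">
import pywikibot

SITE = pywikibot.Site('en', 'wikipedia')

# Hypothetical per-status messages, loosely mirroring the diffs listed above.
MESSAGES = {
    'onreview': 'The article [[{a}]] you nominated for GA is now under review.',
    'onhold': 'The GA review of [[{a}]] has been put on hold.',
    'passed': 'Congratulations! [[{a}]] has passed its GA review.',
    'failed': 'Unfortunately, [[{a}]] did not pass its GA review.',
}

def notify_nominator(nominator, article, status):
    """Append a status notice to the nominator's talk page."""
    talk = pywikibot.Page(SITE, 'User talk:' + nominator)
    notice = MESSAGES[status].format(a=article)
    talk.text += '\n\n== GA nomination update ==\n' + notice + ' ~~~~'
    talk.save(summary='GA nomination of [[%s]]: %s' % (article, status))
</syntaxhighlight>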

@BlueMoonset: Are you aware of other functions? Looking through the bot's edit history and going off what I know of the bot, this is what I came up with. Kees08 (Talk) 22:10, 11 March 2018 (UTC)

Thanks Kees08. Does anyone know if it would be possible to take a look at the database structure? --TheSandDoctor (talk) 22:28, 11 March 2018 (UTC)
@Legoktm: Are you able to answer their question? Thanks! Kees08 (Talk) 23:43, 11 March 2018 (UTC)
TheSandDoctor, it's great that you're interested in this. Kees08, the second item (about updating the GAN page) is much broader. I don't know whether the bot simply updates the GAN page or generates/recreates the contents of all sections on it. It's basically dealing with all the GA nominee templates out there (which indicate which articles are currently nominated and belong on the GAN page), and with the changes to that page. If an entry wasn't on the page last time but is this time, then it's considered new; if it was on last time but a review page has appeared for it, then it's considered under review and the review page is parsed for reviewer information (but if the GA nominee template instead says "status=onhold", then it's noted as being on hold)... there's a lot involved, including cross-checking, and any field in the GA nominee template, including page number, subtopic, status, and note, can change at any time. If the GA nominee template has disappeared and a GA template is there for that same "page" number, then it has passed; if a FailedGA is there for that same "page" number, then it has failed (but the current bot's code doesn't check this properly, so any FailedGA template on the talk page results in the "failed" message being sent to the nominator even if the nomination was just passed with a new GA template showing). Sometimes review pages disappear when they were created illegally or by mistake and are speedy deleted, and the bot notices their absence and updates the GAN page accordingly, so it's a comprehensive check each time the bot runs (currently every 20 minutes). If the bot doesn't know how to characterize the change it has found, it appears under an edit summary of "Maintenance": status changes to 2ndopinion go here, as do passes and failures where there was something wrong with the article talk page according to its lights. For example, it views with suspicion any talk page of a nomination under review that doesn't have a transcluded review on it, so it doesn't send out pass or fail messages for them (and maybe not even hold messages; I've never checked that).
There's a difference here between features and functionality. I think the features (with the exception of the 2ndopinion status and the display of anything in the "notes" field of GA nominee) have been listed here. The actual functions—how it needs to work and what it needs to check—are harder to break down. One thing that was mentioned above is the use of subtopics: we have been unable to add new subtopics for several years now, so new subtopics on the GA page are not yet available on the GAN page. I'm not sure how the bot gets its list of subtopics—I've found more than one possible page where they could be read from, but there may be a database for subtopics and the topics they come under that actually controls them, with the pages I've found being a place for some templates, like GA, FailedGA, and Article history, to figure out which subtopics translate to which topics, and which subtopics are legitimate. GA nominee templates that have invalid subtopics or missing status or note fields (or other glitches) can cause the bot to try every 20 minutes to enter or update a nomination/review and fail to do so; there are times when a transaction is listed dozens of times, one bot run after another, as the GAN edit summary because it needs to happen, but it ultimately doesn't (until someone sees the problem and fixes the problematic GA nominee template or GA review page). I'm hoping any new bot will be smarter about how to handle these (and many other) situations, and maybe there will be an accessible error log to aid us in determining what's wrong. BlueMoonset (talk) 00:55, 12 March 2018 (UTC)
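For a rewritten bot, reading those GA nominee fields could look something like the sketch below. mwparserfromhell is a real wikitext-parsing library; the function name and sample template values are illustrative:

<syntaxhighlight lang="python">
import mwparserfromhell

def read_ga_nominee(talk_wikitext):
    """Return the {{GA nominee}} parameters as a dict, or None if absent."""
    for tpl in mwparserfromhell.parse(talk_wikitext).filter_templates():
        if tpl.name.matches('GA nominee'):
            return {str(p.name).strip(): str(p.value).strip() for p in tpl.params}
    return None

sample = ('{{GA nominee|12:00, 1 June 2018 (UTC)|nominator=Example'
          '|page=1|subtopic=Songs|status=onreview|note=}}')
fields = read_ga_nominee(sample)
status = fields.get('status')    # '', 'onreview', 'onhold', '2ndopinion', ...
subtopic = fields.get('subtopic')
page_num = fields.get('page')    # which /GAn review page this nomination uses
</syntaxhighlight>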
Yeah there is a lot in the second bullet point I did not include diffs for, on account of me being lazy. I will try to do that tonight maybe. I tried to limit what I said to the current functionality of the bot and not include my wishlist of new things, including revamping how subtopics are done. There was an error log at some point in time (located here), not sure when we stopped using that, and if it was on purpose or not. Kees08 (Talk) 01:18, 12 March 2018 (UTC)
@TheSandDoctor: Just giving you a ping in case this slipped off your radar. Kees08 (Talk) 07:48, 20 March 2018 (UTC)
Thanks for the ping Kees08. I had not forgotten, but was waiting for other responses. I am still interested (and might be able to port it to Python), we just need to get Legoktm involved in the discussion. --TheSandDoctor Talk 15:31, 20 March 2018 (UTC)
Kees08 BlueMoonset I have started to (somewhat) work on a Python port of the GAN task. There are some libraries that can be taken advantage of to reduce the number of lines and (hopefully) simplify it. --TheSandDoctor Talk 22:49, 20 March 2018 (UTC)
That's great, TheSandDoctor. I'm very happy you're taking this one on. There are some places where the current code doesn't do what it ought. Here are a few that I've noticed (a sketch addressing the first item follows the list):
  • As mentioned above, even if the review has just concluded with the article being listed as a GA, if the article talk page also has a FailedGA template on it from a prior nomination that was not successful, the bot will send out a "Failed" message rather than a "Passed" message.
  • If a subtopic isn't capitalized exactly right, the nomination is not added to the GAN page even though the edit summary claims it is; for example, the subtopic "songs" isn't written as "Songs", which prevents the nomination from being added to the page until it is fixed.
  • If a GA nominee template is missing the status and/or note fields, a new review is not added to the template, even though it is (ostensibly) added to the GAN page. One example: Abdul Hamid (soldier) was opened for review and appeared on the GAN page as under review, but in actuality, the review page was transcluded but the GA nominee status was not updated because the GA nominee template was missing the "note" field; only after that was manually added did the bot add the "onreview" status. It would make so much more sense for the bot to add the missing field(s) to GA nominee and proceed with adding the status to the template (and the transclusion of the review page on the talk page), instead of leaving its process step incomplete.
  • When an editor opens a GA review, the bot will increment the number of reviews they have, and it will adjust this number on all nominations and reviews that editor has open. Unfortunately, not only does it produce an edit summary that lists the new review, it also includes those other reviews in the edit summary because of that incremented number, when nothing new has happened to the other reviews. This was a problem before, and it's gotten much worse now that edit summaries can be 1024 characters rather than 128 or 256. For example, when Iazyges opened a GA review of Jim Bakker, the edit summary overflowed the 1024 characters, and it shouldn't have; the Bakker review was the only one that should have been listed for Iazyges.
I'm sure there are others; I'll try to think of them and let you know. Thanks again for taking this on. BlueMoonset (talk) 04:52, 21 March 2018 (UTC)
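A hedged sketch of a fix for the first bug: decide pass/fail by matching the review's "page" number and letting a matching {{GA}} win over a stale {{FailedGA}}. Again, mwparserfromhell is real; the surrounding structure is an assumption about how a rewrite might work:

<syntaxhighlight lang="python">
import mwparserfromhell

def review_outcome(talk_wikitext, review_page_num):
    """Return 'passed', 'failed', or 'unknown' for the given review number."""
    failed = False
    for tpl in mwparserfromhell.parse(talk_wikitext).filter_templates():
        page = str(tpl.get('page').value).strip() if tpl.has('page') else '1'
        if page != review_page_num:
            continue                 # belongs to an earlier/later nomination
        if tpl.name.matches('GA'):
            return 'passed'          # a matching {{GA}} always means a pass
        if tpl.name.matches('FailedGA'):
            failed = True            # counts only if no matching {{GA}} exists
    return 'failed' if failed else 'unknown'
</syntaxhighlight>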
@BlueMoonset: Thanks! At the moment I am just trying to get the current code ported, but once I am confident that it should work, I will see about the rest. (The main issue of course being that I cannot actually test/run the ported script; it isn't ready for that stage yet, but once it is, the most I could do would be to output diffs to text files instead of saving, as I don't have bot access etc.; Lego needs to be a part of these discussions at some point as they involve their bot.) --TheSandDoctor Talk 05:17, 21 March 2018 (UTC)

────────────────────────────────────────────────────────────────────────────────────────────────────

@BlueMoonset:@Kees08: I have emailed Legoktm requesting a glimpse at the database structure. --TheSandDoctor Talk 16:00, 27 March 2018 (UTC)
TheSandDoctor, that's excellent news. I hope you hear back soon. Incidentally, I noticed that Template:GA/Subtopic was modified by Chris G, who was the GAN bot owner (then called GAbot) prior to Legoktm, back when the Warfare subtopic "War and military" was changed to "Warfare", so I imagine this is one of the files that might need to be updated if/when the longstanding requests to update/expand the subtopics at GAN to break up some of the single-subtopic topics are acted on (something that's already been done at WP:GA). In particular, the Warfare topic/subtopic and the Sports and recreation topic/subtopic have been on our wishlist for several years, but Legoktm never responded to multiple requests; the last changes we had were under Chris G before his retirement in 2013. I don't know whether Template:GA/Topic is also involved; there are also the underlying Module:Good article topics and the data it loads at Module:Good article topics/data, which would need to be updated when topics and/or subtopics are revised or added. BlueMoonset (talk) 23:26, 28 March 2018 (UTC)
Hi there BlueMoonset, I was waiting to hopefully hear from Lego, but have not. Exams have delayed my progress in this (and will continue to do so until next week), but unfortunately, even when I have the bot converted, there is no guarantee it would work (at first) as I don't have a way to test it, nor do I have access to the existing database etc. I could probably figure out what the database looks like from the code, but the information contained within would be very useful (especially to get it up and running). It is still unclear if I would gain access to Legobot or have to make a "Legobot 2" (or similar). (cc Kees08) --TheSandDoctor Talk 01:05, 20 April 2018 (UTC)
TheSandDoctor, I don't know what you can do at this point, aside from pinging Legoktm's email account again. I know that Legoktm would like to give over the responsibility for this code, but doesn't seem to be around Wikipedia enough any more to give the time necessary to help achieve such a transition. With luck, one of these pings will eventually catch them when they have time and energy to make it happen. I do hope you hear something soon. BlueMoonset (talk) 03:42, 20 April 2018 (UTC)
@TheSandDoctor: What database are you looking for? This may be a dumb question..but we identified where the # of reviews/user list was, and the GA database is likely just from the GA page itself. Is there another database you are looking for? Kees08 (Talk) 00:04, 22 April 2018 (UTC)
Hi there and sorry for the delay in my response Kees08, Legobot uses its own database to keep track of the page states (to know if they have changed). Having access, or at least an outline of the structure, would speed things up somewhat, as I would not have to regenerate the database and could have it clarified what exactly is stored about the pages etc. It is not a necessity, but it would be a nice convenience, especially if I am to take over the bot's functions and maintenance, to have access to its database (or at least a "snapshot" of its structure). As for further development on the translation to Python, once finals are wrapped up (by Tuesday PST), I should hopefully have more time to dedicate to working on it. In the meantime, I have an important final in between me and programming. I shall keep everyone updated here. I still foresee an issue with verifying that the bot works as expected, though, due to the lack of available testing and a bot account to run it on. Things will sort themselves out in the next while, I am sure. Minus editing, I could always check if it "compiles"/runs, and could probably work in a dry-run framework similar to my other projects (where they go through the motions without making actual edits, printing to local text file(s) instead). --TheSandDoctor Talk 05:44, 22 April 2018 (UTC)
Sounds good; no rush, just seeing if I can help you hit the ground running when you get to it. Perhaps DatGuys's config structure would help you figure out a way to do dry runs; mildly similar, you would just have to make up some pages and a database structure, to get the best dry run that is possible prior to hitting the real articles. Best of luck on your finals, and if it makes you feel any better, you will still wake up in cold sweats about them several years in the future (note to dreaming self: no, I have no finals. No, it does not matter you did not study.). Kees08 (Talk) 06:21, 22 April 2018 (UTC)
Not sure how this is going, but I have found User:GA bot/Stats to be inaccurate. It simply needs to list the number of pages created by each editor with "/GA" in the title. Most editors have less listed than they have done. It might be easier to look into this while the bot is being redone. AIRcorn (talk) 22:13, 11 May 2018 (UTC)
Can't we get the database structure from the code? Enterprisey (talk!) 04:25, 13 June 2018 (UTC)
I committed the current database structure. Let me know if you want dumps too. Legoktm (talk) 07:04, 13 June 2018 (UTC)
@Enterprisey and TheSandDoctor: Ping, in case you missed Lego's comment like I did. Thanks lego! Kees08 (Talk) 02:58, 25 June 2018 (UTC)
I totally missed that. Thank you so much Legoktm (and Kees08 for the ping)! A dump would be probably useful, though not entirely necessary. --TheSandDoctor Talk 03:55, 25 June 2018 (UTC)

Bot needed for updating introduction section of portals

Many portals lack human editors, and need automated support to avoid going stale.

Most portals have an introduction section with an excerpt from the lead of the root article corresponding to the portal. The content for that section is transcluded from a subpage entitled "Intro".

The problem is that the excerpts are static, and grow outdated over time. Some are many years out of date.

What is needed is a bot to periodically update subscribed portals, by refreshing the excerpts from the corresponding root article leads.

Each excerpt should end similar to this:

...except that the link should go to the corresponding root article, rather than aviation.

There are over 1500 portals, and so it would be quite tedious for a human editor to do this. Some portals are supported, while others haven't been updated in years.

Portals are in turmoil, and so, this is needed sooner rather than later.

Of course, they need greater support than this. But, we've got to start somewhere. As the intros are at the tops of the portal pages, it seemed like the best place to start.    — The Transhumanist   07:06, 14 April 2018 (UTC)
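A minimal sketch of the requested updater, assuming pywikibot and the "Intro" subpage convention described above. The lead is taken naively as everything before the first heading; a real bot would also need to strip infoboxes, references, and maintenance templates, and the "Read more..." link format is only a guess at the intended ending:

<syntaxhighlight lang="python">
import pywikibot

SITE = pywikibot.Site('en', 'wikipedia')

def refresh_intro(portal, root_article):
    """Copy the root article's lead into Portal:<portal>/Intro."""
    lead = pywikibot.Page(SITE, root_article).text.split('\n==')[0]
    intro = pywikibot.Page(SITE, 'Portal:%s/Intro' % portal)
    new_text = lead + "\n\n'''[[%s|Read more...]]'''" % root_article
    if intro.text != new_text:       # only save when the excerpt is stale
        intro.text = new_text
        intro.save(summary='Refreshing intro excerpt from [[%s]]' % root_article)

refresh_intro('Aviation', 'Aviation')  # the portal -> root-article mapping
                                       # would come from the subscription list
</syntaxhighlight>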

Probably better to do section transclusion, i.e, like Portal:Donald Trump/Intro Galobtter (pingó mió) 07:09, 14 April 2018 (UTC)
I tried various forms of transclusion of the lead, and they all require intrusive coding of the source. Either section markers, or noinclude tags.
I think an excerpt-updater would be better, as there would be zero impact on the source pages in article space. Cumulatively, portals include tens of thousands of excerpts. Injecting code for each of those into article space would be unnecessary clutter, when we could have a bot update the portal subpages instead.    — The Transhumanist   00:38, 15 April 2018 (UTC)
Adding code to the mainspace pages to facilitate transclusion is a really bad idea. GF editors will just strip the coding. I don't support automatically changing the text of portals to match the ledes; I'd rather see them redirected to the matching articles. Short excerpts don't really help the reader, especially for broad-concept articles, which is what most portals try to cover. Legacypac (talk) 04:19, 15 April 2018 (UTC)
Such coding generally has comments included with it so that GF editors don't remove it. As for support/oppose, that's irrelevant, as it is allowable code, like all the other wikicode we use. They added an entire extension to MediaWiki, available on all MediaWiki sites, for transcluding content based on inserted code, and it's already a standard feature on Wikipedia. That said, I think such code makes the source less readable, and it is best practice to avoid it as long as there is an alternative, like bot-updated excerpts in portals.
Redirects would be links. Portals with just links are lists, not portals. Changing to mere redirects would require changing the portal design itself via a new consensus. Portals display content by transcluding excerpted content; that's their core design element. One of the biggest problems with portals is that there aren't enough editors to refresh the excerpts manually. Hence, the bot request.
Short excerpts are exactly the point of portals. To let editors dip in to the subtopics of a subject, in exactly the same way the main page does that for the entire scope of Wikipedia. While you may not find them useful, I find the main page highly useful and entertaining. I rarely follow the links to the rest of the article, but am glad I read the excerpts. The thing I love about it most is that the content changes daily. If portals were set up like that, I would visit the portals for my favorite subjects often. I might even assign one as my home page. Bots can accomplish this. But rather than tackling the whole thing at once, focusing on a bot for updating the portion at the topmost part of the page, the intro, seems like a good place to start.    — The Transhumanist   05:49, 15 April 2018 (UTC)
For a way to avoid that, see this revision I did on Portal:Water. Only problem is that it transcludes the entire page, which is pretty heavy... then uses regex to find the first section... But yeah, I do agree with you - I don't see how the excerpts help much. Galobtter (pingó mió) 06:04, 15 April 2018 (UTC)
Excerpts are the current design standard of portals. Changing the practice of using excerpts would be a change in the design standard of portals, which is outside the scope of this venue. Bots are for automating routine and tedious tasks. The method for updating excerpts has been for the most part to do it manually. A bot is needed to help with this onerous chore.    — The Transhumanist   06:09, 15 April 2018 (UTC)
Excerpts not helping much is part of my general position that portals don't help much in general haha Galobtter (pingó mió) 06:26, 15 April 2018 (UTC)
Based on the replies, the strongest exception to portals was that they are out of date and unmaintained. Both of which problems can be solved with bots. So, I've come to the experts. I'm sure they can find an automatable solution.    — The Transhumanist   23:21, 15 April 2018 (UTC)
Excerpts are part of the problem, not a solution. Portals are a failed idea and no amount of bot mucking around is going to fix them. Legacypac (talk) 18:08, 16 April 2018 (UTC)
Please keep in mind when transcluding anything from mainspace, fair use media is currently restricted to "articles" and should not be transcluded to Portal space. — xaosflux Talk 19:15, 17 April 2018 (UTC)
You mean, like pictures of book covers, logos, and the like?    — The Transhumanist   04:17, 18 April 2018 (UTC)
I tend to think both bot updating and transclusion of content from article space are problematic approaches. The stated problem that this request is trying to fix is that portal intros become stale over time because no one is paying attention. If some automated process is adopted, portal pages could well become broken and stay that way for long periods of time because no one is paying attention. I'd take stale over broken any day. Of course, the risk of such breakage depends on how the automation is done, but isn't a better solution to simply avoid potentially dated language/information in portal intros, or to mark such stuff with, say, {{as of}} or {{update after}}? This would require an initial round of assessments to add such templates (/fix problematic wording, etc.), but it looks like with all the attention portals are getting there will be a concerted effort to review portals once the current RFC is closed. - dcljr (talk) 22:38, 19 April 2018 (UTC)
To reduce the "brokenness rate", the bot could first add {{historical}} to all the portals which have less than a certain threshold of edits in a certain period, then after a week perform the proposed edit to the existing "intro" section/subpage of the portals which are not marked historical. --Nemo 12:15, 16 May 2018 (UTC)
  • At this point the vast majority of portals have been updated with a variety of templates which transclude content from mainspace directly. This was a good idea at the time, but does not seem to have been the preferred option. JLJ001 (talk) 15:25, 29 May 2018 (UTC)

Bot to tag all remaining disambiguation links.

We developed a consensus a while back to tag all remaining disambiguation links in the project with a {{dn}} tag. In order to avoid excessive tagging, the idea is to generate a list of all links, let it sit for a few weeks, then recheck it and tag everything that has still not been fixed after that interval. Any takers? bd2412 T 22:20, 17 April 2018 (UTC)

@BD2412: - I think I can write this one. Basically, we're looking for links to pages in Category:Disambiguation pages inside articles. Generate a list based on that, and after a couple weeks - rerun with tagging enabled for the links in that list. The query to find those pages should be: quarry:query/26624 if I understood right (Quarry's taking a long time to run it - DBeaver came back with 20,000+ hits) SQLQuery me! 21:56, 23 April 2018 (UTC)
There should be fewer than 8,200 total disambiguation links at this time, per The Daily Disambig; of those, at least 2,100 should already be tagged (you can exclude pages that already have such a tag on them), although many of the articles with tags are likely to include multiple tagged links, so I would think that the task should involve no more than 6,000 links to be tagged. bd2412 T 22:29, 23 April 2018 (UTC)
Good point, I'll rewrite the query to exclude {{dn}}. SQLQuery me! 22:32, 23 April 2018 (UTC)
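A sketch of the two-pass approach being discussed (not SQL's actual query or bot), assuming pywikibot; the snapshot file format is made up:

<syntaxhighlight lang="python">
import json
import pywikibot

SITE = pywikibot.Site('en', 'wikipedia')

def snapshot_dab_links(outfile, limit=None):
    """Pass 1: record which articles currently link to which dab pages."""
    pairs = []
    dab_cat = pywikibot.Category(SITE, 'Category:Disambiguation pages')
    for dab in dab_cat.articles(total=limit):
        for article in dab.backlinks(follow_redirects=False,
                                     filter_redirects=False, namespaces=[0]):
            pairs.append([article.title(), dab.title()])
    with open(outfile, 'w') as f:
        json.dump(pairs, f)

# Pass 2, a few weeks later, would reload the file, re-check that each link
# still exists and is not already followed by {{dn}}, then append the tag.
</syntaxhighlight>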
@SQL: Hi, just following up on this. Cheers! bd2412 T 22:51, 9 June 2018 (UTC)

WikiProject Athletics tagging

It's been four years since this project last had a tagging run and I'm looking to get Article Alerts to cover the many relevant articles that have not been tagged since. Anyone interested in doing a tagging run of the articles and categories under Category:Sport of athletics? SFB 19:03, 4 May 2018 (UTC)

 Working on this tagging part over the next few days.   ~ Tom.Reding (talkdgaf)  22:11, 4 May 2018 (UTC)
Sillyfolkboy, 2 questions:
  1. there are ~5600 pages to tag. I will propagate the |class= of other WikiProjects, if available. Should I leave |importance= blank, or use |importance=Low? The idea being that if importance were > "Low", it probably would have been tagged as such by now. I can also do this for articles less than a certain size instead.
  2. I'll leave pages alone (for now) which do not have any WikiProject tagged. To make classification of the resulting unclassified pages faster, I can apply |class=Stub to all pages less than 1000, 2000, 3000, etc. bytes. Please take a look at that list of ~5600 and let me know the threshold below which to tag pages as stubs (if at all).
WP Athletics notified for input as well.   ~ Tom.Reding (talkdgaf)  23:47, 4 May 2018 (UTC)
@Tom.Reding: I would recommend propagation of other project's class if available, or mark as stub if under 2000 bytes. You can place importance as low by default. The project is quite well developed now, so the vast majority of important content is already tagged. These will mainly be recent articles on lower level athletes and events.
Category:Triathlon, Category:Duathlon, Category:Foot orienteers‎, Category:Athletics in ancient Greece and Category:Boston Marathon bombing need to be manually excluded. Thanks SFB 01:41, 5 May 2018 (UTC)
PetScan link updated to exclude those cats, ~400 removed. Won't start on this for a few days for possible comments.   ~ Tom.Reding (talkdgaf)  03:15, 5 May 2018 (UTC)
Orienteers were not excluded on the previous run, and consequently a lot of orienteering articles are now tagged as being within the scope of WikiProject Athletics, even though (with some exceptions) they're actually not. Would it be possible to untag them by bot? Sideways713 (talk) 16:20, 5 May 2018 (UTC)
Sideways713, pages < 2000 b mostly done. Will leave |class= blank for those >= 2000 b. Let me know if there's any desired change to the above guidance. Can do the untagging after.   ~ Tom.Reding (talkdgaf)  13:16, 18 May 2018 (UTC)
 Done.
Re: Orienteering+Athletics, this scan shows 459 which are tagged as both. However, just because someone is in Orienteering doesn't mean they shouldn't be in Athletics, only that they're a candidate for removal. So it's probably best to do this manually, unless there's some rigorous exclusion criteria available?   ~ Tom.Reding (talkdgaf)  17:28, 19 May 2018 (UTC)
If you exclude those in subcategories of Track and field athletes (at any level), and those in subcategories of Sports clubs at level 3 or lower, and possibly those in subcategories of Mountain running (I'm not entirely sure about this one - how does @Sillyfolkboy feel?), and untag the rest, that should be good enough. (That's only a few dozen exclusions.) There will probably still be some false removals - orienteers who dabble in running enough they could be marked as runners on wiki, but aren't yet - but it's a lot less effort to happen upon those later and tag them manually than it is to untag the other 400 pages manually, and the false removals should all be of athletes whose main claim to fame is orienteering and whose articles will be more naturally developed by members of that wikiproject. Sideways713 (talk) 22:43, 19 May 2018 (UTC)
I'm good with the above. There isn't actually a whole lot of crossover between orienteering and elite long-distance running, probably because the later is much better paying than the former so it isn't something, say, a marathon specialist would consider normally. SFB 23:43, 19 May 2018 (UTC)
Sideways713 & SFB: here is the PetScan (434 results) for these doubly-tagged pages with Category:Track and field athletes & Category:Mountain running, both fully recursed, removed. I've tried removing Category:Sports clubs at level 3 or lower via PetScan and locally via AWB's variably-recursive category utility, but both timeout at depths of 5 and greater. The tree grows very quickly, with ~13,000 unique mainspace pages at a depth of 2, to ~311,000 at a depth of 4. D2's pages subtracted from D4's pages gives a ~298,000 pool of pages to try to remove from the 434, but only 2 pages are removed (Brit Volden & Øyvin Thon), leaving 432, so this isn't a practical approach.   ~ Tom.Reding (talkdgaf)  14:31, 20 May 2018 (UTC)
@Tom.Reding: On that basis, I would leave this to a manual task. Given the small article base, there aren't any major downsides to the accidental inclusion in scope, especially as WikiProject Orienteering seems inactive at the moment. SFB 14:47, 20 May 2018 (UTC)
I meant exclude levels 1, 2 and 3 but don't exclude 4 and up, rather than the opposite. Sorry if that was unclear. Sideways713 (talk) 16:24, 20 May 2018 (UTC)
Right, but it's a distinction without a real difference. It would return a subset of the ~300k I found (since I lumped level 3 into those 300k instead of excluding them), so I decided to not be any more precise, since there's no need - the result would be either the same (i.e. I'd still find those same 2 to be removed from the 434) or worse (I'd find 0 or 1 of those same 2); basically a way for programmers to rationalize exerting least effort...   ~ Tom.Reding (talkdgaf)  19:45, 20 May 2018 (UTC)
No, what I meant is this, which gives 420 results. Sorry if there's a communication problem, Sideways713 (talk) 21:53, 20 May 2018 (UTC)
Sideways713, sorry for the delay. Just to be sure: those 420 results need to have {{WikiProject Athletics}} removed?   ~ Tom.Reding (talkdgaf)  14:25, 5 June 2018 (UTC)
SFB, can you confirm instead?   ~ Tom.Reding (talkdgaf)  21:11, 6 June 2018 (UTC)
@Tom.Reding: above link is down so I can't see the results, but I still think this action is better done manually, given the cross-over in the sports (i.e. just because an orienteer isn't currently in a track athlete category doesn't necessarily mean the athlete has not competed in track). Happy for you to proceed on your rationalized approach per above. SFB 22:27, 6 June 2018 (UTC)

Cyberbot I Book report updates

The Cyberbot I (talk · contribs) bot used to update all the book reports but stopped in January 2018. It seems its owner is too caught up IRL to fix this. Can anyone help by checking the bot or the code? —IB [ Poke ] 16:13, 5 May 2018 (UTC)

You need to check with User:cyberpower678 - see Wikipedia:Bots/Requests for approval/Cyberbot I 5 - User:cyberbot I says it's enabled. There is no published code. Ronhjones  (Talk) 20:53, 14 May 2018 (UTC)
@Ronhjones: I have tried contacting Cyberpower a number of times, but he/she does not look into it anymore. Although the bot is listed as active for book status, it has stopped updating it. So somewhere it is skipping the update somehow. —IB [ Poke ] 14:30, 21 May 2018 (UTC)
@IndianBio: Sadly the original request Wikipedia:Bots/Requests for approval/NoomBot 2 has "Source code available: On request", so there is no working link to any source code. If User:cyberpower678 cannot fix the current system, then maybe the only option is to write a new bot from scratch. I see user:Headbomb was involved in the original BRFA, maybe he might have some ideas? I can think about a re-write if there's no alternative - I will need a bit more info on what the bot is expected to do. Ronhjones  (Talk) 14:57, 21 May 2018 (UTC)
The modification likely isn't very big, and User:Cyberpower678 likely has the source code. The issue most likely consists of finding what makes the bot crash/not perform, and probably updating a few API calls or something hardcoded into the bot (like a category). Headbomb {t · c · p · b} 15:01, 21 May 2018 (UTC)
Yes, I have the source, but it was modified as needed to keep it operational over time. @Headbomb: If you email me, I can email you a current copy of the source to look at.—CYBERPOWER (Chat) 15:58, 21 May 2018 (UTC)
I suppose I could, but I'm a really shit coder. Is there a reason to not make the source public? Headbomb {t · c · p · b} 16:35, 21 May 2018 (UTC)
It actually is.—CYBERPOWER (Chat) 17:03, 21 May 2018 (UTC)
@Headbomb: will you take a look at the code? I'm sorry I really don't understand the link which Cyberpower has given. I only code in Mainframe lol, but let me know what seems to be the issue. —IB [ Poke ] 08:04, 22 May 2018 (UTC)
Like I said, I'm a shit coder. This looks to be in PHP so presumably anyone that knows PHP could take over the book reports. Headbomb {t · c · p · b} 13:09, 22 May 2018 (UTC)
Someone only need to file a pull request, and I will deploy it.—CYBERPOWER (Chat) 13:42, 22 May 2018 (UTC)
I can have a look - I'm not a PHP expert by any means (I prefer Python! ;) ) but I've used it extensively in a past life. Richard0612 19:31, 22 May 2018 (UTC)
Richard0612, that will be a real help if you can do it. A lot of books are lagging in their updates. —IB [ Poke ] 12:32, 23 May 2018 (UTC)

(→) Hey @Richard0612: was wondering, did you get a chance to look into the code base? —IB [ Poke ] 09:17, 4 June 2018 (UTC)

Not getting any response, so pinging @Cyberpower678: what can be done? —IB [ Poke ] 06:32, 10 June 2018 (UTC)
@Richard0612: sorry can we have any update on this? —IB [ Poke ] 15:10, 26 June 2018 (UTC)

I just happened to stumble across this discussion while visiting for something else, but I can say that the book report updates have resumed as of 28 June. --RL0919 (talk) 18:50, 2 July 2018 (UTC)

Alexa rankings / Internet Archive

This isn't really a bot request, in the sense that this doesn't directly have anything to do with the English Wikipedia and no pages will be edited (no BRFA is required), but I'm putting it here nonetheless because I don't know of a better place and it's used 500% more than Wikidata's bot requests page. However, it will benefit both Wikidata and Wikipedia.

I have been archiving (with wget and a list of URLs) a lot of alexa.com pages onto the Internet Archive and archive.is, currently about 75,000 daily (all the same pages). This was originally supposed to be for Wikidata and would have been done once a month on a lot more URLs, but that hasn't materialized. Unfortunately maintaining this automatically would be beyond my rudimentary shell script skills, and to run it as I am doing currently would require time which I do not have.

Originally d:User:Alexabot did this based on some URLs from Wikidata, but the operator seems to have vanished after being harangued on Wikidata's project chat because he added the data to items which were not primarily websites. It follows that in the absence of an established process to add values for the property to Wikidata, the archiving should be done separately, with the data to be harvested where needed. Module:Alexa was to have been used with the data, but the bot only completed three runs so it would be outdated at best, and the Wikidata RFC might end up restricting its use.

Could someone set their Unix-based computer, and/or their bit of the WMF cloud servers (a rate-limited sketch follows the URL formats below), to

  • once a day, archive (to the Internet Archive) and/or download several lists of domain names (e.g. those used on Wikipedia and Wikidata; from CSV files which are sitting on my computer; lists of the top 1 million websites) and combine the lists
  • format those domain names with the regular expression below
  • once a month (those below about ~100,000 in rank) or daily/weekly (those ~100,000 and above), archive (to the Internet Archive or archive.is) all of the URLs (collected on a given day) between 17:10 UTC and 16:10 UTC the day after (Alexa seems to refresh data erratically between 16:20 and 17:00 each day, independent of daylight saving time)
    • wget allows archival of lists of websites; use -i /path/to/file and -o /path/to/file flags for archival and logging respectively
  • possibly, as an unrelated process, download the archived pages using URL format https://web.archive.org/web/YYYYMMDD054000/https://www.alexa.com/siteinfo/url.website (where YYYYMMDD is some date) and then harvest the data (Unix shell script regular expressions are almost entirely sufficient)
    • alternatively, just download directly from alexa.com around the same time (see below)
https://web.archive.org/save/https://www.alexa.com/siteinfo/$1
https://web.archive.org/save/https://traffic.alexa.com/graph?o=lt\&y=t\&b=ffffff\&n=666666\&f=999999\&p=4e8cff\&r=1y\&t=2\&z=30\&c=1\&h=150\&w=340\&u=$1
https://web.archive.org/save/https://traffic.alexa.com/graph?o=lt\&y=q\&b=ffffff\&n=666666\&f=999999\&p=4e8cff\&r=1y\&t=2\&z=0\&c=1\&h=150\&w=340\&u=$1
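A rate-limited Python sketch of the save loop above (the backslashes in the URLs are shell escapes and are not needed here). https://web.archive.org/save/ is the real Wayback save endpoint; the two-second delay and the domain-list file name are assumptions:

<syntaxhighlight lang="python">
import time
import requests

SAVE = 'https://web.archive.org/save/'
FORMATS = [
    'https://www.alexa.com/siteinfo/{d}',
    'https://traffic.alexa.com/graph?o=lt&y=t&b=ffffff&n=666666&f=999999'
    '&p=4e8cff&r=1y&t=2&z=30&c=1&h=150&w=340&u={d}',
    'https://traffic.alexa.com/graph?o=lt&y=q&b=ffffff&n=666666&f=999999'
    '&p=4e8cff&r=1y&t=2&z=0&c=1&h=150&w=340&u={d}',
]

with open('domains.txt') as f:       # one domain per line; file name assumed
    for domain in (line.strip() for line in f if line.strip()):
        for fmt in FORMATS:
            requests.get(SAVE + fmt.format(d=domain), timeout=120)
            time.sleep(2)            # stay under the per-IP connection limits
</syntaxhighlight>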

Caveats:

  • The Wikidata property currently uses the URL access date as the point in time, instead of the date that the data was collected (one day and 17 hours before a given UTC time), and does not require an archive URL or even a date. This might be fine for Google's Wikidata item since it will be number 1 until the end of time, but for everything else it will need to be fixed at some point
  • If you don't archive the graph images at the same time (or you archive pages too quickly), alexa.com will start throttling connections from the Internet Archive and you will be unable to archive /siteinfo/* for about a week
  • web.archive.org does not allow a large number of incoming connections per IP for either upload or download (only tested with IPv4 addresses – might be better with multiple IPv6 addresses), so you may want to get around this somehow. I have been funneling connections through Tor simply because it seemed easier to configure torsocks, but this is not ideal
  • Given the connection limit, it is only possible to archive about 100,000 pages and 200,000 graphs per day per IP address (and there might be another limit on alexa.com, which I haven't tried testing)
  • You can use wget's --spider and --max-redirect flags to avoid downloading content
  • Rankings below a certain point (maybe 1 million) are probably not very useful, since the rate of change is high. The best way to check this – which I haven't tried, because it only just occurred to me – is probably to download pages straight from alexa.com while the data is being archived, and check website rankings that way.
  • Some URLs are inexplicably blocked from archival on the Wayback Machine. Those are …/facebook.com, …/blogger.com and …/camelcamelcamel.com (there may be others but I haven't found any more). archive.is (which archives page requisites server-side) seems to block repeated daily archival after a certain point but you can avoid this by using URLs which redirect to the content to be archived
    • archive.is isn't supposed to be scriptable, but I did it anyway with a Lynx script
  • Some websites inexplicably disappear from the rankings from day to day, so don't stop archiving websites simply because their ranking disappears

If you want I can send you the CSV files of archive links that I've accumulated (email me and/or my alternate account, Jc86035 (1)). I have also been archiving spotifycharts.com and if it would be of any use I've written a shell script for that website.

Notifying Cyberpower678, just because you might know something I don't due to running InternetArchiveBot (or you might be able to get the Internet Archive to do this from their servers). Jc86035 (talk) 15:02, 20 May 2018 (UTC)

Jc86035 - I read all this and understand some things, but don't really understand what the goal is. Can you explain in a sentence or two? Are you trying to archive all URLs obtained through alexa.com onto archive.org / archive.is on a daily/weekly basis? What is the connection to Wikidata? What is the purpose/goal of this project? -- GreenC 15:53, 26 May 2018 (UTC)

Jc86035 - re: archive.is - there are some archive.is libraries on GitHub for making page saves, but generally when doing mass uploads you'll want to setup an arrangement with the site owner to feed them links, as it gets better results if he does the archiving in-house, as he can get around country blocks and other things. -- GreenC 15:53, 26 May 2018 (UTC)

@GreenC: Originally the primary motivation was to collect data for Wikidata and Wikipedia. Currently most Alexa.com citations do not even have an archive link (and sometimes don’t even have a date), so the data is completely unverifiable unless Alexa for some reason releases archives of their old data. Websites ranked lower than 10,000 had usually been archived about once before I started archiving. However, I don’t really know what data should be archived (I don’t know how to make a list based on Wikipedia/Wikidata outlinks and haven’t asked anyone for such a list yet), and as such have just archived pages based on other, more easily manipulable lists of websites (such as some CSV file that I found in a web search for the top 1 million websites, which is apparently monthly Alexa data), and because it’s generally difficult and tedious to maintain I’ve just gotten a script to run the same list of about 75,000 archive links at the same time every day.
archive.is seems to only archive one page every two seconds at maximum, based on its RSS feed. Since the Internet Archive is evidently capable of a much higher throughput I would rather not overwhelm archive.is with lots of data which isn’t really all that important. I might ask the website owner to archive those three pages every day, though. Jc86035's alternate account (talk) 15:21, 27 May 2018 (UTC)
Anytime a new external link is added to Wikipedia, the Wayback Machine sees it and archives it. This is done automatically daily with a system created and run by the Internet Archive. In addition, archive.is has done active archiving of all links, though I am not sure what the current ongoing status is. Between these two, most (98%) newly added links are getting archived. I don't know what an alexa.com citation is; a Special:External links search only shows about 500 alexa.com URLs on enwiki. -- GreenC 04:14, 30 May 2018 (UTC)
@GreenC: How does the external link harvesting system work? Is the link archival performed only for mainspace, or for all pages? If an added external link has already been archived, is the link archived again? (A list could be created in user space every so often, although there would be a roughly 1/36 chance of a given page's archival being done when the pages are being changed to use the next day's data, which would make the archived pages slightly less useful.)
There are lots of pages which currently do not have Alexa ranks but would benefit from having them added, mostly the lists of websites and the articles of the websites listed (as well as lists of other things which have websites, like newspapers). It would work as a proxy for popularity and importance. Jc86035's alternate account (talk) 08:11, 7 June 2018 (UTC)
@Jc86035: NoMore404. It gets links via the IRC system, which I believe is for all spaces. Could test by adding a link to a talk page (not yet on Wayback) and check in 48hrs to see if it's on Wayback. Once a link is in the Wayback it automatically recrawls, though how often is hard to say: some pages multiple times a day, others once a year, etc. Not sure how they determine frequency. -- GreenC 12:48, 7 June 2018 (UTC)
@GreenC: I've added links to Draft:Alexa Internet and User:Jc86035/sandbox, which should be enough for testing. Jc86035's alternate account (talk) 06:18, 8 June 2018 (UTC)
Both those URLs redirect to a page already existing in the Wayback not sure how nomo404 and wayback machine will respond. Redirects are a complication on Wayback. -- GreenC 15:42, 8 June 2018 (UTC)
@GreenC: None of the URLs have been archived. I think I'll probably stick to using the long list of URLs, although I might try putting them in the WMF cloud at some point. Jc86035 (talk) 16:19, 16 June 2018 (UTC)
Jc86035 The test URLs you used won't work, they are already archived on the Wayback. As I said above, "Both those URLs redirect to a page already existing in the Wayback". Need to use URLs that are not yet archived. -- GreenC 18:47, 16 June 2018 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── @GreenC: Okay. I've replaced those links with eight already-archived links and eight unarchived links; none of them are redirects. Jc86035 (talk) 06:39, 17 June 2018 (UTC)

Ok good. If not working 24-48hr I will contact IA. -- GreenC 14:07, 17 June 2018 (UTC)
Jc86035 - those spotify links previously exist on Wayback, archived in March. Need to find links not yet in the Wayback. -- GreenC 13:50, 19 June 2018 (UTC)
@GreenC: Eight of them (2017-12-14) were archived by me, and the other eight (2018-06-14) are too recent to have been archived by me. Jc86035 (talk) 14:41, 19 June 2018 (UTC)
@Jc86035: Clearly the links did not get archived. It still might be caused by a filter of userspace, so I added one of the links onto mainspace to see what happens. -- GreenC 01:56, 28 June 2018 (UTC)
I saved one manually to check for robots.txt or something blocking saves, but it looks OK. The one testing in mainspace: https://spotifycharts.com/regional/jp/daily/2018-06-14-- GreenC 02:03, 28 June 2018 (UTC)
@Jc86035: NoMo404 appears to be working. I added a Spotify link into mainspace here. The next/same day it showed up on Wayback. Looks like it's only tracking links added to mainspace, not Draft or User space. -- GreenC 23:25, 29 June 2018 (UTC)
@GreenC: Thanks. I guess it should work well enough for the list articles, then. Jc86035 (talk) 23:32, 29 June 2018 (UTC)

Em dashes

I find myself removing spaces around em dashes frequently. Per the MOS, "An em dash is always unspaced (that is, without a space on either side)".

Example of occurrence

Since this is such a black and white issue, a bot to automatically clean this up as it happens would be useful. Kees08 (Talk) 05:49, 31 May 2018 (UTC)

This is a context-sensitive editing task, since some spaced em dashes should be converted to en dashes, not to unspaced em dashes. Others, such as those in file names, should be left alone. – Jonesey95 (talk) 12:46, 31 May 2018 (UTC)
True, there are cases where it matters. Cases such as file names will require exceptions that can be written in the code. As for em dashes that should be en dashes, since en dashes can be spaced or unspaced, switching to spaced will not hurt anything, unless there is a specific context I am missing. Kees08 (Talk) 03:40, 1 June 2018 (UTC)
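A context-sensitive sketch along the lines Jonesey95 describes: normalize spaced em dashes only outside file links, leaving everything else for a human to judge. The skip pattern is illustrative, not exhaustive:

<syntaxhighlight lang="python">
import re

SPACED_EMDASH = re.compile(' +\u2014 +')                  # " — "
FILE_LINK = re.compile(r'\[\[(?:File|Image):[^\]]*\]\]')  # crude skip pattern

def fix_emdashes(wikitext):
    """Unspace em dashes in prose, copying file links through untouched."""
    out, last = [], 0
    for m in FILE_LINK.finditer(wikitext):
        out.append(SPACED_EMDASH.sub('\u2014', wikitext[last:m.start()]))
        out.append(m.group(0))
        last = m.end()
    out.append(SPACED_EMDASH.sub('\u2014', wikitext[last:]))
    return ''.join(out)

print(fix_emdashes('A pause \u2014 like this one \u2014 gets closed up.'))
</syntaxhighlight>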

Orphan tags

Hi, could you please give a bot an extra task of removing orphan tags from articles that have at least one incoming link from mainspace articles, lists and index pages but not disambig pages or redirects as per WP:Orphan. The category is Category:All orphaned articles but exclude Category:Orphaned articles from February 2009 as an admin is checking those. A rough estimate is there are at least 10,000 misplaced tags, thanks Atlantic306 (talk) 17:07, 2 June 2018 (UTC)

JL-Bot already removes the orphan tag, but based on the original discussion it requires 5 or more links (ignoring type). This was done as checking the type of link is not always straightforward and adds processing time. The 5 links was a community-agreed compromise. The only exception is dab pages, which should never be tagged as orphans (it will de-tag those regardless of number of links). That task runs every week or two. If someone wants to build a fancier check, let me know and I will discontinue mine. -- JLaTondre (talk) 22:44, 2 June 2018 (UTC)
This botreq started on my talk page, I suggested posting here first, glad as I didn't know about JL-Bot. I wouldn't know how to improve on JL-Bot other than by using API:Backlinks but it's a wash in terms of functionality. BTW I wrote a command-line utility wikiget (github) that can be hooked through a system call eg. "wikiget -b Ocean -t t" will output all transcluded backlinks for Ocean. It handles all the paging and various API:Backlink options. -- GreenC 23:15, 2 June 2018 (UTC)
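A sketch of the "one valid incoming link" check under discussion, assuming pywikibot: count mainspace backlinks without following redirects, and ignore disambiguation pages. Whether 1, 3, or 5 links should de-tag is exactly what is contested in the replies below:

<syntaxhighlight lang="python">
import pywikibot

SITE = pywikibot.Site('en', 'wikipedia')

def has_valid_incoming_links(title, need=1):
    """True if `title` has at least `need` mainspace, non-dab incoming links."""
    page = pywikibot.Page(SITE, title)
    count = 0
    for bl in page.backlinks(follow_redirects=False,
                             filter_redirects=False, namespaces=[0]):
        if not bl.isDisambig():
            count += 1
            if count >= need:
                return True
    return False
</syntaxhighlight>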
Atlantic306, how is this different from the request you made a month ago at Wikipedia:AutoWikiBrowser/Tasks#AWB Request 2, which was decidedly a non-starter? Pinging the other contributors from that discussion, Premeditated Chaos & Sadads. If a large # of orphans have already been manually checked and all that remains for that group is the busywork of removing the tag, then that might be ok if others agree, but we need to see a link to such a discussion.
JLaTondre, do you have a link to the 5+ link discussion?   ~ Tom.Reding (talkdgaf)  12:03, 4 June 2018 (UTC)
Hi, this is different to the AWB proposal as that was for the early category of 9000 articles, whereas this proposal leaves that category out as it is being manually checked, and refers to all of the remaining orphan categories. As above, a bot is already removing tags, but I think this needs to be set at one valid link as per WP:Orphan, as the JL-Bot approval was back in 2008 and consensus has now changed such that one valid link is sufficient for tag removal, thanks Atlantic306 (talk) 12:11, 4 June 2018 (UTC)
Discussion is shown here and here. GreenC (talk · contribs) has said his bot can differentiate the links so perhaps his bot could take over the task, thanks Atlantic306 (talk) 12:25, 4 June 2018 (UTC)
WP:ORPHAN says "Although a single, relevant incoming link is sufficient to remove the tag, three or more is ideal...", I would object to a bot removing orphan tags on articles with fewer than 3 links on this basis alone. Headbomb {t · c · p · b} 12:25, 4 June 2018 (UTC)
The conclusions reached at WP:AWB/Tasks#AWB Request 2 apply to most orphans, from 2009 up until some arbitrary time in the near-past.   ~ Tom.Reding (talkdgaf)  12:30, 4 June 2018 (UTC)
(edit conflict) I still think automated removal of orphan tags in general is a bad idea. To me, going through the orphan categories isn't just about making sure something else points there. Orphan-tagged articles often suffer other issues, so the tag is kind of a heads-up that the article needs to be looked at. It's like a sneeze. It could be nothing, but it could mean you have allergies, or a cold.
Same thing with an orphan-tagged article. It could be a great but under-loved topic. But maybe it's a duplicate article or sub-topic and can be merged/redirected. Maybe it's a copyvio that flew under the radar. Maybe it's not actually notable and should be deleted. Maybe the title is wrong and it's orphaned because all the links point to the right (redlinked) title. Maybe the incoming links are incorrect and are trying to point to something else, and need to be changed.
If you just strip the tags without checking the article, you're getting rid of the symptom without checking to see if there's an underlying illness, which essentially reduces the value of the tag in the first place. ♠PMC(talk) 12:55, 4 June 2018 (UTC)
Echoing this from PMC. We don't suffer from having a neverending backlog, and the current bots (per discussion above) and AWB minor fixes already remove templates from pages that are already in the clear. I would much rather that we take the time to go through and find merges or deletes, get these pages added to WikiProjects, and generally do other minor cleanup that happens when human eyes are on the pages. Anything that has lived with few or no links for 9+ years suggests to me that it hasn't been integrated into the wiki adequately. If we just remove the tag, we remove the likelihood of it being discovered again. Sadads (talk) 14:35, 4 June 2018 (UTC)

Someone to take over User:HasteurBot

Hasteur (talk · contribs) has retired; it would be good if someone could take over the bot.

The code is at https://github.com/hasteur/g13bot_tools_new, with Hasteur stipulating "All I ask is that the credit for the work remains."

@Firefly: Hasteur posted this on your talk page, any interest in taking over? Headbomb {t · c · p · b} 10:40, 4 June 2018 (UTC)

@Headbomb: Yep, I'm happy to do this. Will look at it and submit a BRFA tonight (hopefully!) ƒirefly ( t · c · who? ) 13:27, 4 June 2018 (UTC)

Peer review - periodically contacting mailing list with unanswered reviews

Hi all, could a bot editor help out at peer review by creating a bot that periodically contacts editors on our volunteer list with a list of unanswered reviews? Some details:

  • Discussion is here: WT:PR - the problem we are trying to solve is the large number of outstanding reviews that haven't been answered
  • List of unanswered reviews is here: WP:PRWAITING
  • List of volunteers is here: WP:PRV
  • We will remove inactive volunteers, and I will reformat the list in a bot-readable format similar to this: {{User:Tom (LT)/sandbox/PRV|Tom (LT)|anatomy and medicine|contact=never}}
    • Editors will opt in to the system - all will be set to default to never contact
    • Options for contact will be never, monthly, quarterly, halfyearly, and yearly (unless you can think of a more clever way to do this)

Looking forward to hearing from you soon, --Tom (LT) (talk) 23:12, 4 June 2018 (UTC)

Addit: ping also to Anomie who very kindly helped create the AnomieBot that now runs PR.--Tom (LT) (talk) 23:12, 4 June 2018 (UTC)
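A rough sketch (in Python) of the scheduling half of this request, assuming entries follow the reformatted {{User:Tom (LT)/sandbox/PRV|...}} format shown above; the field order and the month values per contact option are assumptions, not a finished design:

import re
from datetime import date

# Assumed mapping of contact options to months between reminders.
INTERVALS = {"monthly": 1, "quarterly": 3, "halfyearly": 6, "yearly": 12}

PRV_RE = re.compile(
    r"\{\{User:Tom \(LT\)/sandbox/PRV\|([^|]+)\|([^|]+)\|contact=(\w+)\}\}")

def volunteers_due(wikitext, last_contacted, today=None):
    """Yield (user, interests) pairs whose reminder interval has elapsed.
    last_contacted maps usernames to the date of the last reminder."""
    today = today or date.today()
    for user, interests, option in PRV_RE.findall(wikitext):
        months = INTERVALS.get(option)  # 'never' and unknown values -> None
        if months is None:
            continue
        last = last_contacted.get(user)
        if last is None or ((today.year - last.year) * 12
                            + today.month - last.month) >= months:
            yield user, interests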
In the meantime, I'll mention that WP:AALERTS will report peer review requests to WikiProjects, if the articles are tagged by banners. Headbomb {t · c · p · b} 11:29, 5 June 2018 (UTC)
Bump. --Tom (LT) (talk) 00:51, 25 June 2018 (UTC)
Bump. --Tom (LT) (talk) 03:24, 10 July 2018 (UTC)

Popular pages - indexing and WikiProject banners[edit]

Could someone help with doing the following for the pages in Category:Lists of popular pages by WikiProject?

  • add the name of the WikiProject as a sort key to Category:Lists of popular pages by WikiProject
  • add the corresponding WikiProject category, with sort key "Popular Pages"
  • create a talk page (if it doesn't exist) and add the corresponding WikiProject banner

Oornery (talk) 05:14, 6 June 2018 (UTC)
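A possible Pywikibot outline for the three steps; it assumes list titles of the form "Wikipedia:WikiProject Foo/Popular pages", and the category and banner names it builds are guesses that would need a per-project lookup table in practice:

import pywikibot

site = pywikibot.Site("en", "wikipedia")
parent = pywikibot.Category(site, "Category:Lists of popular pages by WikiProject")

for page in parent.articles():
    # "Wikipedia:WikiProject Foo/Popular pages" -> "Foo" (assumed title shape)
    name = page.title().split(":", 1)[1].split("/", 1)[0].replace(
        "WikiProject ", "", 1)
    page.text = page.text.replace(
        "[[Category:Lists of popular pages by WikiProject]]",
        "[[Category:Lists of popular pages by WikiProject|%s]]\n"
        "[[Category:WikiProject %s|Popular Pages]]" % (name, name))
    page.save(summary="Sort key and WikiProject category for popular pages list")
    talk = page.toggleTalkPage()
    if not talk.exists():
        talk.text = "{{WikiProject %s}}" % name  # banner name is a guess
        talk.save(summary="Tag with WikiProject banner")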

Friendly Search Suggestions[edit]

Hi, I'm suggesting that Template:Friendly search suggestions be added by bot to every stub article's talk page to aid the improvement of the articles, thanks Atlantic306 (talk) 20:53, 6 June 2018 (UTC)

That seems like an incredible waste of time and effort, as well as the patience of the community. Is there a consensus that this should be done? Primefac (talk) 12:42, 7 June 2018 (UTC)
  • Disagree, it's completely uncontroversial and helpful to the community, as the template has a large number of search options to improve stub articles; surely that's a very good use of time and effort to improve the encyclopedia. For something so minor, is consensus really needed? thanks Atlantic306 (talk) 20:30, 7 June 2018 (UTC)
Every stub article talk page - you're talking hundreds of thousands if not millions of stubs (Just checked the cat, which is at 2+ million). InternetArchiveBot and Cyberpower got harassed simply for placing (in my opinion completely relevant) talk page messages on a fraction of that. So yes, I do think you need consensus. Primefac (talk) 22:08, 7 June 2018 (UTC)
I would oppose that tooth and nail. Absolutely not suitable for a bot task. Headbomb {t · c · p · b} 03:03, 8 June 2018 (UTC)
    • Well, it's certainly too much for a human editor - if it were limited to 300 articles a day it would not cause much disruption, Atlantic306 (talk) 19:17, 14 June 2018 (UTC)
    • Will start an RFC when I have more time, thanks Atlantic306 (talk) 20:38, 16 June 2018 (UTC)

Sort Pages Needing Attention by Popularity/daily views[edit]

I suggest, for example, that someone sort the items on the page Category:Wikipedia_requested_photographs by page popularity, similar to how this page is sorted: Wikipedia:WikiProject_Computer_science/Popular_pages. Instead of clicking through random obscure pages, a sorted table would allow people to prioritize the pages that need attention the most. The example bot is found here: User:Community_Tech_bot. Turbo pencil (talk) 00:57, 8 June 2018 (UTC)

@Turbo pencil: Try Massviews. --Izno (talk) 02:23, 8 June 2018 (UTC)
@Izno: Thanks a lot Izno. Super helpful! — Preceding unsigned comment added by Turbo pencil (talk • contribs) 04:12, 8 June 2018 (UTC)

Replace links to AP news hosted by Google with AP website links[edit]

Can anyone create a bot to replace links matching the regex https://www.google.com/hostednews/ap/.*\?docID=([0-9a-f]{32}) with https://apnews.com/$1? There are about 2800 links to AP news hosted by Google, and all of them are dead. I estimate about 20–30% of these links have the docID parameter and can be rewritten to link to AP's website. This doesn't always work, but it works often enough to make the effort worthwhile. You'll need to download the page first and check for the absence of the string "The page you’re looking for doesn’t exist. Try searching for a topic." and the presence of a non-empty div of class articleBody. You'll also have to flip the deadurl parameter to no after replacement and avoid references that have already been archived. Some examples:

Gazoth (talk) 13:09, 8 June 2018 (UTC)
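A sketch of the check-and-rewrite step described above, in Python with the requests library; the articleBody test here is a crude string check, and a real bot would parse the HTML, handle the |deadurl= flip inside the citation, and skip already-archived refs:

import re
import requests

HOSTED_RE = re.compile(
    r"https://www\.google\.com/hostednews/ap/.*?\?docID=([0-9a-f]{32})")
NOT_FOUND = ("The page you’re looking for doesn’t exist. "
             "Try searching for a topic.")

def live_ap_url(doc_id):
    """Return the apnews.com URL if the article still exists, else None."""
    html = requests.get("https://apnews.com/" + doc_id, timeout=30).text
    if NOT_FOUND in html or 'class="articleBody"' not in html:
        return None
    return "https://apnews.com/" + doc_id

def rewrite(wikitext):
    for m in HOSTED_RE.finditer(wikitext):
        new = live_ap_url(m.group(1))
        if new:
            wikitext = wikitext.replace(m.group(0), new)
    return wikitext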

Bot to change redirects to 'Redirect'-class on Talk pages?[edit]

As per edits like this one I just did, is it possible to have a bot go around and check all the extant Talk pages of redirect pages, and confirm/change all of their WP banner assessment classes to 'Redirect'-class?... Seems like this should be an easy and doable task(?)... FWIW. TIA. --IJBall (contribs • talk) 15:41, 10 June 2018 (UTC)

Just ran a petscan on Category:All redirect categories, which I assume includes all redirects, but which also contains 2.9 million pages. Granted, 1.4mil of these are in Category:Unprintworthy redirects (which likely do not have talk pages of their own), and there are probably another million or so with similar no-talk-page status, but that's still a metric buttload of pages to check. Not saying it can't be done (haven't really even considered that yet), just thought I'd give an idea of the scale of this operation. Primefac (talk) 16:25, 10 June 2018 (UTC)
No one says it needs to be done "fast"!... Maybe a bot can check just a certain number of redirect pages per day on this, so it doesn't overwhelm any resources. --IJBall (contribs • talk) 16:59, 10 June 2018 (UTC)
I believe the banners automatically set the class to redirect when the page is a redirect, so that would fall under WP:COSMETICBOT. Not sure what happens if |class=C is set on a redirect, but that should be easy to test. If |class=C overrides redirect detection, that would be suitable for a task. Headbomb {t · c · p · b} 03:39, 11 June 2018 (UTC)
I'm telling you, I just had to do this twice, earlier today – two pages that had been converted to redirects years ago still had |class=B on their Talk pages. It's possible that this only affects pages that were converted to redirects years ago, but it looks like there is a population of them that need to be updated to |class=Redirect. --IJBall (contribs • talk) 03:47, 11 June 2018 (UTC)
Setting |class=something overrides the automatic redirect class. This should be handled by EnterpriseyBot. — JJMC89(T·C) 05:11, 11 June 2018 (UTC)
Yup, as JJMC89 mentioned, this is EnterpriseyBot task 10. It hasn't run for a while, because I let a couple of breaking API changes pass by without updating the code. I'm going to fix the code so it can run again. Enterprisey (talk!) 05:45, 11 June 2018 (UTC)
If such a task is done, it's best to either remove the classification and leave the empty parameter |class=, or remove the parameter entirely. As Headbomb and JJMC89 have noted, redirect-class is autodetected when no class is explicitly set. This is true with all WikiProject banners built around {{WPBannerMeta}} (but see note), so setting an explicit |class=redir just means that somebody has to amend it a second time if the page ceases to be a redirect.
Note: there are at least four that are not built around {{WPBannerMeta}}, and of the four that I am aware of, only {{WikiProject U.S. Roads}} autodetects redir class; the other three ({{WikiProject Anime and manga}}; {{Maths rating}}; and {{WikiProject Military history}}) do not autodetect, so for these it must be set explicitly; moreover, those three only recognise the full form |class=redirect, they don't recognise the shorter |class=redir that many others handle without problem. --Redrose64 🌹 (talk) 07:54, 11 June 2018 (UTC)
Yes, I skip anime articles explicitly, and the bot won't touch the other two troublesome templates due to the regular expressions it uses.
A bigger problem concerns the example diff that started this thread. It's from an article in the unprintworthy redirects category. I thought the bot should have gotten to that category already, so I just went in to inspect the logs. Unbelievably, after munching through all of the redirect categories, it has finally gotten stuck on exactly that category (unprintworthy redirects). Apparently Pywikibot really hates one of the titles in it. I'm trying to figure out which title precisely, so I can file a bug report, but for now the bot task is on hold.
However, all of the other redirect categories that alphabetically come before it should only contain articles that the bot checked already. Enterprisey (talk!) 20:11, 18 June 2018 (UTC)
It was actually a bug in Pywikibot, so the bot's held up until a patch is written for that. Enterprisey (talk!) 18:56, 26 June 2018 (UTC)

[r] → [ɾ] in IPA for Spanish[edit]

A consensus was reached at Help talk:IPA/Spanish#About R to change all instances of r that either occur at the end of a word or precede a consonant (i.e. any symbol except a, e, i, o, or u) to ɾ inside the first parameter of {{IPA-es}}. There currently appear to be about 1,190 articles in need of this change. Could someone help with this task with a bot? Nardog (talk) 19:24, 12 June 2018 (UTC)
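The substitution itself is mechanical; a minimal Python sketch, assuming the plain case where the transcription is the first unnamed parameter and contains no nested templates:

import re

IPA_ES_ARG = re.compile(r"(\{\{IPA-es\|)([^|}]*)")

def retap(wikitext):
    """Replace r with ɾ word-finally or before any non-vowel in {{IPA-es}}."""
    return IPA_ES_ARG.sub(
        lambda m: m.group(1) + re.sub(r"r(?![aeiou])", "ɾ", m.group(2)),
        wikitext)

print(retap("{{IPA-es|porˈtal}}"))  # -> {{IPA-es|poɾˈtal}}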

Indexing talk page[edit]

User:Legobot has stopped indexing talk pages and archives and User:HBC Archive Indexerbot is deactivated. I would like a replacement for that task. --Tyw7  (🗣️ Talk to me • ✍️ Contributions) 20:06, 12 June 2018 (UTC)

Do any of these work? Category:Wikipedia_archive_bots -- GreenC 14:31, 17 June 2018 (UTC)
Most of them are archive bots. I'm looking for an index bot to take over from Legobot, which has developed a bug and doesn't index all talk pages. --Tyw7  (🗣️ Talk to me • ✍️ Contributions) 17:52, 17 June 2018 (UTC)
User:Legobot has 33 tasks; I'm not sure which one this is (Task #15?). Did Legobot say why they stopped, or was it abandoned without a word? -- GreenC 20:41, 18 June 2018 (UTC)
It's only working on random pages; many people have reported it, but it hasn't been fixed. See the discussion at User talk:Legobot --Tyw7  (🗣️ Talk to me • ✍️ Contributions) 21:05, 18 June 2018 (UTC)
Looks like Legobot is on vacation until July 7. They should either fix the bugs (if serious) or give permission for someone else to take it over, should anyone wish. -- GreenC 21:17, 18 June 2018 (UTC)
Legobot (talk · contribs) is not on vacation, it is still running (there would be chaos on several fronts if it had stopped completely). It is Legoktm (talk · contribs) that is on vacation, and if you have been following both User talk:Legobot and User talk:Legoktm, you'll know that Legoktm has not been responding to questions concerning Legobot (other than one or two specifics on this page such as #Take over GAN functions from Legobot above) for well over two years. --Redrose64 🌹 (talk) 17:41, 19 June 2018 (UTC)

New York Times archives moved[edit]

Diff

The new URL can be obtained by following the Location: redirects in the headers (in this case two-deep). I bring it up because of the importance of the NYT to Wikipedia, the uncertainty about how long the redirects will last, and because the new URL is more informative, including the date. -- GreenC 21:46, 13 June 2018 (UTC)
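A sketch of the resolution step, using the requests library; some servers answer HEAD requests badly, so a production bot might need GET, plus rate limiting and an archive fallback:

import requests

def resolved_nyt_url(old_url):
    """Follow the Location: redirects (two-deep in the example) to the
    current nytimes.com URL, or return None if resolution fails."""
    resp = requests.head(old_url, allow_redirects=True, timeout=30)
    if resp.status_code == 200 and "nytimes.com" in resp.url:
        return resp.url
    return None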

Comment Looks like there are 29,707 links with "query.nytimes.com" Ronhjones  (Talk) 15:25, 9 July 2018 (UTC)

BRFA filed -- GreenC 15:44, 20 July 2018 (UTC)

Move WikiProject Articles for creation to below other WikiProject templates[edit]

In Special:Diff/845715301, PRehse moved WikiProject Articles for creation to the bottom and updated the class for WikiProject Video games from "Stub" to "Start". Then, in Special:Diff/845730267, I updated the class for WikiProject Articles for creation, and moved WikiProject Articles for creation back to the top. But then, in Special:Diff/845730984, PRehse decided to move WikiProject Articles for creation to the bottom again. For consistency, we should have a bot move all {{WikiProject Articles for creation}} templates on talk pages to below other WikiProject templates. If the WikiProject templates are within {{WikiProject banner shell}}, then {{WikiProject Articles for creation}} will stay within the shell along with other WikiProject templates. GeoffreyT2000 (talk) 16:44, 14 June 2018 (UTC)

Needs wider discussion That sounds like a lot of bot edits for questionable benefit. Seek approval at one of the village pumps. Anomie 17:35, 14 June 2018 (UTC)

This change should be fine per Wikipedia:Talk page layout. -- Magioladitis (talk) 18:24, 14 June 2018 (UTC)

If anything, this could be bundled in AWB, assuming it has consensus, so that AWB bots make the change when they do other tasks. However, this very likely wouldn't get consensus to be done on its own. Headbomb {t · c · p · b} 20:08, 14 June 2018 (UTC)
There are no bots doing tasks in this direction, unless we decide that WikiProject tagging bots should also be doing this. Only Yobot used to do this, and right now there is no guideline to ask bot owners to perform this action. So we have two ways: form a strategy, or approve a sole task for this. I would certainly support the task being done if there were a discussion held somewhere about this task or similar tasks. -- Magioladitis (talk) 22:54, 14 June 2018 (UTC)
  • I concur this needs wider discussion. Why does the order of the WikiProject banners matter? Primefac (talk) 02:19, 15 June 2018 (UTC)
For instance, we have a loose rule that WikiProject Biography "comes before any other WikiProject banners". At the moment, I do not see why WikiProject Articles for creation should be at the bottom of all Projects, but there is a place to discuss this. If this gets support we should then create bots to do it. It's about 60,000 talk pages with this template. -- Magioladitis (talk) 07:31, 15 June 2018 (UTC)
Dedicated WP:TPL bots never had support as far as I recall. Maybe there was one shoving banners into the metabanner after a certain threshold, but that'd be the only one if it ever was a thing. I don't see what'd be different here. Headbomb {t · c · p · b} 13:05, 15 June 2018 (UTC)
There was a bot that was adding WPBS and doing that task, and Yobot was doing it as part of WikiProject tagging. My main questions are: a) whether we have a guarantee that the current BAG will continue to accept this as a secondary task, and b) whether there is a need to actually do it as a sole task. WP:TPL bots did not have much luck in the past due to the lack of concrete rules (which we now have; I dedicated a lot of time in this direction) and of built-in AWB tools (which we now have, since at some point I made some thousands of edits to rename templates to standard names). -- Magioladitis (talk) 14:12, 15 June 2018 (UTC)
BAG cannot guarantee that any specific thing will be accepted by the community. If a task is proposed and there is consensus for it (or at least a lack of objections after a call for comments/trial), it'll be approved. If there is no consensus for the task to be done, it won't be approved. Headbomb {t · c · p · b} 20:19, 18 June 2018 (UTC)

Potentially untagged misspellings report[edit]

Hi! Potentially untagged misspellings (configuration) is a newish database report that lists potentially untagged misspellings. For example, Angolan War of Independance is currently not tagged with {{R from misspelling}} and it should be.

Any and all help evaluating and tagging these potential misspellings is welcome. Once these redirects are appropriately identified and categorized, other database reports such as Linked misspellings (configuration) can then highlight instances where we are currently linking to these misspellings, so that the misspellings can be fixed.

This report has some false positives and the list of misspelling pairs needs a lot of expansion. If you have additional pairs that we should be scanning for or you have other feedback about this report, that is also welcome. --MZMcBride (talk) 02:58, 15 June 2018 (UTC)

Oh boy. Working with proper names is risky: variations are often 'correct', and usage is context-dependent, so a bot shouldn't decide. My only suggestion is to skip words that are capitalized. For the rest, use something like approximate (fuzzy) matching to identify paired words that are only slightly different due to spelling (experiment with the agrep threshold without creating too many false positives), then use a dictionary to determine if one of the paired words is a real word and the other is not. At that point there might be a good case for it being a misspelling and not an alternative name. This is one of those problems computers are not good at, and it is messy. Unless there is an AI solution. -- GreenC 14:23, 17 June 2018 (UTC)
Spelling APIs, some AI-based. -- GreenC 01:44, 28 June 2018 (UTC)
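For the fuzzy-matching idea above, a hedged Python sketch using difflib in place of agrep; the 0.85 threshold is a guess that would need the experimentation GreenC describes, and `dictionary` is a hypothetical set of known-good lowercase words:

import difflib

def likely_misspelling(redirect, target, dictionary):
    """Flag pairs that differ only slightly where the target word is in the
    dictionary and the redirect word is not."""
    if redirect[:1].isupper():   # skip capitalized words (proper names)
        return False
    if difflib.SequenceMatcher(None, redirect, target).ratio() < 0.85:
        return False
    return target.lower() in dictionary and redirect.lower() not in dictionary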

Misplaced brace[edit]

In this diff, I replaced } with |. As a result, the wikitext went from [[War Memorial Building (Baltimore, Maryland)}War Memorial Building]] to [[War Memorial Building (Baltimore, Maryland)|War Memorial Building]], and the rendered appearance went from [[War Memorial Building (Baltimore, Maryland)}War Memorial Building]] to War Memorial Building. Is a maintenance bot already doing this kind of repair, and if not, could it be added to an existing bot? Nyttend (talk) 13:39, 21 June 2018 (UTC)

Probably best we get an idea of how common this is before we look at doing any kind of mass-repair (be it a bot or adding it to AWB genfixes). Could someone look through a dump for this sort of thing? I'm happy to do it if nobody else gets there first. ƒirefly ( t · c · who? ) 20:02, 21 June 2018 (UTC)
This has some false positives but also some good results. Fewer than 50. -- GreenC 15:56, 24 June 2018 (UTC)
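The repair itself is mechanical; a minimal regex sketch in Python (the character classes deliberately exclude |, braces and brackets, so only the simple single-brace case is touched):

import re

# [[Target (dab)}Label]] -> [[Target (dab)|Label]]
BAD_PIPE = re.compile(r"\[\[([^\[\]|{}\n]+)\}([^\[\]|{}\n]+)\]\]")

def fix_braces(wikitext):
    return BAD_PIPE.sub(r"[[\1|\2]]", wikitext)

print(fix_braces(
    "[[War Memorial Building (Baltimore, Maryland)}War Memorial Building]]"))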

Tag cleanup of non-free images[edit]

I noticed a couple of non-free images here on enwiki to which a {{SVG}} tag has been added, alongside a {{bad JPEG}}. Non-free images such as logos must be kept at a low resolution and size to comply with the fair-use guidelines, so the creation of a high-quality SVG equivalent, or the upload of a better-quality PNG, should not be encouraged for these files. For this reason I ask that a bot run through the All non-free media category (maybe starting from All non-free logos) and remove the following tags when they are found:

{{Bad GIF}}, {{Bad JPEG}}, {{Bad SVG}}, {{Should be PNG}}, {{Should be SVG}}, {{Cleanup image}}, {{Cleanup-SVG}}, {{Image-Poor-Quality}}, {{Too small}}, {{Overcompressed JPEG}}

Two examples: File:GuadLogo1.png and File:4th Corps of the Republic of Bosnia and Herzegovina patch.jpg. Thanks, —capmo (talk) 18:16, 28 June 2018 (UTC)

This seems like a bad idea to me. There's no reason a jpeg version of a non-photographic logo shouldn't be replaced with a PNG lacking artifacts, or a 10x10 logo be replaced with something slightly larger should there be a use case. As for SVGs of non-free logos, that has long been a contentious issue in general. Anomie 12:37, 29 June 2018 (UTC)
I agree, particularly relating to over-compressed JPEGs. We can have small (as in dimensions) images without them being occluded by compression artefacts. ƒirefly ( t · c · who? ) 13:40, 1 July 2018 (UTC)

Redirects of OMICS journals[edit]

Here's one for Tokenzero (talk · contribs)

OMICS Publishing Group is an insidious predatory open-access publisher, which often deceptively names its journals (e.g. the junk Clinical Infectious Diseases: Open Access vs the legit Clinical Infectious Diseases). To help catch citations to its predatory journals with WP:JCW/TAR and Special:WhatLinksHere, redirects should be created. I have extracted the list of OMICS journals from its website, which I've put at User:Headbomb/OMICS. What should be done is take each of those entries and:

  • If Foobar doesn't exist, create it with
#REDIRECT[[OMICS Publishing Group]]
[[Category:OMICS Publishing Group academic journals]]
{{Confused|text=[[Foobar: Open Access]], published by the OMICS Publishing Group}}

There likely will be some misfires, but I can easily clean them up afterwards. Headbomb {t · c · p · b} 04:38, 29 June 2018 (UTC)
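A Pywikibot sketch of the creation step, following the spec above verbatim; the edit summary is illustrative:

import pywikibot

site = pywikibot.Site("en", "wikipedia")

BODY = """#REDIRECT[[OMICS Publishing Group]]
[[Category:OMICS Publishing Group academic journals]]
{{Confused|text=[[%s: Open Access]], published by the OMICS Publishing Group}}"""

def create_redirect(title):
    page = pywikibot.Page(site, title)
    if page.exists():                # only create missing titles, per the spec
        return
    page.text = BODY % title
    page.save(summary="Creating OMICS journal redirect per [[WP:BOTREQ]]")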

  • Would it be possible to tag the talk pages of these redirects with {{WPJournals}}? --Randykitty (talk) 07:46, 29 June 2018 (UTC)
It's not really needed, but that could be done, sure. Headbomb {t · c · p · b} 14:11, 29 June 2018 (UTC)
  • Would it be over-linking to make the "OMICS Publishing Group" in the hatnote into a wiki-link? XOR'easter (talk) 15:30, 29 June 2018 (UTC)
    • I'd be OK with that, personally. Foobar will point to OMICS Publishing Group so that wouldn't be super useful. But if Foobar is ever created, then the links would point to different places, and that might be useful. Headbomb {t · c · p · b} 15:38, 29 June 2018 (UTC)

OK, Coding... Tokenzero (talk) 13:34, 30 June 2018 (UTC)

BRFA filed. See also pastebin log of simulated run. Two questions: should the talk page with {{WPJournals}} be created for each redirect, or only the main one (without and/& or abbreviated variants)? Should the redirects be given any rcats? Tokenzero (talk) 19:28, 21 July 2018 (UTC)
I don't know that any rcats need to be added; I can't think of any worth adding (beyond {{R from ISO 4}}). As for the redirect talk pages, @Randykitty:'s the one that asked for them, so maybe he can elucidate (probably all). I don't really see the point in tagging those redirects myself, but it doesn't do any harm to tag them either. Headbomb {t · c · p · b} 21:08, 21 July 2018 (UTC)

Fix station layout tables[edit]

In the short term, could someone code AWB/Pywikibot/something to replace variations of <span style="color:white">→</span> (including variations with the font tag and &rarr;) with {{0|→}}? This is for station layout tables like the one at Dyckman Street (IND Eighth Avenue Line). Colouring the arrow white is not really ideal when the intention is to add padding the size of an arrow. I'm not sure why there are still so many of them around.

In the long term, it would be nice to clean up these station layout tables further, perhaps even by way of automatically converting them to a Lua-based {{Routemap}}-style template (which does not exist yet, unfortunately). Most Paris Metro stations' articles have malformed tables, for instance, and probably a majority of stations have deprecated or incorrect formatting somewhere. Jc86035 (talk) 04:03, 2 July 2018 (UTC)
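A sketch of the short-term replacement, covering the span/font and →/&rarr; variations; the pattern does not validate that the opening and closing tags actually pair up, so a real run should review diffs:

import re

WHITE_ARROW = re.compile(
    r"""<(?:span\s+style="?color:\s*white"?|font\s+color="?white"?)>"""
    r"""\s*(?:→|&rarr;)\s*</(?:span|font)>""",
    re.IGNORECASE)

def pad_arrows(wikitext):
    return WHITE_ARROW.sub("{{0|→}}", wikitext)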

@Jc86035: How do we find the articles - are they categorised? Or is it a "brute force" search and hope we find them all?
insource:/span style.*color:white.*→/ Gives 1818 Articles
insource: &rarr; Gives 1039 Articles.
Ronhjones  (Talk) 00:30, 4 July 2018 (UTC)
In the long run, I'd be happy to start sandboxing/discussing a way to make a station layout template/suite, and a bot run should be possible if it's shown that {{0}} is a better way to code it. Happy to do the legwork on the AWB side of things.
I do have a question, though: just perusing through pages, I found Ramakrishna Ashram Marg metro station, which has the "hidden" → but I'm not really sure why it's there - is it supposed to be padded like that? Also, for later searching, a search of the code Jc86035 posted (with and without ") gives 1396 pages. I know it misses text that isn't exactly the same, but it's probably a good estimate of the number of pages using → as described. Primefac (talk) 13:54, 4 July 2018 (UTC)
It may well be a simple run with AWB. I will play devil's advocate and ask: is there any MoS for a station page? It seems to me that the pages involved will all be US-based - there seems to be a big discrepancy with the layouts of equivalent-sized stations on my side of the pond, where most of the stations seem to have plumped for using succession boxes. I'm not saying either is correct, just asking: should there be a full discussion at, say, Wikipedia:WikiProject_Trains with regard to having a consistent theme for all? Maybe then it might be worth constructing a template system, like the route maps. Ronhjones  (Talk) 15:43, 4 July 2018 (UTC)
The station layout tables (such as that top left of Dyckman Street (IND Eighth Avenue Line)#Station layout) should not be confused with routeboxes (which are always a form of succession box). --Redrose64 🌹 (talk) 20:26, 4 July 2018 (UTC)
I was asking if there should not be a general station MoS, and maybe a template system for the station layout. The US subway stations all seem to have a layout table and no routeboxes, whereas, say, the London ones all have a routebox and no layout table. Maybe they need both. Also, just using a plain wikitable tends to result in inconsistent layouts. Ronhjones  (Talk) 16:53, 5 July 2018 (UTC)
@Ronhjones: In general I think it's a mess. Some station articles use the style of layout table used in most articles for stations in the US, some use the Japanese style with {{ja-rail-line}}, some use standard BSicon diagrams, some use Unicode diagrams, some probably use other styles, and some (like Taipei station) use a combination. I found an old discussion about some layout templates for the American style in Epicgenius's user space, but no one's written a Scribunto module with them (or used them in articles) yet. Jc86035 (talk) 17:48, 5 July 2018 (UTC)
Regarding a general MoS for stations: there certainly is inconsistency between countries, but within a country there is often consistency.
For articles on UK stations, we agreed a few years ago that layout diagrams were not to be encouraged. Reasons for this include: (i) in most cases, there are two platforms and the whole plan may be replaced with text like "The main entrance is on platform 1, which is for services to X; there is a footbridge to platform 2, which is for services to Y."; (ii) for all 2,500+ National Rail stations, their website provides a layout diagram which is more detailed than anything that we can do with templates (examples of a simple one-platform station; a major terminus); (iii) trying to draw a layout plan for London Underground stations is fraught with danger, since a significant number have their platforms at different levels or angles, even crossing over one another in some cases. --Redrose64 🌹 (talk) 18:23, 5 July 2018 (UTC)
I know, I was born in Barking... District East on Pt.2 (train doors open both sides), Hammersmith & City line ends Pt.3 (down the end Pt.2), District West on Pt.6... :-) Ronhjones  (Talk) 22:24, 5 July 2018 (UTC)
Since I was pinged in Jc86035's comment, I suppose I'll put in my two cents. I experimented with a modular set of station layout templates a couple of years ago. (See all the pages in Special:PrefixIndex/User:Epicgenius/sandbox that start with "User:Epicgenius/sandbox/Station [...]".) This was based on {{TransLink (BC) station layout}}, which is used for SkyTrain (Vancouver) stations and is itself modular: several instances of the template can be combined to construct a SkyTrain station layout. epicgenius (talk) 23:33, 5 July 2018 (UTC)
@Primefac: Sorry I didn't reply about this earlier. The use of the "hidden" arrow on Ramakrishna Ashram Marg metro station is actually completely wrong, since it is in the wrong table cell, and the visible right arrow also being in the wrong place makes it pointless. A more correct example is in Kwai Hing station. Jc86035 (talk) 09:14, 6 July 2018 (UTC)
Ah, I see. I guess that would be my main concern, though I suppose the GIGO argument could be made... but then again I like to try and minimize the number of false positives that later have to be re-edited by someone else. Primefac (talk) 17:41, 6 July 2018 (UTC)

Bot to deliver Template:Ds/alert[edit]

Headbomb {t · c · p · b} 21:13, 2 July 2018 (UTC)

Update Ontario Restructuring Map URLs in Citations[edit]

Request replacing existing instances of the URLs for Ontario Restructuring Maps in citations. While the old URLs work, the new maps employ a new URL nomenclature system at the Ministry of Municipal Affairs and Housing (Ontario) and have corrected format errors that make the new versions easier to read. The URLs should be replaced as follows:

Map # Old URL New URL
Map 1 http://www.mah.gov.on.ca/Asset1605.aspx http://www.mah.gov.on.ca/AssetFactory.aspx?did=6572
Map 2 http://www.mah.gov.on.ca/Asset1612.aspx http://www.mah.gov.on.ca/AssetFactory.aspx?did=6573
Map 3 http://www.mah.gov.on.ca/Asset1608.aspx http://www.mah.gov.on.ca/AssetFactory.aspx?did=6574
Map 4 http://www.mah.gov.on.ca/Asset1606.aspx http://www.mah.gov.on.ca/AssetFactory.aspx?did=6575
Map 5 http://www.mah.gov.on.ca/Asset1607.aspx http://www.mah.gov.on.ca/AssetFactory.aspx?did=6576
Map 6 http://www.mah.gov.on.ca/Asset1611.aspx http://www.mah.gov.on.ca/AssetFactory.aspx?did=6577

Thanks. --papageno (talk) 23:41, 5 July 2018 (UTC)
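Since the mapping is a fixed six-row table, the replacement reduces to a dictionary walk; a minimal sketch:

OLD_TO_NEW = {
    "http://www.mah.gov.on.ca/Asset1605.aspx": "http://www.mah.gov.on.ca/AssetFactory.aspx?did=6572",
    "http://www.mah.gov.on.ca/Asset1612.aspx": "http://www.mah.gov.on.ca/AssetFactory.aspx?did=6573",
    "http://www.mah.gov.on.ca/Asset1608.aspx": "http://www.mah.gov.on.ca/AssetFactory.aspx?did=6574",
    "http://www.mah.gov.on.ca/Asset1606.aspx": "http://www.mah.gov.on.ca/AssetFactory.aspx?did=6575",
    "http://www.mah.gov.on.ca/Asset1607.aspx": "http://www.mah.gov.on.ca/AssetFactory.aspx?did=6576",
    "http://www.mah.gov.on.ca/Asset1611.aspx": "http://www.mah.gov.on.ca/AssetFactory.aspx?did=6577",
}

def update_map_urls(wikitext):
    for old, new in OLD_TO_NEW.items():
        wikitext = wikitext.replace(old, new)
    return wikitext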

Qui1che, are these six URLs the sum total of the links that need to be changed? Primefac (talk) 23:43, 5 July 2018 (UTC)
That is correct. There are only six maps in the series. --papageno (talk) 01:29, 6 July 2018 (UTC)

Done GreenC 16:38, 20 July 2018 (UTC)

Ancient Greece[edit]

Not exactly bot work, but you bot operators tend to be good with database dumps. Can someone give me a full list of categories that have the string Ancient Greece in their titles and don't begin with that string, and then separate the list by whether it's "ancient" or "Ancient"? I'm preparing to nominate one batch or another for renaming (I have a CFD going, trying to establish consensus on which is better), and if you could give me a full list I'd know which ones I need to nominate (and which false positives to remove) if we get consensus.

If I get to the point of nominating them, does someone have a bot that will tag a list of pages upon request? I'll nominate most of the categories on one half of the list you give me, and there are a good number, so manually tagging would take a good deal of work. Nyttend (talk) 00:01, 6 July 2018 (UTC)

quarry:query/28035
Category Caps Notes
Category:Ambassadors of ancient Greece Lower Redirect to Category:Ambassadors in Greek Antiquity
Category:Articles about multiple people in ancient Greece Lower
Category:Artists' models of ancient Greece Lower
Category:Arts in ancient Greece Lower
Category:Athletics in ancient Greece Lower CfR: Category:Sport in ancient Greece
Category:Battles involving ancient Greece Lower
Category:Cities in ancient Greece Lower
Category:Coins of ancient Greece Lower
Category:Economy of ancient Greece Lower
Category:Education in ancient Greece Lower
Category:Eros in ancient Greece Lower Redirect to Category:Sexuality in ancient Greece
Category:Films set in ancient Greece Lower
Category:Glassmaking in ancient Greece and Rome Lower Redirect to Category:Glassmaking in classical antiquity
Category:Gymnasiums (ancient Greece) Lower
Category:Historians of ancient Greece Lower
Category:History books about ancient Greece Lower CfR: Category:History books about Ancient Greece
Category:Military ranks of ancient Greece Lower
Category:Military units and formations of ancient Greece Lower
Category:Naval battles of ancient Greece Lower
Category:Naval history of ancient Greece Lower
Category:Novels set in ancient Greece Lower
Category:Operas set in ancient Greece Lower
Category:Pederasty in ancient Greece Lower
Category:Plays set in ancient Greece Lower
Category:Political philosophy in ancient Greece Lower
Category:Portraits of ancient Greece and Rome Lower
Category:Prostitution in ancient Greece Lower
Category:Set indices on ancient Greece Lower
Category:Sexuality in ancient Greece Lower
Category:Ships of ancient Greece Lower
Category:Slavery in ancient Greece Lower
Category:Social classes in ancient Greece Lower
Category:Television series set in ancient Greece Lower
Category:Wars involving ancient Greece Lower
Category:Wikipedians interested in ancient Greece Lower
Category:Comics set in Ancient Greece Upper
Category:Festivals in Ancient Greece Upper
Category:Military history of Ancient Greece Upper
Category:Military units and formations of Ancient Greece Upper Redirect to Category:Military units and formations of ancient Greece
Category:Museums of Ancient Greece Upper
Category:Populated places in Ancient Greece Upper
Category:Transport in Ancient Greece Upper
Category:Works about Ancient Greece Upper CfR: Category:Works about ancient Greece
@Nyttend: See list above. If you tag one, I can tag the rest. — JJMC89(T·C) 04:53, 6 July 2018 (UTC)
You might also want to search for "[Aa]ncient Greek", as is Category:Scholars of ancient Greek history and Category:Ancient Greek historians. – Jonesey95 (talk) 05:06, 6 July 2018 (UTC)

Suspicious User Watcher[edit]

Watches suspicious users because they might wreak havoc on the wiki. The bot reports back to the operator(s) so they know what the user is doing, just in case the user is committing vandalism, or anything else. The bot finds suspicious users by seeing if they vandalized (or, as I mentioned before, anything else) past the 2nd warning. Manual bot. — Preceding unsigned comment added by SandSsandwich (talk • contribs) 08:19, 9 July 2018 (UTC)

Idea is not well explained. I have a funny feeling this is a joke request anyway, but whatever. Primefac (talk) 12:02, 9 July 2018 (UTC)
@SandSsandwich: that would be a lot of work. But to begin with, how should the bot decide/recognise which users are suspicious? —usernamekiran(talk) 13:01, 11 July 2018 (UTC)
I mean, technically speaking, we already have an anti-vandal bot. Primefac (talk) 17:13, 11 July 2018 (UTC)
Do you mean ClueBot or an actual anti-vandal bot? Who we had to retire, cuz he was getting extraordinarily intelligent (special:diff/83573345). I mean, he had unlimited access to the entire Wikipedia after all. Anyway, this idea is not very feasible: a bot generating a list of users (obviously after observing the contribution history of every non- or newly-autoconfirmed user), then posting this list somewhere, and other humans examining these users. Too much work for nothing; too many resources would be wasted. The current huggle/cluebot/RCP/watchlist/AIV pattern is better than this. —usernamekiran(talk) 01:25, 12 July 2018 (UTC)
We already have multiple tools that are used to give increasing attention to editors after 1, 2, 3 or 4 warnings. I'm not sure whether an additional process is required, or why 2 warnings is such a significant threshold. ϢereSpielChequers 09:30, 16 July 2018 (UTC)

Association footballers not categorized by position[edit]

Would it be possible to fill Category:Association footballers not categorized by position with the intersection of:

  • AND all players not in the following 15 football position categories:

Some members of WP:FOOTY have been working on adding missing positions; this would be much appreciated in order to see all players which are missing a position category. Thanks, S.A. Julio (talk) 04:36, 14 July 2018 (UTC)

@S.A. Julio: Is this task expected to be a "one-off" run, or do you see it being a regular task? A one-off could possibly be several runs on AWB. The number of players involved could also be quite big - do you know the total in the 9 categories? I got over 9000 for the first one. Ronhjones  (Talk) 15:57, 14 July 2018 (UTC)
@Ronhjones: I think probably a one-off. I can periodically check if any new articles are missing positions in the future, but currently there are far too many which would need to be added to Category:Association footballers not categorized by position. I've currently counted ~160,000 articles (some of which are likely not players, however), with two categories still running (though most of these are likely duplicates of what I've already counted). There are just over 113,000 players already categorised by position. S.A. Julio (talk) 16:58, 14 July 2018 (UTC)
Coding... @S.A. Julio: That's quite a few pages - too many for semi-automated runs, so I think I'll skip the AWB option. I'll probably do it so it can be re-run, say quarterly. I think I'll do it in stages - make a local list of players, and then process that file one line at a time. I assume that if I combine all those categories we will end up with duplicates, which will need to be removed? Ronhjones  (Talk) 18:59, 14 July 2018 (UTC)
@Ronhjones: I realised a simpler option might be to use only Category:Association football players by nationality and Category:Women's association football players, and look at the players one level down. Theoretically, every player should be in the top-level category of their nationality (even if they fall under a subcategory as well). In reality I'm sure there are some pages which are improperly categorised, though likely a very small number. And most articles should be players (only a few outliers such as Footballer of the Year in Germany and List of naturalised Spanish international football players). S.A. Julio (talk) 19:26, 14 July 2018 (UTC)
@S.A. Julio: If the bot were going to run daily, then maybe a change might be beneficial, but not by much, as an API call can only get 5000 page titles per call, so it needs multiple calls anyway. For an occasionally running bot, it might be better to ensure you get them all. A test for "Association football player categories" above has given me 162,233 pages in 14,642 categories - does that sound right? How can we eliminate the non-player articles? I can see a few answers (you might know better)...
  1. Do nothing. Just ignore the non-player pages that gets added to the category
  2. Find the pages and add a "Nobots" template to the page to deny RonBot access
  3. Find the pages and put them in a category, say, "Category:Association football support page", we can then add that category to the exclusion list. I like this one, it means that if someone creates a new page and it gets added by the bot to the category, then adding the new cat will ensure it gets removed on the next run.
  4. I did think of a search ' insource: "Infobox football biography" ' - that gives 153,074 pages, can one be sure that every page has that code - looks like a big difference from the pages I found? If so we could have used that to get the first list! :-)
Other than that, after seeing how well the code works, the overall plan for processing will be...
  1. Get all in "Association football position categories" and keep as a list in memory
  2. Get "Category:Association footballers not categorized by position", and check that they all should be there (i.e. no match with above list) - if there is a match then remove the cat from that page, if not add to the list, so we don't have to try to add it again.
  3. Get all in "Association football player categories", and only keep only those that don't match the other list
  4. From the resulting "small" list, edit all the pages to add the required category.
Ronhjones  (Talk) 15:24, 15 July 2018 (UTC)
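A rough Pywikibot sketch of the set arithmetic in steps 1–3 of that plan; the two category lists below are illustrative stand-ins for the 15 position categories and the player categories discussed above:

import pywikibot

site = pywikibot.Site("en", "wikipedia")

def titles_in(cat_names, recurse=False):
    """Union of article titles across the given categories."""
    titles = set()
    for name in cat_names:
        cat = pywikibot.Category(site, "Category:" + name)
        titles.update(p.title() for p in cat.articles(recurse=recurse))
    return titles

POSITION_CATS = ["Association football goalkeepers"]          # ...and 14 more
PLAYER_CATS = ["Association football players by nationality"]

missing = (titles_in(PLAYER_CATS, recurse=1)   # one level down, per above
           - titles_in(POSITION_CATS, recurse=True)
           - titles_in(["Association football player support pages"]))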
@Ronhjones: I think option #3 sounds best; this could also be useful for other operations (like finding players not categorised by nationality). With option #4 the issue is that a small percentage of these articles are missing infoboxes (I even started Category:German footballers needing infoboxes a while back for myself to work on). I've started to gather a list of articles which should be excluded; I could begin adding them to a category - would it go on the bottom of the page or on the talk page (similar to categories like Low-importance Germany articles)? And sounds like a good plan, thanks for the help! S.A. Julio (talk) 20:02, 15 July 2018 (UTC)
@S.A. Julio: It will be easier to code with it on the article page - I use the same call over and over again, just keep changing the cat name (User:RonBot/7/Source1) - which returns me the page names. I'll crack on with the plan. Let me know what you call the new category. I'll probably do a trial for the second set tonight, and see how many we get. The comparison of the two numbers will give you the true number of pages that will end up in Category:Association footballers not categorized by position. Ronhjones  (Talk) 20:22, 15 July 2018 (UTC)
17 cats found, 113279 pages contained. That means 162233-113279=48954 pages to examine. Ronhjones  (Talk) 21:02, 15 July 2018 (UTC)
Dummy run comes up with a similar figure. I've put the data in User:Ronhjones/Sandbox5 - NB: I used Excel to sort it, so it may have corrupted some accented characters. Ronhjones  (Talk) 02:35, 16 July 2018 (UTC)
@Ronhjones: Alright, I've adjusted some categories which shouldn't have been under players, the next run should have a few less articles. I created Category:Association football player support pages, and now have a list at User:S.A. Julio/sandbox of pages which need to be added. Could a bot categorise these? S.A. Julio (talk) 04:10, 16 July 2018 (UTC)
I can do a semi-auto AWB run for that list later. Ronhjones  (Talk) 12:38, 16 July 2018 (UTC)
930 pages done - one was already there, and I did not do the two categories - I will only be adding the category to pages in main-space, so cats, templates, drafts, userpages, etc. don't need to go into Category:Association football player support pages. You can have them there if you want - it's not an issue, just less to add. Ronhjones  (Talk) 15:56, 16 July 2018 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── @Ronhjones: Alright, thanks! I think the issue is that there were two articles redirecting to the category mainspace. List of Eastleigh F.C. players was inadvertently categorised (missing a colon), and List of Australia national association football team players should have redirected to an already existing article. Now fixed. S.A. Julio (talk) 16:38, 16 July 2018 (UTC)

@S.A. Julio: OK, I'll try another dummy run later, with the new category in the "exclusion" list. We should end up with about 1000 less matches! I'll sort the list in python before exporting, then put in my sandbox again. I'll also do a couple of tests of the "add cat" and "remove cat" subroutines in user-space, just to check the code works OK. Then once you are happy that you have put all the "odd" pages in Category:Association football player support pages, then it will be time to apply for bot approval. Ronhjones  (Talk) 17:54, 16 July 2018 (UTC)
Please also see User_talk:Ronhjones#Category:Association_football_player_support_pages. Do have a re-think about the name, and let me know. Maybe "Association football non-biographical". Ronhjones  (Talk) 19:55, 16 July 2018 (UTC)
@Ronhjones: Sounds good. I've been going through some category inconsistencies, one of which is stub sorting. For example, Category:English women's football biography stubs is categorised under Category:English women's footballers, yet "football biographies" is not necessarily limited to players (can include managers, referees, officials/administrators etc.). I'm working on fixing the category structure, hopefully should be finished relatively soon. Regarding the name, what about Category:Association football player non-biographical articles? Or another title? S.A. Julio (talk) 07:12, 17 July 2018 (UTC)
@S.A. Julio: Tweaked code to only retrieve articles and categories. Now there are 161,259 articles; 47,013 are not matching - User:Ronhjones/Sandbox5 (not Excel-sorted this time, names look OK). I agree on the cat name - will add a request for the cat move later (and let a bot move them). Ronhjones  (Talk) 12:47, 17 July 2018 (UTC)
Now listed Wikipedia:Categories_for_discussion#Current_nominations. As soon as the move is finished, I will file the BRFA. Ronhjones  (Talk) 15:46, 17 July 2018 (UTC)
@Ronhjones: Alright, perfect. The other day I added the position category for ~1700 articles, so the list should be slightly shorter now. I'll now finish working on fixing the stub category structure, hopefully there will be less results for the next run (like A. H. Albut, who was only a manager). S.A. Julio (talk) 17:01, 18 July 2018 (UTC)
@S.A. Julio: Less is better :-) I will add that my bot task 3 just adds a template to pages, and usually manages 8-10 pages a minute, so expect quite a long run when we get approval - 8 pages a minute is 4 days for 45,000 articles - but of course only for the first run; subsequent runs will be much faster as there will be far fewer pages to tend to (plus the basic overhead of getting the page lists - 2h). Ronhjones  (Talk) 17:38, 18 July 2018 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── BRFA filedRonhjones  (Talk) 19:53, 19 July 2018 (UTC)

HTML errors on discussion pages[edit]

Is anyone going to be writing a bot to fix the errors on talk pages related to RemexHtml?

I can think of these things to do, although there are probably many more things and there might be issues with these ones.

  • replace non-nesting tags like <s><s> with <s></s> where there are no templates or HTML tags between those tags
  • replace <code> with <pre> where the content contains newlines and the opening tag does not have any text between it and the previous <br> or newline
  • replace <font color=#abcdef></font> with <span style="color:#abcdef"></span> where the content does not only contain a single link and nothing else

Jc86035 (talk) 13:50, 15 July 2018 (UTC)
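As one concrete example, a hedged sketch of the third item (font → span), with the single-link caveat applied; the other two replacements would need similarly conservative patterns, and nested font tags are not handled here:

import re

FONT_HEX = re.compile(
    r'<font color="?(#[0-9a-fA-F]{3,6})"?>(.*?)</font>', re.DOTALL)

def font_to_span(wikitext):
    def repl(m):
        color, inner = m.groups()
        if re.fullmatch(r"\s*\[\[[^\[\]]*\]\]\s*", inner):
            return m.group(0)   # single bare link: leave alone, per the caveat
        return '<span style="color:%s">%s</span>' % (color, inner)
    return FONT_HEX.sub(repl, wikitext)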

Changing font tags to span tags is a bit of a losing battle when people can still have font tags in their signatures. -- WOSlinker (talk) 13:48, 16 July 2018 (UTC)
I was under the impression all of the problematic users had been dealt with, and all that was left was cleaning up extant uses. Primefac (talk) 17:24, 16 July 2018 (UTC)
Unfortunately not. There are a couple of unworked tasks on this point in phab, but I've seen a few editors lately who use obsolete HTML. (Exact queries are somewhere on WT:Linter, I think.) --Izno (talk) 05:47, 17 July 2018 (UTC)

What is the use of fixing the old messages? I suggest you leave them as they are; it is not useful to change them. Best regards -- Neozoon 16:16, 21 July 2018 (UTC)

Missing big end tags in Books and Bytes newsletters[edit]

Around 1300 pages linking to Wikipedia:The Wikipedia Library/Newsletter/October2013 contain <center><big><big><big>'''''[[Wikipedia:The_Wikipedia_Library/Newsletter/October2013|Books and Bytes]]'''''</big> after a misformatted issue 1 of a newsletter.[3] I guess it looked OK before Remex, but now it gives an annoyingly large font on the rest of the page. A few of the pages have been fixed by adding the missing end tags. It happened again in issue 4 (only around 200 cases), linking to Wikipedia:The Wikipedia Library/Newsletter/February2014 with <center><big><big><big>'''''[[Wikipedia:The_Wikipedia_Library/Newsletter/February2014|Books and Bytes]]'''''</big>.[4] None of the other issues have the error. The 200 issue 4 cases could be done with AWB, but a bot would be nice for the 1300 issue 1 cases. Many of the issue 4 cases are on pages which also have issue 1, so a bot could fix both at the same time. PrimeHunter (talk) 00:36, 19 July 2018 (UTC)

I like how this problem magnifies with each newsletter addition (see User_talk:Geraki#Books_and_Bytes:_The_Wikipedia_Library_Newsletter). Wonder why issue 5 is bigger than issue 4? -- GreenC 20:48, 19 July 2018 (UTC)
wikiget -f -w <article name> | awk '{sub(/Books[ ]and[ ]Bytes[ ]*(\]){2}[ ]*(\x27){5}[ ]*[<][ ]*\/[ ]*big[ ]*[>][ ]*$/,"Books and Bytes]]\x27\x27\x27\x27\x27</big></big></big>",$0); print $0}' | wikiget -E <article name> -S "Fix missing </big> tags, per [[Wikipedia:Bot_requests#Missing_big_end_tags_in_Books_and_Bytes_newsletters|discussion]]" -P STDIN
It looks uncontroversial, and 1500 talk pages isn't that much. Will wait a day or so to make sure no one objects. -- GreenC 22:45, 19 July 2018 (UTC)
test edit. -- GreenC 13:26, 20 July 2018 (UTC)
Small trout, well perhaps a minnow, for not having the bot check that each instance it was "fixing" was actually broken. Special:Diff/850668962/851324159 not only serves no purpose, but is wrong to boot. If you're going to fix a triple tag, it would have been trivial to check for a triple tag in the regex. Storkk (talk) 14:35, 21 July 2018 (UTC)

Done -- GreenC 15:21, 21 July 2018 (UTC)

Populate Selected Anniversaries with Jewish (and possibly Muslim) Holidays[edit]

Right now there is a manual process: go year by year to last year's date, remove the Jewish holiday, then find the correct Gregorian date for this year and add it. This is because Jewish and Muslim holidays are not based on the Gregorian calendar. There is a website, http://www.hebcal.org, that lists the Gregorian dates for the respective Jewish holidays. I suggest a bot take that list and update the appropriate dates for the current/next year. Sir Joseph (talk) 19:29, 20 July 2018 (UTC)
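A hedged sketch of the fetch step in Python; the endpoint, query parameters, and JSON field names below are assumptions about Hebcal's REST interface (hosted at hebcal.com) and must be verified against its documentation before any bot run:

import requests

def jewish_holidays(year):
    """Fetch major Jewish holidays with their Gregorian dates for a year."""
    resp = requests.get(
        "https://www.hebcal.com/hebcal",
        params={"v": 1, "cfg": "json", "maj": "on", "year": year},
        timeout=30)
    resp.raise_for_status()
    return {item["title"]: item["date"] for item in resp.json()["items"]}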