
Wikipedia:Bots/Requests for approval

From Wikipedia, the free encyclopedia

If you want to run a bot on the English Wikipedia, you must first get it approved. To do so, follow the instructions below to add a request. If you are not familiar with programming, it may be a good idea to ask someone else to run a bot for you rather than running your own.

 Instructions for bot operators

Current requests for approval

WP:BRFA/BHGbot_3

BHGbot 3

Operator: BrownHairedGirl (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 05:17, Friday, January 20, 2017 (UTC)

Automatic or Manually Assisted: Automatic, supervised

Programming Language(s): AutoWikiBrowser with custom modules

Source code available:

Function Overview: 1) creating category redirects

Links to relevant discussions (where appropriate):

Estimated number of pages affected: initially one run of several thousand pages; possibly more later

Edit period(s): initially one run

Already has a bot flag (Y/N): N

Function Details: Category:Organizations by country has thousands of sub-categories, whose names may be spelt either "organizations" (with a Z) or "organisations" (with an S). Since this is a WP:ENGVAR issue, only some countries (see WP:TIES) have a convention for one form or the other; the others use whatever form the creator chose. This randomness of title makes categorisation unnecessarily difficult.
I have used AWB to generate a list of the categories involved, from which I have generated a mirror-image list (e.g. if we have Category:Foo organisations in Bar, my list contains Category:Foo organizations in Bar, or vice versa). My AWB module simply takes the category name, skips the page if it already exists, and otherwise parses the name to create a {{Category redirect|Foo organizations in Bar}} or vice versa. I have tested it on the first level of sub-categories (see these 40 edits[1], where I forgot to switch to skip-if-exists for the first few edits).
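
A rough illustration of the logic described above, written as a Python/Pywikibot sketch for clarity only (this is not the actual AWB custom module, and the helper names are invented):

    import pywikibot

    site = pywikibot.Site('en', 'wikipedia')

    def mirror_title(title):
        # Swap the -ise/-ize spelling in a category title (illustrative only).
        if 'organisations' in title:
            return title.replace('organisations', 'organizations')
        return title.replace('organizations', 'organisations')

    def create_category_redirect(existing_title):
        # existing_title is e.g. 'Category:Foo organisations in Bar'
        redirect = pywikibot.Page(site, mirror_title(existing_title))
        if redirect.exists():  # skip-if-exists, as in the AWB module
            return
        target = existing_title[len('Category:'):]
        redirect.text = '{{Category redirect|%s}}' % target
        redirect.save(summary='Creating {{Category redirect}} for the alternative spelling')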

BHGbot does not currently appear to have bot authorisation for AWB, and I will need that to do this task. Thanks.

Note that I used the BHGbot account years ago for Wikiproject tagging, with a custom Perl module. I have no current plans to resume that task, but the bot flag has been removed due to inactivity. If this task is approved, I would like that flag to be restored. Thanks. --BrownHairedGirl (talk) • (contribs) 05:08, 20 January 2017 (UTC)

Discussion

  • Hello BHG, mass creation of pages/redirects has been shown to be contentious in the past. Has a discussion taken place anywhere (please link it in the header above) where the community has shown support for the thousands of pages you want to create? — xaosflux Talk 15:58, 21 January 2017 (UTC)

WP:BRFA/Wiki Feed Bot

Wiki Feed Bot

Operator: Fako85 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 18:57, Wednesday, January 11, 2017 (UTC)

Automatic, Supervised, or Manual: Supervised

Programming language(s): Python

Source code available: https://github.com/fako/datascope

Function overview: Get information from the API in batches. Edit a page when a user clicks a link. Notify users on their talk pages with an automated message.

Links to relevant discussions (where appropriate):

Edit period(s): It will edit a page when a user clicks a link that the user has added to the page in wikitext.

Estimated number of pages affected: Depends on how popular the tool will become. Each user will typically have one feed.

Exclusion compliant (Yes/No): Yes, it only edits pages where editors place the relevant link. It does not check for the bots template, but it will check that the page is in the user namespace.

Already has a bot flag (Yes/No):

Function details:

Demo

You can see the tool in action on its demo page.

Bot read rights

The Wiki Feed system preprocesses information once a day. It fetches all recent changes from the previous day, groups them by page and then starts getting meta information about these pages. It gets this information from the API and other services like Wikidata and, in the future, the Pageview API.

To be able to do this as efficiently as possible, the Wiki Feed Bot would like bot read rights to fetch 5,000 items in one go. The bot reads information for about 40,000 pages each day. Currently it does not respect the maxlag parameter; I'm making this a high priority for the system [2]
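
For illustration, a minimal sketch of what a batched recentchanges read looks like with maxlag and a proper User-Agent (this is not the actual Wiki Feed code; the User-Agent string and the retry delay are placeholders):

    import time
    import requests

    API = 'https://en.wikipedia.org/w/api.php'
    HEADERS = {'User-Agent': 'WikiFeedBot/0.1 (placeholder contact info)'}

    def recent_changes(rcstart, rcend):
        params = {
            'action': 'query', 'list': 'recentchanges',
            'rcstart': rcstart, 'rcend': rcend,
            'rclimit': 'max',   # 500 normally, 5,000 with apihighlimits (bot) rights
            'maxlag': 5,        # ask the servers to refuse the request when lagged
            'format': 'json', 'formatversion': 2,
        }
        while True:
            data = requests.get(API, params=params, headers=HEADERS, timeout=30).json()
            if data.get('error', {}).get('code') == 'maxlag':
                time.sleep(5)   # back off and retry
                continue
            yield from data['query']['recentchanges']
            if 'continue' not in data:
                break
            params.update(data['continue'])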

Currently Wiki Feed does not use the RCStream. We're considering it, but we need some time to implement this as it requires a fair amount of changes to the system.

Edit of pages in users namespace

To use Wiki Feed, people need to paste some wikitext onto a page in their own user space. This wikitext includes a link. When they click this link they are taken to the algo-news tool on Labs. The tool makes the user wait until it is done. Once the feed has been calculated, the results are added to the page the link originated from and the user is redirected back to their user page. The redirect depends on the link that is on the user page. We're making this more reliable at the moment [3]

Notify users

Instead of making users wait for the tool to complete the feed, we can add a message to their talk page as soon as the feed is ready. This allows users to continue with what they are doing and come back later. This mechanism is not yet supported, but we'd like to ask permission for it now, as this is almost certainly a feature we will need [4]
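
A minimal sketch of the kind of talk-page notification being asked about (Pywikibot-style Python for illustration only; the section heading and wording are invented):

    import pywikibot

    site = pywikibot.Site('en', 'wikipedia')

    def notify_feed_ready(username, feed_page_title):
        talk = pywikibot.Page(site, 'User talk:%s' % username)
        talk.text += ('\n\n== Your Wiki Feed is ready ==\n'
                      'The feed you requested has been written to [[%s]]. ~~~~'
                      % feed_page_title)
        talk.save(summary='Wiki Feed Bot: notifying that the requested feed is ready')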

Discussion

  • Note: This request specifies the bot account as the operator. A bot may not operate itself; please update the "Operator" field to indicate the account of the human running this bot. AnomieBOT 19:09, 11 January 2017 (UTC)
  • Note: This bot appears to have edited since this BRFA was filed. Bots may not edit outside their own or their operator's userspace unless approved or approved for trial. AnomieBOT 19:09, 11 January 2017 (UTC)
  • I just looked at the code, and when I click on the "force refresh" link on the sample feed this code runs. It looks like you're using the requests library without a User-Agent header or maxlag parameter. Are there plans to add both before this bot hits production? Enterprisey (talk!) 19:13, 11 January 2017 (UTC)
  • I've updated the operator Wiki Feed Bot (talk) 20:05, 11 January 2017 (UTC)
  • As far as I know this bot has not been editing outside of its or my userspace. There is no mechanism in place to enforce this though, so it could be misused, but I was not expecting that to happen. I'll look into where this edit was made and report on this discussion thread. Wiki Feed Bot (talk) 20:05, 11 January 2017 (UTC)
  • I'm making maxlag a high priority; I only discovered its existence through this approval process. I'll make a ticket for the user agent. Both will be in place before we start announcing Wiki Feed to the public. See: [5] & [6] Wiki Feed Bot (talk) 20:05, 11 January 2017 (UTC)
  • Usersearch does not reveal any edits on enwiki for this user. I don't know what AnomieBOT found (maybe these approval pages?) and whether things have already been reverted. Wiki Feed Bot (talk) 20:09, 11 January 2017 (UTC)
  • Wiki Feed Bot, Special:Contributions/Wiki Feed Bot is what AnomieBOT is looking at. You should stop using your bot account to contribute to this BRFA, since (see WP:BOTACC) you should be using your regular account (Fako85, I assume) for responding to these. Enterprisey (talk!) 20:16, 11 January 2017 (UTC)
  • Ok, will do, but I can't edit Wiki Feed Bot's user page with my own account, because I haven't made enough edits. It would be great if that's possible, but otherwise I'll keep switching accounts. Fako85 (talk) 20:20, 11 January 2017 (UTC)
    Fako85, you can get that permission manually by becoming confirmed; see WP:RFP/C for instructions on how to do that. Enterprisey (talk!) 20:34, 11 January 2017 (UTC)
  • I'm highly skeptical on whether we should grant a bot flag to a bot run by an editor who isn't yet even autoconfirmed. Given the amount of damage that can be done with a bot account, bot operators are typically editors who have been around for at least a little while and built up trust with the community. ~ Rob13Talk 21:03, 11 January 2017 (UTC)
    I'm here at the dev summit of my own accord, having flown in from Europe. Surely that counts for something. Sitting at table #10 if you want to say hi. Fako85 (talk) 21:21, 11 January 2017 (UTC)
    Also my partner in this is Ed Saperia who organized Wikimania 2014 Fako85 (talk) 21:24, 11 January 2017 (UTC)
  • So I'm actually at the Wikimedia Developer Summit and was able to chat with Fako85 about this. The idea is pretty neat – you can do cool things like get the most edited articles in a certain category over the past day, or get a list of recent articles documenting natural disasters, sorted by number of deaths. There is a web interface for the news feed, but Fako and Ed were hoping to bring it to the wiki as a subscription service. I personally think this could be useful, e.g. WikiProject Women could have a dedicated page that lists the most recent articles on Women, or the most recently edited by number of pageviews, etc.
    For now I'd like to put this BRFA on hold until the tool is more developed and we are able to discuss the idea further with the community. Given it would be subscription-only, I don't think it's particularly controversial, but the community may have input on how it should function. We should also respect community norms that we generally don't grant advanced rights to new-ish users. In that regard I can at least offer my word that the project Fako and Ed are working on is legitimate, and I do not think they are going to use the bot account to intentionally disrupt the wiki MusikAnimal talk 22:15, 11 January 2017 (UTC)
    So I've been thinking about this more, and even though it is a subscription service, I think we should account for any potential misuse. My understanding, and correct me if I'm wrong Fako85, is that you subscribe by adding a configured link (that points to Tool Labs) to a wiki page, then click on the link. That will trigger the bot to update the page with the requested results. For this reason there are a few safeguards we should put in place:
    • For the userspace, the bot should only edit the page if the link was added by that user. This prevents a vandal from adding the link to someone's user page and making the bot add some unwanted content.
    • For now, the bot should only edit the userspace. If people show interest, we could extend this to the Wikipedia namespace (e.g. WikiProjects), and perhaps the template namespace. At the very least, the mainspace is a strict no-no.
    • If and when we do extend this to WikiProjects (and all of the Wikipedia or Template namespace), we'll want some sort of approval process. Again, a vandal could make the bot add unwanted, potentially offensive content unrelated to the WikiProject.
    I'm not sure what the best approach is for the last point (having an approval process), but first we should consult a few major WikiProjects and see if they are interested. I'm going to talk to Fako more about this while we're here at the dev summit, and there happen to be some WikiProject experts here as well who I'm sure will have something to say. I will ask any in-person participants to comment here as needed (rather than me speaking for them) MusikAnimal talk 22:43, 11 January 2017 (UTC)
  • Talking to MusikAnimal about this we came up with a better way to include feeds on pages. In short: people will need to add a template to their user pages and we'll check if this template has indeed been added by the user to prevent misuse. The process is more precisely described in this ticket: [7] Wiki Feed Bot (talk) 00:28, 12 January 2017 (UTC)
  • To summarize the discussion so far: we'll be looking for people in the community who want to use this. So far responses have been enthusiastic. We need to implement these tickets before going live:
    Thanks everybody for the feedback. It has been very helpful Wiki Feed Bot (talk) 00:28, 12 January 2017 (UTC)
    Reminder to use your personal account when editing as a human! :) MusikAnimal talk 00:35, 12 January 2017 (UTC)
  • Regarding loading images onto these pages - what kind of check are you doing to ensure that fair-use images are not used? — xaosflux Talk 02:58, 12 January 2017 (UTC)
    It gets all the info through the API. No external images are being used. I saw a recent change in the API where some (pageprop) images are postfixed with _free and some aren't. Is that related to this topic? Currently the system uses the _free images and ignores the others. If possible I would like to show an image whenever one is available of course (even if it's not "free"), but I don't understand the policies completely. Fako85 (talk) 21:56, 12 January 2017 (UTC)
    The policies are that enwiki-hosted images may be "fair use", and as such they can not normally be placed on pages such as user pages, project pages, etc. commons: does not have fair use, so it is always safe to use a file from Commons, but for an image from enwiki you would need to examine the licensing restrictions before including it on user pages. — xaosflux Talk 02:44, 13 January 2017 (UTC)
    Good point Xaosflux. I didn't know about these policy requirements. However, they seem to have recently changed the behavior of the API (Nov 30, 2016), as described in this ticket. I'll make sure that I use the free images and steer clear of fair-use ones, which may mean that some pages will not show images in the feed. Fako85 (talk) 22:00, 13 January 2017 (UTC)
  • While Fako85 works on implementing the above, I'd like to ping Harej, who helps with WikiProject X, to get his input on whether this bot would be helpful for WikiProjects MusikAnimal talk 02:35, 16 January 2017 (UTC)

WP:BRFA/TheMagikBOT_2

TheMagikBOT 2

Operator: TheMagikCow (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 15:59, Monday, January 2, 2017 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): Python (utilising the MediaWiki API and PyWikiBot)

Source code available: --

Function overview: Will add the {{pp}} template to protected pages that do not have it.

Links to relevant discussions (where appropriate): Wikipedia:Bot_requests#Add_protection_templates_to_recently_protected_articles

Edit period(s): Continuous

Estimated number of pages affected: 5/day (guess!). The initial run will be higher, ~5,000, to catch the backlog.

Exclusion compliant (Yes):

Already has a bot flag (Yes/No):

Function details: Gets all the protected pages and scans them to check whether they have the appropriate padlock in the top corner. If they do not, the bot will add one. Many pages (~5,000) have been returned that need fixing, and this is an ongoing issue. See here for some diffs at the bot request.
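
A minimal Pywikibot-style sketch of the per-page check described above (illustrative only; the level-to-template mapping and the padlock detection are simplified assumptions, not the bot's source):

    import pywikibot

    site = pywikibot.Site('en', 'wikipedia')

    # Assumed mapping from edit-protection level to padlock template.
    PADLOCKS = {
        'autoconfirmed': '{{pp-protected|small=yes}}',
        'extendedconfirmed': '{{pp-extended|small=yes}}',
    }

    def tag_if_missing(page):
        level = page.protection().get('edit', (None,))[0]   # e.g. ('autoconfirmed', 'infinity')
        template = PADLOCKS.get(level)
        if template is None:
            return                        # unprotected, or a level this task does not handle
        if '{{pp' in page.text.lower():   # crude check for an existing padlock template
            return
        page.text = template + '\n' + page.text
        page.save(summary='Bot: adding missing protection template')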

Discussion

Do you have a breakdown of the protection levels that you want to deal with (e.g. FP, TP, ECP, SP, PC)? As for FP pages, those would require an admin bot and an administrator to be the operator. — xaosflux Talk 18:09, 2 January 2017 (UTC)

Pages in the (main) namespace with edit protection on. FP would obviously require an admin bot, but if that cannot be granted to me, I am happy to exclude FP pages. TheMagikCow (talk) 12:11, 3 January 2017 (UTC)

I also think that this bot might need to take care when it comes to PC pages (which could have pending edits), as the marginal edit just to add the icon can, depending on the bot's permissions, accept pending changes automatically. --slakrtalk / 01:30, 7 January 2017 (UTC)

Cyberbot II has handled adding/removing pending changes protection templates but seems to have stopped. That bot is a pending changes reviewer. —MRD2014 (talkcontribs) 15:57, 7 January 2017 (UTC)
Perhaps it would skip those pages, and then catch them in the next scan when the changes have been accepted/reverted? TheMagikCow (talk) 09:54, 8 January 2017 (UTC)

A couple of questions/points of discussion. Firstly, this bot would fetch the list of protected pages and add the corresponding padlock template. Can this bot do the reverse, removing the padlock from unprotected pages (likely because a protection expired) if you fed it the list of pages which have a padlock template instead of the list of protected pages?

Secondly, could we get an admin and experienced bot user to sign on as an additional operator so that fully protected pages can also be addressed, at least for the first run? This might be more trouble than it's worth, but seems like it's worth talking about. Tazerdadog (talk) 09:40, 9 January 2017 (UTC)

We already have bots that remove protection templates (MusikBot (talk · contribs) and DumbBOT (talk · contribs)). —MRD2014 (talkcontribs) 00:36, 10 January 2017 (UTC)
@Tazerdadog: a list of FP/TP could be generated to determine the impact first. — xaosflux Talk 03:42, 13 January 2017 (UTC)
The impact would be nowhere near as many pages as are in this category; FP pages tend to be indefinitely FP. I think we should just focus on the basic requirements first. TheMagikCow (talk) 07:15, 13 January 2017 (UTC)

Can we get this up and running? TheMagikCow (talk) 17:56, 14 January 2017 (UTC)

@Xaosflux:. TheMagikCow (talk) 16:04, 18 January 2017 (UTC)
TheMagikCow This bot will not be able to edit above your own access level - so you will only be able to add for ECP/SP/PC1 - will this still be useful at this level? — xaosflux Talk 16:08, 21 January 2017 (UTC)

WP:BRFA/WugBot_2

WugBot 2

Operator: Wugapodes (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 23:18, Wednesday, December 28, 2016 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): Python

Source code available: GitHub

Function overview: Moves (tentatively) approved hooks from the main DYK nomination page to the approved subpage.

Links to relevant discussions (where appropriate): RfC and implementation discussion

Edit period(s): every two hours

Estimated number of pages affected: Two

Exclusion compliant (Yes/No): No

Already has a bot flag (Yes/No): Yes

Function details: The bot reads the DYK nom page and gets the section headers and nomination pages. It then reads those nomination pages and determines whether they were closed (if so, they are removed from the page) or are currently approved (if so, it moves them to the approved page); otherwise it leaves them on the page. It then reads the approved page and determines where to put the nominations depending on the style consensus determines. Example output can be seen at User:Wugapodes/DYKTest/0 and User:Wugapodes/DYKTest.
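
For illustration, a sketch of the "currently approved" test on a nomination's wikitext (Python; the set of review icons here is an assumption based on the usual DYK tick images, not taken from the bot's source):

    import re

    APPROVED = {'File:Symbol confirmed.svg', 'File:Symbol voting keep.svg'}   # tick / AGF tick
    ALL_TICKS = APPROVED | {'File:Symbol question.svg', 'File:Symbol delete vote.svg',
                            'File:Symbol possible vote.svg', 'File:Symbol redirect vote 4.svg'}

    def currently_approved(nom_wikitext):
        # True only if the most recent review icon on the nomination is an approval.
        last = None
        for match in re.finditer(r'\[\[(File:[^|\]]+)', nom_wikitext):
            name = match.group(1).strip()
            if name in ALL_TICKS:
                last = name
        return last in APPROVED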

Discussion

Nota bene The bot currently supports two different output styles, with section headers and without section headers, because there doesn't seem to be consensus on which to use. Once a decision is made, it can be set to output that one. I'm personally leaning towards with section headers because it makes the code a bit easier to maintain (plus I like it). So if anyone wants to try and read consensus on that at the discussion (or offer their opinions) feel free to do so. Wugapodes [thɔk] [ˈkan.ˌʧɻɪbz] 23:18, 28 December 2016 (UTC)

First off, I had this same idea of doing away with the overuse of templates when I worked on the User:MusikBot/RotateTDYK task. At the time people were content with the system they had. So glad to see something is being done about it! I wonder if this will conflict at all with my bot task, but I shall worry about that later...
Anyway, there's obviously a lot of reading to do before I can fully evaluate this. If another bot approver is more familiar with the DYK project then feel free to take over :) For now there's one thing that stands out right off the bat: the second output style (without section headers) is still in Category:Pages where template include size is exceeded. Wasn't that the point, to refactor it so that it doesn't exceed the template include size? MusikAnimal talk 06:15, 29 December 2016 (UTC)
The discussion is quite a wall of text, so hopefully someone has the stamina for it. The reason the without-section-headers page is not working out is, weirdly enough, actually the way it should function. Because there are no sections, it just moves whatever hooks have been approved to the bottom of the list and removes them from the DYK page. Since I can't remove them from the DYK page (yet), every time it runs it moves all of them, but were the DYK page to have been edited, it would only move the newly approved ones. So it's a bug in testing that I can't really remove because it wouldn't function properly when it's actually approved (it's partly why I favour the with-section-headers style). That being said, I'm trying to figure out ways to fix that so it can be properly tested. Wugapodes [thɔk] [ˈkan.ˌʧɻɪbz] 06:26, 29 December 2016 (UTC)
I realized that was actually a bigger problem than anticipated (if someone put a bunch of duplicate entries on DYKN, the bot would copy them and break the page) so I decided to fix it rather quickly. The next run should be at 7am UTC, so we'll see if it works out in a couple minutes. Wugapodes [thɔk] [ˈkan.ˌʧɻɪbz] 06:47, 29 December 2016 (UTC)
MusikAnimal, your MusikBot was the other bot (aside from Shubinator's) that I thought might have an edit conflict with the new bot being created by Wugapodes. And, Wugapodes, it occurred to me as I read this thread that it's very possible for people to restore or readd their nominations to the main nomination page if they don't understand the process, and for the bot to "move" it to the new page, and end up with duplicate entries. I think it might be safest to have the bot make sure that any hook being moved does not already exist on the Approved page; if it does, then don't add it again to the Approved page (but do delete it from the main nominations page). Does this make sense? BlueMoonset (talk) 05:18, 30 December 2016 (UTC)
It does make sense, and I totally agree. Change already made for the non-section style, and I think for the section style but I'll check to make sure. Wugapodes [thɔk] [ˈkan.ˌʧɻɪbz] 05:42, 30 December 2016 (UTC)

The bot is in a stable state and I have decided on a style. After a discussion with BlueMoonset, I decided to write the bot to the common denominator of all the different opinions and a discussion on what features to add or modify will be had later rather than sitting on my hands hoping one emerges. As such, I have modified the bot to only move currently approved hooks, rather than hooks that were ever approved. It's a one way move, so if it's unapproved, it will stay on the approved page (will be discussed later). I chose to use the section style since there seems to be no consensus on which one to use and sections seem to be the status quo (will be discussed later). Wugapodes [thɔk] [ˈkan.ˌʧɻɪbz] 00:29, 6 January 2017 (UTC)

Wugapodes If I understand what you are saying about "currently approved hooks, rather than hooks that were ever approved", if a hook has been pulled and the nomination re-opened, it won't be moved because it's technically unapproved after being re-opened. Yes? I'm OK with that, but just wanted to understand what you are saying. — Maile (talk) 00:43, 7 January 2017 (UTC)
Maile66 Yes, but since the bot only moves nominations one way, it really only matters for the first run and if a nomination is approved and then quickly unapproved. Essentially, if an approved or approved agf tick is not the last tick on a nomination when the bot runs, it won't be moved. Wugapodes [thɔk] [ˈkan.ˌʧɻɪbz] 01:21, 7 January 2017 (UTC)
Good. Thanks for all the work you're doing on this. — Maile (talk) 01:23, 7 January 2017 (UTC)
@MusikAnimal: I realized you may not have this watchlisted, so pinging you. Wugapodes [thɔk] [ˈkan.ˌʧɻɪbz] 21:26, 7 January 2017 (UTC)
I hope to get to this today... again I feel I should review the discussions first. I apologize if I am less responsive over the next week, you can blame it on mw:Wikimedia Developer Summit :) MusikAnimal talk 20:04, 8 January 2017 (UTC)
It's no problem. I know you're busy and just wanted to make sure this didn't get lost in the fray. Have a good summit! Wugapodes [thɔk] [ˈkan.ˌʧɻɪbz] 22:28, 8 January 2017 (UTC)

────────────────────────────────────────────────────────────────────────────────────────────────────@Wugapodes: Alright, I think I've got a grasp on how it should function. So let's focus on how the heck we're going to do an effective trial. I see that User:Wugapodes/DYKTest/0 (the new nominations page) is being regularly updated. Nominations are removed here because they were approved. However, I also see some are being added [8]. Is this simply because the bot is copying the live nominations page, then re-removing any nominations that are approved? Then we have User:Wugapodes/DYKTest/1, which I assume represents the new "Approved" page. Under the new system, admins will add the approved DYK to the Main Page, then remove the transcluded nomination page from the Approved page, correct? Hence why you're not having the bot update it right now, since no one will be removing them?

Also, if a nomination becomes unapproved, will someone manually re-add it to the nominations page? Otherwise it seems we'd run into the scenario that BlueMoonset mentioned where reviewers would never find the re-opened nomination. I really like the idea of an automated two-way road, but I see that evidently the rough consensus has settled on the one-way road, at least initially. We can proceed with this, but I suspect folks will get confused and you'll quickly want to move to a two-way system, which may require another complicated BRFA trial. We might want to think ahead and get that ironed out MusikAnimal talk 05:22, 9 January 2017 (UTC)

You're pretty much correct except you've got the pages wrong. I realized the link above was to the wrong page (it's now fixed), which perhaps led to your current confusion. DYKTest/0 is the (poorly named) test approved page (DYKTest/1 was for the non-section style that I've since dropped). The correct link for DYKTest/1 is to User:Wugapodes/DYKTest, which is a clone of the nomination page. But everything you said is, otherwise, spot on so I'll discuss that and you can look at the pages with fresh, correctly informed eyes. You'll notice that User:Wugapodes/DYKTest also adds nominations occasionally, and that's because it does indeed copy from the live nomination page like you said. The reason User:Wugapodes/DYKTest/0 removes nominations is because the page has an archive template; it adds them, obviously, if they are approved.
If a nomination becomes unapproved, whoever pulls the hook will have to re-add it to the page. We could perhaps have a pulled hook section that the bot doesn't update, just like the special occasion holding area. I think that a two-way road could find consensus, but my main concern is getting some system in place so that the transclusion problem on the nomination page can be resolved. I don't particularly mind another complicated trial, if anything I'm hoping a future discussion could lead to some positive changes for DYK that would necessitate it.
Anyway, to answer your question of how a trial would work... I'm not particularly sure. We could simply let it keep updating these test pages, but ask some DYK regulars to try using these pages for a week and provide feedback on how it's working, perhaps by posting a note on WT:DYK. Wugapodes [thɔk] [ˈkan.ˌʧɻɪbz] 05:55, 9 January 2017 (UTC)
@Wugapodes: Sorry for taking so long to get back to you, the San Francisco events have concluded and normal wiki life should resume now :) If you don't mind, perhaps you could rope in the DYK regulars to try out the new system? I think a one week trial sounds about right. Let me know when we've got confirmation that some folks will help out, and we'll get the ball rolling MusikAnimal talk 02:31, 16 January 2017 (UTC)
@MusikAnimal: Good to hear! I posted a note on WT:DYK so hopefully that will get some responses. If it's ignored for a few days, I'll start knocking on doors...er...user talk pages. Wugapodes [thɔk] [ˈkan.ˌʧɻɪbz] 05:44, 16 January 2017 (UTC)
@Wugapodes and MusikAnimal: I've watchlisted it. — Maile (talk) 12:41, 16 January 2017 (UTC)
I think the idea to have a separate page for approved hooks is sound since it saves one from combing through T:TDYK. OTOH, I don't think creating a new process that removes such hooks from the main discussion page is a good idea. Not only would this mean that a new DYK-subpage is created without consensus on WT:DYK, it would also lead to confusion for people not familiar with this. IMHO, the bot should just copy the hooks to the approved page, not move them. Having them at two places (approved and T:TDYK) is not really a problem and once they are promoted, they will be removed from both pages anyway since the subpage is archived. Regards SoWhy 08:12, 16 January 2017 (UTC)
@SoWhy: I'll let others clarify, but my impression is consensus has been reached via this RfC and the implementation discussion. Do you feel more discussion is needed? Leaving the approved nominations on T:TDYK is the precise issue, it appears. The page often exceeds the transclusion limit and nominations fail to show up at all, even though they were transcluded. Something has to be done, so they decided to move the approved nominations to a dedicated page. Documentation at all the related DYK pages should be updated, and surely folks will eventually adapt to the new system MusikAnimal talk 18:17, 16 January 2017 (UTC)
Sorry, I must have missed this. My bad. Personally, I think some more discussion is needed but I won't stand in the way if people think consensus exists. Regards SoWhy 20:56, 16 January 2017 (UTC)
SoWhy I guess you missed a lot. The discussion has been ongoing in spurts since last August or September, in any number of WT:DYK threads, and one at the Village Pump. The link that MusikAnimal provided is the current one ... scroll upwards after you click, and there is a lot more discussion above the section he linked. It's not an archived thread, but it has been at the top of WT:DYK since November 1, 2016 for anybody who may have wanted to join in. — Maile (talk) 21:11, 16 January 2017 (UTC)
I agree that this solution is good for now. I do still believe that there should be an obvious visual marker as to whether a nomination is approved or challenged, whether that means having a separate section for challenged hooks, or having a colored one-line banner at the beginning of each nomination. That would make it much easier for promoters and double-checkers to quickly see the status of each nomination. Antony–22 (talkcontribs) 22:29, 20 January 2017 (UTC)

WP:BRFA/ZackBot_4

ZackBot 4

Operator: Zackmann08 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 00:42, Saturday, December 3, 2016 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): Ruby

Source code available: User:ZackBot/river_cleanup

Function overview: Remove instances of {{{basin_countries}}} from {{infobox river}}

Links to relevant discussions (where appropriate): Special:Permalink/756930869#Deprecation of basin countries

Edit period(s): one time run

Estimated number of pages affected: 11,472 (all in Category:Pages using infobox river with "basin countries" parameter)

Exclusion compliant (Yes/No): yes

Already has a bot flag (Yes/No): yes

Function details: The bot will simply go through all pages in the deprecation category and replace the param with the new syntax that is used in nearly every infobox that has anything to do with geographic location. The code will search for the regex \|\s*basin_countries\s*\=\s*([A-Za-z\[\]\s\.]*) and grab the country. It will then replace that code with | subdivision_type1 = Country\n| subdivision_name1 = #{country}, thus adding subdivision_type1 as Country and inserting the new subdivision name.
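
For clarity, here is the substitution expressed as a short Python sketch (the bot itself is Ruby; this is only to show what the stated regex does):

    import re

    BASIN_RE = re.compile(r'\|\s*basin_countries\s*\=\s*([A-Za-z\[\]\s\.]*)')

    def convert(infobox_text):
        match = BASIN_RE.search(infobox_text)
        if not match:
            return infobox_text
        country = match.group(1).strip()
        new_params = '| subdivision_type1 = Country\n| subdivision_name1 = %s\n' % country
        return infobox_text[:match.start()] + new_params + infobox_text[match.end():]

    # convert('| basin_countries = [[France]]\n')
    # -> '| subdivision_type1 = Country\n| subdivision_name1 = [[France]]\n'

Note that the character class does not match "|", so piped links such as [[France|French]] would not be captured in full; the same point is raised further down in this discussion.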

Discussion

Can you link to a discussion to remove this parameter somewhere? It looks like the template still displays it when populated. — xaosflux Talk 01:01, 3 December 2016 (UTC)

@Xaosflux: There are multiple threads on Template talk:Infobox river about this. Not sure if they will satisfy what you are looking for... The existence of the deprecation category and tracking of these occurrences seems to suggest it, as does the fact that this is the new convention. What are your concerns? --Zackmann08 (Talk to me/What I been doing) 07:58, 3 December 2016 (UTC)
Just making sure there is a consensus for the removal before you make 10000+ changes, the category was made by @Rehman:. Rehman, any comments? — xaosflux Talk 15:24, 3 December 2016 (UTC)
@Xaosflux: Gotcha! Makes total sense to me! :-) --Zackmann08 (Talk to me/What I been doing) 19:20, 3 December 2016 (UTC)

Thanks for the ping, User:Xaosflux. I personally have no serious concerns with this, but it would be better if this could be coupled with more fixes, since we will be editing quite a number of articles. The other fixes may include the updating of coordinates parameters (as per the template talk page), among other fixes. Hence I feel we should hold this request for now. Rehman 06:44, 4 December 2016 (UTC)

@Rehman and Xaosflux: I'm happy to add that to the script, but I'm pretty sure that JJMC89 bot is already doing that. I'm not seeing what the problem is with it editing so many articles; it is a simple edit... --Zackmann08 (Talk to me/What I been doing) 06:21, 5 December 2016 (UTC)
I would also like to point out that there is precedent... See User:Monkbot. --Zackmann08 (Talk to me/What I been doing) 07:27, 5 December 2016 (UTC)
I support this task, and this bot should not try to replicate the tricky work that JJMC89 bot is doing. If Rehman has other specific infobox cleanup edits to suggest (besides coordinates), they can be listed here or on the infobox's talk page. – Jonesey95 (talk) 07:31, 5 December 2016 (UTC)
@Xaosflux: any further thoughts? Seems like the only concern that has been raised is that it is solely a cosmetic change, but there is some debate as to whether that is true. --Zackmann08 (Talk to me/What I been doing) 03:05, 6 December 2016 (UTC)
I'd really like to see some more people weigh in on this so giving it some time. — xaosflux Talk 03:09, 6 December 2016 (UTC)
@Xaosflux: works for me! :-) I shall standby. @Frietjes, TimK MSI, and Pigsonthewing: you've all taken part in the discussion on the template talk page or made edits to {{Infobox river}}. Any thoughts? --Zackmann08 (Talk to me/What I been doing) 03:14, 6 December 2016 (UTC)
Thanks for the input Jonesey95. If you believe dealing with the coordinates now is a complex move, then as you suggest, let's play it safe and leave that out :-) (Unless of course that bot can take over the tasks being proposed here.) I don't have anything serious in mind other than what is being proposed, so I support going ahead. As a side note, please do add an edit summary of something like this if possible, so others would be encouraged to look at the updated infobox. Cheers, Rehman 23:43, 6 December 2016 (UTC)
@Rehman: I like that idea! Will certainly add a clear edit summary. Just need the go ahead to put this in trial. --Zackmann08 (Talk to me/What I been doing) 22:22, 11 December 2016 (UTC)

{{BAGAssistanceNeeded}}

@Zackmann08: I have similar concerns to task 7 regarding the regular expressions. infobox_text.gsub!(/\|\s*subdivision_name1\s*\=\s*/, "") looks like it would remove the subdivision_name1 parameter, but leave the value. Is that intentional?

There are some tricks I'd like to share... If you're not already familiar with this tactic, try out the pry-byebug gem, and replace the client.edit line (all editing code) with binding.pry. This will activate the debugger where it would normally edit. From here, you can sort of do a dry run of the bot, and carefully analyze if the change it was going to implement is correct. That may surface other issues. Within the debugger, use exit to skip to the next iteration, and exit-program to exit out of the bot's run entirely. I'd also strongly recommend a sleep 1 (one second) for each iteration, at least for the first 20-30 pages of a run, to make sure nothing goes horribly wrong on a large scale. As with task 7, I will try to look over the more complex regex, too.

Finally, could you also update this BRFA filing to include links to relevant discussions? Thanks! MusikAnimal talk 07:30, 16 December 2016 (UTC)

@Zackmann08: With the other task completed, shall we shift focus back here? First and foremost, I have big concerns about taking on 10,000+ pages when the "Links to relevant discussions" field of the BRFA filing is empty. I know you said there were some at Template talk:Infobox river. Could you point those out? Also, from the "Function details" it is unclear exactly what is going to happen. Could you elaborate on what the "new syntax" is, and also touch on any additional functionality that was requested by Rehman here in the BRFA? Are others in agreement with the newly requested changes? MusikAnimal talk 22:46, 21 December 2016 (UTC)
{{OperatorAssistanceNeeded}} I second MusikAnimal's sentiment. We're looking to be spoon-fed the relevant discussions, because there are few of us, and we're unfortunately not omniscient (nor do we have time to scour archives). Furthermore, if the issue is for some reason contentious, we want to ensure everyone involved is aware a bot is about to do stuff. --slakrtalk / 22:56, 21 December 2016 (UTC)
@Slakr and MusikAnimal: Sorry been a busy day... Let me follow up in the morning! I wanted to acknowledge this tho! --Zackmann08 (Talk to me/What I been doing) 05:07, 22 December 2016 (UTC)
@Slakr and MusikAnimal: so the more I looked at it, the more I realized that there isn't really a thread that shows consensus for this change to be made.... IMHO it is a matter of common sense. All infoboxes that have any sort of location-specific data are migrating towards the idea of subdivision1, subdivision2, subdivision3, etc... That being said, I do agree that before a bot is executed making 10,000+ edits, a CLEAR consensus should be reached. To that end, I have started a thread on the talk page here. I pinged the most recent editors. If that doesn't drive some responses I will start a WP:RFC. In the meantime, shall we put this on hold? Shall we move forward with testing it but hold off on full approval? I defer to your judgment. --Zackmann08 (Talk to me/What I been doing) 18:44, 22 December 2016 (UTC)
A discussion on a template talk page is not normally counted as a community consensus unless it has also been advertised somewhere more well attended by the general community. Maybe this one only concerns certain wikiprojects (not sure how much wider interest there would be) but it should at least be advertised on their talk pages if not conducted there. SpinningSpark 19:20, 23 December 2016 (UTC)
@Spinningspark: sorry for the late response. The holidays have kept me off Wikipedia for a bit. Where do you think would be the best place to have this conversation? --Zackmann08 (Talk to me/What I been doing) 15:47, 27 December 2016 (UTC)
I'm not really able to advise you authoritatively on this particular case as I am not familiar with it. My comment was a general one. You need to make a call on whether you think this is just a local issue or if there is wider community interest. As I said, I suspect it is only of interest to WikiProjects using the template, in which case open a discussion on the main one and put a notification on the others (or link to the existing discussion at the template page). You can identify these by sampling a few talk pages of articles using the template where the WikiProjects claiming scope are declared. Sometimes template talk pages also declare WikiProject scope, but more rarely. SpinningSpark 18:14, 27 December 2016 (UTC)

────────────────────────────────────────────────────────────────────────────────────────────────────@Zackmann08: Sorry I didn't get to you sooner. I probably wouldn't have started an RfC, at least in the History and geography list, as this is very much a technical matter that is of little interest to most. RfCs are a bit time-consuming :) The series of pings you made is probably enough, and maybe also notify WT:RIVERS and WT:WPINFOBOXES, as Spinningspark suggests. I'll give this a few days then we can proceed. In the meantime, please kindly fill out the "Function details" so that it is clearer exactly what the bot is going to do. You can also go ahead and fill in the "Links to relevant discussions". Thanks! MusikAnimal talk 05:10, 28 December 2016 (UTC)

A user has requested the attention of the operator. Once the operator has seen this message and replied, please deactivate this tag. (user notified) I think there is now sufficient support for this task. Before we go any further, we still need the above "Function details" to be filled out. What is the old syntax, and what is the new syntax? Note also my first comment above about the "subdivision_name1" regex. It's unclear what it is trying to do. The additional functionality recommended by Rehman here in the BRFA doesn't appear to have been discussed much elsewhere, so I think we should save that for another task MusikAnimal talk 21:53, 31 December 2016 (UTC)

@MusikAnimal: didn't want you to think I was ignoring you; been traveling a lot the past few days. What little time I did have for Wikipedia I was working on other projects. Hoping to get some time to work on this today or tomorrow. Thanks for the follow up! :-) Happy new year! --Zackmann08 (Talk to me/What I been doing) 18:54, 1 January 2017 (UTC)
Thanks for following up on this User:Zackmann08, appreciate it. I too was away due to holidays, and will again be away due to lots of RL work :( I agree with MusikAnimal, this is a simple technical task, and can be done sooner IMO. Happy New Year! Rehman 14:37, 3 January 2017 (UTC)
@MusikAnimal and Rehman: alright. Finally taking the time to look this over... In response to your question Musik about the regex... The purpose of infobox_text.gsub!(/\|\s*subdivision_name1\s*\=\s*/, "") is to remove subdivision_name1 if it already exists. My thinking was that the pages in the category have | basin_countries = Foo as well as | subdivision_name1 = (blank) and I didn't want to end up with duplicate params. This however has multiple issues... First, what about | subdivision_type1 = (blank)? That would need to be checked for as well... What if both params are filled in? Then shouldn't I just delete basin_countries? Here's my thinking... Why don't I change the category in the template to only categorize pages that have basin_countries set and nothing set for subdivision_name1? Then I can safely delete those as I iterate through. Thoughts? --Zackmann08 (Talk to me/What I been doing) 17:48, 3 January 2017 (UTC)
That seems sensible, but I'm honestly still confused what is the proper syntax and what is not. Are both |subdivision_name1= and |subdivision_type1= valid? I take it |basin_countries= lists the countries one by one, whereas with the subdivision syntax we use a different parameter for each country. Is that correct? MusikAnimal talk 00:07, 5 January 2017 (UTC)
@MusikAnimal: so If you use |subdivision_name1= you need |subdivision_type1= to go with it. In this case, type will always be country. --Zackmann08 (Talk to me/What I been doing) 20:53, 5 January 2017 (UTC)
@Zackmann08: Sorry to keep asking, but if you could kindly fill in the "Functional details" with exactly what the bot is intended to do, that'd be very helpful so we don't have to keep scanning through the discussion MusikAnimal talk 00:03, 14 January 2017 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── @MusikAnimal: done. --Zackmann08 (Talk to me/What I been doing) 22:36, 16 January 2017 (UTC)

@Zackmann08: Thanks! It is much clearer now :) Though it seems the part about removing |subdivision_name= if it is blank has not been explained. Do you still plan to add any similar logic? Or maybe you want to update the template to only categorize pages that have basin_countries set and nothing set for subdivision_name1 (and subdivision_type1)?
A few more concerns: the numbering of the new parameters suggests there could be multiple sets. Is it possible that subdivision_name and type can be used for something other than the country, in which case the bot would have to increment the parameter name accordingly? So if name/types 1 and 2 are taken for other purposes, and |basin_countries= is set, the bot would need to remove basin_countries and set |subdivision_name3= and |subdivision_type3= accordingly. Finally it looks like the regex to capture the country won't account for piped links [9] MusikAnimal talk 23:45, 16 January 2017 (UTC)

Bots in a trial period

WP:BRFA/BU RoBOT_30

BU RoBOT 30

Operator: BU Rob13 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 03:29, Saturday, January 21, 2017 (UTC)

Automatic, Supervised, or Manual: Automatic (with supervision of specific instances where the script fails with grace)

Programming language(s): AWB / Lua

Source code available: AWB / Module:Infobox AFL biography/convert

Function overview: Replace deprecated parameters of {{Infobox AFL biography}} with new numbered parameters to improve accessibility.

Links to relevant discussions (where appropriate):

Edit period(s): One-time run (possibly multiple to convert new articles made while editors are learning new parameters)

Estimated number of pages affected: All pages in Category:Infobox AFL biography articles using deprecated parameters, which is still populating.

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): Yes

Function details: Pretty straight-forward. This is a combination Lua module and AWB code. The AWB code takes care of modifying certain inputs to the Lua script and separating out any "total" games/goals into that new parameter. The AWB script also substitutes all the parameters from this infobox into the Lua module, which will take care of separating out the non-accessible lists of teams/years/games(goals) into appropriate numbered parameters.

In a few instances, the Lua module will fail, but it will do so gracefully. Mainly, it will fail when the number of teams and years within one category aren't the same. When this happens, the module prints the original code along with an error message that is searchable, so that I can manually find these articles to correct. It will not cause any errors in the article when failing gracefully.

This module is adapted from Module:Infobox gridiron football person/convert, which was used as part of Wikipedia:Bots/Requests for approval/BU RoBOT 2. That bot task was successful when it was run and highly similar. The only purely new bit of code is the bit of AWB replacement done prior to sending everything into the module.
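
As a rough illustration of the conversion (a Python sketch rather than the actual Lua/AWB code; it assumes the deprecated fields hold <br />-separated lists and uses invented parameter names):

    import re

    def split_list(value):
        return [item.strip() for item in re.split(r'<br\s*/?>', value) if item.strip()]

    def to_numbered_params(years_field, clubs_field):
        years, clubs = split_list(years_field), split_list(clubs_field)
        if len(years) != len(clubs):
            # Mirror the module's graceful failure: flag the article for manual review.
            raise ValueError('years/clubs count mismatch; needs manual attention')
        lines = []
        for i, (year, club) in enumerate(zip(years, clubs), start=1):
            lines.append('| years%d = %s' % (i, year))
            lines.append('| clubs%d = %s' % (i, club))
        return '\n'.join(lines)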

Discussion

Approved for trial (30 edits). Please run a short trial and link to the edits here for demonstration purposes. — xaosflux Talk 16:00, 21 January 2017 (UTC)

WP:BRFA/JJMC89 bot_8

JJMC89 bot 8

Operator: JJMC89 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 10:52, Saturday, January 21, 2017 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): Python

Source code available: replace.py on GitHub

Function overview: Replace {{Don't edit this line {{{machine code|}}}|{{{1}}} with {{Don't edit this line {{{machine code|}}} in Taxonomy templates.

Links to relevant discussions (where appropriate): WP:BOTREQ#Tidy taxonomy templates (permalink)

Edit period(s): One time run

Estimated number of pages affected: Up to 19,772

Exclusion compliant: Yes

Already has a bot flag: Yes

Function details: Replace {{Don't edit this line {{{machine code|}}}|{{{1}}} with {{Don't edit this line {{{machine code|}}} in Taxonomy templates.
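
The substitution itself is a literal text replacement; a minimal Python equivalent for illustration (the real task is run through Pywikibot's replace.py over the taxonomy templates):

    OLD = "{{Don't edit this line {{{machine code|}}}|{{{1}}}"
    NEW = "{{Don't edit this line {{{machine code|}}}"

    def tidy(template_text):
        return template_text.replace(OLD, NEW)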

Discussion

Approved for trial (50 edits). — xaosflux Talk 15:51, 21 January 2017 (UTC)

WP:BRFA/Ramaksoud2000Bot

Ramaksoud2000Bot

Operator: Ramaksoud2000 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 09:59, Monday, December 26, 2016 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): Java

Source code available: https://github.com/MER-C/wiki-java/blob/master/src/org/wikipedia/Wiki.java with a small run file. Will make run file available when finalized.

Function overview: Tags files that were uploaded with the WP:File Upload Wizard with the "I haven't got the evidence right now, but I will provide some if requested to do so" option with {{di-no permission}} and notifies the uploader.

Links to relevant discussions (where appropriate): WP:F11 is the policy that authorizes this.

Edit period(s): When manually run. Most likely daily to weekly.

Estimated number of pages affected: 6 files and 6 user talk pages on first run. About 1-5 files and user talk pages would be affected on each subsequent run.

Exclusion compliant (Yes/No): No

Already has a bot flag (Yes/No): No (Note: Task 2 was approved, but not with a flag)

Function details: Goes through Category:Files licensed by third parties, a category that tracks WP:File Upload Wizard uploads that are credited to an external, non-public source. Checks each file for the text: '''Evidence:''' Will be provided on request. This indicates that the "I haven't got the evidence right now, but I will provide some if requested to do so" option was selected. The bot then tags the file with {{subst:npd}} and notifies the uploader with {{subst:di-no permission-notice}}. The bot ignores all files with the following templates: {{OTRS pending}}, {{OTRS permission}}, {{OTRS ticket}}, {{OTRS received}}, and {{Di-no permission}}. The bot cannot be exclusion compliant because notification of the uploader is required for deletion under F11. Unless the policy is changed, I don't see a way to be exclusion compliant. I ran the program and created a list of files and user talk pages that would be affected on the first run at User:Ramaksoud2000/Bot trial.
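
A minimal Pywikibot-style sketch of the tagging pass described above (the actual bot is Java; the notification step is only indicated in a comment):

    import pywikibot

    site = pywikibot.Site('en', 'wikipedia')
    MARKER = "'''Evidence:''' Will be provided on request"
    SKIP = ('{{otrs pending', '{{otrs permission', '{{otrs ticket',
            '{{otrs received', '{{di-no permission')

    def run(category='Category:Files licensed by third parties'):
        for file_page in pywikibot.Category(site, category).articles():
            text = file_page.text
            if MARKER not in text:
                continue
            if any(marker in text.lower() for marker in SKIP):
                continue                  # already tagged, or OTRS is in progress
            file_page.text = '{{subst:npd}}\n' + text
            file_page.save(summary='Tagging file with no evidence of permission (WP:F11)')
            # ...then place {{subst:di-no permission-notice}} on the uploader's talk page.

(The discussion below notes that redirects to the OTRS templates also need to be checked.)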

Discussion

  • @Ramaksoud2000: For files that should be ignored, you'll want to check for all redirects to the templates. For instance, instead of {{Di-no permission}} one could use the equivalent {{npd}}, {{No permission}}, {{Db-f11}}. You'll also want to check if these templates were substituted, and not just look for the code that transcludes them.
    That aside, I do think we should seek broader input. The task seems uncontroversial, but I think we should achieve at least some rough consensus for it. Others might know of additional templates, scenarios, etc., where we wouldn't want to tag with {{Di-no permission}}, or perhaps there is doubt as to whether this process should be automated. Let's try to reach out to those who work closely with files, perhaps at Wikipedia talk:Files for discussion? If you know of a few other appropriate venues, please notify them of the discussion MusikAnimal talk 18:35, 27 December 2016 (UTC)
    • The template must be substituted to work properly and have the correct date, and when substituted {{Di-no permission}} is placed. I don't think it should check for transclusions of the redirect, because that means the template isn't working properly, and the file won't get deleted. As for additional discussion, that's a good point, and I have posted notices on WT:FFD, WT:File Upload Wizard, and WT:Media copyright questions. Ramaksoud2000 (Talk to me) 19:56, 27 December 2016 (UTC)
    • The bot should probably check for redirects to the OTRS templates though, and I'll update it to do so. Ramaksoud2000 (Talk to me) 20:11, 27 December 2016 (UTC)
  • Finally! I've been wanting something like this for a long time. There are a few other file maintenance tasks (mostly deletion tagging) that can probably be automated but this is an excellent start. I wholeheartedly, 100%, support this bot. About time we got some major assistance in cleaning up the file namespace. It is a mess out there. --Majora (talk) 22:00, 27 December 2016 (UTC)
  • Note: It doesn't really matter, but the bot trial page above is outdated because image patrollers have gotten to those images and tagged them as di-no permission. An updated list is at User:Ramaksoud2000Bot/Images_missing_permission. Ramaksoud2000 (Talk to me) 04:08, 29 December 2016 (UTC)
  • As an OTRS agent, I worry about this from multiple perspectives. First, the typical 7-day wait time for this criterion is quite low compared to the OTRS backlog (multiple months). Often, images with OTRS pending aren't tagged as such, so this would result in deleted images. Many OTRS permissions agents aren't admins, so this contributes further to the OTRS backlog, exacerbating the whole problem. Second, a bot can't determine whether an image actually needs permissions, and many administrators deleting on this criterion do so mechanically. What if there's a likely PD claim to be made? Our uploaders don't know about PD, but those tagging for permission should. Third, what if a non-standard note that an OTRS ticket has been submitted lies somewhere on the page? I often see people just include a ticket number somewhere instead of an appropriate template. The third can be addressed from a technical standpoint. The first could possibly be addressed by placing a 30-day hold before the bot tags for permission or lengthening the wait time for F11. I'd prefer the former. The second, though, requires human eyes at some point. ~ Rob13Talk 10:45, 30 December 2016 (UTC)
    • I'm also seeing lots of files in this category that don't appear to actually need permission. See, for instance, File:1962-mets-uni.svg. ~ Rob13Talk 10:48, 30 December 2016 (UTC)
      • User:BU Rob13. You misunderstood. The bot only uses the category as a filter to narrow the scope. The bot only tags images that have "Evidence: Will be provided on request" in the permission field. This indicates that the file uploader selected the "I haven't got the evidence right now, but I will provide some if requested to do so" option in the File Upload Wizard, which is not a valid way to upload files. The Upload Wizard contains this warning when they select that option: "Note: files without verifiable permissions may be deleted. You may be better off obtaining proof of permission first."
      • Also, the number of files is low. I ran the bot, and these images would be tagged [10]. I don't understand your concern about the OTRS backlog, because this bot would not do anything different than someone tagging the image as needing permission. As for the concern about the possibility of a free image, I patrol the new image queue, and have never seen an image under this option be PD. This option in the File Upload Wizard is buried deep, and an uploader would have to skip past options for age and U.S. gov works to reach this option. Ramaksoud2000 (Talk to me) 11:02, 30 December 2016 (UTC)
        • I withdraw my objections. The OTRS concern is basically that missing permissions files, if immediately tagged, get deleted before OTRS agents would get to the ticket. But that's a broader concern that should be addressed in a different venue. ~ Rob13Talk 11:24, 30 December 2016 (UTC)
  • BU Rob13's concerns are quite valid, but as explained I think this task is a lot simpler and less questionable than it may seem. This discussion was pointed to from three noticeboards, and over a week has gone by without opposition. I think we're OK to move forward, but someone please correct me if they feel more input is needed.
    Ramaksoud2000, you wrote Checks each file for the text: Evidence: Will be provided on request. Are you referring to the code that would check for this text? Feel free to share that, but more importantly, could you possibly provide examples (permalinks) to such revisions of a file page? Also, going through the file upload wizard I was unable to locate the "I haven't got the evidence right now, but I will provide some if requested to do so" option. Could you walk me through that? Thanks MusikAnimal talk 08:42, 5 January 2017 (UTC)
The program checks all pages in that category to see if they contain the following wikitext: '''Evidence:''' Will be provided on request. I'm not at my computer right now to get the code, but it's a simple statement that returns true if the page wikitext contains that exact string. This is an example revision of a page containing that text. To find the option in the File Upload Wizard, go through the following steps: This is a free work -> This file was given to me by its owner -> Evidence. Ramaksoud2000 (Talk to me) 14:16, 5 January 2017 (UTC)

Will the bot detect existing instances of {{di-no permission-notice}} to avoid duplicate notifications (e.g., if some human notifies first)? --slakrtalk / 00:55, 7 January 2017 (UTC)

Thanks for bringing that up. No, it does not. I see a few issues with this. First, it already checks for {{di-no permission}} on the file. It's unlikely that someone would notify the uploader and not tag the file. If you are referring to a scenario where someone notifies the uploader and tags the file a minute later, but is beaten by the bot, I find that unlikely because this bot doesn't run continuously, and people generally tag the file first. I've never seen someone notify first. Second, if it were to check for the notice, the bot would have trouble distinguishing notices for files that were previously deleted for no permission, but re-uploaded under the same name. Thanks, Ramaksoud2000 (Talk to me) 01:06, 7 January 2017 (UTC)
I agree this probably shouldn't be a big concern, unless now that it is explained, Slakr still thinks otherwise? I am happy to move forward with a trial, barring objections MusikAnimal talk 02:41, 16 January 2017 (UTC)


Approved for trial (50 edits), which should equate to 25 files plus their user talk notifications, if I understand correctly. MusikAnimal talk 11:48, 20 January 2017 (UTC)

Please provide me a courtesy ping and chance to review trial edits prior to approving the bot. I'd like to see how this one goes. ~ Rob13Talk 04:56, 21 January 2017 (UTC)

edit WP:BRFA/Yobot_27

Yobot 27

Operator: Magioladitis (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 10:38, Saturday, December 3, 2016 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): AWB

Source code available: Yes

Function overview: Replace ISBN magiclinks with ISBN template

Links to relevant discussions (where appropriate): Here you are:

Edit period(s): One off

Estimated number of pages affected: Roughly 350,000 (originally estimated as 275k+)

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): Yes

Function details: ISBN 123456789-0 → {{ISBN|123456789-0}}
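For illustration, here is a rough Python/regex sketch of that substitution (the task itself runs as an AWB find-and-replace, and MediaWiki's magic-link grammar plus the nowiki/comment skip rules are more involved than this):

import re

# Rough sketch only: match plain "ISBN <digits/hyphens/X>" magic links and
# wrap the number in the {{ISBN}} template.
ISBN_RE = re.compile(r'\bISBN\s+((?:97[89][ -]?)?(?:[0-9][ -]?){9}[0-9Xx])')

def convert_isbn_magic_links(wikitext):
    return ISBN_RE.sub(r'{{ISBN|\1}}', wikitext)

print(convert_isbn_magic_links('See ISBN 978-0-306-40615-7 and ISBN 123456789-0.'))
# -> See {{ISBN|978-0-306-40615-7}} and {{ISBN|123456789-0}}.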

Discussion

@Legoktm, Bgwhite, and MZMcBride: -- Magioladitis (talk) 12:26, 3 December 2016 (UTC)

Will you only be changing if it is currently actually a magic link? (As opposed to free text replacement). — xaosflux Talk 15:27, 3 December 2016 (UTC)
Xaosflux Yes and no. Free-text occurrences are in fact almost zero, since we detect them via the WP:CHECKWIKI project and fix them regularly. So one way or another there should be no free texts and no ISBN templates with errors. Many others and I will fix any wrong ISBNs. -- Magioladitis (talk) 15:51, 3 December 2016 (UTC)

I'll add a check for maximum length though so I won't break anything. -- Magioladitis (talk) 15:54, 3 December 2016 (UTC)

@Magioladitis: For example, your example above in nowiki tags - will this be ignored? — xaosflux Talk 16:51, 3 December 2016 (UTC)
@Xaosflux: everything in nowiki tags will be ignored. Less than 10 pages currently though. Check Wikipedia:CHECKWIKI/WPC 069 dump. -- Magioladitis (talk) 19:33, 3 December 2016 (UTC)
  • This is a big change. I support it, but I don't think there is en.WP consensus to make this change yet. Also, do you want to do RFC and PMID, or should that be a separate discussion? – Jonesey95 (talk) 16:36, 3 December 2016 (UTC)
    • Magic cleanup could all be one task or per-magic tasks; but they would all fall under the same community standard. — xaosflux Talk 16:51, 3 December 2016 (UTC)
    • Jonesey hit the nail on the head.
      1. PMID and RFC should be included. Get everything done in one pass.
      2. WMF doesn't exactly know what the final design will be. Parser function? Template? Interwiki? Combination? Need to wait for that.
      3. If WMF is going to give each project some leeway, then a discussion should be had to get consensus.
    Disappearance of some magic links. Change from Tidy to Parsoid. WMF's Linter coming online. Is the next year a golden age or dark age for gnomes? Bgwhite (talk) 01:19, 4 December 2016 (UTC)
    Re #3, definitely leeway on how the migration happens and at what pace, but at some point in the future magic links are definitely going to go. And hopefully a golden age :) Legoktm (talk) 01:59, 4 December 2016 (UTC)

Thank you for taking this on Magioladitis. :) Legoktm (talk) 01:59, 4 December 2016 (UTC)

I've been switching ISBN magic links to use a template in a semi-automated fashion in the past few months using a dumb Python script. Both Magioladitis and Bgwhite took notice and are now eager to get in on the fun. ;-) The script allows me to see a diff of each edit and then approve or not with a keystroke. I can post that script somewhere if there's interest.

I think switching to the {{ISBN}} template is safe and doesn't need to wait for anything. From the few thousand pages now using this template, I've seen no adverse effects and don't expect to. Switching to a template brings ISBN syntax in line with pretty much every other type of similar link on the wiki, with the same tracking mechanisms, edit interface, etc. We really should've done this a long time ago, but now is the next best time to get this over with, in my opinion. --MZMcBride (talk) 05:19, 5 December 2016 (UTC)

There is interest, MZMcBride :) Somewhere on Github would be fine --Edgars2007 (talk/contribs) 15:51, 26 December 2016 (UTC)
Hi Edgars2007. Sure, here it is: Special:Permalink/756836310. It's not a great script, but it works well enough, I suppose. --MZMcBride (talk) 03:26, 27 December 2016 (UTC)
MZMcBride, ahh, a simple one-liner :) Thank you! --Edgars2007 (talk/contribs) 16:59, 28 December 2016 (UTC)
Approved for trial (75 edits). — xaosflux Talk 05:02, 16 December 2016 (UTC)

Given the large scope of this, I would propose turning off all other changes in AWB, and making only this change. That is also what BenderBot seems to be doing with the http -> https changes it makes. There is no reason why any other changes need to be made at the same time, and Yobot has a poor track record when making general fixes - in fact, it is currently blocked because the bot was making incorrect edits unrelated to the bot task which was supposed to be running. — Carl (CBM · talk) 23:52, 17 December 2016 (UTC)

CBM In this case, since it is a find-and-replace issue, there will be no problems at all. I have skip checks for this one. Problems occur because I do not have skip checks for all of Yobot's tasks. -- Magioladitis (talk) 08:57, 20 December 2016 (UTC)

Because of the scope, I would still prefer to see no general fixes or other changes. Just do the proposed change and get it over with. There are several discussions going on elsewhere about the possibility of removing gen fixes from all Yobot tasks. No reason to put them on another. — Carl (CBM · talk) 11:57, 20 December 2016 (UTC)
I want to register a concern that I can't find where consensus has been established to do this. The ISBN and PMID magic links are used by editors who don't use citation templates (myself included), and they are very quick and easy to use. Should there not be a wider discussion before removing them from enwiki? SarahSV (talk) 17:28, 22 December 2016 (UTC)
It seems they are going to be deactivated across all Wikipedia languages. So with no change there would just be a plain text ISBN or PMID. Whether that would be better or worse than the proposed change is a question, but it seems the status quo isn't an option. — Carl (CBM · talk) 20:12, 22 December 2016 (UTC)
SarahSV See User_talk:MZMcBride#ISBN_magic too. -- Magioladitis (talk) 20:16, 22 December 2016 (UTC)
I find myself agreeing with MZMcBride: Let's get this over with. On CBM's point, if it would be easier, less controversial, and less risky to run it with everything else turned off, then let's do this as a solo task. That might also give us more room in an edit summary to explain that we're updating to remove reliance on a deprecated feature that will be removed. WhatamIdoing (talk) 20:21, 22 December 2016 (UTC)
Edit summaries can use a very small piped link (e.g. blahblah|Task 27) that links to a full description on the bot page. It really doesn't need that much room. — xaosflux Talk 20:29, 22 December 2016 (UTC)
Magioladitis, thanks. I can't follow what happened at phabricator:T145604. Someone proposed it, and then what? Thank you for pointing out that it ought to be discussed. I do remember you adding the magic numbers for PMID not long ago, so to have to remove them again seems odd. They're very useful. SarahSV (talk) 20:56, 22 December 2016 (UTC)
Technical RFCs have a more extensive and diverse process than the ones we're used to. There is always a Phab task (sometimes whole trees of tasks), a page on mediawiki.org if the idea isn't shot down early, a real-time discussion on IRC, usually discussions on e-mail lists (frequently multiple discussions) as well as on wiki, and sometimes in-person meetings (e.g., at the hackathon associated with Wikimania). RFCs are not only listed centrally but supervised by a committee to make sure that decisions (if any) get implemented, and they are advertised through multiple channels. This one, for example, has been advertised in Tech News and mailing lists. It is not unusual for a technical RFC to last for many months or even years.
In terms of "this ought to have been discussed", it has already been discussed, in public, for months now. I looked into this a while ago, and I very strongly doubt that any kind of "discussion" involving Wikipedia editors would have changed the decision. The main people involved are long-time Wikipedians, and they're aware that this change will likely precipitate small changes to more than half a million pages across many thousands of wikis (including wikis not run by the WMF). But the technical benefits are really quite important, so it's going to happen despite these costs.
In terms of our own internal process, this kind of thing falls into WP:CONEXCEPT: The MediaWiki dev community (which is mostly volunteer devs), and not the editors at this one wiki, is responsible for deciding what the core software does. WhatamIdoing (talk) 22:19, 22 December 2016 (UTC)
I agree that this bot, if approved, should have an edit summary that mentions "Task 27" and links to this BRFA. It should also not perform AWB's general fixes. – Jonesey95 (talk) 18:34, 23 December 2016 (UTC)

I agree that this should be done without other general fixes. -- Magioladitis (talk) 08:32, 27 December 2016 (UTC)

xaosflux I'll postpone the bot trial for some days since there are still ongoing discussions. -- Magioladitis (talk) 11:30, 27 December 2016 (UTC)

That's best. — xaosflux Talk 11:54, 27 December 2016 (UTC)

Bots that have completed the trial period

edit WP:BRFA/Bender the Bot_7

Bender the Bot 7

Operator: Bender235 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 01:06, Friday, January 20, 2017 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): AWB

Source code available: upon request

Function overview: replace http:// with https:// for the New York Times domain.

Links to relevant discussions (where appropriate): WPR: Why we should convert external links to HTTPS wherever possible and WPR: Should we convert existing Google and Internet Archive links to HTTPS?

Edit period(s): one-time run

Estimated number of pages affected: about 100,000

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): Yes

Function details: Bgwhite recently pointed me at Secure The News, a project of the Freedom of the Press Foundation, conveniently listing all major news outlets that enable HTTPS access. Having already converted The Guardian links earlier, I want to work through that list one by one, starting with The New York Times who proudly announced their activation of HTTPS a week ago.

We have a lot of NYT links (my conservative guess is 100k pages), and while the NYT announcement says that so far only "articles published in 2014 and later" are HTTPS accessible, I want to convert them all right now, for two reasons: (1) it does not break older links (for example), which are only redirected back to HTTP; and if NYT does that redirect on their site, at least the referrer information is kept. And (2) they announced they "intend to bring the rest of our site under the HTTPS umbrella", so it's only a matter of time.
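As a rough sketch of the kind of replacement involved (the bot actually runs as an AWB find-and-replace, and the exact host patterns it covers are an assumption here):

import re

# Upgrade plain-HTTP New York Times links to HTTPS. The host pattern is an
# assumption for illustration; the real run may cover more NYT subdomains.
NYT_HTTP_RE = re.compile(r'http://((?:www\.)?nytimes\.com/)')

def upgrade_nyt_links(wikitext):
    return NYT_HTTP_RE.sub(r'https://\1', wikitext)

print(upgrade_nyt_links('http://www.nytimes.com/2016/article.html'))
# -> https://www.nytimes.com/2016/article.html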

Discussion

Approved for trial (50 edits). Here is a trial approval to test your code. Do you have any statistics on how many of the links you will change will end up being currently useless because the remote server changes them back to HTTP? — xaosflux Talk 05:09, 21 January 2017 (UTC)
I don't have an accurate number, but I would guess as of today about 70-80% of the links would be re-routed to HTTP on the NYT server. This number will gradually go to zero over the next couple of months. (By the way, the example link above already works with HTTPS on mobile; desktop will follow soon.) --bender235 (talk) 12:07, 21 January 2017 (UTC)
Trial complete. Edit history obviously here. --bender235 (talk) 16:21, 21 January 2017 (UTC)

edit WP:BRFA/Dexbot_10

Dexbot 10

Operator: Ladsgroup (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 13:23, Saturday, January 7, 2017 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): python

Source code available: Super simple, based on pywikibot. I'll put it in a gist or somewhere if you want.

Function overview: Changing old JSTOR links to the new style.

Links to relevant discussions (where appropriate): Special:PermaLink/758550969#can your bot do this

Edit period(s): Once, and then once a month.

Estimated number of pages affected: 2,000

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): Yes

Function details: Copy-pasting from the discussion page: A few years ago JSTOR deprecated the SICI interface, so all these links.jstor.org links now point to a www.jstor.org/stable/##### URL. Could Dexbot convert these? (This would require a query to JSTOR, so that might be beyond its ability.)
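One assumed way the conversion could work (for illustration only; the operator's pywikibot code may differ) is to follow the redirect that links.jstor.org now issues and substitute the stable URL it resolves to:

import re
import requests

# Old SICI-style JSTOR URLs; pipes, brackets and braces are excluded so
# template and wikilink delimiters are not swallowed into the URL.
OLD_JSTOR_RE = re.compile(r'https?://links\.jstor\.org/sici\?[^\s|\]}<]+')

def convert_old_jstor_links(wikitext):
    def repl(match):
        old_url = match.group(0)
        try:
            resp = requests.head(old_url, allow_redirects=True, timeout=30)
        except requests.RequestException:
            return old_url
        new_url = resp.url.split('?')[0]
        # Only substitute when we really ended up on a stable JSTOR URL;
        # ambiguous SICIs (several matching articles) are left for a human.
        if re.match(r'https?://www\.jstor\.org/stable/', new_url):
            return new_url
        return old_url
    return OLD_JSTOR_RE.sub(repl, wikitext)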

Discussion

Pinging User:AManWithNoPlan, who suggested this task. Ladsgroupoverleg 13:23, 7 January 2017 (UTC)

Approved for trial (50 edits). please post results when trial is done. — xaosflux Talk 13:49, 7 January 2017 (UTC)
One edit has been done and it looks good. Once this is approved and all are converted, I will manually convert the few that the bot cannot do (i.e. they resolve to more than one journal article and you have to manually pick the right one). AManWithNoPlan (talk) 16:47, 7 January 2017 (UTC)
@Xaosflux: doneLadsgroupoverleg 13:32, 9 January 2017 (UTC)

This appears to have overrun a pipe: https://en.wikipedia.org/w/index.php?title=Bertrand_Russell&type=revision&diff=759142295&oldid=758638670 AManWithNoPlan (talk) 03:03, 10 January 2017 (UTC)
Pipes and }} issues. I am fixing these: https://en.wikipedia.org/w/index.php?title=Campania&diff=prev&oldid=759142636 https://en.wikipedia.org/w/index.php?title=Architecture_of_Karnataka&diff=prev&oldid=759142044 https://en.wikipedia.org/w/index.php?title=Anthroposophy&diff=prev&oldid=759141910 AManWithNoPlan (talk) 03:08, 10 January 2017 (UTC) @Ladsgroup:

@AManWithNoPlan: That's strange, because I use pywikibot's textlib to get URLs. I've fixed it in my code. Ladsgroupoverleg 15:44, 14 January 2017 (UTC)
@Xaosflux: pinging AManWithNoPlan (talk) 22:11, 14 January 2017 (UTC)
Can you fix your contributions link to show just these 50 edits? — xaosflux Talk 00:16, 15 January 2017 (UTC)

Here's a link https://en.wikipedia.org/w/index.php?title=Special:Contributions/Dexbot&dir=prev&offset=20170109090902&target=Dexbot AManWithNoPlan (talk) 00:26, 15 January 2017 (UTC) Trial complete.

edit WP:BRFA/DatBot_5

DatBot 5

Operator: DatGuy (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 17:10, Thursday, December 15, 2016 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): AutoWikiBrowser

Source code available: AWB

Function overview: Replace deprecated WikiProject Chinese-language entertainment template

Links to relevant discussions (where appropriate): Wikipedia:Bot requests#Replace deprecated WikiProject Chinese-language entertainment template

Edit period(s): One time run

Estimated number of pages affected: ~2083

Exclusion compliant (Yes/No): No

Already has a bot flag (Yes/No): Yes

Function details: I could make it skip pages with the template easily, especially as it's AutoWikiBrowser, but I see no need.

Discussion

Approved for trial (25 edits). — xaosflux Talk 17:34, 15 December 2016 (UTC)
Made 22 edits. 10 in a row were fine, but I just encountered a problem. I will fix it tomorrow. Dat GuyTalkContribs 19:37, 15 December 2016 (UTC)
Trial complete. Fixed some bugs, "prettied" up the RegEx. Dat GuyTalkContribs 15:03, 16 December 2016 (UTC)
@DatGuy: Please link to the problematic edits, and similar ones where the bug was fixed MusikAnimal talk 19:24, 16 December 2016 (UTC)

────────────────────────────────────────────────────────────────────────────────────────────────────

  • In the first few edits, such as [11], if the importance was x in the language entertainment template, it was also x when converted to the WPChina template. I changed it to remove the importance if there is no WPChina template, since it may vary. Then there were multiple fine edits with no issues, but I encountered a minor issue at Talk:Jet Li: the bot was confused by WPChina being at the top, as I had made it check only below it. That is fixed now. I've also added a fix for the removal of the newline, which caused the }} to be moved onto the same line as the WikiProject template. At Talk:Hero (2002 film) I added some RegEx for it to check for other task forces. By the way, the edits you see are good because I did not set it to autosave, for debugging reasons. Dat GuyTalkContribs 19:34, 16 December 2016 (UTC)
    Approved for extended trial (50 edits). "the edits you see are good because I did not set it on autosave" – wonderful :) Let's do another 50 edits, for good measure. MusikAnimal talk 20:01, 16 December 2016 (UTC)
    Trial complete. [12] is the only major issue I encountered. Can't really do anything about that: different ratings in different WikiProjects, and DatBot sticks with the now-deprecated template. Dat GuyTalkContribs 22:30, 16 December 2016 (UTC)
    What about here, where there was a class specified for {{WikiProject China}}, but not for {{WikiProject Chinese-language entertainment}}? Perhaps we should use the class of whichever comes first, so if there's not one on the first instance, use the second. I know it's a pain, but it might be good to fix this particular edit, and any others where there was a classification but there isn't now MusikAnimal talk 22:50, 16 December 2016 (UTC)
    Will do that, but it's been a long day so I'm going to sleep. Will finish the other request at task 4. I am going on vacation, so I might be able to either fix it there or when I come back. Please don't expire this if I don't fix it during my break. Thanks, Dat GuyTalkContribs 22:53, 16 December 2016 (UTC)
    Let's also make sure the same issue doesn't happen for the importance parameter. It looks like there may also be aliases, e.g. imp instead of importance [13]. We should take these into account as well.
    And no worries, we won't expire the BRFA. Enjoy your vacation! :) MusikAnimal talk 22:58, 16 December 2016 (UTC)
    It already looks for imp. Also, the vacation is actually for nearly THREE weeks, so I'll definitely try my hardest to get it done under that time. Thanks, Dat GuyTalkContribs 14:50, 18 December 2016 (UTC)

────────────────────────────────────────────────────────────────────────────────────────────────────{{OperatorAssistanceNeeded}} I noticed you are actively editing. Are you able to tend to this bot task? MusikAnimal talk 04:57, 28 December 2016 (UTC)

Yes. Working on it now. Dat GuyTalkContribs 18:40, 28 December 2016 (UTC)
  • Done. If there is no class for Chinese-language entertainment, it takes the class from WikiProject China (if it exists). Dat GuyTalkContribs 20:45, 28 December 2016 (UTC)
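For clarity, a rough illustration in Python (not DatGuy's actual AWB rules) of the fallback just described: prefer |class= from the deprecated banner, and fall back to the value on {{WikiProject China}} when it is missing.

import re

def get_param(banner_wikitext, name):
    # Pull a named parameter value out of a banner template invocation.
    m = re.search(r'\|\s*' + re.escape(name) + r'\s*=\s*([^|}]*)', banner_wikitext)
    return m.group(1).strip() if m else ''

def pick_class(china_banner, entertainment_banner):
    return get_param(entertainment_banner, 'class') or get_param(china_banner, 'class')

print(pick_class('{{WikiProject China|class=B|importance=mid}}',
                 '{{WikiProject Chinese-language entertainment|importance=low}}'))
# -> B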

Approved for extended trial (50 edits). MusikAnimal talk 05:25, 29 December 2016 (UTC)

Trial complete. Example of what I added: [14]. Only issue I had was with [15], since there was a double ||. Not much you can really do about that. There were gaps in the edits, but because of off-wiki situations, not any bugs :). Dat GuyTalkContribs 13:35, 29 December 2016 (UTC)
@DatGuy: Alright, several issues, I'm afraid:
  1. Here it stripped out the |auto=inherit; shouldn't we retain that?
  2. Here it not only stripped out |auto=inherit, but also strangely assigned |importance=start; any idea why that happened?
  3. Next, here it changed {{WikiProject banner shell}} to the nonexistent {{WikiProject banner Shell}} (uppercase S), which breaks the display entirely.
  4. Here it stripped out |cinema=yes; I'm unsure if it's OK or makes sense to have both cinema and entertainment as task forces.
  5. Next, about the issue you found with double pipes [16]: what happens if the bot runs against that (just curious)?
  6. Finally, just to be sure, are we intentionally ignoring the importance of {{WikiProject Chinese-language entertainment}} [17]?
MusikAnimal talk 17:28, 29 December 2016 (UTC)
I'll go through each point, descending.
  1. That'll be quite easy to add. Where would you like it to go? I suggest it being at the end, after any task forces
  2. Not sure, weird. Doing...
  3. Done
  4. I thought I fixed that already while sandbox testing. Doing...
  5. Nothing happens. It looks specifically for \|.
  6. Yes, since Chinese-language entertainment is a specific topic within a broader one.
Hopefully that addresses everything. Regarding the tasks I'm {{doing}}, expect to see them on my sandbox soon. Dat GuyTalkContribs 21:22, 29 December 2016 (UTC)
@DatGuy: You stated on IRC there are examples in your sandbox. The most recent example still ignores the |cinema=yes that was on {{WikiProject China}}. Let's make sure all valid parameters are retained. The ordering I don't think matters much. For {{WikiProject Chinese-language entertainment}}, you can probably ignore any parameter other than the class. Also, are we accounting for template redirects, such as {{WPCHINA}}?
For the next trial, I strongly recommend you do much of it with autosave off so that you can notice and fix any bugs you encounter. I've fixed all the bot errors I pointed out above, but next time kindly consider reviewing all the trial edits and do any necessary cleanup yourself, pointing out the errors here in the BRFA. This will save time for us both :) MusikAnimal talk 22:25, 29 December 2016 (UTC)
Apologies. For some reason, dumb me thought it was a good idea to turn the find+replace off. As per recommendation, tomorrow I'll test with multiple different possibilities on my sandbox. Thanks for your patience. Dat GuyTalkContribs 23:06, 29 December 2016 (UTC)
@DatGuy: Any updates on resolving the above issues? MusikAnimal talk 00:12, 12 January 2017 (UTC)

A user has requested the attention of the operator. Once the operator has seen this message and replied, please deactivate this tag. (user notified) MusikAnimal talk 19:02, 15 January 2017 (UTC)

edit WP:BRFA/OAbot

OAbot

Operator: Pintoch (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 19:05, Saturday, October 22, 2016 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): Python

Source code available: https://github.com/dissemin/oabot

Function overview: Adds free-to-read external links to citation templates.

Links to relevant discussions (where appropriate): WP:OABOT, Help talk:Citation Style 1

Edit period(s): continuous

Estimated number of pages affected: most main space pages with scholarly citation templates (about one million pages, maybe?)

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): No

Function details: The bot queries various APIs to find free to read external links for citation templates, and adds them to the templates. More details can be found on the project page. You can try out what the bot would currently do with the web-based demo.

The bot uses the forthcoming access signaling features of CS1-based templates, so it should not be run before these features are deployed (normally in one week). I post the application now because I expect a lot of discussion anyway.

Our goal for this bot was not to add as many links as possible, but rather to make sure all the changes would be as small and uncontroversial as possible. The bot adds at most one link per template, uses the appropriate identifier parameters when available, and leaves alone any template where one of the existing links is already free to read. The editing pace of the bot would be quite low (a few pages per minute), as the API calls it makes are quite slow (as you can see on the web interface).

Of course, suggestions are welcome. For instance, would it be useful to post a message on the talk page of each affected page, similarly to what User:InternetArchiveBot currently does?

Notifying users involved in the project: Ocaasi (WMF), symac, Andrew Su, jamestwebber, A3nm, Sckott and ChPietsch.

Discussion

I ran the demo on Alan Turing and the bot added |doi-access=free to the citation with doi:10.1112/plms/s2-43.6.544. When I click through that DOI, the page says that I need to sign in to get a full text PDF. Is that a bug? – Jonesey95 (talk) 19:37, 22 October 2016 (UTC)

Good catch! This error comes from our data sources, as you can see here (the record is classified as free to read). This is bound to happen sometimes, although errors of this kind are usually quite rare in my experience, as BASE's OA classification tends to err on the side of caution. ChPietsch (responsible for APIs at BASE), any idea? I do not see any straightforward fix. The newly launched http://oadoi.org does not solve the issue for this DOI either. This is probably a good case for a notification on the talk page, inviting editors to check the edit? − Pintoch (talk) 19:56, 22 October 2016 (UTC)
If the bot were live and it were to make the edit described above, an editor might notice that the bot added a non-free URL to a citation and remove it. What prevents the bot from trying to add a non-free (or any non-working) URL again? For example, InternetArchiveBot uses {{cbignore}} as a way for an editor to prevent the bot from further altering a citation. Does OABot have something similar, e.g. perhaps honoring the same ignore template or a new {{oaignore}}? —RP88 (talk) 21:45, 22 October 2016 (UTC)
Excellent suggestion. We should have that indeed. − Pintoch (talk) 06:59, 23 October 2016 (UTC)

As another example, on Makemake, for the citation to doi:10.1038/nature11597 the bot wants to add a 'url' of http://orbi.ulg.ac.be/jspui/handle/2268/142198 . That link redirects to http://orbi.ulg.ac.be//handle/2268/142198 . That page doesn't actually have a full text version; the "fulltext file" attached is just a 2.2 kB blank PDF. The tool oaDOI also fails for this DOI, but in a different manner. For this DOI it claims to have found a full text version, but it directs us to a different URL http://pubman.mpdl.mpg.de/pubman/faces/viewItemOverviewPage.jsp?itemId=escidoc:1615196 which says "There are no public full texts available". —RP88 (talk) 05:39, 23 October 2016 (UTC)

Interesting. I am considering to add a second layer of full text detection in the bot, based on Zotero. It will be very slow, but for a bot like this we don't really care. − Pintoch (talk) 06:59, 23 October 2016 (UTC)

I ran the demo on Alzheimer's disease. Twice, apparently for the same citation, the demo added http://www.ncbi.nlm.nih.gov/pmc/articles/PMC. That link points to an error page; presumably the bot forgot to append the appropriate identifier (if there is one). Also, all of the new ResearchGate URLs end with the .pdf extension, but when I followed those links I did not get a PDF file but instead an HTML abstract page. The bot shouldn't cause the cs1|2 template to mislead the reader (URLs ending in '.pdf' display the PDF icon).

Thanks! I think I can deal with these problems via special cases in the code. − Pintoch (talk) 20:17, 22 October 2016 (UTC)
The two bugs should be fixed now (make sure you refresh the processing as there is some caching). − Pintoch (talk) 20:39, 22 October 2016 (UTC)

I ran the demo on Influenza A virus subtype H7N9 and noticed one item that wasn't really an error. For the citation to doi:10.1016/S0140-6736(13)60938-1 it wants to add a 'url' of:

  • https://www.researchgate.net/profile/Haixia_Xiao/publication/236637171_Origin_and_diversity_of_novel_avian_influenza_A_H7N9_viruses_causing_human_infection_Phylogenetic_structural_and_coalescent_analyses

when a 'url' of https://www.researchgate.net/publication/236637171 would have been adequate. Not really a big deal, but I figured it would be nice to use the short researchgate URLs if possible. —RP88 (talk) 21:19, 22 October 2016 (UTC)

It's the m-dash in the page name. The bot is breaking on pages with non-ascii characters in their page name. For example, it also fails on the page À. —RP88 (talk) 21:54, 22 October 2016 (UTC)

Thanks. I've fixed the unicode problem (that's what you deserve when you use python 2) and have shortened RG urls a bit more (though I could probably remove the profile bit indeed). − Pintoch (talk) 22:32, 22 October 2016 (UTC)

Do you think you could add the following code to main.py in order to further shorten the ResearchGate URLs? After rg_re add...
rg2_re = re.compile('^(https?://www\.researchgate\.net/)profile/[^/]+/(publication/[0-9]+)$')
...and change to the rg_match code to the following:
rg_match = rg_re.match(oa_url)
if rg_match:
	oa_url = rg_match.group(1)
	# Further shorten ResearchGate URLs by removing "profile/<name>" 
	rg_match = rg2_re.match(oa_url)
	if rg_match:
		oa_url = rg_match.group(1) + rg_match.group(2)
Thanks. —RP88 (talk) 13:21, 26 October 2016 (UTC)
Thanks a lot for the suggestion, I have added it to the code after a slight simplification. − Pintoch (talk) 22:40, 26 October 2016 (UTC)

Back to Alzheimer's disease again. The demo added |url=https://www.academia.edu/27685650 When I clicked that link from the demo page, I landed on a page that offers apparently two ways to get to the PDF document. One way is through a Google login – a rather large reddish button which pops up a window wanting me to sign in with a Google account. The other is a larger image that purports to look like a stack of pages, with the document title and "PDF" in white on a red background and the Acrobat icon. Clicking that image gets me to a 'sign up to download' display (Google and/or Facebook). The demo did not add |url-access=registration, but wasn't that the purpose of all of that haggling we've been doing at WT:CS1?

But wait, there's more. When I put the link here in this post, magically, it takes me to the document and now the link in the demo's report does the same. This is astonishing. Do not astonish the user.

I know that you are not responsible for that pathetic academia.edu user interface, but I do have to wonder: if it is so poorly designed as to act this way, should the bot be adding those links to Wikipedia articles?

Trappist the monk (talk) 00:12, 23 October 2016 (UTC)

Yes, as you noticed Academia.edu does not require registration to download the PDF if the user comes from Wikipedia. Therefore links to this website are currently displayed without access annotation. But this behavior can easily be changed of course. I personally think these links are useful, as they give access to full texts that are hard to find elsewhere. Sometimes, even Google Scholar is not aware of these links (compare this Google Scholar cluster and http://doai.io/10.1017/s0790966700007503). The bot currently prioritizes links to conventional repositories over social networks such as ResearchGate and Academia.edu. − Pintoch (talk) 18:10, 23 October 2016 (UTC)

This is from Clitoris:

{{cite book |last=Schünke |first=Michael |first2=Erik |last2=Schulte |first3=Lawrence M. |last3=Ross |first4=Edward D. |last4=Lamperti |first5=Udo |last5=Schumacher |title=Thieme Atlas of Anatomy: General Anatomy and Musculoskeletal System |volume=1 |publisher=[[Thieme Medical Publishers]] |year=2006 |isbn=978-3-13-142081-7 |url=https://books.google.com/books?id=NK9TgTaGt6UC&pg=PP1 |accessdate=November 27, 2012 |ref=harv}}
Schünke, Michael; Schulte, Erik; Ross, Lawrence M.; Lamperti, Edward D.; Schumacher, Udo (2006). Thieme Atlas of Anatomy: General Anatomy and Musculoskeletal System. 1. Thieme Medical Publishers. ISBN 978-3-13-142081-7. Retrieved November 27, 2012. 

The demo added |doi=10.1136/aim.26.4.253. The template has a |url= link to a preview-able facsimile at google books. The doi:10.1136/aim.26.4.253 links to a vaguely related review of this book in the BMJ (which is not mentioned in the citation). Not really useful for helping readers find a copy of the source material that supports the Wikipedia article. The bot should not be littering cs1|2 templates with such vaguely related material.

Trappist the monk (talk) 00:24, 23 October 2016 (UTC)

I propose to exclude cite book from the bot's scope, as confusions between books and their reviews could indeed be a problem. Is there a book-specific parameter in citation that I could use to detect CS2 books? − Pintoch (talk) 06:59, 23 October 2016 (UTC)
I'm not sure that is necessary. I think the problem is at a different point in the processing workflow. Why did the bot even think this citation matched doi:10.1136/aim.26.4.253? Comparing the metadata between the DOI and the citation, the only match is the title. There is no overlap between the five authors in the citation and the one author for the DOI. The years of publication are not the same. Why did the bot think it had a match when literally the only matching metadata was the title? While it may be uncommon, I suspect there are more than a few journal articles that share titles. Maybe the better fix is for the bot to demand a higher quality match before adding a DOI? How about this proposal: if a query is made to the dissemin API without a DOI, the bot should apply an additional filter that rejects any results that differ in either the author's last name or year of publication. —RP88 (talk) 09:02, 23 October 2016 (UTC)
That would make sense. Feel free to implement that if you are up for it. For now I will keep cite book blacklisted, mostly for performance reasons. − Pintoch (talk) 18:10, 23 October 2016 (UTC)
I have added an additional check when adding a DOI, by comparing the metadata in the citation with the official metadata from the publisher. This solves the problem in the example raised by Trappist. − Pintoch (talk) 21:36, 4 November 2016 (UTC)
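As an illustration of that kind of cross-check, a hypothetical helper (not OAbot's actual code) could require both a shared author surname and a matching publication year before a DOI candidate is accepted:

# Hypothetical helper for illustration; field names are assumptions.
def plausible_match(citation, candidate):
    cite_surnames = {n.lower() for n in citation.get('last_names', [])}
    cand_surnames = {n.lower() for n in candidate.get('last_names', [])}
    same_year = citation.get('year') is not None and citation.get('year') == candidate.get('year')
    return bool(same_year and cite_surnames & cand_surnames)

Under such a filter, the Thieme atlas example above would have been rejected, since neither the authors nor the year of the BMJ review match the citation.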

This, also from Clitoris, the demo added |doi-access=free (I modified it to use Module:Citation/CS1/sandbox to avoid the unrecognized parameter error):

{{cite journal/new |last=Smith |first=K. C. |first2=T. J. |last2=Parkinson |first3=S. E. |last3=Long |first4=F. J. |last4=Barr |title=Anatomical, cytogenetic and behavioural studies of freemartin ewes |journal=[[Veterinary Record]] |volume=146 |issue=20 |pages=574–8 |year=2000 |doi=10.1136/vr.146.20.574 |ref=harv |subscription=yes|doi-access=free }}
Smith, K. C.; Parkinson, T. J.; Long, S. E.; Barr, F. J. (2000). "Anatomical, cytogenetic and behavioural studies of freemartin ewes". Veterinary Record. 146 (20): 574–8. doi:10.1136/vr.146.20.574Freely accessible. (subscription required (help)). 

Following that doi gets an abstract and, ultimately, a note that reads: "Access to the full text of this article requires a subscription or payment." Far from free. So, the |subscription=yes is correct. Should the bot add a free signal parameter when there is only one 'external' link and when the template includes |subscription=yes? This is contradictory.

Trappist the monk (talk) 00:37, 23 October 2016 (UTC)

Good point. The error comes from the same data source (HighWire Press). If this is frequent among records from this publisher, ChPietsch might consider reclassifying them? Otherwise I can blacklist this publisher downstream. I will also restrict the bot to templates without |subscription= and |registration=. − Pintoch (talk) 06:59, 23 October 2016 (UTC)

I ran the demo on Pluto and noticed another error, maybe a data source problem? For the citation to doi: 10.1007/s10569-010-9320-4 the bot wants to add a 'url' of http://www.springerlink.com/content/g272325h45517581/fulltext.pdf . However, that URL redirects to http://link.springer.com/article/10.1007%2Fs10569-010-9320-4 which requires a login/money for full text. —RP88 (talk) 06:16, 23 October 2016 (UTC)

This bug is introduced in Dissemin, I think I can fix that upstream. − Pintoch (talk) 06:59, 23 October 2016 (UTC)

I have rolled out a new version that checks all links with Zotero's scrapers, which increases the processing time a lot but should give more accurate full text detection. − Pintoch (talk) 18:10, 23 October 2016 (UTC)

I ran the demo on Dengue fever and noticed one item that was a little odd, but maybe not your issue. The bot failed to identify doi: 10.1002/14651858.CD003488.pub3 as freely available. The DOI resolves to http://onlinelibrary.wiley.com/doi/10.1002/14651858.CD003488.pub3/abstract , which hosts the full PDF (presumably via Cochrane Library's "green open access" program). —RP88 (talk) 13:33, 27 October 2016 (UTC)

Yes, this is due to my recent decision to add one extra layer of full text availability detection. What happens in this case is that Zotero scrapes this page, returns a full text URL http://onlinelibrary.wiley.com/store/10.1002/14651858.CD003488.pub3/asset/CD003488.pdf?v=1&t=iusekwil&s=86c45ca3ee282309247c9e0cdc4b8d3779ecf544 , then we try to download this file to check it looks like a PDF, and we fail because of a 403 error. I suspect Wiley puts some protections on their full text URLs to make sure they are not shared (this is probably why they put some tokens after .pdf). So, we will not add any green locks on Wiley DOIs. This is bound to happen if we want to be absolutely sure that all the links we add link to full texts: this is a precision/recall trade-off. I think we all agree the bot should have an excellent precision. In the case you brought up, the bot does not make any change to the template: I think this is fine. There are surely cases where it adds another free url or identifier without noticing that the doi is free. Is this a serious issue? At least editors (or other bots!) can prevent it by adding |doi-access=free (in which case OAbot would not change the template). By the way, I have added instructions for publishers in case they want to make sure the bot can detect their full texts. (Wiley does not fully comply with Google Scholar guidelines as the link they put in the citation_pdf_url meta tag does not lead to a PDF file, but an HTML file). − Pintoch (talk) 14:27, 27 October 2016 (UTC)
Thanks for the explanation. —RP88 (talk) 14:44, 27 October 2016 (UTC)

Here is another that very likely has nothing to do with the bot but instead more likely to be a data issue. On (225088) 2007 OR10 the cite to doi:10.3847/0004-6256/151/5/117 ideally should be found to match arXiv:1603.03090 ( https://arxiv.org/pdf/1603.03090.pdf ). —RP88 (talk) 15:06, 27 October 2016 (UTC)

Yes, Dissemin's index is currently not fully up to date with BASE's, and this DOI mapping has been modified recently, so we don't detect this. This should normally be resolved as we process updates from BASE. − Pintoch (talk) 21:38, 28 October 2016 (UTC)

{{BAGAssistanceNeeded}} I believe all the errors reported here have been fixed. What are the next steps? − Pintoch (talk) 21:36, 4 November 2016 (UTC)

Approved for trial (50 edits). Do a small run, post the results here please. — xaosflux Talk 23:42, 4 November 2016 (UTC)

The bot was blocked at its 10th edit by the pywikibot error "Hit WP:AbuseFilter: Link spamming". The bot account does not look blocked, but I will not attempt any edit before getting your opinion on the matter. − Pintoch (talk) 01:03, 10 November 2016 (UTC)

In Autism, the bot added hdl:11693/22959 and |hdl-access=free. That's sort of right I guess. The article is available through a doi link on the hdl page. But, doi:10.1016/j.neuron.2015.09.016 (which does link to the article) is already present in the template yet the bot appears to have ignored that fact.
The academia links still require some sort of login so the bot should not be adding them as free-to-read links. The researchgate links never quite finish loading (I know, not your problem; just annoying from a reader's perspective).
Trappist the monk (talk) 01:40, 10 November 2016 (UTC)
Thanks for spotting this problem with the HDL. The bot is misled by the PDF placeholder on the institutional repository. The DOI is detected as free to read by Dissemin, but Zotero fails to confirm that. I will do two fixes: do not consider PDF files as full texts if they are too short (say, less than 3 pages), and do not add any link to a citation with a DOI that Dissemin detects as free to read (but do not add the green lock unless Zotero confirms it is free to read).
Concerning links to academia.edu, are you sure you need to login when coming from Wikipedia? (Have you tried clicking on the links as they appear in the wiki pages?) Or do you think we should not add these links because they require a registration when coming from outside Wikipedia? It seems to me that a debate about the sources we want to include (and more generally on the bot itself) would be helpful, perhaps at WP:VPP? − Pintoch (talk) 10:11, 10 November 2016 (UTC)
Relying on page-count doesn't seem to me to be a good idea. I've seen pdf 'articles' that were less than a page in length – obituaries, book reviews, retractions, etc.
Perhaps it's simply a definition problem where free-to-read includes the caveat that some 'free' sources are only free-to-read when linked from Wikipedia.
Trappist the monk (talk) 11:02, 10 November 2016 (UTC)
I agree looking at the page count is not great, but what else can we do? It is not an issue if we skip very short articles (we just won't add a full text link to them, but we will not tag them as paywalled or anything like that.) Ideally I could have a look at CiteSeerX's filtering heuristics, I know they have put some effort into filtering out PDF files that are not full texts. But integrating that into the pipeline will be almost surely quite painful. − Pintoch (talk) 11:10, 10 November 2016 (UTC)
Pintoch the abusefilter issue should be resolved now (I adjusted the filter) - ping me if you hit it again. — xaosflux Talk 04:48, 10 November 2016 (UTC)
Thanks a lot! I will resume the bot after fixing the problems above. − Pintoch (talk) 10:11, 10 November 2016 (UTC)
{{OperatorAssistanceNeeded}} Any progress on moving back to trials? — xaosflux Talk 20:18, 2 December 2016 (UTC)

The bot has made 20 edits so far, and I have spotted a few mistakes:

I am taking the following measures:

  • instead of just doing a HEAD request on the URLs that are supposed to lead to a PDF, the bot now downloads the PDF and checks that it is a valid PDF file (this measure was implemented earlier but was not effective due to a cache in the pipeline); a rough sketch of such a check appears after this list.
  • as I have no control over how academia.edu extracts its DOIs, and as the usefulness of these links was questioned by Trappist, the bot will now stop adding links to this website. ResearchGate has similar issues, as DOAI.io's index of it is getting out of sync with the website, so the bot will stop querying DOAI.io altogether. I think it would be good to have a debate at WP:VPR to decide whether we want these links or not: if so, it should not be very hard to convince these two sources to adapt their metadata accordingly.
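For the first measure, a minimal sketch of the PDF-validity check (an assumed approach for illustration, not OAbot's exact code; the real pipeline also involves its cache and a Zotero layer):

import requests

def looks_like_pdf(url, timeout=30):
    # Fetch the candidate full-text URL and look at the PDF magic bytes
    # rather than trusting a HEAD request or the file extension.
    try:
        resp = requests.get(url, timeout=timeout, stream=True)
        resp.raise_for_status()
    except requests.RequestException:
        return False
    first_chunk = next(resp.iter_content(chunk_size=1024), b'')
    return first_chunk.startswith(b'%PDF-')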

Editing will be resumed once I am satisfied with my implementation of these fixes. − Pintoch (talk) 15:09, 4 December 2016 (UTC)

Trial complete. Now I just need to find the time to analyze the results. − Pintoch (talk) 09:35, 6 December 2016 (UTC)

Here is a report for the last 30 edits made by the bot, which are the edits made with the latest version of the code. The bot processed about two thousand citation templates (I don't have the exact figure).

  • The bot added |doi-access=free on 69 DOIs, and I checked manually that they were all free to read from an IP address that is not covered by institutional subscriptions.
  • The bot added additional free to read links to other citations. All the links it added were free to read. I have checked that the free versions all look faithful to the "published" one (if not identical), when my institutional accesses allowed me to go through the paywall.
Link added by OAbot | Publisher link | Publisher access
CiteSeerX: 10.1.1.659.5717 | doi:10.1037/a0029016 | $11.95
hal.in2p3.fr/in2p3-00450771 | doi:10.1021/jp9077008 | $40.00
hal.in2p3.fr/in2p3-00014184 | doi:10.1016/j.nuclphysa.2003.11.00 | $39.95
hdl:1807/50099 | JSTOR 23499358 | registration
hdl:1807/50095 | JSTOR 23499354 | registration
CiteSeerX: 10.1.1.680.5115 | doi:10.1080/10888705.2010.507119 | EUR 39.00
CiteSeerX: 10.1.1.145.4600 | doi:10.1145/102782.102783 | $15
hdl:10536/DRO/DU:30050819 | doi:10.1016/S0140-6736(12)61728-0 | registration
CiteSeerX: 10.1.1.454.4197 | doi:10.1016/S0140-6736(07)61575-X | registration
hdl:10419/85494 | doi:10.1093/cje/25.1.1 | $29.00
arXiv:1311.2763 | doi:10.1140/epjh/e2013-40037-6 | EUR 41.94
hdl:10915/2785 | doi:10.1002/andp.19053220806 | free
hdl:10915/2786 | doi:10.1002/andp.19053221004 | free
hdl:10144/125625 | http://ajcn.nutrition.org/content/85/1/218.short | free
CiteSeerX: 10.1.1.529.1977 | doi:10.1016/j.ympev.2005.10.017 | $39.95

As you can see, in three cases the published version was also free. This happens when the bot fails to discover the full text from the publisher. This is bound to happen as not all publishers comply with the Google Scholar guidelines or have an up-to-date Zotero scraper. This makes the HDL link less useful, but the bot did not add any wrong information, as it does not mark any link as paywalled. − Pintoch (talk) 18:52, 6 December 2016 (UTC)


Approved requests

Bots that have been approved for operations after a successful BRFA will be listed here for informational purposes. No other approval action is required for these bots. Recently approved requests can be found here (edit), while old requests can be found in the archives.


Denied requests

Bots that have been denied for operations will be listed here for informational purposes for at least 7 days before being archived. No other action is required for these bots. Older requests can be found in the Archive.

Expired/withdrawn requests

These requests have either expired, as information required by the operator was not provided, or been withdrawn. These tasks are not authorized to run, but such lack of authorization does not necessarily follow from a finding as to merit. A bot that, having been approved for testing, was not tested by an editor, or one for which the results of testing were not posted, for example, would appear here. Bot requests should not be placed here if there is an active discussion ongoing above. Operators whose requests have expired may reactivate their requests at any time. The following list shows recent requests (if any) that have expired, listed here for informational purposes for at least 7 days before being archived. Older requests can be found in the respective archives: Expired, Withdrawn.