
Wikipedia:Bots/Requests for approval

From Wikipedia, the free encyclopedia

If you want to run a bot on the English Wikipedia, you must first get it approved. To do so, follow the instructions below to add a request. If you are not familiar with programming, it may be a good idea to ask someone else to run a bot for you rather than running your own.

 Instructions for bot operators

Current requests for approval


Harej bot

Operator: Harej (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 22:47, Monday, July 27, 2015 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): Python

Source code available: Source on GitHub

Function overview: Moves WikiProject "meta-categories" (e.g. Category:Science WikiProjects) from a WikiProject's page to its eponymous category (e.g., from Wikipedia:WikiProject Biology to Category:WikiProject Biology) for those WikiProjects that have their own categories.

Links to relevant discussions (where appropriate): Discussion at WikiProject Council

Edit period(s): An initial run which will take several hours, followed by monthly runs

Estimated number of pages affected: Approximately 3,200 WikiProject pages and categories

Exclusion compliant (Yes/No): Yes, as pywikibot respects the bots template.

Already has a bot flag (Yes/No): Yes. Harej bot is older than dirt; its user page links to its original approval which pre-dates the BRFA process.

Function details: The bot pulls a list of WikiProjects that have categories identical to their name. There are currently 1,602 of these WikiProjects. The bot takes categories that are of the style "X WikiProjects" (e.g. Category:Science WikiProjects) and migrates them from the WikiProject page to its category. In each edit summary, the bot will link to a page explaining the edit.

This category migration will make WikiProject categories easier to browse: rather than each WikiProject being sorted into a category twice, once through its page and once through its category, it is sorted only once, through its category. It also makes categories easier to update; they only need to be updated in one place instead of two. These WikiProject "meta-categories" are used to create pages in the WikiProject Directory, and streamlining the categorization scheme will make it easier for the community to properly categorize these projects. Properly categorized WikiProjects are discoverable WikiProjects.

In the future the bot could migrate additional categories, but it is currently only migrating WikiProject meta-categories so as to create the narrowest, least controversial bot proposal.
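The migration described in the function details can be sketched in Python roughly as follows. This is not the bot's actual source (that is linked above as being on GitHub); the function name `migrate_meta_categories` and the regular expression are assumptions made for illustration, and the sketch works on plain wikitext strings rather than calling pywikibot.

```python
import re

# Matches category links of the style "X WikiProjects", with or without a
# sort key, e.g. [[Category:Science WikiProjects]].
META_CATEGORY = re.compile(
    r"\[\[Category:([^\]|]+ WikiProjects)(?:\|[^\]]*)?\]\]\n?"
)


def migrate_meta_categories(project_text, category_text):
    """Move "X WikiProjects" categories from a project page to its category.

    Returns the new project text, the new category text, and the list of
    category names that were moved. Sort keys are dropped in this sketch.
    """
    moved = [m.group(1).strip() for m in META_CATEGORY.finditer(project_text)]
    new_project_text = META_CATEGORY.sub("", project_text)

    new_category_text = category_text
    for name in moved:
        link = "[[Category:%s]]" % name
        # Skip categories the category page already carries.
        if link not in new_category_text:
            new_category_text = new_category_text.rstrip("\n") + "\n" + link + "\n"
    return new_project_text, new_category_text, moved
```

Other categories on the project page, such as its own eponymous category, are left in place because they do not end in "WikiProjects".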


A user has requested the attention of a member of the Bot Approvals Group. Once assistance has been rendered, please deactivate this tag. Harej (talk) 22:47, 27 July 2015 (UTC)


Monkbot 8

Operator: Trappist the monk (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 17:39, Saturday, July 25, 2015 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): AWB

Source code available: yes, find this → \{\{\s*vcite2 and replace it with this → {{cite

Function overview: replace {{vcite2 journal}} with {{cite journal}}

Links to relevant discussions (where appropriate):

Edit period(s): probably just one-time

Estimated number of pages affected: 3000

Exclusion compliant (Yes/No): yes

Already has a bot flag (Yes/No): yes

Function details: The 25 July 2015 update to Module:Citation/CS1 added support for |vauthors= and |veditors=. |vauthors= was initially created as part of {{vcite2 journal}}. The content of this parameter is parsed into multiple |last= / |first= pairs. To do this, {{vcite2 journal}} requires |vauthors= to be a comma-delimited list of author names where each name is one or more surnames, a space, and one or two initials: Last FM, Last FM; the Vancouver system style. The parsed name-list (along with all of the other template parameters) is passed to {{cite journal}} for rendering.
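As an illustration of the name format described above, here is a minimal Python sketch that splits a |vauthors= value into last/first pairs. It is not the logic of Module:Citation/CS1 (which is written in Lua and does more error checking); the function name `parse_vauthors` and the regular expression are assumptions, and initials are assumed to be plain uppercase ASCII letters.

```python
import re

# One or more surname words, a space, then one or two uppercase initials.
VANCOUVER_NAME = re.compile(r"^(?P<last>.+?) (?P<initials>[A-Z]{1,2})$")


def parse_vauthors(vauthors):
    """Split a comma-delimited Vancouver-style list into (last, first) pairs."""
    pairs = []
    for raw in vauthors.split(","):
        name = raw.strip()
        match = VANCOUVER_NAME.match(name)
        if match is None:
            # The name does not fit the "Last FM" shape described above.
            raise ValueError("not a Vancouver-style name: %r" % name)
        pairs.append((match.group("last"), match.group("initials")))
    return pairs
```

For example, `parse_vauthors("Last FM, van der Berg J")` yields one pair per author, and a value such as `"Smith"` (no initials) raises `ValueError`.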

That functionality with additional error checking has been added to Module:Citation/CS1 and is now available to all of the cs1|2 citation templates. As such, {{vcite2 journal}} is no longer required and becomes an extra step in the citation rendering process. This bot request is made to rename all of the instances of {{vcite2 journal}} in article and template space to {{cite journal}}. (selfishly, this request is made so that the bot operator doesn't have to spend several hours at the computer in a bleary-eyed haze: clicking ... clicking ... clicking ...)
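The find-and-replace given under "Source code available" can be tried outside AWB. The sketch below applies the same pattern with Python's `re` module; the function name `convert_vcite2` is hypothetical, and whether the AWB run matched case-insensitively is not stated in the request, so the pattern is used exactly as written.

```python
import re

# The find pattern quoted in the request; the replacement is "{{cite".
VCITE2 = re.compile(r"\{\{\s*vcite2")


def convert_vcite2(wikitext):
    """Rename {{vcite2 ...}} template calls to {{cite ...}}."""
    return VCITE2.sub("{{cite", wikitext)
```

Because only the template-name prefix is rewritten, the parameters (including |vauthors=) pass through untouched.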

I've made a hundred or so edits with this script. The results can be found at Special:Contributions/Trappist_the_monk; search for: convert {{vcite2 journal}} to {{cite journal}}



BattyBot 46

Operator: GoingBatty (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 15:25, Saturday, July 25, 2015 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): AutoWikiBrowser

Source code available: AWB with custom module documented at User:BattyBot/CS1 maint: Extra text

Function overview: Fix citations to remove articles from Category:CS1 maint: Extra text

Links to relevant discussions (where appropriate): Category:CS1 maint: Extra text

Edit period(s): Daily

Estimated number of pages affected: 3,000+

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): Yes

Function details: To remove articles from Category:CS1 maint: Extra text, this bot task will fix citations that have parameters that contain text that duplicates static text provided by the template. (e.g. this edit) The bot will also apply AWB general fixes as needed. GoingBatty (talk) 15:25, 25 July 2015 (UTC)



Yobot 24

Operator: Magioladitis (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 21:43, Monday, June 1, 2015 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): AWB

Source code available: open source

Function overview: Remove {{Persondata}}

Links to relevant discussions (where appropriate): Wikipedia:Village_pump_(proposals)/Archive_122#RfC:_Should_Persondata_template_be_deprecated_and_methodically_removed_from_articles.3F concluded that "Consensus is to deprecate and remove."

Edit period(s): One time run

Estimated number of pages affected: 1.5 million pages

Exclusion compliant (Yes/No):

Already has a bot flag (Yes/No): Yes

Function details: Straightforward


@Magioladitis: Two questions:

  1. Should this bot wait until AWB has been changed to stop adding/updating Persondata?
  2. Since Persondata is not visible in the article, does WP:COSMETICBOT apply? Would it be better to include Persondata removal in AWB general fixes, for other bots & users to remove as they make substantial changes?

Thanks! GoingBatty (talk) 23:30, 1 June 2015 (UTC)

GoingBatty I'll be doing general fixes at the same time. I applied for this so I have control over AWB's code. The bot won't start until we are 100% sure that mass removal is a good thing to do. Before starting I'll modify AWB's code not to add Persondata, and we'll probably do a new release so that no other editors will add it. -- Magioladitis (talk) 05:41, 2 June 2015 (UTC)
We already have consensus for the removal of Persondata. If the addition of Persondata by automated tools hasn't been a breach of COSMETICBOT, then neither should be its removal. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 07:40, 2 June 2015 (UTC)
@Pigsonthewing: Hi Andy! You make a good point. Are there any bots that have been adding Persondata as their primary approved task? GoingBatty (talk) 12:27, 2 June 2015 (UTC)
Yes, RjwilmsiBot used to add it but doesn't anymore. I already contacted Rjwilmsi about the RfC. -- Magioladitis (talk) 12:57, 2 June 2015 (UTC)
Found the approval at Wikipedia:Bots/Requests for approval/RjwilmsiBot 4. Thanks! GoingBatty (talk) 13:33, 2 June 2015 (UTC)

The RFC mentioned above has a section (not an actual wiki markup heading) "Rough plan" which says in part

1. Transfer |SHORT DESCRIPTION= across to Wikidata. Done


4. Transfer any new data to Wikidata, then remove methodically.

I don't see any agreement to modify the rough plan, so I suppose that is the plan. Will this bot transfer new data to Wikidata, or just "remove methodically." If this bot just removes, how will the part about transferring new data be done? Also, does # 1 mean that if any new data is found, only the SHORT DESCRIPTION will be transferred and other, more suspect, data such as birth and death dates will not be transferred? Jc3s5h (talk) 16:03, 2 June 2015 (UTC)

Whoah! I second that concern. The five-point plan presented at the RfC was expressly conditioned on the "transfer any new data to Wikidata" before the systematic removal is implemented. This immediate removal without transfer of new input persondata to Wikidata violates the conditions upon which the RfC was approved. Please adhere to the RfC "rough plan" as presented. Dirtlawyer1 (talk) 18:23, 2 June 2015 (UTC)
I have already suggested that you read the lengthy and detailed discussion of data import under the RfC; and on the pages linked from there, on Wikidata. The RfC was concluded as "deprecate and remove", with no conditions attached, in the light of that discussion. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 19:55, 2 June 2015 (UTC)
To add, it became apparent during the course of the RfC that no more data would be transferred to Wikidata, all other PD fields having been deemed unreliable. I can't imagine what "remove methodically" might entail. Alakzi (talk) 20:38, 2 June 2015 (UTC) Oh, I see what you mean now, Dirtlawyer1. I agree that the (or a) bot should migrate any descriptions added after PLbot's last run; that would be eminently sensible. Alakzi (talk) 20:56, 2 June 2015 (UTC)
@Alakzi: Indeed. I didn't just fall off the wiki-turnip truck yesterday. In addition to the recently added persondata descriptions, I have also raised a concern about the married name variations of female bio subjects listed under alternative names. Dirtlawyer1 (talk) 21:13, 2 June 2015 (UTC)
The outcome of the RfC is "deprecate and remove", not "deprecate and remove with caveats". Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 19:55, 2 June 2015 (UTC)

That RFC closed 26 May 2015 (UTC). Wouldn't "deprecate and remove" imply a reasonable period of classic deprecation, possibly with warnings or something, while retaining functionality for some period of time before removal? --slakrtalk / 05:27, 4 June 2015 (UTC)

  • I'm not keen on the sound of "I'll be doing general fixes at the same time." Whatever this bot does, please don't do the other things AWB does at the same time, particularly changing reference positions. Magioladitis, if you're doing an AWB update, please remove that instruction. Also, only 35 people supported this RfC. Is that enough to be making mass changes? Sarah (SV) (talk) 20:43, 5 June 2015 (UTC)
I won't do any ref reordering. I will wait until there is a stable consensus to remove. There is an active discussion in WP:BOTREQ at the moment. -- Magioladitis (talk) 06:28, 6 June 2015 (UTC)

First batch proposal

I had a look at Wikipedia:Bots/Requests for approval/RjwilmsiBot 4: this bot introduced persondata derived from the infobox. Afaics I suppose this automatic removal task would be uncontroversial (apart from maybe cosmeticbot concerns):

  • Yobot removes persondata templates that comply with all of the following:
    1. the persondata was created by RjwilmsiBot 4, before the 2014 merge of persondata to wikidata;
    2. the persondata template/data has not been modified since its introduction;
    3. the article still carries an infobox;
    4. the persondata does not contain any pre-1924 dates.

I see several advantages to carrying out such a task now, most importantly some feedback before carrying out possibly more intrusive tasks. --Francis Schonken (talk) 09:19, 6 June 2015 (UTC)

Added fourth condition per Wikipedia:Bot requests#Temporarily exclude pre-1924 births from persondata delete. I still think it useful to operate a first batch of uncontroversial persondata removals. --Francis Schonken (talk) 05:33, 7 June 2015 (UTC)
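The four conditions of the first batch proposal amount to a simple eligibility test. A minimal Python sketch follows; the `PersondataRecord` class and its field names are hypothetical stand-ins for whatever the bot would derive from page history and wikitext, and the account name "RjwilmsiBot" is assumed to be the one that performed the RjwilmsiBot 4 task.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PersondataRecord:
    """Hypothetical summary of one article's {{Persondata}} template."""
    added_by: str                # account that introduced the template
    added_before_merge: bool     # added before the 2014 merge to Wikidata
    modified_since_added: bool   # template edited after its introduction
    article_has_infobox: bool    # the article still carries an infobox
    years_mentioned: List[int]   # years appearing in the persondata dates


def eligible_for_first_batch(record):
    """Return True only if all four proposed conditions hold."""
    return (
        record.added_by == "RjwilmsiBot"
        and record.added_before_merge
        and not record.modified_since_added
        and record.article_has_infobox
        and all(year >= 1924 for year in record.years_mentioned)
    )
```

Any record failing a single condition is left alone, which matches the intent of keeping the first batch uncontroversial.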
  • Support - I support the creation of this bot.--BabbaQ (talk) 23:05, 20 June 2015 (UTC)

I think the bot should be approved with no restrictions. -- Magioladitis (talk) 08:07, 5 July 2015 (UTC)


MoohanBOT 8

Operator: Jamesmcmahon0 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 08:33, Sunday, May 10, 2015 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): AWB

Source code available: AWB

Function overview: Creating redirects from [[Foo]] to [[List of Foo]].

Links to relevant discussions (where appropriate):

Edit period(s): one-time run, then weekly/monthly depending on how many new lists are created without redirects

Estimated number of pages affected: Initially 12617

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): Yes

Function details: I have compiled a list of pages where there exists a [[List of Foo]] page but no [[Foo]] page, as a redirect or otherwise. My bot will create all of the pages as redirects to their lists, specifically with the content:

#REDIRECT [[List of Foo]]
{{R from list topic}} 
{{R with possibilities}}

[[Category:Bot created redirects]]

This is per Pigsonthewing's request at Wikipedia:Bot requests#Redirects to lists, from the things they are lists of.
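For illustration, the selection and the page content described above can be sketched in Python. The task itself runs in AWB, so this is not the bot's code; the function names are hypothetical, and the list of existing titles is assumed to come from a dump or similar source.

```python
LIST_PREFIX = "List of "


def redirect_title(list_title):
    """Return the title "Foo" for a list page titled "List of foo"."""
    if not list_title.startswith(LIST_PREFIX):
        raise ValueError("not a list title: %r" % list_title)
    topic = list_title[len(LIST_PREFIX):]
    # MediaWiki capitalises the first letter of a page title.
    return topic[:1].upper() + topic[1:]


def redirect_text(list_title):
    """Build the wikitext quoted in the function details."""
    lines = [
        "#REDIRECT [[" + list_title + "]]",
        "{{R from list topic}}",
        "{{R with possibilities}}",
        "",
        "[[Category:Bot created redirects]]",
    ]
    return "\n".join(lines) + "\n"


def missing_redirects(all_titles):
    """Map each missing topic title to the list page it should redirect to."""
    existing = set(all_titles)
    wanted = {}
    for title in all_titles:
        if title.startswith(LIST_PREFIX):
            target = redirect_title(title)
            # Only create pages that do not exist, as a redirect or otherwise.
            if target and target not in existing:
                wanted[target] = title
    return wanted
```

Given the titles "List of Foo", "List of bars" and "Bars", only "Foo" would be created, because "Bars" already exists.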


You say that you've made a list of all relevant pages; can we see it? עוד מישהו Od Mishehu 08:32, 12 May 2015 (UTC)
List is here Jamesmcmahon0 (talk) 12:31, 12 May 2015 (UTC)

Anomie how do we proceed here? -- Magioladitis (talk) 23:09, 30 May 2015 (UTC)

The "needs advertisement" seems to have been satisfied, it got some support, and it got some good suggestions that reduced the list dramatically. Proceed as you see fit. Anomie 01:19, 2 June 2015 (UTC)
Update - I've been away but am now redoing lists from the latest dumps and checking all suggestions etc. standby... Jamesmcmahon0 (talk) 13:11, 7 June 2015 (UTC)
@Jamesmcmahon0: Any news? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:00, 25 June 2015 (UTC)

ThePhantomBot


Operator: PhantomTech (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 02:11, Thursday, March 19, 2015 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): Python

Source code available: No, not now at least, though I'll share some of the regex if asked during the approval process

Function overview: Monitors recent changes for possible vandalism and edits from long term abuse users, logs findings and (sometimes) gives information to AN/I for review by users.

Links to relevant discussions (where appropriate): Not sure if this would require consensus from AN/I since it would be posting there or not since the posting task is simple and likely to be uncontroversial.

Edit period(s): daily (while I have a computer on) with plans to make it continuous

Estimated number of pages affected: 1 (AN/I) not counting pages in its own user space

Exclusion compliant (Yes/No): no

Already has a bot flag (Yes/No): no

Function details: This bot is meant to allow a decrease in the number of edit filters and to identify abuse that can't be reverted by bots like ClueBot due to lack of certainty. Every 60 seconds (that time might be lowered to 20-40 seconds to spread load), a list is filled with the changes made since the last check. On a separate thread, the bot goes through the list and decides whether each action matches a set filter. These filters usually check the same kinds of things as the edit filters but are not limited by the same constraints. If a filter is matched, the associated actions are taken: usually logging to the user space and sometimes a noticeboard report. Essentially, this bot acts as a post-edit filter, currently targeting long term abuse but technically able to act on any identifiable action. Since it runs after edits, as opposed to "during" edits, it doesn't slow down editing for users, so problematic edits don't have to be frequent, as they do to be edit filter worthy, for it to be worth having this bot check for them. In its current state I have two LTA matches set up, one stolen from a log-only edit filter and another stolen from edit filter requests, and a general abuse match, also stolen from edit filter requests. If the bot is accepted, I plan on going through all the active long term abuse cases and adding whichever ones I can, along with some edit filter requests that aren't accepted due to infrequency.
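The polling-plus-worker design described in the function details might look something like the following Python sketch. The bot's source is not published, so every name here is hypothetical: `fetch_changes_since` stands in for whatever call retrieves recent changes, each change is assumed to be a dict with "title", "timestamp" and "comment" keys, and filters are assumed to be (predicate, action) pairs.

```python
import queue
import threading


def run_filters(change, filters):
    """Return the action of every filter whose predicate matches the change."""
    return [action for predicate, action in filters if predicate(change)]


def worker(pending, filters, log, stop):
    """Consume queued changes on a separate thread and record matched actions."""
    while True:
        change = pending.get()
        if change is stop:
            break
        for action in run_filters(change, filters):
            log.append((action, change["title"]))


def poll_once(fetch_changes_since, last_check, pending):
    """Queue every change since the last check and return the new checkpoint."""
    newest = last_check
    for change in fetch_changes_since(last_check):
        pending.put(change)
        newest = max(newest, change["timestamp"])
    return newest
```

A running bot would start `worker` in a `threading.Thread`, then call `poll_once` in a loop with a 60-second sleep between calls, feeding each returned checkpoint into the next call. Because matching happens on the worker thread, a slow filter delays reports but not the next poll.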


Vandalism/abuse monitoring is a difficult area; I suggest that you write your bot and have it edit a page in its or your userspace (no approval necessary unless edit rates are high) as if it were ANI, and monitor what it reports. You can in turn pass the valid observations it makes along to ANI, and if the quality of the reporting is high enough you may find other people watching the page to see what it finds. I expect you'll get a high false-positive rate which you'll need to analyse to improve the performance of your algorithms, and eventually you'll get to a point where regexes just don't cut it for detecting the long-term, low-frequency abuse you're targeting - and you'll have to look at more sophisticated processes. This is the technological evolution that Cluebot went through, but it catches more egregious and obvious vandalism.

Do you think operating in your or the bot's own userspace would be an acceptable stepping stone? Josh Parris 22:18, 20 March 2015 (UTC)

I realize that there is lots of long term abuse that can't be solved by regex alone, this bot will never be able to handle every LTA case but I do plan on implementing more advanced checks in the future. I have no problem running my bot for a bit with it doing nothing but logging to User:ThePhantomBot/log. PhantomTech (talk) 22:36, 20 March 2015 (UTC)
I would want to see a community consensus that bot generated ANI reports are wanted, please discuss and link that discussion here. — xaosflux Talk 05:43, 26 March 2015 (UTC)
@Xaosflux: As I've been working on my bot I've been adding more functionality and thinking about the best ways to have the bot's reports dealt with. Here's my current plan for how it will report things:
  • Bad page recreation - Log to user space
  • High probability sockpuppets - Report to SPI
  • Lower probability sockpuppets - Log to user space
  • LTA detection - Report to AIV or report to AN/I where certainty is reasonably low (not too low, don't want to waste people's time)
  • Newly added LTA filters, including ones being tested - Log to user space
  • IPs using administrative templates - Report to AN/I
  • Sleeper account detection - Not implemented yet, so I don't know how often it will go off; if it's often, log to user space, otherwise report to AN/I
I assume you still want to see a discussion for the AN/I reports but do you want to see any for the other places? I'm guessing you'll want SPI mentioned in the discussion too since I don't think any bots currently report to there. Also, do you have any suggestions on where to report these things or how to report them? Admittedly AN/I does feel like a weird place for bot reports but the goal is to get the attention of editors who may not be aware of the bot's existence. PhantomTech (talk) 07:03, 26 March 2015 (UTC)
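The reporting plan listed above is essentially a routing table. A minimal Python sketch is given here for clarity; the key strings, the `high_certainty` flag and the function name are assumptions for illustration, and sleeper account detection is left out because the plan leaves its destination undecided.

```python
USER_LOG, SPI, AIV, ANI = "user space log", "SPI", "AIV", "AN/I"

# kind of detection -> (destination when certainty is high, when it is lower)
ROUTES = {
    "bad page recreation": (USER_LOG, USER_LOG),
    "sockpuppet": (SPI, USER_LOG),
    "lta": (AIV, ANI),
    "new lta filter": (USER_LOG, USER_LOG),
    "ip using administrative template": (ANI, ANI),
}


def report_destination(kind, high_certainty=True):
    """Return where a detection of the given kind would be reported."""
    try:
        high, low = ROUTES[kind]
    except KeyError:
        raise ValueError("no route planned for %r" % kind)
    return high if high_certainty else low
```

Keeping the routes in one table would also make it easy to change a destination once community consensus on the reporting venues is known.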
Start reading AIV archives such as Wikipedia_talk:Administrator_intervention_against_vandalism/Archive_3#Suggested_merge_of_Wikipedia:Administrator_intervention_against_vandalism.2FTB2 for some suggestions. WP:AIV/TB2 is probably the oldest 'bot reported' noticeboard right now. — xaosflux Talk 10:23, 26 March 2015 (UTC)
@Xaosflux: Are you suggesting that if my bot were to report to ANI it should do so via a transcluded page? I like that idea, using transclusion to put the bot's reports somewhere they'll be seen keeps the bot's updates off the watchlists of people who don't care. PhantomTech (talk) 15:32, 26 March 2015 (UTC)
I'm suggesting that prior community discussion on ANI bot reports came to that conclusion - and that after reading up on it you start new discussions to find out where people would make best use of your reports. For ANI it could be the existing TB2 subpage, but they might want it on its OWN subpage; for the other forums people might want subpages, might want main, or might not want bot reports at all. I am not trying to dictate the solution, just that whatever it is should enjoy community consensus before integrating to existing forums. — xaosflux Talk 16:59, 26 March 2015 (UTC)

I posted a discussion at Wikipedia:Village_pump_(idea_lab)#ThePhantomBot_reporting_to_noticeboards to help get an idea of what kind of reporting users would like. Depending on how that goes I'll post something to village pump proposals with notifications on the relevant noticeboard's talk pages. PhantomTech (talk) 05:57, 27 March 2015 (UTC)

There's a lot of room for fuzzy logic on some of the things you've mentioned. To name a few:
  • High probability sockpuppets / Lower probability sockpuppets — exactly how are these detected?
  • Bad page recreation — what's detected here?
  • LTA detection — how does this magic work? Do you have any code already written, and if so, which forms of LTA did you have in mind? Any examples?
A huge bonus of the abuse filter is that people can actually see what the filters are detecting and make tweaks if there are a crapton of false positives, for example.
Also, any updates otherwise on the stuff raised above? It looks like this has been sitting here for a few months. :P
--slakrtalk / 05:48, 4 June 2015 (UTC)
@Slakr: Sorry about how long this has been sitting here, I've gotten busy since starting it which has slowed it down a bit, hopefully I'll be able to get more time to work on it more often soon. To your concerns about "fuzziness", you're right, there aren't any clear cut ways to tell what is considered, for example, "high probability" vs "low probability" which means that, to some extent, my judgment will have to be trusted in deciding where some things go. I don't expect too many people to just trust me without having something to look at so I think the best way to gain support for things outside of the bot's userspace is to be able to show some examples of real detections and where they would be reported. Currently, almost all detections are sent to the debug log, but I'm working on setting up different logging locations and don't plan on seeking consensus until after those have been reported to for a bit.
Sockpuppets: I'm working on different ways to detect "general" (non-LTA case) sockpuppets, what the bot can detect right now is based on removing speedy deletion tags and I don't think it causes enough disruption to be worth an SPI. I'm not sure what other ways I'll come up with for general sockpuppet detection so, if the bot ends up being allowed to report to SPI, the early reports will all be LTA cases with new reports only happening after several successful manual reports.
LTA: LTA cases are detected using anything from the same basic checks an abuse filter can do to ones that involve looking through all the user's contributions for patterns or past behavior. The ones that can be detected are any that I'm aware of and can think of a way to detect, which would include any that currently have abuse filters setup for them. I have detections setup for a few cases (see User:ThePhantomBot/Edit Codes#LTA: Long Term Abuse for a list) but a lot of them haven't been updated during my inactivity.
Bad pages: Bad page detections are pages that probably have to be deleted (or moved). Currently bad recreates are the only things detected (and reported here), but I plan on setting up something to detect autobiographies better than the abuse filters do, and I have something "ready" to detect self-published drafts.
Help from other users: I agree that a bonus to the abuse filters is that more than one user can edit them, since that isn't as easy with a bot I think it's something that will limit its usefulness a bit so I hope someone will be able to think of some way to allow others to help more directly without having to make all the detection algorithms public. Until then, I don't mind sharing detection algorithms with anyone who is currently trusted by the community and has a reason to see them (like abuse filter managers) via email or something. I'd also like to keep abuse filters setup and updated (but disabled) that would exist if my bot didn't, so they can easily be enabled in case something happens to my bot. Keeping the disabled abuse filters updated is something I wouldn't mind doing personally (if I can get the perm) or by working with abuse filter managers. To mitigate the issue of something causing false positive spam, I plan to have a fully protected page to allow specific detections to be turned off by administrators without having to turn them all off by blocking or something and testing detections in the debug log before having them file any reports.
Other Updates: Sleeper account detection is something I have a few ideas for how to do but haven't worked on in a while, it probably won't be done till a while after everything else has been. I've been working on setting up what's needed for the ANI reports ("priority" reports) and I think the proposal will end up being to transclude a navbox (this one, once it's done) so that it is split from the actual noticeboard page. ANI is still the best place I can think of to have the "priority" reports go since the goal is to get experienced editors to see them soon after being reported and decide what (if anything) needs to be done, and I think using a navbox and transclusion will allow for minimal disruption, but I'm open to other ideas and realize it may not get enough support to happen.
Hopefully that answered all your questions without being too long, if not, let me know. PHANTOMTECH (talk) 07:43, 4 June 2015 (UTC)

EnzetBot


Operator: Enzet (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 00:18, Sunday, March 8, 2015 (UTC)

Automatic, Supervised, or Manual: supervised.

Programming language(s): Python.

Source code available: no source code available since the bot is part of a major project.

Function overview: fixes inconsistencies and formatting in metro station articles (in station infobox and S-rail templates).

Links to relevant discussions (where appropriate):

Edit period(s): every time the bot finds an inconsistency in metro pages.

Estimated number of pages affected: about 1000 pages. There are about 10 K metro stations in the world, so no more than 10 K pages should be affected.

Exclusion compliant (Yes/No): yes.

Already has a bot flag (Yes/No): no.

Function details: I have The Metro Project for automated drawing of metro maps. It uses Wikipedia to check metro system graphs and sometimes encounters inconsistencies and bad formatting in Wikipedia articles. For now I fix them manually (see my contributions) but want to entrust this to my bot.

Tasks for this request:

  • wrap dates in station infobox with date template, e.g. 2000-03-30 to {{date|30-03-2000|mdy}};
  • add links to station structure types and platform types, e.g. Shallow single-vault to [[Single-vault station|Shallow single-vault]];
  • fix redirects in S-rail template.


I see the bot account has been editing articles. It is not yet approved for that.

I note you want to edit dates, but I see from your recent edit to Klovska (Kiev Metro) and your function details (above) you haven't realised the importance of ISO 8601 date formatting. I also note that you did not elect to use the style reportedly preferred by Ukrainians in your edit; is there a reason for this? Out of interest, why are these dates of interest to your bot?

The bot fixes inconsistencies between articles; how does it know which one is correct?

The links to station structure types and platform types you're proposing to link - are they the ones in infoboxes, or article text?

What major project is the bot a part of, and why does that make the source code unavailable? Josh Parris 14:32, 9 March 2015 (UTC)

I'm sorry for editing without approval. It was a test to make sure the bot works. I'll never do it again.
Yeah, I see, date changes seem to be a bad idea. I think I should remove them from the task list. Should I undo my edits (there are only 5 of them)?
About inconsistencies: the bot doesn't know which one is correct; it can only detect wrong or possibly wrong things. For example, a wrong date format (a month number can't be greater than 12), a wrong terminus (a station cannot be the next or previous station for itself), if station A is next for station B then station B should be previous for station A, wrong S-rail values (if they conflict with the station lists on the metro page or line page), and so on. That's why the bot is not automatic; I supervise every edit. I don't know how to formulate it as a task since there are so many types of inconsistencies. Maybe you can help me?
Yes, the bot will add links to the infobox only if there is no such link in the article text.
My major project is not open source for now. It generates very simple suggestions for the bot, like the examples above: what to replace in which article. If the bot source code is important, I can push it to a public repository, but it is trivial since it uses pywikibot (no more than 100 LOC). Enzet (talk) 17:01, 9 March 2015 (UTC)
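Three of the checks named in the reply above (month number, self-referencing neighbour, and next/previous symmetry) can be sketched in Python as follows. This is not the operator's code, which is unpublished; the data layout (a dict of station name to "next", "previous" and "opened" values, with dates as YYYY-MM-DD strings) is an assumption, and it models a single line without branches.

```python
def month_is_valid(iso_date):
    """Check that the month field of a YYYY-MM-DD string is between 1 and 12."""
    parts = iso_date.split("-")
    return len(parts) == 3 and parts[1].isdigit() and 1 <= int(parts[1]) <= 12


def find_inconsistencies(stations):
    """Return (station, description) pairs for every problem detected."""
    problems = []
    for name, info in stations.items():
        opened = info.get("opened")
        if opened is not None and not month_is_valid(opened):
            problems.append((name, "invalid month in date " + opened))

        # A station cannot be the next or previous station for itself.
        for direction in ("next", "previous"):
            if info.get(direction) == name:
                problems.append((name, "station is its own %s station" % direction))

        # If B is next for A, then A should be previous for B.
        following = info.get("next")
        if following is not None and following != name and following in stations:
            if stations[following].get("previous") != name:
                problems.append(
                    (name, "%s does not list %s as previous" % (following, name))
                )
    return problems
```

The function only reports problems; as the operator notes, deciding which side of a conflict is correct still needs a human.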
If you're supervising every edit, then this is a "manual bot" and can be run using your own account without approval. Would you like to do so? Josh Parris 11:30, 13 March 2015 (UTC)
OK, I understand all about inconsistencies. If I don't want to use the Enzet account for semi-automated editing, can I use the EnzetBot account (with the {{Bot}} template removed and without approval), or should I register a new account without the bot keyword? What is good practice for that? Also, are there criteria for semi-automated editing (no faster than 1 edit per 5 seconds, no more than 100 edits in a row, or something like that)? (Sorry if I missed it in the rules.)
Also, I have realized that the tasks of (1) wrapping station structure and platform types with links and (2) fixing S-rail redirects could be done without supervision, or that supervising them is really fast (checking is trivial). Can I get approval or disapproval for these tasks in this request, or should I create a new one? Enzet (talk) 09:27, 17 March 2015 (UTC)

Josh Parris any further comments? -- Magioladitis (talk) 18:44, 19 May 2015 (UTC)

Enzet What are the redirects of S-rail? -- Magioladitis (talk) 08:55, 22 June 2015 (UTC)

((OperatorAssistanceNeeded|D)) Magioladitis (talk) 06:28, 10 July 2015 (UTC)

Redirections in S-rail templates. See this, this (station redirect), or this edit (line redirect). Enzet (talk) 15:05, 14 July 2015 (UTC)

I asked WikiProject Stations to join the discussion. WikiProject transport too. -- Magioladitis (talk) 09:03, 15 July 2015 (UTC)

{{Date}} should not be used in articles; the template makes this clear. Alakzi (talk) 11:32, 15 July 2015 (UTC)

Alakzi the bot already changed 3 pages. Do you suggest the edits should be reverted? -- Magioladitis (talk) 14:15, 15 July 2015 (UTC)

I've replaced it in all three. Alakzi (talk) 14:21, 15 July 2015 (UTC)

I struck out this part of the bot's tasks. -- Magioladitis (talk) 17:10, 15 July 2015 (UTC)

Short of publishing the source code, I'd want to see a list of all replacements the bot would perform. Alakzi (talk) 17:44, 15 July 2015 (UTC)

Bots in a trial period

BD2412bot


Operator: BD2412 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 18:15, Thursday, May 14, 2015 (UTC)

Automatic, Supervised, or Manual: Supervised

Programming language(s): AutoWikiBrowser.

Source code available: AWB.

Function overview: I frequently clean up links left from disambiguation page moves. For example, the page Epping previously was an article on a town in England. This page was moved to Epping, Essex, and Epping became a disambiguation page with several hundred incoming links. As is commonly found in such cases, most of the links intended the town in England, and many were found in formulations like "[[Epping]], Essex", or "[[Epping]], [[Essex]]". A similar issue is the recurring creation of common patterns of disambiguation links to heavily linked articles; for example editors will often make edits creating disambiguation links like "[[heavy metal]] music" and "the [[French]] language", which can easily be resolved as "[[heavy metal music]]" and "the [[French language]]". Over time, large numbers of these links may build up. I would like permission to run AWB as a bot so that when page moves are made or common disambiguation targets become heavily linked, obvious formulations like these can be changed with less of a direct investment of my time.

Links to relevant discussions (where appropriate): Wikipedia:Disambiguation pages with links generally contains the protocol for repairing links to disambiguation pages.

Edit period(s): Intermittent; I intend to run this when a page move creates a large number of disambiguation links, for which obvious formulations for a large number of fixes can be seen.

Estimated number of pages affected: New disambiguation pages are created frequently. I would guess that between a few dozen pages and a few hundred pages might require this kind of attention on any given day, although there are likely to be days where no pages require such attention.

Exclusion compliant (Yes/No): Yes, as AWB does this automatically.

Already has a bot flag (Yes/No):

Function details: When large numbers of links to new disambiguation pages are created from existing pages having been moved to disambiguated titles, or from the buildup of common patterns of editing behavior over time, I will determine if there are obvious patterns of links to be fixed, for example changing instances of "[[Epping]], Essex" or "[[Epping]], [[Essex]]" to "[[Epping, Essex|Epping]], Essex", or "[[Epping, Essex|Epping]], [[Essex]]". I will then run AWB in bot mode to make these changes, and review the changes once made.
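The Epping pattern described above can be sketched as a regular-expression replacement. This is only an illustration in Python, not the AWB find-and-replace rule the operator actually uses; the function name `fix_epping_links` is hypothetical. The lookahead restricts the change to links immediately followed by ", Essex" or ", [[Essex]]", so other uses of the link are left alone.

```python
import re

# Match a bare [[Epping]] link only when it is directly followed by
# ", Essex" or ", [[Essex]]" (the two formulations named in the request).
EPPING_PATTERN = re.compile(r"\[\[Epping\]\](?=,\s*(?:\[\[Essex\]\]|Essex\b))")


def fix_epping_links(wikitext: str) -> str:
    """Pipe qualifying [[Epping]] links to the moved article title."""
    return EPPING_PATTERN.sub("[[Epping, Essex|Epping]]", wikitext)


if __name__ == "__main__":
    print(fix_epping_links("He was born in [[Epping]], Essex."))
    print(fix_epping_links("A market town: [[Epping]], [[Essex]]."))
    print(fix_epping_links("A suburb: [[Epping]], New South Wales."))
```

Links followed by anything else (such as the New South Wales example) are not touched, which is why this kind of rule would still need the operator's review of context.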


BD2412 I like the idea of this bot but I think similar proposals have been rejected in the past as WP:CONTEXTBOT. Could you please raise a discussion at WP:VILLAGEPUMP so that we check whether there is consensus for these changes or not? There might be traps I can't think of right now. -- Magioladitis (talk) 12:57, 16 May 2015 (UTC)

Which Village Pump page would that go to? bd2412 T 15:12, 16 May 2015 (UTC)
BD2412 Let's start from Wikipedia:Village pump (miscellaneous). -- Magioladitis (talk) 21:50, 16 May 2015 (UTC)

Wikipedia:Village_pump_(miscellaneous)#Bot_request_for_disambiguation_link_fixing_issue. -- Magioladitis (talk) 11:11, 21 May 2015 (UTC)

As I was afraid... Wikipedia:Village_pump_(miscellaneous)/Archive_49#Bot_request_for_disambiguation_link_fixing_issue. I see no actual consensus there. -- Magioladitis (talk) 23:19, 27 May 2015 (UTC)

BD2412 can you provide me a list of 50 manual edits doing this task? I would like to judge reactions. I do not guarantee approval. In fact, while I like this task a lot, I think it will get a lot of reactions. Still I think you can try to make 50 edits so we can really see reactions. Take it an unofficial bot trial. -- Magioladitis (talk) 23:22, 27 May 2015 (UTC)

I recently did a run of about 10,000 fixes to links to Striker (which is soon to be turned into a disambiguation page). Not all of these fall into the pattern that I have discussed here, but those that changed [[Midfielder]]/[[Striker]] to [[Midfielder]]/[[Striker (association football)|Striker]] would. There were probably a few hundred of those in the mix. This run of my contributions was in the thick of this run. bd2412 T 23:40, 27 May 2015 (UTC)

BD2412 My experience shows that there will be a lot of reaction. I'll reject the bot request, and I encourage you to keep making this kind of change, supervised, from your normal account using AWB. Unless, of course, there is at some point clear consensus for doing this kind of stuff. Some editors in the past even complained about orphaning a link before an XfD closes. Just a general remark for other editors that may be reading this: BRFA is not the place to gain consensus but a place to make requests based on consensus. -- Magioladitis (talk) 23:14, 30 May 2015 (UTC)

I am not proposing to orphan links prior to an XfD closing - I generally don't, in fact. Striker was an exceptional case based on the volume of links, and the fact that the RM time has run with multiple votes of support and no objections. My proposal is directed solely to link fixes needing to be made after a consensus-based page move has been carried out. I have had very few reactions to runs of thousands of fixes made using AWB, and I have never had a reaction when making obvious fixes of the type I propose. I would be glad to keep doing it this way, but I have actually physically burned out computer mice and had wrist aches that lasted for days! bd2412 T 00:37, 31 May 2015 (UTC)

BD2412 Any ideas of how we can ensure there is consensus for this task? I hope you understand my position. -- Magioladitis (talk) 18:51, 31 May 2015 (UTC)

There is a longstanding consensus for fixing disambiguation links, which is the foundation of Wikipedia:WikiProject Disambiguation. bd2412 T 19:02, 31 May 2015 (UTC)

I need Anomie's opinion on this one... -- Magioladitis (talk) 22:17, 11 June 2015 (UTC)

Approved for trial (100 edits). -- Magioladitis (talk) 13:44, 28 June 2015 (UTC)

I'll give it a trial run this weekend. Thanks. bd2412 T 15:16, 30 June 2015 (UTC)
@BD2412: Gentle poke given that it's been two weeks. What's the status? — Earwig talk 20:20, 13 July 2015 (UTC)
I've been busy with things that keep me logged in to my regular account - I can't run AWB as a bot unless I log into the bot-authorized account. I'll give it a run-through tonight. I'll need to find a set of links with applicable fixes first. bd2412 T 20:44, 13 July 2015 (UTC)
I ran a few tests, but at the moment there are no disambiguation pages with large numbers of links requiring the same solution. These come up sporadically. I tested the bot function on links containing "layout = Longitudinal" (for which the only answer will be Longitudinal engine), links to San Vicente, El Salvador (changed to San Vicente, El Salvador) and Fukushima, Japan (changed to Fukushima, Japan). I made a typo on one variation of the "Longitudinal" fix, and fixed that manually. Otherwise, everything went smoothly. bd2412 T 01:28, 14 July 2015 (UTC)

@BD2412: Now I understand better what the bot does and I like it more. Could you try to find more examples in order to complete the 100-edit test period? -- Magioladitis (talk) 10:14, 21 July 2015 (UTC)

  • I intend to test more - it's a matter of time before an existing article with a few dozen incoming links is moved in favor of a disambiguation page, leaving a certain number of obvious repetitive solutions to be enacted. bd2412 T 12:24, 21 July 2015 (UTC)
- gentle poke - any updates? :) ·addshore· talk to me! 12:53, 28 July 2015 (UTC)
Something will come up. It always does. bd2412 T 15:06, 28 July 2015 (UTC)

JhealdBot


Operator: Jheald (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 23:36, Monday, December 8, 2014 (UTC)

Automatic, Supervised, or Manual: Supervised

Programming language(s): Perl

Source code available: Still under development.

Function overview: Maintenance of subpages of Wikipedia:GLAM/Your_paintings, in particular the subpages listed at Wikipedia:GLAM/Your_paintings#Artists_by_birth_period. There is currently a drive to identify Wikidata entries for the entries on this list not yet matched. I seek approval to keep these corresponding pages on Wikipedia up to date.

Initially I would just use the bot as an uploader, to transfer wikipages edited off-line into these pages (including fixing some anomalies in the present pages -- which I would probably do sequentially, through more than one stage, reviewing each fix stage before moving on to the next).

Once the off-line code is proven, I would then propose to move to a semi-automated mode, automatically updating the pages to reflect new instances of items with d:Property:P1367 and/or corresponding Wikipedia and Commons pages.

Links to relevant discussions (where appropriate):

Edit period(s): Occasional (perhaps once a fortnight), once the initial updating has been completed. And on request.

Estimated number of pages affected: 17

Exclusion compliant (Yes/No): No. These are purely project tracking pages. No reason to expect a {{bots}} template. If anyone has any issues with what the bot does, they should talk to me directly and I'll either change it or stop running it.

Already has a bot flag (Yes/No): No. I have one on Commons, but not yet here.

Function details:

  • Initially: simple multiple uploader bot -- take updated versions of the 17 pages prepared and reviewed offline, and upload them here.
  • Subsequently: obtain a list of all Wikidata items with property P1367. Use the list to regenerate the "Wikidata" column of the tables, plus corresponding sitelinked Wikipedia and Commons pages.
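The second step above (regenerating the "Wikidata" column from the list of items with P1367) could look roughly like the following. The operator's scripts are in Perl and are not published, so this Python sketch is only an illustration: the row layout, the `yp_id` key and the function name are assumptions, and the Q-numbers in the usage example are placeholders. Entries that cannot be confirmed from Wikidata keep a question-mark prefix.

```python
def regenerate_wikidata_column(rows, p1367_matches):
    """Return new rows with the 'wikidata' cell refreshed.

    rows          -- list of dicts with keys 'yp_id' and 'wikidata' (assumed layout)
    p1367_matches -- dict mapping a Your Paintings identifier to a Q-number,
                     as obtained from a query for items that have P1367
    """
    updated = []
    for row in rows:
        new_row = dict(row)
        qid = p1367_matches.get(row["yp_id"])
        if qid:
            # Confirmed by Wikidata: link the item directly.
            new_row["wikidata"] = f"[[d:{qid}|{qid}]]"
        elif row.get("wikidata") and not row["wikidata"].startswith("?"):
            # Legacy entry not backed by P1367: mark it as "to be confirmed".
            new_row["wikidata"] = "?" + row["wikidata"]
        updated.append(new_row)
    return updated


if __name__ == "__main__":
    rows = [
        {"yp_id": "artist-a", "wikidata": ""},
        {"yp_id": "artist-b", "wikidata": "[[d:Q2|Q2]]"},
    ]
    for row in regenerate_wikidata_column(rows, {"artist-a": "Q1"}):
        print(row)
```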


Regarding uploading offline edits: Are these being made by anyone besides the operator? What license are they being made under? — xaosflux Talk 23:44, 18 December 2014 (UTC)
@Xaosflux: The pages have been being prepared by me using perl scripts, drawing from Wikidata.
I've slowly been making the scripts more sophisticated -- so I've recently added columns for VIAF and RKDartists links, both taken from Wikidata, defaulting to searches if there's no link, or no Wikidata item yet identified. Content not drawn from Wikidata (typically legacy entries from the pages as I first found them) I have prefixed with a question mark in the pages, meaning to be confirmed. For the most part these are blue links, which may go to completely the wrong people.
So at the moment I'm running a WDQ search to pull out all Wikidata entries with one (or more) values for the P1367 "BBC Your Paintings identifier" property, along with the properties for Commons category name (P373), VIAF (P214) and RKDartists (P650). I'm also running an Autolist search to get en-wiki article names for all Wikidata items with a P1367. Plus I have run a look-up to get Wikidata item numbers for all other en-wiki bluelinks on the page (this gives the Q-numbers marked with question marks). But the latter was quite slow, so I have only run it the once. At the moment I'm still launching these searches by hand, and making sure they've come back properly, before updating & re-uploading the pages.
As to the licensing -- Wikidata is licensed CC0. My uploads here are licensed CC BY-SA like any other upload to the site (though in reality there is very little originality, creativity or expression, apart from the choice of design of the page overall, so, under U.S. law at least, there quite possibly is no new copyrightable content in the diffs). Various people of course are updating Wikidata -- I've been slowly working down this list (well, so far only to the middle of the 1600s page) though unfortunately not all of the Wikidata updates seem to be being picked up by WDQ at the moment; the Your Painters list is also on Magnus's Mix-and-Match tool; and various others are working at the moment, particularly to add RKD entries to painters with works in the Rijksmuseum in Amsterdam. But Wikidata is all CC0, so that all ought to be fine.
What would help though, would be having the permission for a (limited) multiple uploader, so I could then upload the updates to all 17 pages just by launching a script, rather than laboriously having to upload all 17 by hand each time I want to refresh them, or slightly improve the treatment of one of the columns.
I'm not sure if that entirely answers your question, but I hope does make clearer what I've been doing. All best, Jheald (talk) 00:45, 19 December 2014 (UTC)
Approved for trial (25 edits or 10 days). Please post your results here after the trial. — xaosflux Talk 01:48, 19 December 2014 (UTC)
@Xaosflux: First run of 16 edits made successfully -- see contribs for 19 December, from 15:59 to 16:55.
(Links to RKD streamlined + data updated; one page unaffected).
All the Captchas were a bit of a pain to have to deal with; but they will go away. Otherwise, all fine. Jheald (talk) 17:31, 19 December 2014 (UTC)
Sorry about that, I added confirmed flag to avoid this for now. — xaosflux Talk 17:34, 19 December 2014 (UTC)
New trial run carried smoothly (see this related changes page).
Update still prepared by executing several scripts manually, before a final uploader script; but I should have these all rolled together into a single process for the next test. Jheald (talk) 09:11, 11 January 2015 (UTC)
Run again on January 21st, adding a column with the total number of paintings in the PCF for each artist. Jheald (talk) 17:13, 24 January 2015 (UTC)

Have you completed the trial? Josh Parris 10:20, 4 March 2015 (UTC)

I was going to go on running it once a month or so, the next one probably in a day or two, until anyone progressed this any further, possibly making tweaks to my offline processing scripts as I went along. Obviously I'm open to suggestions as to anything I can improve or do better; though the actual unsupervised bit itself is just an upload script, refreshing a dozen or so pages, so nothing very complicated. (The off-line preprocessing is a bit more involved, but still pretty trivial). Jheald (talk) 00:33, 5 March 2015 (UTC)
I note that further edits have been made. Out of interest, why do IDs change? The painter's been dead for centuries. Are they merges of duplicates? Also, is the trial finished now? Josh Parris 14:54, 9 March 2015 (UTC)
@Josh Parris: Clearly there has been a significant update of VIAF ids on Wikidata in the last three weeks, with a lot of new VIAF ids added -- I think by one of Magnus Manske's bots. This is why there are significant reductions in length for a lot of pages, with VIAF searches being replaced by explicit VIAF links.
I imagine that this may be catch-up resynchronisation for several months of updates at VIAF; but it may also be that now VIAF is explicitly targeting Wikidata items rather than just en-wiki articles, and is actively doing matching at the VIAF end, that may be why there now seems to be a sudden rush of new VIAF <--> Wikidata matches.
You're right that there are a few VIAF matches that have changed. I haven't looked in to any in detail, but two strong possibilities would be either erroneous matches that have been corrected (ie we used to point to the VIAF for somebody quite different); or alternatively that a group of duplicate entries on VIAF may have been merged -- eg if there had been a VIAF for the Library of Congress id, and another for the Getty ULAN id, and the two had not previously been connected.
As to where we're at, matching of the Your Paintings painter identifiers continues to move forwards using mix-n-match. About 80% of the YP identifiers have now been triaged into has / doesn't have / shouldn't have Wikidata item, with progress ongoing; plus I've now got as far as painters born before 1825, using mix-n-match search to match to RKDartists and other databases. Then there will also be a stage where new Wikidata items are created for YP ids that currently don't have them but should; and these new ids in turn will also have RKD artists (etc) that they match. So there's still a lot to do going forward, and the tracking pages will continue to need updates if they are to reflect that.
At the moment it's still done using about four scripts that I sequentially run by hand on an occasional basis. The one I'd have to write a bit more code to integrate is the one that merges in the article names on en-wiki for the Wikidata items, because these are currently got using an Autolist query which is then saved manually. I'd need to look into how to replace that batch look-up with an API call, if I was to make the whole thing more integrated and run on a regular basis (weekly?). I'm happy to do that work if anybody wants it, but for the time being it's also as easy just to go on doing what I've been doing, generating the updates in a partially manual way. So I'm happy to be open to views, if anybody has got any strong preferences either way. Jheald (talk) 23:27, 4 May 2015 (UTC)

Jheald what is to be done here? I have not followed the entire discussion, to be honest. -- Magioladitis (talk) 08:49, 15 July 2015 (UTC)

Bots that have completed the trial period

Cyberbot II 5

Operator: Cyberpower678 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 13:37, Saturday, June 6, 2015 (UTC)

Automatic, Supervised, or Manual: Automatic and Supervised

Programming language(s): PHP

Source code available: Here

Function overview: Replace links tagged as dead with a viable archived copy of the page.

Links to relevant discussions (where appropriate): Here

Edit period(s): Daily, but it will likely look as if it runs continuously.

Estimated number of pages affected: 130,000 to possibly a million.

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): Yes

Function details: The bot will crawl its way through articles on Wikipedia and attempt to retrieve an archived copy of each dead link from the time closest to the original access date, if specified. To avoid persistent edit-warring, users have the option of placing a blank, non-breaking {{cbignore}} tag on the affected citation to tell Cyberbot to leave it alone. If the bot makes any changes to the page, a talk page notice is placed alerting the editors there that Cyberbot has tinkered with a ref.
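Choosing the archived copy "closest to the original access date" can be sketched as below. This is not Cyberbot's PHP code. It assumes the available snapshots are already known as 14-digit `YYYYMMDDhhmmss` timestamps, and the fallback to the newest snapshot when no access date is given is an assumption made for the example. The function name `pick_snapshot` is hypothetical.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y%m%d%H%M%S"  # assumed 14-digit snapshot timestamp


def pick_snapshot(snapshots, access_date=None):
    """Return the snapshot timestamp closest to access_date.

    snapshots   -- list of timestamp strings such as "20120615120000"
    access_date -- datetime from the citation, or None if not specified
    """
    if not snapshots:
        return None
    if access_date is None:
        # Assumed fallback: with no access date, take the newest snapshot.
        return max(snapshots)

    def distance(timestamp):
        return abs(datetime.strptime(timestamp, TIMESTAMP_FORMAT) - access_date)

    return min(snapshots, key=distance)


if __name__ == "__main__":
    available = ["20100101000000", "20120615120000", "20150101000000"]
    print(pick_snapshot(available, datetime(2012, 7, 1)))
    print(pick_snapshot(available))
```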

The bot's detection of a dead link needs to be carefully thought out to avoid false positives, such as a temporary site outage. Feel free to suggest some algorithms to add to this detection function. At present the plan is to check for a 200 OK response in the header. If the bot gets any kind of response that indicates the site is down, it proceeds to add the archived link if available, or otherwise tags the link as dead. A rule mechanism can be added to the configuration for sites that follow certain rules when they kill a link.

There is a configuration page that allows the bot to be configured to desired specifications, which can be seen at User:Cyberbot II/Dead-links. The bot attempts to parse the various ways references have been formatted and tries to stay consistent with them so as not to destroy the citation. Even though the option to not touch an archived source is available, Cyberbot II will attempt to repair misformatted sources using archives if it comes across any.

For any link/source that is still alive, Cyberbot can check for an available archive copy and, if it can't find any, request that the site be archived.
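As a starting point for the detection function, the "200 OK" plan can be written as a small classifier over the response status. This is a sketch, not the bot's implementation; the category names and the decision to treat redirects and server errors as "needs review" rather than "dead" are assumptions for illustration.

```python
def classify_response(status):
    """Classify a link from a single HTTP status code.

    status -- integer status code, or None when no response was received
              (DNS failure, timeout, connection refused).
    """
    if status is None:
        return "no-response"   # could be a defunct domain or a temporary outage
    if status == 200:
        return "alive"         # note: some dead pages still answer 200
    if 300 <= status < 400:
        return "redirect"      # moved article or a redirect to a listing page
    if status in (404, 410):
        return "dead"
    return "uncertain"         # e.g. 5xx errors, which are often temporary


if __name__ == "__main__":
    for code in (200, 302, 404, 503, None):
        print(code, classify_response(code))
```

Only the "dead" category is a clear signal on its own; the other non-200 categories are the ones where site-specific rules would have to decide.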

The bot can forcibly verify if the link is actually dead, or be set to blindly trust references tagged as dead.

The bot may need some further developing depending on what additional issues crop up, but is otherwise ready to be tested.


I think this is a great idea. One thought: there are several kinds of dead links - (a) sometimes the site is completely defunct and the domain simply doesn't work - there is no server there any more, (b) sometimes the site has been bought by another entity and whatever used to be there isn't there any more, so most things get a 404, (c) sometimes a news story is removed and now gets a 404, or (d) sometimes a news story is removed and is now a 30x redirect to another page.

For a, b, or c, what you are describing is a great idea and probably completely solves the problem. For (d), it may be tricky to resolve whether this is really a dead link or whether they merely relocated the article.

One thought/idea: can you have a maintainable list of newspapers that are known to only leave their articles available online for a certain amount of time? The Roanoke Times for example, I think only leaves things up for maybe six months. Sometimes, they might redirect you to a list of other articles by the same person, e.g. [1] which was a specific article by Andy Bitter and now takes you to a list of Andy Bitter's latest articles. Other times you just get a 404, e.g. [2]. Since links from are completely predictable that they will disappear after six months, you could automatically replace 302s, whereas for some other sites, you might tag it for review instead of making the replacement on a 302. An additional possible enhancement would be that, knowing that the article is going to disappear in six months, you could even submit it to one of the web citation places so that we can know it will be archived, even if misses a particular article. --B (talk) 18:38, 9 June 2015 (UTC)

First time comment on a BRFA. For d) one possibility (for cite templates with additional parameters such as author, date, location or title) would be for the bot to check if these words appear on the target page of the link. Non-404 dead links usually lack these. Jo-Jo Eumerus (talk) 18:54, 9 June 2015 (UTC)
This is really great input, and it should be possible to maintain such a list via a control panel I've already set up. Right now, my development is focused on the various ways a reference has been formatted, and on appropriately parsing and modifying it when needed.—cyberpowerChat:Limited Access 20:42, 9 June 2015 (UTC)
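The suggestion for case (d), checking whether words from the citation's author, date, location or title appear on the page the link now leads to, might be sketched like this. The function name, the four-letter word cut-off and the 50% threshold are all assumptions chosen for the example, and the citation used in the usage lines is made up.

```python
import re


def metadata_matches_page(page_text, fields, threshold=0.5):
    """Return True if enough citation words appear in the fetched page text.

    page_text -- plain text of the page the URL currently resolves to
    fields    -- dict of citation parameters, e.g. {"author": ..., "title": ...}
    threshold -- assumed fraction of citation words that must be present
    Returns None when the citation offers no usable words.
    """
    wanted = set()
    for value in fields.values():
        wanted.update(word.lower() for word in re.findall(r"\w{4,}", value))
    if not wanted:
        return None
    present = {word.lower() for word in re.findall(r"\w{4,}", page_text)}
    return len(wanted & present) / len(wanted) >= threshold


if __name__ == "__main__":
    citation = {"author": "Jane Example", "title": "Hypothetical council approves budget"}
    print(metadata_matches_page("Hypothetical council approves budget. By Jane Example.", citation))
    print(metadata_matches_page("Latest articles by Jane Example", citation))
```

In the second call only the author's name survives on the page, which is the "redirected to a list of the author's articles" situation described above.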
Another thing that came to mind: Can the bot check more than one archive service? Such as both the Wayback one and WebCite? Jo-Jo Eumerus (talk) 16:13, 13 June 2015 (UTC)
I really wish you informed me of that earlier, the bot has been written around the WayBack machine. Also, the WebCite doesn't seem to have an API, or a way to immediately look up a website. It seems to prefer to email the user with a list of archive links. My bot doesn't have email though, so WebCite is out atm. Screen scraping by the looks of it may not be effective either.—cyberpowerChat:Online 16:29, 13 June 2015 (UTC)
I can however program the bot to ignore any references that have a webcite template or archive link.—cyberpowerChat:Online 16:35, 13 June 2015 (UTC)
Rats. I know I have to suggest things more quickly. Category:Citation Style Vancouver templates should perhaps be included as well, some of them allow for URLs. Jo-Jo Eumerus (talk) 18:58, 13 June 2015 (UTC)

Since you mentioned this, how do you avoid things like "temporary site outage"? Links go temporarily bad very often. Sometimes there are DNS issues. Sometimes regional servers or cache servers are down. Sometimes clouds are having issues. There are scheduled maintenance windows and general user errors. It is definitely unreliable to check a link only once.

I don't want to repeat all the comments from previous BRFAs, but there's tons of exceptions that you have to monitor. Like I've had sites return 200 and a page missing error just as returning 404 and valid content. I've had sites ignore me because of not having some expected user agent, allowing/denying cookies, having/not having referrer, being from certain region, not viewing ads, not loading scripts, not redirecting or redirecting to a wrong place or failing redirect in scripts, HEAD and GET returning different results, and a hundred other things. —  HELLKNOWZ  ▎TALK 18:03, 24 June 2015 (UTC)
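One way to act on the point that a single check is unreliable is to keep a history of checks and treat a link as dead only after repeated failures spread over some period. The sketch below is an illustration only; the thresholds (three consecutive failures spanning at least seven days) and the function name are assumptions, not something any bot in this discussion is stated to do.

```python
from datetime import datetime, timedelta


def is_persistently_dead(checks, min_failures=3, min_span=timedelta(days=7)):
    """Decide whether a link has failed often enough, for long enough.

    checks -- list of (checked_at, ok) tuples, oldest first, where ok is
              True when the link responded acceptably on that check
    """
    # Collect the unbroken run of failures at the end of the history.
    recent_failures = []
    for checked_at, ok in reversed(checks):
        if ok:
            break
        recent_failures.append(checked_at)

    if len(recent_failures) < min_failures:
        return False
    # recent_failures[0] is the newest failure, [-1] the oldest in the run.
    return recent_failures[0] - recent_failures[-1] >= min_span


if __name__ == "__main__":
    history = [
        (datetime(2015, 6, 1), True),
        (datetime(2015, 6, 10), False),
        (datetime(2015, 6, 14), False),
        (datetime(2015, 6, 18), False),
    ]
    print(is_persistently_dead(history))
```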

Those are issues that need to be controlled for even without a bot. If an average editor tries to follow an external link and comes to a 404 page, that editor is as likely to replace the link with a working one, even if the 404 page only comes from a temporary site error. If there is an archived version of the page, and the link is changed to that, then no information is lost. bd2412 T 18:09, 24 June 2015 (UTC)
Even if it is a temporary outage, Cyberbot will simply be adding an archived version of the link to the original citation, either through the use of the wayback template or, if using a cite template, through the archive-url parameter. Nothing is lost. The verification procedure will be very error-prone at first, but as I get more information, refinements can be easily added. Rules can be added to the bot's configuration page for sites with regular problems. If the bot is being problematic with a source, users can attach a {{cbignore}} tag to the citation to tell the bot to go away.—cyberpowerChat:Limited Access 18:46, 24 June 2015 (UTC)
Adding an archive url implies the link is dead, unless you add |deadurl=no, which implies it is not dead at this time. There was brief discussion on this (can't really recall where), and sending a user to a slower, cached version when a live one is available was deemed "bad". I would say you need consensus for making archive links the default links when bot has known detection errors and links may be live. It may be low enough that people don't care as long as there are archives for really dead links. —  HELLKNOWZ  ▎TALK 19:35, 24 June 2015 (UTC)
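The distinction raised here, adding an archive without implying the live link is dead, could be handled when the citation is rewritten. The following sketch appends archive parameters to a cite template and sets |deadurl= according to what is known about the link. It is plain string handling for illustration only: the parameter name archive-date and the function name are assumptions, and real citations would need a proper template parser.

```python
def add_archive(cite, archive_url, archive_date, link_is_dead):
    """Append archive parameters to a citation template given as a string.

    cite         -- template wikitext ending in "}}"
    link_is_dead -- True if the original URL is believed dead; when False,
                    |deadurl=no keeps the live URL as the primary link
    """
    stripped = cite.rstrip()
    if not stripped.endswith("}}"):
        raise ValueError("expected a template ending in '}}'")
    if "archive-url" in stripped or "archiveurl" in stripped:
        return cite  # already has an archive; leave it untouched
    body = stripped[:-2].rstrip()
    dead = "yes" if link_is_dead else "no"
    return (
        f"{body} |archive-url={archive_url} "
        f"|archive-date={archive_date} |deadurl={dead}}}}}"
    )


if __name__ == "__main__":
    original = "{{cite web |url=http://example.com/a |title=A}}"
    print(add_archive(
        original,
        "http://web.archive.org/web/20150101000000/http://example.com/a",
        "2015-01-01",
        link_is_dead=False,
    ))
```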
There is a clear consensus for a bot to do this at the Village Pump discussion. The problem of dead links is substantial, and the slim chance that a website will be down temporarily when the bot checks is vastly outweighed by the benefit of fixing links that are actually bad. I would also suggest that a site that goes down "temporarily" may not be the best site to link to either. bd2412 T 19:48, 24 June 2015 (UTC)
I linked to the discussion that supports this bot. The bot leaves a message on the talk advising the user to review the bot's edit and fix as needed. So any link changed that shouldn't be changed can be fixed and tagged with {{cbignore}}.—cyberpowerChat:Online 20:03, 24 June 2015 (UTC)
Worth noting also that when I manually repair dead links, the site is usually down (although I have encountered a few working links which were presumably tagged during temporary outages) and the archived links almost always work. These errors do constitute only a minor share of all replacements, in my experience. Jo-Jo Eumerus (talk) 20:07, 24 June 2015 (UTC)
I see consensus for dead links, not most likely dead links though. We had such consensus already, and this is a previously approved bot task. There is no question that we need a bot, the question is what error rate in what areas is allowed? The VP proposal was worded "could we have such a bot in theory?", not "we have a bot that will have x% error rate, is this acceptable?" We are talking hundreds of thousands of links here. Even a 0.01% error rate is thousands of links. From what Cyberpower says, it would be higher and we know some cases cannot be avoided. BRFA needs to show either close to 0% error rate or clear consensus that an error rate is acceptable (see, for example, ClueBot NG BRFA). This is described as part of WP:CONTEXTBOT. —  HELLKNOWZ  ▎TALK 21:25, 24 June 2015 (UTC)
If we are talking about links that sometimes work and sometimes don't (and therefore might not be working when the bot checks), I think it's pretty obvious that we are better off with a link to an archived page that works all the time. It's not an error at all to replace a questionable link with a stable link to the same content. bd2412 T 22:08, 24 June 2015 (UTC)

"If the bot makes any changes to the page, a talk page notice is placed alerting the editors there that Cyberbot has tinkered with a ref." -- Is there consensus for this? That's a lot of messages. —  HELLKNOWZ  ▎TALK 21:25, 24 June 2015 (UTC)

Technically, 0.01% would be tens of links. I think we'll need a test run to establish how reliable the link replacement is, though. Jo-Jo Eumerus (talk) 21:45, 24 June 2015 (UTC)
It's been asked for a couple times, it can be switched off. Link checking can be switched off too. It would drastically speed the bot up.—cyberpowerChat:Online 21:48, 24 June 2015 (UTC)
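For the error-rate figures traded above, the arithmetic is easy to check against the request's own estimate of 130,000 to possibly a million affected pages. The function name is hypothetical, and the calculation treats the estimate as a count of links, which is a simplification.

```python
def expected_errors(link_count, error_rate_percent):
    """Expected number of wrong replacements for a given error rate in percent."""
    return link_count * error_rate_percent / 100


if __name__ == "__main__":
    for count in (130_000, 1_000_000):
        print(count, round(expected_errors(count, 0.01)))
```

At 0.01% this gives on the order of 13 to 100 links across that range, that is, tens rather than thousands; an error rate of 1% would be needed to reach thousands.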

The main problem I see with this is automatically trying to identify whether a link is up or down. It's ridiculously tough for a bot to do it (reflinks had a ton of code for it), and IIRC sites like CNN and/or NYT blocked the toolserver in the past. I also don't see any advantage to using a special exclusion template and spamming talk pages. I also had written my own code for this (BRFA) which I'll resuscitate. It'll be great to have multiple bots working on this! Legoktm (talk) 22:22, 26 June 2015 (UTC)

I have been discussing with Legoktm on IRC and I think 2 bots is a lovely idea. More coverage quicker. My bot shouldn't have any conflicts with another bot. Legoktm and I will be implementing a feature to allow them both to acknowledge {{nobots|deny=InternetArchiveBot}}. As for checking whether a link is dead or not, there seems to be agreement among us to leave that feature off for now, or indefinitely. As for spamming talk pages, we can see how that works out. If it's too much after the trial, we can turn that off too.—cyberpowerChat:Online 23:16, 26 June 2015 (UTC)
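A shared exclusion check of the kind mentioned here might look like the sketch below. It is a simplified illustration and not either bot's code: it only recognises a bare {{nobots}} and the deny= parameter of {{bots}}/{{nobots}}, and ignores the other options of the exclusion templates. The function name is hypothetical.

```python
import re

DENY_PATTERN = re.compile(
    r"\{\{\s*(?:no)?bots\s*\|\s*deny\s*=\s*([^}|]*)", re.IGNORECASE
)
BARE_NOBOTS = re.compile(r"\{\{\s*nobots\s*\}\}", re.IGNORECASE)


def is_denied(wikitext, bot_name):
    """Return True if the page text asks this bot (or all bots) to stay away."""
    if BARE_NOBOTS.search(wikitext):
        return True
    for match in DENY_PATTERN.finditer(wikitext):
        names = [name.strip().lower() for name in match.group(1).split(",")]
        if bot_name.lower() in names or "all" in names:
            return True
    return False


if __name__ == "__main__":
    page = "Some article text {{nobots|deny=InternetArchiveBot}}"
    print(is_denied(page, "InternetArchiveBot"))
    print(is_denied(page, "SomeOtherBot"))
```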

Development Status

  • Done Fetch appropriate articles
  • Done Recognize and parse various formats in references
  • Done Parse a template properly
  • Done Recognize and parse various formatted external links, and citations
  • Done Detect if a link is really dead
  • Done Submit archive requests for links that are alive but have no archive
  • Done Detect if the link has been marked as dead
  • Done Detect if the link has an archive
  • Done Handle the link properly
  • Done Scan the archive and retrieve an archive
  • Done Properly format new references and links
  • Done Fix improperly formatted templates
  • Done Notify on talk page
  • Done Log report generator
  • Done Refinements
  • {{BAGAssistanceNeeded}} Development is finished, source code has been posted and I believe the bot is ready for a trial run.—cyberpowerChat:Online 23:08, 22 June 2015 (UTC)
  • Before being approved for a trial please answer the following questions:
    1. Should Cyberbot scan all links on specified pages, or just references?
    2. Should Cyberbot scan all pages, or only those that contain dead-link tags?
    3. Should Cyberbot modify all links, only those tagged as dead, or those tagged as dead plus those the bot sees as dead?
    4. Should the bot verify if a tagged link is really dead, or blindly trust dead-link tags?
    5. Should the bot provide the latest archived copy or the one closest to the source's access date?
    6. Should Cyberbot touch sources that already have archives on them?
    7. Should Cyberbot leave a message on the respective talk page when it edits a page?
    8. Can you suggest a subject line Cyberbot should use for talk page messages? You can use keywords such as {linksrescued}, {linkstagged}, {linksmodified}, and {namespacepage}.
    9. Can you suggest the body of the message Cyberbot should leave behind. You can use the same syntax mentioned in the previous question. Use \n for newlines.
    10. Should Cyberbot check if a link is dead, as in check those that aren't tagged?
    11. Should Cyberbot make sure an archived copy is available and ready should the live link ever go down?
  • All these questions are individual configuration options for this bot. Knowing what the community wants would be of great help.
  • Here is my opinion on the matter:
    1. Bad links are bad links, so it shouldn't make a difference if they are in references or text.
      I interpret that as all links.—cyberpowerChat:Online 01:43, 23 June 2015 (UTC)
      Yes, all links. If a link is dead, it should be made good.
    2. Same as above, although I would start with those that are tagged.
      This is a configuration question. There are 2 scanning methods: one scans all pages, the other populates a list of pages that contain dead-link templates and scans those. It sounds like you want all pages in the end.—cyberpowerChat:Online 01:43, 23 June 2015 (UTC)
      In that case, I would go with all pages. Going with tagged links is useful because it focuses on links known to be dead, but if the resources exist to do all pages, go for it.
    3. I presume the bot will do nothing to links that appear to be in fine working order. If it sees a link as dead, it should fix it, tagged or not.
    4. I think it makes more sense to verify. Basically, it should be agnostic about the tags, since those may be erroneous.
      I agree that verification is a must, but the process is still quite error-prone. Certain dead links do return a 200 OK and the bot will see that as a live link.
      To what extent can the process be tweaked as it goes? Can we start with clearly dead links, and then refine the process for links that are tagged as dead but do not show up as dead?
      Rules can be introduced using the rules parameter in the configuration page. Verification algorithms can be improved on demand. The bot's source code has been written for maintainability.—cyberpowerChat:Limited Access 02:36, 23 June 2015 (UTC)
      Ok - not to throw in new complications, but if a link is tagged as a dead link, but the bot thinks it's a live link, perhaps the "dead link" tag should either be removed or modified to indicate that there's some question about whether it really is a dead link. Also, this raises an additional question for me. What does the bot do when it finds a dead link for which no fix exists (i.e. no archive)? Perhaps it should also note this on the talk page, so editors will know that whatever proposition the link is supposed to support will need a new source. bd2412 T 02:55, 23 June 2015 (UTC)
      The bot would simply remove the tag if the link was deemed alive, and if the bot can't find an archive it will tag the link as dead. Any modification done to the page results in a talk page notification. Both these features can be turned on and off on the configuration page.—cyberpowerChat:Limited Access 03:50, 23 June 2015 (UTC)
    5. I would prefer the closest archive to the source date, since the contents of the page may have changed.
    6. I'm not sure what you mean by "sources that already have archives". If the link already purports to point to an archive I don't know how we would find an archive of that link.
      What I mean by that is, if a source contains a reference to an archive, should Cyberbot fiddle with it or leave it alone? My recommendation is to leave them alone.—cyberpowerChat:Online 01:43, 23 June 2015 (UTC)
    7. I have no preference with respect to talk page messages. Since the operation is a bit complicated, I guess it would be too much to describe in a tag on the page.
      Have you seen the source code yet? Compared to that, notifying on the talk page is easy. :p—cyberpowerChat:Online 01:43, 23 June 2015 (UTC)
    8. This is more than a tagging or modification. I would just say "Dead link(s) replaced with archived links".
    9. Any message should briefly describe the operation, and state that "[this] dead link was replaced with [this] link from the Internet Archive" (or whatever service is used).
    10. As above, the concern is the links, irrespective of the tags. Although we can start with tagged links, ultimately every link should be checked.
    11. Checking for archives of working links seems a bit out of scope, and a bigger task. I don't recall whether we had determined that there is a way to prompt Internet Archive or another such service to archive a link.
      Some users have asked for it, and I've been able to implement without much cost to resources. I recommend this be turned on.
      If there's a call for it, sure.
  • Cheers! bd2412 T 01:34, 23 June 2015 (UTC)
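Questions 8 and 9 above describe subject and body templates that use keywords such as {linksrescued} and {namespacepage}, with \n standing in for newlines. A minimal sketch of how such a template might be expanded follows. The function name `render_message` and its exact behaviour are hypothetical and are not taken from the bot's source.

```python
def render_message(template: str, values: dict[str, str]) -> str:
    """Fill {keyword} placeholders, then expand literal backslash-n sequences.

    Hypothetical helper: keyword names follow the ones mentioned in the
    questions above; unknown keywords are left untouched.
    """
    text = template
    for key, value in values.items():
        text = text.replace("{" + key + "}", value)
    # The configuration is described as using a literal "\n" for newlines.
    return text.replace("\\n", "\n")


subject = render_message(
    "Dead link(s) replaced with archived links on {namespacepage}",
    {"namespacepage": "Talysh Khanate"},
)
body = render_message(
    r"{linksrescued} link(s) rescued.\n{linkstagged} link(s) tagged as dead.",
    {"linksrescued": "2", "linkstagged": "1"},
)
```

Plain string replacement is used rather than `str.format` so that stray braces in wikitext (for example template calls) do not raise errors.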

What should we do here? -- Magioladitis (talk) 13:54, 28 June 2015 (UTC)

Approve for a trial, obviously. :p—cyberpowerChat:Online 14:03, 28 June 2015 (UTC)

First trial (100 edits)

Approved for trial (100 edits). ·addshore· talk to me! 15:52, 28 June 2015 (UTC)

50 article edits and 49 talk page edits already done (counting by way of the edit reasons). I'll inspect the article edits. Jo-Jo Eumerus (talk) 18:00, 28 June 2015 (UTC)
Trial complete. A random review of the edits reveals no problems.—cyberpowerChat:Online 18:24, 28 June 2015 (UTC)
A few notes of mine:
  • The bot is adding the {{wayback}} template without a space between the template and the preceding markup, leaving no space between any punctuation and the "Archived" template output in seeing mode. Is this right? (On Paul Bonner, it did add a space in one of the two replacements).
    Fixed Though I should note it didn't edit Paul Bonner.—cyberpowerChat:Online 19:43, 28 June 2015 (UTC)
    Whoops. It was Peter Bonner, not Paul. Sorry!
  • The bot didn't change the two broken external links on Zeba Islam Seraj. I assume that the bot noticed that the most recent archived copies are also broken?
    only returns the closest working copy of the page, or it returns nothing. If the bot gets nothing, it does nothing with the link.—cyberpowerChat:Online 19:52, 28 June 2015 (UTC)
  • Floating ecopolis had a previously archived link that was broken by the bot. Apparently it tried to archive the already archived link.
    Actually the wayback was being improperly used. The generated link is unusable. The bot attempted to fix the formatting, but it failed, it should have removed the 1= parameter.—cyberpowerChat:Online 19:57, 28 June 2015 (UTC)
    Fixed—cyberpowerChat:Online 21:15, 28 June 2015 (UTC)
  • Talysh Khanate also had an incomplete replacement, not sure what went wrong there.
    Not sure what happened there either, I'll have to look at that closely.
    Fixed—cyberpowerChat:Online 21:31, 28 June 2015 (UTC)
  • One Wayback archive was of an already broken page (last link on the Überlingen article). The Margin of error, Palmer's College and Gecko (software) replacements also appear to be already broken. Same for the last link on Koreatown, Los Angeles (or so it appears to me).
    The bot can't be expected to accurately determine if the archive is good or not, that's why the suggestion of human review.—cyberpowerChat:Online 20:27, 28 June 2015 (UTC)
  • In a few instances, the bot replaced a working link with another working link because the original was mis-tagged as broken (Parsley Sidings, Hot Cross and Vanity (singer)).
    Don't blame the bot if someone else mistagged it. The bot can't be expected to know if the link is really dead or not when there is consensus to shut the link verification process off.—cyberpowerChat:Online 20:27, 28 June 2015 (UTC)

That's all from me - only the first four things are potentially problematic. Jo-Jo Eumerus (talk) 18:53, 28 June 2015 (UTC)
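The reply about Zeba Islam Seraj describes a lookup that returns either the closest working copy or nothing. A minimal sketch of that pattern, assuming the Wayback Machine availability endpoint is the service being queried (the thread does not say which endpoint the bot actually uses). `build_query` and `closest_snapshot` are hypothetical helper names, and no network request is made here.

```python
import json
from typing import Optional
from urllib.parse import urlencode

# Assumed endpoint: the public Wayback Machine availability API.
API = "https://archive.org/wayback/available"


def build_query(url: str, timestamp: Optional[str] = None) -> str:
    """Build the lookup URL; timestamp is YYYYMMDDhhmmss (or a prefix of it)."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return API + "?" + urlencode(params)


def closest_snapshot(response_text: str) -> Optional[str]:
    """Return the closest snapshot URL from a JSON response, or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        # Nothing usable came back, so the caller leaves the link alone.
        return None
    return closest.get("url")
```

Returning `None` rather than raising mirrors the behaviour described above: if the bot gets nothing, it does nothing with the link.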

  • [3] - the bot grabbed the earliest archive, why earliest and not latest? (P.S. I only checked like 10 pages, so this isn't a full review.) —  HELLKNOWZ  ▎TALK 19:51, 28 June 2015 (UTC)
    I'm assuming it has something to do with the blank accessdate parameter making the bot assume a unix timestamp of 0 and resulting in it trying to pull an archive as close to January 1, 1970 as possible. I'll put in a fix for that.—cyberpowerChat:Online 20:27, 28 June 2015 (UTC)
    WikiBlame could perhaps be implemented somehow? That's what I use when finding the best archived-link. (tJosve05a (c) 20:37, 28 June 2015 (UTC)
    In all my years of being here, I never learned what WikiBlame is. Can someone enlighten me?—cyberpowerChat:Online 20:50, 28 June 2015 (UTC)
    A tool for searching in the revision history of a page, per Wikipedia:WikiBlame.Jo-Jo Eumerus (talk) 21:39, 28 June 2015 (UTC)
    How would that help?
    Fixed—cyberpowerChat:Online 21:53, 28 June 2015 (UTC)
  • Josve05a has brought up more issues that I missed and have addressed them.—cyberpowerChat:Online 13:50, 29 June 2015 (UTC)
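The blank-accessdate problem described above (an empty value being read as timestamp 0) can be illustrated with a small sketch. The actual fix is not described in this thread; falling back to the current time, which asks for the latest copy, is an assumption, as are the function name `target_timestamp` and the ISO date format.

```python
from datetime import datetime, timezone
from typing import Optional

STAMP = "%Y%m%d%H%M%S"  # 14-digit archive timestamp


def target_timestamp(access_date: Optional[str],
                     now: Optional[datetime] = None) -> str:
    """Pick the timestamp to request an archive for.

    A blank or unparseable access date falls back to `now` (assumed
    behaviour) instead of silently becoming the Unix epoch.
    """
    fallback = now or datetime.now(timezone.utc)
    text = (access_date or "").strip()
    if not text:
        return fallback.strftime(STAMP)
    try:
        # Assumes ISO dates; real citations use many more formats.
        parsed = datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        parsed = fallback
    return parsed.strftime(STAMP)
```

Passing `now` explicitly keeps the function deterministic when it is tested.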

Second trial (300 edits)

  • The previous trial has concluded, and the issues raised have been addressed. Requesting another trial of 500 this time.—cyberpowerChat:Online 22:04, 28 June 2015 (UTC)

Approved for extended trial (300 edits). 500 is too much for us to check. Let's do 300 first. -- Magioladitis (talk) 08:09, 5 July 2015 (UTC)

Notes by Josve05a

I've only done tests on a few articles to find the worst bugs. These are not all of them, only those I found when checking a select number of articles to see its reliability. Saying "the bot can't know if it is dead on Wayback" is not a good excuse. That is a reason not to allow the bot task.

Legend for recurring errors:

Code  Error
(b)   The source URL was not dead.
(c)   The archive-url is dead.

Diff  URL   Archived  Note
[13]  [14]  [15]      (c)
[16]  [17]  [18]      (c)
[19]  -     -         Added |dead-url=yes, even though |deadurl=yes already existed
[20]  [21]  [22]      (c)
^     [23]  [24]      (b)
^     [25]  [26]      (c)
[27]  [28]  [29]      (c)
[30]  [31]  [32]      (b)

(tJosve05a (c) 16:18, 5 July 2015 (UTC)

{{OperatorAssistanceNeeded|D}} Magioladitis (talk) 22:46, 7 July 2015 (UTC)

Trial complete. Sorry. The bot is still waiting to receive the fixes to mentioned bugs.—cyberpowerChat:Online 22:53, 7 July 2015 (UTC)
Rome wasn't built in a day. ;-) bd2412 T 23:15, 7 July 2015 (UTC)
I have addressed (c). The likelihood of a bad archive being added should be greatly reduced now. A solution for items 1 and 2 is already present. I have fixed item number 6 and item number 13 so far.—cyberpowerChat:Online 13:09, 10 July 2015 (UTC)
It took some searching but I managed to get 12 and 14 fixed and confirmed it with this edit. Also an addendum, I have instructed the bot to change the links in external links directly, if they are not inside reference tags. That way when fixing sources and links, I'm not disrupting the article with a wayback template.—cyberpowerChat:Offline 06:04, 11 July 2015 (UTC)
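One entry in the notes above reports |dead-url=yes being added although |deadurl=yes already existed. A minimal sketch of checking both spellings before adding the parameter follows; this is an illustration only and not how the bot's fix was implemented. `set_dead_url` is a hypothetical name, and the regex does not handle nested templates.

```python
import re

# Matches either spelling of the parameter and captures its current value.
DEAD_URL_RE = re.compile(
    r"(\|\s*(?:deadurl|dead-url)\s*=\s*)([^|}]*?)(?=\s*(?:\||\}))"
)


def set_dead_url(template_text: str, value: str = "yes") -> str:
    """Update an existing deadurl/dead-url parameter, else append one."""
    if DEAD_URL_RE.search(template_text):
        # Reuse whichever alias is already present instead of adding a second.
        return DEAD_URL_RE.sub(lambda m: m.group(1) + value,
                               template_text, count=1)
    stripped = template_text.rstrip()
    if stripped.endswith("}}"):
        return stripped[:-2] + " |deadurl=" + value + "}}"
    # Not a complete template call; leave it unchanged.
    return template_text
```

For example, `set_dead_url("{{cite web |url=http://example.com |dead-url=no |title=X}}")` rewrites the existing value in place and keeps the hyphenated spelling.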

Third trial (500 edits)

The bot appears to be ready for one last trial before approval.—cyberpowerChat:Offline 06:04, 11 July 2015 (UTC)

I have reviewed the bot's configuration once again (last time I did it was before the first trial), and it seems like my earlier major concern about VERIFY_DEAD being set to true is resolved. (Note to closing BAGer: bot does not seem to have consensus to run with VERIFY_DEAD set to true, since it is too prone to errors.)
I'm still unsure about the talk page notices. I cleaned up the wording a bit, adding a {diff} label (which Cyberpower says he can implement) and removing unnecessary information, but I'm still debating the general usefulness, since it will appear on tens (possibly hundreds) of thousands of talk pages. I would like to see some more comments on this.
My only other issue is concerning PAGE_SCAN. As I understand it, setting this to false (as Cyberpower intends to do when the bot is approved) will involve tens of millions of archival requests to the Internet Archive (Wikipedia has 81,235,194 external links at last count; some of these are already archived but many will not be). I understand this is in line with the goals of that service, but I'm not sure if this is a good idea without explicit confirmation from them. So let's hold off on setting PAGE_SCAN to 0 after approval until we get more details on this.
Anyway: Approved for extended trial (500 edits). Hopefully I will be able to do a careful review of the results of this trial after it is complete. — Earwig talk 05:00, 13 July 2015 (UTC)
Trial complete. I looked over the edits and can't find any problems with them except that some pages don't seem to be archiving correctly, based on the talk messages left behind. The source rescuing hasn't revealed any bugs this time, but I would appreciate a separate set of eyes on this too in case I missed something.—cyberpowerChat:Online 13:43, 14 July 2015 (UTC)
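To illustrate why verification (the VERIFY_DEAD setting discussed above) is considered error-prone: some dead pages answer 200 OK. The sketch below classifies an already fetched response using simple heuristics. The function name, the phrase list and the three-way result are all assumptions for illustration and do not describe the bot's actual checks.

```python
from urllib.parse import urlsplit

# Assumed phrases that often appear on "soft 404" pages.
SOFT_404_PHRASES = ("page not found", "no longer available")


def classify_link(status: int, requested_url: str,
                  final_url: str, body: str) -> str:
    """Return 'dead', 'alive', 'suspect' or 'unknown' for a fetched link."""
    if status in (404, 410):
        return "dead"
    if status == 200:
        requested, final = urlsplit(requested_url), urlsplit(final_url)
        # A deep link that ends up on the site root is a common soft 404.
        redirected_to_root = (requested.path not in ("", "/")
                              and final.path in ("", "/"))
        lowered = body.lower()
        if redirected_to_root or any(p in lowered for p in SOFT_404_PHRASES):
            return "suspect"
        return "alive"
    # Server errors, timeouts and anything else are not conclusive.
    return "unknown"
```

A "suspect" result is a case where a human check is still needed, which matches the concern that a 200 OK alone cannot be trusted.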

The Earwig here you are! 500 pages :) Josve05a you may also want to have a look! -- Magioladitis (talk) 08:53, 15 July 2015 (UTC)

I'll be checking the articlespace contributions. As a note/question, I am not sure how much importance should be placed on working link-->working archive or nonworking link-->nonworking archive replacements; they appear to be fairly minor issues on their own (unlike working link-->nonworking archive replacements). Jo-Jo Eumerus (talk, contributions) 13:30, 15 July 2015 (UTC)
Alright, from Ani DiFranco forward I see [36] where the bot fixed one link but de-{{dead link}}-ed two and [37] where a citation already using a Webarchive link got its "wayback" part removed. Jo-Jo Eumerus (talk, contributions) 14:03, 15 July 2015 (UTC)
I somehow missed that edit, which raises the question of how many others I missed. As for that edit, it's reasonable to assume the bot got confused as 2 sources were placed in one reference. Unless I'm mistaken, only one source should be in a reference at a time, so that should rather be fixed on the article. True?—cyberpowerChat:Online 14:14, 15 July 2015 (UTC)
@Jo-Jo Eumerus: Is there anything wrong with that second case? It doesn't seem to be an explicit part of the task description, but the end result is better formatted since it does make the original source visible. @Cyberpower678: It is odd, yes, but I don't think it's technically disallowed – either way, the bot shouldn't be doing that, even though it's understandable why the bug would arise in the first place. — Earwig talk 05:20, 17 July 2015 (UTC)
Mmm. Yeah, with your argument I think that can be done. I'll review some more edits from Assembly line forward. Jo-Jo Eumerus (talk, contributions) 10:58, 17 July 2015 (UTC)
Alrighty, aside from the usual Austin, Texas had a mistagged link (because it still works) changed to a broken Wayback link, but it's clearly noted on the talk page so I guess it's not a major issue. Nothing else serious to see. Jo-Jo Eumerus (talk, contributions) 11:49, 17 July 2015 (UTC)
Then this might be a problem. The way the bot is coded, it's designed to look for reference tags, external links, and citation templates. If it finds the reference tag, it looks for the source inside it. It can't see 2 sources the way it's coded, and updating that would require major rewrites of the bot. While I can adjust the regex to absorb the 2 links no problem, feeding them into the parser might be a problem as it only takes in one link. Ideally, I would rather simply fix this issue on the article since this seems to have occurred only once of all the trials.—cyberpowerChat:Online 14:10, 17 July 2015 (UTC)
Just thinking outside the box here, but why don't we make a separate bot to find and fix instances of multiple references in a single tag, run that on all of Wikipedia, and then run this one when that one is done. bd2412 T 14:29, 17 July 2015 (UTC)
Unfortunately, you're thinking outside of our galaxy here. Such a bot would be extremely difficult to program. How would it know what text to put where? In the case here, this reference has 2 external links and text mentioning both links. Your bot would need to master the English language first. I do like the idea though.—cyberpowerChat:Online 15:16, 17 July 2015 (UTC)
My suggestion would be to correct the reference manually and move on. I have a sneaking suspicion that this issue will come up so rarely that any human could easily fix it. And the bot won't come back to it once it has an archive link, or an ignore tag.— Preceding unsigned comment added by cyberpower678 (talk · contribs)
──────────────────────────────────────────────────────────────────────────────────────────────────── Is there a way to tell these problem refs? Maybe the bot can simply list them somewhere (or tag them) and have a human repair them before doing the botwork. Jo-Jo Eumerus (talk, contributions) 15:46, 17 July 2015 (UTC)
Yes. That can easily be done. But to do all 12 million articles may take some time.—cyberpowerChat:Online 15:53, 17 July 2015 (UTC)
OK. I believe this is a problem that can be actively dealt with while the bot is working. Were there any other problems? -- Magioladitis (talk) 10:36, 24 July 2015 (UTC)
Not that I am aware of.—cyberpowerChat:Online 13:54, 24 July 2015 (UTC)
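The suggestion above to list references that hold more than one source could look roughly like the sketch below. It is a heuristic under stated assumptions: a reference is flagged when it contains more than one distinct URL that is not a web.archive.org link. The function name and both regexes are hypothetical and are not the bot's parser.

```python
import re
from urllib.parse import urlsplit

# <ref> or <ref name="..."> with content; self-closing <ref ... /> is skipped.
REF_RE = re.compile(r"<ref(?:\s[^>/]*)?>(.*?)</ref>",
                    re.IGNORECASE | re.DOTALL)
URL_RE = re.compile(r"https?://[^\s\]|}<]+")


def refs_with_multiple_sources(wikitext: str) -> list[str]:
    """Return the contents of references that appear to cite 2+ sources."""
    flagged = []
    for match in REF_RE.finditer(wikitext):
        content = match.group(1)
        urls = {
            url for url in URL_RE.findall(content)
            # Archive links accompany a source; they are not a second source.
            if "web.archive.org" not in urlsplit(url).netloc
        }
        if len(urls) > 1:
            flagged.append(content)
    return flagged
```

The flagged references could then be written to a report page for manual repair before the bot processes those articles.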

Josve05a did you have the chance to check (some of) the 500 edits? -- Magioladitis (talk) 14:32, 24 July 2015 (UTC)

I did do some spot tests and checks and the rate of error (dead archives etc.) is within my acceptable parameters, in my opinion. However, I would suggest a new maintenance template/category be added next to the links/on the talk page, so a human can do a second review of all bot edits if wanted, instead of having to look at edit logs. Like "Template:Bot link-archivation" or something, in monthly categories. Just a suggestion, to catch those which may be in error. (tJosve05a (c) 16:50, 24 July 2015 (UTC)
Would the talk page notifiers serve that scope? Jo-Jo Eumerus (talk, contributions) 16:55, 24 July 2015 (UTC)
Not unless they all got collected on one page, like if they had a category/template in them. It is one thing to "see" the talk page notifiers while on the article, another to systematically manually review them afterwards. The notifiers are there to "let you know" that it happened; a category where these could be in would be to "allow manual reviews"...I'm just mumbling right now... (tJosve05a (c) 17:10, 24 July 2015 (UTC)

Approved requests

Bots that have been approved for operations after a successful BRFA will be listed here for informational purposes. No other approval action is required for these bots. Recently approved requests can be found here, while old requests can be found in the archives.

Denied requests

Bots that have been denied for operations will be listed here for informational purposes for at least 7 days before being archived. No other action is required for these bots. Older requests can be found in the Archive.

Expired/withdrawn requests

These requests have either expired, as information required by the operator was not provided, or been withdrawn. These tasks are not authorized to run, but such lack of authorization does not necessarily follow from a finding as to merit. A bot that, having been approved for testing, was not tested by an editor, or one for which the results of testing were not posted, for example, would appear here. Bot requests should not be placed here if there is an active discussion ongoing above. Operators whose requests have expired may reactivate their requests at any time. The following list shows recent requests (if any) that have expired, listed here for informational purposes for at least 7 days before being archived. Older requests can be found in the respective archives: Expired, Withdrawn.