Wikipedia:Bots/Requests for approval


If you want to run a bot on the English Wikipedia, you must first get it approved. To do so, follow the instructions below to add a request. If you are not familiar with programming, it may be a good idea to ask someone else to run a bot for you, rather than running your own.

 Instructions for bot operators


Current requests for approval

Bots in a trial period

WP:BRFA/HostBot_6

HostBot 6

Operator: Jmorgan (WMF) (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 22:19, Sunday, March 1, 2015 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): Python

Source code available: Script to gather sample (th_invitees.py); script to send invites (send_th_invites.py)

Function overview: Deliver a template invitation to the Wikipedia Co-op to new Wikipedia editors via the editors' talk pages.

Links to relevant discussions (where appropriate):

Edit period(s): daily

Estimated number of pages affected: 50-100 per day

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): Yes

Function details: Functionally equivalent to HostBot's approved tasks for Teahouse and The Wikipedia Adventure invites. The bot will invite a subset of potential Teahouse invitees to the Co-op instead.
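For illustration only, a minimal sketch of what the delivery step could look like in pywikibot, assuming the sampling script (th_invitees.py) has already written the selected usernames to a file; the invitation template, file name, and summary here are placeholders, not HostBot's actual code.

import pywikibot

SITE = pywikibot.Site('en', 'wikipedia')
# Placeholder invitation text; the real template comes from the Co-op project.
INVITE = '\n\n{{subst:Co-op invitation}} ~~~~'

def invite(username):
    """Append a Co-op invitation to a new editor's talk page, skipping opted-out users."""
    talk = pywikibot.Page(SITE, 'User talk:' + username)
    text = talk.text if talk.exists() else ''
    if '{{nobots' in text or '{{bots' in text:
        return  # crude opt-out check; the real bot honours {{bots}}/{{nobots}} properly
    talk.text = text + INVITE
    talk.save(summary='Inviting you to The Co-op (bot trial)')

for line in open('invitees.txt'):      # output of the sampling script, one username per line
    invite(line.strip())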

Discussion

So, who is it that wants this? Links. 110.174.86.241 (talk) 00:35, 2 March 2015 (UTC)

Please read this. This bot will work for the new mentoring project, WP:Co-op. Being a mentor at Co-op, I support this bot request. Cheers, Jim Carter 09:10, 2 March 2015 (UTC)
  • As the project manager of The Co-op, I can report that 27 editors have expressed interest in mentoring newer editors for our pilot. Invitations sent out for the Teahouse and The Wikipedia Adventure were and continue to be an effective way to make these spaces visible to newer editors looking for help. In order to test out The Co-op as a mentorship space effectively and in a reasonable amount of time (1 month), using this system of invitations to newer editors seems like a sensible approach. I, JethroBT drop me a line 23:09, 2 March 2015 (UTC)

There don't seem to be any objections, and you've done this kind of thing before. Approved for trial (7 days). Josh Parris 10:15, 4 March 2015 (UTC)

WP:BRFA/Commons fair use upload bot_3

Commons fair use upload bot 3

Operator: Fæ (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 19:49, Wednesday January 7, 2015 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): Python

Source code available:

I have this working passively locally, but have yet to test it out on labs with sample images. When I have significant updates to the code, I will consider uploading a new version of the source under https://github.com/faebug, as the wikigit repository is unlikely to be maintained. A migrated version of the code (test version only) is at the github link above.

Function overview:

This is a cross-wiki bot to copy files at risk of deletion on Wikimedia Commons to local wikis where they can be retained, either under fair use or because the image is public domain in its source country but may be problematic under Commons interpretations (such as the URAA).

Links to relevant discussions (where appropriate):

Edit period(s):

  • Previously running hourly, without any issues, so I'm planning on doing the same.

Estimated number of pages affected:

  • The bot was successfully running prior to the toolserver shutdown; coincidentally, the last transferred files were some of my own. See ListFiles.

Exclusion compliant : Yes


Already has a bot flag: No. Effectively this was reset by the usurp process.

A trial may not be needed considering the track record; however, if there is one, I would prefer it to be a month or longer, as my availability may be patchy.

Function details: This bot re-uploads files that are deleted on Commons to projects where they are in use, if those projects accept non-free files. It goes over the files in Category:Pending fair use deletes, uploads them to local wikis, then marks them for speedy deletion on Commons when it's done. Any article using the images receives a notice that the file has been re-uploaded as a fair use candidate. Local wikis are responsible for determining if the re-uploaded image is eligible for their non-free content policy, and deleting it in a timely manner if it is not. If for some reason it's not able to upload the image, it will leave an error message on the file page and not mark it for deletion.
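A rough outline of that loop, sketched here with pywikibot (the operator is migrating the bot from mwclient to pywikibot, per the discussion below); the helper functions are placeholders standing in for the steps named above, not the bot's actual code.

import pywikibot

commons = pywikibot.Site('commons', 'commons')
pending = pywikibot.Category(commons, 'Category:Pending fair use deletes')

for cfile in pending.members():
    if not isinstance(cfile, pywikibot.FilePage):
        continue
    all_uploads_ok = True
    for using_page in cfile.globalusage():          # articles on other projects using the file
        local_wiki = using_page.site
        if not accepts_non_free(local_wiki):        # placeholder: per-wiki non-free policy check
            continue
        try:
            reupload(local_wiki, cfile)             # placeholder: download, local upload, redirect
            notify_article(using_page, cfile)       # placeholder: fair-use-candidate notice on the article
        except Exception as err:
            leave_error_note(cfile, err)            # placeholder: error message on the file page
            all_uploads_ok = False
    if all_uploads_ok:
        tag_for_speedy(cfile)                       # placeholder: mark the Commons copy for deletion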

Discussion

Arbcom exemption/requirements
  • The Arbitration Committee has passed the following motion, which relates to this request for approval:
Despite the restrictions on his editing images related to sexuality, Fæ may operate the Commons fair use upload bot if the Bot Approvals Group approves it.

The bot may upload sexuality images that would, if Fæ himself had uploaded them to the English Wikipedia, breach Fæ's restriction, only if the upload is requested by a third party.

The bot shall maintain a log of: the images it uploads; the names of the articles on the English Wikipedia where the images appear at the time of upload; and the username of the Commons editor requesting the transfer to the English Wikipedia.

For the Arbitration Committee, Callanecc (talkcontribslogs) 01:24, 15 January 2015 (UTC)

Bot discussion
  • Can you please indicate on the local userpage who owns the account? Courcelles 22:26, 7 January 2015 (UTC)
    • Good point. Done, rather than relying on visiting the Commons page. -- (talk) 22:35, 7 January 2015 (UTC)
  • After playing around with the bot locally and having it fall over a few times, I am planning to rewrite it to rely on pywikibot rather than mwclient as its interface to the API. This will probably work far more reliably on WMFlabs and be much easier to maintain in future years. Though the code is not all that long, with other commitments and the increased testing needed, this will take weeks rather than a few days. -- (talk) 08:49, 12 January 2015 (UTC)
    Note, some 'real' images are ready for the bot to localize, see Commons:Deletion requests/Files uploaded by SPVII DrFresh26. I'm advising that the bot should be operational within a week or two.
    The account commonsfairuseupload has been set up on labs. I have a test version running under Pywikibot core on WMFlabs, however there is a fair amount of rewriting to be done before running it live and it makes sense to put a first snapshot up on github. -- (talk) 23:25, 13 January 2015 (UTC)
    Early snapshot now on github as above. -- (talk) 13:35, 14 January 2015 (UTC)
    A separate bot flag restoration request has been raised on Commons, c:Commons:Bots/Requests/Commons fair use upload bot. -- (talk) 12:52, 15 January 2015 (UTC)
  • This bot would just perform the same task as User:Dcoetzee's bot, right? How will the bot handle Bugzilla:61656? Dcoetzee's bot handled this by reverting CommonsDelinker, see e.g. Special:Diff/615048249. Ideally, this should be fixed in CommonsDelinker instead of the fair use upload bot, but nothing seems to have happened in CommonsDelinker since the bug was reported in 2010. --Stefan2 (talk) 14:53, 20 January 2015 (UTC)
    To be honest, I am not 100% sure I understand the issue, not having looked into the functionality of the delinker (note the bug was reported in 2010, but Dcoetzee's code was successfully running from 2012 to 2014 the way it was). However, the way the CFUUB behaves at the moment is that it locally uploads the file under an amended file name and inserts a redirect as the old local image page text. This should leave the old name untouched to avoid permission problems on local wikis. My understanding is that this precautionary step also avoids possible conflict with the delinker when the original is speedy deleted from Commons. If folks want this to work differently, then this might be something to amend in the delinker's behaviour, rather than building in odd intelligent reverts into CFUUB to undo the work of the delinker.
I have yet to convert this bit of code to pywikibot, but if you look in the current test source code linked above, the two places where site.upload(open('/tmp/downloadedfile'), newfilename, newdesc, ignore=True) occurs are the relevant ones.
As I am in regular dialogue with @Steinsplitter:, I would defer to his judgement, as he has recently been active in updating the delinker, and would welcome his advice during testing. Perhaps he could take ownership of this bug request too? I could do with some test images, so maybe we can agree on a few and demonstrate the system in the trial period. -- (talk) 16:28, 20 January 2015 (UTC)
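For reference, the amended-name upload and redirect described above could look roughly like the following, using mwclient's upload signature as in the quoted test code; the naming scheme and edit summary are illustrative, not the bot's actual code.

import mwclient

enwiki = mwclient.Site('en.wikipedia.org')

def localise(old_name, newdesc, tmp_path='/tmp/downloadedfile'):
    """Upload the Commons file under an amended local name; redirect the old title to it."""
    stem, dot, ext = old_name.rpartition('.')
    newfilename = '%s (from Commons)%s%s' % (stem, dot, ext)   # e.g. 'File:X (from Commons).jpg'
    enwiki.upload(open(tmp_path, 'rb'), newfilename, newdesc, ignore=True)
    enwiki.pages[old_name].save('#REDIRECT [[%s]]' % newfilename,
                                summary='Redirect to locally re-uploaded fair use candidate')
    return newfilename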
When Dcoetzee's bot uploaded files, it worked like this:
  1. Someone on Commons requested a local upload
  2. Dcoetzee's bot uploaded the file under a slightly different name by inserting "from Commons" in the file name to avoid permission problems
  3. The bot created a redirect from the old name to the uploaded file
  4. The file was deleted on Commons by a Commons admin
  5. CommonsDelinker failed to notice that a redirect existed locally and therefore incorrectly removed the file from English Wikipedia
  6. Dcoetzee's bot reverted CommonsDelinker's incorrect removal
I think that step 5 should be fixed by correcting the bug in CommonsDelinker, but Dcoetzee decided to fix it by introducing step 6 because the CommonsDelinker programmers didn't fix the bug for several years. There is some discussion in Wikipedia:Bots/Requests for approval/Commons fair use upload bot 2, for example in the "Function details" section. If Steinsplitter can fix CommonsDelinker, then that would be much better. --Stefan2 (talk) 16:49, 20 January 2015 (UTC)
Agreed. I'll pay attention to testing this out based on the fact that Steinsplitter believes this bug has been addressed in Magnus' new version of the delinker (Phabricator:T63656). -- (talk) 18:16, 20 January 2015 (UTC)
See my comment here --Steinsplitter (talk) 18:17, 20 January 2015 (UTC)

As best I can tell, there's no reason to delay a trial. Is that the case? Josh Parris 06:35, 20 February 2015 (UTC)

I'm considering putting some time aside to trial the code in about a week. -- (talk) 09:44, 20 February 2015 (UTC)
My understanding is that you intend to monitor closely, but this is a rewritten bot. I'm also under the impression that there won't be a huge number of edits. As such, Approved for trial (30 edits or 30 days), commencing sometime in the next couple of weeks. Josh Parris 19:24, 23 February 2015 (UTC)

WP:BRFA/Demibot

Demibot

Operator: Demize (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 20:33, Friday November 21, 2014 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): Python

Source code available: On Github

Function overview: Generate indexes of talk page archives

Links to relevant discussions (where appropriate):

Edit period(s): Daily (more often if discussion believes it's warranted)

Estimated number of pages affected: 3207 (based on current configuration and logging)

Exclusion compliant (Yes/No): No

Already has a bot flag (Yes/No): No

Function details: Builds a list of all pages transcluding {{User:HBC Archive Indexerbot/Optin}} (currently only in the Talk, User Talk, Wikipedia Talk, File Talk, and WikiMedia Talk namespaces, although I will extend that to other namespaces when the bot is operational) and generates archive indexes according to the specification of HBC Archive Indexerbot. Specifically (a sketch follows the list):

  • Finds and parses the Optin template on each page
  • Builds a list of all pages that fit the mask specified on the page
  • Iterates through that list and finds the individual threads with a regex, then does the following for each thread:
    • Stores the title of the topic
    • Estimates the number of replies
    • Finds the earliest and latest times in the thread, stores those as well as the difference between them
    • Generates a link to the thread
  • Grabs either the default or a specified template and generates the talk page archive index from this
  • Writes the talk page archive index to the page specified in the Optin template (this is the only edit the bot will make, aside from edits to its own log page)
  • Logs its actions to User:Demibot/log
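As a rough illustration of the loop above (not Demibot's actual code), assuming placeholder helpers for the Optin parsing, mask expansion, date parsing, and index rendering:

import re
import pywikibot

site = pywikibot.Site('en', 'wikipedia')
optin = pywikibot.Page(site, 'User:HBC Archive Indexerbot/Optin')

HEADING = re.compile(r'^==\s*(.*?)\s*==\s*$', re.M)            # one thread per level-2 heading
TIMESTAMP = re.compile(r'\d\d:\d\d, \d{1,2} \w+ \d{4} \(UTC\)')

for talk in optin.embeddedin(namespaces=[1, 3, 5, 7]):         # pages transcluding the opt-in template
    config = parse_optin(talk.text)                            # placeholder: mask/target/template fields
    rows = []
    for archive in archive_pages(site, config['mask']):        # placeholder: resolve '/Archive <#>' etc.
        text = archive.text
        titles = HEADING.findall(text)
        bodies = HEADING.split(text)[2::2]
        for title, body in zip(titles, bodies):
            stamps = sorted(parse_ts(s) for s in TIMESTAMP.findall(body))   # placeholder date parser
            first, last = (stamps[0], stamps[-1]) if stamps else (None, None)
            rows.append((title, len(stamps), first, last, archive.title()))
    index = pywikibot.Page(site, config['target'])
    index.text = render_index(rows, config.get('template'))    # placeholder: fill the index template
    index.save(summary='Updating archive index (bot)')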

Discussion

I've been writing the bot already, and it's mostly functional for my own talk page, but before I write anything that makes edits (even to my own talk page index) I wanted to bring this to BRFA. The code is a bit of a mess right now, partly because it's my first time writing Python, but it's functional. It doesn't generate the index from a template yet, but the functionality is there; I just need to write a parser for the template. This bot replaces the now-inactive HBC Archive Indexerbot and the inactive Task 15 on Legobot, neither of which appear to be coming back any time soon. If anybody has any questions, I'll be glad to answer them. If anyone has any concerns or suggestions, I'll be glad to hear them. demize (t · c) 20:33, 21 November 2014 (UTC)

This is much needed functionality to the extent that I was considering coding it and requesting permission to run from Their Majesties. I'm sure it will have wide ranging support.
Note: You can of course run it on your own talk pages, even before trial approval.
All the best: Rich Farmbrough 15:35, 22 November 2014 (UTC).
  • Note: This bot appears to have edited since this BRFA was filed. Bots may not edit outside their own or their operator's userspace unless approved or approved for trial. AnomieBOT 18:12, 24 November 2014 (UTC)
    It did, and I tagged the page it created for deletion (it's been deleted) and fixed the problem that made it create a page that it shouldn't have. I've also run it on my own talk page, and it generates the archive index excellently, in my opinion, with the one flaw that sections without any timestamps have a first comment date of December 31, 9999 and a last comment date of January 1, 1900. I'm thinking that I'll just make it output dashes instead of dates and durations when there are no timestamps; that'll be a bit better. demize (t · c) 19:00, 24 November 2014 (UTC)
  • Just an update on the current status of the bot: As per above, I have tested this on my own talk page. I have also made a version of the bot that should work on all talk pages. That's been pushed to GitHub, but the changes haven't yet been pulled to the Tools Labs server. What happens next with the bot is up to the BAG. demize (t · c) 22:56, 24 November 2014 (UTC)
  • ((BAGAssistanceNeeded)) As it's been a week since I filed the request with no comment from a BAG member, I'm requesting BAG assistance. I've been making some changes to the code on Github over the past week, but there's very little I can do now without approval for a trial and a trial scope. demize (t · c) 18:20, 28 November 2014 (UTC)
  • (Non-BAG member observation) Demize, any specific reason why the bot isn't marked as exclusion compliant? APerson (talk!) 00:19, 30 November 2014 (UTC)
    Strictly speaking, it isn't. It's opt-in rather than opt-out; it doesn't look for the {{bots}} template at all, since if someone wanted to remove it from their page they could remove the opt-in template. It's still just as easy to stop from editing your pages, just not technically exclusion compliant. demize (t · c) 04:50, 30 November 2014 (UTC)
Approved for trial. Definitely useful. Please do not do a full run but instead trial against one or two users talk page archives, such as mine! (which should already have the HBC template, if not then just add it :) ). Post links once complete! ·addshore· talk to me! 15:15, 4 December 2014 (UTC)
Thanks! I'll get this done tomorrow. I'll run it against mine and yours, and if another user or two volunteers their page I can run it against theirs as well. demize (t · c) 15:23, 4 December 2014 (UTC)
@Demize: Feel free to index my talk page ;) (iirc, I don't have HBC template, so please add it!)  Revi 15:41, 4 December 2014 (UTC)
Trial complete. Fixed a few issues that popped up (turns out my attempts to catch exceptions threw a few, and I added some other things without testing them... they're fixed now). It took longer than I expected due to real life and my need to figure out what on earth the masks for -revi's talk page archives could be. The mask option really wasn't designed for monthly archives, it turns out. That aside, the bot worked as expected once I ironed out the bugs. demize (t · c) 21:03, 9 December 2014 (UTC)
  • In this edit, the bot/bot owner added a template with multiple duplicate |mask= parameters. I commented that out. This is about CAT:DUPARG. At User talk:Demize it was explained that this is part of how the bot works, but I think we should not expect multiple identical parameters (that being undocumented). DePiep (talk) 01:17, 10 December 2014 (UTC)
(edit conflict) DePiep brought up quite a valid point on my talk page regarding the Optin template just now (User talk:Demize#Duplicate_parameters). The duplicate parameters are an issue, and an easily correctable one if I want to make this bot not entirely compatible with the current operating instructions. I could make the masks all be specified as one parameter, with multiple masks separated by semicolons (and this could be done while preserving the ability to specify them as multiple parameters). If there's any way to exclude the template this bot uses from CAT:DUPARG, then that could be done as well. Certainly an issue to consider. demize (t · c) 01:24, 10 December 2014 (UTC)
AFAIK, there are no exceptions possible (cat:duparg is built deep inside the mw software). Also, one should consider what such repetition intends versus regular template usage & practice (mw-level). DePiep (talk) 01:29, 10 December 2014 (UTC)
I'm certainly not averse to changing how the bot reads the template. The only issue is that HBC Archive Indexerbot and Legobot both read multiple parameters from the template rather than cramming them all into one parameter. Either way, it doesn't affect anything on the MW level: the template has no content since it's used simply as a way for the bot to find pages to work on and to know what to do on that page. I can certainly see the point of not repeating the parameters to keep within the accepted practices for templates though, and I'll probably write it in with semicolon delimiters when I get the chance... for now, it's almost 21:00 and I have an exam at 08:00, so I'm off for the night. demize (t · c) 01:49, 10 December 2014 (UTC)
Umm, User talk:-revi/Archive Index links has some errors; <span>blah blah</span> is shown in the link - it should not be like this...  Revi 11:53, 10 December 2014 (UTC)
Indeed, it does. I'll make it strip HTML tags when I get home, as well as write another regex to make it handle wikilinks, something else I didn't notice was an issue until now. Thanks. demize (t · c) 12:15, 10 December 2014 (UTC)
Because it breaks the section link.  Revi 12:17, 10 December 2014 (UTC)

So, just a few comments from me / some things I would love to see... Naturally when people have a lot of talk page archives the index page is pretty big! Would it be possible to shorten "0 days, 15 hours, 39 minutes" to maybe "0d 15h 39m"? On big index pages this should make quite a difference. Another space saver could be in the Link column changing the text displayed from the link from "User talk:Addshore/Archive 1#Thanks" to "Archive 1#Thanks" for example! Also, If we have the first and the duration do we really need the last? I guess it could help people sort? I'm starting to think the template should have more epic options! ·addshore· talk to me! 20:19, 10 December 2014 (UTC)

The template I used for these pages was the one that has all the variables in it, so it's a bit bigger than the default one (which just has a few columns). I'll definitely take your other suggestions into account, and if you have a template you think should be the default, then let me know!
As for the bot configuration template, I actually just came up with another idea: {{User:HBC Archive Indexerbot/Optin|target=/Archive Index|mask=/Archive <#>|mask2=/Archives/<#>/<#M>|start2=2014|end2=2015|indexhere=yes}} would work for -revi's talk page, and is much simpler than how it is now. It also takes care of the issue of duplicate parameters, but remains mostly backwards-compatible with them (so long as I don't mess up the code too badly). The optional start and end parameters say which number to start from and end at. Start would default to 1. If the end parameter is specified, it'll keep looking through pages until it hits the number specified in end, otherwise the behavior is as it is now (it will stop looking through pages once it finds one that doesn't exist). <#M> would be replaced with month names in order to make monthly archives so much easier to work with. How does this sound? demize (t · c) 21:07, 10 December 2014 (UTC)
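For illustration, the proposed mask handling could be expanded roughly like this; the <#> and <#M> placeholders and the start/end parameters are as described above, and page_exists stands in for an actual existence check (this is a sketch, not the bot's eventual code).

import calendar

def expand_mask(base, mask, start=1, end=None, page_exists=lambda title: True):
    """Yield archive titles for a mask such as '/Archive <#>' or '/Archives/<#>/<#M>'."""
    n = start
    while end is None or n <= end:
        if '<#M>' in mask:
            candidates = [mask.replace('<#>', str(n)).replace('<#M>', month)
                          for month in calendar.month_name[1:]]
        else:
            candidates = [mask.replace('<#>', str(n))]
        titles = [base + c for c in candidates if page_exists(base + c)]
        if not titles and end is None:
            break                     # no end given: stop at the first missing page
        for title in titles:
            yield title
        n += 1

# e.g. expand_mask('User talk:-revi', '/Archives/<#>/<#M>', start=2014, end=2015, page_exists=...)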
Well, since nobody seems to have any objections and I have a fair bit of free time coming up, I'll get started with this either later today or tomorrow. My todo list:
  • Make the bot strip HTML from section titles
  • Remove any extra equals signs from the beginning and end of the section title (minor change to the regex should work; instead of explicitly ==, look for ={2,})
  • Parse wikilinks so that the link title shows up without the rest of the link
  • Implement the other changes I described above
And then I'll be back here, ready to set the bot loose on a few pages again! Some of the items in that list (that is, the first three) might be easier to fix than I'm expecting, I'll have to see if the python library I'm using makes it easier... demize (t · c) 17:10, 22 December 2014 (UTC)
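One possible shape for the first three items on that list, assuming simple regexes (illustrative, not the bot's eventual code):

import re

HEADING = re.compile(r'^(={2,})\s*(.*?)\s*\1\s*$', re.M)    # tolerate extra '=' signs
TAG = re.compile(r'</?[A-Za-z][^>]*>')                       # e.g. <span>...</span>
WIKILINK = re.compile(r'\[\[(?:[^|\]]*\|)?([^\]]+)\]\]')     # keep only the displayed text

def clean_title(raw):
    """Turn a raw section heading into plain text usable in an index link."""
    raw = TAG.sub('', raw)
    raw = WIKILINK.sub(r'\1', raw)
    return raw.strip()

# clean_title('<span>blah blah</span> [[Foo|bar]]') -> 'blah blah bar'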
Approved for trial, with the same conditions as before, once you have done your poking. Let us know how it goes! ·addshore· talk to me! 16:45, 9 January 2015 (UTC)
Just an update to let everyone know I didn't forget about this. I have most of the changes implemented, but school and work are taking up a fair amount of time right now so I've had less of a chance to work out all the implementation details regarding the updated Optin template than I'd have liked. I should have this ready to go soon! demize (t · c) 15:43, 28 January 2015 (UTC)

WP:BRFA/EranBot_2

EranBot 2

Operator: Eran (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 20:28, Thursday December 18, 2014 (UTC)

Bot does not make edits to mainspace

Programming language(s): Python (pywikibot framework)

Source code available: https://github.com/valhallasw/plagiabot

Function overview: EranBot has been checking all articles tagged with either WP:MED or WP:PHARM for nearly 6 months now. It has picked up more than 200 confirmed copyright violations; articles that the bot flags turn out to have issues more than half the time. This has allowed us to pick up classes of students with copyright issues that would otherwise have been missed.

We are considering the possibility of expanding the scope of this bot to other topic areas and hopefully eventually to all of En Wikipedia.

Before we can do this we need

  1. community support / support from the BAG
  2. Turnitin to agree to allow us greater use of their API
  3. the ability to follow up on the concerns raised (either with staff or volunteers)

Links to relevant discussions (where appropriate):

  • Discussion regarding staff to support the effort [1]

Edit period(s): A few times per day

Estimated number of pages affected: The bot will only make edits to a handful of pages. No edits will be made to mainspace.

Exclusion compliant (Yes/No):  ?

Already has a bot flag : Yes

Function details: The plan is to have the list of diffs sortable by WikiProject and by whether or not a student from the education program was involved. This will allow people to concentrate on the subject area they are interested in. The hope is to change the output to be more similar in formatting to https://en.wikipedia.org/wiki/Special:NewPagesFeed, with a drop-down box to sort by area.

Discussion

{{OperatorAssistanceNeeded}} (procedural) @ערן: @Eran: please endorse this request for a change to your bot's scope. — xaosflux Talk 23:34, 18 December 2014 (UTC)

  • I support this request.
  • It seems that there are other users who run bots for copyright violation detection (@The Earwig: - EarwigBot, @Coren:/@Madman: - CorenSearchBot/MadmanBot), so I would like to have comments/ideas based on previous experience.
  • The bot aims to take a different approach from those bots in two main aspects:
    • It scans diffs, rather than newly created articles.
    • It uses commercial plagiarism-detection software (iThenticate; see more at Wikipedia:Turnitin) rather than a commercial search service (Yahoo BOSS; Yahoo search).
Eran (talk) 08:09, 19 December 2014 (UTC)
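To make the diff-scanning approach concrete, a rough sketch follows; the WikiProject filter, the iThenticate call, and the reporting step are hypothetical placeholders (the real client code is in the plagiabot repository linked above), and the thresholds are purely illustrative.

import pywikibot

site = pywikibot.Site('en', 'wikipedia')
MIN_ADDED_CHARS = 500          # ignore small additions (illustrative threshold)

def added_text(old, new):
    """Very rough approximation of the text added in a revision."""
    old_lines = set(old.splitlines())
    return '\n'.join(line for line in new.splitlines() if line not in old_lines)

for change in site.recentchanges(namespaces=[0], changetype='edit', total=200):
    page = pywikibot.Page(site, change['title'])
    if not in_tracked_wikiproject(page):                     # placeholder: WP:MED / WP:PHARM tagging check
        continue
    old = page.getOldVersion(change['old_revid'])
    new = page.getOldVersion(change['revid'])
    addition = added_text(old, new)
    if len(addition) < MIN_ADDED_CHARS:
        continue
    score, source = check_with_ithenticate(addition)         # placeholder for the Turnitin/iThenticate client
    if score > 0.5:                                           # illustrative cut-off
        report_suspect(page, change['revid'], score, source)  # placeholder: add a row to the report page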
Thank you, — xaosflux Talk 13:14, 19 December 2014 (UTC)

Approved for trial (7 days). -- Magioladitis (talk) 16:16, 30 December 2014 (UTC)

Thanks. Will likely take us a month or so to get the trial running. Doc James (talk · contribs · email) 01:20, 31 December 2014 (UTC)
Please post here when ready to begin, the trial days will start then. — xaosflux Talk 01:43, 31 December 2014 (UTC)
We have collected data for 1.5 hours. We are now working on formatting this data. The hope is to run it further once more development is done. The data is here [2] Doc James (talk · contribs · email) 02:46, 26 January 2015 (UTC)
Where's this at? Josh Parris 10:17, 4 March 2015 (UTC)

WP:BRFA/JhealdBot

JhealdBot

Operator: Jheald (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 23:36, Monday December 8, 2014 (UTC)

Automatic, Supervised, or Manual: Supervised

Programming language(s): Perl

Source code available: Still under development.

Function overview: Maintenance of subpages of Wikipedia:GLAM/Your_paintings, in particular the subpages listed at Wikipedia:GLAM/Your_paintings#Artists_by_birth_period. There is currently a drive to identify Wikidata items for the entries on this list that are not yet matched. I seek approval to keep these corresponding pages on Wikipedia up to date.

Initially I would just use the bot as an uploader, to transfer wikipages edited off-line into these pages (including fixing some anomalies in the present pages -- which I would probably do sequentially, through more than one stage, reviewing each fix stage before moving on to the next).

Once the off-line code is proven, I would then propose to move to a semi-automated mode, automatically updating the pages to reflect new instances of items with d:Property:P1367 and/or corresponding Wikipedia and Commons pages.

Links to relevant discussions (where appropriate):

Edit period(s): Occasional (perhaps once a fortnight), once the initial updating has been completed. And on request.

Estimated number of pages affected: 17

Exclusion compliant (Yes/No): No. These are purely project tracking pages. No reason to expect a {{bots}} template. If anyone has any issues with what the bot does, they should talk to me directly and I'll either change it or stop running it.


Already has a bot flag (Yes/No): No. I have one on Commons, but not yet here.

Function details:

  • Initially: simple multiple uploader bot -- take updated versions of the 17 pages prepared and reviewed offline, and upload them here.
  • Subsequently: obtain a list of all Wikidata items with property P1367. Use the list to regenerate the "Wikidata" column of the tables, plus corresponding sitelinked Wikipedia and Commons pages. (A sketch of the uploader stage follows.)
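The operator's scripts are in Perl; purely as an illustration of the uploader stage, here is a pywikibot sketch that assumes the 17 refreshed pages have been written offline to a local directory, one wikitext file per subpage (the directory name and edit summary are placeholders).

import os
import pywikibot

site = pywikibot.Site('en', 'wikipedia')
BASE = 'Wikipedia:GLAM/Your paintings/'
LOCAL_DIR = 'prepared_pages'                  # offline-generated wikitext, one file per subpage

for filename in sorted(os.listdir(LOCAL_DIR)):
    subpage = filename.replace('_', ' ')      # each file named after its target subpage
    page = pywikibot.Page(site, BASE + subpage)
    with open(os.path.join(LOCAL_DIR, filename)) as f:
        new_text = f.read()
    if new_text != page.text:                 # skip pages the offline run did not change
        page.text = new_text
        page.save(summary='Refreshing Wikidata/VIAF/RKD columns from offline run (bot)')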

Discussion

Regarding uploading offline edits: Are these being made by anyone besides the operator? What license are they being made under? — xaosflux Talk 23:44, 18 December 2014 (UTC)
@Xaosflux: The pages have been being prepared by me using perl scripts, drawing from Wikidata.
I've slowly been making the scripts more sophisticated -- so I've recently added columns for VIAF and RKDartists links, both taken from Wikidata, defaulting to searches if there's no link, or no Wikidata item yet identified. Content not drawn from Wikidata (typically legacy entries from the pages as I first found them) I have prefixed with a question mark in the pages, meaning to be confirmed. For the most part these are blue links, which may go to completely the wrong people.
So at the moment I'm running a WDQ search to pull out all Wikidata entries with one (or more) values for the P1367 "BBC Your Paintings identifier" property, along with the properties for Commons category name (P373), VIAF (P214) and RDKartists (P650). I'm also running an Autolist search to get en-wiki article names for all Wikidata items with a P1367. Plus I have run a look-up to get Wikidata item numbers for all other en-wiki bluelinks on the page (this gives the Q-numbers marked with question marks). But the latter was quite slow, so I have only run it the once. At the moment I'm still launching these searches by hand, and making sure they've come back properly, before updating & re-uploading the pages.
As to the licensing -- Wikidata is licensed CC0. My uploads here are licensed CC BY-SA like any other upload to the site, though in reality there is very little originality, creativity or expression, apart from the choice of design of the page overall, so (under U.S. law at least) there quite possibly is no new copyrightable content in the diffs. Various people of course are updating Wikidata -- I've been slowly working down this list (well, so far only to the middle of the 1600s page), though unfortunately not all of the Wikidata updates seem to be being picked up by WDQ at the moment; the Your Paintings list is also on Magnus's Mix-and-Match tool; and various others are working at the moment, particularly to add RKD entries to painters with works in the Rijksmuseum in Amsterdam. But Wikidata is all CC0, so that all ought to be fine.
What would help, though, would be having permission for a (limited) multiple uploader, so I could then upload the updates to all 17 pages just by launching a script, rather than laboriously having to upload all 17 by hand each time I want to refresh them or slightly improve the treatment of one of the columns.
I'm not sure if that entirely answers your question, but I hope does make clearer what I've been doing. All best, Jheald (talk) 00:45, 19 December 2014 (UTC)
Approved for trial (25 edits or 10 days). Please post your results here after the trial. — xaosflux Talk 01:48, 19 December 2014 (UTC)
@Xaosflux: First run of 16 edits made successfully -- see contribs for 19 December, from 15:59 to 16:55.
(Links to RKD streamlined + data updated; one page unaffected).
All the Captchas were a bit of a pain to have to deal with; but they will go away. Otherwise, all fine. Jheald (talk) 17:31, 19 December 2014 (UTC)
Sorry about that, I added confirmed flag to avoid this for now. — xaosflux Talk 17:34, 19 December 2014 (UTC)
New trial run carried out smoothly (see this related changes page).
The update is still prepared by executing several scripts manually before a final uploader script, but I should have these all rolled together into a single process for the next test. Jheald (talk) 09:11, 11 January 2015 (UTC)
Run again on January 21st, adding a column with the total number of paintings in the PCF for each artist. Jheald (talk) 17:13, 24 January 2015 (UTC)

Have you completed the trial? Josh Parris 10:20, 4 March 2015 (UTC)

WP:BRFA/APersonBot_2

APersonBot 2

Operator: APerson (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 13:37, Sunday June 8, 2014 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): Python

Source code available: https://github.com/APerson241/APersonBot/blob/master/dyknotifier/

Function overview: Will notify an editor if an article they had created/expanded was nominated for DYK by someone else.

Links to relevant discussions (where appropriate): Wikipedia:Bot requests/Archive 60#Asking_for_Noting_bot

Edit period(s): Daily

Estimated number of pages affected: 60 (per run)

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): Yes

Function details: This bot will notify an editor if an article they had created/expanded was nominated for DYK by someone else. The notifiers and creators will be detected by scanning the signatures present in the DYK nom, and {{DYKnom}} (superseding User:APersonBot/DYKNotice) will be used to notify people.

Discussion

Questions from Xaosflux

  • Do you plan to send notifications to everyone in the article contribution history ('persons who expanded')? If not, what will be your criteria for inclusion?
xaosflux Talk 02:07, 21 June 2014 (UTC)
The criteria for inclusion will be strictly limited to the user(s) who the nominator entered in {{NewDYKnomination}} as the author(s) using the author parameter (or one that works the same way). At the moment, the bot checks who these users are by parsing the line that appears inside <small> tags. APerson (talk!) 02:34, 21 June 2014 (UTC)
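As an illustration of that parsing step (not the bot's actual code), a sketch that pulls user links out of the <small> credit line on a nomination subpage; the regexes are assumptions about that line's wikitext.

import re
import pywikibot

site = pywikibot.Site('en', 'wikipedia')
SMALL = re.compile(r'<small>(.*?)</small>', re.S)
USERLINK = re.compile(r'\[\[User:([^|\]]+)')

def credited_users(nom_title):
    """Return the usernames named in the credit line of a DYK nomination page."""
    text = pywikibot.Page(site, nom_title).text
    match = SMALL.search(text)
    if not match:
        return []
    users = []
    for name in USERLINK.findall(match.group(1)):
        if name not in users:
            users.append(name)
    return users

# The bot would then drop the nominator from this list and notify the rest with {{DYKnom}}.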
  • Pinging the WT:DYK page to see if there are any objectors still, and to bring such objections here. If there are no objections in a day, I will approve for trial — xaosflux Talk 02:53, 21 June 2014 (UTC)
    Special:Diff/613764883xaosflux Talk 02:59, 21 June 2014 (UTC)

Comment from Maile66

  • Just would like to mention that it would be good if, when the notification is posted on a user's talk page, it had a link to the nomination template so that the editor would be able to comment on the nomination. — Maile (talk) 13:14, 22 June 2014 (UTC)
    Maile66, I believe that {{DYKNom}}, which I rewrote and expanded for the occasion, links to the nomination template if possible. APerson (talk!) 02:37, 23 June 2014 (UTC)
  • DYK Chiltern Firehouse kicked out a notification within minutes of the nomination template being created, which is good. However, it was the nominator who got the notice on his talk page, rather than the intended recipients: the creators other than the nominator. — Maile (talk) 18:13, 24 June 2014 (UTC)
See response below. APerson (talk!) 19:26, 24 June 2014 (UTC)

Trial Period

  • Original trial: 100 edits, 7 days. Please post back the results of your trial here; include any user feedback (positive or negative). — xaosflux Talk 01:18, 22 June 2014 (UTC)
    Looks like you've hit a couple of minor bugs (e.g. double posting to User_talk:MelanieN; posting to the nominator). Have these been solved along the way? — xaosflux Talk 04:10, 24 June 2014 (UTC)
    @Xaosflux, Maile66: I just fixed a bug about the bot forgetting to not notify the nominator. The double post was caused by an earlier version of {{DYKNom}} not adding a HTML comment with the name of the template when it was subst:'ed. APerson (talk!) 19:26, 24 June 2014 (UTC)
    Thanks, please reply when your trial is done, flag this with the BAGAssistanceNeeded template too. — xaosflux Talk 00:45, 25 June 2014 (UTC)
    Status update: I'm changing the library which the bot uses from wikitools to pywikibot, since I've found that wikitools is extremely buggy. This may take a few days. APerson (talk!) 02:49, 29 June 2014 (UTC)
    ((BotExtendedTrial|edits=100|days=14)) Trial extended, please verify edits to each of the categories from the original trial after your code change. — xaosflux Talk 01:15, 30 June 2014 (UTC)
  • @APerson: It's been 11 days. Please proceed to the extended bot trial. -- Magioladitis (talk) 13:38, 11 July 2014 (UTC)
    @Magioladitis: I've finished the bot (almost) and am currently trying to get it to save a page. Everything else works; the bot will probably start editing within one or two hours. — Preceding unsigned comment added by APerson (talkcontribs) 18:31 12 July 2014
    APerson Could you please provide us with a status on this bot? — Maile (talk) 15:07, 25 July 2014 (UTC)
    @Maile66: I've been very busy both on and off Wikipedia. Since I haven't gotten pywikibot (the library I switched to) to save, I have my pywikibot script spit out a list of people to notify and another script notify everyone on the list. The only issue I have right now is that for some inexplicable reason, the bot insists on coming up with people whose submissions have already been resolved (i.e. passed or failed) but have not yet been "officially" notified. For instance, the bot wants to notify people whose nominations have been closed but whose hooks have yet to actually appear on the Main Page. I'm not sure if this behavior is desirable, so a few more changes should fix this. APerson (talk!) 22:43, 25 July 2014 (UTC)
    @Maile66, Xaosflux:, update: the bot is now alive and kicking, but (now that all the logic is properly working) I've discovered that there just aren't that many people who have had their submissions nominated by someone else AND haven't participated in the nomination discussion. It'll take a while to rack up 100 edits. APerson (talk!) 22:44, 10 August 2014 (UTC)
    APerson I see it! I don't know what you have configured to trigger the notice, but it looks to me like it scans the DYK nominations page. It seems to be going back through some older ones, so some editors may wonder why they're getting this at this point in time. I assume that once it gets through the backlog on that page, it will then just work on the newer ones as they come along. I'm so happy this is working now. Matty.007 will be, also. — Maile (talk) 23:32, 10 August 2014 (UTC)
  • I saw this bot was on trial (for MURDER!; call Robot Jessica Fletcher!) and purposely didn't notify the author of Parliamentary War Memorial when I nominated it. There's still no sign of a notification on the editor's page, so you might like to investigate (call human Jessica Fletcher!) Belle (talk) 09:43, 15 August 2014 (UTC)
    Belle, I have not yet had the opportunity to start running the bot on toolserver. So, at the moment, the only time the bot runs (and notifies people) is when I double-click it on my computer. I'll look into running it on toolserver soon. APerson (talk!) 16:37, 20 August 2014 (UTC)
    APerson, Did you mean labs? I thought toolserver was no more. — Maile (talk) 16:39, 20 August 2014 (UTC)
    Oh yeah, I meant labs. APerson (talk!) 16:41, 20 August 2014 (UTC)
  • APerson, I see the bot recently placed a couple of notifications on editor pages. I assume you're still testing and probably don't need me to tell you this. However, the latest I've seen is getting it right, but still missing something. When there is more than one editor (creator) that needs to be notified, it only notifies one. — Maile (talk) 22:52, 2 September 2014 (UTC)
  • APerson Is the trial complete? -- Magioladitis (talk) 14:33, 14 September 2014 (UTC)
    Magioladitis, it looks like APerson on August 22 put a notice on his user page that he'd be on a Wikibreak "...a pretty long time." Do we have an alternative solution to getting this bot done? DYK approved this bot January 2014. The request for the bot was made on Feb 23, 2014 and picked up by user Ceradon on March 1, 2014. However, as far as we know, Ceradon never took any action on that, and has not edited on Wikipedia since April 1, 2014. A new request was made for this bot on June 1, 2014. DYK has waited for several months now, with two different people volunteering to do the bot. As far as I can tell, his last testing on this bot was through APersonBot on August 31, 2014. That test showed the bot was not yet perfected, as by my message above. Please let us know our alternatives. — Maile (talk) 18:35, 22 September 2014 (UTC)
    (editing from bot account) Maile66, due to unusual circumstances I will be unable to access my main account until around 7 November. I can vouch for the bot's readiness, although you should hold off on approving it until then. This will be the only non-bot edit I'll make with this account. APersonBot (talk!) 21:43, 24 September 2014 (UTC)
  • Reopened: I have resubmitted the request. APerson (talk!) 01:42, 29 November 2014 (UTC)
    Magioladitis, care to take another look? APerson (talk!) 01:58, 2 December 2014 (UTC)
  • Approved for extended trial (100 edits or 14 days). -- Magioladitis (talk) 08:25, 2 December 2014 (UTC)
  • APerson did you finish the extended bot trial? Please provide diffs and comments on bot's action. -- Magioladitis (talk) 08:12, 24 December 2014 (UTC)
    Sorry - I've been quite busy, with both on- and off-wiki projects. At the moment, I'm trying to get the bot to recognize when it's already notified someone. There have been some errors, which means that I have to run the bot in interactive mode (i.e. I approve each edit) while testing. This, in turn, means that testing is going slower than I'd like. Still, I hope to have a wonderfully working bot as soon as I can. APerson (talk!) 01:51, 10 January 2015 (UTC)
    So, how's this going? Josh Parris 19:00, 23 February 2015 (UTC)

Bots that have completed the trial period

WP:BRFA/StanfordLinkBot

StanfordLinkBot

Operator: ashwinpp (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 23:39, Wednesday December 3, 2014 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): Python

Source code available: GitHub. The repository is currently empty; source code will be added after bot approval.

Function overview: Insert links between Wikipedia pages based on statistical inference on human navigational traces.

Links to relevant discussions (where appropriate): Research:Improving_link_coverage is an ongoing research effort and describes the underlying idea which is used to predict links.

Edit period(s): One time run to test efficacy of a specific method.

Estimated number of pages affected: 7000

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No):

Function details:

The job of this bot is to insert a link from a source page to a target page, given the mention in the source page that should link to the target. The input is in the form of a tab-separated file. To make the bot version-agnostic, it provides a best-effort service when searching for the mention in the source article. If the mention exists, the link is added (at the first mention); otherwise it is not. It does not support specifying a location of the mention (in terms of the number of words preceding it) because that location is subject to change due to edits.
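For concreteness, a sketch of that best-effort insertion, assuming a three-column tab-separated input of source title, target title, and mention text (the exact file layout is not specified above); the regexes, file name, and "already linked" check are illustrative, not the bot's actual code.

import csv
import re
import pywikibot

site = pywikibot.Site('en', 'wikipedia')

def link_first_mention(text, mention, target):
    """Wrap the first plain-text occurrence of `mention` in a (possibly piped) link."""
    pattern = re.compile(r'(?<!\[)\b%s\b(?!\])' % re.escape(mention))
    if not pattern.search(text):
        return None                                  # mention no longer present: skip this pair
    link = '[[%s]]' % mention if mention == target else '[[%s|%s]]' % (target, mention)
    return pattern.sub(link, text, count=1)

with open('predicted_links.tsv') as f:
    for source, target, mention in csv.reader(f, delimiter='\t'):
        page = pywikibot.Page(site, source)
        if '[[' + target in page.text:               # crude check: link already added since prediction
            continue
        new_text = link_first_mention(page.text, mention, target)
        if new_text:
            page.text = new_text
            page.save(summary='Adding link to [[%s]] predicted from navigation traces' % target)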

The link prediction algorithm was developed in a research project that is part of a collaboration between Stanford University and the Wikimedia Foundation. The project page can be found here. A paper describing the algorithm and results is under submission to the World Wide Web Conference; if you would like a confidential preprint, please get in touch with Bob West.

Link Prediction Method

We propose a novel approach to identifying missing links on Wikipedia. We build on the fact that the ultimate purpose of Wikipedia links is to aid navigation. Rather than merely suggesting new links that are in tune with the structure of existing links, our method finds missing links that would immediately enhance Wikipedia’s navigability. We leverage a data set of navigation paths collected through a Wikipedia-based human-computation game called The Wiki Game in which users must find a short path from a start to a target article by only clicking links encountered along the way. We harness human navigational traces to identify a set of candidates for missing links and then rank these candidates according to various metrics. We further validate our prediction by recruiting human raters from Amazon Mechanical Turk and setting up a human evaluation task that asks them to guess which links should exist in Wikipedia, based on the Linking Guidelines. Our evaluation (see above for how to obtain a preprint of the paper) shows that the links predicted by our method are of higher quality than alternative methods.
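Purely to illustrate the core idea, not the ranking metrics from the paper: a toy sketch that counts how often navigators pass through a candidate source while reaching a given target, keeping only candidates that do not already link to the target.

from collections import Counter, defaultdict

def candidate_sources(paths, existing_links, top_k=10):
    """paths: navigation traces, each a list of article titles ending at the reached target.
    existing_links: dict mapping an article to the set of articles it already links to.
    Returns the top_k most-traversed unlinked sources for each target."""
    counts = defaultdict(Counter)
    for path in paths:
        target = path[-1]
        for source in path[:-1]:
            if target not in existing_links.get(source, set()):
                counts[target][source] += 1
    return {target: [s for s, _ in c.most_common(top_k)] for target, c in counts.items()}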

Discussion

  • Note: This request specifies the bot account as the operator. A bot may not operate itself; please update the "Operator" field to indicate the account of the human running this bot. AnomieBOT 23:48, 3 December 2014 (UTC)
A couple starter questions to get the ball rolling:
  1. Do you happen to have a couple of real-life examples of what goes in and what comes out? Like, let's say you ran this bot over a few pages. Can you give an example of some of the changes it'd actually make?
  2. Does the bot simply handle article links? Does it parse, understand, and/or exclude category links or other anomalies?
  3. Is there a "proposed-changes" or "log-only" mode of any sort? E.g., instead of editing, could it write to a log in its user space saying how it would have edited?
  4. Is there any reference of where this has already been used, or are we the first? It's not a make-or-break sort of thing, just something that'd help us if it already has a track record or something. :P
  5. I'm not seeing a substantial edit count from the bot operator. Normally, this would suggest a severe lack of the experience needed to run a bot in an active community with its own quirks, but if someone at the Foundation is working with you on this, could you please get them to chime in here? That'd help establish what all's going on and possibly provide some background information for us and other editors reviewing this request.
Cheers =) --slakrtalk / 03:20, 10 December 2014 (UTC)
Hi, I'm Bob West, one of the developers of the bot. Sorry for the delay in answering your questions, Slakr. Here are my replies in the order of your questions. Please let me know if anything needs more clarification.
(1) The bot will take as input a text file with two columns: column 1 has the source of the link to be added, column 2 has the target. That is, an article listed in column 1 will be modified by adding a link to its corresponding article in column 2. Our algorithm, which produced the input file for the bot, is a source prediction algorithm, i.e., it is given a target and finds the top sources that should link to it (this makes our algorithm different from most other algorithms, which tend to take a source as input and produce a ranked list of target candidates). We found the 10 best sources for 700 target pages in the English Wikipedia, and it is these 700 × 10 = 7,000 links that we would like to add to Wikipedia through our bot.
As an example, here are the top 10 source articles that should link to the target Abortion:
Catholic_Church, Miscarriage, Ethics, Infant_mortality, Embryo, George_W._Bush, Women's_rights, Modern_liberalism_in_the_United_States, Obstetrics, Infanticide
As another example, here are the top 10 source articles that should link to the target Air_pollution:
Los_Angeles, Pollutant, Automobile, Asthma, Environmental_policy_of_the_United_States, Great_Smog, Ozone, Weather, Acid_rain, Environmental_law
At the time we ran our algorithm, none of these sources linked to the respective targets. If the link has been added since, we will not modify the source. Also, when we ran the algorithm, all these links could be added without modifying the words of the article, since the source article contained a mention of the target; the only modification that's necessary is to add double square brackets around the mention (which will be easy using a regex). If the target mention cannot be found in the article text anymore, our bot will not modify the source.
(2) Yes, the bot focuses on simple article links. It won't add any category links, and crucially, it will never remove any existing links but only add links that can be tied to a pre-existing anchor text.
(3) Since the modifications we are planning are so small (simply add "[[" and "]]" around phrases that already exist in the source article), we were planning on making the changes directly in the wikitext.
(4) As the bot is being developed specifically for Wikipedia, it hasn't been applied anywhere else yet.
(5) I have informed our collaborator at the Foundation about this discussion. Now that I've replied, I'll ping her again and will ask her if she wants to add a few words here.
Cervisiarius (talk) 19:53, 22 December 2014 (UTC)
Hi @Slakr: My name is Leila and I work in the Foundation as part of the Research-and-Data team. Here is some background and context that I hope help you and other editors to move forward with a decision. Thanks in advance for your time.
Bob, his advisor, and I officially started doing research on improving link coverage in early December 2014. The research is being documented here. Our research builds on the ideas Bob and his colleagues developed before we started our collaboration, and that earlier work is under review at the World Wide Web Conference. That earlier research is what StanfordLinkBot will be operating based on. Given that I was not involved in the earlier research, I took the following steps to assess the performance of StanfordLinkBot:
  • I went over the paper where Bob et al. discuss the algorithm behind StanfordLinkBot. My conclusion was that the approach used to identify missing links is unique and neat. For example, unlike other algorithms suggested earlier in the literature, which start from an article and find all the articles that should be linked from it, this one finds all the articles that should link to the given article. This approach, along with the high precision reported, controls for the overlinking that is usually a concern in methods that recommend adding links to articles.
  • Bob shared the two-column list he mentions above with me. I randomly chose 100 link suggestions from that list and checked the articles in English Wikipedia to assess whether the suggestions made sense to me. They did, and I was impressed with the performance of the bot in practice in the subset I looked at.
  • Given that what the bot is supposed to do is very simple (adding links to already existing words in articles), and given the expertise of the folks who wrote the code and those who will review it, I have not reviewed the code for StanfordLinkBot. I'd be happy to do so and share my thoughts if needed.

LZia (WMF) (talk) 21:50, 24 December 2014 (UTC)

{{BAGAssistanceNeeded}}

  • Question: Not a BAG member, just a random guy with a question. Will disambiguation pages and/or pages with a parenthetical after the name, such as Georgia (U.S. state) vs. Georgia (country) vs. Georgia, confuse the bot? Also, I'm assuming that the bot will not add a second link to a page if there is already one on the page? – Philosopher Let us reason together. 02:47, 17 January 2015 (UTC)
  • Hi Philosopher! The basic principle behind the bot is that a source page S will be linked to a target page T if many users who were actively looking to reach T went through S on the way (and if S has no link to T yet; which is the answer to your second question: you're right that no second link will be added if one already exists). The case of disambiguation and parenthesized pages is no different from any other type of ambiguity. In the (contrived) scenario that many users who tried to reach the target Georgia (country) went through, say, Atlanta, then a link from Atlanta to Georgia (country) might be suggested. I'm calling this scenario "contrived", though, because we've never seen such an erroneous suggestion from our algorithm. Cervisiarius (talk) 06:14, 19 January 2015 (UTC)
  • So, to follow up, the bot will be capable of creating piped links? Obviously, no one will be writing "Georgia (U.S. state)" in the text of an article, so you can't just add brackets around it to create your link. – Philosopher Let us reason together. 00:36, 20 January 2015 (UTC)
  • Yes. We are explicitly creating piped links. Ashwinpp (talk) 19:29, 21 January 2015 (UTC)
  • Approved for trial (50 edits or so) — There seems to be no major objection; the trial is mainly a sanity check to make sure the bot makes edits sanely and to discover any unexpected issues. Once we have that small base of edits (assuming it looks/works fine and issues aren't encountered), we can likely just approve the full run. --slakrtalk / 04:05, 26 January 2015 (UTC)
Also, I noticed some edits to the sandbox. While this is fine to test automated editing, what we're really looking for in the trial are changes to the pages themselves (so that we can see the diffs of the changes the bot makes). Otherwise, it's difficult to have a point of reference and track down potential issues. --slakrtalk / 20:29, 2 February 2015 (UTC)
Yes, we were making some automated edits to the Sandbox for testing purposes and to ensure we don't break wikimarkup. Appreciate your interest :)
We've made trial edits to 50 articles as can be seen in the history. Ashwinpp (talk) 09:22, 8 February 2015 (UTC)
Sorry for the confusion. It was initially named StanfordLinkPredictor; however, we changed it to StanfordLinkBot to reflect the fact that it is a bot. The infobox has now been corrected. Ashwinpp (talk) 09:22, 8 February 2015 (UTC)
No worries. Could someone on BAG handle closing out that request and possibly block/redirecting its userpage? DMacks (talk) 20:16, 8 February 2015 (UTC)

Trial complete

Trial complete. Ashwinpp (talk) 21:45, 9 February 2015 (UTC)

{{BAGAssistanceNeeded}} Ashwinpp (talk) 01:05, 19 February 2015 (UTC)

I note that the bot account has been used by a human. Stop doing that.

The bot's edit to Charles I of England hasn't been reversed in the last ten-ish days, but I myself find it questionable. Country names are an example of overlinking, and in this particular article there's a link to England in the lede. Anyone confused by or curious about the term will have stumbled across it in the first seconds of reading the article. This is an example of where the manual of style on linking is going to provide some food for thought for the developers. Portuguese India bears similar considerations. I also note you said earlier "If the link has been added since, we will not modify the source".

Why was "European" included in the link text to classical music in Cello?

The edit to Aromatic hydrocarbon linked to carbon dioxide, but not carbon monoxide. Is the bot capable of applying multiple edits simultaneously, and was there supporting data to suggest a link to carbon monoxide? Same thing with Seasonal human migration and cattle; and Full breakfast and a bunch of ingredients.

The bot chose to bypass Harvard Graduate School of Education and instead pipe link Harvard to Harvard University; do you think this was an appropriate edit? Similarly, in editing Raphael, a potential link to High Renaissance was ignored and Renaissance was pipelinked to The Renaissance. What was the processing chain that led there? The term Computer scientist wasn't linked to the article Computer scientist, but instead pipelinked to Computer science; is this behaviour by design; is it optimal? In Mickey Mouse, an article with a couple of dozen mentions of the word animation, the term traditional animation was split to link to Animation. Why was this particular instance of animation chosen to be the one receiving the link? Similar issues occur in Parmigiano-Reggiano and Rock music.

In the Optics edit the bot doesn't link the first instance of Newton, but something like the second. How does the bot choose which instance of a term to link? Similar behaviour is observed in Dissection and Portuguese India.

The soybean edit links to Dairy, which is a dairy products factory, rather than Dairy product, which is the meaning of the word used in the sentence. What can be done to prevent subtly incorrect links like this? Or in sugar, where Protein was linked, whereas a more contextually precise link ought to go to Protein (nutrient)?

Did you check any of the edits the bot made, or do a dry run beforehand? Josh Parris 05:24, 20 February 2015 (UTC)

A user has requested the attention of the operator. Once the operator has seen this message and replied, please deactivate this tag. (user notified) Josh Parris 04:08, 26 February 2015 (UTC)


Approved requests

Bots that have been approved for operations after a successful BRFA will be listed here for informational purposes. No other approval action is required for these bots. Recently approved requests can be found here (edit), while old requests can be found in the archives.


Denied requests

Bots that have been denied for operations will be listed here for informational purposes for at least 7 days before being archived. No other action is required for these bots. Older requests can be found in the Archive.

Expired/withdrawn requests

These requests have either expired, as information required from the operator was not provided, or been withdrawn. These tasks are not authorized to run, but such lack of authorization does not necessarily follow from a finding as to merit. A bot that, having been approved for testing, was not tested by an editor, or one for which the results of testing were not posted, for example, would appear here. Bot requests should not be placed here if there is an active discussion ongoing above. Operators whose requests have expired may reactivate their requests at any time. The following list shows recent requests (if any) that have expired, listed here for informational purposes for at least 7 days before being archived. Older requests can be found in the respective archives: Expired, Withdrawn.