Wikipedia talk:Bot policy

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Minecrafter0271 (talk | contribs) at 00:38, 14 February 2020 (→‎BAG Nomination: new section). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

New BAG nomination: Enterprisey

Hi! This is a notice that I have nominated myself for the Bot Approvals Group. I would appreciate your input. Thanks! Enterprisey (talk!) 06:15, 31 July 2019 (UTC)

Bots triggered by multiple users

Hello! Following up on the situations that were resolved with Citation bot, I think it would be useful to record the expectations for bots that can be triggered by other users to make edits without the operator needing to manually review them. Looking for some succinct verbiage to add to BOTPOL on this. It needs to be broad, but not overly strict. Some obvious examples of bots getting triggered by other users that don't really need any changes are anti-vandalism bots, copyvio detection bots, SineBot, and archive bots. The discussion of the Citation bot issue has led to what I see as a need for certain classes of bots to authenticate, authorize, and identify the triggering user. The authorization can be simple (such as "is not a blocked user") or more complex, but that degree is probably best left to a BRFA, not the policy. I'm looking to record the current standards here more so than to create a new rule. All input is welcome! — xaosflux Talk 13:20, 7 August 2019 (UTC)

Something like "Bots which can be triggered by users other than the operator are expected to display the username of the requesting user in edit summaries and log entries and to not perform actions if the user is blocked or lacks the necessary user rights to carry the action out themselves"? Jo-Jo Eumerus (talk, contributions) 13:38, 7 August 2019 (UTC)
Getting there, I think. Single-page-type editing bots probably don't need to go that far; I think this issue was primarily about being able to trigger the bot to edit an arbitrary page? Or perhaps this is really just an issue when the trigger is out-of-band of on-wiki edits? It was clear that something needed to be done, and we shouldn't end up in a repeat situation in the future. — xaosflux Talk 13:46, 7 August 2019 (UTC)
For example, Special:Diff/909683673 isn't really a problem, as the source of the triggers is on-wiki, logged, authenticated, and restricted, and that edit could include multiple updates from collaborative edits on the request page. — xaosflux Talk 13:48, 7 August 2019 (UTC)
"Bots which can be triggered by users other than the operator are expected to publicly display the username of the requesting user in some way and to respect user rights- and block-imposed limitations." That keeps it a bit open-ended; "in some way" leaves room for bots that are triggered by edits to a page. Jo-Jo Eumerus (talk, contributions) 13:53, 7 August 2019 (UTC)
Jo-Jo Eumerus, that might be an issue for some bots like IABot. IABot can edit through certain levels of protection that new users can't, but new users can invoke the bot on any page they like. Of course, they can't configure how it should run, as it will use community-defined defaults, which I believe is acceptable under the conditions I stated. I think this verbiage would cover it better:

Bots and/or tools that allow users to invoke the bot on a list of pages are required to observe if the user is blocked and to link the username of the invoking user in the edit summary. Bots that allow configuration options for said list must also check that the invoking user can edit through protection if any are applied to pages.

CYBERPOWER (Chat) 15:26, 7 August 2019 (UTC)
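CYBERPOWER's proposed wording boils down to a small gate a bot could run before acting on a request. The sketch below is a hypothetical illustration, not how IABot or Citation bot actually implement it. It assumes the bot has already authenticated the invoking user (for example via OAuth) and fetched their record from the MediaWiki API with list=users&usprop=blockinfo|rights, and that a "blockid" key appears only for blocked users. The helper names may_invoke and edit_summary, and the mapping from protection level to user right, are assumptions to check against the wiki's actual configuration.

```python
# Hypothetical mapping from a page's edit-protection level to the user right
# needed to edit through it; check the target wiki's configuration.
PROTECTION_RIGHT = {
    "autoconfirmed": "editsemiprotected",
    "extendedconfirmed": "extendedconfirmed",
    "sysop": "editprotected",
}


def may_invoke(user, page_protection=None, custom_options=False):
    """Decide whether a request from `user` should be honoured.

    user: dict shaped like one entry of list=users output, e.g.
          {"name": "Alice", "rights": [...]} plus "blockid" if blocked.
    page_protection: edit-protection level of the target page, or None.
    custom_options: True if the user overrode the community defaults.
    Returns (allowed, reason).
    """
    if "blockid" in user:
        return False, "invoking user is blocked"
    # Default runs may edit through protection (as IABot does); only runs
    # with user-chosen options must respect the user's own rights.
    if custom_options and page_protection:
        needed = PROTECTION_RIGHT.get(page_protection)
        # Unknown protection levels are refused rather than guessed at.
        if needed is None or needed not in user.get("rights", []):
            return False, f"user cannot edit through {page_protection} protection"
    return True, "ok"


def edit_summary(base, user):
    """Append a link to the invoking user so each edit is attributable."""
    name = user["name"]
    return f"{base} | Requested by [[User:{name}|{name}]]"


if __name__ == "__main__":
    alice = {"name": "Alice", "rights": ["edit", "editsemiprotected"]}
    print(may_invoke(alice, "sysop", custom_options=True))
    # (False, 'user cannot edit through sysop protection')
    print(may_invoke(alice, "sysop"))  # (True, 'ok')
    print(edit_summary("Rescuing 2 sources", alice))
    # Rescuing 2 sources | Requested by [[User:Alice|Alice]]
```

Keeping the block check unconditional while making the protection check depend on custom_options mirrors the distinction in the proposal: a default run is the operator's responsibility, while a user-configured run should be limited to what that user could do by hand.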
For one-off-edit kinds of bots triggered by others, I would prefer to see OAuth strongly recommended, if not required, and the edit subsequently performed under the user's name (and/or the editor shipped straight to preview rather than the tool making the edit itself). The issue with Citation bot was the 'category run', which IMO is reasonably a bot edit and which, where it sits now, should list the implementing user (again, strong preference for OAuth here). (Just getting my objectives on the page; I'll be back later to read other commentary.) --Izno (talk) 16:42, 7 August 2019 (UTC)
I think we would need to define "can be triggered" better. A lot of AnomieBOT's tasks are arguably "triggered" by others, e.g. by someone orphaning a reference, or adding a maintenance tag, or putting a template into Category:Wikipedia templates to be automatically substituted, or doing something on various project pages that are bot-clerked, or making various other edits that manipulate pages' categories, templates, and so on. On the other hand, there's no way for others to get AnomieBOT to do something by visiting any page off-wiki. And I could imagine a bot where edits are "triggered" by submitting a list of pages to a form off-wiki, but the user doing the submission has no control over what the bot does to each page. Where's the line, exactly? Anomie 16:49, 7 August 2019 (UTC)
I think the key thing for me would be an active choice of which page to trigger the bot on. If the bot always edits the same page, then being triggered to edit that page is not much different from a WP:PURGE. Signbots, vandalbots, and dating bots are not actively triggered; they are reactively triggered. Headbomb {t · c · p · b} 17:16, 7 August 2019 (UTC)
One of the identified problems with Citation bot was that unauthenticated users were able to use it to hound another editor by having it follow them around: each atomic edit appeared appropriate, but the result was found to be unacceptable. — xaosflux Talk 17:33, 7 August 2019 (UTC)
Maybe it's better to state a general principle, currently embodied by the points "operator disclosure" and "operator verification". Something like "All decisions about the operation of the bot need to be accountable". (In practice, people want to know which talk page to yell at.) Nemo 18:03, 7 August 2019 (UTC)
As a policy, the more general the better; I just want to be careful that we don't make it so broad that it is useless. The primary recipients of the policy are BAG and the 'crats who end up approving bots. From a high-level technical perspective, though, I want to make sure we don't conflate an "operator" (someone who basically has total control of and accountability for actions under the account) with a "user"/"triggering editor" of the bot. — xaosflux Talk 18:07, 7 August 2019 (UTC)
That still leaves something of a grey area, depending on how you define "reactively triggered". The Citation bot queue complained about just above could have been implemented (somewhat poorly) as a wikipage; would the bot reacting to a wiki page edit have been sufficient for that situation? If not, then what about AnomieBOT's TemplateSubster reacting to the template being in a particular category; should AnomieBOT have to somehow dig through page histories to find out who added the category (probably by putting {{subst only|auto=yes}} on the template's doc subpage, but maybe some other way)? And similarly for TagDater and listing a template on WP:AWB/DT? Anomie 18:48, 7 August 2019 (UTC)
Editing WP:AWB/DT isn't triggering a bot, it is changing the configuration/rules of a weird pseudo-bot, so I think that example is out of scope for this discussion (it's a problem, but it's a different problem). AnomieBOT's TemplateSubster and {{subst only|auto=yes}} may be a real edge case in terms of this discussion, but it is also possible it simply means that that signal ("This template should be substed") belongs inline in the template—and thus subject to the same protection level as the template code—rather than on the /doc page (I need to think about this a bit more). But I also don't think one or two pathological edge cases are dealbreakers for a policy. There is always IAR and BAG discretion (and mandate for discretion can always be codified if really needed).
Other than that, I think y'all are overcomplicating this.
There is a pretty obvious distinction between things like AnomieBOT's TemplateSubster, SineBot, and ClueBOT, on the one hand, and things like Citation Bot and IABot (interactive), on the other. The former are not really "triggered by a user"; they are operating autonomously. The user in question isn't directing the bot; they are doing something completely different that happens to be something the relevant bot is looking for: to subst erroneously unsubst'ed templates, to add a missing signature, or to revert vandalism. In the latter case, a user has actively asked the bot to perform a task (regardless of how much or how little control that user has over the specifics of the task). It's the distinction between a user triggering the bot and a bot being triggered by something the user does. You can't do anything to make SineBot add signatures to another user's talk page messages, and you can't make ClueBOT revert another user's edits. And in all those cases the operator is responsible for the edits, and presumably perfectly comfortable with that. But you can make IABot add |archive-url= to all citations on an article written by another user (or, to pick a completely random example, mess up the carefully maintained indentation in a citation *cough* *cough*), and you can make IABot do this to all articles another user edits. You can use Citation Bot to bulk-change the citations on a range of articles where you disagree with other editors on citation style. For the former group I need to be able to yell at Anomie and Cyberpower678; for the latter I need to be able to yell at whatever editor is being a… is directing the bot.
The duck test is pretty simple: should Martin be blocked when Citation Bot is being used to harass someone? If not, we have a case where the user needs to be authenticated (so the bot knows who is directing it), authorized (so blocked users are denied), and identified (so admins can block someone other than the operator, or other editors know who to yell at). None of the examples given so far (except possibly TemplateSubster and unprotected template /doc pages) seem at all problematic to distinguish here; and absent actual examples of bots that would be inappropriately caught by such a clause we're just bikeshedding.
Also, given the requirements and the state of technological development, I see absolutely no reason why BOTPOL shouldn't now require OAuth in these cases for all new BRFAs. The same should apply to existing bots, with some kind of limited grandfather clause for tasks that have low potential for abuse and negligible risk of controversy (insufficient consensus for the task). Citation Bot failed on both those counts, and was actively abused, so a change was needed there in any case. But for other maintenance-mode/no-longer-actively-developed bots, where retrofitting OAuth would be a big ask, there may not be any pressing need. We're well past the point where OAuth is just "the cost of doing business" (or "table stakes" if you prefer), and if it's too hard for someone otherwise capable of writing a bot that other users can use, then something is wrong with either the bot architecture or the WMF infrastructure. --Xover (talk) 19:09, 13 August 2019 (UTC)

The WP:BOTDEF link in the side box redirects back to this very page, though the "main article" link in the same section works as expected. 69.85.254.70 (talk) 14:24, 17 October 2019 (UTC)

@69.85.254.70: that's normal; this is an informational shortcut box (see {{Shortcut}}). It tells you what the shortcut for the section is, so you can use it to quickly link to that section in various discussions. Headbomb {t · c · p · b} 18:57, 17 October 2019 (UTC)

Question re:Bot policy page

Bots that download substantial portions of Wikipedia's content by requesting many individual pages are not permitted. When such content is required, download database dumps instead. Bots that require access to run queries on Wikipedia databases may be run on Wikimedia Toolforge; such processes are outside the scope of this policy.

I need to write an algorithm to iteratively return the page content from every link in List of human protein-coding genes 1, 2, 3, and 4, use NLTK to word-tokenize the page, then test whether the page content contains any of the strings "gene", "genes", "protein", or "proteins", returning the page title if the page does not. My motivation for doing that is to fix the links to non-gene-related pages in these tables by recoding the script that generates them. Since this involves accessing ~20000 pages, what qualifies as "substantial portions of Wikipedia's content"? Seppi333 (Insert ) 22:51, 25 November 2019 (UTC)
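The per-page test described above can be sketched as follows. This is a hypothetical illustration rather than the script linked later in the thread: a plain regular-expression tokenizer stands in for NLTK's word_tokenize so it runs without downloading tokenizer data, tokens are lowercased so that "Gene" at the start of a sentence also counts, and the (title, text) pairs are assumed to come from an offline source such as a database dump.

```python
import re

# Keywords from the test described above; compared against lowercased tokens.
KEYWORDS = {"gene", "genes", "protein", "proteins"}

# Stand-in for nltk.word_tokenize (assumption): runs of word characters.
TOKEN_RE = re.compile(r"\w+")


def tokenize(text):
    """Split text into lowercased word tokens."""
    return TOKEN_RE.findall(text.lower())


def non_gene_titles(pages):
    """Yield the title of every page whose text contains none of KEYWORDS.

    pages: iterable of (title, wikitext) pairs, e.g. read from a dump.
    """
    for title, text in pages:
        # One set lookup per token keeps this linear in the page length.
        if KEYWORDS.isdisjoint(tokenize(text)):
            yield title


if __name__ == "__main__":
    sample = [
        ("BRCA1", "BRCA1 is a human gene."),
        ("Mercury (planet)", "Mercury is the smallest planet."),
    ]
    print(list(non_gene_titles(sample)))  # ['Mercury (planet)']
```

Matching whole tokens rather than substrings avoids false hits such as "genealogy" or "Genesis", which is presumably why tokenizing first is worth the extra step.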

@Seppi333: for such large-scale page examinations, you should consider offline use of a Wikipedia:Database download. — xaosflux Talk 23:08, 25 November 2019 (UTC)
I've never worked with a database dump before; does the pywikibot library support the creation of pagegenerators from links on a page in dump files? Seppi333 (Insert ) 23:21, 25 November 2019 (UTC)
Nvm. Figured it out and already programmed it. Seppi333 (Insert ) 01:46, 26 November 2019 (UTC)
@Seppi333: would you be able to link or describe how, for documentation? Wug·a·po·des 03:03, 26 November 2019 (UTC)
Are you asking about the generator specifically or the script that does what I described?
@Wugapodes: Either way, after reading the documentation, it doesn't appear that the Pywikibot library supports pagegenerator creation from links on a page in a dump file (see doc.wikimedia.org/pywikibot/master/api_ref/pywikibot.html#generator-options and the pywikibot.pagegenerators.XMLDumpOldPageGenerator() generator further down in the documentation). If it actually does support it, the documentation needs to be rewritten because it's not at all apparent to me how to program that.
If you're interested in the script I used, it's here: User:Seppi333/GeneListNLP. I wrote it as a single-purpose script rather than a reusable module, but I'm sure you can figure out how to revise it to suit other purposes. Basically all you'd have to do to customize it for other pages and use cases is change the pages in listOfPages (these are pages that the script runs the linkedPages generator on) and then modify the tested strings in the relevant line within the genePageTest() function to suit your needs. In other words, just modify these two lines of code:
listOfPages=["List of human protein-coding genes 1", "List of human protein-coding genes 2","List of human protein-coding genes 3","List of human protein-coding genes 4"]
if ("protein" in wordList) or ("proteins" in wordList) or ("gene" in wordList) or ("genes" in wordList):
The script generates a lot of output because I had a secondary use in mind when I programmed it (i.e., assessing how many gene pages were located at the gene symbol instead of the official UniProt name, in accordance with MOS:MCB). The mistargeted links are saved to a text file called mistargetedLinks.txt (NB: it creates a second similar text file as well, which I'm using to generate a dictionary with piped links for the script that writes the 4 wikitables listed above). Seppi333 (Insert ) 06:12, 26 November 2019 (UTC)
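Since Pywikibot's XML dump generators don't appear to build a generator from the links on a page, the same job can be approximated with the Python standard library alone. The sketch below is an illustration under stated assumptions, not Seppi333's script: it streams a pages-articles XML dump twice with xml.etree.ElementTree.iterparse, first collecting wikilink targets from the list pages, then yielding the wikitext of just those targets. The link regex, the namespace-prefix filter, and the function names are hypothetical simplifications; redirects are not followed, and link targets are only normalized for underscores and first-letter case.

```python
import re
import xml.etree.ElementTree as ET

# [[Target]], [[Target|label]], [[Target#Section|label]] -> "Target"
LINK_RE = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")
# Rough, hypothetical filter for links that do not point at articles.
SKIP_PREFIXES = (":", "category:", "file:", "image:", "template:", "wikipedia:", "help:")


def iter_pages(fileobj):
    """Stream (title, wikitext) pairs from a MediaWiki XML dump."""
    title, text = None, ""
    for _event, elem in ET.iterparse(fileobj, events=("end",)):
        tag = elem.tag.rsplit("}", 1)[-1]  # drop the export-0.x namespace
        if tag == "title":
            title = elem.text
        elif tag == "text":
            text = elem.text or ""
        elif tag == "page":
            yield title, text
            title, text = None, ""
            elem.clear()  # release the page's subtree; dumps are huge


def linked_titles(wikitext):
    """Return article link targets, normalizing underscores and first-letter case."""
    targets = set()
    for raw in LINK_RE.findall(wikitext):
        target = raw.strip().replace("_", " ")
        if not target or target.lower().startswith(SKIP_PREFIXES):
            continue
        targets.add(target[0].upper() + target[1:])
    return targets


def linked_pages_from_dump(open_dump, list_titles):
    """Yield (title, wikitext) for every page linked from the list pages.

    open_dump: zero-argument callable returning a fresh binary file object,
    e.g. lambda: bz2.open("enwiki-latest-pages-articles.xml.bz2").
    Two streaming passes: collect link targets, then read their text.
    """
    wanted = set()
    with open_dump() as dump:
        for title, text in iter_pages(dump):
            if title in list_titles:
                wanted |= linked_titles(text)
    with open_dump() as dump:
        for title, text in iter_pages(dump):
            if title in wanted:
                yield title, text
```

Reading the dump twice trades disk I/O for memory: only the set of wanted titles is held in RAM, and each page element is cleared after use, instead of issuing ~20000 individual page requests to the live site.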
In the event anyone actually reuses this script for another list/article, I'd suggest notifying WT:WikiProject Disambiguation about the entries it finds, since they've been quite helpful with adding disambiguation for ~200 pages that were identified when I ran it on the gene lists. Seppi333 (Insert ) 00:26, 30 November 2019 (UTC)

BAG Nomination

I nominated myself for a spot in BAG. The discussion can be found at Wikipedia:Bot Approvals Group/nominations/Minecrafter0271. I am posting here as required. Please consider my request. Cheers! Minecrafter0271 (talk) 00:38, 14 February 2020 (UTC)