Wikipedia talk:Bot policy

Bots that consume user time, and request for comment

Please see Wikipedia:Village_pump_(proposals)/Archive_145#Disable_messages_left_by_InternetArchiveBot. Comments from anyone who reviews the suitability of bots for Wikipedia would be welcome.

Also, I would like to ask if anyone here can point at or describe any previous discussion about a goal that bots avoid consuming user time. In the current text of the bot policy there is a statement that bot edits to wiki text may "clutter page histories, watchlists, and/or the recent changes feed with edits that are not worth the time spent reviewing them". I wish to advocate more strongly that bots not do anything which consumes human time in a way that is not predicted in advance and discussed to get the consent of the community.

The Internet Archive Bot has posted 500,000 messages to talk pages. At the discussion linked above, I estimate that each of these messages lives for several years and consumes 30 seconds on average from multiple users who scroll past it or get exposed to it on the talk page. Almost no one has a meaningful interaction with this messaging and I feel this is time wasted. I do not mind a few messages, but anything scaled into the hundreds of thousands becomes a community wellness issue. 500,000 messages times 30 seconds is about 4,167 hours. I do not think my number is wrong, but even if it is, I would rather advocate for setting norms for how much time anyone's bot can ask of Wikipedia's editors. I feel that unless there is community consensus, the base expectation should be that wiki users should interact with other humans and have minimal exposure to obvious bot solicitation for their time.
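The arithmetic behind this estimate can be sketched as follows; both inputs are my assumptions from above, not measured values:

```python
# Rough estimate of cumulative reader time consumed by mass talk page messages.
# Both figures below are assumptions from the discussion, not measurements.
MESSAGES = 500_000          # talk page messages posted by the bot
SECONDS_PER_MESSAGE = 30    # assumed average reader time per message over its lifetime

total_seconds = MESSAGES * SECONDS_PER_MESSAGE
total_hours = total_seconds / 3600
print(f"{total_hours:,.0f} hours")  # → 4,167 hours
```

Even if the per-message figure is off by an order of magnitude, the total remains in the hundreds of hours.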

To this point I think all of this is innocent and well meaning and without precedent but I wanted to develop the conversation on this topic. Please comment on the village pump discussion above to see a practical case of this. Blue Rasberry (talk) 22:10, 16 February 2018 (UTC)

At the time, IABot had consensus to do so; the BRFA was open for over two months. The bot was also much more prone to errors then. So I don't think this was a mistake back then, but improvements to IABot are making this talk page stuff much less needed in general. As for a discussion, I don't think anything like that exists specifically, at least nothing recent, but see WP:BOTREQUIRE. I'd argue that #3 is being questioned right now. Headbomb {t · c · p · b} 00:07, 17 February 2018 (UTC)
Headbomb Yes, thanks, correct. I am not saying that any mistake happened in the past. In retrospect and for the future, yes, I am advocating that #3, "does not consume resources unnecessarily", take into greater consideration the amount of human time which any bot by design will solicit. I do not know where to draw the line, but perhaps any bot creator should give an estimate of how much wiki community human user time a bot will consume. As an arbitrary starting point, maybe a bot which consumes more than 100 hours over its life from humans who have not consented or opted into interaction with the bot gets mention and consideration. Perhaps anything over 500 hours of non-opt-in human time is a red flag for broader discussion. I am not particular about details here but just wanted to shout out here for future policy consideration. Blue Rasberry (talk) 00:16, 19 February 2018 (UTC)
Frankly, whatever estimate you have there relies on so many assumptions that you could easily be off by two orders of magnitude either way. How much time a bot takes from humans is unmeasurable, and at this point Newton's Flaming Laser Sword kicks in. But even assuming it would somehow be measurable, there are half a million issues with using 'time' as a metric. The real question here is is BOTPOL lacking in some way? I don't see that the answer is yes, given WP:BOTREQUIRE makes it clear bot tasks need consensus, be harmless, and not waste resources. A second question might be is BOTPOL unclear in some way?, and I can't offer a better answer than 'Maybe?'. If your answer to that is yes, then we'd have to know what, specifically, you consider unclear. Headbomb {t · c · p · b} 04:55, 19 February 2018 (UTC)
@Headbomb: We have lots of measurements of this situation. I agree that I can be underestimating. I would like to hear any calculation which you or anyone else could do that makes a case for this being an overestimate, with my base guess being "500,000 messages each consume 1 second". Yes, I agree that the most complete answer could use research but we do have a lot of data. Also, whether we have data or not, all sides make assumptions. What is happening here is new, unusual, and merits discussion.
  1. We know how many messages the bot in this case posted - 500,000+
  2. We know how many times the talk pages where the bot posted were viewed - tens of millions, probably hundreds of millions over 2 years
  3. We know that this particular bot solicited humans to read its messages and react
  4. In response to the bot's solicitation, we have counts of how many times humans responded
  5. We can measure how much time it takes for a human to engage with the bot
  6. We can measure how much time it takes for a human who sees the message for the first time to understand that they should ignore it
  7. We can measure how much time it takes for a human who recognizes the message to ignore it by scrolling past
I feel justified in cornering anyone to talk through how much time a project like this consumes. Does anyone doubt Wikipedia's audience report metrics? Does anyone doubt that people expect text on Wikipedia to be intended for humans to read it?
Is there anyone who is willing to state their disbelief that if we know that a talk page has 50 views over a period of 2 years, then those 50 views constitute the expenditure of at least 1 second of human time? To me this seems obvious and I wonder how anyone could doubt this.
Yes, the bot policy is unclear, because lots of people seem to have the idea that if designers make bots which seek out human time, then somehow, the human time which bots seek is not a resource worth protecting as valuable. I advocate that it is. Bots should not have unregulated access to human user attention, and bots which seek a lot of human time need special consideration, and in general the presumption and norm should be that Wikipedia human users interact with other humans and do not experience calls to engage with bots. Blue Rasberry (talk) 14:30, 20 February 2018 (UTC)
Re 'time' stuff. This can neither be precisely measured, nor compared from bot to bot. So whatever threshold would be decided can neither be enforced nor verified. Going down this avenue is a waste of time, quite literally.
First scenario. Let's say we decide 50000 seconds of human review time per year is 'the limit'. Bibcode bot for instance edits at an average rate of 5000/year. If every edit takes 3 seconds to review, it would be in the clear. If every edit takes 10 seconds to review, then it's at the limit. Whatever limit you come up with yields zero insight on whether or not the task should be carried out, whether it has consensus to operate, or whether it unnecessarily wastes resources, human or otherwise.
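The first scenario above can be sketched as a calculation; the 50,000-second budget and the per-review times are the hypothetical numbers from the comment, and the point is that the result depends entirely on the unmeasurable per-review figure:

```python
# Hypothetical annual 'human review time' budget from the first scenario.
LIMIT_SECONDS = 50_000      # assumed yearly limit, not a real policy figure
EDITS_PER_YEAR = 5_000      # Bibcode bot's approximate average edit rate

for seconds_per_review in (3, 10):
    used = EDITS_PER_YEAR * seconds_per_review
    status = "within limit" if used <= LIMIT_SECONDS else "over limit"
    print(f"{seconds_per_review}s per review -> {used:,}s/year ({status})")
# 3s per review  -> 15,000s/year (within limit)
# 10s per review -> 50,000s/year (within limit, exactly at the cap)
```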
Second scenario, let's say that whatever 'human time' IABot costs is 'the limit'. What happens if we find Cluebot or Citation bot takes more time? Do we disable those bots? Block them? That seems ridiculous. Do we decide they're OK, despite exceeding the acceptable limit? Then why bother having a limit in the first place?
That's why a "human time cost" is neither desirable as a metric, nor is such a limit desirable as policy. And it wouldn't even be enforceable, assuming this would somehow get adopted in policy.
Re the second part.
"Yes, the bot policy is unclear, a) because lots of people seem to have the idea that if designers make bots which seek out human time, then somehow, the human time which bots seek is not a resource worth protecting as valuable. I advocate that it is. b) Bots should not have unregulated access to human user attention, c) and bots which seek a lot of human time need special consideration, d) and in general the presumption and norm should be that Wikipedia human users interact with other humans and do not experience calls to engage with bots."
Concerning a), I have yet to see one editor, bot operator or otherwise, who thinks that. Concerning b) such bots are regulated c) every bot undergoing a BRFA gets special consideration commensurate with the bot's task. See WP:BOTAPPROVAL/WP:BAGG#Guide to BRFAs for how approvals work and how the BAG operates. If you have proposed revisions to that process or BAG procedures, feel free to bring them up. However, you yourself admit that IABot had consensus to operate in the manner it was approved of at the time. This consensus was called into question, and now IABot must operate differently. What exactly is wrong with this picture? You mostly seem to be venting your frustration with IABot in general, rather than making specific policy proposals. d) That depends on the bot. Human-bot interaction is quite desirable at times, and unrequired at other times. Headbomb {t · c · p · b} 15:08, 20 February 2018 (UTC)
@Headbomb: I am not complaining about IABot. I am happy with how it got approval, operated, and then recently changed after discussion. However, in retrospect, it operated in an inappropriate way which bot policy should prohibit in the future. IABot did awesome things and also made a big mess, but the bigger issue is preventing the bad activity in general rather than criticizing IABot in any particular way. IABot is awesome and only deserves praise. The criticism I have is for bot policy in general and not about IABot even though I use it for an example. IABot followed policy and community consensus but following policy led to an unexpected negative outcome which I want to raise, address, and prohibit by policy.
There is a difference between the bots you demonstrate and IABot. My complaint is about bots which communicate and seek engagement in places where previously we have expected human conversation. You are showing bots which communicate in the edit history log, where we already have some precedent of bot editing, and where there is already noise in the log entries, and where there are already regulations in place like a character limit and an inability to create an ongoing conversation.
If I were to make a rule or regulation about norms, I would say that bot communication in the edit history log is fine, but the talk page is for humans and should have a higher bar of entry for bots, and maybe other places where currently there are humans also should be preserved for humans and have a bias against letting bots become routine conversation participants. There are also already regulations which seek to minimize bot edits to articles; for example, no one wants to see more bot edits in an article history when there could be fewer. People might have a goal of achieving fewer bot edits in articles, but there is no one who seeks to design bots which do article editing while maximizing the bot edit count.
Check out your examples, which I think are fine bot activity:
Each of these bots post to the edit log, where we already have norms for certain kinds of bot posting with human posting. I am not overly worried about this kind of posting right now because it is short, I find it comfortable for me as a human reader, and because it is easy enough to ignore.
Here is the kind of bot activity which I find problematic:
Here are the differences between this bad kind of bot activity versus the better activity which you present:
  1. The bad activity is communication in a space where there was a norm for human-to-human conversation. This bot changed the norm from ~95% of messages on talk pages being human to maybe something like 85% of talk messages in the past 2 years being from humans. This is a massive change to user experience and creates an environment where engaging with Wikipedia means humans being directed to engage with bots rather than humans engaging with humans.
  2. This bot makes active appeals for human responses. It says, "Please take a moment to review my edit" and "When you have finished reviewing my changes, you may follow the instructions". By design the bot is seeking peership with human editors and asking for their time, attention, and labor.
  3. This bot posts a lot of text. Whereas the bots you show have a text posting limit of 250 characters and actually use about half that, this bot routinely posts messages of 2000 characters. New users will read this; regular users have to learn the new skill of ignoring these long, ubiquitous bot posts. Even avoiding the messages takes time.
  4. Because of the time this bot solicits and the scale of its operation, tens of thousands of English Wikipedia editors have had the shared cultural experience of spending time reading these bot posts, and thousands of experienced editors now have the shared experience of training themselves to ignore these posts. There probably are not more than 20 people whose Wikipedia activity prioritizes having communication exchanges with this bot. Perhaps there is not even 1 editor who does this among the tens of thousands whom the bot asked. Interaction with this bot has become a part of the Wikipedia experience in terms of its familiarity. We should be cautious about what kinds of experiences we scale up to tens of thousands of editors, especially those designed to solicit lots of time from each. This is unusual in Wikipedia.
  5. These messages are unlikely to provide a high quality user experience. Maybe many people will have no strong opinions about them, but as compared to a talk page post from a human, these automated messages are much lower value on the talk page.
  6. Inherent in the design of talk page messages versus the edit history log there is an appeal for human attention and time. There should be some thoughtfulness about whether the design of a bot interaction seeks to recruit volunteer labor versus being more discreet, opt-in, or providing information briefly and then allowing interested users to click through to get more details.
  7. IABot posts to talk pages where those messages persist for years. The messages have a design to seek and consume human attention until they are archived, and probably 95% of messages do not get archived within 2 years. Each of these messages will consume time from every talk page visitor to whom the messages get published. If 30 people go to a talk page in a year, then I think it is reasonable to assume that the messages will consume a second from each of them even if the users only seek to ignore the message. If I am very wrong, then maybe for 30 viewers somehow the message in total will only consume 1 second. Even 1 second is a lot of time scaled up to 500,000 messages which get published to an average of 30 people each.
I am ready to say that the bot activity you present has a design and intent to consume less human time and IABot has a design and intent to seek more human time. Yes, as you say, human time is not an absolute cutoff for approving or rejecting bots, and we have to balance human time consumed with the value of the bot activity. In the case of IABot the ratio of human time consumed to value is orders of magnitude more than what has been normal in Wikipedia. I am not advocating for a hard cutoff for a certain amount of human time, but if a bot designer makes a choice about whether by design the bot should seek thousands of hours of human engagement versus a few hours, then the bot review policy should favor bots which consume less human time. When bot designers present a bot for review they should first reflect on how much human time they expect their bot to solicit and consume and they should self-report that estimate as part of the application process. They should show their calculation, and say something like "This bot will post 100,000 messages of 150 characters to Wikipedia edit history logs, and maybe humans will read or ignore these messages 100,000 times, consuming at least 50,000 seconds and probably not more than 300,000 seconds." If a bot has a design to seek human attention, then the bot operator should say "This bot will post 2000-character messages to 500,000 talk pages and each message will ask for 3-5 minutes of human labor. By design this bot asks for tens of thousands of hours of human interaction, but I expect that most people will ignore the messaging and I only expect it to consume 1000 hours of human volunteer labor." Self-reporting by bot proposers is a great place to start, just to demonstrate that human time has a value and minimizing human labor costs is a good thing to do, all things being equal. Blue Rasberry (talk) 15:45, 21 February 2018 (UTC)
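A bot operator's self-reported estimate, as proposed above, might be computed like this; all figures are the illustrative numbers from the two hypothetical disclosures, nothing here is measured, and the upper bound assumes every reader responds to the full solicitation, so actual consumption would be much lower if most people ignore the message:

```python
def estimated_hours(messages: int, low_s: float, high_s: float) -> tuple[float, float]:
    """Return (low, high) bounds, in hours, for total human time consumed."""
    return (messages * low_s / 3600, messages * high_s / 3600)

# Edit-summary bot: 100,000 short messages, roughly 0.5-3 seconds each to read or skip.
lo, hi = estimated_hours(100_000, 0.5, 3)
print(f"edit-log bot: {lo:.0f}-{hi:.0f} hours")     # → 14-83 hours

# Talk page bot: 500,000 long messages, each soliciting 3-5 minutes of review.
lo, hi = estimated_hours(500_000, 3 * 60, 5 * 60)
print(f"talk page bot: {lo:,.0f}-{hi:,.0f} hours")  # → 25,000-41,667 hours
```

Even as rough bounds, the two designs differ by roughly three orders of magnitude, which is the kind of contrast a self-reported estimate would surface at review time.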

Again, I reiterate: what changes, specifically, are you seeking to WP:BOTPOL and/or WP:BAGG? You say talk page messages should have a 'higher bar for entry', but are happy with how things with IABot went. So if IABot is not an example of something that needs to be addressed by WP:BOTPOL, what would be an example of it? Newsletter delivery bots? DPL bot? Because right now it really seems you're seeking a solution to a problem that either doesn't exist or can't be defined, or are contradicting yourself, because you say you're happy with how IABot was handled, but seek a change to BOTPOL to prevent things like IABot from happening, even though it had consensus. Headbomb {t · c · p · b} 16:33, 21 February 2018 (UTC)

@Headbomb: I do not want to criticize IABot in retrospect or find fault with anything done in the past. However, knowing what we know now, if anyone proposed anything like IABot then I would criticize it heavily and recommend that it have a prohibition against talk page messages. The outcome was not something we could have predicted but now that we know, yes, the design is a big problem. I do not wish to attack IABot because I like it and I like that it recently quit posting these messages, which means that it has always operated with the best available guidance. Now we have better information about what it means to post a huge amount of content and now I would not want that to happen again so casually.
I say the same about newsletter bots - they should post to hundreds of places typically and thousands of places sometimes. I hope that no newsletter operates at the level of hundreds of thousands without a human opt-in. No newsletter should go to the talk pages of any category of 100,000+ wiki articles. Consent matters more when a process becomes more interventional.
I might communicate poorly and I might continue to fail to communicate effectively. You say that the problem cannot be defined, but everyone imagines a default. I could start this way - can you at least say how much human time you think is consumed if IABot or anything else posts 500,000 messages to talk pages? You say this is hard to measure, but is your best guess that this consumes 0 human time, some positive amount of time (I suggested 1 second per message), or some negative amount of time (like maybe posting messages somehow gives more time to people)? You say that this cannot be measured, so you must have in your mind some default time measurement. Probably that is 0: you imagine that since there is no number, you go forward with evaluations imagining that the messaging consumes no time at all.
The value of the activity is another issue, and of course in the end a bot evaluation compares the value of the bot to the cost of operating it. I am not considering all aspects of bots at this time, and not considering the activity benefits. Right now I am only raising the issue of measuring and reporting bot costs. My argument is that a bot posting 500,000 messages to talk pages has a cost in human time consumed. If I understand you correctly, you are taking the position that posting 500,000 messages will have no cost in human time.
I have trouble understanding you but that fault is mine, and also I take the blame for failing to make my own position understandable. I neither expect you to understand me nor do I expect you to agree, but I am trying here and I appreciate the talk. Thanks. Blue Rasberry (talk) 17:32, 22 February 2018 (UTC)

I find costs on Wikipedia to always be interesting discussions, but really hard to practically quantify. One thing to measure is the cost of a reader clicking on a replaced link that doesn't lead to the intended target due to an IABot false positive. Is that cost worth leaving a talk page message for further review? Now what's the cost of not leaving those messages? What's the cost of having Cyberpower participate in these discussions when they could be working on other things? And so on. I can see where you're coming from, but I'm not really sure how this could be integrated into the bot policy.

PS: I'd suggest investing in a better keyboard and mouse if it takes you 30 seconds to scroll past messages. Legoktm (talk) 20:34, 21 February 2018 (UTC)

@Legoktm: Suppose that there is a 2000-character talk page message posted and persisting on a talk page for 2 years, and in that time the wiki pageview metrics report that 100 people have come to this message. May I ask if you would be willing to challenge my time estimate that such a message will consume at least 30 seconds over that time period, imagining that some people will scroll past it, some people will read it, and maybe some people will follow its instructions requesting human attention? The 30 seconds is not for me - it is average time consumed over the life of the advertising message. I am not contesting the value of what IABot does as its primary activity, but only saying that its posting 500,000 long messages should have been a cost consideration in its design and should be a cost consideration for future, similar bots which might post to a talk page.
I do not wish to press you into an argument, but I do wish that I could improve my own failure in communication and prevent you from misunderstanding that I am complaining about the inconvenience of scrolling for the sake of myself. Any message multiplied by 500,000 is new to wiki culture and is on a scale which we have no culture of understanding or discussing. Any process which has a design to consume time 500,000 times adds up to a major time sink. I do not object to the activity; I only object to the inherent design to consume much more time than the activity required, and that when there was a choice to consume more time or less time, the choice made was to consume more on the presumption that human time is not a cost worth considering. Blue Rasberry (talk) 17:32, 22 February 2018 (UTC)

Mass page moves

I've made a bold edit. If it is controversial, it can be discussed below. E to the Pi times i (talk | contribs) 19:45, 7 April 2018 (UTC)

Reverted; while mass page moves are covered by WP:BOTPOL, WP:MASSCREATION does not apply to that. Headbomb {t · c · p · b} 23:53, 7 April 2018 (UTC)
@Headbomb: Um, WP:BOTPOL doesn't currently say anything about mass page moves, so I don't see how "mass page moves are covered by WP:BOTPOL".
Unless you mean implicitly, but policy should be explicit, and anyway, mass page moves technically count as mass article creation if the new pages are being created with the page moves (which is all page moves for non-admin accounts).
I note in your edit summary you said "while mass page moves need a BRFA, this is not the policy section to cover that". Where would you suggest covering it? Mass page creation seems like the most appropriate section to cover it to me, since it's not covered anywhere else in the page, and it's not content heavy enough for its own section. E to the Pi times i (talk | contribs) 03:54, 8 April 2018 (UTC)
@E to the Pi times i: It's implicitly covered by WP:BOTREQUIRE and WP:BOTAPPROVAL. Bot policy doesn't need to be explicit about every specific type of edit made. For example, we don't specifically list adding/removing/updating WikiProject banners as something needing approval, even though it's one of the most common types of bots out there. The WP:MASSCREATION restriction exists because it creates massive cleanup backlogs when things are done sloppily, including dozens if not hundreds of PRODs/AfDs if created on non-notable entries. Bad page moves are very easy to undo, without the need for admin intervention in most cases.
But more to the point, before additions are made to the bot policy (a WP:MASSMOVE section), there has to be a need for those additions. So the following things need to be addressed (IMO) before an addition is made to the bot policy:
  1. Do you have examples of problematic mass page moves, especially problematic script-assisted page moves?
  2. Does preventing mass moves at a policy level solve more problems than it creates?
  3. Is there consensus for such additional restrictions (e.g. need for a WP:VP discussion before moving pages)?
Headbomb {t · c · p · b} 12:17, 8 April 2018 (UTC)
My concern comes more from a desire for personal clarity in reading the policy than from an overarching policy perspective, so my changes may be in error. I would suggest a revised version: "The restriction also applies to mass page moves when the moves create pages.", but I think your concerns may be correct when it comes to not having a consensus or need for explicitly regulating these changes in this way.
Also, I moved the anchor because it causes problems with edit summary links, as can be seen in my most recent edit summary on this page. E to the Pi times i (talk | contribs) 15:04, 8 April 2018 (UTC)

Creation of redirects

I recently used AWB to create about 50 uncontroversial redirects – see here. Each one was manually approved. Out of a nagging sense of paranoia, I just checked the bot policy and saw WP:MASSCREATION. Are uncontroversial redirects covered by that policy? If so, mea culpa – my apologies, and I will not do this again. Best, Kevin (aka L235 · t · c) 00:06, 17 April 2018 (UTC)

Redirects are not technically considered "articles" (although they are in the "article namespace"), so you're in the clear from the standpoint of a literal interpretation of the policy. And more importantly, that is not the kind of thing the policy was created with in mind, so I wouldn't worry about it if I were you. - Kingpin13 (talk) 00:24, 17 April 2018 (UTC)
Thanks, Kingpin13 – much appreciated. Kevin (aka L235 · t · c) 01:53, 17 April 2018 (UTC)
Gonna say the letter of WP:MASSCREATION is perhaps not super clear here, but I agree with Kingpin13 about the intent of WP:MASSCREATION: it's about articles. Sufficiently large redirect creation might still fall under WP:BOTPOL, and would certainly fall under WP:CONSENSUS, but that'd be like any other sort of editing out there. Headbomb {t · c · p · b} 20:14, 26 May 2018 (UTC)

RFC to ease introduction of citation templates to articles not presently using them

Please see Wikipedia talk:Citing sources#RfC: Remove the bullet point that starts "adding citation templates..."

The discussion is relevant to this policy because even though the RFC has been around for less than a day, there have been several mentions of mass changes. Naturally mass changes are a phenomenon associated with bots. Jc3s5h (talk) 17:12, 4 July 2018 (UTC)