Wikipedia:Bots/Noticeboard


This is a message board for coordinating and discussing bot-related issues on Wikipedia (also including other programs interacting with the MediaWiki software). Although this page is frequented mainly by bot owners, any user is welcome to leave a message or join the discussion here.

If you want to report an issue or bug with a specific bot, follow the steps outlined in WP:BOTISSUE first. This is not the place for requests for bot approvals or requesting that tasks be done by a bot. General questions about the MediaWiki software (such as the use of templates, etc.) should be asked at Wikipedia:Village pump (technical).


Cluebot reversion of good-faith edits

I tried to start a discussion regarding Cluebot on the Cluebot talk page and my comments were archived by the bot without response. I'm concerned about Cluebot reverting good-faith edits, and the effect this may have on potential contributors.

Reading through the Cluebot pages and considering the lack of response, and rapid archiving, of my comment -- it is my feeling that discussions of this nature are not welcomed by the bot operator. It seems to me that the wider community ought to have a voice in how Cluebot is operated and should be entitled to review Cluebot's work on an ongoing basis and discuss the bot's settings and edits without having to fill out forms and have the discussion fragmented. I am concerned that the characterization of the 0.1% "false positive rate" used by the bot's proponents, though useful technically, belies the substantial number of good-faith edits this bot is reverting. Since it has been some years since the bot was approved, I think it's appropriate to review the work it is doing in light of the current editing climate and the evolution of the bot itself (and its settings) over the years.

At a minimum, I believe that the bot's operators and proponents have an obligation to take these concerns seriously enough to discuss them.

While mistaken reverts can be undone, the frustration they may cause to a well-meaning, fledgling contributor cannot.

The Uninvited Co., Inc. 19:52, 3 November 2017 (UTC)

Seems Cobi (talk · contribs), the bot's owner, hasn't edited since July 2017. Someone may want to send him an email. Headbomb {t · c · p · b} 20:17, 3 November 2017 (UTC)
In the meantime, did you report the false positive? Headbomb {t · c · p · b} 20:19, 3 November 2017 (UTC)
(edit conflict × 2) @UninvitedCompany: There is a notice on that page that says to report false positives at User:ClueBot NG/FalsePositives and not on that page (this is also in every edit summary for the bot). That's how they track issues and make improvements to the coding of the bot. I see no reason to create a protracted discussion. Nihlus 20:20, 3 November 2017 (UTC)

To answer your two specific questions:

How have the decisions been made over what edits the bot will revert?

— The Uninvited Co., Inc.
The bot uses an artificial neural network to score each edit, and the bot reverts at a threshold calculated to be less than 0.1% false positives. See User:ClueBot NG#Vandalism Detection Algorithm, User:ClueBot NG/FAQ#Why did ClueBot NG classify this edit as vandalism or constructive?, User:ClueBot NG/FAQ#I think ClueBot NG has too many false positives. What do I do about it?.
[Image: ClueBot NG Edit Flow.png]

What is the best way to have an open discussion about the way this automation is being conducted and its effect on new contributors?

— The Uninvited Co., Inc.
By giving specific, actionable suggestions whose merits can be discussed and the community can come to a consensus.

-- Cobi(t|c|b) 23:03, 3 November 2017 (UTC)

To reply to your comments here:

I tried to start a discussion regarding Cluebot on the Cluebot talk page and my comments were archived by the bot without response. I'm concerned about Cluebot reverting good-faith edits, and the effect this may have on potential contributors.

— The Uninvited Co., Inc.

False positives are an unfortunate technical inevitability in any system that automatically categorizes user content. Human editors suffer from this failing as well. The only thing that can be done is to figure out where the trade-off should be made. I am certainly open to discussing where that trade-off is, but as you haven't made a proposal yet, I am happy with where it currently is.

Reading through the Cluebot pages and considering the lack of response, and rapid archiving, of my comment

— The Uninvited Co., Inc.

It's the same 7 day archival period you have on your talk page. I was busy and your message at the time didn't appear particularly urgent in nature, and in the 7 days no one else had any thoughts on the matter and so the bot archived it.

it is my feeling that discussions of this nature are not welcomed by the bot operator.

— The Uninvited Co., Inc.

This is a hasty generalization.

It seems to me that the wider community ought to have a voice in how Cluebot is operated and should be entitled to review Cluebot's work on an ongoing basis and discuss the bot's settings and edits without having to fill out forms and have the discussion fragmented.

— The Uninvited Co., Inc.

Free-form discussion is encouraged on the bot's talk page. Or here.

I am concerned that the characterization of the 0.1% "false positive rate" used by the bot's proponents, though useful technically, belies the substantial number of good-faith edits this bot is reverting.

— The Uninvited Co., Inc.

False positive rates are used as standard metrics for any kind of automated classification system. <0.1% means less than one edit is falsely categorized as vandalism out of every thousand edits it examines.

Since it has been some years since the bot was approved, I think it's appropriate to review the work it is doing in light of the current editing climate and the evolution of the bot itself (and its settings) over the years.

— The Uninvited Co., Inc.

Review is always welcome so long as it comes with concrete, actionable changes of which the merits can be properly discussed. Pull requests are even better.

At a minimum, I believe that the bot's operators and proponents have an obligation to take these concerns seriously enough to discuss them.

— The Uninvited Co., Inc.

We do.

While mistaken reverts can be undone, the frustration they may cause to a well-meaning, fledgling contributor cannot.

— The Uninvited Co., Inc.

Of course, but that is hard to measure objectively. Do you have any good metrics on the frustration caused to well-meaning, fledgling contributors? I'd love to see that data, and be able to tweak things to help those metrics go in the direction we want. -- Cobi(t|c|b) 23:39, 3 November 2017 (UTC)

I sense an attitude that the bot is essentially part of "settled policy" and the burden of change falls upon the shoulders of those individuals raising concerns. I don't think that's appropriate for any bot, let alone one that is so prolific, wide-ranging, and discretionary in what it does. I don't see where there has ever been any informed consent by the editing community at large that the tradeoffs made in the design of the bot are appropriate, let alone any ongoing discussion as the bot has evolved.
In response to your question, I did report the edit using the interface provided.
The fact that the "false positive rate" is a standard metric for systems with similar architecture does not mean that it is the most appropriate or only metric that should be used in community discussion of the bot's performance. I think it would be valuable for the community and the bot operators/designers alike to be aware of other metrics such as the number of good-faith edits reverted by the bot per unit time. It would be interesting to see whether that figure matches the projection one might make using the theoretical false positive rate and the gross reverts per unit time. The Uninvited Co., Inc. 18:01, 6 November 2017 (UTC)
Absolute numbers are not useful, that's why we discuss error rate, which includes both false positives and false negatives. Your discussion does not include the latter. There is a balance between reverting too many valid edits versus leaving too many bad edits. Hypothetically, if 10 in every 1000 reverts over some time period are false positives and we up the threshold and bring it down to 2 in 500 reverts over the same time period, then that is 3 good edits preserved but also 500 more vandal edits that someone has to manually review and revert. Who does the burden of reverting these edits fall upon? Where is the line between potentially trading a new editor versus exhausting multiple anti-vandalism editors? What if we instead lowered the threshold and got 30 in 2000 false positives, and thus were fixing 100% more vandalism? This is a system where (broadly speaking) lowering false positives also ups the false negatives. —  HELLKNOWZ  ▎TALK 18:22, 6 November 2017 (UTC)
We could...you know...disable IP editing altogether and force account creation. Less vandalism, and opportunity for such. Better false positive and false negative rates as well. :p—CYBERPOWER (Chat) 18:47, 6 November 2017 (UTC)
I think this would be a much better discussion if we actually had such metrics. I believe the absolute number is a good indicator of the extent of the problem even if it isn't relevant technically. And I believe it is relevant technically, because it indicates the amount of potential improvement that could be achieved by refining the parts of the bot outside the Bayesian filter. A careful review of reverted good-faith edits might, for example, reveal some obvious patterns that could be used to tweak the filter threshold, or the logic around it. The Uninvited Co., Inc. 01:06, 7 November 2017 (UTC)
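The hypothetical figures in the trade-off argument above can be laid side by side. The following is only a sketch of that arithmetic: the three scenarios are the made-up numbers from the comment, the scenario names are labels chosen here, and the "share of reverts that were mistaken" it computes is the mistaken fraction of reverts, which is not the same quantity as the false positive rate measured over all constructive edits.

```python
from fractions import Fraction

# Hypothetical (false_positives, total_reverts) pairs taken from the comment above.
# The scenario names are illustrative labels, not terms used by the bot.
SCENARIOS = {
    "current": (10, 1000),
    "higher threshold": (2, 500),
    "lower threshold": (30, 2000),
}

def summarize(false_positives, total_reverts):
    """Return (share of reverts that were mistaken, vandal edits correctly reverted)."""
    share = Fraction(false_positives, total_reverts)
    true_positives = total_reverts - false_positives
    return share, true_positives

if __name__ == "__main__":
    for name, (fp, total) in SCENARIOS.items():
        share, tp = summarize(fp, total)
        print(f"{name}: {float(share):.1%} of reverts mistaken, {tp} vandal edits reverted")
```

In these made-up numbers, raising the threshold lowers the mistaken share but also leaves far fewer vandal edits reverted, which is the balance the comment describes.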
  • Definitions are everything -- The assertion is made: "<0.1% means less than one edit is falsely categorized as vandalism out of every thousand edits it examines."
    No that's not what it means. It means less than one in a thousand is observed by a human editor as incorrectly categorized, who then follows the not-so-simple process to report it. For those pages no one follows, most of ClueBot's activities are unmonitored. Rhadow (talk) 15:34, 12 November 2017 (UTC)
    Yes, definitions are everything. We don't calculate that number based on reports. That number is calculated by dividing the training data randomly in half and giving half of the training data to the engine to train it, and then giving the rest of the training data to it as if they were live edits. It has to categorize them correctly with a false positive rate of less than 0.1%. That is, for every 1,000 edits we feed it for testing, only one can be a false positive. And this is just the core engine, before any sanity checks like the rest of that diagram after the "above threshold" box. See this FAQ entry. Please don't make uninformed assertions without doing at least a little bit of research. Nowhere have we ever said that the false positive rate is based on reported false positives, and asserting it like you know it as fact is not an appropriate way of bringing up questions or theories. Neither is it appropriate to assert as true that my factual statements, backed up by process and code that are both publicly review-able, are definitively wrong. -- Cobi(t|c|b) 22:20, 12 November 2017 (UTC)
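The split-half procedure described above can be sketched roughly as follows. This is not ClueBot NG's code: the function names are hypothetical, the training step itself is omitted, and `scored` stands in for the held-out half after some already-trained model has assigned each edit a score. The sketch only shows how a revert threshold could be picked so that the held-out false positive rate stays under a bound such as 0.1%.

```python
import random

def split_half(examples, seed=0):
    """Randomly divide labelled examples into a training half and a held-out half."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

def false_positive_rate(scored, threshold):
    """Fraction of constructive edits whose score reaches the threshold.

    `scored` is a list of (score, is_vandalism) pairs from the held-out half.
    """
    constructive = [score for score, is_vandalism in scored if not is_vandalism]
    if not constructive:
        return 0.0
    flagged = sum(1 for score in constructive if score >= threshold)
    return flagged / len(constructive)

def calibrate_threshold(scored, max_fpr=0.001):
    """Lowest observed score usable as a threshold with held-out FPR below max_fpr."""
    for candidate in sorted({score for score, _ in scored}):
        if false_positive_rate(scored, candidate) < max_fpr:
            return candidate
    return float("inf")  # no observed score is strict enough, so nothing is reverted
```

Because the false positive rate can only fall as the threshold rises, the first candidate that meets the bound is also the one that catches the most vandalism under that bound.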
  • Thank you Cobi, for sending us to the definition of the published false positive rate (FPR). This is a second-semester epidemiology statistics exercise, made slightly more complicated by the third-semester definition of training sets used in AI. Publishing a false positive rate (Type I errors) from the training exercise is incomplete if not misleading. It would be more informative to see the whole confusion matrix. ClueBot uses a neural network which, unlike other classification methods, may give superior numeric results, but may never provide an explanation of how it identified a vandal's edit. An outsider needs the whole picture of the results in order to have the same level of confidence you do.
    People would have a higher level of confidence in the protocol if they knew the size and the age of the training set. If the training set is not a valid sample of today's production data, then the 0.1% FPR is meaningless. I would like to see the rate of reported false positives each week or month from the actual data, not what the expected rate was from the training set. Rhadow (talk) 15:18, 13 November 2017 (UTC)
    All of the data is available either on the report website or on the Wikipedia API itself. You are welcome to generate any statistics you like. -- Cobi(t|c|b) 18:33, 13 November 2017 (UTC)
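For anyone who does want to generate such statistics, a full confusion matrix is straightforward to tabulate once each edit has both the bot's decision and a human-verified label. The sketch below assumes that data has already been gathered into boolean pairs; how it is collected from the report site or the API is not shown, and the function names are hypothetical.

```python
from collections import Counter

def confusion_matrix(pairs):
    """Count outcomes from (predicted_vandalism, actually_vandalism) boolean pairs."""
    counts = Counter()
    for predicted, actual in pairs:
        if predicted and actual:
            counts["TP"] += 1  # vandalism, reverted
        elif predicted and not actual:
            counts["FP"] += 1  # constructive edit, reverted by mistake
        elif not predicted and actual:
            counts["FN"] += 1  # vandalism, missed
        else:
            counts["TN"] += 1  # constructive edit, left alone
    return {key: counts[key] for key in ("TP", "FP", "TN", "FN")}

def error_rates(cm):
    """False positive and false negative rates derived from the matrix."""
    constructive = cm["FP"] + cm["TN"]
    vandalism = cm["TP"] + cm["FN"]
    return {
        "false_positive_rate": cm["FP"] / constructive if constructive else 0.0,
        "false_negative_rate": cm["FN"] / vandalism if vandalism else 0.0,
    }
```

Reporting both rates together shows the Type I and Type II errors side by side, which is the fuller picture requested above.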
  • Hello The Uninvited -- You are correct, ClueBot III cleans up the ClueBot talk page frequently, so that a casual visitor will find no evidence of complaints.
    And another observation -- the 0.1% denominator means nothing without a discussion of the numerator. There were 3.3 million edits last month. Of those, it looks like ClueBot makes about 30 revisions an hour or 21,000 a month. I rather doubt there are editors looking at 21,000 reversions a month. No more than 210 miscategorized articles are being reported a month. The more ClueBot does, the better the numbers look, because there are no humans to check on it. Rhadow (talk) 15:58, 12 November 2017 (UTC)
    Before talking about calculations, please get your definitions correct. The archival settings for User talk:ClueBot Commons are set to 7 days, a common setting for user talk pages. The archives are there for anyone who wishes to look into the archives, and I am certainly open to anyone who wants to revisit discussions that were archived too soon to do so. Just, if you do so, add something to the conversation, because otherwise there is no value in pulling it from the archives. -- Cobi(t|c|b) 22:32, 12 November 2017 (UTC)
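The counts quoted in this exchange depend heavily on which population the 0.1% bound is applied to. The sketch below only redoes that arithmetic with the figures mentioned in the thread; nothing here is measured, and applying the bound to every edit made in a month is an assumption made purely for illustration, since not every edit is necessarily examined by the bot.

```python
# Figures quoted in the exchange above; none of them are measured here.
REVERTS_PER_HOUR = 30
EDITS_LAST_MONTH = 3_300_000
FALSE_POSITIVE_BOUND = 0.001  # the published "<0.1%" bound

def monthly_reverts(per_hour, days=30):
    """Reverts in a month of `days` days at a constant hourly pace."""
    return per_hour * 24 * days

def expected_false_positives(rate, population):
    """Rounded upper-bound count if `rate` applied to every item in `population`."""
    return round(rate * population)

if __name__ == "__main__":
    reverts = monthly_reverts(REVERTS_PER_HOUR)
    print("reverts per month:", reverts)
    print("bound applied to reverts:", expected_false_positives(FALSE_POSITIVE_BOUND, reverts))
    print("bound applied to all edits:", expected_false_positives(FALSE_POSITIVE_BOUND, EDITS_LAST_MONTH))
```

The two results differ by more than two orders of magnitude, so any comparison against reported false positives needs to state its denominator first.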
  • Is this report based on a single diff (23 October 2017) of ClueBot making a revert of an edit that a human might judge to be good faith, and so would merely click "undo" rather than "rollback"? The most important editor at Wikipedia is ClueBot because reverting vandalism quickly is key to convincing vandals that their time would be better spent at other websites. The most important person at Wikipedia is Cobi, ClueBot's maintainer. I agree that ClueBot's talk is archived too aggressively but some more generic discussion (WP:VPMISC?) about ClueBot's possible mistakes should occur rather than insisting that Cobi personally respond to each complaint. It is impossible for a bot to revert vandalism without occasionally reverting good-faith edits. Experience shows that it is also impossible for humans. Johnuniq (talk) 21:55, 12 November 2017 (UTC)
  • I'm with Cobi here. It's par for the course when users that are clueless about how bots work, or the work that goes into them, come up demanding the bot to be perfect, but sometimes I really scratch my head when someone persists/piles on with no knowledge of said topic. Bots are never flawless, neither are humans, getting things right is all about balance. Just like ClueBot NG, it's similar for me with User:InternetArchiveBot.—CYBERPOWER (Around) 02:29, 13 November 2017 (UTC)
Seconded. This bot is very useful with false positives within acceptable range. Humans are also there to correct its errors. —PaleoNeonate – 07:11, 13 November 2017 (UTC)
(Off-topic) People seem to demand perfection for everything and get annoyed when there's a problem. Today the PRESTO card system was experiencing some difficulties and I see people dumping on the system on Twitter saying it has "nothing but problems" when in reality it works fine 99% of the time. Sounds similar to some of the nonsense I've seen on Wikipedia over the years about CBNG (e.g. "ClueBot is clueless" and what other creatively thought-of insults for the bot that has clearly been a WP:NETPOSITIVE, if we looked at bots that way). SMH. —k6ka 🍁 (Talk · Contributions) 01:15, 17 November 2017 (UTC)

Residual issues resulting from the Maintenance script bot's edits in 2015

See this discussion on Meta. Old/invalid accounts were renamed & given new "enwiki" names by the Maintenance script bot but the original accounts apparently weren't closed & account info wasn't migrated to the new/valid accounts... So. Editors are continuing to edit under the old/invalid accounts. Shearonink (talk) 16:30, 9 November 2017 (UTC)

Not sure this is a BOTN issue, especially since it's being dealt with at meta. Primefac (talk) 16:34, 9 November 2017 (UTC)
(edit conflict) No comment here. I was just trying to bring it to someone's attention. I've topic banned myself from BOTN for CIR reasons. GMGtalk 16:34, 9 November 2017 (UTC)
Yes, this probably isn't the completely correct place for a notice about it - I admit I don't operate bots, etc. - but it is an ongoing issue affecting Wikipedia-editing today so I thought it might need some more eyes on it. Could people have two Wikipedia accounts - both the original account that was renamed and the new account - and possibly be editing from both? Anyway, I'll wait for an answer on meta then. Shearonink (talk) 16:48, 9 November 2017 (UTC)

Appeal by Δ (BetaCommand)

The community is invited to comment on the appeal lodged by Δ at Arbitration Requests for Clarification and Amendment.

For the arbitration committee - GoldenRing (talk) 11:13, 18 November 2017 (UTC)

Double-redirect tagging

While the discussion at Wikipedia talk:Double redirects#The bots should operate with a delay has pretty much died down without clear consensus, there's been a suggestion that double-redirect-fixing bots should tag the redirects they fix with {{R avoided double redirect}}. This will help alert human editors to redirects that are left pointing to the wrong location as a result of disputed moves or mergers being reverted. Can this be implemented? Pinging bot operators R'n'B, Xqt and Avicennasis. --Paul_012 (talk) 10:12, 21 November 2017 (UTC)

I propose to file a bug at phabricator that this proposal could be implemented in the redirect.py script of the common pywikibot repository.  @xqt 11:49, 21 November 2017 (UTC)
I would certainly oppose a bot adding {{R avoided double redirect}}. Move a page like Proceedings of the Royal Society, and then you'd have 48 redirects tagged with that for no real reason.Headbomb {t · c · p · b} 12:31, 21 November 2017 (UTC)
What if limited to redirects which aren't the result of page moves? My original concern was mostly with pages that were changed into redirects and then reverted. --Paul_012 (talk) 23:51, 21 November 2017 (UTC)
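As a rough illustration of what the proposal in this thread would involve, the sketch below retargets a redirect and appends the tag only when the double redirect did not come from a page move. It works on plain wikitext and is not taken from pywikibot's redirect.py; the function name, the `from_page_move` flag, and the use of the template's first parameter for the original target are all assumptions made for the example.

```python
import re

# Matches the target of a redirect line such as "#REDIRECT [[Some title]]".
REDIRECT_RE = re.compile(r"^\s*#REDIRECT\s*\[\[([^\]|#]+)", re.IGNORECASE)
TAG = "R avoided double redirect"

def fix_double_redirect(wikitext, new_target, from_page_move):
    """Point a redirect at new_target; tag it unless a page move caused the change.

    Hypothetical helper: the caller is assumed to know whether a move was involved.
    """
    match = REDIRECT_RE.match(wikitext)
    if match is None:
        return wikitext  # not a redirect, leave untouched
    old_target = match.group(1).strip()
    fixed = wikitext[:match.start(1)] + new_target + wikitext[match.end(1):]
    if from_page_move or TAG.lower() in fixed.lower():
        return fixed  # skip tagging for moves and for already-tagged redirects
    return fixed.rstrip("\n") + "\n{{" + TAG + "|" + old_target + "}}\n"
```

Recording the old target in the tag is what would let a human editor spot a redirect that should revert once a disputed merger is undone.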

ARBCOM on Wikidata

See Wikipedia talk:Arbitration/Requests#Crosswiki issues: Motion (November 2017). This will be relevant both to WP:BAG members and Wikidata-related bot operators. Headbomb {t · c · p · b} 00:37, 28 November 2017 (UTC)

How will that affect me? I’m a little confused here.—CYBERPOWER (Around) 01:56, 28 November 2017 (UTC)
Looks like the immediate impact would be to BAG, from (C) - task approvals for any large tasks involving wikidata integration will need an enhanced level of community support and wider advertisement for input. — xaosflux Talk 02:32, 28 November 2017 (UTC)
Given how controversial Wikidata integration has historically been, I'd say ArbCom's motion shouldn't make much difference there since we should already have been demanding strong consensus. Anomie 21:13, 30 November 2017 (UTC)
I would recommend placing all Wikidata-related bot requests on hold if filed after the time of the motion. Until we have a larger RfC result, these changes are extremely controversial and wide-scale changes are explicitly discouraged by the motion. ~ Rob13Talk 14:29, 28 November 2017 (UTC)

Bot causing multi colon escape lint error

There are now 8,218 lint errors of type Multi colon escape, and all but 7 of these are caused by WP 1.0 bot. This bug was reported at Wikipedia talk:Version 1.0 Editorial Team/Index#Bot adding double colons 9 October 2017. Perhaps some bot experts who don't typically wander in those parts can apply their skills to the problem. Please continue the discussion there, not here. —Anomalocaris (talk) 06:14, 28 November 2017 (UTC)

Didn't Nihlus already deal with all of these? Primefac (talk) 13:08, 28 November 2017 (UTC)
NihlusBOT 5 is a monthly task to fix the problems with the 1.0 bot until such time as the 1.0 bot is fixed. --Izno (talk) 14:16, 28 November 2017 (UTC)
Correct. I've been traveling lately, so I wasn't able to run it. I am running it now and will let you know when it is done. Nihlus 14:33, 28 November 2017 (UTC)
  • Please see User talk:Nihlus#1.0 log. The problem is that to fix this properly afterward will be more difficult, unless the 1.0 bot, or another task, is done to retroactively rewrite the log page (not only update new entries but correct old ones). —PaleoNeonate – 15:50, 28 November 2017 (UTC)
Basically, I don't have the time, but I would need to properly fix the log myself today. It's simpler to just revert and fix it properly when I can. —PaleoNeonate – 15:51, 28 November 2017 (UTC)
@PaleoNeonate: Why are you having this discussion in two separate places? I addressed the issue on my talk page. Nihlus 15:55, 28 November 2017 (UTC)

I thought that this may be a more appropriate place considering that it's about 1.0 bot and Nihlusbot, so will resume it here. Your answer did not address the problem. Do you understand that:

  • Before October 5, 2017, category links were fine, but then later were broken, resulting in the same kind of bogus double-colon links as for drafts (these were not mainspace links, but Category: space links)
  • It's possible that draft links were always broken, resulting in the same kind of broken double-colon links
  • Nihlusbot causes both broken category and draft space links to become mainspace links (not Draft: or Category: ones as it should)
  • As a result, the "fix" does not improve the situation, the links are still broken (mainspace red links instead of category and draft links).
  • If keeping these changes and wanting to fix them later, it's more difficult to detect what links were not supposed to be to main space. In any case, to fix it properly, a more fancy script is needed which checks the class of the page...

Thanks, —PaleoNeonate – 23:31, 28 November 2017 (UTC)

Do I understand? Yes, yes, this time it did due to a small extra bit in the code, disagree as stated already, this is something I am working on. Thanks! Nihlus 00:27, 29 November 2017 (UTC)
So, there are issues with almost every single namespace outside of articlespace, so WP 1.0 bot is making a lot of errors and should probably be prevented from continuing. However, until that time, I am limiting the corrections I am making to those that are explicitly assessed as Category/Template/Book/Draft/File-class. If they are classed incorrectly, then they will not get fixed. Nihlus 01:52, 29 November 2017 (UTC)
A few hours ago, there were just 6 Multi colon escape lint errors. Now we have 125, all but 4 caused by WP 1.0 bot. This may be known to those working on the problem. —Anomalocaris (talk) 06:02, 29 November 2017 (UTC)
@Nihlus: thanks for improving the situation. I see that Category links have been fixed (at least the ones I noticed). Unfortunately links to drafts remain to mainspace. —PaleoNeonate – 19:54, 29 November 2017 (UTC)
@PaleoNeonate: As stated above: I am limiting the corrections I am making to those that are explicitly assessed as Category/Template/Book/Draft/File-class. If they are classed incorrectly, then they will not get fixed. Nihlus 19:55, 29 November 2017 (UTC)
Yes I have read it, but unfortunately contest the value of such hackish edits in 1.0 logs. Perhaps at least don't just convert those to non-working mainspace links when the class is unavailable, marking them so they are known not to be in mainspace (those double-colon items never were in mainspace)? A marker, or even a non-linked title would be a good choice to keep the distinction... —PaleoNeonate – 20:48, 29 November 2017 (UTC)
Again, I repeat: I am limiting the corrections I am making to those that are explicitly assessed as Category/Template/Book/Draft/File-class. If they are classed incorrectly, then they will not get fixed. That means those are the only fixes I am making with the bot going forward as I have no intention of supervising each edit made to discern whether something is a draft/project page or not. Nihlus 20:56, 29 November 2017 (UTC)
"I am limiting the corrections I am making to those that are explicitly assessed as Category/Template/Book/Draft/File-class. If they are classed incorrectly, then they will not get fixed." We appear to talk past each other. That is not what technically happened. This diff (which you reverted) was made because links to mainspace were introduced for pages not in mainspace. If your script doesn't touch such links in the future when it cannot determine their class, that's an improvement. You say that you don't correct them, but so far they were still "fixed" (converted to erroneous mainspace links). The "loss of information" from my first complaint was that those bogus links were previously unambiguously recognizable as non-mainspace (those that are now confusing, broken mainspace links when the class is not in the text). —PaleoNeonate – 05:27, 1 December 2017 (UTC)
  • Has the bot operator been contacted or responded to this issue?—CYBERPOWER (Merry Christmas) 02:18, 1 December 2017 (UTC)
    @Cyberpower678: From what I understand, there have been multiple attempts at making contact with them. To be thorough, I have emailed Kelson, Theopolisme, and Wolfgang42 in an attempt to get a response and solution. Nihlus 04:36, 1 December 2017 (UTC)
    I have blocked the bot until these issues are resolved.—CYBERPOWER (Merry Christmas) 05:09, 1 December 2017 (UTC)
    Thanks. I will hold off on any task 5 edits with my bot. Nihlus 05:39, 1 December 2017 (UTC)
    You should note that per this post Wolfgang42 cannot do anything for us and I recall that Theopolisme is currently unable to devote time to this, so Kelson seems to be the only one who can assist right now. As I've mentioned a number of times in the last 2 years we really need to find someone with the skills to maintain this. ww2censor (talk) 12:14, 3 December 2017 (UTC)
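The cautious approach argued over in this thread (repair a double-colon link only when the assessed class identifies the namespace, and otherwise leave it recognizably broken rather than turn it into a mainspace link) might look something like the sketch below. It is not NihlusBOT's code: the function name is hypothetical, and it assumes that the broken links have the form `[[::Title]]` with the namespace missing and that the class is available to the caller as a plain string.

```python
import re

# Classes named in the discussion, mapped to the namespace they imply (assumed 1:1).
CLASS_TO_NAMESPACE = {
    "Category": "Category",
    "Template": "Template",
    "Book": "Book",
    "Draft": "Draft",
    "File": "File",
}

# A link that starts with two or more colons, e.g. "[[::Some title]]".
MULTI_COLON_LINK = re.compile(r"\[\[:{2,}([^\]|]+)\]\]")

def fix_link(link, assessed_class):
    """Repair a multi-colon link only when the class tells us the namespace."""
    match = MULTI_COLON_LINK.fullmatch(link)
    if match is None:
        return link  # not a multi-colon link, nothing to do
    namespace = CLASS_TO_NAMESPACE.get(assessed_class)
    if namespace is None:
        return link  # class unknown: keep the double colon as a visible marker
    return f"[[:{namespace}:{match.group(1)}]]"
```

Leaving unclassified links untouched keeps them distinguishable from genuine mainspace links, so a later, more thorough fix can still find them.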

They are voting on the future of AWB (a program used for powering bots)

Since AWB has a bot flag, that turns it into a bot engine, I thought you might want to know about a vote going on that will affect the nature of that program:

https://meta.wikimedia.org/wiki/2017_Community_Wishlist_Survey/Bots_and_gadgets/Convert_AWB_into_a_special_page#Voting

The Transhumanist 00:25, 2 December 2017 (UTC)