Wikipedia:Bots/Noticeboard

Bots noticeboard

This is a message board for coordinating and discussing bot-related issues on Wikipedia (also including other programs interacting with the MediaWiki software). Although this page is frequented mainly by bot owners, any user is welcome to leave a message or join the discussion here.

If you want to report an issue or bug with a specific bot, follow the steps outlined in WP:BOTISSUE first. This is not the place for requests for bot approvals or for requesting that tasks be done by a bot. General questions about the MediaWiki software (such as the use of templates, etc.) should be asked at Wikipedia:Village pump (technical).


IABot blue linking to Internet archive books[edit]

To a degree I have raised this previously, and somewhat poorly, in September 2019 at Wikipedia:Village pump (policy)/Archive 154#BOT linking to archive.org possible copyrighted sources. I learnt a great deal from it about Open Library and WorldCat, and about finding sources in books held on archive.org (search foobar site:archive.org). While bot URL blue-linking seemed to have ceased after that, I have observed it has restarted. For example: [1]. This is actually really cool and useful, and I use IABot regularly myself for link rot. My concerns revolve around the issue of book loans from the Internet Archive, and how this is being pointed to with a possible bias, as opposed to WorldCat, where other sources can be located. (The example I have above has an ISBN, but that wouldn't be present for books from before the 1980s or so.) In summary:

  • At User:InternetArchiveBot IABot is described as 'an advanced bot designed to combat link rot, as well as fix inconsistencies in sources such as improper template usage, or invalid archives'. Blue linking arguably goes beyond the WP:LINKROT brief.
  • I would be more comfortable if the bot, when blue linking, also provided a WorldCat link (oclc=) or perhaps an Open Library link (ol=), which give library locations (WorldCat) or also show possible sellers (Open Library). I have done this for the example above.[2]

Thank you. Djm-leighpark (talk) 22:57, 11 November 2019 (UTC)

The diff you linked shows that there was already an ISBN present. The ISBN links to Special:BookSources, which leads to all of the sources you list and many more. – Jonesey95 (talk) 00:56, 12 November 2019 (UTC)
As I said, there are cases where no ISBN is present, or maybe the bot operates only when an ISBN is present? I suppose I could try to reverse-engineer it, but it is not reasonable to expect that, and I already mentioned the ISBN in this example. Thank you. Djm-leighpark (talk) 02:31, 12 November 2019 (UTC)
Seems to be GreenC bot doing this, though it may be a called procedure. Is this authorised? Djm-leighpark (talk) 07:33, 14 November 2019 (UTC)
@Djm-leighpark: can you provide some diffs of recent edits that you are concerned with? — xaosflux Talk 12:16, 14 November 2019 (UTC)

Recent diffs from my Watchlist:

These are typically useful, but I have the following concerns:

  1. This functionality does not appear to be documented on the IABot or GreenC bot userpage, or if it is, it may be obfuscated. Rather than 'correcting' a citation, this is adding to it.
  2. Just a concern in case this functionality has not been authorised. (Probably I just don't know where to look.)
  3. A concern about whether there are any issues in pointing towards Open Library loans, as opposed to leaving links for commercial buying/finding at WorldCat. This is partly because, as far as I know, this is an Open Library/Internet Archive-funded initiative, which is biased towards pointing at Internet Archive resources and away from other resources. I may be alone on this.
  4. Rather than just adding URLs, consideration should also be given to leaving the ol= identifier (and ideally the oclc= identifier as well). I may be alone on this.
  5. I am concerned the page links may not work on some documents ... I thought I had an example, but it was not an Internet Archive resource, so there may not be an issue.

Thank you. Djm-leighpark (talk) 13:52, 14 November 2019 (UTC)

The project has been in the news[3][4][5][6][7] etc. There is approval for adding the books per BRFA and RFC. GreenC and Cyberpower678 are disclosed paid editors for the Internet Archive, collaborating on the project. The project is fully supported by the WMF, who consider the Internet Archive one of their closest partners; both are non-profit organizations with overlap, and we try to collaborate with non-profits over commercial organizations. Brewster Kahle wants to scan every book cited on Wikipedia so that it can be linked to directly at the page number. This represents tens of millions of dollars and years of effort; every few weeks they are sending a shipping container full of books to various countries for scanning. -- GreenC 14:34, 14 November 2019 (UTC)

  • Thank you for that information ... perhaps it would be useful to have links to it from the bot user page ... perhaps it is already there and I've missed it (if it was on the Did you know I'd probably still miss it). On the positive side, I am kind of leveraging some of the side effects of this into articles already. Again, thank you for the information. Djm-leighpark (talk) 14:57, 14 November 2019 (UTC)
I'm not opposed to IABot adding |url= links to cs1|2 citation templates as long as those that require registration are so marked. That appears to be happening. But, when |title= is wholly or partially wikilinked, adding a value to |url= causes URL–wikilink conflict errors. I think that I've discussed this issue with Editor Cyberpower678 though I can't find where I did that. My recollection of that conversation was that Editor Cyberpower678 was not interested in fixing that. I fully admit that I may be mistaken about this impression. It would be good to see it fixed because I grow weary of fixing these damned errors when they should not have been created in the first place.
Trappist the monk (talk) 15:06, 14 November 2019 (UTC)
I don't know about the history but IABot does not do this currently. -- GreenC 15:52, 14 November 2019 (UTC)
Really? Here are a couple of today's IABot/GreenC bot edits that broke the cs1|2 templates:
German occupation of Norway 04:29, 14 November 2019 UTC
Program counter 07:20, 14 November 2019 UTC
If it isn't IABot then it must be GreenC bot that is breaking these templates.
Trappist the monk (talk) 15:59, 14 November 2019 (UTC)
Oh yes, that is a problem. Should be fixed now. It was skipping some instances and not others. -- GreenC 16:32, 14 November 2019 (UTC)
Good. Thanks.
Trappist the monk (talk) 16:34, 14 November 2019 (UTC)
Trappist the monk, I’m not sure what I did to leave that impression, but I take bug reports seriously. —CYBERPOWER (Around) 03:13, 15 November 2019 (UTC)
@Djm-leighpark: maybe I didn't explain my ask well - sorry; can you provide some actual on-wiki revision diffs of edits you think are problematic? An example of what I'm looking for would be this or this. Looking to see which account made exactly which edit that needs further review. — xaosflux Talk 15:12, 14 November 2019 (UTC)
  • I've tweaked the examples above to show the diffs better. I don't know that there's anything technically wrong with any of them; just concerns, as I now know a little better what's going on. I must confess to being a total Luddite and stay away from the interactive editor ... However, I can see cites getting longer and longer and getting more messily intertwined with the prose. WP:LDR solves it but sends interactive editors (mostly everyone but me) bananas. Harvard's rubbish on web cites. But I wonder how well the bot copes with WP:LDR, Harvard, and citations such as Rennie (1849) in John Rennie the Elder, where there are many pages to link? Djm-leighpark (talk) 16:18, 14 November 2019 (UTC)
    • Currently it doesn't. It only adds a link from a specific page mention. Most of the book citations at John Rennie the Elder would also be ignored because they don't use a {{cite book}} with an ISBN, as far as I know. Nemo 21:39, 25 November 2019 (UTC)

User:GreenC bot and edit filters[edit]

Is it possible to flag GreenC bot so it's not blocked by edit filters? I'm working to replace a blacklisted domain with archive versions, such as:

[http://example.com]

With:

[http://web.archive.org/web/20190901/http://example.com]

The domain was hijacked/usurped by spammers, so the old archives have good content. The edit filter is blocking these edits from being saved, since the blacklisted domain is part of the archive URL. -- GreenC 01:11, 23 November 2019 (UTC)
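The rewrite described above can be sketched as a small script. This is an illustration only, not the bot's actual code: the `to_wayback` helper name and the snapshot timestamp `20190901` are assumptions taken from the example URLs, and a real run would look up the correct snapshot per link.

```python
import re

# Hypothetical snapshot timestamp, taken from the example URL above.
SNAPSHOT = "20190901"

def to_wayback(wikitext: str, domain: str) -> str:
    """Rewrite bare [http://domain...] external links to their Wayback form."""
    # Match the bracketed URL up to whitespace or the closing bracket.
    pattern = re.compile(
        r"\[(https?://(?:www\.)?" + re.escape(domain) + r"[^\s\]]*)"
    )
    return pattern.sub(
        lambda m: "[http://web.archive.org/web/" + SNAPSHOT + "/" + m.group(1),
        wikitext,
    )

print(to_wayback("[http://example.com]", "example.com"))
# [http://web.archive.org/web/20190901/http://example.com]
```

The original URL is kept inside the archive URL, which is exactly why a blacklist match on the domain still fires on the rewritten text.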

@GreenC: I think the issue is the spam blacklist, since I see only 3 filter hits since 2017 (2 last february, and 1 on a testing filter) DannyS712 (talk) 01:33, 23 November 2019 (UTC)
OK, this is correct, it says spam blacklist; I was thinking they were the same ... Should I post over there? Not sure what the options are, whether the account can be flagged or the listing removed temporarily. -- GreenC 01:46, 23 November 2019 (UTC)
I used example.com above but the actual domain is blackwell-synergy.com -- GreenC 01:52, 23 November 2019 (UTC)
See phab:T36928 - despite being open, and something that could certainly be useful, it has been ignored by developers for five years; maybe at the end of 2020 we can beg for it in the wishlist process again... — xaosflux Talk 02:06, 23 November 2019 (UTC)
Xaosflux, ^ this. —CYBERPOWER (Around) 02:45, 23 November 2019 (UTC)
@Xaosflux: // or, I just spent a few minutes learning the basic code base of the extension and wrote a patch. Now we just need to convince people that it should be added DannyS712 (talk) 02:56, 23 November 2019 (UTC)
+1 that! — xaosflux Talk 03:12, 23 November 2019 (UTC)
I love how the sbl board only has "backlog" lol. — xaosflux Talk 03:13, 23 November 2019 (UTC)
@GreenC: Can a pattern for the URLs that should be allowed be added to MediaWiki:Spam-whitelist? Anomie 14:22, 23 November 2019 (UTC)
Anomie, not really, unless you want to blanket-whitelist web.archive.org and other archiving services. But then, that would be open to abuse. You could tighten the regex, but it would be absurdly long. The best option would be to let IABot and GreenC bot have a permission that can override the blacklist. —CYBERPOWER (Chat) 14:34, 23 November 2019 (UTC)
@Cyberpower678: Doesn't seem that difficult to me, since archive.org encodes the archive date in the URL and archive.org is the archive site requested here. Let's say we want to allow only archive links before 2017, because the domain was hijacked in early 2017. Something like \bweb\.archive\.org/web/(?:19\d{2}|200\d|201[0-6])\d{10}/http://example\.com/ should do it. Anomie 14:47, 23 November 2019 (UTC)
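The date-restricted pattern above can be checked quickly. This is just a sanity test of the suggested regex, assuming the standard 14-digit Wayback timestamp (YYYYMMDDhhmmss); the year alternation covers 19xx, 200x, and 2010-2016.

```python
import re

# Anomie's suggested whitelist pattern: allow only pre-2017 snapshots.
pattern = re.compile(
    r"\bweb\.archive\.org/web/(?:19\d{2}|200\d|201[0-6])\d{10}/http://example\.com/"
)

# A 2016 snapshot is allowed; a 2017 snapshot (post-hijack) is not.
ok = pattern.search("http://web.archive.org/web/20160901000000/http://example.com/")
bad = pattern.search("http://web.archive.org/web/20170901000000/http://example.com/")
```

The year group consumes the first four timestamp digits and `\d{10}` the remaining ten, so a 2017 date cannot match by backtracking into a different alternative.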
Anomie, What about the other domains that have been, or will be hijacked? This won't be, hasn't been the only case where the bots have been stopped because of the spam blacklist. —CYBERPOWER (Chat) 14:51, 23 November 2019 (UTC)
On the other hand, what about the concerns raised on the task regarding situations where some editors can (accidentally) bypass the blacklist and others can't? Anomie 14:59, 23 November 2019 (UTC)
My thought was the right could be given temporarily for specific jobs, then removed when completed. Sort of a temporary privilege one can apply for. It adds overhead, but it shouldn't be a common request, only for bot or other automated tasks specific to a blacklisted URL (not incidental ones). -- GreenC 15:37, 23 November 2019 (UTC)
@Anomie: we could make a group for it, and allow "sysop" to "add/remove", and allow "bot" to "addself/removeself" - sort of like how the 'flood' group is on other projects. — xaosflux Talk 16:48, 23 November 2019 (UTC)

Hi, thanks everyone. I ended up converting to doi.org (or |doi=) and deleting the original URLs; after the run on 550 pages, 14 pages were left needing an archive URL added, which is pretty good, and I can finish the rest manually. Good to see Danny's patch submitted. -- GreenC 15:37, 23 November 2019 (UTC)

Thank you. That was the recommended course of action indeed. (Such hijacks are one reason we need to get rid of all redundant publisher links as quickly as possible, replacing them with stable resources.) I see there are currently some links left to blackwell-synergy.com/toc/ and similar, but when you're done please note it at m:User:Praxidicae/DOI fix. Nemo 21:44, 25 November 2019 (UTC)

Bot question[edit]

Is there a bot (or a script) that can be set up to remove image files from section headings? Images shouldn't really be added to article section headings per MOS:HEAD and MOS:ACCIM, and thanks to JJMC89 at User talk:JJMC89#Images in headings, I've now found a fairly fast way to search for such images. However, there seem to be quite a lot of articles (about a thousand) where this is a problem, including some which have an image added to pretty much every section heading, like this. The basic syntax for these files begins with either [[Image: or [[File:, but some articles may have wikilinks in their headings as well, so just searching for ==[[ might yield some false positives. I'm also not clear on how a space between the heading syntax and the file syntax might affect a bot's search results (e.g. ==[[File: and == [[File:). Manually cleaning these up is a bit time-consuming, but it can be done; I'm just wondering if there might be a faster way. I'm not sure whether HEAD and ACCIM would also apply to article talk pages; the rationale seems applicable, but talk pages don't really need to be article quality in terms of the MOS, etc.
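A search along the lines described above can be sketched with a regex. This is an illustrative pattern, not an existing tool: it matches headings that embed a [[File:...]] or [[Image:...]] link, tolerating optional spaces around the heading syntax, while ignoring plain wikilinks such as ==[[Rome]]==.

```python
import re

# Headings (2-6 equals signs) containing a File: or Image: link anywhere
# before the closing markup. Case-insensitive, scanned line by line.
HEADING_IMAGE = re.compile(
    r"^={2,6}.*\[\[\s*(?:File|Image)\s*:",
    re.IGNORECASE | re.MULTILINE,
)

text = (
    "== [[File:Example.jpg|30px]] History ==\n"
    "==[[Image:Map.png]] Geography==\n"
    "==[[Rome]] and [[Carthage]]==\n"
)
hits = HEADING_IMAGE.findall(text)
# Matches the first two headings; the plain-wikilink heading is skipped.
```

Because the pattern requires the literal File:/Image: prefix rather than a bare ==[[, the false positives mentioned above are avoided, and the `\s*` handles the spacing variants.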

In a similar vein, I'm wondering if there's also a way for a bot/script to look for citations added to article section headings. Citations, however, would usually come after the heading text itself, so maybe searching for </ref>== could work to help track them down. Citations might require a human editor to follow up to see if there's a way for the citation to be used in the article. Perhaps having a bot track them down and add them to a category page would be better than having a bot remove them outright. -- Marchjuly (talk) 05:54, 26 November 2019 (UTC)
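The companion search can be sketched the same way. Again this is only an illustration: it looks for a closing </ref> tag sitting just before a heading's closing equals signs, which is the </ref>== pattern suggested above with optional whitespace allowed.

```python
import re

# A citation whose closing </ref> appears immediately before the
# heading's closing ==...== markup, with optional whitespace between.
HEADING_REF = re.compile(r"</ref>\s*={2,6}\s*$", re.IGNORECASE | re.MULTILINE)

in_heading = "== History<ref>Smith (1999)</ref> ==\n"
in_body = "== History ==\nSome text.<ref>Smith (1999)</ref>\n"
# Only in_heading matches; a ref at the end of a normal paragraph does not.
```

Anchoring to the end of the line keeps refs in ordinary body text from matching, so the results should be a reasonable worklist for a human reviewer or a tracking category.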

As far as I know we have no scripts or bots for either thing. A dedicated bot might be desirable, although you'd want to make sure there are no justified exceptions beforehand - maybe ask some of the editors who added such links or files? Jo-Jo Eumerus (talk) 06:59, 26 November 2019 (UTC)

Hatnotes[edit]

I was wondering whether there are any bots which make sure hatnotes stay at the top of a page, in case someone adds another template, such as a message box, above the hatnotes. --CaiusSPQR (talk) 21:09, 1 December 2019 (UTC)

Related RfC[edit]

Hi. I have opened an RfC at Wikipedia talk:New pages patrol/Reviewers/Redirect autopatrol#RfC on autopatrolling redirects that relates to bots. Watchers may be interested. Thanks, --DannyS712 (talk) 01:38, 2 December 2019 (UTC)