Wikipedia:Bot owners' noticeboard


This is a message board for coordinating and discussing bot-related issues on Wikipedia (also including other programs interacting with the MediaWiki software). Although its target audience is bot owners, any user is welcome to leave a message or join the discussion here. This is not the place for requests for bot approvals or requesting that tasks be done by a bot. It is also not the place for general questions about the MediaWiki software (such as the use of templates, etc.), which generally have the best chance of being answered at WP:VPT.

User:Citation bot - mass creation of sub-templates

Citation bot (talk · contribs) is creating a massive number of sub-templates which store citation data within the Template namespace. I have read through the various past approval requests listed at User:Citation bot#Bot approval, but I can't find an explicit request where they ask for approval for the mass creation of pages (request #6 mentions creation of subtemplates, referring to request #2, but that request doesn't ask for that ability). This bot has apparently been operating in this manner for many years, and the result is about 49k of these templates within Category:Cite doi templates, as well as some other types of citations. There are over 67 million DOIs in existence, so this could continue to grow indefinitely if left unchecked.

I'd like to ask that this bot be blocked temporarily until the operator can explain where and when he got community approval for mass creation of pages, and until this function of his bot has had a proper discussion (and more so, the general problem of external data being stored in the template namespace). -- Netoholic @ 08:20, 11 June 2014 (UTC)

I can't see a problem with this. This is a very useful function that CitationBot's author clearly doesn't receive enough kudos for. It's not indiscriminately sucking up the entire DOI database, but just the information that is required to put properly formatted citations into articles, as and when that information is needed. The exact same information would have needed to be put in the articles directly by hand, if CitationBot wasn't doing this. Can you tell me what your alternative proposal is? -- The Anome (talk) 11:05, 11 June 2014 (UTC)
The problem is the mass creation of pages. There is no problem with filling out citations within the articles. --Netoholic @ 17:00, 11 June 2014 (UTC)
@The Anome: May I suggest that if it finds an improperly formatted citation, it put the properly formatted citation into the article itself, rather than create a template and then reference that template? If a bot were to fix all citations to a website, IMDb for example, no one would suggest that it create separate templates for every page (either the ones used or every page that exists) in the hope that someone could refer to that page. -- Ricky81682 (talk) 20:29, 11 June 2014 (UTC)
The explanation is in the docs shown at {{cite doi}}. The idea seems to be that if a ref has a DOI, you can just put {{cite doi|xxx}} (where xxx is the doi) in a reference. If the same doi has been used in another article, the job is finished because the first usage will have created the template. If this is the first usage, the bot notices the missing template and does the grunt work of filling in the citation and creating the template. It seems like a good system. To work out if it is effective, someone would need to count how many such templates exist (49k from the above), and compare that with the number of times a {{cite doi}} template is used. Johnuniq (talk) 11:30, 11 June 2014 (UTC)
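The workflow described in the comment above, and the effectiveness check it proposes, can be sketched roughly as follows. This is only an illustrative model, not Citation bot's actual code: the `Template:Cite doi/<doi>` naming scheme, the `fetch_metadata` callback, and both function names are assumptions made for the example.

```python
def subtemplate_title(doi):
    # Hypothetical naming scheme, used here for illustration only.
    return f"Template:Cite doi/{doi}"


def process_usages(usages, existing_pages, fetch_metadata):
    """Create a sub-template only the first time a DOI is used.

    usages: list of (article, doi) pairs where {{cite doi|doi}} appears.
    existing_pages: dict mapping page title -> stored citation data.
    fetch_metadata: caller-supplied function (hypothetical) returning citation data.
    Returns the titles of the pages created during this pass.
    """
    created = []
    for _article, doi in usages:
        title = subtemplate_title(doi)
        if title not in existing_pages:
            # First usage: do the "grunt work" and store the result.
            existing_pages[title] = fetch_metadata(doi)
            created.append(title)
        # Later usages find the page already present; nothing to do.
    return created


def reuse_ratio(usages, existing_pages):
    """Average number of {{cite doi}} usages per stored sub-template."""
    return len(usages) / len(existing_pages) if existing_pages else 0.0
```

A ratio close to 1 would suggest that most sub-templates are used only once, while a clearly higher ratio would suggest that the stored citations are actually being reused across articles.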
Two comments: (1) {{cite doi}}, {{cite pmid}}, {{cite isbn}} templates are only generated as needed and certainly not for every doi in existence and (2) not everyone thinks they are a good idea (see this discussion). Boghog (talk) 12:24, 11 June 2014 (UTC)
Citation bot does one-by-one creation of these templates at the explicit request of editors. The OP appears to misunderstand how and when Citation bot operates. Citation bot is extremely useful in replacing tedious human work with robot work. Like Reflinks, Citation bot operates on demand only, and has a few bugs, but is on balance very useful. – Jonesey95 (talk) 13:03, 11 June 2014 (UTC)
Citation bot never sought approval to create new pages. Bots are specifically restricted in what tasks they perform. -- Netoholic @ 17:00, 11 June 2014 (UTC)
I support this bot's activity; with the caveat that I see it as a precursor to moving all such citation data to Wikidata, in the long term. As an aside, that data should also include the ORCID identifier of each author, where known (see WP:ORCID for more, including my interest as Wikipedian in residence with ORCID). Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 13:07, 11 June 2014 (UTC)

The issue at hand is that this bot has exceeded its mandate - the approval requests that were made were to update/fill in citation data (which is uncontroversial), but the operator has changed his method of doing this to the mass creation of pages (which is definitely controversial) without seeking approval. The bot's function should be stopped immediately and evaluated. --Netoholic @ 17:00, 11 June 2014 (UTC)

I guess it all depends on your perspective. The bot could be stopped and the issue discussed, but the outcome would be that there isn't an issue. The bot has been doing this for almost 6 years, which predates MASSCREATION by about 3 years. Given those factors, this is a non-issue. Werieth (talk) 17:12, 11 June 2014 (UTC)
The bot policy has always stipulated that the functions a bot performs must be approved. This bot has been performing a function that it never requested, and just because it's been working and largely out-of-sight in the template space for 6 years doesn't mean it's appropriate to let bots drift from their mandates. It's the bot owner that has the responsibility to prove that he has community approval, and the community has said for many years that mass page creation is something they want discussed. -- Netoholic @ 18:48, 11 June 2014 (UTC)
And the functions are approved: filling in references. Given that CB pre-dates the policy on mass creation, the bot falls under the grandfather clause. Can you point to any kind of discussion where users have had issues with the bot's actions? From my perspective the bot is following the mandate given by users to support the process of filling in DOI and other reference material. The approval is fairly broad, and the bot is doing it. You seem to be the only one objecting to how it does what it was approved to do. Werieth (talk) 18:55, 11 June 2014 (UTC)
This has been raised before at Wikipedia:Bot owners' noticeboard/Archive 4#A separate template for each cited source?. I believe it was during this that User:Smith609 gave his own bot tacit ex post facto approval to create new pages. Also this discussion shows that there is a strong indication that users may not want this. The bot could do its function (filling out citations) without creating new template pages. --Netoholic @ 19:12, 11 June 2014 (UTC)
@Werieth: You ask "Can you point to any kind of discussion where users have had issues with the bot's actions?" - just look at the number of unresolved bug reports at User talk:Citation bot; some of them go back months. --Redrose64 (talk) 19:14, 11 June 2014 (UTC)
@Redrose64: There is certainly an issue that the bot owner has said that he doesn't have much time to maintain the bot, and it can be weeks between Wikipedia logins. He has said "The code is open source and interested parties are invited to assist with the operation and extension of the bot ... ", but quite frankly it seems more appropriate to transfer ownership to someone else with more time. But that's a side issue, I think. -- John Broughton (♫♫) 20:40, 11 June 2014 (UTC)

The issue of creating separate template pages was raised in March 2009, as pointed out above - see Wikipedia:Bot owners' noticeboard/Archive 4#A separate template for each cited source?. Like many discussions on Wikipedia, it ended with no consensus or resolution. So things have continued on, as is, for over five years.

I think it's important to focus on issues caused by the approach used by the bot, rather than whether there was appropriate approval. (Even if there had been approval, it can always be revoked.) So, possible issues:

  1. Extra pages - I'd argue that this is a minor problem, at worst, since the pages are in template space, where they are ignorable. And the cost of storage and extra processing to create article pages is negligible. Plus, as noted in the March 2009 discussion, if the English Wikipedia ever decides to have a Reference namespace, as is in place for the French Wikipedia (and others?), this might be useful. (Or not).
  2. Problems with editing the references. This is, to my mind, the big problem, something that may not have been as clear as it could be in my March 2009 posting. Using a doi cite template is fine for readers, but when someone goes into edit mode, what he/she sees is a naked template that cannot be directly edited. So (a) we're asking editors to know (yet) something more (what the curly brackets mean, and how to go to template space to edit the contents), and (b) we're adding a significant additional step for the editor who does understand how to edit that cite doi template.
One way to fix the second problem would be for editors and the bot to use "subst:"; that locks in the text so that going into edit mode makes all the citation information available. This approach would be compatible with having separate doi templates, and reusing the information on them. It would, however, mean that subsequent changes to a doi template would not automatically be reflected in articles using that doi source.
There is, however, something new that potentially impacts this discussion: Visual Editor. In edit mode when using VE, an editor can see all citation information. However, based on a test I just did, he/she still can't edit it - but perhaps that's coming. (I'm going to ask, at WP:VE/F). If/when that happens, and if/when the majority of editors are using VE (I predict that to happen around 2016), then the second problem goes away, at least for those using VE. -- John Broughton (♫♫) 20:40, 11 June 2014 (UTC)
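The trade-off of the "subst:" approach mentioned above can be illustrated with a toy model. This is a deliberately simplified sketch and not MediaWiki's parser: the regular expression, the `save` and `render` functions, and the flat `templates` dictionary keyed by DOI are all assumptions made for the example.

```python
import re

# Matches {{cite doi|X}} and {{subst:cite doi|X}} in this toy model.
CITE = re.compile(r"\{\{(subst:)?cite doi\|([^}]+)\}\}")


def save(wikitext, templates):
    """At save time, replace only subst: calls with the template's current text."""
    def repl(match):
        if match.group(1):
            return templates[match.group(2)]
        return match.group(0)  # plain transclusions stay as template calls
    return CITE.sub(repl, wikitext)


def render(wikitext, templates):
    """At read time, expand any remaining transclusions from the current store."""
    return CITE.sub(lambda match: templates[match.group(2)], wikitext)
```

In this model a substituted citation is fully visible and editable in the article source, but a later correction to the stored template reaches only the articles that still transclude it.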
The operator asked for help way back. Not long ago I raised a request here to get him a relief operator and was met with resounding silence. Is there now an assumption that bot owners are perpetually on the hook for keeping the bots up to date, or what's the story? Anyhow, notwithstanding that this particular use of Citation bot in template-space was always a bad idea, we still need that bot to be fixed for its vast utility in article-space. LeadSongDog come howl! 21:58, 11 June 2014 (UTC)
For anyone concerned with this, I might suggest that they try to join this Google hangout, read the list of projects on Meta, attend Wikimania or watch the video of the talk after, and otherwise participate in the conversation. At least 50 highly engaged Wikimedia community members have ideas in this space.
I do not know the right course of action, but if I named the most likely one, it would be that Citation bot or its successor puts the citations on Wikidata instead of English Wikipedia, and then all Wikimedia projects in all languages call the citation from there rather than storing it locally for each project. Blue Rasberry (talk) 14:40, 12 June 2014 (UTC)
@Bluerasberry: Thank you for the update. I hope consideration will also be given to how Visual Editor should handle "cite doi" templates, or their successor. -- John Broughton (♫♫) 03:52, 13 June 2014 (UTC)
One major appeal of templates is that it puts formatting and navigation out of the hands of the "casual" editor, to prevent impacting other articles unintentionally. This points to two issues: VE should not generally make editing of templates from within an article edit easy, and data which should be easy to change and related to article content should be in the article itself, not in a template. No matter how one slices it, data about a citation (title, author, etc) must be present in the article. If anything, everything you've brought up points to strong reasons why this bot should be stopped from creating new templates, and instead just recover information from offsite and make use of {{cite journal}} or other similar standard citation template which keeps the data in-article. --Netoholic @ 15:42, 13 June 2014 (UTC)
You say "data about a citation (title, author, etc) must be present in the article itself", but you do not say why. Indeed, I think you are, to put it bluntly, wrong. Certainly the reader must be presented with that data when reading the article, but that is not the same thing. Why would we think that it is a good thing to have a hundred variant citations of a source edition if that source is cited in a hundred different articles? Each of the hundred will be variously incomplete, incorrect, or quirky in punctuation etc. Corrections made to one will leave the other ninety-nine still in error. To illustrate, consider OL2527037W, or if you prefer the WorldCat version. This work lists many editions, true, but many of the edition records are redundant, with trivial variations such as missing pub date, year-only versus month-year; "Cassell" vs. "Cassell & company, itd." vs. "Cassell and company, ltd."; "Rev. [i.e. 8th] ed" vs. "[8th] rev. ed." etc. Such variations are also found in other catalogues (both single-library and union) but do not fundamentally impede the identification of the work or edition. Admittedly, in almost every case it is far more important that a reader knows what work to seek than what specific edition. If a reader goes to her library and finds only the "wrong" edition, it will almost always still contain the cited passage, though perhaps on a different page. That is usually sufficient to verify the supported text, which is after all the point of the whole endeavour. Once an editor identifies a source, the tools should make it as trivial as possible to cite it. In cases where all the metadata is available, it is lunacy to require an editor to do more than pick the cited edition from a list of similar instances, then add a page number. We've simply become accustomed to tolerating this lunacy because we have done it for so long now.
This entrenched folly exacts a huge cost in missing or incomplete citations, unverifiable assertions, and longstanding errors that cannot be checked simply because the (incorrectly) cited work cannot be found. We see instances of this all the time at wp:RX. Have a look at Comparison of reference management software to get an idea what's possible with existing software, much of which is freely licensed and some of which has even been implemented on mediawiki platforms. LeadSongDog come howl! 21:04, 13 June 2014 (UTC)
Tell me, what if one of those "trivial variations" in edition leads to a completely different conclusion? 'Facts' tend to have a certain half-life, and lots of things we thought were true at the beginning of the Wikipedia project have been replaced by new facts in later editions of references. Seeing the exact edition of a reference being used in an article is important because it can flag material that needs to be reevaluated or updated. Also, the metadata should be in each article itself, in the form of a 'cite journal' or other standard citation template, because that preserves the data with the text that makes use of that reference. This makes our articles much more portable when they are mirrored elsewhere. Creating a parser for 'cite journal' (such as recreating that template on their wiki, or using some other code) is trivial. Expecting them to not only download the article text but also several cite doi subtemplates separately, then tasking them to merge those templates back into the article text, is burdensome.
Look, this is a discussion that is out of place on this page, which is devoted to the operation of bots. I think though that it is valuable in showing there are many perspectives on how to properly handle citation templates, and that the topic deserves a lot more discussion. The bot, then, should not continue on a path that could be changed, since the work required to correct it will only increase as it continues. The bot has bugs, is doing a mass page creation function that it never got approval for, and the operator is frequently unresponsive. It should be stopped until the proper course can be decided and the problems with its operation are addressed. People can revert quite easily to using standard citation templates in the meantime, which can be migrated to a different solution if that's what's called for. The mass page creation, though, is a major administrative concern and is causing a lot of workload for people that have to manage orphaned cite doi's. -- Netoholic @ 02:35, 14 June 2014 (UTC)
Perhaps I drifted off topic. I was principally referring not to different editions, but rather to different ways of describing the same edition, such as "Rev. [i.e. 8th] ed" vs. "[8th] rev. ed.". Ideally we should have only one definitive bibliographic record per edition, irrespective of how many times or ways that edition is cited. This proposal is an attempt to get that addressed.
I agree entirely that the cite doi templates are a bad idea, but we should not address that by stopping the bot: that would only leave a lot of incomplete template subpages. Rather the template itself should be deprecated, and creation of new instances prevented. LeadSongDog come howl! 03:41, 14 June 2014 (UTC)


So, I've been doing some semi-automated editing using my alternative account User:StradBot, and I've been shouted at because it looks like an unapproved bot. Which is fair enough, because it doesn't have a bot flag, and the name ends in "Bot". What's the best way forward here? Do I need to file an RFBA for the next AWB-like task I do using StradBot? Or do I just need someone to give the account a bot flag? I'm new to all of this bot business, so any advice would be appreciated. Best — Mr. Stradivarius ♪ talk ♪ 08:36, 12 June 2014 (UTC)

Since you're manually reviewing every edit, you technically don't need a bot flag, but most people would appreciate if you did get one. Legoktm (talk) 08:38, 12 June 2014 (UTC)
Makes sense. Off to the bureaucrats' noticeboard I go, then. :) — Mr. Stradivarius ♪ talk ♪ 08:41, 12 June 2014 (UTC)
Err, I forgot to answer the second part of your question. To get a bot flag, it'll have to go through WP:BRFA first, and then the crats will flag it after BAG approves it. Legoktm (talk) 08:43, 12 June 2014 (UTC)
Ah, that's important. I'll do that if I have a run involving a large number of pages, then. Thanks for the advice! — Mr. Stradivarius ♪ talk ♪ 08:45, 12 June 2014 (UTC)
Using a bot-flagged account for manual edits is also frowned upon, because bot edits don't receive the same level of scrutiny. Not that I mind that much, just alerting you to the rock-strewn path. All the best: Rich Farmbrough 13:31, 17 June 2014 (UTC).

Bots that have been inactive for the last 2-4 years and may lose bot flag

This is an (incomplete) list of inactive interwiki bots that have not been re-purposed and have not edited since February 2013, and in most cases since even earlier. For security reasons and following advice given at Wikipedia:Bureaucrats'_noticeboard#Remove_bot_flag_from_inactive_interwiki_bots.3F, these bots should be de-flagged.

-- Magioladitis (talk) 08:31, 28 June 2014 (UTC)

Next Steps

This is a rather simple request, proposing the following steps to complete (note when complete below) for this one-off batch:

  1. Ping each operator as to this discussion. Any operator responding that they plan on reactivating their bot can strike through and sign their bot's name above.
    Done. Notifications sent, was a fun tour around all the other language wikipedias, I'm pretty sure I updated the operator's talk pages there, and not the main pages :D — xaosflux Talk 12:34, 28 June 2014 (UTC)
  2. Wait 10 days for response and community concerns. Done
  3. De-authorize the remaining bots (2 BAG endorsements here <for the batch>). Done - see below
  4. Ping WP:BN for de-flagging. Done - All bots in the list above are deflagged ·addshore· talk to me! 12:22, 8 July 2014 (UTC)
  5. Update bot pages to reflect no longer authorized. Done ·addshore· talk to me! 12:45, 8 July 2014 (UTC)

During the wait period, should a community consensus to change this process emerge, all steps to be reconsidered. — xaosflux Talk 11:38, 28 June 2014 (UTC)

@Xaosflux: Why does it have to be that complicated? If someone wants their bot flag back they can request to be re-flagged. In most cases these bots were operated by editors from other projects and most of them are located on the deprecated (if it still exists) toolserver. -- Magioladitis (talk) 11:42, 28 June 2014 (UTC)
I'm sending all the notifications now. I suspect this will be a completely non-controversial action, but I have run into sensitivity issues before in cleaning up old permissions, and don't want to open the door to offending editors/operators. I'll have the notifications out today, and if no one shows up it will be done. Gathering the endorsements should be easy too, assuming we will already have yours. — xaosflux Talk 11:48, 28 June 2014 (UTC)
  • Yes, thanks Mag and xaos. Removing the flag is not urgent, so leaving a note and providing some time for response is appropriate. –xenotalk 11:56, 28 June 2014 (UTC)
I agree. Xaosflux's arguments convinced me. -- Magioladitis (talk) 11:59, 28 June 2014 (UTC)
Also endorsed by me, and if nothing comes up I will go through the list once the 10 days have passed and remove all of the flags; no need to bother BN :) ·addshore· talk to me! 09:46, 7 July 2014 (UTC)
Addshore thanks! Since my list was hand-made I presume there are more bots that are inactive for more than 2 years and were not included on this list. -- Magioladitis (talk) 12:57, 8 July 2014 (UTC)
There probably are! In fact I did know of a tool that would provide us with such a list but I don't seem to be able to find it anywhere! ·addshore· talk to me! 12:58, 8 July 2014 (UTC)
Addshore, using AWB's list comparer I found 1,368(!) bots in "All Wikipedia Bots" that are not in "Unapproved Wikipedia Bots". User:ZsinjBot has been inactive since 2006(!) but it has no flag and it is not in "Unapproved Wikipedia Bots". Maybe we should add that category or remove the "All..." one? -- Magioladitis (talk) 13:04, 8 July 2014 (UTC)
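The comparison described above amounts to a set difference between two category listings. The following is a minimal sketch of that idea in plain Python, not AWB's list comparer itself; the function names and the optional `flagged` input (the set of accounts currently holding a bot flag) are assumptions made for the example.

```python
def in_all_but_not_unapproved(all_bots, unapproved_bots):
    """Bots listed in the 'All' category but missing from 'Unapproved'."""
    return sorted(set(all_bots) - set(unapproved_bots))


def unflagged_leftovers(all_bots, unapproved_bots, flagged):
    """Of those, the accounts that also hold no bot flag.

    These would be candidates for adding to the 'Unapproved' category.
    """
    remaining = set(all_bots) - set(unapproved_bots)
    return sorted(remaining - set(flagged))
```

An account like the one mentioned above (no flag, long inactive, not categorised as unapproved) would show up in the second list.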
Well I have removed 116 bot flags today (and put 116 bots in the 'Unapproved' category). Feel free to come up with another list for me ;p Bots in general need a bit of a cleanup! ·addshore· talk to me! 13:23, 8 July 2014 (UTC)
Note on global bots

While running through notifications, I noticed that a few of these bots are ALSO Global Bots. This clean-up resolves their accounts on en:, but they will still have interwiki global bot flags; further discussion on cleaning them up would need to go to meta: — xaosflux Talk 12:14, 28 June 2014 (UTC)

Indeed! If you start a discussion I would love a link! ·addshore· talk to me! 09:46, 7 July 2014 (UTC)
Community Discussion

(Note dropped at WP:VPR)

Any local bot (that is, a bot that isn't a global bot) should be deflagged if inactive for a year - between the fact that this frequently means that the bot's task is over; and the fact that our policies change over time, and aren't always reflected in the policy changes, we need to prevent these "ghost bots" from starting up again without discussion. עוד מישהו Od Mishehu 15:47, 3 July 2014 (UTC)
I might suggest a higher threshold (2 years?) but we have certainly deflagged long-inactive bots in the past. We should conduct a similar notification as above to the long inactive bot/owners and go from there. –xenotalk 17:26, 3 July 2014 (UTC)
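The difference between the one-year and two-year thresholds proposed above can be explored with a small filter over last-edit dates. This is only a sketch under stated assumptions: the function name and the input dictionary are hypothetical, the bot names in any usage are placeholders, and a year is approximated as 365 days.

```python
from datetime import date


def inactive_bots(last_edits, today, threshold_years):
    """Return the names of bots whose last edit is at least threshold_years old.

    last_edits: dict mapping bot name -> date of its most recent edit.
    A year is approximated as 365 days in this sketch.
    """
    min_days = 365 * threshold_years
    return sorted(
        name
        for name, last_edit in last_edits.items()
        if (today - last_edit).days >= min_days
    )
```

Running the same input through both thresholds gives the list of bots that would be affected only by the stricter one-year rule, which is the group the two proposals disagree about.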

This is going to be the biggest deflag in Wikipedia's history. -- Magioladitis (talk) 18:41, 8 July 2014 (UTC)

So I have removed around 300 bot flags today from interwiki bots and bots that have not edited since the beginning of 2010 (excluding some bots that showed they were using the bot flag for reading using high limits). I think this is probably as far as I will 'clean up' for now. So there are still bots with ~4 years inactivity with the bot flag but this has certainly made a large dent in the list :) ·addshore· talk to me! 00:14, 9 July 2014 (UTC)
Thank you Addshore, guess you win the top user permissions log for July already :D — xaosflux Talk 00:39, 9 July 2014 (UTC)
I actually removed another 30 flags this morning. This leaves the line currently at bots with 4 years inactivity. ·addshore· talk to me! 10:16, 9 July 2014 (UTC)

DPL bot now writing dead links on user talk pages

As has been well-discussed at the village pump and the admin noticeboard, toolserver is officially dead. Which means DPL bot (talk · contribs) is now inviting people to solve disambigs using the old toolserver link. (recent example) Of course, you click on that, and find it's dead. Since the bot frequently notifies newbies about their errors, this is bad news.

IMHO all bots that write toolserver links to user talk pages should be shut down until their code is fixed to point to wmflabs. Ritchie333 (talk) (cont) 09:37, 2 July 2014 (UTC)

  • Yes, this needs to be fixed; but it's not simply a matter of changing "" to "", because the tools in question have not been ported to Labs. I don't see any need to shut down the bot; although the dead links are not ideal, the "harm" they cause is relatively minor compared to the beneficial work the bot is doing. Let's give JaGa a few days, at least; he's still around, although not as active as he formerly was. --R'n'B (call me Russ) 13:27, 2 July 2014 (UTC)

It was a fantastic tool, why couldn't the foundation support him and keep it?♦ Dr. Blofeld 19:43, 2 July 2014 (UTC)

@Dr. Blofeld: I would rather not air the user's dirty laundry, but it basically boils down to the tool owner wanting to write a new tool that is both technically resource intensive and possibly legally problematic for the WMF. The user is attempting to use their tools as leverage in that disagreement. (Their minimum demand is a 24TB storage array, not including the disks needed to back that much data up). Werieth (talk) 19:57, 2 July 2014 (UTC)
Pish. 24 Tb is nothing to the WMF. All the best: Rich Farmbrough 02:11, 3 July 2014 (UTC).

I see, most unfortunate. It actually made dabbing fun. Now I'll have to rely on User:The Banner to do it all :-)♦ Dr. Blofeld 19:59, 2 July 2014 (UTC)

I think you guys just have to switch to WPCleaner. Works perfectly when you want to clean up article by article. The Banner talk 20:15, 2 July 2014 (UTC)
And no, Dr. Blofeld. I am not going to solve them all. Sometimes I come across articles that are so specialist, that the content is almost Chinese for me. In those cases, I use the classic tactic of making a runner. And I think that it will be very difficult to get and keep the number of links to disambiguation pages under 50 000. The Banner talk 20:31, 2 July 2014 (UTC)

Done. I just stripped out the text that included the bad links. Such a shame, it was a fantastic tool. --JaGatalk 03:17, 3 July 2014 (UTC)

But to use WPCleaner, you have to run Java, which exposes your computer to security risks. --Lineagegeek (talk) 14:45, 4 July 2014 (UTC)
Using the internet is already a security risk. The Banner talk 13:14, 8 July 2014 (UTC)