User talk:Citation bot

You may want to increment {{Archive basics}} to |counter=28, as User talk:Citation bot/Archive 27 is larger than the recommended 150 kB.

Note that the bot's maintainer and assistants (Thing 1 and Thing 2) can go weeks without logging in to Wikipedia. The code is open source, and interested parties are invited to assist with the operation and extension of the bot. Before reporting a bug, please note: the addition of DUPLICATE_xxx= to citation templates by this bot is a feature. When there are two identical parameters in a citation template, the bot renames one of them to DUPLICATE_xxx=; the bot is pointing out the problem with the template. The solution is to choose one of the two parameters and remove the other, or to convert it to an appropriate parameter. A 503 error means that the bot is overloaded and you should try again later – wait at least an hour.
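To illustrate with an invented citation (the URL and both titles are placeholders, not taken from a real article): given {{cite web |url=https://example.com |title=First title |title=Second title}}, only one |title= can be used, so the bot renames one of them to |DUPLICATE_title=. The renamed parameter then surfaces as an unknown-parameter warning in the rendered citation, prompting an editor to keep the right value or move the spare one to a more appropriate parameter.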

Or, for a faster response from the maintainers, submit a pull request with an appropriate code fix on GitHub, if you can write the needed code.

minor cleanup

Status
new bug
Reported by
Keith D (talk) 11:21, 27 July 2021 (UTC)[reply]
What happens
Changes {{citeweb}} to {{cite web}}
Removes parameters, which again is purely cosmetic
Changes pp -> pages
What should happen
Nothing, if this is the only change, as it is a purely cosmetic action.
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=St_Helens_R.F.C.&curid=1095032&diff=1035699398&oldid=1034338489
https://en.wikipedia.org/w/index.php?title=Greenhead_College&curid=1055782&diff=1036242447&oldid=1031187573
https://en.wikipedia.org/w/index.php?title=North_Bay_Railway&curid=6428950&diff=1036241873&oldid=1031186675
We can't proceed until
Feedback from maintainers


Ok Mandalisme (talk) 07:11, 18 August 2021 (UTC)[reply]

bot adds |chapter= to cite document

Status
new bug
Reported by
Trappist the monk (talk) 11:30, 28 July 2021 (UTC)[reply]
What happens
bot adds |chapter= to {{cite document}}, a redirect to {{cite journal}}; |chapter= and its aliases are not supported by {{cite journal}}.
What should happen
This:
{{Cite encyclopedia |last=Mari |first=Licia |date=2002 |entry=Amendola, Ugo |encyclopedia=Grove Music Online |publisher=Oxford University Press |doi=10.1093/gmo/9781561592630.article.44755}}
Mari, Licia (2002). "Amendola, Ugo". Grove Music Online. Oxford University Press. doi:10.1093/gmo/9781561592630.article.44755.
Relevant diffs/links
diff
We can't proceed until
Feedback from maintainers


Or, in this case, it could be converted to the cite grove template.  — Chris Capoccia 💬 14:36, 4 August 2021 (UTC)[reply]

Consistent spacing

Status
new bug
Reported by
Abductive (reasoning) 03:24, 2 August 2021 (UTC)[reply]
What happens
bot added a date parameter to a ref that has a space before every pipe, but did not include a space before the pipe it inserted
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=53W53&type=revision&diff=1036681818&oldid=1036681278
We can't proceed until
Feedback from maintainers


I know this is a minor bug, but it bugs me. I know that the bot is written to make an attempt to duplicate the formatting already present in the ref. How it could have failed here, I don't know. But more importantly, it should default to the consensus ref formatting: space,pipe,parametername,=,parametervalue. (Spaces before pipes, no spaces around the equals signs or anywhere else, except perhaps before the curly end brackets if there already was a space there.) Abductive (reasoning) 03:24, 2 August 2021 (UTC)[reply]

I agree. The default should be space,pipe,parametername,=,parametervalue. --BrownHairedGirl (talk) • (contribs) 15:27, 2 August 2021 (UTC)[reply]
Cannot fix, since the bot already uses the existing citation template as a guide. Templates with mixed spacing such as these cannot be handled in a way that makes everyone happy. AManWithNoPlan (talk) 16:45, 2 August 2021 (UTC)[reply]
But how to explain the example? The bot deviated from the format of the ref it edited? Abductive (reasoning) 16:59, 2 August 2021 (UTC)[reply]
I see, you want the bot to add spaces to existing parameters - in particular the last one. Interesting; the bot by default does not in any way modify the spacing of existing parameters. That parameter has no trailing spaces. As far as the bot is concerned there are no spaces before pipes, just spaces at the end of parameters. AManWithNoPlan (talk) 17:14, 2 August 2021 (UTC)[reply]
The bot must have looked at the lack-of-space of the last parameter (before the end curly braces) to come to the conclusion that the ref was formatted that way. Perhaps it should look after the "cite xxxx" for the cue? Abductive (reasoning) 17:51, 2 August 2021 (UTC)[reply]
No, that is not what it did. It simply does not change the spacing of existing parameters. The existing final parameter has no ending space, so the bot does not add one. AManWithNoPlan (talk) 21:14, 2 August 2021 (UTC)[reply]
Ah, I see what you are saying. It slotted it in at the end. Well, I had hoped that the bot could have provided a cure to the annoying new habit of users removing all spaces from refs, making a wall of text for editors. Abductive (reasoning) 22:25, 2 August 2021 (UTC)[reply]
And creates annoyingly unpredictable line wraps. Does this format really have consensus? If so, bots (any bot) could create a cosmetic function for citations they edit. -- GreenC 17:04, 6 August 2021 (UTC)[reply]
There are some people who like the "crammed" format. I started a conversation about the formatting here, but I don't really understand what they were saying. Abductive (reasoning) 02:06, 7 August 2021 (UTC)[reply]
As Abductive suggests, what the bot ideally should do is check whether the pipe of the first parameter following the template name is preceded by a space (or, even better, whether at least one of the parameters' pipe symbols is preceded by a space) and, if it is, add a space in front of the pipe symbol of newly inserted parameters, no matter where they are inserted into the parameter list. If the template has no parameters yet, the bot should fall back to the "default" format "space, pipe, parameter name, equal sign, parameter value" we consistently use in all CS1/CS2 documentation and examples. (Well, IMO, this latter format would ideally be made the only format used at all, but that's a discussion beyond the scope of CB issues here.)
Yeah, it is only cosmetic, but like Abductive I too find it somewhat annoying when previously perfectly formatted citations become misaligned by bot edits.
--Matthiaspaul (talk) 13:34, 7 August 2021 (UTC)[reply]
While I agree, this is actually going to be hard to implement. I will need to think about it. AManWithNoPlan (talk) 18:12, 8 August 2021 (UTC)[reply]
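A minimal sketch of the heuristic proposed above, in illustrative Python (the bot itself is written in PHP, and both function names here are invented for this example):

    import re

    def pipe_separator(template_text: str) -> str:
        # Use " |" if any existing pipe is preceded by whitespace;
        # otherwise match the crammed style. A template with no
        # parameters yet falls back to the documented default.
        if "|" not in template_text:
            return " |"
        return " |" if re.search(r"\s\|", template_text) else "|"

    def insert_parameter(template_text: str, name: str, value: str) -> str:
        # Slot the new parameter in just before the closing braces,
        # matching the template's existing spacing style.
        sep = pipe_separator(template_text)
        closing = template_text.rfind("}}")
        return template_text[:closing] + sep + name + "=" + value + template_text[closing:]

    # insert_parameter("{{cite web |url=x |title=y}}", "date", "2 August 2021")
    #   returns "{{cite web |url=x |title=y |date=2 August 2021}}"
    # insert_parameter("{{cite web|url=x|title=y}}", "date", "2 August 2021")
    #   returns "{{cite web|url=x|title=y|date=2 August 2021}}"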

Only four requests at a time?

Status
new bug
Reported by
Abductive (reasoning) 22:54, 2 August 2021 (UTC)[reply]
What happens
It seems that the bot can only work on four jobs at any one time.
We can't proceed until
Feedback from maintainers


I sampled the bot's edits going back a few days, and it seems that the bot can only interleave four requests at any one time, and will not accept anything more, even a single page. At no point can I find five jobs interleaving, and (although this is harder to be certain about) at no point when there are four jobs interleaving can a fifth job be found, not even a single requested page. Is this deliberate, and if yes, is it really a necessary constraint on the bot? Abductive (reasoning) 22:54, 2 August 2021 (UTC)[reply]
That is what I have observed and complained about also. I am convinced that the default PHP config is 4. Someone with tool-service access needs to get the bot a custom lighttpd config file. AManWithNoPlan (talk) 23:03, 2 August 2021 (UTC)[reply]
Gah. Abductive (reasoning) 23:07, 2 August 2021 (UTC)[reply]
lol, you people with "jobs". the rest of us with single page requests can't get anything in no matter how many jobs.  — Chris Capoccia 💬 11:20, 3 August 2021 (UTC)[reply]
https://wikitech.wikimedia.org/wiki/Help:Toolforge/Web/Lighttpd Look at PHP and the "Default_configuration" area that starts collapsed. AManWithNoPlan (talk) 19:18, 3 August 2021 (UTC)[reply]
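For context, the default PHP stanza described on that page looks roughly like the following (values quoted from memory and purely illustrative; check the live Toolforge documentation before relying on them). The number of concurrent PHP requests works out to max-procs × PHP_FCGI_CHILDREN, so 2 × 2 would give exactly the four-job ceiling observed above:

    fastcgi.server += ( ".php" =>
        ((
            "bin-path" => "/usr/bin/php-cgi",
            "max-procs" => 2,
            "bin-environment" => (
                "PHP_FCGI_CHILDREN" => "2"
            )
        ))
    )
    # Raising "max-procs" or PHP_FCGI_CHILDREN in a custom config file
    # would raise the concurrency ceiling.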

This is also part of the wider problem that the bot needs much more capacity, and that a lot of its time is taken up by speculative trawls through wide sets of articles which have not been identified as needing bot attention and which often produce little change. Huge categories are being fed to the bot, which edits little over 10% of their pages, and most of those changes are trivia (type of quote mark in a title) or have no effect at all on output (removing redundant parameters or changing template type). It would help a lot if those speculative trawls were given a lower priority. --BrownHairedGirl (talk) • (contribs) 22:54, 9 August 2021 (UTC)[reply]

Who would decide what "speculative trawls" are? And what should the limit be? It might be hard to find something that can be agreed on. Perhaps the users who request these large categories see them as very important, while you don't. Of course it will be easy to know that certain specially created maintenance categories will give a high output and do a lot of work, but if a user just wants to request a "normal" category, they can't know beforehand what percentage of the pages will actually get changed.
I agree capacity should be increased; more jobs at the same time would be a good thing. However, deciding that one page might be more important than another does not fix the root cause.
I do think there might be something to be said for giving priority to people who request a single page (or a low number of pages). A person could be running the most important category that exists, but if I just want a single page checked about a topic that I am knowledgeable about or have a big interest in, it is a hard swallow waiting for multiple thousand-page jobs to finish. This has actually made me just give up a few times, leaving pages that could have been fixed and checked (with my knowledge of said subject) broken; I'm sure many can recognise themselves in this.
It is indeed important to fix high-priority pages, and especially to improve capacity, but let's not forget about the people who edit on topics that they enjoy, and just want to use the bot on something that might not be important according to some maintenance category, but is important to them. The more people that want to keep using the bot, the better! Redalert2fan (talk) 00:22, 10 August 2021 (UTC)[reply]
@Redalert2fan: I think you are missing my point, which is that there is no benefit to anyone in having the bot process lots of articles where there is nothing for it to do. No matter how important anyone thinks an article is, there is no gain in having the bot spend ages deciding that there is nothing to do.
The reason that single pages get locked out is that the bot's capacity is being used up by these speculative trawls, by which I mean simply that they are not categories selected because they concentrate articles which need the bot's attention -- these are "see if you find anything" batches, rather than "clean up the problem here" batches.
One or two editors are repeatedly feeding it big categories on a huge variety of topics, simply because they are big categories which fit under the 4,400-page limit for categories. I have analysed the results, and in many cases only 10%–15% of the pages are edited, and only about half of those have non-trivial changes; that leaves roughly 5%–7% of pages with a substantive edit, so about 95% of the time that the bot spends on these huge categories is completely wasted.
When a resource is limited, it is best used by prioritising pages which have been selected on the basis that there is a high likelihood of something to do. --BrownHairedGirl (talk) • (contribs) 00:44, 10 August 2021 (UTC)[reply]
I see; there is no denying that there is no gain in having the bot spend ages deciding that there is nothing to do.
Wouldn't it be an even quicker fix to ask these few editors why they run these not-so-immediately-helpful categories, and notify them of the problems it causes and the benefits that can be gained by not requesting such categories? It seems more like operator error than a bot mistake, and limiting the bot's abilities for something that is caused by a few users seems questionable.
I agree with the points you make, but I don't feel we should limit hundreds of potential editors who request pages with what you would describe as "less than optimal requests" because of 2 or so people. Even though capacity is limited, I don't think we need a strict priority system. If some random editor wants to request a hundred pages that have their interest, we can't expect everyone to know beforehand whether their request is an optimal use of the system, and "see if you find anything" might still be an honest use case. Obviously if, as you say, specific editors who seem to know what they are doing use the bot in a way that basically blocks out others for little gain all the time, action should be taken.
Some sort of priority system might indeed be a good idea, whether it is in the way of important maintenance categories, "the pages with a high likelihood of something to do", or just giving priority to small requests, etc. Though it has to be a priority system for some types of requests, not a limitation for all requests, in my opinion, especially if the problem comes from a very minor selection of users. Redalert2fan (talk) 01:11, 10 August 2021 (UTC)[reply]
Max category size shrunk by one-quarter. AManWithNoPlan (talk) 13:08, 10 August 2021 (UTC)[reply]
Thanks, AManWithNoPlan. That's helpful, but wouldn't it be better to reduce it to the same 2200 limit size as the linked-from-page limit? --BrownHairedGirl (talk) • (contribs) 14:32, 10 August 2021 (UTC)[reply]
@Redalert2fan: Since early July, when I started using Citation bot to clean up bare URLs, I have seen two editors repeatedly using the bot unproductively.
One was JamCor, who was using the bot to process the same set of almost 200 articles 3 or 4 times per day. Many of the articles were huge, taking several minutes each to process, so I estimated that about 20% of the bot's time was being spent on those 200 articles. I raised it on JamCor's talk, and they stopped, but only after the second request.
The other is Abductive, with whom I raised the problem several times on this page: see User talk:Citation_bot/Archive 27#throttling_big_category_runs. Sadly, that one persists, and I gave up making my case. When I started writing this post a few hours ago in response to you, I analysed the then-current recent contribs of the bot. Abductive had started the bot scanning Category:Use dmy dates from August 2012, and by then the bot had processed 1433 of the category's 4379 pages, but had saved an edit on only 141 of them, i.e. less than 10%. As with many of Abductive's previous big runs, I can't see any way in which this run could have been selected as a set which concentrates articles of interest to Abductive, or which concentrates articles of high importance, or which concentrates articles that have been identified as being likely to have problems this bot can fix. The only criterion which I can see for its selection is that its size (4379 pages) is very close to the 4400 maximum size of Citation bot category jobs. A quick glance at the parent Category:Use dmy dates shows that very few categories are so close to the limit without exceeding it.
So AFAICS, the only reason for selecting this job was that it is a big set of articles which can be thrown at the bot with no more effort than copy-pasting the category title. I may of course have missed something, and if so I hope that Abductive will set me right. --BrownHairedGirl (talk) • (contribs) 14:33, 10 August 2021 (UTC)[reply]
I meant cut to one fourth, not cut by one fourth. So, category is now half the linked pages API. AManWithNoPlan (talk) 14:37, 10 August 2021 (UTC)[reply]
Cut to 1100 items! This is extreme. Grimes2 (talk) 14:53, 10 August 2021 (UTC)[reply]
@AManWithNoPlan: thanks. 1100 is much better.
However, that will reduce but not eliminate the problem of an editor apparently creating bot jobs just because they can. Such jobs will now require 4 visits to the webform in the course of a day, rather than just one, but that's not much extra effort. --BrownHairedGirl (talk) • (contribs) 15:13, 10 August 2021 (UTC)[reply]
People using the bot is not a problem. Abductive (reasoning) 18:02, 10 August 2021 (UTC)[reply]
Indeed, people using the bot is not a problem.
The problem is one person who repeatedly misuses the bot. --BrownHairedGirl (talk) • (contribs) 18:27, 10 August 2021 (UTC)[reply]
It is not possible to misuse the bot. Having the bot make the tedious decisions on what needs fixing is far more efficient than trying to work up a bunch of lists. Unfortunately, even the best list can be ruined if the API that the bot checks happens to be down. This is why it is inadvisable to create lists that concentrate on one topic. Abductive (reasoning) 19:49, 10 August 2021 (UTC)[reply]
Bot capacity is severely limited. There is no limit on how much editors can use other tools to make lists, so lists make more efficient use of the bot.
Think of the bot like a hound, which is far more effective at finding quarry if started in the right place. The hound will waste a lot of time if started off miles away from the area where previous clues are.
Lots of other editors are targeting the bot far more effectively than your huge category runs. --BrownHairedGirl (talk) • (contribs) 22:11, 10 August 2021 (UTC)[reply]
Hey BrownHairedGirl, I agree with your ideas, but in the end there are no rules for what the bot can be used for, so calling it misuse isn't a fair description. Anyone is allowed to use it for anything. Abductive can request what he wants, and creating bot jobs just because you can is allowed. In my eyes every page is valid to check (provided it isn't just a frequent repeat of the same page or group of pages). Redalert2fan (talk) 00:13, 11 August 2021 (UTC)[reply]
Just to be sure, whether that is the optimal way to use the bot or not is still a fair point of discussion. Redalert2fan (talk) 00:17, 11 August 2021 (UTC)[reply]
The question of self-restraint by users of an unregulated shared asset is a big topic in economics.
The article on the tragedy of the commons is an important read. It's well written but long; if you want a quick summary, see the section #Metaphoric meaning.
In this case, it would take only 4 editors indiscriminately feeding the bot huge sets of poorly-selected articles to create a situation where 90% of the bot's efforts changed nothing, and only 5% did anything non-trivial. That would be a tragic waste of the fine resource which the developers and maintainers of this bot have created, and would soon lead to calls for regulation. The question now is whether enough editors self-regulate to avoid the need for restrictions. --BrownHairedGirl (talk) • (contribs) 05:30, 11 August 2021 (UTC)[reply]
@AManWithNoPlan: the new limit of 1100 does not seem to have taken effect; see this[1] at 18:07, where the bot starts work on a category of 1156 pages.
That may be due to expected delays in how things get implemented, but I thought it might help to note it. --BrownHairedGirl (talk) • (contribs) 18:51, 10 August 2021 (UTC)[reply]
Bot rebooted. AManWithNoPlan (talk) 20:15, 10 August 2021 (UTC)[reply]
Max category size cut again, to 550, and the bot now prints out the list of category pages so that people can use the linked-pages API instead. That also means that if the bot crashes, the person can restart it where it left off, instead of redoing the whole thing as happens with the category code. AManWithNoPlan (talk) 20:25, 10 August 2021 (UTC)[reply]
Great work! Thanks. --BrownHairedGirl (talk) • (contribs) 22:02, 10 August 2021 (UTC)[reply]

It seems that the low-return speculative trawls have re-started. @Abductive has just run a batch job of Category:Venerated Catholics by Pope John Paul II; 364 pages, of which only 29 pages were actually edited by the bot, so 92% of the bot's efforts on this set were wasted. The lower category limit has helped, because this job is 1/10th of the size of similar trawls by Abductive before the limit was lowered ... but it's still not a good use of the bot. How can this sort of thing be more effectively discouraged? --BrownHairedGirl (talk) • (contribs) 11:57, 27 August 2021 (UTC)[reply]

A number of editors have pointed out to you that using the bot this way is perfectly acceptable. In addition, there are almost always four mass jobs running, meaning that users with one article can't get access to the bot. A run of 2200 longer articles takes about 22 hours to complete, so if I had started one of those, it would have locked such users out for nearly a day. By running a job that lasted less than an hour, I hoped that requests for smaller and single runs could be accommodated. And, in fact, User:RoanokeVirginia was able to use the bot as soon as my run completed. Abductive (reasoning) 18:14, 27 August 2021 (UTC)[reply]
@Abductive: on the contrary, you are the only editor who repeatedly wastes the bot's time in this way. It is quite bizarre that you regard setting the bot to waste its time as some sort of good use.
On the previous two occasions when you did it, the result was that the limits on job size were drastically cut. --BrownHairedGirl (talk) • (contribs) 18:47, 27 August 2021 (UTC)[reply]
That was in response to your complaints. Since I ran a job that was within the new constraints, I was not misusing the bot. You should request that the limits be increased on manually entered jobs, and decreased on category jobs. There is no particular reason that 2200 is the maximum. Abductive (reasoning) 18:52, 27 August 2021 (UTC)[reply]
@Abductive: you continue to evade the very simple point that you repeatedly set the bot to do big jobs which achieve almost nothing, thereby displacing and/or delaying jobs which do improve the 'pedia. --BrownHairedGirl (talk) • (contribs) 19:04, 27 August 2021 (UTC)[reply]
Using the bot to check a category for errors is an approved function of the bot. The fundamental problem is the limit of 4 jobs at a time. Also, the bot is throttled to run considerably slower than it could, which is a holdover from the time when it was less stable. The various throttlings, which as I recall were implemented in multiple places, should be re-examined and the bot re-tuned for its current capabilities. Abductive (reasoning) 19:11, 27 August 2021 (UTC)[reply]
This is not complicated. Whatever the bot's speed of operation, and whatever the limit on concurrent jobs, its capacity is not well used by having it trawl large sets of pages where it has nothing to do. I am surprised that you repeatedly choose to ignore that. --BrownHairedGirl (talk) • (contribs) 19:19, 27 August 2021 (UTC)[reply]
I am not ignoring anything. Bots exist to do tedious editing tasks. Your notion that editors have to do the tedious work before giving the bot a task is contrary to the purpose of bots. A number of proposals have been put forward to improve bot performance or relieve pressure on the bot, such as allowing multiple instances of the bot, or allowing users to run the bot from their userspace. These proposals have not been implemented. As the bot is currently configured, there will always be load problems. Abductive (reasoning) 19:29, 27 August 2021 (UTC)[reply]
Load problems that you are exacerbating. We've requested a million times to have better scheduling, or more resources, but no dice so far. You're cognizant there's an issue, and yet you repeatedly feed the bot low-priority, low-efficiency work. That's pretty WP:DE / WP:IDIDNTHEARTHAT behaviour from where I stand. Headbomb {t · c · p · b} 19:34, 27 August 2021 (UTC)[reply]
Increasing the bot's capacity would:
  • require a lot of work by the editors who kindly donate their time to maintain and develop this bot. WP:NOTCOMPULSORY, and they should not be pressed to donate more time. Their efforts are a gift from them, not a contract.
  • exacerbate to some extent the usage limitations of the external tools which the bot relies on. Increasing the speed of the bot's operation will mean that those limits are encountered more frequently.
The bot will probably always have load problems, because there is so much work to be done.
Two examples:
  1. Headbomb's jobs of getting the bot to clean up refs to scholarly journals. That is high-value, because peer-reviewed journals are the gold standard of WP:Reliable sources, and it is also high labour-saving, because those citations are very complex and so a big job for editors to fix manually. It is high-intensity work for the bot, because many of the articles have dozens or even hundreds of citations. I dunno Headbomb's methodology for building those jobs or what numbers can be estimated from that, but I assume that tens of thousands of such pages remain to be processed.
  2. my jobs targeting bare URLs are focused on a longstanding problem of the core policy WP:V being undermined by linkrot, which may become unfixable. I have lists already prepared of 75,000 articles which need the bot's attention, and have a new methodology mostly mapped out to tackle about 300,000 more of the ~450k articles with bare URL refs.
My lists are (like Headbomb's lists) all of bot-fixable problems, so they don't waste the bot's time, but they do not tackle such high-value issues as Headbomb's list, so I regard mine as a lesser priority than Headbomb's.
So whatever the bot's capacity, there will be enough high-priority, high-efficiency work to keep it busy for a long time to come. It is not at all helpful for that work to be delayed or displaced because one editor likes to run big jobs but doesn't like doing the prep work to create productive jobs.
In the last few weeks I have approached 4 editors about what seemed to me to be poor use of the bot.
  • JamCor eventually stopped feeding the bot the same list of ~200 articles several times per day: User talk:JamCor#Citation bot
  • Awkwafaba kindly and promptly agreed to stop feeding the bot sets of mostly-declined AFC submissions: User talk:Awkwafaba#Use of Citation bot
  • Eastmain seems for now to have stopped feeding the bot sets of mostly-declined AFC submissions: User talk:Eastmain#Citation bot.
Only Abductive persists. --BrownHairedGirl (talk) • (contribs) 20:57, 27 August 2021 (UTC)[reply]

Stopping a batch job

A few hours ago, I screwed up and mistakenly fed the bot a batch of 2,195 pages which I had already fed to the bot. Oops. (The only good side is that only 14% of the pages were edited, which shows that the bot did a fairly thorough job on its first pass, especially since the second-pass edits are mostly minor.)

As far as I know, there is no way for a user to stop the bot trawling its way through the set, which is what I would have liked to do. Could this be added?

Since the bot logs the user who suggests each batch, and requests are subject to authentication via OAuth, it seems to me that it should in theory be possible for the bot to accept a "stop my batch" request. Tho obviously I dunno how much coding would be involved.

In addition to allowing users to cancel errors like mine, it would also allow users to stop other jobs which aren't having much benefit. --BrownHairedGirl (talk) • (contribs) 12:21, 27 August 2021 (UTC)[reply]
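A cancellation hook along those lines could be quite small. Here is a hypothetical sketch in illustrative Python (the bot is actually PHP, and every name below, including the flag directory, is invented): the OAuth username recorded for each batch keys a stop flag, which the batch loop polls between pages.

    import os

    FLAG_DIR = "/tmp/citation-bot-stop"  # invented location for per-user stop flags

    def request_stop(username: str) -> None:
        # Would be called by a hypothetical "stop my batch" web request,
        # after the user has authenticated via OAuth.
        os.makedirs(FLAG_DIR, exist_ok=True)
        open(os.path.join(FLAG_DIR, username), "w").close()

    def stop_requested(username: str) -> bool:
        return os.path.exists(os.path.join(FLAG_DIR, username))

    def process_page(page: str) -> None:
        # Stand-in for the bot's real per-page citation work.
        print("processing", page)

    def run_batch(username: str, pages: list[str]) -> None:
        for page in pages:
            if stop_requested(username):
                os.remove(os.path.join(FLAG_DIR, username))
                break  # abandon the remainder of the batch
            process_page(page)

Since the bot already authenticates requesters via OAuth, the same identity check would prevent one user from cancelling another user's batch.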