Wikipedia:Bots/Requests for approval/Joe's Null Bot 4
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.
Operator: Joe Decker (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)
Time filed: 23:27, Monday April 29, 2013 (UTC)
Automatic, Supervised, or Manual: Automatic
Programming language(s): Perl
Source code available: Yes, this would be a slightly modified version of the code for task 1.
Function overview: Update categories of AfC pending submissions with a purge/forcelinkupdate
Links to relevant discussions (where appropriate): Wikipedia_talk:WikiProject_Articles_for_creation#Maintenance_category_maintanence
Edit period(s): 1/day
Estimated number of pages affected: 1000 (purges, not edits)
Exclusion compliant (Yes/No): y
Already has a bot flag (Yes/No): y
Function details: Articles for Creation tracks pending submissions from new editors by days since submission in order to ensure that new editor submissions don't get stale before review. (See: Category:AfC pending submissions by age) As we saw with task 1, this sort of categorization-by-template-based-on-what-time-it-is doesn't function properly: the category a page lands in depends on the current date, but the page's category membership is only recalculated when the page is re-parsed. I propose to duplicate the existing Null Bot task 1 code but traverse Category:Pending AfC submissions, essentially keeping those categories more or less up to date. Existing task 1 bot code does exclusion compliance and honors the server load indications.
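As a rough illustration of the two kinds of API request this task implies, the sketch below builds the parameters for listing a category's members and for a purge with forcelinkupdate. It is not the bot's Perl source. The helper names are hypothetical, and passing several titles in one request is only an assumption about what is convenient; the parameter names themselves (`list=categorymembers`, `cmtitle`, `cmlimit`, `cmcontinue`, `action=purge`, `forcelinkupdate`, `titles`) are those of the MediaWiki Action API.

```python
def category_members_params(category, continue_token=None):
    """Parameters for one page of a list=categorymembers query."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmlimit": "max",
        "format": "json",
    }
    # The API returns a continuation token when more members remain.
    if continue_token is not None:
        params["cmcontinue"] = continue_token
    return params


def purge_params(titles):
    """Parameters for a purge that also refreshes link and category tables."""
    return {
        "action": "purge",
        "forcelinkupdate": "1",
        "titles": "|".join(titles),  # the API separates titles with "|"
        "format": "json",
    }
```

A single-title list works as well, which is closer to a bot that spaces out individual purges.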
This differs from task 1 in that the category affected is larger, so I'll have to tweak the category length paranoia check. The discussion at AfC linked above suggests a daily cap of 1500.
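One possible reading of the "category length paranoia check" is a hard cap on how many pages a single run will touch. The sketch below is an assumption, not the bot's code: the function name is hypothetical, and whether the real check truncates the list or refuses to run at all is not stated here. This version truncates and reports that it did so.

```python
def select_pages(titles, cap=1500):
    """Return at most `cap` titles, plus a flag saying whether any were dropped."""
    truncated = len(titles) > cap
    return titles[:cap], truncated
```

The flag lets the caller record that the run was deliberately cut short rather than silently skipping pages.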
Since I've been asked this about previous tasks: Yes, should the problem this works around be fixed, I'll gleefully dismantle the 'bot. See T20478, which my T39001 was marked a dupe of. I'm guessing it's a non-trivial thing to fix, though: the server isn't necessarily going to be able to backfigure through code to know when it should reevaluate the templates in advance, and there would certainly be a significant performance penalty for always recalculating categories on read.
Discussion
The number of runs is variable depending on the number of pending submissions. Ideally it will be under 400. Typically it will be 200-1500. Occasionally it will get over 2000. Throttling should be sensitive to the loads of the server. I recommend logging "All requested work done, N pages purged, HH:MM:SS elapsed" or something similar on a successful run, and a comparable message on a run that ended for any other reason, including a deliberate early termination due to too many pages to purge. It's okay if this is in "version 2.0," as it's more important to get something running soon; we can do enhancements later. davidwr/(talk)/(contribs)/(e-mail) 00:21, 30 April 2013 (UTC)
- davidwr: Thanks for the additional data on your experiences with the backlog size at AfC, much appreciated. Throttling is sensitive to server load: there's a nice preexisting mechanism for that in the MediaWiki software, and we're using it. My code stacks two layers of increasing backoffs when the server load exceeds a threshold. There's also an elapsed time limit already in my code, so both "max articles handled" and "max total run time" are trivially configured. These are all the sort of things we faced with task one, just in somewhat smaller numbers. Don't know if you read Perl, but you may find the source for task 1 illuminating: [1] --j⚛e deckertalk 01:18, 30 April 2013 (UTC)
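The exchange above (load-sensitive throttling, a page limit, a run-time limit, and a one-line summary log) can be sketched roughly as follows. This is a simplified illustration, not the task 1 source: it uses a single doubling backoff rather than the two stacked layers described, `purge` is a hypothetical callable that returns False when the server asks the client to wait, and every default value is an assumption chosen for the example. The sleep and clock functions are injectable so the loop can be exercised without real delays.

```python
import time


def format_elapsed(seconds):
    """Render a duration in seconds as HH:MM:SS."""
    total = int(seconds)
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def run(titles, purge, max_pages=1500, max_seconds=4 * 3600,
        min_interval=5.0, base_backoff=10.0, max_retries=5,
        sleep=time.sleep, clock=time.monotonic):
    """Purge titles one at a time and return a one-line summary for the log.

    `purge(title)` is hypothetical: True means the purge went through,
    False means the server reported it was too busy.
    """
    start = clock()
    done = 0
    reason = "All requested work done"
    for title in titles:
        if done >= max_pages:
            reason = "Stopped early: page limit reached"
            break
        if clock() - start >= max_seconds:
            reason = "Stopped early: time limit reached"
            break
        delay = base_backoff
        for _ in range(max_retries):
            if purge(title):
                done += 1
                break
            sleep(delay)   # server is busy: wait, then retry
            delay *= 2     # wait longer after each consecutive refusal
        else:
            # Every retry was refused; give up on the whole run.
            reason = "Stopped early: server stayed busy"
            break
        sleep(min_interval)  # pause between successive purges
    elapsed = format_elapsed(clock() - start)
    return f"{reason}, {done} pages purged, {elapsed} elapsed"
```

Because every exit path sets `reason`, the same log line is produced whether the run finished normally or was cut short.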
Approved for trial. Please provide a link to the relevant contributions and/or diffs when the trial is complete. One cycle. MBisanz talk 22:25, 30 April 2013 (UTC)
- Cycle started, will report when complete. Relative to the existing BLPPROD (task 1) code, I dropped the minimum interval between purges to 5 seconds, raised the maximum number of files addressed to 1500 (it should run about 800), and of course changed the category. --j⚛e deckertalk 22:45, 30 April 2013 (UTC)
- Trial complete. Cycle complete. There are a couple of things it doesn't need to hit (3-4 subcats don't need poking themselves) but they're doing no harm; the basic functionality worked. I did "look over its shoulder" as it was working, however, and noticed one flaw. The individual by-age categories, e.g., Category:AfC pending submissions by age/12 days ago, do update as expected here. The parent category, however, does not display updated totals for the by-age cats; it appears to want its own purge, and a plain old purge works fine. Tacking a single purge onto the very end of the run should do the trick. 819 purges, 5650 seconds run-time. Before and after snapshots of the category counts and the list of titles purged are at [2]. If you'd like, I'd be happy to do another run tomorrow with that one extra purge in place. --j⚛e deckertalk 00:48, 1 May 2013 (UTC)
- Approved. The extra purge is fine. MBisanz talk 02:21, 3 May 2013 (UTC)
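The approved change, one extra purge of the parent category at the very end of the run, might be arranged as in the sketch below. The function names are hypothetical, and taking Category:AfC pending submissions by age as the parent is this sketch's reading of the trial report rather than something confirmed by the bot's source. Each plan entry pairs a title with a flag for whether to request forcelinkupdate, so the parent gets the plain purge the report says is sufficient. The second helper simply reproduces the arithmetic on the reported trial figures.

```python
PARENT_CATEGORY = "Category:AfC pending submissions by age"


def purge_plan(titles, parent=PARENT_CATEGORY):
    """List (title, use_forcelinkupdate) pairs, ending with a plain parent purge."""
    plan = [(title, True) for title in titles if title != parent]
    plan.append((parent, False))  # a plain purge, placed last
    return plan


def seconds_per_purge(purges, run_seconds):
    """Average wall-clock seconds spent per purge over a run."""
    return run_seconds / purges
```

With the trial's figures, `seconds_per_purge(819, 5650)` comes to roughly 6.9 seconds, which fits a 5-second minimum interval plus the time each request takes.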
- The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.