
Wikipedia:Bots/Requests for approval

From Wikipedia, the free encyclopedia

BAG member instructions

If you want to run a bot on the English Wikipedia, you must first get it approved. To do so, follow the instructions below to add a request. If you are not familiar with programming it may be a good idea to ask someone else to run a bot for you, rather than running your own.

 Instructions for bot operators

Current requests for approval

DannyS712 bot 23

Operator: DannyS712 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 19:13, Wednesday, March 20, 2019 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): AWB

Source code available: AWB

Function overview: Extend task 22 to other templates from the same TfD

Links to relevant discussions (where appropriate): WP:Bots/Requests for approval/DannyS712 bot 22 and the discussions linked there

Edit period(s): One time run

Estimated number of pages affected: Up to ~7000 (some may overlap)

Exclusion compliant (Yes/No): No

Already has a bot flag (Yes/No): Yes

Function details:

Substitute {{PBB_Summary}} (originally: remove it) and remove its html comment (and its inline html comment) - 3543 transclusions

Substitute {{PBB Further reading}} - 3539 transclusions (not deletion, since it's just a wrapper for citations that should be kept)

Discussion

  • Zackmann08 is doing this using AWB already. {{3x|p}}ery (talk) 19:14, 20 March 2019 (UTC)
    @Pppery: I can help, given the number - see User talk:DannyS712#Your BRFA (22) for the discussion causing me to file this --DannyS712 (talk) 19:18, 20 March 2019 (UTC)
    Makes sense. I didn't notice task 22 et al until after posting that comment. {{3x|p}}ery (talk) 19:19, 20 March 2019 (UTC)
  • The content of {{PBB Summary}} is text that needs to be kept. Galobtter (pingó mió) 19:18, 20 March 2019 (UTC)
    @Galobtter: in that case, I can substitute both of the templates --DannyS712 (talk) 19:19, 20 March 2019 (UTC)
    DannyS712 & Pppery, No objection to a bot taking this over! I'll gladly duck out once this is approved! Zackmann (Talk to me/What I been doing) 20:04, 20 March 2019 (UTC)
  • Pinging @MarnetteD: to this as he was manually doing one of those templates, but was ok with a bot to take over. --Gonnym (talk) 20:14, 20 March 2019 (UTC)
    Thanks for the ping. Actually I was doing "summary" and "further reading" at the same time on most of my edits. Remember that all of the items in further reading need to stay just like the summary does. I'm guessing you already know this but I mention it just in case. MarnetteD|Talk 20:19, 20 March 2019 (UTC)
    @MarnetteD: Yes. Except, I just realized, while substituting further reading works great (Special:Diff/888699322), substituting the summary doesn't really (Special:Diff/888699171), so when I said "substitute" the summary template, I meant replace it using a regex to achieve the intended display, not literally "substitute" the template, since that keeps the parser functions and everything. --DannyS712 (talk) 20:25, 20 March 2019 (UTC)
    Thanks D! I'm not too familiar with all the ins and outs of bots so I appreciate your taking the time to explain things. MarnetteD|Talk 20:28, 20 March 2019 (UTC)
    DannyS712, I've made the summary template cleanly subst. Galobtter (pingó mió) 20:34, 20 March 2019 (UTC)

───────────────────────── @Galobtter: Wow, I just developed a regex to do it without substituting, but I guess it's all useless now. Anyway, just subst the templates and remove the inline comments. If anyone cares, the regex was:

Extended content
{{PBB[ _]Summary\n\|\s+section_title\s+=\s+\|\s+summary_text\s+=\s+(.*)\n}}

becomes

$1

and

{{PBB[ _]Summary\n\|\s+section_title\s+=\s+(\S+)\n\|\s+summary_text\s+=\s+(.*)\n}}

becomes

==$1==\n$2

(first regex for not having a section title, second for having one).

Anyway, I think this is good to go: substitute both templates, and remove the html comment. Thanks for the help, --DannyS712 (talk) 20:45, 20 March 2019 (UTC)
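For the record, the two replacements above can be exercised with a short Python sketch (the regexes are copied from the comment above; the sample wikitext is hypothetical):

```python
import re

# Regexes copied from the comment above; the sample page text is hypothetical.
NO_TITLE = re.compile(
    r"{{PBB[ _]Summary\n\|\s+section_title\s+=\s+\|\s+summary_text\s+=\s+(.*)\n}}")
WITH_TITLE = re.compile(
    r"{{PBB[ _]Summary\n\|\s+section_title\s+=\s+(\S+)\n\|\s+summary_text\s+=\s+(.*)\n}}")

def unwrap_summary(wikitext):
    """Replace {{PBB Summary}} with its displayed text (and heading, if any)."""
    wikitext = WITH_TITLE.sub(r"==\1==\n\2", wikitext)
    return NO_TITLE.sub(r"\1", wikitext)

sample = "{{PBB_Summary\n| section_title =\n| summary_text = The protein is widely expressed.\n}}"
print(unwrap_summary(sample))  # -> The protein is widely expressed.
```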

Galobot 3

Operator: Galobtter (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 17:54, Wednesday, March 20, 2019 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): Python (Pywikibot)

Source code available: here

Function overview: Replace uses of "letterhead" class with template

Links to relevant discussions (where appropriate):

Edit period(s): One time run

Estimated number of pages affected: ~400 (almost all results from search of "insource:"letterhead" insource:/class *= *["'][a-zA-Z0-9 ]*letterhead/")

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): Yes

Function details: Replaces <div class = "letterhead"> with {{letterhead start}} and the closing tag with {{letterhead end}}, so that the class "letterhead" can be removed from MediaWiki:Common.css. Uses mwparserfromhell so it should be good with edge cases.
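A minimal regex-based illustration of the replacement (the bot itself uses mwparserfromhell, which handles the nesting and attribute edge cases this sketch ignores; the sample wikitext is hypothetical):

```python
import re

def replace_letterhead(wikitext):
    """Swap <div class="letterhead"> for {{letterhead start}} and the closing
    </div> for {{letterhead end}}. Simplified: assumes the letterhead div is
    the only div on the page, an assumption the real mwparserfromhell-based
    bot does not need."""
    wikitext = re.sub(r'<div\s+class\s*=\s*["\']letterhead["\']\s*>',
                      '{{letterhead start}}', wikitext)
    return wikitext.replace('</div>', '{{letterhead end}}')

sample = '<div class = "letterhead">Dear Sir,</div>'
print(replace_letterhead(sample))  # -> {{letterhead start}}Dear Sir,{{letterhead end}}
```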

Discussion

KadaneBot 3

Operator: Kadane (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 16:10, Tuesday, March 19, 2019 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): Python

Source code available: Not published yet

Function overview: Tags redirects with {{R to disambiguation page}}, {{R from unnecessary disambiguation}}, and {{R from incomplete disambiguation}} if it meets criteria described in function details.

Links to relevant discussions (where appropriate): Wikipedia:Bot_requests#Tag_with_Template:R_from_unnecessary_disambiguation

Edit period(s): Monthly

Estimated number of pages affected: ~56,417 first run

Exclusion compliant (Yes/No): No

Already has a bot flag (Yes/No): Yes

Function details:

Case 1: If a redirect exists Foo (bar) -> Foo where bar does not equal disambiguation AND Foo is NOT a disambiguation page, then tag Foo (bar) with {{R from unnecessary disambiguation}}
Currently 39,963 articles fit this case


Case 2: If a redirect exists Foo (bar) -> Foo where bar does not equal disambiguation AND Foo IS a disambiguation page, then tag Foo (bar) with {{R from incomplete disambiguation}}.
Currently 16,427 articles fit this case


Case 3: If a redirect exists Foo (disambiguation) -> Foo AND Foo is a disambiguation page AND Foo (disambiguation) is NOT malformed, then tag Foo (disambiguation) with {{R to disambiguation page}}
Currently 27 articles fit this case
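The three cases above amount to a small decision function; a sketch (title parsing simplified, and the malformed-title checks discussed below are omitted):

```python
import re

def classify_redirect(title, target, target_is_dab):
    """Return the redirect-category template for a redirect Foo (bar) -> Foo,
    per the three cases above, or None if no tag applies. A sketch only:
    the real bot also skips malformed disambiguators."""
    m = re.fullmatch(r"(.+?) \((.+)\)", title)
    if not m or m.group(1) != target:
        return None
    qualifier = m.group(2)
    if qualifier == "disambiguation":
        # Case 3: Foo (disambiguation) -> Foo, where Foo is a dab page.
        return "R to disambiguation page" if target_is_dab else None
    if target_is_dab:
        return "R from incomplete disambiguation"   # Case 2
    return "R from unnecessary disambiguation"      # Case 1

print(classify_redirect("Typhoon Haikui (2012)", "Typhoon Haikui", False))
# -> R from unnecessary disambiguation
```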


The following functionality/logic exists for all 3 cases:

Discussion

  • A sample of 1000 edits the bot would make (under current functional details) along with the template it would add to the page is listed at User:KadaneBot/Sandbox Kadane (talk) 16:11, 19 March 2019 (UTC)

Comment @Kadane: The following should be tagged as {{R from incomplete disambiguation}} instead of {{R from unnecessary disambiguation}}

Extended content

Those can be identified by the landing page being a disambiguation page.

This one should be skipped, or tagged with something else (investigating)

These ones should be skipped as malformed DAB pages (missing space, capital D), but collecting them so they can be RFD's would be good.

Headbomb {t · c · p · b} 17:11, 19 March 2019 (UTC)

Okay I have updated the functional details of the bot to fix the cases you brought up. I will update the table of edits when I make it home. Kadane (talk) 19:23, 19 March 2019 (UTC)
@Headbomb: I have uploaded new edits to User:KadaneBot/Sandbox. It contains 100 edits of each of the cases, with the exception of {{R to disambiguation page}} which only has 22 edits total. I have also included all of the malformed disambiguation pages (these will not be modified by the bot, just included in the log). Kadane (talk) 05:48, 20 March 2019 (UTC)

Better, although

Should be tagged with {{R from incomplete disambiguation}} instead of {{R from unnecessary disambiguation}}. Headbomb {t · c · p · b} 09:31, 20 March 2019 (UTC)

@Headbomb: - There was an error in my CSV parsing from the database dump. I forgot to set the parameter quoting=csv.QUOTE_NONE, which resulted in some lines being skipped when the database query was being scanned. Because of this some articles and disambiguation pages were being ignored. This is fixed now. I clicked through most of the cases and I can't find any errors. User:KadaneBot/Sandbox is updated. Kadane (talk) 15:17, 20 March 2019 (UTC)
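The quoting pitfall described here is easy to reproduce with a toy example (hypothetical lines, not the actual dump): under the default quoting, an unbalanced double quote at the start of a field makes the reader swallow the following lines as part of one quoted field.

```python
import csv

lines = ['Good (song),Good', '"Odd title,Target', 'Next (film),Next']

default_rows = list(csv.reader(lines))                       # quote-aware parsing
raw_rows = list(csv.reader(lines, quoting=csv.QUOTE_NONE))   # quotes are literal

print(len(default_rows))  # 2 -- the third line was swallowed into the second
print(len(raw_rows))      # 3 -- every line survives
```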
Of all cases, the following aren't really disambiguation pages.

Maybe a full list should be created so we can purge all cases that shouldn't be tagged. Everything else look fine though. Headbomb {t · c · p · b} 18:03, 20 March 2019 (UTC)

To save time, that full list to review could exclude things that end in \s\(.* (album|song|single|EP|soundtrack|network|channel|episode|series|film|journal|magazine|website|company|publisher|newspaper|station|decade|numeral|number|game|novel|book|gene)\) since those are safe. Headbomb {t · c · p · b} 21:02, 20 March 2019 (UTC)

───────────────────────── Alright, all edits have been saved, with the articles that end in what you listed above removed.

Kadane (talk) 21:52, 20 March 2019 (UTC)

Case 3 are all fine, I'll review Case 1 and 2. Headbomb {t · c · p · b} 22:09, 20 March 2019 (UTC)
Actually Always(song)) and a few others with )) are malformed. Headbomb {t · c · p · b} 22:12, 20 March 2019 (UTC)

So are

Extended content

Headbomb {t · c · p · b} 22:19, 20 March 2019 (UTC)

Ah I was under the impression that we only checked malformed disambig on case 3 (when name ends with (disambiguation)). Updated the logic to check for malformed disambigs for all cases. Kadane (talk) 22:37, 20 March 2019 (UTC)

There are actually a few more, which I've sent to RFD.

Extended content

Headbomb {t · c · p · b} 22:49, 20 March 2019 (UTC)

@Kadane:, actually could you break User:KadaneBot/Task3/Case 1 in sections of 100 KB tops? Those pages are pretty slow to load/edit (I have scripts that classify type of links, which slow down these pages considerably). Headbomb {t · c · p · b} 23:06, 20 March 2019 (UTC)

 Done @Headbomb: Also I am catching disambiguation misspellings as well as other words appearing next to disambiguation between parenthesis. If there are any other misspellings they should probably be excluded manually unless there is a pattern. Kadane (talk) 23:15, 20 March 2019 (UTC)

Could you also break down redirects into 'species', e.g. all those ending with \s\(*album\) into a subpage (or section), all those ending with \s\(*song\) into another, and so on (and everything else considered "Other")? At least for endings in

  • \d (i.e. ends with digits, like Typhoon Haikui (2012)); album; AM; band; book; channel; comics; company; cricketer; decade; district; EP; episode; film; FM; footballer; game; gene; Germany; German Empire; journal; magazine; name; network; newspaper; novel; number; numeral; politician; publisher; series; show; single; song; soundtrack; station; United States; video; website

All case insensitive. Headbomb {t · c · p · b} 23:18, 20 March 2019 (UTC)

@Kadane: and could you also put the target page in those lists? Headbomb {t · c · p · b} 23:21, 20 March 2019 (UTC)
I am on my way to class but I can do that in a couple hours. Kadane (talk) 23:23, 20 March 2019 (UTC)
No rush. Enjoy class. Headbomb {t · c · p · b} 23:24, 20 March 2019 (UTC)

DannyS712 bot 21

Operator: DannyS712 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 08:38, Tuesday, March 19, 2019 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): AWB

Source code available: AWB

Function overview: Auto-classify stubs as stub-class

Links to relevant discussions (where appropriate): Wikipedia:Village pump (idea lab)#Automatically mark all stubs as stub-class

Edit period(s): As needed, large run at first

Estimated number of pages affected: Lots (631744 total unassessed pages, so I'd guess a ballpark around ~100,000?)

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): Yes

Function details: Automatically assess articles as stub-class if they are tagged as stubs.
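A rough sketch of what such an assessment edit looks like on the talk page (the regex is illustrative only; real WikiProject banners can contain nested templates, and banners with an empty |class= would need extra handling):

```python
import re

BANNER = re.compile(r'\{\{(WikiProject [^|{}]+)((?:\|[^{}]*)?)\}\}')

def assess_as_stub(talk_wikitext):
    """Add |class=stub to WikiProject banners that carry no class parameter."""
    def repl(m):
        name, params = m.group(1), m.group(2)
        if re.search(r'\|\s*class\s*=', params):
            return m.group(0)  # already assessed (or explicitly blank); skip
        return '{{' + name + params + '|class=stub}}'
    return BANNER.sub(repl, talk_wikitext)

print(assess_as_stub('{{WikiProject Biography|living=yes}}'))
# -> {{WikiProject Biography|living=yes|class=stub}}
```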

Discussion

  • @DannyS712: I'm confused about the task. Does the bot task attempt to determine which articles are stubs itself, or does it rely on a stub template being placed in the article? {{3x|p}}ery (talk) 00:35, 20 March 2019 (UTC)
    @Pppery: it only classifies pages that are already tagged as stubs with stub templates in the article itself. The bot would make no determinations itself --DannyS712 (talk) 01:16, 20 March 2019 (UTC)
    In that case, (1) AnomieBOT is already approved for a superset of this task (ping: Anomie), and (2) I agree with Czar in the linked discussion that this is better suited for coding added to the individual assessment templates (or Template:WPBannerMeta) than a bot task. {{3x|p}}ery (talk) 01:23, 20 March 2019 (UTC)
    @Pppery: the superset AnomieBOT is approved for is for tagging at the request of a wikiproject, and only edits that wikiproject's template (Anomie please correct me if I'm wrong) - this would tag ''all'' unassessed stubs as stub-class. --DannyS712 (talk) 01:41, 20 March 2019 (UTC)
    You're correct as to what AnomieBOT is approved for there. Also I haven't had much time/motivation to run that task in a long time now. Anomie 12:10, 20 March 2019 (UTC)
    @DannyS712 and Czar: I've coded a (somewhat crude) template that detects whether a page is a stub at User:Pppery/is page a stub. {{3x|p}}ery (talk) 01:49, 20 March 2019 (UTC)
    @Pppery: such an approach would mean that wikiprojects can't opt out of the auto-categorization, and editors can't see that there is an assessment when looking in edit mode. --DannyS712 (talk) 02:52, 21 March 2019 (UTC)
    Proper coding of the template would remedy the first stated downside (it would be fairly easy to add a |AUTODETECT_STUB=, for instance). To me, the ability to see a stub assessment in edit mode is not a virtue worth expending tens of thousands of bot edits to produce. {{3x|p}}ery (talk) 03:28, 21 March 2019 (UTC)

DannyS712 bot 17

Operator: DannyS712 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 07:13, Tuesday, March 19, 2019 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): AWB

Source code available: AWB

Function overview: Replace {{Expand language}} with the more appropriate language-specific template

Links to relevant discussions (where appropriate): Template talk:Expand language#Bot run,

Edit period(s): One time run, then as needed

Estimated number of pages affected: ~3000 at first (originally estimated as 54494; see discussion)

Exclusion compliant (Yes/No): No

Already has a bot flag (Yes/No): Yes

Function details: Based on the |langcode= parameter of the template, replace the template with that language's specific template, if it exists.
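The replacement can be sketched as a pure wikitext transform (the langcode-to-name table here is a hypothetical subset; the real run needs an entry per existing language-specific template):

```python
import re

# Hypothetical subset of the langcode -> language-name mapping; the real run
# would need an entry for every language-specific template that exists.
LANG_NAMES = {'fr': 'French', 'de': 'German', 'es': 'Spanish'}

TEMPLATE = re.compile(r'\{\{Expand[ _]language((?:\|[^{}]*)?)\}\}')

def use_specific_template(wikitext):
    """Swap {{Expand language|langcode=xx|...}} for {{Expand Xxx|...}},
    dropping the now-redundant langcode parameter. A sketch of the task."""
    def repl(m):
        params = m.group(1) or ''
        code = re.search(r'\|\s*langcode\s*=\s*([a-z]+)', params)
        if not code or code.group(1) not in LANG_NAMES:
            return m.group(0)  # no specific template known; leave as is
        rest = params[:code.start()] + params[code.end():]
        return '{{Expand ' + LANG_NAMES[code.group(1)] + rest + '}}'
    return TEMPLATE.sub(repl, wikitext)

print(use_specific_template('{{Expand language|langcode=fr|topic=history}}'))
# -> {{Expand French|topic=history}}
```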

Discussion

Actually only three thousand pages ( Special:Search/insource:"Expand language" ) - each language specific template calls {{Expand language}} so it gets counted as a transclusion there too. Galobtter (pingó mió) 16:47, 19 March 2019 (UTC)

@Galobtter: oh, that makes a lot of sense. I have changed the estimated number of pages accordingly. Thanks, --DannyS712 (talk) 16:52, 19 March 2019 (UTC)

JATMBot

Operator: Tymon.r (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 00:16, Friday, March 15, 2019 (UTC)

Function overview: Maintenance – automatic (procedural) closure of WP:AfD discussions when nominated pages do not exist

Automatic, Supervised, or Manual: automatic

Programming language(s): Python

Source code available: standard pywikipedia

Links to relevant discussions (where appropriate): n/a

Edit period(s): continuous, being run every 3 minutes

Estimated number of pages affected: up to a few a day

Namespace(s): Wikipedia

Exclusion compliant (Yes/No): No

Function details: Closing, per WP:PCLOSE, AfD discussions whose nominated pages do not exist, e.g. when they've already been speedy deleted or their title is mistyped, and informing the nominator about the closure on their user talk page. In every run the bot will go through the AfD log pages for the last 7 days and check for the existence of nominated pages. If a nominated page doesn't exist, it will close (edit) the page's AfD discussion in accordance with WP:AFD/AI and then inform the nominator about the closure performed, stating the possible reasons for it (title mistyped, article speedy deleted, etc.). The bot shall not perform any actions/closures when a decision ought to be made. In the future the bot's functionality could be extended to other XfDs, but if so, it'd be requested in a separate BRFA. Best, Tymon.r Do you have any questions? 00:16, 15 March 2019 (UTC)
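The log-scanning part of the task can be sketched without any wiki access (page-title handling is simplified, and the actual existence checks and closing edits, which would go through pywikibot, are omitted):

```python
import datetime

AFD_PREFIX = 'Wikipedia:Articles for deletion/'

def afd_log_titles(end_date, days=7):
    """Titles of the last ``days`` daily AfD log pages ending at end_date.
    A sketch of the scan described above."""
    titles = []
    for offset in range(days):
        day = end_date - datetime.timedelta(days=offset)
        titles.append(f'{AFD_PREFIX}Log/{day.year} {day:%B} {day.day}')
    return titles

def nominated_article(discussion_title):
    """Map an AfD discussion subpage back to its nominated article.
    (A real bot would first skip the Log/ subpages themselves.)"""
    assert discussion_title.startswith(AFD_PREFIX)
    return discussion_title[len(AFD_PREFIX):]

print(afd_log_titles(datetime.date(2019, 3, 15), days=2))
```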

Discussion

I note User:AnomieBOT does something similar for WP:TFD and WP:FFD (and some related tasks at WP:CFD, but currently detecting the nominated categories there seems too prone to errors), although that doesn't preclude your bot doing this task for WP:AFD.

I see your BRFA says the source code is "standard pywikipedia", but I don't see any script included with Pywikibot for doing this task. Useful additional features compared to your manual diff include relaying the deleting admin and deletion reason from the log (after verifying it's not a deletion log entry previous to the AFD itself), detecting "moved without redirect" as being distinct from "nominated title does not exist", and allowing the deleting admin a chance to manually close before the bot does it for them. Anomie 13:21, 15 March 2019 (UTC)

@Anomie, thanks for your input and your work with User:AnomieBOT! Agreed – my description of the programming technique used to create the bot is imprecise. The bot'd be based on pywikipedia, but, as you mentioned, it'd need to use some additional self-written scripts to handle non-standard operations, e.g. checking the log of a deleted page. For the time period in which an admin could close the AfD themselves – I'd propose a 10-minute delay before performing an automated closure. Best, Tymon.r Do you have any questions? 13:24, 16 March 2019 (UTC)

GreenC bot 12

Operator: GreenC (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 14:34, Tuesday, March 12, 2019 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): GNU Awk and Botwikiawk framework

Source code available: TBU

Function overview: Add Template:Austria population Wikidata to infoboxes

Links to relevant discussions (where appropriate):

Edit period(s): One time

Estimated number of pages affected: 2100

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): Yes

Function details: Example edit - it replaces existing hard-coded data with the template that displays it dynamically pulled in from WikiData.

Discussion

@GreenC: my reading of the TFD does not match the request's interpretation. From the TfD you linked: The population should either be directly placed on the page or stored in WikiData.; from the request you linked: deleted with a consensus that they (the old templates) should be replaced by WikiData figures. As far as the example edit goes, you are replacing "directly placed" and cited values with a template. Are you validating that the template's values are more recent, and even that they actually contain data? — xaosflux Talk 14:59, 12 March 2019 (UTC)

I'm going by what eh bien mon prince requested be done, and what looks like the obvious best solution. Even though there is not a huge amount of discussion, it didn't seem like something that would be controversial. I assume the data exists in WikiData (I asked eh bien mon prince on the Bot Request page for "a WikiData query listing names of articles on enwiki that are ready to use the template"; that query works, but it only verifies one of the fields, not all five). The bot is currently not polling WikiData to verify data existence. eh bien mon prince, do you think that is necessary, or did you load the data into WikiData yourself and are confident it is there? Or can we make a more comprehensive query to verify all five exist for each? -- GreenC 16:37, 12 March 2019 (UTC)
My primary concerns are that this could result in less total, cited, or accurate data being presented in our articles - and how we are going to ensure that is avoided. — xaosflux Talk 18:04, 12 March 2019 (UTC)
Sure, that is a valid concern. Will wait to see what eh bien mon prince thinks in terms of where the WikiData data came from and if it is up to date and accurate compared to on-wiki. -- GreenC 18:09, 12 March 2019 (UTC)
Here is a stricter query that checks for area, population and verifies that the date matches the one from the source. I uploaded all the figures myself, from the website of the Austrian statistical office, everything is sourced directly to them. Note that the 'directly placed' stats are 2 years out of date, because the metadata templates were deleted before the move to Wikidata was carried out.--eh bien mon prince (talk) 19:57, 12 March 2019 (UTC)
The data looks complete (sorting on the columns shows they all contain something), so it would be redundant to code a Wikidata check into the bot. It sounds like the Wikidata info is at least as up to date as the static in-wiki data. BTW eh bien mon prince, I noticed some entries in the 'municipality_of_AustriaLabel' column do not match up with the enwiki article name (some on enwiki have a trailing "(Austria)" disambiguator, for example Raiding (Austria)). Is there a way to show the enwiki article name? If not, the bot will log whatever it misses. -- GreenC 20:22, 12 March 2019 (UTC)
Try this: query.--eh bien mon prince (talk) 20:30, 12 March 2019 (UTC)
That will work, thanks! -- GreenC 20:35, 12 March 2019 (UTC)

The bot ran successfully offline, ready for live trials. -- GreenC 00:40, 13 March 2019 (UTC)

A user has requested the attention of a member of the Bot Approvals Group. Once assistance has been rendered, please deactivate this tag by replacing it with {{tl|BAG assistance needed}}. -- GreenC 13:47, 20 March 2019 (UTC)

DannyS712 bot 13

Operator: DannyS712 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 10:01, Sunday, March 10, 2019 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): Javascript

Source code available: User:DannyS712 test/GTK.js

Function overview: Easy tag all of a category's subcategories with notices that they are being discussed at CfD

Links to relevant discussions (where appropriate): Wikipedia talk:Categories for discussion/Archive 17#Tagging bot

Edit period(s): As needed

Estimated number of pages affected: Varies, I'd guess ~50 per run

Exclusion compliant (Yes/No): No

Already has a bot flag (Yes/No): Yes

Function details: Traverse through the subcategories within a category and add tags to them. Examples of when this would be really useful are listed in that discussion, but in general it would be for mass-cfds ("Rename XYZ and all of its subcategories", etc)

Discussion

How would you ever know it is time to run this task? — xaosflux Talk 03:20, 11 March 2019 (UTC)
@Xaosflux: It would run as-needed (I'd leave a note at CfD saying that if people wanted to nominate an entire category tree, or a list of categories that's really long, and don't want to tag them manually, I could do it for them). --DannyS712 (talk) 03:24, 11 March 2019 (UTC)
@DannyS712: this almost feels like it would be better as a user script (e.g. 'xfd-batch' for twinkle, with a 'recurse' option), any thoughts?
  • Do you plan on any sort of max-tag-per-request limits here? — xaosflux Talk 03:33, 11 March 2019 (UTC)
    @Xaosflux: this would be a user-script, and I would be willing to run it from my account rather than the bot. I don't know what xfd-batch is (admin only?) but since WP:BOTDICT defines automated editing as Refers to editing that is done automatically, without human review, i.e. editing done by bots., I thought I should err on the safe side and file a BRFA, because I do not intend to review each edit --DannyS712 (talk) 03:37, 11 March 2019 (UTC)
    As for max-tag-per-request, since it would be triggered manually I would decide for each request if it seems to broad --DannyS712 (talk) 03:38, 11 March 2019 (UTC)
  • 'xfd-batch' doesn't exist in twinkle, I was comparing it to some options like delete-batch, protect-batch. — xaosflux Talk 03:46, 11 March 2019 (UTC)
    @Xaosflux: Then yes, that is very similar to what I would be doing, but I filed this BRFA to be on the safe side (see explanation above) --DannyS712 (talk) 10:58, 11 March 2019 (UTC)
  • As far as I can see, this is not a bot request. It's a user script, which may or may not be shared with users beyond its creator.
From what I can see, the script above is way too simplistic, and is suitable only for some very simple cases at CFDS. I am concerned that releasing it for wider use will lead to it being used in the many more complex cases, where it will produce inaccurate output.
The task which it performs is one which I encounter several times a month, for full CFD discussions, CFDS nominations, and WP:RM nominations. I do it by using a set of AWB custom modules which I hack on a per-case basis. My experience is that
  • In a bit less than half the cases, the tagging can be achieved by a plaintext replace function
  • In the rest, one or more regexes are needed
  • In all cases, some care is needed to ensure that all 3 tasks are performed accurately:
    • tag so that the tag includes the name of the target category. e.g.
      {{cfr}} → a tag saying "rename to some other title"
      .. but {{cfr|MyNewTitle}} → a tag saying "rename to Category:MyNewTitle"
    • tag so that the name of the discussion section is included, otherwise the links will point to the wrong place. e.g.
      Wrong: {{cfr|MyNewTitle}}
      Right: {{cfr|MyNewTitle|DiscussionSectionHeading}}
    • a meaningful edit summary. The edit summary should both describe the proposed action, and the location of the discussion
Some examples:
  1. CFDS plus subcats:
    • CFDS listing:[1]
    • tagging example [2]
    • code needed: plain text replace
  2. Full CFR discussion of by-year cats for British Empire / British Overseas territories
  3. Full CFR ~650 "Republic of Macedonia" categories:
    • CFD discussion: WP:Categories for discussion/Log/2019 February 16#North_Macedonia
    • tagging examples: [4], [5]
    • code needed: a single regex, to accommodate the fact that some of the old titles were of the form "Republic of Macedonia foo" and some of the form "Foo in the Republic of Macedonia". The word "the" needed to be removed if present, so the regex was s/([tT]he +)?Republic +of +Macedonia/North Macedonia/
Code such as this can probably do a good job in some simple cases. Unfortunately, there are many other cases where it risks mistagging dozens of categories.
I also think that javascript is not a good tool for these uses, because it does not allow a test of the first edit before proceeding, manual intervention for edge cases, etc. When I use AWB, I do the first edit, then stop and check its effects: is that tag correct? Is it linking to the right discussion section on the right page? Is the edit summary accurate, and does it too link correctly? I then check a few more variants before whacking the save button repeatedly through the rest of the list.
AFAICS, a javascript tool will just proceed through the list in one go, with no possibility of intervention if there is an unforeseen error (which in my experience there often is).
I have huge regard for Danny's skills and conscientiousness both as an editor and as a programmer (he really should be an admin), but in this case I think he is using the wrong tool, and has not taken enough account of the many variations which arise in this sort of group nomination. --BrownHairedGirl (talk) • (contribs) 09:02, 21 March 2019 (UTC)
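For reference, the Macedonia rename regex from example 3 can be exercised directly (the category titles below are illustrative samples, not the actual nomination):

```python
import re

# Single regex covering both title forms: "Republic of Macedonia foo" and
# "Foo in the Republic of Macedonia" (optional leading "the" is consumed).
MK = re.compile(r'([tT]he +)?Republic +of +Macedonia')

def rename(title):
    return MK.sub('North Macedonia', title)

print(rename('Category:Rivers of the Republic of Macedonia'))
# -> Category:Rivers of North Macedonia
print(rename('Category:Republic of Macedonia writers'))
# -> Category:North Macedonia writers
```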
@BrownHairedGirl: This task would extend to types 1 and 3 listed above, both of which are done using a single regex. --DannyS712 (talk) 18:53, 21 March 2019 (UTC)
@DannyS712, my concern is that any script will end up being used by other editors for tasks where it wouldn't do the job properly ... as indeed with your own proposal of it at WP:Categories for discussion/Log/2019 March 13#Category:American_female_rappers. --BrownHairedGirl (talk) • (contribs) 20:21, 21 March 2019 (UTC)
@BrownHairedGirl: By "if @Xaosflux or another BAG member wants to trial this task", I meant send it to trial for me to run. All of my tasks that are done in javascript are hosted on-wiki, meaning that their source code is visible to users (I haven't figured out how to use toolforge yet), with the implicit understanding that using them without permission is just like using any other tool to make bot-like edits without permission - against policy. I won't venture into BEANS territory, but any script I make will, as far as I am concerned, only be run by me - if another editor tries to use the script, they are responsible. As for doing the job properly, I would set the regex each time for each run, and would manually check (from the bot account) a few edits to see that it works before setting the bot loose on an entire category. The reason I didn't do it with AWB is because the regex relies on the name of the category itself, which as far as I am aware can't be accessed from the regex within AWB. I hope this explanation allays your concerns. Thanks, --DannyS712 (talk) 21:40, 21 March 2019 (UTC)
AWB can access the pagename through custom modules. I am using a simple one right now (with 2 alternative plaintext replaces) to tag >1000 categories for WP:Categories for discussion/Log/2019 March 21#Places_of_worship. --BrownHairedGirl (talk) • (contribs) 22:08, 21 March 2019 (UTC)
@BrownHairedGirl: in that case, would you mind sending me that custom module? I'll look into it, and maybe change this task to be AWB-based --DannyS712 (talk) 22:13, 21 March 2019 (UTC)
@DannyS712: Email sent. --BrownHairedGirl (talk) • (contribs) 22:46, 21 March 2019 (UTC)

WikiCleanerBot 2

Operator: NicoV (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 17:25, Monday, February 25, 2019 (UTC)

Function overview: To fix ISSNs with incorrect syntax. As described in ISSN#Code format, the correct syntax for an ISSN is "an eight digit code, divided by a hyphen into two four-digit numbers"

Automatic, Supervised, or Manual: Automatic

Programming language(s): Java (Wikipedia:WPCleaner)

Source code available: On Github

Links to relevant discussions (where appropriate): Maintenance task.

Edit period(s): At most, twice a month, following the dump analysis that I already perform, see Wikipedia:Bots/Requests for approval/WikiCleanerBot.

Estimated number of pages affected: At most a few hundred pages for the first complete run (originally estimated at around a thousand; pages with such problems are listed in Wikipedia:CHECKWIKI/WPC 106 dump, which currently contains a list of 420 pages, down from 1315), and probably no more than a few dozen on each subsequent run, given the evolution of the number of pages in the list.

Namespace(s): Main namespace

Exclusion compliant (Yes/No): No, because there's no reason to use an incorrect syntax for an ISSN instead of the correct one.

Function details: Based on the list generated on Wikipedia:CHECKWIKI/WPC 106 dump, the bot will only fix trivial problems (like a missing hyphen in the ISSN number, or extra whitespace characters) and will leave the more complex ones to be fixed by a human. It will greatly reduce the list, so human editors can fix the remaining problems.

For the bot flag: I currently don't have it, and I would like to keep it that way (or, if need be, have it added only temporarily for the first run).
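The trivial fixes described above can be sketched as a small normaliser (a sketch only, not WPCleaner's actual code; it deliberately returns None for anything that is not a trivial fix):

```python
import re

def fix_issn(value):
    """Normalise a trivially malformed ISSN (missing hyphen, stray whitespace);
    return None for anything non-trivial so a human can look at it."""
    compact = re.sub(r'[\s-]', '', value)
    if not re.fullmatch(r'\d{7}[\dXx]', compact):
        return None  # wrong length or characters: not a trivial fix
    return compact[:4] + '-' + compact[4:].upper()

print(fix_issn('00062510'))   # -> 0006-2510
print(fix_issn('0006 2510'))  # -> 0006-2510
print(fix_issn('123-45'))     # -> None
```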

Discussion

If you will be operating from the dump, could you not do a dry run outputting to Wikipedia:CHECKWIKI/WPC 106 dump so its handling of the pathological cases there can be inspected? --Xover (talk) 17:48, 25 February 2019 (UTC)

Hi Xover. The dump analysis is performed independently and produces several analyses (Wikipedia:CHECKWIKI/WPC all); I would prefer to keep it separate from automatic fixing. --NicoV (Talk on frwiki) 18:05, 25 February 2019 (UTC)
But if you want to know which pages won't be fixed by the bot, I can do a dry run on my computer and give the list of fixed pages. --NicoV (Talk on frwiki) 18:06, 25 February 2019 (UTC)
@NicoV: I was more interested in seeing the before→after list. Several of the instances listed in the WPC 106 dump looked like they would be hard to fix automatically, so if the output of a dry run could be inspected it might provide a priori confidence that the task won't mess anything up. A dry run might be more efficient / reduce the need for a trial period with live edits (but I speak only for myself: the BAG may see it differently). --Xover (talk) 18:24, 25 February 2019 (UTC)
@Xover: Ok, I understand. I will see if I can do something. The idea is to fix only trivial cases automatically, the hard ones will be left to human editors, and I will check what the results are before doing an actual run. --NicoV (Talk on frwiki) 09:28, 26 February 2019 (UTC)

Comment: The dump list appears to have some false positives on it. I picked one page at random, Pocket Dwellers, and there is an ISSN of 00062510 listed within a citation template. This ISSN is valid within a CS1 template; articles with invalid ISSNs are placed in Category:CS1 errors: ISSN. The template handles this unhyphenated ISSN format with no trouble, displaying properly with a hyphen. It should not be "corrected"; the bot would be making a cosmetic edit, leaving the rendered page unchanged. Perhaps the dump analysis should be corrected before this bot attempts to modify articles based on the list. – Jonesey95 (talk) 17:56, 25 February 2019 (UTC)

Hi Jonesey95. On other wikis like frwiki, the templates don't add the hyphen by themselves. If ISSNs without the hyphen have to be considered correct on enwiki for some templates, then I will first need to add an option in WPCleaner for this (and then regenerate the page Wikipedia:CHECKWIKI/WPC 106 dump to check that false positives are removed) before implementing the automatic replacement. I will post here when this part is done. --NicoV (Talk on frwiki) 18:05, 25 February 2019 (UTC)
Thanks. It looks like {{ISSN}} does not add the hyphen, but the CS1 citation templates do so. Just to see if I had gotten unlucky, I picked four more articles at semi-random from the list, limiting my "random" choices to articles that were displaying eight digits as the erroneous string. All four articles: Acritogramma metaleuca, Capri (cigarette), David Mba, and Ensoniq VFX contain no ISSN errors. I believe that the dump analysis needs to be debugged before this task can be run. It is possibly telling that there are only 65 pages in the three ISSN error categories combined. – Jonesey95 (talk) 18:16, 25 February 2019 (UTC)
Jonesey95. I've modified my code to allow telling WPCleaner that some templates automatically add the hyphen if it's missing, so the articles you mentioned won't be reported anymore. I'm currently running an update of Wikipedia:CHECKWIKI/WPC 106 dump to see what will be left. --NicoV (Talk on frwiki) 09:24, 26 February 2019 (UTC)

Page Wikipedia:CHECKWIKI/WPC 106 dump has been updated to avoid reporting a missing hyphen when the template automatically adds it to the displayed result; there are only 420 pages remaining, compared to 1315 initially. I could probably also remove reports for internal links to pages like ISSN 1175-5326 which exist, but even if they are reported, the bot won't fix anything there. With the current algorithm, a dry run modifies 115 of the 420 pages.

Pages that would be currently modified by the bot

--NicoV (Talk on frwiki) 12:36, 26 February 2019 (UTC)

That list looks much more reasonable. There are still some weird ones in there, like You Are Happy, where |issn= was being used in a {{WorldCat}} template, which doesn't support that parameter. Also, it looks like dashes, as in Iran–Iraq War and The Mauritius Command and Resonant inductive coupling, are also silently converted to hyphens by CS1 templates, so those don't need to be fixed and should be removed from the WPCleaner report.
I can also add an option to ignore such cases where the dash is automatically replaced, like I did for the missing hyphen. But is it a good idea to keep incorrect syntax just because the template itself will fix it?
For the non-existent parameter in a {{Worldcat}}, I think I will leave it like that and a hyphen will be added; there are only a few pages like that. --NicoV (Talk on frwiki) 14:02, 26 February 2019 (UTC)
In a case like Tytthaspis sedecimpunctata, will the bot/script apply the ISSN template, making the ISSN actually useful, or will it just replace the dash with a hyphen? – Jonesey95 (talk) 13:23, 26 February 2019 (UTC)
Currently, it will simply replace the dash with a hyphen, but I can add a feature to use a template instead. --NicoV (Talk on frwiki) 14:02, 26 February 2019 (UTC)
I think replacing a plain-text ISSN with a template is a good idea in nearly every case.
I don't want to rain on your parade, but at this point, it looks like a periodic supervised AWB run, combined with a bit more tweaking of the WPCleaner report, might be the best option. The risk of cosmetic edits by the bot (and AWB, unless it is watched carefully) is high. With considerably fewer than 100 pages fixable by the proposed bot, a script may be better. If you still want to get this task bot-flagged in order to avoid cluttering people's watchlists, of course, I would support that. – Jonesey95 (talk) 14:04, 26 February 2019 (UTC)
I will try several modifications to limit the number of false positives in the generated list (which is good in itself), and we'll see then what is the best course of action. --NicoV (Talk on frwiki) 16:38, 26 February 2019 (UTC)

JJMC89 bot III

Operator: JJMC89 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 07:23, Sunday, February 24, 2019 (UTC)

Function overview: Process WP:CFD/W and its subpages excluding WP:CFD/W/M

Automatic, Supervised, or Manual: Automatic

Programming language(s): Python

Source code available: cfd.py cfdw.py

Links to relevant discussions (where appropriate): WP:BOTREQ#Categories for Discussion bot (permalink)

Edit period(s): Hourly

Estimated number of pages affected: Millions

Namespace(s): Many

Exclusion compliant: delete/move: no; edit: yes

Adminbot: Yes

Function details: Process WP:CFD/W and its subpages excluding WP:CFD/W/M, moving and deleting categories and re-categorizing pages as specified
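For a sense of what "processing as specified" involves, here is a hedged Python sketch of the parsing step (illustrative only: the entry format below is an assumption, and the real WP:CFD/W groups entries under section headings that distinguish deletes from moves/merges) that splits bullet entries into move and delete instructions:

```python
import re

# Hypothetical entry formats, assumed for illustration:
#   * [[:Category:Foo]] to [[:Category:Bar]]   (move/merge)
#   * [[:Category:Old cruft]]                  (delete)
MOVE = re.compile(r"\*\s*\[\[:(Category:[^\]]+)\]\]\s+to\s+\[\[:(Category:[^\]]+)\]\]")
DELETE = re.compile(r"\*\s*\[\[:(Category:[^\]]+)\]\]\s*$")

def parse_working_lines(wikitext):
    """Split working-page bullet entries into (moves, deletes) lists."""
    moves, deletes = [], []
    for line in wikitext.splitlines():
        m = MOVE.match(line)
        if m:
            moves.append((m.group(1), m.group(2)))
            continue
        d = DELETE.match(line)
        if d:
            deletes.append(d.group(1))
    return moves, deletes
```

The actual bot would then delete or move each category and re-categorize its member pages; that half of the code is inherited from Cydebot and not sketched here.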

Notes

Discussion

Since Cyde is inactive, I am requesting to take over the task so that bugs can be fixed and feature requests implemented. Additionally, Cydebot will stop functioning at the end of March due to the Toolforge Trusty deprecation unless it is migrated to Stretch. The code is based on the code that Cydebot is running. — JJMC89(T·C) 07:23, 24 February 2019 (UTC)

You say it's "based on" Cydebot's code, presumably meaning that you made changes. Can you please summarize the effects of these changes? עוד מישהו Od Mishehu 08:45, 24 February 2019 (UTC)
Cyde's code is part of Pywikibot, but Cydebot is not using the current version. I haven't made changes to the scripts yet, but other maintainers have. There haven't been any changes that would change the functionality of this task. — JJMC89(T·C) 22:07, 24 February 2019 (UTC)
I've now rewritten the code that parses the working page. The code that moves/deletes the categories and re-categorizes the pages is still the same as Cydebot's. — JJMC89(T·C) 07:32, 15 March 2019 (UTC)

Fluxbot is already approved for this task. {{3x|p}}ery (talk) 20:00, 24 February 2019 (UTC)

@Pppery: As far as I can tell the last edit Fluxbot made under task1 (cfds) was in July 2017. Pinging @Xaosflux who may be able to fill us in --DannyS712 (talk) 20:06, 24 February 2019 (UTC)
As an AWB bot, Fluxbot doesn't operate without operator intervention. It also isn't an adminbot. — JJMC89(T·C) 22:07, 24 February 2019 (UTC)
@JJMC89, Pppery, and DannyS712: yea, Fluxbot for CFDS is only on-demand, I used to process CFDS regularly, but more efficient bots came around. — xaosflux Talk 22:27, 24 February 2019 (UTC)
Thanks, @JJMC89.
In principle, I very much welcome the idea of a new bot with extended functionality (there are some major gaps in the current feature set). However, I think that @Black Falcon was a bit hasty in posting the request at WP:BOTREQ when the discussion at WT:Categories_for_discussion/Working#Cydebot_replacement had few participants (I think only 4) and had not been widely notified; the suggested extra functionality needs more discussion.
However, it now turns out that a replacement is needed soon, so thanks to JJMC89 for stepping up in the nick of time.
So I hope that any new bot will run initially with the same functionality as CydeBot. Any enhancements need a clear consensus, which we don't yet have. --BrownHairedGirl (talk) • (contribs) 05:05, 10 March 2019 (UTC)
At a minimum, the new bot should process the main /Working page:
  • Deleting, merging, and renaming (i.e. moving) categories, as specified, with appropriate edit summaries.
  • Deleting the old category with an appropriate deletion summary.
  • In the case of renaming, removing the CfD notice from the renamed category.

Ideally, it would also do some or all of the following:

  • Process the /Large and /Retain subpages.
  • Accept manual input when a category redirect should be created—for example, by recognizing leading text when a redirect is wanted, such as * REDIRECT Category:Foo to Category:Bar.
  • Recognize and update category code in transcluded templates. This would need to be discussed/tested to minimize errors and false positives.
  • Recognize and update incoming links to the old category. This would need to be discussed/tested to minimize errors and false positives.

-- Black Falcon (talk) 20:48, 18 February 2019 (UTC), Wikipedia:Bot requests#Categories for Discussion bot

@BrownHairedGirl: Thanks for pinging me, and I don't disagree with you, fundamentally. More participation would undoubtedly have been better, and I probably should not have assumed many people watchlisted WP:CFD/W, but all the manual work required to close out Cydebot's processing was becoming quite tiresome after two months. In terms of the functionality I requested, only the last three items are enhancements compared to what Cydebot currently does or did when it was functioning properly, and I did note the last two needed more discussion/testing. As I mentioned at WP:BOTREQ, I would be happy if a bot did just the first three items properly. -- Black Falcon (talk) 06:06, 10 March 2019 (UTC)
Thanks, @Black Falcon.
I haven't so far seen anything on the list of suggested improvements, which I would dislike, and I see much I am v keen on.
In particular, the work required to close out CydeBot's efforts is v onerous, and I dearly wish that could be improved. I have been cleaning up after WP:CFD 2019 February 16#North_Macedonia, which renamed ~650 categories, and so far it has taken me about 8 hours to process about 400 of those. There has to be a better way somehow.
I agree that the redirects thing should be simple. However, the precise details of what a bot would do in the next two cases need a lot of scrutiny. It would be easy for an ill-specified bot to wreak much havoc in templates. --BrownHairedGirl (talk) • (contribs) 06:24, 10 March 2019 (UTC)
This BRFA is to take over Cydebot's current functionality. The enhancements, particularly the last two points, are out of scope. After this is up and running (with any bugs worked out), I'll be happy to look at making enhancements that have consensus. — JJMC89(T·C) 18:36, 10 March 2019 (UTC)
Thanks, @JJMC89. Sounds great. --BrownHairedGirl (talk) • (contribs) 18:38, 10 March 2019 (UTC)
Understood, and thank you. -- Black Falcon (talk) 20:19, 10 March 2019 (UTC)

{{BAG assistance needed}} I would like to get this up and running before Cydebot stops functioning. — JJMC89(T·C) 07:32, 15 March 2019 (UTC)

 Note: Cyde has migrated Cydebot to Stretch, so it is no longer in danger of dying at the end of the month. — JJMC89(T·C) 00:54, 21 March 2019 (UTC)
@JJMC89: I suspect there could be issues with both of these bots trying to run the same page on top of each other - do you want to proceed as a backup plan, or do you want to try to work in tandem going forward? — xaosflux Talk 19:40, 21 March 2019 (UTC)

DannyS712 bot 6

Operator: DannyS712 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 02:09, Sunday, February 24, 2019 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): Javascript

Source code available: User:DannyS712 test/dead.js, User:DannyS712 test/isdead.js

Function overview: Replace |living=yes and variations thereof with |living=no on talk pages of pages linked from Wikipedia:Database reports/Potential biographies of dead people (3).

Links to relevant discussions (where appropriate): WP:BOTR#Remove living-yes, etc from talkpage of articles listed at Wikipedia:Database reports/Potential biographies of dead people (3)

Edit period(s): Weekly (the page is updated weekly)

Estimated number of pages affected: Currently the page has 1227 entries, including those that will be skipped; this will likely decrease in future weeks once the bot is running, so maybe ~1000 per week

Exclusion compliant (Yes/No): No

Already has a bot flag (Yes/No): Yes

Function details: The script will skip pages that either (a) are on the list of pages not to edit or (b) contain "and"/"&" in the title, which suggests that there is more than one person.
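A minimal Python sketch of the two pieces described above, the skip filter and the parameter flip (the actual task runs as JavaScript; the whole-word matching of "and" and the exact set of accepted "yes" variants are assumptions here, not the script's confirmed behavior):

```python
import re

# Variants of |living=yes the bot might flip; the exact set is an assumption.
LIVING = re.compile(r"(\|\s*living\s*=\s*)(?:yes|y|true)\b", re.IGNORECASE)

def should_skip(title, do_not_edit):
    """Skip opted-out pages and titles suggesting more than one person."""
    if title in do_not_edit:
        return True
    # ' and ' as a whole word, or an ampersand, hints at a joint article
    return " and " in " %s " % title.lower() or "&" in title

def mark_dead(talk_wikitext):
    """Replace |living=yes (and variants) with |living=no."""
    return LIVING.sub(r"\1no", talk_wikitext)
```

Note how the whole-word check avoids false positives such as "Alexander", which contains "and" as a substring but clearly names one person.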

Discussion

  • @DH85868993: submitted! --DannyS712 (talk) 02:09, 24 February 2019 (UTC)
  • I do not think this is a good task for a bot, because there are too many possibilities for false positives, either because of the above-mentioned multiple-person issues and/or vandalism or good-faith BLP violations. For example, Myra Landau was modified here saying she died, but with no references to verify. It has since been added to the table listed above. Is she really dead? Was that drive-by vandalism (which I've seen a lot of on people more famous than her) or mistaken identity? Should she be marked as dead even though there are no references to verify she's dead? Basically, I went down seven pages of the dbase report and half of them are problematic, hence my concern. Primefac (talk) 20:13, 24 February 2019 (UTC)
    @Primefac: I completely understand your concern, but if people are added to categories XXXX deaths and aren't really dead, then that is a separate issue. I should be able to come up with a way to set them back to alive based on another report (eg if the category is removed) but this task only seeks to eliminate the discrepancy between the article page and the talk page; the article is usually more up-to-date than the talk page. --DannyS712 (talk) 20:20, 24 February 2019 (UTC)
    It's not really a separate issue in my mind, since they're two sides of the same coin. This is where more discussion might need to come into play; personally I wouldn't want to be automatically marking potentially living people as dead any more than I'd want to be automatically marking potentially dead people as alive. Primefac (talk) 16:18, 25 February 2019 (UTC)
    @Primefac: What would be the appropriate venue to have a bigger discussion about doing this? --DannyS712 (talk) 10:59, 11 March 2019 (UTC)
    I'd agree, and go further to say that "marking potentially living people as dead" is riskier, as while our policy on BLPs extends to the recently deceased, it's not a guarantee that they are marked as recently deceased. ~ Amory (utc) 11:11, 11 March 2019 (UTC)

Xinbenlv bot

Operator: Xinbenlv (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 06:29, Wednesday, February 20, 2019 (UTC)

Function overview: User:Xinbenlv_bot#Task 1: Notify (on Talk page) cross language inconsistency for birthdays.

Automatic, Supervised, or Manual: Automatic

Programming language(s): Javascript

Source code available: [6]

Links to relevant discussions (where appropriate): Wikipedia:Village_pump_(technical)/Archive_166#Cross_Lang_Conflicts

Edit period(s): daily or twice a week

Estimated number of pages affected: 30 per day to begin with, which can increase to 100 per day if the community sees it as helpful. Speed is completely controllable. Overall, there are a few thousand between major wiki pairs, e.g. EN–JA (~3000) and EN–DE (~5000).

Namespace(s): Talk

Exclusion compliant (Yes/No): Yes

Adminbot (Yes/No): No

Function details:

The bot will notify editors by writing a new section on the Talk page of a subject if that subject has inconsistent birthdays between this and another language's Wikipedia.

The data on inconsistencies comes from a publicly available dataset on GitHub, called Project WikiLoop. Example edits look like this:

- Notifying French Editors fr:Utilisateur:Xinbenlv/sandbox/Project_Wikiloop/unique_value/Discussion:Samuel_Gathimba
- Notifying English Editors en:User:Xinbenlv/sandbox/Project_Wikiloop/unique_value/Talk:Samuel_Gathimba
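The underlying check is simple. A hedged Python sketch (the bot itself is JavaScript, and the notice wording and section title below are illustrative assumptions, not the bot's actual template at User:Xinbenlv_bot/msg/inconsistent_birthday):

```python
from datetime import date
from typing import Optional

def find_conflicts(dates_by_wiki):
    """Return the per-wiki dates when language editions disagree, else None."""
    if len(set(dates_by_wiki.values())) > 1:
        return dates_by_wiki
    return None

def notice_wikitext(subject, conflicts):
    """Render a talk-page section listing each wiki's claimed birthday.

    The section title and line format are invented for this sketch.
    """
    lines = ["== Inconsistent birthday for %s ==" % subject]
    for wiki, d in sorted(conflicts.items()):
        lines.append("* %swiki: %s" % (wiki, d.isoformat()))
    return "\n".join(lines)
```

A real run would pull the dates from the Project WikiLoop dataset rather than construct them by hand, and would post the rendered section only when `find_conflicts` reports a disagreement.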

Discussion

  • {{TakeNote}} This request specifies the bot account as the operator. A bot may not operate itself; please update the "Operator" field to indicate the account of the human running this bot. AnomieBOT 06:49, 20 February 2019 (UTC)
Fixed, changed to User:Xinbenlv. Xinbenlv (talk) 06:54, 20 February 2019 (UTC)
  • {{TakeNote}} This bot appears to have edited since this BRFA was filed. Bots may not edit outside their own or their operator's userspace unless approved or approved for trial. AnomieBOT 06:49, 20 February 2019 (UTC)
@Anomie, @AnomieBOT, Sorry, I mistakenly used my bot account to create its BRFA, it was me manually. The only bot auto edits are those in its User page. Xinbenlv (talk) 06:52, 20 February 2019 (UTC)
Don't worry about it Xinbenlv. I've struck it now as the notice isn't relevant. --TheSandDoctor Talk 04:17, 21 February 2019 (UTC)
Thank you, that makes sense. I also updated the Not for operator. Let me know if I've not done it right. @TheSandDoctor. Xinbenlv (talk) 07:18, 21 February 2019 (UTC)
This bot helps with cross-language inconsistency, so it will be editing other language wikis; how should I apply for global bot permission? Xinbenlv (talk) 06:53, 20 February 2019 (UTC)
@Xinbenlv:, m:BP should be what you're looking for. RhinosF1(chat)(status)(contribs) 16:16, 20 February 2019 (UTC)
Thank you, RhinosF1! It seems m:BP requires the bot to obtain local community permission and keep running locally for a while. Therefore, I think I should apply for approval from each local community individually for now. Do I understand it correctly? Xinbenlv (talk) 18:48, 20 February 2019 (UTC)
Xinbenlv, That's how it read to me as well. It's probably best to make them aware anyway before launching anything that will affect them in a big way (e.g. mass notifications being issued). You don't want to cause confusion. RhinosF1(chat)(status)(contribs) 19:01, 20 February 2019 (UTC)
RhinosF1 Thanks, agreed! That's why I am asking for advice and approval on English Wikipedia, so this most active community can help take a look at my (wild?) idea. Xinbenlv (talk) 19:07, 20 February 2019 (UTC)
Xinbenlv, I think it's a great idea. RhinosF1(chat)(status)(contribs) 19:25, 20 February 2019 (UTC)
Thanks to everyone who is interested. Just so that you know, the bot has two trial edits on the German wiki, as encouraged by the BRFA discussion. Feel free to take a look; advice is welcome! Xinbenlv (talk) 21:59, 21 February 2019 (UTC)
Added Xinbenlv (talk) 17:15, 25 February 2019 (UTC)
    1. How often is the dbase updated? Could this potentially result in one page receiving multiple notices simply because no one has either seen or cared enough to fix the missing information?
The database will be updated on a daily/weekly basis; it is currently still in development. I also plan to have Xinbenlv_bot suppress articles that have already been touched by the same bot. Xinbenlv (talk) 17:15, 25 February 2019 (UTC)
This seems like a reasonable task to deal with cross-wiki data problems, just want to get a better feel for the size and scope of the task. Primefac (talk) 20:26, 24 February 2019 (UTC)
Thanks @Primefac: If I apply to change the bot scope to be "=<200 edits in total" for first phase, what do you think? Xinbenlv (talk) 21:37, 24 February 2019 (UTC)
The number of edits per day/week/month can be discussed, I'm just looking for more information at the moment. Primefac (talk) 21:47, 24 February 2019 (UTC)
What can I do to provide the information you need? Xinbenlv (talk) 02:04, 25 February 2019 (UTC)
Just looking for some numbers. I assume you know where to find them better than I would. Primefac (talk) 02:07, 25 February 2019 (UTC)
@Primefac: The EN-JA file contains around ~3000 inconsistencies of birthdays, the EN-DE contains around ~5000 inconsistencies. To begin with, I think we can limit to 100 - 200 edits on English Wikipedia. Xinbenlv (talk) 16:47, 25 February 2019 (UTC)

@Xover's suggestion regarding using maintenance template

Would adding a maintenance template (that adds a tracking category) be a viable alternative to talk page notices? It might be more effort due to the inherently cross-project nature of the task, but talk page notices are rarely acted on, are extra noise on busy talk pages, and may cause serious annoyance since the enwp date may be correct (it may, for example, be the dewp article that's incorrect) and the local editors have no reasonable way to fix it. A tracking category can be attacked like any gnome task, and the use of a maint template provides the option of, for example, flagging a particular language Wikipedia as having a verified date or specifying that the inconsistency comes from Wikidata. In any case, cross-project inconsistencies are an increasingly visible problem due to Wikidata, so kudos for taking on this issue! --Xover (talk) 18:41, 25 February 2019 (UTC)

@Xover: thank you. So far, I am applying for a bot flag on 5 different wikis at the same time. I received 3 suggestions:
1. use a template and transclusion
2. add a category
3. put it as an over-article "cleanup" message box or a Talk page message.
For #1 and #2, there is consensus amongst all responding communities (EN, DE, ZH, FR). So for now the trial edits on these communities use a template and category; see this ZH example:
* https://zh.wikipedia.org/wiki/Category:WikiProject_WikiLoop/Inconsistent_Birthday
For #3, putting it as an over-article "cleanup" message box: in the DE community, some editors prefer a Talk page message, while others prefer an over-article message box. My personal opinion is that we can start slow, post some Talk page messages (say 200) as trial edits, and then, when they look good, seek approval for allowing the bot to write over-article messages. The reason is that I hope to demonstrate more stability before writing in the (Article) namespace, especially for such a high-impact wiki as English Wikipedia.
By the way, the format I prepared for English Wikipedia is actually a maintenance template at User:Xinbenlv_bot/msg/inconsistent_birthday; could you take a look, @Xover:?
Xinbenlv bot (talk) 22:09, 25 February 2019 (UTC)
Well, assuming the technical operation of the bot is good (no bugs) maint. templates in article space are generally less "noisy" than talk page messages (well, except the big noisy banners that you say dewp want, but that's up to them). I suspect the enwp community will prefer the less noisy way, but I of course speak only for myself. In any case, I did a small bit of copyediting on the talk page message template. It changed the tone slightly, so you may not like it, and in any case you should feel free to revert it for whatever reason. Finally, you should probably use {{BAGAssistanceNeeded}} in the "Trial edits" section below. --Xover (talk) 05:22, 26 February 2019 (UTC)

Trial Edits now available (in sandbox)

Dear all admins and editors,

I have generated 30 trial edits in sandbox; you can find them in en:Category:Wikipedia:WikiProject_WikiLoop/Inconsistent_Birthday. I also generated 3 trial edits in the real Talk namespace.


Please take a look. Thank you!

Xinbenlv (talk) 00:13, 26 February 2019 (UTC)

Update: [7] shows that editor @LouisAlain:, who happens to be the creator of en:Gaston_Blanquart (one of our 3 trial edits), updated the birthday and death date on English Wikipedia. Xinbenlv (talk) 08:22, 1 March 2019 (UTC)
Update: generated 10 more trial edits in the Talk namespace; I will actively monitor them. Xinbenlv (talk) 08:33, 1 March 2019 (UTC)
Dear Admins and friends interested in this topic @RhinosF1:, @Primefac:, @Xover:, @TheSandDoctor:, how do I proceed to apply for the bot status? Xinbenlv (talk) 00:38, 7 March 2019 (UTC)


Confession: made trial edits before trial approval

{{BAG assistance needed}}

Dear Admin, I just realized that English Wikipedia requires approval of trial edits before running them, which I have already done: 9 edits in the (Article) namespace. Shall I revert the trial edits? I am sorry. Xinbenlv (talk) 21:13, 15 March 2019 (UTC)
@Xinbenlv: don't revert if they were good edits. — xaosflux Talk 13:49, 20 March 2019 (UTC)
@Xaosflux:, OK, thank you! By the way, is there anything else I need to do other than just wait for people to comment? It seems the discussion has halted.
How should I get trial approval?
Xinbenlv (talk) 18:08, 20 March 2019 (UTC)

PkbwcgsBot 7

Operator: Pkbwcgs (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 20:27, Saturday, December 15, 2018 (UTC)

Function overview: This is an extension to Wikipedia:Bots/Requests for approval/PkbwcgsBot 5 and I will clean out Category:Pages using ISBN magic links and Category:Pages using PMID magic links.

Automatic, Supervised, or Manual: Automatic

Programming language(s): AWB

Source code available: AWB

Links to relevant discussions (where appropriate): This RfC

Edit period(s): ISBNs will be once a fortnight and PMIDs will be once a month.

Estimated number of pages affected: 300-500 pages per run (ISBNs) and 50-100 pages per run (PMIDs)

Namespace(s): Most namespaces (Mainspace, Article Talkspace, Filespace, Draftspace, Wikipedia namespace (most pages), Userspace and Portalspace)

Exclusion compliant (Yes/No): Yes

Function details: The bot will replace ISBN magic links with templates. For example, ISBN 978-94-6167-229-2 will be replaced with {{ISBN|978-94-6167-229-2}}. In task 5, it fixes incorrect ISBN syntax and replaces the magic link with the template after that. This task only replaces the ISBN magic link with the template using RegEx.
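The replacement itself is a single regular-expression substitution. A Python approximation (the task actually uses AWB's regex engine, and this pattern only loosely mirrors MediaWiki's magic-link grammar rather than reproducing it exactly):

```python
import re

# Rough approximation of an ISBN magic link: the word "ISBN", whitespace,
# then a 10- or 13-digit number with optional hyphens/spaces, where an
# ISBN-10 may end in the check character X.
MAGIC_ISBN = re.compile(r"\bISBN\s+((?:97[89][- ]?)?(?:\d[- ]?){9}[\dXx])\b")

def replace_magic_isbn(wikitext):
    """Replace bare ISBN magic links with the {{ISBN}} template."""
    return MAGIC_ISBN.sub(lambda m: "{{ISBN|%s}}" % m.group(1), wikitext)
```

Because the pattern requires whitespace after "ISBN", text already wrapped in {{ISBN|...}} is untouched, so the substitution is safe to re-run.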

Discussion

Working in article space only? – Jonesey95 (talk) 23:48, 15 December 2018 (UTC)

@Jonesey95: The problem is in multiple namespaces, not just the article namespace. Pkbwcgs (talk) 09:39, 16 December 2018 (UTC)
Since Magic links bot is already handling article space, it looks like this bot's focus will be in other spaces. I think those spaces will require manual oversight in order to avoid turning deliberate magic links into templates. Happily, there are only 4,000 pages, down from 500,000+ before the first couple of bots did their work. – Jonesey95 (talk) 12:10, 16 December 2018 (UTC)
I can distinguish deliberate magic links and not touch them. There are very few deliberate ones; an example is at Wikipedia:ISBN which shouldn't be changed. Pkbwcgs (talk) 13:06, 16 December 2018 (UTC)
"I can distinguish" -- how can you do this automatically? This is WP:CONTEXTBOT. —  HELLKNOWZ   ▎TALK 21:45, 16 December 2018 (UTC)

We don't generally approve bots for non-mainspace unless there is a specific problem, especially without a discussion or consensus. In short, the problem with non-mainspace namespaces is that there is no expectation that article policies or guidelines should apply or are even necessary. Userspace is definitely not a place for bots to run without opt-in. You also cannot automatically work on talk pages with a task like this -- users can easily be discussing syntax and no bot should be changing their comments. The discussion may very well be archived. The same goes for Wikipedia space, and there are many guideline, help, and project pages where such a change may not be desired. Draft, File and Portal seem fine. To sum up, we either need community consensus for running tasks in other namespaces or bot operator assurance (proof) that there are minimal to no incorrect/undesirable edits. —  HELLKNOWZ   ▎TALK 21:45, 16 December 2018 (UTC)

@Hellknowz: I have struck the namespaces which I feel will cause problems. I assure you that there won't be any incorrect edits. Pkbwcgs (talk) 21:51, 16 December 2018 (UTC)
Looks good then. Will wait for resolution at Wikipedia:Bots/Requests for approval/PkbwcgsBot 5. —  HELLKNOWZ   ▎TALK 21:57, 16 December 2018 (UTC)
I think the revised list of spaces (at this writing: Main, Draft, Portal, File) makes sense. – Jonesey95 (talk) 01:46, 17 December 2018 (UTC)
@Hellknowz: Task 5 has gone to trial and this has been waiting for over one month. Pkbwcgs (talk) 22:08, 27 January 2019 (UTC)

Bots in a trial period

FastilyBot 14

Operator: Fastily (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 23:07, Wednesday, March 20, 2019 (UTC)

Function overview: Leave courtesy notifications for PROD'd files if the tagger has not done so.

Automatic, Supervised, or Manual: automatic

Programming language(s): Java

Source code available: after I write it

Links to relevant discussions (where appropriate):

Edit period(s): daily

Estimated number of pages affected: 0-10 daily

Namespace(s): User talk

Exclusion compliant (Yes/No): Yes

Function details: Leaves courtesy notifications for PROD'd files if the tagger has not done so. This task is an extension to Task 6 and Task 12. -FASTILY 23:07, 20 March 2019 (UTC)

Discussion

Approved for trial (50 edits or 21 days). go ahead and trial and let us know how it goes. — xaosflux Talk 23:32, 20 March 2019 (UTC)

DannyS712 bot 18

Operator: DannyS712 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 07:18, Tuesday, March 19, 2019 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): JavaScript

Source code available: Not written yet

Function overview: Clerking at Wikipedia:WikiProject Abandoned Drafts/Stale drafts

Links to relevant discussions (where appropriate): WP:BOTREQ#STALE Drafts

Edit period(s): As needed (probably ~weekly)

Estimated number of pages affected: ~40

Exclusion compliant (Yes/No): No

Already has a bot flag (Yes/No): Yes

Function details: Remove entries that link to deleted pages, redirects, or blanked drafts.
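A hedged sketch of the clerking step (the entry format and the source of page statuses are assumptions; in practice the statuses would come from API queries for existence, redirect status, and content length, and the bot itself is JavaScript):

```python
import re

def clean_listing(lines, status):
    """Drop list entries whose linked draft is gone, redirected, or blank.

    `status` maps a page title to "ok", "deleted", "redirect", or
    "blank" (assumed to be gathered beforehand); unknown titles are
    kept, so the bot fails safe.
    """
    kept = []
    for line in lines:
        m = re.search(r"\[\[([^|\]]+)", line)
        if m and status.get(m.group(1), "ok") != "ok":
            continue  # drop deleted/redirected/blanked entries
        kept.append(line)
    return kept
```

Running this weekly over each subpage would shrink the listings as stale drafts are processed, matching the "as needed (probably ~weekly)" edit period above.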

Discussion

This is only going to edit the single page Wikipedia:WikiProject Abandoned Drafts/Stale drafts, correct? — xaosflux Talk 14:21, 20 March 2019 (UTC)
@Xaosflux: No, it would clerk the ~40 subpages of it to remove pages that are deleted, redirected, or blanked --DannyS712 (talk) 14:22, 20 March 2019 (UTC)
Approved for trial (50 edits or 30 days). OK to trial, please link the edit summary to this BRFA and leave a note at Wikipedia talk:WikiProject Abandoned Drafts/Stale drafts that it has begun. — xaosflux Talk 14:29, 20 March 2019 (UTC)
@Xaosflux: Is it okay if I don't start this yet? I have a lot on my plate at the moment, both on- and off-wiki. Thanks, --DannyS712 (talk) 19:33, 20 March 2019 (UTC)
@DannyS712: OK, consider the 14 days from whenever you start....maybe stop opening BRFAs until you can catch up too...... — xaosflux Talk 19:36, 20 March 2019 (UTC)
@Xaosflux: Thanks --DannyS712 (talk) 19:40, 20 March 2019 (UTC)

DannyS712 bot 16

Operator: DannyS712 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 06:21, Tuesday, March 19, 2019 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): AWB

Source code available: AWB

Function overview: After a category is listified as the result of a CfD, add links to that list in the "see also" sections of the pages

Links to relevant discussions (where appropriate): Wikipedia talk:Categories for discussion/Archive 17#See also bot

Edit period(s): As needed

Estimated number of pages affected: Probably <50 per run

Exclusion compliant (Yes/No): No

Already has a bot flag (Yes/No): Yes

Function details: Add links to the see also section of pages. If approved for trial, I would like to trial this with Category:Battles won by Indigenous peoples of the Americas, which originally prompted the idea.

Discussion

Will you be creating see also sections just for this if they don't exist? — xaosflux Talk 13:47, 20 March 2019 (UTC)
@Xaosflux: No, at first this would only apply to pages that already have see also sections. --DannyS712 (talk) 13:57, 20 March 2019 (UTC)
@DannyS712: process wise, how would you know it was time to do this task and what the inputs should be? — xaosflux Talk 14:12, 20 March 2019 (UTC)
Would this only apply when the list is a newly created page? — xaosflux Talk 14:12, 20 March 2019 (UTC)
@Xaosflux: yes, I would do it only after a new list is created. Process: Create list (using User:DannyS712/Cat links), remove the category from the pages (using cat-a-lot), add the list to the see also section (bot run, manually activated) --DannyS712 (talk) 14:14, 20 March 2019 (UTC)
@DannyS712: How would you know that the trigger for this task (that A CfD was closed as listify) and the prerequisite (the new list was actually created) has occurred? — xaosflux Talk 14:18, 20 March 2019 (UTC)
@Xaosflux: CfDs closed as listified are listed at WP:CFD/W/M for the list to be created. I would check both the CfD and the list before running --DannyS712 (talk) 14:19, 20 March 2019 (UTC)
OK so each run will be manually started only after validation, seems OK. If for some reason the CfD closes with a similar result to listify in to an existing list (merge into an existing list) this could still be useful, but you would need to ensure that there is not an existing link on the page (anywhere) to the list before adding it. — xaosflux Talk 14:24, 20 March 2019 (UTC)
Approved for trial (100 edits). (Please trial with 2 or 3 lists, not to exceed 100 edits in total). — xaosflux Talk 14:25, 20 March 2019 (UTC)

DannyS712 bot 15

Operator: DannyS712 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 08:34, Saturday, March 16, 2019 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): JavaScript

Source code available:

Function overview: adding {{Germany district OSM map|QXXXX}} to the 'map' parameter of {{Infobox District DE}}, where QXXXX is the Wikidata ID of the German state the district belongs to

Links to relevant discussions (where appropriate): WP:BOTREQ#OSM location map for German districts

Edit period(s): One time run

Estimated number of pages affected: <400

Exclusion compliant (Yes/No): No

Already has a bot flag (Yes/No): Yes

Function details: Using the API, retrieve the wikidata wikibase_item for a page, and then add the corresponding parameter to {{Infobox District DE}}.
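A sketch of that API lookup (illustrative Python, not the bot's actual JavaScript; the request shape follows the MediaWiki pageprops query module):

```python
import json
import urllib.parse

# Illustrative sketch: build a pageprops query for a page's Wikidata item ID
# and extract "wikibase_item" from the JSON response. (The bot itself is
# written in JavaScript; this is an editor's illustration of the API call.)
API = "https://en.wikipedia.org/w/api.php"

def pageprops_url(title):
    params = urllib.parse.urlencode({
        "action": "query", "prop": "pageprops", "ppprop": "wikibase_item",
        "titles": title, "format": "json", "formatversion": "2",
    })
    return API + "?" + params

def extract_item(response_json):
    data = json.loads(response_json)
    page = data["query"]["pages"][0]
    return page.get("pageprops", {}).get("wikibase_item")

# Canned response of the shape the API returns (values illustrative):
sample = '{"query": {"pages": [{"title": "Berlin", "pageprops": {"wikibase_item": "Q64"}}]}}'
```

The returned ID would then be passed as the template parameter when editing the infobox.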

Discussion

Approved for trial (50 edits). @DannyS712: I thought I'd done this last night, but apparently not. Anyways, approved for trial. As usual, please post the diffs here when done and take all the time you need. --TheSandDoctor Talk 18:03, 17 March 2019 (UTC)

DannyS712 bot 14

Operator: DannyS712 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 10:59, Tuesday, March 12, 2019 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): JavaScript

Source code available:

Function overview: Fix stub templates with improper |name parameters in their calls to {{Asbox}}.

Links to relevant discussions (where appropriate):

Edit period(s): As needed

Estimated number of pages affected: ~30 per run

Exclusion compliant (Yes/No): No

Already has a bot flag (Yes/No): Yes

Function details: Process the pages in Category:Stub message templates needing attention - only those with erroneous name parameters would be changed, but they all would be "edited" by replacing the name parameter with {{subst:FULLPAGENAME}} to fix the template. I would manually trigger the task as needed, but it would automatically edit the templates. A current version of the code, which edits the page the user is looking at, is available at User:DannyS712 test/stubname.js, and I have used it repeatedly (see, e.g., the occasion that prompted me to file this BRFA: the 51 edits listed here that contain "Stub name.js" in their summary).
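A rough sketch of that replacement (regex assumed; the live version is the linked user script):

```python
import re

# Illustrative sketch of the fix described above: force the |name= parameter
# passed to {{Asbox}} to {{subst:FULLPAGENAME}}, which substitutes to the
# template's own title. The regex here is an editor's assumption, not the
# script's actual code.
NAME_PARAM = re.compile(r'\|\s*name\s*=\s*[^|}\n]*')

def fix_stub_name(asbox_wikitext):
    # Replace only the first |name= value; other parameters are untouched.
    return NAME_PARAM.sub('| name = {{subst:FULLPAGENAME}}', asbox_wikitext, count=1)
```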

Discussion

  • Approved for trial (50 edits). @DannyS712: I realize I have approved 2 of your tasks for trial at this time. There is no rush by any means, take all the time you need. Please perm link to the contribs area with these diffs when done. Questions? Ask. :) --TheSandDoctor Talk 20:21, 13 March 2019 (UTC)
    @TheSandDoctor: Thanks. This task will be run as-needed, so it may be a while. Also, since I was going to have it run automatically once triggered, when should the cutoff be for doing another run or not? (Eg, if I have 46 so far, should I do another run and potentially edit 30?) Thanks, --DannyS712 (talk) 21:43, 13 March 2019 (UTC)

PkbwcgsBot 23

Operator: Pkbwcgs (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 18:18, Monday, January 28, 2019 (UTC)

Function overview: The bot will fix pages with Template:RomanianSoccer with deprecated parameters. The pages with deprecated parameters are located at Category:RomanianSoccer template with deprecated parameters.

Automatic, Supervised, or Manual: Supervised

Programming language(s): AWB

Source code available: AWB

Links to relevant discussions (where appropriate):

Edit period(s): One-time run

Estimated number of pages affected: 739

Namespace(s): Mainspace

Exclusion compliant (Yes/No): Yes

Function details: The bot will fix deprecated parameters in Template:RomanianSoccer. An example edit is located here. The bot is going to change the old_id parameter to id if id is not defined in the template. For example, * {{RomanianSoccer|old_id=a/achim_sebastian}} is wrong because old_id has no accompanying id parameter; this was changed in my later edit to * {{RomanianSoccer|id=a/achim_sebastian}}, which is correct. Template:RomanianSoccer's documentation is quite clear: "The "old_id" parameter may contain an ID such as a/augustin_ionel, which is the ID portion of http://www.romaniansoccer.ro/players/a/augustin_ionel.shtml or http://www.statisticsfootball.com/players/a/augustin_ionel.shtml. This parameter is optional if the "id" parameter (or unnamed parameter "1") is used." Update: The "a/" before the name of the player will change to "97/" in the template, as stated at Template:RomanianSoccer#Examples, and the name will be reversed, so "achim_sebastian" will become "sebastian-achim"; this task will therefore also update the links. However, some regex may be needed for those changes.
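The parameter rename can be sketched as follows (an illustration; the actual task runs through AWB find-and-replace, and these regexes are assumed):

```python
import re

# Illustrative sketch of the rename described above: turn |old_id= into |id=
# within {{RomanianSoccer}} transclusions, but only when the transclusion
# does not already define |id=.
TPL = re.compile(r'\{\{RomanianSoccer[^{}]*\}\}')
HAS_ID = re.compile(r'\|\s*id\s*=')
OLD_ID = re.compile(r'\|\s*old_id\s*=')

def fix_old_id(text):
    def per_template(m):
        tpl = m.group(0)
        if HAS_ID.search(tpl):
            return tpl  # |id= already present; leave this transclusion alone
        return OLD_ID.sub('|id=', tpl)
    return TPL.sub(per_template, text)
```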

Discussion

I got the URL updates wrong so I will stick with fixing the deprecated parameters. Pkbwcgs (talk) 18:43, 28 January 2019 (UTC)

  • Approved for trial (50 edits). Pkbwcgs, please link to this BRFA in the edit summaries. This was actually a task that I was thinking of taking on with DFB a couple of months ago but ultimately did not have the time, so I thank you for taking this on. --TheSandDoctor Talk 06:11, 29 January 2019 (UTC)
A user has requested the attention of the operator. Once the operator has seen this message and replied, please deactivate this tag. (user notified) was the trial completed? — xaosflux Talk 18:50, 12 March 2019 (UTC)
@Xaosflux: Not yet. I will go forward with the trial soon. Pkbwcgs (talk) 19:05, 12 March 2019 (UTC)

HostBot 9

Operator: Maximilianklein (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 18:50, Monday, January 7, 2019 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): Python

Source code available: https://github.com/notconfusing/hostbot-ai

Function overview: User:Jtmorgan and User:Maximilianklein have planned, and received consent to run, an A/B experiment between the current version of HostBot and a newly developed AI version. The AI version uses a machine-learning classifier based on ORES to prioritize which users should be invited to the Teahouse, whereas the current version uses rules. The point is to see if we can improve user retention by turning our attention to the most promising users.

The two versions would operate simultaneously. Both versions would log-in as "User:HostBot" so that the end-users would be blinded as to what process they were interacting with.

The A/B experiment would run for 75 days (calculated by statistical power analysis).



Links to relevant discussions (where appropriate): Wikipedia_talk:Teahouse#Experiment_test_using_AI_to_invite_users_to_Teahouse

Edit period(s): Hourly (AI-version) and Daily (rules-version)

Estimated number of pages affected: ~11,000

Exclusion compliant (Yes/No): No

Already has a bot flag (Yes/No): Yes

Function details: All technical details on meta:Research:ORES-powered_TeaHouse_Invites.

Discussion

Just posting here to confirm that I am excited to be collaborating with Maximilianklein on this experiment. I've been wanting to improve HostBot's sampling criteria for a while now, and other Teahouse hosts have asked for it. J-Mo 19:33, 7 January 2019 (UTC)

Thought I'd drop by to voice my support, both for the experiment and for Maximilianklein. During the earlier discussion, I posted a couple of questions on their talk page and got a timely and thoughtful reply. I'm also interested in learning about the outcomes of this experiment, and looking forward to them! Cheers, Nettrom (talk) 15:20, 15 January 2019 (UTC)

 Comment: - HostBot seems to be having a few issues. Which version is this? See here. RhinosF1(chat)(status)(contribs) 08:21, 18 February 2019 (UTC)
Resolved- They were on trial version. RhinosF1(chat)(status)(contribs) 07:54, 19 February 2019 (UTC)
Next check in: April 7. — xaosflux Talk 12:02, 19 March 2019 (UTC)

PkbwcgsBot 5

Operator: Pkbwcgs (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 09:15, Thursday, December 13, 2018 (UTC)

Function overview: The bot will fix ISBN syntax per WP:WCW error 69 (ISBN with incorrect syntax) and PMID syntax per WP:WCW error 102 (PMID with incorrect syntax).

Automatic, Supervised, or Manual: Supervised

Programming language(s): AWB

Source code available: AWB

Links to relevant discussions (where appropriate):

Edit period(s): Once a week

Estimated number of pages affected: 150 to 300 a week

Namespace(s): Mainspace

Exclusion compliant (Yes/No): Yes

Function details: The bot is going to fix incorrect ISBN syntax per WP:ISBN. So, if the syntax is ISBN: 819345670X, it will take off the colon and make it ISBN 819345670X. The other case of incorrect ISBN syntax this bot is going to fix is when the ISBN number is preceded by "ISBN-10" or "ISBN-13". For example, in ISBN-10: 995341775X, it will take off "-10:" and that will make it ISBN 995341775X. The bot will only fix those two cases of ISBN syntax. Any other cases of incorrect ISBN syntax will not be fixed by the bot. The bot will also fix incorrect PMID syntax. So, for example, if it is PMID: 27401752, it will take off the colon and convert it to PMID 27401752 per WP:PMID. It will not make it PMID 27401752 because that format is deprecated.
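An illustrative sketch of these fixes (assumed, not the bot's actual AWB rules; per the discussion below, the ISBN output here uses the {{ISBN}} template rather than the deprecated magic link, while the PMID fix simply drops the colon):

```python
import re

# Editor's sketch of the two ISBN cases and the PMID case described above:
# strip the colon (and any "-10"/"-13" suffix) after "ISBN", emitting the
# {{ISBN}} template, and strip the colon after "PMID". Regexes are assumed.
ISBN_BAD = re.compile(r'\bISBN(?:-1[03])?:\s*([0-9][0-9Xx-]{8,16})')
PMID_BAD = re.compile(r'\bPMID:\s*([0-9]+)')

def fix_identifier_syntax(text):
    text = ISBN_BAD.sub(r'{{ISBN|\1}}', text)
    text = PMID_BAD.sub(r'PMID \1', text)
    return text
```

Any other malformed variants would be left alone, matching the task's stated scope.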

Discussion

Please make sure to avoid ISBNs within |title= parameters of citation templates. Also, is there a reason that you are not proposing to use the {{ISBN}} template? Magic links have been deprecated and are supposed to go away at some point, although the WMF seems to be dragging their feet for some reason. There is another bot that converts magic links to templates, but if you can do it in one step, that would probably be good. – Jonesey95 (talk) 12:05, 13 December 2018 (UTC)

@Jonesey95: The bot will convert to the {{ISBN}} template and it will not touch ISBNs in the title parameters of citations. Pkbwcgs (talk) 15:19, 13 December 2018 (UTC)
What about the PMID's? Creating more deprecated magic words isn't ideal. — xaosflux Talk 19:16, 14 December 2018 (UTC)
@Xaosflux: I did say in my description that they would be converted to templates. However, now I need to code the RegEx, and I have been trying to, but my RegEx skills are unfortunately not very good. Pkbwcgs (talk) 19:52, 14 December 2018 (UTC)
I have tried coding the RegEx but gave up soon after, as it is too difficult. Pkbwcgs (talk) 21:14, 14 December 2018 (UTC)
@Pkbwcgs: After removing the colon you can use Anomie's regex from Wikipedia:Bots/Requests for approval/PrimeBOT 13: \bISBN(?:\t|&nbsp;|&\#0*160;|&\#[Xx]0*[Aa]0;|\p{Zs})++((?:97[89](?:-|(?:\t|&nbsp;|&\#0*160;|&\#[Xx]0*[Aa]0;|\p{Zs}))?)?(?:[0-9](?:-|(?:\t|&nbsp;|&\#0*160;|&\#[Xx]0*[Aa]0;|\p{Zs}))?){9}[0-9Xx])\b and \b(?:RFC|PMID)(?:\t|&nbsp;|&\#0*160;|&\#[Xx]0*[Aa]0;|\p{Zs})++([0-9]+)\b, or you can adjust them to account for the colon. Primefac could advise if he made any changes to them. — JJMC89(T·C) 06:27, 15 December 2018 (UTC)
@JJMC89: Thanks for the RegEx. I will be able to remove the colon easily. It is the RegEx for the ISBN that I struggled with. Thanks for providing it. Pkbwcgs (talk) 09:49, 15 December 2018 (UTC)
It is saying "nested identifier" and it is not replacing when I tested the RegEx on my own AWB account without making any edits. Pkbwcgs (talk) 09:53, 15 December 2018 (UTC)
@Pkbwcgs: The regex comes from PHP, but AWB (C#) doesn't support possessive quantifiers (e.g. ++). Replacing ++ with + in the regex should work. — JJMC89(T·C) 18:57, 15 December 2018 (UTC)
@JJMC89: I have tested the find RegEx on my AWB account without making any edits and it works. I also worked out the replace RegEx and it is {{ISBN|$1}}. That works too. I think this is ready for a trial. I will also request a small extension for this task which is to clean out Category:Pages using ISBN magic links and Category:Pages using PMID magic links. That will be PkbwcgsBot 7. Pkbwcgs (talk) 20:15, 15 December 2018 (UTC)
I adjusted the RegEx to accommodate ISBNs with a colon. Pkbwcgs (talk) 20:33, 15 December 2018 (UTC)
This diff from my account is good and perfectly justifies what this bot is going to do for this task. Is this good enough? Pkbwcgs (talk) 20:53, 15 December 2018 (UTC)
This is what it will look like if the bot handles an ISBN with the "ISBN-10" prefix. That diff is also from my account. Pkbwcgs (talk) 21:08, 15 December 2018 (UTC)
{{BAG assistance needed}} There is a huge backlog at Wikipedia:WikiProject Check Wikipedia/ISBN errors at the moment. This task can cut down on that backlog through replacing the colon with the correct syntax. It has also been waiting for two weeks. Pkbwcgs (talk) 22:12, 27 December 2018 (UTC)

Approved for trial (25 edits). --slakrtalk / 20:43, 4 January 2019 (UTC)

The first thirteen edits are here. Pkbwcgs (talk) 09:54, 12 January 2019 (UTC)
This edit put the ISBN template inside an external link, which is an error. This one has the same error. The other eleven edits look good to me. I recommend a fix to the regex and more test edits. – Jonesey95 (talk) 19:51, 12 January 2019 (UTC)
@Jonesey95: I fixed those errors. Pkbwcgs (talk) 19:57, 12 January 2019 (UTC)
Approved for extended trial (25 edits). OK try again. — xaosflux Talk 04:10, 30 January 2019 (UTC)
I apologise for the delay to the trial of this task. I will do the trial as soon as I can. Pkbwcgs (talk) 11:00, 22 February 2019 (UTC)
A user has requested the attention of the operator. Once the operator has seen this message and replied, please deactivate this tag. (user notified) any update on the trialing? — xaosflux Talk 18:49, 12 March 2019 (UTC)
@Xaosflux: I will go forward with the trial this week. Pkbwcgs (talk) 19:06, 12 March 2019 (UTC)

Bots that have completed the trial period

DannyS712 bot 12

Operator: DannyS712 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 07:55, Tuesday, March 5, 2019 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): AWB

Source code available: AWB

Function overview: Solve WP:CWERRORS #17 - Category duplication

Links to relevant discussions (where appropriate): Wikipedia:Bots/Requests for approval/PkbwcgsBot

Edit period(s): One time run

Estimated number of pages affected: ~8000

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): Yes

Function details: Currently, PkbwcgsBot only fixes a maximum of 300 instances of this error per week. While this certainly helps with the backlog, I'd like to do a one-time run to clean it out. Using AWB, I would do find-and-replace on the regex (\[\[Category:.*\]\])((.|\n)*)\1\n, replacing it with $1$2. I did a few of these manually to perfect the regex (eg [8], [9], [10]). While gen-fixes would fix this issue, they would not be activated, so no other edits would be made.
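The find-and-replace described above behaves like this in Python (an illustration only; the actual task runs through AWB, with \1\2 standing in for AWB's $1$2):

```python
import re

# The pattern from the request, verbatim: capture a category link, capture
# everything up to an exact duplicate of it, and drop the duplicate by
# keeping only groups 1 and 2.
DUP_CAT = re.compile(r'(\[\[Category:.*\]\])((.|\n)*)\1\n')

def remove_duplicate_category(wikitext):
    return DUP_CAT.sub(r'\1\2', wikitext)
```

Note that the backreference \1 requires the two category links to be byte-identical, which is why differing sortkeys defeat this version.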

Discussion

Approved for trial (50 edits). Primefac (talk) 19:42, 10 March 2019 (UTC)

@Primefac: Should I have AWB autosave, or hit save manually? --DannyS712 (talk) 19:55, 10 March 2019 (UTC)
Does it matter? The results are the same. Primefac (talk) 19:55, 10 March 2019 (UTC)
Trial complete. - 50 edits made. [11] (search for "Category duplication"). 2 issues: marking the pages as fixed within the wikiproject (I posted on the discussion page to figure out if the list is regenerated, or how to mark the pages automatically from AWB); and the regex doesn't work if the categories have different sort keys. So, the current regex would work on ~2300 pages. Once I finish those, I can look into a different regex that removes the second instance of a category, even if it has a different sortkey, but that is a separate issue, and should probably be a separate task --DannyS712 (talk) 21:30, 10 March 2019 (UTC)
Update - marking as done is taken care of - the list is automatically updated at the end of the day, so issue 1 needs no action. Issue 2 doesn't prevent the bot from running, but rather just limits the scope, so as far as I can tell I should be able to run the bot overnight (once it's approved). Forgot to @Primefac last time. Thanks, --DannyS712 (talk) 22:33, 10 March 2019 (UTC)
So does that mean the bot skips pages that have duplicate cats but different sortkeys? Primefac (talk) 22:35, 10 March 2019 (UTC)
@Primefac: yes, as it currently stands, I have the bot skip pages where no changes are made. The only change that is made is based on the find-and-replace regex, which relies on either identical sortkeys or having no sortkeys at all. --DannyS712 (talk) 23:15, 10 March 2019 (UTC)
  • This task smells pretty badly of WP:COSMETICBOT - arguably, fixing sortkey collisions would be more reader-useful, but that isn't even happening here. The page output before and after this task gets done has no change for readers. I see a minor benefit for editors. Is there room to combine this with another task? — xaosflux Talk 13:31, 12 March 2019 (UTC)
    @Xaosflux: WikiProject Check Wikipedia says that solving this error is Technically cosmetic, however this is either deemed too much of a bad practice, or prevents future issues deemed egregious enough to warrant a deviation from WP:COSMETICBOT. I'll try to create a regex for sort key collision, but for now I'd prefer to avoid combining my tasks, since I'm still only starting out as a bot-op. --DannyS712 (talk) 18:34, 12 March 2019 (UTC)
    @Xaosflux: I think I have a working regex to fix duplicate categorization even if one or both of the categories have sort keys:

(\[\[Category:[^|\]]*)((?:\|[^\]]*)?\]\])((?:.|\n)*)\n\1(?:\|[^\]]*)?\]\]\n?

Which is replaced with $1$2$3. This is, as you said, more reader-useful. What do you think of an extended trial? --DannyS712 (talk) 03:16, 14 March 2019 (UTC)
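For illustration, the sortkey-tolerant pattern above behaves like this in Python (\1\2\3 standing in for AWB's $1$2$3; a sketch, not the bot's actual AWB configuration):

```python
import re

# The proposed pattern, verbatim: capture the category name without its
# sortkey (group 1), the first link's sortkey and closing brackets (group 2),
# and the intervening text (group 3); the second instance of the category is
# dropped regardless of its sortkey.
DUP_CAT_SORTKEY = re.compile(
    r'(\[\[Category:[^|\]]*)((?:\|[^\]]*)?\]\])((?:.|\n)*)\n\1(?:\|[^\]]*)?\]\]\n?'
)

def remove_dup_with_sortkey(wikitext):
    return DUP_CAT_SORTKEY.sub(r'\1\2\3', wikitext)
```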

As a comment, I'm the one that added "Technically cosmetic, however this is either deemed too much of a bad practice, or prevents future issues deemed egregious enough to warrant a deviation from WP:COSMETICBOT." back then. And the reason is that I felt this is a future-proofing situation, because someone that wants to update a sort key might only do it in one place, and it won't kick in because there's a dual listing of the category. Or they might remove the category in one place, thinking they removed the category from the article, unaware there's a duplication of it. This wasn't RFC'd or BRFA'd before however. Headbomb {t · c · p · b} 23:46, 20 March 2019 (UTC)
Also is there a particular reason why genfixes are disabled for this? They'd seem worth making on top of the main task, IMO. Headbomb {t · c · p · b} 23:49, 20 March 2019 (UTC)
@Headbomb: I'd prefer not to automatically run genfixes, but if you'd like them enabled I can supervise an extended trial --DannyS712 (talk) 00:11, 21 March 2019 (UTC)
In my experience genfixes have been pretty stable and well tested for a while now. But it's your bot, so it's your call ultimately about whether or not you want to enable them. It just seems to me that if you're going to make some genfix-like edits (duplicate category removal is covered by them after all), you might as well enable the full suite of genfixes. Headbomb {t · c · p · b} 00:15, 21 March 2019 (UTC)
@Headbomb: in that case, sure. Would you be willing at approve an extended trial with both regexes (to also fix category duplication) and also genfixes? --DannyS712 (talk) 00:17, 21 March 2019 (UTC)
Approved for extended trial (50 edits). I'll approve for further trial, but since I'm the one that added "Technically cosmetic, however this is either deemed too much of a bad practice, or prevents future issues deemed egregious enough to warrant a deviation from WP:COSMETICBOT." back then, I'll recuse myself from final approval. Headbomb {t · c · p · b} 00:21, 21 March 2019 (UTC)
@Headbomb: Trial complete. 50 edits, see [12] - for the first 10, I forgot to enable genfixes. I didn't see any errors, except for one where there were multiple repeated categories, which I fixed. Thanks, --DannyS712 (talk) 00:56, 21 March 2019 (UTC)

How does the bot handle cases like this [13]? Should it? Headbomb {t · c · p · b} 01:08, 21 March 2019 (UTC)

@Headbomb: I don't really understand the first question - that case is a bot edit, and I think it should handle it exactly as it did. --DannyS712 (talk) 01:20, 21 March 2019 (UTC)
There are two clashing sortkeys. How does the bot decide which to remove? Headbomb {t · c · p · b} 01:25, 21 March 2019 (UTC)
@Headbomb: it always removes the second instance of a category. If one or both have sortkeys, it still just removes the second instance and keeps the first, regardless of whether the second had a sortkey and the first didn't, etc. --DannyS712 (talk) 01:28, 21 March 2019 (UTC)

Ahechtbot 5

Operator: Ahecht (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 03:06, Sunday, February 17, 2019 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): AutoWikiBrowser

Source code available: AWB, replacement strings posted at User:Ahechtbot#Task 5

Function overview: Fixes the specific signatures and substituted template text with unclosed formatting tags listed at User:Ahechtbot#Task 4. These are now causing linter errors and/or formatting issues on entire pages due to the Change from HTML Tidy to RemexHTML. Also fixes unclosed <s>...</s> tags where found on pages where other changes are already being made.

Links to relevant discussions (where appropriate): Continuation of approved tasks at Wikipedia:Bots/Requests for approval/Ahechtbot, Wikipedia:Bots/Requests for approval/Ahechtbot_2, and Wikipedia:Bots/Requests for approval/Ahechtbot_3.

Edit period(s): Will be run in several batches.

Estimated number of pages affected: 25000

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): Yes

Function details: Continuation from Task 3 with another batch of strings to replace. These strings are listed at User:Ahechtbot#Task 5, and are based on the non-controversial strings from Task 4 as well as additional strings. No "Automatic changes" (genfixes, etc.) will be enabled. All edits will be marked as "bot" and "minor".
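The unclosed <s>...</s> repair mentioned in the function overview can be illustrated as follows (a simplified sketch; the actual replacements are the specific strings listed at User:Ahechtbot#Task 5):

```python
import re

# Editor's illustration of one class of fix: close an <s> tag left open on a
# line, which under RemexHTML strikes out the rest of the page. The real bot
# applies curated per-signature replacement strings rather than this generic
# balancing heuristic.
OPEN_S = re.compile(r'<s\b[^>]*>', re.IGNORECASE)
CLOSE_S = re.compile(r'</s\s*>', re.IGNORECASE)

def close_unclosed_s(line):
    opens = len(OPEN_S.findall(line))
    closes = len(CLOSE_S.findall(line))
    # Append one </s> per unmatched opening tag.
    return line + '</s>' * max(0, opens - closes)
```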

Discussion

I have some additional signature replacement patterns listed in the "//Fix Linter individual signatures" section of User:Jonesey95/AutoEd/doi.js, if you would like to replace those as well. I have never encountered a false positive in using those patterns. You might also consider the patterns in the "//font wrapping links - move inside link and convert to span tag" section as well; they catch a lot of "Tidy font link" errors. The three sections after that one all replace font tags, which are low-priority fixes but might benefit from getting done while your bot is visiting pages. I'm assuming that your bot will work primarily on Talk spaces and on Project space, since those are the spaces where user signatures are the most common. You might also consider a supervised run through the DYK pages in Template space, since those are really discussion pages. – Jonesey95 (talk) 09:32, 17 February 2019 (UTC)

I already have an unweildy amount of fixes for this run, but I can incorporate those additional signatures into Task 6. I especially don't want to muddy the waters by including general fixes in with the specific signatures that I'm targeting with this task. --Ahecht (TALK
PAGE
) 15:25, 20 February 2019 (UTC)

LkolblyBot

Operator: Lkolbly (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 18:57, Monday, December 24, 2018 (UTC)

Function overview: This bot automatically updates Alexa rankings in website infoboxes by querying the Alexa Web Information Service.

Automatic, Supervised, or Manual: Automatic

Programming language(s): Python

Source code available: https://github.com/lkolbly/alexawikibot (presently, the actual saving is commented out, for testing)

Links to relevant discussions (where appropriate): Previous bot that performed this task: OKBot_5

Edit period(s): Monthly or so

Estimated number of pages affected: 4,560 articles are in the current candidate list. A subset of these pages will be updated each month. Other pages could be pulled into the fray over time if someone adds alexa information to a page. Also, there will be a whitelist copied from User:OsamaK/AlexaBot.js of pages that will be edited (presently containing 1,412 pages).

Namespace(s): Articles

Exclusion compliant (Yes/No): Yes (via whatever functionality is already in pywikipedia)

Function details: This bot will scan all pages (using a database dump as a first pass) to find pages which have the "Infobox website" template with both "url" and "alexa" fields.

It will parse the domain from the url field using a few heuristics, and query the domain with AWIS. Domains that have subdomains return incorrect results from AWIS (e.g. mathematica.wolfram.com returns the result for just wolfram.com), so these domains are discarded (and the page not touched). It will then perform an AWIS query to determine the current website rank and the trend over 3 months.
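The domain handling can be sketched roughly as follows (heuristics assumed; the bot's actual source is linked above):

```python
from urllib.parse import urlparse

# Editor's illustration of the heuristic described above: extract the host
# from the infobox |url= value and skip anything with a subdomain, since
# AWIS reports results for the registrable domain instead.
def queryable_domain(url):
    if "//" not in url:
        url = "//" + url  # urlparse needs a scheme marker to find the host
    host = urlparse(url).hostname or ""
    if host.startswith("www."):
        host = host[4:]
    if host.count(".") != 1:  # e.g. mathematica.wolfram.com -> skip
        return None
    return host
```

Counting labels this way would also wrongly skip two-level TLDs such as example.co.uk; a real implementation would need a public-suffix list.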

Websites will be classified into {{Increase}}, {{Decrease}}, and {{steady}} (Increase, Decrease, and Steady, respectively). A site increasing in popularity will get the Increase tag, even though its rank is numerically decreasing (previously, many sites were also classified into IncreaseNegative and DecreasePositive, which I didn't understand).

Then, in the text of the article, whatever the current alexa data is will be replaced by something like:

{{Increase}} 169,386 ({{as of|2018|12|24}})<ref name="alexa">{{cite web|url= http://www.alexa.com/siteinfo/darwinawards.com | publisher= [[Alexa Internet]] |title=Darwinawards.com Traffic, Demographics and Competitors - Alexa |accessdate= 2018-12-24 }}</ref> <!-- Updated monthly by LkolblyBot -->

(e.g. Increase 169,386 (As of 24 December 2018)[1] )

There are two as-yet untested test cases that I'll test (and fix if necessary) before any full-scale deployment:

  • Apparently some infoboxes have multiple |alexa= parameters? I have to go find one and see what the bot does with it. (probably the right thing to do is to not touch the page at all in that situation)
  • Some pages have an empty |alexa= parameter, which should be fine, but worth testing anyway.


Discussion

Please make the bot's talk page.

"whatever the current alexa data is will be replaced" - how do you know there isn't more than just the previous value? Or that there isn't a reference that is used elsewhere?

I imagine many pages that copy-paste the template code will have an empty |alexa= parameter. This would not be any different to not having it at all.

Do you preserve template's formatting?

The particular citation style the bot uses may not match the article's, especially the date format. (I wonder why we don't have an Alexa citation template still.) —  HELLKNOWZ   ▎TALK 21:26, 24 December 2018 (UTC)

The format of the template code overall is preserved, the value is replaced by replacing the regex r"\|\s*alexa\s*=\s*{}".format(re.escape(current_alexa)), so the rest of the template is unaffected. (the number of spaces before the equal sign goes from "any number" to "exactly one", though)
Yeah, I was debating having it skip empty alexa parameters. There's value in adding it (as much as updating it), though for very small sites the increase/decrease indicator may not be particularly useful.
I didn't think to check whether there's more than the previous value, though I can't think of what else would be there. There's at least two common formats for this data, basically the OKBot format, and a similar format with parenthesized month/year instead of the asof (see https://en.wikipedia.org/wiki/Ethnologue - note lack of a reference). I guess it would be safest to check that the value is in a whitelisted set of alexa formats to replace, I'll bet a small number of regexes could cover 90% of cases (and the remaining 10% could be changed to a conforming case by hand :D)
The reference is interesting, because it's basically a lie. It's a link to the alexa page, but that isn't where the data was actually retrieved from, it was retrieved from their backend API. As for if someone's already using that reference, it shouldn't be too hard to check for that, I would think. I imagine (with only anecdotal evidence) that most of those cases will be phrases like "as of 2010, foobar.com had an alexa rank of 5". Updating that reference to the present value may not make sense in the context of the article (myspace isn't as big as it used to be, an article talking about how big it was in 2008 won't care how big it is now). But either way they should probably be citing a source that doesn't automatically change as soon as you read it.
The ethnologue page already looks like it has diverging date formats? I don't know how common that is, I'll have to go dig up the style guide for citations (maybe we should have a bot to make that more uniform). What would it take to make a template? (also, would that solve the uniformity issue? I guess at least it'd be uniform across all alexa rankings)
Lkolbly (talk) 14:52, 25 December 2018 (UTC)
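The in-place value replacement described in the reply above can be sketched as follows (helper names assumed; the bot's actual code is on GitHub):

```python
import re

# Editor's sketch of the substitution described above: swap the current
# |alexa= value while leaving the rest of the infobox untouched. As noted,
# the whitespace before the equals sign collapses to exactly one space.
def replace_alexa_value(wikitext, current_alexa, new_value):
    pattern = r"\|\s*alexa\s*=\s*" + re.escape(current_alexa)
    # A callable replacement avoids backslash-escape surprises in new_value.
    return re.sub(pattern, lambda m: "| alexa = " + new_value, wikitext)
```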
WP:CITEVAR and WP:DATEVAR are the relevant guidance on date and citation differences. On English wiki, changing or deviating from citation or date style without a good reason is very controversial. The short answer is "don't". Bots are generally expected to do the same, although minor deviations are somewhat tolerated. But bots are expected to follow templates, like {{use dmy dates}} or |df= parameters. —  HELLKNOWZ   ▎TALK 16:36, 25 December 2018 (UTC)
Okay, it looks like it should be pretty straightforward to just check for the two Template:Use ... dates tags and set the |df= parameter. Lkolbly (talk) 14:39, 26 December 2018 (UTC)
Updated the bot so that it follows mdy/dmy dates, updating the accessed date and asof accordingly. Also constrained the pages that will be updated to a handful of matching regexes and also pulled a list from User:OsamaK/AlexaBot.js, which eventually I'll make a copy of. Lkolbly (talk) 18:20, 1 January 2019 (UTC)
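The date-following logic described here can be sketched as below (Python for illustration; the template names are the real English-Wikipedia ones, but the helper itself is hypothetical):

```python
import datetime
import re

def format_access_date(page_text, date):
    # Follow {{Use dmy dates}} / {{Use mdy dates}} when present,
    # otherwise fall back to ISO 8601.
    month = date.strftime("%B")
    if re.search(r"\{\{\s*[Uu]se dmy dates", page_text):
        return "{} {} {}".format(date.day, month, date.year)
    if re.search(r"\{\{\s*[Uu]se mdy dates", page_text):
        return "{} {}, {}".format(month, date.day, date.year)
    return date.isoformat()
```

Pages without either tag keep the ISO default, which matches the behaviour noted later for Iraqi News.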
  • Approved for trial (50 edits). Primefac (talk) 00:43, 20 January 2019 (UTC)
Trial complete. Ran bot to edit 50 randomly selected pages. So far I've noticed two bugs that cropped up, one involving leading zeros in the access dates and another where the comment "Updated by LKolblyBot" got repeated. I'm going to go through and fix the issues by hand for now and apply fixes to the bot. Lkolbly (talk) 20:20, 27 January 2019 (UTC)
Also, looking closer, some pages got a "Retrieved" date format that doesn't match the rest of the page (e.g. Iraqi News), but I'm pretty sure it's because those pages aren't annotated with dmy or mdy. Lkolbly (talk) 20:47, 27 January 2019 (UTC)
I have questions.
  • First, Special:Diff/880480890 - is there a reason it chooses http over https?
  • Second, why do some diffs use ISO formatting for the date while others actually change it to dmy?
  • Third, are OKBot and Acagastya still updating these pages, and would it make sense to remove those names from the comments?
My fourth/fifth questions were going to be about what you planned to do about duplicate names, but it looks like you noticed that and are taking care of it, along with the missing-leading-zeros issue with dates like 2019-01-27.
Also, as a minor point, even if you've only done 44 edits with the bot, please make sure when you finish a trial that you link to the specific edits; while "Contribs" might only show those 44 edits now, after you've made thousands they won't be the first thing to look at.
Actually, I do have another thought - for brevity, it might be best to have a wikilink in the edit summary instead of a full URL. Primefac (talk) 20:12, 28 January 2019 (UTC)
I have answers.
  • There's no particular reason it uses http over https for the alexa.com link, I hadn't given it a second thought. I can change it to https.
  • The variations in date formatting are an attempt to stick with the article's predominant style: the default is ISO format, and if there's a use dmy or use mdy tag it uses the respective format.
  • OKBot appears defunct; I wasn't aware of Acagastya, though from their user page it looks like they've left English Wikipedia, at least. It does make sense to remove the (now duplicate) comments; that was ultimately the goal, but it didn't work as planned.
  • Good point on making a list of the trial edits; conveniently, it looks like I can search the contribs to make a view of just the trial edits.
  • Yeah, the wikilink idea occurred to me a few minutes too late, it looks terrible in the commit message :/ Lkolbly (talk) 23:32, 28 January 2019 (UTC)
With the constant change that Alexa rankings go through, it is not a good idea to rely on manual labour for updating the ranks.
acagastya 08:53, 29 January 2019 (UTC)
  • Regarding the 'Updated monthly by ...' lines - as is being demonstrated here there are stale entries - and it can be expected as no bot should ever be expected to operate in the future. To that end I don't think this should be added, and would support having the next update remove any existing such comment codes. — xaosflux Talk 15:21, 7 February 2019 (UTC)
    Approved for extended trial (50 edits). Please implement the above changes in this run. Primefac (talk) 21:30, 14 February 2019 (UTC)
Was the trial completed? What are the results (please link to diffs as well)? — xaosflux Talk 18:51, 12 March 2019 (UTC)
Sorry I've been dead in the water this last month, time hasn't been on my side (I figured I'd re-architect my server before I ran the trial, and have everything nice and containerized, but that didn't work out and then one thing led to another). I haven't done the trial yet, I plan to run it this coming weekend though. Lkolbly (talk) 19:10, 12 March 2019 (UTC)
Trial complete. Okay, ran the bot on these 50 pages. Some notes:
  • Re: the "Updated by" comments: it turns out the framework I'm using (pywikibot) strips out the comments, which is why they were being duplicated. This run did not add "updated by" comments. Removing existing comments could be done, but it would have to be a separate script.
  • I think I'll change the edit summary to "Bot: Update Alexa ranking (link to a list of sites that the bot maintains)"
  • Some sites (e.g. Gothamist) list a URL in the infobox that is not ostensibly the site's actual (or main) URL, which gives an inaccurate alexa ranking. I think this is beyond my control though.
  • The original formatting of the infobox is unfortunately lost in pywikibot. The spacing varies - some (Adventure Gamers) use no spaces after the vertical bar, most one space, some align the equals signs, some don't (or do so inconsistently). Regardless, the information is gone at rewrite time.
  • A large number of sites had an "April 2014" style alt text specified for the "as of" tag. This script eliminates those.
  • One page (Shutterfly) had the "alexa" ref specified in a separate infobox references section at the bottom of the infobox, which led to a duplicate reference name error.
Otherwise, everything seemed to run fairly smoothly. The last point I might be able to handle by searching for name="alexa" or something in the page text. I think it's a fairly rare occurrence though.
Lkolbly (talk) 02:13, 20 March 2019 (UTC)
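For the Shutterfly-style duplicate-reference case in the last point, one cheap guard would be something like the following (a hypothetical helper, assuming the bot names its reference "alexa"):

```python
import re

def has_existing_alexa_ref(page_text):
    # True if the page already defines a reference named "alexa"
    # somewhere (e.g. in a separate infobox references section);
    # such pages could be skipped, or made to reuse the existing
    # name, to avoid a duplicate-name cite error.
    return re.search(r'<ref\s+name\s*=\s*"?alexa"?\s*/?>', page_text) is not None
```

Since the situation seems rare, skipping on a match and flagging for manual review would likely suffice.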

DeprecatedFixerBot 7

Operator: TheSandDoctor (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 21:51, Wednesday, March 13, 2019 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): Python

Source code available: https://github.com/TheSandDoctor/thefinalball_template_remover

Function overview: Removes {{TheFinalBall}} wherever present.

Links to relevant discussions (where appropriate): Wikipedia:Bot_requests#Remove_Template:TheFinalBall, Wikipedia:Templates for discussion/Log/2019 February 13#Template:TheFinalBall

Edit period(s): one time run

Estimated number of pages affected: 300-400

Exclusion compliant (Yes/No): No

Already has a bot flag (Yes/No): Yes

Function details: The bot would run through all the transclusions of {{TheFinalBall}} and remove them where found on the respective pages.

Discussion

Approved for trial (50 edits). SQLQuery me! 22:02, 13 March 2019 (UTC)

It worked as expected in most cases. It is not context-sensitive to the location (i.e. if between ref tags), and that is something I will have to investigate further. A solution is to manually clean up after the bot where necessary, which I would of course volunteer to do. That said, I will look for a programmatic solution in the morning. --TheSandDoctor Talk 06:11, 14 March 2019 (UTC)
  • It appears that I was able to implement a fix by first using a regex to check whether there are <ref></ref> tags around the template being removed. If found, they are removed, and then the original code executes. @SQL: could I please have a new trial to test this fix? --TheSandDoctor Talk 16:24, 14 March 2019 (UTC)
    Sure, let's see another 50. Approved for trial (50 edits). SQLQuery me! 04:22, 15 March 2019 (UTC)
    Trial complete. @SQL: There were some issues encountered, but I have changed my programmatic approach and the bot has now performed 24 consecutive edits without issue. Instead of preemptively running the text through regular expressions (more error-prone), I have switched to running mwparserfromhell (insanely stable, it just can't see <ref></ref> tags) and then running the result through a single regular expression looking for empty ref tags (the original problem). Once this change was made, the only issue that arose was not anticipating the "name" field. Once adjusted, it ran fine and the issue has not reappeared. --TheSandDoctor Talk 15:27, 15 March 2019 (UTC)
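As a sketch of the second pass described here (the regex is illustrative, not the bot's actual code), the empty-tag sweep could look like:

```python
import re

# After the template removal (done with mwparserfromhell in the bot),
# a <ref>...</ref> shell may be left behind empty. One regex removes
# any such empty pair, including named ones like <ref name="x"></ref>.
EMPTY_REF = re.compile(r'<ref(?:\s+name\s*=\s*"?[^">]*"?)?\s*>\s*</ref>')

def strip_empty_refs(text):
    return EMPTY_REF.sub("", text)
```

Non-empty references are left alone, so only the shells orphaned by the template removal are cleaned up.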
    TheSandDoctor, looks like someone has already gotten to all the transclusions.. Galobtter (pingó mió) 15:04, 21 March 2019 (UTC)

GreenC bot 11

Operator: GreenC (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 17:02, Sunday, March 3, 2019 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): GNU Awk

Source code available: TBU

Function overview: Add the {{Unreferenced}} template to target articles. {{Unreferenced}} currently has about 220,000 instances; the bot will add about 25,000 more, roughly a 10% increase.

Links to relevant discussions (where appropriate): Wikipedia:Village_pump_(proposals)#Bot_to_add_Template:Unreferenced_and_Template:No_footnotes_to_pages_(single_run) (RFC)

Edit period(s): one time run

Estimated number of pages affected: 25,000 (est)

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): Yes

Function details:

Discussion

Informing previous BRFA participants of new BRFA: @Xaosflux, Headbomb, Ajpolino, MZMcBride, SD0001, Xover, DannyS712, and Wugapodes: -- GreenC 17:24, 3 March 2019 (UTC)

  • BAG notes, I encouraged GreenC to restart this BRFA. Avoiding FP's is important, and this is not an "easy" filter. The prior RfC is supportive in general of the tagging - but it has to be accurate. There could be a small margin of error, but we need to focus on reducing it. Feedback on FP avoidance and examples is extremely welcome below, thank you! — xaosflux Talk 17:47, 3 March 2019 (UTC)

Hi GreenC, for Skip if section named "External links", "References", "Sources", "Further reading", "Bibliography", "Notes", "Footnotes" if you have that as equals can you change it to contains? I've run across pages with sections such as "Literature and References". — xaosflux Talk 17:35, 3 March 2019 (UTC)

Will do. -- GreenC 18:13, 3 March 2019 (UTC)

I've had a look at a dozen or so of the articles identified at User:GreenC/data/noref. Here are a few articles that point to ways to potentially adjust the selection criteria:

  1. Vemuri – this is a surname article and it doesn't need sources, as it mostly serves as a disambiguation page
  2. Dom Aleixo Timorese – not unsourced. I guess it needs to be taken into account that the bibliography section might have a different title ("Literature" in this case). Also, if articles with external links are to be excluded, then articles with {{Authority control}} will need to be excluded as well.
  3. Callichirus – similarly, it has a {{Taxonbar}}.
  4. Fukushima's Theorem – it has hand-formatted citations in a section called "Journal articles"
  5. Cordichelys – weren't stubs meant to be excluded?
  6. There are quite a few articles on films, TV episodes, books or music albums (like Parade (Bottom) or The Platinum Collection (Blue album)) that indeed list no sources, but a fair amount of whose content – plot synopses, track listings and the like – is obviously sourced to the publication that is the subject of the article. I don't think tagging with {{unsourced}} is a good idea, but there certainly is an underlying issue, and that's the fact that they don't use any secondary sources. A more appropriate tag would probably be {{Primary sources}}, though its use normally entails some form of editorial judgement. – Uanfala (talk) 17:36, 3 March 2019 (UTC)
  1. It will now filter anything with "surname" in a category name. Normally the page would have been filtered out by one of the index templates in Category:Set index article templates, but it has none, which is in error.
  2. {{Authority control}} can be filtered. "Literature" can be added to the section title list.
  3. {{Taxonbar}}, {{Authority control}} and others are now removed via Category:External link templates to linked data sites with reciprocal links
  4. Section titles with "Articles" are now filtered (the section title words are case and plural sensitive)
  5. It does not tag articles marked as stubs, out of an abundance of caution, but that doesn't mean stubby articles without sources can't or shouldn't be tagged. The article is unsourced and should be tagged. It was actually tagged previously, but some sort of deletion-by-redirect reversal caused the tags to be lost. The bot uncovered this problem.
  6. There is no source, primary or otherwise. The presumption of a source is not the same as a literal source, i.e. what is the name of the source, where is it located, who is the author, what date was it accessed, etc.; all of that is missing. There is no verifiable source. That is why we have this tag, so the community can be made aware of articles like this that need a source. -- GreenC 18:13, 3 March 2019 (UTC)
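Condensing the filters agreed on so far into one sketch (Python for illustration; the bot itself is GNU Awk, and these names and rules are hypothetical simplifications):

```python
import re

# Section titles that indicate the article may have sources; per the
# note above, matched with "contains" rather than "equals", so
# "Literature and References" is caught too.
SKIP_SECTIONS = ("External links", "References", "Sources", "Further reading",
                 "Bibliography", "Notes", "Footnotes", "Literature", "Articles")

def should_skip(title, text, categories):
    # Hypothetical condensation of the filters discussed in this BRFA.
    if title.startswith("List of"):
        return True
    if any("surname" in c.lower() for c in categories):
        return True
    # Linked-data templates that carry external links
    if re.search(r"\{\{\s*(Authority control|Taxonbar)", text, re.IGNORECASE):
        return True
    for heading in re.findall(r"^==+\s*(.*?)\s*==+\s*$", text, re.MULTILINE):
        if any(s in heading for s in SKIP_SECTIONS):
            return True
    return False
```

An article passing all the filters would then be tagged with {{Unreferenced}}.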

{{BAGAssistanceNeeded}} - the bot is ready to begin trials. -- GreenC 14:08, 6 March 2019 (UTC)

Approved for trial (50 edits or 14 days). Go ahead and run a trial with your adjusted parameters. — xaosflux Talk 00:46, 7 March 2019 (UTC)
Trial complete. diffs. -- GreenC 18:03, 13 March 2019 (UTC)
I skimmed these pretty quickly so I may have missed some. Thoughts:
  1. Sawsan, looks like the same problem as Vemuri discussed above. Can you also just skip articles with "Given name" categories? (same for Gaurav)
  2. Municipalities of Central Finland is basically a list article, but I can't think of a clever way to skip it. Maybe it's best that articles like that have a reference anyway...
  3. Communes_of_the_Aisne_department is a bona fide list. Maybe you can skip articles with "List" in the category names? This one was in Category:Lists of communes of France. (Members of the 5th Dáil and Duchess of Brabant (by marriage) would also be skipped with this).
Otherwise looks great! Ajpolino (talk) 20:43, 13 March 2019 (UTC)
Given-name articles have sources (see the Category tree for example Abdul Hamid or William or Alexander). List-of articles also have sources eg. List of counties in New York. -- GreenC 21:27, 13 March 2019 (UTC)
@Ajpolino: Courtesy ping ^ --TheSandDoctor Talk 16:58, 16 March 2019 (UTC)
I'm not arguing that any kind of article ought not have references, but we pitched this in the RfC as a conservative bot skipping stubs, lists, et al. So if it's not too much trouble (and maybe it is), I think it'd be best if we skipped lists even if they aren't titled "List of..."... Also someone added a source to one of the articles you tagged in your most recent test run. So that's somewhat validating. That was kind of the point of all this. Thanks for all your work! Ajpolino (talk) 17:41, 16 March 2019 (UTC)
There was no 'pitch' to skip lists, nor can I think of any reason to skip them; they have sources just like any other article. -- GreenC 18:40, 16 March 2019 (UTC)
Also, the most recent run suggests there will be around 10,000 edits, not the 25,000 originally estimated, due to the additional filters suggested by Uanfala. Each filter causes a significant reduction. To put 10,000 in perspective, that is 0.00175 of all articles (about one-fifth of one percent), or an increase in {{unreferenced}} of 5%. These are, to me, conservative numbers. -- GreenC 18:52, 16 March 2019 (UTC)
Ah, sorry to be stuck on this point, but just to clarify does the bot in its current configuration skip articles that have titles "List of..."? I think that was in your original exclusion list (per the old BRFA) but perhaps you've decided against it. Ajpolino (talk) 20:00, 16 March 2019 (UTC)
Ah, indeed it is filtering 'list of' articles, sorry! Not sure what I was thinking; I was losing track. OK, more filtering can be done on the category layer as you suggested. My code notes say the reason for filtering 'list of' articles is that it was picking up too many false positives. Also, rethinking given-name articles: those are already filtered by way of the set index templates, and those showing up here are edge cases that are not properly templated, so they should also be filtered on the category level. Thanks for your better memory keeping this straight :) -- GreenC 22:19, 16 March 2019 (UTC)

A user has requested the attention of a member of the Bot Approvals Group. Once assistance has been rendered, please deactivate this tag by replacing it with {{tl|BAG assistance needed}}. - above new filters added, ready for next trial, recommend another 50. -- GreenC 14:09, 17 March 2019 (UTC)

PkbwcgsBot 12

Operator: Pkbwcgs (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 13:10, Monday, December 24, 2018 (UTC)

Function overview: The bot will fix a range of unicode control characters in articles. This is WP:WCW error 16.

Automatic, Supervised, or Manual: Supervised

Programming language(s): AWB

Source code available: AWB

Links to relevant discussions (where appropriate):

Edit period(s): Five times a week

Estimated number of pages affected: 100-250 at a time

Namespace(s): Mainspace

Exclusion compliant (Yes/No): Yes

Function details: This is an extension of Task 1, as I am already fixing Unicode control characters there. However, this task does more fixes to error 16 and covers a range of Unicode control characters that WPCleaner can't fix. The following will be removed:

  • U+200E - Left-to-right mark (the bot will be careful when it comes to Arabic text and other foreign text as this is a supervised task)
  • U+FEFF - Byte order mark (all instances of this can be safely removed)
  • U+200B - Zero-width space (the bot will be careful when it comes to Arabic text and other foreign text as this is a supervised task)
  • U+2028 - Line separator (all instances of this can be safely removed)
  • U+202A - Left-to-right embedding (the bot will be careful when it comes to Arabic text and other foreign text as this is a supervised task)
  • U+202C - Pop-directional formatting (the bot will be careful when it comes to Arabic text and other foreign text as this is a supervised task)
  • U+202D - Left-to-right override (the bot will be careful when it comes to Arabic text and other foreign text as this is a supervised task)
  • U+202E - Right-to-left override (the bot will be careful when it comes to Arabic text and other foreign text as this is a supervised task)
  • U+00AD - Soft hyphen (all instances of this can be safely removed)

The following will be turned into a space:

  • U+2004 - Three-per-em space
  • U+2005 - Four-per-em space
  • U+2006 - Six-per-em space
  • U+2007 - Figure space
  • U+2008 - Punctuation space
  • U+00A0 - Non-breaking space (any cases of U+00A0 that are okay per MOS:NBSP will not be removed) (this is the most frequent Unicode character in WP:WCW error 16)

The bot will use regex; general fixes will be switched on, but typo fixing will be turned off as it is not required for this task.
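The substitutions above can be expressed as a translation table. This is an illustrative Python sketch only, since the task itself runs through AWB rules:

```python
# Characters removed outright (directional marks, BOM, zero-width
# space, line separator, soft hyphen), per the list above.
REMOVE = "\u200e\ufeff\u200b\u2028\u202a\u202c\u202d\u202e\u00ad"
# Fancy spaces collapsed to a plain space. NBSP (U+00A0) only where
# MOS:NBSP does not call for keeping it - that check stays manual,
# since this is a supervised task.
TO_SPACE = "\u2004\u2005\u2006\u2007\u2008\u00a0"

TABLE = {ord(c): None for c in REMOVE}
TABLE.update({ord(c): " " for c in TO_SPACE})

def clean(text):
    return text.translate(TABLE)
```

A supervised run would diff each page before saving, so the Arabic/RTL cases flagged above get a human check.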

Discussion

I'm not sure about some of these. In particular, U+00AD may have been added by editors to specify the proper place for long words to be broken, and U+00A0 should more likely be turned into the &nbsp; entity than changed into U+0020. The same might apply to the other space characters, editors may have specifically used these in preference to U+0020. Anomie 17:06, 24 December 2018 (UTC)

@Anomie: After going through the WP:WCW list, there are no instances of U+00AD anywhere. However, if it does come up, then I will replace it with a hyphen. U+00A0 takes up more bytes than a regular space (U+0020) so it is easier to leave a space. The other space characters can be safely replaced as they are unnecessary and they mostly come up in citations. See 1 which is taking out U+2005 which is four-per-em space, 2 which is taking out U+2008 which is punctuation space, 3 which is taking out U+2005 again, 4 which is taking out U+2008 again and 5 which is also taking out U+2008. All these occurred inside citations. Pkbwcgs (talk) 17:43, 24 December 2018 (UTC)
Replacing U+00AD with a hyphen would not be correct either. You'd want to replace it with {{shy}} or the like. For NBSP "takes up more bytes" is a very poor argument, and replacing it with a plain space could break situations described at MOS:NBSP. A figure space might be intentionally used to make columns of numbers line up properly where U+0020 would be a different width, and so on. I don't object to fixing things where specific fancy spaces don't make a difference, but you're arguing that they're never appropriate and that strikes me as unlikely. Anomie 17:55, 24 December 2018 (UTC)
@Anomie: There are no cases of U+00AD so the bot doesn't need to handle that. In terms of U+00A0, I will make sure my RegEx replaces the cases described at MOS:NBSP with &nbsp; or otherwise skips them. Pkbwcgs (talk) 18:04, 24 December 2018 (UTC)
If you're not intending to handle U+00AD after all, you should remove mention of U+00AD from the task entirely. (I see you struck it) As for "the cases described", good luck in managing to identify every variation of those cases. It would probably be better to just make that part of the task be manually evaluated rather than "always replace". Anomie 18:09, 24 December 2018 (UTC)
@Anomie: The bot will still strip U+00A0 in wikilinks, because replacing it with &nbsp; there is not going to work. Pkbwcgs (talk) 18:15, 24 December 2018 (UTC)
Replacing the cases stated at MOS:NBSP is trickier than I thought so I am going to skip those cases manually. This task is supervised. Pkbwcgs (talk) 18:20, 24 December 2018 (UTC)
{{BAG assistance needed}} I have made some amendments to this task, including reducing the run frequency to five times a week, and added general fixes so that the removal of Unicode control characters and general fixes can be combined. I have also specified that non-breaking spaces will not be removed in the cases described at MOS:NBSP; the bot will replace those cases with "&nbsp;" via the general fixes. Pkbwcgs (talk) 20:10, 17 January 2019 (UTC)
  • Approved for trial (50 edits). Primefac (talk) 00:45, 20 January 2019 (UTC)
    @Primefac: Trial complete. The edits are located here. WP:GenFixes were switched on as stated for this task. I will point out a couple of good edits. This edit removed a non-breaking-space Unicode control character in the infobox; because of that character, the "distributor" parameter had disappeared from the infobox, and once the bot removed the character it re-appeared, which makes it a good edit. There were some good general fixes in this edit as well as the removal of a non-breaking space character. This edit is also a good edit because it changed the direction of text from right-to-left to left-to-right. Before, the right-to-left text would have been confusing, but now the direction is changed so it is not confusing anymore. That edit removed the U+202E character, which is "Right-to-left override". Some edits removed non-breaking spaces within citations, U+200E was also removed from Arabic text in some edits, and a few edits removed U+2008, which is punctuation space. Pkbwcgs (talk) 20:02, 20 January 2019 (UTC)
    It might take me a few days to be able to verify any of these (and I have zero issue if another BAG gets to it first), but as a note it's much more helpful to point us to the bad/incorrect edits. In other words, we know how the bot is supposed to run, and pointing us to runs where the bot did what it was supposed to is... kind of pointless. Primefac (talk) 20:14, 22 January 2019 (UTC)
    Anomie, I don't know if you wanted to go through these or not, given your previous interest/concerns. Primefac (talk) 19:52, 28 January 2019 (UTC)


Approved requests

Bots that have been approved for operations after a successful BRFA will be listed here for informational purposes. No other approval action is required for these bots. Recently approved requests can be found here, while old requests can be found in the archives.


Denied requests

Bots that have been denied for operations will be listed here for informational purposes for at least 7 days before being archived. No other action is required for these bots. Older requests can be found in the Archive.

Expired/withdrawn requests

These requests have either expired, as information required by the operator was not provided, or been withdrawn. These tasks are not authorized to run, but such lack of authorization does not necessarily follow from a finding as to merit. A bot that, having been approved for testing, was not tested by an editor, or one for which the results of testing were not posted, for example, would appear here. Bot requests should not be placed here if there is an active discussion ongoing above. Operators whose requests have expired may reactivate their requests at any time. The following list shows recent requests (if any) that have expired, listed here for informational purposes for at least 7 days before being archived. Older requests can be found in the respective archives: Expired, Withdrawn.