
Wikipedia:Bots/Requests for approval

From Wikipedia, the free encyclopedia


If you want to run a bot on the English Wikipedia, you must first get it approved. To do so, follow the instructions below to add a request. If you are not familiar with programming, it may be a good idea to ask someone else to run a bot for you rather than running your own.


Current requests for approval

ZabesBot

Operator: Zabe (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 22:43, Saturday, January 15, 2022 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): Python

Source code available: pywikibot interwikidata.py script

Function overview: Cleaning up old interwiki links

Links to relevant discussions (where appropriate):

Edit period(s): One-time run

Estimated number of pages affected: Not counted; probably between a few hundred and a few thousand

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): No

Function details: There are still quite a few articles that contain old interwiki links, and I would like to clean them up. The bot goes through all pages and removes the interwiki links if the article is linked from the corresponding Wikidata item and there are no conflicts with the interwiki links.
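A minimal Python sketch of the removal rule described above. It assumes the page's wikitext and the Wikidata item's sitelinks (language code to title) have already been fetched; the LANG_CODES subset and the exact-title comparison are simplifications for illustration, and this is not the pywikibot interwikidata.py script itself.

```python
import re

# Hypothetical subset of interwiki prefixes; a real bot would use the full
# list of Wikipedia language codes.
LANG_CODES = {"de", "fr", "es", "it", "nl", "pl", "ja"}

# Matches [[xx:Title]] (optionally followed by a newline). Category, File and
# inline [[:xx:...]] links do not match because of the lowercase-code prefix.
INTERWIKI_RE = re.compile(r"\[\[([a-z]{2,3}(?:-[a-z]+)*):([^\]|]+)\]\]\n?")


def remove_redundant_interwikis(text, sitelinks):
    """Strip interwiki links that duplicate the Wikidata item's sitelinks.

    sitelinks maps a language code to the article title on that wiki.
    If any interwiki link disagrees with the item (different title, or no
    sitelink for that language), the page is treated as a conflict and
    returned unchanged.
    """
    for m in INTERWIKI_RE.finditer(text):
        lang, title = m.group(1), m.group(2).strip()
        if lang in LANG_CODES and sitelinks.get(lang) != title:
            return text  # conflict: leave the page for a human

    def drop(m):
        # Only remove recognised language prefixes; keep anything else.
        return "" if m.group(1) in LANG_CODES else m.group(0)

    return INTERWIKI_RE.sub(drop, text)
```

Refusing to touch a page on any mismatch mirrors the "no conflicts" condition: the bot only removes links that Wikidata already reproduces exactly.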

Discussion

Could you give an example or three of what edits you will be making with your bot? (please do not ping on reply) Primefac (talk) 09:31, 16 January 2022 (UTC)

  • Note: This bot appears to have edited since this BRFA was filed. Bots may not edit outside their own or their operator's userspace unless approved or approved for trial. AnomieBOT 12:17, 16 January 2022 (UTC)

I made 3 test edits ([1], [2], [3]). --Zabe (talk) 12:18, 16 January 2022 (UTC)

I apologise but I should have asked this the first time - how are you finding these and could you give a slightly more accurate indication of how many edits are being planned? Primefac (talk) 19:05, 16 January 2022 (UTC)
I am simply going through all pages in the article namespace. If that is a problem, I could use an XML dump instead. --Zabe (talk) 21:39, 16 January 2022 (UTC)
For the number of edits, I need to query a bit. --Zabe (talk) 21:41, 16 January 2022 (UTC)

ElliBot

Operator: Elliot321 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 14:46, Saturday, January 23, 2021 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s):

Source code available:

Function overview: Automatically apply {{redirect category shell}} templates to redirects with Wikidata, and remove redundant {{Wikidata redirect}} templates.

Links to relevant discussions (where appropriate):

Edit period(s): one time run

Estimated number of pages affected: 50,000-100,000

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): No

Function details: I recently modified {{Redirect category shell}} to automatically detect Wikidata links and apply the template {{Wikidata redirect}} if they exist. This was previously already done with protection levels, and there is no reason that {{Wikidata redirect}} should not also be applied.

There are currently 100,000 redirects in the category Category:Redirects connected to a Wikidata item, which is applied by the software. There are currently 30,000 redirects in the category Category:Wikidata redirects. Nearly all of these were put into that category by applying {{Wikidata redirect}} manually, meaning they will need the tag removed (as it will be a duplicate).

Many of the remaining 70,000 pages will need the template {{rcat shell}} added. As the change to {{Redirect category shell}} was recent, many redirects connected to Wikidata items, without {{Wikidata redirect}}, but with {{Redirect category shell}}, have not been added to Category:Wikidata redirects. The difference in count between Category:Wikidata redirects and Category:Redirects connected to a Wikidata item is the number of pages that will be modified.

The edits will be carried out with AWB running as an automated bot. There is very low risk of disruption in this task, though the number of edits is significant. Using AWB, this bot can also carry out other generic fixes to redirects, though this is not a significant part of its functions.

A somewhat similar failed request was Wikipedia:Bots/Requests for approval/TomBot, but that request was for a bot that would edit ~30-60x more pages, with less benefit overall. This is a much narrower and more useful request.

Discussion

  1. Any prior discussions on doing this that you're aware of, which establish broader consensus for this task?
  2. Will this BRFA cause Template:Wikidata redirect to become redundant? If I understand correctly, this task will orphan all of its transclusions? If so, and especially if there's no prior discussion, I suggest sending that template to TfD (and then this bot task can be technically implementing that TfD). That would be one way to test the wider consensus for this task, too.

ProcrastinatingReader (talk) 16:01, 23 January 2021 (UTC)

There are no discussions I know of establishing consensus around this particular task. {{Wikidata redirect}} will not become redundant, for two reasons. First, {{redirect category shell}} transcludes it (though this usage could, of course, be substituted). Second, it is also used for cross-wiki redirects (such as to Wiktionary) and category redirects, the "soft" usage. The "hard" usage could, however, be deprecated from the template (the two are implemented slightly differently, with an automatic switch). Elliot321 (talk | contribs) 16:20, 23 January 2021 (UTC)
To begin with, I'd say this presentation is stylistically inferior. See, e.g., here. The one at the top (caused by the edit) doesn't look as good as the manual one and looks slightly out of place next to the plain text.
If the rcat shell has to be manually added by bot, is there really a point to this? Why not have a bot add {{Wikidata redirect}} to pages in Category:Redirects connected to a Wikidata item? ProcrastinatingReader (talk) 00:39, 24 January 2021 (UTC)
Sorry - that was due to my changes being misunderstood and reverted. If you check now, you can see the way they were intended to look.
The reason for adding {{redirect category shell}} over {{wikidata redirect}} is for automatic detection. If the link on Wikidata is removed, no update on Wikipedia is necessary (likewise, if a link on Wikidata is added to one using the shell, no update is necessary). Elliot321 (talk | contribs) 07:52, 24 January 2021 (UTC)
Okay, makes sense. I'd suggest dropping a link to this BRFA from the template talk pages for the two templates, to allow some time for comments. ProcrastinatingReader (talk) 09:37, 24 January 2021 (UTC)
Done. Elliot321 (talk | contribs) 10:08, 24 January 2021 (UTC)

So the idea is that {{RCAT shell}} should add the Wikidata box by checking for the connected item. Manually adding the template wouldn't be necessary then because the software can already detect if a page is connected to a Wikidata item. Is that correct? --PhiH (talk) 13:20, 24 January 2021 (UTC)

@PhiH: pretty much. The shell will automatically detect a link to Wikidata, and if found, transclude the template. Therefore, this bot will remove the redundant manual transclusions of the template, and add the shell to automatically transclude on any redirect linked to Wikidata. Elliot321 (talk | contribs) 15:36, 24 January 2021 (UTC)

If I understand what changed with {{wikidata redirect}} and {{redirect category shell}} correctly, redirects that only have {{wikidata redirect}} will be changed to an empty {{redirect category shell}}, which then results in an error. This means that manual inspection is needed to determine another redirect category to apply, which obviously this Bot task cannot do. —seav (talk) 01:02, 25 January 2021 (UTC)

FYI, an empty Rcat shell results in sorting the redirect to the Miscellaneous redirects category, which is monitored by editors who will then tag the redirect with appropriate categories. P.I. Ellsworth  ed. put'r there 03:41, 25 January 2021 (UTC)
Would that be a problem then, Paine Ellsworth? Filling the cat up with some tens of thousands of pages? ProcrastinatingReader (talk) 08:12, 25 January 2021 (UTC)
An empty RCAT shell with a Wikidata item doesn't need to be categorised in Category:Miscellaneous redirects because it generates the Template:Wikidata redirect. I didn't check if that is implemented yet. A page with that template and no Wikidata item is a problem as well. They just move from one tracking category to another. --PhiH (talk) 08:44, 25 January 2021 (UTC)
Why doesn't it need to be categorised into misc redirects? Having a Wikidata item connected/existing isn't really an explanation of why there's a redirect on enwiki. Surely the redirect still needs to be categorised? ProcrastinatingReader (talk) 09:06, 25 January 2021 (UTC)
@PhiH: @ProcrastinatingReader: the {{redirect category shell}} template should not be applied without any categories by a bot as the Category:Miscellaneous redirects should be filled manually. Consequently, I don't plan to do that with this bot. I can manually categorize the redirects that do not have any categories.
(though, a tracking category for uncategorized redirects that can be applied by a bot would probably be useful. I don't feel like gaining consensus for that, though, as that would likely be much more contentious than this proposal) Elliot321 (talk | contribs) 11:37, 25 January 2021 (UTC)
I think my point is easier to demonstrate with an example, or I’m mistaken about exactly what is proposed here. Can you make 5 edits as a demonstration, either with the bot or by hand if you don’t have the bot coded yet? ProcrastinatingReader (talk) 12:10, 25 January 2021 (UTC)
Maybe a page demonstrating what changes would be made would be more useful, since there are a few differing cases here? Elliot321 (talk | contribs) 03:46, 26 January 2021 (UTC)
An actual edit or two of each case would be preferable, as that's the least confusing way to see what is actually proposed. ProcrastinatingReader (talk) 09:57, 26 January 2021 (UTC)
But you want to add an empty RCAT shell to pages that currently only use {{Wikidata redirect}}, don't you? Should they be added to Category:Miscellaneous redirects if they are connected to a Wikidata item or not? --PhiH (talk) 12:32, 25 January 2021 (UTC)
I can manually categorize the pages currently in that situation. Elliot321 (talk | contribs) 03:46, 26 January 2021 (UTC)
It's not about manual categorisation. We are discussing whether the category should be added by the RCAT shell when the redirect is connected to a Wikidata item. --PhiH (talk) 14:19, 26 January 2021 (UTC)
To ProcrastinatingReader: as long as there is at least one rcat template within the Rcat shell, such as the "Wikidata redirect" template, then the redirect would not be sorted to Category:Miscellaneous redirects. As the proposer suggests, that would not be a problem. The proposer appears to know that a bot should not add an empty Rcat shell to redirects, which would bloat the Misc. redirects category. P.I. Ellsworth  ed. put'r there 15:35, 25 January 2021 (UTC)

I think there are multiple cases we have to discuss, feel free to comment below.

  1. Redirects that already use the RCAT shell
    This should be uncontroversial: Where {{Wikidata redirect}} is used it gets removed and the template is transcluded by the RCAT shell.
  2. Redirects without the RCAT shell…
    1. …that use {{Wikidata redirect}} and are connected to a Wikidata item
      The template gets removed and the RCAT shell is added. Should the RCAT shell be programmed to add these pages to Category:Miscellaneous redirects?
    2. …that don't use {{Wikidata redirect}} and are connected to a Wikidata item
      The RCAT shell is added. Same question as above arises.
    3. …that use {{Wikidata redirect}} and are not connected to a Wikidata item
      The template gets removed. Adding the RCAT shell would cause them to be added to Category:Miscellaneous redirects.
      Currently these pages are tracked in Category:Unlinked Wikidata redirects. Before this bot task begins someone should work through this list and add the pages on Wikidata if necessary.

--PhiH (talk) 14:46, 26 January 2021 (UTC)
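To make the cases in PhiH's list concrete, here is a hypothetical Python sketch of the edits each case implies. The Redirect record and the action strings are illustrative only, and how the three flags would be detected (parsing the wikitext, checking the page's Wikidata connection) is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Redirect:
    """Hypothetical summary of one redirect page."""
    has_rcat_shell: bool         # already wrapped in {{Redirect category shell}}
    has_wikidata_redirect: bool  # carries a manual {{Wikidata redirect}} tag
    connected_to_item: bool      # linked from a Wikidata item


def planned_actions(r):
    """Return the edits implied by cases 1, 2.1, 2.2 and 2.3."""
    actions = []
    if r.has_wikidata_redirect:
        # Wherever the manual tag is present (cases 1, 2.1, 2.3) it is removed.
        actions.append("remove {{Wikidata redirect}}")
    if not r.has_rcat_shell and r.connected_to_item:
        # Cases 2.1 and 2.2: the shell will transclude the template itself.
        actions.append("add {{Redirect category shell}}")
    # Case 2.3 (not connected): no shell is added, since an otherwise empty
    # shell would land in Category:Miscellaneous redirects.
    return actions
```

The open question of whether shells added in cases 2.1 and 2.2 should also be sorted into Category:Miscellaneous redirects is not modelled here.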

If I understand correctly, this bot will add the Rcat shell along with an internal {{Wikidata redirect}} tag when it senses any redirect that is already itemized on Wikidata. If that is what happens, then the redirect will not be sorted to the Misc. redirects category. I also sense a possible challenge where the {{NASTRO comment}} template is applied. One of many examples is at 3866 Langley. Would this bot do anything to those many redirects? I actually like the idea of more Rcat shell transclusions. I wonder if the bot will continue checking for new redirects that become connected to a Wikidata item? P.I. Ellsworth  ed. put'r there 21:57, 26 January 2021 (UTC)

The bot won't touch the {{NASTRO comment}} redirects, since it has no need to.
I could run this after the main clean-up job (probably a weekly run would be sufficient). Elliot321 (talk | contribs) 05:25, 27 January 2021 (UTC)
NASTRO comment applies the Rcat shell and so would auto-apply the Wikidata redirect template. There will then be two renditions of Wikidata redirect. Won't one of them have to be removed? P.I. Ellsworth  ed. put'r there 18:49, 27 January 2021 (UTC)
I thought the point of this bot is that these edits wouldn't be necessary anymore. Here you said If the link on Wikidata is removed, no update on Wikipedia is necessary (likewise, if a link on Wikidata is added to one using the shell, no update is necessary) --PhiH (talk) 19:00, 27 January 2021 (UTC)
A weekly run would handle any new redirects that have been created. Editors LOVE to create new redirects; however, they generally leave it up to bots and Wikignomes to categorize their new redirects. P.I. Ellsworth  ed. put'r there 17:11, 28 January 2021 (UTC)
That sounds reasonable. I hadn't thought about completely new pages. --PhiH (talk) 17:29, 28 January 2021 (UTC)
@PhiH: "Redirects that already use the RCAT shell: This should be uncontroversial": Have you thought about the cases where the rcat shell only contains the Wikidata rcat? 𝟙𝟤𝟯𝟺𝐪𝑤𝒆𝓇𝟷𝟮𝟥𝟜𝓺𝔴𝕖𝖗𝟰 (𝗍𝗮𝘭𝙠) 21:25, 16 February 2021 (UTC)

Also curious as to why AWB is used? Don't get me wrong; I love AWB. However it's not known for its speed or lack of clunkiness. According to the manual, ...any edit to the bot's talk page will halt the bot. Before restarting the bot, the bot operator must log in to the bot account and visit the bot's talk page, so that the "new messages" notification is cancelled. So why not make a non-AWB bot to do the task? P.I. Ellsworth  ed. put'r there 22:14, 26 January 2021 (UTC)

Mainly because I know AWB and regex better than I know any other frameworks to interface with Wikipedia. I could write custom code, if that would be preferred. Elliot321 (talk | contribs) 05:26, 27 January 2021 (UTC)
I was just curious, so it would be up to you, of course. I just know how it drives me crazy sometimes when I have to stop in the middle of something, log out of AWB, log in to Wikipedia just to check notifications, log back out and into AWB to commence. That happens with non-bot-auto work as well. P.I. Ellsworth  ed. put'r there 18:53, 27 January 2021 (UTC)
Um... you don't have to log out of AWB to reset the counter. Also, technically you don't have to log out of Wikipedia either if you log in to the bot account in private mode. Primefac (talk) 13:44, 8 March 2021 (UTC)

So just to clarify what I'm waiting on: An actual edit or two of each case would be preferable, as that's the least confusing way to see what is actually proposed. After that, it'll be more clear to have the discussion on which edits are good and which need further discussion. ProcrastinatingReader (talk) 15:32, 13 February 2021 (UTC)

Re: the message above. Primefac (talk) 13:44, 8 March 2021 (UTC)
@Primefac and ProcrastinatingReader: thanks for the ping. I've actually expanded the scope of what I'm intending to do here a bit (see User:Elli/rcat standardization) - and planning on getting consensus for the changes elsewhere first, before going through with this, so if I could put this request on hold or something that would be ideal (sorry, I'm not sure exactly how this type of situation works, but getting approval for the narrow task of shelling and removing one rcat doesn't really make sense given my goal to deal with ~20 of them). Elli (talk | contribs) 18:38, 8 March 2021 (UTC)
On hold. Just comment out the template when you're ready to go (no harm in having it sit here for a while). Primefac (talk) 19:29, 8 March 2021 (UTC)
This very productive discussion probably should have happened somewhere else prior to the BRFA being filed. Maybe this BRFA should be withdrawn pending a full discussion and manual demonstration of various test cases, and then resubmitted with a link to that discussion and a better explanation of the task. – Jonesey95 (talk) 02:25, 7 December 2021 (UTC)
@Elli: Just to follow up here: do you still intend to go through with this BRFA? ProcrastinatingReader (talk) 01:07, 8 November 2021 (UTC)
@ProcrastinatingReader: yes, just been busy with other stuff and not completely sure of the technical details yet. I'll try to follow up on that soon-ish. Elli (talk | contribs) 01:10, 8 November 2021 (UTC)

Bots in a trial period

BattyBot 64

Operator: GoingBatty (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 15:57, Monday, January 24, 2022 (UTC)

Function overview: Remove extra text from |volume= parameters

Automatic, Supervised, or Manual: Automatic

Programming language(s): AutoWikiBrowser

Source code available: AWB

Links to relevant discussions (where appropriate): Help:CS1 errors#extra text volume

Edit period(s): Daily

Estimated number of pages affected: 15,000

Namespace(s): Mainspace and Drafts

Exclusion compliant (Yes/No): Yes

Function details: To remove articles and drafts from Category:CS1 errors: extra text: volume, this bot task will fix citations that contain variations of the word "Volume" in the |volume= field. For example:

Change {{cite book |author=John Smith |title=Title |volume=Volume 2}}

  • John Smith. Title. Vol. Volume 2. {{cite book}}: |volume= has extra text (help)

to {{cite book |author=John Smith |title=Title |volume=2}}

  • John Smith. Title. Vol. 2.

This is similar to BattyBot's task 46, which removes superfluous text from |edition= and |pages= fields. The bot will also apply AWB general fixes as needed. Thank you for your consideration. GoingBatty (talk) 15:57, 24 January 2022 (UTC)
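A minimal Python sketch of the substitution, assuming the redundant label is "Volume", "Vol." or "Vol" (with either capitalisation of the V) followed by an Arabic or Roman numeral. This is not BattyBot's actual AWB rule set; the pattern is deliberately narrow so that values such as "Volumes 1-2" are left alone.

```python
import re

# Assumed label variants: "Volume", "volume", "Vol.", "vol.", "Vol", "vol".
# The lookahead requires a number (digits or a standalone Roman numeral)
# to follow, so free-text values are not touched.
VOLUME_LABEL_RE = re.compile(
    r"(\|\s*volume\s*=\s*)[Vv]ol(?:ume|\.)?\s*(?=\d|[IVXLCDM]+\b)"
)


def strip_volume_label(wikitext):
    """Remove a redundant "Volume"/"Vol." label from |volume= values."""
    return VOLUME_LABEL_RE.sub(r"\1", wikitext)
```

Applied to the example above, "|volume=Volume 2" becomes "|volume=2", while "|volume=Volume Collected" is unchanged because no numeral follows the label.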

Discussion

Approved for trial (25 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. ProcrastinatingReader (talk) 00:22, 25 January 2022 (UTC)

Dušan Kreheľ (bot)

Operator: Dušan Kreheľ (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 11:51, Sunday, December 5, 2021 (UTC)

Function overview: Primary: add, edit, or update Slovak statistical information (e.g. for municipalities, towns, and cities) on this wiki. Wikis usually do not have this information up to date.

Automatic, Supervised, or Manual: Semi-automatic

Programming language(s): PHP, using the Wikimate library.

Source code available: No.

Links to relevant discussions (where appropriate):

Edit period(s): As needed; data updates usually once a year.

Estimated number of pages affected: Up to approximately 3,030

Namespace(s): Mainspace

Exclusion compliant (Yes/No): Yes

Function details:

  • Currently implemented:
    • Updating the infobox of places (Example).
  • Possibly later (ideas, not firm commitments):
    • The "Population development" table
    • Current census data.

Pages about Slovak places usually share the same structure.

Discussion

Hi Dušan Kreheľ -- do you intend to proceed with this BRFA? ProcrastinatingReader (talk) 23:31, 20 January 2022 (UTC)

Yes, but it is currently suspended (because it is not in the bot table). I will inform you when the status changes. ✍️ Dušan Kreheľ (talk) 14:23, 21 January 2022 (UTC)

How do you plan to update the data? I would definitely suggest considering adding this data to Wikidata if you haven't thought of that already, which will mean there won't be edits to articles for each update of data (and also other projects can use the info then). Other common alternatives are editing a template on English Wikipedia, or a Commons Data page. I think these are considered the best ways to do bot updates of statistical data. ProcrastinatingReader (talk) 15:04, 23 January 2022 (UTC)

@ProcrastinatingReader: The current data (2020 and earlier) are licensed under CC BY, which is not compatible with Wikidata's license. Of skwiki, cswiki, plwiki, hrwiki, ukwiki, huwiki and srwiki, only huwiki takes the data from Wikidata, and there the data are not current for every place. One edit per page per year is not a big burden on resources. Also, the current implementation aims to minimize the administrative work required from humans when I update the data on other wikis. The existing approach may not be the most resource-friendly, but it is the easiest to reach consensus on: one user wants one kind of template, another wants something else, someone else prefers plain text. The existing solution may not be perfect, but it works well. ✍️ Dušan Kreheľ (talk) 15:40, 23 January 2022 (UTC)
Fair enough, I don't see a good reason to require updating a non-article data source, especially if that means rewriting code just for enwiki, and if it's only annual updates. ProcrastinatingReader (talk) 16:17, 23 January 2022 (UTC)

Approved for trial (50 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. ProcrastinatingReader (talk) 16:17, 23 January 2022 (UTC)

DoggoBot 4

Operator: EpicPupper (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 22:40, Monday, January 17, 2022 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): Python 3

Source code available: GitHub, CC BY-SA 3.0

Function overview: Updates "available" status for users at Wikipedia:Adopt-a-user/Adoptee's Area/Adopters.

Links to relevant discussions (where appropriate): [4], Wikipedia:Bots/Requests for approval/Theo's Little Bot 19

Edit period(s): Weekly

Estimated number of pages affected: 1

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): No

Function details: Theo's Little Bot was formerly approved to do this, but it is now inactive. The source code is functionally the same as the previous bot's; the only revisions are updates to Python 3 and to newer versions of the libraries. Example diff: here.

Discussion

  • Approved for trial (two updates of the page). Please provide a link to the relevant contributions and/or diffs when the trial is complete. ProcrastinatingReader (talk) 16:05, 18 January 2022 (UTC)
    Hi @ProcrastinatingReader! Would the edits be on the regular schedule (1 time per week), or should I accelerate them? 🐶 EpicPupper (he/him | talk) 23:46, 18 January 2022 (UTC)
    Not necessarily but space them out long enough such that there is actually a status update (I doubt the content would change between a random two consecutive days, for example) ProcrastinatingReader (talk) 12:26, 19 January 2022 (UTC)

Bots that have completed the trial period

Qwerfjkl (bot) 6

Operator: Qwerfjkl (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 19:15, Thursday, January 20, 2022 (UTC)

Automatic, Supervised, or Manual: supervised

Programming language(s): JavaScript

Source code available: User:Qwerfjkl (bot)/code/6

Function overview: Capitalise short descriptions.

Links to relevant discussions (where appropriate): Wikipedia talk:Short description#Automatic capitalization of first character

Edit period(s): Continuous

Estimated number of pages affected: <10,000

Exclusion compliant (Yes/No): No

Already has a bot flag (Yes/No): Yes

Function details: The bot will capitalise the short description's first character, and do other cosmetic cleanup, such as moving the short description to the top of the page. I will review all the edits, and if necessary, revert them and add words to a whitelist. The false positive rate is <1%. List at User:Qwerfjkl/lcSD.
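A minimal Python sketch of the capitalisation step, assuming the short description appears as {{Short description|...}} and that the whitelist holds first words that must stay lowercase (such as "iPhone"). The actual bot is written in JavaScript; this sketch does not move the template to the top of the page or do the other cosmetic cleanup.

```python
import re

SD_RE = re.compile(r"\{\{\s*[Ss]hort description\s*\|\s*([^{}|]*?)\s*\}\}")


def capitalise_sd(wikitext, whitelist):
    """Uppercase the first character of the first short description,
    unless it does not start lowercase or its first word is whitelisted."""

    def fix(m):
        desc = m.group(1)
        if not desc or not desc[0].islower():
            return m.group(0)  # already capitalised, starts with a digit, etc.
        if desc.split()[0] in whitelist:
            return m.group(0)  # deliberate lowercase, e.g. "iPhone"
        # Replace exactly one character, at the position where desc starts.
        offset = m.start(1) - m.start(0)
        whole = m.group(0)
        return whole[:offset] + desc[0].upper() + whole[offset + 1:]

    return SD_RE.sub(fix, wikitext, count=1)
```

Changing only the first character (rather than re-serialising the template) keeps the rest of the wikitext byte-for-byte identical, which keeps diffs easy to review.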

Discussion

What's the relevant MOS provision? ProcrastinatingReader (talk) 21:58, 20 January 2022 (UTC)

WP:SDFORMAT ― Qwerfjkltalk 22:00, 20 January 2022 (UTC)
I see. That doesn't seem to technically be part of the MOS (it's an information page), so can this task be advertised to the village pump (maybe WP:VPP) with a description of the task and a link to this BRFA? Would be good to give people an opportunity for input. ProcrastinatingReader (talk) 22:07, 20 January 2022 (UTC)
I don't think it's necessary. The affected pages amount to 10,000 out of over 4,000,000 pages with short descriptions (0.25%), so this is more of a minor cleanup task to standardize a few outliers. The guidance to use sentence case has been in SDFORMAT for almost four years now, with no significant objections that I can remember. – Jonesey95 (talk) 00:15, 21 January 2022 (UTC)
Yeah, I don't anticipate any objections, but I still think it's reasonable to advertise it such that there's an opportunity for people to raise any concerns or add any comments. ProcrastinatingReader (talk) 14:11, 21 January 2022 (UTC)
A notice was posted two days ago at VPR. – Jonesey95 (talk) 15:02, 23 January 2022 (UTC)

Approved for trial (25 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. This trial is to run in parallel with the notice at VPR. ProcrastinatingReader (talk) 15:06, 23 January 2022 (UTC)

@ProcrastinatingReader: Trial complete. See these 21 contributions + these 4 contributions. There were no false positives.  ― Qwerfjkltalk 16:11, 23 January 2022 (UTC)
I forgot to mention, the bot will also replace 'wikimedia/wikipedia list article' with 'None' per WP:SDNONE. This will affect ~1/3 of the list articles (at least 210 articles). ― Qwerfjkltalk 21:52, 23 January 2022 (UTC)
  • Sounds good to me. I wonder if we made the right call to capitalize short descriptions (Wikidata's approach seems to preserve more information, since it's easy to capitalize but hard to uncapitalize), but that's very much water under the bridge now. {{u|Sdkb}}talk 05:38, 25 January 2022 (UTC)

IndentBot

Operator: Notsniwiast (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 03:20, Friday, October 15, 2021 (UTC)

Function overview: Adjust indentation on discussion pages.

Automatic, Supervised, or Manual: Automatic

Programming language(s): Python, pywikibot

Source code available: On Github

Links to relevant discussions (where appropriate): Wikipedia:Bot_requests/Archive_83#Bot_to_fix_indents

Edit period(s): Continuous (tracking recent changes on a delay)

Estimated number of pages affected: Depends on parameters. With a delay of 10 minutes, around 20-30 pages are checked every 10 minutes (see function details below). Initially, most pages with substantial content will be edited, but since the bot processes the entire page, this number will decrease over time as it covers more ground.

Namespace(s): All talk namespaces, and the project namespace. Not sure if any other namespaces have discussion pages.

Exclusion compliant (Yes/No): Yes, uses pywikibot's save function.

Function details: First, the wikitext is partitioned into lines in the usual manner using \n as a delimiter, except that certain newlines, such as those immediately preceding a table, template, or tag (as detected by WikiTextParser), are not considered the end of a line. Then we apply fix_gaps, fix_extra_indents, and fix_indent_style to the sequence of lines.
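A rough Python sketch of the partitioning step. As a stand-in for WikiTextParser's detection, it simply treats a newline as non-delimiting when the following text starts with "{|" (table), "{{" (template) or "<" (tag); the real bot's rules are more precise, and NON_DELIMITING_STARTS is a hypothetical name.

```python
# Hypothetical heuristic: prefixes that mark the start of a table,
# template or tag.
NON_DELIMITING_STARTS = ("{|", "{{", "<")


def partition_lines(wikitext):
    """Split on newlines, except that a newline immediately before a
    table, template or tag does not end the current line."""
    lines = []
    for piece in wikitext.split("\n"):
        if lines and piece.startswith(NON_DELIMITING_STARTS):
            lines[-1] += "\n" + piece  # glue onto the previous line
        else:
            lines.append(piece)
    return lines
```

For example, ":comment\n{{Div col}}\n::reply" yields two lines, the first being the comment together with its {{Div col}}.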

Definitions

  • The indentation characters are *, :, and #.
  • Given a line X, we denote the indentation characters of the line by indent_text(X), and we denote the indentation level by lvl(X). In particular, if X is not indented then lvl(X) == 0.
  • A blank line is a line consisting of whitespace only.
  • A gap is a nonempty contiguous sequence of blank lines sandwiched between two indented lines, which are called the opening line and closing line.
  • The length of a gap is the length of the sequence of blank lines.
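The definitions translate directly into Python. This is an illustrative sketch rather than IndentBot's code; find_gaps returns half-open index ranges (start, end) over the list of lines, so the length of a gap is end - start.

```python
INDENT_CHARS = "*:#"


def indent_text(line):
    """Leading run of indentation characters, e.g. "*::" for "*::text"."""
    stripped = line.lstrip(INDENT_CHARS)
    return line[: len(line) - len(stripped)]


def lvl(line):
    """Indentation level; 0 for an unindented line."""
    return len(indent_text(line))


def is_blank(line):
    return line.strip() == ""


def find_gaps(lines):
    """Return (start, end) pairs such that lines[start:end] is a maximal run
    of blank lines with an indented line directly before and after it."""
    gaps = []
    i = 0
    while i < len(lines):
        if not is_blank(lines[i]):
            i += 1
            continue
        j = i
        while j < len(lines) and is_blank(lines[j]):
            j += 1
        # Opening line lines[i-1] and closing line lines[j] must be indented.
        if i > 0 and j < len(lines) and lvl(lines[i - 1]) > 0 and lvl(lines[j]) > 0:
            gaps.append((i, j))
        i = j
    return gaps
```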

Fixes

  1. fix_gaps: This fix has many variations. Let A and B be the opening and closing lines, respectively. No gap with an opening or closing line beginning with # is removed. Otherwise, all length 1 gaps are removed, and longer gaps are removed only if lvl(B) > 1.
  2. fix_extra_indents: We iterate over the lines from beginning to end. If we encounter a line A followed by a line B such that lvl(B) > lvl(A) + 1, then the subsequent chunk of lines which have indentation level greater than or equal to lvl(B), beginning with B, is shifted to the left by lvl(B) - lvl(A) - 1 positions. This is done by stripping out indent_text[lvl(A):lvl(B)-1] (in Python notation) from these lines.
  3. fix_indent_style: We iterate over the lines from beginning to end and adjust the indent_text of each line to use corresponding characters from the closest previous line with the same or smaller level, except that # characters are not removed from, introduced to, or shifted inside a line.
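A simplified Python sketch of fix_gaps and fix_extra_indents as described in items 1 and 2, operating on a list of lines. It omits the edge-case exceptions the author mentions and is not IndentBot's actual code; the helpers repeat the definitions so that the block runs on its own, and fix_indent_style is left out.

```python
INDENT_CHARS = "*:#"


def indent_text(line):
    stripped = line.lstrip(INDENT_CHARS)
    return line[: len(line) - len(stripped)]


def lvl(line):
    return len(indent_text(line))


def fix_gaps(lines):
    """Remove gaps (runs of blank lines between two indented lines).

    Gaps whose opening or closing line starts with "#" are kept; otherwise a
    gap of length 1 is always removed, and a longer gap only if lvl(B) > 1.
    """
    drop = set()
    i, n = 0, len(lines)
    while i < n:
        if lines[i].strip() == "" and i > 0 and lvl(lines[i - 1]) > 0:
            j = i
            while j < n and lines[j].strip() == "":
                j += 1
            if j < n and lvl(lines[j]) > 0:
                a, b = lines[i - 1], lines[j]
                if not (a.startswith("#") or b.startswith("#")):
                    if j - i == 1 or lvl(b) > 1:
                        drop.update(range(i, j))
            i = j
        else:
            i += 1
    return [line for k, line in enumerate(lines) if k not in drop]


def fix_extra_indents(lines):
    """Shift over-indented chunks left so that no line is more than one
    level deeper than the line before it."""
    lines = list(lines)
    for i in range(1, len(lines)):
        a, b = lvl(lines[i - 1]), lvl(lines[i])
        if b > a + 1:
            j = i
            # The chunk is B plus the following lines with level >= lvl(B).
            while j < len(lines) and lvl(lines[j]) >= b:
                t = indent_text(lines[j])
                # Strip indent_text[a:b-1], i.e. b - a - 1 characters.
                lines[j] = t[:a] + t[b - 1:] + lines[j][len(t):]
                j += 1
    return lines
```

For example, a ":::" reply directly under an unindented comment is shifted to ":", and its "::::" follow-up to "::", preserving the relative nesting.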

The above description leaves out some details (namely some exceptions for edge cases). The fixes are repeatedly applied in the above order until another round won't alter the page (one round is almost always enough).
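Applying the fixes in order until a round makes no change is a fixed-point loop. A generic Python sketch follows; the max_rounds cap is an assumption added so that two fixes that undo each other cannot loop forever, and drop_one_double_blank is a toy fix, not one of IndentBot's.

```python
def apply_until_stable(lines, fixes, max_rounds=10):
    """Apply each fix in order, repeating whole rounds until a round leaves
    the lines unchanged (or max_rounds is reached)."""
    for _ in range(max_rounds):
        new_lines = lines
        for fix in fixes:
            new_lines = fix(new_lines)
        if new_lines == lines:
            break
        lines = new_lines
    return lines


def drop_one_double_blank(lines):
    """Toy fix: remove one blank line from the first pair of adjacent blanks."""
    for k in range(len(lines) - 1):
        if lines[k] == "" and lines[k + 1] == "":
            return lines[:k] + lines[k + 1:]
    return lines
```

Here ["a", "", "", "", "b"] needs two changing rounds plus one confirming round to settle at ["a", "", "b"]; with the real fixes, one round is usually enough.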

It's basically impossible to handle all edge cases and it's not difficult to come up with some of them, especially when you use ordered lists and combinations of possible mistakes. The hope is these are rare enough to be acceptable.

The bot tracks recent changes with a delay of D minutes, in chunks of C minutes (the delay and chunk parameters), checking for non-minor, non-bot edits that add a user signature and have not been superseded within the most recent D minutes. The effect is that IndentBot is triggered only by signature-adding edits, and does not edit any page that has had a signature-adding edit in the most recent D minutes. I believe the delay should be set to 10 to 30 minutes. Too long a delay results in editors manually fixing indentation in active discussions, partially defeating the purpose of the bot. Non-talk pages must have at least 3 signatures to be edited, ensuring that a single accidental signature on a non-discussion page doesn't trigger the bot. Most sandboxes are avoided.
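A Python sketch of the selection rule, working on hypothetical Edit records rather than a real recent-changes feed. It assumes the chunk being processed is the window from delay + chunk minutes ago up to delay minutes ago, and that only non-minor, non-bot signature-adding edits count as superseding; the 3-signature rule for non-talk pages and the sandbox filter are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Edit:
    """Hypothetical recent-changes record."""
    title: str
    timestamp: datetime
    minor: bool
    bot: bool
    adds_signature: bool


def pages_to_check(edits, now, delay=timedelta(minutes=10),
                   chunk=timedelta(minutes=10)):
    """Titles with a triggering edit in the current chunk and no triggering
    edit in the most recent `delay` period."""

    def triggers(e):
        return e.adds_signature and not e.minor and not e.bot

    window_start = now - delay - chunk
    window_end = now - delay
    candidates = {e.title for e in edits
                  if triggers(e) and window_start <= e.timestamp < window_end}
    # Discussion still active: skip for now, it will come up in a later chunk.
    recent = {e.title for e in edits
              if triggers(e) and e.timestamp >= window_end}
    return candidates - recent
```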

Discussion

  • Also, can someone make IndentBot a confirmed user so that it can bypass CAPTCHAs? Winston (talk) 04:01, 15 October 2021 (UTC)
    • Never mind, now autoconfirmed. Winston (talk) 22:54, 15 October 2021 (UTC)
  • Does anyone know why I still see some bots when filtering recent changes for human edits only? Winston (talk) 08:25, 17 October 2021 (UTC)
    Answered here. Winston (talk) 01:51, 18 October 2021 (UTC)
  • Note: This bot appears to have edited since this BRFA was filed. Bots may not edit outside their own or their operator's userspace unless approved or approved for trial. AnomieBOT 23:07, 18 October 2021 (UTC)
    Sorry, ran the wrong function once. Winston (talk) 01:42, 19 October 2021 (UTC)
  • Thanks for working on this. In response to Not sure if any other namespaces have discussion pages, DYK noms are the odd example that always comes to mind, e.g. Template:Did you know nominations/La Folia Barockorchester. It's probably fine if we skip these to keep things simple, though. — The Earwig (talk) 03:39, 20 October 2021 (UTC)
    If it's only a couple cases like the DYK noms, then it's pretty easy to handle them with a quick title prefix check. Winston (talk) 03:53, 20 October 2021 (UTC)
  • The code has pretty much settled and the bot is ready for a short trial if the example diffs given look good. Winston (talk) 04:07, 20 October 2021 (UTC)
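The page-selection rules from the request (talk pages always qualify, other pages need at least 3 signatures, most sandboxes are skipped) plus the quick title-prefix check for DYK nominations could be sketched as follows. The namespace set, the DISCUSSION_PREFIXES tuple, and the timestamp regex used to approximate a signature are assumptions for illustration, not the bot's actual configuration:

```python
import re

# Talk namespaces on the English Wikipedia (assumed list for this sketch).
TALK_NAMESPACES = {
    "Talk", "User talk", "Wikipedia talk", "File talk", "MediaWiki talk",
    "Template talk", "Help talk", "Category talk", "Portal talk",
    "Draft talk", "Module talk",
}
# Hypothetical extra prefixes for discussion pages outside talk namespaces.
DISCUSSION_PREFIXES = ("Template:Did you know nominations/",)
# Approximates a signature by its timestamp, e.g. "04:01, 15 October 2021 (UTC)".
TIMESTAMP_RE = re.compile(r"\d{2}:\d{2}, \d{1,2} [A-Z][a-z]+ \d{4} \(UTC\)")


def is_eligible(title: str, wikitext: str, min_signatures: int = 3) -> bool:
    if "sandbox" in title.lower():
        return False  # most sandboxes are avoided
    namespace = title.split(":", 1)[0] if ":" in title else ""
    if namespace in TALK_NAMESPACES or title.startswith(DISCUSSION_PREFIXES):
        return True
    # Non-talk pages need several signatures, so one stray signature is not enough.
    return len(TIMESTAMP_RE.findall(wikitext)) >= min_signatures
```

Counting timestamps rather than user links is a simplification. A real implementation would probably want to recognise the user link as well.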

Approved for trial (50 edits or 7 days, whichever happens first). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Primefac (talk) 07:35, 20 October 2021 (UTC)

@Primefac Should edits be minor? Winston (talk) 07:36, 20 October 2021 (UTC)
For the trial, let's go with "no" so that it receives a bit more scrutiny. I think if this goes through, marking as minor would match similar bots. Primefac (talk) 09:06, 20 October 2021 (UTC)

Trial complete. See the diffs here.

  • Haven't looked too carefully yet, but one edge case I saw was Line 80 in the diff Wikipedia:Arbitration enforcement log/2021 involving {{Div col}}. It would be fine if {{Div col}} started on the same line as the comment. Winston (talk) 12:56, 20 October 2021 (UTC)
    Possible fix is to not adjust style if the previous line contains an exceptional newline character, in this case the exceptional newline is the one just before {{Div col}} (since newlines just before templates do not count as delimiters in the line partitioning phase). Winston (talk) 13:03, 20 October 2021 (UTC)
  • Note: I suspect the easiest way to handle edge cases as they come up is to simply prevent the bot from making certain edits to certain lines, rather than trying to handle every case correctly. Winston (talk) 13:07, 20 October 2021 (UTC)
  • Not sure if this is open to all comments, feel free to remove if not. I came here after seeing the edit at Talk:List of Ayatollahs and I was interested. If you look at the edit it made, it didn't manage to get it correct. Although it starts well, it fails at the signature section starting "please study the answers", which should have been indented. Because it missed this, all the edits made afterwards are wrong.
    Also the messages it changed are over a decade old, will it be normal practice for it to change messages that are that old or was this part of the test? ActivelyDisinterested (talk) 23:34, 20 October 2021 (UTC)
    The edit looks better formatted to me and I don't see unintended edge cases, though I'm interested in others' opinions. Be sure to check out the links at User:IndentBot#IndentBot to understand why and how the indentation is being adjusted.
    As for old messages, the bot does not take that into account. It adjusts indentation on the entire page at once. For more active talk pages, old discussions are often stored in archives. Since the bot is only activated by a recent edit with signature, archived pages shouldn't be touched. Winston (talk) 00:07, 21 October 2021 (UTC)
    It's indented part of a message, and left the last section unindented (the second grey unedited section). That's definitely not right. ActivelyDisinterested (talk) 00:46, 21 October 2021 (UTC)
    Could you partially quote the lines you are referring to? Note that the bot does not fix indentation completely—in particular, it does not add extra indentation (so unindented lines will remain unindented). It only changes indentation characters, removes blank lines, and reduces over-indentation. Winston (talk) 00:56, 21 October 2021 (UTC)
    Ah! I see what happened. The section at issue in full is:
    please study the answers in his discussion pages in different languages. Academycanada (talk) 03:21, 24 November 2009 (UTC)
    This is the end of the message that began in full:
    :By a simple search, you can find the sources such as Islamic organizations, independent websites and academic institutions which introduced him as one of Marjas and Grand Ayatollahs. Here are some of them in different languages: (note that the message starts with an indent)
    The start of the message was indented, the bot correctly indented the middle lines of the message, but the end section was not originally indented and so the bot ignored it. As you said unindented lines remain unindented, but that does leave one message with two levels of indents. ActivelyDisinterested (talk) 01:08, 21 October 2021 (UTC)
    Yeah, this bot isn't smart enough to fix all errors. It would have to be way more advanced to tackle issues at a "per message" level of detail, and even then there are too many edge cases. Winston (talk) 01:14, 21 October 2021 (UTC)
  • Made minor improvements to the line partitioning. Also, fix_indent_style now resets its "memory" after list-breaking newlines. This behavior makes more sense and is more faithful to the original indentation. It solves quite a few bugs including the {{Div col}} one I mentioned earlier. Winston (talk) 05:08, 22 October 2021 (UTC)
  • @Primefac: Can I do another trial to draw more scrutiny? (Also want to test it on Toolforge this time.) Winston (talk) 08:42, 22 October 2021 (UTC)
  • Regarding [5]: this is not the bot's fault, but on AFDs it's the convention to use bullets for voting, while delsort notices use colon indents. In this case, the bot changed all bullets to colons after the first delsort notice, and this would happen on literally every AFD. Can something be done about this?
    Approved for extended trial (50 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. I would suggest citing a policy/information page in the edit summary. Also, consider using the minor edit flag for user talk pages even in the trial, as otherwise the users would get new message alerts. – SD0001 (talk) 12:24, 24 October 2021 (UTC)
    Ok, edits to user talk pages will get minor edits. Also, I can add a simple exception for comments beginning with <small class="delsort-notice". Added link to MOS:INDENTMIX to the edit summary. Winston (talk) 12:31, 24 October 2021 (UTC)
    @SD0001: Actually, how are these delsort notices inserted? Are they manually typed out or is there some automation involved? I noticed one of them just used <small> without the "class" attribute. Winston (talk) 12:48, 24 October 2021 (UTC)
    Found it. It is {{Deletion sorting}}. But I guess every now and then someone adds it manually. I'll just use a regex for a small tag followed by "Note:". Winston (talk) 12:57, 24 October 2021 (UTC)
    Yes, that template is substed by a couple of tools – MediaWiki:Gadget-twinklexfd.js and User:Enterprisey/delsort.js being the two common ones. – SD0001 (talk) 12:58, 24 October 2021 (UTC)
    @Notsniwiast Actually, I went ahead and boldly edited that template to use a bullet instead. If no one reverts my edit, then an exception would be unnecessary. – SD0001 (talk) 13:02, 24 October 2021 (UTC)
    @SD0001 Do you still want me to add this exception for the trial, then maybe remove it later? Winston (talk) 13:05, 24 October 2021 (UTC)
    Yes that would be better, as the bot would be touching many pages that already have colon-indented delsort notices. – SD0001 (talk) 13:06, 24 October 2021 (UTC)
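The delsort exception described above ("a regex for a small tag followed by Note:") could look something like the sketch below. The exact pattern, including the optional bold markup before "Note:", is an assumption rather than the regex the bot actually uses:

```python
import re

# Optional indentation, then a <small> tag (with or without attributes),
# then "Note:" (optionally bolded). Approximation of the rule described above.
DELSORT_RE = re.compile(r"^[*#:]*\s*<small\b[^>]*>\s*(?:''')?Note:")


def is_delsort_notice(line: str) -> bool:
    return DELSORT_RE.match(line) is not None


def lines_to_fix(lines):
    """Yield (index, line) pairs the fixer may touch, skipping delsort notices."""
    for i, line in enumerate(lines):
        if not is_delsort_notice(line):
            yield i, line
```

Matching on the tag plus "Note:" rather than on class="delsort-notice" also catches notices that were typed out by hand without the class attribute.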

Trial complete. See the contributions here, or see the diffs in alphabetical order here. Winston (talk) 14:50, 24 October 2021 (UTC)

  • It seems Wikipedia:Categories for discussion also uses some templates which trigger the bot. Winston (talk) 14:53, 24 October 2021 (UTC)
    See Template talk:Cfd2#Remove leading colons regarding those templates. – SD0001 (talk) 16:58, 24 October 2021 (UTC)
  • I think this edit should not have been made. It divided User:Salimfadhley's comment into 3 bullet points, when it looks like they intended to create an effect similar to parabreaks. ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ (talk) 15:53, 24 October 2021 (UTC)
  • Special:Diff/1051601137 is a worry for me. Not necessarily because the bot shouldn't have made the edit, but because those entries were all made by the default templates. Either we change the template, exclude the PERM pages from the bot, or accept the fact that every time someone requests a permission the bot will follow behind and fix it. I think option 1 (changing the pre-set layout) is likely best but that will likely require further discussion and/or consensus, especially since there's a bot that needs to clerk (not sure how that will affect it). Primefac (talk) 17:08, 24 October 2021 (UTC) sorry for the no-show today, dealing with a rather heavy headache for some reason
    • Yeah it seems there's a couple of these templates around. I guess the plan right now is to exclude the relevant pages, and include them later if the templates are changed. But I'm still not sure if all the relevant entries are made using templates. I see some variation in the delsort notices, e.g. <small class="delsort-notice"> versus just <small>, so unless there's more than one version of the templates or editors are doing it manually, there might be some other tools involved (I don't know anything about assisted editing tools). Another example is Wikipedia:Articles_for_deletion/Metropolitan_Gazette_(2nd_nomination) where the delsort notices still use : even though it was made after SD0001's edit. For now, I will skip "Wikipedia:Requests_for_permissions/" and "Wikipedia:Categories for discussion/". The notices using <small> tags were already handled for the trial. Winston (talk) 02:16, 25 October 2021 (UTC)
    I ended up asking one editor why their delsort notices didn't have the class attribute, and apparently they were just doing it manually. So I think it's likely that variations are due to manual edits. Winston (talk) 11:14, 25 October 2021 (UTC)
  • Note: (This is in reply to ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ's comment, but I'm posting it here since it's more generally relevant.) Unfortunately, there’s not much to do in these inevitable cases. SD0001 brought up a similar example before. The nature of the problem requires that the bot operate on entire discussions at once. As a result, anything more than a single minor “violation” in a discussion makes it impossible to create a consistent and accessible list without sometimes changing an editor’s indentation visually. Making exceptions leaves broken lists/markup, and often just shifts the issue to a different part of the list. Since the change is usually minor and doesn’t alter core content, I hope this is acceptable. I also hope the bot’s work will increase awareness of templates such as {{pb}} and {{HTML lists}} which address the most common reasons (that I’ve seen) for incorrect markup. I have links to these templates and other guidelines on the bot's user page. Winston (talk) 02:26, 25 October 2021 (UTC)
  • I have noticed the bot doing useless edits removing blank lines, which is not needed. In fact everything listed for this bot to do is useless. I will deliberately indent more or change the style of indent, so it looks as if this will try to undo that. Looks like this bot is trying to fix a non-problem. Surely there are more useful things to do with bots around here. Graeme Bartlett (talk) 10:24, 26 October 2021 (UTC)
    @Graeme Bartlett Could you provide an example of your using over-indentation or changing indent style, for which normal indentation would be inadequate and for which an accessible solution is impractical? From what I've seen, this is quite rare, but it could be an edge case that can be avoided. Winston (talk) 02:46, 27 October 2021 (UTC)
 – Winston (talk) 11:10, 26 October 2021 (UTC) First time moving a discussion, tell me if I did it incorrectly.

This ?bot? made a useless edit here: https://en.wikipedia.org/w/index.php?title=Talk:Bicarbonate&curid=1450293&diff=1051599806&oldid=1051598562

which has no effect on the output we see. I thought that bots were not permitted to make cosmetic-only changes. Even if the extra blank line is redundant, there is no need to remove it! Graeme Bartlett (talk) 10:18, 26 October 2021 (UTC)

Break

What's the status of this? Not sure where to go from here. I've noticed that on mobile, bulleted and unbulleted comments don't line up (check here for example), so the bot is even more effective there. Winston (talk) 01:06, 29 October 2021 (UTC)

{{BAG assistance needed}} Winston (talk) 09:17, 31 October 2021 (UTC)
I think this needs another round of trial, this time a larger one. The CfD templates have been fixed per talkpage note, and I see you've edited the PERM template too. As for WP:RFUD, which is where I assume @Graeme Bartlett is coming from, the issue seems to be that {{UND}} when substed produces a bullet indent, but most users haven't noticed this and are anyway adding an indent character of their own.
Also, I think the issue of changing the final indent character should be discussed. I don't have any preferences, but I think changing a visible bullet to no bullet (or vice versa, see several cases in [6]) can be seen as intrusive. Would like to hear others' thoughts on this. – SD0001 (talk) 12:59, 31 October 2021 (UTC)
Apologies for radio silence on this one, it's relatively low-priority at this point in my life, but I do agree based on a read-through here that a further trial would probably be good. Primefac (talk) 13:03, 31 October 2021 (UTC)
I did realize that changing the final (and hence visual) character could be annoying, but the point is that mixing characters shouldn't happen in the first place. So if the final indent character is not changed, it neuters a large portion of the fixes. Even a simple single-level list such as
* Comment 1.
: Comment 2.
* Comment 3.
: Comment 4.
would be left as four separate lists in HTML and to screen readers. Let me see if I can compute approximately what fraction of indentation style fixes occur in the final character. Winston (talk) 13:17, 31 October 2021 (UTC)
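Why mixed characters split a list can be shown with a simplified model of how MediaWiki turns list markup into HTML. In this model, each level whose character differs from the previous line's opens a new list, and a blank line closes every open list. The sketch ignores ";" and other parser details, so treat it as an approximation rather than a reimplementation of the parser:

```python
LIST_CHARS = "*#:"


def list_prefix(line: str) -> str:
    """Leading run of list markup characters on a wikitext line."""
    i = 0
    while i < len(line) and line[i] in LIST_CHARS:
        i += 1
    return line[:i]


def count_lists_opened(lines) -> int:
    """Count HTML lists opened under the simplified model described above."""
    prev = ""
    opened = 0
    for line in lines:
        if not line.strip():
            prev = ""  # a blank line closes all open lists (LISTGAP)
            continue
        prefix = list_prefix(line)
        common = 0
        while common < min(len(prev), len(prefix)) and prev[common] == prefix[common]:
            common += 1
        opened += len(prefix) - common  # every differing level starts a new list
        prev = prefix
    return opened
```

Under this model the four-comment example opens 4 lists, while the same comments all marked with "*" would open only 1. Likewise, "* One." followed by ":: Two." opens 3 lists, but "* One." followed by "*: Two." opens 2 and keeps the reply nested inside the first list.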
In Category:Non-talk pages that are automatically signed (just using this to get a quick collection of pages), 2770 lines would have indentation characters altered, and 839 of those lines would have an altered final character. Each altered character represents (almost always) a new list being started where there shouldn't be. Winston (talk) 13:31, 31 October 2021 (UTC)
@SD0001 I'm confused about the {{UND}} template. When I substed it into my sandbox I didn't see a bullet point, and the template's doc doesn't show bullet points either. I believe Graeme Bartlett noticed the bot through the diff they linked. Winston (talk) 14:26, 31 October 2021 (UTC)
Indeed it doesn't. I assumed that was the reason why so many of the RFUD comments were over-indented ([7]). – SD0001 (talk) 14:36, 31 October 2021 (UTC)

@SD0001 I reviewed the "last character issue" and I see how it can be intrusive when, for example, the first comment in a level is unbulleted, but the following comments are all or mostly bulleted which then get changed by the bot. Two examples are in the sections "Unban request for Soumya-8974" and "SoyokoAnis unban appeal" in this diff. Perhaps I could implement a compromise where the bot first computes which type (bulleted or unbulleted) is more common for each level, then when it encounters an INDENTMIX violation, it uses the more common type. Winston (talk) 10:10, 1 November 2021 (UTC)

With this strategy, the number of lines with altered final character gets reduced by 25% to 630. Winston (talk) 13:27, 1 November 2021 (UTC)
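The counting step of the proposed compromise (pick the more common final character, bulleted or unbulleted, at each indentation level) might be sketched like this. It only decides the preferred character. Detecting INDENTMIX violations and applying the choice are left out, and the tie-breaking rule (fall back to the editor's own character) is an assumption:

```python
from collections import Counter, defaultdict


def majority_final_char(prefixes):
    """Map each indentation level to its more common final character ('*' or ':')."""
    counts = defaultdict(Counter)
    for p in prefixes:
        if p and p[-1] in "*:":
            counts[len(p)][p[-1]] += 1
    majority = {}
    for level, c in counts.items():
        if c["*"] != c[":"]:  # ties get no entry
            majority[level] = "*" if c["*"] > c[":"] else ":"
    return majority


def preferred_final_char(prefix, majority):
    # Fall back to the editor's own character when the level has no clear majority.
    return majority.get(len(prefix), prefix[-1])
```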

I've made a number of slight improvements to each of the three fixes and I think the bot is ready for a third trial. I don't think the final character issue can be mitigated any more without simply ignoring final character INDENTMIX violations. I guess we can see whether anyone complains during/after the trial. I'll continue the non-minor edit policy except for user talk pages for the trial to draw more scrutiny. Winston (talk) 13:56, 2 November 2021 (UTC)

@SD0001: I realized the bot wasn't conservative enough and would sometimes make the text harder to understand by not preserving the original editor's indentation style. After some brainstorming and trial and error, I've managed to make the bot respect the original indentation much more while sacrificing a bit of accessibility, i.e. it defers to the original text for certain INDENTMIX violations. The number of final indentation characters changed has been reduced a further 48%. Can we start a third trial? Winston (talk) 11:04, 4 November 2021 (UTC)

Sure, go ahead. Approved for extended trial (200 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. – SD0001 (talk) 06:48, 5 November 2021 (UTC)
An indent-bot is definitely required. Some editors make mistakes with their indents. Some simply don't know how to indent. Most frustrating? Some deliberately mis-indent (usually after their mistakes have been pointed out), and when they 'continue' to deliberately mis-indent, it's basically their way of giving you (the adviser) the figurative 'middle finger'. GoodDay (talk) 17:47, 5 November 2021 (UTC)

Trial feedback

Examples
  • I just reverted this massive refactoring when I saw this bot editing my discussion; I chose bullets on purpose to break that section apart. — xaosflux Talk 13:53, 5 November 2021 (UTC)
  • Here is another example: diff - this doesn't make sense, that first line was clearly not intended to be part of the "discussion" - so was stylized differently. — xaosflux Talk 14:01, 5 November 2021 (UTC)
  • More bad edits (already reverted by another editor). — xaosflux Talk 14:03, 5 November 2021 (UTC)
  • Let's not chase around another bot example that I can assume was specifically programmed to edit one way already. — xaosflux Talk 14:09, 5 November 2021 (UTC)
  • Another example diff that made the new list worse, see the section around "Person who is autistic" - where this bot has introduced double bullets. — xaosflux Talk 14:54, 5 November 2021 (UTC)
Discuss
  • I think this task is going to need a much larger discussion before being released on all edits, all the time; I expect it will continue to make contentious edits that don't have a policy to support them (i.e. a policy that only certain indentation or list styles are allowed to be used). — xaosflux Talk 13:58, 5 November 2021 (UTC)
  • The more I look at these edits, the more fundamentally broken I think this is. Perhaps as an OPT-IN-ONLY on certain pages it could be useful? — xaosflux Talk 14:04, 5 November 2021 (UTC)
    I guess I should disable altering the final indent character completely for now. Too many edge cases. Sorry about that. I'll review the diffs you posted and see if the bot would still have made those edits after disabling this behavior. The final character issue was brought up before, but I underestimated the problem. Winston (talk) 14:09, 5 November 2021 (UTC)
    Yes, I'd suggest not changing the final character indents at all. They're not much of an accessibility issue in practise I believe, and fixing them is clearly looking like more trouble than it's worth. – SD0001 (talk) 14:47, 5 November 2021 (UTC)
  • I don't know what you people were thinking when you approved this thing, but it's completely screwing up existing discussions [8]. And BTW, according to a friend who actually uses a screen reader, the whole idea that indenting patterns are this big deal is a myth. This has the potential to make literally hundreds of thousands of discussions and posts unintelligible. Cut it out RIGHT NOW. EEng 14:10, 5 November 2021 (UTC)
    EEng, this is a trial run, which is done specifically to see if these sorts of issues arise. Clearly, there are major concerns, and based on the last few posts here I'm starting to think that this bot will not be approved without significant overhaul. Primefac (talk) 14:12, 5 November 2021 (UTC)
    Ya think? How can you possibly have ever thought this could fly? Above I read I've managed to make the bot respect the original indentation much more -- oh, he's respecting the indentation used by discussants, which is critical to following the flow of the discussion, much more? You mean, like, you guys are willing to compromise on that and only make discussions somewhat impossible to follow? EEng 14:19, 5 November 2021 (UTC)
    Maybe it would be better not to do a trial run on userpages without pre-approval by the users involved? I've reverted the bot at EEng's talkpage, just because it seemed really hard to believe EEng would like the effect. Remember, his talkpage can be seen from space. Bishonen | tålk 14:23, 5 November 2021 (UTC).
    You're right, I've removed the user talk namespace. Winston (talk) 14:25, 5 November 2021 (UTC)
    You've removed the user talk namespace, so you're only going to fuck up article talk pages and project guideline talk pages? Well, I guess that's a start.
    You're not getting this. There is no possible way to do what you're doing without screwing up existing pages, because there's a fundamental conflict between the assertions in INDENT (or wherever) and the way people actually format their discussions. What you're trying to do inevitably changes the formatting of existing discussion so that the meaning of editors' comments is changed. You're trying to square the circle, and need to give it up completely. EEng 14:56, 5 November 2021 (UTC) P.S. I just noticed above that the plan is to fuck up project pages (e.g. actual guidelines and policies, not just the talk pages) as well. The lunatics have clearly taken over the asylum.
    There were 2 trials already done (50 + 50 = 100 edits) which drew basically no negative feedback, which was why this was approved for extended trial of 200 edits. Something looks to have regressed in the newer code that's causing the issues. It looks like @Notsniwiast has stopped the bot now. – SD0001 (talk) 14:25, 5 November 2021 (UTC)
  • Agree. This thing is jacking up the formatting on talk pages. Sometimes the formatting is there intentionally. Just undid the bot at Talk:Stanley Kubrick for an example. Jip Orlando (talk) 14:14, 5 November 2021 (UTC)
    @Jip Orlando Sorry about that, was the issue the swapping from bullet/no bullet issue? If the issue was isolated, could you mention which part? Winston (talk) 14:27, 5 November 2021 (UTC)
    [9] here, it looks like it's tweaking the replyto stuff by moving the discussions to the left and adding bullets where colons were. I understand that it is making the formatting appear consistent, but it is undoing what appears to have been done intentionally. I see the bullets as used for making a salient point and the indents as a reply to the point. Maybe I'm being nitpicky, but having a sudden sea of bullets doesn't make things look organized. Jip Orlando (talk) 14:38, 5 November 2021 (UTC)
  • Not a fan of having an alert pop up wrt my account talkpage, only to find out it's a semantically void whitespace twiddle. If it had been another person doing the same thing I'd be miffed. More so when it's a mindless thing. ☆ Bri (talk) 14:21, 5 November 2021 (UTC)
    Yeah sorry, I should have respected the user talk space more. It's been removed from the bot for now. Winston (talk) 14:29, 5 November 2021 (UTC)
    For the record, user talk page notifications are suppressed when edits are marked as minor edit + bot edit + account has bot flag. I believe you need to add bot=True near here. Also the bot flag has expired. – SD0001 (talk) 14:39, 5 November 2021 (UTC)
    I forgot the bot flag expired. When the bot flag is on, the edits are automatically marked as a bot edit. Winston (talk) 14:41, 5 November 2021 (UTC)
  • USER TALK testing should not happen unless this has a bot flag. By combining the +bot and +minor attributes, this could make use of the (nominornewtalk) feature to not trigger the new message notifications. (This is not an endorsement that this should be currently tested). — xaosflux Talk 14:39, 5 November 2021 (UTC)
  • @SD0001: I suggest that the operator, @Notsniwiast: needs to go manually review every edit they just made and revert anything that possibly made the page worse. — xaosflux Talk 14:56, 5 November 2021 (UTC)
    I don't think he can be trusted to do that. What he needs to do is revert everything immediately, and where a page has been edited subsequent to the bot's edit, post a message or something warning watchers to take a look themselves. This is really serious. I cannot believe this got anywhere at all. EEng 14:59, 5 November 2021 (UTC)
    Yeah I'm reverting right now. Winston (talk) 15:00, 5 November 2021 (UTC)
  • From my watchlist: I don't want to pile on, but [10] this changed the placement (and hence meaning) of, at least, AlwaysInRed's message. Many of the diffs have large changes, so it is hard to figure out which are problematic. Urve (talk) 15:51, 5 November 2021 (UTC)
    @Urve Could you partially quote the message so I can find it? Is it "I am the lead-moderator and one"? Winston (talk) 15:53, 5 November 2021 (UTC)
    Yes. Even if that was an unintentional error, people do purposefully comment in this way (several indents after an outdent), to continue reply to the comment that isn't outdented. Why people do this (instead of a message directly underneath what they wish to reply to), I'm not sure. But the problem is that the meaning is changed if these are all outdented without regard to what they're replying to. Urve (talk) 15:57, 5 November 2021 (UTC)
  • I think this bot is doomed to failure, should not be approved for ongoing use, and should never have been approved even for the limited testing runs it made. The first thing I saw here was several diffs of completely wrong talk page refactorings and the diff-posters were correct that they were completely wrong. People on discussion pages use indentation to mean different things, that cannot adequately be guessed by a bot, because the meaning of the indentation is in the semantics of what they're saying rather than in the syntax of their comment. Just to pick an easy example, people will choose the indentation level of a comment (among several different indentation levels for a comment placed in the exact same place in the discussion) to indicate to whom they are replying; unless the bot can understand that part of the back-and-forth (and it can't) it cannot correctly adjust the indentation. People will sometimes deliberately choose between *-indentation of their comments or :-indentation of their comments according to how prominent they want that comment to be, and will use both *-indentation and :-indentation for sub-elements within comments as well as for whole comments. Additionally, editors often take significant offense even at careful human refactoring of their comments. This is not a task that can be solved without full human-level AI, which does not exist, and even then is of dubious value. A bot rampage that changes what is meant is a bad thing, and completely unnecessary. We do not need our talk pages to be well structured according to some spec. We need them to communicate with each other. —David Eppstein (talk) 16:16, 5 November 2021 (UTC)
    • @David Eppstein: What if only edits with no visual difference were made? That is, edits of the sort
      * One.
      :: Two.
      
      to
      * One.
      *: Two.
      
      Winston (talk) 16:25, 5 November 2021 (UTC)
      • Look at your example above. Look at the wikicode. If you ran your bot on this very page, it would "fix" your first example, rendering your message meaningless. That is the inherent problem here: you can't write logic that will know that this particular instance of "*" followed by "::" should remain because it's an intentional example of an error. You need a human brain for that. Levivich 16:33, 5 November 2021 (UTC)
        The bot wouldn't fix the first example since it is inside a "syntaxhighlight" tag. Winston (talk) 16:34, 5 November 2021 (UTC)
        I realized that as soon as I hit publish :-) But most editors wouldn't know to use such a tag. Anyway, what can the bot do about this:
        * One.
        : Two.
        
        Can it tell if "Two" is a new comment or the second paragraph of "One"? What if One were unsigned? Etc. Levivich 16:40, 5 November 2021 (UTC)
        It would do nothing, since final indentation characters would no longer be altered at all since that would change the visual appearance. Winston (talk) 16:43, 5 November 2021 (UTC)
        OK, then what about this:
        * One.
        : Two.
        * Three.
        : Four.
        * Five.
        
        Should Two and Four be bullets? Or, alternatively:
        : One.
        * Two.
        : Three.
        * Four.
        : Five.
        
        Is this all one comment with two bullet lists in it, or five different comments? Are we changing to colons to bullets or bullets to colons or nothing? Levivich 16:45, 5 November 2021 (UTC)
        Nothing would change. Sorry I should clarify, by final indentation character I mean the last indentation character for a line. So for *: it would be :. Winston (talk) 16:48, 5 November 2021 (UTC)
        Only basic list gaps and non-final characters could be altered. So indentation levels and final bullet/no bullet would not be changed at all. Winston (talk) 16:49, 5 November 2021 (UTC)
        Heh, you anticipated my next question about indentation levels :-) So these two changes (no change to indentation level, no change to final character) would be two things that are different from the last trial run that was just run? Levivich 16:56, 5 November 2021 (UTC)
        Correct. The visuals should not change. Winston (talk) 16:57, 5 November 2021 (UTC)
        Well, technically the visual would change if there was something like ::: followed by ***, since the bot would change the latter to ::*. Winston (talk) 17:01, 5 November 2021 (UTC)
        Yeah, but it seems like that particular example (::: followed by ***) is just a flat-out mistake, so the change would be for the better for both sighted and non-sighted readers. I think you're right that not changing the indentation level, and not changing the final character, are key to not making a visual change. I'm not a BAG member or anything, but it seems reasonable to me to do another trial run with those modifications you've suggested (and limiting the namespaces for the trial, etc.). It does seem like limiting the bot as you're describing would make the changes invisible to sighted readers. I recognize it won't totally fix the problem that you're setting out to fix (which can't be fixed, because editing text files and using indentation to separate one comment from another is downright stone-age archaic, we might as well use vacuum tubes), but it could improve things without pissing editors off. :-D Levivich 17:05, 5 November 2021 (UTC)
        Yeah I feel bad for angering/annoying a bunch of people. I was overzealous. Winston (talk) 17:11, 5 November 2021 (UTC)
        No worries. Heck, many of us annoy EEng just for sport. Levivich 17:18, 5 November 2021 (UTC)
    • (edit conflict) This would run afoul of WP:COSMETICBOT. Jip Orlando (talk) 16:35, 5 November 2021 (UTC)
      • Never mind, this is an exception. Either way, you'll have a horde of mad users cluttering their watchlists. Jip Orlando (talk) 16:37, 5 November 2021 (UTC)
        • The COSMETICBOT argument is compelling to me but there's more to it than that. If you think that changing talk pages to normalize indentation coding without changing the appearance is helpful as a way to produce semantically clean wikimarkup, you're deluded. :-indentation is never semantically clean. :-formatting is only proper within definition lists, where its actual purpose is to delimit the body of a definition and the indentation is merely a side effect of how this kind of list is formatted. Its use on talk pages for indentation is a hack. As such, the bot's task would be to fill our watchlists with edits while polishing a hack rather than accomplishing anything useful. —David Eppstein (talk) 17:56, 5 November 2021 (UTC)
          It's not about the semantics. The changes are to help screen readers. I was simply overzealous with the bot, and unfortunately it took until this trial to become apparent. The limited version described above in the comment chain with Levivich should be much better. If you use macOS, you can try reading a list with gaps and/or mixed indents with VoiceOver to see how screen readers are affected. Winston (talk) 18:39, 5 November 2021 (UTC)
  • This would be better as a script than a bot. Preferably, a script that worked on just one section of a talk page. As a script, editors could manually review/correct mistakes before publishing. Levivich 16:31, 5 November 2021 (UTC)
  • Look, Winston, I know you're trying to help, but you have only 1800 edits to Wikipedia, and only a handful of those are to talk pages. You don't have the experience to even begin to understand the subtleties of what you're getting into. It's like having someone who's never driven a car start redesigning the highways. EEng 16:44, 5 November 2021 (UTC)
  • We most certainly do need an Indent Bot. Some editors don't know how to indent, or make human mistakes or simply refuse to, after being given advice. GoodDay (talk) 17:49, 5 November 2021 (UTC)
    • That's not something a bot is capable of fixing. —David Eppstein (talk) 17:56, 5 November 2021 (UTC)
      • Wish one could be created, that was capable of doing so. Frustrating, when you read long drawn out discussions, with mis-indents. Throws you off, as to who's responding to who. GoodDay (talk) 18:15, 5 November 2021 (UTC)
  • It's a good idea and I'm sure there's some sort of bot task that could be approved someday. Editing the wikitext of discussions happens to be just about the hardest bot task I can think of to do correctly. I think starting with the LISTGAP change or otherwise trying to limit the amount of change the bot does would be a good idea. Please ask me if you have any questions; I (unfortunately? lol) have a few years of experience with manipulating discussion wikitext. Enterprisey (talk!) 21:15, 5 November 2021 (UTC)
    Basically I caught feature creep. I've pared the bot back to simple LISTGAP and non-final-indentation-character INDENTMIX changes. Winston (talk) 21:37, 5 November 2021 (UTC)

Changes to bot

In the original bot request for a bot to fix indentation, two examples were given. The first example was the removal of a single extra indent (a general fix), and the second was a non-final-indent-character INDENTMIX fix (an accessibility fix). I decided to tackle this request, but caught feature creep and took the idea too far, so some of my "fixes" actually made things worse, as the last trial demonstrated. I believe the issues brought up (other than procedural issues like editing user talk pages and missing the bot flag) were caused by the features I implemented beyond the original request, and I apologize.

I have limited the bot to LISTGAP and non-final-character INDENTMIX fixes only. Indentation levels and final indentation characters are not changed (so the first example in the original bot request would actually be left alone). Here are some sandbox diffs. These are accessibility changes, and the only change noticeable to sighted readers should be the hiding of “floating bullets”, which are bullet points produced by a * that is not the last indent character on its line. For example,

Markup:
:One.
*: Two
*** Three.

Renders as:
One.
  • Two
      • Three.

would become

Markup:
:One.
:: Two
::* Three.

Renders as:
One.
Two
  • Three.

Winston (talk) 10:56, 7 November 2021 (UTC)
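A minimal Python sketch of the non-final-character INDENTMIX rule described above, assuming the simplest reading consistent with the example: within a line's leading run of indent characters (:, *, #), every * except the last character becomes :, so the indentation depth and the final character never change. The function names are hypothetical and this is not the bot's actual source.

```python
INDENT_CHARS = ":*#"


def indent_prefix(line: str) -> str:
    """Return the leading run of wikitext indent characters (':', '*', '#')."""
    end = 0
    while end < len(line) and line[end] in INDENT_CHARS:
        end += 1
    return line[:end]


def hide_floating_bullets(line: str) -> str:
    """Replace every '*' that is not the final indent character with ':'.

    Hypothetical simplification: the indentation depth and the final indent
    character are never changed, and '#' is left alone.
    """
    prefix = indent_prefix(line)
    if len(prefix) < 2:
        return line  # nothing before the final character to adjust
    head, last = prefix[:-1], prefix[-1]
    return head.replace("*", ":") + last + line[len(prefix):]


def fix_indentmix(wikitext: str) -> tuple[str, int]:
    """Apply hide_floating_bullets to every line; also count changed lines."""
    old_lines = wikitext.split("\n")
    new_lines = [hide_floating_bullets(line) for line in old_lines]
    changed = sum(old != new for old, new in zip(old_lines, new_lines))
    return "\n".join(new_lines), changed


if __name__ == "__main__":
    after, changed = fix_indentmix(":One.\n*: Two\n*** Three.")
    print(after)    # prints the three lines :One. / :: Two / ::* Three.
    print(changed)  # 2
```

The changed-line count corresponds to the second number in the ordered pair that the trial edit summaries reported (lines with at least one altered indent character).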

@Notsniwiast: here is a sample mixed-up list. What, if anything, would you do to it?
  • A
    • A
      • A
        A
        A
        • A
          1. A
          2. A
          3. A
        • A
          1. A
          2. A
          3. A
        A
      • A
      • A
    • A
  • A
xaosflux Talk 13:05, 7 November 2021 (UTC)
I've just tested it on this list. It does nothing. Winston (talk) 13:09, 7 November 2021 (UTC)
@Xaosflux: Please see the above. --TheSandDoctor Talk 07:39, 29 December 2021 (UTC)
  • Trial complete. Closing out the previous trial which was aborted. Winston (talk) 06:29, 5 January 2022 (UTC)
  • {{BAGAssistanceNeeded}} I'd like to try out the limited version as described above. To recap, there shall be no changes to indentation levels and no changes to the final indent character. The only noticeable visual difference should be the hiding of "floating" bullet points. The other changes are reductions in the number of list gaps and amount of indentation-style mixing, which should not be visually noticeable. Here are some fresh diff examples. I can do more sandboxed runs if we're still wary of a trial on the live wiki. Winston (talk) 06:29, 5 January 2022 (UTC)
    Btw, the ordered pair in the edit summaries represents (# of blank lines removed, # of lines with at least one altered indent character). Winston (talk) 06:51, 5 January 2022 (UTC)
  • ...Sure. Approved for extended trial (200 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. I totally support approving this task in some form. But, although I'd rather not say this, recognizing that people really don't like discussions getting messed up (as you can see above), I must warn you that any further edits that change the meaning of discussions (even in the most insignificant way), or any other mistakes, aren't going to look too good for the request. I'd err on the side of caution. From my experience developing reply-link, in the land of Wikipedia talk pages, even if something looks like a mistake, there's a decent chance that it's intentional. Enterprisey (talk!) 07:02, 5 January 2022 (UTC)
    Understood. If we find a legitimate use of floating bullets affecting meaning, then I can simply prevent the bot from changing * to : , thus preventing bullets from disappearing whether floating or not. I'll do this trial in smaller chunks, posting the diffs for each chunk after I review them and point out diffs where bullet points have been removed. Winston (talk) 07:19, 5 January 2022 (UTC)
    @Enterprisey Before starting, should the bot be given a (temp) bot flag? Also, minor edits or not minor? Winston (talk) 07:22, 5 January 2022 (UTC)
    If the previous trials weren't flagged, I wouldn't think this one should be; and I'd mark them as not minor (even though the distinction isn't very important these days) because it's a trial and I think people would be slightly more likely to pay attention to non-minor edits. Enterprisey (talk!) 07:34, 5 January 2022 (UTC)
    @Notsniwiast, I'd even recommend, to be extra cautious, making sure that the edits don't change the visual appearance of the page (besides removing the "double bullets" error); we can always add more tasks to the bot later. Not sure if you were doing that already (I didn't check); just making a note. Enterprisey (talk!) 08:12, 5 January 2022 (UTC)
    Yes, only bullets which are not the final indent character get removed. I will pause after 20 edits to show the diffs, pointing out the ones where a bullet has been hidden. Winston (talk) 08:16, 5 January 2022 (UTC)

Chunk 1 (20 diffs)

  • See here. I'm pausing here to see if any concerns are raised. Winston (talk) 08:54, 5 January 2022 (UTC)
    The few I checked look fine. If nobody objects in the next day or two, feel free to keep going. Maybe pause again after 100 edits have been made? Enterprisey (talk!) 01:48, 6 January 2022 (UTC)
    @Notsniwiast, I notice the bot is currently editing your sandbox. If the sandbox edits aren't part of the trial, keep going and ignore this message. However, since you linked to one of them just above, I'm assuming you're counting the sandbox edits as the trial. Since the bot task is for editing actual discussion pages, the trial should be as similar to that usage as possible. That means the bot should edit the actual pages, not just its sandbox, for this trial. Part of the trial, in my view, is making sure that people won't object to the edits, and they won't have the opportunity to object if the edits are made to the sandbox. Enterprisey (talk!) 08:10, 6 January 2022 (UTC)
    Yup the sandbox edits aren't part of the trial. Not sure which link you're referring to, but when I link to the actual trial diffs I use a permanent url to a revision of my sandbox where I put the diffs, so as not to clutter up this page. Winston (talk) 08:21, 6 January 2022 (UTC)
    Sounds good. My bad; misread. Enterprisey (talk!) 08:41, 6 January 2022 (UTC)

Chunk 2 (50 diffs)

  • See here. Winston (talk) 10:07, 6 January 2022 (UTC)
  • This is the first I'm aware of this bot, and I've not read all the text above so sorry if this has been addressed before, but could the edit summaries be improved please: e.g. "Adjusted indentation per MOS:ACCESS#Lists. Trial edit. (1, 10)" has three parts:
    • "Adjusted indentation..." is sort of OK, but could imply that it is changing the indentation level (which it isn't), "Fixing indentation markup" would be better imo.
    • "Trial edit." is entirely unproblematic
    • "(1, 10)" is cryptic and while potentially useful to the operator for debugging is just confusing for editors who aren't intimately familiar with the bot.
    • The edit summary does not mention that it removed multiple blank lines from lists, or why. Personally I know that this is per MOS:LISTGAP, but not everybody will. I recommend including it in the summary as (a) noting what the bot has done, and (b) noting why it has done it, so that people aren't tempted to revert the bot and also learn why they shouldn't leave blank lines in the first place. I do a bit of fixing of lists, and Redrose64 does even more; both of us mention LISTGAP in edit summaries and I've seen positive responses to that. Thryduulf (talk) 13:02, 6 January 2022 (UTC)
      • Good points. I changed the edit summaries to "Adjusting indentation markup per MOS:LISTGAP and MOS:INDENTMIX. X blank lines removed. Y adjustments of indent markup. Trial edit." Winston (talk) 15:18, 6 January 2022 (UTC)
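The new summary reports two counts. A hedged Python sketch of how the LISTGAP half and the summary could fit together, assuming a conservative rule that the thread does not confirm: a run of blank lines is removed only when the lines on both sides of it are list lines starting with :, * or #. The function names are hypothetical; the summary wording is copied from the format quoted above.

```python
LIST_MARKERS = (":", "*", "#")


def is_list_line(line: str) -> bool:
    """True if the line starts with a wikitext indent/list character."""
    return line.startswith(LIST_MARKERS)


def remove_list_gaps(wikitext: str) -> tuple[str, int]:
    """Drop blank lines that sit between two list lines (MOS:LISTGAP).

    Blank lines next to ordinary paragraphs, headings or the page edges
    are kept. Returns the new text and the number of blank lines removed.
    """
    lines = wikitext.split("\n")
    kept: list[str] = []
    removed = 0
    i = 0
    while i < len(lines):
        if lines[i].strip():
            kept.append(lines[i])
            i += 1
            continue
        # Find the end of this run of blank lines.
        j = i
        while j < len(lines) and not lines[j].strip():
            j += 1
        gap = lines[i:j]
        if kept and j < len(lines) and is_list_line(kept[-1]) and is_list_line(lines[j]):
            removed += len(gap)
        else:
            kept.extend(gap)
        i = j
    return "\n".join(kept), removed


def edit_summary(blank_lines_removed: int, indent_adjustments: int, trial: bool = True) -> str:
    """Build an edit summary in the format quoted above."""
    summary = (
        "Adjusting indentation markup per MOS:LISTGAP and MOS:INDENTMIX. "
        f"{blank_lines_removed} blank lines removed. "
        f"{indent_adjustments} adjustments of indent markup."
    )
    return summary + (" Trial edit." if trial else "")
```

Keeping blank lines that border a plain paragraph is a deliberately cautious choice: a gap between an unindented comment and its first reply is not a gap inside a list.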

Chunk 3 (60 diffs)

  • See here. I am stuck figuring out an error in this one: Diff for Talk:Mass killings under communist regimes. The bot apparently introduced floating bullets to the line beginning with "I don't think it is fair". But when I copy the wikitext into my sandbox here, it looks fine (can anyone confirm this?). It also looks fine in the edit preview of Talk:Mass killings under communist regimes. Is wikitext displayed differently in User talk vs Talk or something? I can't reproduce the error (though I haven't tried reproducing it in actual Talk pages, and there doesn't seem to be a sandbox Talk page). Winston (talk) 20:22, 7 January 2022 (UTC)
    Ok so I copied the entire wikitext rather than just the section, and indeed the floating bullets showed up. So I tried to produce a minimal reproducible example (the original talk page is over 600k bytes), and have discovered that it has something to do with links. Consider this revision (excuse the gibberish, I did some transformations to reduce the page size). The line we are interested in is the one containing "conclusions were rejected". Notice the floating bullets. Now edit the page and delete the wikilink to water at the end of the wikitext (deleting some other wikilink may work too). Notice how the floating bullets are gone. Instead of deleting a wikilink, you can delete the first template on the page and the floating bullets also disappear... Not sure what's going on. Winston (talk) 01:42, 8 January 2022 (UTC)
    • From a discussion at WP:VPT, this is probably a GIGO issue due to a newline inside a wikilink. I had thought that such newlines were allowed since they seemed to work ok, but apparently not. To keep the fix simple and to be extra conservative, I'm having the bot simply refuse to perform any indentmix fix on the page at all if it encounters a wikilink containing \n.
      I did not see anything unexpected in the other edits for this chunk. Winston (talk) 00:13, 9 January 2022 (UTC)
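A minimal sketch of that guard, assuming a regular expression is an acceptable approximation: it flags any [[ whose link text reaches a newline before a closing bracket, and the whole page is then skipped for INDENTMIX fixes. The pattern and function names are illustrative, not the bot's code, and links nested inside file captions are only approximately handled.

```python
import re

# Illustrative pattern: "[[" followed by link text (no brackets) that reaches
# a newline before any "]" appears, i.e. a wikilink split across lines.
MULTILINE_WIKILINK = re.compile(r"\[\[[^\[\]\n]*\n")


def has_multiline_wikilink(wikitext: str) -> bool:
    return MULTILINE_WIKILINK.search(wikitext) is not None


def may_fix_indentmix(wikitext: str) -> bool:
    """Refuse INDENTMIX fixes on the whole page if any wikilink contains a newline."""
    return not has_multiline_wikilink(wikitext)
```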

Chunk 4 (70 diffs)

  • See here. There was only one error here where a bullet point was introduced. This was due to a template creating a table which the bot did not anticipate. The bot now expands templates to check for tables. Trial complete. Winston (talk) 06:59, 10 January 2022 (UTC)
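A hedged sketch of the table check. Once a line's templates have been expanded (MediaWiki's Action API offers action=expandtemplates for this; how the bot actually expands them is not stated), the remaining question is whether the expanded text starts a table, i.e. has {| at the start of a line, optionally after indent characters as in :{|. The helper name is hypothetical and the expansion step itself is not shown.

```python
import re

# "{|" opens a wikitext table only at the start of a line; on talk pages it is
# often preceded by indent characters, as in ":{|". Illustrative pattern.
TABLE_START = re.compile(r"^[:*#]*[ \t]*\{\|", re.MULTILINE)


def opens_table(expanded_wikitext: str) -> bool:
    """True if the (already template-expanded) text starts a wikitext table."""
    return TABLE_START.search(expanded_wikitext) is not None
```

A bot could then leave the indent markup of any line alone when this returns True for that line's expanded text.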

Gonna take a break. Code is still available. Withdrawing this request. Withdrawn by operator. Notsniwiast (talk) 05:15, 13 January 2022 (UTC)

Well, as the trial has been completed, assuming no issues are found, you don't have to do anything more as of now. If this is approved, you can start running the bot whenever you return. – SD0001 (talk) 09:51, 13 January 2022 (UTC)
SD0001, are you approving this request, or are you accepting their withdrawal? Primefac (talk) 15:10, 23 January 2022 (UTC)


Approved requests

Bots that have been approved for operations after a successful BRFA will be listed here for informational purposes. No other approval action is required for these bots. Recently approved requests can be found here, while old requests can be found in the archives.


Denied requests

Bots that have been denied for operations will be listed here for informational purposes for at least 7 days before being archived. No other action is required for these bots. Older requests can be found in the Archive.

Expired/withdrawn requests

These requests have either expired, as information required by the operator was not provided, or been withdrawn. These tasks are not authorized to run, but such lack of authorization does not necessarily follow from a finding as to merit. A bot that, having been approved for testing, was not tested by an editor, or one for which the results of testing were not posted, for example, would appear here. Bot requests should not be placed here if there is an active discussion ongoing above. Operators whose requests have expired may reactivate their requests at any time. The following list shows recent requests (if any) that have expired, listed here for informational purposes for at least 7 days before being archived. Older requests can be found in the respective archives: Expired, Withdrawn.