Wikipedia:Bots/Requests for approval/InfoboxBot
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.
Operator: Garzfoth (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)
Time filed: 04:36, Tuesday, October 24, 2017 (UTC)
Automatic, Supervised, or Manual: Supervised
Programming language(s): Python and mwparserfromhell
Source code available: Yes (available at User:InfoboxBot/wikipedia_edit_pages_clean.py). Earlier answer, since superseded: No source code available at this time, sorry. Example for original functionality: User:InfoboxBot/wikipedia_edit_pages_clean.py
Function overview: This bot would assist me in fixing various widespread yet minor issues with non-standard infobox parameters in articles (primarily focused on issues with Template:Infobox power station and possibly Template:Infobox dam).
Links to relevant discussions (where appropriate): I do not believe that this bot would be controversial - any changes made by it are going to be uncontroversial minor changes.
Edit period(s): As needed (it'll vary significantly). It will not be anywhere near continuous.
Estimated number of pages affected: There are ~2500 articles using infobox power station and ~3500 articles using infobox dam. The number of articles out of these that would be affected by my bot is unknown. For now, let's call it an absolute upper limit of ~6000 affected articles.
Namespace(s): Mainspace only.
Exclusion compliant (Yes/No): Yes (added Apr 4 2018). Earlier answer, since superseded: No, as in my experience articles with infobox power station or infobox dam on them never use the bots template in the first place. I am not averse to implementing detection for this template in the future, but I don't see the need for it unless I broaden the scope of the bot's work to different infoboxes.
Function details: I have already scraped all articles with infobox power station and infobox dam in them, placed the infobox data from said articles into a MySQL database, and am using analysis of that dataset/database to discover issues that can be fixed via this approach. Here is a good example of what kind of issues this bot can help me fix:
- For infobox param "th_fuel_primary": There are 153 articles using the term "[[Coal]]", 90 articles using the term "Coal", 80 articles using the term "Coal-fired", and 14 articles using the term "[[Coal]]-fired". This bot can automatically change the value of "th_fuel_primary" to "[[Coal]]" for the 184 articles that use equivalent terms (90 + 80 + 14), resulting in 337 articles that all use the same correct, homogeneous terminology and are all wikilinked correctly.
So yeah, this is essentially just a specialized high-speed-editing/assisted-editing tool. As far as I understand, it is still possibly classified as a bot and thus I have to submit it to BRFA as I am doing now. I did run this on my personal account for a single run (on the infobox param "status" - changing the non-standard value "Active" into "O" (expands to "Operational") for 185 articles) before realizing that it may be classifiable as a bot (and that I was also performing operations too fast if the bot action speed limits applied - I had quite a bit of trouble locating the actual documentation on this so I had initially assumed that it was the same as the API itself and set a 1s + overhead delay between requests) and stopping. So if you want a demonstration of what this bot does in the real world, just look at the long string of commits in my history with the edit summary "Automated edit: fixing infobox parameter "status"".
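The "th_fuel_primary" example above can be sketched as a small lookup. This is a minimal illustration only, not the bot's actual code: the names `COAL_VARIANTS`, `USAGE_COUNTS`, and `normalize_fuel` are hypothetical, and the counts are copied from the figures quoted in the function details.

```python
# Hypothetical names throughout; the counts come from the function details above.
CANONICAL_FUEL = "[[Coal]]"
USAGE_COUNTS = {
    "[[Coal]]": 153,
    "Coal": 90,
    "Coal-fired": 80,
    "[[Coal]]-fired": 14,
}
# Every observed value except the canonical one is treated as an equivalent variant.
COAL_VARIANTS = {value for value in USAGE_COUNTS if value != CANONICAL_FUEL}


def normalize_fuel(value):
    """Return the rewritten parameter value, or None if no edit is needed.

    Whitespace around the value is preserved so the infobox layout is unchanged.
    """
    core = value.strip()
    if core not in COAL_VARIANTS:
        return None
    return value.replace(core, CANONICAL_FUEL, 1)


articles_to_edit = sum(USAGE_COUNTS[v] for v in COAL_VARIANTS)
articles_total = sum(USAGE_COUNTS.values())
```

Returning None for values that are already canonical (or unrelated) lets a caller skip the page without saving an empty edit.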
Discussion
- Could the bot implement some of User:Headbomb/sandbox (expand collapsed sections)? Headbomb {t · c · p · b} 11:07, 24 October 2017 (UTC)
- 1.a/1.c crash my scraping script, so I’ve already manually fixed those in all affected articles using either infobox dam or infobox power station. I can look into building a new script to locate and automatically fix those types of issues in other infoboxes, it would be an interesting problem to try to solve automatically, but no promises on that since it might not be doable automatically with high confidence.
- For the rest, yes, the bot can do at least some of them if not most or all of them (and in fact I was already planning on implementing a number of those items), although it’s going to require additional work to implement them, and my first priority is still going to be fixing the more substantial issues. Garzfoth (talk) 17:36, 25 October 2017 (UTC)
- I would greatly appreciate getting a response to at least the specific question of whether this use is classified as a bot (i.e. does it actually need approval as a standalone bot through BRFA or can I just run it on my personal (or InfoboxBot?) account(s)?)... I have been waiting two and a half weeks for another response and it's getting a bit frustrating. I would prefer to have an account with the bot flag to run it on simply because of the expanded API limits available in that case (and being able to edit without unnecessarily cluttering up anyone's watchlist, since I could then flag my edits as bot-made which allows them to be easily hidden by users if desired), but I do not by any means need the bot flag to operate the program. Garzfoth (talk) 19:58, 11 November 2017 (UTC)
It has been over a month since the last response. I would greatly appreciate a response to at least the question highlighted in bold above (is this use even classifiable as a bot or can I just run this as a script on my personal account without approval required?). Thanks! Garzfoth (talk) 21:17, 26 November 2017 (UTC)
- From the BOTPOL definitions, the fact that you aren't personally approving each edit means that this is probably a bot, and would likely need to be approved here. It shouldn't be controversial, though. Going through the edits you made (convenience link!), the random sample that I picked all look good. It would be nice if you had some examples of the Coal change, as opposed to just the "Active" to "O" change, however. Even better would be if the code were somewhere BAG members and others could review it - you don't even have to put it on GitHub, as it's just as readable in the bot's userspace.
- One important change you should make is the edit frequency: 1 second between edits is too low. For nonessential maintenance tasks, the usual delay is 10 seconds (source: WP:BOTREQUIRE). I'm not a BAG member myself, so I can't grant a trial; so I'll leave the tag here. You should probably fix the rate thing before the trial, though. Enterprisey (talk!) 13:40, 5 December 2017 (UTC)
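The 10-second spacing mentioned above could be enforced with a small throttle around each save. This is a sketch under stated assumptions: `EditThrottle` and its parameters are hypothetical names, and the injectable `clock` and `sleep` arguments exist only so the behaviour can be checked without actually waiting.

```python
import time


class EditThrottle:
    """Enforce a minimum interval between successive edits (hypothetical helper)."""

    def __init__(self, min_interval=10.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous edit, None before the first one

    def wait(self):
        """Block until at least min_interval has passed since the previous call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `throttle.wait()` immediately before each save means time already spent fetching and parsing the page counts toward the interval, so the bot only sleeps for whatever remains.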
- Thanks for the feedback! I am aware of the editing frequency issue (it's specifically mentioned in my BRFA if you missed it), I would of course change that to 10 seconds between edits for a production run, as I said I only operated that fast in the first place because I originally could not locate the correct documentation on bot policies and had assumed that the general API rate limits applied.
- I can't exactly give more precise examples of changes since I apparently wasn't supposed to be running the bot without BRFA approval in the first place, but I suppose I could manually make some example edits to show what the bot would be capable of doing? My main goal originally was just to homogenize a lot of common simple stuff like the coal example, but then I got branched out and started thinking of wider applications, so my application is admittedly a bit open-ended.
- As far as the code goes, I dislike open-sourcing anything I've written for personal use until it's been extensively polished because I keep a lot of debug stuff commented out and don't write my commented notes for a general audience, so it gets more than a bit sloppy/unprofessional and I prefer to only publish very clean code unless absolutely necessary. I guess I could strip the comments entirely and publish it more or less as-is though. I'll think about that.
- I'll leave the tag up until someone from BRFA can drop by to discuss a trial. Garzfoth (talk) 03:35, 11 December 2017 (UTC)
- I've cleaned up and posted the original code used for the Active => O change run: User:InfoboxBot/wikipedia_edit_pages_clean.py Garzfoth (talk) 03:48, 11 December 2017 (UTC)
@Garzfoth: This request has sat for a very long time. I would like to apologize for that.
Minor code review. This line:
tpl = next(x for x in templates if x.startswith("{{Infobox power station") or x.startswith("{{infobox power station") or x.startswith("{{Infobox power plant") or x.startswith("{{infobox power plant") or x.startswith("{{Infobox wind farm") or x.startswith("{{infobox wind farm") or x.startswith("{{Infobox nuclear power station") or x.startswith("{{infobox nuclear power station"))
would look better as:
tpl = next(x for x in templates if x.name.matches(["Infobox power station", "Infobox power plant", "Infobox wind farm", "Infobox nuclear power station"]))
Now, my only real concern here is that certain changes can seem uncontroversial on the surface but are actually not once you do them en-masse. The "Active" to "O" thing is surely fine, but whether or not to wikilink "Coal" is something I could see as contentious. How do you determine what the convention is when the most common option is used by only 45% of articles (153/337, per your numbers)? Arguments could exist either way, and it might depend on the article (maybe).
Anyway, let's do a fairly loose trial to get a sense of the kinds of changes you would like to make and how they pan out. If possible, please do a variety of types of fixes, but if you only have a couple in mind right now, that's fine too. Approved for trial (100 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. - Earwig talk 06:23, 12 December 2017 (UTC)
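To illustrate why a name-based match is more robust than the chain of startswith checks in the code review above, here is a standard-library stand-in. It is not mwparserfromhell's implementation: `normalize_title`, `name_matches`, and `first_matching` are hypothetical helpers, and treating underscores as spaces is an added assumption based on MediaWiki title rules.

```python
TARGET_NAMES = [
    "Infobox power station",
    "Infobox power plant",
    "Infobox wind farm",
    "Infobox nuclear power station",
]


def normalize_title(name):
    """Trim whitespace, treat underscores as spaces, and uppercase the first letter."""
    cleaned = " ".join(name.replace("_", " ").split())
    return cleaned[:1].upper() + cleaned[1:]


def name_matches(name, candidates):
    """Loose equality test: only the first letter's case and whitespace are ignored."""
    wanted = {normalize_title(c) for c in candidates}
    return normalize_title(name) in wanted


def first_matching(template_names, candidates=TARGET_NAMES):
    """Return the first matching template name, or None instead of raising StopIteration."""
    return next((n for n in template_names if name_matches(n, candidates)), None)
```

Unlike a prefix test, an exact comparison after normalization will not accept a longer, unrelated template name that merely begins with one of the targets, and passing a default to `next` avoids an exception on pages without a matching infobox.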
- Thank you for your comments. The code suggestion is extremely helpful, I tested it and subsequently refactored all of my code (including components that have not been published such as the scraping stuff) to incorporate it.
- I have thought extensively about the issue of balancing too-minor/controversial changes with real action for a while now. For wikilinking stuff like that I think it's no contest — a wikilink is almost always going to be justified for stuff like that (especially as the infobox is a separate entity and the MOS makes the provision that repeating links in infoboxes is fine if helpful for the readers). For capitalization issues, it's a messier situation, but I think the best approach is to focus on choosing the option that makes the most grammatical sense (something I've tried to clarify with limited research), fits best within the generalized context of an infobox, adheres to the MOS, is the most visually consistent & pleasing with other infobox elements, and corresponds with the established consensus (I can see how popular each option is while analyzing the DB for variables to work on, so that lets me measure the rough level of consensus for existing options). I'm actually really curious if anyone will object to the capitalization standardization I'm using — if it triggers an objection, I'll of course discuss the issue, and if the discussion results are to use non-capitalization for the standard (or whatever else), I can then use the bot to put the articles in line with the outcome of the discussion instead.
- I started on the trial run. Here are changes done so far:
| IPS parameter name/key/category | Original value | Modified value | # |
|---|---|---|---|
| th_technology | steam | [[Steam turbine]] | 2 |
| th_technology | Steam | [[Steam turbine]] | 17 |
| th_technology | [[gas turbine]] | [[Gas turbine]] | 3 |
| th_technology | [[Gas Turbine]] | [[Gas turbine]] | 3 |
| country | United States | [[United States]] | 5[a] |
| country | England | [[England]] | 5[b] |
| ps_units_manu_model | Siemens | [[Siemens]] | 3 |
| ps_units_manu_model | Vestas | [[Vestas]] | 2 |
| status | Operating | O (expands to Operational) | 5[c] |
| status | operational | O (expands to Operational) | 17 |
| status | Baseload | O (expands to Operational) | 6 |
| status | Peak | O (expands to Operational) | 5 |
| th_fuel_primary | Coal | [[Coal]] | 5[d] |
| th_fuel_primary | Coal-fired | [[Coal]] | 5[e] |
| th_fuel_primary | [[Natural Gas]] | [[Natural gas]] | 5[f] |
| th_fuel_primary | [[natural gas]] | [[Natural gas]] | 5[g] |
| th_fuel_primary | Natural gas | [[Natural gas]] | 5[h] |

Total edits made during initial trial: 98
- [a] There were 257 instances to correct, but due to the 100 edit limit on this trial I edited only 5 as examples. There are a number of other examples in this category that were excluded from the trial; this is just a representative example of an edit within this category.
- [b] There were 105 instances to correct, but due to the 100 edit limit on this trial I edited only 5 as examples. There are a number of other examples in this category that were excluded from the trial; this is just a representative example of an edit within this category.
- [c] There were 38 instances to correct, but due to the 100 edit limit on this trial I edited only 5 as examples. There are a number of other examples in this category that were excluded from the trial; this is just a representative example of an edit within this category.
- [d] There were 88 instances to correct, but due to the 100 edit limit on this trial I edited only 5 as examples. There are a number of other examples in this category that were excluded from the trial; this is just a representative example of an edit within this category.
- [e] There were 72 instances to correct, but due to the 100 edit limit on this trial I edited only 5 as examples. There are a number of other examples in this category that were excluded from the trial; this is just a representative example of an edit within this category.
- [f] There were 27 instances to correct, but due to the 100 edit limit on this trial I edited only 5 as examples. There are a number of other examples in this category that were excluded from the trial; this is just a representative example of an edit within this category.
- [g] There were 24 instances to correct, but due to the 100 edit limit on this trial I edited only 5 as examples. There are a number of other examples in this category that were excluded from the trial; this is just a representative example of an edit within this category.
- [h] There were 23 instances to correct, but due to the 100 edit limit on this trial I edited only 5 as examples. There are a number of other examples in this category that were excluded from the trial; this is just a representative example of an edit within this category.
- During the run only one edit was reverted (this one), with the reason being "editing tests". The editor in question subsequently thanked the bot's account for a different edit, and I'll be replying to their message on the bot's talk page to explain the matter and see what their views on the capitalization change really are (i.e. did they truly intend to revert or did they simply not notice that the edit actually changed something).
- Here is the updated primary bot code, with various improvements made, functionality added, code cleaned up, and most code comments preserved (even the stupid ones): User:InfoboxBot/wikipedia_edit_pages_clean.py
- Thanks again! Garzfoth (talk) 14:02, 15 December 2017 (UTC)
- WP:OVERLINK applies. You should not be linking countries like the U.S. and England. - JJMC89 (T·C) 19:40, 15 December 2017 (UTC)
- That seems fair enough for the specific case of countries. Here's a question: if WP:OVERLINK unambiguously applies to the country field, then would it be justified to edit the infobox to remove all country wikilinks for violating WP:OVERLINK? This would mean for example that all instances of "country = [[United States]]" would be changed to "country = United States", and so on and so forth for all the other countries. Garzfoth (talk) 14:25, 19 December 2017 (UTC)
- Any update on progress for this request? - xaosflux Talk 19:37, 21 March 2018 (UTC)
- I've been wondering that myself! Due to personal issues I haven't been able to spend very much time on Wikipedia over the past few months and never got around to re-flagging this page with the BAGAssistanceNeeded tag, but I have been periodically checking this page for updates, wondering why I never got a response. I think I'll have those personal issues finally mostly resolved within the next week or so and should be able to resume active editing again, at which point I'll re-flag this request with BAGAssistanceNeeded if nobody has responded by then. As far as I'm aware I've completely fulfilled the requirements for a post-trial status update and I don't see any remaining issues that should preclude moving onwards to either an extended trial or the final approval/denial stage. Is there a reason why this request has sat like this for so long? If you guys are actually waiting on something from me to proceed, please tell me what that is! Garzfoth (talk) 17:12, 24 March 2018 (UTC)
- Looks like you never tagged it {{BotTrialComplete}} so it may be in the wrong queue of watchers! - xaosflux Talk 03:12, 26 March 2018 (UTC)
- Oh dear, I didn't even notice that part at all. What a stupid mistake to make on my part! Thanks for pointing that out! Hmm, in the future, maybe it'd be a good idea to directly/explicitly tell people when their bots get approved for a trial, in the same message as the trial approval message, that when the trial is complete they should tag the page with that particular template? Something along the lines of "[bot trial approval template here] — when this trial is complete, please tag this page with {{BotTrialComplete}}"? Just an idea... Anyways, tagging it now...
- Trial complete.
- Garzfoth (talk) 13:57, 26 March 2018 (UTC)
- @Garzfoth: no big deal, as you can see at Wikipedia:Bots/Requests_for_approval - we are struggling with a big backlog on this page right now. - xaosflux Talk 14:01, 26 March 2018 (UTC)
- Got it. I added exclusion compliance to the bot (and in the process of testing, I discovered and fixed a major bug in the example Python code on Template:Bots). Source code has been updated to reflect that change.
- Just to be sure that this is prioritized appropriately in the backlog, I'm going to flag this page with {{BAGAssistanceNeeded}} once again, as I fear otherwise that this will end up getting overlooked given the current size of the backlog (which it is kinda buried in):
- {{BAGAssistanceNeeded}}
- Garzfoth (talk) 21:43, 4 April 2018 (UTC)
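For readers unfamiliar with exclusion compliance, the check amounts to reading the {{bots}} or {{nobots}} template on a page before editing it. The following is a simplified, regex-based sketch and not the bot's code or the example published on Template:Bots: `allow_bots` here handles only the plain allow and deny parameters on the first such template found, and ignores other options and nested templates.

```python
import re

# Matches {{bots}}, {{nobots}}, and simple parameterised forms such as {{bots|deny=X}}.
_BOTS_RE = re.compile(r"\{\{\s*(nobots|bots)\s*(\|[^{}]*)?\}\}", re.IGNORECASE)


def allow_bots(text, user):
    """Return True if `user` may edit a page whose wikitext is `text` (simplified)."""
    user = user.strip().lower()
    match = _BOTS_RE.search(text)
    if match is None:
        return True  # no exclusion template on the page
    name = match.group(1).lower()
    raw_params = (match.group(2) or "").lstrip("|")
    params = {}
    for part in raw_params.split("|"):
        if "=" in part:
            key, _, value = part.partition("=")
            names = [v.strip().lower() for v in value.split(",") if v.strip()]
            params[key.strip().lower()] = names
    if "allow" in params:
        names = params["allow"]
        if names == ["none"]:
            return False
        return user in names or "all" in names  # only listed bots are allowed
    if "deny" in params:
        names = params["deny"]
        if names == ["none"]:
            return True
        return not (user in names or "all" in names)
    return name != "nobots"  # bare {{nobots}} bans every bot; bare {{bots}} allows all
```

A caller would skip the page whenever `allow_bots(page_text, "InfoboxBot")` returns False.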
So um... It's kinda been over two months since the last reply from a BAG editor and nearly two months since I last replied (in which I also flagged this for BAG assistance)... It looks like every single BRFA except for this one (that's currently seven BRFAs — one open, four in trials (one of which is an extended trial), and two in trial complete (one of which has BAG assistance requested)) have received at least one reply from a BAG editor within the last two weeks (with a further six BRFAs having been approved during that time period as well!), and at this point I'm starting to wonder if BAG editors are actively avoiding or ignoring this BRFA for some reason... I'm getting rather tired of the endless waiting... Over the past two days I've completed another round of improvements to the bot's entire stack (the scraper scripts, parser scripts, database structures, and the editing script were all worked on), but it's getting rather hard to maintain much enthusiasm and interest in continuing to improve a project that I started over 7 months ago, especially so when it comes to the editing functionality as that's something that I haven't been able to actually use for its intended purpose at all for more than 5.5 months now and after seven freaking months I still don't know if this BRFA application will ultimately be approved or rejected (which makes me particularly leery of continuing to spend further time on improving the editing script in particular — the rest of the project is still quite helpful on its own for assisting with manual editing in various ways, but the editing script (which is massively more efficient than manual editing could ever be for the many issues where using it is appropriate) is pretty much useless if its usage is ultimately not approved). Anyways... 
Hopefully you guys will finally have some time to get around to processing this BRFA at some point in the next week or two now that every BRFA except for this one has received BAG attention within the last two weeks... Garzfoth (talk) 01:40, 3 June 2018 (UTC)
- @Garzfoth: apologies for the long delay here - the scope of "any infobox" is quite wide, would you be OK reducing the scope to either a specific list, or even a grouping such as "infoboxes in Category:Template-Class energy articles"? - xaosflux Talk 13:41, 11 June 2018 (UTC)
- @Xaosflux: As I said in my initial application (apparently not very clearly — but given all the edits and discussion over such a long period of time, it's understandable that some confusion may have arisen), I was only aiming to be approved for the initial scope of the entirety of Template:Infobox power station + Template:Infobox dam, which as of June 2 2018 (when my last full scrape of targeted infoboxes on enwiki took place) was 5856 distinct articles containing 5904 unique in-scope infoboxes. Any further extension(s) of the bot's scope beyond those two infoboxes would of course require additional BRFA(s).
- The scope Category:Template-Class energy articles is actually substantially outside of the requested scope because I am only looking at pages within the article namespace for these two infoboxes, and restricting myself to something like "only C-class energy articles" (or whatever) makes absolutely no sense whatsoever. Perhaps it would help if I recreated the queries I run to get a list of articles containing all in-scope infoboxes in the API sandbox? Here's the two initial queries recreated in the API sandbox:
- https://en.wikipedia.org/wiki/Special:ApiSandbox#action=query&format=json&list=embeddedin&eititle=Template%3AInfobox_power_station&einamespace=0&eilimit=max + https://en.wikipedia.org/wiki/Special:ApiSandbox#action=query&format=json&list=embeddedin&eititle=Template%3AInfobox_dam&einamespace=0&eilimit=max
- My scripts of course also send a unique descriptive user agent and iterate over the full set of results with subsequent queries containing the appropriate additional continue+eicontinue query param values, and I split the task of scraping+parsing+saving each result into entirely different scripts altogether. Also, with these two particular infoboxes I definitely only want to target the article namespace, hence the einamespace=0 argument.
- So to be absolutely clear, my initial approval request here pertains only to the pages within the article namespace containing one or more usages of the templates "Infobox power station" and/or "Infobox dam", which is currently fewer than six thousand unique pages/articles in total (and even after accounting for pages with multiple infoboxes that is still fewer than six thousand unique infoboxes). That is the only scope I am requesting approval to operate this bot within at the moment. I have already extensively explained the kind of changes I want to use this bot to make within this scope. A 100-edit trial has already been approved and carried out within this scope, and I have summarized the results of said trial in detail. If and when the bot's scope merits any further expansion beyond these two initial templates, I will of course submit an additional BRFA for this bot regarding the potential expansion of scope. Is this satisfactory and does it answer your remaining questions adequately? Garzfoth (talk) 16:01, 11 June 2018 (UTC)
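The continuation loop described above (repeating the embeddedin query with the returned continue and eicontinue values) can be sketched as follows. This is an illustration under stated assumptions rather than the operator's scraper: `iter_embeddedin` is a hypothetical name, and `fetch` stands for any caller-supplied function that sends the given parameters to the MediaWiki API with a descriptive user agent and returns the decoded JSON.

```python
def iter_embeddedin(fetch, template_title, namespace=0):
    """Yield titles of pages transcluding `template_title`, following continuation.

    `fetch` is a caller-supplied callable: dict of query params -> decoded JSON dict.
    """
    params = {
        "action": "query",
        "format": "json",
        "list": "embeddedin",
        "eititle": template_title,
        "einamespace": str(namespace),  # 0 restricts results to the article namespace
        "eilimit": "max",
    }
    cont = {}
    while True:
        data = fetch({**params, **cont})
        for page in data.get("query", {}).get("embeddedin", []):
            yield page["title"]
        if "continue" not in data:
            break  # the last batch carries no continuation block
        cont = data["continue"]  # holds both "continue" and "eicontinue"
```

Keeping the HTTP call outside the generator makes it straightforward to add the user agent, error handling, and request pacing in one place.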
- Approved. Note scope is limited to edits related to Template:Infobox power station and/or Template:Infobox dam, broadly construed. - xaosflux Talk 16:57, 11 June 2018 (UTC)
- The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.