Wikipedia:Bot requests

From Wikipedia, the free encyclopedia

Removing date headers
:: No time for long explanations, but mods to Scsbot for this purpose are unlikely, so do carry on with Plan B. —[[User:scs|Steve Summit]] ([[User talk:scs|talk]]) 03:59, 22 August 2018 (UTC)
:::{{BOTREQ|coding}} Thanks, Steve. [[User:Ronhjones|<b style="border:1px solid #dfdfdf;color:green; padding:1px 3px;background:#FFD">Ron<span style="color:red">h</span>jones&nbsp;</b>]]<sup>[[User talk:Ronhjones|&nbsp;(Talk)]]</sup> 16:28, 22 August 2018 (UTC)

Vandalism from user:194.199.4.202

Hello, this anonymous user is changing verified articles left and right. This is vandalism.

User:194.199.4.202


This is a page for requesting tasks to be done by bots per the bot policy. This is an appropriate place to put ideas for uncontroversial bot tasks, to get early feedback on ideas for bot tasks (controversial or not), and to seek bot operators for bot tasks. Consensus-building discussions requiring large community input (such as request for comments) should normally be held at WP:VPPROP or other relevant pages (such as a WikiProject's talk page).

You can check the "Commonly Requested Bots" box above to see if a suitable bot already exists for the task you have in mind. If you have a question about a particular bot, contact the bot operator directly via their talk page or the bot's talk page. If a bot is acting improperly, follow the guidance outlined in WP:BOTISSUE. For broader issues and general discussion about bots, see the bot noticeboard.

Before making a request, please see the list of frequently denied bots, either because they are too complicated to program, or do not have consensus from the Wikipedia community. If you are requesting that a template (such as a WikiProject banner) is added to all pages in a particular category, please be careful to check the category tree for any unwanted subcategories. It is best to give a complete list of categories that should be worked through individually, rather than one category to be analyzed recursively (see example difference).


Note to bot operators: The {{BOTREQ}} template can be used to give common responses, and make it easier to keep track of the task's current status. If you complete a request, note that you did with {{BOTREQ|done}}, and archive the request after a few days (WP:1CA is useful here).


Please add your bot requests to the bottom of this page.
Make a new request
# | Bot request | Status | 💬 | 👥 | 🙋 Last editor | 🕒 (UTC) | 🤖 Last botop editor | 🕒 (UTC)
1 | Bot to remove template from articles it doesn't belong on? | | 4 | 4 | Wikiwerner | 2024-09-28 17:28 | Primefac | 2024-07-24 20:15
2 | Removing redundant FURs on file pages | | 5 | 3 | Wikiwerner | 2024-09-28 17:28 | Anomie | 2024-08-09 14:15
3 | de-AMP bot | BRFA filed | 13 | 7 | Usernamekiran | 2024-09-24 16:04 | Usernamekiran | 2024-09-24 16:04
4 | QIDs in Infobox person/Wikidata | BRFA filed | 11 | 4 | Tom.Reding | 2024-10-06 14:23 | Tom.Reding | 2024-10-06 14:23
5 | Remove outdated "Image requested" templates | | 3 | 2 | 7804j | 2024-09-21 11:26 | DreamRimmer | 2024-09-19 18:53
6 | "Was" in TV articles | | 6 | 4 | Pigsonthewing | 2024-11-11 12:30 | Primefac | 2024-09-29 19:34
7 | Films by director | done | 9 | 4 | Usernamekiran | 2024-10-03 13:30 | Usernamekiran | 2024-10-03 13:30
8 | altering certain tags on protected pages? | | 10 | 5 | Primefac | 2024-10-20 14:47 | Primefac | 2024-10-20 14:47
9 | Request for Bot to Remove ARWU_NU Parameter from Articles Using Infobox US University Ranking Template | | 4 | 2 | Primefac | 2024-10-13 12:50 | Primefac | 2024-10-13 12:50
10 | Removal of two external link templates per TfD result | | 6 | 4 | Primefac | 2024-10-14 13:48 | Primefac | 2024-10-14 13:48
11 | Replace merged WikiProject template with parent project + parameter | Done | 7 | 3 | Primefac | 2024-10-21 10:04 | Primefac | 2024-10-21 10:04
12 | Bot Request to Add Vezina Trophy Winners Navbox to Relevant Player Pages | | 3 | 3 | Primefac | 2024-10-19 12:23 | Primefac | 2024-10-19 12:23
13 | Replace standalone BLP templates | Done | 7 | 3 | MSGJ | 2024-10-30 19:37 | Tom.Reding | 2024-10-29 16:04
14 | Assess set index and WikiProject Lists based on category as lists | | 19 | 5 | Mrfoogles | 2024-11-06 16:17 | Tom.Reding | 2024-11-02 15:53
15 | Request for WP:SCRIPTREQ | | 1 | 1 | StefanSurrealsSummon | 2024-11-08 18:27 | |
16 | LLM summary for laypersons to talk pages of overly technical articles? | | 10 | 7 | Legoktm | 2024-11-12 17:50 | Legoktm | 2024-11-12 17:50
17 | Redirects with curly apostrophes | | 6 | 5 | Pppery | 2024-11-11 17:30 | Primefac | 2024-11-11 16:52
18 | Bot for replacing/archiving 13,000 dead citations for New Zealand charts | | 3 | 2 | Muhandes | 2024-11-14 22:49 | Muhandes | 2024-11-14 22:49
Legend
  • In the last hour
  • In the last day
  • In the last week
  • In the last month
  • More than one month
Manual settings: when exceptions occur, please check the setting first.


Take over GAN functions from Legobot

Legobot is an enormously useful bot that performs some critical functions for GAN (among other things). Legoktm, the operator, is no longer able to respond to feature requests, and is not very active; they've asked in the past if someone would be willing to take over the code. I gather from that link that the code is PHP; see here [1]. There would be a lot of grateful people at GAN if we could start addressing a couple of the feature requests, and if we had an operator who was able to spend more time on the bot. This is not to criticize Legoktm at all -- without their work, GAN could not function; Legobot is a core part of GAN functionality.

I left a note on Legoktm's talk page asking if they would mind a request here for a new operator, and Redrose64 responded there with a link to the note I posted above, so I think it's clear they'd be glad for someone else to pick this up. Any takers? Mike Christie (talk - contribs - library) 23:10, 6 February 2018 (UTC)[reply]

I've heard from Legoktm and they would indeed be glad to have someone else take this over. If you're capable in PHP, this is your chance to operate a bot that's critical to a very active community. Mike Christie (talk - contribs - library) 00:21, 8 February 2018 (UTC)[reply]
I would like to comment that it would be good to expand the functionalities of the bot for increased automation, like automatically adding to the GA lists. Perhaps it would be better to rewrite the bot in a different language? I think Legoktm has tried to get people to take over the PHP for a while with no success. Kees08 (Talk) 04:44, 8 February 2018 (UTC)[reply]
The problem with adding to the GA lists is knowing which one. There is no indication on the GAN as to where. All we have is the topic. Even the humans have trouble with this. Hawkeye7 (discuss) 20:20, 16 February 2018 (UTC)[reply]
To correct for the past, we could add a parameter to the GA template for the 'subtopic' or whatever we want to call that grouping. A bot could go through the current listing and then add that parameter to the GA template. Then, when nominating, that could be in the template, and the bot could carry that through all the way to automatically adding it to the GA page at the end. Kees08 (Talk) 20:23, 16 February 2018 (UTC)[reply]
Nominators would need to know those tiny divisions within the subtopics; as it's not something we have on the WT:GAN page, I doubt most are even aware of the sub-subtopics. Even regular subtopics are sometimes too much for nominators, who end up leaving that field blank when creating their nominations. BlueMoonset (talk) 22:15, 26 February 2018 (UTC)[reply]

@Hawkeye7: For what it is worth, due to your bot's interactions with FAC, I think it would be best if you took over the GA bot as well. I think at this point it is better to just write a new bot than salvage the old bot; no one seems to want to work on salvaging. Kees08 (Talk) 21:59, 26 February 2018 (UTC)[reply]

We'd need to come up with a full list of functionality for whoever takes this on, not only what we have now but what we're looking for and where the border conditions are. BlueMoonset (talk) 22:15, 26 February 2018 (UTC)[reply]

I might be interested in lending a hand. A features list and functionality details (as mentioned by BlueMoonset) would be nice to affirm that decision though. I shall actively watch this thread. --TheSandDoctor (talk) 21:30, 11 March 2018 (UTC)[reply]

Okay, I will attempt to list the features, please modify as needed:

  • Place notifications on the nominator's talk page when their nomination is onreview diff, onhold diff, passed diff, failed
  • Update GAN page when status of a review changes (new, on hold, on review, passed, failed, also number of reviews editors have performed) diff
  • Update the stats page (related to the last bullet point, this is where the stats are stored) diff
  • Transcludes GA review on article talk page diff
  • Adds GA icon to articles that pass diff
  • Adds the oldid parameter to the GA template diff

@BlueMoonset: Are you aware of other functions? Looking through the bot's edit history and going off of what I know of the bot, this is what I came up with. Kees08 (Talk) 22:10, 11 March 2018 (UTC)[reply]
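For whoever does the port, a minimal Pywikibot sketch of the first bullet (the talk-page notification) may help frame the discussion. Legobot itself is PHP and its real message text isn't shown here, so the section title, wording, and function name below are all placeholders:

```python
import pywikibot

MESSAGES = {
    'onreview': 'is now under review',
    'onhold': 'has been placed on hold',
    'passed': 'has passed',
    'failed': 'has failed',
}

def notify_nominator(nominator, article, status):
    """Append a GA status notice to the nominator's talk page."""
    site = pywikibot.Site('en', 'wikipedia')
    talk = pywikibot.Page(site, 'User talk:' + nominator)
    talk.text += (
        '\n\n== GA review of %s ==\n'
        'The [[WP:GAN|good article nomination]] of [[%s]] %s. ~~~~'
        % (article, article, MESSAGES[status])
    )
    talk.save(summary='GA nomination status update for ' + article)

# notify_nominator('ExampleUser', 'Example article', 'passed')
```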

Thanks Kees08. Does anyone know if it would be possible to take a look at the database structure? --TheSandDoctor (talk) 22:28, 11 March 2018 (UTC)[reply]
@Legoktm: Are you able to answer their question? Thanks! Kees08 (Talk) 23:43, 11 March 2018 (UTC)[reply]
TheSandDoctor, it's great that you're interested in this. Kees08, the second item (about updating the GAN page) is much broader. I don't know whether the bot simply updates the GAN page or generates/recreates the contents of all sections on it. It's basically dealing with all the GA nominee templates out there—which indicate what is currently a nominated article that belongs on the GAN page, and the changes to that page. If an entry wasn't on the page last time but is this time, then it's considered new; if it was on last time but a review page has appeared for it, then it's considered under review and the review page is parsed for reviewer information (but if the GA nominee template instead says "status=onhold", then it's noted as being on hold)... there's a lot involved, including cross-checking, and any field in the GA nominee template, including page number, subtopic, status, and note, can change at any time. If the GA nominee template has disappeared and a GA template is there for that same "page" number, then it has passed; if a FailedGA is there for that same "page" number, then it has failed (but the current bot's code doesn't check this properly, so any FailedGA template on the talk page results in the "failed" message being sent to the nominator even if the nomination was just passed with a new GA template showing). Sometimes review pages disappear when they were created illegally or by mistake and are speedy deleted, and the bot realizes their absence and updates the GAN page accordingly, so it's a comprehensive check each time the bot runs (currently every 20 minutes). If the bot doesn't know how to characterize the change it has found, it appears under an edit summary of "Maintenance": status changes to 2ndopinion go here, as do passes and failures where there was something wrong with the article talk page according to its lights. For example, it views with suspicion any talk page of a nomination under review that doesn't have a transcluded review on it, so it doesn't send out pass or fail messages for them (and maybe not even hold messages; I've never checked that).
There's a difference here between features and functionality. I think the features (with the exception of the 2ndopinion status and the display of anything in the "notes" field of GA nominee) have been listed here. The actual functions—how it needs to work and what it needs to check—are harder to break down. One thing that was mentioned above is the use of subtopics: we have been unable to add new subtopics for several years now, so new subtopics on the GA page are not yet available on the GAN page. I'm not sure how the bot gets its list of subtopics—I've found more than one possible page where they could be read from, but there may be a database for subtopics and the topics they come under that actually controls them, with the pages I've found being a place for some templates, like GA, FailedGA, and Article history, to figure out what subtopics translate to which topics, and which subtopics are legitimate. GA nominee templates that have invalid subtopics or missing status or note fields (or other glitches) can cause the bot to try every 20 minutes to enter or update a nomination/review and fail to do so; there are times when a transaction is listed dozens of times, one bot run after another, as the GAN edit summary because it needs to happen, but it ultimately doesn't (until someone sees the problem and fixes the problematic GA nominee template or GA review page). I'm hoping any new bot will be smarter about how to handle these (and many other) situations, and maybe there will be an accessible error log to aid us in determining what's wrong. BlueMoonset (talk) 00:55, 12 March 2018 (UTC)[reply]
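To make that prose concrete, here is a compressed sketch of the decision logic described above. It is not Legobot's actual code; the function name is made up, and `was_listed`/`review_exists` stand in for whatever the bot's database records about the previous run:

```python
import re

def classify(talk_text, page_num, was_listed, review_exists):
    """Decide how a nomination changed since the last bot run."""
    nominee = re.search(r'\{\{GA nominee[^{}]*\}\}', talk_text, re.IGNORECASE)
    if nominee:
        if not was_listed:
            return 'new'
        if re.search(r'status\s*=\s*onhold', nominee.group(0)):
            return 'onhold'
        if review_exists:
            return 'onreview'
        return 'unchanged'
    # GA nominee is gone: decide pass/fail from what replaced it, matching on
    # the same |page= number to avoid misfiring on a FailedGA left over from
    # an earlier nomination (the bug described above).
    if re.search(r'\{\{GA[\s|][^{}]*page\s*=\s*%d\b' % page_num, talk_text):
        return 'passed'
    if re.search(r'\{\{FailedGA[^{}]*page\s*=\s*%d\b' % page_num, talk_text):
        return 'failed'
    return 'maintenance'
```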
Yeah there is a lot in the second bullet point I did not include diffs for, on account of me being lazy. I will try to do that tonight maybe. I tried to limit what I said to the current functionality of the bot and not include my wishlist of new things, including revamping how subtopics are done. There was an error log at some point in time (located here), not sure when we stopped using that, and if it was on purpose or not. Kees08 (Talk) 01:18, 12 March 2018 (UTC)[reply]
@TheSandDoctor: Just giving you a ping in case this slipped off your radar. Kees08 (Talk) 07:48, 20 March 2018 (UTC)[reply]
Thanks for the ping Kees08. I had not forgotten, but was waiting for other responses. I am still interested (and might be able to port it to Python), we just need to get Legoktm involved in the discussion. --TheSandDoctor Talk 15:31, 20 March 2018 (UTC)[reply]
Kees08 BlueMoonset I have started to (somewhat) work on a Python port of the GAN task. There are some libraries that can be taken advantage of to (hopefully) reduce the number of lines and simplify it, etc. --TheSandDoctor Talk 22:49, 20 March 2018 (UTC)[reply]
That's great, TheSandDoctor. I'm very happy you're taking this one on. There are some places where the current code doesn't do what it ought. Here are a few that I've noticed:
  • As mentioned above, even if the review has just concluded with the article being listed as a GA, if the article talk page also has a FailedGA template on it from a prior nomination that was not successful, the bot will send out a "Failed" message rather than a "Passed" message.
  • If a subtopic isn't capitalized exactly right, the nomination is not added to the GAN page even though the edit summary claims it is; for example, the subtopic "songs" isn't written as "Songs", which prevents the nomination from being added to the page until it is fixed.
  • If a GA nominee template is missing the status and/or note fields, a new review is not added to the template, even though it is (ostensibly) added to the GAN page. One example: Abdul Hamid (soldier) was opened for review and appeared on the GAN page as under review, but in actuality, the review page was transcluded but the GA nominee status was not updated because the GA nominee template was missing the "note" field; only after that was manually added did the bot add the "onreview" status. It would make so much more sense for the bot to add the missing field(s) to GA nominee and proceed with adding the status to the template (and the transclusion of the review page on the talk page), instead of leaving its process step incomplete.
  • When an editor opens a GA review, the bot will increment the number of reviews they have, and it will adjust this number on all nominations and reviews that editor has open. Unfortunately, not only does it produce an edit summary that lists the new review, it also includes those other reviews in the edit summary because of that incremented number, when nothing new has happened to the other reviews. This was a problem before, and it's gotten much worse now that edit summaries can be 1024 characters rather than 128 or 256. For example, when Iazyges opened a GA review of Jim Bakker, the edit summary overflowed the 1024 characters, and it shouldn't have; the Bakker review was the only one that should have been listed for Iazyges.
I'm sure there are others; I'll try to think of them and let you know. Thanks again for taking this on. BlueMoonset (talk) 04:52, 21 March 2018 (UTC)[reply]
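On the second bullet, the capitalization problem at least looks cheap to guard against. A sketch, with an illustrative canonical list (the real one would come from wherever the bot reads its subtopics):

```python
# Tolerant subtopic lookup: accept 'songs', ' Songs ', 'SONGS', etc.
CANONICAL = ['Songs', 'Albums', 'Warfare', 'Sports and recreation']  # illustrative only

BY_LOWER = {s.lower(): s for s in CANONICAL}

def canonical_subtopic(raw):
    # Returns the proper form, or None so the entry can be flagged for a human
    return BY_LOWER.get(raw.strip().lower())

assert canonical_subtopic(' songs ') == 'Songs'
assert canonical_subtopic('nonsense') is None
```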
@BlueMoonset: Thanks! At the moment I am just trying to get the current code ported, but once I am confident that it should work, I will see about the rest. (The main issue, of course, is that I cannot actually test/run the ported script; it isn't ready for that stage yet, but once it is, the most I could do would be to output diffs to text files instead of saving, as I don't have bot access etc.; Lego needs to be a part of these discussions at some point as they involve their bot.) --TheSandDoctor Talk 05:17, 21 March 2018 (UTC)[reply]
@BlueMoonset:@Kees08: I have emailed Legoktm requesting a glimpse at the database structure. --TheSandDoctor Talk 16:00, 27 March 2018 (UTC)[reply]
TheSandDoctor, that's excellent news. I hope you hear back soon. Incidentally, I noticed that Template:GA/Subtopic was modified by Chris G, who was the GAN bot owner (then called GAbot) prior to Legoktm, back when the Warfare subtopic "War and military" was changed to "Warfare", so I imagine this is one of the files that might need to be updated if/when the longstanding requests to update/expand the subtopics at GAN, to break up some of the single-subtopic topics (something that's already been done at WP:GA), are finally acted on. In particular, the Warfare topic/subtopic and the Sports and recreation topic/subtopic have been on our wishlist for several years, but Legoktm never responded to multiple requests; the last changes we had were under Chris G before his retirement in 2013. I don't know whether Template:GA/Topic is involved; there are also the underlying Module:Good article topics and the data it loads at Module:Good article topics/data, which would need to be updated when topics and/or subtopics are revised or added to. BlueMoonset (talk) 23:26, 28 March 2018 (UTC)[reply]
Hi there BlueMoonset, I was waiting to hopefully hear from Lego, but have not. Exams have delayed my progress in this (and will continue to do so until next week), but unfortunately, even when I have the bot converted, there is no guarantee it would work (at first) as I don't have a way to test it, nor do I have access to the existing database etc. I could probably figure out what the database looks like from the code, but the information contained within would be very useful (especially to get it up and running). It is still unclear if I would gain access to Legobot or have to make a "Legobot 2" (or similar). (cc Kees08) --TheSandDoctor Talk 01:05, 20 April 2018 (UTC)[reply]
TheSandDoctor, I don't know what you can do at this point, aside from pinging Legoktm's email account again. I know that Legoktm would like to give over the responsibility for this code, but doesn't seem to be around Wikipedia enough any more to give the time necessary to help achieve such a transition. With luck, one of these pings will eventually catch them when they have time and energy to make it happen. I do hope you hear something soon. BlueMoonset (talk) 03:42, 20 April 2018 (UTC)[reply]
@TheSandDoctor: What database are you looking for? This may be a dumb question..but we identified where the # of reviews/user list was, and the GA database is likely just from the GA page itself. Is there another database you are looking for? Kees08 (Talk) 00:04, 22 April 2018 (UTC)[reply]
Hi there and sorry for the delay in my response Kees08, Legobot uses its own database to keep track of the page states (to know if they have changed). Having access, or at least an outline of the structure, would speed things up somewhat, as I would not have to regenerate the database and could have it clarified what exactly is stored about the pages etc. It is not a necessity, but would be a nice convenience, especially if I am to take over the bot's functions and maintenance, to have access to its database (or at least a "snapshot" of its structure). As for further development on the translation to Python, once finals are wrapped up (by Tuesday PST), I should hopefully have more time to dedicate to working on it. In the meantime, I have an important final in between me and programming. I shall keep everyone updated here. I still foresee an issue with verifying that the bot works as expected, though, due to the lack of available testing and a bot account to run it on. Things will sort themselves out in the next while, I am sure. Minus editing, I could always check if it "compiles"/runs, and could probably work in a dry-run framework similar to my other projects (where they go through the motions without making actual edits, printing to local text file(s) instead). --TheSandDoctor Talk 05:44, 22 April 2018 (UTC)[reply]
Sounds good; no rush, just seeing if I can help you hit the ground running when you get to it. Perhaps DatGuys's config structure would help you figure out a way to do dry runs; mildly similar, you would just have to make up some pages and a database structure, to get the best dry run that is possible prior to hitting the real articles. Best of luck on your finals, and if it makes you feel any better, you will still wake up in cold sweats about them several years in the future (note to dreaming self: no, I have no finals. No, it does not matter you did not study.). Kees08 (Talk) 06:21, 22 April 2018 (UTC)[reply]
Not sure how this is going, but I have found User:GA bot/Stats to be inaccurate. It simply needs to list the number of pages created by each editor with "/GA" in the title. Most editors have fewer listed than they have done. It might be easier to look into this while the bot is being redone. AIRcorn (talk) 22:13, 11 May 2018 (UTC)[reply]
Can't we get the database structure from the code? Enterprisey (talk!) 04:25, 13 June 2018 (UTC)[reply]
I committed the current database structure. Let me know if you want dumps too. Legoktm (talk) 07:04, 13 June 2018 (UTC)[reply]
@Enterprisey and TheSandDoctor: Ping, in case you missed Lego's comment like I did. Thanks lego! Kees08 (Talk) 02:58, 25 June 2018 (UTC)[reply]
I totally missed that. Thank you so much Legoktm (and Kees08 for the ping)! A dump would be probably useful, though not entirely necessary. --TheSandDoctor Talk 03:55, 25 June 2018 (UTC)[reply]

Cyberbot I Book report updates

The Cyberbot I (talk · contribs) bot used to update all the book reports but stopped in January 2018. It seems its owner is too caught up IRL to fix this. Can anyone help by checking the bot or the code? —IB [ Poke ] 16:13, 5 May 2018 (UTC)[reply]

You need to check with User:cyberpower678 - see Wikipedia:Bots/Requests for approval/Cyberbot I 5 - User:cyberbot I says it's enabled. There is no published code. Ronhjones  (Talk) 20:53, 14 May 2018 (UTC)[reply]
@Ronhjones: I have tried contacting Cyberpower a number of times, but he/she does not look into it anymore. Although the bot is listed as active for book status, it has stopped updating it. So somewhere it is skipping the update somehow. —IB [ Poke ] 14:30, 21 May 2018 (UTC)[reply]
@IndianBio: Sadly the original request Wikipedia:Bots/Requests for approval/NoomBot 2 has "Source code available: On request", so there is no working link to any source code. If User:cyberpower678 cannot fix the current system, then maybe the only option is to write a new bot from scratch. I see user:Headbomb was involved in the original BRFA, maybe he might have some ideas? I can think about a re-write if there's no alternative - I will need a bit more info on what the bot is expected to do. Ronhjones  (Talk) 14:57, 21 May 2018 (UTC)[reply]
The modification likely isn't very big, and User:Cyberpower678 likely has the source code. The issue most likely consists of finding what makes the bot crash/not perform, and probably updating a few API calls or something hardcoded into the bot (like a category). Headbomb {t · c · p · b} 15:01, 21 May 2018 (UTC)[reply]
Yes, I have the source, but it was modified as needed to keep it operational over time. @Headbomb: If you email me, I can email you a current copy of the source to look at.—CYBERPOWER (Chat) 15:58, 21 May 2018 (UTC)[reply]
I suppose I could, but I'm a really shit coder. Is there a reason to not make the source public? Headbomb {t · c · p · b} 16:35, 21 May 2018 (UTC)[reply]
It actually is.—CYBERPOWER (Chat) 17:03, 21 May 2018 (UTC)[reply]
@Headbomb: will you take a look at the code? I'm sorry I really don't understand the link which Cyberpower has given. I only code in Mainframe lol, but let me know what seems to be the issue. —IB [ Poke ] 08:04, 22 May 2018 (UTC)[reply]
Like I said, I'm a shit coder. This looks to be in PHP so presumably anyone that knows PHP could take over the book reports. Headbomb {t · c · p · b} 13:09, 22 May 2018 (UTC)[reply]
Someone only needs to file a pull request, and I will deploy it.—CYBERPOWER (Chat) 13:42, 22 May 2018 (UTC)[reply]
I can have a look - I'm not a PHP expert by any means (I prefer Python! ;) ) but I've used it extensively in a past life. Richard0612 19:31, 22 May 2018 (UTC)[reply]
Richard0612, that will be a real help if you can do it. A lot of books are lagging in their updates. —IB [ Poke ] 12:32, 23 May 2018 (UTC)[reply]

(→) Hey @Richard0612: was wondering did you get a chance to look into the code base? —IB [ Poke ] 09:17, 4 June 2018 (UTC)[reply]

Not getting any response, so pinging @Cyberpower678: what can be done? —IB [ Poke ] 06:32, 10 June 2018 (UTC)[reply]
@Richard0612: sorry can we have any update on this? —IB [ Poke ] 15:10, 26 June 2018 (UTC)[reply]

I just happened to stumble across this discussion while visiting for something else, but I can say that the book report updates have resumed as of 28 June. --RL0919 (talk) 18:50, 2 July 2018 (UTC)[reply]

Alexa rankings / Internet Archive

This isn't really a bot request, in the sense that this doesn't directly have anything to do with the English Wikipedia and no pages will be edited (no BRFA is required), but I'm putting it here nonetheless because I don't know of a better place and it's used 500% more than Wikidata's bot requests page. However, it will benefit both Wikidata and Wikipedia.

I have been archiving (with wget and a list of URLs) a lot of alexa.com pages onto the Internet Archive and archive.is, currently about 75,000 daily (all the same pages). This was originally supposed to be for Wikidata and would have been done once a month on a lot more URLs, but that hasn't materialized. Unfortunately maintaining this automatically would be beyond my rudimentary shell script skills, and to run it as I am doing currently would require time which I do not have.

Originally d:User:Alexabot did this based on some URLs from Wikidata, but the operator seems to have vanished after being harangued on Wikidata's project chat because he added the data to items which were not primarily websites. It follows that in the absence of an established process to add values for the property to Wikidata, the archiving should be done separately, with the data to be harvested where needed. Module:Alexa was to have been used with the data, but the bot only completed three runs so it would be outdated at best, and the Wikidata RFC might end up restricting its use.

Could someone set their Unix-based computer, and/or their bit of the WMF cloud servers, to

  • once a day, archive (to the Internet Archive) and/or download several lists of domain names (e.g. those used on Wikipedia and Wikidata; from CSV files which are sitting on my computer; lists of the top 1 million websites) and combine the lists
  • format those domain names with the regular expression below
  • once a month (those below about ~100,000 in rank) or daily/weekly (those ~100,000 and above), archive (to the Internet Archive or archive.is) all of the URLs (collected on a given day) between 17:10 UTC and 16:10 UTC the day after (Alexa seems to refresh data erratically between 16:20 and 17:00 each day, independent of daylight saving time)
    • wget allows archival of lists of websites; use -i /path/to/file and -o /path/to/file flags for archival and logging respectively
  • possibly, as an unrelated process, download the archived pages using URL format https://web.archive.org/web/YYYYMMDD054000/https://www.alexa.com/siteinfo/url.website (where YYYYMMDD is some date) and then harvest the data (Unix shell script regular expressions are almost entirely sufficient)
    • alternatively, just download directly from alexa.com around the same time (see below)
https://web.archive.org/save/https://www.alexa.com/siteinfo/$1
https://web.archive.org/save/https://traffic.alexa.com/graph?o=lt\&y=t\&b=ffffff\&n=666666\&f=999999\&p=4e8cff\&r=1y\&t=2\&z=30\&c=1\&h=150\&w=340\&u=$1
https://web.archive.org/save/https://traffic.alexa.com/graph?o=lt\&y=q\&b=ffffff\&n=666666\&f=999999\&p=4e8cff\&r=1y\&t=2\&z=0\&c=1\&h=150\&w=340\&u=$1
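For anyone picking this up, a rough Python equivalent of the wget loop, assuming a plain-text file of domain names. The save endpoint is the one given above; the one-second pacing is only a guess at what stays under the per-IP limits described in the caveats below:

```python
import time
import requests

SAVE = 'https://web.archive.org/save/https://www.alexa.com/siteinfo/{}'

def archive_domains(list_path):
    with open(list_path) as f:
        domains = [line.strip() for line in f if line.strip()]
    for domain in domains:
        try:
            r = requests.get(SAVE.format(domain), timeout=60)
            print(domain, r.status_code)
        except requests.RequestException as exc:
            print(domain, 'failed:', exc)
        time.sleep(1)  # pace requests; see the per-IP limits below

# archive_domains('domains.txt')
```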

Caveats:

  • The Wikidata property currently uses the URL access date as the point in time, instead of the date that the data was collected (one day and 17 hours before a given UTC time), and does not require an archive URL or even a date. This might be fine for Google's Wikidata item since it will be number 1 until the end of time, but for everything else it will need to be fixed at some point
  • If you don't archive the graph images at the same time (or you archive pages too quickly), alexa.com will start throttling connections from the Internet Archive and you will be unable to archive /siteinfo/* for about a week
  • web.archive.org does not allow a large number of incoming connections per IP for either upload or download (only tested with IPv4 addresses – might be better with multiple IPv6 addresses), so you may want to get around this somehow. I have been funneling connections through Tor simply because it seemed easier to configure torsocks, but this is not ideal
  • Given the connection limit, it is only possible to archive about 100,000 pages and 200,000 graphs per day per IP address (and there might be another limit on alexa.com, which I haven't tried testing)
  • You can use wget's --spider and --max-redirect flags to avoid downloading content
  • Rankings below a certain point (maybe 1 million) are probably not very useful, since the rate of change is high. The best way to check this – which I haven't tried, because it only just occurred to me – is probably to download pages straight from alexa.com while the data is being archived, and check website rankings that way.
  • Some URLs are inexplicably blocked from archival on the Wayback Machine. Those are …/facebook.com, …/blogger.com and …/camelcamelcamel.com (there may be others but I haven't found any more). archive.is (which archives page requisites server-side) seems to block repeated daily archival after a certain point but you can avoid this by using URLs which redirect to the content to be archived
    • archive.is isn't supposed to be scriptable, but I did it anyway with a Lynx script
  • Some websites inexplicably disappear from the rankings from day to day, so don't stop archiving websites simply because their ranking disappears

If you want I can send you the CSV files of archive links that I've accumulated (email me and/or my alternate account, Jc86035 (1)). I have also been archiving spotifycharts.com and if it would be of any use I've written a shell script for that website.

Notifying Cyberpower678, just because you might know something I don't due to running InternetArchiveBot (or you might be able to get the Internet Archive to do this from their servers). Jc86035 (talk) 15:02, 20 May 2018 (UTC)[reply]

Jc86035 - I read all this and understand some things, but don't really understand what the goal is. Can you explain in a sentence or two? Are you trying to archive all URLs obtained through alexa.com onto archive.org / archive.is on a daily/weekly basis? What is the connection to wikidata? What is the purpose/goal of this project? -- GreenC 15:53, 26 May 2018 (UTC)[reply]

Jc86035 - re: archive.is - there are some archive.is libraries on GitHub for making page saves, but generally when doing mass uploads you'll want to setup an arrangement with the site owner to feed them links, as it gets better results if he does the archiving in-house, as he can get around country blocks and other things. -- GreenC 15:53, 26 May 2018 (UTC)[reply]

@GreenC: Originally the primary motivation was to collect data for Wikidata and Wikipedia. Currently most Alexa.com citations do not even have an archive link (and sometimes don’t even have a date), so the data is completely unverifiable unless Alexa for some reason releases archives of their old data. Websites ranked lower than 10,000 had usually been archived about once before I started archiving. However, I don’t really know what data should be archived (I don’t know how to make a list based on Wikipedia/Wikidata outlinks and haven’t asked anyone for such a list yet), and as such have just archived pages based on other, more easily manipulable lists of websites (such as some CSV file that I found in a web search for the top 1 million websites, which is apparently monthly Alexa data), and because it’s generally difficult and tedious to maintain I’ve just gotten a script to run the same list of about 75,000 archive links at the same time every day.
archive.is seems to only archive one page every two seconds at maximum, based on its RSS feed. Since the Internet Archive is evidently capable of a much higher throughput I would rather not overwhelm archive.is with lots of data which isn’t really all that important. I might ask the website owner to archive those three pages every day, though. Jc86035's alternate account (talk) 15:21, 27 May 2018 (UTC)[reply]
Anytime a new external link is added to Wikipedia, the Wayback Machine sees it and archives it. This is done automatically daily with a system created and run by Internet Archive. In addition archive.is has done active archiving of all links, though I am not sure what the current ongoing status is. Between these two, most (98%) newly added links are getting archived. I don't know what an alexa.com citation is; a Special:External links search only shows about 500 alexa.com URLs on enwiki. -- GreenC 04:14, 30 May 2018 (UTC)[reply]
@GreenC: How does the external link harvesting system work? Is the link archival performed only for mainspace, or for all pages? If an added external link has already been archived, is the link archived again? (A list could be created in user space every so often, although there would be a roughly 1-in-36 chance of a given page's archival being done when the pages are being changed to use the next day's data, which would make the archived pages slightly less useful.)
There are lots of pages which currently do not have Alexa ranks but would benefit from having them added, mostly the lists of websites and the articles of the websites listed (as well as lists of other things which have websites, like newspapers). It would work as a proxy for popularity and importance. Jc86035's alternate account (talk) 08:11, 7 June 2018 (UTC)[reply]
@Jc86035: NoMore404. It gets links via the IRC system, which I believe is for all spaces. Could test by adding a link to a talk page (not yet on Wayback) and check in 48hrs to see if it's on Wayback. Once a link is in the Wayback it automatically recrawls, though how often is hard to say: some pages multiple times a day, others once a year, etc. Not sure how they determine frequency. -- GreenC 12:48, 7 June 2018 (UTC)[reply]
@GreenC: I've added links to Draft:Alexa Internet and User:Jc86035/sandbox, which should be enough for testing. Jc86035's alternate account (talk) 06:18, 8 June 2018 (UTC)[reply]
Both those URLs redirect to a page already existing in the Wayback; not sure how NoMo404 and the Wayback Machine will respond. Redirects are a complication on Wayback. -- GreenC 15:42, 8 June 2018 (UTC)[reply]
@GreenC: None of the URLs have been archived. I think I'll probably stick to using the long list of URLs, although I might try putting them in the WMF cloud at some point. Jc86035 (talk) 16:19, 16 June 2018 (UTC)[reply]
Jc86035 The test URLs you used won't work, they are already archived on the Wayback. As I said above, "Both those URLs redirect to a page already existing in the Wayback". Need to use URLs that are not yet archived. -- GreenC 18:47, 16 June 2018 (UTC)[reply]

@GreenC: Okay. I've replaced those links with eight already-archived links and eight unarchived links; none of them are redirects. Jc86035 (talk) 06:39, 17 June 2018 (UTC)[reply]

Ok good. If not working 24-48hr I will contact IA. -- GreenC 14:07, 17 June 2018 (UTC)[reply]
Jc86035 - those spotify links previously exist on Wayback, archived in March. Need to find links not yet in the Wayback. -- GreenC 13:50, 19 June 2018 (UTC)[reply]
@GreenC: Eight of them (2017-12-14) were archived by me, and the other eight (2018-06-14) are too recent to have been archived by me. Jc86035 (talk) 14:41, 19 June 2018 (UTC)[reply]
@Jc86035: Clearly the links did not get archived. It still might be caused by a filter of userspace, so I added one of the links onto mainspace to see what happens. -- GreenC 01:56, 28 June 2018 (UTC)[reply]
I saved one manually to check for robots.txt or something blocking saves, but it looks OK. The one testing in mainspace: https://spotifycharts.com/regional/jp/daily/2018-06-14 -- GreenC 02:03, 28 June 2018 (UTC)[reply]
@Jc86035: NoMo404 appears to be working. I added a Spotify link into mainspace here. The next/same day it showed up on Wayback. Looks like it's only tracking links added to mainspace, not Draft or User space. -- GreenC 23:25, 29 June 2018 (UTC)[reply]
@GreenC: Thanks. I guess it should work well enough for the list articles, then. Jc86035 (talk) 23:32, 29 June 2018 (UTC)[reply]

Peer review - periodically contacting mailing list with unanswered reviews

Hi all, could a bot editor help out at peer review by creating a bot that periodically contacts editors on our volunteer list with a list of unanswered reviews? Some details:

  • Discussion is here: WT:PR - the problem we are trying to answer is that of a large number of outstanding reviews that haven't been answered
  • List of unanswered reviews is here: WP:PRWAITING
  • List of volunteers is here: WP:PRV
  • We will remove inactive volunteers, and I will reformat the list in a bot readable format similar to this: {{User:Tom (LT)/sandbox/PRV|Tom (LT)|anatomy and medicine|contact=never}}
    • Editors will opt in to the system - all will be set to default to never contact
    • Options for contact will be never, monthly, quarterly, halfyearly, and yearly (unless you can think of a more clever way to do this)

Looking forward to hearing from you soon, --Tom (LT) (talk) 23:12, 4 June 2018 (UTC)[reply]
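A sketch of how a bot might consume the proposed format. The template name is the sandbox one given above; keeping a last-contacted date per volunteer is an assumption of the sketch, not part of the request:

```python
import re
from datetime import datetime, timedelta

INTERVAL_DAYS = {'monthly': 30, 'quarterly': 91, 'halfyearly': 182, 'yearly': 365}

def parse_volunteers(wikitext):
    """Yield (username, interests, contact) from each PRV template call."""
    pattern = (r'\{\{User:Tom \(LT\)/sandbox/PRV'
               r'\|([^|}]+)\|([^|}]*)\|contact=([^|}]+)\}\}')
    return re.findall(pattern, wikitext)

def is_due(contact, last_contacted):
    """True if this volunteer should be messaged on this run."""
    if contact not in INTERVAL_DAYS:   # covers 'never' and typos
        return False
    return datetime.utcnow() - last_contacted >= timedelta(days=INTERVAL_DAYS[contact])

print(parse_volunteers('{{User:Tom (LT)/sandbox/PRV|Tom (LT)|anatomy and medicine|contact=never}}'))
# [('Tom (LT)', 'anatomy and medicine', 'never')]
```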

Addit: ping also to Anomie who very kindly helped create the AnomieBot that now runs PR.--Tom (LT) (talk) 23:12, 4 June 2018 (UTC)[reply]
In the meantime, I'll mention that WP:AALERTS will report peer reviews requests to Wikiprojects, if the articles are tagged by banners. Headbomb {t · c · p · b} 11:29, 5 June 2018 (UTC)[reply]
Bump. --Tom (LT) (talk) 00:51, 25 June 2018 (UTC)[reply]
Bump. --Tom (LT) (talk) 03:24, 10 July 2018 (UTC)[reply]
Bump. --Tom (LT) (talk) 10:47, 30 July 2018 (UTC)[reply]
@Tom (LT): BRFA filed Kadane (talk) 00:27, 10 August 2018 (UTC)[reply]

Bot to change redirects to 'Redirect'-class on Talk pages?

As per edits like this one I just did, is it possible to have a bot go around and check all the extant Talk pages of redirect pages, and confirm/change all of their WP banner assessment classes to 'Redirect'-class?... Seems like this should be an easy and doable task(?)... FWIW. TIA. --IJBall (contribstalk) 15:41, 10 June 2018 (UTC)[reply]

Just ran a petscan on Category:All redirect categories, which I assume includes all redirects, but which also contains 2.9 million pages. Granted, 1.4mil of these are in Category:Unprintworthy redirects (which likely do not have talk pages of their own), and there are probably another million or so with similar no-talk-page status, but that's still a metric buttload of pages to check. Not saying it can't be done (haven't really even considered that yet), just thought I'd give an idea of the scale of this operation. Primefac (talk) 16:25, 10 June 2018 (UTC)[reply]
No one says it needs to be done "fast"!... Maybe a bot can check just a certain number of redirect pages per day on this, so it doesn't overwhelm any resources. --IJBall (contribstalk) 16:59, 10 June 2018 (UTC)[reply]
I believe the banners automatically set the class to redirect when the page is a redirect, so that would fall under WP:COSMETICBOT. Not sure what happens if |class=C is set on a redirect, but that should be easy to test. If |class=C overrides redirect detection, that would be suitable for a task. Headbomb {t · c · p · b} 03:39, 11 June 2018 (UTC)[reply]
I'm telling you, I just had to do this twice, earlier today – two pages that had been converted to redirects years ago still had |class=B on their Talk pages. It's possible that this only affects pages that were converted to redirects years ago, but it looks like there is a population of them that need to be updated to |class=Redirect. --IJBall (contribstalk) 03:47, 11 June 2018 (UTC)[reply]
Setting |class=something overrides the automatic redirect class. This should be handled by EnterpriseyBot. — JJMC89(T·C) 05:11, 11 June 2018 (UTC)[reply]
Yup, as JJMC89 mentioned, this is EnterpriseyBot task 10. It hasn't run for a while, because I let a couple of breaking API changes pass by without updating the code. I'm going to fix the code so it can run again. Enterprisey (talk!) 05:45, 11 June 2018 (UTC)[reply]
If such a task is done, it's best to either remove the classification and leave the empty parameter |class=, or remove the parameter entirely. As Headbomb and JJMC89 have noted, redirect-class is autodetected when no class is explicitly set. This is true with all WikiProject banners built around {{WPBannerMeta}} (but see note), so setting an explicit |class=redir just means that somebody has to amend it a second time if the page ceases to be a redirect.
Note: there are at least four that are not built around {{WPBannerMeta}}, and of the four that I am aware of, only {{WikiProject U.S. Roads}} autodetects redir class; the other three ({{WikiProject Anime and manga}}; {{Maths rating}}; and {{WikiProject Military history}}) do not autodetect, so for these it must be set explicitly; moreover, those three only recognise the full form |class=redirect, they don't recognise the shorter |class=redir that many others handle without problem. --Redrose64 🌹 (talk) 07:54, 11 June 2018 (UTC)[reply]
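A sketch of the "blank the parameter" option. This is a naive regex only; a real bot would parse templates properly and honour the exceptions Redrose64 lists:

```python
import re

def blank_class_param(talk_wikitext):
    # Blank |class=... so WPBannerMeta's redirect autodetection takes over.
    # Naive: this hits any template with a class parameter, so a real bot
    # should restrict itself to known WikiProject banners.
    return re.sub(r'(\|\s*class\s*=)\s*[^|}]*', r'\1', talk_wikitext)

print(blank_class_param('{{WikiProject Architecture|class=B|importance=low}}'))
# {{WikiProject Architecture|class=|importance=low}}
```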
Yes, I skip anime articles explicitly, and the bot won't touch the other two troublesome templates due to the regular expressions it uses.
A bigger problem concerns the example diff that started this thread. It's from an article in the unprintworthy redirects category. I thought the bot should have gotten to that category already, so I just went in to inspect the logs. Unbelievably, after munching through all of the redirect categories, it has finally gotten stuck on exactly that category (unprintworthy redirects). Apparently Pywikibot really hates one of the titles in it. I'm trying to figure out which title precisely, so I can file a bug report, but for now the bot task is on hold.
However, all of the other redirect categories that alphabetically come before it should only contain articles that the bot checked already. Enterprisey (talk!) 20:11, 18 June 2018 (UTC)[reply]
It was actually a bug in Pywikibot, so the bot's held up until a patch is written for that. Enterprisey (talk!) 18:56, 26 June 2018 (UTC)[reply]

New York Times archives moved

Diff

The new URL can be obtained by following Location: redirects in the headers (in this case two-deep). I bring it up because of the importance of the NYT to Wikipedia, the uncertainty over how long the redirects will last, and because the new URL is more informative, including the date. -- GreenC 21:46, 13 June 2018 (UTC)[reply]
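The resolution step is a one-liner with the requests library; the query.nytimes.com URL in the comment is a placeholder, not one of the affected links:

```python
import requests

def resolve(url):
    """Follow the Location: redirect chain (two-deep here) to the final URL."""
    r = requests.head(url, allow_redirects=True, timeout=30)
    return r.url

# new_url = resolve('https://query.nytimes.com/gst/fullpage.html?res=EXAMPLE')
```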

Comment Looks like there are 29,707 links with "query.nytimes.com" Ronhjones  (Talk) 15:25, 9 July 2018 (UTC)[reply]

BRFA filed -- GreenC 15:44, 20 July 2018 (UTC)[reply]

Hi! Potentially untagged misspellings (configuration) is a newish database report that lists potentially untagged misspellings. For example, Angolan War of Independance is currently not tagged with {{R from misspelling}} and it should be.

Any and all help evaluating and tagging these potential misspellings is welcome. Once these redirects are appropriately identified and categorized, other database reports such as Linked misspellings (configuration) can then highlight instances where we are currently linking to these misspellings, so that the misspellings can be fixed.

This report has some false positives and the list of misspelling pairs needs a lot of expansion. If you have additional pairs that we should be scanning for or you have other feedback about this report, that is also welcome. --MZMcBride (talk) 02:58, 15 June 2018 (UTC)[reply]

Oh boy. Working with proper names is tricky: variations are often 'correct', and usage is context dependent, so a bot shouldn't decide. My only suggestion is to skip words that are capitalized. For the rest, use something like approximate (fuzzy) matching to identify paired words that are only slightly different due to spelling (experiment with the agrep threshold without creating too many false positives), then use a dictionary to determine if one of the paired words is a real word and the other not. At that point there might be a good case for it being a misspelling and not an alternative name. This is one of those problems computers are not good at, and it is messy. Unless there is an AI solution. -- GreenC 14:23, 17 June 2018 (UTC)[reply]
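A sketch of that heuristic, with Python's standard difflib standing in for agrep; the similarity threshold is exactly the kind of number that would need experimenting with:

```python
import difflib

def likely_misspelling(a, b, dictionary, threshold=0.85):
    if a[:1].isupper() or b[:1].isupper():
        return False                      # skip proper names
    if difflib.SequenceMatcher(None, a, b).ratio() < threshold:
        return False                      # not similar enough to be a slip
    return (a in dictionary) != (b in dictionary)  # exactly one is a real word

words = {'independence'}
print(likely_misspelling('independance', 'independence', words))  # True
```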
Spelling APIs, some AI-based. -- GreenC 01:44, 28 June 2018 (UTC)[reply]

Misplaced brace

In this diff, I replaced } with |. As a result, the article went from [[War Memorial Building (Baltimore, Maryland)}War Memorial Building]] to [[War Memorial Building (Baltimore, Maryland)|War Memorial Building]], and the appearance went from [[War Memorial Building (Baltimore, Maryland)}War Memorial Building]] to War Memorial Building. Is a maintenance bot already doing this kind of repair, and if not, could it be added to an existing bot? Nyttend (talk) 13:39, 21 June 2018 (UTC)[reply]
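If this does get botted, the broken pattern is easy to express. A sketch; given the false positives GreenC mentions below, matches would still want a human eye before saving:

```python
import re

# [[Target}Label]] where a pipe was mistyped as a closing brace
BAD_PIPE = re.compile(r'\[\[([^\[\]|{}\n]+)\}([^\[\]|{}\n]+)\]\]')

def fix_misplaced_brace(text):
    return BAD_PIPE.sub(r'[[\1|\2]]', text)

print(fix_misplaced_brace(
    '[[War Memorial Building (Baltimore, Maryland)}War Memorial Building]]'))
# [[War Memorial Building (Baltimore, Maryland)|War Memorial Building]]
```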

Probably best we get an idea of how common this is before we look at doing any kind of mass-repair (be it a bot or adding it to AWB genfixes). Could someone look through a dump for this sort of thing? I'm happy to do it if nobody else gets there first. ƒirefly ( t · c · who? ) 20:02, 21 June 2018 (UTC)[reply]
This has some false positives but also some good results. Less than 50. -- GreenC 15:56, 24 June 2018 (UTC)[reply]

Tag cleanup of non-free images

I noticed a couple of non-free images here on the enwiki to which a {{SVG}} tag has been added, alongside a {{bad JPEG}}. Non-free images such as logos must be kept in a low resolution and size to comply with the Fair Use guidelines. Creation of a high-quality SVG equivalent, or the upload of a better-quality PNG for these files, should not be encouraged. For this reason I ask that a bot run through the All non-free media category (maybe starting from All non-free logos) and remove the following tags when they are found:

{{Bad GIF}}, {{Bad JPEG}}, {{Bad SVG}}, {{Should be PNG}}, {{Should be SVG}}, {{Cleanup image}}, {{Cleanup-SVG}}, {{Image-Poor-Quality}}, {{Too small}}, {{Overcompressed JPEG}}

Two examples: File:GuadLogo1.png and File:4th Corps of the Republic of Bosnia and Herzegovina patch.jpg. Thanks, —capmo (talk) 18:16, 28 June 2018 (UTC)[reply]

This seems like a bad idea to me. There's no reason a jpeg version of a non-photographic logo shouldn't be replaced with a PNG lacking artifacts, or a 10x10 logo be replaced with something slightly larger should there be a use case. As for SVGs of non-free logos, that has long been a contentious issue in general. Anomie 12:37, 29 June 2018 (UTC)[reply]
I agree, particularly relating to over-compressed JPEGs. We can have small (as in dimensions) images without them being occluded by compression artefacts. ƒirefly ( t · c · who? ) 13:40, 1 July 2018 (UTC)[reply]

Fix station layout tables

In the short term, could someone code AWB/Pywikibot/something to replace variations of <span style="color:white">→</span> (including variations with font tag and &rarr;) with {{0|→}}? This is for station layout tables like the one at Dyckman Street (IND Eighth Avenue Line). Colouring the arrow white is not really ideal when the intention is to add padding the size of an arrow. I'm not sure why there are still so many of them around.

In the long term, it would be nice to clean up these station layout tables further, perhaps even by way of automatically converting them to a Lua-based {{Routemap}}-style template (which does not exist yet, unfortunately). Most Paris Metro stations' articles have malformed tables, for instance, and probably a majority of stations have deprecated or incorrect formatting somewhere. Jc86035 (talk) 04:03, 2 July 2018 (UTC)[reply]
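For the short-term replacement, something like this regex sketch could seed an AWB or Pywikibot run. It only covers the obvious span/font variants; real pages will certainly have more:

```python
import re

WHITE_ARROW = re.compile(
    r'<(span|font)\s+(?:style="color:\s*white;?"|color="?white"?)\s*>'
    r'\s*(?:→|&rarr;)\s*</\1>',
    re.IGNORECASE)

def fix_padding_arrows(text):
    return WHITE_ARROW.sub('{{0|→}}', text)

print(fix_padding_arrows('<span style="color:white">→</span>'))  # {{0|→}}
print(fix_padding_arrows('<font color="white">&rarr;</font>'))   # {{0|→}}
```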

@Jc86035: How do we find the articles - are they categorised? Or is it a "brute force" search and hope we find them all?
insource:/span style.*color:white.*→/ gives 1818 articles.
insource: &rarr; gives 1039 articles.
Ronhjones  (Talk) 00:30, 4 July 2018 (UTC)
In the long run, I'd be happy to start sandboxing/discussing a way to make a station layout template/suite, and a bot run should be possible if it's shown that {{0}} is a better way to code it. Happy to do the legwork on the AWB side of things.
I do have a question, though: just perusing through pages, I found Ramakrishna Ashram Marg metro station , which has the "hidden" → but I'm not really sure why it's there - is it supposed to be padded like that? Also, for later-searching, a search of the code Jc86035 posted (with and without ") gives 1396 pages. I know it misses out on text that isn't exactly the same, but it's probably a good estimate of the number of pages using → as described. Primefac (talk) 13:54, 4 July 2018 (UTC)[reply]
It may well be a simple run with AWB. I will play devil's advocate and ask: is there any MoS for a station page? It seems to me that the pages involved will be all US based - there seems to be a big discrepancy in the layouts of equivalent-sized stations on my side of the pond, where most of the stations seem to have plumped for using succession boxes. I'm not saying either is correct, just: should there be a full discussion at, say, Wikipedia:WikiProject_Trains with regards to having a consistent theme for all? Maybe then it might be worth constructing a template system, like the route maps. Ronhjones  (Talk) 15:43, 4 July 2018 (UTC)[reply]
The station layout tables (such as that top left of Dyckman Street (IND Eighth Avenue Line)#Station layout) should not be confused with routeboxes (which are always a form of succession box). --Redrose64 🌹 (talk) 20:26, 4 July 2018 (UTC)[reply]
I was asking if there should not be a general station MoS, and maybe a template system for the station layout. The US subway stations all seem to have a layout table and no routeboxes, whereas, say, the London ones all have a routebox and no layout table. Maybe they need both. Also, just using a plain wikitable tends to result in non-consistent layouts. Ronhjones  (Talk) 16:53, 5 July 2018 (UTC)[reply]
@Ronhjones: In general I think it's a mess. Some station articles use the style of layout table used in most articles for stations in the US, some use the Japanese style with {{ja-rail-line}}, some use standard BSicon diagrams, some use Unicode diagrams, some probably use other styles, and some (like Taipei station) use a combination. I found an old discussion about some layout templates for the American style in Epicgenius's user space, but no one's written a Scribunto module with them (or used them in articles) yet. Jc86035 (talk) 17:48, 5 July 2018 (UTC)[reply]
Regarding a general MoS for stations: there certainly is inconsistency between countries, but within a country there is often consistency.
For articles on UK stations, we agreed a few years ago that layout diagrams were not to be encouraged. Reasons for this include: (i) in most cases, there are two platforms and the whole plan may be replaced with text like "The main entrance is on platform 1, which is for services to X; there is a footbridge to platform 2 which is for services to Y."; (ii) for all 2,500+ National Rail stations, their website provides a layout diagram which is more detailed than anything that we can do with templates (examples of a simple one-platform station; a major terminus); (iii) trying to draw a layout plan for London Underground stations is fraught with danger since a significant number have their platforms at different levels or angles, even crossing over one another in some cases. --Redrose64 🌹 (talk) 18:23, 5 July 2018 (UTC)[reply]
I know, I was born in Barking... District East on Pt.2 (train doors open both sides), Hammersmith & City line ends Pt.3 (down the end Pt.2), District West on Pt.6... :-) Ronhjones  (Talk) 22:24, 5 July 2018 (UTC)[reply]
Since I was pinged here in Jc86035's comment, I suppose I'll put my two cents. I experimented with a modular set of station layout templates a couple of years ago. (See all the pages in Special:PrefixIndex/User:Epicgenius/sandbox that start with "User:Epicgenius/sandbox/Station [...]".) This itself was based off of {{TransLink (BC) station layout}}, which is used for SkyTrain (Vancouver) stations. Template:TransLink (BC) station layout is itself a modular template, and several instances of this template can be used to construct a SkyTrain station layout. epicgenius (talk) 23:33, 5 July 2018 (UTC)[reply]
@Primefac: Sorry I didn't reply about this earlier. The use of the "hidden" arrow on Ramakrishna Ashram Marg metro station is actually completely wrong, since it is in the wrong table cell, and the visible right arrow also being in the wrong place makes it pointless. A more correct example is in Kwai Hing station. Jc86035 (talk) 09:14, 6 July 2018 (UTC)[reply]
Ah, I see. I guess that would be my main concern, though I suppose the GIGO argument could be made... but then again I like to try and minimize the number of false positives that later have to be re-edited by someone else. Primefac (talk) 17:41, 6 July 2018 (UTC)[reply]

Bot to deliver Template:Ds/alert

Headbomb {t · c · p · b} 21:13, 2 July 2018 (UTC)[reply]

Ancient Greece

Not exactly bot work, but you bot operators tend to be good with database dumps. Can someone give me a full list of categories that have the string Ancient Greece in their titles and don't begin with that string, and then separate the list by whether it's "ancient" or "Ancient"? I'm preparing to nominate one batch or another for renaming (I have a CFD going, trying to establish consensus on which is better), and if you could give me a full list I'd know which ones I need to nominate (and which false positives to remove) if we get consensus.

If I get to the point of nominating them, does someone have a bot that will tag a list of pages upon request? I'll nominate most of the categories on one half of the list you give me, and there are a good number, so manually tagging would take a good deal of work. Nyttend (talk) 00:01, 6 July 2018 (UTC)[reply]
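Not strictly bot work, but for the record, one way to pull such a list is the search API (the Quarry query in the reply below does the same job in SQL). A sketch; pagination via sroffset is omitted, so very large result sets would need another pass:

```python
import requests

API = 'https://en.wikipedia.org/w/api.php'

def ancient_greece_categories():
    params = {'action': 'query', 'list': 'search', 'format': 'json',
              'srnamespace': 14, 'srlimit': 500,
              'srsearch': 'intitle:"ancient Greece"'}
    hits = requests.get(API, params=params).json()['query']['search']
    lower, upper = [], []
    for hit in hits:
        name = hit['title'][len('Category:'):]
        if name.lower().startswith('ancient greece'):
            continue  # begins with the string; not wanted
        (lower if 'ancient Greece' in name else upper).append(hit['title'])
    return lower, upper
```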

quarry:query/28035
Category | Caps | Notes
Category:Ambassadors of ancient Greece | Lower | Redirect to Category:Ambassadors in Greek Antiquity
Category:Articles about multiple people in ancient Greece | Lower |
Category:Artists' models of ancient Greece | Lower |
Category:Arts in ancient Greece | Lower |
Category:Athletics in ancient Greece | Lower | CfR: Category:Sport in ancient Greece
Category:Battles involving ancient Greece | Lower |
Category:Cities in ancient Greece | Lower |
Category:Coins of ancient Greece | Lower |
Category:Economy of ancient Greece | Lower |
Category:Education in ancient Greece | Lower |
Category:Eros in ancient Greece | Lower | Redirect to Category:Sexuality in ancient Greece
Category:Films set in ancient Greece | Lower |
Category:Glassmaking in ancient Greece and Rome | Lower | Redirect to Category:Glassmaking in classical antiquity
Category:Gymnasiums (ancient Greece) | Lower |
Category:Historians of ancient Greece | Lower |
Category:History books about ancient Greece | Lower | CfR: Category:History books about Ancient Greece
Category:Military ranks of ancient Greece | Lower |
Category:Military units and formations of ancient Greece | Lower |
Category:Naval battles of ancient Greece | Lower |
Category:Naval history of ancient Greece | Lower |
Category:Novels set in ancient Greece | Lower |
Category:Operas set in ancient Greece | Lower |
Category:Pederasty in ancient Greece | Lower |
Category:Plays set in ancient Greece | Lower |
Category:Political philosophy in ancient Greece | Lower |
Category:Portraits of ancient Greece and Rome | Lower |
Category:Prostitution in ancient Greece | Lower |
Category:Set indices on ancient Greece | Lower |
Category:Sexuality in ancient Greece | Lower |
Category:Ships of ancient Greece | Lower |
Category:Slavery in ancient Greece | Lower |
Category:Social classes in ancient Greece | Lower |
Category:Television series set in ancient Greece | Lower |
Category:Wars involving ancient Greece | Lower |
Category:Wikipedians interested in ancient Greece | Lower |
Category:Comics set in Ancient Greece | Upper |
Category:Festivals in Ancient Greece | Upper |
Category:Military history of Ancient Greece | Upper |
Category:Military units and formations of Ancient Greece | Upper | Redirect to Category:Military units and formations of ancient Greece
Category:Museums of Ancient Greece | Upper |
Category:Populated places in Ancient Greece | Upper |
Category:Transport in Ancient Greece | Upper |
Category:Works about Ancient Greece | Upper | CfR: Category:Works about ancient Greece
@Nyttend: See list above. If you tag one, I can tag the rest. — JJMC89(T·C) 04:53, 6 July 2018 (UTC)[reply]
You might also want to search for "[Aa]ncient Greek", as is Category:Scholars of ancient Greek history and Category:Ancient Greek historians. – Jonesey95 (talk) 05:06, 6 July 2018 (UTC)[reply]

HTML errors on discussion pages

Is anyone going to be writing a bot to fix the errors on talk pages related to RemexHtml?

I can think of these things to do, although there are probably many more, and these ones may have issues of their own.

  • replace non-nesting tags like <s><s> with <s></s> where there are no templates or HTML tags between those tags (a sketch of this replacement appears below)
  • replace <code> with <pre> where the content contains newlines and the opening tag does not have any text between it and the previous <br> or newline
  • replace <font color=#abcdef></font> with <span style="color:#abcdef"></span> unless the content is a single link and nothing else

Jc86035 (talk) 13:50, 15 July 2018 (UTC)[reply]
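A minimal Python sketch of the first bullet's replacement, assuming "no templates or HTML tags between those tags" can be approximated by excluding <, { and } from the span between the two tags (the function name is mine):

import re

def fix_self_closed_pairs(wikitext):
    """Turn non-nesting pairs like <s>...<s> into <s>...</s> when nothing
    that looks like another tag or a template sits between them."""
    pattern = re.compile(r"<(s|u|tt)>([^<{}]*)<\1>")
    return pattern.sub(r"<\1>\2</\1>", wikitext)

# e.g. fix_self_closed_pairs("struck <s>text<s> here") -> "struck <s>text</s> here"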

Changing font tags to span tags is a bit of a losing battle when people can still have font tags in their signatures. -- WOSlinker (talk) 13:48, 16 July 2018 (UTC)[reply]
I was under the impression all of the problematic users had been dealt with, and all that was left was cleaning up extant uses. Primefac (talk) 17:24, 16 July 2018 (UTC)[reply]
Unfortunately not. There are a couple of unworked tasks on this point in phab, and I've seen a few users lately who use obsolete HTML. (Exact queries somewhere on WT:Linter I think.) --Izno (talk) 05:47, 17 July 2018 (UTC)[reply]
What is the use of fixing the old messages? I suggest you leave them as they are. It is not useful to change them. Best regards -- Neozoon 16:16, 21 July 2018 (UTC)[reply]
@Neozoon: The thing is, when there are HTML tags which are unclosed or improperly closed, their effects remain in place, not to the end of the message but to the end of the page. See for example this thread which contains two <tt> tags instead of one <tt> and one </tt>, so every subsequent post is in a monospace font. One tiny change - the addition of the missing slash - fixes the whole page. --Redrose64 🌹 (talk) 11:01, 22 July 2018 (UTC)[reply]
If we're going to fix font tags, one issue I'd really like to see fixed is when a font tag is opened inside a link label and never closed, since I see that one frequently while browsing archives. Enterprisey (talk!) 20:16, 29 July 2018 (UTC)[reply]
BRFA filed Basically does what Jc86035 suggested in the first bullet point Galobtter (pingó mió) 07:44, 3 August 2018 (UTC)[reply]

Populate Selected Anniversaries with Jewish (and possibly Muslim) Holidays

Right now there is a manual process to go year by year to the last year's date and remove the Jewish holiday and then find the correct Gregorian date for this year and put it in. This is because Jewish and Muslim holidays are not based on the Gregorian calendar. There is a website, http://www.hebcal.org for Jewish holidays that lists all the Gregorian dates for the respective Jewish holidays. I suggest a bot take that list and update the appropriate dates for the current/next year. Sir Joseph (talk) 19:29, 20 July 2018 (UTC)[reply]

Which articles are we talking about? Enterprisey (talk!) 20:15, 29 July 2018 (UTC)[reply]
Not articles: the Selected Anniversaries pages, which then end up on the main page. For example, Wikipedia:Selected_anniversaries/July_22 has a Jewish holy day for 2018, but in 2017 and 2019.... that day falls on a different date. So the idea is to get a list of days that are worthy of being listed, then go to the page for the prior year, delete the entry, and add the entry for the current year. Sir Joseph (talk) 23:38, 29 July 2018 (UTC)[reply]

Hello @Sir Joseph and Howcheng: - I'm thinking about and investigating this as a possible project, if you are still interested. It would help to know which Jewish/Israeli holidays are tracked on Selected Anniversary pages. -- GreenC 15:25, 22 August 2018 (UTC)[reply]

@GreenC:, yes and thank you for your interest. I know that hebcal is open source and has a data feed, but not sure how useful it is. My thinking is to create a subpage listing "Holidays to be posted on the Front Page" and then the bot goes to this year or last year and updates it. I'm sure you might have a better way to do it. Sir Joseph (talk) 15:44, 22 August 2018 (UTC)[reply]
http://www.hebcal.org appears to be expired and is now a click-farm. What I would need is a list of holidays, then I can see what resources are available. -- GreenC 16:22, 22 August 2018 (UTC)[reply]
My huge mistake, I meant hebcal.com. Sir Joseph (talk) 16:25, 22 August 2018 (UTC)[reply]
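For reference, hebcal.com advertises a machine-readable feed. A sketch of pulling one year's major-holiday dates from it in Python; the endpoint and parameter names are assumptions to be verified against hebcal's API documentation:

import json
import urllib.request

def holiday_dates(year):
    """Map holiday title -> Gregorian ISO date for the given year,
    e.g. {"Rosh Hashana 5779": "2018-09-10", ...}. Endpoint assumed."""
    url = ("https://www.hebcal.com/hebcal?v=1&cfg=json"
           "&maj=on&min=off&year=%d" % year)
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)
    return {item["title"]: item["date"] for item in data.get("items", [])}

A bot could then diff the returned dates against what is currently on each Selected Anniversaries page.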

Repair dead URLs from NYTimes

A while ago the NYTimes adjusted their website structure, effectively killing their 'select' and 'query' sub-domains. A few weeks back I updated the URLs for the 'query' sub-domain to point to their current URLs, and there were a few links that were broken but I simply repaired them manually. The 'select' sub-domain is proving to be more difficult. So far, around 12% of the links redirect to a "server error" or "not found" page - this becomes a significant amount when there are 19,000 links to the https://select.nytimes.com sub-domain from the mainspace, which would be approximately 2,000 dead links.

A quick solution I've found is to pop the URL into https://web.archive.org, and it'll have the correct redirect chain archived to the current, live URL. Example : https://select.nytimes.com/gst/abstract.html?res=F00611FC3A5A0C778EDDAD0894DB494D81 is a dead link, but https://web.archive.org/web/20160305140459/https://select.nytimes.com/gst/abstract.html?res=F00611FC3A5A0C778EDDAD0894DB494D81 eventually redirects to the live link. The desirable outcome would be to have all the old URLs updated to the current URL by using the redirects cached by WebArchive, but I do not think it is possible with IABot as it is currently configured. From my understanding, IABot would add the archive URL instead of updating the original URL to the live link. @Cyberpower678: would it be possible to tweak IABot for this batch to run through the list of URLs I provide and update the page with both the new URL under https://www.nytimes.com as well as an archive URL while we're at it? Jon Kolbert (talk) 04:20, 22 July 2018 (UTC)[reply]
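A Python sketch of that approach: request any snapshot of the dead URL, let Wayback's archived redirect chain play out, and strip the archive prefix from wherever it ends up (soft-404s and redirect loops, noted below, would still need filtering):

import re
import urllib.error
import urllib.request

def live_url_via_wayback(dead_url):
    """Follow the archived redirect chain for a dead URL and return the
    live URL it resolves to, or None. Sketch only; the final target still
    has to be checked for soft-404s."""
    try:
        with urllib.request.urlopen(
                "https://web.archive.org/web/2016/" + dead_url) as resp:
            final = resp.geturl()
    except urllib.error.HTTPError:
        return None
    m = re.match(r"https?://web\.archive\.org/web/\d+[a-z_]*/(.*)", final)
    return m.group(1) if m else None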

I've been looking into it the past few days, and am quite confused by the URL taxonomy. For example https://query.nytimes.com/gst/fullpage.html?res=9E0CE2DD113BF93BA35751C1A964958260 still works. Notice the "fullpage" versus "abstract". There are also search queries that still work https://query.nytimes.com/search/query?ppds=des&v1=STRIKE%20THE%20GOLD%20(RACE%20HORSE) and "mem" (or deep archive) queries that work https://query.nytimes.com/mem/archive/pdf?res=9D0DE0DF163FEE3ABC4952DFB767838E659EDE. Though in the latter case these now redirect to a new "timesmachine" sub-domain. Are there other URL types? When you say "I updated the URLs for the 'query' sub-domain" I don't understand, as there are about 22,000 still in enwiki. That's an interesting idea to pull redirect URLs from Wayback, though it's not simple or guaranteed to always work (sometimes redirects point back on themselves in loops, or lead to soft404s). There is also a NYT API that might return the URL given an article title. -- GreenC 05:01, 22 July 2018 (UTC)[reply]
@GreenC: Ah, you're right. I only did it for the HTTP PDF links it seems (see Special:Diff/846028034 for example). There still remains work to be done on the query sub-domain as well. The majority of query and select URLs work just fine, it's only a select (heh) few that are problematic, the rest can be updated quite easily while we have the chance. Here's a random sample of 10 links from what I've tested so far : https://tools.wmflabs.org/paste/view/raw/fd073f61. Of them, only one doesn't work. Most are just fine and can be updated quite easily, and I plan on doing so shortly. It's the ones that end up in status=nf that are no easy fix. Jon Kolbert (talk) 05:25, 22 July 2018 (UTC)[reply]
@Jon Kolbert: That is not a tweak, that is far outside of the bot's original programming, and requires a separate bot. I would just advise a simple search and replace bot to replace the originals and let IABot get to them eventually.—CYBERPOWER (Chat) 14:11, 22 July 2018 (UTC)[reply]
@Cyberpower678: Okay, fair. The problem we face is finding a reliable method of finding the replace URLs for the 12% of dead URLs. Jon Kolbert (talk) 14:58, 22 July 2018 (UTC)[reply]

@Jon Kolbert: - There are about 12,000 articles containing one or more "select.nytimes.com". Are you testing all those, finding which ones are "nf" and in that set need help with discovering a working URL? How are you updating on wiki using programming tools or manually? -- GreenC 15:35, 22 July 2018 (UTC)[reply]

@GreenC: I have tested each of the 20K URLs and separated the results into two separate lists, one list contains the dead URLs and the other list contains the old URLs with live replacement URLs. I have been updating them semi-automatically using pywiki for efficiency reasons, it has much more customization than AWB but has the same effect (previewing the diff before submitting). As I've said, I'm still not entirely sure with what to do with the dead links but I have them all saved in a list. Jon Kolbert (talk) 15:48, 22 July 2018 (UTC)[reply]
Jon Kolbert - Ok great. Send me 5 or 10 of those dead URLs and I'll try sending them through my bot WP:WAYBACKMEDIC offline, which follows redirects at Wayback. I think it will log the final destination url. -- GreenC 16:11, 22 July 2018 (UTC)[reply]
@GreenC: Okay! Here is a sample. Let me know how it goes. Jon Kolbert (talk) 16:36, 22 July 2018 (UTC)[reply]
Jon Kolbert - It's not working because Wayback is not returning redirects, but rather working snapshots or no snapshot at all. However archive.is is working (sometimes). I put together this awk script which you can run. Download the file library.awk and create a list of URLs called "file.txt" like the 10 samples above. It will print the original followed by the new.
awk -i ./library.awk 'BEGIN{for(i=1;i<=splitn("file.txt",a,i);i++) {sub(/^https/,"http",a[i]); match(sys2var("wget -q -O- \"http://archive.is/timemap/" strip(a[i]) "\""),/https?[:]\/\/archive.is\/[0-9]{14}/,dest);match(sys2var("wget -q -O- \"" strip(dest[0]) "/" strip(a[i]) "\""),/\|[ ]*url[ ]*[=][^|]*[^|]/,dest);sub(/\|[ ]*url[ ]*[=][ ]*/,"",dest[0]);print strip(a[i]) " | " dest[0]}}'
The script works by finding the archive.is URL via its API (http://archive.is/timemap/) then web-scraping that page for the URL. -- GreenC 18:30, 22 July 2018 (UTC)[reply]
@GreenC: I have finished a good portion of the select.nytimes.com, I'm now analyzing all the query links so I can compile one large dead link list for both sub-domains after having fixed all the ones with redirects that work (for now). Jon Kolbert (talk) 21:10, 23 July 2018 (UTC)[reply]
Excellent. Seeing it in my watchlist. Another script for checking Wayback header Location: redirects, it might catch some more:
awk -i ./library.awk 'BEGIN{for(i=1;i<=splitn("file.txt",a,i);i++) {sub(/^https/,"http",a[i]); c = patsplit(sys2var("wget -q -SO- \"https://web.archive.org/web/19700101010101/" strip(a[i]) "\" 2>&1 >/dev/null"), field, /Location[:][ ]*[^\n]*\n/); sub(/Location[:][ ]*/, "", field[c]); print strip(a[i]) " | " strip(field[c])}}'
-- GreenC 23:35, 23 July 2018 (UTC)[reply]

Bot to remove religion parameter in bio infoboxes

2016 Policy RFC re religion in bio infoboxes Don't know if this has been discussed here previously. I've noticed several editors manually removing the religion from biographical infoboxes. Surely, there are thousands of these infoboxes that contain the religion entry. Wouldn't it make sense to just run a bot? — Maile (talk) 19:49, 29 July 2018 (UTC)[reply]

How does a bot know whether the parameter is appropriate for a certain topic? --Izno (talk) 21:14, 29 July 2018 (UTC)[reply]
FYI, it looks like there are roughly 4,000 instances of {{Infobox person}} that use the |religion= parameter. I am sure that there are other person-related infoboxes using this parameter as well. They will probably require a human editor, per the RFC outcome, to ensure that those cases in which the religion is significant to the article subject is adequately covered either in the body text or in a custom parameter. – Jonesey95 (talk) 21:31, 29 July 2018 (UTC)[reply]
Misunderstanding here. It's not by topic, or by individual person. It involves all instances of Template:Infobox person, which seems to be used on 290,000 (plus) pages. The notation says, Please note that in 2016, the religion and ethnicity parameters were removed from Infobox person as a result of the RfC: Religion in biographical infoboxes and the RfC: Ethnicity in infoboxes as clarified by this discussion. Prior to 2016, the religion parameter was allowed, and there's no way of knowing how many hundreds of thousands of Infobox person templates have the religion there. The immediate result is that the existing religion stated in the infobox remains in place, but just doesn't show up on the page. What random editors are doing is going to articles, one by one, and removing the stated religion from the infobox. My question is whether or not a bot could be run to go through all Infobox persons in place, and remove that entry. — Maile (talk) 22:58, 29 July 2018 (UTC)[reply]
There is a way of knowing, which is how I came up with the estimate above. Go to Category:Pages using infobox person with unknown parameters and click on "R" in the table of contents. Every page listed under "R" has at least one unsupported parameter in Infobox person that starts with the letter "R". (Edited to add: see also Category:Infobox person using religion).
As to whether this is a task feasible for a bot, see my response above, which explains why this request probably runs afoul of WP:CONTEXTBOT. – Jonesey95 (talk) 01:00, 30 July 2018 (UTC)[reply]
Jonesey95, I'm not really sure this is a CONTEXT issue. The template doesn't show anything when |religion= is given (which, based on the TemplateData, is actually used 8000+ times). Now, I wouldn't want to run this bot purely to remove one bad parameter, but if I could get a list of the top 20 "bad params" (or anything with 50+ uses) and/or include |ethnicity= and |denomination= (which also appear to be deprecated) then we might be getting somewhere. Primefac (talk) 01:38, 30 July 2018 (UTC)[reply]
Following on from the above, there are (according to TemplateData) about 156 "invalid" params with 10+ uses, 75 with 25+ uses, 38 with 50+ uses, 20 with 100+ uses and 3 (religion, ethnicity, and imdb_id) with 2000+. There are about 2k invalid params all told, and the majority look like typos (i.e. they are context-dependent), but removing the most common ones will cut down the workload. Primefac (talk) 01:43, 30 July 2018 (UTC)[reply]
How does a bot determine that in cases in which the religion is significant to the article subject[, the person's religion] is adequately covered either in the body text or in a custom parameter? (words in brackets added to RFC outcome excerpt to avoid quoting the whole thing). As for other parameters, my experience with removing/fixing unsupported infobox parameters is that a large number of them are typos that need to be fixed, rather than removed. Maybe an AWB user with a bot flag can make these changes, but I don't see how an unsupervised bot would work reliably. – Jonesey95 (talk) 04:18, 30 July 2018 (UTC)[reply]
Because it's not a valid parameter. If |religion= is used in {{infobox person}} it does nothing, and thus there is zero reason to have it. As for the second half of your question - yes, the majority of the invalid parameters are typos, but the 2108 uses of |imdb_id= are not typos and could be removed without any issue (see my bot's tasks 7, 8, 10, 18, 20, 23, and 26 for similar instances of parameters being changed/removed). Primefac (talk) 16:01, 30 July 2018 (UTC)[reply]
Maybe a specific example will help. If there is a person, Bob Smith, who is Roman Catholic, and his religion is significant, and the religion was placed into the infobox by an editor in good faith, and that religion is not adequately covered either in the body text or in a custom parameter, the religion parameter in the infobox should not simply be removed. That's what the RFC says. – Jonesey95 (talk) 16:32, 30 July 2018 (UTC)[reply]
I suppose that's a fair point. The issue is that at this exact moment the template doesn't currently accept the parameter. Was the removal so that the "invalid" uses could be removed and the "valid" ones kept, so that it could be (at some point in the future) reinstated? Or will it never be reinstated and the religion parameter be reincarnated as something else entirely? Primefac (talk) 20:04, 30 July 2018 (UTC)[reply]
Based on the close it sounds more like the religion parameter shouldn't be used in {{infobox person}} and thus the template call should be changed to something more appropriate. Pinging Iridescent as the closer, mostly to see if I've interpreted that correctly. Primefac (talk) 20:07, 30 July 2018 (UTC)[reply]
The consensus of the RFC was fairly clear that the religion parameter should be deprecated from the generic {{infobox person}}, and that in those rare cases where the subject's religion is significant a more specific infobox such as {{infobox clergy}} should be used. That the field was left in place and disabled rather than bot-blanked was, as far as I'm aware, an artefact of the expectation that those people claiming "the religion field is necessary" would subsequently go through those boxes containing it and migrate the infoboxes in question to a more appropriate infobox template. ‑ Iridescent 06:51, 31 July 2018 (UTC)[reply]

Lengthy post-RfC discussion at Template_talk:Infobox_person/Archive_31#Ethnicity?_Religion? .. -- GreenC 01:19, 30 July 2018 (UTC)[reply]

  • Just an FYI from my personal experience. The bigger issue is the infinite perpetuation of the religion parameter through copy and paste off an existing article. If the religion is not supported, it merely doesn't show in a given article. Where the issue perpetuates is when a new article is created, and the editor uses the infobox from another article as the template, changing what needs to be changed. That's what I do, and I've been here more than a decade. Why bother figuring out usage from a new blank when I know an article that already has the basics I need? The only reason I know the religion parameter is not supported, is because of editors who are manually correcting templates, one by one. Might there be a lot of other editors, newbies and long-timers, who do the copy-from-existing-article-and-paste method? — Maile (talk) 11:54, 31 July 2018 (UTC)[reply]
  • BRFA filed. Primefac (talk) 01:09, 11 August 2018 (UTC)[reply]
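For illustration, a minimal sketch of the parameter-stripping side of such a task using the mwparserfromhell library (the parameter set is illustrative; per the RFC, articles where the religion is actually significant need human migration to a more specific infobox rather than blanking):

import mwparserfromhell

DISABLED = {"religion", "ethnicity"}  # parameters the template no longer accepts

def strip_disabled_params(wikitext):
    """Remove the disabled parameters from every {{Infobox person}}."""
    code = mwparserfromhell.parse(wikitext)
    for tpl in code.filter_templates():
        if tpl.name.matches("Infobox person"):
            for param in DISABLED:
                if tpl.has(param):
                    tpl.remove(param)
    return str(code)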

Moving/removing ticker symbols in article lead

Hello all. Per WP:TICKER, ticker symbols should not appear in article leads if the infobox contains this information. My idea is to remove ticker symbols from the article lead (more specifically the first sentence) if the article has {{Infobox company}}, and to add the traded_as parameter along with the ticker symbol if the infobox does not already contain it:

{{Infobox company
|name = CK Hutchison Holdings Limited
|type = Public
|traded_as = {{SEHK|1}}
...}}

If the article does not contain any infobox, it will be moved to the "External links" section of the page, the alternative position mentioned in the information page.

==External links==
* {{Official website}}
* {{SEHK|1}}

I was doing this action repeatedly in the past few days and realized that it would be a great idea if a bot could do this, especially when there is no accurate list of these problematic articles for human editors to work from. I would like to help create a bot but am not too familiar with this field, so I'm asking for any experienced editor to help. Cheers. –Wefk423 (talk) 12:46, 2 August 2018 (UTC)[reply]

TICKER isn't a policy or guideline, which makes consensus difficult. It's also tricky to automate: since everything is free-floating in the lead section, removing a symbol might break layout or context. Perhaps AWB semi-automated would be better? -- GreenC 10:31, 3 August 2018 (UTC)[reply]
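In that semi-automated spirit, a sketch of a worklist builder in Python: flag articles whose lead transcludes a ticker template, then review each by hand (the template list is illustrative, not exhaustive):

import re

TICKER_TEMPLATES = ("SEHK", "NYSE", "NASDAQ", "LSE", "TYO")

def lead_has_ticker(wikitext):
    """True if the text before the first section heading transcludes one
    of the listed ticker templates."""
    lead = re.split(r"\n==", wikitext, maxsplit=1)[0]
    return re.search(r"\{\{\s*(%s)\s*\|" % "|".join(TICKER_TEMPLATES),
                     lead) is not None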

Removing BLP templates from non-BLP articles

Hello. As an example: according to PetScan, there are 1050 articles belonging to the 2 categories "All BLP articles lacking sources" and "21st-century deaths". We have this kind of template on articles about people who died 50 or even 100 years ago: William Colt MacDonald, Joseph Bloomfield Leake. Thanks and regards, Biwom (talk) 04:34, 6 August 2018 (UTC)[reply]

Are they verifiably dead? If not, we presume they are alive, unless they were born more than 115 years ago; see WP:BDP. --Redrose64 🌹 (talk) 09:40, 6 August 2018 (UTC)[reply]
Hello. You are correct, but two of the first things I can read at WP:BLP are "we must get the article right" and "requires a high degree of sensitivity". Having a big banner saying that the person is alive before a lede that states a date of death is both wrong and insensitive. Let's keep in mind that these so-called maintenance templates are visible for the casual Wikipedia reader.
On a side note: PetScan is giving me 162 results for "All BLP articles lacking sources" and "19th-century births". Thanks and regards, Biwom (talk) 11:54, 6 August 2018 (UTC)[reply]
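A sketch of a worklist builder with pywikibot: intersect the two categories and print the titles for human review, since per WP:BDP the death still has to be verifiable before any tag is swapped:

import pywikibot
from pywikibot import pagegenerators

site = pywikibot.Site("en", "wikipedia")

def members(cat_name):
    """Set of page titles in the given category."""
    cat = pywikibot.Category(site, "Category:" + cat_name)
    return {p.title() for p in pagegenerators.CategorizedPageGenerator(cat)}

worklist = (members("All BLP articles lacking sources")
            & members("19th-century births"))
for title in sorted(worklist):
    print(title)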

Archive disambiguation bot

Something that's been bugging me for a while, and of which I was recently reminded by ClueBot (talk · contribs), is that section links to archived sections break all the time. So I'm proposing that we have a fully dedicated bot for this.

When you have a section link like

  • [[Talk:Foobar#Barfoo]]

where the section link is broken, search through all 'Talk:Foobar/Archives ...' subpages; if exactly one of them contains the section, update the link to

  • [[Talk:Foobar/Archives 2009#Barfoo]]<!--Updated by Bot-->

If you find multiple matches, instead tag the link with

  • [[Talk:Foobar#Barfoo]]{{Old section?|Talk:Foobar/Old#Barfoo|Talk:Foobar/Old 3#Barfoo}}

to render a note after the link offering the candidate archived sections.

Lastly, if you're looking in a dated archive (e.g. Archives/2008) or sequential (e.g. /Archives 19), then only search in the archives older than that. Headbomb {t · c · p · b} 15:09, 9 August 2018 (UTC)[reply]
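A sketch of the archive search itself with pywikibot (MediaWiki's heading-to-anchor normalization -- underscores, HTML entities, markup inside headings -- is glossed over here):

import re
import pywikibot

def archives_containing(site, talk_title, anchor):
    """Return the archive subpages of `talk_title` whose wikitext contains
    a heading matching `anchor`."""
    heading = re.compile(r"(?m)^=+ *" + re.escape(anchor.replace("_", " "))
                         + r" *=+ *$")
    base = talk_title.split(":", 1)[1]  # "Talk:Foobar" -> "Foobar"
    return [p.title() for p in site.allpages(prefix=base + "/", namespace=1)
            if heading.search(p.text)]

One hit and the bot rewrites the link; several, and it appends {{Old section?}} as proposed above.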

I would like a modification made to the Search facility. Simply, I would like the cursor to be placed after the text of the first instance of the search. The reason for this is that it would make editing much quicker in that you don't have to search for the text (which is highlighted) and then place the cursor after it to make an update. — Preceding unsigned comment added by Ralph23 (talkcontribs) 02:44, 11 August 2018 (UTC)[reply]

Ralph23, this has nothing at all to do with bots or bot editing. Heck, I'm not even sure that it's a Wikipedia thing; I'm pretty sure this is browser-determined. Primefac (talk) 02:45, 11 August 2018 (UTC)[reply]

Lint Error elimination of poorly formatted italics....

Would it be possible for there to be a bot to look for and repair specific LintErrors based on regexps?

https://en.wikipedia.org/w/index.php?title=Special:LintErrors/missing-end-tag&offset=70252016&namespace=0 is where I've reached manually, but it's taking a while...

In this instance, the request would be to search for mismatched italics, bold, and <span>s in a page ShakespeareFan00 (talk) 13:13, 11 August 2018 (UTC)[reply]

For the italics at least, I think this would be too context-dependent. For example, Special:Diff/854452962 contains two mismatched sets and it took me reading through it twice for the first one before I figured out where the closing '' was supposed to go. Primefac (talk) 13:27, 11 August 2018 (UTC)[reply]

Request for someone to run SineBot for me.

Hello Admins, I'd like to request that someone please run SineBot for me to auto-sign my posts, as I know nothing about software/website programming and cannot run the bot myself. Thanks! --Mkikoen (talk) 16:30, 15 August 2018 (UTC)[reply]

Mkikoen, you sign your posts with ~~~~, and you shouldn't necessarily rely on SineBot to do it for you. However, SineBot does work automatically, so if you forget it will likely sign your posts for you. Primefac (talk) 16:33, 15 August 2018 (UTC)[reply]
@Primefac: Sorry, but I don't use Wikipedia very often, as I edit very infrequently. I know that you sign your messages by adding 4 tilde characters, or by clicking the Signature button if you're using the WikiEditor, but I sometimes forget to sign manually, as I usually assume the system does it for me automatically (at least from what I can recall of other fandom wiki sites). I just find it a little bit tedious, as I'm still new to the editing process. I honestly hope one day there may be a preference setting to have logged-in users like me signed automatically, because like I said I don't know much about all of this technical programming stuff. It's just to save time and an extra step. — Preceding unsigned comment added by Mkikoen (talkcontribs) 16:40, 15 August 2018 (UTC)[reply]
Well, as I said, SineBot already (usually) signs posts when the signature is left off, so there's nothing for you to do/request. Primefac (talk) 16:41, 15 August 2018 (UTC)[reply]
@Primefac: Oh, it does automatically sign it if I forget? Okay, my mistake, I didn't read that part of your previous message. Ritchie333 was the person who recommended this bot to me, so I apologize if I wasted your time. I hope that maybe the SineBot feature will be implemented as a preference that logged-in users can enable if they so wish. Thanks. — Preceding unsigned comment added by Mkikoen (talkcontribs) 16:50, 15 August 2018 (UTC)[reply]
You should make every effort to sign your posts, as SineBot is not infallible. There's no "preference" to engage because it does it automatically for everyone (except when it doesn't...). Primefac (talk) 16:53, 15 August 2018 (UTC)[reply]
Just to clarify a few things, SineBot runs in the background and will (or is supposed to!) sign anyone's post within about 2-3 minutes if they've forgotten to do it themselves, provided that user has less than 800 edits. After that, the bot assumes they are used to signing posts. As you can see from this discussion, SineBot has signed one of Mkikoen's posts. Ritchie333 (talk) (cont) 16:56, 15 August 2018 (UTC)[reply]
Is the cutoff 800 edits? I was pretty sure I've seen it sign my posts before. And for what it's worth, I've been the one "signing" Mkikoen's posts, but only because I'm avoiding doing real work and getting the notifications right after they post. Primefac (talk) 16:58, 15 August 2018 (UTC)[reply]
@Primefac: Again, that's something I'm still not used to, as I assume and would prefer the Wikipedia system to auto-sign my messages, since I have no experience with software/website programming. It's just to make it slightly easier for newcomer editors who edit very infrequently. I can't give any specific details on how the feature could be implemented; I'm just offering a trivial suggestion. (Finally remembered to click on the Signature and Timestamp button.) When it comes to editing articles or writing messages on talk pages, signatures are just not something I prioritize; I just focus on accurately writing/editing the article or message and wait for the system to auto-sign it without it saying "preceding unsigned comment added by [Username here]." --Mkikoen (talk) 17:05, 15 August 2018 (UTC)[reply]
@Primefac: It's just a trivial suggestion for a future update, as long as it's possible to add an auto-sign option to the logged-in user's preferences page, whether it functions as a bot or not. I guess I just need to remember to sign my messages manually for now, until that feature possibly gets implemented, if it would otherwise cause additional work for someone else.— Preceding unsigned comment added by Mkikoen (talkcontribs)
@Mkikoen: sign your edits (see how). The software has no idea whether you're simply modifying an existing comment, modifying a template, adding a piece of text that should not be signed, or whatever. SineBot is a stopgap measure, not a solution. Headbomb {t · c · p · b} 17:14, 15 August 2018 (UTC)[reply]
@Headbomb: Very good point Headbomb. I'll keep in mind to sign my messages manually at least until a feature like that can be implemented in the future so everyone's messages can be auto-signed. --Mkikoen (talk) 17:27, 15 August 2018 (UTC)[reply]

SineBot was offline for the last two weeks and has only just returned, sometime in the past 24hrs or so. -- GreenC 17:42, 15 August 2018 (UTC)[reply]

Mass category change for location userboxes lists

I created the new category Category:Lists of location userboxes to better organize userboxes. I would like to change (and add where it's missing) all entries of

[[Category:Lists of userboxes|.*]] 

to

[[Category:Lists of location userboxes|{{subst:SUBPAGENAME}}]]

in all subpages of following pages:

  1. WP:Userboxes/Life/Citizenship
  2. WP:Userboxes/Life/Origin
  3. WP:Userboxes/Life/Residence
  4. WP:Userboxes/Location
  5. WP:Userboxes/Travel

There are some exceptions: for example, WP:Userboxes/Location/United States/Cities should be [[Category:Lists of location userboxes|United States]], and not [[Category:Lists of location userboxes|Cities]]. It would be nice if the bot could distinguish such subpages (with titles equal to Cities, Regions, States, Nations), but it would be OK if it didn't — there is only a handful of such subpages — they can be updated later manually.

—⁠andrybak (talk) 12:40, 20 August 2018 (UTC)[reply]

It appears that only /Location has subpages, although /Travel has /Travel-2 and /Travel-3, which are a sort of parallel page that might be suitable for this categorization. I might have missed other pages that are not technically subpages. – Jonesey95 (talk) 16:37, 20 August 2018 (UTC)[reply]
Thanks. I guess going by page prefix would be better:
  1. Special:PrefixIndex/Wikipedia:Userboxes/Life/Citizenship
  2. Special:PrefixIndex/Wikipedia:Userboxes/Life/Origin
  3. Special:PrefixIndex/Wikipedia:Userboxes/Life/Residence
  4. Special:PrefixIndex/Wikipedia:Userboxes/Location
  5. Special:PrefixIndex/Wikipedia:Userboxes/Travel
—⁠andrybak (talk) 17:03, 20 August 2018 (UTC)[reply]
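A sketch with pywikibot, going by page prefix as suggested; the handful of Cities/Regions/States/Nations exceptions above are left for manual fixing:

import re
import pywikibot

site = pywikibot.Site("en", "wikipedia")
OLD_CAT = re.compile(r"\[\[Category:Lists of userboxes\|[^\]]*\]\]")
PREFIXES = ["Userboxes/Life/Citizenship", "Userboxes/Life/Origin",
            "Userboxes/Life/Residence", "Userboxes/Location",
            "Userboxes/Travel"]

for prefix in PREFIXES:
    for page in site.allpages(prefix=prefix, namespace=4):  # Wikipedia: ns
        subpage = page.title().rsplit("/", 1)[-1]  # what SUBPAGENAME substs to
        new_cat = "[[Category:Lists of location userboxes|%s]]" % subpage
        text, n = OLD_CAT.subn(new_cat, page.text)
        if n == 0 and new_cat not in text:  # add where missing
            text = text.rstrip() + "\n" + new_cat + "\n"
        if text != page.text:
            page.text = text
            page.save(summary="Recategorize into location userboxes list")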

[r] → [ɾ] in IPA for Spanish

(reviving) A consensus was reached at Help talk:IPA/Spanish#About R to change all instances of r that either occur at the end of a word or precede a consonant (i.e. any symbol except a, e, i, o, u, j, or w) to ɾ inside the first parameter of {{IPA-es}}. There currently appear to be about 1,140 articles in need of this change. Could someone help with this task with a bot? Nardog (talk) 12:55, 20 August 2018 (UTC) – Fixed Nardog (talk) 15:28, 20 August 2018 (UTC)[reply]

Hi Nardog - I have a script, but could you confirm how these testcases should look, perhaps add other unusual cases that might come up:
*{{IPA-es|aˈðrβr|lang}}
*{{IPA-es|aˈðr βr|lang}}
*{{IPA-es|aˈðr-βr|lang}}
*{{IPA-es|aˈðr|lang}}
*{{IPA-es|aˈðrer|lang}}
*{{IPA-es|aˈerβ|lang}}
*{{IPA-es|ri|lang}}
*{{IPA-es|r|lang}}
*{{IPA-es|r r r|lang}}
*{{IPA-es|ir er or|lang}}
Thanks, -- GreenC 14:42, 20 August 2018 (UTC)[reply]
@GreenC: Thanks for taking a stab at this. In principle, any instance of r that is followed by anything (including a space, | or }) except a, e, i, j, o, u, or w in the first parameter of {{IPA-es}} must be replaced with ɾ, so the first seven would be
*{{IPA-es|aˈðɾβɾ|lang}}
*{{IPA-es|aˈðɾ βɾ|lang}}
*{{IPA-es|aˈðɾ-βɾ|lang}}
*{{IPA-es|aˈðɾ|lang}}
*{{IPA-es|aˈðreɾ|lang}} <!-- the first [r] should also be [ɾ] anyway but that is not relevant here -->
*{{IPA-es|aˈeɾβ|lang}}
*{{IPA-es|ri|lang}}
and the final one *{{IPA-es|iɾ eɾ oɾ|lang}}. *{{IPA-es|r|lang}} and *{{IPA-es|r r r|lang}} should probably be unmodified, but I found no such occurrence by searching hastemplate:IPA-es insource:/IPA-es[^\|]*?\|[^\|\}]*?[ \|]r[ \|\}]/. As a side note, a few transclusions include a comment inside the first parameter, but those have been fixed so you can likely exclude them. I've also fixed some other idiosyncratic cases like this and this. Nardog (talk) 15:28, 20 August 2018 (UTC)[reply]
Alright that works. Since it's only 1,140 we can probably do this right away fully supervised. Would you be willing to help verify the edits? I can do 50 or so on the first run to make sure it's working correctly, and then 500 after that. If there are no problems in the 500 the rest can be finished with spot checks (it will be obvious there is a problem by looking at the edit byte count which will be uniform size 0 byte). -- GreenC 16:35, 20 August 2018 (UTC)[reply]
Sounds great! (ɾ is 2 bytes long so it would be +1 per change, by the way.) Nardog (talk) 16:39, 20 August 2018 (UTC)[reply]
Nardog - Alright, just did 73, hard to control the exact number - see diffs for User:GreenC bot (ignore the first 5; a typo in the script caused a catastrophic fail). -- GreenC 17:06, 20 August 2018 (UTC)[reply]
@GreenC: Sorry, please add j and w to the list of symbols that can follow [r] (see the correction to my OP of this thread, I should have made it clearer). Other than that, the corrections are spot on. (2 bytes it was...) Nardog (talk) 17:31, 20 August 2018 (UTC)[reply]

500 done, I'll wait till tomorrow to finish the rest. -- GreenC 19:03, 20 August 2018 (UTC)[reply]

@Nardog: No response from anyone on yesterday's run, which is a good sign, so I went ahead and finished the rest; it's a clean sweep (nice search formulation btw). This script is basically a cut-and-paste "bot complete" for anyone with a bot flag and OAuth credentials; it identifies the article names, downloads the wikisource, makes the changes and uploads the new article with an edit summary. I can do it, or anyone else can in the future as needed. -- GreenC 15:58, 21 August 2018 (UTC)[reply]
Awesome ;) Nardog (talk) 16:48, 21 August 2018 (UTC)[reply]
Script

Dependencies: GNU awk, wikiget.awk, library.awk. MIT License User:GreenC 2018
./wikiget -a ": hastemplate:IPA-es insource:/IPA-es[^\|\}]*?\|[^\|\}\<]*?r[^aeijouw]/" > ipa; awk -ilibrary '{fp=sys2var("./wikiget -w " shquote($0)); c=patsplit(fp,field,/{{IPA-es[^}]*}}/,sep);for(i=1;i<=c;i++){patsplit(field[i],aa,/[|]/,a); sub(/r$/,"ɾ",a[1]); gsub(/r[ ]/,"ɾ ",a[1]); d=patsplit(a[1],b,/r[^aeioujw]/,bb);for(j=1;j<=d;j++) sub(/^r/,"ɾ",b[j]);a[1]=unpatsplit(b,bb); field[i] = unpatsplit(aa,a) } if(unpatsplit(field,sep) != fp){ print shquote($0); sys2varPipe(unpatsplit(field,sep), "./wikiget -E " shquote($0) " -S " shquote("[r] → [ɾ] in IPA for Spanish [[Help_talk:IPA/Spanish#About_R|per discussion]] and [[Wikipedia:Bot_requests#%5Br%5D_%E2%86%92_%5B%C9%BE%5D_in_IPA_for_Spanish|botreq]]") " -P STDIN")}}' ipa
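For anyone without an awk toolchain, the core substitution (minus the search/fetch/save plumbing the one-liner above handles) reduces to a single regex. A Python sketch against the test cases above; note a bare "r" parameter would also be tapped, but the search found no such transclusion:

import re

def retap(ipa):
    """In the first {{IPA-es}} parameter: /r/ not followed by a, e, i, o,
    u, j or w becomes the tap /ɾ/, word-final /r/ included."""
    return re.sub(r"r(?![aeioujw])", "ɾ", ipa)

assert retap("aˈðr βr") == "aˈðɾ βɾ"
assert retap("aˈðrer") == "aˈðreɾ"
assert retap("ir er or") == "iɾ eɾ oɾ"
assert retap("ri") == "ri"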

Removing date headers

Many pages, such as the help desk and all of the reference desks, have level 1 date headers for each day questions are asked. Scsbot automatically adds these headers at the beginning of each day. However, if no questions are asked on a certain day, users have to manually remove the headers, which has to be done quite often for reference desks with less traffic. So I'm wondering, would it be possible to have a bot that removes these headers at the end of a day if no questions were asked?--SkyGazer 512 Oh no, what did I do this time? 01:39, 21 August 2018 (UTC)[reply]

Doing... @SkyGazer 512: It should be possible. Looks like they are all just Heading 1 tags (single =) with a date, then a newline (from the next date addition). I'll start an in-depth look tonight. Ronhjones  (Talk) 15:25, 21 August 2018 (UTC)[reply]
@SkyGazer 512: I've created Category:Wikpedia Help pages with dated sections - it saves having to hard-code page names (and makes any later addition of new help pages a dream); I'll just get the list of pages. Are those all the (current) pages that need fixing? How quickly do you actually want the date removed? I see two options, examples...
  1. August 1 (no edits); August 2 (some edits); August 3 (current day) - remove August 1 (don't have to worry about when Scsbot adds the August 3 heading)
  2. August 1 (no edits); August 2 (current day) - remove August 1 - need to make sure that Scsbot has already been by and updated the page (I note sometimes he can be a few hours late).
Obviously I would prefer option 1 - but 2 is possible, if necessary. Option 1 can be set to be not long after 00:00 UTC. I suspect option 2 would have to be 06:00 to ensure the other bot has edited (obviously if it hasn't edited, then it would skip) Ronhjones  (Talk) 18:46, 21 August 2018 (UTC)[reply]
If Scsbot (talk · contribs) adds them, would it not be possible to ask the botop (Scs) to amend that bot for this new request? What I suggest is that when Scsbot is about to add a date heading, it checks to see if the page presently ends with an unused level 1 date heading and if so, removes that before adding the new heading. --Redrose64 🌹 (talk) 18:52, 21 August 2018 (UTC)[reply]
That might work. I've not started any coding yet; I'll wait until he comments. I think he's using a shell script to add the date - not sure how well that will work for analysing the page and removing the unwanted dates. Ronhjones  (Talk) 19:28, 21 August 2018 (UTC)[reply]
I've pinged him by e-mail as he does not appear to log on often. Ronhjones  (Talk) 19:31, 21 August 2018 (UTC)[reply]
Imo, sooner's better. I personally would support option 2 (that is, remove the August 1 header as soon as the August 2 header is added), although if the first is easier, that would certainly be better than nothing.--SkyGazer 512 Oh no, what did I do this time? 20:43, 21 August 2018 (UTC)[reply]
No problem. We'll plan it for option 2, and see how it pans out. We'll wait for (Scs) to comment first, in case he can kill two birds with one stone. Ronhjones  (Talk) 23:04, 21 August 2018 (UTC)[reply]
Sounds great! Thank you.--SkyGazer 512 Oh no, what did I do this time? 23:11, 21 August 2018 (UTC)[reply]
No time for long explanations, but mods to Scsbot for this purpose are unlikely, so do carry on with Plan B. —Steve Summit (talk) 03:59, 22 August 2018 (UTC)[reply]
Coding... Thanks, Steve. Ronhjones  (Talk) 16:28, 22 August 2018 (UTC)[reply]
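A sketch of the header-removal logic, assuming (per the above) that a day is empty when its level-1 date heading is followed only by blank lines and then another date heading or the end of the page:

import re

DATE_H1 = re.compile(r"^= *[A-Z][a-z]+ \d{1,2} *= *$")

def drop_empty_date_sections(wikitext):
    """Remove level-1 date headings for days on which nothing was posted."""
    lines = wikitext.split("\n")
    out, i = [], 0
    while i < len(lines):
        if DATE_H1.match(lines[i]):
            j = i + 1
            while j < len(lines) and not lines[j].strip():
                j += 1
            if j >= len(lines) or DATE_H1.match(lines[j]):
                i = j  # skip the empty day's heading and trailing blanks
                continue
        out.append(lines[i])
        i += 1
    return "\n".join(out)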

Vandalism from user:194.199.4.202

Hello, this anonymous user is changing verified articles left and right. This is vandalism.

User:194.199.4.202