Wikipedia:Bot requests


This is a page for requesting tasks to be done by bots per the bot policy. This is an appropriate place to put ideas for uncontroversial bot tasks, to get early feedback on ideas for bot tasks (controversial or not), and to seek bot operators for bot tasks. Consensus-building discussions requiring large community input (such as request for comments) should normally be held at WP:VPPROP or other relevant pages (such as a WikiProject's talk page).

You can check the "Commonly Requested Bots" box above to see if a suitable bot already exists for the task you have in mind. If you have a question about a particular bot, contact the bot operator directly via their talk page or the bot's talk page. If a bot is acting improperly, follow the guidance outlined in WP:BOTISSUE. For broader issues and general discussion about bots, see the bot noticeboard.

Before making a request, please see the list of frequently denied bots; such requests are usually declined either because they are too complicated to program or because they do not have consensus from the Wikipedia community. If you are requesting that a template (such as a WikiProject banner) be added to all pages in a particular category, please be careful to check the category tree for any unwanted subcategories. It is best to give a complete list of categories that should be worked through individually, rather than one category to be analyzed recursively (see example difference).

Note to bot operators: The {{BOTREQ}} template can be used to give common responses, and make it easier to keep track of the task's current status. If you complete a request, note that you did with {{BOTREQ|done}}, and archive the request after a few days (WP:1CA is useful here).


Please add your bot requests to the bottom of this page.


Bot to improve names of media sources in references

Many references on Wikipedia point to large media organizations such as the New York Times. However, the names are often abbreviated, not italicized, and/or missing wikilinks to the media organization. I'd like to propose a bot that could go to an article like this one and automatically replace "NY Times" with "New York Times". Other large media organizations (e.g. BBC, Washington Post, and so on) could fairly easily be added, I imagine. - Sdkb (talk) 04:43, 19 November 2018 (UTC)

  • What about the Times's page? The page says: 'The New York Times (sometimes abbreviated as the NYT and NY Times)…' The bot might replace those too and that might be a little confusing… The 2nd Red Guy (talk) 14:55, 23 April 2019 (UTC)
    • And this page too! Wait, what if it changes its own description on its user page? The 2nd Red Guy (talk) 15:43, 23 April 2019 (UTC)
  • I would be wary of WP:CONTEXTBOT. For instance, NYT can refer to a supplement of the Helsingin Sanomat#Format (in addition to the New York Times), and that may be its main use on Finland-related pages. Tigraan Click here to contact me 13:40, 20 November 2018 (UTC)
    • @Tigraan: That's a good point. I think it'd be fairly easy to work around that sort of issue, though: before having any bot make any change to a reference, have it check that the URL goes to the expected website. So in the case of the New York Times, if a reference with "NYT" didn't also contain the URL nytimes.com, it wouldn't make the replacement. There might still be some limitations, but given that the bot is already operating only within the limited domain of a specific field of the citation template, I think there's a fairly low risk that it'd make errors. - Sdkb (talk) 10:52, 25 November 2018 (UTC)
  • I should add that part of the reason I think this is important is that, in addition to just standardizing content, it'd allow people to more easily check whether a source used in a reference is likely to be reliable. - Sdkb (talk) 22:01, 25 November 2018 (UTC)
@Sdkb: This is significantly harder than it seems, as most bots are. Wikipedia is one giant exception - the long tail of unexpected gotchas is very long, particularly on formatting issues. Another problem is agencies (AP, UPI, Reuters). Oftentimes the NYT is running an agency story. The cite should use NYT in the |work= and the agency in the |agency= but often the agency ends up in the |work= field, so the bot couldn't blindly make changes without some considerable room for error. I have a sense of what needs to be done: extract every cite on Enwiki with a |url= containing nytimes.com, extract every |work= from those and create a unique list, manually remove from the list anything that shouldn't belong, like Reuters etc., and then the bot keys off that list before making live changes, so it knows what is safe to change (anything in the list). It's just a hell of a job in terms of time and resources considering all the sites to be processed and manual checks involved. See also Wikipedia:Bots/Dictionary#Cosmetic_edit "the term cosmetic edit is often used to encompass all edits of such little value that the community deems them to not be worth making in bulk" .. this is probably a borderline case, though I have no opinion on which side of the border it falls; other people might during the BRFA. -- GreenC 16:53, 26 November 2018 (UTC)
@GreenC: Thanks for the thought you're putting into considering this idea; I appreciate it. One way the bot could work to avoid that issue is to not key off of URLs, but rather off of the abbreviations. As in, it'd be triggered by the "NYT" in either the work or agency field, and then use the URL just as a confirmation to double check. That way, errors users have made in the citation fields would remain, but at least the format would be improved and no new errors would be introduced. - Sdkb (talk) 08:17, 27 November 2018 (UTC)
Right, that's basically what I was saying also. But to get all the possible abbreviations requires scanning the system because the variety of abbreviations is unknowable ahead of time. Unless we pick a few that might be common, but it would miss a lot. -- GreenC 14:54, 27 November 2018 (UTC)
Well, for NYT at the least, citations with a |url=https://www.nytimes.com/... could be safely assumed to be referring to the New York Times. Headbomb {t · c · p · b} 01:20, 8 December 2018 (UTC)
Yeah, I'm not too worried about comprehensiveness for now; I'd mainly just like to see the bot get off the ground and able to handle the two or three most common abbreviations for maybe half a dozen really big newspapers. From there, I imagine, a framework will be in place that'd then allow the bot to expand to other papers or abbreviations over time. - Sdkb (talk) 07:01, 12 December 2018 (UTC)
Conversation here seems to have died down. Is there anything I can do to move the proposal forward? - Sdkb (talk) 21:42, 14 January 2019 (UTC)
I am not against this idea totally but the bot would have to be a very good one for this to be a net positive and not end up creating more work. Emir of Wikipedia (talk) 22:18, 14 January 2019 (UTC)
@Sdkb: you could build a list of unambiguous cases. E.g. |work/journal/magazine/newspaper/website=NYT combined with |url=https://www.nytimes.com/.... Short of that, it's too much of a WP:CONTEXTBOT. I'll also point out that NY Times isn't exactly obscure/ambiguous either. Headbomb {t · c · p · b} 17:47, 27 January 2019 (UTC)
Okay, here's an initial list:

Sdkb (talk) 03:54, 1 February 2019 (UTC)
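
A minimal sketch of the "unambiguous cases" rule discussed above, assuming plain-text parameter values and citation templates that contain no nested templates. The `EXPANSIONS` table and all function names are hypothetical and are not taken from any existing bot; an abbreviation is expanded only when the host in |url= confirms the publisher, and anything else is left untouched.

```python
import re
from urllib.parse import urlparse

# Hypothetical lookup table: (expected host, abbreviation) -> full name.
EXPANSIONS = {
    ("nytimes.com", "NYT"): "The New York Times",
    ("nytimes.com", "NY Times"): "The New York Times",
}

# Parameters named in the discussion as possible homes for the publication.
WORK_PARAMS = ("work", "journal", "magazine", "newspaper", "website")

# Matches {{cite ...}} templates that contain no nested braces.
CITE_RE = re.compile(r"\{\{\s*cite [^{}]*\}\}", re.IGNORECASE)


def _param(template, name):
    """Return a regex match for |name=value inside one template, or None."""
    return re.search(r"\|\s*" + name + r"\s*=\s*([^|{}]*)", template)


def expand_work(template):
    """Expand a known abbreviation only if the URL host confirms it."""
    url = _param(template, "url")
    if url is None:
        return template
    host = (urlparse(url.group(1).strip()).hostname or "").lower()
    for name in WORK_PARAMS:
        m = _param(template, name)
        if m is None:
            continue
        value = m.group(1).strip()
        for (site, abbr), full in EXPANSIONS.items():
            if value == abbr and (host == site or host.endswith("." + site)):
                start, end = m.span(1)
                # Replace only the value, keeping surrounding whitespace.
                return template[:start] + m.group(1).replace(abbr, full, 1) + template[end:]
    return template


def expand_citations(wikitext):
    """Apply expand_work to every simple citation template in the text."""
    return CITE_RE.sub(lambda m: expand_work(m.group(0)), wikitext)
```

Because the value must equal the abbreviation exactly, wikilinked or piped values are skipped rather than guessed at, which keeps the sketch on the conservative side of WP:CONTEXTBOT.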

What about BYU to Brigham Young University? The 2nd Red Guy (talk) 15:41, 23 April 2019 (UTC)

Changing New York Times to The New York Times would be great. I have seen people going through AWB runs doing it, but seems like a waste of human time. Kees08 (Talk) 23:32, 2 February 2019 (UTC)

@Kees08: Thanks; I added in those cases. - Sdkb (talk) 01:19, 3 February 2019 (UTC)
Not really sure changing Foobar to The Foobar is desired in many cases. WP:CITEVAR will certainly apply to a few of those. For NYT/NY Times, WaPo/Wa Po, WSJ, LA Times/L.A. Times, are those guaranteed to refer to a version of these journals that was actually called by the full name? Meaning, was there at some point in the LA Times's history when "LA Times" or some such was featured on the masthead of the publication, in either print or web form? If so, that's a bad bot task. If not, then there's likely no issue with it. Headbomb {t · c · p · b} 01:54, 3 February 2019 (UTC)
For the "the" publications, it's part of their name, so referring to just "Foobar" is incorrect usage. (It's admittedly a nitpicky correction, but one we may as well make while we're in the process of making what I'd consider more important improvements, namely adding the wikilinks to help readers more easily verify the reliability of a source.) Regarding the question of whether any of those publications ever used the abbreviated name as a formal name for something, I'd doubt it, as it'd be very confusing, but I'm not fully sure how to check that by Googling. - Sdkb (talk) 21:04, 3 February 2019 (UTC)
The omission of 'the' is a legitimate stylistic variation. And even if 'N.Y. Times' never appeared on the masthead, the expansion of abbreviations (e.g. N.Y. Times / L.A. Times) could also be a legitimate stylistic variation. The acronyms (e.g. NYT/WSJ) are much safer to expand though. Headbomb {t · c · p · b} 21:41, 3 February 2019 (UTC)
It is a change I have had to do many times since it is brought up in reviews (FAC usually I think). It would be nice if we could find parameters to make it possible. Going by the article, since December 1, 1896, it has been referred to as The New York Times. The ranges are:
  • September 18, 1851–September 13, 1857 New-York Daily Times
  • September 14, 1857–November 30, 1896 The New-York Times
  • December 1, 1896–current The New York Times
New York Times has never been the title of the newspaper, and we could use date ranges to verify we do not hit the edge cases of pre-December 1, 1896 The New York Times articles. There is The New York Times International Edition, but it seems like it has a different base-URL than nytimes.com. I can go through the effort to verify the names of the other publications throughout the years, but do you agree with my assessment of The New York Times? Kees08 (Talk) 01:51, 4 February 2019 (UTC)
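
The date ranges above could be encoded as a small lookup, so that a citation's date selects the historically correct title. This is only a sketch: the ranges are copied from the comment above, while `TITLE_RANGES` and `title_on` are hypothetical names, and a real task would still need the citation's date parsed reliably first.

```python
from datetime import date

# Title ranges as listed in the comment above; None means "still current".
TITLE_RANGES = [
    (date(1851, 9, 18), date(1857, 9, 13), "New-York Daily Times"),
    (date(1857, 9, 14), date(1896, 11, 30), "The New-York Times"),
    (date(1896, 12, 1), None, "The New York Times"),
]


def title_on(published):
    """Return the newspaper's title on a given date, or None if out of range."""
    for start, end, title in TITLE_RANGES:
        if published >= start and (end is None or published <= end):
            return title
    return None
```

Returning None for dates before the first range gives the caller an explicit signal to skip the citation instead of guessing.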

Is anyone interested in this? I still think it would save me a lot of editing time. Headbomb, did you have further thoughts? Kees08 (Talk) 16:21, 15 March 2019 (UTC)

@Kees08: I definitely still am, but I'm not sure how to move the proposal forward from here. - Sdkb (talk) 21:45, 21 March 2019 (UTC)

Auto-archive IP warnings

I imagine it's fairly confusing for IP users to have to scroll through lots of old warnings from previous users of their IP before getting to their actual message. We have Template:Old IP warnings top (and its partner), but it's rarely used—thoughts on writing a bot to automatically apply it to everything more than a yearish ago? Gaelan 💬✏️ 16:21, 10 January 2019 (UTC)

Technically feasible and is a good idea, IMO. Needs wider community input beyond BOTREQ. -- GreenC 17:09, 10 January 2019 (UTC)
Brought it to WP:VPR. Gaelan 💬✏️ 19:50, 11 January 2019 (UTC)
@Gaelan: FYI it was archived a while back to Wikipedia:Village pump (proposals)/Archive 156#Auto-archive old IP warnings --DannyS712 (talk) 05:39, 12 March 2019 (UTC)

It seems like there is community support to implement this from the discussions. Should we open another discussion to iron out the implementation details? If there is consensus to do this task with a bot, I am willing to do it. Kadane (talk) 05:45, 15 March 2019 (UTC)
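
One way the "more than a yearish ago" test could work is to look at the newest signature timestamp in each section of an IP talk page. The sketch below assumes level-2 section headings, standard English signature timestamps, and an English or C locale for month names; the function names are hypothetical, and it only identifies stale sections without wrapping them in any template.

```python
import re
from datetime import datetime, timedelta

# Standard signature timestamp, e.g. "16:21, 10 January 2019 (UTC)".
SIG_RE = re.compile(r"(\d{2}:\d{2}, \d{1,2} [A-Z][a-z]+ \d{4}) \(UTC\)")
# Level-2 headings only; a third "=" makes the line a subsection instead.
HEADING_RE = re.compile(r"^==[^=].*==[ \t]*$", re.MULTILINE)


def split_sections(wikitext):
    """Split wikitext into (lead, [section, ...]) at level-2 headings."""
    starts = [m.start() for m in HEADING_RE.finditer(wikitext)]
    if not starts:
        return wikitext, []
    bounds = starts + [len(wikitext)]
    sections = [wikitext[a:b] for a, b in zip(bounds, bounds[1:])]
    return wikitext[:starts[0]], sections


def latest_timestamp(section):
    """Return the newest signature time in a section, or None if unsigned."""
    stamps = [datetime.strptime(s, "%H:%M, %d %B %Y") for s in SIG_RE.findall(section)]
    return max(stamps) if stamps else None


def stale_sections(wikitext, now, max_age_days=365):
    """Return sections whose newest signature is older than the cutoff."""
    cutoff = now - timedelta(days=max_age_days)
    stale = []
    for section in split_sections(wikitext)[1]:
        newest = latest_timestamp(section)
        # Sections with no parseable timestamp are left alone.
        if newest is not None and newest < cutoff:
            stale.append(section)
    return stale
```

Using the newest timestamp rather than the oldest means a section with a recent reply is never collapsed.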

Taxa

Bot to create entry in the (English) Wikipedia Category: Plants described in (year)

Data to be taken from Wikidata to give the year of publication of a taxon and create "Category:Taxa described in ()" within the (English) Wikipedia taxon entry, if a Wikipedia entry has been created. MargaretRDonald (talk) 22:55, 22 January 2019 (UTC)

@MargaretRDonald: why? is there any support for such mass-creation? --DannyS712 (talk) 06:04, 1 March 2019 (UTC)
@DannyS712: Currently we have "Category:Taxa named by x". When a user links to the category, he/she gets a ridiculously uninformative list, which fails to include many of the plants authored by x for which there are Wikipedia articles. If there were some automatic creation of the category for a plant article, then the only reason that a plant would be missing from the list of taxa authored would be that there was no Wikipedia article. As it stands, the category:Taxa named by x is ludicrously unhelpful. See for example, Category:Taxa named by Ferdinand von Mueller. (I put this up here in the hope that others might consider the issue and perhaps do something about it.) MargaretRDonald (talk) 06:13, 1 March 2019 (UTC)
@MargaretRDonald: Is this part of the request below? --DannyS712 (talk) 06:16, 1 March 2019 (UTC)
Hi @DannyS712: They are related, but slightly different. It is always clear who named the taxon (the final author). It is somewhat less clear in which year it was described: some Wikipedia editors choose the year of the first publication, while others consider that the person(s) who gave the current name should get the year of publication too, in that they have perfected (refined) the description. Thus, in Decaisnina hollrungii (K.Schum.) Barlow, the year in which the plant is described has been given as that of the publication by K.Schum., but there is no doubt that the taxon was named by Barlow. (I am not sure what the Wikipedia consensus is on this!!) MargaretRDonald (talk) 06:38, 1 March 2019 (UTC)

Bot to create category "Category:Taxa described by ()"

The bot would use the Wikidata taxon entry to find the author of a taxon, and then use it again to find the corresponding author article to find the appropriate author category. (This will not always work, but will work in a large number of cases.) Thus, the English article for "Edward Rudge" corresponds to "Category:Taxa named by Edward Rudge", and the simple strategy outlined here would work for Edward Rudge, Stephen Hopper and .... The category created would be an entry in the article. MargaretRDonald (talk) 23:08, 22 January 2019 (UTC)

@MargaretRDonald: why? is there any support for such mass-creation? Also, what do you mean by the category created would be an entry in the article, and do you want "described by" or "named by"? --DannyS712 (talk) 06:05, 1 March 2019 (UTC)
@DannyS712: 1. See my answer to your preceding question. 2. There are two categories related to authorship and publication: (i) Category:Plants described in (year), and (ii) Category:Taxa named by (author). You can see how they are used in (for example) Velleia paradoxa. For my money I am not sure that I would really want to know what plants were described in 1810, but I would certainly like, when clicking on Category:Taxa named by Robert Brown, to be getting a complete list of wikipedia articles for which this is true. (Hope this explains why I think it important) MargaretRDonald (talk) 06:26, 1 March 2019 (UTC)
@MargaretRDonald: So basically, add "Plants described in ___" and "Taxa named by ___" to all currently existing taxa pages if they are missing? --DannyS712 (talk) 06:35, 1 March 2019 (UTC)
Yes. That would be great. That is, "Category:Plants described in ___" and "Category:Taxa named by ___" to the end of the taxon page.. MargaretRDonald (talk) 06:39, 1 March 2019 (UTC)
@MargaretRDonald: is this at all related to the |authority parameter in {{Speciesbox}} and its ilk? That would make this a lot simpler... --DannyS712 (talk) 06:51, 1 March 2019 (UTC)
@DannyS712: For the author, yes. It is the parameter |authority in {{Speciesbox}}. The year is not. It is found associated with the basionym in the Wikidata entry (an entry which is often missing from Wikidata, but if it exists that would be the safest place to take it from). Most articles show the author of the basionym (the name in the brackets), but have no taxonomy section and even when they do it is unstructured text... So probably the year of the description is in the too-hard basket. (But as I indicated, I find the year category somewhat less important..) MargaretRDonald (talk) 07:07, 1 March 2019 (UTC)
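
A rough sketch of how the |authority value could be turned into a "Taxa named by" category, assuming the value is plain text (no wikilinks or references) and that a separate, manually checked table maps author abbreviations to article names. The `full_names` table and all function names here are hypothetical.

```python
import re

# Captures the |authority= value up to the next pipe, brace, or line end.
AUTHORITY_RE = re.compile(r"\|\s*authority\s*=\s*([^|\n}]*)")


def split_authority(authority):
    """Return (basionym_author, naming_author) from an authority string.

    In "(K.Schum.) Barlow" the bracketed name is the basionym author and
    the name after the brackets is the author of the current name.
    """
    m = re.match(r"\s*\(([^)]*)\)\s*(.*)", authority)
    if m:
        return m.group(1).strip(), m.group(2).strip()
    return None, authority.strip()


def naming_author(wikitext):
    """Return the naming author from a taxobox, or None if not found."""
    m = AUTHORITY_RE.search(wikitext)
    if m is None:
        return None
    return split_authority(m.group(1))[1] or None


def category_for(wikitext, full_names):
    """Build the category title, or None when the author is not in the table."""
    name = full_names.get(naming_author(wikitext))
    return f"Category:Taxa named by {name}" if name else None
```

Returning None for any author missing from the table keeps the bot from inventing category names for abbreviations nobody has verified.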

And if we were to do this the result would be that we would get, e.g., a list of accepted taxa named by John Lindley, and not a whole ragtag list of plants where the assigning of the initial genus is now considered incorrect. In achieving that we could be a far better resource than IPNI. MargaretRDonald (talk) 06:57, 1 March 2019 (UTC)

@MargaretRDonald: I don't think that wikipedia is going to be a better resource than IPNI for this field - ~maybe~ wikispecies? In any event, this task is beyond my abilities, but hopefully my questions have made it clearer to others what you are requesting. --DannyS712 (talk) 07:06, 1 March 2019 (UTC)
@DannyS712: Probably not a better resource, but an extremely useful resource should it list only accepted names. (IPNI lists everything for an author and it then can require checking every name in say tropicos or Plants of the world to find which of them are accepted, a considerable task.) MargaretRDonald (talk) 07:14, 1 March 2019 (UTC)
@MargaretRDonald: yeah, but a bot shouldn't tag those without the same or comparable sources... --DannyS712 (talk) 07:35, 1 March 2019 (UTC)
(Not sure what you are trying to say here..) MargaretRDonald (talk) 07:42, 1 March 2019 (UTC) (In any case my comment on lists of accepted species was little more than a throw-away comment.) I just find it frustrating that "Category:Taxa named by Robert Brown" is not remotely within cooee of being so. And if the parameter authority in the species box were to be used it might just come within cooee of being so. MargaretRDonald (talk) 07:42, 1 March 2019 (UTC)
I'm saying that unless the reliable sources are there, we shouldn't be adding the category, especially not with a bot --DannyS712 (talk) 07:45, 1 March 2019 (UTC)
There are many reliable sources, which usually agree, but like all things requiring man-power, they can be out of sync and people in disagreement. I think it would be better if we used wikidata (with whatever its errors) to populate these categories. The result would be better than the entirely misleading stuff we have now where almost none of the taxa named by a person show up because of the failure by humans to populate the categories. MargaretRDonald (talk) 15:36, 15 March 2019 (UTC)

Detect Hijacked journals

Stop Predatory Journals maintains a list of hijacked journals. Could someone search wikipedia for the presence of hijacked URLs and produce a daily/weekly/whateverly report? Maybe have a WP:WCW task for it too? Headbomb {t · c · p · b} 00:09, 4 February 2019 (UTC)

This is a good idea. Made a script to scrape the site and search WP, it found three domains in 11 articles. -- GreenC 16:50, 4 February 2019 (UTC)

https://scholarlyoa.com/other-pages/hijacked-journals/u

http://www.bnas.org/

  • Emma Yhnell <snippet>wins BSA Award Lecture | News | The British Neuroscience Association". www.bna.org.uk. Retrieved 2018-10-11. Video of Emma Yhnell speaking on public engagement</snippet>
  • Catherine Abbott <snippet>Neuroscience Day 2018 | Events | The British Neuroscience Association". www.bna.org.uk. Retrieved 2018-04-15. "Funding Panel membership | NC3Rs". www.nc3rs</snippet>
  • Irene Tracey <snippet>Winners 2018 Announced! | News | The British Neuroscience Association". www.bna.org.uk. Retrieved 2019-01-04. Tracey, Irene; Farrar, John T.; Okell, Thomas</snippet>
  • John H. Coote <snippet>"Professor John Coote | News | The British Neuroscience Association". www.bna.org.uk. British Neuroscience Association. Retrieved 4 December 2017. "John</snippet>

http://acjournal.in/journal-of-renewable-natural-resources-bhutan

@Headbomb: can post the report on a regular basis if there is a page. Script takes less than 20 seconds to complete so not expensive on resources. -- GreenC 17:02, 4 February 2019 (UTC)

@GreenC:, just a note, bnas.org/ ≠ bnas.org.uk/. Likewise, acjournal.in ≠ acjournal.org. Headbomb {t · c · p · b} 01:03, 6 March 2019 (UTC)
They were found with CirrusSearch (Elasticsearch) it got some close matches. -- GreenC 17:27, 7 March 2019 (UTC)
@GreenC: [1] is a better link than the above one for hijacked journals. It's pretty much the same as the old link, but this one is updated. In particular, there's an additional journal (Arctic, at the very bottom of the page).
There are a few places a report like that could be generated. Category talk:Hijacked journals seems as good a place as any. I'd suggest creating a section and just overwriting it every day (if there's a change). Headbomb {t · c · p · b} 17:46, 7 March 2019 (UTC)
How about WP:Hijacked journals / WP:HIJACKJOURNAL (an essay or how-to) that can define the meaning, describe the problem for Wikipedia, link to external sites, and link to the bot-generated list as a sub-page? Nothing complicated, but a central place for discussion and info that can be linked to from other pages. -- GreenC 18:08, 7 March 2019 (UTC)
If we have a dedicated page, Wikipedia:Reliable sources/Hijacked journals seems to be the natural place to me. Headbomb {t · c · p · b} 18:13, 7 March 2019 (UTC)
I don't want to mess with creating a sub-page in a Guideline document and the needed top-hat navigations etc. You can if you want; let me know. -- GreenC 17:08, 8 March 2019 (UTC)
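
The near-misses noted above (bnas.org versus bna.org.uk) come from fuzzy full-text search. A report could instead compare exact hostnames, as in this sketch. It assumes the list of external links has already been fetched by some other means, and the function names are hypothetical.

```python
from urllib.parse import urlparse


def host_of(url):
    """Return the lowercase hostname of a URL, without a leading 'www.'."""
    # Bare hosts such as "www.example.org" need "//" to parse as a netloc.
    parsed = urlparse(url if "//" in url else "//" + url)
    host = (parsed.hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def hijacked_hits(external_links, hijacked_urls):
    """Return the links whose hostname exactly matches a hijacked host."""
    bad_hosts = {host_of(u) for u in hijacked_urls}
    return [link for link in external_links if host_of(link) in bad_hosts]
```

Matching on the hostname alone would flag every page on a listed host, so hijacked entries that are identified by a path on a shared site would need a stricter prefix comparison.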

Credits adapted from

Thousands of articles about music artists, albums and songs reference the source in the body text (example: OnePointFive). Such references belong in a <ref> block at the end of the page and not in the body text. Most of these references follow a common pattern, so I hope this kind of edit can be made by a bot.

I suggest making a bulk replacement from

= =Track listing= = Credits adapted from [[Tidal (service)|Tidal]].<ref name="Tidal">{{cite web|url=https://listen.tidal.com/album/93301143|title=ONEPOINTFIVE / Aminé on TIDAL|publisher=Tidal|accessdate=August 15, 2018}}</ref>

to

= =Track listing<ref name="Tidal">{{cite web|url=https://listen.tidal.com/album/93301143|title=ONEPOINTFIVE / Aminé on TIDAL|publisher=Tidal|accessdate=August 15, 2018}}</ref>= =

Different sources: Tidal (service), “the album notes”, “the album sleeve”, “the liner notes of XXX”. Different heading names, including “Track listing”, “Personnel”, “Credits and personnel”. Variants: “Credits adapted from XXX”, “All credits adapted from XXX”, “All personnel credits adapted XXX”.

Does this sound feasible/sensible? --C960657 (talk) 17:14, 28 February 2019 (UTC)

References should not be located in section titles. Pretty sure there is a guideline about it, and it is not good for a couple of reasons. The correct way is the current one, or create a line that says "Source: [1]" or something. -- GreenC 17:43, 28 February 2019 (UTC)
"Citations should not be placed within, or on the same line as, section headings." (WP:CITEFOOT) -- JJMC89 (T·C) 03:38, 1 March 2019 (UTC)
Also (from MOS:HEADINGS): Section headings should: ... Not contain links, especially where only part of a heading is linked. Unless you use pure plain-text parenthetical referencing, refs always generate a link. --Redrose64 🌹 (talk) 12:41, 1 March 2019 (UTC)
You are right. I could not find a guideline on how to place the reference if it is the source of an entire section/table/list. "Source: [1]" is a good suggestion, perhaps even moved to the last line of the section. --C960657 (talk) 17:25, 1 March 2019 (UTC)
Note, I replaced == with = = above so the bots that update the TOC of this page can function as normal. Headbomb {t · c · p · b} 22:12, 1 April 2019 (UTC)
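
Whatever replacement wording is eventually agreed, the first step would be finding the affected sections. The sketch below only detects the pattern described in this thread, a heading followed directly by a "Credits adapted from" sentence that ends in a reference, and returns the heading and source so a human or a later step can decide what to do. The pattern and the function name are hypothetical and cover only the listed variants.

```python
import re

# A heading line, then a "Credits adapted from X." sentence followed by a <ref>.
CREDITS_RE = re.compile(
    r"^(?P<eq>=+)[ \t]*(?P<heading>[^=\n]+?)[ \t]*(?P=eq)[ \t]*\n+"
    r"(?:All (?:personnel )?credits|Credits) adapted (?:from )?"
    r"(?P<source>.+?)\.(?=<ref\b)",
    re.MULTILINE,
)


def find_inline_credits(wikitext):
    """Return (heading, source) pairs for sections that open with a credits line."""
    return [(m.group("heading"), m.group("source")) for m in CREDITS_RE.finditer(wikitext)]
```

Keeping this as a report rather than an automatic rewrite fits the replies above, since the thread did not settle on where the reference should finally go.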

Fix 'background' in sortable tables

See background (pardon the pun).

The idea is to change the CSS property background to background-color (and other similar properties) in sortable tables (example). Headbomb {t · c · p · b} 19:14, 5 March 2019 (UTC)

@Magioladitis: The checkwiki team could also get in on this. Headbomb {t · c · p · b} 19:18, 5 March 2019 (UTC)
I would be sensitive here to whether there is another background style declared, as background is shorthand for a number of attributes. Otherwise seems like a good idea. --Izno (talk) 22:44, 5 March 2019 (UTC)
Oppose as written. Changing background to background-style would break all existing uses, because background-style is not a defined property. See CSS Backgrounds and Borders Module Level 3 for examples of valid property names. --Redrose64 🌹 (talk) 13:03, 7 March 2019 (UTC)
@Redrose64:, amended. I meant background-color, not background-style. Headbomb {t · c · p · b} 16:34, 7 March 2019 (UTC)
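
Izno's caution above can be built into the replacement: since background is a shorthand, a sketch like the following renames it only when the whole value is a single colour. It works on the text of one style attribute; `NAMED_COLORS` is a deliberately small illustrative set rather than the full CSS list, and the function name is hypothetical.

```python
import re

# "background" as a whole property name, not "background-color" or "x-background".
DECL_RE = re.compile(r"(?<![-\w])background(\s*:\s*)([^;]+)", re.IGNORECASE)
HEX_RE = re.compile(r"#[0-9a-fA-F]{3}(?:[0-9a-fA-F]{3})?")
# Illustrative subset only; a real task would use the full CSS colour list.
NAMED_COLORS = {"white", "black", "red", "green", "blue", "yellow",
                "silver", "gray", "grey", "orange", "pink", "transparent"}


def fix_style(style):
    """Rename background to background-color when the value is one colour."""
    def repl(m):
        value = m.group(2).strip()
        if HEX_RE.fullmatch(value) or value.lower() in NAMED_COLORS:
            return "background-color" + m.group(1) + m.group(2)
        # Anything else (images, repeat, position, several tokens) is kept.
        return m.group(0)
    return DECL_RE.sub(repl, style)
```

Values such as `url(...) no-repeat` are left alone because changing the property name there would drop the non-colour parts of the shorthand.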

Bot to generate list of editor's creations which have been tagged for improvements

This would be useful for New Page Patrol: it would save us sending multiple messages about an editor's creations (which can cause upset) and show clearly what the problem is and what articles have been identified as needing improvements. This has been requested more than once of me by an editor and I've had to find and list them manually. It would also benefit other editors - I would love to look over which of my creations have tags and improve them. This would give creators (if they want to) the chance to make improvements and bring down the backlogs. Is it feasible? Thanks for looking into this, Boleyn (talk) 08:43, 9 March 2019 (UTC)

@Boleyn: Maybe ask at User talk:Community Tech bot? Currently, Wikipedia:Database reports/Editors eligible for Autopatrol privilege already tracks if a user's pages have been tagged, so they might be able to help (though the code is on github). That specific report is overseen by User:MusikAnimal (WMF), so pinging @MusikAnimal if they want to chime in. --DannyS712 (talk) 08:48, 9 March 2019 (UTC)
Thanks for the suggestions, DannyS712. Boleyn (talk) 08:57, 9 March 2019 (UTC)
I think this is better fit for an external tool, rather than a bot. I have debated for some time adding this functionality to XTools. The problem is the relevant maintenance categories are different on every wiki. I suppose we can just make them configurable. I'll look into it!
In the meantime, quarry:query/34173 is an example query you could use to find such articles. Note that this does not encompass all maintenance categories, just the major ones. You can fork the query and tweak it as desired. Best, MusikAnimal talk 18:42, 9 March 2019 (UTC)
Thanks, MusikAnimal, that's really helpful. Adamtt9, you may want to check this out, and thanks for raising the idea. Boleyn (talk) 08:28, 10 March 2019 (UTC)

Make Articles in Compliance with MOS:SURNAME

I've noticed that a lot of articles are not in compliance with MOS:SURNAME, especially in Category:Living people. I've manually changed a few pages, but as a programmer, I think this could be greatly automated. Any repeats of the full name, or the first name, beyond the title, first sentence, and infobox should not be allowed and should be replaced with the surname. I can help out in creating a bot that can accomplish this. InnovativeInventor (talk) 01:21, 21 March 2019 (UTC)

Just bumped into this: Wikipedia_talk:Manual_of_Style/Biography#Second_mention_of_forenames, so there should be detection of other people with the same last name. Additionally, this bot should intend to provide support for humans, not to automate the whole thing (as context is important). InnovativeInventor (talk) 03:57, 21 March 2019 (UTC)

@InnovativeInventor: Is this about the ordering of names in a category page, or about the use of names in the article prose? --Redrose64 🌹 (talk) 17:07, 21 March 2019 (UTC)
@Redrose64: This is about the reuse of names in the article prose and ensuring that the full name is only mentioned once (excluding ambiguous cases where the full name is necessary to clarify the subject of the sentence). InnovativeInventor (talk) 19:40, 21 March 2019 (UTC)
I don't like this, and I'm calling WP:CONTEXTBOT on it. Consider somebody from Iceland, such as Katrín Jakobsdóttir - the top of the article has
Or somebody from a family with several notable members - have a look at Johann Ambrosius Bach (which is quite short) and consider how it would look if we used only surnames: After Bach's death, his two children, Bach and Bach, moved in with his eldest son, Bach. --Redrose64 🌹 (talk) 21:05, 21 March 2019 (UTC)
@Redrose64: The idea is that this will be a human-assisted bot, not a completely automated bot. Just something that can speed up the process. I agree that it depends on the context. But, it would be nice to assist efforts to regularize articles that are not in compliance with MOS:SURNAME. InnovativeInventor (talk) 03:23, 22 March 2019 (UTC)
InnovativeInventor - Considering it will be human assisted, wouldn't it be better to include the functionality inside AWB or create a user script? Kadane (talk) 21:35, 22 March 2019 (UTC)
Kadane I think something that can crawl all of Wikipedia's bio pages would be better. Not sure though. I'm not familiar with the best way to help regularize all the bio pages. InnovativeInventor (talk) 23:46, 22 March 2019 (UTC)
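
For the human-assisted version discussed above, a helper might simply report where the full name is repeated and whether other people with the same surname appear, leaving every edit to the reviewer. This is a sketch with hypothetical names; it assumes the surname is the last word of the full name, so it does not suit patronymic names such as the Icelandic example, and sentence-initial capitalised words can produce false "same surname" entries.

```python
import re


def review_report(prose, full_name):
    """List repeat mentions of the full name and other bearers of the surname."""
    surname = full_name.split()[-1]
    # Offsets of every mention after the first one.
    repeats = [m.start() for m in re.finditer(re.escape(full_name), prose)][1:]
    # Capitalised word runs ending in the surname, e.g. "Johann Sebastian Bach".
    names = set(re.findall(r"(?:[A-Z][\w.]+ )+" + re.escape(surname) + r"\b", prose))
    return {
        "repeat_offsets": repeats,
        "same_surname": sorted(names - {full_name}),
    }
```

A non-empty `same_surname` list is a signal that shortening to the surname could be ambiguous, as in the Bach family example above.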

A heads up for AfD closers re: PROD eligibility when approaching NOQUORUM

When an AfD discussion ends with no discussion, WP:NOQUORUM indicates that the closing admin should treat the article as one would treat an expired PROD. One mundane part of this process is specifically checking whether the article is eligible for PROD ("the page is not a redirect, never previously proposed for deletion, never undeleted, and never subject to a deletion discussion"). It would be really nice, when an AfD listing is reaching full term (seven days) with no discussion, if a bot could check the subject's page history and leave a comment on, say, the beginning of the listing's seventh day as to whether the article is eligible for PROD (a simple yes/no). If impossible to check each aspect of PROD eligibility, it would at least be helpful to know whether the article has been proposed for deletion before, rather than having to scour the page history. A bot here could help the closing admin more easily determine whether to relist or soft delete. More discussion here. czar 21:12, 23 March 2019 (UTC)

@Czar: preliminary thoughts:
  • not currently a redirect - detectable via api ([2])
  • Never previously proposed for deletion: search in past edit summaries?
  • Never undeleted - log events for the page ([3])
  • Never subject to a prior deletion discussion: check if the title contains 2nd, 3rd, etc nomination.
Does that sound about right in terms of automatically verifying prod eligibility? --DannyS712 (talk) 21:37, 23 March 2019 (UTC)
@DannyS712, I would add to #4: check the talk page for history templates indicating prior deletion listings. E.g., it's possible that the previous AfD was under a different article title altogether. (Since those instances would get complicated, would also be helpful for the AfD comment to note if the article was previously live under another title so the closer can manually investigate.) re: #2, I would consider searching edit summaries for either added or removed PRODs or mentions of deletion (as PRODs not added via script may have bad edit summaries). Otherwise this sounds great to me! czar 21:54, 23 March 2019 (UTC)
@Czar: okay, this seems like something I could do, but it would be a while before a bot was up and running. As far as I can tell, the hardest part will be parsing the AfD itself - how to detect if other users have cast a !vote, rather than just commenting, sorting the AfD, etc. Furthermore, since I'm not very original and implement most of my bot tasks via either AWB (not very usable in this case) or JavaScript, the JavaScript bot tasks are generally just automatically running a user script on multiple pages. So first, I will be able to have a script that alerts the user if an AfD could be subject to PROD, and then post such notices automatically. The first part is just a normal user script, so it (I think) doesn't need a BRFA, and I'll let you know when I have a working alpha and am ready to start integrating the second part. This will be a while though, so if anyone else wants to tackle this bot request I won't be offended :). Thanks, --DannyS712 (talk) 22:07, 23 March 2019 (UTC)
You should look to see how the AFD counter script counts votes. That aside, the first iteration can always just add the information regardless. --Izno (talk) 00:44, 24 March 2019 (UTC)
This seems vaguely related to this discussion on VPT. --Izno (talk) 00:41, 24 March 2019 (UTC)
Yes Izno, you are correct. I will make a note there that a bot request is the manner being pursued. I think your idea of an edit filter might also be useful. That would ensure the presence of a specific string of text in the edit summary which the bot could search for IAW #2. I agree that simply adding a message to the effect that the subject being discussed either is or is not eligible for soft deletion without relisting would be good for the initial iteration and suggest that it might be best to maintain that as the functional standard indefinitely. I do want to thank the many editors who have stepped up to assist in this effort. I am proud of my affiliation with such a fine lot. Sincerely.--John Cline (talk) 01:52, 24 March 2019 (UTC)
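The eligibility checks listed above, plus the talk-page check suggested in reply, could be sketched roughly as follows. This is only an illustration under stated assumptions: the PageFacts container, the regular expressions, the "restore" log action name and the talk-page template names are all hypothetical stand-ins, and the data is assumed to have been fetched from the API beforehand.

```python
import re
from dataclasses import dataclass, field


@dataclass
class PageFacts:
    """Hypothetical container for facts a bot has already fetched via the API."""
    title: str
    is_redirect: bool
    edit_summaries: list = field(default_factory=list)
    log_actions: list = field(default_factory=list)  # assumed names, e.g. "restore"
    talk_wikitext: str = ""


# Assumed patterns; real edit summaries and templates would need reviewing.
PROD_SUMMARY = re.compile(
    r"\b(?:de)?prod(?:ded)?\b|proposed (?:for )?deletion", re.IGNORECASE)
NOMINATION = re.compile(r"\((?:2nd|3rd|\d+th) nomination\)", re.IGNORECASE)
TALK_HISTORY = re.compile(
    r"\{\{\s*(?:old (?:afd|xfd|prod)|article ?history)", re.IGNORECASE)


def prod_ineligibility_reasons(page, afd_title):
    """Return the reasons a page looks ineligible for PROD-style soft deletion."""
    reasons = []
    if page.is_redirect:
        reasons.append("currently a redirect")
    if any(PROD_SUMMARY.search(s) for s in page.edit_summaries):
        reasons.append("an edit summary mentions a PROD")
    if "restore" in page.log_actions:
        reasons.append("previously undeleted")
    if NOMINATION.search(afd_title):
        reasons.append("AfD title indicates a prior nomination")
    if TALK_HISTORY.search(page.talk_wikitext):
        reasons.append("talk page carries a deletion history template")
    return reasons
```

An empty list would mean none of the checks fired; a first iteration could simply post the list in the AfD and leave the judgement to the closer.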

Indian settlements: updating census data

Most articles on settlements in India (eg. Bambolim) still use 2001 census data. They need to be updated to use the 2011 census data. SD0001 (talk) 18:10, 29 March 2019 (UTC)

Is 2011 Census data available on WikiData? Template:Austria metadata Wikidata provides an example template and User:GreenC bot/Job 12 was a recent BRFA to add the template to Austria settlement articles: Example. -- GreenC 19:16, 29 March 2019 (UTC)
I don't think they're there on wikidata. This site does provide the data in what could be considered machine-readable format, though. SD0001 (talk) 16:08, 30 March 2019 (UTC)
Another site is https://www.census2011.co.in If these sites were scraped and converted to CSV, the data could be uploaded to Wikidata via Wikipedia:Uploading metadata to Wikidata. Although this is a big job given the size of India, and the next census is in 2021, when it would be done over again. The number of potential locations must be immense, I went to http://www.censusindia.gov.in/pca/Searchdata.aspx and entered "Hyderabad" and it brought up a list of villages one having a population of 40 people, although which village of "Haiderpur" it is who knows as there are many listed. -- GreenC 17:28, 30 March 2019 (UTC)
The link I've given above already has the data in Excel format. Ignore the columns part-A ebook and part-B ebook, what we need are the ones under "Town amenities" and "Village amenities". That's two Excel sheets for each of the 35 Indian states and union territories. Some of these files are phenomenally large as you said - Andhra Pradesh contains 27800 villages, for instance. SD0001 (talk) 20:54, 30 March 2019 (UTC)
Ah I see better. Checking Assam "Town Amenities" spreadsheet, for "Goalpara" (line #17), it has a population of 11,617 but our Goalpara says 48,911. If we assume this is for the Goalpara district it is 1,008,959, but in the spreadsheet it only adds up to about 20,000 (line #15-#18). Since most people there speak Goalpariya it seems unlikely there was a sudden population loss due to emigration. Are the spreadsheet numbers in some other counting system, or decimal offset? -- GreenC 22:34, 30 March 2019 (UTC)
GreenC, 11617 is the number of households. Population is 53430, which is reasonable. To get total population of Goalpara district, you need to add up populations in line #15-#25 plus line #2161-#2989 in 'Village amenities' sheet, which roughly gives a figure close to 1,008,959. SD0001 (talk) 23:22, 30 March 2019 (UTC)
Ah thanks again, SD0001! A program to extract and collate the data looks like the next step. I can't do it immediately as I am backlogged with programming projects. Extracting the data and uploading to Wikipedia per Wikipedia:Uploading metadata to Wikidata would be more than half the battle. Also ping User:Underlying lk who made the Wikidata instructions. -- GreenC 00:19, 31 March 2019 (UTC)
It seems like we have 2011 population figures for over 70,000 Wikidata entities, though once we only consider entities with an en.wiki article, it drops to less than 4,000.--eh bien mon prince (talk) 05:15, 31 March 2019 (UTC)
Interesting queries, thanks. Notice some Wikidata entries are referenced some not. Probably the data was loaded by different processes with variable levels of reliability and completeness. I would not be comfortable loading into encyclopedia until it has been checked against a known source and the source field updated. Found Administrative divisions of India helpful to understand the census divisions though the more I look the bigger and more complex it becomes. -- GreenC 14:14, 31 March 2019 (UTC)
@Magnus Manske: This might be Gameable. --Izno (talk) 15:57, 31 March 2019 (UTC)
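For the "extract and collate" step discussed above, a minimal sketch of summing population per district from spreadsheet rows exported to CSV might look like this. The column names (district, name, households, population) and the sample rows are invented for illustration and do not reflect the real census files; the point is only that the households and population columns must not be confused.

```python
import csv
import io
from collections import defaultdict


def population_by_district(csv_text):
    """Sum the (hypothetical) 'population' column for each 'district' value."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Strip thousands separators before converting; households are ignored.
        totals[row["district"]] += int(row["population"].replace(",", ""))
    return dict(totals)


# Invented sample rows, not census data.
SAMPLE = """district,name,households,population
District X,Town A,100,450
District X,Village B,40,"1,200"
District Y,Town C,10,50
"""
```

Calling population_by_district(SAMPLE) returns one total per district, which could then be compared against a known figure before anything is uploaded.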

CFDS tagging and listing for "eSports" categories

There was a request to move categories with "eSports" to "esports" per WP:C2D at WT:VG, but that list is sizable. Is there someone here who can take care of the listing and tagging? (Avoid the WikiProject assessment categories.) --Izno (talk) 18:04, 31 March 2019 (UTC)

@Izno: sure, I've added it to my current BRFA (WP:Bots/Requests for approval/DannyS712 bot 13) as a request for a trial. --DannyS712 (talk) 19:50, 1 April 2019 (UTC)
 Done --DannyS712 (talk) 22:31, 11 April 2019 (UTC)

WikiProject Civil Rights Movement

I'm trying to set up a bot to perform assessment and tagging work for Wikipedia:WikiProject Civil Rights Movement. The bot would need to rely only on keywords present in pages. The bot would provide a list of prospective pages that appear to satisfy the rules given to it. An example of what the project is seeking is something similar to User:InceptionBot. WikiProject Civil Rights Movement uses that bot to generate the report Wikipedia:WikiProject Civil Rights Movement/New articles. Whereas that bot generates a report of new pages, the desired bot would assess old pages. Mitchumch (talk) 16:27, 1 April 2019 (UTC)

At Wikipedia:Village pump (technical)#Assessment and tagging bot I didn't intend that you should try to set up your own bot. There are plenty of bots already authorised to carry out WikiProject tagging runs. Just describe the selection criteria, and we'll see who picks it up. --Redrose64 🌹 (talk) 19:46, 1 April 2019 (UTC)
The selection criteria are keywords on pages:
  • civil rights movement
  • civil rights activist
  • black panther party
  • black power
  • martin luther king
  • student nonviolent coordinating committee
  • congress of racial equality
  • national association for the advancement of colored people
  • naacp
  • urban league
  • southern christian leadership conference
Mitchumch (talk) 22:02, 1 April 2019 (UTC)
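A keyword screen along the lines of the list above could be sketched as below. It is an assumption-laden illustration: it matches whole phrases case-insensitively in plain page text, and the function name matched_keywords is hypothetical. A real run would still need human review of the resulting list.

```python
import re

# Keywords taken from the list above (with "association" spelled out in full).
KEYWORDS = [
    "civil rights movement",
    "civil rights activist",
    "black panther party",
    "black power",
    "martin luther king",
    "student nonviolent coordinating committee",
    "congress of racial equality",
    "national association for the advancement of colored people",
    "naacp",
    "urban league",
    "southern christian leadership conference",
]

# One alternation with word boundaries, so "naacp" does not match inside a longer word.
PATTERN = re.compile(
    "|".join(r"\b" + re.escape(k) + r"\b" for k in KEYWORDS), re.IGNORECASE)


def matched_keywords(text):
    """Return the distinct keywords found in the text, lower-cased and sorted."""
    return sorted({m.group(0).lower() for m in PATTERN.finditer(text)})
```

Pages for which matched_keywords returns a non-empty list would go onto the prospective-pages report rather than being tagged directly.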

Deal with links to split article (Batting average)

About 6 months ago Batting average was split into a short parent article about the concept of batting average across sports and 2 child articles Batting average (cricket) and Batting average (baseball) dealing with the specifics of the metric in the individual sports. Articles related to each sport still point to the parent article but should generally point to the sport specific one. After some searches using AWB, I found just over 15k links to Batting average. Using a recursive category search, I found that Category:Cricketers, Category:Seasons in cricket and Category:Years in cricket account for about 3k links and Category:Baseball players, Category:Seasons in baseball, Category:Years in baseball about 12k. There are about 300 remaining links in none of these categories, I am working through those manually with AWB. As an aside, a lot of the baseball players have a link in both an infobox and in article text. I had the cricketer infobox changed already, as that had a hardcoded link to the parent article.

The plan would be to replace

  • [[Batting average]] with [[Batting average (cricket)|]]
  • [[Batting average|foo]] with [[Batting average (cricket)|foo]]

in the first set of categories and

  • [[Batting average]] with [[Batting average (baseball)|]]
  • [[Batting average|foo]] with [[Batting average (baseball)|foo]]

in the second set. A lot of the non-piped links use lower-case, so don't know if that needs another set of rules. I'm also assuming that the pipe trick works in bot edits, otherwise the replacement text will need to be slightly expanded. I can provide the lists I created of the links to the article, of the categories and then intersections if this helps. Spike 'em (talk) 20:27, 1 April 2019 (UTC)

pipe trick works in bot edits It does outside of references and other tags. --Izno (talk) 20:42, 1 April 2019 (UTC)
Why not:
  • [[Batting average]] with [[Batting average (cricket)|Batting average]]
It would be standard, and less error prone for other bots/tools. -- GreenC 14:26, 2 April 2019 (UTC)
Sure, no problem with that. As I said, the above relies on the pipe trick and it should be no different for a bot to replace the string with a slightly longer one. Spike 'em (talk) 14:46, 2 April 2019 (UTC)
Another idea is a bot could word check for "cricket" in baseball articles and "baseball" in the cricket articles and log those aside. To help avoid cases where a cricket article might be talking about baseball (rare for sure). -- GreenC 15:03, 2 April 2019 (UTC)
Could do. I did find 8 (all Australian) cricketers who were in both playing categories and did them manually, so there may be more. Spike 'em (talk) 15:08, 2 April 2019 (UTC)
As per comments on the BRFA page, a few articles in each category (less than 1%) mention the other sport. Many of these mentions are in hatnotes. Spike 'em (talk) 11:16, 9 April 2019 (UTC)
BRFA filed -- GreenC 00:26, 3 April 2019 (UTC)
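The replacement rules discussed in this thread, using the explicit piped form rather than the pipe trick and preserving lower-case link text, could be sketched like this. The regular expression and the retarget function are illustrative assumptions, not the code used in the BRFA.

```python
import re

# Matches [[Batting average]], [[batting average]] and [[Batting average|label]].
# Links to sections or to the already-split articles are deliberately not matched.
LINK = re.compile(r"\[\[\s*([Bb])atting average\s*(?:\|([^\]\[]*))?\]\]")


def retarget(wikitext, sport):
    """Point plain 'Batting average' links at the sport-specific article."""
    def repl(match):
        first_letter = match.group(1)
        # Keep an existing label; otherwise reuse the original link text and case.
        label = match.group(2) if match.group(2) else first_letter + "atting average"
        return "[[Batting average (%s)|%s]]" % (sport, label)
    return LINK.sub(repl, wikitext)
```

For example, retarget(text, "cricket") would be applied to pages from the cricket category lists and retarget(text, "baseball") to the baseball ones, with pages mentioning the other sport logged aside as suggested above.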

Population for Spanish municipalities

Adequately sourced population figures for all Spanish municipalities can be deployed by using {{Spain metadata Wikidata}}, as was recently done for Austria. See this diff for an example of the change.--eh bien mon prince (talk) 11:35, 11 April 2019 (UTC)

BRFA filed Well, since my bot for Austria is already written and completed, I might as well do this too. -- GreenC 13:47, 11 April 2019 (UTC)

Russia district maps

Replace image_map with {{Russia district OSM map}} for all the articles on this list, as in this diff. The maps are already displayed in the articles, but currently this is achieved through a long switch function on {{Infobox Russian district}}; transcluding the template directly would be more efficient.--eh bien mon prince (talk) 11:58, 11 April 2019 (UTC)

@Underlying lk: should be pretty similar to the German maps, right? --DannyS712 (talk) 22:31, 11 April 2019 (UTC)
Yes pretty much. In fact, the German template is based on this one.--eh bien mon prince (talk) 13:26, 12 April 2019 (UTC)
@Underlying lk: I can do this. I have a few BRFAs currently open, but once some finish I'll file one for this task --DannyS712 (talk) 04:20, 14 April 2019 (UTC)

Category:Pages using deprecated image syntax

Category:Pages using deprecated image syntax has over 89k pages listed, making manually fixing these not possible. Could a bot be created to handle this? --Gonnym (talk) 06:18, 12 April 2019 (UTC)

@Gonnym: I might be able to help, but can you give some examples of the specific edits that would need to be made (ideally with diffs) and how to screen for those? Thanks, --DannyS712 (talk) 06:26, 12 April 2019 (UTC)
Pages in this category use a template that uses Module:InfoboxImage in a {{#invoke:InfoboxImage|InfoboxImage|image={{{image|}}}|size={{{image_size|}}}|sizedefault=frameless|upright={{{image_upright|1}}}|alt={{{alt|}}}}} style that pass to the |image= field an image syntax in the format |image=File:Example.jpg. However, as per usual when dealing with templates, the exact parameters used and their names will differ between the templates. So for example:
  • {{Infobox television}} has {{#invoke:InfoboxImage|InfoboxImage|image={{{image|}}}|size={{{image_size|}}}|sizedefault=frameless|upright={{{image_upright|1.13}}}<!-- 1.13 is the most common size used in TV articles. -->|alt={{{image_alt|{{{alt|}}}}}}}}
  • {{Infobox television season}} has {{#invoke:InfoboxImage|InfoboxImage|image={{{image|}}}|size={{{image_size|{{{imagesize|}}}}}}|sizedefault=frameless|upright={{{image_upright|1}}}|alt={{{image_alt|{{{alt|}}}}}}}}
  • {{Infobox television episode}} has {{#invoke:InfoboxImage|InfoboxImage|image={{{image|}}}|size={{{image_size|}}}|sizedefault=frameless|alt={{{alt|}}}}}

Also, an image isn't the only value that can be passed in |image=File:Example.jpg, but it sometimes is combined with an image size and caption, which will need to be extracted and passed through the correct parameters. --Gonnym (talk) 06:37, 12 April 2019 (UTC)

@Gonnym: okay, now it looks way more complicated. Maybe 1 infobox at a time. Can you provide some diffs for a few different types of cases with an infobox of your choice? Thanks, --DannyS712 (talk) 06:41, 12 April 2019 (UTC)
  • The West Wing (season 3) ({{Infobox television season}}) has image=[[File:West Wing S3 DVD.jpg|250px]]. Instead it should be, |image=West Wing S3 DVD.jpg and |image_size=250px (it can also be without "px" as the module does that automatically).
  • Red Dwarf X has image=[[File:Red Dwarf X logo.jpg|alt=Logo for the tenth series of ''Red Dwarf''|250px]]. Instead it should be, |image=Red Dwarf X logo.jpg, |image_size=250px and |image_alt=Logo for the tenth series of Red Dwarf.
For a better systematic approach though, maybe it would be better finding out what the top faulty templates are, and create a mapping of what parameters the templates use and their names. If the bot can check the template name and know what parameters to use, this should speed things up.--Gonnym (talk) 07:00, 12 April 2019 (UTC)
@Gonnym: And now I'm completely lost. I don't think I'm the right bot op to help with this, sorry. --DannyS712 (talk) 07:02, 12 April 2019 (UTC)
I think someone could start with {{Infobox election}}, which appears to have roughly 11,000 articles in the error category. Here's a sample edit. Basically, for this template, you need to remove the initial brackets and the "File:" part of the image parameter value, then move the pixel specification (which may come in a variety of forms, like "x150px" or "150x150px") to the next line to a new |image#_size= parameter. The number "#" needs to match the image# parameter, e.g. |image2= gets |image2_size=. Drop me a line if this is confusing; I feel like it's a lot to explain in a short paragraph.
This may be a good mini-project to discuss at length at Category talk:Pages using deprecated image syntax. – Jonesey95 (talk) 07:59, 12 April 2019 (UTC)
In many cases, the |image_size=250px (or equivalent) may simply be omitted, because most infoboxes are set up to use a default size where none has been set (example). In my opinion, falling back to the default is preferable since it gives a consistent look between articles. --Redrose64 🌹 (talk) 12:46, 12 April 2019 (UTC)
Mostly true, but unfortunately, that is not the case at {{Infobox election}}, as you can see in this before-and-after comparison. – Jonesey95 (talk) 13:15, 12 April 2019 (UTC)
It appears that Number 57 (talk · contribs) is against the proposal. --Redrose64 🌹 (talk) 13:44, 12 April 2019 (UTC)
I guess I was pinged because of this edit? I don't really understand what is being discussed here, but removing the image size parameters like this edit means that the images in the infobox are different sizes – is this because there is no default size for this infobox, or the default size is for a single dimension (and not all photos have the same aspect ratio)? Can the default size be set to 150x150 (which is the most commonly used size)? Cheers, Number 57 13:52, 12 April 2019 (UTC)
{{Infobox election}} has a default size of 50px for |flag_image=, 300px for |map_image#=, and no default for |image#=, which then defaults to frameless (which I'm not sure what it is). If there is a correct size that the template should use, then the template should probably be edited to handle it. --Gonnym (talk) 14:02, 12 April 2019 (UTC)
(edit conflict) @Number 57: If you use the |image1=[[File:Soleiman Eskandari.jpg|150x150px]] format it puts the page into Category:Pages using deprecated image syntax, because the parameter is intended for a bare filename and nothing else, as in |image1=Soleiman Eskandari.jpg. --Redrose64 🌹 (talk) 14:05, 12 April 2019 (UTC)
OK. I have no problem with using some other way to get matching image sizes, but if it is added as a default, it needs to be two-dimensional, otherwise it ends up in a bit of a mess where images have different aspect ratios. Number 57 14:07, 12 April 2019 (UTC)
Redrose64: your edit, like my edit that I linked above (and self-reverted) resulted in image sizes that look bad. Either the template needs to be modified, or the image sizes need to be preserved in template parameter values within the article, but removing them changes the image rendering in a negative way in that article (and presumably others). – Jonesey95 (talk) 17:00, 12 April 2019 (UTC)
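The conversion described in this thread, from a full [[File:...]] value to a bare filename plus separate size and alt parameters, might be sketched as follows. The parameter names are passed in because they differ between templates; split_image_value and its defaults are hypothetical, and anything it cannot classify (captions, thumb, nested links) is left for manual handling by returning None. The size is preserved rather than dropped, given the {{Infobox election}} rendering concerns raised above.

```python
import re

# A value that is wholly a [[File:...]] or [[Image:...]] link with optional options.
WRAPPED = re.compile(
    r"^\[\[\s*(?:File|Image)\s*:\s*([^|\]]+?)\s*((?:\|[^\]]*)?)\]\]$",
    re.IGNORECASE)
# Sizes such as 250px, x150px or 150x150px.
SIZE = re.compile(r"^\d*x?\d+px$")


def split_image_value(value, size_param="image_size", alt_param="image_alt"):
    """Split a deprecated image value into infobox parameters, or return None."""
    match = WRAPPED.match(value.strip())
    if not match:
        return None
    result = {"image": match.group(1)}
    for part in match.group(2).split("|")[1:]:
        part = part.strip()
        if SIZE.match(part):
            result[size_param] = part
        elif part.lower().startswith("alt="):
            result[alt_param] = part[4:].strip()
        else:
            return None  # caption, thumb, etc.: leave for a human
    return result
```

For a numbered parameter such as |image2=, the caller would pass size_param="image2_size" so that the number matches.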

Wikipedia:Good articles/mismatches

The Wikipedia:Good articles/mismatches page details some conflicts with good articles and usually indicates a mistake of some sort that needs to be sorted out. Category:Good articles means that an article has the green spot that indicates it is classified as good, while Category:Wikipedia good articles are articles which have undergone a review. So the In Category:Good articles but not Category:Wikipedia good articles indicates that a good article symbol may be present on an article that has not actually undergone a review. Wikipedia:Good articles/all is a list of all good articles and is manually updated. The last two headings usually indicate articles that have not been added after passing a review or removed after being delisted.

This page was originally created by JJMC89 a year ago using AWB after I requested it. At the time it contained thousands of mismatches[4]. We have just resolved all those, mainly through the efforts of DepressedPer. I was hoping there could be a bot that would update the page periodically so we can keep on top of any further mismatches. I have tried running it myself through AWB, but the number of articles is too large to do in one hit. There was also an issue that articles that had been moved would show up as a mismatch if the name was different at the Wikipedia:Good articles/all page. Maybe there is a better workaround for this, the last time I just renamed the articles at the GA list but that was quite time consuming. Regards AIRcorn (talk) 04:47, 13 April 2019 (UTC)

@Aircorn: I can run it with AWB, my computer seems to be able to handle it. Do you know what AWB conditions they used? --DannyS712 (talk) 05:02, 13 April 2019 (UTC)
Thanks for the offer. I believe it is more a numbers thing than a power one (although my computer is certainly lacking the last one). From my understanding you need to be an administrator to run AWB above a certain number of entries. If I run it it misses a whole lot. As far as I can tell it was made by going to tools and using list comparer. You add each category as a source (or links in the case of Wikipedia:Good articles/all) and compare. It should show you which ones were unique in each list and which were common. The unique ones can then be saved and copied under the correct header in Wikipedia:Good articles/mismatches. It is not too laborious, but if I run it it only pulls 25000 from Category:Good articles when there are nearly 30000 entries. Also I don't know how to account for redirects. If you can figure out a better way to make it work it would be interesting to see how many new mismatches have occurred in the last year, but I was ultimately hoping for a more automated update process. AIRcorn (talk) 06:01, 13 April 2019 (UTC)
I would be happy to set up an automated cron-based bot on Toolforge to do this. There would be no manual processes involved; it would run at a set time and be totally hands off. It will pull the data via the API. This is something my tool wikiget does. I even have an example of how to do list compare in the documentation. -- GreenC 06:22, 13 April 2019 (UTC)
@GreenC: That sounds awesome. It probably doesn't need to run too often. I am not terribly tech literate, but let me know if I can help in some other way. AIRcorn (talk) 01:47, 15 April 2019 (UTC)
Ok following up at Wikipedia talk:Good articles/mismatches. -- GreenC 15:32, 15 April 2019 (UTC)
Petscan is your friend. In GA but not WGA; In WGA but not GA; in GA but not linked from Wikipedia:GA/All; linked from Wikipedia:GA/All but not in GA. You can get different output on the output tab. --Izno (talk) 13:34, 13 April 2019 (UTC)
Thanks. However, the first link has pulled all the articles in the category and the second none. Might be to do with one category being on the talk page and the other on the article page. AIRcorn (talk) 01:47, 15 April 2019 (UTC)
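The list comparison itself reduces to set differences once titles are normalised. The sketch below assumes the three title lists have already been fetched, strips a "Talk:" prefix (since one category may sit on talk pages, as noted above) and resolves moved pages through a redirect map. The function names and the redirect map are hypothetical; this is not how wikiget or AWB implement it.

```python
def normalise(titles, redirects=None):
    """Strip a 'Talk:' prefix and follow known redirects (old title -> new title)."""
    redirects = redirects or {}
    result = set()
    for title in titles:
        if title.startswith("Talk:"):
            title = title[len("Talk:"):]
        result.add(redirects.get(title, title))
    return result


def mismatches(ga_cat, wga_cat, ga_all_links, redirects=None):
    """Compare Category:Good articles, Category:Wikipedia good articles and GA/all."""
    ga = normalise(ga_cat, redirects)
    wga = normalise(wga_cat, redirects)
    listed = normalise(ga_all_links, redirects)
    return {
        "in GA but not WGA": sorted(ga - wga),
        "in WGA but not GA": sorted(wga - ga),
        "in GA but not listed": sorted(ga - listed),
        "listed but not in GA": sorted(listed - ga),
    }
```

Each key corresponds to one heading on the mismatches page, so the output could be written there directly by a scheduled job.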

Request to add "List of Medal of Honor in non-combat incidents" in 185 articles of recipients that received them.

Request to add "List of Medal of Honor recipients in non-combat incidents" to the 185 recipient articles that still use the old main article's title. — Preceding unsigned comment added by XXzoonamiXX (talk • contribs) 04:02, 14 April 2019 (UTC)

@XXzoonamiXX: can you explain? Do you just want the redirects to be bypassed? (Replace the old title with the new title in links?) --DannyS712 (talk) 04:12, 14 April 2019 (UTC)
There are 185 persons with the old main article's title that I just recently changed so yes replace the old title of each recipient with the new title I changed in the "See also" sections. XXzoonamiXX (talk) 04:22, 14 April 2019 (UTC)
@XXzoonamiXX: the old title is now a redirect to the new page, so that's not needed. See Wikipedia:Redirect#Do not "fix" links to redirects that are not broken for more. --DannyS712 (talk) 04:30, 14 April 2019 (UTC)
I'm not talking about that, I'm talking about editing and changing the old title into the new one in many recipients' "See also" sections. Otherwise, it'll give people the impression that it's what the old title implied rather than clicking on it for a deeper subject. — Preceding unsigned comment added by XXzoonamiXX (talk • contribs) 04:51, 14 April 2019 (UTC)
In that case, I suggest getting consensus at the article's talk page first. I'd be happy to file a BRFA once there is clear support for the changing of the links. Also, please remember to sign your comments. Thanks, --DannyS712 (talk) 05:09, 14 April 2019 (UTC)

"Accidents and incidents"[edit]

All of our articles and categories on transport "accidents and incidents" use that phrasing, as opposed to "incidents and accidents" (which is a line from "You Can Call Me Al"). However, there are a lot of section heads that are "== Incidents and accidents ==". I would like a bot to search articles for the phrasing "== Incidents and accidents ==" and replace it with "== Accidents and incidents ==". Can that be done?--Mike Selinker (talk) 19:13, 20 April 2019 (UTC)

Why? What difference does it make? I fail to see how a Paul Simon song should influence our choice of phrasing. --Redrose64 🌹 (talk) 22:27, 20 April 2019 (UTC)
That's just a random thing that might be causing some users to think that's the right format. The reason is that every article title and every category title uses the phrasing "Accidents and incidents" (there are hundreds of these). Only some articles' section heads use a different format. It's just about being consistent, which you can either value or not at your discretion.--Mike Selinker (talk) 22:59, 20 April 2019 (UTC)
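Technically, the heading replacement requested above is a single anchored substitution. A minimal sketch, assuming headings of any level should be handled and that capitalisation and spacing variants count as matches (both assumptions, as is the function name):

```python
import re

# Matches a whole heading line at any level, e.g. "== Incidents and accidents ==".
HEADING = re.compile(
    r"^(=+)[ \t]*Incidents and accidents[ \t]*\1[ \t]*$",
    re.IGNORECASE | re.MULTILINE)


def fix_heading(wikitext):
    """Rename 'Incidents and accidents' section headings, keeping the level."""
    return HEADING.sub(
        lambda m: "%s Accidents and incidents %s" % (m.group(1), m.group(1)),
        wikitext)
```

Because the pattern is anchored to a full heading line, the same phrase in running prose is left alone. Incoming section links to the old heading would still need separate attention.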

Request for one-time run to tag pages with bare references with Template:Cleanup bare URLs

This is what I see to be a rather uncontroversial request which I have been doing manually for about a month or so now. In order to better identify pages that use bare URL(s) in reference(s) in an effort to get the URLs fixed, I am requesting that the {{Cleanup bare URLs}} tag be added to all pages by a bot which meet the following conditions:

  1. Has at least one instance of a <ref> tag immediately followed by http: and/or https:, followed by any combination of keystrokes and a </ref> closing tag when there are no instances of spaces between the <ref> and </ref> tags (underscores are okay).
    (In such aforementioned instances, the reference tags should not be enclosed inside a citation template.)
  2. There is currently not a transclusion of {{Cleanup bare URLs}} or any of its redirects on that page
  3. The page is in the "(article)" namespace

...From my experiences recently with tagging these pages, tagging the pages with the aforementioned parameters will avoid most, if not all, false positives.

I am requesting this run only once so that it doesn't need constant checks, and this should adequately provide an assessment on how many pages need reference url correction. Steel1943 (talk) 17:54, 22 April 2019 (UTC)

@Derek R Bullamore, MarnetteD, and Meatsgains: Pinging editors who I know either do work on or have worked on correcting pages tagged with {{Cleanup bare URLs}} in the past to make them aware of this discussion, and to see if there are any concerns or issues I'm not seeing at the moment. Steel1943 (talk) 17:58, 22 April 2019 (UTC)
  • Is there a group or individual who cleans up after these tags? Wouldn’t a report suffice? –xenotalk 18:30, 22 April 2019 (UTC)
    • @Xeno: The individuals that I'm aware of are pinged in the aforementioned comment. And unfortunately, a report would not suffice since a report does not place the respective pages in appropriate cleanup categories that the aforementioned editors monitor. In addition, the report may become outdated, whereas in theory, the tags on these pages should not since they tend to get removed once the bare reference urls on the pages are resolved; once the tag gets removed, then, of course, the page gets removed from the appropriate cleanup category. Steel1943 (talk) 18:52, 22 April 2019 (UTC)
  • I don't think this is a good idea. Bare URLs are so common and constantly being added it would tag a significant percent of the entire project. There is also context, like an article with 400 refs and someone adds a single bare URL, a banner would be overkill. If a bot were to do this it should probably search out the egregious cases like an article with > 50% bare citations. Reports can work if you do it right, see this report I recently created. It has documentation, regenerated automatically every X days, linked from other pages, etc.. -- GreenC 19:05, 22 April 2019 (UTC)
Further thought, a report could categorize pages by percentage of bare links so you can better allocate your time on which pages to fix and how. -- GreenC 19:07, 22 April 2019 (UTC)
The original proposal is likely to unearth tens of thousands of articles so affected - given the very small number of editors who work on the {bare URLs} cases, this might generate more problems than that small gang could possibly manage. The latter amendment(s) seem more feasible, but nevertheless we could still "dig up more snakes than we can kill", to borrow an old Texas expression. (This despite the fact that I am from the North of England !) I think a "dummy run" may be better, to get a true perspective of numbers. - Derek R Bullamore (talk) 19:26, 22 April 2019 (UTC)
  • I think that what Derek R Bullamore states may be a good starting point: Before (or in lieu of) a bot performs this task, is it possible for a bot or script to get a count of how many pages fall under the parameters I stated at the beginning of this thread? (I guess this goes somewhat in line with the "report" inquiry Xeno stated above.) Steel1943 (talk) 19:48, 22 April 2019 (UTC)
@MZMcBride: do you still make these kind of reports? –xenotalk 21:52, 22 April 2019 (UTC)
In previous effort, at least one bot has run which expanded the linked URL to at least include a title. I would guess a BRFA for that effort would succeed. I would see that as greatly preferential to any taggings. --Izno (talk) 00:23, 23 April 2019 (UTC)
Ideally this would be done manually or semi-manually (with assist of tools) as expanding citations is basically impossible to do well fully automated. CitationBot is a start as is refTool (hope those names are right). Those tools took years and they are still not reliable enough to be full auto. We could add a title and call it a day but not ideal. -- GreenC 00:50, 23 April 2019 (UTC)
I echo DRb's post though I would up the guestimate to more than a million articles that would need work. Years ago I tried to put a dent into "External links modified" (example Talk:War and Peace (film series)#External links modified) task and wound up being overwhelmed by the fact that more articles were being added than those that I had checked each time the bot did a new run. Now there was a time when edit-a-thons were arranged around tasks like this but I haven't seen one of those in years. GreenC's idea of limiting it to > 50% bare citations might be a workable solution. MarnetteD|Talk 01:18, 23 April 2019 (UTC)
Agreed - I think we begin with > 50% bare URLs to start and if we can manage to stay on top of those pages, then we can incrementally decrease the percentage. Meatsgains(talk) 02:48, 23 April 2019 (UTC)
  • As I was staring at this discussion thinking of a way to simplify this task, I came up with an idea for a way to update this proposal. How about something along these lines: Rather than being a one-time run bot task, the bot runs at certain intervals (such as once every couple days), and stops tagging pages when the respective cleanup category has a set number of pages tagged maximum (such as 75–100 pages)? This will keep the backlog manageable, but still keep bringing the pages with bare ref url issues to light. Steel1943 (talk) 14:54, 23 April 2019 (UTC)

I believe GreenC could do a fast scan (a little bit offtopic, but could that awk solution work with .bz2 files?). For lvwiki scan, I use such regex (more or less the same conditions as OP asked for) which works pretty well: <ref>\s*\[?\s*(https?:\/\/[^][<>\s"]+)\s*\]?\s*<\/ref>. For actually fixing those URLs, we can use this tool. Can be used both manually and with bot (it has pretty nice API). --Edgars2007 (talk/contribs) 15:36, 23 April 2019 (UTC)
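As a rough illustration, the regex quoted above can be combined with the three conditions from the original request. The sketch below adapts that regex for Python (brackets escaped inside the character class) and assumes namespace 0 is the article namespace. The tag check only looks for the template's own name, so redirects to {{Cleanup bare URLs}} would need adding; the function names are hypothetical.

```python
import re

# Adapted from the regex quoted above; matches a <ref> containing only a URL,
# optionally wrapped in single square brackets.
BARE_REF = re.compile(
    r'<ref>\s*\[?\s*(https?://[^\]\[<>\s"]+)\s*\]?\s*</ref>', re.IGNORECASE)
# Only the template's own name; its redirects are not covered in this sketch.
TAGGED = re.compile(r"\{\{\s*cleanup bare urls\b", re.IGNORECASE)


def bare_ref_urls(wikitext):
    """Return the URLs of references that consist of nothing but a bare link."""
    return BARE_REF.findall(wikitext)


def needs_tag(wikitext, namespace):
    """Apply the three conditions: article namespace, untagged, has a bare ref."""
    return (namespace == 0
            and not TAGGED.search(wikitext)
            and bool(bare_ref_urls(wikitext)))
```

Running bare_ref_urls over a dump or via the API without editing would also give the page count asked for earlier in the thread.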

I recently made a bot that looks for articles that need {{unreferenced}} and this is basically the same thing other than a change to the core regex statement, which User:Edgars2007 just helpfully provided. So this could be up and running quickly. It runs on Toolforge and uses the API to download each of 5.5M articles sequentially. The only question is which method: > 50%, or max size of the tracking category, or maybe both (anything over 50% is exempted from the category max size). The mixed method has the advantage of filling up the category with the worst cases foremost and lesser cases will only make it there once the worst cases are fixed. -- GreenC 17:51, 23 April 2019 (UTC)
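One reading of the "mixed method" described above can be sketched as follows: pages where more than half of the references are bare are always selected, and the remaining room under the category cap is filled with lesser cases, worst first. How the exempted pages count against the cap is an assumption here (they are counted), and the function and parameter names are hypothetical.

```python
def pages_to_tag(candidates, category_size, category_max, threshold=0.5):
    """Choose pages to tag.

    candidates: iterable of (title, bare_ref_count, total_ref_count).
    Pages above the threshold are always chosen; others only while there is room.
    """
    worst, lesser = [], []
    for title, bare, total in candidates:
        if bare == 0 or total == 0:
            continue
        fraction = bare / total
        (worst if fraction > threshold else lesser).append((fraction, title))
    # Highest fraction first; ties broken by title for a stable order.
    worst.sort(key=lambda pair: (-pair[0], pair[1]))
    lesser.sort(key=lambda pair: (-pair[0], pair[1]))
    room = max(0, category_max - category_size - len(worst))
    return [t for _, t in worst] + [t for _, t in lesser[:room]]
```

Setting category_max to a small number and running at intervals would give the capped behaviour proposed earlier in the thread.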

Fine as far as it goes. I am still concerned about the small band of editors that mend such bare links, being potentially swamped by the sheer number of cases unearthed. Let the opening of the can of worms begin ! - Derek R Bullamore (talk) 19:59, 24 April 2019 (UTC)
Derek R Bullamore, I thought about this some more and think it would be easiest, at least initially, to limit the category to some number of entries (1000) and run weekly while working its way through the 5.5M articles, i.e. if the first article doesn't have a bare link when it is checked, it won't be checked again until all the others have been checked. If the category is being cleared by editors rapidly it can always be adjusted to run more frequently. -- GreenC 21:29, 24 April 2019 (UTC)
GreenC From my experience 1000 is still a massive number of articles and would take much more than a week to clean up. The situation would quickly become like the example I gave above. Don't forget other editors will still be adding bare url tags to articles so the number will exceed 1000 per 7 days. As far as I know there are only two or three editors who check and work on these regularly. We appreciate having a few days where there are only one or two in the category so we can focus on other editing. I can see a situation where we burnout and abandon the work completely. I would suggest a smaller number like 200 at most. Another possibility is to run 1000 but then not do another run until those have been finished. I probably should have mentioned this earlier but there is a wide range of problems to fix with the bare urls - some are easy and take a few seconds. Others are labor intensive and can require days to finish. For example I am currently working on Ordinance (India). Neither refill or reflinks could format these and I am having to do them one at a time. Now these are just my thoughts and others may feel differently. MarnetteD|Talk 21:52, 24 April 2019 (UTC)

MarnetteD, yes, I understand what you are saying. I was thinking, what about an 'on demand' system where you can specify when to add more, and how many to add - and it only works if the category is mostly empty, and maxes at 200 (or less). This is more technically challenging as it would require some kind of basic authentication to prevent abuse, but I have an idea how to do it. It would all be done on-wiki, similar to a bot stop page. This gives participants the freedom to fill the queue whenever they are ready, and it could keep a log page. Would that be useful? -- GreenC 19:19, 25 April 2019 (UTC)
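
The 'on demand' rule above could be sketched roughly as follows. This is only an illustration under stated assumptions: the allowlist of users stands in for the unspecified authentication, "mostly empty" is assumed to mean at most 10 entries, and all names are hypothetical.

```python
from typing import Set


def articles_to_add(
    requester: str,
    requested: int,
    category_size: int,
    allowed_users: Set[str],   # assumed stand-in for "basic authentication"
    max_batch: int = 200,      # cap suggested in the discussion
    mostly_empty_at: int = 10, # assumed meaning of "mostly empty"
) -> int:
    """Return how many articles the bot may add for this request."""
    if requester not in allowed_users:
        return 0  # unauthenticated requests are ignored
    if category_size > mostly_empty_at:
        return 0  # the queue still has work in it
    # Never add more than the cap, and never a negative number.
    return max(0, min(requested, max_batch))
```

A request for 500 articles from an allowed user would be trimmed to 200, and any request made while the category still holds more than the assumed threshold would add nothing.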

That sounds good, GreenC. It sure seems to address my concerns. If other editors are adding a batch of bare URL tags, the bot wouldn't be piling up more on top of those. Thanks for the suggestion. MarnetteD|Talk 20:17, 25 April 2019 (UTC)
Yeah, it looks a good idea - the best so far - particularly if it can be made to operate successfully. Bring it on. - Derek R Bullamore (talk) 22:08, 25 April 2019 (UTC)

noLDS.orgBOT

The Church of Jesus Christ of Latter-day Saints recently made an announcement about the correct name of the church[1]. Because of this announcement, the church site has been changed from lds.org to ChurchofJesusChrist.org, and the newsroom from mormonnewsroom.com to newsroom.ChurchofJesusChrist.org. Most wiki pages still link to the old sites. I need a bot to go through and change all the links. The only thing to be changed is the domain. The rest of each URL stays the same.

Thanks, The 2nd Red Guy (talk) 14:50, 23 April 2019 (UTC)
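
For illustration, the domain swap requested above might look something like this minimal sketch. The mapping uses only the domains named in the request; treating a leading "www." as equivalent to the bare domain, and sending lds.org links to the "www." host, are assumptions. The function name is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

# Old host -> new host. Hostnames are case-insensitive, so lowercase is used.
# The "www." on the first target is an assumption, not part of the request.
DOMAIN_MAP = {
    "lds.org": "www.churchofjesuschrist.org",
    "mormonnewsroom.com": "newsroom.churchofjesuschrist.org",
}


def rewrite_url(url: str) -> str:
    """Swap the domain if it is a known old one; keep everything else as-is."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]  # assumed: "www.lds.org" and "lds.org" are equivalent
    new_host = DOMAIN_MAP.get(host)
    if new_host is None:
        return url  # other hosts, including other subdomains, are left alone
    if parts.port is not None:
        new_host = f"{new_host}:{parts.port}"
    return urlunsplit(
        (parts.scheme, new_host, parts.path, parts.query, parts.fragment)
    )
```

Only the host is compared, so a URL that merely mentions "lds.org" in its path is not changed. Whether every old path still resolves on the new domain would need checking before a real run.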

@The 2nd Red Guy: I suggest posting at Wikipedia:Link rot/URL change requests instead --DannyS712 (talk) 16:32, 23 April 2019 (UTC)
Oh, thanks, @DannyS712:. The 2nd Red Guy (talk) 16:36, 23 April 2019 (UTC)

References

  1. ^ an announcement about the correct name of the church Nelson, President Russell M. (7 October 2018). "The Correct Name of the Church - By President Russell M. Nelson". www.churchofjesuschrist.org. Retrieved 23 April 2019.