Wikipedia:Link rot/URL change requests

	This page is archived by ClueBot III.

Shortcut: WP:URLREQ

This page is for requesting modifications to URLs, such as marking dead or changing to a new domain. Some bots are designed to fix link rot; they can be notified here. These include InternetArchiveBot and WaybackMedic. This page can be monitored by bot operators from other language wikis since URL changes are universally applicable.

apnews.com

Per request at Wikipedia:Administrators'_noticeboard#Major_source_problem_with_Associated_Press. -- GreenC 21:49, 30 October 2023 (UTC)[reply]

I created a list of pages with broken links at User:Bri/AP fixup pages. ☆ Bri (talk) 21:57, 31 October 2023 (UTC)[reply]

Per the above link, I'll hold off a while longer to see if AP fixes itself. They acknowledged the problem. Thanks for your search recipe results. -- GreenC 04:09, 2 November 2023 (UTC)[reply]

Looking at [1] (May 15, 2000) from ...Baby One More Time (album) the link remains dead. -- GreenC 17:29, 13 November 2023 (UTC)[reply]

reports.iihf.hockey

The "reports.iihf.hockey" website is not responding. Apparently, all those references will need to be rewritten to "stats.iihf.com". Maiō T. (talk) 13:45, 5 November 2023 (UTC)[reply]

Maiō T.: the bot edited 1,706 pages. It modified 8,795 URLs. Example: Special:Diff/1090651163/1184693466. Plus other misc. It found a few dozen dead links in 2009 IIHF Inline Hockey World Championship / Special:Diff/1178262869/1184678508. Maybe the URLs have a syntax error? -- GreenC 01:33, 12 November 2023 (UTC)[reply]

Thank you very much GreenC!

As for those wrong URLs, the word "inline" is missing there. Correct URL looks like this: https://stats.iihf.com/Hydra/inline/137/IHM137A04_74_5_0.pdf.

Maiō T. (talk) 10:56, 12 November 2023 (UTC)[reply]

It's OK. This task took the better part of a day because my boilerplate code wasn't up to the task, due to the way the URLs were used in the article. It took a while to figure out, but I was able to make improvements to the boilerplate for generalized future use. The missing inline is also fixed: Special:Diff/1184690551/1184846336 -- GreenC 00:21, 13 November 2023 (UTC)

metrolyrics.com

first reported at meta:User_talk:InternetArchiveBot#metrolyrics.com

Domain is dead and has template exposure. Reported by User:Billinghurst. -- GreenC 17:35, 7 November 2023 (UTC)[reply]

User:Billinghurst: the bot found about 40 pages that needed updating, and I couldn't find anything in template namespace. -- GreenC 02:44, 12 November 2023 (UTC)[reply]

@GreenC: There are three templates showing at wikidata, though only one has broad usage (Template:MetroLyrics song (Q13256314), which was the intent to mention at metawiki. I will drop a note on all the pertinent talk pages where it will go unseen, <shrug> — billinghurst sDrewth 07:31, 12 November 2023 (UTC)[reply]

User:Billinghurst: It was deleted on enwiki at Wikipedia:Templates_for_discussion/Log/2021_November_20#Template:MetroLyrics_song. Unfortunately, my bot can't fix templates on other wikis, and I am not aware of a bot that can, because each wiki requires applying for and getting bot permissions. I mean, maybe it could, if I masqueraded as IABot, one of a handful of bots with pre-set global bot perms. It would require bespoke code though, a time-consuming project. I'll think about it. -- GreenC 00:10, 13 November 2023 (UTC)

biblioteca.sernageomin.cl

This domain, linked multiple times mainly in citations, seems to frequently break. Is it possible to do a mass archive addition to its uses, 'specially in citations? Jo-Jo Eumerus (talk) 15:48, 11 November 2023 (UTC)[reply]

OK. Will do. -- GreenC 16:13, 11 November 2023 (UTC)[reply]

Jo-Jo Eumerus, the bot ran on 161 pages containing *.sernageomin.cl with the following results:

180 links were found to be dead and an archive URL was added
17 links were dead but no archive URLs are available and they are now marked {{dead link}}
9 citations changed |url-status=live to dead
10 links still work
IABot database updated for 300+ wikis where the links might exist

-- GreenC 23:58, 12 November 2023 (UTC)[reply]

Thanks. I do suspect that some of these "dead links" can be replaced by other links, though - is there a list somewhere? Jo-Jo Eumerus (talk) 09:20, 13 November 2023 (UTC)[reply]

17 dead links

Jo-Jo Eumerus: 4 of these links like in Jorquera (caldera) might be false positives ie. the link doesn't exist. -- GreenC 15:19, 13 November 2023 (UTC)[reply]

Yeah, seems like for some of them a replacement with https://catalogobiblioteca.sernageomin.cl/Archivos/ would work. I'll work on it. Jo-Jo Eumerus (talk) 17:23, 13 November 2023 (UTC)[reply]

webrecorder.io

Convert webrecorder.io archive URLs, like this example Special:Diff/1184954276/1184954464. -- GreenC 17:25, 13 November 2023 (UTC)[reply]

Done. -- GreenC 18:27, 13 November 2023 (UTC)[reply]

bookcritics.org

Domain has many soft-404. -- GreenC 00:59, 14 November 2023 (UTC)[reply]

Done. Edited 190 pages and fixed around 220 citations, most soft404. Sample -- GreenC 05:16, 15 November 2023 (UTC)[reply]

top10cinema.com

Although this says it is reachable, this link redirects elsewhere. Looks like a case of usurpation. Wonder how many such links there are. Kailash29792 (talk) 04:39, 15 November 2023 (UTC)[reply]

137. I will usurp them, but it might take a while because it will be part of the next WP:JUDI batch, which will take a while to find 30 or 40 domains to fill the next batch. Unless there is an urgent request. Added. -- GreenC 05:22, 15 November 2023 (UTC)[reply]

washingtonindependent.com

See Wikipedia:Reliable sources/Noticeboard#Washington Independent.
We should probably get rid of any link to the live domain (which is garbage and may be considered for blacklisting) and only archive.org snapshots from before 2015 should be used. When there is no old snapshot, the link/reference should be removed entirely. Is this possible?
Snapshot from September 2014 where the latest headlines are from January 2012. Snapshots of the homepage strongly suggest the site was down from 2015 to 2019. In 2020 the domain expired, notice from godaddy saying "This domain name expired on 6/22/2020". — Alexis Jazz (talk or ping me) 14:27, 15 November 2023 (UTC)[reply]

User:Alexis Jazz: I have a setup for this kind of thing and have done it before, including deleting cites without an archive URL. Some of the articles look pretty legit like here. This article was published in 2009 but the live link says 2020. I think you're right with this solution. Change the status to usurped, since the domain was taken over by an unknown party who made incorrect modifications. The only thing is, if you blacklist, I would not be able to help because the blacklist would block my bot from making changes. 122 pages. -- GreenC 15:45, 15 November 2023 (UTC)

GreenC, yes, they republished some of the original articles which makes it confusing. If you go over the archives you'll notice that while the original was written by Spencer Ackerman the author in 2021 was Ceri Sinclair. In the current version no author is named at all. These republished versions, even if the text is identical, shouldn't be trusted either as the author and date are unreliable and it's unlikely they have a license to publish those articles. So there should be an archived link and it should be old. Reliable newspapers don't write articles titled "The 5 best online casinos in the USA right now".
I'd only request blacklisting after all existing links have been usurped. Btw, bots have the sboverride right so even blacklisting shouldn't be an issue for that? (besides, the bot would be removing the URL so the edit should never be blocked?) — Alexis Jazz (talk or ping me) 16:48, 15 November 2023 (UTC)[reply]

User:Alexis Jazz, tracked at WP:JUDI where it is now in queue. It's not a JUDI case, but in effect the same thing (usurped domain) from the bot's perspective. How soon do you want this done? I normally run them in batches of 30 or 40 domains because it's easier, but if you want to get it blacklisted I could push it through sooner; currently there are only 3 domains in the queue. I hope sboverride is working now. -- GreenC 04:52, 16 November 2023 (UTC)

GreenC, thanks! There's no hurry, it's not high volume. — Alexis Jazz (talk or ping me) 05:26, 16 November 2023 (UTC)[reply]

sboverride should work in theory. I see it on the official list at Special:ListGroupRights. –Novem Linguae (talk) 05:28, 16 November 2023 (UTC)[reply]

nationalgeographic.com

Many soft-404s. -- GreenC 01:41, 18 November 2023 (UTC)[reply]

..also *.natgeotv.(com|org).* eg. www.natgeotv.com.au GreenC 16:54, 20 November 2023 (UTC)[reply]

Results for nationalgeographic.(com|org):

Articles checked: 9,495
Articles edited: 7,732
Add new archive URL: 8,213
Switch |url-status=live to dead: 1,165
IABot database updated for 300+ wikis

-- GreenC 05:12, 24 November 2023 (UTC)[reply]

For natgeotv: checked 266 articles, edited 190 articles, added 153 archive URLs, changed 19 url-status -- GreenC 19:50, 25 November 2023 (UTC)

chartattack.com

@GreenC, perhaps this one also fits JUDI? Chart Attack used to be a paper magazine which was ~~probably reliable at least for simple statements.~~ Sorry, wrong link, it was (until May 2018) generally reliable per Wikipedia:WikiProject Albums/Sources / WP:RSN. Now it's just garbage. 2023: https://www.chartattack.com/best-crypto-investment/.
Was it garbage in 2020? [2]: "not many people know how to play online pokies and earn money."
Was it garbage in 2019? Probably, but not quite as obvious.
In 2018 the site looked rather different and actually had a focus on music. ~~I don't think a cutoff date is needed for this one.~~ Better safe than sorry: cutoff date May 24, 2018. Possibly bad links seem limited in numbers, they can probably all be found within these 38 results. (which aren't even all bad and few enough to comb through by hand) Should be usurped though. — Alexis Jazz (talk or ping me) 10:28, 18 November 2023 (UTC)[reply]

There are 1,300 pages. Yeah, there was a major change in 2018 or 2019 to the site's focus. Not sure what to call this kind of content, Google search traps? Like they check for popular Google searches, then write a think piece about that topic to capture the search traffic, and monetize it with ads. The content could be generated semi-automatically via AI so it's low cost.

The old site "About" page says (April 2018): "Chart Attack is a guide to indie and alternative music, based out of Toronto, Canada, online since 1996. We're dedicated to showcasing great music that pushes expectations of genre". The name fits. They have an editor in chief and freelance authors. No problem. The new site has nothing to do with this. It is a usurpation. It's a lot of pages to usurp but I think you are right. -- GreenC 23:20, 18 November 2023 (UTC)[reply]

Wayback shows the domain was abandoned around May 24, 2018. Another site andpop.com had it within the next month into 2019. Then the reseller worldclassnames got it, and sold it to the current owners in April 2019. -- GreenC 23:36, 18 November 2023 (UTC)

Oh Chart Attack has more info. -- GreenC 23:40, 18 November 2023 (UTC)[reply]

GreenC, ah, okay there's your cutoff date then, May 24 2018.
In some cases articles from the original site are reproduced here as well, e.g. [3] vs. [4]. Note the change in author name, just like Washington Independent. Reproducing the original content is probably just SEO. — Alexis Jazz (talk or ping me) 00:00, 19 November 2023 (UTC)[reply]

Added to JUDI's queue. I might get to it sooner than later due to the number of pages. -- GreenC 00:47, 19 November 2023 (UTC)[reply]

vh1.com

I noticed http://www.vh1.com/news/articles/1497672/03022005/mudvayne.jhtml just redirects to https://www.facebook.com/VH1/ which is less than helpful. Then I noticed even http://www.vh1.com/ redirects to Facebook.
The article in question was archived and is actually still live at https://www.mtv.com/news/xu79dk/mudvayne-lose-the-makeup-find-inspiration-in-isolation so if any article wasn't archived it could be worth having a log of what failed so someone could search mtv.com for it. — Alexis Jazz (talk or ping me) 10:35, 18 November 2023 (UTC)[reply]

For me VH1 doesn't redirect to Facebook. The mudvayne.jhtml link is a 404, but a vh1.com 404 landing page. https://www.vh1.com goes to the site's home page which looks like this (archive snapshot from today). Maybe the Facebook redir was temporary? -- GreenC 22:36, 18 November 2023 (UTC)[reply]

GreenC, huh? No, it redirects to Facebook, really.
I need to WP:EVADEGDPR I guess. If half(?) of our readers can't access it we might as well treat it as being dead. (and either way, if the article link is 404 for you we have link rot at any rate) — Alexis Jazz (talk or ping me) 00:02, 19 November 2023 (UTC)[reply]

That's unfortunate. I'm not sure what community consensus is. The EVADEGDPR page says "Don't use the Wayback Machine as a free proxy". I can/will certainly process the domain for 404s and soft-404s. -- GreenC 00:51, 19 November 2023 (UTC)

GreenC, I wrote the EVADEGDPR page, what that line means (and I'll go clarify that now..) is that you shouldn't systematically save pages just so you can personally view them once, but if you suspect it'll be a useful reference it's fine to save them. The thought behind it is that wasting archive.org's storage to look at garbage or random links is a bad idea, you should use a VPN or proxy for that. But if it's actually valuable, no problem, save it.
I'm unsure consensus exists for how to handle links that are live but geographically restricted. — Alexis Jazz (talk or ping me) 01:26, 19 November 2023 (UTC)[reply]

This has come up before over the years (can't say where now), and there has been dispute about archiving sites for the purpose of bypassing policy blocks. It's impossible to keep up with, policies change, and particularly for regional blocks it forces everyone else to default to the archive instead of the live page. Sometimes I'll do it for a limited set of pages within a domain that are paywalled, but for an entire domain, that would be hard without consensus (nearly 3,000 pages). I could maybe see adding archive URLs while keeping the status live, but my bot is not set up for it and I have never done it before. Trying to keep up with and work around policy changes like this is a nightmare. -- GreenC 02:27, 19 November 2023 (UTC)

Just adding archive URLs but keeping the status live (where the link is actually live and not 404 like mudvayne.jhtml) would help a lot.
If your bot isn't set up for that, perhaps another bot could handle the currently live but geo-restricted links? — Alexis Jazz (talk or ping me) 04:59, 19 November 2023 (UTC)[reply]

OK first pass will fix the dead links. Then I'll try another pass adding archives to live links in CS1|2 templates with url-status live. Not sure yet about square and bare links. This will take some time. I'm currently in the jungle with nationalgeographic which has over 8,000 pages and many edge cases to discover. -- GreenC 16:04, 19 November 2023 (UTC)[reply]

Step 1: fix dead links

check 2,887 articles containing vh1.com
edit 2,030 articles
add 1,905 archive URLs (404s and soft-404s, mostly the latter)
modify 202 |url-status=live -> dead
add 203 {{dead link}} tags ie. no archives available

Step 2: add archive URLs to CS1|2 that have no archive URL, and set |url-status=live

check 2,887 articles
edit 470 articles
add 987 archive URLs

User:Alexis Jazz: this is done. -- GreenC 22:56, 1 December 2023 (UTC)[reply]

GreenC, thanks! Is there a way to find those 203 articles that were tagged with {{dead link}} so I can search mtv.com for those articles? — Alexis Jazz (talk or ping me) 16:23, 2 December 2023 (UTC)[reply]

User:Alexis Jazz: here are 154 pages with 203 URLs my bot marked with {{dead link}} (there might be others preexisting). BTW I noticed many of the archive URLs are poor quality because the source links contain music videos, and the archive providers often have trouble with video. -- GreenC 16:37, 2 December 2023 (UTC)


usemod.com

Moved from Wikipedia talk:Link rot/cases/Judi

Can usemod.com go on WP:JUDI? See https://en.wikipedia.org/wiki/Special:LinkSearch?target=*.usemod.com

Is the bot able to update links? http://www.usemod.com/cgi-bin/mb.pl?GoodBye should be http://meatballwiki.org/wiki/GoodBye WhatamIdoing (talk) 06:02, 28 November 2023 (UTC)[reply]

WhatamIdoing, yes I can move some, and those that can't can be usurpified ala JUDI. -- GreenC 15:22, 28 November 2023 (UTC)[reply]

Thanks! WhatamIdoing (talk) 15:48, 28 November 2023 (UTC)[reply]

Also, could you add a sentence to Wikipedia:External links#Hijacked and re-registered sites with a link to this page? Editors might be more likely to report the domains if they knew that a bot would clean them up. WhatamIdoing (talk) 15:50, 28 November 2023 (UTC)[reply]

Done. Special:Diff/1184135778/1187332424 -- GreenC 16:15, 28 November 2023 (UTC)[reply]

WhatamIdoing, it appears someone else, I don't know who or when, already converted them. There are only 4 mainspace pages with usemod.com -- GreenC 00:38, 2 December 2023 (UTC)[reply]

Thanks. Special:LinkSearch says that it's on Wikipedia:WikiProject Organized Labour (and ~450 other pages), but I can't find the link in that page. In a transclusion, maybe? But at least the mainspace is relatively free of this error. WhatamIdoing (talk) 01:16, 2 December 2023 (UTC)[reply]

In Wikipedia:WikiProject Organized Labour, it is transcluded from Wikipedia:WikiProject Organized Labour/Participants where it's embedded in someone's signed comment. I deleted it. As spam links they probably should be deleted? I don't normally do this as these pages can be unpredictable. Like do I want to add an archive URL and {{usurped}} to User:Sj/Presentation? It's a lot of personal space and talk page comments to be modified without permissions. -- GreenC 06:50, 2 December 2023 (UTC)[reply]

I'm sure @Sj would be happy to have a working link, but I agree that it could be tricky elsewhere. WhatamIdoing (talk) 17:39, 3 December 2023 (UTC)[reply]

Thanks for the notice. usemod.com had a few different perl scripts; you want to distinguish things under mb.pl (moved to meatballwiki) from the rest (which can be pointed to a wayback machine archive). – SJ + 12:57, 4 December 2023 (UTC)
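The split described above could be sketched roughly as follows in Python. This is only an illustration, not the bot's actual code: the function name `migrate_usemod` and the restriction to simple page names are assumptions, and the only confirmed mapping is the GoodBye example given earlier in this thread. Anything the function declines to convert would fall through to the archive-URL path.

```python
from typing import Optional
from urllib.parse import urlsplit


def migrate_usemod(url: str) -> Optional[str]:
    """Map a usemod.com mb.pl link to meatballwiki.org, or return None.

    None means "not a MeatballWiki page link": the caller would then
    look for an archive URL instead (that step is not shown here).
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in ("usemod.com", "www.usemod.com"):
        return None
    # Only the mb.pl script moved to meatballwiki; other scripts did not.
    if parts.path != "/cgi-bin/mb.pl" or not parts.query:
        return None
    page = parts.query
    # Assumption: only convert plain page names such as "GoodBye";
    # queries like "action=edit&id=..." are left for manual review.
    if not page.replace("_", "").isalnum():
        return None
    return f"http://meatballwiki.org/wiki/{page}"
```

Returning None rather than guessing keeps the conversion conservative, so links under the other perl scripts are never rewritten to a page that may not exist.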

Trojan/Malware warning on Pelenop.fr

Was editing a broken reference here, & upon trying the original site, it was immediately blocked by my Anti-virus. Apparently it's now been usurped into a site injecting Malware (or perhaps just that link, I'm not really keen on dealing with a citation giving me Malware again). I've corrected the archive to what appears to be a working & safe version of the reference & set the link to usurped. Thought it'd be prudent to mention it here, in case there's other links to the site lurking on the Wiki.

Here's the links to my 2 edits for quick reference. Again, the archive appears to be safe, but I wouldn't recommend going to the original site without active Anti-virus protection. 1. 2.

(Side note: Unfortunately I can't remember or find the previous citation/site that gave me Malware, but it should be in the list of my deleted edits if someone has access to that, with a very obvious "TROJAN WARNING" quote) Silverleaf81 (talk) 05:53, 2 December 2023 (UTC)[reply]

Thank you for converting it to usurped, and notifying this page, this is the correct place. It appears the domain only exists in that article: [5] -- GreenC 06:57, 2 December 2023 (UTC)[reply]

flare.com

The domain name flare.com is for sale! The magazine has moved to https://fashionmagazine.com/flare/ but the old content no longer seems to be online. Much of it has been archived in the usual places. Certes (talk) 17:32, 6 December 2023 (UTC)[reply]

User:Certes, I set the domain to dead in iabot.org and started a job to process it. -- GreenC 20:04, 6 December 2023 (UTC)[reply]

Old nextbestpicture.com links

Hello, please change all links (in the main namespace) of the form http://www.nextbestpicture.com/2/post/2020/12/the-2020-indiana-film-journalists-association-ifja-winners.html to https://nextbestpicture.com/the-2020-indiana-film-journalists-association-ifja-winners/ (i.e. everything between the first slash after the domain name and the last one in the link should be removed, the ".html" should be replaced with a slash, and HTTP should be changed to HTTPS). Lots of these links seem to be marked as dead by InternetArchiveBot, including at Clarke Peters (where I noticed this and fixed it manually) and On the Rocks (film). Thanks! Graham87 (talk) 07:06, 12 December 2023 (UTC)[reply]
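The rewrite rule requested above can be sketched in Python like this. It is a minimal illustration under stated assumptions: the function name `rewrite_nextbestpicture` is made up, the "www." prefix is dropped because the example target URL has none, and whether the resulting page actually exists still has to be checked separately.

```python
from typing import Optional
from urllib.parse import urlsplit


def rewrite_nextbestpicture(url: str) -> Optional[str]:
    """Convert an old-style nextbestpicture.com link to the new scheme.

    Returns None when the link does not match the old ".html" pattern.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in ("nextbestpicture.com", "www.nextbestpicture.com"):
        return None
    # Drop everything between the first slash after the domain and the
    # last slash, keeping only the final path segment.
    slug = parts.path.rsplit("/", 1)[-1]
    if not slug.endswith(".html"):
        return None
    slug = slug[: -len(".html")]
    if not slug:
        return None
    # Replace ".html" with a trailing slash and switch to HTTPS.
    return f"https://nextbestpicture.com/{slug}/"
```

Applied to the example in the request, this yields https://nextbestpicture.com/the-2020-indiana-film-journalists-association-ifja-winners/.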

No problem I'll get to it, thanks. Anything marked dead will be restored to live, if it tests live. I'll keep the old archive URL in place, unless you want to delete it, or, replace it with an archive to the new URL. -- GreenC 04:19, 13 December 2023 (UTC)[reply]

Graham87: here you go Special:Diff/1186100009/1190645424. Good find. It edited over 500 pages and fixed many cites. It was difficult because they use a bot blocker, which is why Wayback Machine and IABot had trouble. I had a solution for it and was able to verify the new links work; in a few cases it required an archive URL. -- GreenC 02:43, 19 December 2023 (UTC)

Extrasolar Planets Encyclopaedia

The formatting of exoplanet.eu catalog entries has changed recently, so that all entries now have a numeric ID (e.g. 1261 for Kepler-62f). The previous format (which had the planet name alone) still soft-redirects to the correct target, but older links using a previous format need to be corrected by hand. –LaundryPizza03 (d c̄) 01:29, 15 December 2023 (UTC)[reply]

User:LaundryPizza03: Is there an example of an old link, and its corresponding new link? -- GreenC 04:08, 15 December 2023 (UTC)[reply]

@GreenC: In this example, the former URL was https://exoplanet.eu/catalog/kepler-62_f/, and is now https://exoplanet.eu/catalog/kepler_62_f--1261/. –LaundryPizza03 (d c̄) 04:10, 15 December 2023 (UTC)[reply]

I'd suggest consulting Linksearch for example pages, and examples of the older format that is now a hard 404. 55 Cancri b is an example; the URL http://exoplanet.eu/planet.php?p1=55+Cnc&p2=b is linked; the old URL format had https://exoplanet.eu/catalog/55_cnc_b/, and the current DB page for this planet is at https://exoplanet.eu/catalog/55_cnc_b--25/. Note that host stars are no longer directly accessible in the database; information about them can be accessed through the entries about their planets.

exoplanet.eu: Linksearch en (insource) - meta - de - fr - simple - wikt:en - wikt:fr • Spamcheck • MER-C X-wiki • gs • Reports: Links on en - COIBot - COIBot-Local • Discussions: tracked - advanced - RSN • COIBot-Link, Local, & XWiki Reports - Wikipedia: en - fr - de • Google: search • meta • Domain: domaintools • AboutUs.com –LaundryPizza03 (d c̄) 04:18, 15 December 2023 (UTC)[reply]

I see "kepler-62" (dash) is now "kepler_62" (underscore). It might be possible to convert ?p1=55+Cnc&p2=b to 55_cnc_b, then load the page https://exoplanet.eu/catalog/55_cnc_b/ and extract the new URL from the HTML. As you suggest, I'll take a look at the linksearch and see how homogeneous it is. I won't get to this immediately. -- GreenC 04:35, 15 December 2023 (UTC)
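The first half of that idea, turning the planet.php query string into the intermediate /catalog/ slug, could look roughly like this in Python. The function names are hypothetical and this is not the bot's code. The second half (loading the intermediate page and reading the numbered URL out of the HTML) is deliberately left out, since the page structure is not described here.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def legacy_planet_slug(url: str) -> Optional[str]:
    """Build a slug like "55_cnc_b" from planet.php?p1=55+Cnc&p2=b.

    Returns None for anything that is not a planet.php link with both
    p1 (star) and p2 (planet letter), e.g. star.php?st=... links.
    """
    parts = urlsplit(url)
    if not parts.path.endswith("/planet.php"):
        return None
    query = parse_qs(parts.query)  # "+" is decoded to a space
    if "p1" not in query or "p2" not in query:
        return None
    star = query["p1"][0].strip()
    planet = query["p2"][0].strip()
    if not star or not planet:
        return None
    # Lower-case and join all whitespace-separated parts with "_".
    return "_".join(f"{star} {planet}".lower().split())


def intermediate_catalog_url(url: str) -> Optional[str]:
    """Return the older /catalog/ URL that is expected to redirect."""
    slug = legacy_planet_slug(url)
    return None if slug is None else f"https://exoplanet.eu/catalog/{slug}/"
```

Dashes in star names are kept as-is, which matches the intermediate format quoted above (kepler-62_f); the final URL with the numeric ID can only come from the site itself.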

User:LaundryPizza03: Seeing a lot of links like this. I added an archive URL because the source link is dead. I'd prefer to convert them to the new /catalog url scheme, but there is no way to link to a star, only planets, like this. Am I missing something? What do you recommend for URLs with star.php?st= -- GreenC 19:02, 21 December 2023 (UTC)[reply]

The only thing I can figure out, on the Catalog page https://exoplanet.eu/catalog enter star_name="HD 5319" then click "Apply filter" it brings up a list of planets. However, there is no way to link to this search result. Only a person manually entering the star name can find it, there is no API or mechanism for automated use. -- GreenC 19:21, 21 December 2023 (UTC)[reply]

@GreenC: I'd suggest deleting all of those links. You can still convert the older-format planet links as you described. –LaundryPizza03 (d c̄) 05:25, 22 December 2023 (UTC)[reply]

For example the many exoplanet.eu star links in List of exoplanets discovered by the Kepler space telescope: 1–500 which look useful to verify data. Someone might object, why the cites are being deleted, since the archive URLs work and verify. -- GreenC 06:46, 22 December 2023 (UTC)[reply]

Try obtaining archives for the links that aren't already archived. –LaundryPizza03 (d c̄) 07:21, 22 December 2023 (UTC)[reply]

Yes the bot will add archives for dead links: Special:Diff/1143718768/1191219614. I am going slow because there are errors in the data showing up in the logs that require manual fixes. For example this planet Special:Diff/1168566545/1191217938 has been renamed, but the article name still had the old name. Similar example Special:Diff/1188022306/1191211568. Or syntax errors, Special:Diff/1188040396/1191199379. -- GreenC 17:07, 22 December 2023 (UTC)[reply]

User:LaundryPizza03 - this iteration is done. It edited 694 pages, out of 705 checked. It converted the star system links to archive URLs. The planet links are mostly converted. I noticed late in the process it wasn't converting planet links that already had an archive URL and were otherwise dead links; they need manual checking. Probably something changed with the planet, like its name or existence. It should be possible to find most of them in the catalog with some time and searching.

Also, I was unaware of {{Cite EPE}}. Over time, individual pages at the site will stop working, and the standard link rot tools won't detect or fix them, when the links are abstracted behind a custom external link template. I suppose it's possible the template could be useful if the entire site changes structure, but most likely the data in the template won't be sufficient to accommodate the new URL scheme. Thus at best the template makes adding a link a little quicker, and more uniform looking, but at the cost of increased link rot and challenges down the road when the URL scheme changes. I've always thought standard cite templates are the best way to go because there are so many tools that support them. -- GreenC 02:51, 23 December 2023 (UTC)[reply]

International Meteorological Organization

Hello. I notice that after clicking on this IMO link, it says the website moved to a new url and the old one will be available until this month. Looking through the IMO links on Wikipedia, some formats can be swapped over already:

Press release: old and new

News: old and new

Award: old and new

There are other ones that aren't in these three categories and that I don't see in the new website. Here are some examples. I was wondering if the old public.wmo.int links could be changed to the new wmo.int links where possible, and the broken public.wmo.int with no new URL could be archived. There are 436 links to go through. Thanks! MrLinkinPark333 (talk) 00:29, 17 December 2023 (UTC)

Fortunately you found this in time. I'll prioritize it. If the public-old site goes offline it will be a lot harder to migrate. -- GreenC 01:34, 17 December 2023 (UTC)[reply]

MrLinkinPark333: Here is what I did: migrate links where possible, as you discovered above like with Press releases, simply by changing the URL. This method only worked for some, the new site doesn't have all the pages from the old site. Thus, anything it couldn't find at the new site, it converted to public-old.wmo.int to bypass the information page that says the link is doomed. Then it saved a copy of the public-old.wmo.int link to the Wayback Machine. Then it added those Wayback links into the citation as archive URLs with url-status of dead (soon dead). I think this method saved the most content from imminent destruction. At some point later, once the new site is working, I can make more changes if you see ways to convert the public-old.wmo.int links to the new site at wmo.int. There are 195 public-old links in 160 articles. -- GreenC 19:14, 18 December 2023 (UTC)[reply]
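The decision described above can be summarized in a small Python sketch. It is illustrative only: `plan_wmo_link` and its return shape are assumptions, the lookup that finds a page on the new wmo.int site is not shown (its result is simply passed in), and the example path used for testing is made up.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

OLD_HOST = "public.wmo.int"
FALLBACK_HOST = "public-old.wmo.int"


def plan_wmo_link(url: str, new_url: Optional[str] = None) -> dict:
    """Decide what to do with one citation URL.

    new_url is the matching page on the new site if one was found,
    otherwise None. The result says which URL to cite and whether an
    archive snapshot (with url-status=dead) should be added.
    """
    parts = urlsplit(url)
    if (parts.hostname or "").lower() != OLD_HOST:
        # Not an old-site link: leave it untouched.
        return {"url": url, "needs_archive": False}
    if new_url:
        # The page exists on the new site: migrate the link.
        return {"url": new_url, "needs_archive": False}
    # No new page: point at public-old to bypass the notice page and
    # flag the link so a snapshot is saved and cited.
    fallback = urlunsplit(parts._replace(netloc=FALLBACK_HOST))
    return {"url": fallback, "needs_archive": True}
```

Keeping the "was a new page found" check outside the function makes the fallback logic easy to test without any network access.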

That works. I can always revisit the links later to see if any can be swapped over. Thanks! MrLinkinPark333 (talk) 19:19, 18 December 2023 (UTC)[reply]

Phineas F. Bresee

Further reading Corbett, C.T. (1958) Our Pioneer Nazarenes. Kansas City, MO.: Nazarene Publishing House. [2][permanent dead link]

This can be corrected by linking to one of the following: https://whdl.org/en/browse/resources/6629 https://nmi.whdl.org/en/browse/resources/6629 https://apnts.whdl.org/en/browse/resources/6629

Thanks! 174.127.124.132 (talk) 07:22, 17 December 2023 (UTC)[reply]

Done! In the future, the best place to suggest an improvement for a single article (e.g. Phineas F. Bresee) is the article's talk page (e.g. Talk:Phineas F. Bresee). This page is to request an improvement for hundreds or thousands of articles with the same issue. Thanks! GoingBatty (talk) 01:27, 18 December 2023 (UTC)[reply]

Sub-site of a blacklisted website has changed URL

The sub-site "inventors.█████.com" ("about" censored because of wiki filter) now appears to be "thoughtco.com", with references/external links either linking to the same article on the new site or simply not working. Apparently there are 150+ articles using the inventors URL (1), & what looks like 500+ external link search results (2), although a significant portion are on talk pages. Silverleaf81 (talk) 09:28, 17 December 2023 (UTC)

User:Silverleaf81, the site is tricky. They've been excluded from the Wayback Machine link. There are some at Archive.today. However, compare that link with the new one at thoughtco and notice the content drift: they've made changes to the content at Thoughtco. So the conservative course is to convert them to archive URLs so the original citation verifies. The problem is there may not be complete coverage at archive.today, and the replacement link at thoughtco may not verify the cited fact.

What I can try: convert to archive.today where possible. When not, leave it alone. Wherever it redirects, that is where it goes, and it will be up to someone to manually figure out if the new page verifies or not. Possibly some year in the future, the Wayback exclusion will be lifted, and those archives will become available again. -- GreenC 04:09, 23 December 2023 (UTC)

User:Silverleaf81: This is done. It got most of them. It added 341 archive.today URLs. A list of about 50 questionables is at Wikipedia:Link_rot/cases/inventors.about.com but not all of them are legitimately a problem. -- GreenC 02:24, 26 December 2023 (UTC)[reply]

runeberg.org finally on https

My website runeberg.org just recently moved from http: to https: so it would be nice if someone could update the remaining 11,000 links accordingly. This is not urgent, as everything works fine with automatic redirects, but it would be nice. Thank you. -- LA2 (talk) 22:57, 17 December 2023 (UTC)[reply]

User:LA2: OK no problem. I got a lot of requests here at the same time other things came up elsewhere. I will get to this with some time, it is the right place/tool to request for this kind of work. I'll ping you when completed. -- GreenC 17:53, 18 December 2023 (UTC)[reply]

User:LA2: runeberg.org (http or https) existed in 6,769 articles. The bot checked that each link returns status 200 after converting to https. Any that didn't got a {{dead link}} tag; the rest were converted to https. There were some typos and non-working links to Google Translate that I manually fixed. List of http runeberg.org links -- GreenC 20:31, 26 December 2023 (UTC)[reply]
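A minimal sketch of the check described above, under stated assumptions: the status code is supplied by the caller (no network access here), and the function names and the example path are hypothetical, not the bot's actual code.

```python
from urllib.parse import urlsplit, urlunsplit


def to_https(url: str) -> str:
    """Upgrade an http URL to https; any other scheme is returned unchanged."""
    parts = urlsplit(url)
    if parts.scheme.lower() == "http":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


def plan_edit(url: str, status_after_upgrade: int) -> tuple[str, str]:
    """Decide what to do with one link.

    status_after_upgrade is the HTTP status observed when fetching the
    https form of the link (assumed to be measured elsewhere).
    """
    if status_after_upgrade == 200:
        return ("convert", to_https(url))
    # Anything other than 200 is tagged instead of converted.
    return ("tag_dead_link", url)
```

For example, `plan_edit("http://runeberg.org/nfba/", 200)` yields a conversion to the https form, while a 404 yields a {{dead link}} tag on the unchanged link.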

Great! Thanks! --LA2 (talk) 22:20, 27 December 2023 (UTC)[reply]

www.nwt.org is for sale and references to it need attention

It seems that the Episcopal Diocese of North West Texas used the URL www.nwt.org for information about the candidates. That site is now for sale. References to that site, such as at https://en.wikipedia.org/wiki/Scott_Mayer_(bishop) should be corrected/removed. Fr Kevin PJ Coffey, SCP (talk) 16:45, 18 December 2023 (UTC)[reply]

As a three-letter domain, it will probably sell. I added it to the list of domains to be usurped. Special:Diff/1186090244/1190575904 -- GreenC 17:49, 18 December 2023 (UTC)[reply]

Yahoo! Groups

I found many broken links to Yahoo! Groups. Can we find archived copies of these pages? Jarble (talk) 18:19, 18 December 2023 (UTC)[reply]

Looked at a small number through archive.org and seem to have login requirements so may suck a lot of time for little gain. Neils51 (talk) 08:36, 22 December 2023 (UTC)[reply]

Yes, these are some of the hardest cases: a soft-404 within a soft-404. A URL that redirects to a home page (www.yahoo.com) is soft-404 #1. This forces retrieving an archive URL, but the archive is also a soft-404 because it contains a login screen. The solution is to find a different archive provider that has/had the ability to log in when making the capture (archive.today) and to build extra soft-404 detection at the second layer specific to the site. This is what I am doing now with good success, but it's taking a while to discover what a soft-404 looks like since Yahoo has several varieties. -- GreenC 06:19, 27 December 2023 (UTC)[reply]

Jarble: The bot added 1,474 new archive URLs. I limited it to only adding archive.today because it has the best coverage for this site, Wayback had trouble making good saves due to logins and cookies. There were 115 it couldn't find and added a {{dead link}}. Also added the archives to IABot's database so these updates will propagate to over 300 other wikis. -- GreenC 04:48, 28 December 2023 (UTC)[reply]

ATSDR migrations

Many links from http://www.atsdr.cdc.gov have been migrated to https://atsdr.cdc.gov or https://wwwn.cdc.gov, which has broken a lot of links. Some automated attempts to archive the pages have resulted in archives of 404 errors at this page. I noticed this on Health effects of radon, and unfortunately the IDs on a lot of these pages ("ToxFAQs") have no relation to the new, identical pages on the HTTPS websites. Additionally, some articles like Peninsula Extension refer to Public Health Assessments, which need to be found in an archived page since the files have been deleted and are only available by email request. Reconrabbit (talk|edits) 18:38, 19 December 2023 (UTC)[reply]

Quick note: it looks like many of the .pdf links are still intact, but .htm / .html links need to be archived. Not priority since this has been the case for at least 5 years Reconrabbit (talk|edits) 22:15, 20 December 2023 (UTC)[reply]

User:Reconrabbit: I can see why this has gone unaddressed for so long: it's complicated. I can't promise everything is perfect but most everything that is dead now has an archive URL. They use JavaScript redirects which gave bots trouble, thus the bad archive URLs. I checked the existing archive URLs for soft-404s; this is imperfect, but it did find and replace a few (Special:Diff/1190591816/1192546009). I fixed a few of the ToxFAQ links by manually looking them up (Special:Diff/1189670705/1192547048), but most were simply archived (Special:Diff/1121144402/1192546200). If you want to create a map of old -> new the bot can use that to make changes on-wiki.

The http links existed in about 350 articles. The bot edited 211 pages. I think the difference is the links were already archived, or working such as the PDFs. It added 141 new archive URLs. And it made 127 redirect moves: Special:Diff/1154065478/1192545155 Hope that helps. -- GreenC 00:05, 30 December 2023 (UTC)[reply]

Thank you. It looks like a good number of the redirect moves don't go directly to the toxin in question, but that's fine, since it directs someone right to the ToxFAQs homepage with an alphabetical directory; shouldn't be too much of an ask for a reader to find the appropriate page from there. Recon rabbit 01:17, 30 December 2023 (UTC)[reply]

Yes those cases are not so bad. It's the ones that have tfacts##.html that would benefit from a mapping of old to new. For example Special:Diff/1121144402/1192546200 is not so great, but Special:Diff/1189670705/1192547048 is good, where I manually found the new link and programmed it into the bot. It was just too time consuming to do for all of them. If you want to map the tfacts links I'll add them to the bot. A list of 31 old URLs, the index page for the new URLs. Each can be looked up based on the context of the cite, eg. the first one in the article Benzene would look up "Benzene" at the index page and that is the new URL. -- GreenC 02:01, 30 December 2023 (UTC)[reply]

I tried out a method on a couple of the links, and found that it seems to work for pretty much every one: Replacing /tfactsXX.html with /toxfaqs/tfactsXX.pdf provides a contemporary PDF file for the item in question in every instance I tried. Ex: the archived link for Benzene, the live PDF Benzene ToxFAQ. Recon rabbit 02:22, 30 December 2023 (UTC)[reply]

Excellent discovery. Converting: Special:Diff/1192545947/1192569315 -- GreenC 02:43, 30 December 2023 (UTC)[reply]
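The path substitution found above can be sketched as follows. This is only an illustration: the thread gives the path rule but not the host, so the host and scheme are left untouched here, and the function name and example URL are hypothetical.

```python
import re

# Matches a trailing "/tfactsNN.html" and captures the "tfactsNN" part.
_TFACTS_HTML = re.compile(r"/(tfacts\d+)\.html$")


def tfacts_html_to_pdf(url: str) -> str:
    """Rewrite .../tfactsNN.html to .../toxfaqs/tfactsNN.pdf.

    URLs that do not end in the tfacts pattern are returned unchanged.
    """
    return _TFACTS_HTML.sub(r"/toxfaqs/\1.pdf", url)
```

A URL that does not match is passed through as-is, so the function is safe to run over a mixed list of links.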

Gospel Music Hall of Fame

Hello. The old url for the Gospel Music Hall of Fame looks to be usurped. The new URL was working at least until September 2023. Not sure which solution is better: 1) convert the old link to new links and use archive URLs 2) use archived URLs for both old and new links. Luckily, with the two URLS there's less than 100 links to work through. Thanks! MrLinkinPark333 (talk) 19:57, 19 December 2023 (UTC)[reply]

Update: the new url is working today. Taking a look at the URLs, some of them are easier to change over than others:

/site/ with name: this to that
/speaker-lineup/ with name can be converted like /site/:
Any with ID numbers would need manual converting. I.e. this to that
Any with years could be manually converted to individual bios. E.g. this to that. However, 2000 is used in 2 articles.
There's other exceptions where either the new URL is blank or the URL is in a slightly different order. I think an archived copy of Bartlett's old URL would be more useful as his article at Eugene Monroe Bartlett is referencing more than his year of induction.

Would this work or is there a simpler solution? Thanks! --MrLinkinPark333 (talk) 02:50, 29 December 2023 (UTC)[reply]

User:MrLinkinPark333 - gmahalloffame.org is in 30 mainspace articles. I can convert where possible using the two rules you found, and for the manual ones, I'll change to archive URLs. If you want to manually repair them, I'll provide the list of the articles/URLs which were converted to archive URLs. It will also check for the string "Biography coming soon" and treat those pages as dead. And I'll check what else might come up in the logs like soft404 redirects to the home page. -- GreenC 17:12, 31 December 2023 (UTC)[reply]
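A rough sketch of the two soft-404 checks mentioned above (placeholder text and redirects to the home page). The function names and the example URLs in the tests are hypothetical; the page body and final URL are assumed to be fetched elsewhere.

```python
from urllib.parse import urlsplit

# Placeholder string named in the discussion; pages containing it are treated as dead.
PLACEHOLDER_TEXT = "Biography coming soon"


def is_home_page(url: str) -> bool:
    """True when the URL path is empty or just '/'."""
    return urlsplit(url).path in ("", "/")


def looks_dead(requested_url: str, final_url: str, body: str) -> bool:
    """Soft-404 heuristic: the server answered, but not with the cited content."""
    if PLACEHOLDER_TEXT in body:
        return True
    # A deep link that ends up on the home page no longer verifies anything.
    if is_home_page(final_url) and not is_home_page(requested_url):
        return True
    return False
```

Both checks are heuristics; other soft-404 patterns would still need to be spotted in the logs.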

Since it's a small list, I could fix whatever didn't get converted over. Thanks! MrLinkinPark333 (talk) 18:38, 31 December 2023 (UTC)[reply]

The bot only edited 15 pages. You can check two places: Special:Contributions/GreenC_bot (ending at Dolly Parton). And Search of gmahalloffame.org. Of the edits most were adding archive URLs. The pages it didn't edit, most already had archive URLs, and with no available replacement page there was nothing it could do. -- GreenC 19:58, 31 December 2023 (UTC)[reply]

Thank you for the quick reply! MrLinkinPark333 (talk) 21:11, 31 December 2023 (UTC)[reply]

Ilta-Sanomat

Around 346 articles (full list including the ones that already use archived URLs) have URLs to the Finnish newspaper Ilta-Sanomat's website http://www.iltasanomat.fi/ that now redirects to the main page https://www.is.fi/

It seems that URLs that have an ID starting with the numbers 200000 can be fixed by simply changing "iltasanomat" to "is", e.g.:

(I also changed to HTTPS in those examples)

But the URLs with IDs starting with the number 1 or URLs with completely different patterns can't be fixed by changing "iltasanomat" to "is", e.g.: ("Sivua ei löydy" is Finnish for "Page not found")

http://www.iltasanomat.fi/musiikki/art-1288486980607.html -> https://www.is.fi/musiikki/art-1288486980607.html
but here's a working URL starting with "200000": https://www.is.fi/musiikki/art-2000000525543.html

-

So, what would be the optimal way of fixing these?

A) Setting an archive link to all of them?
B) Changing the ones starting with "200000" from "iltasanomat" to "is" and setting an archive link to others?
C) ...?

Also, there are thousands of articles with the same issue on fi.wikipedia, so helping that project too would be much appreciated. 85.76.13.79 (talk) 15:12, 20 December 2023 (UTC)[reply]

I checked around for redirect information such as in the Wayback Machine or in headers and can't find anything, so there is no map how to move the non-20000 links. The 20000 links can be moved. Thus, solution "B" for enwiki. For fiwiki, unfortunately my bot is not configured to work with Finnish citation templates. However I can change the entire domain to "permadead" in the IABot settings, this will inform IABot to convert every iltasanomat.fi link on 300+ wikis to an archive URL. -- GreenC 20:54, 1 January 2024 (UTC)[reply]

Ok, plan B for en-wiki and changing iltasanomat.fi to permadead for other wikis sounds good. Thank you in advance. (Original poster). 2001:14BA:9C98:7100:C993:D281:D619:D802 (talk) 15:48, 2 January 2024 (UTC)[reply]

Results: 487 pages contain the domain. Checked each and made changes in 378 pages (some already had archive URLs). Converted 163 URLs of the -20000 type, added 320 new archive URLs, added 12 {{dead link}}, changed 12 |url-status=live to dead. Uploaded results (archive URLs) to IABot, and changed the domain to "permadead" so it will propagate on other wikis. IABot has recorded over 6,000 unique URLs. -- GreenC 20:20, 2 January 2024 (UTC)[reply]
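Solution "B" can be sketched like this, assuming article IDs appear as /art-<digits>.html as in the examples above. The function name and action labels are hypothetical, and the second test URL is the iltasanomat.fi form inferred from the working is.fi example.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Article ID at the end of the path, e.g. /musiikki/art-2000000525543.html
_ART_ID = re.compile(r"/art-(\d+)\.html$")


def plan_iltasanomat(url: str) -> tuple[str, str]:
    """Return ("move", new_url), ("archive", url) or ("skip", url)."""
    parts = urlsplit(url)
    if parts.netloc.lower() not in ("iltasanomat.fi", "www.iltasanomat.fi"):
        return ("skip", url)
    match = _ART_ID.search(parts.path)
    if match and match.group(1).startswith("200000"):
        # IDs starting with 200000 resolve on is.fi under the same path.
        new_url = urlunsplit(("https", "www.is.fi", parts.path, parts.query, parts.fragment))
        return ("move", new_url)
    # No redirect map exists for other patterns, so fall back to an archive.
    return ("archive", url)
```

Links classified as "archive" still need a lookup at an archive provider, which is outside this sketch.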

A Note About this Forum

This forum is getting a lot of requests recently. The requests can take a lot of work, 1-7 days each depending on the complexity: custom programming, data discovery, running test cases, qualifying results, designing algorithms, waiting for the bot to run (slow due to networking), etc... Furthermore, my time to do this work is limited! If you make a request, and time goes by, that is why. I wish there was a way to boilerplate it, and I have generalized the code as much as possible, but ultimately this work is bespoke and artistic in nature due to the endless variety of conditions at remote sites. I try to respond to requests in chronological order, except when a site needs to be triaged due to imminent outage, has an extremely large footprint, or can be addressed quickly; in those cases I might respond before some others. -- GreenC 20:10, 20 December 2023 (UTC)[reply]

No worries! Take all the time you need :) MrLinkinPark333 (talk) 00:54, 22 December 2023 (UTC)[reply]

I know that recruitment is a difficult task but I really wish areas of technical maintenance like this weren't so often left to 1-3 editors. Thank you for your work, and don't rush things too much. Mach61 (talk) 22:18, 23 December 2023 (UTC)[reply]

www.smallsrecords.com

~~The WP:JUDI folks have gotten to it. I'll add the archive URLs at Draft:Chris Byars once I get off my school laptop (which blocks IA). Cheers, Mach61 (talk) 22:14, 23 December 2023 (UTC)~~[reply]

NVM only a few pages link to it Mach61 (talk) 22:20, 23 December 2023 (UTC)[reply]

IPA Fonts

According to this archived link, the IPA fonts were transferred from IPA to the Character Information Technology Promotion Council, who now host the fonts on their website. Citation 14 should link to https://moji.or.jp/mojikiban/font/ and Citations 13 and 22 (which is a dead link) should be https://moji.or.jp/ipafont/.

(Apologies if this is the wrong place for this. I'm new to editing and I didn't want to mess up the citation.) Ichneumonidae (talk) 18:25, 26 December 2023 (UTC)[reply]

Sorry, I should have said this is about the article List of CJK fonts! Ichneumonidae (talk) 18:26, 26 December 2023 (UTC)[reply]

Done: This page is for requesting changes that might affect hundreds or thousands of pages - you can check if the URL that changed is on a lot of pages by using Special:LinkSearch. If it's only affecting one article (I just checked, and it looks like these specific dead links are only present on List of CJK fonts and Mona (font)), the best place to suggest improvements is on the Talk page for that article. Thanks. Reconrabbit (talk|edits) 19:16, 26 December 2023 (UTC)[reply]

Space Launch Report

The website www.spacelaunchreport.com was cited extensively in many spaceflight articles and now has been usurped by an adware site of some sort. Could all of these links please be archived? Example link http://www.spacelaunchreport.com/falcon9ft.html#f9stglog from List of Falcon 9 first-stage boosters. Ergzay (talk) 10:02, 27 December 2023 (UTC)[reply]

As a further note, to ensure this isn't a waste of anyone's time. When searching for how many pages use the link I hit the error "A warning has occurred while searching: The regex search timed out, so only partial results are available. Try simplifying your regular expression to get complete results." so this should be a very good candidate for mass replacement. Ergzay (talk) 01:57, 28 December 2023 (UTC)[reply]

User:Ergzay, this is a known gambling site problem described at WP:JUDI. I process the domains in batches. It is added to the queue: Special:Diff/1190914504/1192203117 .. when regex searching the recommended method: insource:spacelaunchreport insource:/spacelaunchreport.com/ .. the first insource does a broad non-regex search, the second insource does a regex within the results of the first search only. Since regex is so expensive it narrows the search before doing regex. -- GreenC 05:01, 28 December 2023 (UTC)[reply]
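The two-part search recipe can be generated for any domain with a small helper. This is a sketch only: the function name is hypothetical, and it assumes the domain is passed without a "www." prefix so that the first label makes a useful keyword. The dot is left unescaped to match the recipe as written, which means it matches any character in the regex part.

```python
def insource_query(domain: str) -> str:
    """Build 'insource:<keyword> insource:/<domain>/' for a bare domain."""
    # The plain keyword narrows the result set cheaply before the regex runs.
    keyword = domain.split(".")[0]
    return f"insource:{keyword} insource:/{domain}/"
```

Calling it with "spacelaunchreport.com" reproduces the query quoted above.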

bird-stamps.org

Domain bird-stamps.org has been usurped and redirects to the home page. Link search shows about 275 articles with such links, a relative handful of these have been updated with archive links. Fabrickator (talk) 08:51, 31 December 2023 (UTC)[reply]

A WP:JUDI gambling site. Added to queue: Special:Diff/1193111754/1193243552 -- GreenC 20:26, 2 January 2024 (UTC)[reply]

Memória Globo

Most Memória Globo links are dead (like https://memoriaglobo.globo.com/programas/entretenimento/novelas/zaza.htm), there are more on Portuguese Wikipedia. Notrealname1234 (talk) 18:06, 31 December 2023 (UTC)[reply]

User:Notrealname1234: There are some working URLs, eg. [6]. I'll check each, can't set all dead. Portuguese Wikipedia has its own archive bots and archive provider, it's one of a few sites where IABot is unable to run, and my bot can't run anywhere but Enwiki. -- GreenC 20:34, 2 January 2024 (UTC)[reply]

It's done. Edited 144 pages, added 243 archive URLs, 7 {{dead link}}, moved 114 URLs to a new URL (redirects), updated IABot. -- GreenC 01:33, 3 January 2024 (UTC)[reply]

www.amjbot.org

We have hundreds of links to URLs like http://www.amjbot.org/content/96/3/668.full, which just serve an HTTP 404 in response. They can simply be removed, if they're in the URL parameter of a citation template with a DOI (which leads to the real current location of the current publisher's version). Nemo 16:36, 1 January 2024 (UTC)[reply]

User:Nemo_bis:

In 497 pages, I removed for {{cite journal}} and {{citation}} where there is a |doi= eg. Special:Diff/1189605841/1193313192
In 175 pages (with other templates or no doi), I added 169 archive URLs, 22 {{dead link}} eg. Special:Diff/1100760787/1193405038

-- GreenC 17:48, 3 January 2024 (UTC)[reply]

Nice! Thanks, Nemo 16:27, 4 January 2024 (UTC)[reply]
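The first rule above (drop the dead URL when a DOI is present) might look like this if a citation is modelled as a plain dict of template parameters. That dict model, the function name, and the DOI value in the tests are assumptions for illustration; parsing real wikitext is out of scope.

```python
from urllib.parse import urlsplit

# Templates for which a |doi= makes the dead |url= redundant.
DOI_TEMPLATES = {"cite journal", "citation"}


def drop_url_if_doi(template_name: str, params: dict[str, str],
                    dead_host: str = "www.amjbot.org") -> dict[str, str]:
    """Return a copy of params without |url= when the rule applies."""
    name = template_name.strip().lower()
    has_doi = bool(params.get("doi", "").strip())
    on_dead_host = urlsplit(params.get("url", "")).netloc.lower() == dead_host
    if name in DOI_TEMPLATES and has_doi and on_dead_host:
        return {key: value for key, value in params.items() if key != "url"}
    # Other templates, or citations without a DOI, are left for archiving.
    return dict(params)
```

Citations that fall through unchanged correspond to the second group, which received archive URLs or {{dead link}} tags instead.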

ir.uiowa.edu

This repository was retired and its contents went in various directions, including pubs.lib.uiowa.edu and scholarworks.wmich.edu. The domain currently serves TLS errors, while at some point it seemed to redirect all requests to an unrelated frontpage. URLs can be replaced where an OA copy is available, but as a first step it's ok to just remove all links in cite journal templates where a DOI is present. Nemo 13:34, 6 January 2024 (UTC)[reply]

User:Nemo_bis is "OA" -> "IA"? Otherwise I don't know what OA means. If it is IA, the example diff [7] shows the migration of ir.uiowa.edu -> pubs.lib.uiowa.edu .. are you suggesting using IA snapshots to find the redirect? Unfortunately it doesn't look like IA saved the correct redirect information. [8] Is there some place else to obtain the new URL? -- GreenC 17:32, 6 January 2024 (UTC)[reply]

No, OA as in open access. Citation bot will add the OA links later if the broken links are removed. I was only asking about the removal, sorry. Nemo 15:51, 7 January 2024 (UTC)[reply]

User:Nemo_bis, there are 418 pages with the domain. For all cite journal with a doi: A) In 132 citations removed the URL Special:Diff/1137009702/1194866745. B) In another 84 there was a working redirect migrated Special:Diff/1184196199/1194866750. For everything else not a cite journal with a doi: C) Added 198 archive URLs Special:Diff/1186059609/1194876255. Migrated 54 redirects same as B). And D) added 8 {{dead link}} Special:Diff/1173723334/1194876457. -- GreenC 05:39, 11 January 2024 (UTC)[reply]

Outstanding! I thought figuring out the redirects would be too much work (some go to a Primo frontpage). Nemo 21:21, 12 January 2024 (UTC)[reply]

I can usually catch those that redirect to the same place, by the nature of the same destination URL showing up multiple times in the logs, during a trial-run. I add a trap for them in the code to treat those redirects as dead links, and rerun it again. Almost every domain has this problem, to some degree. It's hard to fully automate but I have as much as possible. -- GreenC 22:27, 12 January 2024 (UTC)[reply]
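As a sketch of the trap described above: count how often each redirect destination appears in a trial-run log and flag the ones that recur. The log format (pairs of source and destination URL), the function name, and the threshold of 5 are assumptions, not the bot's actual settings.

```python
from collections import Counter


def find_redirect_traps(redirect_log: list[tuple[str, str]], min_hits: int = 5) -> set[str]:
    """Return destinations that many different source URLs redirect to.

    A destination shared by many unrelated sources is likely a generic
    landing page, so redirects to it should be treated as dead links.
    """
    counts = Counter(destination for _source, destination in redirect_log)
    return {destination for destination, hits in counts.items() if hits >= min_hits}
```

The flagged destinations can then be added to the code as traps before the run is repeated.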

Cool! Makes sense. Nemo 07:25, 14 January 2024 (UTC)[reply]

Is there any way to get a list of where these changes were made? I have been correcting all the links as I have time. None of them should be dead and all have live content somewhere, most should be using a DOI (which I have been adding) 1920wr (talk) 16:38, 17 January 2024 (UTC)[reply]

1920wr, Yes. I could provide a list of the article names for set C), but it would miss pre-existing archive URLs. It's probably better to find them with this search: 196 articles. Set D) is hard to search for, so here are the 8 where the bot added a {{dead link}}: Victor L. Littig, Jonathan Blum (writer, born 1967), John Herriott, R. Douglas Hurt, Second plague pandemic, Mayors of Sioux City, Iowa, List of school districts in Iowa, Christopher B. Krebs .. good luck with this project it would be great to see them converted to cite journal with DOI, a major improvement for this domain. If you think there is something I can help with using the bot let me know. -- GreenC 21:15, 17 January 2024 (UTC)[reply]

ebooks.adelaide.edu.au (404)

460 pages. "eBooks@Adelaide has now officially closed", January 7, 2020. There is no copy or replacement site. Prior to 2014 it was http://etext.library.adelaide.edu.au (same paths).

If path contains ".html" then convert to an archive URL
If path contains 4 elements and ends in "/" eg. http://ebooks.adelaide.edu.au/k/kant/immanuel/k16p/ then add "complete.html" and convert to archive URL ie. http://ebooks.adelaide.edu.au/k/kant/immanuel/k16p/complete.html -> https://web.archive.org/web/20110309070433/http://ebooks.adelaide.edu.au/k/kant/immanuel/k16p/complete.html
If path contains 3 elements and ends in "/" eg. http://ebooks.adelaide.edu.au/m/mill/john_stuart/ convert to archive URL
Exceptions to rule 2 & 3 are Plutarch, Voltaire, etc.. eg. https://ebooks.adelaide.edu.au/p/plutarch/symposiacs/ .. check logs for other exceptions
Optionally where no archive exists, either remove URL from citation or nuke citation if an external link section.

-- GreenC 18:10, 6 January 2024 (UTC)[reply]
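The path rules listed above can be sketched as a function that returns the URL to look up at the archive. Assumptions: the function name is hypothetical, exceptions are given as path prefixes supplied by the caller, and exceptional or unmatched paths return None to mean "review manually", since the rules do not say how exceptions are handled.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def adelaide_archive_target(url: str, exceptions: tuple[str, ...] = ()) -> Optional[str]:
    """Return the URL to archive, or None when no rule applies."""
    parts = urlsplit(url)
    path = parts.path
    if ".html" in path:
        return url  # rule 1: archive the page as cited
    if not path.endswith("/") or path.startswith(exceptions):
        return None  # not a directory-style path, or a listed exception
    elements = [segment for segment in path.split("/") if segment]
    if len(elements) == 4:
        # rule 2: a work directory; point at the single-page full text
        return urlunsplit(parts._replace(path=path + "complete.html"))
    if len(elements) == 3:
        return url  # rule 3: an author directory; archive as cited
    return None
```

Single-name authors such as Plutarch have one path element fewer, which is why they need to be listed as exceptions, for example `exceptions=("/p/plutarch/",)`.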

Done, saved all but a handful. The existing links were often not to the full text, the archive version didn't follow the chapter tree so the texts were incomplete. I moved many to the "complete.html" version, which is the entire text on a single page, then converted to the archive.org version of that page. Special:Diff/1061289409/1195386287 .. Also, most are 19th century texts, they could be replaced by Gutenberg etc -- GreenC 04:57, 14 January 2024 (UTC)[reply]

oxfordislamicstudies.com

The domain "oxfordislamicstudies.com", referenced in about 400 articles, is returning the "NET::ERR_CERT_COMMON_NAME_INVALID" error.

It seems that in at least some cases, the current content is available at oxfordreference.com. Other possible places to look would be oxcis.ac.eu or perhaps ox.ac.uk. I really have no idea to what extent archive copies of oxfordislamicstudies.com provide any useful content. Fabrickator (talk) 19:05, 7 January 2024 (UTC)[reply]

In the case of http://www.oxfordislamicstudies.com/article/opr/t125/e2280?_hi=2&_pos=2 (non-working link), the archive copy returns useful content, while the oxfordreference.com link provides too little content to likely be of any use. Fabrickator (talk) 19:27, 7 January 2024 (UTC)[reply]

No archived version available at https://fatcat.wiki/release/lookup?doi=10.1093/acref/9780195165203.001.0001 yet either. Were they all HTML pages only or was there a PDF somewhere? Nemo 20:42, 7 January 2024 (UTC)[reply]

The book itself is archived (example). Nemo 20:43, 7 January 2024 (UTC)[reply]

According to [9]: "Oxford Islamic Studies Online product site has been retired. Content you previously purchased on Oxford Islamic Studies Online has now moved to Oxford Reference, Oxford Handbooks Online, or What Everyone Needs to Know." They are paywalled sites and there is no redirect map. The Wayback links will probably be better, worth a try. -- GreenC 05:23, 14 January 2024 (UTC)[reply]

Fabrickator: In 317 articles, I added 413 new archive URLs, 19 {{dead link}}, and changed 106 |url-status=live to dead. -- GreenC 22:30, 15 January 2024 (UTC)[reply]

now Malware: myetymology.com

There are at least fifty uses of "www.myetymology (dot) com" on en.wiki [10], both bare URLs and in Cite templates. This domain seems to have some tricky malware scheme on it: visited via a Chrome browser it shows a page with the Chrome logo and text with something about having to verify that you're human and you should click "Allow". Via a Firefox browser, it puts up a grayed-out dummy page with a white dialog-box-like splash area saying "Before you continue to myetymology.oom" and blather about security and "download Firefox add-on", with a single button labeled "continue". It does tricky stuff too: when I switched away from the Chrome window to invoke the snip-it utility to capture it, it changed the display so that it showed a duckduckgo search for "!ducky" (a search engine I don't use). The domain has definitely been usurped, is very likely dangerous, and needs to be eradicated from Wikipedia. -- R. S. Shaw (talk) 04:13, 10 January 2024 (UTC)[reply]

User:R._S._Shaw: Added to the WP:JUDI queue for usurpation Special:Diff/1193243552/1195955910 -- GreenC 22:35, 15 January 2024 (UTC)[reply]

Change of URL for Lawfare

The website has undergone a total revamp, including a change of URL from lawfareblog.com to lawfaremedia.org.

OLD: https://www.lawfareblog.com/

NEW: https://www.lawfaremedia.org/

Valjean (talk) (PING me) 16:31, 10 January 2024 (UTC)[reply]

Valjean: Done. I changed the domain, and also checked for redirects, and the live status of each URL. It was more difficult due to CloudFlare DDoS mitigation blocking the bot, but resolved. About 408 URLs changed Special:Diff/1187555382/1196040749, another 18 moved the archive URLs and modified |url-status= Special:Diff/1177313295/1196042453. regards -- GreenC 04:34, 16 January 2024 (UTC)[reply]

Thanks! -- Valjean (talk) (PING me) 05:23, 16 January 2024 (UTC)[reply]

2002 Winter Olympics torch relay broken archive links

Hello. Both 2002 Winter Olympics and 2002 Winter Olympics torch relay use this archive URL but it does not work. Instead it redirects to the Wayback Machine and has a question mark in the URL. Looking at old archived copies of this link, none of the 2001 and 2002 versions work despite being highlighted in blue. Some of the 2002 archived copies redirect to a blank page. I was wondering why this was the case. Thanks! MrLinkinPark333 (talk) 20:50, 16 January 2024 (UTC)[reply]

I reported it, but can not guarantee it will get resolved. I looked in various places and ways and can not find a working replacement for this archive. It's an old site (by Internet standards) and went dead within a few years of creation. Thanks for the report. -- GreenC 22:01, 16 January 2024 (UTC)[reply]

No worries! It does make me wonder if any other archived URLs used on Wikipedia instead redirect to the Wayback Machine and put a question mark into the URL. This is the first time that has happened to me. MrLinkinPark333 (talk) 00:49, 17 January 2024 (UTC)[reply]

There is link rot within the Wayback Machine itself. My bot WaybackMedic was made (and named) for that purpose, but it takes so long now to check every archive URL, due to the volume, it's not feasible to run it that way anymore. When we started in 2015 there were around 600k archive URLs on enwiki, now there are nearly 12 million and adding about 200k a month. -- GreenC 01:35, 17 January 2024 (UTC)[reply]

Ah. I wasn't aware of the issues with the Wayback Machine. Hopefully this is a limited issue. MrLinkinPark333 (talk) 02:25, 17 January 2024 (UTC)[reply]

Yes I believe it's a very small fraction. Of course we don't know what we don't know, cases like this are only knowable by manual discovery. If it was a lot we'd be hearing more complaints. The cases I can detect, it's like 0.0005% error rate. -- GreenC 03:47, 17 January 2024 (UTC)[reply]

Big Cartoon DataBase

Per Wikipedia:Templates for discussion/Log/2024 January 16#Big Cartoon DataBase Template:Bcdb and Template:BCDB title are being deleted, however there are many other non-templated links to that website that aren't working (see for example the second reference at Tod Carter or the external link at Knight-mare Hare). Reporting here as I don't think anything is currently done with these (archived, marked as dead, or removed) Gonnym (talk) 14:04, 23 January 2024 (UTC)[reply]

Gonnym, I see about 1,000 instances of the templates, and another 1,400 links. The site has been "excluded from the Wayback Machine". But, the first one I checked is available at archive.today. There are a number of options:

Convert the 1,000 templates to normal square links, then convert those plus the 1,400 to archive.today, where available, or add a {{dead link}} if not. That way if the site is ever un-excluded from the Wayback in future those archives could get added.
Nuclear option: completely eliminate all citations and links to this site.
Some other combo, like nuking the 1,000 but trying to save the 1,400 and if any of those don't archive then nuke those etc..

Both options are a bit of work, nuking is not clean it's semi-automated each one has to be visually verified it didn't mangle things, but I have done it before and the quantity isn't too high. The conversion and archiving is more automated. My suggestion, if you think the site is completely unreliable and should be eliminated even when it has archives, the nuclear option, otherwise the first option. -- GreenC 14:40, 23 January 2024 (UTC)[reply]

I have no real opinion here as I hadn't participated in that discussion but I'll ping here others that did. @Snowmanonahoe @TechnoSquirrel69 @WikiPediaAid. Gonnym (talk) 14:47, 23 January 2024 (UTC)[reply]

The site is a wiki... I'm impressed it managed to amass 1400 citations. I say nuke it, because again, it's a wiki. Snowmanonahoe (talk · contribs · typos) 15:57, 23 January 2024 (UTC)[reply]

Thanks for the ping, Gonnym! The links being generated by the template are already being removed by a bot since the TfD closed as delete, so we don't need to worry about those. I would rather not indiscriminately delete the other links in citations, just add the archive URL along with a |url-status=dead if applicable. —TechnoSquirrel69 (sigh) 15:00, 23 January 2024 (UTC)[reply]

Sounds like that bot is not only eliminating the template, but also the entire citation to BCD. Sounds like a limitation of the bot, it can only delete templates without the option to convert to square links. That's unfortunate because TfD should concern removing templates, not removing citations, which is more the domain of WP:RSN. This is a common scenario with a mix of templates and links and we end up with this inconsistency. Some cites are completely deleted because of the template, others are kept because they are square links, it's random. Anyway this is not directly related to BCD just observing. I can try to archive what is left no problem. -- GreenC 15:25, 23 January 2024 (UTC)[reply]

I don't think the bot is removing citations, just the links generated by the {{bcdb}} template. All of the cite links should still be around. —TechnoSquirrel69 (sigh) 15:45, 23 January 2024 (UTC)[reply]

For now, I'll retain the citations and treat the links as dead. There is no clear consensus to nuke cites entirely. -- GreenC 01:47, 24 January 2024 (UTC)[reply]

Thanks, GreenC! —TechnoSquirrel69 (sigh) 23:06, 24 January 2024 (UTC)[reply]

I made the following edits

Remove pre-existing Wayback links since they don't work
Add archive.today links when available (1,025)
Add {{dead link}} for the rest (697)
Update iabot.org so changes can propagate to 300+ other language wikis

If in the future the restriction on Wayback is lifted the bots should be able to convert the dead links. -- GreenC 02:59, 25 January 2024 (UTC)[reply]

Gemini, Apollo, Shuttle Mission "Chronology of Wake-up Calls"

This weblink PDF (https://history.nasa.gov/wakeup%20calls.pdf) is used as a secondary source across a large number of articles for the Gemini, Apollo, and especially Space Shuttle missions. It recently got 404'd, but a very recent archived link is available here (https://web.archive.org/web/20231220093919/https://history.nasa.gov/wakeup%20calls.pdf). It would be great if y'all can add this archive link to the queue. SpacePod9 (talk) 00:54, 24 January 2024 (UTC)[reply]

I submitted an IABot job to process the 56 pages where it's located. -- GreenC 01:51, 24 January 2024 (UTC)[reply]

Thanks for the help! SpacePod9 (talk) 03:43, 24 January 2024 (UTC)[reply]

Canoe.ca

It appears that canoe.ca was once a news website that is referenced in quite a few articles, but it has since been usurped by another gambling website. Unfortunately, the new owners have also blocked the Wayback Machine and only some of the pages I've seen are in archive.today. However, some of the links appear to be salvageable by changing "canoe.ca" to "canoe.com" and then going into the Wayback Machine. Is this something that the bots can help with? Thanks! :Jay8g [V•T•E] 23:32, 27 January 2024 (UTC)[reply]

That was probably a little confusing. There are basically three ways that existing canoe.ca links can be archived:

Archive.today might have a direct archive of the canoe.ca URL
The Wayback Machine might have an archive of the same page with "canoe.ca" replaced with "canoe.com"
Archive.today might have an archive of the same page with "canoe.ca" replaced with "canoe.com"

As far as I can tell, the canoe.ca and canoe.com pages were completely identical, but all of the links I've checked seem to be dead on both domains. Unfortunately, there are over 10,000 of these links according to Special:LinkSearch, which is too much for me to deal with manually. There are also quite a few dead links to canoe.com itself, but at least those aren't usurped and can be found in the Wayback Machine normally. :Jay8g [V•T•E] 23:45, 27 January 2024 (UTC)[reply]
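The three lookups listed above can be sketched as follows. This is only an illustration, not the bot's code: the function names are hypothetical, and the archive.today "newest" and Wayback "/web/" lookup patterns are assumptions about how those services resolve a URL to its latest capture.

```python
from urllib.parse import urlsplit, urlunsplit


def to_canoe_com(url: str) -> str:
    """Return the canoe.com twin of a canoe.ca URL; other URLs are returned unchanged."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Match the bare domain and any sub-domain (www, jam, weather, ...).
    if host == "canoe.ca" or host.endswith(".canoe.ca"):
        return urlunsplit(parts._replace(netloc=host[: -len(".ca")] + ".com"))
    return url


def candidate_lookups(url: str) -> list[tuple[str, str]]:
    """List (description, lookup URL) pairs in the order given in the comment above."""
    twin = to_canoe_com(url)
    return [
        # Assumed lookup patterns; neither is confirmed by this discussion.
        ("archive.today, original canoe.ca URL", "https://archive.today/newest/" + url),
        ("Wayback Machine, canoe.com twin", "https://web.archive.org/web/" + twin),
        ("archive.today, canoe.com twin", "https://archive.today/newest/" + twin),
    ]


if __name__ == "__main__":
    for label, lookup in candidate_lookups(
        "http://www.canoe.ca/SlamWrestlingArchive/feb24_rocky.html"
    ):
        print(label, "->", lookup)
```

The Wayback Machine is deliberately not queried with the canoe.ca URL itself, since those captures are reported above as blocked.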

Notes for canoe.ca ie. canoe.com:

6,148 pages
391 pages with archive.org links
339 pages with WebCite links
1,184 pages with archive.today links
Over 100 sub-domains (www, jam, weather, etc.); see IABot for the full list
Nothing in template or module NS

Proposal for canoe.ca in five runs of WaybackMedic:

Pass 1a (canoe1): Remove all Wayback links Done - remove 391 archives
Pass 1b (canoe3 & canoe4): Remove all WebCite links (SSL errors and unstable) Done - remove 329 archives
Pass 2 (canoe2): Attempt conversion to archive.today. Else add {{dead link}} Done - add 8,353 archive.today, 633 {{dead link}} (total including existing), change 578 |url-status=live to dead
Pass 3a (canoe5): For canoe.ca with a {{dead link}}: check the API if a Wayback link exists if it were converted to canoe.com - if so, change source link to canoe.com and set to live status and remove {{dead link}} Done - 157 URLs converted to canoe.com
Pass 3b (canoe6): Check the canoe.com links from Pass 3a for link rot, if so, convert to Wayback or archive.today links Done - 294 Wayback URLs added to canoe.com URLs in the same set of articles processed during Pass 3a (excess due to pre-existing canoe.com links that were dead)
Pass 3c (canoe7): Make a list of citations with {{dead link}} Done 406 cites listed at Wikipedia:Link rot/cases/canoe.ca
Pass 4 (judi14a and judi14b): Convert canoe.ca to a usurped citation per steps at WP:USURPURL. This will include completely deleting citations that have no archive URL Done Edited approximately 6,000 pages.
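The API check in Pass 3a might look roughly like the sketch below. It assumes the public Wayback Availability API (https://archive.org/wayback/available) is the API meant, which the proposal does not state. The function names are hypothetical, and the network request is left out so that only the query construction and response handling are shown.

```python
import json
from urllib.parse import urlencode

# Assumption: the public Wayback Availability API endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url: str) -> str:
    """Build the availability query for a URL (for example a canoe.com twin)."""
    return AVAILABILITY_ENDPOINT + "?" + urlencode({"url": url})


def closest_snapshot(response_text: str):
    """Return the snapshot URL from an availability response, or None if there is none."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    # Only accept captures that are marked available and were saved with HTTP 200.
    if closest and closest.get("available") and closest.get("status") == "200":
        return closest.get("url")
    return None


if __name__ == "__main__":
    print(availability_query("http://www.canoe.com/SlamWrestlingArchive/feb24_rocky.html"))
```

If `closest_snapshot` returns a URL, the source link would be switched to canoe.com as described in Pass 3a; if it returns None, the {{dead link}} stays.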

Proposal for canoe.com

Pass 5 (canoecom): Check for dead links and soft-404s as normal Done Edited 1,132 articles out of 1,953 checked. Added 1,820 archive URLs. Change 371 |url-status=live to dead

----

User:Jay8g per above proposal. Each pass of the bot has different settings enabled. When done in this order, it should work. The "Pass 3" might result in a lot of deleted citations; I'll let you know before running that one. This will require at least 4 runs of the bot of 6k pages each, plus some manual steps, so it will take a while. -- GreenC 01:39, 28 January 2024 (UTC)[reply]

That all sounds good to me! Thanks! :Jay8g [V•T•E] 04:01, 28 January 2024 (UTC)[reply]

I just thought of one issue with pass 4: Because canoe.ca was a news aggregator, some of the citations that currently link to it can be found on other, unrelated websites. For example, the reference in Dwayne Johnson (the first link that comes up for me in the 6,148-page search) points to http://www.canoe.ca/SlamWrestlingArchive/feb24_rocky.html on canoe.ca, but the same article can be found at https://slamwrestling.net/index.php/1998/02/24/a-piece-of-the-rock/ on Slam Wrestling's own website. That exact article is also available using the Wayback Machine with canoe.com, but if it was not available there, replacing it with the slamwrestling.net URL would be better than deleting it. Of course, there's no way to do that without manual work, and anything that's just a bare URL is gone for good.

I will be interested to see how many canoe.ca links are left after steps 1-4, to see whether it makes sense to remove those links entirely or try to find the same articles posted elsewhere first. I'm not sure if this is a situation that has come up before with usurped URLs like this or what the standard practice is. :Jay8g [V•T•E] 04:18, 28 January 2024 (UTC)[reply]

For the rocky example, there is no map to know where the canoe.ca link should go. And since canoe.ca is now a usurped vice site we are supposed to hide it from view. And if no archive is available, delete it. Let's wait and see how many there are after Pass 3. One solution is rather than delete the entire cite, convert to {{citation}} which doesn't require a URL, convert the |work= to Slam Wrestling, and remove the canoe.ca URL. This kind of work is laborious because people use so many permutations of citation templates and argument combinations that nothing is consistent. There are also the square and bare links that don't use templates. -- GreenC 16:54, 28 January 2024 (UTC)[reply]

Yes, there's no automatic way to fix that. I'm also not sure how many of the links would even be able to be manually fixed, since some might not be able to be easily found on other domains. I agree with waiting to see what is left after the bot tries to find archive links to see if it's worth me trying to fix the leftovers manually. :Jay8g [V•T•E] 22:05, 28 January 2024 (UTC)[reply]

User:Jay8g: Here are the remaining 406 citations with {{dead link}}: Wikipedia:Link rot/cases/canoe.ca .. there are over 11,000 in total on enwiki so the archival success rate was about 96% which is very good. Something still needs to be done with the 406. The options are: nuke the citation, which is the only choice for square links; or convert to {{cite news}} and remove the |url= - this option is normally used when the cite can be found offline, like microfiche of a newspaper. Of course, there is manual work, where anything is possible. In the meantime, I'll start processing the rest of the canoe.com links, many appear inoperable. -- GreenC 14:36, 30 January 2024 (UTC)[reply]

I spot-checked several of the remaining 406 dead links and was unable to find alternative links for any of them, so I think we should be good to remove the remaining links. Thanks for all your help on this -- I'm impressed by how many links were able to be fixed! :Jay8g [V•T•E] 21:50, 30 January 2024 (UTC)[reply]

User:Jay8g sounds good. I'll be working on this over the next few days and will post when done. Thanks for bringing this to attention. I've been aware of Canoe, but didn't know it was usurped and excluded from Wayback, that's a new scenario (plus the canoe.com twist). It basically required every feature my bot has and then some; I have never made so many passes. This was a good learning experience in what the bot can do and how. -- GreenC 02:14, 31 January 2024 (UTC)[reply]

As noted above, this is all done finally. -- GreenC 02:34, 5 February 2024 (UTC)[reply]

Most of the content on canoe.ca was from the Sun Media newspapers, so many of these articles can probably be found in Canadian newspaper archives (Web archives like https://web.archive.org/web/*;type=text/torontosun.com/* or newspaper archives like NewspaperARCHIVE.com). It looks like the URLs with "-cp" were Canadian Press stories, and a bunch of them list The Canadian Press as the author, publisher, agency, etc., and the URLs with "-ap" were Associated Press stories. Articles from those agencies should be available in a variety of places. Finding them is the challenge.

The wrestling articles could probably all be found on Slam Wrestling if someone is willing to do the work. I didn't see any equivalent partner sites for other sports or categories.--Jahalive (talk) 02:22, 2 February 2024 (UTC)[reply]

Warren Abstract Machine citations

Some citations at Warren Abstract Machine are broken, including this one: http://wambook.sourceforge.net/ 185.151.251.58 (talk) 08:54, 31 January 2024 (UTC)[reply]

I ran IABot on the page but it might take a few tries before the bot decides a link is dead. - GreenC 02:19, 2 February 2024 (UTC)[reply]

bibliotecadigital.ciren.cl

This Chilean digital library seems to have reformatted its URLs and is used in numerous articles as a source. Here's a list - it seems like they still host most if not all articles but under different URLs. Jo-Jo Eumerus (talk) 13:52, 31 January 2024 (UTC)[reply]

User:Jo-Jo_Eumerus is there an example of old to new? Most likely if it's not obvious how to change there is nothing we can do other than treat the old links as dead and add archives. -- GreenC 02:15, 2 February 2024 (UTC)[reply]

It seems like they still share the titles: https://bibliotecadigital.ciren.cl/server/api/core/bitstreams/72bd0a55-5f0d-4ea6-98c4-116797dce09e/content becomes https://bibliotecadigital.ciren.cl/items/96666f36-9fc4-4833-8a95-0e85c6fd98ce Jo-Jo Eumerus (talk) 11:13, 3 February 2024 (UTC)[reply]

cnnphilippines.com

CNN Philippines has ceased operations as of January 31, 2024. As of now, https://cnnphilippines.com feeds back a 503. We'll need IABot to comb through the roughly 2,200 pages (~3,000 links total) it's linked on and add archives to those citations. Relevant discussion at WT:TAMBAY#Archiving news articles of CNN Philippines. Chlod (say hi!) 17:17, 31 January 2024 (UTC)[reply]

Submitted to IABot. -- GreenC 02:12, 2 February 2024 (UTC)[reply]

I don't know why but IABot missed over 1,000 links so I reran it with WaybackMedic and got the rest. -- GreenC 02:36, 5 February 2024 (UTC)[reply]

Many thanks, @GreenC!

Chlod (say hi!) 12:48, 5 February 2024 (UTC)[reply]

themessenger.com

themessenger.com has shut down [11], we have around 186 uses per themessenger.com . All of the news articles are now linking to a blank page (e.g. [12]) Hemiauchenia (talk) 19:46, 1 February 2024 (UTC)[reply]

Submitted to IABot. -- GreenC 02:17, 2 February 2024 (UTC)[reply]

Wst.tv

Hi, with a heavy heart, the World Snooker Tour has changed its website and changed how all of their links work, and has no real naming convention for most links from wst.tv.

For instance: https://wst.tv/players/jimmy-white/ now is at https://www.wst.tv/players/6100064a-0ea4-4a0c-b8ee-0e2ddaa3def4

News articles and other items have also moved. If there is a smart way for this to be fixed, let me know, but I'm assuming we'd need to archive/mark as dead for the remainder. Lee Vilenski ^{(talk • contribs)} 19:39, 2 February 2024 (UTC)[reply]

User:Lee Vilenski I don't see a way to migrate the links without redirect information. If some links have a redirect the bot will pick it up automatically. Otherwise it will add an archive URL or {{dead link}}. Looks like 379 pages. -- GreenC 05:57, 3 February 2024 (UTC)[reply]

All of the news articles have moved from https://wst.tv/murphy-takes-season-opener/ to https://www.wst.tv/news/2023/july/21/murphy-takes-season-opener/

It's a mess, I certainly don't see a way to fix it. Lee Vilenski ^{(talk • contribs)} 09:04, 3 February 2024 (UTC)[reply]

It's surprisingly common how often websites migrate to a new platform, and don't leave redirects. If you want, contact them to ask if they plan to leave redirects and mention Wikipedia as an example. For now I can still add the archives, and if in the future they add redirects, the bot can undo the archives, make it live again and migrate to the new redirected URL. Either way it's basically flipping a switch in the bot. -- GreenC 14:12, 3 February 2024 (UTC)[reply]

Regarding contacting WST: My experience is that they do not respond. It might be better to try to convince their software suppliers to provide redirects. It would appear that there are two companies involved. One is https://urbanzoo.io/ and the other is https://www.imgarena.com/. Alan (talk) 12:42, 4 February 2024 (UTC)[reply]

It looks like content was not migrated. For example, take the old-site page https://wst.tv/white-completes-epic-comeback/ : searching the new site for "White Completes Epic Comeback" in the news tab returns no result. Likewise Google: https://www.google.com/search?client=firefox-b-1-lm&q=%22White+Completes+Epic+Comeback%22+site%3Awst.tv .. It looks like a complete reset of the site, and any matches found, like with the /players pages, could be happenstance. --- GreenC 17:39, 4 February 2024 (UTC)[reply]

I was able to build a preliminary map of the player pages, by headless browsering https://www.wst.tv/players/ and reformatting the HTML into this table, making a best guess on the left column. If the bot encounters a URL in the left column, it will replace with the right column. -- GreenC 17:14, 4 February 2024 (UTC)[reply]
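A lookup against such a map could be sketched as below. This is a hypothetical illustration, not the bot's implementation: the function name and the normalization rules (ignoring the scheme, a "www." prefix, letter case and a trailing slash) are assumptions, and only two rows of the table are reproduced.

```python
from urllib.parse import urlsplit

# Two rows copied from the preliminary table; the full map has one entry per player.
PLAYER_MAP = {
    "jimmy-white": "https://www.wst.tv/players/6100064a-0ea4-4a0c-b8ee-0e2ddaa3def4",
    "mark-allen": "https://www.wst.tv/players/c37aba27-5b12-4fae-8a8b-9e749c7a25f3",
}


def migrate_player_url(old_url: str):
    """Return the new player URL for an old-style one, or None when the map has no entry."""
    parts = urlsplit(old_url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    if host != "wst.tv":
        return None
    # Expect exactly /players/<slug>, with or without a trailing slash.
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) != 2 or segments[0] != "players":
        return None
    return PLAYER_MAP.get(segments[1].lower())


if __name__ == "__main__":
    print(migrate_player_url("https://wst.tv/players/jimmy-white/"))
```

A None result means there is no known target, in which case the link would fall back to the usual archive URL or {{dead link}} handling.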

I think it is much more complex than that. The old site had pages for many more players than are currently included in https://www.wst.tv/players which only has current players. Look at https://web.archive.org/web/20221126125804/https://wst.tv/player_category_taxonomy/other-players/. Most of these are gone completely, and many are referred to in our articles. Alan (talk) 10:12, 5 February 2024 (UTC)[reply]

...for instance: if you search in https://www.wst.tv/players for "Davis", you will only get Mark Davis. The old site included Steve Davis, Joe Davis and Fred Davis, who were significant players, apparently now forgotten by WST. Alan (talk) 10:27, 5 February 2024 (UTC)[reply]

OK I was afraid of that, it didn't seem like many players. It does appear the old site and content were completely abandoned, and the new site has some overlap, but that is happenstance and it can't be assumed to contain the same actual content on the page even if a match can be made. They didn't do a site migration. In this case, for citation verification purposes, the correct action is to treat everything from the old site as a dead link and hope there are archives available. -- GreenC 14:40, 5 February 2024 (UTC)[reply]

That's pretty much what we've been doing. If you look at the List of snooker players you'll see that all the references have working archives. Alan (talk) 15:14, 5 February 2024 (UTC)[reply]

Extended content

awk -ilibrary 'BEGIN{f=readfile("snook1.html"); for(i=1;i<=splitn(f,a,i);i++) {j++; if(j == 5) {j = 1; print "https://wst.tv/players/" tolower(fname) "-" tolower(lname) " --  https://www.wst.tv/" subs("href=\"/","",id) }; if(j == 1) {match(a[i], /href=["]\/players\/[^"]+[^"]/, d); id=d[0]}; if(j == 2) {fname=strip(a[i])}; if(j==4){lname=strip(a[i])}  }  }'

https://wst.tv/players/mark-allen --  https://www.wst.tv/players/c37aba27-5b12-4fae-8a8b-9e749c7a25f3
https://wst.tv/players/zhang-anda --  https://www.wst.tv/players/0512f55a-faea-48df-a8fc-895fbcaef511
https://wst.tv/players/muhammad-asif --  https://www.wst.tv/players/3f7a3e33-3889-4c3f-91e3-a6d876c8b999
https://wst.tv/players/john-astley --  https://www.wst.tv/players/49e85842-53d7-4fdb-b69b-4a0db92ff06d
https://wst.tv/players/stuart-bingham --  https://www.wst.tv/players/ac932300-dacb-4e91-803b-99a03fa20853
https://wst.tv/players/luca-brecel --  https://www.wst.tv/players/cd124662-9d97-413c-9609-5051d002ab3b
https://wst.tv/players/jordan-brown --  https://www.wst.tv/players/c49e98bc-101d-419a-81aa-ff2caedb1734
https://wst.tv/players/oliver-brown --  https://www.wst.tv/players/fe7732cc-435e-4ba8-84bf-25f771f0f376
https://wst.tv/players/alfie-burden --  https://www.wst.tv/players/b6350368-74fc-4adf-92c8-ff9126e90541
https://wst.tv/players/ian-burns --  https://www.wst.tv/players/80c5ce19-2c01-48a4-85e4-c0304ac1ea4a
https://wst.tv/players/james-cahill --  https://www.wst.tv/players/4b7b307c-8ec8-4b53-b46e-6817081b95c4
https://wst.tv/players/stuart-carrington --  https://www.wst.tv/players/37a87bd0-792f-46ae-9377-56df3bef9034
https://wst.tv/players/ali-carter --  https://www.wst.tv/players/c796b82d-1040-422d-b27d-9249310b99a3
https://wst.tv/players/ashley-carty --  https://www.wst.tv/players/32dedd2f-0e09-4c03-bed3-679646da516b
https://wst.tv/players/jamie-clarke --  https://www.wst.tv/players/b29c7ae2-4f1c-413c-92bb-01ce78d99b08
https://wst.tv/players/sam-craigie --  https://www.wst.tv/players/edcdfdad-8c65-48fb-94f0-b9b3ac9ad04d
https://wst.tv/players/dominic-dale --  https://www.wst.tv/players/86fd8e51-3964-497c-97c3-729cef44b1f0
https://wst.tv/players/mark-davis --  https://www.wst.tv/players/0398e6dc-dcbf-4ff0-9ff2-7515212bc818
https://wst.tv/players/ryan-day --  https://www.wst.tv/players/5d419487-e341-4301-a4f5-e493a2a78754
https://wst.tv/players/ken-doherty --  https://www.wst.tv/players/e9c5eddd-e493-473e-b688-a3a2ea861800
https://wst.tv/players/scott-donaldson --  https://www.wst.tv/players/ff710b2f-cf05-45d6-840e-e10a7dc9f921
https://wst.tv/players/mostafa-dorgham --  https://www.wst.tv/players/14243478-1def-4ce2-a9a0-80a2858abe32
https://wst.tv/players/graeme-dott --  https://www.wst.tv/players/e0f5c435-470e-4ac3-8406-5ccd39fd475c
https://wst.tv/players/adam-duffy --  https://www.wst.tv/players/2fc33800-aaf8-4e7f-9af0-afc58df79ed2
https://wst.tv/players/ahmed aly-elsayed --  https://www.wst.tv/players/f65d2c9a-513a-458b-9c8b-edfc3aebbce6
https://wst.tv/players/dylan-emery --  https://www.wst.tv/players/0106063a-5a37-47c3-9cbf-67a891012a5e
https://wst.tv/players/reanne-evans --  https://www.wst.tv/players/bc4020ad-76c2-42a4-8994-dd0f756d0b6a
https://wst.tv/players/tom-ford --  https://www.wst.tv/players/69df4145-0b26-4a1e-9afb-c9ae74fa3fd1
https://wst.tv/players/marco-fu --  https://www.wst.tv/players/5012642c-60cc-4ab3-a41b-b152370562eb
https://wst.tv/players/david-gilbert --  https://www.wst.tv/players/9b2532c1-a189-4573-8320-f254d2f9bfde
https://wst.tv/players/martin-gould --  https://www.wst.tv/players/2a0e2004-856c-4f0b-ae3e-54dded6141f8
https://wst.tv/players/david-grace --  https://www.wst.tv/players/ad650d94-b08b-4dc5-9c5f-1653dc909127
https://wst.tv/players/liam-graham --  https://www.wst.tv/players/75baf94d-2c63-42dc-8acb-4e7a5a7bcb09
https://wst.tv/players/xiao-guodong --  https://www.wst.tv/players/c3d39c08-92fd-471b-8901-903a4bd22027
https://wst.tv/players/he-guoqiang --  https://www.wst.tv/players/5587fb4d-8517-4572-918e-65ff83b71d74
https://wst.tv/players/ma-hailong --  https://www.wst.tv/players/a2dbb55d-a612-4aef-9a1c-b9401232eac5
https://wst.tv/players/anthony-hamilton --  https://www.wst.tv/players/a3789843-3f0c-4161-b68a-b770fff83f96
https://wst.tv/players/lyu-haotian --  https://www.wst.tv/players/022c7a82-72c5-4fb5-a748-eb9b249d33fb
https://wst.tv/players/barry-hawkins --  https://www.wst.tv/players/ec561f17-e982-43b3-8807-82fc76adbe75
https://wst.tv/players/louis-heathcote --  https://www.wst.tv/players/e8d25a73-348b-40cd-b4e8-f757250d8900
https://wst.tv/players/stephen-hendry --  https://www.wst.tv/players/8ef2e9be-1769-40e9-8235-a143c9ed5951
https://wst.tv/players/andy-hicks --  https://www.wst.tv/players/66dd278a-0996-41ce-a3c4-3213fda0693c
https://wst.tv/players/john-higgins --  https://www.wst.tv/players/a5eecca1-8302-4739-84fc-6721627baa43
https://wst.tv/players/andrew-higginson --  https://www.wst.tv/players/83deba83-12f0-446d-ab47-e43f5b8ab09e
https://wst.tv/players/liam-highfield --  https://www.wst.tv/players/15860676-6802-4c5d-a06e-ce1356e8cdb7
https://wst.tv/players/aaron-hill --  https://www.wst.tv/players/be51ee14-4b28-4932-8d3d-af8011dc9201
https://wst.tv/players/liu-hongyu --  https://www.wst.tv/players/b614e094-3724-419a-a052-13261ace5b05
https://wst.tv/players/ashley-hugill --  https://www.wst.tv/players/6be559fd-aaac-45af-bd53-5eaa54b22553
https://wst.tv/players/mohamed-ibrahim --  https://www.wst.tv/players/1aa06013-1544-4fd7-b3e7-e8682676acd5
https://wst.tv/players/asjad-iqbal --  https://www.wst.tv/players/b765daf4-6bf6-41e5-b298-50769ed0d841
https://wst.tv/players/himanshu-jain --  https://www.wst.tv/players/218661d8-4ebe-4700-9907-0d0e2af0aeeb
https://wst.tv/players/si-jiahui --  https://www.wst.tv/players/f3c7e0cf-7cb6-405e-9ba1-4d02716a20c3
https://wst.tv/players/jak-jones --  https://www.wst.tv/players/036bc430-6c51-4d63-a366-a6ca218f7f39
https://wst.tv/players/jamie-jones --  https://www.wst.tv/players/a85bdd17-6038-43c8-9cec-d492e4a8a2df
https://wst.tv/players/mark-joyce --  https://www.wst.tv/players/710a2723-9694-4cca-8827-64ee50386179
https://wst.tv/players/jiang-jun --  https://www.wst.tv/players/cf6b1e24-e90e-4420-8290-1c1b0f9ea97e
https://wst.tv/players/ding-junhui --  https://www.wst.tv/players/3ff06750-8c3c-456c-8fac-58209b6f679e
https://wst.tv/players/pang-junxu --  https://www.wst.tv/players/9c842985-9f09-4bd0-aa6a-dafe523b40ee
https://wst.tv/players/anton-kazakov --  https://www.wst.tv/players/cbe2d832-5b47-4b91-bf4e-1e482c875825
https://wst.tv/players/jenson-kendrick --  https://www.wst.tv/players/17e59e8f-42b0-4332-bfaa-452366af8280
https://wst.tv/players/rebecca-kenna --  https://www.wst.tv/players/36672a61-a02f-428b-94a1-d42323bccbb3
https://wst.tv/players/lukas-kleckers --  https://www.wst.tv/players/ccd2b587-4c53-40a5-8b4a-e90b7663ce56
https://wst.tv/players/sanderson-lam --  https://www.wst.tv/players/52ba4e5c-fea6-426c-8ab0-7ca6828d13d5
https://wst.tv/players/rod-lawler --  https://www.wst.tv/players/c9a6633d-a5f9-4302-aacd-c2869fe9259b
https://wst.tv/players/julien-leclercq --  https://www.wst.tv/players/690dc31c-2392-4dd0-8dd9-52e5825cab46
https://wst.tv/players/andy-lee --  https://www.wst.tv/players/d758aa70-d8b1-446a-8284-b2a1ace120bb
https://wst.tv/players/david-lilley --  https://www.wst.tv/players/6757b432-8dc6-4c8d-a345-dac8eb58edf5
https://wst.tv/players/oliver-lines --  https://www.wst.tv/players/c7c75376-75ce-4e4b-ba26-d6c8a098ec9b
https://wst.tv/players/jack-lisowski --  https://www.wst.tv/players/d56f02ab-f2df-41ca-b9a4-24167aded141
https://wst.tv/players/stephen-maguire --  https://www.wst.tv/players/c07238de-bca9-4067-9749-00841bd06d28
https://wst.tv/players/anthony-mcgill --  https://www.wst.tv/players/ac8407bc-1cbf-4642-86a3-1e3cacbaeb62
https://wst.tv/players/ben-mertens --  https://www.wst.tv/players/e9a8f8aa-aa8c-4e64-baa4-3fcfd07ebb26
https://wst.tv/players/hammad-miah --  https://www.wst.tv/players/0ffdae01-5fad-40c8-8b9f-8eb3a942ecac
https://wst.tv/players/robert-milkins --  https://www.wst.tv/players/95eec847-2905-491f-abbe-92ff39038bda
https://wst.tv/players/stan-moody --  https://www.wst.tv/players/a65d6cc8-05fa-4827-8294-a1da17c975f6
https://wst.tv/players/ross-muir --  https://www.wst.tv/players/8051730e-7460-4773-b262-9188f2166f61
https://wst.tv/players/shaun-murphy --  https://www.wst.tv/players/03fe92d3-ad85-434c-bc17-5fe02a496187
https://wst.tv/players/mink-nutcharut --  https://www.wst.tv/players/ae9dffcf-4e09-472a-848e-21bf165f975e
https://wst.tv/players/fergal-o'brien --  https://www.wst.tv/players/cefe88f9-89da-4460-9ed6-6e04ec69cec3
https://wst.tv/players/joe-o'connor --  https://www.wst.tv/players/c2809815-3bd0-41fa-b727-458e22c98070
https://wst.tv/players/martin-o'donnell --  https://www.wst.tv/players/8195961a-a4b7-4ba7-960b-08ab4778dbd3
https://wst.tv/players/sean-o'sullivan --  https://www.wst.tv/players/50da4361-072d-418d-a2a0-721866983d02
https://wst.tv/players/ronnie-o'sullivan --  https://www.wst.tv/players/226c7294-655e-4925-bcde-17330ddfc438
https://wst.tv/players/jackson-page --  https://www.wst.tv/players/19ce247e-1824-4f94-8fe3-c94ce4056802
https://wst.tv/players/andrew-pagett --  https://www.wst.tv/players/d338eb63-5268-427e-a60c-52cb55a56625
https://wst.tv/players/tian-pengfei --  https://www.wst.tv/players/4b168b1a-298b-4c0a-adf6-e3190e36caff
https://wst.tv/players/joe-perry --  https://www.wst.tv/players/a33b80af-7f17-4bb1-8c5d-d36e45eb801c
https://wst.tv/players/andres-petrov --  https://www.wst.tv/players/fc2f8de1-4d6a-40a1-84d2-faea2c5fdb8d
https://wst.tv/players/manasawin-phetmalaikul --  https://www.wst.tv/players/b95907dd-e602-4448-9c78-00c865f4bcd5
https://wst.tv/players/liam-pullen --  https://www.wst.tv/players/44b09a9f-4ded-4b51-80f5-dbd28eb86274
https://wst.tv/players/jimmy-robertson --  https://www.wst.tv/players/4e7f33e8-925d-4442-b8f7-6023cd920d9e
https://wst.tv/players/neil-robertson --  https://www.wst.tv/players/8b83133a-4c15-4275-811e-bdf2cb02702f
https://wst.tv/players/noppon-saengkham --  https://www.wst.tv/players/aaf6c342-11f7-4d03-86b3-1144a4fd92f8
https://wst.tv/players/victor-sarkis --  https://www.wst.tv/players/a91dbb92-a44c-4076-8694-5c08cd40c534
https://wst.tv/players/mark-selby --  https://www.wst.tv/players/ba7831b4-ab75-4435-946a-c6f02e4e2d4b
https://wst.tv/players/matthew-selt --  https://www.wst.tv/players/c1ac359d-8359-405b-9879-74dd9b4a5b2c
https://wst.tv/players/xu-si --  https://www.wst.tv/players/f5586d0e-89f5-434e-8723-65046b1d6fe9
https://wst.tv/players/yuan-sijun --  https://www.wst.tv/players/734865fe-9ee2-4a3e-b4d1-035bf819aff2
https://wst.tv/players/ishpreet-singh chadha --  https://www.wst.tv/players/cc2c8bf7-0c67-4751-9e36-7b86718164b1
https://wst.tv/players/baipat-siripaporn --  https://www.wst.tv/players/53cd277e-28fe-48ed-a0ce-4d5d9745c85f
https://wst.tv/players/elliot-slessor --  https://www.wst.tv/players/b1239913-b987-4bae-a7f6-ff4eb481f503
https://wst.tv/players/matthew-stevens --  https://www.wst.tv/players/af1c65bd-d676-4bfc-8e93-65e34adf93c7
https://wst.tv/players/zak-surety --  https://www.wst.tv/players/24564b03-cfd6-474c-a653-0268241d632f
https://wst.tv/players/allan-taylor --  https://www.wst.tv/players/d1cf990f-e5b8-4584-acce-2bd9b534fcb5
https://wst.tv/players/ryan-thomerson --  https://www.wst.tv/players/1227cfd1-3132-405f-a672-4bdf64538df3
https://wst.tv/players/rory-thor --  https://www.wst.tv/players/9d43b39f-b17f-415f-b779-eebc550cd265
https://wst.tv/players/judd-trump --  https://www.wst.tv/players/e2f3cfe7-6138-4ce6-b1dc-77dcc1d0a65f
https://wst.tv/players/thepchaiya-un-nooh --  https://www.wst.tv/players/67203224-1d66-4c1e-b655-150f4f835aba
https://wst.tv/players/alexander-ursenbacher --  https://www.wst.tv/players/12be0769-d225-4c97-b687-4753e3c1bc26
https://wst.tv/players/hossein-vafaei --  https://www.wst.tv/players/99019ac8-ad6a-4927-9f93-1935ea43ca55
https://wst.tv/players/chris-wakelin --  https://www.wst.tv/players/a1beeb4b-2493-476c-9682-1900eb83c2d5
https://wst.tv/players/ricky-walden --  https://www.wst.tv/players/80b7e0a3-61eb-4a12-b4c4-9d6da83d5b24
https://wst.tv/players/daniel-wells --  https://www.wst.tv/players/a458950b-c644-4f16-b89a-543ccfccc61c
https://wst.tv/players/jimmy-white --  https://www.wst.tv/players/6100064a-0ea4-4a0c-b8ee-0e2ddaa3def4
https://wst.tv/players/michael-white --  https://www.wst.tv/players/9728dd54-b60e-4bf5-9149-cecb93b530ee
https://wst.tv/players/robbie-williams --  https://www.wst.tv/players/8954fbf2-3b42-4af9-981b-333ec1cd8b03
https://wst.tv/players/mark-williams --  https://www.wst.tv/players/6aaddcbb-345c-474a-9069-e7757e155729
https://wst.tv/players/gary-wilson --  https://www.wst.tv/players/e5f4377c-5119-4c0a-9a88-e42eb8e48677
https://wst.tv/players/kyren-wilson --  https://www.wst.tv/players/a8c0d3a6-706b-4bf0-8dce-9cde97fe88c4
https://wst.tv/players/ben-woollaston --  https://www.wst.tv/players/8ad4ff3f-9f92-44ba-a884-6c8a8e0dcf08
https://wst.tv/players/peng-yisong --  https://www.wst.tv/players/78c09fb8-3382-4cb0-a3e8-d0f041f23389
https://wst.tv/players/wu-yize --  https://www.wst.tv/players/d935d534-e696-4292-b773-e9b8efee1ea7
https://wst.tv/players/dean-young --  https://www.wst.tv/players/2354ac0b-0b04-4965-8ae3-1f135713005c
https://wst.tv/players/zhou-yuelong --  https://www.wst.tv/players/960cd1e6-2bb4-4229-aefe-447646412bf2
https://wst.tv/players/cao-yupeng --  https://www.wst.tv/players/3a9eca87-f640-4942-a9a7-74a47f40c562
https://wst.tv/players/long-zehuang --  https://www.wst.tv/players/40859ee8-e438-4062-aa9b-84e4e8e22bac
https://wst.tv/players/fan-zhengyi --  https://www.wst.tv/players/8cbf82f6-c417-421c-ae39-17c8103284cd

Done User:AlH42, the bot is done. It edited 371 articles. Added 1,267 archive URLs. Converted 1,248 cases of |url-status=live to dead. -- GreenC 03:20, 6 February 2024 (UTC)[reply]

Good work! My poor, poor watchlist. Just need to work out what we can do with the remainder. Lee Vilenski ^{(talk • contribs)} 08:07, 6 February 2024 (UTC)[reply]

User:AlH42: Not too bad, articles where the bot added a {{dead link}}

-- GreenC 14:48, 6 February 2024 (UTC)[reply]

Thank you. I think we still have a lot to do though. And the WST player template is a problem. Alan (talk) 15:10, 6 February 2024 (UTC)[reply]

The bot should have processed every link for the domain in mainspace. It might have missed some rare cases where it has trouble parsing the page. The template space I didn't do. There might be some in File space, I have not checked. Anyway if you think you need more bot help, let me know. -- GreenC 15:44, 6 February 2024 (UTC)[reply]

Google cache

Apparently, the Google cache (webcache.googleusercontent.com) is about to be shut down. There are over 5,000 pages with these links, and many of them appear to already be broken. These should probably be replaced with the original URL and/or proper archive links if available, depending on how they are currently being used. :Jay8g [V•T•E] 00:59, 5 February 2024 (UTC)[reply]

I'll work on this.

Doing... - if you see this request brought up elsewhere, point them here. The links are messy and so are their placements within templates, so it will need some care. -- GreenC 01:29, 5 February 2024 (UTC)[reply]

Would archive.org still have the info? If so we should try to get all of it so it is easily replaceable by regex. Geardona (talk to me?) 15:29, 5 February 2024 (UTC)[reply]

Not all the now-dead original urls have archive.org links, is it possible to put google cache archive links into archive.org to 'save' the pages? Kingsif (talk) 22:47, 8 February 2024 (UTC)[reply]

The bot is more sophisticated than blindly converting to archive.org links. It will take 4 different actions, depending on the status of the source URL (live or dead), and archive availability for 1) the source URL and 2) Google Cache URL (at archive.org). In terms of creating new archive.org pages from the GC page, that would only work if the GC is still working, which in most cases is not true, and when it is true, the source URL is usually live anyway, so there is no reason for either GC or archive.org. -- GreenC 17:25, 9 February 2024 (UTC)[reply]
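Checking the status of the source URL first requires recovering it from the cache link. A minimal sketch of that step is below, assuming the common link layout `search?q=cache:<12-character id>:<page URL>+<search terms>` or `search?q=cache:<page URL>`. That layout, the function name, and the default to "http://" are all assumptions rather than anything confirmed in this thread, and the messy links mentioned above will not all fit it.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Assumed shape of the optional cache id: 12 URL-safe characters followed by a colon.
CACHE_ID = re.compile(r"^[A-Za-z0-9_-]{12}:(?!//)")


def original_from_google_cache(cache_url: str):
    """Best-effort recovery of the source URL from a Google cache link, else None."""
    parts = urlsplit(cache_url)
    if parts.netloc.lower() != "webcache.googleusercontent.com":
        return None
    q = parse_qs(parts.query).get("q", [""])[0]
    if not q.startswith("cache:"):
        return None
    # Anything after the first space ("+" in the raw link) is search terms, not the URL.
    tokens = q[len("cache:"):].split()
    if not tokens:
        return None
    target = CACHE_ID.sub("", tokens[0])
    if not re.match(r"^https?://", target):
        # The scheme is often omitted; http is a guess.
        target = "http://" + target
    return target


if __name__ == "__main__":
    print(original_from_google_cache(
        "http://webcache.googleusercontent.com/search?q=cache:AbCdEfGhIjKl:example.com/page+&cd=1&hl=en"
    ))
```

Source URLs that carry their own unencoded query string would be truncated by this approach, which is one reason the real links need case-by-case care.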

linguistlist.org

This site is linked to by the linglist parameter in {{Infobox language}}. Snowmanonahoe (talk · contribs · typos) 23:19, 5 February 2024 (UTC)[reply]

User:Snowmanonahoe: I only see it on two pages: https://en.wikipedia.org/wiki/Special:LinkSearch?target=linguistlist.org%2Fmultitree --The site itself looks dead since 2008 or 2009. -- GreenC 00:49, 6 February 2024 (UTC)[reply]

GreenC: try Special:LinkSearch/multitree.org/codes/. Those urls all redirect to linguistlist.org/multitree now. Snowmanonahoe (talk · contribs · typos) 00:58, 6 February 2024 (UTC)[reply]

User:Snowmanonahoe: Ok. There are 75 pages. Compare results at Archive.today with WaybackMachine. I recommend a first pass using Archive.today, then a second pass using WaybackMachine for any that are not available. Sound alright? BTW the entire linguistlist.org site looks like it needs review (421 pages). They made a new website and the old inbound links are not working right. The new website links are working. -- GreenC 02:30, 6 February 2024 (UTC)[reply]

I think Kwamikagami should weigh in on this first. Snowmanonahoe (talk · contribs · typos) 03:08, 6 February 2024 (UTC)[reply]

I gave up on getting multitree links to work back when they were basically offline. I didn't know they were up again.

Multitree is generally not a RS. I would avoid using them except for extinct languages where Linglist maintains the description of the ISO code (like Ethnologue does for living languages); for classification trees of various authors (e.g. on our Austroasiatic article); and maybe a couple other things I'm not thinking of, but not as a general reference.

Is there something in particular you wanted me to weigh in on? I'd think we'd want to update the links when we use them, as I can't think of any reason we'd want to preserve or link to old versions of their pages. — kwami (talk) 03:30, 6 February 2024 (UTC)[reply]

I would avoid using them except [some] .. OK my job is to save the dead links by adding an archive URL. It's only about 75 links. You can remove some citations and keep others as you prefer, once the archives are added, so you will be able to see what the content of the page is. -- GreenC 14:54, 6 February 2024 (UTC)[reply]

That should work just fine. No need for you to evaluate the quality of the ref. — kwami (talk) 15:25, 6 February 2024 (UTC)[reply]

hobbes.nmsu.edu

OS/2 repository going offline in April. Only a few pages on enwiki. [13] -- GreenC 15:32, 6 February 2024 (UTC)[reply]

Iltalehti

I've noticed that some of the 1,222 Iltalehti URLs are dead but bots don't fix them:

All those pages give the Finnish-language text "Hakemaasi sivua ei valitettavasti löytynyt." (= "Unfortunately, the page you were looking for could not be found."). I tried to Google those URLs' headlines, but I couldn't find new URLs for them, so I think Iltalehti has removed those articles from their website completely. Could a bot go through Iltalehti URLs and set an archive link for the Iltalehti webpages that have that exact text on them? Also, if there's a way to fix these, can it be set that InternetArchiveBot fixes them eventually on other language wikis as well? Like GreenC did a month ago in the discussion above #Ilta-Sanomat to the Ilta-Sanomat URLs. For example, there are 10,070 Iltalehti URLs on fi.wikipedia. Thank you again. 85.76.13.79 (talk) 15:35, 11 February 2024 (UTC)[reply]
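The exact-text check requested above could be sketched like this. It is only an illustration under stated assumptions: the function name is hypothetical, the page HTML is assumed to have been fetched and decoded already, and whether the bots can be configured this way is not confirmed here.

```python
import unicodedata

# The Finnish "page not found" message quoted in the request above.
SOFT_404_MARKER = "Hakemaasi sivua ei valitettavasti löytynyt."


def is_soft_404(html: str) -> bool:
    """Return True when the page body contains the exact not-found message."""
    # Normalize Unicode so "ö" matches whether it is precomposed or decomposed,
    # and collapse whitespace so line breaks inside the sentence do not matter.
    text = " ".join(unicodedata.normalize("NFC", html).split())
    marker = unicodedata.normalize("NFC", SOFT_404_MARKER)
    return marker in text


if __name__ == "__main__":
    print(is_soft_404("<p>Hakemaasi sivua ei valitettavasti löytynyt.</p>"))
```

Matching on the page text rather than the HTTP status is what makes this a soft-404 test; it would only flag pages that show that exact sentence.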