Wikipedia:Link rot/URL change requests/Archives/2024/February

From Wikipedia, the free encyclopedia

cnnphilippines.com

CNN Philippines has ceased operations as of January 31, 2024. As of now, https://cnnphilippines.com returns a 503 error. We'll need IABot to comb through the roughly 2,200 pages (~3,000 links total) where it's linked and add archives to those citations. Relevant discussion at WT:TAMBAY#Archiving news articles of CNN Philippines. Chlod (say hi!) 17:17, 31 January 2024 (UTC)

Submitted to IABot. -- GreenC 02:12, 2 February 2024 (UTC)
I don't know why, but IABot missed over 1,000 links, so I reran it with WaybackMedic and got the rest. -- GreenC 02:36, 5 February 2024 (UTC)
Many thanks, @GreenC! Chlod (say hi!) 12:48, 5 February 2024 (UTC)

Wst.tv

Hi. It is with a heavy heart that I report the World Snooker Tour has changed its website and how all of its links work, and most wst.tv links now follow no real naming convention.

For instance: https://wst.tv/players/jimmy-white/ now is at https://www.wst.tv/players/6100064a-0ea4-4a0c-b8ee-0e2ddaa3def4

News articles and other items have also moved. If there is a smart way for this to be fixed, let me know, but I'm assuming we'd need to archive/mark as dead for the remainder. Lee Vilenski (talkcontribs) 19:39, 2 February 2024 (UTC)

User:Lee Vilenski I don't see a way to migrate the links without redirect information. If some links have a redirect, the bot will pick it up automatically. Otherwise it will add an archive URL or {{dead link}}. Looks like 379 pages. -- GreenC 05:57, 3 February 2024 (UTC)
All of the news articles have moved from https://wst.tv/murphy-takes-season-opener/ to https://www.wst.tv/news/2023/july/21/murphy-takes-season-opener/
It's a mess, I certainly don't see a way to fix it. Lee Vilenski (talkcontribs) 09:04, 3 February 2024 (UTC)
It's surprisingly common for websites to migrate to a new platform without leaving redirects. If you want, contact them to ask if they plan to leave redirects, and mention Wikipedia as an example. For now I can still add the archives, and if they add redirects in the future, the bot can undo the archives, mark the links live again and migrate them to the new redirected URLs. Either way it's basically flipping a switch in the bot. -- GreenC 14:12, 3 February 2024 (UTC)
Regarding contacting WST: My experience is that they do not respond. It might be better to try to convince their software suppliers to provide redirects. It would appear that there are two companies involved. One is https://urbanzoo.io/ and the other is https://www.imgarena.com/.  Alan  (talk) 12:42, 4 February 2024 (UTC)
It looks like content was not migrated. For example, searching the new site's news tab for "White Completes Epic Comeback" (the title of the old-site page https://wst.tv/white-completes-epic-comeback/) gives no result. Likewise Google: https://www.google.com/search?client=firefox-b-1-lm&q=%22White+Completes+Epic+Comeback%22+site%3Awst.tv. It looks like a complete reset of the site, and any matches found, as with /players, could be happenstance. --- GreenC 17:39, 4 February 2024 (UTC)

I was able to build a preliminary map of the player pages, by headless browsering https://www.wst.tv/players/ and reformatting the HTML into this table, making a best guess on the left column. If the bot encounters a URL in the left column, it will replace with the right column. -- GreenC 17:14, 4 February 2024 (UTC)

I think it is much more complex than that. The old site had pages for many more players than are currently included in https://www.wst.tv/players which only has current players. Look at https://web.archive.org/web/20221126125804/https://wst.tv/player_category_taxonomy/other-players/. Most of these are gone completely, and many are referred to in our articles.  Alan  (talk) 10:12, 5 February 2024 (UTC)
...for instance: if you search in https://www.wst.tv/players for "Davis", you will only get Mark Davis. The old site included Steve Davis, Joe Davis and Fred Davis, who were significant players, apparently now forgotten by WST.  Alan  (talk) 10:27, 5 February 2024 (UTC)
OK, I was afraid of that; it didn't seem like many players. It does appear the old site and its content were completely abandoned. The new site has some overlap, but that is happenstance, and a new page can't be assumed to contain the same content even if a match can be made. They didn't do a site migration. In this case, for citation verification purposes, the correct action is to treat everything from the old site as a dead link and hope archives are available. -- GreenC 14:40, 5 February 2024 (UTC)
That's pretty much what we've been doing. If you look at the List of snooker players you'll see that all the references have working archives.  Alan  (talk) 15:14, 5 February 2024 (UTC)
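awk -ilibrary '
  # readfile(), splitn(), subs() and strip() come from the awk library included with -i
  BEGIN {
    f = readfile("snook1.html")             # reformatted HTML of the /players page
    for (i = 1; i <= splitn(f, a, i); i++) {
      j++
      if (j == 5) {                         # next 4-line player block begins: print the previous one
        j = 1
        print "https://wst.tv/players/" tolower(fname) "-" tolower(lname) " --  https://www.wst.tv/" subs("href=\"/", "", id)
      }
      if (j == 1) { match(a[i], /href=["]\/players\/[^"]+[^"]/, d); id = d[0] }   # new /players/<id> link
      if (j == 2) fname = strip(a[i])       # first name
      if (j == 4) lname = strip(a[i])       # last name
    }
  }'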

https://wst.tv/players/mark-allen --  https://www.wst.tv/players/c37aba27-5b12-4fae-8a8b-9e749c7a25f3
https://wst.tv/players/zhang-anda --  https://www.wst.tv/players/0512f55a-faea-48df-a8fc-895fbcaef511
https://wst.tv/players/muhammad-asif --  https://www.wst.tv/players/3f7a3e33-3889-4c3f-91e3-a6d876c8b999
https://wst.tv/players/john-astley --  https://www.wst.tv/players/49e85842-53d7-4fdb-b69b-4a0db92ff06d
https://wst.tv/players/stuart-bingham --  https://www.wst.tv/players/ac932300-dacb-4e91-803b-99a03fa20853
https://wst.tv/players/luca-brecel --  https://www.wst.tv/players/cd124662-9d97-413c-9609-5051d002ab3b
https://wst.tv/players/jordan-brown --  https://www.wst.tv/players/c49e98bc-101d-419a-81aa-ff2caedb1734
https://wst.tv/players/oliver-brown --  https://www.wst.tv/players/fe7732cc-435e-4ba8-84bf-25f771f0f376
https://wst.tv/players/alfie-burden --  https://www.wst.tv/players/b6350368-74fc-4adf-92c8-ff9126e90541
https://wst.tv/players/ian-burns --  https://www.wst.tv/players/80c5ce19-2c01-48a4-85e4-c0304ac1ea4a
https://wst.tv/players/james-cahill --  https://www.wst.tv/players/4b7b307c-8ec8-4b53-b46e-6817081b95c4
https://wst.tv/players/stuart-carrington --  https://www.wst.tv/players/37a87bd0-792f-46ae-9377-56df3bef9034
https://wst.tv/players/ali-carter --  https://www.wst.tv/players/c796b82d-1040-422d-b27d-9249310b99a3
https://wst.tv/players/ashley-carty --  https://www.wst.tv/players/32dedd2f-0e09-4c03-bed3-679646da516b
https://wst.tv/players/jamie-clarke --  https://www.wst.tv/players/b29c7ae2-4f1c-413c-92bb-01ce78d99b08
https://wst.tv/players/sam-craigie --  https://www.wst.tv/players/edcdfdad-8c65-48fb-94f0-b9b3ac9ad04d
https://wst.tv/players/dominic-dale --  https://www.wst.tv/players/86fd8e51-3964-497c-97c3-729cef44b1f0
https://wst.tv/players/mark-davis --  https://www.wst.tv/players/0398e6dc-dcbf-4ff0-9ff2-7515212bc818
https://wst.tv/players/ryan-day --  https://www.wst.tv/players/5d419487-e341-4301-a4f5-e493a2a78754
https://wst.tv/players/ken-doherty --  https://www.wst.tv/players/e9c5eddd-e493-473e-b688-a3a2ea861800
https://wst.tv/players/scott-donaldson --  https://www.wst.tv/players/ff710b2f-cf05-45d6-840e-e10a7dc9f921
https://wst.tv/players/mostafa-dorgham --  https://www.wst.tv/players/14243478-1def-4ce2-a9a0-80a2858abe32
https://wst.tv/players/graeme-dott --  https://www.wst.tv/players/e0f5c435-470e-4ac3-8406-5ccd39fd475c
https://wst.tv/players/adam-duffy --  https://www.wst.tv/players/2fc33800-aaf8-4e7f-9af0-afc58df79ed2
https://wst.tv/players/ahmed aly-elsayed --  https://www.wst.tv/players/f65d2c9a-513a-458b-9c8b-edfc3aebbce6
https://wst.tv/players/dylan-emery --  https://www.wst.tv/players/0106063a-5a37-47c3-9cbf-67a891012a5e
https://wst.tv/players/reanne-evans --  https://www.wst.tv/players/bc4020ad-76c2-42a4-8994-dd0f756d0b6a
https://wst.tv/players/tom-ford --  https://www.wst.tv/players/69df4145-0b26-4a1e-9afb-c9ae74fa3fd1
https://wst.tv/players/marco-fu --  https://www.wst.tv/players/5012642c-60cc-4ab3-a41b-b152370562eb
https://wst.tv/players/david-gilbert --  https://www.wst.tv/players/9b2532c1-a189-4573-8320-f254d2f9bfde
https://wst.tv/players/martin-gould --  https://www.wst.tv/players/2a0e2004-856c-4f0b-ae3e-54dded6141f8
https://wst.tv/players/david-grace --  https://www.wst.tv/players/ad650d94-b08b-4dc5-9c5f-1653dc909127
https://wst.tv/players/liam-graham --  https://www.wst.tv/players/75baf94d-2c63-42dc-8acb-4e7a5a7bcb09
https://wst.tv/players/xiao-guodong --  https://www.wst.tv/players/c3d39c08-92fd-471b-8901-903a4bd22027
https://wst.tv/players/he-guoqiang --  https://www.wst.tv/players/5587fb4d-8517-4572-918e-65ff83b71d74
https://wst.tv/players/ma-hailong --  https://www.wst.tv/players/a2dbb55d-a612-4aef-9a1c-b9401232eac5
https://wst.tv/players/anthony-hamilton --  https://www.wst.tv/players/a3789843-3f0c-4161-b68a-b770fff83f96
https://wst.tv/players/lyu-haotian --  https://www.wst.tv/players/022c7a82-72c5-4fb5-a748-eb9b249d33fb
https://wst.tv/players/barry-hawkins --  https://www.wst.tv/players/ec561f17-e982-43b3-8807-82fc76adbe75
https://wst.tv/players/louis-heathcote --  https://www.wst.tv/players/e8d25a73-348b-40cd-b4e8-f757250d8900
https://wst.tv/players/stephen-hendry --  https://www.wst.tv/players/8ef2e9be-1769-40e9-8235-a143c9ed5951
https://wst.tv/players/andy-hicks --  https://www.wst.tv/players/66dd278a-0996-41ce-a3c4-3213fda0693c
https://wst.tv/players/john-higgins --  https://www.wst.tv/players/a5eecca1-8302-4739-84fc-6721627baa43
https://wst.tv/players/andrew-higginson --  https://www.wst.tv/players/83deba83-12f0-446d-ab47-e43f5b8ab09e
https://wst.tv/players/liam-highfield --  https://www.wst.tv/players/15860676-6802-4c5d-a06e-ce1356e8cdb7
https://wst.tv/players/aaron-hill --  https://www.wst.tv/players/be51ee14-4b28-4932-8d3d-af8011dc9201
https://wst.tv/players/liu-hongyu --  https://www.wst.tv/players/b614e094-3724-419a-a052-13261ace5b05
https://wst.tv/players/ashley-hugill --  https://www.wst.tv/players/6be559fd-aaac-45af-bd53-5eaa54b22553
https://wst.tv/players/mohamed-ibrahim --  https://www.wst.tv/players/1aa06013-1544-4fd7-b3e7-e8682676acd5
https://wst.tv/players/asjad-iqbal --  https://www.wst.tv/players/b765daf4-6bf6-41e5-b298-50769ed0d841
https://wst.tv/players/himanshu-jain --  https://www.wst.tv/players/218661d8-4ebe-4700-9907-0d0e2af0aeeb
https://wst.tv/players/si-jiahui --  https://www.wst.tv/players/f3c7e0cf-7cb6-405e-9ba1-4d02716a20c3
https://wst.tv/players/jak-jones --  https://www.wst.tv/players/036bc430-6c51-4d63-a366-a6ca218f7f39
https://wst.tv/players/jamie-jones --  https://www.wst.tv/players/a85bdd17-6038-43c8-9cec-d492e4a8a2df
https://wst.tv/players/mark-joyce --  https://www.wst.tv/players/710a2723-9694-4cca-8827-64ee50386179
https://wst.tv/players/jiang-jun --  https://www.wst.tv/players/cf6b1e24-e90e-4420-8290-1c1b0f9ea97e
https://wst.tv/players/ding-junhui --  https://www.wst.tv/players/3ff06750-8c3c-456c-8fac-58209b6f679e
https://wst.tv/players/pang-junxu --  https://www.wst.tv/players/9c842985-9f09-4bd0-aa6a-dafe523b40ee
https://wst.tv/players/anton-kazakov --  https://www.wst.tv/players/cbe2d832-5b47-4b91-bf4e-1e482c875825
https://wst.tv/players/jenson-kendrick --  https://www.wst.tv/players/17e59e8f-42b0-4332-bfaa-452366af8280
https://wst.tv/players/rebecca-kenna --  https://www.wst.tv/players/36672a61-a02f-428b-94a1-d42323bccbb3
https://wst.tv/players/lukas-kleckers --  https://www.wst.tv/players/ccd2b587-4c53-40a5-8b4a-e90b7663ce56
https://wst.tv/players/sanderson-lam --  https://www.wst.tv/players/52ba4e5c-fea6-426c-8ab0-7ca6828d13d5
https://wst.tv/players/rod-lawler --  https://www.wst.tv/players/c9a6633d-a5f9-4302-aacd-c2869fe9259b
https://wst.tv/players/julien-leclercq --  https://www.wst.tv/players/690dc31c-2392-4dd0-8dd9-52e5825cab46
https://wst.tv/players/andy-lee --  https://www.wst.tv/players/d758aa70-d8b1-446a-8284-b2a1ace120bb
https://wst.tv/players/david-lilley --  https://www.wst.tv/players/6757b432-8dc6-4c8d-a345-dac8eb58edf5
https://wst.tv/players/oliver-lines --  https://www.wst.tv/players/c7c75376-75ce-4e4b-ba26-d6c8a098ec9b
https://wst.tv/players/jack-lisowski --  https://www.wst.tv/players/d56f02ab-f2df-41ca-b9a4-24167aded141
https://wst.tv/players/stephen-maguire --  https://www.wst.tv/players/c07238de-bca9-4067-9749-00841bd06d28
https://wst.tv/players/anthony-mcgill --  https://www.wst.tv/players/ac8407bc-1cbf-4642-86a3-1e3cacbaeb62
https://wst.tv/players/ben-mertens --  https://www.wst.tv/players/e9a8f8aa-aa8c-4e64-baa4-3fcfd07ebb26
https://wst.tv/players/hammad-miah --  https://www.wst.tv/players/0ffdae01-5fad-40c8-8b9f-8eb3a942ecac
https://wst.tv/players/robert-milkins --  https://www.wst.tv/players/95eec847-2905-491f-abbe-92ff39038bda
https://wst.tv/players/stan-moody --  https://www.wst.tv/players/a65d6cc8-05fa-4827-8294-a1da17c975f6
https://wst.tv/players/ross-muir --  https://www.wst.tv/players/8051730e-7460-4773-b262-9188f2166f61
https://wst.tv/players/shaun-murphy --  https://www.wst.tv/players/03fe92d3-ad85-434c-bc17-5fe02a496187
https://wst.tv/players/mink-nutcharut --  https://www.wst.tv/players/ae9dffcf-4e09-472a-848e-21bf165f975e
https://wst.tv/players/fergal-o'brien --  https://www.wst.tv/players/cefe88f9-89da-4460-9ed6-6e04ec69cec3
https://wst.tv/players/joe-o'connor --  https://www.wst.tv/players/c2809815-3bd0-41fa-b727-458e22c98070
https://wst.tv/players/martin-o'donnell --  https://www.wst.tv/players/8195961a-a4b7-4ba7-960b-08ab4778dbd3
https://wst.tv/players/sean-o'sullivan --  https://www.wst.tv/players/50da4361-072d-418d-a2a0-721866983d02
https://wst.tv/players/ronnie-o'sullivan --  https://www.wst.tv/players/226c7294-655e-4925-bcde-17330ddfc438
https://wst.tv/players/jackson-page --  https://www.wst.tv/players/19ce247e-1824-4f94-8fe3-c94ce4056802
https://wst.tv/players/andrew-pagett --  https://www.wst.tv/players/d338eb63-5268-427e-a60c-52cb55a56625
https://wst.tv/players/tian-pengfei --  https://www.wst.tv/players/4b168b1a-298b-4c0a-adf6-e3190e36caff
https://wst.tv/players/joe-perry --  https://www.wst.tv/players/a33b80af-7f17-4bb1-8c5d-d36e45eb801c
https://wst.tv/players/andres-petrov --  https://www.wst.tv/players/fc2f8de1-4d6a-40a1-84d2-faea2c5fdb8d
https://wst.tv/players/manasawin-phetmalaikul --  https://www.wst.tv/players/b95907dd-e602-4448-9c78-00c865f4bcd5
https://wst.tv/players/liam-pullen --  https://www.wst.tv/players/44b09a9f-4ded-4b51-80f5-dbd28eb86274
https://wst.tv/players/jimmy-robertson --  https://www.wst.tv/players/4e7f33e8-925d-4442-b8f7-6023cd920d9e
https://wst.tv/players/neil-robertson --  https://www.wst.tv/players/8b83133a-4c15-4275-811e-bdf2cb02702f
https://wst.tv/players/noppon-saengkham --  https://www.wst.tv/players/aaf6c342-11f7-4d03-86b3-1144a4fd92f8
https://wst.tv/players/victor-sarkis --  https://www.wst.tv/players/a91dbb92-a44c-4076-8694-5c08cd40c534
https://wst.tv/players/mark-selby --  https://www.wst.tv/players/ba7831b4-ab75-4435-946a-c6f02e4e2d4b
https://wst.tv/players/matthew-selt --  https://www.wst.tv/players/c1ac359d-8359-405b-9879-74dd9b4a5b2c
https://wst.tv/players/xu-si --  https://www.wst.tv/players/f5586d0e-89f5-434e-8723-65046b1d6fe9
https://wst.tv/players/yuan-sijun --  https://www.wst.tv/players/734865fe-9ee2-4a3e-b4d1-035bf819aff2
https://wst.tv/players/ishpreet-singh chadha --  https://www.wst.tv/players/cc2c8bf7-0c67-4751-9e36-7b86718164b1
https://wst.tv/players/baipat-siripaporn --  https://www.wst.tv/players/53cd277e-28fe-48ed-a0ce-4d5d9745c85f
https://wst.tv/players/elliot-slessor --  https://www.wst.tv/players/b1239913-b987-4bae-a7f6-ff4eb481f503
https://wst.tv/players/matthew-stevens --  https://www.wst.tv/players/af1c65bd-d676-4bfc-8e93-65e34adf93c7
https://wst.tv/players/zak-surety --  https://www.wst.tv/players/24564b03-cfd6-474c-a653-0268241d632f
https://wst.tv/players/allan-taylor --  https://www.wst.tv/players/d1cf990f-e5b8-4584-acce-2bd9b534fcb5
https://wst.tv/players/ryan-thomerson --  https://www.wst.tv/players/1227cfd1-3132-405f-a672-4bdf64538df3
https://wst.tv/players/rory-thor --  https://www.wst.tv/players/9d43b39f-b17f-415f-b779-eebc550cd265
https://wst.tv/players/judd-trump --  https://www.wst.tv/players/e2f3cfe7-6138-4ce6-b1dc-77dcc1d0a65f
https://wst.tv/players/thepchaiya-un-nooh --  https://www.wst.tv/players/67203224-1d66-4c1e-b655-150f4f835aba
https://wst.tv/players/alexander-ursenbacher --  https://www.wst.tv/players/12be0769-d225-4c97-b687-4753e3c1bc26
https://wst.tv/players/hossein-vafaei --  https://www.wst.tv/players/99019ac8-ad6a-4927-9f93-1935ea43ca55
https://wst.tv/players/chris-wakelin --  https://www.wst.tv/players/a1beeb4b-2493-476c-9682-1900eb83c2d5
https://wst.tv/players/ricky-walden --  https://www.wst.tv/players/80b7e0a3-61eb-4a12-b4c4-9d6da83d5b24
https://wst.tv/players/daniel-wells --  https://www.wst.tv/players/a458950b-c644-4f16-b89a-543ccfccc61c
https://wst.tv/players/jimmy-white --  https://www.wst.tv/players/6100064a-0ea4-4a0c-b8ee-0e2ddaa3def4
https://wst.tv/players/michael-white --  https://www.wst.tv/players/9728dd54-b60e-4bf5-9149-cecb93b530ee
https://wst.tv/players/robbie-williams --  https://www.wst.tv/players/8954fbf2-3b42-4af9-981b-333ec1cd8b03
https://wst.tv/players/mark-williams --  https://www.wst.tv/players/6aaddcbb-345c-474a-9069-e7757e155729
https://wst.tv/players/gary-wilson --  https://www.wst.tv/players/e5f4377c-5119-4c0a-9a88-e42eb8e48677
https://wst.tv/players/kyren-wilson --  https://www.wst.tv/players/a8c0d3a6-706b-4bf0-8dce-9cde97fe88c4
https://wst.tv/players/ben-woollaston --  https://www.wst.tv/players/8ad4ff3f-9f92-44ba-a884-6c8a8e0dcf08
https://wst.tv/players/peng-yisong --  https://www.wst.tv/players/78c09fb8-3382-4cb0-a3e8-d0f041f23389
https://wst.tv/players/wu-yize --  https://www.wst.tv/players/d935d534-e696-4292-b773-e9b8efee1ea7
https://wst.tv/players/dean-young --  https://www.wst.tv/players/2354ac0b-0b04-4965-8ae3-1f135713005c
https://wst.tv/players/zhou-yuelong --  https://www.wst.tv/players/960cd1e6-2bb4-4229-aefe-447646412bf2
https://wst.tv/players/cao-yupeng --  https://www.wst.tv/players/3a9eca87-f640-4942-a9a7-74a47f40c562
https://wst.tv/players/long-zehuang --  https://www.wst.tv/players/40859ee8-e438-4062-aa9b-84e4e8e22bac
https://wst.tv/players/fan-zhengyi --  https://www.wst.tv/players/8cbf82f6-c417-421c-ae39-17c8103284cd
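The replacement step in the preliminary plan (left column to right column) could be sketched in Python roughly as follows. This is a minimal illustration, not WaybackMedic's code: the names `load_map` and `rewrite` are hypothetical, it assumes the table's "old --  new" line format, and it skips entries whose old URL contains a space (such as the multi-word surnames above). As discussed above, a name match on the new site may be coincidental, so this only shows the mechanics.

```python
import re

# Matches old or new player-page URLs inside wikitext; stops at whitespace and
# wiki syntax characters (|, ], }, <).
PLAYER_URL = re.compile(r"https?://(?:www\.)?wst\.tv/players/[^\s|\]}<]+")


def load_map(lines):
    """Parse 'old --  new' lines into {old URL without trailing slash: new URL}."""
    mapping = {}
    for line in lines:
        old, sep, new = line.partition(" -- ")
        if not sep:
            continue  # not a mapping line
        old, new = old.strip(), new.strip()
        if " " in old:
            continue  # e.g. multi-word surnames: not a valid URL, leave for manual review
        mapping[old.rstrip("/")] = new
    return mapping


def rewrite(wikitext, mapping):
    """Replace mapped player URLs; anything not in the table is left untouched."""
    def repl(match):
        url = match.group(0)
        # Normalise scheme, www. and trailing slash so variants share one key.
        key = "https://wst.tv" + url[url.index("/players/"):].rstrip("/")
        return mapping.get(key, url)
    return PLAYER_URL.sub(repl, wikitext)
```

URLs already on the new UUID form also match the pattern but are not keys in the table, so they pass through unchanged.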
  •  Done User:AlH42, the bot is done. It edited 371 articles. Added 1,267 archive URLs. Converted 1,248 cases of |url-status=live to dead. -- GreenC 03:20, 6 February 2024 (UTC)
Good work! My poor, poor watchlist. Just need to work out what we can do with the remainder. Lee Vilenski (talkcontribs) 08:07, 6 February 2024 (UTC)
User:AlH42: Not too bad, articles where the bot added a {{dead link}}
-- GreenC 14:48, 6 February 2024 (UTC)
Thank you. I think we still have a lot to do though. And the WST player template is a problem.  Alan  (talk) 15:10, 6 February 2024 (UTC)
The bot should have processed every link for the domain in mainspace. It might have missed some rare cases where it has trouble parsing the page. I didn't do Template space. There might be some in File space; I have not checked. Anyway, if you think you need more bot help, let me know. -- GreenC 15:44, 6 February 2024 (UTC)

Google cache

Apparently, the Google cache (webcache.googleusercontent.com) is about to be shut down. There are over 5,000 pages with these links, and many of them appear to already be broken. These should probably be replaced with the original URL and/or proper archive links if available, depending on how they are currently being used. :Jay8g [VTE] 00:59, 5 February 2024 (UTC)

I'll work on this.  Doing... - if you see this request brought up elsewhere, point them here. The links are messy, and so is their placement within templates, so it will need some care. -- GreenC 01:29, 5 February 2024 (UTC)
Would archive.org still have the info? If so we should try to get all of it so it is easily replaceable by regex. Geardona (talk to me?) 15:29, 5 February 2024 (UTC)
Not all the now-dead original URLs have archive.org links. Is it possible to put the Google Cache links into archive.org to 'save' the pages? Kingsif (talk) 22:47, 8 February 2024 (UTC)
The bot is more sophisticated than blindly converting to archive.org links. It will take one of 4 different actions, depending on the status of the source URL (live or dead) and on archive availability for 1) the source URL and 2) the Google Cache URL (at archive.org). As for creating new archive.org pages from the GC page, that would only work if the GC page is still working, which in most cases is not true; and when it is true, the source URL is usually live anyway, so there is no reason for either GC or archive.org. -- GreenC 17:25, 9 February 2024 (UTC)
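The four outcomes can be expressed as a small decision function. This is a hypothetical sketch: `choose_action`, the `Action` names and the boolean inputs are illustrative, and preferring an archive of the source over an archive of the Google Cache page when both exist is an assumption, not something stated in this thread.

```python
from enum import Enum


class Action(Enum):
    REPLACE_WITH_SOURCE = 1        # 1) source live: swap the cache URL for the original
    SOURCE_AND_DEAD_LINK = 2       # 2) source dead, no archives: original URL + {{dead link}}
    ARCHIVE_OF_SOURCE = 3          # 3) source dead, an archive of the source exists
    ARCHIVE_OF_CACHE_PAGE = 4      # 4) source dead, only the Google Cache page was archived


def choose_action(source_live, source_archived, cache_page_archived):
    """Return the Action for one Google Cache link (all inputs are booleans)."""
    if source_live:
        return Action.REPLACE_WITH_SOURCE
    if source_archived:            # assumption: prefer an archive of the original
        return Action.ARCHIVE_OF_SOURCE
    if cache_page_archived:
        return Action.ARCHIVE_OF_CACHE_PAGE
    return Action.SOURCE_AND_DEAD_LINK
```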
  •  Done - Google Cache has been eliminated from Enwiki. It was in about 5,000 pages, and removing it was a significant undertaking for multiple reasons. 834 Google Cache URLs remain, but only inside archive.org URLs. One of four actions was taken:
  1. Original URL is live: remove the Google Cache URL and replace it with the original URL.
  2. Original URL is dead and no archives are available: replace the Google Cache URL with the original URL and add a {{dead link}}.
  3. Original URL is dead, but an archive of it is available at another provider.
  4. Original URL is dead, and the Google Cache URL has an archive at another archive provider (the 834 linked above).
Surprisingly, option 1 was the most common. For anyone wanting to do this elsewhere, I made a tool to convert Google Cache URLs to the original source URL: https://github.com/greencardamom/Googcacheparse -- GreenC 16:19, 11 February 2024 (UTC)
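For a rough idea of what converting a Google Cache link back to its source involves, here is a minimal Python sketch. It is not the logic of Googcacheparse; the function name `cache_to_source` is hypothetical, it assumes the common `webcache.googleusercontent.com/search?q=cache:<id>:<url>+<keywords>` shape (with the cache id optional), and it defaults to `http://` when the embedded URL has no scheme, which is a guess.

```python
from urllib.parse import parse_qs, urlsplit

CACHE_HOST = "webcache.googleusercontent.com"


def cache_to_source(cache_url):
    """Return the original URL embedded in a Google Cache link, or None."""
    parts = urlsplit(cache_url)
    if parts.hostname != CACHE_HOST:
        return None
    # parse_qs decodes '+' to a space, so trailing search keywords become separate tokens.
    q = parse_qs(parts.query).get("q", [""])[0].strip()
    if not q.startswith("cache:"):
        return None
    tokens = q[len("cache:"):].split()
    if not tokens:
        return None
    body = tokens[0]
    # Optional opaque cache id: "cache:<id>:<url>". Do not mistake "http://" for an id.
    head, sep, rest = body.partition(":")
    if sep and head and "." not in head and "/" not in head and not rest.startswith("//"):
        body = rest
    if "://" not in body:
        body = "http://" + body  # assumption: the original scheme is unknown
    return body
```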
    Thanks again for your work on this! :Jay8g [VTE] 22:32, 11 February 2024 (UTC)

Canoe.ca

It appears that canoe.ca was once a news website that is referenced in quite a few articles, but it has since been usurped by another gambling website. Unfortunately, the new owners have also blocked the Wayback Machine and only some of the pages I've seen are in archive.today. However, some of the links appear to be salvageable by changing "canoe.ca" to "canoe.com" and then going into the Wayback Machine. Is this something that the bots can help with? Thanks! :Jay8g [VTE] 23:32, 27 January 2024 (UTC)

That was probably a little confusing. There are basically three ways that existing canoe.ca links can be archived:
  • Archive.today might have a direct archive of the canoe.ca URL
  • The Wayback Machine might have an archive of the same page with "canoe.ca" replaced with "canoe.com"
  • Archive.today might have an archive of the same page with "canoe.ca" replaced with "canoe.com"
As far as I can tell, the canoe.ca and canoe.com pages were completely identical, but all of the links I've checked seem to be dead on both domains. Unfortunately, there are over 10,000 of these links according to Special:LinkSearch, which is too much for me to deal with manually. There are also quite a few dead links to canoe.com itself, but at least those aren't usurped and can be found in the Wayback Machine normally. :Jay8g [VTE] 23:45, 27 January 2024 (UTC)
Notes for canoe.ca i.e. canoe.com:
Proposal for canoe.ca in five runs of WaybackMedic:
  1. Pass 1a (canoe1): Remove all Wayback links  Done - remove 391 archives
  2. Pass 1b (canoe3 & canoe4): Remove all WebCite links (SSL errors and unstable)  Done - remove 329 archives
  3. Pass 2 (canoe2): Attempt conversion to archive.today. Else add {{dead link}}  Done - add 8,353 archive.today, 633 {{dead link}} (total including existing), change 578 |url-status=live to dead
  4. Pass 3a (canoe5): For canoe.ca with a {{dead link}}: check the API if a Wayback link exists if it were converted to canoe.com - if so, change source link to canoe.com and set to live status and remove {{dead link}}  Done - 157 URLs converted to canoe.com
  5. Pass 3b (canoe6): Check the canoe.com links from Pass 3a for link rot, if so, convert to Wayback or archive.today links  Done - 294 Wayback URLs added to canoe.com URLs in the same set of articles processed during Pass 3a (excess due to pre-existing canoe.com links that were dead)
  6. Pass 3c (canoe7): Make a list of citations with {{dead link}}  Done 406 cites listed at Wikipedia:Link rot/cases/canoe.ca
  7. Pass 4 (judi14a and judi14b): Convert canoe.ca to a usurped citation per steps at WP:USURPURL. This will include completely deleting citations that have no archive URL  Done Edited approximately 6,000 pages.
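Pass 3a depends on two small pieces: rewriting the host from canoe.ca to canoe.com, and asking the Wayback Machine whether a snapshot exists. The sketch below shows both using the public Wayback availability API (`https://archive.org/wayback/available?url=...`). The HTTP request itself is left out, the function names are hypothetical, and this is not how WaybackMedic is implemented internally.

```python
import json
from urllib.parse import quote, urlsplit, urlunsplit


def to_canoe_com(url):
    """Swap a canoe.ca host (including subdomains) for the matching canoe.com host."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if host != "canoe.ca" and not host.endswith(".canoe.ca"):
        return None
    new_host = host[: -len(".ca")] + ".com"
    netloc = new_host if parts.port is None else f"{new_host}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


def availability_query(url):
    """URL of a Wayback Machine availability API lookup for `url`."""
    return "https://archive.org/wayback/available?url=" + quote(url, safe="")


def closest_snapshot(api_json):
    """Return the closest available snapshot URL from an API response body, or None."""
    closest = json.loads(api_json).get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None
```

Usage: `availability_query(to_canoe_com(old_url))` gives the lookup to fetch, and `closest_snapshot(body)` decides whether the citation can be switched to canoe.com.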
Proposal for canoe.com
  1. Pass 5 (canoecom): Check for dead links and soft-404s as normal  Done Edited 1,132 articles out of 1,953 checked. Added 1,820 archive URLs. Changed 371 |url-status=live to dead
----
User:Jay8g: see the proposal above. Each pass of the bot has different settings enabled; done in this order, it should work. "Pass 3" might result in a lot of deleted citations, so I'll let you know before running that one. This will require at least 4 runs of the bot over 6k pages each, plus some manual steps, so it will take a while. -- GreenC 01:39, 28 January 2024 (UTC)
That all sounds good to me! Thanks! :Jay8g [VTE] 04:01, 28 January 2024 (UTC)
I just thought of one issue with pass 4: Because canoe.ca was a news aggregator, some of the citations that currently link to it can be found on other, unrelated websites. For example, the reference in Dwayne Johnson (the first link that comes up for me in the 6,148-page search) points to http://www.canoe.ca/SlamWrestlingArchive/feb24_rocky.html on canoe.ca, but the same article can be found at https://slamwrestling.net/index.php/1998/02/24/a-piece-of-the-rock/ on Slam Wrestling's own website. That exact article is also available using the Wayback Machine with canoe.com, but if it was not available there, replacing it with the slamwrestling.net URL would be better than deleting it. Of course, there's no way to do that without manual work, and anything that's just a bare URL is gone for good.
I will be interested to see how many canoe.ca links are left after steps 1-4, to see whether it makes sense to remove those links entirely or try to find the same articles posted elsewhere first. I'm not sure if this is a situation that has come up before with usurped URLs like this or what the standard practice is. :Jay8g [VTE] 04:18, 28 January 2024 (UTC)
For the rocky example, there is no map to know where the canoe.ca link should go. And since canoe.ca is now a usurped vice site, we are supposed to hide it from view, and if no archive is available, delete it. Let's wait and see how many there are after Pass 3. One solution, rather than deleting the entire cite, is to convert it to {{citation}} (which doesn't require a URL), change the |work= to Slam Wrestling, and remove the canoe.ca URL. This kind of work is laborious because people use so many permutations of citation templates and argument combinations that nothing is consistent. There are also the square and bare links that don't use templates. -- GreenC 16:54, 28 January 2024 (UTC)
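The hide-or-delete rule for usurped links can be sketched for a single, well-formed CS1 template. This assumes the WP:USURPURL convention of setting `|url-status=usurped` when an archive exists, and returns `None` to mean "remove this citation" when there is no `|archive-url=`. The function name is hypothetical and the regexes are deliberately simple; they would not cover the many template and bare-link permutations mentioned above.

```python
import re

HAS_ARCHIVE = re.compile(r"\|\s*archive-url\s*=\s*[^\s|}]")
URL_STATUS = re.compile(r"(\|\s*url-status\s*=\s*)[^\s|}]*")


def usurp_cite(cite):
    """Mark one {{cite ...}} template as usurped, or return None to drop it."""
    if not HAS_ARCHIVE.search(cite):
        return None  # no (non-empty) archive: remove the citation
    if URL_STATUS.search(cite):
        # Overwrite the existing value, keeping the parameter's own spacing.
        return URL_STATUS.sub(r"\1usurped", cite, count=1)
    if cite.endswith("}}"):
        return cite[:-2].rstrip() + " |url-status=usurped}}"  # add the parameter
    raise ValueError("expected a single complete {{cite ...}} template")
```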
Yes, there's no automatic way to fix that. I'm also not sure how many of the links would even be able to be manually fixed, since some might not be able to be easily found on other domains. I agree with waiting to see what is left after the bot tries to find archive links to see if it's worth me trying to fix the leftovers manually. :Jay8g [VTE] 22:05, 28 January 2024 (UTC)
User:Jay8g: Here are the remaining 406 citations with {{dead link}}: Wikipedia:Link rot/cases/canoe.ca. There are over 11,000 in total on enwiki, so the archival success rate was about 96%, which is very good. Something still needs to be done with the 406. One option is to nuke the citation, which is the only choice for square links. Another is to convert to {{cite news}} and remove the |url=; this option is normally used when the cite can be found offline, such as on newspaper microfiche. Of course, there is also manual work, where anything is possible. In the meantime, I'll start processing the rest of the canoe.com links; many appear inoperable. -- GreenC 14:36, 30 January 2024 (UTC)
I spot-checked several of the remaining 406 dead links and was unable to find alternative links for any of them, so I think we should be good to remove the remaining links. Thanks for all your help on this -- I'm impressed by how many links were able to be fixed! :Jay8g [VTE] 21:50, 30 January 2024 (UTC)
User:Jay8g: Sounds good. I'll be working on this over the next few days and will post when done. Thanks for bringing this to my attention. I've been aware of Canoe, but didn't know it was usurped and excluded from Wayback; that's a new scenario (plus the canoe.com twist). It required basically every feature my bot has and then some; I've never made so many passes. This was a good learning experience about what the bot can do and how. -- GreenC 02:14, 31 January 2024 (UTC)
As noted above, this is all done finally. -- GreenC 02:34, 5 February 2024 (UTC)
Most of the content on canoe.ca was from the Sun Media newspapers, so many of these articles can probably be found in Canadian newspaper archives (Web archives like https://web.archive.org/web/*;type=text/torontosun.com/* or newspaper archives like NewspaperARCHIVE.com). It looks like the URLs with "-cp" were Canadian Press stories, and a bunch of them list The Canadian Press as the author, publisher, agency, etc., and the URLs with "-ap" were Associated Press stories. Articles from those agencies should be available in a variety of places. Finding them is the challenge.
The wrestling articles could probably all be found on Slam Wrestling if someone is willing to do the work. I didn't see any equivalent partner sites for other sports or categories.--Jahalive (talk) 02:22, 2 February 2024 (UTC)
I'm guessing you're not interested in customizing a bot to pull the news agency and date from the URLs of those CP and AP stories.--Jahalive (talk) 00:38, 13 February 2024 (UTC)
User:Jahalive, your idea is a good one. I'm going to pass because there is more work than I have time for. I want to use the bot and my time where they have the most impact, fixing link rot; that's really the bot's specialty. Your idea could probably be done by other bot writers. Could try BOTREQ or AWBREQ -- GreenC 01:19, 13 February 2024 (UTC)
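If someone does pick up the idea of pulling the agency and date from the "-cp" and "-ap" URLs, a starting point might look like this. The path shape it matches (`/YYYY/MM/DD/<id>-cp.html` or `-ap.html`) and the function name are assumptions for illustration only; the real canoe.ca URL formats would need checking first.

```python
import re
from urllib.parse import urlsplit

# Hypothetical path shape: .../YYYY/MM/DD/<story id>-cp.html or -ap.html
STORY = re.compile(r"/(\d{4})/(\d{2})/(\d{2})/[^/]+-(cp|ap)\.html?$", re.IGNORECASE)
AGENCIES = {"cp": "The Canadian Press", "ap": "Associated Press"}


def agency_and_date(url):
    """Return (agency, 'YYYY-MM-DD') for a wire-story URL, else None."""
    match = STORY.search(urlsplit(url).path)
    if not match:
        return None
    year, month, day, code = match.groups()
    return AGENCIES[code.lower()], f"{year}-{month}-{day}"
```

The result could then fill |agency= and |date= in a {{cite news}} that no longer carries the usurped URL.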

Warren Abstract Machine citations

Some citations at Warren Abstract Machine are broken, including this one: http://wambook.sourceforge.net/ 185.151.251.58 (talk) 08:54, 31 January 2024 (UTC)

I ran IABot on the page but it might take a few tries before the bot decides a link is dead. - GreenC 02:19, 2 February 2024 (UTC)
It was a soft-404 - I set it dead at iabot.org and reran the bot. -- GreenC 03:50, 13 February 2024 (UTC)
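A soft-404 is a page that answers with HTTP 200 but no longer has the content. The following heuristic is a generic sketch, not IABot's detection logic: it flags a deep link that lands on the site's front page, or a 200 page whose text contains a "not found" phrase. The function name and the phrase list are hypothetical placeholders.

```python
import re
from urllib.parse import urlsplit

# Hypothetical phrase list; a real checker would need a larger, site-aware set.
NOT_FOUND = re.compile(r"page not found|404 not found|no longer available|does not exist", re.I)


def looks_like_soft_404(requested_url, final_url, status, body):
    """Heuristic: a 200 response that really means 'missing'."""
    if status != 200:
        return False  # hard errors are handled as ordinary dead links
    requested, final = urlsplit(requested_url), urlsplit(final_url)
    if requested.path.strip("/") and not final.path.strip("/"):
        return True  # deep link silently redirected to the front page
    return bool(NOT_FOUND.search(body))
```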

bibliotecadigital.ciren.cl

This Chilean digital library seems to have reformatted its URLs and is used in numerous articles as a source. Here's a list - it seems like they still host most if not all articles but under different URLs. Jo-Jo Eumerus (talk) 13:52, 31 January 2024 (UTC)

User:Jo-Jo_Eumerus is there an example of old to new? Most likely if it's not obvious how to change there is nothing we can do other than treat the old links as dead and add archives. -- GreenC 02:15, 2 February 2024 (UTC)
It seems like they still share the titles: https://bibliotecadigital.ciren.cl/server/api/core/bitstreams/72bd0a55-5f0d-4ea6-98c4-116797dce09e/content becomes https://bibliotecadigital.ciren.cl/items/96666f36-9fc4-4833-8a95-0e85c6fd98ce Jo-Jo Eumerus (talk) 11:13, 3 February 2024 (UTC)
Jo-Jo Eumerus It looks like https://bibliotecadigital.ciren.cl/server/api/core/bitstreams/72bd0a55-5f0d-4ea6-98c4-116797dce09e/content is working. Maybe they had time to repair it. But most of them are still not working. Without a map of old to new, I suggest only check if they are dead and if so add an archive URL. For example https://bibliotecadigital.ciren.cl/handle/123456789/7049 becomes https://web.archive.org/web/20160629061606/https://bibliotecadigital.ciren.cl/handle/123456789/7049 .. I think the new page would be https://bibliotecadigital.ciren.cl/items/96666f36-9fc4-4833-8a95-0e85c6fd98ce but it looks different.-- GreenC 00:27, 12 February 2024 (UTC)
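Wayback Machine snapshot URLs follow the pattern `https://web.archive.org/web/<timestamp>/<original URL>`, as in the example above. A minimal Python sketch for building and splitting such URLs (the function names are hypothetical, and the regex only allows two-letter modifiers such as `id_` after the timestamp):

```python
import re

# Timestamp is 1-14 digits, optionally followed by a modifier such as "id_".
SNAPSHOT = re.compile(r"^https?://web\.archive\.org/web/(\d{1,14})([a-z]{2}_)?/(.+)$")


def snapshot_url(timestamp, url):
    """Build a Wayback Machine snapshot URL."""
    return f"https://web.archive.org/web/{timestamp}/{url}"


def split_snapshot(archive_url):
    """Return (timestamp, original URL) for a snapshot URL, else None."""
    match = SNAPSHOT.match(archive_url)
    if not match:
        return None
    return match.group(1), match.group(3)
```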
Aye, same content but a slightly different looking platform. Jo-Jo Eumerus (talk) 12:22, 12 February 2024 (UTC)
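For reference, the archive URL in the example above follows the Wayback Machine's `https://web.archive.org/web/<timestamp>/<original URL>` pattern, where the timestamp is 14 digits (YYYYMMDDhhmmss). A minimal sketch of building one; `wayback_url` is a hypothetical helper, not part of any bot:

```python
def wayback_url(original: str, timestamp: str) -> str:
    """Build a Wayback Machine snapshot URL (hypothetical helper).

    timestamp must be a 14-digit YYYYMMDDhhmmss string.
    """
    if len(timestamp) != 14 or not timestamp.isdigit():
        raise ValueError("timestamp must be 14 digits: YYYYMMDDhhmmss")
    return f"https://web.archive.org/web/{timestamp}/{original}"


print(wayback_url("https://bibliotecadigital.ciren.cl/handle/123456789/7049", "20160629061606"))
```

This prints the same archive URL quoted above. Finding *which* timestamp exists for a page is a separate lookup that this sketch does not attempt.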

Jo-Jo Eumerus: The bot ran on 25 pages. It added 10 archive URLs, and 9 {{dead link}}. The pages with {{dead link}}. -- GreenC 04:01, 13 February 2024 (UTC)

themessenger.com

themessenger.com has shut down [1]; we have around 186 uses of themessenger.com (HTTPS links, HTTP links). All of the news articles now link to a blank page (e.g. [2]). Hemiauchenia (talk) 19:46, 1 February 2024 (UTC)

Submitted to IABot. -- GreenC 02:17, 2 February 2024 (UTC)
User:Hemiauchenia IABot processed this domain, but I had to run it a second time through WaybackMedic. The problem is IABot is missing a lot for reasons I don't understand. Of the 184 articles that contain this domain, after IABot processed it, Medic edited an additional 101 pages adding archive URLs, and converted 43 instances of |url-status=live to dead. -- GreenC 15:29, 13 February 2024 (UTC)

linguistlist.org

This site is linked to by the linglist parameter in {{Infobox language}}. Snowmanonahoe (talk · contribs · typos) 23:19, 5 February 2024 (UTC)

User:Snowmanonahoe: I only see it on two pages: https://en.wikipedia.org/wiki/Special:LinkSearch?target=linguistlist.org%2Fmultitree --The site itself looks dead since 2008 or 2009. -- GreenC 00:49, 6 February 2024 (UTC)
GreenC: try Special:LinkSearch/multitree.org/codes/. Those urls all redirect to linguistlist.org/multitree now. Snowmanonahoe (talk · contribs · typos) 00:58, 6 February 2024 (UTC)
User:Snowmanonahoe: OK. There are 75 pages. Comparing results at Archive.today with the Wayback Machine, I recommend a first pass using Archive.today, then a second pass using the Wayback Machine for any that are not available there. Sound alright? BTW, the entire linguistlist.org site looks like it needs review (421 pages). They made a new website and the old inbound links are not working right, though links to the new website are working. -- GreenC 02:30, 6 February 2024 (UTC)
I think Kwamikagami should weigh in on this first. Snowmanonahoe (talk · contribs · typos) 03:08, 6 February 2024 (UTC)
I gave up on getting multitree links to work back when they were basically offline. I didn't know they were up again.
Multitree is generally not a RS. I would avoid using them except for extinct languages where Linglist maintains the description of the ISO code (like Ethnologue does for living languages); for classification trees of various authors (e.g. on our Austroasiatic article); and maybe a couple other things I'm not thinking of, but not as a general reference.
Is there something in particular you wanted me to weigh in on? I'd think we'd want to update the links when we use them, as I can't think of any reason we'd want to preserve or link to old versions of their pages. — kwami (talk) 03:30, 6 February 2024 (UTC)
I would avoid using them except [some] .. OK my job is to save the dead links by adding an archive URL. It's only about 75 links. You can remove some citations and keep others as you prefer, once the archives are added, so you will be able to see what the content of the page is. -- GreenC 14:54, 6 February 2024 (UTC)
That should work just fine. No need for you to evaluate the quality of the ref. — kwami (talk) 15:25, 6 February 2024 (UTC)
For the 75 pages with multitree.org/codes URLs, it is a multi-pass run:
  1. Pass 1 (multitree1): Remove existing archive.org links
  2. Pass 2 (multitree2): Add archive.today where available
  3. Pass 3 (multitree3): Add archive.org where available
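The three passes above can be sketched as an ordered pipeline. This is a hypothetical illustration, not the bot's code: citations are modeled as a small dataclass, and the two dicts (original URL to snapshot URL) stand in for real availability lookups at archive.today and the Wayback Machine.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Citation:
    url: str
    archive_url: Optional[str] = None
    dead_link: bool = False


def run_passes(cites: List[Citation], archive_today: Dict[str, str],
               wayback: Dict[str, str]) -> List[Citation]:
    """Apply the passes in order; the dicts are stand-ins for availability lookups."""
    # Pass 1: remove existing archive.org links so every citation is re-evaluated.
    for c in cites:
        if c.archive_url and "web.archive.org" in c.archive_url:
            c.archive_url = None
    # Pass 2: add archive.today where a snapshot is available.
    for c in cites:
        if c.archive_url is None and c.url in archive_today:
            c.archive_url = archive_today[c.url]
    # Pass 3: add archive.org where available and pass 2 found nothing.
    for c in cites:
        if c.archive_url is None and c.url in wayback:
            c.archive_url = wayback[c.url]
    # Anything still without an archive would get a {{dead link}} tag.
    for c in cites:
        c.dead_link = c.archive_url is None
    return cites
```

Running the passes separately, rather than checking both archives per citation in one loop, mirrors the stated preference order: archive.today wins whenever it has a copy.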
User:Kwamikagami: The 75 pages with multitree.org/codes URLs are done. Each should now have either an archive URL or a {{dead link}}; if one has neither, the bot had trouble parsing that citation. -- GreenC 04:00, 12 February 2024 (UTC)
Thanks. I'd only reviewed instances called from the info box. Will go thru them over the next few days. Looks like about half should be removed, as they're things that can be cited to RS's. — kwami (talk) 08:11, 12 February 2024 (UTC)
  • linguistlist.org was also processed (about 450 pages), and many problems were found and repaired: dead links, soft-404s, migrated links, and Cloudflare blocks. -- GreenC 20:32, 12 February 2024 (UTC)
    Thanks for all the work with that. — kwami (talk) 23:26, 12 February 2024 (UTC)
    Is there a better way to handle the 512 auto-generated refs at Category:Languages with Linglist code? Or would they all have to be done by hand? — kwami (talk) 23:50, 12 February 2024 (UTC)
It is being generated by Template:Infobox_language/linguistlist. Are most multitree.org/codes URLs dead, or only some? Or not sure? -- GreenC 00:05, 13 February 2024 (UTC)
It's also in Template:Infobox language/ref and Module:Infobox language. It looks like all of multitree.org is retired. What if we change the template to use a generic archive URL and hope for the best: Special:Diff/1140877092/1206755696, Special:Diff/996938315/1206753611 and Special:Diff/1114901671/1206760361. This is a stop-gap solution because archive.org won't have archives for all of the URLs. Ideally multitree.org would be removed from Template:Infobox_language and its sub-templates, and archive URLs would be added individually to replace the auto-generated ones, at the same location where they were auto-generated. Somewhat difficult. -- GreenC 01:45, 13 February 2024 (UTC)
Yeah, they appear to be defunct. But they are the official ISO repository for descriptions of languages extinct before ca. 1950, equivalent to Ethnologue for recent languages. We really should have a link to the official site. — kwami (talk) 02:40, 13 February 2024 (UTC)
Maybe generic archive URLs at the Infobox layer are OK. If that is not enough, we will need to remove the Infobox support, add the citations individually to each article, and run archive bots to add archive URLs. -- GreenC 03:49, 13 February 2024 (UTC)

hobbes.nmsu.edu

OS/2 repository going offline in April. Only a few pages on enwiki. [3] -- GreenC 15:32, 6 February 2024 (UTC)

 Done -- GreenC 16:34, 13 February 2024 (UTC)

iltalehti.fi

I've noticed that some of the 1,222 Iltalehti URLs are dead but bots don't fix them:

All those pages show the Finnish-language text "Hakemaasi sivua ei valitettavasti löytynyt." (= "Unfortunately, the page you were looking for could not be found."). I tried to Google those URLs' headlines but couldn't find new URLs for them, so I think Iltalehti has removed those articles from its website completely. Could a bot go through Iltalehti URLs and set an archive link for the Iltalehti webpages that contain that exact text? Also, if there's a way to fix these, can InternetArchiveBot be set to fix them eventually on other-language wikis as well, as GreenC did a month ago for the Ilta-Sanomat URLs in the discussion above (#Ilta-Sanomat)? For example, there are 10,070 Iltalehti URLs on fi.wikipedia. Thank you again. 85.76.13.79 (talk) 15:35, 11 February 2024 (UTC)

I requested IABot to run on Maj-Len Grönholm and it fixed it: Special:Diff/1194582882/1206244327. Probably IABot hasn't automatically processed the pages yet. I'll take a look at it though, because I know IABot has gaps in its coverage of what it processes. I'll run it through WaybackMedic, which will get them all and also look for soft-404s like that "Unfortunately" string when the pages otherwise return status 200. Whatever it finds will be updated in the IABot database, and that should eventually propagate to the rest of the wikis. -- GreenC 16:35, 11 February 2024 (UTC)
Thanks again. One thing I noticed though: If either blogit., m. or plus. preceded iltalehti.fi in the URL, the bot changed the URL to the main page URL https://www.iltalehti.fi. I found 12 edits in question with this search: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12. Can the bot fix these or do we have to fix these by hand? 85.76.13.79 (talk) 13:00, 14 February 2024 (UTC)
Oh sorry looks like I missed those, they are soft-404s. If you will manually restore them to the original URL, I can rerun the bot on those pages. It will add an archive URL instead of following the redirect to the homepage. -- GreenC 14:21, 14 February 2024 (UTC)
Alternatively you can just revert the entire edit by the bot if there is no intervening edit, and the bot will redo the entire page, if that's easier. -- GreenC 14:23, 14 February 2024 (UTC)
Done. 85.76.13.79 (talk) 20:26, 15 February 2024 (UTC)
Also done. Special:Diff/1207818208/1207836829 -- GreenC 21:20, 15 February 2024 (UTC)
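The two soft-404 patterns in this thread (a status-200 page carrying the Finnish "not found" text, and deep links on blogit., m. or plus. that redirect to the homepage) can be expressed as a small heuristic. This is a hypothetical sketch, not WaybackMedic's actual logic; fetching the page is left out, and the caller is assumed to supply the final URL after redirects, the status, and the decoded body.

```python
from urllib.parse import urlsplit

# "Not found" text from the Iltalehti report above; other sites would need their own markers.
SOFT_404_MARKERS = ("Hakemaasi sivua ei valitettavasti löytynyt.",)


def is_soft_404(original_url: str, final_url: str, status: int, body: str) -> bool:
    """Heuristically decide whether a fetched page is dead.

    original_url: the URL cited on Wikipedia
    final_url:    where the request ended up after following redirects
    status:       HTTP status of the final response
    body:         decoded HTML of the final response
    """
    if status >= 400:  # a hard error is simply dead
        return True
    if any(marker in body for marker in SOFT_404_MARKERS):  # 200 with "not found" text
        return True
    orig, final = urlsplit(original_url), urlsplit(final_url)
    deep_link = orig.path not in ("", "/")
    lands_on_homepage = final.path in ("", "/") and not final.query
    # A deep link silently redirected to the site's front page.
    return deep_link and lands_on_homepage
```

A real checker would need a marker list per site; a plain substring match is the simplest choice and assumes the site's wording never changes.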

Normally I catch these. Here is the output of the "l4s4" script (i.e. it shows redirect targets with 4 or more cases):

mintbox:[] ./l4s4 
7 -  https://www.iltalehti.fi/politiikka/a/201712072200588364 
4 -  https://www.iltalehti.fi/perhe/a/200612185426589 
4 -  https://www.iltalehti.fi/popstars/a/200701145593138 
4 -  https://www.iltalehti.fi/uutiset/a/2016061121711142 
4 -  https://www.iltalehti.fi/viihdeuutiset/a/201801072200651274 
12 -  https://www.iltalehti.fi 
4 -  https://www.iltalehti.fi/viihde/a/2009073010005660 
8 -  https://www.iltalehti.fi/politiikka/a/201801182200679167 

i.e. there were 12 pages with redirects to https://www.iltalehti.fi, but I forgot to run the script before committing the changes to the wiki. -- GreenC 21:20, 15 February 2024 (UTC)
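The l4s4 script itself isn't shown, so here is a hypothetical reimplementation of the idea it describes: given (source URL, redirect target) pairs, count how many distinct sources land on each target and report targets with 4 or more, in the same "count -  target" format as the output above. Many sources converging on one target, such as the homepage, is a strong hint of soft-404 redirects. The URLs in the sample log are invented for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def redirect_hotspots(redirects: Iterable[Tuple[str, str]],
                      threshold: int = 4) -> List[Tuple[int, str]]:
    """Count distinct source URLs per redirect target and return
    (count, target) pairs with count >= threshold, most common first."""
    counts = Counter(target for _source, target in set(redirects))
    return [(n, target) for target, n in counts.most_common() if n >= threshold]


# Hypothetical redirect log: five blog URLs collapsing onto the homepage,
# plus one ordinary article move that should not be flagged.
log = [(f"https://blogit.iltalehti.fi/post/{i}", "https://www.iltalehti.fi") for i in range(5)]
log.append(("https://www.iltalehti.fi/a/1", "https://www.iltalehti.fi/uutiset/a/1"))

for n, target in redirect_hotspots(log):
    print(f"{n} -  {target}")
```

With this sample log, only the homepage target is reported, with a count of 5.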

pomus.net

Sometimes it redirects to a porn site and sometimes to various fake "I am not a bot" websites. There are many links to it, all of which require |url-status=usurped - Altenmann >talk 07:11, 15 February 2024 (UTC)

Altenmann: Added to the WP:JUDI (usurpation) queue Special:Diff/1202023308/1207703597, thank you. -- GreenC 13:46, 15 February 2024 (UTC)

newindianexpress.com

Many old links don't redirect to their new ones, like this doesn't take us here. Better to tag the old ones as dead. Kailash29792 (talk) 13:09, 12 February 2024 (UTC)

 Doing... -- GreenC 17:55, 16 February 2024 (UTC)
 Done - The domain exists in 15,261 pages, and the bot made changes in 8,467 of them: adding 5,240 new archive URLs, adding 238 {{dead link}} tags where no archive URL existed, changing 1,220 |url-status=live to dead, and a bunch of other miscellaneous cleanup work. Changes are also uploaded to IABot, so they will propagate to 300+ other wikis. User:Kailash29792, this was a much-needed cleanup; thank you for bringing it to my attention. -- GreenC 18:13, 17 February 2024 (UTC)

crossrail.co.uk

All URLs under the crossrail.co.uk domain are now redirecting to https://web.archive.org/web/20221229005042/https://www.crossrail.co.uk/# with subpages just going to the same archive of the main page breaking links. All links therefore need to be marked as dead and pointed to an archive earlier than 29 December 2022. Thryduulf (talk) 12:51, 14 February 2024 (UTC)

Interesting. Never seen that before (HTML redirect to archive.org for the entire site). I like it. The site appears to be mostly usable via the archive version. Simple solution for general purposes. Well, like you say, we can do better at enwiki. I'll add more specific archive URLs for each page. -- GreenC 18:05, 16 February 2024 (UTC)

 Done - The bot checked 124 pages that have the domain. It edited 101 pages. Added 161 archive URLs. Converted 51 |url-status=live to dead. Added two {{dead link}}. Updated IABot with information so it propagates to 300+ other wikis. Thryduulf thank you for the notification. -- GreenC 21:05, 18 February 2024 (UTC)
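Choosing a "specific archive URL for each page" here means skipping the site-wide redirect snapshot and taking the newest capture from before 29 December 2022. A minimal sketch of that selection, assuming the page's list of 14-digit snapshot timestamps has already been obtained (for example from the Wayback CDX server, which is outside this sketch); the sample timestamps are hypothetical.

```python
from typing import Iterable, Optional

# Start of 29 December 2022: per the request above, only earlier snapshots are usable.
CUTOFF = "20221229000000"


def latest_snapshot_before(timestamps: Iterable[str], cutoff: str = CUTOFF) -> Optional[str]:
    """Return the newest 14-digit Wayback timestamp strictly earlier than cutoff, or None."""
    # Fixed-width YYYYMMDDhhmmss strings compare chronologically as plain strings.
    earlier = [ts for ts in timestamps if ts < cutoff]
    return max(earlier) if earlier else None


# Hypothetical snapshot list for one crossrail.co.uk page.
snaps = ["20190312101500", "20221201083000", "20221229005042", "20230601000000"]
best = latest_snapshot_before(snaps)
```

Here `best` is `"20221201083000"`. A page with no capture before the cutoff yields `None`, which is the case where a {{dead link}} tag would be used instead.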

royin.go.th

Several years ago, the Royal Institute of Thailand changed its name to the Royal Society of Thailand and most (but not all) of the content from its old website, under the domain www.royin.go.th, is now preserved under the subdomain legacy.orst.go.th . Can this be handled by a bot? --Paul_012 (talk) 10:12, 22 February 2024 (UTC)

58 pages. When I try the first one, http://www.royin.go.th/th/knowledge/detail.php?ID=639, it doesn't work at http://legacy.orst.go.th/th/knowledge/detail.php?ID=639; rather, it wants to redirect to https://www.orst.go.th/?ID=639. However, I can't read Thai and don't know if that is a soft-404 or a legitimate page. -- GreenC 15:17, 22 February 2024 (UTC)
It seems links like that one are too old and weren't preserved, and that they constitute more than a small minority. 58 isn't a lot; maybe I can check them manually and replace them with AWB. --Paul_012 (talk) 15:45, 22 February 2024 (UTC)
Thanks. It would be better if you can. -- GreenC 16:17, 22 February 2024 (UTC)
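For the manual review, the host swap described in the request (same path and query, with www.royin.go.th moved to legacy.orst.go.th) can generate candidate URLs to check. This is a hypothetical helper, and as the ID=639 example shows, a candidate is only a starting point: each one still has to be opened and confirmed before it replaces the old link.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

OLD_HOSTS = {"www.royin.go.th", "royin.go.th"}
NEW_HOST = "legacy.orst.go.th"


def candidate_legacy_url(url: str) -> Optional[str]:
    """Map an old royin.go.th URL to the same path and query on legacy.orst.go.th.

    Returns None for other hosts. The result is only a candidate: not all old
    pages were preserved, so it must still be checked by hand.
    """
    parts = urlsplit(url)
    if parts.hostname not in OLD_HOSTS:
        return None
    return urlunsplit(parts._replace(netloc=NEW_HOST))


print(candidate_legacy_url("http://www.royin.go.th/th/knowledge/detail.php?ID=639"))
```

This prints the same legacy.orst.go.th URL that was tried above.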

Vice Media

Just wanted to flag that per Vice reporters on social media, there are concerns that the Vice Media website is about to be shutdown (a la The Messenger (website)). Is there a way to make sure that all articles using it as a source have archive links? Thanks! Sariel Xilo (talk) 21:16, 22 February 2024 (UTC)

Concerns about the total shuttering have just been picked up by Hollywood Reporter with the top editor unable to confirm if the website will be pulled down. Sariel Xilo (talk) 21:23, 22 February 2024 (UTC)
Wow, that's over 17,000 pages. No worries: if they shut it down, we'll add archives. Any link added to Wikipedia should be archived at Wayback automatically, and big sites like this are typically crawled entirely. Too bad if true; they had a lot of good content. -- GreenC 22:01, 22 February 2024 (UTC)
Confirmed that they'll stop publishing on Vice.com [4], but whether they'll leave the website up as a historical archive, as was done with Gawker, or take it down is anyone's guess. Hemiauchenia (talk) 22:19, 22 February 2024 (UTC)