Wikipedia:Link rot/URL change requests

This is an old revision of this page, as edited by Jindam vani (talk | contribs) at 16:29, 20 January 2023 (→‎Lakshman Sruthi: 10.20012023). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

This page is for requesting modifications to URLs, such as marking dead or changing to a new domain. Some bots are designed to fix link rot; they can be notified here. These include InternetArchiveBot and WaybackMedic. This page can be monitored by bot operators from other language wikis since URL changes are universally applicable.

Dinamalar Nellai

While the main site still works, the same cannot be said for its subsidiary which now seems to house a different website. All dinamalarnellai refs tagged as live should be changed to dead. Kailash29792 (talk) 06:28, 22 October 2022 (UTC)[reply]

This is one of the WP:JUDI sites. Needs to be usurped. Working on it now plus 26 other judi-usurped domains. -- GreenC 15:34, 22 October 2022 (UTC)[reply]
Done. -- GreenC 23:35, 22 October 2022 (UTC)[reply]

Warshipsww2.eu

This domain has been usurped or rendered unfit and now serves German gambling spam. It's not related to WP:JUDI.

Lyndaship (talk) 11:45, 29 October 2022 (UTC)[reply]

Added to JUDI, it's easier to process as a batch, the process is functionally the same. Also added the title string "Roulette Blog" to check for. Thanks! -- GreenC 14:41, 29 October 2022 (UTC)[reply]

Unearthed Arcana article series on dnd.wizards.com

Looks like Wizards of the Coast pulled all Unearthed Arcana articles (https://dnd.wizards.com/articles/unearthed-arcana) from before 2020 (ex: Waterborne Adventures (2015), Psionics and the Mystic – Take Two (2016), Dragonmarks (2018), etc). I started to manually update this at List of Dungeons & Dragons rulebooks#Unearthed Arcana (since it seems all of them have been archived) but it's a lot of articles and I'm not sure what other Wikipedia articles use the UA articles as sources. Thanks! Sariel Xilo (talk) 20:29, 1 November 2022 (UTC)[reply]

How would you know if the article is pre-2020 vs post-2019? I guess pre-2020 if it redirects to https://dnd.wizards.com/news/archive?category=unearthed-arcana -- GreenC 02:24, 3 November 2022 (UTC)[reply]

It's only in 9 articles:

Good to know it's not as widespread as I feared (List of Dungeons & Dragons rulebooks#Unearthed Arcana has ~85 UA articles listed)! What I didn't realize is that Wizards didn't do redirects for all the post-2019 articles (ex: the Jan 2020 article original link goes to the UA archive redirect instead of to https://dnd.wizards.com/unearthed-arcana/subclasses-part-1). So I think everything with https://dnd.wizards.com/articles/unearthed-arcana is dead. Sariel Xilo (talk) 02:58, 3 November 2022 (UTC)[reply]
Alright it's done. -- GreenC 03:28, 3 November 2022 (UTC)[reply]

Twitter

Can/should a bot add archive links to sources from Twitter which don't already have them? Media coverage on the company is showing it is increasingly unstable and I'm a bit concerned about the potential future link rot. Thanks! Sariel Xilo (talk) 02:25, 11 November 2022 (UTC)[reply]

That would be a really big project I don't want to get into unless there is evidence of a big link rot problem. Proactively, Internet Archive should already be archiving Tweets into the Wayback Machine, so they'll be available if/when the links die. -- GreenC 02:59, 11 November 2022 (UTC)[reply]
Sounds good! Sariel Xilo (talk) 03:06, 11 November 2022 (UTC)[reply]

At some point the website revamped itself, but the old links don't redirect to the new ones. This is an old link, and this is the new one. If the bot can't replace the links (I don't expect it to), can it add archives to the dead ones or at least tag the existing links as dead? Kailash29792 (talk) 02:15, 20 November 2022 (UTC)[reply]

User:Kailash29792, it tested all URLs for 404, and added about 350 new archive URLs and 50 {{dead link}}. -- GreenC 21:29, 23 November 2022 (UTC)[reply]

Four BBC hosts will shortly be removed

Details in this Mastodon post. The four deprecated hosts are:

  • m.bbc.co.uk
  • ssl.bbc.co.uk
  • m.bbc.com
  • ssl.bbc.com

In all cases, the hosts should be changed to www.bbc.co.uk or www.bbc.com (see post). Is it possible to easily get the scale of the problem? I don't know the Wikipedia linkrot tools well enough to know if it's possible to search comprehensively for URLs matching a pattern. — ʞɔıu 14:59, 23 November 2022 (UTC)[reply]

@Nkocharh, [1], [2], [3], [4] show the usage of the links on English Wikipedia. – robertsky (talk) 15:05, 23 November 2022 (UTC)[reply]
Ah, brilliant, thanks! So here's what we have:
For a total of 542 links. What's the best way to bulk migrate these? — ʞɔıu 15:16, 23 November 2022 (UTC)[reply]
User:Nkocharh: they are done. -- GreenC 17:13, 23 November 2022 (UTC)[reply]
That's great, thanks! — ʞɔıu 17:25, 23 November 2022 (UTC)[reply]
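The host change requested in this thread could be sketched roughly as follows. This is only an illustration, not the code any bot actually ran: the function name `migrate_bbc_url` is hypothetical, and the pairing of each deprecated host with `www.bbc.co.uk` or `www.bbc.com` is an assumption based on the host's suffix (the linked post is the authority on the real mapping).

```python
from urllib.parse import urlsplit, urlunsplit

# The four deprecated hosts listed in the request. The target chosen for each
# one (same suffix, "www." prefix) is an assumption, not taken from the post.
HOST_MAP = {
    "m.bbc.co.uk": "www.bbc.co.uk",
    "ssl.bbc.co.uk": "www.bbc.co.uk",
    "m.bbc.com": "www.bbc.com",
    "ssl.bbc.com": "www.bbc.com",
}


def migrate_bbc_url(url: str) -> str:
    """Return url with a deprecated BBC host swapped for its replacement."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    new_host = HOST_MAP.get(host)
    if new_host is None:
        return url  # not one of the four hosts: leave the URL untouched
    # Keep scheme, path, query and fragment exactly as they were.
    return urlunsplit((parts.scheme, new_host, parts.path, parts.query, parts.fragment))
```

For example, `migrate_bbc_url("http://m.bbc.co.uk/news/uk-1234")` yields `http://www.bbc.co.uk/news/uk-1234` (the path here is made up for illustration), while a URL that is already on `www.bbc.co.uk` is returned unchanged.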

lirrhistory.com

The domain lirrhistory.com has been usurped; a cursory whois query suggests that this happened on 2022-09-21, but I'm not completely sure.

larryv (talk) 00:45, 12 December 2022 (UTC)[reply]

Loaded into WP:JUDI for a future batch run to be usurped: Special:Diff/1121495151/1126945378 -- GreenC 02:09, 12 December 2022 (UTC)[reply]

@Jindam vani: Domain reported dead. Links archived (in 58 pages), and IABot database updated. -- GreenC 14:35, 27 December 2022 (UTC)[reply]

@GreenC: thank you <_> jindam, vani (talk) 14:55, 27 December 2022 (UTC)[reply]

The website Catholic News Service has been dead since 30 December 2022, due to a decision of the USCCB taken months prior. See this statement. A new website, OSV News, currently has Catholic News Service's former URLs, but the previous articles do not seem to be present on OSV News (example). All the links to previously published CNS articles currently lead to 404 errors on the OSV News website.

Therefore, my bot request is as follows: all Catholic News Service links added prior to 30 December 2022 should be marked as dead and archive URLs should be added.

A discussion at WikiProject Christianity seems to support my request. Veverve (talk) 00:18, 3 January 2023 (UTC)[reply]

@Veverve: ok, it is done. - GreenC 15:47, 3 January 2023 (UTC)[reply]
@GreenC: Sometimes, the URL was left as "live" (e.g. here). All those URLs should be put to "dead".
Also, why did you choose Archive.today and not Archive.org? Or did you choose both with one being preferred? Veverve (talk) 18:50, 3 January 2023 (UTC)[reply]
The "left live" appears to be a bug related to the combination of empty archive-* fields and a filled url-status field; thanks for identifying it. It has been fixed and re-run on the affected pages, e.g. [5]. The bot uses archive.today when it can't find a Wayback URL. -- GreenC 01:34, 4 January 2023 (UTC)[reply]
@GreenC: for example, the very first link of Dicastery for Evangelization is archived on Archive.org, but the archive URL which was added was that of Archive.today. Veverve (talk) 03:26, 4 January 2023 (UTC)[reply]
This happened because the API result from Wayback was what I call "bogus", meaning unreliable. The Wayback API timed out and reported 0 results (it reports 0 rather than "timed out", so you can't tell which is which). The bot then used other techniques and found the archive, but it wasn't sure how reliable that result was, so it defaulted to an alternative provider where one existed. -- GreenC 03:42, 4 January 2023 (UTC)[reply]
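The selection order described in this thread can be sketched as a small decision function. This is not WaybackMedic's code; the function name and its three parameters are hypothetical, and the sketch only encodes the order of preference stated above: a Wayback API result first, then an alternative provider, then a Wayback snapshot found by less certain means.

```python
from typing import Optional


def pick_archive(wayback_api_url: Optional[str],
                 wayback_other_url: Optional[str],
                 alternative_url: Optional[str]) -> Optional[str]:
    """Choose an archive URL in the order described in the thread.

    wayback_api_url:   snapshot returned by the Wayback API; None when the API
                       reported 0 results (which may really be a timeout)
    wayback_other_url: Wayback snapshot found by other, less certain techniques
    alternative_url:   snapshot at another provider such as archive.today
    """
    if wayback_api_url:
        return wayback_api_url      # trusted API answer
    if alternative_url:
        return alternative_url      # unsure about Wayback, an alternative exists
    return wayback_other_url        # last resort; None means no archive found
```

With these assumptions, a timed-out API call plus an available archive.today copy produces the archive.today link, which matches the behaviour reported for the Dicastery for Evangelization page.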

Is it possible to make a bot that replaces old links with new links that the old links redirect to?

The old BYU Library findingaids (pages that explain the contents of a collection within a special collections) are through the URL https://findingaid.lib.byu.edu. Right now, they redirect to their parallel URL on our new archivesspace pages in most cases. I would like to change the URLs, because one of my colleagues informed me that the old URLs may not redirect indefinitely. I have done a few hundred manually but there are still some 600 left. Occasionally, there is an error in the redirect (for example, with item-level things from the folklore collection) and human intervention is useful. If you can limit the bot to change links in the external links section, I can manually change links that are used as references (to ensure that the same information is present). Rachel Helps (BYU) (talk) 20:46, 3 January 2023 (UTC)[reply]

User:Rachel Helps (BYU), hi I can do this but can't limit based on where the URL is on the page. Do you have an example of a redirect error I can learn from? -- GreenC 18:21, 11 January 2023 (UTC)[reply]
That would be amazing--I can cleanup links in refs. Here is an example that works: on Mary_Elizabeth_Rollins_Lightner#External_links, https://findingaid.lib.byu.edu/viewItem/Vault%20MSS%20363 redirects to http://archives.lib.byu.edu/repositories/14/resources/7316. Most of the items in the folklore collection will have errors though, because item-level cataloging for this collection was removed (they have FA in the URL). One example I've preserved in the reference authored by Kristi Young on Alice Louise Reynolds. https://findingaid.lib.byu.edu/viewItem/FA%205/4.18.9.1.1/ redirects to the error page http://archives.lib.byu.edu/repositories/14/archival_objects/76703 (for this particular source, the original was preserved on archive.org https://web.archive.org/web/20150107211032/https://findingaid.lib.byu.edu/viewItem/FA%205/4.18.9.1.1/). I've already fixed most of the FAs; the remaining ones, #s 4-10 on the special link search, are still there because they're sources, not just external links. Another error example I found was https://findingaid.lib.byu.edu/viewItem/UA%201020/Series%204/box%202/folder%204/ on LDS Hospital (probably another item-level item that was eliminated with the transition to archivesspace). Looking for more error examples on the last 500 items in the special link search, I think that most "Series"-level links will have errors. Rachel Helps (BYU) (talk) 18:47, 11 January 2023 (UTC)[reply]
User:Rachel Helps (BYU) this should be no problem, the error page returns a 404 code which the bot detects and will attempt to find an archive URL to replace with. I'll run 50 articles or so and you can check to make sure it's on the right track. I need to process about.com (below) first, this is an urgent job involving a usurped domain, then will return here in a few days, thanks for your patience. -- GreenC 03:22, 12 January 2023 (UTC)[reply]

User:Rachel Helps (BYU), ok got it faster than expected. The bot made 50 edits; can you look and report any problems? It starts at Tracy Hall at the top through to Charles L. Walker. If it looks OK I'll do the rest. -- GreenC 19:09, 12 January 2023 (UTC)[reply]

I went ahead and finished it. If you see any problems, let me know. -- GreenC 15:20, 15 January 2023 (UTC)[reply]
Thank you so much!! You have lifted a weight from my worklist in a very efficient way. Rachel Helps (BYU) (talk) 19:33, 16 January 2023 (UTC)[reply]
So glad to hear that, anytime you need help I am here. -- GreenC 22:37, 16 January 2023 (UTC)[reply]
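The handling discussed in this thread (replace a link with its redirect target when the target works, and fall back to an archive when the target is a 404 error page) might look roughly like the sketch below. The function name `plan_edit` and the action labels are hypothetical, and what happens when no archive exists is an assumption; the thread does not say.

```python
from typing import Optional, Tuple


def plan_edit(final_status: int, final_url: str,
              archive_url: Optional[str]) -> Tuple[str, str]:
    """Decide what to do with an old link after following its redirect.

    final_status: HTTP status code of the page the redirect ends on
    final_url:    URL the redirect ends on
    archive_url:  an archived copy of the old link, if one was found
    """
    if final_status == 200:
        return ("replace", final_url)       # redirect works: use the new URL
    if final_status == 404:
        if archive_url:
            return ("archive", archive_url)  # error page: use the archive
        return ("review", "")                # assumption: leave for a human
    return ("review", "")                    # any other status: do not guess
```

Applied to the two examples given above, the Vault MSS 363 link (redirecting to a working page) would be planned as a replacement, and the FA 5 link (redirecting to an error page, with an archive.org copy available) would be planned as an archive link.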

Greenhivesaudio

It seems the site is dead, but it says 403 instead of 404. Like this. So I guess they are better replaced or tagged as dead links. Kailash29792 (talk) 11:35, 11 January 2023 (UTC)[reply]

User:Kailash29792, from what I can tell the domain only exists on 7 pages. Can you do it manually? That would be easier and probably more accurate. -- GreenC 18:25, 11 January 2023 (UTC)[reply]
All done. This could be archived. Kailash29792 (talk) 12:52, 13 January 2023 (UTC)[reply]

http 301 domains

  • faunaeur.org -> fauna-eu.org
  • dfw.cbslocal.com -> cbsnews.com/dfw/
  • house.state.tx.us -> house.texas.gov

thank u. <_> jindam, vani (talk) 17:07, 11 January 2023 (UTC)[reply]

jindam, vani: they are done. -- GreenC 15:47, 14 January 2023 (UTC)[reply]

about.com usurped and wiki blacklisted

Reported. User:Billinghurst, I can usurp domains with WaybackMedic according to the process at WP:USURPURL, but only on Enwiki. If the domain is blacklisted at the wiki level I probably can't, because the blacklist will prevent the bot from editing the page; in that case the blacklist will need to be lifted for the bot to run. -- GreenC 22:07, 11 January 2023 (UTC)[reply]

@GreenC: suspended the global blacklist to allow for cleanup m:special:permalink/24351246, it will take ~15+ minutes to flow through. Please ping me to let me know when to reimpose the blacklist on this domain. Thanks for the work in this area. — billinghurst sDrewth 22:16, 11 January 2023 (UTC)[reply]
User:Billinghurst, thanks. It will probably take a couple days to work through. -- GreenC 02:37, 12 January 2023 (UTC)[reply]
User:Billinghurst, about.com is composed of many sub-domains. User:Harej reports there are about 1,000 subdomains. There are only 22 pages on Enwiki that are pure www.about.com (or about.com) URLs, which are the ones hijacked. The remaining 10k pages or so have sub-domains, and they are not hijacked. Examples: [6],[7], [8],[9],[10],[11],[12],[13] - they redirect to new domains. One is a 404, one or two are soft-404s, the rest are good content. Any thoughts on what to do? -- GreenC 17:06, 12 January 2023 (UTC)[reply]
We just marked the entire domain as permadead. It's odd that the base domain is hijacked, while the subdomains are not. Something funny is happening here. I'm starting to believe that about.com is actually compromised, not usurped. This requires more investigation. —CYBERPOWER (Around) 17:17, 12 January 2023 (UTC)[reply]
Dotdash Meredith indicates that 'about.com' has been dead-site-walking since 2017, instead should be dotdashmeredith.com. But I have no idea what content moved over or about sub-properties. DMacks (talk) 17:24, 12 January 2023 (UTC)[reply]
Actually I don't think there is any problem. It appears about.com was purchased by Dotdash Meredith. See the bottom of [14], which redirects to ThoughtCo and says "ThoughtCo is part of the Dotdash Meredith publishing family." It all looks legit. -- GreenC 17:32, 12 January 2023 (UTC)[reply]
@GreenC:We don't do redirect domains, and all these urls have been usurped. The content at the old urls is not the content that it was originally, and we don't know what will ever be at the seat of these redirects, so our replacing them back to the archived is appropriate. People can link directly to the (new) domain(s) of interest if they consider the target relevant, those base domains are unimpacted by this blacklisting. We are just removing the capacity of redirects, and firming up the urls originally utilised (as shoddy or as good as they were at the time). The recent examples of additions are showing clear sign of spammy additions and needing firmer control rather than hiding under a base redirecting url. — billinghurst sDrewth 21:50, 12 January 2023 (UTC)[reply]
OK, I understand what you're saying about using the original URL (i.e. an archive of it) for verification purposes. I don't see malicious behavior by Dotdash Meredith though, such as spam. If I had some examples of that, it would help to document why this is being done. It will touch 10k articles and in most cases the content looks legit, so I think people might say something about archiving and usurping a legit-looking URL. In some cases it will even mark the URL dead, when no archive is available, even though a working redirected URL is there. -- GreenC 22:53, 12 January 2023 (UTC)[reply]
All good if you don't want to change the links to archival links. There is zero need to add for new additions with an about.com redirecting url, our real users can add the actual url. So it will just be the spambots that we are seeing. I will reactivate this on the m:spamblacklist. — billinghurst sDrewth 10:27, 13 January 2023 (UTC)[reply]
Interesting test case, can the whitelist override the blacklist? (Yes, I know it can.) What if we whitelisted archive.org. Then adding archive URLs shouldn't be in issue in any case. We should try that. —CYBERPOWER (Chat) 20:09, 13 January 2023 (UTC)[reply]
the problem is that then archive.org could be used by anybody to circumvent the whole blacklist. -- seth (talk) 23:03, 16 January 2023 (UTC)[reply]
but: partial whitelisting of archive.org works, e.g. via
web\.archive\.org/web/[0-9]+/https?://(?:[a-z0-9]+\.|)about\.com
-- seth (talk) 23:26, 16 January 2023 (UTC)[reply]
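To see what the partial-whitelist pattern quoted above does and does not match, it can be tried out with Python's `re` module. Python is only a stand-in here: how MediaWiki itself applies whitelist entries may differ in details, and the helper name `is_whitelisted` and the sample URLs are made up for illustration.

```python
import re

# Pattern copied unchanged from the comment above.
PARTIAL_WHITELIST = re.compile(
    r"web\.archive\.org/web/[0-9]+/https?://(?:[a-z0-9]+\.|)about\.com"
)


def is_whitelisted(url: str) -> bool:
    """True if the URL is a Wayback snapshot of about.com or one sub-domain of it."""
    return PARTIAL_WHITELIST.search(url) is not None


samples = [
    "https://web.archive.org/web/20170101000000/http://www.about.com/",
    "https://web.archive.org/web/20170101000000/https://about.com/page",
    "https://web.archive.org/web/20170101000000/http://example.com/",
    "http://www.about.com/",
]
for sample in samples:
    print(sample, is_whitelisted(sample))
```

The first two samples match and the last two do not, so archived copies of about.com pages would pass while live about.com links and archives of unrelated sites would still be caught by the blacklist.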

dspace.usc.es -> minerva.usc.es, hankwilliamsdiscography.com -> jazzdiscography.com, asia.eurosport.com -> eurosport.com, cstv.com -> cbssports.com

(1) dspace.usc.es -> minerva.usc.es; (insource) 16 links
(2) hankwilliamsdiscography.com -> jazzdiscography.com; (insource) 67 links
(3) asia.eurosport.com -> eurosport.com; changes sub-domain and redirect to / for links; (insource) 231 links
(4) cstv.com -> cbssports.com; changes sub-domain and redirect to / for links; (insource) 618 links
<_>jindam, vani (talk) 16:26, 12 January 2023 (UTC)[reply]

These are done. -- GreenC 15:22, 15 January 2023 (UTC)[reply]

industry.bnet.com -> cbsnews.com/moneywatch/, people.africadatabase.org -> hotels-dubai.org/africadatabaseorg/, examiner.ie -> irishexaminer.com, apnewsarchive.com -> apnews.com, xwebapp.ustrotting.com -> ustrottingnews.com, reader.digitale-sammlungen.de -> digitale-sammlungen.de, somali.asso.fr -> biotaxis.fr

(1) industry.bnet.com -> cbsnews.com/moneywatch/; (insource) 25 links
(2) people.africadatabase.org -> hotels-dubai.org/africadatabaseorg/; (insource) 57 links
(3) examiner.ie -> irishexaminer.com; (insource) 314 links
(4) Changes scheme from http to https and changes domain and changes path: apnewsarchive.com -> apnews.com; (insource) 1477 links
(5) Changes domain and redirect to /: xwebapp.ustrotting.com -> ustrottingnews.com; (insource) 52 links
(6) Changes sub-domain and redirects to crufty url: reader.digitale-sammlungen.de -> digitale-sammlungen.de; (insource) 670 links
(7) Changes domain and redirect to /: somali.asso.fr -> biotaxis.fr; (insource) 236 links
thank u. <_>jindam, vani (talk) 08:00, 13 January 2023 (UTC)[reply]

All done. -- GreenC 01:56, 18 January 2023 (UTC)[reply]

talk.ictvonline.org -> ictv.global, culturalheritage.gov.mt -> culture.gov.mt, legisweb.state.wy.us -> wyoleg.gov

(1) Changes domain: talk.ictvonline.org -> ictv.global; (insource) 986 links
(2) Changes domain and changes path: culturalheritage.gov.mt -> culture.gov.mt; (insource) 347 links
(3) Changes scheme from http to https and changes domain: legisweb.state.wy.us -> wyoleg.gov; (insource) 214 links
thank u. <_>jindam, vani (talk) 18:16, 14 January 2023 (UTC)[reply]

rsssf.com -> rsssf.org, suederbrarup.de -> amt-suederbrarup.de, straininfo.net -> straininfobreak.ugent.be, scoop.diamondgalleries.com -> scoop.previewsworld.com, encyklopedia-solidarnosci.pl -> encysol.pl, eurocupbasketball.com -> euroleaguebasketball.net, businessweek.com -> bloomberg.com

(1) Changes scheme from http to https and changes TLD: rsssf.com -> rsssf.org; (insource) 37355 links
(2) Changes scheme from http to https and changes domain and removes trailing /: suederbrarup.de -> amt-suederbrarup.de; (insource) 16 links
(3) Changes domain and redirect to / & it shows "net::ERR_NAME_NOT_RESOLVED" on straininfobreak.ugent.be: straininfo.net -> straininfobreak.ugent.be; (insource) 2138 links
(4) Changes scheme from http to https and changes domain and redirect does not contain ".,?&": scoop.diamondgalleries.com -> scoop.previewsworld.com; (insource) 82 links
(5) Changes domain and redirect to /: encyklopedia-solidarnosci.pl -> encysol.pl; (insource) 51 links
(6) Changes domain and redirect preserves id number: eurocupbasketball.com -> euroleaguebasketball.net; (insource) 1088 links
(7) Changes scheme from http to https and changes domain and redirects to crufty url: businessweek.com -> bloomberg.com; (insource) 10682 links
thank u. <_>jindam, vani (talk) 07:05, 16 January 2023 (UTC)[reply]

english.cri.cn -> chinaplus.cri.cn, en.rian.ru -> sputniknews.com, dcnr.state.pa.us -> dcnr.pa.gov, telegraphindia.com -> telegraphindia.com, sscnet.ucla.edu -> ioa.ucla.edu

(1) Changes scheme from http to https and changes domain and redirect to /: english.cri.cn -> chinaplus.cri.cn; (insource) 925 links
(2) Changes scheme from http to https and changes domain and redirect preserves id number: en.rian.ru -> sputniknews.com; (insource) 1302 links
(3) Changes domain and redirect preserves id number, though it seems redirection has stopped for this domain: dcnr.state.pa.us -> dcnr.pa.gov; (insource) 758 links
(4) Redirect to /, as of now for urls ending with .jsp, .jsp#sometext&number, .asp: telegraphindia.com -> telegraphindia.com; I will look into this later; (insource) 8510 links
(5) Changes sub-domain and redirect to /: sscnet.ucla.edu -> ioa.ucla.edu; (insource) 337 links
thank u. <_>jindam, vani (talk) 05:37, 17 January 2023 (UTC)[reply]

It seems the site has been down since they failed to renew their license. All links should be tagged as url-status=dead, and archives added to those that lack them. Kailash29792 (talk) 08:36, 20 January 2023 (UTC)[reply]

http 301 or 302s

(1) Changes scheme from http to https and changes domain: plantsoftheworldonline.org -> powo.science.kew.org; (insource) 8515 links
(2) Changes domain: zonu.com -> gifex.com; (insource) 1727 links
(3) Changes domain: tnp.sg -> tnp.straitstimes.com; (insource) 787 links
(4) Changes scheme from http to https and changes domain and redirect to /: twistmagazine.com -> j-14.com; (insource) 82 links
(5) Changes scheme from http to https and changes domain and changes port and appends string: sahistoryhub.com.au -> sahistoryhub.history.sa.gov.au; (insource) 66 links
(6) Changes domain and truncates url: kitchenertoday.com -> kitchener.citynews.ca; (insource) 55 links
(7) Changes scheme from http to https and changes domain and changes path: deutschesheer.de -> bundeswehr.de; (insource) 82 links
(8) Changes scheme from http to https and changes domain and appends string: agnescarlsson.se -> facebook.com/AgnesOfficial; (insource) 16 links
(9) Changes domain and redirect to /: telegraphindia.com -> telegraphindia.com; I will look into this later; (insource) 87 links
(10) Redirect to /: sovsport.ru/gazeta -> sovsport.ru; (insource) 172 links
thank u. <_>jindam, vani (talk) 16:29, 20 January 2023 (UTC)[reply]
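The labels used in these requests ("changes scheme", "changes domain", "redirect to /") can be derived mechanically by comparing an old URL with the URL it redirects to. The following is a rough sketch only: the function name `describe_redirect` is hypothetical, it does not distinguish a sub-domain change from a full domain change, and it is not how these lists were actually produced.

```python
from typing import List
from urllib.parse import urlsplit


def describe_redirect(old_url: str, new_url: str) -> List[str]:
    """Describe how a redirect target differs from the original URL."""
    old, new = urlsplit(old_url), urlsplit(new_url)
    notes: List[str] = []
    if old.scheme != new.scheme:
        notes.append(f"changes scheme from {old.scheme} to {new.scheme}")
    if old.hostname != new.hostname:
        notes.append("changes domain")
    old_is_root = old.path in ("", "/")
    new_is_root = new.path in ("", "/") and not new.query
    if not old_is_root and new_is_root:
        # A deep link that lands on the homepage has lost its content.
        notes.append("redirects to /")
    elif old.path != new.path:
        notes.append("changes path")
    return notes
```

A result containing "redirects to /" is the case that needs care, since the old page's content is no longer reachable at the new address and an archive link is likely the better replacement. For instance, with a made-up path, `describe_redirect("http://twistmagazine.com/some/article", "https://j-14.com/")` reports a scheme change, a domain change, and a redirect to /.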