Wikipedia talk:Link rot


Shouldn't there be a section on "Signal a dead / rotten link" and put the code in the Advanced Edit Menu?

Imagine a user finds one. Isn't it already great if the user signals it, for example by leaving a code next to the dead link? I saw some code left by someone, but now I can't find it anymore. Maybe [dead link]? And wouldn't it make sense to put the code in the Advanced Edit Menu? Thy --SvenAERTS (talk) 13:21, 31 January 2016 (UTC)

Archiving YouTube videos

How exactly can we go about archiving YouTube videos to prevent link rot? I tried submitting some YouTube video URLs to the Internet Archive (which I saw being done elsewhere on Wikipedia), but when you access the archive link, the video refuses to play.

Is there any other way to archive YouTube videos? 8bitW (talk) 23:47, 8 February 2016 (UTC)

Have you tried archive.is? They tend to be able to archive difficult pages that Wayback has trouble with. For others to try in this list Wikipedia:List of archives on Wikipedia. -- GreenC 17:01, 16 March 2017 (UTC)

Edit request

Near the bottom of the "Web archive services" section, please change the word "javascript" to "JavaScript" (and "flash" to "Flash"). Thanks! 211.100.57.47 (talk) 14:12, 19 March 2016 (UTC)

 Done - Arjayay (talk) 15:31, 19 March 2016 (UTC)

Should archiving sources be mandatory?

After all, it is sometimes impossible to undo link rot and it is much easier to just archive the damn source to begin with. If a link rots then that nullifies adding it in the first place. Rovingrobert (talk) 07:47, 5 May 2016 (UTC)

Wayback automatically archives every external link added to Wikipedia. Adding the archive link to the page is done by IaBot as of 2016. -- GreenC 16:36, 13 December 2016 (UTC)

archive.is?

It seems ye old Archive.is has been added to the link blacklist. Why? User:jjdavis699 16:40, 18 May 2016 (UTC) — Preceding unsigned comment added by Jjdavis699 (talkcontribs)

No longer blacklisted. WP:Using archive.is -- GreenC 16:38, 13 December 2016 (UTC)

I have thus updated info here. Only today I wasted time using Webcite, thinking that archive.is is still banned here. Zezen (talk) 11:21, 24 August 2017 (UTC)

Semi-protected edit request on 12 March 2017

Superiorglasssolution (talk) 14:22, 12 March 2017 (UTC)

Not done: it's not clear what changes you want to be made. Please mention the specific changes in a "change X to Y" format. —  HELLKNOWZ  ▎TALK 14:24, 12 March 2017 (UTC)

New problem with Archive.org

It appears that archive.org is now implementing a beta version that will be very significant for Wikipedia. I'm not sure where to discuss this, so please let me know if I should bring it up somewhere besides this talk page. Or maybe someone has already brought it up.

Apparently, we can only search now at archive.org for main pages, but not sub-pages. At the same time, archive.org seems to be offering permanent links for any sub-page that we want.

It therefore might be wise for a bot to replace every external link at Wikipedia with an archive.org link, BEFORE the linked website goes dead. After it goes dead, there seems no way to cure the link rot. Anythingyouwant (talk) 21:12, 9 April 2017 (UTC)

"we can only search now at archive.org for main pages" .. Do you mean the "Search" box in the upper right corner of this page: https://web-beta.archive.org/ ? That is a new feature. They are only doing base URLs for now. Full URLs can still be found in a number of ways, such as the API or a URL-based API, e.g. https://web.archive.org/web/*/http://www.yahoo.com/news or https://web.archive.org/http://www.yahoo.com/news -- GreenC 22:08, 9 April 2017 (UTC)
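As a small illustration of the two URL forms mentioned in the reply above, the following sketch builds them for any target URL. The function names are made up for this example; only the two URL shapes come from the comment itself, and how Wayback responds to them (for instance, whether the short form redirects to a particular capture) is not verified here.

```python
# Hypothetical helpers that reproduce the two URL shapes quoted in the thread.
WAYBACK_PREFIX = "https://web.archive.org"


def wayback_listing_url(url: str) -> str:
    """Return the '/web/*/' form, which asks Wayback for the captures of `url`."""
    return f"{WAYBACK_PREFIX}/web/*/{url}"


def wayback_short_url(url: str) -> str:
    """Return the short form, with the target URL placed directly after the host."""
    return f"{WAYBACK_PREFIX}/{url}"


target = "http://www.yahoo.com/news"
print(wayback_listing_url(target))
print(wayback_short_url(target))
```

Both functions only build strings, so they can be used to prepare links for manual checking without making any network request.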
Here's the dead link I was trying to replace with an archive.org link: http://www.rhapsody.com/#artist/elvis-costello/album/elvis-costello-the-rhapsody-interview/track/on-linda-ronstadts-rendition-of-alison It's now impossible to do that, right? Anythingyouwant (talk) 00:09, 10 April 2017 (UTC)
In this case Wayback removes anything beyond the # because the # is just a page section. It's the same content with or without the # portion. -- GreenC 01:17, 10 April 2017 (UTC)
But the content they give me doesn't mention Elvis Costello or Linda Ronstadt. I finally found the content here: http://us.napster.com/artist/elvis-costello/album/elvis-costello-the-rhapsody-interview/track/on-linda-ronstadts-rendition-of-alison However, I think you're correct that the "#" caused the problem, thanks. Anythingyouwant (talk) 01:26, 10 April 2017 (UTC)
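The "#" issue in this thread can be reproduced locally. The sketch below uses Python's standard `urllib.parse.urldefrag` to split the Rhapsody link into the part a server (or an archive crawler) would be asked for and the fragment that stays in the browser. The helper name `split_fragment` is hypothetical. That the interview content was selected client-side from the fragment is an inference from the thread, not something confirmed there.

```python
from typing import Tuple
from urllib.parse import urldefrag


def split_fragment(url: str) -> Tuple[str, str]:
    """Split a URL into (URL without fragment, fragment after '#')."""
    base, fragment = urldefrag(url)
    return base, fragment


dead_link = (
    "http://www.rhapsody.com/#artist/elvis-costello/album/"
    "elvis-costello-the-rhapsody-interview/track/"
    "on-linda-ronstadts-rendition-of-alison"
)
base, fragment = split_fragment(dead_link)
print(base)      # the only part that identifies the page to be archived
print(fragment)  # everything after '#', which is not part of the request
```

Here the base is just the site's front page, which would explain why the archived copy did not mention Elvis Costello or Linda Ronstadt.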

Link Rot

I am affiliated with Symantec, an IT security company. I was hoping to address the "broken link" tag on the page: List of mergers and acquisitions by Symantec. The former FA article has 70+ broken links to Thomson Reuters reports on alacrastore.com. I have searched the website and the internet and found no other way to access those reports. Additionally, the citation templates are already marked with "dead-url=yes," making the primary link in the citation the working archived version. Based on the instructions on this page, does this mean everything is in order and there is nothing additional to do to address the tag? I already corrected all of the other broken links on the page. CorporateM (Talk) 14:08, 14 April 2017 (UTC)

The bot WaybackMedic added 44 archive links on 23 March, which mostly cleared up the problem; I don't see why the broken link tag is needed now. -- GreenC 15:11, 14 April 2017 (UTC)
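For readers unfamiliar with the citation markup discussed in this thread, here is a minimal sketch that assembles a {{cite web}} call carrying an archived copy. The `dead-url=yes` parameter is the one quoted above. The `archive-url` and `archive-date` names are the usual companions at the time of this discussion and should be checked against the current template documentation. The function name and the example.com URLs are placeholders invented for illustration.

```python
def cite_web_with_archive(url: str, title: str, archive_url: str,
                          archive_date: str, dead: bool = True) -> str:
    """Assemble a {{cite web}} template string that includes an archived copy."""
    params = [
        ("url", url),
        ("title", title),
        ("archive-url", archive_url),
        ("archive-date", archive_date),
        # Parameter name as quoted in the thread (2017); later template
        # versions may use a different name.
        ("dead-url", "yes" if dead else "no"),
    ]
    inner = " |".join(f"{name}={value}" for name, value in params)
    return "{{cite web |" + inner + "}}"


print(cite_web_with_archive(
    url="http://example.com/report",                      # placeholder
    title="Example report",                               # placeholder
    archive_url="https://web.archive.org/web/2017/http://example.com/report",
    archive_date="23 March 2017",
))
```

With `dead=True` the archived copy becomes the primary link in the rendered citation, which is the situation CorporateM describes.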

What is the right thing to do regarding 'broken links'?

I read Wikipedia often, and occasionally come across broken or 'dead' links. But what is the correct thing to do? Ignore it? Mention it on the talk page? "Add" a broken-link note to the article itself? (I am not a proper Wikipedia editor.) I read through the FAQ and searched for 'broken links', but there were no results matching the query. Thanks -- Preceding unsigned comment added by 79.76.99.144 (talk • contribs)

  • Generally: repair, tag, remove -- in that order. Repair if you can -- maybe the site's link syntax changed or there is an archived copy somewhere, such as the Internet Archive. If you cannot repair it, tag it with {{dead link}} and someone else might eventually get to it; we have a few bots too. If you've looked everywhere and it cannot ever be restored, then remove it (and hopefully replace it; many editors would not remove even a dead link without providing alternative sources). Mostly, you don't really have to do anything -- we have millions of dead links and, unless you actually fix them, they are individually not worth the effort of reporting beyond placing a dead link tag. A bot will also get to it eventually and repair or tag it. --  HELLKNOWZ  ▎TALK 12:00, 3 September 2017 (UTC)
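The "repair, tag, remove" order in the reply above can be written down as a tiny decision helper. This is only a sketch of the advice as stated: the function names and the two inputs are invented for illustration, and it is not how any bot works. The `date=` parameter on {{dead link}} is the usual way to record when a link was tagged.

```python
from datetime import date
from typing import Optional

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def dead_link_tag(today: date) -> str:
    """Build a {{dead link}} tag dated with the month and year of tagging."""
    return "{{dead link|date=%s %d}}" % (MONTHS[today.month - 1], today.year)


def triage(replacement_url: Optional[str], search_exhausted: bool) -> str:
    """Pick an action in the stated order: repair, then tag, then remove."""
    if replacement_url:        # a working or archived copy was found
        return "repair"
    if not search_exhausted:   # someone else, or a bot, may still find one
        return "tag"
    return "remove"            # looked everywhere and it cannot be restored


print(triage(None, search_exhausted=False), dead_link_tag(date(2017, 9, 3)))
```

For the casual reader who asked the question, the practical outcome is usually the middle branch: place the tag and move on.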

Using a tool to archive live links

When archiving references in an article, should ALL the references (live and dead) be archived, or only the dead ones? I raised this question at Wikipedia:Bots/Noticeboard#Archiving live links - Redux, and referenced an earlier discussion at Wikipedia:Bots/Noticeboard/Archive 11#Archiving links not dead - good idea?. I was advised that this Linkrot talk page might be an appropriate place to discuss it. Apparently the default setting of a tool like IABot v1.5.2 is to archive only the dead links, but some people are choosing the option to archive everything. This practice came to my attention with this edit to the article Barack Obama: someone using the IABot v1.5.2 archived 392 references, adding 74,894 bytes to the article, and increasing its already huge size by 22.6%, from 330,241 to 405,135 bytes. (The user reverted at my request.) Do people think this kind of outcome is a good thing? Should some kind of consensus be developed, as to when and whether to use the "rescue all" option? --MelanieN (talk) 15:07, 4 October 2017 (UTC)

On second thought I am going to post this question at Village Pump so as to get a wider readership and more input. --MelanieN (talk) 15:28, 4 October 2017 (UTC)
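The size figures quoted for the Barack Obama edit can be checked with a few lines of arithmetic. The variable names are arbitrary; the three input numbers are taken from the comment. The percentage comes out at about 22.68%, so the quoted 22.6% looks like a truncation rather than a rounding.

```python
size_before = 330_241   # bytes, as quoted
size_after = 405_135    # bytes, as quoted
references = 392        # references archived, as quoted

added = size_after - size_before
growth_percent = 100 * added / size_before
bytes_per_reference = added / references

print(added)                          # 74894, matching the quoted figure
print(round(growth_percent, 2))       # growth relative to the original size
print(round(bytes_per_reference, 1))  # average markup added per reference
```

The per-reference average, roughly 190 bytes, is about the length of an archive URL plus the extra template parameters, which is why archiving every live link adds up quickly on a heavily referenced article.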

The discussion is at Wikipedia:Village pump (miscellaneous)/Archive 56#Using a tool to archive live links. – Uanfala (talk) 13:39, 15 May 2018 (UTC)

Dead links taken over by domain grabbers & Co.

To whom it may concern: Most of the automatic dead link detection helpers recognize links such as http://www.bigshoegames.com/about-us.html as working properly, even though the original target page has been replaced with advertising by a domain grabber. Therefore, it would be really nice if those tools could detect such dead links not only by their HTTP status codes, but also by looking at the page content. A list of match patterns indicative of domain grabbers could be compiled and maintained, for example on-wiki, and, after manual review, synchronized to the various tools. It would probably be difficult to automatically and reliably determine the last "good" snapshot in the Wayback Machine, but marking links of this kind as needing maintenance would be a huge step forward. --Tim Landscheidt (talk) 08:35, 5 November 2017 (UTC)

It is a problem. In my experience writing a filter for domain squatters (I've tried it) they are endless in variety and name. I've discovered affected domains and could forward a list to user:cyberpower678 for IABot to mark dead, though I need to write some code first to pull the domains from the logs. This is part of the bigger issue of soft 404 links which is quite challenging. -- GreenC 15:12, 5 November 2017 (UTC)
In the end, all attempts are futile :-). I have seen several companies and organizations who redirect all dead links to their homepage, probably because someone told them that a 404 might upset the reader, or they just do not have the technical skills to set up proper error pages; that's when I turn to "Cool URIs don't change" for some voice of reason. But at Wikipedia scale, one pattern can match a lot of pages, so this might be worthwhile. --Tim Landscheidt (talk) 19:50, 5 November 2017 (UTC)
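To make the proposal in this thread concrete, here is a rough sketch of a content-aware check that looks beyond the HTTP status code. Everything specific in it is an assumption for illustration: the three regular expressions are invented examples, not a maintained on-wiki list, and the function takes already-fetched values rather than doing any networking. It also covers the "redirect every dead link to the homepage" case mentioned above.

```python
import re
from urllib.parse import urlsplit

# Hypothetical patterns; a real list would need manual review, as proposed above.
PARKED_PATTERNS = [re.compile(p, re.IGNORECASE) for p in (
    r"this domain (is|may be) for sale",
    r"buy this domain",
    r"domain (has )?expired",
)]


def classify(requested_url: str, final_url: str, status: int, body: str) -> str:
    """Classify a fetched link as 'dead', 'parked', 'homepage-redirect' or 'ok'."""
    if status >= 400:
        return "dead"                      # ordinary hard failure
    if any(p.search(body) for p in PARKED_PATTERNS):
        return "parked"                    # 200 OK, but squatter content
    wanted, got = urlsplit(requested_url), urlsplit(final_url)
    asked_for_subpage = wanted.path not in ("", "/")
    landed_on_homepage = got.path in ("", "/") and not got.query
    if asked_for_subpage and landed_on_homepage:
        return "homepage-redirect"         # soft 404 via redirect
    return "ok"


print(classify("http://www.bigshoegames.com/about-us.html",
               "http://www.bigshoegames.com/about-us.html",
               200, "<h1>Buy this domain</h1>"))
```

As GreenC notes, squatter pages are endless in variety, so a pattern list like this will always miss some; the output is better treated as "needs maintenance" than as a final verdict.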

I ran a program against a large dataset and it found about 76 domains that are web squatters (or former squatters now completely dead), and checking the IABot database most of them are already marked dead. I'm fixing them via the IABot interface, but the queues are backed up at the moment (only 5 at once per user).[1] Here's the list:

  • ahuero.com
  • anotherchance.es
  • bigdekalb.com
  • cooke.ws
  • curnonska.com
  • gabonnationalparks.org
  • kids.activedmonton.ca
  • www.activedmonton.ca
  • losespectaculos.tv
  • newsfix.ca
  • newwritinginternational.com
  • newyorknewstoday.com
  • oldwebsite.paralympic.org
  • paralympic.netempire.de
  • payrent.co.uk
  • pfcberkut.ru
  • usautotrails.com
  • verusx.net
  • www.airlineupdate.com
  • www.animacor.com
  • www.apacheness.com
  • www.artsandantique.net
  • www.basketpedya.com
  • www.bndr-mali.org
  • www.buddyhollyonline.com
  • www.chriswhitleydiscography.com
  • www.clannad.org.uk
  • www.claytoday.biz
  • www.comeonboro.com
  • www.detlefmauss.de
  • www.encuentroartesescenicas.com
  • www.fil-amboxers.com
  • www.flfa2010.com
  • www.floridaparks.com
  • www.foundryclimbing.com
  • www.giulianacesariniproart.com
  • www.health7800.com
  • www.hot-iron.co.uk
  • www.hwy56.com
  • www.indyinsiders.com
  • www.iraklis-fc.gr
  • www.kinema2cinema.com
  • www.lbpapalvisit.org
  • www.libertinesecurity.com
  • www.lincolnunitedfc.co.uk
  • www.luckyshow.org
  • www.marioyepes.co
  • www.mercatorgold.com
  • www.mojvikend.info
  • www.multiracialheritageweek.com
  • www.neftchifc.com
  • www.netspinners.co.uk
  • www.nfsbih.net
  • www.nick-kelly.com
  • www.oktoberfest.ca
  • www.olivercromwell.org
  • www.pgxnews.org
  • www.philadelphiabrassdrumcorps.org
  • www.pinrepair.com
  • www.rachelbillington.com
  • www.rlwc08.com
  • www.saintpatrickskilsyth.org.uk
  • www.saints-alive.co.uk
  • www.shipstontennis.org.uk
  • www.simpsonwatch.com
  • www.stroudsfitness.net
  • www.theforgottenimp.co.uk
  • www.thirdfridaywine.com
  • www.timesnews.co.ke
  • www.tonicbooks.com
  • www.usautotrails.com
  • www.versussleep.net
  • www.walbergwatch.com
  • www.webhosting.info
  • www.welsh-canoeing.org.uk
  • www.xblb.com

It's not a complete list but probably represents a fair portion of the total. -- GreenC 20:30, 5 November 2017 (UTC)
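A list like the one above can be applied to article links with a simple host lookup. The sketch below is an illustration only and says nothing about how IABot stores or matches domains. It copies a handful of entries from the list and matches the host exactly, since the list itself distinguishes between www and non-www entries (for example, both usautotrails.com and www.usautotrails.com appear). The paths in the sample URLs are invented.

```python
from urllib.parse import urlsplit

# A few entries copied from the list above; the full list would be loaded similarly.
SQUATTED_HOSTS = {
    "bigdekalb.com",
    "kids.activedmonton.ca",
    "www.activedmonton.ca",
    "usautotrails.com",
    "www.usautotrails.com",
}


def is_listed(url: str, listed=SQUATTED_HOSTS) -> bool:
    """Return True if the URL's host exactly matches a listed host."""
    host = urlsplit(url).hostname or ""   # hostname is already lowercased
    return host in listed


for link in ("http://WWW.UsAutoTrails.com/some-page.html",   # invented path
             "http://activedmonton.ca/"):                    # bare domain is not listed
    print(link, is_listed(link))
```

Exact matching is the conservative choice: it avoids marking a subdomain dead just because a sibling host was taken over.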

Announce: RfC: Nonbinding advisory RfC concerning financial support for The Internet Archive

Wikipedia:Village pump (miscellaneous)/Archive 57#RfC: Nonbinding advisory RfC concerning financial support for The Internet Archive --Guy Macon (talk) 12:13, 22 December 2017 (UTC)

Overhaul

This page has gotten somewhat long and verbose and I'm afraid most people don't read it. This has happened over the years due to changing conditions and the nature of Wikipedia where everyone edits. I'd like to overhaul the page and trim it down so the important stuff is clearly presented. Right now it's a mix of information points and a tutorial for newbies. It does neither very well. A tutorial can be made in a separate document while keeping this one a source of important information for editors about the various ways archiving is currently being done by automated systems and manually. -- GreenC 16:55, 12 January 2018 (UTC)

Sounds reasonable, but the devil is in the details. Might I suggest writing up a draft of the overhaul on a subpage of your user page and asking for comments/corrections before going live with it? --Guy Macon (talk) 19:11, 12 January 2018 (UTC)

Listing of site exclusions from archive.org

It would be useful, but likely unmaintainable, to have some listing of sites which are not possible to archive via archive.org due to, for instance, robots.txt or generic exclusion. For instance, it appears that eWeek is 'excluded' from the Wayback Machine, but can be archived in archive.is. Thoughts? --User:Ceyockey (talk to me) 12:50, 14 April 2018 (UTC)

It is constantly changing on a per-website basis. My bot WP:WAYBACKMEDIC is able to auto-detect when a link is excluded and look for alternatives like archive.is, but it doesn't operate on a per-domain basis. I could run the bot on all pages that contain an eWeek URL. -- GreenC 14:28, 14 April 2018 (UTC)
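One of the exclusion causes mentioned in this thread, robots.txt, can at least be checked locally. The sketch below uses Python's standard `urllib.robotparser` on robots.txt text that has already been downloaded. The crawler name `ia_archiver` is an assumption about which user agent a site might block, and the sample rules are invented. It does not cover owner-requested ("generic") exclusions and is not how WaybackMedic detects them.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, url: str, agent: str = "ia_archiver") -> bool:
    """Check whether `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Invented rules: block one crawler entirely, allow everyone else.
sample_rules = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow:
"""

print(allowed_by_robots(sample_rules, "http://example.com/article"))
print(allowed_by_robots(sample_rules, "http://example.com/article", agent="otherbot"))
```

Because the rules change per site and over time, as noted above, a result from a check like this is only a snapshot and would go stale quickly in any maintained list.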