Talk:Web archiving

From Wikipedia, the free encyclopedia
This article is of interest to the following WikiProjects:
WikiProject Libraries (Rated Start-class, Low-importance)
This article is within the scope of WikiProject Libraries, a collaborative effort to improve the coverage of Libraries on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as Start-Class on the project's quality scale.
This article has been rated as Low-importance on the project's importance scale.

WikiProject Internet (Rated Start-class, Low-importance)
This article is within the scope of WikiProject Internet, a collaborative effort to improve the coverage of the internet on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as Start-Class on the project's quality scale.
This article has been rated as Low-importance on the project's importance scale.

WikiProject United States (Rated Start-class, Low-importance)
This article is within the scope of WikiProject United States, a collaborative effort to improve the coverage of topics relating to the United States of America on Wikipedia. If you would like to participate, please visit the project page, where you can join the ongoing discussions.
This article has been rated as Start-Class on the project's quality scale.
This article has been rated as Low-importance on the project's importance scale.

WikiProject Library of Congress (Rated Start-class, Low-importance)
This article is within the scope of WikiProject Library of Congress, a collaborative effort to improve the coverage of the Library of Congress on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as Start-Class on the quality scale.
This article has been rated as Low-importance on the importance scale.


Query: Archiving webpages produced by database queries.

It currently seems difficult to arrange for an archival copy of a webpage that is produced as the result of a database query. The issue comes up, for example, at the Internet Movie Database, when one makes a query for all films involving two collaborators; the page that is produced is not readily archived by on-demand services such as WebCite. This leads to two questions: does anyone know of a solution to the problem, and has anyone written about the problem so that it can be noted in the present article? Easchiff (talk) 20:46, 18 January 2009 (UTC)

A page has to have a URL of its own in order to be archived. If it doesn't, you obviously can't post or cite the link per se, much less archive it. If it's a short page without much information, one solution is sometimes to copy and paste the information somewhere, perhaps in a subpage of an article's or user's Talk page, if you just want to preserve the information for somewhat temporary future reference. But the thing is, if a webpage doesn't have its own URL, then it likely isn't anything that would be used on Wikipedia as a citation or External Link anyway. Softlavender (talk) 09:48, 18 July 2009 (UTC)

Archive blocking

Blocking the archival of TOS and privacy policies seems notable to me. Any thoughts on whether the reasons in the edit summary of this edit make it meritorious? It's mine and was just undone. --Elvey (talk) 21:24, 28 June 2010 (UTC)


No answer; will attempt a compromise edit. (Is this another (less interesting) example of archive blocking: http://forums.wireless.att.com/user/viewprofilepage/user-id/2343207 vs http://www.webcitation.org/query?url=http%3A%2F%2Fforums.wireless.att.com%2Fuser%2Fviewprofilepage%2Fuser-id%2F2343207&date=2010-07-03 ? Forbidden(403) is not the same as Page Not Found(404) but I suppose this could be a WebCite bug. http://forums.wireless.att.com/t5/user/viewprofilepage/user-id/2343207 works; I suspect this is irrelevant, but only WebCite staff has the access necessary to really answer this one.) --Elvey (talk) 18:20, 3 July 2010 (UTC)
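For readers comparing the two links above, the following is a minimal sketch of how the WebCite lookup URL in that comment is formed: the target URL is percent-encoded and passed as the `url` parameter, with the snapshot date as `date`. The endpoint pattern is copied from the URL quoted in the comment; the function names are hypothetical, no network request is made, and nothing here assumes the service still answers at that address. The second helper only looks up the standard reason phrases for the two status codes being contrasted.

```python
from http import HTTPStatus
from urllib.parse import urlencode

# Endpoint taken from the URL quoted in the comment above (an assumption
# about the service, not a documented API).
WEBCITE_QUERY = "http://www.webcitation.org/query"


def webcite_query_url(target_url: str, date: str) -> str:
    """Build a WebCite-style lookup URL for target_url as of date (YYYY-MM-DD)."""
    # urlencode percent-encodes ':' and '/' so the whole target URL
    # travels as a single query-parameter value.
    return WEBCITE_QUERY + "?" + urlencode({"url": target_url, "date": date})


def reason_phrase(status_code: int) -> str:
    """Return the standard HTTP reason phrase for a status code."""
    return HTTPStatus(status_code).phrase


if __name__ == "__main__":
    print(webcite_query_url(
        "http://forums.wireless.att.com/user/viewprofilepage/user-id/2343207",
        "2010-07-03",
    ))
    # 403 means the server refused the request; 404 means nothing was found.
    print(403, reason_phrase(403))
    print(404, reason_phrase(404))
```

Run against the forum URL from the comment, the first helper reproduces the lookup link quoted there, which suggests the link itself was well formed and that the 403 came from one of the servers involved.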


It seems more folks are preventing or blocking the archival of TOS and privacy policies. I just tried to archive the Merrill Lynch Brokerage Website Terms and Conditions as of June 18, 2010 (the date they were last changed). Not only was I unsuccessful, but the attempt triggered the locking of my account! I spent over 40 minutes on the phone getting the account unlocked. They also helped me navigate to a PDF of the terms and conditions, but they block webcite.org from archiving it; here's the archive attempt, which also shows the full URL: http://www.webcitation.org/5stWIeKa4 . They'll look into it and get back to me; it'll be interesting to see if anything changes. It's not archived by Google (http://www.google.com/search?q=site:ml.com+%22brokerage+website+terms+and+conditions%22). I'm trying to add the URL to Google's index, and I just successfully added it to the list of URLs Google intends to crawl...someday (http://www.google.com/addurl/). I will be very surprised if Google archives it, as that requires ML to treat Google differently from WebCite, and for Google to get around to doing the crawling, and to choose to index and archive the PDF. Merrill is happy to serve, and WebCite is happy to archive, other PDFs, e.g.: http://www.webcitation.org/www.ml.com/media/86941.pdf The website's search feature only finds the T&C if the search is done by a logged-in user; the result is hidden when the search is done otherwise. --Elvey (talk) 22:48, 20 September 2010 (UTC)

Web archiving#On Demand

Aside from marketing jargon, the commercial services are functionally identical. Some are on-demand, some offer scheduled backup services. IMHO they should all be listed with identical common terminology. --Lexein (talk) 19:33, 7 July 2010 (UTC)

BackupURL.com Blacklisted?

Why is BackupURL.com blacklisted? Is it merely because it gets cited often in references?

I looked at the blacklist page, but it's pretty confusing:

http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Spam/LinkReports/backupurl.com

As I understand it, all backupurl.com is, is a web archive. Sempi (talk) 04:58, 1 November 2011 (UTC)

WARC Tools

Following the link in the article, it appears that WARC Tools has been bought out by Symantec? I don't see any source code or downloads listed anymore, except for .pdf files. Sempi (talk) 05:45, 1 November 2011 (UTC)

Search Tool Forbidden Site

In this section, the Search Tool of Google Code is listed, but it cannot be accessed. --Tito Dutta (Send me a message) 22:57, 29 January 2012 (UTC)

Big list of enterprise and subscription services

Do we need the Web_archiving#Enterprise_and_subscription_services section?

It is a big list of enterprise services, which are very expensive (for example, a PageFreezer subscription costs $50.000/year) and have no version open for public use.

Perma.CC and Wikipedia

One question: perma.cc requires that a link be used in a published journal and verified before being stored permanently. Will it "verify" links being cited in Wikipedia articles if submitted? We don't want to lose the content of one of the best open source "journals" in the world. Mdawn (talk) 16:55, 29 September 2013 (UTC)

Transactional Archiving of Remote Sites

Why is this stated to be impossible? All you need is a remote-controlled browser that cannot bypass the intercepting proxy. See www.icanprove.com.

91.12.26.8 (talk) 11:18, 26 January 2014 (UTC)
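To make the proposal above concrete, here is a minimal sketch of the record-keeping side only: what an intercepting proxy might log for each request/response pair so that one can later show a given body was served at a given URL. It does not implement the proxy or the remote-controlled browser, it is not based on how www.icanprove.com works, and every name in it (`Transaction`, `TransactionLog`, `record`, `was_served`) is hypothetical.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class Transaction:
    """One observed request/response pair (illustrative fields only)."""
    url: str
    status: int
    body_sha256: str
    recorded_at: str


@dataclass
class TransactionLog:
    """Append-only list of transactions seen by a hypothetical proxy."""
    entries: List[Transaction] = field(default_factory=list)

    def record(self, url: str, status: int, body: bytes,
               now: Optional[datetime] = None) -> Transaction:
        # Store a digest of the body rather than the body itself; a real
        # archive would also keep the payload (for example in a WARC file).
        now = now or datetime.now(timezone.utc)
        entry = Transaction(url, status,
                            hashlib.sha256(body).hexdigest(),
                            now.isoformat())
        self.entries.append(entry)
        return entry

    def was_served(self, url: str, body: bytes) -> bool:
        # True if exactly this body was ever recorded for this URL.
        digest = hashlib.sha256(body).hexdigest()
        return any(e.url == url and e.body_sha256 == digest
                   for e in self.entries)


if __name__ == "__main__":
    log = TransactionLog()
    log.record("http://example.com/terms", 200, b"Terms v1")
    print(log.was_served("http://example.com/terms", b"Terms v1"))
    print(log.was_served("http://example.com/terms", b"Terms v2"))
```

A log like this is only as trustworthy as the guarantee that the browser could not bypass the proxy, which is the condition the comment above already names.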