Wikipedia:Requests for comment/Archive.is RFC


Recent events related to archive.is have left Wikipedia's links to that service in a state that requires a community decision.

Background

"Archive.is" is a website that functions similarly to the more established Wayback Machine: Both provide an archiving service whereby snapshots of web pages across the internet are saved in a vast repository. In case archived pages become unavailable at their original locations, or their content is removed or changed, these archive services provide a static backup of each page, each of which can be linked to with presumably more assurance that their content will remain online and intact. Compared to Wayback Machine, which is much older and established, Archive.is is a newer competing service. Wikipedia articles have commonly used links to Wayback Machine's version of web pages for use in their references in order to combat link rot.

A bot called RotlinkBot, created by User:Rotlink, has recently begun linking Wikipedia articles to the new Archive.is service. This bot was not approved, and was therefore subsequently blocked.

Following this block, the bot was used in an anonymous operation using IPs from three different Indian states, Italy, Hong Kong, Vietnam, Bulgaria, Qatar, Latvia, Hungary, Slovakia, Romania, Brazil, Argentina, Portugal, Spain, France, Mexico, Austria, and South Africa, raising strong suspicions that the IPs were not being used legally. These IPs, and User:Rotlink, self-identified as the owner of archive.is, were subsequently blocked. Rotlink has not commented on any of the blocks.

Over 10,000 links to archive.is remain on Wikipedia.

Points to consider

Archive.is is a relatively young archiving service.
No one has found any problems with the quality of archived links. So far as anyone can determine, archive.is is presenting an accurate record of all material it claims to archive.
In this discussion, User:Rotlink identifies himself as the owner of archive.is.
Rotlink wrote User:RotlinkBot, a bot which created links. It was unapproved, and blocked because of unapproved operation. Again, the bot seemed to operate reasonably well: minor defects were noted, but nothing serious. The motivation for the block was the unapproved operation.
RotlinkBot did not exclusively add links to archive.is: it added links to other archiving sites as well, and apparently in preference to archive.is in some cases.
On September 3, 2013, 94.155.181.118 began inserting links to archive.is, as well as links to other archive sites. This appears to be RotlinkBot running anonymously.
By September 17, 2013, the list of IPs inserting such links had grown. It included at least the following:
- 188.217.203.245
- 61.15.46.216
- 27.3.85.26
- 84.43.147.53
- 176.202.105.40
- 117.239.64.166
- 117.223.161.182
- 87.110.16.100
- 89.132.64.81
- 78.98.25.91
- 89.34.75.123
- 189.34.9.60
- 186.19.57.19
- 188.251.236.114
- 87.223.115.147
- 83.157.124.218
- 90.163.51.63
- 85.66.241.59
- 89.36.214.186
- 95.168.56.11
- 117.215.1.168
- 187.208.150.144
- 78.142.126.177
- 105.236.16.88
- 41.228.51.25
- 89.228.46.37
- 60.50.51.210
- 190.57.181.70
- 62.63.132.36
- 122.178.159.163
- 178.79.34.86
- 109.175.88.133
- 190.244.69.154
This list of IPs included three different Indian states, Italy, Hong Kong, Vietnam, Bulgaria, Qatar, Latvia, Hungary, Slovakia, Romania, Brazil, Argentina, Portugal, Spain, France, Mexico, Austria, and South Africa.
Based on that pattern of IPs, User:Kww concluded that not only was RotlinkBot being used anonymously in violation of its block, but that the IPs being used were likely to be anonymous proxies or a similar form of botnet. He blocked Rotlink, all of the IPs, and a few more IPs that were discovered later.
He called for edits by the IPs to be rolled back at WP:ANI: https://en.wikipedia.org/w/index.php?title=Wikipedia:Administrators%27_noticeboard/Incidents&oldid=573791554#Mass_rollbacks_required
Many editors and admins reverted.
At this point, over 10,000 links to archive.is remain in Wikipedia.
At this point, User:Kww has no firm proof of illegal activity, although he remains of the opinion that this is likely.
User:Rotlink has made no comment in respect to his block.

The current situation is awkward. It's impractical to place the link on the spam blacklist, because the spam blacklist will interfere with editing any of the articles that contain a link to archive.is. It seems strange to have so many links, but to claim that no more links can be added. Several editors view the rollbacks themselves as destructive. We need to figure out how to go forward.

There would appear to be several options.

Options

1. Remain where we are

Wikipedia is notoriously inconsistent, and this is just one more case. There's no need for blacklisting, no need to remove the existing links, and no need to restore links that were removed due to the improper bot use.

Support Per my comment in the discussion section below. I put a lot of work into changing over broken links to archive.is links when pinkpaper.com went offline. I don't see why my hard work should be undone because of someone else's misbehaviour. If someone spammed links to BBC News all over Wikipedia, and we found it was a BBC News employee, that wouldn't change the fact that BBC News is a useful and reliable source. Behaviour issues on the part of this unapproved bot operator doesn't change the fact that archive.is remains a useful service that ensures a fair number of references are actually verifiable. —Tom Morris (talk) 12:52, 21 September 2013 (UTC)[reply]
Support I added many archive.is links to snooker-related articles, because they can't be found on any other archiving service due to nobot restrictions. Armbrust ^{The Homunculus} 16:45, 21 September 2013 (UTC)[reply]

2. Revert the reversions

Since no one has found a problem with the existing links that were reverted, the reverted links should be restored.

I'm going to park myself here, although I do think those added after Rotlink's indef by IPs shouldn't be reinstated. Kww seems to be going around in circles with whether they want the site blacklisted or not, and are jumping to conclusions that simply have no foundation whatsoever. Lukeno94 (tell Luke off here) 12:51, 21 September 2013 (UTC)[reply]

3. Complete removal of archive.is

We should write a bot which searches for all links to archive.is, replacing them when possible, and removing them when not. When this bot is complete, archive.is should be placed on the blacklist.

I prefer this option. It is based primarily on my belief that the IPs were not being used legally. This makes me distrust the motives of archive.is, and suspicious that we are being set up as the victim of a Trojan Horse: once the links to archive.is are established, those links can be rerouted to anywhere. If illegal means were used to create the links, why should we trust the links to remain safe?—Kww(talk) 15:57, 20 September 2013 (UTC)[reply]
Support this option as second choice.--v/r - T P 21:01, 20 September 2013 (UTC)[reply]
Support with no prejudice to human readdition - do not blacklist the link. A bot cannot determine whether the link is appropriate, but if a human editor does, he should be free to add it. ~Charmlet ^-talk- 21:45, 20 September 2013 (UTC)[reply]
Support. This is a new and uncertain operation, and there are serious questions about its ethics and stability given what has happened. When the operation has been around long enough to show it is trustworthy, we can reconsider using it then. But as it stands, we should remove it completely and flag it as questionable so as to save editors from working on creating links to it, which later either break down if the operation closes, or lead to adverts as the owner indicates might happen. The Wikipedia article on the operation itself is currently at AfD, with six delete comments and one keep: Wikipedia:Articles for deletion/Archive.is. The company may have been using Wikipedia to make themselves known, and to pave the way for the site owner to make a profit. It is not our purpose to promote or advertise any company. Alexa shows that Wikipedia is the website's fourth largest direct supplier, and an indirect supplier via mirrors and Google searches. The website is a start up that is relying on Wikipedia to build traffic. The owner has indicated that ads may appear after 2014. We should wait until the operation has proved itself before setting up thousands of links to what may become an advert site. SilkTork ^{✔Tea time} 09:30, 22 September 2013 (UTC)[reply]
Support per SilkTork (moved from option 4). And after reading the FAQ, it seems apparent that this is a one-man operation. Combined with the possibility of ads in the future, and the evidence we have on his ethics (which make me doubt this will even be a viable ad-free service for as long as promised), we should clean this up while it's still somewhat manageable and wait to see what archive.is becomes before allowing our articles to become reliant on it (as otherwise we could end up with an even tougher problem to deal with). equazcion ^(talk) 09:48, 22 Sep 2013 (UTC)
According to a response on the website there are two people running the operation. So it was either the owner who has been inappropriately using Wikipedia or the owner's partner. Either way, not a good show. SilkTork ^{✔Tea time} 09:58, 22 September 2013 (UTC)[reply]
Support for the time being. Although I've encouraged Rotlink to follow procedure at every step and promptly addressed his BRFA, it is clear that the likelihood of an ulterior motive is very high, as he has circumvented our processes at every step once he realized they would take time. And there is absolutely no reason to be in such a hurry to add massive numbers of links to one's website unless one really wants to drive traffic to it. I know this is speculation, and I'd like to be proven wrong. But as long as this is a 1-man operation, has no proof of financial safety, doesn't follow robots.txt, the owner is this impatient to add links and doesn't respond, uses anonymous proxies to further add links, and there are no guarantees that the website won't suddenly start serving ads, I cannot endorse this archival service. Per SilkTork, the ethics and stability are too uncertain. This service first has to prove it is well-meant, reliable, and open -- two of which are already under significant doubt. For example, we have Webcite as a perfect alternative and every link rightfully archived at archive.is could have been archived at Webcite. — HELLKNOWZ ▎TALK 13:58, 22 September 2013 (UTC)
If you bring up the comparison with WebCite, there are more points to consider.
- Supporting robots.txt, which was designed for crawlers, is not relevant to on-demand archives. It prevents them from archiving pages from many sites.
- It is WebCite that is 1-man enterprise experiencing financial problems which could only escalate after moving to expensive Amazon EC2 cloud hosting^[1]. 77.110.134.11 (talk) 14:53, 22 September 2013 (UTC)[reply]
Oppose. I think the arguments of the supporters are very emotional. They appeal to ethics and try to predict the future. My vision of the future is:
- The reversion will be mass-scale vandalism.
- The editors will scream like User:Lexein. Most of them do not read ANI and RFC and do not take part in this discussion. But they get notified about the changes in their articles.
- Many sources are available only on archive.is (see the ANI discussion for examples). The editors will have to circumvent the ban of the domain. Do you know how they do it for currently banned domains?
  - By using Google Cache. This will result in "nice" URLs http://webcache.google.com/___&q=http://archive.is/http://webcache.google.com/___. This URL complies with the new rules but it is fragile.
  - By using WebCite. This will result in pages with HTML hosted on WebCite and images still hosted on Archive.is. This is fragile to downtime of either service.
- Assuming that the bot was seeking traffic, it can also circumvent the domain ban using Google Cache or WebCite. Both keep JavaScript on archived pages, and the script can redirect traffic anywhere. I would say it is even easier to steal traffic this way. 193.86.243.17 (talk) 07:49, 23 September 2013 (UTC)
I'm confused: are you saying that users would be right (or wrong) to oppose (like me) the punitive mass reversion of archive.is links? And how is anything I wrote "screaming"? Uncool. And I do read both ANI and RFC, so, wth? --Lexein (talk) 13:07, 24 September 2013 (UTC)[reply]
It is neither right nor wrong. I think a lot of editors who think like you are not interested in reading ANI and RFC. A lot of editors (actually, admins) who think like Kww are. If #3 won consensus, it would only be as a result of that bias. The editors who think like you will learn about the decision when their articles get touched by the reversion edits. They will argue and ask wtf. The admins will answer, "There was an RFC, and if you did not read it, that is your problem. Now we have a solution by consensus and only have to implement it. It is too late to discuss." 88.15.83.61 (talk) 16:02, 24 September 2013 (UTC)

Sorry if the word "screaming" offended you. I picked it up from the ANI topic without thinking about how rude it sounds. Sorry again. 88.15.83.61 (talk) 16:02, 24 September 2013 (UTC)

"Vandalism" is a deliberate attempt to compromise Wikipedia. A mass revert in good faith is not vandalism, since the aim is to improve Wikipedia, regardless of whether the result does so.

Circumventing Wikipedia policy is pointy as well as being against policy. If some editors do that, we can deal with it as it becomes a problem, but I don't think we should make up hypothetical problems to stop us doing things that are a good idea.

—me_and 16:51, 23 September 2013 (UTC)[reply]
I meant the following case (193.86.243.17 and I are the same person; the first IP is airport wifi): I would like to edit an article, to fix a typo or something minor. But when I try to save the page it is not possible, because the page has a link to a banned domain (my edit has nothing to do with the link). Maybe you have admin rights and have never hit this case, but it is very common. The editor has a choice: not to save the edit, to remove the link, to link it via bit.ly... oops, it is banned as well... then to link it via WebCite. A lot of links to WebCite, archive.is and Google Cache are there not because the original links are dead, but because they are banned. 88.15.83.61 (talk) 20:05, 23 September 2013 (UTC)
Oppose I support verifiability and protection against link rot. I oppose the assumption of bad faith against the operator(s) of archive.is by a large number of editors and administrators here. The operator(s) of archive.is have stated that advertising is very unlikely, because its operation is cheap, and funded by income from other projects. The quality of archive.is content is high, in general better than both archive.org and webcitation.org. Up until the alleged bot operations, archive.is was only an asset to Wikipedia. Its archive is still an asset. Wikipedia's Five Pillars call for building an encyclopedia with verifiable content, based on reliable sources. IMHO archive.is contributes to that, and every link to archive.is should be maintained, as long as the archive link doesn't remove the original broken link. --Lexein (talk) 13:07, 24 September 2013 (UTC)[reply]
Support Rotlink (and subsequently Archive.is and RotlinkBot) have burned a great amount of good faith from within the community. Rotlink was caught running an unauthorized bot and made some effort to get it approved. Rotlink withdrew the request for approval on the bot task. When it was discovered that a great many IP addresses were adding archive.is links in the same way that RotlinkBot was, there was cause for blocking on the grounds of suspicion that the bot had been distributed to a wide collection of sites (possibly mirrors for Archive.is?) and started up. That no explanation has been forthcoming is indicative (in my mind) that Rotlink knows they were caught with their hand in the cookie jar, and is trying to weasel their way out of accepting responsibility. Rotlink has shown an interest in furthering their nascent archiving service over the expressed viewpoint of Wikipedia. Therefore it is incumbent on Wikipedia to divest itself of this archiving service until it becomes a standard accepted elsewhere and we receive an accounting of Rotlink's actions and how they will resolve disputes such as this in the future. Hasteur (talk) 13:29, 24 September 2013 (UTC)
Strong Oppose as a stupid overreaction that damages the hard work numerous editors, including me, will have done. Anyone who has ever bothered to look for archives should know just how hard it can be to find archives of some links. And the questions over the future of archive.is are irrelevant; we're not about to advocate the removal of WebCite links, yet that has a MUCH less clear future. Lukeno94 (tell Luke off here) 15:34, 24 September 2013 (UTC)[reply]
Support Remove all archive.is additions and links, irrespective of who or what added them. Remove all references to archive.is. Apply a scorched earth policy to make it absolutely clear to everyone that setting up an archiving service, archiving hundreds of thousands of URLs mentioned in Wikipedia references and then adding those archive details to the references will not be tolerated. Tens of millions of Wikipedia references do not link to an archive copy. Increasing that figure by a few hundred thousand makes no material difference to the overall number. Additionally, WebCite seems to be in financial trouble. They could easily add adverts to fund their site at any time. Wikipedia should remove all WebCite archive links long before this happens. Apply the same scorched earth policy and teach these people a lesson. Make it clear that setting up an archive service, archiving hundreds of thousands of pages and then expecting hundreds of thousands of free links from Wikipedia is always going to be doomed to failure because the project rejects all such offers of 'help'. Fundamentally, there's more to this. There's no need to ever link to an archive copy of anything. Most material currently being archived will be of no interest to anyone in a hundred years' time. Truly interesting things stick around. By archiving hundreds of thousands of Wikipedia references, the "natural selection process" is being usurped, with huge quantities of trivia being preserved that should not be. - 91.84.105.112 (talk) 16:08, 24 September 2013 (UTC)
91IP Please assume good faith on the actions of others (as I assume you'd want good faith assumed on yours). We are not supposed to enable/reward blocked editors ever. Hasteur (talk) 16:43, 24 September 2013 (UTC)[reply]
(NB: 91IP is not me; my vote is above.) It is questionable which of the solutions can be called "rewarding a blocked editor". I would say it is #3, as its consequences will draw a lot of attention to the archiving problem in general and to archive.is in particular. Ill fame is also promotional. 88.15.83.61 (talk) 17:40, 24 September 2013 (UTC)
I'm strongly concerned that the 91IP is simply here just to make a point. Lukeno94 (tell Luke off here) 18:19, 24 September 2013 (UTC)[reply]

3a. Allow dead links to remain permanently in Wikipedia. Change archive.is links back to dead links to the original content. Let users find archived copies by themselves if they can.

It is not traditionally the business of an encyclopaedia to help readers to obtain out-of-print references, or their modern equivalents, as far as I know. Hypothetically, various third-party apps and third-party websites could choose to shoulder the legal risks, if any, of presenting modified versions of Wikipedia articles with archive links added. Wikipedia's content licensing allows this. Editors would be free from legal risks and would have more free time to add actual content to the encyclopaedia.--greenrd (talk) 19:39, 23 September 2013 (UTC)[reply]

Support as proposer primarily on the grounds of freeing up editor time. Automation is good - even if someone else is doing it.--greenrd (talk) 19:39, 23 September 2013 (UTC)[reply]
Comment. A lot of links to archive.is, archive.org, WebCite and Google Cache are not dead. They are from domains banned in Wikipedia. That's why the editors had to use an archiving service. Reverting such links means using banned domains. This will prevent the articles from being edited further. 88.15.83.61 (talk) 19:48, 23 September 2013 (UTC)
When hosts which formerly hosted RS content die, frequently their domain-name-squatted replacements host malware; this is a good reason for them to become blacklisted. Also correctly blacklisted are notoriously unreliable sources which host only user-generated or copyright-violation content. But for the first, I will link to an archive of a URL, from a time when the content was valid. I'm stating this only to point out that archives are not typically used to maliciously bypass the blacklist, it's to link to an archive of an actual reliable source. When I find spot archive links to bad sources (blacklisted), I mark them as {{dubious}} or remove them entirely and tag the claim {{citation needed}}. --Lexein (talk) 13:07, 24 September 2013 (UTC)[reply]
Oppose. Dead links are anathema. Verifiability is important. Wikipedia exists in an internet/web world: citing a source which is reachable by a URL, then letting that URL go dead with no archive of it, is wrong at several levels. Reading WP:LINKROT will help here. --Lexein (talk) 13:07, 24 September 2013 (UTC)[reply]
So is circumventing the community consensus. Want to guess which one is more of an anathema? Hasteur (talk) 13:36, 24 September 2013 (UTC)[reply]
Strong Oppose - What on earth are you on about Greenrd? This doesn't free up editor's time at all; in fact, it wastes it by ruining hard work, and will cause multiple GAs and FAs to fail various bits of their criteria. Utterly stupid idea. Lukeno94 (tell Luke off here) 15:32, 24 September 2013 (UTC)[reply]

4. Replace bot-added archive.is links where possible, leave human-added links intact

We should replace links to archive.is that were added by the bot, where possible. Where no replacement is available, the links should be left in place. Links added by human editors should be left in place as well.

The circumstances surrounding these links leave me uneasy about leaving them alone (a startup trying to establish itself by automatically spreading its links across Wikipedia, use of proxies, unapproved bot by unresponsive entrepreneur). However I'm wary of cutting off our nose to spite our face -- if they have the only viable links to the content we need for a substantial number of references, leave the links alone in those cases. But in situations where there is a replacement available at a different, reputable service, those links should be switched over. Links added by people should also be left alone, however, as editors should be allowed to link to whichever service they want. The pervasiveness of the bot-added links establish a possible artificial trust among editors who see them that I think warrants undoing. equazcion ^(talk) 16:03, 21 Sep 2013 (UTC) (moved to complete removal)

This is a sensible option; the easiest way (albeit not a foolproof way) is to simply nuke those added by IPs. Lukeno94 (tell Luke off here) 18:47, 21 September 2013 (UTC)[reply]
Support, as long as archive.is remains ad-free and there remains no evidence that the archives are not faithful renditions of the original sites. NE Ent
Support: I don't see enough evidence to support reverting legitimate editors' work, but I also believe that we should stop unauthorized bots from being able to edit Wikipedia even when their edits are ostensibly positive. —me_and 09:03, 23 September 2013 (UTC)[reply]

5. Contact Rotlink off-wiki and get them to seek approval of their bot

Contact Rotlink off-wiki (using email perhaps) and encourage them to follow the community's process for bot approval so the bot can operate within policy.

Support as proposer and first choice.--v/r - T P 21:03, 20 September 2013 (UTC)[reply]
Support in addition to number 3. I don't support a bot for what I believe should be human judgement, but regardless, I think he should be allowed to use the community processes for approval if the community so wishes. ~Charmlet ^-talk- 21:45, 20 September 2013 (UTC)[reply]
Support a bot which would focus on, within Wikipedia citations, archiving and/or finding alternate URLs for notoriously ephemeral, doomed sources such as Google cache, AP (or better, news sites hosting AP articles), and publications which regularly purge old articles for no obvious reason, such as the various Murdoch media outlets. Such a primary focus would fill a gap not served by any existing bot that I know of. --Lexein (talk) 13:19, 24 September 2013 (UTC)[reply]

6. Copy Archive.is and WebCite content to Wikimedia-controlled server until it is too late

It is only 10 TB (Archive.is) and 2 TB (WebCite). A $500 question: 3 × $165 for 4 TB HDDs. 193.86.243.17 (talk) 07:49, 23 September 2013 (UTC)

x3 for redundancy. Not that that makes it exorbitant or anything.--v/r - T P 13:29, 23 September 2013 (UTC)[reply]

It's never that cheap or that easy. Setting up and maintaining such a service requires more than simply the disk space. In any case, there are discussions about doing this for WebCite at meta:WebCite. —me_and 16:59, 23 September 2013 (UTC)

Support. Do not host the archive files. Only copy and keep. If WebCite would go down, give the files to archive.is or archive.org and ask them to host them. If archive.is would go down give the files to WebCite or archive.org. 88.15.83.61 (talk) 19:42, 23 September 2013 (UTC)[reply]
Oppose Wikimedia is not in the business of providing hosting for "worthy" projects. Let Meta handle the discussion. If Wikimedia picks up the option we'll get a great fanfare of trumpets to announce this new option for all wikis. Hasteur (talk) 13:39, 24 September 2013 (UTC)[reply]

Discussion

I'm very concerned about the idea of completely removing all archive.is links, even those added by actual editors. I know of several editors who switched from WebCite to Archive.is when WebCite's future ability to archive came into question, myself being one of them. WebCite does say existing archives won't go away, but I find it hard to trust this in the long term. I'm unaware of any on-demand services other than WebCite and Archive.is at this point, so if in a year WebCite goes away and Archive.is is no longer trusted, where does this leave us? To be blunt, we're probably back to the idea of either trying to take over WebCite ourselves, or providing some funding, or...something along those lines. I know that copyright/non-free content is an issue for the Foundation, but a solution needs to be found for the long run, not simply what's convenient right now. (Sorry for rambling...) — Huntster (t @ c) 07:58, 21 September 2013 (UTC)

In case Kww's proposal does win out, there is also web.archive.org - a separate website. It's my preferred archiving site :) Lukeno94 (tell Luke off here) 20:05, 21 September 2013 (UTC)

I know about Archive.org, but it is not "on demand". My whole point was that without the two above-mentioned sites, we won't have access to on demand services, which are needed for archiving a specific instance of a site. — Huntster (t @ c) 20:54, 21 September 2013 (UTC)[reply]

Web.archive.org has an "archive now" function, but the archived page is only made available much much later: officially 3-6 months, anecdotally 2 weeks to a year. --Lexein (talk) 19:11, 24 September 2013 (UTC)[reply]

I think the "remove all archive.is links" option is incredibly stupid. By all means, remove all links added by IPs if you really have to do this, but really... Fuck knows why you proposed this; you may as well blacklist it if you want this solution! Also, the option I want isn't there: where we reinstate all of the archive.is links added before Rotlink's indefinite block (or those inserted before their bot was indeffed), and only remove those added by IPs after their indef. Lukeno94 (tell Luke off here) 10:29, 21 September 2013 (UTC)[reply]

Please reread my comment: I explained myself. I believe the owner of the site to have engaged in illegal activity, and therefore do not trust him, his site, or his future intentions. The bot was indefed on August 18, so all links created by the IPs above were placed in defiance of a block.—Kww(talk) 16:47, 21 September 2013 (UTC)[reply]

Again, you have made that claim, but there's no real evidence to prove it; whatever happened to "innocent until proven guilty" anyway? You may not trust the site, but it is clear that several longstanding editors - including myself - do, and still trust it. Your proposal to nuke everything flies in the face of a LOT of work by legitimate editors, particularly as archive.is has often been the only accessible archive for a given page. Lukeno94 (tell Luke off here) 18:45, 21 September 2013 (UTC)[reply]

How do you think he had legal access to IPs in such a wide range of countries, Lukeno94?—Kww(talk) 19:09, 21 September 2013 (UTC)

Considering I know absolutely nothing about how VPNs and proxies work, I don't know what is legitimate and what isn't. But the actions of one person, regardless of who they are, shouldn't result in lots of other people having their hard work undone (as it can be VERY hard to find a working archive for a link sometimes...) Lukeno94 (tell Luke off here) 20:04, 21 September 2013 (UTC)
Kww, I see you presume it should be obvious to anyone that it was done illegally. I'm reasonably familiar with how proxies work and I'm not sure I understand your reasoning. If I wanted to set up proxies in several different countries I'm fairly certain I could do it legally. Could you explain what you believe to have occurred here that was illegal, and what leads you to think that? I'm asking honestly, not necessarily out of doubt. You may be more knowledgeable in these things than I am. equazcion ^(talk) 20:17, 21 Sep 2013 (UTC)

First is Occam's razor: what would prompt anyone to actually go to the expense of negotiating individual proxy hosts in places ranging from Qatar to Brazil to Vietnam? Second is the nature of the IPs: they aren't webhosts and servers. Instead, they are individual IPs on ADSL networks, FTTH networks, cable modems, etc. Everything about the setup screams "botnet". If it was a legitimate proxy arrangement, I would expect to see webhosts and servers hosted in a small number of countries with good internet access.—Kww(talk) 00:27, 22 September 2013 (UTC)

So you think this is a network of compromised computers (just for those who might not know what botnet refers to). That does make sense, and thanks for explaining. equazcion ^(talk) 00:41, 22 Sep 2013 (UTC)
What would prompt anyone to actually go to the expense of a legal or illegal setup instead of simply googling "proxylist"? 77.111.172.172 (talk) 09:10, 22 September 2013 (UTC)

I think known open and advertised proxies tend to be preemptively blocked from editing. equazcion ^(talk) 09:18, 22 Sep 2013 (UTC)

This can explain why there are so few webhosts and contiguous blocks of IPs. They were already blocked from editing. I can imagine another simple way to get proxies. We are talking about a site owner, right? Then he/she can see the access logs of the site. There are usually a lot of hits from malicious security scanners (looking for SQL injections, etc). Those IPs are proxies and can be connected back to and reused. Setting up one's own proxy infrastructure looks too expensive. 77.111.172.172 (talk) 09:53, 22 September 2013 (UTC)

Those proxylists frequently (not always, but frequently) contain compromised computers as well. It's a common vector for virus and malware distribution.—Kww(talk) 14:42, 22 September 2013 (UTC)

Citation needed. Although I see you took my suggestion to pose an RFC, I'm seeing a raft of pointy supposition, gossip, handwaving, and assumption of bad faith, exaggerated with purple prose like "spambot" and "botnet", deliberately spreading fear, uncertainty and doubt about a resource you've personally decided to dislike and campaign against, despite showing little knowledge of the suspect service. You still don't know if it was Rotlink who actually did any archive.is additions after the block, or if it was anyone at archive.is at all, or someone else who helped out. Same IP? Ask a Checkuser. Otherwise, this is all rather weak koolaid, which I won't be drinking. --Lexein (talk) 19:11, 24 September 2013 (UTC)

I'm very concerned about the removal of archive.is links too. A while back, I tried to fix all the references that use the now offline site pinkpaper.com, the website of the Pink Paper, one of the UK's main LGBT news sources. Between archive.org and archive.is, I managed to find replacements for some but not all of the references used. The LGBT topic area tends to be filled with a lot of poorly sourced material, especially around BLP subjects. Removing archive.is links is likely to leave a lot of those links broken. I don't really know what's going on with the IP and the non-approved bot account, but I'd rather not have all the hard work I put into fixing PinkPaper links removed just because of somebody else's behaviour. And I'm not keen on having BLP articles on sexuality-related topics potentially left without sources. This seems self-defeating. Whatever the problem is, please can you seek a calmer, less dramatic solution than removing all the links to a useful archival service. —Tom Morris (talk) 12:48, 21 September 2013 (UTC)

Thought/idea: would it be possible to wrap the archive.is links up in an external links template? (similar to {{IMDb title}} etc) That way, if the site goes hinky in the future, all the links could quickly be disabled, minimizing negative fallout. Siawase (talk) 12:03, 22 September 2013 (UTC)
- This is a good idea. Wouldn't it be better to back up its content to a Wikipedia server? If the site goes hinky in the future, all the links could be changed to something like archiveis.wikimedia.org instead of disabling the links and hitting the verifiability issue. 77.110.134.11 (talk) 12:15, 22 September 2013 (UTC)
- My concern with this is that a template would be seen as tacit approval of these links, which I think we're a long way from having. I know I would see the use of such a template as implying the community considers these links to be A Good Thing, particularly if Archive.is had such a template while other archiving services didn't. —me_and 09:06, 23 September 2013 (UTC)

The introduction of this RfC misses the fact that User:Rotlink (user, not bot) himself added a lot of links [1] between having his bot blocked, withdrawing his BRFA and until this was pointed out to him [2]. — HELLKNOWZ ▎TALK 13:35, 22 September 2013 (UTC)
Do we even have any proof that Rotlink is the owner of the website, and isn't just claiming to be? I'm still disgusted that the actions of one person could lead to the reversion of a shedload of good edits by legitimate editors; regardless of whatever position they hold. Frankly, the age of an archiving site is utterly irrelevant; if it does go under, or if it does end up with adverts, then THAT is the time to propose its removal. Seems like several people have forgotten about WP:CRYSTAL... Lukeno94 (tell Luke off here) 14:37, 22 September 2013 (UTC)
- Lexein says "I have communicated with the owner"[3] and that Rotlink is the owner[4]. NebY (talk) 18:21, 22 September 2013 (UTC)

At the risk of looking like a total prat, that isn't convincing. Lexein has communicated with Rotlink, who claims that they are the owner. Lexein's usage of words doesn't confirm or disprove the claim. I'd like to see something rather more solid before we jump to conclusions about whether to include archive.is links or not. Also, the presence of a Wikipedia article, and the reliability and/or notability of it, has precisely nothing to do with whether we use an archiving site or not; bringing that up is unnecessary and deliberately inflammatory. Lukeno94 (tell Luke off here) 19:27, 22 September 2013 (UTC)

better diff. — HELLKNOWZ ▎TALK 19:37, 22 September 2013 (UTC)
CRYSTAL applies to determining article existence and content. A modicum of speculation isn't unreasonable when it comes to technical concerns, which this can easily become if an archive site becomes widely relied upon by articles and then becomes unviable. equazcion ^(talk) 19:43, 22 Sep 2013 (UTC)

The possibility of WebCite going under was, and perhaps still is, very real. Did that mean we blanket removed every single link? No, it didn't. The diff Hellknowz linked shows a lot of technical knowledge, but it's fairly generic stuff that anyone who goes and looks things up could come out with. Given that it is nearly a year old though, it appears probable, if not certain in my mind, that Rotlink is an owner, or employee. It does not, however, confirm he is the only owner; nor should it matter one iota if a website is owned by one guy, two guys, or a consortium. Lukeno94 (tell Luke off here) 19:55, 22 September 2013 (UTC)

Luke, Lexein has emailed the owner and has later confirmed that the owner is Rotlink. I was not trying to "bring up" that article (you surely know about it already) and am disturbed that you think it "deliberately inflammatory" to try to help answer the question you asked, "Do we have any proof...", by referencing another editor's research. NebY (talk) 22:20, 22 September 2013 (UTC)

You've misread my comment - I was referring to the earlier mention of the AfD on an article about archive.is; that was irrelevant and deliberately inflammatory. Not any response to my comment. Lukeno94 (tell Luke off here) 22:28, 22 September 2013 (UTC)

WebCite is no longer accepting submissions like they used to. I tried it today. They rejected my target page with a false summary claiming that my email address was incorrect. Archive.is accepted my submission with no problem. Poeticbent talk 21:01, 22 September 2013 (UTC)

I just archived a page fine; perhaps your e-mail address was invalid, e.g. because of a stray character. — HELLKNOWZ ▎TALK 23:02, 22 September 2013 (UTC)

Must've been a temporary outage. I tried it yesterday and got the same error message. Good to know though that it's back up again for the time being. De728631 (talk) 15:23, 23 September 2013 (UTC)

Other Archive.is users. I was 2 minutes too late to add this info to Wikipedia:Articles_for_deletion/Archive.is, into the list of notable sources, which consisted of a single item. I just did an "archive.is" search on majesticseo.com (pro account required) and found out that, besides Wikipedia, archive.is is used by WikiLeaks^[2], Verso Books^[3], Lenta.ru^[4], The Guardian^[5], Channel Register^[6], Reuters^[7], RTVE^[8], The Atlantic Wire^[9], Time (magazine)^[10], Badische Zeitung^[11], Blueseed^[12], MTV Sweden^[13], Chicago Reader^[14], The Huffington Post^[15], Público (Portugal)^[16] and many other less notable web sites and blogs. 88.15.83.61 (talk) 19:19, 24 September 2013 (UTC)

References

[1] WebCite#Fundraising

[2] https://twitter.com/wikileaks/status/368886340089171968

[3] https://twitter.com/VersoBooks/status/382175504607887360

[4] http://lenta.ru/articles/2013/02/22/fromharvardwithlove/

[5] http://www.theguardian.com/environment/climate-consensus-97-per-cent/2013/sep/16/climate-change-contrarians-5-stages-denial

[6] http://www.channelregister.co.uk/2013/09/16/bill_gate_again_world_richest_and_richest_in_us_for_the_20th_straight_year/

[7] http://blogs.reuters.com/felix-salmon/2013/01/

[8] http://blog.rtve.es/distritolatino/2013/01/index.html

[9] http://www.theatlanticwire.com/entertainment/2013/09/utero-20-what-were-saying-now-and-what-we-said-then/69397/

[10] http://business.time.com/2012/10/01/how-the-maker-movement-plans-to-transform-the-u-s-economy/

[11] http://www.badische-zeitung.de/gemuese-kooperative-wirft-afd-kandidatin-fein-raus

[12] http://blueseed.co/faq/

[13] http://www.mtv.se/musikvideo_artist/1519-p-nk

[14] http://www.chicagoreader.com/Bleader/archives/whoa

[15] http://www.huffingtonpost.com/2013/07/02/andrew-mason-album_n_3534367.html

[16] http://www.publico.pt/mundo/noticia/wikileaks-exige-demissao-de-jornalista-da-time-que-sugeriu-assassinar-assange-1603394
