Wikipedia:Requests for comment/Archive.is RFC

See also Wikipedia:Archive.is RFC 2 (closed), Wikipedia:Archive.is RFC 3 (closed) and Wikipedia:Archive.is RFC 4 (closed)

The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.


Recent events related to archive.is have left Wikipedia's links to that service in a state that requires a community decision.

Background

"Archive.is" is a website that functions similarly to the more established Wayback Machine: both provide an archiving service in which snapshots of web pages across the internet are saved in a vast repository. If archived pages become unavailable at their original locations, or their content is removed or changed, these services provide a static backup of each page, which can be linked to with presumably more assurance that the content will remain online and intact. Archive.is is the newer, competing service; the Wayback Machine is much older. Wikipedia articles have commonly used links to the Wayback Machine's copies of web pages in their references to combat link rot.

A bot called RotlinkBot, created by User:Rotlink, has recently begun linking Wikipedia articles to the new Archive.is service. This bot was not approved, and was therefore subsequently blocked.

Following this block, the bot was used in an anonymous operation using IPs from three different Indian states, Italy, Hong Kong, Vietnam, Bulgaria, Qatar, Latvia, Hungary, Slovakia, Romania, Brazil, Argentina, Portugal, Spain, France, Mexico, Austria, and South Africa, raising strong suspicions that the IPs were not being used legally. These IPs, and User:Rotlink, self-identified as the owner of archive.is, were subsequently blocked. Rotlink has not commented on any of the blocks.

Over 10,000 links to archive.is remain on Wikipedia.

Points to consider

  1. Archive.is is a relatively young archiving service.
  2. No one has found any problems with the quality of archived links. So far as anyone can determine, archive.is is presenting an accurate record of all material it claims to archive.
  3. In this discussion, User:Rotlink identifies himself as the owner of archive.is.
  4. Rotlink wrote User:RotlinkBot, a bot which created links. It was unapproved, and blocked because of unapproved operation. Again, the bot seemed to operate reasonably well: minor defects were noted, but nothing serious. The motivation for the block was the unapproved operation.
  5. RotlinkBot did not exclusively add links to archive.is: it added links to other archiving sites as well, and apparently in preference to archive.is in some cases.
  6. On September 3, 2013, 94.155.181.118 began inserting links to archive.is, as well as links to other archive sites. This appears to be RotlinkBot running anonymously.
  7. By September 17, 2013, the list of IPs inserting these links had grown. It included at least the following:
  8. This list of IPs included three different Indian states, Italy, Hong Kong, Vietnam, Bulgaria, Qatar, Latvia, Hungary, Slovakia, Romania, Brazil, Argentina, Portugal, Spain, France, Mexico, Austria, and South Africa.
  9. Based on that pattern of IPs, User:Kww concluded that not only was RotlinkBot being used anonymously in violation of its block, but that the IPs being used were likely to be anonymous proxies or a similar form of botnet. He blocked Rotlink, all of the IPs, and a few more IPs that were discovered later.
  10. He called for edits by the IPs to be rolled back at WP:ANI: https://en.wikipedia.org/w/index.php?title=Wikipedia:Administrators%27_noticeboard/Incidents&oldid=573791554#Mass_rollbacks_required
  11. Many editors and admins reverted.
  12. At this point, over 10,000 links to archive.is remain in Wikipedia.
  13. At this point, User:Kww has no firm proof of illegal activity, although he remains of the opinion that this is likely.
  14. User:Rotlink has made no comment in respect to his block.
  15. A second attack occurred on Oct 2, using the following proxy list:

The current situation is awkward. It's impractical to place the link on the spam blacklist, because the spam blacklist will interfere with editing any of the articles that contain a link to archive.is. It seems strange to have so many links, but to claim that no more links can be added. Several editors view the rollbacks themselves as destructive. We need to figure out how to go forward.

There would appear to be several options.

Options

1. Remain where we are

Wikipedia is notoriously inconsistent, and this is just one more case. There's no need for blacklisting, no need to remove the existing links, and no need to restore links that were removed due to the improper bot use.

  1. Support Per my comment in the discussion section below. I put a lot of work into changing over broken links to archive.is links when pinkpaper.com went offline. I don't see why my hard work should be undone because of someone else's misbehaviour. If someone spammed links to BBC News all over Wikipedia, and we found it was a BBC News employee, that wouldn't change the fact that BBC News is a useful and reliable source. Behaviour issues on the part of this unapproved bot operator doesn't change the fact that archive.is remains a useful service that ensures a fair number of references are actually verifiable. —Tom Morris (talk) 12:52, 21 September 2013 (UTC)[reply]
  2. Support I added many archive.is links to snooker-related articles, because they can't be found on any other archiving service due to nobot restrictions. Armbrust The Homunculus 16:45, 21 September 2013 (UTC)[reply]
  3. Support. It's a sticky situation, but I see no need for knee-jerk reactions such as using a bot to remove all links. Don't fight fire with fire. (sorry for the cliché...) — This, that and the other (talk) 01:55, 27 September 2013 (UTC)[reply]
  4. Support I take 'remain where we are' as not excluding things like options 4 and 5, both of which I think are potentially good ideas. Alternatively, 2 and then 4, but I'm hesitant about voting for someone else to do a bunch of extra work. If this bot can be made to be compliant with good practices, then great. If people want to make sure more established archival sites are used in preference to archive.is until it becomes sufficiently well established, well, I think that goes without saying. Qalnor (talk) 19:37, 29 September 2013 (UTC)[reply]

2. Revert the reversions

Since no one has found a problem with the existing links that were reverted, the reverted links should be restored.

  1. I'm going to park myself here, although I do think those added after Rotlink's indef by IPs shouldn't be reinstated. Kww seems to be going around in circles with whether they want the site blacklisted or not, and are jumping to conclusions that simply have no foundation whatsoever. Lukeno94 (tell Luke off here) 12:51, 21 September 2013 (UTC)[reply]
    I've taken a look at some of the bot's edits, and except for the 'illegal placing', I see nothing technically wrong with the edits. Wholesale removal of all the archive links inserted/updated based on some speculation or expectation of future (unsubstantiated) ethical violation seems to be a rather pointy thing to do. We've blocked the bot, and disabled some of the proxies, and that should be enough action for now. -- Ohc ¡digame!¿que pasa? 04:30, 30 September 2013 (UTC)[reply]

3. Complete removal of archive.is

Section to edit for "Complete removal" comments

We should write a bot which searches for all links to archive.is, replacing them when possible, and removing them when not. When this bot is complete, archive.is should be placed on the blacklist.
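
For illustration only, the replace-or-remove pass described above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a bot anyone in this discussion wrote or approved: the URL pattern, the function name `rewrite_archive_links`, and the `replacements` mapping (archive.is URL to an equivalent snapshot on another archive) are all hypothetical.

```python
import re

# Matches bare archive.is URLs. The pattern is an illustrative assumption,
# not the rule used by any real bot.
ARCHIVE_IS_URL = re.compile(r"https?://(?:www\.)?archive\.is/[^\s|\]}<]+")


def rewrite_archive_links(wikitext, replacements):
    """Replace each archive.is URL with a known alternative, else remove it.

    `replacements` is a hypothetical mapping from archive.is URLs to
    equivalent snapshots held by another archiving service.
    Returns the rewritten text and the list of URLs that were removed
    because no replacement was available.
    """
    removed = []

    def substitute(match):
        url = match.group(0)
        if url in replacements:
            return replacements[url]
        removed.append(url)  # no alternative known: drop the link
        return ""

    return ARCHIVE_IS_URL.sub(substitute, wikitext), removed
```

A real pass would also have to clean up the citation parameters left empty by a removal, which is where the human review several commenters ask for would come in.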

  1. I prefer this option. It is based primarily on my belief that the IPs were not being used legally. This makes me distrust the motives of archive.is, and suspicious that we are being set up as the victim of a Trojan Horse: once the links to archive.is are established, those links can be rerouted to anywhere. If illegal means were used to create the links, why should we trust the links to remain safe?—Kww(talk) 15:57, 20 September 2013 (UTC)[reply]
  2. Support this option as second choice.--v/r - TP 21:01, 20 September 2013 (UTC)[reply]
  3. Support with no prejudice to human readdition - do not blacklist the link. A bot cannot determine whether the link is appropriate, but if a human editor does, he should be free to add it. ~Charmlet -talk- 21:45, 20 September 2013 (UTC)[reply]
  4. Support. This is a new and uncertain operation, and there are serious questions about its ethics and stability given what has happened. When the operation has been around long enough to show it is trustworthy, we can reconsider using it then. But as it stands, we should remove it completely and flag it as questionable so as to save editors from working on creating links to it, which later either break down if the operation closes, or lead to adverts as the owner indicates might happen. The Wikipedia article on the operation itself is currently at AfD, with six delete comments and one keep: Wikipedia:Articles for deletion/Archive.is. The company may have been using Wikipedia to make themselves known, and to pave the way for the site owner to make a profit. It is not our purpose to promote or advertise any company. Alexa shows that Wikipedia is the website's fourth largest direct supplier, and an indirect supplier via mirrors and Google searches. The website is a start up that is relying on Wikipedia to build traffic. The owner has indicated that ads may appear after 2014. We should wait until the operation has proved itself before setting up thousands of links to what may become an advert site. SilkTork ✔Tea time 09:30, 22 September 2013 (UTC)[reply]
    Let me dispute your assertions point by point. You used the words "uncertain operation", when in fact it has high quality archives, in HTML/CSS (mal-Javascript free) and image form for each page archived. Its acquisition of a page is quite certain and reliable because it is not a crawler (it only archives a single page for any given citation), and it, like WebCite, is not subject to the vagaries of robots.txt (web.archive.org's required Achilles' heel). It offers DMCA-based content removal for those website owners who do not wish their content archived, and since it only archives one page of a site upon request, site owners do not have an onerous task requesting thousands of removals; just (likely) one. Its uptime has been 100% as far as I have been able to determine. No, archive.is has been quite certain. You seem to have forgotten the long archive.org outage of a couple of years ago, when it apparently stopped archiving pages for no publicly disclosed reason. So archive.is is more "certain" than web.archive.org, so far, certainly in terms of memorable outages. And there have been Webcitation.org outages, and a threatened cessation of new archiving this year due to funding problems. So archive.is is more "certain" than that. 2. Linkrot stops for no one. At that time of the Wayback Machine's long dark "pause", let's call it, as I saw links in citations of RS, including web-only RS disappear, I was alarmed. There were a few alternate archiving services, several have been listed at WP:LINKROT at various times; I tried to use all of them, but six months, or a year or two later they were gone, or went to a subscription model. Archive.is has now lasted IMHO long enough (9 full months) to merit a measure of respect and forebearance of minor transgressions (which only occurred, I think, this month). And yes, the recent events have been minor, and, I contend, in service of the Five Pillars, with no obvious commercial intent. 3. 
Commercial intent including advertising: Recall how the Wayback Machine works. Alexa Internet archives interesting pages, as measured by its browser plugin and other means, as well as crawling the web, for its commercial, paying clients. Alexa, several months later, releases that archived content to web.archive.org. In other words, the archive is the handmaiden to the commercial service. If you object strongly to any hint of an organization being funded, then you would logically stop supporting the use of web.archive.org links. 4. Notability: You assert that Archive.is went to AfD. Ok, so it was nominated without notifying any other interested editors (like recent editors of the article) per WP:AFD#, and was deleted due to alleged non-notability. That's not a reason not to use the service, and so is moot. WebCite itself was non-notable, and we used it. Now it's notable. So what? Observe the article WebCite - is that really notable enough for an article? Observe that aside from one NYT mention, the cited sources are all primary, or cowritten by its founder, Eysenbach. I'm not advocating AfD, but hey, it's vulnerable. So are you then going to campaign for webcitation.org link removals? I wouldn't think so. 5. "The company may have been using Wikipedia to make themselves known" - this is bald conjecture. "May" cannot be a valid reason for removal of all links to it, because the benefit to Wikipedia completely outweighs the possible benefit to archive.is. We also have no evidence that archive.is is a "company" in the sense of a commercial venture at this time. Is it a company? We do not know that. Also, only deadlinks would result in an interested reader clicking on the blue "archive" link in the citation and seeing the archive.is page. Archive.is links in {{cite web}} appropriately only assert "deadurl=yes" for dead links. If it was a pure linkspam play as you fear and infer, the bot would have asserted |deadurl=yes for all filled-in cite web templates. 
Your argument to delete all archive.is links does not logically follow from the occasional link traffic entrained from readers clicking on citation links to verify them. Further, we have no evidence of the size of traffic outbound to archive.is. Basing such a scorched-earth action on so little evidence is not appropriate. 6. You write "It is not our purpose to promote or advertise any company. Alexa shows that Wikipedia is the website's fourth largest direct supplier, and an indirect supplier via mirrors and Google searches. " I suppose it's just ironic, but Hey, Look, Wikipedia directly advertises Alexa statistics on every website article we host. In fact Alexa shows that Wikipedia is the 7th largest source of referer traffic to Alexa.com. Alexa is a much larger commercial organization than archive.is. We don't have actual traffic numbers (thanks, Alexa!) so direct comparisons are floppy for now, but please don't play the holy card about directly linking to commercial sites. As for advertiser-supported sites, don't forget the genuflecting at film articles, in nearly every single review section, first citing and linking to Rottentomatoes.com and Metacritic.com, those oh-so-reliable sources (Fox News quotes them, so they must be reliable!). My point? 'Commercial' and 'ads' aren't problems; it's the value provided to Wikipedia that matters. The community thinks that those commercial sites add enough value to overlook the ridiculous blatant conflict-of-interest advertising on them. But there aren't ads at archive.is now , so ads, and the fear of ads, just can't be considered relevant now. 7. It's easier to ask forgiveness than permission. So, we should, lacking any proof of illegality, in fact, just go ahead and forgive the recent alleged bot transgressions without even being asked, because on balance, Wikipedia wins with archive.is, and loses without it as can be seen in dozens (hundreds) of dead links not held at archive.org or webcitation.org. My point? 
We should simply welcome any service which can stay reliable and freely accessible (preferably ad-free), and do everything possible to help that service comply with content and behavioral community standards, and keep trying indefinitely, because Wikipedia needs verifiability of web content, just as much as print, TV, and radio content. Publishing is publishing, and archiving of ephemeral, but reliable sources, is important. More important to me, than to you, apparently, but I do hope to convince you. --Lexein (talk) 11:20, 25 September 2013 (UTC)[reply]
    Followup: See note below about ads. --Lexein (talk) 08:18, 4 October 2013 (UTC)[reply]
    The problem is the people cannot fit the behaviour into any customary category. It does not look like usual editor's behaviour, does not look as usual spammer's behaviour, etc. They do not know what expect from it. They still cannot stop the bot fixing the dead links. They have angst.
    And as it is something unusual, the usual verbs cannot be applied to it. Can the ants from the outer space be "forgiven"? No. They can be only wiped out.
    It is an existential issue, not a technical one.
    That's why your brilliant arguments won't work. 95.225.130.13 (talk) 18:36, 25 September 2013 (UTC)[reply]
  5. Support per SilkTork (moved from option 4). And after reading the FAQ, it seems apparent that this is a one-man operation. Combined with the possibility of ads in the future, and the evidence we have on his ethics (which make me doubt this will even be a viable ad-free service for as long as promised), we should clean this up while it's still somewhat manageable and wait to see what archive.is becomes before allowing our articles to become reliant on it (as otherwise we could end up with an even tougher problem to deal with). equazcion (talk) 09:48, 22 Sep 2013 (UTC)
    According to a response on the website there are two people running the operation. So it was either the owner who has been inappropriately using Wikipedia or the owner's partner. Either way, not a good show. SilkTork ✔Tea time 09:58, 22 September 2013 (UTC)[reply]
  6. Support for the time being. Although I've encouraged Rotlink to follow procedure at every step and promptly addressed his BRFA, it is clear that the likelihood of an ulterior motive is very high, as he has circumvented our processes at every step once he realized they will take time. And there is absolutely no reason to be in such a hurry to add massive numbers of links to one's website unless one really wants to drive traffic to their website. I know this is speculation, and I'd like to be proven otherwise. But until this is a 1-man operation, has no financial safety proof, doesn't follow robots.txt, the owner is this impatient to add links and doesn't respond, uses anonymous proxies to further add links, and there are no guarantees that the website doesn't suddenly start serving ads, I cannot endorse this archival service. Per SilkTork, the ethics and stability are too uncertain. This service first has to prove it is well-meant, reliable, and open -- two of which are already under significant doubt. For example, we have Webcite as a perfect alternative and every link rightfully archived at archive.is could have been archived at Webcite. —  HELLKNOWZ  ▎TALK 13:58, 22 September 2013 (UTC)[reply]
    If you went to the comparison with WebCite, there are more points.
    • Sorry Hellknowz, it's blatantly unfair and false to imply that archive.is is any two of unreliable, not well-meant, or not open, without being specific. The facts are that:
    1) archive.is is quite reliable,
    2) there's no requirement for our archive sites to be "open", whatever you mean by that (neither archive.org nor webcitation is "open" by any definition), and
    3) archive.is itself is quite obviously well-meant, in spite of your dislike for the recent actions of some person who may be Rotlink.
    You've also fallen into the misunderstanding that an archive service has to honor robots.txt for single-page archiving. Archive.org chooses to honor it because it's a web crawler; this is fine per se, but the toxic side effect of this is that failed web sites can be hidden forever by new, unrelated owners of the old domain. This causes a total loss of archived sites, and is totally undesirable. We actually don't want to rely solely on robots.txt-honoring archives, because they pose a definite unstoppable risk to verifiable content we rely on. Archive.is is decidedly not a webcrawler, it's a Wikipedia crawler which then archives a single page from the web. Remember, robots.txt was all about crawlers, not archivers in general. Lastly, Webcitation is nowhere near a "perfect alternative" - it archives pages poorly, has had multi-day outages, doesn't archive whole-page images, does archive javascript which may be mal-javascript, and is threatening to stop archiving new content at any moment now; that's hardly "perfect" by anyone's definition. --Lexein (talk) 05:43, 29 September 2013 (UTC)[reply]
    To be fair, you have as much right saying the service is reliable as me saying it isn't, there is no reason to bring in accusations. I could as well say, "facts" are, 1) archive.is is quite unreliable 2) there is no support for archive sites to be "closed" and 3) archive.is intentions are in doubt despite my huge initial good faith towards the owner. Obviously, we disagree and it's our opinion. I've read all your comments here and elsewhere and if I change my opinion, I will amend my statement. I did very intentionally say "Support for the time being". —  HELLKNOWZ  ▎TALK 09:16, 29 September 2013 (UTC)[reply]
    You mean "Support for the time being", no? -- Ohc ¡digame!¿que pasa? 03:23, 30 September 2013 (UTC)[reply]
    Whoops, yes I did. —  HELLKNOWZ  ▎TALK 08:22, 1 October 2013 (UTC)[reply]
    • Hellknowz, you're quite wrong about opinion: you'll need to support your claim of unreliability with evidence. 1. When has archive.is ever been down, or wrong about an archived page? Its uptime percentage so far is better than both alternatives. 2. Without indicating what you mean about "open", you're spouting verbiage which is not supportable in fact. I reiterate, there's literally nothing in essay, guideline, or policy, or Talk pages that our archives must be "open", and to that point neither of the alternatives is "open" about finances, hardware, software, or personnel. In fact, archive.is is more "open" about its technology than the others. Here at WP, that which is not forbidden is permitted, because Wikipedia runs that way, not the other. So what "open", exactly, are you talking about? 3. Archive.is intentions re ads are irrelevant: we already deliberately link to ad-supported resources like many RS such as Huffington Post, newspapers, and others including Rotten Tomatoes and Metacritic. Be fair when slinging around words like "intentions": archive.is is not serving ads, and there's no certainty that it will. Seriously. --Lexein (talk) 19:25, 30 September 2013 (UTC)[reply]
  7. Oppose. I think the arguments of the supporters are very emotional. They appeal to ethics and try to predict the future. My vision of the future is:
    • The reversion will be mass-scale vandalism.
    • The editors will scream like User:Lexein. Most of them do not read ANI and RFC and do not take part in this discussion. But they get notified about the changes in their articles.
    • Many sources are available only on archive.is (see ANI discussion for examples). The editors will have to circumvent the ban of the domain. Do you know how they do it for currently banned domains?
    • Assuming that the bot was seeking traffic, it can also circumvent the domain ban using Google Cache or WebCite. Both keep JavaScript on archived pages and the script can redirect traffic anywhere. I would say, it is even easier to steal traffic this way. 193.86.243.17 (talk) 07:49, 23 September 2013 (UTC)[reply]
    I'm confused: are you saying that users would be right (or wrong) to oppose (like me) the punitive mass reversion of archive.is links? And how is anything I wrote "screaming"? Uncool. And I do read both ANI and RFC, so, wth? --Lexein (talk) 13:07, 24 September 2013 (UTC)[reply]
    It is neither right nor wrong. I think a lot of editors who think like you are not interested in reading ANI and RFC. A lot of editors (actually, admins) who think like Kww do. If #3 won the consensus, it would be only as the result of the bias. The editors who think like you will get to know about the decision when their articles get touched by the reversion edits. They will argue and ask wtf. The admins will answer "There was RFC and if you did not read it, it is your problem. Now we have a solution by consensus and only have to reify it. It is too late to discuss". 88.15.83.61 (talk) 16:02, 24 September 2013 (UTC)[reply]
    Sorry if the word "screaming" offended you. I picked it up from the ANI topic without thinking how rude it sounds. Sorry again. 88.15.83.61 (talk) 16:02, 24 September 2013 (UTC)[reply]
    "Vandalism" is a deliberate attempt to compromise Wikipedia. A mass revert in good faith is not vandalism, since the aim is to improve Wikipedia, regardless of whether the result does so.
    Circumventing Wikipedia policy is pointy as well as being against policy. If some editors do that, we can deal with it as it becomes a problem, but I don't think we should make up hypothetical problems to stop us doing things that are a good idea.
    me_and 16:51, 23 September 2013 (UTC)[reply]
    I meant the following case (193.86.243.17 and I are the same person; the first IP is airport wifi): I would like to edit an article, to fix a typo or something minor. But when I try to save the page it is not possible, because the page has a link to a banned domain (my edit has nothing to do with the link). Maybe you have admin rights and never hit this case, but it is very common. The editor has the choice: not to save his edit, to remove the link, to link it via bit.ly... oops, it is banned as well... then to link it via WebCite. A lot of links to WebCite, archive.is and Google Cache are there not because the original links are dead, but because they are banned. 88.15.83.61 (talk) 20:05, 23 September 2013 (UTC)[reply]
  8. Oppose I support verifiability and protection against link rot. I oppose the assumption of bad faith against the operator(s) of archive.is by a large number of editors and administrators here. The operator(s) of archive.is have stated that advertising is very unlikely, because its operation is cheap, and funded by income from other projects. The quality of archive.is content is high, in general better than both archive.org and webcitation.org. Up until the alleged bot operations, archive.is was only an asset to Wikipedia. Its archive is still an asset. Wikipedia's Five Pillars call for building an encyclopedia with verifiable content, based on reliable sources. IMHO archive.is contributes to that, and every link to archive.is should be maintained, as long as the archive link doesn't remove the original broken link. --Lexein (talk) 13:07, 24 September 2013 (UTC)[reply]
    Followup: this Archive.is blog entry shows no enthusiasm for ads: "I do not think it is correct (or even legal) to put ads on the pages created by others." and mentions other funding options, such as subscriptions for extra features. --Lexein (talk) 10:57, 3 October 2013 (UTC)[reply]
  9. Support Rotlink (and subsequently Archive.is and RotlinkBot) have burned a great amount of good faith from within the community. Rotlink was caught running an unauthorized bot and made some effort in trying to get it approved. Rotlink withdrew the request for approval on the bot task. When it was discovered that a great many IP addresses were adding archive.is links in the same way that RotlinkBot was, there was cause for blocking on the grounds for suspicion that the bot had been distributed to a wide collection of sites (possibly mirrors for Archive.is?) and started up. That no explanation has been forthcoming is indicative (in my mind) that Rotlink knows they were caught in the cookie jar, and are trying to weasel their way out of accepting responsibility. Rotlink has shown an interest in furthering their nascent archiving service over the expressed viewpoint of wikipedia. Therefore it is incumbent on wikipedia to divest itself of this Archiving service until it becomes a standard accepted elsewhere and we receive an accounting of Rotlink's actions and how they will resolve disputes such as this in the future. Hasteur (talk) 13:29, 24 September 2013 (UTC)[reply]
  10. Strong Oppose as a stupid overreaction that damages the hard work numerous editors, including me, will have done. Anyone who has ever bothered to look for archives should know just how hard it can be to find archives of some links. And the questions over the future of archive.is are irrelevant; we're not about to advocate the removal of WebCite links, yet that has a MUCH less clear future. Lukeno94 (tell Luke off here) 15:34, 24 September 2013 (UTC)[reply]
  11. Support Remove all archive.is additions and links, irrespective of who or what added them. Remove all references to archive.is. Apply a scorched earth policy to make it absolutely clear to everyone, that setting up an archiving service, archiving hundreds of thousands of URLs mentioned in Wikipedia references and then adding those archive details to the references will not be tolerated. Tens of millions of Wikipedia references do not link to an archive copy. Increasing that figure by a few hundred thousand makes no material difference to the overall number. Additionally, WebCite seems to be in financial trouble. They could easily add adverts to fund their site at any time. Wikipedia should remove all WebCite archive links long before this happens. Apply the same scorched earth policy and teach these people a lesson. Make it clear that setting up an archive service, archiving hundreds of thousands of pages and then expecting hundreds of thousands of free links from Wikipedia is always going to be doomed to failure because the project rejects all such offers of 'help'. Fundamentally, there's more to this. There's no need to ever link to an archive copy of anything. Most material currently being archived will be of no interest to anyone in a hundred years time. Truly interesting things stick around. By archiving hundreds of thousands of Wikipedia references, the "natural selection process" is being usurped with huge quantities of trivia being preserved that should not be. - 91.84.105.112 (talk) 16:08, 24 September 2013 (UTC)[reply]
    91IP Please assume good faith on the actions of others (as I assume you'd want good faith assumed on yours). We are not supposed to enable/reward blocked editors ever. Hasteur (talk) 16:43, 24 September 2013 (UTC)[reply]
    (NB:91IP is not me, my vote is above). It is questionable which one of the solutions can be called "reward blocked editor". I would say it is #3, as its consequences will draw big attention to the archiving problem in general and to archive.is in particular. Ill fame is also promotional. 88.15.83.61 (talk) 17:40, 24 September 2013 (UTC)[reply]
    I'm strongly concerned that the 91IP is simply here just to make a point. Lukeno94 (tell Luke off here) 18:19, 24 September 2013 (UTC)[reply]
    Oppose. Waste of time and even double waste of time considering that editors in good standing added some of those links. Someone not using his real name (talk) 05:21, 28 September 2013 (UTC)[reply]
  12. Support removal of all archive.is links now. After the repeated insertions of links by botnet(s), I have little faith in the ethics of the site owners. They could well turn their site into a malware dissemination tool. There is other corroborating evidence for low ethics like their choice of data storage ISPs and lack of respect for robots.txt (and thus the content/copyright owners' desires). Someone not using his real name (talk) 02:55, 5 October 2013 (UTC)[reply]
  13. Strong support but no opinion yet on blacklisting Archive.is – I can confirm that I have issued a DMCA against Archive.is for inappropriately copying a page off my roleplay website (source) and disrespecting the robots.txt protocol, which all robot visitors need to respect. I think we are dealing with automated plagiarism from top to bottom, and not those who merely use it to do something creative with it. --Marianian(talk) 02:24, 29 September 2013 (UTC)[reply]
    You are incorrect about robots.txt being required for all robot visitors. It is primarily intended for crawlers, which archive.is is most definitely not. It is quite incorrect to call it plagiarism, because archive.is is not using your creation for any purpose other than you intended, and it provides full attribution and credit to you and your website; plagiarists never do. That said, you are very welcome to issue DMCA takedowns for any of your pages you wish, but you should target the correct site. Sadly, you asked Google to do something about Archive.is; you need to a) click "report abuse" on the archive.is page, and b) file a DMCA notice to Archive.is itself for the 3 pages archived by it. --Lexein (talk) 05:43, 29 September 2013 (UTC)[reply]
    By the way, this doesn't look like a copyright violation on archive.is. The page is available under CC-BY-NC. The website archive.is is non-commercial, links to the text of CC-BY-NC and links to the list of contributors. This seems to be licence-compliant use. --Stefan2 (talk) 20:24, 3 October 2013 (UTC)[reply]
  14. Strong Oppose - This is a completely ridiculous notion. If archive.is is violating the law then the site will either be forced to comply with the law or be shut down and previously dead links will unfortunately be dead again. To say that we should prefer a more established archival site (archive.org) is perfectly reasonable. I see no problem with doing a bot-check to compare links and replacing archive.is links with archive.org links where possible, but I can't begin to understand why this suggestion is even on the table. If the bot is actually harmful, continue to disallow the use of the bot -- I would have to do more research on that subject to give an opinion -- but until someone has actual evidence that the pages are being modified after archival then the choice between good links and bad links is a trivial one. Qalnor (talk) 19:14, 29 September 2013 (UTC)[reply]
    Let me see if I understand your logic, Qalnor : if we have good reason to believe that the site owner used illegal techniques to insert links, we should allow those links to be spread throughout Wikipedia until he's convicted? Defending ourselves in advance is "completely ridiculous"?—Kww(talk) 20:27, 29 September 2013 (UTC)[reply]
    Point 13: "At this point, User:Kww has no firm proof of illegal activity, although he remains of the opinion that this is likely." How in the world this point even merits inclusion is a mystery to me since 'he used proxies in Albania therefore he is a terrorist' is logic fit only for the highest levels of government but the fact of the matter is that this RFC does not posit illegality. When and if it does I will feel the need to give the matter further thought and either say more than I have or change my vote; until that point, however, if your only refutation of my statement is 'b-b-b-but the law' then you will have to forgive me if I do neither. Qalnor (talk) 23:18, 29 September 2013 (UTC)[reply]
    You should take note that all people that are technically competent to evaluate the pattern of IPs that made these edits have come to the conclusion that illegal proxy use is a likely explanation. It's not "b-b-b-but the law" as you so dismissively put it, it's more like "why should we take the risk of having tens of thousands of links to a site run by someone that hacks into people's computers for his own ends?" Seems to me like there's a strong risk that he would feel free to hack into the computer of anyone that goes to his site if it suited his purposes.—Kww(talk) 23:27, 29 September 2013 (UTC)[reply]
    "You should take note that all people that are technically competent to evaluate the pattern of IPs that made these edits have come to the conclusion that illegal proxy use is a likely explanation." This phrase is incredibly slippery; it's high on innuendo and low on content. If you didn't intend for it to be so please provide whatever data you have if you think it's influential. Qalnor (talk) 23:49, 29 September 2013 (UTC)[reply]
    Not slippery at all. I've only seen two kinds of analysis: those by people that say they aren't competent to evaluate a pattern of IP addresses, and those that say the pattern of IP addresses looks suspicious. Can you show me a third? When an editor makes a comment like "Considering I know absolutely nothing about how VPNs and proxies work, I don't know what is legitimate and what isn't", I think it's pretty safe to not pay much attention to how he judges the level of risk. I invite you to examine the large list of IP addresses above and come up with a legal explanation that doesn't strain credibility.—Kww(talk) 00:13, 30 September 2013 (UTC)[reply]
    Not slippery at all eh? I'll break it down for you: First, in reference to: 'all people that are technically competent to evaluate the pattern of IPs' you haven't defined who these people are, what their credentials of technical competence are or cited the alleged analysis. Second, in reference to: 'have come to the conclusion that illegal proxy use is a likely explanation' is precisely equivalent to saying 'experts agree that a likely explanation of crop circles are ships flown by little green men'; even once we get past the question of the credentials of their expertise, claiming that something is a 'likely explanation' is completely empty of meaning because it isn't the same as saying 'most likely explanation'.
    So take those two sentences and combine them one possible meaning is: 'I, Kww, analyzed the data and I know what proxies are so I know what I'm talking about, and my conclusion is this: I think he done it.' Now, I didn't originally phrase it that way because I'm trying to be polite here; instead I asked you to simply clarify what I identified as a potentially slippery statement. If there is some analysis here that I'm missing by someone with network security credentials then I will either refute that evidence or accept that things have moved beyond the level of you insisting illegality without evidence. But until that happens I have no interest in engaging you in a debate on an assumption which I don't personally consider to be highly likely (and while I don't claim to be a network security expert I am familiar with the relevant issues including how proxies work). Qalnor (talk) 00:42, 30 September 2013 (UTC)[reply]
    And I'll be polite in return: if you don't care to analyze the data, your opinion on the data has no value. My statement may be "slippery", but it's at least well founded. Yours is the equivalent of "I'm going to say this idea is stupid without looking at any of the data or providing any analysis." There's no reason to pay any attention to statements like that.—Kww(talk) 00:54, 30 September 2013 (UTC)[reply]
    I am not the one making positive claims, you are; if you have any evidence backing your claim that the actions were actually illegal then please present it. I don't have anything else to say to you until you either: 1. Provide actual evidence which brings the question of legality beyond mere speculation as characterized in the RFC in Point of Consideration #13: "At this point, User:Kww has no firm proof of illegal activity, although he remains of the opinion that this is likely." or 2. You make an argument against my comments not contingent upon the assumption of illegality. Qalnor (talk) 02:00, 30 September 2013 (UTC)[reply]
    The list of IP addresses is sufficient to anyone that cares to analyze them before issuing an opinion.—Kww(talk) 02:19, 30 September 2013 (UTC)[reply]
    Yes, I've looked over the IP list. It could be that he's acquired access to those IPs illegally, that is one possibility. Other possibilities include things like: 1. TOR, 2. The proxies may have been sniffed by a third party and placed on a list of proxies onto the internet which he then used or 3. He may have legitimate access to accounts on various servers throughout the world due to business contacts. Each of which is more likely than your assumption. So like I said, actual evidence please. Qalnor (talk) 02:32, 30 September 2013 (UTC)[reply]
    None of the IPs fail TOR check, so that's out. Many of the IPs are residential DSL modems, so "legitimate business contacts" is out. Using a third-party proxy list that contains compromised machines is still illegal use of a compromised machine. Got any other alternatives besides the illegal ones?—Kww(talk) 04:14, 30 September 2013 (UTC)[reply]
    I'm willing to eliminate TOR on the basis that you've given because I'm not, offhand, aware of the turnaround time on IPs and how likely it is that someone would have that long of a list of nodes that was only temporary. If you actually manage to eliminate all other objections I may have to go back and look into how likely that is, but for the moment my gut reaction to what you say is that TOR is unlikely. That having been said, your claim that residential DSL modems = 'not legitimate business contacts' is ridiculous. I use a residential service for my home internet service, but I also rent a box. Now, I happen to rent a box from a rather large business, so I would be reluctant to do what I'm about to say, but if I rented a box from a smaller business and the owner of that business asked me if he could set up a secured proxy on my local linux box as a personal favor, I would probably do it, and if he did so in exchange for knocking off a couple bucks I owe him I'd do it in a twinkling. Concisely put, my point here is that there is absolutely nothing unlikely about him having a business contact in a residential area that would provide him with a shell login or just set up a proxy server for him. Next, sniffing is not an illegal compromise, and I've never seen a proxylist site which had login information for the proxies so we're not mixing cracking in. I'm just talking about him finding a list of unsecured proxies online, and those I do know go down pretty quickly so even if they aren't currently configured this way I still think it's more likely than your botnet theory.
    Finally, you asked if I had any other alternatives other than illegal ones. Well, you haven't actually refuted the small list I've given, but I've thought of a fun one I'm sure you'll appreciate. Let's imagine there's a person named Kww, and let's imagine that that person works for archive.org in their IT department and the proprietors at that establishment are aware that he has an admin account on Wikipedia. They ask Kww, for the good of archive.org to do his best to use everything at his disposal to destroy archive.is. In this hypothetical scenario Kww then acquires a bunch of proxies (legally or illegally, IDK, but you seem pretty sure they're illegal so I guess they're illegal) and makes it look like the bot was trying to continue its evil work of improving Wikipedia after it was stopped from doing so semi-legitimately (probably also by Kww but I haven't followed up that lead yet). Now, on the face of it this scenario is pretty ridiculous and unlikely. It's vaguely plausible, but doesn't constitute evidence by any measure except your own. So probably in addition to getting rid of archive.is we should get rid of Kww and archive.org, just to be safe? (Disclaimer: everything I just said was not an actual accusation or suggestion, just an analogy. I do not believe it, only find it to be as likely as the scenario Kww has constructed) Qalnor (talk) 10:28, 30 September 2013 (UTC)[reply]
  15. Oppose I share and can understand the concern about ethics, but I don't much like this knee-jerk reaction. We should avoid throwing out 'good' changes with the 'bad'. While the bot is unapproved, not all of its work is bad, as it seems to be replacing dead google cached pages with pages from the WayBack Machine, as well as with links to archive.is although I have not verified the appropriateness of the archived contents. I actually think it would be useful to have a bot that archives in this fashion, subject to adherence to procedures and safeguards. Another issue is the possible disappearance of webcitation.org at the end of the year. Concerned about this, and bearing in mind there are not many sites that allow the public to pre-emptively archive web pages for possible citation, I had already started using archive.is, and had hoped to be able to continue using it. Having said that, seeing them deliberately circumventing our bot safeguards gives me cold comfort as to how they will operate in the future. However, none of the edits are vandalism that call for being reversed. Wholescale removal of all the archive links inserted/updated based on some speculation or expectation of future (unsubstantiated) ethical violation seems to be a rather pointy thing to do. -- Ohc ¡digame!¿que pasa? 02:45, 30 September 2013 (UTC)[reply]
  16. Support. This is disgusting self-promotion, which needs to be discouraged completely, even though the website appears to be fine at the moment. I was actually teetering on the edge of voting to keep the links until I realized how many of the IPs on this page are Rotlink himself evading this block. Someguy1221 (talk) 07:13, 2 October 2013 (UTC)[reply]
  17. Support - and indeed I have been reverting and blocking wherever I have come across the abuse in this way. GiantSnowman 15:37, 2 October 2013 (UTC)[reply]
  18. Support after reading User:Kww's rationale. We don't know what the purpose of the links is, and so we can't be sure that they aren't being used as a Trojan horse. If the nature and details of the archive are better known, I may change this to option 1. Robert McClenon (talk) 16:01, 2 October 2013 (UTC)[reply]
  19. Comment - We don't really even know where the archive is. The domain is registered in Iceland, but its registered address is in Prague, Czech Republic, and one of its nameservers is in Liechtenstein. Robert McClenon (talk) 16:17, 2 October 2013 (UTC)[reply]
  20. Support with blacklist: The operator of archive.is appears to be acting in extreme bad faith in how he is inserting these links; consequently, I don't feel we can trust the site's contents in the future. --Carnildo (talk) 23:21, 3 October 2013 (UTC)[reply]
  21. Support - Using zombies to link to this site makes me leery of what could be added to the site in the future. If he's willing to do this, he might add exploits to the site in the future. With this many links incoming from Wikipedia, a fair number of people could have their machines compromised with zero-day exploits. @Kww, I'd notify the WMF and the stewards of this, since it'll probably need a global blacklisting. Reaper Eternal (talk) 00:21, 4 October 2013 (UTC)[reply]
    Reaper Eternal, please actually read the proposal before blindly commenting. This particular subsection is generic; not just Archive.is, so you've just supported the removal of all archiving sites by accident. Lukeno94 (tell Luke off here) 16:43, 4 October 2013 (UTC)[reply]
    If you'd bothered to ask me, rather than immediately assuming I don't read what I support, you would have found that I read section #3, clicked edit, scrolled down to the end of the voting section, and added my support. As it turns out, the proposal below it is actually a subsection of the proposal I meant to support, meaning that my support was accidentally added to the wrong section. Reaper Eternal (talk) 00:58, 5 October 2013 (UTC)[reply]
  22. Support The security issues involved are far too plausible to leave these links in Wikipedia without strong evidence that the archive is run by an organization capable of maintaining the archive, and with accountability for any malware or problematic content that may appear later. Big money can be made from infecting computers used to browse the Internet, and someone running a bot operating from multiple IPs demonstrates high motivation and a low regard for ethics. Johnuniq (talk) 00:37, 5 October 2013 (UTC)[reply]
  23. Support - The intransigence & evasiveness of the site operator and his willingness to use illegal (probably) botnets to propagate links to this new archiving service do not bode well for the future. 78.105.23.161 (talk) 03:05, 6 October 2013 (UTC)[reply]
  24. Overwhelmingly Support Removal I was asked by 'bot to comment on this, and a review of the application of this Archive.is tends to show that the "alternative" to The Wayback Machine should not be trusted and as such should not be linked to. The Wayback Machine currently solves all such link rot issues that we encounter, there is no need for any "backup" to The Wayback Machine. But even if there were such a need, a suitable University-backed alternative archival array of servers would be developed. As it is, Archive.is was implemented without any observable community input, it's "just somebody's computer" in a very real sense, and it should not be used. Damotclese (talk) 17:54, 7 October 2013 (UTC)[reply]
  25. Strong Oppose - Rot link is a real problem. I think there are a lot of bad faith assumptions about archive.is, in part based on the idea that it is 'new' to the scene. archive.is has been a prominent option for some time in the debates about link rot, which have been re-occurring for years with the general consensus being yes we need to do something - but no one ever agrees on what exactly to do, with the discussions being peppered with that most Wikipedian of comments - well if you have this complaint why don't you do something about it. It sounds like that is what RotlinkBot did. I also am opposed to the idea that using bots is an essentially bad idea that must be regulated pessimistically. This is just people using the skills and resources at their disposal, some of us have access to research databases, some of us have access to technical skills to make bots, and some of us have persistence and will to stay long in debates and through that shape Wikipedia policy - that I do not have, but these are my two cents, take it as you will Jztinfinity (talk) 16:56, 12 October 2013 (UTC)[reply]
  26. Support - The army of proxies or botnet IPs to evade the block suggests that there may be something more nefarious at work here. That is not something that the operator of a legitimate website does, at all. Mr.Z-man 18:23, 12 October 2013 (UTC)[reply]

It is not traditionally the business of an encyclopaedia to help readers to obtain out-of-print references, or their modern equivalents, as far as I know. Hypothetically, various third-party apps and third-party websites could choose to shoulder the legal risks, if any, of presenting modified versions of Wikipedia articles with archive links added. Wikipedia's content licensing allows this. Editors would be free from legal risks and would have more free time to add actual content to the encyclopaedia.--greenrd (talk) 19:39, 23 September 2013 (UTC)[reply]

  1. Support as proposer primarily on the grounds of freeing up editor time. Automation is good - even if someone else is doing it.--greenrd (talk) 19:39, 23 September 2013 (UTC)[reply]
  2. Comment. A lot of links to archive.is, archive.org, WebCite and Google Cache are not dead. They are from domains banned in Wikipedia. That's why the editors had to use an archiving service. Reverting such links means using banned domains. This will prevent the articles from further edits. 88.15.83.61 (talk) 19:48, 23 September 2013 (UTC)[reply]
    When hosts which formerly hosted RS content die, frequently their domain-name-squatted replacements host malware; this is a good reason for them to become blacklisted. Also correctly blacklisted are notoriously unreliable sources which host only user-generated or copyright-violation content. But for the first, I will link to an archive of a URL, from a time when the content was valid. I'm stating this only to point out that archives are not typically used to maliciously bypass the blacklist, it's to link to an archive of an actual reliable source. When I find spot archive links to bad sources (blacklisted), I mark them as {{dubious}} or remove them entirely and tag the claim {{citation needed}}. --Lexein (talk) 13:07, 24 September 2013 (UTC)[reply]
  3. Oppose. Dead links are anathema. Verifiability is important. Wikipedia exists in an internet/web world: citing a source which is reachable by a URL, then letting that URL go dead with no archive of it, is wrong at several levels. Reading WP:LINKROT will help here. --Lexein (talk) 13:07, 24 September 2013 (UTC)[reply]
    So is circumventing the community consensus. Want to guess which one is more of an anathema? Hasteur (talk) 13:36, 24 September 2013 (UTC)[reply]
    In case anybody missed it, that's highly toxic torch-mob groupthink, coupled with missing the point by a wide margin, and capping it all off by misusing a word in two different ways simultaneously. --Lexein (talk) 09:44, 19 October 2013 (UTC)[reply]
  4. Strong Oppose - What on earth are you on about Greenrd? This doesn't free up editor's time at all; in fact, it wastes it by ruining hard work, and will cause multiple GAs and FAs to fail various bits of their criteria. Utterly stupid idea. Lukeno94 (tell Luke off here) 15:32, 24 September 2013 (UTC)[reply]
    UK law provides for the unauthorised copying of orphaned works, but (a) an onerous search for the rightsholder is (rightly in my view) required, (b) by definition web pages are not orphaned works when they are archived by archive.is because they still exist and (c) in any case this does not help English Wikipedia, which must follow the laws of the United States. Therefore I believe to do it (which is what archive.is lets people do, unlike archive.org which just does it itself) is onerous one way or another, and it should be left to someone else. At least archive.org makes its own decisions about what to copy, so an editor cannot be accused of initiating the unauthorised reproduction of a work when linking to it.--greenrd (talk) 18:51, 27 September 2013 (UTC)[reply]
    I think, you can compare it with any photo hosting or code hosting like BitBucket. The user can enter URL of repository anywhere in the Internet and Bitbucket will download and publish it without guessing how the content in the repo is licensed. Until the copyright holder claims his rights.
    Or should we also ban bitbucket.org, github.com, sourceforge.net, flickr.com, facebook.com, ... because it is possible that some content hosted there is reproduced without authorisation from another website? 79.47.98.149 (talk) 19:20, 27 September 2013 (UTC)[reply]
    The typical page on any of those websites is not a blatant copyright violation, unlike the situation with these bot-added archive.is links (or are we seriously expected to believe that all the copyright holders of those web pages just happened to archive them themselves?!)--greenrd (talk) 15:45, 28 September 2013 (UTC)[reply]
  5. Oppose - aside from everything else the phrasing of this proposal is atrocious and seems to imply that dead links remaining dead vs. using established and proven archiving sources like archive.org is preferred, which is quite silly. - The Bushranger One ping only 23:59, 27 September 2013 (UTC)[reply]
    The phrasing of the RFC itself was unnecessarily obfuscatory (probably in order to cover up the fact of illegal activities going on) and I could not even determine whether these were only dead links or not. So this entire thing got off to a bad start. We need a more formal RFC process for things like this so a badly-worded RFC cannot be allowed.--greenrd (talk) 15:45, 28 September 2013 (UTC)[reply]
  6. Oppose. Waste of time and even double waste of time considering that editors in good standing added some of those links. See comments by User:Tom Morris below for example. Someone not using his real name (talk) 05:18, 28 September 2013 (UTC)[reply]
    The fact that editors otherwise in good standing have added those links does not mean that copying someone else's work and then adding a link to your copy is not morally and legally problematic. (As I have said before, I don't have a problem with linking to archive.org for dead links, because then the editor is not initiating the copy.) We have not yet seen ads on this copied content, but that is surely only a matter of time.--greenrd (talk) 15:55, 28 September 2013 (UTC)[reply]
    But neither is the content author initiating the copy for archive.org. It's not less legally problematic to link to WP:LINKVIOs regardless who made the copy unless it's an authorized representative of the copyright owner. All internet archives are operating on rather shaky legal grounds. At least archive.org honors requests for removals. Someone not using his real name (talk) 03:01, 5 October 2013 (UTC)[reply]
  7. Strong Oppose No comment is really even needed for this one, this is exactly the same as 3 except it is basically making the suggestion that we just give up on sources entirely. Qalnor (talk) 19:49, 29 September 2013 (UTC)[reply]
  8. Support I agree that having dead links in the articles is Ok. For verifiability it is enough to have the links alive at the time of citation. As time passes, no one needs these links. Wikipedia has 30,000+ links to http://findarticles.com/. They have been dead since Summer 2012. Is there any campaign to remove the links? Or to replace them with links to archive.org? No. No one cares. No one ever noticed they are dead. So, dead links are Ok. 89.110.200.127 (talk) 17:13, 18 October 2013 (UTC)[reply]
    • This is factually incorrect: plenty of people care. There's Category:Articles needing link rot cleanup which many of us visit to try to repair on a regular basis. Also, lots of those findarticles.com links were already captured by the Wayback Machine, or by Webcite by editors like me. Wherever the titles of the articles were correctly added, many times original sources have been found to supplant the Findarticles links. --Lexein (talk) 09:44, 19 October 2013 (UTC)[reply]
  9. Support I can't see why people think that archives should be exempt from WP:LINKVIO. --Stefan2 (talk) 17:44, 18 October 2013 (UTC)[reply]
    • Who "thinks" such a thing? Nobody explicitly here, so far. Worse, you seem to think, erroneously, that links to archives are links to copyright violations. Who told you to think that? They aren't, obviously, so say the courts. Those who disagree have very easy and legal recourse, via the oh-so-easy-to-abuse DMCA. Yay! --Lexein (talk) 09:44, 19 October 2013 (UTC)[reply]

We should replace links to archive.is that were added by the bot, where possible. Where no replacement is available, the links should be left in place. Links added by human editors should be left in place as well.

The circumstances surrounding these links leave me uneasy about leaving them alone (a startup trying to establish itself by automatically spreading its links across Wikipedia, use of proxies, unapproved bot by unresponsive entrepreneur). However I'm wary of cutting off our nose to spite our face -- if they have the only viable links to the content we need for a substantial number of references, leave the links alone in those cases. But in situations where there is a replacement available at a different, reputable service, those links should be switched over. Links added by people should also be left alone, however, as editors should be allowed to link to whichever service they want. The pervasiveness of the bot-added links establish a possible artificial trust among editors who see them that I think warrants undoing. equazcion (talk) 16:03, 21 Sep 2013 (UTC) (moved to complete removal)
  1. This is a sensible option; the easiest way (albeit not a foolproof way) is to simply nuke those added by IPs. Lukeno94 (tell Luke off here) 18:47, 21 September 2013 (UTC)[reply]
  2. Support, as long as archive.is remains ad-free and there remains no evidence the archives are not faithful renditions of the original sites. NE Ent
  3. Support: I don't see enough evidence to support reverting legitimate editors' work, but I also believe that we should stop unauthorized bots from being able to edit Wikipedia even when their edits are ostensibly positive. —me_and 09:03, 23 September 2013 (UTC)[reply]
  4. Support, with the caveat that it should be limited to articles that have been edited by the unapproved bot and its IPs, and that it not be blacklisted, due to it being reasonable for editors to add archive links, including this one. The only problem is the unapproved bot and its sockpuppets. VanIsaacWS Vexcontribs 00:19, 27 September 2013 (UTC)[reply]
  5. Comment. I thought the (sneaky) "bot" edits were already rolled back in mass? (Is Wikipedia's captcha that easy to defeat by program, by the way? Adding external links as an IP triggers it.) Someone not using his real name (talk) 05:42, 28 September 2013 (UTC)[reply]
    The very existence of the captcha tells me this wasn't in any way a pure botnet - it was at worst a human-assisted "joe job". --Lexein (talk) 01:46, 30 October 2013 (UTC)[reply]
    Sadly enough, the captcha isn't applied to bots using the API: it only serves to defeat spammers that are sufficiently stupid that they spam by hand or with browser simulators.—Kww(talk) 02:44, 30 October 2013 (UTC)[reply]
    So we're all here screaming over deleting tens of thousands of links to an active archive resource, and choosing to cause those refs to go to dead links, and blacklisting without definitive hard evidence an active archive resource only because at WP (or WMF), nobody thought to (or worse, elected not to) institute API auth keys, or IP-locked API auth keys, or IP-range-locked API auth keys. Talk about burying the lead, Kww. Did you really just not make clear for months the fact that these were API edits, not web edits? All this drama, for nothing. All this damage, because somebody couldn't be bothered to implement proper API access authentication. I do not believe that "the encyclopedia that anyone can edit" should mean "the encyclopedia that any random anonymous script can edit through API access". But I guess I'm the only one. --Lexein (talk) 06:04, 30 October 2013 (UTC) (I've struck above, after reading all the responses. See my longer reply below. --Lexein (talk) 21:26, 30 October 2013 (UTC))[reply]
    I'm a little confused about why that's relevant to the discussion here. Whether or not the policy was breached seems unrelated to how we police against such breaches. Someone committing robbery is still committing robbery even if I forgot to lock my front door that morning. (Not to say, of course, that WP/WMF shouldn't take this as the kick up the backside to implement API authorization.) —me_and 09:57, 30 October 2013 (UTC)[reply]
    Bot edits are done through the API, and RotlinkBot uses the API. No information was hidden or buried. Anonymous access to the API is, unfortunately, required to make things like Visual Editor work.—Kww(talk) 12:48, 30 October 2013 (UTC)[reply]
    Can you provide your source of information? It is not clear if the captcha is applied to bots. I found only this and it looks like the users of MediaWiki API are not exempted from captcha, though Wikipedia may have a different setup, intentionally less secure. It would be interesting to read why it is so. 188.162.65.32 (talk) 12:17, 30 October 2013 (UTC)[reply]
    I'll dig around and find a source. My understanding is that the captcha for anonymous API edits is not enabled on most Wikimedia sites.—Kww(talk) 12:48, 30 October 2013 (UTC)[reply]
    AFAIK, the captcha is applied to the API. However as I noted elsewhere on this page, it is well known that MediaWiki's standard captcha (which Wikimedia still uses) is very weak and is easily beaten by some black hat spam/SEO tools like XRumer. And there are ways of having humans do it while still using an automated program (3rd world captcha farms). The fact that they would have needed to enter a captcha is not evidence that it wasn't done automatically. Nor would that really be relevant. Doing it semi-automatically would not preclude the use of a botnet. Just because it has "bot" in the name doesn't mean everything has to be done automatically. Mr.Z-man 13:33, 30 October 2013 (UTC)[reply]
    The captcha *is* applied to the API -- I tested it this morning. Editing anonymously is possible, but adding a link triggers the same captcha you get with the normal editor. In any case, the whole API/non-API discussion is irrelevant, as you could also edit (semi-)automatically via screen scraping. Valhallasw (talk) 14:33, 30 October 2013 (UTC)[reply]
    Thanks for discussing, everyone. Sorry about my extreme expression. I still think additional tightening of access methods should be done, but I'll list them in my userspace. Feel free to delete, move to Talk, or collapse this thread. (In case anyone ever doubted, I'm against unauthorized bot editing, and want better front-line prevention methods employed; if these were in place, the RFC against Archive.is might never have started.) --Lexein (talk) 21:26, 30 October 2013 (UTC)[reply]
  6. Support, really. Archive.is itself has so far been reliable and accurate. The mob mentality of I don't like archive.is because somebody aggressively bot-edited adding archive.org links, as well as archive.is links seems simply WP:IDONTLIKEIT. If the swarmed bot edits are already gone, I strongly agree with keeping all human added edits which link to Archive.is.

Contact Rotlink off-wiki (using email perhaps) and encourage them to follow the community's process for bot approval so the bot can operate within policy.

  1. Support as proposer and first choice.--v/r - TP 21:03, 20 September 2013 (UTC)[reply]
  2. Support in addition to number 3. I don't support a bot for what I believe should be human judgement, but regardless, I think he should be allowed to use the community processes for approval if the community so wishes. ~Charmlet -talk- 21:45, 20 September 2013 (UTC)[reply]
  3. Support a bot which would focus on, within Wikipedia citations, archiving and/or finding alternate URLs for notoriously ephemeral, doomed sources such as Google cache, AP (or better, news sites hosting AP articles), and publications which regularly purge old articles for no obvious reason, such as the various Murdoch media outlets. Such a primary focus would fill a gap not served by any existing bot that I know of. --Lexein (talk) 13:19, 24 September 2013 (UTC)[reply]
    I have boldly contacted Rotlink via "email this user" and posted that message at Wikipedia:Archive.is_RFC/Rotlink_email_attempt. --Lexein (talk) 01:37, 3 October 2013 (UTC)[reply]
    I received a reply, and have replied, and am awaiting his permission to post. --Lexein (talk) 10:30, 4 October 2013 (UTC)[reply]
    I received permission, and the conversation continued, then real life intervened, and now the whole thing is posted, warts and all, refactored for time order. "Denis" is Denis Petrov, as seen in the StatRadar.com Whois summary. I failed to get him to seek bot approval. --Lexein (talk) 04:55, 5 October 2013 (UTC)[reply]
    I had a hard time making sense of that discussion. Did you see anything there which you interpreted as either an explicit confirmation or denial of being responsible for the bot edits?—Kww(talk) 06:05, 5 October 2013 (UTC)[reply]
    • Nope, I don't seem to have gotten him to answer the question, as far as I can tell. He implied that some indirect SEO work by others may have been responsible for inflicting collateral damage on Wikipedia, but never confirmed or denied involvement, and didn't answer whether he could or would stop or help stop what I'm calling "swarm" edits. (I tried collapsing the following, but it broke !vote numbering.) Instead he said "Isn't already stopped ?" So the edits stopped because the filters worked, so why worry? Gah. Not my point.
    • (Later added this) The worst result (from my viewpoint) is that he seems to agree with #3. Complete removal of archive.is, and even #3a. Allow dead links to remain permanently in Wikipedia., leaving readers to fend for themselves. I hope that was sarcasm, but I fear it's not. This is a decidedly anti-Five Pillars, anti-readers position, IMHO. I'm convinced there is something missing from this story, which would help us all understand this utter change of motivation from build-for-use to provoke-to-destroy, but which would not further impugn Archive.is or Denis. I don't know, and it galls me.
    • IMHO the best results from the conversation were a) the Memento suggestion, b) the insider's view of what problems archivers (all of them) face, period, and c) that, in his mind, even if Archive.is fixed the "bot" issue, other nightmare problems with Wikipedia would likely follow. I can't guarantee to him that he's wrong. I got the sense that he has flatly given up on a relationship with Wikipedia. I might have gotten that wrong, though, by extrapolating on paltry information.
    • One sequence was troubling: "Advocating the persistence of the links really looks very suspicious. Actually, I do not understand why all the discussion is about the links. Talking about whether archive.is is trustable, no one advocates (or no one cares?) creating additional backup. Both party continue trusting archive.is as the only holder of the snapshots of the cited sources." Well, I don't think insisting on keeping functional links to a functional archive of a now-defunct cited source is suspicious, but maybe that's not what he meant. IMHO he's saying no single vendor should be relied upon. Fair enough. But offline backups (download or zip files) aren't any sort of workaround for Wikipedia or our readers. IMHO we and readers need ongoing online availability of the vendor, and reachability through Wikipedia ref archive links without blacklisting, filtering, or mass reverts; that is the issue to me. I clearly didn't get that across. Secondarily, only archive.is has some of the page snapshots I value because archive.org and WebCite literally shoot themselves in the foot by honoring misrepresentative robots.txt (new domain owner screwing over defunct domain owner's content). Vendor trustworthiness about content isn't on the table for me: based on my spot checks, I easily trust Archive.is's snapshots as much as I trust Web.archive.org's and WebCite's. The most urgent matter of the day, the fact that Archive.is is under suspicion for doing or allowing automated WP edits to the best of our knowledge, failed to make any headway with him. You can't say I didn't try. --Lexein (talk) 08:34, 5 October 2013 (UTC)[reply]
  4. Support. As long as the bot, which seemed to be fine, can be approved to do the edits, does the work in an approved and within-policy fashion, there shouldn't be a problem. I would possibly add that specifying that it should add archive.org links when available in preference to archive.is (or any other archive) would be a good move, but not required. - The Bushranger One ping only 00:01, 28 September 2013 (UTC)[reply]
  5. Oppose. Don't see the point. They burned their bridges. See User talk:RotlinkBot. You wanna go hat in hand to them? By the way, 88.15.83.61, 193.86.243.17, 77.110.134.11, 95.225.130.13, and probably some other IPs commenting in this RfC are obviously the same person(s) operating the proxy farm and perhaps Rotlink... Someone not using his real name (talk) 05:10, 28 September 2013 (UTC)[reply]
    I guess we want them to explain their actions and assure that their service is legitimate and won't end up suddenly serving ads after an ad company makes a deal with them once the site gets linked to extensively by Wikipedia. —  HELLKNOWZ  ▎TALK 18:01, 28 September 2013 (UTC)[reply]
    I don't know what makes people so excited about ads. I'd have no problem linking to an ad-supported archive. What I want is an explanation of how he arranged the IPs he used. If there's no legal explanation, then I don't want to link to him no matter how much he promises to refrain from illegal activity in the future. If there is a legal explanation, I don't care about ads. I'm concerned about viruses and malware, not ads.—Kww(talk) 18:08, 28 September 2013 (UTC)[reply]
    I think a strictly legal explanation is unlikely. While some of those IPs look like hosting services, which might plausibly all be rented by the same company, the WHOIS info of others makes this unlikely because they belong to educational institutions [1] or are listed as home/cable/DSL lines in various countries [2] [3] [4] [5] [6]. Probably unwitting open proxies or some kind of overlay network ("botnet"). Someone not using his real name (talk) 21:15, 28 September 2013 (UTC)[reply]
    I agree that a legal explanation is extremely unlikely, and that's the basis for my support of removing the links. It isn't ads, it's risk.—Kww(talk) 21:27, 28 September 2013 (UTC)[reply]
    Also OVH, which was one of his two data stores he said it used at one point [7] is known for their "gullible" provision of services, including to warez topsites, seedboxes etc. Do a bit of googling if you don't want to take my word for it; these for instance [8] [9]. And Hetzner doesn't appear a complete stranger to that line of biz either [10]. Someone not using his real name (talk) 22:27, 28 September 2013 (UTC)[reply]
    You probably want to put some of this material under the proposal for removing all links, Someone.—Kww(talk) 22:52, 28 September 2013 (UTC)[reply]
    Are you suggesting that archive.is is using a file sharing network for storage? If so, would you associate that with the wide range of IP addresses used to modify WP links? NebY (talk) 16:07, 5 October 2013 (UTC)[reply]
    No, I just do not understand how the kat.ph neighbourship is related to malware. Neighbourship on a huge ISP is a very weak relation and kat.ph is not involved in malware distribution. The relation to WikiLeaks is stronger and it would be a better source to hazard conjectures about evil cyber-underground and zombie networks, wouldn't? 82.205.72.112 (talk) 16:54, 5 October 2013 (UTC)[reply]
  6. Oppose - Previous attempts to communicate with the editor have failed. Robert McClenon (talk) 16:02, 2 October 2013 (UTC)[reply]
    False. The last attempts to communicate with Rotlink were on 3 September and 12 September, and were successful. After that, there was the block on 19 September. That's not a good enough reason to stop trying to communicate. --Lexein (talk) 01:54, 3 October 2013 (UTC)[reply]
    Lexin it's possible that involved editors could have used the "Send User an email" function. Please assume good faith. Hasteur (talk) 02:33, 3 October 2013 (UTC)[reply]
    I don't know why you keep involving some editor who is not involved in this discussion. Seems rude to keep setting off his notification widget. --Lexein (talk) 05:10, 5 October 2013 (UTC)[reply]
    Comment @User:Lexein Since you appear to strongly support the use of this service that many other editors find troubling or questionable, have you tried to contact Rotlink off-wiki and ask them to request approval of the bot? Since you want that action taken, and many of us don't, why don't you go ahead and do it? You don't need the consensus to make that contact as one person. Robert McClenon (talk) 17:05, 5 October 2013 (UTC)[reply]
    Asked and answered above, dated 3 October 2013. In that case, can you please stop flooding this RFC page with your defense of the service and see whether you get a reply? Robert McClenon (talk) 17:06, 5 October 2013 (UTC)[reply]
    Okay, Lexein. You tried. Thank you. It appears that you received a less-than-satisfying reply, if I read your lengthy remarks correctly. Do you now agree that the activities associated with the service are questionable? Robert McClenon (talk) 17:11, 5 October 2013 (UTC)[reply]
    Not the way you say it, no. To be very specific (please read this very carefully): the contents of Archive.is and links to it related to (only) Wikipedia deadlink recovery, are above reproach, regardless of everything else that has happened. But the confluence of events dooming our use of it enrages and saddens me: I don't know who's running the swarm edits, I do not think it's Denis, I don't know why he can't or won't stop it. I maddeningly don't know why Denis seems to have had this fatalistic change of heart. It doesn't make sense. I'm outraged that there's no goddamned sensible explanation for this mess. I feel like the last librarian, facing a squad of jackbooted thugs (the unholy alliance of the swarm edit committers and the coffied-up administrators here) whose sole joint aim is to destroy only the rarest, most fragile, out of print books, leaving the pulp, mass-market trash and tabloids unmolested, and laughing about it in my face. --Lexein (talk) 18:53, 5 October 2013 (UTC)[reply]
    Lexein, you need to calm down: I understand that you think that I value process too highly, but "jack-booted thug" and "coffied-up" are going beyond the pale.—Kww(talk) 19:07, 5 October 2013 (UTC)[reply]
    You still misunderstand. And don't purport to know what I need. Better to say what you want, rather than what I need. I feel beset on both sides now - that can't be helped. Apparently I'm the only one pissed off about losing work. Apparently I'm the only one pissed off about losing links to snapshots of deadlinked web-only RS. Not you, apparently. I am indeed wearing the blinders of a librarian: don't care about the spammers, really, don't like the spam, but also don't like the bureaucratic steamrollering of Wikipedia's links to archives of deadlinked RS. Just get away from my resources, and leave them alone. In that light alone, "process over resources" is sandpaper on eyeballs, and boots on face. Yeah, I'm taking it personally, because of the time and effort I put in to trying to keep our goddamn cited sources verifiable.
    And don't take "coffied up" as an insult. If I were an admin in context, not particularly involved in the history (failed archives, non-functional Wayback Machine for a year, nascent and (early on) unreliable WebCite, dealing with newcomer sans CV foreign-based archive.is), suddenly faced with an onslaught of swarmed identical IP edits from apparently proxied (and maybe even compromised machines), I'd be lacing on the jackboots, and be acting with my sternface on. I do not write without empathy for all our positions here. --Lexein (talk) 19:23, 5 October 2013 (UTC)[reply]
    First, the question to Lexein was whether he or she agreed that the activities associated with the service were questionable. I wasn't asking whether the service was questionable. I think that we agree that the archive itself is a good archive. I don't think that was in doubt. My question is whether Lexein is now in agreement that there is something strange and questionable about the swarmed edits through open proxies. If Lexein is now in agreement, and is in agreement that the activities associated with the service will not be set right, Lexein has a right to be annoyed, but I will ask Lexein to be a little less dramatic. Unless I misread Lexein (and that is possible, given the WP:TLDR quality of previous posts), then we no longer think that relying on the archive is a satisfactory option. Robert McClenon (talk) 22:45, 5 October 2013 (UTC)[reply]

6. Copy Archive.is and WebCite content to a Wikimedia-controlled server before it is too late

It is only 10 TB (Archive.is) and 2 TB (WebCite). A $500 question (3 * $165 for 4 TB HDDs). 193.86.243.17 (talk) 07:49, 23 September 2013 (UTC)[reply]

x3 for redundancy. Not that that makes it exorbitant or anything.--v/r - TP 13:29, 23 September 2013 (UTC)[reply]
It's never that cheap or that easy. Setting up and maintaining such a service requires more than simply the disk space. In any case, there are discussions about doing this for WebCite at meta:WebCite. —me_and 16:59, 23 September 2013 (UTC)[reply]
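
The disk figures quoted in this thread can be checked with a rough calculation. The sketch below is only illustrative: it uses the sizes and drive price stated above (10 TB, 2 TB, 4 TB drives at $165), reads "x3 for redundancy" as keeping three full copies (an assumption), and covers raw disk cost only, not servers, bandwidth, or maintenance. The function name `drives_needed` is hypothetical.

```python
import math


def drives_needed(data_tb, drive_tb, copies=1):
    """Number of drives needed to hold `copies` full copies of the data."""
    return math.ceil(data_tb * copies / drive_tb)


# Figures as quoted in the discussion above.
ARCHIVE_IS_TB = 10
WEBCITE_TB = 2
DRIVE_TB = 4
DRIVE_PRICE_USD = 165

total_tb = ARCHIVE_IS_TB + WEBCITE_TB

# Single copy, as in the original estimate.
single = drives_needed(total_tb, DRIVE_TB)
single_cost = single * DRIVE_PRICE_USD

# Three full copies: one reading of "x3 for redundancy" (assumption).
triple = drives_needed(total_tb, DRIVE_TB, copies=3)
triple_cost = triple * DRIVE_PRICE_USD

print(f"single copy: {single} drives, ${single_cost}")
print(f"three copies: {triple} drives, ${triple_cost}")
```

Under these assumptions a single copy needs 3 drives ($495) and three copies need 9 drives ($1,485), which matches the point that tripling the disks does not by itself make the cost exorbitant.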
  1. Support. Do not host the archive files. Only copy and keep. If WebCite goes down, give the files to archive.is or archive.org and ask them to host them. If archive.is goes down, give the files to WebCite or archive.org. 88.15.83.61 (talk) 19:42, 23 September 2013 (UTC)[reply]
  2. Oppose Wikimedia is not in the business of providing hosting for "worthy" projects. Let Meta handle the discussion. If Wikimedia picks up the option we'll get a great fanfare of trumpets to announce this new option for all wikis. Hasteur (talk) 13:39, 24 September 2013 (UTC)[reply]
  3. This is effectively equivalent to this proposal on meta. As intriguing an idea as it may be, it presents legal challenges and could have a substantial price tag. Even if the community was keen, this would be more in the domain of a WMF in-office decision. --LukeSurl t c 15:07, 26 September 2013 (UTC)[reply]
    No! On meta it was proposed to copy and publish under the wiki{p|m}edia.org domain. Here it is proposed only to copy and keep, just in case the pessimistic scenarios described here by a lot of speakers come true. It is much cheaper, easier to support, and cannot have legal consequences. 93.148.194.81 (talk) 17:17, 26 September 2013 (UTC)[reply]

Blacklisting

I should note that two admins have blacklisted archive.is during the latest round of botnet insertions, regardless of the opinions in this RfC. That means no more links to it can be added, at least by non-autoconfirmed users. Someone not using his real name (talk) 03:38, 5 October 2013 (UTC)[reply]

It's not on the spam blacklist. Yes, to put a stop to the bots, there are filters in place that prevent additions of "archive.is" under some conditions.—Kww(talk) 03:58, 5 October 2013 (UTC)[reply]

Discussion

  • I'm very concerned about the idea of completely removing all archive.is links, even those added by actual editors. I know of several editors who switched from WebCite to Archive.is when WebCite's future ability to archive came into question, myself being one of them. WebCite does say existing archives won't go away, but I find it hard to trust this in the long term. I'm unaware of any other on-demand services other than WebCite and Archive.is at this point, so if in a year WebCite goes away and Archive.is is no longer trusted, where does this leave us? To be blunt, we're probably back to the idea of either trying to take over WebCite ourselves, or providing some funding, or...something along those lines. I know copyright/non-free content is an issue for the Foundation, but a solution needs to be found for the long run, not simply what's convenient right now. (Sorry for rambling...) Huntster (t @ c) 07:58, 21 September 2013 (UTC)[reply]
  • I know about Archive.org, but it is not "on demand". My whole point was that without the two above-mentioned sites, we won't have access to on demand services, which are needed for archiving a specific instance of a site. Huntster (t @ c) 20:54, 21 September 2013 (UTC)[reply]
  • I think the "remove all archive.is links" option is incredibly stupid. By all means, remove all links added by IPs if you really have to do this, but really... Fuck knows why you proposed this; you may as well blacklist it if you want this solution! Also, the option I want isn't there: where we reinstate all of the archive.is links added before Rotlink's indefinite block (or those inserted before their bot was indeffed), and only remove those added by IPs after their indef. Lukeno94 (tell Luke off here) 10:29, 21 September 2013 (UTC)[reply]
  • Please reread my comment: I explained myself. I believe the owner of the site to have engaged in illegal activity, and therefore do not trust him, his site, or his future intentions. The bot was indefed on August 18, so all links created by the IPs above were placed in defiance of a block.—Kww(talk) 16:47, 21 September 2013 (UTC)[reply]
  • Again, you have made that claim, but there's no real evidence to prove it; whatever happened to "innocent until proven guilty" anyway? You may not trust the site, but it is clear that several longstanding editors - including myself - do, and still trust it. Your proposal to nuke everything flies in the face of a LOT of work by legitimate editors, particularly as archive.is has often been the only accessible archive for a given page. Lukeno94 (tell Luke off here) 18:45, 21 September 2013 (UTC)[reply]
  • Considering I know absolutely nothing about how VPNs and proxies work, I don't know what is legitimate and what isn't. But the actions of one person, regardless of who they are, shouldn't result in lots of other people having their hard work undone (as it can be VERY hard to find a working archive for a link sometimes...) Lukeno94 (tell Luke off here) 20:04, 21 September 2013 (UTC)[reply]
  • Kww, I see you presume it should be obvious to anyone that it was done illegally. I'm reasonably familiar with how proxies work and I'm not sure I understand your reasoning. If I wanted to set up proxies in several different countries I'm fairly certain I could do it legally. Could you explain what you believe to have occurred here that was illegal, and what leads you to think that? I'm asking honestly, not necessarily out of doubt. You may be more knowledgeable in these things than I am. equazcion (talk) 20:17, 21 Sep 2013 (UTC)
  • First is Occam's razor: what would prompt anyone to actually go to the expense of negotiating individual proxy hosts in places ranging from Qatar to Brazil to Vietnam? Second is the nature of the IPs: they aren't webhosts and servers. Instead, they are individual IPs on adsl networks, FTTH networks, cable modems, etc. Everything about the setup screams "botnet". If it was a legitimate proxy arrangement, I would expect to see webhosts and servers hosted in a small number of countries with good internet access.—Kww(talk) 00:27, 22 September 2013 (UTC)[reply]
  • I think known open and advertised proxies tend to be preemptively blocked from editing. equazcion (talk) 09:18, 22 Sep 2013 (UTC)
  • This can explain why there are so few webhosts and contiguous blocks of IPs. They were already blocked from editing. I can imagine another simple way to get proxies. We are talking about a site owner, right? Then he/she can see access logs of the site. There are usually a lot of hits from malicious security scanners (looking for SQL-injections, etc). Those IPs are proxies and can be connected back and reused. Setting up one's own proxy infrastructure looks too expensive. 77.111.172.172 (talk) 09:53, 22 September 2013 (UTC)[reply]
  • Citation needed. Although I see you took my suggestion to pose an RFC, I'm seeing a raft of pointy supposition, gossip, handwaving, and assumption of bad faith, exaggerated with purple prose like "spambot" and "botnet", deliberately spreading fear, uncertainty and doubt about a resource you've personally decided to dislike and campaign against, despite showing little knowledge of the suspect service. You still don't know if it was Rotlink who actually did any archive.is additions after the block, or if it was anyone at archive.is at all, or someone else who helped out. Same IP? Ask a Checkuser. Otherwise, this is all rather weak koolaid, which I won't be drinking. --Lexein (talk) 19:11, 24 September 2013 (UTC)[reply]
  • I'm very concerned about the removal of archive.is links too. A while back, I tried to fix all the references that use the now offline site pinkpaper.com, the website of the Pink Paper, one of the UK's main LGBT news sources. Between archive.org and archive.is, I managed to find replacements for some but not all of the references used. The LGBT topic area tends to be filled with a lot of poorly sourced material especially around BLP subjects. Removing archive.is links is likely to leave a lot of those links broken. I don't really know what's going on with the IP and the non-approved bot account, but I'd rather not see all the hard work I put into fixing PinkPaper links removed just because of somebody else's behaviour. And I'm not keen on having BLP articles on sexuality-related topics potentially left without sources. This seems self-defeating. Whatever the problem is, please can you seek a calmer, less dramatic solution than removing all the links to a useful archival service. —Tom Morris (talk) 12:48, 21 September 2013 (UTC)[reply]
  • Thought/idea: would it be possible to wrap the archive.is links up in an external links template? (similar to {{IMDb title}} etc) That way, if the site goes hinky in the future, all the links could quickly be disabled, minimizing negative fallout. Siawase (talk) 12:03, 22 September 2013 (UTC)[reply]
    • This is a good idea. Wouldn't it be better to back up its content to a Wikipedia server? If the site goes hinky in the future, all the links could be changed to something like archiveis.wikimedia.org instead of disabling the links and hitting the verifiability issue. 77.110.134.11 (talk) 12:15, 22 September 2013 (UTC)[reply]
    • My concern with this is that a template would be seen as tacit approval of these links, which I think we're a long way from having. I know I would see the use of such a template as implying the community considers these links to be A Good Thing, particularly if Archive.is had such a template while other archiving services didn't. —me_and 09:06, 23 September 2013 (UTC)[reply]
  • The introduction of this RfC misses the fact that User:Rotlink (user, not bot) himself added a lot of links [12] between having his bot blocked, withdrawing his BRFA and until this was pointed out to him [13]. —  HELLKNOWZ  ▎TALK 13:35, 22 September 2013 (UTC)[reply]
  • Do we even have any proof that Rotlink is the owner of the website, and isn't just claiming to be? I'm still disgusted that the actions of one person could lead to the reversion of a shedload of good edits by legitimate editors; regardless of whatever position they hold. Frankly, the age of an archiving site is utterly irrelevant; if it does go under, or if it does end up with adverts, then THAT is the time to propose its removal. Seems like several people have forgotten about WP:CRYSTAL... Lukeno94 (tell Luke off here) 14:37, 22 September 2013 (UTC)[reply]
  • At the risk of looking like a total pratt, that isn't convincing. Lexein has communicated with Rotlink, who claims that they are the owner. Lexein's usage of words doesn't confirm or disprove the claim. I'd like to see something rather more solid before we jump to conclusions about whether to include archive.is links or not. Also, the presence of a Wikipedia article, and the reliability and/or notability of it, has precisely nothing to do with whether we use an archiving site or not; bringing that up is unnecessary and deliberately inflammatory. Lukeno94 (tell Luke off here) 19:27, 22 September 2013 (UTC)[reply]
  • better diff. —  HELLKNOWZ  ▎TALK 19:37, 22 September 2013 (UTC)[reply]
  • CRYSTAL applies to determining article existence and content. A modicum of speculation isn't unreasonable when it comes to technical concerns, which this can easily become if an archive site becomes widely relied upon by articles and then becomes unviable. equazcion (talk) 19:43, 22 Sep 2013 (UTC)
  • The possibility of WebCite going under was, and perhaps still is, very real. Did that mean we blanket removed every single link? No, it didn't. The diff Hellknowz linked shows a lot of technical knowledge; but it's fairly generic stuff that anyone who goes and looks things up could come out with. Given that it is near a year old though, it appears probable, if not certain in my mind, that Rotlink is an owner, or employee. It does not, however, confirm he is the only owner; nor should it matter one iota if a website is owned by one guy, two guys, or a consortium. Lukeno94 (tell Luke off here) 19:55, 22 September 2013 (UTC)[reply]
  • Luke, Lexein has emailed the owner and has later confirmed that the owner is Rotlink. I was not trying to "bring up" that article (you surely know about it already) and am disturbed that you think it "deliberately inflammatory" to try to help answer the question you asked, "Do we have any proof...", by referencing another editor's research. NebY (talk) 22:20, 22 September 2013 (UTC)[reply]
WebCite is no longer accepting submissions like they used to. I tried it today. They rejected my target page with the false summary claiming that my email address was incorrect. Archive.is accepted my submission no problem. Poeticbent talk 21:01, 22 September 2013 (UTC)[reply]
I just archived a page fine, perhaps your e-mail address was invalid, like a stray character. —  HELLKNOWZ  ▎TALK 23:02, 22 September 2013 (UTC)[reply]
Must've been a temporary outage. I tried it yesterday and got the same error message. Good to know though that it's back up again for the time being. De728631 (talk) 15:23, 23 September 2013 (UTC)[reply]

  • Other Archive.is users. I was 2 minutes too late to add this info to the list of notable sources at Wikipedia:Articles_for_deletion/Archive.is, which consisted of a single item. I just did an "archive.is" search on majesticseo.com (pro account required) and found out that, besides Wikipedia, archive.is is used by
88IP making the case that if others use it, we should too. Editor is encouraged to make the case for why enWP should use/trust it instead of flooding us with lists of other sites that use it

International Tropical Timber Organization http://www.itto.int/direct/topics/topics_pdf_download/topics_id=3452&no=1, WikiLeaks https://twitter.com/wikileaks/status/368886340089171968, Verso Books https://twitter.com/VersoBooks/status/382175504607887360, Lenta.ru http://lenta.ru/articles/2013/02/22/fromharvardwithlove/, The Guardian http://www.theguardian.com/environment/climate-consensus-97-per-cent/2013/sep/16/climate-change-contrarians-5-stages-denial, Channel Register http://www.channelregister.co.uk/2013/09/16/bill_gate_again_world_richest_and_richest_in_us_for_the_20th_straight_year/, Reuters http://blogs.reuters.com/felix-salmon/2013/01/, RTVE http://blog.rtve.es/distritolatino/2013/01/index.html, The Atlantic Wire http://www.theatlanticwire.com/entertainment/2013/09/utero-20-what-were-saying-now-and-what-we-said-then/69397/, Time (magazine) http://business.time.com/2012/10/01/how-the-maker-movement-plans-to-transform-the-u-s-economy/, Badische Zeitung http://www.badische-zeitung.de/gemuese-kooperative-wirft-afd-kandidatin-fein-raus, Blueseed http://blueseed.co/faq/, MTV Sweden http://www.mtv.se/musikvideo_artist/1519-p-nk, Chicago Reader http://www.chicagoreader.com/Bleader/archives/whoa, The Huffington Post http://www.huffingtonpost.com/2013/07/02/andrew-mason-album_n_3534367.html, Público (Portugal) http://www.publico.pt/mundo/noticia/wikileaks-exige-demissao-de-jornalista-da-time-que-sugeriu-assassinar-assange-1603394 and many others less notable web sites and blogs.

88.15.83.61 (talk) 19:19, 24 September 2013 (UTC)[reply]
  • I did not say if others use it, we should too. I only said the site has many notable users (no less notable than WebCite's). I want to see ideas from the authors of the claims above about "botnets", "seeking for traffic", "depending on Wikipedia" as to how this can be compatible with those claims. 88.15.83.61 (talk) 20:34, 24 September 2013 (UTC)[reply]
  • And I see no point in adding a collapsing button except to lie about what I said. It does not make the text smaller. Your lie is larger than the list you have collapsed. 88.15.83.61 (talk) 20:38, 24 September 2013 (UTC)[reply]
  • 88IP. you keep using that word lie. Please back up the assertion with diffs, retract, or face summary striking/removal of your attempts to lobby for Archive.is. Per WP:WIAPA Accusations about personal behavior that lack evidence. Serious accusations require serious evidence (are a personal attack). Evidence often takes the form of diffs and links presented on wiki. You are now being directly challenged to back up your accusations of lies. Hasteur (talk) 12:29, 25 September 2013 (UTC)[reply]
So they are now using even less descriptive edit summary as not to be caught. Personally, this just further lowers my trust in the user and their true motives. There is no reason to be this persistent to circumvent our guidelines and policies. —  HELLKNOWZ  ▎TALK 10:20, 25 September 2013 (UTC)[reply]
  • By the way, why do you think that there was a bot?
  • It is not clear from the #Points to consider.
  • Yes, using interleaving IPs from different continents looks suspicious and indicates proxy usage.
  • But, as IP users, they:
    1. are limited in the number of edits per unit of time. I do not know the exact limits, but they exist. Roughly the fourth or fifth save in a row will fail because of this limit. That is a much simpler reason to rotate proxies than the conspiracy theories.
    2. have to solve a captcha; at least some part of the job (if not all) was done by a human, not by a bot. 95.225.130.13 (talk) 17:11, 25 September 2013 (UTC)[reply]
MediaWiki's normal captcha is known to be weak, so weak that the developers don't even recommend other wikis use it. And spammers can easily farm out captcha solving to 3rd world countries. The fact that they have access to so many proxies or a botnet means they could have other black hat tools and services. Mr.Z-man 18:59, 12 October 2013 (UTC)[reply]

Legality question

  • Comment - The comment was made above that, if the operation is illegal, it will eventually be taken down. That raises the question of what law the operation may be violating. Icelandic law? Czech Republic law? (The domain is registered in Iceland based on the TLD, but its registered address is in Prague, and one of its nameservers is in Liechtenstein.) US federal law? California law? Florida law? The amorphous nature of the archive gives me reason to wonder what is going on. Robert McClenon (talk) 16:14, 2 October 2013 (UTC)[reply]
  • First, are the IPs "residential" in the sense of being home computers? If so, are they static IP addresses? Can dynamic IP addresses be used as open proxies? Second, are the IPs compromised by malice, or by ignorant configuration? In the 1990s, when open proxies were a common means for the transmission of spam, they were usually the result of ignorant configuration, not of malice. (Hanlon's Razor applies. Never attribute to malice what can be explained by stupidity.) Third, is the argument that the archive itself is illegal, or that the use of the open proxies is illegal? I am suspicious. The principle of "innocent until proved guilty" does not apply, because this is not a criminal proceeding, and is a potential matter of safety and computer security. However, I am not persuaded that anything is illegal. We just don't know that it isn't illegal, and that is sufficient reason for caution. Robert McClenon (talk) 21:28, 2 October 2013 (UTC)[reply]
Many of these IPs appear to be standard home computers sitting in FTTH and DSL pools. Home computers are rarely simply "misconfigured" as open proxies with modern operating systems and firewalls. I haven't argued that archive.is itself is illegal, only that Rotlink appears to be using illegal means to link to it.—Kww(talk) 22:13, 2 October 2013 (UTC)[reply]
I agree with Kww that this looks suspiciously like a botnet. For me, we don't need to concern ourselves with the legality of his actions except for this: Why the heck would someone be using a botnet to spam their website all over Wikipedia? The extreme measures that Rotlink is taking to promote his project are the most worrisome aspect of all this, and the reason I think we should scrub his site off Wikipedia entirely, just to be safe, and to discourage this type of self-promotion. Someguy1221 (talk) 23:24, 2 October 2013 (UTC)[reply]
  • Yes, but you're just saying that they're compromised machines. I agree that it's possible that some of them are compromised machines, but you've been making it sound like he's personally breaking the law. He isn't; if he's using what I just used, then he's using a publicly available service which you can find on Google with obvious search terms. I don't presume that those proxies are compromised machines, even though I will admit that it's a possibility. That having been said, I think it's also possible that many of them are run for the purpose of evading censorship. Ethics is a slightly separate matter, because it is ethically questionable to go around access restrictions, but your concern has seemed to primarily be a question of legality and not ethics. I would tend to agree with your ethical concerns; I just don't think that the violation of ethics rises to the point where it's a good idea to throw the baby out with the bathwater. Qalnor (talk) 01:26, 3 October 2013 (UTC)[reply]
  • Comment It seems that we are headed for the complete removal of archive.is links from WP – a move I still contest based on a currently potential or imaginary threat, and the undoubted disruption that such removal would cause to all users. I'm a pragmatist and dislike sanctimoniousness or scaremongering. Our ever-vigilant sysops have apparently blocked the worst of the offences, so that ought to be an end to the matter. Having said that, the high level of activity of IPs in this discussion (suspected as being archive.is principals) isn't doing their cause any favours.

    The attempt to link to WP on a larger scale is very likely to be a commercial strategy to drive more traffic their way, opening its doors to future advertising revenues. Is that something we ought to seek to deny them? If so, we should similarly blacklist any source/citation that carries any advertising banners. Our assembled editors all seem to want an archiving site that fulfils our needs, capable of pre-emptive archiving, and which is financed according to the lofty ideals of WP/WMF. Fact is, linkrot isn't going away, and these things cost money. Not a lot in internet terms, but still needs to be paid for. Whilst the British Library have started a UK-based archive (still in its infancy), it's limited in geographical scope and doesn't allow for archiving on demand. We can't have it all our way unless the WMF agrees to set up and finance an archiving function of our own without being beholden to others. The derived problem will be the lack of independence from WP, spam archiving, and the inevitable submission to Wikipolitics. But until we have a coherent strategy on how to combat linkrot, and viable and stable external archive suppliers, any work people do to archive the internet for posterity (with respect to our usage) must be welcome if it complies with our policies and guidelines. We should draw a line under past offences and be done with it. -- Ohc ¡digame!¿que pasa? 03:11, 3 October 2013 (UTC)[reply]

Personally I would be reluctant to 'forgive' 'offences' of this nature anyway. And I used the quotations for a reason. It's not really a matter of forgiving. If someone were to run an unauthorised bot, I would find that dubious but not the sort of thing for which I would automatically consider harsh actions like a complete removal. Even some minor block evasions. But when someone uses such a large number of open proxies (regardless of the permission issue) or a botnet or whatever? That's the level where I'm thinking it's too dubious to trust. (In other words it's not offences I'm worrying about but untrustworthiness.)
Which ties in to the other problem. If we knew beyond a reasonable doubt that the people behind this were definitely people involved in archive.is I would have supported a complete ban of the site long ago. It's not just a matter of adding ads, but as others have said, a matter of not being able to trust someone who would resort to such measures to provide what we need, a reasonably safe and trustworthy archiving site. Who knows whether they will one day decide to modify content for their own purposes or for financial gain, or whether they may allow malware to spread from their site or whatever. The simple fact is when the people behind a site are willing to resort to such dubious and ethically questionable methods, it's entirely reasonable to be suspicious and not trust them. Pre-emptive removal therefore seems reasonable.
While it's true normal sites also have a risk of changing in that way, particularly with stuff like domains expiring, the difference is for most normal sites it's just an ordinary everyday risk rather than having a reason to actively distrust.
The problem as I said earlier, and one reason I have not commented before, is how we verify that it's really people involved in the site behind this. The fact it may seem likely because they will benefit most isn't great evidence. And so far all we have that I've seen is that someone involved in the attempts claims a connection. This also isn't great evidence since the stuff we should worry about like a false flag operation or joe job would clearly involve the same thing. (Not necessarily a rival, could simply be someone who personally dislikes the site or its owners.) Even for a fan or supporter being the perpetrator, it's easily possible a misguided person may claim some connection that doesn't exist perhaps to give weight to their efforts. The sort of evidence we need is some indication from people we definitely know to be involved in archive.is claiming involvement in these efforts.
Normally in cases of blacklists and similar on Wikipedia, it's not a big deal. The purpose of the blacklist is to protect Wikipedia (or whatever WMF projects), and they are generally only used when the site is rarely useful. So whether it's someone tied to the site, a fan, or even an opponent trying to defame or damage the site, it doesn't matter as the site suffers no major loss and we suffer no major loss since we shouldn't be linking to them much anyway. The problem here of course is archive.is could be useful and would be the sort of thing we may use, were it not for the people behind it possibly being involved in spamming us in highly questionable ways, but of course that would be particularly unfair if it's an enemy trying to damage them.
P.S. I looked more carefully and found [16]. I still don't find it that compelling. It's perhaps a lot less likely although still not impossible it's simply a fan or supporter claiming a connection that doesn't exist. But for an opponent, it doesn't really add anything since any details which we could verify an opponent could also find so it just proves the person behind it is fairly dedicated. Well unless these details only became available after the claims.
BTW, I didn't really mention before but even if we could verify the connection between Rotlink and the site, while this would be another nail, it wouldn't really be enough to confirm that the IPs are mostly coming from the same source, as opposed to someone else who is e.g. seeing a good target because of the history or even simply doing it because of the history. On the other hand, if we can confirm the connection, this would suggest the person is aware of the problems we had with Rotlink and unless they decided to simply leave, it would seem likely also aware of the problems we are having with the IPs. The fact there's been no apology for Rotlink nor a denial of usage of the IPs would seem quite suspicious and the sort of thing where my good faith is disappearing.
Also has anyone ever approached someone at archive.is about the problems? I did see [17] [18]. My problem here is particularly with stuff like Wikipedia:Archive.is RFC/Rotlink email attempt, I'm a bit unclear if what happened is User:Lexein emailed someone clearly behind archive.is, e.g. using an email address or contact form from the archive.is site, who confirmed they are behind Rotlink. Or whether they simply emailed Rotlink who said they were the owner which Lexein accepted in good faith. These are obviously quite different scenarios and the above discussion isn't sufficiently clear to me. Edit: A third scenario would be they emailed Rotlink and after multiple replies it was confirmed that the person they were communicating with was using an archive.is domain. This would confirm some connection although unless the email is also listed on archive.is, it's still possible it's simply someone who knows the person or people behind the site who too readily allow people to use email addresses under their domain name. In that case it wouldn't be unreasonable for us to treat the person as being who they say they are, although it would still have been better to verify with someone clearly listed as a contact point for the site.
Nil Einne (talk) 15:11, 4 October 2013 (UTC)[reply]
What I did was stated at the top of Wikipedia:Archive.is_RFC/Rotlink email attempt. I clicked the "Email this user" link on User:Rotlink's page. The first response I got back, and all following responses, were from "Denis", whose first name matches Archive.is's whois report. The responses did not come from an archive.is domain. You're welcome to do the same thing I did, and you'll likely get a response from the same domain I did. (I'm not totally up for revealing the email address here). But I'm willing to forward the whole email thread directly from my inbox to OTRS, to provide you with an avenue to verify the accuracy of the "email attempt" page. Beyond that, confirming identity, and getting straight answers, have both been frustrating, as I've stated elsewhere. --Lexein (talk) 22:25, 28 October 2013 (UTC)[reply]

Ad-supported archives


Unless and until Wikipedia shows it is able to bail out or take over from WebCite, we're in a dicey position. An ad-supported archive has every reason to be aggressive in trying to get Wikipedia to use it, and if there are multiple ad-supported archives, they could end up having bot edit wars to try to take over each other's traffic. But if we are not going to actually support the non-ad archives and they end up shutting down, what other option do we have? Wnt (talk) 15:53, 3 October 2013 (UTC)[reply]

See note about ads above. --Lexein (talk) 08:30, 4 October 2013 (UTC)[reply]
I'm surprised that you're perplexed. The answer is obvious: because copyright infringement with ads is commercial copyright infringement, which is morally and sometimes legally worse.--greenrd (talk) 21:41, 12 October 2013 (UTC)[reply]

WMF Notified


In the interests of transparency, I have contacted the Wikimedia Foundation regarding this. Reaper Eternal (talk) 02:31, 5 October 2013 (UTC)[reply]

WMF really should be tackling this problem head-on, especially because it periodically says it is working on projects to tackle link rot, which tends to disrupt any community-led approach to the problem. Either WMF should take responsibility or tell the community to solve the problem itself. And if the situation is the latter, then the community should take responsibility for doing more than just saying yes, link rot is a problem, which it has done in several RFCs, and take some next step. Otherwise, based on the central Wikipedian principle of people contributing their time and energy to fix articles, we get ad-hoc solutions like this which are then subjected to inconsistent debating independent of the greater link-rot debate, leading again to differences between community-approved sentiment (the link-rot debates often talked very favorably about a bot-led solution to the problem, at least as a short-term fix) and administrative decisions. When those two diverge, editors become less respectful and more resentful. Ultimately, the community and the WMF need to understand that if you say you're going to address a problem, do it; do not just express empty sentiment with no way forward. Jztinfinity (talk) 17:06, 12 October 2013 (UTC)[reply]
Can you link to where the WMF says it is working on projects to tackle link rot?--greenrd (talk) 21:44, 12 October 2013 (UTC)[reply]
During the Google Summer of Code for 2011, the Wikimedia Foundation and its contractors worked for 3 months with the Internet Archive to have the MediaWiki software automatically archive all of our citations. mw:Extension:ArchiveLinks was created as a result, but it has not been enabled on any projects yet and the extension was archived a few days ago. 64.40.54.39 (talk) 23:01, 12 October 2013 (UTC)[reply]
The discussion above is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.