
Wikipedia:Requests for comment/Archive.is RFC 4: Difference between revisions

Edit summary: →NOINDEX: re (subst unsigned)
:::'where all non-indexed RFCs are normally placed' .. there is however no real regulation for that.
:::Funny, a couple of minutes ago you were in Algeria, and yesterday in the Ukraine. And I already answered that question: there is no policy or regulation for that on Wikipedia. You have however neatly evaded the question. --[[User:Beetstra|Dirk Beetstra]] <sup>[[User_Talk:Beetstra|<span style="color:#0000FF;">T</span>]] [[Special:Contributions/Beetstra|<span style="color:#0000FF;">C</span>]]</sup> 13:32, 26 May 2016 (UTC)
:::: There is established practice coded in robots.txt. There is established practice to place new RFCs under /wiki/Wikipedia:Requests_for_comment/. Well, no one has to follow it, but no one has to resist it either. There must be a reason why an experienced editor does not follow it and instead creates an RFC with an SEO-optimized name that avoids robots.txt - that is not a random event. And then fights against NOINDEX on the pages - that is definitely not a random event.


<!-- END OF DISCUSSION SECTION -->

Revision as of 13:41, 26 May 2016

Background

Archive.is is an archiving service similar to WebCite and the Wayback Machine. archive.is is currently on the en.wikipedia.org spam blacklist (Wikipedia:Spam blacklist), which prevents its use as a reference source URL.

Based on the questions of consensus raised on the spam blacklist talk page, the community should discuss and vote on whether the previous consensus, established in Wikipedia:Archive.is RFC 3, should remain in force.

Effect of Blacklisting ("Blocking")

Article source links that refer to archive.is are currently disallowed. Any edit incorporating a new URL that starts with https://archive.is/ will return the editor to their draft changes (Edit view) with a warning notice which reads, in part:

Your edit was not saved because it contains a new external link to a site registered on Wikipedia's blacklist.
To save your changes now, you must go back and remove the blocked link (shown below), and then save.
...

The following link has triggered a protection filter: archive.is

(Here is a picture of the entire warning notice.) The warning notice prevents any change referencing an archive.is URL from being saved.
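
For illustration, each line of the spam blacklist (MediaWiki:Spam-blacklist) is a regular-expression fragment that is matched against external-link URLs in the wikitext being saved. The production entry may differ; a hypothetical rule blocking this domain would look something like:

  \barchive\.(is|today)\b   # hypothetical entry; matches archive.is and archive.today URLs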

Previous RFCs

For review, there have been three previous RFCs on this topic.

Editors are invited to consider them before voting.

Instructions

To add your vote,

  1. add a numbered entry under the appropriate section, like # short comment. --~~~~, or simply # --~~~~.
  2. (optional) further discussion can be added under the Discussion section.

For example, within the Support Vote section, an entry might read:

  1. archive.is is the bee's knees. --User1 (talk) 00:00, 1 May 2016 (UTC)

or, within the Oppose Vote section:

  1. archive.is is the first horse of the apocalypse. --User2 (talk) 00:00, 1 May 2016 (UTC)

Voting

Remove archive.is from the Spam blacklist and permit adding new links (Oppose/Support)

Support Vote

Enter a single-line, numbered, signed vote here if you support removing archive.is from the Spam blacklist.

  1. archive.is is desperately needed because sources are rotting, and the reasons for blocking are unfounded. --JamesThomasMoon1979 05:23, 18 May 2016 (UTC)[reply]
    Citation needed Hasteur (talk) 12:37, 26 May 2016 (UTC)[reply]
  2. Support Per Jtmoon. archive.is is the best archive we have. Archives pages better than any other site, and archives more pages. Was blocked because an unauthorised bot inserted archive URLs pointing to archive.is. Similar to what the authorised bot CyberBot II now does for archive.org. There was never any concrete evidence of involvement of archive.is with the unauthorised bot (or CyberBot II with archive.org). No reason to believe there would be any problem if the site was removed from the blacklist. Hawkeye7 (talk) 06:29, 18 May 2016 (UTC)[reply]
    Supporting the change in wording. The whole point is to enable the use of archive.is. Hawkeye7 (talk) 22:26, 20 May 2016 (UTC)[reply]
    WP:PERX Hasteur (talk) 12:37, 26 May 2016 (UTC)[reply]
  3. Indeed, other archiving services sometimes fail to archive parts of sites (usually "Web 2.0" elements, but WebCite fails on even simple parts like the "PREMIERE" indicator in Zap2it's TV schedule (the episode guide is messed up for the series in question, so that's not an option)) when archive.is manages to do so. There is no evidence for a "botnet", or of it being affiliated with the site. For all we know, maybe they got what they wanted by getting archive.is blacklisted. nyuszika7h (talk) 06:45, 18 May 2016 (UTC)[reply]
    So you are electing to willfully ignore the numerous IP editors making the same types of edits as RotlinkBot for 5 edits each and then hopping to another IP? Hasteur (talk) 12:37, 26 May 2016 (UTC)[reply]
  4. Support - Per Jtmoon and Hawkeye7. It seems counter-productive blacklisting what is obviously a useful tool. Sometimes it is the ONLY archive of a source available. Nothing sinister about it that I can see, so surely its better than the alternative, which is sometimes nothing? Anotherclown (talk) 10:37, 18 May 2016 (UTC)[reply]
    So you're electing to ignore the part where Archive.is strips out the original advertising on the page and inserts their own advertising links to monetize themselves, thereby making it not a true archive of the page? Hasteur (talk) 12:37, 26 May 2016 (UTC)[reply]
  5. Support - No serious reason for continued blocking. Various fears about archive.is have not materialized over 2.5 year period, the site offers useful service, it is not banned on other language wikis, and existing older links to it have not caused any serious damage.--Staberinde (talk) 15:12, 18 May 2016 (UTC)[reply]
    You mean other than archive.is being caught circumventing our ban again in February of this year? Hasteur (talk) 12:37, 26 May 2016 (UTC)[reply]
  6. Support — The blacklist was a very poor solution to a pretty minor problem. The site itself is not the issue, but rather how a few users chose to override bot-policy. It should not remain. Carl Fredik 💌 📧 19:35, 19 May 2016 (UTC)[reply]
    And then after the ban, the bot (or sockpuppet) actions still continuing in defiance of the ban? Hasteur (talk) 12:37, 26 May 2016 (UTC)[reply]
  7. support Problems with archive.is have not materialized. That said, I think that it is naive to think that the problems people had with archive.is weren't actual problems caused by folks associated with archive.is. I was the closer of the first RfC. Hobit (talk) 16:00, 21 May 2016 (UTC)[reply]
    You mean other than them getting caught bulk-inserting themselves in February of this year? Hasteur (talk) 12:37, 26 May 2016 (UTC)[reply]
  8. Support. Archive.is is a legitimate archival service. The unauthorized bot incident was sad; the community here was unsettled. But the response to it was akin to terminating all relations with a country just because a tourist from that country broke a law. It is a good service. Let's put it to use. Best regards, Codename Lisa (talk) 10:50, 22 May 2016 (UTC)[reply]
  9. Support. BTW, I'm sick and tired of the carpet bombing by the malicious 'Blacklisted link' template flagging articles for no good reason. Poeticbent talk 06:43, 23 May 2016 (UTC)[reply]
  10. Support – Archive.is is a legitimate archiving service, unlike Internet Archive which retroactively deletes pages based on changes to robots.txt after archiving, and WebCite which sometimes does not archive pages correctly. Archive.is is also able to create archives from Google caches. I have run into many occasions when repairing dead links where Archive.is is the only archiving service available. The benefits of adding Archive.is back outweigh the minor disadvantages. SSTflyer 07:48, 23 May 2016 (UTC)[reply]
    Before anyone uses archive.is not respecting robots.txt as an argument to oppose, I'd like to note that there is a manual takedown process where you can report an archived page for a variety of reasons. Not sure how that works out in practice. nyuszika7h (talk) 11:59, 23 May 2016 (UTC)[reply]
    No, it's a strong argument for using it. Plastering robots.txt on your site should not make it immune to manual archiving. Carl Fredik 💌 📧 12:34, 23 May 2016 (UTC)[reply]
    @CFCF: Actually, it also scrapes Wikipedia's recent changes and archives links added to articles, but I agree with you, I don't like how archive.org also makes all previously archived copies unavailable if a site's robots.txt blocks it on the next crawling attempt. nyuszika7h (talk) 12:45, 23 May 2016 (UTC)[reply]
    @Nyuszika7H: "Actually, it also scrapes Wikipedia's recent changes ..." - that does mean that the people behind archive.is are actively trying to be a good archive for Wikipedia, and hence that they would have good reason to be linked from Wikipedia. That does strangely coincide with the MO of User:Rotlink and the IPs. --Dirk Beetstra T C 04:58, 25 May 2016 (UTC)[reply]
  11. Support, in large part because the whole thing was a big overreaction, as Codename Lisa notes. Plus, it's a benefit if they don't follow archive.org's policy of dumping pages based on later robots.txt changes. Nyttend (talk) 12:51, 23 May 2016 (UTC)[reply]
  12. Support - if it's being used for spammy links, those need to be dealt with individually. The service is way too useful to encyclopedic purposes - David Gerard (talk) 13:21, 23 May 2016 (UTC)[reply]
  13. Support—as noted above, the original decision was an overreaction, the site is useful, and we can deal with case-by-case situations of spamminess. Imzadi 1979  13:28, 23 May 2016 (UTC)[reply]
  14. Weak support for last resort There's no question that archive.is handles more scenarios, as the others haven't adapted to newer web technologies. I am concerned about the lack of respect that archive.is shows for internet standards, particularly its ignoring of robots.txt (a minimal robots.txt example is sketched after this list). Content producers have the right to control what they produce, and often use that file to control it. archive.is, in part, can archive more links because it ignores the expressed wishes of content producers that use that file to control access. And yes, I read their FAQ. My preference would be to encourage archive.is only as a last-resort archival service. Ravensfire (talk) 14:17, 23 May 2016 (UTC)[reply]
  15. Support per all previous supports. (Note I am a long-time user who now prefers to edit as an IP. I don't see anything about IP's not being allowed to weigh in here.) — Preceding unsigned comment added by 173.17.170.8 (talk) 17:08, 23 May 2016
    IPs are not debarred from participating in RfCs; however, you are still expected to WP:SIGN your posts. --Redrose64 (talk) 19:49, 23 May 2016 (UTC)[reply]
  16. Support Any issues can be sorted locally. Only in death does duty end (talk) 19:51, 23 May 2016 (UTC)[reply]
  17. Support like any other site on the web, spam links can be removed case by case. The bot stuff is in the past now. Pinguinn 🐧 20:14, 23 May 2016 (UTC)[reply]
  18. Support per virtually everyone. Appears to be a useful way to prevent link rot. Regards, James (talk/contribs) 22:18, 23 May 2016 (UTC)[reply]
  19. Support — as stated above, it's useful, and links have to be dealt with individually. -- Hakan·IST 07:04, 24 May 2016 (UTC)[reply]
  20. Support per all above me.--The Traditionalist (talk) 16:14, 24 May 2016 (UTC)[reply]
  21. Support but only where other archival services are insufficient. If there's a worry about permanence, then create a bot that submits all archive.is archives used on wiki to the Internet Archive and replaces them. I don't buy the botnet argument. I think it's just run-of-the-mill spamming, either by a resourceful person or via social engineering. That said, I do agree that archive.is shouldn't be a preferred archival service, and in the vast majority of cases editors should be firmly encouraged to use the Internet Archive to take snapshots instead. —/Mendaliv//Δ's/ 21:47, 24 May 2016 (UTC)[reply]
  22. Support, but only as an option of last resort. Kaldari (talk) 22:47, 24 May 2016 (UTC)[reply]
  23. Support, not only because of the above (I especially agree with Codename Lisa) but for similar reasons to those for which I updated my position in RFC 3. In regard to possible continuing abuse, is it possible to have an edit filter tag posts that add the links? PaleAqua (talk) 05:25, 25 May 2016 (UTC)[reply]
  24. Support, it is a useful archive tool that preserves the formatting of pages more faithfully than WebCite and the Wayback Machine. The blacklisting was an overreaction. -- Ham105 (talk) 13:46, 25 May 2016 (UTC)[reply]
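
For reference, the robots.txt convention debated in several of the votes above is a plain-text file served from a site's root; a minimal hypothetical example reads:

  # https://example.com/robots.txt (hypothetical)
  User-agent: *          # applies to all crawlers
  Disallow: /archive/    # asks crawlers not to fetch this path

As noted in the votes above, the Wayback Machine honours such rules (even retroactively hiding earlier captures), while archive.is does not treat them as binding for manually requested snapshots.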

Oppose Vote

Enter a single-line, numbered, signed vote here if you support keeping archive.is on the Spam blacklist.

  1. Close this RfC as the result is meaningless - this does not address the issues and consensuses of the previous RfCs and a supporting closure would not change anything. --Dirk Beetstra T C 05:16, 19 May 2016 (UTC)[reply]
    Consensus can change. This RfC will establish a new consensus that will overturn the original RfC. Hawkeye7 (talk) 08:12, 19 May 2016 (UTC)[reply]
    No, this will not overturn the consensus of the previous RfC, as it is not handling all aspects of the previous RfC. A consensus that supports not using the blacklist to prohibit additions does not automatically mean a consensus to stop removing the links, or even a consensus that no new links can be added. Consensus can change, if that is appropriately assessed. --Dirk Beetstra T C 08:35, 19 May 2016 (UTC)[reply]
    Most of the comments here suggest that the reason for the block was "unfounded" as such this consensus can certainly overturn the older one. Carl Fredik 💌 📧 22:36, 19 May 2016 (UTC)[reply]
    I would also point out this RFC merely removes it from the blacklist; it would not overrule local consensus at an article to exclude an archive.is link. It merely permits the adding of them, subject to normal editing consensus to do so. Only in death does duty end (talk) 19:49, 23 May 2016 (UTC)[reply]
    Most of the commenters here apparently did not do their research, or are basing themselves on naive thinking: they think that the initial abuse through the additions of archive.is came from some interested independent party that was only here to help Wikipedia, and not from the people behind archive.is. Oppose - get your facts straight. --Dirk Beetstra T C 03:34, 24 May 2016 (UTC)[reply]
  2. Oppose There are plenty of legitimate archiving services (archive.org is the main one) that take care of the link rot problem without introducing any of the many problems of archive.is. Jackmcbarn (talk) 00:32, 22 May 2016 (UTC)[reply]
    Archive.is is a legitimate archiving service. It invariably does a better job than archive.org or webcite. Regrettably, the number of archiving services remains pitifully few. Hawkeye7 (talk) 01:45, 22 May 2016 (UTC)[reply]
    As explained above, there are many cases where archive.org and WebCite are incapable of archiving certain parts (or even any meaningful content) of websites and without that the references are useless. I wouldn't mind preferring other archives when they are usable (though it's probably too late to add a section for that to this RfC now), but there is no reason to block archive.is altogether. And we shouldn't waste editors' time by making them request whitelisting whenever they can't get any other working archive. (Not sure if there have been any cases of these requests and whether they would be accepted or not based on the previous RfCs, as far as I know they did not address individual whitelisting.) nyuszika7h (talk) 20:30, 22 May 2016 (UTC)[reply]
  3. Leave on blacklist : No legitimate archiving service would use illegal botnets to add the links. It's interesting that this RFC doesn't mention that piece of archive.is's history, and I expect that the votes to remove it are based on ignorance of that fact.—Kww(talk) 22:48, 23 May 2016 (UTC)[reply]
    Archive.is is no more responsible for that than archive.org is for CyberBot II. Hawkeye7 (talk) 23:29, 23 May 2016 (UTC)[reply]
    We aren't talking Rotlinkbot and contravention of Wikipedia policies: we are talking actual criminal trespass, where hundreds of user computers were compromised. It was a criminal act which directly benefited archive.is and only benefited archive.is. It's pretty unlikely that the only beneficiary of a criminal act was somehow completely uninvolved.—Kww(talk) 23:35, 23 May 2016 (UTC)[reply]
    I also find it pretty striking that very close to the moment that archive.today was deregistered User:Rotlink went around and changed all of those links to archive.is. No-one not involved with archive.is would do that service. Seeing that User:Rotlink and User:Rotlinkbot, and all the IPs show exactly the same behaviour does nothing but suggest that archive.is themselves were behind the actions. Keep sticking your heads in the sand, and pretend that you are not being led by companies so that they can make money through you. --Dirk Beetstra T C 03:34, 24 May 2016 (UTC)[reply]
    1. "No-one not involved with archive.is would do that service." By this logic, all Featured Articles of Wikipedia are written by people directly affiliated with their subjects! Have you ever heard of the word "fan"? Fans are more steadfast and even fierce in their support than people actually affiliated with the matter.
    2. Quality of the service is another matter. That bot affair was sad. But not using archive.is is cutting off the nose to spite the face!
    Codename Lisa (talk) 08:04, 24 May 2016 (UTC)[reply]
    Have you ever heard of spammers? Spammers are even more fierce in their support, as it is .. making them money. They are sometimes more resilient than fans (and some have been around for years and still continue). In that case, ask the WMF to renew their Terms of Use, and open a Phabricator ticket to remove the spam-blacklist extension, as they are both meaningless. You act as if spam does not exist.
    I can agree with the problems, though not using archive.is would not be detrimental to Wikipedia (as many of their archived pages can be found elsewhere, and even if not, that does not mean that the information is not verifiable anyway), even if some people are acting as if it is the end of Wikipedia. --Dirk Beetstra T C 08:44, 24 May 2016 (UTC)[reply]
    Oh, by the way, fans are involved with the subject, and they do have a reason to advance a subject. It takes effort to adhere to our pillars, and if you are right, you just give them an excuse not to adhere to our pillars. --Dirk Beetstra T C 08:46, 24 May 2016 (UTC)[reply]
    You are making assumptions here, you have no evidence. As I said earlier, it could have been someone who wanted exactly this, to get it blacklisted (I don't know why anyone would want to do that, but people can do surprising things). But regardless of who did it, that was years ago. The main purpose of the blacklisting was to stop the unauthorized mass addition of links because there was no other way to stop it. And Beetstra, "many of their archived pages can be found elsewhere" – the key word here is many, but not all. There's no reason why we shouldn't make it easier for people trying to verify the information and provide an archive link just because it happens to be only available on archive.is. Let me ask you something, would you also consider pointing an editor trying to verify the information in a dead link to archive.is on the talk page a "blockable offense" (while archive.is is on the blacklist)? nyuszika7h (talk) 09:13, 24 May 2016 (UTC)[reply]
    "I also find it pretty striking that very close to the moment that archive.today was deregistered User:Rotlink went around and changed all of those links to archive.is. No-one not involved with archive.is would do that service." – We know he's a friend of the owner (he said that), but what you say is not necessarily true. When someone's site got taken over and they migrated to a new domain, I replaced the existing links on huwiki with AWB. And I'm just a frequent visitor of the site. nyuszika7h (talk) 09:25, 24 May 2016 (UTC)[reply]
    I know he is a friend, not necessarily involved in a way that is in 'violation' of our current terms of use. I am also aware that his English is good enough to communicate, and I would have preferred that some communication had come forward regarding the situation (either from the friend or from the owner; and IIRC the owner was asked to interact with us). So we know that this is closer to a 'fan' (and one rather close to the owner) than to a Joe job (I have seen such cases; it is how I ended up in the anti-spam business years ago). Also, this does not look like a Joe job, or at best a very, very badly executed one. I must say that for a friend the editor showed himself to be extremely persistent (given the nature of the IPs being used) in making sure that archive.is got linked (if you are so persistent that you are willing to run the risk that your friend's site gets blacklisted .. you may understand that I am somewhat reluctant to consider the friend's story to be the whole story).
    Regarding the 'many of their archived pages can be found elsewhere' - and I do mean many: based on the whitelisting requests, I have still to see a significant number for which there are no alternatives. And I hope you followed that discussion and related comments from me there - I am taking and advocating a stance there that is softer than the initial RfC (a stance reconfirmed in the second, though weakly); it is what we have a whitelist for (though the community decision of the RfC was to remove the links even when there was no working alternative).
    Regarding whether I find that a blockable offense - it depends on how it is done (I'll discuss this more broadly than your question). In a talkpage discussion, no (even if you were to find a non-blocked url-shortening service and use that to link to the page - I would however de-link and meta-blacklist the url-shortening service on sight, but without 'warning' the original editor). In mainspace, using wording like 'find the archived version on archive.is' would be inappropriate prose for a reference, and I have removed such instances (one non-archive.is recently: diff). If that were done repeatedly I would consider it disruptive in itself, not because of the 'evading' of the blacklist (I have used strong wording on someone who did that with another blacklisted link after being denied de-blacklisting, and I removed all the violating instances; I did not block the editor). If someone were to engage in intentional circumvention of the technical means that enforce the community decision made in the first RfC, which was reaffirmed in the third RfC (and I have seen such cases for archive.is), then yes, I would consider that a blockable offense - that is intentionally (and in the case I am talking about: knowingly) circumventing a community decision with which you do not agree. --Dirk Beetstra T C 11:08, 24 May 2016 (UTC)[reply]
    Whether or not evidence is needed is another question - the task of the technical means at hand for users with advanced rights (blocking, spam-blacklist, page protection, edit filters) is to stop disruption to Wikipedia. I do think we agree that the way the links were added was disruptive and counter to the practices described in our policies and guidelines. The original editors were running unapproved bots and were blocked. That did not stop the behaviour (so blocking was not enough), and page protection is obviously not an option here (too many pages involved). So the technical means was to stop the MO of the edits: either through blacklisting or through an edit filter. Whether it is the owner, a fan, a friend, your neighbour, monkeys, or a competitor's Joe job is irrelevant; the target is to stop the disruption to Wikipedia, and that is what the community decided should be done through a blacklist (at first enforced through an edit filter which practically did the same for 1 1/2 - 2 years and no-one cared; only when it was blacklisted (User:Jtmoon: attempts; User:Hawkeye7: attempts; User:Nyuszika7H: attempts - to take a few of the supporters for removal here) did people start to cry wolf .. funny). --Dirk Beetstra T C 11:44, 24 May 2016 (UTC)[reply]
    What does this mean? Hawkeye7 (talk) 12:23, 24 May 2016 (UTC)[reply]
    What does what mean? Editors have been prohibited from adding links for years, and the ones that complain hardly find any problems with that (you, Hawkeye7, needed it 5 times (12 edits) in just over 2 years, and it seems you found alternatives, as I did not see any requests (I may have missed those; they are hard to track) to get these links in) - but as soon as blacklisting is performed, it is suddenly not replaceable and a problem. --Dirk Beetstra T C 13:11, 24 May 2016 (UTC)[reply]
    Oh. I'm just a peon, so all it said for me was "One or more of the filter IDs you specified are private. Because you are not allowed to view details of private filters, these filters have not been searched for. No results." Hawkeye7 (talk) 13:18, 24 May 2016 (UTC)[reply]
    Sorry, I did not realise that you were not able to see the results of hidden filter hits either (strange). Anyway, to explain myself - when people started to complain at the blacklist/whitelist that they could not live without it, I started to look into how often they were hitting the blacklist/edit-filter because of this (they must have hammered it, right?). For some I saw several hits (though generally not more than about 12 in absolute numbers; you hit 5 unique pages, someone else in that stat had 12 hits on 4 unique pages - in over 2 years) and checked what they did in response (often: use another archive; as I said, I can't see whether they complained or requested an override). Some of the people who complained that archive.is was sometimes the only archive available NEVER hit the filter or the spam-blacklist at all (which, I think, also shows how needed the site is). The discussion that precipitated this short survey of mine (an archive.is link that was deemed needed) turned out to concern a page that was a) available in its original form on archive.org, b) available under a new, alternative path on the original server, and c) available on archive.org at the new path - hence replaceable in itself. Although I believe that there are cases where archive.is is the only archive, I do not believe that the situation is as dire as many here want us to believe (and I think that the onus is on the editors who want to revert the prohibition to show that the link is really needed so much that the community should override their earlier decision - what came out of the whitelisting requests and out of the filter/spam-blacklist does not show such an urgency). For fun, I looked at the two edits you did not find disruptive (I agree, the individual edits are not disruptive) - both are replaceable. --Dirk Beetstra T C 13:40, 24 May 2016 (UTC)[reply]
    So basically, the arguments a) that the blacklisting is unfounded because it was not the owner, or b) that it is unfounded because the threat has stopped, are invalid as well: the blacklisting/edit-filter did what it needed to do, and the involved editor is still active and interested in linking archive.is on MediaWiki (as recently as February 14, 2016) - show me evidence that we do not need such means anymore, or any guarantee that it will not restart (I have shown myself willing to experiment, and I know cases where such 'experiments' went very wrong)?
    And if we want to use the argument of 'some are not replaceable' - did anyone do the statistics on a significant subset of currently linked archive.is archives to see how many are really not replaceable? --Dirk Beetstra T C 11:44, 24 May 2016 (UTC)[reply]
    FWIW, I do not agree that it was disruptive. I looked at a couple of the links inserted by the unauthorised bot at random.[1][2] Both are fine; correct archive links were properly added for sites that were no longer available. In these cases at least (the first two I looked at), there was no greater disruption than the authorised bot CyberBot II, which tries to do exactly the same thing with archive.org. So it comes down to an unauthorised bot running. I understand the reasons for prohibiting unauthorised bots, but I'm not seeing actual damage to the encyclopaedia. Hawkeye7 (talk) 12:23, 24 May 2016 (UTC)[reply]
    The disruption was indeed in the way they were added, and as I explained, the only way the community found to stop that was to stop the MO of the editor by blacklisting (blocking and page protection were not cutting it). If the editor had stopped and worked with the community (which, until now, they have not done) this might have gone differently (but outreach from our side did not get any response, nor did they make that effort - and I know what that tells me). --Dirk Beetstra T C 13:11, 24 May 2016 (UTC)[reply]
    Hawkeye7, you realize that we are not talking an "unauthorized" bot in terms of violating Wikipedia policy, right? We're talking about a botnet: computers that have been compromised by virii and malware, with access sold by criminals to people that then use them to commit computer crime. It doesn't come down to an "unauthorised bot" running at all.—Kww(talk) 13:53, 24 May 2016 (UTC)[reply]
    What evidence is there of a botnet? There are many ways to change/spoof IP-addresses entirely without using a bot-net.
    (P.S. virus isn't pluralized virii)
    Carl Fredik 💌 📧 15:34, 24 May 2016 (UTC)[reply]
    Looking deeper, there seems to be no evidence — in the discussion leading up to the blacklisting: [3] — it was mentioned he had used open proxies — which should have been barred from editing from the get-go. This all screams total overreaction and poor judgement in instigating the block, and it should be opened up again for legitimate use. We may need better ways to target open proxies. Carl Fredik 💌 📧 15:40, 24 May 2016 (UTC)[reply]
    You think that those proxies hosted on residential computers around the world weren't there as a result of malware and virus infections? The very report you are linking to is complaining about illegal proxy use in support of archive.is only 4 months ago, belying Codename Lisa's assertion that all misbehaviour is long in the past.—Kww(talk) 15:51, 24 May 2016 (UTC)[reply]
    Firstly — where are you getting the idea that they are on residential computers? And secondly, even if they were, there is no indication whatsoever that a bot-net is involved. FoxyProxy is just one service that allows the average user to set up an open proxy; such proxies can later simply be scanned for across the entire internet, examples: [4], [5]. Carl Fredik 💌 📧 16:26, 24 May 2016 (UTC)[reply]
    There are numerous DSL and cable modems in the mix presented at WP:Archive.is RFC, CFCF: did you ever bother to actually analyse the IP list presented in the data, or are you just screaming "absolutely no evidence" without having devoted any time to analysing the data presented?—Kww(talk) 16:37, 24 May 2016 (UTC)[reply]
    No, I didn't analyze them myself — what I saw was a lack of any analysis presented on that page. There wasn't so much as a comment, and even if they are private computers there are any number of private users who host open proxies — as I have already described. Carl Fredik 💌 📧 16:44, 24 May 2016 (UTC)[reply]
    Just for a fun exercise, check out https://www.socks-proxy.net/ which scans for open proxies — you'll find loads of private addresses. There is no need for a botnet, and no reason to believe there was any illegal activity. We need to better patrol open proxies, but that isn't going to be solved using the spam-list.Carl Fredik 💌 📧 16:50, 24 May 2016 (UTC)[reply]
    Thanks for admitting that all of your comments have been made without analysing the evidence presented: I hope whoever closes this mess takes that into account.—Kww(talk) 16:59, 24 May 2016 (UTC)[reply]
    Kww — you as well as I know that is not what I wrote. I did not perform an independent network analysis of the used addresses, in part because after 4 years they are likely no longer operational, but also because that is required of you who make the allegations, not of me! There is nothing in the previous RfCs or discussions that points towards any illegal behaviour here — and insinuating that my arguments hold no value because I have not performed a WHOIS/ping/port scan of each address is fraudulent! I can add that in many jurisdictions such analysis is in itself illegal! Carl Fredik 💌 📧 21:00, 24 May 2016 (UTC)[reply]
    User:Hawkeye7, User:Kww - you might want to review this in that respect. --Dirk Beetstra T C 15:14, 24 May 2016 (UTC)[reply]
    Note that both are replaceable with alternative archives: the first one and the second one (is it me, or is the second reference conflicting with the data presented in the table on Wikipedia - the table reads 37, 25 and 5, while the archive talks only about "Never Cry Another Tear - 24/10/2009 - 24/10/2009 - 70 - 1" (the item does not seem to be listed in the table on Wikipedia)). --Dirk Beetstra T C 13:40, 24 May 2016 (UTC)[reply]
    Kww, please have a look at http://spys.ru/proxies/; you will find out that: 1) there are plenty of 'residential IPs' among public proxies and 2) they are not botnets but misconfigured SOHO routers.
    Also, Wikipedia cannot be edited from 'hosting IPs'. Even if the spammers took an unsorted public proxy list, the successful edits could have been made from 'residential IPs' only.— Preceding unsigned comment added by 203.170.75.14 (talkcontribs)
    News flash: Accessing a "misconfigured soho-router" without the owner's knowledge and consent is also a criminal act.—Kww(talk) 17:18, 25 May 2016 (UTC)[reply]
    1) Not at all. There are many research and commercial projects that access each and every address on the Internet, making no allowance for how well the endpoint is configured or on what premises it is located - such as shodan.io, scans.io, ..., not to mention projects like hola.org, which silently but nevertheless legally turn every computer where their browser extension is installed into a proxy server.
    2) I believe that you use the word "crime" metonymically as "what I personally would consider unethical". Otherwise, your confidence about their crime implicates you as a criminal for "forgetting" to notify the authorities. Regardless of whether their action was a crime or not, misprision is a crime.
    I will accept the technical correction that mere "access" is not a crime. Using it to transmit commands to other computers goes well beyond mere access.—Kww(talk) 20:03, 25 May 2016 (UTC)[reply]
    Any "access" implies transmitting a command to another computer, in that even a ping-request will require passing this back through a series of relays. Now if we want to use a more restrictive definition - these computers are not transmitting any commands, they are simply relaying it as proxies - where the request still originates from the original users computer. If configured voluntarily by the user, accessing such a proxy, if unprotected by passwords etc. would not constitute a crime in most locales. Carl Fredik 💌 📧 21:37, 25 May 2016 (UTC)[reply]
  4. Oppose There are 3 conditions I want to see met prior to reconsidering whether we want to allow Archive.is:
    1. an admission from Rotlink and the management of Archive.Whatever that they acknowledge that
      • their initial attempts to run a bot to add Archive.is links and to overwrite other archiving services with Archive.is were out of order under WP:BOTPOL
      • their initial attempts to "spam" their service's links into English Wikipedia without gaining a consensus to do so were out of order
      • after RotlinkBot was blocked and following the refusal to participate in WP:BRFA and consensus building, numerous IP addresses began making the same edits as RotlinkBot. This gave the impression that they were either botswarming on multiple different hosts or had deliberately planted software on residential computers in order to accomplish a fait accompli
      • they have not engaged collaboratively with the community to resolve the issues (namely ignoring robots.txt, stripping out original ads, injecting ads to monetize Archive.is, etc.) that caused their service to be unwelcome on English Wikipedia
    2. That they engage with the community to resolve the issues that have previously blocked the usage of Archive.is
    3. That they work with the foundation to provide a better source of retaining reference data for pages.
    Until these conditions are met, any relaxing of restrictions is premature and misguided, as the reasons for the restrictions have not been overcome. Hasteur (talk) 12:45, 25 May 2016 (UTC)[reply]
    @Hasteur: As I did with others, I'd like to bring the discussion here: Wikipedia:Administrators'_noticeboard/Archive279#Sockmaster_flagging_archive.today_links_en_masse to the table, to show the total neglect that Rotlink (and sockpuppets/meatpuppets) have shown for the situation here (which, in my opinion, does not give much hope that the spamming will not continue ..). --Dirk Beetstra T C 13:32, 25 May 2016 (UTC)[reply]
    @Beetstra: I was not up to speed on the recent history, but I am completely unsurprised that the same actions are still being jammed into the encyclopedia. Hasteur (talk) 14:14, 25 May 2016 (UTC)[reply]
    We have tried to get the Internet Archive to do these things, without success. Hawkeye7 (talk) 21:46, 25 May 2016 (UTC)[reply]
    You seem to be unable to tell the difference between users adding the links of their own choice and the service itself inserting the links. To use an analogy, you as a user of a computer are free to choose to use what web browser you want (in the generic case). Using Microsoft Office doesn't automatically oblige you to use Internet Explorer. Hasteur (talk) 23:40, 25 May 2016 (UTC)[reply]
    That is all I am asking for; the freedom to use the archiving service I want. Feel free to petition for CyberBot II to stop adding archive.org links until they comply with your preconditions. Hawkeye7 (talk) 02:22, 26 May 2016 (UTC)[reply]
    Has Internet Archive been caught red-handed trying to override community consensus? Has Internet Archive been caught using disruptive tactics after the community has rejected it? You still seem to think that Archive.is and Internet Archive are equivalent services. Let's try yet another analogy, since you're still missing the point. In your city, I assume taxi service is available, and there are regulations around taxi service. Internet Archive is acting like a responsible citizen and following all the regulations. Archive.is is acting like a pirate taxi operator, picking up fares wherever they want and ignoring regulations, safety concerns, and traffic laws. Currently we have a plan in place (in our analogy universe) whereby the police (admins) are empowered to stop the Archive.is cars, arrest the drivers, and impound the cars for willful disregard of the city (Wikipedia) policy. Finally, your repeated and willful ignorance suggests (at least to me) that you may have a PoV you should declare. Hasteur (talk) 12:13, 26 May 2016 (UTC)[reply]

Discussion (Request For Comment)

Arguments by User:Jtmoon

Support Removal
  1. immune from LINKROT
  2. other archiving services are suffering from link rot even as this is being discussed
  3. has proven itself reliable; operating the same way since 2012
  4. fast, reliable, and easy to use; all Wikipedians grasp it, compared to archive.org, which can be pretty slow, IMO.
Counter-Oppose
  1. most Oppose concerns are "Crystal Ball reasoning"; might get taken down, might host malware, might host spam
    1. most Oppose concerns apply to all other websites considered link-worthy: news sites, blogs, corporate sites, etc.
  2. since the Oppose position is one of censoring, the burden of proof should be on that side to present decent evidence for its concerns. So far, there has only been speculation.
    1. no proof provided of supposed spam-bots, malware hosting
    2. no proof provided archive.is had employed (or had reason to employ) a botnet
    3. no proof provided of any botnet
  3. commercial concerns make no sense as nearly all primary source news websites are commercial websites
--JamesThomasMoon1979 04:17, 17 May 2016 (UTC)[reply]

Previous RfCs

The previous RfCs have multiple aspects, not all of which are addressed in this RfC. The RfCs do include conclusions about removal of links (which would still have consensus), and the RfCs talk about prohibiting additions of links (which is also not addressed here). Can someone please appropriately address these issues, as the conflicting consensuses might now result in the blacklisting rules being removed while we are still to prohibit additions (which would mean that an edit filter should be re-enabled?), and we should still remove the links while the links are not on the blacklist. As the situation is now, the outcome of this to-be-RfC is not bringing anyone anywhere. --Dirk Beetstra T C 05:14, 19 May 2016 (UTC)[reply]

The previous RfC ended with no consensus on the removal of links, or on allowing the addition of links. It is understood that removal from the blacklist means that editors are free to add links wherever they think appropriate. Hawkeye7 (talk) 08:26, 19 May 2016 (UTC)[reply]
I quote from Wikipedia:Archive.is_RFC_3: "Result: No consensus — Given the prior RFC, this presumably means the stare decisis would be to continue prohibiting additions and continue removing." - that consensus will not be overthrown by this decision to de-blacklist; the consensus would still be to prohibit additions and continue removing, except that that would not be enforced with the blacklist but in other ways. And even if the consensus were not to use technical means to prohibit additions (so no blacklist and no edit filter, e.g.), the continued addition would still be against consensus .. and anyone violating that would be editing against community consensus; "Combined with the lack of consensus affirming the inverse of this item (i.e., #1), again we have to look to the prior RFC, which presumably means the stare decisis would still be to continue removal, unless I'm missing something." - also not overthrown by de-blacklisting; we would still need to remove the links. Can we now please set this RfC up in a proper way, as any conclusion here is meaningless. --Dirk Beetstra T C 08:33, 19 May 2016 (UTC)[reply]
When there is no consensus, the status quo stands. Consensus on this RfC will establish a new consensus. It is understood that removal from the blacklist means that editors are free to add archive.is links wherever they think appropriate. Hawkeye7 (talk) 09:08, 19 May 2016 (UTC)[reply]
Don't get me wrong, I have nothing against overturning the previous consensuses (and I might even !vote in favour of doing that), but administratively the problem is different. We currently have a community consensus to blacklist and remove - and that is what is being applied. I cannot make the single-admin decision to overturn that. The set-up of the current RfC does not overturn that either; it only means that the blacklisting is to be reverted, but there is still a standing consensus to prohibit. I would strongly suggest phrasing clear questions in the RfC that address the previous RfCs' conclusions: 'should the additions of archive.is links still be prohibited (yes/no)?', 'should existing archive.is links still be removed (when the conclusion of the first question is 'yes') (yes/no)?'. Positive answers to those are what you (we) want. I don't understand why the questions in this RfC cannot be addressed properly, a proper, new consensus obtained, and that executed. Why do things half-baked? --Dirk Beetstra T C 09:14, 19 May 2016 (UTC)[reply]
The very same reason that you or I didn't create the RfC in the first place! Because it places an intolerable burden on an editor trying to create an RfC. It restricts access to the whole RfC process to wikilawyers. So, adhering to WP:NOTBUREAUCRACY, which is a policy, a support vote should be considered a positive answer to all of the above. Remove from the blacklist (or whatever the hell it is that is preventing addition of links). Do not remove existing links. Allow addition of new links. Do not collect $200. Hawkeye7 (talk) 09:33, 19 May 2016 (UTC)[reply]
I have not been the one arguing to get the previous consensus overturned; I have ruled that that was needed for me to take administrative action. 'A support vote should be considered a positive answer for all of the above.' - considered by whom? By the administrator who has to judge what the consensus achieved here means for the previous consensus? There have now been two RfCs after the first one: one invalidated early, and a second with half-baked questions that did not properly get a consensus or clear answer (the previous RfC ended with two conclusions having '.. presumably means ..' in them). So I will anticipate the conclusion here: it will become 'the consensus here shows that the blacklist should not be used to prohibit additions, an action which is endorsed in the previous RfCs, which presumably means that the link additions should still be prohibited but that that should not be done using the spam-blacklist', and 'the additions should not be prohibited with the spam-blacklist, but are still prohibited, which presumably means that the links should still be removed'. A bit of working together in creating this RfC and getting proper questions laid out before starting (and I did edit this page early on, and commented in different places about it early on, all with similar suggestions) may get us a long way, and at least a clear answer that cannot be wikilawyered around or read in multiple ways. The question laid out here does, however, have multiple interpretations possible. --Dirk Beetstra T C 10:56, 19 May 2016 (UTC)[reply]
I reject your argument that it is impossible to remove the site from the blacklist. This RfC is for exactly that. Hawkeye7 (talk) 12:51, 19 May 2016 (UTC)[reply]
I am not making that argument, on the contrary, I make the argument that the consensus here could be that the site gets removed from the blacklist. It will however not counter the consensus reached by the earlier RfCs - that additions should be prohibited and that the existing links should be removed. --Dirk Beetstra T C 13:14, 19 May 2016 (UTC)[reply]
This RfC implicitly removes all restrictions on archive.is. Hawkeye7 (talk) 13:35, 19 May 2016 (UTC)[reply]
"Remove archive.is from the Spam blacklist) (Oppose/Support)" - explicitly ALL restrictions? I only see a question for one restriction. --Dirk Beetstra T C 13:37, 19 May 2016 (UTC)[reply]
Well there is nothing else that we can do. If wikilawyers want to try and obstruct the use of archive.is in the face of consensus, then it becomes a matter for ArbCom. Hawkeye7 (talk) 13:49, 19 May 2016 (UTC)[reply]
@Hawkeye7, Nyuszika7H, CFCF, and Beetstra: I've been tied up with IRL work and will be for a bit longer. Perhaps you users, or @Staberinde: or others, could address User:Beetstra's concerns. Thanks User:Beetstra for bringing this up.
@Beetstra:, I presumed that the prior RFCs, having seen no activity for several years and not having changed the blacklisting, could be left as is. That is, this RFC 4 would not need to re-present the arguments in those prior RFCs, since they seemed to be a stalemate. Secondly, I wanted to simplify the issue in this RFC 4, since re-presenting the arguments of three prior RFCs was too onerous for the editor (myself) and too onerous for other users to understand, and would only complicate this issue for little or no improvement to en.wikipedia.org.
--JamesThomasMoon1979 18:18, 20 May 2016 (UTC)[reply]


Considering that the RfC just started, we could simply add "Remove archive.is from the Spam blacklist and permit adding new links" to the proposal to cover this, and then ping everyone who has already voted in the section. There is really not much point in having 3-4 different questions here, as it is meaningless to remove it from the blacklist if we don't also permit adding new links. Any objections @Beetstra:?--Staberinde (talk) 15:31, 19 May 2016 (UTC)

Also this RfC isn't properly listed yet, but I guess that should wait until formatting objections have been handled.--Staberinde (talk) 15:46, 19 May 2016 (UTC)[reply]

Or we just ignore this inane objection on the grounds that it has no merit. Wikipedia is not a bureaucracy, and we can assuredly interpret this question and the responses to be sufficient for overturning an old consensus. We should interpret the spirit of the question, not the exact words. Any more ink spilt on this nonsense damages the sanity of anyone who's actually here to help improve the encyclopaedia. Carl Fredik 💌 📧 22:46, 19 May 2016 (UTC)[reply]
@Jtmoon, Hawkeye7, Nyuszika7H, CFCF, and Beetstra: I went ahead and clarified the proposal wording a bit by adding "and permit adding new links"[6]. Pinging all who already voted to avoid any later complaints.--Staberinde (talk) 16:11, 20 May 2016 (UTC)

Comments by Beetstra

  1. @Jtmoon: there are many alternatives that can be used, and you seem to have gone along quite well without (I presume you found alternatives or did not find it excessively needed when you were blocked 3 times in adding the site). The original spammers of 3 1/2 years ago (who were using accounts and multiple IPs (botnets)) were active on-wiki as recently as February 2016. The use of such practices, whether by site owners, fans, friends, or a competitor, is a very well-founded reason to use technical means (edit filter, blocking, page protection or the spam blacklist) to stop such edits. --Dirk Beetstra T C 05:35, 25 May 2016 (UTC) (reping: @Jtmoon: --Dirk Beetstra T C 05:35, 25 May 2016 (UTC))[reply]
    If it happens again it is easy to get User:Xlinkbot to scan for new IPs adding links and to ban them for being open proxies. Carl Fredik 💌 📧 09:39, 25 May 2016 (UTC)[reply]
    @Hawkeye7: - we've discussed this in the section below. I am still awaiting an answer there. --Dirk Beetstra T C 05:35, 25 May 2016 (UTC)[reply]
  1. @Nyuszika7H: Can you explain to me why you think that other archiving services fail to archive parts of sites, when you appear to have had the need to use it only once? Did you always use alternatives? Also, you are aware that there is evidence for a botnet (I hope that you properly analysed the provided evidence for that), and that the original spammers have used it as recently as February 2016. --Dirk Beetstra T C 05:29, 25 May 2016 (UTC)[reply]
    Once is enough for a user to know and remember that the site can not be linked — and already we have a negative impact. This isn't about the amount of use, it's about a completely asinine and incorrect application of the spam-list. Carl Fredik 💌 📧 09:42, 25 May 2016 (UTC) [reply]
    "Once is enough for a user to know and remember that the site can not be linked" – exactly. And in the cases I needed it, I didn't feel like requesting whitelisting, because I didn't even know that was possible – back then it was using the edit filter, I don't know if it's possible to whitelist with that, but I didn't know about MediaWiki:Spam-whitelist until some point either. And I didn't feel encouraged to request whitelisting the few times I needed it, based on other users' comments.
    I was not aware of the recent activity, but as someone else pointed out, there is no reason to believe machines were compromised, as there are a lot of users who run open proxies; there are thousands of open proxies and not all of them are (or can be) blocked. At the very least, we need to make sure that legitimate uses with no alternatives will be whitelisted. (Have any archive.is links ever been whitelisted? The previous RfCs didn't really address that; just one user mentioned whitelisting, unless I missed something.) But as evidenced by the circumvention, this is not the right approach in the long term. CFCF's suggestion of using XLinkBot sounds like a good idea to me. Although I guess sleeper socks could be used, that wouldn't be as easy to abuse. nyuszika7h (talk) 09:58, 25 May 2016 (UTC)[reply]
    @CFCF: I take the point that 'once is enough to know' - still, no-one ever ran into a sufficiently urgent need that they asked for ways around the filter, which still suggests that it is not very needed. --Dirk Beetstra T C 10:54, 25 May 2016 (UTC)[reply]
    Yes, and this could be considered a chilling effect on the type of content that would otherwise have been authored. Numerous studies and literature show how adding even minuscule hurdles to performing a task on the internet will decrease the amount it is performed by an order of magnitude. It causes users to think: "Oh, but there must be a really good reason why this is blocked" or "Oh, this isn't worth my time" or even "Oh, but that process is horrible, I don't want to force myself through it" — all of which disincentivise producing quality content. Carl Fredik 💌 📧 11:18, 25 May 2016 (UTC)[reply]
    Still it is telling, as is the fact that now suddenly we need to remove it from the blacklist (a decision taken 3 1/2 years ago, with an in-between re-discussion) .. because additions are not allowed and we need it .. that is a misinterpretation (and I have commented earlier on how this RfC is presented). --Dirk Beetstra T C 12:15, 25 May 2016 (UTC)[reply]
    @Nyuszika7H: I also considered XLinkBot earlier. That, however, also has collateral damage, though to a lesser extent (similarly, we could use an edit filter, though that is heavier on the server). In the time of the edit filter there was indeed no whitelisting possible. As spamming is likely still a problem, I think that XLinkBot/EditFilter may be needed as an alternative, and I think that that should be strongly considered.
    There are currently no archive.is sites whitelisted. For the handful of requests that I handled, alternatives existed (though I have indicated and suggested there that if there is no alternative, whitelisting could be carried out). --Dirk Beetstra T C 10:54, 25 May 2016 (UTC)[reply]
    @Anotherclown: Can you explain to me why you think it is a useful tool, when you appear never to have had the need to use the site? Did you always use alternatives? --Dirk Beetstra T C 05:29, 25 May 2016 (UTC)[reply]
Gday. I've never added it to an article for the very reason that it was blacklisted (hence the nil results in your search). There have been a few occasions when I would have as it was the only tool that had an archived copy of a particular dead link. As such the result of it being blacklisted was that unreferenced information was removed when it could otherwise have been referenced and retained. Anotherclown (talk) 00:08, 26 May 2016 (UTC)[reply]
  1. As I mentioned previously, the mere knowledge that the site is blocked is enough to have a chilling effect. Requiring users to jump through excessive hoops to whitelist links is ridiculous and leads to nothing beyond allowing you to display your authority in denying them access. Carl Fredik 💌 📧 09:48, 25 May 2016 (UTC) [reply]
    @CFCF: That is ad hominem and out of line with how I have acted there. --Dirk Beetstra T C 10:54, 25 May 2016 (UTC)[reply]
    I admit that may have been clumsily phrased — but the issue is that this centralizes the command to a single user or a small subset of users (you yourself have previously expressed that too few admins are willing to work with the black-/white-list). Wikipedia is supposed to refrain from this type of centralization of power — and a straw poll of the different requests on the whitelist page shows that most are being denied (many for pretty trivial reasons, too). Either whitelisting individual URLs (not entire page directories) should be automated so that any authorized/autopatrolled user can do it — or the spam list should only be used for cases where there is little cause for legitimate use of a website.
    That other similar websites are available is not an argument here at all — it should come down to preference which archival service is used. archive.is allows screen-shotting the page, and it archives the page as it loads for the user who made the request — for example bypassing regional variations in content. An example: go to www.nickelodeon.com or www.disney.com — there is no visible way to access the other regional sites. It is possible that this type of censoring will spread to sites more akin to those we use for referencing on Wikipedia. In fact, this is already occurring: http://speisa.com/modules/articles/index.php/item.2446/daily-mail-blocks-sweden-for-legal-reasons.html
    Traditional archival services such as archive.org have nothing to offer in this type of situation, while archive.is is able to accurately portray differences in the loaded sites. These are all rare scenarios, but adding the burden of requesting whitelisting (a very cumbersome process) is damaging to the encyclopedia. Carl Fredik 💌 📧 11:12, 25 May 2016 (UTC)
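    (For what whitelisting a single URL looks like in practice: MediaWiki:Spam-whitelist takes one regex per line, matched against added URLs, so one snapshot can be released without opening the whole domain. The entry below is a hypothetical illustration - the timestamp and target page are invented:

        # hypothetical entry releasing one specific archive.is snapshot
        archive\.is/20130615\d*/http://example\.com/dead-page

    Automating such entries for autopatrolled users, as proposed above, would still require tooling around this list format that does not currently exist.)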
    Yes, I have been a strong advocate of getting more people involved there, as (and I know that) I am sometimes policeman, judge, executioner, higher court, and again executioner ... (which for clear-cut cases is not necessarily a problem, and I do sometimes step away .. which then again sometimes results in requests being ignored for lack of alternative manpower). Improvements would be possible there (also in the long-requested overhaul of the whole system, which is inconvenient and difficult to operate - not the priority of the WMF, apparently). 'Automation' is difficult, as one often does need to dig into the reasons and see where requests come from and how/if they are needed. This specific case is now based on an RfC (i.e. community consensus). That community consensus states that links should be removed and additions prohibited (full stop). Based on that, any whitelisting could just be blanket denied (that is automated). I already WP:IAR on that, in that I do say that I would whitelist if no replacement is possible. I have however not run into many of those cases. Note, regarding more manpower: non-admins could easily help there in evaluating situations and/or clarifying (and even denying the obviously 'bad' requests) - but that is not in people's interest, apparently.
    Generally, the spam blacklist is only used for those cases where there is hardly any to no legitimate use. For archive.is we went through a meta discussion on the cross-wiki aspect of this site, saying early on already that it was probably not good to blacklist, but that we did need a way to mitigate the cross-wiki situation (there was resistance from a significant number of wikis to the unsolicited additions). It again boils down to the crudeness of the spam-blacklist extension (WMF, where are you?), where alternatives are hard to implement (a global edit filter with a throttle would be an option, but that takes too many system resources; I could write a throttling variety into XLinkBot .. but my time is also limited).
    As such, I am afraid that we end up in a lose-lose type of situation - either no legitimate use, or giving in to yet another spammer (I know, we have no evidence that the editor is connected to archive.is (he says he is a friend), but evidence to the contrary does not exist either). I do not expect that the community will step in sufficiently if the IPs start spamming archive.is again (maybe a bit of whack-a-mole and a bit of reverting), nor that we will now finally see some significant pressure on the developers to actually come up with proper solutions to this type of situation. --Dirk Beetstra T C 12:15, 25 May 2016 (UTC)
  1. @Staberinde: You are aware that the original editors who were using botnets to add these links to various Wikimedia sites have done so as recently as February 2016? I would consider that a serious reason to continue blocking the site. --Dirk Beetstra T C 05:29, 25 May 2016 (UTC)
    No evidence has been presented that the original editors used anything but legal tools — and while this is an issue that can be solved through other means, we should not play mock investigators here — it undermines our position and makes us look like fools who do not understand the basics of the internet. Carl Fredik 💌 📧 09:48, 25 May 2016 (UTC)
    The use of multiple internationally based residential IPs does not look good. And whether the use of the IPs itself was legal or not - when someone edits disruptively from multiple IPs, people work hard to stop it; here someone is running unauthorised (in Wikipedia terms) automated accounts from multiple IPs .. --Dirk Beetstra T C 12:15, 25 May 2016 (UTC)
    @CFCF: A pretty minor problem - thousands of uncontrolled link additions by unauthorised bots and using botnets to circumvent accounts being blocked does not sound like a 'pretty minor problem' to me, but more like a massive understatement. --Dirk Beetstra T C 05:29, 25 May 2016 (UTC)
    Yeah, pretty minor. 'Botnets' — as has been explained many times before, there is no evidence of any — and all we are left with is an unauthorised bot and one stupid editor who isn't following the rules. Either we could invite him/her to apply for authorization, or find some better way to block the open proxies that have been used. It should not be too hard to track where he has gotten his proxy-list from. Carl Fredik 💌 📧 09:48, 25 May 2016 (UTC)
    @CFCF: Well, a bit of analysis shows what they abuse, and how difficult it is to block all those IPs. 3 1/2 years after the initial situation they were back at it with a whole new set. 'It should not be too hard to track where he has gotten his proxy-list from' - if so, it is also not too hard for them to find another list. --Dirk Beetstra T C 10:54, 25 May 2016 (UTC)
    @Hobit: in the end, who added it is not of importance. Whether it is the site owner, a fan, a bunch of monkeys, a sweatshop, your neighbour, or a competitor, the result for Wikipedia is the same: continued and repeated disruption through the use of unauthorised automated accounts / IPs (where the IPs are part of botnets). --Dirk Beetstra T C 05:29, 25 May 2016 (UTC)
    And in the end the spam blacklist is a poor way to address the concerns, because it has collateral damage and impacts legitimate uses as well. (Whitelisting is not a viable option.) Carl Fredik 💌 📧 09:48, 25 May 2016 (UTC)
    @CFCF: So we just open the tap without considering viable alternatives (well, the two of us were the only ones to at least hint at alternatives). --Dirk Beetstra T C 10:54, 25 May 2016 (UTC)
    @Codename Lisa: it is indeed a sad incident; unfortunately, it did not stay with just that incident - the same situation repeated just 3 months ago (about 3 1/2 years after the initial problem). --Dirk Beetstra T C 05:29, 25 May 2016 (UTC)
  1. @SSTflyer: that you have "run into many occasions when repairing dead links where Archive.is is the only archiving service available" is in striking contrast with your two recorded attempts to use archive.is in the 2 1/2 years the filter blocked such additions. --Dirk Beetstra T C 05:29, 25 May 2016 (UTC)
    Mere knowledge that Wikipedia does not allow Archive.is is enough to make editors avoid using it. That argument is absolutely insane — it builds on the idea that all we ever do on the internet is edit Wikipedia and that we would never archive for other purposes — and it is just stupid in that it implies that editors do not remember, or could not have seen, others who tried and failed to use archive.is. Carl Fredik 💌 📧 09:52, 25 May 2016 (UTC)
    @CFCF: as above, I considered the once-bitten-twice-shy option .. still, no-one had a big enough problem to solve anyway, even if they knew that just trying would not be enough. --Dirk Beetstra T C 10:54, 25 May 2016 (UTC)
    @Nyttend: a big overreaction, when the site gets spammed using multiple unauthorised bot accounts and botnet IPs, and the original editors behind the additions saw fit to repeat their actions as recently as February 2016? --Dirk Beetstra T C 05:29, 25 May 2016 (UTC)
    Yes, a big overreaction — even in the way you talk about it. There is no evidence of a botnet! Carl Fredik 💌 📧 09:52, 25 May 2016 (UTC)
    Shout all you like, but the alternative is "used a list of potentially compromised computers and misconfigured routers in an effort to bypass IP level blocks", which is just as illegal. Anybody shouting that there's no evidence is simply ignoring the evidence presented.—Kww(talk) 02:53, 26 May 2016 (UTC)
    No, that is an alternative, not the alternative. I will grant you that it is possible this could have been done illegally — but that is not enough to indict anyone in any court in the world. You must show beyond reasonable doubt that the person committed a crime; a mere allegation that, in performing this action, it is also possible to commit a crime is not enough. So, as far as legality goes — unless you have better proof, your argument is null.
    Using and hosting open proxies on personal computers is a practice that is legal in most of the world — and does not require you to use compromised or misconfigured hardware. All it requires is that you connect to a server hosting an open proxy.
    Your argument amounts to waving a gardener away from his job cutting the rose-bushes because "his garden-scissors could be stolen — and he might be an illegal immigrant too". Just because you don't happen to like what he did doesn't mean it was illegal! Carl Fredik 💌 📧 10:02, 26 May 2016 (UTC)
    @David Gerard: ".. way too useful to encyclopedic purposes", yet in the 2 1/2 years that this was monitored, you felt only once to use the site and apparently have found alternatives in all other cases where these encyclopedic purposes needed to be met. --Dirk Beetstra T C 05:29, 25 May 2016 (UTC)[reply]
    Whatever you're measuring is incorrect - I have preferred archive.org links, but had to resort to archive.is several times when there was no other copy I could find (e.g. when h-online took down its entire archive).
    Also, at this point you're just getting spammy yourself, badgering people, saying the same things over and over. This isn't convincing me, and I suspect others, that you have a substantive point - David Gerard (talk) 06:41, 25 May 2016 (UTC)
    I can agree that my measure is skewed (I took that into account, though that in itself is telling as well), but I don't think that the use of archive.is is as big as is suggested here (and most people here did not make the effort to actually check whether it is so much needed over other services - which is what you have said here as well: 'I have preferred archive.org links' - so not much use for archive.is anyway; I wonder how others think about this).
    I think that if you use arguments in an RfC you should be able to defend them, and I have been saying since the beginning that people were not properly informed by the introduction of this RfC and that it needed to be expanded (the question that you posed below in the discussion is one example of why your !vote here is uninformed - what if we have no other methods of stopping the continued abuse?). And I do think that most of the editors here give (obviously) uninformed responses ('the spamming is something of the past' - did the !voter actually go through recent discussions regarding this subject and look through the logs to see whether it is not current and ongoing? Obviously not, since it was an issue a mere 3 months ago). --Dirk Beetstra T C 06:58, 25 May 2016 (UTC) (expanded --Dirk Beetstra T C 07:02, 25 May 2016 (UTC))
  1. @Ravensfire: You forgot about the case where a domain expires and/or is taken over. Parked domains usually have a robots.txt that prevents crawling, and if a domain is taken over, the attacker can set one manually. Archive.is has a manual report process if one would like an archived copy to be taken down. There is only limited automated archiving (from Wikipedia recent changes; I don't know of anything else) - other than that it's manual, so it doesn't really make sense to apply robots.txt. Anyway, I don't mind a preference for other archives when they are not inferior. nyuszika7h (talk) 14:30, 23 May 2016 (UTC)
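    (The parked-domain scenario is easy to picture: squatters typically serve a blanket robots.txt like the one below. Crawler-based archives that honoured robots.txt retroactively - as the Wayback Machine did at the time - would then hide even snapshots taken years earlier, while archive.is, which ignores robots.txt, keeps serving them:

        # typical robots.txt on a parked or expired domain
        User-agent: *
        Disallow: /

    )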
  1. @Pinguinn: "The bot stuff is in the past now." - for a full 3 months. --Dirk Beetstra T C 05:29, 25 May 2016 (UTC)
  1. @James Allison: "Appears to be a useful way to prevent link rot" .. yet you have not used this link in the 2 1/2 years that its use was restricted. How can you judge that it is useful? --Dirk Beetstra T C 05:29, 25 May 2016 (UTC)
  1. Mendaliv - that is the current practice (a slightly soft reading of the previous RfC): links cannot be used (as they are blacklisted), except when it can be shown that there is no replacement, in which case specific links can be whitelisted. If that is what you mean, then the links might as well stay on the blacklist. --Dirk Beetstra T C 03:13, 25 May 2016 (UTC)
    I support flipping the burden, then. Demonstrate that the links are available elsewhere, are needless, or are otherwise forbidden by policy and they can be removed. The bottom line is that there's no justification for keeping it on the blacklist. If it starts getting uncontrollably spammed, then we can consider re-adding it. As I say above, and have said elsewhere, there's no clear evidence of use of a botnet, no evidence to connect the spamming to the site operators, and no evidence that disfavoring its use as an archival service merits this extraordinary and unusual manner of preventing its use. And let's just assume for a moment that there is a real risk of the site going rogue: Then use Internet Archive to snapshot the archive.is page and link to Internet Archive instead (presuming there's some reason why we aren't just using IA to archive the live page). No risk of going rogue anymore. —/Mendaliv//Δ's/ 03:47, 25 May 2016 (UTC)
    @Mendaliv: "there's no justification for keeping it on the blacklist. If it starts getting uncontrollably spammed, ...". You are aware that the original spammers were active throughout Wikimedia as recent as February 2016, using the same techniques of automated edits on both an account and on multiple IPs of the same character (proxies, botnets) as what precipitated the original discussions, RfCs and consensus to blacklist? --Dirk Beetstra T C 04:55, 25 May 2016 (UTC)[reply]
  1. What is this supposed to mean, Kaldari? That if all else fails we remove it from the blacklist, or that the links should only be used if there is no alternative archive (the latter being the current practice)? --Dirk Beetstra T C 06:58, 25 May 2016 (UTC)

Spammy links?

Of course, the original problem remains. Can we deal effectively with the spammy links it's being abused for on an individual basis? Is this feasible? - David Gerard (talk) 19:22, 23 May 2016 (UTC)

@David Gerard: I'm not sure what you're talking about; I'm not aware of archive.is links still being "spammed", unless you are talking about bypassing the spam filter using archive.is, which can just as easily be done with WebCite. nyuszika7h (talk) 20:31, 23 May 2016 (UTC)
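(For readers unfamiliar with the bypass mechanism: archive services hand out short URLs that do not contain the archived site's domain, so a blacklist pattern aimed at that domain never matches. Both URLs below are hypothetical illustrations:

    https://blocked-site.example/page   <- caught by a blacklist entry like \bblocked-site\.example\b
    https://archive.is/AbCdE            <- same content behind a short code; nothing for the pattern to match

As noted, WebCite and other archives offer equivalent short forms, so this is not specific to archive.is.)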
I'm assuming that was the original justification for adding it to the blacklist. If it wasn't, what was? - David Gerard (talk) 10:36, 24 May 2016 (UTC)
@David Gerard: Well indeed, what this case has shown is that someone was originally spamming it and that that resulted in blacklisting (first enforced through an edit filter). That spamming was first by named accounts, and when those were blocked, multiple IPs took over - which showed that blocking the accounts did not solve the problem (and obviously page protection was not cutting it either). We also know that one of the original editors was active (on Wikimedia, not on en.wikipedia where they are blocked) as recently as February 2016, so I think that is enough evidence to show that the editors until recently still had an interest in linking this (though before 2016 they had not been active with multiple additions on a really significant scale). On the other hand, members of this community have not been able to add links for years now without too many problems (only within days of actual blacklisting did they start to cry wolf, because they did not comply with the second decision made in the original RfCs: all links should be removed). Those editors have not shown that there are indeed a significant number of links that can never be replaced, by providing an analysis of a random, significant subset of currently linked archive.is archives (which corroborates the few complaints that they could not add the links in the first place - apparently alternatives existed or the links were not needed), nor have they shown whether not having an archive (for newly to-be-added links, or for those linked instances) is detrimental to the information on that page. User:XLinkBot might be an option to catch spammers early on, but that will have some collateral damage, similar to spam-blacklisting but on a smaller scale (reverting genuine edits by new editors/IPs). --Dirk Beetstra T C 12:04, 24 May 2016 (UTC)
You might want to read this discussion to make a decision on whether you think the threat is over, and whether we can handle the situation in other ways. --Dirk Beetstra T C 15:12, 24 May 2016 (UTC)

NOINDEX

Please stop removing NOINDEX. RFCs are not to be indexed, see http://en.wikipedia.org/robots.txt (also http://bugzilla.wikimedia.org/show_bug.cgi?id=11261). These Archive.is RFCs are maliciously placed outside /wiki/Wikipedia_talk:Requests_for_comment/ to evade the indexing prohibition. That is especially weird when done by editors arguing for respecting robots.txt. — Preceding unsigned comment added by 105.107.123.132 (talkcontribs)
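(The robots.txt entries being referred to look roughly like this - an illustrative excerpt, not a verbatim copy of the live file:

    # en.wikipedia.org/robots.txt, excerpt
    User-agent: *
    Disallow: /wiki/Wikipedia:Requests_for_comment/
    Disallow: /wiki/Wikipedia_talk:Requests_for_comment/

Only pages created under the disallowed paths are shielded by the site-wide file; pages elsewhere need per-page NOINDEX.)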

There are many RfCs which are not under that tree. There is no policy or guideline that prescribes that RfCs are not to be indexed; that pages under the mentioned tree are noindexed may very well be for another reason. Do not insert that tag again without getting a proper consensus. --Dirk Beetstra T C 13:06, 26 May 2016 (UTC)
There is also no policy that archives should respect robots.txt; with that argument you place yourself on your opponents' side. Also, until consensus is established, noindex must be present as the default option for all RFCs.
Or should I open a ticket in Bugzilla to have these pages added to robots.txt manually by the WMF admins? — Preceding unsigned comment added by 78.139.174.106 (talkcontribs)
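(For reference: independent of robots.txt, an individual page can be excluded from search engines with the __NOINDEX__ magic word, or the {{NOINDEX}} template that wraps it, in namespaces where user robots control is enabled - which includes the Wikipedia: namespace. Placing

    __NOINDEX__

in the wikitext makes MediaWiki emit <meta name="robots" content="noindex,nofollow"/> in the rendered page. This is the tag whose removal is being disputed here.)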
An RfC .. not no-indexed merely for being an RfC. And there are many examples like that. --Dirk Beetstra T C 13:15, 26 May 2016 (UTC)
I do however have a question, randomly cross-country hopping IP: what is in these RfCs that it should not be found by e.g. Google? --Dirk Beetstra T C 13:18, 26 May 2016 (UTC)
An RFC section on a talk page cannot be individually included or excluded. The Archive.is RFCs are dedicated pages in the same Wikipedia: namespace where all non-indexed RFCs are normally placed.
There is no cross-country hopping, I am in Georgia. And I have a question - what was the reason to format the name of these RFCs in a way that they would be found by e.g. Google? — Preceding unsigned comment added by 78.139.174.106 (talkcontribs)
'where all non-indexed RFCs are normally placed' .. there is however no real regulation for that.
Funny, a couple of minutes ago you were in Algeria, and yesterday in the Ukraine. And I already answered that question: there is no policy or regulation for that on Wikipedia. You have however neatly evaded the question. --Dirk Beetstra T C 13:32, 26 May 2016 (UTC)[reply]
There is established practice coded in robots.txt. There is established practice to place new RFCs under /wiki/Wikipedia:Requests_for_comment/. Well, no one has to follow it. But no one has to resist it either. There must be a reason why an experienced editor does not follow it and creates an RFC with an SEO-optimized name that avoids robots.txt - that is not a random event. And then fights against NOINDEX on the pages - that is definitely not a random event.