Wikipedia:Requests for comment/Archive.is RFC/Rotlink email attempt


Email from Lexein to Rotlink, October 3, 2013

(Formatted for presentation via "Email this user", and refactored for in-order indented responses by date. Times are PDT. This conversation began with me using 'Email this user' on User:Rotlink's page. The responses are from 'Denis', who appears to be the Denis appearing on the Archive.is Whois results.)

Subject: Bots? User scripts? IP edits? You? Stop and discuss, please.

Rotlink, I think you know I and several other editors have been glad to see the work you've done archiving deadlinks in Wikipedia citations on your own server(s), and improving the archive.is website & features. I've even offered some corrections for your FAQ page. But we have not been glad about the recent swarm of IP edits, and the tremendous fallout from that.

As you are no doubt aware, archive.is is right now under discussion for COMPLETE REMOVAL from Wikipedia at http://en.wikipedia.org/wiki/Wikipedia_talk:Archive.is_RFC . This RFC (public request for comment) followed discussion at the Administrators Noticeboard for Incidents here:
https://en.wikipedia.org/wiki/Wikipedia:Administrators%27_noticeboard/IncidentArchive812#Mass_rollbacks_required which was a direct result of the recent flood of identical-looking edits from IP addresses which appear to be proxies or (allegedly) "bot" computers from a wide number of countries all over the world.

If you, or your business partner, or anyone you know or are doing business with, is doing these continuing automated edits, they have to all STOP. NOW. If you or your partner are responsible _in any way_ for them happening, you have to come forward now and explain yourself either on your own Talk page, or at the RFC above, or a subpage like http://en.wikipedia.org/wiki/Wikipedia_talk:Archive.is_RFC/Rotlink_email_attempt where I've posted this email (and put a link to this at the RFC itself).

If you don't stop it, _and_ don't explain, all archive.is links will continue to be aggressively removed, and may never be allowed back on Wikipedia. Archive.is is in danger of being blacklisted from Wikipedia.

I am currently against such a blacklisting, as long as archive.is links are accurate, reliable, and an asset, and are placed according to Wikipedia policy and guidelines. However, you know that blatant use of multiple IP addresses to evade blocks is *against policy*. Running an unapproved bot is *against policy*. Continuing to do so after having been asked to stop is *against policy*. https://en.wikipedia.org/wiki/Wikipedia:Blocking_policy

Personal note:

  • I don't understand why you withdrew your bot proposal, when it stood a chance of being approved. PLEASE EXPLAIN.
  • I don't understand why you stopped communicating.
  • These recent "bot"/script actions from so many different IP addresses have angered so many Wikipedians in good standing that I just don't know if you or archive.is can come back from this.
  • From my few and brief interactions with you, these "bot"-like actions don't seem like you, Rotlink. You had been making such progressive (not aggressive) and positive contributions up to that point. WHAT THE HELL HAPPENED? PLEASE EXPLAIN.

I have supported the idea of contacting you, and further clarified what I support publicly with text at the RFC.

Going forward:

  • Stop the bots or scripts.
  • Come forward with a bot proposal which can be accepted. This can happen if properly done.
  • Communicate with us. Answer allegations about spam, advertising, use of botnets, etc.

Lexein

http://en.wikipedia.org/wiki/User_talk:Lexein

On Thu, Oct 3, 2013, at 04:39 AM, Denis wrote:

Hi.

Do not panic! I do not plan to stop the service, to delete or alter the snapshots, to put ads or malware on it, etc. As for Archive.is_RFC, I think my opinion is somewhere between 3 and 3a.

For 3. As you may know, archive.is was started as a side project, because I have a cluster whose disk space was not used. After 1.5 years of operation, 40Tb (out of 100Tb) is occupied by the archive.is database. After 2 more years, archive.is will be in a position similar to the current position of WebCite, having to decide either to close the public submit form or to seek money to support the growth. A Wikipedia drop-off (which implies stopping the pro-active archiving as useless) can move the issue from the 2-year horizon to eternity. Paradoxically, the solution caused by the fear of possible commercialization can help to keep the service non-commercial.

For 3a. All the free archives have much scarier issues than those mentioned in the Archive.is_RFC. There are not only copyright infringements among the snapshots (~10 emails/day about it), there are also a lot of snapshots of defaced sites, saved lists of user credentials, lists of credit card numbers, child porn, ... (reported by the security companies; I will not post the links here, you can easily find a parental control blacklist of domains and use it to search on any of the archives). Any issue can cause a criminal investigation and downtime.

I think, if the Wikipedia government are so concerned even about illegality of proxy lists, they should remove all the free archives, buy pagefreezer.com's subscription and use it instead. It will also put the relationships into the business framework, provide 100% uptime and good support.

---8<---

I see, my position is somehow closer to the Kww party's than to yours and can cause even more anger. I do not mind if you decide not to publish my answer.

--->8---

On Fri, Oct 4, 2013, at 3:28 AM, Lexein wrote:

Denis,

Thanks for replying. I have some questions (Q). Yeah, it's a long email.

1Q. Do I have your permission to publish your response at http://en.wikipedia.org/wiki/Wikipedia:Archive.is_RFC/Rotlink_email_attempt ?
Better yet, will you post your reply there, along with answers to the semi-interview questions below?
I personally do not think your replies will necessarily cause more anger; they will answer some questions and prompt others, such as the ones I ask below. Most of us at Wikipedia prefer honest, non-abusive communication. See the triangle on my talk page: http://enwp.org/User_talk:Lexein .

2. Our most pressing issue is the ongoing "swarm" (my term) of multiple-IP edits, evading blocking, to Wikipedia articles which add links to archive.is (and archive.org):

2Qa: If you control those "swarming" IP edits, will you now, please, stop them? If they stop, the situation at Wikipedia will improve. Automated edits might be authorised, a bit slower, with an (approved) RotlinkBot. IF THEY STOP, IT WILL HELP WIKIPEDIA, AND HELP ARCHIVE.IS .

2Qb: If you are not controlling those swarming IP edits, who is? How can they be motivated to stop? To answer your comments: There's technically no Wikipedia "government" - it's all unpaid volunteer Wikipedia editors like me who are not administrators, and some who are administrators, who are together *arguing* now over what to do about the massive influx of new archive.is links from IP addresses, evading blocks. This repeated swarming of edits requires a *lot* of administrator time to block and clean up, and a lot of drama at Wikipedia, because they're against a principle. The principle (derived from many policies and guidelines) is this: Wikipedia is for anyone to edit, as a human, at a human rate, from usually a single address or user name, except for *approved bots*, which when proven safe, are allowed to work faster (from a single bot username). The swarming and/or fast proxy editing and/or IP rotation violates that principle and results in http://enwp.org/WP:BLOCKING .

3. You bring up interesting points about the burdens all archivers have to deal with, aside from copyright: private information, porn, malware, takedown requests, etc.

3Q. Are these burdens reducing your interest in operating Archive.is?

4. Of course, I'm surprised that you agree with removing all archive.is links from Wikipedia (RFC option #3), and with (RFC option #3a) leaving dead links in place. I understand the logic of reduced burden on archive.is, and how you might prefer that.

4Q. If that is your preference, why these swarms of IP edits?

5Q. Given the good will Archive.is has accrued at Wikipedia up until August, why allow it to be ruined now with this controversy? It makes no sense whatsoever.

6. I read your January 2013 blog post about other monetization possibilities, rather than ads. That was interesting. http://blog.archive.is/post/40852742377/just-a-quick-question-if-you-already-have-more-than

6Q. What is your stance on ads/alternatives now, in October?

7. Forget asking Wikipedia to stop using all archivers, and to use a paid archival solution, just because of the proxy list question. The extreme majority of edits at Wikipedia are not done through proxies. We block proxies whenever they are detected in order to reduce people gaming the system. It is far more reasonable to ask that whoever is doing proxy editing, to stop it.

Personal note: I'm just shaking my head with disbelief at how wrong this whole situation has gone. I don't know if something somebody at Wikipedia said angered you, or somehow triggered this "swarm" editing, or if you're just being ironic or sarcastic. Nothing makes sense.

The preferred Wikipedia way is to talk it out, by hook or by crook, and ease the situation somehow. Some admins prefer the scorched earth approach. I do not.

I hope you can take the time to answer some of these questions, particularly 1Q and 2Q and hopefully the others, and either grant permission to post, or post the answers yourself. Thanks, Lex

On Fri, Oct 4, 2013, at 05:05 AM, Denis wrote:

Hi. Sure, you can publish.

I assumed it by default, given the permission not to publish if you would not like :) I have some more (perhaps more realistic) information, but I would prefer not to disclose it, because it can affect other people and because the RFC participants are too free in rearranging the words of others, producing surprisingly new denotations.

Shortly, I believe the Wikipedia was not the target (I do not see high-traffic or trendy articles touched), it is somehow indirectly related to SEO link building.

"You bring up interesting points about the burdens all archivers have to deal with, aside from copyright: private information, porn, malware, takedown requests, etc. Are these burdens reducing your interest in operating Archive.is?"

It was clear from the beginning that archive.is will be in "Internet grey zone". Even .is domain has been chosen after ISNIC declaration that it sees no problem with thepiratebay.is and will not cancel its delegation. This also can be commented either as an additional level of durability (by your party) or as relation to cybercrime (by your opponents).

"What is your stance on ads/alternatives now, in October?"

A lot of people use the archive to store their favorite porn or hentai and it is the reason why there is no personal accounts (neither free nor paid ones).

(BTW, the Wikipedia's top referrer is from Celeste (pornographic actress); you were right that people do not click on references. Unless they expect to see something not shown on WP). Absence of personal accounts makes the archive less convenient for porn lovers/collectors while it almost does not affect users like you.

So, now I am a bit skeptical about introducing personal accounts, as it may place the site into a particular niche. On the other hand, porn and non-porn surfers can be easily separated (each to its own website) so that thematic ads are shown only to the former.

All these are just thoughts, not a real plan.

There is no real plan at all; so far the site can be free and the possible deadline is at least 2 years ahead (see my first message for details).

"Given the good will Archive.is has accrued at Wikipedia up until August, why allow it to be ruined now with this controversy? It makes no sense whatsoever. "

The question comes from the assumption that the archive links are valuable to someone other than the spammers.

Yes, they are valuable also for you, but you can use User:Ark25's snippet and see the links to all archives next to the existing dead links.

"Forget asking Wikipedia to stop using all archivers, and to use a paid archival solution, just because of the proxy list question."

After the proxy issue there can easily arise another one about the copyright or about a stored page with defamation of living person which has been removed from the original site by the court order somewhere in Africa or something else.

The threat with the Internet archiving is permanent and the WP editors expect that someone would take 24/7 care on it. Do you see any alternative to the paid archives? I do not. I cannot provide even 100% uptime. It is an amateur side project, not a corporation or a cybergang. Not even a startup.


On Fri, Oct 4, 2013, at 07:20 AM, Lexein wrote:

Denis -

Thanks again for the quick response. I didn't know about your porn users etc. I can certainly see how that would explode your storage needs. And I do see how the problems don't stop at proxies, but go on.

Ok, if you're saying that you are not in direct control of the swarming IP edits to Wikipedia articles, please get whoever is doing it to just STOP THE WIKIPEDIA EDITS.

Are you saying there's another machine not under your control which is running a copy of the archive.is WP article scanner/archiving/editing script? If so, that machine needs to be forcibly stopped. The only bot which would ever be allowed to edit Wikipedia would be your RotlinkBot if approved. No others.

It's very important that the swarm simply stop touching Wikipedia. Permanently.

Several very opinionated Wikipedia administrators are convinced that it's you doing it, and I have no way to change their minds. It just has to stop, and stay stopped.

If the swarming doesn't stop, archive.is will be blacklisted. This means no links to archive.is at all. They'll all be deleted. People who try to save articles with archive.is links in them will be blocked from saving. I'm not sure I've made this clear: gone. And all that stupid SEO work will be wasted, so it's not even worth trying. And that stupid "archive the archive.is link" trick will be discovered and deleted, too.

I consider the loss of *any* archived deadlinked references to be a serious loss to Wikipedia. It directly damages Wikipedia's reputation for claims being verifiable in independent reliable sources, in my opinion. I'm pretty passionate about it, because I've had to fight hard to save articles from deletion, just because the prior sources were no longer available on the web (I know, I know, get a life). I cannot apologize for my passion about this. I'm a long term editor, with significant effort involved in articles plagued by deadlinks. I value archiving tools highly, because the web is a fragile place, full of linkrot.

At this point, you have the right to say "fuck off" and I wouldn't blame you. The nerve, Wikipedia complaining about a free service. Well, we're not complaining about the service, we're complaining about the automated-seeming swarming edits which seem like spamming.

I've been accused by one editor of being Archive.is's #1 advocate at Wikipedia. The truth is I'm the #1 advocate of *all* archive services. But I'm losing faith. I don't know if you can help this situation, but it would be appreciated if you could.

To sum up, there's only one way I can think of that you can help: Stop the swarm, or get those responsible to stop the swarm. Permanently. (I had some lame ideas about you logging, and us comparing your used IP addresses to the swarm-edit IP addresses, but we'd have no true verification of that).

But if you can't help, and you don't mind all Wikipedia use of archive.is ending, then all I can say is, "Adios, it's been real."

Lexein


On Fri, Oct 4, 2013, at 08:30 AM, Denis wrote:

Hi. Isn't it already stopped? Neither registered users nor IPs can link to archive.is: https://en.wikipedia.org/wiki/Wikipedia:Edit_filter/False_positives/Reports#Riz8383 https://en.wikipedia.org/wiki/Wikipedia:Edit_filter/False_positives/Reports#58.107.231.222 https://en.wikipedia.org/wiki/Wikipedia:Edit_filter/False_positives/Reports#149.254.182.30

As for the removal, it is only removal of the links, not removal of the snapshots. The snapshots are safe and not going to be removed. Anyone can search the archives or install the Memento plugin for Firefox and Chrome, which will retrieve the snapshots from all archives and show archived snapshots instead of dead pages.

Advocating the persistence of the links really looks very suspicious. Actually, I do not understand why all the discussion is about the links. While talking about whether archive.is is trustworthy, no one advocates (or no one cares about?) creating an additional backup. Both parties continue trusting archive.is as the only holder of the snapshots of the cited sources. It is very easy to back up the snapshots from archive.is; there is a "download .zip" button. How many sources have you cited? 100? 1000? 10000? Anyway, it is not about terabytes of data and buying new hardware.

I can explain it only by treating the emotional anti-archive.is attitude of the editors (which has not changed over the year since https://twitter.com/edsu/status/269485404057657344; and which is very different from, for example, the anti-archive.is attitude in wikiislam.net at that time, which was sober and constructive, resulting in implementing new features) and the proxy/bot issue as separate problems, with the latter being only a trigger for a new wave of escalation of the former.

On Fri, Oct 4, 2013, at 11:08 AM, Lexein wrote:

Ok, the point at Wikipedia is to have the link to the original (dead) url, *and* the link to the snapshot, right there in the citation, right there on the Wikipedia page, for easy verification of the source.

"As for the removal, it is only the removal of the links, not removal of the snapshots" - I know that. The links to the snapshots are what matter to me, the Wikipedia article editor. I want to keep the links to the snapshots in the archiveurl= parameters in the references in the articles, so the articles can be stable, and verifiable, with stable reference links.

The loss of archive.is links to snapshots in article references (due to blocking and blacklisting) is of great importance to me. That's what I'm complaining about.

My whole point, this whole time, is that the links make verification of the original dead source considerably easier for thousands of readers who may want to check, or read the cited articles for more information. Those links also make verification easier for dozens of deletionist editors who, if those links to snapshots aren't there, always try to delete articles when the original links go dead, rather than look for the link to the snapshot.

The swarm edits are triggering the removals, and the blacklist, so I can no longer have stable verification of source citations in articles, due to no more links to Archive.is in articles. That's what's bothering me.

Our average readers are simply not going to go on the hunt to see if Archive.is has the snapshot of a dead url. If the link is there, they'll look. If it's not, they won't. And our deletionist editors will certainly not bother, and so will just nominate articles for deletion with stale URLs. And they will now start to win deletion discussions, because we will no longer have the ability to verify, because we will no longer have the link to Archive.is snapshots *right there in the References section*. So we're stuck with Archive.org broken HTML snapshots, with whole lost websites locked behind punitive robots.txt, and the same story with WebCite.

All that had to happen, to stop this deletion and blocking of Archive.is links, is for you or whoever to stop the damned swarm edits. I guess that was just too much trouble. Now it's too late, so thanks for nothing.

I know you think this is just "no problem." Ok, so it's no problem for *you*. Whatever.

Lexein

On Fri, Oct 4, 2013, at 11:11 AM, Lexein wrote:

And I didn't know about Ed Summers' tweet about Archive.is, sorry. And I didn't know about Wikiislam.net, either. I guess you *could* have *talked* to someone at Wikipedia about it. I can't see or know everything, everywhere.

Don't blame me if your inability to communicate has contributed directly to the loss of the use of Archive.is at Wikipedia.

Yeah, I'm bitter. --Lexein

On Fri, Oct 4, 2013, at 02:25 PM, Denis wrote:

Hi.

I think you may be interested in direct integration of Memento into a template similar to {{Wayback}}. This will search on Archive.is as well as on the other archives. There is some activity started: https://en.wikipedia.org/w/index.php?title=Wikipedia_talk%3ALink_rot&diff=573704114&oldid=570355389

This may be the only solution to your problem without support from WMF. There is another problem with both Archive.is and WebCite (more realistic than the imaginary threats mentioned in the RFC): they work only until the operator's death or Alzheimer's (neither Gunther nor I am young). After this, someone will have to pay for the servers and domains, answer emails, and solve technical issues.

On Fri, Oct. 4, 2013, at 10:18 PM, Lexein wrote:

Denis -

Thanks for that information.

On a prior point: I don't think the burden should be on Wikipedia to have to keep creating filters to prevent addition of links, when the bot (or bots) could simply stop editing Wikipedia.

You and Gunther will live forever. You both have to live forever. No choice! You say you and Gunther are not young: this whole time I (and I think many WP editors) imagined you in your sassy teens or 20's, having a go at Wikipedia. That is, except for your strangely calm demeanor, compared to me.

I posted our conversation up to this point at http://en.wikipedia.org/wiki/Wikipedia:Archive.is_RFC/Rotlink_email_attempt I omitted our email addresses, but included your name. If you have second thoughts, it can be taken down and erased from Wikipedia servers quite permanently.

Lexein

On Sat, Oct 5, 2013 at 6:03 AM, Denis wrote:

Hi.

I think you can erase it after the RFC is closed. Some sentences (for example, the suggestion to switch to enterprise archives) are intentionally targeted not at you and the volunteer administrators but at other people who are in a position to perceive them.

BTW, I have found that blacklisting of webcitation.org and backupurl.com was also discussed and backupurl.com was blacklisted (long before its shutdown): http://meta.wikimedia.org/wiki/Talk:Spam_blacklist/Archives/2010-10#Exeption_for_backupurl http://meta.wikimedia.org/wiki/Talk:Spam_blacklist/Archives/2011-02#backupurl.com This may illustrate my argument for the persistent tension of Wikipedia and archiving services which has nothing to do with the swift-passing issues.