Wikipedia:Bots/Requests for approval/Archivedotisbot

Archivedotisbot

Operator: Kww (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 17:29, Saturday May 10, 2014 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): PHP (based on Chartbot's existing framework)

Function overview: Removal all archival links to archive.is (and its alias, archive.today)~~, which was put in place to bypass the blacklist)~~

Links to relevant discussions (where appropriate): WP:Archive.is RFC, MediaWiki talk:Spam-blacklist/archives/December 2013#archive.is, Wikipedia:Administrators' noticeboard#Archive.is headache

Edit period(s): One time run, with cleanups for any entries that got missed.

Estimated number of pages affected:

Exclusion compliant (Yes/No):

Already has a bot flag (Yes/No):

Function details:

Remove "archiveurl=" and "archivedate=" parameters whenever the archiveurl points at archive.is or archive.today.

Discussion

Comment There is no direct connection between the existence of the links and the blacklisting of the archive.is site. Most of the archive links were put there in good faith. As archive.is performs a unique function, the proposer will need to demonstrate the links themselves are actually in violation of policy, and that any given archive is replaceable – meaning the bot ought to be capable of replacing the links with one on another archive site, particularly where the original referring url has gone dead. Non-replacement will lead to diminution of verifiability of citations used. -- Ohc ^¡digame! 01:20, 12 May 2014 (UTC)[reply]

Leaving the links in place wouldn't correspond to the RFC consensus, and having the links in place while the site is blacklisted makes for a painful editing experience.—Kww(talk) 01:31, 12 May 2014 (UTC)[reply]
Blacklisting does not distinguish good-faith edits. Welcome to the alternate universe of the MediaWiki talk:Spam-whitelist. Will the bot honor the whitelist? If so, we should get some links whitelisted before trial so that functionality may be tested. See MediaWiki talk:Spam-whitelist/Archives/2014/03#archive.is/T5OAy. This should be done before the bot runs, to avoid any discontinuity of referencing, as the whitelist approval process can take months to come to consensus. – Wbm1058 (talk) 01:58, 12 May 2014 (UTC)[reply]

Does anybody keep track of all the archive links they place? I can guess but I can never be sure. If a bot is approved, removals of potentially valid and irreplaceable (in some cases) links will be the default scenario unless all editors who consciously used the site come forward with their full list. I fear that even if I whitelisted all the articles I made substantial contributions to, that list would be incomplete. Then, some links I placed will inevitably get picked off by the bot. -- Ohc ^¡digame! 04:30, 12 May 2014 (UTC)[reply]

I have to reject the timing and implication of this request at this time on a couple of key grounds. Archive.today was not made to bypass the filter. There is no evidence that Archive.is operated the Wiki Archive script/bot. The actual situation was resolved by blocks, not the filter - the filter was by-passable for a long time. Kww made a non-neutral RFC that hinged on perceived use as ads, malware and other forms of attack - without any evidence nor any realization of any of these "bad things" would ever or be likely to occur. Frankly, the RFC was not even closed by an admin and it was that person, @Hobit:, that bought into the malware spiel and found Archive.is "guilty" without any evidence presented. Also, this is six months later, if that's not enough reason to give pause - I'll file for a community RFC or ArbCase on removing the Archive.is filter all the quicker. Back in October 2013, I'd have deferred to the opinion then, but not when thousands of Gamespot refs cannot be used because of Archive.org and Webcite's limitations and Kww seems deaf to the verifiability issues. Those who build content and maintain content pages need Archive.is to reduce linkrot from the most unstable resources like GameSpot. ChrisGualtieri (talk) 04:49, 12 May 2014 (UTC)[reply]

I will simply point out that your arguments were raised and rejected at a scrupulously neutral RFC that was widely advertised for months.—Kww(talk) 05:00, 12 May 2014 (UTC)[reply]

False, I wasn't even a part of the RFC. Also, the malware and illegal aspect were repeatedly pushed without evidence. ChrisGualtieri (talk) 16:29, 12 May 2014 (UTC)[reply]

I didn't say that you had participated: I said that your arguments had been presented. The framing of the RFC statement was scrupulously neutral. Arguments were not neutral, but such is the nature of arguments.—Kww(talk) 16:50, 12 May 2014 (UTC)[reply]

Can you prove, with firm evidence, that archive.today was created to "bypass the blacklist"? That domain has existed for months, and during this time, an attacker could have spilled a mess all over Wikipedia, but this has not occurred. Currently, archive.is does not exist (just try typing in the URL), it redirects to archive.today which is the current location of the site. A website may change domains due to any number of legitimate reasons, ranging from problems with the domain name provider, to breaking ccTLD rules. --benlisquare_T•C•E 06:04, 12 May 2014 (UTC)[reply]

Struck the language expressing cause and effect, and simply note that archive.is and archive.today are the same site.—Kww(talk) 06:13, 12 May 2014 (UTC)[reply]

I did close the RfC and am not an admin. I closed the discussion based upon the contributions to that RfC. There was no "Guilty" reading. Rather it was the sense of the participants that archive.is links should be removed because there was a concern that unethical means (unapproved bot, what looked like a bot network, etc.) were used to add those links. I think my close made it really really clear that I was hopeful we could find a way forward that let us use those links. If you (@ChrisGualtieri:) or anyone else would like to start a new RfC to see if consensus has changed, I'd certainly not object. But I do think I properly read the consensus of the RfC and that consensus wasn't irrational. On topic, I think the bot request should be approved--though if someone were to start a new RfC, I'd put that approval on hold until the RfC finished. Hobit (talk) 18:05, 12 May 2014 (UTC)[reply]
Comment. An unapproved(?) bot is already doing archive.is removal/replace: [1] 77.227.74.183 (talk) 06:18, 13 May 2014 (UTC)[reply]

Im not a bot, so that is completely uncalled for. Werieth (talk) 10:16, 13 May 2014 (UTC)[reply]

When I see a bird that walks like a duck and swims like a duck and quacks like a duck, I call that bird a duck.

I see you work 24/7 and insert high amount of unreviewed links like a bot ([2], [3] in Barcelona).

I call you a bot. 90.163.54.9 (talk) 13:03, 13 May 2014 (UTC)[reply]

I dont read Chinese, and it looks like a valid archive. Not sure what the issue is. comparing http://www.szfao.gov.cn/ygwl/yxyc/ycgy/201101/t20110120_1631663.htm and its archive version http://www.webcitation.org/684VviYTN the only differences Im seeing is its missing a few images, otherwise its the same article. Werieth (talk) 13:11, 13 May 2014 (UTC)[reply]

The first page has only frame and misses content, second has only a server error message. No human would insert such links. I also notices that you inserted many links to archived copies of youtube video pages, which is nonsense.

You should submit a bot approval request (like this one), and perform a test run before run your bot at mass scale.

Only the fact that in the same transaction you removing archive.is links prevents editors to undo your edits. Otherwise most of your edits would be reverted. 90.163.54.9 (talk) 13:14, 13 May 2014 (UTC)[reply]

Not sure what your looking at but http://www.webcitation.org/684VviYTN looks almost identical to http://www.szfao.gov.cn/ygwl/yxyc/ycgy/201101/t20110120_1631663.htm. The only two differences I see is that the archive is missing the top banner, and the QR code at the bottom. As I said Im not a bot and thus dont need to file for approval. Werieth (talk) 13:21, 13 May 2014 (UTC)[reply]

Forget 684VviYTN, it was my copy-paste error, which I promptly fixed. There are 2 other examples above. 90.163.54.9 (talk) 13:24, 13 May 2014 (UTC)[reply]

taking a look at http://www.apb.es/wps/portal/!ut/p/c1/04_SB8K8xLLM9MSSzPy8xBz9CP0os_hgz2DDIFNLYwMLfzcDAyNjQy9vLwNTV38LM_1wkA6zeH_nIEcnJ0NHAwNfUxegCh8XA2-nUCMDdzOIvAEO4Gig7-eRn5uqX5CdneboqKgIAAeNRE8!/dl2/d1/L2dJQSEvUUt3QS9ZQnB3LzZfU0lTMVI1OTMwOE9GMDAyMzFKS0owNUVPODY!/?WCM_GLOBAL_CONTEXT=/wps/wcm/connect/ExtranetAnglesLib/El%20Port%20de%20Barcelona/el+port/historia+del+port/cami+cap+el+futur/ vs http://web.archive.org/web/20131113091734/http://www.apb.es/wps/portal/!ut/p/c1/04_SB8K8xLLM9MSSzPy8xBz9CP0os_hgz2DDIFNLYwMLfzcDAyNjQy9vLwNTV38LM_1wkA6zeH_nIEcnJ0NHAwNfUxegCh8XA2-nUCMDdzOIvAEO4Gig7-eRn5uqX5CdneboqKgIAAeNRE8!/dl2/d1/L2dJQSEvUUt3QS9ZQnB3LzZfU0lTMVI1OTMwOE9GMDAyMzFKS0owNUVPODY!/?WCM_GLOBAL_CONTEXT=/wps/wcm/connect/ExtranetAnglesLib/El%20Port%20de%20Barcelona/el+port/historia+del+port/cami+cap+el+futur/ it looks like a snapshot of how the webpage looked when it was archived and the page is dynamic. There is one part of the page that appears to be dynamically generated via JavaScript that appears partially broken in the archive but most of the page content persists and is better than not having any of the content if the source goes dead. Instead of complaining about my link recovery work why dont you do something productive? Werieth (talk) 13:36, 13 May 2014 (UTC)[reply]

Productive would be to undo your changes and discuss in public the algorithms of your bot, but it is impossible because you intentionally choose pages with at least one archive.is link and thus you do abuse the archive.is filter making your unapproved bot changes irreversible. Also, you comment those changes as "replace/remove archive.is" albeit 90% of the changes you made are irrelevant to archive.is. 90.163.54.9 (talk) 15:11, 13 May 2014 (UTC)[reply]

Oppose, RFC was non-neutral, bias, and not widely advertised despite Kww's claims, this is obvious from the number of editors who say they had no knowledge of the discussion while clearly being opposed to its outcome. DWB / Are you a bad enough dude to GA Review The Joker? 08:02, 13 May 2014 (UTC)[reply]

DWB:It's hard to give much weight to an argument based on a falsehood. It was placed in the centralized discussion template on Sept 13 and not removed until Oct 31. That you personally missed a discussion doesn't invalidate a discussion. The framing of the question was scrupulously neutral.—Kww(talk) 14:44, 13 May 2014 (UTC)[reply]

Question I assume the bot, if approved, will replace, not remove the archives? Will it properly include an edit summary? Thank-you. Prhartcom (talk) 14:12, 13 May 2014 (UTC)[reply]
- Good point. I'm not sure there _are_ replacements, though following the AN thread, it looks like a new archive tool is potentially available. So it certainly should be replacing them where possible. In addition, an edit summary which explains what's going on (ideally with a link to a more detailed explaination) should be required (and trivial I'd think). Hobit (talk) 14:42, 13 May 2014 (UTC)[reply]
Oppose Per DWB. Duke Olav Otterson of Bornholm (talk) 15:13, 13 May 2014 (UTC)[reply]