Talk:Archive.today: Difference between revisions


Revision as of 09:46, 22 June 2019

This page is not a forum for general discussion about Archive.is. Any such comments may be removed or refactored. Please limit discussion to improvement of this article. You may wish to ask factual questions about Archive.is at the Reference desk. For the use of web archiving services on Wikipedia, see Wikipedia:Link rot; to discuss the use of Archive.is on Wikipedia for this purpose, use Wikipedia talk:Link rot.

This article was nominated for deletion. Please review the prior discussions if you are considering re-nomination:

keep, 10 June 2015, see discussion.
delete, 17 September 2013, see discussion.

This article has not yet been rated on Wikipedia's content assessment scale.
It is of interest to the following WikiProjects:

Please add the quality rating to the {{WikiProject banner shell}} template instead of this project banner. See WP:PIQA for details.

Websites: Computing Stub‑class

	This article is part of WikiProject Websites, an attempt to create and link together articles about the major websites on the web. To participate, you can edit the article attached to this page, or visit the project page.
Stub	This article has been rated as Stub-class on Wikipedia's content assessment scale.
???	This article has not yet received a rating on the importance scale.
	This article is supported by WikiProject Computing.


Libraries Stub‑class Low‑importance

	This article is within the scope of WikiProject Libraries, a collaborative effort to improve the coverage of Libraries on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Stub	This article has been rated as Stub-class on Wikipedia's content assessment scale.
Low	This article has been rated as Low-importance on the project's importance scale.

The following Wikipedia contributor may be personally or professionally connected to the subject of this article. Relevant policies and guidelines may include conflict of interest, autobiography, and neutral point of view.

Rotlink (talk · contribs)

This article was nominated for deletion review on 1 June 2015. The result of the discussion was recreation allowed.

Copyright

The article contains false information: "archive.today removes archived pages in response to DMCA takedown requests from copyright holders." As a webmaster, I've had my site scraped against my will and sent a properly formatted DMCA notice to both the site and its ISP. It is a scraping site, masquerading as an archiving service. 80.62.117.71 (talk) 13:36, 26 June 2014 (UTC)[reply]

Well it's hosted by a Russian company, Hostkey abuse@hostkey.nl and abuse@hostkey.ru -- after a year of complaining to Cloudflare they told me that. The owner of the site, Denis Petrov, many times does the scraping himself. He will scrape an entire website in a matter of hours. He runs a botnet with endless IPs constantly changing their headers and Cloudflare cannot block their scraper. I asked him extremely politely to remove my site a bit ago and he scraped the entire site in hours. He was doing 100 pages a second all from a botnet with many IPs. While I don't agree with every site on the spam blacklist of Wikipedia, I agree with this site on it, though he has mirror domains such as .fo and .ec that should be added. I don't edit here very much and mostly by IP so I'm not sure where to suggest those. Plimitarmed (talk) 05:28, 8 November 2016 (UTC)[reply]

I noticed in the history of the article, the IP address 2a01:4f8:b10:5003:a:8:1:2. When Archive.is scrapes my website I often find that IP address among them. It belongs to that website. When I complained to Hostkey, they say I need to ship documents to them via airmail. Plimitarmed (talk) 08:02, 8 November 2016 (UTC)[reply]

An IP came along, 113.180.75.25, and reduced my references to one. That IP is one I had seen from the archive.is scraper on my logs. Then a second IP came and removed the whole section. 80.221.159.67 had a problem with the article giving the current host of where Archive.is was hosted. Plimitarmed (talk) 16:20, 9 November 2016 (UTC)[reply]

How does automatic archiving work?

Discussion of features/bugs, not the article

If you look at ro:Biserica de lemn din Hilișeu-Crișan, there is a dead link:

Lumina de duminică, 30 octombrie 2007 „Schit de maici cu o biserică unicat”, de Nicoleta Olaru

Archive.is knows this link: http://archive.is/http://www.ziarullumina.ro/articole;1418;1;3759;0;Schit-de-maici-cu-o-biserica-unicat.html Strangely, it has only a "newest shot" (6 Jul 2013 03:25) which is an error page and that makes me wonder if it ever had older "shots" and then maybe it deletes the older shots? That would be bad..

The link has been there since 29 November 2010, so I guess it was archived on Archive.is before 6 Jul 2013 (almost all external links of Wikipedia (all Wikipedias, not only English) were archived in May 2012, says the Archive.is owner here: Wikipedia talk:Link rot#Archive.is).

It's a very very good idea to archive automatically all the external links of Wikipedia, but then it's very bad to delete them and to replace with newer shots, which will eventually end up in showing "dead link".

It very much looks like Archive.is keeps only the newest shots when it archives pages automatically. — Ark25 (talk) 23:48, 26 July 2013 (UTC)[reply]

Hi. The page was not archived before 6 Jul 2013. It has short url http://archive.is/HSzOE. This means it has the sequential ID of 47136582 and can also be accessed as http://archive.is/id/47136582 (it is not public url, something like debugging tool, do not use it for linking). If you have a look at the snapshots with the IDs around 47136582 (for example http://archive.is/id/47136581 or http://archive.is/id/47136583) you will see that all of them were made 6 Jul 2013.

Some snapshots are re-archived and overwritten. These are snapshots from urls like http://www.google.com/sorry/indexredirect?continue=http://another.url/. Re-archiving would help when the server responds 500 error or captcha.

Realtime tracking of the recent changes in all national Wikipedias is a relatively new feature (it is fully on duty from May-June 2013), so no wonder that some links which had been in Wikipedia for years have been archived for the first time only in 2013. Rotlink (talk) 03:16, 18 August 2013 (UTC)[reply]

Useful feature

Archive.is can archive pages in the Google search cache. Once the content is archived, archive.is attributes it to the original website URL and not to Google's cache URL. This feature is useful when a site goes offline, that fact is noticed within a few days, the page isn't already archived in the Internet Archive, WebCite or elsewhere, and the only remaining copy of the page appears to be in the Google search cache. - 81.157.199.46 (talk) 20:50, 29 July 2013 (UTC)[reply]

If this is a fact intended to go into the article, then an independent reliable source, or at the very least primary, published online documentation will be required to support it in an inline citation. This will necessitate more hard HTML documentation or help pages at archive.is. Statements by non-published (non-notable) or anonymous authors in blogs/wikis/forums cannot meet WP:RS requirements. --Lexein (talk) 07:47, 3 October 2013 (UTC)[reply]

wiki.dandascalescu.com

In the comments made in the AfD discussion I don't see a consensus for removing this citation. —rybec 14:51, 21 September 2013 (UTC)[reply]

It's a wiki; that's a gnarly WP:RS problem, even if Dan Dascalescu is an established or published expert in the field or academia, cited by others. The blog might be assessed as RS if we can establish Dan's bona fides.--Lexein (talk) 07:51, 3 October 2013 (UTC)[reply]

Robot Exclusion Standard

The article seems to be getting mixed up regarding the Robot Exclusion Standard, and the fact that Archive.is does not honor the standard, and what this means. The purpose of my recent edits was to clarify that this standard is used by the main archives (like WayBack and WebCite) to avoid infringing on copyrights, whereas Archive.is does not honor this standard, so there is a large amount of material re-hosted on Archive.is that is in violation of copyright law, specifically, the Digital Millennium Copyright Act (DMCA).

Some other editors deleted the link I provided to the Robot Exclusion Standard (saying it is a "dead link", although I have no trouble accessing it), and then inserted the statement: "... however, the protocol is used against malware robots in general, which routinely scan the web for security vulnerabilities and email-address harvesters used by spammers. Archive.is does not obey the robot exclusion standard designed against spammers." I frankly don't understand these words. The Robot Exclusion Standard doesn't provide any protection against malware robots, nor against spammers. It is a voluntary standard that is used by responsible organizations to work together to avoid unintended interactions, among which are copyright violations (which of course are NOT discretionary).

So, I propose to trim the words about malware and spam, and just go back to the relevant and well-sourced statements about how archives use robot exclusion to avoid copyright infringement, and the well-sourced and undisputed fact that Archive.is does not honor this standard. I'll also add the requested citation for the DMCA.Weakestletter (talk) 21:57, 22 September 2013 (UTC)[reply]

By the way, as I was editing the article, I noticed that the words about malware and spam actually make no sense at all, because they say "the protocol is used against malware", and yet the cited reference says just the opposite: "...malware robots that scan the web for security vulnerabilities, and email address harvesters used by spammers will pay no attention. " These words signify that the protocol is NOT useful against malware or spammers. So those edits to the article are clearly not right. I've deleted them.Weakestletter (talk) 22:04, 22 September 2013 (UTC)[reply]

Yeah, that looks like a basic misunderstanding of what robots.txt actually does -- i.e. nothing by itself -- it is the robots that choose to act or not act on it. — HELLKNOWZ ▎TALK 22:58, 22 September 2013 (UTC)[reply]
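To illustrate the point that robots.txt does nothing by itself, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleBot` user agent, the example.com URLs and the `polite_fetch_allowed` helper are all hypothetical; nothing here describes how any particular archive actually behaves. The file only takes effect because the client chooses to consult it before fetching.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def polite_fetch_allowed(user_agent: str, url: str) -> bool:
    """Return True if a robot that honors robots.txt may fetch the URL.

    The check is entirely voluntary: a robot that never calls this
    function is not stopped by the file in any way.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


print(polite_fetch_allowed("ExampleBot", "http://example.com/private/page.html"))
print(polite_fetch_allowed("ExampleBot", "http://example.com/public/page.html"))
```

A robot that skips the `can_fetch` call can still request `/private/page.html`; the standard relies on cooperation, which is why it offers no protection against malware or email harvesters.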

The article seems to imply that pages are retrieved when someone requests that they be archived, and that archive.is doesn't retrieve pages en masse but singly when someone requests a page. If it's not crawling sites, it's debatable whether it's a bot.

If someone were to spider and republish the contents of a Web site, the mere absence of a robots.txt would make a poor defence against claims of copyright infringement. The Wikipedia article doesn't mention the word "copyright"--the protocol is not a way to give or withhold permission to republish.

The DMCA is a US law. The .is top-level domain suggests that archive.is may be in Iceland, which is not (yet) part of the United States. The DMCA makes exceptions for libraries and archives, so if archive.is does happen to be under US jurisdiction, it might be able to claim it is an archive, and republish without violating copyright. —rybec 01:15, 23 September 2013 (UTC)[reply]

It's certainly true that there's a difference between not honoring robot exclusion files and not honoring copyright. In theory, an archive service could try to contact each individual website owner and request permission to re-host their material. Strictly speaking, the copyright laws probably require this... but that would make web archiving almost impossible in practice. So, working in good faith, the responsible archivers (e.g., WayBack, WebCite) honor robot exclusion files. They advertise that if any copyright holder doesn't want their material archived, just place a robot exclusion file in the root directory, and they will respect your copyright and not re-host your material. If you don't, then the archives will take the absence of a robot file for tacit permission to archive your site. This isn't a perfect arrangement, and it's rather heavily biased in favor of the archives, but it's workable.

Now, an archiving service like Archive.is comes along, and says they do not honor robot exclusion files, and they will re-host any material they want to, even if there is a robot file saying the author does not give permission. That's a clear violation of copyright, and moreover, it is clear that Archive.is is not even attempting to honor copyright.

By the way, the DMCA does NOT allow archives to re-host copyrighted material. (Brick and mortar libraries obviously are allowed to loan out purchased paper copies of copyrighted material, but that is completely different, and even they are under strict prohibition against creating any new copies, beyond the physical copies they purchased.) In fact, the whole purpose of the DMCA take-down agreement is so that large sites can avoid prosecution for copyright violation IF they agree to promptly take down and de-link (and even remove from search results) any site that is re-hosting copyrighted material without permission.

One more comment - The fact that Archive.is is hosted outside the United States is not particularly relevant, because they are lobbying to be used as citations for Wikipedia articles (for example), and Wikipedia does business in the US, and strives to avoid violating US copyright laws. So the whole ostensible purpose of Archive.is is undermined by its failure (so far) to implement some effective means of honoring copyright law. Until it does so, I don't think it can be adopted by any reputable web site.Weakestletter (talk) 02:32, 24 September 2013 (UTC)[reply]

I see that the DMCA exemption for libraries and archives (look for section 404 at [1]) only allows the copied material to be used on the premises, not over the Internet:

any such copy or phonorecord that is reproduced in digital format is not otherwise distributed in that format and is not made available to the public in that format outside the premises of the library or archives.

Is there a reliable source that says archive.is is "lobbying to be used as citations for Wikipedia articles"? I took out the sentence about Wikipedia because it seemed to belong on a Wikipedia project page, not in a regular article. —rybec 03:44, 24 September 2013 (UTC)[reply]

That is its entire function. See the Wikipedia project page, where this is being promoted:

http://en.wikipedia.org/wiki/Wikipedia:Using_Archive.is

In particular, note that "Archive.is monitors RecentChanges of many wiki projects (including all national wikipedias) in order to authomaticaly archive new links as soon as possible after the editors added them to the articles." You see? Archive.is is designed specifically to re-host Wikipedia links, for the purpose of having stable references. Its whole reason for existence is to be used to link from Wikipedia articles, rather than linking to the original web sites which, of course, are under the control of those unreliable copyright holders. What the founders of Archive.is seem to have overlooked is that Wikipedia needs to scrupulously adhere to the copyright laws of the countries where it operates, including the US, and this prohibits them from linking to unauthorized re-hosted copies of copyrighted material.Weakestletter (talk) 14:36, 24 September 2013 (UTC)[reply]

You are right, there must be "User:RotlinkBot monitors ..." not "Archive.is monitors ...". It is known from User:RotlinkBot comments on Wikipedia talk pages, not from Archive.is FAQ or Twitter. 88.15.83.61 (talk) 16:13, 24 September 2013 (UTC)[reply]

It is also not clear if User:RotlinkBot is still doing it after he/she/it was banned on Wikipedia. 88.15.83.61 (talk) 16:31, 24 September 2013 (UTC)[reply]

As of April 17, 2019, archive.is has only 2901 URLs belonging to en.wikipedia. It is self-evident that the overwhelming majority has been moderated, censored and manually deleted over time. — Preceding unsigned comment added by 94.38.235.96 (talk) 16:28, 17 April 2019 (UTC)[reply]

Nah, a whole lot more than 2k.[2] If anyone was doing wholesale deletions we'd know about it. This thread is very old; many new developments have occurred since 2013. -- GreenC 17:51, 17 April 2019 (UTC)[reply]

filter misbehaviour

The filter which prevents adding links to pages on the archive.is website also prevents adding links to the Archive.is article. The links such as [[Archive.is]]

Country blocking

Information about the country blocking is self-evidently available on the web.

You can easily google for currently active proxies in the countries in question and then run something like "Chrome.exe --proxy-server=socks5://37.27.205.217:35101 http://archive.is"
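The command above can be assembled programmatically. The following is only a sketch: the `chrome_proxy_command` helper is hypothetical, the executable name `Chrome.exe` and the proxy address are simply the ones quoted in the comment, and the proxy may well no longer be active. Only Chromium's `--proxy-server` switch is taken as given.

```python
def chrome_proxy_command(proxy_host, proxy_port, url, chrome="Chrome.exe"):
    """Build the argument list for launching Chrome through a SOCKS5 proxy.

    The executable name is an assumption; adjust it for the local system.
    """
    proxy_flag = f"--proxy-server=socks5://{proxy_host}:{proxy_port}"
    return [chrome, proxy_flag, url]


# Proxy address taken from the comment above; it is not verified here.
command = chrome_proxy_command("37.27.205.217", 35101, "http://archive.is")
print(" ".join(command))
```

The list can be passed to `subprocess.run(command)` to actually launch the browser; that step is left out here because it depends on the local installation.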

Removal of archived content

The article needs to be updated, as there is a "report" button where it's possible to report archived content to be taken down for a wide variety of reasons. nyuszika7h (talk) 09:24, 16 April 2016 (UTC)[reply]

The person/s running archive.is don't necessarily remove archived content; none of my content has ever been removed despite asking them to. — Preceding unsigned comment added by Nothappy21010 (talk • contribs) 17:06, 30 November 2016 (UTC)[reply]

Hatnote

What's better here - "Not to be confused with Internet Archive." or some variant of "For the San Francisco-based nonprofit website at archive.org, see Internet Archive."? User:94.230.146.228 is concerned that by being specific we're implying that the two websites are connected, but I think it's more misleading to say "not to be confused with Internet Archive" because that can be easily read as "not to be confused with archiving on the internet in general" - a reader actually looking for archive.org (without knowing its URL) might not think to click that and assume that they're already at the right article. --McGeddon (talk) 10:02, 11 May 2016 (UTC)[reply]

Perhaps link to Archive.org in the hatnote, which is a redirect to Internet Archive? nyuszika7h (talk) 20:18, 11 May 2016 (UTC)[reply]

Saying "For the San Francisco-based nonprofit website at archive.org, see Internet Archive." has a false connotation of "archive.is is sort of archive.org but for-profit" or even "there is a single company with non-profit and for-profit products; the former is archive.org and the latter is archive.is"; Anyway "...for non-profit see ..." implies that the following text is about something which is not non-profit albeit archive.is is non-profit as well 94.230.146.228 (talk) 16:37, 12 May 2016 (UTC)[reply]

"Not to be confused with archive.org." sounds good. Any objections? --McGeddon (talk) 18:06, 14 May 2016 (UTC)[reply]

Wikipedia:Archive.is RFC 4

There is an RfC at Wikipedia:Archive.is RFC 4 with the proposal "Remove archive.is from the Spam blacklist and permit adding new links (Oppose/Support)". Cunard (talk) 06:20, 23 May 2016 (UTC)[reply]

'Worldwide availability' section

Does the section contain valuable information? Virtually every website is geo-banned somewhere. JustPaste.it is blocked in almost the same set of countries, Facebook is banned in China, etc.

If this information is valuable, I would suggest creating a table or a List of geo-blocked websites. 59.11.121.66 (talk) 03:40, 28 May 2016 (UTC)[reply]

I think it's encyclopedic information, notably because the host of the website has blocked connections from certain countries and given reasons for that. A comprehensive table is not needed, but blocks that have been covered by the media should be recounted here. – Finnusertop (talk ⋅ contribs) 14:05, 8 June 2016 (UTC)[reply]

Use cases

I agree with PeterTheFourth actually, I was going to bring that up. It makes archive.is sound like some service which is only used by "authors and hacktivists", when it can be used by anyone, really. nyuszika7h (talk) 11:35, 22 June 2016 (UTC)[reply]

RfC: long or short URL

RfC open if we should use long or short URLs when linking to archive.is -- GreenC 23:17, 5 July 2016 (UTC)[reply]


In hindsight

Given hindsight, blacklisting Archive.is and bot-spamming all Wikipedia articles that used to use it did a lot of damage to our portal that cannot be reversed easily. Here's just one of the countless examples of Wikipedia articles referenced to no longer active websites, which were archived by Archive.is and never archived by the Wayback Machine: en.wikipedia.org » DRB Class 52, General Government, Kriegslokomotive » snapshots from host old.pkp.pl including » http://archive.is/bWNJt → Please take a look at my attempts at trying to reverse the damage in just one Wikipedia article: the Holocaust train. Would be nice to see another bot designed specifically to undo the deletions prompted by the original bot. Poeticbent talk 16:19, 2 August 2016 (UTC)[reply]

`archive.fo`

archive.fo seems to be one of its domains, and sometimes archive.is redirects there. This is my original research of course, but food for references. 80.221.159.67 (talk) 23:31, 30 August 2016 (UTC)[reply]

How do I meet verifiability to say the site is hosted at Hostkey ( hostkey.nl / hostkey.ru ) ?

Cloudflare told me it is hosted there after a year of complaining about this malicious scraper botnet site. Plimitarmed (talk) 05:29, 8 November 2016 (UTC)[reply]

@Plimitarmed: Hostkey is the primary host (servers in Amsterdam and Moscow), distributed by Cloudflare.^[1]^[2] [see references]
--John Navas (talk) 20:06, 3 December 2016 (UTC)[reply]

References

^ "Netcraft - Search Web by Domain". Netcraft Services. Netcraft Ltd. Retrieved 3 December 2016.
^ "About HOSTKEY". HOSTKEY. HOSTKEY. Retrieved 3 December 2016.

Some links I found citing it's now at Hostkey.ru / Hostkey.nl

Most of these links say Hostkey then Cloudflare, however Cloudflare also told me it's hosted there so I can be sure it's not moved from Hostkey. Plimitarmed (talk) 07:53, 8 November 2016 (UTC)[reply]

@Plimitarmed: Hostkey is the primary host (servers in Amsterdam and Moscow), distributed by Cloudflare.^[1]^[2] [see references]
--John Navas (talk) 20:05, 3 December 2016 (UTC)[reply]

References

^ "Netcraft - Search Web by Domain". Netcraft Services. Netcraft Ltd. Retrieved 3 December 2016.
^ "About HOSTKEY". HOSTKEY. HOSTKEY. Retrieved 3 December 2016.

Since the current references mentioned by Jnavas2 only mention either Cloudflare or the servers locations (but not that Hostkey is the provider for archive.is), I added one of the links suggested by Plimitarmed. Saturnalia0 (talk) 01:12, 24 January 2017 (UTC)[reply]

Data centers

It would be interesting if something like this https://en.wikipedia.org/wiki/Google_Data_Centers could be written about how the site runs. Plimitarmed (talk) 19:50, 11 November 2016 (UTC)[reply]

@Plimitarmed: Distribution is handled by Cloudflare. --John Navas (talk) 20:03, 3 December 2016 (UTC)[reply]

Indeed. Who owns it? Who runs it? And who owns them? Millions of people use their service without even asking who and what they are. This article doesn't even begin to be Wikipedian. — Preceding unsigned comment added by 86.239.242.222 (talk • contribs) 15:44, 5 June 2018 (UTC)[reply]

Reliability

@Rhododendrites: I maintain that reliability is a serious issue for archive sites, and added relevant information to the Article. Rhododendrites disagrees. Let's discuss. --John Navas (talk) 20:02, 3 December 2016 (UTC)[reply]

Jnavas2 - Well, let's start with the first rather obvious problem. Sourcing. You've provided no source alongside your claim. On 2 December 2016 the site became unavailable with browsers displaying Loading spinners indefinitely. It resumed normal operation late in the day. For all I know, this could be localized to you and your internet connection specifically. But, let's assume for a moment that you've included a source with the claim and look at it from an encyclopaedic perspective. Wikipedia isn't a collection of all knowledge. Site crashes are generally discussed in some greater context. For example, many sites - including Wikipedia - went on blackout in response to SOPA and PIPA. Another example would be major DDOS attacks like October's Dyn cyberattack. So, what is the greater context here? Again, for all I know this could be a localized effect, or server maintenance. It's of no encyclopaedic value in either of these cases. Has it been attacked or is there something interesting about this event that would make it notable in some way? Can you also provide a source to back the addition of this content? Note, a source on this won't necessarily guarantee its encyclopaedic value, but at least I'll have something to work off of. Mr rnddude (talk) 22:21, 3 December 2016 (UTC)[reply]

@Mr rnddude: Why wasn't the issue handled this way to begin with?

Detail: Assuming you're not questioning my basic competence, I'd be happy to provide more detail. I ran tests with multiple browsers and diagnostic utilities on multiple devices over different Internet connections, including VPN connections to different countries, cross-checked with colleagues. I just kept the content brief so as to avoid excessive detail. How much is enough or too much? Would a long footnote be more appropriate?
Context: It seems self-evident to me that reliability is a significant issue for archive sites that profess to create "permanent" records, particularly with distribution by Cloudflare, and I was trying to avoid extraneous detail. Would you like more explanation of the issue? Perhaps by footnote?
Sourcing: My material was the result of my own work. I didn't learn it from some other source. Would you really want the detailed output of the utilities I used? Or must I find some 3rd party to vouch for my results? And how would the reliability of the 3rd party be judged?

--John Navas (talk) 23:01, 3 December 2016 (UTC)[reply]

John, your own observations fall under WP:Original research. This is not allowed on Wikipedia. If the outage of Archive.is on 2 December was not important enough to be noticed by any WP:Reliable source that we can cite, then it doesn't belong in our article. EdJohnston (talk) 23:34, 3 December 2016 (UTC)[reply]

(edit conflict) Thanks for the response. In terms of this being your work and the work of your colleagues, Wikipedia doesn't accept original research. The reason for this is that if it did, anybody could put anything into any article. It would be a true free for all with information and misinformation. It'd be impossible to distinguish between legitimate additions and bogus additions. To your point about sourcing; this brings up the question of reliable sources and event notability simultaneously. Reliable sources are those that are published by respected book publishers, news organizations, journals and other academic sources. In this case, I'd assume that news organizations might mention this information if it's significant enough. Event is generally for stand-alone articles, but you can vet basic additions using the same principle. Is the event widely covered in the news or is it a passing mention - a footnote in history if you will. I had a quick skim of the internet for "archive.is down" and what I did find was that the only detected down time was on November 17, 2016 for a period of around 35 minutes. I also wouldn't hold the source to any particular measure of reliability. Other than that, I haven't found anything about it. As somebody who edits and develops history articles on Wikipedia I am well aware of the pains that WP:OR brings up and also the annoyance that not being allowed to draw your own conclusions - no matter how obvious or trivial - is. The simplest detail has to be drawn from another published source. If you can get your hands on a reliable third-party source, then I'd be able to vet it myself and assess its notability; if not, it falls under the purview of original research. Mr rnddude (talk) 23:39, 3 December 2016 (UTC)[reply]

@Mr rnddude: I suspect that would be a fool's errand and thus not a good use of my time, so I will simply pass. Sic transit gloria mundi --John Navas (talk) 23:48, 3 December 2016 (UTC)[reply]

As are many things. The balance, it's never quite right. If you have too many sources then little disagreements between them become a battleground between editors, if there are too few then it's borderline impossible to write anything informative about the topic and it will invariably end up at AfD. Mr rnddude (talk) 23:52, 3 December 2016 (UTC)[reply]

@Mr rnddude: Sourcing is an illusion of reliability because the Internet is so full of unreliable sources and bad information. It can have merit where data can be authenticated, but even there it's open to distortion, as in the case of climate change denial. Plus it's simply not possible to source everything in an article. So it ultimately becomes a subjective values proposition, with insiders defending their values against outsiders. As I said, a fool's errand, so I will simply pass. Too much pain for too little gain. --John Navas (talk) 00:10, 4 December 2016 (UTC)[reply]

@Jnavas2: There is also the issue of reliability for the service, there are quite a few organizations trying to shut them down or block them. I've tried to visit their site several times over the past few weeks with none of the popular domains working anymore. I'm not sure if Archive.is has shut down or is being censored by someone. Lassitergregg (talk) 03:38, 3 June 2018 (UTC)[reply]

@Lassitergregg: https://downforeveryoneorjustme.com/archive.is

How many pages?

Each of their shortened URLs has 5 characters (A-Z a-z 0-9) (4 characters until 2012).

62 possibilities per character.

4 characters = 14.776.336 possibilities.
5 characters = 916.132.832 possibilities, more than Archive.org has saved pages.

How many pages has Archive.is saved so far? Already beyond 14776336? --84.147.46.123 (talk) 01:27, 14 November 2018 (UTC)[reply]
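The arithmetic above can be checked with a short sketch. It assumes, as the comment does, that short codes draw on exactly the 62 characters A-Z, a-z and 0-9; the `short_url_space` helper is hypothetical and says nothing about how Archive.is actually allocates its codes.

```python
import string

# Alphabet assumed in the comment above: A-Z, a-z, 0-9 (62 characters).
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits


def short_url_space(length: int) -> int:
    """Number of distinct short codes of the given length."""
    return len(ALPHABET) ** length


for n in (4, 5):
    print(n, short_url_space(n))
# 4 14776336
# 5 916132832
```

This only bounds the number of possible codes; it does not show how many of them are in use.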

Contents are regularly deleted or filtered

The article needs more information, although it seems to be very difficult to find, e.g. about who owns the website and who manages it.

Search results are sponsored by Google or yandex.ru, as is done by any independent, non-commercial company on the web.

An address space formed by only 5 alphanumeric characters is enough for all the internet requests simply because many of the saved results are deleted, censored and made unavailable to the public. Some of them are "embedded" into one result which is shown by the search box, and continues to link the other saved pages. — Preceding unsigned comment added by 84.223.69.200 (talk) 19:12, 15 December 2018 (UTC)[reply]

Who owns/runs this site?

A strange omission from the article! I note somebody above in this talk page names a "Denis Petrov". Equinox ◑ 18:14, 26 December 2018 (UTC)[reply]

https not working?

Looks like https doesn't work at the moment. Site is still available on http though. Don't know whether this is a permanent change? Evert (talk) 15:41, 3 February 2019 (UTC)[reply]

Working ok for me. -- GreenC 15:49, 3 February 2019 (UTC)[reply]

https://www.ssllabs.com/ssltest/analyze.html?d=archive.today

Something not quite right there... Evert (talk) 16:07, 3 February 2019 (UTC)[reply]

Hmm.. do you know if this is a new condition, or just now noticed? -- GreenC 17:59, 3 February 2019 (UTC)[reply]

Looks like the certificates for this site/these sites have been fixed, so I guess all is ok now Evert (talk) 07:07, 4 February 2019 (UTC)[reply]

Evert, I'm getting "Assessment failed: Unable to connect to the server". Perhaps it has been blocked. "it usually happens due to firewall restrictions". -- GreenC 15:41, 4 February 2019 (UTC)[reply]

I used a different checker and it reports SSL is working. -- GreenC 15:43, 4 February 2019 (UTC)[reply]

Blocked in New Zealand?

Can anyone confirm Archive.today has been blocked in New Zealand following the Christchurch mosque shootings? Muzilon (talk) 15:49, 17 March 2019 (UTC)[reply]

It is pingable from a server in New Zealand, but there might still be blocks at the protocol or ISP level. -- GreenC 15:56, 17 March 2019 (UTC)[reply]

I suspect the latter. A bit excessive because it's not really a video-hosting site (which is what the New Zealand ISPs are trying to target in this case). Muzilon (talk) 16:05, 17 March 2019 (UTC)[reply]

Digital preservation open archives are unavoidable for a long-term content provider based on external primary sources like Wikipedia

A total ban of archive.is from Wikipedia would be simply foolish and suicidal for the encyclopedia. Wikipedia aims at long-term digital preservation of its contents, but it does not claim to be a primary source of information, even if the honesty of its contributors, the quality of its policies, and the number of reviewers for each page and every single edit make it much more reliable and objective than many other renowned and prestigious encyclopedias. But the points are that:

Wikipedia is based solely on external sources
the average lifetime of a linked Web page is on the order of weeks or a few months, so broken or migrated links have to be fixed continuously.

Such a content provider strongly needs one (or more) permanent archive(s). For example, the French Wikipedia uses a private archive (http://archive.wikiwix.com, as in w:fr:François Mitterrand#Notes et références): not all references are archived and not all archived contents are publicly readable by anyone, e.g. for legal reasons. I think this choice was adopted in order to avoid copyright infringements and to have a private and independent external certification that a given source did exist and was linked to a Wikipedia oldid in the past. But any administrator can decide:

what has to be archived in the long term;
what is archived but not readable (like a private archive);
which sources can't be ignored, excluded, or left unpreserved.

Such a system would be completely inappropriate for an Open Project, whose sources must be reliable and verifiable for anyone.

For copyright reasons, the Internet Archive has also made many saved entries unavailable, so that the copy is lost or can't be used as the archive-url parameter in the Wikipedia citation templates. On the Web we basically have the Internet Archive or Archive.is, given that WebCite is only for particular kinds of selected material. So Archive.is has become an unavoidable choice.


@@ Line 234: / Line 234: @@
 ::I suspect the latter. A bit excessive because it's not really a video-hosting site (which is what the New Zealand ISPs are trying to target in this case). [[User:Muzilon|Muzilon]] ([[User talk:Muzilon|talk]]) 16:05, 17 March 2019 (UTC)
+== Digital preservation open archives are unavoidable for a long-term content provider based on external primary sources like Wikipedia ==
+A total ban of ''<nowiki>archive.is</nowiki>'' from Wikipedia would be simply foolish and suicidal for the encyclopedia. Wikipedia aims at long-term [[digital preservation]] of its contents, but it does not claim to be a primary source of information, even if the honesty of its contributors, the quality of its policies, and the number of reviewers for each page and every single edit make it much more reliable and objective than many other renowned and prestigious encyclopedias. But the points are that:
+* Wikipedia is based solely on external sources
+* the average lifetime of a linked [[Web page]] is on the order of weeks or a few months, so broken or migrated links have to be fixed continuously.
+Such a content provider strongly needs one (or more) permanent archive(s). For example, the French Wikipedia uses a private archive (''<nowiki>http://archive.wikiwix.com</nowiki>'', as in [[w:fr:François Mitterrand#Notes et références]]): not all references are archived and not all archived contents are publicly readable by anyone, e.g. for legal reasons. I think this choice was adopted in order to avoid copyright infringements and to have a private and independent external certification that a given source did exist and was linked to a Wikipedia oldid in the past. But any administrator can decide:
+* what has to be archived in the long term;
+* what is archived but not readable (like a private archive);
+* which sources can't be ignored, excluded, or left unpreserved.
+Such a system would be completely inappropriate for an [[Open Project]], whose sources must be [[WP:reliable sources|reliable]] and [[Wikipedia:Verifiability|verifiable]] ''for anyone''.
+For copyright reasons, the [[Internet Archive]] has also made many saved entries unavailable, so that the copy is lost or can't be used as the ''archive-url'' parameter in the Wikipedia citation templates. On the Web we basically have the Internet Archive or Archive.is, given that [[WebCite]] is only for particular kinds of selected material. So Archive.is has become an unavoidable choice.