Talk:Amazon S3

From Wikipedia, the free encyclopedia
This article is of interest to the following WikiProjects:

WikiProject Internet
This article is within the scope of WikiProject Internet, a collaborative effort to improve the coverage of the internet on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has not yet received a rating on the project's quality scale.
This article has not yet received a rating on the project's importance scale.

WikiProject Computing (Rated C-class, Low-importance)
This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as C-Class on the project's quality scale.
This article has been rated as Low-importance on the project's importance scale.

WikiProject Companies
This article is within the scope of WikiProject Companies, a collaborative effort to improve the coverage of companies on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has not yet received a rating on the project's quality scale.
This article has not yet received a rating on the project's importance scale.

Total Objects Stored

I noticed that the total objects stored figure has a "citation needed" tag. I have generally gotten this info from the blog of Werner Vogels (Amazon CTO). I propose editing the sentence about total objects to read:

Amazon S3 is reported to store more than 14 billion objects as of January 2008, up from 10 billion in October of 2007.

With citation to point to: http://www.allthingsdistributed.com/2008/03/happy_birthday_amazon_s3.html

If there are no objections, I will make the change tomorrow... Outsideshot (talk) 20:26, 21 May 2008 (UTC)

I made the change earlier today - this statistic is obviously likely to change over time, so we should update as necessary... Outsideshot (talk) 02:49, 24 May 2008 (UTC)


The text regarding this topic now reads: "Amazon S3 is reported to store more than 102 billion objects as of March 2010[6]. This is up from 64 billion as of August 2009[7], 52 billion objects as of March 2009[8], 29 billion objects as of October 2008,[4] 14 billion objects as of January 2008, and from 10 billion in October 2007." Um, this is SCREAMING for a chart. Show, don't tell. How 'bout it, people? 65.196.0.122 (talk) 14:08, 18 August 2010 (UTC)
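
As a rough illustration of the chart suggested above, the quoted figures can be tabulated and drawn as a text bar chart. This is only a sketch: the numbers are the ones quoted in this thread, while the helper name `render_chart` and the scale of one `#` per 2 billion objects are assumptions made for illustration.

```python
# Object counts in billions, as quoted in the discussion above.
OBJECT_COUNTS = [
    ("Oct 2007", 10),
    ("Jan 2008", 14),
    ("Oct 2008", 29),
    ("Mar 2009", 52),
    ("Aug 2009", 64),
    ("Mar 2010", 102),
]


def render_chart(rows, billions_per_block=2):
    """Return one text line per data point, with a bar proportional to the count."""
    lines = []
    for label, billions in rows:
        # Integer division keeps the bar length a whole number of blocks.
        bar = "#" * (billions // billions_per_block)
        lines.append(f"{label}  {bar} {billions}")
    return lines


if __name__ == "__main__":
    print("\n".join(render_chart(OBJECT_COUNTS)))
```

A real chart in the article would presumably be an image or a wikitable; the point here is only that six data points are easier to compare side by side than in a sentence.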

CNAME addressing

Note: as per this edit, virtual-hosting DNS CNAME records should indeed point to s3.amazonaws.com, not bucket.s3.amazonaws.com: see the Developer Guide's section on Virtual Hosting of Buckets. —Piet Delport 04:40, 24 July 2007 (UTC)

I thought that the CNAME record needed to point to s3.amazonaws.com, but according to [1] (retrieved Jan 10, 2010) "The CNAME DNS record should alias your domain name to the appropriate virtual hosted style host name. For example, if your bucket name (and domain name) is images.johnsmith.net, the CNAME record should alias to images.johnsmith.net.s3.amazonaws.com. [...] Setting the alias target to s3.amazonaws.com also works but may result in extra HTTP redirects." --Vrmlguy (talk) 01:23, 11 January 2010 (UTC)
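
The rule quoted above can be sketched in a few lines. This is a hedged illustration, not Amazon's tooling: the function names `cname_target` and `zone_record` are hypothetical, and the BIND-style record layout is an assumption about how one might write the alias in a zone file.

```python
S3_ENDPOINT = "s3.amazonaws.com"  # endpoint named in the discussion above


def cname_target(bucket_name):
    """Return the virtual-hosted-style host name for a bucket."""
    return f"{bucket_name}.{S3_ENDPOINT}"


def zone_record(bucket_name):
    """Format a BIND-style CNAME record (assumed layout) aliasing the
    domain, which must equal the bucket name, to its bucket host name."""
    return f"{bucket_name}. IN CNAME {cname_target(bucket_name)}."


if __name__ == "__main__":
    print(zone_record("images.johnsmith.net"))
```

For the bucket in the quoted example, `cname_target("images.johnsmith.net")` gives `images.johnsmith.net.s3.amazonaws.com`, the alias target the guide recommends over plain `s3.amazonaws.com`.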

Promotional links

I removed a long list of promotional links to sites providing S3-related services; see Links normally to be avoided.
In the (not unlikely) event that such links return, they should be given little tolerance. —Piet Delport 05:13, 24 July 2007 (UTC)


I'm confused. There was once a list of promotional links that was removed. Makes complete sense.

There was also a list of well known services and companies that utilized the service in the introduction. That was removed and I'm not sure why since I thought it added clear value and information (and since most weren't linked).

Then someone added a pretty clear plug for JungleDisk. Why hasn't that been removed, if the others have? I'm not sure that it should be, as it is an excellent example of a deployed product that uses the service, but it certainly is not the only one and it certainly is not the most well known one. Both ElephantDrive and BeInSync are far better covered (NYT, WSJ, PC World, PC Mag for both) as solutions and users of S3. Is this special treatment? Outsideshot (talk) 03:28, 27 February 2008 (UTC)

I think we should pull the promo links. It's harsh, because it does bring in traffic. But that is not Wikipedia's job. Now, if any of the identified companies published some good papers on working with S3, about what worked, what didn't and how best to interact with it, that would be worth linking to. SteveLoughran (talk) 19:06, 27 February 2008 (UTC)
I've pulled the JungleDisk entry. It would be unfair to leave it in and not the others. Now, if the specific products had their own wiki pages, and the pages were not voted for immediate deletion, then it would be appropriate to retain the links. SteveLoughran (talk) 19:20, 27 February 2008 (UTC)

Prices

I think that the current pricing information needs to be changed because it is inaccurate. Not only is the $1500 per terabyte-year figure incorrect (at $0.15 per GB-month it comes to $1800 per year), but Amazon also released tiered pricing recently (yesterday, 10/8). I will work on an edit now; let me know if you think this is a bad idea... Outsideshot (talk) 19:26, 9 October 2008 (UTC)
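
The arithmetic behind the $1800 figure can be checked with a short sketch. It assumes a flat rate and 1000 GB per terabyte; the function name `yearly_cost_per_tb` is hypothetical, and the tiered pricing mentioned above is not modelled because its rates are not given here.

```python
def yearly_cost_per_tb(price_per_gb_month, gb_per_tb=1000):
    """Flat-rate cost in dollars of storing one terabyte for twelve months.

    gb_per_tb defaults to the decimal convention (1000 GB per TB), which is
    an assumption; pass 1024 for the binary convention.
    """
    return price_per_gb_month * gb_per_tb * 12


if __name__ == "__main__":
    print(f"${yearly_cost_per_tb(0.15):.2f} per TB-year")
```

With the binary convention (1024 GB per TB) the result is somewhat higher, so neither reading reproduces the $1500 figure.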


Oh, I added the price info before I noticed that someone had just removed it. The article is deficient without some basic pricing info: without it, no one can tell whether the service is free or paid (we didn't say). Describing a service does not make a Wikipedia article an advertisement. I have removed the description of the service as "inexpensive" - that's someone's opinion, and we need to let the reader make their own decisions about what is or isn't inexpensive. -- Finlay McWalter | Talk 21:18, 8 August 2007 (UTC)

Please see the policy (Wikipedia is not a sales catalog) I referred to in the edit comment: articles should not include any price information unless there's a specific justification (such as a discussion about a notable sale, or price war).
Contrariwise, "inexpensive" is not someone's opinion, but an important defining characteristic of the service that's verifiable at any of the sources. (It also communicates to the reader that the service is not free, regarding your concern.) Piet Delport 2007-08-09 01:06
Changes in pricing model are worth mentioning, and I have done so, without giving details. Why is it important? Because it provides insight into the implementation details, and what parts of the system cost the company more. SteveLoughran (talk) 19:07, 27 February 2008 (UTC)

Competition

We should enumerate S3's competition. Does Google Base really compete in this space? -- Finlay McWalter | Talk 21:20, 8 August 2007 (UTC)

I don't think Google Base overlaps. Google File System probably comes closest, but it's not available to the public. Piet Delport 2007-08-09 01:12

I would say that Google Cloud Storage is now a direct competitor and should be listed now. The Wikipedia page for Google Storage lists S3 as a comparable solution http://en.wikipedia.org/wiki/Google_Storage — Preceding unsigned comment added by 86.166.0.34 (talk) 18:46, 28 November 2013 (UTC)

Advertisement?

This kind of reads like an advertisement (especially the notable uses section). Maybe this should be changed? 76.103.51.218 (talk) 23:14, 16 February 2008 (UTC)

I concur. I have started to add some technical content instead. It may be relevant to list libraries that provide access to the repository, and books that cover it. To list all the web sites that offer S3-hosted backup services would be like listing all web sites that use MySQL under the MySQL entry. SteveLoughran (talk) 19:09, 27 February 2008 (UTC)
Especially the line about 99.999999999% durability. A single corrupted object is enough to break that, and I can't believe they have a perfect record so far. I have no information whether any data has been lost during outages, but with a service of this kind, _some_ loss is expected, at least unless every object is replicated off-site before any single write succeeds (and that would give noticeable delays, which we don't get). KiloByte (talk) 22:31, 26 May 2012 (UTC)
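
To put a rough number on the concern above: if the 99.999999999% figure is read as the probability that any single object survives a given year (an assumption; this thread does not say how Amazon defines it), the expected yearly loss is the object count multiplied by one minus the durability. The sketch below uses the 102 billion objects quoted earlier on this page; the function name `expected_annual_losses` is hypothetical.

```python
from decimal import Decimal


def expected_annual_losses(object_count, durability_percent="99.999999999"):
    """Expected number of objects lost per year, assuming the durability
    figure is an independent per-object, per-year survival probability."""
    # Decimal avoids the rounding noise a float would add at eleven nines.
    loss_probability = 1 - Decimal(durability_percent) / 100
    return Decimal(object_count) * loss_probability


if __name__ == "__main__":
    print(expected_annual_losses(102_000_000_000))
```

Under that reading, 102 billion objects correspond to an expectation of about one lost object per year, so a small number of losses would not by itself contradict the stated figure.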

OR removed from page

I removed the following section due to its entirely being original research and speculation. Per User:SteveLoughran's request, here it is.

The implementation details of S3 are not documented, but it is possible to infer aspects of it from the behaviour (and pricing) of the service.

  • There is an S3 repository in the US; there is another in Europe. Their locations are not public.
  • From an EC2 server, read access can initially be slow. This could imply that S3 data is not always in the same physical location as all the EC2 servers.
  • Amazon do not bill for data transferred between the US EC2 server farm and the US S3 store, but they do for access between the servers and the European S3 store. While the S3 store and the EC2 servers may be in separate locations, they must be close enough together that the cost of transferring data is negligible. Both services are presumably hosted on the same Metropolitan Area Ethernet.
  • Connectivity between the S3 datastores and the rest of the Internet is good. This implies they are close to one of the main Internet exchanges.
  • After the initial slow access, later accesses are fast. This could imply that the data is cached.
  • Write access is tangibly slower than writing to a local hard disk. Amazon guarantee that when the operation has completed, data has been written to disk in multiple locations.
  • Amazon warn that even after data has been written, old values may still be read. This implies that there can be more than one cache of the data, and that not every GET request triggers a check from the front-end caches to the back-end store.
  • Amazon adjusted their initial pricing from a simple flat rate per GB to one that charges small frequently retrieved items per thousand GET or HEAD operations, rather than purely per byte. There may be a flat cost for every entry (indexing, billing entries) and a per-request cost in CPU-time for every access. These fixed overheads are now billed for.

The dynamic per-bucket DNS entries are implemented by having a custom S3 DNS server that returns a hostname for every host under the s3.amazonaws.com subdomain, even if a bucket of that name has never been created:

> nslookup made-up-name-for-wikipedia.s3.amazonaws.com

Non-authoritative answer:
made-up-name-for-wikipedia.s3.amazonaws.com     canonical name = s3-directional-w.amazonaws.com.
s3-directional-w.amazonaws.com  canonical name = s3-1-w.amazonaws.com.
Name:   s3-1-w.amazonaws.com
Address: 72.21.211.228

At the identified host, there is a web server that always serves up 404 error pages when a GET request against a nonexistent bucket is issued.

> telnet made-up-name-for-wikipedia.s3.amazonaws.com  80
Trying 72.21.207.212...
Connected to s3-1-w.amazonaws.com.
Escape character is '^]'.
GET / HTTP/1.0
host: made-up-name-for-wikipedia.s3.amazonaws.com

HTTP/1.1 404 Not Found
x-amz-request-id: F0F7301EF1873635
x-amz-id-2: 9gT/YmUa7EZXIm9FNv7GGThAre8Kn5CEfXpoJpthwuq54Pm+5RRcThAdBa20XsLj
Content-Type: application/xml
Date: Wed, 27 Feb 2008 18:54:42 GMT
Connection: close
Server: AmazonS3

<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>NoSuchBucket</Code>
<Message>The specified bucket does not exist</Message>
<RequestId>F0F7301EF1873635</RequestId>
<BucketName>made-up-name-for-wikipedia</BucketName>
<HostId>9gT/YmUa7EZXIm9FNv7GGThAre8Kn5CEfXpoJpthwuq54Pm+5RRcThAdBa20XsLj</HostId>
</Error>
Connection closed by foreign host.

Notice how the XML error text is intended for machine interpretation, rather than end users. The RequestId element can be used in support requests, as Amazon log every request, at least for a few days.
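
As an illustration of that machine-readable design, the error body from the transcript above can be parsed with Python's standard library. This is a minimal sketch: `parse_s3_error` is a hypothetical helper name, and the body is copied from the transcript rather than fetched from the service.

```python
import xml.etree.ElementTree as ET

# Error body copied from the telnet transcript above.
ERROR_BODY = b"""<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>NoSuchBucket</Code>
<Message>The specified bucket does not exist</Message>
<RequestId>F0F7301EF1873635</RequestId>
<BucketName>made-up-name-for-wikipedia</BucketName>
<HostId>9gT/YmUa7EZXIm9FNv7GGThAre8Kn5CEfXpoJpthwuq54Pm+5RRcThAdBa20XsLj</HostId>
</Error>"""


def parse_s3_error(body):
    """Return the child elements of an <Error> document as a dict."""
    root = ET.fromstring(body)
    return {child.tag: child.text for child in root}


if __name__ == "__main__":
    error = parse_s3_error(ERROR_BODY)
    # The Code field is what a client would branch on; RequestId is what
    # would be quoted in a support request.
    print(error["Code"], error["RequestId"])
```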

Other Services?

Are there services other than Amazon S3 that offer the same concept? I tried to find comparisons but had no luck. —Preceding unsigned comment added by 67.224.157.246 (talk) 19:03, 31 July 2008 (UTC)

There are several other services that offer on-demand raw storage via a services interface. Examples include Nirvanix (private, based in San Diego) and Rackspace Cloud Files (Rackspace is public, based in San Antonio), but there are many others operating at smaller scale or trying to get off the ground. Amazon appears to be the most widely adopted at this point. Outsideshot (talk) 06:38, 1 April 2010 (UTC)

Links to S3 viewing services

After the link purge, the page was free of external links other than to S3 itself, but I can see links to web interfaces to S3 creeping in. Can we discuss what the policy should be here? I am personally against them: by sharing your private AWS keys with third parties, you give them de facto access to your credit card to run up bills across AWS. And if you share your private AWS keys over unencrypted HTTP connections, anyone on the net has effective access to AWS services under your user ID. Encouraging people to use these services is dangerous. —Preceding unsigned comment added by SteveLoughran (talk • contribs) 11:36, October 9, 2008

WP:EL should of course be the overriding policy, but to clarify in this case I removed the third-party links because (1) they served mostly to advertise services, rather than providing useful information about the topic, and (2) we can't possibly link to every S3-related resource (see WP:NOTLINK). If, as you say, using these services actually poses a security risk, then that is an even stronger reason for us not to link to them. Adam McMaster (talk) 17:42, 9 October 2008 (UTC)

Notable Uses

DropBox seems to be one of the highest profile users of S3 for data storage —Preceding unsigned comment added by 207.47.252.53 (talk) 21:37, 9 December 2008 (UTC)

the rather perfect Kompoz.com also excellently uses Amazon S3 —Preceding unsigned comment added by 149.254.218.148 (talk) 21:12, 6 March 2009 (UTC)

Basecamp from 37signals seems to be using this service for uploaded file storage now too. --74.69.116.217 (talk) 03:31, 4 August 2009 (UTC)

Are any criteria being applied to evaluate notable use in this context? Within this section alone there are three mentions (Dropbox, Kompoz, and 37signals) and only one is mentioned in the article. Ubuntu One is mentioned in the article but not here. What about JungleDisk, ElephantDrive, or Zmanda (or the many others)? —Preceding unsigned comment added by Outsideshot (talk • contribs) 06:56, 1 April 2010 (UTC)

Commentors?

"After seven months of using S3, Smugmug claimed to have saved almost $1 million in storage costs, though some commentors questioned SmugMugs claims, pointing out that S3 charges per month[10]."

The correct word is 'commentators', right? Google returns some results, but online Cambridge Dictionary doesn't. Lukaszsw (talk) 21:22, 8 August 2009 (UTC)

Guarantee

I don't know if there is a guarantee (or even what a guarantee really implies, does it mean they would have to pay you damages for lost data?), but there is a service level agreement that states: "AWS will use commercially reasonable efforts to make Amazon S3 available with a Monthly Uptime Percentage (defined below) of at least 99.9% during any monthly billing cycle (the “Service Commitment”). In the event Amazon S3 does not meet the Service Commitment, you will be eligible to receive a Service Credit as described below."

http://aws.amazon.com/s3-sla/ --Bijansoleymani (talk) 05:07, 4 November 2009 (UTC)
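
For a sense of scale, the 99.9% monthly commitment quoted above can be turned into a downtime budget. This is a sketch only: it assumes a 30-day billing cycle and that the percentage applies to wall-clock minutes, which the quoted excerpt does not confirm (the SLA's own definition of "Monthly Uptime Percentage" is not reproduced here). The function name is hypothetical.

```python
def allowed_downtime_minutes(uptime_percent, days_in_month=30):
    """Minutes of downtime per month that still meet the given uptime
    percentage, assuming uptime is measured over wall-clock minutes."""
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (100 - uptime_percent) / 100


if __name__ == "__main__":
    print(f"{allowed_downtime_minutes(99.9):.1f} minutes per 30-day month")
```

Under these assumptions the commitment allows roughly 43 minutes of unavailability in a 30-day month before a service credit applies.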

Limitations of S3

Seems like this article touts S3's success without discussing many of its limitations. Some ideas for a new section:

  • No uptime guarantees
  • No concurrency guarantees
  • Not possible to determine if a change is committed globally or not.
  • Relatively large number of spontaneously failed requests (I've seen as high as 5%)
  • Objects limited to 5GB
  • Somewhat cumbersome interface

Please feel free to contribute your ideas here or build them into a new section to make the article more NPOV. Cheers, Vectro (talk) 16:56, 11 November 2009 (UTC)

I'm in favour of this, and would add "non-standard semantics for move/copy operations, rather than supporting the WebDAV operations", and "no way to put a hard limit on your bandwidth use". One issue, though: it may get blocked as original research, as my other work did, unless there is some external citation that we can point to. —Preceding unsigned comment added by SteveLoughran (talk • contribs) 12:08, 12 November 2009 (UTC)
These limitations have been blogged and documented extensively, so I don't think it would be hard to add citations. Unfortunately, I don't have the time to add these changes, but feel free to be WP:BOLD and go ahead! Vectro (talk) 02:48, 30 November 2009 (UTC)

Downtime

Apparently S3 was down for several hours on 15 February 2008, causing much pain to many customers whose businesses were therefore down as well. Shouldn't the article mention this incident and what Amazon did about it? As well as their subsequent record, which I haven't looked for yet.

Citations:

Tualha (Talk) 10:27, 11 February 2011 (UTC)

I agree, the article should cover this outage. It was a significant event in the field of distributed computing, and it was analyzed by some of the top people in the field. I'd wager there is enough material out there to write an entire WP article on the subject of the 2008 S3 outage alone, and link to it from here. It's mentioned in the WP article on Byzantine fault tolerance.
Also, I would note that there was some debate over whether the affected customers would have experienced a service outage had they been making proper use of Amazon's availability zones feature. I haven't investigated enough to have an informed opinion either way, but I suspect an NPOV article might have to include perspectives from both sides of that argument. 137.254.4.5 (talk) 17:37, 4 July 2012 (UTC)

Type of site Online backup service?

S3 objects are commonly used for many purposes other than online backup, e.g. serving both static and dynamic web assets, direct website hosting, storage for web/desktop/server applications, file system storage, etc.

I'm not sure what term it can be classified as, but it's definitely not just for online backups. Suggestions are welcome --cookingwithrye (talk) 07:11, 5 March 2013 (UTC)


I agree with this. Perhaps "online storage service"? It's a Fox! (What did I break) 07:11, 6 April 2013 (UTC)