|This is the talk page for discussing improvements to the Amazon S3 article.|
|This article is of interest to the following WikiProjects:|
Total Objects Stored
I noticed that the total objects stored has a citation needed, and I have generally gotten this info from Werner Vogels' (Amazon CTO) blog. I propose editing the sentence about total objects to read:
Amazon S3 is reported to store more than 14 billion objects as of January 2008, up from 10 billion in October 2007.
With citation to point to: http://www.allthingsdistributed.com/2008/03/happy_birthday_amazon_s3.html
The text regarding this topic now reads: "Amazon S3 is reported to store more than 102 billion objects as of March 2010[update]. This is up from 64 billion as of August 2009[update], 52 billion objects as of March 2009, 29 billion objects as of October 2008, 14 billion objects as of January 2008, and from 10 billion in October 2007." Um, this is SCREAMING for a chart. Show, don't tell. How 'bout it, people? 188.8.131.52 (talk) 14:08, 18 August 2010 (UTC)
- I thought that the CNAME record needed to point to s3.amazonaws.com, but according to  (retrieved Jan 10, 2010) "The CNAME DNS record should alias your domain name to the appropriate virtual hosted style host name. For example, if your bucket name (and domain name) is images.johnsmith.net, the CNAME record should alias to images.johnsmith.net.s3.amazonaws.com. [...] Setting the alias target to s3.amazonaws.com also works but may result in extra HTTP redirects." --Vrmlguy (talk) 01:23, 11 January 2010 (UTC)
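To illustrate the quoted guidance, here is a minimal DNS zone fragment for the example in the documentation (the names are Amazon's own example names, not a real deployment):

```
; Virtual-hosted-style CNAME: the bucket name must match the domain name.
images.johnsmith.net.  IN  CNAME  images.johnsmith.net.s3.amazonaws.com.
; Aliasing to plain s3.amazonaws.com also works, but may cost extra redirects:
; images.johnsmith.net.  IN  CNAME  s3.amazonaws.com.
```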
I'm confused. There was once a list of promotional links that was removed. Makes complete sense.
There was also a list of well known services and companies that utilized the service in the introduction. That was removed and I'm not sure why since I thought it added clear value and information (and since most weren't linked).
Then someone added a pretty clear plug for JungleDisk. Why hasn't that been removed, if the others have? I'm not sure that it should be, as it is an excellent example of a deployed product that uses the service, but it certainly is not the only one and it certainly is not the most well known one. Both ElephantDrive and BeInSync are far better covered (NYT, WSJ, PC World, PC Mag for both) as solutions and users of S3. Is this special treatment? Outsideshot (talk) 03:28, 27 February 2008 (UTC)
- I think we should pull the promo links. It's harsh, because it does bring in traffic. But that is not Wikipedia's job. Now, if any of the identified companies published some good papers on working with S3, about what worked, what didn't, and how best to interact with it, that would be worth linking to. SteveLoughran (talk) 19:06, 27 February 2008 (UTC)
I think that the current pricing information needs to be changed because it is inaccurate. Not only is the $1500 per terabyte-year incorrect (at $0.15 per GB-month it would be $1800/year), but they recently (yesterday, 10/8) released tiered pricing. I will work on an edit now - let me know if you think this is a bad idea... Outsideshot (talk) 19:26, 9 October 2008 (UTC)
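For reference, the arithmetic behind that correction (using the flat rate quoted in this thread; the tiered schedule itself is not reproduced here):

```python
# Storage cost at the flat rate quoted above: $0.15 per GB-month.
RATE_PER_GB_MONTH = 0.15

def terabyte_year_cost(rate_per_gb_month):
    """Cost of storing 1 TB (taken as 1000 GB) for 12 months at a flat GB-month rate."""
    return rate_per_gb_month * 1000 * 12

print(terabyte_year_cost(RATE_PER_GB_MONTH))  # 1800.0, not the $1500 in the article
```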
Oh, I added the price info before I noticed that someone had just removed it. The article is deficient without some basic pricing info - without this, no-one can tell whether the service is free or what it costs (we didn't say). Describing a service does not make a Wikipedia article an advertisement. I have removed the description of the service as "inexpensive" - that's someone's opinion, and we need to let the reader make their own decisions about what is or isn't inexpensive. -- Finlay McWalter | Talk 21:18, 8 August 2007 (UTC)
- Please see the policy (Wikipedia is not a sales catalog) I referred to in the edit comment: articles should not include any price information unless there's a specific justification (such as a discussion about a notable sale, or a price war).
- Contrariwise, "inexpensive" is not someone's opinion, but an important defining characteristic of the service that's verifiable at any of the sources. (It also communicates to the reader that the service is not free, regarding your concern.) —Piet Delport 2007-08-09 01:06
- I don't think Google Base overlaps. Google File System probably comes closest, but it's not available to the public. —Piet Delport 2007-08-09 01:12
I would say that Google Cloud Storage is now a direct competitor and should be listed. The Wikipedia page for Google Storage lists S3 as a comparable solution: http://en.wikipedia.org/wiki/Google_Storage — Preceding unsigned comment added by 184.108.40.206 (talk) 18:46, 28 November 2013 (UTC)
- I concur. I have started to add some technical content instead. It may be relevant to list libraries that provide access to the repository, and books that cover it. To list all the web sites that offer S3-hosted backup services would be like listing all web sites that use MySQL under the MySQL entry. SteveLoughran (talk) 19:09, 27 February 2008 (UTC)
- Especially the line about 99.999999999% durability. A single corrupted object is enough to break that, and I can't believe they have a perfect record so far. I have no information whether any data has been lost during outages, but with a service of this kind, _some_ loss is expected, at least unless every object is replicated off-site before any single write succeeds (and that would give noticeable delays, which we don't get). KiloByte (talk) 22:31, 26 May 2012 (UTC)
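To make the durability point concrete, a back-of-the-envelope sketch (the 99.999999999% figure is Amazon's stated design target for annual durability; treating object losses as independent is a simplifying assumption):

```python
# Expected number of objects lost per year if each object independently
# survives a year with probability 0.99999999999 (eleven nines).
DURABILITY = 0.99999999999

def expected_annual_losses(num_objects, durability=DURABILITY):
    """Expected yearly object losses under an independence assumption."""
    return num_objects * (1.0 - durability)

# Across 100 billion objects (roughly the March 2010 count quoted above):
print(expected_annual_losses(100e9))  # on the order of 1 object per year
```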
OR removed from page
I removed the following section because it was entirely original research and speculation. Per User:SteveLoughran's request, here it is.
The implementation details of S3 are not documented, but it is possible to infer aspects of it from the behaviour (and pricing) of the service.
- There is an S3 repository in the US; there is another in Europe. Their locations are not public.
- From an EC2 server, read access can initially be slow. This could imply that S3 data is not always in the same physical location as all the EC2 servers.
- Amazon do not bill for data transferred between the US EC2 server farm and the US S3 store, but they do bill for access between those servers and the European S3 store. While the S3 store and the EC2 servers may be in separate locations, they must be close enough together that the cost of transferring data is negligible. Both services are presumably hosted on the same Metropolitan Area Ethernet.
- Connectivity between the S3 datastores and the rest of the Internet is good. This implies they are close to one of the main Internet exchanges.
- After the initial slow access, later accesses are fast. This could imply that the data is cached.
- Write access is tangibly slower than writing to a local hard disk. Amazon guarantee that when the operation has completed, data has been written to disk in multiple locations.
- Amazon warn that even after data has been written, old values may still be read. This implies that there can be more than one cache of the data, and that every GET request does not trigger a check from the front-end caches to the back-end store.
- Amazon adjusted their initial pricing from a simple flat rate per GB to one that also charges per thousand GET or HEAD operations, rather than purely per byte, which affects small, frequently retrieved items. There may be a flat cost for every entry (indexing, billing entries) and a per-request cost in CPU time for every access. These fixed overheads are now billed for.
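The billing model described in that last bullet can be sketched as follows (the rates here are placeholders for illustration, not Amazon's actual prices):

```python
# Hypothetical illustration of storage-plus-per-request billing.
STORAGE_RATE = 0.15   # $ per GB-month (placeholder rate)
REQUEST_RATE = 0.01   # $ per 1000 GET/HEAD requests (placeholder rate)

def monthly_bill(stored_gb, get_head_requests):
    """Storage charge plus per-request charges for one month."""
    return stored_gb * STORAGE_RATE + (get_head_requests / 1000) * REQUEST_RATE

# A bucket of tiny, frequently fetched objects now pays for the per-request
# overhead that a purely per-byte price would hide:
print(monthly_bill(stored_gb=1, get_head_requests=10_000_000))
```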
The dynamic per-bucket DNS entries are implemented by having a custom S3 DNS server that returns a hostname for every host under the s3.amazonaws.com subdomain, even if a bucket of that name has never been created:

 > nslookup made-up-name-for-wikipedia.s3.amazonaws.com
 Non-authoritative answer:
 made-up-name-for-wikipedia.s3.amazonaws.com canonical name = s3-directional-w.amazonaws.com.
 s3-directional-w.amazonaws.com canonical name = s3-1-w.amazonaws.com.
 Name: s3-1-w.amazonaws.com
 Address: 220.127.116.11

At the identified host, there is a web server that always serves up 404 error pages when a GET request is issued against a nonexistent bucket.

 > telnet made-up-name-for-wikipedia.s3.amazonaws.com 80
 Trying 18.104.22.168...
 Connected to s3-1-w.amazonaws.com.
 Escape character is '^]'.
 GET / HTTP/1.0
 host: made-up-name-for-wikipedia.s3.amazonaws.com

 HTTP/1.1 404 Not Found
 x-amz-request-id: F0F7301EF1873635
 x-amz-id-2: 9gT/YmUa7EZXIm9FNv7GGThAre8Kn5CEfXpoJpthwuq54Pm+5RRcThAdBa20XsLj
 Content-Type: application/xml
 Date: Wed, 27 Feb 2008 18:54:42 GMT
 Connection: close
 Server: AmazonS3

 <?xml version="1.0" encoding="UTF-8"?>
 <Error><Code>NoSuchBucket</Code>
 <Message>The specified bucket does not exist</Message>
 <RequestId>F0F7301EF1873635</RequestId>
 <BucketName>made-up-name-for-wikipedia</BucketName>
 <HostId>9gT/YmUa7EZXIm9FNv7GGThAre8Kn5CEfXpoJpthwuq54Pm+5RRcThAdBa20XsLj</HostId>
 </Error>
 Connection closed by foreign host.

Notice how the XML error text is intended for machine interpretation rather than for end users. The RequestId element can be used in support requests, as Amazon log every request, at least for a few days.
Are there services other than Amazon S3 based on the same concept? I tried to find comparisons and so on, but no luck. —Preceding unsigned comment added by 22.214.171.124 (talk) 19:03, 31 July 2008 (UTC)
- There are several other services that offer on-demand raw storage via a services interface. Examples include Nirvanix (private, based in San Diego) and Rackspace Cloud Files (Rackspace is public, based in San Antonio), but there are many others operating at smaller scale or trying to get off the ground. Amazon appears to be the most widely adopted at this point. Outsideshot (talk) 06:38, 1 April 2010 (UTC)
Links to S3 viewing services
After the link purge, the page was free of external links other than to S3 itself. But I can see links to web interfaces to S3 creeping in. Can we discuss here what the policy should be? I am personally against them: by sharing your private AWS keys with third parties, they get de facto access to your credit card to run up bills across AWS, and by sharing your private AWS keys across unencrypted HTTP connections, anyone on the net has effective access to AWS services under your user ID. Encouraging people to use these services is dangerous. —Preceding unsigned comment added by SteveLoughran (talk • contribs) 11:36, October 9, 2008
- WP:EL should of course be the overriding policy, but to clarify in this case I removed the third-party links because (1) they served mostly to advertise services, rather than providing useful information about the topic, and (2) we can't possibly link to every S3-related resource (see WP:NOTLINK). If, as you say, using these services actually poses a security risk, then that is an even stronger reason for us not to link to them. Adam McMaster (talk) 17:42, 9 October 2008 (UTC)
- Are there any criteria being applied to evaluate notable use in this context? Within this section alone there are three mentions (Dropbox, Kompoz, and 37signals) and only one is mentioned in the article. Ubuntu One is mentioned in the article but not here. What about JungleDisk, ElephantDrive, or Zmanda (or the many others)? —Preceding unsigned comment added by Outsideshot (talk • contribs) 06:56, 1 April 2010 (UTC)
"After seven months of using S3, SmugMug claimed to have saved almost $1 million in storage costs, though some commenters questioned SmugMug's claims, pointing out that S3 charges per month."
I don't know if there is a guarantee (or even what a guarantee really implies, does it mean they would have to pay you damages for lost data?), but there is a service level agreement that states: "AWS will use commercially reasonable efforts to make Amazon S3 available with a Monthly Uptime Percentage (defined below) of at least 99.9% during any monthly billing cycle (the “Service Commitment”). In the event Amazon S3 does not meet the Service Commitment, you will be eligible to receive a Service Credit as described below."
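For concreteness, here is how such a monthly uptime percentage is typically computed (a sketch only; the SLA's precise definition of "Monthly Uptime Percentage" should be consulted):

```python
def monthly_uptime_percentage(downtime_minutes, days_in_month=30):
    """Percentage of minutes in the month during which the service was up."""
    total_minutes = days_in_month * 24 * 60
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes

# The 99.9% commitment over a 30-day month allows roughly 43 minutes of downtime:
print(monthly_uptime_percentage(43.2))  # 99.9
```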
Limitations of S3
Seems like this article touts S3's success without discussing many of its limitations. Some ideas for a new section:
- No uptime guarantees
- No concurrency guarantees
- No way to determine whether a change has been committed globally
- Relatively large number of spontaneously failed requests (I've seen rates as high as 5%)
- Objects limited to 5GB
- Somewhat cumbersome interface
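The 5 GB object limit in the list above is the kind of constraint client code has to work around by splitting large uploads. A minimal sketch of the arithmetic involved (this is an illustration, not an S3 API call):

```python
# Decide how many objects/parts are needed to store a file under a
# 5 GiB per-object cap, as discussed in the limitations list above.
OBJECT_LIMIT = 5 * 1024**3  # 5 GiB

def parts_needed(size_bytes, part_size=OBJECT_LIMIT):
    """Number of parts required to store size_bytes, via ceiling division."""
    return max(1, -(-size_bytes // part_size))

print(parts_needed(12 * 1024**3))  # a 12 GiB file needs 3 parts
```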
- I'm in favour of this, and would add "non-standard semantics for move/copy operations, rather than supporting the WebDav operations", and "no way to put a hard limit on your bandwidth use". One issue though, it may get blocked off as original research -as my other work did- unless there is some external citation of it that we can point to —Preceding unsigned comment added by SteveLoughran (talk • contribs) 12:08, 12 November 2009 (UTC)
Apparently S3 was down for several hours on 15 February 2008, causing much pain to many customers whose businesses were therefore down as well. Shouldn't the article mention this incident and what Amazon did about it? As well as their subsequent record, which I haven't looked for yet.
- I agree, the article should cover this outage. It was a significant event in the field of distributed computing, and it was analyzed by some of the top people in the field. I'd wager there is enough material out there to write an entire WP article on the subject of the 2008 S3 outage alone, and link to it from here. It's mentioned in the WP article on Byzantine fault tolerance.
- Also I would note that there was some debate over whether the affected customers would have experienced a service outage had they been making proper use of Amazon's availability zones feature. I haven't investigated enough to have an informed opinion either way, but I suspect a NPOV article might have to include perspectives from both sides of that argument. 126.96.36.199 (talk) 17:37, 4 July 2012 (UTC)
Type of site Online backup service?
S3 objects are commonly used for many purposes other than online backup - e.g. serving both static and dynamic web assets, direct website hosting, storage for web/desktop/server applications, file system storage, etc.