Talk:Google File System

From Wikipedia, the free encyclopedia
WikiProject Google (Rated Start-class, Low-importance)
This article is within the scope of WikiProject Google, a collaborative effort to improve the coverage of Google and related topics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as Start-Class on the project's quality scale.
This article has been rated as Low-importance on the project's importance scale.
 

Access control vs. caching

How can users cache metadata if the master server is used for file-access control? This makes little sense, and someone knowledgeable should make corrections. --Preceding unsigned comment added by 193.15.216.70 (talk · contribs)

My understanding is that each client asks the master whether it can access the chunk, which is different from asking where to find the chunk. It is not an especially secure system, but on the other hand it usually runs on a private, secure network... --maru (talk) contribs 17:57, 13 April 2006 (UTC)
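The distinction between asking for permission and asking for locations can be made concrete. As the published GFS design describes it, a client turns a byte offset into a chunk index using the fixed 64 MB chunk size, asks the master for the chunk handle and replica locations, and caches that reply for a limited time, keyed by file name and chunk index; the data itself then flows between the client and the chunkservers. The Python below is only a minimal sketch under those assumptions: `ChunkLocationCache`, `ask_master`, and the 60-second expiry are hypothetical names and values, and no access-control check is modeled at all.

```python
import time
from typing import Callable, Dict, List, Tuple

# Fixed chunk size used by GFS (64 MB).
CHUNK_SIZE = 64 * 1024 * 1024

# Hypothetical master RPC: (file name, chunk index) -> (chunk handle, replica addresses).
MasterLookup = Callable[[str, int], Tuple[str, List[str]]]


class ChunkLocationCache:
    """Client-side cache of chunk locations, keyed by (file name, chunk index)."""

    def __init__(self, ask_master: MasterLookup, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ask_master = ask_master
        self._ttl = ttl_seconds  # hypothetical expiry; the exact value is an assumption
        self._clock = clock
        self._entries: Dict[Tuple[str, int], Tuple[float, str, List[str]]] = {}
        self.master_calls = 0  # counts round trips to the master

    def lookup(self, name: str, offset: int) -> Tuple[str, List[str]]:
        # Translate the byte offset into a chunk index.
        key = (name, offset // CHUNK_SIZE)
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and cached[0] > now:
            # Fresh entry: the client can go straight to a chunkserver.
            return cached[1], cached[2]
        # Missing or expired: ask the master and remember the reply.
        self.master_calls += 1
        handle, replicas = self._ask_master(*key)
        self._entries[key] = (now + self._ttl, handle, replicas)
        return handle, replicas
```

In this sketch the master is contacted once per chunk per expiry window, and every read inside that window skips the master entirely, which is why caching location metadata does not by itself put the master on the data path.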

Request for Elaboration

It would be nice if someone more knowledgeable than I could point to real-world applications or similar systems and add them to the body of the article. --Maru Dubshinki 09:30 PM Sunday, 06 March 2005

Grammar

Is it 'high data throughput, at the expense of low latency' or 'high data throughput, at the expense of high latency'? The meaning is that in order to get high data throughput, low-latency performance is sacrificed. But now that someone else has edited it to the latter, I am no longer so sure of my grammar. (hmm... 'high latency' or 'low latency'? ...) --maru 01:26, 28 Apr 2005 (UTC)

Additional External Link

Please consider adding the following link, as the article discusses the GFS implementation at length: http://www.baselinemag.com/article2/0,1540,1985040,00.asp --Todd B --July 11, 2006

Working on it. Thanks for the link; http://www.baselinemag.com/article2/0,1540,1985046,00.asp and http://www.baselinemag.com/article2/0,1540,1985047,00.asp mention "BigFiles", which I hadn't heard of before. --maru (talk) contribs 13:21, 11 July 2006 (UTC)

Dead links

The links to the GFS paper PDF and the ZDNet article are dead: http://news.zdnet.com/2100-9588_22-5596811.html http://labs.google.com/papers/gfs.html Both return file-not-found errors. 22 Nov 2011 74.95.124.50 (talk) 02:57, 23 November 2011 (UTC)

I've fixed the ZDnet link. I don't know about the Lab page & PDF - they both still register in a Google search for the paper, so maybe it's a temporary error? --Gwern (contribs) 03:42 23 November 2011 (GMT)
The GFS .pdf link is gone with everything else that was Google Labs. http://googleblog.blogspot.com/2011/10/fall-sweep.html Google shut down the site without even correcting its own links as Gwern notes (thanks for the quick fix). One copy of the .pdf exists here: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.125.789&rep=rep1&type=pdf --74.95.124.50 (talk) 23:47, 23 November 2011 (UTC)
Oy vey. So that's what they meant by shutting Labs down? Fine, I've just updated the PDF link and removed some of the others. --Gwern (contribs) 00:25 24 November 2011 (GMT)
The google link to the GFS paper, along with some historical influences still exist via the following page - http://research.google.com/people/hgobioff/ 5 February 2012 — Preceding unsigned comment added by 24.218.43.47 (talk) 22:25, 5 February 2012 (UTC)

the issue of the GPL

The part about the GPL in the first sentence of the article is misleading, at best. Regardless of how exactly something is (or isn't) derived from a GPL'd piece of software, no entity (corporation or individual) is compelled by the GPL to release anything unless they redistribute their altered code in binary form. Google (or anyone else) is perfectly free to mercilessly hack away at any GPL'd piece of code they want, and as long as they only use it "in-house", for their own purposes, and don't redistribute it (whether for a fee or not is immaterial), they have no obligation to release source. Considering what a sticky area this is, I didn't want to just hack up that sentence, but something along the lines of "Google has shown no interest in releasing their filesystem, either for profit or for the good of the Internet community" would be more accurate. Reference to the GPL is probably superfluous, and should simply link to another appropriate article if it needs to remain.

Embedded or not, even if "The only way it is available to another enterprise is in embedded form--if you buy a high-end version of the Google Search Appliance", shouldn't they still release the GFS under the GPL (this is distribution after all)? Chutz 10:19, 14 February 2007 (UTC)

Criticism

Ok, is it really necessary to include a section criticizing a product that is not even available to the public? I mean, the program may be very bad or very good, but is there any real point in saying anything about its quality if it's only used internally by Google and not by anyone else? If no one but Google is using this, why would there be any public criticism in the first place? Avador 18:02, 23 December 2006 (UTC)

I removed the section; the criticism is not for a release product (as you mentioned) and was entirely inaccurate (as the first line in the section stated). Osmaker 00:10, 18 January 2007 (UTC)

netgfs listed on googlecode.com

The project is listed here: netgfs on code.google.com, but I wonder if it's really going to be open-sourced someday, or if it's only there to be accessed for internal use. Self Torture 03:52, 2 February 2007 (UTC)

generic no size file system

how about a no size file system?

Start table:

Entry type : File name : file location on disk : file size
.... .... ....
Entry type = pointer to next table : File name : file location on disk : file size

--The preceding unsigned comment was added by 220.227.207.194 (talk) 12:39, 11 May 2007 (UTC).
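The chained-table proposal above amounts to a linked list of fixed-capacity tables: when a table fills up, its final entry becomes a "pointer to next table", so the directory has no fixed size. The Python below is a minimal illustration of that anonymous proposal, not of anything GFS does. The `Entry` fields follow the proposal's columns, while the `kind` tags, `capacity`, and the caller-supplied `new_offset` (standing in for a block allocator) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional


@dataclass
class Entry:
    kind: str      # hypothetical tags: "file" or "next" (pointer to next table)
    name: str
    location: int  # disk offset of the file data, or of the next table for "next"
    size: int


Tables = Dict[int, List[Entry]]  # disk offset -> table contents (stands in for the disk)


def iter_files(tables: Tables, start: int) -> Iterator[Entry]:
    """Yield every file entry by following the chain of tables from `start`."""
    offset: Optional[int] = start
    seen = set()
    while offset is not None:
        if offset in seen:
            raise ValueError("cycle in table chain")
        seen.add(offset)
        next_offset: Optional[int] = None
        for entry in tables[offset]:
            if entry.kind == "next":
                next_offset = entry.location
            else:
                yield entry
        offset = next_offset


def add_file(tables: Tables, start: int, entry: Entry,
             capacity: int, new_offset: int) -> None:
    """Append `entry`; if the last table holds `capacity` files, chain a new table."""
    offset = start
    # Walk to the last table in the chain (the one without a pointer entry).
    while True:
        table = tables[offset]
        pointer = next((e for e in table if e.kind == "next"), None)
        if pointer is None:
            break
        offset = pointer.location
    if len(table) < capacity:
        table.append(entry)
    else:
        # Use the reserved extra slot for the pointer, then start a new table.
        table.append(Entry("next", "", new_offset, 0))
        tables[new_offset] = [entry]
```

Each table here reserves one slot beyond `capacity` for the pointer, so a full table never has to move an existing file entry to make room for the link.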

Merge?

I would like to propose that this article be merged into a generic Google technology article (perhaps Google platform, though that suffers from WP:V problems as well). A closed-source filesystem with no open source analogue and no use outside of a specific company (and which has limited applications outside a few types of read-centric applications) doesn't seem like something Wikipedia needs an article for. Don't get me wrong. I love the technical implications of this as much as the next guy, but an internal-only product can't really satisfy WP:V. JRP (talk) 23:05, 29 October 2008 (UTC)

I don't think it's true that there is currently no FLOSS analogue (the see alsos seem to have at least 2). As for WP:V issues - it seems quite well documented to me. Multiple MSM articles and research papers, which is more than most articles on software.
And it's widely used inside Google: all their prominent services like search and GMail are running on GFS one way or another. (The interview I linked makes this clear.) --Gwern (contribs) 09:29 12 August 2009 (GMT)

Google File System is a specific entity that is disjoint from the Google Platform. Although it is a proprietary system, its methods are of scientific interest (cf. Hadoop) completely apart from the marketing and usage interests of Google Platform. Claiming that Wikipedia should not support a concept simply because it's, at present, only used internally to one company, seems provincial and shortsighted to me. I recommend keeping the articles separate. 98.151.17.47 (talk) 19:44, 31 March 2010 (UTC)John

Name ambiguity

GFS stands for Global File System; should every GFS be replaced with GoogleFS? --Preceding unsigned comment added by 192.132.34.15 (talk) 12:39, 16 April 2009 (UTC)

I don't follow. GFS can stand for many things. --Gwern (contribs) 09:26 12 August 2009 (GMT)

high level or low level

Is this filesystem low level (like FAT), or rather just an overlay on another file system (like WinFS on NTFS)? --Preceding unsigned comment added by 148.81.137.4 (talk) 23:50, 21 June 2009 (UTC)

The physical operations are handled by Linux filesystems like ext3. --Gwern (contribs) 09:25 12 August 2009 (GMT)

Possible Typo in Performance Section

In the Performance section the speed for a small number of nodes is listed as 80-100 MB/s, and the speed for a large number of nodes is 583 Mb/s (note: megabits, not megabytes). This equates to 72.875 MB/s. This doesn't check out logically. Can somebody help me verify the correct measurement?

Razbarrie (talk) 00:41, 26 November 2011 (UTC)
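The unit conversion in the question is a division by 8 bits per byte. A minimal Python check (the function name is hypothetical) shows why reading 583 as megabits would be inconsistent: the result falls below the 80-100 MB/s quoted for a small number of nodes.

```python
BITS_PER_BYTE = 8


def megabits_to_megabytes(rate_mbit_s: float) -> float:
    """Convert a rate in megabits per second to megabytes per second."""
    return rate_mbit_s / BITS_PER_BYTE


large_cluster = megabits_to_megabytes(583)
print(large_cluster)       # 72.875
print(large_cluster < 80)  # True: lower than the small-cluster figure
```

Reading all three figures as megabytes per second removes the inconsistency, since 583 MB/s is then well above the small-cluster range.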

All measurements are megabytes; sorry if I confused you, I always have difficulty remembering which is which (it's rare enough that I always assume an author means megabytes unless otherwise specified). --Gwern (contribs) 00:51 26 November 2011 (GMT)