Talk:Computer cluster

This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as Mid-importance on the project's importance scale.
	This article is supported by Networking task force (assessed as Mid-importance).

Xbox clusters

The reference for Xbox clusters no longer exists, and I have never heard of any attempt to build Xbox or Xbox 360 clusters. This is possibly because one of the main differences between an Xbox 360 and a PS3 is processor capability, and the PS3's processor is highly sought after for supercomputer/cluster use. While there is a Wikipedia article about PS3 clusters, I see no justification for the Xbox cluster mention. Agreed? (200.192.66.252 (talk) 14:32, 1 June 2010 (UTC))

Shameless Plug

I removed the shameless plugs under Load balanced clusters. —Preceding unsigned comment added by 206.173.47.4 (talk) 16:25, 2 May 2008 (UTC)

PVM and MPI

Where is PVM? MPI?--80.98.246.107 22:39, 14 Jun 2004 (UTC)

System X at Virginia Tech

The article talks about VA Tech's X system but doesn't even mention the TOP500 or any other clusters... sounds like a shameless plug to me...--Dhuss 16:27, 21 Jun 2004 (UTC)

In particular, it gives absolutely no information on why the claimed "benefits in price / performance" are realized in that particular cluster. Absolutely worthless information from the last decade. — Preceding unsigned comment added by 85.180.173.24 (talk) 06:58, 10 May 2014 (UTC)

Fisheye Lens??

Does someone have a better image of a cluster? The fisheye image looks slightly juvenile or unprofessional and out of place in an encyclopaedia.

As the previous author stated, there must be a mention of PVM- and MPI-compliant resource distribution frameworks, such as MPICH (an open-source MPI implementation) and DIPC (network System V IPC). The author then goes on to affirm that Beowulf is software. Beowulf is not software so much as a philosophy and methodology: applying open and free operating systems, such as GNU/Linux, to commodity hardware connected by a high-performance network to achieve a distributed computing environment.

Computer Clusters

Clustering was indeed "invented" by Digital Equipment Corp. in the 1980s. Actually, the VAXcluster software (as it was then known) was a response to the needs of VMS users who were already sharing files between systems over DECnet and needed a more robust, more integrated strategy for sharing resources among a group of computer systems.

OpenVMS clusters remain in use today. Some of the few operations to survive 9/11 were OpenVMS clusters whose redundant "shadow" sites were outside the area destroyed in the attack.

To date, no other clustering solution has provided the redundancy, parallelism and disaster survivability offered by OpenVMS clusters. Some approaches are only now catching up to where OpenVMS (then known as VAX/VMS) was twenty years ago.

Reference: http://www.hp.com/go/openvms

David J Dachtera

It seems odd that the article claims that the first Tandem contribution to clustering dates to the 1994 Himalaya. In fact, the very first Tandem product, the T/16 in 1976, was a 2- to 16-CPU, single-system-image cluster. The clustering bus was a dual 10 MB/s proprietary bus (20 MB/s in total). Can someone explain why the Tandem architecture does not qualify as a cluster until the Himalaya? If the 1977 date for ARCnet is correct, then it could not have been the first commercial cluster, since the T/16 was around a year earlier. — Preceding unsigned comment added by 71.198.24.202 (talk) 15:02, 28 July 2011 (UTC)

Well, the problem is: was a Tandem a cluster or a computer? As a TXP user I always saw my machine as a single multiprocessor computer. Tandem did have clustering software before Himalaya (we had machines in Paris and Munich in a cluster; what was the software called?), but they were connected by low-speed serial links. AFAIK, Himalaya introduced the high-speed fibre-optic links between machines. HughesJohn (talk) 02:41, 14 August 2011 (UTC)

misnomers, etc.

The taxonomy is one I've never heard before, and I run one of the projects it refers to (http://linux-ha.org), perform technical reviews of books on the subject, and am the author of a commonly referenced website on the subject. This is basically about 70% right and about 30% wrong.

There are lots of taxonomies of clusters, but the one chosen is really not very good, and the article isn't very well organized, and will likely confuse people more than it will enlighten them.

Director clusters are more often called "Load Balancing Clusters", or server farms, or various other things.

Two node versus multi-node is not a useful distinction. A more useful distinction (at least for the present time) is "High-Availability (HA) Clusters" versus "High-Performance Clusters". Some HA clusters have 2 nodes, and some have more than 2. Two is a minimum.

"Massively Parallel" isn't defined - but it's probably intended to mean High Performance Clusters. But, high-performance clusters aren't necessarily massively parallel - they're just faster than 1 machine.

And grid computing is not the next phase of cluster computing. It's a related idea, just as a network of workstations is a related idea. A grid is also commonly not located in one place. But the distinction isn't in how well connected the computers are; it's in how trust between them is managed.

A cluster is a collection of computers, all part of a single domain of trust (political entity), tasked to perform a set of jobs as though they were one computer. They spread this set of tasks across themselves in various ways and for many purposes.

A grid is a collection of computers, spanning many domains of trust, tasked to act as a common computing utility for multiple user communities spread across those domains.

Regarding who invented clusters: it's at best arguable, and the article should perhaps use wording like "some of the earliest clusters were" rather than "XXX invented clusters". Pyramid also produced some of the earliest clusters around. And, like most things, clusters weren't "invented" by any company; the idea grew up over time from things like collections of PDP-11s given a single task to do.

I may have a go at a partial rewrite of the page.

I assume you did the rewrite? 64.208.49.21 (talk) 10:02, 17 January 2008 (UTC)

Cluster Software

I'll be adding IBM's HACMP, since it's not mentioned in the article. Gbeeker 14:02, 16 September 2005 (UTC)

Also, MC/Service Guard is for HP-UX. And I am not sure why the list of software is split into open source and other. Gbeeker 14:28, 16 September 2005 (UTC)

On a more general note, the most important commercial implementations should have a place here. Microsoft is included, but there is no mention at all of Sun/Oracle RAC and others. —Preceding unsigned comment added by 64.208.49.21 (talk) 10:04, 17 January 2008 (UTC)

Notability of Cluster Software Products

I would like to open a discussion as to the notability of some of the cluster software products which are on the article's list right now. I know that there are tens of thousands of SunCluster and Veritas Cluster Server HA clusters out there; they're standard in enterprise organizations, and have been for most of a decade. HACMP and Beowulf and MSCS and LSF and Sun N1 Grid are all well known. Linux-HA is sufficiently strong and growing that I think it is notable. MC Service Guard and the old VMS cluster stuff are from major vendors and have market presence. BOINC is widely known as the SETI@home successor, etc.

Less well known would include Moab, NEC ExpressCluster, Parallel Sysplex (I know, it's real and legit and has been around for a long time, but it has a pretty small market presence), and Novell (same).

Possibly not notable: KeyCluster (so not notable that its two-paragraph WP article is nominated for deletion), PolyServe, SteelEye.

I would like to propose that a line be drawn on notability, and that non-notable products be excluded from the list. I think that the first group are clearly above the line, the last group are clearly below the line, and the middle group ... I don't know. Other opinions sought. Georgewilliamherbert 20:17, 16 February 2006 (UTC)

Uh, notability isn't the same as market success. Is Homo habilis notable? You don't see many of 'em around today. HughesJohn (talk) 02:35, 14 August 2011 (UTC)

History

I did a fairly significant edit of the History section just now. See what you all think. I used as references both Greg Pfister's In Search of Clusters and numerous references from around Wikipedia (and a few I googled up outside as well). I think that it provides a much more detailed view of how the development of clusters and the development of networking have gone hand in hand. It also removes what can only be called a commercial POV: the idea that DEC invented the cluster. Pfister addresses this directly and I heartily concur: compute clusters were probably invented by the first group that could afford to purchase more than one computer. I won't go to the extreme of claiming that the Bombe computers used to decrypt Enigma transmissions in the Ultra project formed a cluster, although I think it is arguable that they did, albeit one with human-based IPC. However, it is almost certain that Bombes were linked into clusters very shortly thereafter by intelligence agencies, and by the 1950s the probability that there were at least some covert clusters doing cryptography, among other things, approaches unity.

The less covert history of clusters goes hand in hand with the invention of packet-switched networks, the ARPANET, and Unix, as I note. The Internet itself is basically the first cluster built on a packet-switched network, all grown up. Socket-based computing was being used for both local and grid-like computing at the research level pretty much continuously from 1969 on, although the formal protocols weren't specified by means of RFC until much later. While DEC was certainly involved in the creation of a network stack in the form of DECnet at about the same time that IBM was introducing its own in the form of SNA, TCP/IP was clearly first and indeed provided a clear demonstration that any "player" in the world of computing needed a network protocol for interconnecting machines into a cluster. If I had the patience to do the research or build the links, I'd do a more careful set of cross-references that include DECnet and SNA, even though they really turned out to be nothing more than intermediate states in the development of large-scale clusters based on commodity networks.

I also added the missing reference to the invention of PVM as pivotal to the widespread implementation of HPC cluster computing, and added explicit mention of the Beowulf project. I didn't add a discussion of MPI, as it was developed by commercial big-iron supercomputer vendors and their users and didn't become a basis of commodity HPC clustering until long after PVM was well established and the Beowulf project itself had begun.

The history section still, in my opinion, lacks a discussion of the history of HA compute clusters (aside from my passing mention of Tandem and IBM) -- this is not my speciality and so I leave it for somebody else who is more of an expert here. There are also still corrections that I agree need to be made in the original discussion of big famous clusters.

In particular, Va Tech's cluster was something of a joke (seriously) when it was first built at great expense and with much fanfare, and with no actual plan for the research that was to be done on it. It was clearly a commercial venture by Apple trying to break into the cluster market, and almost totally irrelevant from the point of view of "important clusters". It is REALLY inappropriate for a discussion of the cost-benefit of clusters (something I'm something of an expert on :-), as cost-benefit tends to be obscured when a cluster is built with an undisclosed price tag, obviously "very special pricing" on the part of the vendor, and no particular purpose but to put the cluster and hosting institution on the clustering map, so to speak (whatever it is being used for by now).

Finally, why in the world is there a statement that John Koza owns the largest cluster owned by an individual? First, the cluster referenced is owned by a corporation, not an individual. Second, nowhere in the universe that I know of is there a list of people who own large clusters and the sizes of those clusters. I've personally owned a cluster that contains between 8 and 10 nodes depending on how rich I feel (it costs around $1000/year to run a 10-node private/personal cluster for power and cooling alone). I know several other people on the beowulf list who own small clusters, ballpark the same size as mine. I have a hard time imagining that there are really rich people who can afford to pop $100K/year on a personal cluster with 1000 nodes unless it is really owned as a corporate entity, used for business, deducted on tax forms, and paid for with income derived from the same. At that point one has to really ask whether the words "owned as an individual" still mean anything. Is there any point whatsoever in including this in the article, or is this just somebody's personally inserted POV?

Anyway, just some thoughts. Since clustering is near and dear to my heart, I'll likely return to this page and make another round of fairly significant edits when next I have time. Let me know what you think.

Rgbatduke 16:56, 27 April 2006 (UTC)

I added the note on C.mmp/Hydra because that work generated a significant number of papers on tightly-coupled vs. loosely-coupled (C.mmp vs CM*), fault tolerant architecture (C.vmp) and how to do security and capability-based permissions in the OS (Hydra). There were other research clusters at the time. -Smallpond 18:37, 17 July 2006 (UTC)

Early "Clusters"

The article states that the B5700 was "the first production system designed as a cluster", and then proceeds to describe a typical multiprocessing system of that era. The distinguishing feature is that "each computer" can be "restarted" without taking the entire system down.

However, other, earlier systems also had those features. In particular, General Electric's GE 600 systems, later rebranded as Honeywell 6000, were earlier (1962 vs. 1964). These also featured a modular system of multiple CPUs, multiple memories, multiple disks, and the ability to take processors on and off line. Under both GCOS and Multics, you could add and remove CPUs while the system was running. (I think it may actually have had process migration, so that your job kept running when you shut down a CPU. That's more sophisticated than not having to shut down the system in order to drop a CPU. I would have to check on the details.)

Perhaps the unsourced claim that the Burroughs machine was "first" should be investigated. Dicirnah (talk) 14:52, 26 October 2017 (UTC)

Shared Everything, Shared Nothing, Shared Disk

Should these be added here as a distinction in clustering technology? I cannot find any reference to them on Wikipedia and I understood them to be different clustering architectures.

Pixie2000 12:37, 16 October 2006 (UTC)

We should address them... ok. Taken as a request for enhancement. I'll work on it when time allows, or if someone else has bandwidth before then... Georgewilliamherbert 21:10, 16 October 2006 (UTC)

Please update the part about NASA's "Lincoln" cluster

According to the current TOP500.org list (November 2006), there is not a single site (among those listed) running any version of the Windows operating system for distributed/parallel/cluster computing.

It kinda sends a message about Windows, doesn't it? —The preceding unsigned comment was added by 189.10.212.219 (talk) 03:30, 6 May 2007 (UTC).

Need to update the article with respect to Lincoln: 1) it doesn't have 455 blades, it has 192 Dell PowerEdge 1950 1U servers; 2) it has 96 NVIDIA Tesla S1070 GPU accelerators; 3) it runs RHEL5 Linux as the base OS, and Windows Computer Cluster is only run on demand for certain users; 4) it can be combined with the ABE cluster for really large runs. See http://www.ncsa.illinois.edu/UserInfo/Resources/ for more info. Dlapine (talk) 06:18, 11 February 2010 (UTC)

Grid computing

The grid computing section says that SETI@home and the other *@home projects are grids, but they are not; they are P2P, which is a very different approach to ubicuos computing. Could someone with good writing skills (that leaves me out) correct this? --201.242.124.72 (talk) 18:15, 12 October 2008 (UTC)

Assuming you meant ubiquitous computing, it is the case that SETI/etc. are volunteer computing approaches. The BOINC-type approaches may be classified as grids. However, parts of grids may rely on peer-to-peer computations while others may be large servers. But none of those belongs on this page, which is about clusters. History2007 (talk) 15:40, 25 December 2011 (UTC)

Flash mob computing

Flash mob computing should also be mentioned in the text. Not sure where, though (I'm unfamiliar with the types of clusters).

Flash mob computing was an attempt, but has not gone very far, somewhat like pop-up restaurants. It is not essential to understanding clusters, but I will mention it anyway. History2007 (talk) 15:46, 25 December 2011 (UTC)

US Government Warning?

Just reworded a statement which indicated that the US government had warned about some countries obtaining gaming consoles to build military computers. The referenced web article merely quoted anonymous sources as saying that some people in the government had speculated that pre-2003 Iraq may have been doing this. Anonymous sources speculating didn't seem to qualify as anything as official-sounding as a "warning" from the "US government". —Preceding unsigned comment added by 146.145.84.7 (talk) 18:50, 6 August 2009 (UTC)

Requested move

The following discussion is an archived discussion of a requested move. Please do not modify it. Subsequent comments should be made in a new section on the talk page. No further edits should be made to this section.

The result of the move request was: page moved. Vegaswikian (talk) 19:37, 14 July 2010 (UTC)

Cluster (computing) → computer cluster — WP:TITLE: use common names. Also per WP:TITLE, "computer cluster" is more recognizable and easier to find than "cluster (computing)", while no less concise or precise. Also note that "computer cluster" is the title term used to introduce the article. ENeville (talk) 15:13, 7 July 2010 (UTC)

Survey

Feel free to state your position on the renaming proposal by beginning a new line in this section with *'''Support''' or *'''Oppose''', then sign your comment with ~~~~. Since polling is not a substitute for discussion, please explain your reasons, taking into account Wikipedia's policy on article titles.

Support: I think bracketing should be avoided where there's a good alternative. -Hibernian1

Discussion

Any additional comments:

The above discussion is preserved as an archive of a requested move. Please do not modify it. Subsequent comments should be made in a new section on this talk page. No further edits should be made to this section.

Article quality

The entire series on computer clustering has serious quality problems. The articles are neglected, confused, unkempt, often incorrect and usually an embarrassment for Wikipedia. I will try to clean them up during the next 3 months and bring the series up to the level of the cleaned-up supercomputing articles. History2007 (talk) 14:49, 20 December 2011 (UTC)

In any case, almost all of the old text is gone now, except for part of the history section, so the comment above no longer applies. It turned out to be at least a 90% rewrite, but the article is in good shape now. I do not expect to add anything new now that node communication, failure management, etc. have been covered correctly. I should, however, mention that several of the articles linked from here (e.g. Load balancing (computing), Message passing, etc.) need much work, and should not be relied upon on their own for now. History2007 (talk) 15:11, 30 December 2011 (UTC)

I asked this on the talk page for high-availability clusters, but... Could you explain in more detail the flaws in each of the related articles, as you see them?

Thanks. Georgewilliamherbert (talk) 02:32, 31 December 2011 (UTC)

I will get to those articles when/if I can, but as you see there are multiple tags on each. Message passing has been tagged as low quality since 2009, etc. Articles such as High-availability cluster have zero references, Load balancing has just one non-RS reference, etc. Automatic parallelization is especially confused and misses the top-level theoretical issues, and Degree of parallelism needs no comment really, given its obvious state. Heartbeat private network is just a two-sentence article that neither explains the concepts nor has any sources, etc., etc. I do not, however, expect to have time to clean them all. It would take years of work and, as I suggested on the talk page for Message passing, new editors familiar with the topic need to be found. But that is another story. I will make comments on those as time allows, but do not expect me to do that instantly; there are just too many articles that need serious help. History2007 (talk) 03:40, 31 December 2011 (UTC)

HA cluster has both the Pfister and Marcus/Stern books as general references. It's hard to go much deeper than that without invoking a particular vendor's how-to manuals or whitepapers (regrettably), though a fair and balanced (cough) mixture of vendors manual section references might work reasonably well.

The article has had a persistent problem of not-really-notable vendors trying to park thoroughly non-notable cluster systems in the now-departed products section, which is why it isn't there anymore. Georgewilliamherbert (talk) 04:08, 31 December 2011 (UTC)

I responded to the HA cluster issues on that page, since it relates to that article. But that article has zero inline references at the moment, as I stated there. History2007 (talk) 04:27, 31 December 2011 (UTC)

must be pushed to scratch ???

Can the phrase "must be pushed to scratch" be changed or explained? Ronbarak (talk) 09:14, 1 August 2013 (UTC)

It appears that the section "Characteristics of Clusters" was added by a single-purpose account that was quickly blocked as a "sockpuppet" (see Special:Contributions/MOmarFarooq). I'm not sure about that, but the section had no sources and seemed incoherent, so I boldly removed it. W Nowicki (talk) 22:02, 1 August 2013 (UTC)

Pure irony in just one section

I find it most ironic that the benefits section does not benefit the article at all. Seriously, who wrote it? A marketer? I've tagged it accordingly. FosterHaven (talk) 06:05, 2 September 2014 (UTC)

Datapoint ARC System

Somebody writes that Digital Equipment Corporation invented the computer cluster in 1983.

Datapoint offered a clustered computer system, called the Attached Resource Computer system or ARC system, as a commercial product in 1977. They did not call it the first cluster, or even a cluster. They did claim it was built onto the first commercially available local area network, now known as ARCnet. I know, I was there, but I don't know how to cite it in the article.

A Datapoint ARC system used up to 255 small, weak (8008-like) computers, some dedicated to computing, others to serving disks. The Digital VAXcluster initially used up to six fairly powerful VAX computers. ARC was meant to allow incremental capacity growth. It did not provide high availability as VMSClusters did. 72.211.218.139 (talk) 06:51, 4 September 2014 (UTC)

"Cluster (computing" listed at Redirects for discussion

An editor has identified a potential problem with the redirect Cluster (computing and has thus listed it for discussion. This discussion will occur at Wikipedia:Redirects for discussion/Log/2022 October 27#Cluster (computing until a consensus is reached, and readers of this page are welcome to contribute to the discussion. Steel1943 (talk) 20:01, 27 October 2022 (UTC)

Perhaps this isn't the place to say so (I don't see anywhere else), but I'm in favor of "Cluster (computing)" (with the closing ')', of course). The reason is that the plain term "cluster" is almost ubiquitous in computer science these days, so a CS type will look it up as that, not as "Computer cluster". BMJ-pdx (talk) 18:37, 18 September 2023 (UTC)