Talk:Single system image: Difference between revisions

From Wikipedia, the free encyclopedia
VanishedUserABC (talk | contribs)
thread from too long ago

Revision as of 11:35, 20 August 2012

WikiProject Computing: C-class, Mid-importance
This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as C-class on Wikipedia's content assessment scale.
This article has been rated as Mid-importance on the project's importance scale.
Collapse thread about the article in a previous life. Has changed since then.

This article exposes several troubling issues

  • Distributed operating system has absolutely no business being redirected to this article
  • This article makes grossly negligent and misrepresentative statements in its lead and throughout
  • This article uses multiple references in an inappropriate, inaccurate, and fallacious manner
  • If these claims are substantiated, this article should be nominated for deletion; or moved for repair
I will be very happy to assist with this page; but I must finish others first... JLSjr (talk) 09:13, 28 April 2010 (UTC)

Distributed operating system has absolutely no business being redirected to this article

A Distributed operating system is the operating system of a specifically designed distributed system, while the single system image is an attribute of some systems that employ a distributed computing model. Why would a search for a specific variety of operating system redirect here, to an attribute of a system? A single system image is not a cluster of machines! Single system image is a paradigm of design in distributed computing.

This article makes grossly negligent and misrepresentative statements in its lead and throughout

The wholesale concept of the single system image is that of an image, a perception, an illusion that is created for the user of the system. In the opening statement of the lead, this article attempts to morph the single system image into a cluster of machines. If this is an article about a Single system image (SSI) cluster, it should be titled accordingly.

The article continues throughout, using the metaphoric terms of SSI system and SSI cluster. Only three times is SSI used in the whole word sense: between the parentheses in the lead sentence, (correctly) in the title of the table Properties, and finally in the reference for the DragonFly entry in that same table (usage of single system image is the same). This article is not about SSI, it is about systems/clusters that express the SSI.

This article uses multiple references in an inappropriate, inaccurate, and fallacious manner

The three references that attempt to congeal a loose idea that the SSI is synonymous with a distributed operating system are embarrassingly fallacious in their usage. Since I am primarily concerned with the DOS relationship, I have not checked the other half of the references. I outline this claim in detail below.

Since no page numbers were included in the references, I took the liberty of including every instance of the context of: single system image and distributed operating system, in the source.


Reference #3

Distributed Systems: Concepts and Design

By contrast, one could envisage an operating system in which users are never concerned with where their programs run, or the location of any resources. There is a “single system image.” The operating system has control over all the nodes in the system, and it transparently locates new processes at whatever node suits its policies.

— GEORGE COULOURIS, GEORGE F. COULOURIS, et. al, Distributed Systems: Concepts and Design, Pg. 222

An operating system that produces a single system image like this for all the resources in a distributed system is called a distributed operating system.

— GEORGE COULOURIS, GEORGE F. COULOURIS, et.al., Distributed Systems: Concepts and Design, pg. 223

These passages state that a single system image is something that removes a system’s users from all concerns regarding where any process runs or where any resource resides. They also say that the distributed operating system produces a single system image for all processes and resources. Absolutely nothing in these passages either implicitly or explicitly correlates SSI and DOS as synonymous. This reference, in relation to the statement, “The concept [of SSI] is often considered synonymous with that of a distributed operating system,” is fallacious.


Reference #4

Scheduling in Distributed Computing Systems: Analysis, Design and Models

The operating systems commonly used for DCS can be broadly classified into two types- Network Operating System (NOS) and Distributed Operating System (DOS). ...In NOS, the users are aware of the multiplicity of the machine and can access the resources either by ... On the other hand; in DOS the users would not be aware of the multiplicity machines. [DOS] provides a single system image to its users.

— DEO P. VIDYARTHI, BIPLAB K. SARKER, et.al., Scheduling in Distributed Computing Systems: Analysis, Design and Models, pg. 8

This passage states clearly that the concept of Single system image is an encapsulating quality of a system, hiding its internal complexity, causing it to be perceived as a single entity. DOS provides this quality. Absolutely nothing in this passage either implicitly or explicitly correlates SSI and DOS as synonymous. This reference, in relation to the statement, “The concept [of SSI] is often considered synonymous with that of a distributed operating system,” is fallacious.


Transparency is one of the biggest issues in the design of a DCS that gives the DCS a single system image.

— DEO P. VIDYARTHI, BIPLAB K. SARKER, et.al., Scheduling in Distributed Computing Systems: Analysis, Design and Models, pg. ix

Achieving complete transparency is a difficult task and research is still continuing on this issue. Of the several transparency issues identified by the ISO Reference Model for Open Distributed Processing, location transparency, migration transparency, and concurrency transparency are very important.

— DEO P. VIDYARTHI, BIPLAB K. SARKER, et.al., Scheduling in Distributed Computing Systems: Analysis, Design and Models, pg. 10

These passages state transparency is one of the most important design goals in DCS. It continues to define three important sub-aspects of transparency. More importantly, the earlier passage reveals that transparency provides DCS with SSI. Absolutely nothing in these passages either implicitly or explicitly correlates SSI and DOS as synonymous. This reference, in relation to the statement, “The concept [of SSI] is often considered synonymous with that of a distributed operating system,” is fallacious.


Reference #5

Operating System Directions for the Next Millennium

Achieving complete transparency is a difficult task and research is still continuing on this issue. Of the several transparency issues identified by the ISO Reference Model for Open Distributed Processing, location transparency, migration transparency, and concurrency transparency are very important.

— WILLIAM J. BOLOSKY, RICHARD P. DRAVES, et. al., Operating System Directions for the Next Millennium, pg. 2

This passage states transparency is one of the most important design goals in DCS. It continues to define three important sub-aspects of transparency. More importantly, the earlier passage reveals that transparency provides DCS with SSI. Absolutely nothing in this passage either implicitly or explicitly correlates SSI and DOS as synonymous. This reference, in relation to the statement, “The concept [of SSI] is often considered synonymous with that of a distributed operating system,” is fallacious.


Operating System Directions for the Next Millennium

- Security. Although a single system image is presented, data and computations may be in many different trust domains, with different rights and capabilities available to different security principals. Like the Internet, the system should allow non-hierarchical trust domains with no central authority necessary.

— WILLIAM J. BOLOSKY, RICHARD P. DRAVES, et. al., Operating System Directions for the Next Millennium, pg. 2

This passage involves outlining the goals of the Millennium Project (a Microsoft Operating System); under the bullet-point of Security, the author speaks to "SSI" having no bearing on the application of Security in a Distributed Operating system. Absolutely nothing in this passage either implicitly or explicitly correlates SSI and DOS as synonymous. This reference, in relation to the statement, “The concept [of SSI] is often considered synonymous with that of a distributed operating system,” is fallacious.


- Location-irrelevance. Objects should be allowed to reference each other and invoke operations without regard for their current location or replication state. The system should have a seamless appearance despite its underlying distributed nature.

— WILLIAM J. BOLOSKY, RICHARD P. DRAVES, et. al., Operating System Directions for the Next Millennium, pg. 2

The use of the term "seamless appearance" IS synonymous with "single system image"; and as such, is not synonymous with a distributed operating system. Absolutely nothing in this passage either implicitly or explicitly correlates SSI and DOS as synonymous. This reference, in relation to the statement, “The concept [of SSI] is often considered synonymous with that of a distributed operating system,” is fallacious.


Legion [Grimshaw et al. 97] proposes using an extensible object model to provide a single-system model for a worldwide network of computers.

— WILLIAM J. BOLOSKY, RICHARD P. DRAVES, et. al., Operating System Directions for the Next Millennium, pg. 6

This passage speaks to a specific project system, "Legion", and a proposal to use an object model that would provide a SSI as an aid to extensibility. Absolutely nothing in this passage either implicitly or explicitly correlates SSI and DOS as synonymous. This reference, in relation to the statement, “The concept [of SSI] is often considered synonymous with that of a distributed operating system,” is fallacious.


According to some of the article’s other sources:

Reference #6

Grid and Cluster Computing

by C.S.R. Prabhu

In a cluster environment, a Single System Image is expected; all processors in a cluster will function as if they are one system.

— C.S.R. PRABHU, Grid and Cluster Computing, pg. 177

If there is a distributed operating system in a cluster, then this distributed operating system will perform all the jobs of allocating the incoming processes to various processors in the cluster. It will also perform all the component load sharing, distribution and load balancing functions.

— C.S.R. PRABHU, Grid and Cluster Computing, pg. 177

The initial passage presents SSI as inseparable from the cluster. The latter passage states the cluster may or may not involve a distributed operating system. These passages directly contradict the correlation of SSI and DOS as synonymous. If SSI is required, but DOS is not, how can they be synonymous? If a = b, and b = c sometimes; can it be considered that a = c? No; if the associative link is broken – in any way – the rule (or synonym, in this case) no longer holds. This reference, in relation to the statement, “The concept [of SSI] is often considered synonymous with that of a distributed operating system,” is fallacious.


Reference #1

In search of clusters

by Gregory F. Pfister

Your search - "distributed operating system" - did not match any documents.


Reference #2

Single System Image (SSI)

by Rajkumar Buyya

A single system image (SSI) is the property of a system that hides the heterogeneous and distributed nature of the available resources and presents them to users and applications as a single unified computing resource.

— RAJKUMAR BUYYA, 'Single System Image (SSI), pg. 1

This passage represents the opening remark or first sentence of the paper, and states the point, "A single system image (SSI) is [a] property of a system;" and as such, a property of a system cannot be synonymous with an operating system.


Well, I'm not Perry Mason; but it would appear there is at least room for discussion about everything from the basic direction of this article to researching some appropriate, semantically and contextually accurate references. I know I'm a bit too passionate for my own good, but this has the potential to seriously misguide and delude those who come here. More importantly, if someone who is aware of this level of minutiae reads it, it casts a seriously bad light on the computing segment of Wikipedia. Again, I will help. Stay in touch, and when I finish my current projects, I'll dive in with ya'.

JLSjr (talk) 09:13, 28 April 2010 (UTC)

Reply

Your insanely anal rant notwithstanding, the two notions are not universally or precisely defined. The systems listed in this article identify themselves as either "distributed operating system", "cluster operating system" or "single system image", despite having very similar traits and purpose. It's mostly a matter of marketing whether something is labeled "cluster OS", "distributed OS" or "SSI OS". Check the original documentation of the systems listed.

I've read your draft User:JLSjr/Distributed operating system, and your brain, my friend, completely lacks any power of abstraction. Yes, this article is crap (like most of Wikipedia) for it insists too much on clusters. Until a couple of weeks ago, distributed operating system redirected to parallel computing, which did not even mention the concept. And that state of affairs lasted for 3 years! Your article is however far worse than this one because it attempts to build a textbook architecture of a distributed OS by cropping together loosely-related sources. The bottom line is that the main trait of a distributed OS is the provision of a SSI, and this is supported by the various sources cited, your attempts to split hairs aside. Pcap ping 12:38, 28 April 2010 (UTC)

The main improvement that this article really needs is some more historical examples, and some connections with other notions. Your draft has some historical distributed OS examples. Those would be nice. But please, use some survey papers or books that treat these in perspective. The SSI notion precedes the moniker and "cluster computing" terminology, never mind the newer grid computing / cloud computing (off topic: read that talk page for a classic example of massive wiki circle jerk.) The notion of distributed OS or SSI is not limited to clusters, understood as machines in the same location (possibly HPC cluster if the interconnect is beefier, especially in terms of latency), but the SSI / DOS concept has been mostly implemented in clusters for practical reasons. Although infrastructure like PlanetLab is now available to researchers (never mind Google or Amazon's own infrastructure), Internet-level "full" distributed OS has few practical benefits, so most of the "cloud computing" stuff is usually limited to subsystems like storage, e.g., Google File System. So, a certain focus on clusters is inevitable in the discussion of DOS / SSI, because that's the level of scaling they've been capable of, so far. Pcap ping 13:25, 28 April 2010 (UTC)


Retort

"Your insanely anal rant"
"Your brain ... completely lacks any power of abstraction"

This type of remark is unnecessary. I will continue to refer to the article, as it is the subject around which my issues revolve.


I made 3 statements, and 1 supposition; and they were:

  1. Distributed operating system should not redirect to a page entitled "Single system image."
  2. Statements made throughout the article misrepresent fact, and these statements are both negligent and extreme in nature.
  3. Many of the references used in the article do not directly support their subject; and worse, in some cases not at all.


Let’s consider them in reverse order.

References used in the article:
I clearly show that three of your references [3,4,5] in no way support the statement to which they are attached. In the references, the object terms do appear on the same page; but that’s it. There is no discernible continuity between these three references and the statement they allegedly support. Once again, each reference does mention DOS, and they do mention SSI, and in each reference both DOS and SSI are mentioned on the same page; however there is no statement, intimation, hint, or allegation in any of these three references cogent to the point they are indicated to support.

Each of these entities being compared do share certain aspects of relevance; despite our differences of opinion, we do agree on that much. It is critical to understand, this matters not. If the entities in question were identical twins, it would not matter. If the entities in question were two discrete pointers to the same instantiated object, it would not matter. The references do not support it; period.

I would like for it to be noticed, that in my earlier remarks, I did not indicate ignorance, stupidity, or "insanity" were possible as a cause. Please notice as well, I did not insinuate dishonest, surreptitious, illicit, or fraudulent intent. I indicated negligence: a simple lack of effort or diligence.

In any event, there can be no doubt to the refutation of the references; and they should be removed. This is where I would normally say, "And hopefully replaced." I actually did begin to type those words, and then remembered... This leads us to point number 2.


Misrepresentative statements are made throughout the article:
I will cite two instances, and just take the first two sentences of the article; and refute both. First, the article is entitled "Single system image." I don't think we will disagree on that. The lead line, however, has as its object the term "single system image (SSI) cluster." I can certainly see that wiki markup has been cleverly used to embolden only the SSI portion of the sentence's object. In the spirit of affability, I will concede the point that your lead sentence is, in itself, both syntactically and semantically correct. The misrepresentation occurs in the correlation of SSI with a SSI cluster; and this is exactly what has happened here, somewhere between the title and the end of the first sentence.

The only possible argument would be that SSI is the object of the sentence, and it is directly followed by the stand-alone noun, "cluster." I will be more than happy to debate the stochastic opportunities in this instance. How about we entitle an article "Semi-metallic paint," and give it a lead line such as, "A 'Semi-metallic paint' brush has bristles impregnated with inorganic metal ions in order to..." The paint is on the brush. The brush spreads the paint. It’s not about the relationship, or the activities, or anything else; it is about the disparity between the article’s title and what the lead sentence indicates the subject of the article is. It is misleading, confusing, and just plain inaccurate. Is more required?


Secondly, the next sentence: "The concept is often considered synonymous with that of a distributed operating system,[3][4][5]..." A "single system image" is not a system! A "single system image" is not a network of, a collection of, or a cluster of computers. Single system image is an idea. It is a design principle; the implementation of which, into a system of computers, brings about a perception from the user. Don't believe me! Believe your reference, the self-titled paper.

"A single system image (SSI) is the property of a system that hides the heterogeneous and distributed nature..." —RAJKUMAR BUYYA, Single System Image (SSI), pg. 1

"A [single system image] can be defined as the illusion created by hardware or software, that presents a collection of resources..." —RAJKUMAR BUYYA, Single System Image (SSI), pg. 1

Twice on the first page Buyya describes (and defines) SSI as an illusion, or a property of a system. In other areas of the paper, the term Single system image Cluster is used. The SSI cluster is not an idea. It is a network of, a collection of, AND a cluster of computers. Trouble is, we are talking about two completely different things; a SSI, and a SSI cluster.

Either change the title or change the wording; but don't get confused and think I am trying to impugn character that must be defended; I am not. Just be careful, and maybe not too abstract. Factual righteous truth carries the overwhelming load of that "crappy writing" out there. On the other hand, one can have the most eloquent and well-written piece ever; and if it is not accurate, it's toast. If you want my assistance, just ask; but yes, I will be man-handling that beast I'm wrestling currently. Maybe you would curb this, and come help me???


Distributed operating system redirection:
I never understood the redirect to parallel computing, especially when distributed computing was available. All in all, that is where it should redirect at the moment. If I thought we could agree on that, I could stop right here... Is it important to have DOS redirect here? I am perfectly willing to entertain any pertinent rationale for it; and anyway I'm tired and hungry. So I will expect nothing less than your most wise and sagacious efforts towards mutual ground.


The mutual ground:
1) Change the title, or re-phrase the article.
2) Redirect DOS to Distributed computing (or back to parallel)
3) That will take care of over half of your references right there.


Come on, you seem like a really sharp guy; and it is admirable to see your efforts on this article. I'll even go so far as to apologize for my (dis)-passionate diatribe. Between working full-time, school full-time, and family full-time, you would think I would be full of time, but no. Everyone sleeps, but not me.

Thank you, for your patience and consideration, in advance. JLSjr (talk) 02:03, 29 April 2010 (UTC)

Question

Can you explain in 50 words or less the relationship between "single system image" and "distributed operating system", as you see it? Pcap ping 17:18, 29 April 2010 (UTC)


World-Class Effort

Exactly 50 words, you may never know how difficult this was...

The SSI is the user's perception of the transparency designed into DOS.
The SSI is an extremely important illusion.
Many difficult algorithms are required to impart transparency as a property of a DOS.
Transparency is a difficult design-time implementation. (Developer’s paradigm)
The SSI is an important run-time illusion. (User’s paradigm)
I apologize for the (deleted) remark previously here. It was intended to celebrate having been able to accomplish a statement in a reasonable space. I later realized it could be viewed as goading and boorish. It certainly was not. Regards, JLSjr (talk) 23:17, 4 May 2010 (UTC)

JLSjr (talk) 04:41, 30 April 2010 (UTC)

SGI

How about adding SGI to the list?

http://www.sgi.com/company_info/newsroom/press_releases/2004/march/large_scale.html http://www.sgi.com/products/software/starp/

Ericfluger 19:49, 10 March 2007 (UTC)

I agree. I came to this page expecting references to SGI machines that are generally not considered to be 'clusters' but are called SSI, but instead read all about clusters that appear to be a single machine. IMO, SSI is more than just *pretending* to be a single image...they are a single computer, or are so at a much lower level than clusters. Of course, I'm mostly talking about the old IRIX machines. Perhaps things have changed.

The above URL has gone. Here's a new one : http://www.sgi.com/products/servers/uv/

Davidmaxwaterman (talk) 11:35, 20 August 2012 (UTC)

Proposed rewrite

Since everyone is clear that this article is a mess, I'm starting a proposed rewrite at /Rewrite. In the interests of full disclosure I acknowledge that I'm an OpenSSI developer. If anyone thinks I'm giving undue weight to OpenSSI, please note it here. HughesJohn (talk) 20:49, 25 September 2008 (UTC)

From SINGLE SYSTEM IMAGE (SSI):
A single system image (SSI) is the property of a system that hides the heterogeneous and distributed nature of the available resources and presents them to users and applications as a single unified computing resource.
Not very intelligible. HughesJohn (talk) 10:27, 26 September 2008 (UTC)

An exciting new idea: assuming that my proposed categories for SSI support are acceptable, how about doing a feature matrix? HughesJohn (talk) 19:28, 26 September 2008 (UTC)

Ok, feature matrix is now a feature. HughesJohn (talk) 13:09, 2 October 2008 (UTC)
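
For anyone who wants to play with the feature-matrix idea outside the article, here is a minimal Python sketch. It is only an illustration: the system names (`SystemA`, `SystemB`, `SystemC`) and the feature names are hypothetical placeholders, not the categories used in the rewrite and not claims about any real product. The point is simply that SSI support can be recorded as a set of per-feature flags rather than a single yes/no label.

```python
# Toy SSI feature matrix. All system and feature names are hypothetical
# placeholders chosen for illustration; they describe no real product.
FEATURES = ["process migration", "single root", "single IPC space"]

MATRIX = {
    "SystemA": {"process migration": True, "single root": True, "single IPC space": True},
    "SystemB": {"process migration": True, "single root": False, "single IPC space": False},
    "SystemC": {"process migration": False, "single root": True, "single IPC space": False},
}


def systems_with(feature):
    """Return the systems (sorted by name) that provide the given feature."""
    return sorted(name for name, row in MATRIX.items() if row[feature])


def render():
    """Render the matrix as plain text: a header line, then one row per system."""
    lines = [f"{'system':<8} | " + " | ".join(FEATURES)]
    for name in sorted(MATRIX):
        cells = ["yes" if MATRIX[name][f] else "no" for f in FEATURES]
        lines.append(f"{name:<8} | " + " | ".join(cells))
    return "\n".join(lines)


if __name__ == "__main__":
    print(render())
    print(systems_with("single root"))
```

A per-feature layout like this avoids having to decide which systems are "real" SSI; a query such as `systems_with("single root")` just reports which rows have that flag set.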

Be Bold

Well, it's time to be WP:BOLD, so I'm going to put my new article in place. Let's see what fireworks this produces :-) HughesJohn (talk) 13:11, 2 October 2008 (UTC)

Lead section

Keep in mind: "The lead section, lead, or introduction of a Wikipedia article is the section before the table of contents and first heading. The lead serves both as an introduction to the article below and as a short, independent summary of the important aspects of the article's topic." HughesJohn (talk) 13:08, 2 October 2008 (UTC)

Other Aspects

I have filled in the OpenVMS entries in the table.

I would like to discuss/suggest/add some additional aspects for SSI clusters:

Locking of resources is cluster wide. Locks survive a node being removed from the cluster, regardless of whether that node was the master of the lock, or had some interest in the lock. Note that in some SSI systems, such as OpenVMS, the distributed lock manager (DLM) is one of the fastest (if not the fastest) means of cluster communication.

A single security model

All security mechanisms are cluster wide. A single set of /etc/passwd or SYSUAF and related files is used.

Kernel integrated

The cluster software is directly integrated into the kernel. A standalone node is effectively a cluster of one node. Turning cluster software on or off is a configuration option, not a re-install/rebuild. System services are mainly cluster centric. Cluster membership is present before normal file system and user access is possible. This also means before most daemons are possible.

Cluster communication versus IPC

Much if not most of the communication within the cluster is not specifically IPC. This is particularly true if the cluster software is fully integrated with the kernel, in which case most of the traffic is kernel to kernel, not specifically process to process. Thus we might use spinlocks for inter-processor, intra-node coordination and communication, a distributed lock manager for inter-node kernel coordination and communication, and pipes for inter-process communication. Simon L Jackson (talk) 02:52, 11 January 2009 (UTC)
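
The locking behaviour described in this thread (locks are cluster wide and survive the removal of a node, even the node mastering the lock) can be sketched with a toy in-memory model. This is not how OpenVMS or any real DLM is implemented. The class `ToyLockManager`, its methods, and the re-mastering rule (pick the lowest-named surviving node) are all assumptions made for illustration, as is the choice to free a lock when the node holding it leaves.

```python
class ToyLockManager:
    """In-memory toy model of cluster-wide locks. Hypothetical, not a real DLM."""

    def __init__(self, nodes):
        self.nodes = set(nodes)
        # resource name -> {"master": node tracking the lock, "holder": node owning it}
        self.locks = {}

    def acquire(self, resource, holder, master):
        """Grant the lock if it is free; return True on success, False if taken."""
        if holder not in self.nodes or master not in self.nodes:
            raise ValueError("holder and master must be cluster members")
        if resource in self.locks:
            return False
        self.locks[resource] = {"master": master, "holder": holder}
        return True

    def release(self, resource, holder):
        """Release the lock if this node holds it; return True if released."""
        lock = self.locks.get(resource)
        if lock is None or lock["holder"] != holder:
            return False
        del self.locks[resource]
        return True

    def remove_node(self, node):
        """Remove a node; locks it merely mastered survive on another node."""
        self.nodes.discard(node)
        for resource, lock in list(self.locks.items()):
            if lock["holder"] == node:
                # Assumption: a departed holder's lock is freed for others.
                del self.locks[resource]
            elif lock["master"] == node:
                # Assumption: re-master on the lowest-named surviving node.
                lock["master"] = min(self.nodes)


if __name__ == "__main__":
    dlm = ToyLockManager(["n1", "n2", "n3"])
    dlm.acquire("disk0", holder="n2", master="n1")
    dlm.remove_node("n1")       # the master leaves the cluster
    print(dlm.locks["disk0"])   # the lock is still held by n2
```

In this sketch the lock state outlives its master because mastership is just a field that gets reassigned; a real implementation would have to rebuild that state from the surviving nodes.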

Shared Roots

"Shared Roots" might not belong with SSI. A shared root cluster is intermediate between an SSI and an "incoherent" conventional Linux "bunch of boxes" cluster. More precisely, all SSIs must have real or virtual shared roots, but not all shared roots need be SSIs: a shared root cluster has a unified filesystem between nodes, but not necessarily a unified process/memory space. A shared root without unification of other aspects is a convenient compromise between scalability and ease of management.

For Linux clusters, the largest clusters (10000s of nodes+) are "bunch of boxes" clusters; shared roots to date are used up to 1000s of nodes, SSIs up to 100s. But SSIs are the easiest to manage/use, then shared roots, then bunches of boxes.
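
The three-way distinction drawn above can be written down as a small Python sketch. The function name `classify_cluster`, its two boolean parameters, and the `EASE_OF_MANAGEMENT` list are hypothetical names introduced here; the rules themselves just restate the post (an SSI needs a shared root, a shared root alone is not an SSI).

```python
def classify_cluster(shared_root: bool, unified_process_space: bool) -> str:
    """Classify a cluster using the distinction made in the post above."""
    if unified_process_space and not shared_root:
        # The post states that all SSIs must have a real or virtual shared root.
        raise ValueError("a unified process space without a shared root is not covered")
    if unified_process_space:
        return "SSI"
    if shared_root:
        return "shared root"
    return "bunch of boxes"


# Ordering claimed in the post, easiest to manage first.
EASE_OF_MANAGEMENT = ["SSI", "shared root", "bunch of boxes"]


if __name__ == "__main__":
    print(classify_cluster(shared_root=True, unified_process_space=False))
```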

Additional "Shared Roots" thought from Simon L Jackson (talk) 02:26, 11 January 2009 (UTC)

  • Historically, the term cluster, as used by DEC from the early 1980s, meant SSI cluster.
  • Shared roots are often not particularly shared. Should we distinguish shared boot from shared root? Both OpenVMS and TruCluster can boot off the shared root regardless of whether it is a directly available disk (e.g. Y-cabled bus, iSCSI or SAN) or via a network boot (sometimes referred to as a "satellite" node).
  • To share a root, a single security model should be considered.
  • To have a fully shared root (or perhaps this should be shared boot) means cluster communications need to be present very early in the boot process, before the shared root is formally mounted, and therefore before a full IP stack can be running. Thus some SSI clusters either don't use IP for communications or prefer not to. If they do, they use a separate simplified IP stack. The SCS protocol used by OpenVMS is specifically designed to provide flexible cluster communication and is significantly more efficient than IPv4.
Followups from HughesJohn (talk) 13:47, 12 January 2009 (UTC) about "shared roots"
  • Yeah, when I did my rewrite I was of the opinion (influenced by my OpenSSI background I expect) that a cluster was SSI, and to be SSI it had to have the whole kit and caboodle. I came up with the idea of splitting the discussion into a set of features, which were more or less provided by different systems as a way of avoiding flamewars about which were "real" SSI systems (for example I was originally very dubious about the openMosix claims to be SSI). However as time passes I think I stumbled on the right idea - SSI is not an absolute, different systems include different SSI features.
  • Yes, shared boot is different from shared root. For example OpenSSI usually doesn't do shared boot - each node boots from its own local disk, then joins the cluster to find the root. (It can do shared boot with Etherboot or PXE, but that has the disadvantage of serialising the boot process).
  • It seems to me that shared root implies a single security model. Maybe we should discuss this in that section.
  • These days a full IP stack could be in a network card's boot ROM (Etherboot/PXE), so that doesn't seem to be much of a problem. As I said above, OpenSSI usually boots the full Linux kernel from a local device, so it can handle fairly complex protocols for in-cluster communications (InfiniBand, for example).
HughesJohn (talk) 13:47, 12 January 2009 (UTC)

Hot node addition/removal?

This is the ability to add or remove nodes at runtime, rather than at cluster start time. I know that OpenSSI can do it, Kerrighed can't (but is working on it), not sure about others. Somewhat important because it affects the purpose of the cluster: Is it designed to be highly available, such that individual machines can fail but the cluster lives? Or does adding additional systems increase performance but decrease reliability, since node failure means cluster failure? Essentially the same kind of difference as (say) RAID-1 and RAID-0, respectively.
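
The RAID analogy can be made concrete with a little arithmetic. Assume, purely for illustration, that each of n nodes fails independently with the same probability p over some period. A cluster that dies when any node dies then survives with probability (1 - p)^n, while an idealised cluster that stays up as long as at least one node is alive survives with probability 1 - p^n. The function names below are hypothetical, and independent, identical failures are a simplifying assumption rather than a property of any system mentioned here.

```python
def survival_if_any_failure_is_fatal(p_fail: float, nodes: int) -> float:
    """RAID-0-like case: the cluster survives only if every node survives."""
    return (1.0 - p_fail) ** nodes


def survival_if_one_node_suffices(p_fail: float, nodes: int) -> float:
    """RAID-1-like idealisation: the cluster survives unless every node fails."""
    return 1.0 - p_fail ** nodes


if __name__ == "__main__":
    for n in (1, 4, 16):
        fragile = survival_if_any_failure_is_fatal(0.1, n)
        robust = survival_if_one_node_suffices(0.1, n)
        print(f"{n:>2} nodes: fragile={fragile:.4f} robust={robust:.4f}")
```

Under these assumptions, adding nodes lowers the survival probability in the first case and raises it in the second, which is the trade-off the paragraph above describes.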

On a similar note, there's also the question of whether processes can live independently of their initial node. In Mosix-based systems, one node may be doing the heavy lifting CPU-wise, but all I/O and IPC has to be proxied back to the starting node; if it fails, the process is dead. By contrast, I believe OpenSSI tries to translate as many resources as it can to equivalent resources on the local machine, so only hardware-reliant processes die if the corresponding starter node dies. This feature is obviously only relevant if a system supports hot removal, since if a node dies in (say) Kerrighed, your cluster is dead and the whole issue is moot.

Not sure if either or both of these warrant addition to the features section, some new section, and/or the grid. Wisq (talk) 21:19, 16 September 2009 (UTC)