A cluster manager is usually a backend graphical user interface (GUI) or command-line program that runs on one or all cluster nodes; in some cases it runs on a separate server or on a cluster of management servers. The cluster manager works together with cluster management agents, which run on each node of the cluster to manage and configure individual services, sets of services, or the complete cluster server itself (see supercomputing). In some cases the cluster manager is used mostly to dispatch work for the cluster (or cloud) to perform. In that case, part of the cluster manager can be a remote desktop application that is used not for configuration but only to submit work to the cluster and retrieve the results. In other cases the cluster exists mainly to provide availability and load balancing rather than computation or a specific service.
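The manager and agent relationship described above can be illustrated with a minimal Python sketch. It does not reflect the design of any product listed below; the class names `ClusterManager` and `NodeAgent`, the `healthy` flag, and the round-robin dispatch policy are all assumptions made for illustration, and a real system would communicate over a network rather than through in-process method calls.

```python
from dataclasses import dataclass, field
from itertools import cycle
from typing import Callable, Dict, List, Tuple


@dataclass
class NodeAgent:
    """Hypothetical agent that runs on one cluster node."""
    node: str
    healthy: bool = True
    services: Dict[str, dict] = field(default_factory=dict)

    def configure(self, service: str, settings: dict) -> None:
        # Record the desired configuration for a service on this node.
        self.services[service] = dict(settings)

    def run(self, job: Callable[[], object]) -> object:
        # Execute one unit of work locally and hand the result back.
        return job()


class ClusterManager:
    """Hypothetical manager that talks to every registered agent."""

    def __init__(self, agents: List[NodeAgent]) -> None:
        self.agents = list(agents)

    def configure_all(self, service: str, settings: dict) -> None:
        # Push the same service configuration to every node.
        for agent in self.agents:
            agent.configure(service, settings)

    def dispatch(self, jobs: List[Callable[[], object]]) -> List[Tuple[str, object]]:
        # Send work only to healthy nodes, in round-robin order
        # (an assumed policy), and collect (node, result) pairs.
        healthy = [a for a in self.agents if a.healthy]
        if not healthy:
            raise RuntimeError("no healthy nodes available")
        return [(agent.node, agent.run(job))
                for agent, job in zip(cycle(healthy), jobs)]
```

A short usage example under the same assumptions: the manager configures a service on three nodes, one node is marked unhealthy, and four jobs are then spread across the remaining two.

```python
agents = [NodeAgent("node1"), NodeAgent("node2"), NodeAgent("node3")]
manager = ClusterManager(agents)
manager.configure_all("web", {"port": 8080})
agents[1].healthy = False
results = manager.dispatch([lambda n=n: n * n for n in range(4)])
# results == [("node1", 0), ("node3", 1), ("node1", 4), ("node3", 9)]
```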
Free and open source solutions
- Apache Mesos, from Apache Software Foundation
- Keepalived, from keepalived.sourceforge.net
- Linux Cluster Manager (LCM), from linuxcm.sourceforge.net
- Heartbeat, from Linux-HA
- oneSIS, from onesis.org
- Rocks Cluster Distribution, from www.rocksclusters.org
- SCMS.pro, from www.scms.pro
- Ultra Monkey, from www.ultramonkey.org
- YARN, distributed with Apache Hadoop
- xCAT
Proprietary solutions
- Bright Cluster Manager, from Bright Computing
- CycleServer HPC manager, from Cycle Computing
- Cluster Server, from Microsoft
- IBM Tivoli System Automation for Multiplatforms, from IBM
- Insight Cluster Management Utility (CMU), from HP
- ClusterWare, from Penguin Computing
- StackIQ Enterprise Cluster Manager, from StackIQ
- Etu Software Appliance, from Etu Solutions