gLite

From Wikipedia, the free encyclopedia
Developer(s): EGEE
Stable release: 3.2 / 23 March 2009
Operating system: Scientific Linux 3, 4, 5
Type: Grid computing
License: EGEE Collaboration 2004
Website: glite.cern.ch

gLite (pronounced "gee-lite") is a middleware computer software project for grid computing used by the CERN LHC experiments and other scientific domains. It was implemented through the collaborative efforts of more than 80 people in 12 different academic and industrial research centers in Europe. gLite provides a framework for building applications that tap into distributed computing and storage resources across the Internet. The gLite services were adopted by more than 250 computing centres and used by more than 15,000 researchers in Europe and around the world.

History

After prototyping phases in 2004 and 2005, convergence with the LHC Computing Grid (LCG-2) distribution was reached in May 2006 with the release of gLite 3.0, which became the official middleware of the Enabling Grids for E-sciencE (EGEE) project. EGEE ended in 2010.

Development of the gLite middleware was then taken over by the European Middleware Initiative (EMI), and gLite is now maintained as part of the EMI software stack.

The distributed computing infrastructure built by EGEE is now supported by the European Grid Infrastructure. It runs the grid middleware produced by the European Middleware Initiative, many components of which came from gLite.

Middleware description

Security

The gLite user community is grouped into Virtual Organisations (VOs).[1] To be authenticated and authorized to use grid resources, a user must join a VO that is supported by the infrastructure running gLite.

The Grid Security Infrastructure (GSI) in WLCG/EGEE enables secure authentication and communication over an open network.[2] GSI is based on public key encryption, X.509 certificates, and the Secure Sockets Layer (SSL) communication protocol, with extensions for single sign-on and delegation.

To authenticate, a user needs a digital X.509 certificate issued by a Certification Authority (CA) trusted by the infrastructure running the middleware.

The authorization of a user on a specific grid resource can be done in two different ways. The first is simpler, and relies on the grid-mapfile mechanism. The second way relies on the Virtual Organisation Membership Service (VOMS) and the LCAS/LCMAPS mechanism, which allow for a more detailed definition of user privileges.
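As a rough illustration of the simpler grid-mapfile mechanism, the sketch below parses a mapfile and looks up the local account for a certificate subject. It assumes the common layout of one entry per line, with a quoted subject distinguished name (DN) followed by a local account name; the sample DNs, account names and function names are hypothetical and not taken from any real site.

```python
import shlex

def parse_grid_mapfile(text):
    """Return a dict mapping certificate subject DNs to local account names.

    Assumes one entry per line: a double-quoted DN followed by an account name.
    """
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = shlex.split(line)  # honours the quotes around the DN
        if len(parts) != 2:
            continue  # ignore malformed entries in this sketch
        dn, account = parts
        mapping[dn] = account
    return mapping

def authorize(mapping, dn):
    """Return the local account for a DN, or None if the DN is not listed."""
    return mapping.get(dn)

# Hypothetical entries, for illustration only.
SAMPLE = '''
# subject DN                          local account
"/C=CH/O=Example/CN=Alice Example" alice
"/C=IT/O=Example/CN=Bob Example" bob
'''

mapping = parse_grid_mapfile(SAMPLE)
print(authorize(mapping, "/C=CH/O=Example/CN=Alice Example"))
```

A flat lookup like this can only say whether a user is mapped to an account, which is why the VOMS-based route is described as allowing a more detailed definition of privileges.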

User interface

The access point to the gLite Grid is the User Interface (UI). This can be any machine where users have a personal account and where their user certificate is installed. From a UI, a user can be authenticated and authorized to use the WLCG/EGEE resources, and can access the functions offered by the Information, Workload and Data management systems. The UI provides command-line tools to perform some basic Grid operations:

  • list all the resources suitable to execute a given job;
  • submit jobs for execution;
  • cancel jobs;
  • retrieve the output of finished jobs;
  • show the status of submitted jobs;
  • retrieve the logging and bookkeeping information of jobs;
  • copy, replicate and delete files from the Grid;
  • retrieve the status of different resources from the Information System.

Computing element

A Computing Element (CE), in Grid terminology, is a set of computing resources localized at a site (e.g. a cluster or a computing farm). A CE includes a Grid Gate (GG), which acts as a generic interface to the cluster; a Local Resource Management System (LRMS), sometimes called a batch system; and the cluster itself, a collection of Worker Nodes (WNs), the nodes where the jobs are run.

There are two CE implementations in gLite 3.1: the LCG CE, developed by EDG and used in LCG-2, and the gLite CE, developed by EGEE. Sites can choose what to install, and some of them provide both types. The GG is responsible for accepting jobs and dispatching them for execution on the WNs via the LRMS.

In gLite 3.1, the supported LRMS types were OpenPBS/PBSPro, Platform LSF, Maui/Torque, BQS, Condor, and Sun Grid Engine.[3]

Storage element

A Storage Element (SE) provides uniform access to data storage resources. The Storage Element may control simple disk servers, large disk arrays or tape-based Mass Storage Systems (MSS). Most WLCG/EGEE sites provide at least one SE.

Storage Elements can support different data access protocols and interfaces. Simply speaking, GSIFTP (a GSI-secure FTP) is the protocol for whole-file transfers, while local and remote file access is performed using RFIO or gsidcap.

Most storage resources are managed by a Storage Resource Manager (SRM), a middleware service providing capabilities such as transparent file migration from disk to tape, file pinning and space reservation. However, different SEs may support different versions of the SRM protocol, and the capabilities can vary.

Several SRM implementations are in use, with varying capabilities. The Disk Pool Manager (DPM) is used for fairly small SEs with disk-based storage only, while CASTOR is designed to manage large-scale MSS, with front-end disks and back-end tape storage. dCache is targeted at both MSS and large-scale disk array storage systems. Other SRM implementations are in development, and the SRM protocol specification itself is also evolving.

Classic SEs, which do not have an SRM interface, provide a simple disk-based storage model. They are in the process of being phased out.

Information service

The Information Service (IS) provides information about the WLCG/EGEE Grid resources and their status. This information is essential for the operation of the whole Grid, as it is via the IS that resources are discovered. The published information is also used for monitoring and accounting purposes.

Much of the data published to the IS conforms to the GLUE Schema,[4] which defines a common conceptual data model to be used for Grid resource monitoring and discovery.

The Information System used in gLite 3.1 inherits its main concepts from the Globus Monitoring and Discovery Service (MDS).[5] However, the GRIS and GIIS in MDS have been replaced by the Berkeley Database Information Index (BDII), which is essentially an OpenLDAP server that is updated by an external process.
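Because a BDII is essentially an OpenLDAP server, it can in principle be queried with ordinary LDAP tools. The sketch below only builds an `ldapsearch` command line and does not contact any server. The host name is a placeholder, the helper name is hypothetical, and the port (2170), search base (`o=grid`) and GLUE attribute names are values commonly seen in gLite deployments that should be treated as assumptions and checked against the actual site.

```python
import shlex

def bdii_query_command(host, ldap_filter="(objectClass=GlueCE)",
                       attributes=(), port=2170, base="o=grid"):
    """Build an ldapsearch argument list for an anonymous BDII query.

    port and base are assumed defaults, not guaranteed for every site.
    """
    cmd = [
        "ldapsearch",
        "-x",                            # simple (anonymous) bind
        "-LLL",                          # plain LDIF output without comments
        "-H", f"ldap://{host}:{port}",   # server URI
        "-b", base,                      # search base
        ldap_filter,
    ]
    cmd.extend(attributes)               # optionally restrict returned attributes
    return cmd

# Placeholder host; replace with a real BDII endpoint before running the command.
cmd = bdii_query_command(
    "bdii.example.org",
    attributes=["GlueCEUniqueID", "GlueCEStateWaitingJobs"],
)
print(shlex.join(cmd))
```

Returning an argument list rather than a single string keeps the filter safe to pass to `subprocess.run` without shell quoting problems.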

Workload management

The purpose of the Workload Management System (WMS)[6] is to accept user jobs, to assign them to the most appropriate Computing Element, to record their status and retrieve their output. The Resource Broker (RB) is the machine where the WMS services run.

Jobs to be submitted are described using the Job Description Language (JDL), which specifies, for example, which executable to run and its parameters, files to be moved to and from the Worker Node on which the job is run, input Grid files needed, and any requirements on the CE and the Worker Node.
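To make the shape of a job description more concrete, the sketch below renders a small JDL text from a Python dictionary. The attribute names (Executable, Arguments, StdOutput, StdError, OutputSandbox, Requirements, Rank) are standard JDL attributes, but the specific values, the `Expr` wrapper and the `to_jdl` helper are hypothetical and exist only for this illustration.

```python
class Expr:
    """Marks a value that should be written as an unquoted JDL expression."""
    def __init__(self, text):
        self.text = text

def _format(value):
    """Format one Python value as a JDL literal."""
    if isinstance(value, Expr):
        return value.text
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (list, tuple)):
        return "{" + ", ".join(_format(v) for v in value) + "}"
    if isinstance(value, str):
        return '"' + value.replace('"', '\\"') + '"'
    return str(value)

def to_jdl(attrs):
    """Render a dict of attributes as 'Name = value;' lines."""
    return "\n".join(f"{name} = {_format(value)};" for name, value in attrs.items())

# Illustrative job: run hostname and bring back its output files.
job = {
    "Executable": "/bin/hostname",
    "Arguments": "-f",
    "StdOutput": "std.out",
    "StdError": "std.err",
    "OutputSandbox": ["std.out", "std.err"],
    "Requirements": Expr('other.GlueCEStateStatus == "Production"'),
    "Rank": Expr("-other.GlueCEStateEstimatedResponseTime"),
}

jdl_text = to_jdl(job)
print(jdl_text)
```

The Requirements and Rank expressions refer to attributes of the candidate resource (prefixed with `other.`), which is how a job states constraints on the CE.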

The choice of CE to which the job is sent is made in a process called match-making, which first selects, among all available CEs, those which fulfill the requirements expressed by the user and which are close to specified input Grid files. It then chooses the CE with the highest rank, a quantity derived from the CE status information which expresses the goodness of a CE (typically a function of the numbers of running and queued jobs).
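The two-stage selection described above (filter, then rank) can be sketched as follows. This is a simplified model and not the WMS implementation: the `CE` fields, the host names, the notion of "close" storage elements as a plain set, and the rank function (fewer waiting jobs is better) are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class CE:
    """Hypothetical snapshot of a Computing Element's published state."""
    name: str
    free_slots: int
    running_jobs: int
    waiting_jobs: int
    close_ses: frozenset  # storage elements considered close to this CE

def match(ces, requirement, input_ses, rank):
    """Return the best CE, or None if no CE qualifies.

    Step 1: keep CEs that satisfy the user's requirement and, when input
            files are specified, are close to an SE holding them.
    Step 2: pick the remaining CE with the highest rank.
    """
    candidates = [
        ce for ce in ces
        if requirement(ce) and (not input_ses or ce.close_ses & input_ses)
    ]
    return max(candidates, key=rank, default=None)

# Illustrative data only.
ces = [
    CE("ce-a.example.org", 10, 50, 20, frozenset({"se-a.example.org"})),
    CE("ce-b.example.org", 5, 10, 2, frozenset({"se-a.example.org"})),
    CE("ce-c.example.org", 100, 0, 0, frozenset({"se-c.example.org"})),
]

best = match(
    ces,
    requirement=lambda ce: ce.free_slots > 0,
    input_ses={"se-a.example.org"},
    rank=lambda ce: -ce.waiting_jobs,
)
print(best.name)
```

In this data set the idle CE is excluded because it is not close to the input files, so the choice falls to the eligible CE with the shortest queue.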

The RB locates the Grid input files specified in the job description using a service called the Data Location Interface (DLI), which provides a generic interface to a file catalogue. In this way, the Resource Broker can talk to file catalogues other than LFC, provided that they implement the DLI.

The most recent implementation of the WMS from EGEE allows not only the submission of single jobs, but also of collections of jobs (possibly with dependencies between them) in a much more efficient way than the old LCG-2 WMS, and has many other new options.

Finally, the Logging and Bookkeeping service (LB)[7] tracks jobs managed by the WMS. It collects events from many WMS components and records the status and history of the job.
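The idea of deriving a job's status and history from a stream of events can be sketched as below. This is a deliberately simplified model: in this sketch each event carries the new status directly, whereas a real L&B service derives the status from lower-level events. The status names follow those commonly listed for gLite jobs, and the component names, job identifier and class are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class JobRecord:
    """Keeps the current status and the full event history of one job."""
    job_id: str
    status: str = "SUBMITTED"
    history: list = field(default_factory=list)

    def record(self, source, new_status):
        """Store an event from a component and update the current status."""
        self.history.append((source, new_status))
        self.status = new_status

# Hypothetical job identifier and event sources.
job = JobRecord("https://lb.example.org:9000/abc123")
job.record("WorkloadManager", "WAITING")
job.record("JobController", "SCHEDULED")
job.record("LogMonitor", "RUNNING")
job.record("LogMonitor", "DONE")

print(job.status)
for source, status in job.history:
    print(source, status)
```

Keeping the history alongside the current status is what allows both a quick status query and a later review of how the job moved through the system.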

References

  1. Foster, Kesselman, Tuecke, The Anatomy of the Grid: Enabling Scalable Virtual Organizations, Int. J. High Performance Computing Applicat., 2001
  2. The Globus Toolkit 4.0, Overview of the Grid Security Infrastructure
  3. CESGA Experience with the Grid Engine batch system
  4. OGF MDS 2.2 Features in the Globus Toolkit 2.2 Release
  5. GLUE Working Group (GLUE)
  6. F. Pacini, EGEE User's Guide, WMS Service, DATAMAT, 2005
  7. EGEE User's Guide, Service Logging and Bookkeeping (L&B), CESNET, 2005