
GPFS

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Maggu (talk | contribs) at 15:39, 29 October 2007 (External links: GPFS official homepage (new URL)). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

IBM GPFS
Developer(s): IBM
Stable release
Operating system: AIX / Linux
Type: File system
License: Proprietary
Website: www.ibm.com

General Parallel File System (GPFS) is a high-performance shared-disk clustered file system developed by IBM. GPFS distinguishes itself from other cluster file systems by providing concurrent high-speed file access to applications executing on multiple nodes of an AIX 5L cluster, a Linux cluster, or a heterogeneous cluster of AIX and Linux nodes. In addition to providing file system storage capabilities, GPFS provides tools for management and administration of the GPFS cluster and allows for shared access to file systems from remote GPFS clusters.

GPFS provides high-performance data access in configurations ranging from a single node to many nodes. The largest existing configurations exceed 2,000 nodes. GPFS has been available on AIX since 1998 and on Linux since 2001, and is offered as part of the IBM System Cluster 1350.

History

GPFS began as the Tiger Shark file system, a research project at IBM's Almaden Research Center as early as 1993. The first commercial release of GPFS was in 1998.

GPFS was initially designed to support high-throughput multimedia applications. This design turned out to be well suited to scientific computing. Today GPFS is used by many of the supercomputers listed on the Top 500 Supercomputing Sites web site. Since its inception GPFS has also been deployed for many commercial applications, including digital media, grid analytics and scalable file service.

Architecture

GPFS provides high performance by allowing data to be accessed from multiple computers at once. Most existing file systems are designed for a single-server environment, and adding more file servers does not improve performance. GPFS provides higher input/output performance by "striping" blocks of data from individual files across multiple disks, and reading and writing these blocks in parallel. Other features provided by GPFS include high availability, support for heterogeneous clusters, disaster recovery, security, the Data Management API (DMAPI), hierarchical storage management (HSM) and information lifecycle management (ILM).
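The striping idea can be illustrated with a small sketch. This is not GPFS code: the round-robin layout, the tiny block size, and the function names `stripe` and `read_striped` are assumptions chosen for illustration, and the "disks" are plain in-memory lists.

```python
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4  # bytes; unrealistically small, chosen only for illustration


def stripe(data: bytes, num_disks: int, block_size: int = BLOCK_SIZE) -> list:
    """Split data into blocks and deal them round-robin across the disks."""
    disks = [[] for _ in range(num_disks)]
    for offset in range(0, len(data), block_size):
        block_index = offset // block_size
        disks[block_index % num_disks].append(data[offset:offset + block_size])
    return disks


def read_striped(disks: list) -> bytes:
    """Fetch every disk's blocks concurrently, then interleave them in order."""
    num_disks = len(disks)
    with ThreadPoolExecutor(max_workers=num_disks) as pool:
        # list(d) stands in for an independent read request to each disk.
        per_disk = list(pool.map(list, disks))
    total_blocks = sum(len(blocks) for blocks in per_disk)
    # Block k was written to disk k % num_disks at position k // num_disks.
    ordered = [per_disk[k % num_disks][k // num_disks] for k in range(total_blocks)]
    return b"".join(ordered)


layout = stripe(b"abcdefghij", num_disks=3)
restored = read_striped(layout)
```

Because consecutive blocks sit on different disks, a large sequential read or write can keep all disks busy at once instead of queueing on one.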

Information Lifecycle Management (ILM) Tools

GPFS is designed to support efficient data lifecycle management through policy-driven automation and tiered storage management. Storage pools, filesets and user-defined policies make it possible to match the cost of storage resources to the value of the data.

Storage pools allow disks within a file system to be grouped. Tiers of storage can be created by grouping disks based on performance, locality or reliability characteristics. For example, one pool could consist of high-performance Fibre Channel disks and another of more economical SATA storage.

A fileset is a sub-tree of the file system namespace and provides a way to partition the namespace into smaller, more manageable units. Filesets provide an administrative boundary that can be used to set quotas and can be specified in a policy to control initial data placement or data migration. Data in a single fileset can reside in one or more storage pools. Where the file data resides and how it is migrated is determined by a set of rules in a user-defined policy.

There are two types of user-defined policies in GPFS: file placement and file management. File placement policies direct file data to the appropriate storage pool as files are created. File placement rules select a pool based on attributes such as the file name, the user name or the fileset. File management policies allow a file's data to be moved or replicated, or files to be deleted. File management policies can be used to move data from one pool to another without changing the file's location in the directory structure. File management rules select files based on attributes such as last access time, path name or file size.
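The distinction between the two policy types can be modelled with a short sketch. It does not use GPFS's own policy language; the `FileInfo` record, the pool names, the first-match-wins evaluation order and the example thresholds are all assumptions made for illustration.

```python
import fnmatch
from dataclasses import dataclass


@dataclass
class FileInfo:
    path: str
    owner: str
    fileset: str
    size_bytes: int
    days_since_access: int
    pool: str = ""  # storage pool currently holding the file's data


def place(f, placement_rules, default_pool):
    """Placement: choose a pool when the file is created (first match wins)."""
    for matches, pool in placement_rules:
        if matches(f):
            f.pool = pool
            return f
    f.pool = default_pool
    return f


def manage(files, management_rules):
    """Management: migrate or delete existing files (first match wins)."""
    survivors = []
    for f in files:
        deleted = False
        for matches, action, target_pool in management_rules:
            if not matches(f):
                continue
            if action == "delete":
                deleted = True
            elif action == "migrate":
                f.pool = target_pool  # only the pool changes; f.path does not
            break
        if not deleted:
            survivors.append(f)
    return survivors


# Hypothetical rules: video files start on the fast pool, the "archive"
# fileset starts on the economical pool, everything else on the default.
placement_rules = [
    (lambda f: fnmatch.fnmatch(f.path, "*.mpg"), "fast"),
    (lambda f: f.fileset == "archive", "economy"),
]
# Hypothetical rules: stale scratch files are deleted; files on the fast
# pool that have not been read for 30 days move to the economical pool.
management_rules = [
    (lambda f: f.path.startswith("/gpfs/tmp/") and f.days_since_access > 90, "delete", None),
    (lambda f: f.pool == "fast" and f.days_since_access > 30, "migrate", "economy"),
]

video = place(FileInfo("/gpfs/media/clip.mpg", "alice", "media", 10**9, 45),
              placement_rules, default_pool="system")
scratch = place(FileInfo("/gpfs/tmp/scratch.dat", "bob", "root", 10, 120),
                placement_rules, default_pool="system")
initial_pools = (video.pool, scratch.pool)
remaining = manage([video, scratch], management_rules)
```

After `manage` runs, the video file is still at `/gpfs/media/clip.mpg` but its data is on the `economy` pool, which mirrors the point that migration does not change a file's location in the directory structure.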

The GPFS policy processing engine is scalable and can be run on many nodes at once. This allows management policies to be applied to a single file system containing billions of files and to complete within a few hours.
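The scaling argument can be sketched as a partition-and-merge scan. This is only a conceptual model, not how the GPFS engine is implemented: worker threads stand in for cluster nodes, and the names `partition`, `scan_partition` and `parallel_scan` as well as the access-age rule are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(items, num_nodes):
    """Deal the file list out so each node receives a roughly equal share."""
    return [items[i::num_nodes] for i in range(num_nodes)]


def scan_partition(files, threshold_days):
    """Work done by one node: evaluate the rule on its share of the files."""
    return [path for path, days_since_access in files
            if days_since_access > threshold_days]


def parallel_scan(files, num_nodes, threshold_days):
    """Scan all partitions concurrently and merge the per-node results."""
    parts = partition(files, num_nodes)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        results = pool.map(lambda part: scan_partition(part, threshold_days), parts)
        merged = [path for part in results for path in part]
    return sorted(merged)


# Ten hypothetical files; file i was last accessed i days ago.
files = [(f"/gpfs/f{i}", i) for i in range(10)]
candidates = parallel_scan(files, num_nodes=3, threshold_days=6)
```

Each node evaluates the rules only on its own share, so in this model the elapsed time of a scan shrinks roughly in proportion to the number of nodes taking part.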