Hitachi Content Platform

From Wikipedia, the free encyclopedia

Hitachi Content Platform is an intelligent, multipurpose distributed object storage system. It enables IT organizations and cloud service providers to store, protect, preserve and retrieve unstructured content with a single storage platform. It supports multiple levels of service, evolves with technology and scale changes, and eliminates the need for tape-based backups.

Hitachi Content Platform (HCP), previously named Hitachi Content Archive Platform (HCAP), is a distributed object-based storage system designed to support large, growing repositories of fixed-content data. Fixed-content data is information that does not change but must be kept available for future reference (and be easily accessible when needed). A fixed-content storage system is one in which the data cannot be modified.

Hitachi Content Platform stores files and associated metadata (information about the file) as an object. The system imbues these objects with intelligence that enables them to automatically take advantage of advanced storage and data management features to ensure proper placement and distribution of content. With a vast array of data protection and content preservation technologies, the system can eliminate the need for tape-based backups of itself or of edge devices connected to the platform.

HCP evolves readily with changes in scale, scope, applications, search, storage, and server technologies over time. Old storage can be replaced with new storage automatically and without disruption, enabling smooth scaling from terabytes to petabytes. These capabilities are especially valuable because many applications in clouds, content depots, archives, and other systems must retain data for decades, centuries, or indefinitely.

HCP eliminates the need for a siloed approach to storing unstructured content. Massive scale, multiple storage tiers, reliability, cloud capabilities, multi-tenancy, and configurable attributes for each tenant allow the platform to support a wide range of applications on a single physical cluster. By dividing the physical cluster into multiple, uniquely configured tenants, administrators create “virtual content platforms”. Each of these can be further subdivided into namespaces to organize content, policies, and access.

The HCP system combines several components (servers, disk arrays, and a distributed file system) into a scalable appliance for object-based storage. HCP runs on an array of independent nodes that are networked together as a cluster. All runtime operations and physical storage, including data and metadata, are distributed among the cluster nodes. The HCP architecture isolates archived data from the hardware layer. Archived files are represented as objects that combine the data and the metadata required to support applications. Through its multi-tenancy architecture, HCP offers industry-standard interfaces for storing and retrieving files, such as:
  • HTTP REST (including cURL support)
  • WebDAV
  • NFS
  • CIFS
  • SMTP

HCP Main Concepts

Namespaces and Tenants

An HCP repository is partitioned into namespaces. A namespace is a logical repository as viewed by an application. Each namespace consists of a distinct logical grouping of objects with its own directory structure, such that the objects in one namespace are not visible in any other namespace. Namespaces are owned and managed by tenants. A tenant is a logical repository as viewed by an administrator. A tenant typically represents an actual organization such as a company or a department within a company that uses a portion of a cluster. A tenant can also correspond to an individual person.

A tenant is an administrative entity that provides segregation of management, while namespaces offer segregation of data, by providing a mechanism for separating the data stored for different applications, business units, or customers. For example, there may be one namespace for Accounts Receivable and another for Accounts Payable. Namespaces also allow operations to work against selected subsets of repository objects. For example, one can perform a search that targets the Accounts Receivable and Accounts Payable namespaces but not the employees’ namespace.

HCP provides access to objects in namespaces through a variety of industry-standard protocols, as well as through an integrated search facility. Access to one namespace does not grant a user access to another namespace. To the user of a namespace, the namespace is the repository. A namespace is administered at the level of its owning tenant. Namespaces are not associated with any pre-allocated storage; they share the same underlying physical storage. A single namespace can host one or more applications, although typically a namespace hosts only one application.
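The relationship between tenants, namespaces, and access can be sketched as a small Python data model. This is an illustrative assumption, not HCP's implementation. The class names, the `allowed_users` field, and the `read_object` helper are hypothetical, and real HCP permissions are far richer than a set of user names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Namespace:
    """Logical repository seen by an application; holds its own objects."""
    name: str
    objects: dict[str, bytes] = field(default_factory=dict)
    # Hypothetical: users allowed to access this namespace (and only this one).
    allowed_users: set[str] = field(default_factory=set)


@dataclass
class Tenant:
    """Administrative entity that owns and manages a set of namespaces."""
    name: str
    namespaces: dict[str, Namespace] = field(default_factory=dict)

    def create_namespace(self, name: str) -> Namespace:
        ns = Namespace(name)
        self.namespaces[name] = ns
        return ns


def read_object(ns: Namespace, user: str, path: str) -> bytes:
    """Read one object, checking access against this namespace only."""
    if user not in ns.allowed_users:
        raise PermissionError(f"{user} cannot access namespace {ns.name}")
    # Lookups never cross namespaces: each one has a private object store.
    return ns.objects[path]


finance = Tenant("finance")
receivable = finance.create_namespace("accounts-receivable")
payable = finance.create_namespace("accounts-payable")
receivable.allowed_users.add("alice")
receivable.objects["/invoices/0001.pdf"] = b"%PDF..."
print(read_object(receivable, "alice", "/invoices/0001.pdf"))  # b'%PDF...'
print("/invoices/0001.pdf" in payable.objects)                 # False
```

Alice can read from the Accounts Receivable namespace, but the same call against Accounts Payable raises `PermissionError`. Her access to one namespace grants nothing in the other.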

Figure 2 shows the logical structure of an HCP cluster with respect to its multi-tenancy features.

An HCP system can have multiple HCP tenants, each of which can own multiple namespaces.

Cluster, Tenant, and Namespace Management

The HCP implementation of segregation of management is illustrated in Figure 2.

An HCP system is configured and monitored by two classes of administrative user accounts:

  • System-level administrative accounts, or Cluster Administrators, are used for configuring system-wide features, monitoring system hardware and software and overall repository usage, and managing system-level users. The Cluster Administrator user interface, the System Management Console, provides the functionality needed by the maintainer of the physical HCP system, such as shutting down the cluster, viewing information about nodes, managing policies and services, and creating HCP tenants. Cluster administrators have a view of the cluster as a whole, including all HCP software and hardware that make up the cluster, and can perform all administrative actions that have cluster scope.
  • Tenant-level administrative accounts, or Tenant Administrators, are used for creating namespaces and data accounts, configuring individual tenants and namespaces, monitoring namespace usage at the tenant and namespace levels, and controlling access to the namespaces. This functionality is provided by the Tenant Administrator user interface, the Tenant Management Console. It is intended for the maintainer of the virtual HCP system, meaning an individual tenant together with the set of namespaces it owns. Tenant-level administration provides the segregation of management that is essential in a virtualized HCP environment.

In certain situations, such as enterprise deployments, an HCP tenant may grant cluster administrators the ability to manage this tenant, in which case any cluster administrator will also be able to function as a tenant administrator, as shown in Figure 2.
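The two classes of administrative accounts, and the option for a tenant to let cluster administrators manage it, can be expressed as a pair of permission checks. This is a hypothetical sketch. The `grants_cluster_admins` flag and the function names are invented for illustration and do not correspond to actual HCP settings.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class TenantSettings:
    name: str
    # Hypothetical flag: the tenant lets cluster administrators manage it.
    grants_cluster_admins: bool = False


@dataclass
class AdminAccount:
    name: str
    level: str                    # "cluster" or "tenant"
    tenant: Optional[str] = None  # owning tenant, for tenant-level accounts


def can_do_cluster_action(admin: AdminAccount) -> bool:
    # Actions with cluster scope (node information, shutdown, creating tenants).
    return admin.level == "cluster"


def can_manage_tenant(admin: AdminAccount, tenant: TenantSettings) -> bool:
    if admin.level == "tenant":
        # Tenant administrators manage only the tenant that owns their account.
        return admin.tenant == tenant.name
    if admin.level == "cluster":
        # Cluster administrators act as tenant administrators only if granted.
        return tenant.grants_cluster_admins
    return False
```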

Data Access

HCP supports multiple common, open, standard protocols so that applications can easily write, read, and manipulate their data. See the Open Architecture section for a list of the supported standard protocols.

Main Features

The following list is not comprehensive; it covers only the main features of the HCP system.

Fixed-Content Archiving

HCP is optimized for fixed-content data archiving. As a content-focused object store, HCP is adept at managing both structured and unstructured data, helping eliminate storage infrastructure silos, and providing a single object view across the distributed, multi-tenant storage environment. The system utilizes Write-Once, Read-Many (WORM) storage technology, and a variety of metadata, protection, retention, and other policies and services to ensure the integrity of data in the repository. The WORM storage means that data, once ingested into the repository, cannot be updated or modified for the life of that data – it is guaranteed to remain unchanged from when it was originally stored. If the versioning feature is enabled within the HCP system, different versions of the data can be stored (and retrieved) in which case each version is WORM. HCP also offers a host of advanced storage and data management features, including automated replication, data compression, deduplication, and multiple storage tiers.
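The following is a minimal sketch of WORM semantics with optional versioning, assuming a simple in-memory store. Without versioning, a second write to the same path is rejected. With versioning, each write adds a new immutable version. The `WormStore` class is hypothetical and ignores retention, metadata, and deletion.

```python
from __future__ import annotations

from typing import Optional


class WormStore:
    """Write-once, read-many store; optional versioning keeps every version."""

    def __init__(self, versioning: bool = False) -> None:
        self.versioning = versioning
        # Each path maps to its versions, oldest first; none is ever modified.
        self._versions: dict[str, list[bytes]] = {}

    def put(self, path: str, data: bytes) -> int:
        """Store data at path and return its 1-based version number."""
        history = self._versions.setdefault(path, [])
        if history and not self.versioning:
            raise PermissionError(f"{path} is WORM and cannot be modified")
        history.append(bytes(data))  # bytes() snapshots mutable input such as bytearray
        return len(history)

    def get(self, path: str, version: Optional[int] = None) -> bytes:
        history = self._versions[path]
        if version is None:
            return history[-1]  # the current (newest) version
        if not 1 <= version <= len(history):
            raise IndexError(f"{path} has no version {version}")
        return history[version - 1]
```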

Cloud Storage

HCP also serves as a massively scalable cloud storage platform. As the “engine” at the core of the Hitachi Data Systems (HDS) cloud architecture, HCP provides robust management capabilities, secure multi-tenancy, REST API access, and a host of features that optimize cloud storage operations for both service providers and subscribers. Separate policies can be configured for each tenant on HCP to govern and automate performance, protection, capacity, and retention. This allows the platform to accommodate a wide range of subscriber use cases and business models on a single physical cluster. To facilitate provider/subscriber transactions, HCP provides chargeback capabilities. It also provides tools that allow third-party management software vendors to develop against the API and integrate with the HDS solution for billing, chargeback, and reporting.

Distributed Scale

HCP is a distributed system for object-based storage. It stores objects that include both data and metadata that describes the data. It distributes these objects across the storage space but still presents them as files in a standard directory structure. HCP runs on an array of servers, or nodes, which are networked together as a cluster. All runtime operations and physical storage, including data and metadata, are distributed among cluster nodes. Storage nodes store data objects, and objects stored on any particular node are available from all other nodes.

Open Architecture

Hitachi Content Platform has an open architecture that insulates stored data from technology changes, as well as from changes in HCP itself due to product enhancements. This open architecture helps ensure that users will have access to the data long after it has been added to the repository. HCP acts both as a repository that stores customer data and as an online portal that provides access to that data through several industry-standard interfaces. The HTTP/HTTPS, WebDAV, NFS, and CIFS protocols support various operations, including storing data, creating directories, viewing directories and object data and metadata, modifying certain metadata, and deleting objects. These protocols can be used to access the data through a web browser, the HCP client tools, third-party applications, Windows Explorer, or native Windows or Unix tools. HCP also provides special-purpose access to the repository through SMTP, which is used only for storing email. For data backup and restore, HCP supports the NDMP protocol.

Summary of industry standard supported data protocols

  • HTTP/HTTPS
  • WebDAV
  • NFS
  • CIFS
  • SMTP
  • NDMP
  • SNMP

Multi-tenancy

Multi-tenancy support allows a single physical HCP instance to be partitioned into multiple namespaces, or logical partitions of the HCP cluster that serve as a collection of objects particular to a defined application. Each namespace has a private object store and set of independently configured attributes with respect to other namespaces. Namespaces provide segregation of data, while tenants – groupings of namespaces – provide segregation of management. Each tenant and its set of namespaces constitute a virtual HCP system that can be accessed and managed independently by users and applications. This HCP feature is essential in enterprise, cloud, and service provider environments.

In Hitachi Content Platform, namespaces reside within tenants. HCP supports all of its access protocols (HTTP/HTTPS, WebDAV, NFS, CIFS, SMTP, and NDMP), and data access requests can be either authenticated or unauthenticated. HCP's Fixed Content File System (FCFS) supports enterprise-class provisioning for multiple applications. FCFS, also called HCP-FS (HCP File System), provides full access to repository objects. It does so by extending an application's native file system so that the application can access an object’s data and metadata.

Clients can use an HTTP-based REST interface to access a namespace within HCP. Representational State Transfer (REST) is a style of software architecture commonly used in client-server web applications and services. Using this interface, clients can perform the following operations:
  • write objects to the namespace
  • view and retrieve objects
  • change object metadata
  • write, update, and retrieve custom metadata (user-defined metadata) for objects
  • delete objects
The namespace can be accessed programmatically by applications, interactively with a command-line tool using the REST API, or through a graphical interface using the Namespace Browser web console.

HCP architecture isolates stored data from the hardware layer. Externally, HCP presents each object either as a set of files in a standard directory structure or as a Uniform Resource Locator (URL) accessible by users and applications via HTTP/HTTPS.
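Python's standard `urllib` can illustrate the URL-based REST access described above. This sketch only builds requests and never sends them. It assumes an addressing scheme in which a namespace is reached through a hostname of the form `namespace.tenant.domain`, with object paths under `/rest`. That layout, the `hcp.example.com` domain, and the helper names are assumptions to verify against the HCP REST documentation. Authentication is left out entirely.

```python
from urllib.parse import quote
from urllib.request import Request


def object_url(namespace: str, tenant: str, domain: str, path: str) -> str:
    # Assumed addressing scheme: https://<namespace>.<tenant>.<domain>/rest/<path>
    return f"https://{namespace}.{tenant}.{domain}/rest/{quote(path.lstrip('/'))}"


def put_object(namespace: str, tenant: str, domain: str, path: str,
               data: bytes) -> Request:
    # Builds (but does not send) an HTTP PUT that would write a new object.
    # Authentication headers are deliberately omitted from this sketch.
    return Request(object_url(namespace, tenant, domain, path),
                   data=data, method="PUT")


def get_object(namespace: str, tenant: str, domain: str, path: str) -> Request:
    # Builds (but does not send) an HTTP GET that would retrieve the object.
    return Request(object_url(namespace, tenant, domain, path), method="GET")


req = put_object("ar", "finance", "hcp.example.com", "/invoices/2024 Q1.pdf", b"...")
print(req.get_method(), req.full_url)
# PUT https://ar.finance.hcp.example.com/rest/invoices/2024%20Q1.pdf
```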

Versioning

Object versioning gives a namespace the ability to create, store, and manage multiple versions of objects in the HCP repository, providing a history of how the data has changed over time. The feature facilitates storage and replication of evolving content. It takes HCP beyond WORM-only storage and creates new opportunities for HCP in markets such as content depots and workflow applications.

The versioning feature is supported in HCP authenticated namespaces. It is configured at the namespace level, and its full functionality is only accessible through HTTP. Versioning applies to data objects only, not to directories or symbolic links. Each version is a unique object, with its data, metadata, and custom metadata. A new object version is created when the object’s data changes (leaving the original version intact and unchanged). A version can also be a special entity that represents a deleted object. Updates to system metadata or custom metadata are made in place on the current version and do not create new versions. Previous versions of objects that are older than a specified amount of time are automatically deleted, or pruned. It is not possible to delete specific historical versions of an object; however, a user or application with appropriate permissions can purge the object to delete all its versions.
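The versioning rules above can be modeled in memory as follows. The model covers new versions on data changes, in-place metadata updates, deletion records, age-based pruning of previous versions, and purging of all versions. The class and method names are hypothetical, and times are plain numbers of seconds for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Version:
    data: Optional[bytes]   # None marks a version that records a deletion
    created: float          # creation time in seconds
    metadata: dict[str, str] = field(default_factory=dict)


class VersionedObject:
    def __init__(self) -> None:
        self.versions: list[Version] = []

    def write(self, data: bytes, now: float) -> None:
        # A data change adds a new version and leaves older ones untouched.
        self.versions.append(Version(data, now))

    def set_metadata(self, key: str, value: str) -> None:
        # Metadata changes are made in place on the current version only.
        self.versions[-1].metadata[key] = value

    def delete(self, now: float) -> None:
        # Deleting records a special "deleted" version instead of erasing history.
        self.versions.append(Version(None, now))

    def prune(self, max_age: float, now: float) -> None:
        # Remove previous (non-current) versions older than max_age.
        previous = [v for v in self.versions[:-1] if now - v.created <= max_age]
        self.versions = previous + self.versions[-1:]

    def purge(self) -> None:
        # Purge removes all versions; no method removes a single old version.
        self.versions.clear()
```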

Replication

Replication, an add-on feature of HCP, keeps two HCP systems in sync with each other. The replication service copies one or more tenants from one HCP system to another to ensure data availability and enable disaster recovery. The HCP system in which the objects are initially created is called the primary system; the second system is called the replica. Typically, the primary system and the replica are in separate geographic locations and are connected by a high-speed wide area network. HCP supports several replication topologies, including many-to-one, chain, and bidirectional. By default, data sent between the primary and replica systems is encrypted. HCP also ensures that the data transported between the two systems remains unchanged from its original form.
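The difference between the replication topologies can be illustrated by computing which systems end up holding a tenant's data. This sketch assumes that a system receiving replicated data forwards it along its own outgoing links, which is what lets a chain reach beyond its first hop. The system names are placeholders, and the model ignores encryption, scheduling, and per-tenant configuration.

```python
from __future__ import annotations

from collections import deque


def systems_holding_copies(links: dict[str, list[str]], primary: str) -> set[str]:
    """Return the systems that end up with copies of data created on primary."""
    # Assumption: replicated data is forwarded along each outgoing link.
    reached = {primary}
    queue = deque([primary])
    while queue:
        system = queue.popleft()
        for target in links.get(system, []):
            if target not in reached:
                reached.add(target)
                queue.append(target)
    reached.discard(primary)
    return reached


chain = {"A": ["B"], "B": ["C"]}
many_to_one = {"A": ["D"], "B": ["D"], "C": ["D"]}
bidirectional = {"A": ["B"], "B": ["A"]}
print(sorted(systems_holding_copies(chain, "A")))          # ['B', 'C']
print(sorted(systems_holding_copies(many_to_one, "B")))    # ['D']
print(sorted(systems_holding_copies(bidirectional, "B")))  # ['A']
```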

Automatic Tech Refresh

The Automatic Tech Refresh feature of HCP allows an HCP system to migrate all objects from a specified storage device within the system to other available storage in the system, automatically and without disruption. The specified device (the one the data is migrated from) is thereby drained and can then be removed from the system.

This feature spares customers what the storage industry calls a "fork-lift upgrade". Moving tens to hundreds of terabytes off old, out-of-service storage systems and onto new ones usually requires multi-month service engagements. It also carries an increased risk of data unavailability or data loss. Requests for proposals (RFPs) for archive projects commonly ask how a long-term object store deals with this issue, and the question is seldom answered well.

The Automatic Tech Refresh feature is HCP's solution to the fork-lift upgrade problem. HCP is designed as a long-term object store that evolves non-disruptively with storage technology over time.

HCP allows newer storage to be added alongside, and connected to, the same storage nodes that use the older storage systems. The older storage can then automatically drain itself onto the newer storage and report that it is ready to be removed and discarded.
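The drain step can be sketched as moving every object off one device onto the remaining devices and reporting whether the device is now empty. This is a simplified, hypothetical model. It counts only bytes, uses an invented most-free-space placement rule, and ignores data protection levels, replication, and concurrent client I/O.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class StorageDevice:
    name: str
    capacity: int                                           # bytes
    objects: dict[str, int] = field(default_factory=dict)  # object id -> size

    def free(self) -> int:
        return self.capacity - sum(self.objects.values())


def drain(devices: list[StorageDevice], source: StorageDevice) -> bool:
    """Migrate every object off source; True means it is empty and removable."""
    targets = [d for d in devices if d is not source]
    for obj_id, size in list(source.objects.items()):
        # Hypothetical placement rule: the target with the most free space.
        best = max(targets, key=StorageDevice.free, default=None)
        if best is None or best.free() < size:
            return False  # not enough room elsewhere; source stays in service
        best.objects[obj_id] = size  # write the copy first...
        del source.objects[obj_id]   # ...then release the original
    return not source.objects
```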

Chargeback Reporting

The HCP chargeback reporting feature allows HCP administrators to monitor data write and read usage on their system for each individual gateway (access protocol). See the Open Architecture section for the gateways that can be used to write and read data.
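Per-gateway usage reporting amounts to aggregating read and write volumes by access protocol. The record format below is a hypothetical stand-in for whatever access log HCP actually keeps; only the aggregation idea is illustrated.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, NamedTuple


class AccessRecord(NamedTuple):
    gateway: str  # access protocol, e.g. "HTTP", "NFS", "CIFS"
    op: str       # "read" or "write"
    nbytes: int


def usage_by_gateway(records: Iterable[AccessRecord]) -> dict[str, dict[str, int]]:
    """Total bytes read and written through each gateway."""
    totals: dict[str, dict[str, int]] = defaultdict(
        lambda: {"read": 0, "write": 0})
    for rec in records:
        totals[rec.gateway][rec.op] += rec.nbytes
    return dict(totals)
```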

MAPI (Management Application Programming Interface)

The MAPI feature of HCP allows for managing and configuring HCP programmatically without the need to use the Admin UI.

Integrated Search UI

HCP comes integrated with the ability to perform customizable metadata searches against the ingested data.

Metadata Query API

The Metadata Query API allows third parties to develop their own query applications for use against the HCP data set.

Data Compliance Features

HCP allows namespaces to be configured to retain objects for specific time periods, during which the objects cannot be deleted. HCP also allows applications to specify retention for objects as they are ingested into the system. These features help companies and organizations meet strict government regulations that may apply to them.

HCP also allows individual objects to be placed on hold, regardless of retention. Objects on hold cannot be deleted until the issue that caused the hold is resolved.
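Namespace-default retention, retention requested at ingest, and holds can be combined into a single deletion check, sketched below. The function and field names are hypothetical. The sketch simply lets a requested retention date replace the namespace default; HCP's real precedence rules may differ.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class RetainedObject:
    path: str
    retain_until: datetime  # end of the retention period
    on_hold: bool = False


def ingest(path: str, now: datetime, namespace_default: timedelta,
           requested_until: Optional[datetime] = None) -> RetainedObject:
    # An application may set retention at ingest; otherwise the namespace default applies.
    if requested_until is not None:
        return RetainedObject(path, requested_until)
    return RetainedObject(path, now + namespace_default)


def can_delete(obj: RetainedObject, now: datetime) -> bool:
    if obj.on_hold:
        return False  # a hold blocks deletion regardless of retention
    return now >= obj.retain_until
```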

Spindown Storage support

HCP SAIN systems can take advantage of the Power Savings feature available with certain Hitachi storage arrays. This feature allows disks to be spun down when they are not in use, saving energy and reducing the cost of storage.

Active Directory support

HCP can be configured to support Windows Active Directory (AD) for user authentication at the system, tenant, and namespace levels. This means that users with AD user accounts can access the HCP System Management Console, Tenant Management Console, Search Console, and namespace content, provided they have the applicable permissions in HCP.

Access Control Lists (ACLs)

HCP namespaces can be configured to allow users to associate access control lists (ACLs) with objects. An ACL is metadata consisting of a set of grants of permissions to perform various operations on an object. Permissions can be granted to individual users or to groups of users.
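A grant-only ACL check can be sketched as follows, assuming each grant names either a user or a group together with a set of permitted operations. The permission names and classes are hypothetical. The sketch has no deny entries and does not reflect HCP's actual ACL format.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    grantee: str                 # user name or group name
    is_group: bool
    permissions: frozenset[str]  # hypothetical names, e.g. "read", "write"


def is_permitted(acl: list[Grant], user: str, groups: set[str], op: str) -> bool:
    """True if any grant gives `user` (directly or via a group) permission for `op`."""
    for grant in acl:
        applies = (grant.grantee in groups) if grant.is_group else (grant.grantee == user)
        if applies and op in grant.permissions:
            return True
    return False
```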

Resource monitoring

The HCP System Management Console lets users monitor the use of system resources. The information it displays can help determine the causes of system issues, such as slow responses to client read and write requests or abnormal conditions reported in the system log. By reviewing trends in resource usage, administrators can anticipate future needs and plan system growth accordingly.

Email notification of events

HCP can be configured to send email to specified recipients to notify them about messages added to the system or tenant log, as applicable. A user can configure each recipient to receive notification of only selected messages based on the message importance, severity, and type. Recipients are added to the Bcc list for each email, so they are not visible to one another. The To list remains empty.

HCP allows a user to configure the content of the email that HCP sends. For example, the user could choose to have HCP send the full text, severity, date and time, and node ID for each log message. Or, if the user is concerned about exposing system information in what is by nature an insecure medium, HCP could be configured to format the email to say only that a log message was recorded.
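Python's standard `email` package can illustrate the Bcc-only addressing and the choice between a detailed and a minimal message body. The sketch filters recipients by severity only, although HCP also filters by importance and type. The severity names, subject line, and entry fields are hypothetical, and sending the message over SMTP is out of scope.

```python
from __future__ import annotations

from email.message import EmailMessage
from typing import Optional

# Hypothetical severity scale, lowest to highest.
SEVERITY_RANK = {"notice": 0, "warning": 1, "error": 2}


def build_notification(entry: dict[str, str], recipients: dict[str, str],
                       sender: str, detailed: bool = True) -> Optional[EmailMessage]:
    """Build one email for a log entry, or None if no recipient wants it.

    `recipients` maps each address to the minimum severity it subscribes to.
    """
    level = SEVERITY_RANK[entry["severity"]]
    wanted = [addr for addr, minimum in recipients.items()
              if level >= SEVERITY_RANK[minimum]]
    if not wanted:
        return None
    msg = EmailMessage()
    msg["From"] = sender
    # Recipients go only on Bcc so they cannot see one another; To is left empty.
    msg["Bcc"] = ", ".join(wanted)
    msg["Subject"] = "HCP log notification"
    if detailed:
        body = (f"{entry['time']} node {entry['node']} "
                f"[{entry['severity']}] {entry['text']}")
    else:
        # Minimal body for administrators wary of exposing details over email.
        body = "A log message was recorded."
    msg.set_content(body)
    return msg
```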

Virtual Network Management

HCP supports virtual networking - a technology enabling the overlay of multiple logical network configurations onto a single physical network. Virtual networking enables the segregation of network traffic between clients and different HCP tenants, between management and data access functions, and between system-level and tenant-level traffic.

The Virtual Network Management (VNeM) feature allows for the following:

  • Creation of user-defined management and data networks (tagged and untagged) at the tenant level.
    • Each with individual domains if desired.
    • Each fully capable of having one or more SSL certificate associations.

Tier to External storage

HCP supports the use of external storage for storing object content. External storage is storage on devices that are not managed by HCP. Offloading (or tiering) content from HCP storage into external storage can help optimize the use of HCP storage. It also enables users to maximize their investment in under-utilized storage that’s already in place.

HS3 API

HCP HS3 is a RESTful, HTTP/HTTPS-based API that is compatible with clients written to use the Amazon S3 APIs. Clients already written against the Amazon S3 API can be directed at an HCP system and continue to work without changes.

Supported Compatibility

Using the HS3 API, you can:

  • Create buckets (PUT Bucket)
  • List the buckets you own (GET Service)
  • Check the existence of a bucket (HEAD Bucket)
  • Set ACLs on buckets (PUT Bucket acl)
  • Retrieve ACLs for buckets (GET Bucket acl)
  • Enable or suspend object versioning for buckets you own (PUT Bucket versioning)
  • Check the status of object versioning for buckets you own (GET Bucket versioning)
  • List objects that are in a bucket (GET Bucket)
  • List versions of objects that are in a bucket (GET Bucket Object versions)
  • Delete buckets you own (DELETE Bucket)
  • Store objects in a bucket (PUT Object)
  • Add custom metadata to objects (PUT Object Copy replace)
  • Retrieve custom metadata for objects (HEAD Object)
  • Add ACLs to objects (PUT Object acl)
  • Retrieve ACLs for objects (GET Object acl)
  • Copy objects (PUT Object Copy)
  • Retrieve objects (GET Object)
  • Delete objects (DELETE Object)
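Because HS3 is compatible with the Amazon S3 API, each operation in the list above corresponds to an HTTP method applied to a bucket or object path. The table below follows the standard S3 REST conventions in path-style form rather than any HCP-specific documentation. Request signing, headers, and the HCP endpoint host are omitted. In practice, an existing S3 client library would build these requests.

```python
from __future__ import annotations

from typing import Optional
from urllib.parse import quote

# Standard Amazon S3 REST conventions (path-style): operation -> (method, subresource).
S3_OPERATIONS = {
    "GET Service": ("GET", None),
    "PUT Bucket": ("PUT", None),
    "HEAD Bucket": ("HEAD", None),
    "PUT Bucket acl": ("PUT", "acl"),
    "GET Bucket acl": ("GET", "acl"),
    "PUT Bucket versioning": ("PUT", "versioning"),
    "GET Bucket versioning": ("GET", "versioning"),
    "GET Bucket": ("GET", None),
    "GET Bucket Object versions": ("GET", "versions"),
    "DELETE Bucket": ("DELETE", None),
    "PUT Object": ("PUT", None),
    # The copy source goes in an x-amz-copy-source header; "replace" metadata
    # is requested with x-amz-metadata-directive: REPLACE.
    "PUT Object Copy": ("PUT", None),
    "HEAD Object": ("HEAD", None),
    "PUT Object acl": ("PUT", "acl"),
    "GET Object acl": ("GET", "acl"),
    "GET Object": ("GET", None),
    "DELETE Object": ("DELETE", None),
}


def request_line(op: str, bucket: Optional[str] = None,
                 key: Optional[str] = None) -> str:
    """Return the HTTP method and path-style target for an S3 operation."""
    method, subresource = S3_OPERATIONS[op]
    target = "/"
    if bucket:
        target += quote(bucket)
        if key:
            target += "/" + quote(key)
    if subresource:
        target += "?" + subresource
    return f"{method} {target}"
```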

Product branding

By default, the graphical user interfaces HCP provides for tenant and namespace access are branded for the company Hitachi and the product Hitachi Content Platform. This branding is shown by the company logo, product name, and product name abbreviation that appear in various places in the HCP Tenant Management Console, Namespace Browser, and Search Console.

These elements can be edited (using the HCP UI) to display custom company logos and product names.

HCP-VM

A full production version of HCP can run in a VMware environment. In an HCP-VM system, each node runs on a virtual machine, and virtual storage emulates the internal storage of a RAIN system.

Figure 2: Basic layout of user interaction with an HCP system.

Hardware and Software Architecture

From a hardware perspective, each HCP cluster consists of the following categories of components:

  • Nodes (servers)
  • Internal (RAIN) or SAN-attached storage (SAIN)
  • Networking components (switches, cabling)
  • Cluster infrastructure (racks, power distribution units)

HCP runs on a redundant array of independent nodes (RAIN) or a SAN-attached array of independent nodes (SAIN). It is a fully symmetric, distributed application that manages archive objects. In addition to using RAID and SAN technologies to provide data integrity and availability, HCP can use software mirroring to store each object’s data and metadata in multiple locations on different nodes. The Data Protection Level (DPL) parameter controls this mirroring. It specifies the number of copies of each object that HCP must maintain in the archive to provide the required level of data protection.
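The effect of the DPL setting can be sketched as choosing DPL distinct nodes for each object, so that losing any DPL - 1 nodes still leaves at least one copy. The hash-based placement rule is a hypothetical illustration, not HCP's actual algorithm.

```python
from __future__ import annotations

import zlib


def place_copies(object_id: str, nodes: list[str], dpl: int) -> list[str]:
    """Choose `dpl` distinct nodes to hold copies of one object."""
    if not 1 <= dpl <= len(nodes):
        raise ValueError("DPL must be between 1 and the number of nodes")
    # Hypothetical rule: start at a node derived from a stable hash of the
    # object ID, then take consecutive nodes so every copy is on a different node.
    start = zlib.crc32(object_id.encode("utf-8")) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(dpl)]


def copies_left(placement: list[str], failed: set[str]) -> int:
    # Number of copies still available after the given nodes fail.
    return sum(node not in failed for node in placement)
```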

Public and Private Network

An HCP cluster uses a private back-end network and a public front-end network to connect the nodes. The isolated back-end network is used for vital inter-node communication and coordination. It uses dual bonded network adapters and switches and is fully redundant at the switch, cable, and network-adapter level. The front-end network is used for customer interaction with the cluster and also uses dual bonded network ports on the nodes.

See also

HCP Product Page

More information about HCP can be found on the HCP vendor page. This page includes downloadable resources such as white papers, datasheets, analyst reports, videos and demos, and more.

The "HCP Product Page" link in the External links section leads to this material.

Customer References

Internal links

External links