Hitachi Content Platform (HCP) is a robust storage system designed to support large, growing repositories of fixed-content data. HCP stores objects that include both data and metadata that describes that data. Objects exist in buckets, which are logical partitions of the repository.
HCP provides access to the repository through a variety of industry-standard protocols, as well as through various HCP-specific interfaces. One of these interfaces is the Hitachi API for Amazon S3, a RESTful, HTTP-based API that is compatible with Amazon S3.
About Hitachi Content Platform
Hitachi Content Platform is a combination of hardware and software that provides an object-based data storage environment. An HCP repository stores all types of data, from simple text files to medical images to multigigabyte database images.
HCP provides easy access to the repository for adding, retrieving, and deleting data. HCP uses write-once, read-many (WORM) storage technology and a variety of policies and internal processes to ensure the integrity and availability of the stored data.
HCP stores objects in a repository. Each object permanently associates data HCP receives (for example, a document, an image, or a movie) with information about that data, called metadata.
An object encapsulates:
Fixed-content data
An exact digital reproduction of data as it existed before it was stored in HCP. Once it's in the repository, this fixed-content data cannot be modified.
System metadata
System-managed properties that describe the fixed-content data (for example, its size and creation date). System metadata includes policies, such as retention, that influence how transactions and internal processes affect the object.
Custom metadata
Optional metadata that a user or application provides to further describe the object. Custom metadata is specified as one or more annotations, where each annotation is a discrete unit of information about the object.
You can use custom metadata to create self-describing objects. Users and applications can use this metadata to understand and repurpose object content.
Access control list (ACL)
Optional metadata consisting of a set of grants of permissions to perform various operations on the object. Permissions can be granted to individual users or to groups of users.
Like custom metadata, ACLs are provided by users or applications.
HCP can store multiple versions of an object, thus providing a history of how the data has changed over time. Each version is a separate object, with its own system metadata and, optionally, its own custom metadata and ACL.
HCP supports multipart uploads with the Hitachi API for Amazon S3. With a multipart upload, the data for an object is broken into multiple parts that are written to HCP independently of each other. Even though the data is written in multiple parts, the result of a multipart upload is a single object. An object for which the data is stored in multiple parts is called a multipart object.
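The split-and-reassemble idea behind a multipart upload can be sketched as follows. This is an illustration of the concept only, not an HCP client: the 5 MiB minimum part size is Amazon S3's rule and is assumed here, so check your HCP release's documentation for its actual part-size limits.

```python
# Conceptual sketch of a multipart upload: the object data is split
# into parts, each part is written independently, and completing the
# upload yields a single object whose data is the concatenation of
# all parts in part-number order.

PART_SIZE = 5 * 1024 * 1024  # assumed minimum part size (Amazon S3's rule)

def split_into_parts(data: bytes, part_size: int = PART_SIZE):
    """Yield (part_number, chunk) pairs; part numbers start at 1."""
    for i in range(0, len(data), part_size):
        yield (i // part_size + 1, data[i:i + part_size])

def complete_upload(parts):
    """Reassemble parts (sorted by part number) into one object."""
    return b"".join(chunk for _, chunk in sorted(parts))

data = b"x" * (12 * 1024 * 1024)      # 12 MiB of sample data
parts = list(split_into_parts(data))  # three parts: 5 + 5 + 2 MiB
assert complete_upload(parts) == data # one object, identical data
```

Because the parts are independent, a real client can upload them in parallel or retry an individual failed part without restarting the whole transfer.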
HCP supports uploads using HTML forms in POST requests. POST object uploads can reduce latency. Because an object is uploaded in a single operation, an HTTP success response indicates that the entire object has been stored.
An HCP repository is partitioned into buckets. A bucket is a logical grouping of objects such that the objects in one bucket are not visible in any other bucket. Buckets are also called namespaces.
Buckets provide a mechanism for separating the data stored for different applications, business units, or customers. For example, you could have one bucket for accounts receivable and another for accounts payable.
Buckets also enable operations to work against selected subsets of objects. For example, you could perform a query that targets the accounts receivable and accounts payable buckets but not the employees bucket.
Buckets are owned and managed by administrative entities called tenants. A tenant typically corresponds to an organization, such as a company or a division or department within a company.
In addition to being owned by a tenant, each bucket can have an owner that corresponds to an individual HCP user. The owner of a bucket automatically has permission to perform certain operations on that bucket.
The core hardware for an HCP system consists of servers that are networked together. These servers are called nodes.
When you access an HCP system, your point of access is an individual node. To identify the system, however, you can use either the domain name of the system or the IP address of an individual node. When you use the domain name, HCP selects the access node for you. This helps ensure an even distribution of the processing load.
Replication is a process that supports configurations in which selected tenants and buckets are maintained on two or more HCP systems and the objects in those buckets are managed across those systems. This cross-system management helps ensure that data is well-protected against the unavailability or catastrophic failure of a system.
A replication topology is a configuration of HCP systems that are related to each other through replication. Typically, the systems in a replication topology are in separate geographic locations and are connected by a high-speed wide area network. This arrangement provides geographically distributed data protection (called geo-protection).
You can read from buckets on all systems where those buckets are replicated. The replication topology, which is configured at the system level, determines the systems on which you can write to buckets.
Replication has several purposes, including:
- If a system in a replication topology becomes unavailable (for example, due to network issues), another system in the topology can provide continued data availability.
- If a system in a replication topology suffers irreparable damage, another system in the topology can serve as a source for disaster recovery.
- If multiple HCP systems are widely separated geographically, each system may be able to provide faster data access for some applications than the other systems can, depending on where the applications are running.
- If an object cannot be read from one system in a replication topology (for example, because a node is unavailable), HCP can try to read it from another system in the topology. Whether HCP tries to do this depends on the bucket configuration.
- If a system in a replication topology is unavailable, HTTP requests to that system can be automatically serviced by another system in the topology. Whether HCP tries to do this depends on the bucket configuration.
About the Hitachi API for Amazon S3
The Hitachi API for Amazon S3 is a RESTful, HTTP-based API that is compatible with Amazon S3.
You can use the S3 compatible API to create and manage buckets and to store, retrieve, and manage the objects in those buckets. To perform these operations, you can write applications that use any standard HTTP client library. The S3 compatible API is also compatible with many third-party tools that support Amazon S3.
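Requests to an S3-compatible API must be signed; tools such as s3curl use the AWS Signature Version 2 scheme, in which the signature is the Base64-encoded HMAC-SHA1 of a canonical string-to-sign. The sketch below shows that scheme using only Python's standard library; the credentials, date, and bucket path are hypothetical placeholders.

```python
import base64
import hashlib
import hmac

def sign_v2(secret_key: str, string_to_sign: str) -> str:
    """AWS Signature Version 2: Base64(HMAC-SHA1(secret, string-to-sign))."""
    digest = hmac.new(secret_key.encode(), string_to_sign.encode(),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode()

# Hypothetical values for illustration only.
string_to_sign = "GET\n\n\nWed, 01 Mar 2023 12:00:00 GMT\n/finance/"
signature = sign_v2("hypothetical-secret-key", string_to_sign)

# The signature goes into the Authorization header of the HTTP request.
auth_header = f"AWS hypothetical-access-key:{signature}"
```

The exact composition of the string-to-sign (HTTP method, content headers, date, and canonicalized resource) is defined by the Amazon S3 Signature Version 2 specification.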
Other bucket access methods
HCP allows access to bucket (namespace) content through several interfaces: the namespace access protocols, the HCP Namespace Browser, the HCP metadata query API, the HCP Search Console, and HCP Data Migrator.
Namespace access protocols
Along with the S3 compatible API, HCP supports access to namespace content through these industry-standard protocols: a RESTful, HTTP-based API (called simply REST), WebDAV, CIFS, and NFS. HCP also supports access to namespace content through HSwift, an OpenStack® Swift-compatible API.
Using the supported protocols, you can access namespaces programmatically with applications, interactively with a command-line tool, or through a GUI. You can use these protocols to perform actions such as storing objects in a namespace, viewing and retrieving objects, changing object metadata, and deleting objects.
HCP allows special-purpose access to namespaces through the SMTP protocol. This protocol is used only for storing email.
The namespace access protocols are configured separately for each namespace and are enabled or disabled independently of each other.
When you use the S3 compatible API to create a namespace (bucket), both the S3 compatible API and the REST API are automatically enabled for that namespace. Additionally, both the HTTP and HTTPS ports are open for both protocols (that is, the namespace can be accessed with or without SSL security).
Tenant administrators can enable and disable namespace access protocols for any namespace. This includes enabling the S3 compatible API for namespaces created through other HCP interfaces and disabling the S3 compatible API for namespaces created using the S3 compatible API.
Objects added to a namespace through any protocol, including the S3 compatible API, are immediately accessible through any other protocol that’s enabled for the namespace.
HCP Namespace Browser
The Namespace Browser lets you manage content in and view information about HCP namespaces. With the Namespace Browser, you can:
- Store objects
- List, view, retrieve, and delete objects, including old versions of objects
- View custom metadata and ACLs for objects, including old versions of objects
- Create empty directories
- Display namespace information
HCP metadata query API
The HCP metadata query API lets you search HCP for objects that meet specified criteria. The API supports two types of queries:
Object-based queries
Search for objects based on object metadata. This includes both system metadata and the content of custom metadata and ACLs. The query criteria can also include the object location (that is, the namespace and/or directory that contains the object). These queries use a robust query language that lets you combine search criteria in multiple ways.
Object-based queries search only for objects that currently exist in the repository. For objects with multiple versions, object-based queries return only the current version.
Operation-based queries
Search not only for objects currently in the repository but also for information about objects that have been deleted. For namespaces that support versioning, operation-based queries can return both current and old versions of objects.
Criteria for operation-based queries can include object status (for example, created or deleted), change time, index setting, and location.
The metadata query API returns object metadata only, not object data. The metadata is returned either in XML format, with each object represented by a separate element, or in JSON format, with each object represented by a separate name/value pair. For queries that return large numbers of objects, you can use paged requests.
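Handling a JSON query result is ordinary JSON parsing. The real response schema is defined by the metadata query API reference for your HCP release; the field names below are an assumption for illustration, and the URLs are placeholders.

```python
import json

# A hypothetical, abridged JSON query result. The actual field names
# and structure are defined by your HCP release's metadata query API.
response_body = """
{
  "queryResult": {
    "resultSet": [
      {"urlName": "https://finance.europe.hcp.example.com/rest/q1.pdf",
       "version": 2},
      {"urlName": "https://finance.europe.hcp.example.com/rest/q2.pdf",
       "version": 1}
    ],
    "status": {"results": 2, "code": "COMPLETE"}
  }
}
"""

result = json.loads(response_body)["queryResult"]
objects = result["resultSet"]
for obj in objects:
    # Only metadata is returned; retrieving the data itself is a
    # separate request against the object's URL.
    print(obj["urlName"], obj["version"])
```

For large result sets, a paged client would repeat the query, using information from the previous response to request the next page until the status indicates the query is complete.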
HCP Search Console
The HCP Search Console is an easy-to-use web application that lets you search for and manage objects based on specified criteria. For example, you can search for objects that were stored before a certain date or that are larger than a specified size. You can then delete the objects listed in the search results or prevent those objects from being deleted. Similar to the metadata query API, the Search Console returns only object metadata, not object data.
By offering a structured environment for performing searches, the Search Console facilitates e-discovery, namespace analysis, and other activities that require the user to examine the contents of namespaces. From the Search Console, you can:
- Open objects
- Perform bulk operations on objects
- Export search results in standard file formats for use as input to other applications
- Publish feeds to make search results available to web users
The Search Console works with either of these two search facilities:
The HCP metadata query engine
This facility is integrated with HCP and works internally to perform searches and return results to the Search Console. The metadata query engine is also used by the metadata query API.
Note: When working with the metadata query engine, the Search Console is called the Metadata Query Engine Console.
The Hitachi Data Discovery Suite (HDDS) search facility
This facility interacts with Data Discovery Suite, which performs searches and returns results to the HCP Search Console. Data Discovery Suite is a separate product from HCP.
The Search Console can use only one search facility at any given time. The search facility is selected at the HCP system level. If no facility is selected, the HCP system does not support use of the Search Console to search namespaces.
Each search facility maintains its own index of objects in each search-enabled namespace and uses this index for fast retrieval of search results. The search facilities automatically update their indexes to account for new and deleted objects and changes to object metadata.
HCP Data Migrator
HCP Data Migrator (HCP-DM) is a high-performance, multithreaded, client-side utility for viewing, copying, and deleting data.
With HCP Data Migrator, you can:
- Copy objects, files, and directories between the local file system, HCP namespaces, default namespaces, and earlier HCAP archives
- Delete individual objects, files, and directories and perform bulk delete operations
- View the content of current and old versions of objects and the content of files
- Purge all versions of an object
- Rename files and directories on the local file system
- View object, file, and directory properties
- Change system metadata for multiple objects in a single operation
- Add, replace, or delete custom metadata for objects
- Add, replace, or delete ACLs for objects
- Create empty directories
HCP Data Migrator has both a graphical user interface (GUI) and a command-line interface (CLI).
To use the S3 compatible API to create and manage buckets, you need a user account that’s configured to allow you to take those actions. To work with objects in a bucket, you may or may not need a user account. This depends on how the S3 compatible API is configured for the bucket.
By default, when you create a bucket, both the S3 compatible API and the REST API are configured to require users to have user accounts in order to work with objects in that bucket. You cannot use the S3 compatible API to change this configuration. However, tenant administrators can change this configuration for the buckets you create.
A user account can be either an account created by a tenant administrator in HCP or, if the tenant is configured to support Active Directory® (AD) authentication, an AD user account that HCP recognizes. (With an AD user account, you cannot create buckets.)
When you use the S3 compatible API with a user account, you provide credentials that are based on the username and password for your account. HCP checks these credentials to ensure that they are valid. The process of checking credentials is called user authentication. If the credentials you supply are valid, you are an authenticated user.
When you use the S3 compatible API without a user account, you are an anonymous user.
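In HCP, the S3-style credentials are commonly derived from the user account: the access key is the Base64-encoded username and the secret key is the hex MD5 hash of the password. Treat this derivation as an assumption to confirm against your HCP release's documentation; the username and password below are placeholders.

```python
import base64
import hashlib

def hcp_s3_credentials(username: str, password: str):
    """Derive S3-style credentials from an HCP username and password.

    Assumption: the access key is the Base64-encoded username and the
    secret key is the hex MD5 hash of the password. Confirm the exact
    derivation for your HCP release.
    """
    access_key = base64.b64encode(username.encode()).decode()
    secret_key = hashlib.md5(password.encode()).hexdigest()
    return access_key, secret_key

# Hypothetical account for illustration only.
access_key, secret_key = hcp_s3_credentials("lgreen", "p4ssw0rd")
```

These two values then play the roles of the Amazon S3 access key ID and secret access key in S3-compatible tools and libraries.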
Data access permissions
Data access permissions allow you to access bucket content through the various HCP interfaces. You get these permissions either from your user account or from the bucket configuration.
Data access permissions are granted separately for individual buckets. Each data access permission allows you to perform certain operations. However, not all operations allowed by data access permissions apply to every HCP interface. For example, you can view and retrieve ACLs through the REST API and the S3 compatible API but not through any other namespace access protocol.
Although many of the operations allowed by data access permissions are not supported by the S3 compatible API, a tenant administrator can give you permission for those operations. You can then perform the operations through other HCP interfaces that support them.
The data access permissions that you can have for a bucket are:
Browse
Lets you list bucket contents.
Read
Lets you:
- View and retrieve objects in the bucket, including the system and custom metadata for objects
- View and retrieve previous versions of objects
- List annotations for objects
- Check the existence of objects
Users with read permission also have browse permission.
Read ACL
Lets you view and retrieve bucket and object ACLs.
Write
Lets you:
- Add objects to the bucket
- Modify system metadata (except retention hold) for objects in the bucket
- Add or replace custom metadata for objects in the bucket
Write ACL
Lets you add, replace, and delete bucket and object ACLs.
Change owner
Lets you change the bucket owner and the owners of objects in the bucket.
Delete
Lets you delete objects, custom metadata, and bucket and object ACLs.
Purge
Lets you delete all versions of an object with a single operation. Users with purge permission also have delete permission.
Privileged
Lets you:
- Delete or purge objects that are under retention, provided that you also have delete or purge permission for the bucket
- Hold or release objects, provided that you also have write permission for the bucket
Note: All holds (a single hold and all labeled holds) must be released on the object before it can be deleted, regardless of the retention setting.
Search
Lets you use the HCP metadata query API and the HCP Search Console to query or search the bucket for objects that meet specified criteria. Users with search permission also have read permission.
If you have any data access permissions for a bucket, you can view information about that bucket through the HTTP protocol and Namespace Browser.
Examples in this help
This help contains instructions and examples for using the S3 compatible API to perform the operations listed in About the Hitachi API for Amazon S3. The examples use a command-line tool called s3curl. s3curl is freely available open-source software.
After downloading s3curl, you need to configure it to work with HCP.
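The exact configuration steps depend on your s3curl version and HCP release. Typically, you add your HCP tenant's domain name to the endpoints list in the s3curl script and define your credentials in a `.s3curl` file in your home directory, along the lines of this hypothetical example (the `id` and `key` values are placeholders for your Base64-encoded username and MD5-hashed password, and the domain name is fictitious):

```perl
# ~/.s3curl -- hypothetical example; all values are placeholders
%awsSecretAccessKeys = (
    hcp => {
        id  => 'base64-encoded-username',
        key => 'md5-hash-of-password',
    },
);
```

You would then pass `--id hcp` on the s3curl command line to select these credentials.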
The examples in this section of the help are based on a bucket named finance in which these objects are stored:
(four versions stored and one deleted)
(two versions stored)
The finance bucket also contains in-progress multipart uploads for these objects: