Introduction to Hitachi Content Platform
Hitachi Content Platform (HCP) is a robust storage system designed to support large, growing repositories of fixed-content data. HCP stores objects that include both data and metadata that describes that data. Objects exist in containers, which are logical partitions of the repository.
About Hitachi Content Platform
Hitachi Content Platform is a combination of hardware and software that provides an object-based data storage environment. An HCP repository stores all types of data, from simple text files to medical images to multigigabyte database images.
HCP provides easy access to the repository for adding, retrieving, and deleting data. HCP uses write-once, read-many (WORM) storage technology and a variety of policies and internal processes to ensure the integrity of the stored data and the efficient use of storage capacity.
Object-based storage
HCP stores objects in a repository. Each object permanently associates data HCP receives (for example, a document, an image, or a movie) with information about that data, called metadata.
An object encapsulates:
Fixed-content data
An exact digital reproduction of data as it existed before it was stored in HCP. Once it’s in the repository, this fixed-content data cannot be modified.
System metadata
System-managed properties that describe the fixed-content data (for example, its size and creation date). System metadata includes policies, such as retention and data protection level, that influence how transactions and internal processes affect the object.
Custom metadata
Optional metadata that a user or application provides to further describe the object. Custom metadata is specified as one or more annotations, where each annotation is a discrete unit of information about the object. Annotations are typically specified in XML format.
You can use custom metadata to create self-describing objects. Users and applications can use this metadata to understand and repurpose object content.
HCP can store multiple versions of an object, thus providing a history of how the data has changed over time. Each version is a separate object, with its own system metadata and, optionally, its own custom metadata and ACL.
HCP supports multipart uploads with the Hitachi API for Amazon S3. With a multipart upload, the data for an object is broken into multiple parts that are written to HCP independently of each other. Even though the data is written in multiple parts, the result of a multipart upload is a single object. An object for which the data is stored in multiple parts is called a multipart object.
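The idea behind a multipart upload can be sketched locally, without calling HCP. The following minimal Python sketch splits a byte string into numbered parts and joins them again in part-number order. The function names and the part size are illustrative assumptions and are not part of the Hitachi API for Amazon S3.

```python
def split_into_parts(data: bytes, part_size: int) -> list[tuple[int, bytes]]:
    """Split data into numbered parts (part numbers start at 1)."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    return [
        (index + 1, data[offset:offset + part_size])
        for index, offset in enumerate(range(0, len(data), part_size))
    ]


def assemble_parts(parts: list[tuple[int, bytes]]) -> bytes:
    """Join parts in part-number order, regardless of arrival order."""
    return b"".join(chunk for _, chunk in sorted(parts))


payload = b"0123456789" * 5                 # 50 bytes of sample data
parts = split_into_parts(payload, 16)       # 16 + 16 + 16 + 2 bytes
shuffled = list(reversed(parts))            # parts can be written in any order
assert assemble_parts(shuffled) == payload  # the result is still one object
```

Numbering the parts is what allows them to be written independently of each other and still be combined into a single object.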
Containers and accounts
An HCP repository is partitioned into namespaces, which are called containers in the context of the HSwift API. A container is a logical grouping of objects such that the objects in one container are not visible in any other container.
Containers provide a mechanism for separating the data stored for different applications, business units, or customers. For example, you could have one container for receivable items and another for payable items.
Containers also enable operations to work against selected subsets of objects. For example, you could perform a query that targets the receivable and payable items containers but not the employees container.
Containers are managed by administrative entities called tenants in HCP. Tenants, in the context of HSwift, are referred to as accounts and typically correspond to organizations, such as companies or divisions or departments within a company.
This book uses the terms account and container when discussing the HSwift API and it uses the terms tenant and namespace when discussing HCP interfaces in general.
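As a rough illustration of how accounts, containers, and objects nest, the sketch below builds a request URL using the OpenStack Swift path layout /v1/{account}/{container}/{object}. The host name and the account, container, and object names are hypothetical, and the base URL that an HCP system exposes for HSwift is an assumption here; check your system's configuration for the actual value.

```python
from typing import Optional
from urllib.parse import quote


def swift_url(base_url: str, account: str,
              container: Optional[str] = None,
              object_name: Optional[str] = None) -> str:
    """Build a Swift-style URL: <base>/v1/<account>[/<container>[/<object>]]."""
    if object_name is not None and container is None:
        raise ValueError("an object must live in a container")
    segments = ["v1", quote(account, safe="")]
    if container is not None:
        segments.append(quote(container, safe=""))
    if object_name is not None:
        # Object names may contain "/" to mimic directories, so keep it unescaped.
        segments.append(quote(object_name, safe="/"))
    return base_url.rstrip("/") + "/" + "/".join(segments)


# Hypothetical host, account (tenant), container (namespace), and object.
BASE = "https://hcp.example.com/swift"
account_url = swift_url(BASE, "finance")
object_url = swift_url(BASE, "finance", "receivables", "2024/invoice 17.pdf")
print(account_url)
print(object_url)
```

The account segment corresponds to an HCP tenant and the container segment to an HCP namespace.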
HCP nodes
The core hardware for an HCP system consists of servers that are networked together. These servers are called nodes.
When you access an HCP system, your point of access is an individual node. To identify the system, however, you can use either the domain name of the system or the IP address of an individual node. When you use the domain name, HCP selects the access node for you. This helps ensure an even distribution of the processing load.
Replication
Replication is a process that supports configurations in which selected tenants and namespaces are maintained on two or more HCP systems and the objects in those namespaces are managed across those systems. This cross-system management helps ensure that data is well-protected against the unavailability or catastrophic failure of a system.
A replication topology is a configuration of HCP systems that are related to each other through replication. Typically, the systems in a replication topology are in separate geographic locations and are connected by a high-speed wide area network. This arrangement provides geographically distributed data protection (called geo-protection).
You can read from namespaces on all systems where those namespaces are replicated. The replication topology, which is configured at the system level, determines the systems on which you can write to namespaces.
Replication has several purposes, including:
- If a system in a replication topology becomes unavailable (for example, due to network issues), another system in the topology can provide continued data availability.
- If a system in a replication topology suffers irreparable damage, another system in the topology can serve as a source for disaster recovery.
- If multiple HCP systems are widely separated geographically, each system may be able to provide faster data access for some applications than the other systems can, depending on where the applications are running.
- If an object cannot be read from one system in a replication topology (for example, because a node is unavailable), HCP can try to read it from another system in the topology. Whether HCP tries to do this depends on the namespace configuration.
- If a system in a replication topology is unavailable, HTTP requests to that system can be automatically serviced by another system in the topology. Whether HCP tries to do this depends on the namespace configuration.
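HCP can handle this kind of redirection itself, depending on the namespace configuration. Purely as an illustration of the availability idea, a client could also try the systems in a topology in order, as in the minimal sketch below. The system names and the fetch function are hypothetical stand-ins; nothing here is an HCP API.

```python
from typing import Callable, Iterable


def read_with_fallback(systems: Iterable[str],
                       fetch: Callable[[str], bytes]) -> bytes:
    """Return the result from the first system that answers."""
    errors = []
    for system in systems:
        try:
            return fetch(system)
        except OSError as exc:  # covers connection and timeout errors
            errors.append(f"{system}: {exc}")
    raise OSError("no system could serve the request: " + "; ".join(errors))


def fake_fetch(system: str) -> bytes:
    """Stand-in for a real read; the primary system is simulated as down."""
    if system == "hcp-primary.example.com":
        raise ConnectionError("system unavailable")
    return b"object data from " + system.encode()


data = read_with_fallback(
    ["hcp-primary.example.com", "hcp-replica.example.com"], fake_fetch
)
print(data.decode())
```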
About the HCP HSwift API
The HCP HSwift API is a RESTful, HTTP-based API that is compatible with OpenStack Swift.
To perform operations with the HSwift API, you can write applications that use any standard HTTP client library. HSwift is also compatible with many third-party tools that work with OpenStack Swift.
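Because the API is plain HTTP, a request can be prepared with a standard library alone. The sketch below uses Python's urllib.request to build, but not send, a GET request that lists the containers in an account. The URL and token are hypothetical. The X-Auth-Token header and the format=json query parameter follow OpenStack Swift conventions, and their support by a given HCP system is assumed here.

```python
import urllib.request


def build_list_containers_request(account_url: str,
                                  token: str) -> urllib.request.Request:
    """Prepare (but do not send) a GET request that lists containers."""
    request = urllib.request.Request(account_url + "?format=json", method="GET")
    request.add_header("X-Auth-Token", token)      # Swift-style token header
    request.add_header("Accept", "application/json")
    return request


# Hypothetical account URL and token.
req = build_list_containers_request(
    "https://hcp.example.com/swift/v1/finance", "example-token"
)
print(req.get_method(), req.full_url)
# Sending it would be: urllib.request.urlopen(req)
# That call needs a reachable system and a valid token.
```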
Other container access methods
HCP allows access to container (namespace) content through several namespace access protocols, the HCP Namespace Browser, the HCP metadata query API, the HCP Search Console, and HCP Data Migrator (HCP-DM).
Namespace access protocols
Along with the HSwift API, HCP supports access to namespace content through these protocols: REST, an S3 compatible API, WebDAV, CIFS, and NFS. With these protocols, you can access namespaces programmatically with applications, interactively with a command-line tool, or through a GUI. You can use these protocols to perform actions such as storing objects in a namespace, viewing and retrieving objects, changing object metadata, and deleting objects.
HCP allows special-purpose access to namespaces through the SMTP protocol. This protocol is used only for storing email.
The namespace access protocols are configured separately for each namespace and are enabled or disabled independently of each other.
When you use the HSwift API to create a namespace (container), both the HSwift API and the HTTP protocol are automatically enabled for that namespace. Additionally, both the HTTP and HTTPS ports are open for both protocols (that is, the namespace can be accessed with or without SSL security).
Tenant administrators can enable and disable access protocols for any namespace. File-system protocols such as CIFS and NFS can be enabled only on a namespace that is not optimized for cloud protocols only. Cloud protocols such as REST, the S3 compatible API, and HSwift can be enabled or disabled at any time regardless of optimization or the protocol used to create the namespace.
Objects added to a namespace through any protocol, including HSwift, are immediately accessible through any other protocol that’s enabled for the namespace. Default namespaces cannot use the HSwift API.
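The enablement rules above can be summarized in a small decision function. This is a minimal sketch for reasoning about the rules, not an HCP interface. The protocol labels and parameter names are assumptions, and protocols that the rules above do not classify (such as WebDAV and SMTP) are deliberately left out.

```python
FILE_SYSTEM_PROTOCOLS = {"CIFS", "NFS"}
CLOUD_PROTOCOLS = {"REST", "S3", "HSWIFT"}


def can_enable(protocol: str, cloud_optimized: bool,
               default_namespace: bool = False) -> bool:
    """Return True if the protocol may be enabled under the rules above."""
    name = protocol.upper()
    if name in FILE_SYSTEM_PROTOCOLS:
        # File-system protocols need a namespace that is not cloud-optimized.
        return not cloud_optimized
    if name == "HSWIFT":
        # Default namespaces cannot use the HSwift API.
        return not default_namespace
    if name in CLOUD_PROTOCOLS:
        # Other cloud protocols can be enabled regardless of optimization.
        return True
    raise ValueError(f"protocol not covered by this sketch: {protocol}")


print(can_enable("NFS", cloud_optimized=True))
print(can_enable("HSwift", cloud_optimized=True))
```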
Namespace browser
The HCP Namespace Browser lets you manage content in and view information about HCP namespaces. With the Namespace Browser, you can:
- List, view, and retrieve objects, including old versions of objects
- View custom metadata and ACLs for objects, including old versions of objects
- Store and delete objects
- Create empty directories
- Display namespace information, including:
  - The namespaces that you own or can access
  - Retention classes available for a given namespace
  - Permissions for namespace access
  - Namespace statistics such as the number of objects in a given namespace or the total capacity of the namespace
The Namespace Browser is not available for the default namespace. However, you can use a web browser to view the contents of that namespace.
HCP metadata query API
The HCP metadata query API lets you search HCP for objects that meet specified criteria. The API supports two types of queries:
Object-based queries
Search for objects based on object metadata. This includes both system metadata and the content of custom metadata and ACLs. The query criteria can also include the object location (that is, the namespace and/or directory that contains the object). These queries use a robust query language that lets you combine search criteria in multiple ways.
Object-based queries search only for objects that currently exist in the repository. For objects with multiple versions, object-based queries return only the current version.
Operation-based queries
Search not only for objects currently in the repository but also for information about objects that have been deleted. For namespaces that support versioning, operation-based queries can return both current and old versions of objects.
Criteria for operation-based queries can include object status (for example, created or deleted), change time, index setting, and location.
The metadata query API returns object metadata only, not object data. The metadata is returned either in XML format, with each object represented by a separate element, or in JSON format, with each object represented by a separate name/value pair. For queries that return large numbers of objects, you can use paged requests.
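Paged requests follow a common pattern: ask for a fixed number of results, then ask again starting where the previous page ended. The sketch below shows that loop against an in-memory stand-in. The fetch_page callable, its offset and count arguments, and the record fields are hypothetical and do not represent the request format of the metadata query API.

```python
from typing import Callable, Iterator


def iter_paged(fetch_page: Callable[[int, int], list],
               page_size: int) -> Iterator[dict]:
    """Yield every record by requesting one page at a time."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        if len(page) < page_size:  # a short page means no more results
            break
        offset += len(page)


# In-memory stand-in for a query that matches seven objects.
RECORDS = [{"name": f"object-{n}", "size": n * 100} for n in range(7)]


def fake_fetch_page(offset: int, count: int) -> list:
    return RECORDS[offset:offset + count]


results = list(iter_paged(fake_fetch_page, page_size=3))
print(len(results))
```

Each record here holds metadata only, which mirrors the fact that the query API does not return object data.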
HCP Search Console
The HCP Search Console is an easy-to-use web application that lets you search for and manage objects based on specified criteria. For example, you can search for objects that were stored before a certain date or that are larger than a specified size. You can then delete the objects listed in the search results or prevent those objects from being deleted. Similar to the metadata query API, the Search Console returns only object metadata, not object data.
By offering a structured environment for performing searches, the Search Console facilitates e-discovery, namespace analysis, and other activities that require the user to examine the contents of namespaces. From the Search Console, you can:
- Open objects
- Perform bulk operations on objects
- Export search results in standard file formats for use as input to other applications
- Publish feeds to make search results available to web users
The Search Console works with either of these two search facilities:
The HCP metadata query engine
This facility is integrated with HCP and works internally to perform searches and return results to the Search Console. The metadata query engine is also used by the metadata query API.
Note: When working with the metadata query engine, the Search Console is called the Metadata Query Engine Console.
The Hitachi Data Discovery Suite (HDDS) search facility
This facility interacts with Data Discovery Suite, which performs searches and returns results to the HCP Search Console. Data Discovery Suite is a separate product from HCP.
The Search Console can use only one search facility at any given time. The search facility is selected at the HCP system level. If no facility is selected, the HCP system does not support use of the Search Console to search namespaces.
Each search facility maintains its own index of objects in each search-enabled namespace and uses this index for fast retrieval of search results. The search facilities automatically update their indexes to account for new and deleted objects and changes to object metadata.
HCP Data Migrator
HCP Data Migrator (HCP-DM) is a high-performance, multithreaded, client-side utility for viewing, copying, and deleting data.
With HCP Data Migrator, you can:
- Copy objects, files, and directories between the local file system, HCP namespaces, default namespaces, and earlier HCAP archives
- Delete individual objects, files, and directories and perform bulk delete operations
- View the content of current and old versions of objects and the content of files
- Purge all versions of an object
- Rename files and directories on the local file system
- View object, file, and directory properties
- Change system metadata for multiple objects in a single operation
- Add, replace, or delete custom metadata for objects
- Add, replace, or delete ACLs for objects
- Create empty directories
HCP Data Migrator has both a graphical user interface (GUI) and a command-line interface (CLI).
Accounts
In the context of this book, the term HSwift account is synonymous with the terms HCP tenant and Keystone HCP tenant. Each term pertains to its individual interface. In order to use an HSwift account with the HSwift API, the HSwift account must be associated with an HCP tenant with the same name as the HSwift account.
Once an HSwift account is created, you need to authenticate when you store and manage containers and objects. If you want to use Keystone authentication, you need to include an Authentication header and a Keystone authentication token with your command. This token verifies that you are authorized to work with containers and objects on the HSwift account.
If you choose to authenticate with an alternative authentication method, you can use local authentication to access your HSwift account. This method requires that you generate a temporary authentication token with your HCP user account (not your HSwift account) credentials.
Data access permissions
Data access permissions allow you to access container content through the various HCP interfaces. You get these permissions either from your HCP user account or from the container configuration.
Data access permissions are container specific. That is, they are granted separately for individual containers.
Each data access permission allows you to perform certain operations. However, not all operations allowed by data access permissions apply to every HCP interface.
Although many of the operations allowed by data access permissions are not supported by the HSwift API, a tenant administrator can give you permission for those operations. You can then perform them through other HCP interfaces that support them.
The data access permissions that you can have for a container are:
Browse
Lets you list container contents.
Read
Lets you:
- View and retrieve objects in the container, including the system and custom metadata for objects
- View and retrieve previous versions of objects
- List annotations for objects
- Check the existence of objects
Users with read permission also have browse permission.
Read ACL
Lets you view and retrieve container and object ACLs.
Write
Lets you:
- Add objects to the container
- Modify system metadata (except retention hold) for objects in the container
- Add or replace custom metadata for objects in the container
Write ACL
Lets you add, replace, and delete container ACLs.
Change owner
Lets you change the container owner and the owners of objects in the container.
Delete
Lets you delete objects, custom metadata, and container ACLs.
Privileged
Lets you:
- Delete objects that are under retention, provided that you also have delete or purge permission for the container
- Hold or release objects, provided that you also have write permission for the container
Search
Lets you use the HCP metadata query API and the HCP Search Console to query or search the containers for objects that meet specified criteria. Users with search permission also have read permission.
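Some permissions imply others: read includes browse, and search includes read. The following minimal sketch computes the effective set of permissions from the granted set and checks the combination needed to hold or release objects. The lowercase permission names and function names are illustrative assumptions, not identifiers from an HCP interface.

```python
# Implications stated above: read implies browse, search implies read.
IMPLIED = {"read": {"browse"}, "search": {"read"}}


def effective_permissions(granted: set[str]) -> set[str]:
    """Expand granted permissions with everything they imply."""
    result = set(granted)
    pending = list(granted)
    while pending:
        permission = pending.pop()
        for implied in IMPLIED.get(permission, ()):
            if implied not in result:
                result.add(implied)
                pending.append(implied)
    return result


def can_hold_or_release(granted: set[str]) -> bool:
    """Holding or releasing objects needs both privileged and write."""
    return {"privileged", "write"} <= effective_permissions(granted)


print(sorted(effective_permissions({"search", "write"})))
# ['browse', 'read', 'search', 'write']
print(can_hold_or_release({"privileged"}))
# False
```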
Examples in this book
This book contains instructions and examples for using HSwift to perform operations on containers and objects. The examples use a command-line tool called cURL. cURL is freely available open-source software. You can download it from http://curl.haxx.se/download.html.
After downloading cURL, you need to configure it to work with HCP.