Hitachi Vantara Knowledge

Introduction to Hitachi Content Platform

Hitachi Content Platform (HCP) is a robust storage system designed to support large, growing repositories of fixed-content data. HCP stores objects that include both data and metadata that describes that data. Objects exist in containers, which are logical partitions of the repository.

About Hitachi Content Platform

Hitachi Content Platform is a combination of hardware and software that provides an object-based data storage environment. An HCP repository stores all types of data, from simple text files to medical images to multigigabyte database images.

HCP provides easy access to the repository for adding, retrieving, and deleting data. HCP uses write-once, read-many (WORM) storage technology and a variety of policies and internal processes to ensure the integrity of the stored data and the efficient use of storage capacity.

Object-based storage

HCP stores objects in a repository. Each object permanently associates data HCP receives (for example, a document, an image, or a movie) with information about that data, called metadata.

An object encapsulates:

  • Fixed-content data

    An exact digital reproduction of data as it existed before it was stored in HCP. Once it’s in the repository, this fixed-content data cannot be modified.

  • System metadata

    System-managed properties that describe the fixed-content data (for example, its size and creation date). System metadata includes policies, such as retention and data protection level, that influence how transactions and internal processes affect the object.

  • Custom metadata

    Optional metadata that a user or application provides to further describe the object. Custom metadata is specified as one or more annotations, where each annotation is a discrete unit of information about the object. Annotations are typically specified in XML format.

    You can use custom metadata to create self-describing objects. Users and applications can use this metadata to understand and repurpose object content.

HCP can store multiple versions of an object, thus providing a history of how the data has changed over time. Each version is a separate object, with its own system metadata and, optionally, its own custom metadata and ACL.
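The relationship between fixed-content data, system metadata, custom metadata, and versions can be pictured with a small model. The following is only an illustrative sketch, not an HCP data structure: the class name StoredObject, the helper store_new_version, and the "size" key are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Mapping, Optional


@dataclass(frozen=True)
class StoredObject:
    """Hypothetical model of one object version (not an HCP type)."""
    data: bytes                          # fixed-content data; frozen after creation
    system_metadata: Mapping[str, str]   # system-managed properties, such as size
    annotations: Mapping[str, str] = field(default_factory=dict)  # custom metadata


def store_new_version(history: list, data: bytes,
                      annotations: Optional[dict] = None) -> StoredObject:
    """Append a new version rather than modifying an existing one."""
    version = StoredObject(
        data=data,
        system_metadata=MappingProxyType({"size": str(len(data))}),
        annotations=MappingProxyType(dict(annotations or {})),
    )
    history.append(version)
    return version


# Each version is a separate object with its own metadata.
history = []
store_new_version(history, b"draft", {"summary": "<note>first draft</note>"})
store_new_version(history, b"final text")
```

In this sketch, changing the data means storing another version; the earlier version stays in the history unchanged, which mirrors the write-once behavior described above.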

HCP supports multipart uploads with the Hitachi API for Amazon S3. With a multipart upload, the data for an object is broken into multiple parts that are written to HCP independently of each other. Even though the data is written in multiple parts, the result of a multipart upload is a single object. An object for which the data is stored in multiple parts is called a multipart object.

Note: Multipart uploads are possible only with the S3 compatible API, but objects created by multipart uploads can be managed and retrieved with the HSwift API.
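The idea behind a multipart upload (independent parts that together form a single object) can be sketched without calling any API. The function name split_into_parts and the part size of 4 bytes are assumptions made for illustration only; real part sizes and the upload calls themselves are defined by the S3 compatible API and are not shown here.

```python
def split_into_parts(data: bytes, part_size: int) -> list:
    """Break data into consecutive parts of at most part_size bytes."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    return [data[i:i + part_size] for i in range(0, len(data), part_size)]


payload = b"0123456789"
parts = split_into_parts(payload, 4)

# Joining the parts in order yields the original data as one object.
reassembled = b"".join(parts)
```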

Containers and accounts

An HCP repository is partitioned into namespaces, which are called containers in the context of the HSwift API. A container is a logical grouping of objects such that the objects in one container are not visible in any other container.

Containers provide a mechanism for separating the data stored for different applications, business units, or customers. For example, you could have one container for receivable items and another for payable items.

Containers also enable operations to work against selected subsets of objects. For example, you could perform a query that targets the receivable and payable items containers but not the employees container.

Containers are managed by administrative entities called tenants in HCP. Tenants, in the context of HSwift, are referred to as accounts and typically correspond to organizations, such as companies or divisions or departments within a company.

This book uses the terms account and container when discussing the HSwift API and it uses the terms tenant and namespace when discussing HCP interfaces in general.

HCP nodes

The core hardware for an HCP system consists of servers that are networked together. These servers are called nodes.

When you access an HCP system, your point of access is an individual node. To identify the system, however, you can use either the domain name of the system or the IP address of an individual node. When you use the domain name, HCP selects the access node for you. This helps ensure an even distribution of the processing load.

Replication

Replication is a process that supports configurations in which selected tenants and namespaces are maintained on two or more HCP systems and the objects in those namespaces are managed across those systems. This cross-system management helps ensure that data is well-protected against the unavailability or catastrophic failure of a system.

A replication topology is a configuration of HCP systems that are related to each other through replication. Typically, the systems in a replication topology are in separate geographic locations and are connected by a high-speed wide area network. This arrangement provides geographically distributed data protection (called geo-protection).

You can read from namespaces on all systems where those namespaces are replicated. The replication topology, which is configured at the system level, determines the systems on which you can write to namespaces.

Replication has several purposes, including:

  • If a system in a replication topology becomes unavailable (for example, due to network issues), another system in the topology can provide continued data availability.
  • If a system in a replication topology suffers irreparable damage, another system in the topology can serve as a source for disaster recovery.
  • If multiple HCP systems are widely separated geographically, each system may be able to provide faster data access for some applications than the other systems can, depending on where the applications are running.
  • If an object cannot be read from one system in a replication topology (for example, because a node is unavailable), HCP can try to read it from another system in the topology. Whether HCP tries to do this depends on the namespace configuration.
  • If a system in a replication topology is unavailable, HTTP requests to that system can be automatically serviced by another system in the topology. Whether HCP tries to do this depends on the namespace configuration.

About the HCP HSwift API

The HCP HSwift API is a RESTful, HTTP-based API that is compatible with OpenStack Swift.

To use the HSwift API, you can write applications that use any standard HTTP client library. HSwift is also compatible with many third-party tools that work with OpenStack Swift.
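As a rough illustration of using a standard HTTP client library, the sketch below builds (but does not send) a request for an object with Python's urllib. The path layout /v1/account/container/object follows the general OpenStack Swift convention; the host name, the /swift prefix, and the account, container, and object names are all assumptions for this example, so check the actual URL format for your HCP system.

```python
from urllib.parse import quote
from urllib.request import Request


def build_object_request(base_url: str, account: str, container: str,
                         object_name: str, method: str = "GET") -> Request:
    """Build an unsent HTTP request for one object (hypothetical URL layout)."""
    # Percent-encode each path element; "/" is kept inside object names.
    path = "/".join([
        quote(account, safe=""),
        quote(container, safe=""),
        quote(object_name),
    ])
    url = f"{base_url.rstrip('/')}/v1/{path}"
    return Request(url, method=method)


# Hypothetical endpoint and names, used only for illustration.
req = build_object_request(
    "https://hcp.example.com/swift",
    account="finance",
    container="receivables",
    object_name="2024/invoice 17.pdf",
)
```

Sending the request (for example, with urllib.request.urlopen) would additionally require authentication, which is not covered in this sketch.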

Other container access methods

HCP allows access to container (namespace) content through several namespace access protocols, the HCP Namespace Browser, the HCP metadata query API, the HCP Search Console, and HCP Data Migrator (HCP-DM).

Namespace access protocols

Along with the HSwift API, HCP supports access to namespace content through these protocols: REST, an S3 compatible API, WebDAV, CIFS, and NFS. With these protocols, you can access namespaces programmatically with applications, interactively with a command-line tool, or through a GUI. You can use these protocols to perform actions such as storing objects in a namespace, viewing and retrieving objects, changing object metadata, and deleting objects.

HCP allows special-purpose access to namespaces through the SMTP protocol. This protocol is used only for storing email.

The namespace access protocols are configured separately for each namespace and are enabled or disabled independently of each other.

When you use the HSwift API to create a namespace (container), both the HSwift API and the HTTP protocol are automatically enabled for that namespace. Additionally, both the HTTP and HTTPS ports are open for both protocols (that is, the namespace can be accessed with or without SSL security).

Tenant administrators can enable and disable access protocols for any namespace. File-system protocols such as CIFS and NFS can be enabled only on a namespace that is not optimized for cloud protocols only. Cloud protocols such as REST, the S3 compatible API, and HSwift can be enabled or disabled at any time regardless of optimization or the protocol used to create the namespace.

Tip: You can ask your tenant administrator to close the HTTP port for the namespaces you create, thereby allowing only secure access to those namespaces.

Objects added to a namespace through any protocol, including HSwift, are immediately accessible through any other protocol that’s enabled for the namespace. Default namespaces cannot use the HSwift API.

Namespace browser

The HCP Namespace Browser lets you manage content in and view information about HCP namespaces. With the Namespace Browser, you can:

  • List, view, and retrieve objects, including old versions of objects
  • View custom metadata and ACLs for objects, including old versions of objects
  • Store and delete objects
  • Create empty directories
  • Display namespace information, including:
    • The namespaces that you own or can access
    • Retention classes available for a given namespace
    • Permissions for namespace access
    • Namespace statistics such as the number of objects in a given namespace or the total capacity of the namespace

The Namespace Browser is not available for the default namespace. However, you can use a web browser to view the contents of that namespace.

HCP metadata query API

The HCP metadata query API lets you search HCP for objects that meet specified criteria. The API supports two types of queries:

  • Object-based queries

    Search for objects based on object metadata. This includes both system metadata and the content of custom metadata and ACLs. The query criteria can also include the object location (that is, the namespace and/or directory that contains the object). These queries use a robust query language that lets you combine search criteria in multiple ways.

    Object-based queries search only for objects that currently exist in the repository. For objects with multiple versions, object-based queries return only the current version.

  • Operation-based queries

    Search not only for objects currently in the repository but also for information about objects that have been deleted. For namespaces that support versioning, operation-based queries can return both current and old versions of objects.

    Criteria for operation-based queries can include object status (for example, created or deleted), change time, index setting, and location.

The metadata query API returns object metadata only, not object data. The metadata is returned either in XML format, with each object represented by a separate element, or in JSON format, with each object represented by a separate name/value pair. For queries that return large numbers of objects, you can use paged requests.

HCP Search Console

The HCP Search Console is an easy-to-use web application that lets you search for and manage objects based on specified criteria. For example, you can search for objects that were stored before a certain date or that are larger than a specified size. You can then delete the objects listed in the search results or prevent those objects from being deleted. Similar to the metadata query API, the Search Console returns only object metadata, not object data.

By offering a structured environment for performing searches, the Search Console facilitates e-discovery, namespace analysis, and other activities that require the user to examine the contents of namespaces. From the Search Console, you can:

  • Open objects
  • Perform bulk operations on objects
  • Export search results in standard file formats for use as input to other applications
  • Publish feeds to make search results available to web users

The Search Console works with either of these two search facilities:

  • The HCP metadata query engine

    This facility is integrated with HCP and works internally to perform searches and return results to the Search Console. The metadata query engine is also used by the metadata query API.

    Note: When working with the metadata query engine, the Search Console is called the Metadata Query Engine Console.

  • The Hitachi Data Discovery Suite (HDDS) search facility

    This facility interacts with Data Discovery Suite, which performs searches and returns results to the HCP Search Console. Data Discovery Suite is a separate product from HCP.

The Search Console can use only one search facility at any given time. The search facility is selected at the HCP system level. If no facility is selected, the HCP system does not support use of the Search Console to search namespaces.

Each search facility maintains its own index of objects in each search-enabled namespace and uses this index for fast retrieval of search results. The search facilities automatically update their indexes to account for new and deleted objects and changes to object metadata.

HCP Data Migrator

HCP Data Migrator (HCP-DM) is a high-performance, multithreaded, client-side utility for viewing, copying, and deleting data.

With HCP Data Migrator, you can:

  • Copy objects, files, and directories between the local file system, HCP namespaces, default namespaces, and earlier HCAP archives
  • Delete individual objects, files, and directories and perform bulk delete operations
  • View the content of current and old versions of objects and the content of files
  • Purge all versions of an object
  • Rename files and directories on the local file system
  • View object, file, and directory properties
  • Change system metadata for multiple objects in a single operation
  • Add, replace, or delete custom metadata for objects
  • Add, replace, or delete ACLs for objects
  • Create empty directories

HCP Data Migrator has both a graphical user interface (GUI) and a command-line interface (CLI).

Accounts

In the context of this book, the term HSwift account is synonymous with the terms HCP tenant and Keystone HCP tenant. Each term pertains to its individual interface. In order to use an HSwift account with the HSwift API, the HSwift account must be associated with an HCP tenant with the same name as the HSwift account.

Once an HSwift account is created, you need to authenticate when you store and manage containers and objects. If you want to use Keystone authentication, you need to include an Authentication header and a Keystone authentication token with your command. This token verifies that you are authorized to work with containers and objects on the HSwift account.

If you choose to authenticate with an alternative authentication method, you can use local authentication to access your HSwift account. This method requires that you generate a temporary authentication token with your HCP user account (not your HSwift account) credentials.

Note: Your HSwift account and HCP user account are not the same. The HSwift account is the name of your HCP tenant in the context of the HSwift API. Your HCP user account is a set of credentials that gives an HCP user access to other interfaces.
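Whichever authentication method you use, the token travels with each request as an HTTP header. The sketch below attaches a token to an unsent request. The header name X-Auth-Token is the usual OpenStack Swift convention and is an assumption here, as are the URL and the token value; how the token is obtained (from Keystone or through local authentication) is not shown.

```python
from urllib.request import Request


def with_auth_token(request: Request, token: str,
                    header_name: str = "X-Auth-Token") -> Request:
    """Attach an authentication token to a request (assumed header name)."""
    if not token:
        raise ValueError("an authentication token is required")
    request.add_header(header_name, token)
    return request


# Hypothetical account URL and placeholder token, for illustration only.
req = with_auth_token(
    Request("https://hcp.example.com/swift/v1/finance"),
    "EXAMPLE-TOKEN",
)
```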

Data access permissions

Data access permissions allow you to access container content through the various HCP interfaces. You get these permissions either from your HCP user account or from the container configuration.

Data access permissions are container specific. That is, they are granted separately for individual containers.

Each data access permission allows you to perform certain operations. However, not all operations allowed by data access permissions apply to every HCP interface.

Although many of the operations allowed by data access permissions are not supported by the HSwift API, a tenant administrator can give you permission for those operations. You can then perform them through other HCP interfaces that support them.

The data access permissions that you can have for a container are:

  • Browse

    Lets you list container contents.

  • Read

    Lets you:

    • View and retrieve objects in the container, including the system and custom metadata for objects
    • View and retrieve previous versions of objects
    • List annotations for objects
    • Check the existence of objects

    Users with read permission also have browse permission.

  • Read ACL

    Lets you view and retrieve container and object ACLs.

  • Write

    Lets you:

    • Add objects to the container
    • Modify system metadata (except retention hold) for objects in the container
    • Add or replace custom metadata for objects in the container

  • Write ACL

    Lets you add, replace, and delete container ACLs.

  • Change owner

    Lets you change the container owner and the owners of objects in the container.

  • Delete

    Lets you delete objects, custom metadata, and container ACLs.

  • Privileged

    Lets you:

    • Delete objects that are under retention, provided that you also have delete or purge permission for the container
    • Hold or release objects, provided that you also have write permission for the container

  • Search

    Lets you use the HCP metadata query API and the HCP Search Console to query or search the containers for objects that meet specified criteria. Users with search permission also have read permission.

Note: Some of the features and data access permissions listed here are not available for HSwift.
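The dependencies among the permissions described above (read includes browse, search includes read, and privileged operations require a second permission) can be expressed as a small lookup. This is a sketch of the rules as stated in this section, not of how HCP evaluates permissions internally; the function names and lowercase permission strings are hypothetical.

```python
# Permissions that automatically come with another permission.
IMPLIED = {
    "read": {"browse"},
    "search": {"read"},
}


def effective_permissions(granted) -> set:
    """Expand granted permissions with the ones they imply."""
    result = set()
    pending = list(granted)
    while pending:
        permission = pending.pop()
        if permission not in result:
            result.add(permission)
            pending.extend(IMPLIED.get(permission, ()))
    return result


def can_delete_under_retention(granted) -> bool:
    """Privileged delete also needs delete or purge permission."""
    perms = effective_permissions(granted)
    return "privileged" in perms and bool(perms & {"delete", "purge"})


def can_hold_or_release(granted) -> bool:
    """Hold and release also need write permission."""
    perms = effective_permissions(granted)
    return "privileged" in perms and "write" in perms
```

For example, a user granted only search ends up with search, read, and browse, while privileged on its own allows neither privileged delete nor hold and release.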

Examples in this book

This book contains instructions and examples for performing operations with the HSwift API. The examples use a command-line tool called cURL. cURL is freely available open-source software. You can download it from http://curl.haxx.se/download.html.

After downloading cURL, you need to configure it to work with HCP.
