Hitachi Content Platform (HCP) is a distributed storage system designed to support large amounts of data. HCP provides access to the stored data through a variety of industry-standard protocols, as well as through an integrated Search Console.
The Search Console enables you to search for objects stored in HCP using either of two search facilities: the metadata query engine or the Data Discovery Suite search facility.
This chapter provides an introduction to HCP and searching namespaces, including how to use the Search Console, the types of queries you can construct, and what search results look like.
About Hitachi Content Platform
Hitachi Content Platform is the distributed, fixed-content, data storage system from Hitachi Vantara. HCP provides a cost-effective, scalable, easy-to-use repository that can accommodate all types of data, from simple text files to medical image files to multigigabyte database images.
A fixed-content storage system is one in which the data cannot be modified. HCP uses write-once, read-many (WORM) storage technology, and a variety of policies and internal processes to ensure the integrity of the stored data.
Object-based storage
HCP stores objects in a repository. Each object permanently associates data HCP receives with information about that data; that is, each object encapsulates both object data and metadata.
HCP distributes objects across its storage space but still presents them as files in a standard directory structure.
Namespaces and tenants
An HCP repository is partitioned into namespaces. A namespace is a logical grouping of objects such that the objects in one namespace are not visible in any other namespace.
Namespaces provide a mechanism for separating the data stored for different applications, business units, or customers. For example, you could have one namespace for accounts receivable and another for accounts payable.
Namespaces also enable operations to work against selected subsets of repository objects. For example, you could perform a query that targets the accounts receivable and accounts payable namespaces but not the employees namespace.
Namespaces are owned and managed by administrative entities called tenants. A tenant typically corresponds to an organization such as a company or a division or department within a company.
HCP automatically generates metadata for each object. Some of this metadata is specific to HCP. Examples of this type of metadata are the retention setting, object creation date, and cryptographic hash value.
Objects also have POSIX metadata. POSIX is a set of standards that defines an application programming interface (API) for software designed to run under heterogeneous operating systems. These standards include specific types of metadata, such as permissions and ownership.
Users and applications can override the defaults for some HCP-specific and POSIX metadata when they add an object to a namespace. They can also change certain metadata values for existing objects.
Users can create their own custom metadata to associate additional descriptive information with an object. Custom metadata is specified as annotations, where each annotation is a discrete unit of information about the object.
Custom metadata enables the creation of self-describing objects. Future users and applications can use this metadata to understand and repurpose object content.
When added to a namespace, custom metadata becomes part of the target object. Custom metadata is typically but not necessarily formatted as XML.
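As an illustration of an XML-formatted annotation, the following minimal sketch builds one with the Python standard library and checks that it parses as well-formed XML. The element names (`record`, `DrName`, `Department`) and both helper functions are hypothetical and are not part of any HCP interface.

```python
import xml.etree.ElementTree as ET

def build_annotation(fields):
    """Build a small XML document from a dict of descriptive fields."""
    root = ET.Element("record")  # hypothetical root element name
    for name, value in fields.items():
        child = ET.SubElement(root, name)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")

def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

annotation = build_annotation({"DrName": "Lee", "Department": "Radiology"})
print(annotation)
print(is_well_formed(annotation))
```

The check is ordinary XML parsing. It says nothing about how HCP itself validates or stores annotations.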
Users can also associate access control lists (ACLs) with objects. An ACL grants permissions for an individual object to specified users or groups of users.
When added to a namespace, an ACL becomes part of the target object. When viewed or specified in the Console, ACLs are formatted as XML.
ACLs are enabled on a per-namespace basis. In namespaces where ACLs are enabled, the namespace can be configured to either enforce or ignore the permissions granted by ACLs.
Each object has a retention setting that specifies how long the object must remain in its namespace before it can be deleted; this duration is called the retention period. While an object cannot be deleted due to its retention setting, it is said to be under retention.
The retention setting for an object can be:
- A specific date and time. This is the time before which the object cannot be deleted. If this is a date in the past, this setting is displayed as Expired in the Search Console.
- One of these special values:
  - The object can be deleted at any time. This value is displayed as Expired in the Search Console.
  - The object can never be deleted.
  - The object does not yet have a specific retention setting and cannot be deleted until it has a setting that allows deletion.
- A retention class. This is a named retention setting. It can be a duration (such as seven years) or one of the special values listed above.
Retention classes are namespace specific. That is, an object in one namespace cannot be assigned a retention class that’s defined in a different namespace.
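The rules for date-based and special-value retention settings can be sketched as a small decision function. This is only a model of the behavior described above: the enum member names are hypothetical placeholders, retention classes are not modeled, and nothing here reflects how HCP stores retention settings internally.

```python
from datetime import datetime, timezone
from enum import Enum

class Special(Enum):
    # Hypothetical names for the three non-date settings described above.
    ANY_TIME = "can be deleted at any time"
    NEVER = "can never be deleted"
    UNSPECIFIED = "no specific retention setting yet"

def under_retention(setting, now):
    """Return True if the retention setting currently blocks deletion."""
    if isinstance(setting, datetime):
        return now < setting  # blocked until the date and time is reached
    if setting is Special.ANY_TIME:
        return False
    return True  # NEVER and UNSPECIFIED both block deletion

def console_shows_expired(setting, now):
    """A past date or the 'any time' value is displayed as Expired."""
    return not under_retention(setting, now)

now = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(under_retention(datetime(2030, 1, 1, tzinfo=timezone.utc), now))
print(console_shows_expired(Special.ANY_TIME, now))
```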
Retention mode is a property of a namespace that affects which operations are allowed on objects under retention. A namespace can be in either of two retention modes:
- In compliance mode, objects that are under retention cannot be deleted through any mechanism. Additionally, the duration of a retention class cannot be shortened, and retention classes cannot be deleted.
- In enterprise mode, users and applications can delete objects under retention if they have specific permission to do so. This is called privileged delete. Also in enterprise mode, the duration of a retention class can be shortened, and retention classes can be deleted.
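The difference between the two retention modes can be summarized in a few lines of code. The sketch below is a simplified model: the function names and the string values `"compliance"` and `"enterprise"` are assumptions made for illustration, and other permission checks that would apply to an object not under retention are ignored.

```python
VALID_MODES = ("compliance", "enterprise")

def can_delete(mode, is_under_retention, has_privileged_delete=False):
    """Decide whether a delete is allowed under the given retention mode."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown retention mode: {mode!r}")
    if not is_under_retention:
        return True  # retention does not block the delete
    # Under retention: only enterprise mode with the specific permission
    # (privileged delete) allows deletion.
    return mode == "enterprise" and has_privileged_delete

def can_change_retention_class(mode):
    """Shortening or deleting a retention class is allowed only in enterprise mode."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown retention mode: {mode!r}")
    return mode == "enterprise"

print(can_delete("compliance", True, has_privileged_delete=True))
print(can_delete("enterprise", True, has_privileged_delete=True))
```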
About searching namespaces
HCP lets you search namespaces for objects that meet specified criteria. This capability supports search and discovery to satisfy government requirements and provides support for audits and litigation. You can use the results of a search to analyze namespace contents and manipulate groups of objects.
HCP provides an interactive interface for searching namespaces. This interface, called the Search Console, is a web application that offers a structured environment for creating and executing queries. You can also use the Search Console to perform these operations on groups of objects: hold, release, delete, purge, privileged delete, privileged purge, change owner, and set ACLs.
A query is a request you submit that contains a collection of criteria that each object in the search results must satisfy. The response to a query is metadata about the objects that meet the query criteria. You can use this metadata to retrieve objects of interest. Additionally, from the Search Console, you can export the metadata for use as input to other applications.
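Conceptually, a query is a set of criteria that must all hold, and the response is metadata rather than object data. The following sketch models that idea with in-memory dictionaries. The field names (`namespace`, `owner`, `size`) and the `run_query` function are hypothetical and do not correspond to the Search Console or the metadata query API.

```python
def run_query(objects, criteria):
    """Return the metadata of every object that satisfies all criteria."""
    return [
        obj["metadata"]
        for obj in objects
        if all(test(obj["metadata"]) for test in criteria)
    ]

objects = [
    {"data": b"...", "metadata": {"namespace": "receivable", "owner": "pat", "size": 120}},
    {"data": b"...", "metadata": {"namespace": "payable", "owner": "pat", "size": 900}},
    {"data": b"...", "metadata": {"namespace": "employees", "owner": "lee", "size": 50}},
]

# Each criterion is a predicate over one object's metadata.
criteria = [
    lambda m: m["namespace"] in ("receivable", "payable"),
    lambda m: m["size"] > 500,
]

results = run_query(objects, criteria)
print(results)
```

Only the metadata dictionaries appear in `results`. Retrieving the object data itself would be a separate step.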
The Search Console works with either of these search facilities:
- The metadata query engine. This facility is integrated with HCP and is also used by the metadata query API, which is a programmatic interface for querying namespaces.
- The Hitachi Data Discovery Suite search facility. This facility interacts with Data Discovery Suite, which performs searches and returns results to the HCP Search Console. Data Discovery Suite is a separate product from HCP.
This book covers aspects of Data Discovery Suite that are specific to HCP.
Only one search facility can be selected for use with the Search Console at any given time. This facility, called the active search facility, is selected at the HCP system level. If no search facility is selected, the HCP system does not support searching namespaces.
Each search facility maintains an index of objects. The index maintained by the metadata query engine resides in HCP. The index maintained by the Data Discovery Suite search facility resides in Data Discovery Suite.
The metadata query engine index is based on system metadata, custom metadata that is well-formed XML, and ACLs. The index maintained by the Data Discovery Suite search facility is based on object data and metadata.
Indexing is enabled on a per-namespace basis. If a namespace is not indexed, searches do not return any results for objects in the namespace.
Indexing of custom metadata is also enabled on a per-namespace basis. If indexing of custom metadata is disabled for a namespace, the index associated with the metadata query engine does not include custom metadata for objects in the namespace.
HCP namespaces can be configured to store multiple versions of objects. Each index, however, includes only the most current version of an object.
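The effect of indexing only the most current version can be pictured as follows. This is a toy model, assuming each stored version is a dictionary with hypothetical `namespace`, `path`, and `version` fields. It is not a description of how either search facility builds its index.

```python
def latest_versions(versions):
    """Keep one entry per object: the one with the highest version number."""
    latest = {}
    for entry in versions:
        key = (entry["namespace"], entry["path"])
        if key not in latest or entry["version"] > latest[key]["version"]:
            latest[key] = entry
    return latest

stored = [
    {"namespace": "payable", "path": "/inv/001.pdf", "version": 1},
    {"namespace": "payable", "path": "/inv/001.pdf", "version": 2},
    {"namespace": "payable", "path": "/inv/002.pdf", "version": 1},
]

index = latest_versions(stored)
print(index[("payable", "/inv/001.pdf")]["version"])
```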
To maintain its index, each search facility periodically checks indexable namespaces for new objects and for objects with metadata that has changed since the last check. When it finds new or changed information, it updates its index. The amount of time a search facility takes to update its index depends on the amount of information to be indexed.
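The periodic check described above amounts to an incremental update: only objects that are new or changed since the last check are reindexed. The sketch below models this with integer timestamps. The `changed_at` field and the `update_index` function are assumptions for illustration, not HCP internals.

```python
def update_index(index, namespace_objects, last_check):
    """Refresh index entries for objects added or changed after last_check.

    Returns the number of entries written, which is what drives how long
    an update takes.
    """
    updated = 0
    for path, obj in namespace_objects.items():
        if obj["changed_at"] > last_check:
            index[path] = dict(obj["metadata"])
            updated += 1
    return updated

index = {"/a.txt": {"owner": "pat"}}
namespace_objects = {
    "/a.txt": {"changed_at": 5, "metadata": {"owner": "pat"}},   # unchanged
    "/b.txt": {"changed_at": 12, "metadata": {"owner": "lee"}},  # new object
    "/c.txt": {"changed_at": 15, "metadata": {"owner": "kim"}},  # new object
}

count = update_index(index, namespace_objects, last_check=10)
print(count, sorted(index))
```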
Metadata query engine indexing of custom metadata can be configured as follows:
- Specific content properties can be indexed.
- Specific annotations in a namespace can be excluded from indexing.
- Indexing can be enabled or disabled for the full text of custom metadata.
Custom metadata in a namespace can be indexed based on content properties. A content property is a named construct used to extract an element or attribute value from custom metadata that's well-formed XML. Each content property has a data type that determines how the property values are treated by the metadata query engine. Additionally, a content property is defined as either single-valued or multivalued. A multivalued property can extract the values of multiple occurrences of the same element or attribute from the XML.
Content properties are grouped into content classes, and each namespace can be associated with a set of content classes. The content properties that belong to a content class associated with the namespace are indexed for the namespace. Content classes are defined at the tenant level, so multiple namespaces can be associated with the same content class.
For example, if the namespace Personnel is associated with the content class MedInfo, and the content property DrName is a member of the content class, the query engine will use the DrName content property to index the custom metadata in the Personnel namespace.
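To make the idea of a content property concrete, the following sketch extracts element values from an XML annotation using the Python standard library. It is a loose model only: the `ContentProperty` class, the use of ElementTree path expressions, and the `Visit` element are assumptions, attribute extraction is omitted, and the way HCP actually defines and evaluates content properties may differ.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class ContentProperty:
    name: str
    path: str                  # ElementTree path expression (an assumption of this sketch)
    convert: type = str        # stands in for the property's data type
    multivalued: bool = False

def extract(prop, xml_text):
    """Extract the property's value(s) from custom metadata, if it is well-formed XML."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return None  # not well-formed XML: nothing to extract
    values = [prop.convert(e.text) for e in root.findall(prop.path) if e.text is not None]
    if prop.multivalued:
        return values              # every occurrence of the element
    return values[0] if values else None  # first occurrence only

xml_text = "<record><DrName>Lee</DrName><Visit>3</Visit><Visit>7</Visit></record>"
dr_name = ContentProperty("DrName", "DrName")
visits = ContentProperty("Visits", "Visit", convert=int, multivalued=True)

print(extract(dr_name, xml_text))
print(extract(visits, xml_text))
```

The `convert` callable illustrates why a data type matters: the same text is treated as a string for `DrName` and as an integer for `Visit`.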
Each object has an index setting that the metadata query engine uses to determine whether to index custom metadata for the object. The metadata query engine always indexes object metadata and ACLs regardless of the index setting on an object.
Index settings do not affect Data Discovery Suite search facility indexing.
In addition to object data and metadata, the index maintained by the Data Discovery Suite search facility includes extracted metadata. Extracted metadata is metadata that’s specific to a document format. Examples of this type of metadata are the author and title of a stored document.
For a namespace to be searchable in the Search Console:
- The namespace must be indexed by the active search facility.
- The namespace must be configured to allow searches. This property of a namespace is separate from whether the namespace is indexed.
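The two conditions above, together with the requirement that an active search facility be selected, can be expressed as a single check. In this sketch the dictionary keys `indexed_by` and `search_enabled` and the facility labels are hypothetical names chosen for illustration.

```python
def is_searchable(namespace, active_facility):
    """Return True if the namespace can be searched in the Search Console."""
    if active_facility is None:
        return False  # no facility selected: searching is not supported
    return active_facility in namespace["indexed_by"] and namespace["search_enabled"]

personnel = {"indexed_by": {"metadata_query_engine"}, "search_enabled": True}
archive = {"indexed_by": {"metadata_query_engine"}, "search_enabled": False}

print(is_searchable(personnel, "metadata_query_engine"))
print(is_searchable(archive, "metadata_query_engine"))
```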