Hitachi Vantara Knowledge

Repository management


Repository management requires:

Maintaining the integrity and security of stored data

Ensuring the continuous availability of that data

Keeping the data in compliance with local regulations

Optimizing the use of storage and network bandwidth

HCP supports these requirements through:

The hardware and network configuration of the system

Software configuration options (both installation and runtime)

Automated processes

Individual namespace and object settings

Object storage and retrieval options

© 2015, 2019 Hitachi Vantara Corporation. All rights reserved.

Data integrity and security


HCP includes many features specifically designed to protect the integrity and ensure the security of stored data:

Write-once, read-many (WORM) storage — Once the data for an object is stored in the repository, HCP prevents that data from being modified or overwritten.

Node login prevention — HCP does not allow system-console logins on its nodes. This provides a basic level of protection not only for the stored data but also for the system software.

Secure Sockets Layer (SSL) — HCP can use SSL to ensure the privacy of HTTP and WebDAV access to namespaces. It always uses SSL to secure the Management and Search Consoles. Additionally, use of the HCP management API requires SSL.

For information on using SSL with HCP, see Managing domains and SSL server certificates.

Content verification service — Each object has a cryptographic hash value that’s calculated from the object data. The content verification service ensures the integrity of each object by periodically checking that its data still matches its hash value.

For more information on the content verification service, see Content verification service.
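The check the content verification service performs can be illustrated as a simple hash comparison. The sketch below uses Python's standard hashlib with SHA-256; the actual algorithms, scheduling, and repair behavior are internal to HCP:

```python
import hashlib

def verify_object(data: bytes, stored_hash: str, algorithm: str = "sha256") -> bool:
    """Recompute the object's hash and compare it to the value
    recorded when the object was ingested."""
    return hashlib.new(algorithm, data).hexdigest() == stored_hash

data = b"object content as stored in the repository"
stored = hashlib.sha256(data).hexdigest()  # recorded at ingest time

assert verify_object(data, stored)              # intact data passes
assert not verify_object(data + b"!", stored)   # any change is detected
```

Because the hash is derived from the object data alone, any corruption of that data, however small, causes the periodic check to fail and the object to be flagged for repair.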

Scavenging service — The scavenging service protects namespaces from the loss of system metadata. If the service encounters an object with invalid metadata, it restores the correct metadata by using a copy from another location.

For more information on the scavenging service, see Scavenging service.

Retention policy — Each object has a retention setting that specifies how long the object must remain in the repository before it can be deleted; this duration is called the retention period. HCP ensures that objects are kept until their retention periods expire. The only exception to this behavior occurs in namespaces in enterprise mode. In these namespaces, users with explicit permission to do so can delete objects that are under retention. Such deletions are recorded in the tenant log.

For more information on enterprise mode, see the description of retention mode in Regulatory compliance. For more information on the retention policy, see Retention policy.

Shredding policy — Objects can be marked for shredding. When such an object is deleted, HCP overwrites its storage location in such a way as to completely remove any trace that the object was there.

For more information on the shredding policy, see Shredding policy.

Data access authentication — The HTTP, S3 compatible, HSwift, WebDAV, and CIFS protocols can be configured to require authentication for access to an HCP namespace. If these are the only protocols enabled for the namespace, users and applications must present valid credentials for access to the namespace content.

HCP supports both local and remote authentication methods. For remote authentication, HCP supports Windows Active Directory® and RADIUS.

For more information on configuring namespace access protocols to require authentication, see Managing a Tenant and Its Namespaces. For information on local and remote authentication, see User authentication.
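For local authentication over HTTP, HCP's native Authorization scheme combines a base64-encoded username with an MD5 hash of the password. The sketch below builds such a header; the credentials are hypothetical, and the exact format should be verified against the REST documentation for your HCP release:

```python
import base64
import hashlib

def hcp_auth_header(username: str, password: str) -> dict:
    """Build HCP's native Authorization header: the scheme name 'HCP',
    a base64-encoded username, and an MD5 hex digest of the password."""
    user_b64 = base64.b64encode(username.encode("utf-8")).decode("ascii")
    pwd_md5 = hashlib.md5(password.encode("utf-8")).hexdigest()
    return {"Authorization": f"HCP {user_b64}:{pwd_md5}"}

# Hypothetical credentials, for illustration only.
headers = hcp_auth_header("lgreen", "p4ssw0rd")
assert headers["Authorization"].startswith("HCP ")
```

A client would attach this header to each HTTP request against a namespace that requires authentication.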

Data access permission masks — Data access permission masks determine which operations are allowed in a namespace. These masks are set at the system, tenant, and namespace levels. The effective permissions for a namespace are the operations that are allowed by the masks at all three levels.

For more information on data access permission masks, see Setting the systemwide permission mask.
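The three-level masking behaves like a set intersection: an operation is permitted in a namespace only if every level allows it. A minimal model, using illustrative operation names:

```python
# Illustrative operation names; HCP masks cover operations such as
# read, write, delete, purge, privileged, and search.
system_mask    = {"read", "write", "delete", "purge", "search"}
tenant_mask    = {"read", "write", "delete", "search"}
namespace_mask = {"read", "write", "search"}

# Effective permissions are the operations allowed at all three levels.
effective = system_mask & tenant_mask & namespace_mask
assert effective == {"read", "write", "search"}
```

Note that tightening the mask at any one level (for example, removing delete at the tenant level) is enough to disable that operation for the namespace.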

Data access permissions — Data access permissions determine which operations a user or application can perform on the objects in an HCP namespace. These permissions can be:

- Associated with a tenant-level user or group account, in which case they apply to all objects in the namespace

- Specified in the namespace configuration as the minimum permissions for authenticated or unauthenticated users, in which case they apply to all objects in the namespace

- Specified in an ACL, in which case they apply to the individual object for which the ACL is defined

For more information on data access permissions that apply to all objects in a namespace, see Managing a Tenant and Its Namespaces. For more information on ACLs, see Using a Namespace.

Virtual networking — Virtual networking is a technology that enables you to define multiple logical networks over which clients can communicate with HCP. You can assign different networks to different tenants, thereby segregating network traffic to and from the namespaces owned by one tenant from network traffic to and from the namespaces owned by other tenants. This segregation enhances the privacy and security of data transmitted between clients and the HCP system.

For more information on virtual networking, see Network administration.


Data availability


HCP has these features that help ensure the continuous availability of stored data:

Multipathing — In a SAIN system, a single node can connect to more than one port on a storage array, either directly or through multiple Fibre Channel switches. This creates multiple physical paths between the node and any given logical volume that maps to it. With this setup, if one component of a physical path connecting such a node to the array fails, the node still has access to the logical volume through another physical path.

Having multiple means of access to a logical volume from a single node is called multipathing.

Zero-copy failover — In a SAIN system, one node can automatically take over management of storage previously managed by another node that has failed. This process is called zero-copy failover.

To support zero-copy failover, each logical volume that stores objects or the metadata query engine index must map to two different storage nodes. The pair of nodes forms a set such that the volumes that map to one of the nodes also map to the other. This is called cross-mapping.

For more information on zero-copy failover and cross-mapping, see Zero-copy failover behavior.

Service plans — Each namespace has a service plan that defines both a data protection strategy and a storage tiering strategy for the objects in that namespace. At any given point in the lifecycle of an object, its data protection strategy specifies the number of copies of the object that must exist in the HCP repository and the type of storage on which each copy must be stored.

Because some types of storage are more highly available than others, you can use the service plan for a namespace to control both data redundancy and data availability for the objects in that namespace.

For more information on using service plans to define a data protection strategy for objects in a namespace, see About service plans.

Protection service — HCP uses the protection service to maintain the correct number of copies of each object in the HCP repository. When the number of existing copies of an object goes below the number of object copies specified in the applicable service plan (for example, because of a logical volume failure), the protection service automatically creates a new copy of that object in another location. When the number of existing copies of an object goes above the number of object copies specified in the applicable service plan, the protection service automatically deletes all unnecessary copies of that object.

For more information on the protection service, see Protection service.
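The protection service's decision for any one object can be sketched as a comparison of the existing copy count against the service-plan target. This is a simplification; the real service also chooses storage locations and types:

```python
def reconcile_copies(existing: int, target: int):
    """Sketch of the protection service's decision for one object:
    compare existing copies against the service-plan target."""
    if existing < target:
        # For example, after a logical volume failure: create replacements.
        return ("create", target - existing)
    if existing > target:
        # Too many copies: delete the unnecessary ones.
        return ("delete", existing - target)
    return ("none", 0)

assert reconcile_copies(1, 2) == ("create", 1)
assert reconcile_copies(4, 2) == ("delete", 2)
```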

Protection sets — To protect data availability against concurrent node failures, HCP stores multiple copies of each object on different nodes in an automatically predetermined set of nodes, called a protection set. If a node (or one of its logical volumes) fails, objects stored on its associated volumes (or on the failed volume) are still available through other nodes in the set.

For information on protection sets, see Ingest tier data protection level.

Geo-protection — You can create a configuration in which selected tenants and namespaces are maintained on two or more HCP systems and the objects in those namespaces are managed across those systems. This cross-system management helps ensure that data is well-protected against the unavailability or catastrophic failure of a system.

Typically, the systems involved in cross-system data management are in separate geographic locations and are connected by a high-speed wide area network. This arrangement provides geographically distributed data protection (called geo-protection).

Geo-protection enables HCP to support a cloud storage model, where any type of client request can be serviced equally by any system in the topology. Additionally, each system may be able to provide faster data access for some applications than the other systems can, depending on where the applications are running.

HCP supports two methods of geo-protection: whole-object protection and erasure-coded protection. Both methods are implemented by the replication service, which is responsible for the distribution of configuration information and either complete copies of object data or chunks for erasure-coded objects.

For more information on geo-protection, see Geographically distributed data protection.

Read from remote — If an object in a replicated HCP namespace or default-namespace directory is unavailable on one system in a replication topology (for example, because a node is unavailable), HCP can try to read the object from another system in the topology. HCP tries this only if the namespace has the read-from-remote feature enabled and the object has already been replicated.

For information on enabling the read-from-remote feature for a namespace, see Managing a Tenant and Its Namespaces or Managing the Default Tenant and Namespace.

Note: The read-from-remote feature does not affect the ability of HCP to read the data for metadata-only objects from another system or to restore or reconstruct erasure-coded objects.

Automatic redirection to other systems in a replication topology — HTTP requests for access to an unavailable HCP system can be automatically redirected to any other system in a replication topology in which the unavailable system participates. This means that the request can be satisfied without being reissued with a different URL.

For another system to satisfy the request, the target HCP namespace or default-namespace directory must be replicated to that system. Also, the namespace must be configured to accept requests directed to other HCP systems. Additionally, the DNS must be configured to support redirection between HCP systems in the replication topology, and the unavailable system must be configured to allow this redirection.

For information on:

- Configuring namespaces to accept requests directed to other systems, see Managing a Tenant and Its Namespaces or Managing the Default Tenant and Namespace

- Configuring HCP in your DNS, see Configuring DNS for HCP

- Configuring an HCP system to support redirection of client requests, see Replicating Tenants and Namespaces


Regulatory compliance


HCP includes features that enable you to comply with local regulations regarding data storage and maintenance:

Data privacy — At HCP installation time, you can choose to encrypt all data and metadata stored in the repository, thereby ensuring data privacy in a compliance context. Encryption prevents unauthorized users and applications from directly viewing namespace content. Lost or stolen storage devices are useless to parties without the correct encryption key.

HCP handles data encryption and decryption automatically, so no access or process changes are required.

Retention classes — Some government regulations require that certain types of data be kept for a specific length of time. For example, local law may require that medical records be kept for a specific number of years.

A retention class is a named duration that can be used as the retention setting for an object. When an object is assigned to a retention class, the object cannot be deleted until the specified length of time has elapsed since its creation date. For example, a retention class named HlthReg-107 could have a duration of 21 years. Objects assigned to that class then could not be deleted for 21 years after they were created.

For more information on retention classes, see Managing a Tenant and Its Namespaces or Managing the Default Tenant and Namespace.
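The HlthReg-107 example can be expressed as a small calculation. The class name and duration come from the example above; the whole-year arithmetic is a simplification of how HCP evaluates retention settings:

```python
from datetime import datetime

# Retention class from the example above: a 21-year duration.
RETENTION_CLASSES = {"HlthReg-107": 21}

def earliest_delete_year(created: datetime, retention_class: str) -> int:
    """Objects in the class cannot be deleted until this many years
    after their creation date (simplified to whole years)."""
    return created.year + RETENTION_CLASSES[retention_class]

# An object created in 2019 becomes deletable in 2040.
assert earliest_delete_year(datetime(2019, 6, 1), "HlthReg-107") == 2040
```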

Retention mode — A namespace can be created in either of two modes: enterprise or compliance. The retention mode determines which operations are allowed on objects that are under retention:

- In enterprise mode, users and applications can delete objects under retention if they have explicit permission to do so. This is called privileged delete (see below).

Also, in enterprise mode, authorized administrative users can delete retention classes and shorten retention class durations.

- In compliance mode, objects that are under retention cannot be deleted through any mechanism. Additionally, retention classes (see above) cannot be deleted, and retention class durations cannot be shortened.

Privileged delete — Some localities require that certain data be destroyed in response to changing circumstances. For example, companies may be required to destroy particular information about employees who leave.

Privileged delete is an HCP feature that enables authorized users to delete objects even if they are under retention. This feature is available only in namespaces that are in enterprise mode. In compliance mode, objects can never be deleted while they are under retention.

When performing a privileged delete operation, the user is required to specify a reason for the deletion. HCP logs each privileged delete operation along with its specified reason, thereby creating an audit trail.

For more information on privileged delete, see Managing a Tenant and Its Namespaces or Managing the Default Tenant and Namespace.

Retention hold — To support legal discovery, users and applications can place a hold on selected objects. While an object is on hold, it cannot be deleted through any mechanism, regardless of its retention setting.

For more information on retention hold, see Using a Namespace or Using the Default Namespace.


Storage usage optimization


HCP uses these features to reclaim and balance storage capacity:

Compression service — The compression service makes more efficient use of HCP storage by compressing object data, thereby freeing space for storing more objects.

For more information on the compression service, see Compression service.

Duplicate elimination service — A repository can contain multiple objects that have identical data but different metadata. When the duplicate elimination service finds such objects, it merges their data to free storage space occupied by all but one of the objects.

For more information on the duplicate elimination service, see Duplicate elimination service.
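Detecting duplicates by content can be sketched as grouping objects by a hash of their data; groups larger than one are candidates for merging. The object names are illustrative, and the merge mechanics of the real service are internal to HCP:

```python
import hashlib
from collections import defaultdict

# Objects with identical data but different names and metadata.
objects = {
    "finance/report.pdf":    b"identical bytes",
    "legal/report-copy.pdf": b"identical bytes",
    "hr/handbook.pdf":       b"different bytes",
}

# Group objects by a hash of their data.
groups = defaultdict(list)
for path, data in objects.items():
    groups[hashlib.sha256(data).hexdigest()].append(path)

# Groups larger than one can share a single stored copy of the data,
# freeing the space occupied by all but one of the objects.
duplicates = {h: paths for h, paths in groups.items() if len(paths) > 1}
assert len(duplicates) == 1
```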

Disposition service — The disposition service automatically deletes objects with expired retention periods. To be eligible for disposition, an object must have a retention setting that’s either a date in the past or a retention class with automatic deletion enabled and a calculated expiration date in the past.

For more information on the disposition service, see Disposition service. For more information on retention classes, see Managing a Tenant and Its Namespaces or Managing the Default Tenant and Namespace.

Version pruning — An HCP namespace can be configured to allow storage of multiple versions of objects. Version pruning is the automatic deletion of previous versions of an object that are older than a specified amount of time.

For more information on versioning and version pruning, see Managing a Tenant and Its Namespaces and Using a Namespace.

Garbage collection service — The garbage collection service reclaims storage space both by completing logical delete operations and by deleting objects left behind by incomplete transactions.

For more information on the garbage collection service, see Garbage collection service.

Capacity balancing service — The capacity balancing service ensures that the percentage of space used is roughly equivalent across all the storage nodes in the system. Balancing storage usage across the nodes helps HCP balance the processing load.

For more information on the capacity balancing service, see Capacity balancing service.

Service plans — Each namespace has a service plan that defines both a storage tiering strategy and a data protection strategy for the objects in that namespace. At any given point in the lifecycle of an object, its storage tiering strategy specifies the types of storage on which copies of that object must be stored and specifies the number of object copies that must be stored on each type of storage.

By default, throughout the lifecycle of an object, HCP stores that object only on primary running storage, which is storage that’s managed by the nodes in the HCP system and consists of continuously spinning disks. However, you can configure HCP to use other types of storage for tiering purposes.

Every service plan defines primary running storage as the initial storage tier, called the ingest tier. The default storage tiering strategy specifies only that tier.

Primary running storage is designed to provide both high data availability and high performance for object data storage and retrieval operations. To optimize data storage price/performance for the objects in a namespace, you can configure the service plan for that namespace to define a storage tiering strategy that specifies multiple storage tiers.

Storage tiering service — HCP uses the storage tiering service to maintain the correct number of copies of each object in a namespace on the storage tiers that are defined by the storage tiering strategy for that namespace. When the number of object copies on a storage tier goes below the number of object copies specified for that tier in the applicable service plan, the storage tiering service automatically creates a new copy of that object on that tier. When the number of copies of an object on a storage tier goes above the number of object copies specified for that tier in the applicable service plan, the storage tiering service automatically deletes all unnecessary copies of that object from that tier.

Primary spindown storage — On a SAIN system, HCP can be configured to use primary spindown storage, which is primary storage that consists of disks that can be spun down when not being accessed, for tiering purposes. You can then configure the service plan for any given namespace to define primary spindown storage as a storage tier for the objects in that namespace. Using primary spindown storage to store object data that’s accessed infrequently saves energy, thereby reducing the cost of storage.

HCP moves object data between primary running storage, primary spindown storage, and other types of storage that are used for tiering purposes according to rules that are specified in storage tiering strategies defined by service plans.

For more information on primary spindown storage, see Storage for HCP systems. For more information on service plans, see About service plans.

S Series storage — HCP can be configured to use S Series storage, which is storage on external HCP S Series Nodes that are separate from the HCP system. S Series Nodes are used for tiering purposes, and the HCP system communicates with them through the S3 compatible API and management API.

Extended storage — HCP can be configured to use extended storage, which is storage that’s managed by devices outside of the HCP system, for tiering purposes. HCP can be configured to use several types of extended storage, including:

- NFS — Volumes that are stored on extended storage devices and are accessed using NFS mount points

- Amazon S3 — Cloud storage that’s accessed using an Amazon Web Services user account

- Google Cloud — Cloud storage that’s accessed using a Google Cloud Platform user account

- Microsoft Azure — Cloud storage that’s accessed using a Microsoft Azure user account

- S3 compatible — Any physical storage device or cloud storage service that’s accessed using a protocol that’s compatible with the Amazon S3 API

Moving object data from primary storage to extended storage frees up HCP system storage space so that you can ingest additional objects.

Note: While all of the data for an object can be moved off primary running storage and stored only on extended storage, at least one copy of the system metadata, custom metadata, and ACL for that object must always remain on primary running storage.

In addition, you can optimize data storage price/performance for the objects in a namespace by configuring the service plan for that namespace to define a storage tiering strategy that defines storage tiers for multiple types of extended storage.

HCP moves object data between primary running storage, primary spindown storage (if it’s used), and one or more types of extended storage according to rules specified in the storage tiering strategies defined by service plans.

For more information on extended storage, see Extended storage components. For more information on service plans, see About service plans.

Erasure-coded protection — Erasure-coded protection is a method of geo-protection in which the data for each object in a replicated namespace is subject to erasure coding. With erasure coding, the data is encoded and broken into multiple chunks that are then stored across multiple HCP systems. All but one of the chunks contain object data; the remaining chunk contains parity for the object data.

With erasure-coded protection, each system stores one data or parity chunk for any given erasure-coded object. The size of each chunk for an object is the size of the object data divided by the number of data chunks for the object. This means that the total storage used for an object in a replicated namespace is at most the size of a chunk times the total number of data and parity chunks for the object. (Storage usage can be less due to compression and duplicate elimination.)

For whole-object protection (the other method of geo-protection) to provide the same level of data protection as erasure-coded protection provides, at least two systems must each store all the data for each object in a replicated namespace. With two systems, the total storage used for each object is at most two times the size of the object data, which is greater than the total storage used when the same object is erasure coded. This is true regardless of the number of systems across which the chunks for the erasure-coded object are distributed.

Additionally, with erasure-coded protection, the storage footprint on any individual system that stores chunks for objects is smaller than the storage footprint resulting from storing complete object data on that system.
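The storage arithmetic above can be checked with a short calculation. The 100 GB object size and four-system topology are illustrative:

```python
def ec_storage(object_size: float, systems: int) -> float:
    """Total storage for one erasure-coded object distributed across
    `systems` HCP systems: systems - 1 data chunks plus 1 parity chunk,
    each chunk sized object_size / (systems - 1)."""
    chunk_size = object_size / (systems - 1)
    return chunk_size * systems

# A 100 GB object across 4 systems uses chunks of ~33.3 GB and
# ~133.3 GB in total, versus 200 GB for whole-object protection
# with complete copies on 2 systems.
assert round(ec_storage(100, 4), 1) == 133.3
assert ec_storage(100, 4) < 2 * 100
```

As the number of systems grows, the per-object overhead shrinks toward the size of a single parity chunk, which is why erasure coding uses less total storage than keeping two complete copies.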

For more information on erasure-coded protection, see Protection types.

Metadata-only objects — With multiple HCP systems participating in a replication topology, you may not need to store object data in every system. A metadata-only object is one from which HCP has removed the data, leaving the system metadata, custom metadata, and ACL for the object in place. HCP makes an object metadata-only only if at least one copy of the object data exists elsewhere in the topology.

Metadata-only objects enable some systems in a replication topology to have a smaller storage footprint than other systems, even when the same namespaces are replicated to all systems in the topology.

HCP makes objects metadata-only according to the rules specified in service plans. If the rules change, HCP can restore data to the objects to meet the new requirements.

For more information on metadata-only objects, see Making objects metadata-only. For more information on service plans, see About service plans.


Network bandwidth usage optimization


HCP offers these features to help maximize network throughput and reduce the use of network bandwidth by read and write operations:

Link aggregation — Each node in an HCP system has two bonded ports for connecting to the front-end network. When using a single front-end switch, you can take advantage of this setup by using two cables per node to connect to the switch and configuring both HCP and the applicable ports on the switch for active-active (802.3ad) bonding. The redundant ports and cables help ensure a high-availability connection to the front-end network, and the active-active bonding allows for increased network throughput.

10Gb Ethernet connectivity — Optionally, for SAIN systems, HCP supports 10Gb Ethernet connectivity to the front-end network. The 10GbE network interface allows for greater network throughput than does the 1GbE interface option.

Systems with the 10GbE network interface on the front end also use 10GbE for the back-end network. This enables the HCP nodes to transmit data among themselves at a rate that supports the higher front-end throughput.

Compressed data transmission — Clients that use the HTTP protocol to communicate with HCP can reduce network bandwidth usage by sending and receiving data in a compressed format. Before sending data to HCP, the client uses the publicly available gzip utility to compress the data. Upon receiving the data, HCP uncompresses it automatically before storing it. When requested to do so, HCP uses gzip to compress data before sending it to the client. The client then uses gunzip to uncompress the data.

For more information on compressed data transmission, see Using a Namespace and Using the Default Namespace.
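The client-side compression round trip can be sketched with Python's standard gzip module. The HTTP request itself is omitted; in practice the PUT or GET would also carry a Content-Encoding: gzip header:

```python
import gzip

payload = b"log line with repeated content\n" * 500

# Client side: compress before sending the PUT.
compressed = gzip.compress(payload)
assert len(compressed) < len(payload)   # less data crosses the network

# Receiving side: decompress to recover the original data, as HCP
# does on ingest and as the client does on a compressed GET.
assert gzip.decompress(compressed) == payload
```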

Combined data and custom metadata on reads and writes — Clients that use the HTTP protocol for namespace access can store or retrieve both the data and custom metadata for an object with a single request. Issuing a single request instead of separate requests for the data and custom metadata reduces the network load.

This feature can be used in conjunction with compressed data transmission to further reduce the network load.

For more information on combining data and custom metadata on reads and writes, see Using a Namespace and Using the Default Namespace.
