Repository management requires:
- Maintaining the integrity and security of stored data
- Ensuring the continuous availability of that data
- Keeping the data in compliance with local regulations
- Optimizing the use of storage and network bandwidth
HCP supports these requirements through:
- The hardware and network configuration of the system
- Software configuration options (both installation and runtime)
- Automated processes
- Individual namespace and object settings
- Object storage and retrieval options
Data integrity and security
HCP includes many features specifically designed to protect the integrity and ensure the security of stored data:
Write-once, read-many (WORM) storage
After the data for an object is stored in the repository, HCP prevents that data from being modified or overwritten.
Node login prevention
HCP does not allow system-console logins on its nodes. This provides a basic level of protection not only for the stored data but also for the system software.
Secure Sockets Layer (SSL)
HCP can use SSL to ensure the privacy of HTTP and WebDAV access to namespaces. It always uses SSL to secure the Management and Search Consoles. Additionally, use of the HCP management API requires SSL.
Content Verification service
Each object has a cryptographic hash value that’s calculated from the object data. The Content Verification service ensures the integrity of each object by periodically checking that its data still matches its hash value.
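The periodic check can be sketched in a few lines of Python. This is a minimal illustration, not HCP's implementation. SHA-256 is assumed as the hash algorithm. The in-memory dictionaries and the function names (`compute_hash`, `verify_object`, `verification_pass`) are hypothetical stand-ins for the repository and its recorded hash values.

```python
import hashlib

def compute_hash(data: bytes, algorithm: str = "sha256") -> str:
    """Return the hex digest of object data (SHA-256 is an assumption)."""
    return hashlib.new(algorithm, data).hexdigest()

def verify_object(data: bytes, stored_hash: str, algorithm: str = "sha256") -> bool:
    """True if the data still matches the hash value recorded at ingest."""
    return compute_hash(data, algorithm) == stored_hash

def verification_pass(repository: dict, stored_hashes: dict) -> list:
    """One periodic pass over a hypothetical repository (name -> data).

    Returns the names of objects whose data no longer matches its hash value.
    """
    return [
        name
        for name, data in repository.items()
        if not verify_object(data, stored_hashes[name])
    ]

# Usage: record hashes at ingest, then simulate silent corruption of one object.
repo = {"a.txt": b"alpha", "b.txt": b"bravo"}
hashes = {name: compute_hash(data) for name, data in repo.items()}
repo["b.txt"] = b"brav0"
print(verification_pass(repo, hashes))  # ['b.txt']
```

The sketch only reports mismatches. It does not model what HCP does about an object that fails the check.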
Scavenging service
The Scavenging service protects namespaces from the loss of system metadata. If the service encounters an object with invalid metadata, it restores the correct metadata by using a copy from another location.
Each object has a retention setting that specifies how long the object must remain in the repository before it can be deleted; this duration is called the retention period. HCP ensures that objects are kept until their retention periods expire. The only exception to this behavior occurs in namespaces in enterprise mode. In these namespaces, users with explicit permission to do so can delete objects that are under retention. Such deletions are recorded in the tenant log.
Shredding
Objects can be marked for shredding. When such an object is deleted, HCP overwrites its storage location in such a way as to completely remove any trace that the object was there.
Data access authentication
The HTTP, S3 compatible, WebDAV, and CIFS protocols can be configured to require authentication for access to an HCP namespace. If these are the only protocols enabled for the namespace, users and applications must present valid credentials for access to the namespace content. HCP supports both local and remote authentication methods. For remote authentication, HCP supports Windows Active Directory® and RADIUS.
Data access permission masks
Data access permissions determine which operations a user or application can perform on the objects in an HCP namespace. These permissions can be:
- Associated with a tenant-level user or group account, in which case they apply to all objects in the namespace
- Specified in the namespace configuration as the minimum permissions for authenticated or unauthenticated users, in which case they apply to all objects in the namespace
- Specified in an ACL, in which case they apply to the individual object for which the ACL is defined
Virtual networking
Virtual networking is a technology that enables you to define multiple logical networks over which clients can communicate with HCP. You can assign different networks to different tenants, thereby separating the network traffic to and from one tenant’s namespaces from the traffic to and from other tenants’ namespaces. This segregation enhances the privacy and security of data transmitted between clients and the HCP system.
Data availability
HCP has these features that help ensure the continuous availability of stored data:
Multipathing
In a SAIN system, a single node can connect to more than one port on a storage system, either directly or through multiple Fibre Channel switches. This creates multiple physical paths between the node and any given logical volume that maps to it. With this setup, if one component of a physical path connecting such a node to the storage system fails, the node still has access to the logical volume through another physical path. Multiple means of access to a logical volume from a single node is called multipathing.
Zero-copy failover
In a SAIN system, one node can automatically take over management of storage previously managed by another node that has failed. This process is called zero-copy failover. To support zero-copy failover, each logical volume that stores objects or the metadata query engine index must map to two different storage nodes. The pair of nodes forms a set such that the volumes that map to one of the nodes also map to the other. This is called cross-mapping.
Each namespace has a service plan that defines both a data protection strategy and a storage tiering strategy for the objects in that namespace. At any given point in the lifecycle of an object, its data protection strategy specifies the number of copies of the object that must exist in the HCP repository and the type of storage on which each copy must be stored. Because some types of storage are more highly available than others, you can use the service plan for a namespace to control both data redundancy and data availability for the objects in that namespace.
Protection service
HCP uses the Protection service to maintain the correct number of copies of each object in the HCP repository. When the number of existing copies of an object goes below the number of object copies specified in the applicable service plan (for example, because of a logical volume failure), the Protection service automatically creates a new copy of that object in another location. When the number of existing copies of an object goes above the number of object copies specified in the applicable service plan, the Protection service automatically deletes all unnecessary copies of that object.
To protect data availability against concurrent node failures, HCP stores multiple copies of each object on different nodes in an automatically predetermined set of nodes, called a protection set. If a node (or one of its logical volumes) fails, objects stored on its associated volumes (or on the failed volume) are still available through other nodes in the set.
You can create a configuration in which selected tenants and namespaces are maintained on two or more HCP systems and the objects in those namespaces are managed across those systems. This cross-system management helps ensure that data is well-protected against the unavailability or catastrophic failure of a system.
Typically, the systems involved in cross-system data management are in separate geographic locations and are connected by a high-speed wide area network. This arrangement provides geographically distributed data protection (called geo-protection).
Geo-protection enables HCP to support a cloud storage model, where any type of client request can be serviced equally by any system in the topology. Additionally, each system may be able to provide faster data access for some applications than the other systems can, depending on where the applications are running.
HCP supports two methods of geo-protection: whole-object protection and erasure-coded protection. Both methods are implemented by the Replication service, which is responsible for the distribution of configuration information and either complete copies of object data or chunks for erasure-coded objects.
Read from remote
If an object in a replicated HCP namespace or default-namespace directory is unavailable on one system in a replication topology (for example, because a node is unavailable), HCP can try to read the object from another system in the topology. HCP tries this only if the namespace has the read-from-remote feature enabled and the object has already been replicated.
Note: The read-from-remote feature does not affect the ability of HCP to read the data for metadata-only objects from another system or to restore or reconstruct erasure-coded objects.
Automatic redirection to other systems in a replication topology
HTTP requests for access to an unavailable HCP system can be automatically redirected to any other system in a replication topology in which the unavailable system participates. This means that the client does not need to reissue the request with a different URL for the request to be satisfied.
For another system to satisfy the request, the target HCP namespace or default-namespace directory must be replicated to that system. Also, the namespace must be configured to accept requests directed to other HCP systems. Additionally, the DNS must be configured to support redirection between HCP systems in the replication topology, and the unavailable system must be configured to allow this redirection.
Data compliance
HCP includes features that enable you to comply with local regulations regarding data storage and maintenance:
At HCP installation time, you can choose to encrypt all data and metadata stored in the repository, thereby ensuring data privacy in a compliance context. Encryption prevents unauthorized users and applications from directly viewing namespace content. Lost or stolen storage devices are useless to parties without the correct encryption key.
HCP handles data encryption and decryption automatically, so no access or process changes are required.
Some government regulations require that certain types of data be kept for a specific length of time. For example, local law may require that medical records be kept for a specific number of years.
A retention class is a named duration that can be used as the retention setting for an object. When an object is assigned to a retention class, the object cannot be deleted until the specified length of time has elapsed since its creation date. For example, a retention class named HlthReg-107 could have a duration of 21 years. Objects assigned to that class could then not be deleted for 21 years after they were created.
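The retention-class arithmetic can be sketched in Python. This is a hypothetical model, not HCP's implementation: it works at day granularity, treats an object as deletable from its expiration date onward, and maps February 29 to February 28 in non-leap target years. The class table and the function names are illustrative only.

```python
from datetime import date

# Hypothetical retention-class table: class name -> duration in whole years.
RETENTION_CLASSES = {"HlthReg-107": 21}

def add_years(d: date, years: int) -> date:
    """Add calendar years; Feb 29 maps to Feb 28 in non-leap target years (assumed)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)

def retention_expires(created: date, retention_class: str) -> date:
    """Date on which an object assigned to the class leaves retention."""
    return add_years(created, RETENTION_CLASSES[retention_class])

def deletable(created: date, retention_class: str, today: date) -> bool:
    """True once the retention period measured from the creation date has elapsed."""
    return today >= retention_expires(created, retention_class)

print(retention_expires(date(2024, 2, 29), "HlthReg-107"))  # 2045-02-28
```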
A namespace can be created in either of two retention modes: enterprise or compliance. The retention mode determines which operations are allowed on objects that are under retention:
- In enterprise mode, users and applications can delete objects under retention if they have explicit permission to do so. This is called privileged delete (see below).
Also, in enterprise mode, authorized administrative users can delete retention classes and shorten retention class durations.
- In compliance mode, objects that are under retention cannot be deleted through any mechanism. Additionally, retention classes (see above) cannot be deleted, and retention class durations cannot be shortened.
Some localities require that certain data be destroyed in response to changing circumstances. For example, companies may be required to destroy particular information about employees who leave.
Privileged delete is an HCP feature that enables authorized users to delete objects even if they are under retention. This feature is available only in namespaces that are in enterprise mode. In compliance mode, objects can never be deleted while they are under retention.
When performing a privileged delete operation, the user must specify a reason for the deletion. HCP logs each privileged delete operation along with its specified reason, thereby creating an audit trail.
To support legal discovery, users and applications can place a hold on selected objects. While an object is on hold, it cannot be deleted through any mechanism, regardless of its retention setting.
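The interaction between holds, retention modes, and privileged delete can be summarized as a decision function. This Python sketch is a hypothetical model of the rules stated above, not HCP code. The `privileged` flag stands for "the caller has explicit permission and requested a privileged delete", the `audit_log` list stands in for HCP's log, and ordinary delete permissions are not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class StoredObject:
    name: str
    under_retention: bool
    on_hold: bool = False

# Hypothetical audit trail of privileged deletes: (object name, reason).
audit_log: List[Tuple[str, str]] = []

def may_delete(obj: StoredObject, mode: str, privileged: bool = False,
               reason: Optional[str] = None) -> bool:
    """Apply the hold, retention-mode, and privileged-delete rules."""
    if mode not in ("enterprise", "compliance"):
        raise ValueError(f"unknown retention mode: {mode}")
    if obj.on_hold:
        return False  # a hold blocks deletion through any mechanism
    if not obj.under_retention:
        return True  # retention has expired, so an ordinary delete is allowed
    if mode == "compliance":
        return False  # nothing deletes an object under retention in compliance mode
    if privileged and reason:
        audit_log.append((obj.name, reason))  # log the privileged delete with its reason
        return True
    return False  # enterprise mode, but no privileged request or no reason given
```

Checking the hold first reflects the rule that a hold overrides every other setting, including privileged delete.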
Storage usage optimization
HCP uses a number of features to reclaim and balance storage capacity.
Compression/Encryption service
The Compression/Encryption service makes more efficient use of HCP storage by compressing object data, thereby freeing space for storing more objects.
Duplicate Elimination service
A repository can contain multiple objects that have identical data but different metadata. When the Duplicate Elimination service finds such objects, it merges their data to free storage space occupied by all but one of the objects.
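The merge can be pictured as content-addressed storage: keep one copy of each distinct payload and let every object keep its own metadata plus a reference to that copy. The Python below is a hypothetical sketch of that idea, not HCP's algorithm. SHA-256 is assumed as the fingerprint, and bytes are compared before merging so that a digest match alone is never trusted.

```python
import hashlib

def eliminate_duplicates(objects: dict):
    """Merge identical object data.

    objects maps a name to (data, metadata). Returns (store, index, bytes_freed):
    store keeps one copy of each distinct payload keyed by its digest, and index
    maps each object name to (digest, metadata), so metadata stays per object.
    """
    store = {}
    index = {}
    bytes_freed = 0
    for name, (data, metadata) in objects.items():
        digest = hashlib.sha256(data).hexdigest()
        if digest in store:
            # Confirm byte-for-byte equality before merging.
            if store[digest] != data:
                raise ValueError("digest collision: data differs")
            bytes_freed += len(data)
        else:
            store[digest] = data
        index[name] = (digest, metadata)
    return store, index, bytes_freed

objs = {
    "scan-1.tif": (b"same image bytes", {"patient": "A"}),
    "scan-2.tif": (b"same image bytes", {"patient": "B"}),
    "notes.txt": (b"different", {}),
}
store, index, freed = eliminate_duplicates(objs)
print(len(store), freed)  # 2 16
```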
Disposition service
The Disposition service automatically deletes objects with expired retention periods. To be eligible for disposition, an object must have a retention setting that’s either a date in the past or a retention class with automatic deletion enabled and a calculated expiration date in the past.
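The eligibility rule can be written as a small predicate. This Python sketch models only the two cases named above. The `RetentionClass` type is hypothetical, the calculated expiration is assumed to be the creation time plus the class duration, and any other retention value is treated as not eligible.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass(frozen=True)
class RetentionClass:
    name: str
    duration: timedelta
    auto_delete: bool

def eligible_for_disposition(setting, created: datetime, now: datetime) -> bool:
    """setting is an explicit retention datetime or a RetentionClass.

    Any other value stands in for retention settings that are neither of
    these and is treated as not eligible.
    """
    if isinstance(setting, datetime):
        # Explicit retention date: eligible once the date is in the past.
        return setting < now
    if isinstance(setting, RetentionClass):
        # Class: automatic deletion must be enabled and the calculated date past.
        return setting.auto_delete and created + setting.duration < now
    return False

yearly = RetentionClass("Yearly", timedelta(days=365), auto_delete=True)
print(eligible_for_disposition(yearly, datetime(2028, 6, 1), datetime(2030, 1, 1)))  # True
```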
Version pruning
An HCP namespace can be configured to allow storage of multiple versions of objects. Version pruning is the automatic deletion of previous versions of an object that are older than a specified amount of time.
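A pruning pass over one object's version history might look like the following Python sketch. It assumes that a version's age is measured from its creation time and that the current version is never pruned. Both assumptions, and the `(version_id, created)` tuple layout, are illustrative rather than taken from HCP.

```python
from datetime import datetime, timedelta

def prune_versions(versions, now: datetime, keep_for: timedelta):
    """Split an object's versions into (kept, pruned).

    versions is a list of (version_id, created) ordered oldest to newest; the
    last entry is the current version and is never pruned. A previous version
    is pruned when it was created before now - keep_for.
    """
    if not versions:
        return [], []
    *previous, current = versions
    cutoff = now - keep_for
    pruned = [v for v in previous if v[1] < cutoff]
    kept = [v for v in previous if v[1] >= cutoff] + [current]
    return kept, pruned
```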
Garbage Collection service
The Garbage Collection service reclaims storage space both by completing logical delete operations and by deleting objects left behind by incomplete transactions.
Capacity Balancing service
The Capacity Balancing service ensures that the percentage of space used is roughly equivalent across all the storage nodes in the system. Balancing storage usage across the nodes helps HCP balance the processing load.
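One way to express the balancing goal: compute the system-wide fraction of space used, then work out how far each node is above or below it. This Python sketch only computes such a plan. HCP's actual balancing algorithm is not described here, and the node names and the `balancing_plan` function are hypothetical.

```python
def balancing_plan(nodes: dict) -> dict:
    """nodes maps a node name to (used_bytes, capacity_bytes).

    Returns, per node, the bytes to move off (positive) or onto (negative)
    that node so that every node reaches the system-wide percentage of space
    used. The values sum to zero.
    """
    total_used = sum(used for used, _ in nodes.values())
    total_capacity = sum(cap for _, cap in nodes.values())
    target_fraction = total_used / total_capacity
    return {
        name: used - target_fraction * cap
        for name, (used, cap) in nodes.items()
    }

print(balancing_plan({"node1": (60, 200), "node2": (90, 100)}))
# {'node1': -40.0, 'node2': 40.0}
```

Balancing on the percentage of space used, rather than on absolute bytes, lets nodes with different capacities each end up equally full.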
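Storage tiering
At any given point in the lifecycle of an object, the storage tiering strategy in the service plan for its namespace specifies the types of storage on which copies of that object must be stored and the number of object copies that must be stored on each type of storage.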
By default, throughout the lifecycle of an object, HCP stores that object only on primary running storage, which is storage that’s managed by the nodes in the HCP system and consists of continuously spinning disks. However, you can configure HCP to use other types of storage for tiering purposes.
Every service plan defines primary running storage as the initial storage tier, called the ingest tier. The default storage tiering strategy specifies only that tier.
Primary running storage is designed to provide both high data availability and high performance for object data storage and retrieval operations. To optimize data storage price/performance for the objects in a namespace, you can configure the service plan for that namespace to define a storage tiering strategy that specifies multiple storage tiers.
Storage Tiering service
HCP uses the Storage Tiering service to maintain the correct number of copies of each object in a namespace on the storage tiers that are defined by the storage tiering strategy for that namespace. When the number of object copies on a storage tier goes below the number of object copies specified for that tier in the applicable service plan, the Storage Tiering service automatically creates a new copy of that object on that tier. When the number of copies of an object on a storage tier goes above the number of object copies specified for that tier in the applicable service plan, the Storage Tiering service automatically deletes all unnecessary copies of that object from that tier.
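The per-tier comparison can be sketched as a reconciliation between actual and required copy counts. This Python example is a hypothetical model, not the Storage Tiering service itself. Tier names such as `primary-running` are illustrative labels, and a tier missing from either mapping is treated as having zero copies.

```python
def reconcile_tiers(current_copies: dict, required_copies: dict) -> dict:
    """Compare copy counts per storage tier.

    Both arguments map a tier name to a number of copies. Returns tier -> delta,
    where a positive delta means copies to create on that tier and a negative
    delta means copies to delete. Tiers that already match are omitted.
    """
    actions = {}
    for tier in sorted(set(current_copies) | set(required_copies)):
        delta = required_copies.get(tier, 0) - current_copies.get(tier, 0)
        if delta:
            actions[tier] = delta
    return actions

print(reconcile_tiers({"primary-running": 2},
                      {"primary-running": 1, "primary-spindown": 1}))
# {'primary-running': -1, 'primary-spindown': 1}
```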
Primary spindown storage
On a SAIN system, HCP can be configured to use primary spindown storage, which is primary storage that consists of disks that can be spun down when not being accessed, for tiering purposes. You can then configure the service plan for any given namespace to define primary spindown storage as a storage tier for the objects in that namespace. Using primary spindown storage to store object data that’s accessed infrequently saves energy, thereby reducing the cost of storage.
HCP moves object data between primary running storage, primary spindown storage, and other types of storage that are used for tiering purposes according to rules that are specified in storage tiering strategies defined by service plans.
HCP S Series storage
HCP can be configured to use S Series storage, which is storage on external HCP S Series Nodes that are separate from the HCP system. S Series Nodes are used for tiering purposes, and the HCP system communicates with them through the S3 compatible API and management API.
Extended storage
HCP can be configured to use extended storage, which is storage that’s managed by devices outside of the HCP system, for tiering purposes. HCP supports the following types of extended storage:
- Volumes that are stored on extended storage devices and are accessed using NFS mount points
- Cloud storage that is accessed using an Amazon Web Services user account
- Cloud storage that is accessed using a Google Cloud Platform user account
- Cloud storage that is accessed using a Microsoft Azure user account
- Any physical storage device or cloud storage service accessed using a protocol that is compatible with the Amazon S3 API
- S3 compatible cloud storage that is accessed using a ThinkOn cloud user account
Moving object data from primary storage to extended storage frees up HCP system storage space so that you can ingest additional objects.
Note: While all of the data for an object can be moved off primary running storage and stored only on extended storage, at least one copy of the system metadata, custom metadata, and ACL for that object must always remain on primary running storage.
In addition, you can optimize data storage price/performance for the objects in a namespace by configuring the service plan for that namespace to define a storage tiering strategy that defines storage tiers for multiple types of extended storage.
HCP moves object data between primary running storage, primary spindown storage (if it is used), and one or more types of extended storage according to rules specified in the storage tiering strategies defined by service plans.
Erasure-coded protection
Erasure-coded protection is a method of geo-protection where the data for each object in a replicated namespace is subject to erasure coding. With erasure coding, the data is encoded and broken into multiple chunks that are then stored across multiple HCP systems. All but one chunk contains object data. The other chunk contains parity for the object data.
With erasure-coded protection, each system stores one data or parity chunk for any given erasure-coded object. The size of each chunk for an object is the size of the object data divided by the number of data chunks for the object. This means that the total storage used for an object in a replicated namespace is at most the size of a chunk times the total number of data and parity chunks for the object. (Storage usage can be less due to compression and duplicate elimination.)
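The simplest code that matches this layout (several data chunks plus one parity chunk) is XOR parity. The Python sketch below uses it to show splitting, parity, and rebuilding after one chunk is lost. HCP's actual coding scheme is not described here, so treat XOR, the zero padding, and the `encode`/`decode` names as assumptions for illustration.

```python
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length chunks."""
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data: bytes, data_chunks: int) -> list:
    """Split data into equal (zero-padded) data chunks plus one XOR parity chunk."""
    chunk_size = -(-len(data) // data_chunks)  # ceiling division
    padded = data.ljust(chunk_size * data_chunks, b"\0")
    chunks = [padded[i * chunk_size:(i + 1) * chunk_size] for i in range(data_chunks)]
    return chunks + [reduce(xor_bytes, chunks)]

def decode(chunks: list, original_size: int) -> bytes:
    """Rebuild the data when at most one chunk (data or parity) is None."""
    missing = [i for i, c in enumerate(chunks) if c is None]
    if len(missing) > 1:
        raise ValueError("single parity tolerates only one missing chunk")
    if missing:
        chunks = list(chunks)
        # XOR of all surviving chunks equals the missing one.
        chunks[missing[0]] = reduce(xor_bytes, [c for c in chunks if c is not None])
    return b"".join(chunks[:-1])[:original_size]

data = b"erasure coding demo"          # 19 bytes
chunks = encode(data, 3)               # 3 data chunks + 1 parity chunk
print([len(c) for c in chunks])        # [7, 7, 7, 7]
```

Here the four chunks total 28 bytes, one chunk per system across four systems, against 38 bytes for two complete copies of the 19-byte object.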
For whole-object protection (the other method of geo-protection) to provide the same level of data protection as erasure-coded protection provides, at least two systems must each store all the data for each object in a replicated namespace. With two systems, the total storage used for each object is at most two times the size of the object data, which is greater than the total storage used when the same object is erasure coded. This is true regardless of the number of systems across which the chunks for the erasure-coded object are distributed.
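The comparison in the two paragraphs above can be stated as a short bound. Let S be the size of the object data and k the number of data chunks, so the object is spread across k + 1 systems. The bound assumes k ≥ 2 (at least three systems); with k = 1, both methods use 2S. Compression and duplicate elimination, which can only lower these figures, are ignored.

```latex
% S = object data size, k = number of data chunks (plus one parity chunk)
E_{\text{erasure}} \le \frac{S}{k}\,(k+1) = S\left(1 + \frac{1}{k}\right) \le \tfrac{3}{2}\,S
  \;<\; 2S = E_{\text{whole}} \qquad (k \ge 2)

% Per-system footprint for one object:
\frac{S}{k} \;(\text{erasure-coded chunk}) \quad\text{versus}\quad S \;(\text{complete copy})
```

For example, with four systems (k = 3), an erasure-coded object uses at most about 1.33 S in total and S/3 on each system.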
Additionally, with erasure-coded protection, the storage footprint on any individual system that stores chunks for objects is smaller than the storage footprint resulting from storing complete object data on that system.
Metadata-only objects
With multiple HCP systems participating in a replication topology, you may not need to store object data in every system. A metadata-only object is one from which HCP has removed the data, leaving the system metadata, custom metadata, and ACL for the object in place. HCP makes an object metadata-only only if at least one copy of the object data exists elsewhere in the topology.
Metadata-only objects enable some systems in a replication topology to have a smaller storage footprint than other systems, even when the same namespaces are replicated to all systems in the topology.
HCP makes objects metadata-only according to the rules specified in service plans. If the rules change, HCP can restore data to the objects to meet the new requirements.
Network bandwidth usage optimization
HCP offers these features to help maximize network throughput and reduce the use of network bandwidth by read and write operations:
Each node in an HCP system has two bonded ports for connecting to the front-end network. When using a single front-end switch, you can take advantage of this setup by using two cables per node to connect to the switch and configuring both HCP and the applicable ports on the switch for active-active (802.3ad) bonding. The redundant ports and cables help ensure a high-availability connection to the front-end network, and the active-active bonding allows for increased network throughput.
10Gb Ethernet connectivity
Optionally, for SAIN systems, HCP supports 10Gb Ethernet connectivity to the front-end network. The 10GbE network interface allows for greater network throughput than does the 1GbE interface option.
Systems with the 10GbE network interface on the front end also use 10GbE for the back-end network. This enables the HCP nodes to transmit data among themselves at a rate that supports the higher front-end throughput.
Compressed data transmission
Clients that use the HTTP protocol to communicate with HCP can reduce network bandwidth usage by sending and receiving data in a compressed format. Before sending data to HCP, the client uses the publicly available gzip utility to compress the data. Upon receiving the data, HCP uncompresses it automatically before storing it. When requested to do so, HCP uses gzip to compress data before sending it to the client. The client then uses gunzip to uncompress the data.
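A client-side sketch in Python shows the pattern. Python's `gzip` module produces the same format as the gzip utility. The requests are only built, not sent. The namespace URL is hypothetical. The use of the standard `Content-Encoding: gzip` and `Accept-Encoding: gzip` headers to signal compression is our assumption and should be confirmed against the HCP HTTP reference.

```python
import gzip
import urllib.request

# Hypothetical namespace URL; substitute your own tenant, namespace, and domain.
OBJECT_URL = "https://ns1.tenant1.hcp.example.com/rest/docs/report.txt"

def compressed_put(url: str, data: bytes) -> urllib.request.Request:
    """Build (without sending) a PUT whose body the client has gzip-compressed."""
    return urllib.request.Request(
        url,
        data=gzip.compress(data),
        method="PUT",
        headers={"Content-Encoding": "gzip"},  # assumed signal that the body is gzip data
    )

def compressed_get(url: str) -> urllib.request.Request:
    """Build (without sending) a GET that asks for a gzip-compressed response."""
    return urllib.request.Request(url, method="GET", headers={"Accept-Encoding": "gzip"})

def response_payload(raw: bytes, content_encoding: str) -> bytes:
    """Uncompress a response body if the server compressed it (the gunzip step)."""
    return gzip.decompress(raw) if content_encoding == "gzip" else raw
```

Repetitive text compresses well, so the savings are largest for logs, CSV files, and similar data. Already-compressed formats such as JPEG gain little.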
Combined data and custom metadata on reads and writes
Clients that use the HTTP protocol for namespace access can store or retrieve both the data and custom metadata for an object with a single request. Issuing a single request instead of separate requests for the data and custom metadata reduces the network load.
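A single combined request needs a way to tell where the data ends and the custom metadata begins. The Python sketch below assumes the convention we understand the HCP REST API to use: a `type=whole-object` query parameter, a body consisting of the data followed by the custom metadata, and an `X-HCP-Size` header giving the data length. Treat these names as assumptions to verify against the HCP HTTP reference. The helper functions are hypothetical.

```python
from urllib.parse import urlencode

def whole_object_put(base_url: str, data: bytes, custom_metadata: bytes):
    """Return (url, headers, body) for storing data and custom metadata in one request.

    Assumed convention: the body is the object data immediately followed by the
    custom metadata, and a size header tells the server where the data ends.
    """
    url = f"{base_url}?{urlencode({'type': 'whole-object'})}"
    headers = {"X-HCP-Size": str(len(data))}
    return url, headers, data + custom_metadata

def split_whole_object(body: bytes, data_size: int):
    """Split a combined response body into (data, custom_metadata)."""
    return body[:data_size], body[data_size:]
```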
This feature can be used in conjunction with compressed data transmission to further reduce the network load.