Content Verification service
When an object is created, HCP uses cryptographic hash algorithms to calculate various hash values for it. These values, which are generated from the object data, system metadata, and custom metadata, are stored with the primary metadata for the object.
One of the hash values that is generated only from the object data is also stored with the secondary metadata for the object. The cryptographic hash algorithm HCP uses to calculate this hash value is namespace dependent. The algorithm is set when the namespace is created and cannot be changed afterward.
Users and applications can see, but not modify, hash values generated from object data and annotations. They cannot see any other hash values.
For the purpose of content verification, HCP treats the following items as individual objects:
- Parts of multipart objects
- Parts of in-progress multipart uploads
- Chunks for erasure-coded objects
- Chunks for erasure-coded parts of multipart objects
The Content Verification service ensures the integrity of each object by:
- Checking that the object data, system metadata, and custom metadata still match the stored cryptographic hash values
- Ensuring that certain secondary metadata other than the hash value matches the primary metadata for the object
The Content Verification service runs according to the active service schedule.
During content verification, HCP also attempts to repair any files that HCP S Series Nodes report as irreparable.
HCP supports these cryptographic hash algorithms for selection at the namespace level:
- MD5
- SHA-1
- SHA-256
- SHA-384
- SHA-512
- RIPEMD-160
The more complex the hash algorithm, the greater the impact on performance when objects are stored or when services run.
Content Verification service processing
The Content Verification service has two main functions: detecting corrupted data and discrepancies in metadata, and repairing that data and metadata.
To detect corrupted data, the Content Verification service regenerates the cryptographic hash values for each object. After regenerating the hash values, the Content Verification service checks that these regenerated values match the corresponding values in the primary metadata.
The Content Verification service detects metadata discrepancies by checking that certain secondary metadata for each object matches the primary metadata for the object.
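A violation occurs when either check fails: a regenerated hash value does not match the corresponding value in the primary metadata, or the checked secondary metadata does not match the primary metadata. Violations of the second type (metadata discrepancies) are not reported in the system log.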
When an object is stored through the CIFS or NFS protocol, its primary metadata does not initially include cryptographic hash values that are based on the object data. HCP waits several minutes to ensure that the object content is complete before calculating these values. Large objects stored through these protocols may take longer to get hash values than smaller objects do.
If the Content Verification service encounters primary metadata without hash values, it adds the regenerated values to it.
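The hash check described above, including the case of primary metadata that has no hash values yet, can be sketched in a few lines of Python. This is only an illustration and not HCP's implementation: the `HASHLIB_NAMES` mapping, the `check_object` function, and the `data_hash` metadata key are hypothetical names chosen for the example. RIPEMD-160 is left out because its availability in Python's `hashlib` depends on the local OpenSSL build.

```python
import hashlib

# Hypothetical mapping from the algorithm names listed in this section to
# Python hashlib names. RIPEMD-160 is omitted because hashlib support for it
# depends on the local OpenSSL build.
HASHLIB_NAMES = {
    "MD5": "md5",
    "SHA-1": "sha1",
    "SHA-256": "sha256",
    "SHA-384": "sha384",
    "SHA-512": "sha512",
}


def regenerate_hash(data: bytes, algorithm: str) -> str:
    """Recompute the hex digest of the object data with the namespace algorithm."""
    return hashlib.new(HASHLIB_NAMES[algorithm], data).hexdigest()


def check_object(data: bytes, algorithm: str, primary_metadata: dict) -> str:
    """Return 'added', 'ok', or 'violation' for one object (illustrative only)."""
    regenerated = regenerate_hash(data, algorithm)
    stored = primary_metadata.get("data_hash")  # hypothetical metadata key
    if stored is None:
        # No hash value yet (for example, a recent CIFS or NFS write):
        # add the regenerated value to the primary metadata.
        primary_metadata["data_hash"] = regenerated
        return "added"
    return "ok" if stored == regenerated else "violation"
```

Calling `check_object` twice on unchanged data returns `"added"` and then `"ok"`. If the data changes after the hash value is stored, the function returns `"violation"`.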
If the Content Verification service finds a discrepancy between a cryptographic hash value it regenerates for the object and the corresponding hash value in the primary metadata, it creates a new copy of the object from an existing good copy and marks the corrupted copy for deletion.
If replication is in effect and the Content Verification service cannot find a good copy of the object in the current repository, it can repair the object by using a copy from another HCP system in the replication topology.
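The order in which a repair source is chosen, local copies first and then copies on other systems in the replication topology, might look like the following sketch. It assumes SHA-256 as the namespace algorithm and models each copy as a byte string. The function names `is_good` and `pick_repair_source` are hypothetical and do not correspond to any HCP interface.

```python
import hashlib
from typing import Iterable, Optional, Tuple


def is_good(copy: bytes, expected_sha256: str) -> bool:
    """A copy is good when its regenerated hash matches the stored hash."""
    return hashlib.sha256(copy).hexdigest() == expected_sha256


def pick_repair_source(
    local_copies: Iterable[bytes],
    replica_copies: Iterable[bytes],
    expected_sha256: str,
) -> Optional[Tuple[str, bytes]]:
    """Return (origin, data) for the first good copy, or None if there is none.

    Local copies are tried before copies from other systems in the
    replication topology, mirroring the order described in the text.
    """
    for origin, copies in (("local", local_copies), ("replica", replica_copies)):
        for copy in copies:
            if is_good(copy, expected_sha256):
                return origin, copy
    return None
```

A result of `None` corresponds to the case in which the service cannot repair the violation.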
To repair a chunk for an erasure-coded object, the Content Verification service recalculates the chunk either by using a full copy of the object data, if one exists on another system in the replication topology, or by using the chunks for the object on all the other systems in the replication topology.
If the Content Verification service finds a discrepancy between other secondary metadata for the object and the corresponding primary metadata, it uses the primary metadata to replace the secondary metadata.
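As a rough illustration of this rule, the sketch below treats primary and secondary metadata as dictionaries and overwrites any checked secondary field that disagrees with the primary value. The function name `reconcile_secondary` and the `checked_keys` parameter are hypothetical. The text does not say which secondary metadata fields are compared, so the caller supplies them.

```python
from typing import Dict, Iterable, List


def reconcile_secondary(
    primary: Dict[str, str],
    secondary: Dict[str, str],
    checked_keys: Iterable[str],
) -> List[str]:
    """Overwrite mismatched secondary fields with the primary values.

    Returns the list of keys that were repaired. Primary metadata is
    treated as authoritative, as described in the text.
    """
    repaired = []
    for key in checked_keys:
        if secondary.get(key) != primary[key]:
            secondary[key] = primary[key]
            repaired.append(key)
    return repaired
```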
When the Content Verification service cannot repair a violation, it marks the object as either unavailable or irreparable:
- An object is unavailable if all of these are true:
  - At least one copy of the object is unavailable because of a node, logical volume, or extended storage device being unavailable.
  - None of the available copies of the object are good.
  - Either the namespace that contains the object is not being replicated, or all copies of the object data on other systems in the replication topology are either inaccessible or not good.
- An object is irreparable if all of these are true:
  - All of the primary storage volumes, NFS volumes, and extended storage devices on which copies of the object data are stored are available.
  - None of the copies of the object data are good.
  - Either the namespace that contains the object is not being replicated, or all copies of the object data on other systems in the replication topology are either inaccessible or not good.
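The two sets of conditions above differ only in whether every storage location that holds a copy is available. The following sketch expresses that decision as code. It is a simplified model under stated assumptions, not HCP logic: the `Copy` class, the `classify` function, and the `good_remote_copy` flag (true when an accessible, good copy exists on another system in the replication topology) are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Copy:
    storage_available: bool  # node, volume, or device holding the copy is reachable
    good: bool = False       # copy matches its hash; only meaningful when available


def classify(local_copies: List[Copy], replicated: bool, good_remote_copy: bool) -> str:
    """Return 'repairable', 'unavailable', or 'irreparable' for one object."""
    # A good, reachable local copy means the object can be repaired locally.
    if any(c.storage_available and c.good for c in local_copies):
        return "repairable"
    # Otherwise a good, accessible copy on another system can be used,
    # but only if the namespace is being replicated.
    if replicated and good_remote_copy:
        return "repairable"
    # No good copy anywhere: the outcome depends on storage availability.
    if any(not c.storage_available for c in local_copies):
        return "unavailable"
    return "irreparable"
```

In this model, an unavailable object may still turn out to be repairable once the missing node, volume, or device comes back, whereas an irreparable object has no unexamined copies left.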
Configuring the Content Verification service
The Content Verification service regenerates cryptographic hash values to detect object corruption. Because this regeneration adds load to the system, you can limit or disable it.
You might want to limit or disable hash-value regeneration in these circumstances:
- In a namespace that is not being replicated, has a service plan that sets the ingest tier DPL to 1, and does not define any additional storage tiers, only one copy of each object exists. Therefore, if the Content Verification service discovers a discrepancy in the cryptographic hash values for an object, it cannot repair the object from another copy.
  You can choose to have the Content Verification service regenerate hash values only for objects that it could repair, if needed. With this option, the service does not regenerate hash values for objects in a namespace if HCP is configured to maintain only one copy of each object in that namespace.
  Although the service cannot repair corrupt objects in this situation, it can report them. For this reason, if performance is not an issue, you might want to keep hash-value regeneration enabled for all objects.
- When the load on the system is high, temporarily disabling all hash-value regeneration can provide some relief.
To change the Content Verification mode:
- In the top-level menu of the System Management Console, click . The Content Verification page opens.
- Select the applicable Content Verification Mode option:
  - To configure the Content Verification service to regenerate hash values for all objects stored in the repository, regardless of the number of copies of each object that HCP must maintain in the repository, select Check all objects and repair if needed.
  - To configure the Content Verification service to regenerate hash values for a given object only when HCP is required to maintain multiple copies of that object in the repository, select Check only objects that can be repaired and repair if needed.
  - To disable the hash-value regeneration function, select Do not check and repair objects.
- Click Update Settings.
If you selected the second or third Content Verification Mode option, a confirmation message is displayed. To confirm that you understand the consequences of your action, select I understand. Then click Update Settings.
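The effect of the three Content Verification Mode options on hash-value regeneration can be summarized in a small sketch. The enum values reuse the option labels from the console. The `Mode` class, the `regenerates_hashes` function, and the `copies_maintained` parameter (a simplified stand-in for the number of copies HCP must maintain for an object) are hypothetical names used only for illustration.

```python
from enum import Enum


class Mode(Enum):
    CHECK_ALL = "Check all objects and repair if needed"
    CHECK_REPAIRABLE = "Check only objects that can be repaired and repair if needed"
    DISABLED = "Do not check and repair objects"


def regenerates_hashes(mode: Mode, copies_maintained: int) -> bool:
    """Return True if hash values would be regenerated for an object."""
    if mode is Mode.DISABLED:
        # Hash-value regeneration is turned off for every object.
        return False
    if mode is Mode.CHECK_REPAIRABLE:
        # Only objects with more than one maintained copy can be repaired.
        return copies_maintained > 1
    # CHECK_ALL: regenerate regardless of the number of copies.
    return True
```

For example, with the second option selected, an object in a namespace that maintains only one copy is skipped, so corruption in that object is neither repaired nor reported.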