Hitachi Vantara Knowledge

Duplicate Elimination service

Duplicate elimination is the process of merging the data associated with two or more identical objects. For objects to be identical, their data content must match exactly. By eliminating duplicates, HCP increases the amount of space available for storing additional objects.

For example, if the same document is added to several different directories, duplicate elimination ensures that each copy of the document content that HCP must maintain in the repository is stored in only one location. This saves the space that would have been used by the additional copies of the document.

For the purpose of duplicate elimination, HCP treats these as individual objects:

  • Parts of multipart objects
  • Chunks for erasure-coded objects
  • Chunks for erasure-coded parts of multipart objects
  • Full copies of the data for objects and parts that are subject to erasure coding before those copies are reduced to chunks

The Duplicate Elimination service does not merge parts of in-progress multipart uploads, parts of a multipart upload that have been replaced, parts of an aborted multipart upload, or unused parts of completed multipart uploads.

The Duplicate Elimination service runs according to the active service schedule.

Note

The Duplicate Elimination service does not eliminate duplicate objects stored in namespaces that use service plans that have S Series storage devices set as the ingest tier.

Duplicate Elimination service processing

HCP performs duplicate elimination by first sorting objects, parts, and chunks according to their MD5 hash values. After sorting all the objects, parts, and chunks in the repository, the service checks for objects, parts, and chunks with the same hash value. If the service finds any, it compares the object, part, or chunk content. If the content is the same, the service merges the object, part, or chunk data but still maintains the required number of copies of the data that’s specified in the service plan for the namespace that contains the object, part, or chunk.

The metadata for each merged object, part, or chunk points to the merged object, part, or chunk data. The Duplicate Elimination service never deletes any of the metadata for duplicate objects, parts, or chunks.
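The sort, compare, and merge steps described above can be illustrated with a minimal Python sketch. It is not HCP's implementation: the function name `plan_merges`, the in-memory dictionary of item IDs to content, and the returned pointer map are all hypothetical, and the sketch ignores the number of copies required by the service plan.

```python
import hashlib
from itertools import groupby


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of the data as a hex string."""
    return hashlib.md5(data).hexdigest()


def plan_merges(items: dict[str, bytes]) -> dict[str, str]:
    """Map every item ID to the ID whose data it should point to.

    `items` is a hypothetical stand-in for objects, parts, and chunks:
    it maps an item ID to that item's content.
    """
    # Step 1: sort by MD5 digest so items with equal digests are adjacent.
    ordered = sorted(items, key=lambda item_id: (md5_hex(items[item_id]), item_id))

    pointers: dict[str, str] = {}
    # Step 2: walk each run of items that share a digest.
    for _, run in groupby(ordered, key=lambda item_id: md5_hex(items[item_id])):
        keepers: list[str] = []  # IDs whose data is retained within this run
        for item_id in run:
            # Step 3: an equal digest is only a hint, so compare the content.
            match = next((k for k in keepers if items[k] == items[item_id]), None)
            if match is None:
                keepers.append(item_id)
                match = item_id
            # Metadata is never dropped: every ID keeps an entry pointing at data.
            pointers[item_id] = match
    return pointers
```

For example, `plan_merges({"a": b"x", "b": b"x", "c": b"y"})` maps both `"a"` and `"b"` to `"a"` and maps `"c"` to itself, so all three metadata entries survive while only two copies of data remain.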

The following figure shows duplicate elimination for two objects with the same content where the DPL is two.

[Figure: Duplicate Elimination screenshot]

These considerations apply:

  • The Duplicate Elimination service does not merge objects, parts, and chunks smaller than seven KB.
  • The Duplicate Elimination service does not merge the data for chunks with the data for objects and parts that are not erasure coded.
  • If the Duplicate Elimination service merges the data for a whole object that is subject to erasure coding and then merges the data for the applicable chunks after the object is erasure coded, only the merge of the whole object data is included in the duplicate elimination statistics.
  • The Duplicate Elimination service does not merge data that is stored on extended storage.
  • For objects, parts, and chunks stored on primary running storage, the Duplicate Elimination service generally merges objects, parts, and chunks from different namespaces only if the namespaces have the same ingest tier DPL.
  • For objects, parts, and chunks stored on primary spindown storage, the Duplicate Elimination service generally merges objects, parts, and chunks from different namespaces only if the namespaces have the same primary spindown storage tier DPL.
  • For the purpose of duplicate elimination, HCP considers an object, part, or chunk stored on extended storage to have a DPL that is one less than the ingest tier DPL that’s specified in the service plan for the namespace that contains the object, part, or chunk. So, for example, the Duplicate Elimination service will merge objects, parts, and chunks stored on primary running storage in a namespace that has an ingest tier DPL of 1 with objects stored on extended storage in a namespace that has an ingest tier DPL of 2.
  • The Duplicate Elimination service may bypass merging certain objects until it reprocesses them. This can happen with:
    • Objects stored with CIFS or NFS that are still open because of lazy close
    • Objects stored with CIFS or NFS that do not immediately have MD5 hash values
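The size threshold and the DPL comparisons in the list above can be sketched as follows. This is an illustration only: the function names and storage labels are hypothetical, "seven KB" is assumed to mean 7 × 1024 bytes, and the sketch does not model the erasure-coding rules, the extended storage restriction, the deferred CIFS/NFS cases, or the exceptions implied by "generally".

```python
from typing import Optional

# Assumption: "seven KB" is read as 7 * 1024 bytes.
MIN_MERGE_BYTES = 7 * 1024


def large_enough(size_bytes: int) -> bool:
    """Items smaller than the threshold are not merged."""
    return size_bytes >= MIN_MERGE_BYTES


def effective_dpl(storage: str, ingest_dpl: int,
                  spindown_dpl: Optional[int] = None) -> int:
    """DPL used when comparing items from different namespaces.

    `storage` uses hypothetical labels: "primary_running",
    "primary_spindown", or "extended".
    """
    if storage == "primary_running":
        return ingest_dpl
    if storage == "primary_spindown":
        if spindown_dpl is None:
            raise ValueError("spindown_dpl is required for primary spindown storage")
        return spindown_dpl
    if storage == "extended":
        # Treated as one less than the ingest tier DPL.
        return ingest_dpl - 1
    raise ValueError(f"unknown storage label: {storage}")


def dpls_match(dpl_a: int, dpl_b: int) -> bool:
    """Cross-namespace merging generally requires equal effective DPLs."""
    return dpl_a == dpl_b
```

With these helpers, the example from the text works out as expected: an item on primary running storage with an ingest tier DPL of 1 and an item on extended storage with an ingest tier DPL of 2 both have an effective DPL of 1.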

Understanding the Duplicate Elimination page

The Duplicate Elimination page in the HCP System Management Console shows statistics about duplicate-eliminated objects, parts, and chunks.

Note

To view the Duplicate Elimination Status panel, you need the monitor or administrator role.

To display the Duplicate Elimination page, in the top-level menu of the System Management Console, select Services > Duplicate Elimination.

The Duplicate Elimination page shows:

  • Total objects and object parts merged

    The total number of these items for which data was merged since HCP was installed: objects, parts of multipart objects, chunks for erasure-coded objects, and chunks for erasure-coded parts of multipart objects.

  • Total bytes saved from duplicate elimination

    The total number of bytes of storage freed due to duplicate elimination since HCP was installed.

    The amount of storage freed by a merge is the size of the data, times one less than the number of objects, parts, and chunks merged, times the total number of copies that HCP needs to maintain on primary storage to comply with the ingest tier DPL and primary spindown storage DPL (if applicable) specified in the applicable service plans and to satisfy all protection set requirements.

HCP increases both of these numbers when duplicate data is deleted but does not subtract from these numbers when duplicate-eliminated objects are deleted from the repository.
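The calculation of storage freed by a single merge can be written out as a short Python sketch. The function and parameter names are hypothetical, and `copies_required` stands in for the total number of copies HCP must keep on primary storage, which in practice depends on the service plans and protection sets involved.

```python
def bytes_saved(data_size: int, items_merged: int, copies_required: int) -> int:
    """Storage freed by one merge, following the formula in the text.

    data_size       -- size in bytes of the duplicated data
    items_merged    -- number of objects, parts, and chunks merged together
    copies_required -- copies HCP must maintain on primary storage (assumed input)
    """
    if items_merged < 1 or copies_required < 1 or data_size < 0:
        raise ValueError("arguments must describe at least one item and one copy")
    # One set of copies is kept; the other (items_merged - 1) sets are freed.
    return data_size * (items_merged - 1) * copies_required
```

For instance, merging three identical 10,000-byte objects where two copies must be maintained frees 10,000 × (3 − 1) × 2 = 40,000 bytes, which is what `bytes_saved(10_000, 3, 2)` returns.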

 
