Hitachi Vantara Knowledge

Protection service

The Protection service ensures the stability of the repository by maintaining a specified level of data redundancy, called the data protection level (DPL), for each object in the repository throughout the entire object lifecycle. The DPL for an object is the number of copies of the object data that HCP must maintain.

For the purpose of data protection, HCP treats these as individual objects:

  • Parts of multipart objects
  • Parts of in-progress multipart uploads
  • Chunks for erasure-coded objects
  • Chunks for erasure-coded parts of multipart objects

Each namespace has a service plan that defines both a storage tiering strategy and a data protection strategy for the objects in that namespace. For all objects in a given namespace, the storage tiering strategy defines one or more types of storage as tiers. The data protection strategy specifies the DPL that’s applied to the objects that are stored on each tier.

At any given point in the lifecycle of an object, the data protection strategy specifies the number of copies of the object that must exist in the HCP repository and the storage tier on which each copy must be stored.

HCP initially stores all object data in either primary running storage or S Series storage and all metadata on primary running storage. Therefore, the service plan for a namespace must always define either primary running storage or S Series storage as the initial storage tier, called the ingest tier, and must specify both the data protection level and the metadata protection level (MPL) for that tier.

For each object in a given namespace, the ingest tier DPL is the number of copies of the object data that HCP must maintain on primary running storage or S Series storage, as applicable, from the time the object is first stored in the repository until the time the object data is moved to another storage tier. The ingest tier MPL is the number of copies of the object metadata that HCP must maintain on primary running storage for as long as the object exists in the repository.

On SAIN and VM systems, by default, the ingest tier DPL and MPL are both set to one. On RAIN systems, by default, the ingest tier DPL and MPL are both set to two. At any time, you can modify the service plan for a namespace to set the ingest tier DPL and MPL for that namespace.

For any given namespace, you can assign a service plan that will give the namespace a DPL setting of one (supported on SAIN and VM systems only), two, three, or four. You can also set the ingest tier MPL to one, two, three, or four. However, the ingest tier MPL for a namespace must be equal to the ingest tier DPL for that namespace.
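
The rules above can be expressed as a small check. This is only an illustrative sketch: the function name, the system-type strings, and the message texts are hypothetical and are not part of any HCP interface.

```python
# Hypothetical helper; not an HCP API. It encodes only the rules stated above.
SYSTEM_TYPES_ALLOWING_DPL_1 = {"SAIN", "VM"}


def validate_ingest_tier(system_type: str, dpl: int, mpl: int) -> list:
    """Return a list of rule violations; an empty list means the settings are allowed."""
    problems = []
    if dpl not in (1, 2, 3, 4):
        problems.append("DPL must be 1, 2, 3, or 4")
    if mpl not in (1, 2, 3, 4):
        problems.append("MPL must be 1, 2, 3, or 4")
    if dpl == 1 and system_type not in SYSTEM_TYPES_ALLOWING_DPL_1:
        problems.append("DPL 1 is supported on SAIN and VM systems only")
    if mpl != dpl:
        problems.append("ingest tier MPL must equal ingest tier DPL")
    return problems
```

For example, validate_ingest_tier("RAIN", 2, 2) returns an empty list, while a DPL of 1 on a RAIN system is reported as a problem.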

HCP uses the Protection service to maintain the correct number of copies of each object in the HCP repository. When the number of existing copies of an object goes below the number of object copies specified in the applicable service plan (for example, because of a logical volume failure), the Protection service automatically creates a new copy of that object in another location. When the number of existing copies of an object goes above the number of object copies specified in the applicable service plan, the Protection service automatically deletes all unnecessary copies of that object.

The Protection service runs according to the active service schedule and in response to certain events.

Ingest tier data protection level

Each namespace has a service plan that defines one or more storage tiers for that namespace and specifies the data protection level (DPL) that’s applied to the objects that are stored on each tier.

Note: For the purpose of DPL, HCP treats parts of multipart objects, parts of multipart uploads, chunks for erasure-coded objects, and chunks for erasure-coded parts of multipart objects as individual objects.

Every service plan defines primary running storage or S Series storage as the initial storage tier, called the ingest tier, and specifies a DPL setting and an MPL setting for that tier.

For each object in a given namespace, the ingest tier DPL is the number of copies of the object data that HCP must maintain on primary running storage or S Series storage, as applicable, from the time the object is first stored in the repository until the time the object data is moved to one or more other storage tiers (if multiple storage tiers are defined for the namespace). The ingest tier MPL is the number of copies of the object metadata that HCP must maintain on primary running storage for as long as the object exists in the repository.

In the default namespace, each directory also has an ingest tier DPL setting. This setting is the same as the ingest tier DPL setting that’s specified in the service plan that’s assigned to the default namespace.

The ingest tier DPL for a namespace affects the amount of storage that’s used when data is added to that namespace. With an ingest tier DPL of 1, HCP creates only one copy of the object data on primary running storage or S Series storage, as applicable. With an ingest tier DPL of 2, HCP creates two copies, thereby using twice as much storage.
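
As a rough illustration of that arithmetic, the sketch below multiplies the object data size by the ingest tier DPL. It deliberately ignores metadata, compression, and duplicate elimination, and the function name is hypothetical.

```python
def ingest_tier_storage_bytes(object_size_bytes: int, dpl: int) -> int:
    """Approximate storage used by the data copies of one object on the ingest tier.

    Assumption: every copy occupies the full object size (no compression,
    no duplicate elimination, metadata not counted).
    """
    if dpl not in (1, 2, 3, 4):
        raise ValueError("DPL must be 1, 2, 3, or 4")
    return object_size_bytes * dpl
```

Under these assumptions, a 10 GiB object uses about 10 GiB at DPL 1 and about 20 GiB at DPL 2.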

For both objects and directories, the ingest tier DPL setting is stored as metadata. Users and applications can see, but not modify, this metadata.

Note: When the ingest tier DPL of a namespace changes, for each object in that namespace that’s stored on the ingest tier, HCP creates or deletes copies of the object data, as needed to satisfy the new ingest tier DPL. This can take some time, during which some objects have the old required number of copies and some have the new. When viewing object metadata, however, users and applications always see the intended number of copies (that is, the ingest tier DPL specified in the service plan for the namespace).

Protection sets

HCP groups storage nodes into protection sets with the same number of nodes in each set. To improve reliability in the case of multiple component failures, HCP tries to store all the copies of the data for an object that exist on primary running storage or primary spindown storage on nodes in a single protection set. Each copy is stored on a logical volume associated with a different node.

HCP creates protection sets for each possible ingest tier DPL setting that can be specified in a service plan. For example, if an HCP system has six nodes, it creates three groups of protection sets:

  • One group of six protection sets with one node in each set (for DPL 1)
  • One group of three protection sets with two nodes in each set (for DPL 2)
  • One group of two protection sets with three nodes in each set (for DPL 3)

For each object in a given namespace, to store copies of the object data on primary running storage, HCP uses the group of protection sets that corresponds to the ingest tier DPL setting that’s specified in the service plan for the namespace. To store copies of the object data on primary spindown storage (if it’s used), HCP uses the group of protection sets that corresponds to the primary spindown storage tier DPL setting.

The nodes in a protection set are not necessarily all associated with the same amount of storage. If the total number of storage nodes in the system is not evenly divisible by a DPL setting, HCP can use the storage associated with the extra nodes as standby storage. At any time, HCP can add standby storage to any existing protection set that requires additional storage to balance available storage capacity among its nodes.
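
The grouping can be sketched as follows. The source does not say how HCP chooses which nodes go into which set, so the sequential grouping here is an assumption made only for illustration, and the function and node names are hypothetical.

```python
def build_protection_sets(nodes: list, dpl: int) -> tuple:
    """Split nodes into protection sets of `dpl` nodes each.

    Returns (protection_sets, extra_nodes). Nodes left over when the node
    count is not evenly divisible by the DPL are returned as extra nodes,
    whose storage could serve as standby storage.
    Assumption: nodes are grouped in list order.
    """
    if dpl < 1:
        raise ValueError("DPL must be at least 1")
    set_count = len(nodes) // dpl
    protection_sets = [nodes[i * dpl:(i + 1) * dpl] for i in range(set_count)]
    extra_nodes = nodes[set_count * dpl:]
    return protection_sets, extra_nodes
```

With six nodes this yields six, three, and two sets for DPL 1, 2, and 3, matching the list above. With five nodes and DPL 2, it yields two sets and one extra node.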

The Protection service is responsible for checking and repairing protection sets. If a node in a protection set fails and the system includes an extra node, the service creates a new protection set that includes all the healthy nodes in the original protection set and the extra node.

Note: Regardless of whether HCP uses the storage associated with a node that’s not in a protection set, the node itself runs all the HCP software and performs all the same functions as the nodes in protection sets.

Data availability

When HCP needs to maintain multiple copies of the data for an object on primary running storage or on primary spindown storage, HCP stores each copy of the object data on storage that’s managed by a different node. All but one of these copies can become unavailable without affecting access to the object.

Copies of object data become unavailable on primary running storage or primary spindown storage when HCP detects an improperly functioning logical volume or corrupted or missing data. Copies of the object data also become unavailable if the nodes that provide access to those copies become unavailable. A data outage occurs when all the nodes that provide access to all the copies of the data for an object fail.
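
A minimal sketch of the node-failure case: an object stays accessible as long as at least one node holding a copy is available. The function and node names are hypothetical, and the sketch does not model volume failures or corrupted data.

```python
def object_data_accessible(copy_nodes: list, available_nodes: set) -> bool:
    """Return True if at least one node that holds a copy is available.

    Assumption: each entry in copy_nodes is the node that manages one copy,
    and a copy is usable whenever its node is available.
    """
    return any(node in available_nodes for node in copy_nodes)
```

With copies on n1 and n2, losing n1 alone leaves the object accessible; losing both corresponds to the data outage described above.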

Protection service processing

The Protection service has two main functions: detecting protection violations and repairing those violations.

Detecting protection violations

To detect protection violations, the Protection service checks that for each object in a given namespace, at any given point in the object lifecycle:

  • The total number of existing copies of object data is equal to the total number of copies of object data that are currently required to exist on all of the storage tiers defined for the namespace by its service plan
  • If copies of the object data are stored on primary running storage or primary spindown storage:
    • Each copy of the object data is stored on a different node
    • All copies of the object data are stored in the same protection set
    • Each copy of the object data is accessible

A violation occurs when any one of these conditions is not true.
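
The checks in the list above can be sketched as a single function. This is a simplified model, not the service's implementation: it considers only copies on primary running or primary spindown storage for one tier, and the Copy type and message strings are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    node: str            # node that manages the copy
    protection_set: int  # hypothetical identifier of the protection set
    accessible: bool     # whether the copy can currently be read


def find_violations(copies: list, required_copies: int) -> list:
    """Return the protection violations for one object; empty means compliant."""
    violations = []
    if len(copies) != required_copies:
        violations.append("copy count does not match the service plan")
    if len({c.node for c in copies}) != len(copies):
        violations.append("two or more copies share a node")
    if len({c.protection_set for c in copies}) > 1:
        violations.append("copies span more than one protection set")
    if not all(c.accessible for c in copies):
        violations.append("at least one copy is inaccessible")
    return violations
```

Two accessible copies on different nodes of the same protection set, with a required count of two, produce no violations.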

Repairing protection violations

The Protection service can repair certain protection violations for an object, usually by relying on other good copies of the object data stored in the HCP repository.

For each object in a given namespace, at any given point in the object lifecycle:

  • If the total number of existing copies of the object data is less than the total required number of copies that’s specified in the namespace service plan (for example, because of a logical volume failure on primary running storage), then on each storage tier that’s defined for the namespace, the Protection service creates the number of copies of the object data that’s required to bring the object into compliance with the namespace service plan.
    • If one or more copies of the object data are supposed to be stored on a tier that’s currently inaccessible (for example, due to a failed network connection), but rehydration is enabled for that tier, the Protection service creates an extra copy of the object data on primary running storage.
    • For objects stored on primary storage, if the repository contains fewer than the required number of copies of the object data for a set of duplicate-eliminated objects, then for each object, the Protection service creates enough additional copies of the object data on primary storage to:
      • Satisfy the ingest tier DPL and, if applicable, the primary spindown storage tier DPL specified in the service plan for the namespace that contains the object
      • Comply with the protection set requirements for the applicable ingest tier and primary spindown storage tier DPL settings

        The Duplicate Elimination service then merges the object data again the next time it runs.

  • If the total number of existing copies of the object data is greater than the total required number of copies that’s specified in the namespace service plan, then the Protection service deletes the correct number of copies of the object data from each storage tier in order to bring the object into compliance with the namespace service plan.

    An object can have an extra copy of its data if the object was rehydrated after a read from primary spindown storage (if it’s used) or from any extended storage tier that’s defined for the namespace that contains the object. Copies of objects on primary running storage that are supposed to be metadata-only can have data if they were rehydrated after a read from a remote system. The Protection service marks rehydrated object data for deletion only after the rehydration keep time has expired and only if another copy of the data exists.

    The Protection service may determine that it should mark object data on primary spindown storage or on extended storage for deletion when a rehydrated copy of that data exists on primary running storage. In this case, before marking the copy on primary spindown storage or extended storage for deletion, the Protection service checks the service plan for the applicable namespace to determine whether the object is supposed to be moved back onto the applicable storage tier. If the object is supposed to be moved back onto the applicable storage tier, the Protection service doesn’t mark the copy that’s currently on that storage tier for deletion.

  • On primary storage, if two copies of the data for an object are stored on the same node, the Protection service creates a new copy on a different node and marks the extra one in the first location for deletion.
  • On primary running storage, primary spindown storage, or NFS storage, if a logical volume has a copy of the secondary metadata for an object but no copy of the object data with that metadata, the Protection service creates a replacement copy of the object data on that volume.

    If replication is in effect and the Protection service cannot find a copy of the object data on the current system, it can repair the object by using a copy from another HCP system in the replication topology.

    To repair a chunk for an erasure-coded object, the Protection service recalculates the chunk either by using a full copy of the object data, if one exists on another system in the replication topology, or by using the chunks for the object on all the other systems in the replication topology.

  • For an object that’s stored on primary running storage or primary spindown storage, if fewer than the required number of copies of the object data are accessible on the nodes in a protection set, the Protection service first tries to increase the number of copies stored on those nodes. If the Protection service cannot create all the required copies of the object data on the nodes in the protection set (for example, because a node is unavailable), the service tries to put the required number of copies on the nodes in a different protection set. If the service cannot put all required copies of the object data on nodes in the same protection set, the service stores the copies on different nodes in different protection sets.

Unavailable and irreparable objects

When the Protection service cannot repair a violation, it marks the object as either unavailable or irreparable:

  • An object is unavailable if all of these are true:
    • At least one copy of the object data is unavailable due to a node, logical volume, or extended storage device being unavailable.
    • None of the available copies of the object data are good.
    • Either the namespace that contains the object is not being replicated, or all copies of the object data on other systems in the replication topology are either inaccessible or not good.
  • An object is irreparable if all of these are true:
    • All of the primary storage volumes, NFS volumes, and extended storage devices on which copies of the object data are stored are available.
    • None of the copies of the object data are good.
    • Either the namespace that contains the object is not being replicated, or all copies of the object data on other systems in the replication topology are either inaccessible or not good.
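
The two definitions differ only in whether some storage that holds a copy is itself unavailable. The sketch below encodes that distinction under stated assumptions: the LocalCopy type and the flag names are hypothetical, and good_remote_copy_reachable is False when the namespace is not replicated or when every remote copy is inaccessible or not good.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LocalCopy:
    storage_available: bool  # node, volume, or device holding the copy is available
    good: bool               # meaningful only when storage_available is True


def classify(copies: list, good_remote_copy_reachable: bool) -> Optional[str]:
    """Return "unavailable", "irreparable", or None if neither label applies."""
    if good_remote_copy_reachable:
        return None  # a copy on another system could be used for repair
    reachable = [c for c in copies if c.storage_available]
    if any(c.good for c in reachable):
        return None  # a good local copy exists
    if len(reachable) < len(copies):
        return "unavailable"  # some storage is down and no available copy is good
    return "irreparable"      # all storage is available, yet no copy is good
```

An unavailable object may become repairable once the missing storage returns; an irreparable one has no remaining source to repair from.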

Protection service triggers

In addition to running according to the service schedule, the Protection service runs in response to certain events. In these cases, the service does a full run (that is, it examines every object in the repository regardless of the schedule and regardless of whether the object data is stored on primary running storage, primary spindown storage, or extended storage).

Events that trigger a Protection service run are:

  • Node shutdown

    When a node becomes unavailable, HCP triggers the Protection service after waiting 90 minutes to ensure that the node is not just temporarily unavailable.

  • Logical-volume failure

    When HCP determines that a local logical volume is broken, it triggers the Protection service after waiting one minute to ensure that the volume is not just temporarily unavailable.

  • Node removal

    When a node is removed from the HCP system, HCP triggers the Protection service after waiting ten minutes to ensure that the node removal is permanent.
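
The waiting periods above can be captured in a small table. The event keys and the function are hypothetical names used only for illustration; the durations are the ones stated in the list.

```python
from datetime import timedelta

# Waiting period before each event triggers a Protection service run.
TRIGGER_DELAYS = {
    "node_shutdown": timedelta(minutes=90),
    "logical_volume_failure": timedelta(minutes=1),
    "node_removal": timedelta(minutes=10),
}


def protection_run_due(event: str, elapsed: timedelta, still_in_effect: bool) -> bool:
    """Return True once the condition has persisted for the full waiting period.

    still_in_effect is False if the node or volume came back in the meantime.
    """
    return still_in_effect and elapsed >= TRIGGER_DELAYS[event]
```

For example, a node that has been down for 30 minutes does not yet trigger a run, while one that is still down after 90 minutes does.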

Note: When the Protection service is disabled, its scheduled runs are canceled. However, the Protection service still runs in response to the triggers listed above unless all of these conditions are true:
  • None of the tenants or namespaces on the HCP system are being replicated.
  • All existing service plans are configured to set the ingest tier DPL to 1 (one).
  • If the HCP system is configured to use spindown storage, all existing service plans set the primary spindown storage tier DPL to 1 (one).
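
The conditions in this note combine as a single boolean test. The sketch below is an illustration only; the function and parameter names are hypothetical.

```python
def triggers_still_fire(any_replication: bool,
                        ingest_dpls: list,
                        spindown_used: bool,
                        spindown_dpls: list) -> bool:
    """Return True if a disabled Protection service still runs on triggers.

    ingest_dpls and spindown_dpls hold the DPL from every existing service
    plan. Triggered runs are suppressed only when all conditions hold.
    """
    all_ingest_dpl_one = all(d == 1 for d in ingest_dpls)
    spindown_ok = (not spindown_used) or all(d == 1 for d in spindown_dpls)
    suppressed = (not any_replication) and all_ingest_dpl_one and spindown_ok
    return not suppressed
```

Put differently, triggered runs stop only on a system with no replication where every copy count is 1.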

 
