Hitachi Vantara Knowledge

Host Based and HCP Workflows

This section describes high-level workflows for repository-based data protection, including tiering to HCP and backup to cloud. These workflows focus on basic data protection scenarios involving files located on an OS Host. For guidance on protecting supported applications, refer to the relevant Protector Application Guide listed in Related documents.


How to batch backup a file system path to a repository

Before you begin

It is assumed that the following tasks have been performed:

  • The Protector Master software has been installed and licensed on a dedicated node. See Installation Tasks and License Tasks.
  • The Protector Client software has been installed on the source node where the file system path resides.
  • The Protector Client software has been installed on the destination node where the Repository will reside.
  • Permissions have been granted to enable the Protector UI, required activities and participating nodes to be accessed. In this example all nodes will be left in the default resource group, so there is no need to allocate nodes to user defined resource groups. Refer to How to configure basic role based access control.

This task describes the steps to follow when protecting data that resides on a file system, to a repository, using batch mode backup. The data flow and policy are as follows:

Batch Data Flow
Path Backup Policy
Classification Type | Parameters | Value
Path                | Include    | C:\testdata

Operation Type | Parameters  | Value      | Assigned Nodes
Backup         | RPO         | 10 mins    | Repository
               | Retention   | 1 hour     |
               | Run Options | Run on RPO |
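The RPO and Retention values in the policy above interact in a simple way: a new snapshot is taken each time the RPO elapses, and snapshots older than the retention period are pruned. A minimal sketch of that arithmetic (illustrative only, not Protector code):

```python
from datetime import timedelta

# Policy values from the table above.
rpo = timedelta(minutes=10)     # a new snapshot every 10 minutes
retention = timedelta(hours=1)  # snapshots older than 1 hour are removed

# At steady state the repository store holds roughly this many snapshots.
snapshots_retained = int(retention / rpo)
print(snapshots_retained)  # 6
```

So once this data flow has been active for an hour, each pruning pass leaves about six snapshots in the store.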

Procedure

  1. Locate the source node in the Nodes Inventory and check that it is authorized and online. This node is where the production data to be backed up resides.

    For a file system backup using a Path classification, a basic OS Host node is required. It is not necessary to create the source node in this case since all Protector client nodes default to this type when installed. See How to authorize a node.
  2. Locate the destination node in the Nodes Inventory and check that it is authorized and online.

    This node is where the repository will be hosted and is identified as the Proxy Node when creating the repository in the next step.
  3. Create a new Repository node using the Repository Storage Node Wizard and check that it is authorized and online.

    The Repository node type is grouped under Storage in the Node Type Wizard. You can direct data from multiple nodes to a single repository so there is no need to create a new repository if a suitable one already exists. See How to add a node, and How to authorize a node.
  4. Define a policy as shown in the table above using the Policy Wizard, Path Classification Wizard and Backup Operation Wizard.

    The Path classification is grouped under Physical in the Policy Wizard. See How to create a policy.
  5. Using the Data Flow Wizard, draw a data flow as shown in the figure above, with the OS Host source node connected to the Repository destination node via a Batch mover.

    See How to create a data flow.
  6. Assign the Path-Backup policy to the OS Host source node and the Backup operation to the Repository destination node on the data flow.

    Select the Standard Store Template when assigning the operation to the repository. See How to apply a policy to nodes on a data flow.
  7. Compile and activate the data flow, checking carefully that there are no errors or warnings.

    See How to activate a data flow.
  8. Locate the active data flow in the Monitor Inventory and open its Monitor Details.

    The policy will be invoked automatically to create an initial backup and then repeatedly according to the RPO specified in the policy. The policy can also be manually triggered from the source node in the monitor data flow. See How to trigger an operation from an active data flow.
  9. Watch the active data flow via the Monitor Details to ensure the policy is operating as expected.

    For a healthy data flow you may periodically see:
    • An animated resynchronization icon appear above the batch mover each time the RPO is reached.
    • Transient Node Status icons appear over nodes and associated information messages displayed to the right of the data flow area.
    • Backup jobs appearing in the Jobs area below the data flow that cycle through stages, ending in Progress - Completed.
    • Information messages appearing in the Logs area below the data flow indicating rules activation, replication and resynchronization events.
    For a problematic data flow you may see:
    • Permanent Node Status icons appear over nodes and associated warning messages displayed to the right of the data flow area.
    • Backup jobs appearing in the Jobs area below the data flow that cycle through stages, terminating in Progress - Failed.
    • Warning and error messages appearing in the Logs area below the data flow indicating failed events.
  10. Review the status of the Repository via the relevant Generation 1 Repository Details and the stores via the relevant Gen1 Repository Store Details, to ensure backup snapshots are being created.

    Repositories require ongoing surveillance to ensure that they are operating correctly and sufficient resources are available to store your data securely. See How to view the status of a repository.

    New snapshots will appear in the Gen1 Repository Store Details periodically, as dictated by the RPO of the policy. Old snapshots will be removed periodically, as dictated by the Retention Period of the policy. The retention period of individual snapshots can be modified here if required.

How to tier a file system path to HCP via a repository

Before you begin

Note: Tiering repository data to HCP is only available when using a generation 1 repository node and a generation 1 HCP node.
Note: To tier data from a repository store to HCP, a new tiering data flow, using a new, unpopulated repository store, is required. Adding a tiering mover and HCP node to an existing repository data flow will not work.
It is assumed that the following tasks have been performed:
  • The Protector Master software has been installed and licensed on a dedicated node. See Installation Tasks and License Tasks.
  • The Protector Client software has been installed on the source node where the file system path resides.
  • The Protector Client software has been installed on the destination node where the Repository will reside.
  • HCP generation 1 node has been set up as per the Protector requirements and prerequisites. Refer to Generation 1 Hitachi Content Platform prerequisites.
  • Permissions have been granted to enable the Protector UI, required activities and participating nodes to be accessed. In this example all nodes will be left in the default resource group, so there is no need to allocate nodes to user defined resource groups. Refer to How to configure basic role based access control.

This task describes the steps to follow when tiering data that resides on a file system to HCP. These files are first ingested by a repository, using batch mode backup, and then immediately moved from the repository to an associated namespace within a tenant on HCP. Once data is tiered to HCP, the repository retains only the metadata describing the backed-up file system. The data tiered to HCP can be located and restored by following the same workflow as that used for restoring repository data (see How to restore a repository snapshot of a file system path). The data flow and policy are as follows:

Tiering Data Flow

If you need frequent backups available in a local repository as well as long term backups on HCP, then implement a repo-to-repo data flow (see How to backup an onsite repository to an offsite repository) and tier to HCP from the second repository. The first repository will hold frequent local backups while the second manages long term retention on HCP.

Path Backup Policy
Classification Type | Parameters | Value
Path                | Include    | C:\testdata

Operation Type | Parameters  | Value      | Assigned Nodes
Backup         | RPO         | 1 Week     | Repository (this controls the retention of data on HCP)
               | Retention   | 5 Years    |
               | Run Options | Run on RPO |
Tier           | None        | N/A        | Hitachi Content Platform
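Because the Tier operation moves every repository snapshot to HCP, the Backup operation's RPO and Retention also determine how many snapshot generations accumulate in the HCP namespace. A rough illustration (not Protector output; the 5-year retention is approximated here as 260 weeks):

```python
from datetime import timedelta

rpo = timedelta(weeks=1)          # one snapshot per week
retention = timedelta(weeks=260)  # ~5 years, approximated as 260 weeks

# Snapshot generations held on HCP once retention-based pruning begins.
generations = int(retention / rpo)
print(generations)  # 260
```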

Procedure

  1. Locate the source node in the Nodes Inventory and check that it is authorized and online. This node is where the production data to be backed up resides.

    For a file system backup using a Path classification, a basic OS Host node is required. It is not necessary to create the source node in this case since all Protector client nodes default to this type when installed. See How to authorize a node.
  2. Locate the intermediate node in the Nodes Inventory and check that it is authorized and online.

    This node is where the repository will be hosted and is identified as the Proxy Node when creating the repository in the next step.
  3. Create a new generation 1 Repository node using the Repository Storage Node Wizard and check that it is authorized and online.

    The Repository node type is grouped under Storage in the Node Type Wizard. You can direct data from multiple nodes to a single repository so there is no need to create a new repository if a suitable one already exists. See How to add a node, and How to authorize a node.
  4. Create a new generation 1 Hitachi Content Platform node using the Hitachi Content Platform Storage Node Wizard and check that it is authorized and online.

    The Hitachi Content Platform node type is grouped under Storage in the Node Type Wizard. You can direct data from multiple repository stores to a single HCP node (each repository store maps to a separate namespace within the HCP tenant), so there is no need to create a new HCP node if a suitable one already exists.
    Note: Each HCP node in Protector represents a single tenant, so if you need to strictly segregate repository data, create separate HCP nodes for each tenant and consider placing them in separate RBAC resource groups.
    See How to add a node, and How to authorize a node.
  5. Define a policy as shown in the table above using the Policy Wizard, Path Classification Wizard, Backup Operation Wizard and Tier Operation Wizard.

    The Path classification is grouped under Physical in the Policy Wizard. See How to create a policy.
  6. Using the Data Flow Wizard, draw a data flow as shown in the figure above, with the OS Host source node connected to the Repository intermediate node via a Batch mover, then to the Hitachi Content Platform destination node via a second Batch mover.

    See How to create a data flow.
  7. Assign the Path-Backup-Tier policy to the OS Host source node, the Backup operation to the Repository node and the Tier operation to the HCP node on the data flow.

    Select the Standard Store Template when assigning the operation to the repository. There is no value in selecting a template that performs source-side or repository-side deduplication in this situation. See How to apply a policy to nodes on a data flow.
  8. Compile and activate the data flow, checking carefully that there are no errors or warnings.

    See How to activate a data flow.
    Note: Do not deactivate tiering data flows unless they are no longer required. Subsequent reactivation will force the source and repository to undergo resynchronization, leading to all files being re-tiered to HCP.
  9. Locate the active data flow in the Monitor Inventory and open its Monitor Details.

    The policy will be invoked automatically to create an initial backup and then repeatedly according to the RPO specified in the policy. The policy can also be manually triggered from the source node in the monitor data flow. See How to trigger an operation from an active data flow.
  10. Watch the active data flow via the Monitor Details to ensure the policy is operating as expected.

    For a healthy data flow you may periodically see:
    • An animated resynchronization icon appear above the batch mover into the repository each time the RPO is reached.
    • An animated tiering icon appear above the batch mover into the HCP node each time the repository tiers data.
    • Repository Statistics - Queues - Tier values changing, indicating objects queued and actively being tiered to HCP.
      Tip: Check the tier queue if the RPO is not being met for tiered data flows.
    • Transient Node Status icons appear over nodes and associated information messages displayed to the right of the data flow area.
    • Network/Cache Utilization fluctuations within normal limits if large amounts of data are being backed up to the repository.
    • Backup jobs appearing in the Jobs area below the data flow that cycle through stages, ending in Progress - Completed. Note that there is no Tiering job, since this process takes place on an ad hoc basis.
    • Information messages appearing in the Logs area below the data flow indicating rules activation, HCP namespace creation, resynchronization and ingestion throttling events.
    For a problematic data flow you may see:
    • Permanent Node Status icons appear over nodes and associated warning messages displayed to the right of the data flow area.
    • Local/Remote Memory Cache constantly at excessively high levels if large amounts of data are being backed up, indicating data transfer issues.
    • Backup jobs appearing in the Jobs area below the data flow that cycle through stages, terminating in Progress - Failed.
    • Warning and error messages appearing in the Logs area below the data flow indicating failed events.
  11. Review the status of the Repository via the relevant Generation 1 Repository Details and the stores via the relevant Gen1 Repository Store Details, to ensure backup snapshots are being created. Also monitor the health of HCP via its Tenant Management Console, especially Namespaces - Usage.

    The UUID of the repository store (used to name the corresponding HCP namespace) can be found in the Gen1 Repository Store Details.

    Repositories require ongoing surveillance to ensure that they are operating correctly and sufficient resources are available to store your data securely. See How to view the status of a repository. A repository store that has been completely tiered to HCP will report a zero size.

    The space HCP reports for a tiered repository may appear much larger than that reported by the source node's file system. HCP's minimum allocation size is 8 KB per object, and each file requires at least 2 HCP objects (one per stream), so when tiering many small files the usage reported by HCP will appear larger than expected. Add to this the fact that HCP will create 2 or more replicas depending on the DPL setting.

    New snapshots will appear in the Gen1 Repository Store Details periodically as dictated by the RPO of the policy. Old snapshots will be removed periodically as dictated by the Retention Period of the policy. The retention period of individual snapshots can be modified here if required.
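The HCP overhead described in step 11 can be estimated from the figures quoted there: an 8 KB minimum allocation per object, at least two objects per file, and the DPL replica count. The sketch below is an illustrative estimate only (the function and the assumption of exactly two objects per file are ours, not a Protector or HCP API):

```python
MIN_ALLOC = 8 * 1024   # HCP minimum allocation per object (8 KB)
OBJECTS_PER_FILE = 2   # at least two objects per file (one per stream)

def estimated_hcp_bytes(file_sizes, dpl=2):
    """Rough lower bound on HCP space consumed by a set of tiered files."""
    per_copy = sum(
        max(size, MIN_ALLOC)                  # data object, padded to 8 KB
        + MIN_ALLOC * (OBJECTS_PER_FILE - 1)  # remaining stream object(s)
        for size in file_sizes
    )
    return per_copy * dpl                     # DPL replicas multiply the total

# 10,000 files of 1 KB each occupy ~10 MB on the source file system,
# but this estimate comes to roughly 330 MB on HCP with DPL 2.
print(estimated_hcp_bytes([1024] * 10_000, dpl=2))
```

This is why a store full of small files can report HCP usage more than an order of magnitude larger than the source data.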

How to restore a repository snapshot of a file system path

Before you begin

It is assumed that a file system path policy that creates repository snapshots has been implemented and that at least one snapshot has been created in the designated repository store. See How to batch backup a file system path to a repository for an example of how to do this.

This task describes the steps to follow when restoring a file system path snapshot from a repository store to a node other than the one from which the backup originated, as shown:

Procedure

  1. Identify the destination where the data set is to be restored. Here we will restore the snapshot to a different machine from the one on which the data originated.

    Depending on the scenario, you can restore data to its original node and directory, its original node in a different directory or to a different node entirely.
  2. Ensure that the restore location is prepared to receive the restored data set by locating the node in the Nodes Inventory and checking it is authorized and online.

    The restore location must have Protector Client software installed. We will assume that no applications are accessing the restore location since the restored data doesn't yet exist.
  3. In this example we will assume that there are no backup policies currently active on the location where the data set is being restored, so there is no need to suspend any data flows. The existing backup policy can continue to run while we perform our restore.

  4. Locate the data set to be restored by navigating to the Repository Snapshot Details (Storage) - File System for the repository store snapshot in question.

    See How to view the contents of a snapshot in a repository store.
  5. Check that the target restore location (identified in the previous steps) has enough free space to accommodate the restored snapshot.

    The Logical Size of the snapshot is shown in the Analysis Details area of the Repository Snapshot Details (Storage) - File System. It may be necessary to click Analyze if these details have not yet been evaluated.
  6. Click Restore Snapshot to open the Restore Repository Snapshot Wizard - File System.

    Caution: The process of restoring data may result in overwriting some of the original data that exists on the restore location.

    Ensure that any critical data is copied to a safe location or is included in the data set being restored.

    The Restore Repository Snapshot Wizard - File System - Select restore options page provides numerous File Name Collision Policy options to help manage potential file overwrite situations.

    In this example we will restore to a different destination node but use the original file paths. No routing is required since the Repository and Restore target are connected to the same LAN.
    1. From the Restore Repository Snapshot Wizard - File System, choose whether to restore the Entire Snapshot or a User Selection of files. Click Next.

    2. If restoring a User Selection, select the files and folders to be restored. Click Next.

    3. Set the Destination Node to the one identified in the previous steps above.

    4. Set Restore To to Original Location so that the files are placed on the same path as the originals.

    5. Set File Name Collision Policy to Rename any colliding files so that any existing files of the same name are preserved.

      This has no effect when initially creating the restored files in a new location, but if the restore is repeated then it will preserve any existing files from previous restore jobs.
    6. Review the restore options carefully to ensure that everything has been specified correctly, then click Finish to initiate the restore job.

      A Processing message will appear briefly, then the wizard will close and the Jobs Inventory will be displayed. A new Restore Job will appear at the top of the Jobs list, with the Progress entry initially indicating processing and finally indicating successful completion.
  7. Once the restore process is complete, further steps may be needed to fix-up the data set before using it. In this example we will assume that no additional work is required other than inspecting the restored data on the target machine.

    The amount of fix-up work required depends on the applications accessing the restored data.
    Note: This example restores data created using a Path classification. If you are backing up one of the application types directly supported by Protector, then you should use one of the Application classifications and refer to the appropriate Application Guide listed in Related documents.
  8. Restart any applications that access the restored data.

    For supported applications, these additional steps are described in the appropriate Application Guide (see Related documents). For other applications, consult the vendor's documentation.
  9. Resume any backup policies for the restored data set. If you have restored data to a new location for repurposing (test and development work for example), you should consider if it is necessary to implement a new backup policy to protect this new instance.

    Data flows can be reactivated via the Data Flows Inventory.

How to backup an onsite repository to an offsite repository

Before you begin

Note: When performing repository-to-repository backups, repository generations can't be mixed.

It is assumed that the following tasks have been performed:

  • The Protector Master software has been installed and licensed on a dedicated node. See Installation Tasks and License Tasks.
  • The Protector Client software has been installed on the source node where the file system path resides.
  • The Protector Client software has been installed on the destination nodes where the repositories will reside.
  • Permissions have been granted to enable the Protector UI, required activities and participating nodes to be accessed. In this example all nodes will be left in the default resource group, so there is no need to allocate nodes to user defined resource groups. Refer to How to configure basic role based access control.

This task describes the steps to follow when protecting data backed up to a primary repository by copying it to a secondary repository. The secondary repository would normally be located offsite to provide protection from local catastrophic failures. The data flow and policy are as follows:

Repository to Repository Data Flow
Repository Backup Policy
Classification Type | Parameters | Value
Path                | Include    | C:\testdata

Operation Type         | Parameters  | Value    | Assigned Nodes
Onsite Daily (Backup)  | RPO         | 1 day    | Onsite Repository
                       | Retention   | 1 week   |
                       | Run Options | Run on RPO |
Onsite Weekly (Backup) | RPO         | 1 week   | Onsite Repository
                       | Retention   | 6 months |
                       | Run Options | Run on RPO |
Offsite Daily (Backup) | RPO         | N/A      | Offsite Repository
                       | Retention   | 1 week   |
                       | Run Options | Run on completion of operation Onsite Daily |
Offsite Weekly (Backup)| RPO         | N/A      | Offsite Repository
                       | Retention   | 6 months |
                       | Run Options | Run on completion of operation Onsite Weekly |
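The trigger relationships in this policy form a simple dependency chain: the two onsite operations fire on their RPO timers, and each offsite operation fires when its onsite counterpart completes. A hypothetical sketch of that structure (not Protector code; only the operation names come from the table above):

```python
# Operation definitions mirroring the policy table.
operations = {
    "Onsite Daily":   {"trigger": "RPO", "upstream": None},
    "Onsite Weekly":  {"trigger": "RPO", "upstream": None},
    "Offsite Daily":  {"trigger": "on_completion", "upstream": "Onsite Daily"},
    "Offsite Weekly": {"trigger": "on_completion", "upstream": "Onsite Weekly"},
}

def on_completed(name):
    """Operations that should run when operation `name` finishes."""
    return [op for op, cfg in operations.items() if cfg["upstream"] == name]

print(on_completed("Onsite Daily"))   # ['Offsite Daily']
print(on_completed("Onsite Weekly"))  # ['Offsite Weekly']
```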

Procedure

  1. Locate the source node in the Nodes Inventory and check that it is authorized and online. This node is where the production data to be backed up resides.

    See How to authorize a node.
  2. Locate the destination nodes in the Nodes Inventory and check that they are authorized and online.

    These nodes are where the onsite and offsite repositories will be hosted and are identified as the Proxy Node when creating the repositories in the next step.
  3. Create two new Repository nodes using the Repository Storage Node Wizard and check that they are authorized and online.

    You can direct data from multiple nodes to a single repository so there is no need to create new repositories if suitable ones already exist. See How to add a node, and How to authorize a node.
  4. Define a policy as shown in the table above using the Policy Wizard, Path Classification Wizard and Backup Operation Wizard.

    Note: Both onsite operations are triggered according to their RPOs; the offsite operations are then triggered on completion of their respective onsite operations. The RPOs for the offsite operations do not need to be specified; they have no effect.
    See How to create a policy.
  5. Using the Data Flow Wizard, draw a cascaded data flow as shown in the figure above, with the OS Host source node connected to the Onsite Repository and then to the Offsite Repository destination nodes via Batch movers.

    See How to create a data flow.
  6. Assign the Repository-Backup policy to the OS Host source node, both Onsite Backup operations to the Onsite Repository and both Offsite Backup operations to the Offsite Repository destination nodes on the data flow.

    If using generation 1 repositories select the Standard Store Template when assigning the operation to the repositories. See How to apply a policy to nodes on a data flow.
  7. Compile and activate the data flow, checking carefully that there are no errors or warnings.

    See How to activate a data flow.
  8. Locate the active data flow in the Monitor Inventory and open its Monitor Details.

    The policy will be invoked automatically to create an initial backup and then repeatedly according to the RPO specified in the policy. The policy can also be manually triggered from the source node in the monitor data flow. See How to trigger an operation from an active data flow.
  9. Watch the active data flow via the Monitor Details to ensure the policy is operating as expected.

    Note: If the onsite repository contains a large amount of backup data and/or the network connection between the onsite and offsite repositories has limited bandwidth, the process of synchronizing the two repositories can be expedited by seeding the offsite repository as described in How to seed an offsite repository.
  10. Review the status of the Repositories via the relevant Generation 1 Repository Details and the Stores via the relevant Gen1 Repository Store Details, to ensure backup snapshots are being created.

    New snapshots will appear in the Gen1 Repository Store Details periodically as dictated by the RPO of the policy. Old snapshots will be removed periodically as dictated by the Retention Period of the policy. The retention period of individual snapshots can be modified here if required.

How to seed an offsite repository

Before you begin

It is assumed that you have an onsite repository containing a large amount of backup data, and you want to create a secondary offsite backup of this data. You may have attempted to set up an offsite data flow as described in How to backup an onsite repository to an offsite repository, but data volume and/or bandwidth restrictions between sites are preventing the initial synchronization from completing in an acceptable timeframe.

Ensure that you have a removable disk available with enough spare capacity to hold the onsite repository's data files. This disk will be required to physically transport the data to the offsite location.

The initial synchronization can be completed more rapidly by seeding the offsite repository store with data from the onsite store. Once seeded, the offsite repository may request additional, lower volume, differential updates to ensure the two sites are completely synchronized. This process, known as seeding, is performed as follows:

Procedure

  1. In the Monitor Details for the relevant data flow, resynchronize the onsite repository by selecting the source nodes feeding into it, clicking Trigger Operation, then selecting the operations that back up to the onsite repository.

  2. Wait for the onsite repository to become synchronized with the source machine(s) at the primary site by confirming that the corresponding backup job has completed.

  3. For both the onsite and offsite repository, note down the Mount Directory displayed in the Configuration area of the Generation 1 Repository Details.

  4. For both the onsite and offsite repository, unmount the repository by clicking Unmount.

    The Offline status icon will appear on the respective repository tiles in the Storage Inventory when the stores have unmounted.
  5. Using Windows File Explorer, navigate to the repository root directory for the onsite repository (noted above) and copy the folder to a removable disk, then safely eject the disk from the machine.

  6. For the onsite repository only, navigate to the Generation 1 Repository Details and mount the repository by clicking Mount.

    The Online status icon will appear on the onsite repository tile in the Storage Inventory when the stores have mounted. The onsite repository will now resume backup of your source nodes according to the onsite policies in force.
  7. Take the removable disk containing the copy of the onsite repository to the remote site, insert it into the machine hosting the offsite repository and replace the files in the directory of the offsite repository (noted above) with those on the removable disk.

    Physically transferring the data between the onsite and offsite nodes circumvents the bandwidth restrictions of the network between the two sites.
  8. For the offsite repository, navigate to the Generation 1 Repository Details and mount the repository by clicking Mount.

    The Online status icon will appear on the offsite repository tile in the Storage Inventory when the stores have mounted.
  9. In the Monitor Details for the relevant data flow, resynchronize the repositories by selecting the onsite repository, clicking Trigger Operation, then selecting the operations that back up to the offsite repository.

    Because the seeding process has been performed, only a minimal amount of data transfer will be required between the two sites.
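Steps 4, 5 and 7 above amount to copying the quiescent repository directory onto removable media and back. A hypothetical sketch of the copy step (the paths and function are examples, not part of Protector; use the Mount Directory values noted in step 3, and only copy while the repository is unmounted):

```python
import shutil
from pathlib import Path

def seed_copy(repo_root: str, removable_root: str) -> Path:
    """Copy an unmounted repository directory onto the removable disk.

    Run only while the repository is unmounted, so its files are quiescent.
    """
    src = Path(repo_root)
    dest = Path(removable_root) / src.name
    shutil.copytree(src, dest)  # recreates the full directory tree at dest
    return dest

# Example invocation (hypothetical paths):
# seed_copy(r"D:\Protector\Repository", r"E:\repo-seed")
```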

 
