Disaster recovery

You can use the disaster recovery overview to prepare volumes and groups for disaster recovery.

Disaster recovery overview

Preparing for disaster recovery involves the following major steps:

  1. Identify the volumes and groups that contain important files and data for disaster recovery.
  2. Create TrueCopy pairs, paying special attention to the options in P-VOL Fence Level Settings to ensure that the system responds the way you want in the event of a failure (see Allowing I/O to the P-VOL after a split: Fence Level options).
  3. Install and configure host failover software between the primary and secondary sites.
  4. Establish file and database recovery procedures. These procedures for recovering volumes due to control unit failure must already be in place.
  5. Make sure that the host system at the primary site is configured to receive sense information from the primary storage system (for example, using SNMP). This must also be done at the secondary site if a host is connected to it.
Note: Procedures for disaster recovery involve releasing pairs. However, when using CCI you can perform disaster recovery without releasing pairs. To do this, add remote paths between the secondary system and the primary system when setting up TrueCopy. For VSP 5000 series, connect the Bidirectional port in the secondary storage system to the Bidirectional port in the primary storage system via a remote path in advance. Then add a remote connection from the secondary system CU to the primary system CU. Use the same path group ID that you used for the connection from the primary system to the secondary system.

Remote copy and disaster recovery procedures are complex. Consult customer support on sense-level settings and recovery procedures.

Sense information shared between sites

When the primary system splits a TrueCopy pair due to an error condition, the primary and secondary systems send sense information with unit check status to the appropriate hosts. This sense information is used during disaster recovery to determine the consistency of the S-VOL and must be transferred to the secondary site using the host failover software.

File and database recovery

File recovery procedures for disaster recovery should be the same as those used for recovering a data volume that becomes inaccessible due to control unit failure.

TrueCopy does not provide a procedure for detecting and retrieving lost updates. To detect and recreate lost updates, you must check other current information (for example, a database log file) that was active at the primary system when the disaster occurred.

The detection and retrieval process can take some time. Your disaster recovery scenario should be designed so that detection and retrieval of lost updates is performed after the application has been started at the secondary site.

Prepare for file and database recovery by keeping the files needed for recovery available (for example, database log files that have been verified as current).

Switching operations to the secondary site

If a disaster or failure occurs at the primary site, the first disaster recovery activity is to switch your operations to the secondary site. S-VOLs are recovered individually based on the pair status and P-VOL fence level information for each pair.

You can switch operations to the secondary site either by deleting pairs and then re-establishing them when recovery is completed, or by not deleting pairs. Both methods are presented below.

Switching operations to the secondary site by deleting pairs

  1. Check the pair status and fence level of each S-VOL.

  2. Analyze the consistency of the S-VOLs, based on pair status and Primary Volume Fence Level setting in the Create TC Pairs window. See Checking S-VOL consistency with the P-VOL.

    You can perform this task using the pairdisplay command of CCI.
  3. Perform file recovery as needed.

  4. Split all pairs from the secondary system using one of the following:

    • CCI pairsplit command
    • HDvM - SN Split Pairs window.
  5. Release all pairs using one of the following:

    • CCI pairsplit -S command
    • HDvM - SN Delete Pairs window.
    Note: When the S-VOL is no longer paired, it cannot be distinguished from a non-TrueCopy volume. Use the appropriate means to change the S-VOL volume labels.
  6. Complete file recovery procedures.

  7. Vary the S-VOLs online.

  8. At the secondary site, start critical host operations, with the previous S-VOLs now the P-VOLs.
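
The CCI commands named in the steps above can be collected into an ordered list for a runbook. The following is a minimal sketch that only builds the argument lists and does not run anything. The function name and the group name `oradb` are hypothetical, and it assumes the pairs are addressed as a CCI group with `-g`. Any additional options your environment needs are intentionally left out.

```python
from typing import List


def delete_pairs_commands(group: str) -> List[List[str]]:
    """Return the CCI commands for the check, split, and release steps, in order.

    Nothing is executed here. `group` is assumed to be a group name that is
    already defined in the CCI configuration (an assumption of this sketch).
    """
    return [
        ["pairdisplay", "-g", group],      # steps 1-2: check pair status and fence level
        ["pairsplit", "-g", group],        # step 4: split all pairs
        ["pairsplit", "-g", group, "-S"],  # step 5: release all pairs
    ]


if __name__ == "__main__":
    # "oradb" is a hypothetical group name used only for illustration.
    for argv in delete_pairs_commands("oradb"):
        print(" ".join(argv))
```

File recovery, relabeling the volumes, and varying the S-VOLs online remain manual steps and are not modeled here.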

Switching operations to the secondary site by not deleting pairs

  1. Record the pair status and fence level of each S-VOL.

  2. Analyze the consistency of the S-VOLs, based on pair status and the Primary Volume Fence Level setting in the Create TC Pairs window. See Checking S-VOL consistency with the P-VOL.

  3. Perform file recovery as needed.

  4. Run the CCI horctakeover or pairsplit -RS command on the S-VOL.

    If a failure occurs after the capacity of only one volume in a TC pair has been expanded, the swap resync operation cannot be performed because the two volumes differ in capacity. Expand the other volume so that both volumes have the same capacity, and then retry the operation.
  5. Complete file recovery procedures.

  6. Vary the S-VOLs online.

  7. At the secondary site, start critical host operations, with the previous S-VOLs now the P-VOLs.

Checking S-VOL consistency with the P-VOL

An S-VOL's consistency refers to whether S-VOL data is identical to data in the P-VOL. This is dependent on your Fence Level setting, which determines whether data is copied to the P-VOL if an error occurs during an update to the S-VOL.

The following table shows S-VOL consistency information, based on Device Manager - Storage Navigator pair status and the P-VOL fence level setting.

| S-VOL status (Device Manager - Storage Navigator) | S-VOL status (CCI) | S-VOL status (BCM) | Split type | Fence level (HDvM - SN) | Fence level (CCI) | Consistency of S-VOL |
|---|---|---|---|---|---|---|
| Unpaired volume | SMPL | SIMPLEX | -- | Data, Status, Never | data, status, never | Not consistent. The S-VOL does not belong to a pair. Even if you have created a pair using this volume, if the pair status is still SMPL, you must regard its data as not consistent with the P-VOL. |
| COPY | COPY | PENDING | -- | Data, Status, Never | data, status, never | Not consistent. The S-VOL is not synchronized because not all tracks have been copied from the P-VOL yet. This S-VOL must be initialized (or copied from the P-VOL at a later time). |
| PAIR | PAIR | DUPLEX | -- | Data, Status | data, status | Consistent. The S-VOL is synchronized with its P-VOL. |
| PAIR | PAIR | DUPLEX | -- | Never | never | Needs to be analyzed. The S-VOL requires further analysis to determine its level of consistency. |
| PSUE | PSUE | SUSPER(50) | Initial copy failed | Data, Status, Never | data, status, never | Not consistent. The S-VOL is not synchronized because not all tracks have been copied from the P-VOL yet. The S-VOL must be initialized (or copied from the P-VOL at a later time). |
| PSUS | PSUS | SUSPOP(04) | S-VOL by operator | Data, Status, Never | data, status, never | Suspect. The S-VOL is not synchronized with its P-VOL if any write I/Os were issued to the P-VOL after the pair was split. The pair must be released and restarted using Entire Volume for the Initial Copy Type option. If you are sure that no data on the P-VOL changed, you can use None for Initial Copy Type. |
| PSUS or PSUE | PSUE | SUSPOP/SUSPER (all other types) | All other types | Data | data | Consistent. The S-VOL is synchronized with its P-VOL. |
| PSUS or PSUE | PSUE | SUSPOP/SUSPER (all other types) | All other types | Status, Never | status, never | Suspect. The S-VOL is not synchronized with its P-VOL if any write I/Os were issued to the P-VOL after the pair was split. Restore the consistency of the S-VOL and update it, if required. The time of suspension indicated on the Last Update Time field of the Detailed Information dialog box (MCU SVP time) will help to determine the last time the S-VOL was updated. |

Legend:

Data: Data in the secondary volume

Status: Status of the secondary volume

For pairs whose P-VOL fence level in HDvM - SN is Never, or for pairs whose output results of the pairdisplay command for Fence in CCI is never, further analysis is required to determine the S-VOL consistency. This can be determined by using sense information transferred by host failover, or by comparing the contents of the S-VOL with other files that are confirmed to be consistent (for example, database log files). The S-VOLs should be recovered using the files that are confirmed to be consistent.

Note: Actual data recovery must be done using recovery point data in the database operation log.
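
The consistency table in this section can be read as a small decision function. The following is a minimal sketch of that lookup, keyed on the CCI pair status, the fence level, and the split type. The function and constant names are hypothetical and the split type strings are taken from the table. It is an aid for reading the table, not a replacement for the analysis described above.

```python
CONSISTENT = "Consistent"
NOT_CONSISTENT = "Not consistent"
SUSPECT = "Suspect"
NEEDS_ANALYSIS = "Needs to be analyzed"


def svol_consistency(status: str, fence: str, split_type: str = "") -> str:
    """Map CCI pair status, fence level, and split type to the table's verdict.

    `status` is the CCI pair status (SMPL, COPY, PAIR, PSUS, PSUE).
    `fence` is data, status, or never (case-insensitive).
    `split_type` is only consulted for PSUS and PSUE.
    """
    status = status.upper()
    fence = fence.lower()
    split = split_type.lower()
    if fence not in ("data", "status", "never"):
        raise ValueError(f"unknown fence level: {fence!r}")
    if status in ("SMPL", "COPY"):
        return NOT_CONSISTENT
    if status == "PAIR":
        return NEEDS_ANALYSIS if fence == "never" else CONSISTENT
    if status in ("PSUS", "PSUE"):
        # The table lists "Initial copy failed" under PSUE and
        # "S-VOL by operator" under PSUS; this sketch does not enforce that.
        if split == "initial copy failed":
            return NOT_CONSISTENT
        if split == "s-vol by operator":
            return SUSPECT
        # All other split types: only fence level data keeps the S-VOL consistent.
        return CONSISTENT if fence == "data" else SUSPECT
    raise ValueError(f"unknown pair status: {status!r}")


if __name__ == "__main__":
    print(svol_consistency("PAIR", "never"))
```

A verdict of Suspect or Needs to be analyzed still requires comparison against files that are confirmed to be consistent, such as database log files.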

Transferring operations back to the primary site

When host operations are running at the secondary site, the primary site must be restored and operations transferred back.

Create a TrueCopy pair in which the secondary site volume is specified as the primary volume and the primary site volume is specified as the secondary volume.

Select the appropriate procedure below based on whether you deleted pairs to switch operations to the secondary site, or ran the CCI horctakeover or pairsplit -RS command on the S-VOL.

Transferring operations back to the primary site if pairs were deleted

  1. At the primary site, bring up the host. Make sure that TC components are operational.

  2. At the primary system, split all pairs on the primary system.

    The Delete Pair by Force option of the Force Delete Pairs (TC Pairs) window of HDvM - SN must be used because the paired S-VOLs are in the SMPL state at the secondary site.
  3. At the primary system, delete the TC association with the secondary systems (Remove Remote Connections).

    In Device Manager - Storage Navigator, connect to each primary system to make sure that all secondary systems are deleted.
  4. (VSP G/F350, G/F370, G/F700, G/F900, VSP E series) At the primary and secondary systems, change path and port settings.

    • To use the same switches, change the operating mode to the opposite direction.
    • To use the same extenders, change the operating mode to the opposite direction. The boxes/nodes connected to the primary system must be set to channel-mode, and the boxes/nodes connected to the secondary systems must be set to device-mode.
  5. (VSP 5000 series) At the secondary system, check that it is ready to create TrueCopy pair.

    (VSP G/F350, G/F370, G/F700, G/F900, VSP E series) At the secondary site, set TrueCopy operations in the reverse direction.
  6. (VSP 5000 series) At the secondary system, create TrueCopy pair and synchronize S-VOL with P-VOL.

    (VSP G/F350, G/F370, G/F700, G/F900, VSP E series) At the secondary site, create a TC pair in the reverse direction, and synchronize the old P-VOL with the S-VOL. Make sure to use Entire Volume for the Initial Copy Type option in HDvM - SN, or execute the paircreate command in CCI without specifying the -nocopy option. Confirm that the pairs are created and that status is PAIR.

    If a failure occurs after the capacity of only one volume in a TC pair has been expanded, the swap resync operation cannot be performed because the two volumes differ in capacity. Expand the other volume so that both volumes have the same capacity, and then retry the operation.

  7. At the secondary system, halt host operations and vary the P-VOL (old S-VOL) offline. This maintains synchronization of the pairs.

  8. At the secondary system (VSP 5000 series) or at the primary system, which is the old secondary system (VSP G/F350, G/F370, G/F700, G/F900, VSP E series), split the pairs and destage held data from cache.

    Confirm that the pairs are split and status is PSUS before proceeding. If an error occurs, resolve it before proceeding.

    If a failure occurs after the capacity of only one volume in a TC pair has been expanded, the swap resync operation cannot be performed because the two volumes differ in capacity. Expand the other volume so that both volumes have the same capacity, and then retry the operation.

  9. At the secondary system (VSP 5000 series) or at the primary system, which is the old secondary system (VSP G/F350, G/F370, G/F700, G/F900, VSP E series), release the pairs. You do not need to use the Force Delete Pairs (TC Pairs) option.

  10. (VSP G/F350, G/F370, G/F700, G/F900, VSP E series) At the primary and secondary systems, change the path and port settings.

    • To use the same switches, change the operating mode back to the original direction.
    • To use the same channel extenders, change the operating mode back to the original direction. The boxes/nodes connected to the primary system must be set to channel-mode, and the boxes/nodes connected to the secondary systems must be set to device-mode.
  11. At the primary system, check that it is ready to create TrueCopy pairs.

  12. At the primary system, create TrueCopy pairs (VSP 5000 series) or set TrueCopy pairs to the original direction (VSP G/F350, G/F370, G/F700, G/F900, VSP E series).

    If all P-VOLs and S-VOLs are synchronized, you can use None for the Initial Copy Type option in HDvM - SN, or execute the paircreate command in CCI by specifying the -nocopy option. If the P-VOLs and S-VOLs are not fully synchronized, use Entire Volume for Initial Copy Type.

    If a failure occurs after the capacity of only one volume in a TC pair has been expanded, the swap resync operation cannot be performed because the two volumes differ in capacity. Expand the other volume so that both volumes have the same capacity, and then retry the operation.

  13. Vary the primary system and P-VOLs online, and start host operations.
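
The initial copy choice in step 12 depends only on whether every pair is already synchronized. The following is a minimal sketch of that rule. The function name, the dictionary keys, and the pair names are hypothetical. Only the option values None and Entire Volume and the `-nocopy` option come from the procedure itself.

```python
from typing import Dict


def initial_copy_choice(pair_synchronized: Dict[str, bool]) -> Dict[str, object]:
    """Choose the initial copy settings for re-creating pairs.

    `pair_synchronized` maps a pair name to True when its P-VOL and S-VOL
    are known to be fully synchronized.
    """
    if not pair_synchronized:
        raise ValueError("no pairs given")
    if all(pair_synchronized.values()):
        # Everything is in sync: no copy is needed.
        return {"initial_copy_type": "None", "cci_nocopy": True}
    # At least one pair is not fully synchronized: copy the whole volume.
    return {"initial_copy_type": "Entire Volume", "cci_nocopy": False}


if __name__ == "__main__":
    # Hypothetical pair names for illustration.
    print(initial_copy_choice({"pair01": True, "pair02": False}))
```

When in doubt about synchronization, treating the pairs as not synchronized is the safer input, because it selects Entire Volume.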

Transferring operations back to the primary site if pairs were not deleted

  1. At the primary site, bring up the host. Make sure that TC components are operational.

  2. Run the CCI pairresync -swaps command to the S-VOL.

    If the pair data flow is already set in the opposite direction and the pair is in PAIR or COPY status, you do not need to run pairresync -swaps.

    If a failure occurs after the capacity of only one volume in a TC pair has been expanded, the swap resync operation cannot be performed because the two volumes differ in capacity. Expand the other volume so that both volumes have the same capacity, and then retry the operation.
  3. At the secondary system, halt host operations and vary the P-VOL (old S-VOL) offline. This maintains synchronization of the pairs.

  4. Run the horctakeover command to the P-VOL.

    If a failure occurs after the capacity of only one volume in a TC pair has been expanded, the swap resync operation cannot be performed because the two volumes differ in capacity. Expand the other volume so that both volumes have the same capacity, and then retry the operation.
  5. Vary the primary system and P-VOLs online, and start host operations at the primary site.
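
The conditions attached to step 2 can be expressed as a pre-check before the swap resync is issued. The following is a minimal sketch under stated assumptions: the function names and the group name are hypothetical, the capacities are supplied by the caller in any consistent unit, and the command is only assembled, not run.

```python
from typing import List, Optional


def needs_swap_resync(flow_reversed: bool, status: str) -> bool:
    """Return False when the flow is already reversed and the pair is in PAIR or COPY."""
    return not (flow_reversed and status.upper() in ("PAIR", "COPY"))


def swap_resync_command(group: str, flow_reversed: bool, status: str,
                        p_vol_capacity: int, s_vol_capacity: int) -> Optional[List[str]]:
    """Return the pairresync -swaps argument list, or None if it is not needed."""
    if not needs_swap_resync(flow_reversed, status):
        return None
    if p_vol_capacity != s_vol_capacity:
        # Swap resync cannot be performed while the capacities differ.
        raise ValueError("volume capacities differ; expand the other volume, then retry")
    # "-g" with a group name is an assumption about how the pair is addressed.
    return ["pairresync", "-g", group, "-swaps"]


if __name__ == "__main__":
    # "oradb" is a hypothetical group name used only for illustration.
    print(swap_resync_command("oradb", False, "PSUS", 1024, 1024))
```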