Managing replication


To help manage the load on an HCP system and the replication network, you can control the performance level for send activity on each replication link in which the system participates. You can use a schedule to change the performance level for a link automatically at specific times on a weekly basis. Alternatively, you can choose a single performance level for the entire week.

Occasionally, you may need to temporarily stop all send activity on an individual link. Or, you may want to temporarily stop send activity only for particular tenants.

Rarely, you may need to stop all activity on all links in which the HCP system participates. This action stops not only replication and recovery but also read and repair from remote systems. It also prevents you from changing which items are included on the links.

If the HCP system uses virtual networking, you can select the network to use for communications through any replication link. You typically do this only once.

You can control whether the HCP system allows DNS failover to other systems in the replication topology. Disallowing DNS failover prevents the system from servicing requests redirected from remote systems regardless of whether this is allowed by the targeted namespace.

You can configure an HCP system to automatically share its domains and SSL server certificates with other systems with which it participates in replication links. This ensures that SSL works for access to replicated namespaces on those systems.

You can manage the load the replication verification service puts on the system by disabling the service, setting the service to run once, or allowing the service to run continuously.

This section of the Help explains how to perform the tasks outlined above.

Roles: To view replication link schedules and global replication settings, you need the monitor role. To modify replication link schedules, manage replication at the link, tenant, and system levels, and set global replication options, you need the administrator role.

Note: You can also use the HCP management API to manage replication. For information on doing this, see Replication resources.

© 2015, 2019 Hitachi Vantara Corporation. All rights reserved.

Scheduling activity on a replication link


The amount of send activity that can occur on a replication link at any given time is controlled by the performance level that applies to the sending system. An active/active link has two performance levels at any given time, one for each system involved in the link. An active/passive link has only one performance level at any given time. That level applies to the primary system during replication and to the replica during data recovery.

The performance levels for different replication links are independent of each other, as are the performance levels for the two systems involved in an active/active link. In each case, the performance level can be low, medium, high, custom, or off. Off means that no send activity is occurring for the applicable link on the applicable system.

For each system involved in an active/active link and for an active/passive link, you can schedule the performance level to change automatically over the course of a week. At any time, if you don’t want to use the schedule, you can override it by selecting a single performance level to apply until you cancel the override.

Overriding a replication schedule lets you change the performance level without changing the configured schedule. You might do this, for example, on a holiday when the load on the applicable HCP system is expected to be light.

Note: When data recovery begins on an active/passive link, a schedule override with a performance level of high takes effect. When data recovery is complete, the performance level automatically returns to the currently scheduled level if the schedule was in effect when data recovery began or to the override level that was in effect when data recovery began, as applicable.
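The way the schedule, a manual override, and the automatic override during data recovery interact can be sketched as follows. This is an illustrative Python model only, not HCP code; the class, field, and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkPerformance:
    """Hypothetical model of one system's performance level on a link."""
    scheduled_level: str                   # level the weekly schedule currently specifies
    override_level: Optional[str] = None   # set while a manual override is in effect
    recovering: bool = False               # True while data recovery is in progress

    def effective_level(self) -> str:
        # Data recovery on an active/passive link forces a level of high.
        if self.recovering:
            return "high"
        # Otherwise a manual override wins over the schedule.
        if self.override_level is not None:
            return self.override_level
        return self.scheduled_level


link = LinkPerformance(scheduled_level="low", override_level="medium")
link.recovering = True
print(link.effective_level())   # high
link.recovering = False
print(link.effective_level())   # medium (the override that was in effect before recovery)
```

Because the model never discards the manual override or the schedule while recovery is running, the earlier level is restored as soon as recovery ends.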

The performance level for a sending system determines the amount of load replication or recovery processing puts on that system and on the network connecting that system to the other system involved in the applicable replication link. If a system is sending data on multiple links, the total load on the system is determined by the performance levels for all of those links together. To minimize the load, you should schedule the links to be active at different times.

Tip: If the system load from other activities is light, consider raising the performance level for one or more links to reduce any replication backlog. If the system load is heavy, consider lowering the performance level for one or more links to free resources for other system activity.

For more detailed information on performance levels, see Changing the custom performance level.

About replication schedules


You use the Schedule panel on the Replication page in the HCP System Management Console to set the schedule for replication activity on a replication link. For an active/active link, this panel has two tabs on each system involved in the link — one labeled Local for the local system and one labeled Remote for the remote system.

For an active/passive link, you set the schedule only for the primary system. Therefore, on the primary system the Schedule panel has a tab labeled Local. On the replica, the Schedule panel has a tab labeled Remote.

Each Local and Remote schedule panel contains a grid in which the weekdays from Sunday through Saturday are each broken out into 24 hours. To set a schedule, you assign performance levels to time periods in the grid. The Console makes it easy for you to do this for the whole week, individual days, individual hours, or ranges of hours within a day.

In the schedule grid, the top of each time period with a given performance level displays the start and end times for that period (for example, 8 am to 6 pm). These times are in the time zone of the primary system.

The top of each time period also displays the performance level for that period. Additionally, each time period is color coded in the schedule grid to indicate the performance level so you can easily see which levels are assigned to which periods:

Low: light green

Medium: green

High: dark green

Custom: blue

Off: gray

While a schedule is overridden, the words Schedule Overridden appear on the schedule grid.

You cannot set a schedule to off for the entire week. Instead, to disable activity on a replication link for an extended period of time, suspend activity on the link. For information on suspending activity on a link, see Suspending and resuming activity on an individual link.
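As a rough illustration of the schedule grid described above, the following Python sketch models a week as seven days of 24 hourly slots, collapses each day into labeled time periods, and rejects a schedule that is off for the entire week. It is not HCP code; the function names and the data layout are assumptions made for the example.

```python
DAYS = ("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")
LEVELS = {"low", "medium", "high", "custom", "off"}


def new_schedule(level="medium"):
    """Return a week in which every hour has the same performance level."""
    return {day: [level] * 24 for day in DAYS}


def set_hours(schedule, day, start_hour, end_hour, level):
    """Assign a level to the hours from start_hour up to (not including) end_hour."""
    if level not in LEVELS:
        raise ValueError(f"unknown performance level: {level}")
    if not 0 <= start_hour < end_hour <= 24:
        raise ValueError("hours must satisfy 0 <= start < end <= 24")
    schedule[day][start_hour:end_hour] = [level] * (end_hour - start_hour)


def validate(schedule):
    """Reject a schedule that is off for every hour of the week."""
    if all(level == "off" for hours in schedule.values() for level in hours):
        raise ValueError("a schedule cannot be off for the entire week")


def periods(hours):
    """Collapse 24 hourly levels into (start, end, level) time periods."""
    result = []
    start = 0
    for h in range(1, len(hours) + 1):
        if h == len(hours) or hours[h] != hours[start]:
            result.append((start, h, hours[start]))
            start = h
    return result


schedule = new_schedule()
set_hours(schedule, "Monday", 8, 18, "high")
validate(schedule)
print(periods(schedule["Monday"]))
# [(0, 8, 'medium'), (8, 18, 'high'), (18, 24, 'medium')]
```

The periods helper mirrors how the grid shows one block per run of hours with the same level, with the start and end times at the top of the block.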

Modifying a replication schedule


By default, the schedules for an active/active link and the schedule for an active/passive link specify a performance level of medium for the entire week.

To change a schedule for a replication link:

1. In the top-level menu of the HCP System Management Console, select Services > Replication.

2. On the replication Links page, click on the link with the schedule you want to modify.

3. On the replication link details page, click on Schedule.

4. In the link Schedule panel, click on the Local or Remote tab, as applicable.

5. In the Local or Remote panel, as applicable, take any of these actions as many times as needed to set the schedule you want:

   - To set the performance level for the entire week:

     1. Hover over All to display the list of performance levels.

     2. Click on the performance level you want.

   - To set the performance level for an individual day:

     1. Hover over the name of the day to display the list of performance levels.

     2. Click on the performance level you want.

   - To set the performance level for a single hour or a range of hours:

     1. Either click on an hour, or click and drag from one hour to another in the same day.

        The Set Performance Level window opens.

     2. Optionally, select different start and end times in the Start time and End time fields, respectively.

     3. In the Level field, select the performance level you want.

     4. Click on Submit.

6. Click on Update Schedule.

Overriding a replication schedule


To override a schedule for a replication link or to cancel a schedule override:

1. In the top-level menu of the HCP System Management Console, select Services > Replication.

2. On the replication Links page, click on the link with the schedule you want to override.

3. On the replication link details page, click on Schedule.

4. In the Schedule panel, click on the Local or Remote tab, as applicable.

5. In the Local or Remote panel, as applicable, click on Schedule Override.

6. In the Schedule Override section, take either of these actions:

   - To override the schedule:

     1. Select Override schedule.

     2. Under Performance Level, select the performance level you want.

   - To cancel the schedule override, deselect Override schedule.

7. Click on Update Schedule.

Suspending and resuming activity on an individual link


You can suspend and resume replication or recovery activity on individual replication links. When you suspend activity on a link, HCP stops all send activity on the link. You might suspend activity on a link, for example, before making changes to system hardware or to the network over which the two systems involved in the link communicate with each other.

While send activity is suspended on a link, the applicable HCP tenants and namespaces and default-namespace directories on each system remain read-write or read-only, as applicable. Additionally, the link continues to support other functions such as read from remote and repair from remote.

Resuming activity on a link restarts all send activity on the link. After suspending activity on a link, you need to resume activity manually.

Suspending activity on a link in an erasure coding topology prevents each system involved in the link from sending full copies of object data and chunks for objects to the other system over that link. Link suspensions, therefore, may prevent newly ingested objects from being protected. Suspending a link does not prevent full copies of object data from being reduced to chunks on the systems involved in the link.

The replication service periodically checkpoints its progress. When you suspend activity on a link, no special checkpoint occurs. When you resume link activity, therefore, processing starts from the last checkpoint before the suspension.
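The checkpoint behavior described above can be illustrated with a small Python model. It is a sketch only: the class is hypothetical, and checkpointing after a fixed number of sent items is an assumption made for the example (the text says only that checkpoints happen periodically).

```python
class LinkProgress:
    """Hypothetical model of replication progress on one link."""

    def __init__(self, checkpoint_interval):
        self.checkpoint_interval = checkpoint_interval  # assumed: checkpoint every N items
        self.position = 0          # number of items sent so far
        self.last_checkpoint = 0   # position recorded at the last checkpoint
        self.suspended = False

    def send(self, count):
        """Send up to count items, checkpointing at each interval boundary."""
        if self.suspended:
            return
        for _ in range(count):
            self.position += 1
            if self.position % self.checkpoint_interval == 0:
                self.last_checkpoint = self.position

    def suspend(self):
        # No special checkpoint is taken at suspension time.
        self.suspended = True

    def resume(self):
        # Processing restarts from the last checkpoint before the suspension.
        self.suspended = False
        self.position = self.last_checkpoint


progress = LinkProgress(checkpoint_interval=10)
progress.send(25)
progress.suspend()
progress.resume()
print(progress.position)  # 20
```

In this model, the five items sent after the last checkpoint are processed again after the link is resumed.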

To suspend or resume activity on a replication link:

1. In the top-level menu of the System Management Console for either system involved in the link, select Services > Replication.

2. On the replication Links page, click on the link on which you want to suspend or resume activity.

3. On the replication link details page, click on Link.

4. In the replication Link panel, click on the Management tab.

5. In the link Management panel, click on Suspend or Resume, as applicable.

Pausing and resuming replication or recovery of a tenant


You can pause and resume replication or recovery of an individual tenant on a replication link. Pausing replication or recovery of a tenant on a link causes the replication service on the sending system to stop all send activity for the tenant on that link. With an active/active link, the replication service on both systems involved in the link stops all send activity for the tenant on the link.

Resuming replication or recovery of a tenant on a link restarts all send activity for that tenant on that link. After pausing replication or recovery of a tenant on a link, you need to resume replication or recovery manually.

You might pause replication of some tenants on a link, for example, to give more processing time to other tenants with greater replication backlogs. While replication or recovery of a tenant is paused on a link, the two systems involved in the link can still use the link to read from each other the objects in the tenant's replicated namespaces.

Replication or recovery of a tenant can also be paused automatically due to certain events. After replication or recovery of a tenant is paused automatically, you need to resume replication or recovery manually. However, you cannot do this until the issue that caused replication or recovery to be paused is resolved. For information on events that can cause tenant replication or recovery to be paused automatically, see Automatically paused tenant replication or recovery.

Pausing replication or recovery of a tenant on a link in an erasure coding topology prevents each system involved in the link from sending full copies of object data and chunks for objects in the tenant's namespaces to the other system over that link. Pausing replication or recovery of a tenant, therefore, may prevent newly ingested objects in those namespaces from being protected.

Pausing replication or recovery of a tenant on a link does not prevent full copies of object data in the tenant's namespaces from being reduced to chunks on the systems involved in that link.

The replication service periodically checkpoints its progress. When you pause replication or recovery of a tenant, no special checkpoint occurs. When you resume processing, therefore, processing starts from the last checkpoint before the pause.

Pausing and resuming tenant replication or recovery


To pause or resume replication or recovery for an individual tenant on a replication link:

1. In the top-level menu of the System Management Console, select Services > Replication.

2. On the replication Links page, click on the link for which you want to pause or resume replication or recovery.

3. On the replication link details page, click on Status.

4. In the link Status panel, click on the Tenants tab.

5. In the list of tenants, click on the pause control or resume control, as applicable, for the tenant for which you want to pause or resume activity.

Automatically paused tenant replication or recovery


Certain events can cause the replication service to automatically pause replication or recovery of a tenant on a replication link. The following sections describe these events.

Tip: To avoid situations that can cause the replication service to automatically pause replication of a tenant, do not try to create the same tenants, namespaces, content classes, user accounts, and group accounts on both of the systems that are replicating to each other on an active/active link. Instead, allow the items you create on each system to replicate to the other system.

Tenant name collisions


A tenant name collision occurs when the replication service tries to replicate an HCP tenant from one HCP system to another HCP system that already has a different tenant with the same name.

To recover from a tenant name collision, take one of these actions:

Rename the tenant on one of the systems involved in the link.

Delete the tenant on the receiving system.

Here are two scenarios that show how a tenant name collision can cause the replication service to pause replication of a tenant.

Scenario 1

In this scenario:

System A replicates to system B on link AB. Link AB can be either active/active or active/passive.

System A has a tenant named T1 that is not on link AB.

These events occur in the order shown:

1. On system A, you add T1 to link AB.

2. Before T1 is replicated to system B, you create a tenant named T1 on system B.

3. The replication service tries to replicate T1 to system B. The replication is unsuccessful because a different tenant named T1 already exists on system B. As a result, the service automatically pauses replication of T1 on link AB.

Scenario 2

In this scenario:

System A replicates to system B on link AB, which replicates to system C on link BC, where link AB is chained into link BC. Link AB can be either active/active or active/passive. Link BC is active/passive.

System A and system C each have a tenant named T1, where T1 was created independently on each system.

These events occur in the order shown:

1. On system A, you add T1 to link AB.

2. T1 is replicated to system B.

3. Because link BC includes link AB, the replication service tries to replicate T1 to system C. The replication is unsuccessful because a different tenant named T1 already exists on system C. As a result, the service automatically pauses replication of T1 on link BC.

Namespace name collisions


Each HCP namespace you create in an HCP system has an internal ID that uniquely identifies that namespace. As a result, two namespaces created on different systems are different from each other, even if they have the same name and are owned by the same tenant.

A namespace name collision occurs when the replication service tries to replicate a namespace from one system to another system that already has a different namespace with the same name, where both namespaces are owned by the same tenant.

To recover from a namespace name collision, take one of these actions:

Rename the namespace on one of the systems involved in the link.

Deselect the namespace from replication.

Delete the namespace on the receiving system.

Here’s a scenario that shows how a namespace name collision can cause the replication service to pause replication of a tenant. In this scenario:

System A and system B replicate to each other over active/active link AB.

Link AB includes tenant T1, so T1 exists on both systems.

On system A, T1 owns namespace NS1, which is not selected for replication.

These events occur in the order shown:

1. On system A, you select NS1 for replication.

2. Before NS1 is replicated to system B, you create a namespace named NS1 for T1 on system B.

3. The replication service tries to replicate NS1 to system B. The replication is unsuccessful because a different namespace named NS1 already exists on system B. As a result, the service automatically pauses replication of T1 on link AB.
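The collision rule in this scenario comes down to comparing internal IDs: two namespaces with the same name and owning tenant are the same namespace only if their internal IDs match. The following Python sketch illustrates that check. The data class, the ID values, and the function name are hypothetical and are not part of HCP.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Namespace:
    internal_id: str   # hypothetical ID format, unique per created namespace
    tenant: str
    name: str


def find_collision(incoming: Namespace, existing: Iterable[Namespace]) -> Optional[Namespace]:
    """Return the namespace on the receiving system that collides with incoming, if any."""
    for ns in existing:
        same_name = ns.tenant == incoming.tenant and ns.name == incoming.name
        if same_name and ns.internal_id != incoming.internal_id:
            return ns
    return None


# NS1 was created separately on each system, so the internal IDs differ.
ns1_on_a = Namespace(internal_id="id-from-system-a", tenant="T1", name="NS1")
ns1_on_b = Namespace(internal_id="id-from-system-b", tenant="T1", name="NS1")

collision = find_collision(ns1_on_a, [ns1_on_b])
print(collision is not None)  # True: replication of T1 would be paused
```

A namespace that reached system B through replication would carry the same internal ID as on system A in this model, so it would not be reported as a collision.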

Namespace compliance issues


Different HCP systems can have different definitions for service plans with the same name. When an HCP tenant or namespace is replicated, the name of its associated service plan, not the service plan itself, is replicated with it. As a result, the service plan that applies to a namespace can differ on the two HCP systems involved in a link on which the namespace is being replicated.

Service plans can be compliant or noncompliant. However, the service plan that applies to a namespace in compliance mode must be compliant. A namespace compliance issue occurs when replication of a namespace in compliance mode would cause a noncompliant service plan to apply to the namespace on the receiving system.

To recover from a namespace compliance issue, take one of these actions:

Redefine the noncompliant service plan on the receiving system to be compliant.

If the service plan is assigned to the tenant that owns the namespace, assign a different service plan to the tenant on the sending system, where that service plan is compliant on both systems involved in the link.

If the service plan is assigned to the namespace, have the tenant administrator assign a different service plan to the namespace on the sending system, where that service plan is compliant on both systems involved in the link.

Deselect the namespace from replication.

Here are two scenarios that show how a namespace compliance issue can cause the replication service to pause replication of a tenant.

Scenario 1

In this scenario:

System A replicates to system B on link AB. Link AB can be either active/active or active/passive.

Link AB includes tenant T1, so T1 exists on both systems.

On system A, T1 owns namespace NS1, which is in compliance mode. NS1 is not selected for replication.

The service plan that applies to NS1 is named SP1. SP1 is compliant on system A and noncompliant on system B.

These events occur in the order shown:

1. On system A, you select NS1 for replication.

2. The replication service tries to replicate NS1 to system B. The replication is unsuccessful because it would cause NS1, which is in compliance mode, to have a noncompliant service plan on system B. As a result, the service automatically pauses replication of T1 on link AB.

Scenario 2

In this scenario:

System A replicates to system B on link AB. Link AB can be either active/active or active/passive.

Link AB includes namespace NS1, which is owned by tenant T1, so NS1 exists on both systems.

NS1 is in enterprise mode, not compliance mode.

The service plan that applies to NS1 is named SP1. SP1 is compliant on system A and noncompliant on system B.

These events occur in the order shown:

1. On system A, you change NS1 to be in compliance mode.

2. The replication service tries to replicate the change to system B. The replication is unsuccessful because it would cause NS1 to be in compliance mode with a noncompliant service plan on system B. As a result, the service automatically pauses replication of T1 on link AB.
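Both scenarios reduce to the same check: only the service plan name is replicated, so the receiving system looks up its own definition of that name. The Python sketch below illustrates the idea. The function and the dictionary that maps plan names to a compliant flag are hypothetical, and the sketch assumes the named plan exists on the receiving system.

```python
def causes_compliance_issue(namespace_mode, plan_name, receiving_plans):
    """Return True if replicating the namespace would pair compliance mode
    with a noncompliant service plan on the receiving system.

    receiving_plans maps each service plan name on the receiving system
    to True (compliant) or False (noncompliant).
    """
    if namespace_mode != "compliance":
        return False
    return not receiving_plans[plan_name]


# SP1 is defined differently on the two systems.
plans_on_a = {"SP1": True}
plans_on_b = {"SP1": False}

# Scenario 1: NS1 is already in compliance mode when selected for replication.
print(causes_compliance_issue("compliance", "SP1", plans_on_b))  # True

# Scenario 2: NS1 is in enterprise mode at first, then changed to compliance mode.
print(causes_compliance_issue("enterprise", "SP1", plans_on_b))  # False
print(causes_compliance_issue("compliance", "SP1", plans_on_b))  # True
```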

Content class collisions


Each content class you create in an HCP system has an internal ID that uniquely identifies that content class. As a result, two content classes created on different HCP systems are different from each other, even if they have the same name and are defined for the same tenant.

A content class collision occurs when the replication service tries to replicate a content class from one system to another system that already has a different content class with the same name, where both content classes are defined for the same tenant.

To recover from a content class collision, take one of these actions:

Rename the content class on either of the systems involved in the link.

Delete the content class on either of the systems involved in the link.

Here’s a scenario that shows how a content class collision can cause the replication service to pause replication of a tenant. In this scenario:

System A and system B replicate to each other over active/active link AB.

Link AB includes tenant T1, so T1 exists on both systems.

These events occur in the order shown:

1. On system A, you create a content class named CC1 for T1.

2. Before CC1 is replicated to system B, you create a content class named CC1 for T1 on system B.

3. The replication service tries to replicate CC1 to system B. The replication is unsuccessful because a different content class named CC1 already exists on system B. As a result, the service automatically pauses replication of T1 on link AB.

User account collisions


Each user account you create in an HCP system has an internal ID that uniquely identifies that user account. As a result, two user accounts created on different systems are different from each other, even if they have the same username and are defined for the same HCP tenant.

A user account collision occurs when the replication service tries to replicate a user account from one system to another system that already has a different user account with the same username, where both user accounts are defined for the same tenant.

To recover from a user account collision, take one of these actions:

Change the username for the user account on either of the systems involved in the link.

Delete the user account on either of the systems involved in the link.

Here’s a scenario that shows how a user account collision can cause the replication service to pause replication of a tenant. In this scenario:

System A and system B replicate to each other over active/active link AB.

Link AB includes tenant T1, so T1 exists on both systems.

These events occur in the order shown:

1. On system A, you create a user account with username U1 for T1.

2. Before U1 is replicated to system B, you create a user account with username U1 for T1 on system B.

3. The replication service tries to replicate U1 to system B. The replication is unsuccessful because a different user account with username U1 already exists on system B. As a result, the service automatically pauses replication of T1 on link AB.

Group account collisions


Each HCP group account you create in an HCP system has an internal ID that uniquely identifies that group account. As a result, two group accounts created on different systems are different from each other, even if they are created from the same Active Directory group and are defined for the same HCP tenant. (An HCP group account always has the same name as the AD group it’s created from, so group accounts created from the same AD group on two different systems have the same name as each other.)

A group account collision occurs when the replication service tries to replicate a group account from one system to another system that already has a different group account created from the same AD group, where both group accounts are defined for the same tenant.

To recover from a group account collision, delete the group account on either of the systems involved in the link.

Here’s a scenario that shows how a group account collision can cause the replication service to pause replication of a tenant. In this scenario:

System A and system B replicate to each other over active/active link AB.

Link AB includes tenant T1, so T1 exists on both systems.

These events occur in the order shown:

1. On system A, you create an HCP group account for T1 from the AD group named AD1. The name of the group account you create is AD1.

2. Before AD1 is replicated to system B, you create an HCP group account for T1 from the AD group named AD1 on system B. The name of the group account you create is AD1.

3. The replication service tries to replicate the HCP group account named AD1 to system B. The replication is unsuccessful because a different group account named AD1 already exists on system B. As a result, the service automatically pauses replication of T1 on link AB.

Tenant conflicts


A tenant conflict occurs when you add a tenant to an erasure coding topology and that tenant is on one or more replication links where:

The link is not in the topology

One or both of the systems involved in the link are in the topology

When a tenant conflict occurs, the replication service automatically pauses replication or recovery of the tenant on each link that meets the criteria listed above.

To recover from a tenant conflict, take one of these actions:

Remove the tenant from the links on which replication or recovery of the tenant is paused.

Remove the tenant from the erasure coding topology. Then resume replication or recovery of the tenant on the applicable links.

Here's a scenario that shows how a tenant conflict can cause the replication service to pause replication of a tenant. In this scenario:

Erasure coding topology ECT1 is a ring topology that consists of:

   - Systems A, B, C, and D

   - Active/active link AB between systems A and B

   - Active/active link BC between systems B and C

   - Active/active link CD between systems C and D

   - Active/active link DA between systems D and A

ECT1 is the active erasure coding topology.

Tenant T1 is included in topology ECT1. Therefore, T1 is included on links AB, BC, CD, and DA and exists on systems A, B, C, and D.

TenantAddToEcTop-0.png

These events occur in the order shown:

1. You deploy system E.

2. You retire topology ECT1.

3. You create active/active link BE between systems B and E.

4. You create active/active link CE between systems C and E.

5. You create a new topology, ECT2, that consists of systems B, C, and E and links BC, BE, and CE.

TenantAddToEcTop-1.png

6. You add tenant T1 to topology ECT2.

7. All of these occur:

   - T1 is added to links BE and CE and is replicated to system E.

   - The replication service on system B pauses replication or recovery of T1 on link AB because system B is in topology ECT2 and system A is not.

   - The replication service on system C pauses replication or recovery of T1 on link CD because system C is in topology ECT2 and system D is not.

TenantAddToEcTop-2.png
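The tenant conflict criteria can be expressed as a short check over the links that include the tenant. The following Python sketch applies those criteria to the scenario above. It is illustrative only; the function name and the way links and topologies are represented are assumptions made for the example.

```python
def conflicting_links(tenant_links, topology_links, topology_systems):
    """Return the names of links on which the tenant would be paused.

    tenant_links maps each link name that includes the tenant to the pair
    of systems it connects. A link conflicts when it is not in the topology
    but at least one of its systems is.
    """
    conflicts = []
    for name, systems in tenant_links.items():
        in_topology = name in topology_links
        touches_topology = any(system in topology_systems for system in systems)
        if not in_topology and touches_topology:
            conflicts.append(name)
    return sorted(conflicts)


# Links that include T1 before it is added to ECT2 (left over from ECT1).
t1_links = {
    "AB": ("A", "B"),
    "BC": ("B", "C"),
    "CD": ("C", "D"),
    "DA": ("D", "A"),
}

ect2_systems = {"B", "C", "E"}
ect2_links = {"BC", "BE", "CE"}

print(conflicting_links(t1_links, ect2_links, ect2_systems))  # ['AB', 'CD']
```

Link DA is not reported because neither system D nor system A is in ECT2, and link BC is not reported because it is part of ECT2.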

Shutting down and reestablishing all replication links


Shutting down replication links on an HCP system stops all activity on all links in which the system participates. You cannot shut down an individual link.

While links are shut down:

No replication or recovery activity occurs on the links.

The system where the links are shut down cannot read or repair objects from other HCP systems.

Other HCP systems cannot read or repair objects from the system where the links are shut down.

The system where the links are shut down can still service requests that are redirected from other HCP systems in the replication topology. For information on this capability, see Managing DNS failover.

Objects ingested on the system where the links are shut down are not protected from system failure.

The system where the links are shut down is isolated in all erasure coding topologies that include the system. As a result:

oErasure-coded objects cannot be read from that system.

oFor a system in the active erasure coding topology, other systems in that topology cannot send full copies of object data or chunks for objects to that system. As a result, the topology protection status is broken on all systems in the topology.

You cannot change which HCP tenants and namespaces, default-namespace directories, and inbound links are included in the links in which the system participates.

An alert indicating that all links are shut down appears on the Overview page in the System Management Console for the system where the links are shut down.

A message indicating that all links are shut down appears at the top of the Replication page in the System Management Console for the system where the links are shut down.

You may want to shut down replication links, for example, if you need to temporarily dedicate as much network bandwidth as possible to applications that are unrelated to HCP. Additionally, your authorized HCP service provider may ask you to shut down all links for certain system upgrade scenarios.

When you shut down all replication links, you are required to specify a reason for the action.

To restart activity on replication links after you’ve shut them down, you need to reestablish the links. Reestablishing the links returns each link to the state it was in before you shut down the links. If a link was active when you shut down all links, it becomes active again when you reestablish the links. If a link was suspended when you shut down all links, it remains suspended when you reestablish the links.

To shut down or reestablish all replication links on an HCP system:

1. In the top-level menu of the HCP System Management Console, select Services > Replication.

2. On the left side of the Replication page, click on Settings.

3. On the replication Settings page, take either of these actions:

   - To shut down all replication links:

     1. Click on Shut Down All Links.

        The Shut Down All Replication Links window appears.

     2. In the Reason field, type the reason why you’re shutting down all links. This text can be up to 1,024 characters long and can contain any valid UTF-8 characters, including white space.

     3. Click on Shut Down All Links.

   - To reestablish all replication links, click on Reestablish All Links.

© 2015, 2019 Hitachi Vantara Corporation. All rights reserved.

Selecting the network for replication


In an HCP system that uses virtual networking, you can define multiple networks and then select the networks to use for various purposes, including replication. The network you select for replication on a given system is used for both incoming and outgoing replication traffic.

For information on virtual networking with HCP, see About virtual networking with HCP.

Networking infrastructure

Different HCP systems can use different networks for the purpose of replication. When you select a replication network for a given system, you need to ensure that your networking infrastructure is configured to allow communications to be routed between that system and all other systems with which that system participates in a replication link.

IP mode

The two systems involved in a replication link must be able to use the same IP mode to communicate with each other. That is, either both networks must be configured with IPv4 addresses, or both networks must be configured with IPv6 addresses.

One or both of the networks involved can have both IPv4 and IPv6 addresses. If the replication networks for the two systems involved in a replication link have both types of addresses, HCP uses the IP addresses for the first mode in which it can establish communication between the two systems, with preference given to IPv6.

Within a replication topology, different pairs of systems can use different IP modes for communication. The figure below shows a replication chain in which different IP modes are used for communication over each link.

[Figure: replication chain A to B to C, with a different IP mode used on each link]

In the topology shown above:

System A replicates to system B on link AB.

System B replicates to system C on link BC.

The replication network for system A has only IPv6 IP addresses.

The replication network for system B has both IPv4 and IPv6 addresses.

The replication network for system C has only IPv4 IP addresses.

System A and system B use IPv6 addresses to communicate with each other.

System B and system C use IPv4 addresses to communicate with each other.
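
The selection rule above can be sketched in a few lines. This is a simplified illustration, not HCP's implementation: it assumes that communication can be established in any mode for which both replication networks have addresses, and the function name `choose_ip_mode` is hypothetical.

```python
def choose_ip_mode(local_modes, remote_modes):
    """Return "IPv6", "IPv4", or None for a pair of replication networks.

    Each argument is the set of IP modes for which that network has
    addresses. Assumption: a shared mode always allows communication.
    """
    common = set(local_modes) & set(remote_modes)
    for mode in ("IPv6", "IPv4"):  # preference is given to IPv6
        if mode in common:
            return mode
    return None  # no shared mode, so the systems cannot communicate


# The topology from the example above.
networks = {"A": {"IPv6"}, "B": {"IPv4", "IPv6"}, "C": {"IPv4"}}
link_ab = choose_ip_mode(networks["A"], networks["B"])
link_bc = choose_ip_mode(networks["B"], networks["C"])
```

With these inputs, `link_ab` is `"IPv6"` and `link_bc` is `"IPv4"`. Systems A and C share no IP mode, so a direct link between them would not be possible in this model.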

Loss of connectivity

You can select a different network for replication at any time. Selecting a different replication network on any given system can result in loss of connectivity to other systems with which that system is directly involved in replication.

When connectivity is lost between the two systems that participate in a replication link, the replication service automatically suspends activity on that link. After restoring connectivity, you need to manually resume activity on the link.

For any given replication link, connectivity is lost between the two systems involved when you select a different network on one of the systems and either of these applies:

The networking infrastructure is not configured to route communications between the new network and the network selected for replication on the other system.

The new network is associated with a different domain from the previously selected network. In this case, the SSL server certificate used for replication changes, and you need to share the new certificate with the other system involved in the link.

Additionally, in this case, if the system where you selected the new network is identified by domain name in the link configuration, you need to update the domain name in the link configuration.

In any case, if you select a different network on one system and the link identifies that system by IP addresses, you need to update those IP addresses in the link configuration.

If you need to both share new SSL server certificates and update the identification of one of the systems in the link configuration, you should share the certificates first. If you update the link first and then share the certificates, the link status changes to broken. To recover from this situation, you need to click again on Update Link in the Settings panel for the link.
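
The follow-up work described above can be summarized as an ordered checklist. The sketch below is illustrative only; the function name, its parameters, and the action strings are hypothetical, and it does not cover the separate requirement that the networking infrastructure can route between the two networks.

```python
def follow_up_actions(domain_changed, identified_by):
    """List the follow-up actions after selecting a new replication network.

    domain_changed: True if the new network is associated with a different
        domain from the previously selected network.
    identified_by: "domain" or "ip", depending on how the link configuration
        identifies the system where the network was changed.
    """
    if identified_by not in ("domain", "ip"):
        raise ValueError("identified_by must be 'domain' or 'ip'")
    actions = []
    if domain_changed:
        # Certificates are shared first so that the link does not
        # become broken when the link configuration is updated.
        actions.append("share new SSL server certificate")
        if identified_by == "domain":
            actions.append("update domain name in link configuration")
    if identified_by == "ip":
        actions.append("update IP addresses in link configuration")
    return actions
```

For example, `follow_up_actions(True, "domain")` lists the certificate step before the link update, which matches the recommended order.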

For information on the SSL certificate change that can occur when you select a different network for replication, see Sharing SSL server certificates.

Shared domain name

When you create a replication link, if both of the following are true, you need to use the replication.admin.hcp-domain-name format to identify the other system involved in the link:

You are identifying the other system by domain name.

The domain associated with the network selected for replication on the other system is also associated with another network.

In the configuration of an existing replication link, the domain name used to identify a system can have a format other than replication.admin.hcp-domain-name. In this case, you need to update the system identification to use the replication.admin.hcp-domain-name format if any of these happens on that system:

You change the domain associated with the replication network to a domain that’s also associated with another network.

You select a new replication network, and the new network is associated with a domain that’s also associated with another network.

You change the domain associated with another network to the domain that’s associated with the replication network.
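
All three cases above reduce to one check: whether the domain of the replication network is also associated with some other network on that system. A minimal sketch of that check follows; the function name and the example network and domain names are hypothetical.

```python
def needs_replication_admin_format(network_domains, replication_network):
    """Return True if the replication network's domain is shared.

    network_domains maps each network name on the system to the name of
    the domain associated with that network.
    """
    replication_domain = network_domains[replication_network]
    return any(
        domain == replication_domain
        for network, domain in network_domains.items()
        if network != replication_network
    )


# Hypothetical networks on one system.
nets = {"[hcp_system]": "hcp.example.com", "net1": "repl.example.com"}
before = needs_replication_admin_format(nets, "net1")
nets["net2"] = "repl.example.com"  # another network now uses the same domain
after = needs_replication_admin_format(nets, "net1")
```

Here `before` is `False` and `after` is `True`, so once `net2` shares the domain, the system must be identified in the replication.admin.hcp-domain-name format.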

Selecting the replication network


By default, HCP uses the [hcp_system] network for replication.

To select a different network to be used for replication traffic:

1. In the top-level menu of the HCP System Management Console, select Services > Replication.

2. On the left side of the Replication page, click on Settings.

3. On the replication Settings page, in the Replication Network field, select the network you want to use for replication traffic. The dropdown list of networks does not include empty, degraded, or partial networks.

The Replication Network field is present only while virtual network management is enabled for the HCP system.

4. Click on Update Settings.

Displaying the zone definition for the replication network domain


Each network in HCP is associated with a domain. Multiple networks can be associated with the same domain. If you’re using DNS for domain name resolution, the DNS needs to include a zone definition for each combination of network and domain.

If an HCP system is involved in replication, the zone definition for the replication network domain for that system needs to be added to the upstream DNS servers for the other system on the replication link. An upstream DNS server is a DNS server to which HCP routes the outbound communications HCP initiates (for example, for sending log messages to syslog servers or for communicating with Active Directory).

If the domain for the replication network is shared with other networks, the domain name in the zone definition you add for the replication network domain must be:

replication.admin.replication-network-domain-name

From the Replication page, you can display the zone definition for the replication network domain, formatted for Unix DNS servers. You can then copy that definition to the upstream DNS servers.

Note: You can display the zone definition only while virtual network management is enabled for the HCP system and only if at least one user-defined network exists.

To display the zone definition for the replication network domain:

1. In the top-level menu of the HCP System Management Console, select Services > Replication.

2. On the left side of the Replication page, click on Settings.

3. On the replication Settings page, click on Zone Definition.

The Zone Definition section opens. This section shows the zone definition for the network (with a type of slave or stub, as applicable), formatted as shown in this example:

# net1 Special-purpose HCP system network
zone "replication.admin.system_1.example.com" IN {
       type slave;
       file "/var/named/slave/replication.admin.system_1.example.com";
       masters {
             172.25.147.11;
             172.25.147.12;
             172.25.147.13;
             172.25.147.14;
       };
};

For more information on defining zones for HCP domains in the DNS, see Zones.
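
If you maintain zone definitions in scripts, the layout of the example above can be reproduced with a short helper. This is a sketch that imitates the displayed format for a zone of type slave only. It is not an HCP tool, the function name is hypothetical, and the file path pattern is copied from the example rather than taken from any specification.

```python
def zone_definition(comment, zone_name, master_ips):
    """Build a zone definition laid out like the example above.

    comment: text for the leading comment line.
    zone_name: full zone name, for example
        "replication.admin.system_1.example.com".
    master_ips: IP addresses to list in the masters block.
    """
    lines = [
        f"# {comment}",
        f'zone "{zone_name}" IN {{',
        "       type slave;",
        f'       file "/var/named/slave/{zone_name}";',
        "       masters {",
    ]
    lines += [f"             {ip};" for ip in master_ips]
    lines += ["       };", "};"]
    return "\n".join(lines)


text = zone_definition(
    "net1 Special-purpose HCP system network",
    "replication.admin.system_1.example.com",
    ["172.25.147.11", "172.25.147.12"],
)
```

The definition shown on the Zone Definition page remains the authoritative source. A helper like this is useful only for checking or templating copies of it.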

Managing DNS failover


HCP replication helps ensure continuous namespace availability. If a system in a replication topology fails, client requests for access to a replicated namespace can be serviced by other systems in the topology.

You can use the HCP DNS failover feature to automate the process of redirecting client requests from a failed system to a healthy one. For active/active links, this process can be automated in other ways as well. Automating redirection means that applications do not need to be modified to explicitly direct requests to another system when the normally targeted system fails.

The DNS failover feature requires that the HCP systems involved use a shared DNS for system addressing. If the systems don’t use a shared DNS, DNS failover is not an option for automating redirection of client requests.

An HCP system can service automatically redirected requests only if the target namespace is configured to support service by remote systems. If the namespace is not configured this way, requests to access the namespace on a failed system fail.

About DNS failover


DNS failover is an HCP system configuration option that, when enabled on one system involved in a replication link, forces requests to that system to be automatically redirected to the other system involved in the link while the link is failed over to the other system. This redirection occurs only when the request identifies the target system by domain name, not by IP address.

In effect, DNS failover causes the domain name for the failed-over system to be associated with the IP addresses for the nodes in the other system. Therefore, all types of requests that specify that domain name are redirected to the other system. This includes not only requests for namespace access but also requests for access to HCP interfaces such as the Tenant Management Console and HCP management API.

An HCP system can service redirected requests only if they come in through a namespace access protocol. This means that requests for access to the failed-over system that are made through other interfaces fail.

With an active/active link, failover can occur in either direction between the two systems involved in the link. Therefore, if you are using DNS failover for automatic redirection of client requests, you should enable it on both systems.

With an active/passive link, failover can occur only from the primary system to the replica. In this case, therefore, you need to enable DNS failover only on the primary system. However, if the replica is also the primary system for another link, you need to enable DNS failover on the replica as well.
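
The two rules above can be combined to work out where DNS failover needs to be enabled across a topology. The sketch below is illustrative; the function name and the tuple layout used to describe links are assumptions made for this example.

```python
def systems_needing_dns_failover(links):
    """Return the set of systems on which DNS failover should be enabled.

    links is an iterable of (kind, first, second) tuples, where kind is
    "active/active" or "active/passive". For an active/passive link,
    first is the primary system and second is the replica.
    """
    needed = set()
    for kind, first, second in links:
        if kind == "active/active":
            needed.update((first, second))  # failover in either direction
        elif kind == "active/passive":
            needed.add(first)               # failover only from the primary
        else:
            raise ValueError(f"unknown link type: {kind}")
    return needed


# A chain in which B is the replica on one link and the primary on another.
chain = [("active/passive", "A", "B"), ("active/passive", "B", "C")]
result = systems_needing_dns_failover(chain)
```

For this chain, `result` contains systems A and B but not C, because C is only ever a replica.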

For DNS failover to work for the system where it’s enabled, the HCP domains for that system in the DNS must be configured to support service by remote systems. If DNS failover is not enabled, the HCP domains should not be configured that way.

DNS failover is intended to address cases of catastrophic failure of the HCP system where DNS failover is enabled. However, DNS failover also applies if you fail over a link while the system is healthy. In this case, the method used to access unreplicated items on that system depends on the data access network for the tenant that owns the target namespace.

For example, suppose:

Tenants ten1 and ten2 both use the network named net1 for data access.

Tenant ten1 and its namespace ns1 are in a replication link that is failed over from system A to system B.

Tenant ten2 and its namespace ns2 are not in the failed-over replication link.

Client requests for access to ns1 on system A, where the request URL specifies the name of the domain associated with net1, are redirected to system B. Because they come in on the same network, client requests for access to ns2 on system A, where the request URL specifies the domain name, are also redirected to system B and, therefore, fail. For those requests to succeed, they need to access system A by using an IP address assigned to a node in net1 on that system instead of by using the domain name.

The same consideration applies to access to other HCP interfaces. For example, if the data access network for a tenant in a link that’s failed over from system A to system B is [hcp_system], you need to use an IP address to access the HCP System Management Console on system A.
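
The outcomes in the example above can be written as a small decision function. This is a simplified model for illustration, not HCP logic. It assumes system A is healthy, the link is failed over to system B, the request arrives on the affected network, and any namespace in the failed-over link is configured to support service by remote systems. The function name and return strings are hypothetical.

```python
def request_outcome(uses_domain_name, namespace_in_failed_over_link):
    """Outcome of a request aimed at system A while a link is failed over to B."""
    if not uses_domain_name:
        # Requests that use an IP address are not redirected.
        return "serviced by system A"
    if namespace_in_failed_over_link:
        return "redirected to system B and serviced"
    # The namespace is not replicated on the failed-over link.
    return "redirected to system B and fails"


ns1_by_name = request_outcome(True, True)
ns2_by_name = request_outcome(True, False)
ns2_by_ip = request_outcome(False, False)
```

In this model, a request for ns2 that uses the domain name fails, and the same request made with an IP address is serviced by system A.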

DNS failover also affects replication between the failed-over system and any other system with which that system participates in a replication link. If the other system identifies the failed-over system by domain name, all replication activity on the link between the two systems stops.

For:

An introduction to failover with HCP replication, see Failover and failback

Information on configuring namespaces to accept redirected requests, see Changing replication options or Managing the Default Tenant and Namespace

Information on networks and configuring DNS for HCP, see Network administration and Configuring DNS for HCP

Enabling or disabling DNS failover


To enable or disable DNS failover for an HCP system:

1. In the top-level menu of the HCP System Management Console, select Services > Replication.

2. On the left side of the Replication page, click on Settings.

3. On the replication Settings page, take either of these actions:

o To enable DNS failover, select Enable DNS failover to other systems in the replication topology.

o To disable DNS failover, deselect Enable DNS failover to other systems in the replication topology.

4. Click on Update Settings.

Alternatives to DNS failover


DNS failover is an HCP-specific method for automatically managing service by remote systems. With active/active replication, other options exist:

In an environment in which load balancers are used to spread client requests among multiple HCP systems, if one of the systems fails, the load balancers can ensure that the requests go to other systems.

In a cloud storage environment, the networking and DNS infrastructure can be configured to support multiple HCP systems that use the same domain names. In this configuration, client requests are normally handled by the local DNS, but if an HCP system fails, the request can be passed on to another DNS that’s local to another HCP system.

Automatically sharing domains and SSL server certificates


With DNS failover, client requests to a failed system that identify the system by domain name are automatically redirected to another system in the replication topology. If the client request uses HTTP with SSL security (HTTPS), the system to which the request is redirected must have an SSL server certificate for the domain specified in the request. Because the fully qualified domain name for a replicated namespace is different on different systems in a replication topology that uses DNS failover, the system to which the request is redirected would not normally have such a certificate.

An HCP system can be configured to periodically send its domains and SSL server certificates to each other system with which it participates as a sending system in a replication link. If the system targeted by an HTTPS request has shared its certificates with the system to which the request is redirected, that second system can service the request.

When a system is configured to share its domains and SSL server certificates, it sends all the domains and certificates it has, including those that were sent to it by another system. If you configure all the systems in a replication topology to share their domains and certificates, any system in the topology can service an HTTPS request redirected from any other system in the topology.
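
Because each system forwards everything it holds, certificates spread along the sending direction of the links. The sketch below models that spread until nothing more changes. It is an illustration under stated assumptions, not HCP code: the function name and data layout are hypothetical, each system is represented by a single certificate, and periodic sending is modeled as repeated passes over the links.

```python
def propagate_certificates(send_links, sharing_enabled):
    """Return a mapping of system -> set of systems whose certificates it holds.

    send_links: list of (sender, receiver) pairs, one for each direction in
        which a system acts as a sending system on a replication link.
    sharing_enabled: set of systems configured to share their domains
        and certificates.
    """
    systems = {system for link in send_links for system in link}
    held = {system: {system} for system in systems}
    changed = True
    while changed:  # repeat until no system learns anything new
        changed = False
        for sender, receiver in send_links:
            if sender in sharing_enabled and not held[sender] <= held[receiver]:
                # A sender passes on everything it has, including
                # certificates it received from other systems.
                held[receiver] |= held[sender]
                changed = True
    return held


# Chain A -> B -> C with sharing enabled everywhere.
chain = propagate_certificates([("A", "B"), ("B", "C")], {"A", "B", "C"})
```

In this chain, system C ends up holding the certificates of A, B, and C, while A holds only its own because nothing is sent toward it. With active/active links, which send in both directions, every system would hold every certificate.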

Sharing domains and SSL server certificates has an additional benefit. If an HCP system is rebuilt after a catastrophic failure, the domains and certificates originally created on that system can be recovered from a system with which they were shared before the failure.

To enable or disable automatic sharing of domains and SSL server certificates for an HCP system:

1. In the top-level menu of the HCP System Management Console, select Services > Replication.

2. On the left side of the Replication page, click on Settings.

3. On the replication Settings page, take either of these actions:

o To enable automatic sharing of domains and certificates, select Send local domains and certificates to other systems in the replication topology.

o To disable automatic sharing of domains and certificates, deselect Send local domains and certificates to other systems in the replication topology.

4. Click on Update Settings.

Managing the replication verification service


You can set the replication verification service to run once or to run continuously, or you can disable the service so that it doesn't run at all. For a description of the replication verification service, see Replication verification service processing.

To manage the replication verification service:

1. In the top-level menu of the HCP System Management Console, select Services > Replication.

2. On the left side of the Replication page, click on Settings.

3. On the replication Settings page, take one of these actions:

o To run the replication verification service once, select Run replication verification service. Then select Run once.

o To run the replication verification service continuously, select Run replication verification service. Then select Run continuously.

o To disable the replication verification service, deselect Run replication verification service.

4. Click on Update Settings.

Changing link connectivity failure reporting


In the HCP System Management Console, you can set how long you want the HCP system to wait after a failed replication link connection before reporting the link connectivity failure.

To change how long the HCP system should wait before reporting a replication link connectivity failure:

1. In the top-level menu of the HCP System Management Console, select Services > Replication.

2. On the left side of the Replication page, click on Settings.

3. On the replication Settings page, locate the option to Report link connectivity failure after n seconds.

4. In the field represented by n in the above step, type the number of seconds you want HCP to wait before reporting a replication link connectivity failure.

Valid values are integers greater than or equal to zero. On new installations of HCP, n is set to 0.

5. Click on Update Settings.
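
The effect of the setting can be sketched as a validation step plus a simple time comparison. This is an illustration only: the function names are hypothetical, and the sketch assumes that the wait is measured from the moment the connection failed, which the text above does not spell out.

```python
def validate_report_delay(value):
    """Accept only integers greater than or equal to zero."""
    if isinstance(value, bool) or not isinstance(value, int) or value < 0:
        raise ValueError("delay must be an integer greater than or equal to zero")
    return value


def should_report(failed_at, now, delay_seconds=0):
    """Return True once the configured wait has elapsed since the failure.

    failed_at and now are timestamps in seconds. The default delay of 0
    mirrors the value on new installations, so a failure is reported
    without any wait.
    """
    return now - failed_at >= validate_report_delay(delay_seconds)
```

With a delay of 30 seconds, a failure that occurred at time 100 is not reported at time 129 but is reported at time 130.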
