Failover and failback

Failover is a process that stops replication on a link and results in a situation in which, for read-write access:

  • For an active/active link, applications should use only the HCP system to which the link was failed over
  • For an active/passive link, applications should use only the replica

Typically, you fail over a link when one of the systems involved in the link becomes unavailable. With an active/passive link, this system must be the primary system. You don’t need to fail over the link when the replica fails.

You can fail over a link while both systems are available. You might do this, for example, if you need to shut down one of the systems for maintenance.

Depending on the link configuration, failover either is a manual procedure or occurs automatically. When automatic failover is enabled for a link, the link fails over automatically after the applicable system is unavailable for a specified amount of time.

You enable or disable automatic failover separately for each system involved in an active/active link. For an active/passive link, you enable or disable automatic failover only for the replica.
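The automatic failover rule described above can be sketched as a small decision function. This is an illustrative model only, not HCP code: the names `FailoverSetting` and `should_fail_over`, and the use of minutes as the time unit, are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class FailoverSetting:
    """Hypothetical per-system automatic failover setting for one link."""
    automatic: bool      # is automatic failover enabled for this system?
    wait_minutes: int    # assumed unit for the "specified amount of time"


def should_fail_over(setting: FailoverSetting, minutes_unavailable: int) -> bool:
    """Return True when the link would fail over automatically.

    With automatic failover disabled, failover stays a manual procedure,
    so this function never triggers it.
    """
    return setting.automatic and minutes_unavailable >= setting.wait_minutes
```

For an active/active link, one such setting would exist for each system. For an active/passive link, only the replica would have one.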

Failback is the process that restarts replication on a link that has been failed over and returns the HCP systems involved in the link to normal operation. Typically, you fail back a link when an unavailable system becomes available again.

If connectivity between the two systems involved in a failed-over link was lost, it must be restored before failback can occur. Connectivity exists when the network infrastructure through which the two systems communicate is healthy and the applicable SSL server certificates have been shared between the two systems.

In a disaster recovery situation in which the system that became unavailable has been rebuilt, the link no longer exists on that system. In this case, before failback can occur, the link must be restored to the rebuilt system.

With an active/active link, failback is a manual procedure. With an active/passive link, the failback procedure can be partially automated.

Failover and failback with active/active links

The effects of failing over and failing back an active/active link differ depending on whether DNS failover was enabled for the system that became unavailable. In all cases, however, when the link fails over, replication on that link stops. When the link fails back, normal replication restarts.

Failover with an active/active link

With an active/active link, failover can occur in either direction between the two systems involved in the link. While the link is failed over, the replicated HCP tenants and namespaces and default-namespace directories remain read-write on both systems. However, because failover normally occurs when one system is unavailable, to avoid wasting resources, neither system tries to read or repair objects from the other system.

With DNS failover enabled, when an active/active link fails over from one of the HCP systems involved in the link (system A) to the other system involved in the link (system B), system B broadcasts a new configuration to the DNS. This new configuration causes client requests targeted to system A by domain name to be redirected to system B when the request is for an HCP namespace or default-namespace directory that was being replicated on the failed-over link.

Note: System B can service redirected namespace access requests only if the applicable namespace is configured to allow service by remote systems.

If a client request targeted to system A is for an HCP namespace or default-namespace directory that was not being replicated on the failed-over link, the request is not redirected to system B. Client requests that target system A by IP address are also not redirected to system B.

Client requests that use a domain name to target the Tenant Management Console for a replicated HCP tenant on system A are redirected to system B, but system B cannot process such requests. Instead, system B returns a 403 error code.

While the link is failed over, system A does not broadcast any configuration information to the DNS.

With DNS failover disabled, failing over an active/active link stops replication on the link but does not cause any other changes. Clients can still access system A by domain name (if system A is available).
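The redirection rules above, for a link that has failed over from system A to system B, can be summarized in a short sketch. It models only the behavior described in this section and is not an HCP API. The `Request` fields, the `route` function, and the outcome strings are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical description of a client request aimed at system A."""
    by_domain_name: bool                 # False means the client used an IP address
    replicated_on_link: bool             # target was replicated on the failed-over link
    is_tenant_console: bool = False      # request is for the Tenant Management Console
    remote_service_allowed: bool = True  # namespace allows service by remote systems


def route(request: Request, dns_failover_enabled: bool) -> tuple:
    """Return (system, outcome) for a request targeted at system A."""
    redirected = (
        dns_failover_enabled
        and request.by_domain_name
        and request.replicated_on_link
    )
    if not redirected:
        # IP-address requests, non-replicated targets, and links without
        # DNS failover all stay directed at system A.
        return ("A", "not redirected")
    if request.is_tenant_console:
        # System B receives the request but cannot process it.
        return ("B", "403")
    if not request.remote_service_allowed:
        return ("B", "not serviced")
    return ("B", "serviced")
```

For example, `route(Request(by_domain_name=True, replicated_on_link=True), True)` yields `("B", "serviced")`, while the same request sent by IP address stays with system A.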

Failback with an active/active link

Failing back an active/active link entails a single action, fail back. When an active/active link fails back:

  • Replication immediately restarts in both directions on the link.
  • With DNS failover enabled, each HCP system involved in the link broadcasts its own configuration to the DNS. From that point on, client requests that target either system by domain name are directed to the specified system.

Failover and failback with active/passive links

The effects of failing over and failing back an active/passive link differ depending on whether DNS failover was enabled for the system that became unavailable. In all cases, however, when an active/passive link fails over, replication on that link stops. When the link fails back, normal replication restarts.

Failover with an active/passive link

Failover with an active/passive link occurs only from the primary system to the replica. When an active/passive link fails over to the replica, the replicated HCP tenants and namespaces and default-namespace directories become read-write on the replica.

With DNS failover enabled, when an active/passive link fails over, the replica broadcasts a new configuration to the DNS. This new configuration causes client requests targeted to the primary system by domain name to be redirected to the replica when the request is for an HCP namespace or default-namespace directory that was being replicated on the failed-over link.

Note: A replica can service redirected namespace access requests only if the applicable namespace is configured to allow service by remote systems.

If a client request targeted to the primary system is for an HCP namespace or default-namespace directory that was not being replicated on the failed-over link, the request is not redirected to the replica. Client requests that target the primary system by IP address are also not redirected to the replica.

Client requests that use a domain name to target the Tenant Management Console for a replicated HCP tenant on the primary system are redirected to the replica, but the replica cannot process such requests. Instead, the replica returns a 403 error code.

If the primary system is still available when an active/passive link is failed over and the primary system and the replica can communicate with each other, the replicated HCP tenants and namespaces and default-namespace directories become read-only on the primary system. Also, while the link is failed over, the primary system does not broadcast any configuration information to the DNS.

If the two systems cannot communicate with each other when the link is failed over, the replicated items remain read-write on the primary system, and the primary system continues to broadcast its configuration information to the DNS. If DNS failover is disabled, clients can still access the primary system by domain name.

If clients are allowed to write to both the primary system and the replica while an active/passive link is failed over, configuration conflicts and conflicts in namespace content may occur when the link is failed back. Although HCP resolves such conflicts in a predictable manner, the recommended practice is to avoid them in the first place. Therefore, when you fail over an active/passive link without DNS failover enabled, you should tell the applicable tenant administrators to direct all client access requests to the replica.
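The access states described above for a failed-over active/passive link can be captured in a small lookup. This is a sketch of the documented behavior, not an HCP interface. The function name and dictionary keys are hypothetical, and the model assumes the primary system is still running.

```python
def state_after_failover(systems_can_communicate: bool) -> dict:
    """Model the replicated items after an active/passive link fails over.

    Assumes the primary system is still available. If it is down, only
    the replica entry is meaningful.
    """
    primary_access = "read-only" if systems_can_communicate else "read-write"
    return {
        "replica_access": "read-write",
        "primary_access": primary_access,
        # The primary keeps broadcasting its configuration to the DNS only
        # when it cannot communicate with the replica.
        "primary_broadcasts_dns": not systems_can_communicate,
        # Both systems writable means conflicts may occur at failback.
        "conflict_risk": primary_access == "read-write",
    }
```

The `conflict_risk` flag marks the case in which clients should be directed to the replica to avoid conflicts when the link fails back.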

Failover with bidirectional active/passive links

With bidirectional active/passive links, failover is an independent process for each of the two links. If one of the HCP systems involved in the links becomes unavailable, failover needs to occur only on the link for which that system is the primary system. The link for which that system is the replica does not need to be failed over, and the status of the HCP tenants and namespaces and default-namespace directories on that link does not change.
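As a minimal sketch of this rule, the following function picks which links need failover when a system becomes unavailable. The link names, system names, and the `links_to_fail_over` helper are hypothetical and serve only to illustrate the selection logic.

```python
def links_to_fail_over(links: dict, unavailable_system: str) -> list:
    """Return the names of links whose primary system is unavailable.

    `links` maps a link name to a (primary, replica) pair. Links on which
    the unavailable system is only the replica are left alone.
    """
    return [
        name
        for name, (primary, _replica) in links.items()
        if primary == unavailable_system
    ]


# Two active/passive links between the same systems, in opposite directions.
bidirectional = {
    "link-a-to-b": ("system-a", "system-b"),
    "link-b-to-a": ("system-b", "system-a"),
}
```

With these example links, an outage of `system-a` selects only `link-a-to-b`.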

Failback with an active/passive link

Failing back an active/passive link has two phases, begin recovery and complete recovery. The begin recovery phase is always started manually. The complete recovery phase can be started manually or automatically.

When recovery begins on a link, the replica starts sending configuration changes and changes to namespace content back to the primary system. The replicated HCP tenants and namespaces and default-namespace directories remain read-write on the replica and read-only on the primary system.

The complete recovery phase is designed to allow data recovery to catch up to the current time before normal replication restarts. At the beginning of this phase, the replicated items change to read-only on the replica and remain read-only on the primary system. The replica continues to send data to the primary system until the two HCP systems involved in the link are in sync with each other. Within a minute after that point:

  • Normal replication restarts on the link.
  • With DNS failover enabled, each system involved in the link broadcasts its own configuration to the DNS. From that point on, client requests that target either system by domain name are directed to the specified system.

Before starting the complete recovery phase manually, you should wait until data recovery is caught up with the current time. When you configure automatic failback for an active/passive link, you specify how up to date data recovery must be before the complete recovery phase begins.
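The two-phase failback sequence can be viewed as a simple state machine. The sketch below is an illustrative model under stated assumptions, not HCP code: the `Phase` enum, the event names, and the use of minutes to express how up to date data recovery is are all hypothetical.

```python
from enum import Enum


class Phase(Enum):
    FAILED_OVER = "failed over"
    BEGIN_RECOVERY = "begin recovery"
    COMPLETE_RECOVERY = "complete recovery"
    NORMAL = "normal replication"


# Access to the replicated items during each recovery phase.
ACCESS = {
    Phase.BEGIN_RECOVERY: {"replica": "read-write", "primary": "read-only"},
    Phase.COMPLETE_RECOVERY: {"replica": "read-only", "primary": "read-only"},
}

# Allowed transitions; any other (phase, event) pair leaves the phase unchanged.
TRANSITIONS = {
    (Phase.FAILED_OVER, "begin_recovery"): Phase.BEGIN_RECOVERY,            # always manual
    (Phase.BEGIN_RECOVERY, "complete_recovery"): Phase.COMPLETE_RECOVERY,   # manual or automatic
    (Phase.COMPLETE_RECOVERY, "in_sync"): Phase.NORMAL,
}


def next_phase(phase: Phase, event: str) -> Phase:
    """Return the phase that follows `phase` when `event` occurs."""
    return TRANSITIONS.get((phase, event), phase)


def auto_complete_ready(recovery_lag_minutes: int, max_lag_minutes: int) -> bool:
    """Assumed form of the automatic failback check: data recovery must be
    at least as up to date as the configured limit."""
    return recovery_lag_minutes <= max_lag_minutes
```

In this model, a link cannot skip the begin recovery phase: sending `"complete_recovery"` while the link is still failed over leaves the phase unchanged.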

 
