Managing bucket synchronization

Hitachi Content Platform for cloud scale (HCP for cloud scale) provides functions to let you configure and manage bucket synchronization.

About bucket synchronization

HCP for cloud scale can synchronize the following kinds of data in buckets:

  • Object data
  • All user metadata (that is, anything that can be returned in the header x-amz-meta-*)
  • Tags
  • Content-Type system metadata
  • Objects that the owner of the source bucket doesn't have permission to read

Note

Objects that existed before the functions are configured are not synchronized.

HCP for cloud scale checks the rules that are valid at the time an object is synchronized, not at the time the object is ingested.

Objects that are marked as deleted are not synchronized.

Most system metadata is not synchronized, specifically:

  • Owner ID and Name
  • Timestamps (when last modified)
  • Metadata returned in x-amz-grant-*
  • Metadata returned in x-amz-acl
  • Metadata returned in x-amz-storage-class
  • Metadata returned in x-amz-replication-status
  • Metadata returned in x-amz-server-side-encryption-*
  • Metadata returned in x-amz-restore-*
  • Metadata returned in x-amz-version-id-*
  • Metadata returned in x-amz-website-redirect-location
  • Metadata returned in x-amz-object-lock-*

Comparing synchronization to replication

Unlike AWS replication, HCP for cloud scale can synchronize with buckets on storage systems outside of AWS.

AWS determines the destination bucket using rules, but only applies one rule to each new object. In contrast, HCP for cloud scale can apply multiple rules to each new object so long as the destination buckets are different. This is how one-to-many synchronization is implemented.

HCP for cloud scale synchronizes objects that the owner of the source bucket doesn't have permission to read; AWS replication does not.

In contrast with AWS replication, HCP for cloud scale does not synchronize the following:

  • Access control lists (ACLs)
  • Lock retention information
  • Objects that are encrypted using Amazon S3 managed keys (SSE-S3) and AWS KMS managed keys (SSE-KMS)

If an object being synchronized has the same name as an object in the target bucket, the result depends on whether or not the target bucket uses versioning:

  • If versioning is used, the old object is kept as an old version.
  • If versioning is not used, the old object is replaced by the new object.

HCP for cloud scale buckets always use versioning. The best practice is to use versioning in all target buckets.

Best-effort ordering

HCP for cloud scale guarantees that operations are applied in the order of their arrival (strong consistency). However, synchronizing multiple operations applied in a short period of time to the same object presents the following difficulties:

  • In a distributed system, especially when many systems are involved, synchronizing all operations in correct order is complex.
  • Even if HCP for cloud scale synchronizes all operations in correct order to an external storage component, that component might not guarantee that the operations are applied with strong consistency. In particular, AWS guarantees only "eventual consistency."
  • For bucket sync-from, the external queue service might not guarantee that messages are provided in correct order. In particular, AWS Simple Queue Service (SQS) does not support first-in, first-out (FIFO) queues for S3 notifications.

Therefore, HCP for cloud scale makes its best effort to synchronize only the latest state of an object, not each version or operation for the object. For example:

  • Assume that a client sends three operations to an object, and that they are all committed: (1) PUT, (2) PUT, (3) DEL. The latest state of the object is (3) DEL. HCP for cloud scale only synchronizes (3) DEL.
  • Assume that a client sends three operations to an object, and that they are all committed: (1) PUT, (2) DEL, (3) PUT. The latest state of the object is (3) PUT. HCP for cloud scale only synchronizes (3) PUT.
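
The two cases above can be sketched in a few lines of Python. This is only an illustration of the "latest state wins" idea, not the HCP for cloud scale implementation; the function name `latest_state` and the (key, sequence number, operation) tuple layout are assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Each committed operation is (object_key, sequence_number, operation).
# The tuple layout and the function name are hypothetical, chosen for illustration.
Operation = Tuple[str, int, str]


def latest_state(operations: List[Operation]) -> Dict[str, Tuple[int, str]]:
    """Keep only the most recently committed operation for each object key."""
    latest: Dict[str, Tuple[int, str]] = {}
    for key, seq, op in operations:
        # A later sequence number supersedes any earlier operation on the same key.
        if key not in latest or seq > latest[key][0]:
            latest[key] = (seq, op)
    return latest


case_1 = [("photo.jpg", 1, "PUT"), ("photo.jpg", 2, "PUT"), ("photo.jpg", 3, "DEL")]
case_2 = [("photo.jpg", 1, "PUT"), ("photo.jpg", 2, "DEL"), ("photo.jpg", 3, "PUT")]

print(latest_state(case_1))  # only (3) DEL remains to be synchronized
print(latest_state(case_2))  # only (3) PUT remains to be synchronized
```

The intermediate operations never reach the external bucket in this model, which is why the external bucket cannot be used as a full history of the source object.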

This approach does not guarantee that the external storage holds the latest state of an object in every situation. Corner cases remain, partly because of the "eventual consistency" provided by the AWS S3 API.

Overview of tasks

These are the high-level steps involved in setting up bucket synchronization:

  1. If appropriate, the HCP for cloud scale administrator assigns a sync-to or sync-from role to tenant administrators.
  2. The administrator creates buckets.
  3. The administrator configures synchronization rules.
  4. Users can now use synchronization when writing objects to buckets.

Note: If you use the AWS command-line interface to configure bucket synchronization, use aws-cli v1.16.211 or later.

Bucket synchronization configuration

Bucket synchronization is configured using S3 PUT bucket replication API requests that define rules. Each bucket can have up to 1,000 rules, but all rules must be sync-to or sync-from rules. Each rule defines the following:

  • External bucket settings
  • A set of one or more prefixes; an object with one of the prefixes is mirrored
  • A set of one or more tags; an object with all, or any, of the tags is mirrored
  • For sync-from, external queue settings

Because you can configure multiple rules with multiple tags, you have flexibility in selecting objects to mirror. For example:

  • To mirror all objects that contain Tag1 and Tag2, you can configure one rule that includes both tags.
  • To mirror all objects that contain Tag1 or Tag2, you can configure two rules, one for each tag.
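
The difference between the two configurations above can be modeled as follows. This is a simplified sketch that looks only at tags (prefixes and destinations are left out), and the `Rule` class and `is_mirrored` function are hypothetical names, not part of any HCP for cloud scale API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Rule:
    """Hypothetical, simplified view of a synchronization rule (tags only)."""
    rule_id: str
    tags: Dict[str, str] = field(default_factory=dict)

    def matches(self, object_tags: Dict[str, str]) -> bool:
        # Within one rule, every listed tag must be present with the same value.
        return all(object_tags.get(k) == v for k, v in self.tags.items())


def is_mirrored(object_tags: Dict[str, str], rules: List[Rule]) -> bool:
    # Across rules, a single matching rule is enough.
    return any(rule.matches(object_tags) for rule in rules)


# "Tag1 and Tag2": one rule that includes both tags.
and_rules = [Rule("both", {"Tag1": "a", "Tag2": "b"})]
# "Tag1 or Tag2": two rules, one for each tag.
or_rules = [Rule("first", {"Tag1": "a"}), Rule("second", {"Tag2": "b"})]

only_tag1 = {"Tag1": "a"}
print(is_mirrored(only_tag1, and_rules))  # False: Tag2 is missing
print(is_mirrored(only_tag1, or_rules))   # True: the first rule matches
```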

Rule collisions

HCP for cloud scale can apply multiple bucket synchronization rules to each new object so long as the destination buckets are different. This is how one-to-many synchronization is implemented.

A rule collision is when two or more rules that apply to an object have the same destination (that is, the same external host, port, and bucket). HCP for cloud scale does not allow rule collisions, so PUT bucket replication requests are rejected if they contain rule collisions. To avoid rule collisions, you can define as many tags in a rule as necessary, so that multiple rules with the same destination are not needed.
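
Before sending a request, you might want to check a rule set for shared destinations yourself. The sketch below is a conservative pre-check, not the validation that HCP for cloud scale performs: it flags every destination (host, port, bucket) used by more than one rule, without working out whether those rules could apply to the same object. The function name and the rule dictionary are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Destination = Tuple[str, int, str]  # (external host, port, bucket)


def shared_destinations(rules: Dict[str, Destination]) -> Dict[Destination, List[str]]:
    """Return every destination that is used by more than one rule ID."""
    by_destination: Dict[Destination, List[str]] = defaultdict(list)
    for rule_id, destination in rules.items():
        by_destination[destination].append(rule_id)
    return {dest: ids for dest, ids in by_destination.items() if len(ids) > 1}


# Hypothetical rule set: "images" and "video" point at the same external bucket.
rules = {
    "images": ("s3.amazonaws.com", 443, "redbucket"),
    "music": ("s3.amazonaws.com", 443, "bluebucket"),
    "video": ("s3.amazonaws.com", 443, "redbucket"),
}

print(shared_destinations(rules))
```

Any destination reported here is a candidate for merging its rules into one rule with more tags.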

Effect of configuration changes

When bucket synchronization rules are created, updated, or deleted, the changes only apply to new objects or new S3 API operations. Objects that existed before the rules are configured are not synchronized. If an object exists in the state PENDING when a rule is created, updated, or deleted, the rule might not be applied to it, because the object might be in the midst of copying.

Configure bucket synchronization (PUT bucket replication)

You can configure S3 bucket sync-to and sync-from settings.

HTTP request syntax (URI)
aws --endpoint-url https://host_ip s3api put-bucket-replication --bucket "bucket" --replication-configuration '{body}'

Request structure

A rule consists of up to 1000 prefixes and tag-value pairs. You can configure up to 1000 rules per bucket. Separate tag-value pairs in the rule using the keywords "And": or "Or":.

The request body is shown below:

'{
  "Role": "",
  "Rules": [{
    "ID": "string",
    "Filter": {
       "Prefix": "string",
       "Tag": {
         "Key": "string",
         "Value": "string"
      }
    },
    "Status": "boolean",
    "Destination": {
      "Bucket": "json",
      "Account": "B64_key, B64_key",
      "StorageClass": ""
    }
  }
  .
  .
  .
  ]
}'

Note: S3 parameters that are not shown are neither required nor supported; if you specify them, leave them empty.

| Parameter | Required | Type | Description |
|---|---|---|---|
| Role | Yes | N/A | Not supported; leave empty. |
| ID | No | String | Unique identifier for rule, up to 255 characters. All rules must specify the same bucket. |
| Priority | Yes | Integer | Not supported; ignored. |
| DeleteMarkerReplication.Status | No | String | Not supported; if provided, leave as Disabled. |
| Prefix | No | String | Prefix (one per rule). Up to 1024 characters. |
| Key | No | String | Tag key (up to 1000 per rule). Up to 128 characters. |
| Value | No | String | Tag value. Up to 256 characters. |
| Rules.Status | Yes | Boolean | Enter Enabled or Disabled. If Disabled, rule is ignored. |
| Bucket | Yes | Base64-encoded JSON | External S3 bucket access settings. For bucket sync-to, the settings to access the external bucket. For bucket sync-from, the settings to access the external bucket and the SQS queue settings. |
| Account | No | Base64-encoded string | The S3 access key and secret key credentials to the external S3 bucket. |
| StorageClass | No | Enum | Optional destination storage class override. If provided, leave empty. |

Bucket sync-to structure

Bucket sync-to settings are defined by a set of parameters and passed in the value of Destination.Bucket as a Base64-encoded string.

The syntax of a bucket sync-to setting is shown below:

arn::sync-to::version::host:port::region::bucket_name::auth_version::path_style_always

| Parameter | Required | Type | Description |
|---|---|---|---|
| version | Yes | String | Enter 1.0. |
| host | Yes | IP address | Host IP address. |
| port | Yes | Integer | Host port. |
| region | Yes | String | The S3 region. |
| bucket_name | Yes | String | The name of the bucket. Enter a name from 3 to 63 characters long containing only lowercase characters (a-z), numbers (0-9), periods (.), or hyphens (-). The bucket must already exist. |
| auth_version | Yes | String | AWS Signature version: enter V2 or V4. |
| path_style_always | Yes | Boolean | Path-style URLs for bucket access: enter true or false. |
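
As an illustration, the sync-to parameters above can be assembled and encoded with a small helper. This is a sketch under stated assumptions: the function names are hypothetical, and "Base64-encoded" is taken to mean standard Base64 over the UTF-8 bytes of the setting string.

```python
import base64
import re

# 3 to 63 characters: lowercase letters, digits, periods, or hyphens.
BUCKET_NAME = re.compile(r"[a-z0-9.-]{3,63}")


def sync_to_setting(host: str, port: int, region: str, bucket_name: str,
                    auth_version: str = "V4", path_style_always: bool = True,
                    version: str = "1.0") -> str:
    """Assemble a sync-to setting string from the parameters listed above."""
    if not BUCKET_NAME.fullmatch(bucket_name):
        raise ValueError(f"invalid bucket name: {bucket_name!r}")
    if auth_version not in ("V2", "V4"):
        raise ValueError("auth_version must be V2 or V4")
    fields = ["arn", "sync-to", version, f"{host}:{port}", region,
              bucket_name, auth_version, str(path_style_always).lower()]
    return "::".join(fields)


def encode_setting(setting: str) -> str:
    """Base64-encode a setting string (assumed encoding: UTF-8, standard alphabet)."""
    return base64.b64encode(setting.encode("utf-8")).decode("ascii")


setting = sync_to_setting("s3.amazonaws.com", 443, "us-east-1", "redbucket")
print(setting)
print(encode_setting(setting))
```

Validating the bucket name locally catches naming mistakes early; it does not check that the bucket exists.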

Bucket sync-from structure

Bucket sync-from settings include both a bucket address and a notification queue. The settings are defined by a set of parameters and passed in the value of Destination.Bucket as a Base64-encoded string.

The syntax of a bucket sync-from setting is shown below:

arn::sync-from::version::host:port::S3_region::bucket_name::auth_version::path_style_always::AWS_SQS::SQS_region::SQS_queue::SQS_access_key::SQS_secret_key

| Parameter | Required | Type | Description |
|---|---|---|---|
| version | Yes | String | Enter 1.0. |
| host | Yes | IP address | Host IP address. |
| port | Yes | Integer | Host port. |
| S3_region | Yes | String | The S3 region. |
| bucket_name | Yes | String | The name of the bucket. Enter a name from 3 to 63 characters long containing only lowercase characters (a-z), numbers (0-9), periods (.), or hyphens (-). The bucket must already exist. |
| auth_version | Yes | String | AWS Signature version: enter V2 or V4. |
| path_style_always | Yes | Boolean | Path-style URLs for bucket access: enter true or false. |
| SQS_region | Yes | String | The SQS region. |
| SQS_queue | Yes | String | The name of the notification queue. |
| SQS_access_key | Yes | Base64-encoded string | The access key of the S3 credentials for access to the notification queue. |
| SQS_secret_key | Yes | Base64-encoded string | The secret key of the S3 credentials for access to the notification queue. |
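
A sync-from setting can be assembled in the same way. The sketch below follows the `arn::sync-from::` form used in the request example in this section and Base64-encodes the two SQS keys. The function names are hypothetical, and the keys `1234` and `6789` are dummy values chosen only because they encode to the strings shown in that example.

```python
import base64


def b64(text: str) -> str:
    """Standard Base64 of the UTF-8 bytes of text."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def sync_from_setting(host: str, port: int, s3_region: str, bucket_name: str,
                      sqs_region: str, sqs_queue: str,
                      sqs_access_key: str, sqs_secret_key: str,
                      auth_version: str = "V4", path_style_always: bool = True,
                      version: str = "1.0") -> str:
    """Assemble a sync-from setting string from the parameters listed above."""
    fields = [
        "arn", "sync-from", version, f"{host}:{port}", s3_region, bucket_name,
        auth_version, str(path_style_always).lower(),
        "AWS_SQS", sqs_region, sqs_queue,
        b64(sqs_access_key), b64(sqs_secret_key),  # keys are passed Base64-encoded
    ]
    return "::".join(fields)


# Dummy credentials for illustration only.
setting = sync_from_setting(
    "s3.amazonaws.com", 443, "us-east-1", "bluebucket",
    "us-east-1", "blackqueue", "1234", "6789", auth_version="v4",
)
print(setting)
```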

Response structure

None.

Example

Request example:

aws --endpoint-url https://10.08.1019 s3api put-bucket-replication --bucket "hcpcs_bucket" --replication-configuration '{body}'

JSON request:

'{
  "Role": "",
  "Rules": [{
    "ID": "sync_rule1_for_images",
    "Filter": {
      "Prefix": "/images/september/",
      "Tag": {
        "Key": "target",
        "Value": "cloud"
      }
    },
    "Status": "Enabled",
    "Destination": {
      "Bucket": "arn::sync-to::1.0::s3.amazonaws.com:443::us-east-1::redbucket::v4::true",
      "Account": "access_key, secret_key",
      "StorageClass": "STANDARD_IA"
    }
  },
  {
    "ID": "sync_rule2_for_music",
    "Filter": {
      "Prefix": "/music/october/",
      "Tag": {
        "Key": "target",
        "Value": "cloud"
      }
    },
    "Status": "Enabled",
    "Destination": {
      "Bucket": "arn::sync-from::1.0::s3.amazonaws.com:443::us-east-1::bluebucket::v4::true::AWS_SQS::us-east-1::blackqueue::MTIzNA==::Njc4OQ==",
      "Account": "access_key, secret_key",
      "StorageClass": "STANDARD_IA"
    }
  }]
}'
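
If you generate rules from a script, building the body with a JSON library avoids quoting and bracket mistakes. The following sketch reproduces the first rule of the example above and prints an equivalent command line; `sync_rule` is a hypothetical helper, `host_ip` is a placeholder, and `StorageClass` is left empty here (unlike the example) because the parameter table advises leaving it empty.

```python
import json
import shlex


def sync_rule(rule_id: str, prefix: str, tag_key: str, tag_value: str,
              bucket_setting: str, account: str) -> dict:
    """Build one rule using only the parameters described in this section."""
    return {
        "ID": rule_id,
        "Filter": {
            "Prefix": prefix,
            "Tag": {"Key": tag_key, "Value": tag_value},
        },
        "Status": "Enabled",
        "Destination": {
            "Bucket": bucket_setting,
            "Account": account,
            "StorageClass": "",  # left empty, as the parameter table advises
        },
    }


body = {
    "Role": "",  # not supported; left empty
    "Rules": [
        sync_rule(
            "sync_rule1_for_images", "/images/september/", "target", "cloud",
            "arn::sync-to::1.0::s3.amazonaws.com:443::us-east-1::redbucket::v4::true",
            "access_key, secret_key",  # placeholder credentials, as in the example
        ),
    ],
}

# host_ip is a placeholder for the address of your HCP for cloud scale system.
command = [
    "aws", "--endpoint-url", "https://host_ip",
    "s3api", "put-bucket-replication",
    "--bucket", "hcpcs_bucket",
    "--replication-configuration", json.dumps(body),
]

# shlex.join quotes the JSON so the line can be pasted into a POSIX shell.
print(shlex.join(command))
```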

Get bucket synchronization rules (GET bucket replication)

You can retrieve the synchronization rules for a bucket.

HTTP request syntax (URI)
aws --endpoint-url https://host_ip s3api get-bucket-replication --bucket "bucket"

Request structure

Not applicable.

Response structure

The response body is shown below:

{
  "ReplicationConfiguration": {
    "Role": "",
    "Rules": [
      {
        "Filter": {
          "And": {
            "Prefix": "string",
            "Tags": [
              {
                "Key": "string",
                "Value": "string"
              }
              .
              .
              .
            ]
          }
        },
        "Status": "boolean",
        "Destination": {
          "Bucket": "access_settings"
        },
        "ID": "string"
      }
    ]
  }
}

| Parameter | Required | Type | Description |
|---|---|---|---|
| Role | Yes | N/A | Not supported; empty. |
| Prefix | No | String | Prefix. |
| Key | No | String | Tag key. |
| Value | No | String | Tag value. Sets of prefixes and key-value pairs. |
| Status | Yes | Boolean | Enabled or Disabled. If Disabled, rule is ignored. |
| Bucket | Yes | Base64-encoded JSON | Bucket access settings. S3 access and secret keys are masked. |
| ID | No | String | Unique identifier for rule, up to 255 characters. |

Return codes

| Status code | HTTP name | Description |
|---|---|---|
| 200 | OK | The request was executed successfully. |
| 401 | Unauthorized | Access was denied due to invalid credentials. |

Example

Request example:

aws --endpoint-url https://10.08.1019 s3api get-bucket-replication --bucket "hcpcs_bucket"

JSON response:

{
  "ReplicationConfiguration": {
    "Role": "",
    "Rules": [
      {
        "Filter": {
          "And": {
            "Prefix": "SQS",
            "Tags": [
              {
                "Value": "cloud",
                "Key": "target"
              }
            ]
          }
        },
        "Status": "Enabled",
        "Destination": {
          "Bucket": "arn::sync-from::1.0::s3.amazonaws.com:443::<AWS-Region>::hcpcs_bucket::V4::true::AWS_SQS::<SQS-Region>::<SQS-QUEUE-TopicName>"
        },
        "ID": "mirrorBack_rule_for_images"
      }
    ]
  }
}
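
A response like the one above can be summarized with a short script. This sketch assumes that `Destination.Bucket` is returned as the plain `::`-delimited string shown in the example; the field names, the function names, and the region and queue values in the sample are illustrative assumptions.

```python
import json
from typing import Dict, List

# Field order follows the sync-to and sync-from syntax lines in this section.
BASE_FIELDS = ["prefix", "direction", "version", "endpoint", "region",
               "bucket", "auth_version", "path_style_always"]
QUEUE_FIELDS = ["queue_type", "sqs_region", "sqs_queue"]


def parse_setting(setting: str) -> Dict[str, str]:
    """Split a '::'-delimited setting string into named fields."""
    parts = setting.split("::")
    parsed = dict(zip(BASE_FIELDS, parts))
    # sync-from settings carry queue fields after the bucket fields.
    parsed.update(zip(QUEUE_FIELDS, parts[len(BASE_FIELDS):]))
    return parsed


def summarize(response: dict) -> List[str]:
    """Return one line per rule: ID, status, direction, and external bucket."""
    lines = []
    for rule in response["ReplicationConfiguration"]["Rules"]:
        dest = parse_setting(rule["Destination"]["Bucket"])
        rule_id = rule.get("ID", "(no ID)")
        lines.append(f"{rule_id}: {rule['Status']} {dest['direction']} {dest['bucket']}")
    return lines


# Sample response with made-up region and queue names.
response_text = """
{
  "ReplicationConfiguration": {
    "Role": "",
    "Rules": [
      {
        "Filter": {"And": {"Prefix": "SQS",
                           "Tags": [{"Key": "target", "Value": "cloud"}]}},
        "Status": "Enabled",
        "Destination": {
          "Bucket": "arn::sync-from::1.0::s3.amazonaws.com:443::us-east-1::hcpcs_bucket::V4::true::AWS_SQS::us-east-1::example-queue"
        },
        "ID": "mirrorBack_rule_for_images"
      }
    ]
  }
}
"""

response = json.loads(response_text)
for line in summarize(response):
    print(line)
```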

Get object synchronization status

The synchronization status of an object is returned in metadata as part of the response to a GET object or HEAD object request.

For a GET object or HEAD object request, the synchronization functions return a replication status header in addition to the standard response metadata. Use this header to verify that an object has been synchronized before you delete it from the source bucket.

Response header: x-amz-replication-status

Status of synchronization:

  • COMPLETED: For sync-to, all rules were successfully executed and the object was successfully synchronized.

    Note: This status is also returned for objects that match a sync-to rule but were skipped because they are not the most recent version.

  • PENDING: For sync-to, one of the following: (1) a check is pending to see if the object needs synchronization; (2) the object needs synchronization, but the process is not complete.
  • FAILED: For sync-to, the process has failed multiple times. To be synchronized, the object must be reloaded.
  • REPLICA: For sync-from, the object is a replica created by Amazon S3.

(Header not in response): The object did not match any rules.
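
A client can use this header as a guard before deleting from the source bucket. The sketch below works on a plain dictionary of response headers, however you obtain them; the function names are hypothetical. Treating only COMPLETED as safe is a deliberately strict choice made for this example.

```python
from typing import Mapping, Optional

STATUS_HEADER = "x-amz-replication-status"


def replication_status(headers: Mapping[str, str]) -> Optional[str]:
    """Return the synchronization status, or None if the header is absent."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == STATUS_HEADER:
            return value
    return None


def synchronized(headers: Mapping[str, str]) -> bool:
    """True only when the object reports COMPLETED."""
    return replication_status(headers) == "COMPLETED"


print(synchronized({"X-Amz-Replication-Status": "COMPLETED"}))  # True
print(synchronized({"x-amz-replication-status": "PENDING"}))    # False
print(synchronized({"Content-Type": "image/png"}))              # False: no rule matched
```

Keep in mind that COMPLETED is also returned for an object version that was skipped because it is not the most recent version, so check the version you intend to delete.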

Delete bucket synchronization rules (DELETE bucket replication)

You can delete S3 synchronization settings for buckets. This function is the same as in AWS S3.

HTTP request syntax (URI)
aws --endpoint-url https://host_ip s3api delete-bucket-replication --bucket "bucket"

Request structure

None.

Response structure

None.

Example

Request example:

aws --endpoint-url https://10.08.1019 s3api delete-bucket-replication --bucket "hcpcs_bucket"