Bucket synchronization

Hitachi Content Platform for cloud scale (HCP for cloud scale) lets you configure and manage bucket synchronization.

To configure bucket synchronization, use S3 put bucket replication API requests. Scripts are available to simplify the process.

About bucket synchronization

HCP for cloud scale can synchronize the following kinds of data in buckets:

  • Object data
  • All user metadata (that is, anything that can be returned in the header x-amz-meta-*)
  • Tags
  • Content-Type system metadata
  • Objects that the owner of the source bucket doesn't have permission to read

This diagram illustrates the concept of bucket synchronization.

[Figure: conceptual diagram of an HCP for cloud scale system with sync-to and sync-from buckets, each synchronized with a different bucket in the cloud]
Limitations on bucket synchronization

Objects that existed before synchronization functions are configured are not synchronized.

HCP for cloud scale verifies the rules that are valid at the time an object is synchronized, not at the time the object is ingested.

Objects that are marked as deleted are not synchronized.

Most system metadata is not synchronized, specifically:

  • Owner ID and Name
  • Timestamps (when last modified)
  • Metadata returned in x-amz-grant-*
  • Metadata returned in x-amz-acl
  • Metadata returned in x-amz-storage-class
  • Metadata returned in x-amz-replication-status
  • Metadata returned in x-amz-server-side-encryption-*
  • Metadata returned in x-amz-restore-*
  • Metadata returned in x-amz-version-id-*
  • Metadata returned in x-amz-website-redirect-location
  • Metadata returned in x-amz-object-lock-*

The bucket sync-from function only supports one rule for the same external SQS queue and external bucket. If a bucket has multiple sync-from rules for the same external queue, objects might not be synchronized. To use multiple rules for an external bucket, use one SQS queue for each rule.

Comparing synchronization to replication

Unlike AWS replication, HCP for cloud scale can synchronize with buckets on storage systems outside of AWS.

AWS determines the destination bucket using rules, but only applies one rule to each new object. In contrast, HCP for cloud scale can apply multiple rules to each new object so long as the destination buckets are different. This is how one-to-many synchronization is implemented.

AWS replication does not copy objects that the owner of the source bucket doesn't have permission to read; HCP for cloud scale synchronizes them.

In contrast with AWS replication, HCP for cloud scale does not synchronize the following:

  • Access control lists (ACLs)
  • Lock retention information
  • Objects that are encrypted using Amazon S3 managed keys (SSE-S3) and AWS KMS managed keys (SSE-KMS)

If an object being synchronized has the same name as an object in the target bucket, the result depends on whether the target bucket uses versioning:

  • If versioning is used, the old object is kept as an old version.
  • If versioning is not used, the old object is replaced by the new object.

HCP for cloud scale buckets always use versioning. The best practice is to use versioning in all target buckets.

Best-effort ordering

HCP for cloud scale guarantees that operations are applied in the order of their arrival (strong consistency). However, synchronizing multiple operations applied in a short period of time to the same object presents the following difficulties:

  • In a distributed system, especially when many systems are involved, synchronizing all operations in correct order is complex.
  • Even if HCP for cloud scale synchronizes all operations in correct order to an external storage component, that component might not guarantee that the operations are applied with strong consistency. In particular, AWS guarantees only "eventual consistency."
  • For bucket sync-from, the external queue service might not guarantee that messages are provided in correct order. In particular, AWS Simple Queue Service (SQS) does not support first-in, first-out (FIFO) queues for S3 notifications.

Therefore, HCP for cloud scale makes its best effort to synchronize only the latest state of an object, not each version or operation for the object. For example:

  • Assume that a client sends three operations to an object and that they are all committed: (1) PUT, (2) PUT, (3) DEL. The latest state of the object is (3) DEL. HCP for cloud scale only synchronizes DEL.
  • Assume that a client sends three operations to an object and that they are all committed: (1) PUT, (2) DEL, (3) PUT. The latest state of the object is (3) PUT. HCP for cloud scale only synchronizes (3) PUT.

This approach does not guarantee that the latest state of an object will be in the external storage in all situations. Partly because of the "eventual consistency" offered by the AWS S3 API, corner cases still exist.
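The two committed-operation examples above reduce to a simple rule: of the operations committed against an object, only the final state is propagated. A minimal sketch (a simplification, not the actual HCP for cloud scale policy engine):

```python
# Minimal sketch of best-effort "latest state" synchronization: of a
# committed operation sequence on one object, only the final state is
# propagated to the external bucket. This simplification is for
# illustration only.
def state_to_sync(ops):
    """ops: committed operations ("PUT" or "DEL") in arrival order."""
    return ops[-1] if ops else None

print(state_to_sync(["PUT", "PUT", "DEL"]))  # DEL is synchronized
print(state_to_sync(["PUT", "DEL", "PUT"]))  # the final PUT is synchronized
```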

Synchronization to an external bucket: high-level tasks

Synchronization to an external bucket involves assigning roles and permissions to users, creating and synchronizing the buckets, and then reading from and writing to the buckets.

This description of high-level tasks assumes three classes of user:

  1. An HCP for cloud scale system administrator to create roles and assign them to users using an IdP
  2. An HCP for cloud scale bucket administrator, who could be a tenant administrator, to create and configure an HCP for cloud scale bucket
  3. An Amazon Web Services (AWS) user, who could be a customer, to create a remote bucket using AWS S3 and then read and write data
Note: The default HCP for cloud scale account has full permissions and can perform the tasks assigned to the first two user classes.

Procedure

  1. The system administrator assigns permissions to the bucket administrator to configure bucket synchronization.

    1. In the System Management application, create a role with the permission group bucket_sync.

    2. In the IdP server, set up two groups: bucket administrators and bucket users.

    3. In the IdP server, register users in these groups.

    4. In the System Management application, assign the role to the bucket administrator group.

  2. The bucket administrator creates local and remote buckets.

    1. In the S3 User Credentials application, generate S3 credentials.

      Tip: Use the base64 utility to encode S3 credentials.
    2. Using the S3 credentials, use an S3 API to create an HCP for cloud scale (local) bucket.

    3. Use an AWS S3 API to create an S3 (remote) bucket.

  3. The bucket administrator configures bucket synchronization between the HCP for cloud scale bucket and the S3 bucket using an S3 PUT Bucket Replication method, replacing the bucket ARN with configuration settings. By using multiple rules and filters, the bucket administrator can specify what objects are synchronized to the S3 bucket.

  4. The bucket administrator sets access control lists to let the bucket user write data to the HCP for cloud scale bucket.

    1. Using a management API, get the user ID of the bucket user.

    2. Using an S3 API, assign write permission to the bucket user for the HCP for cloud scale bucket.

  5. The AWS user is now free to write objects to the HCP for cloud scale bucket, which is now synchronized with the remote bucket.
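The Tip in step 2 (Base64-encoding S3 credentials) can also be done in a few lines of Python instead of the base64 utility. A hedged sketch; the key value below is made up:

```python
# Base64-encode an S3 credential for use in bucket synchronization
# settings (equivalent to the base64 utility mentioned in the Tip).
# The credential value below is illustrative only.
import base64

def encode_credential(value: str) -> str:
    """Return the Base64 text form of an S3 credential string."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")

print(encode_credential("A1234567890"))  # QTEyMzQ1Njc4OTA=
```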

Synchronization from an external bucket: high-level tasks

Synchronization from an external bucket involves assigning roles and permissions to users, creating and synchronizing buckets, and then reading from and writing to the buckets.

This description of high-level tasks assumes three classes of user:

  1. An HCP for cloud scale system administrator to create roles and assign them to users using an IdP
  2. An HCP for cloud scale bucket administrator, who could be a tenant administrator, to create and configure an HCP for cloud scale bucket
  3. An AWS user, who could be a customer, to create a remote bucket using AWS S3, create an AWS SQS queue, and then configure S3 notifications to SQS
Note: The default HCP for cloud scale account has full permissions and can perform the tasks assigned to the first two user classes.

Procedure

  1. The system administrator assigns permissions to the bucket administrator to configure bucket synchronization.

    1. In the System Management application, create a role with the permission group bucket_sync.

    2. In the IdP server, set up two groups: bucket administrators and bucket users.

    3. In the IdP server, register users in these groups.

    4. In the System Management application, assign the role to the bucket administrator group.

  2. The bucket administrator creates local and remote buckets.

    1. In the S3 User Credentials application, generate S3 credentials.

      Tip: Use the base64 utility to encode S3 credentials.
    2. Using the S3 credentials, use an S3 API to create an HCP for cloud scale (local) bucket.

    3. Use an AWS S3 API to create an S3 (remote) bucket.

  3. The AWS user creates a standard queue in SQS.

    1. Using an AWS account, create a queue of the type Standard Queue.

    2. Create a policy document.

  4. The AWS user configures the remote bucket to send S3 notifications to the AWS SQS queue.

    1. Add a notification for all object creation events to the remote bucket.

  5. The bucket administrator configures bucket synchronization between the S3 bucket and the HCP for cloud scale bucket using an S3 PUT Bucket Replication method, replacing the bucket ARN with configuration settings. By using multiple rules and filters, the bucket administrator can specify what objects are synchronized to the local bucket.

  6. The bucket administrator sets access control lists to let the bucket user read data from the HCP for cloud scale bucket.

    1. Using a management API, get the user ID of the bucket user.

    2. Using an S3 API, assign read permission to the bucket user for the HCP for cloud scale bucket.

  7. The AWS user is now free to read objects from the HCP for cloud scale bucket, which is now synchronized with the remote bucket.
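Step 3.2 above ("Create a policy document") typically means an SQS access policy that allows S3 to deliver event notifications to the queue. A hedged sketch of such a policy, built in Python; the queue ARN, account ID, and bucket name are placeholders:

```python
# Hedged sketch of an SQS access policy allowing the remote bucket's
# S3 notifications to be delivered to the queue. The ARNs and account
# ID below are placeholders, not values from this document.
import json

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "s3.amazonaws.com"},
        "Action": "SQS:SendMessage",
        "Resource": "arn:aws:sqs:us-east-1:111122223333:bucketevents",
        "Condition": {
            "ArnLike": {"aws:SourceArn": "arn:aws:s3:::remote-bucket"}
        },
    }],
}

print(json.dumps(policy, indent=2))
```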

Bucket synchronization configuration

Bucket synchronization is configured using S3 PUT bucket replication API requests that define rules. Each bucket can have up to 1,000 rules, but all of a bucket's rules must be of the same type, either sync-to or sync-from. Each rule defines the following:

  • External bucket settings
  • A set of one or more prefixes; an object with one of the prefixes is mirrored
  • A set of one or more tags; an object with all, or any, of the tags is mirrored
  • For sync-from, external queue settings

Because you can configure multiple rules with multiple tags, you have flexibility in selecting objects to mirror. For example:

  • To mirror all objects that contain Tag1 and Tag2, you can configure one rule that includes both tags.
  • To mirror all objects that contain Tag1 or Tag2, you can configure two rules, one for each tag.
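The two strategies above can be sketched with the filter shapes used by PUT bucket replication. The rule IDs and tag values here are illustrative, and the Destination settings are omitted for brevity:

```python
# Sketch of the two tag-selection strategies. Rule IDs and tag values
# are made up; Destination settings are omitted for brevity.
import json

# One rule with an "And" filter: only objects carrying BOTH tags match.
rule_and = {
    "ID": "mirror_tag1_and_tag2",
    "Filter": {"And": {"Tags": [
        {"Key": "Tag1", "Value": "v1"},
        {"Key": "Tag2", "Value": "v2"},
    ]}},
    "Status": "Enabled",
}

# Two rules, one tag each: objects carrying Tag1 OR Tag2 match a rule.
rules_or = [
    {"ID": "mirror_tag1", "Filter": {"Tag": {"Key": "Tag1", "Value": "v1"}}, "Status": "Enabled"},
    {"ID": "mirror_tag2", "Filter": {"Tag": {"Key": "Tag2", "Value": "v2"}}, "Status": "Enabled"},
]

print(json.dumps({"Role": "", "Rules": [rule_and]}, indent=2))
```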

For information on PUT bucket replication see Configure bucket synchronization (PUT bucket replication).

Visibility of new buckets and objects

Newly created buckets and objects are not immediately visible. Some client applications (such as CloudBerry Explorer) retrieve the list of buckets immediately, before a new bucket or object becomes visible, and so do not display it. If you create a bucket or object and it is not displayed, refresh the list manually.

Rule collisions

HCP for cloud scale can apply multiple bucket synchronization rules to each new object so long as the destination buckets are different. This is how one-to-many synchronization is implemented.

A rule collision is when two or more rules that apply to an object have the same destination (that is, the same external host, port, and bucket). HCP for cloud scale does not allow rule collisions, so PUT bucket replication requests are rejected if they contain rule collisions. To avoid rule collisions, you can define as many tags in a rule as necessary, so that multiple rules with the same destination are not needed.
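The collision check described above can be sketched in a few lines: two rules collide when they resolve to the same external host, port, and bucket. The simplified rule shape below is an assumption for illustration:

```python
# Minimal sketch of the rule-collision check: two rules collide when
# they target the same external host, port, and bucket. The simplified
# rule shape here is an assumption for illustration.
def find_collisions(rules):
    """Return the (host, port, bucket) destinations used by more than one rule."""
    seen, collisions = set(), set()
    for rule in rules:
        dest = rule["destination"]
        key = (dest["host"], dest["port"], dest["remoteBucketName"])
        if key in seen:
            collisions.add(key)
        seen.add(key)
    return collisions

rules = [
    {"ID": "rule1", "destination": {"host": "s3.example.com", "port": 443, "remoteBucketName": "bucket1"}},
    {"ID": "rule2", "destination": {"host": "s3.example.com", "port": 443, "remoteBucketName": "bucket1"}},
]
print(find_collisions(rules))  # both rules share one destination
```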

Effect of configuration changes

After an object operation is performed, the policy engine asynchronously checks if that object needs to be copied according to the sync-to rules. When bucket synchronization rules are created, updated, or deleted, the changes only apply to new objects, object operations, and to objects that have not been yet processed by the policy engine. Objects that existed before the rules were configured are not synchronized. If an object exists in the PENDING state when a rule is created, updated, or deleted, the rule change might not be applied.

Synchronizing to the same source and destination

You cannot set up bucket synchronization with the same bucket as both the source and the destination.

Configure bucket synchronization (PUT bucket replication)

You can configure S3 bucket sync-to and sync-from settings.

Notes
  • If you use the AWS command-line interface to configure bucket synchronization, use at least aws-cli v1.16.211 and aws-sdk 1.11.610.
  • Configuration rules should be provided to the AWS CLI from a file rather than inline, to avoid problems with double quote characters in some terminals.
HTTP request syntax (URI)
aws --endpoint-url https://10.08.1019 s3api put-bucket-replication --bucket "hcpcs_bucket" --replication-configuration file://rules.json
Request structure

A rule consists of up to 1000 prefixes and tag-value pairs. You can configure up to 1000 rules per bucket. Separate tag-value pairs in the rule using the keywords "And": or "Or":.

The content of the configuration JSON file is:

{
  "Role": "",
  "Rules": [{
    "ID": "string",
    "Filter": {
       "Prefix": "string",
       "Tag": {
         "Key": "string",
         "Value": "string"
      }
    },
    "Status": "boolean",
    "Destination": {
      "Bucket": "json"
    }
   },
   .
   .
   .
  ]
}
Note: S3 parameters not shown are not required, not supported, and if specified should be left empty.
Parameter | Required | Type | Description
Role | Yes | N/A | Not supported; leave empty.
ID | No | String | Unique identifier for the rule, up to 255 characters. All rules must specify the same bucket.
Priority | Yes | Integer | Not supported; ignored.
DeleteMarkerReplication.Status | No | String | Not supported; if provided, leave as Disabled.
Prefix | No | String | Prefix (one per rule). Up to 1024 characters.
Key | No | String | Tag key (up to 1000 per rule). Up to 128 characters.
Value | No | String | Tag value. Up to 256 characters.
Rules.Status | Yes | Boolean | Enabled or Disabled. If Disabled, the rule is ignored.
Rules.Destination.Bucket | Yes | Base64-encoded JSON | External S3 bucket access settings. For bucket sync-to, the settings to access the external bucket; for bucket sync-from, the settings to access the external bucket and the SQS queue settings. You can't specify the same bucket name and host as both source and destination.
Rules.Destination.Account | No | N/A | Not supported; leave empty.
Bucket sync-to structure

Bucket sync-to settings are defined by a set of parameters and passed in the value of Rules.Destination.Bucket as a Base64-encoded JSON structure.

The syntax inside the bucket parameter for the sync-to setting is:

{
  'version': 'version', 
  'action': 'sync-to', 
  'externalBucket': {
    'host': 'host', 
    'type': 'type', 
    'region': 'region', 
    'remoteBucketName': 'bucket_name', 
    'accessKey': 'B64_key', 
    'secretKey': 'B64_key', 
    'port': 'port', 
    'authVersion': 'auth_version', 
    'usePathStyleAlways': '[true|false]'
    }
}
Parameter | Required | Type | Description
version | Yes | String | Enter 1.0.
host | Yes | IP address | Host IP address.
type | Yes | String | Destination storage class: AMAZON_S3 or GENERIC_S3.
region | Yes | String | The S3 region.
remoteBucketName | Yes | String | The name of the bucket, from 3 to 63 characters long, containing only lowercase letters (a-z), numbers (0-9), periods (.), or hyphens (-). The bucket must already exist.
accessKey | Yes | Base64-encoded string | The S3 access key credentials for the external S3 bucket.
secretKey | Yes | Base64-encoded string | The S3 secret key credentials for the external S3 bucket.
port | Yes | Integer | Host port.
authVersion | Yes | String | AWS Signature version: V2 or V4.
usePathStyleAlways | Yes | Boolean | Path-style URLs for bucket access: true or false.
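As a hedged sketch of how these settings come together, the following Python builds the sync-to structure and Base64-encodes it for the Rules.Destination.Bucket field. The host, bucket name, and keys are placeholders; in practice, the included SyncToBucketJsonGenerator.py script generates this value for you:

```python
# Build the sync-to settings and Base64-encode them for the
# Rules.Destination.Bucket field. All values below are placeholders.
import base64
import json

sync_to = {
    "version": "1.0",
    "action": "sync-to",
    "externalBucket": {
        "host": "s3.us-east-1.amazonaws.com",
        "type": "AMAZON_S3",
        "region": "us-east-1",
        "remoteBucketName": "bluebucket",
        "accessKey": base64.b64encode(b"access_key").decode(),
        "secretKey": base64.b64encode(b"secret_key").decode(),
        "port": "443",
        "authVersion": "V4",
        "usePathStyleAlways": "true",
    },
}

# Base64-encoded JSON, as required by Rules.Destination.Bucket
bucket_value = base64.b64encode(json.dumps(sync_to).encode()).decode()
print(bucket_value)
```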
Bucket sync-from structure

Bucket sync-from settings include both a bucket address and a notification queue. The settings are defined by a set of parameters and passed in the value of Rules.Destination.Bucket as a Base64-encoded string.

The syntax inside the bucket parameter for sync-from setting is:

{
  'version': 'version', 
  'action': 'sync-from', 
  'externalBucket': {
    'host': 'host', 
    'type': 'type', 
    'region': 'region', 
    'remoteBucketName': 'bucket_name', 
    'accessKey': 'B64_key', 
    'secretKey': 'B64_key', 
    'port': 'port', 
    'authVersion': 'auth_version', 
    'usePathStyleAlways': '[true|false]'
    },
  'notifications': {
    'type': 'type', 
    'region': 'region', 
    'queue': 'queue', 
    'accessKey': 'B64_key', 
    'secretKey': 'B64_key'
    }
}
Parameter | Required | Type | Description
version | Yes | String | Enter 1.0.
host | Yes | IP address | Host IP address.
type | Yes | String | Destination storage class: AMAZON_S3 or GENERIC_S3.
region | Yes | String | The S3 region.
remoteBucketName | Yes | String | The name of the bucket, from 3 to 63 characters long, containing only lowercase letters (a-z), numbers (0-9), periods (.), or hyphens (-). The bucket must already exist.
accessKey | Yes | Base64-encoded string | The S3 access key credentials for the external S3 bucket.
secretKey | Yes | Base64-encoded string | The S3 secret key credentials for the external S3 bucket.
port | Yes | Integer | Host port.
authVersion | Yes | String | AWS Signature version: V2 or V4.
usePathStyleAlways | Yes | Boolean | Path-style URLs for bucket access: true or false.
notifications.type | Yes | String | Always set to AWS_SQS.
notifications.region | Yes | String | Region of the AWS SQS queue.
notifications.queue | Yes | String | Name of the AWS SQS queue.
notifications.accessKey | Yes | Base64-encoded string | Access key with permission to read from the AWS SQS queue.
notifications.secretKey | Yes | Base64-encoded string | Secret key with permission to read from the AWS SQS queue.
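A hedged sketch of the corresponding sync-from value, which adds the notifications block for the SQS queue. The queue name, region, and keys are placeholders; in practice, the included SyncFromBucketJsonGenerator.py script generates this value for you:

```python
# Build sync-from settings, including the SQS notifications block,
# for Rules.Destination.Bucket. All values below are placeholders.
import base64
import json

sync_from = {
    "version": "1.0",
    "action": "sync-from",
    "externalBucket": {
        "host": "s3.us-east-2.amazonaws.com",
        "type": "AMAZON_S3",
        "region": "us-east-2",
        "remoteBucketName": "bluebucket",
        "accessKey": base64.b64encode(b"access_key").decode(),
        "secretKey": base64.b64encode(b"secret_key").decode(),
        "port": "443",
        "authVersion": "V4",
        "usePathStyleAlways": "true",
    },
    "notifications": {
        "type": "AWS_SQS",
        "region": "us-east-2",
        "queue": "bucketevents",
        "accessKey": base64.b64encode(b"access_key").decode(),
        "secretKey": base64.b64encode(b"secret_key").decode(),
    },
}

bucket_value = base64.b64encode(json.dumps(sync_from).encode()).decode()
print(bucket_value)
```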
Response structure

None.

Example

Request example:

aws --endpoint-url https://10.08.1019 s3api put-bucket-replication --bucket "hcpcs_bucket" --replication-configuration file://rules.json

Configuration rules.json:

{
  "Role": "",
  "Rules": [{
    "ID": "sync_rule2_for_music",
    "Filter": {
      "Prefix": "/music/october/",
      "Tag": {
        "Key": "target",
        "Value": "cloud"
        }
      },
    "Status": "Enabled",
    "Destination": {
      "Bucket": "{
        'version' : '1.0',
        'action' : 'sync-from',
        'externalBucket' : {
          'type' : 'AMAZON_S3',
          'region' : 'us-east-1',
          'remoteBucketName' : 'bluebucket',
          'authVersion' : 'V4',
          'usePathStyleAlways' : 'true',
          'accessKey' : 'access_key',
          'secretKey' : 'secret_key'
          },
        'notifications' : {
          'type' : 'AWS_SQS',
          'region' : 'us-east-1',
          'queue' : 'testQueue',
          'accessKey' : 'access_key',
          'secretKey' : 'secret_key'
          }
        }"
      }
    }]
}

Script to generate bucket sync-to JSON

HCP for cloud scale includes a script to generate the JSON needed to configure bucket synchronization to an external bucket (sync-to).

The script is written in Python and located in the folder install_path/product/bin (for example, /opt/hcpcs/bin).

The script generates the JSON string that you can use for the field destination.bucket in the AWS S3 command put-bucket-replication. Optionally, the script verifies whether the destination bucket exists. If you omit the secret key, the script prompts you for it, which lets you create a script that calls this script without storing the secret key. You can mix the short and full form of arguments.

Note: The script produces JSON using single quotes.
Syntax
SyncToBucketJsonGenerator.py
  [--help]
  --s3host host
  --region region
  --bucket bucket
  --accessKey access_key
  [--secretKey secret_key]
  [--s3type { GENERIC_S3 | AMAZON_S3 }]
  [--port port]
  [--authVersion { v2 | v4 }]
  [--usePathStyleAlways {true | false}]
  [--jsonSample output_file.json]
  [--verifyTarget]
  [--http]
  [--quietMode]
Options and parameters
  • -h, --help

    Optional. Displays a help message and exits.

  • --s3host host, -s3 host

    Host name of the remote S3 storage component.

  • --region region, -r region

    Region of the remote bucket.

  • --bucket bucket, -b bucket

    Name of the remote bucket.

  • --accessKey access_key, -ak access_key

    Access key for the remote bucket.

  • --secretKey secret_key, -sk secret_key

    Secret key for the remote bucket. The script prompts for the key if you don't specify it.

  • --s3type { GENERIC_S3 | AMAZON_S3 }, -s3t { GENERIC_S3 | AMAZON_S3 }

    Optional. The remote bucket type:

    • GENERIC_S3: An S3-compatible node
    • AMAZON_S3: An Amazon Web Services S3-compatible node
    If not specified, the default bucket type is AMAZON_S3.
  • --port port, -p port

    Optional. Port of the remote bucket. If not specified, the default port is 443.

  • --authVersion { v2 | v4 }, -av { v2 | v4 }

    Optional. The Auth Version of the remote bucket. If not specified, the default version is v4.

  • --usePathStyleAlways {true | false}, -upsa {true | false}

    Optional. Sets the Use Path Style Always flag for the remote bucket. If not specified, the default is true.

  • --jsonSample output_file.json, -json output_file.json

    Optional. Creates a file named output_file.json with a sample JSON structure for bucket replication configuration. If not specified, no sample file is created.

  • --verifyTarget, -verify

    Optional. Verifies that the remote bucket exists. SSL certificates are not validated. This option requires python3 and boto3. If not specified, the bucket's existence isn't verified.

  • --http, -http

    Optional. Use HTTP when verifying the remote bucket. If not specified, the default is to use HTTPS.

  • --quietMode, -qm

    Optional. Displays only the Destination.Bucket element.

    Note: You can't specify both --quietMode and --verifyTarget together.
Example
$ SyncToBucketJsonGenerator.py -s3 s3.us-east-2.amazonaws.com -b hcpcs-bucket-5 -r us-east-2 -ak A1234567890 -sk S1234567890  -verify -json testto.json

This example can produce the following output:

Verifying that a remote bucket "hcpcs-bucket-5" exists...
Verification successfully completed: remote bucket "hcpcs-bucket-5" is FOUND

Generated a JSON string for the Destination->Bucket element for bucket replication sync-to configuration:

{'action': 'sync-to', 'version': '1.0', 'externalBucket': {'host': 's3.us-east-2.amazonaws.com', 'type': 'AMAZON_S3', 'region': 'us-east-2', 
'remoteBucketName': 'hcpcs-bucket-5', 'accessKey': 'A1234567890=', 'secretKey': 'S1234567890==', 'port': 443, 
'authVersion': 'v4', 'usePathStyleAlways': 'true'}}

Saved sample JSON file for bucket replication sync-to configuration in 'testto.json'

You can use 'testto.json' sample JSON file as an input to put-bucket-replication S3 API. For example, using aws s3api command:
aws s3api put-bucket-replication --no-verify-ssl --endpoint-url https://cloudscale-hostname --bucket cloudscale-bucket --replication-configuration file://testto.json

Script to generate bucket sync-from JSON

A script is included to generate the JSON needed to configure bucket synchronization from an external bucket (sync-from).

The script is written in Python and located in the folder install_path/product/bin (for example, /opt/hcpcs/bin).

The script generates the JSON string that you can use for the field destination.bucket in the AWS S3 command put-bucket-replication. Optionally, the script verifies whether the destination bucket or the target AWS SQS queue exist. If you omit the secret key, the script prompts you for it, which lets you create a script that calls this script without storing the secret key. If you omit the access key for a queue, the script uses the access key and secret key for the bucket. You can mix the short and full form of arguments.

Note: The script produces JSON using single quotes.
Syntax
SyncFromBucketJsonGenerator.py
  [--help]
  --s3host host
  --region region
  --bucket bucket
  --accessKey access_key
  [--secretKey secret_key]
  [--s3type { GENERIC_S3 | AMAZON_S3 }]
  [--port port]
  [--authVersion { v2 | v4 }]
  [--usePathStyleAlways {true | false}]
  [--jsonSample output_file.json]
  [--verifyTarget]
  [--http]
  --notificationsQueue queue
  [--notificationsRegion region]
  [--notificationsAccessKey access_key]
  [--notificationsSecretKey secret_key]
  [--quietMode]
Options and parameters
  • -h, --help

    Optional. Displays a help message and exits.

  • --s3host host, -s3 host

    Host name of the remote S3 storage component.

  • --region region, -r region

    Region of the remote bucket.

  • --bucket bucket, -b bucket

    Name of the remote bucket.

  • --accessKey access_key, -ak access_key

    Access key for the remote bucket.

  • --secretKey secret_key, -sk secret_key

    Secret key for the remote bucket. The script prompts for the key if you don't specify it.

  • --s3type { GENERIC_S3 | AMAZON_S3 }, -s3t { GENERIC_S3 | AMAZON_S3 }

    Optional. The remote bucket type:

    • GENERIC_S3: An S3-compatible node
    • AMAZON_S3: An Amazon Web Services S3-compatible node
    If not specified, the default bucket type is AMAZON_S3.
  • --port port, -p port

    Optional. Port of the remote bucket. If not specified, the default port is 443.

  • --authVersion { v2 | v4 }, -av { v2 | v4 }

    Optional. The Auth Version of the remote bucket. If not specified, the default version is v4.

  • --usePathStyleAlways {true | false}, -upsa {true | false}

    Optional. Sets the Use Path Style Always flag for the remote bucket. If not specified, the default is true.

  • --jsonSample output_file.json, -json output_file.json

    Optional. Creates a file named output_file.json with a sample JSON structure for bucket replication configuration. If not specified, no sample file is created.

  • --verifyTarget, -verify

    Optional. Verifies that the remote bucket exists. SSL certificates are not validated. This option requires python3 and boto3. If not specified, the bucket's existence isn't verified.

  • --http, -http

    Optional. Use HTTP when verifying the remote bucket. If not specified, the default is to use HTTPS.

  • --notificationsQueue queue, -nq queue

    Name of the notifications queue.

  • --notificationsRegion region, -nr region

    Optional. Name of the notifications region. If not specified, the default is the region of the remote bucket.

  • --notificationsAccessKey access_key, -nak access_key

    Optional. The notifications access key. If not specified, the default is the access key of the remote bucket.

  • --notificationsSecretKey secret_key, -nsk secret_key

    Optional. The notifications secret key. If not specified, the default is the secret key of the remote bucket.

  • --quietMode, -qm

    Optional. Displays only the JSON for QueueArn.

    Note: You can't specify both --quietMode and --verifyTarget together.
Example
$ SyncFromBucketJsonGenerator.py -s3 s3.us-east-2.amazonaws.com -b hcpcs-bucket-5 -r us-east-2 -ak A1234567890 -sk S1234567890 -nq 'bucketevents2' -verify -json testfrom.json

This example can produce the following output:

Verifying that a remote bucket "hcpcs-bucket-5" exists...
Verification successfully completed: remote bucket "hcpcs-bucket-5" is found

Verifying that a remote notification queue with a prefix "bucketevents2" exists...
Verification successfully completed: found "bucketevents2" queue.

Generated a JSON string for the Destination->Bucket element for bucket replication sync-from configuration:

{'action': 'sync-from', 'version': '1.0', 'externalBucket': {'host': 's3.us-east-2.amazonaws.com', 'type': 'AMAZON_S3', 'region': 'us-east-2', 
'remoteBucketName': 'hcpcs-bucket-5', 'accessKey': 'A1234567890=', 'secretKey': 'S1234567890==', 'port': 443, 
'authVersion': 'v4', 'usePathStyleAlways': 'true'}, 'notifications': {'type': 'AWS_SQS', 'queue': 'bucketevents2', 'region': 'us-east-2', 
'accessKey': 'A1234567890=', 'secretKey': 'S1234567890=='}}

Saved sample JSON file for bucket replication sync-from configuration in 'testfrom.json'

You can use 'testfrom.json' sample JSON file as an input to put-bucket-replication S3 API. For example, using aws s3api command:
aws s3api put-bucket-replication --no-verify-ssl --endpoint-url https://cloudscale-hostname --bucket cloudscale-bucket --replication-configuration file://testfrom.json

Get bucket synchronization rules (GET bucket replication)

You can retrieve the synchronization rules for a bucket.

HTTP request syntax (URI)
aws --endpoint-url https://host_ip s3api get-bucket-replication --bucket "bucket"
Request structure

Not applicable.

Response structure

The response body is shown below:

{
  "ReplicationConfiguration": {
    "Role": "",
    "Rules": [
      {
        "Filter": {
          "And": {
            "Prefix": "string",
            "Tags": [
              {
                "Key": "string",
                "Value": "string"
              }
              .
              .
              .
            ]
          }
        },
        "Status": "boolean",
        "Destination": {
          "Bucket": "access_settings"
        },
        "ID": "string"
      }
    ]
  }
}
Parameter | Required | Type | Description
Role | Yes | N/A | Not supported; empty.
Prefix | No | String | Prefix.
Key | No | String | Tag key.
Value | No | String | Tag value. Sets of prefixes and key-value pairs.
Status | Yes | Boolean | If false, the rule is ignored.
Bucket | Yes | Base64-encoded JSON | Bucket access settings. S3 access and secret keys are masked.
ID | No | String | Unique identifier for the rule, up to 255 characters.
HTTP status codes

Status code | HTTP name | Description
200 | OK | The request was executed successfully.
401 | Unauthorized | Access was denied due to invalid credentials.
Example

Request example:

aws --endpoint-url https://10.08.1019 s3api get-bucket-replication --bucket "hcpcs_bucket"

JSON response:

{
  "ReplicationConfiguration": {
    "Role": "",
    "Rules": [
      {
        "Filter": {
          "And": {
            "Prefix": "SQS",
            "Tags": [
              {
                "Value": "cloud",
                "Key": "target"
              }
            ]
          }
        },
        "Status": "Enabled",
        "Destination": {
          "Bucket": "{
            'version': 'version', 
            'action': 'sync-from', 
            'externalBucket': {
              'host': 'host', 
              'type': 'type', 
              'region': 'region', 
              'remoteBucketName': 'bucket_name', 
              'port': 'port', 
              'authVersion': 'auth_version', 
              'usePathStyleAlways': '[true|false]'
              }
            }"
        },
        "ID": "mirrorBack_rule_for_images"
      }
    ]
  }
}

Get object synchronization status

The synchronization status of an object is returned in metadata as part of the response to a GET object or HEAD object request.

For a GET object or HEAD object request, the synchronization functions return a replication status header in addition to the standard response metadata. This information is useful before deletion from a source bucket to verify synchronization.

When an object is created, HCP for cloud scale evaluates the sync-to rules for the bucket. If the object matches the rules, it sets the object's sync state as PENDING. Most of the time, this sync state is accurate. However, it is never definitive because users may change the sync-to rules for the bucket before the policy engine starts processing the object, which happens asynchronously. The policy engine evaluates the sync-to rules again when processing an object to act according to the latest sync rules.

For example:

  • An object was ingested that matches the sync-to rules, so its sync state is set as PENDING. Then, a user changes the sync-to rules. The object does not match the rules anymore so the object is actually not synced and that sync state is removed.
  • An object was ingested that does not match the sync-to rules, so its sync state is not set. Then, a user changes the sync rules. The object now matches the rules so the object is actually synced and the sync state is set to COMPLETED.

Response header | Description
x-amz-replication-status | Status of synchronization:

  • COMPLETED: For sync-to, all rules were successfully executed and the object was successfully synchronized.

    Note: This status is also returned for objects that match a sync-to rule but were skipped because they are not the most recent version.

  • PENDING: For sync-to, one of the following: (1) a check is pending to see if the object needs synchronization; (2) the object needs synchronization, but the process is not complete.
  • FAILED: For sync-to, the process has failed multiple times. To be synchronized, the object must be reloaded.
  • REPLICA: For sync-from, the object is a replica created by Amazon S3.
  • (Header not in response): The object did not match any rules.
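Before deleting a source object, the header above can be checked to confirm synchronization. A minimal helper sketch; representing the response headers as a plain dict is an assumption for illustration:

```python
# Decide whether a source object has been synchronized, based on the
# x-amz-replication-status header of a HEAD object response. Treating
# the headers as a plain dict is an assumption for illustration.
def is_synchronized(headers):
    """True when the object was synchronized or matched no sync rule."""
    status = headers.get("x-amz-replication-status")
    # No header: the object matched no rule, so nothing is pending.
    # REPLICA applies to sync-from objects created from the remote bucket.
    return status in (None, "COMPLETED", "REPLICA")

print(is_synchronized({"x-amz-replication-status": "PENDING"}))  # False
```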

Delete bucket synchronization rules (DELETE bucket replication)

You can delete S3 synchronization settings for buckets. This function is the same as in AWS S3.

HTTP request syntax (URI)
aws --endpoint-url https://host_ip s3api delete-bucket-replication --bucket "bucket"
Request structure

None.

Response structure

None.

Example

Request example:

aws --endpoint-url https://10.08.1019 s3api delete-bucket-replication --bucket "hcpcs_bucket"
Note: If a sync-from action fails, it is retried and the SQS message about the failure is retained. To avoid a possible accumulation of SQS failure messages, the best practice is to define a suitable retention policy for SQS and to delete the sync-from rule once the desired results are obtained.