Bucket synchronization
Hitachi Content Platform for cloud scale (HCP for cloud scale) lets you configure and manage bucket synchronization.
You can use the S3 Console to configure bucket synchronization. You can also use S3 PUT bucket replication API requests. This chapter describes how to use API requests.
About bucket synchronization
HCP for cloud scale can synchronize the following kinds of data in buckets:
- Object data
- All user metadata (that is, anything that can be returned in the header x-amz-meta-*)
- Tags
- Content-Type system metadata
- Objects that the owner of the source bucket doesn't have permission to read
This diagram illustrates the concept of bucket synchronization.

Objects that existed before synchronization functions are configured are not synchronized. HCP for cloud scale verifies the rules that are valid at the time an object is synchronized, not at the time the object is ingested.
Encrypted objects, ACLs, and objects that are marked as deleted are also not synchronized.
Most system metadata is not synchronized, specifically:
- Owner ID and Name
- Timestamps (when last modified)
- Metadata returned in x-amz-grant-*
- Metadata returned in x-amz-acl
- Metadata returned in x-amz-storage-class
- Metadata returned in x-amz-replication-status
- Metadata returned in x-amz-server-side-encryption-*
- Metadata returned in x-amz-restore-*
- Metadata returned in x-amz-version-id-*
- Metadata returned in x-amz-website-redirect-location
- Metadata returned in x-amz-object-lock-*
Additionally, certain operations are not synchronized, specifically:
- DELETE Object
- Bulk DELETE Object
- PUT Object ACLs
- PUT Object tagging
- DELETE Object tagging
The bucket sync-from function only supports one rule for the same external SQS queue and external bucket. If a bucket has multiple sync-from rules for the same external queue, objects might not be synchronized. To use multiple rules for an external bucket, use one SQS queue for each rule.
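The one-rule-per-queue restriction can be checked before you submit a configuration. The following is a minimal sketch, not part of the product: the flat rule dictionaries and the function name find_shared_queues are hypothetical, and the queue and bucket names are illustrative.

```python
from collections import Counter


def find_shared_queues(sync_from_rules):
    """Return (queue, bucket) pairs that appear in more than one sync-from rule.

    Each rule is a hypothetical flat dict with "queue" and "remoteBucketName" keys.
    """
    counts = Counter((r["queue"], r["remoteBucketName"]) for r in sync_from_rules)
    return sorted(pair for pair, n in counts.items() if n > 1)


# Illustrative rules: "images" and "music" share a queue for the same external bucket.
rules = [
    {"id": "images", "queue": "queue-a", "remoteBucketName": "bluebucket"},
    {"id": "music", "queue": "queue-a", "remoteBucketName": "bluebucket"},
    {"id": "video", "queue": "queue-b", "remoteBucketName": "bluebucket"},
]
print(find_shared_queues(rules))  # [('queue-a', 'bluebucket')]
```

An empty result means that every sync-from rule has its own queue for its external bucket.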
In contrast with AWS replication, HCP for cloud scale does not synchronize the following:
- Access control lists (ACLs)
- Lock retention information
- Objects that are encrypted using Amazon S3 managed keys (SSE-S3) and AWS KMS managed keys (SSE-KMS)
If an object being synchronized has the same name as an object in the target bucket, the result depends on whether the target bucket uses versioning:
- If versioning is used, the old object is kept as an old version.
- If versioning is not used, the old object is replaced by the new object.
HCP for cloud scale buckets always use versioning. The best practice is to use versioning in all target buckets.
HCP for cloud scale guarantees that operations are applied in the order of their arrival (strong consistency). However, synchronizing multiple operations applied in a short period of time to the same object presents the following difficulties:
- In a distributed system, especially when many systems are involved, synchronizing all operations in correct order is complex.
- Even if HCP for cloud scale synchronizes all operations in correct order to an external storage component, that component might not guarantee that the operations are applied with strong consistency.
- For bucket sync-from, the external queue service might not guarantee that messages are provided in correct order. In particular, AWS Simple Queue Service (SQS) does not support first-in, first-out (FIFO) queues for S3 notifications.
Therefore, HCP for cloud scale makes its best effort to synchronize only the latest state of an object, not each version or operation for the object. For example:
- Assume that a client sends three operations to an object and that they are all committed: (1) PUT, (2) PUT, (3) DEL. The latest state of the object is (3) DEL. HCP for cloud scale only synchronizes (3) DEL.
- Assume that a client sends three operations to an object and that they are all committed: (1) PUT, (2) DEL, (3) PUT. The latest state of the object is (3) PUT. HCP for cloud scale only synchronizes (3) PUT.
This approach does not guarantee that the latest state of an object will be in the external storage for all situations, however.
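The "latest state only" behavior described above can be modeled conceptually as follows. This is a simplified sketch for illustration, not the product's implementation; the function name latest_state and the (key, operation) tuples are assumptions.

```python
def latest_state(operations):
    """Collapse a committed operation log to the last operation per object key.

    operations: iterable of (key, op) tuples in commit order (hypothetical format).
    """
    latest = {}
    for key, op in operations:
        latest[key] = op  # a later operation overwrites an earlier one
    return latest


# The two scenarios from the examples above, applied to two different objects.
ops = [
    ("a.txt", "PUT"), ("a.txt", "PUT"), ("a.txt", "DEL"),
    ("b.txt", "PUT"), ("b.txt", "DEL"), ("b.txt", "PUT"),
]
print(latest_state(ops))  # {'a.txt': 'DEL', 'b.txt': 'PUT'}
```

Only the final operation for each object survives, which is why intermediate versions are not guaranteed to reach the external storage.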
Synchronization to an external bucket: high-level tasks
Synchronization to an external bucket involves assigning roles and permissions to users, creating and synchronizing the buckets, and then reading from and writing to the buckets.
This description of high-level tasks assumes three classes of user:
- An HCP for cloud scale system administrator to create roles and assign them to users using an IdP
- An HCP for cloud scale bucket administrator, who could be a tenant administrator, to create and configure an HCP for cloud scale bucket
- An Amazon Web Services (AWS) user, who could be a customer, to create a remote bucket using AWS S3 and then read and write data
Procedure
1. The system administrator assigns permissions to the bucket administrator to configure bucket synchronization.
   a. In the System Management application, create a role with the permission group bucket_sync.
   b. In the IdP server, set up two groups: bucket administrators and bucket users.
   c. In the IdP server, register users in these groups.
   d. In the System Management application, assign the role to the bucket administrator group.
2. The bucket administrator creates local and remote buckets.
   a. In the S3 User Credentials application, generate S3 credentials.
      Tip: Use the base64 utility to encode S3 credentials.
   b. Using the S3 credentials, use an S3 API to create an HCP for cloud scale (local) bucket.
   c. Use an AWS S3 API to create an S3 (remote) bucket.
3. The bucket administrator configures bucket synchronization between the HCP for cloud scale bucket and the S3 bucket using an S3 PUT Bucket Replication method, replacing the bucket's Amazon Resource Name (ARN) with configuration settings. By using multiple rules and filters, the bucket administrator can specify what objects are synchronized to the S3 bucket.
4. The bucket administrator sets access control lists to let the bucket user write data to the HCP for cloud scale bucket.
   a. Using a management API, get the user ID of the bucket user.
   b. Using an S3 API, assign write permission to the bucket user for the HCP for cloud scale bucket.
The AWS user is now free to write objects to the HCP for cloud scale bucket, which is now synchronized with the remote bucket.
Synchronization from an external bucket: high-level tasks
Synchronization from an external bucket involves assigning roles and permissions to users, creating and synchronizing buckets, and then reading from and writing to the buckets.
This description of high-level tasks assumes three classes of user:
- An HCP for cloud scale system administrator to create roles and assign them to users using an IdP
- An HCP for cloud scale bucket administrator, who could be a tenant administrator, to create and configure an HCP for cloud scale bucket
- An AWS user, who could be a customer, to create a remote bucket using AWS S3, create an AWS SQS queue, and then configure S3 notifications to SQS
Procedure
1. The system administrator assigns permissions to the bucket administrator to configure bucket synchronization.
   a. In the System Management application, create a role with the permission group bucket_sync.
   b. In the IdP server, set up two groups: bucket administrators and bucket users.
   c. In the IdP server, register users in these groups.
   d. In the System Management application, assign the role to the bucket administrator group.
2. The bucket administrator creates local and remote buckets.
   a. In the S3 User Credentials application, generate S3 credentials.
      Tip: Use the base64 utility to encode S3 credentials.
   b. Using the S3 credentials, use an S3 API to create an HCP for cloud scale (local) bucket.
   c. Use an AWS S3 API to create an S3 (remote) bucket.
3. The AWS user creates a standard queue in SQS.
   a. Using an AWS account, create a queue of the type Standard Queue.
   b. Create a policy document.
4. The AWS user configures the remote bucket to send S3 notifications to the AWS SQS queue.
   a. Add a notification for all object creation events to the remote bucket.
5. The bucket administrator configures bucket synchronization between the S3 bucket and the HCP for cloud scale bucket using an S3 PUT Bucket Replication method, replacing the bucket ARN with configuration settings. By using multiple rules and filters, the bucket administrator can specify what objects are synchronized to the local bucket.
6. The bucket administrator sets access control lists to let the bucket user read data from the HCP for cloud scale bucket.
   a. Using a management API, get the user ID of the bucket user.
   b. Using an S3 API, assign read permission to the bucket user for the HCP for cloud scale bucket.
The AWS user is now free to read objects from the HCP for cloud scale bucket, which is now synchronized with the remote bucket.
Bucket synchronization configuration
Bucket synchronization is configured using S3 PUT bucket replication API requests that define rules. Each bucket can have up to 1,000 rules, but all rules must be sync-to or sync-from rules. Each rule defines the following:
- External bucket settings
- A set of one or more prefixes; an object with one of the prefixes is mirrored
- A set of one or more tags; an object with all, or any, of the tags is mirrored
- For sync-from, external queue settings
Because you can configure multiple rules with multiple tags, you have flexibility in selecting objects to mirror. For example:
- To mirror all objects that contain Tag1 and Tag2, you can configure one rule that includes both tags.
- To mirror all objects that contain Tag1 or Tag2, you can configure two rules, one for each tag.
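The difference between the two configurations above can be sketched as a matching function. This is an illustrative model only: the rule dictionaries, the tag values, and the function names rule_matches and should_mirror are hypothetical, and the sketch assumes that a rule matches when the object has the rule's prefix and all of the rule's tags.

```python
def rule_matches(rule, key, tags):
    """True if the object key has the rule's prefix and the object carries every rule tag."""
    if not key.startswith(rule.get("prefix", "")):
        return False
    return all(tags.get(k) == v for k, v in rule.get("tags", {}).items())


def should_mirror(rules, key, tags):
    """An object is mirrored if at least one rule matches it."""
    return any(rule_matches(r, key, tags) for r in rules)


# One rule with both tags ("and") versus two rules with one tag each ("or").
both = [{"tags": {"Tag1": "x", "Tag2": "y"}}]
either = [{"tags": {"Tag1": "x"}}, {"tags": {"Tag2": "y"}}]

object_tags = {"Tag1": "x"}  # the object has only Tag1
print(should_mirror(both, "photo.jpg", object_tags))    # False
print(should_mirror(either, "photo.jpg", object_tags))  # True
```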
For information on PUT bucket replication, see Configure bucket synchronization (PUT bucket replication).
After they are created, buckets and objects are immediately visible.
HCP for cloud scale can apply multiple bucket synchronization rules to each new object, so long as the destination bucket has identical rules.
A rule collision is when two or more rules that apply to an object have the same destination (that is, the same external host, port, and bucket). HCP for cloud scale does not allow rule collisions, so PUT bucket replication requests are rejected if they contain rule collisions. To avoid rule collisions, you can define as many tags in a rule as necessary, so that multiple rules with the same destination are not needed.
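You can check a rule set for possible collisions before sending the request. The sketch below is conservative and only an approximation: it flags any two rules that share a destination, without testing whether their filters can match the same object. The rule dictionaries and the function name find_collisions are hypothetical.

```python
from itertools import combinations


def find_collisions(rules):
    """Return ID pairs of rules that share the same (host, port, bucket) destination."""
    def destination(rule):
        return (rule["host"], rule["port"], rule["bucket"])

    return [
        (a["id"], b["id"])
        for a, b in combinations(rules, 2)
        if destination(a) == destination(b)
    ]


# Illustrative rules: "a" and "b" point at the same external bucket.
rules = [
    {"id": "a", "host": "s3.amazonaws.com", "port": 443, "bucket": "bluebucket"},
    {"id": "b", "host": "s3.amazonaws.com", "port": 443, "bucket": "bluebucket"},
    {"id": "c", "host": "s3.amazonaws.com", "port": 443, "bucket": "redbucket"},
]
print(find_collisions(rules))  # [('a', 'b')]
```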
After an object operation is performed, the policy engine asynchronously checks if that object needs to be copied according to the sync-to rules. When bucket synchronization rules are created, updated, or deleted, the changes only apply to new objects, object operations, and to objects that have not yet been processed by the policy engine. Objects that existed before the rules were configured are not synchronized. If an object exists in the PENDING state when a rule is created, updated, or deleted, the rule change might not be applied.
You cannot set up bucket synchronization with the same bucket as both the source and the destination.
Configure bucket synchronization (PUT bucket replication)
You can configure S3 bucket sync-to and sync-from settings.
- If you use the AWS command-line interface to configure bucket synchronization, use at least aws-cli v1.16.211 and aws-sdk 1.11.610.
- Configuration rules should be provided to AWS CLI from a file, rather than inline. This is to avoid problems with double quote characters in some terminals.
aws --endpoint-url https://10.08.1019 s3api put-bucket-replication --bucket "hcpcs_bucket" --replication-configuration file://rules.json
A rule consists of up to 1000 prefixes and tag-value pairs. You can configure up to 1000 rules per bucket. Separate tag-value pairs in the rule using the keywords "And": or "Or":.
The content of the configuration JSON file is:
aws --endpoint-url https://company.com s3api put-bucket-replication --bucket "bucket name" --replication-configuration \
'{
  "Role": "",
  "Rules": [{
    "ID": "string",
    "Filter": {
      "Prefix": "string",
      "Tag": {
        "Key": "string",
        "Value": "string"
      }
    },
    "Status": "Enabled",
    "Destination": {
      "Bucket": "<a string with several parameters>",
      "Account": "<a string with several parameters>",
      "StorageClass": "<a string with several parameters>"
    }
  },
  {
    "ID": "string",
    "Filter": {
      "Prefix": "string",
      "Tag": {
        "Key": "string",
        "Value": "string"
      }
    },
    "Status": "Enabled",
    "Destination": {
      "Bucket": "<a string with several parameters>",
      "Account": "",
      "StorageClass": "<a string with several parameters>"
    }
  }]
}'
Parameter | Required | Type | Description |
Role | Yes | N/A | Not supported; leave empty. |
Rules | Yes | N/A | Container for a list of one or more rules. Supports up to 1000 rules. |
ID | No | String | Unique identifier for rule, up to 255 characters. All rules must specify the same bucket. |
Status | Yes | N/A | Values: Enabled or Disabled. The rule is ignored if status is set to Disabled. |
Filter | Yes | N/A | Container for prefixes and tags. Each rule can have one prefix, and up to 1000 tags. See AWS for more details on syntax. |
Priority | Yes | Integer | Not supported; ignored. |
DeleteMarkerReplication.Status | No | String | Not supported; if provided, leave as Disabled. |
Prefix | No | String | Prefix (one per rule). Up to 1024 characters. |
Key | No | String | Tag key (up to 1000 per rule). Up to 128 characters. |
Value | No | String | Tag value. Up to 256 characters. |
Status | Yes | Boolean | Enabled or Disabled. If Disabled, rule is ignored. |
Destination.Bucket | Yes | Base64-encoded JSON | External S3 bucket access settings. You can't specify the same bucket name and host as both source and destination. |
Destination.Account | Yes | String | Must include the empty string "Account": "", in order for the destination rule to function correctly. |
Destination.StorageClass | No | String | An optional destination storage class override to use when synchronizing objects. If not provided, this value should be left empty. |
Destination.AccessControlTranslation.Owner | No | String | Not supported; leave empty. |
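Because rules are best supplied from a file, it can help to generate that file programmatically. The following is a minimal sketch that builds a one-prefix, one-tag rule and enforces the length limits from the table above. The function names make_rule and make_configuration are hypothetical, and the Bucket value shown is a placeholder, not real encoded settings.

```python
import json


def make_rule(rule_id, prefix, tag_key, tag_value, bucket_settings_b64):
    """Build one rule, checking the documented length limits."""
    if len(rule_id) > 255:
        raise ValueError("ID is limited to 255 characters")
    if len(prefix) > 1024:
        raise ValueError("Prefix is limited to 1024 characters")
    if len(tag_key) > 128 or len(tag_value) > 256:
        raise ValueError("Tag key/value are limited to 128/256 characters")
    return {
        "ID": rule_id,
        "Filter": {"Prefix": prefix, "Tag": {"Key": tag_key, "Value": tag_value}},
        "Status": "Enabled",
        # Account must be present as an empty string; Bucket carries the encoded settings.
        "Destination": {"Bucket": bucket_settings_b64, "Account": ""},
    }


def make_configuration(rules):
    """Wrap 1 to 1000 rules in the top-level structure; Role is left empty."""
    if not 1 <= len(rules) <= 1000:
        raise ValueError("a configuration needs between 1 and 1000 rules")
    return {"Role": "", "Rules": rules}


config = make_configuration(
    [make_rule("sync_rule_for_music", "/music/october/", "target", "cloud", "PLACEHOLDER")]
)
rules_json = json.dumps(config, indent=2)  # save this text as rules.json
print(rules_json)
```

You can then pass the saved file to the AWS CLI with --replication-configuration file://rules.json.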
Bucket sync-to settings are defined by a set of parameters and passed in the value of Rules.Destination.Bucket as a Base64-encoded JSON structure.
The syntax inside the bucket parameter for the sync-to setting is:
{ 'version': 'version', 'action': 'sync-to', 'externalBucket': { 'host': 'host', 'type': 'type', 'region': 'region', 'remoteBucketName': 'bucket_name', 'accessKey': 'B64_key', 'secretKey': 'B64_key', 'port': 'port', 'authVersion': 'auth_version', 'usePathStyleAlways': '[true|false]' } }
Parameter | Required | Type | Description |
version | Yes | String | 1.0. |
host | Yes | IP address | Host IP address. |
type | Yes | String | Destination storage class: AMAZON_S3 or GENERIC_S3. |
region | Yes | String | The S3 region. |
remoteBucketName | Yes | String | The name of the bucket, from 3 to 63 characters long, containing only lowercase characters (a-z), numbers (0-9), periods (.), or hyphens (-). The bucket must already exist. |
accessKey | Yes | Base64-encoded string | The S3 access key credentials to the external S3 bucket. |
secretKey | Yes | Base64-encoded string | The S3 secret key credentials to the external S3 bucket. |
port | Yes | Integer | Host port. |
authVersion | Yes | String | AWS Signature version: V2 or V4. |
usePathStyleAlways | Yes | Boolean | Path-style URLs for bucket access: true or false. |
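The sync-to settings can be assembled and Base64-encoded with a short script. This is a sketch under stated assumptions: it assumes the action value sync-to, assumes the service accepts standard double-quoted JSON (the syntax above is written with single quotes), and passes port and usePathStyleAlways with the types given in the table. The function name encode_sync_to and all sample values are hypothetical.

```python
import base64
import json


def b64(text):
    """Base64-encode a string and return the result as text."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def encode_sync_to(host, port, region, bucket, access_key, secret_key,
                   storage_type="AMAZON_S3", auth_version="V4", path_style=True):
    """Return a Base64-encoded settings string for Rules.Destination.Bucket."""
    settings = {
        "version": "1.0",
        "action": "sync-to",  # assumption, see the lead-in
        "externalBucket": {
            "host": host,
            "type": storage_type,
            "region": region,
            "remoteBucketName": bucket,
            "accessKey": b64(access_key),  # keys are Base64-encoded individually
            "secretKey": b64(secret_key),
            "port": port,
            "authVersion": auth_version,
            "usePathStyleAlways": path_style,
        },
    }
    return b64(json.dumps(settings))


# Placeholder host and credentials, for illustration only.
encoded = encode_sync_to("192.0.2.10", 443, "us-east-1", "bluebucket",
                         "EXAMPLE_ACCESS_KEY", "EXAMPLE_SECRET_KEY")
print(encoded)
```

Decoding the result with base64.b64decode and json.loads returns the original structure, which is a quick way to confirm the value before you place it in a rule.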
Bucket sync-from settings include both a bucket address and a notification queue. The settings are defined by a set of parameters and passed in the value of Rules.Destination.Bucket as a Base64-encoded string.
The syntax inside the bucket parameter for the sync-from setting is:
"{ 'version': 'version', 'action': 'sync-from', 'externalBucket': { 'host': 'host', 'type': 'type', 'region': 'region', 'remoteBucketName': 'bucket_name', 'accessKey': 'B64_key', 'secretKey': 'B64_key', 'port': 'port', 'authVersion': 'auth_version', 'usePathStyleAlways': '[true|false]' }, 'notifications': { 'type': 'type', 'region': 'region', 'queue': 'queue', 'accessKey': 'B64_key', 'secretKey': 'B64_key' } }"
Parameter | Required | Type | Description |
version | Yes | String | Enter 1.0. |
host | Yes | IP address | Host IP address. |
type | Yes | String | Destination storage class: AMAZON_S3 or GENERIC_S3. |
region | Yes | String | The S3 region. |
remoteBucketName | Yes | String | The name of the bucket, from 3 to 63 characters long, containing only lowercase characters (a-z), numbers (0-9), periods (.), or hyphens (-). The bucket must already exist. |
accessKey | Yes | Base64-encoded string | The S3 access key credentials to the external S3 bucket. |
secretKey | Yes | Base64-encoded string | The S3 secret key credentials to the external S3 bucket. |
port | Yes | Integer | Host port. |
authVersion | Yes | String | AWS Signature version: V2 or V4. |
usePathStyleAlways | Yes | Boolean | Path-style URLs for bucket access: true or false. |
notifications.type | Yes | String | Always set as AWS_SQS. |
notifications.region | Yes | String | Region of your AWS_SQS queue. |
notifications.queue | Yes | String | Name of your AWS_SQS queue. |
notifications.accessKey | Yes | Base64-encoded string | accessKey for permissions to read from your AWS_SQS queue. |
notifications.secretKey | Yes | Base64-encoded string | secretKey for permissions to read from your AWS_SQS queue. |
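For sync-from, the settings also carry the queue details, and the bucket name rule from the table can be checked before encoding. The sketch below assumes that the queue parameters sit in a notifications block next to externalBucket and that the service accepts standard double-quoted JSON. The function name encode_sync_from and the sample host, queue, and credentials are hypothetical.

```python
import base64
import json
import re

# 3 to 63 characters: lowercase letters, digits, periods, or hyphens.
BUCKET_NAME = re.compile(r"[a-z0-9.-]{3,63}")


def b64(text):
    """Base64-encode a string and return the result as text."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def encode_sync_from(external_bucket, notifications):
    """Validate the bucket name, encode the keys, and return the Base64 settings string."""
    if not BUCKET_NAME.fullmatch(external_bucket["remoteBucketName"]):
        raise ValueError("invalid remoteBucketName")
    settings = {
        "version": "1.0",
        "action": "sync-from",
        "externalBucket": dict(
            external_bucket,
            accessKey=b64(external_bucket["accessKey"]),
            secretKey=b64(external_bucket["secretKey"]),
        ),
        "notifications": dict(
            notifications,
            type="AWS_SQS",  # the table lists this as the only value
            accessKey=b64(notifications["accessKey"]),
            secretKey=b64(notifications["secretKey"]),
        ),
    }
    return b64(json.dumps(settings))


# Placeholder values, for illustration only.
bucket = {"host": "192.0.2.10", "type": "AMAZON_S3", "region": "us-east-1",
          "remoteBucketName": "bluebucket", "accessKey": "EXAMPLE_ACCESS_KEY",
          "secretKey": "EXAMPLE_SECRET_KEY", "port": 443, "authVersion": "V4",
          "usePathStyleAlways": True}
queue = {"region": "us-east-1", "queue": "testQueue",
         "accessKey": "EXAMPLE_SQS_ACCESS_KEY", "secretKey": "EXAMPLE_SQS_SECRET_KEY"}
encoded = encode_sync_from(bucket, queue)
print(encoded)
```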
None.
Request example:
aws --endpoint-url https://10.08.1019 s3api put-bucket-replication --bucket "hcpcs_bucket" --replication-configuration file://rules.json
Configuration rules.json:
{
  "Role": "",
  "Rules": [{
    "ID": "sync_rule2_for_music",
    "Filter": {
      "Prefix": "/music/october/",
      "Tag": {
        "Key": "target",
        "Value": "cloud"
      }
    },
    "Status": "Enabled",
    "Destination": {
      "Bucket": "{ 'version' : '1.0', 'action' : 'sync-from', 'externalBucket' : { 'type' : 'AMAZON_S3', 'region' : 'us-east-1', 'remoteBucketName' : 'bluebucket', 'authVersion' : 'V4', 'usePathStyleAlways' : 'true', 'accessKey' : 'access_key', 'secretKey' : 'secret_key' }, 'notifications' : { 'type' : 'AWS_SQS', 'region' : 'us-east-1', 'queue' : 'testQueue', 'accessKey' : 'access_key', 'secretKey' : 'secret_key' } }",
      "Account": ""
    }
  }]
}
Get bucket synchronization rules (GET bucket replication)
You can retrieve the synchronization rules for a bucket.
aws --endpoint-url https://company.com s3api get-bucket-replication --bucket "hcpcs_bucket"
Not applicable.
This is the response body for sync-to:
{
  "ReplicationConfiguration": {
    "Role": "",
    "Rules": [
      {
        "Filter": {
          "And": {
            "Prefix": "SQS",
            "Tags": [ { "Value": "string", "Key": "string" } ]
          }
        },
        "Status": "Enabled",
        "Destination": {
          "Bucket": "arn::sync-to::1.0::s3.amazonaws.com:443::<AWS-Region>::<AWSBucketName>::V4::true"
        },
        "ID": "string"
      }
    ]
  }
}
This is the response body for sync-from:
{
  "ReplicationConfiguration": {
    "Role": "",
    "Rules": [
      {
        "Filter": {
          "And": {
            "Prefix": "string",
            "Tags": [ { "Value": "string", "Key": "string" } ]
          }
        },
        "Status": "Enabled",
        "Destination": {
          "Bucket": "arn::sync-from::1.0::s3.amazonaws.com:443::<AWS-Region>::<AWSBucketName>::V4::true::AWS_SQS::<SQS-Region>::<SQS-QUEUE-TopicName>"
        },
        "ID": "string"
      }
    ]
  }
}
Parameter | Required | Type | Description |
Role | Yes | N/A | Not supported; empty. |
Prefix | No | String | Prefix. |
Key | No | String | Tag key. |
Value | No | String | Tag value. Sets of prefixes and key-value pairs. |
Status | Yes | Boolean | Enabled or Disabled. If Disabled, the rule is ignored. |
Bucket | Yes | Base64-encoded JSON | Bucket access settings. S3 access and secret keys are masked. |
ID | No | String | Unique identifier for rule, up to 255 characters. |
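The Destination.Bucket string in the response bodies above is a sequence of fields separated by double colons. The following sketch splits it into named fields. The field names are inferred from the order of the request parameters and are assumptions, as is the function name parse_destination; the sample region, bucket, and queue names are illustrative.

```python
def parse_destination(arn):
    """Split an arn::sync-to::... or arn::sync-from::... string into named fields."""
    parts = arn.split("::")
    if parts[0] != "arn" or len(parts) not in (8, 11):
        raise ValueError("unexpected destination format")
    host, _, port = parts[3].rpartition(":")  # for example "s3.amazonaws.com:443"
    result = {
        "action": parts[1],
        "version": parts[2],
        "host": host,
        "port": int(port),
        "region": parts[4],
        "remoteBucketName": parts[5],
        "authVersion": parts[6],
        "usePathStyleAlways": parts[7] == "true",
    }
    if len(parts) == 11:  # sync-from responses also carry the queue details
        result["notifications"] = {
            "type": parts[8],
            "region": parts[9],
            "queue": parts[10],
        }
    return result


sample = ("arn::sync-from::1.0::s3.amazonaws.com:443::us-east-1::bluebucket"
          "::V4::true::AWS_SQS::us-east-1::testQueue")
print(parse_destination(sample))
```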
Status code | HTTP name | Description |
200 | OK | The request was executed successfully. |
401 | Unauthorized | Access was denied due to invalid credentials. |
Request example:
aws --endpoint-url https://10.08.1019 s3api get-bucket-replication --bucket "hcpcs_bucket"
JSON response:
{
  "ReplicationConfiguration": {
    "Role": "",
    "Rules": [
      {
        "Filter": {
          "And": {
            "Prefix": "SQS",
            "Tags": [ { "Value": "cloud", "Key": "target" } ]
          }
        },
        "Status": "Enabled",
        "Destination": {
          "Bucket": "{ 'version': 'version', 'action': 'sync-from', 'externalBucket': { 'host': 'host', 'type': 'type', 'region': 'region', 'remoteBucketName': 'bucket_name', 'port': 'port', 'authVersion': 'auth_version', 'usePathStyleAlways': '[true|false]' } }"
        },
        "ID": "mirrorBack_rule_for_images"
      }
    ]
  }
}
Get object synchronization status
The synchronization status of an object is returned in metadata as part of the response to a GET object or HEAD object request.
For a GET object or HEAD object request, the synchronization functions return a replication status header in addition to the standard response metadata. This information is useful before deletion from a source bucket to verify synchronization.
When an object is created, HCP for cloud scale evaluates the sync-to rules for the bucket. If the object matches the rules, it sets the object's sync state to PENDING, COMPLETED, or FAILED. If the object does not match any of the rules, nothing changes.
Most of the time, this sync state is accurate. However, it is never definitive because users might change the sync-to rules for the bucket before the policy engine starts processing the object, which happens asynchronously. The policy engine evaluates the sync-to rules again when processing an object to act according to the latest sync rules.
For example:
- An object was ingested that matches the sync-to rules, so its sync state is set as PENDING. Then, a user changes the sync-to rules. The object does not match the rules anymore so the object is actually not synced and that sync state is removed.
- An object was ingested that does not match the sync-to rules, so its sync state is not set. Then, a user changes the sync rules. The object now matches the rules so the object is actually synced and the sync state is set to COMPLETED.
Response header | Description |
x-amz-replication-status | Status of synchronization: PENDING, COMPLETED, or FAILED. |
(Header not in response) | The object did not match any rules. |
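A client can read this header from a HEAD object or GET object response before it deletes an object from the source bucket. The sketch below works on a plain dictionary of response headers, so it does not depend on any particular S3 client library. The function names sync_status and safe_to_delete_source are hypothetical, and because the sync state is not definitive, treat the result as a hint rather than a guarantee.

```python
def sync_status(headers):
    """Return the synchronization status, or None if the header is absent."""
    normalized = {name.lower(): value for name, value in headers.items()}
    return normalized.get("x-amz-replication-status")


def safe_to_delete_source(headers):
    """Only a COMPLETED status indicates that the object was synchronized."""
    return sync_status(headers) == "COMPLETED"


print(safe_to_delete_source({"x-amz-replication-status": "COMPLETED"}))  # True
print(safe_to_delete_source({"x-amz-replication-status": "PENDING"}))    # False
print(sync_status({"Content-Type": "image/jpeg"}))                       # None
```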
Delete bucket synchronization rules (DELETE bucket replication)
You can delete S3 synchronization settings for buckets. This function is the same as in AWS S3.
aws --endpoint-url https://host_ip s3api delete-bucket-replication --bucket "bucket"
None.
None.
Request example:
aws --endpoint-url https://company.com s3api delete-bucket-replication --bucket "hcpcs_bucket"