Managing bucket synchronization
Hitachi Content Platform for cloud scale (HCP for cloud scale) provides functions to let you configure and manage bucket synchronization.
About bucket synchronization
HCP for cloud scale can synchronize the following kinds of data in buckets:
- Object data
- All user metadata (that is, anything that can be returned in the header x-amz-meta-*)
- Tags
- Content-Type system metadata
- Objects that the owner of the source bucket doesn't have permission to read
Objects that existed before the functions are configured are not synchronized.
HCP for cloud scale checks the rules that are valid at the time an object is synchronized, not at the time the object is ingested.
Objects that are marked as deleted are not synchronized.
Most system metadata is not synchronized, specifically:
- Owner ID and Name
- Timestamps (when last modified)
- Metadata returned in x-amz-grant-*
- Metadata returned in x-amz-acl
- Metadata returned in x-amz-storage-class
- Metadata returned in x-amz-replication-status
- Metadata returned in x-amz-server-side-encryption-*
- Metadata returned in x-amz-restore-*
- Metadata returned in x-amz-version-id-*
- Metadata returned in x-amz-website-redirect-location
- Metadata returned in x-amz-object-lock-*
Unlike AWS replication, HCP for cloud scale can synchronize with buckets on storage systems outside of AWS.
AWS determines the destination bucket using rules, but only applies one rule to each new object. In contrast, HCP for cloud scale can apply multiple rules to each new object so long as the destination buckets are different. This is how one-to-many synchronization is implemented.
Unlike AWS, which does not replicate them, HCP for cloud scale synchronizes objects that the owner of the source bucket doesn't have permission to read.
In contrast with AWS replication, HCP for cloud scale does not synchronize the following:
- Access control lists (ACLs)
- Lock retention information
- Objects that are encrypted using Amazon S3 managed keys (SSE-S3) and AWS KMS managed keys (SSE-KMS)
If an object being synchronized has the same name as an object in the target bucket, the result depends on whether or not the target bucket uses versioning:
- If versioning is used, the old object is kept as an old version.
- If versioning is not used, the old object is replaced by the new object.
HCP for cloud scale buckets always use versioning. The best practice is to use versioning in all target buckets.
HCP for cloud scale guarantees that operations are applied in the order of their arrival (strong consistency). However, synchronizing multiple operations applied in a short period of time to the same object presents the following difficulties:
- In a distributed system, especially when many systems are involved, synchronizing all operations in correct order is complex.
- Even if HCP for cloud scale synchronizes all operations in correct order to an external storage component, that component might not guarantee that the operations are applied with strong consistency. In particular, AWS guarantees only "eventual consistency."
- For bucket sync-from, the external queue service might not guarantee that messages are provided in correct order. In particular, AWS Simple Queue Service (SQS) does not support first-in, first-out (FIFO) queues for S3 notifications.
Therefore, HCP for cloud scale makes its best effort to synchronize only the latest state of an object, not each version or operation for the object. For example:
- Assume that a client sends three operations to an object, and that they are all committed: (1) PUT, (2) PUT, (3) DEL. The latest state of the object is (3) DEL. HCP for cloud scale synchronizes only (3) DEL.
- Assume that a client sends three operations to an object, and that they are all committed: (1) PUT, (2) DEL, (3) PUT. The latest state of the object is (3) PUT. HCP for cloud scale synchronizes only (3) PUT.
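The two examples above can be modeled with a short sketch. This is an illustration only, not how HCP for cloud scale is implemented; the function name latest_state and the (sequence number, verb) tuple layout are assumptions made for the example.

```python
def latest_state(operations):
    """Return the operation that represents an object's latest state.

    `operations` is a list of (sequence_number, verb) tuples for a single
    object. Only the committed operation with the highest sequence number
    is a candidate for synchronization; earlier ones are skipped.
    """
    if not operations:
        return None
    return max(operations, key=lambda op: op[0])


# The two cases described in the list above.
print(latest_state([(1, "PUT"), (2, "PUT"), (3, "DEL")]))
print(latest_state([(1, "PUT"), (2, "DEL"), (3, "PUT")]))
```

In both cases only operation (3) is selected, which matches the behavior described for the two examples.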
This approach does not guarantee that the latest state of an object will be in the external storage for all situations. Partly because of the "eventual consistency" provided by AWS S3 API, corner cases still exist.
These are the high-level steps involved in setting up bucket synchronization:
- If appropriate, the HCP for cloud scale administrator assigns a sync-to or sync-from role to tenant administrators.
- The administrator creates buckets.
- The administrator configures synchronization rules.
- Users can now use synchronization when writing objects to buckets.
aws-cli v1.16.211.
Bucket synchronization configuration
Bucket synchronization is configured using S3 PUT bucket replication API requests that define rules. Each bucket can have up to 1,000 rules, but all rules must be sync-to or sync-from rules. Each rule defines the following:
- External bucket settings
- A set of one or more prefixes; an object with one of the prefixes is mirrored
- A set of one or more tags; an object with all, or any, of the tags is mirrored
- For sync-from, external queue settings
Because you can configure multiple rules with multiple tags, you have flexibility in selecting objects to mirror. For example:
- To mirror all objects that contain Tag1 and Tag2, you can configure one rule that includes both tags.
- To mirror all objects that contain Tag1 or Tag2, you can configure two rules, one for each tag.
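The difference between one rule with two tags and two rules with one tag each can be sketched as follows. This is a simplified model for illustration: the dictionary layout, the function name rule_matches, and the tag values are all hypothetical and do not reflect the product's internal logic.

```python
def rule_matches(rule, key, tags):
    """Return True if an object (key plus tag dict) matches one rule.

    In this model a rule matches when the object key starts with the rule's
    prefix (an empty prefix matches everything) and the object carries
    every tag listed in the rule.
    """
    prefix = rule.get("prefix", "")
    required = rule.get("tags", {})
    return key.startswith(prefix) and all(
        tags.get(name) == value for name, value in required.items()
    )


# One rule with both tags: the object needs Tag1 AND Tag2.
and_rule = {"tags": {"Tag1": "a", "Tag2": "b"}}

# Two rules, one per tag: the object needs Tag1 OR Tag2.
or_rules = [{"tags": {"Tag1": "a"}}, {"tags": {"Tag2": "b"}}]

object_tags = {"Tag1": "a"}  # hypothetical object carrying only Tag1
print(rule_matches(and_rule, "images/a.png", object_tags))
print(any(rule_matches(r, "images/a.png", object_tags) for r in or_rules))
```

An object that carries only Tag1 fails the single two-tag rule but is selected by the pair of one-tag rules.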
HCP for cloud scale can apply multiple bucket synchronization rules to each new object so long as the destination buckets are different. This is how one-to-many synchronization is implemented.
A rule collision is when two or more rules that apply to an object have the same destination (that is, the same external host, port, and bucket). HCP for cloud scale does not allow rule collisions, so PUT bucket replication requests are rejected if they contain rule collisions. To avoid rule collisions, you can define as many tags in a rule as necessary, so that multiple rules with the same destination are not needed.
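Before sending a request, a client could screen its rules for shared destinations. The sketch below is a conservative pre-check, assuming each rule is represented as a dictionary with hypothetical host, port, and bucket keys; it flags any destination used by more than one rule, which is stricter than the server-side definition (rules that apply to the same object).

```python
from collections import Counter


def find_shared_destinations(rules):
    """Return (host, port, bucket) destinations used by more than one rule."""
    counts = Counter((r["host"], r["port"], r["bucket"]) for r in rules)
    return [dest for dest, n in counts.items() if n > 1]


# Hypothetical rule set: the first two rules point at the same destination.
rules = [
    {"id": "rule1", "host": "s3.amazonaws.com", "port": 443, "bucket": "redbucket"},
    {"id": "rule2", "host": "s3.amazonaws.com", "port": 443, "bucket": "redbucket"},
    {"id": "rule3", "host": "s3.amazonaws.com", "port": 443, "bucket": "bluebucket"},
]
print(find_shared_destinations(rules))
```

If the list is non-empty, merging the affected rules into one rule with more tags is the approach the text above suggests.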
When bucket synchronization rules are created, updated, or deleted, the changes only apply to new objects or new S3 API operations. Objects that existed before the rules are configured are not synchronized. If an object exists in the state PENDING when a rule is created, updated, or deleted, the rule might not be applied to it, because the object might be in the midst of copying.
Configure bucket synchronization (PUT bucket replication)
You can configure S3 bucket sync-to and sync-from settings.
aws --endpoint-url https://host_ip s3api put-bucket-replication --bucket "bucket" --replication-configuration '{body}'
A rule consists of up to 1000 prefixes and tag-value pairs. You can configure up to 1000 rules per bucket. Separate tag-value pairs in the rule using the keywords "And": or "Or":.
The request body is shown below:
'{
  "Role": "",
  "Rules": [
    {
      "ID": "string",
      "Filter": {
        "Prefix": "string",
        "Tag": { "Key": "string", "Value": "string" }
      },
      "Status": "boolean",
      "Destination": {
        "Bucket": "json",
        "Account": "B64_key, B64_key",
        "StorageClass": ""
      }
    }
    . . .
  ]
}'
Parameter | Required | Type | Description |
Role | Yes | N/A | Not supported; leave empty. |
ID | No | String | Unique identifier for rule, up to 255 characters. All rules must specify the same bucket. |
Priority | Yes | Integer | Not supported; ignored. |
DeleteMarkerReplication.Status | No | String | Not supported; if provided, leave as Disabled. |
Prefix | No | String | Prefix (one per rule). Up to 1024 characters. |
Key | No | String | Tag key (up to 1000 per rule). Up to 128 characters. |
Value | No | String | Tag value. Up to 256 characters. |
Rules.Status | Yes | Boolean | Enter Enabled or Disabled. If Disabled, rule is ignored. |
Bucket | Yes | Base64-encoded JSON | External S3 bucket access settings. |
Account | No | Base64 encoded string | The S3 access key and secret key credentials to the external S3 bucket. |
StorageClass | No | Enum | Optional destination storage class override. If provided, leave empty. |
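The length limits in the table above can be checked on the client side before a request is sent. The following is a minimal sketch, assuming a rule is held as a dictionary shaped like the request body; the function name check_rule_lengths and the message format are made up for the example.

```python
# Maximum lengths taken from the parameter table above.
LIMITS = {"ID": 255, "Prefix": 1024, "Key": 128, "Value": 256}


def check_rule_lengths(rule):
    """Return messages for fields in `rule` that exceed the documented limits."""
    rule_filter = rule.get("Filter", {})
    tag = rule_filter.get("Tag", {})
    fields = {
        "ID": rule.get("ID", ""),
        "Prefix": rule_filter.get("Prefix", ""),
        "Key": tag.get("Key", ""),
        "Value": tag.get("Value", ""),
    }
    return [
        f"{name} exceeds {LIMITS[name]} characters"
        for name, value in fields.items()
        if len(value) > LIMITS[name]
    ]


rule = {
    "ID": "sync_rule1_for_images",
    "Filter": {
        "Prefix": "/images/september/",
        "Tag": {"Key": "target", "Value": "cloud"},
    },
}
print(check_rule_lengths(rule))
```

An empty list means none of the four fields is too long; it says nothing about whether the server accepts the rule for other reasons.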
Bucket sync-to settings are defined by a set of parameters and passed in the value of Destination.Bucket as a Base64-encoded string.
The syntax of a bucket sync-to setting is shown below:
arn::sync-to::version::host:port::region::bucket_name::auth_version::path_style_always
Parameter | Required | Type | Description |
version | Yes | String | Enter 1.0. |
host | Yes | IP address | Host IP address. |
port | Yes | integer | Host port. |
region | Yes | String | The S3 region. |
bucket_name | Yes | String | The name of the bucket. Enter a name from 3 to 63 characters long containing only lowercase characters (a-z), numbers (0-9), periods (.), or hyphens (-). The bucket must already exist. |
auth_version | Yes | String | AWS Signature version: enter V2 or V4. |
path_style_always | Yes | Boolean | Path-style URLs for bucket access: enter true or false. |
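As an illustration of the syntax and parameter table above, the sketch below assembles a sync-to setting string and Base64-encodes it. The function names sync_to_setting and encode_setting are hypothetical, and the host, region, and bucket values are the ones used in this document's request example.

```python
import base64
import re

# Bucket name rule from the table: 3 to 63 characters from a-z, 0-9, '.', '-'.
BUCKET_NAME = re.compile(r"[a-z0-9.-]{3,63}")


def sync_to_setting(host, port, region, bucket, auth_version="V4", path_style=True):
    """Build a sync-to setting string in the documented field order."""
    if not BUCKET_NAME.fullmatch(bucket):
        raise ValueError(f"invalid bucket name: {bucket!r}")
    if auth_version not in ("V2", "V4"):
        raise ValueError("auth_version must be V2 or V4")
    return "::".join([
        "arn", "sync-to", "1.0", f"{host}:{port}", region, bucket,
        auth_version, "true" if path_style else "false",
    ])


def encode_setting(setting):
    """Base64-encode a setting string for use as Destination.Bucket."""
    return base64.b64encode(setting.encode("utf-8")).decode("ascii")


setting = sync_to_setting("s3.amazonaws.com", 443, "us-east-1", "redbucket")
print(setting)
print(encode_setting(setting))
```

The first line printed is the plain setting string; the second is the same string in Base64 form.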
Bucket sync-from settings include both a bucket address and a notification queue. The settings are defined by a set of parameters and passed in the value of Destination.Bucket as a Base64-encoded string.
The syntax of a bucket sync-from setting is shown below:
arn::sync-from::version::host:port::S3_region::bucket_name::auth_version::path_style_always::AWS_SQS::SQS_region::SQS_queue::SQS_access_key::SQS_secret_key
Parameter | Required | Type | Description |
version | Yes | String | Enter 1.0. |
host | Yes | IP address | Host IP address. |
port | Yes | integer | Host port. |
S3_region | Yes | String | The S3 region. |
bucket_name | Yes | String | The name of the bucket. Enter a name from 3 to 63 characters long containing only lowercase characters (a-z), numbers (0-9), periods (.), or hyphens (-). The bucket must already exist. |
auth_version | Yes | String | AWS Signature version: enter V2 or V4. |
path_style_always | Yes | Boolean | Path-style URLs for bucket access: enter true or false. |
SQS_region | Yes | String | The SQS region. |
SQS_queue | Yes | String | The name of the notification queue. |
SQS_access_key | Yes | Base64-encoded string | The access key of the S3 credentials for access to the notification queue. |
SQS_secret_key | Yes | Base64-encoded string | The secret key of the S3 credentials for access to the notification queue. |
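A sync-from setting can be assembled the same way, with the queue fields appended and the two SQS keys Base64-encoded individually. This sketch uses the arn::sync-from prefix that appears in this document's request example; the function name sync_from_setting is hypothetical, and the keys "1234" and "6789" are dummy values chosen because they encode to the strings shown in that example.

```python
import base64


def b64(text):
    """Base64-encode a text value and return it as an ASCII string."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def sync_from_setting(host, port, s3_region, bucket, sqs_region, sqs_queue,
                      sqs_access_key, sqs_secret_key,
                      auth_version="V4", path_style=True):
    """Build a sync-from setting string in the documented field order."""
    return "::".join([
        "arn", "sync-from", "1.0", f"{host}:{port}", s3_region, bucket,
        auth_version, "true" if path_style else "false",
        "AWS_SQS", sqs_region, sqs_queue,
        b64(sqs_access_key), b64(sqs_secret_key),
    ])


sync_from = sync_from_setting(
    "s3.amazonaws.com", 443, "us-east-1", "bluebucket",
    "us-east-1", "blackqueue", "1234", "6789",
)
print(sync_from)
```

The printed string ends with ::MTIzNA==::Njc4OQ==, the Base64 forms of the two dummy keys.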
None.
Request example:
aws --endpoint-url https://10.08.1019 s3api put-bucket-replication --bucket "hcpcs_bucket" --replication-configuration '{body}'
JSON request:
'{
  "Role": "",
  "Rules": [
    {
      "ID": "sync_rule1_for_images",
      "Filter": {
        "Prefix": "/images/september/",
        "Tag": { "Key": "target", "Value": "cloud" }
      },
      "Status": "Enabled",
      "Destination": {
        "Bucket": "arn::sync-to::1.0::s3.amazonaws.com:443::us-east-1::redbucket::v4::true",
        "Account": "access_key, secret_key",
        "StorageClass": "STANDARD_IA"
      }
    },
    {
      "ID": "sync_rule2_for_music",
      "Filter": {
        "Prefix": "/music/october/",
        "Tag": { "Key": "target", "Value": "cloud" }
      },
      "Status": "Enabled",
      "Destination": {
        "Bucket": "arn::sync-from::1.0::s3.amazonaws.com:443::us-east-1::bluebucket::v4::true::AWS_SQS::us-east-1::blackqueue::MTIzNA==::Njc4OQ==",
        "Account": "access_key, secret_key",
        "StorageClass": "STANDARD_IA"
      }
    }
  ]
}'
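Writing the JSON body by hand is error-prone, so a script might generate it instead. The sketch below builds a one-rule body with the standard json module and prints the matching aws-cli command line without running it. The helper make_rule is hypothetical, host_ip and the Account value are placeholders as in the syntax above, and the optional StorageClass field is omitted.

```python
import json
import shlex


def make_rule(rule_id, prefix, tag_key, tag_value, bucket_setting, account):
    """Return one rule in the shape of the request body shown above."""
    return {
        "ID": rule_id,
        "Filter": {
            "Prefix": prefix,
            "Tag": {"Key": tag_key, "Value": tag_value},
        },
        "Status": "Enabled",
        "Destination": {"Bucket": bucket_setting, "Account": account},
    }


config = {
    "Role": "",  # not supported; left empty
    "Rules": [
        make_rule(
            "sync_rule1_for_images", "/images/september/", "target", "cloud",
            "arn::sync-to::1.0::s3.amazonaws.com:443::us-east-1::redbucket::v4::true",
            "access_key, secret_key",
        )
    ],
}
body = json.dumps(config)

command = [
    "aws", "--endpoint-url", "https://host_ip",
    "s3api", "put-bucket-replication",
    "--bucket", "hcpcs_bucket",
    "--replication-configuration", body,
]
# shlex.join quotes the JSON body so the line can be pasted into a shell.
print(shlex.join(command))
```

Generating the body with json.dumps guarantees balanced brackets and valid quoting, which the hand-written form does not.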
Get bucket synchronization rules (GET bucket replication)
You can retrieve the synchronization rules for a bucket.
aws --endpoint-url https://host_ip s3api get-bucket-replication --bucket "bucket"
Not applicable.
The response body is shown below:
{
  "ReplicationConfiguration": {
    "Role": "",
    "Rules": [
      {
        "Filter": {
          "And": {
            "Prefix": "string",
            "Tags": [
              { "Key": "string", "Value": "string" }
              . . .
            ]
          }
        },
        "Status": "boolean",
        "Destination": {
          "Bucket": "access_settings"
        },
        "ID": "string"
      }
    ]
  }
}
Parameter | Required | Type | Description |
Role | Yes | N/A | Not supported; empty. |
Prefix | No | String | Prefix. |
Key | No | String | Tag key. |
Value | No | String | Tag value. Sets of prefixes and key-value pairs. |
Status | Yes | Boolean | Enabled or Disabled. If Disabled, rule is ignored. |
Bucket | Yes | Base64-encoded JSON | Bucket access settings. S3 access and secret keys are masked. |
ID | No | String | Unique identifier for rule, up to 255 characters. |
Status code | HTTP name | Description |
200 | OK | The request was executed successfully. |
401 | Unauthorized | Access was denied due to invalid credentials. |
Request example:
aws --endpoint-url https://10.08.1019 s3api get-bucket-replication --bucket "hcpcs_bucket"
JSON response:
{
  "ReplicationConfiguration": {
    "Role": "",
    "Rules": [
      {
        "Filter": {
          "And": {
            "Prefix": "SQS",
            "Tags": [
              { "Value": "cloud", "Key": "target" }
            ]
          }
        },
        "Status": "Enabled",
        "Destination": {
          "Bucket": "arn::sync-from::1.0::s3.amazonaws.com:443::<AWS-Region>::hcpcs_bucket::V4::true::AWS_SQS::<SQS-Region>::<SQS-QUEUE-TopicName>"
        },
        "ID": "mirrorBack_rule_for_images"
      }
    ]
  }
}
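A response like the one above can be summarized with a few lines of code. The sketch below assumes the Destination.Bucket value arrives as a plain "::"-separated string, as in the example response; the function names are hypothetical, and the region and queue names in the sample stand in for the placeholders shown above.

```python
import json

COMMON_FIELDS = [
    "scheme", "direction", "version", "endpoint", "region",
    "bucket", "auth_version", "path_style_always",
]


def parse_destination(setting):
    """Split a sync-to or sync-from setting string into named fields."""
    parts = setting.split("::")
    fields = dict(zip(COMMON_FIELDS, parts))
    # Anything after the eight common fields is queue information (sync-from).
    fields["queue_fields"] = parts[len(COMMON_FIELDS):]
    return fields


def summarize_rules(response_text):
    """Return (rule ID, status, direction) for each rule in a GET response."""
    config = json.loads(response_text)["ReplicationConfiguration"]
    return [
        (
            rule.get("ID"),
            rule["Status"],
            parse_destination(rule["Destination"]["Bucket"])["direction"],
        )
        for rule in config["Rules"]
    ]


# Sample response with hypothetical region and queue names filled in.
sample = json.dumps({"ReplicationConfiguration": {"Role": "", "Rules": [{
    "Filter": {"And": {"Prefix": "SQS", "Tags": [{"Value": "cloud", "Key": "target"}]}},
    "Status": "Enabled",
    "Destination": {"Bucket": "arn::sync-from::1.0::s3.amazonaws.com:443::us-east-1::hcpcs_bucket::V4::true::AWS_SQS::us-east-1::myqueue"},
    "ID": "mirrorBack_rule_for_images",
}]}})
print(summarize_rules(sample))
```

If the server returns the setting in Base64 form, as the parameter table suggests, it would need to be decoded before being split.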
Get object synchronization status
The synchronization status of an object is returned in metadata as part of the response to a GET object or HEAD object request.
For a GET object or HEAD object request, the synchronization functions return a replication status header in addition to the standard response metadata. This information is useful before deletion from a source bucket to verify synchronization.
Response header | Description |
x-amz-replication-status | Status of synchronization. |
(Header not in response) | The object did not match any rules. |
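A client that wants to verify synchronization before deleting from the source bucket could inspect the response headers along these lines. This is a sketch under assumptions: the only status value this document names is PENDING, so the value COMPLETED is borrowed from AWS S3 replication and may differ in HCP for cloud scale, and the function name sync_state and its return labels are made up for the example.

```python
# Assumption: the "finished" value follows AWS S3 replication (COMPLETED).
DONE_VALUE = "COMPLETED"


def sync_state(headers):
    """Classify an object from the headers of a GET or HEAD object response.

    Returns "no-rule-matched" when the replication status header is absent,
    "done" when it equals DONE_VALUE, and "not-done" otherwise.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    status = normalized.get("x-amz-replication-status")
    if status is None:
        return "no-rule-matched"
    return "done" if status.upper() == DONE_VALUE else "not-done"


print(sync_state({"x-amz-replication-status": "PENDING"}))
print(sync_state({"Content-Type": "image/png"}))
```

Only an object classified as "done" would be a candidate for deletion from the source bucket under this model.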
Delete bucket synchronization rules (DELETE bucket replication)
You can delete S3 synchronization settings for buckets. This function is the same as in AWS S3.
aws --endpoint-url https://host_ip s3api delete-bucket-replication --bucket "bucket"
None.
Request example:
aws --endpoint-url https://10.08.1019 s3api delete-bucket-replication --bucket "hcpcs_bucket"