Getting started
Hitachi Content Platform for cloud scale (HCP for cloud scale) is a software-defined object storage solution that is based on a massively parallel microservice architecture and is compatible with the Amazon Simple Storage Service (S3) application programming interface (API).
Introducing HCP for cloud scale
HCP for cloud scale is especially well suited to service applications requiring high bandwidth and compatibility with the Amazon S3 API.
HCP for cloud scale can federate S3 compatible storage from virtually any private or public source and present the combined capacity in a single, centrally managed, global name space.
You can install HCP for cloud scale on any server, in the cloud or on premises, that meets the minimum requirements.
HCP for cloud scale supports S3 event notification through a graphical user interface or through GET and PUT Bucket Notification configuration.
HCP for cloud scale lets you manage and scale storage components. You can add storage components, monitor their states, and take them online or offline for purposes of maintenance or repair. HCP for cloud scale provides functions to send notification of alerts, track and monitor throughput and performance, and trace actions through the system.
Storage components, buckets, and objects
A storage component is an Amazon S3 compatible storage provider, running independently, that HCP for cloud scale manages as a back end to store object data. To an S3 client using HCP for cloud scale, the existence, type, and state of storage components are transparent.
HCP for cloud scale supports the following storage systems:
- Amazon S3
- Hitachi Content Platform (HCP)
- HCP S Series Node
- Any Amazon S3 compatible storage service
A bucket is a logical collection of secure data objects that is created and managed by a client application. An HCP for cloud scale bucket is modeled on a storage service bucket. HCP for cloud scale uses buckets to manage storage components. You can think of an HCP for cloud scale system as a logical collection of secure buckets.
Buckets have associated metadata such as ownership and lifecycle status. HCP for cloud scale buckets are owned by an HCP for cloud scale user and access is controlled on a per-bucket basis by Amazon access control lists (ACLs) supporting the S3 API. Buckets are contained in a specific region; HCP for cloud scale supports one region, us-west-2.
- HCP for cloud scale buckets are not stored in storage components, so HCP for cloud scale clients can create buckets even before adding storage components.
- Storage component buckets are created by storage component administrators and are not visible to HCP for cloud scale clients.
- To empty and reuse a bucket, don't just delete the bucket and create a new one with the same name. After a bucket is deleted, the name becomes available for anyone to use and another account might take it first. Instead, empty and keep the bucket.
An object consists of data and associated metadata. The metadata is a set of name-value pairs that describe the object. Every object is contained in a bucket. An object is handled as a single unit by all HCP for cloud scale transactions, services, and internal processes.
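For illustration, here is a minimal sketch of creating a bucket and storing an object with metadata using Python and boto3 against the standard S3 API; the endpoint URL, credentials, and names are placeholder assumptions, not values from this guide:

import boto3

# Placeholder endpoint and credentials; substitute your HCP for cloud scale
# S3 endpoint and the S3 credentials generated for your user account.
s3 = boto3.client(
    "s3",
    endpoint_url="https://hcpcs.example.com",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
    region_name="us-west-2",  # the one region HCP for cloud scale supports
)

# Create a bucket, then add an object with descriptive metadata.
s3.create_bucket(
    Bucket="finance-reports",
    CreateBucketConfiguration={"LocationConstraint": "us-west-2"},
)
s3.put_object(
    Bucket="finance-reports",
    Key="2024/q1/summary.csv",
    Body=b"department,total\nsales,100000\n",
    Metadata={"department": "sales", "quarter": "q1"},
)

Later sketches in this section reuse this s3 client.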
For information about Amazon S3, see Introduction to Amazon S3.
HCP for cloud scale includes an S3 Console application that provides convenient functions for bucket users as an alternative to using S3 API methods:
- Obtaining S3 credentials
- Managing bucket synchronization, policies, and rules
- Creating S3 event notifications to synchronize buckets
- Managing objects in buckets, singly and in bulk
For more information, see the S3 Console Guide.
HCP for cloud scale supports strong consistency in object listing. After a write, upload, or delete operation, a list operation shows the changes immediately. Strong consistency supports big-data analytics applications and applications originally written for storage environments of smaller scale.
Data access
HCP for cloud scale supports the Amazon S3 API, which lets client applications store and retrieve unlimited amounts of data from configured storage services.
Data access control
HCP for cloud scale uses ownership and access control lists (ACLs) as data access control mechanisms for the S3 API.
Ownership is implemented as follows:
- An HCP for cloud scale bucket is owned by the user who creates the bucket and the owner cannot be changed.
- A user has full control of the buckets that user owns.
- A user has full control of the objects that user creates.
- A user can list only the buckets that user owns.
ACLs allow the assignment of privileges (read, write, or full control) for access to buckets and objects to other user accounts besides the owner's.
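As a minimal sketch of these mechanisms, the following grants read access on a bucket to another user account by ID, using the id grantee type (the boto3 client s3 comes from the earlier sketch and the IDs are placeholders):

# Grant read access to another user while the owner keeps full control.
s3.put_bucket_acl(
    Bucket="finance-reports",
    GrantFullControl="id=OWNER_USER_ID",
    GrantRead="id=OTHER_USER_ID",
)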
Data security
HCP for cloud scale supports encryption of data sent between systems (that is, data "in flight") and, as a licensed feature, data stored persistently within the system (that is, data "at rest").
HCP for cloud scale uses Secure Sockets Layer (SSL) to provide security for both incoming and outgoing communications. To enable SSL security, two types of certificates are needed:
- System certificate: the certificate that HCP for cloud scale uses for its GUIs and APIs (for incoming communications)
- Client certificates: the certificates of IdPs, storage components, and SMTP servers (for outgoing communications)
When the HCP for cloud scale system is installed, it generates and automatically installs a self-signed SSL server system certificate. This certificate is not automatically trusted by web browsers. You can choose to trust this self-signed certificate or replace it by using one of these options:
- Upload a PKCS12 certificate chain and password and apply it as the active system certificate.
- Download a certificate signing request (CSR) and then use it to obtain, upload, and apply a certificate signed by a certificate authority (CA).
- Generate a new self-signed certificate and apply it as the active system certificate.
For client certificates, upload the certificate of each client that HCP for cloud scale needs to access using SSL.
You can manage certificates, and view details of installed certificates, using the System Management application.
HCP for cloud scale supports data-in-flight encryption for the HTTPS protocol for all external communications. Data-in-flight encryption is always enabled for these data paths:
- S3 API (HTTP is also enabled on a different port)
- Management API
- System Management application graphical user interface (GUI)
- Object Storage Management application GUI
- External key management system (KMS) servers
You can enable or disable data-in-flight encryption for the data paths between HCP for cloud scale and:
- An identity provider (IdP) server
- Each application using TLS or SSL
- Each managed storage component
- Each SMTP server using SSL or STARTTLS
Communication between HCP for cloud scale instances does not use data-in-flight encryption. Depending on the security needs, you might need to set up an isolated internal network for HCP for cloud scale at the site.
HCP for cloud scale supports data at rest encryption as an available licensed feature. Storage component encryption keys can be stored using either an internal service or an external key management system.
HCP for cloud scale supports data at rest encryption (DARE), enabled at the system level and controlled at the bucket level. When encryption is enabled on the system and on a bucket, every object added to the bucket through the S3 API is encrypted with a unique data encryption key (DEK). Once an object is encrypted, the DEK is itself encrypted (or "wrapped") using a key encryption key (KEK) and stored as part of the metadata for the encrypted object. One KEK is generated per storage component and stored separately.
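The DEK/KEK arrangement described here is the general envelope-encryption pattern. The following minimal Python sketch (using the cryptography package) illustrates the pattern only; it is not HCP for cloud scale's actual implementation:

from cryptography.fernet import Fernet

kek = Fernet.generate_key()  # key encryption key: one per storage component
dek = Fernet.generate_key()  # data encryption key: unique per object

# Encrypt the object data with the DEK, then wrap the DEK with the KEK
# and store the wrapped DEK alongside the object's metadata.
ciphertext = Fernet(dek).encrypt(b"object data")
wrapped_dek = Fernet(kek).encrypt(dek)

# Reading the object reverses the steps: unwrap the DEK, decrypt the data.
plaintext = Fernet(Fernet(kek).decrypt(wrapped_dek)).decrypt(ciphertext)
assert plaintext == b"object data"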
An administrative account with appropriate permissions can enable encryption globally for the system. As an HCP for cloud scale administrator, before you can enable encryption, you must obtain and upload a license for the feature. Then, as part of enabling encryption, you must choose between two mutually exclusive methods for storing KEKs:
- Internal encryption, provided by an HCP for cloud scale service.
- External encryption, provided by an external key management system (KMS).
Once encryption is enabled for the system, a bucket owner with appropriate permissions can set a policy to enable encryption at the bucket level, or later disable it. An administrator can manage the connections to KMS servers and manage KEKs by initiating rekeying as needed.
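Because GetBucketEncryption and PutBucketEncryption are fully supported (see the API tables later in this section), a bucket owner can manage the bucket encryption policy through the standard S3 API. A minimal boto3 sketch, reusing the s3 client and placeholder bucket name from the earlier example:

# Set a default encryption rule on the bucket, then read it back.
s3.put_bucket_encryption(
    Bucket="finance-reports",
    ServerSideEncryptionConfiguration={
        "Rules": [
            {"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}
        ]
    },
)
print(s3.get_bucket_encryption(Bucket="finance-reports"))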
Bucket synchronization
Bucket synchronization to a bucket (bucket sync-to or mirroring) allows automatic, asynchronous copying of objects in a bucket in an HCP for cloud scale system to an external storage system. Bucket synchronization from a bucket (bucket sync-from or mirroring back) allows automatic, asynchronous copying of objects in a bucket in an external storage system to an HCP for cloud scale bucket. Objects larger than 5 GB are synchronized (both sync-to and sync-from) using multipart uploads.
An external storage system can be another HCP for cloud scale system, AWS, or any S3 compatible system.
Bucket sync-to offers the following advantages:
- Data protection: Data is well protected against the unavailability or catastrophic failure of a system. A bucket can be synchronized to a remote system of a different type. This arrangement can provide geographically distributed data protection (called geo-protection). Note: All rules must share a single, common destination bucket. If more than one destination appears in the collection of rules, the policy is rejected.
- Data availability: AWS services can access synchronized data directly from AWS.
Bucket sync-from offers the following advantages:
- Data consolidation: Transformed data can be stored on an HCP for cloud scale system. An HCP for cloud scale system can synchronize data from multiple remote systems of different types.
- External update: Data can be updated directly in an external system and stored on an HCP for cloud scale system.
Access to bucket synchronization is controlled on a per-user basis by role-based access control (RBAC). Use the System Management application to define users, groups, and roles.
Access to an external resource might need an SSL certificate. You can upload an SSL certificate using the System Management application, the same as for uploading SSL certificates for storage components and IdPs.
For information on configuring bucket synchronization, see the S3 Console Guide.
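While the S3 Console is the documented way to configure synchronization, the API tables later in this section list PutBucketReplication as supported with differences (a single destination). The following boto3 sketch shows what a one-destination rule could look like; the rule fields follow the AWS replication API, and the role and bucket names are placeholder assumptions:

# Mirror new objects in the bucket to a single external destination.
s3.put_bucket_replication(
    Bucket="finance-reports",
    ReplicationConfiguration={
        "Role": "",  # AWS requires an IAM role ARN; placeholder here
        "Rules": [
            {
                "ID": "mirror-to-remote",
                "Status": "Enabled",
                "Prefix": "",  # apply the rule to all objects
                "Destination": {"Bucket": "arn:aws:s3:::remote-backup"},
            }
        ],
    },
)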
Object locking
HCP for cloud scale supports object locking, which prevents specified objects from being deleted. A bucket owner can lock or unlock objects or lock them for a specified time period. This feature implements legal hold and retention period requirements.
Object locking is enabled at the bucket level, either when or after a bucket is created. Once enabled, object locking can't be disabled.
Object locking offers the following advantages:
- Locked objects can't be deleted. This implements write once, read many (WORM) behavior, which protects objects from accidental or malicious changes.
- A bucket owner can lock objects until a specified date and time. This implements retention periods, which comply with record retention policies. The retention period can be up to 100 years in the future. Note: Once set, a retention period can be extended, but not shortened or turned off.
- A bucket owner can lock an object indefinitely, and then turn the lock off. This complies with legal hold requirements. If a legal hold is placed on an object, it can't be modified, versioned, moved, or deleted, even if it has an expired retention period (that is, a legal hold overrides a retention period). A legal hold never expires; it must instead be removed. An object can have multiple legal holds placed on it.
HCP for cloud scale implements compliance mode as described by the Amazon S3 specification. It does not support governance mode.
For information on how to lock and unlock objects, see the S3 Console Guide.
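The S3 API methods for these operations (PutObjectLegalHold, PutObjectRetention) are listed as fully supported in the tables later in this section. A minimal boto3 sketch, reusing the earlier s3 client and placeholder names:

from datetime import datetime, timezone

# Place an indefinite legal hold; remove it later by setting Status to "OFF".
s3.put_object_legal_hold(
    Bucket="finance-reports",
    Key="2024/q1/summary.csv",
    LegalHold={"Status": "ON"},
)

# Set a compliance-mode retention period; once set, it can be extended
# but not shortened or turned off.
s3.put_object_retention(
    Bucket="finance-reports",
    Key="2024/q1/summary.csv",
    Retention={
        "Mode": "COMPLIANCE",
        "RetainUntilDate": datetime(2034, 1, 1, tzinfo=timezone.utc),
    },
)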
Capacity monitoring
You can monitor estimated available system-wide or per-storage component capacity.
The Storage page, in the Object Storage Management application, displays the total, used, and estimated available (free) capacity for HCP S Series Node storage components configured in the system, as well as the changes in the past week. If all the storage components are HCP S Series nodes, the page displays the total, used, and estimated available capacity of the entire system. You can set capacity threshold alarms that visually indicate, and can also send an alert message or email notification, if the capacity of an HCP S Series Node storage component, or the system as a whole, reaches a threshold. This page provides a single monitoring point for the entire HCP for cloud scale system, and the information displayed helps you with capacity planning.

The Metadata Gateway service periodically gathers storage component capacity metrics. If the used capacity for a single HCP S Series Node storage component or the entire system of HCP S Series Node storage components rises above a specified level, the system displays an alert. You can configure the alert threshold.
The calculation of used capacity includes:
- HCP S Series Node storage components configured for capacity monitoring
- Storage components set to read-only status
- Storage components that are inactive
The calculation of available system capacity does not include:
- HCP S Series Node storage components not configured for capacity monitoring
- Storage components other than HCP S Series Node storage components
- Storage components set to read-only status
- Storage components that are inactive
Using capacity information, you can be alerted and take action if a storage component is reaching capacity. You can determine whether the system can support an increase in stored data (for example, as expected from a new customer). You can understand the balance of usage and capacity across storage components. You can plan for the orderly addition of capacity.
Chargeback reports
Chargeback reports detail how system storage capacity is used, per user or bucket.
HCP for cloud scale provides storage usage reports for objects on the system. Authorized users can generate a report for one or more of the buckets they own. Authorized administrators can generate a report for a user or a list of one or more buckets. Reports can detail hourly, daily, or monthly usage.
Chargeback reports let you create invoices or bills for bucket owners, or delegate that task to others.
Metrics for bucket size and number of objects are stored persistently. Storage usage is calculated at the granularity of byte-hours and can be reported by hour, day, or month.
For example, if a user stores 100 GB (107374182400 bytes) of standard storage data in a bucket for the first 15 days in March, and 100 TB (109951162777600 bytes) of standard storage data for the final 16 days in March, the usage is 42259901212262400 byte-hours. The calculation is as follows:
First calculate the total byte-hour usage:
[107374182400 bytes × 15 days × (24 hours/day)] + [109951162777600 bytes × 16 days × (24 hours/day)] = 42259901212262400 byte-hours
Then convert byte-hours to GB-months:
42259901212262400 byte-hours ÷ (1073741824 bytes/GB) ÷ (24 hours/day) ÷ (31 days in March) = 52900 GB-months
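The same arithmetic in Python, for checking the worked example:

GB = 1073741824  # bytes per GB

# 100 GB for the first 15 days of March, 100 TB for the final 16 days.
byte_hours = (100 * GB * 15 * 24) + (100 * 1024 * GB * 16 * 24)
print(byte_hours)                 # 42259901212262400

print(byte_hours / GB / 24 / 31)  # 52900.0 GB-months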
Storage capacity usage is reported in either a user report or a system report.
- The user report gives storage usage for any or all buckets defined in the system that the user owns.
- The system report gives storage usage for any or all buckets defined in the system.
Within each report you can specify which fields appear.
Support for the Amazon S3 API
HCP for cloud scale is compatible with the Amazon Simple Storage Service (Amazon S3) REST API, which lets clients store objects in buckets. A bucket is a container of objects that has its own settings, such as ownership and lifecycle. Using HCP for cloud scale, users can perform common operations on objects and buckets and manage ACL settings through the client access data service.
For information about using Amazon S3, see the Amazon S3 API documentation.
For information about obtaining S3 user credentials, see the S3 Console Guide.
The following tables list the level of support for each HCP for cloud scale S3 API method compared with the Amazon S3 API and describe any implementation differences.
Amazon S3 API | Support level | Implementation differences |
CreateBucket | Fully supported | None |
DeleteBucket | Supported with differences | To support legacy S3 buckets, HCP for cloud scale supports bucket names of fewer than three characters. When anonymous requests to create or remove a bucket use a bucket name that isn't valid, Amazon S3 verifies access first and returns 403; HCP for cloud scale returns 400 if bucket name validation fails. |
DeleteBucketAnalyticsConfiguration | Not supported | Not supported |
DeleteBucketCors | Not supported | Not supported |
DeleteBucketEncryption | Fully supported | None |
DeleteBucketIntelligentTieringConfiguration | Not supported | Not supported |
DeleteBucketInventoryConfiguration | Not supported | Not supported |
DeleteBucketLifecycle | Supported with differences | HCP for cloud scale supports the latest API for bucket lifecycle management; old, deprecated V1.0 methods are not supported. HCP for cloud scale does not support Object Transition actions; including these actions causes a Malformed XML exception. |
DeleteBucketMetricsConfiguration | Not supported | Not supported |
DeleteBucketOwnershipControls | Not supported | Not supported |
DeleteBucketPolicy | Not supported | Not supported |
DeleteBucketReplication | Fully supported | None |
DeleteBucketTagging | Not supported | Not supported |
DeleteBucketWebsite | Not supported | Not supported |
DeletePublicAccessBlock | Not supported | Not supported |
GetBucketAccelerateConfiguration | Not supported | Not supported |
GetBucketAcl | Supported with differences | In Amazon S3, each grantee is specified as a type-value pair, where the type is one of emailAddress, id, or uri. HCP for cloud scale fully supports id and supports uri for the predefined groups Authenticated Users and All Users; it does not support emailAddress. Support for Amazon S3 predefined grants ("canned ACLs") is partial. HCP for cloud scale does not mirror or mirror back ACLs or policies. |
GetBucketAnalyticsConfiguration | Not supported | Not supported |
GetBucketCors | Not supported | Not supported |
GetBucketEncryption | Fully supported | None |
GetBucketIntelligentTieringConfiguration | Not supported | Not supported |
GetBucketInventoryConfiguration | Not supported | Not supported |
GetBucketLifecycle | Supported with differences | HCP for cloud scale supports the latest API for bucket lifecycle management; old, deprecated V1.0 methods are not supported. HCP for cloud scale does not support Object Transition actions; including these actions causes a Malformed XML exception. |
GetBucketLifecycleConfiguration | Not supported | Not supported |
GetBucketLocation | Supported with differences | The caller must be the bucket owner. |
GetBucketLogging | Not supported | Not supported |
GetBucketMetricsConfiguration | Not supported | Not supported |
GetBucketNotification | Fully supported | None |
GetBucketNotificationConfiguration | Fully supported | None |
GetBucketOwnershipControls | Not supported | Not supported |
GetBucketPolicy | Not supported | Not supported |
GetBucketPolicyStatus | Not supported | Not supported |
GetBucketReplication | Supported with differences | HCP for cloud scale supports only V1.0 methods. Advanced filtering is not supported. |
GetBucketRequestPayment | Not supported | Not supported |
GetBucketTagging | Not supported | Not supported |
GetBucketVersioning | Fully supported | Returns the bucket versioning configuration and status (always on). |
GetBucketWebsite | Not supported | Not supported |
GetPublicAccessBlock | Not supported | Not supported |
HeadBucket | Supported with differences | To support legacy S3 buckets, HCP for cloud scale supports bucket names of fewer than three characters. When anonymous requests to create or remove a bucket use a bucket name that isn't valid, Amazon S3 verifies access first and returns 403; HCP for cloud scale returns 400 if bucket name validation fails. |
ListBucketAnalyticsConfigurations | Not supported | Not supported |
ListBucketIntelligentTieringConfigurations | Not supported | Not supported |
ListBucketInventoryConfigurations | Not supported | Not supported |
ListBucketMetricsConfigurations | Not supported | Not supported |
ListBuckets | Fully supported | None |
PutBucketAccelerateConfiguration | Not supported | Not supported |
PutBucketAcl | Supported with differences | In Amazon S3, each grantee is specified as a type-value pair, where the type is one of emailAddress, id, or uri. HCP for cloud scale fully supports id and supports uri for the predefined groups Authenticated Users and All Users; it does not support emailAddress. Support for Amazon S3 predefined grants ("canned ACLs") is partial. HCP for cloud scale does not mirror or mirror back ACLs or policies. |
PutBucketAnalyticsConfiguration | Not supported | Not supported |
PutBucketCors | Not supported | Not supported |
PutBucketEncryption | Fully supported | None |
PutBucketIntelligentTieringConfiguration | Not supported | Not supported |
PutBucketInventoryConfiguration | Not supported | Not supported |
PutBucketLifecycle | Supported with differences | HCP for cloud scale supports the latest API for bucket lifecycle management; old, deprecated V1.0 methods are not supported. HCP for cloud scale does not support Object Transition actions; including these actions causes a Malformed XML exception. |
PutBucketLifecycleConfiguration | Supported with differences | HCP for cloud scale supports only V1.0 methods. Advanced filtering is not supported. |
PutBucketLogging | Not supported | Not supported |
PutBucketMetricsConfiguration | Not supported | Not supported |
PutBucketNotification | Fully supported | None |
PutBucketNotificationConfiguration | Supported with differences | A configuration can have up to 100 rules. Amazon S3 considers two rules to overlap if both apply to the same object and share at least one event type. HCP for cloud scale supports notification from the same object to multiple targets; however, rules are blocked if they send a message for the same event to the same target. All notification message fields are returned except for a few unsupported fields. |
PutBucketOwnershipControls | Not supported | Not supported |
PutBucketPolicy | Not supported | Not supported |
PutBucketReplication | Supported with differences | HCP for cloud scale supports replication to only one destination (1:1). All rules must share a single, common destination bucket; if more than one destination appears in the collection of rules, the entire policy is rejected with a 400 status code. Sending encrypted data to a remote bucket is not supported. |
PutBucketRequestPayment | Not supported | Not supported |
PutBucketTagging | Not supported | Not supported |
PutBucketVersioning | Not supported | With HCP for cloud scale, versioning is always enabled. |
PutBucketWebsite | Not supported | Not supported |
PutPublicAccessBlock | Not supported | Not supported |
Amazon S3 API | Support level | Implementation differences |
AbortMultipartUpload | Fully supported | None |
CompleteMultipartUpload | Fully supported | None |
CopyObject | Supported with differences | The copy object source and destination must have the same encryption state (for example, encrypted to encrypted or unencrypted to unencrypted). HCP for cloud scale supports using the x-amz-server-side-encryption header if encrypting only a single object. |
CreateMultipartUpload | Supported with differences | HCP for cloud scale supports using the x-amz-server-side-encryption header if encrypting only a single object. The header is not needed if the encryption policy is set at the bucket level. |
DeleteObject | Supported with differences | Bucket synchronization of the removal of an object or a specific version of an object is not supported. To improve performance, if the current version of an object is a delete marker, HCP for cloud scale does not create another delete marker. |
DeleteObjects | Supported with differences | Bucket synchronization is not supported. |
DeleteObjectTagging | Fully supported | None |
GetObject | Supported with differences | If a lifecycle policy is configured for a bucket, HCP for cloud scale displays the expiration date of an object (in the x-amz-expiration header) fetched using the subresource ?versionId. Legal hold is fully implemented. Object retention is fully implemented. Object names cannot contain NUL or backslash (\) characters; GET methods on objects so named fail with a 400 error. The partNumber parameter is not supported. |
GetObjectAcl | Supported with differences | In Amazon S3, each grantee is specified as a type-value pair, where the type is one of emailAddress, id, or uri. HCP for cloud scale does not support emailAddress, and support for Amazon S3 predefined grants ("canned ACLs") is partial. |
GetObjectAttributes | Not supported | Not supported |
GetObjectLegalHold | Fully supported | None |
GetObjectLockConfiguration | Fully supported | None |
GetObjectRetention | Fully supported | None |
GetObjectTagging | Fully supported | None |
GetObjectTorrent | Not supported | Not supported |
HeadObject | Supported with differences | If a lifecycle policy is configured for a bucket, HCP for cloud scale displays the expiration date of an object (in the x-amz-expiration header) fetched using the subresource ?versionId. The partNumber parameter is not supported. |
ListMultipartUploads | Fully supported | None |
ListObjects | Fully supported | In Amazon S3, the NextMarker element is returned only if the delimiter request parameter is specified. HCP for cloud scale always returns NextMarker when the response is truncated. |
ListObjectsV2 | Fully supported | None |
ListObjectVersions | Fully supported | None |
ListParts | Fully supported | None |
PutObject | Supported with differences | HCP for cloud scale adds additional content-type validation. Bucket synchronization is supported. Legal hold is fully implemented; AWS object lock permissions are not supported, so a bucket owner can set a legal hold without restriction. Object retention is implemented, but not governance mode; that is, once a retain-until date is set, it can be extended but not removed. AWS object lock permissions are not supported, so a bucket owner can set object retention without restriction. Object locking can be applied to a bucket even after it's created; to enable object locking, use the S3 API method PutObjectLockConfiguration. Object names cannot contain NUL or backslash (\) characters; PUT methods on objects so named fail with a 400 error. HCP for cloud scale supports using the x-amz-server-side-encryption header if encrypting only a single object. |
PutObjectAcl | Supported with differences | Bucket synchronization is not supported. In Amazon S3, each grantee is specified as a type-value pair, where the type is one of emailAddress, id, or uri. HCP for cloud scale does not support emailAddress, and support for Amazon S3 predefined grants ("canned ACLs") is partial. |
PutObjectLegalHold | Fully supported | None |
PutObjectLockConfiguration | Fully supported | None |
PutObjectRetention | Fully supported | None |
PutObjectTagging | Fully supported | None |
RestoreObject | Not supported | Not supported |
SelectObjectContent | Supported with differences | Scan range is supported. HCP for cloud scale supports queries such as: select *, first_name from s3object s where s.salary > 100000 limit 10. HCP for cloud scale supports a wider range of date-time formats than AWS; the full list is available at https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html. HCP for cloud scale supports nested aggregate functions. HCP for cloud scale SQL queries on columns are case sensitive, while AWS SQL queries are case insensitive. Only input serialization of Parquet is supported; requests for CSV or JSON objects are not supported and return an error. Parquet compression is managed automatically. Only CSV output is supported; specifying another output format returns an error. AWS calculates the size of a record returned in an S3 Select query as the total size of the record, including any delimiters, while HCP for cloud scale calculates the size as the total data of each column returned; these calculations can sometimes differ slightly. |
UploadPart | Supported with differences | HCP for cloud scale supports using the x-amz-server-side-encryption header if encrypting only a single object. The header is not needed if the encryption policy is set at the bucket level. |
UploadPartCopy | Supported with differences | HCP for cloud scale supports using the x-amz-server-side-encryption header if encrypting only a single object. The upload part copy source and destination must have the same encryption state (for example, encrypted to encrypted or unencrypted to unencrypted). |
WriteGetObjectResponse | Not supported | Not supported |
HCP for cloud scale does not support the following HTTP request headers in APIs it otherwise supports. If supplied as part of the request, these headers will be ignored.
- x-amz-expected-bucket-owner
- x-amz-sdk-checksum-algorithm
- x-amz-request-payer
- x-amz-storage-class
- x-amz-website-redirect-location
- x-amz-bypass-governance-retention
- x-amz-request-mfa
- x-amz-security-token
HCP for cloud scale does not support the following conditional HTTP request headers or equivalent x-amz extensions for putCopy:
- If-Match
- If-Modified-Since
- If-None-Match
- If-Unmodified-Since
- x-amz-copy-source-if-match
- x-amz-copy-source-if-none-match
- x-amz-copy-source-if-unmodified-since
- x-amz-copy-source-if-modified-since
S3 Select
HCP for cloud scale supports the S3 Select feature.
HCP for cloud scale supports the S3 Select Object Content method, which allows retrieval of a portion of a structured object by an S3 client such as Apache Spark, Apache Hive, or Presto. The portion of the object returned is selected based on a structured query language (SQL) query sent in the request. The query is performed by S3 storage components that support pushdown. Selecting only the data needed within an object can significantly reduce cost and time and improve performance.
A request can select serialized object data in these formats:
- Apache Parquet
A request can return data in these formats:
- Comma-separated values (CSV)
The client application must have the permission s3:GetObject. S3 Select supports reading encrypted data. The SQL expression can be up to 256 KB, and can return up to 1 MB of data.
Here is a simple example of a SQL query against a Parquet object. The query returns data for salaries greater than 100000:
select salary from s3object s where s.salary > 100000
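A hedged boto3 sketch of issuing that query with SelectObjectContent; per the implementation notes in the API tables, input serialization must be Parquet and output must be CSV (the s3 client and object names are placeholders from the earlier examples):

# Run the SQL query against a Parquet object and stream back CSV rows.
response = s3.select_object_content(
    Bucket="finance-reports",
    Key="employees.parquet",
    ExpressionType="SQL",
    Expression="select salary from s3object s where s.salary > 100000",
    InputSerialization={"Parquet": {}},
    OutputSerialization={"CSV": {}},
)
for event in response["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode())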
S3 event notification
HCP for cloud scale supports the S3 PUT Bucket notification configuration and GET Bucket notification configuration methods.
HCP for cloud scale can send notifications of specified events in a bucket to a message server for applications to consume. This is a more efficient way to signal changes than periodically scanning objects in a bucket.
HCP for cloud scale supports event notification to signal specified events in buckets. Notifications can be sent to AWS SQS Standard services, Kafka, or RabbitMQ. A retry mechanism assures highly reliable notifications.
Notification can be configured for these events:
- s3:ObjectCreated:*
- s3:ObjectCreated:Put
- s3:ObjectCreated:Post
- s3:ObjectCreated:Copy
- s3:ObjectCreated:CompleteMultipartUpload
- s3:ObjectRemoved:*
- s3:ObjectRemoved:Delete
- s3:ObjectRemoved:DeleteMarkerCreated
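A hedged boto3 sketch of subscribing one target to object-created events; the queue ARN shown is a placeholder, since notification targets (AWS SQS, Kafka, RabbitMQ) are defined by the administrator:

# Notify a single target of every object-created event in the bucket.
s3.put_bucket_notification_configuration(
    Bucket="finance-reports",
    NotificationConfiguration={
        "QueueConfigurations": [
            {
                "Id": "created-events",
                "QueueArn": "arn:aws:sqs:us-west-2:000000000000:example-queue",
                "Events": ["s3:ObjectCreated:*"],
            }
        ]
    },
)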
Supported limits
HCP for cloud scale limits the number of instances (nodes) in a system to 160.
HCP for cloud scale does not limit the number of the following entities.
Entity | Minimum | Maximum | Notes |
Buckets | None | Unlimited | A user can own up to 1000 buckets. |
Users (external) | None | Unlimited | The local user has access to all functions including MAPI and S3 API methods. However, it's best to configure HCP for cloud scale with an identity provider (IdP) with users to enforce role-based access control. |
Groups (external) | None | Unlimited | |
Roles | None | Unlimited | |
Objects | None | Unlimited | The size limit for an object is 5 TB. |
Storage components | 1 | Unlimited | |
High availability
HCP for cloud scale supports high availability for multi-instance sites.
High availability requires at least four instances: three master instances, which run essential services, and at least one worker instance. The best practice is to run the three master instances on separate physical hardware (or, if running on virtual machines, on at least three separate physical hosts) and to run HCP for cloud scale services on more than one instance.
Scalability of instances, service instances, and storage components
You can increase or decrease the capacity, performance, and availability of HCP for cloud scale by adding or removing the following:
- Instances: physical computer nodes or virtual machines
- Service instances: copies of services running on additional instances
- Storage components: S3 compatible systems used to store object data
In a multi-instance site, you might add instances to improve system performance or if you are running out of storage space on one or more instances. You might remove instances if you are retiring hardware, if an instance is down and cannot be recovered, or if you decide to run fewer instances.
When you add an instance, you can also scale floating services (such as the S3 Gateway) to the new instance. When you scale a floating service, HCP for cloud scale automatically rebalances itself.
In a multi-instance site, you can manually change where a service instance runs:
- You can configure it to run on additional instances. For example, you can increase the number of S3 Gateway service instances to improve throughput of S3 API transactions.
- You can configure it to run on fewer instances. For example, you can free computational resources on an instance to run other services.
- You can configure it to run on different instances. For example, you can move the service instances off a hardware instance to retire the hardware.
- For a floating service, instead of specifying a specific instance on which it runs, you can specify a pool of eligible instances, any of which can run the service.
Some services have a fixed number of instances and therefore cannot be scaled:
- Metadata Coordination
You might add storage components to a site under these circumstances:
- The existing storage components are running out of available capacity
- The existing storage components do not provide the performance you need
- The existing storage components do not provide the functionality you need
Site availability
An HCP for cloud scale site has three master instances and thus can tolerate the failure of one master instance without interruption of service.
If a site with only two healthy master instances experiences an outage of another master instance (for example, if services restart or the entire instance or operating system restarts), it goes into a degraded state until all three master instances are restored.
Service availability
HCP for cloud scale services provide high availability as follows:
- The Metadata Gateway service always has at least three service instances. When the system starts, the nodes elect a leader using the Raft consensus algorithm. The other service instances follow the leader. The leader processes all GET and PUT requests. If the followers cannot identify the leader, they elect a new leader. The Metadata Gateway service tolerates the failure of one service instance without interruption. If more than one service instance is unavailable, some data can become unavailable until the instance recovers.
- The Metadata Coordination service always has one service instance. If that instance fails, HCP for cloud scale automatically starts another instance. Until startup is complete, the Metadata Gateway service cannot scale.
- The Metadata Cache service is deprecated but always has one service instance. If that instance fails, HCP for cloud scale automatically starts another instance.
- To protect messaging consistency, the Message Queue service always has three service instances. To prevent being split into disconnected parts, the service shuts down if half of the service instances fail. In practice, messaging stops if two of the three instances fail. Do not let the service run with only two instances, because in that scenario if one of the remaining instances fails, the service shuts down. However, when one of the failed instances restarts, messaging services recover and resume.
- To maintain access to the encryption key vault, the Key Management Server service uses an active-standby model. One service instance is the active instance and any other service instances are kept as standbys. If the active vault node becomes sealed or unavailable, one of the standbys takes over as active. You can scale up to the number of instances in the HCP for cloud scale system or your acceptable performance limits.
The rest of the HCP for cloud scale services remain available if HCP for cloud scale instances or service instances fail, as long as at least one service instance remains healthy. Even if a service that has only one service instance fails, HCP for cloud scale automatically starts a new service instance.
Metadata availability
Metadata is available as long as these services are available:
- S3 Gateway
- Metadata Gateway
Object data availability
Object data is available as long as these items are available:
- The S3 Gateway service (at least one instance)
- The storage component containing the requested object data
- At least two functioning Metadata Gateway service instances (of the required three)
For high availability of object data or data protection, you should use a storage component with high availability, such as HCP, HCP S Series Node, or AWS S3.
Network availability
You can install each HCP for cloud scale instance with both an internal and an external network interface. To avoid single points of networking failure, you can:
- Configure two external network interfaces in each HCP for cloud scale instance
- Use two switches and connect each network interface to one of them
- Bind the two network interfaces into one virtual network interface in an active-passive configuration
- Install HCP for cloud scale using the virtual network interface
Failure recovery
HCP for cloud scale actively monitors the health and performance of the system and its resources, gives real-time visual health representations, issues alert messages when needed, and automatically takes action to recover from the failure of:
- Instances (nodes)
- Product services (software processes)
- System services (software processes)
- Storage components
Instance failure recovery
If an instance (a compute node) fails, HCP for cloud scale automatically adds new service instances to other available instances (compute nodes) to maintain the minimum number of service instances. Data on the failed instance is not lost and remains consistent. However, while the instance is down, data redundancy might degrade.
HCP for cloud scale adds new service instances automatically only for floating services. Depending on the remaining number of instances and service instances running, you might need to add new service instances or deploy a new instance.
Service failure recovery
HCP for cloud scale monitors service instances and automatically restarts them if they are not healthy.
For floating services, you can configure a pool of eligible HCP for cloud scale instances and the number of service instances that should be running at any time. You can also set the minimum and maximum number of instances running each service. If a service instance failure causes the number of service instances to go below the minimum, HCP for cloud scale starts another service instance on one of the HCP for cloud scale instances in the pool that doesn't already have that service instance running.
Persistent services run on the specific instances that you specify. If a persistent service fails, HCP for cloud scale restarts the service instance in the same HCP for cloud scale instance. HCP for cloud scale does not automatically bring up a new service instance on a different HCP for cloud scale instance.
Storage component failure recovery
HCP for cloud scale performs regular, periodic health verifications to detect storage component failures.
If HCP for cloud scale detects a storage component failure, it sets the storage component state to INACCESSIBLE, so that HCP for cloud scale will not try to write new objects to the storage component, and sends an alert. While a storage component is unavailable, the data in it is not accessible.
HCP for cloud scale continues to verify a failed storage component and, when it detects that the storage component is healthy again, automatically sets its state to ACTIVE. HCP for cloud scale sends an alert when this event happens as well. After the storage component is repaired and brought back online, the data it contains is again accessible and HCP for cloud scale can write new objects to it.
HCP for cloud scale management APIs
The Hitachi Content Platform for cloud scale (HCP for cloud scale) system includes RESTful application programming interfaces (APIs) that you can use to exercise its functions and manage the system.
Anything you can do in the Object Storage Management, S3 Console, or System Management application GUIs you can also do using APIs.
The Object Storage Management application includes a RESTful API for administrative functions such as managing storage components, configuring Amazon S3 settings, and obtaining or revoking S3 user credentials. For more information on the Object Storage Management API, see the MAPI Reference.
The System Management application includes a RESTful API for system management functions such as system monitoring, service monitoring, user registration, and configuration. For more information on the System Management API, see the Swagger interface in the System Management application.
The S3 Console application uses the Amazon S3 API for object functions such as reading and writing, listing, and setting retention or encryption policies on objects. Unless otherwise noted, HCP for cloud scale is fully compatible with the Amazon S3 API. For more information on the S3 API, see the documentation provided by Amazon Web Services.
Object Storage Management API
The Object Storage Management application includes a RESTful API interface for the following functions:
- Managing storage components and Amazon S3 settings
- Managing administrative resources such as serial numbers and system events
- Managing user resources such as S3 user credentials and OAuth tokens
The Object Storage Management API is served by the MAPI Gateway service from any HCP for cloud scale node.
You can execute all functions supported in the Object Storage Management application using the API.
All URLs for the API have the following base, or root, uniform resource identifier (URI):
https://hcpcs_ip_address:9099/mapi/v1
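For example, a minimal sketch of calling the MAPI method POST /serial_number/get (described later in this section) with Python; the host address and token value are placeholders:

import requests

base = "https://hcpcs_ip_address:9099/mapi/v1"
token = "OAUTH_TOKEN"  # see Security and authentication for obtaining tokens

# MAPI methods are POST requests authorized by an OAuth bearer token.
resp = requests.post(
    f"{base}/serial_number/get",
    headers={"Authorization": f"Bearer {token}"},
    verify=False,  # or the path to the system certificate bundle
)
print(resp.json())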
System Management API
The System Management application provides a RESTful API for managing the following:
- Alerts
- Business objects
- Certificates
- Events
- Instances
- Jobs
- Licenses
- Notifications
- Packages
- Plugins
- Security
- Services
- Setup
- Tasks
- Updates
You can execute all functions supported in the System Management application using the API.
Security and authentication
HCP for cloud scale controls access to system functions through user accounts, roles, permissions, and, where user accounts are stored in an external identity provider, by OAuth tokens. All browser pages that make up the system are protected and cannot be reached without authentication. Users who try to reach a system page without authentication are redirected to the login page.
HCP for cloud scale controls access to data by S3 API requests through S3 credentials, ownership, and access control lists. HCP for cloud scale supports in-flight encryption (HTTPS) for all external communications.
User accounts
The initial user account, which has all permissions, is created when you install HCP for cloud scale. The initial user account can perform all HCP for cloud scale functions. After the initial user account is created, you can change its password any time, but you cannot disable the account and you cannot change its permissions.
The initial user is the only local account allowed and is intended only to let you configure an identity provider (IdP). HCP for cloud scale can communicate with IdPs using HTTP or HTTPS. HCP for cloud scale supports multiple IdPs:
- Active Directory
- OpenLDAP
- 389 Directory Server
- LDAP compatible
HCP for cloud scale supports external users defined in the IdP. External users with the appropriate permissions can perform some or all of these functions:
- Log in to the Object Storage Management application and use all functions
- Log in to the System Management application and use all functions
- Get an OAuth token to use all API calls for the Object Storage Management and System Management applications
- Log in to the S3 Console application and get S3 credentials to use the S3 API
HCP for cloud scale discovers the groups in each IdP and allows assigning roles to groups.
HCP for cloud scale uses OAuth2 as a service provider to authenticate single sign-on (SSO) access. SSO lets you use one set of login credentials for all HCP for cloud scale applications, so you can switch between applications without logging in again.
API access
Object Storage Management application API methods need a valid OAuth access token for a user account with suitable permissions, or else the requests are rejected. With one exception, System Management application API methods also require a valid OAuth access token for a user account with suitable permissions, or else the requests are rejected. (The exception is the API method to generate an OAuth token, which requires only a username and password in the body of the request.)
Before using either the Object Storage Management or System Management APIs, you need to obtain an OAuth token. You can generate an OAuth token by sending a request to the OAuth server with your account credentials. Then you can supply the OAuth token in the Authorization header in each request. OAuth tokens are valid for five hours.
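A minimal sketch of that flow in Python; the token endpoint path and form fields shown are assumptions, so confirm them against the System Management API reference on your system:

import requests

# Assumed OAuth endpoint and request fields; adjust for your deployment.
resp = requests.post(
    "https://system_address:8000/auth/oauth/",
    data={"grant_type": "password", "username": "user", "password": "pass"},
    verify=False,
)
token = resp.json()["access_token"]

# Supply the token in the Authorization header of each API request.
headers = {"Authorization": f"Bearer {token}"}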
S3 API requests generally require valid S3 credentials for users with the right privileges, that is, access control lists (ACLs). (Exceptions are methods configured to allow anonymous access and pre-signed requests.) HCP for cloud scale supports AWS Signature version 4 authentication to include S3 credentials in S3 requests.
Users with a valid account and suitable permissions can generate S3 credentials. You can generate an unlimited number of S3 credentials, but only the last credentials generated are valid. These credentials are associated only with your account. S3 credentials do not have an expiration date, so they are valid until revoked.
Users with a valid account and suitable permissions can revoke all S3 credentials of any user. That is, you can revoke your own S3 credentials or the S3 credentials of any other user. Revocation removes all S3 credentials associated with the account.
Network isolation and port mapping
When you install HCP for cloud scale, you can set up network isolation by configuring one external network and one internal network.
HCP for cloud scale software creates a cluster using commodity x86 servers that are networked using Ethernet. The software uses two networks on the operating system hosting the HCP for cloud scale software. These networks can also use link aggregation defined by the OS administrator.
While two networks provide optimal traffic isolation, you can deploy the software using a single network. The OS administrator must make and implement networking decisions before you install HCP for cloud scale.
HCP for cloud scale services use a range of network ports. You can configure most services to use different ports instead of the default ports; installation is the only opportunity to change the default ports used by services. The following services must be deployed with their default port values:
- Message Queue
- Tracing Agent
- Tracing Collector
- Tracing Query
For information about installing HCP for cloud scale, see Installing Hitachi Content Platform for Cloud Scale.
Dashboard
The Dashboard service helps you monitor system status. The service displays a set of dashboards, each providing in-depth visibility to different categories of system information.
You can drill down into each dashboard to display additional system information. For information about starting the dashboards, see Starting the dashboards. For more information about the interface, go to the Grafana website at: https://grafana.com/docs/grafana/latest/dashboards/
Dashboard 1: System Health
The System Health dashboard is the primary source of information about the overall status of the system and is designed to be the first location to view for a high-level assessment of system health. This dashboard uses colors to indicate the health of components. If the boxes on the page are mostly green, then the system is in good health. Orange or yellow boxes indicate components with warnings. Blue indicates inactive components. Viewing this color-coded page allows users to identify potential issues at a glance.
Clicking any of the items on the page provides additional information about that component. The main components are displayed on the left-hand side of the page. These include the following:
- Database Partition Health: Check details at Dashboard 6: Services Health.
- Metadata Partition Balance: A database should be distributed evenly among nodes. Any node showing greater than 15% imbalance should be investigated.
- S3 I/O Balance: S3 I/O should be distributed evenly among nodes. Any value greater than 15% imbalance should be investigated.
- Database Partitions Per Node: Check details at Dashboard 6: Services Health.
- DLS: Delete Backend Objects policy: Checks whether the Data Lifecycle Service (DLS) is examining objects for the DELETE_BACKEND_OBJECTS policy. UNHEALTHY indicates that no activity was detected in the last 24 hours.
Additional information about overall system health, such as service uptime status, metadata space used and available, and the percentage of used metadata capacity, is presented on other areas of the page.
Dashboard 2: System Overview
The System Overview dashboard displays information about the capacity, objects, and the amount of data that has been moved since the last startup. The following information categories are available:
- System Overview
- Object Count Information
- System Capacity Information
Dashboard 3: S3 Activity
The S3 Activity dashboard displays information about overall S3 request activity as well as information about specific S3 requests. The following information categories are available:
- System Overall S3 Activity
- System S3 Ingest Activity
- System S3 Data GET Activity
- System S3 Object Delete Activity
- S3 Failed Request Details
Dashboard 4: Buckets Stats and Activity
The Bucket Stats and Activity dashboard displays information about the buckets in the system, such as used capacity, number of objects, and average object size. The following information categories are available:
- Bucket Stats
- Bucket I/O Activity
Dashboard 5: System Capacity
The System Capacity dashboard displays capacity information for different components of the system such as the number of objects, the amount of HCP for cloud scale capacity that has been used, and the HCP S Series Node capacity of the system. The following information categories are available:
- Object Count Information
- System Capacity Information
- Metadata Capacity
- S-node Detailed Capacity Reports
Dashboard 6: Services Health
Note: The Services Health dashboard is designed to be used by Support personnel.
Dashboard 7: Encryption Status
The Encryption Status dashboard provides information about encryption activity taking place on the system, such as the percentage of objects in the system that are encrypted; the number of client objects for which encryption has started, completed, or encountered conflicts; and the data encryption key (DEK) encryption count and rate. The following information categories are available:
- System Overall S3 Activity
- Lifecycle Rekey Activity
- DEK Re-Encryption Activity
- S3 System Activity
Logging in
HCP for cloud scale provides one locally defined administrative user account. Any other user accounts reside in a realm provided by external identity providers (IdPs). To log in you need this information:
- The cluster hostname, instance, or IP address of the HCP for cloud scale system that you're using
- Your user name as assigned by your system administrator
- Your password as assigned by your system administrator
- The realm where your user account is defined
Procedure
Open a web browser and go to https://system_address:8000, where system_address is the address of the HCP for cloud scale system that you're using.
Type your username and password.
In the Security Realm field, select the location where your user account is defined.
To log in using the local administrator account, without using an external IdP, select Local. If no IdP is configured yet, Local is the only available option.
Click LOGIN.
Results
The HCP for cloud scale Applications page opens. If recent changes to a user account in the IdP aren't recognized yet, an administrator can call the API method security/clearCache to allow immediate login.
HCP for cloud scale applications
After you log in, the HCP for cloud scale Applications page shows you the applications you are authorized to use, such as:
- Object Storage Management: Manage and monitor storage components, data objects, alerts, and regions
- S3 Console: Generate S3 access and secret keys; conveniently create and manage buckets, bucket synchronization, and bucket policies; manage S3 event notification; and browse objects in buckets
- System Management (sometimes referred to in the application as the Admin App): Manage and monitor cluster instances, software services, system security, user accounts, and other cluster configuration parameters
From the Applications page, or from within each application, you can switch back and forth between applications as needed.
Switching between applications
HCP for cloud scale uses OAuth2 as a service provider to authenticate single sign-on (SSO) access. You only need one set of login credentials for all HCP for cloud scale applications, so you can switch between applications without logging in again.
Depending on the permissions assigned to your account role, you can have access to one or more HCP for cloud scale applications. To switch between applications:
Procedure
Depending on the application you are currently using:
- In the Object Storage Management application, click the app switcher menu and select another application.
- In the System Management application, click the Open menu, in the right corner of the top navigation bar, and select another application.
Note: The System Management application is also identified in the user interface as Admin App.
Select the application you want to use.
The application opens.
Serial number
You can use the Object Storage Management application or an API method to enter, view, or edit your HCP for cloud scale serial number.
A serial number is required to activate the HCP for cloud scale software. You must enter the serial number before you can use the system or its features.
Entering your serial number
The Object Storage Management application displays the product serial number. An administrative account with appropriate permissions can enter or edit this number.
Object Storage Management application instructions
Procedure
From the Object Storage Management application, select Settings > Serial number.
The SERIAL NUMBER page opens.
Enter your serial number into the Serial number field.
Click Save.
Related REST API methods
POST /serial_number/set
For information about specific API methods, see the MAPI Reference or, in the Object Storage Management application, click the profile icon and select REST API.
Viewing your serial number
You can use the Object Storage Management application or an API method to view or return the product serial number.
Object Storage Management application instructions
The product serial number appears in the Object Storage Management application on the SERIAL NUMBER page.
From the Object Storage Management application, select Settings > Serial number.
The SERIAL NUMBER page opens. The serial number appears in the Serial number field.
Related REST API methods
POST /serial_number/get
For information about specific API methods, see the MAPI Reference or, in the Object Storage Management application, click the profile icon and select REST API.
Editing your serial number
The Object Storage Management application manages the product serial number. An administrative account with appropriate permissions can enter or edit this number.
Object Storage Management application instructions
Procedure
From the Object Storage Management application, select Settings > Serial number.
The SERIAL NUMBER page opens, displaying the current serial number in the Serial number field.
Update your serial number in the Serial number field.
Click Save.
Results
The updated serial number is saved.
Related REST API methods
POST /serial_number/set
For information about specific API methods, see the MAPI Reference or, in the Object Storage Management application, click the profile icon and select REST API.
License
You can use the Object Storage Management application or an API method to enter, validate, and display an HCP for cloud scale license.
A license is required before you can activate certain HCP for cloud scale features. You must enter your serial number before you can upload a license.
Uploading a license
The Object Storage Management application can display and upload product licenses. An administrative account with appropriate permissions can upload a license file.
Object Storage Management application instructions
Procedure
From the Object Storage Management application, select Settings > Licensing.
The Licensing page opens.
Click Upload license.
The UPLOAD LICENSE page opens, displaying the Select file area.
Do one of the following:
- Drag and drop a license file into the Select file area.
- Click Select file, select a license file, and then click Open.
Results
The uploaded license appears in the list on the Licensing page.
Related REST API methods
POST /license/add
For information about specific API methods, see the MAPI Reference or, in the Object Storage Management application, click the profile icon and select REST API.
Defining subdomain for S3 Console application
The S3 Console application uses a subdomain of the HCP for cloud scale system.
Procedure
On a system that calls the S3 Console application, open the hosts file in an editor.
On a Windows system, the hosts file is normally located at C:\Windows\System32\drivers\etc\hosts. On a Linux system, the hosts file is normally located at /etc/hosts.
Associate the IP address of the HCP for cloud scale system with the S3 subdomain. For example:
10.24.19.54 s3.hcpcs.Company.com
Save the file.
Repeat Steps 1-3 for every system used to call the S3 Console application.
About page
The Object Storage Management application About page displays the product version number and a link to the software license terms.
The About page is available from the user profile icon.