
Getting started

Hitachi Content Platform for cloud scale (HCP for cloud scale) is a software-defined object storage solution that is based on a massively parallel microservice architecture and is compatible with the Amazon Simple Storage Service (S3) application programming interface (API).

Introducing HCP for cloud scale

HCP for cloud scale is especially well suited to service applications requiring high bandwidth and compatibility with the Amazon S3 API.

HCP for cloud scale can federate S3 compatible storage from virtually any private or public source and present the combined capacity in a single, centrally managed, global name space.

You can install HCP for cloud scale on any server, in the cloud or on premises, that supports the minimum requirements.

HCP for cloud scale supports S3 event notification through a graphical user interface or through the GET and PUT Bucket notification configuration methods.

HCP for cloud scale lets you manage and scale storage components. You can add storage components, monitor their states, and take them online or offline for purposes of maintenance or repair. HCP for cloud scale provides functions to send notification of alerts, track and monitor throughput and performance, and trace actions through the system.

Storage components, buckets, and objects

Storage components

A storage component is an Amazon S3 compatible storage provider, running independently, that HCP for cloud scale manages as a back end to store object data. To an S3 client using HCP for cloud scale, the existence, type, and state of storage components are transparent.

HCP for cloud scale supports the following storage systems:

  • Amazon S3
  • Hitachi Content Platform (HCP)
  • HCP S Series Node
  • Any Amazon S3 compatible storage service
Buckets

A bucket is a logical collection of secure data objects that is created and managed by a client application. An HCP for cloud scale bucket is modeled on a storage service bucket. HCP for cloud scale uses buckets to manage storage components. You can think of an HCP for cloud scale system as a logical collection of secure buckets.

Buckets have associated metadata such as ownership and lifecycle status. HCP for cloud scale buckets are owned by an HCP for cloud scale user, and access is controlled on a per-bucket basis by access control lists (ACLs) supporting the S3 API. Buckets are contained in a specific region; HCP for cloud scale supports one region.

Note
  1. HCP for cloud scale buckets are not stored in storage components, so HCP for cloud scale clients can create buckets even before adding storage components.
  2. Storage component buckets are created by storage component administrators and are not visible to HCP for cloud scale clients.
  3. To empty and reuse a bucket, don't just delete the bucket and create a new one with the same name. After a bucket is deleted, the name becomes available for anyone to use and another account might take it first. Instead, empty and keep the bucket.
Objects

An object consists of data and associated metadata. The metadata is a set of name-value pairs that describe the object. Every object is contained in a bucket. An object is handled as a single unit by all HCP for cloud scale transactions, services, and internal processes.

For information about Amazon S3, see Introduction to Amazon S3.

S3 Console application

HCP for cloud scale includes an S3 Console application that provides convenient functions for bucket users as an alternative to using S3 API methods:

  • Obtaining S3 credentials
  • Managing bucket synchronization, policies, and rules
  • Creating S3 event notifications to synchronize buckets
  • Managing objects in buckets, singly and in bulk

For more information, see the S3 Console Guide.

Strongly consistent object listing

HCP for cloud scale supports strong consistency in object listing. After a write, upload, or delete operation, a list operation shows the changes immediately. Strong consistency supports big-data analytics applications and applications originally written for storage environments of smaller scale.

Data access

HCP for cloud scale supports the Amazon S3 API, which lets client applications store and retrieve unlimited amounts of data from configured storage services.
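For illustration, here is a minimal sketch using the AWS SDK for Python (boto3); the endpoint, credentials, bucket, and key names are placeholders. Because object listing is strongly consistent, as described earlier, the new key appears in a listing immediately:

import boto3

# Placeholder endpoint and S3 credentials for an HCP for cloud scale system.
s3 = boto3.client(
    's3',
    endpoint_url='https://s3.hcpcs.example.com',
    aws_access_key_id='ACCESS_KEY',
    aws_secret_access_key='SECRET_KEY',
)

# Store an object, then read it back.
s3.put_object(Bucket='mybucket', Key='docs/report.txt', Body=b'hello')
body = s3.get_object(Bucket='mybucket', Key='docs/report.txt')['Body'].read()

# Strongly consistent listing: the new key is visible right away.
keys = [o['Key'] for o in s3.list_objects_v2(Bucket='mybucket')['Contents']]
assert 'docs/report.txt' in keys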

Data access control

HCP for cloud scale uses ownership and access control lists (ACLs) as data access control mechanisms for the S3 API.

Ownership is implemented as follows:

  • An HCP for cloud scale bucket is owned by the user who creates the bucket and the owner cannot be changed.
  • A user has full control of the buckets that user owns.
  • A user has full control of the objects that user creates.
  • A user can list only the buckets that user owns.

ACLs allow the assignment of privileges (read, write, or full control) over buckets and objects to user accounts other than the owner's.
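As a sketch of how an owner might grant access with boto3 (the canonical user IDs are placeholders; the grant headers follow the id= form described under PUT Bucket ACL later in this document):

import boto3
s3 = boto3.client('s3', endpoint_url='https://s3.hcpcs.example.com',  # placeholders
                  aws_access_key_id='ACCESS_KEY', aws_secret_access_key='SECRET_KEY')

# Replace the bucket ACL: keep full control for the owner and grant
# read access to one other account by canonical user ID.
s3.put_bucket_acl(
    Bucket='mybucket',
    GrantFullControl='id="OWNER_CANONICAL_ID"',
    GrantRead='id="OTHER_USER_CANONICAL_ID"',
)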

Data security

HCP for cloud scale supports encryption of data sent between systems (that is, data "in flight") and, as a licensed feature, data stored persistently within the system (that is, data "at rest").

Certificate management

HCP for cloud scale uses Secure Sockets Layer (SSL) to provide security for both incoming and outgoing communications. To enable SSL security, two types of certificates are needed:

  • System certificate: the certificate that HCP for cloud scale uses for its GUIs and APIs (for incoming communications)
  • Client certificates: the certificates of IdPs, storage components, and SMTP servers (for outgoing communications)

When the HCP for cloud scale system is installed, it generates and automatically installs a self-signed SSL server system certificate. This certificate is not automatically trusted by web browsers. You can choose to trust this self-signed certificate or replace it by using one of these options:

  1. Upload a PKCS12 certificate chain and password and apply it as the active system certificate.
  2. Download a certificate signing request (CSR) and then use it to obtain, upload, and apply a certificate signed by a certificate authority (CA).
  3. Generate a new self-signed certificate and apply it as the active system certificate.

For client certificates, upload the certificate of each client that HCP for cloud scale needs to access using SSL.

You can manage certificates, and view details of installed certificates, using the System Management application.

Data-in-flight encryption

HCP for cloud scale supports data-in-flight encryption for the HTTPS protocol for all external communications. Data-in-flight encryption is always enabled for these data paths:

  • S3 API (HTTP is also enabled on a different port)
  • Management API
  • System Management application graphical user interface (GUI)
  • Object Storage Management application GUI

You can enable or disable data-in-flight encryption for the data paths between HCP for cloud scale and:

  • An identity provider (IdP) server
  • Each application using TLS or SSL
  • Each managed storage component
  • Each SMTP server using SSL or STARTTLS

Communication between HCP for cloud scale instances does not use data-in-flight encryption. Depending on your security needs, you might need to set up an isolated internal network for HCP for cloud scale at the site.

Data-at-rest encryption

HCP for cloud scale stores these kinds of data persistently:

  • HCP for cloud scale services data
  • HCP for cloud scale metadata and user-defined metadata
  • User data (object data)

The first two kinds of data are stored on the disks of the instances where HCP for cloud scale is installed. If needed, you can install HCP for cloud scale on servers with encrypted disks.

Object data is handled by storage components. HCP for cloud scale supports system-wide encryption, using AWS SDK client-side encryption and strong encryption ciphers, as a licensed feature. Encryption and decryption are transparent to users. Each storage component has a separate master key. Storage components that use hardware acceleration for encryption and decryption are supported.

To manage encryption master keys, HCP for cloud scale supports the HashiCorp Vault key management system (KMS) through a KMS client that is automatically deployed as a service when encryption is enabled. After you set up a Vault server, you can enable encryption support on HCP for cloud scale as a global setting and then manage the encryption client service as needed. For information about Vault and how to set up a server, see https://www.hashicorp.com/products/vault.

Caution: Once enabled, encryption can't be disabled.

As an alternative, you can use individual storage components that support data-at-rest encryption. Storage components can self-manage their keys, or HCP for cloud scale can facilitate the use of keys you supply following the Amazon S3 API specification.

Bucket synchronization

Bucket synchronization to a bucket (bucket sync-to, or mirroring) allows automatic, asynchronous copying of objects in a bucket in an HCP for cloud scale system to an external storage system. Bucket synchronization from a bucket (bucket sync-from, or mirroring back) allows automatic, asynchronous copying of objects in a bucket in an external storage system to an HCP for cloud scale bucket. Objects larger than 5 GB are synchronized (both sync-to and sync-from) using multipart uploads.

An external storage system can be another HCP for cloud scale system, AWS, or any S3 compatible system.

Bucket sync-to offers the following advantages:

  • Data protection: Data is well protected against the unavailability or catastrophic failure of a system. A bucket can be synchronized to a remote system of a different type. This arrangement can provide geographically distributed data protection (called geo-protection).
    Note: All rules must share a single, common destination bucket. If more than one destination appears in the collection of rules, the policy is rejected.
  • Data availability: AWS services can access synchronized data directly from AWS.

Bucket sync-from offers the following advantages:

  • Data consolidation: Transformed data can be stored on an HCP for cloud scale system. An HCP for cloud scale system can synchronize data from multiple remote systems of different types.
  • External update: Data can be updated directly in an external system and stored on an HCP for cloud scale system.

Access to bucket synchronization is controlled on a per-user basis by role-based access control (RBAC). Use the System Management application to define users, groups, and roles.

Access to an external resource might need an SSL certificate. You can upload an SSL certificate using the System Management application, the same as for uploading SSL certificates for storage components and IdPs.

For information on configuring bucket synchronizations, see the S3 Console Guide.
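Bucket synchronization settings ride on the S3 replication API (see PUT Bucket Replication later in this document). The following boto3 sketch is illustrative only; HCP for cloud scale replaces the bucket Amazon Resource Name (ARN) with configuration settings, so consult the S3 Console Guide for the exact rule format:

import boto3
s3 = boto3.client('s3', endpoint_url='https://s3.hcpcs.example.com',  # placeholders
                  aws_access_key_id='ACCESS_KEY', aws_secret_access_key='SECRET_KEY')

# Hypothetical sync-to rule: copy new objects under invoices/ to a single
# destination bucket. All rules in a policy must share one destination.
s3.put_bucket_replication(
    Bucket='mybucket',
    ReplicationConfiguration={
        'Role': '',  # placeholder; not used the same way as in AWS
        'Rules': [{
            'ID': 'sync-invoices',
            'Status': 'Enabled',
            'Prefix': 'invoices/',
            'Destination': {'Bucket': 'arn:aws:s3:::backup-bucket'},  # placeholder
        }],
    },
)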

Object locking

HCP for cloud scale supports object locking, which prevents specified objects from being deleted. A bucket owner can lock or unlock objects or lock them for a specified time period. This feature implements legal hold and retention period requirements.

Object locking is enabled at the bucket level, either when or after a bucket is created. Once enabled, object locking can't be disabled.

Object locking offers the following advantages:

  • Locked objects can't be deleted. This implements write once, read many (WORM) behavior, which protects objects from accidental or malicious changes.
  • A bucket owner can lock objects until a specified date and time. This implements retention periods, which support compliance with record retention policies. The retention period can be up to 100 years in the future.
    Note: Once set, a retention period can be extended, but not shortened or turned off.
  • A bucket owner can lock an object indefinitely, and then turn the lock off. This complies with legal hold requirements. If a legal hold is placed on an object, it can't be modified, versioned, moved, or deleted, even if it has an expired retention period (that is, a legal hold overrides a retention period). A legal hold never expires; it must instead be removed. An object can have multiple legal holds placed on it.

HCP for cloud scale implements compliance mode as described by the Amazon S3 specification. It does not support governance mode.

Note: Using S3 PUT Object Lock methods in HCP for cloud scale v1.4 and earlier is not supported. Using the methods might return an HTTP status code of 200 but will not produce the expected behavior. Only use S3 object lock methods after updating to v1.5 or later.

For information on how to lock and unlock objects, see the S3 Console Guide.
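For example, with boto3 a bucket owner can place a legal hold on an object or set a compliance-mode retention date (a minimal sketch; the bucket, key, and date are placeholders):

import datetime
import boto3

s3 = boto3.client('s3', endpoint_url='https://s3.hcpcs.example.com',  # placeholders
                  aws_access_key_id='ACCESS_KEY', aws_secret_access_key='SECRET_KEY')

# Place a legal hold; setting Status to OFF later releases it.
s3.put_object_legal_hold(Bucket='records', Key='case-001.pdf',
                         LegalHold={'Status': 'ON'})

# Set a compliance-mode retention date; it can later be extended but
# never shortened or removed.
s3.put_object_retention(
    Bucket='records', Key='case-001.pdf',
    Retention={'Mode': 'COMPLIANCE',
               'RetainUntilDate': datetime.datetime(2030, 1, 1)},
)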

Capacity monitoring

You can monitor estimated available system-wide or per-storage component capacity.

The Storage page in the Object Storage Management application displays the total, used, and estimated available (free) capacity for the HCP S Series Node storage components configured in the system, as well as the changes over the past week. If all the storage components are HCP S Series Nodes, the page displays the total, used, and estimated available capacity of the entire system. You can set capacity thresholds that trigger a visual alarm, and can also send an alert message or email notification, if the capacity of an HCP S Series Node storage component, or of the system as a whole, reaches a threshold. This page provides a single monitoring point for the entire HCP for cloud scale system, and the information displayed helps you with capacity planning.

[Screenshot: capacity monitoring for storage components, showing cards for system total, used, and free capacity; a used-versus-free bar for each HCP S Series Node storage component, displayed in red when over the warning threshold; and a graph of the number of active objects during the past two hours.]

The Metadata Gateway service periodically gathers storage component capacity metrics. If the used capacity for a single HCP S Series Node storage component or the entire system of HCP S Series Node storage components rises above a specified level, the system displays an alert. You can configure the alert threshold.

The calculation of used capacity includes:

  • HCP S Series Node storage components configured for capacity monitoring
  • Storage components set to read-only status
  • Storage components that are inactive

The calculation of available system capacity does not include:

  • HCP S Series Node storage components not configured for capacity monitoring
  • Storage components other than HCP S Series Node storage components
  • Storage components set to read-only status
  • Storage components that are inactive
Note: Metrics for capacity usage are for Metadata Gateway instances only, so adding used capacity to estimated available capacity will not equal the total capacity on the system. Also, multiple services run on a system instance, and all of them share the disk capacity. Therefore, the available capacity for the Metadata Gateway service on one node can be consumed by a different service running on the same node.

Using capacity information, you can be alerted and take action if a storage component is reaching capacity. You can determine if the system can support an increase in stored data (for example, as expected from a new customer). You can understand the balance of usage and capacity across storage components. You can plan for the orderly addition of additional capacity.

Chargeback reports

Chargeback reports detail how system storage capacity is used, per user or bucket.

HCP for cloud scale provides storage usage reports for objects on the system. Authorized users can generate a report for one or more of the buckets they own. Authorized administrators can generate a report for a user or a list of one or more buckets. Reports can detail hourly, daily, or monthly usage.

Chargeback reports let you create invoices or bills for bucket owners, or delegate that task to others.

How usage is calculated

Metrics for bucket size and number of objects are stored persistently. Storage usage is calculated at the granularity of byte-hours and can be reported by hour, day, or month.

For example, if a user stores 100 GB (107374182400 bytes) of standard storage data in a bucket for the first 15 days in March, and 100 TB (109951162777600 bytes) of standard storage data for the final 16 days in March, the usage is 42259901212262400 byte-hours. The calculation is as follows:

First calculate the total byte-hour usage:

[107374182400 bytes × 15 days × (24 hours/day)] + 
 [109951162777600 bytes × 16 days × (24 hours/day)] = 
42259901212262400 byte-hours

Then convert byte-hours to GB-months:

42259901212262400 byte-hours ÷
  (1073741824 bytes/GB) ÷
  (24 hours/day) ÷
  (31 days in March) =
52900 GB-months
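
The same arithmetic in a few lines of Python (GB and TB here are binary units, matching the byte counts above):

GB = 1024 ** 3
TB = 1024 ** 4
HOURS_PER_DAY = 24

# 100 GB for the first 15 days of March, 100 TB for the final 16 days.
byte_hours = (100 * GB * 15 * HOURS_PER_DAY) + (100 * TB * 16 * HOURS_PER_DAY)
assert byte_hours == 42259901212262400

# Convert byte-hours to GB-months (March has 31 days).
gb_months = byte_hours / GB / HOURS_PER_DAY / 31
assert gb_months == 52900.0
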
Usage reports

Storage capacity usage is reported in either a user report or a system report.

  • The user report gives storage usage for any or all buckets defined in the system that the user owns.
  • The system report gives storage usage for any or all buckets defined in the system.

Within each report you can specify which fields appear.

S3 Select

HCP for cloud scale supports the S3 Select feature.

HCP for cloud scale supports the S3 Select Object Content method, which allows retrieval of a portion of a structured object by an S3 client such as Apache Spark, Apache Hive, or Presto. The portion of the object returned is selected based on a structured query language (SQL) query sent in the request. The query is performed by S3 storage components that support pushdown. Selecting only the data needed within an object can significantly reduce cost and retrieval time and improve performance.

A request can select serialized object data in these formats:

  • Apache Parquet

A request can return data in these formats:

  • Comma-separated values (CSV)

The client application must have the permission s3:GetObject. S3 Select supports reading encrypted data. The SQL expression can be up to 256 KB, and can return up to 1 MB of data.

Here is a simple example of an SQL query against a Parquet object. The query returns data for salaries greater than 100000:

select salary from s3object s where s.salary > 100000
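The same query sent through the boto3 Select Object Content method might look like this (a sketch; the endpoint, bucket, and key are placeholders, and only Parquet input and CSV output serialization are supported):

import boto3
s3 = boto3.client('s3', endpoint_url='https://s3.hcpcs.example.com',  # placeholders
                  aws_access_key_id='ACCESS_KEY', aws_secret_access_key='SECRET_KEY')

resp = s3.select_object_content(
    Bucket='hr-data', Key='salaries.parquet',
    ExpressionType='SQL',
    Expression='select salary from s3object s where s.salary > 100000',
    InputSerialization={'Parquet': {}},  # only Parquet input is supported
    OutputSerialization={'CSV': {}},     # only CSV output is supported
)

# The response is an event stream; Records events carry the CSV payload.
for event in resp['Payload']:
    if 'Records' in event:
        print(event['Records']['Payload'].decode())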

S3 event notification

HCP for cloud scale supports the S3 PUT Bucket notification configuration and GET Bucket notification configuration methods.

HCP for cloud scale can send notifications of specified events in a bucket to a message server for applications to consume. This is a more efficient way to signal changes than periodically scanning objects in a bucket.

HCP for cloud scale supports event notification to signal specified events in buckets. Notifications can be sent to AWS SQS Standard services, Kafka, or RabbitMQ. A retry mechanism assures highly reliable notifications.
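A hedged boto3 sketch of configuring notification on a bucket; the rule ID, target ARN, and event list are placeholders (see the S3 Console Guide for the exact configuration your message server needs):

import boto3
s3 = boto3.client('s3', endpoint_url='https://s3.hcpcs.example.com',  # placeholders
                  aws_access_key_id='ACCESS_KEY', aws_secret_access_key='SECRET_KEY')

# Hypothetical rule: send a message for every object created in the bucket.
s3.put_bucket_notification_configuration(
    Bucket='mybucket',
    NotificationConfiguration={
        'QueueConfigurations': [{
            'Id': 'notify-created',
            'QueueArn': 'arn:aws:sqs:region:account:my-queue',  # placeholder
            'Events': ['s3:ObjectCreated:*'],
        }],
    },
)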

Supported limits

HCP for cloud scale limits the number of instances (nodes) in a system to 160.

HCP for cloud scale does not limit the number of the following entities:

  • Buckets: minimum none; maximum unlimited. A user can own up to 1000 buckets.
  • Users (external): minimum none; maximum unlimited. The local user has access to all functions, including MAPI and S3 API methods. However, it's best to configure HCP for cloud scale with an identity provider (IdP) with users to enforce role-based access control.
  • Groups (external): maximum unlimited.
  • Roles: maximum unlimited.
  • Objects: minimum none; maximum unlimited. The size limit for an object is 5 TB.
  • Storage components: minimum 1; maximum unlimited.

High availability

HCP for cloud scale supports high availability for multi-instance sites.

High availability needs at least four instances: three master instances, which run essential services, and at least one worker instance. The best practice is to run the three master instances on separate physical hardware (or, if running on virtual machines, on at least three separate physical hosts) and to run HCP for cloud scale services on more than one instance.

Scalability of instances, service instances, and storage components

You can increase or decrease the capacity, performance, and availability of HCP for cloud scale by adding or removing the following:

  • Instances: physical computer nodes or virtual machines
  • Service instances: copies of services running on additional instances
  • Storage components: S3 compatible systems used to store object data

In a multi-instance site, you might add instances to improve system performance or if you are running out of storage space on one or more instances. You might remove instances if you are retiring hardware, if an instance is down and cannot be recovered, or if you decide to run fewer instances.

When you add an instance, you can also scale floating services (such as the S3 Gateway) to the new instance. When you scale a floating service, HCP for cloud scale automatically rebalances itself.

In a multi-instance site, you can manually change where a service instance runs:

  • You can configure it to run on additional instances. For example, you can increase the number of S3 Gateway service instances to improve throughput of S3 API transactions.
  • You can configure it to run on fewer instances. For example, you can free computational resources on an instance to run other services.
  • You can configure it to run on different instances. For example, you can move the service instances off a hardware instance to retire the hardware.
  • For a floating service, instead of specifying a specific instance on which it runs, you can specify a pool of eligible instances, any of which can run the service.

Some services have a fixed number of instances and therefore cannot be scaled:

  • Metadata Coordination

You might add storage components to a site under these circumstances:

  • The existing storage components are running out of available capacity
  • The existing storage components do not provide the performance you need
  • The existing storage components do not provide the functionality you need

Site availability

An HCP for cloud scale site has three master instances and thus can tolerate the failure of one master instance without interruption of service.

If a site with only two healthy master instances experiences an outage of another master instance (for example, if services restart or the entire instance or operating system restarts), it goes into a degraded state until all three master instances are restored.

Service availability

HCP for cloud scale services provide high availability as follows:

  • The Metadata Gateway service always has at least three service instances. When the system starts, the nodes "elect a leader" using the Raft consensus algorithm. The other service instances follow the leader. The leader processes all GET and PUT requests. If the followers cannot identify the leader, they elect a new leader. The Metadata Gateway service tolerates the failure of one service instance without interruption. If more than one service instance is unavailable, some data can become unavailable until the instance recovers.
  • The Metadata Coordination service always has one service instance. If that instance fails, HCP for cloud scale automatically starts another instance. Until startup is complete, the Metadata Gateway service cannot scale.
  • The Metadata Cache service always has one service instance. If that instance fails, HCP for cloud scale automatically starts another instance. Until startup is complete, overall performance decreases.
  • To protect messaging consistency, the Message Queue service always has three service instances. To prevent being split into disconnected parts, the service shuts down if half of the service instances fail. In practice, messaging stops if two of the three instances fail. Do not let the service run with only two instances, because in that scenario if one of the remaining instances fails, the service shuts down. However, when one of the failed instances restarts, messaging services recover and resume.
  • To maintain access to the encryption key vault, the Key Management Server service uses an active-standby model. One service instance is the active instance and any other service instances are kept as standbys. If the active vault node becomes sealed or unavailable, one of the standbys takes over as active. You can scale up to the number of instances in the HCP for cloud scale system or your acceptable performance limits.

The rest of the HCP for cloud scale services remain available if HCP for cloud scale instances or service instances fail, as long as at least one service instance remains healthy. Even if a service that has only one service instance fails, HCP for cloud scale automatically starts a new service instance.

Metadata availability

Metadata is available as long as these services are available:

  • S3 Gateway
  • Metadata Gateway

Object data availability

Object data is available as long as these items are available:

  • The S3 Gateway service (at least one instance)
  • The storage component containing the requested object data
  • At least two functioning Metadata Gateway service instances (of the required three)

For high availability of object data or data protection, you should use a storage component with high availability, such as HCP, HCP S Series Node, or AWS S3.

Network availability

You can install each HCP for cloud scale instance with both an internal and an external network interface. To avoid single points of networking failure, you can:

  • Configure two external network interfaces in each HCP for cloud scale instance
  • Use two switches and connect each network interface to one of them
  • Bind the two network interfaces into one virtual network interface in an active-passive configuration
  • Install HCP for cloud scale using the virtual network interface

Failure recovery

HCP for cloud scale actively monitors the health and performance of the system and its resources, gives real-time visual health representations, issues alert messages when needed, and automatically takes action to recover from the failure of:

  • Instances (nodes)
  • Product services (software processes)
  • System services (software processes)
  • Storage components

Instance failure recovery

If an instance (a compute node) fails, HCP for cloud scale automatically adds new service instances to other available instances (compute nodes) to maintain the minimum number of service instances. Data on the failed instance is not lost and remains consistent. However, while the instance is down, data redundancy might degrade.

HCP for cloud scale adds new service instances automatically only for floating services. Depending on the remaining number of instances and service instances running, you might need to add new service instances or deploy a new instance.

Service failure recovery

HCP for cloud scale monitors service instances and automatically restarts them if they are not healthy.

For floating services, you can configure a pool of eligible HCP for cloud scale instances and the number of service instances that should be running at any time. You can also set the minimum and maximum number of instances running each service. If a service instance failure causes the number of service instances to go below the minimum, HCP for cloud scale starts another service instance on one of the HCP for cloud scale instances in the pool that doesn't already have that service instance running.

Persistent services run on the specific instances that you specify. If a persistent service fails, HCP for cloud scale restarts the service instance in the same HCP for cloud scale instance. HCP for cloud scale does not automatically bring up a new service instance on a different HCP for cloud scale instance.

Storage component failure recovery

HCP for cloud scale performs regular, periodic health verifications to detect storage component failures.

If HCP for cloud scale detects a storage component failure, it sets the storage component state to INACCESSIBLE, so that HCP for cloud scale will not try to write new objects to the storage component, and sends an alert. While a storage component is unavailable, the data in it is not accessible.

HCP for cloud scale continues to verify a failed storage component and, when it detects that the storage component is healthy again, automatically sets its state to ACTIVE. HCP for cloud scale sends an alert when this event happens as well. After the storage component is repaired and brought back online, the data it contains is again accessible and HCP for cloud scale can write new objects to it.

Support for the Amazon S3 API

HCP for cloud scale is compatible with the Amazon Simple Storage Service (Amazon S3) REST API, which lets clients store objects in buckets. A bucket is a container of objects that has its own settings, such as ownership and lifecycle. Using HCP for cloud scale, users can perform common reads and writes on objects and buckets and manage ACL settings through the client access data service.

For information about using Amazon S3, see the Amazon S3 API documentation.

For information about obtaining S3 user credentials, see the S3 Console Guide.

The following tables list the supported Amazon S3 API features and describe any implementation differences between the Amazon and HCP for cloud scale S3 APIs.

Authentication and addressing
Authentication with AWS Signature Version 4: Fully implemented.
Addressing, virtual host style (such as http://bucket.server/object): Fully implemented.
Addressing, path style (such as http://server/bucket/object): Fully implemented.
Signed/unsigned payload: Fully implemented.
Chunked request: Fully implemented.
Presigned URL: Fully implemented.
Service
LIST buckets (GET Service): Fully implemented.
Buckets
GET Bucket (list objects) V1: Fully implemented.
GET Bucket (list objects) V2: Fully implemented.
PUT Bucket

To support legacy S3 buckets, HCP for cloud scale supports bucket names of less than three characters.

When anonymous requests to create or remove a bucket use a bucket name that isn't valid, Amazon S3 verifies access first and returns 403. HCP for cloud scale returns 400 if the bucket name validation fails.

DELETE Bucket
HEAD Bucket
PUT Bucket ACL: In Amazon S3, each grantee is specified as a type-value pair, where the type is one of the following:
  • emailAddress if the value specified is the email address of an AWS account
  • id if the value specified is the canonical user ID of an AWS account
  • uri if granting permission to a predefined group
HCP for cloud scale does not support emailAddress. HCP for cloud scale fully supports id. HCP for cloud scale supports uri for the predefined groups Authenticated Users and All Users.

HCP for cloud scale does not support the Amazon S3 predefined grant ("canned ACL") aws-exec-read.

HCP for cloud scale supports the canned ACL authenticated-read-write.

HCP for cloud scale does not mirror or mirror back ACLs or policies.

GET Bucket ACL
List Multipart Uploads: Fully implemented.
GET Bucket Lifecycle (except transition action)

HCP for cloud scale supports the latest API for bucket lifecycle management. Old and deprecated V1.0 methods are not supported.

HCP for cloud scale does not support Object Transition actions. Including these actions causes a Malformed XML exception.

PUT Bucket Lifecycle (except transition action)
DELETE Bucket Lifecycle (except transition action)
PUT Bucket Notification Configuration: A configuration can have up to 100 rules.

Amazon S3 considers that two rules overlap if both apply to the same object and share at least one event type. HCP for cloud scale supports notification from the same object to multiple targets. However, rules are blocked if they send a message for the same event to the same target.

All notification message fields are returned except Region and Glacier Storage. The field awsRegion is returned but left empty.

PUT Bucket Replication: HCP for cloud scale supports one-to-many mirroring and many-to-one mirroring back. The bucket Amazon Resource Name (ARN) is replaced by configuration settings.

For mirroring back, HCP for cloud scale supports one queue server, AMAZON_SQS.

Sending encrypted data to a remote bucket is not supported.

GET Bucket Versioning: Returns the bucket versioning configuration and status (always on).
GET Bucket Object Versions: Version listing requests do not strictly comply with the documented behavior for NextKeyMarker/NextVersionIdMarker. Amazon S3 documentation currently states that these values specify "the first key not returned that satisfies the search criteria." However, HCP for cloud scale specifies the last key returned in the current response. V1 object listings do not state as specific a requirement, and V2 object listings use a continuation token that is opaque to the caller. Internally, HCP for cloud scale shares the same listing logic across all listing types.
GET Bucket Location: You must be the bucket owner.
GET Bucket Notification Configuration: Fully implemented.
Object
Note: The characters null (NUL) and backslash (\) are not supported in object keys for S3 operations. If you include those characters in an operation, it fails with the error 400 (Bad Request). Don't use null or backslash characters in object keys.
GET Object: If a lifecycle policy is configured for a bucket, HCP for cloud scale displays the expiration date of an object (in the x-amz-expiration header) fetched using the subresource ?versionId.

Legal hold is fully implemented.

Object retention is fully implemented.

Object names cannot contain NUL or backslash (\) characters. GET methods on objects so named fail with a 400 error.

HEAD Object: If a lifecycle policy is configured for a bucket, HCP for cloud scale displays the expiration date of an object (in the x-amz-expiration header) fetched using the subresource ?versionId.
PUT Object

In HCP for cloud scale, the maximum file size for a single PUT object call is configurable through the management API. The maximum and default is 5 GB.

Amazon S3 is liberal in what is accepted for the Content-Type of an object. HCP for cloud scale adds additional content-type validation.

Bucket synchronization is supported.

Legal hold is fully implemented. AWS object lock permissions are not supported; that is, a bucket owner can set a legal hold without restriction.

Object retention is implemented, but not governance mode; that is, once a retain-until date is set, it can be extended but not removed. AWS object lock permissions are not supported; that is, a bucket owner can set object retention without restriction.

Object locking can be applied to a bucket even after it's created. To enable object locking, in the S3 API PUT Bucket ObjectLockConfiguration, include the URI request parameter x-amz-bucket-object-lock-token (with any string).

Object names cannot contain NUL or backslash (\) characters. PUT methods on objects so named fail with a 400 error.

PUT Object (Copy): Conditional headers are not supported. Server-side encryption is not supported.
PUT Object (Part Copy): Conditional headers are not supported. Server-side encryption is not supported.
Object and version encoding: Amazon S3 object and version listing documentation mentions the ability to pass an encoding parameter so that object names in the response XML sent to the client can be escaped when they contain characters that are not valid in XML. Encoding is documented only as it applies to object names, not Owner/DisplayNames. Also, escaping for Bucket Listing requests is not mentioned.

The Owner/DisplayName is a concern because user display names might contain characters that can cause XML parsing issues. Amazon S3 does not currently return a display name for all regions. HCP for cloud scale uses IdPs and thus does not control which characters a display name can contain.

Bucket name restrictions should prevent problematic bucket names from being created. For security, HCP for cloud scale passes the user display name through a URI encoder before returning the name in an XML response.

Object tagging: Amazon S3 wraps ETags in double quotes. For XML listings (V1 object, V2 object, version), double quotes are escaped. For example:

<ETag>&quot;32c81604d07395b1aa39a7e206c3af06&quot;</ETag>

HCP for cloud scale does not do this because only attributes, not double quotes, need to be escaped within content.

Expiration date URL encoding (x-amz-expiration header)

HCP for cloud scale URL-encodes the RuleID portion of the header x-amz-expiration using the same encoding strategy that Amazon suggests for V4 authentication. This strategy can result in encoded strings that do not exactly match how Amazon encodes RuleIDs. However, decoding them should always return the original strings.

HCP for cloud scale synchronizes (mirrors and mirrors back) existing object tags. Subsequent PUT Object tagging requests are not synchronized, so tags added to, removed from, or updated on a synchronized object are not synchronized. To ensure that tags are processed as expected, it's best to set them when creating objects rather than creating objects and then setting their tags.
GET Object ACL

Bucket synchronization is not supported.

In Amazon S3, each grantee is specified as a type-value pair, where the type is one of the following:

  • emailAddress if the value specified is the email address of an AWS account
  • id if the value specified is the canonical user ID of an AWS account
  • uri if granting permission to a predefined group

HCP for cloud scale does not support emailAddress. HCP for cloud scale fully supports id. HCP for cloud scale supports uri for the predefined groups Authenticated Users and All Users.

HCP for cloud scale does not support the aws-exec-read canned ACL.

PUT Object ACL
DELETE Object

Bucket synchronization or removal of an object or a specific version of an object is not supported.

To improve performance, if the current version of an object is a delete marker, HCP for cloud scale does not create another delete marker.

DELETE Multiple Objects

Fully implemented.

Bucket synchronization is not supported.

POST Object

Fully implemented.

In HCP for cloud scale, the maximum file size for a single POST object call is configurable through the management API. The maximum and default is 5 GB.

Bucket synchronization is supported.

POST Select Object Content

Scan range is supported.

HCP for cloud scale supports the use of * by itself with no alias reference. For example, this SQL query is supported:

select *, first_name from s3object s where s.salary > 100000 limit 10

HCP for cloud scale supports a wider range of date-time formats than AWS. The full list is available at https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html.

HCP for cloud scale supports nested aggregate functions. For example, this expression is supported: count(sum(s.salary))

HCP for cloud scale SQL queries on columns are case sensitive, while AWS SQL queries are case insensitive. For example, given an object s with the columns ID, iD, and id, an SQL query to select s.id will return column id in HCP for cloud scale but column ID in AWS.

Only input serialization of Parquet is supported. Requests for CSV or JSON objects are not supported and return an error.

Parquet compression is managed automatically, so the CompressionType argument is not needed, and if specified returns an error.

Only CSV output is supported. Specifying another output format returns an error.

AWS calculates the size of a record returned in an S3 Select query as the total size of the record, including any delimiters. HCP for cloud scale calculates the size as the total data of each column returned. These calculations can sometimes differ slightly.

Initiate/Complete/Abort Multipart Upload

Fully implemented.

Upload Part: Fully implemented.
List Multipart Uploads: Fully implemented.
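
As noted for PUT Object above, object locking can be enabled on an existing bucket by supplying the x-amz-bucket-object-lock-token request parameter with any string. In boto3 that parameter is Token (a sketch; the bucket name and token string are placeholders):

import boto3
s3 = boto3.client('s3', endpoint_url='https://s3.hcpcs.example.com',  # placeholders
                  aws_access_key_id='ACCESS_KEY', aws_secret_access_key='SECRET_KEY')

# Token maps to the x-amz-bucket-object-lock-token header; any string works.
s3.put_object_lock_configuration(
    Bucket='records',
    ObjectLockConfiguration={'ObjectLockEnabled': 'Enabled'},
    Token='enable-lock',
)
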
Unsupported S3 API methods

HCP for cloud scale does not support the following Amazon S3 API methods:

Authentication

  • Authentication v2 (deprecated by AWS)

Bucket

  • GET/PUT/DELETE Bucket website
  • GET/PUT/DELETE Bucket policy
  • GET/PUT/DELETE Bucket tagging
  • GET/PUT/DELETE Bucket CORS (cross-origin resource sharing)
  • PUT Bucket versioning (with HCP for cloud scale, versioning is always on)
  • GET/PUT Bucket logging
  • GET Bucket notification
  • GET/PUT Bucket requestPayment
  • GET/PUT/DELETE Bucket Inventory
  • List Bucket Inventory Configurations
  • GET/DELETE Bucket metrics
  • List Bucket Metrics Configurations
  • GET/PUT/DELETE Bucket analytics
  • List Bucket Analytics Configurations
  • PUT/GET Bucket accelerate
  • Server-side encryption with customer-provided encryption keys (SSE-C)
  • Server-side encryption with storage-managed encryption keys (SSE-S3)

Object

  • OPTIONS object
  • GET/POST Object torrent

HCP for cloud scale APIs

The Hitachi Content Platform for cloud scale (HCP for cloud scale) system includes RESTful application programming interfaces (APIs) that you can use for writing applications that exercise its functions and manage the system.

Anything you can do in the Object Storage Management, S3 Console, or System Management application GUIs, you can also do using APIs.

Object Storage Management API

The Object Storage Management application includes a RESTful API for administrative functions such as managing storage components, configuring Amazon S3 settings, and obtaining or revoking S3 user credentials. For more information on the Object Storage Management API, see the MAPI Reference.

System Management API

The System Management application includes a RESTful API for system management functions such as system monitoring, service monitoring, user registration, and configuration. For more information on the System Management API, see the Swagger interface in the System Management application.

Amazon S3 API

Unless otherwise noted, HCP for cloud scale is fully compatible with the Amazon S3 API.

Object Storage Management API

The Object Storage Management application includes a RESTful API for the following functions:

  • Managing storage components and Amazon S3 settings
  • Managing administrative resources such as serial numbers and system events
  • Managing user resources such as S3 user credentials and OAuth tokens

The Object Storage Management API is served by the MAPI Gateway service from any HCP for cloud scale node.

You can execute all functions supported in the Object Storage Management application using the API.

Note: The system configuration, management, and monitoring functions included in the System Management application can be performed using the System Management API.

All URLs for the API have the following base, or root, uniform resource identifier (URI):

https://hcpcs_ip_address:9099/mapi/v1
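For example, the serial number methods described later in this document can be called with any HTTP client. Here is a sketch using Python's requests library; the address and OAuth token are placeholders:

import requests

BASE = 'https://hcpcs_ip_address:9099/mapi/v1'
headers = {'Authorization': 'Bearer OAUTH_TOKEN'}  # placeholder token

# Object Storage Management API methods are POST requests under the base URI.
# verify=False is needed only while the self-signed system certificate is in use.
resp = requests.post(f'{BASE}/serial_number/get', headers=headers, verify=False)
print(resp.json())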

System Management API

The System Management application provides a RESTful API for managing the following:

  • Alerts
  • Business objects
  • Certificates
  • Events
  • Instances
  • Jobs
  • Licenses
  • Notifications
  • Packages
  • Plugins
  • Security
  • Services
  • Setup
  • Tasks
  • Updates

You can execute all functions supported in the System Management application using the API.

Security and authentication

HCP for cloud scale controls access to system functions through user accounts, roles, permissions, and, where user accounts are stored in an external identity provider, by OAuth tokens. All browser pages that make up the system are protected and cannot be reached without authentication. Users who try to reach a system page without authentication are redirected to the login page.

HCP for cloud scale controls access to data by S3 API requests through S3 credentials, ownership, and access control lists. HCP for cloud scale supports in-flight encryption (HTTPS) for all external communications.

User accounts

The initial user account, which has all permissions and can perform all HCP for cloud scale functions, is created when you install HCP for cloud scale. After the initial user account is created, you can change its password at any time, but you cannot disable the account or change its permissions.

The initial user is the only local account allowed and is intended only to let you configure an identity provider (IdP). HCP for cloud scale can communicate with IdPs using HTTP or HTTPS. HCP for cloud scale supports multiple IdPs:

  • Active Directory
  • OpenLDAP
  • 389 Directory Server
  • LDAP compatible

HCP for cloud scale supports external users defined in the IdP. External users with the appropriate permissions can perform some or all of these functions:

  • Log in to the Object Storage Management application and use all functions
  • Log in to the System Management application and use all functions
  • Get an OAuth token to use all API calls for the Object Storage Management and System Management applications
  • Log in to the S3 Console application and get S3 credentials to use the S3 API

HCP for cloud scale discovers the groups in each IdP and allows assigning roles to groups.

HCP for cloud scale uses OAuth2 as a service provider to authenticate single sign-on (SSO) access. SSO lets you use one set of login credentials for all HCP for cloud scale applications, so you can switch between applications without logging in again.

API access

Object Storage Management application API methods need a valid OAuth access token for a user account with suitable permissions, or else the requests are rejected. With one exception, System Management application API methods also require a valid OAuth access token for a user account with suitable permissions, or else the requests are rejected. (The exception is the API method to generate an OAuth token, which requires only a username and password in the body of the request.)

Before using either the Object Storage Management or System Management APIs, you need to obtain an OAuth token. You can generate an OAuth token by sending a request to the OAuth server with your account credentials. Then you can supply the OAuth token in the Authorization header in each request. OAuth tokens are valid for five hours.
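A hedged sketch of that workflow with Python's requests library; the token endpoint path, request fields, and response field name are assumptions here, so check the Swagger interface in the System Management application for the exact request format:

import requests

# Hypothetical token endpoint and body; consult the Swagger interface for
# the real path and fields.
resp = requests.post(
    'https://system_address:8000/auth/token',            # placeholder path
    data={'username': 'admin', 'password': 'PASSWORD'},  # placeholder credentials
    verify=False,  # only while the self-signed system certificate is in use
)
token = resp.json()['access_token']  # response field name assumed

# Supply the token in the Authorization header of each API request.
headers = {'Authorization': f'Bearer {token}'}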

Note: An administrator can revoke all OAuth tokens for any other HCP for cloud scale user. You would do this, for example, if an employee leaves the company and you delete the user account but do not want to wait for the account's tokens to expire.

S3 API requests generally require valid S3 credentials for users with the right privileges, as granted through access control lists (ACLs). (Exceptions are methods configured to allow anonymous access and pre-signed requests.) HCP for cloud scale supports AWS Signature Version 4 authentication to include S3 credentials in S3 requests.
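For example, a user with valid S3 credentials can create a pre-signed URL that lets someone without credentials fetch a single object until the URL expires (a boto3 sketch; names are placeholders):

import boto3
s3 = boto3.client('s3', endpoint_url='https://s3.hcpcs.example.com',  # placeholders
                  aws_access_key_id='ACCESS_KEY', aws_secret_access_key='SECRET_KEY')

# The URL embeds a Signature Version 4 signature and expires in one hour.
url = s3.generate_presigned_url(
    'get_object',
    Params={'Bucket': 'mybucket', 'Key': 'docs/report.txt'},
    ExpiresIn=3600,
)
print(url)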

Users with a valid account and suitable permissions can generate S3 credentials. You can generate an unlimited number of S3 credentials, but only the last credentials generated are valid. These credentials are associated only with your account. S3 credentials do not have an expiration date, so they are valid until revoked.

Users with a valid account and suitable permissions can revoke all S3 credentials of any user. That is, you can revoke your own S3 credentials or the S3 credentials of any other user. Revocation removes all S3 credentials associated with the account.

Note: Deleting a user account from the IdP does not revoke the user's S3 credentials, and if a user's S3 credentials are revoked, the user can still generate new credentials. The best practice is to delete the user account from the IdP and then revoke the S3 credentials.

Network isolation and port mapping

When you install HCP for cloud scale, you can set up network isolation by configuring one external network and one internal network.

HCP for cloud scale software creates a cluster using commodity x86 servers that are networked using Ethernet. The software uses two networks on the operating system hosting the HCP for cloud scale software. These networks can also use link aggregation defined by the OS administrator.

While two networks provide optimal traffic isolation, you can deploy the software using a single network. The OS administrator must make and implement networking decisions before you install HCP for cloud scale.

HCP for cloud scale services use a range of network ports. You can configure services to use different ports instead of the default ports. Installation is the only opportunity to change the default ports used by services.

Note: The following services must be deployed with their default port values:
  • Message Queue
  • Tracing Agent
  • Tracing Collector
  • Tracing Query

For information about installing HCP for cloud scale, see Installing Hitachi Content Platform for Cloud Scale.

Logging in

HCP for cloud scale provides one locally defined administrative user account. Any other user accounts reside in a realm provided by external identity providers (IdPs). To log in you need this information:

  • The cluster hostname, instance, or IP address of the HCP for cloud scale system that you're using
  • Your user name as assigned by your system administrator
  • Your password as assigned by your system administrator
  • The realm where your user account is defined

Procedure

  1. Open a web browser and go to https://system_address:8000

    system_address is the address of the HCP for cloud scale system that you're using.
  2. Type your username and password.

  3. In the Security Realm field, select the location where your user account is defined.

    To log in using the local administrator account, without using an external IdP, select Local. If no IdP is configured yet, Local is the only available option.
  4. Click LOGIN.

Results

The Applications page opens.
Note: When a new user is created and added to a group, that user might not have immediate access to HCP for cloud scale. Instead, login fails with the message "Not authorized. Please contact your system administrator." Verify the credentials. If the condition persists, the system administrator can use the API method security/clearCache to allow immediate login.

HCP for cloud scale applications

After you log in, the HCP for cloud scale Applications page shows you the applications you are authorized to use, such as:

  • Object Storage Management: Manage and monitor storage components, data objects, alerts, and regions
  • S3 Console: Generate S3 access and secret keys; conveniently create and manage buckets, bucket synchronization, and bucket policies; manage S3 event notification; and browse objects in buckets
  • System Management (sometimes referred to in the application as the Admin App): Manage and monitor cluster instances, software services, system security, user accounts, and other cluster configuration parameters

[Screenshot: the Applications page, showing links to the applications you can choose from: Object Storage Management, S3 Console, and System Management.]

From the Applications page, or from within each application, you can switch back and forth between applications as needed.

Switching between applications

HCP for cloud scale uses OAuth2 as a service provider to authenticate single sign-on (SSO) access. You only need one set of login credentials for all HCP for cloud scale applications, so you can switch between applications without logging in again.

Depending on the permissions assigned to your account role, you can have access to one or more HCP for cloud scale applications. To switch between applications:

Procedure

  1. Depending on the application you are currently using:

    • In the Object Storage Management application, click the app switcher menu (a nine-dot square) and select another application.
    • In the System Management application, click the Open menu (three horizontal lines) in the right corner of the top navigation bar and select another application.
      Note: The System Management application is also identified in the user interface as Admin App.
  2. Select the application you want to use.

    The application opens.

Providing unseal keys to KMS service

When encryption is enabled for an HCP for cloud scale system, the Key Management System service provides encryption keys for storage components. If the service restarts, the key repository vault closes and data objects can't be decrypted. If a vault instance becomes sealed, you must provide a quorum of unseal keys (three of the five provided when encryption was first enabled) to reopen the vault and resume encryption and decryption.

Caution: Don't try to initialize the vault manually outside of HCP for cloud scale. Doing so results in data loss.

Procedure

  1. From the Object Storage Management application, select Settings > Encryption.

    The ENCRYPTION page opens.
  2. In the UNSEAL VAULT INSTANCES section, enter the first unseal key into the Unseal key 1 field.

    The key is validated. You can't leave the field blank.
  3. Enter a second unseal key into the Unseal key 2 field.

    The key is validated. You can't leave the field blank. Each key must be different.
  4. Enter a third unseal key into the Unseal key 3 field.

    The key is validated. You can't leave the field blank. Each key must be different.
  5. Click Unseal vault.

Results

The vault is unsealed.

Serial number

You can use the Object Storage Management application or an API method to enter and display your HCP for cloud scale serial number.

A serial number is required to activate the HCP for cloud scale software. You must enter the serial number before you can use the system or its licensed features.

Entering your serial number

The Object Storage Management application displays the product serial number. An administrative account with appropriate permissions can enter or edit this number.

Object Storage Management application instructions

Procedure

  1. From the Object Storage Management application, select Settings > Serial number.

    The SERIAL NUMBER page opens.
  2. Enter your serial number into the Serial number field.

  3. Click Save.

Related REST API methods

POST /serial_number/set

For information about specific API methods, see the MAPI Reference or, in the Object Storage Management application, click the profile icon and select REST API.

Displaying your serial number

You can use the Object Storage Management application or an API method to display the product serial number.

Object Storage Management application instructions

The product serial number is displayed in the Object Storage Management application on the SERIAL NUMBER page.

Related REST API methods

POST /serial_number/get

For information about specific API methods, see the MAPI Reference or, in the Object Storage Management application, click the profile icon and select REST API.

License

You can use the Object Storage Management application or an API method to enter and validate your HCP for cloud scale license.

A license is required before you can activate certain HCP for cloud scale features. You must enter your serial number before you can upload your license.

Uploading your license

The Object Storage Management application displays your product licenses. An administrative account with appropriate permissions can upload a license file.

Object Storage Management application instructions

Procedure

  1. From the Object Storage Management application, select Settings > Licensing.

    The Licensing page opens.
  2. Click Upload license.

    The UPLOAD LICENSE page opens, displaying the Select file area.
  3. Do one of the following:

    • Drag and drop a license file into the Select file area.
    • Click Select file, select a license file, and then click Open.
    The license file is decrypted and validated and appears on the Licensing page.

Related REST API methods

POST /license/list

For information about specific API methods, see the MAPI Reference or, in the Object Storage Management application, click the profile icon and select REST API.

Enabling encryption

Encryption is an available licensed feature. You must obtain and upload a license to enable encryption.

An administrative account with appropriate permissions can enable encryption. All objects added to the system after encryption is enabled are encrypted on all storage components.

Note: Encryption is a global setting. Once enabled, you can't turn off encryption or decrypt either storage components or the objects stored on them.

Before you enable encryption, set up a Vault server. Enabling encryption generates encryption keys, an initial root token, and a set of five unseal keys. When establishing a connection to the Vault server, HCP for cloud scale provides the initial root token for authentication and root access.

Vault doesn't store the generated master key. Instead, each time the Vault server starts, it uses the unseal keys to regenerate the master key, which is then used to return storage component encryption keys. If the Vault server goes down, it seals the vault, and to regenerate the master key you must provide a quorum of at least three valid unseal keys.

Caution: If you don't provide a quorum of unseal keys to reconstruct the master key, Vault remains sealed, so the master key is unavailable and encrypted storage components can't be decrypted. To ensure encryption security, the best practice is to encrypt and store unseal keys separately.

You can enable encryption using the Object Storage Management application or a management API method.

Caution: If two accounts try to set the encryption flag simultaneously, either using the Object Storage Management application or the management API method /s3_encryption/set, existing storage components can become inaccessible.
  • If you intend to use encryption, set it before defining storage components.
  • If you have already defined storage components and intend to use encryption, do not try to set encryption from multiple accounts, or by multiple calls to the API method /s3_encryption/set, simultaneously.

After enabling encryption, restart (repair) the S3 Gateway and Policy Engine services.

Object Storage Management application instructions

Procedure

  1. From the Object Storage Management application, select Settings > Encryption.

    The ENCRYPTION page opens. The page displays information about the key management server connection.
  2. In the ENCRYPTION section, click Enable.

    You are prompted that turning on encryption is permanent.
  3. Click Enable to confirm.

    Note: You receive an error message if the KMS service has stopped or is unable to complete the request. You might also receive an error message if the key management server is not yet available. In this case, try again when the server is available.
    The Vault unsealing window opens, displaying your initial root token and five unseal keys.
  4. Click Copy for the initial root token and save it elsewhere.

  5. Click Copy for each unseal key and save the keys elsewhere.

    Important: This window is the only time that all of this data is ever known by Vault, and the only time that the unseal keys should ever appear together. To minimize the possibility of multiple keys becoming unavailable, the best practice is to securely distribute, encrypt, and store the unseal keys in as many different locations as possible.
  6. Click Close.

    You are warned that you won't have another opportunity to record the unseal keys and the initial root token.
  7. Click Continue.

    The Key Management Server service is initialized, a connection to the KMS server is established, the storage component encryption keys are generated and applied, and encryption is enabled.

Next steps

After enabling encryption, restart (repair) the S3 Gateway and Policy Engine services.

Related REST API methods

POST /s3_encryption/set

For information about specific API methods, see the MAPI Reference or, in the Object Storage Management application, click the profile icon and select REST API.

Defining subdomain for S3 Console application

The S3 Console application uses a subdomain of the HCP for cloud scale system.

The S3 Console application uses a subdomain within the HCP for cloud scale system, such as s3.hcpcs.Company.com. For user convenience, you can modify the hosts file on systems used to call the S3 Console application.

Procedure

  1. On a system that calls the S3 Console application, open the hosts file in an editor.

    On a Windows system, the hosts file is normally located at C:\Windows\System32\drivers\etc\hosts. On a Linux system, the hosts file is normally located at /etc/hosts.
  2. Associate the IP address of the HCP for cloud scale system with the S3 subdomain.

    10.24.19.54 s3.hcpcs.Company.com
  3. Save the file.

  4. Repeat Steps 1-3 for every system used to call the S3 Console application.

About page

The Object Storage Management application About page displays the product version number and a link to the software license terms.

The About page is available from the user profile icon.

Note: Version information is also displayed on the main login page.