Getting started

Hitachi Content Platform for cloud scale (HCP for cloud scale) is a software-defined object storage solution that is based on a massively parallel microservice architecture and is compatible with the Amazon Simple Storage Service (S3) application programming interface (API).

Introducing HCP for cloud scale

HCP for cloud scale is especially well suited to service applications requiring high bandwidth and compatibility with the Amazon S3 API.

HCP for cloud scale can federate S3 compatible storage from virtually any private or public source and present the combined capacity in a single, centrally managed, global name space.

You can install HCP for cloud scale on any server, in the cloud or on premises, that supports the minimum requirements.

HCP for cloud scale supports S3 event notification through a graphical user interface or through GET and PUT Bucket Notification configuration.

HCP for cloud scale lets you manage and scale storage components. You can add storage components, monitor their states, and take them online or offline for purposes of maintenance or repair. HCP for cloud scale provides functions to send notification of alerts, track and monitor throughput and performance, and trace actions through the system.

Storage components, buckets, and objects

Storage components

A storage component is an Amazon S3 compatible storage system, running independently, that HCP for cloud scale manages as a back end to store object data. To an S3 client using HCP for cloud scale, the existence, type, and state of storage components are transparent.

HCP for cloud scale supports the following storage systems:

  • Amazon S3
  • Hitachi Content Platform (HCP)
  • HCP S Series Node
  • Any Amazon S3 compatible storage service
Buckets

A bucket is a logical collection of secure data objects that is created and managed by a client application. An HCP for cloud scale bucket is modeled on a storage service bucket. HCP for cloud scale uses buckets to manage storage components. You can think of an HCP for cloud scale system as a logical collection of secure buckets.

Buckets have associated metadata such as ownership and lifecycle status. HCP for cloud scale buckets are owned by an HCP for cloud scale user and access is controlled on a per-bucket basis by Amazon access control lists (ACL) supporting the S3 API. Buckets are contained in a specific region; HCP for cloud scale supports one region.

Note
  1. HCP for cloud scale buckets are not stored in storage components, so HCP for cloud scale clients can create buckets even before adding storage components.
  2. Storage component buckets are created by storage component administrators and are not visible to HCP for cloud scale clients.
  3. To empty and reuse a bucket, don't just delete the bucket and create a new one with the same name. After a bucket is deleted, the name becomes available for anyone to use and another account might take it first. Instead, empty and keep the bucket.
Objects

An object consists of data and associated metadata. The metadata is a set of name-value pairs that describe the object. Every object is contained in a bucket. An object is handled as a single unit by all HCP for cloud scale transactions, services, and internal processes.

For information about Amazon S3, see Introduction to Amazon S3.
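
As a concrete illustration of these concepts, here is a minimal sketch using the AWS SDK for Python (boto3) to create a bucket and store an object with user-defined metadata. The endpoint URL, credentials, bucket name, and object key are placeholders for this example, not values defined by HCP for cloud scale:

import boto3
from botocore.config import Config

# Placeholder endpoint and S3 credentials; substitute the values for your system.
s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.hcpcs.example.com",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
    config=Config(signature_version="s3v4"),  # AWS Signature Version 4
)

# Create a bucket, then store an object with user-defined metadata
# (name-value pairs kept alongside the object data).
s3.create_bucket(Bucket="example-bucket")
s3.put_object(
    Bucket="example-bucket",
    Key="reports/2024/q1.csv",
    Body=b"id,amount\n1,100\n",
    Metadata={"department": "finance", "origin": "batch-load"},
)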

S3 Console application

HCP for cloud scale includes an S3 Console application that provides convenient functions for bucket users as an alternative to using S3 API methods:

  • Obtaining S3 credentials
  • Managing bucket synchronization, policies, and rules
  • Creating S3 event notifications to synchronize buckets
  • Managing objects in buckets

For more information, see the S3 Console Guide.

Data access

HCP for cloud scale supports the Amazon S3 API, which lets client applications store and retrieve unlimited amounts of data from configured storage services.

Data access control

HCP for cloud scale uses ownership and access control lists (ACLs) as data access control mechanisms for the S3 API.

Ownership is implemented as follows:

  • An HCP for cloud scale bucket is owned by the user who creates the bucket and the owner cannot be changed.
  • A user has full control of the buckets that user owns.
  • A user has full control of the objects that user creates.
  • A user can list only the buckets that user owns.

ACLs allow the assignment of access privileges (read, write, or full control) on buckets and objects to user accounts other than the owner's.
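
For illustration, here is a minimal boto3 sketch that grants read access on a bucket to the predefined All Users group and read access on an object to another user by canonical ID. The endpoint, credentials, bucket, key, and grantee IDs are placeholders:

import boto3

# Placeholder endpoint and S3 credentials; substitute the values for your system.
s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.hcpcs.example.com",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Grant public read on the bucket while keeping full control for the owner.
s3.put_bucket_acl(
    Bucket="example-bucket",
    GrantFullControl='id="OWNER_CANONICAL_ID"',
    GrantRead='uri="http://acs.amazonaws.com/groups/global/AllUsers"',
)

# Grant read on a single object to another user, identified by canonical ID.
s3.put_object_acl(
    Bucket="example-bucket",
    Key="reports/2024/q1.csv",
    GrantRead='id="OTHER_USER_CANONICAL_ID"',
)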

Data security

HCP for cloud scale supports encryption of data sent between systems (that is, data "in flight") and, as a licensed feature, data stored persistently within the system (that is, data "at rest").

Certificate management

HCP for cloud scale uses Secure Sockets Layer (SSL) to provide security for both incoming and outgoing communications. To enable SSL security, two types of certificates are needed:

  • System certificate: the certificate that HCP for cloud scale uses for its GUIs and APIs (for incoming communications)
  • Client certificates: the certificates of IdPs, storage components, and SMTP servers (for outgoing communications)

When the HCP for cloud scale system is installed, it generates and automatically installs a self-signed SSL server system certificate. This certificate is not automatically trusted by web browsers. You can choose to trust this self-signed certificate or replace it by using one of these options:

  1. Upload a PKCS12 certificate chain and password and apply it as the active system certificate.
  2. Download a certificate signing request (CSR) and then use it to obtain, upload, and apply a certificate signed by a certificate authority (CA).
  3. Generate a new self-signed certificate and apply it as the active system certificate.

For client certificates, you need to upload the certificate of each client HCP for cloud scale needs to access using SSL.

You can manage certificates, and view details of installed certificates, using the System Management application.

Data-in-flight encryption

HCP for cloud scale supports data-in-flight encryption (HTTPS) for all external communications. Data-in-flight encryption is always enabled for these data paths:

  • S3 API (HTTP is also enabled on a different port)
  • Management API
  • System Management application graphical user interface (GUI)
  • Object Storage Management application GUI

You can enable or disable data-in-flight encryption for the data paths between HCP for cloud scale and:

  • An identity provider (IdP) server
  • Each application using TLS or SSL
  • Each managed storage component
  • Each SMTP server using SSL or STARTTLS

Communication between HCP for cloud scale instances does not use data-in-flight encryption. Depending on your security needs, you might need to set up an isolated internal network for HCP for cloud scale at your site.

Data-at-rest encryption

HCP for cloud scale stores these kinds of data persistently:

  • HCP for cloud scale services data
  • HCP for cloud scale metadata and user-defined metadata
  • User data (object data)

The first two kinds of data are handled by the hardware on which HCP for cloud scale instances are installed. If needed, you can install HCP for cloud scale on servers with encrypted disks.

Object data is handled by storage components. HCP for cloud scale supports system-wide encryption, using AWS SDK client-side encryption and strong encryption ciphers, as a licensed feature. Encryption and decryption are transparent to users. Each storage component has a separate master key. Storage components that use hardware acceleration for encryption and decryption are supported.

To manage encryption master keys, HCP for cloud scale supports the HashiCorp Vault key management system (KMS) through a KMS client that is automatically deployed as a service when encryption is enabled. After you set up a Vault server, you can enable encryption support on HCP for cloud scale as a global setting and then manage the encryption client service as needed. For information about Vault and how to set up a server, see https://www.hashicorp.com/products/vault.

Note: Once enabled, encryption can't be disabled.

As an alternative, you can use individual storage components that support data-at-rest encryption. Storage components can self-manage their keys, or HCP for cloud scale can facilitate keys you supply following the Amazon S3 API specification.

Bucket synchronization

Bucket synchronization to a bucket (bucket sync-to) allows automatic, asynchronous copying of objects in buckets in an HCP for cloud scale system to external storage systems. Bucket synchronization from a bucket (bucket sync-from) allows automatic, asynchronous copying of objects in buckets in external storage systems to an HCP for cloud scale bucket.

An external storage system can be another HCP for cloud scale system, AWS, or any S3 compatible system.

Bucket sync-to offers the following advantages:

  • Data protection: Data is well protected against the unavailability or catastrophic failure of a system. Buckets can be synchronized to multiple remote systems of different types. This arrangement can provide geographically distributed data protection (called geo-protection).
  • Data availability: AWS services can access synchronized data directly from AWS.

Bucket sync-from offers the following advantages:

  • Data consolidation: Transformed data can be stored on an HCP for cloud scale system. An HCP for cloud scale system can synchronize data from multiple remote systems of different types.
  • External update: Data can be updated directly in an external system and stored on an HCP for cloud scale system.

Access to bucket synchronization is controlled on a per-user basis by role-based access control (RBAC). Use the System Management application to define users, groups, and roles.

Access to an external resource might need an SSL certificate. You can upload an SSL certificate using the System Management application, in the same way as you upload SSL certificates for storage components and IdPs.

For information on bucket synchronizations, see the S3 Console Guide.

Object locking

HCP for cloud scale supports object locking, which prevents specified objects from being deleted. A bucket owner can lock or unlock objects or lock them for a specified time period. This feature implements legal hold and retention period requirements.

Object locking is enabled at the bucket level, either when or after a bucket is created. Once enabled, object locking can't be disabled.

Object locking offers the following advantages:

  • Locked objects can't be deleted. This implements write once, read many (WORM) behavior, which protects objects from accidental or malicious changes.
  • A bucket owner can lock objects until a specified date and time. This implements retention periods, which complies with record retention policy. The retention period can be up to 100 years in the future.
    Note: Once set, a retention period can be extended, but not shortened or turned off.
  • A bucket owner can lock an object indefinitely, and then turn the lock off. This complies with legal hold requirements. If a legal hold is placed on an object, it can't be modified, versioned, moved, or deleted, even if it has an expired retention period (that is, a legal hold overrides a retention period). A legal hold never expires, but must instead be removed. An object can have multiple legal holds placed on it.

HCP for cloud scale implements compliance mode as described by the Amazon S3 specification. It does not support governance mode.
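
As a rough sketch of how these controls look through the S3 API, the boto3 example below stores an object under a compliance-mode retention period and then places a legal hold on it. The bucket must already have object locking enabled, and the endpoint, credentials, names, and date are placeholders:

import datetime
import boto3

# Placeholder endpoint and S3 credentials; substitute the values for your system.
s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.hcpcs.example.com",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Store an object with a compliance-mode retention period (placeholder date).
s3.put_object(
    Bucket="locked-bucket",
    Key="contracts/agreement.pdf",
    Body=b"contract contents",
    ObjectLockMode="COMPLIANCE",
    ObjectLockRetainUntilDate=datetime.datetime(2030, 1, 1, tzinfo=datetime.timezone.utc),
)

# Place a legal hold on the object; the hold persists until it is removed.
s3.put_object_legal_hold(
    Bucket="locked-bucket",
    Key="contracts/agreement.pdf",
    LegalHold={"Status": "ON"},
)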

Note: Using S3 PUT Object Lock methods in HCP for cloud scale v1.4 and earlier is not supported. Using the methods might return an HTTP status code of 200 but will not produce the expected behavior. Use S3 object lock methods only after updating to v1.5 or later.

For information on how to lock and unlock objects, see the S3 Console Guide.

S3 Select

HCP for cloud scale supports the S3 Select feature.

HCP for cloud scale supports the S3 Select Object Content method, which allows an S3 client such as Apache Spark, Apache Hive, or Presto to retrieve a portion of a structured object. The portion of the object returned is selected based on a structured query language (SQL) query sent in the request. The query is performed by S3 storage components that support pushdown. Selecting only the data needed within an object can significantly reduce cost and transfer time and improve performance.

A request can select serialized object data in these formats:

  • Apache Parquet

A request can return data in these formats:

  • Comma-separated values (CSV)

The client application must have s3:GetObject permission. S3 Select supports reading of encrypted data. The SQL expression can be up to 256 KB. Returned data can be up to 1 MB.

Here is a simple example of a SQL query against a Parquet object. The query returns data for salaries greater than 100,000:

select salary from s3object s where s.salary > 100000
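
For illustration, the boto3 sketch below issues this query with the S3 Select Object Content method, using Parquet input and CSV output as described above. The endpoint, credentials, bucket, and key are placeholders:

import boto3

# Placeholder endpoint and S3 credentials; substitute the values for your system.
s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.hcpcs.example.com",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Select rows from a Parquet object; results are returned as CSV records
# in an event stream.
resp = s3.select_object_content(
    Bucket="hr-data",
    Key="employees.parquet",
    ExpressionType="SQL",
    Expression="select salary from s3object s where s.salary > 100000",
    InputSerialization={"Parquet": {}},
    OutputSerialization={"CSV": {}},
)

for event in resp["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode("utf-8"))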

S3 event notification

HCP for cloud scale supports the S3 PUT Bucket notification configuration and GET Bucket notification configuration methods.

HCP for cloud scale can send notifications of specified events in a bucket to a message server for applications to consume. This is a more efficient way to signal changes than periodically scanning objects in a bucket.

HCP for cloud scale supports event notification to signal specified events in buckets. Notifications can be sent to SQS Standard services. A retry mechanism ensures highly reliable notifications.
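
As an illustration of configuring notification through the S3 API, the boto3 sketch below requests a message to an SQS Standard queue whenever an object is created in a bucket, and then reads the configuration back. The endpoint, credentials, bucket name, and queue ARN are placeholders:

import boto3

# Placeholder endpoint and S3 credentials; substitute the values for your system.
s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.hcpcs.example.com",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# PUT Bucket notification configuration: notify a queue on object creation.
s3.put_bucket_notification_configuration(
    Bucket="example-bucket",
    NotificationConfiguration={
        "QueueConfigurations": [
            {
                "QueueArn": "arn:aws:sqs:us-west-2:000000000000:example-queue",
                "Events": ["s3:ObjectCreated:*"],
            }
        ]
    },
)

# GET Bucket notification configuration: read the current rules back.
print(s3.get_bucket_notification_configuration(Bucket="example-bucket"))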

Supported limits

HCP for cloud scale limits the number of instances (nodes) in a system to 160.

HCP for cloud scale does not limit the number of the following entities:

  • Buckets: minimum none, maximum unlimited.
  • Users (external): minimum none, maximum unlimited. The local user has access to all functions, including MAPI calls and S3 API calls. However, it is best to configure HCP for cloud scale with an identity provider (IdP) with users to enforce role-based access control.
  • Groups (external): maximum unlimited.
  • Roles: maximum unlimited.
  • Objects: minimum none, maximum unlimited. The default size limit for a single PUT or POST object call is 5 GB.
  • Storage components: minimum 1, maximum unlimited.

High availability

HCP for cloud scale supports high availability for multi-instance sites.

High availability needs at least four instances: three master instances, which run essential services, and at least one worker instance. The best practice is to run the three master instances on separate physical hardware (or, if running on virtual machines, on at least three separate physical hosts) and to run HCP for cloud scale services on more than one instance.

Scalability of instances, service instances, and storage components

You can increase or decrease the capacity, performance, and availability of HCP for cloud scale by adding or removing the following:

  • Instances: physical computer nodes or virtual machines
  • Service instances: copies of services running on additional instances
  • Storage components: S3 compatible systems used to store object data

In a multi-instance site, you might add instances to improve system performance or if you are running out of storage space on one or more instances. You might remove instances if you are retiring hardware, if an instance is down and cannot be recovered, or if you decide to run fewer instances.

When you add an instance, you can also scale floating services (such as the Metadata Gateway) to the new instance. When you scale a floating service, HCP for cloud scale automatically rebalances itself.

In a multi-instance site, you can manually change where a service instance runs:

  • You can configure it to run on additional instances. For example, you can increase the number of S3 Gateway service instances to improve throughput of S3 API transactions.
  • You can configure it to run on fewer instances. For example, you can free computational resources on an instance to run other services.
  • You can configure it to run on different instances. For example, you can move the service instances off a hardware instance to retire the hardware.
  • For a floating service, instead of specifying a specific instance on which it runs, you can specify a pool of eligible instances, any of which can run the service.

Some services have a fixed number of instances and therefore cannot be scaled:

  • Metadata Coordination

You might add storage components to a site under these circumstances:

  • The existing storage components are running out of available capacity
  • The existing storage components do not provide the performance you need
  • The existing storage components do not provide the functionality you need

Site availability

An HCP for cloud scale site has three master instances and thus can tolerate the failure of one master instance without interruption of service. HCP for cloud scale services can continue to function if two or even all three master instances fail. However, you cannot move or scale service instances until master instances are restored.

Service availability

HCP for cloud scale services provide high availability as follows:

  • The Metadata Gateway service always has at least three service instances. When the system starts, the nodes "elect a leader" using the Raft consensus algorithm. The other service instances follow the leader. The leader processes all GET and PUT requests. If the followers cannot identify the leader, they elect a new leader. The Metadata Gateway service tolerates service instance failure, and functions without loss of data, as long as at least two service instances are healthy.
  • The Metadata Coordination service always has one service instance. If that instance fails, HCP for cloud scale automatically starts another instance. Until startup is complete, the Metadata Gateway service cannot scale.
  • The Metadata Cache service always has one service instance. If that instance fails, HCP for cloud scale automatically starts another instance. Until startup is complete, overall performance decreases.
  • To protect messaging consistency, the Message Queue service always has three service instances. To prevent being split into disconnected parts, the service shuts down if half of the service instances fail. In practice, messaging stops if two of the three instances fail. Do not let the service run with only two instances, because in that scenario if one of the remaining instances fails, the service shuts down. However, when one of the failed instances restarts, messaging services recover and resume.
  • To maintain access to the encryption key vault, the Key Management Server service uses an active-standby model. One service instance is the active instance and any other service instances are kept as standbys. If the active vault node becomes sealed or unavailable, one of the standbys takes over as active. You can scale the service up to the number of instances in the HCP for cloud scale system, or to the limit of acceptable performance.

The rest of the HCP for cloud scale services remain available if HCP for cloud scale instances or service instances fail, as long as at least one service instance remains healthy. Even if a service that has only one service instance fails, HCP for cloud scale automatically starts a new service instance.

Metadata availability

Metadata is available as long as these services are available:

  • S3 Gateway
  • Metadata Gateway

Object data availability

Object data is available as long as these items are available:

  • The S3 Gateway service (at least one instance)
  • The storage component containing the requested object data
  • At least two functioning Metadata Gateway service instances (of the required three)

For high availability of object data or data protection, you should use a storage component with high availability, such as HCP, HCP S Series Node, or AWS S3.

Network availability

You can install each HCP for cloud scale instance with both an internal and an external network interface. To avoid single points of networking failure, you can:

  • Configure two external network interfaces in each HCP for cloud scale instance
  • Use two switches and connect each network interface to one of them
  • Bind the two network interfaces into one virtual network interface in an active-passive configuration
  • Install HCP for cloud scale using the virtual network interface

Failure recovery

HCP for cloud scale actively monitors the health and performance of the system and its resources, gives real-time visual health representations, issues alert messages when needed, and automatically takes action to recover from the failure of:

  • Instances (nodes)
  • Product services (software processes)
  • System services (software processes)
  • Storage components

Instance failure recovery

If an instance (a compute node) fails, HCP for cloud scale automatically adds new service instances to other available instances (compute nodes) to maintain the minimum number of service instances. Data on the failed instance is not lost and remains consistent. However, while the instance is down, data redundancy might degrade.

HCP for cloud scale adds new service instances automatically only for floating services. Depending on the remaining number of instances and service instances running, you might need to add new service instances or deploy a new instance.

Service failure recovery

HCP for cloud scale monitors service instances and automatically restarts them if they are not healthy.

For floating services, you can configure a pool of eligible HCP for cloud scale instances and the number of service instances that should be running at any time. You can also set the minimum and maximum number of instances running each service. If a service instance failure causes the number of service instances to go below the minimum, HCP for cloud scale starts another service instance on one of the HCP for cloud scale instances in the pool that doesn't already have that service instance running.

Persistent services run on the specific instances that you specify. If a persistent service fails, HCP for cloud scale restarts the service instance in the same HCP for cloud scale instance. HCP for cloud scale does not automatically bring up a new service instance on a different HCP for cloud scale instance.

Storage component failure recovery

HCP for cloud scale performs regular health checks to detect storage component failures.

If HCP for cloud scale detects a storage component failure, it sets the storage component state to INACCESSIBLE, so that HCP for cloud scale will not try to write new objects to the storage component, and sends an alert. While a storage component is unavailable, the data in it is not accessible.

HCP for cloud scale continues to check a failed storage component and, when it detects that the storage component is healthy again, automatically sets its state to ACTIVE. HCP for cloud scale sends an alert when this event happens as well. Once the storage component is repaired and brought back online, the data it contains is again accessible and HCP for cloud scale can write new objects to it.

Support for the Amazon S3 API

HCP for cloud scale is compatible with the Amazon Simple Storage Service (Amazon S3) REST API, which allows clients to store objects in containers called buckets. A bucket is a collection of objects that has its own settings, such as ownership and lifecycle. Using HCP for cloud scale, users can perform common reads and writes on objects and buckets and manage ACL settings through the client access data service.

For information about using Amazon S3, see the Amazon S3 API documentation.

For information about obtaining S3 user credentials, see the S3 Console Guide.

The following tables list the supported Amazon S3 API features and describe any implementation differences between the Amazon and HCP for cloud scale S3 APIs.

Authentication and addressing

Authentication with AWS Signature Version 4: Fully implemented.
Addressing, virtual-host style (such as http://bucket.server/object): Fully implemented.
Addressing, path style (such as http://server/bucket/object): Fully implemented.
Signed/unsigned payload: Fully implemented.
Chunked request: Fully implemented.
Presigned URL: Fully implemented.

Service

LIST buckets (GET Service): Fully implemented.

Buckets

GET Bucket (list objects) V1: Fully implemented.
GET Bucket (list objects) V2: Fully implemented.

PUT Bucket

To support legacy S3 buckets, HCP for cloud scale supports bucket names of fewer than three characters.

When anonymous requests to create or delete a bucket use a bucket name that isn't valid, Amazon S3 verifies access first and returns 403. HCP for cloud scale returns 400 if the bucket name validation fails.

DELETE Bucket
HEAD Bucket
PUT Bucket ACL: In Amazon S3, each grantee is specified as a type-value pair, where the type is one of the following:
  • emailAddress if the value specified is the email address of an AWS account
  • id if the value specified is the canonical user ID of an AWS account
  • uri if granting permission to a predefined group
HCP for cloud scale does not support emailAddress. HCP for cloud scale fully supports id. HCP for cloud scale supports uri for the predefined groups Authenticated Users and All Users.

HCP for cloud scale does not support the aws-exec-read canned ACL.

HCP for cloud scale does not mirror or mirror back ACLs or policies.

GET Bucket ACL
List Multipart Uploads: Fully implemented.
GET Bucket Lifecycle (except transition action)

HCP for cloud scale supports the latest API for bucket lifecycle management. Old and deprecated V1.0 methods are not supported.

HCP for cloud scale does not support Object Transition actions. Including these actions causes a Malformed XML exception.

PUT Bucket Lifecycle (except transition action)
DELETE Bucket Lifecycle (except transition action)
PUT Bucket Notification Configuration: A configuration can have up to 100 rules.

Amazon S3 considers that two rules overlap if both apply to the same object and share at least one event type. HCP for cloud scale supports notification from the same object to multiple targets. However, rules are blocked if they send a message for the same event to the same target.

All notification message fields are returned except Region and Glacier Storage. The field awsRegion is returned but left empty.

PUT Bucket Replication: Amazon S3 allows only one-to-one replication. HCP for cloud scale supports one-to-many mirroring and many-to-one mirroring back. The bucket Amazon Resource Name (ARN) is replaced by configuration settings.

For mirroring back, HCP for cloud scale supports one queue server, AMAZON_SQS.

Sending encrypted data to a remote bucket is not supported.

GET Bucket Versioning: Version listing requests do not strictly comply with documented behavior for NextKeyMarker/NextVersionIdMarker. Amazon S3 documentation currently states that these values specify "the first key not returned that satisfies the search criteria." However, HCP for cloud scale specifies the last key returned in the current response. S3 V1 object listings do not call out as specific a requirement, and V2 object listings use a continuation token that is opaque to the caller. Internally, HCP for cloud scale shares the same listing logic across all listing types.
GET Bucket Object Versions: Fully implemented.
GET Bucket Location: You must be the bucket owner. The only supported location is us-west-2.
GET Bucket Notification Configuration: Fully implemented.

Object

GET Object: If a lifecycle policy is configured for a bucket, HCP for cloud scale displays the expiration date of an object (in the x-amz-expiration header) fetched using the subresource ?versionId. Amazon displays this only when performing unversioned GET requests.

Legal hold is fully implemented.

Object retention is fully implemented.

Object names can't contain NUL or backslash (\) characters. GET operations on objects so named fail with a 400 error.

HEAD Object: If a lifecycle policy is configured for a bucket, HCP for cloud scale displays the expiration date of an object (in the x-amz-expiration header) fetched using the subresource ?versionId. Amazon displays this only when performing unversioned HEAD requests.
PUT Object

Amazon S3 limits the maximum file size for a single PUT or POST object call to 5 GB. In HCP for cloud scale, this value is configurable and the default is 5 GB.

Amazon S3 is extremely liberal in what it accepts for the Content-Type of an object. HCP for cloud scale applies additional Content-Type validation.

Bucket synchronization is supported.

Legal hold is fully implemented. The bucket owner can set a legal hold without restriction; that is, AWS object lock permissions are not supported.

Object retention is implemented, but not governance mode; that is, a retain-until date can be set but not changed. The bucket owner can set object retention without restriction; that is, AWS object lock permissions are not supported.

Object locking can be applied to a bucket even after it's created. To enable object locking, in the S3 API PUT Bucket ObjectLockConfiguration, include the URI request parameter x-amz-bucket-object-lock-token (with any string).

Object names can't contain NUL or backslash (\) characters. PUT operations on objects so named fail with a 400 error.

PUT Object (Copy): Conditional headers are not supported. Server-side encryption is not supported. Multiple AWS regions are not supported; as a result, cross-region limitations are not supported.
PUT Object (Part Copy): Conditional headers are not supported. Server-side encryption is not supported.
Object and version encoding: The Amazon S3 object and version listing documentation mentions the ability to pass an encoding parameter so that object names in the response XML can be escaped if they contain characters that aren't valid in XML. Encoding is documented only as it applies to object names, not Owner/DisplayNames. Also, escaping for bucket listing requests isn't mentioned.

The Owner/DisplayName is a concern because user display names might contain characters that can cause XML parsing issues. Amazon might be able to restrict display names, though it does not currently return a display name for all regions. HCP for cloud scale uses IdPs and thus doesn't control display name restrictions.

Bucket name restrictions should prevent problematic bucket names from being created. For security, HCP for cloud scale passes the user display name through a URI encoder before returning the name in an XML response.

Object tagging: Amazon S3 wraps ETags in double quotes. For XML listings (v1 object, v2 object, version), double quotes are escaped. For example:

<ETag>&quot;32c81604d07395b1aa39a7e206c3af06&quot;</ETag>

HCP for cloud scale doesn't need to do this because only attributes, not double quotes, need to be escaped within content.

Expiration date URL encoding (x-amz-expiration header)

HCP for cloud scale URL-encodes the RuleID portion of the x-amz-expiration header using the same encoding strategy that Amazon suggests for V4 authentication. This strategy can result in encoded strings that do not exactly match how Amazon encodes RuleIDs. However, decoding them should always return the original strings.

HCP for cloud scale mirrors and mirrors back object tagging and tag updates.
GET Object ACL

Bucket synchronization is not supported.

In Amazon S3, each grantee is specified as a type-value pair, where the type is one of the following:

  • emailAddress if the value specified is the email address of an AWS account
  • id if the value specified is the canonical user ID of an AWS account
  • uri if granting permission to a predefined group

HCP for cloud scale does not support emailAddress. HCP for cloud scale fully supports id. HCP for cloud scale supports uri for the predefined groups Authenticated Users and All Users.

HCP for cloud scale does not support the aws-exec-read canned ACL.

PUT Object ACL
DELETE Object

Bucket synchronization of deletion of an object or a specific version of an object is not supported.

To improve performance, if the current version of an object is a delete marker, HCP for cloud scale doesn't create another delete marker.

DELETE Multiple Objects

Fully implemented.

Bucket synchronization is not supported.

POST Object

Fully implemented.

Amazon S3 limits the maximum file size for a single PUT or POST object call to 5 GB. In HCP for cloud scale, this value is configurable and the default is 5 GB.

Bucket synchronization is supported.

POST Select Object Content

Scan range is supported.

HCP for cloud scale supports the use of * by itself with no alias reference. For example, this SQL query is supported:

select *, first_name from s3object s where s.salary > 100000 limit 10

HCP for cloud scale supports a wider range of date-time formats than AWS. The full list is available at https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html.

HCP for cloud scale supports nested aggregate functions. For example, this expression is supported: count(sum(s.salary))

HCP for cloud scale SQL queries on columns are case sensitive, while AWS SQL queries are case insensitive. For example, given an object s with the columns ID, iD, and id, an SQL query to select s.id will return column id in HCP for cloud scale but column ID in AWS.

Only input serialization of Parquet is supported. Requests for CSV or JSON objects are not supported and return an error.

Parquet compression is managed automatically, so the CompressionType argument is not needed, and if specified returns an error.

Only CSV output is supported. Specifying another output format returns an error.

Initiate/Complete/Abort Multipart Upload

Fully implemented.

Bucket synchronization is supported.

Upload Part: Fully implemented.
List Multipart Uploads: Fully implemented.

Unsupported S3 API methods

HCP for cloud scale does not support the following Amazon S3 API methods:

Authentication

  • Authentication v2 (deprecated by AWS)

Bucket

  • GET/PUT/DELETE Bucket website
  • GET/PUT/DELETE Bucket policy
  • GET/PUT/DELETE Bucket tagging
  • GET/PUT/DELETE Bucket CORS (cross-origin resource sharing)
  • PUT Bucket versioning (with HCP for cloud scale, versioning is always on)
  • GET/PUT Bucket logging
  • GET Bucket notification
  • GET/PUT Bucket requestPayment
  • GET/PUT/DELETE Bucket Inventory
  • List Bucket Inventory Configurations
  • GET/DELETE Bucket metrics
  • List Bucket Metrics Configurations
  • GET/PUT/DELETE Bucket analytics
  • List Bucket Analytics Configurations
  • PUT/GET Bucket accelerate
  • Server-side encryption with customer-provided encryption keys (SSE-C)
  • Server-side encryption with storage-managed encryption keys (SSE-S3)

Object

  • Options object
  • GET/POST Object torrent

HCP for cloud scale APIs

The Hitachi Content Platform for cloud scale (HCP for cloud scale) system includes RESTful application programming interfaces (APIs) that you can use for writing applications that exercise its functions and manage the system.

Anything you can do in the Object Storage Management, S3 User Credentials, or System Management application GUIs, you can also do using the APIs.

Object Storage Management API

The Object Storage Management application includes a RESTful API for administrative functions such as managing storage components, configuring Amazon S3 settings, and obtaining or revoking S3 user credentials. For more information on the Object Storage Management API, see the MAPI Reference.

System Management API

The System Management application includes a RESTful API for system management functions such as system monitoring, service monitoring, user registration, and configuration. For more information on the System Management API, see the online help in the System Management application.

Amazon S3 API

Unless otherwise noted, HCP for cloud scale is fully compatible with the Amazon S3 API.

Object Storage Management API

The Object Storage Management application includes a RESTful API for the following functions:

  • Managing storage components and Amazon S3 settings
  • Managing administrative resources such as serial numbers and system events
  • Managing user resources such as S3 user credentials and OAuth tokens

The Object Storage Management API is served by the MAPI Gateway service from any HCP for cloud scale node.

You can execute all functions supported in the Object Storage Management application using the API.

Note: The system configuration, management, and monitoring functions included in the System Management application can be performed using the System Management API.

All URLs for the API have the following base, or root, uniform resource identifier (URI):

https://hcpcs_ip_address:9099/mapi/v1
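
As a minimal sketch of calling the API, the Python example below invokes the POST /serial_number/get method with an OAuth token in the Authorization header (the Bearer scheme is assumed here). The IP address and token are placeholders; obtaining a token is described under Security and authentication:

import requests

BASE = "https://hcpcs_ip_address:9099/mapi/v1"
TOKEN = "YOUR_OAUTH_TOKEN"  # placeholder; generate an OAuth token first

response = requests.post(
    f"{BASE}/serial_number/get",
    headers={"Authorization": f"Bearer {TOKEN}", "Accept": "application/json"},
    verify=False,  # only while the system still uses its self-signed certificate
)
print(response.status_code, response.text)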

System Management API

The System Management application provides a RESTful API for managing the following:

  • Alerts
  • Business objects
  • Certificates
  • Events
  • Instances
  • Jobs
  • Licenses
  • Notifications
  • Packages
  • Plugins
  • Security
  • Services
  • Setup
  • Tasks
  • Updates

You can execute all functions supported in the System Management application using the API.

Security and authentication

HCP for cloud scale controls access to system functions through user accounts, roles, permissions, and OAuth tokens, where user accounts are stored in an external identity provider. HCP for cloud scale controls access to data by S3 API requests through S3 credentials, ownership, and access control lists. HCP for cloud scale supports in-flight encryption (HTTPS) for all external communications.

User accounts

The initial user account, which has all permissions, is created when you install HCP for cloud scale. The initial user account can perform all HCP for cloud scale functions. After the initial user account is created, you can change its password any time, but you cannot disable the account and you cannot change its permissions.

The initial user is the only local account allowed and is intended only to let you configure an identity provider (IdP). HCP for cloud scale can communicate with IdPs using HTTP or HTTPS. HCP for cloud scale supports multiple IdPs:

  • Active Directory
  • OpenLDAP
  • 389 Directory Server
  • LDAP compatible

HCP for cloud scale supports external users defined in the IdP. External users with the appropriate permissions can perform some or all of these functions:

  • Log in to the Object Storage Management application and use all functions
  • Log in to the System Management application and use all functions
  • Get an OAuth token to use all API calls for the Object Storage Management and System Management applications
  • Log in to the S3 User Credentials application and get S3 credentials to use the S3 API

HCP for cloud scale discovers the groups in each IdP and allows assigning roles to groups.

HCP for cloud scale uses OAuth2 as a service provider to authenticate single sign-on (SSO) access. SSO lets you use one set of login credentials for all HCP for cloud scale applications, so you can switch between applications without logging in again.

API access

Object Storage Management application API methods need a valid OAuth access token for a user account with suitable permissions, or else the requests are rejected. With one exception, System Management application API methods also require a valid OAuth access token for a user account with suitable permissions, or else the requests are rejected. (The exception is the API method to generate an OAuth token, which requires only a username and password in the body of the request.)

Before using either the Object Storage Management or System Management APIs, you need to obtain an OAuth token. You can generate an OAuth token by sending a request to the OAuth server with your account credentials. Then you can supply the OAuth token in the Authorization header in each request. OAuth tokens are valid for five hours.
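
The sketch below outlines this flow in Python. It is illustrative only: the token URL, request fields, and response field name are hypothetical placeholders, not documented values; see the MAPI Reference or the System Management API help for the actual token request format:

import requests

# Hypothetical token endpoint and field names, shown only to illustrate the flow.
token_response = requests.post(
    "https://hcpcs_ip_address:8000/auth/token",  # placeholder URL
    json={"username": "admin_user", "password": "PASSWORD", "realm": "example-realm"},
    verify=False,  # only while the system still uses its self-signed certificate
)
token = token_response.json().get("access_token")  # response field name assumed

# Supply the token in the Authorization header of each subsequent API request.
headers = {"Authorization": f"Bearer {token}"}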

Note: An administrator can revoke all OAuth tokens for any other HCP for cloud scale user. You would do this, for example, if an employee leaves the company, you delete the user account, and you do not want to wait for the account tokens to expire.

S3 API requests generally require valid S3 credentials for users with the right privileges, that is, access control lists (ACLs). (Exceptions are methods configured to allow anonymous access and pre-signed requests.) HCP for cloud scale supports AWS Signature version 4 authentication to include S3 credentials in S3 requests.

Users with a valid account and suitable permissions can generate S3 credentials. You can generate an unlimited number of S3 credentials, but only the last credentials generated are valid. These credentials are associated only with your account. S3 credentials do not have an expiration date, so they are valid until revoked.

Users with a valid account and suitable permissions can revoke all S3 credentials of any user. That is, you can revoke your own S3 credentials or the S3 credentials of any other user. Revocation removes all S3 credentials associated with the account.

Note: Deleting a user account from the IdP does not revoke S3 credentials, and if a user's S3 credentials are revoked, the user can still generate new credentials. The best practice is to delete the user account from the IdP and then revoke the S3 credentials.

Network isolation and port mapping

When you install HCP for cloud scale, you can set up network isolation by configuring one external network and one internal network.

HCP for cloud scale software creates a cluster using commodity x86 servers that are networked using Ethernet. The software uses two networks on the operating system hosting the HCP for cloud scale software. These networks can also use link aggregation defined by the OS administrator.

While two networks provide optimal traffic isolation, you can deploy the software using a single network. The OS administrator must make and implement networking decisions before you install HCP for cloud scale.

HCP for cloud scale services use a range of network ports. You can configure services to use different ports instead of the default ports. Installation is the only opportunity to change the default ports used by services.

Note: The following services must be deployed with their default port values:
  • Message Queue
  • Tracing Agent
  • Tracing Collector
  • Tracing Query

For information about installing HCP for cloud scale, see Installing Hitachi Content Platform for Cloud Scale.

Logging in

User accounts reside in an external identity provider (IdP). To log in you need this information:

  • The IP address of the HCP for cloud scale instance that you're using
  • Your user name as assigned by your system administrator
  • Your password as assigned by your system administrator
  • The security realm where your user account is defined

Procedure

  1. Open a web browser and go to https://instance_ip_address:8000

    instance_ip_address is the IP address of the HCP for cloud scale instance you're using.
  2. Type your username and password.

  3. In the Security Realm field, select the location where your user account is defined.

    To log in using the local administrator account, without using an external IdP, select Local. If no IdP is configured yet, Local is the only available option.
  4. Click LOGIN.

Results

The Applications page opens.
Note: When a new user is created and added to a group, that user might not have immediate access to HCP for cloud scale. Instead, login fails with the message "Not authorized. Please contact your system administrator." Verify the credentials. If the condition persists, the system administrator can use the API method security/clearCache to allow immediate login.

HCP for cloud scale applications

After you log in, the HCP for cloud scale Applications page shows you the applications you are authorized to use, such as:

  • Object Storage Management: Manage and monitor storage components, data objects, alerts, and regions
  • S3 Console: Generate S3 access and secret keys; conveniently create and manage buckets, bucket synchronization, and bucket policies; manage S3 event notification; and browse objects in buckets
  • System Management (sometimes referred to in the application as the Admin App): Manage and monitor cluster instances, software services, system security, user accounts, and other cluster configuration parameters


From the Applications page, or from within each application, you can switch back and forth between applications as needed.

Switching between applications

HCP for cloud scale uses OAuth2 as a service provider to authenticate single sign-on (SSO) access. You only need one set of login credentials for all HCP for cloud scale applications, so you can switch between applications without logging in again.

Depending on the permissions assigned to your account role, you can have access to one or more HCP for cloud scale applications. To switch between applications:

Procedure

  1. Click the Open menu (the icon with three horizontal lines) in the right corner of the top navigation bar, and select the application you want to use.

    Note: The System Management application is also identified in the user interface as Admin-App.
    The application opens.

Providing unseal keys to KMS service

When encryption is enabled for an HCP for cloud scale system, the Key Management System service provides encryption keys for storage components. If the service restarts, the key repository vault closes and data objects can't be decrypted. If a vault instance becomes sealed, you must provide a quorum of unseal keys (three of the five provided when encryption was first enabled) to reopen the vault and resume encryption and decryption.

Caution: Don't try to initialize the vault manually outside of HCP for cloud scale. Doing so results in data loss.

To unseal the vault:

Procedure

  1. Select Global Settings.

    The Global Settings page opens.
  2. Click Unseal Vault.

    The Unseal Vault Instances window opens.
  3. Enter the first unseal key (a master key portion) by pasting or typing it into the field Unseal Key 1.

    The key is validated. You can't leave the field blank.
  4. Enter a second unseal key into the field Unseal Key 2.

    The key is validated. You can't leave the field blank. Each key must be different.
  5. Enter a third unseal key into the field Unseal Key 3.

    The key is validated. You can't leave the field blank. Each key must be different.
  6. Click Unseal.

Results

The vault is unsealed.

Serial number

You can use the Object Storage Management application or API to enter and display your HCP for cloud scale serial number.

A serial number is required to activate the HCP for cloud scale software. You must enter the serial number before you can use the system or its licensed features.

Entering your serial number

The Object Storage Management application displays the product serial number. An administrative account with appropriate permissions can enter or edit this number.

Object Storage Management application instructions

To enter your product serial number:

Procedure

  1. Select Global Settings and then click the Edit icon next to the Serial Number field.

    The Add Serial Number window opens.
  2. Type your serial number and then click Add.

Related REST API methods

POST /serial_number/set

For information about specific API methods, see the MAPI Reference or, in the Object Storage Management application, click the profile icon and select REST API.

Displaying your serial number

You can use the Object Storage Management application or API to display the product serial number.

Object Storage Management application instructions

The product serial number is displayed in the upper right corner of the Dashboard page.

Related REST API methods

POST /serial_number/get

For information about specific API methods, see the MAPI Reference or, in the Object Storage Management application, click the profile icon and select REST API.

License

You can use the Object Storage Management application or an API method to enter and validate your HCP for cloud scale license.

A license is required before you can activate certain HCP for cloud scale features. You must enter your serial number before you can upload your license.

Uploading your license

The Object Storage Management application displays your product licenses. An administrative account with appropriate permissions can upload a license file.

Object Storage Management application instructions

To upload a product license file:

Procedure

  1. Select Global Settings and then click Upload in the License field.

    The Upload License area opens, displaying the status of your licenses.
  2. Do one of the following:

    • Drag a license file into the Upload License area.
    • Click Choose File, select a license file, and then click Open.
  3. After you've selected the license file, click Submit.

Results

The license file is decrypted and validated. If the license is valid, you see the message "License is valid." If the license is invalid or expired, you see an error message.

Related REST API methods

POST /license/list

For information about specific API methods, see the MAPI Reference or, in the Object Storage Management application, click the profile icon and select REST API.

Enabling encryption

Encryption is an available licensed feature. You must obtain and upload a license to enable encryption.

An administrative account with appropriate permissions can enable encryption. All objects added to the system after encryption is enabled are encrypted on all storage components.

Note: Encryption is a global setting. Once enabled, you can't turn off encryption or unencrypt either storage components or the objects stored on them.

Before you enable encryption, set up a Vault server. Enabling encryption generates encryption keys, an initial root token, and a set of five unseal keys. When establishing a connection to the Vault server, HCP for cloud scale provides the initial root token for authentication and root access.

Vault doesn't store the generated master key. Instead, each time the Vault server starts it uses the unseal keys to regenerate the master key, which is then used to return storage component encryption keys. If the Vault server goes down, it seals the vault, and to regenerate the master key you must provide a quorum of at least three valid unseal keys.

Caution: If you don't provide a quorum of unseal keys to reconstruct the master key, Vault remains sealed, so the master key is unavailable and encrypted storage components can't be decrypted. To ensure encryption security, the best practice is to encrypt and store unseal keys separately.

You can enable encryption using the Object Storage Management application or a management API method.

Object Storage Management application instructions

To enable encryption:

Procedure

  1. Select Global Settings.

    The Global Settings page opens. The page displays information about the key management server connection.
  2. In the Encryption section of the page, select Encryption.

    You are warned that turning on encryption is permanent and can't be disabled.
  3. Click Proceed.

    If the key management server is not yet available, you receive an error message; try again when the server is available. The Vault Unsealing Information window opens, displaying five unseal keys and your initial root token.
  4. Click Copy for each unseal key and save the keys elsewhere.

    This window is the only time that all of this data is ever known by Vault and the only time that the unseal keys should ever appear together. To minimize the possibility of multiple keys becoming unavailable, the best practice is to encrypt the unseal keys and store them securely in as many different locations as possible.
  5. Click Copy for the initial root token and save it elsewhere.

  6. Click Close.

    You are warned that you won't have another opportunity to record the unseal keys and the initial root token.
  7. Click Okay.

    The Key Management Server service is initialized, a connection to the KMS server is established, and storage component encryption keys are generated and applied.

Results

You have enabled encryption.

Related REST API methods

POST /s3_encryption/set

For information about specific API methods, see the MAPI Reference or, in the Object Storage Management application, click the profile icon and select REST API.

Defining subdomain for S3 Console application

The S3 Console application uses a subdomain of the HCP for cloud scale system.

The S3 Console application uses a subdomain within the HCP for cloud scale system, such as s3.hcpcs.Company.com. For user convenience, you can modify the hosts file on systems used to call the S3 Console application.

Procedure

  1. On a system that calls the S3 Console application, open the hosts file in an editor.

    On a Windows system, the hosts file is normally located at C:\Windows\System32\Drivers\etc\hosts. On a Linux system, the hosts file is normally located at /etc/hosts.
  2. Associate the IP address of the HCP for cloud scale system with the S3 subdomain.

    10.24.19.54 s3.hcpcs.Company.com
  3. Save the file.

  4. Repeat Steps 1-3 for every system used to call the S3 Console application.