Services
Services perform functions essential to the health and operation of the Hitachi Content Platform for cloud scale (HCP for cloud scale) system. You manage services through the System Management application.
For example, the S3 Gateway service serves S3 API methods and communicates with storage components, while the Watchdog service ensures that other services remain running.
Services provide cluster management and coordination, metadata coordination and caching, and external gateways.
Internally, services run in Docker containers on the instances of the system. The container orchestration framework supports cloud or on-premises deployment.
HCP for cloud scale is designed around an adaptive service deployment model that changes based on workload.
The starting point for service management is the Dashboard page of the System Management application. The procedures in this module begin at this page.
Service categories
Services are grouped into categories depending on what actions they perform.
Services are grouped into these categories:
- Product services enable HCP for cloud scale functions. For example, the S3 Gateway service serves S3 API methods and communicates with storage components. You can scale, move, and reconfigure product services.
- System services maintain the health and availability of the HCP for cloud scale system. For example, the Watchdog service ensures that other services remain running. You cannot scale, move, or reconfigure system services.
HCP for cloud scale services
The following table describes the services that HCP for cloud scale runs. Each service runs within its own Docker container. For each service, the table lists:
- Configuration settings: The settings you can configure for the service.
- RAM needed per instance: The amount of RAM that, by default, the service needs on each instance on which it's deployed. For all services except System services, this value is also the default Docker Container Memory value for the service.
- Number of instances: Shows both:
- The minimum number of instances on which a service must run to function properly.
- The best number of instances on which a service should run. If the system includes more than the minimum number of instances, you should take advantage of them by running services on them.
- Service unit cost: For HCP for cloud scale, you can safely ignore these values.
- Whether the service is stateful (that is, it saves data permanently to disk) or stateless (that is, it does not save data to disk).
- Whether the service is persistent (that is, it must run on a specific instance) or supports floating (that is, it can run on any instance).
- Whether the service is scalable or not.
Service name and description | Configuration settings (changes cause the service to redeploy) | Properties |
Product services: These services perform HCP for cloud scale functions. You can move and reconfigure these services. | ||
Cassandra
Decentralized database, used to store some configuration data and system update packages |
Container Options: Default
Service Options
Advanced Options
Compaction Frequency: How often the database is compacted. The options are Weekly (default) and Daily. Caution: Changing this setting can negatively affect the service. |
RAM needed per instance: 2.4 GB Number of instances: minimum 3, best All Service unit cost: 10 Stateful or stateless? Stateful Persistent or floating? Persistent Supports volume configuration? No Single or multiple types? Single Scalable? Yes |
Chronos
Job scheduler |
Container Options: Default
Service Options
|
RAM needed per instance: 712 MB Number of instances: minimum 1, best 1 Service unit cost: 1 Stateful or stateless? Stateless Persistent or floating? Floating Supports volume configuration? Yes Single or multiple types? Single Scalable? Yes |
Data Lifecycle
Processes lifecycle policies |
Container Options: Default
Service Options
|
RAM needed per instance: 4 GB Number of instances: minimum 1; best depends on system load, number of active client objects, and daily rate of object deletion Service unit cost: 10 Stateful or stateless? Stateless Persistent or floating? Floating Supports volume configuration? Yes Single or multiple types? Single Scalable? Yes (but not recommended on master instances) |
Elasticsearch
Indexes metrics and event logs |
Container Options: Default
Service Options
|
RAM needed per instance: 10 GB Number of instances: minimum 3, best All Service unit cost: 25 Stateful or stateless? Stateful Persistent or floating? Persistent Supports volume configuration? Yes Single or multiple types? Single Scalable? Yes |
Grafana
Collects data and displays dashboard metrics |
Container Options: Default
Service Options
|
RAM needed per instance: 768 MB Number of instances: minimum 1, best 1 Service unit cost: 10 Stateful or stateless? Stateful Persistent or floating? Persistent Supports volume configuration? Yes Single or multiple types? Single Scalable? No |
Kafka
Handles metrics and event logs |
Container Options: Default
Service Options
|
RAM needed per instance: 2 GB Number of instances: minimum 3, best All Service unit cost: 5 Stateful or stateless? Stateful Persistent or floating? Persistent Supports volume configuration? Yes Single or multiple types? Single Scalable? Yes |
Key Management Server
Manages storage component encryption keys |
Container Options: Default
Service Options None. |
RAM needed per instance: 2 GB Number of instances: minimum 1, best 2 or more Service unit cost: 10 Stateful or stateless? Stateless Persistent or floating? Floating Supports volume configuration? Yes Single or multiple types? Single Scalable? Yes |
Logstash
Handles metrics and event logs |
Container Options: Default
Service Options
|
RAM needed per instance: 700 MB Number of instances: minimum 1, best 1 Service unit cost: 10 Stateful or stateless? Stateless Persistent or floating? Floating Supports volume configuration? No Single or multiple types? Single Scalable? Yes |
MAPI Gateway
Serves MAPI endpoints |
Container Options: Default
Service Options
|
RAM needed per instance: 2 GB Number of instances: minimum 1, max 1 Service unit cost: 5 Stateful or stateless? Stateless Persistent or floating? Floating Supports volume configuration? No Single or multiple types? Single Scalable? Yes |
Message Queue
Coordinates and distributes messages to other services |
Container Options: Default
Service Options
|
RAM needed per instance: 8 GB Number of instances: minimum 3, best 3 Service unit cost: 10 Stateful or stateless? Stateful Persistent or floating? Persistent Supports volume configuration? No Single or multiple types? Single Scalable? Yes (but not recommended on master instances) |
Metadata Cache
Cache for HCP for cloud scale metadata. Note: This service is deprecated but cannot be removed. |
Container Options: Default
Service Options
|
RAM needed per instance: 1024 MB Number of instances: minimum 1, best 1 Service unit cost: 10 Stateful or stateless? Stateless Persistent or floating? Persistent Supports volume configuration? No Single or multiple types? Single Scalable? Yes |
Metadata Coordination
Coordinates Metadata Gateway service instances and coordinates scaling and balancing of metadata partitions |
Container Options: Default
Service Options
|
RAM needed per instance: 4 GB Number of instances: minimum 1, best 1 Service unit cost: 5 Stateful or stateless? Stateless Persistent or floating? Floating Supports volume configuration? No Single or multiple types? Single Scalable? Yes |
Metadata Gateway
Stores and protects metadata and serves it to other services |
Container Options: Default
Service Options
|
RAM needed per instance: 64 GB Number of instances: minimum 3, best All Service unit cost: 50 Stateful or stateless? Stateful Persistent or floating? Persistent Supports volume configuration? No Single or multiple types? Single Scalable? Yes (but not recommended on master instances) |
Metrics
Gathers metrics from all services and instances and supplies them to GUI and API |
Container Options: Default
Service Options
|
RAM needed per instance: 6 GB Number of instances: minimum 1, best 1 Service unit cost: 10 Stateful or stateless? Stateful Persistent or floating? Persistent Supports volume configuration? No Single or multiple types? Single Scalable? Yes |
Mirror In
Executes sync-from policies |
Container Options: Default
Service Options
|
RAM needed per instance: 4 GB Number of instances: minimum 1, best All Service unit cost: 10 Stateful or stateless? Stateless Persistent or floating? Floating Supports volume configuration? No Single or multiple types? Single Scalable? Yes (but not recommended on master instances) |
Mirror Out
Executes system sync-to policies |
Container Options: Default
Service Options
|
RAM needed per instance: 4 GB Number of instances: minimum 1, best All Service unit cost: 10 Stateful or stateless? Stateless Persistent or floating? Floating Supports volume configuration? No Single or multiple types? Single Scalable? Yes (but not recommended on master instances) |
Policy Engine
Executes system policies |
Container Options: Default
Service Options
|
RAM needed per instance: 4 GB Number of instances: minimum 3, best 3 Service unit cost: 25 Stateful or stateless? Stateless Persistent or floating? Floating Supports volume configuration? No Single or multiple types? Single Scalable? Yes (but not recommended on master instances) |
S3 Gateway
Serves S3 API methods and communicates with storage components |
Container Options: Default
Service Options
HTTP Options
HTTPS Options
|
RAM needed per instance: 16 GB Number of instances: minimum 1, best All Service unit cost: 25 Stateful or stateless? Stateless Persistent or floating? Floating Supports volume configuration? No Single or multiple types? Single Scalable? Yes (but not recommended on master instances) |
S3 Notifications
Executes S3 notifications |
Container Options: Default
Service Options
|
RAM needed per instance: 4 GB Number of instances: minimum 1, best All Service unit cost: 10 Stateful or stateless? Stateless Persistent or floating? Floating Supports volume configuration? No Single or multiple types? Single Scalable? Yes (but not recommended on master instances) |
Tracing Agent
Listens for incoming tracing of S3 API and MAPI calls, batches them, and sends them to Tracing Collector service |
Container Options: Default
Service Options
|
RAM needed per instance: 2 GB Number of instances: minimum 1, best 1 Service unit cost: 1 Stateful or stateless? Stateless Persistent or floating? Floating Supports volume configuration? No Single or multiple types? Single Scalable? Yes |
Tracing Collector
Collects traces from Tracing Agent service instances and stores them in tracing database |
Container Options: Default
Service Options
|
RAM needed per instance: 8 GB Number of instances: minimum 1, best 1 Service unit cost: 10 Stateful or stateless? Stateless Persistent or floating? Floating Supports volume configuration? No Single or multiple types? Single Scalable? Yes |
Tracing Query
UI and API endpoint access for distributed tracing for S3 API and MAPI calls |
Container Options: Default
Service Options
|
RAM needed per instance: 768 MB Number of instances: minimum 1, best 1 Service unit cost: 5 Stateful or stateless? Stateless Persistent or floating? Floating Supports volume configuration? No Single or multiple types? Single Scalable? Yes |
System services: These services manage system resources and ensure that the HCP for cloud scale system remains available and accessible. These services are persistent and cannot be moved, scaled, or reconfigured. | ||
Admin App
The System Management application |
Service Options
|
RAM needed per instance: N/A Number of instances: N/A Persistent or floating? Persistent Supports volume configuration? Yes Single or multiple types? Single Scalable? No |
Cluster Coordination
Manages hardware resource allocation |
None. |
RAM needed per instance: N/A Number of instances: N/A Persistent or floating? Persistent Supports volume configuration? No Single or multiple types? Single Scalable? No |
Cluster Worker
Agent for Cluster Coordination on each instance; reports on resource utilization and availability, deploys services |
None. |
RAM needed per instance: N/A Number of instances: N/A Service unit cost: 5 Persistent or floating? Persistent Supports volume configuration? Yes Single or multiple types? Single Scalable? No |
Network Proxy
Network request load balancer |
Security Protocol: Select which Transport Layer Security (TLS) versions to use:
SSL Ciphers: To use another cipher suite, type it here. Custom Global Configuration: Select Enable Advanced Global Configuration to enable adding custom parameters to the HAProxy "global" section. Custom Defaults Configuration: Select Enable Defaults Configuration to enable adding custom parameters to the HAProxy "defaults" section. |
RAM needed per instance: N/A Number of instances: N/A Service unit cost: 1 Persistent or floating? Persistent Supports volume configuration? Yes Single or multiple types? Single Scalable? No |
Sentinel
Runs internal system processes and monitors the health of other services |
Service Options
|
RAM needed per instance: N/A Number of instances: N/A Persistent or floating? Persistent Supports volume configuration? Yes Single or multiple types? Single Scalable? No |
Service Deployment
Handles deployment of high-level services (that is, the services that you can configure) |
None. |
RAM needed per instance: N/A Number of instances: N/A Persistent or floating? Persistent Supports volume configuration? Yes Single or multiple types? Single Scalable? No |
Synchronization
Coordinates service configuration settings and other information across service instances |
Service Options
|
RAM needed per instance: N/A Number of instances: N/A Persistent or floating? Persistent Supports volume configuration? Yes Single or multiple types? Single Scalable? No |
Watchdog
Responsible for initial system startup; monitors other System services and restarts them if necessary |
Service Options
|
RAM needed per instance: N/A Number of instances: N/A Service unit cost: 5 Persistent or floating? Persistent Supports volume configuration? Yes Single or multiple types? Single Scalable? No |
Viewing services
You can use the Admin App, CLI commands, or REST API methods to view the status of all services for the system.
Viewing all services
Procedure
1. To view the status of all services, in the Admin App, click Services.
For each service, the page shows:
- The service name
- The service state:
- Healthy: The service is running normally.
- Unconfigured: The service has yet to be configured and deployed.
- Deploying: The system is currently starting or restarting the service. This can happen when:
- You move the service to run on a completely different set of instances.
- You repair a service.
- Balancing: The service is running normally, but performing background maintenance.
- Under-protected: In a multi-instance system, one or more of the instances on which a service is configured to run are offline.
- Failed: The service is not running or the system cannot communicate with the service.
- CPU Usage: The current percentage CPU usage for the service across all instances on which it's running.
- Memory: The current RAM usage for the service across all instances on which it's running.
- Disk Used: The current total amount of disk space that the service is using across all instances on which it's running.
Viewing individual service status
Procedure
1. To view the detailed status for an individual service, select the service in the Services window.
In addition to status information, the window shows:
- Instances: A list of all instances on which the service is running.
- Volumes: To view a list of volumes used by the service, click the row for an instance in the Instances section.
- Network: [Internal|External]: Which network type this service uses to receive communications.
This section also displays a list of the ports that the service uses.
- Configuration settings: The settings you can configure for the service.
- Service Units: The total number of service units currently being spent to run this service. This value is equal to the service's service unit cost times the number of instances on which the service is running.
- Service unit cost: The number of service units required to run the service on one instance.
- Service Instance Types: For services that have multiple types, the types that are currently running.
- Instance Pool: For floating services, the instances that this service is eligible to run on.
- Events: A list of all system events for the service.
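The service-unit arithmetic described above (cost times number of instances) is easy to verify. The following sketch uses the default cost values from the table in this module; the function name is illustrative, not part of the product:

```python
# Illustrative sketch: total service units = per-instance service unit
# cost * number of instances the service runs on. Costs below are the
# defaults listed in the services table in this module.

SERVICE_UNIT_COST = {
    "S3 Gateway": 25,
    "Metadata Gateway": 50,
    "Message Queue": 10,
    "Cassandra": 10,
}

def service_units(service: str, instance_count: int) -> int:
    """Service units currently spent to run a service."""
    return SERVICE_UNIT_COST[service] * instance_count

# Example: S3 Gateway (cost 25) running on 4 instances.
print(service_units("S3 Gateway", 4))        # 100
print(service_units("Metadata Gateway", 3))  # 150
```

Comparing these totals across services gives a rough picture of relative resource weight; for HCP for cloud scale, the document notes the absolute values can safely be ignored.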
Related CLI commands
getService
listServices
Related REST API methods
POST /services/query
You can get help on specific REST API methods for the Admin App at REST API - Admin.
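As a hedged illustration, this Python sketch builds a request for the POST /services/query method named above. Only the endpoint path comes from this document; the base URL, port, bearer-token scheme, and empty-body query are assumptions and may differ on a real system:

```python
# Sketch of preparing a call to the Admin App REST API to list services.
# The path /services/query is from this document; the base URL, auth
# header scheme, and request body shape are assumptions.
import json
from urllib import request

def build_services_query(base_url: str, token: str) -> request.Request:
    """Build (but do not send) a query for all services."""
    req = request.Request(
        url=f"{base_url}/services/query",
        data=json.dumps({}).encode("utf-8"),  # assumed: empty query = all services
        method="POST",
    )
    req.add_header("Content-Type", "application/json")
    req.add_header("Authorization", f"Bearer {token}")  # assumed auth scheme
    return req

req = build_services_query("https://cluster.example.com:8000/api/admin", "TOKEN")
print(req.get_full_url())  # https://cluster.example.com:8000/api/admin/services/query
print(req.get_method())    # POST
# To actually send it: request.urlopen(req) -- requires a reachable system.
```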
Listing service ports
You can list service port information for ports available for customer use.
POST /public/discovery/get_service_port
For information about specific API methods, see the MAPI Reference or, in the Object Storage Management application, click the profile icon and select REST API.
Managing services
This section describes how you can reconfigure, restart, and otherwise manage the services running on your system.
Moving and scaling services
You can change a service to run on more instances, fewer instances, or different instances.
You can change a service to run on:
- Additional instances (for example, to improve service performance and availability)
- Fewer instances (for example, to free up resources on an instance for running other services)
- A different set of instances (for example, to retire the piece of hardware on which an instance is installed)
For floating services, instead of specifying the specific instances on which the service runs, you can specify a pool of eligible instances, any of which can run the service.
When moving or scaling a service that has multiple types, you can simultaneously configure separate rebalancing for each type.
- Monitor resource usage (CPU, RAM, disk) and services such as Prometheus regularly and adjust the scaling of services across instances as needed.
- Ensure that there are enough instances so that the cluster can still manage the volume and growth of objects even if one or two instances fail.
- Avoid distributing instances of product services onto master instances. For more information, see Best practices for system sizing and scaling.
- You cannot remove a service from an instance if doing so would cause or risk causing data loss.
- Service relocations can take a long time to complete and can impact system performance while they are running.
- Instance needs vary from service to service. Each service defines the minimum and maximum number of instances on which it can run.
Relocating services
Procedure
1. Select Services.
The Services page opens, displaying the services and system services.
2. Select the service that you want to scale or move.
Configuration information for the service is displayed.
3. Click Scale, and if the service has more than one type, select the instance type that you want to scale.
The next step depends on whether the service is floating or persistent (non-floating).
4. If the service is a floating service, you are presented with options for configuring an instance pool:
- In the Service Instances box, specify the number of instances on which the service should be running at any time.
- Configure the instance pool:
- For the service to run on any instance in the system, select All Available Instances. With this option, the service can be restarted on any instance in the instance pool, including instances that were added to the system after the service was configured.
- For the service to run on a specific set of instances, clear All Available Instances. Then:
- To remove an instance from the pool, select it in the Instance Pool list, on the left, and then click Remove Instances.
- To add an instance to the pool, select it in the Available Instances list, on the right, and then click Add Instances.
5. If the service is a persistent (non-floating) service, you are presented with options for selecting the specific instances on which the service runs. Do one or both of these, and then click Next:
- To remove the service from the instances it's currently on, select one or more instances in the Selected Instances list, on the left, and then click Remove Instances.
- To add the service to other instances, select one or more instances in the Available Instances list, on the right, and then click Add Instances.
6. Click Update.
The Processes page opens, and the Service Operations tab displays the progress of the service update as "Running." When the update finishes, the service shows "Complete."
Next steps
Related CLI commands
updateServiceConfig
Related REST API methods
POST /services/configure
You can get help on specific REST API methods for the Admin App at REST API - Admin.
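For illustration only, this sketch serializes a hypothetical payload for the POST /services/configure method named above, as used when scaling a floating service. The endpoint name is from this document; the field names (numberOfInstances, instancePool) are assumptions, not the documented schema:

```python
# Hypothetical sketch of a scale/move request body for a floating
# service. Field names are illustrative assumptions.
import json

def build_scale_payload(service_name: str, instance_count: int,
                        instance_pool=None) -> str:
    """Serialize a scale request; a pool of None means any instance."""
    body = {
        "name": service_name,
        "numberOfInstances": instance_count,     # assumed field name
        "instancePool": instance_pool or "ALL",  # assumed field name
    }
    return json.dumps(body, sort_keys=True)

payload = build_scale_payload("S3 Gateway", 3,
                              ["10.0.0.1", "10.0.0.2", "10.0.0.3"])
print(payload)
```

On a real system, consult the Admin App REST API help for the actual request schema before constructing such a payload.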
Scaling Metadata Gateway instances
The HCP for cloud scale software lets you deploy an instance of the Metadata Gateway service on every node in your system. You can scale the number of instances up or down as needed.
The Metadata Coordination service manages Metadata Gateway scaling. The service does the following:
- Constantly monitors the Metadata Gateway service and balances data among Metadata Gateway instances as needed
- Moves data into new Metadata Gateway instances
- Moves data out of a Metadata Gateway instance set for removal
Use the System Management application to add new Metadata Gateway instances. You can add more than one instance at a time.
Use the System Management application to remove a Metadata Gateway instance. Before you scale down Metadata Gateway instances, consider the following:
- You can remove a Metadata Gateway instance only when no more than one Metadata Gateway instance is down.
Note: If more than one instance is down, call Support to remove a Metadata Gateway instance.
- You cannot remove a Metadata Gateway instance when there are only three instances. You first need to add a new Metadata Gateway instance.
- You can only remove one Metadata Gateway instance at a time.
If a Metadata Gateway instance is down, the data in this instance becomes underprotected. To resolve this situation, remove the Metadata Gateway instance that is down so that the Metadata Gateway service can recover the data protection. You should first add a new Metadata Gateway instance before removing the instance that is down. This ensures that the system keeps the same performance and capacity usage and also that there is a suitable target instance to recover the data protection. When removing the Metadata Gateway instance, the considerations on scaling down services apply.
A snapshot captures the current state of the state machine and is sent from a leader node (service instance) to any follower service instance that is out of sync. If a leader node runs out of space to store snapshots and can't send its latest snapshot, the follower node cannot resynchronize. If this happens, bring down the leader service instance, increase its storage space, and restart the service.
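The scale-down preconditions above can be summarized as a small check. This sketch is illustrative only; the function and messages are hypothetical, not part of the product:

```python
# Illustrative encoding of the documented rules for removing a
# Metadata Gateway instance: remove one at a time, never with more
# than one instance down, and never when only three instances exist.

def can_remove_metadata_gateway(total_instances: int, down_instances: int,
                                removing: int = 1):
    """Return (allowed, reason) for a proposed instance removal."""
    if removing != 1:
        return (False, "Remove only one Metadata Gateway instance at a time.")
    if down_instances > 1:
        return (False, "More than one instance is down: call Support.")
    if total_instances <= 3:
        return (False, "Only three instances: add a new instance first.")
    return (True, "OK")

print(can_remove_metadata_gateway(4, 1))  # allowed: remove the down instance
print(can_remove_metadata_gateway(3, 0))  # blocked: add an instance first
print(can_remove_metadata_gateway(5, 2))  # blocked: call Support
```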
Configuring service settings
You can configure settings for some of the services that the system runs.
Procedure
1. Select Dashboard > Services.
2. Select the service you want to configure.
3. On the Configuration tab, configure the service.
4. Click Update.
Related CLI commands
updateServiceConfig
Related REST API methods
POST /services/configure
You can get help on specific REST API methods for the Admin App at REST API - Admin.
Repairing services
If a service becomes slow, unresponsive, or shows a status of Failed, you can repair it. If you change the configuration of a service, use the same process to restart it.
Repairing a service stops and restarts the service on each instance on which it's running.
If you change the cluster name (cluster hostname), you must repair the S3 Gateway services for the change to take effect.
If you regenerate or upload an SSL certificate, you must repair the S3 Gateway and MAPI services for the change to take effect.
If you upload an SSL certificate for access to a remote system for bucket synchronization, you must repair the Policy Engine and MAPI services for the change to take effect.
Procedure
1. Select Dashboard > Services.
2. Select the service you want to repair.
3. Click Repair.
The Processes window opens, displaying a progress bar for the repair process.
Configuring TLS cipher suite
Procedure
1. Select Dashboard > Services.
2. Select the service S3-Gateway.
3. On the Configuration tab, in the HTTPS Options section, enter the new cipher suite in the SSL Ciphers field.
4. Click Update.
The service redeploys.
Avoiding Message Queue shutdown
If two of the three Message Queue service instances fail, the service shuts down. To avoid the possible loss of queued messages, resolve any situation in which only two service instances are running.
To protect messaging consistency, the Message Queue service always has three service instances. To prevent being split into disconnected parts, the service shuts down if half of the service instances fail. In practice, messaging stops if two of the three instances fail.
Do not let the service run with only two instances, because if one of the two remaining instances fails, the service shuts down. However, when one of the failed instances restarts, messaging services recover and resume.
To protect the Message Queue service, immediately address a node failure where an instance cannot be restarted, because if two service instances are lost and cannot be recovered, the service cannot recover its previous state. You can still add new instances to form a new cluster, but messages that were queued are lost.
In the case of such a multi-node failure, after the Message Queue service cluster is re-formed, the best practice is to restart the Policy Engine service instances, and, if used, the Mirror In, Mirror Out, and S3 Notifications microservice instances, one at a time. This forces the service instances to recover configurations that might have been missed while the Message Queue service was down. Additionally, after the Message Queue service cluster is re-formed, bucket sync-to events that were in the messaging queues are lost, so you might need to regenerate bucket sync-to events for such objects.
The cluster forms based on instance names, which include the IP address of the node on which an instance runs. Changing node configurations such as IP addresses can therefore cause nodes to be permanently removed from the cluster, possibly triggering a shutdown. To avoid this, before taking nodes offline or changing node configurations such as IP addresses, first add instances to the messaging service and ensure that they synchronize with the cluster. This way, the cluster always keeps more than half of its instances running.
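The shutdown behavior described above is a standard strict-majority (quorum) rule: the service stays up only while more than half of its instances are running. A minimal sketch, for illustration:

```python
# Quorum rule behind the Message Queue shutdown behavior: the service
# shuts down once half (or more) of its instances have failed.

def queue_available(total: int, failed: int) -> bool:
    """True while a strict majority of instances is still running."""
    running = total - failed
    return running > total / 2

print(queue_available(3, 0))  # True: all three instances up
print(queue_available(3, 1))  # True: two of three up, but at risk
print(queue_available(3, 2))  # False: service shuts down
```

This is why a three-instance cluster tolerates exactly one failure, and why a two-instance state should be resolved immediately.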
Avoiding service underscaling
If services are underscaled relative to product usage, responsiveness and performance can suffer. Scaling up services or adding nodes can alleviate these problems.
A service is underscaled when it has fewer than the required or sufficient number of service instances running.
The main symptom of underscaling is that performance degrades. You can identify underscaling by monitoring certain metrics.
The Data Lifecycle service manages, among other things, backend garbage collection. An object is deleted in a multistep process: the object metadata is replaced by a tombstone marker, the object itself is deleted from backend storage, and eventually, as per your retention policy, the tombstone is deleted as well. For a system used for rapid creation and deletion of objects, sometimes called a ring-buffer use case, underscaling might manifest itself as storage components using more capacity than expected because the cleanup policy is falling behind. You can monitor Data Lifecycle performance using the metric lifecycle_policy_concurrency or its rate per minute (rate(lifecycle_policy_concurrency[1m])) to show how many objects are being concurrently processed per lifecycle type. This should be zero if the policies are not running. If the metric continues to increase over time, the service might be underscaled.
The S3 Gateway service is designed to handle a set number of concurrent requests. If your workflow exceeds its capacity, S3 requests can be delayed. You can monitor S3 Gateway performance using the metric http_s3_servlet_operations_total or its rate per minute (rate(http_s3_servlet_operations_total[1m])) to show how many operations are being completed.
The Metadata Gateway service manages metadata partitions. If partition counts become very high, as can happen on systems storing large numbers of objects or objects deleted after long retention periods, performance can degrade. You can monitor partitions using the metric mcs_partitions_per_instance.
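The rate() expressions above compute the per-second increase of a metric over a time window. As a simplified illustration (ignoring the counter-reset handling and extrapolation that the real Prometheus rate() function performs), the calculation looks like this:

```python
# Simplified model of a Prometheus-style rate() over a range vector:
# the per-second increase of a counter, computed from the first and
# last (timestamp_seconds, value) samples in the window.

def per_second_rate(samples):
    """Approximate rate(): delta of value / delta of time."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)

# Counter grew from 120 to 420 operations over a 60-second window:
samples = [(0.0, 120.0), (60.0, 420.0)]
print(per_second_rate(samples))        # 5.0 per second
print(per_second_rate(samples) * 60)   # 300.0 per minute
```

A steadily increasing rate for lifecycle_policy_concurrency, or a sustained plateau in the S3 operation rate below your expected workload, is the kind of signal the monitoring guidance above describes.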
Solutions to an underscaled service include:
- Distributing the service onto more nodes (physical instances)
- Installing additional nodes and distributing service instances onto them
For example, scaling up to two S3 Gateway service instances doubles the capacity for S3 request processing. Scaling up the Metadata Gateway service lets the system load-balance partitions from heavily burdened nodes to unburdened nodes and smooths performance. Scaling up the Data Lifecycle service provides additional processing capacity for object lifecycle management and speeds the cleanup of deleted objects.
Service restart after internal network interruption
If the internal network is disconnected in one HCP for cloud scale node and then restored, the services Cassandra, Elasticsearch, and Kafka might not come back online automatically on the node, leaving the system underprotected.
If this happens, restart the hcpcs service on the node.