Monitoring
Your system provides several mechanisms for monitoring the health and performance of the system and all of its instances and services.
Monitoring instances
The Instances page lets you monitor instances (nodes) in the system. You can use the Admin App, CLI commands, or REST API methods to view a list of all instances in the system.
Viewing all instances
To view all instances, in the Admin App, click Dashboard > Instances.
The page shows all instances in the system. Each instance is identified by its IP address.
This table describes the information shown for each instance.
Property | Description |
State |
|
Services | The number of services running on the instance. |
Service Units | The total number of service units for all services and job types running on the instance, out of the best-practice service unit limit for the instance. An instance with a higher number of service units is likely to be more heavily used by the system than an instance with a lower number of service units. The Instances page displays a blue bar for instances running less than the best-practice service unit limit. The Instances page displays a red bar for instances running more than the best-practice service unit limit. |
Load Average | The load averages for the instance for the past one, five, and ten minutes. |
CPU | The sum of the percentage utilization for each CPU core in the instance. |
Memory Allocated |
This section shows both:
|
Memory Total | The total amount of RAM for the instance. |
Disk Used | The current amount of disk space that your system is using in the partition on which it is installed. |
Disk Free | The amount of free disk space in the partition in which your system is installed. |
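The Service Units bar color described in the table can be sketched as a simple comparison. This is only an illustration: the function name is hypothetical, and treating a count exactly equal to the limit as within the limit is an assumption, because the table describes only counts below and above the limit.

```python
def service_unit_bar(used_units: int, limit: int) -> str:
    """Return the bar color expected for an instance's service unit count.

    Assumption: a count exactly at the limit is treated as within the limit.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    return "red" if used_units > limit else "blue"


print(service_unit_bar(42, 100))   # below the best-practice limit
print(service_unit_bar(130, 100))  # above the best-practice limit
```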
Viewing the services running on an instance
To view the services running on an individual instance, in the Admin App:
Procedure
Click Dashboard > Instances.
Select the instance you want.
The page lists all services running on the instance.
For each service, the page shows:
- The service name
- The service state:
- Healthy: The service is running normally.
- Unconfigured: The service has yet to be configured and deployed.
- Deploying: The system is currently starting or restarting the service. This can happen when:
- You move the service to run on a completely different set of instances.
- You repair a service.
- Balancing: The service is running normally, but performing background maintenance.
- Under-protected: In a multi-instance system, one or more of the instances on which a service is configured to run are offline.
- Failed: The service is not running or the system cannot communicate with the service.
- CPU Usage: The current percentage CPU usage for the service across all instances on which it's running.
- Memory: The current RAM usage for the service across all instances on which it's running.
- Disk Used: The current total amount of disk space that the service is using across all instances on which it's running.
Related CLI commands
getInstance
listInstances
Related REST API methods
GET /instances
GET /instances/{uuid}
You can get help on specific REST API methods for the Admin App at REST API - Admin.
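As a rough sketch, requests for the two methods above could be prepared in Python as follows. The base URL, port, path prefix, bearer-token authentication, and the UUID are all placeholders assumed for illustration; check REST API - Admin for the actual address and authentication scheme. The code only builds the requests and does not send them.

```python
import urllib.request

# Hypothetical values: replace with your system's address and a valid token.
BASE_URL = "https://admin.example.com:8000/api/admin"
TOKEN = "example-token"


def build_get(path: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated GET request for an endpoint path."""
    return urllib.request.Request(
        BASE_URL + path,
        method="GET",
        headers={"Accept": "application/json", "Authorization": f"Bearer {TOKEN}"},
    )


# GET /instances and GET /instances/{uuid} (placeholder UUID).
list_request = build_get("/instances")
one_request = build_get("/instances/00000000-0000-0000-0000-000000000000")
print(list_request.get_method(), list_request.full_url)
```

Passing either request to urllib.request.urlopen would send it, provided the address and credentials are valid for your system.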
Monitoring services
The Services page lets you view information about service instances. You can use the Admin App, CLI commands, or REST API methods to view the status of all services for the system.
Viewing all services
To view the status of all services, in the Admin App, click Services.
For each service, the page shows:
- The service name
- The service state:
- Healthy: The service is running normally.
- Unconfigured: The service has yet to be configured and deployed.
- Deploying: The system is currently starting or restarting the service. This can happen when:
- You move the service to run on a completely different set of instances.
- You repair a service.
- Balancing: The service is running normally, but performing some background maintenance operations.
- Under-protected: In a multi-instance system, one or more of the instances on which a service is configured to run are offline.
- Failed: The service is not running or the system cannot communicate with the service.
- CPU Usage: The current percentage CPU usage for the service across all instances on which it's running.
- Memory: The current RAM usage for the service across all instances on which it's running.
- Disk Used: The current total amount of disk space that the service is using across all instances on which it's running.
Viewing individual service status
To view the detailed status for an individual service, select the service on the Services page.
In addition to the status information, the page shows:
- Instances: A list of all instances on which the service is running.
- Volumes: To view a list of volumes used by the service, select the row for an instance in the Instances section.
- Network: [Internal|External]: Which network type this service uses to receive communications.
This section also displays a list of the ports that the service uses.
- Configuration settings: The settings you can configure for the service.
- Service Units: The total number of service units currently being spent to run this service. This value is equal to the service's service unit cost times the number of instances on which the service is running.
- Service unit cost: The number of service units required to run the service on one instance.
- Service Instance Types: For services that have multiple types, the types that are currently running.
- Instance Pool: For floating services, the instances that this service is eligible to run on.
- Events: A list of all system events for the service.
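The relationship between Service Units and Service unit cost in the list above is a single multiplication. The following minimal sketch uses a hypothetical function name and made-up values.

```python
def total_service_units(unit_cost: int, instance_count: int) -> int:
    """Service units spent by a service: per-instance cost times instance count."""
    if unit_cost < 0 or instance_count < 0:
        raise ValueError("values must be non-negative")
    return unit_cost * instance_count


# Hypothetical service with a cost of 5 units running on 3 instances.
print(total_service_units(5, 3))  # 15
```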
Related CLI commands
getService
listServices
Related REST API methods
POST /services/query
You can get help on specific REST API methods for the Admin App at REST API - Admin.
Monitoring processes
The Processes page lets you view information about what the system is doing. This includes any service operations you started and any internal maintenance processes the system needs to run.
Monitoring service operations
You can use the Admin App, CLI commands, or REST API methods to monitor all service operations. These include:
- The initial deployments of services when the system was installed.
- Service relocations that you begin.
For each one, the system shows:
- The name of the service involved
- The status of the operation
- The number of steps completed out of the total number of steps
Procedure
Select Dashboard > Processes.
Related CLI commands
listSystemTasks
getSystemTask
Related REST API methods
GET /tasks/system
GET /tasks/system/{uuid}
You can get help on specific REST API methods for the Admin App at REST API - Admin.
Monitoring system processes
You can use the Admin App, CLI commands, or REST API methods to view the progress of internal system processes. These processes include package installation tasks and regularly scheduled system maintenance activities such as log rotation.
For each process, your system shows:
- The process name
- The process state
- The times at which each step in the process run occurred
Procedure
In the Admin App, select Processes.
To view the currently running processes, select the System tab.
To view the scheduled processes, select the Scheduled tab.
Related CLI commands
listSystemTasks
getSystemTask
Related REST API methods
GET /tasks/system
GET /tasks/system/{uuid}
You can get help on specific REST API methods for the Admin App at REST API - Admin.
Monitoring objects
You can use the REST API to configure and generate chargeback reports for objects on the system. Users can generate a report for one or more of the buckets they own. An administrator can generate a report for a user or a list of one or more buckets.
Generating a system chargeback report
You can use a REST API method to generate a system chargeback report. You can display a report for a specific user or a list of one or more buckets.
Related REST API methods
POST /chargeback/system/get_report
For information about specific API methods, see the MAPI Reference or, in the Object Storage Management application, click the profile icon and select REST API.
Generating a user chargeback report
You can use a REST API method to generate a chargeback report for a user. Users can display a report for a specific bucket, a list of buckets, or all buckets that they own.
Related REST API methods
POST /chargeback/user/get_report
For information about specific API methods, see the MAPI Reference or, in the Object Storage Management application, click the profile icon and select REST API.
System events
Your system maintains a log of system events that you can view using the Admin App, CLI commands, or REST API methods.
Procedure
To view all system events, in the Admin App, click Events.
Related CLI commands
queryEvents
To view events through the CLI, your requests need to specify which events you want to retrieve.
For example, this JSON request body searches the event log for all events that have a severity level of warning:
{ "severities": [ "warning" ] }
Related REST API methods
POST /events
To view events through the REST API, your requests need to specify which events you want to retrieve.
For example, this JSON request body searches the event log for all events that have a severity level of warning:
{ "severities": [ "warning" ] }
You can get help on specific REST API methods for the Admin App at REST API - Admin.
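The request body shown above can also be generated programmatically. This sketch uses only the severities field from the example; the helper name is hypothetical, and any other severity strings you pass are assumed to follow the same lowercase form as "warning".

```python
import json


def build_event_query(severities):
    """Return a JSON request body that filters events by severity."""
    return json.dumps({"severities": list(severities)})


body = build_event_query(["warning"])
print(body)
```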
HCP for cloud scale events
Most events are generated by and reported through the Object Storage Management application.
Events are written to syslog. Additionally, alerts corresponding to some events are displayed in the HCP for cloud scale applications.
The following table lists HCP for cloud scale events.
ID | Severity | Message | Description |
1109 | WARNING | Installation of package package failed: reason | The installation of the specified package failed for the specified reason. |
2004 | SEVERE | instance instance with IP ip_address is error. | |
2005 | WARNING | Instance with IP ip_address value is at usage. | |
2006 | SEVERE | Instance with IP ip_address value is at usage. | |
3002 | WARNING | Low-level service_name service on instance instance exited abnormally. Restarting. | The specified service exited abnormally and is restarting. |
5213 | WARNING | A certificate in the SSL server certificate chain for this system expires soon. If the certificate chain expires, users won't be able to access the system. | This event applies only to system certificates, not client (storage component) certificates. |
5214 | WARNING | The SSL server certificate chain for this system contains an expired certificate. Users cannot access the system until the certificate chain is replaced. | This event applies only to system certificates, not client (storage component) certificates. |
6001 | WARNING | Service service is balancing. | |
6002 | WARNING | Service service is under-protected. | The number of service instances has fallen below the required minimum. |
6003 | SEVERE | Service service has failed. | |
6006 | INFO | Service Information: Default Retention configuration policy_name bucket 'bucket_name' | The default retention policy policy_name for the specified bucket has been updated. |
6006 | INFO | Service Information: Failed to Retrieve Storage Capacity Information | The system could not retrieve capacity information from storage component id. Verify the storage component configuration. |
6006 | INFO | Service Information: Lifecycle policy {CREATE | UPDATE | DELETE} bucket 'bucket_name' | The lifecycle policy for the specified S3 bucket has been either created, updated, or removed. |
6066 | INFO | Service Information: Lifecycle policy deleted for bucket 'bucket_name' | The lifecycle policy for the specified S3 bucket has been removed. |
6066 | INFO | Service Information: Notifications configuration notification_rule bucket 'bucket_name' | Bucket notification has been updated. |
6066 | INFO | Service Information: Replication policy policy_name 'bucket_name' | The replication policy policy_name has been updated for the specified bucket. |
6066 | INFO | Service Information: Replication policy deleted for bucket 'bucket_name' | Bucket replication has been stopped. |
6006 | INFO | Service Information: S3 Encryption setting updated to value | The S3 encryption setting has been updated to the specified value. |
6006 | INFO | Service Information: Serial number updated to value | The HCP for cloud scale serial number has been changed to the specified value. |
6006 | INFO | Service Information: setting_name was set to value | The specified S3 setting has been changed to the specified value. If this was intended, no action is needed. |
6006 | INFO | Service Information: Single Storage Component Available Capacity Low | The available capacity for object data of storage component id is now below the specified value. You might need additional capacity. |
6006 | INFO | Service Information: Storage component 'id' created | The storage component id has been created. |
6006 | INFO | Service Information: Storage component 'id' is now state | The specified storage component is in one of the following states:
|
6006 | INFO | Service Information: Storage component 'id' updated: configuration | The specified storage component has been updated. configuration lists the changes. |
6006 | INFO | Service Information: System Available Capacity Low | The available capacity for object data of the system is now below the specified value. You might need to plan for additional capacity. |
6007 | WARNING | Certificate for SubjectDN dn will expire in n days | The SSL certificate for the specified client sync-to or sync-from target (specified by its Distinguished Name) is set to expire in n days. If the certificate expires, HCP for cloud scale will not be able to synchronize to or from the target. You might need to obtain a new client certificate. |
6007 | WARNING | Service Warning: Certificate for Storage component 'id' is about to expire in 'n' days | The SSL certificate for the specified storage component is set to expire in n days. If the certificate expires, HCP for cloud scale will not be able to read from or write to the storage component. You might need to obtain a new certificate. |
6007 | WARNING | Service Warning: Metadata-Coordination cannot communicate with Sentinel service to get state information | The Metadata Coordination service can't communicate with the Sentinel service. |
6007 | WARNING | Service Warning: Storage component 'id' is now INACCESSIBLE | The specified storage component is inaccessible. HCP for cloud scale cannot read from or write to the storage component. |
6008 | SEVERE | Certificate for SubjectDN dn expired on dd-mmm-yyyy | The SSL certificate for the specified client sync-to or sync-from target (specified by its Distinguished Name) has expired. HCP for cloud scale cannot synchronize to or from the target. You must obtain a new client certificate. |
6008 | SEVERE | Service Error: Storage Component Certificate has expired. | The SSL certificate for a storage component has expired. HCP for cloud scale cannot read from or write to the storage component. You must obtain a new certificate. |
6008 | SEVERE | Service Error: There is a critical issue with the Metadata Gateway database. Shutting down the Metadata Gateway Service. | |
6008 | SEVERE | Service Error: The vault service cannot be reached. | No connection to the active vault node can be established. |
6008 | SEVERE | Service Error: The vault service has a node that we can't connect to. Node IP: ip_address | One of the vault nodes can't be reached. If other active nodes are available, service continues, but attend to this issue immediately. |
6008 | SEVERE | Service Error: The vault service has a sealed node. Please unseal it using the unseal keys you obtained when you turned on encryption. Node IP: ip_address | One of the vault nodes is sealed. If other active nodes are available, service continues, but attend to this issue immediately. Unseal it using the unseal keys you obtained when you turned on encryption. |
6008 | SEVERE | Service Error: Vault Service Completely Sealed. Please unseal it using the unseal keys you obtained when you turned on encryption | All nodes of the vault service (Key Management Server service) are sealed. Unseal using the unseal keys you obtained when you turned on encryption. |
8001 | WARNING | Starting update from version to version. | |
8002 | WARNING | Update in progress from version to version. | |
8003 | SEVERE | Update from version to version prechecks failed. | Update failed because a pre-update verification failed. |
8004 | SEVERE | Update from version to version failed. | The update failed. |
8007 | WARNING | Completed update from version to version. | The update succeeded. |
9001 | WARNING | Signal Source source failed. Reason. | |
9002 | WARNING | Workflow workflow is not running and will be restarted. | The Monitor-App workflow is not running. It will be restarted. |
9003 | WARNING | The Monitor-App is not processing data fast enough. Dashboard data is more than n minutes behind the latest data from the source. While in this state, monitors might not be triggered or might be triggered unexpectedly. If this alert persists, the system might be undersized. Consider adding more instances. | The Monitor-App is starting to fall behind. |
9004 | SEVERE | The Monitor-App is not processing data fast enough. Dashboard data is more than n minutes behind the latest data from the source. While in this state, monitors might not be triggered or might be triggered unexpectedly. If this alert persists, the system might be undersized. Consider adding more instances. | The Monitor-App has fallen behind. |
Alerts
Alert messages notify you of situations that need attention. Alerts can have a severity of Info, Warning, Severe, or Critical. You can view system alerts through the Admin App, CLI, or REST API, and storage component alerts through the Object Storage Management app.
Each alert corresponds to a system event.
Severity | Alert Description | Action |
Severe | Instance ip-address disk usage severe threshold |
The specified instance has less than 10% free disk space. Add additional storage to the instance. Important: If an instance runs out of disk space, the system can become unresponsive. |
Severe | Master Instance ip-address is down |
Do one of these:
|
Severe | Service is down |
Verify the health of the instances. If one is down, do one of these:
Otherwise, if the instances are healthy and the problem persists, contact Support. |
Severe | Worker Instance ip-address is down |
Do one of these:
|
Warning | Instance ip-address disk usage warning threshold |
The specified instance has less than 25% free disk space. Add additional storage to the instance. Important: If an instance runs out of disk space, the system can become unresponsive. |
Warning | Package installation failed |
Your system failed to install a package that you uploaded. |
Warning | Service below recommendation |
The service is currently running on fewer than the minimum number of instances. Configure this service to run on additional instances. |
Warning | Service under-protected |
A service has lost redundancy; that is, one or more instances on which that service is running are unresponsive. Verify the health of the instances. If one is down, do one of these:
Otherwise, if the instances are healthy and the problem persists, contact Support. |
Warning | SSL server certificate chain expires soon |
A certificate in the SSL server certificate chain for this system expires soon. If the certificate chain expires, users can't access the system. |
Warning | SSL server certificate chain expired |
The SSL server certificate chain for this system contains an expired certificate. Users cannot access the system until the certificate chain is replaced. |
Info | Package installation in progress |
Your system is currently installing a package that you uploaded. Depending on the contents of the package, this might take a while. |
Warning | The certificate for the storage component (storage-id) is about to expire in n days | Renew the storage component certificate. |
Info | The storage component (storage-id) is unavailable | Verify that the storage component ID is correct and valid and that the storage component is active. |
Info | Update migration in progress. Current state: state | Update migration from a version before v2.3 is in progress. The state of migration is either OLD, MIGRATING, or CLEANUP. |
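The two disk usage alerts in the table above correspond to free-space thresholds of 10% and 25%. A minimal sketch of that classification follows; the function name is hypothetical, and how the system itself measures free space is not described here.

```python
def disk_alert_severity(free_bytes: int, total_bytes: int):
    """Classify free disk space against the documented 10% and 25% thresholds.

    Returns "Severe", "Warning", or None when neither threshold is crossed.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    free_pct = 100.0 * free_bytes / total_bytes
    if free_pct < 10:
        return "Severe"
    if free_pct < 25:
        return "Warning"
    return None


print(disk_alert_severity(5, 100))
print(disk_alert_severity(20, 100))
print(disk_alert_severity(50, 100))
```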
Severity | Message | Description |
Warning | Available capacity is below n {% | bytes} in the system for object data | The free capacity on the HCP for cloud scale system has fallen below the specified threshold (either a percentage of the total or a byte value). |
Warning | Certificate for Storage component id is about to expire in n days | The SSL certificate for the storage component id is set to expire in n days. If the certificate expires, HCP for cloud scale will not be able to read from or write to the storage component. |
Warning | Storage component id is now inaccessible | The storage component id is in the state INACCESSIBLE. HCP for cloud scale cannot read from or write to the storage component. |
Severe | Certificate for Storage component id expired | The SSL certificate for the storage component id has expired. HCP for cloud scale cannot read from or write to the storage component. Install a new certificate. |
Severe | Error communicating with a vault node. Node IP: ip_address | One of the vault nodes can't be reached. If other active nodes are available, service continues, but attend to this issue immediately. Examine the vault instance logs to determine the cause of this issue. |
Severe | Failed to connect to KMS server | One of the vault nodes can't be reached. If other active nodes are available, service continues, but attend to this issue immediately. If ingest is halted, then investigate why the KMS service is failing to run on all nodes. If ingest is still working, the original active node has failed over. Examine the vault instance logs to determine the cause of the failure. |
Severe | Failed to connect to KMS server as it is completely sealed | The vault service (Key Management Server service) is completely sealed. Unseal it using the unseal keys you obtained when you turned on encryption. |
Severe | Service error: There is a critical issue with the Metadata Gateway database. Shutting down the Metadata Gateway Service. | A Metadata Gateway instance has encountered an issue and shut down. Use the System Management Services function Repair to restart it. If restarting the service doesn't resolve the issue, contact Support. |
Severe | Vault node is sealed. Node IP: ip_address | One of the vault nodes is sealed. If other active nodes are available, service continues, but attend to this issue immediately. Unseal it using the unseal keys you obtained when you turned on encryption. |
Critical | Available capacity is below n {% | bytes} in Storage component id | The free capacity on the named HCP S Series Node storage component has fallen below the specified threshold (either a percentage of the total or a byte value). |
Critical | Failed to connect to KMS server | The Key Management System service is not available. Until the service is available, data on encrypted storage components can't be read or written. When the KMS service restarts, if there is only one active instance, log in to HCP for cloud scale on port 8200 and provide unseal keys to reopen the vault. |
Critical | Failed to retrieve capacity usage from Storage component id | System can't retrieve metrics from an HCP S Series Node storage component. Possible reasons are:
|
Critical | Failed verification for retrieved encryption key for StorageComponent_ID{uuid=uuid} | The encryption key returned from the Key Management System server doesn't match the key for the storage component uuid. Verify that the KMS service is available. If the service is available, verify that you have provided the service with a quorum of unseal keys. If objects on the storage component still can't be read, contact Support. |
Critical | Metadata-Coordination cannot communicate with Sentinel service to get state information | The Sentinel service is not responding to requests for state information. Using the System Management application, immediately review the health of the Metadata-Coordination and Sentinel services and ensure that the Sentinel container has adequate heap size for the configuration of the cluster. |
Severity | Message | Description |
Warning | Certificate for SubjectDN dn will expire in n days | The SSL certificate for the specified client sync-to or sync-from target (specified by its Distinguished Name) is set to expire in n days. If the certificate expires, HCP for cloud scale will not be able to synchronize to or from the target. You might need to obtain a new client certificate. |
Severe | Certificate for SubjectDN dn expired on dd-mmm-yyyy | The SSL certificate for the specified client sync-to or sync-from target (specified by its Distinguished Name) has expired. HCP for cloud scale cannot synchronize to or from the target. You must obtain a new client certificate. |
Viewing alerts
Procedure
To view alerts, click the user icon in the top right corner of any Admin App page and then click Notifications.
Object Storage Management application instructions
The Object Storage Management application displays alerts about storage components. If an alert is raised, the alert icon displays a badge with the number of active alerts.
Click the icon to display a window listing alert text.
Related CLI commands
listAlerts
Related REST API methods
GET /alerts
You can get help on specific REST API methods for the Admin App at REST API - Admin.
Related REST API methods
POST /alert/list
For information about specific API methods, see the MAPI Reference or, in the Object Storage Management application, click the profile icon and select REST API.
Email notification rules
For the system to send email notifications, you need to create a rule that specifies who to email, what email server to use, what events to send emails about, and what information to include in email messages.
- Enable: Turns on email notifications.
- Host: The hostname or IP address of the email server.
- Port: The port on which the email server listens for email messages.
- Security: The security protocol used by the email server (SSL or STARTTLS) or None if the email server doesn’t use a security protocol.
- Authenticated: Enable this if the email server needs authentication, then specify:
- In the Username field, the username for an email account that’s authorized to establish the connection between the system and the email server.
- In the Password field, the password for the email account.
You use the email notification message settings to configure a template for formatting all email notifications sent by the system.
- From: The email address from which you want email notifications to be sent.
- Subject: The email subject.
- Body: The email message body.
This table lists the variables you can use to make the email notification template. When the system sends an email notification, it replaces the variables in the notification with event-specific information.
Variable | Description |
$severity | Event severity: INFO, WARNING, or SEVERE. |
$subject | A short description of the event. |
$message | Event message text. |
$userName | Name of the user responsible for the event. |
$objectId | Unique identifier for component affected by the event. |
$subsystem | Category for the component affected by the event. |
$objectSourceId | Unique identifier of the internal system component or process that was the source of the event. Value is [unknown] for most events. |
- Email addresses: A comma-separated list of email addresses to send notification emails to.
- Severity Filter: The event severities about which to send email notifications. Can be one or more of these: INFO, WARNING, SEVERE.
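To illustrate how the template variables are replaced with event-specific information, the following sketch uses Python's string.Template, which happens to use the same $name syntax. The system's own template engine is not described here and may behave differently; the template text and event values below are made up for illustration.

```python
from string import Template

# Hypothetical subject and body templates using the documented variables.
subject_template = Template("[$severity] $subject")
body_template = Template("$message\nUser: $userName\nSubsystem: $subsystem")

# Made-up event values, for illustration only.
event = {
    "severity": "WARNING",
    "subject": "Service under-protected",
    "message": "Service example-service is under-protected.",
    "userName": "admin",
    "subsystem": "services",
}

subject = subject_template.substitute(event)
body = body_template.substitute(event)
print(subject)
print(body)
```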
Creating email notification rules
Procedure
Select Dashboard > Configuration.
Click Notifications.
Click Create.
In the Type field, select Email.
Type a name for the notification rule.
Under SMTP settings, click Enable to enable the rule.
Configure the SMTP and message settings for the notification rule.
Specify a comma-separated list of emails to send notifications to.
Click Create.
Related CLI commands
createNotificationRule
Related REST API methods
POST /notifications
You can get help on specific REST API methods for the Admin App at REST API - Admin.
Syslog notification rules
When you create a syslog notification rule, the system sends log messages to your syslog server for each applicable system event.
- Enable: Turns on syslog notifications
- Host: The hostname or IP address of the syslog server
- Port: The port on which the syslog server listens for log messages
- Facility: Category for the messages sent by this notification rule
You use the syslog notification message settings to configure a template for formatting all syslog notifications sent by this notification rule.
- Message: The message to send. You can use these variables as part of the message:
Variable | Description |
$severity | Event severity: INFO, WARNING, or SEVERE. |
$subject | A short description of the event. |
$message | Event message text. |
$time | Time at which the event occurred. |
$userName | Name of the user responsible for the event. |
$subsystem | Category for the component affected by the event. |
$objectId | Unique identifier for component affected by the event. |
$objectType | The type of the component affected by the event. |
$objectSourceId | Unique identifier of the internal system component or process that was the source of the event. Value is [unknown] for most events. |
$objectSourceType | Type of the internal system component or process that was the source of the event. Value is [unknown] for most events. |
- Sender Identity: Identity of the sender for the event. Sent with every syslog message.
- Severity Filter: The event severities about which to send syslog notifications. Can be one or more of these: INFO, WARNING, or SEVERE.
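For context on the Facility setting: in the standard syslog protocol, each message carries a priority value computed as the facility code times 8 plus the severity code. The sketch below shows that arithmetic. The facility codes for local0 through local7 are the standard ones, but the mapping from event severities to syslog severity codes is an assumption made for illustration; this section does not state which mapping the system uses.

```python
# Standard syslog facility codes for local0 through local7.
FACILITIES = {f"local{i}": 16 + i for i in range(8)}

# Assumed mapping from event severities to syslog severity codes
# (6 = informational, 4 = warning, 2 = critical).
SEVERITIES = {"INFO": 6, "WARNING": 4, "SEVERE": 2}


def syslog_priority(facility: str, event_severity: str) -> int:
    """Compute the syslog priority value: facility code * 8 + severity code."""
    return FACILITIES[facility] * 8 + SEVERITIES[event_severity]


print(syslog_priority("local0", "WARNING"))  # 16 * 8 + 4 = 132
```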
Creating syslog notification rules
Procedure
Select Dashboard > Configuration.
Click Notifications.
Click Create.
In the Type field, select Syslog.
Type a name for the notification rule.
Under Syslog settings, click Enable to enable the rule.
Configure the settings for the notification rule.
Specify a severity filter for the notification rule.
Click Create.
Related CLI commands
createNotificationRule
Related REST API methods
POST /notifications
You can get help on specific REST API methods for the Admin App at REST API - Admin.
Logs and diagnostic information
Each service maintains its own set of logs. By default, log files are maintained in the folder install_path/hcpcs/log on each instance in the system. During installation, you can configure each service to store its logs in a different (that is, non-default) location.
Log levels
The following table lists the available log levels.
Level | Levels included |
ALL | FATAL, ERROR, WARN, INFO, DEBUG, TRACE |
TRACE | FATAL, ERROR, WARN, INFO, DEBUG, TRACE |
DEBUG | FATAL, ERROR, WARN, INFO, DEBUG |
INFO | FATAL, ERROR, WARN, INFO |
WARN | FATAL, ERROR, WARN (default) |
ERROR | FATAL, ERROR |
FATAL | FATAL |
OFF | None |
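The table can be read as an ordered list in which each level includes itself and every more severe level. A minimal sketch of that rule, with a hypothetical function name:

```python
# Levels ordered from most to least severe, following the table.
LEVELS = ["FATAL", "ERROR", "WARN", "INFO", "DEBUG", "TRACE"]


def included_levels(configured: str):
    """Return the message levels written when a service is set to `configured`."""
    if configured == "OFF":
        return []
    if configured == "ALL":
        return list(LEVELS)
    return LEVELS[: LEVELS.index(configured) + 1]


print(included_levels("WARN"))
```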
Log management
You can manage any of the log files yourself. That is, you can delete or archive them as necessary.
System logs are managed automatically in these ways:
- Retirement: All log files are periodically added to a compressed file and moved to install_path/hcpcs/retired/. This occurs at least once a day, but can also occur:
- Whenever you run the
log_download
script. - Hourly, if the system instance's disk space is more than 60% full.
- At the optimum time for a specific service.
- Whenever you run the
- Rotation: When a log file grows larger than 10 MB in size, the system stops writing to that file, renames it, and begins writing to a new file. For example, if the file exampleService.log.0 grows to 10 MB, it is renamed to exampleService.log.1 and the system creates a new file named exampleService.log.0 to write to.
- Removal: When a log file becomes older than 90 days, it is removed. If the system instance's disk space is more than 70% full, log files are deleted when they become older than one day.
- When an optimum number of log files for a specific service is reached, the system can overwrite the oldest file. For example, if a service is limited to 20 log files, when the file exampleService.log.19 is filled, the system overwrites the file named exampleService.log.0.
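The thresholds described above can be summarized in a short Python sketch. It is only a model of the documented rules, not the system's implementation. The function names are hypothetical, and treating 10 MB as 10 * 1024 * 1024 bytes is an assumption.

```python
ROTATE_SIZE_BYTES = 10 * 1024 * 1024  # assumption: 10 MB counted in binary units
NORMAL_MAX_AGE_DAYS = 90              # removal age under normal conditions
LOW_SPACE_MAX_AGE_DAYS = 1            # removal age when the disk is over 70% full


def should_rotate(size_bytes):
    """A log file is rotated once it grows larger than 10 MB."""
    return size_bytes > ROTATE_SIZE_BYTES


def should_remove(age_days, disk_used_percent):
    """A log file is removed after 90 days, or after one day if disk use exceeds 70%."""
    if disk_used_percent > 70:
        max_age = LOW_SPACE_MAX_AGE_DAYS
    else:
        max_age = NORMAL_MAX_AGE_DAYS
    return age_days > max_age


def retires_hourly(disk_used_percent):
    """Retirement can run hourly when the disk is more than 60% full."""
    return disk_used_percent > 60


print(should_rotate(11 * 1024 * 1024))
print(should_remove(age_days=2, disk_used_percent=75))
print(should_remove(age_days=2, disk_used_percent=50))
```

In this model, a two-day-old log file is kept while disk usage is at 50% but becomes eligible for removal once disk usage passes 70%.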
Retrieving logs and diagnostic information
The log_download tool lets you retrieve logs and diagnostic information from all instances in the system. The tool is located at this path on each instance:
install_path/hcpcs/bin/log_download
For information about running the tool, use this command:
install_path/hcpcs/bin/log_download -h
- When using the log_download tool, if you specify the --output option, do not specify an output path that contains colons, spaces, or symbolic links. If you omit the --output option, you cannot run the script from within a folder path that contains colons, spaces, or symbolic links.
- When you run the log_download script, all log files are automatically compressed and moved to the folder install_path/hcpcs/retired/.
- If an instance is down, you need to specify the --offline option to collect the logs from that instance. If your whole system is down, you need to run the log_download script with the --offline option on each instance.
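Before you run the tool, you can check a candidate output path against the restrictions above. The following Python sketch shows one way to do that. The helper names are hypothetical and are not part of log_download, and the tool's own validation might differ.

```python
import os

FORBIDDEN_CHARS = (":", " ")


def has_symlink_component(path):
    """Return True if the path or any of its parent folders is a symbolic link."""
    current = os.path.abspath(path)
    while True:
        if os.path.islink(current):
            return True
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            return False
        current = parent


def is_safe_output_path(path):
    """Return True if the path has no colons, spaces, or symbolic links."""
    if any(ch in path for ch in FORBIDDEN_CHARS):
        return False
    return not has_symlink_component(path)


print(is_safe_output_path("/tmp/my logs"))
print(is_safe_output_path("/tmp/a:b"))
```

The same check applies to the current working folder if you omit the --output option, for example by passing os.getcwd() to is_safe_output_path.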
Default log locations
By default, each service stores its logs on each instance on which the service instance runs, in its own folder at this path:
install_path/hcpcs/log
This table shows the default log folder names for each service. Depending on how your system was configured when first deployed, your system's logs might not be stored in these folders.
Service | Default log folder name | Contains information about |
Admin-App | com.hds.ensemble.plugins.service.adminApp | The System Management application. |
Database | com.hds.ensemble.plugins.service.cassandra | |
Scheduling | com.hds.ensemble.plugins.service.chronos | Workflow task scheduling. |
N/A | com.hds.ensemble.plugins.service.containerAction | Created by custom actions run by service plugins. |
Metrics | com.hds.ensemble.plugins.service.elasticsearch | The storage and indexing of: |
Network-Proxy | com.hds.ensemble.plugins.service.haproxy | Network requests between instances. |
Message Queue | com.hds.ensemble.plugins.service.kafka | The transmission of data between instances. |
Logging | com.hds.ensemble.plugins.service.logstash | The transport of system events and workflow task metrics to the Metrics service. |
Service-Deployment | com.hds.ensemble.plugins.service.marathon | The deployment of high-level services across system instances. High-level services are the ones that you can move and configure, not the services grouped under System Services. |
Cluster-Worker | com.hds.ensemble.plugins.service.mesosAgent | The work ordered by the Cluster-Coordination service. |
Cluster-Coordination | com.hds.ensemble.plugins.service.mesosMaster | Hardware resource allocation. |
Watchdog | com.hds.ensemble.plugins.service.remoteAction | Internal system processes. |
Sentinel | com.hds.ensemble.plugins.service.sentinel | The internal system processes. |
Watchdog | com.hds.ensemble.plugins.service.watchdog | General diagnostic information. |
Synchronization | com.hds.ensemble.plugins.service.zookeeper | The coordination of actions and database activities across instances. |
S3-Gateway | com.hitachi.aspen.foundry.service.clientaccess.data | The client access data service. |
Data-Lifecycle | com.hitachi.aspen.foundry.service.data-lifecycle.service | The data lifecycle service. |
Tracing-Agent | com.hitachi.aspen.foundry.service.jaeger.agent | The tracing agent service. |
Tracing-Collector | com.hitachi.aspen.foundry.service.jaeger.collector | The tracing collector service. |
Tracing-Query | com.hitachi.aspen.foundry.service.jaeger.query | The tracing query service. |
MAPI-Gateway | com.hitachi.aspen.foundry.service.mapi.gateway | The management API gateway. |
Policy-Engine | com.hitachi.aspen.foundry.service.metadata.async.policy.engine | The metadata asynchronous policy engine. |
Metadata-Cache | com.hitachi.aspen.foundry.service.metadata.cache | The metadata cache. |
Metadata-Coordination | com.hitachi.aspen.foundry.service.metadata.coordination | Metadata coordination. |
Metadata-Gateway | com.hitachi.aspen.foundry.service.metadata.gateway | The metadata gateway. |
Telemetry-Service | com.hitachi.aspen.foundry.service.metrics.prometheus | Telemetry. |
Message-Queue | com.hitachi.aspen.foundry.service.rabbitmq.server | The message broker. |
Key-Management-Server | com.hitachi.aspen.foundry.service.vault.vault | The key management server. |
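To locate a service's default log folder, join the default log path with the folder name from the table. The following Python sketch does this for a few of the services listed above. The function name and the example install path /opt/example are hypothetical, and the result is valid only if the default log locations were kept at installation.

```python
import posixpath

# A few entries copied from the table of default log folder names.
DEFAULT_LOG_FOLDERS = {
    "Admin-App": "com.hds.ensemble.plugins.service.adminApp",
    "Metrics": "com.hds.ensemble.plugins.service.elasticsearch",
    "S3-Gateway": "com.hitachi.aspen.foundry.service.clientaccess.data",
}


def default_log_dir(install_path, service):
    """Return install_path/hcpcs/log/<default log folder name> for a service."""
    return posixpath.join(
        install_path, "hcpcs", "log", DEFAULT_LOG_FOLDERS[service]
    )


# /opt/example stands in for install_path.
print(default_log_dir("/opt/example", "Admin-App"))
```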