Monitoring

Your system provides several mechanisms for monitoring the health and performance of the system and all of its instances and services.

Monitoring instances

The Instances page lets you monitor instances (nodes) in the system. You can use the Admin App, CLI commands, or REST API methods to view a list of all instances in the system.

Viewing all instances

To view all instances, in the Admin App, click Dashboard > Instances.

The page shows all instances in the system. Each instance is identified by its IP address.

This table describes the information shown for each instance.

| Property | Description |
| --- | --- |
| State | Up: The instance is reachable by other instances in the system. Down: The instance cannot be reached by other instances in the system. |
| Services | The number of services running on the instance. |
| Service Units | The total number of service units for all services and job types running on the instance, out of the best-practice service unit limit for the instance. An instance with a higher number of service units is likely to be more heavily used by the system than an instance with a lower number of service units. The Instances page displays a blue bar for instances running less than the best-practice service unit limit and a red bar for instances running more than the limit. |
| Load Average | The load averages for the instance for the past one, five, and ten minutes. |
| CPU | The sum of the percentage utilization for each CPU core in the instance. |
| Memory Allocated | Both the amount of RAM on the instance that's allocated to all services running on that instance and the percentage of this allocated RAM to the total RAM for the instance. |
| Memory Total | The total amount of RAM for the instance. |
| Disk Used | The current amount of disk space that your system is using in the partition on which it is installed. |
| Disk Free | The amount of free disk space in the partition in which your system is installed. |

Viewing the services running on an instance

To view the services running on an individual instance, in the Admin App:

Procedure

  1. Click Dashboard > Instances.

  2. Select the instance you want.

    The page lists all services running on the instance.

    For each service, the page shows:

    • The service name
    • The service state:
      • Healthy: The service is running normally.
      • Unconfigured: The service has yet to be configured and deployed.
      • Deploying: The system is currently starting or restarting the service. This can happen when:
        • You move the service to run on a completely different set of instances.
        • You repair a service.
      • Balancing: The service is running normally, but performing background maintenance.
      • Under-protected: In a multi-instance system, one or more of the instances on which a service is configured to run are offline.
      • Failed: The service is not running or the system cannot communicate with the service.
    • CPU Usage: The current percentage CPU usage for the service across all instances on which it's running.
    • Memory: The current RAM usage for the service across all instances on which it's running.
    • Disk Used: The current total amount of disk space that the service is using across all instances on which it's running.

Related CLI commands

getInstance

listInstances

Related REST API methods

GET /instances

GET /instances/{uuid}

You can get help on specific REST API methods for the Admin App at REST API - Admin.
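
As a rough sketch of the REST route, the following Python builds (but does not send) a GET /instances or GET /instances/{uuid} request and tallies instance states from a response. The base URL, port, bearer-token header, and the JSON field name `state` are assumptions made for illustration; check REST API - Admin for the actual address, authentication scheme, and response schema.

```python
import json
import urllib.request
from collections import Counter

# Hypothetical address: replace with your own Admin App host and API path.
BASE_URL = "https://admin.example.com:8000/api/admin"


def build_instances_request(token, uuid=None):
    """Build a GET request for /instances or /instances/{uuid} (not sent here)."""
    path = "/instances" if uuid is None else f"/instances/{uuid}"
    return urllib.request.Request(
        BASE_URL + path,
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
        method="GET",
    )


def count_states(instances):
    """Count instances per state; the 'state' key is an assumed field name."""
    return Counter(item.get("state", "unknown") for item in instances)


# Sample payload shaped the way this sketch assumes, not real API output.
sample = json.loads('[{"state": "Up"}, {"state": "Up"}, {"state": "Down"}]')
print(count_states(sample))
```

To send the request, pass it to `urllib.request.urlopen` and decode the JSON body before calling `count_states`.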

Monitoring services

The Services page lets you view information about service instances. You can use the Admin App, CLI commands, or REST API methods to view the status of all services for the system.

Viewing all services

To view the status of all services, in the Admin App, click Services.

For each service, the page shows:

  • The service name
  • The service state:
    • Healthy: The service is running normally.
    • Unconfigured: The service has yet to be configured and deployed.
    • Deploying: The system is currently starting or restarting the service. This can happen when:
      • You move the service to run on a completely different set of instances.
      • You repair a service.
    • Balancing: The service is running normally, but performing some background maintenance operations.
    • Under-protected: In a multi-instance system, one or more of the instances on which a service is configured to run are offline.
    • Failed: The service is not running or the system cannot communicate with the service.
  • CPU Usage: The current percentage CPU usage for the service across all instances on which it's running.
  • Memory: The current RAM usage for the service across all instances on which it's running.
  • Disk Used: The current total amount of disk space that the service is using across all instances on which it's running.

Viewing individual service status

To view the detailed status for an individual service, select the service on the Services page.

In addition to the status information, the page shows:

  • Instances: A list of all instances on which the service is running.
  • Volumes: To view a list of volumes used by the service, select the row for an instance in the Instances section.
  • Network: [Internal|External]: Which network type this service uses to receive communications.

    This section also displays a list of the ports that the service uses.

  • Configuration settings: The settings you can configure for the service.
  • Service Units: The total number of service units currently being spent to run this service. This value is equal to the service's service unit cost times the number of instances on which the service is running.
  • Service unit cost: The number of service units required to run the service on one instance.
  • Service Instance Types: For services that have multiple types, the types that are currently running.
  • Instance Pool: For floating services, the instances that this service is eligible to run on.
  • Events: A list of all system events for the service.
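
The Service Units value described above is a simple product, which the following sketch makes explicit. The function name and the example numbers are hypothetical; only the formula (service unit cost times the number of instances) comes from this page.

```python
def service_units_spent(unit_cost, instance_count):
    """Service Units = service unit cost x number of instances running the service."""
    if unit_cost < 0 or instance_count < 0:
        raise ValueError("cost and instance count must be non-negative")
    return unit_cost * instance_count


# Hypothetical numbers: a service costing 10 units per instance, on 3 instances.
print(service_units_spent(10, 3))  # 30
```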

Related CLI commands

getService

listServices

Related REST API methods

POST /services/query

You can get help on specific REST API methods for the Admin App at REST API - Admin.

Monitoring processes

The Processes page lets you view information about what the system is doing. This includes any service operations you started and any internal maintenance processes the system needs to run.

Monitoring service operations

You can use the Admin App, CLI commands, or REST API methods to monitor all service operations. These include:

  • The initial deployments of services when the system was installed.
  • Service relocations that you begin.

For each one, the system shows:

  • The name of the service involved
  • The status of the operation
  • The number of steps completed out of the total number of steps
Admin App instructions

Procedure

  1. Select Dashboard > Processes.

Results

The Service Operations tab shows information about in-progress and completed service operations.

Related CLI commands

listSystemTasks

getSystemTask

Related REST API methods

GET /tasks/system

GET /tasks/system/{uuid}

You can get help on specific REST API methods for the Admin App at REST API - Admin.

Monitoring system processes

You can use the Admin App, CLI commands, or REST API methods to view the progress of internal system processes. These processes include package installation tasks and regularly scheduled system maintenance activities such as log rotation.

For each process, your system shows:

  • The process name
  • The process state
  • The time at which each step of the process ran

Note: System processes have a type of SCHEDULED or ONE-TIME.

Admin App instructions

Procedure

  1. In the Admin App, select Processes.

  2. To view the currently running processes, select the System tab.

  3. To view the scheduled processes, select the Scheduled tab.

Related CLI commands

listSystemTasks

getSystemTask

Related REST API methods

GET /tasks/system

GET /tasks/system/{uuid}

You can get help on specific REST API methods for the Admin App at REST API - Admin.

Monitoring objects

You can use the REST API to configure and generate chargeback reports for objects on the system. Users can generate a report for one or more of the buckets they own. An administrator can generate a report for a user or a list of one or more buckets.

Generating a system chargeback report

You can use a REST API method to generate a system chargeback report. You can display a report for a specific user or a list of one or more buckets.

Related REST API methods

POST /chargeback/system/get_report

For information about specific API methods, see the MAPI Reference or, in the Object Storage Management application, click the profile icon and select REST API.

Generating a user chargeback report

You can use a REST API method to generate a chargeback report for a user. Users can display a report for a specific bucket, a list of buckets, or all buckets that they own.

Related REST API methods

POST /chargeback/user/get_report

For information about specific API methods, see the MAPI Reference or, in the Object Storage Management application, click the profile icon and select REST API.

System events

Your system maintains a log of system events that you can view using the Admin App, CLI commands, or REST API methods.

Admin App instructions

Procedure

  1. To view all system events, in the Admin App, click Events.

Related CLI commands

queryEvents

To view events through the CLI, your requests need to specify which events you want to retrieve.

For example, this JSON request body searches the event log for all events that have a severity level of warning:

{
  "severities": [
    "warning"
  ]
}

Related REST API methods

POST /events

To view events through the REST API, your requests need to specify which events you want to retrieve.

For example, this JSON request body searches the event log for all events that have a severity level of warning:

{
  "severities": [
    "warning"
  ]
}

You can get help on specific REST API methods for the Admin App at REST API - Admin.
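
A minimal sketch of how the request body above might be sent to POST /events from Python. The request is built but not sent. The base URL, port, and bearer-token header are assumptions for illustration; only the `severities` body and the endpoint path come from this page.

```python
import json
import urllib.request

# Hypothetical address: replace with your own Admin App host and API path.
BASE_URL = "https://admin.example.com:8000/api/admin"


def build_events_query(token, severities):
    """Build a POST /events request whose body filters events by severity."""
    body = json.dumps({"severities": list(severities)}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/events",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_events_query("my-token", ["warning"])
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```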

HCP for cloud scale events

Most events are generated by and reported through the Object Storage Management application.

Events are written to syslog. Additionally, alerts corresponding to some events are displayed in the HCP for cloud scale applications.

Note: The System Management application reports service-related events using the IDs 6006 (service information), 6007 (service warning), and 6008 (service error).

The following table lists HCP for cloud scale events.

| ID | Severity | Message | Description |
| --- | --- | --- | --- |
| 1109 | WARNING | Installation of package package failed: reason | The installation of the specified package failed for the specified reason. |
| 2004 | SEVERE | instance instance with IP ip_address is error. | |
| 2005 | WARNING | Instance with IP ip_address value is at usage. | |
| 2006 | SEVERE | Instance with IP ip_address value is at usage. | |
| 3002 | WARNING | Low-level service_name service on instance instance exited abnormally. Restarting. | The specified service exited abnormally and is restarting. |
| 5213 | WARNING | A certificate in the SSL server certificate chain for this system expires soon. If the certificate chain expires, users won't be able to access the system. | This event applies only to system certificates, not client (storage component) certificates. |
| 5214 | WARNING | The SSL server certificate chain for this system contains an expired certificate. Users cannot access the system until the certificate chain is replaced. | This event applies only to system certificates, not client (storage component) certificates. |
| 6001 | WARNING | Service service is balancing. | |
| 6002 | WARNING | Service service is under-protected. | The number of service instances has fallen below the required minimum. |
| 6003 | SEVERE | Service service has failed. | |
| 6006 | INFO | Service Information: Default Retention configuration policy_name bucket 'bucket_name' | The default retention policy policy_name for the specified bucket has been updated. |
| 6006 | INFO | Service Information: Failed to Retrieve Storage Capacity Information | The system could not retrieve capacity information from storage component id. Verify the storage component configuration. |
| 6006 | INFO | Service Information: Lifecycle policy {CREATE \| UPDATE \| DELETE} bucket 'bucket_name' | The lifecycle policy for the specified S3 bucket has been either created, updated, or removed. |
| 6066 | INFO | Service Information: Lifecycle policy deleted for bucket 'bucket_name' | The lifecycle policy for the specified S3 bucket has been removed. |
| 6066 | INFO | Service Information: Notifications configuration notification_rule bucket 'bucket_name' | Bucket notification has been updated. |
| 6066 | INFO | Service Information: Replication policy policy_name 'bucket_name' | The replication policy policy_name has been updated for the specified bucket. |
| 6066 | INFO | Service Information: Replication policy deleted for bucket 'bucket_name' | Bucket replication has been stopped. |
| 6006 | INFO | Service Information: S3 Encryption setting updated to value | The S3 encryption setting has been updated to the specified value. |
| 6006 | INFO | Service Information: Serial number updated to value | The HCP for cloud scale serial number has been changed to the specified value. |
| 6006 | INFO | Service Information: setting_name was set to value | The specified S3 setting has been changed to the specified value. If this was intended, no action is needed. |
| 6006 | INFO | Service Information: Single Storage Component Available Capacity Low | The available capacity for object data of storage component id is now below the specified value. You might need additional capacity. |
| 6006 | INFO | Service Information: Storage component 'id' created | The storage component id has been created. |
| 6006 | INFO | Service Information: Storage component 'id' is now state | The specified storage component is in one of the following states: ACTIVE, INACTIVE, or UNVERIFIED. |
| 6006 | INFO | Service Information: Storage component 'id' updated: configuration | The specified storage component has been updated. configuration lists the changes. |
| 6006 | INFO | Service Information: System Available Capacity Low | The available capacity for object data of the system is now below the specified value. You might need to plan for additional capacity. |
| 6007 | WARNING | Certificate for SubjectDN dn will expire in n days | The SSL certificate for the specified client sync-to or sync-from target (specified by its Distinguished Name) is set to expire in n days. If the certificate expires, HCP for cloud scale will not be able to synchronize to or from the target. You might need to obtain a new client certificate. |
| 6007 | WARNING | Service Warning: Certificate for Storage component 'id' is about to expire in 'n' days | The SSL certificate for the specified storage component is set to expire in n days. If the certificate expires, HCP for cloud scale will not be able to read from or write to the storage component. You might need to obtain a new certificate. |
| 6007 | WARNING | Service Warning: Metadata-Coordination cannot communicate with Sentinel service to get state information | The Metadata Coordination service can't communicate with the Sentinel service. |
| 6007 | WARNING | Service Warning: Storage component 'id' is now INACCESSIBLE | The specified storage component is inaccessible. HCP for cloud scale cannot read from or write to the storage component. |
| 6008 | SEVERE | Certificate for SubjectDN dn expired on dd-mmm-yyyy | The SSL certificate for the specified client sync-to or sync-from target (specified by its Distinguished Name) has expired. HCP for cloud scale cannot synchronize to or from the target. You must obtain a new client certificate. |
| 6008 | SEVERE | Service Error: Storage Component Certificate has expired. | The SSL certificate for a storage component has expired. HCP for cloud scale cannot read from or write to the storage component. You must obtain a new certificate. |
| 6008 | SEVERE | Service Error: There is a critical issue with the Metadata Gateway database. Shutting down the Metadata Gateway Service. | |
| 6008 | SEVERE | Service Error: The vault service cannot be reached. | No connection to the active vault node can be established. |
| 6008 | SEVERE | Service Error: The vault service has a node that we can't connect to. Node IP: ip_address | One of the vault nodes can't be reached. If other active nodes are available, service continues, but attend to this issue immediately. |
| 6008 | SEVERE | Service Error: The vault service has a sealed node. Please unseal it using the unseal keys you obtained when you turned on encryption. Node IP: ip_address | One of the vault nodes is sealed. If other active nodes are available, service continues, but attend to this issue immediately. Unseal it using the unseal keys you obtained when you turned on encryption. |
| 6008 | SEVERE | Service Error: Vault Service Completely Sealed. Please unseal it using the unseal keys you obtained when you turned on encryption | All nodes of the vault service (Key Management Server service) are sealed. Unseal using the unseal keys you obtained when you turned on encryption. |
| 8001 | WARNING | Starting update from version to version. | |
| 8002 | WARNING | Update in progress from version to version. | |
| 8003 | SEVERE | Update from version to version prechecks failed. | Update failed because a pre-update verification failed. |
| 8004 | SEVERE | Update from version to version failed. | The update failed. |
| 8007 | WARNING | Completed update from version to version. | The update succeeded. |
| 9001 | WARNING | Signal Source source failed. Reason. | |
| 9002 | WARNING | Workflow workflow is not running and will be restarted. | The Monitor-App workflow is not running. It will be restarted. |
| 9003 | WARNING | The Monitor-App is not processing data fast enough. Dashboard data is more than n minutes behind the latest data from the source. While in this state, monitors might not be triggered or might be triggered unexpectedly. If this alert persists, the system might be undersized. Consider adding more instances. | The Monitor-App is starting to fall behind. |
| 9004 | SEVERE | The Monitor-App is not processing data fast enough. Dashboard data is more than n minutes behind the latest data from the source. While in this state, monitors might not be triggered or might be triggered unexpectedly. If this alert persists, the system might be undersized. Consider adding more instances. | The Monitor-App has fallen behind. |
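
Because events are written to syslog and service-related events share three IDs, a log consumer may want to bucket them. The following is a minimal sketch, assuming events have already been parsed into (id, severity, message) tuples by a parser of your own. The tuple shape and function names are illustrative; only the ID-to-kind mapping and the sample messages come from this page.

```python
# ID-to-kind mapping taken from the note on service-related events.
SERVICE_EVENT_KINDS = {
    6006: "service information",
    6007: "service warning",
    6008: "service error",
}


def classify(event_id):
    """Return the service event kind, or None for other event IDs."""
    return SERVICE_EVENT_KINDS.get(event_id)


def needs_attention(events):
    """Keep only SEVERE events from (id, severity, message) tuples."""
    return [event for event in events if event[1] == "SEVERE"]


# Illustrative records; the messages are copied from the event list above.
events = [
    (6001, "WARNING", "Service service is balancing."),
    (6003, "SEVERE", "Service service has failed."),
    (6008, "SEVERE", "Service Error: The vault service cannot be reached."),
]
for event_id, severity, message in needs_attention(events):
    print(event_id, classify(event_id) or "other", message)
```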

Alerts

Alert messages notify you of situations that need attention. Alerts can have a severity of Info, Warning, Severe, or Critical. You can view system alerts through the Admin App, CLI, or REST API, and storage component alerts through the Object Storage Management app.

Each alert corresponds to a system event.

Tip: Before you deal with an alert, refresh the page. The underlying condition might have changed since the alert was raised.

System alerts
| Severity | Alert Description | Action |
| --- | --- | --- |
| Severe | Instance ip-address disk usage severe threshold | The specified instance has less than 10% free disk space. Add additional storage to the instance. Important: If an instance runs out of disk space, the system can become unresponsive. |
| Severe | Master Instance ip-address is down | Do one of these: (1) Restart the instance hardware or virtual machine. (2) Restart the script run on the instance. This script is located in the folder bin in the installation folder. |
| Severe | Service is down | Verify the health of the instances. If one is down, do one of these: (1) Restart the instance hardware or virtual machine. (2) Restart the script run on the instance. This script is located in the folder bin in the installation folder. Otherwise, if the instances are healthy and the problem persists, contact Support. |
| Severe | Worker Instance ip-address is down | Do one of these: (1) Restart the instance hardware or virtual machine. (2) Restart the script run on the instance. This script is located in the folder bin in the installation folder. |
| Warning | Instance ip-address disk usage warning threshold | The specified instance has less than 25% free disk space. Add additional storage to the instance. Important: If an instance runs out of disk space, the system can become unresponsive. |
| Warning | Package installation failed | Your system failed to install a package that you uploaded. |
| Warning | Service below recommendation | The service is currently running on fewer than the minimum number of instances. Configure this service to run on additional instances. |
| Warning | Service under-protected | A service has lost redundancy; that is, one or more instances on which that service is running are unresponsive. Verify the health of the instances. If one is down, do one of these: (1) Restart the instance hardware or virtual machine. (2) Restart the script run on the instance. This script is located in the folder bin in the installation folder. Otherwise, if the instances are healthy and the problem persists, contact Support. |
| Warning | SSL server certificate chain expires soon | A certificate in the SSL server certificate chain for this system expires soon. If the certificate chain expires, users can't access the system. |
| Warning | SSL server certificate chain expired | The SSL server certificate chain for this system contains an expired certificate. Users cannot access the system until the certificate chain is replaced. |
| Info | Package installation in progress | Your system is currently installing a package that you uploaded. Depending on the contents of the package, this might take a while. |
| Warning | The certificate for the storage component (storage-id) is about to expire in n days | Renew the storage component certificate. |
| Info | The storage component (storage-id) is unavailable | Verify that the storage component ID is correct and valid and that the storage component is active. |
| Info | Update migration in progress. Current state: state | Update migration from a version before v2.3 is in progress. The state of migration is either OLD, MIGRATING, or CLEANUP. |
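
The two disk usage thresholds among the system alerts can be expressed as a small check. This is a sketch of the rule as described above (less than 10% free is severe, less than 25% free is a warning), not the system's own implementation; the function name and the byte-count inputs are illustrative.

```python
def disk_alert_severity(free_bytes, total_bytes):
    """Return 'Severe', 'Warning', or None based on the percentage of free disk space."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    free_percent = 100 * free_bytes / total_bytes
    if free_percent < 10:
        return "Severe"
    if free_percent < 25:
        return "Warning"
    return None


# Hypothetical instance with 500 GB of disk and 60 GB free (12% free).
print(disk_alert_severity(60 * 10**9, 500 * 10**9))  # Warning
```
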
Storage component alerts
| Severity | Message | Description |
| --- | --- | --- |
| Warning | Available capacity is below n {% \| bytes} in the system for object data | The free capacity on the HCP for cloud scale system has fallen below the specified threshold (either a percentage of the total or a byte value). |
| Warning | Certificate for Storage component id is about to expire in n days | The SSL certificate for the storage component id is set to expire in n days. If the certificate expires, HCP for cloud scale will not be able to read from or write to the storage component. |
| Warning | Storage component id is now inaccessible | The storage component id is in the state INACCESSIBLE. HCP for cloud scale cannot read from or write to the storage component. |
| Severe | Certificate for Storage component id expired | The SSL certificate for the storage component id has expired. HCP for cloud scale cannot read from or write to the storage component. Install a new certificate. |
| Severe | Error communicating with a vault node. Node IP: ip_address | One of the vault nodes can't be reached. If other active nodes are available, service continues, but attend to this issue immediately. Examine the vault instance logs to determine the cause of this issue. |
| Severe | Failed to connect to KMS server | One of the vault nodes can't be reached. If other active nodes are available, service continues, but attend to this issue immediately. If ingest is halted, then investigate why the KMS service is failing to run on all nodes. If ingest is still working, the original active node has failed over. Examine the vault instance logs to determine the cause of the failure. |
| Severe | Failed to connect to KMS server as it is completely sealed | The vault service (Key Management Server service) is completely sealed. Unseal it using the unseal keys you obtained when you turned on encryption. |
| Severe | Service error: There is a critical issue with the Metadata Gateway database. Shutting down the Metadata Gateway Service. | A Metadata Gateway instance has encountered an issue and shut down. Use the System Management Services function Repair to restart it. If restarting the service doesn't resolve the issue, contact Support. |
| Severe | Vault node is sealed. Node IP: ip_address | One of the vault nodes is sealed. If other active nodes are available, service continues, but attend to this issue immediately. Unseal it using the unseal keys you obtained when you turned on encryption. |
| Critical | Available capacity is below n {% \| bytes} in Storage component id | The free capacity on the named HCP S Series Node storage component has fallen below the specified threshold (either a percentage of the total or a byte value). |
| Critical | Failed to connect to KMS server | The Key Management System service is not available. Until the service is available, data on encrypted storage components can't be read or written. When the KMS service restarts, if there is only one active instance, log in to HCP for cloud scale on port 8200 and provide unseal keys to reopen the vault. |
| Critical | Failed to retrieve capacity usage from Storage component id | The system can't retrieve metrics from an HCP S Series Node storage component. Possible reasons are: the storage component is not reachable; the system was upgraded from before v2.1; the management username or password is not valid; the HCP S Series Node version is not supported. |
| Critical | Failed verification for retrieved encryption key for StorageComponent_ID{uuid=uuid} | The encryption key returned from the Key Management System server doesn't match the key for the storage component uuid. Verify that the KMS service is available. If the service is available, verify that you have provided the service with a quorum of unseal keys. If objects on the storage component still can't be read, contact Support. |
| Critical | Metadata-Coordination cannot communicate with Sentinel service to get state information | The Sentinel service is not responding to requests for state information. Using the System Management application, immediately review the health of the Metadata-Coordination and Sentinel services and ensure that the Sentinel container has adequate heap size for the configuration of the cluster. |

Client certificate alerts
| Severity | Message | Description |
| --- | --- | --- |
| Warning | Certificate for SubjectDN dn will expire in n days | The SSL certificate for the specified client sync-to or sync-from target (specified by its Distinguished Name) is set to expire in n days. If the certificate expires, HCP for cloud scale will not be able to synchronize to or from the target. You might need to obtain a new client certificate. |
| Severe | Certificate for SubjectDN dn expired on dd-mmm-yyyy | The SSL certificate for the specified client sync-to or sync-from target (specified by its Distinguished Name) has expired. HCP for cloud scale cannot synchronize to or from the target. You must obtain a new client certificate. |

Viewing alerts

Procedure

  1. To view alerts, click the user icon (a silhouette of a head) in the top-right corner of any Admin App page, and then click Notifications.

Object Storage Management application instructions

The Object Storage Management application displays alerts about storage components. If an alert is raised, the alert icon displays a badge with the number of active alerts. For example, a badge showing "10" indicates that ten alerts are active.

Click the icon to display a window listing alert text.

Related CLI commands

listAlerts

Related REST API methods

GET /alerts

You can get help on specific REST API methods for the Admin App at REST API - Admin.

Related REST API methods

POST /alert/list

For information about specific API methods, see the MAPI Reference or, in the Object Storage Management application, click the profile icon and select REST API.

Email notification rules

For the system to send email notifications, you need to create a rule that specifies who to email, what email server to use, what events to send emails about, and what information to include in email messages.

SMTP settings
  • Enable: Turns on email notifications.
  • Host: The hostname or IP address of the email server.
  • Port: The port on which the email server listens for email messages.
  • Security: The security protocol used by the email server (SSL or STARTTLS) or None if the email server doesn’t use a security protocol.
  • Authenticated: Enable this if the email server needs authentication, then specify:
    • In the Username field, the username for an email account that’s authorized to establish the connection between the system and the email server.
    • In the Password field, the password for the email account.
Message settings

You use the email notification message settings to configure a template for formatting all email notifications sent by the system.

  • From: The email address from which you want email notifications to be sent.
  • Subject: The email subject.
  • Body: The email message body.
Message variables

This table lists the variables you can use to make the email notification template. When the system sends an email notification, it replaces the variables in the notification with event-specific information.

| Variable | Description |
| --- | --- |
| $severity | Event severity: INFO, WARNING, or SEVERE. |
| $subject | A short description of the event. |
| $message | Event message text. |
| $userName | Name of the user responsible for the event. |
| $objectId | Unique identifier for component affected by the event. |
| $subsystem | Category for the component affected by the event. |
| $objectSourceId | Unique identifier of the internal system component or process that was the source of the event. Value is [unknown] for most events. |
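
To preview how a notification template might expand, the following sketch uses Python's `string.Template`, which happens to use the same `$name` syntax. This only approximates the system's behavior: the actual substitution engine is not documented here, and the sample event values are made up for illustration.

```python
from string import Template

# Templates as they might be entered in the Subject and Body fields.
subject_template = Template("[$severity] $subject")
body_template = Template("$message\nUser: $userName\nSubsystem: $subsystem")

# Hypothetical event values used only to preview the templates.
event = {
    "severity": "WARNING",
    "subject": "Service under-protected",
    "message": "Service service is under-protected.",
    "userName": "admin",
    "subsystem": "services",
}

# safe_substitute leaves any variable missing from the event untouched.
subject = subject_template.safe_substitute(event)
body = body_template.safe_substitute(event)
print(subject)
print(body)
```
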
Recipient settings
  • Email addresses: A comma-separated list of email addresses to send notification emails to.
  • Severity Filter: The event severities about which to send email notifications. Can be one or more of these: INFO, WARNING, SEVERE.

Creating email notification rules

Admin App instructions

Procedure

  1. Select Dashboard > Configuration.

  2. Click Notifications.

  3. Click Create.

  4. In the Type field, select Email.

  5. Type a name for the notification rule.

  6. Under SMTP settings, click Enable to enable the rule.

  7. Configure the SMTP and message settings for the notification rule.

  8. Specify a comma-separated list of email addresses to send notifications to.

  9. Click Create.

Related CLI commands

createNotificationRule

Related REST API methods

POST /notifications

You can get help on specific REST API methods for the Admin App at REST API - Admin.

Syslog notification rules

When you create a syslog notification rule, the system sends log messages to your syslog server for each applicable system event.

Syslog settings
  • Enable: Turns on syslog notifications
  • Host: The hostname or IP address of the syslog server
  • Port: The port on which the syslog server listens for log messages
  • Facility: Category for the messages sent by this notification rule
Message settings

You use the syslog notification message settings to configure a template for formatting all syslog notifications sent by this notification rule.

  • Message: The message to send. You can use these variables as part of the message:
    | Variable | Description |
    | --- | --- |
    | $severity | Event severity: INFO, WARNING, or SEVERE |
    | $subject | A short description of the event |
    | $message | Event message text |
    | $time | Time at which the event occurred |
    | $userName | Name of the user responsible for the event |
    | $subsystem | Category for the component affected by the event |
    | $objectId | Unique identifier for component affected by the event |
    | $objectType | The type of the component affected by the event |
    | $objectSourceId | Unique identifier of the internal system component or process that was the source of the event. Value is [unknown] for most events. |
    | $objectSourceType | Type of the internal system component or process that was the source of the event. Value is [unknown] for most events. |
  • Sender Identity: Identity of the sender for the event. Sent with every syslog message.
Severity filter

The event severities about which to send syslog notifications. Can be one or more of these: INFO, WARNING, or SEVERITY.
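Conceptually, the severity filter acts as a set-membership test on each event. The Python sketch below is an illustration only; the event records and the chosen filter are hypothetical, and the system's internal filtering logic is not documented here.

```python
# Hypothetical filter: forward only WARNING events.
severity_filter = {"WARNING"}

# Made-up events, for illustration only.
events = [
    {"severity": "INFO", "subject": "Service started"},
    {"severity": "WARNING", "subject": "Disk usage high"},
]

# Keep only the events whose severity is selected in the filter.
forwarded = [e for e in events if e["severity"] in severity_filter]
print([e["subject"] for e in forwarded])
```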

Creating syslog notification rules

Admin App instructions

Procedure

  1. Select Dashboard > Configuration.

  2. Click Notifications.

  3. Click Create.

  4. In the Type field, select Syslog.

  5. Type a name for the notification rule.

  6. Under Syslog settings, click Enable to enable the rule.

  7. Configure the settings for the notification rule.

  8. Specify a severity filter for the notification rule.

  9. Click Create.

Related CLI commands

createNotificationRule

Related REST API methods

POST /notifications

You can get help on specific REST API methods for the Admin App at REST API - Admin.

Logs and diagnostic information

Each service maintains its own set of logs. By default, log files are maintained in the folder install_path/hcpcs/log on each instance in the system. During installation, you can configure each service to store its logs in a different (that is, non-default) location.

Log levels

The following table lists the available log levels.

Note: Raising the log level (for example, from WARN to INFO) writes more data to the log file, so the file grows more quickly. Lowering the log level (for example, from WARN to ERROR) writes less data to the log file, so the file grows more slowly.
Level | Levels included
ALL | FATAL, ERROR, WARN, INFO, DEBUG, TRACE
TRACE | FATAL, ERROR, WARN, INFO, DEBUG, TRACE
DEBUG | FATAL, ERROR, WARN, INFO, DEBUG
INFO | FATAL, ERROR, WARN, INFO
WARN | FATAL, ERROR, WARN (default)
ERROR | FATAL, ERROR
FATAL | FATAL
OFF | None
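The inclusion rule in the table can be expressed as a short Python sketch: a message is written only if its level is at least as severe as the configured level. The function names here are hypothetical and are not part of the product.

```python
# Levels ordered from most to least severe, as listed in the table.
LEVELS = ["FATAL", "ERROR", "WARN", "INFO", "DEBUG", "TRACE"]


def levels_included(configured: str) -> list[str]:
    """Return the message levels written at the given configured level."""
    if configured == "OFF":
        return []
    if configured == "ALL":
        return list(LEVELS)
    # Everything up to and including the configured level is written.
    return LEVELS[: LEVELS.index(configured) + 1]


def is_logged(message_level: str, configured: str) -> bool:
    """Return True if a message at message_level would be written."""
    return message_level in levels_included(configured)


print(levels_included("WARN"))
print(is_logged("INFO", "WARN"))
```

With the default level of WARN, an INFO message is not written, which is why raising the level to INFO increases log volume.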

Log management

You can manage any of the log files yourself. That is, you can delete or archive them as necessary.

Caution: Deleting log files can make it more difficult for support personnel to resolve issues you might encounter.

System logs are managed automatically in these ways:

  • Retirement: All log files are periodically added to a compressed file and moved to install_path/hcpcs/retired/. This occurs at least once a day, but can also occur:
    • Whenever you run the log_download script.
    • Hourly, if the system instance's disk space is more than 60% full.
    • At the optimum time for a specific service.
  • Rotation: When a log file grows larger than 10 MB, the system stops writing to that file, renames it, and begins writing to a new file. For example, if the file exampleService.log.0 grows to 10 MB, it is renamed to exampleService.log.1 and the system creates a new file named exampleService.log.0 to write to.
  • Removal: When a log file becomes older than 90 days, it is removed. If the system instance's disk space is more than 70% full, log files are deleted when they become older than one day.
  • When an optimum number of log files for a specific service is reached, the system can overwrite the oldest file. For example, if a service is limited to 20 log files, when the file exampleService.log.19 is filled, the system overwrites the file named exampleService.log.0.
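The thresholds described above can be summarized in a small Python sketch. This is only a model of the documented rules, not the system's actual implementation; the function names are hypothetical, and reading 10 MB as 10 × 1024 × 1024 bytes is an assumption.

```python
# Assumption: "10 MB" is read here as 10 * 1024 * 1024 bytes.
ROTATE_BYTES = 10 * 1024 * 1024


def should_rotate(size_bytes: int) -> bool:
    """A log file is rotated once it grows larger than 10 MB."""
    return size_bytes > ROTATE_BYTES


def removal_age_days(disk_used_percent: float) -> int:
    """Logs are kept 90 days, or only 1 day if the disk is over 70% full."""
    return 1 if disk_used_percent > 70 else 90


def should_remove(age_days: float, disk_used_percent: float) -> bool:
    """A log file is removed once it is older than the current age limit."""
    return age_days > removal_age_days(disk_used_percent)


def retires_hourly(disk_used_percent: float) -> bool:
    """Retirement runs hourly when the disk is more than 60% full."""
    return disk_used_percent > 60


print(should_remove(age_days=5, disk_used_percent=50))
print(should_remove(age_days=5, disk_used_percent=75))
```

The two calls show the practical effect of disk pressure: a five-day-old log file is kept on a half-full disk but becomes eligible for removal once the disk passes 70%.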

Retrieving logs and diagnostic information

The tool log_download lets you easily retrieve logs and diagnostic information from all instances in the system. This tool is located at this path on each instance:

install_path/hcpcs/bin/log_download

For information about running the tool, use this command:

install_path/hcpcs/bin/log_download -h

Note
  • When using the tool log_download, if you specify the option --output, do not specify an output path that contains colons, spaces, or symbolic links. If you omit the option --output, you cannot run the script from within a folder path that contains colons, spaces, or symbolic links.
  • When you run the script log_download, all log files are automatically compressed and moved to the folder install_path/hcpcs/retired/.
  • If an instance is down, you need to specify the option --offline to collect the logs from that instance. If your whole system is down, you need to run the script log_download with the option --offline on each instance.
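Before you pass a folder to the option --output, you can check it against the path restrictions in the note above. The following Python sketch is one possible pre-check; the function name is hypothetical, and the tool itself might apply additional rules.

```python
import os


def output_path_problems(path: str) -> list[str]:
    """Return reasons why path is unsuitable as a log_download output path."""
    problems = []
    if ":" in path:
        problems.append("contains a colon")
    if " " in path:
        problems.append("contains a space")

    # Walk from the path up to the filesystem root, looking for symbolic links.
    # abspath() normalizes the path without resolving links.
    current = os.path.abspath(path)
    while True:
        if os.path.islink(current):
            problems.append(f"symbolic link: {current}")
            break
        parent = os.path.dirname(current)
        if parent == current:  # reached the root
            break
        current = parent
    return problems


print(output_path_problems("/tmp/my logs:2024"))
```

An empty list means that none of the three documented restrictions were detected. The same check applies to the current working folder if you omit the option --output.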

Default log locations

By default, each service stores its logs on each instance on which the service instance runs, in its own folder at this path:

install_path/hcpcs/log

This table shows the default log folder names for each service. Depending on how your system was configured when first deployed, your system's logs might not be stored in these folders.

Service | Default log folder name | Contains information about
Admin-App | com.hds.ensemble.plugins.service.adminApp | The System Management application.
Database | com.hds.ensemble.plugins.service.cassandra | System configuration data; document fields and values.
Scheduling | com.hds.ensemble.plugins.service.chronos | Workflow task scheduling.
N/A | com.hds.ensemble.plugins.service.containerAction | Created by custom actions run by service plugins.
Metrics | com.hds.ensemble.plugins.service.elasticsearch | The storage and indexing of system events and of performance and failure metrics for workflow tasks.
Network-Proxy | com.hds.ensemble.plugins.service.haproxy | Network requests between instances.
Message Queue | com.hds.ensemble.plugins.service.kafka | The transmission of data between instances.
Logging | com.hds.ensemble.plugins.service.logstash | The transport of system events and workflow task metrics to the Metrics service.
Service-Deployment | com.hds.ensemble.plugins.service.marathon | The deployment of high-level services across system instances. High-level services are the ones that you can move and configure, not the services grouped under System Services.
Cluster-Worker | com.hds.ensemble.plugins.service.mesosAgent | The work ordered by the Cluster-Coordination service.
Cluster-Coordination | com.hds.ensemble.plugins.service.mesosMaster | Hardware resource allocation.
Watchdog | com.hds.ensemble.plugins.service.remoteAction | Internal system processes.
Sentinel | com.hds.ensemble.plugins.service.sentinel | The internal system processes.
Watchdog | com.hds.ensemble.plugins.service.watchdog | General diagnostic information.
Synchronization | com.hds.ensemble.plugins.service.zookeeper | The coordination of actions and database activities across instances.
S3-Gateway | com.hitachi.aspen.foundry.service.clientaccess.data | The client access data service.
Data-Lifecycle | com.hitachi.aspen.foundry.service.data-lifecycle.service | The data lifecycle service.
Tracing-Agent | com.hitachi.aspen.foundry.service.jaeger.agent | The tracing agent service.
Tracing-Collector | com.hitachi.aspen.foundry.service.jaeger.collector | The tracing collector service.
Tracing-Query | com.hitachi.aspen.foundry.service.jaeger.query | The tracing query service.
MAPI-Gateway | com.hitachi.aspen.foundry.service.mapi.gateway | The management API gateway.
Policy-Engine | com.hitachi.aspen.foundry.service.metadata.async.policy.engine | The metadata asynchronous policy engine.
Metadata-Cache | com.hitachi.aspen.foundry.service.metadata.cache | The metadata cache.
Metadata-Coordination | com.hitachi.aspen.foundry.service.metadata.coordination | Metadata coordination.
Metadata-Gateway | com.hitachi.aspen.foundry.service.metadata.gateway | The metadata gateway.
Telemetry-Service | com.hitachi.aspen.foundry.service.metrics.prometheus | Telemetry.
Message-Queue | com.hitachi.aspen.foundry.service.rabbitmq.server | The message broker.
Key-Management-Server | com.hitachi.aspen.foundry.service.vault.vault | The key management server.
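As a small illustration of how these folder names combine with the default log path, the Python sketch below builds the default log folder for a service. The dictionary holds only a few entries copied from the table, the function name is hypothetical, and /opt/example stands in for your actual install_path. If your system was configured to store logs elsewhere, these paths do not apply.

```python
import os

# A few entries copied from the table above; extend as needed.
DEFAULT_LOG_FOLDERS = {
    "Admin-App": "com.hds.ensemble.plugins.service.adminApp",
    "Database": "com.hds.ensemble.plugins.service.cassandra",
    "S3-Gateway": "com.hitachi.aspen.foundry.service.clientaccess.data",
}


def default_log_dir(install_path: str, service: str) -> str:
    """Return install_path/hcpcs/log/<folder> for the given service."""
    return os.path.join(install_path, "hcpcs", "log", DEFAULT_LOG_FOLDERS[service])


# /opt/example is a placeholder for the real install_path.
print(default_log_dir("/opt/example", "Database"))
```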

 
