Alerts
This section describes the alerts that the Content Software for File system can raise in this version.
Overview
Alerts indicate ongoing problematic states in the cluster. To dismiss an alert, resolve its root cause.
For each alert, the system provides the alert name, its description, and the corrective action.
Usually, an alert is introduced alongside an equivalent event. The event can help identify when the problematic state began and what caused it.
Manage alerts using the GUI
How to manage alerts using the GUI.
Viewing alerts using the GUI
The bell icon on the top bar shows the number of active alerts in the system. The alerts pane in the system dashboard also lists the alert names.
If there are no alerts (active or muted), the alerts pane is empty, and the bell does not specify any number.
Procedure
To display the alert details, select the bell icon or select any alert.
Muting alerts
Before you begin
If, for any reason, it is not possible to resolve the root cause of an alert in a reasonable time and you want to hide it temporarily, you can mute the alert for a specified period. Later, you can unmute the alert and resolve it.
The system automatically unmutes the muted alerts after the expiry period.
Procedure
On the Active Alerts page, select the bell next to the alert.
Set the mute duration (number and units) and select Mute.
The muted alert is moved to the Muted Alerts area. The total number of active alerts is reduced by the number of muted alerts.
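The muting behavior described above (a muted alert leaves the active count, and returns either manually or automatically once its mute period expires) can be illustrated with a small model. This is only a sketch: `AlertBoard` and its methods are hypothetical names invented for illustration and are not part of the product's GUI, CLI, or API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class AlertBoard:
    """Hypothetical in-memory model of active and muted alerts."""
    active: set = field(default_factory=set)
    muted: dict = field(default_factory=dict)  # alert name -> mute expiry time

    def mute(self, name, duration, now):
        """Move an active alert to the muted area until now + duration."""
        if name not in self.active:
            raise KeyError(f"{name} is not an active alert")
        self.active.remove(name)
        self.muted[name] = now + duration

    def unmute(self, name):
        """Manually return a muted alert to the active list."""
        del self.muted[name]
        self.active.add(name)

    def expire(self, now):
        """Automatically unmute every alert whose mute period has ended."""
        for name in [n for n, expiry in self.muted.items() if expiry <= now]:
            self.unmute(name)

    def bell_count(self):
        """Number shown on the bell: active alerts only, muted ones excluded."""
        return len(self.active)


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 12, 0)
    board = AlertBoard(active={"DriveDown", "ClockSkew"})
    board.mute("DriveDown", timedelta(hours=2), now=start)
    print(board.bell_count())  # 1
    board.expire(now=start + timedelta(hours=3))
    print(board.bell_count())  # 2
```

Muting hides an alert without resolving it, so in this model the alert reappears in the active count as soon as the mute period ends.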
Unmute alerts
Muted alerts appear under the Muted Alerts area. You can unmute an alert manually before the expiry duration.
Procedure
Under the Muted Alerts area, select the bell of the alert you want to unmute.
List of alerts
Name | Description | Actions |
AdminDefaultPassword | The admin password is still set to the factory default. | Change the admin user password to ensure only authorized users can access the cluster. |
AgentNotRunning | The Content Software for File local control agent is not running on a host. | Restart the agent with service weka-agent start. |
ApproachingClientsUnavailability | Approaching the maximum number of clients that can connect with the current cluster resources. | Make sure all backend servers are up, or expand the cluster with more backend servers. |
AutoRemoveTimeoutTooLow | Stateless Client auto-remove timeout too low. | Remount the host with a higher auto-remove timeout value. |
BackendNumaBalancingEnabled | A host has automatic NUMA balancing enabled which can negatively impact performance. | To disable, run echo 0 > /proc/sys/kernel/numa_balancing on the backend host. |
BackendVersionsMismatch | There are mismatching versions of backend servers in the cluster. | Upgrade all the backend servers to match the cluster's version. |
BondInterfaceCompromised | The host is configured to work with a highly available network, but has lost its connectivity redundancy. A single network failure can disconnect the host from the cluster, which results in data being unavailable to the host (for a client host) or in reduced data-protection redundancy (for a backend host). | Check the network configuration, cables, and NICs to resolve the issue. |
BucketHasNoQuorum | Too many compute nodes are down, causing the bucket compute resource to be unavailable. | Check that the compute nodes and their hosts are up and running and fully connected. Contact customer support if the issue is not resolved. |
BucketUnresponsive | A compute resource has failed, causing system unavailability. | Check that the compute nodes and their hosts are up and running and fully connected. Contact customer support if the issue is not resolved. |
ChokingDetected | High congestion level detected in the cluster. | For more information, refer to System Congestion. |
ClientNumaBalancingEnabled | A host has automatic NUMA balancing enabled which can negatively impact performance. | To disable, run echo 0 > /proc/sys/kernel/numa_balancing on the client host. |
ClientVersionsMismatch | There are clients with a version that does not match the cluster version. Some features may not be available until all the clients are upgraded. | Upgrade the clients to the same version as the cluster by locally running weka local upgrade. |
ClockSkew | The clock of a host is skewed in relation to the cluster leader, with a time difference of more than the permitted maximum of 30 seconds. | Make sure NTP is configured correctly on the hosts and that their dates are synchronized. |
CloudHealth | A host cannot upload events to the Content Software for File cloud. | Check the host has Internet connectivity and is connected. For details, contact your Hitachi representative. |
CloudStatsError | Statistics upload to Content Software for File cloud failed. | Check the host has Internet connectivity and is connected to the Content Software for File cloud as explained in the |
ClusterInitializationError | The cluster has encountered an error while initializing. | Fix the underlying problem causing the error to successfully start IO operations. |
ClusterIsUpgrading | Cluster is upgrading. | If the upgrade doesn't finish normally, contact customer support for assistance. |
CPUFrequentStarvation | Frequent CPU starvation detected in the last minute. | Check the relevant host logs for potential hardware problems or core allocation issues. |
CPUStarvation | Content Software for File processes are experiencing long CPU stalls. | Check the relevant host logs for potential hardware problems. |
DataIntegrity | Data integrity issue found. | Contact customer support. |
DataProtection | Some of the system's data is not fully redundant. | Check which node/host/drive is down and act accordingly. |
DedicatedWatchdog | A dedicated Content Software for File host requires the installation of a watchdog driver. Make sure a watchdog is available at /dev/watchdog. | For more information, contact customer support. |
DriveDown | A drive is not responding. | Contact customer support to check if the drive should be replaced. |
DriveEndurancePercentageUsed | Drive exceeding its life expectancy. | It is recommended to replace the drive before it fails. |
DriveEnduranceSparesRemaining | Drive internal spares running too low. | It is recommended to replace the drive before it fails. |
DriveNeedsPhaseout | A drive has too many errors. | Phase out the drive and, if necessary, replace it. |
FilesystemHasTooManyFiles | The filesystem storage configured for file and directory entries is exceeded (or about to be exceeded). | Increase the max-files for the filesystem. |
FilesystemSquashPending | A filesystem squash task is pending. | The filesystem is pending squash. The squash background task begins automatically. No corrective action is required. |
FilesystemsThinProvisioningLowSpace | There are thinly provisioned filesystems that are running low on free capacity. | Consider adding more SSD capacity to the organization containing these filesystems. |
FilesystemsThinProvisioningReserveReached | The requested reserved capacity (for filesystem creation/expansion) is available. | The reserved capacity can now be used for filesystem creation/expansion. |
HangingCacheSync | Cache sync is stopped. | A stopped cache sync can prevent other clients from accessing some files. To resolve this issue, reboot the host or remove it from the cluster. Data that is not synced with the cluster may be lost. |
HangingIOs | Some IOs are hanging on the node acting as a driver/NFS/backend. | Check that the compute nodes and their hosts are up and running, and fully connected. Also check that if a backend object store is configured, it is connected and responsive. Contact customer support if the issue is not resolved. |
HighDrivesCapacity | The average capacity of the SSDs is too high. | Free-up space on the SSDs or add more SSDs to the cluster. |
HighLevelOfUnreclaimedCapacityInObjectStore | High level of unreclaimed space in object store. | Check object store connectivity and deletion operations' progress. Validate authorization of deletion operations on the object store. Run weka fs tier capacity for details. |
JumboConnectivity | A host cannot send jumbo frames to any of its cluster peers. | Check the host network settings and the switch to which it is connected. Do this even if Content Software for File seems to be functional, since resolving the issue will improve performance. |
KmsError | KMS Error. | Review the KMS credentials, permissions, and configuration, as suggested in KMS management. |
LicenseError | A license conflict exists. | Make sure the cluster is using a correct license, the license has not expired, and the cluster allocated space does not exceed the license. |
LowDiskSpace | The host has low disk space (for the /opt/weka directory), which can affect some Content Software for File reporting services. | Free up space on the host, or contact customer support. |
ManualOverridesActive | Manual overrides are active. | Contact customer support. |
MismatchedDriveFailureDomain | The drive failure domain does not match the failure domain of its attached host. | Either connect the mismatched drive to a host with a matching failure domain, or re-provision the drive to erase its failure domain. |
NegativeUnprovisionedCapacity | Content Software for File capacity usage changes detected due to cluster upgrade. | One or more of the filesystems need to be resized in order to reclaim capacity. Contact customer support. |
NetworkInterfaceLinkDown | A network interface has a link down status. | Check the connectivity to the interface and see whether anything is blocking it. |
NoClusterLicense | No license is assigned to the cluster. | Obtain and install a license from customer support. |
NodeBlacklisted | There is a blacklisted node in the cluster. | Use weka debug blacklist disable to whitelist nodes so they can rejoin the cluster. |
NodeDisconnected | A node is disconnected from the cluster. | Check network connectivity to make sure the node can communicate with the cluster. |
NodeNetworkUnstable | A node seems to have an unstable network. As a consequence, it has been fenced by the system and does not contribute resources to the Content Software for File cluster. | Make sure there is no network connectivity issue in the cluster. Contact customer support if the issue is not resolved. |
NodeRDMANotActive | RDMA is supported on the host but it is inactive. | Make sure Mellanox OFED version 4.6 or higher is properly installed on the host. |
NodeTieringConnectivity | A node cannot connect to an object-store. | Check connectivity with the object store and make sure the node can communicate with it. |
NotEnoughActiveDrives | There are not enough active failure domains. | Check connectivity, host status, and/or replace problematic drives. |
OFEDVersions | A host Mellanox OFED version ID does not match the one used by the Content Software for File container. | Install a supported OFED. If the current version needs to be retained or the alert continues after a supported version is installed, contact customer support. |
PartialConnectivityTrackingDisabled | The cluster's partial connectivity tracking mechanism is disabled, affecting the cluster's self-healing capabilities. | Contact customer support. |
PartiallyConnectedNode | A node seems to be only partially connected. | Make sure there is no network connectivity issue. Contact customer support if the issue is not resolved. |
PassedClientsAvailabilityThreshold | The clients limit has been reached. | Add more backend servers to the cluster, check whether backends are down, or disconnect some clients. |
PerformanceDegradedLowRAM | The host is running low on RAM. Additional metadata entries are swapped to the SSD. This might impact performance. | Make sure all the compute hosts and processes are up, add more hosts to the Content Software for File cluster, or increase the configured RAM of the cluster backend hosts. |
QuotasHardLimitReached | There are directory quotas that have reached their hard limit. | Run weka fs quota list to see which directory quotas have reached their hard limit. |
QuotasSoftLimitReached | There are directory quotas that have reached their soft limit. | Run weka fs quota list to see which directory quotas have reached their soft limit. |
ResourcesNotApplied | There are changes to host resources that are not applied in the Content Software for File cluster. | To apply the changes, run weka cluster host apply <host_id>. |
SSDCapacityDiscrepancy | Used SSD capacity does not match the expected range. | Monitor the stability of the COMPUTE processes and contact customer support. |
SystemDefinedTLS | The Content Software for File cluster uses an auto-generated self-signed certificate. | Run weka security tls set to replace the auto-generated certificate with your own certificate for cluster TLS use. |
TLSCertificateExpired | TLS Certificate has expired. | Replace the current certificate using weka security server-tls set. |
TLSCertificateExpiresSoon | TLS Certificate is about to expire. | Replace the current certificate using weka security server-tls set. |
TieredFilesystemOverfillingSSD | Tiered filesystems' SSD Capacity overfilling. | Resolve tiering connectivity issues or increase the upload bandwidth. |
TraceDumperDown | Trace dumper is down. | Contact customer support to restart the trace dumper. |
TracesDisabled | Traces are disabled. | To turn them back on contact customer support. |
TracesFreezePeriodActive | A trace freeze period is active. | Some traces can be protected from rotating for a period of time to debug the system. Customer support does this when needed. If the issue persists after the case has been resolved, contact customer support. |
UdpModePerformanceWarning | The backend host is configured in UDP mode. | If this is a misconfiguration, use weka cluster host net add to add network devices to this host. |
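For the BackendNumaBalancingEnabled and ClientNumaBalancingEnabled alerts listed above, the corrective action writes 0 to /proc/sys/kernel/numa_balancing. The sketch below shows one way to check the current value before changing it. The function names are hypothetical helpers, not part of any product tooling, and the sketch assumes the standard Linux behavior where any non-zero value in that file means automatic NUMA balancing is enabled.

```python
from pathlib import Path

# Kernel setting named in the alert's corrective action.
NUMA_BALANCING = Path("/proc/sys/kernel/numa_balancing")


def numa_balancing_enabled(path=NUMA_BALANCING):
    """Return True if the setting at `path` holds a non-zero value."""
    return int(Path(path).read_text().strip()) != 0


def disable_numa_balancing(path=NUMA_BALANCING):
    """Write 0 to the setting, like `echo 0 > path`.

    On a real host this requires root privileges, and the change does not
    persist across reboots unless it is also configured through sysctl.
    """
    Path(path).write_text("0\n")
```

Both helpers take the path as a parameter so they can be tried against a scratch file before being pointed at the real kernel setting.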