Alerts

This section describes the alerts that can be raised in this version of Content Software for File.

Overview

Alerts indicate ongoing problematic states in the cluster. To dismiss an alert, resolve its root cause.

For each alert, the system provides the alert name, its description, and the corrective action.

Usually, an alert is raised together with an equivalent event. The event helps identify when the problematic state occurred and what caused it.

Manage alerts using the GUI

How to manage alerts using the GUI.

Viewing alerts using the GUI

The bell icon on the top bar shows the number of active alerts in the system. The alerts pane in the system dashboard also lists the alert names.

If there are no alerts (active or muted), the alerts pane is empty and the bell does not show a number.

Procedure

  1. To display the alert details, select the bell icon or select any alert.

Muting alerts

Before you begin

If you cannot resolve the root cause of an alert in a reasonable time and want to hide it temporarily, you can mute the alert for a specified period. Later, you can unmute the alert and resolve it.

The system automatically unmutes muted alerts when the mute period expires.

Procedure

  1. On the Active Alerts page, select the bell next to the alert.

  2. Set the mute duration (number and units) and select Mute.

    The muted alert is moved to the Muted Alerts area. The total number of active alerts is reduced by the number of muted alerts.

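The muting behavior described above can be pictured with a small model. This is only an illustrative sketch: the `AlertBoard` class and its method names are hypothetical and are not part of the product's GUI or API. It assumes that a muted alert leaves the active count, and that it returns to the active list either when it is unmuted manually or when its mute period expires.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, Set


@dataclass
class AlertBoard:
    """Toy model of active and muted alerts (hypothetical, for illustration only)."""

    active: Set[str] = field(default_factory=set)
    muted: Dict[str, datetime] = field(default_factory=dict)  # alert name -> mute expiry

    def raise_alert(self, name: str) -> None:
        self.active.add(name)

    def mute(self, name: str, duration: timedelta, now: datetime) -> None:
        """Move an active alert to the muted area until now + duration."""
        if name not in self.active:
            raise KeyError(f"{name} is not an active alert")
        self.active.remove(name)
        self.muted[name] = now + duration

    def unmute(self, name: str) -> None:
        """Manually return a muted alert to the active list."""
        if self.muted.pop(name, None) is not None:
            self.active.add(name)

    def expire(self, now: datetime) -> None:
        """Automatically unmute every alert whose mute period has passed."""
        for name, expiry in list(self.muted.items()):
            if now >= expiry:
                self.unmute(name)

    def bell_count(self) -> int:
        """Number shown on the bell icon: active alerts only."""
        return len(self.active)
```

For example, with two active alerts, muting one for two hours drops the bell count to one. Calling `expire` with a time two hours later, or calling `unmute` earlier, brings the count back to two, because the root cause is still unresolved.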

Unmute alerts

Muted alerts appear under the Muted Alerts area. You can unmute an alert manually before its mute period expires.

Procedure

  1. Under the Muted Alerts area, select the bell of the alert you want to unmute.

List of alerts

| Name | Description | Actions |
|---|---|---|
| AdminDefaultPassword | The admin password is still set to the factory default. | Change the admin user password to ensure only authorized users can access the cluster. |
| AgentNotRunning | The Content Software for File local control agent is not running on a host. | Restart the agent with service weka-agent start. |
| ApproachingClientsUnavailability | Approaching the maximum number of clients that can connect with the current cluster resources. | Make sure all backend servers are up or expand the cluster with more backend servers. |
| AutoRemoveTimeoutTooLow | The stateless client auto-remove timeout is too low. | Remount the host with a higher auto-remove timeout value. |
| BackendNumaBalancingEnabled | A host has automatic NUMA balancing enabled, which can negatively impact performance. | To disable, run echo 0 > /proc/sys/kernel/numa_balancing on the backend host. |
| BackendVersionsMismatch | There are mismatching versions of backend servers in the cluster. | Upgrade all the backend servers to match the cluster's version. |
| BondInterfaceCompromised | The host is configured to work with a highly available network but has lost its connectivity redundancy. A single network failure can disconnect the host from the cluster, which results in unavailability of data to the host (for a client host) or reduced data protection redundancy (for a backend host). | Check the network configuration, cables, and NICs to resolve the issue. |
| BucketHasNoQuorum | Too many compute nodes are down, causing the bucket compute resource to be unavailable. | Check that the compute nodes and their hosts are up and running and fully connected. Contact customer support if the issue is not resolved. |
| BucketUnresponsive | A compute resource has failed, causing system unavailability. | Check that the compute nodes and their hosts are up and running and fully connected. Contact customer support if the issue is not resolved. |
| ChokingDetected | High congestion level detected in the cluster. | For more information, refer to System Congestion. |
| ClientNumaBalancingEnabled | A host has automatic NUMA balancing enabled, which can negatively impact performance. | To disable, run echo 0 > /proc/sys/kernel/numa_balancing on the client host. |
| ClientVersionsMismatch | There are clients with a version that does not match the cluster version. Some features may not be available until all the clients are upgraded. | Upgrade the clients to the same version as the cluster by running weka local upgrade locally on each client. |
| ClockSkew | The clock of a host is skewed in relation to the cluster leader, with a time difference greater than the permitted maximum of 30 seconds. | Make sure NTP is configured correctly on the hosts and that their dates are synchronized. |
| CloudHealth | A host cannot upload events to the Content Software for File cloud. | Check that the host has Internet connectivity and is connected. For details, contact your Hitachi representative. |
| CloudStatsError | Statistics upload to the Content Software for File cloud failed. | Check that the host has Internet connectivity and is connected to the Content Software for File cloud as explained in the |
| ClusterInitializationError | The cluster has encountered an error while initializing. | Fix the underlying problem causing the error to successfully start IO operations. |
| ClusterIsUpgrading | The cluster is upgrading. | If the upgrade doesn't finish normally, contact customer support for assistance. |
| CPUFrequentStarvation | Frequent CPU starvation detected in the last minute. | Check the relevant host logs for potential hardware problems or core allocation issues. |
| CPUStarvation | Content Software for File processes are experiencing long CPU stalls. | Check the relevant host logs for potential hardware problems. |
| DataIntegrity | Data integrity issue found. | Contact customer support. |
| DataProtection | Some of the system's data is not fully redundant. | Check which node/host/drive is down and act accordingly. |
| DedicatedWatchdog | A dedicated Content Software for File host requires the installation of a watchdog driver. | Make sure a watchdog is available at /dev/watchdog. For more information, contact customer support. |
| DriveDown | A drive is not responding. | Contact customer support to check if the drive should be replaced. |
| DriveEndurancePercentageUsed | The drive is exceeding its life expectancy. | It is recommended to replace the drive before it fails. |
| DriveEnduranceSparesRemaining | The drive's internal spares are running too low. | It is recommended to replace the drive before it fails. |
| DriveNeedsPhaseout | A drive has too many errors. | Phase out the drive and probably replace it. |
| FilesystemHasTooManyFiles | The storage configured for file and directory entries in the filesystem is exceeded or about to be exceeded. | Increase the max-files for the filesystem. |
| FilesystemSquashPending | A filesystem squash task is pending. | The filesystem is pending squash. The squash background task begins automatically. No corrective action is required. |
| FilesystemsThinProvisioningLowSpace | There are thinly provisioned filesystems that are running low on free capacity. | Consider adding more SSD capacity to the organization containing these filesystems. |
| FilesystemsThinProvisioningReserveReached | The requested reserved capacity (for filesystem creation/expansion) is available. | The reserved capacity can now be used for filesystem creation/expansion. |
| HangingCacheSync | Cache sync is stopped. | A stopped cache sync can prevent other clients from accessing some files. To resolve this issue, reboot the host or remove it from the cluster. Data that is not synced with the cluster may be lost. |
| HangingIOs | Some IOs are hanging on the node acting as a driver/NFS/backend. | Check that the compute nodes and their hosts are up and running and fully connected. Also check that the backend object store, if one is configured, is connected and responsive. Contact customer support if the issue is not resolved. |
| HighDrivesCapacity | The average capacity of the SSDs is too high. | Free up space on the SSDs or add more SSDs to the cluster. |
| HighLevelOfUnreclaimedCapacityInObjectStore | High level of unreclaimed space in the object store. | Check object store connectivity and the progress of deletion operations. Validate authorization of deletion operations on the object store. Run weka fs tier capacity for details. |
| JumboConnectivity | A host cannot send jumbo frames to any of its cluster peers. | Check the host network settings and the switch to which it is connected, even if Content Software for File seems to be functional, because resolving this improves performance. |
| KmsError | KMS error. | Review the KMS credentials, permissions, and configuration, as suggested in KMS management. |
| LicenseError | A license conflict exists. | Make sure the cluster is using a correct license, the license has not expired, and the cluster allocated space does not exceed the license. |
| LowDiskSpace | The host has low disk space (for the /opt/weka directory), which can affect some Content Software for File reporting services. | Free up space on the host, or contact customer support. |
| ManualOverridesActive | Manual overrides are active. | Contact customer support. |
| MismatchedDriveFailureDomain | The drive failure domain does not match the failure domain of its attached host. | Either connect the mismatched drive to a host with a matching failure domain, or re-provision the drive to erase its failure domain. |
| NegativeUnprovisionedCapacity | Content Software for File capacity usage changes detected due to cluster upgrade. | One or more of the filesystems need to be resized in order to reclaim capacity. Contact customer support. |
| NetworkInterfaceLinkDown | A network interface has a link-down status. | Check the connectivity to the interface and see if anything is blocking it. |
| NoClusterLicense | No license is assigned to the cluster. | Obtain and install a license from customer support. |
| NodeBlacklisted | There is a blacklisted node in the cluster. | Use weka debug blacklist disable to whitelist nodes so they can rejoin the cluster. |
| NodeDisconnected | A node is disconnected from the cluster. | Check network connectivity to make sure the node can communicate with the cluster. |
| NodeNetworkUnstable | A node seems to have an unstable network. As a consequence, it has been fenced by the system and does not contribute resources to the Content Software for File cluster. | Make sure there is no network connectivity issue in the cluster. Contact customer support if the issue is not resolved. |
| NodeRDMANotActive | RDMA is supported on the host but it is inactive. | Make sure Mellanox OFED version 4.6 or higher is properly installed on the host. |
| NodeTieringConnectivity | A node cannot connect to an object store. | Check connectivity with the object store and make sure the node can communicate with it. |
| NotEnoughActiveDrives | There are not enough active failure domains. | Check connectivity, host status, and/or replace problematic drives. |
| OFEDVersions | A host's Mellanox OFED version ID does not match the one used by the Content Software for File container. | Install a supported OFED. If the current version needs to be retained or the alert continues after a supported version is installed, contact customer support. |
| PartialConnectivityTrackingDisabled | The cluster's partial connectivity tracking mechanism is disabled, affecting the cluster's self-healing capabilities. | Contact customer support. |
| PartiallyConnectedNode | A node seems to be only partially connected. | Make sure there is no network connectivity issue. Contact customer support if the issue is not resolved. |
| PassedClientsAvailabilityThreshold | Reached the clients limit. | Add more backend servers to the cluster, check whether backends are down, or disconnect some clients. |
| PerformanceDegradedLowRAM | The host is running low on RAM. Additional metadata entries are swapped to the SSD. This might impact performance. | Make sure all the compute hosts and processes are up, add more hosts to the Content Software for File cluster, or increase the configured RAM of the cluster backend hosts. |
| QuotasHardLimitReached | There are directory quotas that have reached their hard limit. | Run weka fs quota list to see which directory quotas have reached their hard limit. |
| QuotasSoftLimitReached | There are directory quotas that have reached their soft limit. | Run weka fs quota list to see which directory quotas have reached their soft limit. |
| ResourcesNotApplied | There are changes to host resources that are not applied in the Content Software for File cluster. | To apply the changes, run weka cluster host apply <host_id>. |
| SSDCapacityDiscrepancy | Used SSD capacity does not match the expected range. | Monitor the stability of the compute processes and contact customer support. |
| SystemDefinedTLS | The Content Software for File cluster uses an auto-generated self-signed certificate. | Run weka security tls set to replace the auto-generated certificate with your own certificate for cluster TLS use. |
| TLSCertificateExpired | The TLS certificate has expired. | Replace the current certificate using weka security server-tls set. |
| TLSCertificateExpiresSoon | The TLS certificate is about to expire. | Replace the current certificate using weka security server-tls set. |
| TieredFilesystemOverfillingSSD | The SSD capacity of tiered filesystems is overfilling. | Resolve tiering connectivity issues or increase the upload bandwidth. |
| TraceDumperDown | The trace dumper is down. | Contact customer support to restart the trace dumper. |
| TracesDisabled | Traces are disabled. | To turn them back on, contact customer support. |
| TracesFreezePeriodActive | A trace freeze period is active. | Some traces can be protected from rotating for a period of time to debug the system. Customer support does this when needed. If the issue persists after the case has been resolved, contact customer support. |
| UdpModePerformanceWarning | The backend host is configured in UDP mode. | If this is a misconfiguration, use weka cluster host net add to add network devices to this host. |
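
Several corrective actions in the list above refer to conditions that can be inspected locally on a host before or after an alert appears. The following is a minimal sketch of such checks, not a product tool. The function names are hypothetical. The paths `/proc/sys/kernel/numa_balancing` and `/dev/watchdog` and the 30-second limit are taken from the NUMA balancing, DedicatedWatchdog, and ClockSkew entries. It assumes that any value other than `0` in the NUMA balancing file means balancing is enabled, and that both timestamps are given in seconds since the epoch.

```python
from pathlib import Path
from typing import Optional

# Permitted maximum clock difference stated for the ClockSkew alert.
MAX_CLOCK_SKEW_SECONDS = 30


def numa_balancing_enabled(path: str = "/proc/sys/kernel/numa_balancing") -> Optional[bool]:
    """Return True if automatic NUMA balancing is on, False if off,
    or None if the setting cannot be read on this host."""
    try:
        return Path(path).read_text().strip() != "0"
    except OSError:
        return None


def watchdog_available(path: str = "/dev/watchdog") -> bool:
    """Return True if a watchdog device node exists at the given path."""
    return Path(path).exists()


def clock_skew_exceeded(host_epoch: float, leader_epoch: float,
                        limit: float = MAX_CLOCK_SKEW_SECONDS) -> bool:
    """Return True if the host clock differs from the leader clock by more than the limit."""
    return abs(host_epoch - leader_epoch) > limit
```

A caller could run `numa_balancing_enabled()` and `watchdog_available()` on each host and compare the host time with the cluster leader's time using `clock_skew_exceeded`. How the leader's time is obtained is outside the scope of this sketch.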

 
