Troubleshooting
To troubleshoot effectively, you must understand what information to collect when an error occurs, the situations in which errors occur, and what action to take in each case.
Collecting information for troubleshooting
If a failure occurs in Storage Plug-in for Containers, collect the following information. Provide the collected information to customer support when you make an inquiry.
Information needed when contacting support
Information | Procedure |
Cluster information | Run the following command: # kubectl cluster-info dump -A > dump.txt |
Operator logs | See Collecting logs for Storage Plug-in for Containers. |
CSI controller logs | See Collecting logs for Storage Plug-in for Containers. |
CSI node logs | See Collecting logs for Storage Plug-in for Containers. |
PVC-related manifests | Get the YAML files for StorageClass, Secret, and PersistentVolumeClaim. |
Snapshot-related manifests | Get the YAML files for VolumeSnapshotClass, Secret, and VolumeSnapshot. |
Snapshot-related logs | Collect the logs of the snapshot controller that you installed in the Volume snapshot chapter (see the example after this table). |
Application manifests | Get the YAML files for applications that use Storage Plug-in for Containers PVCs. |
Storage logs | See Collecting storage system information for VSP family or Collecting storage system information for VSSB. |
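For the snapshot-related logs, the following is a minimal sketch of collecting the snapshot controller logs, assuming the controller runs in the kube-system namespace with the label app=snapshot-controller (both the namespace and the label are assumptions; adjust them to match your installation):
# kubectl logs -n kube-system -l app=snapshot-controller > snapshot-controller.log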
Collecting logs for Storage Plug-in for Containers
You can retrieve logs from your running containers by using the kubectl logs command. To collect Storage Plug-in for Containers logs, collect logs from the Operator, the CSI controller, and the CSI node. For more information about logging in Kubernetes, see https://kubernetes.io/docs/concepts/cluster-administration/logging/.
To retrieve logs from the operator, run the following command:
# kubectl logs -n <namespace> hspc-operator-controller-manager-<id>
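If you do not know the <id> suffix of the operator Pod, you can list the Pods in the namespace first; for example:
# kubectl get pods -n <namespace> | grep hspc-operator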
To retrieve logs from the CSI controller, run the following command:
# kubectl logs -n <namespace> hspc-csi-controller-<id> -c <container name>
- The container hspc-csi-driver is the primary process.
- This Pod has several sidecar containers: csi-provisioner, external-attacher, csi-resizer, csi-snapshotter, and liveness-probe. You can collect logs from all of them at once, as shown below.
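To capture the primary process and all sidecar containers in one step, you can use the --all-containers flag of kubectl logs instead of specifying each container name; for example:
# kubectl logs -n <namespace> hspc-csi-controller-<id> --all-containers=true > csi-controller.log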
To retrieve logs from the CSI node, run the following command:
# kubectl logs -n <namespace> hspc-csi-node-<id> -c <container name>
- The container hspc-csi-driver is the primary process.
- This Pod has several sidecar containers: driver-registrar and liveness-probe.
- You will see multiple CSI node Pods because the CSI node is deployed as a DaemonSet. Collect logs from all of these Pods, as shown in the sketch after this list.
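The following is a minimal sketch for collecting logs from every CSI node Pod, assuming a bash shell and that the Pod names begin with hspc-csi-node:
# Collect logs from all containers of each CSI node Pod.
for pod in $(kubectl get pods -n <namespace> -o name | grep hspc-csi-node); do
  kubectl logs -n <namespace> "$pod" --all-containers=true > "${pod#pod/}.log"
done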
Collecting storage system information for VSP family
If you are using an SVP, collect the regular dump files.
If you are not using an SVP, collect system dumps using the maintenance utility. For details about how to collect the dump files of storage systems, see the System Administrator Guide.
Collecting storage system information for VSSB
Collect the dump files. For the procedure on collecting dump files, contact customer support.
Viewing the volume properties of PersistentVolume
Volume properties are stored in spec.csi.volumeAttributes of the PersistentVolume. You can view these properties by running the command kubectl get pv <PV name> -o yaml. These properties are mainly used for internal purposes; the following tables describe the properties that can be helpful when troubleshooting.
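To view only the volume attributes instead of the full manifest, you can use jsonpath output; for example:
# kubectl get pv <PV name> -o jsonpath='{.spec.csi.volumeAttributes}'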
Volume properties for VSP family
Property | Description |
ldevIDDec | LDEV ID. |
size | Capacity of the volume. Note: The capacity shown here is the original capacity used when creating the volume. |
Volume properties for VSSB
Property | Description |
volumeID | ID of the volume created in VSSB. |
size | Capacity of the volume. Note: The capacity shown here is the original capacity used when creating the volume. |
Node failures
When a node failure occurs and nodes leave the cluster, reboot the operating system of those nodes to clear unnecessary files, such as orphaned device files, before the nodes rejoin the cluster.
Initial setup for Fibre Channel environment
If you encounter error code 0x0000c008 during Pod creation, and this is the first Pod creation after the initial setup of the Fibre Channel environment, consider deleting the Pod and rebooting the host to resolve the issue.
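For example, to delete the affected Pod before rebooting the host:
# kubectl delete pod <pod name> -n <namespace>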
Creating and deleting PersistentVolumeClaims simultaneously
If you create and delete many PersistentVolumeClaims simultaneously, you might encounter error code 0x0000100b, 0x0000100f, 0x0000101a, or 0x0000f007. You can reduce the occurrence of this problem by specifying the --worker-threads argument for the csi-provisioner container. This argument limits the number of simultaneously running create and delete operations; the default value is 100. The following example shows how to reduce --worker-threads to 10. For the YAML configuration, see Configuration of Storage Plug-in for Containers instance.
apiVersion: csi.hitachi.com/v1
kind: HSPC
metadata:
  name: hspc
  namespace: <SPC_NAMESPACE>
spec:
  imagePullSecrets:
    - regcred-redhat-com
    - regcred-redhat-io
  controller:
    containers:
      - name: csi-provisioner
        args:
          - --csi-address=/csi/csi-controller.sock
          - --timeout=300s
          - --v=5
          - --worker-threads=10
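To apply the updated configuration, save the manifest to a file and apply it with kubectl; the file name hspc.yaml here is only an example:
# kubectl apply -f hspc.yaml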
If the problem persists, contact technical support.
Host group settings
If you encounter error 0x00001023, you must modify the host group settings on the storage system. Storage Plug-in for Containers searches for the host group named "spc-<wwn1>-<wwn2>-<wwn3>", based on the naming rules (see Host group and iSCSI target naming rules). This error is likely generated because the host group's name does not follow the "spc-<wwn1>-<wwn2>-<wwn3>" format. To resolve the issue, delete the host group shown in the error message and rename the host group that has the host WWNs:
- Stop Storage Plug-in for Containers.
- Delete the host group that is specified in the error message.
- Search for host groups that have the WWNs of each host, and either delete them or rename them to "spc-<wwn1>-<wwn2>-<wwn3>".
- Start Storage Plug-in for Containers.