NB: If the problem is that the HNAS appears to be "hung" rather than just performing slowly, then please refer instead to:
The main knowledgebase article for HNAS performance problems, which refers to this one, is:
The following questions are intended to give us a good understanding of the performance problem that you are seeing and to gather as much potentially relevant information as possible. Gathering all of this information up front means we do not have to wait for questions to go back and forth, and it should help speed up the resolution of your problem.
Questions
- Problem description:
- Please provide a short description of what the problem is perceived to be.
- Symptoms:
- What symptoms are seen on the clients? Are there any error messages on the clients? If so, what exactly are they?
- What symptoms are seen on the HNAS? Are there any error messages in the HNAS eventlog or debug log that appear to correlate with when the problem occurs? If so, what exactly are they?
NB: For text error messages, please provide them as cut-and-paste text or saved to a text file, rather than as screenshot bitmaps.
- Extent of impact:
- Which share(s)/export(s) on the HNAS are impacted?
- Which EVS(es) on the HNAS are impacted?
- What are the names of the HNAS nodes on which those EVS(es) are currently hosted?
- What are the names of the file systems on the HNAS that are impacted?
- Do all clients seem to be impacted, or just certain subsets of clients? If certain subsets, what differences are there between the working and non-working clients? (OS version, location on the network, NFS vs. SMB, etc.)
- Timing of problem:
- When did the problem seem to first occur?
- How many times has the problem occurred and when did each occurrence start and finish?
- Are there any known periodic/scheduled "jobs" in the environment which correlate with the timings of when the problem is seen?
- When the problem was occurring were there any tasks running on the storage, such as formats or reconstructions? On which storage array(s)? When did these tasks start and finish?
- What has been done so far to try to troubleshoot or narrow down the cause of this issue?
- Have you ruled out the common causes of HNAS performance problems listed in the knowledgebase article What Are Common Causes of Performance Problems in HNAS Systems?
- Have there been any changes in the environment around the time the problem was first noticed that may be related?
- Change in load?
- Hardware changes?
- Software changes?
- Network changes?
- Is Anti-Virus configured on the HNAS for the impacted shares?
- If Anti-Virus is suspected, please send Anti-Virus logs.
- What protocols are being used (NFS, CIFS, iSCSI, FTP):
- If NFS, what mount options are being used?
- If iSCSI which initiator is being used?
- Which OS are the client(s) running? [e.g. SW version and patch level]
- If Red Hat or CentOS, please provide the exact version you are running. (A sketch for gathering this client-side information appears after this sub-list.)
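Where the affected clients are Linux machines, the OS details and NFS mount options asked for above can be captured in one pass. The following is a minimal sketch, assuming a Linux client with standard tooling (uname, /etc/os-release, and nfsstat from nfs-utils); the output file name is arbitrary.

```python
#!/usr/bin/env python3
"""Minimal sketch: gather client OS details and NFS mount options for a
support case. Assumes a Linux client with uname, /etc/os-release, and
(optionally) nfsstat from nfs-utils."""
import os
import subprocess

def run(cmd):
    """Run a command, returning its output or a note if the tool is missing."""
    try:
        return subprocess.run(cmd, capture_output=True, text=True).stdout
    except FileNotFoundError:
        return f"[{cmd[0]} is not installed on this client]\n"

sections = [
    ("Kernel / OS", run(["uname", "-a"])),
    # /etc/os-release identifies the distribution (e.g. Red Hat, CentOS) and version.
    ("Distribution", open("/etc/os-release").read()
        if os.path.exists("/etc/os-release") else "[no /etc/os-release]\n"),
    # 'nfsstat -m' lists each NFS mount together with its effective mount options.
    ("NFS mounts and options", run(["nfsstat", "-m"])),
]

with open("client_info.txt", "w") as out:
    for title, body in sections:
        out.write(f"=== {title} ===\n{body}\n")
print("Wrote client_info.txt -- attach this file to the case.")
```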
- What is the network topology between the affected client and HNAS server?
- Please provide the port configuration, showing flow-control settings and port counters, for the ingress and egress ports involved on each switch/router in the path. [Specific commands depend on the switch make/model.]
- Please confirm that flow control is enabled on the client-end and server-end switches (the preferred setting). A sketch for capturing the client-side view follows this sub-list.
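Switch-side commands vary by vendor, but the client-side view of the flow-control (pause) settings and NIC counters can be captured on a Linux client with ethtool. A minimal sketch, assuming a Linux client; "eth0" is a placeholder for whichever interface carries the NFS/SMB traffic.

```python
#!/usr/bin/env python3
"""Minimal sketch: capture client-side flow-control settings and NIC counters.
Assumes a Linux client with ethtool; 'eth0' is a placeholder interface name."""
import subprocess
import sys

iface = sys.argv[1] if len(sys.argv) > 1 else "eth0"

for args, label in [
    # 'ethtool -a' reports the pause (flow-control) parameters for the NIC.
    (["ethtool", "-a", iface], "Pause (flow-control) parameters"),
    # 'ethtool -S' dumps the driver's statistics counters (the exact set varies by NIC).
    (["ethtool", "-S", iface], "NIC statistics counters"),
]:
    print(f"=== {label}: {' '.join(args)} ===")
    result = subprocess.run(args, capture_output=True, text=True)
    print(result.stdout or result.stderr)
```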
- Is the problem reproducible? If so, what are the steps to reproduce it?
- Is the problem periodic? If so, what seems to correlate with when the problem occurs? e.g. expected peak load, everybody starting in the morning and logging in, ...
- Is there any additional information that you think might be relevant that we should be aware of?
Environment
- Hitachi Network Attached Storage (HNAS)
- 3100/3200
- 3080/3090
- 4000 series
Answer
To be provided by customer.
Additional Notes
Performance Data Collection
- If the HNAS Is Not Attached to Hitachi Storage
- Start the PIR (Performance Information Report) on the HNAS while the problem is happening
- When the PIR is finished, collect Diagnostics From HNAS
- If the Problem Involves HNAS Connected to a Midrange Array (AMS2000 or HUS100 Series)
- Validate that Network Time Protocol (NTP) is set up for the Midrange Array, Collection Server, and HNAS
- If NTP is not possible on all items, note the time differential between the HNAS clock, array clock, Collection Server time, and local time (a clock-offset sketch appears after this scenario)
- Start the Performance Collection from the Midrange Array
- While the Performance Collection is running on the Midrange Array, start the PIR on the HNAS; the two collections must overlap (an overlap-check sketch appears after the next scenario below)
- When the Performance Collection is complete for the Midrange Array, collect a Simple Trace
- When the PIR is complete on the HNAS, collect Diagnostics
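When NTP cannot be enabled on every device, the clock offsets have to be recorded by hand so that traces from the different devices can be lined up afterwards. The sketch below is one way to turn a set of clock readings (taken as close to the same moment as possible) into offsets relative to the Collection Server; the device names and timestamps are placeholders.

```python
#!/usr/bin/env python3
"""Minimal sketch: compute clock offsets from manually observed readings.
The timestamps below are placeholders -- read each device's clock as close
to the same moment as possible and substitute the real values."""
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Clock readings taken at (approximately) the same wall-clock instant.
readings = {
    "Collection Server": "2024-05-01 10:00:00",  # reference clock
    "HNAS":              "2024-05-01 10:00:42",
    "Midrange Array":    "2024-05-01 09:58:17",
    "Local time":        "2024-05-01 10:00:05",
}

reference = datetime.strptime(readings["Collection Server"], FMT)
for device, stamp in readings.items():
    offset = (datetime.strptime(stamp, FMT) - reference).total_seconds()
    print(f"{device:18s} offset vs Collection Server: {offset:+.0f} s")
```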
- If the Problem Involves HNAS Connected to an Enterprise Array With a Midrange Array Being Virtualized
- Validate that Network Time Protocol (NTP) is set up for the Midrange Array, Collection Server, HNAS, and Enterprise Array
- If NTP is not possible on all items, note the time differential between the HNAS clock, array clock, Collection Server time, and local time (the clock-offset sketch above applies here as well)
- Start the Performance Collection from the Midrange Array
- While the Performance Collection is running on the Midrange Array, start the PIR on the HNAS; the two collections must overlap (see the overlap-check sketch after this scenario)
- When the Performance Collection is complete for the Midrange Array and the HNAS, collect Performance Data from the Enterprise Array
- When the Performance Data collection is complete for the Enterprise Array, collect the following hardware logs:
- Detailed DUMP from Enterprise Array:
- Simple Trace From Midrange Array: DF Hardware Data Collection
- Diagnostics From HNAS: Hitachi NAS (HNAS) Platform Data Collection
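Because the array Performance Collection and the HNAS PIR must overlap, it is worth verifying the recorded start and end times before tearing anything down. A minimal sketch, assuming the start/end times of each collection were noted in UTC (the values below are placeholders):

```python
#!/usr/bin/env python3
"""Minimal sketch: verify that the array Performance Collection and the
HNAS PIR windows overlap. All timestamps are placeholders (UTC)."""
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"

# Recorded (start, end) times for each collection.
array_pc = ("2024-05-01 10:00", "2024-05-01 11:00")
hnas_pir = ("2024-05-01 10:30", "2024-05-01 11:30")

a_start, a_end = (datetime.strptime(t, FMT) for t in array_pc)
b_start, b_end = (datetime.strptime(t, FMT) for t in hnas_pir)

# Two intervals overlap when each one starts before the other ends.
if a_start < b_end and b_start < a_end:
    overlap = min(a_end, b_end) - max(a_start, b_start)
    print(f"Collections overlap by {overlap}.")
else:
    print("WARNING: the collections do NOT overlap -- re-run them so they do.")
```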
- If the Problem Involves HNAS Connected to an Enterprise Array Without a Midrange Array Being Virtualized
- Validate NTP on the HNAS and, if possible, on the Enterprise Array
- If NTP is not possible on all items, note the time differential between the HNAS clock, array clock, Collection Server time, and local time (as in the clock-offset sketch above)
- Start the PIR on the HNAS while the problem is happening
- When the PIR is finished, collect Diagnostics From HNAS
- When the PIR is finished, collect a Detailed DUMP from the Enterprise Array:
- Collect Performance Data from the Enterprise Array within 4 hours of the PIR (a timing-check sketch follows)
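A quick way to confirm the Enterprise Array performance data falls within the four-hour window is to compare the two timestamps; the values below are placeholders.

```python
#!/usr/bin/env python3
"""Minimal sketch: confirm the Enterprise Array performance data was
collected within 4 hours of the PIR. Timestamps are placeholders (UTC)."""
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M"
pir_finished   = datetime.strptime("2024-05-01 11:30", FMT)
data_collected = datetime.strptime("2024-05-01 14:45", FMT)

gap = data_collected - pir_finished
status = "within" if gap <= timedelta(hours=4) else "OUTSIDE"
print(f"Gap between PIR and array data: {gap} ({status} the 4-hour window).")
```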