Summary
Objective
Obtain the Performance Information Report (PIR) which contains the Hitachi Network Attached Storage (HNAS) internal logs by using the System Management Unit (SMU) when requested by Hitachi Vantara Global Support Center. The diagnostic information obtained by using this article can be uploaded to the Technical Upload Facility (TUF), using the support case number.
Environment
- Hitachi Network Attached Storage (HNAS) 15.3 and earlier.
- VSP Gx00/Fx00/N Series with NAS modules
Procedure
NOTE: A PIR operates in three main phases:
- The values of performance counters, etc., are captured and then reset.
- Various performance statistics are gathered regularly and saved for later analysis. (This period is the "duration" of the PIR.)
- The performance counters, etc., are captured again, and various software and VLSI profiling is run.
The last phase takes around 3 minutes, so a default 10-minute PIR typically takes about 13 minutes to collect.
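The timing in the note above is simple arithmetic and can be worked through as follows. This is an illustrative sketch only: the function name is hypothetical, and the 3-minute constant is the approximate figure quoted above, not a value read from the server.

```python
# Rough wall-clock estimate for a PIR, using the approximate figures from the note above.
FINAL_PHASE_MINUTES = 3        # "around 3 minutes" for the final capture and profiling
DEFAULT_DURATION_MINUTES = 10  # default PIR duration


def estimated_pir_minutes(duration_minutes=DEFAULT_DURATION_MINUTES):
    """Return the approximate total collection time: the duration plus the final phase."""
    if duration_minutes <= 0:
        raise ValueError("duration must be a positive number of minutes")
    return duration_minutes + FINAL_PHASE_MINUTES


print(estimated_pir_minutes())   # default 10-minute PIR
print(estimated_pir_minutes(5))  # shorter 5-minute PIR
```

Treat the result as a planning estimate; the actual collection time may vary.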
CAUTION: Important Notes:
After collecting an HNAS Performance Information Report, please be sure to gather Diagnostics.
It is essential to collect back-end storage performance information along with the HNAS performance logs.
To collect PIR data:
- Open an SSH session to the System Management Unit (SMU)
- Determine the Cluster Node IP address of the server with the performance issue.
- From the SMU bash shell, open an SSH session to the server with the performance issue.
- To send the PIR to an email address, the syntax is:
$ pir -f <file_system_name> [<duration_in_minutes>] -r <your_email@domain.com> -s <subject>
For example:
$ pir -f fs1 -r <your_email@domain.com> -s PIRdata
This command gathers performance statistics from the server over a specified period and then emails the results as a compressed folder attachment. The example above runs for the default duration (about 10 minutes), focuses on file system fs1, and sends the output to your_email@domain.com with the subject PIRdata.
Note: If email is unavailable, the PIR can be saved to the SMU by using the --to-ssc option of the pir command.
- Using the --to-ssc option, the syntax is:
$ pir -f <file_system_name> [<duration_in_minutes>] --to-ssc -s <subject>
For example:
$ pir -f fs1 --to-ssc -s PIRdata
After the PIR collection is complete, the --to-ssc option saves the PIR in the /home/manager directory on the SMU. To download it, connect to the SMU IP address with SFTP (for example, using WinSCP). See the man page for pir for more details.
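The two command forms shown above differ only in how the report is delivered. The sketch below assembles either form from its parts, which can help when preparing commands for several file systems. The helper name build_pir_command and its parameters are hypothetical; only the flags already shown in this article (-f, -r, -s, --to-ssc, and the optional duration) are used, and no quoting is applied, so values containing spaces are not handled.

```python
def build_pir_command(file_system, subject, email=None, duration_minutes=None):
    """Assemble a pir command line following the two syntaxes shown in this article.

    With an email address the report is mailed using -r; without one,
    --to-ssc is used so that the report is saved to the SMU instead.
    """
    args = ["pir", "-f", file_system]
    if duration_minutes is not None:
        # Optional duration, placed after the file system as in the syntax lines.
        args.append(str(duration_minutes))
    if email:
        args += ["-r", email]
    else:
        args.append("--to-ssc")
    args += ["-s", subject]
    return " ".join(args)


print(build_pir_command("fs1", "PIRdata", email="your_email@domain.com"))
print(build_pir_command("fs1", "PIRdata"))
```

The function only builds the text of the command; it still has to be run manually in the SSH session on the server.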
- Options
- Filesystem
If you cannot determine which file system is causing the performance issue, the PIR can be run with the --no-file-system flag, but this is NOT recommended. Instead, gather a PIR on what appears to be the busiest file system on the impacted node. Please see this article on how to determine the most active file systems: How To: Determine Busy File Systems (HNAS)
If a particular storage pool appears to be impacted, try to run the PIR on the busiest file system in that storage pool. We don't recommend using --no-file-system because it collects much less data than a PIR focused on a file system, and the information it gives is aggregated across all the file systems on the node, which usually doesn't help narrow down where the problem is occurring.
In general, it is NOT recommended to use anything other than the default duration of 10 minutes, because only the last 10 minutes of the PIR contain high-resolution data. For a short-lived problem, capturing the full default period does no harm, and for a long-running problem there is little point in capturing more than the default collection period.
- Duration in Minutes of PIR
If you look at the man page for the performance-info-report command, which can be viewed on the HNAS CLI using:
man performance-info-report | less
You will see in the synopsis that it has an option:
[<duration-in-minutes>]
that allows the operator to specify how long the PIR runs. The system can only record 10 minutes of high-resolution statistics, so if you specify a period longer than 10 minutes, only the last 10 minutes of the PIR will have high-resolution statistics recorded. As a result, this option is only useful for specifying a PIR that runs for less than 10 minutes.
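The effect of the 10-minute limit can be illustrated with a short calculation. This is a sketch under the assumption that the high-resolution window is exactly the last 10 minutes of the run, as described above; the function name is hypothetical.

```python
HIGH_RES_LIMIT_MINUTES = 10  # only this much high-resolution data is recorded


def high_res_window(duration_minutes):
    """Return (start, end), in minutes from the start of the PIR, covered by high-resolution statistics."""
    start = max(0, duration_minutes - HIGH_RES_LIMIT_MINUTES)
    return (start, duration_minutes)


for duration in (5, 10, 30):
    start, end = high_res_window(duration)
    print(f"{duration}-minute PIR: high-resolution data for minutes {start} to {end}")
```

For a 30-minute run, the first 20 minutes fall outside the window, which is why a longer duration adds little.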
Another consideration for PIR duration is that performance problems usually occur during periods of peak behavior, not average behavior. If you run a PIR for a long duration, the performance counters are averaged over that period, so they tend to reflect the average behavior rather than the peak behavior. As a result, peak behavior becomes much less distinct, or is absorbed into the average entirely and is no longer apparent.
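A small worked example shows how averaging hides a peak. The numbers are invented for illustration (a steady 2 ms response time with a 5-minute spike of 50 ms) and are not taken from any real PIR.

```python
from statistics import mean


def samples(total_minutes, spike_minutes=5, base_ms=2.0, spike_ms=50.0):
    """Hypothetical per-minute response times: a spike followed by steady behavior."""
    return [spike_ms] * spike_minutes + [base_ms] * (total_minutes - spike_minutes)


avg_10 = mean(samples(10))  # spike averaged over a 10-minute capture
avg_60 = mean(samples(60))  # the same spike averaged over a 60-minute capture

print(f"10-minute average: {avg_10} ms")
print(f"60-minute average: {avg_60} ms")
```

The same spike yields an average of 26 ms over 10 minutes but only 6 ms over 60 minutes, so the longer capture makes the problem look far milder than it was.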
If you have an intermittent performance issue, it is more beneficial either to schedule repeated PIRs to capture the problem or to use the continuous PIR feature, as described in: What if my performance problem is intermittent?
- Using the --all-pnodes option
If the performance problem cannot be traced to a single cluster node, use the --all-pnodes option, which requires a GSC-provided developer password.
PIR Status
You can check the status of the PIR data collection using the pir-status command. It will show you the time remaining and allow you to calculate the elapsed time.
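The elapsed-time calculation mentioned above is a simple subtraction. The sketch below assumes that the remaining time reported by pir-status is measured against the requested duration; the output format of pir-status is not shown in this article, so the remaining value is entered by hand and the function name is hypothetical.

```python
def elapsed_minutes(duration_minutes, remaining_minutes):
    """Return the minutes elapsed, given the requested duration and the time remaining."""
    if not 0 <= remaining_minutes <= duration_minutes:
        raise ValueError("remaining time must be between 0 and the duration")
    return duration_minutes - remaining_minutes


# Example: a default 10-minute PIR with 4 minutes reported as remaining.
print(elapsed_minutes(10, 4))
```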
As usual with performance measurement, there is an associated cost. There is a chance that the data collection will worsen the problem to the point where the customer wishes to halt the PIR collection process. In that case, run pir-cancel and then pir-status to confirm that the PIR collection has ended. Of course, if no performance data is gathered, it is unlikely that the support engineers can help you.
Diagnosis
To start diagnosing a problem, we need a single capture of data while the problem occurs. If you have collected multiple PIRs, please send us the one which covered the period when the problem appeared to be worst. PIR data collected when the problem is not occurring will indicate what might be considered acceptable performance. In some cases, such a baseline is helpful to compare against a PIR from when the problem occurs, but if we think that will be helpful, we will ask for that explicitly. (Comparing a "known good" against a "known bad" can sometimes indicate what the differences are and thus what might be relevant to the problem if that is not obvious. Usually, it is obvious what the problem is, though, which is why we don't ask for a "baseline PIR" as a matter of course.)
If the problem is intermittent, then the article What Information Do I Need to Gather to Allow GSC to Diagnose an HNAS Performance Problem describes how to go about capturing a PIR for that. It also contains details of all the other things that need to be looked at to progress a "performance problem."
