How to Collect and Download an HNAS Performance Information Report
Content

Objective

This article describes how to collect and download an HNAS performance information report (PIR).

Environment

  • Hitachi NAS Gateway Platform
    • Hitachi NAS Platform 5300 (HNAS 5300)
    • Hitachi NAS Platform 5200 (HNAS 5200)
    • Hitachi NAS Platform 4100 (HNAS 4100)
    • Hitachi NAS Platform 4080 (HNAS 4080)
    • Hitachi NAS Platform 4060 (HNAS 4060)
    • Hitachi NAS Platform 4040 (HNAS 4040)
  • Hitachi VSP-F/G/Nx00 Series NAS
    • VSP-F800 NAS
    • VSP-F600 NAS
    • VSP-F400 NAS
    • VSP-G800 NAS
    • VSP-G600 NAS
    • VSP-G400 NAS
    • VSP-N800 NAS
    • VSP-N600 NAS
    • VSP-N400 NAS

Procedure

Important Notes

  1. After collecting an HNAS Performance Information Report, please be sure to gather Diagnostics.
  2. It is essential to collect back-end storage performance information along with the HNAS performance logs.

To collect PIR data:

  1. Open an SSH session to the SMU.
  2. Determine the Cluster_Node IP address of the server with the performance issue.
  3. From the SMU bash shell, open an SSH session to the server with the performance issue.
Then run the pir command. To send the report to an email address, the syntax is:

$ pir -f <file_system_name> [<duration-in-minutes>] -r <your_email@domain.com> -s <subject>

For example: 

$ pir -f fs1 -r <your_email@domain.com> -s PIRdata

This command gathers performance statistics from the server over a specified period and then emails the results as a compressed folder attachment.  The example above runs for the default duration (~10 minutes), focuses on file system fs1, and sends the output to your_email@domain.com with the subject header PIRdata.

Note:  If e-mail is unavailable, the PIR can be saved to the SMU using the "--to-ssc" option in the PIR command.

 

Using the --to-ssc option, the syntax is:

$ pir -f <file_system_name> [<duration-in-minutes>] --to-ssc -s <subject>

For example:

$ pir -f fs1 --to-ssc -s PIRdata

After the PIR collection is complete, the --to-ssc option saves the PIR in the /home/manager directory on the SMU.  You can download it by connecting to the SMU IP address over SFTP (for example, with WinSCP).  See the man page for "pir" for more details.
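The two invocation forms described above differ only in how the report is delivered. As a minimal sketch, the following Python helper assembles either command line as a string so it can be pasted into the HNAS session. The function name `build_pir_command` and its parameters are hypothetical and are not part of HNAS; only the `pir` options shown in this article are used, and the helper does no quoting, so it assumes names without spaces.

```python
def build_pir_command(file_system, subject, email=None, duration_minutes=None):
    """Return a pir command line as a string (hypothetical helper, not an HNAS tool).

    If an email address is given, the report is emailed with -r;
    otherwise --to-ssc is used so the report is saved on the SMU.
    """
    parts = ["pir", "-f", file_system]
    if duration_minutes is not None:
        if duration_minutes <= 0:
            raise ValueError("duration must be a positive number of minutes")
        # The optional duration follows the file system name, as in the syntax above.
        parts.append(str(duration_minutes))
    if email:
        parts += ["-r", email]
    else:
        parts.append("--to-ssc")
    parts += ["-s", subject]
    return " ".join(parts)


# Email delivery, default duration
print(build_pir_command("fs1", "PIRdata", email="your_email@domain.com"))
# Saved on the SMU, default duration
print(build_pir_command("fs1", "PIRdata"))
```

Leaving `duration_minutes` unset keeps the default collection period, which is the recommended choice.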

Options

Filesystem

If you cannot determine which file system is causing the performance issue, the PIR can be run with the --no-file-system flag, but this is NOT recommended.  Instead, gather a PIR on what appears to be the busiest file system on the impacted node.  Please see this article on how to determine the most active file systems:

How To: Determine Busy File Systems (HNAS)

If a particular storage pool appears to be impacted, try to run the PIR on the busiest file system in that storage pool.  We don't recommend using --no-file-system because it collects much less data than a PIR focused on a file system, and it reports information aggregated across all the file systems on the node, which usually doesn't help narrow down where the problem is occurring.

In general, it is NOT recommended to use anything other than the default duration of 10 minutes, because only the last 10 minutes of the PIR contain high-resolution data.  For a short-lived problem, there is no harm in capturing the full default period, and for a long-running problem, there is little point in capturing more than the default collection period.

Duration in Minutes of PIR

The man page for the performance-info-report command can be viewed on the HNAS CLI using:

man performance-info-report | less 

The synopsis includes the option:

[<duration-in-minutes>]

which allows the operator to specify how long the PIR runs.  The system can only record 10 minutes of high-resolution statistics, so if you specify a period longer than 10 minutes, only the last 10 minutes of the PIR will have high-resolution statistics recorded.  As a result, this option is only useful for specifying a PIR that runs for less than 10 minutes.
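To make the 10-minute limit concrete, here is a small Python sketch that works out which part of a PIR run would carry high-resolution statistics for a given duration. The function `high_res_window` is a hypothetical illustration based only on the rule stated above; it does not query the system.

```python
HIGH_RES_MINUTES = 10  # limit stated in this article


def high_res_window(duration_minutes):
    """Return (start, end) in minutes from the start of the PIR run
    for the portion that has high-resolution statistics."""
    start = max(0, duration_minutes - HIGH_RES_MINUTES)
    return (start, duration_minutes)


for duration in (5, 10, 30):
    start, end = high_res_window(duration)
    print(f"{duration}-minute PIR: high-resolution data for minutes {start} to {end}")
```

For a 30-minute run, only minutes 20 to 30 are covered, so the first 20 minutes add collection time without adding high-resolution detail.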

Another consideration for PIR duration is that performance problems usually occur during periods of peak behavior, not average behavior.  If you run a PIR for a long duration, the performance counters are averaged over that period, and therefore they tend to reflect the average behavior, not the peak behavior.  As a result, any peak behavior becomes much less distinct, or is absorbed into the average entirely and is no longer apparent.
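The effect of averaging can be illustrated with a short Python sketch. The numbers below are made up purely for illustration (a steady load of 100 operations per second with a 2-minute burst of 1000); they are not measurements from any HNAS system.

```python
BASELINE_OPS = 100   # made-up steady load, ops/s
PEAK_OPS = 1000      # made-up burst load, ops/s


def load_profile(total_minutes, peak_minutes=2):
    """One sample per minute: steady baseline followed by a short burst."""
    return [BASELINE_OPS] * (total_minutes - peak_minutes) + [PEAK_OPS] * peak_minutes


def mean(samples):
    return sum(samples) / len(samples)


short_avg = mean(load_profile(10))  # burst is 2 of 10 minutes
long_avg = mean(load_profile(60))   # same burst is 2 of 60 minutes

print(f"10-minute average: {short_avg} ops/s")  # 280.0
print(f"60-minute average: {long_avg} ops/s")   # 130.0
```

The same 2-minute burst raises the 10-minute average to 280 ops/s but the 60-minute average only to 130 ops/s, which is barely distinguishable from the baseline.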

If you have an intermittent performance issue, then it is more beneficial to either schedule repeat PIRs to capture the problem or use the continuous PIR feature, as described in:

Using the --all-pnodes option

If the performance problem cannot be traced to a single cluster node, use the --all-pnodes option, which requires a GSC-provided developer password.

PIR Status

You can check the status of the PIR data collection using the pir-status command. It will show you the time remaining and allow you to calculate the elapsed time.

As with any performance measurement, there is an associated cost.  There is a chance that the data collection will worsen the problem to the point that the customer may wish to halt the PIR collection process.  In that case, please run pir-cancel, then run pir-status to confirm that the PIR collection has ended.  Of course, if no performance data is gathered, it is unlikely that the support engineers can help you.

Diagnosis

To start diagnosing a problem, we need a single capture of data while the problem occurs.  If you have collected multiple PIRs, please send us the one which covered the period when the problem appeared to be worst. PIR data collected when the problem is not occurring will indicate what might be considered "acceptable performance."  In some cases, such a baseline is helpful to compare against a PIR from when the problem occurs, but if we think that will be helpful, we will ask for that explicitly.  (Comparing a "known good" against a "known bad" can sometimes indicate what the differences are and thus what might be relevant to the problem if that is not obvious.  Usually, it is obvious what the problem is, though, which is why we don't ask for a "baseline PIR" as a matter of course.)

If the problem is intermittent, the article:

What Information Do I Need to Gather to Allow GSC to Diagnose an HNAS Performance Problem

describes how to capture a PIR in that situation.  It also contains details of everything else that needs to be examined to progress a "performance problem."

A PIR operates in three main phases:

  • Values of performance counters etc. are captured and then reset.
  • Various performance statistics are gathered regularly and saved for later analysis.  (This period is the "duration" of the PIR.)
  • The performance counters etc. are captured again, and various software and VLSI profiling is run.

The last phase takes ~3 minutes, so a default 10-minute PIR typically takes about 13 minutes to collect.
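The arithmetic behind that estimate can be written as a tiny Python sketch. The constant names and the function `estimated_total_minutes` are hypothetical; the 3-minute figure is the approximate value quoted above, so the result is an estimate rather than a guarantee.

```python
DEFAULT_DURATION_MINUTES = 10  # default PIR duration
WRAP_UP_MINUTES = 3            # approximate length of the final phase


def estimated_total_minutes(duration_minutes=DEFAULT_DURATION_MINUTES):
    """Estimate wall-clock collection time: statistics phase plus final phase."""
    return duration_minutes + WRAP_UP_MINUTES


print(estimated_total_minutes())   # 13
print(estimated_total_minutes(5))  # 8
```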

Storage Considerations

  • If the back-end storage is an AMS2000 series or HUS100 series array, we will need one hour's worth of midrange performance data captured while the problem is happening, followed by a simple trace. This capture must cover the 13-15 minute span of time during which the PIR is taken.
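When planning the two collections, it can help to check that the storage capture window fully contains the PIR run. The following Python sketch does that with plain timestamps. The function `capture_covers_pir` and the example times are hypothetical, and the sketch assumes the upper figure of 15 minutes for the PIR span.

```python
from datetime import datetime, timedelta

PIR_SPAN = timedelta(minutes=15)  # upper end of the 13-15 minute span (assumption)


def capture_covers_pir(capture_start, capture_end, pir_start, pir_span=PIR_SPAN):
    """Return True if the storage capture window contains the whole PIR run."""
    pir_end = pir_start + pir_span
    return capture_start <= pir_start and pir_end <= capture_end


# Example times, chosen only for illustration
capture_start = datetime(2024, 1, 15, 9, 0)
capture_end = capture_start + timedelta(hours=1)

print(capture_covers_pir(capture_start, capture_end, datetime(2024, 1, 15, 9, 20)))  # True
print(capture_covers_pir(capture_start, capture_end, datetime(2024, 1, 15, 9, 50)))  # False
```

In the second case the PIR would finish after the one-hour capture ends, so either the PIR should start earlier or the capture should start later.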

 
