What Information do I need to gather to allow GSC to diagnose an HNAS Performance Problem?
A standard set of HNAS diagnostics taken after the event is usually insufficient to diagnose the cause of an HNAS performance problem. If an HNAS system is having a performance problem, the high-level process is the following:
NOTE: It is important that the performance data gathered below, such as the PIR and packet capture, is collected while the performance problem is happening.
In reality, there is no such thing as a "performance problem"; there are only "lack of capacity problems." As such, there are three possible approaches that can be taken to address these:
A performance tuning exercise is something that can be led by Hitachi GSC. There are three things to bear in mind before going down this path:
The way this process proceeds is as follows:
The changes recommended may be:
However, it is not recommended to propose too many changes at once, for the following reasons:
As a result, a performance tuning exercise is usually iterative in nature:
The performance tuning loop finishes when:
In the event of 2 or 3, a system sizing exercise or product enhancement request may then be required.
System sizing would be carried out by your Hitachi account team and/or Hitachi GSS and there are two aspects to this:
Once this has been carried out and the necessary additional capacity has been provisioned, you can plan and implement a migration from the old capacity to the new capacity. This approach is particularly suitable for customers who want "one set of changes that is going to resolve my problem."
Under certain circumstances it may be possible to "tune" the hardware/software of a product so that it can accommodate a greater load without requiring any additional hardware resources. The process for asking whether any such "tuning" may be possible is to raise a Product Enhancement Request (PER).
An example of a product enhancement might be changing the system so that it can handle additional load before it exhausts the available CPU capacity. Within Hitachi, product enhancements are considered a sales function rather than a support function and should be requested through your account team.
Suggested product enhancements are reviewed by product management and, if deemed reasonable, are added to the engineering backlog for possible scheduling and implementation. In general, the lead time for product enhancement requests is quite long, and requests may be rejected if they are not considered suitable.
HNAS Performance Data Collection
While the above storage data collection is happening, kick off a 10-minute Performance Information Report (PIR) on the HNAS cluster specifying the file system that is currently not performing as expected:
Whilst the PIR above is being collected, please gather a short (~30 second) packet capture on an impacted client, as per the guidance in How to Collect Packet Captures for Troubleshooting HNAS Problems.
Please also provide:
After the PFM (storage) and PIR (HNAS) data collections have completed, gather a simple trace (Midrange) or dump (Enterprise) from the array, together with HNAS diagnostics, and upload everything collected to TUF:
Once the performance data collection is underway, please look at and provide answers for the HNAS Performance Issues Questionnaire:
If the file system to focus on is not obvious from the context of the problem, try to "focus" the PIR (-f switch) on the busiest file system on the impacted EVS or storage pool (span). You can determine the busy file systems using the process described in:
How To: Determine Busy File Systems (HNAS)
Please refer to the knowledgebase article:
If the performance problem is intermittent, we recommend using the HNAS crontab CLI command to start a PIR on the impacted file system at 00, 15, 30 and 45 minutes past every hour. You can then collect the PIRs and, when the problem reoccurs, send the PIR covering the time period in question to GSC. A default-length "10 minute" PIR takes approximately 13 minutes to run, so starting one every 15 minutes means the previous one will have completed before the next begins, whilst still giving good coverage.
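The scheduling arithmetic above can be sketched in Python to work out which scheduled PIR to send to GSC for a given problem time. This is an illustrative sketch only, not an HNAS tool: the function names are hypothetical, and it assumes PIRs start exactly on the quarter hour and run for about 13 minutes, as described above.

```python
from datetime import datetime, timedelta

INTERVAL_MIN = 15  # PIRs start at :00, :15, :30 and :45 (from the text above)
RUN_MIN = 13       # approximate run time of a default-length PIR (from the text above)


def pir_start_for(event: datetime) -> datetime:
    """Return the scheduled start of the most recent PIR at or before `event`."""
    floored_minute = event.minute - event.minute % INTERVAL_MIN
    return event.replace(minute=floored_minute, second=0, microsecond=0)


def pir_was_running(event: datetime) -> bool:
    """True if `event` falls inside the assumed ~13-minute run of that PIR."""
    return event < pir_start_for(event) + timedelta(minutes=RUN_MIN)


if __name__ == "__main__":
    problem_time = datetime(2024, 1, 1, 10, 37, 20)  # hypothetical incident time
    print("Send the PIR started at:", pir_start_for(problem_time))
    print("PIR was running at that time:", pir_was_running(problem_time))
```

Because 13 minutes is less than the 15-minute interval, consecutive runs never overlap under these assumptions. A problem that falls in the roughly two-minute gap before the next start is not covered by a running PIR.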
If you are using HNAS firmware 12.5 or later, you may also be able to use "continuous PIR". See the performance-info-report HNAS CLI command man page for additional details.
If a particular HNAS event seems to mark the start of the performance problem, then it may be useful to trigger the start of a PIR when that event occurs. The procedure for doing that is documented in: