
Software solution examples

The management software for the Hitachi VSP storage systems enables you to increase operational efficiency, optimize availability, and meet critical business requirements.

Delivering storage infrastructure as a service through automated workflows

Financial institutions must provide services 24/7, with near-zero tolerance for outages or inaccessible data. Storage provisioning plays an integral part in data management. Organizations need to control the complexities associated with storage management while maintaining operational efficiency. A positive customer experience depends on how the data center is controlled and managed and on the ability to deliver applications consistently and on time. To achieve this objective, customers require a solution that alleviates these pain points:

  • Manual storage provisioning processes, which can lead to human errors. Studies show that more than 40% of outages in a storage environment are caused by human error.
  • Time-consuming operational inefficiencies
  • Cost-inefficient storage provisioning, which can waste storage resources
  • A requirement to know detailed infrastructure and environment information, with no abstraction layer
  • A requirement to manually analyze performance and capacity without any built-in intelligence or automation
Solution

Hitachi Automation Director (HAD) automates manual storage provisioning processes and provides application-based provisioning services that require minimal user input and that intelligently leverage infrastructure resources. HAD provides the following solutions to alleviate the pain points that customers experience in the current environment:

  • Implements intelligent automation workflows to streamline the storage provisioning process.
  • Provides a catalog of predefined service templates and plug-in components that incorporate Hitachi best practices in storage provisioning and minimize human error.
  • Provides customizable storage service templates that require minimal input, enabling administrators to increase operational efficiency.
  • Optimizes storage configurations for common business applications such as Oracle, Microsoft Exchange, and Microsoft SQL Server, and for hypervisors such as Microsoft Hyper-V and VMware.
  • Analyzes current storage pool capacity utilization and performance to automatically determine the optimal location for new storage capacity requests and make storage provisioning more cost-efficient (see the sketch after this list).
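HAD's placement engine is internal to the product, but the idea behind this capacity and performance analysis can be sketched in a few lines of Python. Everything here, from the field names to the scoring rule, is an illustrative assumption rather than HAD's actual algorithm:

```python
# Illustrative only: HAD's real placement logic is internal to the product.
# This sketch ranks candidate pools by free-capacity headroom and recent
# utilization, the kind of analysis described in the list above.

from dataclasses import dataclass

@dataclass
class Pool:
    name: str
    capacity_gb: float  # total pool capacity
    used_gb: float      # capacity already allocated
    busy_rate: float    # recent average utilization, 0.0 to 1.0

def pick_pool(pools, request_gb, max_busy=0.7):
    """Return the least-loaded pool that can absorb the request, or None."""
    candidates = [
        p for p in pools
        if p.capacity_gb - p.used_gb >= request_gb and p.busy_rate <= max_busy
    ]
    # Prefer the pool with the most headroom left after the allocation;
    # break ties in favor of the lower busy rate.
    return max(
        candidates,
        key=lambda p: (p.capacity_gb - p.used_gb - request_gb, -p.busy_rate),
        default=None,
    )

pools = [
    Pool("pool-tier1", capacity_gb=10_000, used_gb=9_200, busy_rate=0.55),
    Pool("pool-tier2", capacity_gb=20_000, used_gb=8_000, busy_rate=0.35),
]
print(pick_pool(pools, request_gb=500).name)  # pool-tier2 has more headroom
```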
Management software

HAD offers a web-based portal and includes a catalog of predefined workflows that are based on best practices for various applications. These workflows take into account the infrastructure requirements of specific applications, including the appropriate storage tier. Because the workflow captures the provisioning process with predefined requirements, a storage administrator can repeatedly provision infrastructure with simple requests.

After information for provisioning is submitted, the HAD intelligent engine matches the request with the appropriate infrastructure based on performance and capacity analysis. HAD expedites the provisioning process and enables smarter data center management. It also provides a REST-based API to integrate provisioning workflows into existing IT operations management applications.

Card View
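As a rough illustration of such an integration, the following Python sketch submits a provisioning service request over REST. The base URL, resource path, and payload fields are placeholders invented for this example; the real endpoints and schemas are defined in the HAD REST API reference:

```python
# Hypothetical sketch of driving a HAD provisioning service over REST.
# The host, port, resource path, and payload fields below are assumptions
# made for illustration; consult the HAD REST API reference for the
# actual contract.

import requests

HAD_BASE = "https://had.example.com:22015"  # placeholder address

def submit_provisioning_request(service_id, params, user, password):
    """Submit a service request and return the resulting task record."""
    resp = requests.post(
        f"{HAD_BASE}/Automation/v1/objects/ServiceRequests",  # assumed path
        json={"serviceID": service_id, "parameters": params},
        auth=(user, password),
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

# Example: request a 200 GB volume from an Oracle-oriented service template.
task = submit_provisioning_request(
    service_id=42,
    params={"volumeSizeGB": 200, "applicationType": "OracleDB"},
    user="admin",
    password="secret",
)
print(task)
```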

HAD includes a comprehensive tool, Service Builder, for creating new workflows and modifying existing workflows and plug-in components that automate storage management tasks for a given operating environment.

Service Builder

HAD supports all native block storage systems and third-party storage systems through virtualization technology.

Data protection for business-critical Oracle databases

Data protection and recovery operations are cited by most customers as one of their top three IT-related challenges. Meanwhile, traditional solutions cannot keep up with rampant data growth, increasing complexity, and the distribution of infrastructure. Tighter data availability service-level requirements (backup window, recovery point objective, and recovery time objective) create an impossible situation for line-of-business owners.

The simple truth is that backup is broken in certain highly important areas, including critical 24x7 applications with large databases.

The business demands that critical data be protected with little or no data loss, and with minimal or no impact on performance or availability while data protection occurs.

Solution

Hitachi Thin Image (HTI) provides fast copies of the production data and Hitachi Universal Replicator (HUR) ensures that there is an asynchronous copy of the data on another storage system in a distant location. Hitachi Data Instance Director (HDID) orchestrates the HTI and HUR data protection activities through a business-objective-driven, whiteboard-like graphical interface, and ensures application consistency for both local and remote snapshots.

The HDID policy is defined in terms of recovery point objective (RPO) and retention, so that new application-aware snapshots are taken to meet each RPO and are deleted after the retention period.
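HDID evaluates these policies internally, but the two rules described above (take a new snapshot when the newest one is older than the RPO, and prune snapshots that have exceeded their retention) can be illustrated with a minimal Python sketch. The function and field names are assumptions for illustration only:

```python
# Illustrative sketch of the RPO/retention rules described above. HDID
# implements this logic internally; the names here are assumptions.

from datetime import datetime, timedelta

def evaluate_policy(snapshot_times, now, rpo, retention):
    """Return (need_new_snapshot, snapshots_past_retention)."""
    newest = max(snapshot_times, default=None)
    need_new = newest is None or now - newest >= rpo
    expired = [t for t in snapshot_times if now - t > retention]
    return need_new, expired

now = datetime(2024, 1, 10, 12, 0)
snaps = [now - timedelta(hours=h) for h in (1, 5, 30, 80)]
need_new, expired = evaluate_policy(
    snaps, now, rpo=timedelta(hours=4), retention=timedelta(days=2)
)
print(need_new)  # False: newest snapshot is 1 hour old, within the 4-hour RPO
print(expired)   # only the 80-hour-old snapshot exceeds the 2-day retention
```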

Management software

Hitachi Data Instance Director (HDID) combines modern data protection with business-defined copy data management, simplifying the creation and management of complex data protection and retention workflows.

For simplified management, HDID provides a powerful, easy-to-use workflow-based policy engine, so that you can define a data protection workflow within 10 minutes:

  • Service-level agreement (SLA)-driven policy enables administrators to define the data classification (such as SQL Server or Oracle), data protection operations, and required SLAs (RPO, data retention).
  • Whiteboard-style data flow enables the administrator to define the copy destinations and assign policies to them using drag-and-drop operations. The topological view helps the administrator to visualize the data protection processes and align them with the management requirements.

HDID whiteboard-style data flow

You can use different methods to back up data across multiple sites, as described below.

  • Identical snapshots and clones: Provide identical RPO and data retention regardless of location. Keeping identical backups provides identical recovery options and procedures during a site failover, which simplifies the entire restore process.
  • Unique snapshots and clones: Provide flexible RPO and data retention based on differing business requirements between normal operation and a site failover. Keeping independent backups enables shorter RPOs and lower retention to be set on the local site for quick recovery, while protecting data longer on the remote site.

RPO snapshots and clones

Using Infrastructure Analytics Advisor for data analysis: from deep dive to recovery planning

Hitachi Infrastructure Analytics Advisor is bundled with Hitachi Data Center Analytics in a software package called Performance Analytics.

Infrastructure Analytics Advisor provides an intuitive GUI for performance monitoring, management, and troubleshooting.

Data Center Analytics performs the data collection from monitored targets (such as storage systems, hosts, and switches) using software probes that support each device or environment. Data Center Analytics also provides historical trend analysis and extensive report generation capabilities.

The Performance Analytics solution provides end-to-end monitoring and troubleshooting capabilities for your infrastructure resources, from host to storage system. The basic workflow for Performance Analytics troubleshooting is called the MAPE loop (sketched after the list):

  • Monitor
  • Analyze
  • Plan
  • Execute
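Conceptually, the loop is a simple control cycle. The following Python skeleton is only an illustration of that structure; each phase stands in for the Performance Analytics activities described in the rest of this section:

```python
# Conceptual skeleton of the MAPE loop. Each phase is a placeholder for the
# corresponding Performance Analytics activity described in this section.

import time

def mape_loop(monitor, analyze, plan, execute, interval_s=300):
    """Run Monitor -> Analyze -> Plan -> Execute until interrupted."""
    while True:
        metrics = monitor()                  # collect current performance data
        problems = analyze(metrics)          # find threshold violations, bottlenecks
        if problems:
            recovery_plan = plan(problems)   # build a recovery plan
            execute(recovery_plan)           # apply it, then keep monitoring
        time.sleep(interval_s)
```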

When reviewing and evaluating reports and event information on the Infrastructure Analytics Advisor Dashboard, you can also perform a deep dive analysis by invoking the Data Center Analytics UI. The deep dive is part of the Analyze segment of the MAPE loop workflow.

The following workflow is an example of how to use this troubleshooting methodology as an infrastructure administrator, managing user resources (such as consumers, VMs, and volumes) and system resources (such as cache, ports, CPUs, and disks).

Viewing the dashboard

As an infrastructure administrator, you set up dynamic thresholds on the user resources you are monitoring. After seeing nine critical alerts on the VM/Host resource gauge, you decide to troubleshoot a threshold violation.
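Infrastructure Analytics Advisor derives dynamic thresholds from observed behavior automatically. As a rough illustration of the underlying idea (not the product's algorithm), a threshold can be derived from a baseline of recent samples:

```python
# Rough illustration of a dynamic threshold: flag samples that stray from a
# baseline of recent history. Infrastructure Analytics Advisor derives its
# thresholds internally; the window and tolerance here are assumptions.

from statistics import mean, stdev

def dynamic_threshold(history, k=3.0):
    """Upper alert threshold: baseline mean plus k standard deviations."""
    return mean(history) + k * stdev(history)

history = [12.1, 11.8, 12.5, 13.0, 12.2, 11.9, 12.7]  # e.g. response time (ms)
threshold = dynamic_threshold(history)
latest = 18.4
if latest > threshold:
    print(f"critical alert: {latest} ms exceeds dynamic threshold {threshold:.1f} ms")
```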


You browse the resources with critical alerts and select the target VM to analyze in the E2E View.

Using E2E or Sparkline views

The E2E view represents the topology of infrastructure resources: from host, to fabric switch, to storage system. The infrastructure administrator sets the base point of analysis on the target resource. This view allows you to see the relationships between resources.

To move deeper into the underlying resources, you can invoke the Sparkline view, which presents multiple charts that track performance by component. Use this view to correlate performance trends between user and system resources.
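The correlation that the Sparkline view lets you spot visually can also be expressed numerically. The following sketch, which is illustrative and not part of the product, ranks candidate system resources by how strongly their trends track the affected user resource:

```python
# Illustrative: quantify the trend correlation that the Sparkline view shows
# visually. Requires Python 3.10+ for statistics.correlation. A Pearson
# coefficient near 1.0 means the two series move together.

from statistics import correlation

vm_response_ms   = [5, 6, 9, 14, 13, 8, 6]       # affected user resource
port_busy_rate   = [30, 31, 30, 32, 29, 31, 30]  # candidate system resource
cache_write_pend = [10, 15, 40, 85, 80, 35, 12]  # another candidate

for name, series in [("port busy rate", port_busy_rate),
                     ("cache write pending", cache_write_pend)]:
    print(name, round(correlation(vm_response_ms, series), 2))
# The cache series tracks the VM's response time far more closely, so it
# becomes the bottleneck candidate to verify.
```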

Using additional troubleshooting tools

Infrastructure Analytics Advisor offers multiple troubleshooting tools for isolating a bottleneck candidate and identifying its root cause. You can launch any of the following tools for further analysis:

  • Verify Bottleneck: Use at the initial stage of analysis to compare performance charts of the base point of analysis with those of the bottleneck candidate.
  • Identify Affected Resources: Use to display the user resources that rely on the bottlenecked resource.
  • Analyze Shared Resources: Use if you suspect that the root cause of the problem is resource contention, a noisy neighbor that disrupts the balance of resource usage. You compare performance charts of the bottleneck candidate with those of the resources that use it. After comparing performance across a number of resources with Analyze Shared Resources, you isolate the actual bottleneck.
  • Analyze Related Changes: Use if Analyze Shared Resources does not reveal the actual bottleneck (noisy neighbor), or if you suspect that the root cause of the problem is a recent configuration change. In this view, you compare performance charts with configuration events. The bar graph portion of the chart represents the configuration changes made at a particular time. You can click on a bar to list those changes.
Performing a deep dive analysis

Regardless of which tool you use, once you have isolated the bottleneck candidate and validated its root cause, you can collect more information to understand its origin. For example, you have identified a storage system as the bottleneck. Subsequently, you want to understand how the problem affects other resources or vice versa. This phase of the troubleshooting analysis is called the deep dive. In a deep dive analysis, you can compare the data of various components from the resource tree, which displays all the resources and their components in your infrastructure, and run a customized report against that data.

To proceed with the deep dive for information, launch the Data Center Analytics UI, which provides detailed reports at the component level. You can launch this component-level view from the following windows in the Infrastructure Analytics Advisor UI during analysis:

  • E2E view
  • Sparkline view
  • Performance tab of the Show detail window for a resource
  • Analyze Shared Resources
  • Analyze Related Changes

When analyzing system resources in Data Center Analytics, you can view performance charts based on various metrics to correlate components with resource performance. For example, you have validated the root cause of the storage system bottleneck, but you want to perform further analysis in Data Center Analytics.

The following figure examines the performance of the volume from the VM side. This report, LDEV IOPS versus Response Time, displays spikes at specific times, which you can use as reference points for when I/O activity was particularly intensive during otherwise typical workloads.

LDEV IOPS versus Response Time report

Digging deeper, you discover the storage systems and volumes associated with a particular VM. You cross-reference the resources in the VM performance chart and determine that the component whose performance correlates with the VM is the cache on the storage side, specifically the Write Pending Rate of the cache logical partition (CLPR). This workload is typically intensive, but you realize that the times when the resource reached 100% correlate with the spikes in the LDEV IOPS versus Response Time report.

Often, the performance problem is a recurring trend. For example, when monitoring certain infrastructure resources, you notice spikes in I/O activity every weekday at 3 PM. When you create a customized report, you discover that this trend has persisted for six months. (In theory, you can review performance history spanning months to years.) This capability to review past performance adds a historical element to deep dive analysis.
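Such a customized report amounts to aggregating a long metric history by weekday and hour. The following sketch, again outside the product and using invented sample data, shows how a recurring 3 PM pattern could be surfaced:

```python
# Illustrative: surface a recurring weekday/hour pattern in a long metric
# history, the kind of trend a customized historical report makes visible.
# The sample data is invented.

from collections import defaultdict
from datetime import datetime
from statistics import mean

def recurring_hotspots(samples, top_n=3):
    """samples: (timestamp, iops) pairs. Return the (weekday, hour) buckets
    with the highest average IOPS."""
    buckets = defaultdict(list)
    for ts, iops in samples:
        buckets[(ts.strftime("%A"), ts.hour)].append(iops)
    averages = {slot: mean(vals) for slot, vals in buckets.items()}
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

samples = [
    (datetime(2024, 1, 8, 15), 9_500),   # Monday 3 PM spike
    (datetime(2024, 1, 8, 10), 2_100),
    (datetime(2024, 1, 15, 15), 9_800),  # Monday 3 PM, one week later
    (datetime(2024, 1, 15, 10), 2_300),
]
print(recurring_hotspots(samples))  # the ('Monday', 15) bucket dominates
```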

Initiating a recovery plan to solve the performance problem

After establishing the correlation between the two charts, you return to the Infrastructure Analytics Advisor UI to initiate a recovery plan. You can enter the key metric, date, and time of the problem occurrence, and the target value for the metric. In this case, the problem component is the CLPR; the key metric is IOPS. You can specify conditions, then review the recovery plan generated by Infrastructure Analytics Advisor before executing it.


After you have successfully executed the recovery plan, you can adjust your thresholds with new metric settings to monitor the user resources (in this case, the VM and the affected volume). At this stage, you have completed the MAPE loop.

 
