Typical Content Software for File configuration

Product configuration is described for the following:

  • Backend hosts
  • Client hosts
  • High availability (HA)
  • RDMA and GPUDirect storage

Backend hosts

In a typical Content Software for File system configuration, the backend hosts access the network for two different functions:

  • Standard TCP/UDP network for management and control operations.
  • High-performance network for data-path traffic.
Note: To run both functions on the same physical interface, contact your customer support representative.

The high-performance network used to connect all the backend hosts must be DPDK-based. This internal Weka network requires a separate IP address space (see Network Planning and Network Configuration). For this network, the Weka system maintains its own ARP database for its IP addresses and virtual functions and does not use the kernel or operating system ARP services.

Backend hosts with DPDK-supporting Mellanox and Intel E810 NICs

For backend hosts equipped with DPDK-supporting Mellanox (CX-4 or newer) and Intel E810 NICs, the following conditions must be met:

  • Mellanox OFED must be installed and loaded.
  • There is no need to use SR-IOV, so the number of IPs allocated to the backend hosts on the internal network should equal the total number of backend hosts; for example, 8 IPs for 8 backend hosts.
Note: SR-IOV enablement in the hardware is optional. If enabled, DPDK generates its own MAC addresses for the VFs (Virtual Functions) of the NIC and the same NIC can support multiple MAC addresses, some handled by the operating system and others by the Weka system.

Backend hosts with other DPDK-supporting NICs

For backend hosts equipped with other DPDK-supporting NICs, the following conditions must be met:

  • A driver with DPDK support must be installed and loaded.
  • SR-IOV must be enabled in the hardware (BIOS + NIC).
  • The number of IPs allocated to the backend hosts on the internal network should be the total number of Weka software processes plus the total number of backend hosts. For example, a cluster of 8 machines running 10 Weka processes each requires 88 IPs (80 + 8) on the internal network, as shown in the sketch below. The IP requirements for the Weka clients are outlined in the Client hosts section below.
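
As a rough illustration of this arithmetic (a sketch only; the host and process counts are the example values from this section, not defaults):

# Internal-network IPs for SR-IOV based backends:
# total IPs = (backend hosts x Weka processes per host) + backend hosts
HOSTS=8; PROCS_PER_HOST=10
echo $(( HOSTS * PROCS_PER_HOST + HOSTS ))    # prints 88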

Client hosts

Unlike the Content Software for File backend hosts, which must be DPDK/SR-IOV based, the Content Software for File client hosts (application servers) can use either DPDK-based or UDP modes. The DPDK mode is the natural choice for the newer, high-performing platforms that support it.

Client hosts with DPDK-supporting Mellanox and Intel E810 NICs

For client hosts equipped with DPDK-supporting Mellanox (CX-4 or newer) and Intel E810 NICs, the following conditions must be met:

  • Mellanox OFED must be installed and loaded.
  • There is no need to use SR-IOV, so the number of IPs allocated to the client hosts on the internal network should equal the total number of client hosts; for example, 10 IPs for 10 client hosts.

Client hosts with other DPDK-supporting NICs

For client hosts equipped with other DPDK-supporting NICs, the following conditions must be met to use the DPDK mode:

  • A driver with DPDK support must be installed and loaded.
  • SR-IOV must be enabled in the hardware (BIOS + NIC).
  • The number of IPs allocated to the client hosts on the internal network should be the total number of Weka system FrontEnd (FE) processes (typically no more than 2 per host) plus the total number of client hosts. For example, 10 client hosts with 1 FE process per client require 20 IPs (10 FE IPs + 10 host IPs), as in the sketch below.
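
The same arithmetic applies on the client side (a sketch using the example values above):

# Internal-network IPs for SR-IOV based clients:
# total IPs = (client hosts x FE processes per host) + client hosts
CLIENTS=10; FE_PER_CLIENT=1
echo $(( CLIENTS * FE_PER_CLIENT + CLIENTS ))    # prints 20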

Client hosts in UDP mode

The UDP mode is available for legacy clients lacking SR-IOV or DPDK support, or where there is no requirement for low-latency, high-throughput IO.

For client hosts in the UDP mode, the following conditions must be met:

  • The native driver must be installed and loaded.
  • The number of IPs allocated to the client hosts on the internal network should be equal to the total number of client hosts. For example, 10 client hosts in the UDP mode require 10 IPs on the internal network.

High availability (HA)

For HA support, the Content Software for File system must be configured with no single component representing a single point of failure. Multiple switches are required, and each host must have one network leg on each switch.

HA for hosts is achieved either through the implementation of two network interfaces on the same host or via LACP (Ethernet only, modes 1 and 4). The non-LACP approach provides redundancy at the Weka software level, enabling the software to use the two interfaces for both HA and bandwidth.

HA performs failover and failback for reliability and load balancing on both interfaces and is operational for both Ethernet and InfiniBand. If not using LACP, it requires doubling the number of IPs on both the host and the IO nodes.

When working with HA networking, it is useful to hint the system which switch each network port is connected to (using the label parameter of the weka cluster host net add command), so that data between hosts is sent through the same switch rather than across the ISL or other paths in the fabric. This can reduce the overall traffic in the network, as in the example below.
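
For example, a labeling scheme might look like the following (a sketch only; the host IDs, device names, and exact flag spelling are illustrative assumptions, check the CLI reference for your release):

# Hypothetical example: tag each port with the switch it is cabled to,
# so same-switch paths can be preferred over the ISL.
weka cluster host net add 0 ib0 --label switch-A
weka cluster host net add 0 ib1 --label switch-B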

Note: LACP is currently supported between ports on a single Mellanox NIC, and is not supported when using VFs.

RDMA and GPUDirect storage

GPUDirect Storage enables a direct data path between storage and GPU memory. GPUDirect Storage avoids extra copies through a bounce buffer in the CPU’s memory. It allows a direct memory access (DMA) engine near the NIC or storage to move data on a direct path into or out of GPU memory without burdening the CPU or GPU.

When enabled, the Content Software for File system automatically utilizes the RDMA data path and GPUDirect Storage in supported environments. When the system identifies that it can use RDMA, in both UDP and DPDK modes, it does so for workloads that can benefit from it (with regard to IO size: 32K+ for reads and 256K+ for writes).

Using RDMA/GPUDirect Storage therefore provides a performance gain: you can get much higher performance from a UDP client (which does not require dedicating a core to the Content Software for File system), get an extra boost for a DPDK client, or assign fewer cores to the Content Software for File system in the DPDK mode for the same performance.

Limitations

For the RDMA/GPUDirect Storage technology to take effect, the following requirements must be met:

  • All the cluster hosts must support RDMA networking
  • For a client host:
    • GPUDirect Storage - the IB interfaces added to the Nvidia GPUDirect configuration should support RDMA
    • RDMA - all the NICs used by Weka must support RDMA networking
  • Encrypted filesystems: the framework is not utilized for encrypted filesystems; IOs to encrypted filesystems fall back to working without RDMA/GPUDirect Storage
  • A NIC is considered to support RDMA Networking if the following requirements are met:
    • For GPUDirect Storage only: InfiniBand network
    • Mellanox ConnectX5 or ConnectX6
    • OFED 4.6-1.0.1.1 or higher
      • For GPUDirect Storage: install with --upstream-libs and --dpdk
Note: GPUDirect Storage completely bypasses the kernel and does not utilize the page cache. Standard RDMA clients still utilize the page cache.
Note: RDMA/GPUDirect Storage technology is not supported when working with a cluster with mixed IB and Ethernet networking.
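
To check whether a host meets these requirements, you can inspect the installed OFED version and the NIC model with standard tools (a sketch; exact output varies by system):

# OFED version (expect 4.6-1.0.1.1 or higher)
ofed_info -s
# NIC model (expect ConnectX-5 or ConnectX-6)
lspci | grep -i mellanox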

Running weka cluster nodes indicates whether RDMA is utilized, for example:

# weka cluster nodes
NODE ID      HOST  ID     ROLES        NETWORK
NodeId: 0    HostId: 0    MANAGEMENT   UDP 
NodeId: 1    HostId: 0    FRONTEND     DPDK / RDMA 
NodeId: 2    HostId: 0    COMPUTE      DPDK / RDMA 
NodeId: 3    HostId: 0    COMPUTE      DPDK / RDMA 
NodeId: 4    HostId: 0    COMPUTE      DPDK / RDMA 
NodeId: 5    HostId: 0    COMPUTE      DPDK / RDMA
NodeId: 6    HostId: 0    DRIVES       DPDK / RDMA 
NodeId: 7    HostId: 0    DRIVES       DPDK / RDMA
Note: GPUDirect Storage is auto-enabled and detected by the system. To enable/disable RDMA networking altogether on the cluster or a specific client, contact the customer support team.