System requirements and sizing
The hardware, networking, and operating system requirements for running an HCI system with one or more instances.
Sizing guidance for Hitachi Content Search
Simple sizing
This table shows the minimum and recommended hardware requirements for each instance in an HCI system running Hitachi Content Search.
Resource | Minimum | Recommended |
RAM | 16 GB | 32 GB |
CPU | 4-core | 8-core |
Available disk space | 50 GB | 500 GB |
- A large number of factors determine how many documents your system can index and how fast it can process them, including: the number of documents to be indexed; the contents of those documents; what search features (such as sorting) the index supports; the number of fields in the index; the number of users querying the system; and so on.
Depending on how you use your system, you might require additional hardware resources to index all the documents you want and at the rate you require.
- Each instance uses all available RAM and CPU resources on the server or virtual machine on which it's installed.
Detailed sizing
To determine the system size that you need:
Procedure
1. Determine how many documents you need to index.
2. Based on the number of documents you want to index, use the following tables to determine:
- How many instances you need
- How much RAM each instance needs
- The Index service configuration needed to support indexing the number of documents you want
Total documents to be indexed | Instance RAM needed (for each instance running the Index service) |
15 million | 16 GB |
25 million | 32 GB |
50 million* | 64 GB |
System configuration:
Total instances required: 1**
Instances running the Index service: 1
Index service configuration required:
- Shards per index: 1
- Index Protection Level per index: 1
- Container memory: 200 MB greater than the heap setting (the arithmetic is sketched after this configuration)
- Heap settings: depend on instance RAM:
Instance RAM | Heap setting |
16 GB | 1800m |
32 GB | 9800m |
64 GB | 25800m |
*Contact Hitachi Vantara for guidance before trying to index this many documents on this number of instances. At this scale, your documents and required configuration settings can greatly affect the number of documents you can index.
**Single-instance systems are suitable for testing and development, but not for production use.
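The container memory value follows mechanically from the heap setting; here is a minimal shell sketch of the arithmetic, assuming both values are expressed in megabytes (the "m" suffix):

# Container memory must be 200 MB greater than the Index service heap setting.
heap_mb=9800                      # heap setting for a 32 GB instance (9800m)
container_mb=$((heap_mb + 200))   # yields 10000
echo "Container memory: ${container_mb}m"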
Total documents to be indexed | Instance RAM needed (for each instance running the Index service) |
45 million | 16 GB |
75 million | 32 GB |
150 million* | 64 GB |
System configuration:
Total instances required: 4
Instances running the Index service: 3
Index service configuration required:
- Shards per index: 3
- Index Protection Level per index: 1
- Container memory: 200 MB greater than the heap setting
- Heap settings: depend on instance RAM:
Instance RAM | Heap setting |
16 GB | 1800m |
32 GB | 9800m |
64 GB | 25800m |
*Contact Hitachi Vantara for guidance before trying to index this many documents on this number of instances. At this scale, your documents and required configuration settings can greatly affect the number of documents you can index.
Total documents to be indexed | Instance RAM needed (for each instance running the Index service) |
75 million | 16 GB |
125 million | 32 GB |
250 million* | 64 GB |
System configuration:
Total instances required: 8
Instances running the Index service: 5
Index service configuration required:
- Shards per index: 5
- Index Protection Level per index: 1
- Container memory: 200 MB greater than the heap setting
- Heap settings**: depend on instance RAM:
Instance RAM | Heap setting |
16 GB | 7800m |
32 GB | 15800m |
64 GB | 31000m |
*Contact Hitachi Vantara for guidance before trying to index this many documents on this number of instances. At this scale, your documents and required configuration settings can greatly affect the number of documents you can index.
**With an 8-instance system, the Index service should be the only service running on each of its 5 instances. With the Index service isolated this way, you can allocate more heap space to the service than you can on a single-instance or 4-instance system.
Total documents to be indexed | Instance RAM needed (for each instance running the Index service) |
195 million | 16 GB |
325 million | 32 GB |
650 million* | 64 GB |
System configuration:
Total instances required: 16
Instances running the Index service: 13
Index service configuration required:
- Shards per index: 13
- Index Protection Level per index: 1
- Container memory: 200 MB greater than the heap setting
- Heap settings**: depend on instance RAM:
Instance RAM | Heap setting |
16 GB | 7800m |
32 GB | 15800m |
64 GB | 31000m |
*Contact Hitachi Vantara for guidance before trying to index this many documents on this number of instances. At this scale, your documents and required configuration settings can greatly affect the number of documents you can index.
**With a 16-instance system, the Index service should be the only service running on each of its 13 instances. With the Index service isolated this way, you can allocate more heap space to the service than you can on a single-instance or 4-instance system.
For example, if you need to index up to 150 million documents, you need at minimum a 4-instance system with 64 GB RAM per instance.
3. Determine how fast you need to index documents, in documents per second.
For example (a calculation sketch follows these examples):
- To index 100 million documents in 2 days, you need an indexing rate of 578 documents per second.
- To continuously index 1 million documents every day, you need an indexing rate of 12 documents per second.
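Both figures come from dividing the document count by the time window in seconds; a minimal shell sketch:

# 100 million documents in 2 days:
echo $(( 100000000 / (2 * 24 * 3600) ))    # prints 578
# 1 million documents every day, continuously:
echo $(( 1000000 / (24 * 3600) ))          # prints 11; round up to 12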
4. Determine the base indexing rate for your particular dataset and processing pipelines:
- Install a single-instance HCI system that has the minimum required hardware resources.
- Run a workflow with the pipelines you want on a representative subset of your data.
- Use the workflow task details to determine the rate of documents processed per second.
5. To determine the number of cores you need per instance, replace Base rate in this table with the rate you determined in step 4.
Number of instances you need | Cores per instance: 4 (minimum required) | Cores per instance: 8 (recommended) |
1 | Base rate | 170% Base rate |
4 | 300% Base rate | 500% Base rate |
8 | 600% Base rate | 900% Base rate |
More than 8 | Contact Hitachi Vantara for guidance | Contact Hitachi Vantara for guidance |
For example, if you had previously determined that:
- You need a 4-instance system.
- You need to process 500 documents per second.
- The base processing rate for your data and pipelines is 100 documents per second.
You need 8 cores per instance.
6. Multiply the number of instances you need by the number of cores per instance to determine the total number of cores your system requires. For example, a 4-instance system that needs 8 cores per instance requires 32 cores in total.
7. After your system is installed, configure it with the index settings you determined in step 2.
For information on index shards, Index Protection Level, and moving the Index service, see the Administrator Help, which is available from the Admin App.
Sizing guidance for HCM
Minimum hardware requirements
If you are installing HCI to run HCM, each instance in the system must meet these minimum hardware requirements:
Documents per second | Cores | RAM (GB) | Disk (GB) |
Up to 1200 | 8 | 28 | 600 |
1200-1600 | 12 | 32 | 800 |
1600-2000 | 16 | 40 | 1000 |
2000-2400 | 18 | 48 | 1400 |
2400-2800 | 20 | 56 | 1700 |
2800-3200 | 24 | 64 | 2000 |
Determining number of instances
The number of instances your HCM system needs depends on:
- Whether you need the system to remain highly available.
- The number of documents being produced by the HCP system you want to monitor. In this case, each document represents a single piece of data about the HCP system. A more active HCP system will produce more documents than a less active one.
- The total number of documents you want HCM to store.
Number of instances: simple procedure
If you're monitoring a typically active HCP system (roughly 75 operations per second per node), use this table to determine the number of HCM instances you need, based on the number of nodes in your HCP system and the number of days you want your HCM system to retain the data it receives from HCP.
If your system is more active, see Number of instances: detailed procedure.
HCP nodes | Data retention time on HCM | Instances needed |
Up to 8 | Up to 30 days | 1* |
Up to 8 | Up to 60 days | 3* |
Up to 16 | Up to 30 days | 4 |
Up to 24 | Up to 60 days | 8 |
*An HCM system must have a minimum of 4 instances to maintain high system availability.
Number of instances: detailed procedure
1. Determine whether you need your HCM system to maintain high availability. If so, you need a minimum of 4 instances. For more information, see Single-instance systems versus multi-instance systems.
2. Determine the number of documents per second being produced by the HCP system you want to monitor. If you already have an HCM system up and running, you can measure this directly:
Go to the Monitor App:
https://system-hostname:6162
Add the HCP system as a source. For information, see the help that's available from the Monitor App.
Go to the HCI Admin App:
https://system-hostname:8000
Go to Workflows > Monitor App Workflow > Task > Metrics.
View the value for the Average DPS field.
Tip: Let the workflow run for a while to get a more accurate measure for the Average DPS field.
If you don't have an HCM system running, you can estimate the rate from the HCP internal logs instead:
Select a time period.
Download the HCP Internal Logs for this time period. For more information, see the help that's accessible from the HCP System Management Console.
In the downloaded logs for each node, count the number of lines logged during the selected time period.
Add the line counts for all nodes, then divide the sum by the number of seconds in the time period you selected (see the sketch below).
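A minimal shell sketch of this calculation; the log filenames and the 600-second window are assumptions, so substitute the files and time period you actually selected:

# Total lines across all node logs, divided by the window length in seconds.
window_seconds=600
total_lines=$(cat node1.log node2.log node3.log | wc -l)
echo $(( total_lines / window_seconds ))    # documents per second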
3. Use this table to determine the number of instances needed based on the number of documents per second produced by your HCP system.
Documents per second | Instances needed |
≤ 3,200 | 1 |
3,201 to 7,200 | 3 |
7,201 to 10,500* | 4 |
*This is the maximum documents per second that HCM currently supports.
4. Based on your data availability requirements, use the following table to determine the number of instances you need.
Data availability requirement | Index replicas needed | Instances needed | Impact on total documents stored |
No failure tolerance | 1 | 1 | None |
Survive 2 failed replicas | 3 | 3 | 3x |
Survive 3 failed replicas | 4 | 4 | 4x |
An index with multiple copies remains available in the event of an instance outage. For example, if an index has two copies stored on two instances and one of those instances fails, one copy of the index remains available for servicing requests.
5. Use this formula to determine the total number of documents your HCM system must be able to store:
documents per second (from step 2)
x 3600 seconds in an hour
x 24 hours in a day
x number of days you want to store data (default is 30)
x impact from the data availability table in step 4
= Total document count
For example, if your HCP system produces 1500 documents per second, you want to store data for 30 days, and you want to maintain two copies of each index containing the stored data, your system must have enough instances to be able to store roughly 8 billion documents:
1500
x 3600
x 24
x 30
x 2
= 7,776,000,000
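The same arithmetic in shell form, using the values from the example above, so you can substitute your own figures:

# 1500 docs/sec x seconds/hour x hours/day x 30 days x 2 index copies
echo $(( 1500 * 3600 * 24 * 30 * 2 ))    # prints 7776000000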
6. Use this table to determine the number of instances needed based on the total number of documents your HCM system must store.
Total document count | Instances needed |
2 billion or less | 1 |
6 billion or less | 3 |
8 billion or less | 4 |
7. Take the highest number of instances from steps 3, 4, and 6. That's the number of instances you need.
Operating system and Docker requirements
To be an HCI instance, each server or virtual machine you provide:
- Must run a 64-bit Linux distribution
- Must have Docker version 1.13.1 or later installed
- Must be configured with IP and DNS addresses
Additionally, you should install all relevant patches on the operating system and perform appropriate security hardening tasks.
Suggested Docker version
This table shows the operating systems, Docker versions, Docker storage configurations, and SELinux settings with which HCI has been qualified:
Operating system | Docker version | Docker storage configuration | SELinux setting |
CentOS 7.6 | Docker 18.03.1-ce | device-mapper | Enforcing |
CentOS 8.1.1911 | Docker 19.03.9-ce | overlay2 | Enforcing and Disabled |
Red Hat Enterprise Linux 8.1 | Docker 19.03.11-ce | overlay2 | Enforcing |
Ubuntu 18.04.4 LTS | Docker 18.03.1-ce | overlay2 | Enforcing |
Docker considerations
The Docker installation folder on each instance must have at least 20 GB available for storing the Docker images.
Make sure that the Docker storage driver is configured correctly on each instance before installing the product. To change the Docker storage driver after installation, you must reinstall the product. To view the current Docker storage driver on an instance, run:
docker info
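On recent Docker versions, you can print just the storage driver name by filtering the output with a Go template (a convenience, not a requirement):

docker info --format '{{.Driver}}'    # for example: overlay2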
Core dumps can fill a host's file system, which can result in host or container instability. Also, if your system uses the data at rest encryption (DARE) feature, encryption keys are written to the dump file. It's best to disable core dumps.
If you are using the Docker devicemapper storage driver:
- Make sure that there's at least 40 GB of Docker metadata storage space available on each instance. The product needs 20 GB to install successfully and an additional 20 GB to successfully update to a later version.
To view Docker metadata storage usage on an instance, run:
docker info
- On a production system, do not run devicemapper in loop-lvm mode. This can cause slow performance or, on certain Linux distributions, the product might not have enough space to run.
SELinux considerations
- Decide whether you want to run SELinux on the system instances, and enable or disable it, before installing additional software on the instances.
Enabling or disabling SELinux on an instance requires a restart of the instance.
To view whether SELinux is enabled on an instance, run:
sestatus
- To enable SELinux on the system instances, you need to use a Docker storage driver that SELinux supports.
The storage drivers that SELinux supports differ depending on the Linux distribution you're using. For more information, see the Docker documentation.
Networking
This topic describes the network usage and requirements for both system instances and services.
You can configure the network settings for each service when you install the system. You cannot change these settings after the system is up and running. If your networking environment changes such that the system can no longer function with its current networking configuration, you need to reinstall the system. See Handling network changes.
The HCI product uses both internal and external ports to operate its services, and the system-internal ports do not have authentication or Transport Layer Security (TLS). At a minimum, use your firewall to make these ports accessible only to other instances in the system. If any users have root access to your system, your network and its systems are vulnerable to unauthorized use.
To secure your data and HCI system, you need to manually use iptables or firewalld to restrict the ports that the HCI installer otherwise leaves open, so that they allow only system-internal communications. See System-internal ports and Example HCI firewall setup.
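As an illustration only (the instance IP addresses and the choice of port 9042, the Database service client port, are assumptions; adapt the rules to your own addresses, ports, and tooling, such as firewalld):

# Allow the Database service port only from the other system instances,
# then drop all other traffic that targets it.
iptables -A INPUT -p tcp --dport 9042 -s 10.0.0.11 -j ACCEPT
iptables -A INPUT -p tcp --dport 9042 -s 10.0.0.12 -j ACCEPT
iptables -A INPUT -p tcp --dport 9042 -s 10.0.0.13 -j ACCEPT
iptables -A INPUT -p tcp --dport 9042 -j DROP

Repeat the same pattern for each system-internal port listed in System-internal ports.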
Additionally, you can use Internet Protocol Security (IPSec) or an equivalent to secure internode communications. Consult with your system administrator to configure your network with this added security.
Instance IP address requirements
All instance IP addresses must be static. This includes both internal and external network IP addresses, if applicable to your system.
Network types
Each of the HCI services can bind to one type of network, either internal or external, for receiving incoming traffic. If your network infrastructure supports having two networks, you might want to isolate the traffic for most system services to a secured internal network that has limited access to avoid critical security risks to your data and system. You can then leave only the Search-App and Admin-App services on your external network for user access.
You can use either a single network type for all services or a mix of both types. To use both types, every instance in your system must be addressable by two IP addresses: one on your internal network and one on your external network. If you use only one network type, each instance needs only one IP address.
Allowing access to external resources
Regardless of whether you're using a single network type or a mix of types, you need to configure your network environment to ensure that all instances have outgoing access to the external resources you want to use.
This includes:
- The data sources where your data is stored.
- Identity providers for user authentication.
- Email servers that you want to use for sending email notifications.
- Any external search indexes (for example, HDDS indexes) that you want to make accessible through HCI.
Ports
Each service binds to a number of ports for receiving incoming traffic. Before installing HCI, you can configure the services to use different ports, or use the default values shown in the following tables.
Port values can be reconfigured during system installation, so your system might not use the default values. You cannot change service port values when the system is up and running.
To view the ports that your system is using, view the Network tab for each service your system runs (Services > service-name > Network).
System-external ports
The following table contains information about the service ports that are used to interact with the system.
On every instance in the system, each of these ports:
- Must be accessible from any network that needs administrative or search access to the system.
- Must be accessible from every other instance in the system.
The port values configured for your system are also recorded in the file <installation-directory>/config/cluster.config.
Default Port Value | Service | Purpose |
6162 | Monitor-App | Access to the HCM application, which is used to monitor the health of HCP systems. WARNING: The Monitor-App service will not function properly if it is assigned a port value lower than 1024. |
8000 | Admin-App | Access to administrative interfaces. |
8888 | Search-App | Access to search interfaces. |
System-internal ports
This table lists the ports used for intra-system communication by the services. On every instance in the system, each of these ports:
- Must be accessible from every other instance in the system.
- Should not be accessible from outside the system.
You can find more information on how these ports are used in the documentation for the third-party software underlying each service.
Default Port Value | Used By | Purpose |
2181 | Synchronization service | Synchronization service client port. |
2888 | Synchronization service | Synchronization service internal communication. |
3888 | Synchronization service | Synchronization service leader election. |
4040 | Workflow jobs | Spark UI port. |
5001 | Admin-App service | Debug port for the Admin-App service. |
5002 | Search-App service | Debug port used by the Search-App service. |
5003 | Index service | Debug port used by the Index service. |
5005 | Workflow jobs | The port to use for debugging the job driver. |
5007 | Sentinel service | Debug port used by the Sentinel service. |
5008 | Workflow jobs | The port to use for debugging the job executor. |
5050 | Cluster-Coordination service | Primary port for communicating with Cluster-Coordination. |
5051 | Cluster-Worker service | Primary port for communicating with Cluster-Worker. |
5123 | Monitor-App service | The debug port used by the Monitor App. |
5555 | Watchdog service | Port for JMX connections to the Watchdog service. |
5601 | Dashboard service | Primary port for communicating with the Dashboard service. |
6175 | Monitor-App service | The port used by the Monitor App for graceful shutdowns. |
7000 | Database service | TCP port for commands and data. |
7199 | Database service | Port for JMX connections to the Database service. |
7203 | Message Queue service | Port for JMX connections to the Message Queue service. |
8005 | Admin-App service | Port used by Admin-App for graceful shutdowns. |
8006 | Search-App service | Port used by the Search-App service for graceful shutdowns. |
8007 | Sentinel service | Port used by the Sentinel service for graceful shutdowns. |
8080 | Service-Deployment service | Primary port for communicating with Service-Deployment. |
8081 | Scheduling service | Primary port for communicating with the Scheduling service. WARNING: If you change the port number for the Scheduling service, you must restart HCI.service on all system nodes for the change to take effect. |
8889 | Sentinel service | Primary port for communicating with Sentinel. |
8893 | Monitor-App service | Port used for the Monitor App Analytics functionality. |
8983 | Index service | Primary port used to communicate with the Index service. WARNING: The port assigned to the Index service should not be below 1024. |
9042 | Database service | Primary port for communicating with the Database service. |
9091 | Network-Proxy service | Primary port for communicating with Network-Proxy. |
9092 | Message Queue service | Primary port for communicating with the Message Queue service. |
9200 | Metrics service | Port used to communicate with the Metrics service cluster. |
9201 | Metrics service | Port used to communicate with an individual Metrics service node. |
9301 | Metrics service | Port that nodes in the Metrics service cluster use when communicating with each other. |
9600 | Logging service | Primary port for communicating with the Logging service. |
9601 | Logging service | The port used to receive syslog messages. |
10000 | Index service | Port used by the Index service for graceful shutdowns. |
15050 | Cluster-Coordination service | Cluster-Coordination internal communication. |
18000 | Admin-App service | Admin-App internal communication. |
18080 | Service-Deployment service | Service-Deployment internal communication. |
18889 | Sentinel service | Sentinel service internal communication. |
31000-34000 | Cluster-Coordination and Cluster-Worker services | High ports used by both Mesos and Docker. |
System ports for Monitor-App
Monitor-App signal | Port Type | Port Number |
Node Status |
TCP | 443 (or 80 if not using SSL) inbound to HCP |
MAPI | TCP | 9090 inbound to HCP |
SNMP | TCP/UDP | 161 inbound to HCP |
Syslog | UDP | 9601 (the default listener port of Monitor-App) inbound to the HCM node |
Time source
If you are installing a multi-instance system, each instance should run NTP (network time protocol) and use the same external time source. For information, see support.ntp.org.
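As a minimal sketch of what this looks like in practice, an /etc/ntp.conf fragment might contain the lines below; the pool hostnames are placeholders for your organization's actual time source, and every instance should list the same servers:

# Use the same time source on every instance in the system.
server 0.pool.ntp.org iburst
server 1.pool.ntp.org iburst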
Supported browsers
The HCI web applications support these web browsers:
- The latest version of Google Chrome
- The latest version of Mozilla Firefox
- The latest version of Microsoft Edge
File ownership considerations
Within some of the Docker containers on each system instance, file ownership is assigned to this user and group:
- User: hci, UID: 10001
- Group: hci, GID: 10001
When you view such files in the instance operating system (for example, by running ls -l), the files appear to be owned by an unknown or undefined user and group. Typically, this causes no issues.
However, if you run applications that change file ownership on the system instances (for example, security hardening scripts), and they change the ownership of files owned by the hci user and group, the system can become unresponsive.
To avoid these issues:
- Create the expected user and group on each instance (a verification check follows this list):
sudo groupadd hci -g 10001
sudo useradd hci -u 10001 -g 10001
- Configure your applications to not change the ownership of files owned by the hci user and group.
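To confirm that the user and group were created with the expected IDs, you can run the following (the output shown is illustrative):

id hci
# uid=10001(hci) gid=10001(hci) groups=10001(hci)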