This chapter introduces Hitachi Content Intelligence (HCI) and its main use cases: Hitachi Content Search and Hitachi Content Monitor (HCM).
A single HCI system can be installed for only one of these use cases.
About Hitachi Content Search
Hitachi Content Intelligence (HCI) powers Hitachi Content Search, a full-fledged search and data processing solution. It handles all steps in making your data searchable, regardless of where that data lives or what formats it's in. HCI also gives users tools for examining, understanding, normalizing, migrating, and editing their data.
You manage how the system scales by adding instances to or removing instances from the system, and by specifying which services run on those instances.
An instance is a server or virtual machine on which the software is running. A system can have either a single instance or multiple instances. Multi-instance systems have a minimum of four instances.
A system with multiple instances maintains higher availability in the event of instance failures. Additionally, a system with more instances can run tasks concurrently and typically processes tasks faster than a system with fewer instances or a single instance.
A multi-instance system has two types of instances: master instances, which run an essential set of services, and non-master instances, which are called workers.
Each instance runs a configurable set of services, each of which performs a specific function. For example, the Metadata Gateway service stores metadata persistently.
In a single-instance system, that instance runs all services. In a multi-instance system, services can be distributed across all instances.
Single-instance systems vs. multi-instance systems
A system can have a single instance or can have multiple instances (four or more).
- Every instance must meet the minimum RAM, CPU, and disk space requirements.
- Three instances are sufficient to perform leader election for distributing work. However, a multi-instance system needs a minimum of four instances because, with the minimum hardware requirements, three instances are not sufficient for running all HCI services at their recommended distributions.
- Hitachi Vantara has qualified HCI systems with up to 16 instances.
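The point above about leader election can be illustrated with a generic majority-quorum calculation (a hedged sketch of how leader election typically works in distributed systems, not HCI-specific code): three instances already provide quorum-based fault tolerance, so the fourth instance required by HCI adds capacity for running services rather than extra quorum tolerance.

```python
def quorum_size(instances: int) -> int:
    """Smallest majority needed for leader election (generic rule, not an HCI API)."""
    return instances // 2 + 1

def tolerable_failures(instances: int) -> int:
    """How many instances can fail while a majority can still be formed."""
    return instances - quorum_size(instances)

# Three instances tolerate one failure for leader election;
# four instances tolerate the same one failure but add service capacity.
for n in (3, 4):
    print(n, quorum_size(n), tolerable_failures(n))
```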
A single-instance system is useful for testing and demonstration purposes. It needs only a single server or virtual machine and can perform all product functionality.
However, a single-instance system has these drawbacks:
- It is a single point of failure. If the instance hardware fails, you lose access to the system.
- With no additional instances, you cannot choose where to run services. All services run on the single instance.
A multi-instance system is suitable for use in a production environment because it offers these advantages over a single-instance system:
- You can control how services are distributed across the multiple instances, providing improved service redundancy, scale out, and availability.
- A multi-instance system can survive instance outages. For example, with a four-instance system running the default distribution of services, the system can lose one instance and still remain available.
Note: For a search index to survive an instance outage:
- The system must have at least two instances running the Index service.
- The Index Protection Level for the index must be at least 2.
For more information, see the HCI Administrator Help, which is available in the Admin App.
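The two conditions above can be expressed as a simple check. This is an illustrative sketch only, not an HCI API; the function name and parameters are hypothetical:

```python
def index_survives_one_outage(index_service_instances: int,
                              index_protection_level: int) -> bool:
    # Both conditions from the text must hold: at least two instances
    # run the Index service, and the Index Protection Level is at least 2.
    return index_service_instances >= 2 and index_protection_level >= 2

print(index_survives_one_outage(2, 2))  # True
print(index_survives_one_outage(3, 1))  # False: protection level too low
```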
- Performance is improved because work can be performed in parallel across instances.
- You can add instances to the system at any time.
Even if you add instances to a single-instance system, the system still has only one master instance, meaning there is still a single point of failure for the essential services that only a master instance can run.
For information about adding instances to an existing HCI system, see the Content Intelligence Administrator Help, which is available from the Admin App.
Two-instance systems are a viable option for the HCM use case, but not recommended for Hitachi Content Search.
Three-instance systems should have only a single master instance. If you deploy a three-instance system where all three instances are masters, the system might not have enough resources to do much beyond running the master services.
About master and worker instances
Master instances are special instances that run an essential set of services, including:
- Admin-App service
- Cluster-Coordination service
- Synchronization service
- Service-Deployment service
Non-master instances are called workers. Workers can run any services except for those listed previously.
Single-instance systems have one master instance, while multi-instance systems have either one or three master instances.
Services perform functions essential to the health or functionality of the system. For example, the Metrics service stores and manages system events, while the Watchdog service ensures that other services remain running. Internally, services run in Docker containers on the instances in the system.
Services are grouped into these categories depending on what actions they perform:
- Services: Enable product functionality. For example, the Index service performs functions that allow the system to be used to search for data. You can scale, move, and reconfigure these services.
- System services: Maintain the health and availability of the system. You cannot scale, move, or reconfigure these services.
Some System services run only on master instances.
Some services are classified as applications. These are the services with which users interact. Services that are not applications typically interact only with other services.
Services run on instances in the system. Most services can run simultaneously on multiple instances. That is, you can have multiple instances of a service running on multiple instances in the system. Some services run on only one instance.
Each service has a recommended and a required number of instances on which it should run.
You can configure where Hitachi Content Intelligence services run, but not system services.
Some services can have multiple service instance types. For example, a service can run on two system instances, with each of those service instances performing a different function.
If a service supports floating, you have flexibility in configuring where new instances of that service are started when service instances fail.
Non-floating (or persistent) services run on the specific instances that you specify. If one of those service instances fails, the system does not automatically bring up a new instance of that service on another system instance.
With a service that supports floating, you specify a pool of eligible system instances and the number of service instances that should be running at any time. If a service instance fails, the system brings up another one on one of the system instances in the pool that doesn't already have an instance of that service running.
For services with multiple types, the ability to float can be supported on a per-type basis.
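The floating behavior described above can be sketched as a placement rule: when a service instance fails, the system starts a replacement on an eligible pool member that is not already running that service. The function and data structures below are hypothetical, for illustration only:

```python
def place_replacement(pool, running):
    """Pick an eligible instance from the pool that is not already
    running the service (mirrors the floating behavior described above)."""
    candidates = [inst for inst in pool if inst not in running]
    return candidates[0] if candidates else None

pool = ["instance-101", "instance-102", "instance-103"]
running = {"instance-101"}  # the service already runs here
print(place_replacement(pool, running))  # instance-102
```

A non-floating (persistent) service, by contrast, would simply stay down on its failed instance until manually addressed.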
Each service binds to a number of ports and to one type of network, either internal or external. Networking for each service is configured during system installation and cannot be changed after a system is running.
Services can use volumes for storing data.
Jobs are operations that services run, typically to perform transient work. Like services, jobs run in Docker containers on system instances. However, when a job completes its work, its container exits.
Jobs are run by services; you cannot start or stop them yourself on demand, but you can schedule the times when they are allowed to run and specify which instances in the system they are allowed to run on.
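The two scheduling constraints just described can be sketched as a single check. This is a hypothetical helper for illustration, not HCI code; the hour window and instance names are assumptions:

```python
def job_may_run(hour, instance, allowed_hours, allowed_instances):
    """A job may run only during its allowed hours and
    only on one of its allowed instances."""
    return hour in allowed_hours and instance in allowed_instances

allowed_hours = range(22, 24)               # e.g. late evening only (assumed window)
allowed_instances = {"worker-1", "worker-2"}  # assumed instance names
print(job_may_run(23, "worker-1", allowed_hours, allowed_instances))  # True
print(job_may_run(9, "worker-1", allowed_hours, allowed_instances))   # False
```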
Beginning with release 1.3, each HCI workflow is associated with a job. Running the workflow causes its job to run and process documents.
Jobs are grouped into job types. All jobs in a type share the same default configuration settings. New jobs inherit their settings from their job type. However, each job in a type can be configured with settings different from the job type default settings.
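The inheritance rule above can be modeled as a simple settings merge: a job starts from its job type's defaults, and any per-job settings override them. The keys below are hypothetical, for illustration only:

```python
def effective_settings(job_type_defaults: dict, job_overrides: dict) -> dict:
    """New jobs inherit the job type defaults; per-job settings override them."""
    merged = dict(job_type_defaults)
    merged.update(job_overrides)
    return merged

defaults = {"max_retries": 3, "priority": "normal"}  # hypothetical keys
job = effective_settings(defaults, {"priority": "high"})
print(job)  # {'max_retries': 3, 'priority': 'high'}
```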
HCI has a single type of job, the Workflow-Agent job type. Jobs of this type are run to perform:
- A single workflow task.
- A pipeline test.
- A workflow test.
- Tasks to restart workflow failures.
You can configure storage usage for jobs by associating volumes with job types.
Volumes are properties of services that specify where and how a service stores its data.
You can use volumes to configure services to store their data in external storage systems, outside of the system instances. This allows data to be more easily backed up or migrated.
Volumes can also allow services to store different types of data in different locations. For example, a service might use two separate volumes, one for storing its logs and the other for storing all other data.
In this example, service A runs on instance 101. The service's Log volume stores data in a folder on the system instance and the service's Data volume stores data in an NFS mount.
Depending on how they are created and managed, volumes are separated into these groups:
- System-managed volumes are created and managed by the system. When you deploy the system, you can specify the volume driver and options that the system should use when creating these volumes.
After the system is deployed, you cannot change the configuration settings for these volumes.
- User-managed volumes can be added to services and job types after the system has been deployed. These are volumes that you manage; you need to create them on your system instances before you can configure a service or job to use them.
Note: As of release 1.3.0, none of the built-in services support adding user-managed volumes.
When configuring a volume, you specify the volume driver that it should use. The volume driver determines how and where data is stored.
Because services run in Docker containers on instances in the system, volume drivers are provided by Docker and other third-party developers, not by the system itself. For information about volume drivers you can use, see the applicable Docker or third-party developer's documentation.
By default, services do not use volume drivers; instead, they use the bind-mount setting. With this setting, data for each service is stored within the system installation folder on each instance where the service runs.
For more information on volume drivers, see the Docker documentation. Examples include:
- local: The default Docker volume driver
- local-persist: A Docker volume driver plugin available from https://github.com/CWSpear/local-persist
You can update system software by installing an update package through the System Management application. For more information, see the System Management Help, which is accessible from the System Management application.
An update consists of multiple steps and might take several hours to complete. During this time:
- Multiple varieties of Loading and Reconnecting messages will appear.
- The window or its progress might appear stalled or stuck.
- Severe and Warning events might occur.
This is typical update and deployment behavior. You will be notified when the process has completed.
- Hitachi Vantara does not provide updates or security fixes for the host operating systems running on HCI instances.
- During Update and Deployment, if you're installing the Monitor-App service, each signal type needs a different set of ports and protocols. For reference on which ports to use, see System ports for Monitor-App.