Data connection types and settings

By default, your system includes data connections for accessing data sources.

You can write your own data connection plugins to allow the system to connect to different types of data sources. When you do this, you define both the required and optional configuration settings for that data connection type.

Note: Data connections pointing to index collections (Query or JDBC connectors) will not scan for updated documents if used as workflow inputs.

Box data connection (Preview Mode)

Important: This connector is being released in preview mode. We do not recommend using it in a production environment, but welcome feedback on functionality and any issues you encounter.

This data connection allows the system to access files available to all Box Enterprise users to provide HCI Search App capabilities on the crawled and indexed documents. This connector crawls files and folders, and only the latest version of a file is considered.

The connector presents the Box file system with the following hierarchy:

  • "All Files (Enterprise)"
  • All Enterprise Box user folders
  • Individual user files and folders
Note: Currently, web links are not supported by the crawler.

Authentication

To set up your Box connector, you need the OAuth 2.0 with JSON Web Token (JWT) authentication key from the Box Developer Console. For more information, see Setup with JWT.

List-based data connection

This is a list-based data connection. This means that when the Check for Updates setting is enabled for a workflow task, this data connection relies on a list kept by the task to determine which files have changed since the last time the data source was read.

This is different from a change-based data connection, such as the HCP MQE data connection, which can ask the data source directly for a list of files that changed during a span of time.
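The list-based approach described above can be sketched as follows. This is only an illustration: the source does not say how the task stores its list, so the dictionary of paths and modified times, and the function name find_changes, are assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical shape of the list kept by a workflow task:
# file path -> modified time (milliseconds) seen on the previous read.
SeenList = Dict[str, int]


def find_changes(seen: SeenList, current: Dict[str, int]) -> Tuple[List[str], List[str], List[str]]:
    """Compare the task's list with a fresh listing of the data source.

    Returns (new, modified, deleted) file paths, each sorted.
    """
    new = sorted(p for p in current if p not in seen)
    modified = sorted(p for p in current if p in seen and current[p] != seen[p])
    deleted = sorted(p for p in seen if p not in current)
    return new, modified, deleted


previous_read = {"/docs/a.txt": 1000, "/docs/b.txt": 2000}
fresh_listing = {"/docs/a.txt": 1000, "/docs/b.txt": 2500, "/docs/c.txt": 3000}
print(find_changes(previous_read, fresh_listing))
```

A change-based connection would skip this comparison and instead ask the data source for the files that changed in a given time span.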

Configuration settings

Setting | Required/Optional | Description
Name | Required | The name for your Box data source.
Description | Optional | A description of the data source.
Authentication Key | Required | The OAuth 2.0 with JSON Web Token (JWT) provided to you by Box.

For more information, see Setup with JWT.

Use Proxy Server | Optional

Whether to use a proxy server to connect to the data source.

When enabled, you need to specify:

  • Name of the proxy server: The hostname for the proxy server. For example, proxy.example.com
  • Port number of the proxy server: The port number on which to connect to the proxy server.
  • Username for the proxy server: Username for the proxy server.
  • Password for the proxy server: Password for the proxy server.
Filter type | Required

The filter to use when crawling Enterprise. Choose from the following filter types:

  • None: Crawl everything.
  • Whitelist: Crawl only Enterprise users in your whitelist. When selected, you must specify a list of the users.
  • Blacklist: Crawl everything except for Enterprise users in your blacklist. When selected, you must specify a list of the users.
Max Visited File Size | Required | Sets a maximum crawlable file size. Files larger than this size will be skipped.

The default value is 100 GB.
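As an illustration of how the Filter type and Max Visited File Size settings combine, here is a minimal sketch. The function names are hypothetical, and treating 100 GB as 100 × 1024³ bytes is an assumption; the source does not say how the connector implements either check.

```python
from typing import Iterable

# Assumption: the 100 GB default is interpreted as 100 * 1024**3 bytes.
DEFAULT_MAX_VISITED_FILE_SIZE = 100 * 1024**3


def should_crawl_user(user: str, filter_type: str, users: Iterable[str] = ()) -> bool:
    """Apply the None / Whitelist / Blacklist filter to one Enterprise user."""
    listed = user in set(users)
    if filter_type == "None":
        return True            # crawl everything
    if filter_type == "Whitelist":
        return listed          # crawl only listed users
    if filter_type == "Blacklist":
        return not listed      # crawl everyone except listed users
    raise ValueError(f"unknown filter type: {filter_type!r}")


def should_crawl_file(size_bytes: int, max_size: int = DEFAULT_MAX_VISITED_FILE_SIZE) -> bool:
    """Files larger than the maximum size are skipped."""
    return size_bytes <= max_size


print(should_crawl_user("alice@example.com", "Whitelist", ["alice@example.com"]))
print(should_crawl_file(5 * 1024**2))
```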

Supported actions

This data connection does not support any actions. It can only read and process documents from Box.

Document fields

The following document fields are populated with metadata for your Box connector:

Name | Data Type
HCI_createdDateMillis | Long
HCI_createdDateString | String
HCI_displayName | String
HCI_doc_version | String
HCI_filename | String
HCI_id | String
HCI_modifiedDateMillis | Long
HCI_modifiedDateString | String
HCI_relativePath | String
HCI_owner | String
HCI_size | Long
HCI_URI | String

Microsoft Sharepoint data connection (Preview Mode)

Important: This connector is being released in preview mode. We do not recommend using it in a production environment, but welcome feedback on functionality and any issues you encounter.

This data connection allows the system to access enterprise Microsoft Sharepoint files to provide HCI Search App capabilities on the crawled and indexed documents.

This connector only crawls files and folders. Some file types, such as OneNote notebooks, cannot be crawled and are therefore not processed.

The connector presents the Sharepoint file system with the following hierarchy:

  • "All Sites (Enterprise)"
  • All Sharepoint sites
  • All drives for the selected Sharepoint site
  • All user files and folders

Authentication

To utilize this connector, you must have an application ID, directory ID, and a secret key through your Microsoft admin account. For more information, see the Microsoft Azure Portal.

The application must also enable the following permissions:

  • Files.Read.All
  • Sites.Read.All
  • Directory.Read.All
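For orientation, the application ID, directory ID, and secret key are the inputs to the standard Microsoft identity platform client-credentials flow. The sketch below only builds the token request and sends nothing. It is an assumption that the connector uses this v2.0 endpoint and the Microsoft Graph default scope; the source does not say. The function name and the placeholder values are hypothetical.

```python
from urllib.parse import urlencode


def build_token_request(authority_url: str, tenant_id: str, client_id: str, client_secret: str):
    """Return (url, form_body) for an OAuth 2.0 client-credentials token request.

    Assumption: the v2.0 token endpoint and the Microsoft Graph ".default"
    scope are used; the connector's actual request is not documented here.
    """
    url = f"{authority_url.rstrip('/')}/{tenant_id}/oauth2/v2.0/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": "https://graph.microsoft.com/.default",
    })
    return url, body


# Placeholder values; real ones come from the Microsoft Azure Portal.
url, body = build_token_request(
    "https://login.microsoftonline.com", "my-tenant-id", "my-client-id", "my-secret"
)
print(url)
```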

List-based data connection

This is a list-based data connection. This means that when the Check for Updates setting is enabled for a workflow task, this data connection relies on a list kept by the task to determine which files have changed since the last time the data source was read.

This is different from a change-based data connection, such as the HCP MQE data connection, which can ask the data source directly for a list of files that changed during a span of time.

Configuration settings

Setting | Required/Optional | Description
Name | Required | The name for your Sharepoint data source.
Description | Optional | A description of the data source.
Application (client) ID | Required | Unique values generated and provided to you through your Microsoft Azure AD account.

For more information, see the Microsoft Azure Portal.

Client Secret | Required
Directory (tenant) ID | Required
Authority URL | Required

The URL that should be used to authenticate the application.

The default value is: https://login.microsoftonline.com
Use Proxy Server | Optional

Whether to use a proxy server to connect to the data source.

When enabled, you need to specify:

  • Name of the proxy server: The hostname for the proxy server. For example, proxy.example.com
  • Port number of the proxy server: The port number on which to connect to the proxy server.
Max Visited File Size (bytes) | Required | Sets a maximum crawlable file size. Files larger than this size will be skipped.

The default value is 100 GB.

Supported actions

This data connection does not support any actions. It can only read and process documents from Microsoft Sharepoint.

Document fields

The following document fields are populated with metadata for your Microsoft Sharepoint connector:

Name | Data Type
driveItemId | String
driveId | String
ownerId | String
HCI_createdDateMillis | Long
HCI_createdDateString | String
HCI_displayName | String
HCI_doc_version | String
HCI_filename | String
HCI_id | String
HCI_modifiedDateMillis | Long
HCI_modifiedDateString | String
HCI_relativePath | String
HCI_siteName | String
HCI_size | Long
HCI_URI | String

Microsoft OneDrive for Business data connection

This data connection allows the system to access enterprise Microsoft files through OneDrive for Business to provide HCI Search App capabilities on the crawled and indexed documents.

This connector only crawls files and folders. Some file types, such as OneNote notebooks, cannot be crawled and are therefore not processed.

The connector presents the OneDrive file system with the following hierarchy:

  • All Files (Enterprise)
  • All OneDrive folders
  • All user files and folders

Authentication

To utilize this connector, you must have an application ID, directory ID, and a secret key through your Microsoft admin account. For more information, see the Microsoft Azure Portal.

The application must also enable the following permissions:

  • Files.Read.All
  • User.Read
  • User.Read.All
  • Directory.Read.All
  • People.Read.All

List-based data connection

This is a list-based data connection. This means that when the Check for Updates setting is enabled for a workflow task, this data connection relies on a list kept by the task to determine which files have changed since the last time the data source was read.

This is different from a change-based data connection, such as the HCP MQE data connection, which can ask the data source directly for a list of files that changed during a span of time.

Configuration settings

Setting | Required/Optional | Description
Name | Required | The name for your OneDrive data source.
Description | Optional | A description of the data source.
Application (client) ID | Required | Unique values generated and provided to you through your Microsoft Azure AD account.

For more information, see the Microsoft Azure Portal.

Client Secret | Required
Directory (tenant) ID | Required
Authority URL | Required

The URL that should be used to authenticate the application.

The default value is: https://login.microsoftonline.com
Use Proxy Server | Optional

Whether to use a proxy server to connect to the data source.

When enabled, you need to specify:

  • Name of the proxy server: The hostname for the proxy server. For example, proxy.example.com
  • Port number of the proxy server: The port number on which to connect to the proxy server.
Max Visited File Size (bytes) | Required | Sets a maximum crawlable file size. Files larger than this size will be skipped.

The default value is 100 GB.

Filter type | Required

The filter to use when crawling the file system. Choose from the following filter types:

  • None: Crawl everything.
  • Whitelist: Crawl only drives of users in your whitelist. When selected, you must specify a list of users.
  • Blacklist: Crawl everything except for the users in your blacklist. When selected, you must specify a list of users.

Supported actions

This data connection does not support any actions. It can only read and process documents from Microsoft OneDrive.

Document fields

The following document fields are populated with metadata for your OneDrive for Business connector:

Name | Data Type
driveItemId | String
driveId | String
ownerId | String
HCI_createdDateMillis | Long
HCI_createdDateString | String
HCI_displayName | String
HCI_doc_version | String
HCI_filename | String
HCI_id | String
HCI_modifiedDateMillis | Long
HCI_modifiedDateString | String
HCI_owner | String
HCI_relativePath | String
HCI_size | Long
HCI_URI | String

HCP (Hitachi Content Platform) data connection

This data connection allows access to files on a Hitachi Content Platform (HCP) system.

For information on how this data connection compares to other data connections that can access HCP, see Best practices for connecting to HCP.

HCP system requirements

In order for your system to connect to an HCP namespace, either the HTTP or HTTPS protocol must be enabled for the namespace.

Note: Files in HCP namespaces can have multiple versions. With this data connection, a workflow task reads, processes, and indexes only the latest versions.

Configuration settings

Setting | Required/Optional | Description

HCP System Name, HCP Tenant Name, HCP Namespace Name | Required

Information about the HCP namespace to connect to.

You can find this information in the URL for an HCP namespace:

https://<hcp-namespace-name>.<hcp-tenant-name>.<hcp-system-name>
HCP Root Directory | Required | The directory path to read. Use / (forward slash) to read all files in the namespace.
Use SSL | Required

Whether to use SSL to connect to the data source.

When this option is enabled, click Test at the bottom of the Add Data connection page to connect to the data source and retrieve its SSL certificate.

Use Proxy Server | Optional

Whether to use a proxy server to connect to the data source.

When enabled, you need to specify:

  • Name of the proxy server: The hostname for the proxy server. For example, proxy.example.com
  • Port number of the proxy server: The port number on which to connect to the proxy server.
HCP Authentication Type | Required | The type of authentication to use when connecting to an HCP system. Users can select either their local credentials or Active Directory credentials. The default value is Local.
User Name | Required

Username for an HCP tenant-level user account.

Tip: To access HCP anonymously, specify all_users as the user name and leave the password field blank.
Password | Required | Password for the user account.
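The three name settings can be read off a namespace URL of the form shown above. The following sketch illustrates that split. The function and class names are hypothetical, and the host name in the example is a placeholder.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class HcpNamespaceAddress(NamedTuple):
    namespace: str
    tenant: str
    system: str
    use_ssl: bool


def parse_namespace_url(url: str) -> HcpNamespaceAddress:
    """Split https://<hcp-namespace-name>.<hcp-tenant-name>.<hcp-system-name>."""
    parsed = urlparse(url)
    host = parsed.hostname or ""      # note: urlparse lowercases the host name
    parts = host.split(".", 2)        # the system name may itself contain dots
    if len(parts) < 3 or not all(parts):
        raise ValueError(f"not an HCP namespace URL: {url!r}")
    namespace, tenant, system = parts
    return HcpNamespaceAddress(namespace, tenant, system, parsed.scheme == "https")


print(parse_namespace_url("https://finance.europe.hcp.example.com"))
```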

Supported actions

Action name | Description | Configuration settings | HCP Permissions Required
Copy File

This action issues an HCP REST Put-Copy API request to HCP, allowing users to copy objects between HCP namespaces.

This action can be used as a workflow output or in a pipeline Execute Action stage.

Note: Copy is possible only from within the same HCP system.
  • Source object: Specifies the object to be copied.

    The default is HCI_URI.

  • Filename field: The document field that contains the filename for the corresponding file in the HCP data source.

    The default is HCI_filename.

  • Path field: The document field that contains the document's path.

    The default is HCI_relativePath.

  • Base path: The subpath that will be prepended to the original document path.

    The default is none.

  • Copy Metadata: Choose whether customized metadata should be copied with documents.

    The default is NO.

HCP permissions required: read, write

Delete

For each document, the action deletes the corresponding object from HCP.

If versioning is enabled for the HCP namespace, this action removes only the current version of the object.

This operation does not affect objects under retention.

This operation does not delete folders from HCP.

  • Filename field: The document field that contains the filename for the corresponding file in the HCP data source.

    The default is HCI_filename.

  • Path field: The document field that contains the document's path.

    The default is HCI_relativePath.

HCP permissions required: delete

Hold

For each document, the action applies an HCP retention hold value to the corresponding HCP object.

Hold values can be either true or false. When this value is true for an object, the object is on hold; it cannot be deleted, even by a privileged operation. Also, new versions of the object cannot be created.

  • Hold field name: The document field that contains the hold setting value that you want to apply to the corresponding HCP object. Valid values for this field are true or false.
  • Filename field: The document field that contains the filename for the corresponding file in the HCP data source.

    The default is HCI_filename.

  • Path field: The document field that contains the document's path.

    The default is HCI_relativePath.

Tip: You can use a Tagging stage to add a retention hold field/value pair to your documents.

HCP permissions required: write, privileged

Output File

Depending on the state of the incoming document, executes either the Write File or Delete action.

This action usually executes the Write File action. The Delete action is executed only when both of these conditions are true:

  • The Output File action is included as a workflow output, not as a pipeline Execute Action stage.
  • A document has an HCI_operation field with a value of DELETED.

This indicates that the corresponding HCP object was deleted from the namespace. Such documents do not go through the pipeline; they are sent directly to workflow output.

  • Apply Hold: When enabled, the action uses the value for the document's HCI_retentionHold field to determine what retention hold setting to set on the resulting HCP object.

    When this option is disabled, the HCP namespace determines the hold setting for the resulting object.

  • Apply Shred: When enabled, the action uses the value for the document's HCI_shred field to determine what shred setting to set on the resulting HCP object.

    When this option is disabled, the HCP namespace determines the shred setting for the resulting object.

  • Apply Retention: When enabled, the action uses the value for the document's HCI_retention field to determine what retention setting to set on the resulting HCP object.

    When this option is disabled, the HCP namespace determines the retention setting for the resulting object.

  • Base path: An optional sub path. If specified, the path you specified is prepended to the value for the Path field setting.
  • Filename field: The document field that contains the filename for the corresponding file in the HCP data source.

    The default is HCI_filename.

  • Path field: The document field that contains the document's path.

    The default is HCI_relativePath.

  • Stream: The stream that contains the full content for the document.

    The default is HCI_content.

  • Write Annotations: Enable this option to write already-existing custom metadata annotations along with the document being written by this action.

    When this option is enabled, correctly-named document streams are written to HCP objects as custom metadata annotations.

For a stream to be written as a custom metadata annotation, the stream name must start with HCP_customMetadata_ and the stream must have a metadata field named HCP_customMetadata whose value is the annotation name. For example:

streams {
HCP_customMetadata_exampleAnnotation: 
HCP_customMetadata=exampleAnnotation
};

HCP permissions required: delete, write, privileged (for putting objects on hold)
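The choice that the Output File action makes between Write File and Delete can be sketched as below. Modeling a document as a dictionary of field names to values is an assumption made for illustration, and the function name is hypothetical.

```python
def choose_output_file_action(document: dict, is_workflow_output: bool) -> str:
    """Return the action that Output File would execute for this document.

    Delete is chosen only when the action runs as a workflow output AND the
    document's HCI_operation field has the value DELETED.
    """
    if is_workflow_output and document.get("HCI_operation") == "DELETED":
        return "Delete"
    return "Write File"


print(choose_output_file_action({"HCI_operation": "DELETED"}, is_workflow_output=True))
print(choose_output_file_action({"HCI_operation": "DELETED"}, is_workflow_output=False))
```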

Privileged Delete

Same as the regular Delete action, except that this action can delete objects under retention.

  • Filename field: The document field that contains the filename for the corresponding file in the HCP data source.

    The default is HCI_filename.

  • Path field: The document field that contains the document's path.

    The default is HCI_relativePath.

  • Reason for deletion field name: The document field that contains the reason why you are purging the object from HCP.

Tip: You can use a Tagging stage to add a reasonForDeletion field to your documents.

HCP permissions required: delete, privileged

Privileged Purge

For each document, the action deletes the corresponding HCP object and all of its versions.

This is the same as the regular Purge action except that this action can be performed on objects that are under retention.

  • Filename field: The document field that contains the filename for the corresponding file in the HCP data source.

    The default is HCI_filename.

  • Path field: The document field that contains the document's path.

    The default is HCI_relativePath.

  • Reason for deletion field name: The document field that contains the reason why you are purging the object from HCP.

    The value for this field must be from one through 1,024 characters long and can contain any valid UTF-8 characters, including white space.

Tip: You can use a Tagging stage to add a reasonForDeletion field to your documents.

HCP permissions required: delete, purge, privileged

Purge

For each document, the action deletes the corresponding HCP object and all of its versions.

This action does not affect objects under retention. To purge those objects, use the Privileged Purge action.

  • Filename field: The document field that contains the filename for the corresponding file in the HCP data source.

    The default is HCI_filename.

  • Path field: The document field that contains the document's path.

    The default is HCI_relativePath.

HCP permissions required: delete, purge

Retention

For each document, this action applies an HCP retention setting to the corresponding object in HCP. An HCP object's retention setting determines whether the object is eligible for deletion.

When you edit the retention setting for an existing object, HCP allows you only to make the setting longer or more restrictive, not less.

For more information on HCP retention settings, see the HCP document Using a Namespace.

  • Filename field: The document field that contains the filename for the corresponding file in the HCP data source.

    The default is HCI_filename.

  • Path field: The document field that contains the document's path.

    The default is HCI_relativePath.

  • Retention field name: The name of a document field that contains an HCP retention value. Valid retention values are:
    • 0 or Deletion Allowed: The object can be deleted at any time.
    • -1 or Deletion Prohibited: The object cannot be deleted, except with a privileged operation.
    • -2 or Initial Unspecified: Specifies that the object doesn't yet have a retention setting.
    • The name of an HCP retention class, in this format:
      C+retention_class_name
    • A datetime value in this format:
      yyyy-MM-ddThh:mm:ssZ
      For example:
      2015-11-16T14:27:20-0500
Tip: Use the Tagging stage to add retention field/value pairs to documents.
HCP permissions required: write

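To check retention values before tagging documents, a validator along these lines may help. It is a sketch based only on the value formats listed for the Retention field name setting. The function name and the retention class name HR_records are hypothetical, and HCP itself remains the authority on what it accepts.

```python
from datetime import datetime

# The three special values, as number and as name.
SPECIAL_VALUES = {
    "0": "Deletion Allowed",
    "-1": "Deletion Prohibited",
    "-2": "Initial Unspecified",
}


def classify_retention_value(value: str) -> str:
    """Return "special", "retention class", or "datetime"; raise ValueError otherwise."""
    if value in SPECIAL_VALUES or value in SPECIAL_VALUES.values():
        return "special"
    if value.startswith("C+") and len(value) > 2:
        return "retention class"
    try:
        # %H (24-hour clock) is used because the documented example has hour 14.
        datetime.strptime(value, "%Y-%m-%dT%H:%M:%S%z")
    except ValueError:
        raise ValueError(f"not a recognized retention value: {value!r}") from None
    return "datetime"


print(classify_retention_value("2015-11-16T14:27:20-0500"))
print(classify_retention_value("C+HR_records"))
```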
Write Annotation

This action takes document streams and writes them as custom metadata annotations to existing HCP objects.

This action does not create new objects in HCP. That is, to write annotations to an HCP object, the object must already exist in HCP.

  • Filename field: The document field that contains the filename for the corresponding file in the HCP data source.

    The default is HCI_filename.

  • Path field: The document field that contains the document's path.

    The default is HCI_relativePath.

  • Write All Annotations (enabled): Enable this option to write already-existing custom metadata annotations along with the document being written by this action.

    When this option is enabled, correctly-named document streams are written to HCP objects as custom metadata annotations.

For a stream to be written as a custom metadata annotation, the stream name must start with HCP_customMetadata_ and the stream must have a metadata field named HCP_customMetadata whose value is the annotation name. For example:

streams {
HCP_customMetadata_exampleAnnotation: 
HCP_customMetadata=exampleAnnotation
};
Tip: Disable this option if the stream you want to write did not originally exist in the document (that is, the stream was created by a stage in your pipeline).
  • Write All Annotations (disabled): The action writes only a single document stream as a custom metadata annotation.

    No naming or content requirements exist when writing a single custom metadata annotation.

    You can configure these settings for the single stream:

    • Annotation Stream: The document stream to write.
    • Write Single Annotation: The name for the custom metadata annotation in HCP. If this annotation does not exist for an HCP object, your system creates it.
HCP permissions required: write

Write File

For each document, the action writes the specified stream to an HCP object.

If the object exists and versioning is enabled for the HCP namespace, the system writes a new version of the object.

  • Apply Hold: When enabled, the action uses the value for the document's HCI_retentionHold field to determine what retention hold setting to set on the resulting HCP object.

    When this option is disabled, the HCP namespace determines the hold setting for the resulting object.

  • Apply Shred: When enabled, the action uses the value for the document's HCI_shred field to determine what shred setting to set on the resulting HCP object.

    When this option is disabled, the HCP namespace determines the shred setting for the resulting object.

  • Apply Retention: When enabled, the action uses the value for the document's HCI_retention field to determine what retention setting to set on the resulting HCP object.

    When this option is disabled, the HCP namespace determines the retention setting for the resulting object.

  • Base path: An optional sub path. If specified, the path you specified is prepended to the value for the Path field setting.
  • Filename field: The document field that contains the filename for the corresponding file in the HCP data source.

    The default is HCI_filename.

  • Path field: The document field that contains the document's path.

    The default is HCI_relativePath.

  • Stream: The stream that contains the full content for the document.

    The default is HCI_content.

  • Write Annotations: Enable this option to write already-existing custom metadata annotations along with the document being written by this action.

    When this option is enabled, correctly-named document streams are written to HCP objects as custom metadata annotations.

For a stream to be written as a custom metadata annotation, the stream name must start with HCP_customMetadata_ and the stream must have a metadata field named HCP_customMetadata whose value is the annotation name. For example:

streams {
HCP_customMetadata_exampleAnnotation: 
HCP_customMetadata=exampleAnnotation
};

HCP permissions required: write, privileged (for putting objects on hold)
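The stream naming rule used by the Write Annotations and Write All Annotations options can be sketched as a filter over a document's streams. Representing streams as a dictionary of stream name to stream metadata is an assumption made for illustration, and the function name is hypothetical.

```python
from typing import Dict

STREAM_PREFIX = "HCP_customMetadata_"
METADATA_FIELD = "HCP_customMetadata"


def annotation_streams(streams: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Map each qualifying stream name to the annotation name it would be written as.

    A stream qualifies when its name starts with HCP_customMetadata_ and its
    metadata has an HCP_customMetadata field holding the annotation name.
    """
    selected = {}
    for stream_name, metadata in streams.items():
        annotation_name = metadata.get(METADATA_FIELD)
        if stream_name.startswith(STREAM_PREFIX) and annotation_name:
            selected[stream_name] = annotation_name
    return selected


document_streams = {
    "HCI_content": {},
    "HCP_customMetadata_exampleAnnotation": {"HCP_customMetadata": "exampleAnnotation"},
    "HCP_customMetadata_unnamed": {},  # no metadata field, so it is not written
}
print(annotation_streams(document_streams))
```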

Authentication and action permissions

To configure this data connection, you need the username and password for a tenant-level user account on the HCP system. At a minimum, this user account must have read permission for the namespace you want your system to access.

To perform an action, the user account needs the correct permissions in HCP to perform that action. For example, the user account needs the delete permission to delete objects.

HCP object versioning

This data connection reads only the latest version of each HCP object.

Checking for updates with an HCP connector

This is a list-based data connection. This means that when the Check for Updates setting is enabled for a workflow task, this data connection relies on a list kept by the task to determine which files have changed since the last time the data source was read.

This is different from a change-based data connection, such as the HCP MQE data connection, which can ask the data source directly for a list of files that changed during a span of time.

For information on the Check for Updates setting, see Task settings.

How this data connection determines which file to perform an action on

This syntax shows how this data connection determines where to perform an action:

<location-specified-by-the-data-connection-used-by-the-action>//
<Base Path-from-action-config-(if-specified)>/
<Path field-from-action-config>/<Filename field-from-action-config>
Write File action example

This example shows how to use the Write File action to copy an object named /sourceDir/file.txt from one HCP namespace to another.

Source data connection configuration

  • Name: sourceDataConnection
  • Type: HCP
  • HCP System Name: sourceHcp.example.com
  • HCP Tenant Name: sourceTenant
  • HCP Namespace Name: sourceNamespace
  • HCP Root Directory: /sourceDir

Document values

  • HCI_filename: file.txt
  • HCI_relativePath: /

Destination data connection configuration

  • Name: destinationDataConnection
  • Type: HCP
  • HCP System Name: destinationHcp.example.com
  • HCP Tenant Name: destinationTenant
  • HCP Namespace Name: destinationNamespace
  • HCP Root Directory: /destinationDir

Action stage configuration

  • Action Name: Write File
  • Data connection: destinationDataConnection
  • Stream: HCI_content
  • Filename field: HCI_filename
  • Path field: HCI_relativePath
  • Base Path: /writtenByHCI

File written to

  • HCP System Name: destinationHcp.example.com
  • HCP Tenant Name: destinationTenant
  • HCP Namespace Name: destinationNamespace
  • Filename and path: /destinationDir/writtenByHCI/file.txt
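The path syntax and the Write File example above can be checked with a small sketch. The function name is hypothetical, and collapsing repeated slashes is an assumption about how the segments are joined.

```python
def resolve_action_path(root_directory: str, path_value: str, filename: str, base_path: str = "") -> str:
    """Join <root directory>/<Base Path>/<Path field value>/<Filename field value>.

    Empty segments and bare "/" segments are dropped so that no doubled
    slashes remain (an assumption; the connector may normalize differently).
    """
    segments = [root_directory, base_path, path_value, filename]
    cleaned = [s.strip("/") for s in segments if s and s.strip("/")]
    return "/" + "/".join(cleaned)


# Values from the Write File action example.
print(resolve_action_path("/destinationDir", "/", "file.txt", base_path="/writtenByHCI"))
```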

How this data connection populates HCI_relativePath field

This data connection adds the HCI_relativePath field to each document it creates. By default, data connections use the HCI_relativePath field to determine where actions should be performed.

If this data connection is configured to read objects from the root folder (/) of a data source, the value for the HCI_relativePath field is relative to the root folder.

For example, when the file /logs/March.log is converted to a document, the HCI_relativePath field value for the document is logs/.

If you change the data connection to read from a specific folder (for example, /logs), the HCI_relativePath field value is relative to that folder. For example, in this case, the HCI_relativePath value for /logs/March.log will be /.
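The two examples above can be reproduced with the sketch below. It is written only to match those two documented cases; the function name is hypothetical, and the handling of any other input (for example, nested folders) is an extrapolation that may not match the connector.

```python
def relative_path(root_directory: str, object_path: str) -> str:
    """Compute an HCI_relativePath-style value for object_path under root_directory."""
    root = root_directory.rstrip("/")               # "/" becomes "", "/logs" stays "/logs"
    parent = object_path.rsplit("/", 1)[0] + "/"    # "/logs/March.log" -> "/logs/"
    if not parent.startswith(root + "/"):
        raise ValueError(f"{object_path!r} is not under {root_directory!r}")
    rest = parent[len(root):]                       # always starts with "/"
    if root == "":
        # Reading from the root folder: the documented value is "logs/",
        # without a leading slash.
        rest = rest.lstrip("/")
    return rest or "/"


print(relative_path("/", "/logs/March.log"))
print(relative_path("/logs", "/logs/March.log"))
```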

Setting up the HCP connector to perform a Copy File action across HCP tenants

The Copy File action across HCP tenants is only possible when using the HCP connector. To perform this action, create a new HCP data connection that uses the all_users user name and a blank password.

Important: Although the HCP connector works across tenants, the HCP MQE connector works only within the same tenant, because the credentials are already provided. It cannot be used to perform this action.

To set up an HCP connector to perform the Copy File action across HCP tenants:

Procedure

  1. Click Data Connections.

    The Add Data Connection button appears.
  2. In the Type dropdown, select HCP.

  3. Enter the HCP System Name.

  4. Enter the HCP Tenant Name.

  5. Enter the HCP Namespace Name.

    Note: HCP System Name, HCP Tenant Name, and HCP Namespace Name can all be found in the URL for the HCP namespace:
    https://<hcp-namespace-name>.<hcp-tenant-name>.<hcp-system-name>
  6. Enter the HCP Root Directory path.

  7. To use SSL to communicate with the HCP system, enable Use SSL.

  8. To use a proxy server to connect to the HCP system, enable Use Proxy Server.

  9. For User Name, enter all_users.

  10. For Password, leave the field blank.

    Note: The combination of the all_users User Name and a blank Password field allows anonymous access to an HCP system. To perform actions, the all_users account must already have the applicable HCP data access permissions assigned to it.
  11. When you are finished, click Create.

HCP MQE (Hitachi Content Platform Metadata Query Engine) data connection

This connector allows your system to access objects in a Hitachi Content Platform (HCP) system. With this connector, the system uses the HCP metadata query engine (MQE) to submit operation-based requests for discovering new and changed files.

You can configure an HCP MQE data connection to access:

  • All objects in a single namespace folder.
  • All objects in a single namespace.
  • All objects in a tenant.
  • All objects in an HCP system.
Note: For an HCP MQE data connection to be used as either a workflow output or in an Execute Action stage, the data connection must be configured to connect to a specific HCP namespace.
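How the optional HCP Tenant Name and HCP Namespace Name settings (described under Configuration settings) select among the system, tenant, and namespace scopes can be sketched as follows. The names are hypothetical, folder selection through the Directories Filter is not modeled, and rejecting a namespace without a tenant is an assumption because the source does not cover that combination.

```python
from typing import NamedTuple


class MqeScope(NamedTuple):
    scope: str                 # "system", "tenant", or "namespace"
    ssl_required: bool         # Use SSL must be enabled to read the whole system
    usable_for_actions: bool   # workflow output or Execute Action stage


def mqe_scope(tenant_name: str = "", namespace_name: str = "") -> MqeScope:
    """Derive what an HCP MQE data connection reads from its name settings."""
    if namespace_name and not tenant_name:
        # Assumption: the documentation does not describe this combination.
        raise ValueError("a namespace name without a tenant name is not covered here")
    if namespace_name:
        return MqeScope("namespace", ssl_required=False, usable_for_actions=True)
    if tenant_name:
        return MqeScope("tenant", ssl_required=False, usable_for_actions=False)
    return MqeScope("system", ssl_required=True, usable_for_actions=False)


print(mqe_scope())
print(mqe_scope("europe", "finance"))
```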

HCP system requirements

In order for your system to use this connector to access an HCP namespace:

  • The HTTP or HTTPS protocol must be enabled for the namespace.
  • The HCP system must be at version 5.0 or later.
  • The namespace must be enabled for search in HCP.

HCP MQE object versioning

When creating or editing an HCP MQE connector, the Track HCP Versions setting can be enabled to identify documents by their HCP versions. The version is then appended to the HCI_id and HCI_URI fields on the document, helping to identify all versions of that document contained within HCP.

Tip: The default value for this setting is No.

When this setting is set to True:

  • A new Boolean field (HCI_trackHcpVersion) is added to the document with a value of True.
  • HCP delete records are not processed but are instead stored in the index, similar to how create records are handled.
  • The HCI_deleted field for each document is set to False for all HCP create records and True for all other records.
  • If the HCI_URI document field contains version information, the HCP and HCP MQE connector's plugin APIs return data or metadata corresponding to that specific HCP version.

Additionally, the MQE connector always adds a new document field, HCI_hcpDocVersion, with the Long data type and the HCP version as its value.

Note: Because delete operations are stored instead of processed, indexes might accumulate versions that need manual cleanup.

Authentication and action permissions

To configure this data connection, you need the username and password for a tenant-level user account on the HCP system. At a minimum, this user account must have read permission for the namespace you want your system to access.

To perform an action, the user account needs to have the correct permissions in HCP to perform that action. For example, the user account requires the delete permission to delete objects.

Checking for updates with an MQE connector

This is a change-based connector, which means that a workflow task can submit requests directly to the data source to learn what files have changed.

With list-based connectors, such as the regular HCP data connection, the data connection needs to check a list of files that it has already read to determine whether a file in the data source has changed since it was last read.

Tip: Use this data connection instead of the regular HCP data connection to connect to HCP namespaces that are continually being edited.
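
The difference between the two update-checking styles can be illustrated with a small conceptual model. This is a sketch only, not how the product implements either connector: the function names, the dictionary of version tokens, and the change log of (timestamp, path) pairs are all hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def list_based_changes(current: Dict[str, int], seen: Dict[str, int]) -> List[str]:
    """List-based style: compare the source listing with the list kept by the task.

    Both arguments map a file path to a version token (hypothetical, for
    example a modification time). A path counts as changed when it is new
    or when its token differs from the one recorded earlier.
    """
    return sorted(path for path, token in current.items() if seen.get(path) != token)


def change_based_changes(
    change_log: Iterable[Tuple[int, str]], start_ms: int, end_ms: int
) -> List[str]:
    """Change-based style: ask the source which paths changed in a time span.

    change_log stands in for the data source's own record of changes,
    as (timestamp in milliseconds, path) pairs.
    """
    return sorted({path for ts, path in change_log if start_ms <= ts <= end_ms})


if __name__ == "__main__":
    print(list_based_changes({"a": 1, "b": 2, "c": 3}, {"a": 1, "b": 1}))
    print(change_based_changes([(100, "a"), (200, "b"), (300, "a")], 150, 300))
```

In the list-based model the caller has to keep and compare its own record of every file. In the change-based model the source answers the question directly, which is why the tip above recommends this connector for namespaces that change continually.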

Configuration settings

Setting | Required/Optional | Description
HCP System Name | Required | The HCP system to connect to.
HCP Tenant Name | Optional

The name of an HCP tenant.

If you omit both this and the HCP Namespace Name, the system reads all files in the HCP system.

Note: To read all files in the HCP system, the Use SSL setting must be enabled.

HCP Namespace Name | Optional

The name of an HCP namespace.

If you omit this:

  • And the HCP Tenant Name field is empty, the system reads all objects in the HCP system.
  • And you've specified a tenant, the system reads objects from all namespaces in that tenant.
  • The data connection cannot be used as a workflow output or in an Execute Action stage.
Directories Filter | Optional

A comma-separated list of directories from which the system should read files.

If you omit this, the system reads all files in the namespace.

Note: The directories you specify here are not used to determine the value for the HCI_relativePath field for each document.
Use SSL | Optional

Whether to use SSL to connect to the data source.

When this option is enabled, click Test at the bottom of the Add Data connection page to connect to the data source and retrieve its SSL certificate.

Note: If you've configured this data connection to read all files in the HCP system, this setting must be enabled.
HCP Authentication Type | Required | The type of authentication to use when connecting to an HCP system. Users can select either their local credentials or Active Directory credentials. The default value is Local.
User Name | Required | Username for an HCP tenant-level or system-level user account.
Password | Required | Password for the user account.
Batch Size | Optional | The number of documents to return per MQE request. The default is 500.
Customize the Query Range | Optional | When enabled, allows you to edit the time period that MQE requests cover. For example, you can use this setting to process and index only the files that were changed between March 1, 2015 and April 1, 2015.
Custom query start time (millisec) | Optional | Number of milliseconds since January 1, 1970, 00:00:00 UTC. The system reads only the files that were added or changed at or after this date and time.
Custom query end time (millisec) | Optional | Number of milliseconds since January 1, 1970, 00:00:00 UTC. The system reads only the files that were added or changed at or before this date and time.
Include delete operations | Optional

When enabled, a workflow task processes files that were deleted from the HCP namespace. While this option is enabled, and while the workflow task is configured to scan for updates, this field value pair is added to deleted documents:

HCI_deleted:true

The deleted file is not removed from the index.

This setting is disabled by default.
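
The custom query start and end times are expressed as milliseconds since January 1, 1970, 00:00:00 UTC. As a minimal sketch, the values for the March 1, 2015 to April 1, 2015 example could be computed as follows. The helper name to_epoch_millis is hypothetical, and the sketch assumes that the range boundaries are midnight UTC.

```python
from datetime import datetime, timedelta, timezone

# Reference point used by the settings: January 1, 1970, 00:00:00 UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(moment: datetime) -> int:
    """Convert a timezone-aware datetime to whole milliseconds since the epoch."""
    if moment.tzinfo is None:
        raise ValueError("use a timezone-aware datetime to avoid local-time ambiguity")
    # Integer division of timedeltas avoids floating-point rounding.
    return (moment - EPOCH) // timedelta(milliseconds=1)


start_ms = to_epoch_millis(datetime(2015, 3, 1, tzinfo=timezone.utc))
end_ms = to_epoch_millis(datetime(2015, 4, 1, tzinfo=timezone.utc))

print(start_ms)  # 1425168000000
print(end_ms)    # 1427846400000
```

If the range should follow a local time zone rather than UTC, pass datetimes that carry that zone's offset; the resulting values shift accordingly.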

Supported actions

Action name | Description | Configuration settings | HCP Permissions Required
Copy File

This action issues an HCP REST Put-Copy API request through to HCP, allowing users to copy objects between HCP namespaces.

This action can be used as a workflow output or in a pipeline Execute Action stage. Additionally, this action is available only if the HCP MQE connector is configured with a specific namespace.

Note: Copy is possible only from within the same HCP system.
  • Source object: Specifies the object to be copied.

    The default is HCI_URI.

  • Filename field: The document field that contains the filename for the corresponding file in the HCP data source.

    The default is HCI_filename.

  • Path field: The document field that contains the document's path.

    The default is HCI_relativePath.

  • Base path: The subpath that will be prepended to the original document path.

    The default is none.

  • Copy Metadata: Choose whether customized metadata should be copied with documents.

    The default is NO.

read

write

Delete

For each document, the action deletes the corresponding object from HCP.

If versioning is enabled for the HCP namespace, this action removes only the current version of the object.

This operation does not affect objects under retention.

This operation does not delete folders from HCP.

  • Filename field: The document field that contains the filename for the corresponding file in the HCP data source.

    The default is HCI_filename.

  • Path field: The document field that contains the document's path.

    The default is HCI_relativePath.

delete
Hold

For each document, the action applies an HCP retention hold value to the corresponding HCP object.

Hold values can be either true or false. When this value is true for an object, the object is on hold; it cannot be deleted, even by a privileged operation. Also, new versions of the object cannot be created.

  • Hold field name: The document field that contains the hold setting value that you want to apply to the corresponding HCP object. Valid values for this field are true or false.
  • Filename field: The document field that contains the filename for the corresponding file in the HCP data source.

    The default is HCI_filename.

  • Path field: The document field that contains the document's path.

    The default is HCI_relativePath.

Tip: You can use a Tagging stage to add a retention hold field/value pair to your documents.

write

privileged

Output File

Depending on the state of the incoming document, executes either the Write File or Delete action.

This action usually executes the Write File action. The Delete action is executed only when both of these conditions are true:

  • The Output File action is included as a workflow output, not as a pipeline Execute Action stage.
  • A document has an HCI_operation field with a value of DELETED.

This indicates that the corresponding HCP object was deleted from the namespace. Such documents do not go through the pipeline; they are sent directly to workflow output.

  • Apply Hold: When enabled, the action uses the value for the document's HCI_retentionHold field to determine what retention hold setting to set on the resulting HCP object.

    When this option is disabled, the HCP namespace determines the hold setting for the resulting object.

  • Apply Shred: When enabled, the action uses the value for the document's HCI_shred field to determine what shred setting to set on the resulting HCP object.

    When this option is disabled, the HCP namespace determines the shred setting for the resulting object.

  • Apply Retention: When enabled, the action uses the value for the document's HCI_retention field to determine what retention setting to set on the resulting HCP object.

    When this option is disabled, the HCP namespace determines the retention setting for the resulting object.

  • Base path: An optional sub path. If specified, the path you specified is prepended to the value for the Path field setting.
  • Filename field: The document field that contains the filename for the corresponding file in the HCP data source.

    The default is HCI_filename.

  • Path field: The document field that contains the document's path.

    The default is HCI_relativePath.

  • Stream: The stream that contains the full content for the document.

    The default is HCI_content.

  • Write Annotations: Enable this option to write already-existing custom metadata annotations along with the document being written by this action.

    When this option is enabled, correctly-named document streams are written to HCP objects as custom metadata annotations.

For a stream to be written as a custom metadata annotation, the stream name must start with HCP_customMetadata_ and the stream must have a metadata field named HCP_customMetadata with the value of the annotation name. For example:

streams {
HCP_customMetadata_exampleAnnotation: 
HCP_customMetadata=exampleAnnotation
};

delete

write

privileged (for putting objects on hold)

Privileged Delete

Same as the regular Delete action, except that this action can delete objects under retention.
  • Filename field: The document field that contains the filename for the corresponding file in the HCP data source.

    The default is HCI_filename.

  • Path field: The document field that contains the document's path.

    The default is HCI_relativePath.

  • Reason for deletion field name: The document field that contains the reason why you are purging the object from HCP.
Tip: You can use a Tagging stage to add a reasonForDeletion field to your documents.

delete

privileged

Privileged Purge

For each document, the action deletes the corresponding HCP object and all of its versions.

This is the same as the regular Purge action except that this action can be performed on objects that are under retention.

  • Filename field: The document field that contains the filename for the corresponding file in the HCP data source.

    The default is HCI_filename.

  • Path field: The document field that contains the document's path.

    The default is HCI_relativePath.

  • Reason for deletion field name: The document field that contains the reason why you are purging the object from HCP.

    The value for this field must be from one through 1,024 characters long and can contain any valid UTF-8 characters, including white space.

Tip: You can use a Tagging stage to add a reasonForDeletion field to your documents.

delete

purge

privileged

Purge

For each document, the action deletes the corresponding HCP object and all of its versions.

This action does not affect objects under retention. To purge those objects, use the privileged purge action.

  • Filename field: The document field that contains the filename for the corresponding file in the HCP data source.

    The default is HCI_filename.

  • Path field: The document field that contains the document's path.

    The default is HCI_relativePath.

delete

purge

Retention

For each document, this action applies an HCP retention setting to the corresponding object in HCP. An HCP object's retention setting determines whether the object is eligible for deletion.

When you edit the retention setting for an existing object, HCP allows you only to make the setting longer or more restrictive, not less.

For more information on HCP retention settings, see the HCP document Using a Namespace.

  • Filename field: The document field that contains the filename for the corresponding file in the HCP data source.

    The default is HCI_filename.

  • Path field: The document field that contains the document's path.

    The default is HCI_relativePath.

  • Retention field name: The name of a document field that contains an HCP retention value.

    Valid retention values are:

    • 0 or Deletion Allowed: The object can be deleted at any time.
    • -1 or Deletion Prohibited: The object cannot be deleted, except with a privileged operation.
    • -2 or Initial Unspecified: Specifies that the object doesn't yet have a retention setting.
    • The name of an HCP retention class, in this format:
      C+retention_class_name
    • A datetime value in this format:
      yyyy-MM-ddThh:mm:ssZ
      For example:
      2015-11-16T14:27:20-0500
Note: Use the Tagging stage to add retention field/value pairs to documents.
write
Write Annotation

This action takes document streams and writes them as custom metadata annotations to existing HCP objects.

This action does not create new objects in HCP. That is, to write annotations to an HCP object, the object must already exist in HCP.

  • Filename field: The document field that contains the filename for the corresponding file in the HCP data source.

    The default is HCI_filename.

  • Path field: The document field that contains the document's path.

    The default is HCI_relativePath.

  • Write All Annotations (enabled): Enable this option to write already-existing custom metadata annotations along with the document being written by this action.

    When this option is enabled, correctly-named document streams are written to HCP objects as custom metadata annotations.

For a stream to be written as a custom metadata annotation, the stream name must start with HCP_customMetadata_ and the stream must have a metadata field named HCP_customMetadata with the value of the annotation name.

For example:

streams {

HCP_customMetadata_exampleAnnotation: HCP_customMetadata=exampleAnnotation

};

Note: Disable this option if the stream you want to write did not originally exist in the document (that is, the stream was created by a stage in your pipeline).

  • Write All Annotations (disabled): The action writes only a single document stream as a custom metadata annotation.

    No naming or content requirements exist for writing a single custom metadata annotation.

    You can configure these settings for the single stream:

    • Annotation Stream: The document stream to write.
    • Write Single Annotation: The name for the custom metadata annotation in HCP. If this annotation does not exist for an HCP object, your system creates it.
write
Write File

For each document, the action writes the specified stream to an HCP object.

If the object exists and versioning is enabled for the HCP namespace, the system writes a new version of the object.

  • Apply Hold: When enabled, the action uses the value for the document's HCI_retentionHold field to determine what retention hold setting to set on the resulting HCP object.

    When this option is disabled, the HCP namespace determines the hold setting for the resulting object.

  • Apply Shred: When enabled, the action uses the value for the document's HCI_shred field to determine what shred setting to set on the resulting HCP object.

    When this option is disabled, the HCP namespace determines the shred setting for the resulting object.

  • Apply Retention: When enabled, the action uses the value for the document's HCI_retention field to determine what retention setting to set on the resulting HCP object.

    When this option is disabled, the HCP namespace determines the retention setting for the resulting object.

  • Base path: An optional sub path. If specified, the path you specified is prepended to the value for the Path field setting.
  • Filename field: The document field that contains the filename for the corresponding file in the HCP data source.

    The default is HCI_filename.

  • Path field: The document field that contains the document's path.

    The default is HCI_relativePath.

  • Stream: The stream that contains the full content for the document.

    The default is HCI_content.

  • Write Annotations: Enable this option to write already-existing custom metadata annotations along with the document being written by this action.

    When this option is enabled, correctly-named document streams are written to HCP objects as custom metadata annotations.

For a stream to be written as a custom metadata annotation, the stream name must start with HCP_customMetadata_ and the stream must have a metadata field named HCP_customMetadata with the value of the annotation name. For example:

streams {
HCP_customMetadata_exampleAnnotation: HCP_customMetadata=exampleAnnotation
};

write

privileged (for putting objects on hold)
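
The stream naming rule described for the Write Annotations option can be sketched as a small check. This is an illustration under stated assumptions: documents are modelled as a plain dictionary that maps each stream name to its metadata fields, and the function name annotation_streams is hypothetical. Only the two conditions stated above are tested.

```python
from typing import Dict, Mapping

STREAM_PREFIX = "HCP_customMetadata_"   # required start of the stream name
ANNOTATION_FIELD = "HCP_customMetadata"  # metadata field holding the annotation name


def annotation_streams(streams: Mapping[str, Mapping[str, str]]) -> Dict[str, str]:
    """Return {stream name: annotation name} for streams that meet both conditions."""
    selected = {}
    for name, metadata in streams.items():
        annotation = metadata.get(ANNOTATION_FIELD)
        # Condition 1: the name starts with the prefix.
        # Condition 2: the metadata field is present and non-empty.
        if name.startswith(STREAM_PREFIX) and annotation:
            selected[name] = annotation
    return selected


if __name__ == "__main__":
    document_streams = {
        "HCP_customMetadata_exampleAnnotation": {"HCP_customMetadata": "exampleAnnotation"},
        "HCI_content": {},
        "HCP_customMetadata_missingField": {},
    }
    print(annotation_streams(document_streams))
```

In this model, only the first stream would be written as an annotation; the other two fail one of the two conditions.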

How the data connection determines which file to perform an action on

This syntax shows how this data connection determines where to perform an action:

<location-specified-by-the-data-connection-used-by-the-action>//<Base Path-from-action-config-(if-specified)>/<Path field-from-action-config>/<Filename field-from-action-config>
Note: The HCP MQE data connection does not take into account the value for the Directories Filter setting when determining this location. All actions performed by the HCP MQE data connection are relative to the root directory (/).
Write File action example

This table shows an example of using the Write File action to copy an object named /sourceDir/file.txt from one HCP namespace to another.

Source data connection configuration:

  • Name: sourceDataConnection
  • Type: HCP MQE
  • HCP System Name: sourceHcp.example.com
  • HCP Tenant Name: sourceTenant
  • HCP Namespace Name: sourceNamespace
  • Directories: /sourceDir

Document values:

  • HCI_filename: file.txt
  • HCI_relativePath: /sourceDir

Destination data connection configuration:

  • Name: destinationDataConnection
  • Type: HCP MQE
  • HCP System Name: destinationHcp.example.com
  • HCP Tenant Name: destinationTenant
  • HCP Namespace Name: destinationNamespace
  • Directories: /destinationDir

Action stage configuration:

  • Action Name: Write File
  • Data connection: destinationDataConnection
  • Stream: HCI_content
  • Filename field: HCI_filename
  • Path field: HCI_relativePath
  • Base Path: /writtenByHCI

File written to:

  • HCP System Name: destinationHcp.example.com
  • HCP Tenant Name: destinationTenant
  • HCP Namespace Name: destinationNamespace
  • Filename and path: /writtenByHCI/sourceDir/file.txt
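
The location rule and the example above can be sketched as a path-joining helper. This is an illustration only: the function name target_path is hypothetical, and the slash normalisation is an assumption made so that values such as /sourceDir and logs/ both work. The Directories value of the destination data connection is deliberately not an input, because actions are relative to the root directory.

```python
def target_path(filename: str, relative_path: str, base_path: str = "") -> str:
    """Join Base Path (if any), the Path field value, and the Filename field value.

    Empty segments and duplicate slashes are dropped, and the result is
    always expressed relative to the namespace root (/).
    """
    segments = []
    for part in (base_path, relative_path, filename):
        segments.extend(piece for piece in part.split("/") if piece)
    return "/" + "/".join(segments)


print(target_path("file.txt", "/sourceDir", "/writtenByHCI"))
# /writtenByHCI/sourceDir/file.txt
```

With no base path, the same helper returns /sourceDir/file.txt for the document values in the example.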

How this data connection populates the HCI_relativePath field

This data connection adds the HCI_relativePath field to each document it creates. By default, data connections use the HCI_relativePath field to determine where actions should be performed.

For this data connection, the value for the HCI_relativePath field is always relative to the root directory, regardless of the Directories Filter setting.

For example, when the file /logs/March.log is converted to a document, the HCI_relativePath field value for the document is logs/.
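
The /logs/March.log example can be reproduced with a short sketch. The helper name relative_path_for is hypothetical, and the empty-string result for a file directly under the root is an assumption; the text above does not state what the field contains in that case.

```python
import posixpath


def relative_path_for(object_path: str) -> str:
    """Return the directory part of an object path, relative to the root directory."""
    directory = posixpath.dirname(object_path).strip("/")
    # Assumption: a file directly under the root yields an empty value.
    return directory + "/" if directory else ""


print(relative_path_for("/logs/March.log"))  # logs/
```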

HCP Anywhere (Single user) data connection

HCP Anywhere is the file synchronization and sharing system from Hitachi Vantara. This data connection allows your system to read files from a single user's HCP Anywhere folder.

To access all files on an entire HCP Anywhere system, use the HCP Anywhere (System-wide) data connection.

HCP Anywhere system requirements

For your system to be able to read data from HCP Anywhere:

  • The HCP Anywhere system must be at version 2.1.1 or later.
  • The data connection must use an HCP Anywhere user account that has access to the HCP Anywhere File Sync and Share API.

Checking for updates with an HCP Anywhere single user connector

This is a list-based data connection. This means that when the Check for Updates setting is enabled for a workflow task, this data connection relies on a list kept by the task to determine which files have changed since the last time the data source was read.

This is different from a change-based data connection, such as the HCP MQE data connection, which can ask the data source directly for a list of files that changed during a span of time.

Configuration settings

Setting | Required/Optional | Description
HCP Anywhere System Name | Required | Hostname for the HCP Anywhere system.
HCP Anywhere Root Directory | Required | The folder path to read. Use / (forward slash) to read all files in the user's HCP Anywhere folder.
Use Proxy Server | Optional

Whether to use a proxy server to connect to the data source.

When enabled, you need to specify:

  • Name of the proxy server: The hostname for the proxy server. For example, proxy.example.com.
  • Port number of the proxy server: The port number on which to connect to the proxy server.
User Name | Required

Username for the HCP Anywhere user whose folder you want Hitachi Content Intelligence to read.

Note: This user must have permission to use the HCP Anywhere File Sync and Share API.

Password | Required | Password for the HCP Anywhere user.

Supported actions

The HCP Anywhere (Single user) data connection does not support any actions. The system can use it only to read files.

How an HCP Anywhere data connection populates HCI_relativePath

This data connection adds the HCI_relativePath field to each document it creates. By default, data connections use the HCI_relativePath field to determine where actions should be performed.

For this data connection, the value for the HCI_relativePath field is always relative to the root folder.

For example, when the file /logs/March.log is converted to a document, the HCI_relativePath field value for the document is logs/.

HCP Anywhere (System-wide) data connection

HCP Anywhere is the file synchronization and sharing system from Hitachi Vantara. This data connection allows your system to read files from an entire HCP Anywhere system.

To access only a single user's files, see HCP Anywhere (Single user) data connection.

HCP Anywhere system and user requirements

For your system to be able to read data from HCP Anywhere, the HCP Anywhere system must be at version 4.0.0 or later and have the Device Management API enabled.

You also need the username and password for an HCP Anywhere user account that has the Administrator and Audit roles and has access to the HCP Anywhere Device Management API.

Checking for updates with an HCP Anywhere system connector

This is a list-based data connection. This means that when the Check for Updates setting is enabled for a workflow task, this data connection relies on a list kept by the task to determine which files have changed since the last time the data source was read.

This is different from a change-based data connection, such as the HCP MQE data connection, which can ask the data source directly for a list of files that changed during a span of time.

Configuration settings

Setting | Required/Optional | Description
Name | Required | A name for the data connection.
Description | Optional | A description for the data connection.
HCP Anywhere System Name | Required | Hostname or IP address for the HCP Anywhere system.
Use Proxy Server | Optional

Whether to use a proxy server to connect to the data source.

When enabled, you need to specify:

  • Name of the proxy server: The hostname for the proxy server. For example, proxy.example.com.
  • Port number of the proxy server: The port number on which to connect to the proxy server.

The default is false.

User Name | Required

Username for an HCP Anywhere user. This user must have access to the Device Management API.

The user specified in this field performs all of the supported actions and owns all files in this data connection.

Password | Required | Password for the HCP Anywhere admin user.
Filter type | Required

The filter to use when crawling the HCP Anywhere server. Choose from the following filter types:

  • None: Crawl the entire HCP Anywhere system.
  • Whitelist: Crawl only file systems of users in your whitelist. When selected, you must specify a comma-separated list of users.
  • Blacklist: Crawl everything except for file systems of users in your blacklist. When selected, you must specify a comma-separated list of users.
Batch Size | Required

The maximum number of files to retrieve from the server.

The default is 1000.

Max visited file size (bytes) | Required

HCI will create documents only for files smaller than this limit.

The default is 107374182400 (100 GB).
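
The three Filter type options can be summarised in a short sketch. This models only the selection rule described in the table; the function names and the exact-match comparison of user names are assumptions, not product behavior.

```python
def parse_user_list(raw: str) -> set:
    """Split a comma-separated list of users, ignoring surrounding spaces and empty items."""
    return {name.strip() for name in raw.split(",") if name.strip()}


def should_crawl(user: str, filter_type: str, users: str = "") -> bool:
    """Decide whether a user's file system is crawled under the given filter type."""
    listed = parse_user_list(users)
    if filter_type == "None":
        return True                # crawl the entire system
    if filter_type == "Whitelist":
        return user in listed      # crawl only listed users
    if filter_type == "Blacklist":
        return user not in listed  # crawl everyone except listed users
    raise ValueError(f"unknown filter type: {filter_type!r}")


if __name__ == "__main__":
    print(should_crawl("alice", "Whitelist", "alice, bob"))
    print(should_crawl("carol", "Blacklist", "alice, bob"))
```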

Supported actions

Action name | Description | Configuration settings
Delete

For each document, the system deletes the file from a user's file system.

Unshared folders take on a different folder structure when they become shared and do not get crawled.

This action is available only when the data connection is used by an Execute Action stage, not when it is included as a workflow output.

  • User: Either the name of the user who owns the file system, or the field in the document that contains the name of the user.
  • Filename field: The document field that contains the filename for the corresponding file in the data source.

    The default is HCI_filename.

  • Relative path field: The document field that contains the document's path.

    The default is HCI_relativePath.

  • Delete Empty Parent Directories: Option to also delete a file's parent directories if they become empty due to deletion of the file. This is enabled by default.
Output File

Depending on the state of the incoming document, executes either the Write File or Delete action.

This action usually executes the Write File action. The Delete action is executed only when both of these conditions are true:

  • The Output File action is included as a workflow output, not as a pipeline Execute Action stage.
  • A document has an HCI_operation field with a value of DELETED.

This indicates that the corresponding file was deleted from the file system. Such documents do not go through the pipeline; they are sent directly to workflow output.

  • User: Either the name of the user who owns the file system, or the field in the document that contains the name of the user.
  • Stream: The stream that contains the full content for the document.

    The default is HCI_content.

  • Filename field: The document field that contains the filename for the corresponding file in the HCP Anywhere data source.

    The default is HCI_filename.

  • Relative path field: The document field that contains the document's relative path.

    The default is HCI_relativePath.

  • Base path: An optional sub path. If specified, the path you specified is prepended to the value for the Path field setting.
  • Delete Empty Directories: Option to also delete a file's parent directories if they become empty due to deletion of the file. This is enabled by default.
Write File

For each document, the action creates a file in an Anywhere file system.

Shared folders lose their top-level folder when they are written to a destination, so their paths differ from the source.

  • User: Either the name of the user who owns the file system, or the field in the document that contains the name of the user.
  • Stream: The stream that contains the full content for the document.

    The default is HCI_content.

  • Filename field: The document field that contains the filename for the corresponding file in the HCP Anywhere data source.

    The default is HCI_filename.

  • Relative path field: The document field that contains the document's relative path.

    The default is HCI_relativePath.

  • Base path: An optional sub path. If specified, the path you specified is prepended to the value for the Path field setting.
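
The way the Output File action chooses between Write File and Delete, as described in the table above, can be sketched as a single decision function. The function name and the boolean flag are hypothetical; the document is modelled as a plain dictionary of field values.

```python
def output_file_action(document: dict, used_as_workflow_output: bool) -> str:
    """Return the name of the action that Output File would execute for a document.

    Delete is chosen only when both conditions hold: the action is used as a
    workflow output, and the document's HCI_operation field is DELETED.
    """
    if used_as_workflow_output and document.get("HCI_operation") == "DELETED":
        return "Delete"
    return "Write File"


if __name__ == "__main__":
    print(output_file_action({"HCI_operation": "DELETED"}, used_as_workflow_output=True))
    print(output_file_action({"HCI_operation": "DELETED"}, used_as_workflow_output=False))
```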

How this data connection determines which file to perform an action on

This syntax shows how this data connection determines where to perform an action:

<location-specified-by-the-data-connection-used-by-the-action>/
<user-filesystem>/<Base Path-from-action-config-(if-specified)>/
<Relative path field-from-action-config>/<Filename field-from-action-config>
Write File action example

This table shows an example of using the Write File action to copy an object named /sourceDir/file.txt from one HCP Anywhere file system to another.

Source data connection configuration:

  • Name: sourceDataConnection
  • Type: HCP Anywhere System-Wide
  • AW System Name: sourceAW.example.com
  • Base Directory: /sourceDir

Document values:

  • HCI_filename: file.txt
  • HCI_relativePath: /sourceDir

Destination data connection configuration:

  • Name: destinationDataConnection
  • Type: HCP Anywhere System-Wide
  • AW System Name: destinationAW.example.com
  • Base Directory: /destinationDir

Action stage configuration:

  • Action Name: Write File
  • Data connection: destinationDataConnection
  • User: destinationUser
  • Stream: HCI_content
  • Filename field: HCI_filename
  • Path field: HCI_relativePath
  • Base Path: /writtenByHCI

File written to:

  • AW System Name: destinationAW.example.com
  • Filename and path: /destinationDir/wr

HCP for Cloud Scale Monitoring data connection (Preview Mode)

Important: This connector is being released in preview mode. We do not recommend using this in a production environment, but welcome feedback on functionality and issues.

HCP for Cloud Scale is a software-defined, massively scalable, object storage system from Hitachi Vantara. This data connection gathers monitoring details from a Cloud Scale system through the metrics of the Prometheus service for use in HCM.

It is created automatically by HCM once a Cloud Scale source is added.

Configuration settings

Setting | Required/Optional | Description
Name | Required | The name for your data source.
Description | Optional | A description for your data source.
HCP for Cloud Scale System to Monitor | Required | The domain name of the system you intend to monitor.
Monitor system via Prometheus | Optional | Collect metrics using the Prometheus API on your system. The Prometheus service must be running on the system for this setting to function correctly.
HCP for Cloud Scale Prometheus Port | Required (if Monitor system via Prometheus is enabled) | The port on which the HCP for Cloud Scale system has configured the Prometheus service to run.
Time between checks | Required (if Monitor system via Prometheus is enabled) | The minimum interval (in seconds) between metrics collection attempts.

Amazon® S3 data connection

This data connection allows the system to access the Amazon Simple Storage Service (S3) on Amazon Web Services (AWS).

For information on using this data connection to migrate data from Amazon S3 to HCP, see Copying data from Amazon S3 to HCP.

Authentication

This data connection needs an access key ID and secret access key for an Amazon AWS account.

Checking for updates with an Amazon S3 connector

This is a list-based data connection. This means that when the Check for Updates setting is enabled for a workflow task, this data connection relies on a list kept by the task to determine which files have changed since the last time the data source was read.

This is different from a change-based data connection, such as the HCP MQE data connection, which can ask the data source directly for a list of files that changed during a span of time.

Configuration settings

Setting | Required/Optional | Description
Amazon region | Required

A list of Amazon S3 regions. Select the one that you want to connect to.

Note: If the region you want is not listed, use the S3 Compatible data connection instead of the Amazon S3 data connection. When configuring the S3 Compatible data connection, specify the applicable endpoint and authentication type for the region you want to connect to.
Bucket | Required | The name of the S3 bucket to connect to.
Prefix | Optional | When specified, this data connection retrieves only the files whose names begin with this prefix.
Prefix delimiter | Required

The character or characters that the data source uses to separate file prefixes into segments.

The default is / (forward slash).

Include user metadata | Required | When enabled, the system retrieves any user-defined metadata for a file, in addition to the file's contents.
Use SSL | Optional | Whether to use SSL to connect to the data source.
Use Proxy Server | Optional

Whether the system should use a proxy server to connect to the data source.

When enabled, you also need to specify the:

  • Name of the proxy server.
  • Port number of the proxy server.
  • Username for the proxy server.
  • Password for the proxy server.
Use STS Authentication | Optional

Whether to use Amazon Web Services Security Token Service for authentication. When enabled, the system retrieves and uses temporary tokens to authenticate with the data source.

For more information, see the Amazon Web Services documentation.

STS session timeout | Required if STS Authentication is enabled | Time in seconds before the STS session expires. Valid values range from 900 (15 minutes) to 129600 (36 hours). The default is 900 seconds.
Access key ID | Required

One half of the access key that this data connection uses to authenticate with the data source.

For information on finding your Amazon AWS account access key ID, see the Amazon Web Services documentation.

Secret access key | Required

One half of the access key that this data connection uses to authenticate with the data source.

For information on finding your Amazon AWS account secret access key, see the Amazon Web Services documentation.
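
The effect of the Prefix and Prefix delimiter settings can be illustrated locally, without contacting S3. This is a sketch under stated assumptions: the object keys are sample strings, the helper names are hypothetical, and the code only models the behavior described in the table (names that begin with the prefix are retrieved; the delimiter separates a prefix into segments).

```python
from typing import List, Tuple


def matching_keys(keys: List[str], prefix: str = "") -> List[str]:
    """Keep only the object keys whose names begin with the configured prefix."""
    return [key for key in keys if key.startswith(prefix)]


def split_key(key: str, delimiter: str = "/") -> Tuple[List[str], str]:
    """Split a key into its prefix segments and the trailing file name."""
    prefix, _, filename = key.rpartition(delimiter)
    segments = [segment for segment in prefix.split(delimiter) if segment]
    return segments, filename


if __name__ == "__main__":
    sample_keys = ["logs/2015/March.log", "logs/2015/April.log", "images/logo.png"]
    for key in matching_keys(sample_keys, prefix="logs/"):
        print(split_key(key))
```

With the default delimiter, the key logs/2015/March.log has the prefix segments logs and 2015 and the file name March.log.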

Supported actions

Action name | Description | Configuration settings
Delete

For each document, the system deletes the corresponding file from the data source.

This operation does not delete folders.

  • Filename field: The document field that contains the filename for the corresponding file in the S3 data source.

    The default is HCI_filename.

  • Relative Prefix Field: The document field that contains the prefix for the corresponding file in the S3 data source.

    The default is HCI_relativePath.

Output File

Depending on the state of the incoming document, executes either the Write File or Delete action.

This action usually executes the Write File action. The Delete action is executed only when both of these conditions are true:

  • The Output File action is included as a workflow output, not as a pipeline Execute Action stage.
  • A document has an HCI_operation field with a value of DELETED.

This indicates that the corresponding file was deleted from the data source. Such documents do not go through the pipeline; they are sent directly to workflow output.

  • Base Prefix: A string to prepend to the original file prefix (that is, the prefix specified in the Relative Prefix Field).
  • Filename field: The document field that contains the filename for the corresponding file in the S3 data source.

    The default is HCI_filename.

  • Include metadata: When enabled, metadata from the source document is copied along with the file contents to the destination. Use the Metadata name patterns to include option to specify which fields to copy.
  • Metadata name patterns to include: If the Include Metadata option is enabled, use this field to specify which metadata fields to include. Valid values are regular expressions.

    The default expression (^S3_userMetadata_.*) matches all metadata fields that start with S3_userMetadata_.

  • Relative Prefix Field: The document field that contains the prefix for the corresponding file in the S3 data source.

    The default is HCI_relativePath.

  • Stream: The stream that contains the full content for the document.

    The default is HCI_content.

  • Stream Size: Optionally, the document field that contains size metadata for the stream specified in the Stream field.

    The default is HCI_size.

    You should specify a value for this field if possible because S3 write operations are more efficient when the file size is known upfront. However, if your documents do not include a field with the relevant information, leave this field blank.

Write File | For each document, the system writes a new file to the data source.
  • Base Prefix: A string to prepend to the original file prefix (that is, the prefix specified in the Relative Prefix Field).
  • Filename field: The document field that contains the filename for the corresponding file in the S3 data source.

    The default is HCI_filename.

  • Include metadata: When enabled, metadata from the source document is copied along with the file contents to the destination. Use the Metadata name patterns to include option to specify which fields to copy.
  • Metadata name patterns to include: If the Include Metadata option is enabled, use this field to specify which metadata fields to include. Valid values are regular expressions.

    The default expression (^S3_userMetadata_.*) matches all metadata fields that start with S3_userMetadata_.

  • Relative Prefix Field: The document field that contains the prefix for the corresponding file in the S3 data source.

    The default is HCI_relativePath.

  • Stream: The stream that contains the full content for the document.

    The default is HCI_content.

  • Stream Size: Optionally, the document field that contains size metadata for the stream specified in the Stream field.

    The default is HCI_size.

    You should specify a value for this field if possible because S3 write operations are more efficient when the file size is known upfront. However, if your documents do not include a field with the relevant information, leave this field blank.
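
The Metadata name patterns to include setting can be illustrated with a short sketch. It filters a set of document fields using the default expression quoted above. The function name, the sample field values, and the choice to match from the start of the field name are assumptions made for illustration; the product may evaluate its regular expressions differently.

```python
import re

# Default pattern quoted in the setting description above.
DEFAULT_PATTERN = r"^S3_userMetadata_.*"


def select_metadata(fields, patterns=(DEFAULT_PATTERN,)):
    """Return only the fields whose names match at least one pattern.

    Assumption: a pattern is applied from the start of the field name.
    """
    compiled = [re.compile(p) for p in patterns]
    return {
        name: value
        for name, value in fields.items()
        if any(rx.match(name) for rx in compiled)
    }


# Hypothetical document fields, for illustration only.
doc_fields = {
    "S3_userMetadata_owner": "finance",
    "S3_userMetadata_project": "audit",
    "HCI_filename": "file.txt",
}

selected = select_metadata(doc_fields)
print(sorted(selected))
```

With the default expression, only the two fields that start with S3_userMetadata_ are selected; HCI_filename is left out.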

How the data connection determines which file to perform an action on

This syntax shows how this data connection determines where to perform an action:

<location-specified-by-the-data-connection-used-by-the-action>//
<Base Prefix-from-action-config-(if-specified)>/
<Relative Prefix Field-from-action-config>/<Filename field-from-action-config>
Write File action example

The following example shows how the Write File action copies a file named /sourceDir/file.txt from one S3 data source to another.

Source data connection configuration

  • Name: sourceDataConnection
  • Type: Amazon S3
  • Amazon Region: us-east-1
  • Bucket: sourceBucket
  • Prefix: sourceDir
  • Prefix Delimiter: /

Document values

  • HCI_filename: file.txt
  • HCI_relativePath: /

Destination data connection configuration

  • Name: destinationDataConnection
  • Type: S3 Compatible
  • S3 Endpoint: tenant1.hcp.example.com
  • Bucket: namespace1
  • Prefix: destinationDir
  • Prefix Delimiter: /

Action stage configuration

  • Action Name: Write File
  • Data connection: destinationDataConnection
  • Stream: HCI_content
  • Filename field: HCI_filename
  • Relative prefix field: HCI_relativePath
  • Base Prefix: /writtenByHCI

File written to

  • HCP System Name: hcp.example.com
  • HCP Tenant Name: tenant1
  • HCP Namespace Name: namespace1
  • Filename and path: /destinationDir/writtenByHCI/file.txt
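
The way the example values combine into the final path can be sketched in a few lines. This is only an illustration of the syntax shown earlier, not the product's implementation: the function name is hypothetical, and collapsing empty segments and repeated delimiters is an assumption made so that the result matches the example.

```python
def target_path(connection_prefix, base_prefix, relative_path, filename, delimiter="/"):
    """Join the four parts in the order given by the syntax above.

    Assumption: empty segments (for example, a relative path of "/")
    contribute nothing, and repeated delimiters are collapsed.
    """
    segments = []
    for part in (connection_prefix, base_prefix, relative_path, filename):
        if part:
            segments.extend(s for s in part.split(delimiter) if s)
    return delimiter + delimiter.join(segments)


# Values taken from the Write File action example.
result = target_path(
    connection_prefix="destinationDir",
    base_prefix="/writtenByHCI",
    relative_path="/",
    filename="file.txt",
)
print(result)  # /destinationDir/writtenByHCI/file.txt
```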

How this data connection populates the HCI_relativePath field

This data connection adds the HCI_relativePath field to each document it creates. By default, data connections use the HCI_relativePath field to determine where actions should be performed.

If this data connection is configured to read objects from the root folder (/) of a data source, the value for the HCI_relativePath field is relative to the root folder.

For example, when the file /logs/March.log is converted to a document, the HCI_relativePath field value for the document is logs/.

If you change the data connection to read from a specific folder (for example, /logs), the HCI_relativePath field value is relative to that folder. For example, in this case, the HCI_relativePath value for /logs/March.log is /.

Considerations - Amazon S3 connections

When you use this data connection to write files to Amazon S3, the data connection does not create empty files for representing directories.

For example, say that you write a file whose full path and name in the input data source is:

/patients/ChrisGreen/billing/bill-02-02-2015.txt

The Amazon S3 data connection writes only this file. It does not write any of these files:

/patients/
/patients/ChrisGreen/
/patients/ChrisGreen/billing/

Accessing files in Amazon S3 from the Search App

After running a search in the Search App, users can select search result links to download files from your data sources. For users to be able to access files from Amazon S3, the files must have public read permissions.

S3 Compatible data connection

This connector allows your system to access the Amazon Simple Storage Service (S3) on Amazon Web Services (AWS), or any system (such as HCP) that provides an HTTP-based API compatible with the Amazon S3 API.


Note: When used to read and write HCP objects, this data connection can read and write only the custom metadata annotations called .metapairs. This data connection cannot read or write any other custom metadata annotations for HCP objects.

Authentication

To access an S3 compatible data source, this data connection needs an access key ID and secret access key for an account on that data source.

Connecting to HCP over S3

You can use this data connection to read data from an HCP namespace.

For HCP namespaces with versioning enabled, this connector reads only the latest version of each object.

Connection requirements:

  • The HCP system must be at version 6.0 or later.
  • The Hitachi API for Amazon S3 protocol must be enabled for the namespaces that you want to connect to.
  • You need an HCP user account with read permission for the namespace you want to connect to. To perform actions, the user account must also have write and delete permissions.

To generate the access key ID and secret access key for an HCP user account, you need to base64-encode the account username and md5 hash the password. For example, run this command in a terminal window:

echo `echo -n <username> | base64`:`echo -n <password> | md5sum` | awk '{print $1}'

The command outputs an access key in this format:

<access-key-id>:<secret-access-key>
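
The same derivation can be sketched in Python using only the standard library. The function name and the sample credentials are placeholders for illustration; the encoding steps mirror the shell command above (base64 of the username, MD5 hex digest of the password).

```python
import base64
import hashlib


def hcp_s3_credentials(username, password):
    """Return (access_key_id, secret_access_key) for an HCP user account."""
    # Access key ID: the base64-encoded username.
    access_key_id = base64.b64encode(username.encode("utf-8")).decode("ascii")
    # Secret access key: the MD5 hash of the password, as a hex string.
    secret_access_key = hashlib.md5(password.encode("utf-8")).hexdigest()
    return access_key_id, secret_access_key


# Placeholder credentials, for illustration only.
access_key_id, secret_access_key = hcp_s3_credentials("user", "password")
print(f"{access_key_id}:{secret_access_key}")
# dXNlcg==:5f4dcc3b5aa765d61d8327deb882cf99
```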

Note: If a file was read from HCP using the S3 Compatible data connection, your users cannot access that file directly from the Search App. Instead, links in the Search App for such files point to the Namespace Browser page for the HCP namespace in which the files are stored.

Checking for updates with an S3 compatible connector

This is a list-based data connection. This means that when the Check for Updates setting is enabled for a workflow task, this data connection relies on a list kept by the task to determine which files have changed since the last time the data source was read.

This is different from a change-based data connection, such as the HCP MQE data connection, which can ask the data source directly for a list of files that changed during a span of time.
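
The idea behind a list-based check can be sketched as a comparison between the listing saved by the previous run and the current listing. This is a conceptual illustration only: how the workflow task actually stores its list is not described here, so the dictionary layout (file name mapped to a version marker such as a modification time or ETag) and the function name are assumptions.

```python
def detect_changes(previous, current):
    """Compare two listings that map file name -> version marker."""
    added = sorted(set(current) - set(previous))
    deleted = sorted(set(previous) - set(current))
    modified = sorted(
        name
        for name in current.keys() & previous.keys()
        if current[name] != previous[name]
    )
    return {"added": added, "modified": modified, "deleted": deleted}


# Hypothetical listings from two consecutive reads of a data source.
previous_listing = {"a.txt": "v1", "b.txt": "v1", "c.txt": "v1"}
current_listing = {"a.txt": "v1", "b.txt": "v2", "d.txt": "v1"}

changes = detect_changes(previous_listing, current_listing)
print(changes)
```

A change-based connection would skip this comparison and ask the data source for the changes directly.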

Configuration settings

Setting | Required/Optional | Description
S3 Endpoint | Required

The point of entry for the data source that you want to connect to.

When connecting to HCP, this is the name of the HCP tenant and system in this format:

<tenant-name>.<hcp-system-hostname>

For example:

financeTenant.hcp.example.com
Bucket | Required

The name of the S3 bucket to connect to.

When connecting to HCP, this is the name of an HCP namespace.

Prefix | Optional | When specified, this data connection retrieves only the files whose names begin with this prefix.
Prefix delimiter | Required

The character or characters that the data source uses to separate a file prefix into segments.

The default is / (forward slash).

Include user metadata | Optional

When enabled, the system retrieves any user-defined metadata for a file, in addition to the file's contents.

If this data connection connects to an HCP namespace, the system retrieves the custom metadata annotation named .metapairs for each object, if the annotation exists.

Include S3 object tagging metadata | Optional | Whether to fetch object tagging metadata for each document.
Object tagging metadata prefix | Required if Include S3 object tagging metadata is enabled | When set, this string is prepended to each object tag key when S3 object tags are converted to HCI metadata fields.
Include S3 object lock metadata | Optional | Whether to fetch object lock metadata for each document.
Use SSL | Optional | Whether to use SSL to connect to the data source.
Use Proxy Server | Optional

Whether the system should use a proxy server to connect to the data source.

When enabled, you also need to specify the:

  • Name of the proxy server.
  • Port number of the proxy server.
  • Username for the proxy server.
  • Password for the proxy server.
Authentication Type | Required

The process used to sign the access key that this data connection uses to connect to the data source.

Options are:

  • AWS v2 Authentication (default)
  • AWS v4 Authentication

For information on these signing processes, see the Amazon Web Services documentation.

Use STS Authentication | Optional

Whether to use Amazon Web Services Security Token Service for authentication. When enabled, the system retrieves and uses temporary tokens to authenticate with the data source.

For more information, see the Amazon Web Services documentation.

STS session timeout | Required if Use STS Authentication is enabled | Time in seconds before the STS session expires. Valid values range from 900 (15 minutes) to 129600 (36 hours). The default is 900 seconds.
STS Endpoint | Optional | The endpoint for the AWS Security Token Service (AWS STS).
Access key ID | Required | One half of the access key that this data connection uses to authenticate with the data source.
Secret access key | Required | One half of the access key that this data connection uses to authenticate with the data source.
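
The effect of the Object tagging metadata prefix setting can be pictured with a small sketch that turns a set of S3 object tags into metadata field names. The function name, the sample tags, and the prefix value S3_tag_ are hypothetical; only the behavior of prepending the configured string to each tag key comes from the setting description above.

```python
def tags_to_metadata(tags, prefix=""):
    """Prepend the configured prefix to each S3 object tag key."""
    return {f"{prefix}{key}": value for key, value in tags.items()}


# Hypothetical object tags and prefix, for illustration only.
object_tags = {"department": "finance", "retention": "7y"}
metadata_fields = tags_to_metadata(object_tags, prefix="S3_tag_")
print(metadata_fields)
```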

Supported actions

Action name | Description | Configuration settings | HCP Permissions Required
Delete

For each document, the system deletes the corresponding file from the data source.

This operation does not delete folders.

  • Filename field: The document field that contains the filename for the corresponding file in the S3 data source.

    The default is HCI_filename.

  • Relative Prefix Field: The document field that contains the prefix for the corresponding file in the S3 data source.

    The default is HCI_relativePath.

HCP permission required: delete
Delete tags | Performs an S3 operation to delete all tags from the specified object.
  • S3 Object Key field: The field that contains the S3 object key.
  • Use version id: If disabled, the tags for the latest version of the object will be deleted. If enabled, the tags will be deleted for the version of the object as specified by the "S3 object version id" property.
Output File

Depending on the state of the incoming document, executes either the Write File or Delete action.

This action usually executes the Write File action. The Delete action is executed only when both of these conditions are true:

  • The Output File action is included as a workflow output, not as a pipeline Execute Action stage.
  • A document has an HCI_operation field with a value of DELETED.

This indicates that the corresponding file was deleted from the data source. Such documents do not go through the pipeline; they are sent directly to workflow output.

  • Base Prefix: A string to prepend to the original file prefix (that is, the prefix specified in the Relative Prefix Field).
  • Filename field: The document field that contains the filename for the corresponding file in the S3 data source.

    The default is HCI_filename.

  • Include metadata: When enabled, metadata from the source document is copied along with the file contents to the destination. Use the Metadata name patterns to include option to specify which fields to copy.

    When this action is used to write objects to HCP, user-specified metadata is written to the destination objects as a custom metadata annotation called .metapairs.

  • Metadata name patterns to include: If the Include Metadata option is enabled, use this field to specify which metadata fields to include. Valid values are regular expressions.

    The default expression (^S3_userMetadata_.*) matches all metadata fields that start with S3_userMetadata_.

  • Relative Prefix Field: The document field that contains the prefix for the corresponding file in the S3 data source.

    The default is HCI_relativePath.

  • Stream: The stream that contains the full content for the document.

    The default is HCI_content.

  • Stream Size: Optionally, the document field that contains size metadata for the stream specified in the Stream field.

    The default is HCI_size.

    You should specify a value for this field if possible because S3 write operations are more efficient when the file size is known upfront. However, if your documents do not include a field with the relevant information, leave this field blank.

HCP permissions required: delete, write

Set Legal Hold | Performs an S3 operation to set legal hold on the specified object.
  • S3 Object Key field: The field that contains the S3 object key.
  • S3 object version id: The field that contains the S3 object version ID.
  • Legal Hold: Legal hold prevents an object version from being deleted regardless of its retain until date.
Set Retention | Performs an S3 operation to set retention on the specified object.
  • S3 Object Key field: The field that contains the S3 object key.
  • S3 object version id: The field that contains the S3 object version ID.
  • Retention Mode: In GOVERNANCE mode, users can't overwrite or delete an object version or alter its lock settings unless they have special permissions. In COMPLIANCE mode, a protected object version can't be overwritten or deleted by any user, including the root user.
  • Retention Period: Retain until date that protects the object version until the retention period expires.
Set Tags | Performs an S3 operation to set the configured metadata fields as object tags on the specified object.
  • Maintain Existing Unspecified Tags: When enabled, tags that already exist on the corresponding S3 object but are not represented by metadata matching the "Object tagging metadata prefix" property are sent again so that their values are preserved. If disabled, any tags not represented by metadata matching this property are deleted from the object.
  • S3 Object Key field: The field that contains the S3 object key.
  • Object tagging metadata prefix: The prefix which should be used when looking for tags to set on the S3 object.
  • Use version id: If disabled, the tags for the latest version of the object will be set. If enabled, the tags will be set for the version of the object as specified by the "S3 object version id" property.
  • S3 object version id: The field containing the version ID of the object.
Write File | For each document, the system writes a new file to the data source.
  • Base Prefix: A string to prepend to the original file prefix (that is, the prefix specified in the Relative Prefix Field).
  • Filename field: The document field that contains the filename for the corresponding file in the S3 data source.

    The default is HCI_filename.

  • Include metadata: When enabled, metadata from the source document is copied along with the file contents to the destination. Use the Metadata name patterns to include option to specify which fields to copy.

    When this action is used to write objects to HCP, user-specified metadata is written to the destination objects as a custom metadata annotation called .metapairs.

  • Metadata name patterns to include: If the Include Metadata option is enabled, use this field to specify which metadata fields to include. Valid values are regular expressions.

    The default expression (^S3_userMetadata_.*) matches all metadata fields that start with S3_userMetadata_.

  • Relative Prefix Field: The document field that contains the prefix for the corresponding file in the S3 data source.

    The default is HCI_relativePath.

  • Stream: The stream that contains the full content for the document.

    The default is HCI_content.

  • Stream Size: Optionally, the document field that contains size metadata for the stream specified in the Stream field.

    The default is HCI_size.

    You should specify a value for this field if possible because S3 write operations are more efficient when the file size is known upfront. However, if your documents do not include a field with the relevant information, leave this field blank.

HCP permission required: write
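
The rule that the Output File action uses to choose between Write File and Delete can be summarized in a short sketch. The function name and the way a document is represented (a plain dictionary) are assumptions for illustration; the two conditions are the ones listed in the Output File description.

```python
def resolve_output_file_action(document, used_as_workflow_output):
    """Return the action that Output File would execute for a document.

    Delete is chosen only when both conditions hold:
    the action runs as a workflow output, and the document's
    HCI_operation field has the value DELETED.
    """
    is_deleted = document.get("HCI_operation") == "DELETED"
    if used_as_workflow_output and is_deleted:
        return "Delete"
    return "Write File"


# Hypothetical documents, for illustration only.
deleted_doc = {"HCI_filename": "file.txt", "HCI_operation": "DELETED"}
regular_doc = {"HCI_filename": "file.txt"}

print(resolve_output_file_action(deleted_doc, used_as_workflow_output=True))
print(resolve_output_file_action(deleted_doc, used_as_workflow_output=False))
print(resolve_output_file_action(regular_doc, used_as_workflow_output=True))
```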

How the data connection determines which file to perform an action on

This syntax shows how this data connection determines where to perform an action:

<location-specified-by-the-data-connection-used-by-the-action>/
/<Base Prefix-from-action-config-(if-specified)>/
<Relative Prefix Field-from-action-config>/<Filename field-from-action-config>
Write File action example

The following example shows how the Write File action copies a file named /sourceDir/file.txt from one S3 data source to another.

Source data connection configuration

  • Name: sourceDataConnection
  • Type: Amazon S3
  • Amazon Region: us-east-1
  • Bucket: sourceBucket
  • Prefix: sourceDir
  • Prefix Delimiter: /

Document values

  • HCI_filename: file.txt
  • HCI_relativePath: /

Destination data connection configuration

  • Name: destinationDataConnection
  • Type: S3 Compatible
  • S3 Endpoint: tenant1.hcp.example.com
  • Bucket: namespace1
  • Prefix: destinationDir
  • Prefix Delimiter: /

Action stage configuration

  • Action Name: Write File
  • Data connection: destinationDataConnection
  • Stream: HCI_content
  • Filename field: HCI_filename
  • Relative prefix field: HCI_relativePath
  • Base Prefix: /writtenByHCI

File written to

  • HCP System Name: hcp.example.com
  • HCP Tenant Name: tenant1
  • HCP Namespace Name: namespace1
  • Filename and path: /destinationDir/writtenByHCI/file.txt

How this data connection populates the HCI_relativePath field

This data connection adds the HCI_relativePath field to each document it creates. By default, data connections use the HCI_relativePath field to determine where actions should be performed.

If this data connection is configured to read objects from the root directory (/) of a data source, the value for the HCI_relativePath field is relative to the root directory.

For example, when the file /logs/March.log is converted to a document, the HCI_relativePath field value for the document is logs/.

If you change the data connection to read from a specific directory (for example, /logs), the HCI_relativePath field value is relative to that directory. For example, in this case, the HCI_relativePath value for /logs/March.log would be /.
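
The two examples above can be reproduced with a small sketch. The function name is hypothetical, and the handling of nested directories below the configured directory is an extrapolation from the two documented cases rather than confirmed behavior.

```python
def hci_relative_path(object_path, connection_root="/"):
    """Return the directory of object_path relative to connection_root."""
    root = connection_root.strip("/")
    path = object_path.strip("/")
    # Directory portion of the object path (empty for top-level files).
    directory = path.rsplit("/", 1)[0] if "/" in path else ""
    if not root:
        relative = directory
    elif directory == root:
        relative = ""
    elif directory.startswith(root + "/"):
        relative = directory[len(root) + 1:]
    else:
        raise ValueError(f"{object_path!r} is not under {connection_root!r}")
    return relative + "/" if relative else "/"


print(hci_relative_path("/logs/March.log"))           # logs/
print(hci_relative_path("/logs/March.log", "/logs"))  # /
```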

Considerations - S3 compatible connections

After running a search in the Search App, users can click search result links to download files from your data sources. For users to be able to access files from Amazon S3, the files must have public read permissions.

HCP Monitoring data connection

The HCP Monitoring data connection produces documents containing performance and storage metrics for a Hitachi Content Platform (HCP) system.

This data connection is typically used by the Monitor App, but you can also use it in your workflows to process, archive, and index HCP system information.

Tip: You can use this data connection to gather metrics from your HCP system and then use an Action to write them to an HCP namespace for archiving.

How this data connection collects information

This data connection can collect information from any combination of these HCP system resources:

  • Node status API: Shows performance metrics for individual HCP nodes.
  • MAPI: Shows object storage metrics for individual HCP tenants.
  • SNMP: Shows object storage metrics for the HCP system and individual nodes.

Each resource for which you want to collect information must be enabled in HCP.

When used as part of a workflow, the data connection makes regular requests to each enabled resource. Each request produces a document containing information specific to that resource. The Politeness setting determines how often these requests are made.

Supported HCP versions

This data connection is designed to collect information from HCP systems at version 8.0 or later. It can collect information from earlier HCP system versions, but some information might be unavailable. For example, individual HCP node metrics (such as the number of active HTTP connections per node) are available only in HCP version 8.0 and later.

Fields added to all documents

This table lists the fields that appear in all documents produced by this data connection.

Field | Description
signalTimestamp | The time when metrics were collected.
signalType

The source of metrics collection. Possible values:

  • NODESTATUS
  • SNMP
  • MAPI
signalSystemType | The type of the system from which metrics were collected. For this data connection, the value is always HCP.
signalSystem | The name of the system from which metrics were collected.
signalElementType

The HCP system scope that the metrics apply to. Possible values:

  • SYSTEM: Fields in the document apply to an entire HCP system.
  • NODE: Fields in the document apply to a specific HCP node.
  • TENANT: Fields in the document apply to a specific HCP tenant.
signalElement | The name of the system element that the document applies to. This field depends on the value for the signalElementType field. For example, if the signalElementType field value is NODE, signalElement is the name of a specific HCP node.

Fields added by the Node Status API

This table lists the fields added to documents when the Monitor Nodes option is enabled for this data connection.

Field | Description
beRead | The average number of bytes read from the node per second over the back-end network.
beWrite | The average number of bytes written to the node per second over the back-end network.
cpu | The total percentage of CPU capacity used.
cpuSystem | The percentage of CPU capacity used by the operating system kernel.
cpuUser | The percentage of CPU capacity used by HCP processes.
diskRead | The average number of blocks read from the logical volume per second.
diskUtilization | The usage of the communication channel between the operating system and the logical volume as a percent of the channel bandwidth.
diskUtilizationAvg | The average value for the diskUtilization field.
diskUtilizationMdn | The median value for the diskUtilization field.
diskWrite | The average number of blocks written to the logical volume per second.
feRead | The average number of bytes read from the node per second over the front-end network.
feWrite | The average number of bytes written to the node per second over the front-end network.
httpConnections | The number of HTTP connections.
ioWait | The percentage of CPU capacity spent waiting to access logical volumes that are in use by other processes.
swapOut | The average number of pages swapped out of memory per second.

Fields added by MAPI

This table lists the fields added to documents when the Monitor tenants via MAPI option is enabled for this data connection.

Field | Description
ingestedVolume | The total size of all objects in this tenant.
objects | The total number of objects in this tenant.
objectsCM | The number of objects in the tenant that have custom metadata.
objectsCompressed | The number of compressed objects in the tenant.
storageUsed | The amount of storage space used by this tenant.

Fields added by SNMP

This table lists the fields added to documents when the Monitor system via SNMP option is enabled for this data connection.

Field | Description
objects | The number of objects stored in the system.
objectsIndexed | The number of objects indexed by the HCP system.
ingestedVolume | The total size of all objects in the system.
storageTotal

Depends on the value of the signalElementType field:

  • NODE: The total amount of storage space on all volumes managed by the node.
  • SYSTEM: The total amount of storage space on all G Series nodes in the system.
storageUsed

Depends on the value of the signalElementType field:

  • NODE: The amount of storage space used on all volumes managed by the node.
  • SYSTEM: The amount of storage space used on all G Series nodes in the system.
economyStorageTotal | The total amount of storage space on all S Series nodes in the system.
economyStorageUsed | The amount of storage space used on all S Series nodes in the system.
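
As one example of how these fields might be used in a workflow, the sketch below derives a utilization percentage from the storageUsed and storageTotal fields. The function name and the sample document values are hypothetical, and the sketch assumes that both fields are reported in the same unit, which is not stated in the table above.

```python
def storage_utilization(document):
    """Return storageUsed as a percentage of storageTotal, or None.

    Assumption: both fields are numeric and use the same unit.
    """
    total = document.get("storageTotal")
    used = document.get("storageUsed")
    if not total or used is None:
        return None
    return 100.0 * used / total


# Hypothetical SNMP document for a single node, for illustration only.
node_doc = {
    "signalType": "SNMP",
    "signalElementType": "NODE",
    "storageTotal": 1000,
    "storageUsed": 250,
}
print(storage_utilization(node_doc))  # 25.0
```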

Configuration settings

This table lists the configuration settings for this data connection and, if applicable, the corresponding HCP system configuration required.

Setting | Description | HCP configuration required
HCP System to Monitor

The domain name of the HCP system to monitor. For example:

corp-hcp.example.com
N/A
System-level Monitoring
Monitor system via SNMP | Whether to use SNMP (Simple Network Management Protocol) to collect information about the HCP system.

To enable SNMP in HCP:

  1. Log into the HCP System Management Console.
  2. Select Monitoring > SNMP.
  3. Select Enable SNMP at snmp.<hcp-system-name>.
  4. Select an SNMP version to use.
  5. Specify a value for the Community field.
  6. If you selected SNMP version 3, specify a username and password.
  7. Click Update Settings.
Version

Version of the SNMP protocol to use. The options are:

  • Version 1 or 2c
  • Version 3

Specify the version setting that your HCP system is using.

Community (SNMP version 1 or 2c) | Specify the Community setting that your HCP system is using.
Username, Password (SNMP version 3) | Specify the username and password for the SNMP user account your HCP system is using.
Time between checks | The interval, in seconds, between attempts to collect information. The default is 180. | N/A
Tenant-level Monitoring
Monitor tenants via MAPI | Whether to use the HCP Management API (MAPI) to collect information about individual tenants in the HCP system.

To configure HCP to allow MAPI access:

  1. Log into the HCP System Management Console.
  2. Select Security > MAPI.
  3. On the Management API page, select Enable the HCP management API.
  4. In the IP address Allow and Deny lists, make sure that the HCP system allows MAPI access to all IP addresses in the HCM system. You can do this, for example, by adding all HCM system instance IP addresses to the Allow list.
  5. Click Update Settings.
Username, Password | The username and password for a system-level HCP user account. | The user account you want this data connection to use must exist on the HCP system. For information on configuring HCP system-level user accounts, see the HCP documentation.
HCP Authentication Type (Required) | The type of authentication to use when connecting to an HCP system. You can select either local credentials or Active Directory credentials. The default value is Local.
Time between checks | The interval, in seconds, between attempts to collect information. The default is 300. | N/A
Node-level Monitoring
Monitor Nodes | Whether to use the HCP Node Status API to collect information about individual nodes in the system.

To enable the Node Status API on your HCP system:

  1. Log into the HCP System Management Console.
  2. Select Security > Network Security.
  3. Select Enable Node Status.
Time between checks | The interval, in seconds, between attempts to collect information. The default is 60. | N/A

HCP Syslog Kafka Queue data connection

This data connection is a version of the Kafka Queue data connection, specially configured to read and process syslog messages sent by an HCP system to an Apache Kafka message queue.

This data connection reads syslog messages from a specified Apache Kafka queue, not directly from an HCP system.

For more information on Apache Kafka, see http://kafka.apache.org/

This data connection is typically used by the Monitor App, but you can also use it to process, archive, and index HCP syslog messages in your workflows.

Tip: You can use this data connection to read HCP syslog events and then use an Action to write them to an HCP namespace for archiving.

HCP requirements

To use this data connection, you need to configure your HCP system to send syslog messages to a Kafka queue.

Checking for updates with this connector

During a workflow task, when this data connection reads messages from a Kafka queue, it continues until all messages in the queue have been read.

If the Check for Updates setting is disabled for the workflow task, the task stops when all messages have been read.

If the Check for Updates setting is enabled, the data connection continuously scans the queue for new messages and reads them as they are added.

Configuration settings

Setting | Required/Optional | Description
Connection Settings
Kafka Servers | Required

A comma-separated list of host/port pairs to use for establishing the initial connection to a Kafka cluster. The list should be in the form:

<host>:<port>,<host>:<port>,...

For example:

kafka1.example.com:9092,kafka2.example.com:9092

These servers are used for the initial connection to discover the full cluster membership, which might change dynamically. The list does not need to contain the full set of servers, but you might want to specify more than one in case a server becomes unavailable.

Security Protocol | Required

The security protocol used to communicate with the Kafka brokers. Options are:

  • PLAINTEXT: Communication with the Kafka brokers is not secured.
  • SSL: Communication with the Kafka brokers is secured using SSL.
Queue Settings
Kafka Topic | Required | Name of the Kafka topic to connect to.
Initial timestamp | Optional

The earliest time after which you want to retrieve messages from the queue.

Valid values:

  • A time specified as a number of milliseconds since midnight on January 1st, 1970: The data connection collects only the messages that were added to the queue at or after this time.
  • 0, or no value: The data connection collects all messages in the queue, regardless of age.
  • END: When the data connection is used as a workflow input, the data connection collects only the messages added to the queue after the initial start of the workflow.
Batch size | Optional | The maximum number of messages to retrieve from a queue at one time.
HCP Settings
HCP System | Required | The domain name of the HCP system to process syslog messages from. This option is used to filter the applicable messages from the queue.
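
The Initial timestamp setting takes a time expressed in milliseconds since January 1st, 1970. As a minimal sketch, assuming that time is counted in UTC (the setting description does not say so explicitly), a value could be computed like this. The function name `to_epoch_millis` is illustrative and not part of the product.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(moment: datetime) -> int:
    """Return whole milliseconds between 1970-01-01 UTC and a timezone-aware datetime."""
    if moment.tzinfo is None:
        # A naive datetime would be ambiguous, so it is rejected here.
        raise ValueError("use a timezone-aware datetime")
    # Integer division by a timedelta avoids floating-point rounding.
    return (moment - EPOCH) // timedelta(milliseconds=1)


# Collect only messages added on or after 1 January 2024 (UTC).
print(to_epoch_millis(datetime(2024, 1, 1, tzinfo=timezone.utc)))
```

The printed number is what you would enter for the setting; entering 0 or leaving the setting blank collects all messages instead.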

Testing this connection

When you test this data connection, the system tests that the data connection can connect to the specified Kafka topic. It does not test whether HCP syslog messages are successfully being added to that topic.

Kafka Queue data connection

The Kafka Queue data connection allows messages to be read from and written to Apache Kafka message queues. These queues facilitate the sharing of messages between systems, often in real time.

Note: Queue data connections pointing to index collections will not scan for updated documents if used as workflow inputs.

For more information on Apache Kafka, see http://kafka.apache.org/

Checking for updates with this connector

During a workflow task, when this data connection reads messages from a Kafka queue, it continues until all messages in the queue have been read.

If the Check for Updates setting is disabled for the workflow task, the task stops when all messages have been read.

If the Check for Updates setting is enabled, the data connection continuously checks the queue for new messages and reads them as they are added.

Configuration settings

Setting | Required/Optional | Description
Connection Settings
Kafka Servers | Required

A comma-separated list of host/port pairs to use for establishing the initial connection to a Kafka cluster. The list should be in the form:

<host>:<port>,<host>:<port>,...

For example:

kafka1.example.com:9092,kafka2.example.com:9092

These servers are used for the initial connection to discover the full cluster membership, which might change dynamically. The list does not need to contain the full set of servers, but you might want to specify more than one in case a server becomes unavailable.

Security Protocol | Required

The security protocol used to communicate with the Kafka brokers. Options are:

  • PLAINTEXT: Communication with the Kafka brokers is not secured.
  • SSL: Communication with the Kafka brokers is secured using SSL.
Queue Settings
Kafka Topic | Required | Name of the Kafka topic to connect to.
Initial timestamp | Optional

The earliest time after which you want to retrieve messages from the queue.

Valid values:

  • A time specified as a number of milliseconds since midnight on January 1st, 1970: The data connection collects only the messages that were added to the queue at or after this time.
  • 0, or no value: The data connection collects all messages in the queue, regardless of age.
  • END: When the data connection is used as a workflow input, the data connection collects only the messages added to the queue after the initial start of the workflow.
Batch size | Optional | The maximum number of messages to retrieve from a queue at one time.
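
Before entering a value for Kafka Servers, it can help to check that it follows the `<host>:<port>,<host>:<port>,...` form. The following is only a sketch of such a check; `parse_kafka_servers` is a hypothetical helper, and the validation the product itself performs is not described here.

```python
def parse_kafka_servers(value: str) -> list[tuple[str, int]]:
    """Split '<host>:<port>,<host>:<port>,...' into (host, port) pairs."""
    servers = []
    for entry in value.split(","):
        entry = entry.strip()
        # Split on the last colon so the port is always the final component.
        host, separator, port = entry.rpartition(":")
        if not separator or not host or not port.isdigit():
            raise ValueError(f"expected <host>:<port>, got {entry!r}")
        port_number = int(port)
        if not 1 <= port_number <= 65535:
            raise ValueError(f"port out of range in {entry!r}")
        servers.append((host, port_number))
    return servers


print(parse_kafka_servers("kafka1.example.com:9092,kafka2.example.com:9092"))
```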

Supported actions

Action name | Description | Configuration settings
Enqueue Message | For each document, the data connection writes a message to the message queue.

Message: The contents of the message to enqueue for each document.

To include a document field value, use this syntax:

${field-name}

For example:

Document ${HCI_displayName} was processed
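
To illustrate the effect of the `${field-name}` syntax, here is a small sketch that uses Python's `string.Template`, which happens to use the same placeholder form. It is not the product's implementation; `render_message` is a hypothetical name, and how the product treats a field that is missing from a document is not covered here.

```python
from string import Template


def render_message(template: str, document: dict[str, str]) -> str:
    """Replace each ${field-name} placeholder with the document's field value."""
    # substitute() raises KeyError when a referenced field is absent.
    return Template(template).substitute(document)


document = {"HCI_displayName": "March.log"}
print(render_message("Document ${HCI_displayName} was processed", document))
```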

PostgreSQL JDBC data connection

This data connection uses the Java Database Connectivity (JDBC) API to connect to PostgreSQL databases. It uses SQL queries to retrieve documents from specified database tables.

When information from a database table is read into the system, rows become documents and columns become fields within documents.

Note: JDBC data connections pointing to index collections will not check for updated documents if used as workflow inputs.

Authentication

If a database needs authentication, you need to provide the username and password for a PostgreSQL database user account when configuring this data connection. The account must have permission to read the database you want.

Avoid restarting workflow tasks that use this data connection

This data connection does not checkpoint its progress while examining a database. If your workflow uses this data connection, when you pause the workflow task and then resume it, the data connection rereads the entire database.

Checking for updates for an SQL data connector

Note: JDBC data connections pointing to index collections will not check for updated documents if used as workflow inputs.

List-based change tracking

This is a list-based data connection. This means that when the Check for Updates setting is enabled for a workflow task, this data connection relies on a list kept by the task to determine which files have changed since the last time the data source was read.

This is different from a change-based data connection, such as the HCP MQE data connection, which can ask the data source directly for a list of files that changed during a span of time.

Identifying changed rows

Use the Version setting to enable this data connection to identify updated rows. If you leave this setting blank, the data connection can identify new and deleted rows, but not updated ones.
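
The following sketch shows one way list-based change tracking can work in principle, and why a version value is needed to spot updated rows. It is an illustration under assumptions, not the product's algorithm: `detect_changes` is a hypothetical function, and each list is modeled as a mapping from a row identifier to a version value.

```python
def detect_changes(previous: dict[str, str], current: dict[str, str]):
    """Compare the list kept from the last read with the rows seen now.

    Keys stand in for row identifiers; values stand in for version values.
    """
    new = sorted(current.keys() - previous.keys())
    deleted = sorted(previous.keys() - current.keys())
    # A row counts as updated only if its version value differs.
    updated = sorted(
        key for key in current.keys() & previous.keys()
        if current[key] != previous[key]
    )
    return new, updated, deleted


previous = {"1": "v1", "2": "v1", "3": "v1"}
current = {"1": "v1", "2": "v2", "4": "v1"}
print(detect_changes(previous, current))

# With no version information every value is the same (here, empty),
# so updated rows cannot be told apart from unchanged ones.
print(detect_changes({"1": "", "2": ""}, {"1": "", "2": ""}))
```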

Configuration settings

Setting | Required/Optional | Description
JDBC Settings
JDBC Connection | Required

Connection string for accessing the database using JDBC. For example:

jdbc:postgresql://myserver.example.com:5432/postgres

For information on formatting the JDBC connection string for PostgreSQL, see the PostgreSQL documentation.

User Name | Optional | Username for a database user account.
Password | Optional | Password for the user account.
Query batch size | Optional | Number of rows to retrieve from the data source per request. To disable batching, specify -1 or leave this setting blank.
Query Settings
SELECT columns | Required

SQL SELECT clause, used to specify which columns to include with each row extracted from the database.

Valid values include:

  • Comma-separated list of columns.
  • Asterisk (*), to select all columns.
Tip: You can use SQL alias syntax to rename retrieved columns.

For example, if your database contains a column called physician, you can change it to doctor in all extracted documents by specifying:

physician AS doctor
Note: By default, all data connections add these fields to the documents they read:
  • HCI_id
  • HCI_URI
  • HCI_dataSourceUuid
  • HCI_displayName
  • HCI_doc_version
FROM | Required | SQL FROM clause, used to specify the database tables to extract documents from.
WHERE | Optional

SQL WHERE clause, used to limit which rows are retrieved from the data source.

For example, say that you have a database with information on cities, which contains a column called Population. To retrieve only rows for cities with populations of one million or more, specify:

Population >= 1000000
Results Settings
Primary Key | Required

Comma-separated list of columns that uniquely identify a row in the database.

This value is used to populate the HCI_id document field.

Display Name | Optional

Comma-separated list of columns to be used as the friendly display name for a row.

This value is used to populate the HCI_displayName document field.

If you don't specify a value for this setting, the value specified for the Primary Key setting is used.

Version | Optional

Comma-separated list of columns whose contents can be used to determine when a row has been changed.

Leave this setting blank if no such column exists in the data source.

This value is used to populate the HCI_doc_version field.
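
As a rough sketch of how the query settings fit together, the example below assembles the SELECT columns, FROM, and WHERE values into one statement and turns each returned row into a document. It uses an in-memory SQLite database in place of PostgreSQL so that it runs anywhere. The helper names, the sample table, and the way key columns are joined into HCI_id are all assumptions made for illustration; the product's actual HCI_id format is not described here.

```python
import sqlite3


def build_query(select_columns: str, from_clause: str, where_clause: str = "") -> str:
    """Combine the three query settings into a single SQL statement."""
    query = f"SELECT {select_columns} FROM {from_clause}"
    if where_clause.strip():
        query += f" WHERE {where_clause}"
    return query


def rows_to_documents(cursor, primary_key: list[str]) -> list[dict]:
    """Rows become documents; columns become fields within documents."""
    names = [column[0] for column in cursor.description]
    documents = []
    for row in cursor:
        document = dict(zip(names, row))
        # Assumption: key values joined with "-" purely for illustration.
        document["HCI_id"] = "-".join(str(document[key]) for key in primary_key)
        documents.append(document)
    return documents


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT, Population INTEGER)")
conn.executemany(
    "INSERT INTO cities VALUES (?, ?, ?)",
    [(1, "Springfield", 150000), (2, "Metropolis", 1000000)],
)

query = build_query("id, name AS city", "cities", "Population >= 1000000")
docs = rows_to_documents(conn.execute(query), primary_key=["id"])
print(query)
print(docs)
```

Note that the alias in the SELECT columns value (`name AS city`) determines the field name in the resulting document.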

Supported actions

This data connection does not support any actions. It can only read documents.

MySQL and MariaDB JDBC data connection

This data connection uses the Java Database Connectivity (JDBC) API to connect to MySQL and MariaDB databases. It uses SQL queries to retrieve documents from specified database tables.

When information from a database table is read into the system, rows become documents and columns become fields within documents.

Note: JDBC data connections pointing to index collections will not scan for updated documents if used as workflow inputs.

Authentication

If a database needs authentication, you need to provide the username and password for a database user account when configuring this data connection. The account must have permission to read the database you want.

Avoid restarting workflow tasks that use this data connection

This data connection does not checkpoint its progress while examining a database. If your workflow uses this data connection, when you pause the workflow task and then resume it, the data connection rereads the entire database.

Checking for updates for an SQL data connector

Note: JDBC data connections pointing to index collections will not check for updated documents if used as workflow inputs.

List-based change tracking

This is a list-based data connection. This means that when the Check for Updates setting is enabled for a workflow task, this data connection relies on a list kept by the task to determine which files have changed since the last time the data source was read.

This is different from a change-based data connection, such as the HCP MQE data connection, which can ask the data source directly for a list of files that changed during a span of time.

Identifying changed rows

Use the Version setting to enable this data connection to identify updated rows. If you leave this setting blank, the data connection can identify new and deleted rows, but not updated ones.

Configuration settings for a MySQL and MariaDB JDBC connection

Setting | Required/Optional | Description
JDBC Settings
JDBC Connection | Required

Connection string for accessing the database using JDBC. For example:

jdbc:mysql://myserver.example.com:3306/mydatabase

For more information on:

  • Formatting the JDBC connection string for MySQL, see the MySQL documentation.
  • Formatting the JDBC connection string for MariaDB, see the MariaDB documentation.
User Name | Optional | Username for a database user account.
Password | Optional | Password for the user account.
Query batch size | Optional | Number of rows to retrieve from the data source per request. To disable batching, specify -1 or leave this setting blank.
Query Settings
SELECT columns | Required

SQL SELECT clause, used to specify which columns to include with each row extracted from the database.

Valid values include:

  • Comma-separated list of columns.
  • Asterisk (*), to select all columns.
Tip: You can use SQL alias syntax to rename retrieved columns.

For example, if your database contains a column called physician, you can change it to doctor in all extracted documents by specifying:

physician AS doctor
Note: By default, all data connections add these fields to the documents they read:
  • HCI_id
  • HCI_URI
  • HCI_dataSourceUuid
  • HCI_displayName
  • HCI_doc_version
FROM | Required | SQL FROM clause, used to specify the database tables to extract documents from.
WHERE | Optional

SQL WHERE clause, used to limit which rows are retrieved from the data source.

For example, say that you have a database with information on cities, which contains a column called Population. To retrieve only rows for cities with populations of one million or more, specify:

Population >= 1000000
Results Settings
Primary Key | Required

Comma-separated list of columns that uniquely identify a row in the database.

This value is used to populate the HCI_id document field.

Display Name | Optional

Comma-separated list of columns to be used as the friendly display name for a row.

This value is used to populate the HCI_displayName document field.

If you don't specify a value for this setting, the value specified for the Primary Key setting is used.

Version | Optional

Comma-separated list of columns whose contents can be used to determine when a row has been changed.

Leave this setting blank if no such column exists in the data source.

This value is used to populate the HCI_doc_version field.
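
The Query batch size setting controls how many rows are requested at a time. The sketch below shows the general idea with Python's DB-API `fetchmany`, using SQLite as a stand-in database. The function `fetch_in_batches` is hypothetical, and the way the product implements batching over JDBC may differ.

```python
import sqlite3
from typing import Iterator, Optional


def fetch_in_batches(cursor, batch_size: Optional[int]) -> Iterator[list]:
    """Yield rows in groups of batch_size; -1 or None disables batching."""
    if batch_size is None or batch_size == -1:
        rows = cursor.fetchall()
        if rows:
            yield rows
        return
    if batch_size < 1:
        raise ValueError("batch size must be a positive number, -1, or blank")
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        yield rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(5)])

sizes = [len(batch) for batch in fetch_in_batches(conn.execute("SELECT n FROM t"), 2)]
print(sizes)
```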

Supported actions

This data connection does not support any actions. It can only read documents.

Solr JDBC data connection

This data connection uses the Java Database Connectivity (JDBC) API to connect to Solr indexes. It uses SQL queries to retrieve documents from specified Solr indexes.

You can use this data connection to retrieve documents from either internal or external Solr indexes.

Tip: Hitachi Content Intelligence also includes a version of this plugin specifically configured for internally-managed indexes.

An index must be at Solr version 6 or later for this data connection to connect to it.

Note: JDBC data connections pointing to index collections will not scan for updated documents if used as workflow inputs.

Avoid restarting workflow tasks that use this data connection

This data connection does not checkpoint its progress while examining an index. If your workflow uses this data connection, when you pause the workflow task and then resume it, the data connection rereads the entire index.

Checking for updates for an SQL data connector

Note: JDBC data connections pointing to index collections will not check for updated documents if used as workflow inputs.

List-based change tracking

This is a list-based data connection. This means that when the Check for Updates setting is enabled for a workflow task, this data connection relies on a list kept by the task to determine which files have changed since the last time the data source was read.

This is different from a change-based data connection, such as the HCP MQE data connection, which can ask the data source directly for a list of files that changed during a span of time.

Identifying changed rows

Use the Version setting to enable this data connection to identify updated rows. If you leave this setting blank, the data connection can identify new and deleted rows, but not updated ones.

Configuration settings for a Solr JDBC data connection

Setting | Required/Optional | Description
JDBC Settings
JDBC Connection | Required

Connection string for accessing the index using JDBC, in the form:

jdbc:solr://<zookeeper-host>:<zookeeper-port>/<zookeeper-path>?collection=<collection-name>

For example:

jdbc:solr://mySolr.example.com:2181/solr?...ection=myIndex
Note: Though you specify a Solr collection here, you can specify a different one for the FROM field.
Query batch size | Optional | Number of rows to retrieve from the data source per request. To disable batching, specify -1 or leave this setting blank.

Important: If you enable batching for this data connection, all index fields you specify for the SELECT columns setting must have the docValues field attribute in the index.

Query Settings
SELECT columns | Required

SQL SELECT clause, used to specify which columns to include with each row extracted from the database.

Valid values include:

  • Comma-separated list of columns.
  • Asterisk (*), to select all columns.
Tip: You can use SQL alias syntax to rename retrieved columns.

For example, if your database contains a column called physician, you can change it to doctor in all extracted documents by specifying:

doctor AS physician
Note: By default, all data connections add these fields to the documents they read:
  • HCI_id
  • HCI_URI
  • HCI_dataSourceUuid
  • HCI_displayName
  • HCI_doc_version
Note: The SQL syntax that you can use is determined by what Solr supports. For information on Solr SQL support, see the applicable Solr documentation.
FROM | Required | SQL FROM clause, used to specify the database tables to extract documents from.

Note: The SQL syntax that you can use is determined by what Solr supports. For information on Solr SQL support, see the applicable Solr documentation.

WHERE | Optional

SQL WHERE clause, used to limit which rows are retrieved from the data source.

For example, say that you have an index of images containing a field called City. To retrieve only the documents for images that were taken in London, specify:

City = 'London'
Note: The SQL syntax that you can use is determined by what Solr supports. For information on Solr SQL support, see the applicable Solr documentation.
Results Settings
Primary Key | Required

Comma-separated list of columns that uniquely identify a row in the index.

All fields you specify here must also be specified for the SELECT columns setting.

This value is used to populate the HCI_id document field.

Display Name | Optional

Comma-separated list of fields to be used as the friendly display name for a document.

All fields you specify here must also be specified for the SELECT columns setting.

This value is used to populate the HCI_displayName document field.

If you don't specify a value for this setting, the value specified for the Primary Key setting is used.

Version | Optional

Comma-separated list of fields whose contents can be used to determine when a document has been changed.

All fields you specify here must also be specified for the SELECT columns setting.

Leave this setting blank if no such column exists in the data source.

This value is used to populate the HCI_doc_version field.
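
The JDBC Connection value for Solr follows the pattern `jdbc:solr://<zookeeper-host>:<zookeeper-port>/<zookeeper-path>?collection=<collection-name>`. A minimal sketch of assembling such a string from its parts is shown below; `solr_jdbc_url` is a hypothetical helper, and the sample host, port, path, and collection values are placeholders only.

```python
def solr_jdbc_url(zookeeper_host: str, zookeeper_port: int,
                  zookeeper_path: str, collection: str) -> str:
    """Fill in the documented Solr JDBC connection string pattern."""
    if not zookeeper_host or not collection:
        raise ValueError("host and collection name are both needed")
    # Drop surrounding slashes so the path is joined with exactly one "/".
    path = zookeeper_path.strip("/")
    return f"jdbc:solr://{zookeeper_host}:{zookeeper_port}/{path}?collection={collection}"


print(solr_jdbc_url("mySolr.example.com", 2181, "solr", "myIndex"))
```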

Supported actions

This data connection does not support any actions. It can only read documents.

Hadoop File System data connection

This data connection allows access to Hadoop Distributed File Systems (HDFS).

Authentication

This data connection does not support authentication.

Checking for updates with a Hadoop data connection

This is a list-based data connection. This means that when the Check for Updates setting is enabled for a workflow task, this data connection relies on a list kept by the task to determine which files have changed since the last time the data source was read.

This is different from a change-based data connection, such as the HCP MQE data connection, which can ask the data source directly for a list of files that changed during a span of time.

Configuration settings

Setting | Description
Name | A name for the data connection.
Description | An optional description for the data connection.
HDFS Host | Hostname or IP address for the HDFS NameNode.
HDFS Port | Port to connect to on the HDFS NameNode.
Use SSL | Whether to use SSL when connecting to HDFS.
Base directory | The path on the HDFS system to the folder containing the data you want to process.
Max visited file size (bytes)

Hitachi Content Intelligence will create documents only for files smaller than this limit.

The default is 107374182400 (100 GB).

Filter type

The filter to use when crawling the HDFS system. Choose from the following filter types:

  • None: Crawl everything without restriction.
  • Whitelist: Crawl only directories in your whitelist. When selected, you must specify a list of directories.
  • Blacklist: Crawl everything except for directories in your blacklist. When selected, you must specify a list of directories.
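
The three filter types can be pictured with the sketch below. It assumes that a directory matches a list entry when it is that entry or is nested beneath it; the documentation does not spell out the exact matching rule, so treat this as an illustration only. The function `should_crawl` is a hypothetical name.

```python
import posixpath


def _is_inside(directory: str, root: str) -> bool:
    """True if directory is root itself or lies below root."""
    directory = posixpath.normpath(directory)
    root = posixpath.normpath(root)
    return directory == root or directory.startswith(root.rstrip("/") + "/")


def should_crawl(directory: str, filter_type: str, directories: list[str]) -> bool:
    """Decide whether a directory is crawled under the given filter type."""
    listed = any(_is_inside(directory, entry) for entry in directories)
    if filter_type == "None":
        return True
    if filter_type == "Whitelist":
        return listed
    if filter_type == "Blacklist":
        return not listed
    raise ValueError(f"unknown filter type: {filter_type!r}")


print(should_crawl("/logs/2024", "Whitelist", ["/logs"]))
print(should_crawl("/tmp", "Blacklist", ["/logs"]))
```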

Supported actions

Action name | Description | Configuration settings
Delete

For each document, the system deletes the corresponding file from the HDFS file system.

This action is available only when the Hadoop File System data connection is used by an Execute Action stage, not when it is included as a workflow output.

  • Filename field: The document field that contains the filename for the corresponding file in the data source.

    The default is HCI_filename.

  • Relative path field: The document field that contains the document's path.

    The default is HCI_relativePath.

  • Delete Empty Parent Directories: Option to also delete a file's parent directories if they become empty due to deletion of the file.
Output File

Depending on the state of the incoming document, executes either the Write File or Delete action.

This action usually executes the Write File action. The Delete action is executed only when both of these conditions are true:

  • The Output File action is included as a workflow output, not as a pipeline Execute Action stage.
  • A document has an HCI_operation field with a value of DELETED.

    This indicates that the corresponding file was deleted from the local file system. Such documents do not go through the pipeline; they are sent directly to workflow output.

  • Stream: The stream that contains the full content for the document.

    The default is HCI_content.

  • Filename field: The document field that contains the filename for the corresponding file in the data source.

    The default is HCI_filename.

  • Relative path field: The document field that contains the document's relative path.

    The default is HCI_relativePath.

  • Base path: An optional subpath. If specified, it is prepended to the value of the Relative path field setting.
  • Owner UID: User ID for files created by this action. Can be a literal value or the name of a document field.

    The default is HCI_ownerSID.

  • Owner GID: Group ID for files created by this action. Can be a literal value or the name of a document field.

    The default is HCI_groupSID.

  • Permissions: POSIX file permissions value (mode) for files created by this action. Can be a literal value or the name of a document field.

    The default is HCI_mode.

  • Delete Empty Directories: Option to also delete a file's parent directories if they become empty due to deletion of the file.
Write File

For each document, this action writes the specified stream to a file.

If the file doesn't exist, the action creates it.

  • Stream: The stream that contains the full content for the document.

    The default is HCI_content.

  • Filename field: The document field that contains the filename for the corresponding file in the data source.

    The default is HCI_filename.

  • Relative path field: The document field that contains the document's relative path.

    The default is HCI_relativePath.

  • Base path: An optional subpath. If specified, it is prepended to the value of the Relative path field setting.
  • Owner UID: User ID for files created by this action. Can be a literal value or the name of a document field.

    The default is HCI_ownerSID.

  • Owner GID: Group ID for files created by this action. Can be a literal value or the name of a document field.

    The default is HCI_groupSID.

  • Permissions: POSIX file permissions value (mode) for files created by this action. Can be a literal value or the name of a document field.

    The default is HCI_mode.
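
The rule that the Output File action uses to choose between Write File and Delete can be summarized in a few lines. This is a sketch of the two documented conditions only; `choose_output_action` is a hypothetical name and documents are modeled as plain dictionaries.

```python
def choose_output_action(document: dict, used_as_workflow_output: bool) -> str:
    """Return the action that Output File would execute for a document."""
    # Delete runs only when both documented conditions hold.
    if used_as_workflow_output and document.get("HCI_operation") == "DELETED":
        return "Delete"
    return "Write File"


print(choose_output_action({"HCI_operation": "DELETED"}, used_as_workflow_output=True))
print(choose_output_action({"HCI_operation": "DELETED"}, used_as_workflow_output=False))
```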

How this data connection determines which file to perform an action on

This syntax shows how this data connection determines where to perform an action:

<location-specified-by-the-data-connection-used-by-the-action>//
<Base Path-from-action-config-(if-specified)>/<Relative path field-from-action-config>/
<Filename field-from-action-config>
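
The syntax above can be sketched as a small path-joining function. The name `action_target` and the sample values are hypothetical. The sketch assumes the location configured on the data connection is an absolute path, and it collapses repeated slashes, which the product may or may not do.

```python
def action_target(connection_location: str, relative_path: str,
                  filename: str, base_path: str = "") -> str:
    """Join location, optional Base path, relative path, and filename."""
    segments = [
        part.strip("/")
        for part in (connection_location, base_path, relative_path, filename)
    ]
    # Empty segments (for example a relative path of "/") are skipped.
    return "/" + "/".join(part for part in segments if part)


print(action_target("/data", "logs/", "March.log", base_path="out"))
```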

How this data connection populates the HCI_relativePath field

This data connection adds the HCI_relativePath field to each document it creates. By default, data connections use the HCI_relativePath field to determine where actions should be performed.

If this data connection is configured to read objects from the root directory (/) of a data source, the value for the HCI_relativePath field is relative to the root directory.

For example, when the file /logs/March.log is converted to a document, the HCI_relativePath field value for the document is logs/.

If you change the data connection to read from a specific directory (for example, /logs), the HCI_relativePath field value is relative to that directory. For example, in this case, the HCI_relativePath value for /logs/March.log is /.
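
The two examples above can be reproduced with the following sketch. The function `relative_path_field` is a hypothetical name; it only mirrors the documented examples and is not taken from the product.

```python
import posixpath


def relative_path_field(file_path: str, base_directory: str) -> str:
    """Return the folder of file_path, relative to base_directory."""
    parent = posixpath.dirname(file_path)
    relative = posixpath.relpath(parent, base_directory)
    # A file directly in the base directory gets "/", as in the example above.
    return "/" if relative == "." else relative + "/"


print(relative_path_field("/logs/March.log", "/"))
print(relative_path_field("/logs/March.log", "/logs"))
```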

Local File System data connection (DEPRECATED)

Important: This connector has been deprecated as of version 2.2.2 of HCI and will be replaced in an upcoming release.

This data connection retrieves files from the local file system (LFS) on each Hitachi Content Intelligence instance.

When configuring this data connection, you specify a path to retrieve files from. This path must be located within the product installation folder and must exist on every instance in the system.

In order for this connector to run correctly when executing pipelines, mount points for the associated drives need to be created before starting HCI.

Connecting to NFS file systems

You can use the Local File System data connection to allow Hitachi Content Intelligence to read and perform actions on data stored on remote network file systems.

You do this by mounting a network file system to the same location within the Hitachi Content Intelligence installation folder on each instance in the system.

Procedure

  1. Use SSH to access all instances in the system.

  2. On each instance, create a folder within the Hitachi Content Intelligence installation folder:

    mkdir /<install-folder>/hci/<nfs-mount-location>
    Important: Do not create your new folder within any of the existing directories under /hci. These directories were created by Hitachi Content Intelligence when it was installed.
  3. Mount the NFS file system you want to access:

    mount <hostname>:<path-to-mount> /<install-folder>/hci/<nfs-mount-location>
  4. In the Hitachi Content Intelligence Admin App, create a Local File System data connection. For the Base directory, specify the path to the folder where you mounted the NFS file system.

  5. To access the files on the NFS file system, use the data connection as part of a workflow or Execute Action stage.

    Note: If your Local File System data connection cannot access files in the NFS file system, see Troubleshooting.

POSIX metadata

The Local File System data connection collects POSIX filesystem metadata from the files it reads. This metadata is converted to field/value pairs in the resulting documents.

POSIX metadata | Resulting field | Field value type
size | HCI_size | Long
uid (ID of the owning user) | HCI_uid | Integer
gid (ID of the owning group) | HCI_gid | Integer
mode (file permissions) | HCI_mode | Integer
ctime (change time) | HCI_createdDateMillis, HCI_createdDateString | Long, String
atime (access time) | HCI_accessDateMillis, HCI_accessDateString | Long, String
mtime (modification time) | HCI_modifiedDateMillis, HCI_modifiedDateString | Long, String
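
To see where these values come from, the sketch below reads the same POSIX attributes with Python's `os.stat` and places them under the field names from the table. It is an approximation: `posix_fields` is a hypothetical function, the String variants of the date fields are left out because their format is not described here, and `st_mode` includes file-type bits in addition to the permission bits.

```python
import os
import tempfile


def posix_fields(path: str) -> dict[str, int]:
    """Map POSIX stat attributes to the field names listed in the table."""
    info = os.stat(path)
    return {
        "HCI_size": info.st_size,
        "HCI_uid": info.st_uid,
        "HCI_gid": info.st_gid,
        "HCI_mode": info.st_mode,
        # Nanosecond timestamps are converted to whole milliseconds.
        "HCI_createdDateMillis": info.st_ctime_ns // 1_000_000,
        "HCI_accessDateMillis": info.st_atime_ns // 1_000_000,
        "HCI_modifiedDateMillis": info.st_mtime_ns // 1_000_000,
    }


# Create a small sample file and show the fields derived from it.
with tempfile.NamedTemporaryFile(delete=False) as sample:
    sample.write(b"hello")
    sample_path = sample.name

fields = posix_fields(sample_path)
os.remove(sample_path)
print(fields)
```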

Configuration settings

Setting | Description
Name | A name for the data connection.
Description | An optional description for the data connection.
Base directory

The path to the folder that contains the data you want to process.

Important:

  • This must be a location within the system install directory.
  • The path you specify must exist on every instance in the system.
Max visited file size (bytes)

The system will create documents only for files smaller than this limit.

The default is 107374182400 (100 GB).

Filter type

The filter to use when crawling the file system. Choose from the following filter types:

  • None: Crawl everything.
  • Whitelist: Crawl only directories in your whitelist. When selected, you must specify a list of directories.
  • Blacklist: Crawl everything except for directories in your blacklist. When selected, you must specify a list of directories.

Supported actions

Action name | Description | Configuration settings
Delete

For each document, the system deletes the corresponding file from the local file system.

This action is available only when the Local File System data connection is used by an Execute Action stage, not when it is included as a workflow output.

  • Filename field: The document field that contains the filename for the corresponding file in the data source.

    The default is HCI_filename.

  • Relative path field: The document field that contains the document's path.

    The default is HCI_relativePath.

  • Delete Empty Parent Directories: Option to also delete a file's parent directories if they become empty due to deletion of the file.
Output File

Depending on the state of the incoming document, executes either the Write File or Delete action.

This action usually executes the Write File action. The Delete action is executed only when both of these conditions are true:

  • The Output File action is included as a workflow output, not as a pipeline Execute Action stage.
  • A document has an HCI_operation field with a value of DELETED.

    This indicates that the corresponding file was deleted from the local file system. Such documents do not go through the pipeline; they are sent directly to workflow output.

  • Stream: The stream that contains the full content for the document.

    The default is HCI_content.

  • Filename field: The document field that contains the filename for the corresponding file in the data source.

    The default is HCI_filename.

  • Relative path field: The document field that contains the document's relative path.

    The default is HCI_relativePath.

  • Base path: An optional subpath. If specified, it is prepended to the value of the Relative path field setting.
  • Owner UID: User ID for files created by this action. Can be a literal value or the name of a document field.

    The default is HCI_ownerSID.

  • Owner GID: Group ID for files created by this action. Can be a literal value or the name of a document field.

    The default is HCI_groupSID.

  • Permissions: POSIX file permissions value (mode) for files created by this action. Can be a literal value or the name of a document field.

    The default is HCI_mode.

  • Delete Empty Directories: Option to also delete a file's parent directories if they become empty due to deletion of the file.
Write File

For each document, this action writes the specified stream to a file.

If the file doesn't exist, the action creates it.

  • Stream: The stream that contains the full content for the document.

    The default is HCI_content.

  • Filename field: The document field that contains the filename for the corresponding file in the data source.

    The default is HCI_filename.

  • Relative path field: The document field that contains the document's relative path.

    The default is HCI_relativePath.

  • Base path: An optional subpath. If specified, it is prepended to the value of the Relative path field setting.
  • Owner UID: User ID for files created by this action. Can be a literal value or the name of a document field.

    The default is HCI_ownerSID.

  • Owner GID: Group ID for files created by this action. Can be a literal value or the name of a document field.

    The default is HCI_groupSID.

  • Permissions: POSIX file permissions value (mode) for files created by this action. Can be a literal value or the name of a document field.

    The default is HCI_mode.

How this data connection determines where to perform an action

This syntax shows how this data connection determines where to perform an action:

<location-specified-by-the-data-connection-used-by-the-action>//
<Base Path-from-action-config-(if-specified)>/<Relative path field-from-action-config>/
<Filename field-from-action-config>
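The syntax above can be read as a simple path join. The following Python sketch is illustrative only and is not the product's implementation: the function name resolve_target and its parameters are hypothetical, and collapsing repeated slashes is an assumption.

```python
import posixpath


def resolve_target(connection_root, relative_path, filename, base_path=""):
    """Join the parts in the order shown in the syntax above.

    connection_root: location configured on the data connection
    base_path:       optional Base path from the action configuration
    relative_path:   value of the Relative path field (for example, "logs/")
    filename:        value of the Filename field (for example, "March.log")
    """
    # Strip surrounding slashes so the join neither doubles separators nor
    # discards earlier components (posixpath.join restarts at an absolute part).
    parts = [p.strip("/") for p in (base_path, relative_path, filename)]
    parts = [p for p in parts if p]  # drop empty parts, such as an unset Base path
    return posixpath.join(connection_root.rstrip("/") or "/", *parts)


print(resolve_target("/data", "logs/", "March.log"))
print(resolve_target("/data", "logs/", "March.log", base_path="archive"))
```

With a Base path of archive, the second call places the file under /data/archive/logs/.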

How this data connection populates the HCI_relativePath field

This data connection adds the HCI_relativePath field to each document it creates. By default, data connections use the HCI_relativePath field to determine where actions should be performed.

If this data connection is configured to read objects from the root directory (/) of a data source, the value for the HCI_relativePath field is relative to the root directory.

For example, when the file /logs/March.log is converted to a document, the HCI_relativePath field value for the document is logs/.

If you change the data connection to read from a specific directory (for example, /logs), the HCI_relativePath field value is relative to that directory. For example, in this case, the HCI_relativePath value for /logs/March.log is /.
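The two examples above can be reproduced with a short Python sketch. The function name relative_path_for is hypothetical, and the behavior for nested directories is an assumption extrapolated from those two examples.

```python
import posixpath


def relative_path_for(file_path, base_directory="/"):
    """Return the directory of file_path relative to base_directory."""
    parent = posixpath.dirname(file_path)            # "/logs/March.log" -> "/logs"
    rel = posixpath.relpath(parent, base_directory)  # "logs", or "." if identical
    # A file directly inside the base directory gets "/", as in the example above.
    return "/" if rel == "." else rel + "/"


print(relative_path_for("/logs/March.log", "/"))      # logs/
print(relative_path_for("/logs/March.log", "/logs"))  # /
```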

CIFS data connection

This data connection allows you to access Common Internet File System (CIFS) 2.x shares. CIFS is a form of Service Message Block (SMB), a network sharing protocol.

Authentication

To access a CIFS share, you can use local or Active Directory (AD) authentication. If you are using AD, you must specify a domain name. If you do not specify a domain name, you have access to the CIFS share as a local user. If you do not specify a username, you can access the CIFS share anonymously.
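The authentication rules above amount to a small decision table. The sketch below is illustrative; the function name and return labels are hypothetical, and treating a missing username as anonymous even when a domain is set is an assumption.

```python
def cifs_auth_mode(username=None, domain=None):
    """Classify how the share would be accessed for the given settings."""
    if not username:
        # No username: the share is accessed anonymously.
        return "anonymous"
    if domain:
        # Username plus domain: Active Directory authentication.
        return "active_directory"
    # Username without a domain: access as a local user.
    return "local"


print(cifs_auth_mode())                            # anonymous
print(cifs_auth_mode("svc_crawl"))                 # local
print(cifs_auth_mode("svc_crawl", "EXAMPLE.COM"))  # active_directory
```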

Configuration settings

Setting | Description
Name | A name for the data connection.
Description | An optional description for the data connection.
Host | Hostname or IP address for the CIFS server.
Share name | Name of the share on the CIFS server.
Base directory

The path to the folder that contains the data you want to process.

Important:

  • This must be a location within the system install directory.
  • The path you specify must exist on every instance in the system.
Include user and group SIDs

Whether to query the CIFS server for user and group identifiers. This requires an additional request per file, which slows crawling. When enabled, the HCI_ownerSID and HCI_groupSID fields are added to each document.

The default is false.

Username | A username to access the share.
Domain | The Active Directory domain to access. This must be specified to access the CIFS share using AD authentication.
Password | Password for the user account.
Max visited file size (bytes)

The system will create documents only for files smaller than this limit.

The default is 107374182400 (100GB).

Filter type

The filter to use when crawling the file system. Choose from the following filter types:

  • None: Crawl everything.
  • Whitelist: Crawl only directories in your whitelist. When selected, you must specify a list of directories.
  • Blacklist: Crawl everything except for directories in your blacklist. When selected, you must specify a list of directories.
Include Hidden Files | When set to true, crawls hidden files on the CIFS share. The default is false.
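To show how the size limit, filter type, and hidden-file settings combine, here is a minimal Python sketch. It is not the connector's code: the function should_visit and its parameters are hypothetical, and matching a file against a directory list by path prefix is an assumption.

```python
DEFAULT_MAX_SIZE = 107374182400  # default for Max visited file size (bytes)


def _inside(path, directory):
    """True if path is the directory itself or lies beneath it."""
    directory = directory.rstrip("/")
    return path == directory or path.startswith(directory + "/")


def should_visit(path, size, filter_type="None", directories=(),
                 max_size=DEFAULT_MAX_SIZE, include_hidden=False,
                 is_hidden=False):
    """Decide whether a document would be created for one file."""
    if is_hidden and not include_hidden:
        return False  # hidden files are skipped unless explicitly included
    if size >= max_size:
        return False  # only files smaller than the limit become documents
    listed = any(_inside(path, d) for d in directories)
    if filter_type == "Whitelist":
        return listed       # crawl only the listed directories
    if filter_type == "Blacklist":
        return not listed   # crawl everything except the listed directories
    return True             # "None": crawl everything


print(should_visit("/logs/a.log", 10, "Whitelist", ["/logs"]))  # True
print(should_visit("/tmp/a.log", 10, "Blacklist", ["/tmp"]))    # False
```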

Supported actions

Action name | Description | Configuration settings
Delete

For each document, the system deletes the corresponding file from the share.

This action is available only when the CIFS data connection is used by an Execute Action stage, not when it is included as a workflow output.

  • Filename field: The document field that contains the filename for the corresponding file in the data source.

    The default is HCI_filename.

  • Relative path field: The document field that contains the document's path.

    The default is HCI_relativePath.

  • Delete Empty Parent Directories: Option to also delete a file's parent directories if they become empty due to the deletion of the file.
Output File

Depending on the state of the incoming document, executes either the Write File or Delete action.

This action usually executes the Write File action. The Delete action is executed only when both of these conditions are true:

  • The Output File action is included as a workflow output, not as a pipeline Execute Action stage.
  • A document has an HCI_operation field with a value of DELETED.

    This indicates that the corresponding file was deleted from the local file system. Such documents do not go through the pipeline; they are sent directly to workflow output.

  • Stream: The stream that contains the full content for the document.

    The default is HCI_content.

  • Filename field: The document field that contains the filename for the corresponding file in the HCP data source.

    The default is HCI_filename.

  • Relative path field: The document field that contains the document's relative path.

    The default is HCI_relativePath.

  • Base path: An optional subpath. If specified, it is prepended to the value of the Relative path field setting.
  • Owner UID: User ID for files created by this action. Can be a literal value or the name of a document field.

    The default is HCI_ownerSID.

  • Owner GID: Group ID for files created by this action. Can be a literal value or the name of a document field.

    The default is HCI_groupSID.

  • Permissions: POSIX file permissions value (mode) for files created by this action. Can be a literal value or the name of a document field.

    The default is HCI_mode.

  • Delete Empty Directories: Option to also delete a file's parent directories if they become empty due to deletion of the file.
Write File

For each document, the action writes the specified stream to a file on the share, creating any missing parent directories.

If the file doesn't exist, the action creates it.

  • Stream: The stream that contains the full content for the document.

    The default is HCI_content.

  • Filename field: The document field that contains the filename for the corresponding file in the HCP data source.

    The default is HCI_filename.

  • Relative path field: The document field that contains the document's relative path.

    The default is HCI_relativePath.

  • Base path: An optional subpath. If specified, it is prepended to the value of the Relative path field setting.
  • Owner UID: User ID for files created by this action. Can be a literal value or the name of a document field.

    The default is HCI_ownerSID.

  • Owner GID: Group ID for files created by this action. Can be a literal value or the name of a document field.

    The default is HCI_groupSID.

  • Permissions: POSIX file permissions value (mode) for files created by this action. Can be a literal value or the name of a document field.

    The default is HCI_mode.
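The rule that the Output File action uses to choose between Write File and Delete, as described in the table above, can be sketched in Python. The function name and the used_as_workflow_output flag are hypothetical; a document is modeled as a plain dictionary of fields.

```python
def output_file_action(document, used_as_workflow_output):
    """Return the name of the action that Output File would execute."""
    deleted = document.get("HCI_operation") == "DELETED"
    # Delete runs only when both conditions hold; otherwise Write File runs.
    if used_as_workflow_output and deleted:
        return "Delete"
    return "Write File"


doc = {"HCI_filename": "March.log", "HCI_operation": "DELETED"}
print(output_file_action(doc, used_as_workflow_output=True))   # Delete
print(output_file_action(doc, used_as_workflow_output=False))  # Write File
```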

How this data connection determines where to perform an action

This syntax shows how this data connection determines where to perform an action:

<location-specified-by-the-data-connection-used-by-the-action>//
<Base Path-from-action-config-(if-specified)>/<Relative path field-from-action-config>/
<Filename field-from-action-config>

How this data connection populates the HCI_relativePath field

This data connection adds the HCI_relativePath field to each document it creates. By default, data connections use the HCI_relativePath field to determine where actions should be performed.

If this data connection is configured to read objects from the root directory (/) of a data source, the value for the HCI_relativePath field is relative to the root directory.

For example, when the file /logs/March.log is converted to a document, the HCI_relativePath field value for the document is logs/.

If you change the data connection to read from a specific directory (for example, /logs), the HCI_relativePath field value is relative to that directory. For example, in this case, the HCI_relativePath value for /logs/March.log is /.

Solr Query data connections

Solr Query data connection

The Solr Query data connection uses Solr's Query API to retrieve documents from internal or external Solr indexes. As the connector pages through the query results, it checkpoints its progress, which allows workflows that use it to be paused and resumed.

When configuring a Solr Query data connector, you specify:

  • The Solr server to connect to.
  • The index to read from.
  • The filtering criteria for limiting which documents are processed.
  • The fields to include with processed documents.
Note: When filling in the Solr Connection field, /solr must be used as the path if your Solr index is managed by HCI.

Internal Index Query data connection

The Internal Index Query connection works only with internal indexes. It functions exactly as the Solr Query connector does but needs less user input, as the default values for the configuration options are pulled from the managed HCI internal indexes.

Note: Internal Index Query connectors do not support any Actions.

Configuration settings for a Solr Query connector

When configuring a Solr Query connector, you must specify the following:

  • Index Settings:
    • Solr Connection: URL used to connect to Solr. The format is zkHost:zkPort[/zkPath], using the host and port of the ZooKeeper instance used by Solr.
    • Index: Name of the Solr index.
    • Query batch size: Number of documents to be requested from Solr in a single request.
  • Query Settings:
    • Query: The search query to send to Solr.
    • Fields: Comma-separated list of fields to request for the query, or use * to include all fields returned by default.
  • Filter Queries:
    • Add Item: Add a document filter to your query. After entering a name, you can either add the item or cancel it.
    • Select Fields: Select the document fields you want to filter.
    • Delete Selected Fields: Delete the document fields you previously selected.
  • Results Settings:
    • Unique Key: The unique key that identifies the Solr document. For HCI internal indexes, this will generally be the HCI_id field.
    • Display Name: The display name of the Solr document. For HCI internal indexes, this will generally be the HCI_displayName field. If not specified, the Unique Key is used as the display name.
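As a rough illustration of how these settings relate to a Solr request, the following Python sketch parses a Solr Connection value and builds standard Solr query parameters (q, fl, fq, rows). The function names, the example host, and the batch size of 100 are hypothetical, and the mapping of Filter Queries to Solr fq parameters is an assumption rather than documented connector behavior.

```python
from urllib.parse import urlencode


def parse_solr_connection(value):
    """Split 'zkHost:zkPort[/zkPath]' into (host, port, path)."""
    hostport, sep, path = value.partition("/")
    host, _, port = hostport.rpartition(":")
    return host, int(port), (sep + path) if sep else ""


def build_query_params(query, fields="*", filter_queries=(), batch_size=100):
    """Encode the Query, Fields, Filter Queries, and Query batch size settings."""
    params = [("q", query), ("fl", fields), ("rows", batch_size)]
    params += [("fq", fq) for fq in filter_queries]  # one fq per filter
    return urlencode(params)


print(parse_solr_connection("zk1.example.com:2181/solr"))
print(build_query_params("*:*", "HCI_id,HCI_displayName", ["HCI_size:[0 TO 1024]"]))
```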

Configuration settings for an Internal Index Query connector

When configuring an Internal Index Query connector, you must specify the following:

  • Index Settings:
    • Index: Name of the Solr index.
    • Query batch size: Number of documents to be requested from Solr in a single request.
  • Query Settings:
    • Query: The search query to send to Solr.
    • Fields: Comma-separated list of fields to request for the query, or use * to include all fields returned by default.
  • Filter Queries:
    • Add Item: Add a document filter to your query. After entering a name, you can either add the item or cancel it.
    • Select Fields: Select the document fields you want to filter.
    • Delete Selected Fields: Delete the document fields you previously selected.

Crawling behavior for Solr Query connectors

When crawling, Solr Query connectors execute the configured queries against a Solr index and create an HCI document for each Solr query result. The HCI document maintains all fields from the query.

Note: Solr Query connectors do not support any Actions.
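The crawl described above, with one document per query result and a checkpoint that lets a paused workflow resume, can be sketched as follows. This is a simplified model under stated assumptions: the fetch_page callable is a hypothetical stand-in for a Solr request, and the checkpoint is a plain result offset, which may differ from how the connector actually records progress.

```python
def crawl(fetch_page, batch_size, checkpoint=0):
    """Yield (document, next_checkpoint) pairs, starting at a saved offset."""
    start = checkpoint
    while True:
        results = fetch_page(start, batch_size)
        if not results:
            return  # no more query results
        for offset, result in enumerate(results, start=1):
            # One document per query result, keeping every returned field.
            yield dict(result), start + offset
        start += len(results)


# In-memory stand-in for a Solr index with five results.
index = [{"HCI_id": str(i)} for i in range(5)]
fetch = lambda start, rows: index[start:start + rows]

saved = 0
for document, saved in crawl(fetch, batch_size=2):
    if saved == 3:
        break  # simulate pausing the workflow after three documents

resumed = [doc["HCI_id"] for doc, _ in crawl(fetch, 2, checkpoint=saved)]
print(resumed)  # ['3', '4']
```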

 
