Workflow tasks

After you've built a workflow, you can run a task for it. Running a task for a workflow causes the system to run a job, which in turn performs the work that the workflow specifies. As a task runs, the system does one or more of these:

  • Connects to each workflow input and reads data from it.
  • Converts that data into representations called documents.
  • Sends documents through the workflow pipeline.
  • Sends the documents produced by the workflow pipeline to each of the workflow's outputs.
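
Conceptually, the flow looks like the following minimal sketch. The function and variable names are illustrative placeholders, not product APIs.

    # Conceptual sketch of what a workflow task does. All names here are
    # illustrative placeholders, not product APIs.
    def to_document(raw_item):
        # Convert raw data read from an input into a document representation.
        return {"content": raw_item}

    def run_workflow_task(inputs, pipeline_stages, outputs):
        for data_source in inputs:                 # connect to each workflow input
            for raw_item in data_source:           # read data from it
                document = to_document(raw_item)   # convert the data into a document
                for stage in pipeline_stages:      # send it through the pipeline
                    document = stage(document)
                for output in outputs:             # send results to each output
                    output.append(document)

    # Example: one input, one stage that adds a field, one in-memory output.
    indexed = []
    run_workflow_task(
        inputs=[["file-1", "file-2"]],
        pipeline_stages=[lambda doc: {**doc, "size": len(doc["content"])}],
        outputs=[indexed],
    )
    print(indexed)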

You can run a workflow task continuously or schedule it to run only during the times you specify.

Note: For the system to be able to use an instance's resources to run workflow tasks, that instance must be configured to run Workflow-Agent jobs.

Running workflow tasks

After you've finished configuring your workflow, you can run a task for the workflow.

Note: By default, each workflow task is configured to use all available system CPU resources. This means that, by default, multiple tasks cannot run at the same time.
Task scheduling: You can configure whether a task runs indefinitely or only during the times you specify. For information, see Scheduling tasks.
Note: Starting a task manually overrides the task's schedule. That is, if you start the task yourself, even during a non-scheduled time period, the task starts running.
Checking for updates: You can configure whether the task should periodically check data sources for new and changed files. If not, the task reads all files in the data source one time, then stops.
Workflow Recursion: Use the Workflow Recursion setting to configure how a task handles documents created by your pipeline.
Stopping tasks: After you start a task running, the task might stop by itself. For more information, see Determining when a running task will stop. For information on stopping tasks manually, see Clearing workflow tasks and Pausing and resuming tasks.

Run a workflow task

To run a workflow task:

Procedure

  1. Click the Workflow Designer window.

  2. Select the workflow that you want.

  3. Click the Task window.

  4. Optionally, to configure the settings for the task, click Actions > Edit settings.

    To run the task continuously, enable the Check for Updates option.

  5. Do one of these:

    1. To start the task running immediately and not according to a schedule, click Actions > Run Workflow Task.

    2. To configure the task to run on a schedule:

      • Click Actions > Edit settings.
      • In the Schedule section, use the calendar tool to set when the task should run.
      • Click Update.

Results

The task starts running at the beginning of the next block of time that you scheduled.

Running multiple workflows at the same time

By default, multiple workflows cannot run at the same time on the same instances. This is because each workflow is configured to use all available system CPU resources on the instances where it runs.

To run multiple workflows simultaneously, do one of these:

  • Configure workflows to not run on the same instances.

    Each workflow is associated with a job, which is the system resource that actually performs the work for the workflow. Your system administrator can configure each workflow's job to run on a different instance or set of instances from another workflow's job.

  • Configure workflows to share instance resources.

    To run multiple workflows concurrently, reconfigure the CPU Core Maximum setting for each workflow task to limit the amount of CPU resources that the task can use. This allows two or more workflow tasks to be in the Running state at the same time. For example, say you have 16 total CPU cores across all instances in your system (a 4-instance system with 4-core CPUs). To share these cores equally across two tasks, set the CPU Core Maximum setting to 8 for both workflow tasks, as illustrated in the sketch after this list.

  • Configure workflows to run at different times on the same instances.

    You can interrupt one running task to run another by doing either of these:

    • Schedule tasks to run at different times.
    • Pause one task and then start a second. After the second task has completed, you can resume the first task.
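
Here is a minimal sketch of that sharing arithmetic. The instance count, cores per instance, and number of concurrent tasks are illustrative values, not read from your system.

    # Illustrative arithmetic only: how to split total CPU cores across
    # concurrent workflow tasks by setting CPU Core Maximum for each task.
    instances = 4            # example: a 4-instance system
    cores_per_instance = 4   # example: 4-core CPUs
    concurrent_tasks = 2     # how many tasks should run at the same time

    total_cores = instances * cores_per_instance          # 16
    cpu_core_maximum = total_cores // concurrent_tasks    # 8 per task

    print(f"Set CPU Core Maximum to {cpu_core_maximum} for each of the "
          f"{concurrent_tasks} tasks ({total_cores} cores total).")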

Configuring where workflows run

You can specify which system instances workflows are allowed to run on. You can do this for individual workflows or for all workflows by configuring where Workflow-Agent jobs are allowed to run.

To specify where workflows can run:

Procedure

  1. Specify which instances are able to run jobs at all. In the Admin App, you do this on the System Configuration > Jobs > All Jobs page.

  2. Of those instances, specify which ones are allowed to run Workflow-Agent jobs. You do this on the System Configuration > Jobs > Workflow-Agent page.

  3. Of those instances, specify which ones are allowed to run a particular job. You do this on the System Configuration > Jobs > job-name page.

    For more information, see Configuring where workflows run.

Pausing and resuming tasks

If a workflow task is in the Running or Idle state, you can pause it. While paused, a task does not read, process, or index documents.

At any time, you can resume a paused task. If the paused task is configured to run on a schedule, it will start automatically at the beginning of its next scheduled time period.

Note
  • Avoid frequently pausing and resuming a task. This might cause the system to unnecessarily reread files.
  • When you pause a workflow, the corresponding job changes to the Canceled state. Currently, there is no Paused state for jobs.

To pause and resume workflows:

Procedure

  1. Click the Workflow Designer window.

  2. Click the play or pause icon for the workflow that you want.

    Alternatively:

    1. Select the workflow that you want.

    2. Click the Task window.

    3. Click Actions.

    4. In the menu, click Pause Workflow Task or Resume Workflow Task, as applicable.

Task settings

This topic describes the settings you can configure for a workflow task. For information on:

  • Editing task settings, see Reconfiguring tasks.

  • Best practices for editing tasks, see Best practices for running workflow tasks.

Document Discovery settings

Setting Name | Description | Values | Tips | Notes and considerations
Check for Updates

Specifies whether the task should run one time or continuously.

Disabled: The task runs only one time. It does not discover any documents added to, changed in, or deleted from the data source after its initial scan. This is the default.

After the task finishes its scan of all documents in all workflow inputs, the task status changes to Completed.

Enabled: The task runs continuously. It periodically scans the workflow inputs for new, deleted, and changed documents. With this setting enabled, the task state never changes to Completed.

Use the Time Between Checks setting to specify the number of seconds that the task should wait from the end of the last check until it checks the data source again for updates. The default is 86400 (24 hours).

Disable this setting if the contents of your data sources never change.

Running multiple workflow tasks concurrently

By default, each workflow task is configured to use all available system CPU resources. This means that if you enable the Check for Updates setting for a workflow and run it, no other workflow tasks can run.

Checking multiple data connections

The behavior of the Time Between Checks setting differs for each data connection in the workflow. For example, say a workflow has two data connections. The task finished its last check of data connection A at 11 AM and its last check of data connection B at 12 PM. If you set Time Between Checks to 86400 (24 hours), the task will not reexamine data connection A until at least 11 AM the next day, and data connection B until at least 12 PM the next day.
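
The following sketch illustrates that per-connection timing. The connection names and check times come from the example above, the calendar dates are made up, and the scheduling logic is a simplified illustration rather than the product's internal implementation.

    # Simplified illustration of per-connection update checks: each data
    # connection is next checked Time Between Checks seconds after its own
    # last check finished.
    from datetime import datetime, timedelta

    time_between_checks = timedelta(seconds=86400)   # 24 hours (the default)
    last_check_finished = {
        "data connection A": datetime(2024, 1, 1, 11, 0),   # 11 AM (example date)
        "data connection B": datetime(2024, 1, 1, 12, 0),   # 12 PM (example date)
    }

    for name, finished_at in last_check_finished.items():
        next_check = finished_at + time_between_checks
        print(f"{name}: next update check no earlier than {next_check}")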

Running without a schedule

By default, a task's schedule is empty. This means that the task does not run automatically and must be run manually.

Running with a schedule

When you configure a schedule for a task, the task runs only during the specified periods of time. It starts automatically when a scheduled time period begins and stops when the time period ends.

Note: Starting a task manually overrides the task's schedule. If you start the task yourself, even during a non-scheduled time period, the task starts running.

Preprocessing Recursion

When enabled, new documents created by a stage are not sent to the next stage. Rather, they are sent to the beginning of the first pipeline in the workflow with the Preprocessing execution mode.

Preprocessing Recursion limit: Limits the number of times that documents extracted from archive files can be sent back to the first Preprocessing pipeline. This setting protects your system from infinite recursions (that is, an infinite number of archives within archives).

Enable this setting to eliminate unnecessary stages from your pipeline.

Note: A document output by a stage is considered new if its HCI_id field value does not match the HCI_id field value of the document that entered the stage.
Process All Documents

If Process All Documents is enabled, all documents (including folders and directories without content) are passed through the pipelines and outputs.

Important: Process All Documents picks up documents only after it has been enabled. Additionally, if it is enabled while a list-based workflow is paused, only newly created directories are processed as documents when the workflow is resumed.
Yes: The setting is enabled and all documents will be processed.

No: Documents will be processed according to your regular workflow settings.

This task setting is disabled by default.
  • This setting will not affect the Delete Empty Parent Directories feature on the file system connectors.
  • Delete actions will be skipped on all documents without content.

Retry Failed Documents

If Check for Updates is enabled, specifies how the task should handle failed documents when reexamining a list-based data connection for changed documents.

Disabled: The task retries failed documents only if their contents have changed.

Enabled: The task retries failed documents even if their contents have not changed.

When enabled, each document retried counts towards the values displayed on the task's Metrics page.

Enable this setting only when you expect your workflow to encounter temporary failures, such as connection issues, that can be resolved by trying failed documents again.

Some failures cannot be resolved simply by retrying a document. With Retry Failed Documents enabled, the task wastes resources continually retrying such failures.

This setting affects only list-based data connections, such as the built-in HCP data connection.

A change-based data connection does not revisit any document unless it detects that the document's contents have changed. This is true even when a document fails to be processed by a workflow.

Workflow Agent Recursion

When enabled, new documents created by a stage are not sent to the next stage. Rather, they are sent to the beginning of the first pipeline in the workflow with the Workflow-Agent execution mode.

Workflow Agent Recursion limit: Limits the number of times that documents extracted from archive files can be sent back to the first Workflow-Agent pipeline. This setting protects your system from infinite recursions (that is, an infinite number of archives within archives).

Enable this setting to eliminate unnecessary stages from your pipeline.

Note: A document output by a stage is considered new if its HCI_id field value does not match the HCI_id field value of the document that entered the stage.
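
To make the recursion rule concrete, here is a minimal sketch of the logic described in the notes above, assuming documents are represented as simple dicts with an HCI_id field. The routing helper and the pipeline names it returns are illustrative, not the product's internal code.

    # Illustrative sketch of the recursion rule for Preprocessing and
    # Workflow-Agent recursion.
    def is_new_document(input_doc, output_doc):
        """A stage output counts as new if its HCI_id differs from the input's."""
        return output_doc.get("HCI_id") != input_doc.get("HCI_id")

    def route(output_doc, input_doc, recursion_depth, recursion_limit):
        """Hypothetical routing helper: where does a stage output go next?"""
        if is_new_document(input_doc, output_doc) and recursion_depth < recursion_limit:
            return "first recursion pipeline"   # e.g., the first Preprocessing pipeline
        return "next stage"

    # Example: a document extracted from an archive gets a new HCI_id,
    # so it is sent back to the first recursion pipeline.
    parent = {"HCI_id": "archive-001"}
    child = {"HCI_id": "archive-001/entry-1"}
    print(route(child, parent, recursion_depth=0, recursion_limit=5))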

Performance settings

By default, the Performance Settings field for a workflow task is set to Default. With this setting, the task is automatically configured with default performance values appropriate for running a single workflow at a time on both single-instance and multi-instance systems.

To manually configure performance settings for a task, change this setting to Custom.

Important
  • Configuring task performance settings incorrectly can harm task performance or cause the task to experience out-of-memory errors.
  • Before trying to change task performance settings, you should evaluate your stages and pipelines to ensure that your task is not performing unnecessary work.
  • For information on changing task settings to allow you to run multiple workflow tasks concurrently, see Running multiple workflows at the same time.

The performance settings for each task let you configure these aspects of how a task runs:

  • CPU resource usage for the task.
  • The number of documents that the task works on at a time.
  • The number of subdivisions (called jobs and partitions) to split a task into for parallel processing.

Setting Name | Description | Values | Tips | Notes and considerations
Parallel Jobs

Specifies the number of subdivisions to split tasks into. These subdivisions are called jobs.

Positive integers, not including zero.

The default value is two.

Specifying a value of two or more might improve performance by allowing task work to be done in parallel.

However, setting this value too high might harm task performance as the overhead from creating multiple jobs might offset the time savings from running task work in parallel.

Reported extra cores

Allows certain internal components (the ones that perform task work) to claim that they have more CPU cores allocated to them than they really do.

You can specify the number of additional cores for each component to advertise.

Positive integers, including zero.

The default is four.

If a workflow task takes a long time to run, check the CPU utilization on your worker instances. If utilization is low, try increasing the Reported extra cores setting so the task uses more CPU resources.

Increasing this value might improve task performance because it allows certain internal components to be assigned more work than they otherwise would be.

However, increasing this value too high might slow task performance.

Partitions Per Job

To allow task work to be processed concurrently, Hitachi Content Intelligence internally breaks tasks down into smaller divisions called jobs. Each job is then further broken down into divisions called partitions.

This setting specifies the number of partitions across the entire system used to process the task.

Positive integers or -1.

By default, the value is -1, which is interpreted as 32 times the number of instances on which this workflow job is configured to run, up to a maximum of 128.

To tune this setting:

  1. Specify a value 32 times the number of instances on which the workflow is configured to run. Then run the task.

  2. Specify a value 1.5 times the value you previously entered. Then run the task again.

  3. Continue until task performance stops improving.

If you edit this setting, at a minimum, specify a number greater than or equal to the number of CPU cores across all instances in the system.
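
As a reading aid, the default value and the tuning progression above can be expressed as simple arithmetic. The instance counts below are examples only.

    # Default Partitions Per Job when the value is -1:
    # 32 times the number of instances the workflow job runs on, capped at 128.
    def default_partitions(num_instances):
        return min(32 * num_instances, 128)

    print(default_partitions(2))   # 64
    print(default_partitions(8))   # 128 (capped)

    # Tuning progression from the tips: start at 32 x instances, then
    # multiply by 1.5 until task performance stops improving.
    value = default_partitions(2)
    for _ in range(3):
        print(value)
        value = int(value * 1.5)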

Processing Batch Size

The maximum number of documents to read into memory before batching for parallel processing.

A batch is created after either of these happens:

The target number of documents specified by this setting has been read into memory.

The amount of time specified by the Processing Batch Size Timeout setting elapses.

Positive integers.

The default is 10,000.

Typically, a higher value for this setting is ideal. As a task runs, it needs to spend time setting up and tearing down each job. Because of this, it's more efficient for each job to process as many documents as possible.

However, a lower value might be better if your input data connections are slow to read data from. In that case, the task might spend more time waiting for additional documents to be read when it could instead be processing the documents it has already read.

Processing Batch Size Timeout

The maximum time to wait, in seconds, for documents to be read into memory before batching for parallel processing.

A batch is created after either of these happens:

The amount of time specified by this setting elapses.

The target number of documents specified by the Processing Batch Size setting has been read into memory.

Positive integers.

The default value is 60.

Specify 0 to use no timeout.

If your data connection reads documents in real time, specify a low value for this setting.

Setting this value too low might make document batches too small, thereby reducing task performance.
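
The two batching settings work together: a batch is formed when either the size target or the timeout is reached, whichever comes first. The following sketch shows that trigger logic in simplified form; the document reader is a stand-in, and this is not the product's internal crawler code.

    # Simplified batching loop: emit a batch when either the size target
    # or the timeout is reached, whichever happens first.
    import time

    def batches(read_next_document, batch_size=10_000, timeout_seconds=60):
        """Yield batches of documents. read_next_document() is a stand-in that
        returns the next document, or None if nothing is available yet.
        A timeout of 0 means no timeout, as described above."""
        batch, started = [], time.monotonic()
        while True:
            doc = read_next_document()
            if doc is not None:
                batch.append(doc)
            else:
                time.sleep(0.01)   # avoid spinning while waiting for input
            timed_out = timeout_seconds > 0 and (time.monotonic() - started) >= timeout_seconds
            if batch and (len(batch) >= batch_size or timed_out):
                yield batch
                batch, started = [], time.monotonic()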

CPU Core Maximum

Limits the maximum number of CPU cores that each job in the task can use.

Positive integers.

The default value is -1, which means that the task uses all available CPU cores.

Use this setting to run multiple concurrent tasks:

To run two tasks with equal priority, specify the same CPU Core Maximum value for each task.

To prioritize one task over another, assign a higher CPU Core Maximum value to the one you want the system to devote more resources to.

Because the minimum value you can specify for this setting is 1, the maximum number of tasks you can run concurrently is equal to the number of CPU cores across all instances in the system.

Output Batch Size

The number of documents to send at a time to the workflow outputs.

Positive integers.

The default is 100.

Specifying a value of 1 disables batching.

Increasing this setting means more documents are sent per request, thereby lowering the number of output requests that the task needs to make. This can increase workflow output performance at the cost of higher memory utilization.

To get a performance benefit, a workflow's outputs must support batching.

Doc Process Time Limit

The maximum amount of time (in seconds) that can pass while processing a single document through a stage before the task reports that document as taking a long time to process.

The default time limit is 5 minutes (300 seconds).

You can adjust this value to help identify stuck or stalled documents and provide increased visibility into your workflow status. To learn more, see Status messages.

Metrics settings

Setting Name | Description | Performance Impact

Collect Aggregation Metrics

When enabled, specifies that the task should collect certain information from the documents it processes. The information collected depends on the aggregations configured for the workflow. For information, see Aggregations.

This setting does not affect the collection of other task metrics, such as data about document failures or stage performance. For information on viewing task discovery metrics, see Metrics settings.

This setting is enabled by default.

Collecting aggregation metrics can consume a significant amount of memory in your system. This can decrease task performance and might eventually cause your workflow tasks to halt.

Collect Historical Metrics

When enabled, the task retains certain metrics for up to 30 days. The information is displayed in several graphs on the task Metrics page.

When disabled, the task retains only the most recent set of metrics.

For information on the metrics that are collected, see Task details, status, and results.

Collecting historical metrics can consume a significant amount of memory in your system. This can decrease task performance and might eventually cause your workflow tasks to halt.

Consider disabling this option when running your workflow in production.

Error Handling settings

Setting Name | Description | Values | Tips | Notes and considerations
Bypass reporting of successful documents

If enabled, successfully processed documents previously recorded as having failed in a workflow are not reported again. Bypassing these additional reports increases your system's overall performance while maintaining the option for you to process the failures separately.

Yes: The setting is enabled and successfully processed documents that previously failed are not reported again.

No: Documents will be processed according to your regular workflow settings.

To manually process your document failures, choose your workflow and select Failures > Retry Document Failures.

Continue processing failed documents

When enabled, documents that fail during processing continue through the workflow. The error messages for these documents are added to the HCI_failure field.

Yes: The setting is enabled and failed documents continue to be processed.

No: Documents will be processed according to your regular workflow settings.

Halt task after set amount of failures

When enabled, specifies that the task should stop when it encounters the number of document failures that you specify.

Yes: The setting is enabled and the task stops after the set number of document failures.

No: Documents will be processed according to your regular workflow settings.

A large number of errors can indicate that something is wrong with your pipeline. Enabling this setting lets you avoid wasting time and computational resources on processing additional documents with a faulty pipeline.

  • When a task revisits a failed document and that document fails again, the second failure also counts toward this limit.
  • When a task pauses or goes idle and then resumes, the number of failures that count toward this limit is reset.

Memory settings

  • Driver Heap Limit: The amount of memory to allocate to the task component that reads files from data sources, in either megabytes or gigabytes. Valid values have this format:
    <number-of-bytes>[m|g]

    The default is 1024m.

    Increase this setting if your task experiences Crawler-type OutOfMemory errors.

  • Executor Heap Limit: The amount of memory to allocate to the task component that performs the work of processing pipelines, in either megabytes or gigabytes. Valid values have this format:
    <number-of-bytes>[m|g]

    The default is 1024m.

    Increase this setting if your task experiences Stage-type OutOfMemory errors.
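
As an illustration of the heap-limit value format above, this minimal sketch validates example values with a regular expression. The pattern reflects only the format shown here and is not part of the product.

    # Heap limit values are a number followed by m (megabytes) or g (gigabytes).
    import re

    HEAP_LIMIT = re.compile(r"^\d+[mg]$")

    for value in ("1024m", "2048m", "2g", "1024"):
        print(value, "valid" if HEAP_LIMIT.match(value) else "invalid")
    # 1024m, 2048m, and 2g are valid; "1024" is missing its unit suffix.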

Reconfiguring tasks

You can change the settings and schedule for a task at any time, even while the task is running.

When you reconfigure a task:

  • If the task is currently running, you need to either restart or pause and resume the task for it to begin using the new settings.
  • If the task is currently idle, the task will use the new settings the next time it starts.

To reconfigure a task for a workflow:

Procedure

  1. Click the Workflow Designer window.

  2. Select the workflow that you want.

  3. Click the Task window.

  4. Click Actions > Edit settings.

  5. In the Schedule section, use the calendar tool to set when the task should run.

  6. Configure the other settings for the task.

  7. Click Update.

Scheduling tasks

You can configure a schedule for each workflow task. You can use this, for example, to ensure that a task does not consume system resources during business hours.

Running without a schedule

By default, a task's schedule is empty. This means that the task does not run automatically and must be run manually.

Running with a schedule

When you configure a schedule for a task, the task runs only during the specified periods of time. It starts automatically when a scheduled time period begins and stops when the time period ends.

Note: Starting a task manually overrides the task's schedule. If you start the task yourself, even during a non-scheduled time period, the task starts running.
Workflow-Agent job type schedules

When you configure a schedule for a workflow task, identical changes are made to the schedule for the corresponding Workflow-Agent job.

Note
  • When you first configure a task schedule, do not use the Run option to start the task. It starts automatically at the beginning of the next scheduled block of time.
  • When configuring a task schedule, you specify times in your local time zone. The task scheduler does not account for daylight savings time.
How to schedule a task

When editing a workflow task in the Admin App, you use the calendar tool to configure the task's schedule.


Procedure

  1. In this tool, select a day to add a block of time. Click and drag the block to cover the hours you want the task to run.

  2. To remove a block, right-click it.

Results

After you edit the schedule for a workflow task, the workflow is marked with the scheduled icon.

Determining when a running task will stop

When a task is in the Running state, the task's schedule and Check for Updates setting determine when, or if, the task stops automatically.

Scheduled: Yes; Check for Updates: Enabled

The task enters the Running state at the beginning of a scheduled time block. At the end of that block, the task stops running and enters the Idle state.

The task resumes at the beginning of the next scheduled block of time.

Because the Check for Updates setting is enabled, the task never reaches the Completed state.

Scheduled: No; Check for Updates: Enabled

The task runs endlessly and remains in the Running state.

Because the Check for Updates setting is enabled, the task never reaches the Completed state.

Scheduled: Yes; Check for Updates: Disabled

The task runs until it either:

  • Reaches the end of a scheduled block of time without reading all documents in the data source. At this point, it stops running and enters the Idle state. The task resumes at the beginning of the next scheduled block of time.
  • Finishes reading all documents in the data source. The task then stops with a status of Completed and never resumes.

Scheduled: No; Check for Updates: Disabled

The task runs until it reads all files in the workflow inputs. The task then stops with a status of Completed.
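
The combinations above can be summarized as a small decision function. The sketch below is only a reading aid that restates the behavior described here.

    # Reading aid: when does a task in the Running state stop?
    def when_running_task_stops(scheduled, check_for_updates):
        if check_for_updates:
            if scheduled:
                return ("Stops (Idle) at the end of each scheduled block and "
                        "resumes at the next block; never reaches Completed.")
            return "Runs endlessly in the Running state; never reaches Completed."
        if scheduled:
            return ("Stops (Idle) at the end of a scheduled block if documents remain, "
                    "resuming at the next block; reaches Completed once all documents "
                    "in the data source have been read.")
        return "Runs until all files in the workflow inputs are read, then Completed."

    print(when_running_task_stops(scheduled=True, check_for_updates=False))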

Task details, status, and results

After a workflow task has started running, you can view or retrieve information about how the task is performing.

The following sections describe the details available for a workflow task.

Task Status

  • Running: The task is currently doing work.
  • Idle: The task is not currently doing work for one of these reasons:
    • The task is configured to run on a schedule and is currently off hours.
    • The task has never been run.
    • The task was cleared but has not yet been restarted.
  • Paused: The task is not currently doing work because you paused it. The task will not start again until either:
    • You resume it.
    • Its next scheduled run time, if the task is configured to run on a schedule.
  • Halted: The task encountered an error that prevents it from running.
  • Completed: The task has processed all documents from all workflow inputs. A task in the Completed state never resumes, even when configured to run on schedule.

A task can never reach this state if the Check for Updates setting is enabled.

Status messages

While running, the task displays messages about the work it's currently performing and the work that it has completed.


Possible task status messages include:

  • Collected a new batch of <batch-size> documents: This message indicates that the task has collected a new batch of documents from the data connections. The timestamp indicates when the collection last occurred.
  • Processing N batches in parallel: This message shows the number of active jobs being processed in parallel. This is determined by the Parallel Jobs setting for the task. Each job consumes a single batch.
  • Processing has completed for X of Y batches: This message appears when at least one job has completed its work and is awaiting a checkpoint operation. This message replaces the previous message when the completed count is greater than 1.
  • Job-<job-number>: <Running|Completed> (<duration>): N documents: For an active job, this message shows the job's identifying number, status, duration, and number of documents processed.
  • Last successful checkpoint (<duration>): Shows the completion time for the last successful checkpoint operation and how long that operation took to complete. A long duration can indicate performance problems or service issues.
  • Checkpoint in progress (<duration>): A checkpoint operation is currently active. The message shows the amount of time that the operation has been active. A long duration can indicate performance problems or service issues.
  • <Current Date & Time> - Document: <document_id> has been processing for <long_value> seconds. This is above the threshold configured for this workflow.: Alerts users to the presence of stuck or stalled documents and gives increased visibility into workflow progress for files taking longer periods of time to process. The default time is 5 minutes and can be manually adjusted under Edit Settings > Performance > Custom. To learn more, see Performance settings.

Status icons

Icon | Description
Scheduled icon: The task runs according to a schedule.
Document failures icon: One or more documents failed to be processed. Click this icon to view the list of document failures.
Task error icon: One of these:
  • The task has encountered an error and cannot continue running.
  • The task has reached the specified number of document failures for the Error Handling setting.

Click this icon to view the list of task errors.

Failures

Summarizes information about documents that failed to be processed or errors that the task encountered.

This section shows only current failures and errors. When a task runs again or revisits document failures, any cleared errors or failures are removed from the list.

Document Failures

The Document Failures tab shows documents that failed to be processed by a stage or added to an index. For each failure, the tab shows:

  • Date: The date and time that the failure occurred.
  • Category: One of these:
    • Stage: The document failed to be processed by a stage in the pipeline.
    • Index: The document failed to be indexed by the workflow output.
    • Crawler: The document failed to be retrieved by the data connection.
  • Failure: A short description of the failure. Click the expand icon to view the complete error message text reported by the applicable plugin.

Select an individual failure to view:

  • Document ID/URI: The URI or ID of the document that caused the error.
  • Message: A short description of the failure.
  • Details: The stack trace for the failure. Use this to determine which component produced the error.
Task Errors

The Task Errors tab shows errors encountered by the task itself. These errors don't apply to individual documents.

For each error, the tab shows:

  • Date: The date and time that the error occurred.
  • Category: Workflow
  • Error: A short description of the failure. Click the expand icon to view the complete error message text reported by the applicable plugin.

Metrics

Summarizes information about the documents processed within the last 30 days.

Summary information
  • Average DPS: Average number of documents processed per second.
  • Input: The number of documents that have entered the pipeline.
  • Output: The number of documents that exited the pipeline.
  • Expanded: The number of documents added by expanding archive files. This number does not include the archive files themselves.
  • Dropped: Typically, the number of documents removed from the pipeline by a Drop Documents stage.

    This metric is also incremented if any of your own custom stages exhibit this behavior: when a document enters the stage, the stage produces zero output documents and zero errors.

  • Failed: The number of documents that failed to be processed.
Historical metrics graphs

The historical metrics graphs for a workflow task are line graphs that show changes in some task metric over time.


You can use the focusing tool at the bottom of the graph to zoom in on a particular section of data.


The Collect Historical Metrics setting must be enabled for these graphs to display information.

Tip: You can improve task performance by disabling the Collect Historical Metrics setting.
  • Documents Per Second graph: Shows changes over a period of time in both the average and actual numbers of documents processed per second.
  • Task Metrics graph: Shows changes over a period of time in the number of documents input, output, expanded, dropped, and failed.
  • Document Updates graph: Shows changes over a period of time in the numbers of these values:
    • Update requests: The number of documents entering the workflow pipeline that include the HCI_operation:CREATED field/value pair.
    • Updates performed: The number of documents exiting the workflow pipeline that include the HCI_operation:CREATED field/value pair.
  • Document Deletes graph: Shows changes over a period of time in the numbers of these values:
    • Delete requests: The number of documents entering the workflow pipeline that include the HCI_operation:DELETED field/value pair.
    • Deletes performed: The number of documents exiting the workflow pipeline that include the HCI_operation:DELETED field/value pair.
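
These counts are derived from the HCI_operation field values on documents entering and exiting the pipeline. The sketch below shows how such a tally could be computed, assuming documents are represented as simple dicts; it is an illustration, not the product's metrics code.

    # Illustrative tally of HCI_operation values on documents entering the pipeline.
    from collections import Counter

    entering = [
        {"HCI_id": "doc-1", "HCI_operation": "CREATED"},
        {"HCI_id": "doc-2", "HCI_operation": "CREATED"},
        {"HCI_id": "doc-3", "HCI_operation": "DELETED"},
    ]

    counts = Counter(doc.get("HCI_operation") for doc in entering)
    print("Update requests:", counts["CREATED"])   # documents entering with CREATED
    print("Delete requests:", counts["DELETED"])   # documents entering with DELETED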

Performance

Summarizes information about how quickly documents were processed by the pipeline and its individual stages.

Section | Description

On the Performance window

The Performance window shows:

  • Runtime: The amount of real-world time since the task started running.
  • Total CPU Processing Time: The total amount of computing time spent processing the task, across all instances.
  • Least Active Stage: The stage that spent the least amount of time processing documents during the task run.
  • Most Active Stage: The stage that spent the most time processing documents during the task run.

Performance window > Overview tab

The Overview tab shows:

  • Slowest Stage (average): The stage with the slowest average document processing time.
  • Fastest Stage (average): The stage with the fastest average document processing time.
  • Slowest Individual Stage: Shows:
    • The stage that took the longest amount of time to process a single document.
    • The amount of time it took to process that document.
  • Fastest Individual Stage: Shows:
    • The stage that took the shortest amount of time to process a single document.
    • The amount of time it took to process that document.
  • Smallest Deviation: The stage with the smallest standard deviation in document processing times.
  • Largest Deviation: The stage with the largest standard deviation in document processing time.

Performance window > Stage Metrics tab

Contains the Visual View and Table View subtabs. These subtabs show, respectively, bar graphs and tabular views of performance for all stages in the workflow pipeline.

For each stage, both subtabs show:

  • The minimum, average, and maximum per-document processing times
  • The number of times that the stage ran during the pipeline run

    On the Visual View tab, you can enable or disable the Include Times Run option to configure whether the bars in the graph reflect the number of times that the stage ran. That is, when you enable this option, all processing times are multiplied by the number of times that the stage ran.

    Tip: Leave this option enabled to get a more accurate visual representation of where your pipeline spends the most time processing documents.

The Table View subtab also shows each stage's standard deviation for processing times.

Tip: Use the information on this window to identify document processing bottlenecks in your pipeline.

Aggregations task details

Lists the aggregations that the workflow has. Click an aggregation to view the information collected for that aggregation.

Triggers

Lists the triggers that the workflow has and also lets you add triggers to the workflow.

Viewing workflow task details

To view your workflow task details:

Procedure

  1. Click the Workflow Designer window.

  2. Select the workflow that you want.

  3. Click the Task window.

Results

The task details appear.

Task status icons

This table describes the status icons that can be displayed on the Task page for a workflow.

Icon | Description
Scheduled icon: The task runs according to a schedule.
Document failures icon: One or more documents failed to be processed. Click this icon to view the list of document failures.
Task error icon: One of these:
  • The task has encountered an error and cannot continue running.
  • The task has reached the specified number of document failures for the Error Handling setting.

Click this icon to view the list of task errors.

Working with fields discovered by a workflow task

You can take the list of fields discovered by a workflow task and import some or all of those fields into an index collection. For information, see Importing fields from a workflow into an index collection.

Clearing workflow tasks

At any time, you can clear all data from a workflow task. Doing this deletes all result details from the workflow, including performance metrics, historical metrics graphs, and the values for all aggregations associated with the workflow. The workflow keeps its task settings, including its schedule.

Effects on indexes

Clearing a workflow task does not delete any search indexes associated with the workflow. If you start the task running again, it updates the index; that is, any documents that were previously indexed are reindexed with any changes that you've made.

Starting a task over

Clearing task data has the same effect as starting the task over, except that when you start a task over:

  • The task begins running again
  • Historical metrics graphs are kept
To clear a workflow task:

Procedure

  1. Click the Workflow Designer window.

  2. Select the workflow that you want.

  3. Click the Task window.

  4. Click Actions.

  5. In the menu, click Clear Workflow Task.

Starting a task over

At any time, you can restart a workflow task from the beginning. When you do this, some data is deleted from the workflow, including:

  • Performance metrics, such as the Input or Average DPS values.
  • The values for all aggregations.

The workflow retains:

  • Any historical metrics graph data, such as the Documents Per Second graph.
  • All task settings, including its schedule.

Effects on indexes

Starting a workflow task over does not delete any search indexes associated with the workflow. When the task starts running again, it updates the indexes. The documents that were previously indexed are reindexed with any changes that you've made.

Clearing a workflow task

Starting a task over has the same effect as clearing the task, except that when you clear a task:
  • The task does not begin running.
  • Historical metrics graphs are deleted.
To start a task over:

Procedure

  1. Click the Workflow Designer window.

  2. Select the workflow that you want.

  3. Click the Task window.

  4. Click Actions.

  5. In the menu, click Start over.

Downloading task reports

You can download workflow task details as a report file. You can download the complete report or only specific categories.

When you download a report, it contains details for the point in time at which you downloaded it. You cannot download reports for past task details.

Task reports are formatted in JavaScript Object Notation (JSON).

Task report categories
  • Metrics: Includes overall task information such as number of documents processed, number of failures encountered, and total task run time.
  • Failures: Includes information about documents that either failed to be indexed or failed to be processed by a stage in the workflow pipeline. For each failure, the report contains the complete error message reported by the applicable stage or index plugin.

    You can download information for up to 10,000 failures in a single report.

  • Aggregations: Includes values for each aggregation in the workflow.
  • Performance: Includes the minimum, maximum, and average document processing times for each stage in the workflow pipeline.
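
Because reports are plain JSON, you can inspect a downloaded report with any JSON tooling. The sketch below loads a report file and lists its top-level sections; the file name is hypothetical, and the exact structure of the report depends on the categories you selected.

    # Load a downloaded task report (JSON) and list its top-level sections.
    import json

    with open("workflow-task-report.json") as f:   # hypothetical file name
        report = json.load(f)

    for section, content in report.items():
        size = len(content) if isinstance(content, (list, dict)) else 1
        print(f"{section}: {size} item(s)")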

To download task reports:

Procedure

  1. Click the Workflow Designer window.

  2. Select the workflow that you want.

  3. Click the Task window.

  4. Click Actions.

  5. In the menu, click Download report.

  6. In the window that appears, select the parts of the report you want.

  7. If you are downloading information about failures, use the Failure Start Offset and Maximum Failures to Download fields to specify the subset of failures that you want.

    For example, to download information about the second hundred failures, specify:

    • Failure Start Offset: 101
    • Maximum Failures to Download: 200
  8. Click Download Report.

Testing individual document failures

If a stage fails to process a document during a workflow task, you can learn more about the failure by testing a pipeline using the failed document.

TipIf you encounter a large number of document failures for a task, you can have the task retry only the failed documents.

To test individual document failures:

Procedure

  1. Click the Workflow Designer window.

  2. Select the workflow that you want.

  3. Click the Task window.

  4. Click the Failures window.

  5. On the Document Failures tab, select a failed document.

  6. Click Test Document.

  7. From the list, select one of these:

    • To test an individual pipeline, Pipeline: <pipeline-name>
    • To test the entire workflow pipeline, Workflow: <workflow-pipeline>

    The pipeline test page opens and begins testing the pipeline.

    If the test fails for a stage, the stage is marked with a failure icon.

  8. Click the View Results link for the stage to view the error message reported by the stage.

Retrying task failures

The Failures section of the task details view for a workflow shows information about all documents that the task failed to process.

At any point, you can have a task retry these failures. When you do this, your system starts a new, separate task just for processing these failures.

Note
  • Some document failures are caused by transient problems like connection issues, and might be resolved by being retried. Other document failures may not be resolved by being retried.
  • You can also configure a task to automatically retry document failures for certain types of data connections.
  • To learn more about why an individual document failed to be processed, you can test your workflow pipeline using only that document.

To retry task failures:

Procedure

  1. Click the Workflow Designer window.

  2. Select the workflow that you want.

  3. Click the Task window.

  4. Click Failures.

  5. On the Document Failures tab, click Retry Document Failures.

  6. A Retry Failures Task window appears. This window displays the progress of the task. From the Actions menu for the task, you can:

    • Pause or resume the task.
    • Clear the task.
    • Start the task over.

 
