Workflow tasks
After you've built a workflow, you can run a task for it. Running a task for a workflow causes the system to run a job, which in turn performs the work that the workflow specifies. As a task runs, the system does one or more of these:
- Connects to each workflow input and reads data from it.
- Converts that data into representations called documents.
- Sends documents through the workflow pipeline.
- Sends the documents produced by the workflow pipeline to each of the workflow's outputs.
You can run a workflow task continuously or schedule it to run only during the times you specify.
Running workflow tasks
After you've finished configuring your workflow, you can run a task for the workflow.
Run a workflow task
Procedure
Click the Workflow Designer window.
Select the workflow that you want.
Click the Task window.
Optionally, to configure the settings for the task, click Actions > Edit settings.
To run the task continuously, enable the Check for Updates option.
Do one of these:
To start the task running immediately and not according to a schedule, click Actions > Run Workflow Task.
To configure the task to run on a schedule:
- Click Actions > Edit settings.
- In the Schedule section, use the calendar tool to set when the task should run.
- Click Update.
Related CLI commands
runWorkflow
Related REST API methods
POST /workflows/{uuid}/task
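For illustration, here is a minimal sketch of starting a workflow task through the REST API with Python's requests library. Only the POST /workflows/{uuid}/task path comes from this documentation; the host, port, path prefix, and bearer-token authentication are placeholders to adapt to your system.

```python
import requests

BASE_URL = "https://hci.example.com:8000/api/workflow"  # placeholder host and path prefix
TOKEN = "<bearer-token>"  # placeholder; obtain a token through your system's auth mechanism
WORKFLOW_UUID = "<workflow-uuid>"

# Start the task for the workflow (POST /workflows/{uuid}/task).
response = requests.post(
    f"{BASE_URL}/workflows/{WORKFLOW_UUID}/task",
    headers={"Authorization": f"Bearer {TOKEN}"},
    verify=False,  # only for systems with self-signed certificates; prefer a CA bundle
)
response.raise_for_status()
print("Task started:", response.status_code)
```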
Running multiple workflows at the same time
By default, multiple workflows cannot run at the same time on the same instances. This is because each workflow is configured to use all available system CPU resources on the instances where it runs.
To run multiple workflows simultaneously, do one of these:
- Configure workflows to not run on the same instances.
Each workflow is associated with a job, which is the system resource that actually performs the work for the workflow. Your system administrator can configure each workflow's job to run on a different instance or set of instances from another workflow's job.
- Configure workflows to share instance resources.
To run multiple workflows concurrently, reconfigure the CPU Core Maximum setting for each workflow task to limit the amount of CPU resources that the task can use. This allows two or more workflow tasks to be in the Running state at the same time. For example, say you have 16 total CPU cores across all instances in your system (4-instance system with 4-core CPUs). To share these cores equally across two tasks, set the CPU Core Maximum setting to 8 for both workflow tasks.
- Configure workflows to run at different times on the same instances.
You can interrupt one running task to run another by doing either of these:
- Schedule tasks to run at different times.
- Pause one task and then start a second. After the second task has completed, you can resume the first task.
Configuring where workflows run
You can specify which system instances workflows are allowed to run on. You can do this for individual workflows or for all workflows by configuring where Workflow-Agent jobs are allowed to run.
To specify where workflows can run:
Procedure
Specify which instances are allowed to run jobs at all. In the Admin App, you do this on the System Configuration > Jobs > All Jobs page.
Of those instances, specify which ones are allowed to run Workflow-Agent jobs. You do this on the System Configuration > Jobs > Workflow-Agent page.
Of those instances, specify which ones are allowed to run a particular job. You do this on the System Configuration > Jobs > job-name page.
Pausing and resuming tasks
If a workflow task is in the Running or Idle state, you can pause it. While paused, a task does not read, process, or index documents.
At any time, you can resume a paused task. If the paused task is configured to run on a schedule, it will start automatically at the beginning of its next scheduled time period.
- Avoid frequently pausing and resuming a task. This might cause the system to unnecessarily reread files.
- When you pause a workflow, the corresponding job changes to the Canceled state. Currently, there is no Paused state for jobs.
Procedure
Click the Workflow Designer window.
Click the play or pause icon for the workflow that you want.
Alternatively:
Select the workflow that you want.
Click the Task window.
Click Actions.
In the menu, click Pause Workflow Task or Resume Workflow Task, as applicable.
Related CLI commands
runWorkflow
pauseWorkflow
Related REST API methods
POST /workflows/{uuid}/task/pause
POST /workflows/{uuid}/task
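As a sketch, pausing and resuming can also be done through the REST methods listed above. The host and authentication below are placeholders; only the two POST paths come from this documentation.

```python
import requests

BASE_URL = "https://hci.example.com:8000/api/workflow"  # placeholder
HEADERS = {"Authorization": "Bearer <token>"}  # placeholder auth
uuid = "<workflow-uuid>"

# Pause the running task (POST /workflows/{uuid}/task/pause).
requests.post(f"{BASE_URL}/workflows/{uuid}/task/pause", headers=HEADERS).raise_for_status()

# Later, resume by starting the task again (POST /workflows/{uuid}/task).
requests.post(f"{BASE_URL}/workflows/{uuid}/task", headers=HEADERS).raise_for_status()
```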
Task settings
This topic describes the settings you can configure for a workflow task. For information on:
- Editing task settings, see Reconfiguring tasks.
- Best practices for editing tasks, see Best practices for running workflow tasks.
Document Discovery settings
Setting Name | Description | Values | Tips | Notes and considerations |
Check for Updates | Specifies whether the task should run one time or continuously. | Disabled (the default): The task runs only one time. It does not discover any documents added to, changed in, or deleted from the data source after its initial scan. After the task finishes its scan of all documents in all workflow inputs, the task status changes to Completed. Enabled: The task runs continuously, periodically scanning the workflow inputs for new, deleted, and changed documents; with this setting enabled, the task state never changes to Completed. Use the Time Between Checks setting to specify the number of seconds that the task should wait from the end of one check until it checks the data source again for updates. The default is 86400 (24 hours). | Disable this setting if the contents of your data sources never change. | Running multiple workflow tasks concurrently: By default, each workflow task is configured to use all available system CPU resources. This means that if you enable the Check for Updates setting for a workflow and run it, no other workflow tasks can run. Checking multiple data connections: The Time Between Checks setting applies separately to each data connection in the workflow. For example, say a workflow has two data connections, and the task finished its last check of data connection A at 11 AM and its last check of data connection B at 12 PM. If you set Time Between Checks to 86400 (24 hours), the task will not reexamine data connection A until at least 11 AM the next day or data connection B until at least 12 PM the next day. Schedules: For how this setting interacts with task schedules, see Scheduling tasks. Note: Starting a task manually overrides the task's schedule; if you start the task yourself, even during a non-scheduled time period, the task starts running. |
Preprocessing Recursion | When enabled, new documents created by a stage are not sent to the next stage. Instead, they are sent to the beginning of the first pipeline in the workflow that has the Preprocessing execution mode. | Preprocessing Recursion Limit: Limits the number of times that documents extracted from archive files can be sent back to the first Preprocessing pipeline. This setting protects your system from infinite recursion (that is, an endless nesting of archives within archives). | Enable this setting to eliminate unnecessary stages from your pipeline. | Note: A document output by a stage is considered new if its HCI_id field value does not match the HCI_id field value of the document that entered the stage. |
Process All Documents | When enabled, all documents (including folders and directories without content) are passed through the pipelines and outputs. Important: Process All Documents picks up documents only after it has been enabled. Additionally, if it is enabled while a list-based workflow is paused, only newly created directories are processed as documents after the workflow is resumed. | Yes: The setting is enabled and all documents are processed. No: Documents are processed according to your regular workflow settings. | This task setting is disabled by default. | |
Retry Failed Documents | If Check for Updates is enabled, specifies how the task should handle failed documents when reexamining a list-based data connection for changed documents. | Disabled: The task retries failed documents only if their contents have changed. Enabled: The task retries failed documents even if their contents have not changed. When enabled, each retried document counts towards the values displayed on the task's Metrics page. | Enable this setting only when you expect your workflow to encounter temporary failures, such as connection issues, that can be resolved by trying failed documents again. Some failures cannot be resolved simply by retrying a document; with Retry Failed Documents enabled, the task wastes resources continually retrying such failures. | This setting affects only list-based data connections, such as the built-in HCP data connection. A change-based data connection does not revisit any document unless it detects that the document's contents have changed, even when the document failed to be processed by the workflow. |
Workflow Agent Recursion | When enabled, new documents created by a stage are not sent to the next stage. Instead, they are sent to the beginning of the first pipeline in the workflow that has the Workflow-Agent execution mode. | Workflow Agent Recursion Limit: Limits the number of times that documents extracted from archive files can be sent back to the first Workflow-Agent pipeline. This setting protects your system from infinite recursion (that is, an endless nesting of archives within archives). | Enable this setting to eliminate unnecessary stages from your pipeline. | Note: A document output by a stage is considered new if its HCI_id field value does not match the HCI_id field value of the document that entered the stage. |
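To make the per-connection behavior of Time Between Checks concrete, here is a small illustrative sketch in plain Python (no product API involved); the connection names and timestamps are invented for the example.

```python
from datetime import datetime, timedelta

TIME_BETWEEN_CHECKS = 86400  # seconds; the default (24 hours)

# Each data connection tracks its own last-check time, so their next
# checks fall at different times.
last_check = {
    "connection-A": datetime(2024, 5, 1, 11, 0),  # finished checking at 11 AM
    "connection-B": datetime(2024, 5, 1, 12, 0),  # finished checking at 12 PM
}

for name, finished in last_check.items():
    next_check = finished + timedelta(seconds=TIME_BETWEEN_CHECKS)
    print(f"{name}: earliest next check at {next_check}")
# connection-A: earliest next check at 2024-05-02 11:00:00
# connection-B: earliest next check at 2024-05-02 12:00:00
```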
Performance settings
By default, the Performance Settings field for a workflow task is set to Default. With this setting, the task is automatically configured with default performance values appropriate for running a single workflow at a time on both single-instance and multi-instance systems.
To manually configure performance settings for a task, change this setting to Custom.
- Configuring task performance settings incorrectly can harm task performance or cause the task to experience out-of-memory errors.
- Before trying to change task performance settings, you should evaluate your stages and pipelines to ensure that your task is not performing unnecessary work.
- For information on changing task settings to allow you to run multiple workflow tasks concurrently, see Running multiple workflows at the same time.
The performance settings for each task let you configure these aspects of how a task runs:
- CPU resource usage for the task.
- The number of documents that the task works on at a time.
- The number of subdivisions (called jobs and partitions) to split a task into for parallel processing.
Setting Name | Description | Values | Tips | Notes and considerations |
Parallel Jobs | Specifies the number of subdivisions to split tasks into. These subdivisions are called jobs. | Positive integers, not including zero. The default is 2. | Specifying a value of 2 or more might improve performance by allowing task work to be done in parallel. However, setting this value too high might harm task performance, because the overhead of creating multiple jobs can offset the time savings from running task work in parallel. | |
Reported extra cores | Allows certain internal components (the ones that perform task work) to claim that they have more CPU cores allocated to them than they really do. You specify the number of additional cores for each component to advertise. | Positive integers, including zero. The default is 4. | If a workflow task takes a long time to run, check the CPU utilization of your worker instances. If utilization is low, try increasing this setting so the task uses more CPU resources. | Increasing this value might improve task performance because it allows certain internal components to be assigned more work than they could otherwise handle. However, increasing it too far might slow task performance. |
Partitions Per Job | To allow task work to be processed concurrently, Hitachi Content Intelligence internally breaks tasks down into smaller divisions called jobs, and each job into further divisions called partitions. This setting specifies the number of partitions, across the entire system, used to process the task. | Positive integers or -1. The default is -1, which is interpreted as 32 times the number of instances on which the workflow job is configured to run, up to a maximum of 128. | To tune this setting: specify a value 32 times the number of instances on which the workflow is configured to run, then run the task. Multiply that value by 1.5 and run the task again. Continue until task performance stops improving. | If you edit this setting, specify, at a minimum, a number greater than or equal to the number of CPU cores across all instances in the system. |
Processing Batch Size | The maximum number of documents to read into memory before batching for parallel processing. A batch is created when either of these happens: the number of documents specified by this setting has been read into memory, or the amount of time specified by the Processing Batch Size Timeout setting elapses. | Positive integers. The default is 10,000. | Typically, a higher value is better. Because a task spends time setting up and tearing down each job, it's more efficient for each job to process as many documents as possible. However, a lower value might be better if your input data connections are slow to read from; in that case, the task might spend more time waiting for additional documents to be read when it could instead be processing the documents it has already read. | |
Processing Batch Size Timeout | The maximum time to wait, in seconds, for documents to be read into memory before batching for parallel processing. A batch is created when either of these happens: the amount of time specified by this setting elapses, or the number of documents specified by the Processing Batch Size setting has been read into memory. | Positive integers. The default is 60. Specify 0 to use no timeout. | If your data connection reads documents in real time, specify a low value for this setting. | Setting this value too low might cause document batches to be too small, thereby reducing task performance. |
CPU Core Maximum | Limits the maximum number of CPU cores that each job in the task can use. | Positive integers or -1. The default is -1, which means that the task uses all available CPU cores. | Use this setting to run multiple concurrent tasks. To run two tasks with equal priority, specify the same CPU Core Maximum value for each task. To prioritize one task over another, assign a higher CPU Core Maximum value to the one you want the system to devote more resources to. | Because the minimum value you can specify is 1, the maximum number of tasks you can run concurrently equals the number of CPU cores across all instances in the system. |
Output Batch Size | The number of documents to send at a time to the workflow outputs. | Positive integers. The default is 100. Specifying a value of 1 disables batching. | Increasing this setting means more documents are sent per request, lowering the number of output requests that the task needs to make. This can increase workflow output performance at the cost of higher memory utilization. | To get a performance benefit, a workflow's outputs must support batching. |
Doc Process Time Limit | The maximum amount of time (in seconds) that can pass while processing a single document through a stage before the task reports that document as taking a long time to process. | The default time limit is 300 seconds (5 minutes). | Adjust this value to help identify stuck or stalled documents and gain increased visibility into workflow status. To learn more, see Status messages. | |
Metrics settings
Setting Name | Description | Performance Impact |
Collect Aggregation Metrics | When enabled, the task collects certain information from the documents it processes. The information collected depends on the aggregations configured for the workflow; for information, see Aggregations. This setting does not affect the collection of other task metrics, such as data about document failures or stage performance. For information on viewing task metrics, see Task details, status, and results. This setting is enabled by default. | Collecting aggregation metrics can consume a significant amount of memory in your system. This can decrease task performance and might eventually cause your workflow tasks to halt. |
Collect Historical Metrics | When enabled, the task retains certain metrics for up to 30 days and displays the information in several graphs on the task Metrics page. When disabled, the task retains only the most recent set of metrics. For information on the metrics that are collected, see Task details, status, and results. | Collecting historical metrics can consume a significant amount of memory in your system. This can decrease task performance and might eventually cause your workflow tasks to halt. Consider disabling this option when running your workflow in production. |
Error Handling settings
Setting Name | Description | Values | Tips | Notes and considerations |
Bypass reporting of successful documents | When enabled, documents that were previously recorded as having failed in a workflow and are later processed successfully are not reported again. Bypassing these additional reports increases your system's overall performance while maintaining the option for you to process the failures separately. | Yes: The setting is enabled; previously failed documents that succeed are not reported again. No: Documents are reported according to your regular workflow settings. | To manually process your document failures, choose your workflow and select Failures > Retry Document Failures. | |
Continue processing failed documents | When enabled, documents that fail during processing continue through the workflow. The error messages for these documents are added to the HCI_failure field. | Yes: The setting is enabled and failed documents continue to be processed. No: Documents are processed according to your regular workflow settings. | | |
Halt task after set amount of failures | When enabled, the task stops when it encounters the number of document failures that you specify. | Yes: The setting is enabled and the task stops after the set number of document failures. No: Documents are processed according to your regular workflow settings. | A large number of errors can indicate that something is wrong with your pipeline. Enabling this setting lets you avoid wasting time and computational resources on processing additional documents with a faulty pipeline. | |
Memory settings
- Driver Heap Limit: The amount of memory to allocate to the task component that reads files from data sources, in either megabytes or gigabytes. Valid values have this format:
<number-of-bytes>[m|g]
The default is 1024m.
Increase this setting if your task experiences Crawler-type OutOfMemory errors.
- Executor Heap Limit: The amount of memory to allocate to the task component that performs the work of processing pipelines, in either megabytes or gigabytes. Valid values have this format:
<number-of-bytes>[m|g]
The default is 1024m.
Increase this setting if your task experiences Stage-type OutOfMemory errors.
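As an illustration of the <number-of-bytes>[m|g] value format, here is a small hypothetical validator (plain Python, not part of the product) that accepts values such as 512m, 1024m, or 2g.

```python
import re

HEAP_LIMIT_PATTERN = re.compile(r"^(\d+)([mg])$")  # e.g., 1024m or 2g

def heap_limit_to_megabytes(value: str) -> int:
    """Parse a heap-limit string into megabytes; raise ValueError if malformed."""
    match = HEAP_LIMIT_PATTERN.match(value.strip().lower())
    if not match:
        raise ValueError(f"invalid heap limit: {value!r} (expected e.g. '1024m' or '2g')")
    number, unit = int(match.group(1)), match.group(2)
    return number * 1024 if unit == "g" else number

print(heap_limit_to_megabytes("1024m"))  # 1024
print(heap_limit_to_megabytes("2g"))     # 2048
```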
Reconfiguring tasks
You can change the settings and schedule for a task at any time, even while the task is running.
When you reconfigure a task:
- If the task is currently running, you need to either restart or pause and resume the task for it to begin using the new settings.
- If the task is currently idle, the task will use the new settings the next time it starts.
To reconfigure a task:
Procedure
Click the Workflow Designer window.
Select the workflow that you want.
Click the Task window.
Click Actions > Edit settings.
In the Schedule section, use the calendar tool to set when the task should run.
Configure the other settings for the task.
Click Update.
Related CLI commands
editWorkflow
Related REST API methods
PUT /workflows/{uuid}
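For illustration, a minimal Python sketch of editing a task setting through the REST API. The PUT /workflows/{uuid} path comes from this documentation; the host, authentication, the GET read-back call, and the field names in the body are assumptions to check against your system's API reference.

```python
import requests

BASE_URL = "https://hci.example.com:8000/api/workflow"  # placeholder
HEADERS = {"Authorization": "Bearer <token>"}  # placeholder auth
uuid = "<workflow-uuid>"

# Read-modify-write: fetch the current workflow definition, change a
# setting, and PUT it back. The GET call and the "taskSettings" /
# "checkForUpdates" field names are hypothetical.
workflow = requests.get(f"{BASE_URL}/workflows/{uuid}", headers=HEADERS).json()
workflow.setdefault("taskSettings", {})["checkForUpdates"] = True  # hypothetical field
resp = requests.put(f"{BASE_URL}/workflows/{uuid}", headers=HEADERS, json=workflow)
resp.raise_for_status()
```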
Scheduling tasks
You can configure a schedule for each workflow task. You can use this, for example, to ensure that a task does not consume system resources during business hours.
Running without a schedule
By default, a task's schedule is empty. This means that the task does not run automatically and must be run manually.
Running with a schedule
When you configure a schedule for a task, the task runs only during the specified periods of time. It starts automatically when a scheduled time period begins and stops when the time period ends.
When you configure a schedule for a workflow task, identical changes are made to the schedule for the corresponding Workflow-Agent job.
- When you first configure a task schedule, do not use the Run option to start the task. It starts automatically at the beginning of the next scheduled block of time.
- When configuring a task schedule, you specify times in your local time zone. The task scheduler does not account for daylight saving time.
When editing a workflow task in the Admin App, you use the calendar tool to configure the task's schedule.

Procedure
In this tool, select a day to add a block of time. Click and drag the block to cover the hours you want the task to run.
To remove a block, right-click it.
Related CLI commands
editWorkflow
Related REST API methods
PUT /workflows/{uuid}
Determining when a running task will stop
When a task is in the Running state, the task's schedule and Check for Updates setting determine when, or if, the task stops automatically.
Scheduled | Check for Updates setting | When a running task stops |
Yes | Enabled | The task enters the Running state at the beginning of a scheduled time block. At the end of that block, the task stops running and enters the Idle state, then resumes at the beginning of the next scheduled block. Because the Check for Updates setting is enabled, the task never reaches the Completed state. |
No | Enabled | The task runs endlessly and remains in the Running state. Because the Check for Updates setting is enabled, the task never reaches the Completed state. |
Yes | Disabled | The task runs until either it reads all files in the workflow inputs, at which point it stops with a status of Completed, or the scheduled time block ends, at which point it enters the Idle state and resumes at the beginning of the next scheduled block. |
No | Disabled | The task runs until it reads all files in the workflow inputs. The task then stops with a status of Completed. |
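The table above amounts to a small decision function. Here is a sketch in plain Python (no product API involved); the wording of each branch is read from the table, including the reconstructed Scheduled + Disabled row.

```python
def stop_behavior(scheduled: bool, check_for_updates: bool) -> str:
    """Mirror the table above: when does a Running task stop?"""
    if scheduled and check_for_updates:
        return ("Stops (Idle) at the end of each scheduled block and resumes at "
                "the next block; never reaches Completed.")
    if not scheduled and check_for_updates:
        return "Runs endlessly in the Running state; never reaches Completed."
    if scheduled and not check_for_updates:
        return ("Runs until it reads all workflow inputs (Completed) or the "
                "scheduled block ends (Idle until the next block).")
    return "Runs until it reads all workflow inputs, then stops as Completed."

print(stop_behavior(scheduled=False, check_for_updates=False))
```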
Task details, status, and results
After a workflow task has started running, you can view or retrieve information about how the task is performing.
The following sections describe the details available for a workflow task.
Task Status

- Running: The task is currently doing work.
- Idle: The task is not currently doing work for one of these reasons:
- The task is configured to run on a schedule and is currently off hours.
- The task has never been run.
- The task was cleared but has not yet been restarted.
- Paused: The task is not currently doing work because you paused it. The task will not start again until either:
- You resume it.
- Its next scheduled run time, if the task is configured to run on a schedule.
- Halted: The task encountered an error that prevents it from running.
- Completed: The task has processed all documents from all workflow inputs. A task in the Completed state never resumes, even when configured to run on schedule.
A task can never reach this state if the Check for Updates setting is enabled.
Status messages
While running, the task displays messages about the work it's currently performing and the work that it has completed.

Possible task status messages include:
- Collected a new batch of <batch-size> documents: This message indicates that the task has collected a new batch of documents from the data connections. The timestamp indicates when the collection last occurred.
- Processing N batches in parallel: This message shows the number of active jobs being processed in parallel. This is determined by the Parallel Jobs setting for the task. Each job consumes a single batch.
- Processing has completed for X of Y batches: This message appears when at least one job has completed its work and is awaiting a checkpoint operation. This message replaces the previous message when the completed count is greater than 1.
- Job-<job-number>: <Running|Completed> (<duration>): N documents: For an active job, this message shows the job's identifying number, status, duration, and number of documents processed.
- Last successful checkpoint (<duration>): Shows the completion time for the last successful checkpoint operation and how long that operation took to complete. A long duration can indicate performance problems or service issues.
- Checkpoint in progress (<duration>): A checkpoint operation is currently active. The message shows the amount of time that the operation has been active. A long duration can indicate performance problems or service issues.
- <Current Date & Time> - Document: <document_id> has been processing for <long_value> seconds. This is above the threshold configured for this workflow.: Alerts users to the presence of stuck or stalled documents and gives increased visibility into workflow progress for files taking longer periods of time to process. The default time is 5 minutes and can be manually adjusted under Edit Settings > Performance > Custom. To learn more, see Performance settings.
Failures
Summarizes information about documents that failed to be processed or errors that the task encountered.
This section shows only current failures and errors. When a task runs again or revisits document failures, any cleared errors or failures are removed from the list.
The Document Failures tab shows documents that failed to be processed by a stage or added to an index. For each failure, the tab shows:
- Date: The date and time that the failure occurred.
- Category: One of these:
- Stage: The document failed to be processed by a stage in the pipeline.
- Index: The document failed to be indexed by the workflow output.
- Crawler: The document failed to be retrieved by the data connection.
- Failure: A short description of the failure. Click the expand icon to view the complete error message text reported by the applicable plugin.
Select an individual failure to view:
- Document ID/URI: The URI or ID of the document that caused the error.
- Message: A short description of the failure.
- Details: The stack trace for the failure. Use this to determine which component produced the error.
The Task Errors tab shows errors encountered by the task itself. These errors don't apply to individual documents.
For each error, the tab shows:
- Date: The date and time that the error occurred.
- Category: Workflow
- Error: A short description of the failure. Click the expand icon to view the complete error message text reported by the applicable plugin.
Metrics
Summarizes information about the documents processed within the last 30 days.

- Average DPS: Average number of documents processed per second.
- Input: The number of documents that have entered the pipeline.
- Output: The number of documents that exited the pipeline.
- Expanded: The number of documents added by expanding archive files. This number does not include the archive files themselves.
- Dropped: Typically, the number of documents removed from the pipeline by a Drop Documents stage. This metric is also incremented if any of your own custom stages exhibit this behavior: when a document enters the stage, it produces zero output documents and zero errors.
- Failed: The number of documents that failed to be processed.
The historical metrics graphs for a workflow task are line graphs that show changes in some task metric over time.

You can use the focusing tool at the bottom of the graph to zoom in on a particular section of data.

The Collect Historical Metrics setting must be enabled for these graphs to display information.
- Documents Per Second graph: Shows changes over a period of time in both the average and actual numbers of documents processed per second.
- Task Metrics graph: Shows changes over a period of time in the number of documents input, output, expanded, dropped, and failed.
- Document Updates graph: Shows changes over a period of time in the numbers of these values:
- Update requests: The number of documents entering the workflow pipeline that include the HCI_operation:CREATED field/value pair.
- Updates performed: The number of documents exiting the workflow pipeline that include the HCI_operation:CREATED field/value pair.
- Document Deletes graph: Shows changes over a period of time in the numbers of these values:
- Delete requests: The number of documents entering the workflow pipeline that include the HCI_operation:DELETED field/value pair.
- Deletes performed: The number of documents exiting the workflow pipeline that include the HCI_operation:DELETED field/value pair.
Performance
Summarizes information about how quickly documents were processed by the pipeline and its individual stages.
Section | Description |
On the Performance window | The Performance window shows: |
Performance window > Overview tab | The Overview tab shows: |
Performance window > Stage Metrics tab | Contains the Visual View and Table View subtabs. These subtabs show, respectively, bar graphs and tabular views of performance for all stages in the workflow pipeline. For each stage, both subtabs show the minimum, maximum, and average document processing times. The Table View subtab also shows each stage's standard deviation for processing times. Tip: Use the information on this window to identify document processing bottlenecks in your pipeline. |
Aggregations
Lists the aggregations that the workflow has. Click an aggregation to view the information collected for that aggregation.
Triggers
Lists the triggers that the workflow has and lets you add triggers to the workflow.
Viewing workflow task details
Procedure
Click the Workflow Designer window.
Select the workflow that you want.
Click the Task window.
Related CLI commands
getWorkflowReport
getHistoricalWorkflowMetrics
Related REST API methods
POST /workflows/{uuid}/report
POST /workflows/{uuid}/report/historical
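For illustration, a minimal Python sketch of retrieving the current task report through the REST API. The POST /workflows/{uuid}/report path comes from this documentation; the host, authentication, and empty request body are placeholders and assumptions.

```python
import requests

BASE_URL = "https://hci.example.com:8000/api/workflow"  # placeholder
HEADERS = {"Authorization": "Bearer <token>"}  # placeholder auth
uuid = "<workflow-uuid>"

# Request the current task report (POST /workflows/{uuid}/report).
# An empty JSON body is an assumption; the endpoint may accept options
# such as which report categories to include.
report = requests.post(f"{BASE_URL}/workflows/{uuid}/report",
                       headers=HEADERS, json={})
report.raise_for_status()
print(report.json())
```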
Task status icons
This table describes the status icons that can be displayed on the Task page for a workflow.
Icon | Description |
(schedule icon) | The task runs according to a schedule. |
(failure icon) | One or more documents failed to be processed. Click this icon to view the list of document failures. |
(error icon) | The task encountered one or more errors. Click this icon to view the list of task errors. |
Working with fields discovered by a workflow task
You can take the list of fields discovered by a workflow task and import some or all of those fields into an index collection. For information, see Importing fields from a workflow into an index collection.
Clearing workflow tasks
At any time, you can clear all data from a workflow task. Doing this deletes all result details from the workflow, including performance metrics, historical metrics graphs, and the values for all aggregations associated with the workflow. The workflow keeps its task settings, including its schedule.
Effects on indexes
Clearing a workflow task does not delete any search indexes associated with the workflow. If you start the task running again, it updates the index; that is, any documents that were previously indexed are reindexed with any changes that you've made.
Starting a task over
Clearing task data has the same effect as starting the task over, except that when you start a task over:
- The task begins running again
- Historical metrics graphs are kept
Procedure
Click the Workflow Designer window.
Select the workflow that you want.
Click the Task window.
Click Actions.
In the menu, click Clear Workflow Task.
Related CLI commands
resetWorkflowTask
Related REST API methods
DELETE /workflows/{uuid}/task
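For illustration, a minimal Python sketch of clearing a task through the REST API; the DELETE /workflows/{uuid}/task path comes from this documentation, while the host and authentication are placeholders.

```python
import requests

BASE_URL = "https://hci.example.com:8000/api/workflow"  # placeholder
HEADERS = {"Authorization": "Bearer <token>"}  # placeholder auth
uuid = "<workflow-uuid>"

# Clear all task data for the workflow (DELETE /workflows/{uuid}/task).
resp = requests.delete(f"{BASE_URL}/workflows/{uuid}/task", headers=HEADERS)
resp.raise_for_status()
```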
Starting a task over
At any time, you can restart a workflow task from the beginning. When you do this, some data is deleted from the workflow, including:
- Performance metrics, such as the Input or Average DPS values.
- The values for all aggregations.
The workflow retains:
- Any graphical historic metrics data, such as the Documents Per Second graph.
- All task settings, including its schedule.
Effects on indexes
Starting a workflow task over does not delete any search indexes associated with the workflow. When the task starts running again, it updates the indexes; the documents that were previously indexed are reindexed with any changes that you've made.
Starting a task over has the same effect as clearing the task, except that when you clear a task:
- The task does not begin running
- Historical metrics graphs are deleted
Procedure
Click the Workflow Designer window.
Select the workflow that you want.
Click the Task window.
Click Actions.
In the menu, click Start over.
Related CLI commands
startOverWorkflow
Related REST API methods
POST /workflows/{uuid}/task/startOver
Downloading task reports
You can download workflow task details as a report file. You can download the complete report or only specific categories.
When you download a report, it contains details for the point in time at which you downloaded it. You cannot download reports for past task details.
Task reports are formatted in JavaScript Object Notation (JSON).
- Metrics: Includes overall task information such as number of documents processed, number of failures encountered, and total task run time.
- Failures: Includes information about documents that either failed to be indexed or failed to be processed by a stage in the workflow pipeline. For each failure, the report contains the complete error message reported by the applicable stage or index plugin.
You can download information for up to 10,000 failures in a single report.
- Aggregations: Includes values for each aggregation in the workflow.
- Performance: Includes the minimum, maximum, and average document processing times for each stage in the workflow pipeline.
To download task reports:
Procedure
Click the Workflow Designer window.
Select the workflow that you want.
Click the Task window.
Click Actions.
In the menu, click Download report.
In the window that appears, select the parts of the report you want.
If you are downloading information about failures, use the Failure Start Offset and Maximum Failures to Download fields to specify the subset of failures that you want.
For example, to download information about the second hundred failures (failures 101 through 200), specify:
- Failure Start Offset: 101
- Maximum Failures to Download: 100
Click Download Report.
Related CLI commands
getWorkflowReport
Related REST API methods
POST /workflows/{uuid}/report
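For illustration, a minimal Python sketch of downloading the failures portion of a report. The POST /workflows/{uuid}/report path comes from this documentation; the host, authentication, and the JSON keys below are hypothetical stand-ins for the Failure Start Offset and Maximum Failures to Download fields, so check your system's API reference for the real names.

```python
import requests

BASE_URL = "https://hci.example.com:8000/api/workflow"  # placeholder
HEADERS = {"Authorization": "Bearer <token>"}  # placeholder auth
uuid = "<workflow-uuid>"

# Request failures 101-200 of the report; both field names are hypothetical.
body = {"failureStartOffset": 101, "maxFailuresToDownload": 100}
resp = requests.post(f"{BASE_URL}/workflows/{uuid}/report",
                     headers=HEADERS, json=body)
resp.raise_for_status()

# Task reports are JSON; save the response to a file.
with open("task-report.json", "wb") as f:
    f.write(resp.content)
```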
Testing individual document failures
If a stage fails to process a document during a workflow task, you can learn more about the failure by testing a pipeline using the failed document.
Workflow Designer instructions
Procedure
Click the Workflow Designer window.
Select the workflow that you want.
Click the Task window.
Click the Failures window.
On the Document Failures tab, select a failed document.
Click Test Document.
From the list, select one of these:
- To test an individual pipeline, Pipeline: <pipeline-name>
- To test the entire workflow pipeline, Workflow: <workflow-pipeline>
The pipeline test page opens and begins testing the pipeline.
If the test fails for a stage, the stage displays a failure icon. Click the stage's View Results link to view the error message reported by the stage.
Related CLI commands
runWorkflowTest
getWorkflowTestReport
Related REST API methods
GET /workflows/{uuid}/test
POST /workflows/{uuid}/test
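For illustration, a minimal Python sketch of running a pipeline test through the REST API. The POST and GET /workflows/{uuid}/test paths come from this documentation; the host, authentication, empty request body, and simple polling loop are assumptions about how the endpoints behave.

```python
import time
import requests

BASE_URL = "https://hci.example.com:8000/api/workflow"  # placeholder
HEADERS = {"Authorization": "Bearer <token>"}  # placeholder auth
uuid = "<workflow-uuid>"

# Start a pipeline test (POST /workflows/{uuid}/test). The empty body is
# an assumption; the real request likely names the document and pipeline.
requests.post(f"{BASE_URL}/workflows/{uuid}/test",
              headers=HEADERS, json={}).raise_for_status()

# Poll for the test report (GET /workflows/{uuid}/test).
for _ in range(10):
    result = requests.get(f"{BASE_URL}/workflows/{uuid}/test", headers=HEADERS)
    result.raise_for_status()
    print(result.json())
    time.sleep(5)
```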
Retrying task failures
The Failures section of the task details view for a workflow shows information about all documents that the task failed to process.
At any point, you can have a task retry these failures. When you do this, your system starts a new, separate task just for processing these failures.
- Some document failures are caused by transient problems like connection issues, and might be resolved by being retried. Other document failures may not be resolved by being retried.
- You can also configure a task to automatically retry document failures for certain types of data connections.
- To learn more about why an individual document failed to be processed, you can test your workflow pipeline using only that document.
To retry task failures:
Procedure
Click the Workflow Designer window.
Select the workflow that you want.
Click the Task window.
Click Failures.
On the Document Failures tab, click Retry Document Failures.
A Retry Failures Task window appears. This window displays the progress of the task. From the Actions menu for the task, you can:
- Pause or resume the task.
- Clear the task.
- Start the task over.
Related CLI commands
runWorkflowFailuresTask
Related REST API methods
POST /workflows/{uuid}/task/failures
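For illustration, a minimal Python sketch of starting the separate retry task through the REST API; the POST /workflows/{uuid}/task/failures path comes from this documentation, while the host and authentication are placeholders.

```python
import requests

BASE_URL = "https://hci.example.com:8000/api/workflow"  # placeholder
HEADERS = {"Authorization": "Bearer <token>"}  # placeholder auth
uuid = "<workflow-uuid>"

# Start a new, separate task that reprocesses only the failed documents
# (POST /workflows/{uuid}/task/failures).
resp = requests.post(f"{BASE_URL}/workflows/{uuid}/task/failures", headers=HEADERS)
resp.raise_for_status()
```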