Evaluating pipeline performance
This topic describes steps you can take to evaluate and improve the performance of your pipelines.
For additional best practices on improving pipeline performance, see Best practices for configuring stages and pipelines.
Obtain and analyze pipeline performance data
Create a workflow and add your processing pipelines and data connections to it.
Run a task for the workflow. Do not enable the Check for Updates option.
After the task finishes, determine where your task is spending the most processing time:
Click Workflows > workflow-name > Task > Performance > Stage Metrics.
The Visual View tab shows a graph of all stages in your pipeline.
Use the graph to determine where your task is spending the most processing time, on average.
For example, this graph indicates that the task is spending most of its time indexing documents and running the Mbox Expansion and Text and Metadata Extraction stages.
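If you note down the average time per stage from the graph, a short script can rank the stages and show each one's share of the total. The following is a minimal sketch: the stage timings are hypothetical, and the input is a plain dictionary because this topic does not describe an export format for stage metrics.

```python
def rank_stages(avg_ms_by_stage):
    """Return (stage, avg_ms, share_of_total) tuples, slowest stage first."""
    total = sum(avg_ms_by_stage.values())
    ranked = sorted(avg_ms_by_stage.items(), key=lambda item: item[1], reverse=True)
    # Guard against an empty or all-zero input so the share is always defined.
    return [(name, ms, ms / total if total else 0.0) for name, ms in ranked]


# Hypothetical averages in milliseconds, copied by hand from the graph.
sample = {
    "Indexing": 500.0,
    "Mbox Expansion": 300.0,
    "Text and Metadata Extraction": 150.0,
    "MIME Type Detection": 50.0,
}

for name, ms, share in rank_stages(sample):
    print(f"{name:30s} {ms:8.1f} ms  {share:6.1%}")
```

The stages at the top of the list are the ones to investigate first.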
Depending on your task's performance metrics, see one or more of these:
- Reducing time spent indexing.
- Reducing time spent expanding archive files.
- Reducing time spent by non-expansion stages.
Reducing time spent indexing
It's normal for indexing to take more time than processing documents through other stages. However, if indexing is unacceptably slow, try one or more of these:
- Ensure your index collection schema is configured to include only the fields and search features that you need. For information, see Best practices for designing an index.
- Try creating an index collection with a higher number of shards. For information, see Adding index collections and Index shards.
- Try increasing the Output Batch Size setting for the task. For information, see Task settings.
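To see why a larger batch size can help, consider how many write requests a task has to send. The following is a rough sketch that assumes Output Batch Size is the number of documents sent to the index per request; this topic does not confirm that, and the document count is hypothetical.

```python
import math


def index_requests(document_count: int, output_batch_size: int) -> int:
    """Number of requests needed to send every document, one batch per request."""
    if output_batch_size <= 0:
        raise ValueError("output_batch_size must be positive")
    return math.ceil(document_count / output_batch_size)


# Hypothetical task of one million documents at three candidate batch sizes.
for size in (100, 500, 2000):
    print(size, index_requests(1_000_000, size))
```

Fewer requests means less per-request overhead, but very large batches use more memory, so it may be worth testing a few values instead of choosing the largest.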
Reducing time spent expanding archive files
Do one or both of these:
- Surround each archive expansion stage with a conditional statement to ensure that the stage processes only the applicable files.
- Consider expanding the archive files to a separate data source. For more information, see Favor processing large numbers of small documents.
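The logic behind such a conditional can be sketched as a simple predicate. This example is illustrative only: it is written in Python rather than in the product's conditional syntax, and the function name and extension list are hypothetical.

```python
from pathlib import PurePath

# Hypothetical set of extensions that the Mbox Expansion stage should handle.
MBOX_EXTENSIONS = {".mbox", ".mbx"}


def should_expand_mbox(file_name: str) -> bool:
    """Return True only for files that the expansion stage can usefully process."""
    return PurePath(file_name).suffix.lower() in MBOX_EXTENSIONS


for name in ("inbox.MBOX", "report.pdf", "README"):
    print(name, should_expand_mbox(name))
```

Documents that fail the check skip the expansion stage entirely, so the stage spends time only on files it can expand.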
Reducing time spent by non-expansion stages
Test the pipeline that contains the applicable stage. For information, see Testing pipelines.
In the pipeline test results, click the View Results link for the stage you want.
Examine the fields added by the stage to determine whether those fields are worth the amount of time needed to run the stage:
- If you don't make use of any of the fields added by the stage, delete the stage from your pipeline.
- If you make use of some of the fields added by the stage, determine whether there's another way you can add those fields to documents.
For example, to eliminate unnecessary processing by the MIME Type Detection stage, use its Custom extension mapping setting. For information, see MIME Type Detection stage.
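The general idea behind an extension mapping can be sketched as a lookup that runs before any content inspection. This is not the stage's implementation, which this topic does not describe: the mapping entries, function names, and the detector callback are all hypothetical.

```python
from pathlib import PurePath
from typing import Callable

# Hypothetical mapping from file extension to MIME type.
CUSTOM_EXTENSION_MAP = {
    ".csv": "text/csv",
    ".log": "text/plain",
}


def resolve_mime_type(file_name: str, detect_from_content: Callable[[str], str]) -> str:
    """Use the extension mapping when possible; otherwise fall back to detection."""
    extension = PurePath(file_name).suffix.lower()
    mapped = CUSTOM_EXTENSION_MAP.get(extension)
    if mapped is not None:
        return mapped  # Cheap path: no content inspection needed.
    return detect_from_content(file_name)  # Expensive path, supplied by the caller.


def slow_detector(file_name: str) -> str:
    """Stand-in for content-based detection (hypothetical)."""
    return "application/octet-stream"


print(resolve_mime_type("events.LOG", slow_detector))
print(resolve_mime_type("image.bin", slow_detector))
```

The more of your file types the mapping covers, the less often the expensive detection path runs.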