Hitachi Vantara Knowledge

For designing an index

Scale your index appropriately to ensure availability and fast query responses

You can configure the number of shards and replication settings for each of your index collections. These settings let you specify how each search index is distributed across the instances in your system.

For more information, see Index shards.

Important: Select the appropriate number of shards when creating an index collection. You cannot change the number of shards for an index collection after you create it.

For HCI Indexes, increase the Index Protection Level setting to maintain index availability and performance

The Index Protection Level setting for an HCI Index determines the number of copies of the index that exist in the system. An index with multiple copies remains available in the event of an instance outage and can provide better performance than an index with only one copy.

For more information, see Index protection level for HCI Indexes.
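As a rough illustration of how the shard count and the number of index copies interact, the following Python sketch models a layout and checks whether every shard is still reachable after one instance fails. The function names, the round-robin placement rule, and the instance names are assumptions made for this example; the system performs its own placement, and this is not a description of its actual algorithm.

```python
from collections import defaultdict


def place_shards(num_shards, copies_per_shard, instances):
    """Assign every copy of every shard to an instance (simplified model).

    Hypothetical placement rule: copy c of shard s goes to instance
    (s + c) mod N, so copies of one shard never share an instance.
    """
    if copies_per_shard > len(instances):
        raise ValueError("more copies per shard than instances")
    layout = defaultdict(list)
    for shard in range(num_shards):
        for copy_number in range(copies_per_shard):
            instance = instances[(shard + copy_number) % len(instances)]
            layout[instance].append((shard, copy_number))
    return dict(layout)


def index_survives_outage(layout, failed_instance, num_shards):
    """Return True if every shard still has a copy on a surviving instance."""
    surviving_shards = {
        shard
        for instance, placed in layout.items()
        if instance != failed_instance
        for shard, _copy_number in placed
    }
    return len(surviving_shards) == num_shards


instances = ["instance-1", "instance-2", "instance-3"]
single_copy = place_shards(3, 1, instances)
two_copies = place_shards(3, 2, instances)
print(index_survives_outage(single_copy, "instance-1", 3))
print(index_survives_outage(two_copies, "instance-1", 3))
```

In this model, an index with one copy loses a shard as soon as any instance that holds one goes down, while an index with two copies stays complete through a single-instance outage.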

Use the appropriate initial schema value

When you create an index collection, you need to select an initial index schema option:

  • For an index to be used in production, select Basic.
  • For an index to be used for testing or learning about your data, select Schemaless or Default.

For more information, see Initial schema options.

Favor query settings changes over index collection schema changes

Unlike changes to an index collection schema, changes to query settings take effect immediately and do not require you to reindex your content. If you need to change your users' search experience after you've finalized your index collection schema, try making those changes through query settings before reconfiguring your schema.

Minimize field attributes in your index collection schema

You should minimize the number of attributes and use cases that you enable for fields in your index collection schema. Enabling a field attribute or use case means that more information must be stored in the search index to support the attribute's functionality. This can cause an index to grow very large and be slow to return results.

For example, do not enable the stored field attribute for fields that you don't need to return in search results.

For more information, see Use cases and Field attributes.

Note: One exception is the omitNorms attribute. You should enable this on as many fields as possible because enabling it reduces the amount of data stored in the index for the field.
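The guidance above can be turned into a simple review checklist. The following Python sketch assumes you have written down your schema as a list of dictionaries; that representation, the field names, and the audit_schema function are hypothetical and are not an export format or API of the product. It flags fields where stored is enabled but the field is never returned in search results, and fields where omitNorms is not enabled.

```python
def audit_schema(fields, returned_in_results):
    """List schema settings worth reviewing (hypothetical schema layout).

    fields: list of {"name": str, "attributes": {attribute_name: bool}}
    returned_in_results: set of field names your search results display
    """
    findings = []
    for field in fields:
        name = field["name"]
        attributes = field.get("attributes", {})
        if attributes.get("stored") and name not in returned_in_results:
            findings.append(f"{name}: stored is enabled but the field is never returned")
        if not attributes.get("omitNorms"):
            findings.append(f"{name}: consider enabling omitNorms")
    return findings


# Made-up example schema
schema = [
    {"name": "title", "attributes": {"stored": True, "omitNorms": True}},
    {"name": "bodyText", "attributes": {"stored": True}},
]
for finding in audit_schema(schema, returned_in_results={"title"}):
    print(finding)
```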

Minimize the number of copy fields

Copy fields are a special category of field that allow one field's values to be copied to another. However, each copy field causes the index to include two separate copies of the same data.

For more information, see Defined, dynamic, and copy fields and Adding and editing fields in an index collection schema.
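To make the cost concrete, the following Python sketch applies a copy rule to a sample document and compares a crude size measure before and after. The document, the rule, and both helper functions are invented for illustration; real index size depends on field types and attributes, so treat the numbers only as a demonstration of the duplication.

```python
def apply_copy_fields(document, copy_rules):
    """Return the document as it would be indexed once copy rules run.

    copy_rules: list of (source_field, destination_field) pairs.
    Simplified model: the destination receives the source value as-is.
    """
    indexed = dict(document)
    for source, destination in copy_rules:
        if source in document:
            indexed[destination] = document[source]
    return indexed


def total_characters(document):
    """Crude size proxy: total characters across all field values."""
    return sum(len(str(value)) for value in document.values())


document = {"title": "Quarterly report", "author": "jsmith"}
indexed = apply_copy_fields(document, [("title", "titleSearch")])
print(total_characters(document), total_characters(indexed))
```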

Do not use dynamic fields in a production index

Dynamic fields allow fields to be automatically added to an index collection schema. Dynamic fields let you learn about the fields that exist in your data without having to manually configure an index collection schema.

However, many of these automatically added fields might not be useful for your end users. Additionally, these automatically added fields might be given the wrong types.

When creating an index collection for use in production, select the Basic option for the initial schema. With this option, the index collection schema contains no dynamic fields.

Use the smallest field types possible for numerical values

A field's type determines how much space is reserved in an index for storing that field's values. For example, if a field called serialNumber has a type of long, 64 bits are reserved for each document in which the field appears.

However, if the largest value for the serialNumber field requires fewer than 32 bits, the field type should be int. Using long for that field unnecessarily reserves space in your index.

For more information, see Field types.
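One way to check which type a numeric field needs is to scan a representative sample of its values. The following Python sketch assumes that int is a signed 32-bit type and long is a signed 64-bit type; confirm the exact ranges in Field types before relying on it. The function name and sample values are hypothetical.

```python
# Assumed ranges: signed 32-bit for int, signed 64-bit for long
INT_MIN, INT_MAX = -2**31, 2**31 - 1
LONG_MIN, LONG_MAX = -2**63, 2**63 - 1


def smallest_integer_type(values):
    """Return "int" or "long", whichever is the smallest type that fits."""
    values = list(values)
    if not values:
        raise ValueError("need at least one sample value")
    lowest, highest = min(values), max(values)
    if INT_MIN <= lowest and highest <= INT_MAX:
        return "int"
    if LONG_MIN <= lowest and highest <= LONG_MAX:
        return "long"
    raise ValueError("values do not fit in a 64-bit integer")


# Made-up serialNumber samples
print(smallest_integer_type([1001, 58231, 2_000_000_000]))
print(smallest_integer_type([1001, 5_000_000_000]))
```

A sample only tells you about the values it contains, so leave headroom if the field's values are expected to grow over time.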

Follow this example procedure to create an index for use in production

Creating an index collection can be a long process, involving many rounds of editing, verification, and reediting before you arrive at a search index that is efficient and meets your users' needs.

You can use this example procedure as a guide in creating an index to suit your users' needs.

Procedure

  1. Create a data connection to allow the system to access the data you want to index.

    If you have a very large volume of data, start with a small sample that's representative of the overall set.

    See Creating data connections.

  2. Decide on a processing pipeline to use. Either create a new one or use the built-in default pipeline.

    See Processing pipelines and stages.

    Note
    • When a document exits a pipeline during a workflow task, all of its streams are added to the index. To minimize the amount of data added to an index, add a Filter stage to the end of your workflow pipeline. Configure the stage to whitelist only one stream per document. For an example, see Default pipeline.
    • Beginning a field name with a dollar sign ($) causes the field to be deleted when the workflow pipeline finishes processing the associated document. Therefore, the field is not indexed. Use this technique to prevent unnecessary fields from being indexed. For example, you can add a field called $meetsCondition to a document to satisfy a conditional statement later on in the pipeline, but the field might not include any valuable information for your users to search on.
  3. Create a workflow.

  4. Add the data connection and your processing pipeline to the workflow.

  5. Run the workflow task.

  6. Create a new index collection with an initial schema of Basic.

  7. Review the fields discovered by the task.

  8. From the list of fields discovered by the task, import the ones you want into the index collection schema.

  9. Add the index collection to the workflow and run the workflow task again.

    As the task runs, it sends documents to the index collection. The index collection schema is used to determine how to add those documents to a search index that you can query.

  10. Examine and test the search index by:

    • Using the Query tab on the Index Details page. This lets you view the raw JSON that the search engine returns in response to a query. For information, see Querying an index.
    • Using the Search App. This lets you test your users' search experience. For information, see the Search App.
    • Viewing size and performance data about the search index. For information, see Viewing index collections and statistics.
  11. Edit your processing pipelines to add any additional fields you want in your index.

    For example, to allow your users to sort search results based on document creation date, each document needs a field that contains that information. For more information, see Search result sorting.

  12. Edit the index collection schema as needed.

    For example, to allow your users to sort search results based on document creation date, you need to add that field to the index collection schema and configure it to support the Sort on field use case.

  13. Update the search index with your new changes. To do this, you need to start the workflow task over.

  14. Repeat steps 10 through 13 as necessary.
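The two notes in step 2 describe what is left of a document by the time it reaches the index. The following Python sketch models that outcome for a document held as plain dictionaries. The finalize_document function, the stream names, and the field names are hypothetical; in the product, the Filter stage and the handling of $-prefixed fields do this work for you.

```python
def finalize_document(fields, streams, keep_stream):
    """Model what remains of a document when it leaves the pipeline.

    - Fields whose names begin with "$" are dropped, so they are not indexed.
    - Only the one stream named by keep_stream is kept, which mimics a
      Filter stage at the end of the pipeline.
    """
    indexed_fields = {
        name: value for name, value in fields.items() if not name.startswith("$")
    }
    indexed_streams = {
        name: data for name, data in streams.items() if name == keep_stream
    }
    return indexed_fields, indexed_streams


# Made-up document: "$meetsCondition" is only used by a conditional in the pipeline
fields = {"title": "Invoice 42", "$meetsCondition": True}
streams = {"rawContent": b"%PDF-1.7 ...", "extractedText": b"Invoice 42 ..."}

kept_fields, kept_streams = finalize_document(fields, streams, "extractedText")
print(sorted(kept_fields), sorted(kept_streams))
```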

Use the Index action in your pipeline to test the effects of pipeline stages on your index

When you configure the Execute Action stage in a pipeline to run the Index action, all documents that pass through the stage are immediately indexed. By adding Execute Action stages at various points in your pipeline and configuring each to point to a different test index, you can look at each test index to examine the effects of your pipeline stages on indexed documents.
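The idea of checkpointing a pipeline into separate test indexes can be pictured with a small simulation. In the following Python sketch, stages are plain functions and each test index is a dictionary entry; none of these names are part of the product, and the sketch only illustrates how comparing checkpoints shows what each stage changed.

```python
import copy


def run_with_checkpoints(document, stages):
    """Run stages in order, snapshotting the document at each checkpoint.

    stages: list of (stage_function, test_index_name or None).
    A non-None name stands in for an Execute Action stage that runs the
    Index action against that test index right after the stage.
    """
    test_indexes = {}
    current = copy.deepcopy(document)
    for stage_function, test_index_name in stages:
        current = stage_function(current)
        if test_index_name is not None:
            test_indexes[test_index_name] = copy.deepcopy(current)
    return current, test_indexes


def diff_fields(before, after):
    """Summarize which fields were added, removed, or changed."""
    shared = set(before) & set(after)
    return {
        "added": sorted(set(after) - set(before)),
        "removed": sorted(set(before) - set(after)),
        "changed": sorted(name for name in shared if before[name] != after[name]),
    }


# Made-up stages
def add_year(doc):
    return {**doc, "year": doc["created"][:4]}


def lowercase_title(doc):
    return {**doc, "title": doc["title"].lower()}


source = {"title": "Annual Report", "created": "2019-03-01"}
final, indexes = run_with_checkpoints(
    source, [(add_year, "test-after-year"), (lowercase_title, "test-after-title")]
)
print(diff_fields(indexes["test-after-year"], indexes["test-after-title"]))
```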

 
