
Hitachi Vantara Knowledge

For writing content classes

Have Hitachi Content Intelligence create content properties for you

The fastest and simplest way to create a content class for XML or JSON files is to provide some sample JSON or XML from which Hitachi Content Intelligence can automatically extract fields. Ideally, provide sample JSON or XML that represents every field that appears in your XML and JSON documents. That way, you don't have to manually write any content property field expressions.
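Hitachi Content Intelligence performs this extraction for you. To illustrate the idea only, the following Python sketch walks a parsed JSON sample and derives one JSONPath expression per leaf field. The helper derive_json_properties, the $-rooted dot notation, and the rule that builds field names from the path (for example, patient_name) are assumptions for illustration, not the product's actual algorithm or naming scheme.

```python
from __future__ import annotations

import json
from typing import Any


def derive_json_properties(sample: Any, path: str = "$") -> dict[str, str]:
    """Return {field_name: JSONPath} for every leaf value in a parsed JSON sample.

    Hypothetical helper. Assumes object keys are simple identifiers, so dot
    notation is valid JSONPath for them.
    """
    properties: dict[str, str] = {}
    if isinstance(sample, dict):
        for key, value in sample.items():
            properties.update(derive_json_properties(value, f"{path}.{key}"))
    elif isinstance(sample, list):
        # Assumption: every array element has the same shape as the first one.
        if sample:
            properties.update(derive_json_properties(sample[0], f"{path}[*]"))
    else:
        # Leaf value: build a field name from the path,
        # e.g. $.patient.name -> patient_name, $.visits[*].date -> visits_date.
        name = path.lstrip("$").replace("[*]", "").strip(".").replace(".", "_")
        properties[name or "value"] = path
    return properties


sample = json.loads(
    '{"patient": {"name": "A. Singh", "age": 42},'
    ' "physician": {"name": "Dr. Rao"},'
    ' "visits": [{"date": "2020-01-01"}]}'
)
for name, expression in derive_json_properties(sample).items():
    print(name, expression)
# patient_name $.patient.name
# patient_age $.patient.age
# physician_name $.physician.name
# visits_date $.visits[*].date
```

A field that is missing from the sample never produces a property, which is why the sample should cover every field your documents contain.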

For more information, see Creating and editing content classes.

Keep content property types in separate content classes

If you have both XML and JSON content properties, you should put them in separate content classes to improve pipeline performance.

If you include both kinds of content properties in the same content class, the class attempts to apply both sets of properties to every document it processes. As a result, the class wastes time trying to extract fields from XML documents using JSONPath expressions, and from JSON documents using XPath expressions.

Additionally, when you add a content class stage to a pipeline, you should surround the stage with a conditional statement that allows only the applicable document type to reach the stage.
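To make the cost concrete, the Python sketch below counts how many expressions are evaluated when one content class holds both XML and JSON properties, versus when each type has its own class behind a conditional. The ContentProperty type, the Content_Type field that the conditional checks, and the MIME-type mapping are hypothetical stand-ins, not Hitachi Content Intelligence configuration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ContentProperty:
    kind: str        # "XML" or "JSON"
    name: str        # field the property creates
    expression: str  # XPath or JSONPath; not evaluated in this sketch


# Hypothetical mapping from a document's MIME type to a content property type.
KIND_BY_MIME = {
    "application/xml": "XML",
    "text/xml": "XML",
    "application/json": "JSON",
}


def evaluations_combined(properties: list[ContentProperty], documents: list[dict]) -> int:
    """One class with both kinds: every property is tried on every document."""
    return len(properties) * len(documents)


def evaluations_separate(properties: list[ContentProperty], documents: list[dict]) -> int:
    """One class per kind, each behind a conditional that admits only that kind."""
    classes: dict[str, list[ContentProperty]] = {}
    for prop in properties:
        classes.setdefault(prop.kind, []).append(prop)
    total = 0
    for doc in documents:
        kind = KIND_BY_MIME.get(doc.get("Content_Type"))  # the conditional
        total += len(classes.get(kind, []))  # unmatched documents skip every class
    return total


props = [
    ContentProperty("XML", "doctor", "/xml/physician/name"),
    ContentProperty("XML", "doctor", "/xml/doctor/name"),
    ContentProperty("JSON", "doctor", "$.physician.name"),
    ContentProperty("JSON", "patient", "$.patient.name"),
]
docs = [{"Content_Type": "application/xml"}] * 3 + [{"Content_Type": "application/json"}] * 2

print(evaluations_combined(props, docs))  # 20
print(evaluations_separate(props, docs))  # 10
```

With four properties and five documents, the combined class makes 20 evaluations, while the separate, gated classes make 10, because each document meets only the two properties that can match it.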

Normalize field names within a content class

Your data might contain fields that hold identical information under different names. For example, a field called doctor might have the same value as a field called physician.

You can use a content class to normalize these field names as you extract them. To do this, create content properties with the same names but different expressions. For example, these content properties both extract information to a field called doctor:

Type    Name      Expression
XML     doctor    /xml/physician/name
XML     doctor    /xml/doctor/name
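The following Python sketch mimics the two content properties in the table: both write to doctor, so whichever expression matches a given document supplies the value. It uses xml.etree.ElementTree, which supports only a subset of XPath, through a hypothetical evaluate helper that handles simple absolute paths. Collecting values into a list, so that a document matching both expressions keeps both values, is an assumption; this section does not say how the product combines them.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# The two content properties from the table: same name, different expressions.
PROPERTIES = [
    ("XML", "doctor", "/xml/physician/name"),
    ("XML", "doctor", "/xml/doctor/name"),
]


def evaluate(root: ET.Element, expression: str) -> list[str]:
    """Evaluate a simple absolute path such as /xml/doctor/name.

    Hypothetical helper: ElementTree rejects absolute paths on elements, so
    this checks the first step against the root tag and runs the rest as a
    relative path.
    """
    steps = expression.strip("/").split("/")
    if steps[0] != root.tag:
        return []
    relative = "/".join(["."] + steps[1:])
    return [el.text for el in root.findall(relative) if el.text]


def extract(xml_text: str) -> dict[str, list[str]]:
    """Apply every XML property; properties that share a name feed one field."""
    root = ET.fromstring(xml_text)
    fields: dict[str, list[str]] = {}
    for kind, name, expression in PROPERTIES:
        if kind != "XML":
            continue
        values = evaluate(root, expression)
        if values:
            fields.setdefault(name, []).extend(values)
    return fields


print(extract("<xml><physician><name>Dr. Rao</name></physician></xml>"))
# {'doctor': ['Dr. Rao']}
print(extract("<xml><doctor><name>Dr. Kim</name></doctor></xml>"))
# {'doctor': ['Dr. Kim']}
```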

Alternatively, you can add a Mapping stage to your pipeline to normalize field names. See Mapping stage.
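As a rough model of the renaming a Mapping stage is used for here, this Python sketch rewrites field names after extraction using a dictionary of old names to new names. FIELD_MAP and map_fields are hypothetical; see Mapping stage for the options the stage actually provides.

```python
from __future__ import annotations

# Hypothetical field-name mapping: source field -> normalized field.
FIELD_MAP = {"physician": "doctor"}


def map_fields(document: dict[str, str], field_map: dict[str, str]) -> dict[str, str]:
    """Return a copy of document with fields renamed per field_map.

    If two source fields map to the same target name, the first one
    encountered keeps its value and later ones are dropped.
    """
    mapped: dict[str, str] = {}
    for name, value in document.items():
        mapped.setdefault(field_map.get(name, name), value)
    return mapped


print(map_fields({"physician": "Dr. Rao", "patient": "A. Singh"}, FIELD_MAP))
# {'doctor': 'Dr. Rao', 'patient': 'A. Singh'}
```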
