For writing content classes
Have Hitachi Content Intelligence create content properties for you
The fastest and simplest way to create a content class for XML or JSON files is to provide sample XML or JSON from which Hitachi Content Intelligence can automatically extract fields. Ideally, the sample should include every field that appears in your documents. That way, you don't have to write any content property expressions manually.
For more information, see Creating and editing content classes.
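The following Python sketch illustrates why the sample should cover every field. It is a rough model of field discovery, not how Hitachi Content Intelligence implements it: the function name `discover_fields`, the sample document, and the generated JSONPath-style strings are all assumptions made for illustration.

```python
import json


def discover_fields(sample, path="$"):
    """Return {field_name: jsonpath_like_expression} for every leaf value.

    Hypothetical helper: it names each field after the last path segment,
    so two leaves with the same key would overwrite each other.
    """
    fields = {}
    if isinstance(sample, dict):
        for key, value in sample.items():
            fields.update(discover_fields(value, f"{path}.{key}"))
    elif isinstance(sample, list):
        # Use a wildcard so the expression covers every array element.
        for item in sample:
            fields.update(discover_fields(item, f"{path}[*]"))
    else:
        # Leaf value: derive the field name from the last path segment.
        name = path.rsplit(".", 1)[-1].replace("[*]", "")
        fields[name] = path
    return fields


# Made-up sample document; note that it contains no "allergies" field.
sample = json.loads(
    '{"patient": {"name": "A. Smith", "visits": [{"date": "2020-01-01"}]},'
    ' "doctor": "B. Jones"}'
)
fields = discover_fields(sample)
for name, expression in fields.items():
    print(name, expression)
```

In this model, a field that is missing from the sample (here, a hypothetical `allergies` field) is never discovered, so its expression would have to be written by hand.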
Keep content property types in separate content classes
If you have both XML and JSON content properties, you should put them in separate content classes to improve pipeline performance.
If you include both kinds of content properties in the same content class, the class attempts to apply both sets of properties to every document it processes. This means the class spends time unnecessarily trying to extract fields from XML documents using JSONPath, and from JSON documents using XPath.
Additionally, when you add a content class stage to a pipeline, you should surround the stage with a conditional statement that allows only the applicable document type to reach the stage.
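The routing described above can be modeled in a few lines of Python. This is only a conceptual sketch: in the product, pipelines and conditionals are configured rather than coded, and the names used here (`content_type`, `body`, `run_pipeline`, the two property sets) are hypothetical. The JSON lookup is a simple key walk standing in for JSONPath, and the XML paths are written relative to the root element because Python's `ElementTree` supports only a subset of XPath.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical content classes, one per document type.
XML_PROPERTIES = {"doctor": "doctor/name"}        # path relative to the root
JSON_PROPERTIES = {"doctor": ("doctor", "name")}  # sequence of keys to follow


def apply_xml_class(body, properties):
    """Extract fields from an XML body using ElementTree path expressions."""
    root = ET.fromstring(body)
    out = {}
    for name, expression in properties.items():
        node = root.find(expression)
        if node is not None and node.text:
            out[name] = node.text
    return out


def apply_json_class(body, properties):
    """Extract fields from a JSON body by walking nested keys."""
    data = json.loads(body)
    out = {}
    for name, keys in properties.items():
        value = data
        for key in keys:
            if not isinstance(value, dict) or key not in value:
                value = None
                break
            value = value[key]
        if value is not None:
            out[name] = value
    return out


def run_pipeline(document):
    """Send each document only to the content class that matches its type."""
    if document["content_type"] == "application/xml":
        return apply_xml_class(document["body"], XML_PROPERTIES)
    if document["content_type"] == "application/json":
        return apply_json_class(document["body"], JSON_PROPERTIES)
    return {}  # Other document types skip both content classes.


xml_doc = {"content_type": "application/xml",
           "body": "<xml><doctor><name>Jones</name></doctor></xml>"}
json_doc = {"content_type": "application/json",
            "body": '{"doctor": {"name": "Jones"}}'}
print(run_pipeline(xml_doc), run_pipeline(json_doc))
```

Because of the conditional, no XML document is ever parsed as JSON and no JSON document is ever parsed as XML, which is the wasted work the separate content classes are meant to avoid.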
Normalize field names within a content class
Your data might contain fields that contain identical information but have different names. For example, a field called doctor might have the same value as a field called physician.
You can use a content class to normalize these field names as you extract them. To do this, create content properties that have the same name but different expressions. For example, these content properties both extract information to a field called doctor:
| Type | Name | Expression |
| --- | --- | --- |
| XML | doctor | /xml/physician/name |
| XML | doctor | /xml/doctor/name |
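As a minimal sketch of the effect, the Python below applies two expressions that write to the same field name. It is an illustration only, not the product's implementation. The sample records and the `extract_fields` helper are hypothetical, the paths are written relative to the root `xml` element because `ElementTree` does not accept absolute paths on elements, and collecting matches into a list is an assumption about how multiple values might be handled.

```python
import xml.etree.ElementTree as ET

# Two content properties with the same name but different expressions.
# /xml/physician/name and /xml/doctor/name, relative to the root element.
CONTENT_PROPERTIES = [
    ("doctor", "physician/name"),
    ("doctor", "doctor/name"),
]


def extract_fields(xml_text):
    """Apply every content property and gather values under its field name."""
    root = ET.fromstring(xml_text)
    fields = {}
    for name, expression in CONTENT_PROPERTIES:
        for node in root.findall(expression):
            if node.text:
                fields.setdefault(name, []).append(node.text.strip())
    return fields


# Made-up records that use different element names for the same information.
record_a = "<xml><physician><name>Dr. Lee</name></physician></xml>"
record_b = "<xml><doctor><name>Dr. Patel</name></doctor></xml>"

print(extract_fields(record_a))
print(extract_fields(record_b))
```

Both records end up with a single `doctor` field, regardless of which element name the source document used.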
Alternatively, you can add a Mapping stage to your pipeline to normalize field names. See Mapping stage.