You can use content classes to make your files more searchable. Each content class identifies bits of information in your documents and extracts that information as fields. These fields can then be indexed and used in searches.
Each content class contains one or more content properties. Each content property includes:
- A query expression for finding and extracting information from a document. These expressions can be XPath, JSONPath, or regular expressions (regex). You can write these query expressions yourself, or, for XPath and JSONPath expressions, have Hitachi Content Intelligence generate them for you based on sample XML or JSON that you provide.
- A field name to associate with the extracted information. The extracted information becomes a field/value pair that is added to a document as it passes through a pipeline.
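As a rough mental model, a content class can be pictured as a named list of content properties, each pairing a field name with an expression. The Python sketch below is illustrative only and is not the Hitachi Content Intelligence data model or API: the `ContentClass` and `ContentProperty` classes, the `apply_regex` helper, the type labels, and the `systolicValue` field are all hypothetical, and only the regex case is implemented.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ContentProperty:
    field_name: str       # name of the field added to the document
    expression_type: str  # hypothetical labels: "xpath", "jsonpath", or "regex"
    expression: str       # the query expression itself


@dataclass
class ContentClass:
    name: str
    properties: list = field(default_factory=list)


def apply_regex(prop, text):
    """Return a (field name, value) pair, or None when nothing matches."""
    match = re.search(prop.expression, text)
    if match is None:
        return None
    # Prefer the first capture group; fall back to the whole match.
    value = match.group(1) if match.groups() else match.group(0)
    return prop.field_name, value


# A hypothetical content class with a single regex-based property.
blood_pressure = ContentClass(
    name="bloodPressure",
    properties=[ContentProperty("systolicValue", "regex", r"<systolic>(\d+)")],
)

pair = apply_regex(blood_pressure.properties[0], "<systolic>107,mm[Hg]</systolic>")
print(pair)
```

The returned tuple corresponds to the field/value pair described above; a real pipeline would attach it to the document rather than print it.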
Take for example this XML document of a patient's blood pressure results:
<xml>
  <date>2012-09-17</date>
  <patient>
    <name>John Smith</name>
    <age>56</age>
    <sex>M</sex>
  </patient>
  <diastolic>60,mm[Hg]</diastolic>
  <systolic>107,mm[Hg]</systolic>
  <assessment>low</assessment>
</xml>
And these content properties:
| Type | Field name | Expression |
|---|---|---|
| XML | overFifty | boolean(/xml/patient/age > 50) |
Applied to the sample document, the content class adds fields such as these:
lastName: Smith
overFifty: true
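To make the extraction concrete, here is a minimal Python sketch that derives comparable fields from the sample document. It makes several assumptions: it is not how Hitachi Content Intelligence evaluates content properties; the standard library's `xml.etree.ElementTree` supports only a subset of XPath, so the `boolean(/xml/patient/age > 50)` comparison is emulated in Python; and the rule for `lastName` (the last whitespace-separated token of the name) is hypothetical, because the expression for that field is not shown here.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<xml>
  <date>2012-09-17</date>
  <patient>
    <name>John Smith</name>
    <age>56</age>
    <sex>M</sex>
  </patient>
  <diastolic>60,mm[Hg]</diastolic>
  <systolic>107,mm[Hg]</systolic>
  <assessment>low</assessment>
</xml>"""


def over_fifty(root):
    """Emulate boolean(/xml/patient/age > 50); missing or non-numeric ages give False."""
    age_text = root.findtext("patient/age", default="")
    try:
        return float(age_text) > 50
    except ValueError:
        return False


def extract_fields(xml_text):
    root = ET.fromstring(xml_text)  # root is the <xml> element
    name_parts = root.findtext("patient/name", default="").split()
    return {
        # Assumed rule: treat the last token of the name as the last name.
        "lastName": name_parts[-1] if name_parts else "",
        "sex": root.findtext("patient/sex", default=""),
        "overFifty": str(over_fifty(root)).lower(),
    }


fields = extract_fields(SAMPLE)
print(fields)
```

Once values like these exist as separate fields, a search index can match on them directly instead of relying on the raw XML text.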
If you don't use content classes to extract additional fields from this XML, your users are limited in how they can search for this information. For example, users would not be able to retrieve the sample XML document shown above with either of these queries, even though the document appears to satisfy both:
- Retrieve files for patients with the last name Smith
- Retrieve files for all male patients over the age of 50
Using content classes, you can extract lastName, sex, and overFifty as searchable fields.
You use a content class by adding a Content Class Extraction stage to a pipeline. When you run the pipeline as part of a workflow task, the Content Class Extraction stage uses the information from the content class to extract fields from the documents that pass through the stage.
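The flow of documents through such a stage can be sketched as a Python generator. This is a simplification under stated assumptions, not the product's implementation: documents are modeled as plain dictionaries with a `content` string, the content class is reduced to a hypothetical mapping of field names to regular expressions, and the function name `content_class_extraction_stage` is invented for illustration.

```python
import re

# Hypothetical content class: field name -> regex with one capture group.
CONTENT_CLASS = {
    "assessment": r"<assessment>(\w+)</assessment>",
    "sex": r"<sex>(\w+)</sex>",
}


def content_class_extraction_stage(documents, content_class):
    """Yield each document with any matching fields added."""
    for doc in documents:
        enriched = dict(doc)  # copy so the input document is not mutated
        for field_name, pattern in content_class.items():
            match = re.search(pattern, doc.get("content", ""))
            if match:
                enriched[field_name] = match.group(1)
        yield enriched


docs = [
    {
        "id": "bp-1",
        "content": "<xml><patient><sex>M</sex></patient>"
                   "<assessment>low</assessment></xml>",
    },
    {"id": "note-1", "content": "no structured data here"},
]

results = list(content_class_extraction_stage(docs, CONTENT_CLASS))
for doc in results:
    print(doc["id"], {k: v for k, v in doc.items() if k not in ("id", "content")})
```

Documents that match no expression pass through unchanged, so the stage can sit in a pipeline that handles mixed content.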