

Content classes

You can use content classes to make your files more searchable. Each content class identifies bits of information in your documents and extracts that information as fields. These fields can then be indexed and used in searches.

How content classes work

Each content class contains one or more content properties. Each content property includes:

  • A query expression for finding and extracting information from a document. These expressions can be XPath, JSONPath, or regular expressions (regex). You can write these query expressions yourself, or, for XPath and JSONPath expressions, have Hitachi Content Intelligence generate them for you based on sample XML or JSON that you provide.
  • A field name to associate with the extracted information. The extracted information becomes a field/value pair that is added to a document as it passes through a pipeline.
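A content property can be modeled as a query expression paired with a field name. The Python sketch below shows the regular-expression case on the blood pressure values from the sample document. It is an illustration, not the product's API: the `ContentProperty` class, the `apply_content_class` function, and the `diastolic`/`systolic` field names are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class ContentProperty:
    """Hypothetical model of a content property: a query expression plus a field name."""
    field_name: str
    pattern: str  # a regular expression; capture group 1 becomes the field value

    def extract(self, text: str) -> dict[str, str]:
        # Return a one-entry {field: value} dict, or {} if the expression finds nothing.
        match = re.search(self.pattern, text)
        return {self.field_name: match.group(1)} if match else {}


def apply_content_class(properties: list[ContentProperty], text: str,
                        document: dict[str, str]) -> dict[str, str]:
    # Each property adds at most one field/value pair to a copy of the document.
    enriched = dict(document)
    for prop in properties:
        enriched.update(prop.extract(text))
    return enriched


content = "<diastolic>60,mm[Hg]</diastolic><systolic>107,mm[Hg]</systolic>"
blood_pressure = [
    ContentProperty("diastolic", r"<diastolic>(\d+),"),
    ContentProperty("systolic", r"<systolic>(\d+),"),
]
doc = apply_content_class(blood_pressure, content, {"filename": "bp.xml"})
print(doc)  # {'filename': 'bp.xml', 'diastolic': '60', 'systolic': '107'}
```

A property whose expression finds nothing simply adds no field, so the document passes through unchanged for that property.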

Take, for example, this XML document containing a patient's blood pressure results:

<xml>
    <date>2012-09-17</date>
    <patient>
        <name>
            <firstName>John</firstName>
            <lastName>Smith</lastName>
        </name>
        <age>56</age>
        <sex>M</sex>
    </patient>
    <diastolic>60,mm[Hg]</diastolic>
    <systolic>107,mm[Hg]</systolic>
    <assessment>low</assessment>
</xml>

And these content properties:

Type  Name       Expression (XPath)
XML   lastName   /xml/patient/name/lastName
XML   overFifty  boolean(/xml/patient/age > 50)

When applied to the sample XML above, the content properties yield these field/value pairs:

lastName: Smith
overFifty: true
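To make the extraction step concrete, here is a minimal Python sketch that evaluates the two content properties with the standard library's `xml.etree.ElementTree`. It is not how Hitachi Content Intelligence evaluates expressions. It assumes the `name` element has `firstName` and `lastName` children, which the `lastName` expression requires. Because ElementTree supports only a subset of XPath (no comparisons or `boolean()`), the `overFifty` test is done in Python. The function name `extract_fields` is illustrative only.

```python
import xml.etree.ElementTree as ET

# Sample record, assuming <name> has <firstName>/<lastName> children.
SAMPLE = """<xml>
    <date>2012-09-17</date>
    <patient>
        <name>
            <firstName>John</firstName>
            <lastName>Smith</lastName>
        </name>
        <age>56</age>
        <sex>M</sex>
    </patient>
    <diastolic>60,mm[Hg]</diastolic>
    <systolic>107,mm[Hg]</systolic>
    <assessment>low</assessment>
</xml>
"""


def extract_fields(xml_text: str) -> dict[str, object]:
    root = ET.fromstring(xml_text)  # root is the <xml> element
    fields: dict[str, object] = {}

    # lastName: /xml/patient/name/lastName
    # ElementTree paths are relative to the root, so the leading /xml is dropped.
    last_name = root.findtext("patient/name/lastName")
    if last_name is not None:
        fields["lastName"] = last_name

    # overFifty: boolean(/xml/patient/age > 50)
    # A missing <age> yields False, as the XPath comparison would.
    age = root.findtext("patient/age")
    fields["overFifty"] = age is not None and float(age) > 50

    return fields


print(extract_fields(SAMPLE))  # {'lastName': 'Smith', 'overFifty': True}
```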
Without content classes

If you don't use content classes to extract additional fields from this XML, your users are limited in how they can search for this information. For example, users could not retrieve the sample XML document above with either of these queries, even though the document appears to satisfy both:

Retrieve files for patients with the last name Smith:

lastName:Smith

Retrieve files for all male patients over the age of 50:

+sex:M +overFifty:true

Using content classes, you can extract lastName, sex, and overFifty as searchable fields.
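The difference comes down to which fields exist in the index. The Python sketch below is a deliberately simplified, hypothetical matcher for space-separated `field:value` terms, where a leading `+` marks a required term; it compares values as exact strings and ignores analysis, wildcards, and ranges, so it only approximates a real search engine. The `plain` and `enriched` field sets are assumptions chosen to mirror the example.

```python
def matches(query: str, fields: dict[str, str]) -> bool:
    """Simplified, hypothetical matcher for field:value terms.

    '+term' is required; unprefixed terms are optional. If any term is
    required, all required terms must match; otherwise at least one
    optional term must match.
    """
    required: list[tuple[str, str]] = []
    optional: list[tuple[str, str]] = []
    for term in query.split():
        name, _, value = term.lstrip("+").partition(":")
        if term.startswith("+"):
            required.append((name, value))
        else:
            optional.append((name, value))

    def hit(name: str, value: str) -> bool:
        # A term can only match if the field was indexed for this document.
        return fields.get(name) == value

    if required:
        return all(hit(n, v) for n, v in required)
    return any(hit(n, v) for n, v in optional)


# Hypothetical fields without a content class: only basic metadata.
plain = {"filename": "bp-2012-09-17.xml"}
# Hypothetical fields after content class extraction.
enriched = {**plain, "lastName": "Smith", "sex": "M", "overFifty": "true"}

for q in ("lastName:Smith", "+sex:M +overFifty:true"):
    print(q, matches(q, plain), matches(q, enriched))
# lastName:Smith False True
# +sex:M +overFifty:true False True
```

Neither query can match `plain`, because the fields it names were never extracted.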

Putting content classes to use

You use a content class by adding a Content Class Extraction stage to a pipeline. When you run the pipeline as part of a workflow task, the Content Class Extraction stage uses the information from the content class to extract fields from the documents that pass through the stage.
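Conceptually, a pipeline passes each document through its stages in order, and a content class extraction stage applies every property of one content class to each document. The Python sketch below models that flow under stated assumptions: stages are plain functions over dict documents, the XML lives in a hypothetical `content` field, and `content_class_extraction`, `run_pipeline`, and `patient_class` are illustrative names, not product APIs.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional

Document = dict[str, str]
Stage = Callable[[Document], Document]
Extractor = Callable[[ET.Element], Optional[str]]


def content_class_extraction(content_class: dict[str, Extractor]) -> Stage:
    """Hypothetical stage that applies every property of one content class."""
    def stage(doc: Document) -> Document:
        root = ET.fromstring(doc["content"])  # assumes an XML "content" field
        out = dict(doc)
        for field_name, extract in content_class.items():
            value = extract(root)
            if value is not None:  # properties that find nothing add no field
                out[field_name] = value
        return out
    return stage


def run_pipeline(stages: list[Stage], documents: list[Document]) -> list[Document]:
    """Pass each document through every stage in order, as a workflow task would."""
    results = []
    for doc in documents:
        for stage in stages:
            doc = stage(doc)
        results.append(doc)
    return results


# Hypothetical content class with three properties (field name -> extractor).
patient_class: dict[str, Extractor] = {
    "lastName": lambda r: r.findtext("patient/name/lastName"),
    "sex": lambda r: r.findtext("patient/sex"),
    "overFifty": lambda r: str(int(r.findtext("patient/age", "0")) > 50).lower(),
}

docs = [{"content": "<xml><patient><name><lastName>Smith</lastName></name>"
                    "<age>56</age><sex>M</sex></patient></xml>"}]
result = run_pipeline([content_class_extraction(patient_class)], docs)
print({k: v for k, v in result[0].items() if k != "content"})
# {'lastName': 'Smith', 'sex': 'M', 'overFifty': 'true'}
```

Modeling a stage as a function from document to document keeps stages composable, so other stages could run before or after the extraction stage.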
