
Content classes and content properties


A content class is a named construct that is used to characterize objects in one or more namespaces. Content classes use object metadata to impose structure on unstructured namespace content. They do this through content properties.

A content property is a named construct used to extract an element or attribute value from custom metadata that's well-formed XML. Content properties use XPath expressions to identify the metadata of interest. When content properties are indexed, users can use them to find unstructured content that matches structured patterns.

For example, consider the following XML structure that could occur in the custom metadata for multiple objects in a namespace that contains medical data:

<doctor>
    <name>doctor-name</name>
</doctor>
<patient>
    <name>patient-name</name>
</patient>

The information of interest in this custom metadata consists of the doctor’s name and the patient’s name.

Based on the metadata structure above, you could create content properties named Doctor_Name and Patient_Name that extract the doctor’s name and patient’s name from the custom metadata XML for each object. The metadata query engine could then index objects with this metadata structure by those property values. Using the metadata query API or the Metadata Query Engine Console, users could query for objects that have Doctor_Name or Patient_Name equal to a specific value.
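
As a rough illustration of what these two properties extract, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It is not HCP code. The record wrapper element, the sample names, and the extract_properties helper are assumptions, added so that the fragment parses as a single well-formed document.

```python
import xml.etree.ElementTree as ET

# Hypothetical custom metadata. The <record> root is an assumption added so
# that the fragment is a single well-formed XML document.
CUSTOM_METADATA = """
<record>
    <doctor><name>Lee Green</name></doctor>
    <patient><name>Paris Black</name></patient>
</record>
"""

# Property name -> element path relative to the root (ElementTree syntax).
PROPERTIES = {
    "Doctor_Name": "doctor/name",
    "Patient_Name": "patient/name",
}

def extract_properties(xml_text):
    """Return a dict mapping each property name to the value it extracts."""
    root = ET.fromstring(xml_text)
    return {name: root.findtext(path) for name, path in PROPERTIES.items()}

values = extract_properties(CUSTOM_METADATA)
print(values)
```

A query for Doctor_Name equal to a specific value then amounts to comparing against the extracted string rather than searching the raw XML.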

Content properties belong to content classes. Both content classes and content properties are defined at the tenant level. Content classes are optionally associated with namespaces. Through this association, content properties are associated with namespaces.

© 2015, 2019 Hitachi Vantara Corporation. All rights reserved.

Metadata query engine indexing of custom metadata


By default, when custom metadata indexing is enabled for a namespace, the metadata query engine indexes the content properties for that namespace and not the full text of custom metadata. If the namespace doesn’t have any content properties (that is, it’s not associated with any content classes that have content properties), no custom metadata is indexed.

You can choose to have the metadata query engine index the full text of custom metadata. If you enable this option, the metadata query engine indexes both content properties, if any exist, and the full text of custom metadata.

With content properties, the metadata query engine indexes only the values that you determine are of interest. When indexing the full text of custom metadata, the metadata query engine indexes each word individually.

For example, suppose an object has this XML in its custom metadata:

<doctor>
    <name>Lee Green</name>
</doctor>
<patient>
    <name>Paris Black</name>
</patient>

If you’ve defined the Doctor_Name and Patient_Name properties, the metadata query engine index includes:

Lee Green
Paris Black

If full text indexing is enabled, the metadata query engine index includes:

doctor name Lee Green name doctor patient name Paris Black name patient

In this case, to use the metadata query API to find the objects that have a doctor named Lee Green, users would need to query for custom metadata containing “doctor.name.Lee Green.name.doctor”. This kind of query can become very complex when elements are nested to deeper levels or when they have attributes.
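
The difference between the two index contents can be approximated with a short Python sketch. This only mimics the word list shown above by treating markup characters as separators; how the metadata query engine actually tokenizes full text is not described here, so the regular expression is an assumption.

```python
import re

CUSTOM_METADATA = """<doctor>
    <name>Lee Green</name>
</doctor>
<patient>
    <name>Paris Black</name>
</patient>"""

def full_text_terms(xml_text):
    """Split raw XML into words, treating markup characters as separators."""
    return re.findall(r"[A-Za-z0-9_]+", xml_text)

terms = full_text_terms(CUSTOM_METADATA)
print(" ".join(terms))
```

Element names and values end up interleaved in one word sequence, which is why a full-text query has to spell out the surrounding element names.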


Content class and content property workflow


Here’s the basic procedure for working with content classes and content properties:

1. Create one or more content classes for the tenant. Give each class a meaningful name. For example, if a class will contain object properties that pertain to medical images, you could name it DICOM. (DICOM is a standard for managing medical images.)

A tenant can have at most 25 content classes.

For more information, see Creating a content class.

2. Create content properties for each content class. Create only the content properties that will be useful to metadata query API and Search Console users. Creating content properties that won’t be used unnecessarily increases the size of the metadata query engine index.

A content class can have at most 100 content properties.

For more information, see Content property definitions and Managing content properties for a content class.

3. If custom metadata indexing isn’t already enabled for the namespaces you plan to associate with the content classes, enable it. For more information, see Setting search and indexing options.

4. Associate namespaces with the applicable content classes. For clarity, associate a namespace with a content class only if the namespace contains objects that can be characterized by the content properties in the content class.

You can associate any number of namespaces with a content class. Additionally, a namespace can be associated with any number of content classes.

For more information, see Changing the namespaces associated with a content class.

5. Optionally, reindex some or all of the namespaces associated with the content classes. You would reindex a namespace if you want objects that were already in the namespace to be indexed by the new content properties.

You can reindex namespaces starting from the time they were created or starting from a specific date. When reindexing a namespace, the metadata query engine reindexes all objects with a change time that’s equal to or later than the time you specify.

Tip: Because reindexing can take a long time, before reindexing a namespace:

Create all the content properties you want for the namespace

Associate all the content classes containing those properties with the namespace

For more information, see Reindexing namespaces associated with a content class and Reindexing an individual namespace.


Content property definitions


The definition of a content property consists of:

A name for the property.

The XPath expression that identifies the property values.

The data type of the property values.

For numeric and datetime data types, the format of the property values.

An indication of whether the property is single-valued or multivalued. A multivalued property can have multiple values for any given object.

The examples of content property definitions in the following sections are based on this sample custom metadata XML:

<?xml version="1.0" ?>
<dicom_image>
    <image type="MRI">
        <date>09/27/2012</date>
        <technician>Morgan Grey</technician>
    </image>
    <doctor>
        <name>Lee Green</name>
        <office>ABC Oncology</office>
        <address>
            <address1>Anytown Medical Building</address1>
            <address2>1 Main Street</address2>
            <city>Anytown</city>
            <state>MA</state>
            <zip>02000</zip>
        </address>
        <specialties>
            <specialty primary="true">Oncology</specialty>
            <specialty>Internal Medicine</specialty>
        </specialties>
    </doctor>
    <patient>
        <id>243789</id>
        <name>Paris Black</name>
        <address>
            <address1>10 Elm Street</address1>
            <address2/>
            <city>Anytown</city>
            <state>MA</state>
            <zip>02000</zip>
        </address>
    </patient>
    <followup_needed>true</followup_needed>
</dicom_image>


Content property names


When you define a content property, you specify a name for it. Content property names must be from one through 25 characters long, can contain only alphanumeric characters and underscores (_), and are case sensitive. White space is not allowed.

Content property names should be intuitive for users of the metadata query API and Metadata Query Engine Console. For example, for the property that extracts the name of the doctor from the sample custom metadata, you should use a name like Doctor_Name rather than a name like dname.

Content properties with the same name

You can use the same name for multiple content properties as long as those properties have the same data type. For example, suppose the custom metadata for some objects includes a physician element instead of a doctor element, like this:

    <physician>
        <name>Lee Green</name>
        <office>ABC Oncology</office>
        <address>
            <address1>Anytown Medical Building</address1>
            <address2>1 Main Street</address2>
            <city>Anytown</city>
            <state>MA</state>
            <zip>02000</zip>
        </address>
        <specialties>
            <specialty primary="true">Oncology</specialty>
            <specialty>Internal Medicine</specialty>
        </specialties>
    </physician>

You could define two content properties named Doctor_Name, one with an XPath expression that includes the doctor element, the other with an XPath expression that includes the physician element.

Within a content class, content properties with the same name must have the same data type. For information on data types, see Content property data types.

Reserved words

The following words are reserved and cannot be used as content property names:

accessTime
accessTimeString
acl
changeTimeMilliseconds
changeTimeString
customMetadata
customMetadataAnnotation
dpl
gid
hash
hashScheme
hold
index
ingestTime
ingestTimeString
namespace
objectPath
operation
owner
permission
replicated
retention
retentionClass
retentionString
shred
size
type
uid
urlName
updateTime
updateTimeString
utf8Name
version
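
The naming rules and the reserved-word list can be checked with a small Python sketch. The is_valid_property_name helper is hypothetical, and two details are assumptions: that "alphanumeric" means ASCII letters and digits, and that reserved words are compared with exact case.

```python
import re

RESERVED_WORDS = {
    "accessTime", "accessTimeString", "acl", "changeTimeMilliseconds",
    "changeTimeString", "customMetadata", "customMetadataAnnotation", "dpl",
    "gid", "hash", "hashScheme", "hold", "index", "ingestTime",
    "ingestTimeString", "namespace", "objectPath", "operation", "owner",
    "permission", "replicated", "retention", "retentionClass",
    "retentionString", "shred", "size", "type", "uid", "urlName",
    "updateTime", "updateTimeString", "utf8Name", "version",
}

# 1 through 25 characters, alphanumerics and underscores only (ASCII assumed).
NAME_PATTERN = re.compile(r"[A-Za-z0-9_]{1,25}")

def is_valid_property_name(name):
    """Return True if the name satisfies the length, character, and reserved-word rules."""
    return NAME_PATTERN.fullmatch(name) is not None and name not in RESERVED_WORDS

for candidate in ("Doctor_Name", "Doctor Name", "size", "a" * 26):
    print(candidate, is_valid_property_name(candidate))
```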


Content property expressions


For each content property you create, you specify an XPath expression. An XPath expression is an instruction for navigating an XML document to find an element or attribute value.

XPath expressions use the XPath language. HCP supports the full syntax of this language. The examples in this section illustrate only a small part of the XPath syntax.

You can learn more about XPath expressions at:

http://www.w3schools.com/xpath

XPath expressions that find element values

Here’s a simple XPath expression that finds the value of the followup_needed element:

/dicom_image/followup_needed

The forward slash (/) at the beginning of the expression means that the first element is the root element in the XML. The element after the second forward slash is a child of the root element.

Here’s another simple XPath expression:

//name

This expression is probably not very useful. The double slash at the beginning means find the value of any name element, regardless of whether that element is a child of the doctor element or the patient element.

A more useful XPath expression specifies a path to the name element:

/dicom_image/doctor/name

This expression means start at the root element, find the doctor element that’s the child of the root element, and then find the name element that’s the child of the doctor element. A content property with this expression finds only the name of the doctor, not the name of the patient.

A different content property with this XPath expression finds only the patient’s name:

/dicom_image/patient/name

The element path in an XPath expression can go deeper than the three levels shown above. Here’s an XPath expression that’s four levels deep and finds the city in which the doctor’s office is located:

/dicom_image/doctor/address/city
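
To experiment with element paths like these outside HCP, the following Python sketch evaluates them against a trimmed copy of the sample XML. It uses xml.etree.ElementTree, which supports only a subset of XPath and does not accept absolute paths on an element, so the hypothetical element_values helper rewrites each expression relative to the root. This is an approximation for simple paths, not the evaluator HCP uses.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<dicom_image>
    <doctor>
        <name>Lee Green</name>
        <address><city>Anytown</city></address>
    </doctor>
    <patient>
        <name>Paris Black</name>
    </patient>
    <followup_needed>true</followup_needed>
</dicom_image>"""

def element_values(root, xpath):
    """Evaluate a simple element path and return the text of each match."""
    if xpath.startswith("//"):
        relative = "." + xpath              # search the whole document
    else:
        prefix = "/" + root.tag
        if xpath != prefix and not xpath.startswith(prefix + "/"):
            return []                       # path does not start at this root
        relative = "." + xpath[len(prefix):]
    return [elem.text for elem in root.findall(relative)]

root = ET.fromstring(SAMPLE)
for expression in ("/dicom_image/followup_needed", "//name",
                   "/dicom_image/doctor/name", "/dicom_image/doctor/address/city"):
    print(expression, element_values(root, expression))
```

The //name expression returns both the doctor's and the patient's names, which shows why a full path is usually the better choice.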

XPath expressions that find attribute values

To find the value of an attribute, you include an at sign (@) followed by the attribute name at the end of the XPath expression. For example, here’s an XPath expression that finds the value of the type attribute of the image element:

/dicom_image/image/@type
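
As a sketch of the same lookup in Python: xml.etree.ElementTree cannot return an attribute value from a path on its own, so the hypothetical attribute_value helper below first finds the element and then reads the attribute from it. The sample is a trimmed copy of the sample XML.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<dicom_image>
    <image type="MRI">
        <date>09/27/2012</date>
    </image>
</dicom_image>"""

def attribute_value(root, element_path, attribute):
    """Return the named attribute of the first element matching the path, or None."""
    elem = root.find(element_path)
    return None if elem is None else elem.get(attribute)

root = ET.fromstring(SAMPLE)
image_type = attribute_value(root, "image", "type")
print(image_type)
```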

Complex XPath expressions

XPath expressions can be much more complex than the ones shown so far. For example, an XPath expression can navigate XML based on the values of elements and attributes. Here’s an expression that finds the name of a doctor whose primary specialty is oncology:

/dicom_image/doctor/specialties/specialty[@primary='true' and text()='Oncology']/ancestor::doctor/name

This expression navigates down from the doctor element to the specialty elements and finds the one that has both a value of Oncology and a primary attribute with a value of true. The expression then navigates back up to the same doctor element and from there down to the name element that’s the child of the doctor element.
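
The logic of this expression can be spelled out step by step in Python. The standard xml.etree.ElementTree module does not support the ancestor axis or combined predicates, so this sketch performs the equivalent navigation by hand; the primary_specialist_names helper is hypothetical and is not how HCP evaluates XPath.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<dicom_image>
    <doctor>
        <name>Lee Green</name>
        <specialties>
            <specialty primary="true">Oncology</specialty>
            <specialty>Internal Medicine</specialty>
        </specialties>
    </doctor>
</dicom_image>"""

def primary_specialist_names(root, specialty):
    """Return names of doctors whose primary specialty equals the given value."""
    names = []
    for doctor in root.findall("doctor"):
        # Step down to the specialty elements flagged as primary.
        primaries = doctor.findall("specialties/specialty[@primary='true']")
        # Keep the doctor only if one of them has the requested value.
        if any(elem.text == specialty for elem in primaries):
            # "Navigate back up" by reading the name from the same doctor.
            names.append(doctor.findtext("name"))
    return names

root = ET.fromstring(SAMPLE)
print(primary_specialist_names(root, "Oncology"))
print(primary_specialist_names(root, "Internal Medicine"))
```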

Annotation-specific content properties

You can associate content properties with annotation names. When a content property is associated with an annotation name, the metadata query engine indexes the value of that property only when the value occurs in an annotation with the specified name.

To associate a content property with an annotation name, you specify the case-sensitive annotation name in front of the XPath expression for the property, in this format:

@annotation-name:xpath-expression

For example, suppose:

The objects in a namespace can have either of two annotations — one named dicom, the other named appointment

Both of these annotations have a date element

Depending on who created the dicom annotation, the date element in it might be a child of the root element or a child of the image element

To find the value of the date element in the dicom annotation, you would need to use this XPath expression:

//date

This expression means find a date element that occurs anywhere in the XML. Without more context, it applies equally to the dicom and appointment annotations.

To have the content property with this XPath expression apply only to the dicom annotation, you would specify the expression this way:

@dicom://date
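
The prefix format can be illustrated with a short Python sketch that separates the annotation name from the XPath expression. The split_annotation helper is hypothetical, and it assumes annotation names contain neither colons nor forward slashes, which this section does not state.

```python
import re

# "@annotation-name:xpath-expression"; the name is assumed to contain
# neither ":" nor "/".
PREFIX = re.compile(r"@([^:/]+):(.+)", re.DOTALL)

def split_annotation(expression):
    """Return (annotation_name, xpath); the name is None when there is no prefix."""
    match = PREFIX.fullmatch(expression)
    if match is None:
        return None, expression
    return match.group(1), match.group(2)

print(split_annotation("@dicom://date"))
print(split_annotation("//date"))
```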


Content property data types


Each content property has a data type that determines how the property values are treated by the metadata query engine. The possible data types are:

String — The metadata query engine indexes the value as a text string. The value is handled as a single unit, even if it contains white space. Users cannot base queries on individual terms within a string value.

Tokenized — The metadata query engine indexes the value as a text string after breaking it into tokens. A token is a string of either alphabetic or numeric characters. For example, the value SSN123456789 becomes this string of two tokens: ssn 123456789. Tokens are not case sensitive.

The metadata query engine treats white space and special characters as token separators. For example, the value 12A Elm Street, apt. 2D becomes this string of seven tokens: 12 a elm street apt 2 d.

Users can base queries on any individual token or sequence of tokens within a tokenized string.

Boolean — The metadata query engine indexes the value as true or false. Values that start with 1, t, or T are treated as true. Any other values are treated as false.

Integer — The metadata query engine indexes the value as an integer. Users can base queries on comparative numeric values.

The metadata query engine indexes values for a content property with a data type of integer only if the values conform to the format for the property. For more information, see Format for the integer and float data types.

Float — The metadata query engine indexes the value as a decimal number with or without an exponent, depending on the value. Users can base queries on comparative numeric values.

The metadata query engine indexes values for a content property with a data type of float only if the values conform to the format for the property. For more information, see Format for the integer and float data types.

Datetime — The metadata query engine indexes the value as a date and time. Users can base queries on comparative datetime values.

The metadata query engine indexes values for a content property with a data type of datetime only if the values conform to the format for the property. For more information, see Datetime data type formats.
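
The tokenized and Boolean rules described above can be approximated in Python. This is a sketch of the stated rules only; the tokenize and to_boolean helpers are hypothetical, and treating letters as ASCII only is an assumption.

```python
import re

def tokenize(value):
    """Break a value into lowercase runs of letters or runs of digits."""
    return [token.lower() for token in re.findall(r"[A-Za-z]+|[0-9]+", value)]

def to_boolean(value):
    """Treat values that start with 1, t, or T as true; anything else is false."""
    return value[:1] in ("1", "t", "T")

print(tokenize("12A Elm Street, apt. 2D"))
print(tokenize("SSN123456789"))
print(to_boolean("true"), to_boolean("yes"))
```

Because letters and digits form separate tokens, a value such as 12A yields two tokens even though it contains no white space.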


Format for the integer and float data types


For a content property with the integer or float data type, you can specify a format that values need to match in order to be indexed. The following sections include basic information about these formats. You can find more information at:

http://docs.oracle.com/javase/6/docs/api/java/text/DecimalFormat.html

Integer data type formats

The basic format for a content property with the integer data type is:

optional-prefixnumber-patternoptional-suffix

A number pattern for the integer data type consists of any number of number signs (#), followed by any number of zeroes. Both number signs and zeroes represent any number of digits, including none. The metadata query engine does not consider the length of the number pattern when matching values.

A number pattern can include a thousands separator. With the integer data type, the metadata query engine recognizes either commas (,) or periods (.) as the thousands separator.

For example, a value of 1234 matches any of these number patterns:

0
000
##
###0000
0,0
##,000

If a content property value contains a thousands separator, the value matches only number patterns that contain the same thousands separator. For example, the value 1,234 matches the last two patterns above, but not the first four. It also does not match 0.0 or ##.000.

The prefix or suffix in the format for the integer data type can be any character string, with a few exceptions. For example, a prefix or suffix cannot include a period (.) or percent sign (%). The format must include white space between the integer pattern and the suffix, if used.

For example, for the metadata query engine to index the value $1234 as an integer, the format for the content property must have a dollar sign ($) in front of the integer pattern, with no space between them.

Here are some examples of integer formats with examples of values that match them:

Format        Example
$ 0,0         $ 1,234
###0 AD       2012 AD
~# mph        ~55 mph

If you don’t specify a format for a content property with the integer data type, the metadata query engine indexes only sequences of digits with no special characters.

Float data type formats

For the format for a content property with the float data type, you can use any of the formats for the integer data type. However, with the float data type, the thousands separator, if used, must be a comma (,).

You can include a period as a decimal separator in the number pattern for the float data type, although this is not required. If you do include it, any number signs (#) must come after any zeroes in the part following the separator.

For example, a value of 1234.5 matches any of these number patterns:

0
00.0
.0
#0.0#
##,000
0,0
#,0.0#

You can also include an exponent character (E) followed by one or more zeroes in the number pattern for the float data type. However, values with an exponent character also match patterns that don’t include the exponent character, and values without an exponent character also match patterns with an exponent character.

For example, a value of 1234E5 matches any of these number patterns:

0
00.0
.0E0
#0.0#E000
##,000E0
0,0
#,0.0E00

You can use a percent sign (%) by itself as the prefix or suffix in the format for the float data type. Before indexing values with a matching percent sign, the metadata query engine converts them to their decimal equivalents. For example, a value of 1234% matches a format of 0% and is indexed as 12.34.

White space is not required between the number pattern and a suffix that’s a percent sign.

If you don’t specify a format for a float data type, the metadata query engine indexes only sequences of digits that optionally include one decimal point.
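
The behavior without a format, and the percent conversion, can be sketched in Python. This follows only the rules stated in this section: the helper names are hypothetical, the regular expressions assume unsigned ASCII digits, and the sketch does not attempt to reproduce full number-pattern matching.

```python
import re
from decimal import Decimal

# With no format: digits only for integers; digits with at most one
# decimal point for floats (signs are assumed not to be accepted).
DEFAULT_INTEGER = re.compile(r"[0-9]+")
DEFAULT_FLOAT = re.compile(r"[0-9]+\.?[0-9]*|\.[0-9]+")

def matches_default_integer(value):
    return DEFAULT_INTEGER.fullmatch(value) is not None

def matches_default_float(value):
    return DEFAULT_FLOAT.fullmatch(value) is not None

def percent_to_decimal(value):
    """Convert a value such as '1234%' to its decimal equivalent."""
    if not value.endswith("%"):
        raise ValueError("value has no percent-sign suffix")
    return Decimal(value[:-1]) / Decimal(100)

print(matches_default_integer("1234"), matches_default_integer("1,234"))
print(matches_default_float("1234.5"), matches_default_float("1.2.3"))
print(percent_to_decimal("1234%"))
```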


Datetime data type formats


For a content property with the datetime data type, you can specify a format that values need to match in order to be indexed. The format consists of a pattern of letters, optional separators, and optional quoted text. The letters represent date or time components, as outlined in the table below. Letters can be repeated, which can affect their meaning.

Letter  Description
G       Represents a valid era indicator, such as AD, BC, or BCE. Repetition has no effect. If a datetime pattern doesn’t include any occurrences of G, the metadata query engine assumes an era of AD for matching values.
y       Represents a year. For matching values with a two-digit year, a pattern that includes y more than twice in a row causes the metadata query engine to interpret the two digits as being preceded by two zeroes rather than by the number that indicates the current century. If a datetime pattern doesn’t include any occurrences of y, the metadata query engine assumes a year of 1970 for matching values.
M       Represents a month. Values that include the month as a number match a pattern that includes M or MM. Values that include the name of the month, either in full or as a three-letter abbreviation, match a pattern that includes three or more occurrences of M in a row. If a datetime pattern doesn’t include any occurrences of M, the metadata query engine assumes a month of January for matching values.
w       Represents the number of the week into the year. Repetition has no effect.
W       Represents the number of the week into the month, where the first week is the week that includes the first day of the month. Repetition has no effect.
D       Represents the number of the day into the year. Repetition has no effect.
d       Represents the number of the day into the month. Repetition has no effect.
F       Represents the number of the week into the month, where the first week starts with the first Sunday in the month. Repetition has no effect.
E       Represents the day of the week. Matching values include the name of the day in full or as a three-letter abbreviation. Repetition has no effect.
a       Represents a valid morning or afternoon indicator, such as AM or pm. Repetition has no effect.
H       Represents the hour on a 24-hour clock, where midnight is represented by zero. Repetition has no effect.
k       Represents the hour on a 24-hour clock, where midnight is represented by 24. Repetition has no effect.
K       Represents the hour on a 12-hour clock, where midnight and noon are represented by zero. Repetition has no effect.
h       Represents the hour on a 12-hour clock, where midnight and noon are represented by 12. Repetition has no effect.
m       Represents the minute into the hour. Repetition has no effect. If a datetime pattern doesn’t include any occurrences of m, the metadata query engine assumes that the number of minutes is zero for matching values.
s       Represents the second into the minute. Repetition has no effect. If a datetime pattern doesn’t include any occurrences of s, the metadata query engine assumes that the number of seconds is zero for matching values.
S       Represents a number of milliseconds past the applicable second. Repetition has no effect.
z       Represents a valid time zone specified as text, such as Eastern Standard Time, EDT, or GMT. Repetition has no effect.
Z       Represents a valid time zone specified as an offset from GMT, formatted as (+|-)nnnn, such as +0500 or -0200. Repetition has no effect.

If a datetime format doesn’t include a representation for:

A day, the metadata query engine assumes that the day is the first day of the applicable month for matching values

An hour, the metadata query engine assumes that the hour is midnight

A time zone, the metadata query engine assumes that the time is in the HCP system time zone

The separators in a datetime format can be any of several different special characters, including forward slashes (/), hyphens (-), colons (:), semicolons (;), at signs (@), and spaces.

To include text in a datetime format, enclose the text in single quotation marks ('). To include a single quotation mark, specify two single quotation marks in a row.

Here are some examples of datetime formats with examples of values that match them:

Format                        Example
MM/dd/yy HH:mm:ss z           03/19/12 14:35:27 EST
hh 'o''clock' a, zzz          2 o'clock PM, Eastern Standard Time
yyyy-MM-dd'T'HH:mm:ss.SSSZ    2012-03-19T14:35:27.236-0400
E., MMM d, yyyy 'at' k:s      Mon., March 19, 2012 at 14:35

If you don’t specify a format for a content property with the datetime data type, the metadata query engine indexes only values that match patterns such as MM/dd/yyyy, MM-dd-yyyy, yyyy-MM-dd, or yyyy-MM-dd'T'HH:mm:ssZ.

You can find more information about datetime formats at:

http://docs.oracle.com/javase/6/docs/api/java/text/SimpleDateFormat.html
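
To try a datetime format against sample values outside HCP, one option is to translate it into Python's strptime directives. The sketch below handles only a small, assumed subset of the pattern letters (yyyy, MM, dd, HH, mm, ss, SSS, Z) plus quoted text; doubled single quotation marks and all other letters are not handled. The to_strptime helper is hypothetical, and Python's parsing rules are not guaranteed to match the metadata query engine's.

```python
import re
from datetime import datetime, timedelta

# Assumed subset of pattern letters mapped to strptime directives.
DIRECTIVES = {
    "yyyy": "%Y", "MM": "%m", "dd": "%d",
    "HH": "%H", "mm": "%M", "ss": "%S", "SSS": "%f", "Z": "%z",
}
TOKEN = re.compile(r"'([^']*)'|yyyy|MM|dd|HH|mm|ss|SSS|Z")

def to_strptime(pattern):
    """Translate a supported datetime format into a strptime format string."""
    def replace(match):
        if match.group(1) is not None:
            # Quoted text is literal; escape percent signs for strptime.
            return match.group(1).replace("%", "%%")
        return DIRECTIVES[match.group(0)]
    return TOKEN.sub(replace, pattern)

image_date = datetime.strptime("09/27/2012", to_strptime("MM/dd/yyyy"))
stamp = datetime.strptime("2012-03-19T14:35:27.236-0400",
                          to_strptime("yyyy-MM-dd'T'HH:mm:ss.SSSZ"))
print(image_date.date(), stamp.isoformat())
```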


Multivalued content properties


A content property is defined as either single-valued or multivalued:

If you define a content property as single-valued, the metadata query engine indexes only the first occurrence of it for any given object, regardless of how many times it occurs in the custom metadata XML for that object.

If you define a content property as multivalued, the metadata query engine indexes all occurrences of it in the custom metadata XML for an object.

For example, based on the sample custom metadata XML, you would define as multivalued a content property that extracts the value of the specialty element.

With the metadata query API, users can sort query results based on single-valued content properties but not on multivalued properties.
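
The difference between the two settings can be shown with a brief Python sketch over the specialty elements from the sample custom metadata XML. The indexed_values helper is hypothetical and only models which occurrences would be kept.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<dicom_image>
    <doctor>
        <specialties>
            <specialty primary="true">Oncology</specialty>
            <specialty>Internal Medicine</specialty>
        </specialties>
    </doctor>
</dicom_image>"""

def indexed_values(root, path, multivalued):
    """Return all matching values, or only the first when single-valued."""
    values = [elem.text for elem in root.findall(path)]
    return values if multivalued else values[:1]

root = ET.fromstring(SAMPLE)
path = "doctor/specialties/specialty"
print(indexed_values(root, path, multivalued=True))
print(indexed_values(root, path, multivalued=False))
```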


Content properties extracted from sample XML


When working with content properties in the Tenant Management Console, you can supply sample well-formed XML and have HCP extract content properties from that XML. You can then select which of those properties you want to add to a content class.

HCP extracts only content properties for XPath expressions that follow a straight path from the root element. These conventions apply to the content property definitions:

The XPath expression always starts from the root element.

The name of a content property that extracts an element value is the name of the element preceded by the name of the parent element.

The name of a content property that extracts an attribute value is the name of the attribute preceded by the name of the element the attribute applies to.

Content property names that would exceed 25 characters in length are truncated to 25 characters by dropping characters from the beginning of the name.

The definitions do not include formats.

The definitions are listed alphabetically by XPath expression.

When adding extracted content properties to a content class, you can change any parts of their definitions.

The table below shows the definitions of the content properties HCP extracts from the sample custom metadata XML.

XPath expression                                    Name                       Data Type  Multivalued
/dicom_image/doctor/address/address1                addressAddress1            String     No
/dicom_image/doctor/address/address2                addressAddress2            String     No
/dicom_image/doctor/address/city                    addressCity                String     No
/dicom_image/doctor/address/state                   addressState               String     No
/dicom_image/doctor/address/zip                     addressZip                 Integer    No
/dicom_image/doctor/name                            doctorName                 String     No
/dicom_image/doctor/office                          doctorOffice               String     No
/dicom_image/doctor/specialties/specialty           specialtiesSpecialty       String     Yes
/dicom_image/doctor/specialties/specialty/@primary  specialtyPrimary           Boolean    No
/dicom_image/followup_needed                        icom_imageFollowup_needed  Boolean    No
/dicom_image/image/@type                            imageType                  String     No
/dicom_image/image/date                             imageDate                  String     No
/dicom_image/image/technician                       imageTechnician            String     No
/dicom_image/patient/address/address1               addressAddress1            String     No
/dicom_image/patient/address/address2               addressAddress2            String     No
/dicom_image/patient/address/city                   addressCity                String     No
/dicom_image/patient/address/state                  addressState               String     No
/dicom_image/patient/address/zip                    addressZip                 Integer    No
/dicom_image/patient/id                             patientId                  Integer    No
/dicom_image/patient/name                           patientName                String     No
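
The naming and ordering conventions can be reproduced with a Python sketch that walks the sample XML. It is a reconstruction from the conventions and the table in this section, not HCP's extraction code. The extract_definitions helper is hypothetical, data type inference is left out, and treating a path as multivalued when it occurs more than once in the sample is an assumption.

```python
import xml.etree.ElementTree as ET
from collections import Counter

SAMPLE = """<dicom_image>
  <image type="MRI"><date>09/27/2012</date><technician>Morgan Grey</technician></image>
  <doctor>
    <name>Lee Green</name><office>ABC Oncology</office>
    <address><address1>Anytown Medical Building</address1><address2>1 Main Street</address2>
      <city>Anytown</city><state>MA</state><zip>02000</zip></address>
    <specialties><specialty primary="true">Oncology</specialty>
      <specialty>Internal Medicine</specialty></specialties>
  </doctor>
  <patient>
    <id>243789</id><name>Paris Black</name>
    <address><address1>10 Elm Street</address1><address2/>
      <city>Anytown</city><state>MA</state><zip>02000</zip></address>
  </patient>
  <followup_needed>true</followup_needed>
</dicom_image>"""

MAX_NAME_LENGTH = 25

def capitalize_first(text):
    return text[:1].upper() + text[1:]

def extract_definitions(root):
    """Map each XPath expression to (property name, multivalued flag)."""
    names = {}
    counts = Counter()

    def record(path, name):
        # Keep the last 25 characters, dropping any excess from the start.
        names[path] = name[-MAX_NAME_LENGTH:]
        counts[path] += 1

    def walk(elem, path, parent_tag):
        for attr in elem.attrib:
            record(f"{path}/@{attr}", elem.tag + capitalize_first(attr))
        children = list(elem)
        if not children and parent_tag is not None:
            record(path, parent_tag + capitalize_first(elem.tag))
        for child in children:
            walk(child, f"{path}/{child.tag}", elem.tag)

    walk(root, "/" + root.tag, None)
    return {path: (names[path], counts[path] > 1) for path in sorted(names)}

definitions = extract_definitions(ET.fromstring(SAMPLE))
for xpath, (name, multivalued) in definitions.items():
    print(xpath, name, multivalued)
```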


Content property files


You can export the content properties for a content class to a file that you can then use to import the properties to another class. The exported file contains XML definitions of the content properties in this format:

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<contentClass>
    <contentProperties>
        <contentProperty>
            <name>property-name</name>
            <expression>xpath-expression</expression>
            <type>data-type</type>
            <multivalued>true-or-false</multivalued>
            <format>format</format>
        </contentProperty>
        .
        .
        .
    </contentProperties>
</contentClass>

Using the same format, you can also create content property files yourself.

Here’s an example of XML that defines some content properties based on the sample custom metadata XML:

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<contentClass>
    <contentProperties>
        <contentProperty>
            <name>Doctor_City</name>
            <expression>/dicom_image/doctor/address/city</expression>
            <type>STRING</type>
            <multivalued>false</multivalued>
            <format />
        </contentProperty>
        <contentProperty>
            <name>Doctor_State</name>
            <expression>/dicom_image/doctor/address/state</expression>
            <type>STRING</type>
            <multivalued>false</multivalued>
            <format />
        </contentProperty>
        <contentProperty>
            <name>Doctor_Name</name>
            <expression>/dicom_image/doctor/name</expression>
            <type>STRING</type>
            <multivalued>false</multivalued>
            <format />
        </contentProperty>
        <contentProperty>
            <name>Doctor_Office</name>
            <expression>/dicom_image/doctor/office</expression>
            <type>STRING</type>
            <multivalued>false</multivalued>
            <format />
        </contentProperty>
        <contentProperty>
            <name>Doctor_Specialty</name>
            <expression>/dicom_image/doctor/specialties/specialty</expression>
            <type>STRING</type>
            <multivalued>true</multivalued>
            <format />
        </contentProperty>
        <contentProperty>
            <name>Followup_Needed</name>
            <expression>/dicom_image/followup_needed</expression>
            <type>BOOLEAN</type>
            <multivalued>false</multivalued>
            <format />
        </contentProperty>
        <contentProperty>
            <name>Image_Type</name>
            <expression>/dicom_image/image/@type</expression>
            <type>STRING</type>
            <multivalued>false</multivalued>
            <format />
        </contentProperty>
        <contentProperty>
            <name>Image_Date</name>
            <expression>/dicom_image/image/date</expression>
            <type>DATE</type>
            <multivalued>false</multivalued>
            <format>MM/dd/yyyy</format>
        </contentProperty>
        <contentProperty>
            <name>Patient_City</name>
            <expression>/dicom_image/patient/address/city</expression>
            <type>STRING</type>
            <multivalued>false</multivalued>
            <format />
        </contentProperty>
        <contentProperty>
            <name>Patient_State</name>
            <expression>/dicom_image/patient/address/state</expression>
            <type>STRING</type>
            <multivalued>false</multivalued>
            <format />
        </contentProperty>
        <contentProperty>
            <name>Patient_ID</name>
            <expression>/dicom_image/patient/id</expression>
            <type>INTEGER</type>
            <multivalued>false</multivalued>
            <format />
        </contentProperty>
        <contentProperty>
            <name>Patient_Name</name>
            <expression>/dicom_image/patient/name</expression>
            <type>STRING</type>
            <multivalued>false</multivalued>
            <format />
        </contentProperty>
    </contentProperties>
</contentClass>
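
A file in this format can be inspected before import with a few lines of Python. The sketch below parses a shortened, two-property file with xml.etree.ElementTree; the load_properties helper is hypothetical, and nothing here calls an HCP interface.

```python
import xml.etree.ElementTree as ET

PROPERTY_FILE = """<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<contentClass>
    <contentProperties>
        <contentProperty>
            <name>Doctor_Specialty</name>
            <expression>/dicom_image/doctor/specialties/specialty</expression>
            <type>STRING</type>
            <multivalued>true</multivalued>
            <format />
        </contentProperty>
        <contentProperty>
            <name>Image_Date</name>
            <expression>/dicom_image/image/date</expression>
            <type>DATE</type>
            <multivalued>false</multivalued>
            <format>MM/dd/yyyy</format>
        </contentProperty>
    </contentProperties>
</contentClass>"""

def load_properties(xml_bytes):
    """Return one dict per contentProperty element in the file."""
    root = ET.fromstring(xml_bytes)
    properties = []
    for prop in root.findall("contentProperties/contentProperty"):
        properties.append({
            "name": prop.findtext("name"),
            "expression": prop.findtext("expression"),
            "type": prop.findtext("type"),
            "multivalued": prop.findtext("multivalued") == "true",
            "format": prop.findtext("format") or None,  # empty element -> None
        })
    return properties

# Parse from bytes so the encoding declaration in the file is honored.
properties = load_properties(PROPERTY_FILE.encode("utf-8"))
for prop in properties:
    print(prop)
```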
