
Hitachi Vantara Knowledge

Documents

A document is a representation of a piece of data. Every piece of data read by a workflow task, whether it's a picture, PDF, audio file or any other type of data, is converted into a document.

Documents contain:

  • Field/value metadata pairs (or fields for short): For example, a medical image can become a document that contains field/value pairs such as doctor:"John Smith" and location:"City Hospital". These fields are the metadata for your files and can be used to construct a search index.
  • Streams: Pointers to data that lives in another location, not within the document itself. The data referenced by a stream can live either locally on one or more Hitachi Content Intelligence instances, or remotely in a data source.

    Streams that point to locally stored data have this format:

    <stream-name>: "X-HCI_local-path=<path-to-stream-tmp-file>"

    Streams that point to remotely stored data have this format:

    <stream-name>: "<pointer-to-stream-location>"

    Streams typically point to large pieces of data that would be prohibitively expensive to include as document fields, such as the full content of a PDF file. Rather than spend system resources passing that data through a pipeline, the system uses streams to read the data from where it lives.
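
    The two parts of a document described above can be sketched as a small data structure. This is only an illustrative model, not the product's API: the Document class, the local_path helper, the stream names, and the example paths and URL are all hypothetical. Only the X-HCI_local-path= prefix and the doctor/location fields come from the text above.

    ```python
    from dataclasses import dataclass, field
    from typing import Dict, Optional

    # Prefix taken from the local stream format shown above.
    LOCAL_PREFIX = "X-HCI_local-path="


    @dataclass
    class Document:
        """Hypothetical model of a document: metadata fields plus stream pointers."""

        fields: Dict[str, str] = field(default_factory=dict)
        streams: Dict[str, str] = field(default_factory=dict)

        def local_path(self, stream_name: str) -> Optional[str]:
            """Return the temp-file path if the stream is stored locally, else None."""
            pointer = self.streams[stream_name]
            if pointer.startswith(LOCAL_PREFIX):
                return pointer[len(LOCAL_PREFIX):]
            return None


    # Example values are made up for illustration.
    doc = Document(
        fields={"doctor": "John Smith", "location": "City Hospital"},
        streams={
            "content": "X-HCI_local-path=/tmp/stream-0001.tmp",    # locally stored data
            "thumbnail": "https://example.com/images/scan-42.png",  # remotely stored data
        },
    )
    ```

    In this sketch the fields hold small values that could be indexed directly, while each stream holds only a pointer string, so the large content itself is never copied into the document.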

 
