A document is a representation of a piece of data. Every piece of data that a workflow task reads, whether it's a picture, PDF, audio file, or any other type of data, is converted into a document. A document is made up of two kinds of components:
- Field/value metadata pairs (or fields for short): For example, a medical image can become a document that contains field/value pairs such as doctor:"John Smith" and location:"City Hospital". These fields are the metadata for your files and can be used to construct a search index.
- Streams: Pointers to data that lives in another location, not within the document itself. The data referenced by a stream can live either locally on one or more Hitachi Content Intelligence instances, or remotely in a data source.
Streams that point to locally stored data have this format:
Streams that point to remotely stored data have this format:
Streams typically point to large pieces of data that would be prohibitively expensive to include as document fields, such as the full content of a PDF file. Rather than spending system resources passing that data through a pipeline, the system uses the stream to read the data from where it lives.
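The split between fields and streams can be illustrated with a minimal Python sketch. This is not the product's API: the `Document` and `Stream` classes, the `read_chunks` method, and the `content` stream name are hypothetical names chosen for illustration, and a local file path stands in for whatever location a real stream would reference. The point is only that small metadata values travel inside the document, while large content stays where it is and is read through a pointer when needed.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, Iterator
import tempfile


@dataclass
class Stream:
    """Hypothetical pointer to data stored outside the document."""
    name: str
    location: Path  # assumption: a local path stands in for the real data location

    def read_chunks(self, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
        # The data is read from where it lives, one chunk at a time,
        # instead of being copied into the document.
        with self.location.open("rb") as handle:
            while chunk := handle.read(chunk_size):
                yield chunk


@dataclass
class Document:
    """Hypothetical document: small metadata fields plus stream pointers."""
    fields: Dict[str, str] = field(default_factory=dict)
    streams: Dict[str, Stream] = field(default_factory=dict)


def build_example(path: Path) -> Document:
    """Build a document for a file, using the field examples from the text."""
    doc = Document()
    doc.fields["doctor"] = "John Smith"
    doc.fields["location"] = "City Hospital"
    # Only a pointer to the file is stored, not the file's bytes.
    doc.streams["content"] = Stream(name="content", location=path)
    return doc


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "scan.pdf"
        source.write_bytes(b"%PDF-1.7 placeholder bytes")
        doc = build_example(source)
        size = sum(len(chunk) for chunk in doc.streams["content"].read_chunks())
        print(doc.fields, size)
```

In this sketch, passing `doc` from one stage to the next copies only the two short field values and the path. The file's bytes are touched only by a stage that actually iterates over `read_chunks`.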