UBI is a new infrastructure that allows applications and storage to interface with each other in a standard, uniform manner. Previously, storage handlers needed some understanding of the data they were storing, which required dedicated logic for each supported application. This complicated and lengthened the development of both applications and storage handlers.
UBI, however, provides a standard way for applications and storage to work together. This required the development of new UBI compliant application and storage handlers.
UBI applications include:
- Oracle RMAN
UBI Storage includes:
- Gen2 Repository
- Gen2 HCP
- Amazon S3
Any UBI compliant application can store data in any UBI compliant storage. UBI compliant storage can forward an instance of data to any other UBI compliant storage.
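The document does not describe UBI's actual programming interface, so the following is only a sketch of the idea under stated assumptions: if every compliant storage exposes the same small contract (the hypothetical `UbiStorage` with `put`, `get` and `keys`), an application can write to any backend, and data can be forwarded from one backend to another without either side knowing what the other is. Forwarding is simplified here to copying every key.

```python
from abc import ABC, abstractmethod


class UbiStorage(ABC):
    """Hypothetical minimal contract assumed for every UBI compliant storage."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...

    @abstractmethod
    def keys(self) -> list[str]: ...


class InMemoryStorage(UbiStorage):
    """Stand-in for a real backend such as a Gen2 Repository, Gen2 HCP or Amazon S3."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._blobs: dict[str, bytes] = {}

    def put(self, key, data):
        self._blobs[key] = data

    def get(self, key):
        return self._blobs[key]

    def keys(self):
        return sorted(self._blobs)


def forward(source: UbiStorage, target: UbiStorage) -> int:
    """Copy every item from one compliant storage to another; return the count copied."""
    count = 0
    for key in source.keys():
        target.put(key, source.get(key))
        count += 1
    return count
```

Because `forward` depends only on the abstract contract, any pair of backends works, which is the property the paragraph above describes.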
Data within UBI Storage is structured as follows:
A Datastore is the UBI abstraction for an entire storage location (e.g. an AWS bucket or an HCP namespace).
A Store is a logical construct. Each instance of an application has its own store; if the application is distributed across many nodes, all of its nodes still share the same store.
Instances hold the data within a store. Each store has a live instance and historical instances; this is analogous to a filesystem that you can read and write but that also has snapshots. The historical instances facility is available on all UBI storage, even cloud storage such as S3, which doesn't have native snapshots. When an instance is removed, data unique to that instance is also removed, providing efficient, automated space reclamation.
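The space reclamation described above can be illustrated with reference counting. This is a sketch under assumptions, not UBI's implementation: each instance is modeled as a set of block hashes, blocks shared between instances are stored once, and removing an instance frees only the blocks no other instance still references. The `Store`, `snapshot` and `remove_instance` names are hypothetical.

```python
from collections import Counter


class Store:
    """Hypothetical store: instances reference shared blocks, tracked by reference count."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}    # block hash -> block data
        self.refs: Counter = Counter()        # block hash -> number of instances using it
        self.instances: dict[str, set[str]] = {}

    def snapshot(self, name: str, blocks: dict[str, bytes]) -> None:
        """Record an instance that references the given blocks (hash -> data)."""
        self.instances[name] = set(blocks)
        for h, data in blocks.items():
            self.blocks.setdefault(h, data)   # shared blocks are stored only once
            self.refs[h] += 1

    def remove_instance(self, name: str) -> list[str]:
        """Drop an instance and reclaim blocks unique to it; return the freed hashes."""
        freed = []
        for h in self.instances.pop(name):
            self.refs[h] -= 1
            if self.refs[h] == 0:             # no other instance needs this block
                del self.refs[h]
                del self.blocks[h]
                freed.append(h)
        return sorted(freed)
```

Removing a historical instance therefore never touches data that the live instance or another snapshot still needs.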
Objects are placeholders for data. They can be hierarchical, and you can attach metadata to them. However, they do not store data directly; data is stored in streams within the object.
Streams are what objects use to store their data. Typically, an object has a “data” stream. Applications may also use other streams to store data relevant to that object, e.g. a security stream or a thumbnail.
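A minimal data model for the object and stream layer might look like the following. It is a sketch, not the UBI format: the `UbiObject` class, its fields and the slash-separated `find` path are assumptions chosen to show that objects form a hierarchy and carry metadata, while the bytes themselves live only in named streams such as “data” or “thumbnail”.

```python
from dataclasses import dataclass, field


@dataclass
class UbiObject:
    """Hypothetical object: a hierarchical placeholder; its data lives only in streams."""
    name: str
    metadata: dict[str, str] = field(default_factory=dict)
    streams: dict[str, bytes] = field(default_factory=dict)   # e.g. "data", "thumbnail"
    children: dict[str, "UbiObject"] = field(default_factory=dict)

    def add_child(self, child: "UbiObject") -> "UbiObject":
        self.children[child.name] = child
        return child

    def find(self, path: str) -> "UbiObject":
        """Walk a slash-separated path such as 'photos/cat.jpg' from this object."""
        node = self
        for part in path.strip("/").split("/"):
            node = node.children[part]
        return node
```

An application that backs up a photo would attach the image bytes to the “data” stream and could store a preview in a separate “thumbnail” stream on the same object.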
The key features of the UBI standard are:
To improve performance, the UBI standard allows a datastore to be accessed concurrently by multiple processes on the same machine and/or by multiple processes across multiple machines. This allows a clustered application to back up from multiple nodes to one store.
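The following sketch shows several writers backing up into one shared store at the same time. It is illustrative only: threads stand in for the separate processes or cluster nodes described above, and the lock is an assumed way of keeping the shared catalog consistent; `SharedStore` and `backup_node` are hypothetical names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SharedStore:
    """Hypothetical store that several writers (threads standing in for nodes) use at once."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._objects: dict[str, bytes] = {}

    def write(self, key: str, data: bytes) -> None:
        with self._lock:  # serialize updates to the shared catalog
            self._objects[key] = data

    def keys(self) -> list[str]:
        with self._lock:
            return sorted(self._objects)


def backup_node(store: SharedStore, node: str, files: list[str]) -> int:
    """Back up one cluster node's files into the shared store; return how many were written."""
    for f in files:
        store.write(f"{node}/{f}", f"contents of {f} on {node}".encode())
    return len(files)


store = SharedStore()
nodes = {"node1": ["a.dbf", "b.dbf"], "node2": ["c.dbf"], "node3": ["d.dbf", "e.dbf"]}
with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
    written = sum(pool.map(lambda item: backup_node(store, *item), nodes.items()))
```

All three nodes end up in one store, which is what lets a clustered application be backed up as a single unit.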
A Recovery Point shows the user that point-in-time recoverable data exists. Typically a Recovery Point maps to an instance within a datastore, but more complex applications could, in theory, use multiple instances to make up recoverable data. The application determines whether a Recovery Point is created. Some applications, like Oracle RMAN, don’t have Recovery Points, because RMAN manages the life cycle of individual portions of a backup without using instances. Recovery Points are stored in a database on the storage but can be searched from the Master.
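As a rough sketch of the rules above (not the real catalog schema), a Recovery Point can be modeled as a record that usually names one instance but may list several, and creating it is left to the application. The `RecoveryPoint`, `RecoveryPointIndex` and `finish_backup` names and the `creates_recovery_points` flag are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class RecoveryPoint:
    """Hypothetical record: usually one instance, but may span several."""
    application: str
    created: datetime
    instances: tuple[str, ...]


class RecoveryPointIndex:
    """Stand-in for the database kept on the storage and searched from the Master."""

    def __init__(self) -> None:
        self._points: list[RecoveryPoint] = []

    def record(self, point: RecoveryPoint) -> None:
        self._points.append(point)

    def search(self, application: str, since: datetime) -> list[RecoveryPoint]:
        return [p for p in self._points
                if p.application == application and p.created >= since]


def finish_backup(index: RecoveryPointIndex, application: str, instances: list[str],
                  when: datetime, creates_recovery_points: bool = True) -> None:
    """The application, not the storage, decides whether a Recovery Point is created."""
    if creates_recovery_points:  # e.g. False for an application like Oracle RMAN
        index.record(RecoveryPoint(application, when, tuple(instances)))
```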
An Item Index adds a layer of indexing that is more granular than the Recovery Point Index. It is primarily used to index large objects. For example, a VMware backup creates an Item Index record for each VM, so the user can search for instances of a VM directly rather than browsing the Recovery Point Index. Like the Recovery Point Index, Item Index records are stored on the storage.
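The benefit of an Item Index can be sketched as a lookup keyed by item name. In this illustration, which is an assumption rather than the actual index layout, each record maps one item (such as a VM) to the Recovery Points that contain it, so finding every copy of a VM is a single lookup instead of a browse through each Recovery Point. `ItemIndex` is a hypothetical name.

```python
from collections import defaultdict


class ItemIndex:
    """Hypothetical item index: item name (e.g. a VM) -> Recovery Points containing it."""

    def __init__(self) -> None:
        self._items: defaultdict[str, list[str]] = defaultdict(list)

    def add(self, item: str, recovery_point: str) -> None:
        self._items[item].append(recovery_point)

    def find(self, item: str) -> list[str]:
        """Return every Recovery Point holding this item without browsing each one."""
        return list(self._items.get(item, []))
```

A VMware backup would call `add` once per VM it protects, after which `find("vm-web01")` answers the user's search directly.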
Source Side Block Deduplication
UBI has built-in source side block deduplication as part of its standard. Before transmitting a block of a given stream to the storage, the infrastructure calculates the block’s checksum and does not send the block if it is already on the storage. This reduces both storage and network transfer overhead.
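A minimal sketch of source side deduplication, assuming fixed-size blocks and SHA-256 checksums (the document does not say which block size or checksum UBI uses): the source hashes each block and only "transmits" blocks whose hash the storage does not already hold. The stream is kept as an ordered list of hashes so it can be reassembled later. `send_stream` and `BLOCK_SIZE` are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size, for illustration only


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    return [data[i:i + size] for i in range(0, len(data), size)]


def send_stream(data: bytes, storage: dict[str, bytes]) -> tuple[list[str], int]:
    """Checksum each block on the source; transmit only blocks the storage lacks.

    Returns the stream's ordered block-hash list and the number of bytes actually sent.
    """
    recipe, sent = [], 0
    for block in split_blocks(data):
        digest = hashlib.sha256(block).hexdigest()
        if digest not in storage:      # in practice, a query against the storage's index
            storage[digest] = block    # "transmit" the block
            sent += len(block)
        recipe.append(digest)
    return recipe, sent
```

Backing up a 12 KiB stream made of two identical 4 KiB blocks plus one distinct block sends only 8 KiB; backing it up again with 100 new bytes appended sends only those 100 bytes.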
Many applications across many OS nodes can access the same storage, but each application can access only its own data within that storage. This includes applications that use cloud storage: with Amazon S3, for instance, an application can access only its own data, even if that data is in the same Amazon bucket as other applications’ data.
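One common way to get this kind of isolation is to confine each application to its own key prefix inside the shared bucket. The sketch below assumes that approach; the document does not state how UBI enforces isolation. `SharedBucket` and `ApplicationView` are hypothetical, and application IDs are assumed not to contain "/".

```python
class SharedBucket:
    """Stand-in for a single cloud bucket used by several applications."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}


class ApplicationView:
    """Hypothetical per-application view: every key is confined to the app's own prefix."""

    def __init__(self, bucket: SharedBucket, app_id: str) -> None:
        self._bucket = bucket
        self._prefix = f"{app_id}/"

    def _key(self, name: str) -> str:
        # Reject names that could climb out of this application's prefix.
        if name.startswith("/") or ".." in name.split("/"):
            raise PermissionError(f"path escapes application scope: {name}")
        return self._prefix + name

    def put(self, name: str, data: bytes) -> None:
        self._bucket.objects[self._key(name)] = data

    def get(self, name: str) -> bytes:
        return self._bucket.objects[self._key(name)]

    def names(self) -> list[str]:
        """List only this application's objects, with the prefix stripped."""
        return sorted(k[len(self._prefix):] for k in self._bucket.objects
                      if k.startswith(self._prefix))
```

Two applications can then write an object with the same name into the same bucket without either one seeing or overwriting the other's copy.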
Cloud Credential Security
The UBI infrastructure assumes that application servers, because they are not under the control of the backup/storage administrator, can be compromised. So, although multiple applications may access the same cloud storage node, an application never receives the cloud credentials, which keeps them secure.
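The design principle above can be sketched as a gateway that sits on the storage side, holds the cloud credentials, and signs requests itself, so applications only ever receive an opaque receipt. This is an illustration under assumptions: `CloudGateway` is hypothetical, the HMAC-SHA256 signature is a simplified stand-in for a real provider's signing scheme (such as AWS Signature Version 4), and Python name mangling only discourages, rather than prevents, in-process access; the real protection is that the gateway runs outside the application server.

```python
import hashlib
import hmac


class CloudGateway:
    """Hypothetical storage-side gateway: it alone holds the cloud credentials."""

    def __init__(self, access_key: str, secret_key: bytes) -> None:
        self.__access_key = access_key
        self.__secret_key = secret_key  # never returned to any application
        self.requests: list[tuple[str, str, str]] = []  # outbound (access key, key, signature)

    def _sign(self, key: str, data: bytes) -> str:
        # Simplified illustrative signature, not a real cloud provider's scheme.
        message = key.encode() + b"\n" + hashlib.sha256(data).digest()
        return hmac.new(self.__secret_key, message, hashlib.sha256).hexdigest()

    def upload(self, app_id: str, name: str, data: bytes) -> str:
        """Called by applications; returns a receipt that contains no credentials."""
        key = f"{app_id}/{name}"
        # Stands in for the signed HTTPS request the gateway sends to the cloud.
        self.requests.append((self.__access_key, key, self._sign(key, data)))
        return f"stored {key}"
```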
Application aware data browsing
When performing a partial recovery of a dataset, the user must be able to browse the contents of a Recovery Point. To do this, an application-specific agent on the Master interprets the data retained on the storage device, so the recovery interface can display recovery information optimized for that application.
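The agent mechanism can be pictured as a registry of per-application interpreters on the Master. The sketch below assumes that model; the `browser` decorator, the `BROWSERS` registry and the JSON "manifest" layout for VMs are all hypothetical. The same raw objects on the storage are turned into a listing that makes sense for the application that wrote them.

```python
import json
from typing import Callable

# Hypothetical registry of application-specific browse agents on the Master.
BROWSERS: dict[str, Callable[[dict[str, bytes]], list[str]]] = {}


def browser(application: str):
    """Register a function that interprets raw stored objects for one application."""
    def register(fn):
        BROWSERS[application] = fn
        return fn
    return register


@browser("filesystem")
def browse_filesystem(objects: dict[str, bytes]) -> list[str]:
    # Filesystem objects already map one-to-one onto files.
    return sorted(objects)


@browser("vmware")
def browse_vmware(objects: dict[str, bytes]) -> list[str]:
    # Assumes each VM has a JSON "manifest" object listing its disks.
    entries = []
    for name, data in sorted(objects.items()):
        if name.endswith("/manifest"):
            vm = name[: -len("/manifest")]
            entries += [f"{vm}: {disk}" for disk in json.loads(data)["disks"]]
    return entries


def browse(application: str, objects: dict[str, bytes]) -> list[str]:
    """Show a Recovery Point's contents using the matching application agent."""
    return BROWSERS[application](objects)
```

Adding support for a new application then means registering one more interpreter, without changing the storage or the other agents.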