This chapter contains considerations that apply to using the HSwift API.
Hostname and IP address considerations
In the URL you use to access HCP, you can specify either a hostname or an IP address. If the HCP system supports DNS and you specify a hostname, HCP selects the IP address for you from the currently available nodes. HCP uses a round-robin method to ensure that it doesn’t always select the same address.
When you specify IP addresses, your application must take responsibility for balancing the load among nodes. Also, you risk trying to connect (or reconnect) to a node that is not available. However, in several cases using explicit IP addresses to connect to specific nodes can have advantages over using hostnames.
These considerations apply when deciding which technique to use:
- If your client uses a hosts file to map HCP hostnames to IP addresses, the client system has full responsibility for converting any hostnames to IP addresses. Therefore, HCP cannot spread the load or prevent attempts to connect to an unavailable node.
- If your client caches DNS information, connecting by hostname may result in the same node being used repeatedly.
- When you access the HCP system by hostname, HCP ensures that requests are distributed among nodes, but it does not ensure that the resulting loads on the nodes are evenly balanced.
- When multiple applications access the HCP system by hostname concurrently, HCP is less likely to spread the load evenly across the nodes than with a single application.
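When an application connects by IP address, it takes on the load-balancing role that HCP otherwise performs. The following is a minimal Python sketch, assuming the application is given a fixed list of node IP addresses. The `NodeSelector` class and the 192.0.2.x documentation addresses are hypothetical and not part of HCP or the HSwift API. The class rotates through the addresses and skips any that the application has marked as unavailable.

```python
import itertools
from typing import Iterable


class NodeSelector:
    """Client-side round-robin over HCP node IP addresses (hypothetical helper).

    Not thread-safe as written; guard calls with a lock if several threads share it.
    """

    def __init__(self, addresses: Iterable[str]) -> None:
        self._addresses = list(addresses)
        if not self._addresses:
            raise ValueError("at least one node address is required")
        self._unavailable: set = set()
        self._cycle = itertools.cycle(self._addresses)

    def mark_unavailable(self, address: str) -> None:
        """Stop handing out an address after a failed connection."""
        self._unavailable.add(address)

    def mark_available(self, address: str) -> None:
        """Put an address back into rotation once the node is reachable again."""
        self._unavailable.discard(address)

    def next_address(self) -> str:
        """Return the next available address in round-robin order."""
        # Look at each address at most once per call before giving up.
        for _ in range(len(self._addresses)):
            candidate = next(self._cycle)
            if candidate not in self._unavailable:
                return candidate
        raise RuntimeError("no available nodes")
```

A caller would create one selector, for example `NodeSelector(["192.0.2.10", "192.0.2.11", "192.0.2.12"])`, call `next_address()` before each request, and call `mark_unavailable()` when a connection to that address fails.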
Directory structures
Because of the way HCP stores objects, the directory structures you create and the way you store objects in them can have an impact on performance. Here are some guidelines for creating effective directory structures:
- Plan your directory structures before storing objects. Make sure all namespace users are aware of these plans.
- Avoid structures that result in a single directory getting a large amount of traffic in a short time. For example, if you ingest objects rapidly, use structures that do not store objects by date and time.
- If you do store objects by date and time, consider the number of objects ingested during a given period of time when planning the directory structure. For example, if you ingest several hundred files per second, you might use a directory structure such as year/month/day/hour/minute/second. If you ingest just a few files per second, a less fine-grained structure would be better.
- Follow these guidelines on directory depth and size:
  - Try to balance the width and depth of the namespace directory tree.
  - Do not create directory structures that are more than 20 levels deep. Instead, create flatter directory structures.
  - Avoid placing a large number of objects (more than 100,000) in a single directory. Instead, create multiple directories and distribute the objects evenly among them.
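The guidelines above can be combined in a path-naming helper. This is a minimal sketch, assuming the client chooses object names and wants a time-based prefix. The `object_path` function, the granularity table, and the `fanout` parameter are hypothetical. The granularity of the time prefix is chosen to match the ingest rate, and a hash-based subdirectory spreads objects stored in the same period across several sibling directories, so that no single directory receives all the traffic for that period or grows past the size guideline.

```python
import hashlib
from datetime import datetime

# Time-based prefixes from coarse to fine; pick one to match the ingest rate.
GRANULARITY = {
    "day": "%Y/%m/%d",
    "hour": "%Y/%m/%d/%H",
    "minute": "%Y/%m/%d/%H/%M",
    "second": "%Y/%m/%d/%H/%M/%S",
}


def object_path(name: str, when: datetime, granularity: str = "hour",
                fanout: int = 16) -> str:
    """Return a hypothetical object path: <time prefix>/<shard>/<name>.

    The shard comes from a hash of the object name, so objects stored in
    the same period are spread roughly evenly over `fanout` sibling
    directories. The resulting depth stays far below 20 levels.
    """
    prefix = when.strftime(GRANULARITY[granularity])
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    shard = int.from_bytes(digest[:4], "big") % fanout
    return f"{prefix}/{shard:02x}/{name}"
```

For example, `object_path("scan-0001.tif", datetime(2024, 5, 1, 13, 45, 7))` yields a path that begins with `2024/05/01/13/`, followed by a two-digit hexadecimal shard and the object name. Raising `fanout` lowers the number of objects per directory; choosing a finer granularity does the same for high ingest rates.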
Concurrent writes of the same object
If two or more clients try to store an object with the same name at the same time, what happens depends on whether versioning is enabled for the target container:
- If versioning is enabled, HCP creates one version of the object for each PUT request. The versions are numbered in the order in which HCP received the requests, regardless of the order in which HCP finished processing the requests.
- If versioning is disabled and the container doesn’t already contain an object with the specified name, HCP creates the object for the first PUT request. In response to each subsequent PUT request, HCP returns a 409 (Conflict) status code and does not create an object. This happens regardless of whether HCP has finished processing the first request.
Failed PUT requests to store objects
A PUT request to store an object fails if either of these happens:
- The target node fails while the object is open for write.
- The TCP connection breaks while the object is open for write (for example, due to a network failure or the abnormal termination of the client application).
Also, in some circumstances, a PUT request fails if HCP system hardware fails while HCP is processing the request.
When a PUT request fails, HCP does not create a new object or object version.
Storing zero-sized files
When you use a PUT request to write a zero-sized file to HCP, the result is an empty object (that is, an object that has no data). Empty objects are WORM and are treated like any other object.
Deleting objects under repair
HCP regularly checks the health of the objects stored in the repository. If an object is found to be unhealthy, HCP tries to repair it.
If you try to delete an object while it is under repair, HCP returns a 409 (Conflict) status code and does not delete the object. In response to such an error, you should wait a few minutes and then try the request again.
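A client can follow this advice with a bounded retry loop. The sketch below assumes a hypothetical `send_delete` callable that issues the DELETE request and returns the HTTP status code. The attempt count and the 180-second default wait are illustrative stand-ins for "a few minutes", not HCP settings. The loop stops after a fixed number of attempts so that a conflict that persists is not retried forever.

```python
import time
from typing import Callable

CONFLICT = 409


def delete_with_retry(send_delete: Callable[[], int],
                      attempts: int = 5,
                      wait_seconds: float = 180.0,
                      sleep: Callable[[float], None] = time.sleep) -> int:
    """Retry a DELETE while HCP answers 409 (Conflict).

    `send_delete` is a hypothetical callable that sends the DELETE request
    and returns the HTTP status code. Returns the last status received.
    """
    status = send_delete()
    for _ in range(attempts - 1):
        if status != CONFLICT:
            break
        sleep(wait_seconds)  # wait "a few minutes" for the repair to finish
        status = send_delete()
    return status
```

The `sleep` parameter exists only so that the wait can be replaced in tests; production code would normally keep the default `time.sleep`.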
Multithreading
HCP lets multiple threads access a container concurrently. Using multiple threads can enhance performance, especially when accessing many small objects across multiple folders.
Here are some guidelines for the effective use of multithreading:
- Concurrent threads, both reads and writes, should be directed against different folders. If that’s not possible, multiple threads working against a single folder is still better than a single thread.
- To the extent possible, concurrent threads should work against different IP addresses. If that’s not possible, multiple threads working against a single IP address is still better than a single thread.
- Only one client can write to a given object at one time. Similarly, a multithreaded client cannot have multiple threads writing to the same object at the same time. However, a multithreaded client can write to multiple objects at the same time.
- Multiple clients can read the same object concurrently. Similarly, a multithreaded client can use multiple threads to read a single object. However, because the reads can occur out of order, you generally get better performance by using one thread per object.
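These guidelines can be sketched with Python's `concurrent.futures` thread pool. The sketch assumes a hypothetical `put_object(address, path)` function that sends one PUT request to a given node and returns the status code; the helper names and the pool size of 16 are also illustrative. It orders the work so that adjacent tasks come from different folders (so tasks running at the same time tend to target different folders), assigns node IP addresses round-robin, and submits each object exactly once, so no two threads write the same object.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import zip_longest
from typing import Callable, Dict, List, Sequence


def interleave_by_folder(paths: Sequence[str]) -> List[str]:
    """Reorder unique paths so that adjacent entries come from different folders."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for path in dict.fromkeys(paths):  # drop duplicates, keep first-seen order
        folder = path.rsplit("/", 1)[0] if "/" in path else ""
        groups[folder].append(path)
    # Take one path from each folder in turn until every folder is exhausted.
    return [p for batch in zip_longest(*groups.values()) for p in batch if p is not None]


def upload_all(paths: Sequence[str], addresses: Sequence[str],
               put_object: Callable[[str, str], int],
               max_workers: int = 16) -> Dict[str, int]:
    """Upload each object as a separate task and return {path: status}.

    `put_object(address, path)` is hypothetical: it sends one PUT request to
    the node at `address` and returns the HTTP status code.
    """
    ordered = interleave_by_folder(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            # Rotate node addresses so concurrent tasks hit different nodes.
            path: pool.submit(put_object, addresses[i % len(addresses)], path)
            for i, path in enumerate(ordered)
        }
        return {path: future.result() for path, future in futures.items()}
```

Using one task per object also matches the advice on reads: a read-side version would submit one GET per object rather than splitting a single object across threads.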
The HSwift API shares a connection pool with the REST, S3 compatible, and WebDAV APIs. HCP has a limit of 255 concurrent connections from this pool, with another 20 queued.
Persistent connections
HCP supports persistent connections. After processing a request, HCP keeps the connection open for 60 seconds so that a subsequent request can use the same connection.
Persistent connections enhance performance because they avoid the overhead of opening and closing multiple connections. In conjunction with persistent connections, using multiple threads so that operations can run concurrently provides still better performance.
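If the persistent connection timeout period is too short for your application, tell your tenant administrator.
Because persistent connections stay open between requests, they count against the connection limit. To avoid exceeding that limit, either don't use persistent connections or ensure that no more than 254 threads are working against a single node at any time.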
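One way to respect the per-node thread limit is a semaphore per node. This minimal sketch assumes that requests are addressed to nodes by IP address, so the client knows which node each request goes to. The `NodeLimiter` class is hypothetical. A thread holds a slot for the target node while it works against that node, and at most 254 slots exist per node by default.

```python
import threading
from contextlib import contextmanager
from typing import Dict, Iterator

MAX_THREADS_PER_NODE = 254  # per-node thread limit when using persistent connections


class NodeLimiter:
    """Caps the number of threads working against each node (hypothetical helper)."""

    def __init__(self, limit: int = MAX_THREADS_PER_NODE) -> None:
        self._limit = limit
        self._lock = threading.Lock()
        self._semaphores: Dict[str, threading.BoundedSemaphore] = {}

    def _semaphore_for(self, node: str) -> threading.BoundedSemaphore:
        # Create each node's semaphore once, under a lock, so that racing
        # threads cannot end up with two different semaphores for one node.
        with self._lock:
            sem = self._semaphores.get(node)
            if sem is None:
                sem = threading.BoundedSemaphore(self._limit)
                self._semaphores[node] = sem
            return sem

    @contextmanager
    def slot(self, node: str) -> Iterator[None]:
        """Block until fewer than `limit` threads are using `node`."""
        with self._semaphore_for(node):
            yield
```

A worker thread would wrap each request in `with limiter.slot("192.0.2.10"):` so that a 255th thread waits instead of opening another connection to that node.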
Connection failure handling
You should retry a request if either of these happens:
- The client cannot establish a connection to the HCP system through the API.
- The connection breaks while HCP is processing a request. In this case, the most likely cause is that the node processing the request became unavailable.
When retrying the request:
- If the original request used the hostname of the HCP system in the URL, repeat the request in the same way.
- If the original request used an IP address, retry the request using either a different IP address or the hostname of the system.
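The retry rules above can be expressed as a small URL-rewriting function. This is a sketch under stated assumptions: `retry_url`, the system hostname `hcp.example.com`, the `/container/object` path, and the node address list are all hypothetical, and only IPv4 node addresses are handled.

```python
import ipaddress
from typing import Sequence
from urllib.parse import urlsplit, urlunsplit


def retry_url(original_url: str, system_hostname: str,
              node_addresses: Sequence[str] = ()) -> str:
    """Pick the URL to use when retrying a request after a connection failure.

    - Original URL used a hostname: repeat the request with the same URL.
    - Original URL used an IP address: switch to a different known node
      address, or fall back to the system hostname if there is none.
    IPv4 node addresses are assumed (IPv6 literals would need brackets).
    """
    parts = urlsplit(original_url)
    host = parts.hostname or ""
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return original_url  # hostname: repeat the request unchanged
    others = [a for a in node_addresses if a != host]
    new_host = others[0] if others else system_hostname
    netloc = new_host if parts.port is None else f"{new_host}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))
```

For example, a request that failed against `https://192.0.2.10/container/object` would be retried against `https://192.0.2.11/container/object` if that address is known, or against `https://hcp.example.com/container/object` otherwise. The port, path, and query string are carried over unchanged.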
If the connection breaks while HCP is processing a GET request, you may not know whether the returned data is all or only some of the object data. In this case, you can check the number of returned bytes against the content length returned in the Content-Length response header. If the numbers match, the returned data is complete.
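The completeness check can be sketched as follows. It assumes the response body is a file-like object with a `read(size)` method, such as `http.client.HTTPResponse`, and that `headers` either supports case-insensitive lookup (as `HTTPResponse.headers` does) or uses the exact `Content-Length` spelling. The `read_complete` helper is hypothetical.

```python
import http.client
from typing import BinaryIO, Mapping, Tuple


def read_complete(body: BinaryIO, headers: Mapping[str, str],
                  chunk_size: int = 64 * 1024) -> Tuple[bytes, bool]:
    """Read a GET response body and report whether it matches Content-Length.

    Returns (data, complete). `complete` is True only when the response
    carried a Content-Length header and exactly that many bytes arrived.
    """
    chunks = []
    try:
        while True:
            chunk = body.read(chunk_size)
            if not chunk:
                break
            chunks.append(chunk)
    except (OSError, http.client.IncompleteRead):
        pass  # the connection broke mid-transfer; keep what did arrive
    data = b"".join(chunks)
    expected = headers.get("Content-Length")
    complete = expected is not None and len(data) == int(expected)
    return data, complete
```

When `complete` is False, the client can discard the partial data and retry the GET according to the retry rules above.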
Session cookie encoding
In the response to a client request, HCP includes a cookie that contains encoded session information.
HCP supports two formats for encoding the session cookie:
- RFC2109: HCP used only this format in releases 5.0 and earlier.
- RFC6265: HCP has used this format by default in all releases after 5.0.
You can use the X-HCP-CookieCompatibility request header to specify the format HCP should use to encode the session cookie. Valid values for this header are RFC2109 and RFC6265.
The X-HCP-CookieCompatibility header is:
- Optional and typically not used for RFC6265
- Required for RFC2109
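For example, a client that still depends on the older cookie format could add the header to each request. This sketch only builds the request with Python's `urllib.request.Request` and does not send it. Authentication headers are omitted, and the URL and the `build_request` helper are hypothetical. `Request` normalizes header names with `str.capitalize()`, which is harmless because HTTP header names are case-insensitive.

```python
import urllib.request
from typing import Optional

VALID_COOKIE_FORMATS = ("RFC2109", "RFC6265")


def build_request(url: str, cookie_format: Optional[str] = None) -> urllib.request.Request:
    """Build (without sending) a GET request, optionally choosing the cookie format.

    Leaving `cookie_format` as None omits the header, so HCP uses its
    default format (RFC6265).
    """
    headers = {}
    if cookie_format is not None:
        if cookie_format not in VALID_COOKIE_FORMATS:
            raise ValueError(f"unsupported cookie format: {cookie_format}")
        headers["X-HCP-CookieCompatibility"] = cookie_format
    return urllib.request.Request(url, headers=headers, method="GET")
```

Calling `build_request("https://hcp.example.com/container/object", "RFC2109")` produces a request carrying the header; calling it without a format produces one without the header.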