May 28, 2025

How to monitor your UltiHash test installation

Discover all the metrics, logs and traces you can track when testing UltiHash
Posted by
Juliette Lehmann
Founder's Associate

Why storage observability matters in AI workflows

Training a model, say, a speech-to-text system, often means processing hours of audio data. Your storage needs to keep up with high-throughput reads, or else GPU utilization drops and training slows down. If things start lagging, you need to figure out whether the problem is in preprocessing, data loading, or storage I/O. That’s where monitoring platforms like Uptrace or Datadog come in. These platforms are built on observability: the ability to collect and analyze metrics, logs, and traces to understand how your system behaves under load. They help surface what’s actually slowing things down.

How UltiHash supports observability

UltiHash offers native support for OpenTelemetry, an open-source framework that captures and transports telemetry data across your stack. It integrates smoothly with systems like Prometheus, Grafana, and Datadog. Metrics and logs flow through a standardized protocol, with no need for custom instrumentation.

With these metrics, teams can:

  • track how storage handles peak access patterns
  • monitor throughput and latency over time
  • debug unexpected slowdowns with precision

You can set up alerts when free space runs low or when data egress exceeds expected thresholds. You can trace slowdowns in model inference or RAG retrieval to object-level read latency. You can finally connect storage performance to application behavior, and act before things fall over.

Next up: a step-by-step guide to enabling observability when testing UltiHash locally with our Docker setup.

A quick guide to monitoring UltiHash with OpenTelemetry

When you're testing UltiHash locally with Docker, you don't need to fly blind. The test setup comes prewired with OpenTelemetry, so you can connect an observability platform and start streaming real metrics, logs, and traces from the get-go. In this guide, we'll show you how to connect it to Uptrace, a platform many of our users have found easy to get started with. No extra setup, no custom tweaks: it all works out of the box.

1. Set up credentials in Terminal

Start by setting up your credentials in the terminal.

docker login registry.ultihash.io -u <registry-login>

You’ll find <registry-login> in your UltiHash dashboard. After that, you’ll need to export your environment variables, also available in your dashboard.

# UH_CUSTOMER_ID and UH_ACCESS_TOKEN grant you access to your UltiHash license
export UH_CUSTOMER_ID="<customer-id>"
export UH_ACCESS_TOKEN="<access-token>"

# AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY grant you access to your UltiHash cluster
export AWS_ACCESS_KEY_ID="FANCY-ROOT-KEY"
export AWS_SECRET_ACCESS_KEY="SECRET"
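
Optionally, you can do a quick sanity check that the variables are set in your current shell before moving on. This is just a convenience step, not part of the official setup; the secret key is left out of the filter so it isn't printed to the screen.

# Confirm the credentials are visible in the current shell
env | grep -E 'UH_CUSTOMER_ID|UH_ACCESS_TOKEN|AWS_ACCESS_KEY_ID'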

2. Download setup files

We've prepared the configuration YAML files you'll need for the setup and put them in the folder available below.

3. Start up UltiHash

Now run the following command to start the UltiHash cluster with the Docker configuration needed to export traces, logs, and metrics to Uptrace.

docker compose up -d
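
If you want to confirm the cluster came up cleanly, a couple of standard Docker Compose commands help (run them from the folder containing the setup files):

# List the services and their current status
docker compose ps

# Tail recent logs if something looks off
docker compose logs --tail=50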

4. Write data to a bucket

Create a bucket and write data to your UltiHash cluster with the following commands:

# Create a bucket
aws s3api create-bucket --bucket <bucket-name> --endpoint-url http://127.0.0.1:8080

# Upload your dataset (if your dataset is a folder)
aws --endpoint-url http://127.0.0.1:8080 s3 cp <path-to-your-data> s3://<bucket-name>/ --recursive

# Upload your dataset (if your dataset is a file)
aws --endpoint-url http://127.0.0.1:8080 s3 cp <path-to-your-data> s3://<bucket-name>/
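
To double-check that the upload landed where you expect, you can list the bucket contents with the same endpoint and placeholder bucket name:

# List the objects in the bucket to confirm the upload
aws --endpoint-url http://127.0.0.1:8080 s3 ls s3://<bucket-name>/ --recursive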

5. Log into Uptrace

In your browser, go to localhost:14318/login. This will redirect you to the Uptrace login page, where you'll find the generic credentials. This address is private and can only be accessed by you (unless you give someone else permission or make the address public).
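
If the login page doesn't load, a quick way to check that Uptrace is reachable from your machine is to hit the port directly from the terminal (assuming curl is installed):

# Check that Uptrace responds on its local port
curl -sI http://localhost:14318/login | head -n 1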

6. Access UltiHash metrics

Once logged in, you'll land on the Uptrace overview dashboard.

From this view, switch to the UltiHash environment, as shown in Image 2: click the Uptrace tab, and a menu will appear with the options Uptrace and UltiHash; select UltiHash.

After this step, you'll have access to the UltiHash environment on Uptrace and will see a dashboard similar to the one below, showing the system overview.

7. Get further details

To dig further into what has happened, go to Traces & Logs, which displays all the traces and logs from your system.

Finally, you can deep-dive into your system's metrics in the Metrics tab (the Metrics Explorer). Every metric UltiHash tracks can be found there.

8. Explore individual metrics

You can go a step further and focus on a single metric, or see how several metrics behave together over time. Image 5 shows how the write requests to UltiHash (storage_write_req) and the storage used (storage_used_space_gauge) behave over time, in this case over the past 30 minutes. As expected, storage_used_space_gauge increases as more write requests (storage_write_req) come in.
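
If you'd like to reproduce this yourself, one simple approach is to generate some write traffic and watch both metrics move in the Metrics Explorer. The loop below is a minimal sketch: it uploads 20 small random files to the placeholder bucket from step 4.

# Upload 20 random ~10 MB files to generate write requests
for i in $(seq 1 20); do
  dd if=/dev/urandom of=/tmp/uh-test-$i.bin bs=1048576 count=10 2>/dev/null
  aws --endpoint-url http://127.0.0.1:8080 s3 cp /tmp/uh-test-$i.bin s3://<bucket-name>/
done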

Voilà! You're all set to test UltiHash and monitor your operations. For more detailed setup instructions, refer to our documentation. Don't hesitate to reach out if you're unsure about the setup or have any questions.

Overview of key storage metrics to track with OpenTelemetry

UltiHash emits detailed, low-level metrics categorized across services. Here’s a sample breakdown of what you’ll be able to see:

Storage Service Requests

These metrics track how the storage layer is being accessed and how often. Monitoring these gives visibility into the system’s I/O behavior, essential when your performance depends on streaming thousands of files quickly and in parallel.

  • storage_read_fragment_req: requests to read a fragment
  • storage_write_req: data write requests
  • storage_sync_req: calls to persist data to disk
  • storage_remove_fragment_req: deletions
  • storage_used_req: requests to check space usage

Entrypoint (S3 API) Requests

These capture every S3-compatible API call handled by UltiHash. It's where you see what your applications are doing (creating buckets, uploading objects, listing contents) and how they're interacting with the storage system; a quick way to exercise these counters is shown after the list below.

  • entrypoint_get_object_req, put_object_req, list_objects_req, etc.: full visibility into every S3-compatible call hitting your cluster
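
One easy way to see these counters move is to run a few different S3 calls against your local cluster; each operation below should show up under the corresponding entrypoint metric (bucket and object names are placeholders):

# Each call maps to an entrypoint request metric
echo "hello ultihash" > /tmp/sample.txt
aws s3api put-object --bucket <bucket-name> --key sample.txt --body /tmp/sample.txt --endpoint-url http://127.0.0.1:8080
aws s3api list-objects-v2 --bucket <bucket-name> --endpoint-url http://127.0.0.1:8080
aws s3api get-object --bucket <bucket-name> --key sample.txt /tmp/sample-copy.txt --endpoint-url http://127.0.0.1:8080
aws s3api delete-object --bucket <bucket-name> --key sample.txt --endpoint-url http://127.0.0.1:8080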

Cache Utilization and Efficiency

These metrics provide insight into how well the system is using its caching layers, both for reducing I/O and speeding up hot-path queries.

  • gdv_l1_cache_hit_counter / miss_counter: performance of L1 in-memory cache
  • gdv_l2_cache_hit_counter / miss_counter: L2 cache stats for a deeper look
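
A rough way to exercise these counters is to read the same object repeatedly: if the object stays hot in the cache after the first read, the hit counters should climb while the miss counters stay roughly flat. That expectation is an assumption about the caching behavior, so treat this as a sketch rather than a benchmark.

# Read the same object 50 times to exercise the cache counters
for i in $(seq 1 50); do
  aws --endpoint-url http://127.0.0.1:8080 s3 cp s3://<bucket-name>/<object-key> /tmp/cache-test.bin
done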

I/O and Resource Monitoring

These give you an overview of system health: how much data is flowing through, how full the system is, and how many connections are being handled at a time.

  • entrypoint_ingested_data_counter: volume of uploaded data
  • entrypoint_egressed_data_counter: data served out of the system
  • storage_available_space_gauge, storage_used_space_gauge: real-time storage usage
  • active_connections: number of concurrent connections handled
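
To watch several of these at once, you can run a few transfers in parallel: uploads drive entrypoint_ingested_data_counter, downloads drive entrypoint_egressed_data_counter, and the concurrent workers should be reflected in active_connections. The snippet below is a sketch with placeholder names.

# Run four parallel downloads to drive egress and concurrent connections
for i in 1 2 3 4; do
  aws --endpoint-url http://127.0.0.1:8080 s3 cp s3://<bucket-name>/ /tmp/restore-$i/ --recursive &
done
wait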

This gives you full visibility into the system’s behavior, from core I/O and API traffic to storage health and caching efficiency.
