UltiHash now supports parity-based storage using Reed-Solomon erasure coding - reducing overhead while keeping data resilient.
Today, we’re introducing a new parity-based storage engine, now available for testing in our v1.4.0 public beta.
This upgrade brings built-in fault tolerance to every UltiHash cluster. If a node fails, data can be automatically reconstructed from parity data, with no downtime, no data loss, and no full-copy replication overhead. It’s a major step forward in making UltiHash more resilient by default.
At its core is Reed-Solomon erasure coding - the gold standard for distributed storage resilience, and the same technology trusted by hyperscale providers like Azure, Backblaze, and others to safeguard exabytes of critical data in production.
We’re excited to open this up for testing and learn how it performs in your real-world environments. Whether you’re pushing throughput or stress-testing reliability, we’d love your feedback.
What’s new
- Store data with Reed-Solomon parity: Data is split into k data shards, with m additional parity shards for recovery. An UltiHash storage cluster consists of k + m data nodes in total, where k determines usable capacity and m is the number of node failures that can be tolerated without data loss.
- Configurable storage groups: Choose how many data nodes to assign per group, which defines your erasure coding setup (k data shards + m parity shards). For example, 6 data + 2 parity tolerates 2 node failures with 33% overhead. Stripe size (the amount of data split across the k data shards) must be divisible by k - see the short sketch after this list.
- Kubernetes-native orchestration: UltiHash maps operational logic to native Kubernetes constructs - using StatefulSets and a distributed key-value store under the hood. It’s fully config-driven: you define the group layout, and Kubernetes takes care of coordination.
- Read availability during failures: Your data remains accessible even if up to m data nodes in a group go down, with no manual failover or recovery needed.
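As a quick illustration of the stripe size rule, here’s a minimal Python sketch (not UltiHash code) that splits a single stripe into k equal data shards - which only works when the stripe size divides evenly by k:

```python
# Illustrative sketch (not UltiHash code): split one stripe into k equal
# data shards. This only works cleanly if the stripe size is divisible by k.
def split_stripe(stripe: bytes, k: int) -> list[bytes]:
    if len(stripe) % k != 0:
        raise ValueError("stripe size must be divisible by k")
    shard_size = len(stripe) // k
    return [stripe[i * shard_size:(i + 1) * shard_size] for i in range(k)]

# e.g. a 384 KiB stripe in a 6 data + 2 parity group: 64 KiB per data shard,
# plus 2 parity shards of the same size computed from them.
shards = split_stripe(b"x" * 384 * 1024, k=6)
print(len(shards), len(shards[0]))  # 6 65536
```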
Why it matters
Traditional replication is simple but inefficient: storing two full copies of your data doubles your storage cost. Erasure coding breaks data into fragments and only stores extra parity shards, often reducing overhead by 50% or more while maintaining fault tolerance.
Replication still has its place: it can be more practical for multi–data center deployments, where data needs to be mirrored across locations to maximize availability. Replicated setups also maintain full throughput even when a node is down; erasure-coded groups may see reduced performance during recovery.
Here’s a table that compares the overhead of traditional 2× replication with various erasure coding schemes - and includes a recommendation for the kind of high-throughput production workloads UltiHash is designed for:

| Configuration | Storage overhead | Notes |
| --- | --- | --- |
| – (replication), 3 total shards | +100% | Tolerates 2 full node failures. Very durable, but extremely inefficient |
| 2 data + 1 parity, 3 total shards | +50% | Tolerates 1 node failure. Lightweight but limited protection |
| 4 data + 2 parity, 6 total shards | +50% | Tolerates 2 node failures. Matches replication durability at half the cost |
| 6 data + 2 parity, 8 total shards | +33% | ✅ Recommended for UltiHash workloads: tolerates 2 node failures with excellent efficiency and resilience |
| 10 data + 4 parity, 14 total shards | +40% | Tolerates 4 node failures. Strong fault tolerance at moderate overhead |
| 12 data + 4 parity, 16 total shards | +33% | Tolerates 4 node failures. Well-balanced for large-scale deployments |
| 12 data + 2 parity, 14 total shards | +17% | Tolerates 2 node failures. High storage efficiency, but slower recovery in degraded state |
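If you want to sanity-check these numbers, the math is simple: overhead is m / k, and a group tolerates up to m node failures. The short Python snippet below (illustrative only, not part of UltiHash) reproduces the erasure coding rows above:

```python
# Storage overhead of a Reed-Solomon group: m parity shards on top of k data
# shards add m / k extra capacity, while tolerating up to m node failures.
schemes = [(2, 1), (4, 2), (6, 2), (10, 4), (12, 4), (12, 2)]

for k, m in schemes:
    print(f"{k} data + {m} parity ({k + m} total shards): "
          f"+{m / k:.0%} overhead, tolerates {m} node failure(s)")
```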
Our implementation supports:
- High availability: Even if multiple nodes fail, your data stays available. For example, in a group with 6 data nodes and 2 parity nodes, all reads continue seamlessly even if 2 nodes go down.
- Custom tradeoffs: Tune k (data shards) and m (parity shards) per group to fit your resilience and performance needs. Use setups like 6 data + 2 parity to balance performance and resilience - a lightweight default that tolerates 2 failures with only 33% overhead. Stripe sizes between 256 and 1024 KiB are recommended depending on object size.
- Uniform storage requirements: All nodes in a group must provide the same logical capacity - otherwise, usable space is capped by the smallest node. This is a logical constraint: nodes aren’t tied to physical disk sizes and can use partitioned storage.
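To make the uniform capacity rule concrete, here’s a small hypothetical helper (not part of UltiHash) showing how a group’s usable capacity follows from its smallest node:

```python
# Hypothetical illustration: every node in a k + m group holds one shard per
# stripe, so the group can only use as much per-node space as its smallest
# member provides - and only the k data shards count as usable capacity.
def usable_capacity_gib(node_capacities_gib: list[int], k: int, m: int) -> int:
    assert len(node_capacities_gib) == k + m, "expect one node per shard"
    return min(node_capacities_gib) * k

# A 6 data + 2 parity group where one node exposes 90 GiB instead of 100 GiB:
print(usable_capacity_gib([100, 100, 100, 100, 100, 100, 100, 90], k=6, m=2))
# -> 540, not 600: usable space is capped by the smallest node
```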
And thanks to a fully declarative configuration model and etcd-backed group coordination, this all integrates seamlessly into your Kubernetes-managed clusters.
How it works
You can configure parity-based storage in UltiHash by defining storage groups in your Helm chart. Each group determines how data is split, stored, and protected - including how many shards are used for usable data (k) and how many for redundancy (m). This setup gives you control over the tradeoff between resilience, overhead, and performance, all using standard Kubernetes primitives.
Here’s an example:
```yaml
storage:
  groups:
    - id: 0
      type: ERASURE_CODING
      storages: 3
      data_shards: 2
      parity_shards: 1
      stripe_size_kib: 256
      size: 5Gi # Storage capacity per shard
      storageClass: local-path
```
- `stripe_size_kib` must be divisible by `data_shards`
- `parity_shards` must be ≤ `data_shards`
- `storages` = `data_shards` + `parity_shards`
- Storage capacity (`size`) is logical - not tied to physical disk size
- Only one storage group (`id: 0`) is currently supported
These settings let you tune how many shards can be lost without losing data (m) and how much overhead you’re willing to trade for resilience.
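These constraints are easy to check before you deploy. Here’s a minimal, hypothetical validation sketch - the field names mirror the Helm values above, but the helper itself is not part of UltiHash:

```python
# Hypothetical pre-deployment check mirroring the Helm storage group fields.
def validate_group(storages: int, data_shards: int, parity_shards: int,
                   stripe_size_kib: int) -> None:
    assert storages == data_shards + parity_shards, \
        "storages must equal data_shards + parity_shards"
    assert parity_shards <= data_shards, \
        "parity_shards must be <= data_shards"
    assert stripe_size_kib % data_shards == 0, \
        "stripe_size_kib must be divisible by data_shards"

# The example group above: 2 data + 1 parity shards, 256 KiB stripes.
validate_group(storages=3, data_shards=2, parity_shards=1, stripe_size_kib=256)
```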
Under the hood
- Data nodes are assigned to storage groups through Kubernetes StatefulSets, which ensure stable identity and placement within each group.
- Each group is coordinated internally to manage writes and ensure consistency across nodes - all handled transparently by the system.
- Data is written in fixed-size shards across the group, with parity shards computed and distributed in parallel.
- Reads in degraded mode automatically reconstruct missing data on the fly from the available shards. Read performance can remain good in some cases, but overall performance will likely be lower than in a healthy state.
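To give a feel for how degraded reads work, here’s a deliberately simplified sketch using a single XOR parity shard (the m = 1 case). It is not the Reed-Solomon arithmetic UltiHash actually uses - Reed-Solomon generalizes the same idea to m parity shards via Galois field math - but it shows how a missing shard can be rebuilt from the survivors:

```python
# Simplified stand-in for the real Reed-Solomon math: with a single XOR parity
# shard (the m = 1 case), any one missing shard can be rebuilt from the others.
def xor_parity(shards: list[bytes]) -> bytes:
    parity = bytearray(len(shards[0]))
    for shard in shards:
        for i, byte in enumerate(shard):
            parity[i] ^= byte
    return bytes(parity)

data = [b"\x01\x02", b"\x0a\x0b", b"\xf0\x0f"]   # k = 3 toy data shards
parity = xor_parity(data)                        # 1 parity shard

# Degraded read: shard 1 is lost, so rebuild it from the surviving shards + parity.
rebuilt = xor_parity([data[0], data[2], parity])
assert rebuilt == data[1]
```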
Each storage group is modeled as a virtual component: a class with shared state distributed via etcd and accessed by all nodes and interfaces in the group.
Limitations
Erasure coding is designed to handle a limited number of node failures within a storage group - typically up to m parity shards’ worth. If more nodes fail than the group is configured to tolerate, data loss will occur.
This means erasure coding does not replace backups. It improves availability and resilience within a running cluster but doesn’t protect against total cluster loss, region-wide outages, or accidental deletions. For those cases, you should still use external backup and disaster recovery strategies.
What’s next
Parity-based storage is now available in the v1.4.0 public beta — and this is just the beginning. We’re actively working toward general availability, with upcoming improvements focused on real-world durability, scale, and control.
Here’s what’s coming next:
- User-defined group targeting per bucket: assign buckets to specific storage groups to control performance and fault tolerance at a granular level
- Bitrot detection and automatic healing: detect silent data corruption and recover affected shards using parity
- Hardware acceleration for Reed-Solomon encoding: offload encoding and decoding to improve throughput on compute-heavy workloads
- Group-aware scaling strategies: make it easier to deploy and rebalance storage groups across changing cluster environments
- Flexible scaling across storage groups: support larger clusters by coordinating multiple fixed-size groups, each with predictable performance and failure isolation
We’d love your input as we shape the roadmap. If you have specific requirements or challenges around data resiliency, contact us at support@ultihash.io. You can also submit your feedback at ultihash.io/feedback - and help guide what comes next.
This feature is already available with our free Community License, so you can try it today without commitment. For setup instructions, see our full documentation on configuring storage groups for erasure coding.