CONTENTS
Generative AI is having a huge impact on everyday life
LLMs are streamlining a whole range of processes
How do LLM architectures and the transformer model work?
Storing training data is becoming a major bottleneck
Resource efficiency and performance for generative AI? UltiHash bridges the gap
Any questions?
USE CASE

Fast, efficient storage foundation for generative AI: the case of LLMs

Generative AI is having a huge impact on everyday life.

Generative AI is a broad space comprising general-purpose “foundation models” like Google Gemini or Mistral, as well as smaller-scale models trained to understand the challenges of specific industries, from smart cities to health tech. In both scenarios, generative AI is a catalyst for innovation. It can also be combined with predictive analytics, as with Pecan AI, which lets users quickly customize predictive queries and results for specific business needs without hand-coding models.

Various neural-network techniques (such as Large Language Models, Generative Adversarial Networks and Variational Autoencoders) can be used for data modeling, while reinforcement learning (RL) is employed for decision-making and model refinement. These methods demand large, varied datasets to maximize output quality. Unstructured data is the primary data type used to train generative AI models, and the more data is ingested, the better the resulting models’ quality; training data volumes can easily reach petabyte scale.
Generative AI in health tech

Turning to health tech: Insilico Medicine leverages generative AI to model new molecules with Chemistry42, its generative chemistry application, in the search for cancer treatments. Insilico recently received approval from the U.S. Food and Drug Administration (FDA) for its Investigational New Drug (IND) application for the molecule ISM3412, an inhibitor of an enzyme essential for producing a biochemical agent needed for cell growth and proliferation. Introducing such a molecule lowers the enzyme levels, reducing the presence of that biochemical agent and thereby slowing cancerous cell growth and proliferation.

LLMs are streamlining a whole range of processes.

In today’s business landscape, LLMs are being leveraged more and more: from smart assistants like Notion AI that can translate, shorten or improve a text, to GitHub Copilot that can write, test and debug code. LLMs have broad business value; let’s take the example of a marketing department to see how they can be applied. The company is based in Germany and sells its product all over Europe. Every year it gathers customer feedback through a survey, whose results are used both to improve the product and to connect with customers. To execute this process, the company needs to create a meaningful survey (ask the right questions), analyze the feedback and individually email the respondents.

A process that could take months and require extensive resources for translation, analysis and individualization could be significantly improved by LLMs. With an LLM-powered application, the company could fully automate time-consuming tasks such as analysis, content creation and presentation of the insights to the relevant stakeholders. To start with, the application could analyze product performance over the past year and identify areas of improvement, which becomes the underlying purpose of the survey. Using these inputs, it can then write the survey’s questions and deliver them in the official language of each target country. After setting the survey’s goals and localizing the content, the application can take this much further by: 1. analyzing the results of the survey, 2. preparing a full report in German for each of the relevant departments, and 3. writing individualized follow-up emails to each respondent, based on their answers and in their language.
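
As a rough sketch of what such an automation could look like, the snippet below strings these steps together around a placeholder call_llm function. The function, prompts and target languages are purely illustrative assumptions, not the API of any specific product.

```python
def call_llm(prompt: str) -> str:
    """Placeholder for whichever LLM API the company chooses (illustrative only)."""
    raise NotImplementedError

LANGUAGES = ["German", "French", "Spanish", "Italian"]  # hypothetical target markets

def run_survey_workflow(performance_report: str, responses: list[dict]) -> dict:
    # 1. Identify areas of improvement that define the survey's purpose.
    focus = call_llm(f"List the weakest product areas in this report:\n{performance_report}")

    # 2. Draft the survey and localize it for each market.
    survey = call_llm(f"Write ten survey questions about: {focus}")
    localized = {lang: call_llm(f"Translate into {lang}:\n{survey}") for lang in LANGUAGES}

    # 3. Analyze responses, report in German, and follow up with each respondent.
    report = call_llm(f"Summarize these survey responses in German:\n{responses}")
    follow_ups = [
        call_llm(f"Write a follow-up email in {r['language']} replying to:\n{r['answers']}")
        for r in responses
    ]
    return {"localized_surveys": localized, "report": report, "follow_ups": follow_ups}
```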

How do LLM architectures and the transformer model work?

An LLM’s architecture relies on neural networks designed to recognize patterns in text data. However, an LLM’s output is not a string of words independent of each other; it should be meaningful sentences. LLMs were previously built mainly with Recurrent Neural Networks (RNNs), an architecture with the capability to remember context: the word at position N-1 is used to help predict the word at position N, and so on.

However, this architecture was not optimized for long-range dependencies. A solution was introduced in the paper ‘Attention Is All You Need’, published by researchers at Google Brain in 2017, which shed light on a new architecture: the transformer model. It is based on self-attention mechanisms that assess the relevance of each part of an input sequence to each part of the output. In other words, transformer models assign a significance level to words in a sentence, capturing long-range dependencies; they are state-of-the-art approaches to sentiment analysis, text translation and text generation.
Source: The Transformer - model architecture, Vaswani et al., 2017, "Attention Is All You Need"
Transformers are composed of two pillars: an encoder and a decoder. The encoder processes the input sentence through a stack of layers, each of which has two sub-layers: a multi-head self-attention mechanism and a position-wise feed-forward network. The first sub-layer weighs the importance of each word in a sentence against the others and looks at the sentence from different angles. Self-attention relies on linear operations such as dot products, which determine how much attention the model pays to each input element with respect to all the other elements.
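
To make the dot-product attention concrete, here is a minimal NumPy sketch of scaled dot-product attention, the building block of each multi-head self-attention sub-layer; the matrix names Q, K and V follow the notation of Vaswani et al., and the toy dimensions are arbitrary.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Minimal scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # pairwise relevance of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the keys
    return weights @ V                                # attention-weighted sum of the values

# Toy example: a "sentence" of 4 tokens, each embedded in 8 dimensions.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x)           # self-attention: Q = K = V = x
print(out.shape)                                      # (4, 8)
```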

However, in practice it is rare for the input and output to have a linear relationship; in text prediction, for example, the output of the LLM is not a linear function of the input, as it depends heavily on context. Therefore, the second sub-layer is a feed-forward network that introduces non-linearity with functions such as ReLU (rectified linear unit) or GELU (Gaussian error linear unit). Non-linearity allows the model to identify more complex patterns and understand nuanced relationships, generating deeper, more contextually relevant outputs. The encoder consists of several such layers, enabling hierarchical learning that processes and reprocesses information for a deeper and more complex understanding.
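
A minimal sketch of that second sub-layer, assuming the tanh approximation of GELU and toy dimensions (the original paper uses a model width of 512 and a hidden width of 2048):

```python
import numpy as np

def gelu(x):
    """Approximate GELU non-linearity (tanh approximation)."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def feed_forward(x, W1, b1, W2, b2):
    """Position-wise feed-forward sub-layer: two linear maps with a non-linearity in between."""
    return gelu(x @ W1 + b1) @ W2 + b2

# Toy dimensions: model width 8, hidden width 32.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                       # 4 token representations
W1, b1 = rng.normal(size=(8, 32)), np.zeros(32)
W2, b2 = rng.normal(size=(32, 8)), np.zeros(8)
print(feed_forward(x, W1, b1, W2, b2).shape)      # (4, 8)
```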

The output of the encoder is a sequence of contextualized representations, known as the hidden states or the encoding, which is passed to the decoder to generate the output of the LLM. The decoder architecture begins with a masked attention layer that prevents information leakage from future tokens. Like the encoder, the decoder is composed of several layers, and it adds a multi-head attention sub-layer that calculates the source-target attention over the encoder’s output. The multi-head self-attention and feed-forward sub-layers in the decoder each perform distinct roles: the first identifies relationships and dependencies within the sequence being decoded, while the second introduces non-linearity, enabling the model to grasp complex patterns like idiomatic expressions and contextual nuances. The decoder makes its predictions one token at a time, hence the need for multiple steps to generate the entire output.
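
A minimal sketch of the masking that prevents leakage from future tokens: scores for positions to the right of the current one are set to minus infinity before the softmax, so their attention weights become zero. The details are illustrative, not tied to any particular framework.

```python
import numpy as np

def causal_mask(seq_len):
    """Lower-triangular mask: position i may only attend to positions <= i."""
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

def masked_attention_scores(scores):
    """Replace scores for future positions with -inf so softmax assigns them zero weight."""
    mask = causal_mask(scores.shape[-1])
    return np.where(mask, scores, -np.inf)

scores = np.zeros((4, 4))                 # uniform raw scores for a 4-token sequence
print(masked_attention_scores(scores))
# Row 0 attends only to token 0; row 3 attends to tokens 0..3.
```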

Storing training data is becoming a major bottleneck.

Training LLMs like OpenAI's GPT involves handling massive amounts of data and requires significant computational resources, including specialized data storage solutions. The choice of storage can significantly affect both the resource consumption and the speed of the training process. Referencing its ties to the legendary F1 team McLaren, Dell, one of our innovation partners, summarised perfectly the role that infrastructure plays in training generative AI models: data is the fuel, storage the fuel tank and compute the engine. This highlights the crucial role that storage plays in deploying generative AI. Without the right foundation, storage quickly becomes a last-mile bottleneck for companies adopting this new technology. In fact, since early 2024, many companies have recognized that their primary data storage solution will be a major bottleneck, particularly in the context of LLMs.
LLMs offer high potential, allowing for faster time-to-value and enhanced productivity. However, they also demand strong commitments from businesses: multiplying data sources, expansive tech stacks and scalable data storage. The AI Index published this April by the Stanford Institute for Human-Centered Artificial Intelligence reports GPT-4 training costs above $90 million, and that funding for generative AI surged to reach $25.2 billion. These monetary metrics give a first taste of the investment that generative AI requires, but the challenge goes further than current budgets: Amazon developed its AI assistant, Amazon Q, by training it on 17 years of data!

This data must be stored in solutions capable of significant scaling and delivering high speeds, regardless of the data's recency or when it was last updated. Organizations need hot-storage capabilities for historical data, allowing access within milliseconds at the cost of cold storage. Unfortunately, the high cost of hot storage currently makes it too expensive for companies to keep historical data in this tier. Companies dealing with large data volumes, especially those using generative AI, are witnessing exponential data growth. This growth leads to an ever-increasing cycle of resource consumption, rising electricity and water usage, and frequent hardware upgrades and updates, resulting in both cost and resource inefficiencies.

Resource efficiency and performance for generative AI? UltiHash bridges the gap.

UltiHash is the primary underlying storage foundation for AI-ready architectures (AIRAs), container-native for both cloud and on-premises deployments. It is designed to handle peta- to exabyte-scale data volumes while maintaining high speed and remaining resource-efficient.

Resource-efficient scalable storage

We offer fine-grained sub-object deduplication across the entire storage pool, which keeps scalability high and efficient. The result? Storage volumes do not grow linearly with your total data.

Lightweight, CPU-optimized deduplication

We maintain high performance through an architecture designed to handle high IOPS. Moreover, our lightweight deduplication algorithm keeps CPU time to a minimum, delivering fast reads to optimize time-to-model and time-to-insight.

Flexible + interoperable via S3 API

UltiHash offers high interoperability through its native S3-compatible API. It supports processing engines (Flink, PySpark), ETL tools (Airflow), open table formats (Delta Lake, Iceberg) and query engines (Presto, Hive). If you’re using a tool we don’t support yet, let us know and we’ll look into it!
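
Because the API is S3-compatible, standard S3 clients should work unchanged. Below is a minimal boto3 sketch; the endpoint URL, credentials and bucket/object names are placeholders for illustration, not values from an actual UltiHash deployment.

```python
import boto3

# Hypothetical endpoint and credentials -- replace with your own deployment's values.
s3 = boto3.client(
    "s3",
    endpoint_url="https://ultihash.example.internal:8080",
    aws_access_key_id="YOUR_ACCESS_KEY",
    aws_secret_access_key="YOUR_SECRET_KEY",
)

# Create a bucket, upload a training shard, and read it back.
s3.create_bucket(Bucket="training-data")
s3.upload_file("corpus/shard-0001.parquet", "training-data", "shards/shard-0001.parquet")
s3.download_file("training-data", "shards/shard-0001.parquet", "/tmp/shard-0001.parquet")
```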

Any questions?

What is UltiHash?

UltiHash is the neat foundation for data-intensive applications. It is powered by deduplication algorithms and streamlined storage techniques, and it leverages past data integrations to generate significant space savings while delivering high-speed access. UltiHash enhances your data management, turning large datasets and data growth into a synergistic effect on your infrastructure.

What does UltiHash offer?

UltiHash facilitates data growth within the same existing storage capacity. It deduplicates within and across datasets from terabytes to exabytes, so users store only what they truly need. It’s fast, efficient, and works at the byte level, making it agnostic to data format or type. With UltiHash, the trade-off between high costs and low performance is a thing of the past.

What is object storage?

Object storage is a data storage solution that is suitable to store all data types (structured, semi-structured and unstructured) as objects. Each object includes the data itself, its metadata, and a unique identifier, allowing for easy retrieval and management. Unlike traditional file or block storage, object storage is highly scalable, making it ideal for managing large amounts of unstructured data.

How does data deduplication work in UltiHash?


Data is analysed at the byte level and dynamically split into fragments, which allows the system to separate fragments that are unique from those that contain duplicates. UltiHash matches duplicates within and across datasets, leveraging the entirety of the data. Fragments that are unique, with no match in the current dataset or past integrations, are added to UltiHash, while matches are stored as references to an existing fragment. This is how we keep your storage footprint growing sustainably.
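
For illustration, here is a simplified sketch of content-hash-based fragment deduplication using fixed-size fragments and SHA-256. It is a toy version of the general technique, not UltiHash’s actual algorithm, which splits data dynamically rather than at fixed boundaries.

```python
import hashlib

def deduplicate(data: bytes, store: dict, fragment_size: int = 4096) -> list:
    """Split data into fragments; store only fragments whose hash is unseen.

    Returns the list of fragment hashes that reconstructs the object."""
    manifest = []
    for i in range(0, len(data), fragment_size):
        fragment = data[i:i + fragment_size]
        digest = hashlib.sha256(fragment).hexdigest()
        if digest not in store:          # unique fragment: add it to the pool
            store[digest] = fragment
        manifest.append(digest)          # duplicates just reference the existing fragment
    return manifest

store = {}
a = deduplicate(b"hello world" * 1000, store)
b = deduplicate(b"hello world" * 1000, store)   # second copy adds no new fragments
print(len(store), len(a), len(b))
```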

What is unique about UltiHash?

UltiHash efficiently stores your desired data volume, providing significant space savings, high speed and the flexibility to scale up seamlessly. Increase your data volumes within the existing storage capacity, without compromising on speed.

Can UltiHash be integrated in existing cloud environments?

Absolutely - UltiHash can be integrated into existing cloud environments, such as those that leverage EBS. UltiHash was designed to be deployed in the cloud, and we can suggest specific machine configurations for optimal performance. The cloud environment remains in the hands of the administrator, who can configure it as preferred.

What API does UltiHash provide, and how does it connect to my other applications?

UltiHash provides an S3-compatible API. The decision for our API to be S3-compatible was made with its utility in mind - any S3-compatible application qualifies as a native integration. We want our users to have a smooth and seamless integration.

How does UltiHash ensure data security and privacy?

The user is in full control of the data. UltiHash is a foundation layer that slides into an existing IT system. The infrastructure, and data stored, are the sole property of the user: UltiHash merely configures the infrastructure as code.

Is UltiHash suitable for both large and small scale enterprises?

UltiHash was designed to empower small and large data-driven organisations alike to innovate at a high pace.

What type of data can be stored in UltiHash?

The data integrated through UltiHash is read at the byte level; in other words, UltiHash is not affected by the type or format of the data and works with structured, semi-structured and unstructured data.

What are the pricing models for UltiHash services?

UltiHash currently charges a fixed fee of $6 per TB per month - whether on-premises or in the cloud.

Need more answers?