CONTENTS
LLMs in everyday life
LLMs power innovation in healthtech
LLM model architecture - the transformer model
How to train an LLM
An LLM’s data infrastructure, what should it look like?
Resource efficiency and performance for generative AI? UltiHash bridges the gap
Any questions?
USE CASE

Fast, efficient Object Storage for Generative AI Infrastructures:

The case of LLMs

LLMs in everyday life

In today’s business landscape, Large Language Models (LLMs) are increasingly being utilized, from Notion AI's smart assistants that can translate, shorten, or improve text, to GitHub Copilot's ability to create, test, and debug code. These models offer substantial business value. Consider a marketing department in a German company that sells products across Europe. Annually, the company gathers customer feedback through a survey to enhance products and engage customers. Traditionally, this process could take months and require extensive resources for translation, analysis, and customization. However, an LLM can automate these tasks in record time, handling content creation, feedback analysis, and the delivery of insights to stakeholders.

An LLM-powered application can analyze product performance over the past year, identify areas for improvement, draft survey questions, and deliver them in the target consumers' native languages. It can also analyze survey results, prepare comprehensive reports in German for each department, and write individualized follow-up emails to respondents in their native languages based on their answers. By adopting an LLM, the company can streamline its feedback process, saving significant time and resources while maintaining a high level of personalization and accuracy.

LLMs power innovation in healthtech

One application in healthcare uses LLMs as a time-saving resource for doctors by answering patients’ messages, with the aim of increasing availability for in-person consultations. However, this currently comes with limitations: a May 2024 study from Stanford HAI concluded that LLMs aren't mature enough to directly access patient details and sensitive communication, due to confidentiality concerns and the need for highly accurate responses. Hence, for use cases requiring high precision combined with low failure tolerance, current LLMs are reaching their limits.

Smart assistants are not the only role LLMs can play in this sector. Insilico, a US-based biotechnology company, is developing new solutions for pharmaceutical research and development powered by generative AI. In May 2024, it announced a partnership with NVIDIA to develop an LLM, “nach0”, integrating both natural language texts and chemical structure descriptions to perform diverse tasks such as answering biomedical queries and synthesising new molecules. Previous models in the field could be divided into two groups: the first, including models like BioBERT, was trained on biomedical natural language texts (genes, drugs, etc.) but lacked training on chemical structures; the second was trained on both biomedical texts and chemical structure descriptions, but not for diverse chemical tasks. Nach0 performs three key kinds of tasks: natural language processing (document classification and question answering), chemistry-related tasks (molecular property prediction, molecular generation, and reagent prediction), and cross-domain tasks (description-guided molecule design and molecular description generation). “Nach0 represents a step forward in automating drug discovery through natural language prompts,” says Alex Zhavoronkov, PhD, founder and CEO of Insilico Medicine.

LLM model architecture - the transformer model

But how do Large Language Models actually work? Let’s dig into their architecture.

LLMs’ architecture relies on neural networks designed to recognize patterns in text data. In 2017, Google Brain published the paper ‘Attention Is All You Need’, introducing the transformer model. It is based on self-attention mechanisms that weigh the relevance of each part of an input sequence when producing each part of the output. In other words, transformer models assign a significance level to the words in a sentence, giving meaning to the dependencies between words across sentences - they are the state-of-the-art approach to sentiment analysis, text translation, and text generation.

A transformer model is composed of two main entities: an encoder and a decoder, which work together so the LLM learns how to create data based on an input. Both are composed of several layers, each containing multi-head self-attention mechanisms and feed-forward networks. The multi-head self-attention mechanism helps the LLM understand how words relate to each other, while the feed-forward network enhances this process. For example, in the sentence “I usually like cheese but the one I had for lunch wasn't nice,” the multi-head self-attention mechanism connects “cheese” and “the one I had for lunch,” while the feed-forward networks refine this connection for a more nuanced understanding. This is crucial for an LLM to understand human prompts, even across different languages.

These mechanisms do more than ensure correct grammatical structure; they also grasp the meaning of idiomatic expressions. For instance, the English idiom “kill two birds with one stone” does not translate literally in other languages. The actual idiomatic expressions in German and French convey the same idea but with different words. This highlights how transformers must capture not only syntactic structure but also cultural and contextual nuances of language.
Source: The Transformer - model architecture, Vaswani et al., 2017, "Attention Is All You Need"
More about the transformer

The first sub-layer weighs the importance of each word in a sentence against the others and views the sentence from different perspectives. Self-attention uses linear operations such as dot products to determine how much attention each word receives relative to the others. However, text prediction often involves non-linear relationships, where the output depends heavily on contextual nuances. The second sub-layer is a feed-forward network that introduces non-linearity through activation functions like ReLU (rectified linear unit). It enables the model to identify complex patterns and nuanced relationships, producing deeper, contextually relevant outputs that cannot be achieved with the simple linear transformations of the first sub-layer alone. The encoder consists of several such layers, enabling hierarchical learning that processes and reprocesses information for deeper understanding, and its multi-head attention mechanism allows the model to focus on multiple parts of the input sequence simultaneously.
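To make these two sub-layers concrete, here is a minimal sketch of a single encoder layer, assuming PyTorch; the dimensions follow the base model in ‘Attention Is All You Need’, and the input is random data purely for illustration:

```python
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        # Sub-layer 1: multi-head self-attention (linear projections + scaled dot products)
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Sub-layer 2: position-wise feed-forward network introducing non-linearity via ReLU
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        # Every token attends to every other token in the sequence
        attn_out, _ = self.self_attn(x, x, x)
        x = self.norm1(x + attn_out)        # residual connection + layer normalization
        x = self.norm2(x + self.ffn(x))     # non-linear refinement of the attended representation
        return x

# Example: a batch of 2 sequences, 10 tokens each, already embedded into 512 dimensions
tokens = torch.randn(2, 10, 512)
print(EncoderLayer()(tokens).shape)  # torch.Size([2, 10, 512])
```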

The encoder's output is a sequence of hidden states, or encodings, passed to the decoder to generate the LLM's output. The decoder begins with a masked attention layer that prevents future tokens from influencing the prediction. This is essential because it ensures sequential text generation, predicting one word at a time without being influenced by words that have not yet been generated. It mirrors how humans read and write, making the generated text more coherent and contextually accurate. Like the encoder, the decoder has multiple layers, each combining masked multi-head self-attention, source-target (encoder-decoder) attention over the encoder's output, and a feed-forward network. The self-attention identifies relationships within the generated sequence, while the feed-forward network introduces non-linearity to grasp complex patterns and contextual nuances. The decoder makes its predictions one step at a time, requiring many passes to produce the entire output.
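To make the masking concrete, here is a minimal sketch of the causal mask behind masked self-attention, again assuming PyTorch; the sequence length and scores are arbitrary and only for illustration:

```python
import torch

seq_len = 5
# True above the diagonal marks positions that must be ignored (future tokens)
causal_mask = torch.triu(torch.ones(seq_len, seq_len), diagonal=1).bool()
print(causal_mask)

# Applied to raw attention scores: masked positions get -inf, so softmax gives them zero weight
scores = torch.randn(seq_len, seq_len)
weights = torch.softmax(scores.masked_fill(causal_mask, float("-inf")), dim=-1)
print(weights)  # each row only places weight on the current and previous positions
```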

In summary, the transformer model's encoder-decoder architecture, multi-head self-attention, and feed-forward networks work together to process input sequences, capture contextual nuances, and generate coherent, contextually relevant outputs, revolutionizing natural language processing and generation.

How to train an LLM

There are six strategies for implementing a large language model (LLM). Off-the-shelf models are ready-made products that can sometimes be optimized by adjusting parameters or fine-tuned through additional training on specific data. Users can also choose to develop their own models, either starting from scratch or using a baseline pre-trained model like BERT, which they can further train on their data.

This guide outlines ten essential steps for training an LLM, providing a comprehensive overview. However, if the goal is to focus on specific approaches like fine-tuning or optimizing an LLM, attention can be directed to the relevant steps as follows:
Using an Off-the-Shelf LLM
Directly proceed to deployment and monitoring.
Fine-Tuning an Off-the-Shelf LLM
Follow steps on defining objectives, collecting and preprocessing data, configuring training parameters, preparing the training environment, training, and evaluating the model.
Fine-Tuning and Optimizing an Off-the-Shelf LLM
Focus on steps related to configuring training parameters, preparing the training environment, and optimizing the model.
Fine-Tuning a Pre-trained transformer model
Emphasize steps on defining objectives, collecting and preprocessing data, selecting model architecture, configuring training parameters, preparing the training environment, training, and evaluating the model.
Fine-Tuning & Optimizing a Pre-trained transformer model
Concentrate on steps related to configuring training parameters, preparing the training environment, and optimizing the model.
Training an LLM from Scratch
Follow all ten steps comprehensively to ensure a thorough and effective training process.
This structured approach ensures that the training process is tailored to the specific needs and goals of the model, whether leveraging existing solutions or developing new capabilities.
1
Define your objective
Establishing a clear objective sets the direction for the entire project, influencing data collection, model architecture selection, and evaluation criteria. Without a specific goal, the project may lack focus and effectiveness. Identifying whether the aim is to develop a conversational AI, a content generator, or a specialized tool for a particular industry is crucial. Understanding the end-users and the specific problems the model will address is key for an optimal LLM learning path.
2
Collect and Preprocess Data
A diverse dataset relevant to the objective is essential. Data cleaning involves removing noise, tokenizing for consistency, and normalizing to standard formats. For handling large datasets containing both raw and processed data, a data lake or lakehouse architecture is highly recommended.
These architectures rely on object storage for its scaling capability and its ability to manage and integrate unstructured, semi-structured, and structured data in a central infrastructure.
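As an illustration, the following is a minimal preprocessing sketch assuming the Hugging Face datasets and transformers libraries; the CSV file name and the "text" column are hypothetical placeholders for your own raw data:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Hypothetical raw data: a CSV with a "text" column
raw = load_dataset("csv", data_files={"train": "survey_responses_raw.csv"})
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")

def clean_and_tokenize(batch):
    # Simple cleaning step (normalize whitespace), then tokenize to a fixed length
    texts = [" ".join(t.split()) for t in batch["text"]]
    return tokenizer(texts, truncation=True, padding="max_length", max_length=128)

processed = raw.map(clean_and_tokenize, batched=True)
processed.save_to_disk("survey_responses_tokenized")  # ready to land in the data lake
```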
3
Select or Develop a Model Architecture
Choosing between transformer architectures such as GPT or BERT, or a custom model, is a key decision in the LLM building process. Leveraging pre-trained models can accelerate development and provide a strong foundation. Tools like Hugging Face's Transformers library offer training datasets and pre-trained models.
Pre-trained models and checkpoints should be stored in accessible, high-availability storage.
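As a sketch of this step, the snippet below loads a small, openly available pre-trained model with Hugging Face's Transformers library; GPT-2 merely stands in for whichever baseline you select, and the checkpoint directory is an arbitrary example:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small, openly available baseline; swap for the architecture you selected
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Save a local copy so the baseline checkpoint can be pushed to high-availability storage
model.save_pretrained("checkpoints/baseline")
tokenizer.save_pretrained("checkpoints/baseline")
```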
4
Configure Training Parameters
Set hyperparameters including learning rate, batch size, and number of epochs. Select an optimizer (e.g., Adam, SGD). Use hyperparameter tuning frameworks like Optuna or Hyperopt to optimize performance.
Ensure that you have sufficient storage for saving model checkpoints and logs.
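A minimal hyperparameter search sketch with Optuna might look like the following; train_and_validate() is a hypothetical stand-in for your own training loop, stubbed out here only so the snippet runs end to end:

```python
import optuna

def train_and_validate(lr, batch_size, epochs):
    # Placeholder stub: replace with a real training loop that returns a validation loss
    return (lr - 3e-4) ** 2 + 0.001 * batch_size / epochs

def objective(trial):
    # Search spaces for the hyperparameters mentioned above
    lr = trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True)
    batch_size = trial.suggest_categorical("batch_size", [16, 32, 64])
    epochs = trial.suggest_int("epochs", 2, 5)
    return train_and_validate(lr, batch_size, epochs)

study = optuna.create_study(direction="minimize")  # lower validation loss is better
study.optimize(objective, n_trials=20)
print(study.best_params)
```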
5
Prepare the Training Environment
Efficient training requires computational resources such as GPUs or TPUs, paired with software frameworks like TensorFlow or PyTorch. Cloud-based solutions, including AWS EC2 instances or Google Cloud AI Platform, provide the necessary scalable infrastructure. Containerization with Docker and orchestration with Kubernetes can help manage scalability and resources effectively.
Object storage provides low latency and high throughput, enhancing overall training efficiency.
6
Train the Model
During training, closely monitoring progress and adjusting parameters ensures optimal performance. Visualization tools like TensorBoard can provide valuable insights. Regular checkpointing saves the model's state, allowing for recovery in case of interruptions.
High-throughput storage supports efficient read/write operations, and implementing data versioning helps track changes over time.
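As a sketch of regular checkpointing in a PyTorch training loop (the model, optimizer, data loader, and loss function are assumed to be defined elsewhere in your training script):

```python
import torch

def train_with_checkpoints(model, optimizer, data_loader, loss_fn, epochs=3):
    for epoch in range(epochs):
        for inputs, targets in data_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(inputs), targets)
            loss.backward()
            optimizer.step()
        # Save the full training state so an interrupted run can resume from the last epoch
        torch.save(
            {"epoch": epoch, "model": model.state_dict(), "optimizer": optimizer.state_dict()},
            f"checkpoint_epoch_{epoch}.pt",
        )
```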
7
Evaluate the Model
Model performance is assessed using validation datasets and metrics such as accuracy, loss, and perplexity. Thorough testing ensures the model generalizes well to unseen data.
Keeping validation datasets easily accessible and stored separately aids in effective evaluation.
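As an illustration, perplexity can be computed as the exponential of the average cross-entropy loss over a held-out validation set; this sketch assumes PyTorch and an existing model and validation loader:

```python
import math
import torch

@torch.no_grad()
def perplexity(model, val_loader, loss_fn):
    # Average the cross-entropy loss over the validation set, then exponentiate
    model.eval()
    total_loss, batches = 0.0, 0
    for inputs, targets in val_loader:
        total_loss += loss_fn(model(inputs), targets).item()
        batches += 1
    return math.exp(total_loss / batches)
```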
8
Optimize and fine-tune the model (optional)
Optimizing the LLM can improve its efficiency and performance for deployment. Optimization techniques include pruning and quantization, which reduce model size and speed up inference with minimal loss of accuracy; libraries like the TensorFlow Model Optimization Toolkit can help. Fine-tuning, by contrast, involves feeding the LLM more data to adapt a pre-trained model to specific tasks or datasets, improving the accuracy of specific outputs. Fine-tuning techniques include transfer learning, parameter tuning, data augmentation, and hyperparameter optimization; the key difference from optimization is that all of these involve further training the LLM on additional data.
Proper storage for fine-tuned models and version control are also important. Object storage keeps fine-tuned models easily accessible and manageable, and keeping track of different versions of these models (version control) ensures that changes can be monitored and previous versions restored if needed.
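As one example of such optimization, the sketch below applies PyTorch dynamic quantization, converting linear layers to 8-bit weights to shrink the model and speed up CPU inference; the small Sequential model merely stands in for a trained LLM:

```python
import torch
import torch.nn as nn

# Stand-in for a trained model; in practice you would load your own checkpoint here
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))

# Convert linear layers to 8-bit integer weights for smaller size and faster CPU inference
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

# Version this artifact alongside the original so either can be restored later
torch.save(quantized.state_dict(), "model_quantized.pt")
```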
9
Deploy the Model
Deployment involves setting up infrastructure that includes APIs for model access. Using platforms like Kubernetes or serverless architectures ensures scalable deployment. A continuous integration and continuous deployment (CI/CD) pipeline facilitates regular updates.
Robust storage solutions for model hosting and logging help maintain reliability and performance.
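As a sketch of exposing the model through an API, the snippet below uses FastAPI; generate_reply() is a hypothetical placeholder for a call to the deployed model:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Prompt(BaseModel):
    text: str

def generate_reply(text: str) -> str:
    # Hypothetical placeholder: replace with a call to the fine-tuned model
    return f"echo: {text}"

@app.post("/generate")
def generate(prompt: Prompt):
    return {"completion": generate_reply(prompt.text)}

# Run with: uvicorn app:app --host 0.0.0.0 --port 8000
```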
10
Monitor and Maintain
Continuous monitoring of the model’s performance in production is crucial. Tools like Prometheus and Grafana provide real-time insights. Regular updates and retraining with new data keep the model relevant and accurate.
Effective logging and alerting systems help address any issues quickly, and maintaining performance metrics and logs is essential for long-term success.
You’re done - or almost done…

At this point, you should assess your LLM against the plan you had for it. LLMs can serve generic or specific purposes: all-purpose LLMs address general use cases, while specialized LLMs provide tailored answers. Sometimes training is not enough, and the LLM may hallucinate, generating content that is irrelevant, made-up, or inconsistent with the input data. Other times, you need an LLM that produces up-to-date answers. These challenges can be addressed by implementing a RAG (retrieval-augmented generation) system. Check out <this link> for more info about RAG and how to set it up.

Training LLMs involves handling massive amounts of data and requires significant computational resources, including specialized data storage solutions. The choice of storage can significantly impact the resource consumption and speed of the training process. Referencing its ties to the legendary F1 team McLaren, Dell summarised the role that infrastructure plays in training generative AI models perfectly: data is the fuel, storage the fuel tank, and compute the engine. This highlights the crucial role that storage plays in deploying generative AI. Without the right foundation, storage quickly becomes a last-mile bottleneck for companies adopting this new technology.

An LLM’s data infrastructure, what should it look like?

Just like any generative AI method, LLMs require a massive amount of data storage. The success of MLOps workloads such as LLM training relies first on the data infrastructure that supports these operations. Indeed, the underlying data infrastructure can make or break them.

What are these data architecture requirements, and where do they come from? LLMs require large training datasets, high throughput, and frequent checkpointing. Data lake or data lakehouse architectures are a perfect fit for powering LLM training, and object storage is the preferred solution in this type of architecture because it supports auto-scalable storage for unstructured, semi-structured, and structured data in various formats. In the current market, users looking for a storage solution face a largely binary choice: either expensive storage that allows fast data retrieval, or affordable storage with longer retrieval times. This is the first issue facing the industry: the status quo does not solve users’ challenges, it forces them to sacrifice one constraint for the other - time for money, or money for time. A second challenge remains unanswered in the world of data infrastructure: resource consumption is skyrocketing in terms of hardware, maintenance, cloud space, and costs.

UltiHash is the new object storage that offers high performance, unlimited capacity, and resource efficiency. A common challenge users face is finding object storage that can keep up with the pace of their model training; slow storage results in significant cost increases. UltiHash was designed specifically for AI/ML architectures, providing high IOPS so users can maintain maximum GPU performance and train their models as quickly as possible. This also enables fast checkpoint writing and reading, allocating more time to LLM training rather than maintenance tasks. UltiHash is powered by a lightweight deduplication algorithm, which allows users to manage exponential data growth without impacting their data infrastructure.

What does an LLM data infrastructure look like? You would start by setting up an ingestion layer feeding your UltiHash object storage, then integrate your ML tools, such as SageMaker, or compute instances, such as EC2, with it. A data lake or lakehouse can of course be added seamlessly - let’s illustrate this point.
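As a sketch of the ingestion step, the snippet below uses the boto3 S3 client against an S3-compatible endpoint; the endpoint URL, credentials, bucket, and object keys are placeholders, not real values:

```python
import boto3

# Placeholders: endpoint URL, credentials and bucket names are examples only
s3 = boto3.client(
    "s3",
    endpoint_url="http://ultihash.example.internal:8080",
    aws_access_key_id="YOUR_ACCESS_KEY",
    aws_secret_access_key="YOUR_SECRET_KEY",
)

# Ingest a processed dataset into the storage layer
s3.upload_file("survey_responses_tokenized.parquet", "training-data", "datasets/survey_responses_tokenized.parquet")

# Training jobs (e.g. on EC2 or SageMaker) read the same objects back through the same API
obj = s3.get_object(Bucket="training-data", Key="datasets/survey_responses_tokenized.parquet")
print(obj["ContentLength"])
```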

Resource efficiency and performance
for generative AI?

UltiHash bridges the gap.

UltiHash is the primary underlying storage foundation for AIRAs (AI-ready architectures), container-native for cloud and on-premises applications. UltiHash is designed to handle peta- to exabyte-scale data volumes while maintaining high speed and being resource-efficient.

Resource-efficient scalable storage

We offer fine-granular sub-object deduplication across the entire storage pool. This allows for high scalability. The result? Storage volumes do not grow linearly with your total data.

Lightweight, CPU-optimized deduplication

We maintain high performance through an architecture designed to handle high IOPS. Moreover, our lightweight deduplication algorithm keeps CPU time to a minimum, delivering fast reads to optimize time-to-model and time-to-insight.

Flexible + interoperable via S3 API

UltiHash offers high interoperability through its native S3-compatible API. UltiHash supports processing engines (Flink, PySpark), ETL tools (Airflow), open table formats (Delta Lake, Iceberg) and query engines (Presto, Hive). If you’re using a tool we don’t support yet, let us know and we’ll look into it!

Any questions?

What is UltiHash?

UltiHash is the neat foundation for data-intensive applications. It is powered by deduplication algorithms and streamlined storage techniques, and it leverages past data integrations to generate significant space savings while delivering high-speed access. UltiHash enhances your data management, so that large datasets and data growth have a synergistic effect on your infrastructure.

What does UltiHash offer?

UltiHash facilitates data growth within the same existing storage capacity. UltiHash deduplicates within and across datasets, from terabytes to exabytes: users store only what they truly need. It’s fast, efficient, and works at the byte level, making it agnostic to data format or type. With UltiHash, the trade-off between high costs and low performance is a thing of the past.

What is object storage?

Object storage is a data storage solution that is suitable to store all data types (structured, semi-structured and unstructured) as objects. Each object includes the data itself, its metadata, and a unique identifier, allowing for easy retrieval and management. Unlike traditional file or block storage, object storage is highly scalable, making it ideal for managing large amounts of unstructured data.

How does data deduplication work in UltiHash?

Data is analysed at the byte level and dynamically split into fragments, which allows the system to separate fragments that are unique from those that contain duplicates. UltiHash matches duplicates within and across datasets, leveraging the entirety of the data. Fragments that are unique and were not matched against the dataset or past integrations are added to UltiHash, while matches are stored as references to existing fragments. This is our process to keep your storage footprint growth sustainable.
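As a purely illustrative toy example of the general principle of byte-level fragment deduplication - not UltiHash's actual algorithm - consider fixed-size fragments identified by SHA-256 fingerprints:

```python
import hashlib

def deduplicate(data: bytes, store: dict, fragment_size: int = 4096) -> list:
    """Split data into fragments; store unseen fragments, reference existing ones."""
    references = []
    for i in range(0, len(data), fragment_size):
        fragment = data[i:i + fragment_size]
        fingerprint = hashlib.sha256(fragment).hexdigest()
        if fingerprint not in store:       # unique fragment: add it to the store
            store[fingerprint] = fragment
        references.append(fingerprint)     # for duplicates, only a reference is kept
    return references

store = {}
refs_a = deduplicate(b"hello world" * 2000, store)
refs_b = deduplicate(b"hello world" * 2000 + b"new bytes", store)
print(len(store), len(refs_a), len(refs_b))  # shared fragments are stored only once
```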

What is unique about UltiHash?

UltiHash efficiently stores your desired data volume, providing significant space savings, high speed and the flexibility to scale up seamlessly. Increase your data volumes within the existing storage capacity, without compromising on speed.

Can UltiHash be integrated in existing cloud environments?

Absolutely - UltiHash can be integrated into existing cloud environments, such as those that leverage EBS. UltiHash was designed to be deployed in the cloud, and we can suggest specific machine configurations for optimal performance. The cloud environment remains in the hands of the administrator, who can configure it as preferred.

What API does UltiHash provide to connect to my other applications?

UltiHash provides an S3-compatible API. The decision to make our API S3-compatible was made with its utility in mind: any S3-compatible application qualifies as a native integration. We want our users to have a smooth and seamless integration.

How does UltiHash ensure data security and privacy?

The user is in full control of the data. UltiHash is a foundation layer that slides into an existing IT system. The infrastructure, and data stored, are the sole property of the user: UltiHash merely configures the infrastructure as code.

Is UltiHash suitable for both large and small scale enterprises?

UltiHash was designed to empower both small and large data-driven organisations to innovate at a high pace.

What type of data can be stored in UltiHash?

Data integrated through UltiHash is read at the byte level; in other words, UltiHash is not affected by the type or format of the data it ingests and works with structured, semi-structured and unstructured data.

What are the pricing models for UltiHash services?

UltiHash currently charges a fixed fee of $6 per TB per month - whether on-premises or in the cloud.

Need more answers?