Supporting Intel Gaudi AI accelerators¶
At dstack, our goal is to make AI container orchestration simpler and fully vendor-agnostic. That's why we support not just leading cloud providers and on-prem environments but also a wide range of accelerators. With our latest release, we're adding support for Intel Gaudi AI accelerators and launching a new partnership with Intel.
About Intel Gaudi¶
Intel Gaudi is a series of AI accelerators developed by Habana Labs, now part of Intel. Built for high-performance AI training and inference, each accelerator pairs Matrix Multiplication Engines (MMEs) and Tensor Processor Cores (TPCs) with high-bandwidth memory and integrated RoCE networking for scale-out, delivering high throughput and strong performance per watt.
Here's a brief spec for Gaudi 2 and Gaudi 3:
| | Gaudi 2 | Gaudi 3 |
|---|---|---|
| MME Units | 2 | 8 |
| TPC Units | 24 | 64 |
| HBM Capacity | 96 GB | 128 GB |
| HBM Bandwidth | 2.46 TB/s | 3.7 TB/s |
| Networking | 600 GB/s | 1200 GB/s |
| FP8 Performance | 865 TFLOPs | 1835 TFLOPs |
| BF16 Performance | 432 TFLOPs | 1835 TFLOPs |
In the latest release, dstack now supports the orchestration of containers across on-prem machines equipped with Intel Gaudi accelerators.
Create a fleet¶
To manage container workloads on on-prem machines with Intel Gaudi accelerators, start by configuring an SSH fleet. Here's an example fleet configuration:
type: fleet
name: my-gaudi2-fleet

ssh_config:
  hosts:
    - hostname: 100.83.163.67
      user: sdp
      identity_file: ~/.ssh/id_rsa
      blocks: auto
    - hostname: 100.83.163.68
      user: sdp
      identity_file: ~/.ssh/id_rsa
      blocks: auto
  proxy_jump:
    hostname: 146.152.186.135
    user: guest
    identity_file: ~/.ssh/intel_id_rsa
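With blocks: auto, dstack splits each host into blocks so that several runs can share its accelerators. If you prefer to pin the split explicitly, blocks also accepts a number; here's a minimal sketch, assuming eight accelerators per host so each block maps to one Gaudi 2:

ssh_config:
  hosts:
    - hostname: 100.83.163.67
      user: sdp
      identity_file: ~/.ssh/id_rsa
      # eight blocks of one Gaudi 2 each; a run can claim one or more blocks
      blocks: 8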
To provision the fleet, run the dstack apply command:
$ dstack apply -f examples/misc/fleets/gaudi.dstack.yml
Provisioning...
---> 100%
FLEET INSTANCE BACKEND GPU STATUS CREATED
my-gaudi2-fleet 0 ssh 152xCPU, 1007GB, 8xGaudi2 idle 3 mins ago
(96GB), 388.0GB (disk)
1 ssh 152xCPU, 1007GB, 8xGaudi2 idle 3 mins ago
(96GB), 388.0GB (disk)
Apply a configuration¶
With your fleet provisioned, you can now run dev environments, tasks, and services on it.
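For instance, a minimal dev environment configuration for this fleet might look like the sketch below; the name and ide choice are ours, and the image is the same Habana PyTorch image used in the task that follows:

type: dev-environment
name: gaudi-dev

image: vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0
ide: vscode

resources:
  gpu: gaudi2:8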
Below is an example of a task configuration that fine-tunes the DeepSeek-R1-Distill-Qwen-7B model using Optimum for Intel Gaudi and DeepSpeed with the lvwerra/stack-exchange-paired dataset:
type: task
name: trl-train

image: vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0

env:
  - MODEL_ID=deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
  - WANDB_API_KEY
  - WANDB_PROJECT

commands:
  - pip install --upgrade-strategy eager optimum[habana]
  - pip install git+https://github.com/HabanaAI/DeepSpeed.git@1.19.0
  - git clone https://github.com/huggingface/optimum-habana.git
  - cd optimum-habana/examples/trl
  - pip install -r requirements.txt
  - pip install wandb
  # DSTACK_GPUS_NUM is set by dstack to the number of accelerators available to the run
  - DEEPSPEED_HPU_ZERO3_SYNC_MARK_STEP_REQUIRED=1 python ../gaudi_spawn.py --world_size $DSTACK_GPUS_NUM --use_deepspeed sft.py
      --model_name_or_path $MODEL_ID
      --dataset_name "lvwerra/stack-exchange-paired"
      --deepspeed ../language-modeling/llama2_ds_zero3_config.json
      --output_dir="./sft"
      --do_train
      --max_steps=500
      --logging_steps=10
      --save_steps=100
      --per_device_train_batch_size=1
      --per_device_eval_batch_size=1
      --gradient_accumulation_steps=2
      --learning_rate=1e-4
      --lr_scheduler_type="cosine"
      --warmup_steps=100
      --weight_decay=0.05
      --optim="paged_adamw_32bit"
      --lora_target_modules "q_proj" "v_proj"
      --bf16
      --remove_unused_columns=False
      --run_name="sft_deepseek_70"
      --report_to="wandb"
      --use_habana
      --use_lazy_mode

resources:
  gpu: gaudi2:8
Submit the task using the dstack apply command:
$ dstack apply -f examples/fine-tuning/trl/intel/.dstack.yml -R
dstack will automatically create containers according to the run configuration and execute them across the fleet.
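Once training completes, the same fleet can serve the model as a dstack service. Below is a hedged sketch; the tgi-gaudi image, its tag, and the launcher flags are assumptions to verify against Hugging Face's TGI for Gaudi documentation before use:

type: service
name: deepseek-gaudi

# assumption: Hugging Face's TGI fork with Gaudi support; check the tag for your driver stack
image: ghcr.io/huggingface/tgi-gaudi:2.0.6
env:
  - MODEL_ID=deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
commands:
  - text-generation-launcher --model-id $MODEL_ID --port 8000
port: 8000

resources:
  gpu: gaudi2:1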
Explore our examples to learn how to train and deploy large models on Intel Gaudi AI accelerators.
Intel Tiber AI Cloud
At dstack, we're grateful to be part of the Intel Liftoff program, which gave us access to Intel Gaudi AI accelerators via Intel Tiber AI Cloud. If you'd like to use Intel Gaudi AI accelerators in the cloud, you can sign up there. Native integration with Intel Tiber AI Cloud is also coming soon to dstack.
What's next?
- Refer to Quickstart
- Check dev environments, tasks, services, and fleets
- Join Discord