Examples
Single-node training
TRL
Fine-tune Llama 3.1 8B on a custom dataset using TRL (see the minimal sketch below).
Axolotl
Fine-tune Llama 4 on a custom dataset using Axolotl.
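As a taste of what the single-node recipes look like, here is a minimal supervised fine-tuning run with TRL. This is a sketch, not any specific example's script: the `trl-lib/Capybara` dataset stands in for your own data, and the model name and output directory are placeholders.

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Stand-in dataset; swap in your own custom dataset.
dataset = load_dataset("trl-lib/Capybara", split="train")

# Output directory and model name are placeholders.
training_args = SFTConfig(output_dir="llama-3.1-8b-sft")

trainer = SFTTrainer(
    model="meta-llama/Llama-3.1-8B",
    train_dataset=dataset,
    args=training_args,
)
trainer.train()
```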
Distributed training
TRL
Fine-tune an LLM on multiple nodes with TRL, Accelerate, and DeepSpeed (an Accelerate loop is sketched below).
Axolotl
Fine-tune an LLM on multiple nodes with Axolotl.
Ray+RAGEN
Fine-tune an agent on multiple nodes with RAGEN, verl, and Ray.
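The multi-node recipes share one pattern: a training script written against Accelerate, launched once per node with `accelerate launch`, with DeepSpeed configured at launch time. A minimal, self-contained sketch of such a script, with a toy model and random data standing in for a real LLM and dataset:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

# Toy model and data stand in for a real LLM and dataset.
model = torch.nn.Linear(128, 2)
dataset = TensorDataset(torch.randn(1024, 128), torch.randint(0, 2, (1024,)))
dataloader = DataLoader(dataset, batch_size=32)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = torch.nn.CrossEntropyLoss()

# Accelerator picks up the topology (number of machines, ranks, DeepSpeed
# stage) from `accelerate launch`, so the same script runs unchanged on
# one GPU or across many nodes.
accelerator = Accelerator()
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

model.train()
for inputs, labels in dataloader:
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), labels)
    accelerator.backward(loss)
    optimizer.step()
```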
Clusters
NCCL tests
Run multi-node NCCL tests with MPI (a quick collectives sanity check follows this list).
RCCL tests
Run multi-node RCCL tests with MPI.
A3 Mega
Set up GCP A3 Mega clusters with optimized networking.
A3 High
Set up GCP A3 High clusters with optimized networking.
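The NCCL/RCCL recipes run the official test binaries under MPI. Before reaching for those, a quick way to confirm that collectives work across nodes at all is a small `torch.distributed` script, a stand-in rather than the test suites themselves, launched with `torchrun` on each node:

```python
import os
import torch
import torch.distributed as dist

# Launch on each node with e.g.:
#   torchrun --nnodes=2 --nproc-per-node=8 \
#     --rdzv-backend=c10d --rdzv-endpoint=<head-ip>:29500 this_script.py
dist.init_process_group(backend="nccl")
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

# Every rank contributes ones; after all_reduce each element must equal
# the world size on every rank.
x = torch.ones(1 << 20, device="cuda")
dist.all_reduce(x)
assert torch.allclose(x, torch.full_like(x, dist.get_world_size()))

if dist.get_rank() == 0:
    print(f"all_reduce OK across {dist.get_world_size()} ranks")
dist.destroy_process_group()
```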
Inference
SGLang
Deploy DeepSeek distilled models with SGLang (all of these servers can speak the OpenAI API; see the client sketch below).
vLLM
Deploy Llama 3.1 with vLLM.
TGI
Deploy Llama 4 with TGI.
NIM
Deploy a DeepSeek distilled model with NIM.
TensorRT-LLM
Deploy DeepSeek models with TensorRT-LLM.
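Whichever server you pick, SGLang, vLLM, TGI, and NIM can all expose an OpenAI-compatible endpoint, so the client side looks the same. A minimal sketch using the `openai` package, where the base URL, API key, and model name are placeholders for your deployment:

```python
from openai import OpenAI

# Base URL, API key, and model name are placeholders for your deployment.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Summarize NCCL in one sentence."}],
)
print(response.choices[0].message.content)
```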
Accelerators
AMD
Deploy and fine-tune LLMs on AMD GPUs (see the backend probe below).
TPU
Deploy and fine-tune LLMs on TPU.
Intel Gaudi
Deploy and fine-tune LLMs on Intel Gaudi.
Tenstorrent
Deploy and fine-tune LLMs on Tenstorrent.
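One detail worth knowing across these backends: on ROCm builds of PyTorch, AMD GPUs show up through the familiar `torch.cuda` API, while TPUs need the separate `torch_xla` plugin (Gaudi and Tenstorrent likewise ship their own integrations). A small, purely illustrative probe that reports which backend a given PyTorch build sees:

```python
import torch

# On ROCm builds torch.version.hip is set; on NVIDIA builds torch.version.cuda is.
if torch.cuda.is_available():
    backend = "ROCm" if torch.version.hip else "CUDA"
    print(f"{backend}: {torch.cuda.get_device_name(0)}")
else:
    print("No CUDA/ROCm device visible to this PyTorch build.")

# TPU support comes from the separate torch_xla plugin, if installed.
try:
    import torch_xla.core.xla_model as xm
    print(f"XLA device: {xm.xla_device()}")
except ImportError:
    pass
```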