# Axolotl

This example shows how to use Axolotl with `dstack` to fine-tune a 4-bit quantized Llama-4-Scout-17B-16E using SFT with FSDP and QLoRA.
## Prerequisites

Once `dstack` is installed, clone the repo and run `dstack init`:

```shell
$ git clone https://github.com/dstackai/dstack
$ cd dstack
$ dstack init
```
## Define a configuration

Axolotl reads the model, QLoRA, and dataset arguments, as well as the trainer configuration, from a `scout-qlora-fsdp1.yaml` file. The configuration uses Axolotl's 4-bit quantized version of `meta-llama/Llama-4-Scout-17B-16E`, which requires only ~43GB VRAM per GPU at a 4K context length.
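A rough back-of-envelope (an estimate, not an official number) helps make sense of the ~43GB/GPU figure: FSDP shards the 4-bit base weights across the four GPUs, and the rest of the budget goes to activations, LoRA state, and framework overhead. This assumes Scout's publicly stated ~109B total parameters (17B active across 16 experts):

```python
# Back-of-envelope VRAM estimate for 4-bit weights under FSDP sharding.
# This is a sketch, not dstack's or Axolotl's own accounting.

def weight_shard_gb(total_params: float, bits: int, num_gpus: int) -> float:
    """GB of quantized base weights held per GPU under full sharding."""
    bytes_total = total_params * bits / 8
    return bytes_total / num_gpus / 1e9

# Assumption: Scout-17B-16E has ~109B total parameters.
per_gpu = weight_shard_gb(total_params=109e9, bits=4, num_gpus=4)
print(f"~{per_gpu:.1f} GB of 4-bit weights per GPU")
# The remaining budget up to ~43GB/GPU covers activations at 4K context,
# LoRA params/grads/optimizer state, and framework overhead.
```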
Below is a task configuration that does the fine-tuning:
```yaml
type: task
# The name is optional, if not specified, generated randomly
name: axolotl-nvidia-llama-scout-train

# Using the official Axolotl's Docker image
image: axolotlai/axolotl:main-latest

# Required environment variables
env:
  - HF_TOKEN
  - WANDB_API_KEY
  - WANDB_PROJECT
  - HUB_MODEL_ID
  - DSTACK_RUN_NAME

# Commands of the task
commands:
  - wget https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/examples/llama-4/scout-qlora-fsdp1.yaml
  - |
    axolotl train scout-qlora-fsdp1.yaml \
      --wandb-project $WANDB_PROJECT \
      --wandb-name $DSTACK_RUN_NAME \
      --hub-model-id $HUB_MODEL_ID

resources:
  # Four GPUs (required by FSDP)
  gpu: H100:4
  # Shared memory size for inter-process communication
  shm_size: 64GB
  disk: 500GB..
```
The task uses Axolotl's Docker image, with Axolotl pre-installed.

**AMD**: The example above uses NVIDIA accelerators. To use it with AMD, check out the AMD example.
## Run the configuration

Once the configuration is ready, run `dstack apply -f <configuration file>`, and `dstack` will automatically provision the cloud resources and run the configuration.
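Values for the `env` entries declared without a value are taken from your shell at apply time, while `DSTACK_RUN_NAME` is injected by `dstack` itself at run time. A small pre-flight check (a sketch, not part of `dstack`) can catch variables you forgot to export:

```python
# Illustrative pre-flight check before `dstack apply` -- not a dstack
# feature, just a helper. DSTACK_RUN_NAME is set by dstack at run time,
# so only these four must come from your shell.
import os

REQUIRED_ENV = ["HF_TOKEN", "WANDB_API_KEY", "WANDB_PROJECT", "HUB_MODEL_ID"]

def missing_env(env=None):
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV if not env.get(name)]

# Example: missing_env({"HF_TOKEN": "hf_..."}) returns the other three names,
# telling you what to export before submitting the run.
```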
```shell
$ HF_TOKEN=...
$ WANDB_API_KEY=...
$ WANDB_PROJECT=...
$ HUB_MODEL_ID=...
$ dstack apply -f examples/single-node-training/axolotl/.dstack.yml

 #  BACKEND              RESOURCES                     INSTANCE TYPE  PRICE
 1  vastai (cz-czechia)  cpu=64 mem=128GB H100:80GB:2  18794506       $3.8907
 2  vastai (us-texas)    cpu=52 mem=64GB H100:80GB:2   20442365       $3.6926
 3  vastai (fr-france)   cpu=64 mem=96GB H100:80GB:2   20379984       $3.7389

Submit the run axolotl-nvidia-llama-scout-train? [y/n]:

Provisioning...
---> 100%
```
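The offers above are priced per instance-hour, so the total cost of a run scales linearly with training time. A quick arithmetic sketch (illustrative, using the cheapest offer shown above):

```python
# Illustrative cost arithmetic for the offers listed above; prices are
# per instance-hour, so cost = hourly price x wall-clock hours.
def run_cost(price_per_hour: float, hours: float) -> float:
    """Total run cost in dollars, rounded to cents."""
    return round(price_per_hour * hours, 2)

# e.g. the cheapest offer above at $3.6926/hr for a 10-hour run:
print(run_cost(3.6926, 10))  # 36.93
```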
## Source code

The source code for this example can be found in `examples/single-node-training/axolotl` and `examples/distributed-training/axolotl`.
## What's next?

- Browse the Axolotl distributed training example
- Check dev environments, tasks, services, and fleets
- See the AMD example