Alignment Handbook

This example shows how to use Alignment Handbook with dstack to fine-tune Gemma 7B on your SFT dataset, using either a single node or multiple nodes.

Prerequisites

Once dstack is installed, go ahead and clone the repo, and run dstack init:

$ git clone https://github.com/dstackai/dstack
$ cd dstack
$ dstack init

Training configuration recipe

Alignment Handbook's training script reads the model, LoRA, and dataset arguments, as well as the trainer configuration, from a YAML file. This file can be found at examples/fine-tuning/alignment-handbook/config.yaml. You can modify it as needed.

Before you proceed with training, make sure to update the hub_model_id in examples/fine-tuning/alignment-handbook/config.yaml with your HuggingFace username.
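
For orientation, below is an illustrative excerpt of what such a config may contain. The field names follow Alignment Handbook's recipe format, but the model, dataset, and hub_model_id values here are placeholders; the actual examples/fine-tuning/alignment-handbook/config.yaml in the repo is the source of truth.

# Illustrative excerpt only; see config.yaml in the repo for the real values
# Model and LoRA arguments
model_name_or_path: google/gemma-7b
torch_dtype: bfloat16
use_peft: true
lora_r: 16
lora_alpha: 32

# Dataset arguments (replace with your SFT dataset)
dataset_mixer:
  HuggingFaceH4/ultrachat_200k: 1.0

# Trainer configuration
output_dir: data/gemma-7b-sft-lora
hub_model_id: your-hf-username/gemma-7b-sft-lora  # set this to your HuggingFace username
report_to:
- wandb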

Single-node training

The easiest way to run a training script with dstack is by creating a task configuration file. This file can be found at examples/fine-tuning/alignment-handbook/train.dstack.yml. Below is its content:

type: task
name: ah-train

# If `image` is not specified, dstack uses its default image
python: "3.10"
# Ensure nvcc is installed (req. for Flash Attention) 
nvcc: true

# Required environment variables
env:
  - HUGGING_FACE_HUB_TOKEN
  - ACCELERATE_LOG_LEVEL=info
  - WANDB_API_KEY
# Commands of the task
commands:
  - git clone https://github.com/huggingface/alignment-handbook.git
  - cd alignment-handbook
  - pip install .
  - pip install flash-attn --no-build-isolation
  - pip install wandb
  - accelerate launch
    --config_file recipes/accelerate_configs/multi_gpu.yaml
    --num_processes=$DSTACK_GPUS_NUM
    scripts/run_sft.py
    ../examples/fine-tuning/alignment-handbook/config.yaml
# Uncomment to access TensorBoard
#ports:
#  - 6006

resources:
  # Required resources
  gpu: 24GB

The task clones Alignment Handbook's repo, installs the dependencies, and runs the script.

The DSTACK_GPUS_NUM environment variable is automatically passed to the container according to the resources property.

To run the task, use dstack apply:

$ HUGGING_FACE_HUB_TOKEN=...
$ WANDB_API_KEY=...

$ dstack apply -f examples/fine-tuning/alignment-handbook/train.dstack.yml

If you list tensorboard via report_to in examples/fine-tuning/alignment-handbook/config.yaml and uncomment the ports section in the task configuration, you'll be able to access experiment metrics at http://localhost:6006 while the task is running.
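
For example, the relevant part of config.yaml could look like this (illustrative; the exact value depends on your setup):

# Illustrative excerpt of config.yaml: log metrics to TensorBoard
report_to:
- tensorboard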

Multi-node training

The multi-node training task configuration file can be found at examples/fine-tuning/alignment-handbook/train-distrib.dstack.yml. Below is its content:

type: task
name: ah-train-distrib

# If `image` is not specified, dstack uses its default image
python: "3.10"
# Ensure nvcc is installed (req. for Flash Attention) 
nvcc: true

# The size of the cluster
nodes: 2

# Required environment variables
env:
  - HUGGING_FACE_HUB_TOKEN
  - ACCELERATE_LOG_LEVEL=info
  - WANDB_API_KEY
# Commands of the task (dstack runs it on each node)
commands:
  - git clone https://github.com/huggingface/alignment-handbook.git
  - cd alignment-handbook
  - pip install .
  - pip install flash-attn --no-build-isolation
  - pip install wandb
  - accelerate launch
    --config_file ../examples/fine-tuning/alignment-handbook/fsdp_qlora_full_shard.yaml
    --main_process_ip=$DSTACK_MASTER_NODE_IP
    --main_process_port=8008
    --machine_rank=$DSTACK_NODE_RANK
    --num_processes=$DSTACK_GPUS_NUM
    --num_machines=$DSTACK_NODES_NUM
    scripts/run_sft.py 
    ../examples/fine-tuning/alignment-handbook/config.yaml
# Expose 6006 to access TensorBoard
ports:
  - 6006

resources:
  # Required resources
  gpu: 24GB
  # Shared memory size for inter-process communication
  shm_size: 24GB

Here's how the multi-node task is different from the single-node one:

  1. The nodes property specifies the number of required nodes (it should match the number of nodes in the fleet).
  2. Under resources, shm_size specifies the shared memory size used for communication between parallel processes within a node (relevant when multiple GPUs per node are used).
  3. Instead of Alignment Handbook's recipes/accelerate_configs/multi_gpu.yaml, we use examples/fine-tuning/alignment-handbook/fsdp_qlora_full_shard.yaml as the accelerate config (see the sketch after this list).
  4. We use the DSTACK_MASTER_NODE_IP, DSTACK_NODE_RANK, DSTACK_GPUS_NUM, and DSTACK_NODES_NUM environment variables to configure accelerate. These variables are automatically passed to the container on each node based on the task configuration.
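
For orientation, an FSDP accelerate config generally looks roughly like the sketch below. The key names follow accelerate's config format, but the values are placeholders; the actual examples/fine-tuning/alignment-handbook/fsdp_qlora_full_shard.yaml is the source of truth.

# Illustrative sketch of an FSDP accelerate config; see fsdp_qlora_full_shard.yaml for the real one
compute_environment: LOCAL_MACHINE
distributed_type: FSDP
fsdp_config:
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_sharding_strategy: FULL_SHARD
  fsdp_state_dict_type: SHARDED_STATE_DICT
mixed_precision: bf16
# Rank, node count, and process count are overridden on the command line
# via --machine_rank, --num_machines, and --num_processes (see the task above)
machine_rank: 0
num_machines: 1
num_processes: 1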

Fleets

By default, dstack apply reuses idle instances from one of the existing fleets. If no idle instances meet the requirements, it creates a new fleet using one of the configured backends.

The example folder includes two cloud fleet configurations: examples/fine-tuning/alignment-handbook/fleet.dstack.yml (a single node with a 24GB GPU) and examples/fine-tuning/alignment-handbook/fleet-distrib.dstack.yml (a cluster of two nodes, each with a 24GB GPU).

You can update the fleet configurations to change the vRAM size, GPU model, number of GPUs per node, or number of nodes.
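
For reference, a cluster fleet configuration is roughly as follows (an illustrative sketch using dstack's fleet format; the name is hypothetical, and the actual files in the examples folder are authoritative):

type: fleet
# Hypothetical name for illustration
name: ah-fleet-distrib
# Provision a cluster of two interconnected nodes
nodes: 2
placement: cluster

resources:
  gpu: 24GB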

A fleet can be provisioned with dstack apply:

$ dstack apply -f examples/fine-tuning/alignment-handbook/fleet.dstack.yml

Once provisioned, the fleet can run dev environments and fine-tuning tasks. To delete the fleet, use dstack fleet delete.

To ensure dstack apply always reuses an existing fleet, pass --reuse to dstack apply (or set creation_policy to reuse in the task configuration). The default policy is reuse_or_create.
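
A minimal sketch of the latter, assuming creation_policy is set at the top level of the task configuration as described above:

type: task
name: ah-train
# Only reuse idle instances from existing fleets (the default is reuse_or_create)
creation_policy: reuse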

Dev environment

If you'd like to play with the example using a dev environment, run .dstack.yml via dstack apply:

$ dstack apply -f examples/fine-tuning/alignment-handbook/.dstack.yml
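
For reference, a dev environment configuration is typically as simple as the sketch below (illustrative; the name is hypothetical, and the actual .dstack.yml in the examples folder is authoritative):

type: dev-environment
# Hypothetical name for illustration
name: ah-dev

python: "3.10"
ide: vscode

resources:
  gpu: 24GB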

Source code

The source code of this example can be found in examples/fine-tuning/alignment-handbook.

What's next?

  1. Check dev environments, tasks, services, and fleets.
  2. Browse Alignment Handbook.
  3. See other examples.