Tasks

A task allows you to schedule a job or run a web app. Tasks can be distributed and can forward ports.

Define a configuration

First, create a YAML file in your project folder. Its name must end with .dstack.yml (for example, .dstack.yml or train.dstack.yml).

type: task
# The name is optional; if not specified, it is generated randomly
name: axolotl-train

# Use the official Axolotl Docker image
image: winglian/axolotl-cloud:main-20240429-py3.11-cu121-2.2.1

# Required environment variables
env:
  - HF_TOKEN
  - WANDB_API_KEY
# Commands of the task
commands:
  - accelerate launch -m axolotl.cli.train examples/fine-tuning/axolotl/config.yaml

resources:
  gpu:
    # 24GB or more of VRAM
    memory: 24GB..
    # Two or more GPUs
    count: 2..

If you don't specify your Docker image, dstack uses the base image (pre-configured with Python, Conda, and essential CUDA drivers).

Distributed tasks

By default, a task runs on a single instance. To run it on a cluster of instances instead, specify the number of nodes in the task configuration.
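
As a rough sketch, a two-node task might look like the configuration below. The task name and train.py are hypothetical placeholders, and the DSTACK_* environment variables are assumed to be set by dstack on every node, so verify the exact names in the reference before relying on them.

```yaml
type: task
# Hypothetical name, used for illustration only
name: train-distrib

# Run the task on a cluster of two instances
nodes: 2

commands:
  - pip install torch
  # train.py is a placeholder script; the DSTACK_* variables are
  # assumptions about what dstack exposes to each node
  - >
    torchrun
    --nnodes=$DSTACK_NODES_NUM
    --node_rank=$DSTACK_NODE_RANK
    --nproc_per_node=$DSTACK_GPUS_PER_NODE
    --master_addr=$DSTACK_MASTER_NODE_IP
    --master_port=8008
    train.py

resources:
  gpu:
    memory: 24GB..
```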

Reference

See .dstack.yml for all the options supported by tasks, along with multiple examples.

Run a configuration

To run a configuration, use the dstack apply command.

$ HF_TOKEN=...
$ WANDB_API_KEY=...
$ dstack apply -f examples/.dstack.yml

 #  BACKEND  REGION    RESOURCES                    SPOT  PRICE
 1  runpod   CA-MTL-1  18xCPU, 100GB, A5000:24GB:2  yes   $0.22
 2  runpod   EU-SE-1   18xCPU, 100GB, A5000:24GB:2  yes   $0.22
 3  gcp      us-west4  27xCPU, 150GB, A5000:24GB:3  yes   $0.33

Submit the run axolotl-train? [y/n]: y

Launching `axolotl-train`...
---> 100%

{'loss': 1.4967, 'grad_norm': 1.2734375, 'learning_rate': 1.0000000000000002e-06, 'epoch': 0.0}
  0% 1/24680 [00:13<95:34:17, 13.94s/it]
  6% 73/1300 [00:48<13:57,  1.47it/s]

dstack apply automatically uploads the code from the current repo, including your local uncommitted changes. To avoid uploading large files, ensure they are listed in .gitignore.

Ports

If the task specifies ports, dstack apply automatically forwards them to your local machine for convenient and secure access.
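
For illustration, a minimal task that serves files over HTTP and exposes the port could look like this. The task name and the choice of port 8000 are assumptions made for the example. It relies on Python being available in the base image.

```yaml
type: task
# Hypothetical name, used for illustration only
name: http-server

commands:
  # Serve the working directory over HTTP on port 8000
  - python3 -m http.server 8000

# Ports to forward to the local machine
ports:
  - 8000
```

With this configuration, the server should be reachable at localhost:8000 on your machine while the run is attached.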

Queueing tasks

By default, if dstack apply cannot find capacity, the task fails. To queue the task and wait for capacity, specify the retry property in the task configuration.
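
A sketch of what this might look like, added to the task configuration. The nested on_events and duration fields and their values are assumptions about the retry schema, so check them against the reference.

```yaml
# Fragment of a task configuration (field names assumed, see reference)
retry:
  # Retry only when no capacity is available
  on_events: [no-capacity]
  # Keep waiting for capacity for up to one hour
  duration: 1h
```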

Manage runs

List runs

The dstack ps command lists all running jobs and their statuses. Use --watch (or -w) to monitor the live status of runs.

Stop a run

A run stops once it exceeds its max_duration or when you run dstack stop. Pass --abort (or -x) to dstack stop to stop the run abruptly.
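
As an example, the duration limit could be set in the task configuration roughly as follows. The value 2h is an arbitrary illustration, and the duration format is assumed to accept strings of this kind.

```yaml
# Fragment of a task configuration
# Stop the run automatically after two hours (illustrative value)
max_duration: 2h
```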

Manage fleets

Creation policy

By default, when you run dstack apply with a dev environment, task, or service, dstack reuses idle instances from an existing fleet. If no idle instances match the requirements, it automatically creates a new fleet using the configured backends.

To ensure dstack apply doesn't create a new fleet but reuses an existing one, pass -R (or --reuse) to dstack apply.

$ dstack apply -R -f examples/.dstack.yml

Alternatively, set creation_policy to reuse in the run configuration.
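
In the task configuration, this would presumably be a single top-level property, sketched here as a fragment:

```yaml
# Fragment of a task configuration
# Only reuse instances from an existing fleet; never create a new one
creation_policy: reuse
```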

Termination policy

If a fleet is created automatically, it stays idle for 5 minutes by default after the run finishes and can be reused within that time. To change the idle duration, set termination_idle_time in the run configuration (e.g., to 0 or a longer duration).
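
A minimal sketch of this setting, assuming duration strings such as 30m are accepted (the value itself is illustrative):

```yaml
# Fragment of a task configuration
# Keep the automatically created fleet idle for 30 minutes before
# terminating it; set to 0 to terminate it without an idle period
termination_idle_time: 30m
```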

Fleets

For greater control over fleet provisioning, configuration, and lifecycle management, it is recommended to use fleets directly.

What's next?

  1. Check the Axolotl example
  2. Browse all examples
  3. See fleets to learn how to manage fleets
