Tasks¶
A task allows you to schedule a job or run a web app. Tasks can be distributed and can forward ports.
Define a configuration¶
First, create a YAML file in your project folder. Its name must end with .dstack.yml (e.g., both .dstack.yml and train.dstack.yml are acceptable).
type: task
# The name is optional; if not specified, it is generated randomly
name: axolotl-train
# Use the official Axolotl Docker image
image: winglian/axolotl-cloud:main-20240429-py3.11-cu121-2.2.1
# Required environment variables
env:
  - HF_TOKEN
  - WANDB_API_KEY
# Commands of the task
commands:
  - accelerate launch -m axolotl.cli.train examples/fine-tuning/axolotl/config.yaml
resources:
  gpu:
    # 24GB or more vRAM
    memory: 24GB..
    # Two or more GPUs
    count: 2..
If you don't specify your Docker image, dstack uses the base image (pre-configured with Python, Conda, and essential CUDA drivers).
Distributed tasks
By default, a task runs on a single instance. To run it across several instances, specify the number of nodes; the task then runs on a cluster of instances.
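As a minimal sketch of a distributed task, the configuration below sets the nodes property to 2. The task name and the command are hypothetical placeholders; a real training job would replace the command with its own launcher.

```yaml
type: task
# Hypothetical name, for illustration only
name: distributed-example
# Run the task on a cluster of two instances
nodes: 2
commands:
  # Placeholder command; each node runs the same commands
  - echo "Running on $(hostname)"
```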
Reference
See .dstack.yml for all the options supported by tasks, along with multiple examples.
Run a configuration¶
To run a configuration, use the dstack apply command.
$ HF_TOKEN=...
$ WANDB_API_KEY=...
$ dstack apply -f examples/.dstack.yml
# BACKEND REGION RESOURCES SPOT PRICE
1 runpod CA-MTL-1 18xCPU, 100GB, A5000:24GB:2 yes $0.22
2 runpod EU-SE-1 18xCPU, 100GB, A5000:24GB:2 yes $0.22
3 gcp us-west4 27xCPU, 150GB, A5000:24GB:3 yes $0.33
Submit the run axolotl-train? [y/n]: y
Launching `axolotl-train`...
---> 100%
{'loss': 1.4967, 'grad_norm': 1.2734375, 'learning_rate': 1.0000000000000002e-06, 'epoch': 0.0}
0% 1/24680 [00:13<95:34:17, 13.94s/it]
6% 73/1300 [00:48<13:57, 1.47it/s]
dstack apply automatically provisions instances, uploads the contents of the repo (including your local uncommitted changes), and runs the configuration.
Ports
If the task specifies ports, dstack apply automatically forwards them to your local machine for convenient and secure access.
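For illustration, a task that runs a small web app might declare its port as follows. This is a sketch: the task name is hypothetical, and Streamlit is used here only as an example app that listens on port 8501 by default.

```yaml
type: task
# Hypothetical name, for illustration only
name: streamlit-hello
commands:
  - pip3 install streamlit
  - streamlit hello
# Ports to forward to your local machine
ports:
  - 8501
```

With a configuration like this, the app should be reachable on the same port on localhost while the run is attached.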
Queueing tasks
By default, if dstack apply cannot find capacity, the task fails. To queue the task and wait for capacity, specify the retry property in the task configuration.
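A sketch of a queued task is shown below. The task name, the command, and the one-day duration are illustrative assumptions; check the .dstack.yml reference for the exact events and duration formats your dstack version supports.

```yaml
type: task
# Hypothetical name, for illustration only
name: queued-train
commands:
  # Placeholder command
  - python train.py
retry:
  # Retry when no capacity is available
  on_events: [no-capacity]
  # Keep waiting for capacity for up to one day (illustrative value)
  duration: 1d
```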
Manage runs¶
List runs¶
The dstack ps command lists all running jobs and their statuses. Use --watch (or -w) to monitor the live status of runs.
Stop a run¶
Once the run exceeds the max_duration, or when you use dstack stop, the task is stopped. Use --abort or -x to stop the run abruptly.
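To cap how long a task may run, max_duration can be set in the task configuration. The sketch below uses an illustrative two-hour limit with a hypothetical task name and command.

```yaml
type: task
# Hypothetical name, for illustration only
name: short-job
commands:
  # Placeholder command
  - python train.py
# Stop the run automatically after two hours (illustrative value)
max_duration: 2h
```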
Manage fleets¶
Creation policy¶
By default, when you run dstack apply with a dev environment, task, or service, dstack reuses idle instances from an existing fleet. If no idle instances match the requirements, it automatically creates a new fleet using backends.
To ensure dstack apply doesn't create a new fleet but reuses an existing one, pass -R (or --reuse) to dstack apply.
$ dstack apply -R -f examples/.dstack.yml
Alternatively, set creation_policy to reuse in the run configuration.
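In a run configuration, the equivalent of -R might look like this sketch (the task name and command are hypothetical):

```yaml
type: task
# Hypothetical name, for illustration only
name: reuse-only-task
commands:
  # Placeholder command
  - python train.py
# Only reuse instances from an existing fleet; never create a new one
creation_policy: reuse
```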
Termination policy¶
If a fleet is created automatically, it remains idle for 5 minutes and can be reused within that time. To change the default idle duration, set termination_idle_time in the run configuration (e.g., to 0 or a longer duration).
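As a sketch, the configuration below keeps an automatically created fleet around for one hour of idleness instead of the default 5 minutes. The task name, command, and the 1h value are illustrative; the exact accepted duration formats are an assumption to verify against the .dstack.yml reference.

```yaml
type: task
# Hypothetical name, for illustration only
name: long-idle-task
commands:
  # Placeholder command
  - python train.py
# Keep idle instances for one hour before terminating them (illustrative value)
termination_idle_time: 1h
```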
Fleets
For greater control over fleet provisioning, configuration, and lifecycle management, it is recommended to use fleets directly.
What's next?¶
- Read about dev environments, services, and repos
- Learn how to manage fleets
- Check the Axolotl example