task¶
The task configuration type allows running tasks.
Configuration files must be inside the project repo, and their names must end with .dstack.yml (e.g. .dstack.yml and train.dstack.yml are both acceptable). Any configuration can be run via dstack apply.
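For example, to run a configuration saved as train.dstack.yml (the file name here is just an illustration):

dstack apply -f train.dstack.yml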
Examples¶
Python version¶
If you don't specify image, dstack uses its base Docker image pre-configured with python, pip, conda (Miniforge), and essential CUDA drivers. The python property determines which default Docker image is used.
type: task
# The name is optional, if not specified, generated randomly
name: train
# If `image` is not specified, dstack uses its base image
python: "3.10"
# Commands of the task
commands:
- pip install -r fine-tuning/qlora/requirements.txt
- python fine-tuning/qlora/train.py
nvcc
By default, the base Docker image doesn’t include nvcc, which is required for building custom CUDA kernels. If you need nvcc, set the corresponding property to true.
type: task
# The name is optional, if not specified, generated randomly
name: train
# If `image` is not specified, dstack uses its base image
python: "3.10"
# Ensure nvcc is installed (req. for Flash Attention)
nvcc: true
commands:
- pip install -r fine-tuning/qlora/requirements.txt
- python fine-tuning/qlora/train.py
Ports¶
A task can configure ports. In this case, if the task runs an application on a port, dstack apply
will securely allow you to access this port from your local machine through port forwarding.
type: task
# The name is optional, if not specified, generated randomly
name: train
python: "3.10"
# Commands of the task
commands:
- pip install -r fine-tuning/qlora/requirements.txt
- tensorboard --logdir results/runs --port 6000 &
- python fine-tuning/qlora/train.py
# Expose the port to access TensorBoard
ports:
- 6000
When running it, dstack apply forwards port 6000 to localhost:6000, enabling secure access.
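While the run is active, the forwarded port behaves like a local one. For a quick check from another terminal (the curl call is just an illustration):

curl localhost:6000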
Docker image¶
If you want, you can specify your own Docker image via the image property.
type: task
# The name is optional, if not specified, generated randomly
name: train
# Any custom Docker image
image: dstackai/base:py3.10-0.5-cuda-12.1
# Commands of the task
commands:
- pip install -r fine-tuning/qlora/requirements.txt
- python fine-tuning/qlora/train.py
Private registry
Use the registry_auth property to provide credentials for a private Docker registry.
type: task
# The name is optional, if not specified, generated randomly
name: train
# Any private Docker image
image: dstackai/base:py3.10-0.5-cuda-12.1
# Credentials of the private Docker registry
registry_auth:
  username: peterschmidt85
  password: ghp_e49HcZ9oYwBzUbcSk2080gXZOU2hiT9AeSR5
# Commands of the task
commands:
- pip install -r fine-tuning/qlora/requirements.txt
- python fine-tuning/qlora/train.py
Resources¶
When specifying memory size, you can use either an explicit size (e.g. 24GB) or a range (e.g. 24GB.., 24GB..80GB, or ..80GB).
type: task
# The name is optional, if not specified, generated randomly
name: train
# Commands of the task
commands:
- pip install -r fine-tuning/qlora/requirements.txt
- python fine-tuning/qlora/train.py
resources:
  # 200GB or more RAM
  memory: 200GB..
  # 4 GPUs from 40GB to 80GB
  gpu: 40GB..80GB:4
  # Shared memory (required by multi-gpu)
  shm_size: 16GB
  # Disk size
  disk: 500GB
The gpu property allows specifying not only memory size but also GPU vendor, names, and their quantity. Examples: nvidia (one NVIDIA GPU), A100 (one A100), A10G,A100 (either A10G or A100), A100:80GB (one A100 of 80GB), A100:2 (two A100), 24GB..40GB:2 (two GPUs between 24GB and 40GB), A100:40GB:2 (two A100 GPUs of 40GB).
Google Cloud TPU
To use TPUs, specify the TPU architecture via the gpu property.
type: task
# The name is optional, if not specified, generated randomly
name: train
python: "3.10"
# Commands of the task
commands:
- pip install torch~=2.3.0 torch_xla[tpu]~=2.3.0 torchvision -f https://storage.googleapis.com/libtpu-releases/index.html
- git clone --recursive https://github.com/pytorch/xla.git
- python3 xla/test/test_train_mp_imagenet.py --fake_data --model=resnet50 --num_epochs=1
resources:
  gpu: v2-8
Currently, only 8 TPU cores can be specified, supporting single-host workloads. Multi-host support is coming soon.
Shared memory
If you are using parallel communicating processes (e.g., dataloaders in PyTorch), you may need to configure shm_size, e.g. set it to 16GB.
Environment variables¶
type: task
python: "3.10"
# Environment variables
env:
- HUGGING_FACE_HUB_TOKEN
- HF_HUB_ENABLE_HF_TRANSFER=1
# Commands of the task
commands:
- pip install -r fine-tuning/qlora/requirements.txt
- python fine-tuning/qlora/train.py
If you don't assign a value to an environment variable (see HUGGING_FACE_HUB_TOKEN above), dstack will require the value to be passed via the CLI or set in the current process. For instance, you can define environment variables in a .envrc file and use tools like direnv.
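For example, a minimal .envrc for direnv might look like this (the token value is a placeholder, not a real secret):

# .envrc: loaded by direnv when you cd into the project
export HUGGING_FACE_HUB_TOKEN=hf_xxxxxxxxxxxxxxxx

After editing .envrc, run direnv allow once to approve it.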
System environment variables¶
The following environment variables are available in any run and are passed by dstack by default:
| Name | Description |
|---|---|
| DSTACK_RUN_NAME | The name of the run |
| DSTACK_REPO_ID | The ID of the repo |
| DSTACK_GPUS_NUM | The total number of GPUs in the run |
| DSTACK_NODES_NUM | The number of nodes in the run |
| DSTACK_NODE_RANK | The rank of the node |
| DSTACK_MASTER_NODE_IP | The internal IP address of the master node |
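As a quick illustration, these variables can be referenced directly in commands. The snippet below is a hypothetical minimal task, not one of the examples above:

type: task
name: env-demo
commands:
  - echo "Run $DSTACK_RUN_NAME: node $DSTACK_NODE_RANK of $DSTACK_NODES_NUM, $DSTACK_GPUS_NUM GPUs total"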
Distributed tasks¶
By default, the task runs on a single node. However, you can run it on a cluster of nodes.
type: task
# The name is optional, if not specified, generated randomly
name: train-distrib
# The size of the cluster
nodes: 2
python: "3.10"
# Commands of the task
commands:
- pip install -r requirements.txt
- torchrun
  --nproc_per_node=$DSTACK_GPUS_PER_NODE
  --node_rank=$DSTACK_NODE_RANK
  --nnodes=$DSTACK_NODES_NUM
  --master_addr=$DSTACK_MASTER_NODE_IP
  --master_port=8008 resnet_ddp.py
  --num_epochs 20
resources:
gpu: 24GB
If you run the task, dstack first provisions the master node and then runs the other nodes of the cluster. All nodes are provisioned in the same region.
dstack is easy to use with accelerate, torchrun, and other distributed frameworks. All you need to do is pass the corresponding environment variables, such as DSTACK_GPUS_PER_NODE, DSTACK_NODE_RANK, DSTACK_NODES_NUM, DSTACK_MASTER_NODE_IP, and DSTACK_GPUS_NUM (see System environment variables).
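For instance, here is a sketch of an equivalent multi-node launch with accelerate; the train.py script and the port are illustrative assumptions:

commands:
  - pip install accelerate
  - accelerate launch
    --num_processes=$DSTACK_GPUS_NUM
    --num_machines=$DSTACK_NODES_NUM
    --machine_rank=$DSTACK_NODE_RANK
    --main_process_ip=$DSTACK_MASTER_NODE_IP
    --main_process_port=8008
    train.py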
Backends
Running on multiple nodes is supported only with the aws, gcp, azure, and oci backends, or with SSH fleets.
Web applications¶
Here's an example of using ports to run web apps with tasks.
type: task
# The name is optional, if not specified, generated randomly
name: streamlit-hello
python: "3.10"
# Commands of the task
commands:
- pip3 install streamlit
- streamlit hello
# Expose the port to access the web app
ports:
- 8501
Spot policy¶
You can choose whether to use spot instances, on-demand instances, or any available type.
type: task
# The name is optional, if not specified, generated randomly
name: train
# Commands of the task
commands:
- pip install -r fine-tuning/qlora/requirements.txt
- python fine-tuning/qlora/train.py
# Use either spot or on-demand instances
spot_policy: auto
The spot_policy accepts spot, on-demand, and auto. The default for tasks is on-demand.
Queueing tasks¶
By default, if dstack apply cannot find capacity, the task fails.
To queue the task and wait for capacity, specify the retry property:
type: task
# The name is optional, if not specified, generated randomly
name: train
# Commands of the task
commands:
- pip install -r fine-tuning/qlora/requirements.txt
- python fine-tuning/qlora/train.py
retry:
  # Retry on no-capacity errors
  on_events: [no-capacity]
  # Retry within 1 day
  duration: 1d
Backends¶
By default, dstack provisions instances in all configured backends. However, you can specify the list of backends:
type: task
# The name is optional, if not specified, generated randomly
name: train
# Commands of the task
commands:
- pip install -r fine-tuning/qlora/requirements.txt
- python fine-tuning/qlora/train.py
# Use only listed backends
backends: [aws, gcp]
Regions¶
By default, dstack uses all configured regions. However, you can specify the list of regions:
type: task
# The name is optional, if not specified, generated randomly
name: train
# Commands of the task
commands:
- pip install -r fine-tuning/qlora/requirements.txt
- python fine-tuning/qlora/train.py
# Use only listed regions
regions: [eu-west-1, eu-west-2]
Volumes¶
Volumes allow you to persist data between runs.
To attach a volume, specify its name via the volumes property and choose where to mount its contents:
type: task
# The name is optional, if not specified, generated randomly
name: train
python: "3.10"
# Commands of the task
commands:
- pip install -r fine-tuning/qlora/requirements.txt
- python fine-tuning/qlora/train.py
# Map the name of the volume to any path
volumes:
  - name: my-new-volume
    path: /volume_data
Once you run this configuration, the volume's contents will be attached to /volume_data inside the task and will persist across runs.
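For reference, a named volume is defined in its own configuration and created with dstack apply. A minimal sketch, where the backend, region, and size values are illustrative assumptions:

type: volume
name: my-new-volume
backend: aws
region: eu-central-1
size: 100GB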
Limitations
When you're running a dev environment, task, or service with dstack, it automatically mounts the project folder contents to /workflow (and sets that as the current working directory). Right now, dstack doesn't allow you to attach volumes to /workflow or any of its subdirectories.
The task configuration type supports many other options. See below.