task
¶
The task
configuration type allows running tasks.
Root reference¶
nodes
- (Optional) Number of nodes. Defaults to 1
.¶
name
- (Optional) The run name. If not specified, a random name is generated.¶
image
- (Optional) The name of the Docker image to run.¶
user
- (Optional) The user inside the container, user_name_or_id[:group_name_or_id]
(e.g., ubuntu
, 1000:1000
). Defaults to the default image
user.¶
privileged
- (Optional) Run the container in privileged mode.¶
entrypoint
- (Optional) The Docker entrypoint.¶
working_dir
- (Optional) The path to the working directory inside the container. It's specified relative to the repository directory (/workflow
) and should be inside it. Defaults to "."
.¶
registry_auth
- (Optional) Credentials for pulling a private Docker image.¶
python
- (Optional) The major version of Python. Mutually exclusive with image
.¶
nvcc
- (Optional) Use an image with the NVIDIA CUDA Compiler (NVCC) included. Mutually exclusive with image
.¶
env
- (Optional) The mapping or the list of environment variables.¶
resources
- (Optional) The resource requirements to run the configuration.¶
volumes
- (Optional) The volume mount points.¶
ports
- (Optional) Port numbers/mapping to expose.¶
commands
- (Optional) The bash commands to run.¶
backends
- (Optional) The backends to consider for provisioning (e.g., [aws, gcp]
).¶
regions
- (Optional) The regions to consider for provisioning (e.g., [eu-west-1, us-west4, westeurope]
).¶
instance_types
- (Optional) The cloud-specific instance types to consider for provisioning (e.g., [p3.8xlarge, n1-standard-4]
).¶
reservation
- (Optional) The existing reservation to use for instance provisioning. Supports AWS Capacity Reservations and Capacity Blocks.¶
spot_policy
- (Optional) The policy for provisioning spot or on-demand instances: spot
, on-demand
, or auto
. Defaults to on-demand
.¶
retry
- (Optional) The policy for resubmitting the run. Defaults to false
.¶
max_duration
- (Optional) The maximum duration of a run (e.g., 2h
, 1d
). After it elapses, the run is forced to stop. Defaults to off
.¶
max_price
- (Optional) The maximum instance price per hour, in dollars.¶
creation_policy
- (Optional) The policy for using instances from the pool. Defaults to reuse-or-create
.¶
idle_duration
- (Optional) Time to wait before terminating idle instances. Defaults to 5m
for runs and 3d
for fleets. Use off
for unlimited duration.¶
termination_policy
- (Optional) Deprecated in favor of idle_duration
.¶
termination_idle_time
- (Optional) Deprecated in favor of idle_duration
.¶
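As an illustration, the root properties above could be combined as in the sketch below. It assumes the configuration is selected with a top-level type: task key and that env accepts KEY=value list items; the run name, script, variable, and price are hypothetical placeholders, not values taken from this reference.

```yaml
type: task                 # assumed key that selects the task configuration type
name: train-example        # hypothetical run name; omit to get a random one

nodes: 1
python: "3.11"             # mutually exclusive with `image`; quoted so YAML keeps it a string

env:
  - BATCH_SIZE=32          # hypothetical variable, list form

commands:
  - pip install -r requirements.txt
  - python train.py        # hypothetical script

ports:
  - 8000

# Provisioning constraints, using the example values from the reference
backends: [aws, gcp]
regions: [eu-west-1, us-west4]
spot_policy: auto
max_duration: 2h           # the run is forced to stop after this
max_price: 2.5             # hypothetical hourly limit, in dollars
idle_duration: 5m
```

Properties left out (for example image, user, or working_dir) fall back to the defaults listed above.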
retry
¶
on_events
- The list of events that should be handled with retry. Supported events are no-capacity
, interruption
, and error
.¶
duration
- (Optional) The maximum period of retrying the run, e.g., 4h
or 1d
.¶
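A minimal sketch of a retry policy in object form, using only the events and duration format listed above. Which events to include is a per-workload choice; this selection is an illustration.

```yaml
retry:
  # Resubmit when capacity is unavailable or a spot instance is interrupted
  on_events: [no-capacity, interruption]
  # Stop retrying after four hours
  duration: 4h
```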
resources
¶
cpu
- (Optional) The number of CPU cores. Defaults to 2..
.¶
memory
- (Optional) The RAM size (e.g., 8GB
). Defaults to 8GB..
.¶
shm_size
- (Optional) The size of shared memory (e.g., 8GB
). If you are using parallel communicating processes (e.g., dataloaders in PyTorch), you may need to configure this.¶
gpu
- (Optional) The GPU requirements. Can be set to a number, a string (e.g. A100
, 80GB:2
), or an object.¶
disk
- (Optional) The disk resources.¶
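The resource properties above might be combined as follows. This is a sketch: the open-ended ranges follow the N.. notation used for the defaults, and the reading of the gpu string as a memory size and a count is an assumption based on the example values.

```yaml
resources:
  cpu: 4..          # open-ended range: 4 or more cores
  memory: 16GB..    # open-ended range: 16GB of RAM or more
  shm_size: 8GB     # useful for parallel communicating processes such as PyTorch dataloaders
  gpu: 80GB:2       # string form from the reference; presumably two GPUs with 80GB each
```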
resources.gpu
¶
vendor
- (Optional) The vendor of the GPU/accelerator, one of: nvidia
, amd
, google
(alias: tpu
).¶
name
- (Optional) The GPU name or list of names.¶
count
- (Optional) The number of GPUs. Defaults to 1
.¶
memory
- (Optional) The memory size of each GPU (e.g., 16GB
). Can be set to a range (e.g. 16GB..
, or 16GB..80GB
).¶
total_memory
- (Optional) The total memory size across all GPUs (e.g., 32GB
). Can be set to a range (e.g. 16GB..
, or 16GB..80GB
).¶
compute_capability
- (Optional) The minimum compute capability of the GPU (e.g., 7.5
).¶
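When the string form is not expressive enough, gpu can be written as an object. The sketch below uses the properties listed above; the GPU names and the memory range are illustrative choices.

```yaml
resources:
  gpu:
    vendor: nvidia
    name: [A100, H100]    # a single name or a list of acceptable names
    count: 2
    memory: 40GB..80GB    # range form, as described above
```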
resources.disk
¶
size
- The disk size. Can be set to a range (e.g., 100GB..
or 100GB..200GB
).¶
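For example, a disk requirement expressed as a bounded range, using the example values above:

```yaml
resources:
  disk:
    size: 100GB..200GB    # use 100GB.. for an open-ended range
```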
registry_auth
¶
username
- The username.¶
password
- The password or access token.¶
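A sketch of pulling a private image with registry_auth. The registry host, image name, and credentials are hypothetical placeholders, and real secrets should not be committed in plain text.

```yaml
image: registry.example.com/team/private-image:latest   # hypothetical private image
registry_auth:
  username: my-user                        # hypothetical username
  password: "REPLACE_WITH_ACCESS_TOKEN"    # placeholder for a password or access token
```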
volumes[n]
¶
Short syntax
The short syntax for volumes is a colon-separated string in the form of source:destination
- volume-name:/container/path for network volumes
- /instance/path:/container/path for instance volumes
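Both forms of the short syntax might appear together in a volumes list, as sketched below. The volume name and paths are hypothetical.

```yaml
volumes:
  - my-volume:/data            # network volume: volume-name:/container/path
  - /mnt/cache:/root/.cache    # instance volume: /instance/path:/container/path
```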