fleet

The fleet configuration type allows creating and updating fleets.

Root reference

name - (Optional) The fleet name.
env - (Optional) The mapping or the list of environment variables.
ssh_config - (Optional) The parameters for adding instances via SSH.
nodes - (Optional) The number of instances.
placement - (Optional) The placement of instances: any or cluster.
reservation - (Optional) The existing reservation to use for instance provisioning. Supports AWS Capacity Reservations and Capacity Blocks.
resources - (Optional) The resource requirements.
backends - (Optional) The backends to consider for provisioning (e.g., [aws, gcp]).
regions - (Optional) The regions to consider for provisioning (e.g., [eu-west-1, us-west4, westeurope]).
instance_types - (Optional) The cloud-specific instance types to consider for provisioning (e.g., [p3.8xlarge, n1-standard-4]).
spot_policy - (Optional) The policy for provisioning spot or on-demand instances: spot, on-demand, or auto.
retry - (Optional) The policy for provisioning retry. Defaults to false.
max_price - (Optional) The maximum instance price per hour, in dollars.
idle_duration - (Optional) Time to wait before terminating idle instances. Defaults to 5m for runs and 3d for fleets. Use off for unlimited duration.
termination_policy - (Optional) Deprecated in favor of idle_duration.
termination_idle_time - (Optional) Deprecated in favor of idle_duration.
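
As an illustration, several of the root properties above might be combined into a cloud fleet configuration along these lines. This is a minimal sketch: the `type: fleet` line is an assumption based on this page describing the fleet configuration type (it is not listed among the properties above), and the fleet name and all values are hypothetical.

```yaml
# Hypothetical cloud fleet; every value here is illustrative only.
type: fleet            # assumed: selects the fleet configuration type
name: my-fleet         # hypothetical fleet name
nodes: 2               # number of instances
placement: cluster     # any or cluster
backends: [aws]        # backends to consider
regions: [eu-west-1]   # regions to consider
spot_policy: auto      # spot, on-demand, or auto
max_price: 5.0         # maximum price per hour, in dollars
idle_duration: 1d      # terminate instances idle for longer than this
env:
  LOG_LEVEL: info      # hypothetical variable, shown in mapping form
```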

ssh_config

user - (Optional) The user to log in with on all hosts.
port - (Optional) The SSH port to connect to.
identity_file - (Optional) The private key to use for all hosts.
hosts - The per-host connection parameters: a hostname or an object that overrides the default SSH parameters.
network - (Optional) The network address for cluster setup in the format <ip>/<netmask>. dstack will use IP addresses from this network for communication between hosts. If not specified, dstack will use IPs from the first found internal network.

ssh_config.hosts[n]

hostname - The IP address or domain to connect to.
port - (Optional) The SSH port to connect to for this host.
user - (Optional) The user to log in with for this host.
identity_file - (Optional) The private key to use for this host.
internal_ip - (Optional) The internal IP of the host used for communication inside the cluster. If not specified, dstack will use the IP address from network or from the first found internal network.
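
The two forms of a hosts entry (a plain hostname, or an object that overrides the defaults) can be sketched as follows. The `type: fleet` line is an assumption, and the fleet name, addresses, users, and key path are hypothetical placeholders.

```yaml
# Hypothetical SSH fleet; addresses, users, and paths are placeholders.
type: fleet                      # assumed: selects the fleet configuration type
name: my-ssh-fleet               # hypothetical fleet name
placement: cluster
ssh_config:
  user: ubuntu                   # default user for all hosts
  port: 22                       # default SSH port for all hosts
  identity_file: ~/.ssh/id_rsa   # default private key for all hosts
  network: 10.0.0.0/24           # internal network used between hosts
  hosts:
    - 192.0.2.10                 # plain hostname: uses the defaults above
    - hostname: 192.0.2.11       # object form: overrides the defaults
      user: admin
      port: 2222
      internal_ip: 10.0.0.11     # explicit internal IP for this host
```

In this sketch, the first host inherits user, port, and identity_file from ssh_config, while the second overrides user and port and pins its internal IP.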

resources

cpu - (Optional) The number of CPU cores. Defaults to 2.. (a range meaning 2 or more).
memory - (Optional) The RAM size (e.g., 8GB). Defaults to 8GB.. (a range meaning 8GB or more).
shm_size - (Optional) The size of shared memory (e.g., 8GB). If you are using parallel communicating processes (e.g., dataloaders in PyTorch), you may need to configure this.
gpu - (Optional) The GPU requirements. Can be set to a number, a string (e.g. A100, 80GB:2, etc.), or an object.
disk - (Optional) The disk resources.

resources.gpu

vendor - (Optional) The vendor of the GPU/accelerator, one of: nvidia, amd, google (alias: tpu).
name - (Optional) The GPU name or list of names.
count - (Optional) The number of GPUs. Defaults to 1.
memory - (Optional) The memory size of each GPU (e.g., 16GB). Can be set to a range (e.g., 16GB.. or 16GB..80GB).
total_memory - (Optional) The total memory size across all GPUs (e.g., 32GB). Can be set to a range (e.g., 16GB.. or 16GB..80GB).
compute_capability - (Optional) The minimum compute capability of the GPU (e.g., 7.5).

resources.disk

size - The disk size. Can be set to a range (e.g., 100GB.. or 100GB..200GB).
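
To show how the resources, resources.gpu, and resources.disk properties nest, here is a minimal sketch using the object form of gpu. All values are hypothetical; ranges use the `min..` and `min..max` notation described above.

```yaml
# Hypothetical resource requirements; values are illustrative only.
resources:
  cpu: 8..               # 8 or more CPU cores
  memory: 32GB..         # 32GB of RAM or more
  shm_size: 8GB          # shared memory, e.g. for PyTorch dataloaders
  gpu:
    vendor: nvidia
    name: [A100, H100]   # a single name or a list of names
    count: 2
    memory: 40GB..80GB   # range with both bounds
  disk:
    size: 100GB..200GB
```

When only a name, memory size, or count matters, the shorter string form of gpu mentioned above (e.g., `gpu: 80GB:2`) may be enough.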

retry

on_events - The list of events that should be handled with retry. Supported events are no-capacity, interruption, and error.
duration - (Optional) The maximum period of retrying the run, e.g., 4h or 1d.
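
A retry policy that combines both properties might look like the sketch below. The chosen events and duration are hypothetical.

```yaml
# Hypothetical retry policy; events and duration are illustrative only.
retry:
  on_events: [no-capacity, interruption]   # retry on these events only
  duration: 4h                             # stop retrying after 4 hours
```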