fleet
The fleet configuration type allows creating and updating fleets.
Configuration files must be inside the project repo, and their names must end with .dstack.yml (e.g. .dstack.yml or fleet.dstack.yml are both acceptable).
Any configuration can be run via dstack apply.
Examples
Cloud
type: fleet
# The name is optional; if not specified, it is generated randomly
name: my-fleet
# The number of instances
nodes: 4
# Ensure the instances are interconnected
placement: cluster
# Use either spot or on-demand instances
spot_policy: auto
resources:
  gpu:
    # 24GB or more vRAM
    memory: 24GB..
    # One or more GPUs
    count: 1..
SSH
type: fleet
# The name is optional; if not specified, it is generated randomly
name: my-ssh-fleet
# Ensure instances are interconnected
placement: cluster
# The user, private SSH key, and hostnames of the on-prem servers
ssh_config:
  user: ubuntu
  identity_file: ~/.ssh/id_rsa
  hosts:
    - 3.255.177.51
    - 3.255.177.52
Root reference
name
- (Optional) The fleet name.
env
- (Optional) The mapping or the list of environment variables.
ssh_config
- (Optional) The parameters for adding instances via SSH.
nodes
- (Optional) The number of instances.
placement
- (Optional) The placement of instances: any or cluster.
resources
- (Optional) The resource requirements.
backends
- (Optional) The backends to consider for provisioning (e.g., [aws, gcp]).
regions
- (Optional) The regions to consider for provisioning (e.g., [eu-west-1, us-west4, westeurope]).
instance_types
- (Optional) The cloud-specific instance types to consider for provisioning (e.g., [p3.8xlarge, n1-standard-4]).
spot_policy
- (Optional) The policy for provisioning spot or on-demand instances: spot, on-demand, or auto.
retry
- (Optional) The policy for provisioning retry. Defaults to false.
max_price
- (Optional) The maximum instance price per hour, in dollars.
termination_policy
- (Optional) The policy for instance termination. Defaults to destroy-after-idle.
termination_idle_time
- (Optional) The time to wait before destroying idle instances. Defaults to 3d.
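As an illustration of how the root properties above combine, the following is a minimal sketch of a cloud fleet restricted to particular backends and regions, with a price cap and an idle timeout. The specific values (the price, the region choice, the one-day idle time) are assumptions chosen for the example, not recommendations.

```yaml
type: fleet
name: my-fleet
nodes: 2
# Only consider these backends and regions for provisioning
backends: [aws, gcp]
regions: [eu-west-1, us-west4]
# Provision spot instances only, capped at an hourly price in dollars
# (2.5 is an illustrative value)
spot_policy: spot
max_price: 2.5
# Destroy instances once they have been idle for one day
# (1d is an illustrative value; the default is 3d)
termination_policy: destroy-after-idle
termination_idle_time: 1d
```

Like any other configuration, this file would be applied via dstack apply.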
ssh_config
user
- (Optional) The user to log in with on all hosts.
port
- (Optional) The SSH port to connect to.
identity_file
- (Optional) The private key to use for all hosts.
hosts
- The per-host connection parameters: a hostname, or an object that overrides the default SSH parameters.
network
- (Optional) The network address for cluster setup in the format <ip>/<netmask>.
ssh_config.hosts[n]
hostname
- The IP address or domain to connect to.
port
- (Optional) The SSH port to connect to for this host.
user
- (Optional) The user to log in with for this host.
identity_file
- (Optional) The private key to use for this host.
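To show how the per-host overrides relate to the defaults under ssh_config, here is a minimal sketch that mixes a plain hostname with a host object. The second host's user, port, and key path are hypothetical values made up for the example.

```yaml
type: fleet
name: my-ssh-fleet
placement: cluster
ssh_config:
  # Defaults applied to every host unless overridden
  user: ubuntu
  identity_file: ~/.ssh/id_rsa
  hosts:
    # A plain hostname uses the defaults above
    - 3.255.177.51
    # An object overrides the defaults for this host only
    # (user, port, and key path below are hypothetical)
    - hostname: 3.255.177.52
      user: admin
      port: 2222
      identity_file: ~/.ssh/other_key
```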
resources
cpu
- (Optional) The number of CPU cores. Defaults to 2.. (2 or more).
memory
- (Optional) The RAM size (e.g., 8GB). Defaults to 8GB.. (8GB or more).
shm_size
- (Optional) The size of shared memory (e.g., 8GB). If you are using parallel communicating processes (e.g., dataloaders in PyTorch), you may need to configure this.
gpu
- (Optional) The GPU requirements. Can be set to a number, a string (e.g., A100, 80GB:2, etc.), or an object.
disk
- (Optional) The disk resources.
resources.gpu
name
- (Optional) The GPU name or list of names.
count
- (Optional) The number of GPUs. Defaults to 1.
memory
- (Optional) The GPU memory (vRAM) size (e.g., 16GB). Can be set to a range (e.g., 16GB.. or 16GB..80GB).
total_memory
- (Optional) The total GPU memory size (e.g., 32GB). Can be set to a range (e.g., 16GB.. or 16GB..80GB).
compute_capability
- (Optional) The minimum compute capability of the GPU (e.g., 7.5).
resources.disk
size
- The disk size. Can be a string (e.g., 100GB or 100GB..) or an object.
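The resources, resources.gpu, and resources.disk properties described above can be combined in a single block. The following is a sketch only; all sizes and the GPU name are illustrative values, and it assumes that cpu accepts the same open-ended range syntax as its documented default (2..).

```yaml
type: fleet
name: my-fleet
nodes: 1
resources:
  # 8 or more CPU cores (range syntax assumed from the documented default)
  cpu: 8..
  # 32GB or more RAM
  memory: 32GB..
  # Shared memory, e.g. for PyTorch dataloaders
  shm_size: 16GB
  gpu:
    name: A100
    # Between 40GB and 80GB of memory per the range syntax above
    memory: 40GB..80GB
    count: 2
  disk:
    # 200GB or more
    size: 200GB..
```

When only the GPU memory and count matter, the string shorthand mentioned under gpu (e.g., gpu: 80GB:2) is a more compact alternative to the object form.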
retry
on_events
- The list of events that should be handled with retry. Supported events are no-capacity, interruption, and error.
duration
- (Optional) The maximum period of retrying the run (e.g., 4h or 1d).
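As a sketch of the retry object in context, the fleet below retries provisioning when no capacity is available or when an instance is interrupted, for up to four hours. The choice of events and duration is an assumption made for the example.

```yaml
type: fleet
name: my-fleet
nodes: 4
spot_policy: auto
retry:
  # Retry on missing capacity and on spot interruptions, but not on errors
  on_events: [no-capacity, interruption]
  # Stop retrying after four hours
  duration: 4h
```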