fleet
¶
The fleet
configuration type allows creating and updating fleets.
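As a rough illustration of how the top-level fields documented in this reference fit together, a cloud fleet configuration might look like the sketch below. The `type: fleet` key is not listed in this reference and is assumed from dstack's usual configuration layout; the fleet name, backend, region, price, and durations are hypothetical placeholders, not recommended values.

```yaml
# Hypothetical cloud fleet; all values are placeholders.
type: fleet            # assumed discriminator key, not listed in this reference
name: example-fleet    # hypothetical fleet name

nodes: 2               # number of instances
placement: cluster     # any or cluster

backends: [aws]
regions: [eu-west-1]

resources:
  gpu: 80GB:2          # string form of the GPU requirements

spot_policy: auto      # spot, on-demand, or auto
max_price: 10          # maximum instance price per hour, in dollars
idle_duration: 1h      # terminate instances idle for longer than this

retry:
  on_events: [no-capacity, interruption]
  duration: 4h         # maximum period of retrying
```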
name
- (Optional) The fleet name.¶env
- (Optional) The mapping or the list of environment variables.¶ssh_config
- (Optional) The parameters for adding instances via SSH.¶nodes
- (Optional) The number of instances.¶placement
- (Optional) The placement of instances: any
or cluster
.¶reservation
- (Optional) The existing reservation to use for instance provisioning. Supports AWS Capacity Reservations and Capacity Blocks.¶resources
- (Optional) The resource requirements.¶blocks
- (Optional) The number of blocks to split the instance into: a number or auto
. auto
means as many as possible. The number of GPUs and CPUs must be divisible by the number of blocks. Defaults to 1
, i.e. do not split.¶backends
- (Optional) The backends to consider for provisioning (e.g., [aws, gcp]
).¶regions
- (Optional) The regions to consider for provisioning (e.g., [eu-west-1, us-west4, westeurope]
).¶availability_zones
- (Optional) The availability zones to consider for provisioning (e.g., [eu-west-1a, us-west4-a]
).¶instance_types
- (Optional) The cloud-specific instance types to consider for provisioning (e.g., [p3.8xlarge, n1-standard-4]
).¶spot_policy
- (Optional) The policy for provisioning spot or on-demand instances: spot
, on-demand
, or auto
.¶retry
- (Optional) The policy for provisioning retry. Defaults to false
.¶max_price
- (Optional) The maximum instance price per hour, in dollars.¶idle_duration
- (Optional) Time to wait before terminating idle instances. Defaults to 5m
for runs and 3d
for fleets. Use off
for unlimited duration.¶termination_policy
- (Optional) Deprecated in favor of idle_duration
.¶termination_idle_time
- (Optional) Deprecated in favor of idle_duration
.¶ssh_config
¶user
- (Optional) The user to log in with on all hosts.¶port
- (Optional) The SSH port to connect to.¶identity_file
- (Optional) The private key to use for all hosts.¶proxy_jump
- (Optional) The SSH proxy configuration for all hosts.¶hosts
- The per-host connection parameters: a hostname or an object that overrides the default SSH parameters.¶network
- (Optional) The network address for cluster setup in the format <ip>/<netmask>
. dstack
will use IP addresses from this network for communication between hosts. If not specified, dstack
will use IPs from the first internal network it finds.¶ssh_config.proxy_jump
¶hostname
- The IP address or domain of the proxy host.¶port
- (Optional) The SSH port of the proxy host.¶user
- The user to log in with on the proxy host.¶identity_file
- The private key to use for the proxy host.¶ssh_config.hosts[n]
¶hostname
- The IP address or domain to connect to.¶port
- (Optional) The SSH port to connect to for this host.¶user
- (Optional) The user to log in with for this host.¶identity_file
- (Optional) The private key to use for this host.¶proxy_jump
- (Optional) The SSH proxy configuration for this host.¶internal_ip
- (Optional) The internal IP of the host used for communication inside the cluster. If not specified, dstack
will use the IP address from network
or from the first internal network it finds.¶blocks
- (Optional) The number of blocks to split the instance into: a number or auto
. auto
means as many as possible. The number of GPUs and CPUs must be divisible by the number of blocks. Defaults to 1
, i.e. do not split.¶ssh_config.hosts[n].proxy_jump
¶hostname
- The IP address or domain of the proxy host.¶port
- (Optional) The SSH port of the proxy host.¶user
- The user to log in with on the proxy host.¶identity_file
- The private key to use for the proxy host.¶resources
¶cpu
- (Optional) The number of CPU cores. Defaults to 2..
(an open-ended range, i.e. 2 or more).¶memory
- (Optional) The RAM size (e.g., 8GB
). Defaults to 8GB..
(an open-ended range, i.e. 8GB or more).¶shm_size
- (Optional) The size of shared memory (e.g., 8GB
). If you are using parallel communicating processes (e.g., dataloaders in PyTorch), you may need to configure this.¶gpu
- (Optional) The GPU requirements. Can be set to a number, a string (e.g. A100
, 80GB:2
, etc.), or an object.¶disk
- (Optional) The disk resources.¶resources.gpu
¶vendor
- (Optional) The vendor of the GPU/accelerator, one of: nvidia
, amd
, google
(alias: tpu
), intel
.¶name
- (Optional) The GPU name or list of names.¶count
- (Optional) The number of GPUs. Defaults to 1
.¶memory
- (Optional) The memory size of each GPU (e.g., 16GB
). Can be set to a range (e.g. 16GB..
, or 16GB..80GB
).¶total_memory
- (Optional) The total memory size across all GPUs (e.g., 32GB
). Can be set to a range (e.g. 16GB..
, or 16GB..80GB
).¶compute_capability
- (Optional) The minimum compute capability of the GPU (e.g., 7.5
).¶resources.disk
¶size
- The disk size. Can be set to a range (e.g., 100GB..
or 100GB..200GB
).¶retry
¶on_events
- The list of events that should be handled with retry. Supported events are no-capacity
, interruption
, and error
.¶duration
- (Optional) The maximum period of retrying the run, e.g., 4h
or 1d
.¶
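To tie the `ssh_config` fields documented earlier together, the following sketch shows how a fleet of existing hosts might be described. It is illustrative only: the `type: fleet` key is assumed from dstack's usual configuration layout, and every address, user name, and key path is a hypothetical placeholder.

```yaml
# Hypothetical SSH fleet; all hosts, users, and key paths are placeholders.
type: fleet              # assumed discriminator key, not listed in this reference
name: example-ssh-fleet  # hypothetical fleet name
placement: cluster

ssh_config:
  # Defaults applied to all hosts
  user: ubuntu
  port: 22
  identity_file: ~/.ssh/id_rsa
  network: 10.0.0.0/24   # <ip>/<netmask> used for communication between hosts

  hosts:
    # A bare hostname uses the defaults above
    - 192.0.2.10

    # An object overrides the defaults for one host
    - hostname: 192.0.2.11
      port: 2222
      internal_ip: 10.0.0.11
      blocks: auto       # split the instance into as many blocks as possible
      proxy_jump:
        hostname: 192.0.2.1
        user: jump-user
        identity_file: ~/.ssh/jump_key
```

The first host entry relies entirely on the shared defaults, while the second shows the per-host overrides, including a host-specific `proxy_jump`.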