Fleets

Fleets act both as pools of instances and as templates for how those instances are provisioned.

dstack supports two kinds of fleets: standard fleets, which dstack provisions through configured backends, and SSH fleets, which are created from existing servers accessible via SSH.

Standard fleets

When you run dstack apply to start a dev environment, task, or service, dstack will reuse idle instances
from an existing fleet whenever available.

If no fleet meets the requirements or has idle capacity, dstack can create a new fleet on the fly.
However, it’s generally better to define fleets explicitly in configuration files for greater control.

Apply a configuration

Define a fleet configuration as a YAML file in your project directory. The file must have a .dstack.yml extension (e.g. .dstack.yml or fleet.dstack.yml).

type: fleet
# The name is optional, if not specified, generated randomly
name: my-fleet

# Can be a range or a fixed number
nodes: 2
# Uncomment to ensure instances are inter-connected
#placement: cluster

resources:
  gpu: 24GB

To create or update the fleet, pass the fleet configuration to dstack apply:

$ dstack apply -f examples/misc/fleets/.dstack.yml

Provisioning...
---> 100%

 FLEET     INSTANCE  BACKEND              GPU             PRICE    STATUS  CREATED 
 my-fleet  0         gcp (europe-west-1)  L4:24GB (spot)  $0.1624  idle    3 mins ago      
           1         gcp (europe-west-1)  L4:24GB (spot)  $0.1624  idle    3 mins ago    

Once the status of instances changes to idle, they can be used by dev environments, tasks, and services.
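
For example, a run whose resource requirements match will pick up an idle instance from my-fleet instead of provisioning a new one. Below is a minimal task sketch (the name and command are placeholders):

type: task
name: my-task

commands:
  - nvidia-smi

resources:
  # Matches the fleet's 24GB GPUs, so an idle instance is reused
  gpu: 24GB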

Container-based backends

Container-based backends don’t support pre-provisioning, so nodes can only be set to a range starting with 0.

This means instances are created only when a run starts, and once it finishes, they’re terminated and released back to the provider (either a cloud service or Kubernetes).

type: fleet
# The name is optional, if not specified, generated randomly
name: my-fleet

# For container-based backends, the range must start with 0
nodes: 0..2
# Uncomment to ensure instances are inter-connected
#placement: cluster

resources:
  gpu: 24GB

Configuration options

Nodes

The nodes property controls how many instances to provision and maintain in the fleet:

type: fleet

name: my-fleet

nodes:
  min: 1 # Always maintain at least 1 idle instance. Can be 0.
  target: 2 # (Optional) Provision 2 instances initially
  max: 3 # (Optional) Do not allow more than 3 instances

dstack ensures the fleet always has at least nodes.min instances, creating new instances in the background if necessary. If you don't need to keep instances in the fleet indefinitely, you can set nodes.min to 0. By default, dstack apply also provisions nodes.min instances. The nodes.target property allows provisioning more instances initially than need to be maintained.

Placement

To ensure instances are interconnected (e.g., for distributed tasks), set placement to cluster. This ensures all instances are provisioned with optimal inter-node connectivity.
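
For example, a fleet intended for distributed workloads might look like this (a minimal sketch; the node count and GPU spec are placeholders):

type: fleet
name: my-cluster-fleet

nodes: 2
# Provision instances with optimal inter-node connectivity
placement: cluster

resources:
  # Eight 80GB GPUs per instance (placeholder spec)
  gpu: 80GB:8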

AWS

When you create a fleet with AWS, Elastic Fabric Adapter networking is automatically configured if it’s supported for the corresponding instance type. Note that EFA requires public_ips to be set to false in the aws backend configuration. Otherwise, instances are only connected by the default VPC subnet.
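
As an illustration, disabling public IPs for the aws backend in the server configuration (~/.dstack/server/config.yml) might look like the following sketch (the project name and credentials are placeholders):

projects:
- name: main
  backends:
  - type: aws
    creds:
      type: default
    # Required for EFA networking
    public_ips: false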

Refer to the EFA example for more details.

GCP

When you create a fleet with GCP, dstack automatically configures GPUDirect-TCPXO and GPUDirect-TCPX networking for the A3 Mega and A3 High instance types, as well as RoCE networking for the A4 instance type.

Backend configuration

You may need to configure extra_vpcs and roce_vpcs in the gcp backend configuration. Refer to the A4, A3 Mega, and A3 High examples for more details.
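
For illustration, a gcp backend configuration with these properties might look roughly like this (a sketch; the project ID and VPC names are placeholders):

projects:
- name: main
  backends:
  - type: gcp
    project_id: my-project
    creds:
      type: default
    # Additional VPCs for GPUDirect-TCPXO/TCPX (placeholder names)
    extra_vpcs: ["my-extra-vpc-1", "my-extra-vpc-2"]
    # VPCs used for RoCE networking on A4 (placeholder name)
    roce_vpcs: ["my-roce-vpc"]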

Nebius

When you create a fleet with Nebius, InfiniBand networking is automatically configured if it’s supported for the corresponding instance type. Otherwise, instances are only connected by the default VPC subnet.

An InfiniBand fabric for the cluster is selected automatically. If you prefer to use specific fabrics, configure them in the backend settings.

The cluster placement is supported for aws, azure, gcp, nebius, oci, and vultr backends.

For more details on optimal inter-node connectivity, read the Clusters guide.

Resources

When you specify a resource value like cpu or memory, you can either use an exact value (e.g. 24GB) or a range (e.g. 24GB.., or 24GB..80GB, or ..80GB).

type: fleet
# The name is optional, if not specified, generated randomly
name: my-fleet

nodes: 2

resources:
  # 200GB or more RAM
  memory: 200GB..
  # 4 GPUs from 40GB to 80GB
  gpu: 40GB..80GB:4
  # Disk size
  disk: 500GB

The gpu property allows specifying not only memory size but also GPU vendor, names and their quantity. Examples: nvidia (one NVIDIA GPU), A100 (one A100), A10G,A100 (either A10G or A100), A100:80GB (one A100 of 80GB), A100:2 (two A100), 24GB..40GB:2 (two GPUs between 24GB and 40GB), A100:40GB:2 (two A100 GPUs of 40GB).
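
For example, a fleet requesting two 80GB A100 GPUs per instance could be written as follows (a sketch):

type: fleet
name: my-fleet

nodes: 2

resources:
  # Two A100 GPUs with 80GB of memory each
  gpu: A100:80GB:2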

Google Cloud TPU

To use TPUs, specify the TPU architecture via the gpu property.

type: fleet
# The name is optional, if not specified, generated randomly
name: my-fleet

nodes: 2

resources:
  gpu: v2-8

Currently, only 8 TPU cores can be specified, supporting single TPU device workloads. Multi-TPU support is coming soon.

If you’re unsure which offers (hardware configurations) are available from the configured backends, use the dstack offer command to list them.

Blocks

For standard fleets, blocks function the same way as in SSH fleets. See the Blocks section under SSH fleets for details on the blocks concept.

type: fleet

name: my-fleet

resources:
  gpu: NVIDIA:80GB:8

# Split into 4 blocks, each with 2 GPUs
blocks: 4

Idle duration

By default, fleet instances stay idle for 3 days and can be reused within that time. If an instance is not reused within this period, it is automatically terminated.

To change the default idle duration, set idle_duration in the fleet configuration (e.g., 0s, 1m, or off for unlimited).

type: fleet
# The name is optional, if not specified, generated randomly
name: my-fleet

nodes: 2

# Terminate instances idle for more than 1 hour
idle_duration: 1h

resources:
  gpu: 24GB

Spot policy

By default, dstack uses on-demand instances. However, you can change that via the spot_policy property. It accepts spot, on-demand, and auto.
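
For example, to let dstack pick spot instances when available and fall back to on-demand otherwise, a sketch could look like this:

type: fleet
name: my-fleet

nodes: 2

# Use spot instances when available, otherwise fall back to on-demand
spot_policy: auto

resources:
  gpu: 24GB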

Retry policy

By default, if dstack fails to provision an instance or an instance is interrupted, no retry is attempted.

If you'd like dstack to do it, configure the retry property accordingly:

type: fleet
# The name is optional, if not specified, generated randomly
name: my-fleet

nodes: 1

resources:
  gpu: 24GB

retry:
  # Retry on specific events
  on_events: [no-capacity, interruption]
  # Retry for up to 1 hour
  duration: 1h

Reference

Standard fleets support many more configuration options, including backends, regions, and max_price, among others.
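
As an illustration, a fleet restricted to particular backends and regions with a price cap might look like this (a sketch; the region names and price are placeholders):

type: fleet
name: my-fleet

nodes: 2

backends: [aws, gcp]
# Placeholder region names
regions: [us-east-1, europe-west4]
# Maximum hourly price per instance, in dollars
max_price: 2.0

resources:
  gpu: 24GB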

SSH fleets

If you have a group of on-prem servers accessible via SSH, you can create an SSH fleet.

Apply a configuration

Define a fleet configuration as a YAML file in your project directory. The file must have a .dstack.yml extension (e.g. .dstack.yml or fleet.dstack.yml).

type: fleet
# The name is optional, if not specified, generated randomly
name: my-fleet

# Uncomment if instances are interconnected
#placement: cluster

# SSH credentials for the on-prem servers
ssh_config:
  user: ubuntu
  identity_file: ~/.ssh/id_rsa
  hosts:
    - 3.255.177.51
    - 3.255.177.52

Requirements

1. Hosts must be pre-installed with Docker.

2. Hosts with NVIDIA GPUs must also be pre-installed with CUDA 12.1 and the NVIDIA Container Toolkit.

3. Hosts with AMD GPUs must also be pre-installed with the AMDGPU-DKMS kernel driver (e.g. via the native package manager or the AMDGPU installer).

4. Hosts with Intel Gaudi accelerators must be pre-installed with Gaudi software and drivers. This must include the drivers, hl-smi, and the Habana Container Runtime.

5. Hosts with Tenstorrent accelerators must be pre-installed with Tenstorrent software. This must include the drivers, tt-smi, and HugePages.

6. The specified user must have passwordless sudo access.

7. The SSH server must be running and configured with AllowTcpForwarding yes in /etc/ssh/sshd_config.

8. The firewall must allow SSH and should forbid any other connections from external networks. For placement: cluster fleets, it should also allow any communication between fleet nodes.

To create or update the fleet, pass the fleet configuration to dstack apply:

$ dstack apply -f examples/misc/fleets/.dstack.yml

Provisioning...
---> 100%

 FLEET     INSTANCE  GPU             PRICE  STATUS  CREATED 
 my-fleet  0         L4:24GB (spot)  $0     idle    3 mins ago      
           1         L4:24GB (spot)  $0     idle    3 mins ago    

When you apply, dstack connects to the specified hosts using the provided SSH credentials, installs the dependencies, and configures these hosts as a fleet.

Once the status of instances changes to idle, they can be used by dev environments, tasks, and services.

Configuration options

Placement

If the hosts are interconnected (i.e. share the same network), set placement to cluster. This is required if you'd like to use the fleet for distributed tasks.

Network

By default, dstack automatically detects the network shared by the hosts. However, it's possible to configure it explicitly via the network property.
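
As a sketch, assuming the network property is specified under ssh_config (the subnet value is a placeholder):

type: fleet
name: my-fleet

placement: cluster

ssh_config:
  user: ubuntu
  identity_file: ~/.ssh/id_rsa
  # The subnet the hosts share (placeholder value)
  network: 192.168.1.0/24
  hosts:
    - 3.255.177.51
    - 3.255.177.52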

For more details on optimal inter-node connectivity, read the Clusters guide.

Blocks

By default, a job uses the entire instance—e.g., all 8 GPUs. To allow multiple jobs on the same instance, set the blocks property to divide the instance. Each job can then use one or more blocks, up to the full instance.

type: fleet
name: my-fleet

ssh_config:
  user: ubuntu
  identity_file: ~/.ssh/id_rsa
  hosts:
    - hostname: 3.255.177.51
      blocks: 4
    - hostname: 3.255.177.52
      # As many as possible, according to the number of GPUs and CPUs
      blocks: auto
    - hostname: 3.255.177.53
      # Do not slice. This is the default value and may be omitted
      blocks: 1

All resources (GPU, CPU, memory) are split evenly across blocks, while disk is shared.

For example, with 8 GPUs, 128 CPUs, and 2TB RAM, setting blocks to 8 gives each block 1 GPU, 16 CPUs, and 256 GB RAM.

Set blocks to auto to match the number of blocks to the number of GPUs.

Distributed tasks

Distributed tasks require exclusive access to all host resources and therefore must use all blocks on each node.

Environment variables

If needed, you can specify environment variables that will be used by dstack-shim and passed to containers.

For example, these variables can be used to configure a proxy:

type: fleet
name: my-fleet

env:
  - HTTP_PROXY=http://proxy.example.com:80
  - HTTPS_PROXY=http://proxy.example.com:80
  - NO_PROXY=localhost,127.0.0.1

ssh_config:
  user: ubuntu
  identity_file: ~/.ssh/id_rsa
  hosts:
    - 3.255.177.51
    - 3.255.177.52

Proxy jump

If fleet hosts are behind a head node (aka "login node"), configure proxy_jump:

type: fleet
name: my-fleet

ssh_config:
  user: ubuntu
  identity_file: ~/.ssh/worker_node_key
  hosts:
    - 3.255.177.51
    - 3.255.177.52
  proxy_jump:
    hostname: 3.255.177.50
    user: ubuntu
    identity_file: ~/.ssh/head_node_key

To be able to attach to runs, both explicitly with dstack attach and implicitly with dstack apply, you must either add the head node key (~/.ssh/head_node_key) to an SSH agent or configure a key path in ~/.ssh/config:

Host 3.255.177.50
    IdentityFile ~/.ssh/head_node_key

where Host must match ssh_config.proxy_jump.hostname or ssh_config.hosts[n].proxy_jump.hostname if you configure head nodes on a per-worker basis.
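
For illustration, configuring a head node on a per-worker basis might look roughly like this (a sketch; IP addresses and key paths are placeholders):

type: fleet
name: my-fleet

ssh_config:
  user: ubuntu
  identity_file: ~/.ssh/worker_node_key
  hosts:
    - hostname: 3.255.177.51
      # Head node used only for this worker
      proxy_jump:
        hostname: 3.255.177.50
        user: ubuntu
        identity_file: ~/.ssh/head_node_key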

Reference

For all SSH fleet configuration options, refer to the reference.

Troubleshooting

Resources

Once the fleet is created, double-check that the GPU, memory, and disk are detected correctly.

If the status does not change to idle after a few minutes or the resources are not displayed correctly, ensure that all host requirements are satisfied.

If the requirements are met but the fleet still fails to be created correctly, check the logs at /root/.dstack/shim.log on the hosts for error details.
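
For example, assuming the SSH user and identity file from the fleet configuration, the log can be inspected directly on a host:

$ ssh -i ~/.ssh/id_rsa ubuntu@3.255.177.51 sudo tail -n 100 /root/.dstack/shim.log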

Manage fleets

List fleets

The dstack fleet command lists fleet instances and their status:

$ dstack fleet

 FLEET     INSTANCE  BACKEND              GPU             PRICE    STATUS  CREATED 
 my-fleet  0         gcp (europe-west-1)  L4:24GB (spot)  $0.1624  idle    3 mins ago      
           1         gcp (europe-west-1)  L4:24GB (spot)  $0.1624  idle    3 mins ago    

Delete fleets

When a fleet isn't used by a run, you can delete it by passing the fleet configuration to dstack delete:

$ dstack delete -f cluster.dstack.yaml
Delete the fleet my-gcp-fleet? [y/n]: y
Fleet my-gcp-fleet deleted

Alternatively, you can delete a fleet by passing the fleet name to dstack fleet delete. To terminate and delete specific instances from a fleet, pass -i INSTANCE_NUM.
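
For example, to delete the whole fleet by name, or to terminate and delete only instance 1 (a sketch of the commands):

$ dstack fleet delete my-fleet
$ dstack fleet delete my-fleet -i 1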

What's next?

  1. Check dev environments, tasks, and services
  2. Read the Clusters guide