Orchestrating GPUs on Kubernetes clusters

dstack gives teams a unified way to run and manage GPU-native containers across clouds and on-prem environments — without requiring Kubernetes. At the same time, many organizations rely on Kubernetes as the foundation of their infrastructure.

To support these users, dstack is releasing the beta of its native Kubernetes integration.

This update allows dstack to orchestrate dev environments, distributed training, and inference workloads directly on Kubernetes clusters — combining the best of both worlds: an ML-tailored interface for ML teams together with the full Kubernetes ecosystem.

Read below to learn how to use dstack with Kubernetes clusters.

Creating a Kubernetes cluster

A major advantage of Kubernetes is its portability. Whether you’re using managed Kubernetes on a GPU cloud or an on-prem cluster, you can connect it to dstack and use it to orchestrate your GPU workloads.

NVIDIA GPU Operator

For dstack to correctly detect GPUs in your Kubernetes cluster, the cluster must have the NVIDIA GPU Operator pre-installed.
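If the operator isn't installed yet, it is typically set up via Helm. The commands below are a rough sketch; refer to the NVIDIA GPU Operator documentation for the current, complete instructions:

$ helm repo add nvidia https://helm.ngc.nvidia.com/nvidia && helm repo update
$ helm install --wait gpu-operator nvidia/gpu-operator \
    --namespace gpu-operator --create-namespace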

Nebius example

If you're using Nebius, the process of creating a Kubernetes cluster is straightforward.

  1. Select the region of interest and click Create cluster.
  2. Once the cluster is created, switch to Applications and install the nvidia-device-plugin application — this can be done in one click.
  3. Go to Node groups and click Create node group. Choose the GPU type and count, disk size, and other options. If dstack doesn't run in the same network, enable public IPs so that dstack can access the nodes.

Setting up the backend

Once the cluster is ready, you need to configure the kubernetes backend in the dstack server.
To do this, add the corresponding configuration to your ~/.dstack/server/config.yml file:

projects:
- name: main
  backends:
  - type: kubernetes
    kubeconfig:
      filename: ~/.kube/config
    proxy_jump:
      hostname: 204.12.171.137
      port: 32000

The configuration includes two main parts: the path to the kubeconfig file and the proxy-jump configuration.

If your cluster is on Nebius, click How to connect in the console — it will guide you through setting up the kubeconfig file.
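Once the kubeconfig is in place, a quick sanity check is to list the cluster nodes (assuming kubectl is installed and pointed at the same kubeconfig):

$ kubectl get nodes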

Proxy jump

To allow dstack to forward SSH traffic, it needs one node to act as a proxy jump. Choose any node in the cluster and specify its IP address and an accessible port in the backend configuration.
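To pick a node and look up its external IP, you can list the nodes along with their addresses:

$ kubectl get nodes -o wide

Any node works, as long as the specified port on it is reachable from the machine running the dstack server.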

Now that the backend is configured, go ahead and restart the dstack server.
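If you run the server locally, that means stopping the process and starting it again:

$ dstack server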

That’s it — you can now use all of dstack’s features, including dev environments, tasks, services, and fleets.

Running a dev environment

A dev environment lets you provision an instance and connect to it from your desktop IDE.

type: dev-environment
# The name is optional; if not specified, it's generated randomly
name: vscode

python: "3.11"

# Uncomment to use a custom Docker image
#image: huggingface/trl-latest-gpu

ide: vscode

resources:
  gpu: H200

To run a dev environment, pass the configuration to dstack apply:

$ dstack apply -f examples/.dstack.yml

 #  BACKEND         RESOURCES                                   INSTANCE TYPE                       PRICE
 1  kubernetes (-)  cpu=127 mem=1574GB disk=871GB H200:141GB:8  computeinstance-u00hwk32d0xemhxhvj  $0
 2  kubernetes (-)  cpu=127 mem=1574GB disk=871GB H200:141GB:8  computeinstance-u00n24fb4q85yavc9z  $0

Submit the run vscode? [y/n]: y

Launching `vscode`...
---> 100%

To open in VS Code Desktop, use this link:
  vscode://vscode-remote/ssh-remote+vscode/workflow

Dev environments support many different options, including a custom Docker image, mounted repositories, idle timeout, min GPU utilization, and more.
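As an illustration, here's a sketch of a dev environment that uses a custom image and shuts down automatically when idle. The inactivity_duration property name is assumed here; check the dev environments reference for the exact options:

type: dev-environment
name: vscode-custom

# Use a custom Docker image instead of the default one
image: huggingface/trl-latest-gpu
ide: vscode

# Shut the environment down after one hour of inactivity
# (property name assumed; see the dev environments reference)
inactivity_duration: 1h

resources:
  gpu: H200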

Running distributed training

Distributed training can be performed in dstack using distributed tasks. The configuration is similar to a dev environment, except it runs across multiple nodes.

Creating a cluster fleet

Before running a distributed task, create a fleet with placement set to cluster:

type: fleet
# The name is optional; if not specified, one is generated automatically
name: my-k8s-fleet

# For `kubernetes`, `min` should be set to `0` since it can't pre-provision VMs.
# Optionally, you can set the maximum number of nodes to limit scaling.
nodes: 0..

placement: cluster

backends: [kubernetes]

resources:
  # Specify requirements to filter nodes
  gpu: 1..8

Then, create the fleet using the dstack apply command:

$ dstack apply -f examples/misc/fleets/.dstack.yml

Provisioning...
---> 100%

 FLEET     INSTANCE  BACKEND              GPU             PRICE    STATUS  CREATED 

Once the fleet is created, you can run distributed tasks on it.
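You can verify the fleet with the dstack fleet command, which lists fleets and their instances:

$ dstack fleet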

NCCL tests example

Below is an example of using distributed tasks to run NCCL tests. It also demonstrates how to use mpirun with dstack:

type: task
name: nccl-tests

nodes: 2

# The `startup_order` and `stop_criteria` properties are required for `mpirun`
startup_order: workers-first
stop_criteria: master-done

env:
  - NCCL_DEBUG=INFO
commands:
  - |
    if [ $DSTACK_NODE_RANK -eq 0 ]; then
      mpirun \
        --allow-run-as-root \
        --hostfile $DSTACK_MPI_HOSTFILE \
        -n $DSTACK_GPUS_NUM \
        -N $DSTACK_GPUS_PER_NODE \
        --bind-to none \
        /opt/nccl-tests/build/all_reduce_perf -b 8 -e 8G -f 2 -g 1
    else
      sleep infinity
    fi

# The `kubernetes` backend requires it
privileged: true

resources:
  gpu: nvidia:1..8
  shm_size: 16GB

To run the configuration, use the dstack apply command.

$ dstack apply -f examples/clusters/nccl-tests/.dstack.yml --fleet my-k8s-fleet

#  BACKEND         RESOURCES                                   INSTANCE TYPE                       PRICE
1  kubernetes (-)  cpu=127 mem=1574GB disk=871GB H200:141GB:8  computeinstance-u00hwk32d0xemhxhvj  $0
2  kubernetes (-)  cpu=127 mem=1574GB disk=871GB H200:141GB:8  computeinstance-u00n24fb4q85yavc9z  $0

Submit the run nccl-tests? [y/n]: y

Distributed training example

Below is a minimal example of a distributed training configuration:

type: task
name: train-distrib

nodes: 2

python: 3.12
env:
  - NCCL_DEBUG=INFO
commands:
  - git clone https://github.com/pytorch/examples.git pytorch-examples
  - cd pytorch-examples/distributed/ddp-tutorial-series
  - uv pip install -r requirements.txt
  - |
    torchrun \
      --nproc-per-node=$DSTACK_GPUS_PER_NODE \
      --node-rank=$DSTACK_NODE_RANK \
      --nnodes=$DSTACK_NODES_NUM \
      --master-addr=$DSTACK_MASTER_NODE_IP \
      --master-port=12345 \
      multinode.py 50 10

resources:
  gpu: 1..8
  shm_size: 16GB

To run the configuration, use the dstack apply command.

$ dstack apply -f examples/distributed-training/torchrun/.dstack.yml --fleet my-k8s-fleet

#  BACKEND         RESOURCES                                   INSTANCE TYPE                       PRICE
1  kubernetes (-)  cpu=127 mem=1574GB disk=871GB H200:141GB:8  computeinstance-u00hwk32d0xemhxhvj  $0
2  kubernetes (-)  cpu=127 mem=1574GB disk=871GB H200:141GB:8  computeinstance-u00n24fb4q85yavc9z  $0

Submit the run train-distrib? [y/n]: y

For more examples, explore the distributed training section in the docs.

FAQ

VM-based backends vs Kubernetes backend

While the kubernetes backend is preferred if your team depends on the Kubernetes ecosystem, the VM-based backends leverage native integration with top GPU clouds (including Nebius and others) and may be a better choice if Kubernetes isn’t required.

VM-based backends also offer more granular control over cluster provisioning.
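For comparison, here's a minimal sketch of a VM-based cluster fleet that pre-provisions a fixed number of nodes (the backend and GPU spec below are only illustrative):

type: fleet
name: my-vm-fleet

# VM-based backends can pre-provision a fixed number of nodes
nodes: 2
placement: cluster

backends: [nebius]

resources:
  gpu: H200:8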

Note that dstack doesn’t yet support Kubernetes clusters with auto-scaling enabled (coming soon), which can be another reason to use VM-based backends.

SSH fleets vs Kubernetes backend

If you’re using on-prem servers and Kubernetes isn’t a requirement, SSH fleets may be simpler. They provide a lightweight and flexible alternative.
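A minimal SSH fleet sketch, assuming SSH key access to the hosts (the user, key path, and IPs are placeholders):

type: fleet
name: my-ssh-fleet

placement: cluster

ssh_config:
  user: ubuntu
  identity_file: ~/.ssh/id_rsa
  hosts:
    - 192.168.1.10
    - 192.168.1.11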

AMD GPUs

Support for AMD GPUs is coming soon — our team is actively working on it right now.

What's next

  1. Check Quickstart
  2. Explore dev environments, tasks, services, and fleets
  3. Read the clusters guide
  4. Join Discord