Agentic orchestration

dstack is an open-source control plane for agents and engineers to provision compute and run training and inference across GPU vendors, clouds, Kubernetes, and bare-metal clusters.

Finally, an orchestration stack that doesn’t suck.

A unified control plane for compute orchestration

Managing AI infrastructure requires fine-grained primitives for compute provisioning, with native integration across GPU vendors, clouds, and open-source frameworks.

dstack is a unified control plane for provisioning clusters and running training, inference, and sandboxes across clouds, Kubernetes, and bare-metal clusters.

It’s built for containerized workloads and designed for both engineers and agents. No Kubernetes or Slurm hassle required.

Provision compute in any GPU cloud

dstack provisions GPU VMs directly through cloud APIs—no Kubernetes needed.

If you already have a Kubernetes cluster, dstack can manage it too.
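As an illustration, backends are declared in the dstack server's configuration file. A minimal sketch, assuming AWS with credentials resolved via the default provider chain (the project name is a placeholder):

```yaml
# ~/.dstack/server/config.yml (sketch; project name is a placeholder)
projects:
- name: main
  backends:
  - type: aws
    creds:
      # Use default AWS credentials from the environment
      type: default
```

Once the server starts with this config, dstack can provision VMs in that backend on demand.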

Once a backend fleet is created, dstack lets you run dev environments, tasks, and services on it.
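A cloud fleet is declared in a short YAML file. A minimal sketch (the name, node count, and GPU size are illustrative):

```yaml
type: fleet
# Placeholder name; any name works
name: my-gpu-fleet
# Number of instances to provision
nodes: 2
resources:
  gpu: 24GB
```

Applying it with `dstack apply -f fleet.dstack.yml` provisions matching instances in a configured backend.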

Backends

Bring your existing GPU clusters

Have bare-metal servers or pre-provisioned VMs? Use SSH fleets to connect them to dstack.

Just provide SSH credentials and host addresses, and dstack creates an SSH fleet.

Once the fleet is created, dstack lets you run dev environments, tasks, and services on it.
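A minimal SSH fleet sketch (the name, user, key path, and host addresses are placeholders):

```yaml
type: fleet
# Placeholder name, credentials, and hosts
name: on-prem-fleet
ssh_config:
  user: ubuntu
  identity_file: ~/.ssh/id_rsa
  hosts:
    - 192.168.100.1
    - 192.168.100.2
```

After `dstack apply`, dstack connects to each host over SSH and makes the fleet available for workloads.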

SSH fleets

Run development environments

If you or your agent needs a development environment with a GPU, let dstack create one for you.

If you plan to work in it yourself, you can access it from a desktop IDE such as VS Code, Cursor, or Windsurf. dstack apply prints both the IDE URL and the SSH command.
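A minimal dev environment sketch (the name, Python version, and GPU size are illustrative):

```yaml
type: dev-environment
# Placeholder name
name: cuda-dev
python: "3.12"
# Open the environment in VS Code via the printed URL
ide: vscode
resources:
  gpu: 24GB
```

Running `dstack apply -f .dstack.yml` provisions the environment and prints how to connect.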

Dev environments

Run training or batch jobs at any scale

Run training or batch workloads on a single GPU, or scale to multi-GPU and multi-node clusters using simple task configurations. dstack automates cluster provisioning, resource allocation, and job scheduling.

During execution, dstack reports GPU utilization, memory usage, and GPU health metrics for each job.
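A sketch of a multi-node task (names and sizes are illustrative; the `DSTACK_*` variables are assumed to be the environment variables dstack injects into distributed runs):

```yaml
type: task
# Placeholder name; two nodes with eight GPUs each
name: train-distrib
nodes: 2
python: "3.12"
commands:
  - pip install -r requirements.txt
  # Launch one process per GPU across both nodes
  - torchrun
    --nnodes=$DSTACK_NODES_NUM
    --nproc-per-node=$DSTACK_GPUS_PER_NODE
    --node-rank=$DSTACK_NODE_RANK
    --master-addr=$DSTACK_MASTER_NODE_IP
    train.py
resources:
  gpu: 80GB:8
```

`dstack apply` schedules the job on a fleet that satisfies the resource requirements.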

Tasks

Run high-performance model inference

With dstack, you can deploy models as secure, auto-scaling, OpenAI-compatible endpoints, integrating with leading open-source serving frameworks such as SGLang, vLLM, and TensorRT-LLM, or any other framework of your choice.

dstack enables disaggregated prefill/decode and cache-aware routing, providing production-grade, optimized inference.
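A sketch of a vLLM-based service (the name, model ID, and scaling values are illustrative; the `replicas` range and `scaling` fields follow the service schema as an assumption):

```yaml
type: service
# Placeholder name; the model ID is an example
name: llama31-8b
image: vllm/vllm-openai:latest
env:
  - MODEL_ID=meta-llama/Llama-3.1-8B-Instruct
commands:
  - vllm serve $MODEL_ID --port 8000
port: 8000
# Register the endpoint as an OpenAI-compatible model
model: meta-llama/Llama-3.1-8B-Instruct
resources:
  gpu: 24GB
# Auto-scale between 1 and 4 replicas based on requests per second
replicas: 1..4
scaling:
  metric: rps
  target: 10
```

Once applied, the model is reachable through dstack's gateway using standard OpenAI-compatible clients.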

Services

FAQ

How does dstack differ from Slurm?

Slurm is a battle-tested system with decades of production use in HPC environments. dstack, by contrast, is built for modern ML/AI workloads with cloud-native provisioning and a container-first architecture. While both support distributed training and batch jobs, dstack also natively supports development environments and production-grade inference.

See the migration guide for a detailed comparison.

How does dstack compare to Kubernetes?

Kubernetes is a general-purpose container orchestrator. dstack also orchestrates containers, but it provides a lightweight, streamlined interface that is purpose-built for ML.

You declare dev environments, tasks, services, and fleets with simple configuration. dstack provisions GPUs, manages clusters via fleets with fine-grained controls, and optimizes cost and utilization, while keeping a simple UI and CLI.

If you already use Kubernetes, you can run dstack on it via the Kubernetes backend.

Can I use dstack with Kubernetes?

Yes. You can connect existing Kubernetes clusters using the Kubernetes backend and run dev environments, tasks, and services on it. Choose the Kubernetes backend if your GPUs already run on Kubernetes and your team depends on its ecosystem and tooling. See the Kubernetes guide for setup and best practices.
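A sketch of connecting an existing cluster in the server configuration (the field layout follows the backend reference as an assumption; the kubeconfig path is an example):

```yaml
# ~/.dstack/server/config.yml
projects:
- name: main
  backends:
  - type: kubernetes
    kubeconfig:
      # Path to the kubeconfig for the target cluster
      filename: ~/.kube/config
```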

If your priority is orchestrating cloud GPUs and Kubernetes isn’t a must, VM-based backends are a better fit thanks to their native cloud integration. For on-prem GPUs where Kubernetes is optional, SSH fleets provide a simpler and more lightweight alternative.

When should I use dstack?

dstack accelerates ML development with a simple, ML‑native interface. Spin up dev environments, run single‑node or distributed tasks, and deploy services without infrastructure overhead.

It radically reduces GPU costs via smart orchestration and fine‑grained fleet controls, including efficient reuse, right‑sizing, and support for spot, on‑demand, and reserved capacity.

It is 100% interoperable with your stack and works with any open‑source frameworks and tools, as well as your own Docker images and code, across GPU clouds, Kubernetes, and on‑prem GPUs.

Have questions, or need help?
Discord Talk to us

Trusted by thousands of engineers across 100+ AI-first companies

Wah Loon Keng

Sr. AI Engineer @Electronic Arts

With dstack, AI researchers at EA can spin up and scale experiments without touching infrastructure. It supports everything from quick prototyping to multi-node training on any cloud.

Aleksandr Movchan

ML Engineer @Mobius Labs

Thanks to dstack, my team can quickly tap into affordable GPUs and streamline our workflows from testing and development to full-scale application deployment.

Alvaro Bartolome

ML Engineer @Argilla

With dstack it's incredibly easy to define a configuration within a repository and run it without worrying about GPU availability. It lets you focus on data and your research.

Park Chansung

ML Researcher @ETRI

Thanks to dstack, I can effortlessly access the top GPU options across different clouds, saving me time and money while pushing my AI work forward.

Eckart Burgwedel

CEO @Uberchord

With dstack, running LLMs on a cloud GPU is as easy as running a local Docker container. It combines the ease of Docker with the auto-scaling capabilities of K8S.

Peter Hill

Co-Founder @CUDO Compute

dstack simplifies infrastructure provisioning and AI development. If your team is on the lookout for an AI platform, I wholeheartedly recommend dstack.




Get started in minutes

Install dstack on your laptop with uv, or deploy it anywhere using the dstackai/dstack Docker image.
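For example (the `all` extra is an assumption; check the installation docs for your setup):

```shell
# Install the dstack server and CLI
uv tool install "dstack[all]"
# Start the server; it prints the URL and an admin token
dstack server
```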

Bring your compute via backends or SSH fleets, then bring your team.

Quickstart Installation

dstack Sky

Hosted by us. Bring your own cloud, or tap into the GPU marketplace.

Get $5 in GPU credits. Already have an account? Sign in

dstack Enterprise

Self-hosted with SSO, air-gapped environments, and dedicated support.

See how dstack fits your infrastructure.