Orchestrating GPUs on DigitalOcean and AMD Developer Cloud
Orchestration automates provisioning, running jobs, and tearing them down. While Kubernetes and Slurm are powerful in their domains, they lack the lightweight, GPU-native focus modern teams need to move faster.
dstack is built entirely around GPUs. Our latest update introduces native integration with DigitalOcean and AMD Developer Cloud, enabling teams to provision cloud GPUs and run workloads more cost-efficiently.
About DigitalOcean
DigitalOcean is one of the leading cloud platforms, offering both NVIDIA and AMD GPUs as VMs and as bare-metal clusters.
About AMD Developer Cloud
AMD Developer Cloud is a new cloud platform designed to make AMD GPUs easily accessible to developers, academics, open-source contributors, and AI innovators worldwide.
Why dstack
dstack provides a high-level, AI-engineer-friendly interface where GPUs work out of the box, with no Kubernetes custom operators or low-level setup required. It's use-case agnostic, equally suited for training, inference, benchmarking, and dev environments.
With the new DigitalOcean and AMD Developer Cloud backends, you can now provision NVIDIA or AMD GPU VMs and run workloads with a single CLI command.
Getting started
The best part about dstack is that it's very easy to get started:
- Create a project in DigitalOcean or AMD Developer Cloud
- Get credits or approve a payment method
- Create an API key
Then, configure the backend in ~/.dstack/server/config.yml:
projects:
- name: main
  backends:
  - type: amddevcloud
    project_name: my-amd-project
    creds:
      type: api_key
      api_key: ...
For DigitalOcean, set type to digitalocean.
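For example, a minimal DigitalOcean entry might look like the following sketch, mirroring the block above (see the backend reference for all supported fields):

projects:
- name: main
  backends:
  - type: digitalocean
    creds:
      type: api_key
      api_key: ...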
Install and start the dstack server:
$ pip install "dstack[server]"
$ dstack server
For more details, see Installation.
Use the dstack CLI to manage dev environments, tasks, and services.
The digitalocean and amddevcloud backends support NVIDIA and AMD GPU VMs, respectively, and allow you to run dev environments (interactive development), tasks (training, fine-tuning, or other batch jobs), and services (inference).
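For instance, a minimal dev environment configuration might look like this sketch (the name, Python version, and GPU spec are illustrative):

type: dev-environment
name: amd-dev
# Python version for the default container image
python: "3.12"
ide: vscode
resources:
  gpu: MI300X:1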
Here’s an example of a service configuration:
type: service
name: gpt-oss-120b
model: openai/gpt-oss-120b
env:
  - HF_TOKEN
  - MODEL=openai/gpt-oss-120b
  # To enable AITER, set below to 1. Otherwise, set it to 0.
  - VLLM_ROCM_USE_AITER=1
  # To enable AITER Triton unified attention
  - VLLM_USE_AITER_UNIFIED_ATTENTION=1
  # Required to enable AITER unified attention by disabling AITER MHA
  - VLLM_ROCM_USE_AITER_MHA=0
image: rocm/vllm-dev:open-mi300-08052025
commands:
  - |
    vllm serve $MODEL \
      --tensor-parallel-size $DSTACK_GPUS_NUM \
      --no-enable-prefix-caching \
      --disable-log-requests \
      --compilation-config '{"full_cuda_graph": true}'
port: 8000
volumes:
  # Cache downloaded models
  - /root/.cache/huggingface:/root/.cache/huggingface
resources:
  gpu: MI300X:8
  shm_size: 32GB
As with any configuration, you can apply it via dstack apply. If needed, dstack will automatically provision new VMs and run the inference endpoint.
$ dstack apply -f examples/models/gpt-oss/120b.dstack.yml
 #  BACKEND             RESOURCES                                      PRICE
 1  amddevcloud (alt1)  cpu=20 mem=240GB disk=720GB MI300X:192GB:8     $15.92
Submit the run? [y/n]:
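Once the run is up, the service exposes vLLM's OpenAI-compatible API through dstack. A request sketch (the endpoint URL and token below are placeholders; the exact URL depends on whether you use a gateway):

$ curl https://<your-service-endpoint>/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer <dstack-token>' \
    -d '{
      "model": "openai/gpt-oss-120b",
      "messages": [{"role": "user", "content": "Hello!"}]
    }'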
If you prefer to use bare-metal clusters with dstack, you can create an SSH fleet. This way, you'll be able to run distributed tasks efficiently across the cluster.
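For example, an SSH fleet configuration might look like this sketch (the fleet name, user, key path, and host addresses are placeholders for your own bare-metal nodes):

type: fleet
name: my-baremetal-fleet
# Nodes are interconnected, so distributed tasks can span them
placement: cluster
ssh_config:
  user: ubuntu
  identity_file: ~/.ssh/id_rsa
  hosts:
    - 192.168.100.1
    - 192.168.100.2

Apply it with dstack apply, and the hosts become available as a fleet for your runs.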
What's next?
- Check Quickstart
- Learn more about DigitalOcean and AMD Developer Cloud
- Explore dev environments, tasks, services, and fleets
- Join Discord