Blog

Nebius joins dstack Sky GPU marketplace, with production-ready GPU clusters

dstack is an open-source control plane for orchestrating GPU workloads. It can provision cloud VMs, run on top of Kubernetes, or manage on-prem clusters. If you don’t want to self-host, you can use dstack Sky, the managed version of dstack that also provides access to cloud GPUs via its marketplace.

With our latest release, we’re excited to announce that Nebius, a purpose-built AI cloud for large-scale training and inference, has joined the dstack Sky marketplace to offer on-demand and spot GPUs, including clusters.

The state of cloud GPUs in 2025: costs, performance, playbooks

This is a practical map for teams renting GPUs — whether you’re a single-project team fine-tuning models or a production-scale team managing thousand-GPU workloads. We’ll break down where providers fit, what actually drives performance, how pricing really works, and how to design a control plane that makes multi-cloud not just possible, but a competitive advantage.

Orchestrating GPUs on DigitalOcean and AMD Developer Cloud

Orchestration automates provisioning, running jobs, and tearing them down. While Kubernetes and Slurm are powerful in their domains, they lack the lightweight, GPU-native focus modern teams need to move faster.

dstack is built entirely around GPUs. Our latest update introduces native integration with DigitalOcean and AMD Developer Cloud, enabling teams to provision cloud GPUs and run workloads more cost-efficiently.
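As an illustration, here’s a minimal sketch of the kind of task configuration you could run on these backends. The name, image, and GPU spec below are assumptions for the example, not taken from the release itself:

type: task
name: rocm-smoke-test
# Any ROCm-enabled image works here; rocm/pytorch is one public option
image: rocm/pytorch:latest
commands:
  # Print GPU status as a quick sanity check
  - rocm-smi
resources:
  # Request one MI300X; dstack GPU specs follow a name:count pattern
  gpu: MI300X:1

Applying a file like this with dstack apply lets dstack pick the cheapest matching offer across the backends you have configured.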

Introducing service probes

dstack services are long-running workloads—most often inference endpoints and sometimes web apps—that run continuously on GPU or CPU instances. They can scale across replicas and support rolling deployments.

This release adds HTTP probes inspired by Kubernetes readiness probes. Probes periodically call an endpoint on each replica (for example, /health) to confirm it responds as expected. The result gives clear visibility into startup progress and, during rolling deployments, ensures traffic shifts to a replacement replica only after all of its configured probes succeed.
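As a sketch of how this could look in a service configuration — the probes block below is an assumption modeled on the Kubernetes-style behavior described above, and the name, port, and command are illustrative:

type: service
name: my-service
port: 8000
commands:
  # Hypothetical entrypoint serving an HTTP API with a /health route
  - python -m my_app
replicas: 2
# Assumed schema: an HTTP probe each replica must pass before receiving traffic
probes:
  - type: http
    url: /health

During a rolling deployment, a replacement replica would start receiving traffic only once its probe responds successfully.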

Introducing passive GPU health checks

In large-scale training, a single bad GPU can derail progress. Sometimes the failure is obvious — jobs crash outright. Other times it’s subtle: correctable memory errors, intermittent instability, or thermal throttling that quietly drags down throughput. In big experiments, these issues can go unnoticed for hours or days, wasting compute and delaying results.

dstack already supports GPU telemetry monitoring through NVIDIA DCGM metrics, covering utilization, memory, and temperature. This release extends that capability with passive hardware health checks powered by DCGM background health checks. With these, dstack continuously evaluates fleet GPUs for hardware reliability and displays their status before scheduling workloads.

Supporting Hot Aisle AMD AI Developer Cloud

As the ecosystem around AMD GPUs matures, developers are looking for easier ways to experiment with ROCm, benchmark new architectures, and run cost-effective workloads—without manual infrastructure setup.

dstack is an open-source orchestrator designed for AI workloads, providing a lightweight, container-native alternative to Kubernetes and Slurm.

Today, we’re excited to announce native integration with Hot Aisle, an AMD-only GPU neocloud offering VMs and clusters at highly competitive on-demand pricing.

Benchmarking AMD GPUs: bare-metal, VMs

This is the first in our series of benchmarks exploring the performance of AMD GPUs in virtualized versus bare-metal environments. As cloud infrastructure increasingly relies on virtualization, a key question arises: can VMs match bare-metal performance for GPU-intensive tasks? For this initial investigation, we focus specifically on a single-GPU setup, comparing a containerized workload on a VM against a bare-metal server, both equipped with the powerful AMD MI300X GPU.

Benchmarking AMD GPUs: bare-metal, containers, partitions

Our new benchmark explores two important areas for optimizing AI workloads on AMD GPUs: First, do containers introduce a performance penalty for network-intensive tasks compared to a bare-metal setup? Second, how does partitioning a powerful GPU like the MI300X affect its real-world performance for different types of AI workloads?

Rolling deployment, Secrets, Files, Tenstorrent, and more

Thanks to feedback from the community, dstack continues to evolve. Here’s a look at what’s new.

Rolling deployments

Previously, updating running services could cause downtime. The latest release fixes this with rolling deployments. Replicas are now updated one by one, allowing uninterrupted traffic during redeployments.

$ dstack apply -f .dstack.yml

Active run my-service already exists. Detected changes that can be updated in-place:
- Repo state (branch, commit, or other)
- File archives
- Configuration properties:
  - env
  - files

Update the run? [y/n]: y

 Launching my-service...

 NAME                      BACKEND          PRICE    STATUS       SUBMITTED
 my-service deployment=1                             running      11 mins ago
   replica=0 deployment=0  aws (us-west-2)  $0.0026  terminating  11 mins ago
   replica=1 deployment=1  aws (us-west-2)  $0.0026  running      1 min ago

How EA uses dstack to fast-track AI development

At NVIDIA GTC 2025, Electronic Arts shared how they’re scaling AI development and managing infrastructure across teams. They highlighted using tools like dstack to provision GPUs quickly, flexibly, and cost-efficiently. This case study summarizes key insights from their talk.

EA has more than 100 AI projects running, and the number keeps growing. There are many teams with AI needs—game dev, ML engineers, AI researchers, and platform teams—supported by a central tech team. Some need full MLOps support; others have in-house expertise but need flexible tooling and infrastructure.