Blog

Introducing service probes

dstack services are long-running workloads—most often inference endpoints and sometimes web apps—that run continuously on GPU or CPU instances. They can scale across replicas and support rolling deployments.

This release adds HTTP probes inspired by Kubernetes readiness probes. Probes periodically call an endpoint on each replica (for example, /health) to confirm it responds as expected. The results give clear visibility into startup progress and, during rolling deployments, ensure that traffic shifts to a replacement replica only after all of its configured probes report ready.
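The readiness rule described above can be sketched in a few lines of Python. This is an illustrative model only, not dstack's implementation: the `ProbeState` class, the `ready_after` threshold (a number of consecutive successful checks), and the `check_http` helper are assumptions made for the example.

```python
from dataclasses import dataclass
from urllib.request import urlopen


def check_http(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET to `url` answers with a 2xx status (hypothetical helper)."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # Covers connection errors, timeouts, and HTTP error statuses.
        return False


@dataclass
class ProbeState:
    """Tracks one probe on one replica (illustrative, not dstack's data model)."""
    ready_after: int      # assumed threshold: consecutive successes required
    successes: int = 0    # current streak of successful checks

    def record(self, ok: bool) -> bool:
        # A failed check resets the streak; a success extends it.
        self.successes = self.successes + 1 if ok else 0
        return self.ready

    @property
    def ready(self) -> bool:
        return self.successes >= self.ready_after


def replica_ready(probes: list[ProbeState]) -> bool:
    """A replica receives traffic only when every configured probe is ready."""
    return all(p.ready for p in probes)
```

In a real loop, each probe would call something like `check_http("http://<replica>/health")` at a fixed interval and feed the result to `record`; the replica is considered ready only once `replica_ready` returns True.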

Introducing passive GPU health checks

In large-scale training, a single bad GPU can derail progress. Sometimes the failure is obvious — jobs crash outright. Other times it’s subtle: correctable memory errors, intermittent instability, or thermal throttling that quietly drags down throughput. In big experiments, these issues can go unnoticed for hours or days, wasting compute and delaying results.

dstack already supports GPU telemetry monitoring through NVIDIA DCGM metrics, covering utilization, memory, and temperature. This release extends that capability with passive hardware health checks powered by DCGM background health checks. With these, dstack continuously evaluates fleet GPUs for hardware reliability and displays their status before scheduling workloads.

Supporting Hot Aisle AMD AI Developer Cloud

As the ecosystem around AMD GPUs matures, developers are looking for easier ways to experiment with ROCm, benchmark new architectures, and run cost-effective workloads—without manual infrastructure setup.

dstack is an open-source orchestrator designed for AI workloads, providing a lightweight, container-native alternative to Kubernetes and Slurm.

Today, we’re excited to announce native integration with Hot Aisle, an AMD-only GPU neocloud offering VMs and clusters at highly competitive on-demand pricing.

Benchmarking AMD GPUs: bare-metal, VMs

This is the first in our series of benchmarks exploring the performance of AMD GPUs in virtualized versus bare-metal environments. As cloud infrastructure increasingly relies on virtualization, a key question arises: can VMs match bare-metal performance for GPU-intensive tasks? For this initial investigation, we focus specifically on a single-GPU setup, comparing a containerized workload on a VM against a bare-metal server, both equipped with the powerful AMD MI300X GPU.

Benchmarking AMD GPUs: bare-metal, containers, partitions

Our new benchmark explores two important areas for optimizing AI workloads on AMD GPUs: First, do containers introduce a performance penalty for network-intensive tasks compared to a bare-metal setup? Second, how does partitioning a powerful GPU like the MI300X affect its real-world performance for different types of AI workloads?

Rolling deployment, Secrets, Files, Tenstorrent, and more

Thanks to feedback from the community, dstack continues to evolve. Here’s a look at what’s new.

Rolling deployments

Previously, updating running services could cause downtime. The latest release fixes this with rolling deployments. Replicas are now updated one by one, allowing uninterrupted traffic during redeployments.

$ dstack apply -f .dstack.yml

Active run my-service already exists. Detected changes that can be updated in-place:
- Repo state (branch, commit, or other)
- File archives
- Configuration properties:
  - env
  - files

Update the run? [y/n]: y

Launching my-service...

 NAME                      BACKEND          PRICE    STATUS       SUBMITTED
 my-service deployment=1                             running      11 mins ago
   replica=0 deployment=0  aws (us-west-2)  $0.0026  terminating  11 mins ago
   replica=1 deployment=1  aws (us-west-2)  $0.0026  running      1 min ago
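
The one-by-one replacement described above can be modeled roughly as follows. This is a simplified sketch under stated assumptions, not dstack's scheduler: `rolling_update`, `start_new`, and `is_ready` are hypothetical names, and the policy of stopping the rollout when a replacement fails to become ready is an assumption made for the example.

```python
from typing import Callable


def rolling_update(
    old_replicas: list[str],
    start_new: Callable[[str], str],   # hypothetical: launches a replacement, returns its name
    is_ready: Callable[[str], bool],   # hypothetical: True once the replacement can serve traffic
) -> list[str]:
    """Replace replicas one at a time and return a log of the actions taken."""
    events: list[str] = []
    for replica in old_replicas:
        new = start_new(replica)
        events.append(f"start {new}")
        if not is_ready(new):
            # Assumed policy: keep the old replica serving and stop the rollout.
            events.append(f"abort: {new} not ready, keeping {replica}")
            break
        # The old replica is terminated only after its replacement is ready,
        # so at least one replica can serve traffic at every step.
        events.append(f"terminate {replica}")
    return events


if __name__ == "__main__":
    for line in rolling_update(["r0", "r1"], lambda r: r + "-v2", lambda n: True):
        print(line)
```

Because each old replica is terminated only after its replacement reports ready, the service stays reachable throughout the update.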

How EA uses dstack to fast-track AI development

At NVIDIA GTC 2025, Electronic Arts shared how they’re scaling AI development and managing infrastructure across teams. They highlighted using tools like dstack to provision GPUs quickly, flexibly, and cost-efficiently. This case study summarizes key insights from their talk.

EA has more than 100 AI projects running, and the number keeps growing. Many teams have AI needs (game dev, ML engineers, AI researchers, and platform teams), and all are supported by a central tech team. Some need full MLOps support; others have in-house expertise but need flexible tooling and infrastructure.

Supporting GPU provisioning and orchestration on Nebius

As demand for GPU compute continues to scale, open-source tools tailored for AI workloads are becoming critical to developer velocity and efficiency. dstack is an open-source orchestrator purpose-built for AI infrastructure—offering a lightweight, container-native alternative to Kubernetes and Slurm.

Today, we’re announcing native integration with Nebius, offering a streamlined developer experience for teams using GPUs for AI workloads.

Built-in UI for monitoring essential GPU metrics

AI workloads generate vast amounts of metrics, making efficient monitoring tools essential. While our recent update introduced the ability to export available metrics to Prometheus for maximum flexibility, users sometimes need quick access to essential metrics without switching to an external tool.

Previously, we introduced a CLI command that allows users to view essential GPU metrics for both NVIDIA and AMD hardware. Now, with this latest update, we’re excited to announce the addition of a built-in dashboard within the dstack control plane.