Changelog

April 3, 2025
in Changelog
2 min read

Built-in UI for monitoring essential GPU metrics

AI workloads generate vast amounts of metrics, making efficient monitoring tools essential. While our recent update introduced the ability to export available metrics to Prometheus for maximum flexibility, users sometimes need quick access to essential metrics without switching to an external tool.

Previously, we introduced a CLI command that allows users to view essential GPU metrics for both NVIDIA and AMD hardware. Now, with this latest update, we’re excited to announce the addition of a built-in dashboard within the dstack control plane.

April 2, 2025
in Changelog
2 min read

Supporting MPI and NCCL/RCCL tests

As AI models grow in complexity, efficient orchestration tools become increasingly important. Fleets introduced by dstack last year streamline task execution on both cloud and on-prem clusters, whether it's pre-training, fine-tuning, or batch processing.

The strength of dstack lies in its flexibility. Users can leverage distributed frameworks such as torchrun, accelerate, or others. dstack handles node provisioning and job execution, and automatically propagates system environment variables (such as DSTACK_NODE_RANK, DSTACK_MASTER_NODE_IP, DSTACK_GPUS_PER_NODE, and others) to containers.
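As a rough illustration of how a job might consume these variables, the sketch below turns them into a torchrun command line. It is a minimal example under stated assumptions, not dstack's own launcher: the helper `build_torchrun_args`, the default port 29500, and the use of `DSTACK_NODES_NUM` for the node count (one of the "others" not named above) are assumptions made for this example.

```python
import os
from typing import List, Mapping


def build_torchrun_args(
    env: Mapping[str, str], script: str, master_port: int = 29500
) -> List[str]:
    """Build a torchrun command from dstack-style environment variables.

    Hypothetical helper for illustration only.
    """
    node_rank = int(env["DSTACK_NODE_RANK"])
    master_ip = env["DSTACK_MASTER_NODE_IP"]
    gpus_per_node = int(env["DSTACK_GPUS_PER_NODE"])
    # Assumption: the node count is exposed as DSTACK_NODES_NUM; default to 1.
    nodes_num = int(env.get("DSTACK_NODES_NUM", "1"))
    return [
        "torchrun",
        f"--nproc_per_node={gpus_per_node}",
        f"--nnodes={nodes_num}",
        f"--node_rank={node_rank}",
        f"--master_addr={master_ip}",
        f"--master_port={master_port}",
        script,
    ]


if __name__ == "__main__":
    # Inside a container, os.environ would be passed instead of this sample.
    sample_env = {
        "DSTACK_NODE_RANK": "0",
        "DSTACK_MASTER_NODE_IP": "10.0.0.1",
        "DSTACK_GPUS_PER_NODE": "8",
        "DSTACK_NODES_NUM": "2",
    }
    print(" ".join(build_torchrun_args(sample_env, "train.py")))
```

Taking the environment as a mapping rather than reading `os.environ` directly keeps the helper easy to test outside a container.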

One use case dstack hasn’t supported until now is MPI, as it requires a scheduled environment or direct SSH connections between containers. Since mpirun is essential for running NCCL/RCCL tests—crucial for large-scale cluster usage—we’ve added support for it.

April 1, 2025
in Changelog
2 min read

Exporting GPU, cost, and other metrics to Prometheus

Effective AI infrastructure management requires full visibility into compute performance and costs. AI researchers need detailed insights into container- and GPU-level performance, while managers rely on cost metrics to track resource usage across projects.

While dstack provides key metrics through its UI and the dstack metrics CLI, teams often need more granular data and prefer using their own monitoring tools. To support this, we’ve introduced a new endpoint that exports all collected metrics, covering fleets and runs, directly to Prometheus in real time.
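To give a feel for what a Prometheus-compatible endpoint returns, here is a small sketch that parses the Prometheus text exposition format into Python tuples. The metric and label names in `SAMPLE` are hypothetical placeholders, not the names dstack actually exports, and the parser is deliberately simplified: it assumes samples carry no trailing timestamp.

```python
import re
from typing import Dict, List, Tuple

# metric_name{label="value",...} 12.5  (the label block is optional)
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

Sample = Tuple[str, Dict[str, str], float]


def parse_metrics(text: str) -> List[Sample]:
    """Parse exposition-format text into (name, labels, value) tuples."""
    samples: List[Sample] = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore lines this simplified parser does not handle
        name, label_str, value = match.groups()
        labels = dict(LABEL_RE.findall(label_str or ""))
        samples.append((name, labels, float(value)))
    return samples


# Hypothetical payload; real metric names will differ.
SAMPLE = """\
# HELP example_gpu_util_percent GPU utilization
# TYPE example_gpu_util_percent gauge
example_gpu_util_percent{run="train",gpu="0"} 87
example_gpu_util_percent{run="train",gpu="1"} 91.5
example_run_cost_total 1.25
"""

if __name__ == "__main__":
    for name, labels, value in parse_metrics(SAMPLE):
        print(name, labels, value)
```

In practice, Prometheus scrapes the endpoint itself. A parser like this is mainly useful for quick checks or ad hoc scripts.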

March 31, 2025
in Changelog
2 min read

Accessing dev environments with Cursor

Dev environments enable seamless provisioning of remote instances with the necessary GPU resources, automatic repository fetching, and streamlined access via SSH or a preferred desktop IDE.

Previously, support was limited to VS Code. However, as developers rely on a variety of desktop IDEs, we’ve expanded compatibility. With this update, dev environments now offer effortless access for users of Cursor.

February 21, 2025
in Changelog
3 min read

Supporting Intel Gaudi AI accelerators with SSH fleets

At dstack, our goal is to make AI container orchestration simpler and fully vendor-agnostic. That’s why we support not just leading cloud providers and on-prem environments but also a wide range of accelerators.

With our latest release, we’re adding support for Intel Gaudi AI accelerators and launching a new partnership with Intel.

February 19, 2025
in Changelog
2 min read

Auto-shutdown for inactive dev environments—no idle GPUs

Whether you’re using cloud or on-prem compute, you may want to test your code before launching a training task or deploying a service. dstack’s dev environments make this easy by setting up a remote machine, cloning your repository, and configuring your IDE, all within a container that has GPU access.

A common problem with dev environments is forgetting to stop them, or simply closing your laptop, which leaves a costly GPU sitting idle. With our latest update, dstack now detects inactive environments and automatically shuts them down, saving you money.

February 18, 2025
in Changelog
4 min read

Introducing GPU blocks and proxy jump for SSH fleets

Recent breakthroughs in open-source AI have made AI infrastructure accessible beyond public clouds, driving demand for running AI workloads in on-premises data centers and private clouds. This shift offers organizations high-performance clusters together with flexibility and control.

However, Kubernetes, while a popular choice for traditional deployments, is often too complex and low-level to address the needs of AI teams.

Originally, dstack was focused on public clouds. With the new release, dstack extends support to data centers and private clouds, offering a simpler, AI-native solution that replaces Kubernetes and Slurm.

February 17, 2025
in Changelog
2 min read

Supporting NVIDIA and AMD accelerators on Vultr

As demand for AI infrastructure grows, the need for efficient, vendor-neutral orchestration tools is becoming increasingly important. At dstack, we’re committed to redefining AI container orchestration by prioritizing an AI-native, open-source-first approach. Today, we’re excited to share a new integration and partnership with Vultr.

This new integration enables Vultr customers to use dstack to train and deploy models on both AMD and NVIDIA GPUs with greater flexibility and efficiency.

November 5, 2024
in Changelog
2 min read

Introducing instance volumes to persist data on instances

Until now, dstack supported data persistence only with network volumes, managed by clouds. While convenient, sometimes you might want to use a simple cache on the instance or mount an NFS share to your SSH fleet. To address this, we're now introducing instance volumes that work for both cases.

type: task 
name: llama32-task

env:
  - HF_TOKEN
  - MODEL_ID=meta-llama/Llama-3.2-3B-Instruct
commands:
  - pip install vllm
  - vllm serve $MODEL_ID --max-model-len 4096
ports: [8000]

volumes:
  - /root/.dstack/cache:/root/.cache

resources:
  gpu: 16GB..

October 22, 2024
in Changelog
2 min read

Monitoring essential GPU metrics via CLI

While it's possible to use third-party monitoring tools with dstack, it is often more convenient to debug your run and track metrics out of the box. That's why, with the latest release, dstack introduced dstack stats, a new CLI (and API) for monitoring container metrics, including GPU usage for NVIDIA, AMD, and other accelerators.