NVIDIA

May 22, 2025
in Case studies, NVIDIA
3 min read

Case study: how EA uses dstack to fast-track AI development

At NVIDIA GTC 2025, Electronic Arts shared how they’re scaling AI development and managing infrastructure across teams. They highlighted how tools like dstack help them provision GPUs quickly, flexibly, and cost-efficiently. This case study summarizes key insights from their talk.

EA has more than 100 AI projects running, and the number keeps growing. Many teams have AI needs, including game developers, ML engineers, AI researchers, and platform teams, all supported by a central tech team. Some need full MLOps support; others have in-house expertise but need flexible tooling and infrastructure.

April 11, 2025
in Cloud fleets, NVIDIA
2 min read

Supporting GPU provisioning and orchestration on Nebius

As demand for GPU compute continues to scale, open-source tools tailored for AI workloads are becoming critical to developer velocity and efficiency. dstack is an open-source orchestrator purpose-built for AI infrastructure—offering a lightweight, container-native alternative to Kubernetes and Slurm.

Today, we’re announcing a native integration with Nebius, offering a streamlined developer experience for teams using GPUs for AI workloads.

April 3, 2025
in Metrics, AMD, NVIDIA
2 min read

Built-in UI for monitoring essential GPU metrics

AI workloads generate vast amounts of metrics, making efficient monitoring tools essential. While our recent update introduced the ability to export available metrics to Prometheus for maximum flexibility, there are times when users need quick access to essential metrics without switching to an external tool.

Previously, we introduced a CLI command that allows users to view essential GPU metrics for both NVIDIA and AMD hardware. Now, with this latest update, we’re excited to announce the addition of a built-in dashboard within the dstack control plane.

April 1, 2025
in Metrics, NVIDIA
2 min read

Exporting GPU, cost, and other metrics to Prometheus

Effective AI infrastructure management requires full visibility into compute performance and costs. AI researchers need detailed insights into container- and GPU-level performance, while managers rely on cost metrics to track resource usage across projects.

While dstack provides key metrics through its UI and the dstack metrics CLI command, teams often need more granular data and prefer to use their own monitoring tools. To support this, we’ve introduced a new endpoint that exports all collected metrics, covering both fleets and runs, to Prometheus in real time.
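Prometheus scrapes endpoints like this in its plain-text exposition format. As a rough illustration of what a consumer of such an endpoint sees, the sketch below parses a few sample lines and averages one metric. The metric names are hypothetical placeholders, not the names dstack actually exports, and the parser is deliberately simplified (it assumes no escaped quotes or commas inside label values and does not treat histograms specially).

```python
import re
from typing import Dict, List, Tuple

# One sample line: metric_name{label="value",...} number [timestamp]
# Simplified: assumes label values contain no escaped quotes.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)'
    r'(?:\s+(?P<ts>-?\d+))?\s*$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')

Sample = Tuple[str, Dict[str, str], float]


def parse_exposition(text: str) -> List[Sample]:
    """Parse Prometheus text-format samples, skipping # HELP / # TYPE comment lines."""
    samples: List[Sample] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        m = SAMPLE_RE.match(line)
        if m is None:
            continue  # ignore lines this simplified parser does not understand
        labels = dict(LABEL_RE.findall(m.group("labels") or ""))
        samples.append((m.group("name"), labels, float(m.group("value"))))
    return samples


# Hypothetical metric names, for illustration only.
EXAMPLE = """
# HELP example_gpu_utilization_ratio GPU utilization (0-1).
# TYPE example_gpu_utilization_ratio gauge
example_gpu_utilization_ratio{run="train-llama",gpu="0"} 0.93
example_gpu_utilization_ratio{run="train-llama",gpu="1"} 0.88
example_run_price_dollars_per_hour{run="train-llama"} 24.5
"""

if __name__ == "__main__":
    samples = parse_exposition(EXAMPLE)
    util = [v for name, _, v in samples if name == "example_gpu_utilization_ratio"]
    print(f"mean GPU utilization: {sum(util) / len(util):.3f}")
```

In practice Prometheus itself does this parsing; a hand-rolled parser like this is only useful for quick scripts or for checking what an endpoint returns.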

March 18, 2025
in Benchmarks, AMD, NVIDIA
6 min read

DeepSeek R1 inference performance: MI300X vs. H200

DeepSeek-R1, with its innovative architecture combining Multi-head Latent Attention (MLA) and DeepSeekMoE, presents unique challenges for inference workloads. As a reasoning-focused model, it generates intermediate chain-of-thought outputs, placing significant demands on memory capacity and bandwidth.
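To see why memory matters so much, here is a back-of-the-envelope sketch of per-token KV-cache size. It assumes the dimensions published in the DeepSeek-V3 configuration, which R1 shares: 61 layers and 128 attention heads; under MLA, a 512-dimensional compressed KV latent plus a 64-dimensional decoupled RoPE key is cached per layer. For comparison it uses a hypothetical uncompressed multi-head-attention baseline with 192-dimensional keys and 128-dimensional values per head. A 2-byte (BF16) cache is assumed, overhead is ignored, and none of these numbers come from the benchmark itself.

```python
# Back-of-the-envelope KV-cache sizing for DeepSeek-R1 (DeepSeek-V3 architecture).
# Assumed config values; verify against the model's config.json before relying on them.
NUM_LAYERS = 61
NUM_HEADS = 128
KV_LORA_RANK = 512       # compressed KV latent cached by MLA
QK_ROPE_HEAD_DIM = 64    # decoupled RoPE key, shared across heads, also cached
QK_NOPE_HEAD_DIM = 128
V_HEAD_DIM = 128
BYTES_PER_ELEM = 2       # BF16/FP16 cache


def mla_kv_bytes_per_token() -> int:
    """MLA caches one latent vector plus one RoPE key per layer, shared by all heads."""
    return NUM_LAYERS * (KV_LORA_RANK + QK_ROPE_HEAD_DIM) * BYTES_PER_ELEM


def mha_kv_bytes_per_token() -> int:
    """Hypothetical baseline: full per-head keys (nope + rope parts) and values for every head."""
    per_head = (QK_NOPE_HEAD_DIM + QK_ROPE_HEAD_DIM) + V_HEAD_DIM
    return NUM_LAYERS * NUM_HEADS * per_head * BYTES_PER_ELEM


def cache_gib(bytes_per_token: int, tokens: int) -> float:
    """KV-cache size in GiB for a given number of cached tokens."""
    return bytes_per_token * tokens / 2**30


if __name__ == "__main__":
    mla, mha = mla_kv_bytes_per_token(), mha_kv_bytes_per_token()
    print(f"MLA: {mla:,} B/token, MHA baseline: {mha:,} B/token, ratio ~{mha / mla:.0f}x")
    # A long chain-of-thought trace, e.g. 32k tokens in a single sequence:
    print(f"32k-token sequence with MLA: {cache_gib(mla, 32_768):.2f} GiB")
```

Even with MLA's roughly 70x compression relative to this baseline, long reasoning traces multiplied across many concurrent requests still add up to a large share of GPU memory.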

In this benchmark, we evaluate the performance of three inference backends—SGLang, vLLM, and TensorRT-LLM—on two hardware configurations: 8x NVIDIA H200 and 8x AMD MI300X. Our goal is to compare throughput, latency, and overall efficiency to determine the optimal backend and hardware pairing for DeepSeek-R1's demanding requirements.
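Throughput and latency figures in benchmarks like this are usually derived from per-request timestamps. The sketch below shows one common set of definitions: aggregate output tokens per second over the whole run, time to first token (TTFT), and time per output token (TPOT) during decoding. The RequestTiming record and its fields are hypothetical, and this is not necessarily the exact methodology used in the benchmark.

```python
from dataclasses import dataclass
from statistics import median
from typing import Dict, List


@dataclass
class RequestTiming:
    """Per-request timestamps in seconds, as a hypothetical load generator might record them."""
    sent: float         # request sent to the server
    first_token: float  # first output token received
    done: float         # last output token received
    output_tokens: int


def summarize(reqs: List[RequestTiming]) -> Dict[str, float]:
    """Aggregate throughput plus median TTFT and TPOT over a batch of requests."""
    wall_clock = max(r.done for r in reqs) - min(r.sent for r in reqs)
    ttfts = [r.first_token - r.sent for r in reqs]
    # TPOT: decode time spread over the tokens generated after the first one.
    tpots = [(r.done - r.first_token) / (r.output_tokens - 1)
             for r in reqs if r.output_tokens > 1]
    return {
        "output_tokens_per_s": sum(r.output_tokens for r in reqs) / wall_clock,
        "median_ttft_s": median(ttfts),
        "median_tpot_s": median(tpots) if tpots else float("nan"),
    }


if __name__ == "__main__":
    runs = [RequestTiming(0.0, 0.5, 2.5, 101), RequestTiming(0.0, 0.7, 4.0, 201)]
    for name, value in summarize(runs).items():
        print(f"{name}: {value:.4f}")
```

Throughput rewards packing many requests together, while TTFT and TPOT capture what a single user experiences, which is why the two are reported side by side.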

This benchmark was made possible through the generous support of our partners at Vultr and Lambda, who provided access to the necessary hardware.

February 17, 2025
in Cloud fleets, NVIDIA, AMD
2 min read

Supporting NVIDIA and AMD accelerators on Vultr

As demand for AI infrastructure grows, the need for efficient, vendor-neutral orchestration tools is becoming increasingly important. At dstack, we’re committed to redefining AI container orchestration by prioritizing an AI-native, open-source-first approach. Today, we’re excited to share a new integration and partnership with Vultr.

This new integration enables Vultr customers to use dstack to train and deploy models on both AMD and NVIDIA GPUs with greater flexibility and efficiency.

December 10, 2024
in AMD, NVIDIA, Volumes, Cloud fleets, SSH fleets
3 min read

Beyond Kubernetes: 2024 recap and what's ahead for AI infra

At dstack, we aim to simplify the development, training, and deployment of AI models by offering an alternative to the complex Kubernetes ecosystem. Our goal is to enable seamless AI infrastructure management across any cloud or hardware vendor.

As 2024 comes to a close, we reflect on the milestones we've achieved and look ahead to the next steps.

December 5, 2024
in Benchmarks, AMD, NVIDIA
6 min read

Exploring inference memory saturation effect: H100 vs. MI300X

GPU memory plays a critical role in LLM inference, affecting both performance and cost. This benchmark evaluates memory saturation’s impact on inference using NVIDIA's H100 and AMD's MI300X with Llama 3.1 405B FP8.

We examine how limited parallel compute resources affect throughput and Time to First Token (TTFT). We also compare deployment strategies: running two Llama 3.1 405B FP8 replicas on 4xMI300X versus a single replica on 4xMI300X and on 8xMI300X.
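The memory-saturation effect follows from simple arithmetic: the FP8 weights are a fixed cost per replica, and whatever memory remains bounds how many tokens of KV cache (and therefore how many concurrent requests) a replica can hold. The sketch below assumes roughly 405e9 parameters at one byte each, Llama 3.1 405B's published attention shape (126 layers, 8 KV heads, head dimension 128), an FP16 KV cache, 80 GB per H100 and 192 GB per MI300X. It ignores activations, runtime graphs, and allocator overhead, so real capacities will be lower.

```python
# Rough KV-cache budget for serving Llama 3.1 405B with FP8 weights.
# Assumptions (not measurements): ~405e9 parameters at 1 byte each, FP16 KV cache,
# decimal GB, and no allowance for activations, runtime graphs, or allocator overhead.
WEIGHT_BYTES = 405e9 * 1

# Llama 3.1 405B attention shape: 126 layers, 8 KV heads (GQA), head_dim 128.
LAYERS, KV_HEADS, HEAD_DIM, KV_ELEM_BYTES = 126, 8, 128, 2
KV_BYTES_PER_TOKEN = 2 * LAYERS * KV_HEADS * HEAD_DIM * KV_ELEM_BYTES  # K and V

GPU_MEMORY_GB = {"H100": 80, "MI300X": 192}


def kv_token_capacity(gpu: str, gpus_per_replica: int) -> int:
    """Tokens of KV cache one replica can hold after loading its copy of the weights."""
    replica_bytes = GPU_MEMORY_GB[gpu] * 1e9 * gpus_per_replica
    free_bytes = replica_bytes - WEIGHT_BYTES
    return max(0, int(free_bytes // KV_BYTES_PER_TOKEN))


if __name__ == "__main__":
    for gpu, n in [("MI300X", 4), ("MI300X", 8), ("H100", 8), ("H100", 4)]:
        print(f"{n}x{gpu}: ~{kv_token_capacity(gpu, n):,} KV-cache tokens per replica")
```

Because the weights are paid for once per replica, going from 4 to 8 MI300X GPUs more than triples a single replica's KV-cache budget in this estimate, whereas two 4-GPU replicas each keep the smaller budget. The same arithmetic shows that 4xH100 (320 GB) cannot hold the weights at all.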

Finally, we extrapolate performance projections for upcoming GPUs such as NVIDIA H200 and B200 and AMD MI325X and MI350X.

This benchmark was made possible through the generous support of our friends at Hot Aisle and Lambda, who provided high-end hardware.

October 22, 2024
in AMD, NVIDIA, Metrics
2 min read

Monitoring essential GPU metrics via CLI

While it's possible to use third-party monitoring tools with dstack, it is often more convenient to debug your run and track metrics out of the box. That's why, with the latest release, dstack introduced dstack stats, a new CLI command (and API) for monitoring container metrics, including GPU usage for NVIDIA, AMD, and other accelerators.
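For comparison, here is a minimal sketch of collecting similar per-GPU numbers by hand on an NVIDIA machine, the kind of plumbing a built-in command spares you. It uses nvidia-smi's --query-gpu and --format=csv,noheader,nounits options; it is an independent illustration, not dstack's implementation, and it covers NVIDIA GPUs only.

```python
import subprocess
from dataclasses import dataclass
from typing import List

QUERY_FIELDS = "index,utilization.gpu,memory.used,memory.total"


@dataclass
class GpuSample:
    index: int
    util_percent: float   # utilization.gpu, percent
    mem_used_mib: float   # memory.used, MiB
    mem_total_mib: float  # memory.total, MiB


def _number(field: str) -> float:
    # nvidia-smi prints placeholders such as "[N/A]" when a value is unavailable.
    return float("nan") if field.startswith("[") else float(field)


def parse_nvidia_smi_csv(text: str) -> List[GpuSample]:
    """Parse --format=csv,noheader,nounits output: one GPU per line."""
    samples = []
    for line in text.strip().splitlines():
        idx, util, used, total = (f.strip() for f in line.split(","))
        samples.append(GpuSample(int(idx), _number(util), _number(used), _number(total)))
    return samples


def query_local_gpus() -> List[GpuSample]:
    """Run nvidia-smi locally; requires an NVIDIA driver with nvidia-smi on PATH."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_nvidia_smi_csv(result.stdout)


if __name__ == "__main__":
    # Sample output so the example runs without a GPU; use query_local_gpus() on a real host.
    sample_output = "0, 97, 71234, 81559\n1, 12, 1024, 81559\n"
    for g in parse_nvidia_smi_csv(sample_output):
        print(f"GPU{g.index}: {g.util_percent:.0f}% util, "
              f"{g.mem_used_mib / g.mem_total_mib:.0%} memory used")
```

Repeating this across every container, node, and vendor (with a different tool for AMD) is exactly the overhead a built-in metrics command removes.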