Orchestrating GPUs in data centers and private clouds
Recent breakthroughs in open-source AI have made AI infrastructure accessible beyond public clouds, driving demand for running AI workloads in on-premises data centers and private clouds. This shift gives organizations both high-performance clusters and greater flexibility and control.
However, Kubernetes, while a popular choice for traditional deployments, is often too complex and low-level to address the needs of AI teams.
Originally, dstack was focused on public clouds. With the new release, dstack extends support to data centers and private clouds, offering a simpler, AI-native solution that replaces Kubernetes and Slurm.
Private clouds offer the scalability and performance needed for large GPU clusters, while on-premises data centers provide stronger security and privacy controls.
In both cases, the focus isn’t just on seamless orchestration but also on maximizing infrastructure efficiency. This has long been a strength of Kubernetes, which enables concurrent workload execution across provisioned nodes to minimize resource waste.
GPU blocks
The newest version of dstack introduces a feature called GPU blocks, bringing this level of efficiency to dstack. It enables optimal hardware utilization by allowing concurrent workloads to run on the same hosts, using slices of the available resources on each host.
For example, imagine you've reserved a cluster from Hot Aisle with multiple bare-metal nodes, each equipped with 8x MI300X GPUs.
With dstack, you can define your fleet configuration like this:
```yaml
type: fleet
name: my-hotaisle-fleet

ssh_config:
  user: ubuntu
  identity_file: ~/.ssh/hotaisle_id_rsa
  hosts:
    - hostname: ssh.hotaisle.cloud
      port: 22013
      blocks: auto
    - hostname: ssh.hotaisle.cloud
      port: 22014
      blocks: auto

placement: cluster
```
When you run dstack apply, each host appears as an available fleet instance, showing 0/8 next to busy. By setting blocks to auto, you automatically slice each host into 8 GPU blocks.
```shell
$ dstack apply -f my-hotaisle-fleet.dstack.yml

Provisioning...
---> 100%

 FLEET              INSTANCE  RESOURCES         STATUS    CREATED
 my-hotaisle-fleet  0         8xMI300X (192GB)  0/8 busy  3 mins ago
                    1         8xMI300X (192GB)  0/8 busy  3 mins ago
```
For instance, you can run two workloads, each using 4 GPUs, and dstack will execute them concurrently on a single instance.
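As a minimal sketch, such a task configuration might look like this (the name, command, and exact resource syntax here are illustrative, not from the release):

```yaml
type: task
name: train-half-node     # hypothetical name
commands:
  - python train.py       # hypothetical training command
resources:
  gpu: 4  # requests a 4-GPU slice; two such tasks fit on one 8-GPU host
```

Submitting two of these with dstack apply would let them share a single instance, each occupying half of its GPU blocks.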
As the fleet owner, you can set the blocks parameter to any number. If you set it to 2, dstack will slice each host into 2 blocks, each with 4 GPUs. This flexibility allows you to define the minimum block size, ensuring the most efficient utilization of your resources.
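For example, slicing each 8-GPU host from the fleet above into two 4-GPU blocks would only change the per-host entries:

```yaml
hosts:
  - hostname: ssh.hotaisle.cloud
    port: 22013
    blocks: 2  # 8 GPUs / 2 blocks = 4 GPUs per block
```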
Fractional GPU
While we plan to eventually support fractions of a single GPU too, this is not the primary use case, as most modern AI teams require full GPUs for their workloads.
Regardless of whether you're using dstack with a data center or a private cloud, once a fleet is created, you're free to run dev environments, tasks, and services while maximizing the cost-efficiency of your GPUs through concurrent runs.
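For instance, a minimal dev environment taking a 2-GPU slice of a host might be sketched as follows (the name and choice of IDE are illustrative):

```yaml
type: dev-environment
name: my-dev
ide: vscode
resources:
  gpu: 2  # occupies a 2-GPU slice, leaving the rest of the host free
```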
Proxy jump
Private clouds typically provide access to GPU clusters via SSH through a login node. In these setups, only the login node is internet-accessible, while cluster nodes can only be reached via SSH from the login node. This prevents creating an SSH fleet by directly listing the cluster nodes' hostnames.
The latest dstack release introduces the proxy_jump property in SSH fleet configurations, making it possible to create fleets through a login node.
For example, imagine you've reserved a 1-Click Cluster from Lambda with multiple nodes, each equipped with 8x H100 GPUs.
With dstack, you can define your fleet configuration like this:
```yaml
type: fleet
name: my-lambda-fleet

ssh_config:
  user: ubuntu
  identity_file: ~/.ssh/lambda_node_id_rsa
  hosts:
    - us-east-2-1cc-node-1
    - us-east-2-1cc-node-2
    - us-east-2-1cc-node-3
    - us-east-2-1cc-node-4
  proxy_jump:
    hostname: 12.34.567.890
    user: ubuntu
    identity_file: ~/.ssh/lambda_head_id_rsa

placement: cluster
```
When you run dstack apply, dstack creates an SSH fleet and connects to the configured hosts through the login node specified via proxy_jump. Fleet instances appear as normal instances, enabling you to run dev environments, tasks, and services just as you would without proxy_jump.
```shell
$ dstack apply -f my-lambda-fleet.dstack.yml

Provisioning...
---> 100%

 FLEET            INSTANCE  RESOURCES      STATUS  CREATED
 my-lambda-fleet  0         8xH100 (80GB)  idle    3 mins ago
                  1         8xH100 (80GB)  idle    3 mins ago
                  2         8xH100 (80GB)  idle    3 mins ago
                  3         8xH100 (80GB)  idle    3 mins ago
```
The dstack CLI automatically handles SSH tunneling and port forwarding when running workloads.
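As an illustration, a task that exposes a port, sketched below with a hypothetical TensorBoard command, would have that port forwarded to your local machine automatically, even though the cluster node is only reachable through the login node:

```yaml
type: task
name: tensorboard        # hypothetical name
commands:
  - tensorboard --logdir ./logs --host 0.0.0.0 --port 6006
ports:
  - 6006  # forwarded to localhost by the dstack CLI
```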
What's next
To sum it up, the latest release enables dstack to be used efficiently not only with public clouds but also with private clouds and data centers. It natively supports NVIDIA, AMD, and Intel Gaudi, with support for other upcoming chips on the way.
What's also important is that dstack comes with a control plane that not only simplifies orchestration but also provides a console for monitoring and managing workloads across projects (also known as tenants).
As a container orchestrator, dstack remains a streamlined alternative to Kubernetes and Slurm for AI teams, focusing on an AI-native experience, simplicity, and vendor-agnostic orchestration for both cloud and on-prem.
Roadmap
We plan to further enhance dstack's support for both cloud and on-premises setups. For more details on our roadmap, refer to our GitHub.
Have questions? You're welcome to join our Discord or talk directly to our team.