
Orchestrating GPUs in data centers and private clouds

Recent breakthroughs in open-source AI have made AI infrastructure accessible beyond public clouds, driving demand for running AI workloads in on-premises data centers and private clouds. This shift gives organizations both high-performance clusters and greater flexibility and control.

However, Kubernetes, while a popular choice for traditional deployments, is often too complex and low-level to address the needs of AI teams.

Originally, dstack was focused on public clouds. With the new release, dstack extends support to data centers and private clouds, offering a simpler, AI-native solution that replaces Kubernetes and Slurm.

Private clouds offer the scalability and performance needed for large GPU clusters, while on-premises data centers provide stronger security and privacy controls.

In both cases, the focus isn’t just on seamless orchestration but also on maximizing infrastructure efficiency. This has long been a strength of Kubernetes, which enables concurrent workload execution across provisioned nodes to minimize resource waste.

GPU blocks

The newest version of dstack introduces a feature called GPU blocks, which brings this level of efficiency to dstack. It enables optimal hardware utilization by letting concurrent workloads run on the same host, each using a slice of that host's resources.

For example, imagine you’ve reserved a cluster with multiple bare-metal nodes, each equipped with 8x MI300X GPUs from Hot Aisle.

With dstack, you can define your fleet configuration like this:

type: fleet
name: my-hotaisle-fleet

ssh_config:
  user: ubuntu
  identity_file: ~/.ssh/hotaisle_id_rsa
  hosts:
    - hostname: ssh.hotaisle.cloud
      port: 22013
      blocks: auto
    - hostname: ssh.hotaisle.cloud
      port: 22014
      blocks: auto

placement: cluster

When you run dstack apply, each host appears as an available fleet instance whose status shows 0/8 busy. By setting blocks to auto, you automatically slice each host into 8 GPU blocks, one per GPU.

$ dstack apply -f my-hotaisle-fleet.dstack.yml

Provisioning...
---> 100%

 FLEET              INSTANCE  RESOURCES         STATUS     CREATED 
 my-hotaisle-fleet  0         8xMI300X (192GB)  0/8 busy   3 mins ago      
                    1         8xMI300X (192GB)  0/8 busy   3 mins ago    

For instance, you can run two workloads, each using 4 GPUs, and dstack will execute them concurrently on a single instance.
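
Such a workload could be defined as a task along the following lines. This is only a rough sketch: the name, command, and GPU spec are illustrative placeholders, not part of the Hot Aisle setup above.

type: task
name: finetune-half-node

commands:
  - python finetune.py

resources:
  gpu: MI300X:4

Submitting two runs of a task like this would let dstack place both on the same instance, each occupying 4 of its 8 GPU blocks.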

As the fleet owner, you can set the blocks parameter to any number. If you set it to 2, dstack will slice each host into 2 blocks, each with 4 GPUs. This flexibility allows you to define the minimum block size, ensuring the most efficient utilization of your resources.
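
For example, the ssh_config section of the fleet above could be adjusted like this, slicing the same Hot Aisle hosts into 2 blocks of 4 GPUs each:

ssh_config:
  user: ubuntu
  identity_file: ~/.ssh/hotaisle_id_rsa
  hosts:
    - hostname: ssh.hotaisle.cloud
      port: 22013
      blocks: 2
    - hostname: ssh.hotaisle.cloud
      port: 22014
      blocks: 2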

Fractional GPU

While we plan to eventually support fractions of a single GPU too, this is not the primary use case, as most modern AI teams require full GPUs for their workloads.

Regardless of whether you're using dstack with a data center or a private cloud, once a fleet is created you’re free to run dev environments, tasks, and services while maximizing GPU cost-efficiency through concurrent runs.
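
As an illustration, a dev environment that takes up only part of a block-sliced host might look like the sketch below (the name and GPU count are placeholders):

type: dev-environment
name: my-ide

ide: vscode

resources:
  gpu: MI300X:2

With blocks set to auto on the fleet above, such a run would occupy 2 of a host's 8 blocks, leaving the remaining blocks free for other runs.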

Proxy jump

Private clouds typically provide access to GPU clusters via SSH through a login node. In these setups, only the login node is internet-accessible, while cluster nodes can only be reached via SSH from the login node. This prevents creating an SSH fleet by directly listing the cluster nodes' hostnames.

The latest dstack release introduces the proxy_jump property in SSH fleet configurations, making it possible to create fleets through a login node.

For example, imagine you’ve reserved a 1-Click Cluster from Lambda with multiple nodes, each equipped with 8x H100 GPUs.

With dstack, you can define your fleet configuration like this:

type: fleet
name: my-lambda-fleet

ssh_config:
  user: ubuntu
  identity_file: ~/.ssh/lambda_node_id_rsa
  hosts:
    - us-east-2-1cc-node-1
    - us-east-2-1cc-node-2
    - us-east-2-1cc-node-3
    - us-east-2-1cc-node-4
  proxy_jump: 
    hostname: 12.34.567.890
    user: ubuntu
    identity_file: ~/.ssh/lambda_head_id_rsa

placement: cluster

When you run dstack apply, dstack creates an SSH fleet and connects to the configured hosts through the login node specified via proxy_jump. Fleet instances appear as normal instances, enabling you to run dev environments, tasks, and services just as you would without proxy_jump.

$ dstack apply -f my-lambda-fleet.dstack.yml

Provisioning...
---> 100%

 FLEET            INSTANCE  RESOURCES      STATUS  CREATED 
 my-lambda-fleet  0         8xH100 (80GB)  idle    3 mins ago      
                  1         8xH100 (80GB)  idle    3 mins ago    
                  2         8xH100 (80GB)  idle    3 mins ago    
                  3         8xH100 (80GB)  idle    3 mins ago    

The dstack CLI automatically handles SSH tunneling and port forwarding when running workloads.
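
For a sense of what is handled for you: without dstack, reaching a cluster node behind the login node would require an explicit SSH jump, roughly as below (hostnames and user are taken from the example configuration above; this assumes both identity files are loaded into your SSH agent):

$ ssh -J ubuntu@12.34.567.890 ubuntu@us-east-2-1cc-node-1

dstack performs the equivalent hops and port forwarding transparently when running workloads.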

What's next

To sum it up, the latest release enables dstack to be used efficiently not only with public clouds but also with private clouds and data centers. It natively supports NVIDIA, AMD, and Intel Gaudi accelerators, with support for other upcoming chips on the way.

What’s also important is that dstack comes with a control plane that not only simplifies orchestration but also provides a console for monitoring and managing workloads across projects (also known as tenants).

As a container orchestrator, dstack remains a streamlined alternative to Kubernetes and Slurm for AI teams, focusing on an AI-native experience, simplicity, and vendor-agnostic orchestration for both cloud and on-prem.

Roadmap

We plan to further enhance dstack's support for both cloud and on-premises setups. For more details on our roadmap, refer to our GitHub.

Have questions? You're welcome to join our Discord or talk directly to our team.