Orchestrating workloads on NVIDIA DGX Spark¶
With support from Graphsignal, our team gained access to the new NVIDIA DGX Spark and used it to validate how dstack operates on this hardware. This post walks through how to set it up with dstack and use it alongside existing on-prem clusters or GPU cloud environments to run workloads.

If DGX Spark is new to you, here is a quick breakdown of the key specs.
- Built on the NVIDIA GB10 Grace Blackwell Superchip with Arm CPUs.
- Capable of up to 1 petaflop of AI compute at FP4 precision, roughly comparable to RTX 5070 performance.
- Features 128GB of unified CPU and GPU memory enabled by the Grace Blackwell architecture.
- Ships with NVIDIA DGX OS (a tuned Ubuntu build) and NVIDIA Container Toolkit.
These characteristics make DGX Spark a fitting extension for local development and smaller-scale model training or inference, including workloads up to the GPT-OSS 120B range.
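Before adding the machine to dstack, it's worth confirming that the driver and the NVIDIA Container Toolkit are working. A minimal sanity check (assuming Docker is installed, as it is on DGX OS, and using the same host and user as in the fleet configuration below):

```shell
$ ssh devops@spark-e3a4

# Driver check on the host
$ nvidia-smi

# Container Toolkit check: the toolkit mounts the driver utilities
# into the container, so nvidia-smi should work here as well
$ docker run --rm --gpus all ubuntu nvidia-smi
```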
Creating an SSH fleet¶
Because DGX Spark supports SSH and containers, integrating it with dstack is straightforward. Start by configuring an SSH fleet: the configuration file lists the hosts and the credentials dstack should use to access them.
```yaml
type: fleet
name: spark

ssh_config:
  user: devops
  identity_file: ~/.ssh/id_rsa
  hosts:
    - spark-e3a4
```
The user must have sudo privileges.
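If key-based access isn't configured yet, the standard `ssh-copy-id` utility can install your public key on the machine (assuming the `devops` account already exists there):

```shell
# Install the public key for passwordless SSH
$ ssh-copy-id -i ~/.ssh/id_rsa.pub devops@spark-e3a4

# Verify that login works without a password prompt
$ ssh devops@spark-e3a4 hostname
```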
Apply the configuration:
```shell
$ dstack apply -f fleet.dstack.yml

Provisioning...
---> 100%

 FLEET  INSTANCE  GPU     PRICE  STATUS  CREATED
 spark  0         GB10:1  $0     idle    3 mins ago
```
Once the fleet is active, dstack detects the hardware and marks the instance as idle. From here, you can run dev environments, tasks, and services on the DGX Spark fleet the same way you would with other on-prem or cloud GPU backends.
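You can also inspect the fleet from the CLI at any point. Assuming a recent dstack version, the `dstack fleet` command lists instances and their status:

```shell
$ dstack fleet

 FLEET  INSTANCE  GPU     PRICE  STATUS  CREATED
 spark  0         GB10:1  $0     idle    3 mins ago
```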
Running a dev environment¶
Example configuration:
```yaml
type: dev-environment
name: cursor

image: lmsysorg/sglang:spark
ide: cursor

resources:
  gpu: GB10

volumes:
  - /root/.cache/huggingface:/root/.cache/huggingface

fleets: [spark]
```
We use an instance volume to keep model downloads cached across runs. The `lmsysorg/sglang:spark` image is tuned for inference on DGX Spark; if customization is needed, any Arm-compatible image with proper driver support will work.
Run the environment:
```shell
$ dstack apply -f .dstack.yml

 BACKEND       GPU     INSTANCE TYPE  PRICE
 ssh (remote)  GB10:1  instance       $0 idle

Submit the run cursor? [y/n]: y

 #  NAME    BACKEND       GPU     PRICE  STATUS   SUBMITTED
 1  cursor  ssh (remote)  GB10:1  $0     running  12:24

Launching `cursor`...
---> 100%

To open in VS Code Desktop, use this link:
  vscode://vscode-remote/ssh-remote+cursor/workflow
```
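The IDE link isn't the only way in. dstack also maintains an SSH alias for the run in your local SSH config, so (assuming the run is named `cursor` as above) you can attach from a plain terminal:

```shell
# The alias matches the run name; dstack keeps it in ~/.ssh/config
$ ssh cursor
```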
What's next?¶
Running workloads on DGX Spark with dstack works the same way as on any other backend (including GPU clouds): you can run dev environments for interactive development, tasks for fine-tuning, and services for inference through the unified interface.
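For example, the same image can back an inference service on the fleet. Below is a minimal sketch of a service configuration; the model choice and SGLang flags are illustrative, not from the original post:

```yaml
type: service
name: sglang-serve

image: lmsysorg/sglang:spark

# Hypothetical model pick for illustration; any model that fits
# in the 128GB of unified memory should work
env:
  - MODEL_ID=openai/gpt-oss-120b
commands:
  - python3 -m sglang.launch_server --model-path $MODEL_ID --host 0.0.0.0 --port 8000
port: 8000

resources:
  gpu: GB10

volumes:
  - /root/.cache/huggingface:/root/.cache/huggingface

fleets: [spark]
```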
- Read the NVIDIA DGX Spark in-depth review by the SGLang team.
- Check dev environments, tasks, services, and fleets
- Follow Quickstart
- Join Discord
Acknowledgement¶
Thanks to the Graphsignal team for access to DGX Spark and for supporting testing and validation. Graphsignal provides inference observability tooling, which we used to profile CUDA workloads during both training and inference.