Supporting MPI and NCCL/RCCL tests
As AI models grow in complexity, efficient orchestration tools become increasingly important. Fleets, introduced by dstack last year, streamline task execution on both cloud and on-prem clusters, whether for pre-training, fine-tuning, or batch processing.
The strength of dstack lies in its flexibility. Users can leverage distributed frameworks like torchrun, accelerate, or others. dstack handles node provisioning and job execution, and automatically propagates system environment variables such as DSTACK_NODE_RANK, DSTACK_MASTER_NODE_IP, and DSTACK_GPUS_PER_NODE (among others) to containers.
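
As an illustration, here is a minimal Python sketch of how a launch script might turn these variables into a torchrun invocation. It assumes a DSTACK_NODES_NUM variable holding the node count (not listed above), a master port of 29500, and a hypothetical training script train.py. It only builds the command line and does not execute it.

```python
import os
import shlex


def build_torchrun_cmd(script, env=None, master_port=29500):
    """Build a torchrun command from dstack-style environment variables.

    Assumption: DSTACK_NODES_NUM carries the total node count.
    The master port (29500) is an illustrative default, not a dstack setting.
    """
    env = os.environ if env is None else env
    return [
        "torchrun",
        f"--nnodes={env['DSTACK_NODES_NUM']}",
        f"--node_rank={env['DSTACK_NODE_RANK']}",
        f"--nproc_per_node={env['DSTACK_GPUS_PER_NODE']}",
        f"--master_addr={env['DSTACK_MASTER_NODE_IP']}",
        f"--master_port={master_port}",
        script,
    ]


if __name__ == "__main__":
    # Hypothetical values for the first node of a two-node, 8-GPU-per-node fleet.
    example_env = {
        "DSTACK_NODES_NUM": "2",
        "DSTACK_NODE_RANK": "0",
        "DSTACK_GPUS_PER_NODE": "8",
        "DSTACK_MASTER_NODE_IP": "10.0.0.1",
    }
    print(shlex.join(build_torchrun_cmd("train.py", example_env)))
```

Every node runs the same script; only DSTACK_NODE_RANK differs between them, which is what lets a single task definition scale across the fleet.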

One use case dstack hasn’t supported until now is MPI, because it requires either a scheduled environment or direct SSH connections between containers. Since mpirun is essential for running NCCL/RCCL tests, which are crucial when operating large-scale clusters, we’ve added support for it.
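
To make the MPI requirement concrete, the following Python sketch assembles an Open MPI hostfile and an mpirun command for the all_reduce_perf benchmark from NVIDIA's nccl-tests (the RCCL tests accept the same flags). The node IPs, hostfile path, and binary location are hypothetical placeholders, and it assumes one MPI rank per GPU. mpirun starts the remote ranks over SSH, which is why the containers must be able to reach each other that way.

```python
import shlex


def hostfile_text(node_ips, slots_per_node):
    """Render an Open MPI hostfile: one 'host slots=N' line per node."""
    return "".join(f"{ip} slots={slots_per_node}\n" for ip in node_ips)


def build_nccl_test_cmd(hostfile, nodes, gpus_per_node,
                        binary="./build/all_reduce_perf"):
    """Build an mpirun command for all_reduce_perf.

    Assumption: one MPI rank per GPU, so each rank drives a single GPU (-g 1).
    The binary path is a hypothetical placeholder.
    """
    world_size = nodes * gpus_per_node
    return [
        "mpirun",
        "--allow-run-as-root",           # containers commonly run as root
        "--hostfile", hostfile,          # nodes reached via SSH by mpirun
        "-np", str(world_size),          # total number of ranks
        "--bind-to", "none",
        "-x", "NCCL_DEBUG=INFO",         # export NCCL logging to all ranks
        binary,
        "-b", "8", "-e", "8G",           # message sizes from 8 B to 8 GB
        "-f", "2",                       # double the size at each step
        "-g", "1",                       # GPUs per rank
    ]


if __name__ == "__main__":
    ips = ["10.0.0.1", "10.0.0.2"]       # hypothetical node IPs
    print(hostfile_text(ips, 8), end="")
    print(shlex.join(build_nccl_test_cmd("/tmp/hostfile", len(ips), 8)))
```

Sweeping message sizes with `-b`, `-e`, and `-f` shows how bus bandwidth scales from latency-bound small messages to bandwidth-bound large ones across the interconnect.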