Lambda¶
Lambda offers two ways to use clusters with a fast interconnect:
- Kubernetes – Lets you interact with clusters through the Kubernetes API and includes support for NVIDIA GPU operators and related tools.
- 1-Click Clusters (1CC) – Gives you direct access to clusters in the form of bare-metal nodes.
Both options use the same underlying networking infrastructure. This example walks you through how to set up Lambda clusters to use with dstack.
Kubernetes¶
Prerequsisites
- Follow the instructions in Lambda's guide on accessing MK8s.
- Go to
Firewall→Edit rules, clickAdd rule, and allow ingress traffic on port30022. This port will be used by thedstackserver to access the jump host.
Configure the backend¶
Follow the standard instructions for setting up a Kubernetes backend:
projects:
- name: main
backends:
- type: kubernetes
kubeconfig:
filename: <kubeconfig path>
proxy_jump:
port: 30022
Create a fleet¶
Once the Kubernetes cluster and the dstack server are running, you can create a fleet:
type: fleet
name: lambda-fleet
placement: cluster
nodes: 0..
backends: [kubernetes]
resources:
# Specify requirements to filter nodes
gpu: 1..8
Pass the fleet configuration to dstack apply:
$ dstack apply -f lambda-fleet.dstack.yml
Once the fleet is created, you can run dev environments, tasks, and services.
1-Click Clusters¶
Another way to work with Lambda clusters is through 1CC. While dstack supports automated cluster provisioning via VM-based backends, there is currently no programmatic way to provision Lambda 1CCs. As a result, to use a 1CC cluster with dstack, you must use SSH fleets.
Prerequsisites
- Follow the instructions in Lambda's guide on working with 1-Click Clusters
Create a fleet¶
Follow the standard instructions for setting up an SSH fleet:
type: fleet
name: lambda-fleet
ssh_config:
user: ubuntu
identity_file: ~/.ssh/id_rsa
hosts:
- worker-gpu-8x-b200-rplfm-ll9nr
- worker-gpu-8x-b200-rplfm-qrcs9
proxy_jump:
hostname: 192.222.55.54
user: ubuntu
identity_file: ~/.ssh/id_rsa
placement: cluster
Under
proxy_jump, we specify the hostname of the head node along with the private SSH key.
Pass the fleet configuration to dstack apply:
$ dstack apply -f lambda-fleet.dstack.yml
Once the fleet is created, you can run dev environments, tasks, and services.
Run tasks¶
To run tasks on a cluster, you must use distributed tasks.
Run NCCL tests¶
To validate cluster network bandwidth, use the following task:
type: task
name: nccl-tests
nodes: 2
startup_order: workers-first
stop_criteria: master-done
commands:
- |
if [ $DSTACK_NODE_RANK -eq 0 ]; then
mpirun \
--allow-run-as-root \
--hostfile $DSTACK_MPI_HOSTFILE \
-n $DSTACK_GPUS_NUM \
-N $DSTACK_GPUS_PER_NODE \
--bind-to none \
-x NCCL_IB_HCA=^mlx5_0 \
/opt/nccl-tests/build/all_reduce_perf -b 8 -e 2G -f 2 -t 1 -g 1 -c 1 -n 100
else
sleep infinity
fi
# Uncomment if the `kubernetes` backend requires it for `/dev/infiniband` access
#privileged: true
resources:
gpu: nvidia:B200:8
shm_size: 16GB
Pass the configuration to dstack apply:
$ dstack apply -f lambda-nccl-tests.dstack.yml
Provisioning...
---> 100%
# nccl-tests version 2.17.6 nccl-headers=22602 nccl-library=22602
# Collective test starting: all_reduce_perf
#
# size count type redop root time algbw busbw #wrong time algbw busbw #wrong
# (B) (elements) (us) (GB/s) (GB/s) (us) (GB/s) (GB/s)
8 2 float sum -1 36.50 0.00 0.00 0 36.16 0.00 0.00 0
16 4 float sum -1 35.55 0.00 0.00 0 35.49 0.00 0.00 0
32 8 float sum -1 35.49 0.00 0.00 0 36.28 0.00 0.00 0
64 16 float sum -1 35.85 0.00 0.00 0 35.54 0.00 0.00 0
128 32 float sum -1 37.36 0.00 0.01 0 36.82 0.00 0.01 0
256 64 float sum -1 37.38 0.01 0.01 0 37.80 0.01 0.01 0
512 128 float sum -1 51.05 0.01 0.02 0 37.17 0.01 0.03 0
1024 256 float sum -1 45.33 0.02 0.04 0 37.98 0.03 0.05 0
2048 512 float sum -1 38.67 0.05 0.10 0 38.30 0.05 0.10 0
4096 1024 float sum -1 40.08 0.10 0.19 0 39.18 0.10 0.20 0
8192 2048 float sum -1 42.13 0.19 0.36 0 41.47 0.20 0.37 0
16384 4096 float sum -1 43.66 0.38 0.70 0 41.94 0.39 0.73 0
32768 8192 float sum -1 45.42 0.72 1.35 0 43.29 0.76 1.42 0
65536 16384 float sum -1 44.59 1.47 2.76 0 43.90 1.49 2.80 0
131072 32768 float sum -1 47.44 2.76 5.18 0 46.79 2.80 5.25 0
262144 65536 float sum -1 66.68 3.93 7.37 0 65.36 4.01 7.52 0
524288 131072 float sum -1 240.71 2.18 4.08 0 125.73 4.17 7.82 0
1048576 262144 float sum -1 115.58 9.07 17.01 0 115.48 9.08 17.03 0
2097152 524288 float sum -1 114.44 18.33 34.36 0 114.27 18.35 34.41 0
4194304 1048576 float sum -1 118.25 35.47 66.50 0 117.11 35.82 67.15 0
8388608 2097152 float sum -1 141.39 59.33 111.24 0 134.95 62.16 116.55 0
16777216 4194304 float sum -1 186.86 89.78 168.34 0 184.39 90.99 170.60 0
33554432 8388608 float sum -1 255.79 131.18 245.96 0 253.88 132.16 247.81 0
67108864 16777216 float sum -1 350.41 191.52 359.09 0 350.71 191.35 358.79 0
134217728 33554432 float sum -1 596.75 224.92 421.72 0 595.37 225.44 422.69 0
268435456 67108864 float sum -1 934.67 287.20 538.50 0 931.37 288.22 540.41 0
536870912 134217728 float sum -1 1625.63 330.25 619.23 0 1687.31 318.18 596.59 0
1073741824 268435456 float sum -1 2972.25 361.26 677.35 0 2971.33 361.37 677.56 0
2147483648 536870912 float sum -1 5784.75 371.23 696.06 0 5728.40 374.88 702.91 0
# Out of bounds values : 0 OK
# Avg bus bandwidth : 137.179
What's next¶
- Learn about dev environments, tasks, services
- Read the Kuberentes, and Clusters guides
- Check Lambda's docs on Kubernetes and 1CC