# profiles.yml
Instead of configuring resources and other run options through `dstack run`, you can do so via `.dstack/profiles.yml` in the root folder of the project.
## Example

```yaml
profiles:
  - name: large
    resources:
      memory: 24GB # (Optional) The minimum amount of RAM memory
      gpu:
        name: A100 # (Optional) The name of the GPU
        memory: 40GB # (Optional) The minimum amount of GPU memory
      shm_size: 8GB # (Optional) The size of shared memory
    spot_policy: auto # (Optional) The spot policy. Supports `spot`, `on-demand`, and `auto`.
    max_price: 1.5 # (Optional) The maximum price per instance per hour
    max_duration: 1d # (Optional) The maximum duration of the run
    retry:
      retry-limit: 3h # (Optional) To wait for capacity
    backends: [azure, lambda] # (Optional) Use only the listed backends
    default: true # (Optional) Activate the profile by default
```
You can mark any profile as the default or pass its name via `--profile` to `dstack run`.
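
For example, to launch a run with the profile defined above, you might invoke the CLI like this (the `.` working-directory argument is illustrative):

```shell
dstack run . --profile large
```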
## YAML reference
### Profile

| Property | Description | Type | Default value |
|----------|-------------|------|---------------|
| `name` | The name of the profile that can be passed as `--profile` to `dstack run` | `str` | *required* |
| `backends` | The backends to consider for provisioning (e.g., `[aws, gcp]`) | `Optional[List[BackendType]]` | `None` |
| `resources` | The minimum resources of the instance to be provisioned | `ProfileResources` | `cpu=2 memory=8192 gpu=None shm_size=None` |
| `spot_policy` | The policy for provisioning spot or on-demand instances: `spot`, `on-demand`, or `auto` | `Optional[SpotPolicy]` | `None` |
| `retry_policy` | The policy for re-submitting the run | `Optional[ProfileRetryPolicy]` | `None` |
| `max_duration` | The maximum duration of a run (e.g., `2h`, `1d`). After it elapses, the run is forced to stop. | `Union[Literal['off'], str, int, NoneType]` | `None` |
| `max_price` | The maximum price per hour, in dollars | `Optional[float]` | `None` |
| `default` | If set to `true`, `dstack run` will use this profile by default | `bool` | `False` |
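
As an illustration of how these properties combine, here is a minimal sketch of a profile that caps cost and duration. The profile name `cheap` and all values are hypothetical; the key spelling follows the example at the top of this page:

```yaml
profiles:
  - name: cheap
    spot_policy: spot # only provision interruptible spot instances
    max_price: 0.5    # never pay more than $0.50 per instance per hour
    max_duration: 2h  # force the run to stop after two hours
    default: true     # use this profile when --profile is not passed
```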
### ProfileResources

| Property | Description | Type | Default value |
|----------|-------------|------|---------------|
| `cpu` | The minimum number of CPUs | `Optional[int]` | `2` |
| `memory` | The minimum size of RAM memory (e.g., `16GB`) | `Union[int, str, NoneType]` | `8GB` |
| `gpu` | The minimum number of GPUs or a GPU spec | `Union[int, ProfileGPU, NoneType]` | `None` |
| `shm_size` | The size of shared memory (e.g., `8GB`). If you are using parallel communicating processes (e.g., dataloaders in PyTorch), you may need to configure this. | `Union[int, str, NoneType]` | `None` |
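
Since `gpu` accepts either a plain integer count or a nested `ProfileGPU` spec, a `resources` block can stay short when only the count matters. A sketch of the count form (all values are hypothetical):

```yaml
resources:
  cpu: 8         # at least 8 CPUs
  memory: 64GB   # at least 64GB of RAM
  gpu: 2         # at least 2 GPUs of any kind; use a nested spec to constrain the model
  shm_size: 16GB # larger shared memory, e.g., for PyTorch dataloaders
```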
### ProfileGPU

| Property | Description | Type | Default value |
|----------|-------------|------|---------------|
| `name` | The name of the GPU (e.g., `A100` or `H100`) | `Optional[str]` | `None` |
| `count` | The minimum number of GPUs | `int` | `1` |
| `memory` | The minimum memory of a single GPU (e.g., `16GB`) | `Union[int, str, NoneType]` | `None` |
| `total_memory` | The minimum total memory of all GPUs (e.g., `32GB`) | `Union[int, str, NoneType]` | `None` |
| `compute_capability` | The minimum compute capability of the GPU (e.g., `7.5`) | `Union[float, str, Tuple, NoneType]` | `None` |
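
When the GPU model or its memory matters, the nested spec form can be used instead of a count. A sketch with hypothetical values:

```yaml
resources:
  gpu:
    name: H100              # only match H100 GPUs
    count: 4                # at least four of them
    total_memory: 320GB     # at least 320GB of GPU memory across all GPUs
    compute_capability: 8.0 # require compute capability 8.0 or higher
```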
### ProfileRetryPolicy

| Property | Description | Type | Default value |
|----------|-------------|------|---------------|
| `retry` | Whether to retry the run on failure | `bool` | `False` |
| `limit` | The maximum period of retrying the run (e.g., `4h` or `1d`) | `Union[int, str, NoneType]` | `None` |
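
Note that the example at the top of this page spells the retry settings as `retry` with a nested `retry-limit` key; the model's properties are `retry` and `limit`. A sketch using the model's property names (the exact YAML spelling may vary between dstack versions):

```yaml
retry_policy:
  retry: true # re-submit the run if it fails
  limit: 4h   # give up retrying after four hours
```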