Troubleshooting¶
Reporting issues¶
When you encounter a problem, please report it as a GitHub issue.
If you have a question or need help, feel free to ask it in our Discord server.
When bringing up issues, always include the steps to reproduce.
Steps to reproduce¶
Make sure to provide clear, detailed steps to reproduce the issue. Include server logs, CLI outputs, and configuration samples. Avoid using screenshots for logs or errors—use text instead.
To get more detailed logs, set the DSTACK_CLI_LOG_LEVEL and DSTACK_SERVER_LOG_LEVEL environment variables to debug when running the CLI and the server, respectively.
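As a minimal sketch, the two variables can be exported in the shell session before starting the server and re-running the failing command. The config file name in the comments is a placeholder, not something from this guide.

```shell
# Enable debug logging for everything started from this shell session.
export DSTACK_CLI_LOG_LEVEL=debug
export DSTACK_SERVER_LOG_LEVEL=debug

# Then, in this same session, start the server and reproduce the problem, e.g.:
#   dstack server
#   dstack apply -f .dstack.yml   # placeholder file name
```

Copy the resulting CLI and server output into the issue as text.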
See these examples for well-reported issues: this and this.
Typical issues¶
Provisioning fails¶
In certain cases, running dstack apply may produce the following output:
wet-mangust-1 provisioning completed (failed)
All provisioning attempts failed. This is likely due to cloud providers not having enough capacity. Check CLI and server logs for more details.
Cause 1: Backend misconfiguration¶
If runs consistently fail to provision due to insufficient capacity, it’s likely there is a backend configuration issue. Ensure that your backends are configured correctly and check the server logs for any errors.
Cause 2: Insufficient service quotas¶
If some runs fail to provision, it may be due to an insufficient service quota. For cloud providers like AWS, GCP, Azure, and OCI, you often need to request an increased service quota before you can use specific instances.
Cause 3: Resources mismatch¶
Another possible cause of the insufficient capacity error is that dstack cannot find an instance that meets the requirements specified in resources.
GPU
The gpu property allows you to specify the GPU name, memory, and quantity. Examples include A100 (one GPU), A100:40GB (one GPU with exact memory), A100:4 (four GPUs), etc. If you specify a GPU name without a quantity, it defaults to 1.
If you request one GPU but only instances with eight GPUs are available, dstack won’t be able to provide it. Use range syntax to specify a range, such as A100:1..8 (one to eight GPUs) or A100:1.. (one or more GPUs).
Disk
If you don't specify the disk property, dstack defaults it to 100GB.
If no instance with that disk size is available, dstack won’t be able to provision one.
Use range syntax to specify a range, such as 50GB..100GB (from 50 GB to 100 GB) or 50GB.. (50 GB or more).
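The range syntax for both properties can be sketched in one configuration. The snippet below writes a hypothetical task file; the file name, run name, and command are placeholders chosen for illustration, and only the gpu and disk values come from the text above.

```shell
# Write a hypothetical task configuration that uses ranges for GPU count and disk size.
config_file="resources-range.dstack.yml"
cat > "$config_file" <<'EOF'
type: task
name: resources-range-example
commands:
  - nvidia-smi
resources:
  # One to eight A100 GPUs, so instances with more than one GPU can also match
  gpu: A100:1..8
  # Any disk of 50GB or more
  disk: 50GB..
EOF

# Apply it with: dstack apply -f resources-range.dstack.yml
```

Widening the ranges gives dstack more candidate instances to match, which makes a resources mismatch less likely.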
Run starts but fails¶
There could be several reasons for a run failing after successful provisioning.
Termination reason
To find out why a run terminated, use --verbose (or -v) with dstack ps.
This will show the run's status and any failure reasons.
Diagnostic logs
You can get more information on why a run fails with diagnostic logs.
Pass --diagnose (or -d) to dstack logs and you'll see logs of the run executor.
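The two checks above can be combined into one step. The function below is a hypothetical convenience wrapper, not part of dstack; it assumes the run name is passed as the first argument to dstack logs.

```shell
# Hypothetical helper: show termination reasons, then executor logs, for one run.
diagnose_run() {
  run_name="$1"
  # Verbose listing includes each run's status and any failure reason.
  dstack ps --verbose
  # Diagnostic logs come from the run executor.
  dstack logs --diagnose "$run_name"
}

# Usage: diagnose_run <run name>
```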
Cause 1: Spot interruption¶
If a run fails after provisioning with the termination reason INTERRUPTED_BY_NO_CAPACITY, it is likely that the run was using spot instances and was interrupted. To address this, you can either set the spot_policy to on-demand or specify the retry property.
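Both options can be sketched in a single hypothetical task file. The file name, run name, and command are placeholders. The retry sub-keys (on_events, duration) are not described in this guide and are written here from memory of the dstack configuration reference, so check them against the reference before relying on them.

```shell
# Write a hypothetical task configuration showing both ways to handle spot interruptions.
config_file="retry-on-interruption.dstack.yml"
cat > "$config_file" <<'EOF'
type: task
name: retry-example
commands:
  - python train.py
# Option A: avoid spot instances entirely
# spot_policy: on-demand
# Option B: resubmit the run when it is interrupted (sub-keys are an assumption)
retry:
  on_events: [interruption]
  duration: 1h
EOF

# Apply it with: dstack apply -f retry-on-interruption.dstack.yml
```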
Services fail to start¶
Cause 1: Gateway misconfiguration¶
If all services fail to start with a specific gateway, make sure a correct DNS record pointing to the gateway's hostname is configured.
Service endpoint doesn't work¶
Cause 1: Bad Authorization¶
If the service endpoint returns a 403 error, it is likely because the Authorization header with the correct dstack token was not provided.
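A request with the header attached might look like the sketch below. The helper name and the SERVICE_TOKEN variable are hypothetical, and the Bearer scheme is an assumption that this guide does not state explicitly.

```shell
# Hypothetical helper: call a service endpoint with the dstack token attached.
# Assumes the token is stored in SERVICE_TOKEN and that the Bearer scheme is expected.
call_service() {
  url="$1"
  curl -sS -H "Authorization: Bearer ${SERVICE_TOKEN}" "$url"
}

# Usage: SERVICE_TOKEN=<dstack token> call_service <service URL>
```

If the same request succeeds with the header and returns 403 without it, the token was the problem.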
Cannot access dev environment or task ports¶
Cause 1: Detached from run¶
When running a dev environment or task with configured ports, dstack apply automatically forwards remote ports to localhost via SSH for easy and secure access.
If you interrupt the command, the port forwarding will be disconnected. To reattach, use dstack attach <run name>.
Cause 2: Windows¶
If you're using the CLI on Windows, make sure to run it through WSL by following these instructions. Native support will be available soon.
SSH fleet fails to provision¶
If you set up an SSH fleet and it fails to provision after a long wait, first check the server logs.
Also, review the /root/.dstack/shim.log file on each host used to create the fleet.
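One way to collect the shim logs from several hosts is sketched below. The helper is hypothetical; it assumes each argument is an SSH destination you can already log in to, and that sudo is needed because the log sits under /root.

```shell
# Hypothetical helper: print the tail of shim.log from each fleet host.
# Each argument is an SSH destination such as user@host.
show_shim_logs() {
  for host in "$@"; do
    echo "== $host =="
    ssh "$host" 'sudo tail -n 100 /root/.dstack/shim.log'
  done
}

# Usage: show_shim_logs <user@host> [<user@host> ...]
```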
Community¶
If you have a question, please feel free to ask it in our Discord server.