With dstack, you can use the CLI or API to deploy models or web apps.
Provide the commands and the port, and choose a Python version or a Docker image.
dstack handles the deployment on configured cloud GPU provider(s) with the necessary resources.
If you're using the open-source server, you first have to set up a gateway.
Set up a gateway
For example, if your domain is example.com, run the dstack gateway create command:
    $ dstack gateway create --domain example.com --region eu-west-1 --backend aws

    Creating gateway...
    ---> 100%

     BACKEND  REGION     NAME          ADDRESS         DOMAIN       DEFAULT
     aws      eu-west-1  sour-fireant  22.214.171.124  example.com  ✓
Afterward, in your domain's DNS settings, add an A DNS record for *.example.com pointing to the IP address of the gateway.
This way, if you run a service, dstack will make its endpoint available at https://<run-name>.example.com.
If you're using the cloud version of dstack, the gateway is set up for you.
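Before running a service, it can help to confirm that the DNS record has propagated. The following is a minimal sketch, not part of dstack: it assumes the A record covers subdomains of your domain, so that any subdomain should resolve to the gateway address. The function name and the `dns-check` probe label are hypothetical, and the resolver is injectable so the check can be exercised without network access.

```python
import socket


def gateway_dns_ready(domain, gateway_ip, resolver=socket.gethostbyname, probe="dns-check"):
    """Return True if a subdomain of `domain` resolves to `gateway_ip`.

    `probe` is an arbitrary, hypothetical label: if the A record covers
    subdomains, any label should resolve to the gateway address.
    """
    hostname = f"{probe}.{domain}"
    try:
        return resolver(hostname) == gateway_ip
    except socket.gaierror:
        # The name does not resolve (yet), e.g. the record has not propagated.
        return False
```

Call it with your domain and the ADDRESS value printed by dstack gateway create; a False result usually means the record is missing or has not propagated yet.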
Using the CLI
Define a configuration
First, create a YAML file in your project folder. Its name must end with .dstack.yml (e.g., .dstack.yml or serve.dstack.yml are both acceptable).
    type: service

    image: ghcr.io/huggingface/text-generation-inference:latest

    env:
      - MODEL_ID=TheBloke/Llama-2-13B-chat-GPTQ

    port: 80

    commands:
      - text-generation-launcher --hostname 0.0.0.0 --port 80 --trust-remote-code
Unless you specify an image, dstack uses its own Docker images, which are pre-configured with Python, Conda, and essential CUDA drivers.
The configuration file lets you specify a custom Docker image, environment variables, and many other options. For more details, refer to the Reference.
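One detail worth noticing in the example configuration is that the port declared under port matches the port the command listens on. The sketch below is a hypothetical pre-flight check, not something dstack provides: it assumes the configuration has already been loaded into a dict, that the file name should end with .dstack.yml, that port is a single number, and that commands pass their port as `--port N`.

```python
import re
from pathlib import Path


def check_service_config(path, config):
    """Return a list of problems found in a service configuration.

    `config` is the YAML file's content already loaded into a dict.
    """
    problems = []
    if not Path(path).name.endswith(".dstack.yml"):
        problems.append("file name must end with .dstack.yml")
    if config.get("type") != "service":
        problems.append("type must be 'service'")
    port = config.get("port")
    if port is None:
        problems.append("port is missing")
    # Ports passed via `--port N` in the commands should match the declared port.
    for command in config.get("commands", []):
        for found in re.findall(r"--port[ =](\d+)", command):
            if port is not None and found != str(port):
                problems.append(f"command listens on {found}, but port is {port}")
    return problems
```

An empty list means none of these checks found a mismatch; it says nothing about whether dstack itself will accept the file.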
Run the configuration
To run a configuration, use the dstack run command followed by the working directory path, the configuration file path, and any other options (e.g., for requesting hardware resources).
    $ dstack run . -f serve.dstack.yml --gpu A100

     BACKEND     REGION         RESOURCES                     SPOT  PRICE
     tensordock  unitedkingdom  10xCPU, 80GB, 1xA100 (80GB)   no    $1.595
     azure       westus3        24xCPU, 220GB, 1xA100 (80GB)  no    $3.673
     azure       westus2        24xCPU, 220GB, 1xA100 (80GB)  no    $3.673

    Continue? [y/n]: y

    Provisioning...
    ---> 100%

    Serving HTTP on https://yellow-cat-1.example.com
    ...
Once the service is deployed, its endpoint will be available at
https://<run-name>.<domain-name> (using the domain set up for the gateway).
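To illustrate how a client might reach the deployed service, here is a small sketch that builds the endpoint URL from the run name and gateway domain, then prepares a request to it. The `/generate` route and the payload shape are assumptions taken from the text-generation-inference HTTP API used in the example image, not from dstack; the function names are hypothetical.

```python
import json
import urllib.request


def service_endpoint(run_name, domain):
    """Build the service URL from the run name and the gateway domain."""
    return f"https://{run_name}.{domain}"


def build_generate_request(endpoint, prompt, max_new_tokens=64):
    """Prepare (but do not send) a POST request to the server's /generate route."""
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    return urllib.request.Request(
        f"{endpoint}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

For the run shown above, `service_endpoint("yellow-cat-1", "example.com")` gives the URL printed by dstack run. Passing the prepared request to `urllib.request.urlopen` would send it once the service is up.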
The dstack run command lets you use --gpu to request GPUs (e.g., --gpu A100, --gpu 80GB, or --gpu A100:4), along with many other options (including spot instances, max price, max duration, and retry policy).
For more details, refer to the Reference.
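The three --gpu forms shown above can be read as a GPU name, a memory size, and a name with a count. The sketch below illustrates that reading and assembles the command line used earlier; it is not dstack's own parser, the helper names are hypothetical, and the default count of 1 is an assumption.

```python
import re


def parse_gpu_spec(spec):
    """Split a --gpu value into name, memory, and count (one reading of the forms above)."""
    value, _, count = spec.partition(":")
    # Assumption: a spec without ":<count>" requests a single GPU.
    result = {"name": None, "memory_gb": None, "count": int(count) if count else 1}
    memory = re.fullmatch(r"(\d+)GB", value)
    if memory:
        result["memory_gb"] = int(memory.group(1))
    else:
        result["name"] = value
    return result


def build_run_command(workdir, config_file, gpu=None):
    """Assemble the argument list for `dstack run`, e.g. for subprocess.run."""
    command = ["dstack", "run", workdir, "-f", config_file]
    if gpu:
        command += ["--gpu", gpu]
    return command
```

Building the command as a list rather than a single string avoids shell quoting issues if it is later passed to `subprocess.run`.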