Services¶
With dstack
, you can use the CLI or API to deploy models or web apps.
Provide the commands, port, and choose the Python version or a Docker image.
dstack
handles the deployment on configured cloud GPU provider(s) with the necessary resources.
Prerequisites
If you're using the open-source server, you first have to set up a gateway.
Set up a gateway¶
For example, if your domain is example.com
, go ahead and run the
dstack gateway create
command:
$ dstack gateway create --domain example.com --region eu-west-1 --backend aws
Creating gateway...
---> 100%
BACKEND REGION NAME ADDRESS DOMAIN DEFAULT
aws eu-west-1 sour-fireant 52.148.254.14 example.com ✓
Afterward, in your domain's DNS settings, add an A
DNS record for *.example.com
pointing to the IP address of the gateway.
This way, if you run a service, dstack
will make its endpoint available at
https://<run-name>.example.com
.
If you're using the cloud version of dstack
, the gateway is set up for you.
Using the CLI¶
Define a configuration¶
First, create a YAML file in your project folder. Its name must end with .dstack.yml
(e.g. .dstack.yml
or train.dstack.yml
are both acceptable).
type: service
image: ghcr.io/huggingface/text-generation-inference:latest
env:
- MODEL_ID=TheBloke/Llama-2-13B-chat-GPTQ
port: 80
commands:
- text-generation-launcher --hostname 0.0.0.0 --port 80 --trust-remote-code
By default, dstack
uses its own Docker images to run dev environments,
which are pre-configured with Python, Conda, and essential CUDA drivers.
Configuration options
Configuration file allows you to specify a custom Docker image, environment variables, and many other options. For more details, refer to the Reference.
Run the configuration¶
To run a configuration, use the dstack run
command followed by the working directory path,
configuration file path, and any other options (e.g., for requesting hardware resources).
$ dstack run . -f serve.dstack.yml --gpu A100
BACKEND REGION RESOURCES SPOT PRICE
tensordock unitedkingdom 10xCPU, 80GB, 1xA100 (80GB) no $1.595
azure westus3 24xCPU, 220GB, 1xA100 (80GB) no $3.673
azure westus2 24xCPU, 220GB, 1xA100 (80GB) no $3.673
Continue? [y/n]: y
Provisioning...
---> 100%
Serving HTTP on https://yellow-cat-1.example.com ...
Once the service is deployed, its endpoint will be available at
https://<run-name>.<domain-name>
(using the domain set up for the gateway).
Run options
The dstack run
command allows you to use --gpu
to request GPUs (e.g. --gpu A100
or --gpu 80GB
or --gpu A100:4
, etc.),
and many other options (incl. spot instances, max price, max duration, retry policy, etc.).
For more details, refer to the Reference.