Introducing services to simplify deployment
The 0.10.7 update introduces services, a new configuration type for easier deployment.
Until now, `dstack` has supported `dev-environment` and `task` as configuration types. Even though `task` may be used for basic serving use cases, it lacks crucial serving features. With the new update, we introduce `service`, a dedicated configuration type for serving.
Consider the following example:
```yaml
type: task
image: ghcr.io/huggingface/text-generation-inference:0.9.3
ports:
  - 8000
commands:
  - text-generation-launcher --hostname 0.0.0.0 --port 8000 --trust-remote-code
```
When running it, the `dstack` CLI forwards traffic to `127.0.0.1:8000`.
This is convenient for development but unsuitable for production.
In production, you need your endpoint available on the external network, preferably behind authentication and a load balancer.
This is why we introduce the `service` configuration type.
```yaml
type: service
gateway: ${{ secrets.GATEWAY_ADDRESS }}
image: ghcr.io/huggingface/text-generation-inference:0.9.3
port: 8000
commands:
  - text-generation-launcher --hostname 0.0.0.0 --port 8000 --trust-remote-code
```
As you see, there are two differences compared to `task`:

- The `gateway` property: the address of a special cloud instance that wraps the running service with a public endpoint. Currently, you must specify it manually. In the future, `dstack` will assign it automatically.
- The `port` property: a service must always configure one port on which it's running.
When running, `dstack` forwards the traffic to the gateway, providing you with a public endpoint that you can use to access the running service.
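Once deployed, the service is called like any plain HTTP API. As a sketch, the helper below builds a text-generation-inference `/generate` request for a given gateway address; the address, port, and function name are illustrative assumptions, not part of `dstack`:

```python
import json


def build_generate_request(gateway_address: str, prompt: str, port: int = 80):
    """Build the URL and JSON body for a TGI /generate call behind the gateway.

    The gateway address is hypothetical; substitute the one you configured.
    """
    url = f"http://{gateway_address}:{port}/generate"
    body = json.dumps({"inputs": prompt, "parameters": {"max_new_tokens": 20}})
    return url, body


# Example (placeholder address); send with any HTTP client, e.g.:
# requests.post(url, data=body, headers={"Content-Type": "application/json"})
url, body = build_generate_request("gateway.example.com", "What is dstack?")
print(url)  # http://gateway.example.com:80/generate
```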
Existing limitations
- Currently, you must create a gateway manually using the `dstack gateway` command and specify its address via YAML (e.g. using secrets). In the future, `dstack` will assign it automatically.
- Gateways do not support HTTPS yet. When you run a service, its endpoint URL is `<the address of the gateway>:80`. The port can be overridden via the `port` property: instead of `8000`, specify `<gateway port>:8000`.
- Gateways do not provide authorization and auto-scaling. In the future, `dstack` will support them as well.
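For instance, to expose the service on gateway port `8080` instead of the default `80`, the port override described above might look like this (a sketch; the secrets-based gateway address is a placeholder):

```yaml
type: service
gateway: ${{ secrets.GATEWAY_ADDRESS }}
image: ghcr.io/huggingface/text-generation-inference:0.9.3
port: "8080:8000"  # <gateway port>:<container port>
commands:
  - text-generation-launcher --hostname 0.0.0.0 --port 8000 --trust-remote-code
```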
This initial support for services is the first step towards providing multi-cloud and cost-effective inference.
Give it a try and share feedback
Even though the current support is limited in many ways, we encourage you to give it a try and share your feedback with us!
More details on how to use services can be found in a dedicated guide in our docs. Questions and requests for help are very much welcome in our Discord server.