Gateways¶
Gateways manage ingress traffic for running services, handle auto-scaling and rate limits, enable HTTPS, and allow you to configure a custom domain. They also support custom routers, such as the SGLang Model Gateway.
Apply a configuration¶
First, define a gateway configuration as a YAML file in your project folder.
The filename must end with .dstack.yml (for example, .dstack.yml or gateway.dstack.yml).
type: gateway
# The name of the gateway
name: example-gateway
# Gateways are bound to a specific backend and region
backend: aws
region: eu-west-1
# This domain will be used to access the endpoint
domain: example.com
A domain name is required to create a gateway.
To create or update the gateway, simply call the dstack apply command:
$ dstack apply -f gateway.dstack.yml
The example-gateway doesn't exist. Create it? [y/n]: y
Provisioning...
---> 100%
 BACKEND  REGION     NAME             HOSTNAME  DOMAIN       DEFAULT  STATUS
 aws      eu-west-1  example-gateway            example.com  ✓        submitted
Configuration options¶
Backend¶
You can create gateways with the aws, azure, gcp, or kubernetes backends, but that does not limit where services run. A gateway can use one backend while services run on any other backend supported by dstack, including backends where gateways themselves cannot be created.
Kubernetes
Gateways in the kubernetes backend require an external load balancer. Managed Kubernetes solutions usually include one.
For self-hosted Kubernetes, you must provide a load balancer yourself.
Router¶
By default, the gateway uses its own load balancer to route traffic between replicas. However, you can delegate this responsibility to a specific router by setting the router property. Currently, the only supported external router is sglang.
SGLang¶
The sglang router delegates routing logic to the SGLang Model Gateway.
To enable it, set the type field under router to sglang:
type: gateway
name: sglang-gateway
backend: aws
region: eu-west-1
domain: example.com
router:
type: sglang
policy: cache_aware
Policy
The policy property allows you to configure the routing policy:
- cache_aware: Default policy; combines cache locality with load balancing, falling back to shortest queue.
- power_of_two: Samples two workers and picks the lighter one.
- random: Uniform random selection.
- round_robin: Cycles through workers in order.
Currently, services using this type of gateway must run standard SGLang workers. See the example.
Support for prefill/decode disaggregation and auto-scaling based on inter-token latency is coming soon.
Public IP¶
If the gateway does not need a public IP, set public_ip to false (the default is true). This makes the gateway private.
Private gateways are currently supported in the aws and gcp backends.
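As a minimal sketch, a private gateway configuration might look like the following. It reuses the backend, region, and domain from the earlier example; the name private-gateway is only illustrative.

```yaml
type: gateway
# Illustrative name; choose your own
name: private-gateway
backend: aws
region: eu-west-1
domain: example.com
# Do not assign a public IP to the gateway (the default is true)
public_ip: false
```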
Reference
For all gateway configuration options, refer to the reference.
Update DNS records¶
Once the gateway is assigned a hostname, go to your domain's DNS settings
and add a DNS record for *.<gateway domain>, e.g. *.example.com.
The record should point to the gateway's hostname shown in dstack.
Use type A if the hostname is an IP address (the most common case),
or type CNAME if the hostname is another domain (as with some private gateways and Kubernetes).
Manage gateways¶
List gateways¶
The dstack gateway list command lists existing gateways and their status.
Delete a gateway¶
To delete a gateway, pass the gateway configuration to dstack delete:
$ dstack delete -f examples/inference/gateway.dstack.yml
Alternatively, you can delete a gateway by passing the gateway name to dstack gateway delete.
What's next?
- See services to learn how to run services