Gateways

Gateways manage ingress traffic for running services, handle auto-scaling and rate limits, enable HTTPS, and allow you to configure a custom domain. They also support custom routers, such as the SGLang Model Gateway.

Apply a configuration

First, define a gateway configuration as a YAML file in your project folder. The filename must end with .dstack.yml (for example, both .dstack.yml and gateway.dstack.yml are acceptable).

type: gateway
# The name of the gateway
name: example-gateway

# Gateways are bound to a specific backend and region
backend: aws
region: eu-west-1

# This domain will be used to access the endpoint
domain: example.com

A domain name is required to create a gateway.

To create or update the gateway, run the dstack apply command:

$ dstack apply -f gateway.dstack.yml
The example-gateway doesn't exist. Create it? [y/n]: y

Provisioning...
---> 100%

 BACKEND  REGION     NAME             HOSTNAME  DOMAIN       DEFAULT  STATUS
 aws      eu-west-1  example-gateway            example.com          submitted

Configuration options

Backend

You can create gateways with the aws, azure, gcp, or kubernetes backends, but that does not limit where services run. A gateway can use one backend while services run on any other backend supported by dstack, including backends where gateways themselves cannot be created.

Kubernetes

Gateways in the kubernetes backend require an external load balancer. Managed Kubernetes solutions usually include one. For self-hosted Kubernetes, you must provide a load balancer yourself.

Router

By default, the gateway uses its own load balancer to route traffic between replicas. However, you can delegate this responsibility to a specific router by setting the router property. Currently, the only supported external router is sglang.

SGLang

The sglang router delegates routing logic to the SGLang Model Gateway.

To enable it, set the type field under router to sglang:

type: gateway
name: sglang-gateway

backend: aws
region: eu-west-1

domain: example.com

router:
  type: sglang
  policy: cache_aware

Policy

The policy property allows you to configure the routing policy:

  • cache_aware — Default policy; combines cache locality with load balancing, falling back to shortest queue.
  • power_of_two — Samples two workers and picks the lighter one.
  • random — Uniform random selection.
  • round_robin — Cycles through workers in order.
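
As a minimal sketch, selecting a non-default policy should only require changing the policy value under router. The fragment below assumes the rest of the gateway configuration (type, name, backend, region, domain) stays as in the example above; round_robin is picked purely for illustration.

```yaml
# Fragment of a gateway configuration; other fields omitted
router:
  type: sglang
  # One of: cache_aware (default), power_of_two, random, round_robin
  policy: round_robin
```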

Currently, services using this type of gateway must run standard SGLang workers. See the example.

Support for prefill/decode disaggregation and auto-scaling based on inter-token latency is coming soon.

Public IP

If you don't need a public IP for the gateway, set public_ip to false (the default is true) to make the gateway private. Private gateways are currently supported in the aws and gcp backends.
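
A private gateway configuration might look like the sketch below. The name private-gateway is a hypothetical placeholder, and the backend, region, and domain values are reused from the earlier example. Depending on your setup, other options from the reference may also be needed.

```yaml
type: gateway
# Hypothetical name, chosen for illustration
name: private-gateway

backend: aws
region: eu-west-1

domain: example.com

# Do not assign a public IP to the gateway (defaults to true)
public_ip: false
```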

Reference

For all gateway configuration options, refer to the reference.

Update DNS records

Once the gateway is assigned a hostname, go to your domain's DNS settings and add a DNS record for *.<gateway domain>, e.g. *.example.com. The record should point to the gateway's hostname shown in dstack. Use type A if the hostname is an IP address (most cases), or type CNAME if the hostname is another domain (some private gateways and Kubernetes).

Manage gateways

List gateways

The dstack gateway list command lists existing gateways and their status.

Delete a gateway

To delete a gateway, pass the gateway configuration to dstack delete:

$ dstack delete -f examples/inference/gateway.dstack.yml

Alternatively, you can delete a gateway by passing the gateway name to dstack gateway delete.

What's next?

  1. See services to learn how to run services