SGLang
This example shows how to deploy DeepSeek-R1-Distill-Llama 8B and 70B using SGLang and dstack.
Apply a configuration
Here are example service configurations that deploy DeepSeek-R1-Distill-Llama 8B (on NVIDIA) and 70B (on AMD) using SGLang.
type: service
name: deepseek-r1-nvidia
image: lmsysorg/sglang:latest
env:
  - MODEL_ID=deepseek-ai/DeepSeek-R1-Distill-Llama-8B
commands:
  - python3 -m sglang.launch_server
      --model-path $MODEL_ID
      --port 8000
      --trust-remote-code
port: 8000
model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
resources:
  gpu: 24GB

type: service
name: deepseek-r1-amd
image: lmsysorg/sglang:v0.4.1.post4-rocm620
env:
  - MODEL_ID=deepseek-ai/DeepSeek-R1-Distill-Llama-70B
commands:
  - python3 -m sglang.launch_server
      --model-path $MODEL_ID
      --port 8000
      --trust-remote-code
port: 8000
model: deepseek-ai/DeepSeek-R1-Distill-Llama-70B
resources:
  gpu: MI300x
  disk: 300GB
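As a rough sanity check on the resources requested above, the memory needed for the model weights alone can be estimated from the parameter count. The sketch below assumes 16-bit weights (2 bytes per parameter) and the nominal 8B and 70B parameter counts. It ignores the KV cache, activations, and runtime overhead, so treat the numbers as a lower bound rather than a sizing guarantee. The helper name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes).

    Assumption: 16-bit weights, i.e. 2 bytes per parameter by default.
    """
    return params_billions * bytes_per_param


# (label, nominal parameters in billions, GPU memory in GB from the configs/offer above)
deployments = [
    ("DeepSeek-R1-Distill-Llama-8B", 8, 24),     # gpu: 24GB
    ("DeepSeek-R1-Distill-Llama-70B", 70, 192),  # 1xMI300X (192GB)
]

for name, params, gpu_gb in deployments:
    need = weight_memory_gb(params)
    headroom = gpu_gb - need
    print(f"{name}: ~{need:.0f} GB of weights, ~{headroom:.0f} GB left for everything else")
```

Under these assumptions, the 8B weights take about 16 GB of the 24 GB requested, and the 70B weights take about 140 GB of the MI300X's 192 GB.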
To run a configuration, use the dstack apply command.
$ dstack apply -f examples/llms/deepseek/sglang/amd/.dstack.yml
 #  BACKEND  REGION   RESOURCES                        SPOT  PRICE
 1  runpod   EU-RO-1  24xCPU, 283GB, 1xMI300X (192GB)  no    $2.49
Submit the run deepseek-r1-amd? [y/n]: y
Provisioning...
---> 100%
Once the service is up, the model will be available via the OpenAI-compatible endpoint
at <dstack server URL>/proxy/models/<project name>/.
curl http://127.0.0.1:3000/proxy/models/main/chat/completions \
    -X POST \
    -H 'Authorization: Bearer <dstack token>' \
    -H 'Content-Type: application/json' \
    -d '{
      "model": "deepseek-ai/DeepSeek-R1-Distill-Llama-70B",
      "messages": [
        {
          "role": "system",
          "content": "You are a helpful assistant."
        },
        {
          "role": "user",
          "content": "What is Deep Learning?"
        }
      ],
      "stream": true,
      "max_tokens": 512
    }'
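The same request can be sent from Python. The following is a minimal sketch using only the standard library. It assumes the server URL, project name (`main`), and token placeholder from the curl call, and it assumes the streamed response follows the usual OpenAI-style format of `data: {...}` lines terminated by `data: [DONE]`. The function names `build_request`, `extract_delta`, and `stream_chat` are hypothetical helpers, not part of dstack or SGLang.

```python
import json
import urllib.request

# Assumptions: these mirror the curl example; replace them with your own values.
BASE_URL = "http://127.0.0.1:3000/proxy/models/main"
TOKEN = "<dstack token>"
MODEL = "deepseek-ai/DeepSeek-R1-Distill-Llama-70B"


def build_request(prompt: str, model: str = MODEL) -> urllib.request.Request:
    """Build the same POST request that the curl command sends."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
        "stream": True,
        "max_tokens": 512,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_delta(line: str) -> str:
    """Return the text fragment carried by one streamed line, or "" if none."""
    line = line.strip()
    if not line.startswith("data:"):
        return ""  # blank keep-alive lines and anything that is not a data line
    data = line[len("data:"):].strip()
    if data == "[DONE]":
        return ""  # end-of-stream marker
    choices = json.loads(data).get("choices") or []
    if not choices:
        return ""
    return choices[0].get("delta", {}).get("content") or ""


def stream_chat(prompt: str) -> str:
    """Send the request, print fragments as they arrive, and return the full text."""
    parts = []
    with urllib.request.urlopen(build_request(prompt)) as resp:
        for raw in resp:  # the response body is iterated line by line
            piece = extract_delta(raw.decode("utf-8"))
            print(piece, end="", flush=True)
            parts.append(piece)
    return "".join(parts)
```

With the service running, `stream_chat("What is Deep Learning?")` prints the reply incrementally. Nothing is sent until that function is called, so the helpers can be imported and checked offline.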
SGLang Model Gateway
If you'd like to use a custom routing policy, e.g. by leveraging the SGLang Model Gateway, create a gateway with router set to sglang. Check out gateways for more details.
If a gateway is configured (e.g. to enable auto-scaling, HTTPS, or rate limits), the OpenAI-compatible endpoint is available at
https://gateway.<gateway domain>/.
Source code
The source code of this example can be found in
examples/llms/deepseek/sglang.
What's next?
- Read about services and gateways
- Browse SGLang DeepSeek Usage and Supercharge DeepSeek-R1 Inference on AMD Instinct MI300X