SGLang¶
This example shows how to deploy DeepSeek-R1-Distill-Llama 8B and 70B using SGLang and dstack
.
Prerequisites
Once dstack
is installed, go ahead clone the repo, and run dstack init
.
$ git clone https://github.com/dstackai/dstack
$ cd dstack
$ dstack init
Deployment¶
Here's an example of a service that deploys DeepSeek-R1-Distill-Llama 8B and 70B using SgLang.
type: service
name: deepseek-r1-amd
image: lmsysorg/sglang:v0.4.1.post4-rocm620
env:
- MODEL_ID=deepseek-ai/DeepSeek-R1-Distill-Llama-70B
commands:
- python3 -m sglang.launch_server
--model-path $MODEL_ID
--port 8000
--trust-remote-code
port: 8000
model: deepseek-ai/DeepSeek-R1-Distill-Llama-70B
resources:
gpu: MI300x
disk: 300GB
type: service
name: deepseek-r1-nvidia
image: lmsysorg/sglang:latest
env:
- MODEL_ID=deepseek-ai/DeepSeek-R1-Distill-Llama-8B
commands:
- python3 -m sglang.launch_server
--model-path $MODEL_ID
--port 8000
--trust-remote-code
port: 8000
model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
resources:
gpu: 24GB
Applying the configuration¶
To run a configuration, use the dstack apply
command.
$ dstack apply -f examples/llms/deepseek/sglang/amd/.dstack.yml
# BACKEND REGION RESOURCES SPOT PRICE
1 runpod EU-RO-1 24xCPU, 283GB, 1xMI300X (192GB) no $2.49
Submit the run deepseek-r1-amd? [y/n]: y
Provisioning...
---> 100%
Once the service is up, the model will be available via the OpenAI-compatible endpoint
at <dstack server URL>/proxy/models/<project name>/
.
curl http://127.0.0.1:3000/proxy/models/main/chat/completions \
-X POST \
-H 'Authorization: Bearer <dstack token>' \
-H 'Content-Type: application/json' \
-d '{
"model": "deepseek-ai/DeepSeek-R1-Distill-Llama-70B",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "What is Deep Learning?"
}
],
"stream": true,
"max_tokens": 512
}'
When a gateway is configured, the OpenAI-compatible endpoint
is available at https://gateway.<gateway domain>/
.
Source code¶
The source-code of this example can be found in
examples/llms/deepseek/sglang
.
What's next?¶
- Check services
- Browse the SgLang DeepSeek Usage, Supercharge DeepSeek-R1 Inference on AMD Instinct MI300X