Qwen 3.6¶
This example shows how to deploy Qwen/Qwen3.6-27B as a service using SGLang and dstack.
Apply a configuration¶
Save one of the following configurations as qwen36.dstack.yml.
NVIDIA:

```yaml
type: service
name: qwen36

image: lmsysorg/sglang:v0.5.10.post1
commands:
  - |
    sglang serve \
      --model-path Qwen/Qwen3.6-27B \
      --host 0.0.0.0 \
      --port 30000 \
      --tp $DSTACK_GPUS_NUM \
      --reasoning-parser qwen3 \
      --mem-fraction-static 0.8 \
      --context-length 262144
port: 30000
model: Qwen/Qwen3.6-27B

volumes:
  - instance_path: /root/.cache
    path: /root/.cache
    optional: true

resources:
  shm_size: 16GB
  gpu: H100:4
```
AMD:

```yaml
type: service
name: qwen36

image: lmsysorg/sglang:v0.5.10-rocm720-mi30x
commands:
  - |
    sglang serve \
      --model-path Qwen/Qwen3.6-27B \
      --host 0.0.0.0 \
      --port 30000 \
      --tp $DSTACK_GPUS_NUM \
      --reasoning-parser qwen3 \
      --mem-fraction-static 0.8 \
      --context-length 262144
port: 30000
model: Qwen/Qwen3.6-27B

volumes:
  - instance_path: /root/.cache
    path: /root/.cache
    optional: true

resources:
  cpu: 52..
  memory: 896GB..
  shm_size: 16GB
  disk: 450GB..
  gpu: MI300X:4
```
The NVIDIA and AMD configurations above pin specific SGLang images and share the same four-GPU layout: in both, `--tp` is set from the `DSTACK_GPUS_NUM` environment variable, which dstack populates with the number of GPUs allocated to the run.
Apply the configuration with `dstack apply`:

```shell
$ dstack apply -f qwen36.dstack.yml
```
If no gateway is created, the service endpoint will be available at
`<dstack server URL>/proxy/services/<project name>/<run name>/`.
```shell
curl http://127.0.0.1:3000/proxy/services/main/qwen36/v1/chat/completions \
  -X POST \
  -H 'Authorization: Bearer <dstack token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "Qwen/Qwen3.6-27B",
    "messages": [
      {
        "role": "user",
        "content": "A bat and a ball cost $1.10 total. The bat costs $1.00 more than the ball. How much does the ball cost? Answer with just the dollar amount."
      }
    ],
    "max_tokens": 1024
  }'
```
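The same request can be made from Python using only the standard library. The sketch below assumes the default dstack server address and the `main` project and `qwen36` run names used above; `build_payload` and `post_chat` are illustrative helper names, not part of dstack or SGLang:

```python
import json
import urllib.request

# Assumed dstack server address; adjust project/run names to your setup.
BASE_URL = "http://127.0.0.1:3000/proxy/services/main/qwen36"


def build_payload(messages, max_tokens=1024, enable_thinking=None):
    """Assemble an OpenAI-compatible chat completion body.

    enable_thinking=None keeps the model's default (thinking on);
    pass False to forward chat_template_kwargs that disable it.
    """
    payload = {
        "model": "Qwen/Qwen3.6-27B",
        "messages": messages,
        "max_tokens": max_tokens,
    }
    if enable_thinking is not None:
        payload["chat_template_kwargs"] = {"enable_thinking": enable_thinking}
    return payload


def post_chat(token, messages, **kwargs):
    """POST the payload to the service endpoint and return the parsed JSON."""
    req = urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(build_payload(messages, **kwargs)).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

For example, `post_chat("<dstack token>", [{"role": "user", "content": "Hello"}])` issues the same request as the `curl` command above.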
Thinking mode¶
Qwen3.6 uses thinking mode by default. With SGLang, the reasoning stream is returned separately as `reasoning_content`.
To disable thinking, pass `chat_template_kwargs` in the request body.
```shell
curl http://127.0.0.1:3000/proxy/services/main/qwen36/v1/chat/completions \
  -X POST \
  -H 'Authorization: Bearer <dstack token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "Qwen/Qwen3.6-27B",
    "messages": [
      {
        "role": "user",
        "content": "Summarize the benefits of container images in one sentence."
      }
    ],
    "max_tokens": 256,
    "chat_template_kwargs": {
      "enable_thinking": false
    }
  }'
```
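Because the reasoning stream and the final answer arrive in separate fields, client code typically splits them before display. A minimal sketch, assuming the assistant message is a dict with `reasoning_content` and `content` keys as described above (the helper name and the sample messages are illustrative):

```python
def split_reasoning(message):
    """Split an assistant message dict into (reasoning, answer).

    reasoning_content holds the thinking stream when thinking is enabled;
    it may be missing or None when disabled via chat_template_kwargs.
    """
    reasoning = message.get("reasoning_content") or ""
    answer = message.get("content") or ""
    return reasoning, answer


# Example message shapes for both modes (illustrative, not real model output):
thinking_msg = {
    "reasoning_content": "Let x be the ball's price, so the bat is x + 1.00...",
    "content": "$0.05",
}
plain_msg = {"content": "Container images bundle an app with its dependencies."}
```

With the samples above, `split_reasoning(thinking_msg)` returns both parts, while `split_reasoning(plain_msg)` returns an empty reasoning string.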
What's next?¶
- Read the Qwen/Qwen3.6-27B model card
- Read the Qwen 3.6 SGLang cookbook
- Read the Qwen 3.5 & 3.6 vLLM recipe
- Browse the dedicated SGLang and vLLM examples
- Check the AMD example for more AMD deployment and training configurations