NVIDIA NIM¶
This example shows how to deploy Nemotron-3-Super-120B-A12B using NVIDIA NIM and dstack.
Prerequisites¶
Once dstack is installed, clone the repo with the examples:

```shell
$ git clone https://github.com/dstackai/dstack
$ cd dstack
```
Deployment¶
Here's an example of a service that deploys Nemotron-3-Super-120B-A12B using NIM:

```yaml
type: service
name: nemotron120

image: nvcr.io/nim/nvidia/nemotron-3-super-120b-a12b:1.8.0
env:
  - NGC_API_KEY
registry_auth:
  username: $oauthtoken
  password: ${{ env.NGC_API_KEY }}
port: 8000
model: nvidia/nemotron-3-super-120b-a12b

# Cache downloaded model weights between runs
volumes:
  - instance_path: /root/.cache/nim
    path: /opt/nim/.cache
    optional: true

resources:
  cpu: x86:96..
  memory: 512GB..
  shm_size: 16GB
  disk: 500GB..
  gpu: H100:80GB:8
```
Running a configuration¶
Save the configuration above as nemotron120.dstack.yml, then run it with the dstack apply command:

```shell
$ NGC_API_KEY=...
$ dstack apply -f nemotron120.dstack.yml
```
If no gateway is created, the service endpoint will be available at <dstack server URL>/proxy/services/<project name>/<run name>/.
```shell
$ curl http://127.0.0.1:3000/proxy/services/main/nemotron120/v1/chat/completions \
    -X POST \
    -H 'Authorization: Bearer <dstack token>' \
    -H 'Content-Type: application/json' \
    -d '{
      "model": "nvidia/nemotron-3-super-120b-a12b",
      "messages": [
        {
          "role": "system",
          "content": "You are a helpful assistant."
        },
        {
          "role": "user",
          "content": "What is Deep Learning?"
        }
      ],
      "max_tokens": 128
    }'
```
When a gateway is configured, the service endpoint will be available at https://nemotron120.<gateway domain>/.
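Since the endpoint speaks the OpenAI-compatible chat completions protocol, you can also call it from Python. Below is a minimal sketch using only the standard library; the base URL assumes no gateway is configured, and the `DSTACK_TOKEN` environment variable is a hypothetical place to keep your dstack token:

```python
import json
import os
import urllib.request

# Assumed endpoint for the run above when no gateway is configured
BASE_URL = "http://127.0.0.1:3000/proxy/services/main/nemotron120"
MODEL = "nvidia/nemotron-3-super-120b-a12b"


def build_chat_request(prompt: str, max_tokens: int = 128):
    """Build the URL, headers, and JSON body for a chat completion call."""
    url = f"{BASE_URL}/v1/chat/completions"
    headers = {
        # DSTACK_TOKEN is a hypothetical env var holding your dstack token
        "Authorization": f"Bearer {os.environ.get('DSTACK_TOKEN', '')}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": MODEL,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What is Deep Learning?" if prompt is None else prompt},
        ],
        "max_tokens": max_tokens,
    }).encode()
    return url, headers, body


url, headers, body = build_chat_request("What is Deep Learning?")

# To actually send the request (requires the running service and a valid token):
# req = urllib.request.Request(url, data=body, headers=headers, method="POST")
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

The response follows the standard chat completions schema, so the generated text is at `choices[0].message.content`.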
What's next?¶
- Check services
- Browse the Nemotron-3-Super-120B-A12B model page