NVIDIA NIM

This example shows how to deploy Llama 3.1 using NVIDIA NIM and dstack.

Prerequisites

Once dstack is installed, go ahead and clone the repo, and run dstack init.

$ git clone https://github.com/dstackai/dstack
$ cd dstack
$ dstack init

Deployment

Here's an example of a service that deploys Llama 3.1 8B using NVIDIA NIM.

type: service
name: llama31

image: nvcr.io/nim/meta/llama-3.1-8b-instruct:latest
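# NGC_API_KEY is read from your environment; NIM_MAX_MODEL_LEN caps the context window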
env:
  - NGC_API_KEY
  - NIM_MAX_MODEL_LEN=4096
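# Credentials for nvcr.io; $oauthtoken is NGC's literal username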
registry_auth:
  username: $oauthtoken
  password: ${{ env.NGC_API_KEY }}
port: 8000
# Register the model
model: meta/llama-3.1-8b-instruct

# Uncomment to leverage spot instances
#spot_policy: auto

# Cache downloaded models
volumes:
  - /root/.cache/nim:/opt/nim/.cache

resources:
  gpu: 24GB
  # Uncomment if using multiple GPUs
  #shm_size: 24GB

Running a configuration

To run a configuration, use the dstack apply command.

$ export NGC_API_KEY=...
$ dstack apply -f examples/deployment/nim/.dstack.yml

 #  BACKEND  REGION             RESOURCES                 SPOT  PRICE       
 1  gcp      asia-northeast3    4xCPU, 16GB, 1xL4 (24GB)  yes   $0.17   
 2  gcp      asia-east1         4xCPU, 16GB, 1xL4 (24GB)  yes   $0.21   
 3  gcp      asia-northeast3    8xCPU, 32GB, 1xL4 (24GB)  yes   $0.21 

Submit the run llama31? [y/n]: y

Provisioning...

If no gateway is configured, the model will be available via the OpenAI-compatible endpoint at <dstack server URL>/proxy/models/<project name>/.

$ curl http://127.0.0.1:3000/proxy/models/main/chat/completions \
    -X POST \
    -H 'Authorization: Bearer <dstack token>' \
    -H 'Content-Type: application/json' \
    -d '{
      "model": "meta/llama3-8b-instruct",
      "messages": [
        {
          "role": "system",
          "content": "You are a helpful assistant."
        },
        {
          "role": "user",
          "content": "What is Deep Learning?"
        }
      ],
      "max_tokens": 128
    }'
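
Because the endpoint is OpenAI-compatible, you can also query it programmatically. Below is a minimal sketch using the openai Python package (an assumption; any OpenAI-compatible client works). The base URL, project name (main), and token placeholder mirror the curl example above.

from openai import OpenAI

# Point the client at dstack's OpenAI-compatible model proxy.
# The base URL and project name ("main") mirror the curl example above;
# replace <dstack token> with your dstack user token.
client = OpenAI(
    base_url="http://127.0.0.1:3000/proxy/models/main",
    api_key="<dstack token>",
)

completion = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is Deep Learning?"},
    ],
    max_tokens=128,
)
print(completion.choices[0].message.content)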

When a gateway is configured, the OpenAI-compatible endpoint is available at https://gateway.<gateway domain>/.
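
In that case only the base URL changes. A short sketch, assuming a hypothetical gateway domain example.com:

from openai import OpenAI

# With a gateway, requests go to the gateway domain instead of the
# dstack server; "example.com" is a hypothetical placeholder.
client = OpenAI(
    base_url="https://gateway.example.com",
    api_key="<dstack token>",
)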

Source code

The source code for this example can be found in examples/deployment/nim.

Limitations

NIM doesn't yet work with the runpod and vastai backends. Track the issue for progress.

What's next?

  1. Check services
  2. Browse the Llama 3.1, TGI, and vLLM examples
  3. See also AMD and TPU