
AMD

Since 0.18.11, you can specify an AMD GPU under resources. Below are a few examples.

AMD accelerators are currently supported only with the runpod backend.

Deployment

Running as a service

Here's an example of a service that deploys Llama 3.1 70B in FP16 using TGI.

type: service
name: amd-service-tgi

image: ghcr.io/huggingface/text-generation-inference:sha-a379d55-rocm
env:
  - HUGGING_FACE_HUB_TOKEN
  - MODEL_ID=meta-llama/Meta-Llama-3.1-70B-Instruct
  - TRUST_REMOTE_CODE=true
  - ROCM_USE_FLASH_ATTN_V2_TRITON=true
commands:
  - text-generation-launcher --port 8000
port: 8000

resources:
  gpu: MI300X
  disk: 150GB

spot_policy: auto

model:
  type: chat
  name: meta-llama/Meta-Llama-3.1-70B-Instruct
  format: openai

Docker image

Please note that to use AMD, specifying image is currently required, and the image must include the ROCm drivers.

To request multiple GPUs, specify the quantity after the GPU name, separated by a colon, e.g., MI300X:4.
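
For example, a resources block requesting four MI300X GPUs could look like this (a minimal sketch; the disk size is illustrative):

resources:
  gpu: MI300X:4
  disk: 150GB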

AMD accelerators can also be used with other frameworks like vLLM, Ollama, etc., and we'll be adding more examples soon.

Running a configuration

Once the configuration is ready, run dstack apply -f <configuration file>, and dstack will automatically provision the cloud resources and run the configuration.

Fleets

By default, dstack apply reuses idle instances from one of the existing fleets. If no idle instances meet the requirements, it creates a new fleet using one of the configured backends.

Use fleet configurations to create fleets manually. This reduces startup time for dev environments, tasks, and services, and is convenient if you want to reuse fleets across runs.
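
As a rough sketch, a fleet configuration reserving a single AMD instance could look like the following (the fleet name and node count are placeholders):

type: fleet
name: amd-fleet
nodes: 1

resources:
  gpu: MI300X
  disk: 150GB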

Dev environments

Before running a task or service, it's recommended that you first start with a dev environment. Dev environments allow you to run commands interactively.
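
For instance, a dev environment configuration reusing the same ROCm image could look like this sketch (the name and IDE choice are illustrative):

type: dev-environment
name: amd-dev

image: ghcr.io/huggingface/text-generation-inference:sha-a379d55-rocm
ide: vscode

resources:
  gpu: MI300X
  disk: 150GB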

Source code

The source code of this example can be found in examples/deployment/tgi/amd.

What's next?

  1. Check dev environments, tasks, and services.