Releases

Introducing instance volumes to persist data on instances

Until now, dstack supported data persistence only through network volumes managed by cloud providers. While convenient, sometimes you just want a simple cache on the instance, or need to mount an NFS share to your SSH fleet. To address this, we're introducing instance volumes, which cover both cases.

type: task
name: llama32-task

env:
  - HF_TOKEN
  - MODEL_ID=meta-llama/Llama-3.2-3B-Instruct
commands:
  - pip install vllm
  - vllm serve $MODEL_ID --max-model-len 4096
ports: [8000]

# Instance volume: maps a directory on the instance to a path in the container,
# so downloaded weights and other caches survive across runs on the same instance
volumes:
  - /root/.dstack/cache:/root/.cache

resources:
  gpu: 16GB..  # any GPU with at least 16GB of memory
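
The same mechanism covers the NFS case. If the hosts in your SSH fleet already have a share mounted, an instance volume can expose it inside the container; the paths below are purely illustrative:

type: task
name: nfs-share-example

commands:
  - ls -la /data

volumes:
  # /mnt/nfs is assumed to be an NFS share already mounted on the fleet's hosts
  - /mnt/nfs:/data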

Using TPUs for fine-tuning and deploying LLMs

If you’re using or planning to use TPUs with Google Cloud, you can now do so via dstack. Just specify the TPU version and the number of cores (separated by a dash) in the gpu property under resources.
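
For example, a configuration requesting a TPU could include a resources section like the one below; the specific accelerator name (v5litepod-8, i.e. version plus core count) is only an illustration:

resources:
  gpu: v5litepod-8  # TPU version and number of cores, separated by a dash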

Read below to find out how to use TPUs with dstack for fine-tuning and deploying LLMs, leveraging open-source tools like Hugging Face’s Optimum TPU and vLLM.

Supporting AMD accelerators on RunPod

While dstack helps streamline the orchestration of containers for AI, its primary goal is to offer vendor independence and portability, ensuring compatibility across different hardware and cloud providers.

Inspired by the recent MI300X benchmarks, we are pleased to announce that RunPod is the first cloud provider to offer AMD GPUs through dstack, with support for other cloud providers and on-prem servers to follow.
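
As a sketch of what this looks like in a run configuration, an MI300X on RunPod could be requested via the gpu property; the exact value below is illustrative:

resources:
  gpu: MI300X  # AMD accelerator, currently available via the RunPod backend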

Using volumes to optimize cold starts on RunPod

Deploying custom models in the cloud often runs into the challenge of cold starts: the time to provision a new instance plus the time to download the model. This is especially relevant for services with autoscaling, where new model replicas need to be provisioned quickly.

Let's explore how dstack optimizes this process using volumes, with an example of deploying a model on RunPod.
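
As a rough sketch of that setup (the volume name, region, size, and paths below are illustrative), you first create a network volume in the RunPod backend:

type: volume
name: llama32-volume
backend: runpod
region: EU-RO-1
size: 100GB

The service then mounts the volume and points the Hugging Face cache at it, so the model only needs to be downloaded on the first start:

type: service
name: llama32-service

env:
  - HF_TOKEN
  - MODEL_ID=meta-llama/Llama-3.2-3B-Instruct
  # Keep downloaded weights on the volume (illustrative path)
  - HF_HOME=/volume_data/hf
commands:
  - pip install vllm
  - vllm serve $MODEL_ID --max-model-len 4096
port: 8000

volumes:
  - name: llama32-volume
    path: /volume_data

resources:
  gpu: 24GB..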