Orchestrate GPU workloads across clouds
dstack is an open-source framework for orchestrating GPU workloads across various cloud providers.
It offers a simple, cloud-agnostic interface for the training, fine-tuning, inference, and development of generative AI models.
Use multiple cloud GPU providers
Training
Pre-train or fine-tune LLMs or other state-of-the-art generative AI models across multiple cloud GPU providers, ensuring data privacy, GPU availability, and cost-efficiency.
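For example, a fine-tuning run can be described as a dstack task in YAML. The sketch below is a minimal, assumption-laden example: train.dstack.yml, requirements.txt, and train.py are hypothetical names, and exact fields may vary between dstack versions.

type: task

# Environment setup (python is a dstack config field; the script names are hypothetical)
python: "3.11"

commands:
  - pip install -r requirements.txt
  - python train.py

Running "dstack run . -f train.dstack.yml" submits the task to the server, which provisions a GPU instance from the cheapest configured cloud and executes the commands there; GPU requirements are set via run options or profiles, depending on the dstack version.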
Inference
Deploy LLMs and other state-of-the-art generative AI models across multiple cloud GPU providers, ensuring data privacy, GPU availability, and cost-efficiency.
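As a sketch, a model can be deployed as a dstack service. The example below assumes vLLM's OpenAI-compatible server and a publicly available Llama 2 checkpoint; service fields (including gateway setup) may differ across dstack versions.

type: service

python: "3.11"

commands:
  - pip install vllm
  # The model name is an example; any Hugging Face model supported by vLLM works
  - python -m vllm.entrypoints.openai.api_server --model NousResearch/Llama-2-7b-chat-hf --port 8000

port: 8000

Once deployed, the service exposes an HTTP endpoint that clients can query.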
Dev environments
Provision development environments across multiple cloud GPU providers, ensuring data privacy, GPU availability, and cost-efficiency.
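A dev environment is described in the same YAML format. The sketch below is minimal; ide: vscode is a dstack config option for attaching your desktop VS Code to the provisioned machine, and other fields may vary between versions.

type: dev-environment

python: "3.11"
ide: vscode

With the file saved as .dstack.yml, running "dstack run ." provisions a cloud instance and prints a link for opening the environment in VS Code.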

Featured examples
RAG with Llama Index and Weaviate
Use Llama Index and Weaviate to enhance the capabilities of LLMs with the context of your data.
Deploying LLMs using Python API
Streamlit application that programmatically deploys a Llama LLM using dstack's Python API.
Fine-tuning Llama 2 using QLoRA
Fine-tune Llama 2 on a custom dataset using the peft, bitsandbytes, and trl libraries.
Deploying LLMs using TGI
Deploy LLMs as services with optimized performance using TGI, an open-source serving framework by Hugging Face.
Deploying Stable Diffusion using FastAPI
Deploy Stable Diffusion XL as a service using FastAPI.
Deploying LLMs using vLLM
Deploy LLMs as services with up to 24x higher throughput (compared to Hugging Face Transformers) using the vLLM library.
Get started in less than a minute
$ pip install "dstack[all]" -U
$ dstack start

The server is available at http://127.0.0.1:3000?token=b934d226-e24a-4eab-eb92b353b10f
Done! Configure clouds, and use the CLI or API to orchestrate GPU workloads.
Get started · Join Discord