

dstack is an open-source framework for orchestrating GPU workloads across multiple cloud GPU providers. It provides a simple cloud-agnostic interface for development and deployment of generative AI models.


To use dstack, install it with pip and start the server.

$ pip install "dstack[all]" -U
$ dstack start

The address the server is available at is printed in the output.

Configure clouds

Upon startup, the server sets up the default project called main. Prior to using dstack, make sure to configure clouds.

Once the server is up, you can orchestrate LLM workloads using either the CLI or the Python API.

Using CLI

Initialize the repo

To use dstack for your project, make sure to first run the dstack init command in the root folder of the project.

$ mkdir quickstart && cd quickstart
$ dstack init

Define a configuration

The CLI allows you to define what you want to run as a YAML file and run it via the dstack run command.

Configurations can be of three types: dev-environment, task, and service.

Dev environments

A dev environment is a virtual machine pre-configured with an IDE.

type: dev-environment

python: "3.11" # (Optional) If not specified, your local version is used

setup: # (Optional) Executed once at the first startup
  - pip install -r requirements.txt

ide: vscode

Once it's live, you can open it in your local VS Code by clicking the provided URL in the output.


Tasks

A task can be any script that you may want to run on demand: a batch job or a web application.

type: task

python: "3.11" # (Optional) If not specified, your local version is used

ports:
  - 7860

commands:
  - pip install -r requirements.txt
  - python

While the task is running in the cloud, the CLI forwards traffic from its ports to localhost for convenient access. With the configuration above, for example, the application becomes reachable at localhost:7860.


Services

A service is an application that is accessible through a public endpoint.

type: service

python: "3.11" # (Optional) If not specified, your local version is used

port: 7860

commands:
  - pip install -r requirements.txt
  - python

Once the service is up, dstack makes it accessible from the Internet through the gateway.

For more details on the file syntax, refer to .dstack.yml.

Run the configuration

Default configurations

To run a configuration, call the dstack run command and pass the path to the directory that you want to use as the working directory for the run.

$ dstack run . 

 RUN          CONFIGURATION  BACKEND  RESOURCES        SPOT  PRICE
 fast-moth-1  .dstack.yml    aws      5xCPUs, 15987MB  yes   $0.0547

Provisioning and starting SSH tunnel...
---> 100%

To open in VS Code Desktop, use this link:

If you don't specify a configuration file, dstack uses the default configuration defined in the given directory (named .dstack.yml).
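For instance, a minimal default configuration, saved as .dstack.yml in the working directory, could look like this (a sketch reusing only the options shown earlier; all other properties are optional):

```yaml
type: dev-environment
ide: vscode
```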

Non-default configurations

If you want to run a non-default configuration, you have to specify the path to the configuration using the -f argument:

$ dstack run . -f serve.dstack.yml

 RUN             CONFIGURATION     BACKEND  RESOURCES        SPOT  PRICE
 old-lionfish-1  serve.dstack.yml  aws      5xCPUs, 15987MB  yes   $0.0547

Provisioning and starting SSH tunnel...
---> 100%

Launching in *reload mode* on: (Press CTRL+C to quit)

For more details on the run command, refer to dstack run.

Requesting resources

You can request resources using the --gpu and --memory arguments with dstack run, or through resources with .dstack/profiles.yml.

Both the dstack run command and .dstack/profiles.yml support various other options, including requesting spot instances, defining the maximum run duration or price, and more.
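As a sketch of how these options combine (exact property names may vary by dstack version, so treat this as an illustration and refer to the .dstack/profiles.yml reference), a profile might look like:

```yaml
profiles:
  - name: large
    resources:
      memory: 16GB
      gpu:
        memory: 24GB  # request a GPU with at least 24GB of memory
    spot_policy: auto # use spot instances when available
    max_duration: 1d  # cap the run duration
    max_price: 1.5    # cap the hourly price, in dollars
    default: true
```

Equivalently, resources can be requested ad hoc by passing the --gpu and --memory arguments to dstack run.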

Automatic instance discovery

dstack automatically selects a suitable instance type from the cloud provider and region with the best price and availability.

Using API

As an alternative to the CLI, you can run tasks and services programmatically via the Python API.

import sys

import dstack

task = dstack.Task(
    env={"MODEL_ID": "TheBloke/Llama-2-13B-chat-GPTQ"},
    commands=[
        "text-generation-launcher --trust-remote-code --quantize gptq",
    ],
)
resources = dstack.Resources(gpu=dstack.GPU(memory="20GB"))

if __name__ == "__main__":
    print("Initializing the client...")
    client = dstack.Client.from_config(repo_dir="~/dstack-examples")

    print("Submitting the run...")
    run = client.runs.submit(configuration=task, resources=resources)

    print(f"Run {run.name}: {run.status()}")

    print("Attaching to the run...")
    run.attach()

    # After the endpoint is up, it will return 200 (OK).

    try:
        # Stream the run's logs to the local terminal
        for log in run.logs():
            sys.stdout.buffer.write(log)
            sys.stdout.buffer.flush()
    except KeyboardInterrupt:
        print("Aborting the run...")
        run.stop(abort=True)