Quickstart

dstack is an open-source framework for orchestrating GPU workloads across multiple cloud GPU providers. It provides a simple cloud-agnostic interface for development and deployment of generative AI models.

Installation

To use dstack, install it with pip and start the server.

$ pip install "dstack[all]" -U
$ dstack start

The server is available at http://127.0.0.1:3000?token=b934d226-e24a-4eab-eb92b353b10f

Configure clouds

Upon startup, the server sets up the default project called main. Prior to using dstack, make sure to configure clouds.
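
For example, a cloud backend can be declared in the server's configuration file. Below is a minimal sketch for AWS; the file location and field names are assumptions that may differ between dstack versions, so refer to the clouds documentation for the exact schema.

# ~/.dstack/server/config.yml (location and schema are assumptions)
projects:
  - name: main
    backends:
      - type: aws
        creds:
          type: access_key
          access_key: <your AWS access key ID>
          secret_key: <your AWS secret access key>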

Once the server is up, you can orchestrate LLM workloads using either the CLI or the Python API.

Using CLI

Initialize the repo

To use dstack for your project, make sure to first run the dstack init command in the root folder of the project.

$ mkdir quickstart && cd quickstart
$ dstack init

Define a configuration

The CLI allows you to define what you want to run as a YAML file and run it via the dstack run CLI command.

Configurations can be of three types: dev-environment, task, and service.

Dev environments

A dev environment is a virtual machine pre-configured with an IDE.

type: dev-environment

python: "3.11" # (Optional) If not specified, your local version is used

setup: # (Optional) Executed once at the first startup
  - pip install -r requirements.txt

ide: vscode

Once it's live, you can open it in your local VS Code by clicking the provided URL in the output.

Tasks

A task can be any script that you want to run on demand: a batch job or a web application.

type: task

python: "3.11" # (Optional) If not specified, your local version is used

ports:
  - 7860

commands:
  - pip install -r requirements.txt
  - python app.py

While the task is running in the cloud, the CLI forwards traffic from its ports to localhost for convenient access.
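
For example, with the task configuration above, you can reach the app locally while the run is active (assuming it serves HTTP on port 7860):

$ curl http://127.0.0.1:7860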

Services

A service is an application that is accessible through a public endpoint.

type: service

python: "3.11" # (Optional) If not specified, your local version is used

port: 7860

commands:
  - pip install -r requirements.txt
  - python app.py

Once the service is up, dstack makes it accessible from the Internet through the gateway.
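
The public URL depends on how your gateway is set up. As a sketch, assuming a gateway configured with the hypothetical domain example.com, the service could be queried like this (the exact hostname format is gateway-dependent):

$ curl https://<run-name>.example.com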

For more details on the file syntax, refer to .dstack.yml.

Run the configuration

Default configurations

To run a configuration, call the dstack run command and pass the path to the directory that you want to use as the working directory for the run.

$ dstack run . 

 RUN          CONFIGURATION  BACKEND  RESOURCES        SPOT  PRICE
 fast-moth-1  .dstack.yml    aws      5xCPUs, 15987MB  yes   $0.0547


Provisioning and starting SSH tunnel...
---> 100%

To open in VS Code Desktop, use this link:
  vscode://vscode-remote/ssh-remote+fast-moth-1/workflow

If you don't specify a configuration file, dstack uses the default configuration defined in the given directory (the file named .dstack.yml).

Non-default configurations

If you want to run a non-default configuration, you have to specify the path to the configuration using the -f argument:

$ dstack run . -f serve.dstack.yml

 RUN             CONFIGURATION     BACKEND  RESOURCES        SPOT  PRICE
 old-lionfish-1  serve.dstack.yml  aws      5xCPUs, 15987MB  yes   $0.0547

Provisioning and starting SSH tunnel...
---> 100%

Launching in *reload mode* on: http://127.0.0.1:7860 (Press CTRL+C to quit)

For more details on the run command, refer to dstack run.

Requesting resources

You can request resources using the --gpu and --memory arguments of dstack run, or via resources defined in .dstack/profiles.yml.

Both the dstack run command and .dstack/profiles.yml support various other options, including requesting spot instances, defining the maximum run duration or price, and more.
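
For example, a default profile that requests a GPU might look like this. This is a minimal sketch: the field names below follow a common profiles.yml layout but may vary between dstack versions, so treat them as assumptions.

# .dstack/profiles.yml (field names are assumptions)
profiles:
  - name: gpu-large
    resources:
      memory: 48GB
      gpu:
        memory: 24GB
    spot_policy: auto
    default: true

With default: true set, dstack run . picks up this profile automatically.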

Automatic instance discovery

dstack automatically selects a suitable instance type from a cloud provider and region with the best price and availability.

Using API

As an alternative to the CLI, you can run tasks and services programmatically via the Python API.

import sys

import dstack

task = dstack.Task(
    image="ghcr.io/huggingface/text-generation-inference:latest",
    env={"MODEL_ID": "TheBloke/Llama-2-13B-chat-GPTQ"},
    commands=[
        "text-generation-launcher --trust-remote-code --quantize gptq",
    ],
    ports=["8080:80"],
)
resources = dstack.Resources(gpu=dstack.GPU(memory="20GB"))

if __name__ == "__main__":
    print("Initializing the client...")
    client = dstack.Client.from_config(repo_dir="~/dstack-examples")

    print("Submitting the run...")
    run = client.runs.submit(configuration=task, resources=resources)

    print(f"Run {run.name}: " + run.status())

    print("Attaching to the run...")
    run.attach()

    # After the endpoint is up, http://127.0.0.1:8080/health will return 200 (OK).

    try:
        for log in run.logs():
            sys.stdout.buffer.write(log)
            sys.stdout.buffer.flush()

    except KeyboardInterrupt:
        print("Aborting the run...")
        run.stop(abort=True)
    finally:
        run.detach()
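
Once the run is attached, traffic on port 8080 is forwarded to the service. From another terminal, you can then query the model using text-generation-inference's /generate endpoint (the prompt and parameters below are just illustrative):

$ curl http://127.0.0.1:8080/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"inputs": "What is deep learning?", "parameters": {"max_new_tokens": 20}}'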