Quickstart¶
dstack is an open-source framework for orchestrating GPU workloads across multiple cloud GPU providers. It provides a simple, cloud-agnostic interface for the development and deployment of generative AI models.
Installation¶
To use dstack, install it with pip, and start the server.
$ pip install "dstack[all]" -U
$ dstack start
The server is available at http://127.0.0.1:3000?token=b934d226-e24a-4eab-eb92b353b10f
Upon startup, the server sets up the default project called main.
Configure clouds
Prior to using dstack, make sure to configure clouds.
Once the server is up, you can orchestrate LLM workloads using either the CLI or the Python API.
Using the CLI¶
Initialize the repo¶
To use dstack for your project, make sure to first run the dstack init command in the root folder of the project.
$ mkdir quickstart && cd quickstart
$ dstack init
Define a configuration¶
The CLI allows you to define what you want to run as a YAML file and run it via the dstack run CLI command.
Configurations can be of three types: dev-environment, task, and service.
Dev environments¶
A dev environment is a virtual machine pre-configured with an IDE.
type: dev-environment
python: "3.11" # (Optional) If not specified, your local version is used
setup: # (Optional) Executed once at the first startup
  - pip install -r requirements.txt
ide: vscode
Once it's live, you can open it in your local VS Code by clicking the provided URL in the output.
Tasks¶
A task can be any script that you may want to run on demand: a batch job or a web application.
type: task
python: "3.11" # (Optional) If not specified, your local version is used
ports:
  - 7860
commands:
  - pip install -r requirements.txt
  - python app.py
While the task is running in the cloud, the CLI forwards its port traffic to localhost for convenient access.
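For example, assuming app.py serves HTTP on the configured port 7860 (an illustrative assumption about the script), you can reach the task from a second terminal while the run is attached:

$ curl http://127.0.0.1:7860

The request is forwarded through the SSH tunnel to the cloud instance running the task.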
Services¶
A service is an application that is accessible through a public endpoint.
type: service
python: "3.11" # (Optional) If not specified, your local version is used
port: 7860
commands:
  - pip install -r requirements.txt
  - python app.py
Once the service is up, dstack makes it accessible from the Internet through the gateway.
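The exact endpoint depends on how your gateway is configured; as a hypothetical sketch, if the gateway were attached to the domain example.com, you could verify the service like this:

$ curl https://example.com

Here example.com is a placeholder; substitute the domain configured for your gateway.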
For more details on the file syntax, refer to .dstack.yml.
Run the configuration¶
Default configurations¶
To run a configuration, you have to call the dstack run command and pass the path to the directory that you want to use as the working directory for the run.
$ dstack run .
RUN CONFIGURATION BACKEND RESOURCES SPOT PRICE
fast-moth-1 .dstack.yml aws 5xCPUs, 15987MB yes $0.0547
Provisioning and starting SSH tunnel...
---> 100%
To open in VS Code Desktop, use this link:
vscode://vscode-remote/ssh-remote+fast-moth-1/workflow
If you don't specify a configuration file, dstack uses the default configuration defined in the given directory (named .dstack.yml).
Non-default configurations¶
To run a non-default configuration, specify the path to the configuration file using the -f argument:
$ dstack run . -f serve.dstack.yml
RUN CONFIGURATION BACKEND RESOURCES SPOT PRICE
old-lionfish-1 serve.dstack.yml aws 5xCPUs, 15987MB yes $0.0547
Provisioning and starting SSH tunnel...
---> 100%
Launching in *reload mode* on: http://127.0.0.1:7860 (Press CTRL+C to quit)
For more details on the run command, refer to dstack run.
Requesting resources¶
You can request resources using the --gpu and --memory arguments with dstack run, or through resources in .dstack/profiles.yml.
Both the dstack run command and .dstack/profiles.yml support various other options, including requesting spot instances, defining the maximum run duration or price, and more.
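As a minimal sketch of both approaches (the values 24GB and 64GB are illustrative assumptions, not defaults), a one-off request on the CLI could look like:

$ dstack run . --gpu 24GB --memory 64GB

To make the same request apply to every run, it can go into .dstack/profiles.yml. The sketch below assumes a profiles file with a resources section and a default flag; check the profiles reference for the exact field names:

profiles:
  - name: gpu-large
    resources:
      memory: 64GB   # total RAM (assumed value)
      gpu:
        memory: 24GB # per-GPU memory (assumed value)
    default: true    # picked up by dstack run automatically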
Automatic instance discovery
dstack automatically selects a suitable instance type from a cloud provider and region with the best price and availability.
Using the API¶
As an alternative to the CLI, you can run tasks and services programmatically via the Python API.
import sys

import dstack

task = dstack.Task(
    image="ghcr.io/huggingface/text-generation-inference:latest",
    env={"MODEL_ID": "TheBloke/Llama-2-13B-chat-GPTQ"},
    commands=[
        "text-generation-launcher --trust-remote-code --quantize gptq",
    ],
    ports=["8080:80"],
)
resources = dstack.Resources(gpu=dstack.GPU(memory="20GB"))

if __name__ == "__main__":
    print("Initializing the client...")
    client = dstack.Client.from_config(repo_dir="~/dstack-examples")

    print("Submitting the run...")
    run = client.runs.submit(configuration=task, resources=resources)
    print(f"Run {run.name}: {run.status()}")

    print("Attaching to the run...")
    run.attach()

    # After the endpoint is up, http://127.0.0.1:8080/health will return 200 (OK).
    try:
        for log in run.logs():
            sys.stdout.buffer.write(log)
            sys.stdout.buffer.flush()
    except KeyboardInterrupt:
        print("Aborting the run...")
        run.stop(abort=True)
    finally:
        run.detach()
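Assuming the script above is saved as submit.py (a hypothetical filename) and ~/dstack-examples points to a repo initialized with dstack init, you can launch and stream the run with:

$ python submit.py

Pressing Ctrl+C aborts the run, as handled by the KeyboardInterrupt branch above.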