Reproducible ML workflows

Build complex ML workflows. Run workflows on your servers or on spot instances in your cloud.

Create reproducible ML workflows

Complex ML workflows consist of multiple steps. Define these steps in a simple and reproducible format.


The challenges of teams doing ML

  • How do I use my infrastructure efficiently?

    Your team has access to expensive infrastructure. How do you make sure it's efficiently and equally used by all team members?

    With dstack, you can add your servers to a pool and share quotas between teams or team members.

  • How can I use my own cloud for compute and storage?

    It doesn't matter whether you're a startup or a mature company. Your team very likely already has a cloud account and would like to manage compute costs there.

  • How do I use on-demand infrastructure efficiently?

    Even if your team has access to expensive infrastructure, what if today you need to do a lot of ML quickly, and tomorrow you don't need it at all?

    Define your limits, and dstack will create spot instances on the fly when you need them.

  • How do I reduce complexity?

    Complex ML pipelines consist of many steps: acquiring data, pre-processing it, training or acquiring a base model, finetuning, validating, etc.

    With dstack, you can define individual workflows and pipe them together into complex pipelines, all through a simple YAML format. No changes to an existing project are required.

How it works

  • 1

    Set up your own runners

    If you have your own servers, log into them via SSH and install the dstack-runner daemon via a simple bash command.

  • 2

    Configure your own cloud

    If you don't have your own servers, authorize dstack to manage spot instances in your own cloud, and define the limits it is allowed to use.

  • 3

    Define workflows

    Create .dstack/workflows.yaml and .dstack/variables.yaml files within your project repository. 

    For every workflow, define variables, specify a Docker image, commands, dependencies, and output artifacts.

  • 4

    Run workflows

    Run and manage workflows via the dstack CLI. When you run a workflow, the dstack server creates jobs and assigns them to available runners (e.g. your own servers or spot instances in your own cloud).

    Use the same dstack CLI to browse execution logs and output artifacts in real time.

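To make the dependency idea in the steps above concrete, here is a minimal Python sketch of how workflows with dependencies could be ordered before they run. It is not dstack's implementation and does not reproduce the real workflows.yaml schema: the Workflow class, its field names (image, commands, depends_on, artifacts), the run_order helper, and the example script names are all assumptions made for illustration. It needs Python 3.9 or later for the standard graphlib module.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Workflow:
    # Hypothetical model of one workflow entry; field names are illustrative
    # and are not taken from dstack's actual YAML schema.
    name: str
    image: str
    commands: list[str]
    depends_on: list[str] = field(default_factory=list)
    artifacts: list[str] = field(default_factory=list)


def run_order(workflows: list[Workflow]) -> list[str]:
    """Return workflow names so that every dependency precedes its dependents."""
    known = {w.name for w in workflows}
    graph = {}
    for w in workflows:
        missing = [d for d in w.depends_on if d not in known]
        if missing:
            raise ValueError(f"{w.name} depends on unknown workflows: {missing}")
        # Map each workflow to the set of workflows that must finish first.
        graph[w.name] = set(w.depends_on)
    # static_order() raises graphlib.CycleError on circular dependencies.
    return list(TopologicalSorter(graph).static_order())


# Example pipeline, deliberately listed out of order.
pipeline = [
    Workflow("finetune", "python:3.9", ["python finetune.py"], ["train"], ["model"]),
    Workflow("prepare", "python:3.9", ["python prepare.py"], [], ["data"]),
    Workflow("train", "python:3.9", ["python train.py"], ["prepare"], ["checkpoint"]),
]
order = run_order(pipeline)
print(order)  # ['prepare', 'train', 'finetune']
```

Because each workflow only names the workflows it depends on, the run order falls out of the declarations, which is the property that lets individual workflows be piped into larger pipelines.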

What you can use dstack for

Extract and prepare data

Need to run a heavy job that scrapes the web or pre-processes large datasets? Define a simple workflow and run it with parameters on any infrastructure. Re-use its artifacts in other workflows.

Train from scratch

Need to train a model from scratch? Define a simple workflow. Add a dependency on other workflows (e.g. ones that prepare data). Run the workflow interactively from the CLI.

Finetune pre-trained models

Want to finetune a model? Define a workflow and add a dependency on another workflow that downloads or trains the base model. Run it interactively via the CLI.

Benchmark and validate

Automate benchmarking as the final step and invoke it any time you train or finetune a model. Trace the results of each model back to every step (from data preparation to finetuning).


Sign up for early access to dstack and share your feedback 🙌
