Complex ML workflows consist of multiple steps. Define these steps in a simple and reproducible format.
Your team has access to expensive infrastructure. How do you make sure it's efficiently and equally used by all team members?
With dstack, you can add your servers to a pool and share quotas between teams or team members.
Whether you're a startup or a mature company, your team very likely already has a cloud account and wants to manage compute costs there.
Even if your team has access to expensive infrastructure, what if today you need to run a lot of ML workloads quickly, and tomorrow you don't need any at all?
Define your limits, and dstack will create spot instances on the fly when you need them.
Complex ML pipelines consist of many steps: acquiring data, pre-processing it, training or acquiring a base model, finetuning, validating, etc.
With dstack, you can define individual workflows and pipe them together into complex pipelines, all through a simple YAML format. No changes to an existing project are required.
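To illustrate what piping workflows together implies, here is a minimal Python sketch that orders workflows so each one runs after the workflows it depends on. The workflow names and the `execution_order` helper are hypothetical and chosen for illustration; this is not dstack's implementation or its YAML format.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow names; each maps to the set of workflows it depends on.
pipeline = {
    "prepare-data": set(),
    "download-base-model": set(),
    "finetune": {"prepare-data", "download-base-model"},
    "validate": {"finetune"},
}


def execution_order(deps):
    """Return workflow names so that every dependency precedes its dependents."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(pipeline)
print(order)
```

The two workflows without dependencies may appear in either order; the only guarantee is that `finetune` comes after both of them and `validate` comes last.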
If you have your own servers, log into them via SSH and install the dstack-runner daemon with a simple bash command.
If you don't have your own servers, authorize dstack to manage spot instances in your own cloud, and set limits on what it is allowed to use.
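As a rough illustration of how a limit might bound on-demand provisioning, the sketch below computes how many spot instances to request given the pending jobs and a configured cap. The function name, its parameters, and the one-job-per-instance assumption are all hypothetical; the actual provisioning logic in dstack may differ.

```python
def spot_instances_to_request(pending_jobs, idle_runners, running_spot, max_spot):
    """Return how many new spot instances to request without exceeding the cap.

    Assumes, for illustration only, that each job needs exactly one runner.
    """
    # Jobs that cannot be served by runners that are already idle.
    shortfall = max(pending_jobs - idle_runners, 0)
    # Instances that may still be created before hitting the configured limit.
    headroom = max(max_spot - running_spot, 0)
    return min(shortfall, headroom)


# Five jobs waiting, one idle runner, two spot instances running, cap of four.
print(spot_instances_to_request(5, 1, 2, 4))
```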
Create .dstack/workflows.yaml and .dstack/variables.yaml files within your project repository.
For every workflow, define variables, specify a Docker image, commands, dependencies, and output artifacts.
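The pieces listed above can be pictured as a small data structure. The following Python sketch models a workflow with those fields and checks that every dependency refers to a defined workflow. The field names, the example image and commands, and the `missing_dependencies` helper are assumptions made for illustration, not the schema of `.dstack/workflows.yaml`.

```python
from dataclasses import dataclass, field


@dataclass
class Workflow:
    # All field names are hypothetical, mirroring the items listed in the text.
    name: str
    image: str
    commands: list[str]
    variables: dict[str, str] = field(default_factory=dict)
    depends_on: list[str] = field(default_factory=list)
    artifacts: list[str] = field(default_factory=list)


def missing_dependencies(workflows):
    """Map each workflow name to the dependencies that are not defined anywhere."""
    defined = {w.name for w in workflows}
    missing = {}
    for w in workflows:
        unknown = [d for d in w.depends_on if d not in defined]
        if unknown:
            missing[w.name] = unknown
    return missing


workflows = [
    Workflow(
        name="prepare",
        image="python:3.9",
        commands=["python prepare.py"],
        artifacts=["data"],
    ),
    Workflow(
        name="train",
        image="python:3.9",
        commands=["python train.py"],
        variables={"epochs": "10"},
        depends_on=["prepare"],
        artifacts=["model"],
    ),
]

print(missing_dependencies(workflows))
```

An empty result means every dependency resolves to a workflow defined in the same list.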
Run and manage workflows via the dstack CLI. When you run a workflow, the dstack server creates jobs and assigns them to available runners (e.g. your own servers or spot instances in your own cloud).
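One simple way to picture job assignment is a first-come, first-served pairing of queued jobs with idle runners. The Python sketch below shows that idea only; the `assign_jobs` function and the job and runner names are hypothetical, and the real dstack server may schedule jobs differently.

```python
from collections import deque


def assign_jobs(jobs, runners):
    """Pair queued jobs with idle runners in order; return assignments and leftovers."""
    queue = deque(jobs)
    idle = deque(runners)
    assignments = {}
    while queue and idle:
        # Take the oldest job and give it to the next idle runner.
        assignments[queue.popleft()] = idle.popleft()
    # Jobs still in the queue wait until a runner becomes available.
    return assignments, list(queue)


assigned, waiting = assign_jobs(
    ["prepare", "train", "benchmark"],
    ["own-server-1", "spot-instance-1"],
)
print(assigned, waiting)
```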
Use the same dstack CLI to browse execution logs and output artifacts in real time.
Need to run a heavy job that scrapes the web or pre-processes large datasets? Define a simple workflow and run it with parameters on any infrastructure. Reuse its artifacts in other workflows.
Need to train a model from scratch? Define a simple workflow. Add dependencies on other workflows (e.g. ones that prepare data). Run the workflow interactively from the CLI.
Want to finetune a model? Define a workflow and add a dependency on another workflow that downloads or trains the base model. Run it interactively via the CLI.
Automate benchmarking as the final step and invoke it any time you train or finetune a model. Trace the results of each model back to every step, from data preparation to finetuning.