Skip to content

Embrace Dev Environments, Leave Manual SSH Behind

Overcoming the limitations of dev environments for ML projects.

SSH is frequently an indispensable tool for ML engineers. In this discussion, we will delve into the drawbacks of using SSH manually and the advantages of embracing dev environments, which can significantly enhance your productivity as an ML engineer.

Why SSH is the frequently used for ML

If we are working with a cloud instance (which is often the case in ML projects, especially in the enterprise setting), we often end up using SSH as the main developer interface to access the instance.

$ ssh username@remotehost

root@79fb138f2996:/# 

Why? It's easy to use with any cloud. It's secure. It allows forwarding ports to your local machine. It allows you to use the developer tools of your choice. You can either use it via the command line, attach your code editor to it, or use it via a Jupyter notebook running on the instance.

Finally, and even more importantly, SSH gives you direct access to the environment, allowing you to debug your code and have the shortest feedback loop possible.

Problems that SSH doesn't solve

While SSH makes remote access easier, it by no means helps you set up the environment or provision cloud resources.

In traditional development, you can create a cloud instance, do the setup once, and use the instance for a long period of time. In ML, you can't spin up an expensive cloud instance for a long period of time. That's why if you want to use SSH, you have to set up the environment and manage the cloud instance yourself.

Ideally, you'd like to define your environment and instance requirements as code and be able to spin them up on demand while keeping all the advantages of SSH.

Limitations of dev environments

The concept of dev environments aims to solve this issue. It allows you to define your dev environment configuration as code and then launch it on demand. But if it sounds so good, why isn't it frequently used for ML?

Here are some of the reasons that come to mind:

  • Most dev environments are managed services and can't compare in terms of flexibility to SSH, which is open-source and works with any cloud vendor.
  • ML has a lot of specifics: you may want to use a particular cloud account (e.g., where your data or compute reside), spot instances, or other ML-specific infrastructure, etc.

While dev environments are growing in popularity, there is still a gap before they become the new standard for ML. At dstack, we aim to change this

How dstack is addressing them

dstack is an open-source tool licensed under the Mozilla Public License 2.0 and works with any cloud vendor. Secondly, dstack focuses entirely on ML challenges. So, how does it work?

$ pip install "dstack[aws,gcp,azure]"
dstack start

The dstack start command runs a lightweight server that manages cloud credentials and orchestrates runs.

NOTE:

By default, it runs dev environments locally. To run dev environments in the cloud, all you need to do is to configure the corresponding project by providing the cloud credentials.

Dev environments can be defined via YAML (under the .dstack/workflows folder).

workflows:
  - name: code-gpu
    provider: code
    python: 3.11
    setup:
      - pip install -r dev-environments/requirements.txt
    resources:
      gpu:
        count: 1
    cache:
       - path: ~/.cache/pip
       - path: ./models

Once defined, you can run them via the dstack run command:

$ dstack run code-gpu

RUN      WORKFLOW  SUBMITTED  STATUS     TAG
shady-1  code-gpu  now        Submitted  

Starting SSH tunnel...

To exit, press Ctrl+C.

Web UI available at http://127.0.0.1:51845/?tkn=4d9cc05958094ed2996b6832f899fda1

For convenience, dstack uses an exact copy of the source code that is locally present in the folder where you use the dstack command.

If you click the URL, it will open the web-based VS Code IDE.

dstack automatically and securely forwards the ports of the dev environment to your local machine so onlu you can access it.

You can use dev environments to explore data, train models, run apps, and do other ML tasks without wasting any time on managing infrastructure or setting up the environment.

On top of that, you get a lot of other features which you can read about in the documentation.

NOTE:

Interested? Give it a try and let us know what you think!

Discuss on Slack