Ollama

This example demonstrates how to use Ollama with dstack's services to deploy LLMs.

Define the configuration

To deploy an LLM as a service using Ollama, define the following configuration file:

type: service

image: ollama/ollama
commands:
  - ollama serve &
  - sleep 3
  - ollama pull mixtral
  - fg
port: 11434

resources:
  gpu: 48GB..80GB

# (Optional) Enable the OpenAI-compatible endpoint
model:
  type: chat
  name: mixtral
  format: openai

Run the configuration

Gateway

Before running a service, ensure that you have configured a gateway. If you're using dstack Sky, the default gateway is configured automatically for you.

$ dstack run . -f deployment/ollama/serve.dstack.yml

Access the endpoint

Once the service is up, you can query it at https://<run name>.<gateway domain> (using the domain set up for the gateway).

Authorization

By default, the service endpoint requires the Authorization header set to "Bearer <dstack token>".
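As a sketch, the request above can be issued from Python against Ollama's native /api/generate endpoint; the run name, gateway domain, and token below are placeholders you substitute with your own values:

```python
# Sketch: querying the deployed service through the gateway with the
# required Authorization header. "my-run", "example.com", and the token
# are placeholders, not real values.
import json
import urllib.request


def build_request(run_name: str, gateway_domain: str, token: str, prompt: str):
    """Build an authorized request to the service's Ollama /api/generate endpoint."""
    url = f"https://{run_name}.{gateway_domain}/api/generate"
    payload = json.dumps({"model": "mixtral", "prompt": prompt}).encode()
    return urllib.request.Request(
        url,
        data=payload,
        headers={
            "Authorization": f"Bearer {token}",  # dstack token
            "Content-Type": "application/json",
        },
    )


# To actually send it (requires a running service):
# with urllib.request.urlopen(build_request("my-run", "example.com", "<dstack token>", "Hi")) as resp:
#     print(resp.read().decode())
```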

OpenAI interface

Because we've configured the model mapping, you can also access the model at https://gateway.<gateway domain> via the OpenAI-compatible interface.

from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.<gateway domain>", 
    api_key="<dstack token>",
)

completion = client.chat.completions.create(
    model="mixtral",
    messages=[
        {
            "role": "user",
            "content": "Compose a poem that explains the concept of recursion in programming.",
        }
    ],
    stream=True,
)

for chunk in completion:
    print(chunk.choices[0].delta.content, end="")
print()

Hugging Face Hub token

To use a model with gated access, make sure to configure the HUGGING_FACE_HUB_TOKEN environment variable (pass it with --env in dstack run, or set it via env in the configuration file).
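For example, the token can be forwarded from your local environment via the env section of the service configuration (a sketch extending the configuration shown above):

```yaml
type: service

# Forwards HUGGING_FACE_HUB_TOKEN from the environment where `dstack run` is invoked
env:
  - HUGGING_FACE_HUB_TOKEN

image: ollama/ollama
```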

Source code

The complete, ready-to-run code is available in dstackai/dstack-examples.

What's next?

  1. Check the vLLM and Text Generation Inference examples
  2. Read about services
  3. Browse examples
  4. Join the Discord server