
Text Embeddings Inference

This example demonstrates how to use Text Embeddings Inference (TEI) with dstack's services to deploy an embeddings model.

Define the configuration

To deploy a text embeddings model as a service using TEI, define the following configuration file:

type: service

env:
  - MODEL_ID=thenlper/gte-base
commands:
  - text-embeddings-router --port 80
port: 80

resources:
  gpu: 16GB

Run the configuration


Before running a service, ensure that you have configured a gateway. If you're using dstack Sky, the default gateway is configured automatically for you.

$ dstack run . -f deployment/tei/serve.dstack.yml

Access the endpoint

Once the service is up, you can query it at https://<run name>.<gateway domain> (using the domain set up for the gateway):


By default, the service endpoint requires the Authorization header with "Bearer <dstack token>".

$ curl https://<run name>.<gateway domain> \
    -X POST \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer <dstack token>' \
    -d '{"inputs":"What is Deep Learning?"}'


Hugging Face Hub token

To use a model with gated access, make sure to configure the HUGGING_FACE_HUB_TOKEN environment variable (either with --env in dstack run or via env in the configuration file).
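For example, assuming the token is available in your shell, it can be passed on the command line (the token value below is a placeholder):

```shell
$ dstack run . -f deployment/tei/serve.dstack.yml \
    --env HUGGING_FACE_HUB_TOKEN=<your Hugging Face token>
```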

Source code

The complete, ready-to-run code is available in dstackai/dstack-examples.

What's next?

  1. Check the Text Generation Inference and vLLM examples
  2. Read about services
  3. Browse all examples
  4. Join the Discord server