Model inference with Prefill-Decode disaggregation
dstack started as a GPU-native orchestrator for development and training, but over the last year it has increasingly brought inference to the forefront, making serving a first-class citizen.

At the end of last year, we introduced SGLang router integration, which brought cache-aware routing to services. Today, building on that integration, we’re adding native Prefill-Decode (PD) disaggregation.







