Commit e4c6b2b

docs: Document dedicated LocalBackend pipeline support

1 parent: 3067cc1

File tree: 3 files changed (+28 −2)

docs/features/checkpoint-forking.mdx

Lines changed: 2 additions & 2 deletions

```diff
@@ -31,7 +31,7 @@ import art
 from art.local import LocalBackend
 
 async def train():
-    with LocalBackend() as backend:
+    async with LocalBackend() as backend:
         # Create a new model that will fork from an existing checkpoint
         model = art.TrainableModel(
             name="my-model-v2",
@@ -115,7 +115,7 @@ low_lr_model = art.TrainableModel(
 )
 
 async def experiment():
-    with LocalBackend() as backend:
+    async with LocalBackend() as backend:
         # Fork the model from the base model
         await backend._experimental_fork_checkpoint(
             low_lr_model,
```
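The fix in this file swaps a synchronous `with` on `LocalBackend` for `async with`: an async context manager must be entered via `__aenter__`/`__aexit__` inside a coroutine, and a plain `with` would fail if the class defines only the async protocol. A minimal self-contained sketch of the pattern, using a hypothetical `FakeBackend` stand-in rather than the real `LocalBackend`:

```python
import asyncio

class FakeBackend:
    """Hypothetical stand-in for an async context manager like LocalBackend."""

    async def __aenter__(self):
        # Async setup (e.g. starting services) would happen here.
        self.ready = True
        return self

    async def __aexit__(self, exc_type, exc, tb):
        # Async teardown runs even if the body raised.
        self.ready = False
        return False  # do not suppress exceptions

async def train():
    # `async with` awaits __aenter__/__aexit__; a plain `with` would raise
    # because FakeBackend defines no synchronous __enter__/__exit__.
    async with FakeBackend() as backend:
        return backend.ready

print(asyncio.run(train()))  # → True
```

This is why the docs change matters: calling `with LocalBackend()` where the backend only supports the async protocol is a runtime error, not a style preference.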

docs/fundamentals/art-backend.mdx

Lines changed: 24 additions & 0 deletions

````diff
@@ -73,6 +73,30 @@ backend = LocalBackend(
 )
 ```
 
+If you're using `PipelineTrainer`, `LocalBackend` is currently supported only in dedicated mode, where training and inference run on separate GPUs.
+
+```python
+from art import TrainableModel
+from art.dev import InternalModelConfig
+from art.local import LocalBackend
+
+backend = LocalBackend(path="./.art")
+model = TrainableModel(
+    name="pipeline-localbackend",
+    project="my-project",
+    base_model="Qwen/Qwen3-0.6B",
+    _internal_config=InternalModelConfig(
+        trainer_gpu_ids=[0],
+        inference_gpu_ids=[1],
+    ),
+)
+```
+
+Shared `LocalBackend` still pauses inference during training, so ART rejects that configuration for `PipelineTrainer`.
+
+In dedicated mode, a new checkpoint becomes the default inference target only after its LoRA has been reloaded into vLLM. That checkpoint publication flow is backend-specific, so `save_checkpoint` does not have identical semantics across every ART backend. Requests that are already in flight keep using the adapter they started with; the reload only affects subsequent routing to the latest served step.
+
 ## Using a backend
 
 Once initialized, a backend can be used in the same way regardless of whether it runs locally or remotely.
````
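The checkpoint-publication behavior the added docs describe can be modeled in a few lines: new requests are pinned to the currently served step, and routing advances only after the new LoRA is reloaded. This is an illustrative sketch, not ART's implementation; all names here (`AdapterRouter`, `reload_lora`, `start_request`) are hypothetical:

```python
class AdapterRouter:
    """Toy model of dedicated-mode checkpoint publication.
    Hypothetical names; not ART's real routing code."""

    def __init__(self, served_step: int):
        self.served_step = served_step  # latest step loaded into inference

    def start_request(self) -> int:
        # New requests are pinned to the currently served checkpoint step.
        return self.served_step

    def reload_lora(self, new_step: int) -> None:
        # Only after the reload does routing advance to the new checkpoint.
        self.served_step = new_step

router = AdapterRouter(served_step=4)
in_flight = router.start_request()   # request begins against step 4

router.reload_lora(new_step=5)       # checkpoint 5 is published

assert in_flight == 4                # in-flight request keeps its adapter
assert router.start_request() == 5   # new requests route to the latest step
```

The key property the sketch captures is that saving a checkpoint and serving it are separate events: nothing observes step 5 until `reload_lora` has run.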

docs/fundamentals/training-loop.mdx

Lines changed: 2 additions & 0 deletions

```diff
@@ -22,6 +22,8 @@ ART's functionality is divided into a [**client**](/fundamentals/art-client) and
 
 This training loop runs until a specified number of inference and training iterations have completed.
 
+This describes the default shared-resource loop. `PipelineTrainer` can also run with `LocalBackend` in dedicated mode, where training and inference stay on separate GPUs and the latest served step advances only after vLLM reloads the new LoRA.
+
 Training and inference use both the ART **client** and **backend**. Learn more by following the links below!
 
 <div className="cards-container">
```
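The loop this file describes — alternate inference and training until a fixed iteration budget is spent — can be sketched with stub coroutines. All names below (`gather_trajectories`, `train_on`, `training_loop`) are illustrative stand-ins, not ART's API:

```python
import asyncio

async def gather_trajectories(step: int) -> list[str]:
    # Stand-in for the inference/rollout phase; returns fake trajectories.
    return [f"trajectory-{step}-{i}" for i in range(2)]

async def train_on(trajectories: list[str]) -> None:
    # Stand-in for a training pass over the gathered trajectories.
    await asyncio.sleep(0)

async def training_loop(num_iterations: int) -> int:
    # Run inference, then train on the results, for a fixed number of steps.
    for step in range(num_iterations):
        trajectories = await gather_trajectories(step)
        await train_on(trajectories)
    return num_iterations

print(asyncio.run(training_loop(3)))  # → 3
```

In the shared-resource loop the two phases take turns on the same GPUs; in dedicated mode they run on separate GPUs, but the alternation of "gather, then train" per step is the same.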
