**docs/fundamentals/art-backend.mdx** (24 additions, 0 deletions)
@@ -73,6 +73,30 @@ backend = LocalBackend(
If you're using `PipelineTrainer`, `LocalBackend` is currently supported only in dedicated mode, where training and inference run on separate GPUs.
```python
from art import TrainableModel
from art.dev import InternalModelConfig
from art.local import LocalBackend

backend = LocalBackend(path="./.art")
model = TrainableModel(
    name="pipeline-localbackend",
    project="my-project",
    base_model="Qwen/Qwen3-0.6B",
    _internal_config=InternalModelConfig(
        trainer_gpu_ids=[0],
        inference_gpu_ids=[1],
    ),
)
```
A shared `LocalBackend` still pauses inference during training, so ART rejects that configuration for `PipelineTrainer`.
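The constraint amounts to a disjointness check on the GPU assignments: dedicated mode requires the trainer and inference GPU sets not to overlap. A minimal sketch of such a validation (a hypothetical helper for illustration, not ART's actual implementation):

```python
def validate_dedicated_mode(
    trainer_gpu_ids: list[int], inference_gpu_ids: list[int]
) -> None:
    """Reject configurations where training and inference share a GPU.

    Hypothetical check illustrating the dedicated-mode constraint; ART's
    real validation lives inside the backend and may differ in detail.
    """
    overlap = set(trainer_gpu_ids) & set(inference_gpu_ids)
    if overlap:
        raise ValueError(
            "PipelineTrainer with LocalBackend requires dedicated GPUs; "
            f"GPUs {sorted(overlap)} are assigned to both training and inference."
        )


validate_dedicated_mode([0], [1])  # dedicated mode, as in the config above: OK
```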
In dedicated mode, a new checkpoint becomes the default inference target only after its LoRA has been reloaded into vLLM. That checkpoint publication flow is backend-specific, so `save_checkpoint` does not have identical semantics across every ART backend.
Requests that are already in flight keep using the adapter they started with; the reload only affects subsequent routing to the latest served step.
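This routing behavior can be sketched with a toy model of the publication flow (plain Python, no ART or vLLM APIs; the `AdapterRouter` name and methods are illustrative only):

```python
class AdapterRouter:
    """Toy model of checkpoint publication: new requests route to the
    latest *published* step, while in-flight requests keep the adapter
    they captured when they started."""

    def __init__(self) -> None:
        self.served_step = 0  # latest step whose LoRA has been reloaded

    def start_request(self) -> int:
        # A request captures the currently served adapter at start time.
        return self.served_step

    def publish(self, step: int) -> None:
        # Called only after the LoRA for `step` is reloaded into the
        # inference server; requests already running are unaffected.
        self.served_step = step


router = AdapterRouter()
in_flight = router.start_request()    # pinned to step 0
router.publish(1)                     # checkpoint 1 reloaded, then published
new_request = router.start_request()  # routes to step 1
```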
## Using a backend
Once initialized, a backend can be used in the same way regardless of whether it runs locally or remotely.
**docs/fundamentals/training-loop.mdx** (2 additions, 0 deletions)
@@ -22,6 +22,8 @@ ART's functionality is divided into a [**client**](/fundamentals/art-client) and
This training loop runs until a specified number of inference and training iterations have completed.
This describes the default shared-resource loop. `PipelineTrainer` can also run with `LocalBackend` in dedicated mode, where training and inference stay on separate GPUs and the latest served step advances only after vLLM reloads the new LoRA.
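The dedicated-mode loop described above reduces to gather, train, reload, advance. A schematic sketch of that ordering (plain Python with invented event strings, not ART's API):

```python
def run_dedicated_loop(num_iterations: int) -> list[str]:
    """Schematic dedicated-mode loop: inference serves trajectories from
    the currently published step while training runs on separate GPUs;
    the served step advances only after the new LoRA is reloaded."""
    events = []
    served_step = 0
    for step in range(1, num_iterations + 1):
        events.append(f"gather trajectories with step {served_step}")
        events.append(f"train step {step} on trainer GPUs")
        events.append(f"reload LoRA for step {step} into vLLM")
        served_step = step  # publication happens only after the reload
    return events


log = run_dedicated_loop(2)
```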
Training and inference use both the ART **client** and **backend**. Learn more by following the links below!