
Commit 1539181

Update docs (#390)
* Update training details
* Further clarification

Co-authored-by: arcticfly <41524992+arcticfly@users.noreply.github.com>
1 parent c81a5a4 commit 1539181

File tree

1 file changed (+10 −5 lines)

docs/tutorials/open-deep-research.mdx

Lines changed: 10 additions & 5 deletions
@@ -78,11 +78,16 @@ This is the main GRPO training loop where the model learns to optimize its resea
 
 The first training run will:
 
-- Register the model with ART.
-- Download the model checkpoint.
-- Start vLLM and the training service on your GPU.
-- Train the model for a specified number of steps.
-- Upload the final model checkpoint (if configured).
+- **Spin up a cluster with 1 or more H200 GPUs.**
+  - This usually takes about 10 minutes, but RunPod occasionally has network throughput issues that can cause the cluster to take up to 30 minutes to spin up.
+- **Register the model with ART.**
+  - This usually takes less than 5 minutes, though it can require up to 30 minutes if RunPod experiences network issues.
+- **Download the model checkpoint.**
+  - This usually takes a few minutes, depending on the model size.
+- **Train the model for a specified number of steps.**
+  - Each RL step involves running the research agent on a subset of benchmark questions and updating the model based on the rewards. We hold out another randomly selected subset of 10 questions (10% of the total benchmark) that are never used in training; we run evaluations on this held-out set every 10 steps to make sure the model is still making progress. Training time depends on the number of steps and the complexity of each research task.
+- **Upload the final model checkpoint.**
+  - This usually takes a few minutes.
 
 ### Step 5: Generate the benchmarks

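The training loop described in the added lines above can be sketched in generic Python. This is a minimal illustration of the held-out evaluation split and the every-10-steps eval cadence; the function names and parameters are assumptions for illustration, not ART's actual API.

```python
import random

# Hypothetical helper (illustrative, not part of ART): hold out a fixed
# number of benchmark questions that are never used for training.
def split_benchmark(questions, holdout_size=10, seed=42):
    """Return (train_set, eval_set) with a random held-out eval split."""
    rng = random.Random(seed)
    shuffled = list(questions)
    rng.shuffle(shuffled)
    return shuffled[holdout_size:], shuffled[:holdout_size]

# Assume a 100-question benchmark, so 10 held-out questions is 10%.
questions = [f"q{i}" for i in range(100)]
train_set, eval_set = split_benchmark(questions)

# Shape of the RL loop: each step runs the agent on a training subset;
# every 10 steps, evaluate on the held-out set to check progress.
for step in range(1, 31):
    batch = random.sample(train_set, k=8)  # subset of benchmark questions
    # ... run the research agent on `batch`, score rewards, update model ...
    if step % 10 == 0:
        pass  # ... run evaluations on `eval_set` ...
```

The fixed seed keeps the held-out split stable across runs, so evaluation scores at different steps remain comparable.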