Commit 6560dfd

Add fix art issues skill (#585)
* Add fix art issues skill
* update skill
1 parent 4c08d24 commit 6560dfd

2 files changed

Lines changed: 74 additions & 1 deletion

Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
---
name: fix-art-issues
description: >
  Fix a GitHub issue on OpenPipe/ART and open a PR.
  Use when the user asks to fix, solve, or work on an ART issue,
  or references a GitHub issue URL containing "OpenPipe/ART".
  Triggers: "fix ART issue", "solve this issue" with an OpenPipe/ART URL,
  "work on ART #N".
---

# Fix ART Issue

Fix a GitHub issue on `OpenPipe/ART` and open a PR.

- **Repo**: `OpenPipe/ART`
- **Base branch**: `main`

Assumes the workspace is already set up with the correct branch checked out and `.env` in place (handled by the system-level `fix-art-workspace` skill).

## Workflow

### 1. Read the Issue
```
gh issue view <number> --repo OpenPipe/ART --json title,body,labels,assignees,comments
```
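When only the title and labels are needed, the same command can be narrowed with the `--jq` filter that `gh` accepts alongside `--json`. The following is a minimal sketch, not part of the skill: the helper name `issue_summary` and the issue number `123` are made up for illustration.

```shell
# Hypothetical helper (not part of the skill): print an issue's title,
# followed by one label name per line.
issue_summary() {
  number="$1"
  gh issue view "$number" --repo OpenPipe/ART \
    --json title,labels \
    --jq '.title, .labels[].name'
}

# Usage (requires an authenticated gh CLI; 123 is a placeholder):
#   issue_summary 123
```

The full JSON from the command above is still the better input when the issue body and comments matter for planning.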

### 2. Explore, Plan, Implement
- Use the Explore agent to understand the relevant code before making changes.
- Plan first, then implement with minimal, focused changes. Avoid over-engineering.

### 3. Commit and Push
- Commit with a message that includes `Closes #<issue-number>`.
- Push the feature branch. If the HTTPS push fails due to SAML SSO, switch the remote to SSH: `git remote set-url origin git@github.com:OpenPipe/ART.git`
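The push step and its SSH fallback can be sketched as a small shell function. This is an illustration under stated assumptions, not part of the skill: `push_branch` is a hypothetical name, and the sketch switches to SSH after any failed push rather than detecting a SAML SSO error specifically.

```shell
# Hypothetical helper: push a branch, retrying over SSH if the first push fails.
# Assumption: any push failure triggers the fallback; the error text is not inspected.
push_branch() {
  branch="$1"
  if git push -u origin "$branch"; then
    return 0
  fi
  echo "push failed; switching origin to SSH and retrying" >&2
  git remote set-url origin git@github.com:OpenPipe/ART.git
  git push -u origin "$branch"
}

# Usage (placeholders in angle brackets):
#   git commit -m "Fix <short description>" -m "Closes #<issue-number>"
#   push_branch "$(git rev-parse --abbrev-ref HEAD)"
```

Passing `-m` twice puts `Closes #<issue-number>` in its own paragraph of the commit message.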

### 4. Open a Draft PR
- `gh pr create --base main --draft`.
- PR body: `## Summary`, `Closes #<number>`, `## Changes`, `## Test plan`.
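One way to keep the PR body consistent is to generate its skeleton from the issue number. The sketch below assumes a hypothetical helper named `pr_body`; the title and the issue number `123` are placeholders.

```shell
# Hypothetical helper: print a PR body skeleton with the four parts listed above.
pr_body() {
  issue="$1"
  cat <<EOF
## Summary

Closes #${issue}

## Changes

## Test plan
EOF
}

# Usage (title and issue number are placeholders):
#   gh pr create --base main --draft \
#     --title "Fix <short description>" \
#     --body "$(pr_body 123)"
```

The empty sections are meant to be filled in by hand before the PR is marked ready.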

### 5. Testing
- **No test artifacts in the final PR**: debug prints, test scripts, and temporary changes must NOT be committed.
- Update the PR's test plan section with detailed results.
- When testing passes, mark the PR as ready: `gh pr ready`.

## Reference

Read `CONTRIBUTING.md` at the repo root for guidance on code quality checks (prek), CI cache refresh, and the release process.

## Dependency Management Tips

- **Pin versions strictly** (`==`) for critical deps like `transformers`, `trl`, `unsloth`, `unsloth-zoo`, `vllm` to avoid surprise breakage from new releases.
- **Don't loosen pins without reason**: if a dep was `==X.Y.Z`, keep it pinned unless there's a specific reason to change. Don't use `>=` just because it seems more flexible.
- **`uv run` fails on macOS** for backend deps (apex/torch need CUDA). This is expected: use `uvx ruff` to lint locally and test on a GPU cluster.
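The pinning rule can be checked mechanically. The sketch below is a hypothetical helper, not something the repo provides: it assumes the requirement specifiers have already been extracted one per line, and it only recognizes plain `name==version` forms (no extras or environment markers).

```shell
# Hypothetical check: read requirement specifiers on stdin (one per line,
# e.g. "vllm==X.Y.Z") and report critical packages that are not pinned with "==".
check_pins() {
  status=0
  while IFS= read -r spec; do
    # Package name = everything before the first version operator or space.
    name="${spec%%[=<>!~ ]*}"
    case "$name" in
      transformers|trl|unsloth|unsloth-zoo|vllm)
        case "$spec" in
          "$name"==*) ;;                          # strictly pinned: OK
          *) echo "not pinned: $spec"; status=1 ;;
        esac
        ;;
    esac
  done
  return "$status"
}

# Example with made-up version numbers:
#   printf '%s\n' 'vllm==1.2.3' 'trl>=0.1' 'requests>=2.0' | check_pins
#   prints "not pinned: trl>=0.1" and returns a non-zero status
```

Packages outside the critical list, such as `requests` in the example, are ignored on purpose.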

## Deploying a GPU Cluster

Name the SkyPilot cluster after the branch name without the `fix/` prefix, replacing `/` with `-` (SkyPilot doesn't allow slashes). For example, if the branch is `fix/short-description`:
```
uv run sky launch -c short-description skypilot-config.yaml -y
```

To connect: `ssh short-description`

To tear down when done: `uv run sky down short-description`
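The naming rule above can be expressed as a short shell function. This is a sketch with a hypothetical helper name, `cluster_name`; the second branch name is invented to show the slash replacement.

```shell
# Hypothetical helper: derive a SkyPilot cluster name from a branch name.
cluster_name() {
  branch="$1"
  # Drop a leading "fix/" if present, then replace remaining slashes with dashes.
  printf '%s\n' "${branch#fix/}" | tr '/' '-'
}

cluster_name "fix/short-description"   # prints: short-description
cluster_name "fix/vllm/oom-on-start"   # prints: vllm-oom-on-start

# Usage with the current branch:
#   uv run sky launch -c "$(cluster_name "$(git rev-parse --abbrev-ref HEAD)")" skypilot-config.yaml -y
```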

## GPU Cluster Testing Tips

- **Kill stale GPU processes** before re-running tests: `nvidia-smi --query-compute-apps=pid --format=csv,noheader | xargs -r kill -9`. Previous failed runs can leave processes holding GPU memory.
- **Set `gpu_memory_utilization`** in test scripts (e.g. `0.7`). The default of `0.9` is too high when Unsloth's training model is also loaded on the same GPU.
- **Redirect test output to a log file**: `nohup python test.py > /tmp/output.log 2>&1 &` then `tail -f /tmp/output.log`. Background tasks started over SSH lose their output when the connection drops.
- **Git on cluster**: SSH keys may not be configured. Use HTTPS with a token: `git remote set-url origin https://${GITHUB_TOKEN}@github.com/OpenPipe/ART.git`
- **Tear down clusters** when done: `uv run sky down <cluster-name> -y`
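The log-redirection tip can be wrapped in a small function so every test run is started the same way. This is a sketch only: `run_logged` is a hypothetical name, and `test.py` stands in for whatever script is being run.

```shell
# Hypothetical helper: start a command in the background, detached from the
# SSH session, with stdout and stderr captured in a log file.
run_logged() {
  log="$1"
  shift
  nohup "$@" > "$log" 2>&1 &
  echo "started PID $! (log: $log)"
}

# Usage on the cluster:
#   run_logged /tmp/output.log python test.py
#   tail -f /tmp/output.log
```

Printing the PID makes it easier to stop one specific run later instead of killing every process on the GPU.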

$ARGUMENTS

skypilot-config.yaml

Lines changed: 1 addition & 1 deletion
@@ -382,7 +382,7 @@
 
 workdir: .
 resources:
-  accelerators: ["H100-SXM:1", "H100:1", "A100-80GB:1"]
+  accelerators: ["H200:1", "H100-SXM:1", "H100:1", "A100-80GB:1"]
   image_id: docker:pytorch/pytorch:2.9.0-cuda12.8-cudnn9-devel
   ports:
     - 7999 # main ART server
