This prints the `databaseId`, `conclusion`, and `createdAt` of the most recent completed run (meaning finished running, not necessarily passed — a completed run can have failed jobs, which is what we're looking for). Record `databaseId` as `RUN_ID`.

**Step 1b — Rerun if the run was cancelled**

If the run `conclusion` is `"cancelled"`, the run did not finish normally — some jobs were cut short before they could produce results. Rerun the cancelled/failed jobs automatically:
```bash
gh run rerun RUN_ID --repo SonarSource/peachee-js --failed
```

Then print:

```
⚠️ Run RUN_ID (DATE) was cancelled before completion.
Rerun triggered for all failed/cancelled jobs. Check back once the rerun completes.
```
Then stop — do not attempt to triage the incomplete results.
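
The branch above can be sketched as follows. `CONCLUSION` is assumed to have been parsed from the run-listing JSON in the previous step; the `echo` stands in for the actual `gh run rerun` call so the sketch runs without GitHub access:

```bash
# CONCLUSION would be parsed from the run-listing JSON in the previous step;
# hard-coded here so the sketch runs standalone.
CONCLUSION="cancelled"

if [ "$CONCLUSION" = "cancelled" ]; then
  # Real command: gh run rerun RUN_ID --repo SonarSource/peachee-js --failed
  echo "rerun triggered"
else
  echo "run finished normally"
fi
```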

**Step 2 — Collect all failed jobs**

The run has ~250 jobs across 3 pages. Fetch all three pages and collect jobs where `conclusion == "failure"`:

```bash
gh api "repos/SonarSource/peachee-js/actions/runs/RUN_ID/jobs?per_page=100&page=1" \
  --jq '.jobs[] | select(.conclusion == "failure") | {name, id, completedAt: .completed_at}'
gh api "repos/SonarSource/peachee-js/actions/runs/RUN_ID/jobs?per_page=100&page=2" \
  --jq '.jobs[] | select(.conclusion == "failure") | {name, id, completedAt: .completed_at}'
gh api "repos/SonarSource/peachee-js/actions/runs/RUN_ID/jobs?per_page=100&page=3" \
  --jq '.jobs[] | select(.conclusion == "failure") | {name, id, completedAt: .completed_at}'
```

For each page, record each failed job's `name` and `id`. Each command outputs only the failed jobs for that page; `completedAt` may be `null` — see Step 7 for handling.

**Step 3 — Early exit if no failures**

If there are no failed jobs, print a short "no failures" message, then stop.

**Step 4 — Mass failure detection**

If **≥80% of jobs failed** (e.g. 200+ out of 253), this indicates a single shared root cause. Do not triage every job individually.

Instead:

1. Sample 5 representative jobs (spread across pages 1–3)
2. Run Phase 2 grep on each (see below) to classify each sampled job individually, including sensor and stack trace origin
3. If **any** sampled job is CRITICAL, the mass verdict is CRITICAL — CRITICAL takes priority regardless of how many other jobs match an IGNORE pattern
4. Otherwise, apply the shared pattern's verdict to all failed jobs
5. In the summary, note the mass event and list only the sampled jobs as evidence
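
The threshold test itself is one line of integer arithmetic; a minimal sketch with illustrative counts (in practice `TOTAL` and `FAILED` come from the Step 2 listings):

```bash
TOTAL=253    # all jobs in the run (illustrative)
FAILED=210   # jobs with conclusion == "failure" (illustrative)

# Integer percentage; >= 80 switches to the sampling path
PCT=$(( FAILED * 100 / TOTAL ))
if [ "$PCT" -ge 80 ]; then
  echo "mass failure (${PCT}%): sample 5 representative jobs"
fi
```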

**Mass failure verdict rules — check in this order:**

- **CRITICAL** if the shared stack trace originates in `org.sonar.plugins.javascript` — the SonarJS plugin itself is broken (e.g. fails to initialize, crashes during analysis). This takes priority over any infrastructure explanation: if the SonarJS plugin is at fault, it must be fixed before release regardless of how many jobs are affected.

- **IGNORE** if the shared error is a Peach infrastructure failure with no SonarJS involvement:
  - Peach server down: `HTTP 502 Bad Gateway` at `peach.sonarsource.com/api/server/version`
  - Artifact expired: `Artifact has expired (HTTP 410)` during JAR download — **only when exit code 1 is the sole failure**. If exit code 3 also appears, the analysis still ran after the download failure; treat exit code 3 as a separate failure and check it for SonarJS involvement before concluding IGNORE.

- **NEEDS-MANUAL-REVIEW** if the pattern does not match either of the above.

**Step 5 — Read the classification guide and triage all logs**

Read `docs/peach-main-analysis.md` once to load the failure categories and decision flowchart.

Create the work directory where logs will be stored for inspection:
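
The exact command is not shown in this excerpt; given the `target/peach-logs/JOB_ID.log` paths used in Phase 1, a minimal version would be:

```bash
mkdir -p target/peach-logs   # -p: create parents, no error if it already exists
```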

Then triage each failed job using a graduated approach. Work through phases as needed — stop as soon as a job can be classified. Run all jobs in parallel within each phase.

**Phase 1 — Download log and filter for failure signals (always, all jobs in parallel)**

Download the log to disk, then filter for key failure signals. Saving to disk avoids re-downloading in Phase 2 and leaves logs available for manual inspection after the run. Do NOT use `tail -40` — cleanup steps often run after the scan step fails (e.g. always-run SHA extraction), pushing the exit code out of the tail window. A multi-line `sed -n` script is more reliable and easier to maintain than one long regular expression. `--sandbox` prevents sed from executing shell commands via the `e` command, which is a risk when processing untrusted log content:

```bash
gh api "repos/SonarSource/peachee-js/actions/jobs/JOB_ID/logs" \
  > target/peach-logs/JOB_ID.log
sed --sandbox -n '
/Process completed with exit code/p
/EXECUTION FAILURE/p
/OutOfMemoryError/p
/502 Bad Gateway/p
/503 Service Unavailable/p
/Artifact has expired/p
/All 3 attempts failed/p
/ERR_PNPM/p
/ERESOLVE/p
/ETARGET/p
/notarget/p
/Invalid value of sonar/p
/does not exist for/p
' target/peach-logs/JOB_ID.log
```

Use the decision flowchart and failure categories from `docs/peach-main-analysis.md` to classify the filtered output. If the filtered lines show exit code 3 (EXECUTION FAILURE from the SonarQube scanner), always continue to Phase 2 — Phase 1 does not surface Java stack traces, so SonarJS plugin involvement cannot be ruled out from Phase 1 alone.

**Phase 2 — Identify the failing sensor and any SonarJS stack trace (exit code 3 jobs only)**

When Phase 1 shows exit code 3, run this to find the last sensor that ran and surface any SonarJS plugin stack trace. The log is already on disk from Phase 1 — no re-download needed:

```bash
sed --sandbox -n '
/Sensor /p
/EXECUTION FAILURE/p
/OutOfMemoryError/p
/Process completed with exit code/p
/org\.sonar\.plugins\.javascript/p
' target/peach-logs/JOB_ID.log
```

This surfaces both the last sensor that ran and any `org.sonar.plugins.javascript` frames in the stack trace. Apply the classification rules in `docs/peach-main-analysis.md` and run this only for jobs that need it, all concurrently.

**Phase 3 — Full log (only when Phase 2 is still ambiguous)**

If the failure still cannot be classified (unrecognised stack trace, unexpected exit code), read the full log from disk using the `Read` tool on `target/peach-logs/JOB_ID.log`. This should be rare.

**Step 6 — Classify each job**

Using the decision flowchart from the classification guide, classify each job directly from the logs. Most failures are unambiguous (clone timeout, dep install failure, project misconfiguration) and need no further help.

**Only launch parallel agents when** a job's logs are ambiguous — e.g. an unfamiliar stack trace or an exit code that doesn't match any known category. Launch one Agent per ambiguous job, all concurrently, passing the classification rules and log excerpt inline:

```
You are assessing a failed job in the Peach Main Analysis workflow.

Job: JOB_NAME
Job ID: JOB_ID

Classify the failure using the rules and log excerpt provided, then return:

Category: <category name>
Evidence: <key log line(s), max 2>

Do not do anything else. Just classify and return the assessment.
```

If an agent returns no structured assessment, record that job as `NEEDS-MANUAL-REVIEW` with evidence `Agent returned no output`.

**Step 7 — Check for clustered failures**

If 2 or more jobs share the same category, check whether they failed within a 5-minute window. Use `completedAt` timestamps if available; otherwise extract the timestamp prefix from log lines (format: `2026-MM-DDTHH:MM:SS.`). If clustered, record a general note for the summary, for example:

> ⚠️ N jobs failed with the same pattern within a 5-minute window — likely caused by a single infrastructure event.
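
One way to compute the window, sketched with GNU `date`. The `times.txt` file — one ISO-8601 timestamp per failed job in the category — is an assumed intermediate, not something the workflow produces, and its contents here are illustrative:

```bash
# times.txt: one completedAt (or log-derived) timestamp per job in the category.
printf '%s\n' \
  '2026-01-02T10:00:05Z' \
  '2026-01-02T10:02:41Z' \
  '2026-01-02T10:04:30Z' > times.txt

first=$(sort times.txt | head -n 1)   # same-format ISO-8601 strings sort chronologically
last=$(sort times.txt | tail -n 1)
span=$(( $(date -d "$last" +%s) - $(date -d "$first" +%s) ))

if [ "$span" -le 300 ]; then
  echo "clustered: all failures within ${span}s"
fi
```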

**Step 8 — Print summary**

Sort rows by verdict: CRITICAL first, then NEEDS-MANUAL-REVIEW, then IGNORE. Place the Category column first. After the verdict counts and release recommendation, list any general notes collected during log analysis (for example clustered failures or mass-failure events).
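
A possible shape for the summary table, following the column-order and sort rules above (job names and evidence are illustrative, not real results):

```
| Category             | Job              | Verdict             | Evidence                           |
|----------------------|------------------|---------------------|------------------------------------|
| SonarJS plugin crash | scan-project-foo | CRITICAL            | at org.sonar.plugins.javascript... |
| Unknown exit code    | scan-project-bar | NEEDS-MANUAL-REVIEW | Process completed with exit code 7 |
| Peach server down    | scan-project-baz | IGNORE              | HTTP 502 Bad Gateway               |
```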