feat: add skip_permissions, continue command, and build retry for resilient opencode automation#2
feat: add skip_permissions, continue command, and build retry for resilient opencode automation#2jc01rho wants to merge 3 commits into
Conversation
When opencode run fails with retryable provider errors (Provider returned error, rate limits, 5xx, network timeouts), ORW now automatically retries using `opencode run --continue` to resume the interrupted session. New config fields: - retry_attempts: max attempts including first (default 3) - retry_delay_ms: delay between retries (default 10000ms) Retryable patterns detected from log output: - Provider returned error, rate limit, overloaded - HTTP 429/500/502/503 - Network errors (ECONNRESET, ETIMEDOUT, socket hang up)
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds configurable retry behavior for opencode run failures that appear to be transient provider errors, improving resilience of the check workflow.
Changes:
- Introduces
retry_attempts/retry_delay_msto config types, defaults, init config, and example config. - Wraps OpenCode execution in
runOpenCodeWithRetry()and retries withopencode run --continueon retryable errors. - Documents the new retry behavior and config options in README/help text.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
src/index.ts |
Adds retry config + new retry wrapper around opencode run and log-based retryable error detection. |
config.example.json |
Shows the new retry configuration keys and defaults. |
README.md |
Documents retry behavior and adds the new config keys to the example. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| let lastErr: unknown; | ||
|
|
||
| for (let attempt = 1; attempt <= cfg.retry_attempts; attempt++) { | ||
| const isRetry = attempt > 1; | ||
| if (isRetry) { | ||
| out(`Retry attempt ${attempt}/${cfg.retry_attempts} after provider error (waiting ${cfg.retry_delay_ms / 1000}s)...`); | ||
| await note(log, `\n--- Retry attempt ${attempt}/${cfg.retry_attempts} ---\n`); | ||
| await sleep(cfg.retry_delay_ms); | ||
| } | ||
|
|
||
| const args = [cfg.opencode_bin, "run"]; | ||
| if (isRetry) args.push("--continue"); | ||
| args.push("--agent", cfg.agent, "--model", cfg.model, prompt); | ||
|
|
||
| try { | ||
| await run(args, { | ||
| cwd: cfg.work_repo, | ||
| log, | ||
| env: envWithFlag, | ||
| }); | ||
| return; | ||
| } catch (err) { | ||
| lastErr = err; | ||
| if (!isRetryableError(err, log)) { | ||
| throw err; | ||
| } | ||
| out(`Attempt ${attempt} failed with retryable provider error.`); | ||
| } | ||
| } | ||
|
|
||
| throw lastErr; |
|
|
||
| if (msg.includes("exited with")) { | ||
| try { | ||
| const logContent = readFileSync(log, "utf8").toLowerCase(); |
There was a problem hiding this comment.
1 issue found across 3 files
Reply with feedback, questions, or to request a fix.
Re-trigger cubic
ualtinok
left a comment
There was a problem hiding this comment.
Thanks for the PR — the retry behavior is useful, but I don't think this is safe to merge yet.
Please address these before we reconsider:
-
Validate
retry_attempts. Right nowretry_attempts: 0skipsopencode runentirely and ends up throwingundefined. Since attempts include the initial run, values below 1 should either be rejected clearly or clamped to 1. -
Classify retryability from only the current attempt's output/log slice. The implementation scans the full accumulated log file, so after attempt 1 logs a retryable provider error, any later non-retryable failure can still be treated as retryable because the old message remains in the shared log. Capture a log offset before each attempt, or otherwise inspect only output from that attempt.
-
Tighten the HTTP matching. Bare
"429","502", and"503"can match unrelated numbers in logs. Please match provider/HTTP error context instead. -
Add focused regression coverage for the retry behavior: invalid/zero attempts, retry using
--continue, per-attempt retry classification, and a non-retryable second failure not being retried because of stale first-attempt log content.
Typecheck passes, and the feature direction makes sense, but these failure-mode issues need to be fixed before merge.
1. Clamp retry_attempts to minimum 1 to prevent undefined throw 2. Use per-attempt log slicing to avoid stale retryable errors 3. Tighten HTTP status code matching with context patterns 4. Add focused regression tests for retry behavior
There was a problem hiding this comment.
2 issues found across 3 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/index.ts">
<violation number="1" location="src/index.ts:447">
P1: Reading the log slice immediately after the process exits can miss provider-error text because `exec` appends log chunks asynchronously via unawaited `void write(chunk)`. Missing error text causes `isRetryableError` to return false, skipping a needed retry.</violation>
</file>
Reply with feedback, questions, or to request a fix.
Re-trigger cubic
|
No dependency changes detected. Learn more about Socket for GitHub. 👍 No dependency changes detected in pull request |
There was a problem hiding this comment.
2 issues found across 3 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/index.ts">
<violation number="1" location="src/index.ts:447">
P1: Reading the log slice immediately after the process exits can miss provider-error text because `exec` appends log chunks asynchronously via unawaited `void write(chunk)`. Missing error text causes `isRetryableError` to return false, skipping a needed retry.</violation>
</file>
Tip: Review your code locally with the cubic CLI to iterate faster.
Re-trigger cubic
There was a problem hiding this comment.
1 issue found across 2 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="prompt.txt">
<violation number="1" location="prompt.txt:24">
P2: Step 5's bug search runs after the build but before verification without an explicit rebuild step, creating ambiguity about whether fixes require recompilation. If bugs are fixed after step 4, the binary verified in steps 6-7 would be stale. The phrase 'before final build' implies a rebuild is needed but no such step is present.</violation>
</file>
Tip: Review your code locally with the cubic CLI to iterate faster.
Re-trigger cubic
| 4. Build the host CLI in `packages/opencode` with `OPENCODE_CHANNEL={{channel}} OPENCODE_VERSION={{version}} bun run build -- --single`. | ||
| 5. Ensure the built CLI binary exists at `{{cli}}`. | ||
| 6. Run `{{cli}} --version` and verify the output is exactly `{{version}}`. | ||
| 5. Search for session ID encoding bugs in merged PRs before final build. Focus on: |
There was a problem hiding this comment.
P2: Step 5's bug search runs after the build but before verification without an explicit rebuild step, creating ambiguity about whether fixes require recompilation. If bugs are fixed after step 4, the binary verified in steps 6-7 would be stale. The phrase 'before final build' implies a rebuild is needed but no such step is present.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At prompt.txt, line 24:
<comment>Step 5's bug search runs after the build but before verification without an explicit rebuild step, creating ambiguity about whether fixes require recompilation. If bugs are fixed after step 4, the binary verified in steps 6-7 would be stale. The phrase 'before final build' implies a rebuild is needed but no such step is present.</comment>
<file context>
@@ -21,8 +21,13 @@ Exact tasks:
4. Build the host CLI in `packages/opencode` with `OPENCODE_CHANNEL={{channel}} OPENCODE_VERSION={{version}} bun run build -- --single`.
-5. Ensure the built CLI binary exists at `{{cli}}`.
-6. Run `{{cli}} --version` and verify the output is exactly `{{version}}`.
+5. Search for session ID encoding bugs in merged PRs before final build. Focus on:
+ - Auto-resume plugins (`opencode-auto-resume`) that may pass URL-encoded placeholder values like `%7Bid%7D` instead of real session IDs
+ - Session validation code that expects strings starting with `ses` but may receive encoded placeholders
</file context>
| async function continueRun(cfg: Cfg) { | ||
| const release = await latest(cfg); | ||
| const prev = await readState(cfg); | ||
| if (!prev.tag) throw new Error("No previous build state found. Run `orw check` first."); | ||
|
|
||
| const free = await hold(cfg, true); | ||
| try { | ||
| const log = prev.log ?? path.join( | ||
| logDir(cfg), | ||
| `${stamp()}-${release.tag_name.replaceAll("/", "-")}-continue.log`, | ||
| ); | ||
| await fs.mkdir(path.dirname(log), { recursive: true }); | ||
| await note(log, `\n--- Resuming: continue run for ${release.tag_name} ---\n`); | ||
|
|
||
| const env = releaseEnv(release); | ||
| const sources = resolveSources(cfg); | ||
| const prompt = await render(cfg, sources, release); | ||
|
|
||
| out(`Resuming last opencode session for ${release.tag_name}...`); | ||
| const next = await runOpenCodeWithBuildRetry(cfg, prompt, env, log, release); | ||
| await writeState(cfg, next); | ||
| out(`Continue completed for ${release.tag_name}`); | ||
| } finally { | ||
| await free(); | ||
| } | ||
| } |
There was a problem hiding this comment.
orw continue never actually resumes the interrupted session
continueRun delegates to runOpenCodeWithBuildRetry → runOpenCodeWithRetry, which only adds --continue when isRetry = attempt > 1. On the first attempt, isRetry is false, so opencode run is invoked without --continue — starting a brand-new session in the existing work directory rather than resuming the interrupted one. The flag only reaches opencode if that fresh run happens to fail with a retryable provider error and then retries.
To fix this, runOpenCodeWithRetry (and by extension runOpenCodeWithBuildRetry) needs an option like { forceContinue?: boolean } that continueRun can pass through to make --continue unconditional on every attempt.
1. Add build_retry_attempts/build_retry_delay_ms config fields 2. Implement isMissingArtifactError() helper to detect artifact completion 3. Add retry loop around verifyBuild that waits for background artifacts 4. Update help text and config.example.json with new options 5. Add focused regression tests for build retry behavior This enables ORW to automatically retry opencode builds when artifacts aren't immediately available, which handles the case where bun test background processes take longer than 8 seconds to complete. Fixes: build artifact retry for background processes
Adds automatic retry, permission bypass, and session resume to opencode automation.
Features
Config fields
Implementation
Commands
orw check # Build latest release with full retry
orw check --force # Force rebuild even if already processed
orw continue # Resume last interrupted opencode session
Greptile Summary
This PR adds automatic retry logic, permission bypass, and session resume to the opencode automation pipeline. Two nested retry loops (
runOpenCodeWithRetryfor provider errors andrunOpenCodeWithBuildRetryfor missing artifacts) replace the previous single-shotopencode runcall, and a neworw continuecommand wires into the same retry path.src/retry.ts(new): pure-function helpersisRetryableError,readLogSlice, andfileSizewith per-attempt log isolation to avoid false-positive retries from accumulated log content.src/index.ts: replaces the inlineopencode runblock withrunOpenCodeWithBuildRetry→runOpenCodeWithRetry; addscontinueRun,isBuildRetryableError, and five new config fields.src/retry.test.ts(new): 11 unit tests covering pattern detection and per-attempt log slicing.Confidence Score: 4/5
Mostly safe to merge; the duplicate writeState call should be fixed first.
The check function calls writeState(cfg, next) twice in a row — a copy-paste artifact at the seam between the old and new code. While currently idempotent, it is a structural mistake that would break if writeState ever gains append semantics. The retry and permission-bypass logic itself is well-structured and covered by tests.
src/index.ts — the duplicate writeState and two dead variables (afterSize, logOffset) in the new retry functions.
Important Files Changed
Sequence Diagram
sequenceDiagram participant CLI as orw check / orw continue participant BRetry as runOpenCodeWithBuildRetry participant PRetry as runOpenCodeWithRetry participant OC as opencode run participant VB as verifyBuild CLI->>BRetry: prompt, env, log, release loop build_retry_attempts BRetry->>PRetry: cfg, prompt, env, log loop retry_attempts PRetry->>OC: opencode run alt success OC-->>PRetry: exit 0 PRetry-->>BRetry: return else retryable provider error OC-->>PRetry: exit 1 PRetry->>PRetry: isRetryableError(err, logSlice) PRetry->>PRetry: sleep(retry_delay_ms) else non-retryable error PRetry-->>BRetry: throw end end BRetry->>VB: verifyBuild alt artifacts verified VB-->>BRetry: State BRetry-->>CLI: State else isBuildRetryableError VB-->>BRetry: throw BRetry->>BRetry: sleep(build_retry_delay_ms) else non-retryable BRetry-->>CLI: throw end endReviews (8): Last reviewed commit: "feat: add build retry logic with artifac..." | Re-trigger Greptile