Skip to content

fix(core): 🐛 converge orphaned subagents and account final turns for judge-completed goals#226

Merged
jorben merged 2 commits into
masterfrom
fix/orphaned-helpers-turn
Jun 11, 2026
Merged

fix(core): 🐛 converge orphaned subagents and account final turns for judge-completed goals#226
jorben merged 2 commits into
masterfrom
fix/orphaned-helpers-turn

Conversation

@HayWolf

@HayWolf HayWolf commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Add backstop to converge orphaned non-terminal run helpers (e.g. Judge subagents) to interrupted when a run finishes, preventing them from lingering at running forever after interrupt/cancel
  • Account the final turn for goals that are judge-completed within the same run, so the finished turn count matches what was shown while the run was active
  • Add helper_judge to the frontend helper kind formatter so Judge Agent shows correctly in the UI

Test Plan

  • Verify orphaned helpers are converged on run cancel/interrupt
  • Verify turn count stays consistent when Judge completes a goal
  • Verify turn accounting is idempotent (no double count on re-evaluation)

🤖 Generated with TiyCode

…judge-completed goals

When a run finishes while a subagent (e.g. a Judge) is still active in the
DB, its helper row can linger at `running` forever because only live
SubagentCompleted/Failed events flip the status. This adds a backstop in
the run-finalization path that:

- Marks all non-terminal helpers for the run as `interrupted`
- Emits matching `SubagentFailed` events so the DB and live stream converge

Additionally, a Judge can flip a goal to `complete` within the same run,
but the normal turn-accounting path is skipped for non-active goals,
causing the finished turn count to drop by one. A new
`account_turn_if_unevaluated` function idempotently bills that final
turn so the running-vs-finished display stays consistent.

The frontend now also maps `helper_judge` to "Judge Agent".
@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown

AI Code Review Summary

PR: #226 (fix(core): 🐛 converge orphaned subagents and account final turns for judge-completed goals)
Preferred language: English

Overall Assessment

Detected 1 actionable findings, prioritize CRITICAL/HIGH before merge.

Major Findings by Severity

  • LOW (1)
    • src-tauri/Cargo.toml:63 - Cargo.toml depends on a release-candidate (rc) version of tiycore

Actionable Suggestions

  • Stabilise the tiycore dependency to a non-rc version before merging to master.
  • Add a more specific interrupt reason for RunStatus::Completed in the orphan helper convergence block.

Potential Risks

  • Using a release-candidate dependency may cause unexpected churn or break CI if the crate is yanked.
  • The RecordObservedUsage function is not gated by the last_usage dedup; future refactors could decouple the two, leading to stale compression-trigger state.
  • The read-before-write in interrupt_non_terminal_by_run can technically return rows that were not actually updated if raced, though the impact is benign.

Test Suggestions

  • Add an integration test for orphan helper convergence when a run finishes with helpers still running.
  • Ensure the frontend integration test validates the new context_size field value.

File-Level Coverage Notes

  • src-tauri/Cargo.toml: ok
  • src-tauri/src/core/agent_run_event_handler.rs: ok
  • src-tauri/src/core/agent_session.rs: ok
  • src-tauri/src/core/agent_session_compression.rs: ok
  • src-tauri/src/core/agent_session_events.rs: ok (The new record_observed_usage function is a drop-in replacement for the old calibration observer; tests should confirm the slot update logic.)
  • src-tauri/src/core/agent_session_tests.rs: ok (All removed tests (calibration seeding, ratio observation) are replaced with equivalent tests for the unified context-size trigger.)
  • src-tauri/src/core/agent_session_types.rs: ok
  • src-tauri/src/core/context_compression.rs: ok (The new should_compress_via_context_size and its tests are clear and comprehensive.)
  • src-tauri/src/core/goal_manager.rs: ok (The new idempotent turn accounting for Judge-terminal runs is sound and tested.)
  • src-tauri/src/model/thread.rs: ok (Adding context_size to RunUsageDto is forward-compatible; wire-level total_tokens remains unchanged.)
  • src-tauri/src/persistence/repo/goal_repo.rs: ok
  • src-tauri/src/persistence/repo/run_helper_repo.rs: ok (The new interrupt_non_terminal_by_run is well-tested.)
  • src-tauri/src/persistence/repo/run_repo.rs: ok
  • src-tauri/tests/frontend_integration.rs: ok
  • src-tauri/tests/goal_lifecycle.rs: ok
  • src/modules/workbench-shell/model/thread-store.ts: File was in scope but not fully reviewed before budget limits. (planner_signaled_done_before_coverage)
  • src/modules/workbench-shell/ui/dashboard-workbench-logic.ts: File was in scope but not fully reviewed before budget limits. (planner_signaled_done_before_coverage)
  • src/modules/workbench-shell/ui/dashboard-workbench.test.ts: File was in scope but not fully reviewed before budget limits. (planner_signaled_done_before_coverage)
  • src/modules/workbench-shell/ui/dashboard-workbench.tsx: File was in scope but not fully reviewed before budget limits. (planner_signaled_done_before_coverage)
  • src/modules/workbench-shell/ui/runtime-thread-surface-helpers.test.ts: File was in scope but not fully reviewed before budget limits. (planner_signaled_done_before_coverage)
  • ... and 8 more file-level entries.

Inline Downgraded Items (processed but not inline)

  • None

Coverage Status

  • Target files: 28
  • Covered files: 15
  • Uncovered files: 13
  • No-patch/binary covered as file-level: 0
  • Findings with unknown confidence (N/A): 0

Uncovered list:

  • src/modules/workbench-shell/model/thread-store.ts: planner_signaled_done_before_coverage
  • src/modules/workbench-shell/ui/dashboard-workbench-logic.ts: planner_signaled_done_before_coverage
  • src/modules/workbench-shell/ui/dashboard-workbench.test.ts: planner_signaled_done_before_coverage
  • src/modules/workbench-shell/ui/dashboard-workbench.tsx: planner_signaled_done_before_coverage
  • src/modules/workbench-shell/ui/runtime-thread-surface-helpers.test.ts: planner_signaled_done_before_coverage
  • src/modules/workbench-shell/ui/runtime-thread-surface-helpers.ts: planner_signaled_done_before_coverage
  • src/modules/workbench-shell/ui/runtime-thread-surface-state.ts: planner_signaled_done_before_coverage
  • src/modules/workbench-shell/ui/runtime-thread-surface.test.tsx: planner_signaled_done_before_coverage
  • src/modules/workbench-shell/ui/runtime-thread-surface.tsx: planner_signaled_done_before_coverage
  • src/modules/workbench-shell/ui/workbench-top-bar.tsx: planner_signaled_done_before_coverage
  • src/services/bridge/agent-commands.ts: planner_signaled_done_before_coverage
  • src/services/thread-stream/thread-stream.test.ts: planner_signaled_done_before_coverage
  • src/shared/types/api.ts: planner_signaled_done_before_coverage

No-patch covered list:

  • None

Runtime/Budget

  • Rounds used: 1/4
  • Planned batches: 2
  • Executed batches: 2
  • Sub-agent runs: 2
  • Planner calls: 1
  • Reviewer calls: 3
  • Model calls: 4/64
  • Structured-output summary-only degradation: NO

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated PR review completed.

  • Findings kept: 0
  • Findings with unknown confidence: 0
  • Inline comments attempted: 1
  • Target files: 7
  • Covered files: 7
  • Uncovered files: 0
    See the summary comment for detailed analysis and coverage details.

@@ -16,7 +16,7 @@ use crate::ipc::app_events::{
use crate::ipc::frontend_channels::ThreadStreamEvent;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated review completed for this PR diff. No concrete inline issue was selected after aggregation.

升级 tiycore 0.2.9 → 0.2.10-rc.2,把跨协议的
`Usage::context_size()` 作为统一的上下文用量事实来源,贯通
RunUsageDto / 前端 badge / 自动压缩触发,清理原先的
`initial_context_calibration` 启发式估算路径。

主要变更:
- 升级 `src-tauri/Cargo.toml` 至 tiycore 0.2.10-rc.2。
- `RunUsageDto`(`src-tauri/src/model/thread.rs`)新增
  `context_size: u64` 字段;doc-comment 区分 wire-level
  `total_tokens` 与统一语义 `context_size`;`From<Usage>` 用
  `value.context_size()` 填入。前端 `src/shared/types/api.ts`
  同步暴露 `contextSize`。
- `readUsage`(`src/services/bridge/agent-commands.ts`)解析
  `contextSize`,缺失时按 `input+output+cache_read+cache_write`
  兜底;新增 `readUsageField` 助手。
- `agent_run_event_handler.rs::ThreadUsageUpdated` 注释说明
  `context_size` 派生规则。
- 前端 `ThreadContextUsage` / `buildThreadContextBadgeData` /
  `onUsage` / `mapRunSummaryToContextUsage` 统一改用
  `contextSize` 判定百分比与 isExceeded,`totalTokens` 保留为
  wire-level 显示;对应测试 fixture / 新增用例同步。
- 删除 `AgentSession.initial_context_calibration` 与
  `ContextCompressionRuntimeState`(及 `effective_prompt_tokens` /
  `current_context_token_calibration` /
  `record_pending_prompt_estimate` /
  `observe_context_usage_calibration` /
  `build_initial_context_token_calibration` /
  `run_summary_matches_primary_model`)。
- `context_compression.rs` 删除 `calibrate_total_tokens` /
  `should_compress_with_calibration`,新增
  `should_compress_via_context_size(last_usage, settings)`;
  `run_auto_compression` 启发式兜底改为 `messages.is_empty()`
  fast-path;`should_compress` 保留为非 hot-path 估算。
- `AgentSession.last_observed_usage: Arc<StdMutex<Option<Usage>>>`
  替代旧 `context_compression_state`;`configure_agent` /
  `set_transform_context` 闭包 / `agent_session_events.rs` /
  `agent.subscribe` 回调统一改用新字段。
- `run_repo.rs` / `run_helper_repo.rs` /
  `tests/frontend_integration.rs` /
  `agent_session_tests.rs::make_run_summary` 同步补
  `context_size`。
- 删除 / 改写 calibration 相关测试;`message_end_usage_updates_*`
  改为 `message_end_usage_updates_record_observed_usage_once_per_change`,
  `usage_calibration_*` 改为
  `usage_record_observed_includes_cache_read_for_context_size`,
  清理未使用的测试 helper。

验证:
- `cargo fmt --check --manifest-path src-tauri/Cargo.toml` clean
- `npm run typecheck` clean
- `npm run test:unit -- --run` 53 文件 846 通过 / 1 跳过
- `cargo test --locked --manifest-path src-tauri/Cargo.toml`
  lib + integration 全通过
- `agent_review` verdict: PASS

BREAKING CHANGE: 内部 `AgentSession.context_compression_state`
及其关联启发式 API 已移除,替换为
`AgentSession.last_observed_usage: Arc<StdMutex<Option<Usage>>>`;
前端 `RunUsageDto` 新增必填 `contextSize`,旧 `totalTokens` 保留
为 wire-level 显示。任何外部消费方需更新到新字段。
Refs: tiycore 0.2.10-rc.2 `Usage::context_size()`

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated PR review completed.

  • Findings kept: 1
  • Findings with unknown confidence: 0
  • Inline comments attempted: 1
  • Target files: 28
  • Covered files: 15
  • Uncovered files: 13
    See the summary comment for detailed analysis and coverage details.

Comment thread src-tauri/Cargo.toml
git2 = { version = "0.20", features = ["vendored-libgit2", "vendored-openssl"] }
similar = "2"
tiycore = "0.2.9"
tiycore = "0.2.10-rc.2"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[LOW] Cargo.toml depends on a release-candidate (rc) version of tiycore

The tiycore dependency now points to a pre-release (rc) version, which may be acceptable during development but should be stabilised before merging to master.

Suggestion: Change to a stable release version before promoting the branch, or justify the rc dependency in the PR description.

Risk: Using an rc in the mainline can cause unexpected churn when the crate is yanked or bumped, and may violate internal publishing policies.

Confidence: 0.85

[From SubAgent: general]

@jorben jorben merged commit 6bf6da9 into master Jun 11, 2026
4 checks passed
@jorben jorben deleted the fix/orphaned-helpers-turn branch June 11, 2026 02:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants