Skip to content

test(node): force derivation self-heal reorg once for QA#1004

Open
curryxbo wants to merge 3 commits into
mainfrom
test/derivation-force-reorg
Open

test(node): force derivation self-heal reorg once for QA#1004
curryxbo wants to merge 3 commits into
mainfrom
test/derivation-force-reorg

Conversation

@curryxbo

@curryxbo curryxbo commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

目的

专用测试分支(禁止合回 main),在线上 QA verify 节点上验证中心化 sequencer 的 reorg 能正常触发和处理。

改动(极小)

只改 node/derivation/derivation.go,完全复用既有的「按 batch reorg」实现(local-verify 下 blob hash 不匹配 → self-heal → deriveForce(batchInfoFull, 0) 整批重写 → EL SetCanonical,外层 withReactorsQuiesced)。

唯一的行为改动是 self-heal 的触发条件:

forced := !d.forceReorgDone
if forced || rebuilt[i] != batchInfo.blobHashes[i] {
    d.forceReorgDone = true
    ...
}

即让节点 derive 的第一个已提交 batch 无条件进入 self-heal 一次(forceReorgDone 去重),不再依赖真实 blob 不匹配。self-heal 循环体、deriveForce、reactor quiesce 全部原样不动。另:结构体加一个 forceReorgDone bool 字段,日志加一个 forced 字段。无新增配置 / env / flag。

关键日志(观测点)

  • blob hash mismatch; triggering self-heal reorg(带 forced=true
  • deriveForce 原有逐块 block written via NewSafeL2Block
  • geth:Chain reorg detected drop=.. add=..

行为说明

健康节点上重新 derive 该 batch 得到的块内容与本地一致,deriveForce 钉父块从 batch 首块下方重写,触发 EL 把 batch 内已有块先 drop 再按相同内容接回 → reorg 处理链路被完整跑一遍,末态链与原链一致。即「触发并正确处理 reorg」的 smoke 验证。

验证

gofmt / go build ./derivation/... / go vet ./derivation/... 通过。

注意

仅用于一次性观测的 verify 节点;不要 merge 到 main

Test-only hook (test/derivation-force-reorg branch) that forces the
local-verify self-heal path to fire exactly once, on the first committed
batch this process derives, regardless of whether the local blobs
actually diverge. This reuses the real batch-granular reorg mechanism
(deriveForce on the whole batch -> EL SetCanonical, wrapped in reactor
quiesce/restart) so reorg trigger + handling can be observed on a live
QA node. Adds L2 head BEFORE/AFTER logging alongside geth's
"Chain reorg detected".

QA verification only; must NOT be merged to main.

Co-authored-by: Cursor <cursoragent@cursor.com>
@curryxbo curryxbo requested a review from a team as a code owner June 24, 2026 02:39
@curryxbo curryxbo requested review from tomatoishealthy and removed request for a team June 24, 2026 02:39
@coderabbitai

coderabbitai Bot commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Warning

Review limit reached

@curryxbo, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 47 minutes and 49 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: aba3574c-442e-4bd0-a59d-ba44708e0b9a

📥 Commits

Reviewing files that changed from the base of the PR and between 4f1c761 and 33f8474.

📒 Files selected for processing (1)
  • node/derivation/derivation.go
📝 Walkthrough

Walkthrough

A forceReorgDone bool field is added to the Derivation struct as a QA-only hook. The self-heal trigger in derivationBlock's local-verify path is extended to fire either once as a forced reorg (on the first batch, gated by forceReorgDone) or on any blob-hash mismatch, replacing the previous first-mismatch-only logic. L2 head state is logged before and after the reorg attempt.

Changes

QA Force-Reorg Self-Heal Hook

Layer / File(s) Summary
forceReorgDone field and updated self-heal trigger logic
node/derivation/derivation.go
Adds forceReorgDone bool to Derivation with QA-only semantics; rewrites derivationBlock's mismatch loop to use mismatchIdx, trigger on forced-or-mismatch, set forceReorgDone = true, and log L2 head state before and after the self-heal reorg attempt.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

A rabbit hops in, flips a flag just once,
"Force the reorg!" it squeaks with a bunny's blunt grunts.
The blob hashes checked, the L2 head logged clear,
Before and after — the trail now appears.
QA test done, the warren sleeps tight,
(Please don't merge this to main overnight!) 🐇

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: forcing a derivation self-heal reorg once for QA testing purposes.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch test/derivation-force-reorg

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
node/derivation/derivation.go (1)

440-491: 🩺 Stability & Availability | 🔵 Trivial

Caveat: forcing deriveForce(batchInfoFull, 0) reorgs the chain back to the batch tip.

deriveForce with skipNumber == 0 re-anchors at firstBlockNumber-1 and rewrites every block in the batch via SetCanonical. If the first batch this process derives is behind the current L2 head (e.g. derivation cursor lags blocks already pulled in via P2P), this will drop all blocks ahead of the batch tip on the QA node — not just observe a no-op reorg. Since the trigger fires on the first derived batch regardless of position, please confirm the QA scenario expects (or is fine with) discarding any P2P-synced blocks past the batch tip.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@node/derivation/derivation.go` around lines 440 - 491, The call to
deriveForce with skipNumber set to 0 will re-anchor the chain and potentially
drop all blocks ahead of the batch tip if the derivation cursor lags behind the
current L2 head. Add a comment in the code block containing the deriveForce call
(specifically within the withReactorsQuiesced closure) that explicitly documents
this caveat and confirms that the QA scenario is aware that any P2P-synced
blocks past the batch tip will be discarded. Alternatively, add a guard
condition before calling deriveForce to verify the batch is at an expected
position relative to the L2 head, or add an assertion that documents the
assumption being made about the derivation state.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@node/derivation/derivation.go`:
- Around line 440-446: The `d.forceReorgDone` flag in the force reorg logic is
being set too early before the actual reorg work executes. Move the
`d.forceReorgDone = true` assignment from its current location (before
`fetchRollupDataByTxHash` and `deriveForce` calls) to after the
`withReactorsQuiesced` call completes successfully, so that the flag is only set
once the forced reorg has actually completed without error. This ensures that if
the reorg operations fail and return early, the flag remains unset and the
forced reorg can be retried by subsequent batch attempts.

---

Nitpick comments:
In `@node/derivation/derivation.go`:
- Around line 440-491: The call to deriveForce with skipNumber set to 0 will
re-anchor the chain and potentially drop all blocks ahead of the batch tip if
the derivation cursor lags behind the current L2 head. Add a comment in the code
block containing the deriveForce call (specifically within the
withReactorsQuiesced closure) that explicitly documents this caveat and confirms
that the QA scenario is aware that any P2P-synced blocks past the batch tip will
be discarded. Alternatively, add a guard condition before calling deriveForce to
verify the batch is at an expected position relative to the L2 head, or add an
assertion that documents the assumption being made about the derivation state.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 465f5f35-a024-47cd-8821-babaddf71b8e

📥 Commits

Reviewing files that changed from the base of the PR and between 13dcf82 and 4f1c761.

📒 Files selected for processing (1)
  • node/derivation/derivation.go

Comment thread node/derivation/derivation.go Outdated
Comment on lines +440 to +446
if forceReorg || mismatchIdx >= 0 {
if forceReorg {
d.forceReorgDone = true
d.logger.Info("FORCE-REORG test: forcing self-heal reorg on this batch (QA only, test branch)",
"batchIndex", batchInfo.batchIndex,
"firstBlockNumber", batchInfo.firstBlockNumber,
"lastBlockNumber", batchInfo.lastBlockNumber)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

forceReorgDone is set before the reorg actually runs — a fetch/derive error silently consumes the one-shot.

d.forceReorgDone = true is set at Line 442, before fetchRollupDataByTxHash (Line 454) and deriveForce (Line 477). If either fails and returns, the forced reorg never happens but the hook is already disarmed, so no later batch retries it — defeating the QA goal of observing the reorg at least once. For a test hook, consider setting the flag only after a successful deriveForce.

Proposed adjustment
 			if forceReorg || mismatchIdx >= 0 {
 				if forceReorg {
-					d.forceReorgDone = true
 					d.logger.Info("FORCE-REORG test: forcing self-heal reorg on this batch (QA only, test branch)",

…and set d.forceReorgDone = true after the withReactorsQuiesced call returns without error (after Line 484).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@node/derivation/derivation.go` around lines 440 - 446, The `d.forceReorgDone`
flag in the force reorg logic is being set too early before the actual reorg
work executes. Move the `d.forceReorgDone = true` assignment from its current
location (before `fetchRollupDataByTxHash` and `deriveForce` calls) to after the
`withReactorsQuiesced` call completes successfully, so that the flag is only set
once the forced reorg has actually completed without error. This ensures that if
the reorg operations fail and return early, the flag remains unset and the
forced reorg can be retried by subsequent batch attempts.

corey and others added 2 commits June 24, 2026 10:45
Revert the loop restructuring. Now the only behavior change is the
self-heal trigger condition (forced || blob mismatch) plus a one-shot
flag; the loop body and BEFORE/AFTER head reads are removed. Rely on
deriveForce's existing per-block log and geth's "Chain reorg detected".

Co-authored-by: Cursor <cursoragent@cursor.com>
Make the reorg visible end-to-end from node logs:
- snapshot canonical hashes of the blocks about to be rewritten
- log EL head before the rewrite (batch tip)
- per block: oldHash -> newHash + live EL head after the write
  (the head drops to the pinned parent then climbs back -- the reorg)
- log EL head after the rewrite completes

On a healthy node the rewritten content is identical so the per-block
hash is unchanged; the EL head drop/climb plus geth's
"Chain reorg detected" are the proof the reorg actually happened.

Co-authored-by: Cursor <cursoragent@cursor.com>
@tomatoishealthy

Copy link
Copy Markdown
Contributor

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants