test(node): force derivation self-heal reorg once for QA#1004
Conversation
Test-only hook (test/derivation-force-reorg branch) that forces the local-verify self-heal path to fire exactly once, on the first committed batch this process derives, regardless of whether the local blobs actually diverge. This reuses the real batch-granular reorg mechanism (deriveForce on the whole batch -> EL SetCanonical, wrapped in reactor quiesce/restart) so reorg trigger + handling can be observed on a live QA node. Adds L2 head BEFORE/AFTER logging alongside geth's "Chain reorg detected". QA verification only; must NOT be merged to main. Co-authored-by: Cursor <cursoragent@cursor.com>
|
Warning Review limit reached
More reviews will be available in 47 minutes and 49 seconds. Learn how PR review limits work. Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file). ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits. 🚦 How do rate limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughA ChangesQA Force-Reorg Self-Heal Hook
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
node/derivation/derivation.go (1)
440-491: 🩺 Stability & Availability | 🔵 TrivialCaveat: forcing
deriveForce(batchInfoFull, 0)reorgs the chain back to the batch tip.
deriveForcewithskipNumber == 0re-anchors atfirstBlockNumber-1and rewrites every block in the batch viaSetCanonical. If the first batch this process derives is behind the current L2 head (e.g. derivation cursor lags blocks already pulled in via P2P), this will drop all blocks ahead of the batch tip on the QA node — not just observe a no-op reorg. Since the trigger fires on the first derived batch regardless of position, please confirm the QA scenario expects (or is fine with) discarding any P2P-synced blocks past the batch tip.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@node/derivation/derivation.go` around lines 440 - 491, The call to deriveForce with skipNumber set to 0 will re-anchor the chain and potentially drop all blocks ahead of the batch tip if the derivation cursor lags behind the current L2 head. Add a comment in the code block containing the deriveForce call (specifically within the withReactorsQuiesced closure) that explicitly documents this caveat and confirms that the QA scenario is aware that any P2P-synced blocks past the batch tip will be discarded. Alternatively, add a guard condition before calling deriveForce to verify the batch is at an expected position relative to the L2 head, or add an assertion that documents the assumption being made about the derivation state.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@node/derivation/derivation.go`:
- Around line 440-446: The `d.forceReorgDone` flag in the force reorg logic is
being set too early before the actual reorg work executes. Move the
`d.forceReorgDone = true` assignment from its current location (before
`fetchRollupDataByTxHash` and `deriveForce` calls) to after the
`withReactorsQuiesced` call completes successfully, so that the flag is only set
once the forced reorg has actually completed without error. This ensures that if
the reorg operations fail and return early, the flag remains unset and the
forced reorg can be retried by subsequent batch attempts.
---
Nitpick comments:
In `@node/derivation/derivation.go`:
- Around line 440-491: The call to deriveForce with skipNumber set to 0 will
re-anchor the chain and potentially drop all blocks ahead of the batch tip if
the derivation cursor lags behind the current L2 head. Add a comment in the code
block containing the deriveForce call (specifically within the
withReactorsQuiesced closure) that explicitly documents this caveat and confirms
that the QA scenario is aware that any P2P-synced blocks past the batch tip will
be discarded. Alternatively, add a guard condition before calling deriveForce to
verify the batch is at an expected position relative to the L2 head, or add an
assertion that documents the assumption being made about the derivation state.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 465f5f35-a024-47cd-8821-babaddf71b8e
📒 Files selected for processing (1)
node/derivation/derivation.go
| if forceReorg || mismatchIdx >= 0 { | ||
| if forceReorg { | ||
| d.forceReorgDone = true | ||
| d.logger.Info("FORCE-REORG test: forcing self-heal reorg on this batch (QA only, test branch)", | ||
| "batchIndex", batchInfo.batchIndex, | ||
| "firstBlockNumber", batchInfo.firstBlockNumber, | ||
| "lastBlockNumber", batchInfo.lastBlockNumber) |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
forceReorgDone is set before the reorg actually runs — a fetch/derive error silently consumes the one-shot.
d.forceReorgDone = true is set at Line 442, before fetchRollupDataByTxHash (Line 454) and deriveForce (Line 477). If either fails and returns, the forced reorg never happens but the hook is already disarmed, so no later batch retries it — defeating the QA goal of observing the reorg at least once. For a test hook, consider setting the flag only after a successful deriveForce.
Proposed adjustment
if forceReorg || mismatchIdx >= 0 {
if forceReorg {
- d.forceReorgDone = true
d.logger.Info("FORCE-REORG test: forcing self-heal reorg on this batch (QA only, test branch)",…and set d.forceReorgDone = true after the withReactorsQuiesced call returns without error (after Line 484).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@node/derivation/derivation.go` around lines 440 - 446, The `d.forceReorgDone`
flag in the force reorg logic is being set too early before the actual reorg
work executes. Move the `d.forceReorgDone = true` assignment from its current
location (before `fetchRollupDataByTxHash` and `deriveForce` calls) to after the
`withReactorsQuiesced` call completes successfully, so that the flag is only set
once the forced reorg has actually completed without error. This ensures that if
the reorg operations fail and return early, the flag remains unset and the
forced reorg can be retried by subsequent batch attempts.
Revert the loop restructuring. Now the only behavior change is the self-heal trigger condition (forced || blob mismatch) plus a one-shot flag; the loop body and BEFORE/AFTER head reads are removed. Rely on deriveForce's existing per-block log and geth's "Chain reorg detected". Co-authored-by: Cursor <cursoragent@cursor.com>
Make the reorg visible end-to-end from node logs: - snapshot canonical hashes of the blocks about to be rewritten - log EL head before the rewrite (batch tip) - per block: oldHash -> newHash + live EL head after the write (the head drops to the pinned parent then climbs back -- the reorg) - log EL head after the rewrite completes On a healthy node the rewritten content is identical so the per-block hash is unchanged; the EL head drop/climb plus geth's "Chain reorg detected" are the proof the reorg actually happened. Co-authored-by: Cursor <cursoragent@cursor.com>
|
LGTM |
目的
专用测试分支(禁止合回 main),在线上 QA verify 节点上验证中心化 sequencer 的 reorg 能正常触发和处理。
改动(极小)
只改
node/derivation/derivation.go,完全复用既有的「按 batch reorg」实现(local-verify 下 blob hash 不匹配 → self-heal →deriveForce(batchInfoFull, 0)整批重写 → ELSetCanonical,外层withReactorsQuiesced)。唯一的行为改动是 self-heal 的触发条件:
即让节点 derive 的第一个已提交 batch 无条件进入 self-heal 一次(
forceReorgDone去重),不再依赖真实 blob 不匹配。self-heal 循环体、deriveForce、reactor quiesce 全部原样不动。另:结构体加一个forceReorgDone bool字段,日志加一个forced字段。无新增配置 / env / flag。关键日志(观测点)
blob hash mismatch; triggering self-heal reorg(带forced=true)deriveForce原有逐块block written via NewSafeL2BlockChain reorg detected drop=.. add=..行为说明
健康节点上重新 derive 该 batch 得到的块内容与本地一致,
deriveForce钉父块从 batch 首块下方重写,触发 EL 把 batch 内已有块先 drop 再按相同内容接回 → reorg 处理链路被完整跑一遍,末态链与原链一致。即「触发并正确处理 reorg」的 smoke 验证。验证
gofmt/go build ./derivation/.../go vet ./derivation/...通过。注意
仅用于一次性观测的 verify 节点;不要 merge 到 main。