Skip to content

test(derivation): continuous fullnode fork → self-heal chaos hook (QA only)#1005

Open
curryxbo wants to merge 2 commits into
mainfrom
test/fullnode-fork-selfheal
Open

test(derivation): continuous fullnode fork → self-heal chaos hook (QA only)#1005
curryxbo wants to merge 2 commits into
mainfrom
test/fullnode-fork-selfheal

Conversation

@curryxbo

@curryxbo curryxbo commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

What

QA-only test branch (test/fullnode-fork-selfheal) to exercise the centralized-sequencer reorg / self-heal path on a live fullnode, with prominent staged logs.

This node continuously forks the chain tip. For every canonical block that extends the local head (number == head+1), instead of applying it we assemble this node's own block at the same height and apply that. The fork block's timestamp is pinned to the parent's (valid post-Emerald: header.Time == parent.Time), so its hash is guaranteed to differ from the canonical sibling (which uses time.Now()).

tendermint's StateV2 still advances on the canonical block it handed us, so geth runs a private fork at the unsafe tip while StateV2 tracks canonical heights.

How it heals

Recovery uses the real self-heal path — nothing in the reorg logic is changed:

  • As each L1 batch covering the forked range is derived, local-verify rebuilds the blob from local (forked) blocks and finds a blob-hash mismatch vs the committed batch.
  • That triggers deriveForce wrapped in withReactorsQuiesced (SetCanonical -> StartReactorsAfterReorg), which rewrites the whole batch back onto canonical and re-syncs StateV2 from geth.
  • The tip then forks again. Net effect: always forking, always reorging.

Changes (logs + a one-line block swap only)

  • node/core/executor.goApplyBlockV2: when the incoming block extends the tip, call injectForkBlockLocked (assemble own block on current head, apply it) instead of applying canonical. Logs REORG-TEST [1/3] FORKED per fork.
  • node/derivation/derivation.go — on the existing blob-mismatch -> deriveForce path, add REORG-TEST [2/3] REORG DETECTED and REORG-TEST [3/3] RECOVERED (with approximate reorg depth). Log-only.

Log markers

REORG-TEST [1/3] FORKED       (executor)    tip replaced with own fork block
REORG-TEST [2/3] REORG DETECTED (derivation) local blob mismatch vs L1 batch -> deriveForce
REORG-TEST [3/3] RECOVERED    (derivation)  deriveForce rewrote batch onto canonical

grep:

grep -E "REORG-TEST \[1/3\]" node.log   # forks at the tip
grep -E "REORG-TEST \[2/3\]" node.log   # batch reorg detected
grep -E "REORG-TEST \[3/3\]" node.log   # batch reorg recovered (with depth)

Safety

QA chaos branch. Deploy ONLY on a dedicated fullnode — never on a sequencer or in production. Based on origin/main.

Summary by CodeRabbit

  • Bug Fixes
    • Reduced divergence impact during short chain reorganizations using QA-only fork self-heal for the next canonical block.
    • Improved local verification recovery for gap fills and blob-hash mismatches, including clearer “recovered” context and depth estimates.
    • Enhanced reconciliation to more reliably rewrite missing or divergent blocks from the correct anchor point.
  • New Features
    • Added deterministic L2 block assembly via an explicit timestamp option to support QA recovery scenarios.

@curryxbo curryxbo requested a review from a team as a code owner June 24, 2026 03:38
@curryxbo curryxbo requested review from SecurityLife and removed request for a team June 24, 2026 03:38
@coderabbitai

coderabbitai Bot commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

The executor now conditionally injects a next-height forked L2 block, L2 block assembly can take an explicit timestamp, and derivation updates its batch parsing, matching, and recovery flow.

Changes

QA Fork Injection and Reorg Recovery

Layer / File(s) Summary
Timestamped block assembly
node/types/retryable_client.go
AssembleL2BlockV2 forwards to AssembleL2BlockV2WithTimestamp, which accepts an explicit Unix timestamp for L2 block assembly.
Fork injection in executor
node/core/executor.go
injectForkBlockLocked assembles a forked block from the current head, applies it with NewL2BlockV2, and ApplyBlockV2 now tries that path when the incoming canonical block is head+1.
Block content matching helpers
node/derivation/batch_info.go
blockContentMatches compares local blocks against SafeL2Data by header fields and transaction bytes, and baseFeeEqual treats nil and zero base fees as equivalent.
Derivation recovery and logging
node/derivation/derivation.go
REORG-TEST logs now record pre-reorg head state and recovery details, parseBatch and L1-message handling are refactored, and deriveForce is rewritten to reconcile from firstBlockNumber-1 while keeping matching local blocks and rewriting the remainder.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • morph-l2/morph#966: Also changes node/core/executor.go, including Executor.ApplyBlockV2 and its block-application/reorg handling path.

Suggested reviewers

  • FletcherMan
  • Web3Jumb0

Poem

A bunny tapped the forked path bright,
And timestamped blocks took careful flight.
Old reorg trails were logged and told,
With matching bytes and heads made bold.
🐇✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly matches the QA-only fork-and-self-heal changes and the new chaos hook behavior.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch test/fullnode-fork-selfheal

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
node/core/executor.go (1)

139-142: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Interval is a hardcoded const, not configurable; timeout literal duplicates it.

PR objectives and the change summary describe the cadence as configurable via forkChaosInterval (default 30s), but it is a compile-time const with no config wiring. Separately, injectForkBlock hardcodes 30*time.Second for the read/assemble/apply timeout (Line 168) rather than referencing a named value, so the two 30s values can silently drift. Source the interval from config and give the per-iteration timeout its own named constant.

Also applies to: 167-168

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@node/core/executor.go` around lines 139 - 142, The forkChaosInterval is
currently a hardcoded const but should be sourced from configuration with a
default of 30 seconds instead of being compile-time fixed. Additionally, the
injectForkBlock function contains a hardcoded 30*time.Second timeout literal
that duplicates the interval value, which can drift independently. Remove the
const forkChaosInterval declaration, wire it to be read from config with a
30-second default, and create a separate named constant for the per-iteration
read/assemble/apply timeout used in injectForkBlock (around line 168) to replace
the hardcoded literal so both timing values stay synchronized and are
maintainable.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@node/core/executor.go`:
- Around line 154-162: The forkInjectRoutine function has an infinite loop with
no graceful shutdown path, causing goroutine leaks during executor shutdown. Add
a context.Context parameter to the forkInjectRoutine function and replace the
infinite for range t.C loop with a select statement that listens to both t.C and
ctx.Done(), returning when the context is cancelled. Ensure the context is
passed from the caller to properly coordinate shutdown with the executor's
lifecycle, similar to the pattern used in Derivation.Stop() in the derivation
package.
- Around line 131-137: The forkInjectRoutine call in the executor initialization
is spawning unconditionally for all non-DevSequencer executors, which means it
could corrupt production sequencers despite the routine's documentation stating
it should only run on dedicated fullnodes. Replace the unconditional go
executor.forkInjectRoutine() call with a gated execution that first checks an
explicit opt-in configuration flag (such as forkChaosEnabled or similar, which
should default to false) and asserts that the executor is running in a
fullnode-only role, not a sequencer or production role. This ensures the fork
injection routine only spawns when explicitly enabled and only on appropriate
fullnode instances.

---

Nitpick comments:
In `@node/core/executor.go`:
- Around line 139-142: The forkChaosInterval is currently a hardcoded const but
should be sourced from configuration with a default of 30 seconds instead of
being compile-time fixed. Additionally, the injectForkBlock function contains a
hardcoded 30*time.Second timeout literal that duplicates the interval value,
which can drift independently. Remove the const forkChaosInterval declaration,
wire it to be read from config with a 30-second default, and create a separate
named constant for the per-iteration read/assemble/apply timeout used in
injectForkBlock (around line 168) to replace the hardcoded literal so both
timing values stay synchronized and are maintainable.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4a3bfb76-b40a-4d5f-9542-a67018c4674c

📥 Commits

Reviewing files that changed from the base of the PR and between 13dcf82 and 5be6a3b.

📒 Files selected for processing (1)
  • node/core/executor.go

Comment thread node/core/executor.go Outdated
Comment on lines 131 to 137
// QA TEST BRANCH ONLY (test/fullnode-fork-selfheal): periodically fork the
// local chain so we can observe it self-heal back to canonical via reorg.
// Deploy ONLY on a dedicated fullnode — never on a sequencer or production.
go executor.forkInjectRoutine()

return executor, nil
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🔴 Critical | 🏗️ Heavy lift

Fork-injection runs on every non-dev executor, including production sequencers — no role gate.

The routine's own doc comment states it "MUST run only on a dedicated fullnode" and that on a sequencer "it would corrupt the chain it is supposed to produce." But this go executor.forkInjectRoutine() is spawned unconditionally for all non-DevSequencer executors. Production sequencers are not DevSequencer, so this code path would actively fork the canonical chain on a real sequencer.

There is also no enable/disable flag, contradicting the PR objective that this is "configurable via forkChaosInterval" and "never deployable on sequencer or production." Gate the spawn behind an explicit opt-in config that is verifiably off by default and that asserts a fullnode-only role.

🛡️ Suggested gating
-	// QA TEST BRANCH ONLY (test/fullnode-fork-selfheal): periodically fork the
-	// local chain so we can observe it self-heal back to canonical via reorg.
-	// Deploy ONLY on a dedicated fullnode — never on a sequencer or production.
-	go executor.forkInjectRoutine()
+	// QA TEST BRANCH ONLY (test/fullnode-fork-selfheal): periodically fork the
+	// local chain so we can observe it self-heal back to canonical via reorg.
+	// Deploy ONLY on a dedicated fullnode — never on a sequencer or production.
+	if config.ForkChaosEnabled {
+		go executor.forkInjectRoutine()
+	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@node/core/executor.go` around lines 131 - 137, The forkInjectRoutine call in
the executor initialization is spawning unconditionally for all non-DevSequencer
executors, which means it could corrupt production sequencers despite the
routine's documentation stating it should only run on dedicated fullnodes.
Replace the unconditional go executor.forkInjectRoutine() call with a gated
execution that first checks an explicit opt-in configuration flag (such as
forkChaosEnabled or similar, which should default to false) and asserts that the
executor is running in a fullnode-only role, not a sequencer or production role.
This ensures the fork injection routine only spawns when explicitly enabled and
only on appropriate fullnode instances.

Comment thread node/core/executor.go Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
node/derivation/derivation.go (1)

435-458: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Guard depthApprox against failed head reads and uint underflow.

Line 437 ignores BlockNumber errors, leaving preReorgHead as 0; Line 458 then subtracts batchInfo.firstBlockNumber, which can underflow and log a huge bogus reorg depth. Handle the read error and only compute depth when the head is at or after firstBlockNumber.

Suggested safe logging fix
-					preReorgHead, _ := d.l2Client.BlockNumber(ctx)
+					preReorgHead, preReorgHeadErr := d.l2Client.BlockNumber(ctx)
+					var depthApprox uint64
+					if preReorgHeadErr == nil && preReorgHead >= batchInfo.firstBlockNumber {
+						depthApprox = preReorgHead - batchInfo.firstBlockNumber + 1
+					}
 
 					err = d.withReactorsQuiesced(ctx, batchInfo.batchIndex, func() error {
 						var derErr error
 						lastHeader, derErr = d.deriveForce(batchInfoFull, 0)
 						return derErr
@@
 						"preReorgHead", preReorgHead,
+						"preReorgHeadErr", preReorgHeadErr,
 						"reorgedFromBlock", batchInfo.firstBlockNumber,
 						"reorgedToBlock", lastHeader.Number.Uint64(),
-						"depthApprox", preReorgHead-batchInfo.firstBlockNumber+1)
+						"depthApprox", depthApprox)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@node/derivation/derivation.go` around lines 435 - 458, The reorg recovery
logging in derivation.go needs to guard `depthApprox` against a failed
`d.l2Client.BlockNumber(ctx)` read and uint underflow. In the
`withReactorsQuiesced` block, handle the `BlockNumber` error instead of ignoring
it, and only log `depthApprox` when `preReorgHead` is valid and at or after
`batchInfo.firstBlockNumber`; otherwise omit it or use a safe fallback. Use the
existing symbols `preReorgHead`, `batchInfo.firstBlockNumber`, `lastHeader`, and
the `[3/3] RECOVERED` log to locate the change.
♻️ Duplicate comments (1)
node/core/executor.go (1)

507-520: 🩺 Stability & Availability | 🔴 Critical | 🏗️ Heavy lift

Gate fork injection behind explicit QA fullnode-only config.

This still injects forks for any executor whose incoming block extends the local head. The comment says this must never run on a sequencer or production, but the apply path shown has no opt-in flag or role guard, so a misdeployed binary can intentionally diverge the node. This is the same deployment-safety concern raised on the prior implementation.

Suggested guard shape
-	if head, hErr := e.l2Client.BlockNumber(context.Background()); hErr == nil && block.Number == head+1 {
+	if e.config.ForkChaosEnabled && e.config.IsFullnodeOnly {
+		head, hErr := e.l2Client.BlockNumber(context.Background())
+		if hErr == nil && block.Number == head+1 {
 		if forked, fErr := e.injectForkBlockLocked(block.Number); fErr == nil {
 			return forked, nil
 		} else {
 			e.logger.Error("FORK-INJECT failed; falling back to canonical apply", "number", block.Number, "err", fErr)
 		}
+		}
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@node/core/executor.go` around lines 507 - 520, The fork injection logic in
Executor.apply currently runs whenever the incoming block extends the local
head, which is too broad for QA-only behavior. Add an explicit opt-in guard in
the apply path around the BlockNumber check and injectForkBlockLocked call so
this branch only executes for the intended test/fullnode-fork-selfheal
configuration and never on sequencer or production roles. Use the existing
Executor and injectForkBlockLocked flow as the insertion point, and keep the
canonical fallback path unchanged when the guard is not enabled.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@node/core/executor.go`:
- Around line 157-170: The executor methods are mixing bounded and unbounded RPC
contexts while holding e.mu, which can block other work if an RPC stalls. In
ApplyBlockV2, reuse the existing timeout context for NewL2BlockV2 instead of
context.Background(), and in the BlockNumber path around the executor mutex
replace the background context with a bounded timeout context as well. Update
the affected calls in executor.go so both NewL2BlockV2 and BlockNumber use the
same finite-lifetime context pattern as BlockByNumber and
AssembleL2BlockV2WithTimestamp.

---

Outside diff comments:
In `@node/derivation/derivation.go`:
- Around line 435-458: The reorg recovery logging in derivation.go needs to
guard `depthApprox` against a failed `d.l2Client.BlockNumber(ctx)` read and uint
underflow. In the `withReactorsQuiesced` block, handle the `BlockNumber` error
instead of ignoring it, and only log `depthApprox` when `preReorgHead` is valid
and at or after `batchInfo.firstBlockNumber`; otherwise omit it or use a safe
fallback. Use the existing symbols `preReorgHead`, `batchInfo.firstBlockNumber`,
`lastHeader`, and the `[3/3] RECOVERED` log to locate the change.

---

Duplicate comments:
In `@node/core/executor.go`:
- Around line 507-520: The fork injection logic in Executor.apply currently runs
whenever the incoming block extends the local head, which is too broad for
QA-only behavior. Add an explicit opt-in guard in the apply path around the
BlockNumber check and injectForkBlockLocked call so this branch only executes
for the intended test/fullnode-fork-selfheal configuration and never on
sequencer or production roles. Use the existing Executor and
injectForkBlockLocked flow as the insertion point, and keep the canonical
fallback path unchanged when the guard is not enabled.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c9e4baef-3776-486a-818a-bce578c250c1

📥 Commits

Reviewing files that changed from the base of the PR and between 5be6a3b and 8b55d67.

📒 Files selected for processing (3)
  • node/core/executor.go
  • node/derivation/derivation.go
  • node/types/retryable_client.go

Comment thread node/core/executor.go
Comment on lines +157 to +170
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()

head, err := e.l2Client.BlockByNumber(ctx, nil)
if err != nil {
return false, fmt.Errorf("FORK-INJECT: read local head: %w", err)
}
forkTimestamp := head.Time()
l2Block, err := e.l2Client.AssembleL2BlockV2WithTimestamp(ctx, head.Hash(), forkTimestamp, nil)
if err != nil {
return false, fmt.Errorf("FORK-INJECT: assemble fork block: %w", err)
}
if _, err := e.l2Client.NewL2BlockV2(context.Background(), l2Block); err != nil {
return false, fmt.Errorf("FORK-INJECT: apply fork block: %w", err)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Do not use unbounded RPC contexts while holding e.mu.

Line 157 creates a 30s context, but Line 169 discards it for NewL2BlockV2; Line 514 also calls BlockNumber with context.Background() inside the executor mutex. If either retrying RPC stalls, ApplyBlockV2 can block other executor work behind the lock. Use bounded contexts for both calls.

Suggested bounded-context fix
-	if _, err := e.l2Client.NewL2BlockV2(context.Background(), l2Block); err != nil {
+	if _, err := e.l2Client.NewL2BlockV2(ctx, l2Block); err != nil {
 		return false, fmt.Errorf("FORK-INJECT: apply fork block: %w", err)
 	}
-	if head, hErr := e.l2Client.BlockNumber(context.Background()); hErr == nil && block.Number == head+1 {
+	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
+	defer cancel()
+	if head, hErr := e.l2Client.BlockNumber(ctx); hErr == nil && block.Number == head+1 {

Also applies to: 514-519

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@node/core/executor.go` around lines 157 - 170, The executor methods are
mixing bounded and unbounded RPC contexts while holding e.mu, which can block
other work if an RPC stalls. In ApplyBlockV2, reuse the existing timeout context
for NewL2BlockV2 instead of context.Background(), and in the BlockNumber path
around the executor mutex replace the background context with a bounded timeout
context as well. Update the affected calls in executor.go so both NewL2BlockV2
and BlockNumber use the same finite-lifetime context pattern as BlockByNumber
and AssembleL2BlockV2WithTimestamp.

@tomatoishealthy

Copy link
Copy Markdown
Contributor

LGTM

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@node/derivation/batch_info.go`:
- Around line 258-280: The transaction comparison in blockContentMatches is only
checking SafeL2Data.Transactions against the full local block tx list, so blocks
with L1-message prefixes are falsely marked mismatched. Update
blockContentMatches to account for the L1-message prefix by comparing the local
transactions after the prefix against sd.Transactions, and make sure the caller
in deriveForce uses the same prefix-aware matching logic when deciding whether
to rewrite canonical blocks.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c028e9b3-af50-43bd-b896-0bf695537830

📥 Commits

Reviewing files that changed from the base of the PR and between 5391ac1 and 5aea138.

📒 Files selected for processing (2)
  • node/derivation/batch_info.go
  • node/derivation/derivation.go

Comment thread node/derivation/batch_info.go
Continuously fork the local tip in the block-production path (distinct
timestamp so the fork is guaranteed to diverge from canonical) and emit
staged REORG-TEST [1/3] FORK / [2/3] DETECT / [3/3] RECOVER logs so QA can
watch a fullnode diverge and heal via L1 derivation. QA-only; not for
production.

Co-authored-by: Cursor <cursoragent@cursor.com>
@curryxbo curryxbo force-pushed the test/fullnode-fork-selfheal branch from 5aea138 to b59c859 Compare June 26, 2026 02:35
…them (#1008)

Verify-driven deriveForce: walking from the canonical anchor
(firstBlockNumber-1), keep each local block while its content matches the
batch, then rewrite via NewSafeL2Block from the first divergent or missing
height onward. Replaces the old skipNumber fill-gap, which blindly trusted
local blocks and could grow a permanent shadow chain (correct tx content,
wrong parent/state hashes) that never self-healed.

This is the fix under test by the chaos hook in the preceding commit; it is
the same change as PR #1008.

Co-authored-by: Cursor <cursoragent@cursor.com>
@curryxbo curryxbo force-pushed the test/fullnode-fork-selfheal branch from b59c859 to 03bd02b Compare June 26, 2026 03:24

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@node/derivation/derivation.go`:
- Around line 933-949: The keep-local-block path in derivation reconciliation
can accept a matching block even when the batch context skips numbers, so add a
contiguity guard in the block iteration that compares each
`blockData.SafeL2Data.Number` against `lastHeader.Number+1` before calling
`d.l2Client.BlockByNumber`. If the next number is not exactly consecutive,
switch to the rewrite path instead of continuing the keep logic, and keep the
existing `rewriting` flow and `blockContentMatches` check in `deriveForce`/the
`rollupData.blockContexts` loop.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: af02f978-f84e-4089-babd-6124e956890a

📥 Commits

Reviewing files that changed from the base of the PR and between 5aea138 and 03bd02b.

📒 Files selected for processing (4)
  • node/core/executor.go
  • node/derivation/batch_info.go
  • node/derivation/derivation.go
  • node/types/retryable_client.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • node/derivation/batch_info.go
  • node/types/retryable_client.go
  • node/core/executor.go

Comment on lines +933 to +949
rewriting := false
for _, blockData := range rollupData.blockContexts {
// Skip blocks already present locally (scenario C). For scenario B
// skipNumber == 0 means this branch is never taken.
if blockData.SafeL2Data.Number <= skipNumber {
continue
// Keep the local block while its content still matches the batch; at
// the first divergent or missing height switch to rewrite for the
// rest of the range (canonical by induction from the anchor).
if !rewriting {
local, lErr := d.l2Client.BlockByNumber(d.ctx, big.NewInt(int64(blockData.SafeL2Data.Number)))
if lErr != nil && !errors.Is(lErr, ethereum.NotFound) {
return nil, fmt.Errorf("read local block %d: %w", blockData.SafeL2Data.Number, lErr)
}
if local != nil && blockContentMatches(local, blockData.SafeL2Data) {
lastHeader = local.Header()
continue
}
rewriting = true
d.logger.Info("deriveForce: local fork/gap; rewriting batch tail onto canonical",
"batchIndex", rollupData.batchIndex, "fromBlock", blockData.SafeL2Data.Number)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Validate contiguous block numbers before keeping local blocks.

The new keep path can jump from lastHeader to any existing local SafeL2Data.Number that content-matches. Since ParseBatch only decodes block context numbers in the provided contract, add an explicit lastHeader.Number+1 check before BlockByNumber so malformed/non-contiguous batch contexts cannot be treated as reconciled without rewriting.

Proposed fix
 	rewriting := false
 	for _, blockData := range rollupData.blockContexts {
+		expectedNumber := lastHeader.Number.Uint64() + 1
+		if blockData.SafeL2Data.Number != expectedNumber {
+			return nil, fmt.Errorf("non-contiguous batch block: got %d after %d, want %d",
+				blockData.SafeL2Data.Number, lastHeader.Number.Uint64(), expectedNumber)
+		}
+
 		// Keep the local block while its content still matches the batch; at
 		// the first divergent or missing height switch to rewrite for the
 		// rest of the range (canonical by induction from the anchor).
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
rewriting := false
for _, blockData := range rollupData.blockContexts {
// Skip blocks already present locally (scenario C). For scenario B
// skipNumber == 0 means this branch is never taken.
if blockData.SafeL2Data.Number <= skipNumber {
continue
// Keep the local block while its content still matches the batch; at
// the first divergent or missing height switch to rewrite for the
// rest of the range (canonical by induction from the anchor).
if !rewriting {
local, lErr := d.l2Client.BlockByNumber(d.ctx, big.NewInt(int64(blockData.SafeL2Data.Number)))
if lErr != nil && !errors.Is(lErr, ethereum.NotFound) {
return nil, fmt.Errorf("read local block %d: %w", blockData.SafeL2Data.Number, lErr)
}
if local != nil && blockContentMatches(local, blockData.SafeL2Data) {
lastHeader = local.Header()
continue
}
rewriting = true
d.logger.Info("deriveForce: local fork/gap; rewriting batch tail onto canonical",
"batchIndex", rollupData.batchIndex, "fromBlock", blockData.SafeL2Data.Number)
rewriting := false
for _, blockData := range rollupData.blockContexts {
expectedNumber := lastHeader.Number.Uint64() + 1
if blockData.SafeL2Data.Number != expectedNumber {
return nil, fmt.Errorf("non-contiguous batch block: got %d after %d, want %d",
blockData.SafeL2Data.Number, lastHeader.Number.Uint64(), expectedNumber)
}
// Keep the local block while its content still matches the batch; at
// the first divergent or missing height switch to rewrite for the
// rest of the range (canonical by induction from the anchor).
if !rewriting {
local, lErr := d.l2Client.BlockByNumber(d.ctx, big.NewInt(int64(blockData.SafeL2Data.Number)))
if lErr != nil && !errors.Is(lErr, ethereum.NotFound) {
return nil, fmt.Errorf("read local block %d: %w", blockData.SafeL2Data.Number, lErr)
}
if local != nil && blockContentMatches(local, blockData.SafeL2Data) {
lastHeader = local.Header()
continue
}
rewriting = true
d.logger.Info("deriveForce: local fork/gap; rewriting batch tail onto canonical",
"batchIndex", rollupData.batchIndex, "fromBlock", blockData.SafeL2Data.Number)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@node/derivation/derivation.go` around lines 933 - 949, The keep-local-block
path in derivation reconciliation can accept a matching block even when the
batch context skips numbers, so add a contiguity guard in the block iteration
that compares each `blockData.SafeL2Data.Number` against `lastHeader.Number+1`
before calling `d.l2Client.BlockByNumber`. If the next number is not exactly
consecutive, switch to the rewrite path instead of continuing the keep logic,
and keep the existing `rewriting` flow and `blockContentMatches` check in
`deriveForce`/the `rollupData.blockContexts` loop.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants