test(derivation): continuous fullnode fork → self-heal chaos hook (QA only)#1005
test(derivation): continuous fullnode fork → self-heal chaos hook (QA only)#1005curryxbo wants to merge 2 commits into
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThe executor now conditionally injects a next-height forked L2 block, L2 block assembly can take an explicit timestamp, and derivation updates its batch parsing, matching, and recovery flow. ChangesQA Fork Injection and Reorg Recovery
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
node/core/executor.go (1)
139-142: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueInterval is a hardcoded const, not configurable; timeout literal duplicates it.
PR objectives and the change summary describe the cadence as configurable via
forkChaosInterval(default 30s), but it is a compile-timeconstwith no config wiring. Separately,injectForkBlockhardcodes30*time.Secondfor the read/assemble/apply timeout (Line 168) rather than referencing a named value, so the two30svalues can silently drift. Source the interval from config and give the per-iteration timeout its own named constant.Also applies to: 167-168
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@node/core/executor.go` around lines 139 - 142, The forkChaosInterval is currently a hardcoded const but should be sourced from configuration with a default of 30 seconds instead of being compile-time fixed. Additionally, the injectForkBlock function contains a hardcoded 30*time.Second timeout literal that duplicates the interval value, which can drift independently. Remove the const forkChaosInterval declaration, wire it to be read from config with a 30-second default, and create a separate named constant for the per-iteration read/assemble/apply timeout used in injectForkBlock (around line 168) to replace the hardcoded literal so both timing values stay synchronized and are maintainable.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@node/core/executor.go`:
- Around line 154-162: The forkInjectRoutine function has an infinite loop with
no graceful shutdown path, causing goroutine leaks during executor shutdown. Add
a context.Context parameter to the forkInjectRoutine function and replace the
infinite for range t.C loop with a select statement that listens to both t.C and
ctx.Done(), returning when the context is cancelled. Ensure the context is
passed from the caller to properly coordinate shutdown with the executor's
lifecycle, similar to the pattern used in Derivation.Stop() in the derivation
package.
- Around line 131-137: The forkInjectRoutine call in the executor initialization
is spawning unconditionally for all non-DevSequencer executors, which means it
could corrupt production sequencers despite the routine's documentation stating
it should only run on dedicated fullnodes. Replace the unconditional go
executor.forkInjectRoutine() call with a gated execution that first checks an
explicit opt-in configuration flag (such as forkChaosEnabled or similar, which
should default to false) and asserts that the executor is running in a
fullnode-only role, not a sequencer or production role. This ensures the fork
injection routine only spawns when explicitly enabled and only on appropriate
fullnode instances.
---
Nitpick comments:
In `@node/core/executor.go`:
- Around line 139-142: The forkChaosInterval is currently a hardcoded const but
should be sourced from configuration with a default of 30 seconds instead of
being compile-time fixed. Additionally, the injectForkBlock function contains a
hardcoded 30*time.Second timeout literal that duplicates the interval value,
which can drift independently. Remove the const forkChaosInterval declaration,
wire it to be read from config with a 30-second default, and create a separate
named constant for the per-iteration read/assemble/apply timeout used in
injectForkBlock (around line 168) to replace the hardcoded literal so both
timing values stay synchronized and are maintainable.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
| // QA TEST BRANCH ONLY (test/fullnode-fork-selfheal): periodically fork the | ||
| // local chain so we can observe it self-heal back to canonical via reorg. | ||
| // Deploy ONLY on a dedicated fullnode — never on a sequencer or production. | ||
| go executor.forkInjectRoutine() | ||
|
|
||
| return executor, nil | ||
| } |
There was a problem hiding this comment.
🩺 Stability & Availability | 🔴 Critical | 🏗️ Heavy lift
Fork-injection runs on every non-dev executor, including production sequencers — no role gate.
The routine's own doc comment states it "MUST run only on a dedicated fullnode" and that on a sequencer "it would corrupt the chain it is supposed to produce." But this go executor.forkInjectRoutine() is spawned unconditionally for all non-DevSequencer executors. Production sequencers are not DevSequencer, so this code path would actively fork the canonical chain on a real sequencer.
There is also no enable/disable flag, contradicting the PR objective that this is "configurable via forkChaosInterval" and "never deployable on sequencer or production." Gate the spawn behind an explicit opt-in config that is verifiably off by default and that asserts a fullnode-only role.
🛡️ Suggested gating
- // QA TEST BRANCH ONLY (test/fullnode-fork-selfheal): periodically fork the
- // local chain so we can observe it self-heal back to canonical via reorg.
- // Deploy ONLY on a dedicated fullnode — never on a sequencer or production.
- go executor.forkInjectRoutine()
+ // QA TEST BRANCH ONLY (test/fullnode-fork-selfheal): periodically fork the
+ // local chain so we can observe it self-heal back to canonical via reorg.
+ // Deploy ONLY on a dedicated fullnode — never on a sequencer or production.
+ if config.ForkChaosEnabled {
+ go executor.forkInjectRoutine()
+ }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@node/core/executor.go` around lines 131 - 137, The forkInjectRoutine call in
the executor initialization is spawning unconditionally for all non-DevSequencer
executors, which means it could corrupt production sequencers despite the
routine's documentation stating it should only run on dedicated fullnodes.
Replace the unconditional go executor.forkInjectRoutine() call with a gated
execution that first checks an explicit opt-in configuration flag (such as
forkChaosEnabled or similar, which should default to false) and asserts that the
executor is running in a fullnode-only role, not a sequencer or production role.
This ensures the fork injection routine only spawns when explicitly enabled and
only on appropriate fullnode instances.
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
node/derivation/derivation.go (1)
435-458: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick winGuard
depthApproxagainst failed head reads and uint underflow.Line 437 ignores
BlockNumbererrors, leavingpreReorgHeadas0; Line 458 then subtractsbatchInfo.firstBlockNumber, which can underflow and log a huge bogus reorg depth. Handle the read error and only compute depth when the head is at or afterfirstBlockNumber.Suggested safe logging fix
- preReorgHead, _ := d.l2Client.BlockNumber(ctx) + preReorgHead, preReorgHeadErr := d.l2Client.BlockNumber(ctx) + var depthApprox uint64 + if preReorgHeadErr == nil && preReorgHead >= batchInfo.firstBlockNumber { + depthApprox = preReorgHead - batchInfo.firstBlockNumber + 1 + } err = d.withReactorsQuiesced(ctx, batchInfo.batchIndex, func() error { var derErr error lastHeader, derErr = d.deriveForce(batchInfoFull, 0) return derErr @@ "preReorgHead", preReorgHead, + "preReorgHeadErr", preReorgHeadErr, "reorgedFromBlock", batchInfo.firstBlockNumber, "reorgedToBlock", lastHeader.Number.Uint64(), - "depthApprox", preReorgHead-batchInfo.firstBlockNumber+1) + "depthApprox", depthApprox)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@node/derivation/derivation.go` around lines 435 - 458, The reorg recovery logging in derivation.go needs to guard `depthApprox` against a failed `d.l2Client.BlockNumber(ctx)` read and uint underflow. In the `withReactorsQuiesced` block, handle the `BlockNumber` error instead of ignoring it, and only log `depthApprox` when `preReorgHead` is valid and at or after `batchInfo.firstBlockNumber`; otherwise omit it or use a safe fallback. Use the existing symbols `preReorgHead`, `batchInfo.firstBlockNumber`, `lastHeader`, and the `[3/3] RECOVERED` log to locate the change.
♻️ Duplicate comments (1)
node/core/executor.go (1)
507-520: 🩺 Stability & Availability | 🔴 Critical | 🏗️ Heavy liftGate fork injection behind explicit QA fullnode-only config.
This still injects forks for any executor whose incoming block extends the local head. The comment says this must never run on a sequencer or production, but the apply path shown has no opt-in flag or role guard, so a misdeployed binary can intentionally diverge the node. This is the same deployment-safety concern raised on the prior implementation.
Suggested guard shape
- if head, hErr := e.l2Client.BlockNumber(context.Background()); hErr == nil && block.Number == head+1 { + if e.config.ForkChaosEnabled && e.config.IsFullnodeOnly { + head, hErr := e.l2Client.BlockNumber(context.Background()) + if hErr == nil && block.Number == head+1 { if forked, fErr := e.injectForkBlockLocked(block.Number); fErr == nil { return forked, nil } else { e.logger.Error("FORK-INJECT failed; falling back to canonical apply", "number", block.Number, "err", fErr) } + } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@node/core/executor.go` around lines 507 - 520, The fork injection logic in Executor.apply currently runs whenever the incoming block extends the local head, which is too broad for QA-only behavior. Add an explicit opt-in guard in the apply path around the BlockNumber check and injectForkBlockLocked call so this branch only executes for the intended test/fullnode-fork-selfheal configuration and never on sequencer or production roles. Use the existing Executor and injectForkBlockLocked flow as the insertion point, and keep the canonical fallback path unchanged when the guard is not enabled.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@node/core/executor.go`:
- Around line 157-170: The executor methods are mixing bounded and unbounded RPC
contexts while holding e.mu, which can block other work if an RPC stalls. In
ApplyBlockV2, reuse the existing timeout context for NewL2BlockV2 instead of
context.Background(), and in the BlockNumber path around the executor mutex
replace the background context with a bounded timeout context as well. Update
the affected calls in executor.go so both NewL2BlockV2 and BlockNumber use the
same finite-lifetime context pattern as BlockByNumber and
AssembleL2BlockV2WithTimestamp.
---
Outside diff comments:
In `@node/derivation/derivation.go`:
- Around line 435-458: The reorg recovery logging in derivation.go needs to
guard `depthApprox` against a failed `d.l2Client.BlockNumber(ctx)` read and uint
underflow. In the `withReactorsQuiesced` block, handle the `BlockNumber` error
instead of ignoring it, and only log `depthApprox` when `preReorgHead` is valid
and at or after `batchInfo.firstBlockNumber`; otherwise omit it or use a safe
fallback. Use the existing symbols `preReorgHead`, `batchInfo.firstBlockNumber`,
`lastHeader`, and the `[3/3] RECOVERED` log to locate the change.
---
Duplicate comments:
In `@node/core/executor.go`:
- Around line 507-520: The fork injection logic in Executor.apply currently runs
whenever the incoming block extends the local head, which is too broad for
QA-only behavior. Add an explicit opt-in guard in the apply path around the
BlockNumber check and injectForkBlockLocked call so this branch only executes
for the intended test/fullnode-fork-selfheal configuration and never on
sequencer or production roles. Use the existing Executor and
injectForkBlockLocked flow as the insertion point, and keep the canonical
fallback path unchanged when the guard is not enabled.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c9e4baef-3776-486a-818a-bce578c250c1
📒 Files selected for processing (3)
node/core/executor.gonode/derivation/derivation.gonode/types/retryable_client.go
| ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second) | ||
| defer cancel() | ||
|
|
||
| head, err := e.l2Client.BlockByNumber(ctx, nil) | ||
| if err != nil { | ||
| return false, fmt.Errorf("FORK-INJECT: read local head: %w", err) | ||
| } | ||
| forkTimestamp := head.Time() | ||
| l2Block, err := e.l2Client.AssembleL2BlockV2WithTimestamp(ctx, head.Hash(), forkTimestamp, nil) | ||
| if err != nil { | ||
| return false, fmt.Errorf("FORK-INJECT: assemble fork block: %w", err) | ||
| } | ||
| if _, err := e.l2Client.NewL2BlockV2(context.Background(), l2Block); err != nil { | ||
| return false, fmt.Errorf("FORK-INJECT: apply fork block: %w", err) |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟠 Major | ⚡ Quick win
Do not use unbounded RPC contexts while holding e.mu.
Line 157 creates a 30s context, but Line 169 discards it for NewL2BlockV2; Line 514 also calls BlockNumber with context.Background() inside the executor mutex. If either retrying RPC stalls, ApplyBlockV2 can block other executor work behind the lock. Use bounded contexts for both calls.
Suggested bounded-context fix
- if _, err := e.l2Client.NewL2BlockV2(context.Background(), l2Block); err != nil {
+ if _, err := e.l2Client.NewL2BlockV2(ctx, l2Block); err != nil {
return false, fmt.Errorf("FORK-INJECT: apply fork block: %w", err)
}- if head, hErr := e.l2Client.BlockNumber(context.Background()); hErr == nil && block.Number == head+1 {
+ ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
+ defer cancel()
+ if head, hErr := e.l2Client.BlockNumber(ctx); hErr == nil && block.Number == head+1 {Also applies to: 514-519
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@node/core/executor.go` around lines 157 - 170, The executor methods are
mixing bounded and unbounded RPC contexts while holding e.mu, which can block
other work if an RPC stalls. In ApplyBlockV2, reuse the existing timeout context
for NewL2BlockV2 instead of context.Background(), and in the BlockNumber path
around the executor mutex replace the background context with a bounded timeout
context as well. Update the affected calls in executor.go so both NewL2BlockV2
and BlockNumber use the same finite-lifetime context pattern as BlockByNumber
and AssembleL2BlockV2WithTimestamp.
|
LGTM |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@node/derivation/batch_info.go`:
- Around line 258-280: The transaction comparison in blockContentMatches is only
checking SafeL2Data.Transactions against the full local block tx list, so blocks
with L1-message prefixes are falsely marked mismatched. Update
blockContentMatches to account for the L1-message prefix by comparing the local
transactions after the prefix against sd.Transactions, and make sure the caller
in deriveForce uses the same prefix-aware matching logic when deciding whether
to rewrite canonical blocks.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c028e9b3-af50-43bd-b896-0bf695537830
📒 Files selected for processing (2)
node/derivation/batch_info.gonode/derivation/derivation.go
Continuously fork the local tip in the block-production path (distinct timestamp so the fork is guaranteed to diverge from canonical) and emit staged REORG-TEST [1/3] FORK / [2/3] DETECT / [3/3] RECOVER logs so QA can watch a fullnode diverge and heal via L1 derivation. QA-only; not for production. Co-authored-by: Cursor <cursoragent@cursor.com>
5aea138 to
b59c859
Compare
…them (#1008) Verify-driven deriveForce: walking from the canonical anchor (firstBlockNumber-1), keep each local block while its content matches the batch, then rewrite via NewSafeL2Block from the first divergent or missing height onward. Replaces the old skipNumber fill-gap, which blindly trusted local blocks and could grow a permanent shadow chain (correct tx content, wrong parent/state hashes) that never self-healed. This is the fix under test by the chaos hook in the preceding commit; it is the same change as PR #1008. Co-authored-by: Cursor <cursoragent@cursor.com>
b59c859 to
03bd02b
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@node/derivation/derivation.go`:
- Around line 933-949: The keep-local-block path in derivation reconciliation
can accept a matching block even when the batch context skips numbers, so add a
contiguity guard in the block iteration that compares each
`blockData.SafeL2Data.Number` against `lastHeader.Number+1` before calling
`d.l2Client.BlockByNumber`. If the next number is not exactly consecutive,
switch to the rewrite path instead of continuing the keep logic, and keep the
existing `rewriting` flow and `blockContentMatches` check in `deriveForce`/the
`rollupData.blockContexts` loop.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: af02f978-f84e-4089-babd-6124e956890a
📒 Files selected for processing (4)
node/core/executor.gonode/derivation/batch_info.gonode/derivation/derivation.gonode/types/retryable_client.go
🚧 Files skipped from review as they are similar to previous changes (3)
- node/derivation/batch_info.go
- node/types/retryable_client.go
- node/core/executor.go
| rewriting := false | ||
| for _, blockData := range rollupData.blockContexts { | ||
| // Skip blocks already present locally (scenario C). For scenario B | ||
| // skipNumber == 0 means this branch is never taken. | ||
| if blockData.SafeL2Data.Number <= skipNumber { | ||
| continue | ||
| // Keep the local block while its content still matches the batch; at | ||
| // the first divergent or missing height switch to rewrite for the | ||
| // rest of the range (canonical by induction from the anchor). | ||
| if !rewriting { | ||
| local, lErr := d.l2Client.BlockByNumber(d.ctx, big.NewInt(int64(blockData.SafeL2Data.Number))) | ||
| if lErr != nil && !errors.Is(lErr, ethereum.NotFound) { | ||
| return nil, fmt.Errorf("read local block %d: %w", blockData.SafeL2Data.Number, lErr) | ||
| } | ||
| if local != nil && blockContentMatches(local, blockData.SafeL2Data) { | ||
| lastHeader = local.Header() | ||
| continue | ||
| } | ||
| rewriting = true | ||
| d.logger.Info("deriveForce: local fork/gap; rewriting batch tail onto canonical", | ||
| "batchIndex", rollupData.batchIndex, "fromBlock", blockData.SafeL2Data.Number) |
There was a problem hiding this comment.
🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
Validate contiguous block numbers before keeping local blocks.
The new keep path can jump from lastHeader to any existing local SafeL2Data.Number that content-matches. Since ParseBatch only decodes block context numbers in the provided contract, add an explicit lastHeader.Number+1 check before BlockByNumber so malformed/non-contiguous batch contexts cannot be treated as reconciled without rewriting.
Proposed fix
rewriting := false
for _, blockData := range rollupData.blockContexts {
+ expectedNumber := lastHeader.Number.Uint64() + 1
+ if blockData.SafeL2Data.Number != expectedNumber {
+ return nil, fmt.Errorf("non-contiguous batch block: got %d after %d, want %d",
+ blockData.SafeL2Data.Number, lastHeader.Number.Uint64(), expectedNumber)
+ }
+
// Keep the local block while its content still matches the batch; at
// the first divergent or missing height switch to rewrite for the
// rest of the range (canonical by induction from the anchor).📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| rewriting := false | |
| for _, blockData := range rollupData.blockContexts { | |
| // Skip blocks already present locally (scenario C). For scenario B | |
| // skipNumber == 0 means this branch is never taken. | |
| if blockData.SafeL2Data.Number <= skipNumber { | |
| continue | |
| // Keep the local block while its content still matches the batch; at | |
| // the first divergent or missing height switch to rewrite for the | |
| // rest of the range (canonical by induction from the anchor). | |
| if !rewriting { | |
| local, lErr := d.l2Client.BlockByNumber(d.ctx, big.NewInt(int64(blockData.SafeL2Data.Number))) | |
| if lErr != nil && !errors.Is(lErr, ethereum.NotFound) { | |
| return nil, fmt.Errorf("read local block %d: %w", blockData.SafeL2Data.Number, lErr) | |
| } | |
| if local != nil && blockContentMatches(local, blockData.SafeL2Data) { | |
| lastHeader = local.Header() | |
| continue | |
| } | |
| rewriting = true | |
| d.logger.Info("deriveForce: local fork/gap; rewriting batch tail onto canonical", | |
| "batchIndex", rollupData.batchIndex, "fromBlock", blockData.SafeL2Data.Number) | |
| rewriting := false | |
| for _, blockData := range rollupData.blockContexts { | |
| expectedNumber := lastHeader.Number.Uint64() + 1 | |
| if blockData.SafeL2Data.Number != expectedNumber { | |
| return nil, fmt.Errorf("non-contiguous batch block: got %d after %d, want %d", | |
| blockData.SafeL2Data.Number, lastHeader.Number.Uint64(), expectedNumber) | |
| } | |
| // Keep the local block while its content still matches the batch; at | |
| // the first divergent or missing height switch to rewrite for the | |
| // rest of the range (canonical by induction from the anchor). | |
| if !rewriting { | |
| local, lErr := d.l2Client.BlockByNumber(d.ctx, big.NewInt(int64(blockData.SafeL2Data.Number))) | |
| if lErr != nil && !errors.Is(lErr, ethereum.NotFound) { | |
| return nil, fmt.Errorf("read local block %d: %w", blockData.SafeL2Data.Number, lErr) | |
| } | |
| if local != nil && blockContentMatches(local, blockData.SafeL2Data) { | |
| lastHeader = local.Header() | |
| continue | |
| } | |
| rewriting = true | |
| d.logger.Info("deriveForce: local fork/gap; rewriting batch tail onto canonical", | |
| "batchIndex", rollupData.batchIndex, "fromBlock", blockData.SafeL2Data.Number) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@node/derivation/derivation.go` around lines 933 - 949, The keep-local-block
path in derivation reconciliation can accept a matching block even when the
batch context skips numbers, so add a contiguity guard in the block iteration
that compares each `blockData.SafeL2Data.Number` against `lastHeader.Number+1`
before calling `d.l2Client.BlockByNumber`. If the next number is not exactly
consecutive, switch to the rewrite path instead of continuing the keep logic,
and keep the existing `rewriting` flow and `blockContentMatches` check in
`deriveForce`/the `rollupData.blockContexts` loop.
What
QA-only test branch (
test/fullnode-fork-selfheal) to exercise the centralized-sequencer reorg / self-heal path on a live fullnode, with prominent staged logs.This node continuously forks the chain tip. For every canonical block that extends the local head (
number == head+1), instead of applying it we assemble this node's own block at the same height and apply that. The fork block's timestamp is pinned to the parent's (valid post-Emerald:header.Time == parent.Time), so its hash is guaranteed to differ from the canonical sibling (which usestime.Now()).tendermint's
StateV2still advances on the canonical block it handed us, so geth runs a private fork at the unsafe tip whileStateV2tracks canonical heights.How it heals
Recovery uses the real self-heal path — nothing in the reorg logic is changed:
deriveForcewrapped inwithReactorsQuiesced(SetCanonical->StartReactorsAfterReorg), which rewrites the whole batch back onto canonical and re-syncsStateV2from geth.Changes (logs + a one-line block swap only)
node/core/executor.go—ApplyBlockV2: when the incoming block extends the tip, callinjectForkBlockLocked(assemble own block on current head, apply it) instead of applying canonical. LogsREORG-TEST [1/3] FORKEDper fork.node/derivation/derivation.go— on the existing blob-mismatch ->deriveForcepath, addREORG-TEST [2/3] REORG DETECTEDandREORG-TEST [3/3] RECOVERED(with approximate reorg depth). Log-only.Log markers
grep:
Safety
QA chaos branch. Deploy ONLY on a dedicated fullnode — never on a sequencer or in production. Based on
origin/main.Summary by CodeRabbit