Skip to content

feat: add metric for sequencer mode#1007

Merged
tomatoishealthy merged 2 commits into
mainfrom
feat/sequencer-metric
Jun 25, 2026
Merged

feat: add metric for sequencer mode#1007
tomatoishealthy merged 2 commits into
mainfrom
feat/sequencer-metric

Conversation

@tomatoishealthy

@tomatoishealthy tomatoishealthy commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Summary by CodeRabbit

  • New Features
    • Added Prometheus metrics for HA coordination (Raft state/cluster membership), FSM apply/commit timings, and FSM→broadcast channel drops.
    • Added L1 lag (seconds) tracking metrics for the L1 health tracker.
    • Added Grafana dashboard + provisioning, and enabled Prometheus scraping in devnet configs (plus Prometheus scrape configuration).
  • Bug Fixes
    • Improved reliability by refreshing cached cluster-member state after membership/raft changes; also refreshed after successful joins.
  • Tests
    • Added unit tests covering Raft metrics observation filtering and immediate Raft-state gauge updates.
  • Chores
    • Updated Tendermint fork version pins used across builds.

@tomatoishealthy tomatoishealthy requested a review from a team as a code owner June 25, 2026 04:11
@tomatoishealthy tomatoishealthy requested review from r3aker86 and removed request for a team June 25, 2026 04:11
@coderabbitai

coderabbitai Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2656cca5-8fff-407b-8b03-d635c2c69d73

📥 Commits

Reviewing files that changed from the base of the PR and between 5a2e7d0 and c04dd02.

⛔ Files ignored due to path filters (2)
  • node/go.sum is excluded by !**/*.sum
  • ops/tools/go.sum is excluded by !**/*.sum
📒 Files selected for processing (8)
  • Makefile
  • bindings/go.mod
  • common/go.mod
  • contracts/go.mod
  • node/go.mod
  • ops/l2-genesis/go.mod
  • ops/tools/go.mod
  • tx-submitter/go.mod
✅ Files skipped from review due to trivial changes (2)
  • tx-submitter/go.mod
  • Makefile

📝 Walkthrough

Walkthrough

Adds HA-keeper and L1 metrics wiring, Raft observation handling, Prometheus/Grafana provisioning, a node metrics dashboard, and Tendermint dependency replacement updates.

Changes

Metrics observability rollout

Layer / File(s) Summary
HA-keeper metrics core
node/hakeeper/metrics.go, node/hakeeper/block_fsm.go
Defines HA-keeper metric collectors and records BlockFSM apply duration and dropped blocks.
HAService Raft wiring
node/hakeeper/ha_service.go, node/hakeeper/raft_metrics.go, node/hakeeper/ha_service_test.go
Creates HAService metrics, starts Raft observation handling, records commit timings, and refreshes cluster-member state after Raft changes.
L1 lag metrics
node/l1sequencer/metrics.go, node/l1sequencer/tracker.go, node/l1sequencer/tracker_test.go
Defines L1 lag metrics, updates the tracker to record lag on each poll, and initializes test trackers with no-op metrics.
Metrics provisioning
ops/devnet-morph/devnet/setup_nodes.py, ops/docker/prometheus/prometheus.yml, ops/docker/grafana/datasources/prometheus.yml, ops/docker/grafana/dashboards/dashboards.yml
Enables Prometheus metrics in devnet and adds Prometheus and Grafana provisioning files.
Morph node dashboard
ops/docker/grafana/dashboards/json/morph-node.json
Defines the Grafana dashboard metadata, rows, panels, thresholds, and dashboard settings for node metrics.

Tendermint dependency refresh

Layer / File(s) Summary
Dependency version bump
Makefile, bindings/go.mod, common/go.mod, contracts/go.mod, node/go.mod, ops/l2-genesis/go.mod, ops/tools/go.mod, tx-submitter/go.mod
Updates the Tendermint target version and module replace directives across build and Go modules.

Sequence Diagram(s)

sequenceDiagram
  participant HAService
  participant BlockFSM
  participant Metrics
  participant raftRaft
  participant raftObserver

  HAService->>Metrics: PrometheusMetrics(cfg.ServerID)
  HAService->>BlockFSM: NewBlockFSM(logger, metrics)
  HAService->>raftRaft: register observer
  raftRaft-->>raftObserver: RaftState / PeerObservation
  raftObserver->>HAService: handleRaftObservation(observation)
  HAService->>Metrics: SetRaftState / SetClusterMembers
  HAService->>Metrics: ObserveCommitDuration(encode, raft)
  BlockFSM->>Metrics: ObserveFSMApplyDuration(d)
  BlockFSM->>Metrics: IncFSMBlockChannelDrops()
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • r3aker86
  • chengwenxi
  • Web3Jumb0

Poem

I hopped by gauges, bright and new,
while raft winds hummed the metrics through.
A dashboard blinked, the lag was tallied,
and bunny paws said, “all is rallied!”
🐇✨

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Title check ⚠️ Warning The title mentions a sequencer mode metric, but the changes add broader HA and L1 lag metrics rather than any sequencer mode metric. Retitle it to reflect the main change, such as adding HA and L1 sequencer lag metrics and related dashboards.
Docstring Coverage ⚠️ Warning Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/sequencer-metric

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
ops/devnet-morph/devnet/setup_nodes.py (1)

91-93: 🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick win

Make the Prometheus toggle resilient to TOML formatting changes.

Line 92 uses exact text replacement; if config.toml formatting changes, metrics stay disabled with no signal. Use a key-based regex replacement and assert at least one match.

Suggested patch
-        # Enable prometheus metrics for all nodes
-        content = content.replace('prometheus = false', 'prometheus = true')
+        # Enable prometheus metrics for all nodes (format-tolerant)
+        content, n = re.subn(
+            r'(?m)^\s*prometheus\s*=\s*false\s*$',
+            'prometheus = true',
+            content,
+        )
+        if n == 0:
+            print(f"Error: failed to enable prometheus in {config_file}. Exiting.")
+            sys.exit(1)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ops/devnet-morph/devnet/setup_nodes.py` around lines 91 - 93, The Prometheus
enablement in setup_nodes.py is too brittle because it relies on an exact string
replace in the content variable. Update the logic around the prometheus toggle
to use a key-based regex replacement that matches the prometheus setting
regardless of TOML spacing/formatting, and add an assertion or explicit failure
if no matches are found. Keep the change localized to the config update path in
the setup_nodes.py flow.
node/l1sequencer/tracker.go (1)

68-68: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚖️ Poor tradeoff

Hardcoding PrometheusMetrics in the constructor hurts testability and reuse.

Production construction always registers on the global Prometheus registry (see registration concern in metrics.go Lines 29-42), while tests bypass NewL1Tracker entirely to use NopMetrics. Consider injecting *Metrics (or accepting a namespace + registerer) so callers control registration and the real constructor stays test-friendly.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@node/l1sequencer/tracker.go` at line 68, The L1Tracker constructor is
hardcoding PrometheusMetrics("morphnode"), which forces global registration and
makes the real constructor hard to test or reuse. Update NewL1Tracker to accept
injected metrics (preferably *Metrics, or a namespace/registerer pair) and pass
that through to the tracker instead of constructing metrics internally; keep the
constructor aligned with NopMetrics so tests and callers can control
registration. Use the NewL1Tracker and PrometheusMetrics symbols as the main
entry points for the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@node/l1sequencer/metrics.go`:
- Line 52: `Metrics.SetLag` can emit a negative gauge because `time.Duration` is
being divided directly into whole seconds, so small clock-skew negatives
truncate toward zero instead of clamping. Update `SetLag` in the `Metrics` type
to ensure the published `LagSeconds` value never drops below 0, and keep the fix
localized to the `SetLag` method that writes to `m.LagSeconds`.

In `@ops/docker/grafana/dashboards/json/morph-node.json`:
- Line 930: The Grafana dashboard queries for go_goroutines and the other
non-prefixed PromQL expressions in the morph-node JSON are too broad and can
pick up series from unrelated jobs; update the affected panel queries to scope
them to morph-node targets using the existing label filters used elsewhere in
the dashboard. Locate the queries by their expr entries in the dashboard
definition and ensure each one only matches the intended node job/instance
series so the health panels reflect morph-node data exclusively.

---

Nitpick comments:
In `@node/l1sequencer/tracker.go`:
- Line 68: The L1Tracker constructor is hardcoding
PrometheusMetrics("morphnode"), which forces global registration and makes the
real constructor hard to test or reuse. Update NewL1Tracker to accept injected
metrics (preferably *Metrics, or a namespace/registerer pair) and pass that
through to the tracker instead of constructing metrics internally; keep the
constructor aligned with NopMetrics so tests and callers can control
registration. Use the NewL1Tracker and PrometheusMetrics symbols as the main
entry points for the change.

In `@ops/devnet-morph/devnet/setup_nodes.py`:
- Around line 91-93: The Prometheus enablement in setup_nodes.py is too brittle
because it relies on an exact string replace in the content variable. Update the
logic around the prometheus toggle to use a key-based regex replacement that
matches the prometheus setting regardless of TOML spacing/formatting, and add an
assertion or explicit failure if no matches are found. Keep the change localized
to the config update path in the setup_nodes.py flow.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d248e725-f4df-485c-b642-f3d58764f6ad

📥 Commits

Reviewing files that changed from the base of the PR and between 4631b00 and 5a2e7d0.

📒 Files selected for processing (13)
  • node/hakeeper/block_fsm.go
  • node/hakeeper/ha_service.go
  • node/hakeeper/ha_service_test.go
  • node/hakeeper/metrics.go
  • node/hakeeper/raft_metrics.go
  • node/l1sequencer/metrics.go
  • node/l1sequencer/tracker.go
  • node/l1sequencer/tracker_test.go
  • ops/devnet-morph/devnet/setup_nodes.py
  • ops/docker/grafana/dashboards/dashboards.yml
  • ops/docker/grafana/dashboards/json/morph-node.json
  • ops/docker/grafana/datasources/prometheus.yml
  • ops/docker/prometheus/prometheus.yml

Comment thread node/l1sequencer/metrics.go
"steppedLine": false,
"targets": [
{
"expr": "go_goroutines",

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Scope non-prefixed PromQL queries to morph-node targets.

Lines 930, 988, and 1046 query global metric names without label filters, so this dashboard can mix in non-node series from other jobs and misreport node health.

Suggested patch
-          "expr": "go_goroutines",
+          "expr": "go_goroutines{service=\"morph-node\"}",
...
-          "expr": "go_memstats_alloc_bytes",
+          "expr": "go_memstats_alloc_bytes{service=\"morph-node\"}",
...
-          "expr": "tendermint_p2p_peers",
+          "expr": "tendermint_p2p_peers{service=\"morph-node\"}",

Also applies to: 988-988, 1046-1046

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ops/docker/grafana/dashboards/json/morph-node.json` at line 930, The Grafana
dashboard queries for go_goroutines and the other non-prefixed PromQL
expressions in the morph-node JSON are too broad and can pick up series from
unrelated jobs; update the affected panel queries to scope them to morph-node
targets using the existing label filters used elsewhere in the dashboard. Locate
the queries by their expr entries in the dashboard definition and ensure each
one only matches the intended node job/instance series so the health panels
reflect morph-node data exclusively.

@tomatoishealthy tomatoishealthy merged commit 52881da into main Jun 25, 2026
15 checks passed
@tomatoishealthy tomatoishealthy deleted the feat/sequencer-metric branch June 25, 2026 12:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants