feat: add metric for sequencer mode#1007
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: ⛔ Files ignored due to path filters (2)
📒 Files selected for processing (8)
✅ Files skipped from review due to trivial changes (2)
📝 WalkthroughWalkthroughAdds HA-keeper and L1 metrics wiring, Raft observation handling, Prometheus/Grafana provisioning, a node metrics dashboard, and Tendermint dependency replacement updates. ChangesMetrics observability rollout
Tendermint dependency refresh
Sequence Diagram(s)sequenceDiagram
participant HAService
participant BlockFSM
participant Metrics
participant raftRaft
participant raftObserver
HAService->>Metrics: PrometheusMetrics(cfg.ServerID)
HAService->>BlockFSM: NewBlockFSM(logger, metrics)
HAService->>raftRaft: register observer
raftRaft-->>raftObserver: RaftState / PeerObservation
raftObserver->>HAService: handleRaftObservation(observation)
HAService->>Metrics: SetRaftState / SetClusterMembers
HAService->>Metrics: ObserveCommitDuration(encode, raft)
BlockFSM->>Metrics: ObserveFSMApplyDuration(d)
BlockFSM->>Metrics: IncFSMBlockChannelDrops()
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
ops/devnet-morph/devnet/setup_nodes.py (1)
91-93: 🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick winMake the Prometheus toggle resilient to TOML formatting changes.
Line 92 uses exact text replacement; if
config.tomlformatting changes, metrics stay disabled with no signal. Use a key-based regex replacement and assert at least one match.Suggested patch
- # Enable prometheus metrics for all nodes - content = content.replace('prometheus = false', 'prometheus = true') + # Enable prometheus metrics for all nodes (format-tolerant) + content, n = re.subn( + r'(?m)^\s*prometheus\s*=\s*false\s*$', + 'prometheus = true', + content, + ) + if n == 0: + print(f"Error: failed to enable prometheus in {config_file}. Exiting.") + sys.exit(1)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ops/devnet-morph/devnet/setup_nodes.py` around lines 91 - 93, The Prometheus enablement in setup_nodes.py is too brittle because it relies on an exact string replace in the content variable. Update the logic around the prometheus toggle to use a key-based regex replacement that matches the prometheus setting regardless of TOML spacing/formatting, and add an assertion or explicit failure if no matches are found. Keep the change localized to the config update path in the setup_nodes.py flow.node/l1sequencer/tracker.go (1)
68-68: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚖️ Poor tradeoffHardcoding
PrometheusMetricsin the constructor hurts testability and reuse.Production construction always registers on the global Prometheus registry (see registration concern in metrics.go Lines 29-42), while tests bypass
NewL1Trackerentirely to useNopMetrics. Consider injecting*Metrics(or accepting a namespace + registerer) so callers control registration and the real constructor stays test-friendly.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@node/l1sequencer/tracker.go` at line 68, The L1Tracker constructor is hardcoding PrometheusMetrics("morphnode"), which forces global registration and makes the real constructor hard to test or reuse. Update NewL1Tracker to accept injected metrics (preferably *Metrics, or a namespace/registerer pair) and pass that through to the tracker instead of constructing metrics internally; keep the constructor aligned with NopMetrics so tests and callers can control registration. Use the NewL1Tracker and PrometheusMetrics symbols as the main entry points for the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@node/l1sequencer/metrics.go`:
- Line 52: `Metrics.SetLag` can emit a negative gauge because `time.Duration` is
being divided directly into whole seconds, so small clock-skew negatives
truncate toward zero instead of clamping. Update `SetLag` in the `Metrics` type
to ensure the published `LagSeconds` value never drops below 0, and keep the fix
localized to the `SetLag` method that writes to `m.LagSeconds`.
In `@ops/docker/grafana/dashboards/json/morph-node.json`:
- Line 930: The Grafana dashboard queries for go_goroutines and the other
non-prefixed PromQL expressions in the morph-node JSON are too broad and can
pick up series from unrelated jobs; update the affected panel queries to scope
them to morph-node targets using the existing label filters used elsewhere in
the dashboard. Locate the queries by their expr entries in the dashboard
definition and ensure each one only matches the intended node job/instance
series so the health panels reflect morph-node data exclusively.
---
Nitpick comments:
In `@node/l1sequencer/tracker.go`:
- Line 68: The L1Tracker constructor is hardcoding
PrometheusMetrics("morphnode"), which forces global registration and makes the
real constructor hard to test or reuse. Update NewL1Tracker to accept injected
metrics (preferably *Metrics, or a namespace/registerer pair) and pass that
through to the tracker instead of constructing metrics internally; keep the
constructor aligned with NopMetrics so tests and callers can control
registration. Use the NewL1Tracker and PrometheusMetrics symbols as the main
entry points for the change.
In `@ops/devnet-morph/devnet/setup_nodes.py`:
- Around line 91-93: The Prometheus enablement in setup_nodes.py is too brittle
because it relies on an exact string replace in the content variable. Update the
logic around the prometheus toggle to use a key-based regex replacement that
matches the prometheus setting regardless of TOML spacing/formatting, and add an
assertion or explicit failure if no matches are found. Keep the change localized
to the config update path in the setup_nodes.py flow.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d248e725-f4df-485c-b642-f3d58764f6ad
📒 Files selected for processing (13)
node/hakeeper/block_fsm.gonode/hakeeper/ha_service.gonode/hakeeper/ha_service_test.gonode/hakeeper/metrics.gonode/hakeeper/raft_metrics.gonode/l1sequencer/metrics.gonode/l1sequencer/tracker.gonode/l1sequencer/tracker_test.goops/devnet-morph/devnet/setup_nodes.pyops/docker/grafana/dashboards/dashboards.ymlops/docker/grafana/dashboards/json/morph-node.jsonops/docker/grafana/datasources/prometheus.ymlops/docker/prometheus/prometheus.yml
| "steppedLine": false, | ||
| "targets": [ | ||
| { | ||
| "expr": "go_goroutines", |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Scope non-prefixed PromQL queries to morph-node targets.
Lines 930, 988, and 1046 query global metric names without label filters, so this dashboard can mix in non-node series from other jobs and misreport node health.
Suggested patch
- "expr": "go_goroutines",
+ "expr": "go_goroutines{service=\"morph-node\"}",
...
- "expr": "go_memstats_alloc_bytes",
+ "expr": "go_memstats_alloc_bytes{service=\"morph-node\"}",
...
- "expr": "tendermint_p2p_peers",
+ "expr": "tendermint_p2p_peers{service=\"morph-node\"}",Also applies to: 988-988, 1046-1046
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@ops/docker/grafana/dashboards/json/morph-node.json` at line 930, The Grafana
dashboard queries for go_goroutines and the other non-prefixed PromQL
expressions in the morph-node JSON are too broad and can pick up series from
unrelated jobs; update the affected panel queries to scope them to morph-node
targets using the existing label filters used elsewhere in the dashboard. Locate
the queries by their expr entries in the dashboard definition and ensure each
one only matches the intended node job/instance series so the health panels
reflect morph-node data exclusively.
Summary by CodeRabbit