
Experimental triggerNextLedger timer Change #4865

Merged
SirTyson merged 3 commits into
stellar:master from
SirTyson:externalize-clock
Jun 17, 2026

@SirTyson

@SirTyson SirTyson commented Aug 4, 2025

Contributor

Description

This adds an experimental flag that, when set, uses the closeTime from the last externalized SCP message as the basis for setting the triggerNextLedger timer.

I've included a couple of basic unit tests that check the trigger behaves correctly when nodes are drifting and when we have long nomination timeouts. Most of the simulation testing, reported below, uses this Supercluster change: stellar/supercluster#384

Checklist

  • Reviewed the contributing document
  • Rebased on top of master (no merge commits)
  • Ran clang-format v8.0.0 (via make format or the Visual Studio extension)
  • Compiles
  • Ran all tests
  • If change impacts performance, include supporting evidence per the performance document

Comment thread src/herder/HerderImpl.cpp Outdated
@SirTyson SirTyson force-pushed the externalize-clock branch 2 times, most recently from 538021b to 72a3614 Compare February 26, 2026 18:49
@SirTyson SirTyson force-pushed the externalize-clock branch from 7a0d9bc to 226b1e7 Compare March 30, 2026 19:45
@SirTyson SirTyson force-pushed the externalize-clock branch from 226b1e7 to fbbbdc4 Compare April 14, 2026 23:17
@SirTyson SirTyson force-pushed the externalize-clock branch 2 times, most recently from 785022c to b17483d Compare May 1, 2026 09:16
@SirTyson

SirTyson commented May 1, 2026

Contributor Author

Key findings

Overall, the change works as expected. When clocks are mostly synced and most nodes enable the new timer,
ledger age falls to almost exactly 5 seconds, down from around 5.75 without the change. As network conditions
deteriorate (i.e. not all nodes have enabled the flag, or nodes have large clock drifts), we see a gradual
degradation in block time back up to 5.75 seconds. We have some safeguards around trigger time such that we
"fall back" to the old timer if we think we're drifting from the network. This ensures nodes will never
schedule the trigger timer so far into the future that they hang, or trigger ledgers immediately, closing
ledgers too fast and potentially snowballing the network with nomination load.

We do see that nomination timeouts and nomination timing overall increase fairly significantly when clocks
are unsynced, or when the network has a mix of new and old timers. When clocks are synced, we see a slight
increase in nomination timeouts and timing. This is probably just the result of the network speeding up,
where there's less time to do the same amount of work.

However, as clocks become unsynced, we do see a significant increase in timeouts and nomination timing.
While in this experimental setting, block time is still faster under bad clock conditions, in actual
network settings it's possible the additional nomination work will result in an overall decrease in
performance. With this change, it's important that we instruct validator operators to sync system clocks
on a cron job with an NTP server (which is common on other blockchains). We should also have stronger
warnings about drifting clocks, given the potential decrease in network perf.

Importantly, while performance does decrease, the network does not get wedged, and degrades back to the
timer we have today. If a node thinks it's out of sync with the externalized time stamp, it falls back
to the ballot protocol based timer we use today. At very high levels of drift, we see similar performance
to today. A conservative upper bound on trigger time, based on the ballot prepare cadence, ensures that we
don't get "stuck" by validators scheduling nomination start into the distant future. We also have a lower
bound, making sure that clock drift does not make the network speed up beyond our target ledger close.
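A minimal sketch of how a trigger time with these safeguards could be computed. The struct, the function name `computeTriggerTime`, the `maxSkew` tolerance, and the exact bounds chosen here (upper bound = the old prepare-based timer, lower bound = one target interval after this node's previous trigger) are assumptions for illustration, not the logic in `HerderImpl.cpp`.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

// Hypothetical sketch, not HerderImpl's actual logic. All values are
// milliseconds; the externalized closeTime is compared directly against the
// node's local system clock.
using Millis = std::chrono::milliseconds;

struct TriggerInputs
{
    Millis externalizedCloseTime; // closeTime of the last externalized value
    Millis localNow;              // local system clock when scheduling
    Millis prepareStart;          // local time ballot prepare started
    Millis lastTrigger;           // local time of this node's previous trigger
    Millis target;                // target ledger close time, e.g. 5000 ms
    Millis maxSkew;               // assumed drift tolerance before falling back
};

struct TriggerDecision
{
    Millis fireAt;     // local time at which to trigger the next ledger
    bool usedFallback; // true if we fell back to the old timer
};

TriggerDecision
computeTriggerTime(TriggerInputs const& in)
{
    assert(in.target > Millis{0});

    // Old behavior: one target interval after the local ballot prepare start.
    Millis const oldTimer = in.prepareStart + in.target;

    // If our clock disagrees with the consensus closeTime by more than the
    // tolerance, assume we are drifting and use the old timer instead.
    Millis const skew = in.localNow - in.externalizedCloseTime;
    if (skew < -in.maxSkew || skew > in.maxSkew)
    {
        return {oldTimer, true};
    }

    // New behavior: anchor on the network-agreed close time.
    Millis fireAt = in.externalizedCloseTime + in.target;
    // Upper bound: never wait longer than the old prepare-based timer.
    fireAt = std::min(fireAt, oldTimer);
    // Lower bound: never trigger more often than once per target interval.
    fireAt = std::max(fireAt, in.lastTrigger + in.target);
    return {fireAt, false};
}
```

For example, with a consensus close time of 100000 ms, local prepare at 100600 ms and the local clock at 100800 ms, this fires at 105000 ms instead of the old timer's 105600 ms. If the local clock instead reads 104000 ms (4 s of skew against a 2 s tolerance), it falls back to 105600 ms.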

We actually see the peak decrease in performance at around 6 seconds of absolute clock drift. This seems to
be the worst case sweet spot where nodes can be maximally out of sync before the conservative safeguards
kick in. At much larger drifts, like 20 seconds, performance is improved and similar to what we see today.

I also tested with a mixed network, where some nodes did not have the flag enabled while others did. Block time gradually decreases
based on the percentage of tier 1 nodes who have switched to the timer. During this time, block time can have
higher variance, but still stays within the [5, 5.7] second bounds. We also see higher nomination time and more timeouts
for nodes who have switched to the experimental timer. Non-experimental nodes are not affected. As more and more
nodes switch timers, the nomination time of experimental nodes decreases.

This seems to indicate that this feature is fine as a non-protocol upgrade. Given that the network can proceed with
a mix of timers (with increased load due to nomination timeouts), it seems like this can be introduced as a feature
flag in a point release. We should not gradually flip this flag (i.e. test it out on just SDF nodes), but should
just make it default true in a release. The flag can be a safeguard if we see issues in prod, but we should still
try to make the switch as atomic as possible. In simulation we saw blocktimes close to 5 seconds at around 75%
adoption, though we still had increased timeouts. Stable blocktime of 5 seconds with no degradation in nomination
performance occurs as you approach 100% adoption.

Setup

  • Topology 3
    • tier 1 + 70 watchers
    • medium connection density
  • 250 TPS for 10 minutes

The general idea is to mimic pubnet as closely as possible, without spinning up hundreds of
nodes. Specifically, we want to make sure tier 1 is not densely connected.

Changes to Metrics

Our traditional ledger.age.current-seconds metric is not an accurate measure of the actual network
block time. Even on master, this metric varies widely from ledger to ledger on every node. This is expected,
as any node's local ledger age metric depends on its latency from the leader for that given block. Instead,
I'm using the following metric to determine the actual network block time across all nodes, instead of
extrapolating the value from individual nodes:

  avg(
    (
      1 / clamp_min(
        irate(stellar_core_ledger_ledger_close_seconds_count{
          kubernetes_namespace=~"$namespace$",
          network=~"^$network$",
          build=~".*${build:regex}$",
          kubernetes_pod_name=~".*${pods:regex}$"
        }[5m]),
        1e-9
      )
    ) < 10
  )
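To make the aggregation concrete, here is a rough C++ equivalent of what the query computes. The struct and function names are hypothetical; `irate` is approximated from the last two counter samples of each node, and counter resets are ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <vector>

// Two consecutive scrapes of one node's ledger-close counter
// (stellar_core_ledger_ledger_close_seconds_count).
struct CounterWindow
{
    double t0, count0; // earlier scrape: time (s) and counter value
    double t1, count1; // latest scrape
};

// Per-node instantaneous close rate from the last two samples (like irate,
// but ignoring counter resets), inverted to seconds per ledger, values of
// 10 s or more dropped, then averaged across nodes.
std::optional<double>
networkBlockTime(std::vector<CounterWindow> const& nodes)
{
    double sum = 0;
    int kept = 0;
    for (auto const& n : nodes)
    {
        assert(n.t1 > n.t0);
        double rate = (n.count1 - n.count0) / (n.t1 - n.t0);
        double secondsPerLedger = 1.0 / std::max(rate, 1e-9); // clamp_min
        if (secondsPerLedger < 10) // the "< 10" filter
        {
            sum += secondsPerLedger;
            ++kept;
        }
    }
    if (kept == 0)
    {
        return std::nullopt;
    }
    return sum / kept;
}
```

A node that closed no ledgers in its window maps to a huge block time and is filtered out, so it does not drag the network average up.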

We've also added meta info to node names. pX and mY indicate the node has a drifting
clock, either +X seconds or -Y seconds. Note that actual drift values are in milliseconds, so
these are rounded values. Additionally, nodes with the "expr" string indicate that they are
running the experimental timer change.

Test Results

First, we want to analyze the change itself, where all nodes have the new timer change. We will then
compare this with a network where no nodes have the change.

All the following results have the experimental timer disabled on the left, and then enabled on all nodes
to the right.

No Drift

https://grafana.stellar-ops.com/goto/bGk5BZTDg?orgId=1

Block Time

image.png

Our control shows a block time of 5.7s, similar to pubnet. With the experimental timer, this drops to 5.01 seconds.

Nomination Timeouts

image-1.png

We see an increase in timeouts with the new timer.

Nomination p75

We see an increase in nomination timing with the timer change:

  • mean [0.634s, 1.03s]
  • max [0.764s, 1.33s]

Compared to the control group:

  • mean [0.594s, 0.781s]
  • max [0.702s, 1.16s]

image-2.png

Nomination p99

We see an increase again with the timer change:

  • mean [0.743s, 1.20s]
  • max [0.853s, 2.26s]

Compared to the control group:

  • mean [0.768s, 0.970s]
  • max [1.01s, 1.45s]

image-3.png

2 Second Absolute Drift

All nodes given a random drift uniformly selected from [-1000,+1000] ms.

https://grafana.stellar-ops.com/goto/ttwsLWTvg?orgId=1

Block Time

image-4.png

We see little change compared to synchronized clocks.

Nomination Timeouts

image-5.png

Elevated nomination timeouts observed in both experimental and non-experimental runs.
Experimental timer still has more timeouts.

Nomination p75

We see a significant increase in nomination time in the experimental timer, compared to
the non-drifting test. This is directly correlated to the direction in which nodes are drifting.
Nodes that are drifting behind can experience very low nomination time, with some p75 mean around 0.35s. Nodes
drifting ahead had longer nomination times, with upper bound means around 2.35s. This delta does correlate
with the drift delta.

  • mean [0.302, 2.47]
  • max [0.626, 3.52]

Compared to no experimental flags

  • mean [0.487, 0.754]
  • max [0.806, 3.11]

image-6.png

Nomination p99

  • mean [0.574, 2.9]
  • max [0.856, 4.7]

Compared to no experimental flags

  • mean [0.552, 0.897]
  • max [0.806, 3.1]

image-7.png

6 Second Absolute Drift

All nodes given a random drift uniformly selected from [-3000,+3000] ms.

https://grafana.stellar-ops.com/goto/3w0a6Kovg?orgId=1

Block Time

At this point, we see higher variance in the experimental timer and slower blocks overall, but still faster than
the control.

image-8.png

Nomination Timeouts

We see a significant increase in timeouts compared to more synced clocks.

image-9.png

Nomination p75

Much higher nomination times as well, with the new timer:

  • mean [0.585, 4.09]
  • max [0.900, 5.37]

Compared to control:

  • mean [0.601, 0.781]
  • max [0.697, 1.07]

image-10.png

Nomination p99

image-11.png

Experimental Timer:

  • mean [1.10, 4.61]
  • max [1.65, 6.53]

Compared to control:

  • mean [0.746, 1.1]
  • max [1.07, 2.15]

20 Second Absolute Drift

All nodes given a random drift uniformly selected from [-10,+10] seconds. At this point,
all gains from the experimental timer are gone, and block time is basically the same as the
control.

https://grafana.stellar-ops.com/goto/I05-W5ovg?orgId=1

Block Time

Basically the same as the control block time with relatively low variance.

image-12.png

Nomination Timeouts

Still significantly greater than the control, but less than the 6 second absolute drift case.

image-14.png

Nomination p75

Still larger than the control, but much improved from the 6 second case. With trigger timer:

  • mean [0.662, 1.97]
  • max [0.759, 4.41]

vs baseline:

  • mean [0.638, 0.781]
  • max [0.720, 1.08]

image-15.png

Nomination p99

  • mean [0.848, 3.58]
  • max [1.17, 5.88]

vs baseline:

  • mean [0.771, 0.940]
  • max [0.960, 2.26]

image-16.png

Extreme bimodal distribution

I originally expected this to be the worst case stress test: 25% of nodes with minor drift (within 1 second),
75% with a bimodal distribution between [-20, -10] seconds and [+10, +20] seconds. While block time
was worse, from a nomination standpoint this was more stable than the 6 second absolute drift case.

My laptop died between runs, so they are on separate graphs. Baseline,
experimental.
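The drift distributions used in these runs (uniform for the 2, 6 and 20 second cases, bimodal here) could be generated along these lines. This is a hypothetical sketch, not the Supercluster implementation; the function names, the seed parameter, and rounding the 25% share down are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <vector>

// Hypothetical helpers for assigning per-node clock drift (milliseconds).

// Uniform drift in [-halfRangeMs, +halfRangeMs], e.g. 1000 for the
// "2 second absolute drift" runs.
std::vector<int>
uniformDrifts(int nodeCount, int halfRangeMs, unsigned seed)
{
    assert(nodeCount >= 0 && halfRangeMs >= 0);
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> dist(-halfRangeMs, halfRangeMs);
    std::vector<int> drifts;
    for (int i = 0; i < nodeCount; ++i)
    {
        drifts.push_back(dist(rng));
    }
    return drifts;
}

// Bimodal drift: a quarter of the nodes (rounded down) within +-1 s, the
// rest 10-20 s ahead or behind with equal probability, shuffled so the
// drift is not tied to node order.
std::vector<int>
bimodalDrifts(int nodeCount, unsigned seed)
{
    assert(nodeCount >= 0);
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> minorDist(-1000, 1000);
    std::uniform_int_distribution<int> majorDist(10000, 20000);
    std::bernoulli_distribution ahead(0.5);
    int const minorCount = nodeCount / 4;
    std::vector<int> drifts;
    for (int i = 0; i < nodeCount; ++i)
    {
        if (i < minorCount)
        {
            drifts.push_back(minorDist(rng));
        }
        else
        {
            drifts.push_back(ahead(rng) ? majorDist(rng) : -majorDist(rng));
        }
    }
    std::shuffle(drifts.begin(), drifts.end(), rng);
    return drifts;
}
```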

Block Time

Functionally equivalent.

Experimental timer:

image-18.png

Baseline:

image-17.png

Nomination Timeouts

Experimental timer:

image-19.png

Baseline:

image-20.png

Nomination p75

Experimental:

  • mean [0.632, 2.01]
  • max [0.697, 4.37]

image-21.png

Baseline:

  • mean [0.630, 0.758]
  • max [0.724, 1.24]

image-23.png

Nomination p99

Experimental:

  • mean [0.791, 3.24]
  • max [1, 4.92]

image-22.png

Baseline:

  • mean [0.760, 0.959]
  • max [0.931, 1.90]

image-24.png

Network with mix of experimental flag and non-experimental flag

For this test, we used a moderate clock drift of +- 1 second across all nodes. We then ran several
simulations, increasing the number of nodes using the experimental timer at each run.

This grafana board shows several
runs, going from 13%, 34%, 38%, 43%, 66%, 78%, then 90% experimental flag adoption.

Block Time

We see blocktime decrease gradually as more timers are enabled, but with higher variance in mixed networks. At around
75%, we achieve most of the block time gains.

image-25.png

Nomination

Whenever a node enables the experimental flag, its nomination timeouts and timing increase based on how much of the
rest of the network has the flag enabled. I.e. if less of the network has enabled the timer, those who have enabled it
are more affected by timeouts and have longer nomination timing. As more nodes adopt the timer, the magnitude of the
degradation across all upgraded nodes lessens, eventually converging on values close to the non-experimental baseline
(assuming little clock drift).

During this in-between phase, only nodes with the new timer seem affected. Even when most of the nodes have upgraded their
timer, those left behind see little increase in nomination timeouts or timing. This gives us a safe "escape hatch" if we ship the upgrade
with a config flag to disable the timer: if we see network degradation, node operators can disable the
timer and quickly see nomination timings for their node go back to previous values.

Timeouts

image-26.png

Nomination Timing

image-27.png

Different topologies

Most testing was with topology 3, as this is the "closest" 100 node approximation to pubnet, where tier 1 is moderately
connected, but not directly connected. We also tested topology 0 (tier 1 only, complete graph), topology 1 (100 nodes, tier 1
fully connected) and topology 2 (100 nodes, tier 1 maximally apart in graph). Topologies 0 and 1 performed better than topology
3, which was used for the rest of our tests. Topology 2 had worse performance, and is the worst case topology for this change.

While other tests were run at 250 TPS, both the control and the experimental timer failed at this load, so we reduced it to 150 TPS.

Worst case topology, no drift

https://grafana.stellar-ops.com/goto/U5eoCpTvg?orgId=1

We tested the worst case topology with "realistic" ntp drift (+- 100ms).
We found that block time was reduced to 5s and stable.

image-28.png

Nomination timeouts and timing increased, more so than in topology
3. Even so, while these metrics were higher than the baseline and than in more connected
topologies, the network was still much healthier than more connected topologies
with higher rates of drift.

image-29.png

image-30.png

image-31.png

Worst case topology, worst case drift

With topology 2 and the worst drift of 6 seconds absolute, we see large
increases in nomination timeouts and timing. Block time has an average of 5.1s,
but is more variable. The network definitely has less TX capacity with this
topology, but that is true both with and without the experimental timer.

https://grafana.stellar-ops.com/goto/ja6WR2oDg?orgId=1

Max TPS

While not the primary motivation of this change, the max TPS test gives us some
idea of how the new timer behaves under load.

It looks like the experimental timer slightly increases overall max TPS,
from ~2650 to ~2850. The experimental timer also maintains a faster
block time of around 5.77s, while the current timer averages 7.85s. Nomination
timeouts and timing are similar between the two timers. It seems like the new timer,
because it has a more consistent and smaller blocktime, can achieve a higher TPS
due to bandwidth savings in block propagation.

Here is the max TPS test with the experimental timer
vs the current timer.

@SirTyson SirTyson requested a review from marta-lokhova May 1, 2026 17:55
@SirTyson SirTyson marked this pull request as ready for review May 1, 2026 17:55
Copilot AI review requested due to automatic review settings May 1, 2026 17:56

Copilot AI left a comment

Contributor


Pull request overview

Adds an experimental mode to anchor HerderImpl’s triggerNextLedger scheduling off the last externalized SCP close time (with drift/availability fallbacks), plus test-only knobs to simulate clock drift and slow nomination message emission.

Changes:

  • Add EXPERIMENTAL_TRIGGER_TIMER and implement consensus-close-time-based trigger anchoring with fallback/metrics.
  • Add test-only support for simulated system clock drift and delayed nomination emit to exercise timeout/drift behavior.
  • Add a new SCP trigger fallback meter and a (hidden) herder simulation test covering drift and long nomination scenarios.
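The drift-simulation idea in the second bullet amounts to shifting every system-clock reading by a fixed per-node offset. A minimal sketch using only `std::chrono`; the class and method names are hypothetical, and this is not the actual `VirtualClock` or `Timer.h` API.

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Hypothetical test-only clock: reports the real system time shifted by a
// configured offset (positive = this node's clock runs ahead).
class DriftedSystemClock
{
  public:
    using time_point = std::chrono::system_clock::time_point;

    explicit DriftedSystemClock(std::chrono::milliseconds offset)
        : mOffset(offset)
    {
    }

    // What the node under test believes the wall-clock time is.
    time_point
    now() const
    {
        return std::chrono::system_clock::now() + mOffset;
    }

    // Real and drifted time derived from a single reading, so the two differ
    // by exactly the offset (useful when logging or asserting on drift).
    std::pair<time_point, time_point>
    actualAndFakeNow() const
    {
        time_point const actual = std::chrono::system_clock::now();
        return {actual, actual + mOffset};
    }

  private:
    std::chrono::milliseconds mOffset;
};
```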

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 5 comments.

  • src/util/Timer.h: Adds test-only system clock offset state + new actual_and_fake_system_now() API.
  • src/util/Timer.cpp: Implements drifted system_now() via injected offset and exposes paired sampling helper.
  • src/test/test.cpp: Enables the experimental trigger timer in the default test config.
  • src/scp/Slot.h: Adds a test-only SCP timer ID for delayed nomination emission.
  • src/scp/SCPDriver.h: Adds a test-only virtual hook for configuring nomination emit delay.
  • src/scp/NominationProtocol.cpp: Defers nomination broadcast in tests via a new driver timer when configured.
  • src/main/Config.h: Adds EXPERIMENTAL_TRIGGER_TIMER and two new test-only timing knobs.
  • src/main/Config.cpp: Initializes/parses new config options; extends testing-only option list.
  • src/main/ApplicationImpl.cpp: Applies configured test-only system clock offset at startup via VirtualClock.
  • src/herder/test/HerderTests.cpp: Adds a (hidden) simulation test for experimental trigger behavior under drift/slow nomination.
  • src/herder/HerderSCPDriver.h: Declares test-only nomination emit delay accessor; exposes nomination timeout count getter.
  • src/herder/HerderSCPDriver.cpp: Implements test-only nomination emit delay accessor and special-cases emit timer callback behavior.
  • src/herder/HerderImpl.h: Declares new trigger anchor helper methods and adds a fallback meter to SCP metrics.
  • src/herder/HerderImpl.cpp: Implements consensus-close-time anchoring logic, fallback conditions, and new metrics wiring.
  • docs/metrics.md: Documents the new scp.trigger.prepare-start-fallback meter.

Comment thread src/test/test.cpp
Comment thread src/util/Timer.h Outdated
Comment thread src/herder/test/HerderTests.cpp Outdated
Comment thread src/herder/test/HerderTests.cpp
Comment thread src/main/Config.cpp
@marta-lokhova

Contributor

We see an increase in nomination timing with the timer change

There seems to be a consistent increase in nomination time and timeouts across different measurements with the timer change. How can we reason about those? That doesn't seem expected, does it?

Comment thread src/herder/HerderImpl.cpp Outdated
Comment thread src/herder/HerderImpl.cpp
@SirTyson

Contributor Author

How can we reason about those? That doesn't seem expected, does it?

Based on my tests, I see two different classes of increase: a small increase when clocks are synced, and a much larger increase under drifting clocks.

In the first case (see no drift above), I think the increase is more of an artifact of decreased block time. We've shaved 700 ms off of block time, which means we have less buffer for slow nodes. With the old timer, because we started blocks late, we have more time for slow nodes to catch up before starting nomination. We tested at a fairly high TPS for this topology/pod config, so nodes were stressed.

In addition to faster blocks, high latency nodes are much more likely to have more timeouts and higher nomination times because they now trigger earlier. Suppose we have a node N whose blocking set members have latency L to one another, but N has latency 2L to its blocking set. Previously, node N started nomination about L after the rest of its blocking set. This is because the current timer is based on local ballot state: N takes 2L to get the messages required to enter this phase, while the rest of the blocking set only takes L. N's nomination time appears smaller, since it started the timer later. With this change, all nodes start the block timer at the same time, regardless of latency. This results in N starting nomination sooner, but actually taking more time to complete nomination, since it still has 2L latency to its peers.

The second case, where clocks are not synced, shows much larger nomination changes. I think this is because some nodes start nomination so early or so late that it's impossible to make progress in that round. For example, suppose a single node is 2 seconds ahead of its blocking set. That node will spend up to 2 whole seconds attempting nomination while none of its quorum has begun nomination.
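Both effects can be reproduced in a toy model. The sketch assumes integer milliseconds, zero processing time, fast peers that enter the ballot phase at time L while N enters at 2L (N is 2L from its peers), synced clocks unless a drift is given, and it only tracks the first peer nomination message to reach N. All names are hypothetical; it illustrates the reasoning above and is not a measurement.

```cpp
#include <algorithm>
#include <cassert>

// Real (wall-clock) times in milliseconds, as seen from node N.
struct NominationView
{
    long long timerStart;   // when N starts its nomination timer
    long long firstPeerMsg; // when N receives a peer's first nomination
    // How long N's nomination metric runs before peer input can arrive;
    // 0 if the message was already queued when N started.
    long long
    measuredWait() const
    {
        return std::max(0LL, firstPeerMsg - timerStart);
    }
};

// Old timer: every node triggers `target` after entering the ballot phase
// locally. Fast peers enter at L, N at 2L. These are relative timers, so
// clock drift has no effect here.
NominationView
oldTimer(long long L, long long target)
{
    assert(L >= 0 && target >= 0);
    long long const peerTrigger = L + target;
    return {2 * L + target, peerTrigger + 2 * L};
}

// New timer: every node triggers at the same consensus instant on its own
// clock. A clock running ahead by driftMs fires driftMs early in real time.
NominationView
newTimer(long long L, long long commonTrigger, long long driftMs)
{
    assert(L >= 0);
    return {commonTrigger - driftMs, commonTrigger + 2 * L};
}
```

With L = 100 ms, N's measured wait is 100 ms under the old timer and 200 ms under the new one: the extra L that the old timer spent before starting N's clock is now counted as nomination time. A 2 s lead adds the full 2 s, while a node running 500 ms behind finds peer messages already queued and measures no wait.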

Wrt the second case, my main priority is to ensure that the network remains relatively live in this scenario. Synchronized clocks are an assumption of basically every performance-oriented L1 these days, and most node operator guides include instructions for running NTP sync (which we need to give to tier 1). We should also warn more aggressively when clocks are out of sync to encourage good clock behavior from tier 1s. I don't think we should expect the network to maintain performance in this case (we can't), but we need to make sure that we don't overload or snowball the network via timeout load, and that we are resistant to byzantine or incompetent validators.

@MonsieurNicolas

Contributor

based on your findings above, would it make sense to first land the work around fully connected tier-1 that will place all validators within at most 2 hops?

@SirTyson

Contributor Author

based on your findings above, would it make sense to first land the work around fully connected tier-1 that will place all validators within at most 2 hops?

It would definitely decrease nomination load with this change. One thing I'm curious about, though, is whether the nomination changes are more attributable to the timer change or to the blocktime change. For desynced clocks, it's definitely caused by the timer. For in-sync or lightly desynced clocks, I wonder if it's more the timer or the blocktime. I can run a test where I lower the target blocktime on the control timer to around 4.2s so we can try to get an average around 5 seconds to compare.

All that to say, the network will definitely perform better with the trigger timer + topology change, but I don't know if we're blocked on the topology change or not. I think it's likely we see increased nomination timeouts with this change regardless, given that we are increasing the TPS by ~15%. We definitely need to get clocks synced before rollout, but let me run a few more tests wrt topology.
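The ~15% figure follows from the block times reported earlier (about 5.75 s with the old timer vs. about 5 s with the new one), assuming per-ledger capacity C stays the same so that throughput scales inversely with block time t:

```latex
\frac{\mathrm{TPS}_{\text{new}}}{\mathrm{TPS}_{\text{old}}}
  = \frac{C / t_{\text{new}}}{C / t_{\text{old}}}
  = \frac{t_{\text{old}}}{t_{\text{new}}}
  \approx \frac{5.75\ \text{s}}{5.0\ \text{s}} = 1.15
```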

The relevant questions seem to be:

  • Does the network deteriorate worse under bad topology conditions with the new timer given the same blocktime?
  • If not, is the network capable of safely reducing block time before adopting the topology change?

@SirTyson SirTyson force-pushed the externalize-clock branch from b17483d to bc66a97 Compare May 19, 2026 00:26
@SirTyson SirTyson force-pushed the externalize-clock branch from bc66a97 to ec7bb7c Compare May 26, 2026 16:47
@SirTyson

Contributor Author

I've run a few more tests, comparing both timers with the same ledger close time. To do this, I artificially lowered the target close time on the old timer to ~4.25 seconds, so that with its usual overshoot it averages around 5 seconds. I ran the test with "realistically" synced clocks (±100 ms drift per node) on topology 3, our most realistic topology, and topology 2, our worst case topology.

TL;DR the results are basically the same, the new timer still demonstrates higher nomination timings. Here's a brief summary of the results:

Realistic 100 node topology

Run on the left is with the timer change, run on the right is the control with artificially low close time: https://grafana.stellar-ops.com/goto/dfnaoqoh9vmdca?orgId=1

We see virtually no timeouts in either case. Both runs held a fairly steady close time around 5 seconds.

p75 nomination times:

  • New timer
    • mean [0.483, 1.03]
    • max [0.563, 1.20]
  • Old timer
    • mean [0.521, 0.596]
    • max [0.576, 0.759]

p99 nomination times:

  • New timer
    • mean [0.552, 1.11]
    • max [0.627, 1.20]
  • Old timer
    • mean [0.621, 0.670]
    • max [0.723, 1.23]

Worst case 100 node topology

I accidentally reversed these, so the left is the control without the change, and the right has the change: https://grafana.stellar-ops.com/goto/dfnapb24cs7b4a?orgId=1

We see slightly higher levels of nomination timeouts in both runs, but still virtually none. The old timer had more variance in close time, but still averaged around 5.06 seconds.

p75 nomination times:

  • New timer
    • mean [0.625, 1.04]
    • max [0.701, 1.13]
  • Old timer
    • mean [0.682, 0.770]
    • max [0.824, 0.969]

p99 nomination times:

  • New timer
    • mean [0.708, 1.18]
    • max [0.865, 1.65]
  • Old timer
    • mean [0.823, 0.883]
    • max [0.988, 1.12]

Takeaway

It looks like the increase in nomination time is intrinsic to the timer change, not a side effect of reducing block times. It appears that the new timer trades variance in nomination timing for stability in block time, vs. our old timer, which has more variance in block time for stable nomination periods. This is why we can't just keep the old timer but artificially reduce the target close time, as there is more instability of block times with load.

I think this nomination increase has to do with latency, given that each node keeps a local nomination clock with the old timer, vs. a global timer with the new change. Consider a lagging node that is generally 300 ms behind the network: it enters the ballot phase 300 ms behind its peers, so it starts its nomination clock 300 ms after its peers. While in absolute terms this node finishes nomination later due to the latency, its timer also started later, so the delay isn't visible in this metric.

Now, we have a global starting point, such that both the fast nodes and lagging node start their nomination timers at the same time. This makes the nomination metric larger, since the start of the timer is no longer offset by latency (aka time to start ballot phase compared to its peers). This goes both ways, as the fast nodes also experience longer nomination times whenever the slow node is leader (though the overall impact is less).

@SirTyson

SirTyson commented Jun 10, 2026

Contributor Author

A few more simulation runs. This time, we used the most recent pubnet survey data trimmed to 277 nodes. We ran mixed loadgen mode with SAC soroban load. The first two runs used current network limits, and the last two used the proposed network limits for 200 SAC TPS. The old timer is on the left, the new trigger timer on the right.

This simulation showed similar results to previous ones, namely that block times decrease but nomination times increase. We've concluded that this is not a "real" increase in terms of wall clock time of nomination, but rather an artifact of changing when we start the nomination timer. This is still important, as this can trigger more timeouts. However, we've also realized that nomination is not uniformly increased, but increased just for high latency watcher nodes, whose topology is many hops away from tier 1. I've added a new panel in the link above that shows nomination times for named nodes (so tier 1 plus folks like sl8), and we actually see that nomination timings are almost identical across both timers. This is also reinforced by timeout values, which remain the same across both timers.

The reason for this is that the current nomination timer is not a true measure of how long nomination really takes. With the old timer, each node triggers at its local prepare_start + 5 seconds. This means that each node starts its trigger, and therefore its nomination timer, at a different time. Consider a tightly connected tier 1, where most nodes are directly connected, and a single distant node, which is 2 hops away from the rest of tier 1. Each hop has latency L. The distant node will enter prepare at least L after the rest of its quorum, so its nomination timer will start at least L after its peers'. The key observation is that the rest of its peers have already started nomination and have messages in flight (and potentially even queued at the slow node) before the slow node actually starts its nomination timer. This means that with the current timer, slow nodes start nomination later and "cheat", since nomination messages are in flight before they actually start their timer. Note that for the tightly connected nodes, this is less of an issue. They have the same latency relative to each other, so they enter prepare and start their nomination timer at approximately the same time. These fast nodes don't cheat and don't have any messages in flight, since they start nomination at the same time as most of their quorum.

The new timer change shifts this such that everyone starts the trigger timer and starts nomination at the same time, regardless of distance from tier 1. This means that the long tail of distant watcher nodes will have increased nomination times and potential for increased timeouts. However, so long as tier 1 is relatively densely connected, the fast nodes will not see a nomination time increase.

To answer @MonsieurNicolas's question on fully connected tier 1: this would give us the safest rollout possible, and is something we should definitely do longer term with this timer change. However, it does not seem necessary with the current topology for the initial rollout. The most recent network survey data from earlier this month shows no nomination issues with tier 1, so it looks like the network is currently naturally dense enough to avoid issues. Given that we need the timer change for the new SLP, and we haven't really started working on fully connected tier 1, the risk seems low if we enable fully connected tier 1 after the timer change.

@marta-lokhova

Contributor

I think the critical realization here is that we're not seeing "slower nomination" per se, but rather old timer masking some of the nomination latency as "idle time waiting to trigger". Because we currently trigger pessimistically, it actually offsets some of the total nomination time (specifically, extra latency for the first nomination message to reach a node).

With the trigger timer change, we're getting a more accurate measurement of nomination latency. The downside is that latency is actually higher than what we report on pubnet right now, and this directly affects timeouts, so we're not sure whether this is going to be a problem on mainnet. I think what's blocking us right now is a proper rollout plan. We see that connectivity among Tier1 in the most recent network survey might be "good enough" to avoid extra timeouts. The problem is that the network is constantly churning, and even simple things like node restarts impact Tier1 connectivity.

@SirTyson given your proposal to make the change without changing the topology, could we put together the actual rollout plan, metrics/nodes to monitor, as well as plan in place in case things go sideways (and what does sideways actually mean here? is it just a perf degradation?)

@marta-lokhova

Contributor

Btw, I think there are options to explore wrt rollout plan. For example, we could do something like better utilize preferred peers to guarantee hop count that is "good enough" for trigger time (assuming future Tier1 expansion to 10 orgs, for example)
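As a rough illustration of the preferred-peers idea, the check below computes the worst hop count between tier 1 validators in a candidate overlay graph, so one could verify that a given set of preferred-peer links keeps every pair within a target such as the 2 hops mentioned earlier in the thread. It is a hypothetical helper, not stellar-core code; representing the overlay as an adjacency list is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <queue>
#include <vector>

// Hypothetical check: overlay graph as an adjacency list (node ids 0..n-1).
// Returns the largest shortest-path hop count between any two tier 1 nodes,
// routing through any node (including watchers), or -1 if some pair is
// disconnected.
int
maxTier1Hops(std::vector<std::vector<int>> const& adj,
             std::vector<int> const& tier1)
{
    int worst = 0;
    for (int src : tier1)
    {
        assert(src >= 0 && src < static_cast<int>(adj.size()));
        // Breadth-first search gives hop counts from src to every node.
        std::vector<int> dist(adj.size(), -1);
        std::queue<int> frontier;
        dist[src] = 0;
        frontier.push(src);
        while (!frontier.empty())
        {
            int u = frontier.front();
            frontier.pop();
            for (int v : adj[u])
            {
                if (dist[v] < 0)
                {
                    dist[v] = dist[u] + 1;
                    frontier.push(v);
                }
            }
        }
        for (int dst : tier1)
        {
            if (dist[dst] < 0)
            {
                return -1;
            }
            worst = std::max(worst, dist[dst]);
        }
    }
    return worst;
}
```

On a chain 0-1-2-3, tier 1 nodes 0 and 3 are 3 hops apart; adding a single preferred-peer link between them brings that down to 1.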

@SirTyson SirTyson force-pushed the externalize-clock branch from dd8b84b to 411eff9 Compare June 11, 2026 17:31
@socket-security

socket-security Bot commented Jun 11, 2026

Copy link
Copy Markdown

Review the following changes in direct dependencies.

  • Added cargo/rsntp@4.1.1: Supply Chain Security 100, Vulnerability 100, Quality 93, Maintenance 100, License 100


@SirTyson

Contributor Author

I've put together a rollout plan here. I've also added an NTP probe in the last commit. By default, we configure an NTP server that core periodically checks to make sure its local clock is synced. It's on by default for validators, but can be disabled. Hopefully this will help make sure the network has good clock hygiene before we make the switch.
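For reference, the clock-offset estimate an NTP check relies on is the standard on-wire calculation from the NTP specification (RFC 5905). The sketch below shows only that arithmetic. It is not how `NtpProbe.cpp` is implemented (the PR adds the `rsntp` crate as a dependency), and the `clockLooksSynced` threshold check is a hypothetical example.

```cpp
#include <cassert>
#include <chrono>

using Millis = std::chrono::milliseconds;

// One request/response exchange. t1/t4 are read from the local clock,
// t2/t3 are the server's receive/transmit timestamps.
struct NtpExchange
{
    Millis t1; // client transmit
    Millis t2; // server receive
    Millis t3; // server transmit
    Millis t4; // client receive
};

struct NtpEstimate
{
    Millis offset; // server clock minus local clock (positive: we are behind)
    Millis delay;  // round-trip network delay, excluding server processing
};

NtpEstimate
estimateClock(NtpExchange const& x)
{
    // Standard NTP on-wire formulas (RFC 5905).
    Millis const offset = ((x.t2 - x.t1) + (x.t3 - x.t4)) / 2;
    Millis const delay = (x.t4 - x.t1) - (x.t3 - x.t2);
    return {offset, delay};
}

// Hypothetical periodic check: is |offset| within the allowed bound?
bool
clockLooksSynced(NtpEstimate const& e, Millis maxOffset)
{
    Millis const absOffset = e.offset < Millis{0} ? -e.offset : e.offset;
    return absOffset <= maxOffset;
}
```

For a local clock 500 ms behind the server, with 20 ms per network leg and 10 ms of server processing, the exchange t1 = 1000, t2 = 1520, t3 = 1530, t4 = 1050 (ms) gives an offset of +500 ms and a delay of 40 ms.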

@SirTyson SirTyson force-pushed the externalize-clock branch from 411eff9 to fdf1e98 Compare June 16, 2026 22:31
Comment thread src/main/NtpProbe.cpp Outdated
@marta-lokhova

Contributor

I left a few cleanup comments. Given that with various simulation bugfixes described in this doc we don't observe nomination degradation with 100% new timer nodes, I think it's fine to merge this PR. The new functionality is behind an experimental flag, and we can finalize the rollout strategy independently to avoid bitrot.

@SirTyson SirTyson force-pushed the externalize-clock branch from fdf1e98 to 23b6095 Compare June 16, 2026 23:14
@SirTyson SirTyson force-pushed the externalize-clock branch from 23b6095 to ad88f37 Compare June 16, 2026 23:48
@SirTyson SirTyson added this pull request to the merge queue Jun 17, 2026
Merged via the queue into stellar:master with commit bbd9114 Jun 17, 2026
53 checks passed
@SirTyson SirTyson deleted the externalize-clock branch June 17, 2026 16:04