Experiment with VMClock resume network handoff #254

Closed
sjmiller609 wants to merge 1 commit into hypeship/minimal-network-handoff from codex/vmclock-network-handoff

@sjmiller609
Collaborator

Summary

This is an experimental branch stacked on hypeship/minimal-network-handoff to evaluate the new ch-6.16.9-kernel-0.1-202605301 VMClock generation-counter notification path for guest-initiated resume network handoff.

Changes:

  • bump the experiment branch to Firecracker v1.15.1 and kernel ch-6.16.9-kernel-0.1-202605301
  • add guest-agent resume signal selection: auto, vmclock, vmgenid, and experimental vmclock-spin
  • add a gated resume-network signal perf harness
  • add gated ack-stage telemetry to split mailbox-observed time from netlink/apply time
  • add a gated host-side prefetch experiment for validation
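The signal selection in the list above could be parsed roughly as follows. This is a minimal sketch, not the branch's code: the type and function names are hypothetical, treating an empty value as `auto` is an assumption, and reading `HYPEMAN_RESUME_NETWORK_SIGNAL` here only mirrors the variable used in the benchmark command (the guest-agent may be configured differently).

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// ResumeSignal names the resume signal modes listed in this PR.
// The type and constant names are hypothetical.
type ResumeSignal string

const (
	SignalAuto        ResumeSignal = "auto"
	SignalVMClock     ResumeSignal = "vmclock"
	SignalVMGenID     ResumeSignal = "vmgenid"
	SignalVMClockSpin ResumeSignal = "vmclock-spin" // experimental
)

// ParseResumeSignal maps a raw setting to a known mode.
// Assumption: an empty value falls back to auto.
func ParseResumeSignal(raw string) (ResumeSignal, error) {
	switch s := ResumeSignal(strings.ToLower(strings.TrimSpace(raw))); s {
	case "":
		return SignalAuto, nil
	case SignalAuto, SignalVMClock, SignalVMGenID, SignalVMClockSpin:
		return s, nil
	default:
		return "", fmt.Errorf("unknown resume network signal %q", raw)
	}
}

func main() {
	sig, err := ParseResumeSignal(os.Getenv("HYPEMAN_RESUME_NETWORK_SIGNAL"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("resume signal:", sig)
}
```

Rejecting unknown values rather than silently falling back keeps a mistyped mode from being benchmarked as `auto`.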

Benchmark Notes

Remote host: deft-kernel-dev
Benchmark: TestResumeNetworkSignalPerf, Firecracker, Alpine, 1 vCPU, wait_for_network=true.
The first fork in each run is excluded from the comparison below because it consistently includes first-use setup/cache behavior and has very different total latency.
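For reference, the avg / p50 / max figures with the first fork excluded can be computed along these lines. This is a sketch under assumptions: the function name is hypothetical, p50 uses the nearest-rank method (the harness may use a different percentile definition), and the sample latencies in `main` are made up for illustration, not taken from the benchmark.

```go
package main

import (
	"fmt"
	"sort"
)

// Summary holds the three statistics reported per column.
type Summary struct {
	Avg float64
	P50 int64
	Max int64
}

// SummarizeExcludingFirst drops the first sample (the warm-up fork) and
// summarizes the rest. It returns false when nothing remains to summarize.
func SummarizeExcludingFirst(ms []int64) (Summary, bool) {
	if len(ms) < 2 {
		return Summary{}, false
	}
	rest := append([]int64(nil), ms[1:]...) // copy so the caller's slice is not reordered
	sort.Slice(rest, func(i, j int) bool { return rest[i] < rest[j] })
	var sum int64
	for _, v := range rest {
		sum += v
	}
	return Summary{
		Avg: float64(sum) / float64(len(rest)),
		P50: rest[(len(rest)-1)/2], // nearest-rank median (lower middle for even counts)
		Max: rest[len(rest)-1],
	}, true
}

func main() {
	// Made-up latencies in milliseconds; the first value plays the slow first fork.
	samples := []int64{1500, 400, 380, 650, 420}
	if s, ok := SummarizeExcludingFirst(samples); ok {
		fmt.Printf("%.1f / %d / %d ms\n", s.Avg, s.P50, s.Max)
	}
}
```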

Results, excluding first fork

| signal | fork total avg / p50 / max (ms) | reconfigure avg / p50 / max (ms) | mailbox ack avg / p50 / max (ms) | apply after mailbox avg / p50 / max (ms) |
| --- | --- | --- | --- | --- |
| VMClock poll | 437.8 / 384 / 652 | 265.6 / 230 / 458 | 262.1 / 227 / 453 | 3.4 / 3 / 5 |
| VMGenID kmsg | 444.2 / 455 / 518 | 251.3 / 263 / 306 | 246.7 / 257 / 300 | 4.6 / 6 / 7 |
| VMClock mmap spin | 435.1 / 421 / 664 | 257.8 / 210 / 490 | 255.0 / 208 / 486 | 2.6 / 2 / 4 |

Interpretation

VMClock did not improve the hot path in this benchmark. The guest-side netlink/apply work already takes only about 2-7 ms once the guest-agent sees the mailbox. Almost all of the remaining delay occurs before the guest-agent observes the resume signal and decodes the mailbox.
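The split can be checked against the reported averages. The sketch below uses the avg values from the results table and assumes, as the numbers suggest, that reconfigure time is roughly mailbox ack plus apply-after-mailbox; the struct and function names are hypothetical.

```go
package main

import "fmt"

// Row holds the reported averages (ms) for one signal mode.
type Row struct {
	Signal         string
	ReconfigureAvg float64
	MailboxAckAvg  float64
	ApplyAvg       float64
}

// ReportedRows copies the avg columns from the results table.
var ReportedRows = []Row{
	{"VMClock poll", 265.6, 262.1, 3.4},
	{"VMGenID kmsg", 251.3, 246.7, 4.6},
	{"VMClock mmap spin", 257.8, 255.0, 2.6},
}

// PreMailboxShare returns the fraction of reconfigure time that elapsed
// before the guest-agent acknowledged the mailbox.
func PreMailboxShare(r Row) float64 {
	return r.MailboxAckAvg / r.ReconfigureAvg
}

func main() {
	for _, r := range ReportedRows {
		fmt.Printf("%-18s pre-mailbox share: %.1f%%\n", r.Signal, 100*PreMailboxShare(r))
	}
}
```

In all three modes the share is above 98%, so a faster notification path can only help if it shortens the time before the mailbox is observed.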

The mmap spin experiment did not materially improve over poll(), which suggests the delay is not just the VMClock notification path. It looks more like guest userspace/vCPU progress after Resume returns, or possibly restore-time memory/page/cache behavior.
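To make the spin variant concrete, here is a minimal sketch of busy-waiting on a generation counter with a deadline. It is not the branch's implementation: an in-process atomic counter stands in for the mapped VMClock page, and `GenCounter` and `SpinUntilChanged` are hypothetical names. The real field layout and mapping are not shown.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
	"time"
)

// GenCounter is a hypothetical stand-in for a generation counter that a
// hypervisor would bump in a shared page on resume.
type GenCounter struct{ v atomic.Uint32 }

func (c *GenCounter) Load() uint32 { return c.v.Load() }
func (c *GenCounter) Bump() uint32 { return c.v.Add(1) }

// SpinUntilChanged busy-waits until the counter differs from last or the
// timeout expires. It yields between reads so other goroutines can run.
func SpinUntilChanged(c *GenCounter, last uint32, timeout time.Duration) (uint32, bool) {
	deadline := time.Now().Add(timeout)
	for {
		if v := c.Load(); v != last {
			return v, true
		}
		if time.Now().After(deadline) {
			return last, false
		}
		runtime.Gosched()
	}
}

func main() {
	var c GenCounter
	last := c.Load()
	go func() {
		time.Sleep(5 * time.Millisecond) // simulated resume
		c.Bump()
	}()
	start := time.Now()
	v, ok := SpinUntilChanged(&c, last, time.Second)
	fmt.Println("generation:", v, "changed:", ok, "waited:", time.Since(start))
}
```

A spinner like this removes wakeup latency from the notification path, but it still needs the vCPU to be running guest userspace, which is consistent with the observation that spinning did not beat poll().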

Host-side prefetch experiments were inconclusive or unhelpful:

  • 64MiB around the mailbox offset did not move the distribution meaningfully.
  • 1GiB prefetch added pre-resume cost and still left later forks at roughly 260-290 ms mailbox ack.

Test Commands

# Local compile checks
env GOCACHE=/private/tmp/hypeman-go-build go test ./lib/system/guest_agent ./lib/instances -run 'TestResumeNetworkSignalPerf|TestPatchGuestResumeNetworkMailbox' -count=1
git diff --check

# Remote benchmark examples
sudo -E env PATH="$PATH" HOME="$HOME" GOCACHE=/tmp/hypeman-go-build \
  HYPEMAN_RUN_RESUME_NETWORK_SIGNAL_PERF=1 \
  HYPEMAN_RESUME_NETWORK_SIGNAL=vmclock \
  HYPEMAN_RESUME_NETWORK_SIGNAL_PERF_ITERS=10 \
  go test ./lib/instances -run TestResumeNetworkSignalPerf -count=1 -timeout=20m -v

@sjmiller609
Collaborator Author

Closing this as an archived VMClock experiment while the production handoff path moves to #260. Keeping the old branch around for comparison.

@sjmiller609 sjmiller609 closed this Jun 1, 2026