Add restore deep trace debug mode #255

Draft
sjmiller609 wants to merge 4 commits into hypeship/minimal-network-handoff from codex/restore-deep-trace-debug

@sjmiller609
Collaborator

Summary

  • add an opt-in restore deep trace mode gated by HYPEMAN_RESTORE_DEEP_TRACE=1
  • mark the milestones from Resume through the guest network ack in ftrace, and capture /proc process, thread, and io snapshots at each one
  • add a Linux-only perf test harness for Firecracker restore deep traces with guest network stage acks
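
For readers unfamiliar with the mechanism, here is a minimal sketch of an env-gated ftrace marker, not the code in this PR. The gate variable is the one named above. The helper names, the `hypeman_restore:` prefix, and the milestone spellings are hypothetical, and treating only the exact value `1` as enabled is an assumption. Writing a line to the tracefs `trace_marker` file is the standard way to place a userspace annotation inline with kernel events such as `kvm_page_fault`.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// deepTraceEnabled reports whether the opt-in gate is set.
// Assumption: only the exact value "1" enables the mode.
func deepTraceEnabled(getenv func(string) string) bool {
	return getenv("HYPEMAN_RESTORE_DEEP_TRACE") == "1"
}

// markMilestone writes one annotation line to w. When w is the tracefs
// trace_marker file, the line appears inline in the ftrace buffer.
// The "hypeman_restore:" prefix is a hypothetical naming choice.
func markMilestone(w io.Writer, name string) error {
	_, err := fmt.Fprintf(w, "hypeman_restore: %s\n", name)
	return err
}

func main() {
	if !deepTraceEnabled(os.Getenv) {
		fmt.Println("deep trace disabled")
		return
	}
	// Standard tracefs mount point; some systems expose it under
	// /sys/kernel/debug/tracing instead. Writing usually requires root.
	f, err := os.OpenFile("/sys/kernel/tracing/trace_marker", os.O_WRONLY, 0)
	if err != nil {
		fmt.Fprintln(os.Stderr, "open trace_marker:", err)
		return
	}
	defer f.Close()
	// Hypothetical milestone names for illustration only.
	for _, m := range []string{"resume_returned", "guest_signal_seen"} {
		if err := markMilestone(f, m); err != nil {
			fmt.Fprintln(os.Stderr, "write marker:", err)
		}
	}
}
```

Passing `getenv` as a function keeps the gate testable without mutating the process environment.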

Remote result

On deft-kernel-dev, 3 sampled forks showed the Resume-returned -> guest-signal window dominated by Firecracker vCPU page faults and backing-file reads, not guest netlink work:

  • reconfigure_guest_network: 130ms, 342ms, 220ms
  • guest_resume_network_udp_ack_wait: 110ms, 341ms, 219ms
  • by the time of guest_signal_seen: ~1.0k-1.1k minor faults, 3-29 major faults, and 19MB-140MB of read_bytes on the Firecracker process, almost entirely attributed to fc_vcpu 0
  • ftrace showed thousands of kvm_page_fault events plus filemap page-cache adds and block read-ahead issues during that window
  • guest-internal signal -> applied time remained about 4.7ms, 17.7ms, and 14.2ms
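
The fault and read_bytes counters above come from procfs. The following is a rough sketch of how such a snapshot could be parsed, not the harness from this PR; the function names are hypothetical. It reads `read_bytes` from `/proc/<pid>/io` and `minflt`/`majflt` (fields 10 and 12) from `/proc/<pid>/stat`. Per-thread values use the same format under `/proc/<pid>/task/<tid>/`.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseReadBytes extracts the read_bytes counter from /proc/<pid>/io content.
func parseReadBytes(procIO string) (uint64, error) {
	sc := bufio.NewScanner(strings.NewReader(procIO))
	for sc.Scan() {
		key, val, ok := strings.Cut(sc.Text(), ":")
		if ok && strings.TrimSpace(key) == "read_bytes" {
			return strconv.ParseUint(strings.TrimSpace(val), 10, 64)
		}
	}
	return 0, fmt.Errorf("read_bytes not found")
}

// parseFaults extracts minflt and majflt from /proc/<pid>/stat content.
// The comm field can contain spaces (for example "fc_vcpu 0"), so parsing
// starts after the last ')'.
func parseFaults(procStat string) (minflt, majflt uint64, err error) {
	end := strings.LastIndexByte(procStat, ')')
	if end < 0 {
		return 0, 0, fmt.Errorf("malformed stat line")
	}
	// fields[0] is stat field 3 (state), so field 10 (minflt) is fields[7]
	// and field 12 (majflt) is fields[9].
	fields := strings.Fields(procStat[end+1:])
	if len(fields) < 10 {
		return 0, 0, fmt.Errorf("stat line too short: %d fields after comm", len(fields))
	}
	if minflt, err = strconv.ParseUint(fields[7], 10, 64); err != nil {
		return 0, 0, err
	}
	if majflt, err = strconv.ParseUint(fields[9], 10, 64); err != nil {
		return 0, 0, err
	}
	return minflt, majflt, nil
}

func main() {
	// Snapshot the current process as a demonstration; Linux only.
	ioData, err := os.ReadFile("/proc/self/io")
	if err != nil {
		fmt.Fprintln(os.Stderr, "read io:", err)
		return
	}
	statData, err := os.ReadFile("/proc/self/stat")
	if err != nil {
		fmt.Fprintln(os.Stderr, "read stat:", err)
		return
	}
	rb, err := parseReadBytes(string(ioData))
	if err != nil {
		fmt.Fprintln(os.Stderr, "parse io:", err)
		return
	}
	minflt, majflt, err := parseFaults(string(statData))
	if err != nil {
		fmt.Fprintln(os.Stderr, "parse stat:", err)
		return
	}
	fmt.Printf("read_bytes=%d minflt=%d majflt=%d\n", rb, minflt, majflt)
}
```

Taking one snapshot at each milestone and subtracting gives the per-window deltas. Note that `read_bytes` counts reads that actually reached the storage layer, so it reflects backing-file I/O rather than page-cache hits.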

Validation

  • gofmt
  • git diff --check
  • GOCACHE=/private/tmp/hypeman-go-build go test ./lib/system/guest_agent ./lib/instances -run 'TestPatchGuestResumeNetworkMailbox|TestForkSnapshotMapsWaitForNetwork|TestRestoreDeepTracePerf' -count=1
  • remote: sudo -E env ... HYPEMAN_RUN_RESTORE_DEEP_TRACE_PERF=1 HYPEMAN_RESTORE_DEEP_TRACE_PERF_ITERS=3 go test ./lib/instances -run TestRestoreDeepTracePerf -count=1 -timeout=20m -v
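
As an illustration of the opt-in pattern in the remote command, the sketch below reads the two environment variables shown there. It is not the test code from this PR: the `perfConfig` type and `loadPerfConfig` name are hypothetical, and the default of one iteration and the rejection of non-positive values are assumptions.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// perfConfig holds the opt-in settings read from the environment.
type perfConfig struct {
	Enabled bool
	Iters   int
}

// loadPerfConfig reads the two variables used in the remote command.
// Assumptions: the default is 1 iteration, and non-positive or
// non-numeric iteration counts are rejected.
func loadPerfConfig(getenv func(string) string) (perfConfig, error) {
	cfg := perfConfig{
		Enabled: getenv("HYPEMAN_RUN_RESTORE_DEEP_TRACE_PERF") == "1",
		Iters:   1,
	}
	if raw := getenv("HYPEMAN_RESTORE_DEEP_TRACE_PERF_ITERS"); raw != "" {
		n, err := strconv.Atoi(raw)
		if err != nil || n <= 0 {
			return cfg, fmt.Errorf("invalid HYPEMAN_RESTORE_DEEP_TRACE_PERF_ITERS %q", raw)
		}
		cfg.Iters = n
	}
	return cfg, nil
}

func main() {
	cfg, err := loadPerfConfig(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("enabled=%v iters=%d\n", cfg.Enabled, cfg.Iters)
}
```

In a Go test, a disabled config would typically lead to `t.Skip`, which keeps the expensive run out of the default `go test` invocation.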
