Add multi-event sampling #7
Open
GuillaumeLagrange wants to merge 5 commits into
Open extra hardware events as counting-only siblings in each sampling event's kernel event group, so every sample carries the group's counter values via PERF_SAMPLE_READ.

The converter turns the cumulative values into per-sample deltas, tracked per (kernel event id, tid) because with inherit every thread has its own counter instance sharing the group's id. The deltas ride along on the thread's stack samples and are serialized as a new samplyEventDeltas object in the samples table (one column per event, null when a sample carries no value), next to threadCPUDelta.

This keeps the association between stack and event values exact by construction, with per-thread attribution, instead of requiring a timestamp join against per-process counter tracks. The field is a samply extension which the Firefox Profiler UI ignores.

If the extra events cannot be opened (PMU not exposed, or inherit + PERF_SAMPLE_READ requiring Linux 6.12), retry hardware-cycles sampling without them before falling back to software cpu-clock.

No events are wired up yet; event selection comes separately.

Refs COD-2810

Co-Authored-By: Claude <noreply@anthropic.com>
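For illustration, the conversion from cumulative counter values to per-sample deltas keyed by (kernel event id, tid) could look roughly like the sketch below. `DeltaTracker` and its method are hypothetical names, not the types used in this PR, and returning `None` for the first sample of a counter instance is an assumption about how a missing baseline is handled.

```rust
use std::collections::HashMap;

/// Hypothetical helper, not samply's actual type.
/// Remembers the last cumulative value per (kernel event id, tid).
#[derive(Default)]
struct DeltaTracker {
    last: HashMap<(u64, i32), u64>,
}

impl DeltaTracker {
    /// Records `cumulative` for this counter instance and returns the
    /// increase since the previous sample of the same (event id, tid).
    /// Assumption: the first sample has no baseline, so it yields `None`
    /// (which would serialize as null in the samples table).
    fn delta(&mut self, event_id: u64, tid: i32, cumulative: u64) -> Option<u64> {
        // `insert` returns the previously stored value, if any.
        let previous = self.last.insert((event_id, tid), cumulative)?;
        // Assumption: use wrapping subtraction so a counter wrap cannot panic.
        Some(cumulative.wrapping_sub(previous))
    }
}
```

Keying on the tid as well as the event id matters because, with inherit, two threads report different cumulative values under the same group id; keying on the id alone would mix their baselines.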
The interface is deliberately very low-level: the logic of choosing which events to open, and how to call it, is left to the caller.
Pre-6.12 kernels reject `inherit` combined with `PERF_SAMPLE_READ`, so the extra-event group (the per-sample hardware counters) couldn't be opened alongside the inherited sampling events; samply hit EINVAL and silently dropped the extra counters.

Probe the kernel once for inherited sample-read support. When it's missing, open the sampling group per-CPU system-wide (pid=-1, no inherit) across all CPUs instead of per-task. This still captures every thread on the CPU, including ones spawned after attach (which the per-task no-inherit approach would miss), and keeps stacks and counters in a single sample record.

Samples are filtered to the launched process tree (seeded from the profiled pids, grown on fork) so the system-wide buffers don't pull in unrelated processes. System-wide events aren't tied to the launched task, so they're enabled explicitly (ENABLE_ON_EXEC never fires for them) and profiling stops as soon as the launched process exits (system-wide events never close on their own).

On >=6.12 the kernel accepts inherit + PERF_SAMPLE_READ, so the existing per-task + inherit path is used unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
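A minimal sketch of the process-tree filter described above, assuming fork notifications carry the parent and child pids. `ProcessTreeFilter`, `on_fork`, and `accepts` are hypothetical names chosen for illustration; the actual implementation in this PR may be structured differently.

```rust
use std::collections::HashSet;

/// Hypothetical filter, not samply's actual type.
/// Tracks the pids that belong to the launched process tree.
struct ProcessTreeFilter {
    pids: HashSet<i32>,
}

impl ProcessTreeFilter {
    /// Seeds the tree with the initially profiled pids.
    fn new(seed: impl IntoIterator<Item = i32>) -> Self {
        Self { pids: seed.into_iter().collect() }
    }

    /// Grows the tree on fork: a child joins only if its parent is tracked.
    fn on_fork(&mut self, parent_pid: i32, child_pid: i32) {
        if self.pids.contains(&parent_pid) {
            self.pids.insert(child_pid);
        }
    }

    /// Returns true if a sample from `pid` should be kept.
    fn accepts(&self, pid: i32) -> bool {
        self.pids.contains(&pid)
    }
}
```

With this shape, samples from system-wide buffers are dropped unless their pid is in the set, so unrelated processes on the same CPU never reach the converter.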
not-matthias
approved these changes
Jun 12, 2026
not-matthias
left a comment
Member
Looks good overall, just want to make sure we don't accidentally introduce variance/noise into benchmarks due to now profiling the entire system.
Member
Is there a perf drawback to using system-wide capture? This could lead to unwanted variance in walltime benchmarks on macro runners.
Related: AvalancheHQ/linux-perf-event-reader@908775c
Which is updated in our linux-perf-data reference
Initially opened by mistake as #6.