Skip to content

Harden quartile statistics and measurement timeout handling#392

Open
oleksandr-pavlyk wants to merge 16 commits into
NVIDIA:mainfrom
oleksandr-pavlyk:harden-robust-stats-and-timeout-diag
Open

Harden quartile statistics and measurement timeout handling#392
oleksandr-pavlyk wants to merge 16 commits into
NVIDIA:mainfrom
oleksandr-pavlyk:harden-robust-stats-and-timeout-diag

Conversation

@oleksandr-pavlyk

@oleksandr-pavlyk oleksandr-pavlyk commented Jun 26, 2026

Copy link
Copy Markdown
Collaborator

This PR hardens the robust timing-summary utilities and the measurement paths that consume them.

Changes include:

  • Use std::move instead of std::forward when consuming an rvalue vector.
  • Avoid generating derived cold-measurement timing summaries when no samples were accepted, but keep wall-time summaries and timeout diagnostics available for zero-sample timeout cases.
  • Constrain direct percentile/quartile helpers to floating-point value types.
  • Guard percentile/quartile routines against NaN inputs before sorting or selection.
  • Add regression coverage for quartile behavior around the sort/selection threshold, including duplicate-heavy inputs.
  • Document the sort/selection threshold rationale.
  • Preserve legacy stdev/relative summary tags for low-sample runs by emitting the standard-deviation unavailable sentinel.
  • Treat missing, NaN, and negative stdev noise as unavailable in timeout diagnostics, while preserving +inf as the over-threshold sentinel.
  • Share timeout-warning logic between cold and CPU-only measurement paths.

Testing

  • Added/updated testing/statistics.cu coverage for quartiles, NaN inputs, invalid noise inputs, and threshold-boundary behavior.
  • Added testing/measure_timeout_warnings.cu coverage for timeout-warning noise diagnostics.
  • Existing measurement tests continue to exercise cold and CPU-only summary generation.

Changes are driven by agentic review.

Also add comment within percentile_rank to document precondition
on input values checked with assert statement.

Also, sharpened the comment around percentile_rank function
Prepare duplicate heavy input and check sort-based
quartile computation result with selection-based one.

std::nth_element only guarantees that the nth element
is the value that would appear there in sorted order;
it does not fully sort equal partitions. Bugs in the
selection implementation, especially when selecting Q1
from the left half and Q3 from the right half after
selecting the median, are more likely to show up when
many samples equal the quartile values.
…ulated

Cold measurement can discard throttled trials before incrementing the accepted
sample count, then stop on timeout with zero recorded samples. In that case,
only emit the sample-size summary and skip derived timing, bandwidth, clock, and
bulk summaries that require accepted samples.

This avoids divide-by-zero mean calculations and quartile/IQR computation over
empty sample vectors.

Keep timeout diagnostics reachable for zero-sample runs and add an explicit
warning when no accepted cold samples were recorded. Factor timeout warning
emission into a private helper so the zero-sample and normal paths share the
same diagnostic logic.

Suppress low-sample relative stdev noise

Add a statistics helper that returns no relative standard-deviation noise until
there are enough samples for a meaningful estimate. Use it for cold CPU/GPU and
CPU-only summaries so the low-sample +inf stdev sentinel is not published as
real relative noise or used for max-noise timeout warnings.

Add statistics coverage for suppressing the low-sample sentinel and computing
relative stdev noise once the sample threshold is reached.

compute_standard_deviation_noise return nullopt if standard deviation is not finite

Test verify that noise is nullopt when not enough samples are accumulated

Added statistics::has_enough_samples_for_noise_estimate(...)

Used it in standard_deviation, compute_standard_deviation_noise,
compute_robust_noise.

Added timeout diagnostics in cold and CPU-only paths.
if max-noise is configured and the run timed out before enough
samples exist to estimate noise, the log now says that explicitly,
otherwise the existing “over noise threshold” warning remains
unchanged.

Added a statistics test assertion for the new sample-count
predicate.
Add fixed expected-value assertions for quartile tests around the
sort/selection switch point, including duplicate-heavy inputs. This keeps the
tests from only proving that both implementations agree with each other.
Introduce new header file with inline implementation. Use it
from measure_cold.cuh and measure_cpu_only.cxx
@oleksandr-pavlyk

Copy link
Copy Markdown
Collaborator Author

@coderabbitai full review

@coderabbitai

This comment was marked as outdated.

@coderabbitai

coderabbitai Bot commented Jun 26, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 63e553c4-61d8-43ef-bb86-49405d96d8da

📥 Commits

Reviewing files that changed from the base of the PR and between b1c61e5 and d230a16.

📒 Files selected for processing (8)
  • nvbench/detail/measure_cold.cu
  • nvbench/detail/measure_cold.cuh
  • nvbench/detail/measure_cpu_only.cxx
  • nvbench/detail/measure_timeout_warnings.cuh
  • nvbench/detail/statistics.cuh
  • testing/CMakeLists.txt
  • testing/measure_timeout_warnings.cu
  • testing/statistics.cu

📝 Walkthrough

Summary by CodeRabbit

  • New Features
    • Enhanced measurement-timeout diagnostics with more contextual warnings when limits are exceeded or noise can’t be estimated.
  • Bug Fixes
    • Cleaner handling for runs with zero accepted samples (early, explicit warning) and improved relative stdev noise behavior.
    • Safer NaN-aware percentile/quartile outputs and more consistent noise estimation gating.
  • Tests
    • Added dedicated timeout-warning tests and expanded coverage for NaN quartiles/percentiles, duplicate-heavy inputs, and standard-deviation-noise calculations.

Walkthrough

Adds shared statistics helpers, NaN-aware percentile/quartile handling, centralized timeout-warning logging for CPU-only and cold measurements, and expanded tests for NaN, duplicate-heavy, and invalid-noise cases.

Changes

Statistics and timeout diagnostics

Layer / File(s) Summary
Noise helpers
nvbench/detail/statistics.cuh
Adds a shared sample-size predicate, uses it in standard-deviation and robust-noise gating, and adds compute_standard_deviation_noise(...).
NaN-aware percentiles
nvbench/detail/statistics.cuh
Makes percentile and quartile helpers return quiet NaNs for NaN-containing inputs and updates the quartile-selection crossover comment.
Timeout warnings in CPU-only measurements
nvbench/detail/measure_timeout_warnings.cuh, nvbench/detail/measure_cpu_only.cxx
Adds shared timeout-warning logging helpers and routes CPU-only timeout warnings through them with the new standard-deviation noise value.
Timeout warnings in cold measurements
nvbench/detail/measure_cold.cuh, nvbench/detail/measure_cold.cu
Declares and uses log_timeout_warnings(...), moves the walltime summary earlier, and adds the zero-accepted-samples early return.
Statistics validation cases
testing/CMakeLists.txt, testing/measure_timeout_warnings.cu, testing/statistics.cu
Adds the new timeout-warning test target and expands statistics tests for NaN, duplicate-heavy, and invalid-noise cases.

Comment @coderabbitai help to get the list of available commands.

coderabbitai[bot]

This comment was marked as resolved.

… constant

Expose quartile threshold value, use it in testing to test around that value.
Keep legacy stdev/relative summary tags present even when too few
samples are available to compute a meaningful standard-deviation noise
estimate. Use the standard-deviation unavailable sentinel for those
values so existing summary consumers continue to see the expected tags.

Factor the sentinel into the statistics helpers and use it from both
standard_deviation() and stdev_noise_or_sentinel(), keeping the schema
compatibility behavior explicit and tested.
…navailable

Add a focused test target, nvbench.test.measure_timeout_warnings, covering:

  - NaN stdev noise -> “unable to estimate noise”
  - negative stdev noise -> “unable to estimate noise”
  - +inf stdev noise -> “over noise threshold”
@oleksandr-pavlyk

Copy link
Copy Markdown
Collaborator Author

@coderabbitai full review

@coderabbitai

This comment was marked as outdated.

coderabbitai[bot]

This comment was marked as resolved.

check_noise_warning() now takes std::optional<nvbench::float64_t>,
matching the production helper, and the test now covers
std::nullopt explicitly in addition to NaN, negative, and +inf.
@oleksandr-pavlyk

Copy link
Copy Markdown
Collaborator Author

@coderabbitai full review

@coderabbitai

This comment was marked as outdated.

coderabbitai[bot]

This comment was marked as resolved.

Document that percentile helpers return quiet NaNs for NaN-containing inputs.

Make quartile expected-value tests compute ranks from the documented
round(p / 100 * (n - 1)) rule instead of reusing statistics::percentile_rank(),
so rank regressions are caught independently.

Extend timeout-warning coverage to exercise the too-few-samples max-noise path
in addition to unavailable, invalid, and infinite stdev-noise inputs.
@oleksandr-pavlyk

Copy link
Copy Markdown
Collaborator Author

@coderabbitai full review

@coderabbitai

This comment was marked as outdated.

@oleksandr-pavlyk oleksandr-pavlyk marked this pull request as ready for review June 28, 2026 13:57
@oleksandr-pavlyk oleksandr-pavlyk requested a review from fbusato June 28, 2026 13:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant