Harden quartile statistics and measurement timeout handling#392
Open
oleksandr-pavlyk wants to merge 16 commits into
Open
Harden quartile statistics and measurement timeout handling#392oleksandr-pavlyk wants to merge 16 commits into
oleksandr-pavlyk wants to merge 16 commits into
Conversation
Also add comment within percentile_rank to document precondition on input values checked with assert statement. Also, sharpened the comment around percentile_rank function
Prepare duplicate heavy input and check sort-based quartile computation result with selection-based one. std::nth_element only guarantees that the nth element is the value that would appear there in sorted order; it does not fully sort equal partitions. Bugs in the selection implementation, especially when selecting Q1 from the left half and Q3 from the right half after selecting the median, are more likely to show up when many samples equal the quartile values.
…ulated Cold measurement can discard throttled trials before incrementing the accepted sample count, then stop on timeout with zero recorded samples. In that case, only emit the sample-size summary and skip derived timing, bandwidth, clock, and bulk summaries that require accepted samples. This avoids divide-by-zero mean calculations and quartile/IQR computation over empty sample vectors. Keep timeout diagnostics reachable for zero-sample runs and add an explicit warning when no accepted cold samples were recorded. Factor timeout warning emission into a private helper so the zero-sample and normal paths share the same diagnostic logic. Suppress low-sample relative stdev noise Add a statistics helper that returns no relative standard-deviation noise until there are enough samples for a meaningful estimate. Use it for cold CPU/GPU and CPU-only summaries so the low-sample +inf stdev sentinel is not published as real relative noise or used for max-noise timeout warnings. Add statistics coverage for suppressing the low-sample sentinel and computing relative stdev noise once the sample threshold is reached. compute_standard_deviation_noise return nullopt if standard deviation is not finite Test verify that noise is nullopt when not enough samples are accumulated Added statistics::has_enough_samples_for_noise_estimate(...) Used it in standard_deviation, compute_standard_deviation_noise, compute_robust_noise. Added timeout diagnostics in cold and CPU-only paths. if max-noise is configured and the run timed out before enough samples exist to estimate noise, the log now says that explicitly, otherwise the existing “over noise threshold” warning remains unchanged. Added a statistics test assertion for the new sample-count predicate.
Add fixed expected-value assertions for quartile tests around the sort/selection switch point, including duplicate-heavy inputs. This keeps the tests from only proving that both implementations agree with each other.
Introduce new header file with inline implementation. Use it from measure_cold.cuh and measure_cpu_only.cxx
Collaborator
Author
|
@coderabbitai full review |
This comment was marked as outdated.
This comment was marked as outdated.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (8)
📝 WalkthroughSummary by CodeRabbit
WalkthroughAdds shared statistics helpers, NaN-aware percentile/quartile handling, centralized timeout-warning logging for CPU-only and cold measurements, and expanded tests for NaN, duplicate-heavy, and invalid-noise cases. ChangesStatistics and timeout diagnostics
Comment |
… constant Expose quartile threshold value, use it in testing to test around that value.
Keep legacy stdev/relative summary tags present even when too few samples are available to compute a meaningful standard-deviation noise estimate. Use the standard-deviation unavailable sentinel for those values so existing summary consumers continue to see the expected tags. Factor the sentinel into the statistics helpers and use it from both standard_deviation() and stdev_noise_or_sentinel(), keeping the schema compatibility behavior explicit and tested.
…navailable Add a focused test target, nvbench.test.measure_timeout_warnings, covering: - NaN stdev noise -> “unable to estimate noise” - negative stdev noise -> “unable to estimate noise” - +inf stdev noise -> “over noise threshold”
Collaborator
Author
|
@coderabbitai full review |
This comment was marked as outdated.
This comment was marked as outdated.
check_noise_warning() now takes std::optional<nvbench::float64_t>, matching the production helper, and the test now covers std::nullopt explicitly in addition to NaN, negative, and +inf.
Collaborator
Author
|
@coderabbitai full review |
This comment was marked as outdated.
This comment was marked as outdated.
Document that percentile helpers return quiet NaNs for NaN-containing inputs. Make quartile expected-value tests compute ranks from the documented round(p / 100 * (n - 1)) rule instead of reusing statistics::percentile_rank(), so rank regressions are caught independently. Extend timeout-warning coverage to exercise the too-few-samples max-noise path in addition to unavailable, invalid, and infinite stdev-noise inputs.
Collaborator
Author
|
@coderabbitai full review |
This comment was marked as outdated.
This comment was marked as outdated.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR hardens the robust timing-summary utilities and the measurement paths that consume them.
Changes include:
std::moveinstead ofstd::forwardwhen consuming an rvalue vector.stdev/relativesummary tags for low-sample runs by emitting the standard-deviation unavailable sentinel.+infas the over-threshold sentinel.Testing
testing/statistics.cucoverage for quartiles, NaN inputs, invalid noise inputs, and threshold-boundary behavior.testing/measure_timeout_warnings.cucoverage for timeout-warning noise diagnostics.Changes are driven by agentic review.