Predict IVF-PQ FP16 overflow and auto-switch to FP32#2246
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a new ChangesFP16 Overflow Detection and CAGRA Integration
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh`:
- Line 39: Remove the debug printf statement at line 39 in the
ivf_pq_fp16_overflow.cuh file. This printf executes within a loop that iterates
over every dimension of every sampled row, causing severe performance
degradation and output flooding. Simply delete the entire printf line as it
serves no purpose in production code.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 086831a1-8cdd-42f2-aaaf-8ac33b75c173
📒 Files selected for processing (2)
cpp/src/neighbors/detail/cagra/cagra_build.cuhcpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh (1)
70-88:⚠️ Potential issue | 🟠 Major | ⚡ Quick winHIGH: Missing validation could cause kernel launch with zero grid blocks.
If
n_rows == 0(empty dataset) or ifsample_fractionis extremely small with a moderate dataset size (e.g.,n_rows=5000, sample_fraction=0.0001),n_samplebecomes 0, resulting ingrid_size = 0. Launching a kernel with zero blocks is invalid.Consider adding a minimum sample size guard:
Suggested fix
int64_t n_sample = static_cast<int64_t>(sample_fraction * static_cast<double>(n_rows)); if (n_rows <= 1000) { n_sample = n_rows; // for small datasets, just use them all and skip the sampling overhead } else if (n_rows > 100000) { // cap the sample size to 100k for speed and keep memory use within the limited workspace resource n_sample = 100000; } + + // Handle empty dataset or degenerate sample size + if (n_sample <= 0) { return 0.0f; }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh` around lines 70 - 88, Add a minimum sample size guard to prevent grid_size from being zero when launching the kernel. After the existing size constraint checks for n_rows (the checks for n_rows <= 1000 and n_rows > 100000), add a condition to ensure n_sample is at least 1, since if n_sample is 0, the subsequent calculation of grid_size will be 0, resulting in an invalid kernel launch with zero blocks. This guards against both empty datasets (n_rows == 0) and extremely small sample fractions.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh`:
- Around line 70-88: Add a minimum sample size guard to prevent grid_size from
being zero when launching the kernel. After the existing size constraint checks
for n_rows (the checks for n_rows <= 1000 and n_rows > 100000), add a condition
to ensure n_sample is at least 1, since if n_sample is 0, the subsequent
calculation of grid_size will be 0, resulting in an invalid kernel launch with
zero blocks. This guards against both empty datasets (n_rows == 0) and extremely
small sample fractions.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: a794a6f7-8a2b-4f29-ac12-49b5d25d10ab
📒 Files selected for processing (1)
cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh
mfoerste4
left a comment
There was a problem hiding this comment.
Thanks @huuanhhuyn for the PR. Please utilize raft for the norm computation if possible.
mfoerste4
left a comment
There was a problem hiding this comment.
LGTM, thanks @huuanhhuyn for the PR!
| RAFT_EXPECTS(kDelay >= kSaturation, | ||
| "kDelay must not be smaller than kSaturation so that n_sample is always less than " | ||
| "or equal to n_rows"); | ||
| int64_t n_sample = (n_rows * kSaturation + (n_rows + kDelay - 1)) / (n_rows + kDelay); |
There was a problem hiding this comment.
Please use raft::ceildiv here.
Large-magnitude unnormalized datasets (e.g. SIFT_1M) contains vectors whose norm overflowed the FP16 internal search types of IVF-PQ.
This PR detects it by sampling a fraction from the dataset, compute its L2 squared norm and estimate the max squared distance from the samples. When the distance reaches FP16 overflow range, the internal types fall back to FP32.
Sampling: We uniformly sampled

n_samplesamples from then_rowsof the datasetMeasured runtime: several hundreds of milliseconds on 100M rows