Skip to content

Fix gather/scatter fast division bug#3059

Open
huuanhhuyn wants to merge 4 commits into
NVIDIA:mainfrom
huuanhhuyn:bug_3055
Open

Fix gather/scatter fast division bug#3059
huuanhhuyn wants to merge 4 commits into
NVIDIA:mainfrom
huuanhhuyn:bug_3055

Conversation

@huuanhhuyn

@huuanhhuyn huuanhhuyn commented Jun 24, 2026

Copy link
Copy Markdown

raft::matrix::gather/scatter crashes at int32 boundary even when it advertises int64 type -> issue 3055. This PR fixes this issue.

int64 is still not supported for performance reason with int128.

closes #3055

- gather/scatter crashes at int32 boundary even when it advertises int64 type. This change fixes this issue.
- int64 is still not supported for performance reason with int128.
@copy-pr-bot

copy-pr-bot Bot commented Jun 24, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@tarang-jain tarang-jain left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for the fix.

@tarang-jain

Copy link
Copy Markdown
Contributor

Actually, can we add a test for this diff? We could either add a test for FastIntDiv directly, or we can test the raft::matrix::gather when N * D exceeds INT_MAX. I am more inclined to have tests for both -- testing FastIntDiv should be quick where we just compare with the C++ operators ( / and %). But the more important one is to test the raft::matrix::gather for those larger matrices.

@huuanhhuyn huuanhhuyn requested a review from a team as a code owner June 26, 2026 08:23
Comment thread cpp/tests/matrix/gather.cu Outdated
inplace_inputs_i64);
GATHER_TEST((GatherTest<false, false, true, float, int64_t, int64_t>),
GatherInplaceTestCI64I64,
inplace_inputs_i64_i32_max); // slow test, 8GB allocation, reproduces https://github.com/rapidsai/raft/issues/3055

@huuanhhuyn huuanhhuyn Jun 26, 2026

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tfeher @tarang-jain Only float type is possible here (8GB allocation, 100ms). I would suggest to not have it. I already added a regression test for the fix in fast_int_div.cu

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is fine to spend 100ms on this.

@tfeher tfeher left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Huy for the fix, LGTM!

Comment thread cpp/tests/matrix/gather.cu Outdated
inplace_inputs_i64);
GATHER_TEST((GatherTest<false, false, true, float, int64_t, int64_t>),
GatherInplaceTestCI64I64,
inplace_inputs_i64_i32_max); // slow test, 8GB allocation, reproduces https://github.com/rapidsai/raft/issues/3055

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is fine to spend 100ms on this.

@tfeher

tfeher commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

/ok to test 9acb86f

@aamijar aamijar requested review from a team as code owners June 26, 2026 21:37
@aamijar

aamijar commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

/ok to test 33ec5d7

@aamijar

aamijar commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

/ok to test b18e428

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

breaking Breaking change bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] raft::matrix::gather (in-place overload): illegal memory access when N*D ≥ 2^31 (32-bit index overflow)

4 participants