Allow explicit data transfers to GPUs #156620

Draft
ZuseZ4 wants to merge 15 commits into rust-lang:main from ZuseZ4:offload-explicit-datatransfer

Conversation

ZuseZ4 commented May 15, 2026

Member

So far, our offload intrinsics have handled data movement to and from the GPU automatically.
That's convenient (and reasonably fast once our LLVM opts land). However, Rust generally also allows being explicit. That might give perf benefits (where our LLVM opts fail), and it could also be nice for modelling: data can be passed around while CPU users are still prevented from accessing it.
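
To make the modelling point concrete, here is a minimal host-only sketch, not the implementation in this PR. It assumes an RAII guard in the spirit of the `PreloadMut` type mentioned in the review below. The `preload_mut`, `launch`, and `demo` names are hypothetical, and the "device" is just a `Vec` standing in for GPU memory. The guard holds the `&mut` borrow of the host data, so the borrow checker rejects CPU access while the data is considered device-resident, and the copy back happens once, when the guard is dropped.

```rust
/// Hypothetical guard: while it is alive, the host slice is mutably
/// borrowed, so CPU code cannot read or write it.
pub struct PreloadMut<'a> {
    host: &'a mut [f32],
    /// Simulated device memory (an assumption; a real backend would
    /// hold a device allocation here).
    device: Vec<f32>,
}

/// Hypothetical explicit transfer: one simulated host-to-device copy.
pub fn preload_mut(host: &mut [f32]) -> PreloadMut<'_> {
    let device = host.to_vec();
    PreloadMut { host, device }
}

impl PreloadMut<'_> {
    /// Stand-in for a kernel launch: runs on the simulated device copy
    /// without any further host/device transfers.
    pub fn launch(&mut self, kernel: impl Fn(&mut f32)) {
        for x in self.device.iter_mut() {
            kernel(x);
        }
    }
}

impl Drop for PreloadMut<'_> {
    /// Simulated device-to-host copy when the guard goes out of scope.
    fn drop(&mut self) {
        self.host.copy_from_slice(&self.device);
    }
}

pub fn demo() -> Vec<f32> {
    let mut data = vec![1.0_f32, 2.0, 3.0];
    {
        let mut on_device = preload_mut(&mut data);
        on_device.launch(|x| *x *= 2.0);
        on_device.launch(|x| *x += 1.0);
        // data[0] = 0.0; // would not compile: `data` is mutably borrowed
    } // guard dropped here: results are copied back exactly once
    data
}
```

With automatic data movement, each of the two launches would conceptually copy the data in and out; in this sketch both launches share a single transfer in each direction.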

@ZuseZ4 ZuseZ4 added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. F-gpu_offload `#![feature(gpu_offload)]` labels May 15, 2026
@rustbot rustbot added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label May 15, 2026
@ZuseZ4 ZuseZ4 force-pushed the offload-explicit-datatransfer branch from e475c46 to da102aa Compare May 15, 2026 22:37

ZuseZ4 commented May 15, 2026

Member Author

Vendoring llvm/llvm-project#198033 for now.

@ZuseZ4 ZuseZ4 force-pushed the offload-explicit-datatransfer branch from abc274d to 1d8d1e7 Compare May 29, 2026 01:47
@ZuseZ4 ZuseZ4 force-pushed the offload-explicit-datatransfer branch from 1d8d1e7 to a94ef31 Compare May 29, 2026 02:58
@ZuseZ4 ZuseZ4 force-pushed the offload-explicit-datatransfer branch 2 times, most recently from 4b77bad to 319ef7d Compare May 31, 2026 00:47
@ZuseZ4 ZuseZ4 force-pushed the offload-explicit-datatransfer branch from fba7eb2 to 358171b Compare May 31, 2026 02:35
@ZuseZ4 ZuseZ4 force-pushed the offload-explicit-datatransfer branch from 2f1d614 to bbe3882 Compare May 31, 2026 20:03
@ZuseZ4 ZuseZ4 force-pushed the offload-explicit-datatransfer branch from bbe3882 to d290591 Compare May 31, 2026 20:32
@ZuseZ4 ZuseZ4 force-pushed the offload-explicit-datatransfer branch from d290591 to e8ad696 Compare May 31, 2026 21:59
@ZuseZ4 ZuseZ4 force-pushed the offload-explicit-datatransfer branch from 4f5c325 to 6c8bec9 Compare June 1, 2026 01:36

Comment thread library/core/src/offload/mod.rs Outdated
Comment on lines +39 to +41
// This exists so MIR creates Drop terminators for PreloadMut.
// rustc codegen intercepts those terminators and emits the
// offload return mapper.

@oli-obk oli-obk Jun 1, 2026

Contributor

why is this not just an intrinsic call here?

Member Author

Partly just experimenting, partly because intrinsics recently changed a bit: they got updated for more explicit Place handling, which I didn't want to think about for my MVP. I'll update them to intrinsics after my deadline.

Comment thread compiler/rustc_codegen_ssa/src/mir/block.rs Outdated

#[lang = "preload"]
#[unstable(feature = "offload", issue = "124509")]
pub fn preload<'a, T: ?Sized>(x: &'a T) -> Preload<'a, T> {

@oli-obk oli-obk Jun 1, 2026

Contributor

Yea I think these should just be intrinsics instead of catching lang item calls during codegen of call terminators.

Member Author

"Which lang items?" (says my code, which fails to catch an inlined terminator call :D)

Member Author

ok, with an intrinsic it actually seems to work in release.
Not sure if we want one intrinsic with 2 arguments (mut/const, init/drop) or 4 intrinsics. Right now I have two.
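
The two API shapes under discussion can be compared with a small sketch. Everything here is hypothetical and for illustration only: the `Access`, `Phase`, and `Transfer` enums, all function names, and in particular the assumed behaviour that dropping a const mapping only releases device memory, while dropping a mut mapping copies the data back. It is plain Rust modelling the decision table, not compiler intrinsics.

```rust
/// Hypothetical selector: may the device write to the mapped data?
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Access {
    Const,
    Mut,
}

/// Hypothetical selector: start or end of the mapping's lifetime.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Phase {
    Init,
    Drop,
}

/// What the runtime would be asked to do (assumed semantics).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Transfer {
    HostToDevice,
    DeviceToHost,
    ReleaseOnly,
}

/// Shape A: one entry point taking two selector arguments.
pub fn transfer_for(access: Access, phase: Phase) -> Transfer {
    match (access, phase) {
        // Both kinds of mapping copy the data to the device up front.
        (_, Phase::Init) => Transfer::HostToDevice,
        // Only a mutable mapping needs its contents copied back.
        (Access::Mut, Phase::Drop) => Transfer::DeviceToHost,
        // Assumption: a const mapping is simply released.
        (Access::Const, Phase::Drop) => Transfer::ReleaseOnly,
    }
}

/// Shape B: four entry points, one per (access, phase) combination.
pub fn preload_const_init() -> Transfer {
    transfer_for(Access::Const, Phase::Init)
}
pub fn preload_const_drop() -> Transfer {
    transfer_for(Access::Const, Phase::Drop)
}
pub fn preload_mut_init() -> Transfer {
    transfer_for(Access::Mut, Phase::Init)
}
pub fn preload_mut_drop() -> Transfer {
    transfer_for(Access::Mut, Phase::Drop)
}
```

Shape A keeps the surface small but moves the choice into arguments that must be known at compile time; shape B makes each call site self-describing at the cost of more entry points.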

@Sa4dUs Sa4dUs mentioned this pull request Jun 18, 2026

Labels

A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. F-gpu_offload `#![feature(gpu_offload)]` S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

4 participants