[Reduce_then_scan refactor pt 1] Relaxing requirement of trivial copyable types#2656

Draft
danhoeflinger wants to merge 33 commits into main from dev/dhoeflin/enable_reduce_then_scan_everywhere

danhoeflinger (Contributor) commented on Apr 7, 2026

This pull request relaxes the requirement that value types be trivially copyable for reduce_then_scan, taking the first step toward enabling reduce_then_scan in all cases.
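
As we understand it, the SYCL 2020 group functions that reduce_then_scan relies on (for example `sycl::group_broadcast` and `sycl::shift_group_right`) are only specified for trivially copyable types, which is presumably why the dispatch previously excluded other value types. The host-side sketch below contrasts the old and new selection. It is not oneDPL's dispatch code: `ScanPath`, `SubGroupComm`, the `select_*` functions, and the `device_supports_rts` flag are hypothetical, and keying the in-kernel SLM choice off the same trait is an assumption.

```cpp
#include <string>
#include <type_traits>

// Hypothetical model of the reduce_then_scan selection; these names do not
// match oneDPL's internal dispatch code.
enum class ScanPath { ReduceThenScan, OtherScan };
enum class SubGroupComm { Builtin, SLM };

// Before: the value type also had to be trivially copyable.
template <typename T>
constexpr ScanPath select_scan_path_before(bool device_supports_rts)
{
    return (device_supports_rts && std::is_trivially_copyable_v<T>) ? ScanPath::ReduceThenScan
                                                                     : ScanPath::OtherScan;
}

// After: the value type no longer gates the algorithm choice...
template <typename T>
constexpr ScanPath select_scan_path_after(bool device_supports_rts)
{
    return device_supports_rts ? ScanPath::ReduceThenScan : ScanPath::OtherScan;
}

// ...it only decides how values move between work-items (assumption).
template <typename T>
constexpr SubGroupComm select_sub_group_comm()
{
    return std::is_trivially_copyable_v<T> ? SubGroupComm::Builtin : SubGroupComm::SLM;
}

// std::string is not trivially copyable, so it was previously excluded.
static_assert(select_scan_path_before<std::string>(true) == ScanPath::OtherScan);
static_assert(select_scan_path_after<std::string>(true) == ScanPath::ReduceThenScan);
static_assert(select_sub_group_comm<std::string>() == SubGroupComm::SLM);
```

Trivially copyable types such as `int` keep both the reduce_then_scan path and the built-in communication in this model.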

  • Adds SLM- and barrier-based fallback implementations of __group_broadcast and __shift_group_right. These fallbacks require additional SLM to be allocated and passed in.
  • To avoid repeated branches in inner loops, the kernel bodies are extracted into member functions so that the choice between the SLM and SYCL built-in implementations is made once per kernel. This is the largest change by line count, but it is mostly code movement.
  • Removes the trivially copyable restriction from the code that dispatches to reduce_then_scan.
  • Adds an explicit return type to __gen_count_mask, __gen_expand_count_mask, __gen_set_balanced_path, and
    __gen_set_op_from_known_balanced_path so that the intermediate type is better controlled and the ambiguity with the SLM pointer type is resolved.
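
A host-side model of the two SLM fallbacks is sketched below, assuming a single SLM slot for the broadcast and one slot per lane for the shift. The names `slm_group_broadcast` and `slm_shift_group_right` are hypothetical stand-ins for `__group_broadcast` and `__shift_group_right`, not their actual signatures. On a device every lane runs concurrently and the points marked `barrier` would be real barriers; here each phase is a loop over all lanes, so finishing one loop before the next plays that role.

```cpp
#include <cassert>
#include <cstddef>
#include <string> // std::string serves as a non-trivially copyable example type
#include <vector>

// Every lane receives the value held by lane `src`. Needs one SLM slot.
template <typename T>
std::vector<T> slm_group_broadcast(const std::vector<T>& lane_vals, std::size_t src, std::vector<T>& slm)
{
    assert(src < lane_vals.size() && !slm.empty());
    slm[0] = lane_vals[src]; // phase 1: only the source lane writes to SLM
    /* barrier */
    return std::vector<T>(lane_vals.size(), slm[0]); // phase 2: every lane reads slot 0
}

// Lane i receives the value of lane i - delta. Lanes with no source lane keep
// their own value in this sketch. Needs one SLM slot per lane.
template <typename T>
std::vector<T> slm_shift_group_right(const std::vector<T>& lane_vals, std::size_t delta, std::vector<T>& slm)
{
    const std::size_t n = lane_vals.size();
    assert(slm.size() >= n);
    for (std::size_t lane = 0; lane < n; ++lane) // phase 1: every lane writes its value
        slm[lane] = lane_vals[lane];
    /* barrier */
    std::vector<T> out(lane_vals);
    for (std::size_t lane = delta; lane < n; ++lane) // phase 2: read from lane - delta
        out[lane] = slm[lane - delta];
    return out;
}
```

Because the SLM slots are reused across calls, a device implementation would also need a barrier before the next write to the same slots; the sequential host loops hide that hazard. The extra SLM these functions take as a parameter corresponds to the allocation the description says must now be passed in.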

Full picture:

Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
danhoeflinger marked this pull request as draft on April 7, 2026, 16:15

Copilot AI left a comment

Pull request overview

This PR refactors the SYCL reduce-then-scan implementation to support non-trivially-copyable value types by introducing SLM/barrier-based sub-group communication fallbacks, and updates related generator types/traits to use explicit return types for better type control.

Changes:

  • Adds SLM/barrier-based fallback implementations for sub-group communication primitives used by reduce-then-scan.
  • Removes the “trivially copyable” restriction in dispatch logic so reduce-then-scan can be selected for more value types.
  • Adds explicit return-type template parameters to several generator types and updates associated device-copyable trait specializations and tests.
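
A generator with an explicit return-type parameter might look like the sketch below. `count_mask_gen` is a hypothetical stand-in for generators such as `__gen_count_mask`, not its real definition; the point is only that `ReturnType` fixes the intermediate type instead of leaving it to follow whatever the predicate returns. The ambiguity with the SLM pointer type noted in the PR description is not modeled here.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical generator: evaluates a predicate at position i and yields 1 or 0.
// The explicit ReturnType parameter pins down the type of that intermediate
// value, so it does not depend on the predicate's result type (bool, int, ...).
template <typename Predicate, typename ReturnType>
struct count_mask_gen
{
    static_assert(std::is_arithmetic_v<ReturnType>, "mask counts are numeric in this sketch");

    Predicate pred;

    template <typename Range>
    constexpr ReturnType operator()(const Range& in, std::size_t i) const
    {
        return static_cast<ReturnType>(pred(in[i]) ? 1 : 0);
    }
};
```

With `ReturnType = std::size_t`, the generator's result can feed a size-typed count directly, regardless of the predicate's own return type.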

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 7 comments.

File | Description
--- | ---
test/general/implementation_details/device_copyable.pass.cpp | Updates static asserts for new generator template parameters (explicit return types).
test/general/implementation_details/balanced_path_unit_tests.pass.cpp | Updates balanced-path generator instantiation to include explicit return type.
include/oneapi/dpl/pstl/hetero/dpcpp/sycl_traits.h | Updates forward declarations and sycl::is_device_copyable specializations for new generator signatures.
include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl.h | Removes trivially-copyable gating for reduce-then-scan selection in scan/segmented paths.
include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_reduce_then_scan.h | Implements SLM-based sub-group communication fallbacks and refactors kernels to branch once per kernel body.
include/oneapi/dpl/experimental/kt/internal/work_group/work_group_scan.h | Updates calls into the shared sub-group scan primitives to match new signatures.
include/oneapi/dpl/experimental/kt/internal/sub_group/sub_group_scan.h | Updates calls into the shared sub-group scan primitives to match new signatures.
include/oneapi/dpl/experimental/kt/internal/cooperative_lookback.h | Updates calls into the shared sub-group scan primitives to match new signatures.
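
The "branch once per kernel body" structure can be modeled on the host as below: the SLM-or-built-in decision is made at the top of the call operator, and a body templated on the chosen communication policy runs the whole inner loop without re-checking it. This is a minimal sketch under stated assumptions: `scan_kernel_model`, `builtin_comm`, and `slm_comm` are hypothetical, the scan is a textbook Hillis-Steele inclusive scan rather than oneDPL's reduce-then-scan kernels, and using a non-null SLM pointer to mean "use the fallback" is our own convention.

```cpp
#include <cassert>
#include <cstddef>
#include <string> // std::string serves as a non-trivially copyable example type
#include <vector>

// Models a sub-group built-in shift: lane i reads lane i - delta directly;
// lanes with no source keep their own value in this sketch.
struct builtin_comm
{
    template <typename T>
    std::vector<T> shift_right(const std::vector<T>& v, std::size_t delta) const
    {
        std::vector<T> out(v);
        for (std::size_t lane = delta; lane < v.size(); ++lane)
            out[lane] = v[lane - delta];
        return out;
    }
};

// Models the SLM fallback: all lanes write to SLM, a barrier separates the
// phases, then each lane reads its source slot.
template <typename T>
struct slm_comm
{
    std::vector<T>* slm; // stand-in for the extra SLM allocation

    std::vector<T> shift_right(const std::vector<T>& v, std::size_t delta) const
    {
        assert(slm->size() >= v.size());
        for (std::size_t lane = 0; lane < v.size(); ++lane)
            (*slm)[lane] = v[lane];
        /* barrier */
        std::vector<T> out(v);
        for (std::size_t lane = delta; lane < v.size(); ++lane)
            out[lane] = (*slm)[lane - delta];
        return out;
    }
};

template <typename T>
struct scan_kernel_model
{
    std::vector<T>* slm; // null when the built-ins can be used

    // The SLM-vs-built-in decision is made once here, not inside the loop.
    std::vector<T> operator()(const std::vector<T>& lanes) const
    {
        if (slm != nullptr)
            return body(lanes, slm_comm<T>{slm});
        return body(lanes, builtin_comm{});
    }

  private:
    // Hillis-Steele inclusive scan; Comm is fixed for the whole loop.
    template <typename Comm>
    static std::vector<T> body(std::vector<T> acc, const Comm& comm)
    {
        for (std::size_t delta = 1; delta < acc.size(); delta *= 2)
        {
            const std::vector<T> shifted = comm.shift_right(acc, delta);
            for (std::size_t lane = delta; lane < acc.size(); ++lane)
                acc[lane] = shifted[lane] + acc[lane]; // earlier prefix on the left
        }
        return acc;
    }
};
```

Each instantiation of `body` contains only one communication implementation, so the inner loop carries no per-iteration branch; that is the effect the member-function extraction is meant to achieve.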

Outdated review comment threads:
  • include/oneapi/dpl/experimental/kt/internal/work_group/work_group_scan.h (2 threads)
  • include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_reduce_then_scan.h
danhoeflinger added this to the 2022.13.0 milestone on Apr 13, 2026

Labels: None yet

Projects: None yet

2 participants