Predictable raft::resources#3052
Conversation
|
@coderabbitai review |
✅ Action performedReview finished.
|
📝 WalkthroughSummary by CodeRabbit
WalkthroughThe PR replaces the mutex-protected Changesraft::resources atomic cell model refactor
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
cpp/include/raft/core/resource/device_memory_resource.hpp (1)
255-323: 📐 Maintainability & Code Quality | 🟠 Major | 🏗️ Heavy liftHIGH: Public workspace setter signatures now hard-break const-callers.
Changing
set_workspace_resource,set_workspace_to_pool_resource,set_workspace_to_global_resource, andset_large_workspace_resourcefromresources const&toresources&is a source-breaking change in a public header, and I don't see deprecated bridge overloads or migration shims here. Downstream code that forwards a const handle into these helpers will stop compiling immediately.As per path instructions, "public C++ APIs should be stateless and lightweight wrappers" and "breaking C++ API changes require deprecation over releases when possible".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cpp/include/raft/core/resource/device_memory_resource.hpp` around lines 255 - 323, The public workspace setter APIs now require a mutable resources handle, which breaks existing const-callers. Restore compatibility in the workspace helper functions set_workspace_resource, set_workspace_to_pool_resource, set_workspace_to_global_resource, and set_large_workspace_resource by adding const-friendly overloads or a non-breaking bridge that forwards from resources const& without changing existing call sites, while keeping the new behavior internally where mutation is needed.Source: Path instructions
cpp/include/raft/core/resource/sub_comms.hpp (1)
54-67: 🩺 Stability & Availability | 🟠 Major | 🏗️ Heavy liftHIGH:
set_subcommmutates shared state through a const handle.Line 59 keeps
resources const&, but the function inserts into a sharedunordered_mapwith no synchronization. Under the new shared-cell copy semantics that mutation propagates across copies and can race withget_subcommor anotherset_subcomm, which is at odds with the documented "different instances of raft::resources are safe" expectation.As per path instructions, "different instances of raft::resources are safe", and the PR objective says copies "can be accessed and modified independently without propagating changes."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cpp/include/raft/core/resource/sub_comms.hpp` around lines 54 - 67, `set_subcomm` is mutating shared `resources` state through a const handle, which can propagate across copies and race with `get_subcomm` or another `set_subcomm`. Update the `set_subcomm` path in `sub_comms.hpp` so sub-communicator updates are isolated per `resources` instance rather than writing directly through shared state; use the existing `resources`/`sub_comms_resource` APIs to make the mutation copy-local and add synchronization or ownership separation as needed. Keep the fix centered around `set_subcomm`, `get_subcomm`, and `sub_comms_resource_factory` so the “different instances are safe” and independent-copy semantics are preserved.Source: Path instructions
🧹 Nitpick comments (2)
cpp/tests/core/handle.cpp (1)
354-360: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueStrengthen the propagation assertion (optional).
stream_a == stream_bis trivially satisfied because both resolve to the defaultcuda_stream_per_thread, so it doesn't actually prove the concrete resource is shared via the cell. Consider explicitly setting a non-default stream onabefore the lazy-init path, or rely solely onhas_resource_factory(b)as the propagation signal. Not a defect — thehas_resource_factorycheck on Line 358 already validates the core behavior.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cpp/tests/core/handle.cpp` around lines 354 - 360, The propagation test in the resource lazy-init path is too weak because `resource::get_cuda_stream(a)` and `resource::get_cuda_stream(b)` can both return the default `cuda_stream_per_thread`, so the equality check in the `handle.cpp` test does not prove shared state. Update the test around `resource::get_cuda_stream` and `has_resource_factory` by either setting a non-default CUDA stream on `a` before lazy initialization and then asserting `b` sees the same concrete stream, or remove the trivial `ASSERT_EQ(stream_a, stream_b)` and rely on the `has_resource_factory(resource::resource_type::CUDA_STREAM_VIEW)` assertion as the propagation check.cpp/include/raft/core/resource/cublaslt_handle.hpp (1)
49-50: 🚀 Performance & Scalability | 🔵 TrivialAvoid allocating the default factory on every
get_cublaslt_handle()call.
ensure_default_factory()still receives a freshly constructedcublaslt_resource_factoryeach time, so repeated lookups pay an unnecessary heap allocation before the CAS. Have you considered a lazy-builder overload or an earlyhas_resource_factory()check?🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cpp/include/raft/core/resource/cublaslt_handle.hpp` around lines 49 - 50, The get_cublaslt_handle() path is still creating a new cublaslt_resource_factory on every call before ensuring the default factory. Update the cublaslt_handle.hpp logic around get_cublaslt_handle() and ensure_default_factory() so the default factory is only constructed lazily when no factory is already present, using an early has_resource_factory() check or a lazy-builder overload. Keep the resource lookup behavior the same, but avoid the repeated heap allocation before the CAS.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cpp/include/raft/core/resource/comms.hpp`:
- Line 61: The public resource setter API was changed from const to non-const,
which is a breaking change for existing callers using const resources handles.
Update the relevant public-facing documentation and migration/upgrade notes to
explicitly call out the `set_comms` signature change in `resources` and the
similar setter changes in this PR, and include guidance for downstream users to
update their call sites accordingly. Use the `set_comms` function and any
related setters as the symbols to reference in the release notes so the API
change is easy to find.
In `@cpp/include/raft/core/resource/cublas_handle.hpp`:
- Around line 62-64: The cublas handle access path in cublas_handle.hpp is
mutating a shared CUBLAS_HANDLE cell by rebinding its stream on every call,
which breaks safety when copied resources diverge via CUDA_STREAM_VIEW. Update
the cublas handle setup/access logic around cublas_resource_factory and
get_cublas_handle so each raft::resources instance keeps stream ordering
isolated, rather than reusing and retargeting the same handle across copies.
Ensure the handle’s stream is bound per-resource or per-stream-view in a way
that does not affect sibling copies and preserves correct ordering with
get_cuda_stream().
In `@cpp/include/raft/core/resource/cusolver_dn_handle.hpp`:
- Around line 68-69: get_cusolver_dn_handle currently returns a cached
cusolverDnHandle_t without reattaching it to the current stream, so after
set_cuda_stream() the handle may still target an old stream. Update
get_cusolver_dn_handle to call cusolverDnSetStream(...) on the retrieved handle
using get_cuda_stream(res) before returning it, and apply the same rebinding
pattern to get_cusparse_handle and get_cusolver_sp_handle so all stream-bound
solver handles stay aligned with the active raft::resources stream.
In `@cpp/include/raft/core/resource/multi_gpu.hpp`:
- Around line 55-58: The public root-rank API change in get_root_rank and
set_root_rank needs a temporary deprecated compatibility bridge to avoid
breaking downstream callers. Add a deprecated shim in the multi_gpu resource API
that preserves the previous mutable-reference/caller behavior while forwarding
to the new implementation, and make sure set_root_rank still accepts existing
const resources usage through a migration wrapper. Include Doxygen on any new
public API in resources/multi_gpu.hpp and mark the old entry points deprecated
with a clear migration path before removing them.
In `@cpp/include/raft/core/resource/nccl_comm.hpp`:
- Around line 78-80: `get_resource` in `nccl_comm.hpp` is using
`has_resource_factory(NCCL_COMM)` as an initialization gate, which races under
the shared-cell model and can return a partially initialized communicator vector
or run `_init_nccl_comms(res)` twice. Make `get_resource` (and the NCCL_COMM
path in `ensure_default_factory`/`nccl_comm_resource_factory`) rely on a single
atomic initialization mechanism so that only one caller publishes the factory
and performs `_init_nccl_comms(res)`, while concurrent callers on the same const
handle safely observe the fully initialized state.
In `@cpp/include/raft/core/resources.hpp`:
- Line 81: The public resource update API in resources.hpp has a breaking
constness change, so add a compatibility/deprecation path instead of only
exposing the new mutable form. Keep add_resource_factory and the related set_*
wrappers locatable as the affected symbols, and provide transitional overloads
or forwarding methods that preserve the old public surface where possible while
steering callers to the new API. For the public header, add Doxygen for any new
or transitional APIs and mark the breaking replacements with deprecation
warnings so downstream users get a clear migration path.
- Around line 100-104: `ensure_default_factory` currently indexes `cells_` with
`rtype` without validating the resource type, so a factory returning `LAST_KEY`
can access past the vector. Update `ensure_default_factory` in `resources.hpp`
to mirror the `add_resource_factory` guard for
`resource::resource_type::LAST_KEY` before touching `cells_`, and only perform
the `cells_[rtype]->factory.compare_exchange_strong(...)` path for valid
resource types.
---
Outside diff comments:
In `@cpp/include/raft/core/resource/device_memory_resource.hpp`:
- Around line 255-323: The public workspace setter APIs now require a mutable
resources handle, which breaks existing const-callers. Restore compatibility in
the workspace helper functions set_workspace_resource,
set_workspace_to_pool_resource, set_workspace_to_global_resource, and
set_large_workspace_resource by adding const-friendly overloads or a
non-breaking bridge that forwards from resources const& without changing
existing call sites, while keeping the new behavior internally where mutation is
needed.
In `@cpp/include/raft/core/resource/sub_comms.hpp`:
- Around line 54-67: `set_subcomm` is mutating shared `resources` state through
a const handle, which can propagate across copies and race with `get_subcomm` or
another `set_subcomm`. Update the `set_subcomm` path in `sub_comms.hpp` so
sub-communicator updates are isolated per `resources` instance rather than
writing directly through shared state; use the existing
`resources`/`sub_comms_resource` APIs to make the mutation copy-local and add
synchronization or ownership separation as needed. Keep the fix centered around
`set_subcomm`, `get_subcomm`, and `sub_comms_resource_factory` so the “different
instances are safe” and independent-copy semantics are preserved.
---
Nitpick comments:
In `@cpp/include/raft/core/resource/cublaslt_handle.hpp`:
- Around line 49-50: The get_cublaslt_handle() path is still creating a new
cublaslt_resource_factory on every call before ensuring the default factory.
Update the cublaslt_handle.hpp logic around get_cublaslt_handle() and
ensure_default_factory() so the default factory is only constructed lazily when
no factory is already present, using an early has_resource_factory() check or a
lazy-builder overload. Keep the resource lookup behavior the same, but avoid the
repeated heap allocation before the CAS.
In `@cpp/tests/core/handle.cpp`:
- Around line 354-360: The propagation test in the resource lazy-init path is
too weak because `resource::get_cuda_stream(a)` and
`resource::get_cuda_stream(b)` can both return the default
`cuda_stream_per_thread`, so the equality check in the `handle.cpp` test does
not prove shared state. Update the test around `resource::get_cuda_stream` and
`has_resource_factory` by either setting a non-default CUDA stream on `a` before
lazy initialization and then asserting `b` sees the same concrete stream, or
remove the trivial `ASSERT_EQ(stream_a, stream_b)` and rely on the
`has_resource_factory(resource::resource_type::CUDA_STREAM_VIEW)` assertion as
the propagation check.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 38c5e204-a599-42f7-a81d-cdc0784e4e52
📒 Files selected for processing (29)
cpp/CMakeLists.txtcpp/include/raft/core/device_resources.hppcpp/include/raft/core/handle.hppcpp/include/raft/core/memory_stats_resources.hppcpp/include/raft/core/memory_tracking_resources.hppcpp/include/raft/core/resource/comms.hppcpp/include/raft/core/resource/cublas_handle.hppcpp/include/raft/core/resource/cublaslt_handle.hppcpp/include/raft/core/resource/cuda_stream.hppcpp/include/raft/core/resource/cuda_stream_pool.hppcpp/include/raft/core/resource/cusolver_dn_handle.hppcpp/include/raft/core/resource/cusolver_sp_handle.hppcpp/include/raft/core/resource/cusparse_handle.hppcpp/include/raft/core/resource/custom_resource.hppcpp/include/raft/core/resource/detail/stream_sync_event.hppcpp/include/raft/core/resource/device_id.hppcpp/include/raft/core/resource/device_memory_resource.hppcpp/include/raft/core/resource/device_properties.hppcpp/include/raft/core/resource/managed_memory_resource.hppcpp/include/raft/core/resource/multi_gpu.hppcpp/include/raft/core/resource/nccl_comm.hppcpp/include/raft/core/resource/pinned_memory_resource.hppcpp/include/raft/core/resource/resource_types.hppcpp/include/raft/core/resource/stream_view.hppcpp/include/raft/core/resource/sub_comms.hppcpp/include/raft/core/resource/thrust_policy.hppcpp/include/raft/core/resources.hppcpp/internal/CMakeLists.txtcpp/tests/core/handle.cpp
| res.ensure_default_factory(std::make_shared<cublas_resource_factory>(get_cuda_stream(res))); | ||
| auto ret = *res.get_resource<cublasHandle_t>(resource_type::CUBLAS_HANDLE); | ||
| RAFT_CUBLAS_TRY(cublasSetStream(ret, get_cuda_stream(res))); |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟠 Major | 🏗️ Heavy lift
HIGH: Shared cuBLAS handle is rebound across divergent stream copies.
After a resources copy changes its CUDA_STREAM_VIEW, both copies can still share the same CUBLAS_HANDLE cell, and Line 64 mutates that shared handle’s stream on every access. Interleaved use of the two handles can therefore enqueue BLAS work on the wrong stream or introduce false ordering between otherwise independent copies.
As per path instructions, "different instances of raft::resources are safe" and "ensure work enqueued on raft::resources’ internal streams is correctly ordered w.r.t. get_cuda_stream()".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cpp/include/raft/core/resource/cublas_handle.hpp` around lines 62 - 64, The
cublas handle access path in cublas_handle.hpp is mutating a shared
CUBLAS_HANDLE cell by rebinding its stream on every call, which breaks safety
when copied resources diverge via CUDA_STREAM_VIEW. Update the cublas handle
setup/access logic around cublas_resource_factory and get_cublas_handle so each
raft::resources instance keeps stream ordering isolated, rather than reusing and
retargeting the same handle across copies. Ensure the handle’s stream is bound
per-resource or per-stream-view in a way that does not affect sibling copies and
preserves correct ordering with get_cuda_stream().
Source: Path instructions
| res.ensure_default_factory(std::make_shared<cusolver_dn_resource_factory>(get_cuda_stream(res))); | ||
| return *res.get_resource<cusolverDnHandle_t>(resource_type::CUSOLVER_DN_HANDLE); |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟠 Major
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
rg -n -C3 'get_cusolver|get_cusparse|SetStream|get_cuda_stream\(res\)' cpp/include/raft/core/resourceRepository: rapidsai/raft
Length of output: 9241
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect resource ownership/copy semantics and the sibling handle getters.
ast-grep outline cpp/include/raft/core/resource/resource.hpp --view expanded || true
ast-grep outline cpp/include/raft/core/resource/cublas_handle.hpp --view expanded || true
ast-grep outline cpp/include/raft/core/resource/cusparse_handle.hpp --view expanded || true
ast-grep outline cpp/include/raft/core/resource/cusolver_sp_handle.hpp --view expanded || true
ast-grep outline cpp/include/raft/core/resource/cusolver_dn_handle.hpp --view expanded || true
echo '--- resource/copy semantics search ---'
rg -n -C3 'shared_ptr|copy|copy-construct|clone|resource_cell|ensure_default_factory|get_resource<|set_cuda_stream|CUDA_STREAM_VIEW' cpp/include/raft/core/resource cpp/include/raft/core | sed -n '1,220p'Repository: rapidsai/raft
Length of output: 18111
🏁 Script executed:
#!/bin/bash
set -euo pipefail
ast-grep outline cpp/include/raft/core/resource/cuda_stream.hpp --view expanded || true
sed -n '1,220p' cpp/include/raft/core/resource/cuda_stream.hppRepository: rapidsai/raft
Length of output: 2874
HIGH: Rebind stream-bound solver handles on access
set_cuda_stream() can replace the stream resource after this handle has already been cached, so get_cusolver_dn_handle() may keep enqueuing work on the old stream. get_cusparse_handle() and get_cusolver_sp_handle() have the same pattern; cusolverDnSetStream(...)/cusparseSetStream(...)/cusolverSpSetStream(...) on each access would keep the cached handle aligned with the current raft::resources stream.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cpp/include/raft/core/resource/cusolver_dn_handle.hpp` around lines 68 - 69,
get_cusolver_dn_handle currently returns a cached cusolverDnHandle_t without
reattaching it to the current stream, so after set_cuda_stream() the handle may
still target an old stream. Update get_cusolver_dn_handle to call
cusolverDnSetStream(...) on the retrieved handle using get_cuda_stream(res)
before returning it, and apply the same rebinding pattern to get_cusparse_handle
and get_cusolver_sp_handle so all stream-bound solver handles stay aligned with
the active raft::resources stream.
Source: Path instructions
| inline int get_root_rank(resources const& res) | ||
| { | ||
| if (!res.has_resource_factory(resource_type::ROOT_RANK)) { | ||
| res.add_resource_factory(std::make_shared<root_rank_resource_factory>()); | ||
| } | ||
| res.ensure_default_factory(std::make_shared<root_rank_resource_factory>()); | ||
| return *res.get_resource<int>(resource_type::ROOT_RANK); |
There was a problem hiding this comment.
🗄️ Data Integrity & Integration | 🟠 Major | 🏗️ Heavy lift
HIGH: Add a deprecation bridge for the public root-rank API change.
Issue: get_root_rank no longer exposes the previous mutable reference path, and set_root_rank now rejects const resources& callers.
Why: This public header change can break downstream code at compile time; have you considered a temporary deprecated compatibility shim or explicit migration bridge before removal? As per coding guidelines, “API changes require deprecation warnings.” As per path instructions, “For public headers under cpp/include/raft, also require Doxygen on new APIs and deprecation warnings on breaking changes.”
Also applies to: 123-125
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cpp/include/raft/core/resource/multi_gpu.hpp` around lines 55 - 58, The
public root-rank API change in get_root_rank and set_root_rank needs a temporary
deprecated compatibility bridge to avoid breaking downstream callers. Add a
deprecated shim in the multi_gpu resource API that preserves the previous
mutable-reference/caller behavior while forwarding to the new implementation,
and make sure set_root_rank still accepts existing const resources usage through
a migration wrapper. Include Doxygen on any new public API in
resources/multi_gpu.hpp and mark the old entry points deprecated with a clear
migration path before removing them.
Sources: Coding guidelines, Path instructions
| if (!res.has_resource_factory(resource_type::NCCL_COMM)) { | ||
| res.add_resource_factory(std::make_shared<nccl_comm_resource_factory>()); | ||
| res.ensure_default_factory(std::make_shared<nccl_comm_resource_factory>()); | ||
| _init_nccl_comms(res); |
There was a problem hiding this comment.
🩺 Stability & Availability | 🔴 Critical | 🏗️ Heavy lift
CRITICAL: Factory publication races NCCL communicator initialization.
has_resource_factory(NCCL_COMM) is no longer a safe "initialized" check under the shared-cell model. Once one caller stores the factory, another caller can skip _init_nccl_comms(res) and return an empty/partially initialized vector, and two callers that both observe "no factory" can both run _init_nccl_comms against the same resource.
As per path instructions, "Concurrent get_resource calls on the same const handle are safe" and "comment if concurrent get/set semantics can lead to wrong initialization, UAF, or data races."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cpp/include/raft/core/resource/nccl_comm.hpp` around lines 78 - 80,
`get_resource` in `nccl_comm.hpp` is using `has_resource_factory(NCCL_COMM)` as
an initialization gate, which races under the shared-cell model and can return a
partially initialized communicator vector or run `_init_nccl_comms(res)` twice.
Make `get_resource` (and the NCCL_COMM path in
`ensure_default_factory`/`nccl_comm_resource_factory`) rely on a single atomic
initialization mechanism so that only one caller publishes the factory and
performs `_init_nccl_comms(res)`, while concurrent callers on the same const
handle safely observe the fully initialized state.
Source: Path instructions
| void ensure_default_factory(std::shared_ptr<resource::resource_factory> factory) const | ||
| { | ||
| resource::resource_type rtype = factory->get_resource_type(); | ||
| std::shared_ptr<resource::resource_factory> expected{}; | ||
| cells_[rtype]->factory.compare_exchange_strong(expected, std::move(factory)); |
There was a problem hiding this comment.
🩺 Stability & Availability | 🔴 Critical | ⚡ Quick win
CRITICAL: Guard ensure_default_factory before indexing cells_.
add_resource_factory rejects LAST_KEY, but ensure_default_factory indexes cells_[rtype] without the same check. A factory returning LAST_KEY indexes one past the vector and can crash or corrupt memory.
Suggested fix
void ensure_default_factory(std::shared_ptr<resource::resource_factory> factory) const
{
+ RAFT_EXPECTS(factory != nullptr, "Resource factory must not be null.");
resource::resource_type rtype = factory->get_resource_type();
+ RAFT_EXPECTS(rtype != resource::resource_type::LAST_KEY,
+ "LAST_KEY is a placeholder and not a valid resource factory type.");
std::shared_ptr<resource::resource_factory> expected{};
cells_[rtype]->factory.compare_exchange_strong(expected, std::move(factory));
}As per coding guidelines, invalid memory access is a CRITICAL issue.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| void ensure_default_factory(std::shared_ptr<resource::resource_factory> factory) const | |
| { | |
| resource::resource_type rtype = factory->get_resource_type(); | |
| std::shared_ptr<resource::resource_factory> expected{}; | |
| cells_[rtype]->factory.compare_exchange_strong(expected, std::move(factory)); | |
| void ensure_default_factory(std::shared_ptr<resource::resource_factory> factory) const | |
| { | |
| RAFT_EXPECTS(factory != nullptr, "Resource factory must not be null."); | |
| resource::resource_type rtype = factory->get_resource_type(); | |
| RAFT_EXPECTS(rtype != resource::resource_type::LAST_KEY, | |
| "LAST_KEY is a placeholder and not a valid resource factory type."); | |
| std::shared_ptr<resource::resource_factory> expected{}; | |
| cells_[rtype]->factory.compare_exchange_strong(expected, std::move(factory)); | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cpp/include/raft/core/resources.hpp` around lines 100 - 104,
`ensure_default_factory` currently indexes `cells_` with `rtype` without
validating the resource type, so a factory returning `LAST_KEY` can access past
the vector. Update `ensure_default_factory` in `resources.hpp` to mirror the
`add_resource_factory` guard for `resource::resource_type::LAST_KEY` before
touching `cells_`, and only perform the
`cells_[rtype]->factory.compare_exchange_strong(...)` path for valid resource
types.
Source: Coding guidelines
Make the
raft::resourcesresource initialization semantics more predictable.raft::resourceshandle construction: all copies of theraft::resourceshandle point to the same resources (not a breaking change, fixes the re-initialization issue).set_resourcechanges to non-const semantics (breaking change).Before: all
set_xxxresource-updating calls were operating on const handleNow: all
set_xxxresources require a non-const handle.Thread-safety
First and foremost, thread-safety of using a specific resource depends on that resource. Here we discuss the thread-safety of using
raft::resourceshandle itself (get_resource and set_resource functions).Accessing the same resource by const ref
Updates (set_resource) are not possible (a user should copy a handle and modify the new one). All concurrent get_resource calls are atomic and safe, even if they initialize the factories and resources under the hood. The worst can happen is the same resource being initialized concurrently in two threads but only one being stored in the handle (the other one is discarded).
Accessing the same resource by non-const ref
Using the same object by non-const ref from multiple threads is always unsafe.
Accessing copies of the same resource
The resources and factories are updated atomically. Modifying any resource doesn't propagate to the copies. Accessing a resource while another threads access or modifies the same resource via another handle is thus safe.
Implementation details
The PR adds one more layer of indirection (one extra shared_ptr) to each resource, which may cause an extra runtime overhead. This is unavoidable if we want to allow lazy-initialized resources back-propagate across handles.
On the other hand, the PR removes the handle mutex in favor of C++20
std::atomic<shared_ptr>, which may reduce the runtime overheads a little bit.Breaking changes
raft::resourceshandles.std::atomic<shared_ptr>requires enforcing C++20 for all dependent libraries. An alternative would be to put a mutex perraft::resource_cell.