Skip to content

Support per-backend role and destination in replication engine#2743

Open
maeldonn wants to merge 3 commits into
development/9.4from
improvement/BB-762/crr-multi
Open

Support per-backend role and destination in replication engine#2743
maeldonn wants to merge 3 commits into
development/9.4from
improvement/BB-762/crr-multi

Conversation

@maeldonn
Copy link
Copy Markdown
Contributor

Read destination bucket and role from each backends[] entry instead of the shared top-level fields, so a single source object can be replicated to multiple CRR destinations with their own role. Legacy entries without per-backend fields keep working via top-level fallback in ObjectQueueEntry's site-aware getters; MongoQueueProcessor's oplog path now matches every applicable rule and dedups backends per the design's (site, destination, role) rule.

Issue: BB-762

@bert-e
Copy link
Copy Markdown
Contributor

bert-e commented May 27, 2026

Hello maeldonn,

My role is to assist you with the merge of this
pull request. Please type @bert-e help to get information
on this process, or consult the user documentation.

Available options
name description privileged authored
/after_pull_request Wait for the given pull request id to be merged before continuing with the current one.
/bypass_author_approval Bypass the pull request author's approval
/bypass_build_status Bypass the build and test status
/bypass_commit_size Bypass the check on the size of the changeset TBA
/bypass_incompatible_branch Bypass the check on the source branch prefix
/bypass_jira_check Bypass the Jira issue check
/bypass_peer_approval Bypass the pull request peers' approval
/bypass_leader_approval Bypass the pull request leaders' approval
/approve Instruct Bert-E that the author has approved the pull request. ✍️
/create_pull_requests Allow the creation of integration pull requests.
/create_integration_branches Allow the creation of integration branches.
/no_octopus Prevent Wall-E from doing any octopus merge and use multiple consecutive merge instead
/unanimity Change review acceptance criteria from one reviewer at least to all reviewers
/wait Instruct Bert-E not to run until further notice.
Available commands
name description privileged
/help Print Bert-E's manual in the pull request.
/status Print Bert-E's current status in the pull request TBA
/clear Remove all comments from Bert-E from the history TBA
/retry Re-start a fresh build TBA
/build Re-start a fresh build TBA
/force_reset Delete integration branches & pull requests, and restart merge process from the beginning.
/reset Try to remove integration branches unless there are commits on them which do not appear on the source branch.

Status report is not available.

@bert-e
Copy link
Copy Markdown
Contributor

bert-e commented May 27, 2026

Incorrect fix version

The Fix Version/s in issue BB-762 contains:

  • None

Considering where you are trying to merge, I ignored possible hotfix versions and I expected to find:

  • 9.4.1

  • 9.5.0

Please check the Fix Version/s of BB-762, or the target
branch of this pull request.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 27, 2026

Codecov Report

❌ Patch coverage is 97.75281% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.95%. Comparing base (6223d3e) to head (91a58ad).

Files with missing lines Patch % Lines
extensions/replication/tasks/ReplicateObject.js 90.47% 2 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

Files with missing lines Coverage Δ
extensions/mongoProcessor/MongoQueueProcessor.js 66.81% <100.00%> (-2.60%) ⬇️
...sions/replication/queueProcessor/QueueProcessor.js 75.92% <100.00%> (+2.97%) ⬆️
...xtensions/replication/tasks/MultipleBackendTask.js 56.17% <ø> (+0.72%) ⬆️
...sions/replication/tasks/UpdateReplicationStatus.js 81.81% <100.00%> (+6.56%) ⬆️
lib/models/ObjectQueueEntry.js 94.73% <100.00%> (+3.43%) ⬆️
lib/models/QueueEntry.js 83.33% <100.00%> (+9.80%) ⬆️
extensions/replication/tasks/ReplicateObject.js 91.97% <90.47%> (+0.76%) ⬆️

... and 6 files with indirect coverage changes

Components Coverage Δ
Bucket Notification 80.22% <ø> (ø)
Core Library 80.97% <100.00%> (-0.24%) ⬇️
Ingestion 70.13% <100.00%> (-0.51%) ⬇️
Lifecycle 79.46% <ø> (-0.04%) ⬇️
Oplog Populator 85.83% <ø> (ø)
Replication 61.05% <97.10%> (+1.38%) ⬆️
Bucket Scanner 85.76% <ø> (ø)
@@                 Coverage Diff                 @@
##           development/9.4    #2743      +/-   ##
===================================================
+ Coverage            74.84%   74.95%   +0.10%     
===================================================
  Files                  199      199              
  Lines                13657    13667      +10     
===================================================
+ Hits                 10222    10244      +22     
+ Misses                3425     3413      -12     
  Partials                10       10              
Flag Coverage Δ
api:retry 9.14% <1.12%> (-0.01%) ⬇️
api:routes 8.96% <0.00%> (-0.01%) ⬇️
bucket-scanner 85.76% <ø> (ø)
ft_test:queuepopulator 9.09% <0.00%> (-1.25%) ⬇️
ingestion 12.34% <10.11%> (-0.20%) ⬇️
lib 7.76% <0.00%> (-0.02%) ⬇️
lifecycle 18.90% <0.00%> (-0.04%) ⬇️
notification 1.02% <0.00%> (-0.01%) ⬇️
oplogPopulator 0.14% <0.00%> (-0.01%) ⬇️
unit 52.77% <97.75%> (+1.43%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Comment thread extensions/replication/tasks/ReplicateObject.js Outdated
@claude
Copy link
Copy Markdown

claude Bot commented May 27, 2026

  • ReplicateObject.js:310 — Rule matching via Rules.find() does not consider priority, while MongoQueueProcessor sorts by priority descending. If multiple rules target the same site with different Destination.Account values, the derived expectedDestRole may not match the entry's role, causing a spurious BadRole error.
    - tests/unit/lib/models/ObjectQueueEntry.spec.js — File shows as a binary diff in this PR, which typically indicates a BOM or encoding issue. Verify the file is clean UTF-8.

    Review by Claude Code

@maeldonn maeldonn force-pushed the improvement/BB-762/crr-multi branch from 36b459a to 98c878f Compare May 28, 2026 15:17
@claude
Copy link
Copy Markdown

claude Bot commented May 28, 2026

  • Overall: well-structured refactor — per-backend roles/destinations are correctly propagated through MongoQueueProcessor, QueueProcessor, and the replication task chain. Legacy fallbacks tested. No critical issues found.
    - extensions/replication/tasks/ReplicateObject.js:270 — Minor: roles.length < 1 is unreachable because String.split(',') always returns an array with at least one element. Could simplify to roles.length > 2.
    - extensions/replication/tasks/MultipleBackendTask.js:59 — Note: entry.getReplicationRoles() is called without this.site, while the parent ReplicateObject._setupRolesOnce was updated to pass this.site. This appears intentional since cloud backends don't carry per-backend roles, but worth a comment or assertion to make the design choice explicit.

    LGTM

    Review by Claude Code

Comment thread extensions/replication/tasks/ReplicateObject.js Outdated
@maeldonn maeldonn requested review from a team, SylvainSenechal and delthas May 29, 2026 08:57
@maeldonn maeldonn marked this pull request as ready for review May 29, 2026 08:57
Comment thread package.json Outdated
@claude
Copy link
Copy Markdown

claude Bot commented May 29, 2026

  • package.json:57 — arsenal pinned to raw commit SHA instead of a tag; should use the resolved version tag (8.4.2) for consistency with other git-based deps

    The rest of the PR looks solid: per-backend role/destination plumbing is consistent across MongoQueueProcessor, QueueProcessor, ReplicateObject, and MultipleBackendTask. Legacy fallback paths are well-tested. The deletion of getLocationsFromStorageClass is safe — no remaining callers in production code.

    Review by Claude Code

@maeldonn maeldonn force-pushed the improvement/BB-762/crr-multi branch from d5a834d to 83c75ef Compare June 1, 2026 11:52
Comment thread package.json Outdated
@claude
Copy link
Copy Markdown

claude Bot commented Jun 1, 2026

  • package.json:57 — Arsenal is pinned to a commit hash (39c1a64...) instead of a tag. The yarn.lock resolves to version 8.4.3, so please pin to the tag for consistency with other git-based deps.
    - extensions/replication/tasks/MultipleBackendTask.js:59 — _setupRolesOnce calls getReplicationRoles() without this.site, unlike the parent ReplicateObject._setupRolesOnce which was updated to pass it. This works today because cloud backends don't carry per-backend roles, but consider passing this.site for forward-compatibility with multi-destination CRR on cloud targets.

    Review by Claude Code

Comment thread extensions/replication/tasks/ReplicateObject.js Outdated
Comment thread package.json Outdated
@claude
Copy link
Copy Markdown

claude Bot commented Jun 1, 2026

  • extensions/replication/tasks/ReplicateObject.js:298 — matchingRule lookup uses bare rule.Prefix but the replicationEnabled check above uses rule.Filter?.Prefix ?? rule.Prefix ?? ''. V2 rules with Filter.Prefix will never match, silently falling back to the legacy destination role instead of the per-account substituted role.
    - package.json:57 — Arsenal pinned to a commit hash instead of a tag (yarn.lock resolves to 8.4.3).

    Review by Claude Code

@maeldonn maeldonn force-pushed the improvement/BB-762/crr-multi branch from 3426739 to 4791b8b Compare June 4, 2026 14:26
Comment thread extensions/replication/tasks/ReplicateObject.js Outdated
Comment thread package.json Outdated
@claude
Copy link
Copy Markdown

claude Bot commented Jun 4, 2026

  • extensions/replication/tasks/ReplicateObject.js:298 — Bug: matchingRule lookup uses rule.Prefix directly but should use rule.Filter?.Prefix ?? rule.Prefix ?? '' like the replicationEnabled check above it. V2 rules with Filter.Prefix will never match, causing the Account-based role resolution to silently fall through to the legacy comparison and reject valid entries with BadRole.
    - package.json:57 — Arsenal pinned to commit hash instead of a tag (resolved version 8.4.3).

    Review by Claude Code

@maeldonn maeldonn force-pushed the improvement/BB-762/crr-multi branch from 4791b8b to 14ef8f1 Compare June 4, 2026 17:13
Comment thread extensions/replication/queueProcessor/QueueProcessor.js Outdated
Comment thread package.json Outdated
@claude
Copy link
Copy Markdown

claude Bot commented Jun 4, 2026

PR Review Summary

  • QueueProcessor.js:912 — Bug: this.replicationBackends is undefined (never set on the instance). Cloud backends will always use ReplicateObject instead of MultipleBackendTask. Should use the module-level replicationBackends constant imported at line 42.
  • package.json:57 — Arsenal pinned to a bare commit hash instead of a version tag, which violates project conventions for git-based deps.

Review by Claude Code

@maeldonn maeldonn force-pushed the improvement/BB-762/crr-multi branch from 14ef8f1 to cfdd8fb Compare June 5, 2026 15:23
Comment thread package.json Outdated
@claude
Copy link
Copy Markdown

claude Bot commented Jun 5, 2026

  • package.json:57 — arsenal pinned to commit hash instead of a tag; pin to 8.4.3 tag if available

The rest of the PR looks solid. The per-backend key abstraction (site + destination + role) is well-designed, backward compatibility is properly handled via top-level fallback in the site-aware getters, and the async/await migration in QueueProcessor.processReplicationEntry is clean. Test coverage is thorough, including same-site multi-destination disambiguation, retry round-tripping, and mixed CRR + cloud backend scenarios.

Review by Claude Code

@maeldonn maeldonn force-pushed the improvement/BB-762/crr-multi branch from cfdd8fb to 34fba43 Compare June 5, 2026 15:36
Comment thread package.json Outdated
"@smithy/node-http-handler": "^3.3.3",
"JSONStream": "^1.3.5",
"arsenal": "git+https://github.com/scality/arsenal#8.3.9",
"arsenal": "git+https://github.com/scality/arsenal#e4860ebc364db6c3cd5b3f3210339d0fa512ed24",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arsenal is pinned to a commit hash instead of a tag. The yarn.lock shows this resolves to version 8.4.3 — pin to the tag instead for consistency with the project's dependency pinning convention.

— Claude Code

@claude
Copy link
Copy Markdown

claude Bot commented Jun 5, 2026

  • package.json:57 — arsenal pinned to commit hash instead of tag; use 8.4.3 per project convention
    - extensions/replication/tasks/MultipleBackendTask.js:59 — getReplicationRoles() called without backend key, unlike the parent class ReplicateObject._setupRolesOnce which now uses entry.getReplicationRoles(entry.getBackendKey()). Safe today because cloud backends don't carry per-backend roles, but inconsistent and could silently break if that changes.

    Review by Claude Code

Read destination bucket and role from each backends[] entry instead of
the shared top-level fields, so a single source object can be replicated
to multiple CRR destinations with their own role. Legacy entries
without per-backend fields keep working via top-level fallback in
ObjectQueueEntry's site-aware getters; MongoQueueProcessor's oplog path
now matches every applicable rule and dedups backends per the design's
(site, destination, role) rule.

Issue: BB-762
@maeldonn maeldonn force-pushed the improvement/BB-762/crr-multi branch from 34fba43 to 7a21bbc Compare June 5, 2026 16:31
maeldonn added 2 commits June 5, 2026 18:35
Add unit tests covering pre-existing branches in code paths that
BB-762 restructured

Issue: BB-762
@maeldonn maeldonn force-pushed the improvement/BB-762/crr-multi branch from 7a21bbc to 91a58ad Compare June 5, 2026 16:35
Comment thread package.json Outdated
"@smithy/node-http-handler": "^3.3.3",
"JSONStream": "^1.3.5",
"arsenal": "git+https://github.com/scality/arsenal#8.3.9",
"arsenal": "git+https://github.com/scality/arsenal#e4860ebc364db6c3cd5b3f3210339d0fa512ed24",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

arsenal is pinned to a commit hash instead of a tag. The resolved version is 8.4.3 — if that tag exists, pin to it for consistency with the other git-based deps (breakbeat, httpagent, etc.).

— Claude Code

@claude
Copy link
Copy Markdown

claude Bot commented Jun 5, 2026

  • package.json:57 — arsenal pinned to a commit hash instead of a tag; use the 8.4.3 tag if available

The code changes themselves are solid. The per-backend routing in QueueProcessor, the backendKey-based disambiguation in ObjectQueueEntry, the role-validation expansion in ReplicateObject._setupRolesOnce, and the callbackify bridge for the async migration are all well-structured. Legacy backward compatibility (top-level fallback when backends lack destination/role) is covered by tests. The Kafka message format extension (adding destination/role fields) is additive and safe for rolling upgrades.

Review by Claude Code

Comment thread package.json
const replicationEnabled = data.ReplicationConfiguration.Rules
.some(rule => rule.Status === 'Enabled' &&
entry.getObjectKey().startsWith(rule.Prefix));
entry.getObjectKey().startsWith(
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MultipleBackendTask._setupRolesOnce still calls entry.getReplicationRoles() without passing the backend key, while the parent ReplicateObject._setupRolesOnce was updated to entry.getReplicationRoles(entry.getBackendKey()). If arsenal's getReplicationRoles(key) resolves a per-backend role when a key is provided, this cloud-backend path will silently use the top-level (legacy) role string instead of the per-backend one, defeating the purpose of this PR for cloud destinations.

— Claude Code

Bucket: sourceEntry.getBucket(),
Key: sourceEntry.getObjectKey(),
StorageType: sourceEntry.getReplicationStorageType(),
StorageType: this._getReplicationEndpointType(),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several calls in MultipleBackendTask.processQueueEntry still use this.site (a plain string) for per-backend lookups instead of the key object:\n- Line 1220: refreshedEntry.getReplicationSiteStatus(this.site) should be refreshedEntry.getReplicationSiteStatus(sourceEntry.getBackendKey())\n- Line 1240: sourceEntry.getReplicationSiteDataStoreVersionId(this.site) should also use the backend key\n\nThese were not updated to match the pattern applied in ReplicateObject and UpdateReplicationStatus. If arsenal's methods now support the key object, this will miss per-backend state.

— Claude Code

status: 'PENDING',
backends,
content,
role: ReplicationConfiguration.resolveSourceRole(bucketRepInfo.role),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _updateReplicationInfo call on line 349 sets role: ReplicationConfiguration.resolveSourceRole(bucketRepInfo.role) but doesn't set destination or storageClass in the replication info object. The old code set both destination and storageClass. If these fields are now handled inside ReplicationConfiguration.resolveBackends at the backend level, this is fine — but please confirm the downstream consumers (queue populator, status processor) don't rely on the top-level destination / storageClass fields being present.

— Claude Code

err => err ? reject(err) : resolve(),
);
});
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sequential for...of loop processes backends one at a time. If one backend's taskScheduler.push rejects, the await will throw and subsequent backends will be skipped entirely (the error propagates to callbackify which calls done(err)). This means a transient failure replicating to destination A will prevent replication to destinations B and C for this Kafka entry. Consider whether each backend should be attempted independently (catching errors per-backend) so one failure doesn't block the others.

— Claude Code

toRetryEntry(key) {
const backend = this._findBackend(key);
return this.clone()
.setAccountId(this.getAccountId())
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The toRetryEntry method now uses this._findBackend(key) but the method _findBackend is not defined on ObjectQueueEntry — it must be on the parent ObjectMD in arsenal. Please confirm this method exists in the arsenal commit being pinned (c3f3bff), since it's not possible to verify from this repo alone.

— Claude Code

@claude
Copy link
Copy Markdown

claude Bot commented Jun 5, 2026

  • package.json:57 — Arsenal pinned to a commit hash instead of a tag; should pin to a release tag per project conventions.
    - extensions/replication/tasks/MultipleBackendTask.js:59 — _setupRolesOnce calls entry.getReplicationRoles() without the backend key, inconsistent with the updated ReplicateObject._setupRolesOnce which passes entry.getBackendKey(). May miss per-backend roles for cloud destinations.
    - extensions/replication/tasks/MultipleBackendTask.js:1220,1240 — processQueueEntry still uses this.site (plain string) for getReplicationSiteStatus and getReplicationSiteDataStoreVersionId instead of the backend key object, inconsistent with the other task files.
    - extensions/mongoProcessor/MongoQueueProcessor.js:345-351 — _updateReplicationInfo no longer sets destination, storageClass, or storageType on the replication info object (the old code set all three). Confirm downstream code doesn't rely on these top-level fields.
    - extensions/replication/queueProcessor/QueueProcessor.js:915-934 — Sequential for...of loop over backends means a failure on one backend aborts processing of subsequent backends for the same Kafka entry.
    - lib/models/ObjectQueueEntry.js:243 — toRetryEntry calls this._findBackend(key) which must be defined on the arsenal ObjectMD parent. Not verifiable without the arsenal commit — please confirm it exists.

    Review by Claude Code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants