CLDSRV-908: CopyObject handle checksums#6176
Conversation
Hello leif-scality,My role is to assist you with the merge of this Available options
Available commands
Status report is not available. |
Incorrect fix versionThe
Considering where you are trying to merge, I ignored possible hotfix versions and I expected to find:
Please check the |
|
LGTM — clean implementation of checksum propagation and recompute on CopyObject. Stream handling (jsutil.once guards, Azure per-part passthrough, error propagation) and the _shouldRecomputeChecksum decision logic are solid. One minor test style issue flagged inline. |
Codecov Report❌ Patch coverage is
Additional details and impacted files
... and 2 files with indirect coverage changes @@ Coverage Diff @@
## development/9.4 #6176 +/- ##
===================================================
+ Coverage 85.25% 85.32% +0.06%
===================================================
Files 208 208
Lines 13919 14048 +129
===================================================
+ Hits 11867 11986 +119
- Misses 2052 2062 +10
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
5ca489a to
57d9894
Compare
|
LGTM |
57d9894 to
5bd2262
Compare
|
5bd2262 to
8b2b196
Compare
|
LGTM |
8b2b196 to
720f686
Compare
|
LGTM |
| if (err) { | ||
| done(err); |
There was a problem hiding this comment.
When this error happens, it could be any error and the perPart is still piped.
So it could theoretically receive data from azure and continue to stream during a short time window until the final callback calls passthrough.destroy and unpipes the source perPart.
Maybe the perPart should be destroyed here to stop immediately the streaming
There was a problem hiding this comment.
This would make every thing more complex for little gain, lets just let error propagate naturally
| // into the master passthrough and use its 'end' as the completion | ||
| // signal — same pattern arsenal's data.copyObject uses. | ||
| const perPart = new PassThrough(); | ||
| perPart.once('error', done); |
There was a problem hiding this comment.
Should we consider encapsulating the error to include some part details in the error for better troubleshooting if the error is ever logged somewhere else in the streaming path.
There was a problem hiding this comment.
added a commit improving theerrors
| // and masterKeyId stored properly in metadata | ||
| if (sourceIsDestination && storeMetadataParams.locationMatch | ||
| && !isVersionedObj && !needsEncryption) { | ||
| && !isVersionedObj && !needsEncryption && !shouldRecomputeChecksum) { |
There was a problem hiding this comment.
If you don't skip, you'll trigger a data GET + PUT + DELETE (for previous location).
This only for checksum recomputation. Should you rather define another path where if it's only recompute checksum, you allow only the GET to stream into the ChecksumTransform stream and then you discard the end of the data stream, and avoid having to do a data PUT + DELETE
There was a problem hiding this comment.
Fixed I now Get and recompute but dont PUT and DELETE. I also removed recompute for external backend cases, so if the data does not go through cloudserver we dont GET, and the dest object receives no checksum if a recompute was required
720f686 to
6385c3e
Compare
|
Well-structured PR with thorough test coverage across all checksum scenarios (propagation, recompute, 0-byte, COMPOSITE, multi-part, Azure, copy-to-self, external backends). |
6385c3e to
8ede498
Compare
|
8ede498 to
df2187c
Compare
|
LGTM — solid implementation of all 9 checksum cases from the PR matrix. The propagation/recompute/default logic is clean, the |
df2187c to
8da10f6
Compare
|
LGTM |
8da10f6 to
20b6ba3
Compare
|
LGTM |
20b6ba3 to
4c36bc2
Compare
|
LGTM |
Copy-to-self reuses the existing data location, so the bytes never move. On an external backend a checksum recompute would mean a GET purely to hash. Skip it for any external backend (constants.externalBackends, incl. pfs/tlp which have no native server-side copy); local backends still recompute in place. The byte-moving path keeps using externalBackendCopy.
4c36bc2 to
22f8acf
Compare
|
LGTM — well-structured implementation with thorough test coverage across all the checksum scenarios (propagation, recompute, 0-byte, multi-part, Azure, copy-to-self, external backends, orphan cleanup, legacy string locations). |
| it('should lowercase mixed-case input', () => { | ||
| const result = getCopyObjectChecksumAlgorithm({ | ||
| 'x-amz-checksum-algorithm': 'Sha256', | ||
| }); | ||
| assert.strictEqual(result.error, null); | ||
| assert.strictEqual(result.algorithm, 'sha256'); | ||
| }); | ||
|
|
||
| it('should accept algorithms already in lowercase', () => { | ||
| const result = getCopyObjectChecksumAlgorithm({ | ||
| 'x-amz-checksum-algorithm': 'crc32', | ||
| }); | ||
| assert.strictEqual(result.error, null); | ||
| assert.strictEqual(result.algorithm, 'crc32'); | ||
| }); |
There was a problem hiding this comment.
you could include these 2 inside the validAlgorithms array above to run in the loop, the test body is the same
| it('should produce an arsenal InvalidRequest error via the standard mapping', () => { | ||
| const result = getCopyObjectChecksumAlgorithm({ | ||
| 'x-amz-checksum-algorithm': 'GARBAGE', | ||
| }); | ||
| const err = arsenalErrorFromChecksumError(result.error); | ||
| assert.strictEqual(err.is.InvalidRequest, true); | ||
| assert(err.description.includes('Checksum algorithm provided is unsupported'), | ||
| 'expected AWS-shaped error description'); | ||
| }); |
There was a problem hiding this comment.
This doesn't seem to be related to getCopyObjectChecksumAlgorithm but rather arsenalErrorFromChecksumError maybe you should have another describe separate for that function
GET+PUT+DELETEGET+PUT+DELETEGET+PUT+DELETECopyObject)beginCopyFromURL/CopyBlob)copyObject)CopyObject/…, server-side cross-region)CopyObject/…) — recompute skipped (was GET + PUT + DELETE before the fix)GET+PUT+DELETEGET+PUT+DELETEGETonly, to hash)CopyObject/…) → new slotGET+PUT+DELETE(incl.data.batchDeleteof the old slot)PUT(data.put(null))