Skip to content

Split chunked upload phases#165

Open
TorstenDittmann wants to merge 3 commits intomainfrom
feat-phased-chunk-upload-api
Open

Split chunked upload phases#165
TorstenDittmann wants to merge 3 commits intomainfrom
feat-phased-chunk-upload-api

Conversation

@TorstenDittmann
Copy link
Copy Markdown
Contributor

Summary

  • Preserve existing parallel chunk upload fixes and add public prepare, uploadChunk, and finalize phases to storage devices and telemetry.
  • Split Local and S3 chunk handling so chunk transfer does not finalize uploads.
  • Add focused Local and mocked S3 tests for phased upload behavior.

Tests

  • composer lint
  • ./vendor/bin/phpunit tests/Storage/Device/S3SlowDownTest.php
  • ./vendor/bin/phpunit tests/Storage/Device/LocalTest.php --filter 'testUploadChunkDoesNotFinalizeUntilFinalizeUpload|testFinalizeUploadRequiresAllLocalChunks|testPartUpload|testPartUploadRetry|testAbort|testPartRead'

Notes

  • Full unit suite still has existing environment/fixture failures: Local transfer tests require S3 credentials, and testGetFiles collides with local directory state.

@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented May 7, 2026

Greptile Summary

This PR splits the monolithic upload() method into three explicit phases — prepareUpload, uploadChunk, and finalizeUpload — for both Local and S3 adapters, adds matching Telemetry delegation, and covers the new surface with focused unit tests.

  • Local: upload() and uploadData() are refactored to call the three new methods internally; move_uploaded_file now falls back to rename for CLI/test usage. uploadChunk self-initializes via an internal prepareUpload call, creating a telemetry bypass asymmetry with S3.
  • S3: uploadData no longer auto-creates or auto-completes multipart uploads. The ksort fix to use SORT_NUMERIC correctly addresses part ordering. finalizeUpload now issues an extra exists() HEAD request even for single-chunk uploads, a regression vs. the previous direct-write path.
  • Tests: new LocalTest and S3SlowDownTest cases verify phased behaviour with mocked S3 calls and real filesystem operations.

Confidence Score: 4/5

Safe to merge for multi-chunk uploads; single-chunk S3 uploads gain an extra network round-trip and the Local adapter has a minor telemetry gap.

The multi-chunk and phased-upload paths are correct and well-tested. The extra S3 exists() HEAD call on every single-chunk upload is a regression from the original direct-write return, and Local::uploadChunk calling prepareUpload internally bypasses Telemetry measurement. Neither breaks correctness but both affect observable behaviour.

src/Storage/Device/S3.php and src/Storage/Device/Local.php warrant a second look around the finalizeUpload early-return path and the internal prepareUpload self-call respectively.

Important Files Changed

Filename Overview
src/Storage/Device/Device.php Adds three new abstract methods (prepareUpload, uploadChunk, finalizeUpload) to the base class; clean interface extension with no logic changes.
src/Storage/Device/Local.php Splits chunked upload into prepare/chunk/finalize phases; uploadChunk self-calls prepareUpload internally, bypassing Telemetry measurement and causing double-init via upload().
src/Storage/Device/S3.php Introduces phased S3 upload; finalizeUpload issues an extra exists() HEAD request for single-chunk uploads. ksort SORT_NUMERIC fix is correct.
src/Storage/Device/Telemetry.php Adds prepareUpload, uploadChunk, and finalizeUpload delegations via measure(); consistent with existing pattern.
tests/Storage/Device/LocalTest.php Adds focused tests for out-of-order chunk upload and missing-chunk exception at finalize.
tests/Storage/Device/S3SlowDownTest.php Extends mock-based S3 tests with phased upload coverage including numeric part ordering.

Reviews (1): Last reviewed commit: "merge main into phased upload branch" | Re-trigger Greptile

Comment thread src/Storage/Device/S3.php
Comment on lines +204 to 228
public function finalizeUpload(string $path, int $chunks = 1, array &$metadata = []): bool
{
if ($this->exists($path)) {
return true;
}

if ($chunks === 1) {
return false;
}

if (empty($metadata['uploadId'])) {
throw new Exception('Missing multipart upload ID');
}

$metadata['parts'] ??= [];
for ($i = 1; $i <= $chunks; $i++) {
if (! array_key_exists($i, $metadata['parts'])) {
throw new Exception('Missing chunk '.$i);
}
}

$this->completeMultipartUpload($path, $metadata['uploadId'], $metadata['parts']);

return true;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Extra HEAD request on every single-chunk S3 upload

upload() and uploadData() now call finalizeUpload() whenever $chunks === $chunksReceived, including for single-chunk uploads (where $chunksReceived === 1 === $chunks). For the single-chunk path, write() already completed the PUT, so the $this->exists($path) call at the top of finalizeUpload issues an extra HEAD (s3:info) request before returning true. The original code returned immediately after write() with no follow-up network call. Every single-file S3 upload now incurs an extra API request.

Comment on lines +76 to 79
public function uploadChunk(string $source, string $path, int $chunk = 1, int $chunks = 1, array &$metadata = []): int
{
$this->prepareUpload($path, '', $chunks, $metadata);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 uploadChunk self-calls prepareUpload, bypassing Telemetry wrapper

Local::uploadChunk calls $this->prepareUpload(...) directly on the Local instance (line 78). When a Telemetry device wraps Local and a caller invokes Telemetry::uploadChunk, the delegated Local::uploadChunk triggers Local::prepareUpload internally — never going through Telemetry::prepareUpload. That internal invocation is therefore not measured or recorded. Additionally, when upload() is the caller (which already called prepareUpload on the way in), the metadata is initialized twice, which works only because ??= is idempotent. S3::uploadChunk has no such internal call, making the two adapters asymmetric in their self-initialization behaviour.

Comment thread src/Storage/Device/S3.php
Comment on lines 242 to +252
public function uploadData(string $data, string $path, string $contentType, int $chunk = 1, int $chunks = 1, array &$metadata = []): int
{
$this->prepareUpload($path, $contentType, $chunks, $metadata);
$chunksReceived = $this->uploadChunkData($data, $path, $contentType, $chunk, $chunks, $metadata);

if ($chunks === $chunksReceived) {
$this->finalizeUpload($path, $chunks, $metadata);
}

return $chunksReceived;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 finalizeUpload return value is silently discarded

Both upload() (line 175) and uploadData() (line 248) call finalizeUpload() but never check or propagate its return value. finalizeUpload returns false for the single-chunk case when the file does not exist at finalization time. In that scenario the methods still return $chunksReceived, so the caller observes an apparent success while the object is absent. The same pattern exists in Local::upload() and Local::uploadData(). At minimum, a false return should throw or be propagated so callers can react.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant