feat: add REAPI wire compression support (zstd compressed-blobs)#2416
feat: add REAPI wire compression support (zstd compressed-blobs)#2416walter-zeromatter wants to merge 16 commits into
Conversation
|
@walter-zeromatter is attempting to deploy a commit to the native-link-web-assets Team on Vercel. A member of the Team first needs to authorize it. |
fe4a126 to
f49a21b
Compare
There was a problem hiding this comment.
This should not be part of the repository, please remove
| @@ -0,0 +1,183 @@ | |||
| // Copyright 2024 The NativeLink Authors. All rights reserved. | |||
| @@ -0,0 +1,342 @@ | |||
| // Copyright 2024 The NativeLink Authors. All rights reserved. | |||
|
Thanks for the early review! I'm still playing with this a bit so yeah, definitely not ready, but I'll get those fixed up. |
- Remove .hermes/plans/2026-06-09_reapi-wire-compression.md from repo - Add missing rust_binary import in nativelink-service/BUILD.bazel - Fix copyright year 2024→2026 in wire_compression.rs and bench - Apply flake.nix hardeningDisable patch from PR TraceMachina#2175 for coverage - Run cargo fmt on all changed files - Fix clippy: redundant_closure, items_after_statements, cast coercion
- Remove .hermes/plans/2026-06-09_reapi-wire-compression.md from repo - Add missing rust_binary import in nativelink-service/BUILD.bazel - Fix copyright year 2024→2026 in wire_compression.rs and bench - Apply flake.nix hardeningDisable patch from PR TraceMachina#2175 for coverage - Run cargo fmt on all changed files - Fix clippy: redundant_closure, items_after_statements, cast coercion
8f7c0ae to
364d343
Compare
- Sort zstd dependency in nativelink-service/Cargo.toml (pre-commit check) - Add @crates//:zstd to integration test suite deps in BUILD.bazel - Fix redundant_closure_for_method_calls in nativelink.rs (use as_deref) - Fix clippy violations in wire_compression_bench.rs (doc_markdown, cast_possible_truncation, print_stdout) - Fix cast_possible_truncation in cas_server_test.rs (use try_from)
There was a problem hiding this comment.
This shouldn't be in our repo, only as a local file at most
There was a problem hiding this comment.
Similarly, this also shouldn't be in our repo
palfrey
left a comment
There was a problem hiding this comment.
Couple of concerning unrelated items creeping in for some reason?
|
|
||
| impl R2Store { | ||
| #[allow(clippy::new_ret_no_self)] // Because usually everyone returns themselves | ||
| #[allow(clippy::new_ret_no_self)] // Returns a pinned future for async construction. |
There was a problem hiding this comment.
still 100% vibe coded - I'm still iterating on agent reviews & implementation before it's ready for real human review I think
| // not part of the API contract; collect-into-Vec callers | ||
| // already ignore order. | ||
| .buffer_unordered(usize::MAX), | ||
| .buffer_unordered(16), |
- Remove .rtk/filters.toml and CLAUDE.md from repo (local-only files, added to .gitignore) - Revert unrelated scheduler/origin_event refactoring changes that crept into the PR (RunningActionTelemetry, WorkerUpdate struct variant, origin_metadata_from_baggage helper, historical_resource async refactor) - Revert r2_store.rs comment to original (the new comment was factually incorrect about pinned futures) - Revert health_utils.rs buffer_unordered(16) back to usize::MAX (unrelated change with no justification) - Remove unrelated cspell dictionary entries (gh, npm, npx, etc.); keep only zstd
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
Implements the REAPI compressed-blobs specification, allowing clients to upload and download zstd-compressed blob data over the gRPC wire. This is orthogonal to at-rest CompressionStore (LZ4) and operates entirely in the service layer -- the store always holds uncompressed data. Changes: - Add WireCompressor enum and supported_wire_compressors config field to CapabilitiesConfig (default: empty = no compression = zero behavior change) - Add wire_compression module with compress/decompress helpers using zstd::bulk API with expected_size cap to prevent memory exhaustion from malicious payloads - Add compressed-blobs support to ByteStreamServer (read + write paths) with buffer size cap (2x expected, min 64MB) on compressed write accumulation - Add compressed-blobs support to CasServer (BatchUpdateBlobs + BatchReadBlobs) with warn! logging on compression fallback - Add per-instance compressor advertisement in CapabilitiesServer via supported_wire_compressors_for_instance HashMap - Wire config through nativelink.rs to both server constructors using CapabilitiesConfig per-instance lookup - Add zstd integration tests for CAS batch update/read round-trip - Reject unknown compressor enum values in BatchUpdateBlobs (was silently defaulting to Identity); skip unknown values in BatchReadBlobs acceptable_compressors - Add Identity size validation in wire_compression::decompress() - Add ZSTD_COMPRESSION_LEVEL named constant (3) with documentation Security considerations: - Decompression capped by expected_size (digest size_bytes) to prevent memory bombs - Compressed write buffer capped at 2x expected size (min 64MB) - Unknown compressor values rejected rather than silently accepted - Defense-in-depth size validation preserved and documented Refs: DEVPROD-483
Review fixes: - BatchReadBlobs falls back to Identity when compression expands data - Compressed ByteStream writes register in active_uploads so QueryWriteStatus reports progress and in-flight status - Add MAX_COMPRESSED_UPLOAD_SIZE = 4 GiB hard cap on compressed uploads to prevent memory exhaustion from oversized expected_size - Use clamp() for compressed buffer size bounds CI fixes: - Run rustfmt with nightly settings (imports_granularity, group_imports) - Add wire_compression.rs and zstd dep to Bazel BUILD.bazel - Add wire_compression_bench binary target to BUILD.bazel - Fix let_underscore_drop warning in bench binary (let _warmup) - Include benchmark binary in commit
- Remove .hermes/plans/2026-06-09_reapi-wire-compression.md from repo - Add missing rust_binary import in nativelink-service/BUILD.bazel - Fix copyright year 2024→2026 in wire_compression.rs and bench - Apply flake.nix hardeningDisable patch from PR TraceMachina#2175 for coverage - Run cargo fmt on all changed files - Fix clippy: redundant_closure, items_after_statements, cast coercion
- Sort zstd dependency in nativelink-service/Cargo.toml (pre-commit check) - Add @crates//:zstd to integration test suite deps in BUILD.bazel - Fix redundant_closure_for_method_calls in nativelink.rs (use as_deref) - Fix clippy violations in wire_compression_bench.rs (doc_markdown, cast_possible_truncation, print_stdout) - Fix cast_possible_truncation in cas_server_test.rs (use try_from)
Performance: - Move zstd compress/decompress to spawn_blocking in bytestream_server and cas_server to avoid blocking async executor threads - Add compress_bytes() for zero-copy identity compression when caller already owns Bytes; compress() now calls zstd directly without intermediate copy for zstd path - Short-circuit identity compression before spawn_blocking in bytestream_server to avoid unnecessary executor hop - Cap health_utils buffer_unordered at 16 (was usize::MAX) - Replace full ActionInfoWithProps clone with lightweight RunningActionTelemetry struct in api_worker_scheduler Code quality: - Add origin_metadata_from_baggage() helper in origin_event.rs to deduplicate OriginMetadata construction across awaited_action.rs, cache_lookup_scheduler.rs, and historical_resource_scheduler.rs - Fix context snapshot mismatch: both call sites now read from the same captured baggage instead of mixing captured baggage with Context::current() - Make refresh_hints() single-flight: set last_attempt under lock before async file read to prevent concurrent reads; throttle failures via last_attempt timestamp - Replace std::fs with tokio::fs in historical_resource_scheduler - Replace Error::new with make_err! for project consistency - Use shared proto_to_wire_compressor in cas_server instead of inline match - Deduplicate supported_compressors construction in capabilities_server - Use named struct fields for WorkerUpdate::RunAction instead of Box<(tuple)> - Use production wire_compression helpers in bench instead of duplicate ZSTD_COMPRESSION_LEVEL constant - Revert set_freebind to Ok(()) on non-Linux (was changed to Err which breaks macOS startup)
- Remove .rtk/filters.toml and CLAUDE.md from repo (local-only files, added to .gitignore) - Revert unrelated scheduler/origin_event refactoring changes that crept into the PR (RunningActionTelemetry, WorkerUpdate struct variant, origin_metadata_from_baggage helper, historical_resource async refactor) - Revert r2_store.rs comment to original (the new comment was factually incorrect about pinned futures) - Revert health_utils.rs buffer_unordered(16) back to usize::MAX (unrelated change with no justification) - Remove unrelated cspell dictionary entries (gh, npm, npx, etc.); keep only zstd
9d12d9b to
978d20b
Compare
|
OK! Did a big chunk more work & reviewed. It's in a more-or less presentable state. I'm about to take a short break, but I'll address any feedback promptly when I get back. One open question I have is if this should be a toggle-able thing on the server side at all. Clients can always just not use --enable_remote_compression, so I'm not sure if there's real value in adding this as a config option. Also worth noting is that I'm planning a followp PR which adds zstd compression as an option on the storage side so that the server can just do direct read-through when clients have compression enabled |
Summary
Implements REAPI
compressed-blobs/zstdwire compression for NativeLink's remote cache paths. Bazel clients use this through--remote_cache_compression; NativeLink keeps the feature default-off and enables it per instance withCapabilitiesConfig.remote_cache_compression.Compression is handled at the service boundary. Stores continue to receive and return raw, uncompressed blob bytes; zstd is only a gRPC wire format concern.
Closes #260.
Current Design
remote_cache_compression: falseis the default; disabled instances do not advertise zstd and rejectcompressed-blobs/zstdrequests.RemoteCacheCompressionInstancesis derived from capabilities config and shared by capabilities, CAS, and ByteStream wiring so advertisement and acceptance stay aligned.compressed-blobs/{compressor}/.../{size}uses the uncompressed digest size. ByteStream compressed writes report compressed wire-bytecommitted_size.Changes
Config + Wiring
CapabilitiesConfig.remote_cache_compression.src/bin/nativelink.rs,CapabilitiesServer,CasServer, andByteStreamServer.CacheCapabilities.supported_compressorsandsupported_batch_update_compressorsonly for enabled instances.CAS
BatchUpdateBlobswhen the instance has remote cache compression enabled.BatchReadBlobswhen requested by the client and enabled for the instance.BatchUpdateBlobszero-copy after validating the expected size.ByteStream
compressed-blobs/zstd/...reads and writes.read_limitis rejected.WriteResponse.committed_sizein compressed wire bytes.Shared Utilities + Tests
nativelink-service/src/wire_compression.rsfor URI compressor resolution, per-instance enablement, zstd helpers, streaming encode/decode, and CAS batch helpers.BufChannelReaderand a blocking send path for zstd streaming adapters.proto_stream_utilsso identity uploads keep strict size checks while compressed uploads can differ between wire bytes and uncompressed digest size.Security / Correctness
Validation
Latest local validation on this branch:
bazel test //nativelink-service/... --jobs=1Additional targeted validation run during the refactor:
cargo test -p nativelink-service --lib wire_compression --jobs 1cargo test -p nativelink-service --test cas_server_test batch_update_blobs_zstd --jobs 1cargo test -p nativelink-service --test cas_server_test batch_read_blobs_zstd --jobs 1cargo test -p nativelink-service --test bytestream_server_test zstd_write --jobs 1cargo test -p nativelink-service --test bytestream_server_test zstd_read --jobs 1cargo check --bin nativelink --jobs 1Local note: the exact CI pre-commit command is
nix flake check, but this environment does not havenixinstalled. The service Bazel run above exercises the Rustfmt/Clippy/Test surface that caught the current pre-commit failure.Known Follow-Ups / Not Claimed Here
QueryWriteStatusfor acompressed-blobs/zstd/...resource still reports the stored uncompressed size once the upload is complete; active compressed uploads report compressed wire-byte progress.GrpcStoreshortcut paths still pass compressed resource names through to the backend as proxy semantics.wire_compression_benchbinary has been removed from the branch.@llvm-project//llvm:llvm-cxxfiltidentity-vs-zstd release-server smoke was not rerun after the latest refactor. That remains the final end-to-end merge gate if we want fresh Bazel remote-cache confirmation on the current branch state.This change is