feat: add REAPI wire compression support (zstd compressed-blobs) #2416

Open: walter-zeromatter wants to merge 16 commits into TraceMachina:main from Reactor-Inc:user/wgray/zstd-cache

walter-zeromatter commented Jun 10, 2026

Summary

Implements REAPI compressed-blobs/zstd wire compression for NativeLink's remote cache paths. Bazel clients use this through --remote_cache_compression; NativeLink keeps the feature default-off and enables it per instance with CapabilitiesConfig.remote_cache_compression.

Compression is handled at the service boundary. Stores continue to receive and return raw, uncompressed blob bytes; zstd is only a gRPC wire format concern.

Closes #260.

Current Design

| Decision | Current behavior |
| --- | --- |
| zstd only | The only advertised and accepted non-identity compressor is zstd, matching Bazel's remote-cache compression path. |
| Default off | remote_cache_compression: false is the default; disabled instances do not advertise zstd and reject compressed-blobs/zstd requests. |
| Per-instance enablement | RemoteCacheCompressionInstances is derived from capabilities config and shared by capabilities, CAS, and ByteStream wiring so advertisement and acceptance stay aligned. |
| Service-layer compression | ByteStream and CAS encode/decode at the API boundary; store contents remain identity/raw. |
| Uncompressed digest size | compressed-blobs/{compressor}/.../{size} uses the uncompressed digest size. ByteStream compressed writes report compressed wire-byte committed_size. |
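
As a rough illustration of the resource-name handling in the table above, the sketch below resolves a `compressed-blobs/{compressor}/.../{size}` name into a compressor and an uncompressed size. The names `Compressor`, `CompressedResource`, and `parse_compressed_resource` are hypothetical and not taken from the PR. The sketch assumes the size is the final path segment and treats the segments between the compressor and the size as opaque.

```rust
/// Wire compressors known to this sketch (illustrative, not NativeLink's type).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Compressor {
    Identity,
    Zstd,
}

/// Result of resolving a `compressed-blobs/...` resource name.
#[derive(Debug, PartialEq, Eq)]
pub struct CompressedResource {
    pub compressor: Compressor,
    /// Size of the blob after decompression, as carried in the resource name.
    pub uncompressed_size: u64,
}

/// Hypothetical helper: find the compressor and uncompressed size in a
/// resource name. `zstd_enabled` models the per-instance switch.
pub fn parse_compressed_resource(
    resource_name: &str,
    zstd_enabled: bool,
) -> Result<CompressedResource, String> {
    let segments: Vec<&str> = resource_name.split('/').collect();
    let marker = segments
        .iter()
        .position(|s| *s == "compressed-blobs")
        .ok_or_else(|| "not a compressed-blobs resource".to_string())?;

    let compressor = match segments.get(marker + 1).copied() {
        Some("identity") => Compressor::Identity,
        Some("zstd") if zstd_enabled => Compressor::Zstd,
        Some("zstd") => return Err("zstd is not enabled for this instance".to_string()),
        Some(other) => return Err(format!("unknown compressor: {}", other)),
        None => return Err("missing compressor segment".to_string()),
    };

    // Require at least one opaque segment between the compressor and the size.
    if segments.len() < marker + 4 {
        return Err("missing digest or size segment".to_string());
    }
    let size_segment = segments[segments.len() - 1];
    let uncompressed_size = size_segment
        .parse::<u64>()
        .map_err(|e| format!("invalid size {:?}: {}", size_segment, e))?;

    Ok(CompressedResource { compressor, uncompressed_size })
}
```

The parsed size is the uncompressed digest size, so it bounds decoded output. It says nothing about how many compressed bytes travel on the wire.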

Changes

Config + Wiring

  • Adds CapabilitiesConfig.remote_cache_compression.
  • Threads per-instance compression enablement through src/bin/nativelink.rs, CapabilitiesServer, CasServer, and ByteStreamServer.
  • Advertises zstd through CacheCapabilities.supported_compressors and supported_batch_update_compressors only for enabled instances.
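
One way to keep advertisement and acceptance aligned is to answer both questions from a single shared set of enabled instances. The sketch below is a hypothetical stand-in for that idea, not the PR's RemoteCacheCompressionInstances type. `CompressionInstances`, its methods, and the lowercase compressor strings are all assumptions made for illustration.

```rust
use std::collections::HashSet;

/// Hypothetical per-instance enablement set, built once from config and
/// shared by the capabilities, CAS, and ByteStream services.
#[derive(Debug, Default, Clone)]
pub struct CompressionInstances {
    enabled: HashSet<String>,
}

impl CompressionInstances {
    /// Build the set from `(instance_name, remote_cache_compression)` pairs.
    pub fn from_config<'a>(instances: impl IntoIterator<Item = (&'a str, bool)>) -> Self {
        let enabled = instances
            .into_iter()
            .filter(|(_, on)| *on)
            .map(|(name, _)| name.to_string())
            .collect();
        Self { enabled }
    }

    pub fn is_enabled(&self, instance: &str) -> bool {
        self.enabled.contains(instance)
    }

    /// Non-identity compressors to advertise for this instance.
    pub fn advertised_compressors(&self, instance: &str) -> Vec<&'static str> {
        if self.is_enabled(instance) {
            vec!["zstd"]
        } else {
            Vec::new()
        }
    }

    /// Whether a request using `compressor` should be accepted.
    pub fn accepts(&self, instance: &str, compressor: &str) -> bool {
        match compressor {
            "identity" => true,
            "zstd" => self.is_enabled(instance),
            _ => false,
        }
    }
}
```

Because both `advertised_compressors` and `accepts` consult the same set, an instance cannot advertise zstd while rejecting it, or the reverse.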

CAS

  • Supports zstd BatchUpdateBlobs when the instance has remote cache compression enabled.
  • Supports zstd BatchReadBlobs when requested by the client and enabled for the instance.
  • Falls back to identity for batch reads when zstd output is not smaller.
  • Keeps identity BatchUpdateBlobs zero-copy after validating the expected size.
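
The identity fallback for batch reads can be sketched as a small decision function. Everything here is illustrative: `Encoding` and `encode_batch_read` are hypothetical names, and the `compress` closure stands in for a real zstd encoder so the example needs only the standard library.

```rust
/// Encoding chosen for one blob in a batch read response (illustrative).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Encoding {
    Identity,
    Zstd,
}

/// Return the compressed form only when the client asked for zstd, the
/// instance allows it, and the compressed bytes are strictly smaller than
/// the raw blob. Otherwise return the raw bytes unchanged.
pub fn encode_batch_read<F>(
    raw: Vec<u8>,
    client_accepts_zstd: bool,
    instance_enabled: bool,
    compress: F,
) -> (Encoding, Vec<u8>)
where
    // Stand-in for a zstd encoder; `None` models an encoder failure.
    F: FnOnce(&[u8]) -> Option<Vec<u8>>,
{
    if client_accepts_zstd && instance_enabled {
        if let Some(compressed) = compress(&raw) {
            if compressed.len() < raw.len() {
                return (Encoding::Zstd, compressed);
            }
        }
    }
    (Encoding::Identity, raw)
}
```

Returning the original `Vec<u8>` on the identity path avoids an extra copy, and the strict `<` comparison means incompressible blobs never cost more on the wire than their raw form.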

ByteStream

  • Supports compressed-blobs/zstd/... reads and writes.
  • Applies compressed read offsets to the uncompressed blob range before zstd encoding; nonzero compressed read_limit is rejected.
  • Streams compressed reads through a blocking zstd encoder instead of bulk-compressing whole blobs.
  • Streams compressed writes through a blocking zstd decoder into the store update path.
  • Validates compressed write offsets against compressed wire bytes.
  • Tracks active compressed upload progress and final WriteResponse.committed_size in compressed wire bytes.
  • Hashes decoded bytes and rejects decompressed digest/size mismatches.
  • Merges compressed write client/decode/store errors so store failures are not hidden by channel teardown errors.
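
The offset rules above can be sketched with two small helpers. Both are hypothetical (`compressed_read_range` and `CompressedUploadProgress` are not names from the PR). The write side also assumes that each chunk's write_offset must equal the number of compressed bytes already received, which is one plausible reading of "validates compressed write offsets against compressed wire bytes".

```rust
use std::ops::Range;

/// Reads: the offset selects a range of the *uncompressed* blob, which is
/// then zstd-encoded. A nonzero read_limit is rejected.
pub fn compressed_read_range(
    blob_len: u64,
    read_offset: i64,
    read_limit: i64,
) -> Result<Range<u64>, String> {
    if read_limit != 0 {
        return Err("read_limit is not supported for compressed reads".to_string());
    }
    let offset = u64::try_from(read_offset)
        .map_err(|_| "read_offset must be non-negative".to_string())?;
    if offset > blob_len {
        return Err(format!("read_offset {} is past blob length {}", offset, blob_len));
    }
    Ok(offset..blob_len)
}

/// Writes: progress is counted in compressed wire bytes, not decoded bytes.
#[derive(Debug, Default)]
pub struct CompressedUploadProgress {
    wire_bytes: u64,
}

impl CompressedUploadProgress {
    /// Accept a chunk whose offset matches the wire bytes received so far
    /// and return the new committed size.
    pub fn accept_chunk(&mut self, write_offset: i64, chunk_len: usize) -> Result<u64, String> {
        let offset = u64::try_from(write_offset)
            .map_err(|_| "write_offset must be non-negative".to_string())?;
        if offset != self.wire_bytes {
            return Err(format!(
                "write_offset {} does not match {} compressed bytes received",
                offset, self.wire_bytes
            ));
        }
        self.wire_bytes += chunk_len as u64;
        Ok(self.wire_bytes)
    }

    /// Value to report as committed_size: compressed wire bytes.
    pub fn committed_size(&self) -> u64 {
        self.wire_bytes
    }
}
```

Reads and writes are asymmetric here: read offsets index the uncompressed blob, while write offsets and committed_size count compressed bytes.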

Shared Utilities + Tests

  • Adds nativelink-service/src/wire_compression.rs for URI compressor resolution, per-instance enablement, zstd helpers, streaming encode/decode, and CAS batch helpers.
  • Adds BufChannelReader and a blocking send path for zstd streaming adapters.
  • Updates proto_stream_utils so identity uploads keep strict size checks while compressed uploads can differ between wire bytes and uncompressed digest size.
  • Adds integration coverage for capabilities advertisement, CAS zstd batch read/write, ByteStream zstd read/write, disabled-instance rejection, digest mismatch rejection, offset behavior, chunking, and identity compressed-blob paths.

Security / Correctness

| Concern | Handling |
| --- | --- |
| Decompression expansion | zstd decode is bounded by the uncompressed digest size and rejects decoded output beyond that size. |
| Digest spoofing | ByteStream compressed writes hash decoded bytes and reject digest mismatches. |
| Unknown/disabled compressors | Unknown compressors are rejected; zstd is rejected unless the instance enables remote cache compression. |
| Identity size mismatch | Identity paths validate exact expected size. |
| CAS read overhead | zstd batch reads are returned only when smaller than identity. |
| Store error visibility | Compressed write error merging keeps store failures in the returned error chain. |
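
The decompression-expansion guard can be pictured as a sink that refuses to grow past the uncompressed digest size. This is a minimal sketch under assumptions: `BoundedSink` is a hypothetical type, and its in-memory `Vec<u8>` stands in for the store update path that the PR streams into. Any streaming decoder that writes through `std::io::Write` could target such a sink.

```rust
use std::io::{self, Write};

/// Hypothetical sink that accepts at most `limit` decoded bytes.
pub struct BoundedSink {
    buf: Vec<u8>,
    limit: u64,
}

impl BoundedSink {
    /// `limit` is the uncompressed size taken from the digest.
    pub fn new(limit: u64) -> Self {
        Self { buf: Vec::new(), limit }
    }

    /// Finish decoding; the total must match the digest size exactly.
    pub fn finish(self) -> io::Result<Vec<u8>> {
        if self.buf.len() as u64 != self.limit {
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                "decoded size does not match digest size",
            ));
        }
        Ok(self.buf)
    }
}

impl Write for BoundedSink {
    fn write(&mut self, data: &[u8]) -> io::Result<usize> {
        let remaining = self.limit.saturating_sub(self.buf.len() as u64);
        if data.len() as u64 > remaining {
            // Reject before buffering, so a decompression bomb cannot grow memory.
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                "decoded output exceeds uncompressed digest size",
            ));
        }
        self.buf.extend_from_slice(data);
        Ok(data.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}
```

The size check alone does not establish integrity. Per the table, the decoded bytes are also hashed and compared with the digest before the write is accepted.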

Validation

Latest local validation on this branch:

  • bazel test //nativelink-service/... --jobs=1

Additional targeted validation run during the refactor:

  • cargo test -p nativelink-service --lib wire_compression --jobs 1
  • cargo test -p nativelink-service --test cas_server_test batch_update_blobs_zstd --jobs 1
  • cargo test -p nativelink-service --test cas_server_test batch_read_blobs_zstd --jobs 1
  • cargo test -p nativelink-service --test bytestream_server_test zstd_write --jobs 1
  • cargo test -p nativelink-service --test bytestream_server_test zstd_read --jobs 1
  • cargo check --bin nativelink --jobs 1

Local note: the exact CI pre-commit command is nix flake check, but this environment does not have nix installed. The service Bazel run above exercises the Rustfmt/Clippy/Test surface that caught the current pre-commit failure.

Known Follow-Ups / Not Claimed Here

  • Completed QueryWriteStatus for a compressed-blobs/zstd/... resource still reports the stored uncompressed size once the upload is complete; active compressed uploads report compressed wire-byte progress.
  • GrpcStore shortcut paths still pass compressed resource names through to the backend as proxy semantics.
  • The standalone wire_compression_bench binary has been removed from the branch.
  • The LLVM @llvm-project//llvm:llvm-cxxfilt identity-vs-zstd release-server smoke was not rerun after the latest refactor. That remains the final end-to-end merge gate if we want fresh Bazel remote-cache confirmation on the current branch state.

vercel Bot commented Jun 10, 2026

@walter-zeromatter is attempting to deploy a commit to the native-link-web-assets Team on Vercel.

A member of the Team first needs to authorize it.

CLAassistant commented Jun 10, 2026

CLA assistant check
All committers have signed the CLA.

palfrey (Member) left a comment

You'll need the flake.nix patch from #2175 to get coverage working (because of musl and fortify fun), and rustfmt needs running to fix various other build issues.

OTOH, looking good so far, interesting addition!

Member

This should not be part of the repository, please remove

Comment thread nativelink-service/BUILD.bazel Outdated
@@ -0,0 +1,183 @@
// Copyright 2024 The NativeLink Authors. All rights reserved.

Member

Should be 2026

@@ -0,0 +1,342 @@
// Copyright 2024 The NativeLink Authors. All rights reserved.

Member

Should be 2026

walter-zeromatter (Author)

Thanks for the early review! I'm still playing with this a bit so yeah, definitely not ready, but I'll get those fixed up.

walter-zeromatter added a commit to Reactor-Inc/nativelink that referenced this pull request Jun 11, 2026
- Remove .hermes/plans/2026-06-09_reapi-wire-compression.md from repo
- Add missing rust_binary import in nativelink-service/BUILD.bazel
- Fix copyright year 2024→2026 in wire_compression.rs and bench
- Apply flake.nix hardeningDisable patch from PR TraceMachina#2175 for coverage
- Run cargo fmt on all changed files
- Fix clippy: redundant_closure, items_after_statements, cast coercion
walter-zeromatter added a commit to Reactor-Inc/nativelink that referenced this pull request Jun 18, 2026
- Sort zstd dependency in nativelink-service/Cargo.toml (pre-commit check)
- Add @crates//:zstd to integration test suite deps in BUILD.bazel
- Fix redundant_closure_for_method_calls in nativelink.rs (use as_deref)
- Fix clippy violations in wire_compression_bench.rs (doc_markdown,
  cast_possible_truncation, print_stdout)
- Fix cast_possible_truncation in cas_server_test.rs (use try_from)
Comment thread .rtk/filters.toml Outdated

Member

This shouldn't be in our repo, only as a local file at most

Comment thread CLAUDE.md Outdated

Member

Similarly, this also shouldn't be in our repo

palfrey (Member) left a comment

Couple of concerning unrelated items creeping in for some reason?

Comment thread nativelink-store/src/r2_store.rs Outdated

impl R2Store {
- #[allow(clippy::new_ret_no_self)] // Because usually everyone returns themselves
+ #[allow(clippy::new_ret_no_self)] // Returns a pinned future for async construction.

Member

This is incorrect

Author

still 100% vibe coded - I'm still iterating on agent reviews & implementation before it's ready for real human review I think

Comment thread nativelink-util/src/health_utils.rs Outdated
// not part of the API contract; collect-into-Vec callers
// already ignore order.
- .buffer_unordered(usize::MAX),
+ .buffer_unordered(16),

Member

Why this change?

walter-zeromatter added a commit to Reactor-Inc/nativelink that referenced this pull request Jun 20, 2026
- Remove .rtk/filters.toml and CLAUDE.md from repo (local-only files,
  added to .gitignore)
- Revert unrelated scheduler/origin_event refactoring changes that
  crept into the PR (RunningActionTelemetry, WorkerUpdate struct
  variant, origin_metadata_from_baggage helper, historical_resource
  async refactor)
- Revert r2_store.rs comment to original (the new comment was
  factually incorrect about pinned futures)
- Revert health_utils.rs buffer_unordered(16) back to usize::MAX
  (unrelated change with no justification)
- Remove unrelated cspell dictionary entries (gh, npm, npx, etc.);
  keep only zstd

vercel Bot commented Jul 1, 2026

The latest updates on your projects.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| nativelink | Ready | Preview, Comment | Jul 3, 2026 2:39am |
| nativelink-aidm | Ready | Preview, Comment | Jul 3, 2026 2:39am |

Implements the REAPI compressed-blobs specification, allowing clients
to upload and download zstd-compressed blob data over the gRPC wire.
This is orthogonal to at-rest CompressionStore (LZ4) and operates
entirely in the service layer -- the store always holds uncompressed
data.

Changes:
- Add WireCompressor enum and supported_wire_compressors config field
  to CapabilitiesConfig (default: empty = no compression = zero
  behavior change)
- Add wire_compression module with compress/decompress helpers using
  zstd::bulk API with expected_size cap to prevent memory exhaustion
  from malicious payloads
- Add compressed-blobs support to ByteStreamServer (read + write paths)
  with buffer size cap (2x expected, min 64MB) on compressed write
  accumulation
- Add compressed-blobs support to CasServer (BatchUpdateBlobs +
  BatchReadBlobs) with warn! logging on compression fallback
- Add per-instance compressor advertisement in CapabilitiesServer via
  supported_wire_compressors_for_instance HashMap
- Wire config through nativelink.rs to both server constructors using
  CapabilitiesConfig per-instance lookup
- Add zstd integration tests for CAS batch update/read round-trip
- Reject unknown compressor enum values in BatchUpdateBlobs (was
  silently defaulting to Identity); skip unknown values in
  BatchReadBlobs acceptable_compressors
- Add Identity size validation in wire_compression::decompress()
- Add ZSTD_COMPRESSION_LEVEL named constant (3) with documentation

Security considerations:
- Decompression capped by expected_size (digest size_bytes) to prevent
  memory bombs
- Compressed write buffer capped at 2x expected size (min 64MB)
- Unknown compressor values rejected rather than silently accepted
- Defense-in-depth size validation preserved and documented

Refs: DEVPROD-483

Review fixes:
- BatchReadBlobs falls back to Identity when compression expands data
- Compressed ByteStream writes register in active_uploads so
  QueryWriteStatus reports progress and in-flight status
- Add MAX_COMPRESSED_UPLOAD_SIZE = 4 GiB hard cap on compressed uploads
  to prevent memory exhaustion from oversized expected_size
- Use clamp() for compressed buffer size bounds
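
The buffer bound described in these commit notes (2x the expected size, a 64MB floor, a 4 GiB hard cap) fits in a single `clamp` call. This is a sketch of that arithmetic only. `compressed_buffer_cap` and `MIN_COMPRESSED_BUFFER` are hypothetical names, and the floor is assumed to mean 64 MiB.

```rust
/// Floor for the compressed write buffer (assumed to be 64 MiB).
const MIN_COMPRESSED_BUFFER: u64 = 64 * 1024 * 1024;
/// Hard cap on compressed uploads: 4 GiB.
const MAX_COMPRESSED_UPLOAD_SIZE: u64 = 4 * 1024 * 1024 * 1024;

/// Upper bound on buffered compressed bytes for an upload whose
/// uncompressed size is `expected_size`.
pub fn compressed_buffer_cap(expected_size: u64) -> u64 {
    expected_size
        .saturating_mul(2) // avoid overflow on hostile sizes
        .clamp(MIN_COMPRESSED_BUFFER, MAX_COMPRESSED_UPLOAD_SIZE)
}
```

`saturating_mul` keeps an oversized expected_size from wrapping around before the clamp applies.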

CI fixes:
- Run rustfmt with nightly settings (imports_granularity, group_imports)
- Add wire_compression.rs and zstd dep to Bazel BUILD.bazel
- Add wire_compression_bench binary target to BUILD.bazel
- Fix let_underscore_drop warning in bench binary (let _warmup)
- Include benchmark binary in commit
- Remove .hermes/plans/2026-06-09_reapi-wire-compression.md from repo
- Add missing rust_binary import in nativelink-service/BUILD.bazel
- Fix copyright year 2024→2026 in wire_compression.rs and bench
- Apply flake.nix hardeningDisable patch from PR TraceMachina#2175 for coverage
- Run cargo fmt on all changed files
- Fix clippy: redundant_closure, items_after_statements, cast coercion
- Sort zstd dependency in nativelink-service/Cargo.toml (pre-commit check)
- Add @crates//:zstd to integration test suite deps in BUILD.bazel
- Fix redundant_closure_for_method_calls in nativelink.rs (use as_deref)
- Fix clippy violations in wire_compression_bench.rs (doc_markdown,
  cast_possible_truncation, print_stdout)
- Fix cast_possible_truncation in cas_server_test.rs (use try_from)

Performance:
- Move zstd compress/decompress to spawn_blocking in bytestream_server
  and cas_server to avoid blocking async executor threads
- Add compress_bytes() for zero-copy identity compression when caller
  already owns Bytes; compress() now calls zstd directly without
  intermediate copy for zstd path
- Short-circuit identity compression before spawn_blocking in
  bytestream_server to avoid unnecessary executor hop
- Cap health_utils buffer_unordered at 16 (was usize::MAX)
- Replace full ActionInfoWithProps clone with lightweight
  RunningActionTelemetry struct in api_worker_scheduler

Code quality:
- Add origin_metadata_from_baggage() helper in origin_event.rs to
  deduplicate OriginMetadata construction across awaited_action.rs,
  cache_lookup_scheduler.rs, and historical_resource_scheduler.rs
- Fix context snapshot mismatch: both call sites now read from the
  same captured baggage instead of mixing captured baggage with
  Context::current()
- Make refresh_hints() single-flight: set last_attempt under lock
  before async file read to prevent concurrent reads; throttle
  failures via last_attempt timestamp
- Replace std::fs with tokio::fs in historical_resource_scheduler
- Replace Error::new with make_err! for project consistency
- Use shared proto_to_wire_compressor in cas_server instead of
  inline match
- Deduplicate supported_compressors construction in
  capabilities_server
- Use named struct fields for WorkerUpdate::RunAction instead of
  Box<(tuple)>
- Use production wire_compression helpers in bench instead of
  duplicate ZSTD_COMPRESSION_LEVEL constant
- Revert set_freebind to Ok(()) on non-Linux (was changed to Err
  which breaks macOS startup)
- Remove .rtk/filters.toml and CLAUDE.md from repo (local-only files,
  added to .gitignore)
- Revert unrelated scheduler/origin_event refactoring changes that
  crept into the PR (RunningActionTelemetry, WorkerUpdate struct
  variant, origin_metadata_from_baggage helper, historical_resource
  async refactor)
- Revert r2_store.rs comment to original (the new comment was
  factually incorrect about pinned futures)
- Revert health_utils.rs buffer_unordered(16) back to usize::MAX
  (unrelated change with no justification)
- Remove unrelated cspell dictionary entries (gh, npm, npx, etc.);
  keep only zstd

walter-zeromatter (Author)

OK! Did a big chunk more work and reviewed it. It's in a more-or-less presentable state. I'm about to take a short break, but I'll address any feedback promptly when I get back.

One open question I have is whether this should be toggleable on the server side at all. Clients can always just not use --remote_cache_compression, so I'm not sure there's real value in adding this as a config option.

Also worth noting: I'm planning a follow-up PR that adds zstd compression as an option on the storage side, so that the server can do direct read-through when clients have compression enabled.

walter-zeromatter marked this pull request as ready for review July 3, 2026 02:29

Development

Successfully merging this pull request may close these issues.

Add support for compressed blob uploads

3 participants