fix(storage): verify ClientPut receiver against over-query window by jacderida · Pull Request #142 · WithAutonomi/ant-node

jacderida · 2026-06-13T20:56:21Z

Problem

After #140 (XOR-only verification) and #141 (issuer-closeness over-query window + network fallback), upload success improved dramatically — large files 100%, small files ~92–98%. The residual few-percent failures are now dominated by the receiver storage-responsibility check (AntProtocol::validate_store_membership):

ClientPut receiver <peer> is not among this node's local 9 closest peers (close group plus storage margin)

They surface most visibly as Failed to store public DataMap — the public DataMap is a single critical chunk per upload, so one receiver-check rejection of it fails the whole upload regardless of file size.

Cause

This is the same divergence #141 fixed for the issuer check, on the receiver side. The uploader queries 2 * CLOSE_GROUP_SIZE peers and PUTs each chunk to the CLOSE_GROUP_SIZE closest successful responders (ant-client get_store_quotes). When closer peers are slow or NAT-stuck, the storer it legitimately PUT to sits at XOR positions up to 2 * close_group_size. But the receiver check verified only the bare close_group_size + storage margin (9) of the node's local routing table, with exact self-membership — so it rejected honest PUTs.

#141 only touched the issuer check (PaymentVerifier); the receiver check still had the strict local-only / width-9 / exact-membership logic.

Fix (symmetric with #141)

Widen to 2 * close_group_size, matching the uploader's over-query window. This does not amplify replica count — the uploader still PUTs to only its selected storers — it just lets a legitimately-selected storer at position 10..14 accept.
Keep XOR-only ordering (find_closest_nodes_local_with_self reranks by reachability and would demote XOR-close relay-only / NAT'd storers).
Hybrid source: try the local routing-table view first; only on a local miss fall back to an authoritative find_closest_nodes_network lookup (the same view the uploader used to choose the storers), wrapped in the shared CLOSENESS_LOOKUP_TIMEOUT. Reject only if absent from both.

CLOSENESS_LOOKUP_TIMEOUT is promoted to pub(crate) so the receiver check reuses the merkle/issuer timeout.

Note

Same cost caveat as #141: the fallback can issue a per-chunk network lookup when the local view misses; the local fast-path keeps that off the hot path for storers we already know. One storage-specific consideration: a node may now accept a chunk it is the ~10th–14th-closest to, which the replication/pruning close-group logic could later treat as borderline; replica count is unaffected since the uploader controls the PUT targets.

Builds on #140 and #141.

🤖 Generated with Claude Code

Mirror the paid-quote issuer fix (WithAutonomi#141) on the receiver storage-responsibility check. After WithAutonomi#140/WithAutonomi#141 removed the reachability re-rank and widened the issuer check, uploads recovered substantially, but a residual few percent still fail on: ClientPut receiver <peer> is not among this node's local 9 closest peers (close group plus storage margin) most visibly as "Failed to store public DataMap" — the DataMap is a single critical chunk, so one receiver-check rejection fails the whole upload regardless of file size. Cause is the same divergence the issuer check had: the uploader queries 2 * CLOSE_GROUP_SIZE peers and PUTs each chunk to the CLOSE_GROUP_SIZE closest *successful responders* (ant-client get_store_quotes), so when closer peers are slow or NAT-stuck the storer it legitimately PUT to sits at XOR positions up to 2 * close_group_size. The receiver check verified only the bare close_group_size + storage margin (9) of the node's *local* routing table with exact self-membership, so it rejected honest PUTs. Bring it in line with the issuer check: - Widen to 2 * close_group_size, matching the uploader's over-query window. This does not amplify replica count — the uploader still PUTs to only its selected storers — it just lets a legitimately-selected storer at position 10..14 accept. - Keep the XOR-only lookup (find_closest_nodes_local_with_self reranks by reachability and would demote XOR-close relay-only / NAT'd storers). - Hybrid source: try the local routing-table view first, and only on a local miss fall back to an authoritative find_closest_nodes_network lookup (the same view the uploader used to choose the storers), wrapped in the shared CLOSENESS_LOOKUP_TIMEOUT. Reject only if absent from both. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

dirvine · 2026-06-14T12:12:59Z

Hermes review for PR #142

Verdict: changes requested before merge (review-team consensus: 6 agents / 2 groups). CI is green, and local git diff --check, cargo test -q storage::handler::tests --lib, and cargo check -q pass, but I found two blocking issues.

Blockers

Unauthenticated PUTs can trigger a 240s network lookup before payment verification

handle_put_inner calls validate_store_membership() before verify_payment() (src/storage/handler.rs:278-315). With this PR, a local membership miss runs find_closest_nodes_network(address, lookup_width) under PaymentVerifier::CLOSENESS_LOOKUP_TIMEOUT (src/storage/handler.rs:446-466), now 240s.

A requester only needs syntactically valid content/address bytes; no valid payment proof is required to reach this lookup. For random addresses the local top-2 * close_group_size check will usually miss, so this becomes an unpaid DHT lookup amplifier.

This is made worse by the chunk protocol handler semaphore (src/node.rs:744, src/node.rs:766-771): 64 concurrent unpaid local-miss PUTs can hold all permits for up to the lookup timeout, stalling valid PUT/GET/quote handling.

Suggested fix: do not perform the authoritative network fallback in the pre-payment hot path unless it is separately bounded. Options include rejecting missing/obviously invalid proofs first, moving the expensive fallback after cheap/payment validation, adding a dedicated short timeout/concurrency limit, and adding singleflight/cache/rate-limiting for receiver-membership lookups.

Receiver admission width now diverges from storage retention/replication responsibility width

The PR changes direct ClientPut receiver admission to 2 * close_group_size (src/storage/handler.rs:428), but replication/admission/pruning still use storage_admission_width(close_group_size) = close group + margin (src/replication/config.rs:50-54, src/replication/admission.rs:74-76,116-121, src/replication/pruning.rs:383-424).

That means a node ranked 10–14 (for K=7) can accept/store a paid direct PUT while the rest of the storage responsibility machinery may treat that record as out-of-range and eventually prune it. That may be intended as an ingress-overquery policy, but the code/comments call it storage responsibility. Please either align the retention/replication semantics with the widened window, or explicitly document and test the distinction between “ClientPut ingress acceptance width” and “stored-record retention / proof authority width”.

Non-blocking but should be fixed

Stale comments/error text still say the receiver check is local close-group-plus-margin (src/storage/handler.rs:278-281, src/storage/handler.rs:383-386; related VerificationContext comments in src/payment/verifier.rs:125-130).
No focused tests cover the new production branch: local XOR hit at 2*K, local miss + network hit, network error/timeout, and final rejection when both views omit self. Existing storage handler tests pass but mostly use the test override / no-P2P test path.

Verified

GitHub checks for current head f377d970aebda318f041836827e211211ca34317: all passing.
Local checks:
- git diff --check origin/rc-2026.6.2...HEAD — pass
- cargo test -q storage::handler::tests --lib — 16 passed
- cargo check -q — pass

I’d hold merge until the pre-payment lookup DoS path is bounded and the widened receiver/retention semantics are made explicit.

dirvine · 2026-06-14T12:58:42Z

Follow-up after Mick's note: I agree the cleaner direction is likely to fix the uploader's quote target selection rather than widen receiver admission.

If the protocol invariant is that quotes/storage responsibility are bound to the strict close group (or the existing close-group+storage-margin admission width), then accepting 2 * close_group_size at the receiver is treating a client-side over-query workaround as protocol authority. That is what creates the admission/retention mismatch called out above.

So my recommendation would be stronger than “document the distinction”: unless maintainers explicitly want a new ingress-overquery role, keep receiver admission aligned with the intended close-group/storage-admission width and adjust the uploader so it does not select quote/storage targets outside that authority set.

dirvine · 2026-06-14T13:03:45Z

Checked the ant-client history for the single-node payment upload path.

The over-query appears to have been introduced in WithAutonomi/ant-client by commit bf4aa562323d4a09006bdf1f7a5cf8becd34ddf3 on 2026-04-01:

fix: over-query 2x peers for quote collection fault tolerance

That changed Client::get_store_quotes from:

let mut remote_peers = self.close_group_peers(address).await?;
remote_peers.truncate(CLOSE_GROUP_SIZE);

to:

let over_query_count = CLOSE_GROUP_SIZE * 2;
let remote_peers = self
    .network()
    .find_closest_peers(address, over_query_count)
    .await?;

and then it sorts successful responses by XOR distance and truncates back to CLOSE_GROUP_SIZE.

So the answer to Mick's question is: since 2026-04-01 in ant-client commit bf4aa56. Before that, this path queried/truncated to CLOSE_GROUP_SIZE only. That supports the direction of reverting/fixing the client quote collection rather than widening ant-node receiver authority.

dirvine · 2026-06-14T13:23:15Z

Follow-up answer to Chris' question about #139/#140/#141 against Mick's “no over-querying for SNP” principle:

fix(payment): widen paid-quote issuer admission to close group + margin #139: yes, it went in the wrong direction for SNP quote authority. It widened paid-quote issuer admission from strict close_group_size to storage_admission_width(close_group_size) (K+margin). It was less aggressive than fix(payment): verify single-node issuer closeness against over-query window #141, but still allowed SNP paid quote issuers outside the strict close group. Note: fix(payment): use XOR-only local lookup for close-group verification #140 then removed that margin for the issuer check.
fix(payment): use XOR-only local lookup for close-group verification #140: mostly no. It changed ordering/source from reachability-ranked local lookup to XOR-only local lookup, while keeping the SNP issuer width at close_group_size. That aligns better with “same K, pure XOR” rather than over-querying. It did also switch receiver storage admission to XOR-only while preserving the existing storage_admission_width; that margin is a separate storage-admission policy question, not the same as quote over-querying.
fix(payment): verify single-node issuer closeness against over-query window #141: yes, clearly. It widened SNP paid-quote issuer verification to 2 * close_group_size and added a network fallback to match the client over-query behaviour. Under Mick's stated principle, this is the same wrong direction as fix(storage): verify ClientPut receiver against over-query window #142, just on the payment issuer side rather than the receiver side.

So I’d keep #140’s XOR-only ordering idea, but revert/replace the #141 SNP issuer widening, and not revive #139’s K+margin issuer widening unless maintainers explicitly decide SNP quote authority is wider than strict K.

jacderida closed this Jun 14, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(storage): verify ClientPut receiver against over-query window#142

fix(storage): verify ClientPut receiver against over-query window#142
jacderida wants to merge 1 commit into
WithAutonomi:rc-2026.6.2from
jacderida:fix/receiver-membership-hybrid

jacderida commented Jun 13, 2026

Uh oh!

dirvine commented Jun 14, 2026

Uh oh!

dirvine commented Jun 14, 2026

Uh oh!

dirvine commented Jun 14, 2026

Uh oh!

dirvine commented Jun 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jacderida commented Jun 13, 2026

Problem

Cause

Fix (symmetric with #141)

Note

Uh oh!

dirvine commented Jun 14, 2026

Hermes review for PR #142

Blockers

Non-blocking but should be fixed

Verified

Uh oh!

dirvine commented Jun 14, 2026

Uh oh!

dirvine commented Jun 14, 2026

Uh oh!

dirvine commented Jun 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants