fix(storage): verify ClientPut receiver against over-query window#142
fix(storage): verify ClientPut receiver against over-query window#142jacderida wants to merge 1 commit into
Conversation
Mirror the paid-quote issuer fix (WithAutonomi#141) on the receiver storage-responsibility check. After WithAutonomi#140/WithAutonomi#141 removed the reachability re-rank and widened the issuer check, uploads recovered substantially, but a residual few percent still fail on: ClientPut receiver <peer> is not among this node's local 9 closest peers (close group plus storage margin) most visibly as "Failed to store public DataMap" — the DataMap is a single critical chunk, so one receiver-check rejection fails the whole upload regardless of file size. Cause is the same divergence the issuer check had: the uploader queries 2 * CLOSE_GROUP_SIZE peers and PUTs each chunk to the CLOSE_GROUP_SIZE closest *successful responders* (ant-client get_store_quotes), so when closer peers are slow or NAT-stuck the storer it legitimately PUT to sits at XOR positions up to 2 * close_group_size. The receiver check verified only the bare close_group_size + storage margin (9) of the node's *local* routing table with exact self-membership, so it rejected honest PUTs. Bring it in line with the issuer check: - Widen to 2 * close_group_size, matching the uploader's over-query window. This does not amplify replica count — the uploader still PUTs to only its selected storers — it just lets a legitimately-selected storer at position 10..14 accept. - Keep the XOR-only lookup (find_closest_nodes_local_with_self reranks by reachability and would demote XOR-close relay-only / NAT'd storers). - Hybrid source: try the local routing-table view first, and only on a local miss fall back to an authoritative find_closest_nodes_network lookup (the same view the uploader used to choose the storers), wrapped in the shared CLOSENESS_LOOKUP_TIMEOUT. Reject only if absent from both. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Hermes review for PR #142Verdict: changes requested before merge (review-team consensus: 6 agents / 2 groups). CI is green, and local Blockers
A requester only needs syntactically valid content/address bytes; no valid payment proof is required to reach this lookup. For random addresses the local top- This is made worse by the chunk protocol handler semaphore ( Suggested fix: do not perform the authoritative network fallback in the pre-payment hot path unless it is separately bounded. Options include rejecting missing/obviously invalid proofs first, moving the expensive fallback after cheap/payment validation, adding a dedicated short timeout/concurrency limit, and adding singleflight/cache/rate-limiting for receiver-membership lookups.
The PR changes direct That means a node ranked 10–14 (for K=7) can accept/store a paid direct PUT while the rest of the storage responsibility machinery may treat that record as out-of-range and eventually prune it. That may be intended as an ingress-overquery policy, but the code/comments call it storage responsibility. Please either align the retention/replication semantics with the widened window, or explicitly document and test the distinction between “ClientPut ingress acceptance width” and “stored-record retention / proof authority width”. Non-blocking but should be fixed
Verified
I’d hold merge until the pre-payment lookup DoS path is bounded and the widened receiver/retention semantics are made explicit. |
|
Follow-up after Mick's note: I agree the cleaner direction is likely to fix the uploader's quote target selection rather than widen receiver admission. If the protocol invariant is that quotes/storage responsibility are bound to the strict close group (or the existing close-group+storage-margin admission width), then accepting So my recommendation would be stronger than “document the distinction”: unless maintainers explicitly want a new ingress-overquery role, keep receiver admission aligned with the intended close-group/storage-admission width and adjust the uploader so it does not select quote/storage targets outside that authority set. |
|
Checked the ant-client history for the single-node payment upload path. The over-query appears to have been introduced in
That changed let mut remote_peers = self.close_group_peers(address).await?;
remote_peers.truncate(CLOSE_GROUP_SIZE);to: let over_query_count = CLOSE_GROUP_SIZE * 2;
let remote_peers = self
.network()
.find_closest_peers(address, over_query_count)
.await?;and then it sorts successful responses by XOR distance and truncates back to So the answer to Mick's question is: since 2026-04-01 in ant-client commit |
|
Follow-up answer to Chris' question about #139/#140/#141 against Mick's “no over-querying for SNP” principle:
So I’d keep #140’s XOR-only ordering idea, but revert/replace the #141 SNP issuer widening, and not revive #139’s K+margin issuer widening unless maintainers explicitly decide SNP quote authority is wider than strict K. |
Problem
After #140 (XOR-only verification) and #141 (issuer-closeness over-query window + network fallback), upload success improved dramatically — large files 100%, small files ~92–98%. The residual few-percent failures are now dominated by the receiver storage-responsibility check (
AntProtocol::validate_store_membership):They surface most visibly as
Failed to store public DataMap— the public DataMap is a single critical chunk per upload, so one receiver-check rejection of it fails the whole upload regardless of file size.Cause
This is the same divergence #141 fixed for the issuer check, on the receiver side. The uploader queries
2 * CLOSE_GROUP_SIZEpeers and PUTs each chunk to theCLOSE_GROUP_SIZEclosest successful responders (ant-clientget_store_quotes). When closer peers are slow or NAT-stuck, the storer it legitimately PUT to sits at XOR positions up to2 * close_group_size. But the receiver check verified only the bareclose_group_size+ storage margin (9) of the node's local routing table, with exact self-membership — so it rejected honest PUTs.#141only touched the issuer check (PaymentVerifier); the receiver check still had the strict local-only / width-9 / exact-membership logic.Fix (symmetric with #141)
2 * close_group_size, matching the uploader's over-query window. This does not amplify replica count — the uploader still PUTs to only its selected storers — it just lets a legitimately-selected storer at position 10..14 accept.find_closest_nodes_local_with_selfreranks by reachability and would demote XOR-close relay-only / NAT'd storers).find_closest_nodes_networklookup (the same view the uploader used to choose the storers), wrapped in the sharedCLOSENESS_LOOKUP_TIMEOUT. Reject only if absent from both.CLOSENESS_LOOKUP_TIMEOUTis promoted topub(crate)so the receiver check reuses the merkle/issuer timeout.Note
Same cost caveat as #141: the fallback can issue a per-chunk network lookup when the local view misses; the local fast-path keeps that off the hot path for storers we already know. One storage-specific consideration: a node may now accept a chunk it is the ~10th–14th-closest to, which the replication/pruning close-group logic could later treat as borderline; replica count is unaffected since the uploader controls the PUT targets.
Builds on #140 and #141.
🤖 Generated with Claude Code