feat: MCP rate-limit handling (typed 429, init cool-off, init race fix, ensure_initialized, docs) #25
Merged
… race

Classify HTTP 200 responses carrying JSON-RPC error code `-32000` with a rate-limit message as `HyperpingRateLimitError` with the parsed `retry_after`, alongside the existing HTTP 429 path.

After a rate-limited `initialize`, latch a cool-off deadline on the process's monotonic clock so that subsequent `call_tool` invocations on the same client fail fast with `HyperpingRateLimitError` until the deadline elapses, without burning more slots from the server's bucket.

Make `initialize()` idempotent under a dedicated `_init_lock` with a double-checked flag, closing the lazy-init TOCTOU race in which two concurrent first calls could each POST `initialize`. Mirror the change in the async transport using an `asyncio.Lock` held across the awaited handshake.

Pin the invariant that `call_tool`'s transient retry never catches a rate limit (HTTP 429 or JSON-RPC `-32000`).

Add `ensure_initialized()` on `HyperpingMcpClient` and `AsyncHyperpingMcpClient` for startup health checks, delegating to the transport's idempotent `initialize()`.

Document the new behaviour in the README under "MCP rate limits and connection lifecycle" and record it under an `[Unreleased]` CHANGELOG block.
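A minimal sketch of the double-checked `initialize()` with a cool-off latch, assuming a transport whose handshake is a plain callable and which uses the cached handshake result as its "initialized" flag. `SketchTransport`, the stand-in `HyperpingRateLimitError` constructor, and the `_blocked_status` attribute are hypothetical names for illustration, not the library's actual API. The latch rules shown (`math.ceil` for the remaining time, `retry_after=0` meaning no latch, the originating `status_code` preserved) follow the review follow-up commit.

```python
from __future__ import annotations

import math
import threading
import time
from typing import Any, Callable, Optional


class HyperpingRateLimitError(Exception):
    """Hypothetical stand-in; the library's real constructor may differ."""

    def __init__(self, message: str, retry_after: int, status_code: int) -> None:
        super().__init__(message)
        self.retry_after = retry_after
        self.status_code = status_code


class SketchTransport:
    """Hypothetical transport: double-checked initialize() plus a cool-off latch."""

    def __init__(self, handshake: Callable[[], dict[str, Any]]) -> None:
        self._handshake = handshake          # performs the initialize round trip
        self._init_lock = threading.Lock()   # dedicated to the handshake
        self._init_result: Optional[dict[str, Any]] = None
        self._init_blocked_until = 0.0       # deadline on the time.monotonic() clock
        self._blocked_status = 0             # status_code of the rate-limited attempt

    def initialize(self) -> dict[str, Any]:
        cached = self._init_result
        if cached is not None:               # lockless fast path after the handshake
            return cached
        with self._init_lock:
            if self._init_result is not None:  # re-check: another thread may have won
                return self._init_result
            remaining = self._init_blocked_until - time.monotonic()
            if remaining > 0:
                # Cool-off still active: fail fast without any HTTP request.
                raise HyperpingRateLimitError(
                    "initialize is cooling off",
                    retry_after=math.ceil(remaining),
                    status_code=self._blocked_status,
                )
            try:
                result = self._handshake()
            except HyperpingRateLimitError as exc:
                if exc.retry_after > 0:      # retry_after == 0: retry now, no latch
                    self._init_blocked_until = time.monotonic() + exc.retry_after
                    self._blocked_status = exc.status_code
                raise
            self._init_result = result
            return result

    def call_tool(self, name: str) -> str:
        # Always go through initialize(); no separate flag read here.
        self.initialize()
        return f"called {name}"  # placeholder for the real tools/call request
```

The re-check under `_init_lock` is what guarantees that two concurrent first calls produce exactly one handshake; the unlocked read in front of it only saves lock traffic once the handshake has succeeded.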
Addresses the issues surfaced by the unbiased review of #25:

- Tighten the rate-limit marker to `"rate limit exceeded"` so that future server messages that merely mention "rate limit" cannot be misclassified.
- Broaden the Retry-After parser to accept header-style `Retry-After: Ns`, wordy units (`retry after N seconds`), mixed case, and variants without units. Parametrized tests cover the variants.
- Persist the originating `status_code` (200 vs 429) on the cool-off latch so that the short-circuit no longer falsely reports 200 for HTTP 429 sources.
- Use `math.ceil(remaining)` for the cool-off `retry_after` instead of `int(remaining) + 1`; this removes the 1 s over-report that the old formula produced whenever `remaining` was a whole number.
- Treat `retry_after=0` as "no latch" (the server says retry now) instead of falling back to the 30 s default.
- Add a lockless fast path to `initialize()` so that post-handshake `call_tool` invocations do not acquire `_init_lock` on every call.
- Also classify JSON-RPC `-32000` rate-limit signals returned on the `notifications/initialized` leg; previously the early notification short-circuit silently swallowed them.
- Add `response_body` to the HTTP 429 path for symmetry with the JSON-RPC path.
- Strengthen tests: the concurrency test inspects request bodies to count `initialize` calls, the idempotency test asserts `call_count` before and after the second `initialize()`, and the cool-off-clears test asserts `route.call_count` at each phase, so a regression that hit the network during the latch would fail.
- Add tests for `status_code` preservation, `math.ceil` semantics, the `retry_after=0` no-latch behavior, the tightened marker, and the notification-leg case.
- Simplify the CHANGELOG docs gate to a single regex assertion.
- Expand the `HyperpingRateLimitError`, `ensure_initialized()`, and README copy to accurately reflect the new behavior; cite the Hyperping MCP docs for the "stateless over HTTP" claim and call out that the latch is per-process.

pytest: 482 passed; ruff: clean; mypy --strict: clean; coverage: 95.76%.
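The tightened marker and the broadened Retry-After parser might look roughly like the sketch below. The regex, the helper names `parse_retry_after` and `is_rate_limit_rpc_error`, and the example messages are assumptions for illustration; the server's actual wording and the client's real implementation may differ.

```python
from __future__ import annotations

import re
from typing import Any, Optional

# Tightened marker: merely mentioning "rate limit" is not enough.
RATE_LIMIT_MARKER = "rate limit exceeded"

# Accepts "Retry after 12s", "Retry-After: 12s", "retry after 12 seconds",
# and "RETRY AFTER 12"; only the digits are captured, so units are ignored.
_RETRY_AFTER_RE = re.compile(r"retry[\s-]*after\s*:?\s*(\d+)", re.IGNORECASE)


def parse_retry_after(message: str) -> Optional[int]:
    """Return the advertised delay in seconds, or None if no hint is present."""
    match = _RETRY_AFTER_RE.search(message)
    return int(match.group(1)) if match else None


def is_rate_limit_rpc_error(payload: dict[str, Any]) -> bool:
    """True only for a JSON-RPC error with code -32000 and the rate-limit marker."""
    error = payload.get("error")
    if not isinstance(error, dict) or error.get("code") != -32000:
        return False
    return RATE_LIMIT_MARKER in str(error.get("message", "")).lower()


if __name__ == "__main__":
    # Hypothetical HTTP 200 body carrying a JSON-RPC rate-limit error.
    response: dict[str, Any] = {
        "jsonrpc": "2.0",
        "id": 1,
        "error": {"code": -32000, "message": "Rate limit exceeded. Retry after 12s"},
    }
    if is_rate_limit_rpc_error(response):
        print(parse_retry_after(response["error"]["message"]))  # prints 12
```

Checking the error code before the message keeps unrelated `-32000` server errors and non-rate-limit codes out of the typed path, which is what the negative test variants exercise.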
## Summary
Fixes how the MCP client handles initialize-bucket rate limiting and adds a small ergonomic surface:
- `_send_rpc` now recognises HTTP 200 responses carrying JSON-RPC `error.code == -32000` with `"rate limit"` in the message and raises `HyperpingRateLimitError` with `retry_after` parsed from `Retry after Ns`, `status_code=200`, and the original error preserved as `response_body`. Previously these surfaced as a plain `HyperpingAPIError` with no typed handling and no `.retry_after`.
- `_init_blocked_until` on a monotonic clock: a rate-limited handshake arms the latch with `max(retry_after, 30)`; subsequent `call_tool` calls within the window short-circuit via `initialize()` raising `HyperpingRateLimitError(retry_after=remaining)`, with zero further HTTP requests until the deadline elapses. This prevents the `for _ in range(...): HyperpingMcpClient(...)` repro from burning further slots after the first hit.
- A dedicated `_init_lock` (separate from `_lock`, which still guards `_request_id`). `initialize()` now takes the init lock and either returns the cached `_init_result` or runs `_initialize_locked()`. `call_tool` unconditionally calls `initialize()` (no separate flag read), so two concurrent first calls produce exactly one handshake.
- `HyperpingRateLimitError` is never retried by `call_tool`'s 5xx-only retry block. Docstrings updated.
- `HyperpingMcpClient` and `AsyncHyperpingMcpClient` gained an `ensure_initialized()` method that delegates to the transport. This lets services perform a startup-time handshake probe and catch `HyperpingRateLimitError` early.
- Documentation: a README section "MCP rate limits and connection lifecycle" and a CHANGELOG `[Unreleased]` block.

Server-side asks (undocumented per-verb `initialize` cap, true rolling window, HTTP 429 vs 200/JSON-RPC, accurate Retry-After) remain open and should be raised separately with Hyperping; this change is pure client-side mitigation.

## Test Plan
- `pytest -q` -> 454 passed, 0 skipped, coverage 96.07% (gate 85%).
- `ruff check src tests` -> clean.
- `mypy --strict src` -> clean.
- New tests cover: `-32000` classification with one positive and three negative variants (sync + async); TOCTOU concurrency with `threading.Barrier(2)` (sync) and `asyncio.gather` (async); `initialize()` idempotency; the cool-off latch with a monkeypatched `time.monotonic`; latch clearing after the deadline; the rate-limit-not-retried invariants; the user's six-fresh-client repro; `ensure_initialized()` delegation and real-transport idempotency; and a README docs-artifact gate test.
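The async concurrency check listed in the test plan can be sketched without a network, assuming a transport that holds an `asyncio.Lock` across the awaited handshake. `AsyncSketchTransport`, `fake_handshake`, and `run_concurrent_first_calls` are hypothetical; the PR's own tests instead count `initialize` request bodies on a mocked HTTP route.

```python
from __future__ import annotations

import asyncio
from typing import Any, Awaitable, Callable, Optional


class AsyncSketchTransport:
    """Hypothetical async transport: asyncio.Lock held across the awaited handshake."""

    def __init__(self, handshake: Callable[[], Awaitable[dict[str, Any]]]) -> None:
        self._handshake = handshake
        self._init_lock = asyncio.Lock()
        self._init_result: Optional[dict[str, Any]] = None

    async def initialize(self) -> dict[str, Any]:
        cached = self._init_result
        if cached is not None:                 # lockless fast path
            return cached
        async with self._init_lock:
            if self._init_result is None:      # double-check under the lock
                self._init_result = await self._handshake()
            return self._init_result


async def run_concurrent_first_calls() -> int:
    handshakes = 0

    async def fake_handshake() -> dict[str, Any]:
        nonlocal handshakes
        handshakes += 1
        await asyncio.sleep(0.01)  # yield so the second caller contends for the lock
        return {"protocolVersion": "sketch"}

    transport = AsyncSketchTransport(fake_handshake)
    first, second = await asyncio.gather(transport.initialize(), transport.initialize())
    assert first == second
    return handshakes


if __name__ == "__main__":
    print(asyncio.run(run_concurrent_first_calls()))  # prints 1
```

The `asyncio.sleep` inside the fake handshake matters: without a yield point, the first `initialize()` would finish before the second one started, and the test would pass without ever exercising the race.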