feat: MCP rate-limit handling (typed 429, init cool-off, init race fix, ensure_initialized, docs)#25

Merged
KhaledSalhab-Develeap merged 5 commits into main from mcp-rate-limit-fixes on May 21, 2026

@KhaledSalhab-Develeap
Collaborator

Summary

Fixes how the MCP client handles initialize-bucket rate limiting and adds a small ergonomic surface:

  • A1 (JSON-RPC rate-limit classification): Sync and async _send_rpc now recognise HTTP 200 responses carrying JSON-RPC error.code == -32000 with "rate limit" in the message and raise HyperpingRateLimitError with retry_after parsed from Retry after Ns, status_code=200, and the original error preserved as response_body. Previously these surfaced as a plain HyperpingAPIError with no typed handling and no .retry_after.
  • A2 (initialize cool-off latch): Both transports gained _init_blocked_until, a deadline on the monotonic clock. A rate-limited handshake arms the latch for max(retry_after, 30) seconds; any call_tool within that window short-circuits, because initialize() raises HyperpingRateLimitError(retry_after=remaining) without sending further HTTP requests until the deadline elapses. This stops the for _ in range(...): HyperpingMcpClient(...) repro from burning further slots after the first hit.
  • A3 (TOCTOU init race): Added a dedicated _init_lock (separate from _lock, which still guards _request_id). initialize() now takes the init lock and either returns the cached _init_result or runs _initialize_locked(). call_tool unconditionally calls initialize() (no separate flag read), so two concurrent first calls produce exactly one handshake.
  • A4 (narrow retry): Explicit regression tests pin that HyperpingRateLimitError is never retried by call_tool's 5xx-only retry block. Docstrings updated.
  • A5 (ensure_initialized()): HyperpingMcpClient and AsyncHyperpingMcpClient gained an ensure_initialized() method that delegates to the transport. Lets services perform a startup-time handshake probe and catch HyperpingRateLimitError early.
  • B (docs): New "MCP rate limits and connection lifecycle" subsection in README; CHANGELOG [Unreleased] block.
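
The A1 classification can be sketched roughly as follows. This is a minimal, self-contained illustration and not the library's code: RateLimitError is a stand-in for HyperpingRateLimitError, and classify_rpc_error and its regex are hypothetical names. Only the -32000 code, the "rate limit" marker, the "Retry after Ns" hint, and the three attributes are taken from the summary above.

```python
import re
from typing import Any, Optional


class RateLimitError(Exception):
    """Stand-in for HyperpingRateLimitError (shape assumed for illustration)."""

    def __init__(self, message: str, retry_after: Optional[int],
                 status_code: int, response_body: Any) -> None:
        super().__init__(message)
        self.retry_after = retry_after
        self.status_code = status_code
        self.response_body = response_body


# Matches the "Retry after Ns" hint named in A1 (hypothetical pattern).
_RETRY_AFTER = re.compile(r"retry after (\d+)s", re.IGNORECASE)


def classify_rpc_error(payload: dict, status_code: int = 200) -> None:
    """Raise RateLimitError when payload is a JSON-RPC rate-limit error.

    Any other payload is left for the caller's generic error handling.
    """
    error = payload.get("error")
    if not isinstance(error, dict):
        return
    message = str(error.get("message", ""))
    if error.get("code") == -32000 and "rate limit" in message.lower():
        match = _RETRY_AFTER.search(message)
        retry_after = int(match.group(1)) if match else None
        # Preserve the original error object as response_body.
        raise RateLimitError(message, retry_after, status_code, error)
```

A -32000 error whose message does not mention a rate limit, or a rate-limit message under a different code, falls through untouched, which mirrors the negative test variants listed in the test plan.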

Server-side asks (undocumented per-verb initialize cap, true rolling window, HTTP 429 vs 200/JSON-RPC, accurate Retry-After) remain open and should be raised separately with Hyperping; this change is pure client-side mitigation.

Test Plan

  • pytest -q -> 454 passed, 0 skipped, coverage 96.07% (gate 85%).
  • ruff check src tests -> clean.
  • mypy --strict src -> clean.
  • 29 new tests covering: JSON-RPC -32000 classification with one positive and three negative variants (sync + async); TOCTOU concurrency with threading.Barrier(2) (sync) and asyncio.gather (async); initialize() idempotency; cool-off latch with monkeypatched time.monotonic; latch-clear after deadline; rate-limit-not-retried invariants; the user's 6-fresh-client repro; ensure_initialized() delegation and real-transport idempotency; a README docs-artifact gate test.

… race

Classify HTTP 200 + JSON-RPC error code -32000 with a rate-limit message
as HyperpingRateLimitError with the parsed retry_after, alongside the
existing HTTP 429 path. After a rate-limited initialize, latch a
process-monotonic cool-off so subsequent call_tool invocations on the
same client fail fast with HyperpingRateLimitError until the deadline
elapses, without burning more slots from the server's bucket.
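
A cool-off latch of this kind might look like the sketch below. CoolOffLatch, arm, and remaining are hypothetical names; the PR only names the _init_blocked_until field. The 30-second floor comes from the summary, while the retry_after == 0 case and the math.ceil rounding follow the follow-up commit in this PR. The injectable clock is an assumption made so the sketch can be exercised without sleeping.

```python
import math
import time
from typing import Callable, Optional

MIN_COOL_OFF_SECONDS = 30  # floor from the PR summary: max(retry_after, 30)


class CoolOffLatch:
    """Tracks a 'do not handshake before' deadline on a monotonic clock."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._blocked_until: Optional[float] = None

    def arm(self, retry_after: Optional[int]) -> None:
        """Start the cool-off window after a rate-limited handshake."""
        if retry_after == 0:
            # Server says "retry now": do not latch at all.
            return
        window = max(retry_after or 0, MIN_COOL_OFF_SECONDS)
        self._blocked_until = self._clock() + window

    def remaining(self) -> int:
        """Whole seconds left in the window, or 0 when not blocked."""
        if self._blocked_until is None:
            return 0
        left = self._blocked_until - self._clock()
        if left <= 0:
            self._blocked_until = None  # deadline elapsed: clear the latch
            return 0
        return math.ceil(left)
```

A caller would check remaining() before sending the handshake and raise the rate-limit error locally when it is non-zero. With exactly 29.0 seconds left, int(remaining) + 1 would report 30 while math.ceil reports 29, which is the over-report the follow-up commit removes.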

Make initialize() idempotent under a dedicated _init_lock with the
double-checked flag, closing the lazy-init TOCTOU race where two
concurrent first calls could each POST initialize. Mirror the change in
the async transport using asyncio.Lock held across the awaitable
handshake. Pin that call_tool's transient retry never catches a
rate-limit (HTTP 429 or JSON-RPC -32000).
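
The double-checked initialization described here can be illustrated with a small sync sketch. LazyHandshake and do_handshake are hypothetical names; the real transport's _initialize_locked() and its return type are not shown in this PR page, so a plain dict stands in for the handshake result.

```python
import threading
from typing import Callable, Optional


class LazyHandshake:
    """Runs an expensive handshake at most once, even under concurrency."""

    def __init__(self, do_handshake: Callable[[], dict]) -> None:
        self._do_handshake = do_handshake
        self._init_lock = threading.Lock()  # dedicated lock, only for init
        self._init_result: Optional[dict] = None

    def initialize(self) -> dict:
        # First check, without the lock: cheap path once initialized.
        result = self._init_result
        if result is not None:
            return result
        with self._init_lock:
            # Second check, under the lock: another thread may have
            # completed the handshake while this one was waiting.
            if self._init_result is None:
                self._init_result = self._do_handshake()
            return self._init_result
```

Two threads released together by threading.Barrier(2) and both calling initialize() should trigger do_handshake exactly once. An async variant would follow the same shape with asyncio.Lock and `async with`, holding the lock across the awaited handshake.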

Add ensure_initialized() on HyperpingMcpClient and AsyncHyperpingMcpClient
for startup health checks, delegating to the transport's idempotent
initialize(). Document the new behaviour in README under "MCP rate
limits and connection lifecycle" and record it under an [Unreleased]
CHANGELOG block.
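
A startup health check built on ensure_initialized() might be sketched as below. The client constructor and import path are not shown on this page, so the sketch accepts any object exposing ensure_initialized(); RateLimited is a stand-in for HyperpingRateLimitError that models only retry_after, and startup_probe is a hypothetical helper name.

```python
import logging
from typing import Optional, Protocol, Tuple


class RateLimited(Exception):
    """Stand-in for HyperpingRateLimitError; only retry_after is modelled."""

    def __init__(self, retry_after: Optional[int]) -> None:
        super().__init__(f"rate limited, retry after {retry_after}s")
        self.retry_after = retry_after


class SupportsEnsureInitialized(Protocol):
    """Anything with an ensure_initialized() method (return type assumed)."""

    def ensure_initialized(self) -> object: ...


def startup_probe(client: SupportsEnsureInitialized) -> Tuple[bool, Optional[int]]:
    """Return (ready, retry_after) after attempting the handshake once."""
    try:
        client.ensure_initialized()
    except RateLimited as exc:
        # Surface the rate limit at startup instead of on the first tool call.
        logging.warning("MCP handshake rate-limited; retry_after=%s", exc.retry_after)
        return False, exc.retry_after
    return True, None
```

A service could call startup_probe once at boot and delay readiness by the returned retry_after rather than constructing fresh clients in a loop.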

Addresses the issues surfaced by the unbiased review of #25:

- Tighten the rate-limit marker to "rate limit exceeded" so future server
  messages that merely mention "rate limit" cannot be misclassified.
- Broaden the Retry-After parser to accept "Retry-After: Ns" header-style,
  "retry after N seconds" wordy units, mixed case, and no-units variants.
  Parametrized tests cover the variants.
- Persist the originating status_code (200 vs 429) on the cool-off latch so
  the short-circuit no longer falsely reports 200 for HTTP 429 sources.
- Use math.ceil(remaining) for the cool-off retry_after instead of
  int(remaining)+1; eliminates the systematic +1s over-report.
- Treat retry_after=0 as "no latch" (server says retry now) instead of
  falling back to the 30s default.
- Add a lockless fast path on initialize() so post-handshake call_tool
  invocations do not acquire _init_lock on every call.
- Classify JSON-RPC -32000 rate-limit signals returned on the
  notifications/initialized leg too; previously they were silently swallowed
  by the early notification short-circuit.
- Add response_body to the HTTP 429 path for symmetry with the JSON-RPC path.
- Strengthen tests: concurrency test inspects request bodies for the
  initialize method count, idempotency test asserts call_count before/after
  the second initialize(), cool-off-clears test asserts route.call_count at
  each phase so a regression that hit the network during the latch would fail.
- Add tests for status_code preservation, math.ceil semantics, retry_after=0
  no-latch behavior, the tightened marker, and the notification-leg case.
- Simplify the CHANGELOG docs gate to a single regex assertion.
- Expand HyperpingRateLimitError, ensure_initialized(), and README copy to
  accurately reflect the new behavior; cite the Hyperping MCP docs for the
  "stateless over HTTP" claim and call out that the latch is per-process.

pytest: 482 passed; ruff: clean; mypy --strict: clean; coverage: 95.76%.
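
The broadened Retry-After parsing listed in the follow-up commit could be approximated as below. The actual pattern used in the PR is not shown on this page, so parse_retry_after and its regex are assumptions. The sketch treats every matched number as seconds and does not try to reject other units.

```python
import re
from typing import Optional

# Accepts "Retry after 30s", "Retry-After: 30s", "retry after 30 seconds",
# mixed case, and a bare number with no units (hypothetical pattern).
_RETRY_AFTER = re.compile(r"retry[\s-]?after:?\s*(\d+)", re.IGNORECASE)


def parse_retry_after(message: str) -> Optional[int]:
    """Return the retry delay in seconds, or None when no hint is present."""
    match = _RETRY_AFTER.search(message)
    return int(match.group(1)) if match else None
```

Returning 0 distinctly from None matters for the latch: per the commit above, 0 means "retry now" and should not arm the cool-off, whereas a missing hint falls back to the default window.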
KhaledSalhab-Develeap merged commit afae9e1 into main on May 21, 2026
3 checks passed